Home

You can’t get there from here...

Howard, I have a file on my AS/400 that contains flight segments. I need to identify all jobs by departing and final destination airport codes. The file is keyed by our job number and could have from 1 to 4 records per job number. It would be easy if every flight was direct as the departure and destination codes would be on the same record. The tough part of the search is when the departure and final destination codes are on different records.

The file details the date of a flight, the departure city and arrival city, flight number, sequence and a code indicating if that is the final destination of the flight.  The flight numbers can change between legs of the journey, but a job number field is unique for a set of flights from origination to destination.  As an example, I might want to find all flights departing JAX with a final destination of ORL. Oh SQL Guru, can you show me the light?

Alex.

Dear Alex, this is a real neat question because we get to join a table to itself in order to answer the question. In fact, this is probably the best type of question to illustrate the concept of self-joins.  First, we are going to need a table to hold the sample data.  This table will have the fields FDATE for the flight date, JOBN for the job number, FLTN to hold the flight number, DPC for the departure city, DSC for the destination city, LEG to indicate if it is the final destination and SEQ to indicate the sequence of the flight in the JOB. Records in the table are uniquely keyed by JOBN and SEQ. Here are the SQL statements to create the test table and insert our sample data:

 

CREATE TABLE MCEIS.FLIGHTS
            
(FDATE DATE NOT NULL,
                 JOBN                    INTEGER NOT NULL,
                 FLTN                    CHAR(4) NOT NULL,
                 DPC                       CHAR(3) NOT NULL,
                 DSC                       CHAR(3) NOT NULL,
                 LEG                       CHAR(1) NOT NULL,
                 SEQ                       SMALLINT NOT NULL,
                 PRIMARY KEY (JOBN,SEQ));

INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',1,'1321','JAX','ORL','F',1);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',2,'0012','JAX','ATL','C',1);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',2,'0012','ATL','SFX','C',2);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',2,'0091','SFX','JPL','C',3);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',2,'0456','JPL','ORL','F',4);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',3,'9876','ATL','MIA','C',1);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',3,'1132','MIA','ORL','C',2);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',3,'1132','ORL','JAX','C',3);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',3,'1132','JAX','ATL','F',4);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',4,'3213','IAD','LAX','F',1);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',5,'3320','IAD','BWI','C',1);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',5,'5555','BWI','ORD','C',2);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',5,'1254','ORD','DFW','C',3);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',5,'3213','DFW','LAX','F',4);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',6,'0212','JAX','ORL','C',1);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',6,'0412','ORL','ATL','C',2);
INSERT INTO MCEIS.FLIGHTS VALUES ('11/01/2002',6,'0591','ATL','PHX','F',3);

Here is a table of the information for reference:

FDATE

JOBN

FLTN

DPC

DSC

LEG

SEQ

11/01/2002

1

1321

JAX

ORL

F

1

11/01/2002

2

12

JAX

ATL

C

1

11/01/2002

2

12

ATL

SFX

C

2

11/01/2002

2

91

SFX

JPL

C

3

11/01/2002

2

456

JPL

ORL

F

4

11/01/2002

3

9876

ATL

MIA

C

1

11/01/2002

3

1132

MIA

ORL

C

2

11/01/2002

3

1132

ORL

JAX

C

3

11/01/2002

3

1132

JAX

ATL

F

4

11/01/2002

4

3213

IAD

LAX

F

1

11/01/2002

5

3320

IAD

BWI

C

1

11/01/2002

5

5555

BWI

ORD

C

2

11/01/2002

5

1254

ORD

DFW

C

3

11/01/2002

5

3213

DFW

LAX

F

4

11/01/2002

6

212

JAX

ORL

C

1

11/01/2002

6

412

ORL

ATL

C

2

11/01/2002

6

591

ATL

PHX

F

3

So, at this point we have a file called flights filled with one days data about 6 jobs. First, lets construct a query that finds all flights that originate from JAX and have a final destination of ORL. To do this, I want to create two sets from the table.  The first set will contain all flights that leave from JAX.  A query that returns all JAX departing flights would be:

 SELECT FDATE, JOBN, DPC, DSC
FROM MCEIS.FLIGHTS
WHERE DPC=’JAX’;

Fields:

FDATE

JOBN

DPC

DSC

11/01/2002

1

JAX

ORL

11/01/2002

2

JAX

ATL

11/01/2002

3

JAX

ATL

11/01/2002

6

JAX

ORL

The second set contains all flights that have a final destination of ORL.  The query to final all flights that have a final destination of ORL is:

SELECT FDATE, JOBN, DPC, DSC
FROM MCEIS.FLIGHTS
WHERE DSC=’ORL’ AND LEG=’F’

Yields:

FDATE

JOBN

DPC

DSC

11/01/2002

1

JAX

ORL

11/01/2002

2

JPL

ORL

So, if you look at the results of query 1, we have 4 flights that depart from JAX: 1,2,3 and 6.  Query 2 shows that we have two flights with final destinations of ORL: 1 and 2.  Now, we want to join the results of Query 1 with Query 2.  The following statement accomplishes that:

 SELECT A.FDATE, A.JOBN,
          
A.FLTN AS ORIGIN_FN, B.FLTN AS DEST_FN,
            A.DPC AS ORIGIN, B.DSC AS DEST
FROM MCEIS.FLIGHTS AS A
INNER JOIN MCEIS.FLIGHTS AS B
                        ON (A.JOBN=B.JOBN)
WHERE B.LEG='F' AND A.DPC='JAX' AND B.DSC='ORL';

Yields:

FDATE

JOBN

ORIGIN_FN

DEST_FN

ORIGIN

DEST

11/01/2002

1

1321

1321

JAX

ORL

11/01/2002

2

12

456

JAX

ORL

Look at the FROM clause in the above statement and notice that we are selecting from FLIGHTS and naming it A and then joining to FLIGHTS and naming it B.  You can join a table to itself in SQL, as long as you give it a unique name so you can tell SQL which columns you want.  Next, look at the ON clause of the join.  It is specifying that we want to join the sets when the A.JOBN is equal to the B.JOBN. Finally the WHERE clause indicates we want records from the A table when the DPC=’JAX’ and from the B table when the DSC=’ORL’ and the LEG=’F’. This is a perfect example of joining a table to itself to answer a query.

More Flights of Fancy

Now, lets suppose you want to find all flights that go from JAX to ORL but ORL does not have to be the final leg of the flight.  You might think that you could just drop the B.LEG = ‘F’ from the query.  However, look at the results that would generate:

FDATE

JOBN

ORIGIN_FN

DEST_FN

ORIGIN

DEST

11/01/2002

1

1321

1321

JAX

ORL

11/01/2002

2

12

456

JAX

ORL

11/01/2002

3

1132

1132

JAX

ORL

11/01/2002

6

212

212

JAX

ORL

Take a look at the route of flight 3.  It leaves ATL and goes to MIA, then MIA to ORL. It then leaves ORL and goes to JAX and then leaves JAX and goes back to ATL.  This flight never departs JAX and arrives at ORL.  In order to ensure that you see only flights departing JAX and arriving ORL, you need to execute the following query:

 SELECT A.FDATE, A.JOBN,
          
A.FLTN AS ORIGIN_FN, B.FLTN AS DEST_FN,
            A.DPC AS ORIGIN, B.DSC AS DEST
FROM MCEIS.FLIGHTS AS A
INNER JOIN MCEIS.FLIGHTS AS B
                        ON (A.JOBN=B.JOBN)
WHERE A.DPC='JAX' AND B.DSC='ORL' AND A.SEQ<=B.SEQ;

Yields:

FDATE

JOBN

ORIGIN_FN

DEST_FN

ORIGIN

DEST

11/01/2002

1

1321

1321

JAX

ORL

11/01/2002

2

12

456

JAX

ORL

11/01/2002

6

212

212

JAX

ORL

Because we quantified that the A.SEQ should be less than or equal to the B.SEQ we have eliminated flight 3 from the result set.  Remember, the SEQ field tells the sequence of the leg. Now, if we wanted to find only direct flights from JAX to ORL, we simply change the above query to be A.SEQ=B.SEQ. This would result in flights 1 and 6, which are the only flights that can take us from JAX to ORL with no hops in between.

Retrieving the routes

OK, what if we wanted to find the routes for all flights that depart JAX and arrive ORL. The IN clause allows us to search for records that have a value of a field in a set of values.  The set of values can be an expression or it can be the results of an SQL statement.  The following statement lists the routes of all flights that leave JAX and arrive, (at some point), in ORL:

SELECT *
FROM MCEIS.FLIGHTS
WHERE JOBN IN
(SELECT A.JOBN
FROM MCEIS.FLIGHTS AS A
INNER JOIN MCEIS.FLIGHTS AS B
                        ON (A.JOBN=B.JOBN)
WHERE A.DPC='JAX' AND B.DSC='ORL' AND A.SEQ<=B.SEQ)
ORDER BY JOBN, SEQ;

SQL will execute the query inside the IN clause to get a set of job numbers. It will then join the set of job numbers to the FLIGHTS table and sequence that result set by the JOBN and SEQ fields. The results look like this:

FDATE

JOBN

FLTN

DPC

DSC

LEG

SEQ

11/01/2002

1

1321

JAX

ORL

F

1

11/01/2002

2

12

JAX

ATL

C

1

11/01/2002

2

12

ATL

SFX

C

2

11/01/2002

2

91

SFX

JPL

C

3

11/01/2002

2

456

JPL

ORL

F

4

11/01/2002

6

212

JAX

ORL

C

1

11/01/2002

6

412

ORL

ATL

C

2

11/01/2002

6

591

ATL

PHX

F

3


Calculating Hops

Finally, here is one other interesting piece of information you can gather when looking at the set, the number of stops required before you reach the destination. To calculate this, simply subtract the B.SEQ from the A.SEQ like the following query:

SELECT A.FDATE, A.JOBN,
            A.FLTN AS ORIGIN_FN, B.FLTN AS DEST_FN,
            A.DPC AS ORIGIN, B.DSC AS DEST,
            B.SEQ-A.SEQ AS STOPS
FROM MCEIS.FLIGHTS AS A
            INNER JOIN MCEIS.FLIGHTS AS B
            ON (A.JOBN=B.JOBN)
WHERE A.DPC='JAX' AND B.DSC='ORL' AND A.SEQ<=B.SEQ;

Yields:

FDATE

JOBN

ORIGIN_FN

DEST_FN

ORIGIN

DEST

STOPS

11/01/2002

1

1321

1321

JAX

ORL

0

11/01/2002

2

12

456

JAX

ORL

3

11/01/2002

6

212

212

JAX

ORL

0

And this one shows all flights departure cities and final destination cities and the number of stops on the flight:

 SELECT A.JOBN, A.DPC AS ORIGIN,
          
B.DSC AS DEST, B.SEQ-A.SEQ AS STOPS
FROM MCEIS.FLIGHTS AS A INNER JOIN MCEIS.FLIGHTS AS B
            ON (A.JOBN=B.JOBN)
WHERE A.SEQ=1 AND B.LEG='F';

Yields:

JOBN

ORIGIN

DEST

STOPS

1

JAX

ORL

0

2

JAX

ORL

3

3

ATL

ATL

3

4

IAD

LAX

0

5

IAD

LAX

3

6

JAX

PHX

2

The A set includes only records that have a SEQ = 1, (start of JOB), and the B set includes only records that have a LEG of ‘F’, (final destination).

 Alex, thanks for a good question that illuminates the power of self joins.  I think that most of the readers will be able to find ways to use this technique on their own data.