|
|
|
You can’t get there from here...
Howard, I have a file on my AS/400 that contains
flight segments. I need to identify all jobs by departing and final
destination airport codes. The file is keyed by our job number and
could have from 1 to 4 records per job number. It would be easy if
every flight was direct as the departure and destination codes would
be on the same record. The tough part of the search is when the
departure and final destination codes are on different records.
The file details the date of a flight, the
departure city and arrival city, flight number, sequence and a code
indicating if that is the final destination of the flight.
The flight numbers can change between legs of the journey, but
a job number field is unique for a set of flights from origination to
destination. As an
example, I might want to find all flights departing JAX with a final
destination of ORL. Oh SQL Guru, can you show me the light?
Alex.
Dear Alex, this is a real neat question because
we get to join a table to itself in order to answer the question. In
fact, this is probably the best type of question to illustrate the
concept of self-joins.
First, we are going to need a table to hold the sample data.
This table will have the fields FDATE for the flight date, JOBN
for the job number, FLTN to hold the flight number, DPC for the
departure city, DSC for the destination city, LEG to indicate if it is
the final destination and SEQ to indicate the sequence of the flight
in the JOB. Records in the table are uniquely keyed by JOBN and SEQ.
Here are the SQL statements to create the test table and insert our
sample data:
CREATE
TABLE MCEIS.FLIGHTS
(FDATE DATE NOT NULL,
JOBN
INTEGER NOT NULL,
FLTN
CHAR(4) NOT NULL,
DPC
CHAR(3) NOT NULL,
DSC
CHAR(3) NOT NULL,
LEG
CHAR(1) NOT NULL,
SEQ
SMALLINT NOT NULL,
PRIMARY KEY
(JOBN,SEQ));
INSERT
INTO MCEIS.FLIGHTS VALUES ('11/01/2002',1,'1321','JAX','ORL','F',1);
INSERT INTO MCEIS.FLIGHTS VALUES
('11/01/2002',2,'0012','JAX','ATL','C',1); INSERT INTO
MCEIS.FLIGHTS VALUES ('11/01/2002',2,'0012','ATL','SFX','C',2);
INSERT INTO MCEIS.FLIGHTS VALUES
('11/01/2002',2,'0091','SFX','JPL','C',3); INSERT INTO
MCEIS.FLIGHTS VALUES ('11/01/2002',2,'0456','JPL','ORL','F',4);
INSERT INTO MCEIS.FLIGHTS VALUES
('11/01/2002',3,'9876','ATL','MIA','C',1); INSERT INTO
MCEIS.FLIGHTS VALUES ('11/01/2002',3,'1132','MIA','ORL','C',2);
INSERT INTO MCEIS.FLIGHTS VALUES
('11/01/2002',3,'1132','ORL','JAX','C',3); INSERT INTO
MCEIS.FLIGHTS VALUES ('11/01/2002',3,'1132','JAX','ATL','F',4);
INSERT INTO MCEIS.FLIGHTS VALUES
('11/01/2002',4,'3213','IAD','LAX','F',1); INSERT INTO
MCEIS.FLIGHTS VALUES ('11/01/2002',5,'3320','IAD','BWI','C',1);
INSERT INTO MCEIS.FLIGHTS VALUES
('11/01/2002',5,'5555','BWI','ORD','C',2); INSERT INTO
MCEIS.FLIGHTS VALUES ('11/01/2002',5,'1254','ORD','DFW','C',3);
INSERT INTO MCEIS.FLIGHTS VALUES
('11/01/2002',5,'3213','DFW','LAX','F',4); INSERT INTO
MCEIS.FLIGHTS VALUES ('11/01/2002',6,'0212','JAX','ORL','C',1);
INSERT INTO MCEIS.FLIGHTS VALUES
('11/01/2002',6,'0412','ORL','ATL','C',2); INSERT INTO
MCEIS.FLIGHTS VALUES ('11/01/2002',6,'0591','ATL','PHX','F',3);
Here is a table of the information for reference:
FDATE
|
JOBN
|
FLTN
|
DPC
|
DSC
|
LEG
|
SEQ
|
11/01/2002
|
1
|
1321
|
JAX
|
ORL
|
F
|
1
|
11/01/2002
|
2
|
12
|
JAX
|
ATL
|
C
|
1
|
11/01/2002
|
2
|
12
|
ATL
|
SFX
|
C
|
2
|
11/01/2002
|
2
|
91
|
SFX
|
JPL
|
C
|
3
|
11/01/2002
|
2
|
456
|
JPL
|
ORL
|
F
|
4
|
11/01/2002
|
3
|
9876
|
ATL
|
MIA
|
C
|
1
|
11/01/2002
|
3
|
1132
|
MIA
|
ORL
|
C
|
2
|
11/01/2002
|
3
|
1132
|
ORL
|
JAX
|
C
|
3
|
11/01/2002
|
3
|
1132
|
JAX
|
ATL
|
F
|
4
|
11/01/2002
|
4
|
3213
|
IAD
|
LAX
|
F
|
1
|
11/01/2002
|
5
|
3320
|
IAD
|
BWI
|
C
|
1
|
11/01/2002
|
5
|
5555
|
BWI
|
ORD
|
C
|
2
|
11/01/2002
|
5
|
1254
|
ORD
|
DFW
|
C
|
3
|
11/01/2002
|
5
|
3213
|
DFW
|
LAX
|
F
|
4
|
11/01/2002
|
6
|
212
|
JAX
|
ORL
|
C
|
1
|
11/01/2002
|
6
|
412
|
ORL
|
ATL
|
C
|
2
|
11/01/2002
|
6
|
591
|
ATL
|
PHX
|
F
|
3
|
So, at this point we have a file called flights
filled with one days data about 6 jobs. First, lets construct a query
that finds all flights that originate from JAX and have a final
destination of ORL. To do this, I want to create two sets from the
table. The first set will
contain all flights that leave from JAX.
A query that returns all JAX departing flights would be:
SELECT FDATE, JOBN, DPC, DSC FROM
MCEIS.FLIGHTS WHERE DPC=’JAX’;
Fields:
FDATE
|
JOBN
|
DPC
|
DSC
|
11/01/2002
|
1
|
JAX
|
ORL
|
11/01/2002
|
2
|
JAX
|
ATL
|
11/01/2002
|
3
|
JAX
|
ATL
|
11/01/2002
|
6
|
JAX
|
ORL
|
The second set contains all flights that have a
final destination of ORL.
The query to final all flights that have a final destination of ORL
is:
SELECT FDATE, JOBN, DPC, DSC FROM
MCEIS.FLIGHTS WHERE DSC=’ORL’ AND LEG=’F’
Yields:
FDATE
|
JOBN
|
DPC
|
DSC
|
11/01/2002
|
1
|
JAX
|
ORL
|
11/01/2002
|
2
|
JPL
|
ORL
|
So, if you look at the results of query 1, we
have 4 flights that depart from JAX: 1,2,3 and 6.
Query 2 shows that we have two flights with final destinations
of ORL: 1 and 2. Now, we
want to join the results of Query 1 with Query 2.
The following statement accomplishes that:
SELECT A.FDATE, A.JOBN,
A.FLTN AS ORIGIN_FN, B.FLTN AS DEST_FN,
A.DPC AS ORIGIN, B.DSC AS DEST FROM MCEIS.FLIGHTS AS A
INNER JOIN MCEIS.FLIGHTS AS B
ON (A.JOBN=B.JOBN) WHERE B.LEG='F' AND A.DPC='JAX' AND
B.DSC='ORL';
Yields:
FDATE
|
JOBN
|
ORIGIN_FN
|
DEST_FN
|
ORIGIN
|
DEST
|
11/01/2002
|
1
|
1321
|
1321
|
JAX
|
ORL
|
11/01/2002
|
2
|
12
|
456
|
JAX
|
ORL
|
Look at the FROM clause in the above statement
and notice that we are selecting from FLIGHTS and naming it A and then
joining to FLIGHTS and naming it B.
You can join a table to itself in SQL, as long as you give it a
unique name so you can tell SQL which columns you want.
Next, look at the ON clause of the join.
It is specifying that we want to join the sets when the A.JOBN
is equal to the B.JOBN. Finally the WHERE clause indicates we want
records from the A table when the DPC=’JAX’ and from the B table when
the DSC=’ORL’ and the LEG=’F’. This is a perfect example of joining a
table to itself to answer a query.
More Flights of Fancy
Now, lets suppose you want to find all flights
that go from JAX to ORL but ORL does not have to be the final leg of
the flight. You might
think that you could just drop the B.LEG = ‘F’ from the query.
However, look at the results that would generate:
FDATE
|
JOBN
|
ORIGIN_FN
|
DEST_FN
|
ORIGIN
|
DEST
|
11/01/2002
|
1
|
1321
|
1321
|
JAX
|
ORL
|
11/01/2002
|
2
|
12
|
456
|
JAX
|
ORL
|
11/01/2002
|
3
|
1132
|
1132
|
JAX
|
ORL
|
11/01/2002
|
6
|
212
|
212
|
JAX
|
ORL
|
Take a look at the route of flight 3.
It leaves ATL and goes to MIA, then MIA to ORL. It then leaves
ORL and goes to JAX and then leaves JAX and goes back to ATL.
This flight never departs JAX and arrives at ORL.
In order to ensure that you see only flights departing JAX and
arriving ORL, you need to execute the following query:
SELECT A.FDATE, A.JOBN,
A.FLTN AS ORIGIN_FN, B.FLTN AS DEST_FN,
A.DPC AS ORIGIN, B.DSC AS DEST FROM MCEIS.FLIGHTS AS A
INNER JOIN MCEIS.FLIGHTS AS B
ON (A.JOBN=B.JOBN) WHERE A.DPC='JAX' AND B.DSC='ORL' AND
A.SEQ<=B.SEQ;
Yields:
FDATE
|
JOBN
|
ORIGIN_FN
|
DEST_FN
|
ORIGIN
|
DEST
|
11/01/2002
|
1
|
1321
|
1321
|
JAX
|
ORL
|
11/01/2002
|
2
|
12
|
456
|
JAX
|
ORL
|
11/01/2002
|
6
|
212
|
212
|
JAX
|
ORL
|
Because we quantified that the A.SEQ should be
less than or equal to the B.SEQ we have eliminated flight 3 from the
result set. Remember, the
SEQ field tells the sequence of the leg. Now, if we wanted to find
only direct flights from JAX to ORL, we simply change the above query
to be A.SEQ=B.SEQ. This would result in flights 1 and 6, which are the
only flights that can take us from JAX to ORL with no hops in between.
Retrieving the routes
OK, what if we wanted to find the routes for all
flights that depart JAX and arrive ORL. The IN clause allows us to
search for records that have a value of a field in a set of values.
The set of values can be an expression or it can be the results
of an SQL statement. The
following statement lists the routes of all flights that leave JAX and
arrive, (at some point), in ORL:
SELECT * FROM MCEIS.FLIGHTS WHERE JOBN IN
(SELECT A.JOBN FROM MCEIS.FLIGHTS AS A INNER JOIN
MCEIS.FLIGHTS AS B
ON (A.JOBN=B.JOBN) WHERE A.DPC='JAX' AND B.DSC='ORL' AND
A.SEQ<=B.SEQ) ORDER BY JOBN, SEQ;
SQL will execute the query inside the IN clause
to get a set of job numbers. It will then join the set of job numbers
to the FLIGHTS table and sequence that result set by the JOBN and SEQ
fields. The results look like this:
FDATE
|
JOBN
|
FLTN
|
DPC
|
DSC
|
LEG
|
SEQ
|
11/01/2002
|
1
|
1321
|
JAX
|
ORL
|
F
|
1
|
11/01/2002
|
2
|
12
|
JAX
|
ATL
|
C
|
1
|
11/01/2002
|
2
|
12
|
ATL
|
SFX
|
C
|
2
|
11/01/2002
|
2
|
91
|
SFX
|
JPL
|
C
|
3
|
11/01/2002
|
2
|
456
|
JPL
|
ORL
|
F
|
4
|
11/01/2002
|
6
|
212
|
JAX
|
ORL
|
C
|
1
|
11/01/2002
|
6
|
412
|
ORL
|
ATL
|
C
|
2
|
11/01/2002
|
6
|
591
|
ATL
|
PHX
|
F
|
3
|
Calculating Hops
Finally, here is one other interesting piece of
information you can gather when looking at the set, the number of
stops required before you reach the destination. To calculate this,
simply subtract the B.SEQ from the A.SEQ like the following query:
SELECT A.FDATE, A.JOBN,
A.FLTN AS ORIGIN_FN, B.FLTN AS DEST_FN,
A.DPC AS ORIGIN, B.DSC AS DEST,
B.SEQ-A.SEQ AS STOPS FROM MCEIS.FLIGHTS AS A
INNER JOIN MCEIS.FLIGHTS AS B
ON (A.JOBN=B.JOBN) WHERE A.DPC='JAX' AND B.DSC='ORL' AND
A.SEQ<=B.SEQ;
Yields:
FDATE
|
JOBN
|
ORIGIN_FN
|
DEST_FN
|
ORIGIN
|
DEST
|
STOPS
|
11/01/2002
|
1
|
1321
|
1321
|
JAX
|
ORL
|
0
|
11/01/2002
|
2
|
12
|
456
|
JAX
|
ORL
|
3
|
11/01/2002
|
6
|
212
|
212
|
JAX
|
ORL
|
0
|
And this one shows all flights departure cities
and final destination cities and the number of stops on the flight:
SELECT A.JOBN, A.DPC AS ORIGIN,
B.DSC AS DEST, B.SEQ-A.SEQ AS STOPS FROM MCEIS.FLIGHTS AS A
INNER JOIN MCEIS.FLIGHTS AS B
ON (A.JOBN=B.JOBN) WHERE A.SEQ=1 AND B.LEG='F';
Yields:
JOBN
|
ORIGIN
|
DEST
|
STOPS
|
1
|
JAX
|
ORL
|
0
|
2
|
JAX
|
ORL
|
3
|
3
|
ATL
|
ATL
|
3
|
4
|
IAD
|
LAX
|
0
|
5
|
IAD
|
LAX
|
3
|
6
|
JAX
|
PHX
|
2
|
The A set includes only records that have a SEQ =
1, (start of JOB), and the B set includes only records that have a LEG
of ‘F’, (final destination).
Alex, thanks for a good question that
illuminates the power of self joins.
I think that most of the readers will be able to find ways to
use this technique on their own data.
|
|
|
|