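[Puzzles] => Simple SQL Query Examples
This is the second post from [SQL.Puzzles]. It is all fairly simple SQL, but taken as a whole the book has some depth. Unlike some other introductory SQL books, whose examples are feeble, it at least considers many of the problems you run into in real work. This installment is still on the easy side; the main things to watch are foreign-key deletion and the handling of working days, which are very practical issues. This is the last piece I will repost from [SQL.Puzzles] for now. After this I will only repost SQL that catches my eye, and skip anything too simple.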
2. ABSENTEES
This problem was presented on the MS ACCESS forum on CompuServe
by Jim Chupella. He wanted to create a database that tracks employee
absentee rates. Here is the table you will use:
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL REFERENCES Personnel (emp_id),
absent_date DATE NOT NULL,
reason_code CHAR (40) NOT NULL REFERENCES ExcuseList(reason_code),
severity_points INTEGER NOT NULL CHECK (severity_points BETWEEN 1 AND 4),
PRIMARY KEY (emp_id, absent_date));
An employee ID number identifies each employee. The reason_code
is a short text explanation for the absence (for example, “hit by beer
truck,” “bad hair day,” and so on) that you pull from an ever-growing
and imaginative list, and severity point is a point system that scores the
penalty associated with the absence.
If an employee accrues 40 severity points within a one-year period,
you automatically discharge that employee. If an employee is absent
more than one day in a row, it is charged as a long-term illness, not as a
typical absence. The employee does not receive severity points on the
second, third, or later days, nor do those days count toward his or her
total absenteeism.
Your job is to write SQL to enforce these two business rules, changing
the schema if necessary.
Answer #1
Looking first at the rule on long-term illness, the most common
design error is to try to drop the second, third, and later days from the
table. This approach messes up queries that count sick days, and makes chains of sick days very difficult to find.
The trick is to allow a severity score of zero, so you can track the long-term illness of an employee in the Absenteeism table. Simply change the
severity point declaration to “CHECK (severity_points BETWEEN 0 AND 4)” so that you can give a zero to those absences that do not count.
This is a trick newbies miss because storing a zero seems to be a waste of space, but zero is a number and the event is a fact that needs to be noted.
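-- Zero out the second and later days of a run of consecutive absences.
-- 'long term illness' must already exist in ExcuseList.
UPDATE Absenteeism
SET severity_points = 0,
reason_code = 'long term illness'
WHERE EXISTS
(SELECT *
FROM Absenteeism AS A2
WHERE Absenteeism.emp_id = A2.emp_id
AND Absenteeism.absent_date = (A2.absent_date + INTERVAL 1 DAY));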
When a new row is inserted, this update looks for absences by the same
employee on adjacent days and changes the severity point score and
reason_code of the days that do not count, in accordance with the long-term illness rule.
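To see the zero-score trick in action, here is a minimal sketch that runs the idea against an in-memory SQLite database from Python. Several things are assumptions made only so the example runs on its own: the SQLite dialect (its date() function stands in for the INTERVAL arithmetic), dates stored as ISO-8601 text, the sample rows, and the omission of the foreign keys. It zeroes the second and later days of a run, as the business rule states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL,
 absent_date TEXT NOT NULL,               -- ISO-8601 date string
 reason_code TEXT NOT NULL,
 severity_points INTEGER NOT NULL
   CHECK (severity_points BETWEEN 0 AND 4),  -- zero is now allowed
 PRIMARY KEY (emp_id, absent_date));

INSERT INTO Absenteeism VALUES
 (1, '2006-04-03', 'bad hair day', 2),       -- Monday, first day of a run
 (1, '2006-04-04', 'bad hair day', 2),       -- Tuesday
 (1, '2006-04-05', 'bad hair day', 2),       -- Wednesday
 (1, '2006-04-10', 'hit by beer truck', 4);  -- isolated absence
""")

# A row is zeroed when the same employee was also absent the day before.
conn.execute("""
UPDATE Absenteeism
   SET severity_points = 0,
       reason_code = 'long term illness'
 WHERE EXISTS
       (SELECT *
          FROM Absenteeism AS A2
         WHERE A2.emp_id = Absenteeism.emp_id
           AND A2.absent_date = date(Absenteeism.absent_date, '-1 day'))
""")

rows = conn.execute(
    "SELECT absent_date, severity_points FROM Absenteeism ORDER BY absent_date"
).fetchall()
for row in rows:
    print(row)
```

Only the first day of the three-day run and the isolated absence keep their points; the other two days stay in the table with a score of zero, so the chain of sick days is still easy to find.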
The second rule for firing an employee requires that you know what
his or her current point score is. You would write that query as follows:
SELECT emp_id, SUM(severity_points)
FROM Absenteeism
GROUP BY emp_id;
This is the basis for a grouped subquery in the DELETE statement
you finally want. For personnel with fewer than 40 points, the subquery
returns an empty result, which is treated as NULL, so the test fails and the row is kept.
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
The GROUP BY clause is not really needed in SQL-92, but some older
SQL implementations will require it.
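The way the empty subquery result protects employees below the threshold is easy to check. The following is only a sketch under assumptions: it uses SQLite through Python, invented sample data, and trimmed-down tables with no foreign keys, so that the DELETE can run without cascading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel (emp_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL,
 absent_date TEXT NOT NULL,
 severity_points INTEGER NOT NULL,
 PRIMARY KEY (emp_id, absent_date));
INSERT INTO Personnel VALUES (1), (2), (3);
""")

# Employee 1 accrues 10 x 4 = 40 points, employee 2 only 3, employee 3 none.
absences = [(1, f"2006-01-{day:02d}", 4) for day in range(2, 21, 2)]
absences.append((2, "2006-02-01", 3))
conn.executemany("INSERT INTO Absenteeism VALUES (?, ?, ?)", absences)

# For employees 2 and 3 the scalar subquery yields no row, i.e. NULL,
# so "emp_id = NULL" is not true and those rows survive.
conn.execute("""
DELETE FROM Personnel
 WHERE emp_id = (SELECT A1.emp_id
                   FROM Absenteeism AS A1
                  WHERE A1.emp_id = Personnel.emp_id
                  GROUP BY A1.emp_id
                 HAVING SUM(A1.severity_points) >= 40)
""")

remaining = [r[0] for r in conn.execute(
    "SELECT emp_id FROM Personnel ORDER BY emp_id")]
print(remaining)
```

Employee 3, who has no absences at all, is kept for the same reason as employee 2: the subquery has nothing to return.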
Answer #2
Bert Scalzo, a senior instructor for Oracle Corporation, pointed out that the puzzle solution had two flaws and room for performance
improvements. The flaws are quite simple. First, the subquery does not check for personnel accruing 40 or more severity points within a one-year period,
as required. It requires the addition of a date range check in the WHERE
clause:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
AND absent_date
BETWEEN CURRENT_TIMESTAMP - INTERVAL 365 DAYS
AND CURRENT_TIMESTAMP
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
Second, this SQL code deletes only offending personnel and not their
absences. The related Absenteeism rows must be either explicitly or
implicitly deleted as well; as declared, the REFERENCES constraint would reject the deletion of any employee who still has absences on file. You could replicate the above deletion for the
Absenteeism table. However, the best solution is to add a cascading
deletion clause to the Absenteeism table declaration:
CREATE TABLE Absenteeism
( ... emp_id INTEGER NOT NULL
REFERENCES Personnel(emp_id)
ON DELETE CASCADE,
...);
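Both of these fixes, the one-year window and the cascading delete, can be tried together. This is a sketch, not the book's code. It assumes SQLite (where foreign-key enforcement has to be switched on with a PRAGMA), a fixed hypothetical "current date" named as_of in place of CURRENT_TIMESTAMP so the result is repeatable, and invented sample rows.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Personnel (emp_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL
   REFERENCES Personnel (emp_id)
   ON DELETE CASCADE,
 absent_date TEXT NOT NULL,
 severity_points INTEGER NOT NULL,
 PRIMARY KEY (emp_id, absent_date));
INSERT INTO Personnel VALUES (1), (2);
""")

as_of = date(2006, 4, 24)  # hypothetical "today"
# Employee 1: 40 points inside the last year. Employee 2: 40 points, all older.
recent = [(1, (as_of - timedelta(days=2 * k)).isoformat(), 4) for k in range(1, 11)]
stale = [(2, (as_of - timedelta(days=400 + 2 * k)).isoformat(), 4) for k in range(1, 11)]
conn.executemany("INSERT INTO Absenteeism VALUES (?, ?, ?)", recent + stale)

conn.execute("""
DELETE FROM Personnel
 WHERE emp_id = (SELECT A1.emp_id
                   FROM Absenteeism AS A1
                  WHERE A1.emp_id = Personnel.emp_id
                    AND A1.absent_date BETWEEN date(:as_of, '-365 days')
                                           AND :as_of
                  GROUP BY A1.emp_id
                 HAVING SUM(A1.severity_points) >= 40)
""", {"as_of": as_of.isoformat()})

remaining = [r[0] for r in conn.execute("SELECT emp_id FROM Personnel")]
left = conn.execute(
    "SELECT emp_id, COUNT(*) FROM Absenteeism GROUP BY emp_id").fetchall()
print(remaining, left)
```

Employee 2 survives because the old absences fall outside the window, and the cascade removes employee 1's absence rows along with the Personnel row.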
The performance suggestions are based on some assumptions. If you
can safely assume that the UPDATE is run regularly and people do not
change their departments while they are absent, then you can improve
the UPDATE command’s subquery:
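-- Same rule with an explicit alias on the updated table:
-- a row is zeroed when the same employee was also absent the day before.
UPDATE Absenteeism AS A1
SET severity_points = 0,
reason_code = 'long term illness'
WHERE EXISTS
(SELECT *
FROM Absenteeism AS A2
WHERE A1.emp_id = A2.emp_id
AND A1.absent_date = (A2.absent_date + INTERVAL 1 DAY));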
There is still a problem with long-term illnesses that span weeks. The current situation is that if you want to spend your weekends being sick, that is fine with the company. This is not a very nice place to work. If an employee reports in absent on Friday of week number 1, all of week number 2, and just Monday of week number 3, the UPDATE will catch only the five days from week number 2 as long-term illness. The Friday and Monday will show up as sick days with severity points. The subquery in the UPDATE requires additional changes to the missed-date chaining.
I would avoid problems with weekends by having a code for
scheduled days off (weekends, holidays, vacation, and so forth) that
carry a severity point of zero. A business that has people working
weekend shifts would need such codes.
The boss could manually change the Saturday and Sunday “weekend”
codes to “long-term illness” to get the UPDATE to work the way you
described. This same trick would also prevent you from losing scheduled
vacation time if you got the plague just before going on a cruise. If the
boss is a real sweetheart, he or she could also add compensation days for
the lost weekends with a zero severity point to the table, or reschedule an
employee’s vacation by adding absences dated in the future.
While I agreed that I left out the aging on the dates missed, I will
argue that it would be better to have another DELETE statement that
removes the year-old rows from the Absenteeism table, to keep the size
of the table as small as possible.
The expression
(BETWEEN CURRENT_TIMESTAMP - INTERVAL 365 DAYS AND
CURRENT_TIMESTAMP)
could also be
(BETWEEN CURRENT_TIMESTAMP - INTERVAL 1 YEAR AND
CURRENT_TIMESTAMP),
so the system would handle leap years. Better yet, DB2 and some other
SQL products have an AGE(date1) function, which returns the age in
years of something that happened on the date parameter. You would
then write (AGE(absent_date) >= 1) to pick out the year-old rows to purge, or (AGE(absent_date) < 1) for the one-year window.
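The difference between the two interval expressions only shows up around a leap day. A small sketch makes it concrete, assuming SQLite's date() modifiers ('-365 days' and '-1 year') as stand-ins for the INTERVAL syntax, and two hypothetical dates chosen to straddle 29 February 2024.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
as_of = "2024-03-01"   # hypothetical "current date", just after a leap day
absent = "2023-03-01"  # an absence exactly one calendar year earlier

# Start of the window under each definition.
days_start, year_start = conn.execute(
    "SELECT date(:as_of, '-365 days'), date(:as_of, '-1 year')",
    {"as_of": as_of},
).fetchone()

# Does the absence fall inside each window? SQLite returns 1 or 0.
in_365_days, in_one_year = conn.execute(
    """
    SELECT :absent BETWEEN date(:as_of, '-365 days') AND :as_of,
           :absent BETWEEN date(:as_of, '-1 year') AND :as_of
    """,
    {"as_of": as_of, "absent": absent},
).fetchone()

print(days_start, year_start, in_365_days, in_one_year)
```

With the 365-day version the window starts on 2023-03-02, so an absence from 2023-03-01 has already aged out; with the one-year version it still counts.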
Answer #3
Another useful tool for this kind of problem is a Calendar table, which has the working days that can count against the employee. In the 10 years since this book was first written, this has become a customary SQL programming practice.
SELECT A.emp_id,
SUM(A.severity_points) AS absenteeism_score
FROM Absenteeism AS A, Calendar AS C
WHERE C.cal_date = A.absent_date
AND A.absent_date
BETWEEN CURRENT_TIMESTAMP - INTERVAL 365 DAYS
AND CURRENT_TIMESTAMP
AND C.date_type = 'work'
GROUP BY A.emp_id
HAVING SUM(A.severity_points) >= 40;
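Here is a small sketch of how such a Calendar table might be filled and joined. The table layout (cal_date, date_type with the values 'work' and 'off'), the April 2006 sample data, and the fixed as_of date are all assumptions for illustration. The HAVING clause is left out so that the score itself is visible on a tiny sample.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calendar
(cal_date TEXT NOT NULL PRIMARY KEY,
 date_type TEXT NOT NULL CHECK (date_type IN ('work', 'off')));
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL,
 absent_date TEXT NOT NULL,
 severity_points INTEGER NOT NULL,
 PRIMARY KEY (emp_id, absent_date));
""")

# Fill the calendar for April 2006; Saturdays and Sundays are 'off'.
day = date(2006, 4, 1)
while day.month == 4:
    kind = "off" if day.weekday() >= 5 else "work"
    conn.execute("INSERT INTO Calendar VALUES (?, ?)", (day.isoformat(), kind))
    day += timedelta(days=1)

conn.executemany("INSERT INTO Absenteeism VALUES (?, ?, ?)", [
    (1, "2006-04-21", 4),  # Friday, counts
    (1, "2006-04-22", 4),  # Saturday, filtered out by the calendar
    (1, "2006-04-24", 3),  # Monday, counts
])

scores = conn.execute("""
SELECT A.emp_id, SUM(A.severity_points) AS absenteeism_score
  FROM Absenteeism AS A, Calendar AS C
 WHERE C.cal_date = A.absent_date
   AND C.date_type = 'work'
   AND A.absent_date BETWEEN date(:as_of, '-365 days') AND :as_of
 GROUP BY A.emp_id
""", {"as_of": "2006-04-30"}).fetchall()
print(scores)
```

The Saturday row stays in Absenteeism but never reaches the total, because the join keeps only dates that the calendar marks as working days.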
Some people will also have a column in the Calendar table that
Julianizes the working days. Holidays and weekends would carry the
same Julian number as the preceding workday. For example
(cal_date, Julian_workday):
('2006-04-21', 42) -- Friday
('2006-04-22', 42) -- Saturday
('2006-04-23', 42) -- Sunday
('2006-04-24', 43) -- Monday
You do the math from the current date’s Julian workday number to
find the start of their adjusted one-year period.
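The Julianized workday number also gives one possible way to chain absences across a weekend, since Friday and the following Monday differ by exactly one. The sketch below is an illustration under assumptions, not something the text prescribes: the julian_workday column name, numbering that starts at 1 on Monday 2006-04-03 (so the values differ from the 42 and 43 above), no holidays, and invented absences. Absences recorded on the weekend days themselves are not handled.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calendar
(cal_date TEXT NOT NULL PRIMARY KEY,
 julian_workday INTEGER NOT NULL);
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL,
 absent_date TEXT NOT NULL,
 reason_code TEXT NOT NULL,
 severity_points INTEGER NOT NULL,
 PRIMARY KEY (emp_id, absent_date));
""")

# Weekend days carry the number of the preceding workday.
workday_no = 0
day = date(2006, 4, 3)  # a Monday, so numbering starts on a workday
while day <= date(2006, 4, 30):
    if day.weekday() < 5:
        workday_no += 1
    conn.execute("INSERT INTO Calendar VALUES (?, ?)",
                 (day.isoformat(), workday_no))
    day += timedelta(days=1)

conn.executemany(
    "INSERT INTO Absenteeism VALUES (1, ?, 'bad hair day', ?)", [
        ("2006-04-13", 3),  # Thursday, isolated
        ("2006-04-21", 2),  # Friday, first day of a run
        ("2006-04-24", 2),  # Monday, next workday after that Friday
        ("2006-04-25", 2),  # Tuesday
    ])

# Zero a row when the same employee was absent on the previous workday.
conn.execute("""
UPDATE Absenteeism
   SET severity_points = 0,
       reason_code = 'long term illness'
 WHERE EXISTS
       (SELECT *
          FROM Absenteeism AS A2, Calendar AS C1, Calendar AS C2
         WHERE A2.emp_id = Absenteeism.emp_id
           AND C1.cal_date = Absenteeism.absent_date
           AND C2.cal_date = A2.absent_date
           AND C2.julian_workday = C1.julian_workday - 1)
""")

rows = conn.execute(
    "SELECT absent_date, severity_points FROM Absenteeism ORDER BY absent_date"
).fetchall()
print(rows)
```

The Monday and Tuesday are now treated as a continuation of the Friday absence, which plain date arithmetic would miss because of the weekend in between.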