[Puzzles] => Simple SQL Query Examples

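This is the second post in the [SQL.Puzzles] series. The SQL here is all fairly simple, but on the whole the book has some depth: unlike many entry-level SQL books, whose examples are trivial, it at least considers problems you actually meet at work. This installment is still on the easy side. The points worth attention are deleting rows that are referenced by a foreign key and accounting for working days, both very practical concerns. After this post I will only repost from [SQL.Puzzles] when I come across SQL that looks interesting, and skip anything too simple.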

2. ABSENTEES

This problem was presented on the MS ACCESS forum on CompuServe by Jim Chupella. He wanted to create a database that tracks employee absentee rates. Here is the table you will use:

    CREATE TABLE Absenteeism
    (emp_id INTEGER NOT NULL REFERENCES Personnel (emp_id),
     absent_date DATE NOT NULL,
     reason_code CHAR (40) NOT NULL REFERENCES ExcuseList(reason_code),
     severity_points INTEGER NOT NULL CHECK (severity_points BETWEEN 1 AND 4),
     PRIMARY KEY (emp_id, absent_date));

An employee ID number identifies each employee. The reason_code is a short text explanation for the absence (for example, “hit by beer truck,” “bad hair day,” and so on) that you pull from an ever-growing and imaginative list, and severity point is a point system that scores the penalty associated with the absence.

If an employee accrues 40 severity points within a one-year period, you automatically discharge that employee. If an employee is absent more than one day in a row, it is charged as a long-term illness, not as a typical absence. The employee does not receive severity points on the second, third, or later days, nor do those days count toward his or her total absenteeism.

Your job is to write SQL to enforce these two business rules, changing the schema if necessary.

Answer #1

Looking at the first rule on discharging personnel, the most common design error is to try to drop the second, third, and later days from the table. This approach messes up queries that count sick days, and makes chains of sick days very difficult to find.

The trick is to allow a severity score of zero, so you can track the long-term illness of an employee in the Absenteeism table. Simply change the severity point declaration to “CHECK (severity_points BETWEEN 0 AND 4)” so that you can give a zero to those absences that do not count.

This is a trick newbies miss because storing a zero seems to be a waste of space, but zero is a number and the event is a fact that needs to be noted.

    UPDATE Absenteeism
       SET severity_points = 0,
           reason_code = 'long term illness'
     WHERE EXISTS
           (SELECT *
              FROM Absenteeism AS A2
             WHERE Absenteeism.emp_id = A2.emp_id
               AND A2.absent_date = (Absenteeism.absent_date - INTERVAL '1' DAY));

Run after a new row is inserted, this update looks for another absence by the same employee on the day before and, if it finds one, changes the later row's severity point score and reason_code in accordance with the long-term illness rule.
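
As a rough check of the idea, here is a minimal sketch that runs the long-term illness update against an in-memory SQLite database from Python. It makes several assumptions that are not in the original: the foreign keys to Personnel and ExcuseList are left out, dates are stored as ISO-8601 strings, SQLite's `date(x, '-1 day')` stands in for the interval arithmetic, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL,
 absent_date TEXT NOT NULL,          -- ISO-8601 date string (assumption)
 reason_code CHAR(40) NOT NULL,
 severity_points INTEGER NOT NULL
   CHECK (severity_points BETWEEN 0 AND 4),   -- zero is now allowed
 PRIMARY KEY (emp_id, absent_date));

-- Invented sample data: employee 1 is out three days in a row.
INSERT INTO Absenteeism VALUES
 (1, '2006-04-18', 'bad hair day', 2),
 (1, '2006-04-19', 'bad hair day', 2),
 (1, '2006-04-20', 'bad hair day', 2),
 (2, '2006-04-19', 'hit by beer truck', 4);
""")

# Zero out every absence that directly follows another absence
# by the same employee; the first day of a chain keeps its points.
conn.execute("""
UPDATE Absenteeism
   SET severity_points = 0,
       reason_code = 'long term illness'
 WHERE EXISTS
       (SELECT *
          FROM Absenteeism AS A2
         WHERE A2.emp_id = Absenteeism.emp_id
           AND A2.absent_date = date(Absenteeism.absent_date, '-1 day'))
""")

rows = conn.execute(
    "SELECT emp_id, absent_date, severity_points FROM Absenteeism "
    "ORDER BY emp_id, absent_date").fetchall()
for row in rows:
    print(row)
```

Only the second and third days of employee 1's chain are set to zero; the single-day absence of employee 2 is untouched.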

The rule for discharging an employee requires that you know what his or her current point score is. You would write that query as follows:

    SELECT emp_id, SUM(severity_points)
      FROM Absenteeism
     GROUP BY emp_id;

This is the basis for a grouped subquery in the DELETE statement you finally want. For personnel with fewer than 40 points the subquery returns no row, which the comparison treats as a NULL, and the test will fail.

    DELETE FROM Personnel
     WHERE emp_id = (SELECT A1.emp_id
                       FROM Absenteeism AS A1
                      WHERE A1.emp_id = Personnel.emp_id
                      GROUP BY A1.emp_id
                     HAVING SUM(severity_points) >= 40);

The GROUP BY clause is not really needed in SQL-92, but some older SQL implementations will require it.
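
The behaviour of the empty subquery can be tried out with a small sketch, again using SQLite from Python. The table definitions are cut down to the columns the DELETE needs, and the employees and point totals are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel (emp_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL,
 absent_date TEXT NOT NULL,
 severity_points INTEGER NOT NULL,
 PRIMARY KEY (emp_id, absent_date));
INSERT INTO Personnel VALUES (1), (2), (3);
""")

# Invented data: employee 1 reaches 40 points, employee 2 has 12,
# employee 3 has no absences at all.
absences = [(1, f"2006-03-{day:02d}", 4) for day in range(1, 11)]
absences += [(2, f"2006-03-{day:02d}", 4) for day in range(1, 4)]
conn.executemany("INSERT INTO Absenteeism VALUES (?, ?, ?)", absences)

# The scalar subquery yields emp_id only when the total is 40 or more;
# otherwise it yields no row, the comparison is not true, and the
# Personnel row survives.
conn.execute("""
DELETE FROM Personnel
 WHERE emp_id = (SELECT A1.emp_id
                   FROM Absenteeism AS A1
                  WHERE A1.emp_id = Personnel.emp_id
                  GROUP BY A1.emp_id
                 HAVING SUM(A1.severity_points) >= 40)
""")

remaining = [r[0] for r in
             conn.execute("SELECT emp_id FROM Personnel ORDER BY emp_id")]
print(remaining)
```

Employee 3, who has no Absenteeism rows, is kept for the same reason as employee 2: the subquery produces nothing to compare against.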

Answer #2

Bert Scalzo, a senior instructor for Oracle Corporation, pointed out that the puzzle solution had two flaws and room for performance improvements. The flaws are quite simple. First, the subquery does not check for personnel accruing 40 or more severity points within a one-year period, as required. It requires the addition of a date range check in the WHERE clause:

    DELETE FROM Personnel
     WHERE emp_id = (SELECT A1.emp_id
                       FROM Absenteeism AS A1
                      WHERE A1.emp_id = Personnel.emp_id
                        AND A1.absent_date
                            BETWEEN CURRENT_TIMESTAMP - INTERVAL '365' DAY
                                AND CURRENT_TIMESTAMP
                      GROUP BY A1.emp_id
                     HAVING SUM(A1.severity_points) >= 40);

Second, this SQL code deletes only offending personnel and not their absences. The related Absenteeism rows must be either explicitly or implicitly deleted as well. You could replicate the above deletion for the Absenteeism table. However, the best solution is to add a cascading deletion clause to the Absenteeism table declaration:

    CREATE TABLE Absenteeism
    ( ... emp_id INTEGER NOT NULL
      REFERENCES Personnel(emp_id)
     ON DELETE CASCADE,
     ...);
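
The effect of the cascade can be seen in a short sketch. It uses SQLite from Python, which is an assumption of this example rather than anything in the book; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`. The tables are trimmed to the relevant columns and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
CREATE TABLE Personnel (emp_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL
   REFERENCES Personnel (emp_id)
   ON DELETE CASCADE,
 absent_date TEXT NOT NULL,
 severity_points INTEGER NOT NULL,
 PRIMARY KEY (emp_id, absent_date));
INSERT INTO Personnel VALUES (1), (2);
INSERT INTO Absenteeism VALUES
 (1, '2006-04-18', 4),
 (1, '2006-04-20', 4),
 (2, '2006-04-19', 1);
""")

# Deleting the employee removes his or her absences implicitly.
conn.execute("DELETE FROM Personnel WHERE emp_id = 1")

remaining = conn.execute(
    "SELECT emp_id, absent_date FROM Absenteeism").fetchall()
print(remaining)
```

No second DELETE against Absenteeism is needed; the referential action does the cleanup.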

The performance suggestions are based on some assumptions. If you can safely assume that the UPDATE is run regularly and people do not change their departments while they are absent, then you can improve the UPDATE command’s subquery:

    UPDATE Absenteeism AS A1
       SET severity_points = 0,
           reason_code = 'long term illness'
     WHERE EXISTS
           (SELECT *
              FROM Absenteeism AS A2
             WHERE A1.emp_id = A2.emp_id
               AND (A1.absent_date - INTERVAL '1' DAY) = A2.absent_date);

There is still a problem with long-term illnesses that span weeks. The current situation is that if you want to spend your weekends being sick, that is fine with the company. This is not a very nice place to work. If an employee reports in absent on Friday of week number 1, all of week number 2, and just Monday of week number 3, the UPDATE will catch only the five days from week number 2 as long-term illness. The Friday and Monday will show up as sick days with severity points. The subquery in the UPDATE requires additional changes to the missed-date chaining.

I would avoid problems with weekends by having a code for scheduled days off (weekends, holidays, vacation, and so forth) that carry a severity point of zero. A business that has people working weekend shifts would need such codes.

The boss could manually change the Saturday and Sunday “weekend” codes to “long-term illness” to get the UPDATE to work the way you described. This same trick would also prevent you from losing scheduled vacation time if you got the plague just before going on a cruise. If the boss is a real sweetheart, he or she could also add compensation days for the lost weekends with a zero severity point to the table, or reschedule an employee’s vacation by adding absences dated in the future.

While I agreed that I left out the aging on the dates missed, I will argue that it would be better to have another DELETE statement that removes the year-old rows from the Absenteeism table, to keep the size of the table as small as possible.
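
A purge of that kind might look like the following sketch. It is written for SQLite from Python; the `today` value is a fixed, hypothetical reference date so the result is reproducible, and `date(:today, '-1 year')` is SQLite's spelling of the one-year offset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL,
 absent_date TEXT NOT NULL,          -- ISO-8601 date string (assumption)
 severity_points INTEGER NOT NULL,
 PRIMARY KEY (emp_id, absent_date));
-- Invented rows on either side of the one-year cutoff.
INSERT INTO Absenteeism VALUES
 (1, '2005-04-23', 4),
 (1, '2005-04-24', 4),
 (1, '2006-01-10', 4);
""")

today = "2006-04-24"   # hypothetical "current date" for the example

# Remove every absence that is more than one year old.
conn.execute(
    "DELETE FROM Absenteeism WHERE absent_date < date(:today, '-1 year')",
    {"today": today})

kept = [r[0] for r in conn.execute(
    "SELECT absent_date FROM Absenteeism ORDER BY absent_date")]
print(kept)
```

With old rows gone, the discharge query no longer needs its own date range test, at the cost of losing the absence history.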

The expression

    (BETWEEN CURRENT_TIMESTAMP - INTERVAL '365' DAY AND
     CURRENT_TIMESTAMP)

could also be

    (BETWEEN CURRENT_TIMESTAMP - INTERVAL '1' YEAR AND
     CURRENT_TIMESTAMP),

so the system would handle leap years. Better yet, DB2 and some other SQL products have an AGE(date1) function, which returns the age in years of something that happened on the date parameter. You would then write (AGE(absent_date) >= 1) instead.

Answer #3

Another useful tool for this kind of problem is a Calendar table, which has the working days that can count against the employee. In the 10 years since this book was first written, this has become a customary SQL programming practice.

    SELECT A.emp_id,
           SUM(A.severity_points) AS absenteeism_score
      FROM Absenteeism AS A, Calendar AS C1
     WHERE C1.cal_date = A.absent_date
       AND A.absent_date
           BETWEEN CURRENT_TIMESTAMP - INTERVAL '365' DAY
               AND CURRENT_TIMESTAMP
       AND C1.date_type = 'work'
     GROUP BY A.emp_id
    HAVING SUM(A.severity_points) >= 40;
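
To see what the Calendar join changes, here is a minimal sketch in SQLite from Python. The Calendar layout (`cal_date`, `date_type`) follows the query above, but the rule used to fill it (Monday to Friday is 'work', everything else 'weekend'), the fixed `today` value, and all sample rows are assumptions made for the example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calendar
(cal_date TEXT NOT NULL PRIMARY KEY,
 date_type CHAR(10) NOT NULL);        -- 'work' or 'weekend' in this sketch
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL,
 absent_date TEXT NOT NULL,
 severity_points INTEGER NOT NULL,
 PRIMARY KEY (emp_id, absent_date));
""")

# Fill the calendar for 2006-01-01 through 2006-04-30 (120 days).
for offset in range(120):
    d = date(2006, 1, 1) + timedelta(days=offset)
    kind = "work" if d.weekday() < 5 else "weekend"
    conn.execute("INSERT INTO Calendar VALUES (?, ?)", (d.isoformat(), kind))

# Ten consecutive Mondays starting 2006-01-02, and the Saturday
# after the last of them.
mondays = [date(2006, 1, 2) + timedelta(weeks=i) for i in range(10)]
saturday = mondays[-1] + timedelta(days=5)

# Employee 1: 10 workday absences x 4 points = 40 points on workdays.
# Employee 2: 9 workday absences (36 points) plus 4 points on a Saturday.
absences = [(1, d.isoformat(), 4) for d in mondays]
absences += [(2, d.isoformat(), 4) for d in mondays[:9]]
absences.append((2, saturday.isoformat(), 4))
conn.executemany("INSERT INTO Absenteeism VALUES (?, ?, ?)", absences)

today = "2006-04-24"   # hypothetical "current date" for the example
rows = conn.execute("""
SELECT A.emp_id, SUM(A.severity_points) AS absenteeism_score
  FROM Absenteeism AS A, Calendar AS C1
 WHERE C1.cal_date = A.absent_date
   AND A.absent_date BETWEEN date(:today, '-365 days') AND :today
   AND C1.date_type = 'work'
 GROUP BY A.emp_id
HAVING SUM(A.severity_points) >= 40
""", {"today": today}).fetchall()
print(rows)
```

Employee 2 has 40 points in the raw table, but only 36 of them fall on working days, so only employee 1 is reported.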

Some people will also have a column in the Calendar table that Julianizes the working days. Holidays and weekends would carry the same Julian number as the preceding workday. For example:

    (cal_date, Julian_workday):
    ('2006-04-21', 42) -- Friday
    ('2006-04-22', 42) -- Saturday
    ('2006-04-23', 42) -- Sunday
    ('2006-04-24', 43) -- Monday

You do the math from the current date’s Julian workday number to find the start of their adjusted one-year period.
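
The same workday numbers could also help with the weekend problem raised in Answer #2, because a Friday and the following Monday differ by exactly one. The sketch below is an illustration of that idea and not something the book spells out. It uses SQLite from Python, the column name `julian_workday` and the sample rows are assumptions, and the starting value 37 is arbitrary, picked only so that the numbers match the listing above.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calendar
(cal_date TEXT NOT NULL PRIMARY KEY,
 julian_workday INTEGER NOT NULL);
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL,
 absent_date TEXT NOT NULL,
 severity_points INTEGER NOT NULL,
 PRIMARY KEY (emp_id, absent_date));
""")

# Number two weeks starting Monday 2006-04-17. Weekend days do not
# advance the counter, so they repeat the preceding Friday's number.
workday_no = 37
for offset in range(14):
    d = date(2006, 4, 17) + timedelta(days=offset)
    if d.weekday() < 5:               # Monday..Friday
        workday_no += 1
    conn.execute("INSERT INTO Calendar VALUES (?, ?)",
                 (d.isoformat(), workday_no))

# Invented data: employee 1 is out Friday and the next Monday;
# employee 2 is out Wednesday and Friday of the same week.
conn.executemany("INSERT INTO Absenteeism VALUES (?, ?, ?)", [
    (1, "2006-04-21", 2), (1, "2006-04-24", 2),
    (2, "2006-04-19", 2), (2, "2006-04-21", 2),
])

# An absence continues a chain when the same employee was also absent
# on the previous working day, weekends notwithstanding.
chained = conn.execute("""
SELECT A1.emp_id, A1.absent_date
  FROM Absenteeism AS A1, Calendar AS C1
 WHERE C1.cal_date = A1.absent_date
   AND EXISTS
       (SELECT *
          FROM Absenteeism AS A2, Calendar AS C2
         WHERE C2.cal_date = A2.absent_date
           AND A2.emp_id = A1.emp_id
           AND C2.julian_workday = C1.julian_workday - 1)
 ORDER BY A1.emp_id, A1.absent_date
""").fetchall()
print(chained)

numbers = conn.execute(
    "SELECT julian_workday FROM Calendar "
    "WHERE cal_date BETWEEN '2006-04-21' AND '2006-04-24' "
    "ORDER BY cal_date").fetchall()
```

Only employee 1's Monday is flagged as a continuation; employee 2's Friday is not, because Thursday was worked in between.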

-The End-
