Introduction to Data Mining: Exercise Solutions, Chapter 1

Introduction

  1. Discuss whether or not each of the following activities is a data mining
    task.

    (a) Dividing the customers of a company according to their gender.
    No. This is a simple database query.
    (b) Dividing the customers of a company according to their
    profitability.
    No. This is an accounting calculation, followed by the application
    of a threshold. However, predicting the profitability of a new
    customer would be data mining.
    (c) Computing the total sales of a company.
    No. Again, this is simple accounting.
    (d) Sorting a student database based on student identification
    numbers.
    No. Again, this is a simple database query.
    (e) Predicting the outcomes of tossing a (fair) pair of dice.
    No. Since the dice are fair, this is a probability calculation. If
    the dice were not fair, and we needed to estimate the probabilities
    of each outcome from the data, then this would be more like the
    problems considered by data mining. However, in this specific case,
    solutions to this problem were developed by mathematicians a long
    time ago, and thus, we wouldn't consider it to be data mining.
    (f) Predicting the future stock price of a company using historical
    records.
    Yes. We would attempt to create a model that can predict the
    continuous value of the stock price. This is an example of the
    area of data mining known as predictive modelling. We could use
    regression for this modelling, although researchers in many fields
    have developed a wide variety of techniques for predicting time
    series.
    (g) Monitoring the heart rate of a patient for abnormalities.
    Yes. We would build a model of the normal behavior of heart
    rate and raise an alarm when an unusual heart behavior occurred.
    This would involve the area of data mining known as anomaly
    detection. This could also be considered as a classification problem
    if we had examples of both normal and abnormal heart behavior.
    (h) Monitoring seismic waves for earthquake activities.
    Yes. In this case, we would build a model of different types of
    seismic wave behavior associated with earthquake activities and
    raise an alarm when one of these different types of seismic activity
    was observed. This is an example of the area of data mining
    known as classification.
    (i) Extracting the frequencies of a sound wave.
    No. This is signal processing.
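
The answer to item (g) describes building a model of normal heart-rate behavior and raising an alarm on unusual readings. A minimal sketch of that idea follows, assuming a simple z-score rule. The baseline readings, the function names `fit_normal_model` and `find_anomalies`, and the threshold of 3 standard deviations are illustrative assumptions, not part of the original solution.

```python
from statistics import mean, stdev

def fit_normal_model(baseline):
    """Summarize normal heart-rate readings by their mean and sample std. dev."""
    return mean(baseline), stdev(baseline)

def find_anomalies(readings, model, threshold=3.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu, sigma = model
    return [(i, x) for i, x in enumerate(readings)
            if abs(x - mu) / sigma > threshold]

# Hypothetical resting heart rates (beats per minute) treated as "normal".
baseline = [72, 75, 71, 74, 73, 76, 70, 74, 72, 73]
model = fit_normal_model(baseline)

# New readings to monitor; 120 lies far outside the baseline range.
new_readings = [74, 71, 120, 73]
alarms = find_anomalies(new_readings, model)
print(alarms)  # [(2, 120)]
```

A real monitor would likely need a richer model of normal behavior (for example, one that accounts for activity level or time of day), but the structure is the same: fit on normal data, then flag departures from it.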
  2. Suppose that you are employed as a data mining consultant for an
    Internet search engine company. Describe how data mining can help the
    company by giving specific examples of how techniques, such as
    clustering, classification, association rule mining, and anomaly
    detection can be applied.
    The following are examples of possible answers.

    • Clustering can group results with a similar theme and present
    them to the user in a more concise form, e.g., by reporting the
    10 most frequent words in the cluster.
    • Classification can assign results to pre-defined categories such as
    “Sports,” “Politics,” etc.
    • Sequential association analysis can detect that certain queries
    follow certain other queries with a high probability, allowing for
    more efficient caching.
    • Anomaly detection techniques can discover unusual patterns of
    user traffic, e.g., that one subject has suddenly become much
    more popular. Advertising strategies could be adjusted to take
    advantage of such developments.
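
The sequential association example above can be made concrete by counting how often one query directly follows another within a session. The sketch below is only an illustration: the session data, the function name `follow_probabilities`, and the 0.6 cutoff are assumptions, and a production system would use a proper sequential pattern mining algorithm over far larger logs.

```python
from collections import Counter, defaultdict

def follow_probabilities(sessions):
    """Estimate P(next query | current query) from consecutive query pairs."""
    pair_counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            pair_counts[current][nxt] += 1
    probs = {}
    for current, counter in pair_counts.items():
        total = sum(counter.values())
        probs[current] = {nxt: n / total for nxt, n in counter.items()}
    return probs

# Hypothetical query sessions, one list of queries per user session.
sessions = [
    ["flights to paris", "paris hotels", "paris weather"],
    ["flights to paris", "paris hotels"],
    ["flights to paris", "eiffel tower tickets"],
    ["paris hotels", "paris weather"],
]
probs = follow_probabilities(sessions)

# Follow-up queries likely enough to be worth caching in advance
# (the 0.6 cutoff is an arbitrary illustrative choice).
to_cache = {
    current: [nxt for nxt, p in followers.items() if p >= 0.6]
    for current, followers in probs.items()
}
print(to_cache)
```

With this data, "paris hotels" follows "flights to paris" in two of three sessions, so its results would be candidates for pre-caching whenever the first query is seen.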
  3. For each of the following data sets, explain whether or not data privacy
    is an important issue.

    (a) Census data collected from 1900–1950. No
    (b) IP addresses and visit times of Web users who visit your Website.
    Yes
    (c) Images from Earth-orbiting satellites. No
    (d) Names and addresses of people from the telephone book. No
    (e) Names and email addresses collected from the Web. No
