Analyzing Text Data Together with Quantitative Data

to analyze quantitative data and produce information that monitors business performance. The analyses may be summaries or drill downs that present details on subsets of data. More broadly, business intelligence can include any information, such as articles and reports, that offers insights into an industry or company (see sidebar page 9). Usually, quantitative data and text information are considered separately, but now quantitative analysis is being paired with information in text form to achieve a deeper understanding than either can provide alone.

EDF Energy (edfenergy.com), a large U.K. energy company, provides power to a quarter of the U.K.'s population via its electricity distribution networks in London, the South East and the East of England. The company supplies gas and electricity to more than 5 million customers through its U.K. retail brand. It offers about 60 different energy-related products and services, including a "Green Tariff" that lets customers choose renewable energy sources. EDF Energy was using a business intelligence tool to carry out basic analyses but wanted to move into data mining and modeling of customer behavior.

"We wanted to be able to interrogate our databases to get more value out of the information we had," says Clifford Budge, customer insight manager at EDF Energy.

After evaluating several solutions, Budge selected Clementine, a predictive modeling tool from SPSS (spss.com). Its flexibility was appealing: it supports a wide range of analytical techniques, including regression, neural networks and decision trees. It also has clustering tools that group customers according to multiple behavioral variables.
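Clementine's own clustering algorithms are not described here, but the general idea of grouping customers on several behavioral variables can be illustrated with a plain k-means sketch. The two variables below (annual consumption in thousands of kWh and late payments per year) are hypothetical stand-ins, and the naive initialization is chosen only to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: returns a cluster label per point and the final centroids."""
    # Naive, deterministic initialization: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical customers: (annual use in thousands of kWh, late payments per year).
customers = [(3.1, 0), (2.9, 1), (12.0, 4), (11.5, 5)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 0, 1, 1]
```

In practice, variables measured on very different scales would normally be standardized first so that one variable does not dominate the distance calculation.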

"One feature that interested us quite a bit was a text mining module that can work in conjunction with the quantitative analyses," Budge adds.

The intended users of Clementine were not statistical analysts but business analysts in sales and marketing who have a good understanding of the business and its data.

"They needed an application that was reasonably easy to use," continues Budge, "to allow us to make decisions based on the data while incorporating the expertise of the staff."

The modeling component lets employees define analyses that predict which customer clusters are likely to buy which kinds of products. The marketing department can then target its campaigns more precisely, improving ROI. Predicting customer behavior is a key goal of these analyses. One model developed with Clementine identified a group of customers that was three times more likely than average to develop bad debts early in the customer life cycle. This analysis would enable the company's debt teams to manage those customers differently, for example by offering products that limit debt risk, such as prepayment.
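The "three times more likely than average" figure is a lift: the bad-debt rate within a segment divided by the bad-debt rate across all customers. A minimal sketch of that arithmetic follows, using made-up records rather than EDF Energy data; the `Customer` record and segment labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    segment: str    # hypothetical segment label assigned by a model
    bad_debt: bool  # True if the account developed bad debt early in its life cycle


def segment_lift(customers: list[Customer], segment: str) -> float:
    """Bad-debt rate inside `segment` divided by the overall bad-debt rate."""
    total_bad = sum(c.bad_debt for c in customers)
    members = [c for c in customers if c.segment == segment]
    segment_bad = sum(c.bad_debt for c in members)
    # (segment_bad / len(members)) / (total_bad / len(customers)), rearranged
    # so the integer counts are multiplied before a single division.
    return (segment_bad * len(customers)) / (len(members) * total_bad)


# Toy data: 10 customers in segment "A" (6 bad debts), 90 in "B" (14 bad debts).
customers = (
    [Customer("A", True)] * 6
    + [Customer("A", False)] * 4
    + [Customer("B", True)] * 14
    + [Customer("B", False)] * 76
)
print(segment_lift(customers, "A"))  # 3.0
```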

The text module is initially being used to review customers and reclassify them into appropriate categories. For example, some accounts that were initially set up as residential may later become business accounts without the customer notifying EDF Energy. By using the text analysis tool to locate words associated with businesses, such as "Ltd.," EDF Energy can find those customers and then offer them products and services that match their business needs. In the future, the company plans to mine data in its customer relationship management (CRM) systems for signs of customers' attitudes toward products and to identify gaps in services.
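The reclassification step can be pictured as simple pattern matching over account names or notes. The sketch below is an assumption about how such a rule might look, not a description of the SPSS text module; apart from the "Ltd." example above, the marker list (Limited, PLC, LLP) and the account tuples are hypothetical.

```python
import re

# Hypothetical business markers; "Ltd." is the example from the article.
BUSINESS_MARKERS = re.compile(r"\b(?:ltd|limited|plc|llp)\b\.?", re.IGNORECASE)


def flag_possible_business(accounts):
    """Return IDs of residential accounts whose text contains a business marker.

    `accounts` is an iterable of (account_id, account_type, text) tuples.
    """
    return [
        account_id
        for account_id, account_type, text in accounts
        if account_type == "residential" and BUSINESS_MARKERS.search(text)
    ]


accounts = [
    ("A1", "residential", "Mr J Smith"),
    ("A2", "residential", "Smith Plumbing Ltd."),
    ("A3", "business", "Acme Ltd"),
]
print(flag_possible_business(accounts))  # ['A2']
```

A rule this simple also produces false positives (an address on a street called "Limited Lane" would match), so flagged accounts would still need review before any offer is made.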

"We can take data from any source," says Budge, "including our Siebel (siebel.com) system, and compare our findings with insight from the Customer Research Team." If the data indicate that customers would like a certain product, a model can be built that profiles those customers, and an offering can then be made to a wider range of similar customers.

"Traditional BI is about reporting the facts," says Olivier Jouve, VP of market strategy at SPSS, "but text mining explains more about why things are happening." Jouve developed the technology that allows analyses of structured and text data to be combined.

"So much of the available data is unstructured," Jouve adds, "that a lot is lost if it is not included in analyses." Conversely, text analysis conducted alone, without being related to quantitative analysis, cannot demonstrate an ROI.

Text mining is a bottom-up approach: it starts with the data to see what it reveals. Search is a top-down approach, most useful when the researcher already has a direction for the inquiry. Finding the right keywords to detect customer sentiment can be difficult, and searching call center notes, for example, may not reveal much.

"This information typically consists of very short sentences, without a lot of context," says Jouve. "We've seen 'customer' abbreviated in 27 ways." Text mining of customer data is geared toward extracting opinions and sentiments that may not be known in advance.
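Jouve's point about abbreviations shows why normalization usually comes before counting terms in call center notes. The sketch below maps a few hypothetical spellings of "customer" to one canonical token and then counts terms across all notes, so frequent terms surface without a predefined query. The variant list and sample notes are illustrative; a real variant list would be built by inspecting the notes themselves.

```python
import re
from collections import Counter

# Hypothetical variant spellings; Jouve reports seeing 27 in real notes.
CUSTOMER_VARIANTS = {"cust", "custmr", "cstmr", "cst", "cus"}
TOKEN = re.compile(r"[a-z]+")


def normalize(note):
    """Lowercase a note, split it into alphabetic tokens and unify 'customer' variants."""
    return ["customer" if t in CUSTOMER_VARIANTS else t for t in TOKEN.findall(note.lower())]


def term_counts(notes):
    """Count normalized terms across all notes: a bottom-up view of what they contain."""
    counts = Counter()
    for note in notes:
        counts.update(normalize(note))
    return counts


notes = ["Cust unhappy w/ bill", "cstmr called re bill", "Customer wants refund"]
counts = term_counts(notes)
print(counts["customer"], counts["bill"])  # 3 2
```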

Fruitful results

Medical data is another environment in which combined analyses of structured and unstructured data can prove fruitful. At the University of Louisville (louisville.edu), a team of researchers headed by Dr. Patricia Cerrito is using SAS Text Miner from SAS (sas.com) to analyze data from area hospitals. Analyses of text records gathered from medication orders and chart notes are helping to explain the relationship between physician practices and patient outcomes.

The ability of Text Miner to find patterns in clinical reports and other medical documents and to provide quantitative analyses of text is valuable in the research at Louisville. In addition, the close integration of Text Miner with Enterprise Miner provides an easy way to combine analyses of structured and unstructured data.

Mining structured data and unstructured text is something SAS customers have applied to many different industry needs, says Mary Crissey, SAS product marketing manager for data mining and text mining.

For example, American Honda (honda.com) now uses SAS Text Miner to monitor warranty claims for early warning signs of engineering problems. Honda analyzes text from call centers, technician feedback and other sources across its dealer network to find patterns in the records that may be early indications of potential problems. Honda engineers can then investigate further to pinpoint the root cause of the issue.

Attention to text-based feedback as part of early-warning analysis is now becoming essential for manufacturers that strive to identify potential issues and resolve them quickly, before they snowball into larger, more expensive problems.
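One generic way to turn warranty text into an early-warning signal is to compare how often each term appears in recent claims with how often it appeared in an earlier baseline period. The following is a minimal sketch of that idea with assumed thresholds (`min_count`, `ratio`) and made-up claims; it is not a description of how SAS Text Miner or Honda's analysis works.

```python
from collections import Counter


def doc_freq(docs):
    """Number of documents in which each lowercase, whitespace-separated term occurs."""
    return Counter(term for doc in docs for term in set(doc.lower().split()))


def term_spikes(baseline_docs, recent_docs, min_count=3, ratio=3.0):
    """Flag terms whose share of recent claims is at least `ratio` times their baseline share."""
    base, recent = doc_freq(baseline_docs), doc_freq(recent_docs)
    flagged = {}
    for term, count in recent.items():
        if count < min_count:
            continue  # ignore terms too rare to be meaningful
        recent_rate = count / len(recent_docs)
        # Laplace smoothing keeps terms unseen in the baseline from dividing by zero.
        base_rate = (base[term] + 1) / (len(baseline_docs) + 2)
        if recent_rate / base_rate >= ratio:
            flagged[term] = recent_rate / base_rate
    return flagged


baseline = ["radio static"] * 5 + ["door rattle"] * 5
recent = ["brake squeal"] * 4 + ["door rattle"] * 2
print(sorted(term_spikes(baseline, recent)))  # ['brake', 'squeal']
```

A flagged term is only a prompt for engineers to look closer, in the same spirit as the investigation step described above.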

SAS has traditionally been strong in analytics; its Enterprise Miner product is used to mine structured data. In order to apply some of the same skills to text, the company turned to Inxight (inxight.com), which specializes in discovery and visualization of text information.

"Using Inxight's technology in our Text Miner product allowed us to analyze text for concepts using some of the same algorithms we had developed for Enterprise Miner," says Crissey. "We added some graphical interfaces that allow visualization of patterns found in the text, ranging from basics like word counts to a more sophisticated understanding of word usage that might indicate a specific predictive trend."

The issue of combining analyses of structured and unstructured data to provide more meaningful pictures of business and technical information is receiving increasing attention.

"This problem has been around for years," says Michael Corcoran, VP of strategy at Information Builders (IBI, informationbuilders.com). "Portal technology tried to bring the various data repositories together, but what was delivered was an application in which the user still had to know where the data was located."

With more non-technical people (whether internal to the company, customers or supply chain partners) now seeking information, the ability to find information without knowing its location has become critical.

One interface that users know and are comfortable with is Google's (google.com), so IBI chose to incorporate that search technology into its BI solution, webFOCUS, to create its Intelligent search tool. This top-down strategy does require users to know what they are searching for, unlike text mining, in which the software discovers patterns. However, for tasks such as finding all the information on a particular customer, combining structured BI with a search engine offers advantages. Besides the familiar interface, the data can be found no matter which repository it resides in, in contrast to situations where the data must be in a dedicated warehouse. IBI's iWay Software integration tool provides adaptors to many structured databases and document management repositories, making everything accessible.
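The idea of finding data without knowing which repository holds it can be illustrated with a small inverted index that records, for every term, the repository and record it came from. This is a generic sketch with hypothetical repository names and records; it does not reflect the webFOCUS or iWay Software APIs.

```python
from collections import defaultdict


def build_index(repositories):
    """Map each lowercase term to the (repository, record_id) pairs containing it.

    `repositories` maps a repository name to a dict of record_id -> text.
    """
    index = defaultdict(set)
    for repo, records in repositories.items():
        for record_id, text in records.items():
            for term in text.lower().split():
                index[term].add((repo, record_id))
    return index


def search(index, query):
    """Return the (repository, record_id) pairs that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))  # copy so the index itself is not modified
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


repositories = {
    "crm": {"c1": "Acme Ltd renewal due"},
    "documents": {"d7": "Signed contract for Acme Ltd"},
    "billing": {"b3": "Beta Corp invoice overdue"},
}
index = build_index(repositories)
print(sorted(search(index, "acme ltd")))  # [('crm', 'c1'), ('documents', 'd7')]
```

The user supplies only the search terms; the index, not the user, keeps track of where each record lives.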

By the end of this year, IBI expects to have a template that presents BI reports and relevant unstructured content within one interface.

"This information will be available in a dashboard," says Corcoran, "with metrics, call center activity, contract information and whatever else the user defines as beneficial to getting the big picture." IBI has also partnered with SPSS to add predictive analytics to its own reporting capability.

Although the practice of combining quantitative analytics and text analytics has not yet been widely adopted, it is becoming increasingly feasible, thanks to advances in technology, and is likely to see greater use in the relatively near term.
