There is an information evolution happening in the data warehouse environment today. Changing business requirements have placed demands on data warehousing technology to do more things faster. Data warehouses have moved from back-room strategic decision support systems to operational, business-critical components of the enterprise. As your company evolves in its use of the data warehouse, what you need from the data warehouse evolves, too.
<v:shapetype o:spt="75" coordsize="21600,21600" stroked="f" id="_x0000_t75" filled="f" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe"><v:stroke joinstyle="miter"></v:stroke><v:formulas><v:f eqn="if lineDrawn pixelLineWidth 0"></v:f><v:f eqn="sum @0 1 0"></v:f><v:f eqn="sum 0 0 @1"></v:f><v:f eqn="prod @2 1 2"></v:f><v:f eqn="prod @3 21600 pixelWidth"></v:f><v:f eqn="prod @3 21600 pixelHeight"></v:f><v:f eqn="sum @0 0 1"></v:f><v:f eqn="prod @6 1 2"></v:f><v:f eqn="prod @7 21600 pixelWidth"></v:f><v:f eqn="sum @8 21600 0"></v:f><v:f eqn="prod @7 21600 pixelHeight"></v:f><v:f eqn="sum @10 21600 0"></v:f></v:formulas><v:path gradientshapeok="t" o:extrusionok="f" o:connecttype="rect"></v:path><o:lock v:ext="edit" aspectratio="t"></o:lock></v:shapetype><v:shape type="#_x0000_t75" id="_x0000_i1025" alt="" style="WIDTH: 408pt; HEIGHT: 189pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/Mod3-dwevolution.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image001.png"></v:imagedata></v:shape>
Stage 1 Reporting: The initial stage typically focuses on reporting from a single source of truth to drive decision-making across functional and/or product boundaries. Questions are usually known in advance, such as a weekly sales report.
Stage 2 Analyzing: This stage focuses on why something happened, such as why sales went down, and on discovering patterns in customer buying habits. Users perform ad-hoc analysis, slicing and dicing the data at a detail level, and questions are not known in advance.
Stage 3 Predicting: Sophisticated analysts use the system heavily to predict what will happen next in the business, so that the organization can manage its strategy proactively. This stage requires data mining tools and predictive models built on historical detail data. For example, users can model customer demographics for target marketing.
Stage 4 Operationalizing: Providing access to information for immediate decision-making in the field enters the realm of active data warehousing. Stages 1 to 3 focus on strategic decision-making within an organization; Stage 4 focuses on tactical decision support. Tactical decision support is not about developing corporate strategy, but about supporting the people in the field who execute it. Examples include: 1) inventory management with just-in-time replenishment, 2) scheduling and routing for package delivery, and 3) altering a campaign based on current results.
Stage 5 Active Warehousing: The larger the role an active data warehouse (ADW) plays in the operational aspects of decision support, the more incentive the business has to automate the decision processes. For example, decision-making can be automated when a customer interacts with a web site. Interactive customer relationship management (CRM) on a web site or at an ATM is about making decisions that optimize the customer relationship through individualized product offers, pricing, content delivery, and so on. As technology evolves, more and more decisions are executed by event-driven triggers that initiate fully automated decision processes. Example: determining the best offer for a specific customer based on a real-time event, such as a significant ATM deposit.
Data warehouses are beginning to take on mission-critical roles supporting CRM, one-to-one marketing, and minute-to-minute decision-making. Data warehousing requirements have evolved to demand a decision capability that is not just oriented toward corporate staff and upper management, but is actionable on a day-to-day basis. Decisions such as when to replenish Barbie dolls at a particular retail outlet may not be as strategic as customer segmentation or long-term pricing strategies, but when executed properly, they make a big difference to the bottom line. We refer to this capability as "tactical" decision support.
Tactical decisions drive the day-to-day management of the business. Businesses today want more than strategic insight from their data warehouse implementations; they want better execution in running the business through more effective use of information for the decisions that are made thousands of times per day.
The active data warehouse originates in the timely, integrated store of detail data available for analytic business decision-making. Only from that source can the additional traits needed by the active data warehouse evolve. These new "active" traits supplement traditional data warehouse functionality. For example, the database work mix still includes complex decision support queries, but expands to take on short tactical queries, background data feeds, and possibly event-driven updates, all at the same time. Data volumes and user concurrency levels may grow well beyond expectations. Limits may need to be placed on longer analytical queries to guarantee throughput for tactical work. While direct access to detail data remains important for analytical work, tactical work may thrive on shortcuts and summaries, such as operational data store (ODS) level information. And for both strategic and tactical decisions to be useful to the business, today's data, this hour's data, and even this minute's data have to be at hand.
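The idea of restraining long analytical queries so that tactical work keeps flowing can be pictured as a simple admission throttle. The sketch below is hypothetical: the class, the two workload labels, and the concurrency cap are illustrative assumptions, not a Teradata feature or API. Analytical queries beyond the cap wait in a queue, while short tactical queries are always admitted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class AdmissionController:
    """Hypothetical throttle: caps concurrent analytical queries, never tactical ones."""

    max_analytical: int
    running: Dict[str, int] = field(
        default_factory=lambda: {"tactical": 0, "analytical": 0}
    )
    waiting: Deque[str] = field(default_factory=deque)

    def submit(self, query_id: str, kind: str) -> bool:
        """Start the query now (True) or queue it (False)."""
        if kind == "analytical" and self.running["analytical"] >= self.max_analytical:
            self.waiting.append(query_id)  # analytical work waits for a free slot
            return False
        self.running[kind] += 1  # tactical work is always admitted
        return True

    def finish(self, kind: str) -> Optional[str]:
        """Record a finished query; if an analytical slot is free, start the oldest waiter."""
        self.running[kind] -= 1
        if self.waiting and self.running["analytical"] < self.max_analytical:
            self.running["analytical"] += 1
            return self.waiting.popleft()
        return None


if __name__ == "__main__":
    ctl = AdmissionController(max_analytical=2)
    print(ctl.submit("report-1", "analytical"))  # True
    print(ctl.submit("report-2", "analytical"))  # True
    print(ctl.submit("report-3", "analytical"))  # False: queued behind the cap
    print(ctl.submit("lookup-1", "tactical"))    # True: tactical is never throttled
    print(ctl.finish("analytical"))              # report-3 starts in the freed slot
```

A queue with a fixed cap is the simplest way to show the trade-off: tactical response stays predictable, at the cost of delaying some analytical work.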
Teradata is exceptionally well positioned to meet the challenges of high availability, large multi-user workloads, and complex queries that an active data warehouse implementation requires. Teradata technology supports evolving business requirements by providing high performance and scalability for:
Teradata provides 7x24 availability and reliability, as well as continuous updating of information so that data is always fresh and accurate.
Traditionally, data processing has been divided into two categories: on-line transaction processing (OLTP) and decision support systems (DSS). In both, requests are handled as transactions. A transaction is a logical unit of work, such as a request to update an account.
An RDBMS is used in the following main processing environments:
Decision Support Systems (DSS)
In a decision support environment, users submit requests to analyze historical detail data stored in the tables. The results are used to establish strategies, reveal trends, and make projections. A database used as a decision support system (DSS) usually receives fewer, but very complex, ad-hoc queries that may involve numerous tables. Decision support systems include batch reports, which roll up numbers to give the business the big picture, and over time they have evolved:
<v:shape type="#_x0000_t75" id="_x0000_i1026" alt="" style="WIDTH: 284.25pt; HEIGHT: 96pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/ds.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image003.gif"></v:imagedata></v:shape><o:p></o:p>
On-line Transaction Processing (OLTP)
Unlike the DSS environment, an on-line transaction processing (OLTP) environment typically has users accessing current data to update, insert, and delete rows in the data tables. OLTP is typified by a small number of rows (or records) in one or a few of many possible tables being accessed in a matter of seconds or less. Very little I/O processing is required to complete the transaction. This type of transaction takes place when we withdraw money at an ATM: once our card is validated, a debit transaction is applied against our current balance to reflect the amount of cash withdrawn. The same type of transaction takes place when we deposit money into a checking account and the balance is updated. We expect these transactions to be performed quickly; they must occur in real time.
<v:shape type="#_x0000_t75" id="_x0000_i1027" alt="" style="WIDTH: 318pt; HEIGHT: 61.5pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/tp.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image004.gif"></v:imagedata></v:shape><o:p></o:p>
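The ATM withdrawal is a good example of a transaction as a logical unit of work: the balance update and the record of the debit must succeed or fail together, and only a row or two is touched. This minimal sketch uses Python's built-in `sqlite3` module as a stand-in database (not Teradata), with hypothetical `accounts` and `ledger` tables.

```python
import sqlite3


def withdraw(conn: sqlite3.Connection, account_id: int, amount: int) -> int:
    """Debit an account and log the debit as one unit of work; return the new balance.

    If the account is missing or the balance is too low, nothing is changed.
    """
    with conn:  # commit if the block succeeds, roll back if it raises
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown account or insufficient funds")
        conn.execute(
            "INSERT INTO ledger (account_id, amount) VALUES (?, ?)",
            (account_id, -amount),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
    return balance


def sample_bank_db() -> sqlite3.Connection:
    """In-memory database with one hypothetical account holding 500."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE ledger (account_id INTEGER, amount INTEGER)")
    with conn:
        conn.execute("INSERT INTO accounts VALUES (1, 500)")
    return conn


if __name__ == "__main__":
    db = sample_bank_db()
    print(withdraw(db, 1, 200))  # 300
```

The conditional `UPDATE` both checks and debits the balance in one statement, and the connection's context manager makes the debit and its ledger entry commit or roll back together.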
On-line Analytical Processing (OLAP)
OLAP is the kind of processing that takes place in many data warehouses or data marts. Here, the user may be looking for historical trends, sales rankings, or seasonal inventory fluctuations for the entire corporation. This usually involves retrieving, processing, and analyzing a large amount of detail data, so response times can be measured in seconds or minutes. In the most sophisticated OLAP systems, the system makes automated purchasing or inventory decisions without any human intervention.
<v:shape type="#_x0000_t75" id="_x0000_i1028" alt="" style="WIDTH: 389.25pt; HEIGHT: 261.75pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/Mod3-evolutiondp.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image005.gif"></v:imagedata></v:shape><o:p></o:p>
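A sales ranking is a typical OLAP request: many detail rows are read and rolled up into a few answers. A minimal sketch, again using `sqlite3` as a stand-in database and a hypothetical `sales_detail` table with invented rows:

```python
import sqlite3


def sales_ranking(conn: sqlite3.Connection) -> list:
    """Roll detail sales rows up to one revenue total per store, highest first."""
    return conn.execute(
        """
        SELECT store, SUM(qty * unit_price) AS revenue
        FROM sales_detail
        GROUP BY store
        ORDER BY revenue DESC, store
        """
    ).fetchall()


def sample_sales_db() -> sqlite3.Connection:
    """In-memory table of hypothetical detail sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_detail (sale_date TEXT, store TEXT, qty INTEGER, unit_price INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales_detail VALUES (?, ?, ?, ?)",
        [
            ("2024-01-02", "Dallas", 3, 10),
            ("2024-01-02", "Boston", 10, 10),
            ("2024-01-03", "Dallas", 1, 50),
            ("2024-01-03", "Denver", 4, 10),
            ("2024-01-04", "Boston", 2, 5),
        ],
    )
    return conn


if __name__ == "__main__":
    print(sales_ranking(sample_sales_db()))  # [('Boston', 110), ('Dallas', 80), ('Denver', 40)]
```

On a real warehouse the detail table would hold many years of rows, which is why such queries run in seconds or minutes rather than the sub-second times of OLTP.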
Until recently, most business decisions were based on summary data. The problem is that summarized data is not as useful as detail data and cannot answer some questions accurately. With summarized data, peaks and valleys are leveled: a peak that falls at the end of a reporting period is cut in half between two periods, so neither period shows its true size.
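The leveling effect is easy to reproduce with a few lines of arithmetic. In this sketch the numbers are invented for illustration: a two-day spike that falls across a weekly reporting boundary is split between the two weeks, so each weekly total looks only moderately high.

```python
def period_totals(daily: list, period_len: int) -> list:
    """Summarize daily figures into consecutive reporting periods of period_len days."""
    return [sum(daily[i:i + period_len]) for i in range(0, len(daily), period_len)]


# Hypothetical unit sales for 14 days: a steady 10 per day, plus a two-day spike
# of 100 per day on days 7 and 8, which straddle the boundary between week 1 and week 2.
daily = [10] * 6 + [100, 100] + [10] * 6
weekly = period_totals(daily, 7)

normal_week = 7 * 10
print(weekly)                               # [160, 160]
print(max(daily) / 10)                      # 10.0: the spike is 10x a normal day
print(round(max(weekly) / normal_week, 2))  # 2.29: the weekly summary shows only ~2.3x
```

Both weekly totals are identical, so the summary also hides when the spike happened and that it was a single event.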
Here's another example. Think of your monthly bank statement that records checking account activity. If it only told you the total amount of deposits and withdrawals, could you tell whether a certain check had cleared? To answer that question, you need a list of every check received by your bank. You need detail data.
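The statement example can be made concrete: two months with identical totals can contain different checks, so no calculation on the totals alone can tell whether a given check cleared. A minimal sketch with invented transactions:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Txn:
    kind: str                       # "deposit" or "check"
    amount: int                     # in cents
    check_no: Optional[int] = None  # set only for checks


def summarize(txns: List[Txn]) -> dict:
    """What a summary-only statement keeps: two totals."""
    return {
        "deposits": sum(t.amount for t in txns if t.kind == "deposit"),
        "withdrawals": sum(t.amount for t in txns if t.kind == "check"),
    }


def check_cleared(txns: List[Txn], check_no: int) -> bool:
    """Answerable only from the detail rows."""
    return any(t.kind == "check" and t.check_no == check_no for t in txns)


# Two hypothetical months with identical totals but different checks.
march = [Txn("deposit", 100_000), Txn("check", 5_000, 101), Txn("check", 2_500, 102)]
april = [Txn("deposit", 100_000), Txn("check", 7_500, 103)]

print(summarize(march) == summarize(april))                  # True: summaries match
print(check_cleared(march, 102), check_cleared(april, 102))  # True False
```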
Decision support (answering business questions) is the real purpose of databases. To answer business questions, decision-makers must have four things:
Consider your own business and how it uses data. Is that data detailed or summarized? If it's summarized, are there questions it cannot answer?
A data warehouse is a central, enterprise-wide database that contains information extracted from the operational systems. Data warehouses have become more common in corporations where enterprise-wide detail data may be used in on-line analytical processing to make strategic and tactical business decisions. Warehouses often carry many years' worth of detail data so that historical trends may be analyzed using the full power of the data.
Many data warehouses get their data directly from operational systems so that the data is timely and accurate. While data warehouses may begin somewhat small in scope and purpose, they often grow quite large as their utility becomes more fully exploited by the enterprise.
Data warehousing is a process, not a product. It is a technique for properly assembling and managing data from various sources to answer business questions that were not previously possible to answer, or not previously known.
<v:shape type="#_x0000_t75" id="_x0000_i1029" style="WIDTH: 307.5pt; HEIGHT: 261pt"><v:imagedata src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image006.png" o:title=""></v:imagedata></v:shape>
A data mart is a special-purpose subset of enterprise data used by a particular department, function, or application. Data marts may have both summary and detail data for a particular use rather than for general use. Usually the data has been pre-aggregated or transformed in some way to better handle the particular types of requests of a specific user community.
Independent Data Marts
Independent data marts are created directly from operational systems, just as a data warehouse is. In the data mart, the data is usually transformed as part of the load process. Data might be aggregated, dimensionalized, or summarized historically, as the requirements of the data mart dictate.
Logical Data Marts
Logical data marts are not separate physical structures or a data load from a data warehouse, but rather are an existing part of the data warehouse. Because in theory the data warehouse contains the detail data of the entire enterprise, a logical view of the warehouse might provide the specific information for a given user community, much as a physical data mart would. Without the proper technology, a logical data mart can be a slow and frustrating experience for end users. With the proper technology, it removes the need for massive data loading and transforming, making a single data store available for all user needs.
Dependent Data Marts
Dependent data marts are created from the detail data in the data warehouse. While having many of the advantages of the logical data mart, this approach still requires the movement and transformation of data, but it may provide a better vehicle for performance-critical user queries.
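The difference between a logical and a dependent data mart can be sketched with `sqlite3` standing in for the warehouse; all table and view names are hypothetical. The logical mart is a view over warehouse detail, while the dependent mart is a physical, pre-aggregated extract that must be reloaded to pick up new warehouse data.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Enterprise detail data in the (hypothetical) warehouse table.
        CREATE TABLE wh_sales (sale_date TEXT, region TEXT, product TEXT, amount INTEGER);
        INSERT INTO wh_sales VALUES
          ('2024-03-01', 'East', 'widget', 100),
          ('2024-03-01', 'West', 'widget',  80),
          ('2024-03-02', 'East', 'gadget',  50),
          ('2024-03-02', 'East', 'widget',  30);

        -- Logical mart: a view over the warehouse; no data is copied.
        CREATE VIEW east_mart_logical AS
          SELECT sale_date, product, amount FROM wh_sales WHERE region = 'East';

        -- Dependent mart: extracted from the warehouse and pre-aggregated at load time.
        CREATE TABLE east_mart_dependent AS
          SELECT product, SUM(amount) AS total
          FROM wh_sales WHERE region = 'East' GROUP BY product;
        """
    )
    return conn


def logical_totals(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT product, SUM(amount) FROM east_mart_logical GROUP BY product ORDER BY product"
    ).fetchall()


def dependent_totals(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT product, total FROM east_mart_dependent ORDER BY product"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    conn.execute("INSERT INTO wh_sales VALUES ('2024-03-03', 'East', 'widget', 20)")
    print(logical_totals(conn))    # [('gadget', 50), ('widget', 150)]: sees the new row
    print(dependent_totals(conn))  # [('gadget', 50), ('widget', 130)]: stale until reloaded
```

An independent mart would look like the dependent one, except that its load would read from the operational systems instead of from `wh_sales`.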
Independent Data Marts
Independent data marts are usually the easiest and fastest to implement, and their payback can be almost immediate. Some corporations start with several data marts before deciding to build a true data warehouse. This approach has several inherent problems:
Logical Data Marts
Logical data marts overcome most of the limitations of independent data marts. They provide a single version of the truth. There is no historical limit to the data, and "what if" querying is entirely feasible. The major drawback to logical data marts is the lack of physical control over the data. Because data in the warehouse is not pre-aggregated or dimensionalized, performance against the logical mart will not usually be as good as against an independent mart. However, the use of parallelism in the logical mart can overcome some of the limitations of the non-transformed data.
Dependent Data Marts
Dependent data marts provide all the advantages of a logical mart and also allow physical control of the data as it is extracted from the data warehouse. Because dependent marts use the warehouse as their foundation, they are generally considered a better solution than independent marts, but they take longer and are more expensive to implement. <v:shape type="#_x0000_t75" id="_x0000_i1031" alt="" style="WIDTH: 476.25pt; HEIGHT: 275.25pt"><v:imagedata src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image009.gif" o:title="Mod3-dmprocon"></v:imagedata></v:shape>
A Teradata system contains one or more nodes. A node is a processing unit under the control of a single operating system, and it is where the processing for the Teradata Database occurs. There are two types of Teradata systems:
<v:shape type="#_x0000_t75" id="_x0000_i1032" alt="" style="WIDTH: 375pt; HEIGHT: 157.5pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/smp_mpp.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image010.gif"></v:imagedata></v:shape><o:p></o:p>
To manage a Teradata system, you use:
To access a Teradata system, a user typically logs on through one of multiple client platforms (channel-attached mainframes or network-attached workstations). Client access is discussed in the next module.
A node is a basic building block of a Teradata system, and contains a large number of hardware and software components. A conceptual diagram of a node and its major components is shown below. Hardware components are shown on the left side of the node and software components are shown on the right side.
<v:shape type="#_x0000_t75" id="_x0000_i1033" alt="" style="WIDTH: 291pt; HEIGHT: 225pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/node.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image011.gif"></v:imagedata></v:shape>
The Teradata vprocs (the PEs and AMPs) share the node's components (memory and CPU). The key feature of the "shared-nothing" architecture is that each AMP manages its own dedicated portion of the system's disk space (called the vdisk), and this space is not shared with other AMPs. Each AMP uses system resources independently of the other AMPs, so they can all work in parallel for high overall system performance.
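The shared-nothing idea can be sketched in Python: each simulated AMP gets its own private slice of rows (standing in for its vdisk), works only on that slice, and the partial results are combined at the end. The placement rule (CRC-32 of a row key modulo the AMP count) is an assumption for illustration, not Teradata's actual row distribution method. The threads model independent work rather than real speedup, since CPython threads do not execute Python code in parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[str, int]


def amp_for(key: str, n_amps: int) -> int:
    """Hypothetical placement rule: CRC-32 of the row key, modulo the number of AMPs."""
    return zlib.crc32(key.encode("utf-8")) % n_amps


def distribute(rows: List[Row], n_amps: int) -> List[List[Row]]:
    """Give each simulated AMP its own private list of rows (its 'vdisk')."""
    vdisks: List[List[Row]] = [[] for _ in range(n_amps)]
    for key, value in rows:
        vdisks[amp_for(key, n_amps)].append((key, value))
    return vdisks


def amp_local_sum(vdisk: List[Row]) -> int:
    """Work a single AMP does, touching only the rows it owns."""
    return sum(value for _, value in vdisk)


def parallel_sum(rows: List[Row], n_amps: int = 4) -> Tuple[int, List[int]]:
    """Run every AMP's local step concurrently, then combine the partial results."""
    vdisks = distribute(rows, n_amps)
    with ThreadPoolExecutor(max_workers=n_amps) as pool:
        partials = list(pool.map(amp_local_sum, vdisks))
    return sum(partials), partials


if __name__ == "__main__":
    rows = [(f"cust{i}", i) for i in range(100)]
    total, partials = parallel_sum(rows)
    print(total)     # 4950
    print(partials)  # one partial sum per AMP
```

Because no AMP reads another AMP's slice, adding AMPs divides the work without adding contention for shared data.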
The BYNET (pronounced "bye-net") is a high-speed interconnect (network) that enables multiple nodes in the system to communicate. It has several unique features:
The BYNET hardware and software handle the communication between the vprocs and the nodes.
<v:shape type="#_x0000_t75" id="_x0000_i1034" alt="" style="WIDTH: 267pt; HEIGHT: 143.25pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/bynethsw.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image012.gif"></v:imagedata></v:shape><o:p></o:p>
The BYNET hardware can carry the following types of messages between nodes:
<v:shape type="#_x0000_t75" id="_x0000_i1035" alt="" style="WIDTH: 262.5pt; HEIGHT: 140.25pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/bynet_communication.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image013.gif"></v:imagedata></v:shape><o:p></o:p>
On an MPP system, the BYNET hardware first sends the communication across nodes (using either the point-to-point or broadcast messaging described previously).
On an SMP system, this first step is unnecessary since there is only one node.
Once a node receives a communication, vproc communication within the node is done by the PDE and BYNET software using the following types of messaging:
Point-to-Point Messages
With point-to-point messaging between vprocs, a vproc can send a message to another vproc on the same node or on a different node.
Point-to-Point Message on the Same Node
<v:shape type="#_x0000_t75" id="_x0000_i1036" alt="" style="WIDTH: 262.5pt; HEIGHT: 140.25pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/bynet_point1.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image014.gif"></v:imagedata></v:shape><o:p></o:p>
Point-to-Point Message on a Different Node
<v:shape type="#_x0000_t75" id="_x0000_i1037" alt="" style="WIDTH: 262.5pt; HEIGHT: 140.25pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/bynet_point2.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image015.gif"></v:imagedata></v:shape><o:p></o:p>
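The two-level routing described for point-to-point messages (BYNET hardware between nodes on an MPP system, then PDE and BYNET software within the receiving node) can be modeled with a small hypothetical function. The vproc names and node assignments are invented; this is a conceptual model, not an interface to the BYNET.

```python
from typing import Dict, List


def delivery_steps(sender: str, receiver: str, node_of: Dict[str, int]) -> List[str]:
    """List the hops a point-to-point message takes between two vprocs."""
    steps = []
    if node_of[sender] != node_of[receiver]:
        # Different nodes: the BYNET hardware first carries the message across nodes.
        steps.append(f"BYNET hardware: node {node_of[sender]} -> node {node_of[receiver]}")
    # Within the receiving node, PDE and BYNET software deliver it to the vproc.
    steps.append(f"PDE/BYNET software: deliver to {receiver} on node {node_of[receiver]}")
    return steps


# Hypothetical vproc placements.
mpp = {"PE1": 1, "AMP1": 1, "AMP2": 1, "AMP3": 2, "AMP4": 2}
smp = {"PE1": 1, "AMP1": 1, "AMP2": 1}

if __name__ == "__main__":
    for step in delivery_steps("PE1", "AMP3", mpp):
        print(step)
    # BYNET hardware: node 1 -> node 2
    # PDE/BYNET software: deliver to AMP3 on node 2
    print(delivery_steps("PE1", "AMP2", smp))
    # ['PDE/BYNET software: deliver to AMP2 on node 1']
```

On the single-node SMP placement the hardware hop never appears, which matches the statement that the first step is unnecessary there.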
Multicast Messages
A vproc can send a message to multiple vprocs using two steps:
<v:shape type="#_x0000_t75" id="_x0000_i1038" alt="" style="WIDTH: 262.5pt; HEIGHT: 140.25pt"><v:imagedata o:href="teradata考试资料/Basic/Basic/Untitled/bynet_multicast.gif" src="file:///C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\msohtml1\01\clip_image016.gif"></v:imagedata></v:shape><o:p></o:p>
Broadcast Messages
A vproc can send a message to all the vprocs in the system using two steps: