What’s New in Pentaho Data Integration 4.1

Last Modified on October 28, 2010
What’s New in Pentaho Data Integration Enterprise Edition 4.1
Copyright © 2010 Pentaho Corporation. Redistribution permitted. All trademarks are the property of
their respective owners.
For the latest information, please visit our web site at www.pentaho.com
Pentaho™ What’s New in Pentaho Data Integration Enterprise Edition 4.1
Contents
Purpose of This Document
Pentaho Data Integration Enterprise Edition 4.1
Pentaho Data Integration for Hadoop
Enhancements to Hops
Metadata Injection
General Steps and Job Entries
New Transformation Steps
New Job Entries
Purpose of This Document
This document introduces the new capabilities delivered in Pentaho Data Integration (PDI) 4.1. It is intended for
readers who already have a working familiarity with PDI; it is not a complete review of PDI’s functional
capabilities.
Pentaho Data Integration Enterprise Edition 4.1
This PDI release includes integration with Apache Hadoop, which makes it easy to use Hadoop to store
and process very large data sets; usability improvements for working with hops; first-ever support for
the concept of Metadata Injection; and a number of new general-purpose transformation steps and job
entries.


Pentaho Data Integration for Hadoop
More and more enterprises are turning to Hadoop to reduce costs and to improve their ability to extract
actionable business insight from the vast amounts of data collected throughout the enterprise.
Hadoop’s massively parallel processing, along with its ability to store extremely large amounts of data
reliably and at low cost, makes it an attractive option for building Business Intelligence (BI) solutions
for Big Data. However, Hadoop presents traditional BI and Data Integration users with several challenges:
a steep technical learning curve, a shortage of qualified technical staff, and a lack of appropriate
tools for performing data integration and business intelligence tasks with Hadoop.

Pentaho Data Integration Enterprise Edition 4.1 delivers comprehensive integration with Hadoop, lowering
the technical barriers to adopting Hadoop for Big Data projects. Using PDI’s easy-to-use graphical design
environment, ETL designers can now harness the power of Hadoop with zero Java development to address
common data integration use cases, including:
• Moving data files into and out of the Hadoop Distributed File System (HDFS)
• Reading data from and writing data to Hadoop using standard SQL statements
• Coordinating and executing Hadoop tasks as part of larger data integration and business
intelligence workflows
• Graphically designing new MapReduce jobs that take advantage of PDI’s vast library of prebuilt
mapping and data transformation steps
Pentaho for Hadoop simplifies the use of Hadoop for analytics, covering file input and output steps as
well as the management of Hadoop jobs.
Pentaho Data Integration Enterprise Edition 4.1 supports the latest releases of Apache Hadoop as well as
popular commercial distributions such as Cloudera Distribution for Hadoop and Amazon Elastic MapReduce.
For information and best practices on incorporating Hadoop into your Big Data architecture, visit
http://www.pentaho.com/hadoop/resources.php.
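The last use case in the list, graphical design of MapReduce jobs, relies on the MapReduce programming model. The Hadoop Transformation Job Executor runs Map/Reduce jobs built from PDI transformations. To make the model concrete, here is a minimal single-process Python sketch of the map, shuffle, and reduce phases for a word count. It uses neither Hadoop nor any PDI API. The function names `mapper`, `reducer`, and `run_map_reduce` are hypothetical, and the in-memory grouping only stands in for Hadoop’s distributed shuffle.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Reduce phase: combine all values emitted for one key."""
    return key, sum(values)


def run_map_reduce(lines: Iterable[str]) -> dict[str, int]:
    """In-memory stand-in for a Hadoop job: map, shuffle (group by key), reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: collect all values per key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    print(run_map_reduce(["big data", "Big Hadoop data"]))
```

In a PDI-designed job, the mapper and reducer would be transformations built from existing steps rather than hand-written functions. Hadoop would also distribute the shuffle across nodes instead of grouping in a single dictionary.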
Enhancements to Hops
Pentaho Data Integration 4.1 improves the handling of hops between steps and job entries. You can now enable
or disable all hops downstream from a given point, or all hops among the selected steps or job entries.
This makes it easier to debug a faulty step at the end of a transformation. You can also enable or disable
a hop simply by clicking it once. In addition, when a hop is split, its target and error handling
information is retained.
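Enabling or disabling "all hops downstream from a certain point" is a reachability walk over the graph of steps. The following Python sketch shows the idea under stated assumptions. A transformation is modeled as a list of `Hop` objects, and a breadth-first search from the chosen step flips every hop it reaches. `Hop` and `set_downstream_hops` are hypothetical names and do not reflect how PDI implements the feature internally.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Hop:
    """A directed link from one step to another, as drawn in a transformation."""
    source: str
    target: str
    enabled: bool = True


def set_downstream_hops(hops: list[Hop], start: str, enabled: bool) -> set[tuple[str, str]]:
    """Set `enabled` on every hop reachable from `start`; return the (source, target) pairs set."""
    # Index outgoing hops by their source step.
    outgoing: dict[str, list[Hop]] = {}
    for hop in hops:
        outgoing.setdefault(hop.source, []).append(hop)

    touched: set[tuple[str, str]] = set()
    visited = {start}
    queue = deque([start])
    # Breadth-first walk over the step graph, beginning at `start`.
    while queue:
        step = queue.popleft()
        for hop in outgoing.get(step, []):
            hop.enabled = enabled
            touched.add((hop.source, hop.target))
            if hop.target not in visited:  # the visited set also guards against cycles
                visited.add(hop.target)
                queue.append(hop.target)
    return touched


if __name__ == "__main__":
    hops = [Hop("Input", "Filter"), Hop("Filter", "Sort"), Hop("Sort", "Output"), Hop("Input", "Audit")]
    set_downstream_hops(hops, "Filter", enabled=False)
    print([(h.source, h.target, h.enabled) for h in hops])
```

Disabling downstream from `Filter` switches off `Filter -> Sort` and `Sort -> Output`. The upstream hop `Input -> Filter` and the side branch `Input -> Audit` stay enabled. This lets you isolate the tail of a transformation while debugging.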

Metadata Injection
Pentaho Data Integration 4.1 introduces, for the first time in data integration history, the concept of
Metadata Injection. Metadata Injection gives developers more flexibility to treat their ETL metadata as data.
The file layout and field selection are injected into a transformation template at the last minute, just
before it runs. Where patterns can be discovered in the data integration workload, this can drastically
reduce the number of transformations that have to be built. The feature is implemented as a Metadata
Injection step, which lets developers set step properties in a transformation dynamically. The step exposes
all the available properties of the step being configured and enables injection of file names, the removal
or renaming of fields, and other metadata properties.
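The idea of treating ETL metadata as data can be illustrated outside PDI. The following Python sketch assumes that a transformation template is modeled as a dictionary of step properties, some of them left open. `TEMPLATE`, `inject`, and `run` are hypothetical names, not the PDI Metadata Injection step or its API. The point is that one template serves many file layouts, with the layout (fields, removals, renames) supplied at run time.

```python
from typing import Any

# Hypothetical transformation template: properties of a file-input step that are
# left open (None) or defaulted, to be filled in by injection just before the run.
TEMPLATE: dict[str, Any] = {"file_name": None, "fields": None, "remove": [], "renames": {}}


def inject(template: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the template with the injected properties filled in."""
    unknown = set(metadata) - set(template)
    if unknown:
        raise KeyError(f"not a template property: {sorted(unknown)}")
    return {**template, **metadata}


def run(step: dict[str, Any], rows: list[list[str]]) -> list[dict[str, str]]:
    """Apply the injected layout to raw rows: name columns, drop fields, rename the rest."""
    # A real input step would read step["file_name"]; this sketch receives rows directly.
    out = []
    for row in rows:
        record = dict(zip(step["fields"], row))
        for name in step["remove"]:
            record.pop(name, None)
        out.append({step["renames"].get(k, k): v for k, v in record.items()})
    return out


if __name__ == "__main__":
    customers = inject(TEMPLATE, {
        "file_name": "customers.csv",
        "fields": ["id", "name", "tmp"],
        "remove": ["tmp"],
        "renames": {"name": "customer_name"},
    })
    print(run(customers, [["1", "Ann", "x"]]))
```

A second file with a different layout would reuse the same `TEMPLATE` with a different metadata dictionary rather than a new transformation. That reuse is the reduction in transformation count the paragraph above describes.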

General Steps and Job Entries
In addition to the Pentaho for Hadoop functionality, Pentaho Data Integration 4.1 includes a number of new
steps and job entries designed to increase developer productivity. These include a conditional blocking step,
JSON and YAML input steps, a string operations step, and a Write to File job entry. Below is a complete
list of the new transformation steps and job entries.
New Transformation Steps
Pentaho Data Integration 4.1 adds the following new transformation steps:
Step Name                            Description
Hadoop File Input                    Processes files from an HDFS or Amazon S3 location.
Hadoop File Output                   Creates files in an HDFS location.
Conditional Blocking Step            Blocks this step until other steps finish, allowing step logic that
                                     depends on the execution of other steps.
JSON Input Step                      Enables the JSON step to execute even if the defined path does not exist.
JSON Output Step                     Creates a JSON block and outputs it in a field or a file.
LDAP Output Step                     Performs Insert, Upsert, Update, Add, and Delete operations on records
                                     based on their DN.
YAML Input Step                      Reads information from a YAML file.
Email Messages Input                 Connects to a POP3/IMAP server and retrieves messages.
Generate Random Credit Card Number   Generates random valid credit card numbers.
String Operations Step               Performs string operations including trimming, padding,
                                     lowercase/uppercase, InitCap, escaping (XML, SQL, CDATA, HTML),
                                     extracting only digits, and removing special characters
                                     (CR, LF, Space, Tab).
S3 File Output                       Creates files in an S3 file location.
Run SSH Commands                     Runs SSH commands and returns the results.
Output steps metrics                 Returns metrics for one or more steps within a transformation.
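The table does not define what makes a generated credit card number "valid." Card numbers conventionally end in a Luhn (mod 10) check digit, so a reasonable assumption is that "valid" here means "passes the Luhn check." The following Python sketch generates such numbers. It is a hypothetical illustration, not the Generate Random Credit Card Number step’s actual implementation. The default prefix `"4"` and length 16 are arbitrary choices.

```python
import random
from typing import Optional


def luhn_check_digit(partial: str) -> int:
    """Return the Luhn (mod 10) check digit to append to a string of decimal digits."""
    total = 0
    # Walk right to left: the digit that will sit next to the check digit is doubled,
    # then every second digit after it.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return (10 - total % 10) % 10


def is_luhn_valid(number: str) -> bool:
    """True if `number` is all ASCII digits and ends in the correct Luhn check digit."""
    if len(number) < 2 or any(c not in "0123456789" for c in number):
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])


def random_card_number(prefix: str = "4", length: int = 16,
                       rng: Optional[random.Random] = None) -> str:
    """Generate a random number with the given prefix and length that passes the Luhn check."""
    rng = rng or random.Random()
    body = prefix + "".join(str(rng.randrange(10)) for _ in range(length - len(prefix) - 1))
    return body + str(luhn_check_digit(body))


if __name__ == "__main__":
    print(random_card_number(rng=random.Random(42)))
```

Passing a seeded `random.Random` makes the output reproducible, which is convenient when generated numbers feed test data. A Luhn-valid number is only structurally plausible; it does not correspond to a real account.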
New Job Entries
Pentaho Data Integration 4.1 adds the following new job entries:
Entry Name                           Description
Amazon EMR Job Executor              Executes Map/Reduce jobs in Amazon EMR.
Hadoop Copy Files                    Copies files to and from HDFS or Amazon S3.
Hadoop Job Executor                  Executes Map/Reduce jobs in Hadoop.
Hadoop Transformation Job Executor   Executes PDI transformation-based Map/Reduce jobs in Hadoop.
Write to File Job Entry              Writes data (static or held in variables) directly to a file at the
                                     job level.
