Source:
https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation
Download:
https://ccp.cloudera.com/display/SUPPORT/CDH+Downloads#CDHDownloads-CDH4PackagesandDownloads
https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs
http://archive.cloudera.com/redhat/cdh/
Upgrade:
https://ccp.cloudera.com/display/CDHDOC/Upgrading+CDH3#UpgradingCDH3-upgradehadoop
Contents
- Ways To Install CDH3
- Before You Begin Installing CDH Manually
- Installing CDH3 On Red Hat-compatible systems
- Installing CDH3 on Ubuntu and Debian Systems
- Installing CDH3 on SUSE Systems
- Installing CDH3 Components
- Viewing the Apache Hadoop Documentation
Ways To Install CDH3
You can install CDH3 in any of the following ways:
- Automated method using Cloudera Manager Free Edition; instructions here.
Cloudera Manager Free Edition automates the installation and configuration of CDH3 on an entire cluster (up to 50 nodes) if you have root SSH access to your cluster's machines. For other requirements and installation instructions, see the Cloudera Manager Free Edition Documentation.
- Manual methods described below:
- Download and install the CDH3 package
- Add the CDH3 repository
- Build your own CDH3 repository
- Install from a CDH3 tarball — If you want to download and install a tarball, see Downloads.
The following instructions describe downloading and installing a package, adding a repository, and building your own repository. If you use one of these methods rather than Cloudera Manager Free Edition, the first (downloading and installing a package) is recommended in most cases because it is simpler than building or adding a repository.
Before You Begin Installing CDH Manually
- This section contains instructions for new installations. If you need to upgrade from an earlier release, see Upgrading CDH3.
- For a list of supported operating systems, see Supported Operating Systems for CDH3.
Important: If you have not already done so, install the Oracle Java Development Kit (JDK). CDH3 requires the Oracle JDK 1.6, update 8 at a minimum, which you can download from the Java SE Downloads page. You may be able to install the Oracle JDK with your package manager, depending on your choice of operating system. For Oracle JDK installation instructions, see Java Development Kit Installation.
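The JDK requirement above can be checked before installing anything. The following is a minimal sketch, not an official Cloudera check: the function name jdk_version_ok is hypothetical, and it assumes that only 1.6.0_NN version strings should be accepted, since 1.6 is the only release line the text names.

```shell
#!/bin/sh
# Sketch: decide whether a JDK version string satisfies "1.6, update 8 or later".
# Assumption: only 1.6.0_NN strings are evaluated; anything else is rejected.
jdk_version_ok() {
    version=$1                         # e.g. 1.6.0_31
    case $version in
        1.6.0_*) ;;                    # the only release line the text names
        *) return 1 ;;
    esac
    update=${version#1.6.0_}           # keep the part after the underscore
    update=${update%%[!0-9]*}          # drop suffixes such as "-b04"
    update=${update#0}                 # strip one leading zero ("08" -> "8")
    [ "${update:-0}" -ge 8 ]
}

# Usage: extract the quoted version from `java -version` (printed on stderr).
# installed=$(java -version 2>&1 | sed -n 's/.*version "\(.*\)".*/\1/p')
# jdk_version_ok "$installed" && echo "JDK is new enough for CDH3"
```

The function returns a status instead of printing, so it can gate the later install commands in a script.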
Installing CDH3 On Red Hat-compatible systems
If you are installing CDH3 on a Red Hat system, you can download Cloudera packages using yum or your web browser.
Step 1: Download the CDH3 Repository or Package.
Use one of the following methods to download the CDH3 repository or package:
- Download and install the CDH3 Package or
- Add the CDH3 repository or
- Build a Yum Repository
To download and install the CDH3 Package:
- Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write access (it can be your home directory).
For OS Version | Click this Link
Red Hat/CentOS/Oracle 5 | Red Hat/CentOS/Oracle 5 link
Red Hat/CentOS 6 | Red Hat/CentOS 6 link
- Install the RPM:
$ sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm
Now continue with Step 2: Install CDH3.
To add the CDH3 repository:
Click the entry in the table below that matches your Red Hat or CentOS system, download the repo file, and save it in your /etc/yum.repos.d/ directory.
For OS Version | Click this Link
Red Hat/CentOS/Oracle 5 | Red Hat/CentOS/Oracle 5 link
Red Hat/CentOS 6 | Red Hat/CentOS 6 link
(To install a different version of CDH on a Red Hat system, open the repo file (for example, cloudera-cdh3.repo) and change the 3 in the repo file to the version number you want. For example, change the 3 to 3u0 to install CDH3 Update 0.)
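The version change described above can also be scripted. This is a rough sketch under an assumption: it supposes the version appears in the repo file as a "/cdh/3/" path segment, which the text does not confirm, so inspect your own repo file first. The function name set_cdh_version is hypothetical.

```shell
#!/bin/sh
# Sketch: rewrite the CDH version component in repo-file text read from stdin.
# Assumption: the version appears as a "/cdh/3/" path segment in the file.
set_cdh_version() {
    sed "s#/cdh/3/#/cdh/$1/#g"
}

# Usage (works from a backup copy so the original file is preserved):
# repo=/etc/yum.repos.d/cloudera-cdh3.repo
# sudo cp "$repo" "$repo.bak"
# set_cdh_version 3u0 < "$repo.bak" | sudo tee "$repo"
```

Lines that do not contain the assumed path segment pass through unchanged.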
Now continue with Step 2: Install CDH3.
To build a CDH3 repository:
If you want to create your own yum repository, download the appropriate repo file, create the repo, distribute the repo file and set up a web server, as described under Creating a Local Yum Repository.
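Once a local mirror is served over HTTP, each host needs a repo file that points at it. The sketch below prints such a stanza; the repo id cloudera-cdh3-local, the function name, and the example URL are all placeholders, not values from the Cloudera documentation.

```shell
#!/bin/sh
# Sketch: print a yum .repo stanza for a locally hosted CDH3 mirror.
# The repo id and the URL passed in are hypothetical placeholders.
local_repo_stanza() {
    base_url=$1
    cat <<EOF
[cloudera-cdh3-local]
name=Local mirror of Cloudera CDH3
baseurl=$base_url
gpgcheck=0
enabled=1
EOF
}

# Usage (the host name is an example only):
# local_repo_stanza http://repo.example.com/cdh/3/ | sudo tee /etc/yum.repos.d/cloudera-cdh3-local.repo
```

gpgcheck is set to 0 here only to keep the sketch short; set it to 1 after importing the Cloudera key if you want signature checks.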
Now continue with Step 2: Install CDH3.
Step 2: Install CDH3 on all hosts
To install CDH3 on a Red Hat system:
Before installing: (Optionally) add a repository key. Add the Cloudera Public GPG Key to your repository by executing one of the following commands:
For Red Hat/CentOS/Oracle 5 systems:
$ sudo rpm --import http://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera
For Red Hat/CentOS 6 systems:
$ sudo rpm --import http://archive.cloudera.com/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
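When the same script has to run on both release families, the key URL can be selected by major release number. This is a small sketch using the two URLs quoted above; the function name cloudera_key_url is hypothetical, and any other release is rejected rather than guessed.

```shell
#!/bin/sh
# Sketch: pick the Cloudera GPG key URL for a Red Hat-compatible major release.
# Both URLs are the ones quoted in the text; other releases are rejected.
cloudera_key_url() {
    case $1 in
        5) echo "http://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera" ;;
        6) echo "http://archive.cloudera.com/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera" ;;
        *) echo "unsupported release: $1" >&2; return 1 ;;
    esac
}

# Usage:
# sudo rpm --import "$(cloudera_key_url 6)"
```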
- Find and install the Hadoop core and native packages. For example:
$ yum search hadoop
$ sudo yum install hadoop-0.20 hadoop-0.20-native
Note: In prior versions of CDH, the hadoop-0.20 package contained all of the service scripts in /etc/init.d/. In CDH3, the hadoop-0.20 package does not contain any service scripts; instead, those scripts are contained in the hadoop-0.20-<daemon> packages. Perform the following step to install the daemon packages.
- Install each type of daemon package on the appropriate machine. For example, install the NameNode package on your NameNode machine:
$ sudo yum install hadoop-0.20-<daemon type>
where <daemon type> is one of the following:
namenode
datanode
secondarynamenode
jobtracker
tasktracker
- Install the CDH component(s) that you want to use. See Installing CDH3 Components.
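For the daemon package step above, a small helper can guard against typos in the daemon type before yum is invoked. This is a sketch only; the function name cdh_daemon_package is hypothetical, and the accepted types are the five listed above.

```shell
#!/bin/sh
# Sketch: map a daemon type to its CDH3 package name, rejecting unknown types.
cdh_daemon_package() {
    case $1 in
        namenode|datanode|secondarynamenode|jobtracker|tasktracker)
            echo "hadoop-0.20-$1" ;;
        *)
            echo "unknown daemon type: $1" >&2
            return 1 ;;
    esac
}

# Usage on a host that should run the NameNode:
# pkg=$(cdh_daemon_package namenode) && sudo yum install "$pkg"
```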
Installing CDH3 on Ubuntu and Debian Systems
If you are installing CDH3 on a Debian system, you can download the Cloudera packages using apt or your web browser.
Step 1: Download the CDH3 Repository or Package.
Use one of the following methods to download the CDH3 repository or package:
- Download and install the CDH3 Package or
- Add the CDH3 repository or
- Build a Debian Repository
To download and install the CDH3 package:
- Click one of the following:
this link for a Squeeze system, or
this link for a Lenny system, or
this link for a Lucid system, or
this link for a Maverick system.
- Install the package. Do one of the following:
Choose Open with in the download window to use the package manager, or
Choose Save File, save the package to a directory to which you have write access (it can be your home directory) and install it from the command line, for example:
$ sudo dpkg -i Downloads/cdh3-repository_1.0_all.deb
Now continue with Step 2: Install CDH3.
To add the CDH3 repository:
- Create a new file /etc/apt/sources.list.d/cloudera.list with the following contents:
deb http://archive.cloudera.com/debian <RELEASE>-cdh3 contrib
deb-src http://archive.cloudera.com/debian <RELEASE>-cdh3 contrib
where:
<RELEASE> is the codename of your distribution, which you can find by running lsb_release -c. For example, to install CDH3 for Ubuntu Lucid, use lucid-cdh3 in the lines above.
(To install a different version of CDH on a Debian system, specify the version number you want in the <RELEASE>-cdh3 section of the deb lines. For example, to install CDH3 Update 0 for Ubuntu Maverick, use maverick-cdh3u0 in the lines above.)
- (Optionally) add a repository key. Add the Cloudera Public GPG Key to your repository by executing the following command:
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
This key enables you to verify that you are downloading genuine packages.
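The cloudera.list contents described above can be generated from the release codename instead of typed by hand. A minimal sketch follows; the function name cloudera_sources_list and its optional second argument are illustrative choices, while the two output lines follow the template given in the text.

```shell
#!/bin/sh
# Sketch: print the two cloudera.list lines for a release codename.
# $1 = codename (e.g. lucid), $2 = optional CDH version suffix (default cdh3).
cloudera_sources_list() {
    release=$1
    cdh=${2:-cdh3}
    echo "deb http://archive.cloudera.com/debian ${release}-${cdh} contrib"
    echo "deb-src http://archive.cloudera.com/debian ${release}-${cdh} contrib"
}

# Usage (lsb_release -c -s prints only the codename):
# cloudera_sources_list "$(lsb_release -c -s)" | sudo tee /etc/apt/sources.list.d/cloudera.list
```

Passing cdh3u0 as the second argument produces the maverick-cdh3u0 style entry mentioned in the text.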
Now continue with Step 2: Install CDH3.
To build a CDH3 repository:
If you want to create your own apt repository, create a mirror of the CDH Debian directory and then create an apt repository from the mirror.
Now continue with Step 2: Install CDH3.
Step 2: Install CDH3 on all hosts
To install CDH3 on a Debian system:
- Update the APT package index:
$ sudo apt-get update
- Find and install the Hadoop core and native packages by using your favorite APT package manager, such as apt-get, aptitude, or dselect. For example:
$ apt-cache search hadoop
$ sudo apt-get install hadoop-0.20 hadoop-0.20-native
Note: In prior versions of CDH, the hadoop-0.20 package contained all of the service scripts in /etc/init.d/. In CDH3, the hadoop-0.20 package does not contain any service scripts; instead, those scripts are contained in the hadoop-0.20-<daemon> packages. Perform the following step to install the daemon packages.
- Install each type of daemon package on the appropriate machine. For example, install the NameNode package on your NameNode machine:
$ sudo apt-get install hadoop-0.20-<daemon type>
where <daemon type> is one of the following:
namenode
datanode
secondarynamenode
jobtracker
tasktracker
- Install the CDH component(s) that you want to use. See Installing CDH3 Components.
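Because each daemon package goes on a specific machine, it can help to write the host-to-role assignment down once and derive the per-host commands from it. The sketch below only prints commands and runs nothing; the input format, the host names, and the function name print_install_plan are hypothetical.

```shell
#!/bin/sh
# Sketch: turn "host role [role...]" lines on stdin into apt-get commands.
# Host names and the input format are hypothetical; nothing is executed.
print_install_plan() {
    while read -r host roles; do
        if [ -z "$host" ]; then
            continue                       # skip blank lines
        fi
        pkgs="hadoop-0.20 hadoop-0.20-native"
        for role in $roles; do
            pkgs="$pkgs hadoop-0.20-$role"
        done
        echo "$host: sudo apt-get install $pkgs"
    done
}

# Usage:
# print_install_plan <<'EOF'
# master1 namenode jobtracker
# worker1 datanode tasktracker
# EOF
```

Each printed line can be reviewed and then run on the named host.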
Installing CDH3 on SUSE Systems
If you are installing CDH3 on a SUSE system, you can download the Cloudera packages using zypper, YaST, or your web browser.
Step 1: Download the CDH3 Repository or Package.
Use one of the following methods to download the CDH3 repository or package:
- Download and install the CDH3 Package or
- Add the CDH3 repository or
- Build a SUSE Repository
To download and install the CDH3 package:
- Click this link, choose Save File, and save it to a directory to which you have write access (it can be your home directory).
- Install the RPM:
$ sudo rpm -i cdh3-repository-1.0-1.noarch.rpm
Now continue with Step 2: Install CDH3.
To add the CDH3 repository:
- Run the following command:
$ sudo zypper addrepo -f http://archive.cloudera.com/sles/11/x86_64/cdh/cloudera-cdh3.repo
(To install a different version of CDH on a SUSE system, replace the CDH version number in the command above with the one you want. For example, change the 3 to 3u0 to install CDH3 Update 0.)
- Update your system package index by running:
$ sudo zypper refresh
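The version substitution described for the addrepo command can be expressed as a small helper. This sketch follows the text literally, replacing the 3 in cloudera-cdh3.repo with the requested version. Whether a repo file exists on the archive for a given version is not confirmed here, and the function name cdh_sles_repo_url is hypothetical.

```shell
#!/bin/sh
# Sketch: build the .repo URL for a given CDH version on SLES 11.
# Follows the text literally: the "3" in cloudera-cdh3.repo becomes the version.
cdh_sles_repo_url() {
    version=${1:-3}
    echo "http://archive.cloudera.com/sles/11/x86_64/cdh/cloudera-cdh${version}.repo"
}

# Usage:
# sudo zypper addrepo -f "$(cdh_sles_repo_url 3u0)"
# sudo zypper refresh
```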
Now continue with Step 2: Install CDH3.
To build a CDH3 repository:
If you want to create your own SUSE repository, create a mirror of the CDH SUSE directory by following these instructions, which explain how to create a SUSE repository from the mirror.
Now continue with Step 2: Install CDH3.
Step 2: Install CDH3 on all hosts
To install CDH3 on a SUSE system:
- (Optionally) add a repository key. Add the Cloudera Public GPG Key to your repository by executing the following command:
$ sudo rpm --import http://archive.cloudera.com/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
- Find and install the Hadoop core and native packages. For example:
$ sudo zypper search hadoop
$ sudo zypper install hadoop-0.20 hadoop-0.20-native
- Install each type of daemon package on the appropriate machine. For example, install the NameNode package on your NameNode machine:
$ sudo zypper install hadoop-0.20-<daemon type>
where <daemon type> is one of the following:
namenode
datanode
secondarynamenode
jobtracker
tasktracker
- Install the CDH component(s) that you want to use. See Installing CDH3 Components.
Installing CDH3 Components
CDH3 includes several components that you can install and use with Apache Hadoop:
- Flume — A distributed, reliable, and available service for efficiently moving large amounts of data as the data is produced. This release provides a scalable conduit to shipping data around a cluster and concentrates on reliable logging. The primary use case is as a logging system that gathers a set of log files on every machine in a cluster and aggregates them to a centralized persistent store such as HDFS. As of CDH3 Update 4, you can install either Flume 0.9.x or Flume 1.x (but not both on the same host).
- Sqoop — A tool that imports data from relational databases into Hadoop clusters. Using JDBC to interface with databases, Sqoop imports the contents of tables into a Hadoop Distributed File System (HDFS) and generates Java classes that enable users to interpret the table's schema. Sqoop can also export records from HDFS to a relational database.
- Hue — A graphical user interface to work with CDH. Hue aggregates several applications which are collected into a desktop-like environment and delivered as a Web application that requires no client installation by individual users.
- Pig — Enables you to analyze large amounts of data using Pig's query language called Pig Latin. Pig Latin queries run in a distributed way on a Hadoop cluster.
- Hive — A powerful data warehousing application built on top of Hadoop which enables you to access your data using Hive QL, a language that is similar to SQL.
- HBase — Provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS). Cloudera recommends installing HBase in a standalone mode before you try to run it on a whole cluster.
- ZooKeeper — A highly reliable and available service that provides coordination between distributed processes.
- Oozie — A server-based workflow engine specialized in running workflow jobs with actions that execute Hadoop jobs. A command line client is also available that allows remote administration and management of workflows within the Oozie server.
- Whirr — Provides a fast way to run cloud services.
- Snappy — A compression/decompression library. You do not need to install Snappy if you are already using the native library, but you do need to configure it; see Snappy Installation for more information.
- Mahout — A machine-learning tool. By enabling you to build machine-learning libraries that are scalable to "reasonably large" datasets, it aims to make building intelligent applications easier and faster.
To install the CDH3 components, follow the instructions in the following sections:
- Flume. See Flume 0.9.x Installation or Flume 1.x Installation.
- Sqoop. See Sqoop Installation.
- Hue. For more information, see Hue Installation.
- Pig. See Pig Installation.
- Oozie. For more information, see Oozie Installation.
- Hive. See Hive Installation.
- HBase. For more information, see HBase Installation.
- ZooKeeper. For more information, see ZooKeeper Installation.
- Whirr. See Whirr Installation.
- Snappy. For more information, see Snappy Installation.
- Mahout. See Mahout Installation.
Viewing the Apache Hadoop Documentation
For additional Apache Hadoop documentation, see http://archive.cloudera.com/cdh/3/hadoop/.