2010-09-03
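Prerequisite: Hadoop is already installed.
Environment:
1. cloudera-training-0.3.4 VMware image: everything is already set up for you!
2. Download MySQL 5.1 (?)
Getting started:
Chukwa is deployed in a distributed way, so configuration comes in two parts: one part on the client side and one part on the server side.
Here we use three virtual machines in total: two as agents and one as the collector.
1. Installing Chukwa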
1.1 First Steps
[Setup]:
$ export CHUKWA_HOME="/home/training/chukwa/chukwa-0.4.0"
or run vi /etc/profile and add:
export CHUKWA_HOME=/home/training/chukwa/chukwa-0.4.0
(then log in again or run source /etc/profile)
To check that it is set:
$ echo $CHUKWA_HOME
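For scripted setups, the echo check can also be done programmatically. This is a hypothetical Python helper (check_chukwa_home is not part of Chukwa). It assumes only that a usable CHUKWA_HOME is an existing directory with a conf/ subdirectory, which is where the files edited in the rest of this post live.

```python
import os
from pathlib import Path


def check_chukwa_home(env=None):
    """Return CHUKWA_HOME as a Path, raising RuntimeError if it looks wrong.

    Hypothetical helper, not part of Chukwa. Assumption: a usable
    CHUKWA_HOME is an existing directory with a conf/ subdirectory.
    """
    env = os.environ if env is None else env
    value = env.get("CHUKWA_HOME", "").strip()
    if not value:
        raise RuntimeError("CHUKWA_HOME is not set")
    home = Path(value)
    if not home.is_dir():
        raise RuntimeError(f"CHUKWA_HOME={value} is not a directory")
    if not (home / "conf").is_dir():
        raise RuntimeError(f"{value} has no conf/ subdirectory")
    return home
```

Called with no argument, it reads the current process environment. It will only see the variable if it was exported, or if /etc/profile was re-read, in the shell that launched Python.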
1.2 General Configuration
Agents and collectors are configured differently, but part of the process is common to both.
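Also in conf/chukwa-env.sh, set the variables CHUKWA_LOG_DIR and CHUKWA_PID_DIR. They hold the console logs and the PID files, respectively. (At first I had no idea what the PID files were, Orz. They are simply the files in which each daemon records its process ID.) Note that the PID directory must not be on a network file system; it has to be local to each Chukwa instance.
Their defaults are CHUKWA_LOG_DIR = /tmp/chukwa/log and CHUKWA_PID_DIR = /tmp/chukwa/pidDir. I simply created those directories at the corresponding paths.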
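Creating the two directories by hand works, but the step can also be scripted so it is safe to repeat on every node. This is a hypothetical Python sketch. It assumes the two variables are visible in the process environment, or are passed in as a dict, and falls back to the defaults quoted above. It does not check whether the PID directory sits on a network file system; that remains up to you.

```python
import os
from pathlib import Path

# Hypothetical helper. Defaults are the ones quoted above; in Chukwa the
# real values come from conf/chukwa-env.sh.
DEFAULT_DIRS = {
    "CHUKWA_LOG_DIR": "/tmp/chukwa/log",
    "CHUKWA_PID_DIR": "/tmp/chukwa/pidDir",
}


def ensure_chukwa_dirs(env=None):
    """Create the log and PID directories if missing; return {name: Path}."""
    env = os.environ if env is None else env
    dirs = {}
    for name, default in DEFAULT_DIRS.items():
        path = Path(env.get(name) or default)
        # parents=True, exist_ok=True: safe to re-run on every node.
        path.mkdir(parents=True, exist_ok=True)
        dirs[name] = path
    return dirs
```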
2. Configuring Agents on source nodes
Agents are the Chukwa processes that actually produce data. This section describes how to configure and run them. More details are available in the Agent configuration guide.
This section describes how to set up the agent process on the source nodes.
The one mandatory configuration step is to set up $CHUKWA_HOME/conf/collectors. This file should contain a list of hosts that will run Chukwa collectors. Agents will pick a random collector from this list to try sending to, and will fail-over to another listed collector on error. The file should look something like:
http://<collector1HostName>:<collector1Port>/
http://<collector2HostName>:<collector2Port>/
http://<collector3HostName>:<collector3Port>/
PS: the Chukwa collector listens on port 8080 by default.
The agent's control port is 9093 by default.
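To make the collectors file and the fail-over behavior described above concrete, here is a small Python sketch. parse_collectors and failover_order are hypothetical helpers, not Chukwa code. Accepting bare host[:port] entries and '#' comments, and filling in port 8080 when no port is given, are assumptions of this sketch. The "random first, then fail over" order only mimics what the documentation describes; the agent's actual Java logic may differ.

```python
import random
from urllib.parse import urlparse

# Hypothetical helpers, not Chukwa code. The default port comes from the
# note above that collectors listen on 8080 by default.
DEFAULT_COLLECTOR_PORT = 8080


def parse_collectors(text):
    """Parse conf/collectors content into normalized collector URLs.

    One entry per line; blank lines and '#' comments are skipped. Entries
    without a scheme are treated as host[:port] (an assumption).
    """
    urls = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        if "://" not in entry:
            entry = "http://" + entry
        parsed = urlparse(entry)
        port = parsed.port or DEFAULT_COLLECTOR_PORT
        urls.append(f"{parsed.scheme}://{parsed.hostname}:{port}/")
    return urls


def failover_order(collectors, rng=random):
    """Return collectors in the order an agent might try them.

    The first element is a random pick; the rest are fail-over candidates.
    """
    order = list(collectors)
    rng.shuffle(order)
    return order
```

Passing random.Random(seed) as rng makes the order reproducible, which is handy when testing a deployment script.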
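If there is no agents file in the conf directory, running stop-agents.sh or stop-all.sh prints "No such file or directory", so we need to create this file ourselves and list the agent machines in it. Since start-agents.sh and stop-agents.sh ssh to each entry, put one agent hostname or IP per line.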
Edit the CHUKWA_HOME/conf/initial_adaptors configuration file. This is where you tell Chukwa what log files to monitor. See the adaptor configuration guide for a list of available adaptors.
There are a number of optional settings in $CHUKWA_HOME/conf/chukwa-agent-conf.xml:
<property>
  <name>chukwaAgent.tags</name>
  <value>cluster="demo"</value>
  <description>The cluster's name for this agent</description>
</property>
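If you manage chukwa-agent-conf.xml from scripts, the property above can be set with Python's standard xml.etree.ElementTree. This sketch assumes the usual Hadoop-style layout, a <configuration> root holding <property> elements. make_property and set_cluster_tag are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical helpers; assumes a Hadoop-style <configuration> root element.
TAGS_NAME = "chukwaAgent.tags"


def make_property(name, value, description=None):
    """Build one <property> element with <name>, <value> and optional <description>."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    if description is not None:
        ET.SubElement(prop, "description").text = description
    return prop


def set_cluster_tag(conf_root, cluster):
    """Replace any existing chukwaAgent.tags property with cluster="<cluster>"."""
    # findall() returns a list, so removing while looping over it is safe.
    for prop in conf_root.findall("property"):
        if prop.findtext("name", "").strip() == TAGS_NAME:
            conf_root.remove(prop)
    conf_root.append(make_property(
        TAGS_NAME, f'cluster="{cluster}"', "The cluster's name for this agent"))
    return conf_root
```

Write the result back with ET.ElementTree(root).write(path). Note that ElementTree drops XML comments when parsing by default, so hand-written comments in the file would be lost.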
To run an agent process on a single node, use bin/chukwa agent.
In 0.4.0, to start an agent locally, run ./chukwa agent from the bin directory.
The other options are:
Usage: chukwa [--config confdir] COMMAND
where COMMAND is one of:
  agent        run a Chukwa Agent
  archive      run the Archive Manager
  collector    run a Chukwa Collector
  demux        run the Demux Manager
  dp           run the Post Demux data processors
  hicc         run a HICC Webserver
  droll        run a daily rolling job (deprecated)
  hroll        run a hourly rolling job (deprecated)
  # Daily rolling and hourly rolling will be deprecated by retention processor
  retention    run the Retention Processor
  version      print the version
Utilities:
  backfill     run a back fill data loader utility
  dumpArchive  view an archive file
  dumpRecord   view a record file
  tail         start tailing a file
Most command print help when invoked w/o parameters.
Typically, agents run as daemons. The script bin/start-agents.sh will ssh to each machine listed in conf/agents and start an agent, running in the background. The script bin/stop-agents.sh does the reverse.
You can, of course, use any other daemon-management system you like. For instance, tools/init.d includes init scripts for running Chukwa agents.
To check if an agent is working properly, you can telnet to the control port (9093 by default) and hit "enter". You will get a status message if the agent is running normally.
The message I received:
training-vm: Chukwa Agent running, version 0.4.0-dev, with 0 adaptors
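The telnet check can be automated. The Python sketch below sends a newline to the control port, as hitting Enter in telnet does, and parses a reply shaped like the line above. query_agent and parse_agent_status are hypothetical names. The regular expression is written only against that one sample line, so other agent versions may word the status differently.

```python
import re
import socket

# Hypothetical helpers; the pattern is based only on the sample status line
# "training-vm: Chukwa Agent running, version 0.4.0-dev, with 0 adaptors".
STATUS_RE = re.compile(
    r"Chukwa Agent running, version (?P<version>[^,\s]+), "
    r"with (?P<adaptors>\d+) adaptors?"
)


def parse_agent_status(line):
    """Return {'version': str, 'adaptors': int}, or None if the line doesn't match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return {"version": m.group("version"), "adaptors": int(m.group("adaptors"))}


def query_agent(host, port=9093, timeout=5.0):
    """Send a newline to the agent's control port and parse the first reply line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"\n")
        with sock.makefile("r", encoding="utf-8", errors="replace") as reply:
            return parse_agent_status(reply.readline())
```

query_agent raises OSError if nothing is listening on the port, and returns None if something answers but the reply does not look like an agent status line.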
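The scripts in the bin directory are explained below (taken from the official documentation; it seems to describe an older version, but it is still a useful reference):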
start-all.sh - runs start-collectors.sh, start-agents.sh, start-probes.sh, start-data-processors.sh
start-collectors.sh - start the chukwa collector daemon (jettyCollector.sh) on hosts listed in conf/collectors
stop-collectors.sh - stop the chukwa collector daemon (jettyCollector.sh) on hosts listed in conf/collectors
jettyCollector.sh - start the chukwa collector daemon on the current host
start-agents.sh - start chukwa agent daemon (agent.sh) on all hosts listed in conf/agents
stop-agents.sh - stop chukwa agent daemon (agent.sh) on all hosts listed in conf/agents
agent.sh - start the chukwa agent on the current host
start-probes.sh - runs, in this order, systemDataLoader.sh, torqueDataLoader.sh, nodeActivityDataLoader.sh
slaves.sh <command command_args ...> - run arbitrary commands on all hosts in conf/slaves
jettycollector.sh - start a jetty based version of the Chukwa collector
agent.sh - start the chukwa agent on the local machine
One of the key goals for Chukwa is to collect logs from Hadoop clusters. This section describes how to configure Hadoop to send its logs to Chukwa. Note that these directions require Hadoop 0.20.0+. Earlier versions of Hadoop do not have the hooks that Chukwa requires in order to grab MapReduce job logs.
The Hadoop configuration files are located in HADOOP_HOME/conf. To setup Chukwa to collect logs from Hadoop, you need to change some of the Hadoop configuration files.
3. Configuring Collectors
This section describes how to set up the Chukwa collectors. For more details, see the collector configuration guide.
First, edit $CHUKWA_HOME/conf/chukwa-env.sh. In addition to the general directions given above, you should set HADOOP_HOME. This should be the Hadoop deployment Chukwa will use to store collected data. You will get a version mismatch error if this is configured incorrectly.
Next, edit $CHUKWA_HOME/conf/chukwa-collector-conf.xml. The one mandatory configuration parameter is writer.hdfs.filesystem. This should be set to the HDFS root URL on which Chukwa will store data. Various optional configuration options are described in the collector configuration guide and in the collector configuration file itself.
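Since writer.hdfs.filesystem is the one mandatory collector setting, a quick sanity check before starting collectors can save a round trip. This is a hypothetical Python sketch using xml.etree.ElementTree. It assumes the Hadoop-style <configuration>/<property> layout and checks only that the value is present and uses the hdfs:// scheme, not that the NameNode is actually reachable.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical helpers; assumes the Hadoop-style <configuration>/<property> layout.


def get_property(conf_xml, name):
    """Return the stripped <value> text of the named property, or None if absent."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def check_hdfs_filesystem(conf_xml):
    """Ensure writer.hdfs.filesystem is set to an hdfs:// URL; return the value."""
    value = get_property(conf_xml, "writer.hdfs.filesystem")
    if not value:
        raise ValueError("writer.hdfs.filesystem is missing or empty")
    if urlparse(value).scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got {value!r}")
    return value
```

For example, a value such as hdfs://namenode:9000/ (a made-up host and port) passes, while an empty value or a file:// URL raises ValueError. To check the real file, pass it the contents of $CHUKWA_HOME/conf/chukwa-collector-conf.xml.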
To run a collector process on a single node, use bin/chukwa collector.
Typically, collectors run as daemons. The script bin/start-collectors.sh will ssh to each collector listed in conf/collectors and start a collector, running in the background. The script bin/stop-collectors.sh does the reverse.
You can, of course, use any other daemon-management system you like. For instance, tools/init.d includes init scripts for running Chukwa collectors.
To check if a collector is working properly, you can simply access http://collectorhost:collectorport/chukwa?ping=true with a web browser. If the collector is running, you should see a status page with a handful of statistics.
For example: http://10.224.172.100:8080/chukwa?ping=true
It returns:
Date:1283759102858
Now:1283759107788
numberHTTPConnection in time window:0
numberchunks in time window:0
lifetimechunks:0
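The ping page can also be checked from a script, for instance from a monitoring job. In this hypothetical Python sketch, ping_collector fetches the same URL with urllib and parse_ping turns the name:number pairs shown above into a dict. The parsing is written against that sample output only.

```python
import re
from urllib.request import urlopen

# Hypothetical helpers. "name:number" pairs where the name may contain
# spaces, e.g. "numberHTTPConnection in time window:0".
PAIR_RE = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*(\d+)")


def parse_ping(text):
    """Turn the collector's ping output into a {name: int} dict."""
    return {key.strip(): int(val) for key, val in PAIR_RE.findall(text)}


def ping_collector(host, port=8080, timeout=5.0):
    """Fetch http://host:port/chukwa?ping=true and parse the statistics."""
    url = f"http://{host}:{port}/chukwa?ping=true"
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return parse_ping(body)
```

A rising lifetimechunks value over successive pings is a simple sign that agents are actually delivering data.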
The Chukwa startup scripts are located in the CHUKWA_HOME/tools/init.d directory.
CHUKWA_HOME/tools/init.d/chukwa-data-processors start
In 0.4.0 this script has been moved into bin/ as start-data-processors.sh and stop-data-processors.sh; in any case, I could not find a tools/init.d directory.
CHUKWA_HOME/bin/downSampling.sh --config <path to chukwa conf> -n add
Set up and configure the MySQL database.
Download MySQL 5.1 from the MySQL site.
tar fxvz mysql-*.tar.gz -C $CHUKWA_HOME/opt
cd $CHUKWA_HOME/opt/mysql-*
Configure and then copy the my.cnf file to the CHUKWA_HOME/opt/mysql-* directory:
./scripts/mysql_install_db
./bin/mysqld_safe &
./bin/mysqladmin -u root create <clustername>
./bin/mysql -u root <clustername> < $CHUKWA_HOME/conf/database_create_table
Edit the CHUKWA_HOME/conf/jdbc.conf configuration file.
Set the clustername to the MySQL root URL:
<clustername>=jdbc:mysql://localhost:3306/<clustername>?user=root
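The jdbc.conf entry follows a simple <clustername>=<JDBC URL> pattern, so it is easy to generate or read back. This is a hypothetical Python sketch. The defaults (localhost, port 3306, user root) mirror the example line above, and treating lines that start with '#' as comments is an assumption.

```python
from urllib.parse import quote

# Hypothetical helpers for $CHUKWA_HOME/conf/jdbc.conf.


def jdbc_conf_line(cluster, host="localhost", port=3306, user="root"):
    """Build '<cluster>=jdbc:mysql://host:port/<cluster>?user=<user>'."""
    if not cluster or "=" in cluster:
        raise ValueError("cluster name must be non-empty and must not contain '='")
    return f"{cluster}=jdbc:mysql://{host}:{port}/{cluster}?user={quote(user)}"


def parse_jdbc_conf(text):
    """Parse jdbc.conf into {cluster: jdbc_url}; '#' lines are treated as comments."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only: the JDBC URL itself contains '='.
        name, url = line.split("=", 1)
        entries[name.strip()] = url.strip()
    return entries
```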
Download the MySQL Connector/J 5.1 from the MySQL site, and place the jar file in $CHUKWA_HOME/lib.
Start the MySQL shell:
mysql -u root -p
Enter password: [the password should be empty here]
From the MySQL shell, enter these commands (replace <username> and <password> with actual values):
GRANT REPLICATION SLAVE ON *.* TO '<username>'@'%' IDENTIFIED BY '<password>';
FLUSH PRIVILEGES;
The Hadoop Infrastructure Care Center (HICC) is the Chukwa web user interface. To set up HICC, do the following:
$CHUKWA_HOME/bin/hicc.sh start
[Problems encountered]
NoClassDefFoundError: org/apache/hadoop/metrics/Updater
training@training-vm:~/chukwa/chukwa-0.4.0/bin$ ./chukwa agent
training@training-vm:~/chukwa/chukwa-0.4.0/bin$ Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/metrics/Updater
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent.<clinit>(ChukwaAgent.java:63)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.metrics.Updater
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 13 more
Could not find the main class: org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent. Program will exit.
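If you hit this error, it is most likely because environment variables or settings are not configured correctly. The missing class, org.apache.hadoop.metrics.Updater, belongs to Hadoop itself rather than Chukwa, so the Hadoop jars are not on the agent's classpath. Check the settings in conf/chukwa-env.sh (HADOOP_HOME in particular) and run the command again; reconfiguring them fixed the problem for me.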
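One way to confirm the classpath diagnosis is to look for the missing class inside the jars you expect to be on it. This hypothetical Python sketch scans directories for .jar files containing a given class entry; you might point it at $HADOOP_HOME and $CHUKWA_HOME/lib. If no jar contains the class, or the jar that does is not on the classpath built by Chukwa's scripts, that would explain the error.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, search_dirs):
    """Return the jars under search_dirs that contain class_name.

    Hypothetical diagnostic helper. class_name uses dots, e.g.
    "org.apache.hadoop.metrics.Updater"; it is converted to the
    ".class" entry path stored inside a jar (jars are zip archives).
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for d in search_dirs:
        for jar in sorted(Path(d).rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(jar)
            except zipfile.BadZipFile:
                continue  # skip corrupt or non-zip files named *.jar
    return hits
```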
The Chukwa agent process name is identified by:
Command line to use to search for the process name:
Chukwa Collector name is identified by:
Chukwa Data Processors are identified by:
These processes run on a schedule, so they are not always visible in the process list.
On the slave server, at the MySQL prompt, run:
SHOW SLAVE STATUS\G
Make sure Slave_IO_Running and Slave_SQL_Running are both "Yes".
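Checking the two fields by eye is fine interactively; for a cron-style check, the \G output can be parsed instead. This hypothetical Python sketch assumes the text was captured from the mysql client (for example with mysql -e "SHOW SLAVE STATUS\G") and looks only at the "Field: value" lines.

```python
# Hypothetical helpers for checking MySQL replication from a script.


def parse_slave_status(text):
    r"""Parse `SHOW SLAVE STATUS\G` output into a {field: value} dict.

    The \G format prints one "Field: value" pair per line; the
    "*** 1. row ***" separator lines are skipped.
    """
    fields = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*") or ":" not in stripped:
            continue
        key, value = stripped.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def replication_ok(text):
    """True only when both replication threads report "Yes"."""
    fields = parse_slave_status(text)
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes")
```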
Things to check if MySQL replication fails:
To reset MySQL replication, run these commands on MySQL:
STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST='hostname',
    MASTER_USER='username',
    MASTER_PASSWORD='password',
    MASTER_PORT=3306,
    MASTER_LOG_FILE='master2-bin.001',
    MASTER_LOG_POS=4,
    MASTER_CONNECT_RETRY=10;
START SLAVE;
If anything is wrong, run /etc/init.d/chukwa-agent stop and CHUKWA_HOME/tools/init.d/chukwa-system-metrics stop to shut down Chukwa, then look at the agent.log and collector.log files to determine the problem.
The most common problem is that the log files grow without bound. Set up a cron job to remove old log files:
0 12 * * * CHUKWA_HOME/tools/expiration.sh 10 CHUKWA_HOME/var/log nowait
This will set up the log file expiration for CHUKWA_HOME/var/log for log files older than 10 days.
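For clarity, here is what that cron entry is described as doing, written as a hypothetical Python sketch. It is not a port of expiration.sh, whose other arguments such as nowait are not explained here. It simply deletes regular files under a directory whose modification time is older than the given number of days, with a dry_run option to preview what would go.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60


def expire_old_logs(log_dir, days, now=None, dry_run=False):
    """Delete regular files under log_dir whose mtime is older than `days` days.

    Hypothetical sketch, not expiration.sh. Returns the paths that were
    removed (or, with dry_run=True, that would be removed).
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    removed = []
    # Materialize the listing first so deleting files doesn't disturb the walk.
    for path in list(Path(log_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```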
If the system is not functioning properly and you cannot find an answer in the Administration Guide, send the Java process a SIGQUIT with the kill command below. The current state of the Java process (a thread dump) will be written to the log files, which you can analyze to determine the cause of the problem.
kill -3 <pid>