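Today, at the request of a business team, I needed to count how many records for a given URL appear in the raw logs on HDFS. For convenience, I used the grep job that ships in hadoop-examples-*.jar directly.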
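As background for the two jobs that appear in the log below, here is a minimal Python sketch of what the example grep job does as I understand it: a first pass counts every substring matching a regular expression, and a second pass sorts those counts in descending order. The sample lines and the `www.example.cn` pattern are made-up stand-ins, not the real data. Because the pattern is a regular expression, an unescaped `.` matches any character.

```python
import re
from collections import Counter


def grep_count(lines, pattern):
    """Pass 1: count every substring that matches the regular expression."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        for match in regex.finditer(line):
            counts[match.group(0)] += 1
    return counts


def sort_by_count(counts):
    """Pass 2: turn the counts into (count, match) pairs, largest count first."""
    return sorted(((n, text) for text, n in counts.items()), reverse=True)


# Hypothetical sample lines; the real input is the raw log on HDFS.
sample = [
    "GET http://www.example.cn/index.html",
    "GET http://www.example.cn/about.html",
    "GET http://wwwXexample.cn/",      # also matches: '.' is a wildcard
    "GET http://other.example.org/",   # no match
]

result = sort_by_count(grep_count(sample, r"www.example.cn"))
print(result)
```

To count only the literal host name, the dots would need to be escaped, for example `www\.example\.cn`.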
Submitting the job
- [root@localhost yinjie]>hadoop jar $HADOOP_HOME/hadoop-examples-*.jar grep -Dmapred.job.queue.name=cp_normal_job_queue /group/*****/2011-08-12/00 /group/*****/grep/2011-08-12/00 'www.****.cn'
- 11/08/31 17:12:39 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140330 for yinjie
- 11/08/31 17:12:39 INFO security.TokenCache: Got dt for hdfs://*****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24681;uri=****:8020;t.service=****:8020
- 11/08/31 17:12:39 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
- 11/08/31 17:12:39 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2ad6654f3e9cad97d13f716e51a0509253c0aabb]
- 11/08/31 17:12:39 INFO mapred.FileInputFormat: Total input paths to process : 22
- 11/08/31 17:12:40 INFO mapred.JobClient: Running job: job_201108241351_24681
- 11/08/31 17:12:41 INFO mapred.JobClient: map 0% reduce 0%
- 11/08/31 17:12:50 INFO mapred.JobClient: map 4% reduce 0%
- 11/08/31 17:12:51 INFO mapred.JobClient: map 52% reduce 0%
- 11/08/31 17:12:52 INFO mapred.JobClient: map 60% reduce 0%
- 11/08/31 17:12:53 INFO mapred.JobClient: map 69% reduce 0%
- 11/08/31 17:12:54 INFO mapred.JobClient: map 79% reduce 0%
- 11/08/31 17:12:55 INFO mapred.JobClient: map 84% reduce 0%
- 11/08/31 17:12:56 INFO mapred.JobClient: map 90% reduce 0%
- 11/08/31 17:12:57 INFO mapred.JobClient: map 93% reduce 0%
- 11/08/31 17:12:58 INFO mapred.JobClient: map 95% reduce 27%
- 11/08/31 17:12:59 INFO mapred.JobClient: map 97% reduce 27%
- 11/08/31 17:13:01 INFO mapred.JobClient: map 98% reduce 27%
- 11/08/31 17:13:05 INFO mapred.JobClient: map 99% reduce 27%
- 11/08/31 17:13:07 INFO mapred.JobClient: map 99% reduce 32%
- 11/08/31 17:13:09 INFO mapred.JobClient: map 100% reduce 32%
- 11/08/31 17:13:14 INFO mapred.JobClient: map 100% reduce 100%
- 11/08/31 17:13:15 INFO mapred.JobClient: Job complete: job_201108241351_24681
- 11/08/31 17:13:15 INFO mapred.JobClient: Counters: 24
- 11/08/31 17:13:15 INFO mapred.JobClient: Job Counters
- 11/08/31 17:13:15 INFO mapred.JobClient: Launched reduce tasks=1
- 11/08/31 17:13:15 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1542961
- 11/08/31 17:13:15 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 11/08/31 17:13:15 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 11/08/31 17:13:15 INFO mapred.JobClient: Rack-local map tasks=44
- 11/08/31 17:13:15 INFO mapred.JobClient: Launched map tasks=242
- 11/08/31 17:13:15 INFO mapred.JobClient: Data-local map tasks=198
- 11/08/31 17:13:15 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=23291
- 11/08/31 17:13:15 INFO mapred.JobClient: FileSystemCounters
- 11/08/31 17:13:15 INFO mapred.JobClient: FILE_BYTES_READ=3724
- 11/08/31 17:13:15 INFO mapred.JobClient: HDFS_BYTES_READ=32281139322
- 11/08/31 17:13:15 INFO mapred.JobClient: FILE_BYTES_WRITTEN=14502646
- 11/08/31 17:13:15 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=118
- 11/08/31 17:13:15 INFO mapred.JobClient: Map-Reduce Framework
- 11/08/31 17:13:15 INFO mapred.JobClient: Reduce input groups=1
- 11/08/31 17:13:15 INFO mapred.JobClient: Combine output records=143
- 11/08/31 17:13:15 INFO mapred.JobClient: Map input records=37526374
- 11/08/31 17:13:15 INFO mapred.JobClient: Reduce shuffle bytes=5164
- 11/08/31 17:13:15 INFO mapred.JobClient: Reduce output records=1
- 11/08/31 17:13:15 INFO mapred.JobClient: Spilled Records=286
- 11/08/31 17:13:15 INFO mapred.JobClient: Map output bytes=786984
- 11/08/31 17:13:15 INFO mapred.JobClient: Map input bytes=32280203347
- 11/08/31 17:13:15 INFO mapred.JobClient: Combine input records=32791
- 11/08/31 17:13:15 INFO mapred.JobClient: Map output records=32791
- 11/08/31 17:13:15 INFO mapred.JobClient: SPLIT_RAW_BYTES=38731
- 11/08/31 17:13:15 INFO mapred.JobClient: Reduce input records=143
- 11/08/31 17:13:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
- 11/08/31 17:13:15 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140331 for yinjie
- 11/08/31 17:13:15 INFO security.TokenCache: Got dt for hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24682;uri=****:8020;t.service=****:8020
- 11/08/31 17:13:15 INFO mapred.FileInputFormat: Total input paths to process : 1
- 11/08/31 17:13:15 INFO mapred.JobClient: Cleaning up the staging area hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24682
- org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
- at org.apache.hadoop.mapred.QueueManager.getQueueACL(QueueManager.java:382)
- at org.apache.hadoop.mapred.JobTracker.getQueueAdmins(JobTracker.java:4422)
- at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
- at java.lang.reflect.Method.invoke(Method.java:597)
- at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
- at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
- at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
- at java.security.AccessController.doPrivileged(Native Method)
- at javax.security.auth.Subject.doAs(Subject.java:396)
- at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
- at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
- at org.apache.hadoop.ipc.Client.call(Client.java:1107)
- at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
- at org.apache.hadoop.mapred.$Proxy6.getQueueAdmins(Unknown Source)
- at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:886)
- at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
- at java.security.AccessController.doPrivileged(Native Method)
- at javax.security.auth.Subject.doAs(Subject.java:396)
- at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
- at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
- at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
- at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1242)
- at org.apache.hadoop.examples.Grep.run(Grep.java:84)
- at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
- at org.apache.hadoop.examples.Grep.main(Grep.java:93)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
- at java.lang.reflect.Method.invoke(Method.java:597)
- at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
- at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
- at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
- at java.lang.reflect.Method.invoke(Method.java:597)
- at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
- [root@localhost yinjie]>
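Something went wrong, and in an odd way: the first job ran successfully, but the second one failed. The NullPointerException is thrown from QueueManager.getQueueACL, so it looks like a queue access-control problem. The submission specified the
-Dmapred.job.queue.name=cp_normal_job_queue parameter, so I suspected that the first job ran with this parameter while the second job did not, which caused the failure.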
The first step was to look at the configuration under $HADOOP_HOME/conf:
- [root@localhost yinjie]>cat $HADOOP_HOME/conf/mapred-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <name>mapred.job.queue.name</name>
- <value>cp_admin_job_queue</value>
- <description> Queue to which a job is submitted. This must match one of the
- queues defined in mapred.queue.names for the system. Also, the ACL setup
- for the queue must allow the current user to submit a job to the queue.
- Before specifying a queue, ensure that the system is configured with
- the queue, and access is allowed for submitting jobs to the queue.
- </description>
- </property>
- ....
- ....
- ....
- </configuration>
The configured value of mapred.job.queue.name turned out to be cp_admin_job_queue, not the cp_normal_job_queue specified at submission. Could the second job have used cp_admin_job_queue and failed because of it?
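To make the suspicion concrete, here is a toy Python model of it. This is not the Grep source, which I have not checked here; `Conf` and `queues_for_two_jobs` are hypothetical names used only for illustration. The assumption being modeled is that the first job's configuration inherits the command-line `-D` override, while the second job is built from a fresh configuration that only sees the site file.

```python
QUEUE_KEY = "mapred.job.queue.name"


class Conf:
    """Toy stand-in for a job configuration: site defaults plus overrides."""

    def __init__(self, site_defaults, parent=None):
        # Every configuration starts from the values in the site file.
        self.values = dict(site_defaults)
        # A configuration built from a parent also copies the parent's overrides.
        if parent is not None:
            self.values.update(parent.values)

    def set(self, key, value):
        self.values[key] = value

    def get(self, key):
        return self.values.get(key)


def queues_for_two_jobs(site_defaults, cli_overrides, second_inherits):
    """Return the queue names seen by the first and the second job."""
    tool_conf = Conf(site_defaults)
    for key, value in cli_overrides.items():  # what -Dkey=value would do
        tool_conf.set(key, value)
    first_job = Conf(site_defaults, parent=tool_conf)
    second_job = Conf(site_defaults, parent=tool_conf if second_inherits else None)
    return first_job.get(QUEUE_KEY), second_job.get(QUEUE_KEY)


site = {QUEUE_KEY: "cp_admin_job_queue"}   # value found in mapred-site.xml
cli = {QUEUE_KEY: "cp_normal_job_queue"}   # value passed with -D

# Suspected behavior: the second job ignores the -D override.
suspected = queues_for_two_jobs(site, cli, second_inherits=False)

# Same behavior, but with the site default changed to the normal queue.
patched_site = queues_for_two_jobs(
    {QUEUE_KEY: "cp_normal_job_queue"}, cli, second_inherits=False
)
print(suspected, patched_site)
```

Under this model, changing the default in the site file is enough to put both jobs on the same queue, even if the second job never sees the `-D` value.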
Just to try it out, I copied the $HADOOP_HOME/conf configuration directory into the current user's directory:
- [root@localhost yinjie]>cp -rf $HADOOP_HOME/conf ./
- ....
- [root@localhost yinjie/conf]>ls
- allslaves configuration.xsl fair-scheduler.xml hadoop-metrics.properties hdfs-site.xml mapred-queue-acls.xml masters ssl-client.xml.example
- capacity-scheduler.xml core-site.xml hadoop-env.sh hadoop-policy.xml log4j.properties mapred-site.xml slaves ssl-server.xml.example
- [root@localhost yinjie/conf]>
- [root@localhost yinjie/conf]>vi mapred-site.xml
Edit mapred-site.xml, change mapred.job.queue.name to cp_normal_job_queue, and save.
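The same edit can be scripted instead of done by hand in vi. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. `set_property` is a hypothetical helper of my own, and the demo runs against a throwaway file rather than the real configuration. One caveat: ElementTree discards comments and the `xml-stylesheet` processing instruction when it rewrites the file, so this is only suitable for a private copy of the configuration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def set_property(xml_path, name, value):
    """Set <value> for the <property> whose <name> matches; return True if found."""
    tree = ET.parse(xml_path)
    changed = False
    for prop in tree.getroot().iter("property"):
        if prop.findtext("name", "").strip() == name:
            value_node = prop.find("value")
            if value_node is None:
                # The property exists but has no <value> yet; create one.
                value_node = ET.SubElement(prop, "value")
            value_node.text = value
            changed = True
    if changed:
        tree.write(xml_path, encoding="utf-8", xml_declaration=True)
    return changed


# Demo on a temporary file that mimics the relevant part of mapred-site.xml.
sample = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.queue.name</name>
    <value>cp_admin_job_queue</value>
  </property>
</configuration>
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mapred-site.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(sample)
    found = set_property(path, "mapred.job.queue.name", "cp_normal_job_queue")
    new_value = ET.parse(path).getroot().findtext("property/value")

print(found, new_value)
```

Against the copied directory, the call would look like `set_property("/home/yinjie/conf/mapred-site.xml", "mapred.job.queue.name", "cp_normal_job_queue")`.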
Submit the job again, this time using the --config option to point at the modified configuration directory:
- [root@localhost yinjie]>hadoop --config /home/yinjie/conf jar $HADOOP_HOME/hadoop-examples-*.jar grep -Dmapred.job.queue.name=cp_normal_job_queue /group/*****/2011-08-12/01 /group/*****/grep/2011-08-12/01 'www.****.cn'
- 11/08/31 17:25:19 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140356 for yinjie
- 11/08/31 17:25:19 INFO security.TokenCache: Got dt for hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24719;uri=****:8020;t.service=****:8020
- 11/08/31 17:25:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
- 11/08/31 17:25:19 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2ad6654f3e9cad97d13f716e51a0509253c0aabb]
- 11/08/31 17:25:19 INFO mapred.FileInputFormat: Total input paths to process : 22
- 11/08/31 17:25:19 INFO mapred.JobClient: Running job: job_201108241351_24719
- 11/08/31 17:25:20 INFO mapred.JobClient: map 0% reduce 0%
- 11/08/31 17:25:30 INFO mapred.JobClient: map 4% reduce 0%
- 11/08/31 17:25:31 INFO mapred.JobClient: map 14% reduce 0%
- 11/08/31 17:25:32 INFO mapred.JobClient: map 51% reduce 0%
- 11/08/31 17:25:33 INFO mapred.JobClient: map 63% reduce 0%
- 11/08/31 17:25:34 INFO mapred.JobClient: map 68% reduce 0%
- 11/08/31 17:25:35 INFO mapred.JobClient: map 77% reduce 0%
- 11/08/31 17:25:36 INFO mapred.JobClient: map 87% reduce 0%
- 11/08/31 17:25:37 INFO mapred.JobClient: map 93% reduce 0%
- 11/08/31 17:25:38 INFO mapred.JobClient: map 96% reduce 0%
- 11/08/31 17:25:39 INFO mapred.JobClient: map 97% reduce 0%
- 11/08/31 17:25:40 INFO mapred.JobClient: map 98% reduce 0%
- 11/08/31 17:25:42 INFO mapred.JobClient: map 99% reduce 31%
- 11/08/31 17:25:48 INFO mapred.JobClient: map 100% reduce 31%
- 11/08/31 17:25:51 INFO mapred.JobClient: map 100% reduce 33%
- 11/08/31 17:25:53 INFO mapred.JobClient: map 100% reduce 100%
- 11/08/31 17:25:53 INFO mapred.JobClient: Job complete: job_201108241351_24719
- 11/08/31 17:25:53 INFO mapred.JobClient: Counters: 24
- 11/08/31 17:25:53 INFO mapred.JobClient: Job Counters
- 11/08/31 17:25:53 INFO mapred.JobClient: Launched reduce tasks=1
- 11/08/31 17:25:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1025313
- 11/08/31 17:25:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 11/08/31 17:25:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 11/08/31 17:25:53 INFO mapred.JobClient: Rack-local map tasks=26
- 11/08/31 17:25:53 INFO mapred.JobClient: Launched map tasks=176
- 11/08/31 17:25:53 INFO mapred.JobClient: Data-local map tasks=150
- 11/08/31 17:25:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18297
- 11/08/31 17:25:53 INFO mapred.JobClient: FileSystemCounters
- 11/08/31 17:25:53 INFO mapred.JobClient: FILE_BYTES_READ=2580
- 11/08/31 17:25:53 INFO mapred.JobClient: HDFS_BYTES_READ=22352133231
- 11/08/31 17:25:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=10563326
- 11/08/31 17:25:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=118
- 11/08/31 17:25:53 INFO mapred.JobClient: Map-Reduce Framework
- 11/08/31 17:25:53 INFO mapred.JobClient: Reduce input groups=1
- 11/08/31 17:25:53 INFO mapred.JobClient: Combine output records=99
- 11/08/31 17:25:53 INFO mapred.JobClient: Map input records=26525927
- 11/08/31 17:25:53 INFO mapred.JobClient: Reduce shuffle bytes=3624
- 11/08/31 17:25:53 INFO mapred.JobClient: Reduce output records=1
- 11/08/31 17:25:53 INFO mapred.JobClient: Spilled Records=198
- 11/08/31 17:25:53 INFO mapred.JobClient: Map output bytes=515064
- 11/08/31 17:25:53 INFO mapred.JobClient: Map input bytes=22351478236
- 11/08/31 17:25:53 INFO mapred.JobClient: Combine input records=21461
- 11/08/31 17:25:53 INFO mapred.JobClient: Map output records=21461
- 11/08/31 17:25:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=28153
- 11/08/31 17:25:53 INFO mapred.JobClient: Reduce input records=99
- 11/08/31 17:25:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
- 11/08/31 17:25:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140359 for yinjie
- 11/08/31 17:25:53 INFO security.TokenCache: Got dt for hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24723;uri=****:8020;t.service=****:8020
- 11/08/31 17:25:53 INFO mapred.FileInputFormat: Total input paths to process : 1
- 11/08/31 17:25:53 INFO mapred.JobClient: Running job: job_201108241351_24723
- 11/08/31 17:25:54 INFO mapred.JobClient: map 0% reduce 0%
- 11/08/31 17:26:01 INFO mapred.JobClient: map 100% reduce 0%
- 11/08/31 17:26:13 INFO mapred.JobClient: map 100% reduce 100%
- 11/08/31 17:26:13 INFO mapred.JobClient: Job complete: job_201108241351_24723
- 11/08/31 17:26:13 INFO mapred.JobClient: Counters: 23
- 11/08/31 17:26:13 INFO mapred.JobClient: Job Counters
- 11/08/31 17:26:13 INFO mapred.JobClient: Launched reduce tasks=1
- 11/08/31 17:26:13 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=3225
- 11/08/31 17:26:13 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 11/08/31 17:26:13 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 11/08/31 17:26:13 INFO mapred.JobClient: Launched map tasks=1
- 11/08/31 17:26:13 INFO mapred.JobClient: Data-local map tasks=1
- 11/08/31 17:26:13 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8191
- 11/08/31 17:26:13 INFO mapred.JobClient: FileSystemCounters
- 11/08/31 17:26:13 INFO mapred.JobClient: FILE_BYTES_READ=32
- 11/08/31 17:26:13 INFO mapred.JobClient: HDFS_BYTES_READ=248
- 11/08/31 17:26:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=117216
- 11/08/31 17:26:13 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=22
- 11/08/31 17:26:13 INFO mapred.JobClient: Map-Reduce Framework
- 11/08/31 17:26:13 INFO mapred.JobClient: Reduce input groups=1
- 11/08/31 17:26:13 INFO mapred.JobClient: Combine output records=0
- 11/08/31 17:26:13 INFO mapred.JobClient: Map input records=1
- 11/08/31 17:26:13 INFO mapred.JobClient: Reduce shuffle bytes=0
- 11/08/31 17:26:13 INFO mapred.JobClient: Reduce output records=1
- 11/08/31 17:26:13 INFO mapred.JobClient: Spilled Records=2
- 11/08/31 17:26:13 INFO mapred.JobClient: Map output bytes=24
- 11/08/31 17:26:13 INFO mapred.JobClient: Map input bytes=32
- 11/08/31 17:26:13 INFO mapred.JobClient: Combine input records=0
- 11/08/31 17:26:13 INFO mapred.JobClient: Map output records=1
- 11/08/31 17:26:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=130
- 11/08/31 17:26:13 INFO mapred.JobClient: Reduce input records=1
OK, this time both jobs completed successfully!