nomad2

《Hadoop The Definitive Guide》ch11 Pig

1. Pig

Pig是一种用于探索大型数据集的脚本语言，专门用于数据的批处理。

2. 安装和启动

export HADOOP_INSTALL=/local/nomad2/hadoop/hadoop-0.20.203.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export JAVA_HOME=/usr/lib/jvm/java-6-sun/
export PIG_INSTALL=/local/nomad2/pig/pig-0.10.0
export PATH=$PATH:$PIG_INSTALL/bin
export PIG_HADOOP_VERSION=20
export PIG_CLASSPATH=$HADOOP_INSTALL/conf/

[ate: /local/nomad2/hadoop ]
>> pig                         
2012-07-06 05:54:13,371 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2012-07-06 05:54:13,372 [main] INFO  org.apache.pig.Main - Logging error messages to: /local/nomad2/hadoop/pig_1341525253367.log
2012-07-06 05:54:13,539 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost/
2012-07-06 05:54:13,743 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
grunt> records = LOAD 'input/ncdc/micro-tab/sample.txt'
  AS (year:chararray, temperature:int, quality:int);
grunt>

3. Grunt

3.1 LOAD

grunt> records = LOAD 'input/ncdc/micro-tab/sample.txt'
  AS (year:chararray, temperature:int, quality:int);
grunt> dump records;
2012-07-06 05:58:16,661 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-07-06 05:58:16,851 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 05:58:16,873 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 05:58:16,873 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 05:58:16,936 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 05:58:16,952 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 05:58:16,956 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7839038626859326850.jar
2012-07-06 05:58:19,936 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7839038626859326850.jar created
2012-07-06 05:58:19,947 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 05:58:19,974 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 05:58:20,209 [Thread-10] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 05:58:20,210 [Thread-10] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 05:58:20,218 [Thread-10] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 05:58:20,475 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 05:58:21,245 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0009
2012-07-06 05:58:21,245 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0009
2012-07-06 05:58:34,295 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 05:58:40,839 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 05:58:40,842 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2        2012-07-06 05:58:16     2012-07-06 05:58:40     UNKNOWN

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime     Alias   Feature Outputs
job_201207030133_0009   1       0       6       6       6       0       0       0       records MAP_ONLY        hdfs://localhost/tmp/temp279304135/tmp-1218319262,

Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"

Output(s):
Successfully stored 5 records (74 bytes) in: "hdfs://localhost/tmp/temp279304135/tmp-1218319262"

Counters:
Total records written : 5
Total bytes written : 74
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201207030133_0009


2012-07-06 05:58:40,850 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 05:58:40,859 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 05:58:40,859 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1950,0,1)
(1950,22,1)
(1950,-11,1)
(1949,111,1)
(1949,78,1)

grunt> describe records;
records: {year: chararray,temperature: int,quality: int}

3.2 Filter

grunt> filtered_records = FILTER records BY temperature != 9999 AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
grunt> dump filtered_records
2012-07-06 06:01:21,090 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER
2012-07-06 06:01:21,124 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:01:21,127 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:01:21,127 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:01:21,129 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:01:21,131 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:01:21,131 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5684834367376328176.jar
2012-07-06 06:01:23,823 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5684834367376328176.jar created
2012-07-06 06:01:23,829 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:01:23,847 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:01:24,021 [Thread-25] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:01:24,022 [Thread-25] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:01:24,023 [Thread-25] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:01:24,347 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0010
2012-07-06 06:01:24,347 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0010
2012-07-06 06:01:24,350 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:01:37,381 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:01:44,408 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:01:44,409 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2        2012-07-06 06:01:21     2012-07-06 06:01:44     FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime     Alias   Feature Outputs
job_201207030133_0010   1       0       6       6       6       0       0       0       filtered_records,recordsMAP_ONLY        hdfs://localhost/tmp/temp279304135/tmp-767579563,

Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"

Output(s):
Successfully stored 5 records (74 bytes) in: "hdfs://localhost/tmp/temp279304135/tmp-767579563"

Counters:
Total records written : 5
Total bytes written : 74
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201207030133_0010


2012-07-06 06:01:44,414 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 06:01:44,419 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:01:44,419 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1950,0,1)
(1950,22,1)
(1950,-11,1)
(1949,111,1)
(1949,78,1)

3.3 Group

grunt> grouped_records = GROUP filtered_records BY year;
grunt> dump grouped_records;
2012-07-06 06:02:40,489 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2012-07-06 06:02:40,519 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:02:40,528 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:02:40,528 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:02:40,532 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:02:40,534 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:02:40,535 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5702139319332202005.jar
2012-07-06 06:02:43,152 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5702139319332202005.jar created
2012-07-06 06:02:43,157 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:02:43,162 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:02:43,162 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:02:43,185 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:02:43,340 [Thread-39] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:02:43,340 [Thread-39] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:02:43,341 [Thread-39] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:02:43,686 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0011
2012-07-06 06:02:43,686 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0011
2012-07-06 06:02:43,687 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:02:58,225 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:03:18,786 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:03:18,786 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2        2012-07-06 06:02:40     2012-07-06 06:03:18     GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime     Alias   Feature Outputs
job_201207030133_0011   1       1       6       6       6       12      12      12      filtered_records,grouped_records,records  GROUP_BY        hdfs://localhost/tmp/temp279304135/tmp-1975892788,

Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"

Output(s):
Successfully stored 2 records (87 bytes) in: "hdfs://localhost/tmp/temp279304135/tmp-1975892788"

Counters:
Total records written : 2
Total bytes written : 87
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201207030133_0011


2012-07-06 06:03:18,793 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 06:03:18,797 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:03:18,797 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1949,{(1949,111,1),(1949,78,1)})
(1950,{(1950,0,1),(1950,22,1),(1950,-11,1)})

3.4 foreach and generate

grunt> max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
grunt> dump max_temp;
2012-07-06 06:04:38,622 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2012-07-06 06:04:38,650 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:04:38,654 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2012-07-06 06:04:38,663 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:04:38,663 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:04:38,667 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:04:38,669 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:04:38,670 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7466964472073663140.jar
2012-07-06 06:04:41,549 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7466964472073663140.jar created
2012-07-06 06:04:41,553 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:04:41,561 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:04:41,561 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:04:41,575 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:04:41,743 [Thread-53] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:04:41,743 [Thread-53] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:04:41,744 [Thread-53] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:04:42,076 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0012
2012-07-06 06:04:42,076 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0012
2012-07-06 06:04:42,080 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:04:55,115 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:05:17,190 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:05:17,190 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2        2012-07-06 06:04:38     2012-07-06 06:05:17     GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime     Alias   Feature Outputs
job_201207030133_0012   1       1       6       6       6       12      12      12      filtered_records,grouped_records,max_temp,records GROUP_BY,COMBINER       hdfs://localhost/tmp/temp279304135/tmp-954146705,

Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"

Output(s):
Successfully stored 2 records (28 bytes) in: "hdfs://localhost/tmp/temp279304135/tmp-954146705"

Counters:
Total records written : 2
Total bytes written : 28
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201207030133_0012


2012-07-06 06:05:17,196 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 06:05:17,201 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:05:17,201 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1949,111)
(1950,22)

3.5 illustrate

grunt> illustrate max_temp;
2012-07-06 06:06:08,418 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost/
2012-07-06 06:06:08,419 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2012-07-06 06:06:08,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,450 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,450 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,450 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,451 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,486 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:06:08,486 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:06:08,495 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,497 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,497 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,498 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,498 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,499 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,505 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,505 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,533 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,535 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,535 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,536 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,536 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,537 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,558 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,560 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,560 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,561 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,561 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,562 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,566 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,566 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,580 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,582 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,582 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,583 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,583 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,584 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,588 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,588 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,600 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,602 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,602 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,602 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,603 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,603 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,607 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,607 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,621 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,622 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,623 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,623 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,623 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,624 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,644 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,646 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,646 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,646 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,646 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,647 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,651 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,651 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,662 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,664 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,664 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,664 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,664 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,665 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,669 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,669 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
(1949,111,1)
2012-07-06 06:06:08,683 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,684 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,684 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,685 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,685 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,686 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,689 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,689 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,699 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,713 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,714 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,716 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,717 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,717 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,720 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,720 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,738 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,739 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,739 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,740 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,740 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,741 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,744 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,744 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,755 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,757 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,757 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,757 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,758 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,759 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,762 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,762 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,772 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,773 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,774 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,774 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,774 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,775 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,778 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,778 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,788 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,789 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,789 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,789 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,789 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,790 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,793 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,793 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,804 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,805 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,805 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,805 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,805 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,806 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,808 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,808 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,818 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,819 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,822 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,822 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,832 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,833 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,833 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,833 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,834 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,834 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,836 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,837 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:06:08,846 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:06:08,846 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:06:08,846 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:06:08,847 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode.
2012-07-06 06:06:08,847 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:06:08,847 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:06:08,850 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:06:08,850 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
----------------------------------------------------------------------------
| records     | year:chararray     | temperature:int     | quality:int     | 
----------------------------------------------------------------------------
|             | 1949               | 111                 | 1               | 
|             | 1949               | 78                  | 1               | 
|             | 1949               | 9999                | 1               | 
----------------------------------------------------------------------------
-------------------------------------------------------------------------------------
| filtered_records     | year:chararray     | temperature:int     | quality:int     | 
-------------------------------------------------------------------------------------
|                      | 1949               | 111                 | 1               | 
|                      | 1949               | 78                  | 1               | 
-------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------
| grouped_records     | group:chararray     | filtered_records:bag{:tuple(year:chararray,temperature:int,quality:int)}                     | 
--------------------------------------------------------------------------------------------------------------------------------------------
|                     | 1949                | {(1949, 111, 1), (1949, 78, 1)}                                                              | 
--------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------
| max_temp     | group:chararray     | :int     | 
-------------------------------------------------
|              | 1949                | 111      | 
-------------------------------------------------

3.6 STORE

grunt> store max_temp into 'out' using PigStorage(';');
2012-07-06 06:42:16,785 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2012-07-06 06:42:16,806 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:42:16,807 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2012-07-06 06:42:16,809 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:42:16,809 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:42:16,811 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:42:16,811 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:42:16,811 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5028501529287696495.jar
2012-07-06 06:42:19,342 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5028501529287696495.jar created
2012-07-06 06:42:19,344 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:42:19,347 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51
2012-07-06 06:42:19,347 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-07-06 06:42:19,357 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:42:19,436 [Thread-103] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:42:19,436 [Thread-103] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:42:19,437 [Thread-103] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:42:19,858 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0015
2012-07-06 06:42:19,858 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0015
2012-07-06 06:42:19,859 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:42:31,884 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:42:54,937 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:42:54,938 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2        2012-07-06 06:42:16     2012-07-06 06:42:54     GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime  Alias   Feature Outputs
job_201207030133_0015   1       1       6       6       6       12      12      12      filtered_records,grouped_records,max_temp,records      GROUP_BY,COMBINER       hdfs://localhost/user/nomad2/out,

Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"

Output(s):
Successfully stored 2 records (17 bytes) in: "hdfs://localhost/user/nomad2/out"

Counters:
Total records written : 2
Total bytes written : 17
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201207030133_0015


2012-07-06 06:42:54,943 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt> cat out
1949;111
1950;22

3.7 STEAM

grunt> c = stream records through `cut -f 2`;
grunt> dump c
2012-07-06 06:46:07,018 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: STREAMING
2012-07-06 06:46:07,039 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-07-06 06:46:07,040 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-07-06 06:46:07,040 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-07-06 06:46:07,042 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-07-06 06:46:07,042 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-07-06 06:46:07,043 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2350549270034741889.jar
2012-07-06 06:46:09,525 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2350549270034741889.jar created
2012-07-06 06:46:09,526 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-07-06 06:46:09,537 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-07-06 06:46:09,641 [Thread-116] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:46:09,641 [Thread-116] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-07-06 06:46:09,642 [Thread-116] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-07-06 06:46:10,037 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201207030133_0016
2012-07-06 06:46:10,037 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201207030133_0016
2012-07-06 06:46:10,039 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-07-06 06:46:23,084 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-07-06 06:46:30,106 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-07-06 06:46:30,107 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.203.0      0.10.0  nomad2        2012-07-06 06:46:07     2012-07-06 06:46:30     STREAMING

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MaxReduceTime   MinReduceTime   AvgReduceTime  Alias   Feature Outputs
job_201207030133_0016   1       0       6       6       6       0       0       0       c,records       STREAMING,MAP_ONLY     hdfs://localhost/tmp/temp-1533317312/tmp1954401074,

Input(s):
Successfully read 5 records (432 bytes) from: "hdfs://localhost/user/nomad2/input/ncdc/micro-tab/sample.txt"

Output(s):
Successfully stored 5 records (46 bytes) in: "hdfs://localhost/tmp/temp-1533317312/tmp1954401074"

Counters:
Total records written : 5
Total bytes written : 46
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201207030133_0016


2012-07-06 06:46:30,112 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-06 06:46:30,118 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-06 06:46:30,118 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(0)
(22)
(-11)
(111)
(78)

你可能感兴趣的:(《Hadoop The Definitive Guide》ch11 Pig)

运动仿真——phased.Platform TifLil phasedArray工具箱 MATLAB知识点 matlab
在雷达仿真过程中，运动仿真的必要性，以及运动仿真可以实现哪些功能，在matlab对应的userguide中已经讲的很清楚了，这里不再赘述。本文主要介绍phased.Platform的一些“坑”，和典型的用法。第一坑：系统对象机制系统对象（systemobject）在调用的时候，返回当前的状态值，并计算下一状态值存储在系统对象中，直到调用release函数复位。假如仿真的时间步长为T，第一次调用系统
Prompt Engineering（提示工程）遥望盼望 NLP prompt python 人工智能
PromptEngineering（提示工程）1、提示工程2、分隔符3、翻译案例4、词库扩展5、Langchain提示模板PromptTemplate（用封装库）6、langchain结构化输出（用封装库）7、openai结构化输出1、提示工程https://prompt-engineering.xiniushu.com/guides/guidelineshttps://www.prompting
2024年河南省职业院校技能大赛高职组 “大数据分析与应用” 赛项任务书（四）落寞的魚丶大数据应用开发赛项数据分析数据挖掘高职组 2024年河南职业技能大赛大数据分析与应用
2024年河南省职业院校技能大赛高职组“大数据分析与应用”赛项任务书（四））背景描述：任务一：Hadoop完全分布式安装配置（25分）任务二：离线数据处理（25分）子任务一：数据抽取任务三：数据采集与实时计算（20分）任务一：实时数据采集任务四：数据可视化（10分）子任务一：用柱状图展示各省份消费额的中位数任务五：综合分析（20分）子任务一：Kafka中的数据如何保证不丢失？子任务二：请描述HBa
大数据（2）Hadoop架构深度拆解：HDFS与MapReduce企业级实战与高阶调优一个天蝎座白勺程序猿大数据开发从入门到实战合集大数据 hadoop 架构
目录一、分布式系统的设计哲学演进1.1从Google三驾马车到现代数据湖二、企业级HDFS架构全景图2.1联邦架构的深度实践2.2生产环境容灾设计2.3性能压测方法论三、MapReduce引擎内核解密3.1Shuffle机制全链路优化3.2资源调度革命：从MRv1到YARN3.3企业级编码规范四、千亿级数据分析实战：运营商信令数据挖掘4.1场景描述4.2优化后的MR作业链4.3性能对比数据五、云原
[AMS] Android 后台进程启动 activity 限制 Stang_Tang Android framework android
https://developer.android.google.cn/guide/components/activities/background-starts?hl=ltAndroid10（API级别29）及更高版本对应用在后台运行时可以启动activity的时间施加了限制。这些限制有助于最大限度地减少对用户造成的干扰，并且可以让用户更好地控制其屏幕上显示的内容。frameworks/base
探索EPG：一款强大的自定义电子节目指南工具伍辰惟
探索EPG：一款强大的自定义电子节目指南工具去发现同类优质开源项目:https://gitcode.com/项目简介EPG（ElectronicProgramGuide）是一款开源的、基于Web的电子节目指南系统，旨在帮助用户轻松创建和管理个性化的电视节目或广播频道列表。该项目提供了灵活的数据导入方式，支持XMLTV格式，并且可以与各种流媒体设备如Kodi、Plex等无缝集成，为你的多媒体娱乐体验
hadoop-HDFS操作 wenying_44323744 hadoop hdfs eclipse
1.使用的是hadoop的用户登录到系统，那么cd~是跳转到/home/hadoop下。2.在操作hdfs时，需要在hadoop用户下的/usr/local/hadoop，此时是在根目录下。cd/usr/local/hadoop或者cd/cdusr/local/hadoop3.回到Linux的操作目录我们把安装包放在了linux系统下的Downloads文件下，可以sudotar-zxf~/Dow
Hadoop安装 Cindy_0124 hadoop 大数据分布式
Hadoop的安装方式有三种，分别是单机模式，伪分布式模式，分布式模式。单机模式：单机模式：Hadoop默认模式为非分布式模式（本地模式），无需进行其他配置即可运行。非分布式即单Java进程，方便进行调试。伪分布式模式：Hadoop可以在单节点上以伪分布式的方式运行，Hadoop进程以分离的Java进程来运行，节点既作为NameNode也作为DataNode，同时，读取的是HDFS中的文件。分布式
数据权限访问控制（Apache Sentry） deepdata_cn 权限管理 apache sentry
ApacheSentry最初由Cloudera公司内部开发，针对Hadoop系统中的数据（主要是HDFS、Hive的数据）进行细粒度控制，对HDFS、Hive以及Impala有着良好的支持性。2013年Sentry成为Apache的孵化项目，为Hadoop集群元数据和数据存储提供集中、细粒度的访问控制。其架构包括DataEngine、Plugin、Policymetadata等部分，Plugin负
SL导轨通常指的是“直线导轨”（Linear Guide），也称为线性滑轨或直线轴承 getapi 数据库云计算
SL导轨通常指的是“直线导轨”（LinearGuide），也称为线性滑轨或直线轴承。它是一种机械传动元件，广泛应用于需要高精度直线运动的机械设备中。SL导轨的主要功能是为运动部件提供平稳、精确的直线导向，同时承受一定的负载。以下是关于SL导轨的一些关键信息：1.SL导轨的基本结构SL导轨通常由以下几个主要部分组成：导轨（Rail）：安装在固定部件上，作为滑块的运动轨道。滑块（Block/Carri
GSMA SAS 安全生产审计检查清单 SofterICer eSIM SAS 安全网络
GSMASAS安全生产审计检查清单以下是根据GSMAFS.18-SecurityAccreditationScheme-ConsolidatedSecurityRequirementsandGuidelinesv11.1文档中与安全生产相关的章节，整理的安全生产审计检查清单。该清单涵盖了生产流程安全的关键领域、控制措施和最佳实践，并按照文档结构进行组织。1.生产流程控制控制措施/要求适用性状态备注
hbase表无法删除，命令行卡住问题处理 spring208208 大数据组件线上问题分析 hbase 数据库大数据
问题现象hbase表无法删除，命令行卡住1.activemaster日志出现超时WARNorg.apache.hadoop.hbase.master.procedure.TruncateTableProcedure:Retriableerrortryingtotruncatetable=xxxstate=TRUNCATE_TABLE_PRE_OPERATIONorg.apache.hadoop.h
【Linux 下的 bash 无法正常解析, Windows 的 CRLF 换行符问题导致的】待磨的钝刨 linux bash windows
文章目录报错原因：解决办法：方法一：用`dos2unix`修复方法二：手动转换换行符方法三：VSCode或其他编辑器手动改总结这个错误很常见，原因是你的wait_for_gpu.sh脚本文件格式不对，具体来说是Windows的CRLF换行符问题导致的，Linux下的bash无法正常解析。hadoop@hadoop:~/anaconda3$bashwait_for_gpu.sh:invalidopt
大数据技术实战---项目中遇到的问题及项目经验一个“不专业”的阿凡大数据
问题导读：1、项目中遇到过哪些问题？2、Kafka消息数据积压，Kafka消费能力不足怎么处理？3、Sqoop数据导出一致性问题？4、整体项目框架如何设计？项目中遇到过哪些问题7.1Hadoop宕机（1）如果MR造成系统宕机。此时要控制Yarn同时运行的任务数，和每个任务申请的最大内存。调整参数：yarn.scheduler.maximum-allocation-mb（单个任务可申请的最多物理内存
Picgo 配置--免费图床使用三金C_C 工具’工具使用 picgo 图床
下载pigco，然后去github建一个仓库，可以命名为https://github.com/jacinli/image-hosting这是一个免费的图床方案使用了picgo+github仓库的方案可以配置CDN，强烈建议配置cdn(见最下）1.准备GitHub仓库创建仓库：登录GitHub，点击右上角的“+”号，选择“Newrepository”。给仓库取一个名字（比如image-hosting
Apache大数据旭哥优选大数据选题 Apache大数据旭大数据定制选题 java hadoop spark 开发语言 idea hive 数据库架构
定制旭哥服务，一对一，无中介包安装+答疑+售后态度和技术都很重要定制按需求做要求不高就实惠一点定制需提前沟通好怎么做，这样才能避免不必要的麻烦python、flask、Django、mapreduce、mysqljava、springboot、vue、echarts、hadoop、spark、hive、hbase、flink、SparkStreaming、kafka、flume、sqoop分析+推
【Hive】-- hive 3.1.3 伪分布式部署（单节点） oo寻梦in记 Apache Paimon 大数据服务部署 hive 分布式 hadoop
1、环境准备1.1、版本选择apachehive3.1.3apachehadoop3.1.0oraclejdk1.8mysql8.0.15操作系统：Macos10.151.2、软件下载https://archive.apache.org/dist/hive/https://archive.apache.org/dist/hadoop/1.3、解压tar-zxvfapache-hive-4.0.0-
【Linux】Hadoop-3.4.1的伪分布式集群的初步配置孤独打铁匠Julian Linux linux hadoop ubuntu
配置步骤一、检查环境JDK#目前还是JDK8最适合Hadoopjava-versionecho$JAVA_HOMEHadoophadoopversionecho$HADOOP_HOME二、配置SSH免密登录Hadoop需要通过SSH管理节点（即使在伪分布式模式下）sudoaptinstallopenssh-server#安装SSH服务（如未安装）cd~/.ssh/ssh-keygen-trsa#生
pigz is needed by docker-ce-18.03.1.ce-1.el7.centos.x86_64 问题清风的BLOG 问题锦集 pigz is needed by docker-ce-18
出现问题[root@centos74~]#rpm-ivhdocker-ce-18.03.1.ce-1.el7.centos.x86_64.rpm error:Faileddependencies: pigzisneededbydocker-ce-18.03.1.ce-1.el7.centos.x86_64解决问题由上可以看出安装docker需要pigz这个包所以安装这个依赖pigz下载网址：可
Hadoop 集群规划与部署最佳实践 AI天才研究院 Python实战 DeepSeek R1 &大数据AI人工智能大模型自然语言处理人工智能语言模型编程实践开发语言架构设计
作者：禅与计算机程序设计艺术1.简介2009年2月2日，ApacheHadoop项目诞生。它是一个开源的分布式系统基础架构，用于存储、处理和分析海量的数据。Hadoop具有高容错性、可靠性、可扩展性、适应性等特征，因而广泛应用于数据仓库、日志分析、网络流量监测、推荐引擎、搜索引擎等领域。由于Hadoop采用“分而治之”的架构设计理念，因此可以轻松应对数据量、计算能力和存储成本的增长。2013年底，
MySQL 到 Hadoop：Sqoop 数据迁移 ETL Ice星空 ETL
文章目录ETL：Extract-Transform-Load数据迁移过程一、Extract数据抽取1.ODS：OperationalDataStore-可操作数据存储2.DW：DataWarehouse-数据仓库3.DM：DataMart-数据集市二、Transform数据清洗和转换1.数据清洗2.数据转换三、Load数据加载四、数据迁移方法1.Sqoop1.1MySQL->Hive1.1.1im
HBase安装 lianhedaxue Hadoop hbase
HBase安装本章将介绍如何安装HBase和初始配置。需要用Java和Hadoop来处理HBase，所以必须下载java和Hadoop并安装系统中。安装前设置安装Hadoop在Linux环境下之前，需要建立和使用LinuxSSH(安全Shell)。按照下面设立Linux环境提供的步骤。创建一个用户首先，建议从Unix创建一个单独的Hadoop用户，文件系统隔离Hadoop文件系统。按照下面给出创建
HBase的架构介绍，安装及简单操作 pk_xz123456 大数据 hbase 架构数据库
一、HBase安装1.环境准备Java环境：确保系统中已经安装了Java8或更高版本。可以通过在命令行中输入java-version来检查Java版本。Hadoop环境：HBase依赖于Hadoop，需要先安装并配置好Hadoop集群。确保Hadoop的相关服务（如HDFS、YARN等）已经正常启动。2.下载HBase从HBase官方网站（https://hbase.apache.org/）下载适
HDFS相关的面试题努力的搬砖人. java 面试 hdfs
以下是150道HDFS相关的面试题，涵盖了HDFS的基本概念、架构、操作、数据存储、高可用性、权限管理、性能优化、容错机制、与MapReduce的结合、安全性、数据压缩、监控与管理、与YARN的关系、数据一致性、数据备份与恢复等方面，希望对你有所帮助。HDFS基本概念1.HDFS是什么？它的设计目标是什么？•HDFS是Hadoop分布式文件系统，设计目标是实现对大规模数据的高吞吐量访问，适用于一次
hadoop3.x--搭建hadoop高可用集群（HA模式）运维小菜 hadoop hadoop hdfs
hadoop高可用集群（HA模式）一、安装前1.集群规划2.安装前配置3.安装jdk与hadoop4.克隆虚拟机与互信配置5.搭建zookeeper集群二、HDFS1.配置hdfs2.初始化启动hdfs集群三、MapReduce与Yarn1.配置MapReduce2.配置yarn3.启动yarn四、验证1.查看java进程2.hdfs与yarn前台页面一、安装前1.集群规划hostnameipNN
c++ Templates Guide Benny.LIU c++template
c++TemplatesGuide前言FunctionTemplatesClassTemplatesNontypeTemplateParametersTrickyBasicsUsingTemplatesinPracticeBasicTemplateTerminology前言Typeparametersareintroducedwitheitherthekeywordtypenameorthekey
Vue3 从零到全掌握：最详尽的入门指南（近万字超全内容） AA-老高(接毕设) 开发资料 vue.js 前端 javascript
一、Vue脚手架Vue3官方文档地址：https://v3.cn.vuejs.org/以前的官方脚手架@vue-cli也可以用，但这里推荐一个更轻快的脚手架Vite脚手架网址：Vite中文网方式一：vue-cli脚手架初始化Vue3项目官方文档：https://cli.vuejs.org/zh/guide/creating-a-project.html#vue-create// 查看@vue/
在虚拟机上安装Hadoop 杜清卿 hadoop
基本步骤与安装java一致:先用finalshell将hadoop-3.1.3.tar.gz导入到opt目录下面的software文件夹下面，然后解压,最后配置环境变量。1.使用finalshell上传。这里直接鼠标拖动操作即可。2.解压。进入到Hadoop安装包路径下，cd/opt/software/，再解压安装文件到/opt/module下，对应的命令是:tar-zxvfhadoop-.1.3
hadoop集群配置-scp拓展使用杜清卿 hadoop 服务器大数据
任务1：在hadoop102上，将hadoop101中/opt/module/hadoop-3.1.3目录拷贝到hadoop102上。分析：使用scp进行拉取操作：先登录到hadoop2使用命令：scp-rroot@hadoop101:/opt/module/hadoop-3.1.3/opt/module/任务2：在hadoop101上操作，将hadoop100中/opt/module目录下所有目
大数据学习（75）-大数据组件总结 viperrrrrrr 大数据 impala yarn hdfs hive CDH mapreduce
大数据学习系列专栏：哲学语录:用力所能及，改变世界。如果觉得博主的文章还不错的话，请点赞+收藏⭐️+留言支持一下博主哦一、CDHCDH（ClouderaDistributionIncludingApacheHadoop)是由Cloudera公司提供的一个集成了ApacheHadoop以及相关生态系统的发行版本。CDH是一个大数据平台，简化和加速了大数据处理分析的部署和管理。CDH提供Hadoop的
微信开发者验证接口开发 362217990 微信开发者 token 验证
微信开发者接口验证。 Token，自己随便定义，与微信填写一致就可以了。根据微信接入指南描述 http://mp.weixin.qq.com/wiki/17/2d4265491f12608cd170a95559800f2d.html 第一步：填写服务器配置第二步：验证服务器地址的有效性第三步：依据接口文档实现业务逻辑这里主要讲第二步验证服务器有效性。建一个
一个小编程题-类似约瑟夫环问题 BrokenDreams 编程
今天群友出了一题：一个数列,把第一个元素删除,然后把第二个元素放到数列的最后,依次操作下去,直到把数列中所有的数都删除,要求依次打印出这个过程中删除的数。 &
linux复习笔记之bash shell (5) 关于减号-的作用 eksliang linux关于减号“-”的含义 linux关于减号“-”的用途 linux关于“-”的含义 linux关于减号的含义
转载请出自出处： http://eksliang.iteye.com/blog/2105677 管道命令在bash的连续处理程序中是相当重要的，尤其在使用到前一个命令的studout（标准输出）作为这次的stdin（标准输入）时，就显得太重要了，某些命令需要用到文件名，例如上篇文档的的切割命令（split）、还有
Unix(3) 18289753290 unix ksh
1)若该变量需要在其他子进程执行，则可用"$变量名称"或${变量}累加内容什么是子进程？在我目前这个shell情况下，去打开一个新的shell，新的那个shell就是子进程。一般状态下，父进程的自定义变量是无法在子进程内使用的，但通过export将变量变成环境变量后就能够在子进程里面应用了。 2)条件判断： &&代表and ||代表or&nbs
关于ListView中性能优化中图片加载问题酷的飞上天空 ListView
ListView的性能优化网上很多信息，但是涉及到异步加载图片问题就会出现问题。具体参看上篇文章http://314858770.iteye.com/admin/blogs/1217594 如果每次都重新inflate一个新的View出来肯定会造成性能损失严重，可能会出现listview滚动是很卡的情况，还会出现内存溢出。现在想出一个方法就是每次都添加一个标识，然后设置图
德国总理默多克：给国人的一堂“震撼教育”课永夜-极光教育
http://bbs.voc.com.cn/topic-2443617-1-1.html德国总理默多克：给国人的一堂“震撼教育”课　安吉拉—默克尔，一位经历过社会主义的东德人，她利用自己的博客，发表一番来华前的谈话，该说的话，都在上面说了，全世界想看想传播——去看看默克尔总理的博客吧！　　德国总理默克尔以她的低调、朴素、谦和、平易近人等品格给国人留下了深刻印象。她以实际行动为中国人上了一堂
关于Java继承的一个小问题。。。随便小屋 java
今天看Java 编程思想的时候遇见一个问题，运行的结果和自己想想的完全不一样。先把代码贴出来！ //CanFight接口 interface Canfight { void fight(); } //ActionCharacter类 class ActionCharacter { public void fight() { System.out.pr
23种基本的设计模式 aijuans 设计模式
Abstract Factory：提供一个创建一系列相关或相互依赖对象的接口，而无需指定它们具体的类。　　Adapter：将一个类的接口转换成客户希望的另外一个接口。A d a p t e r模式使得原本由于接口不兼容而不能一起工作的那些类可以一起工作。　　Bridge：将抽象部分与它的实现部分分离，使它们都可以独立地变化。　　Builder：将一个复杂对象的构建与它的表示分离，使得同
《周鸿祎自述：我的互联网方法论》读书笔记 aoyouzi 读书笔记
从用户的角度来看,能解决问题的产品才是好产品,能方便/快速地解决问题的产品,就是一流产品. 商业模式不是赚钱模式一款产品免费获得海量用户后,它的边际成本趋于0,然后再通过广告或者增值服务的方式赚钱,实际上就是创造了新的价值链. 商业模式的基础是用户,木有用户,任何商业模式都是浮云.商业模式的核心是产品,本质是通过产品为用户创造价值. 商业模式还包括寻找需求
JavaScript动态改变样式访问技术百合不是茶 JavaScript style属性 ClassName属性
一:style属性格式: HTML元素.style.样式属性="值"; 创建菜单:在html标签中创建或者在head标签中用数组创建 <html> <head> <title>style改变样式</title> </head> &l
jQuery的deferred对象详解 bijian1013 jquery deferred对象
jQuery的开发速度很快，几乎每半年一个大版本，每两个月一个小版本。每个版本都会引入一些新功能，从jQuery 1.5.0版本开始引入的一个新功能----deferred对象。 &nb
淘宝开放平台TOP Bill_chen C++c 物流 C#
淘宝网开放平台首页：http://open.taobao.com/ 淘宝开放平台是淘宝TOP团队的产品，TOP即TaoBao Open Platform，是淘宝合作伙伴开发、发布、交易其服务的平台。支撑TOP的三条主线为： 1.开放数据和业务流程 * 以API数据形式开放商品、交易、物流等业务； &
【大型网站架构一】大型网站架构概述 bit1129 网站架构
大型互联网特点面对海量用户、海量数据大型互联网架构的关键指标高并发高性能高可用高可扩展性线性伸缩性安全性大型互联网技术要点前端优化 CDN缓存反向代理 KV缓存消息系统分布式存储 NoSQL数据库搜索监控安全想到的问题： 1.对于订单系统这种事务型系统，如
eclipse插件hibernate tools安装白糖_ Hibernate
eclipse helios(3.6)版 1.启动eclipse 2.选择 Help > Install New Software...> 3.添加如下地址： http://download.jboss.org/jbosstools/updates/stable/helios/ 4.选择性安装：hibernate tools在All Jboss tool
Jquery easyui Form表单提交注意事项 bozch jquery easyui
jquery easyui对表单的提交进行了封装，提交的方式采用的是ajax的方式，在开发的时候应该注意的事项如下： 1、在定义form标签的时候，要将method属性设置成post或者get，特别是进行大字段的文本信息提交的时候，要将method设置成post方式提交，否则页面会抛出跨域访问等异常。所以这个要
Trie tree(字典树)的Java实现及其应用-统计以某字符串为前缀的单词的数量 bylijinnan java实现
import java.util.LinkedList; public class CaseInsensitiveTrie { /** 字典树的Java实现。实现了插入、查询以及深度优先遍历。 Trie tree's java implementation.(Insert,Search,DFS) Problem Description Igna
html css 鼠标形状样式汇总 chenbowen00 html css
css鼠标手型cursor中hand与pointer Example：CSS鼠标手型效果 <a href="#" style="cursor:hand">CSS鼠标手型效果</a><br/> Example：CSS鼠标手型效果 <a href="#" style=&qu
[IT与投资]IT投资的几个原则 comsci it
无论是想在电商,软件,硬件还是互联网领域投资,都需要大量资金,虽然各个国家政府在媒体上都给予大家承诺,既要让市场的流动性宽松,又要保持经济的高速增长....但是,事实上,整个市场和社会对于真正的资金投入是非常渴望的,也就是说,表面上看起来,市场很活跃,但是投入的资金并不是很充足的......
oracle with语句详解 daizj oracle with with as
oracle with语句详解转在oracle中，select 查询语句，可以使用with,就是一个子查询，oracle 会把子查询的结果放到临时表中，可以反复使用例子:注意，这是sql语句，不是pl/sql语句，可以直接放到jdbc执行的 ----------------------------------------------------------------
hbase的简单操作 deng520159 数据库 hbase
近期公司用hbase来存储日志,然后再来分析 ,把hbase开发经常要用的命令找了出来. 用ssh登陆安装hbase那台linux后用hbase shell进行hbase命令控制台! 表的管理 1）查看有哪些表 hbase(main)> list 2）创建表 # 语法：create <table>, {NAME => <family&g
C语言scanf继续学习、算术运算符学习和逻辑运算符 dcj3sjt126com c
/* 2013年3月11日20:37:32 地点：北京潘家园功能：完成用户格式化输入多个值目的：学习scanf函数的使用 */ # include <stdio.h> int main(void) { int i, j, k; printf("please input three number:\n"); //提示用
2015越来越好 dcj3sjt126com 歌曲
越来越好房子大了电话小了感觉越来越好假期多了收入高了工作越来越好商品精了价格活了心情越来越好天更蓝了水更清了环境越来越好活得有奔头人会步步高想做到你要努力去做到幸福的笑容天天挂眉梢越来越好婆媳和了家庭暖了生活越来越好孩子高了懂事多了学习越来越好朋友多了心相通了大家越来越好道路宽了心气顺了日子越来越好活的有精神人就不显
java.sql.SQLException: Value '0000-00-00' can not be represented as java.sql.Tim feiteyizu mysql
数据表中有记录的time字段（属性为timestamp）其值为：“0000-00-00 00:00:00” 程序使用select 语句从中取数据时出现以下异常： java.sql.SQLException:Value '0000-00-00' can not be represented as java.sql.Date java.sql.SQLException: Valu
Ehcache（07）——Ehcache对并发的支持 234390216 并发 ehcache 锁 ReadLock WriteLock
Ehcache对并发的支持在高并发的情况下，使用Ehcache缓存时，由于并发的读与写，我们读的数据有可能是错误的，我们写的数据也有可能意外的被覆盖。所幸的是Ehcache为我们提供了针对于缓存元素Key的Read（读）、Write（写）锁。当一个线程获取了某一Key的Read锁之后，其它线程获取针对于同
mysql中blob,text字段的合成索引 jackyrong mysql
在mysql中，原来有一个叫合成索引的，可以提高blob,text字段的效率性能，但只能用在精确查询，核心是增加一个列，然后可以用md5进行散列，用散列值查找则速度快比如： create table abc(id varchar(10),context blog,hash_value varchar(40)); insert into abc(1,rep
逻辑运算与移位运算 latty 位运算逻辑运算
源码：正数的补码与原码相同例+7 源码：00000111 补码：00000111 （用8位二进制表示一个数）负数的补码：符号位为1，其余位为该数绝对值的原码按位取反；然后整个数加1。 -7 源码： 10000111 ，其绝对值为00000111 取反加一：11111001 为-7补码已知一个数的补码，求原码的操作分两种情况：
利用XSD 验证XML文件 newerdragon java xml xsd
XSD文件（XML Schema 语言也称作 XML Schema 定义（XML Schema Definition，XSD）。具体使用方法和定义请参看： http://www.w3school.com.cn/schema/index.asp java自jdk1.5以上新增了SchemaFactory类可以实现对XSD验证的支持，使用起来也很方便。以下代码可用在J
搭建 CentOS 6 服务器(12) - Samba rensanning centos
（1）安装 # yum -y install samba Installed: samba.i686 0:3.6.9-169.el6_5 # pdbedit -a rensn new password:123456 retype new password:123456 …… （2）Home文件夹 # mkdir /etc
Learn Nodejs 01 toknowme nodejs
（1）下载nodejs https://nodejs.org/download/ 选择相应的版本进行下载（2）安装nodejs 安装的方式比较多，请baidu下我这边下载的是“node-v0.12.7-linux-x64.tar.gz”这个版本（1）上传服务器（2）解压 tar -zxvf node-v0.12.
jquery控制自动刷新的代码举例 xp9802 jquery
1、html内容部分复制代码代码示例: <div id='log_reload'> <select name="id_s" size="1"> <option value='2'>-2s-</option> <option value='3'>-3s-</option