Pig version:
[root@n8 examples]# pig -version Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21
Hadoop version:
[root@n8 examples]# hadoop version Hadoop 2.0.0-cdh4.3.0 Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.3.0/src/hadoop-common-project/hadoop-common -r 48a9315b342ca16de92fcc5be95ae3650629155a Compiled by jenkins on Mon May 27 19:45:25 PDT 2013 From source with checksum a4218d77f9b12df4e3e49ef96f9d357d This command was run using /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/hadoop-common-2.0.0-cdh4.3.0.jar
Try to run below Pig script
define dividend_analysis (daily, year, daily_symbol, daily_open, daily_close) returns analyzed { divs = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float); divsthisyear = filter divs by date matches '$year-.*'; dailythisyear = filter $daily by date matches '$year-.*'; jnd = join divsthisyear by symbol, dailythisyear by $daily_symbol; $analyzed = foreach jnd generate dailythisyear::$daily_symbol, $daily_close - $daily_open; }; daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float); results = dividend_analysis(daily, '2009', 'symbol', 'open', 'close'); dump results;
Error info as follows
grunt> exec ch6/macro.pig 2013-08-16 19:39:17,936 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS 2013-08-16 19:39:17,937 [main] WARN org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address 2013-08-16 19:39:17,937 [main] WARN org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum <line 21, column 53> Unexpected character '$' 2013-08-16 19:39:17,943 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 21, column 53> Unexpected character '$' Details at logfile: /root/git/programmingpig/examples/pig_1376647314921.log
The error happen in this line
$analyzed = foreach jnd generate dailythisyear::$daily_symbol, $daily_close - $daily_open;
Now sure why this error happen for now, pig version?? Apparently we have to use the prefix 'dailythisyear::' here because both datasets have one field names symbol. If I remove the prefix and run again, the error will be:
2013-08-16 19:33:10,570 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: at expanding macro 'dividend_analysis' (null:24) <line 19, column 43> Invalid field projection. Projected field [symbol] does not exist in schema: macro_dividend_analysis_divsthisyear_0::exchange:chararray,macro_dividend_analysis_divsthisyear_0::symbol:chararray,macro_dividend_analysis_divsthisyear_0::date:chararray,macro_dividend_analysis_divsthisyear_0::dividends:float,macro_dividend_analysis_dailythisyear_0::exchange:chararray,macro_dividend_analysis_dailythisyear_0::symbol:chararray,macro_dividend_analysis_dailythisyear_0::date:chararray,macro_dividend_analysis_dailythisyear_0::open:float,macro_dividend_analysis_dailythisyear_0::high:float,macro_dividend_analysis_dailythisyear_0::low:float,macro_dividend_analysis_dailythisyear_0::close:float,macro_dividend_analysis_dailythisyear_0::volume:int,macro_dividend_analysis_dailythisyear_0::adj_close:float. Details at logfile: /root/git/programmingpig/examples/pig_1376647314921.log