- cogroup
cogroup is a generalization of group . Instead of collecting records of one input based on a key, it collects records of n inputs based on a key. The result is a record with a key and one bag for each input.
A = load 'input1' as (id:int, val:float);
B = load 'input2' as (id:int, val2:int);
C = cogroup A by id, B by id;
describe C;
C: {group: int,A: {id: int,val: float},B: {id: int,val2: int}}
Another way to think of cogroup is as the first half of a join. The keys are collected together, but the cross product is not done.
- union
union puts two data sets together by concatenating them.
A = load 'input1' as (x:int, y:float);
B = load 'input2' as (x:int, y:float);
C = union A, B;
describe C;
C: {x: int,y: float}
A = load 'input1' as (w:chararray, x:int, y:float);
B = load 'input2' as (x:int, y:double, z:chararray);
C = union onschema A, B;
describe C;
C: {w: chararray,x: int,y: double,z: chararray}
- cross
cross matches the mathematical set operation of the same name. In the following Pig Latin, cross takes every record in NYSE_daily and combines it with every record in NYSE_dividends:
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray,date:chararray,
open:float,high:float, low:float,close:float, volume:int, adj_close:float);
divs= load 'NYSE_dividends' as (exchange:chararray, symbol:chararray,date:chararray, dividends:float);
tonsodata = cross daily, divs parallel 10;