How to deduplicate very large data files

#!/bin/bash
set -e  # abort at the first failing command


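# Remove the CSV header (first line) of each file, editing the file in place.
# Note: sed -i rewrites the file, so running this step twice deletes a data row.
sed -i '1d' action_201602.csv
sed -i '1d' action_201603.csv
sed -i '1d' action_201603_extra.csv
sed -i '1d' action_201604.csv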


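# Keep the first occurrence of each line, in the original order. Every
# distinct line is stored in the awk array, so memory grows with the data.
awk '!a[$0]++' action_201602.csv > 201602.csv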


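# Sort and drop duplicate lines. sort spills to temporary files when the
# input exceeds its memory buffer, so it also works on files larger than RAM.
# LC_ALL=C compares raw bytes: faster, and the same exact-match test as uniq.
LC_ALL=C sort -u action_201604.csv > 201604.csv
LC_ALL=C sort -u action_201603_extra.csv > 201603_e.csv
LC_ALL=C sort -u action_201603.csv > 201603.csv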
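
The two deduplication commands above produce the same set of lines but differ in output order and memory use. The awk idiom works because a[$0] is 0 the first time a line is seen, so !a[$0]++ is true exactly once per distinct line (awk then prints it by default) and the post-increment marks it as seen. The small demo below uses made-up sample lines in a hypothetical sample.csv, not the original action_*.csv data.

```shell
#!/bin/bash
# Demo on made-up sample data (not the original action_*.csv files).
cd "$(mktemp -d)"
printf 'u3,buy\nu1,click\nu3,buy\nu2,view\nu1,click\n' > sample.csv

# a[$0] is 0 the first time a line appears, so !a[$0]++ is true once:
# the line is printed, then its counter is incremented.
awk '!a[$0]++' sample.csv      # prints u3,buy / u1,click / u2,view (first-seen order)

# sort -u yields the same set of lines, but in sorted (byte) order.
LC_ALL=C sort -u sample.csv    # prints u1,click / u2,view / u3,buy
```

In short: awk keeps input order but needs all distinct lines to fit in memory, while sort reorders the output but can fall back to disk for very large inputs.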

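For files much larger than memory, the header removal and the sort step can also be combined without editing the source file. The sketch below is one possible variant, not the original author's script: it assumes GNU coreutils sort (for the -S, -T and --parallel options), dedup_csv is a hypothetical helper name, and the 1G buffer and 4 threads are example values to adjust to the machine.

```shell
#!/bin/bash
# Sketch, assuming GNU coreutils sort (for -S, -T and --parallel).
# dedup_csv is a hypothetical helper; the 1G buffer and 4 threads are example values.
set -euo pipefail

# dedup_csv IN OUT: drop IN's header line, then write its unique lines to OUT.
dedup_csv() {
  local in=$1 out=$2
  # tail -n +2 skips the header without rewriting the source file (unlike sed -i).
  # LC_ALL=C: byte-wise comparison; -S: in-memory buffer before spilling to disk;
  # -T: directory for the temporary sorted runs that sort merges at the end.
  tail -n +2 -- "$in" |
    LC_ALL=C sort -u -S 1G -T "${TMPDIR:-/tmp}" --parallel=4 > "$out"
}
```

For example, dedup_csv action_201604.csv 201604.csv, run on the still-headered file, would cover both the sed and the sort step for that month. Since sort writes its temporary runs to the -T directory, that directory should have free space roughly on the order of the input size.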