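1. Use a Chinese word-segmentation tool and count how many times each Chinese character occurs
Approach: WordCount --> the hard part is how to split a sentence into words, for example: 中国很大,很美,很富有。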
Map ---->
,。 “” ‘’
IK Analyzer 2012_u6_source.jar
IKAnalyzer2012_u6
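The counting half of this exercise can be sketched in plain Java, without Hadoop or IK Analyzer. This is only a minimal sketch: it counts individual Chinese characters and skips punctuation by checking the Unicode script of each code point. The class name `HanziCount` is hypothetical, and word-level segmentation (the job of IK Analyzer) is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts Han characters only, which drops punctuation such as the full-width comma and full stop.
final class HanziCount {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        text.codePoints()
            // Keep only code points whose Unicode script is HAN.
            .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
            // Same accumulation a WordCount reducer would do, one character at a time.
            .forEach(cp -> counts.merge(new String(Character.toChars(cp)), 1, Integer::sum));
        return counts;
    }
}
```

Calling `HanziCount.count` on the sample sentence 中国很大,很美,很富有。 counts 很 three times and every other character once. In a real job the filter would sit in the Mapper and the `merge` step in the Reducer.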
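2. Output the average temperature for each month
Approach: compute an average ----> the hard part is choosing the MapOutKey: use year + month as the key
3 pairs: Mapper -->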
Reducer–>
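The idea of using year + month as the key can be sketched outside Hadoop. The record layout here (`yyyy-MM-dd`, a tab, then the temperature) is an assumption, since the notes do not show the input file, and `MonthlyAverage` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

final class MonthlyAverage {
    // Assumed (hypothetical) record layout: "yyyy-MM-dd<TAB>temperature".
    static Map<String, Double> average(String[] records) {
        Map<String, double[]> acc = new TreeMap<>(); // key -> {sum, count}
        for (String record : records) {
            String[] parts = record.split("\t");
            // "yyyy-MM" plays the role of the map output key.
            String yearMonth = parts[0].substring(0, 7);
            double[] sumCount = acc.computeIfAbsent(yearMonth, k -> new double[2]);
            sumCount[0] += Double.parseDouble(parts[1]);
            sumCount[1] += 1;
        }
        // The division is what the Reducer would do once per key.
        Map<String, Double> result = new TreeMap<>();
        acc.forEach((k, sumCount) -> result.put(k, sumCount[0] / sumCount[1]));
        return result;
    }
}
```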
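3. Output the three lowest temperatures for each month
Approach: in the Reducer stage, define an array of length 3. keyout is year + month
For each temperature received, compare it with the lowest temperatures already held, then replace as needed
Custom class: MyTree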
public class MyTree{
private float one = 9999;
private float two = 9999;
private float three = 9999;
public void add(float temp) {
if(temp < three) {
if(temp < two) {
if(temp < one) {
three = two;
two = one;
one = temp;
}else {
three = two;
two = temp;
}
}else {
three = temp;
}
}
}
@Override
public String toString() {
return one + "°F," + two + "°F," + three + "°F";
}
}
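The approach line mentions an array of length 3, while MyTree uses three fields. For comparison, here is a minimal array-based sketch of the same idea; `LowestThree` is a hypothetical name, and it re-sorts the tiny array instead of shifting values by hand.

```java
import java.util.Arrays;

final class LowestThree {
    // Kept in ascending order; Float.MAX_VALUE marks an empty slot.
    private final float[] lowest = {Float.MAX_VALUE, Float.MAX_VALUE, Float.MAX_VALUE};

    void add(float temp) {
        if (temp >= lowest[2]) {
            return; // not lower than the largest of the three kept values
        }
        lowest[2] = temp;    // replace the largest kept value
        Arrays.sort(lowest); // restore ascending order
    }

    float[] values() {
        return lowest.clone();
    }
}
```

Sorting three elements on each accepted value costs little and avoids the nested `if` chain, at the price of being less explicit about which slot changes.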
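4. Given two columns of numbers, sort the first column in descending order and the second in ascending order
Approach: *** any class used as a key or value must implement Writable
Custom sorting ----> the key class must implement WritableComparable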
@Override
public int compareTo(T o) {
// TODO Auto-generated method stub
return 0;
}
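The stub above returns 0, which treats all keys as equal. A possible comparison for this exercise is sketched below in plain Java, assuming the intent is "first column descending, ties broken by second column ascending". `TwoColumnKey` is a hypothetical name; in a real job the class would implement WritableComparable and also provide `write` and `readFields`.

```java
final class TwoColumnKey implements Comparable<TwoColumnKey> {
    final int first;
    final int second;

    TwoColumnKey(int first, int second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public int compareTo(TwoColumnKey o) {
        // Arguments swapped: larger first-column values sort earlier (descending).
        int byFirst = Integer.compare(o.first, this.first);
        if (byFirst != 0) {
            return byFirst;
        }
        // Natural order: smaller second-column values sort earlier (ascending).
        return Integer.compare(this.second, o.second);
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }
}
```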
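5. Total the scores by semester and write the results to several different output files.
Approach: Partitioner programming
class DiyPartitioner extends Partitioner<Text, Student> {
// Partition on the output value (semester "1" or anything else), so there are at most two partitions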
@Override
public int getPartition(Text key, Student value, int numPartitions) {
System.out.println(value.getXq()+"…");
if(value.getXq().equals("1")) {
return 0;
}
return 1;
}
}
numPartitions ---> job.setNumReduceTasks(2);
part-r-00000
part-r-00001
By default, with no custom Partitioner, the number of ReduceTasks is 1
Map --> keyout: compute its hash value h, then h % num (num = number of ReduceTasks, here 1) = 0 ----> part-r-00000
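The default rule described above can be written as a one-line function. This sketch mirrors the expression Hadoop's HashPartitioner uses, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the class name is hypothetical, and a plain `String` stands in for `Text`, so the actual hash values differ from a real job.

```java
final class DefaultPartitionSketch {
    static int partitionFor(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

With one reduce task every key maps to partition 0, which is why only part-r-00000 appears by default.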
6. Find the name of the employee who joined the company earliest in each department
Contents of the dept file:
10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON
Contents of the emp file:
7369,SMITH,CLERK,7902,17-12月-80,800,20
7499,ALLEN,SALESMAN,7698,20-2月-81,1600,300,30
7521,WARD,SALESMAN,7698,22-2月-81,1250,500,30
7566,JONES,MANAGER,7839,02-4月-81,2975,20
7654,MARTIN,SALESMAN,7698,28-9月-81,1250,1400,30
7698,BLAKE,MANAGER,7839,01-5月-81,2850,30
7782,CLARK,MANAGER,7839,09-6月-81,2450,10
7839,KING,PRESIDENT,17-11月-81,5000,10
7844,TURNER,SALESMAN,7698,08-9月-81,1500,0,30
7900,JAMES,CLERK,7698,03-12月-81,950,30
7902,FORD,ANALYST,7566,03-12月-81,3000,20
7934,MILLER,CLERK,7782,23-1月-82,1300,10
Multi-table join
Load the smaller table into memory, using Mapper ---> setup();
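The in-memory join can be sketched in plain Java. The assumptions here are mine, not the notes': the department number is the last field of each emp line, the hire date is the field containing the 月 character (written as `\u6708` in the code), and two-digit years mean 19xx. `EarliestHire` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class EarliestHire {
    private static final String MONTH_MARK = "\u6708"; // the month character used in dates like 17-12(month)-80

    // Turns day-month-year into a sortable int such as 19801217 (assumes 19xx years).
    static int sortableDate(String date) {
        String[] p = date.split("-");
        int month = Integer.parseInt(p[1].replace(MONTH_MARK, ""));
        return (1900 + Integer.parseInt(p[2])) * 10000 + month * 100 + Integer.parseInt(p[0]);
    }

    static Map<String, String> earliestByDept(String[] deptLines, String[] empLines) {
        // Equivalent of Mapper.setup(): cache the small dept table in memory.
        Map<String, String> deptNames = new HashMap<>();
        for (String line : deptLines) {
            String[] f = line.split(",");
            deptNames.put(f[0], f[1]);
        }
        Map<String, String> bestName = new TreeMap<>();
        Map<String, Integer> bestDate = new HashMap<>();
        for (String line : empLines) {
            String[] f = line.split(",");
            String hire = null;
            for (String field : f) {
                if (field.contains(MONTH_MARK)) hire = field; // column position varies between rows
            }
            if (hire == null) continue;
            String dept = deptNames.getOrDefault(f[f.length - 1], f[f.length - 1]);
            int date = sortableDate(hire);
            Integer current = bestDate.get(dept);
            if (current == null || date < current) {
                bestDate.put(dept, date);
                bestName.put(dept, f[1]);
            }
        }
        return bestName;
    }
}
```

Locating the date by its month character rather than by column index is a workaround for rows such as KING's, which have fewer fields than the others.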
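7. List the factories in each region
Approach: a typical multi-table join
Choose the Mapper --> keyout: factory ID
8. Custom FileInputFormat
Approach: copy from the source code (use the Hadoop source as a reference)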
RecordReader createRecordReader(){
return new MyRecordReader();
}
boolean isSplitable(){
return true;//false
}
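Odd/even line problems: 1> compute the sum of the odd-numbered lines and the product of the even-numbered lines 2> use odd-numbered lines as the key and even-numbered lines as the value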
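The first odd/even problem reduces to a small loop once each line's number is known. The sketch below shows only that arithmetic, counting lines from 1; `OddEvenLines` is a hypothetical name. In a real job the line number would presumably have to come from the custom RecordReader, which is likely why `isSplitable` is relevant here.

```java
import java.util.List;

final class OddEvenLines {
    // Returns {sum of odd-numbered lines, product of even-numbered lines}; numbering starts at 1.
    static long[] sumAndProduct(List<String> lines) {
        long sum = 0;
        long product = 1;
        for (int i = 0; i < lines.size(); i++) {
            long value = Long.parseLong(lines.get(i).trim());
            if ((i + 1) % 2 == 1) {
                sum += value;     // lines 1, 3, 5, ...
            } else {
                product *= value; // lines 2, 4, 6, ...
            }
        }
        return new long[] {sum, product};
    }
}
```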
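9. Sort by score
Approach: a custom sort class
10. Find each child's grandfather (the grandchild-grandparent relation)
Approach: single-table join
child parent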
a b
Mapper--->map(){
context.write(new Text("b"),new Text("1"+"a"+"b"));
context.write(new Text("a"),new Text("2"+"a"+"b"));
}
Reducer--->reduce(){
String sun[]…
String p[]…
if(sun!=null && p!=null){
sun x p: compute the Cartesian product
}
}
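The pseudocode above can be simulated in memory to check the logic. This is a sketch under my own naming (`GrandparentJoin`, `childrenOf`, `parentsOf` are hypothetical): two maps stand in for the "1" and "2" tags, and the nested loop is the Cartesian product done in the reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GrandparentJoin {
    // Each pair is {child, parent}; the result lists "grandchild-grandparent" strings.
    static List<String> join(String[][] pairs) {
        // "Map" step: index every record twice, once under the parent and once under the child.
        Map<String, List<String>> childrenOf = new TreeMap<>();
        Map<String, List<String>> parentsOf = new TreeMap<>();
        for (String[] p : pairs) {
            childrenOf.computeIfAbsent(p[1], k -> new ArrayList<>()).add(p[0]);
            parentsOf.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        // "Reduce" step: a person with both children and parents links two generations.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : childrenOf.entrySet()) {
            List<String> parents = parentsOf.get(e.getKey());
            if (parents == null) continue;
            for (String child : e.getValue()) {
                for (String grandparent : parents) {
                    out.add(child + "-" + grandparent); // Cartesian product
                }
            }
        }
        return out;
    }
}
```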
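11. Compute the average score
Approach: omitted
12. Take the first word of each line and count how many times each of these words occurs
Approach: KeyValueTextInputFormat
File: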
hello hadoop linux
hello -- key
hadoop linux -- value
To change the default separator:
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator","-");
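The way KeyValueTextInputFormat divides a line can be illustrated with a small stand-alone function: the line is cut at the first occurrence of the separator (a tab by default), and a line with no separator becomes a key with an empty value. `KeyValueSplit` is a hypothetical name and this is only a sketch of the behaviour, not Hadoop's code.

```java
final class KeyValueSplit {
    // Returns {key, value}, cutting at the first separator only.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] {line, ""}; // no separator: whole line is the key
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }
}
```

For example, `split("a-b-c", '-')` gives key `a` and value `b-c`, since later separators stay inside the value.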
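13. Count all English words
Approach: WordCount, the classic example
14. Sort by number
Approach: omitted
15. Tally mobile data traffic
Approach: a custom sort class
16. Sort by student score
Approach: omitted
17. Output the minimum temperature
Approach: omitted
18. Custom FileOutputFormat
Approach: copy from the source code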
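public class MyTextOutputFormat<K, V> extends FileOutputFormat<K, V> {
@Override
public RecordWriter<K, V> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {
// TODO Auto-generated method stub
return new MyRecordWriter<K, V>(job);
}
}
class MyRecordWriter<K, V> extends RecordWriter<K, V> {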
private static final String utf8 = "UTF-8";
private static final byte[] newline;
private TaskAttemptContext job;
static {
try {
newline = "\n".getBytes(utf8);
} catch (UnsupportedEncodingException uee) {
throw new IllegalArgumentException("can't find " + utf8 + " encoding");
}
}
private DataOutputStream out;
public MyRecordWriter(TaskAttemptContext o) throws IOException {
// TODO Auto-generated constructor stub
this.job=o;
}
@Override
public void write(K key, V value) throws IOException, InterruptedException {
Configuration conf = job.getConfiguration();
boolean nullKey = key == null || key instanceof NullWritable;
boolean nullValue = value == null || value instanceof NullWritable;
if (nullKey && nullValue) {
return;
}
Path file;
if(value.toString().contains("INFO")) {
file = new Path("hdfs://192.168.2.199:9000/z/");
}else {
// insertion omitted
file=new Path("hdfs://192.168.2.199:9000/other/");
}
FileSystem fs=file.getFileSystem(conf);
FSDataOutputStream fileOut = fs.create(file, true);
out=fileOut;
if (!nullKey) {
writeObject(key,out);
}
if (!(nullKey || nullValue)) {
out.write('\t');
}
if (!nullValue) {
writeObject(value,out);// write the value here, not the key a second time
}
out.write(newline);
}
@Override
public void close(TaskAttemptContext context) throws IOException, InterruptedException {
if(out!=null) {
out.close();
}
}
private void writeObject(Object o,DataOutputStream out) throws IOException {
if (o instanceof Text) {
Text to = (Text) o;
out.write(to.getBytes(), 0, to.getLength());
} else {
out.write(o.toString().getBytes(utf8));
}
}
}
19. Data deduplication
Approach: omitted
20. QQ friend recommendation
Approach:
21. Find common friends
Data source:
A:B,C,D,F,E,O
B:A,C,E,K
Target output:
A-B C,E
Approach: 1. Take every friend on the right-hand side and use each one as the key
B A
C A
D A…
A B
C B
E B…
---->C A,B,E
E A,B,C
static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
throws IOException, InterruptedException {
//A:B,C,D,F,E,O
String line=value.toString();// read one line of text
String st[]=line.split(":");
//st[0] A st[1] B,C,D,F,E,O
String str[]=st[1].split(",");
// Arrays.sort(str);
for (String string : str) {
context.write(new Text(string), new Text(st[0]));//B A
}
}
}
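static class MyReducer extends Reducer<Text, Text, Text, Text> {
@Override
protected void reduce(Text key, Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context)
throws IOException, InterruptedException {// e.g. key B: all of these people have B as a friend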
String friends="";
for (Text text : values) {
friends+=text.toString()+",";
}
//friends ---->A,E,F,J,
context.write(key, new Text(friends.substring(0,friends.length()-1)));//–>B A,E,F,J
}
}
A O,H,C,B,F,D,K,I,G
B E,A,J,F
C F,E,H,K,G,B,A
D C,L,K,A,H,G,F,E
E G,H,D,B,L,F,M,A
F A,C,L,M,D,G
G M
H O
I C,O
J O
K O,B
L D,E
M E,F
O A,F,H,I,J
2. Pair up the names to the right of the separator two by two and use each pair as the key
static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
throws IOException, InterruptedException {
//B E,J,F,A
String line=value.toString();// read one line of text
String st[]=line.split("\t");
//st[0] B st[1] E,J,F,A
String str[]=st[1].split(",");
Arrays.sort(str);// sort so a pair is always written as A-E, never E-A
if(str.length>1) {
for (String string : str) {
String i=string;
for (String s : str) {
String j=s;
if(i.compareTo(j)<0) {// emit each pair once, smaller name first
context.write(new Text(i+"-"+j), new Text(st[0]));//A-E B
}
}
}
}
}
}
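static class MyReducer extends Reducer<Text, Text, Text, Text> {
@Override
protected void reduce(Text key, Iterable<Text> values, Reducer<Text, Text, Text, Text>.Context context)
throws IOException, InterruptedException {// e.g. key A-E: all friends that A and E have in common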
String friends="";
for (Text text : values) {
friends+=text.toString()+",";
}
//friends ---->B,C,D,
context.write(key, new Text(friends.substring(0,friends.length()-1)));//-->A-E B,C,D
}
}
A-B C,E
A-C F,D
A-D F,E
A-E C,D,B
A-F E,O,C,B,D
A-G C,E,D,F
A-H D,E,O,C
…
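Both stages can be checked together with a small in-memory simulation. This is only a sketch with hypothetical names (`CommonFriends`, `owners`); it reads the original `person:friend,friend` lines, and the maps stand in for the shuffle between the two jobs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CommonFriends {
    static Map<String, List<String>> commonFriends(String[] lines) {
        // Stage 1: invert "person:friend,friend" into friend -> people who list that friend.
        Map<String, List<String>> owners = new TreeMap<>();
        for (String line : lines) {
            String[] st = line.split(":");
            for (String friend : st[1].split(",")) {
                owners.computeIfAbsent(friend, k -> new ArrayList<>()).add(st[0]);
            }
        }
        // Stage 2: every sorted pair of people in one list shares that friend.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : owners.entrySet()) {
            String[] people = e.getValue().toArray(new String[0]);
            Arrays.sort(people); // keeps keys in A-B form, never B-A
            for (int i = 0; i < people.length; i++) {
                for (int j = i + 1; j < people.length; j++) {
                    result.computeIfAbsent(people[i] + "-" + people[j], k -> new ArrayList<>())
                          .add(e.getKey());
                }
            }
        }
        return result;
    }
}
```

Run on the two sample lines from the data source, `A:B,C,D,F,E,O` and `B:A,C,E,K`, this yields the single entry `A-B` with the friends `C` and `E`, matching the target output given earlier.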