Big Data IMF Legend: developing Hadoop WordCount and creating HDFS files in Java

 


Connecting a 32-bit Eclipse on Windows to the Hadoop cluster running in a virtual machine


1. Put hadoop-eclipse-plugin-2.6.0.jar into the Eclipse plugins directory: G:\IMFBigDataSpark2016\eclipse(java)\plugins


2. Switch to the Map/Reduce perspective so that Eclipse shows the MapReduce views.
   In Eclipse, open the "Window" menu, choose "Open Perspective", and in the dialog that appears select "Map/Reduce".

   Then open "Window" > "Preferences"; a new entry, "Hadoop Map/Reduce", appears in the option list on the left of the dialog. Click it and set the Hadoop installation directory to G:\IMFBigDataSpark2016\hadoop-2.6.0.


3. Create the connection to the Hadoop cluster: right-click in the "Map/Reduce Locations" view at the bottom of Eclipse, choose "New Hadoop Location", and a configuration dialog appears.


4. Fill in the connection details:
Location Name:Hadoop_2.6_Location 


Map/Reduce Master
Host:192.168.2.100 
Port:50020


DFS Master 
Use M/R Master host:192.168.2.100
Port:9000


User name:hadoop


5. Start the Hadoop cluster:
[root@master sbin]#./start-dfs.sh 


6. Refresh Hadoop_2.6_Location in Eclipse; it fails with a "connection refused" error.


7. Configure Hadoop. Back up the current configuration directory first:


[root@master etc]#cp -R ./hadoop ./hadoop.bak.2.9


Edit yarn-site.xml:


<configuration>


 <property>
   <name>yarn.resourcemanager.hostname</name>
   <value>master</value>
   <description>The hostname of the RM.</description>
 </property>

 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
   <description>Shuffle service that needs to be set for Map Reduce applications.</description>
 </property>


</configuration>






8. Edit mapred-site.xml:


<configuration>


  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for executing MapReduce jobs.</description>
  </property>

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
    <description>MapReduce JobHistory Server IPC host:port</description>
  </property>

  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port</description>
  </property>


</configuration>


9. Start YARN:




[root@master sbin]#./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-hadoop-resourcemanager-master.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-master.out
[root@master sbin]#jps
3040 NodeManager
2948 ResourceManager
2518 DataNode
3080 Jps
2665 SecondaryNameNode
2428 NameNode
[root@master sbin]#


10. Refresh the location in Eclipse again; the cluster's HDFS directories are now listed. OK.


11. Start MapReduce development


1. Upload the input files:
[root@master sbin]#hadoop dfs -mkdir -p  /wordcount/input
[root@master hadoop-2.6.0]#hadoop dfs -put LICENSE.txt /wordcount/input
[root@master hadoop-2.6.0]#hadoop dfs -put README.txt /wordcount/input
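
The same upload can also be done from Java through the FileSystem API. A minimal sketch (my own, assuming LICENSE.txt and README.txt sit in the working directory and the NameNode answers at 192.168.2.100:9000):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadWordCountInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.2.100:9000/"), conf);
        // Equivalent of: hadoop dfs -mkdir -p /wordcount/input
        fs.mkdirs(new Path("/wordcount/input"));
        // Equivalent of: hadoop dfs -put LICENSE.txt /wordcount/input
        fs.copyFromLocalFile(new Path("LICENSE.txt"), new Path("/wordcount/input"));
        fs.copyFromLocalFile(new Path("README.txt"), new Path("/wordcount/input"));
        fs.close();
    }
}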


2. Create the project and write the code:
New > Other > Map/Reduce Project, named HelloMapReduce.
Create the package: package com.dtspark.hadoop.hellomapreduce;
Create the class: MyFirstMapReduce
In the Run Configuration, set the program Arguments to:
hdfs://Master:9000/wordcount/input hdfs://Master:9000/wordcount/output
(these two paths become otherArgs[0] and otherArgs[1] in main())


3. The source code is attached at the end of this post.
4. Running it produces errors.


4.1 Compile error: the main class cannot be loaded.
Description Resource Path Location Type
The container 'Maven Dependencies' references non existing library 'C:\Users\admin\.m2\repository\org\apache\hadoop\hadoop-yarn-project\2.7.2\hadoop-yarn-project-2.7.2.jar'  SparkApps  Build path  Build Path Problem


Assuming a jar was missing, I looked it up on search.maven.org and added the dependency to the Maven pom.xml:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-project</artifactId>
    <version>2.7.2</version>
</dependency>
Eclipse then flagged pom.xml itself as broken, so the dependency was probably wrong.


As another workaround I created a new test class and copied the code of MyFirstMapReduce into it piece by piece, and that one was fine. A baffling problem; after half a day the "cannot load main class" issue was finally gone.


4.2 The run still failed, so I checked whether the cluster itself was healthy by running the bundled example directly on the VM:
[root@master mapreduce]# hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount hdfs://Master:9000/wordcount/input hdfs://Master:9000/wordcount/output

The job hung right after starting. After removing the YARN configuration Hadoop ran normally again, so for now I keep the minimal configuration and do not use YARN. The cluster is back to normal.
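
A note of my own (not from the original setup): a job driven from Eclipse is only submitted to YARN if mapreduce.framework.name is set to yarn on the client side; with the minimal configuration it falls back to the local runner. If you ever need to force that explicitly, a hedged sketch for the top of MyFirstMapReduce.main() would be:

        // Force the LocalJobRunner instead of submitting to YARN
        // ("local" is the default value of mapreduce.framework.name).
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", "local");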


4.3 I changed the program arguments and ran again, this time with:
hdfs://Master:9000/wordcount/input/README.txt   hdfs://Master:9000/My1
That produced output. Then I switched the arguments to IP addresses:
hdfs://192.168.2.100:9000/wordcount/input  hdfs://192.168.2.100:9000/wordcount/output
which failed with:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.2.100:9000/wordcount/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
The output directory already exists, so I changed the arguments to a fresh one:
hdfs://192.168.2.100:9000/wordcount/input  hdfs://192.168.2.100:9000/wordcount/output2
[root@master hadoop]#hadoop dfs -chmod 777 /wordcount
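
FileOutputFormat refuses to overwrite an existing output directory, which is why every rerun needs a new path. An alternative (a sketch of my own, not part of the attached source) is to delete the old output from the driver before the job is submitted; it only needs one extra import, org.apache.hadoop.fs.FileSystem:

        // In main(), before FileOutputFormat.setOutputPath(...):
        Path outputPath = new Path(otherArgs[1]);
        FileSystem fs = FileSystem.get(outputPath.toUri(), conf);
        if (fs.exists(outputPath)) {
            // Remove the previous run's output so the same arguments can be reused.
            fs.delete(outputPath, true);   // true = delete recursively
        }
        FileOutputFormat.setOutputPath(job, outputPath);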




4.4 Running again produced another error:
Exception in thread "main" java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)


Searching online showed that winutils.exe was missing. I downloaded winutils.exe, put it under hadoop/bin on the Windows side, and set HADOOP_HOME in the environment variables. The next attempt failed with:
Caused by: java.io.IOException: CreateProcess error=216, This version of %1 is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need an x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher.

After downloading the 32-bit winutils.exe instead, the problem was solved.
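
A note of my own: if changing the Windows environment variables is awkward (Eclipse has to be restarted before it sees them), Hadoop's Shell utility also reads the hadoop.home.dir system property before falling back to HADOOP_HOME, so the same effect can be had from code. A sketch, assuming winutils.exe lives in G:\IMFBigDataSpark2016\hadoop-2.6.0\bin:

        // First line of main(), before any Hadoop class is touched:
        // point Hadoop at the local installation that contains bin\winutils.exe.
        System.setProperty("hadoop.home.dir", "G:\\IMFBigDataSpark2016\\hadoop-2.6.0");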


4.5 The run still failed: hadoop.dll was missing.

Copying hadoop.dll into C:\Windows\System32 solved it.

The program now runs OK!



Second example: HDFS file test
1. The source code is attached at the end of this post.
2. Running it fails:

Exception in thread "main" org.apache.hadoop.net.ConnectTimeoutException: Call From pc/192.168.3.6 to master:9000 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=master/180.168.41.175:9000]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)


Fix: use the IP address instead of the hostname. The stack trace shows that on the Windows side the hostname master resolved to 180.168.41.175 rather than the VM's address, so mapping master to 192.168.2.100 in the Windows hosts file should also work. Here I simply changed

String uri = "hdfs://master:9000/";
to
String uri = "hdfs://192.168.2.100:9000/";
 
3. Running again hit a new problem:
 Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=admin, access=WRITE, inode="/":root:supergroup:drwxr-xr-x


[root@master hadoop]#hadoop dfs -ls  /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
16/02/09 04:35:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Found 9 items
-rw-r--r--   1 root supergroup       1366 2016-01-24 05:03 /README.txt
drwxr-xr-x   - root supergroup          0 2016-02-08 18:10 /data
-rw-r--r--   1 root supergroup         24 2016-01-24 07:37 /helloSpark.txt
drwxr-xr-x   - root supergroup          0 2016-01-31 22:29 /historyserverforSpark
drwxr-xr-x   - root supergroup          0 2016-01-31 08:45 /library
drwx-wx-wx   - root supergroup          0 2016-02-08 23:17 /tmp
drwxr-xr-x   - root supergroup          0 2016-02-08 21:09 /wordcount
drwxr-xr-x   - root supergroup          0 2016-01-24 05:15 /wordcountoutput
drwxr-xr-x   - root supergroup          0 2016-02-09 00:11 /wordcountoutput1


4. Fix: open up the permissions:
[root@master hadoop]#hadoop dfs -chmod 777 /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


16/02/09 04:39:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master hadoop]#
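
chmod 777 on / works, but it opens the whole HDFS root to everyone. A narrower alternative (my own sketch, not what the original post did) is to let the Windows client act as the HDFS user that owns the directories, root in this cluster, either via the HADOOP_USER_NAME property or by passing the user when obtaining the FileSystem:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAsRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Option 1: set before the first HDFS call; all later access runs as "root".
        System.setProperty("HADOOP_USER_NAME", "root");
        // Option 2: pass the user explicitly for this FileSystem instance only.
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.2.100:9000/"), conf, "root");
        System.out.println(fs.getFileStatus(new Path("/")));
        fs.close();
    }
}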


5. Now the program runs correctly.


Output:


log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
FileStatus{path=hdfs://192.168.2.100:9000/README.txt; isDirectory=false; length=1366; replication=1; blocksize=134217728; modification_time=1453629801771; access_time=1454991822379; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}
FileStatus{path=hdfs://192.168.2.100:9000/data; isDirectory=true; modification_time=1454973004428; access_time=0; owner=root; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
FileStatus{path=hdfs://192.168.2.100:9000/helloSpark.txt; isDirectory=false; length=24; replication=1; blocksize=134217728; modification_time=1453639061697; access_time=1454970923317; owner=root; group=supergroup; permission=rw-r--r--; isSymlink=false}
FileStatus{path=hdfs://192.168.2.100:9000/historyserverforSpark; isDirectory=true; modification_time=1454297369551; access_time=0; owner=root; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
FileStatus{path=hdfs://192.168.2.100:9000/library; isDirectory=true; modification_time=1454247941619; access_time=0; owner=root; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
FileStatus{path=hdfs://192.168.2.100:9000/tmp; isDirectory=true; modification_time=1454991429304; access_time=0; owner=root; group=supergroup; permission=rwx-wx-wx; isSymlink=false}
FileStatus{path=hdfs://192.168.2.100:9000/wordcount; isDirectory=true; modification_time=1454983779042; access_time=0; owner=root; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
FileStatus{path=hdfs://192.168.2.100:9000/wordcountoutput; isDirectory=true; modification_time=1453630512077; access_time=0; owner=root; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
FileStatus{path=hdfs://192.168.2.100:9000/wordcountoutput1; isDirectory=true; modification_time=1454994674335; access_time=0; owner=root; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
Hello World!








Source code of the two examples
Source code of the first example, WordCount:
package com.dtspark.hadoop.hellomapreduce;
import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.fs.Path;  
import org.apache.hadoop.io.IntWritable;  
import org.apache.hadoop.io.Text;  
import org.apache.hadoop.mapreduce.Job;  
import org.apache.hadoop.mapreduce.Mapper;  
import org.apache.hadoop.mapreduce.Reducer;  
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;  
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;  
import org.apache.hadoop.util.GenericOptionsParser;  
   
import java.io.IOException;  
public class MyFirstMapReduce {


 
    // Mapper: emits the token before the first space of each line, with a count of 1.
    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable>{
        private final static IntWritable one = new IntWritable(1);  
        private Text event = new Text();  
   
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {  
            int idx = value.toString().indexOf(" ");  
            if (idx > 0) {  
                String e = value.toString().substring(0, idx);  
                event.set(e);  
                context.write(event, one);  
            }  
        }  
    }  
   
    // Reducer (also used as the combiner): sums the counts for each key.
    public static class MyReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();  
   
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;  
            for (IntWritable val : values) {  
                sum += val.get();  
            }  
            result.set(sum);  
            context.write(key, result);  
        }  
    }  
   
    // Driver: parses <in> <out> from the command line and submits the job.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();  
        if (otherArgs.length < 2) {  
            System.err.println("Usage: EventCount <in> <out>");  
            System.exit(2);  
        }  
        Job job = Job.getInstance(conf, "event count");  
        job.setJarByClass(MyFirstMapReduce.class);  
        job.setMapperClass(MyMapper.class);  
        job.setCombinerClass(MyReducer.class);  
        job.setReducerClass(MyReducer.class);  
        job.setOutputKeyClass(Text.class);  
        job.setOutputValueClass(IntWritable.class);  
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));  
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  
        System.exit(job.waitForCompletion(true) ? 0 : 1);  
    }  


}


Source code of the second example, HelloHDFS, which creates a file on HDFS:
package com.dtspark.hadoop.hellomapreduce;
import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.fs.FSDataOutputStream;  
import org.apache.hadoop.fs.FileStatus;  
import org.apache.hadoop.fs.FileSystem;  
import org.apache.hadoop.fs.Path;  
import org.apache.hadoop.io.IOUtils;


import java.io.IOException;
import java.io.InputStream;  
import java.net.URI; 




public class HadoopHDFS {


public static void main(String[] args) throws IOException { 
        String uri = "hdfs://192.168.2.100:9000/";  
        Configuration config = new Configuration();  
        FileSystem fs = FileSystem.get(URI.create(uri), config);  
   
        // List all files and directories under / on HDFS
        FileStatus[] statuses = fs.listStatus(new Path("/"));  
        for (FileStatus status : statuses) {  
            System.out.println(status);  
        }  
   
        // Create /Mytest.log on HDFS and write one line of text into it
        FSDataOutputStream os = fs.create(new Path("/Mytest.log"));  
        os.write("Hello World!".getBytes());  
        os.flush();  
        os.close();  
   
        // Print the content of /Mytest.log from HDFS
        InputStream is = fs.open(new Path("/Mytest.log"));  
        IOUtils.copyBytes(is, System.out, 1024, true);  


}
}



Run result of the first example:

[Screenshot 1 in the original post]







Run result of the second example:

[Screenshot 2 in the original post]





   
