Accessing HDFS from a Windows Client

This post shows how to access a Hadoop cluster from a Windows client and read files stored on HDFS.

Environment: Eclipse, CDH 5.1.0, HDFS 2.3.0

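1. Create a new Java project.

On the cluster, locate core-site.xml and hdfs-site.xml and copy them into the Java project so that they end up in the bin folder, which is on the runtime classpath.

The simplest way is to right-click src, create a new source folder, and put the two files there; Eclipse copies the contents of source folders into bin when it builds. The result looks like this: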

[Image 1: Accessing HDFS from a Windows client]
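Before running anything against the cluster, it can help to confirm that the two configuration files really are visible on the classpath. The following is a minimal sketch using only the JDK. The class name ClasspathCheck and the helper isOnClasspath are made up for illustration, and the sketch assumes the files sit at the classpath root.

```java
// Hypothetical helper (not part of Hadoop): checks whether the
// Hadoop configuration files are visible on the classpath.
class ClasspathCheck {

    // Returns true if the named resource can be found at the classpath root.
    static boolean isOnClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName) != null;
    }

    public static void main(String[] args) {
        String[] files = {"core-site.xml", "hdfs-site.xml"};
        for (String name : files) {
            System.out.println(name + (isOnClasspath(name) ? ": found" : ": MISSING"));
        }
    }
}
```

If either file is reported as MISSING, the client will probably fall back to default settings and look at the local file system instead of the cluster.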

2. Write the following code:

package com.mail;

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class Hdfs {

    public static void main(String[] args) {
        try {
            // Configuration picks up core-site.xml from the classpath,
            // so FileSystem.get(conf) connects to the cluster's HDFS.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/tmp/data/mllib/kmeans_data.txt");
            if (!fs.exists(path)) {
                System.err.println("File not found: " + path);
                return;
            }
            System.out.println("********************");

            InputStream in = null;
            try {
                in = fs.open(path);
                // Pass false so that copyBytes does not close System.out;
                // the input stream is closed in the finally block.
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

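3. Running the finished program is likely to fail with several errors:

1) org/apache/commons/logging/LogFactory: commons-logging-1.0.4.jar is missing.

2) com/google/common/collect/Interners: the Google Guava jar is missing. I downloaded guava-18.0.jar.

3) NoClassDefFoundError: org/apache/commons/configuration/Configuration: the Apache Commons Configuration jar is missing.

4) DistributedFileSystem could not be instantiated, org.apache.hadoop.conf.Configuration.addDeprecations

This last error occurs because the HDFS version on the client does not match the version running on the cluster.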
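Rather than hitting these errors one at a time, the classes they name can be probed up front. The following is a rough sketch using only the JDK. DependencyCheck and isLoadable are hypothetical names, and the check only tells you whether a class can be found, so it will not detect the version mismatch in item 4.

```java
// Hypothetical helper (not part of Hadoop): reports which of the classes
// named in the errors above can be loaded from the current classpath.
class DependencyCheck {

    // Returns true if the class can be located and loaded.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.commons.logging.LogFactory",            // commons-logging
            "com.google.common.collect.Interners",              // Guava
            "org.apache.commons.configuration.Configuration",   // commons-configuration
            "org.apache.hadoop.hdfs.DistributedFileSystem"      // HDFS client
        };
        for (String name : required) {
            System.out.println((isLoadable(name) ? "OK       " : "MISSING  ") + name);
        }
    }
}
```

Run it from the same Eclipse project so that it sees the same build path as the HDFS program.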



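4. The following code reads files from HDFS: it iterates over the files in one directory, creates a file with the same name in another directory, and writes the content to it: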

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) {
    try {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String inputDir = "/tmp/daily_mail/CN/sql/";
        String outputDir = "/tmp/daily_mail/CN/hql/";
        for (FileStatus status : fs.listStatus(new Path(inputDir))) {
            if (!status.isFile()) {
                continue; // skip subdirectories, which cannot be opened as files
            }
            // Read the whole file, joining its lines with spaces.
            // UTF-8 is assumed here for both reading and writing.
            StringBuilder sql = new StringBuilder();
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(fs.open(status.getPath()), StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    sql.append(line).append(' ');
                }
            }
            System.out.println(sql);

            // create(path, true) overwrites an existing file, so no separate
            // delete or append step is needed.
            Path target = new Path(outputDir + status.getPath().getName());
            try (FSDataOutputStream out = fs.create(target, true)) {
                out.write(sql.toString().getBytes(StandardCharsets.UTF_8));
            }
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
