20190824 Class Notes
Set up the keyboard shortcuts.
Configure compilation settings.
Create a project.
Choose the quickstart archetype.
Set the GAV (groupId / artifactId / version).
Adjust the project settings.
Edit pom.xml: add the hadoop.version property and a repository.
Add the Hadoop dependency (see the pom sketch below).
Check that the dependency was actually pulled in.
If it wasn't, run a Maven reimport.
Create a test project for the experiments that follow.
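A minimal sketch of the pom.xml edits those steps describe. The Hadoop version and the Cloudera repository are assumptions, not from the notes; substitute whatever your cluster actually runs:

<properties>
    <hadoop.version>2.6.0-cdh5.7.0</hadoop.version> <!-- assumed version; match your cluster -->
</properties>

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url> <!-- assumed repository -->
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>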
The entry point for every HDFS operation is the FileSystem class.
The mkdirs operation
public static final String HDFS_PATH = "hdfs://192.168.1.64:8020";

@Test
public void mkdir() throws Exception {
    Configuration configuration = new Configuration();
    // note: pass the HDFS_PATH constant, not the string literal "HDFS_PATH"
    FileSystem fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration);
    boolean isSuccess = fileSystem.mkdirs(new Path("/ruozedata/hdfsapi"));
    Assert.assertTrue(isSuccess);
}
Creating a directory as a specific user (the call above runs as the local OS user, which may lack write permission on HDFS)
@Test
public void mkdir02() throws Exception {
    Configuration configuration = new Configuration();
    // the third argument is the user to connect as
    FileSystem fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, "hadoop");
    boolean isSuccess = fileSystem.mkdirs(new Path("/ruozedata/hdfsapi"));
    Assert.assertTrue(isSuccess);
}
The directory is created successfully.
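From here on the snippets use fileSystem and configuration as fields without showing their setup. A minimal sketch of the test-class scaffolding they appear to assume (the class name is made up; the connection details match the examples above):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.junit.After;
import org.junit.Before;

public class HDFSApiTest {

    public static final String HDFS_PATH = "hdfs://192.168.1.64:8020";

    Configuration configuration;
    FileSystem fileSystem;

    @Before
    public void setUp() throws Exception {
        configuration = new Configuration();
        // connect as "hadoop" so the tests have write permission on HDFS
        fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, "hadoop");
    }

    @After
    public void tearDown() throws Exception {
        fileSystem.close();
    }
}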
Copying a local file to HDFS
@Test
public void copyFromLocalFile() throws Exception {
    Path srcPath = new Path("D:/BDP/api/testapi.py");
    Path dstPath = new Path("/ruozedata/hdfsapi");
    fileSystem.copyFromLocalFile(srcPath, dstPath);
}
The upload above lands with a replication factor of 3, the client-side default. To make it match the replication configured on the cluster, there are two options:
1. Set the replication explicitly on the client (see the sketch below):
configuration.set("dfs.replication", "2");
2. Copy the cluster's hdfs-site.xml into the project's resources so the client picks up its settings.
Re-running the upload then shows the expected replication.
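A sketch of option 1 in context; the method name is made up, and the key point is that the property must be set on the Configuration before FileSystem.get is called:

@Test
public void copyFromLocalFileWithReplication() throws Exception {
    Configuration configuration = new Configuration();
    // must be set before FileSystem.get, otherwise the default of 3 applies
    configuration.set("dfs.replication", "2");
    FileSystem fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, "hadoop");
    fileSystem.copyFromLocalFile(new Path("D:/BDP/api/testapi.py"), new Path("/ruozedata/hdfsapi"));
}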
Copying an HDFS file to the local machine
@Test
public void copyToLocal() throws Exception {
    Path srcPath = new Path("/ruozedata/hdfsapi/test20190825.txt");
    Path dstPath = new Path("D:/BDP/ruoze/ruoze20190825.txt");
    fileSystem.copyToLocalFile(srcPath, dstPath);
}
Running it like this throws a NullPointerException; rewritten as below, it runs fine.
@Test
public void copyToLocal() throws Exception {
    Path srcPath = new Path("/ruozedata/hdfsapi/test20190825.txt");
    Path dstPath = new Path("D:/BDP/ruoze/ruoze20190825.txt");
    // signature: copyToLocalFile(delSrc, src, dst, useRawLocalFileSystem)
    fileSystem.copyToLocalFile(false, srcPath, dstPath, true);
}
false: delSrc, i.e. do not delete the source file on HDFS.
true: useRawLocalFileSystem, i.e. write through RawLocalFileSystem instead of the checksumming local file system; on Windows the checksum path typically requires the native winutils libraries, which is what triggers the NullPointerException above.
Renaming a file
@Test
public void rename() throws Exception {
    Path srcPath = new Path("/ruozedata/hdfsapi/test20190825.txt");
    Path dstPath = new Path("/ruozedata/hdfsapi/test20190825-2.txt");
    fileSystem.rename(srcPath, dstPath);
}
ok
Listing directory contents
@Test
public void listFiles() throws Exception {
    // the second argument makes the listing recursive
    RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(new Path("/ruozedata/hdfsapi"), true);
    while (files.hasNext()) {
        LocatedFileStatus fileStatus = files.next();
        String isDir = fileStatus.isDirectory() ? "directory" : "file";
        String permission = fileStatus.getPermission().toString();
        short replication = fileStatus.getReplication();
        long length = fileStatus.getLen();
        String path = fileStatus.getPath().toString();
        System.out.println(isDir + "\t"
                + permission + "\t"
                + replication + "\t"
                + length + "\t"
                + path);
    }
}
Output: one line per entry with type, permission, replication, length, and path.
Downloading a file via raw streams
@Test
public void download01() throws Exception {
    FSDataInputStream in = fileSystem.open(new Path("/ruozedata/hdfsapi/spark-2.3.0.tgz"));
    FileOutputStream out = new FileOutputStream(new File("D:/BDP/ruoze/spark01.tgz.part02"));
    // skip the first 128 MB block
    in.seek(1024 * 1024 * 128);
    byte[] buffer = new byte[1024];
    // writes 1024 bytes x (1024 * 128) iterations = exactly 128 MB,
    // ignoring read()'s return value -- this is the flaw fixed below
    for (int i = 0; i < 1024 * 128; i++) {
        in.read(buffer);
        out.write(buffer);
    }
    IOUtils.closeStream(out);
    IOUtils.closeStream(in);
}
With this approach the part file always comes out as exactly 128 MB (131072 iterations x 1024 bytes), because the loop ignores how many bytes read() actually returned and keeps writing full buffers even past end of file. The version below copies until EOF and produces the correct file size.
@Test
public void download01() throws Exception {
    FSDataInputStream in = fileSystem.open(new Path("/ruozedata/hdfsapi/spark-2.3.0.tgz"));
    FileOutputStream out = new FileOutputStream(new File("D:/BDP/ruoze/spark01.tgz.part02"));
    in.seek(1024 * 1024 * 128);
    // byte[] buffer = new byte[1024];
    // for (int i = 0; i < 1024 * 128; i++) {
    //     in.read(buffer);
    //     out.write(buffer);
    // }
    // copyBytes streams until EOF, so the part file gets its true size
    IOUtils.copyBytes(in, out, configuration);
    IOUtils.closeStream(out);
    IOUtils.closeStream(in);
}
Printing block locations
// inside the while loop of listFiles, per LocatedFileStatus:
BlockLocation[] blockLocations = fileStatus.getBlockLocations();
for (BlockLocation location : blockLocations) {
    String[] hosts = location.getHosts();
    for (String host : hosts) {
        System.out.println(host);
    }
}