Using the HDFS distributed file storage system from Python 3
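from pyhdfs import HdfsClient
client = HdfsClient(hosts="testhdfs.org:50070",
                    user_name="web_crawler")  # Create a connection; each hosts entry is written as "host:port"
client.get_home_directory()  # Get the home directory of user_name on HDFS
client.listdir(PATH)  # List the entries under the given HDFS path
client.copy_from_local(file_path, hdfs_path, overwrite=True)  # Copy a local file to the server; directories are not supported. overwrite=True replaces the file if it already exists
client.delete(PATH, recursive=True)  # Delete the given path; recursive=True also removes the contents of a directory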
hdfs_path must include the file name and its extension; otherwise the copy will not succeed.
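Since the destination has to end in a file name, one way to avoid passing a bare directory is to build the destination from the local path. The following is only a minimal sketch: build_hdfs_dest and upload_file are hypothetical helper names, not part of pyhdfs, and client is assumed to be an HdfsClient created as above.

```python
import os
import posixpath


def build_hdfs_dest(local_path: str, hdfs_dir: str) -> str:
    """Join hdfs_dir with the file name (including extension) of local_path."""
    file_name = os.path.basename(local_path)
    if not file_name:
        # e.g. local_path ends with a separator, so there is no file name to reuse
        raise ValueError(f"local_path has no file name: {local_path!r}")
    # HDFS paths use forward slashes regardless of the local OS
    return posixpath.join(hdfs_dir, file_name)


def upload_file(client, local_path: str, hdfs_dir: str) -> str:
    """Upload one local file; `client` is assumed to be a pyhdfs.HdfsClient."""
    dest = build_hdfs_dest(local_path, hdfs_dir)
    client.copy_from_local(local_path, dest, overwrite=True)
    return dest
```

For example, upload_file(client, file_path, "/user/web_crawler") would copy the file to a destination under that directory that ends in the original file name.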
If connecting with HdfsClient raises an error like the following:
Traceback (most recent call last):
File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "
client.get_home_directory()
File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 565, in get_home_directory
return _json(self._get('/', 'GETHOMEDIRECTORY', **kwargs))['Path']
File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 391, in _get
return self._request('get', *args, **kwargs)
File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 377, in _request
_check_response(response, expected_status)
File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 799, in _check_response
remote_exception = _json(response)['RemoteException']
File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 793, in _json
"Expected JSON. Is WebHDFS enabled? Got {!r}".format(response.text))
pyhdfs.HdfsException: Expected JSON. Is WebHDFS enabled? Got '\n\n\n\n
502 Server dropped connection
\n
The following error occurred while trying to accesshttp://%2050070:50070/webhdfs/v1/?user.name=web_crawler&op=GETHOMEDIRECTORY:
\n502 Server dropped connection
\n
Generated Fri, 21 Dec 2018 02:03:18 GMT by Polipo on .\n\r\n'
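This usually indicates an access or authentication problem: the account or password may be wrong, the user may lack permission, or the local network may not be on the allow list. Two details in the error output are also worth checking. The 502 page was generated by Polipo, which is a proxy, so the request did not get a valid response from the NameNode. And the host in the failing URL is "%2050070", that is " 50070" with an encoded leading space, which suggests that a hosts string containing a comma, such as "testhdfs.org, 50070", was split into two separate hosts; writing it as "testhdfs.org:50070" avoids this.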
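Before looking at accounts and permissions, it can help to sanity-check the hosts string and to see which URL is being requested. The sketch below uses only the standard library; split_hosts and webhdfs_url are hypothetical helpers, not pyhdfs functions. The comma splitting and the default port 50070 are assumptions inferred from the URL in the traceback, and the URL layout is copied from the same traceback.

```python
from urllib.parse import urlencode


def split_hosts(hosts: str, default_port: int = 50070) -> list:
    """Split a comma-separated hosts string into (host, port) pairs.

    Raises ValueError for entries that look like a mistake, such as the
    " 50070" produced by writing "testhdfs.org, 50070".
    """
    pairs = []
    for entry in hosts.split(","):
        host, sep, port = entry.partition(":")
        if not host.strip() or host != host.strip() or host.isdigit():
            raise ValueError(f"suspicious host entry {entry!r} in {hosts!r}")
        pairs.append((host, int(port) if sep else default_port))
    return pairs


def webhdfs_url(host: str, port: int, user_name: str,
                op: str = "GETHOMEDIRECTORY") -> str:
    """Build the WebHDFS URL for a simple operation on the root path."""
    query = urlencode({"user.name": user_name, "op": op})
    return f"http://{host}:{port}/webhdfs/v1/?{query}"


if __name__ == "__main__":
    for host, port in split_hosts("testhdfs.org:50070"):
        print(webhdfs_url(host, port, "web_crawler"))
```

Opening the printed URL in a browser or with curl from the same machine shows whether the answer comes from the NameNode (a JSON body) or from a proxy error page like the one above.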