Python cannot read HDFS files: requests.exceptions.ConnectionError: HTTPConnectionPool

1. Problem 1: when working with HDFS through Python's hdfs library, listing the HDFS directory works fine.

from hdfs import Client
client = Client("http://10.0.30.9:50070")
print(client.list('/'))
['test.txt']

Reading a file, however, fails with hdfs.util.HdfsError: File /user/dr.who/test.txt not found. Switching to pyhdfs gave the same error, as well as the second problem described below.

from hdfs import Client
client = Client("http://10.0.30.9:50070")
print(client.list('/'))
with client.read('test.txt') as reader:
    content = reader.read()
    print(content)
Traceback (most recent call last):
  File "E:/pycharm/workspace/hadoopforwin/myhdfs.py", line 5, in <module>
    with client.read('test.txt') as reader:
  File "D:\python3.6\lib\contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 678, in read
    buffersize=buffer_size,
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 112, in api_handler
    raise err
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 107, in api_handler
    **self.kwargs
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 210, in _request
    _on_error(response)
  File "D:\python3.6\lib\site-packages\hdfs\client.py", line 50, in _on_error
    raise HdfsError(message, exception=exception)
hdfs.util.HdfsError: File /user/dr.who/test.txt not found.

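2. Fix for problem 1: the error occurs because no root path was specified. Without one, the relative path 'test.txt' is resolved against the WebHDFS user's home directory (/user/dr.who here) instead of /. Specify the root path when creating the Client to connect to HDFS: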

from hdfs import Client

# root='/' makes relative paths resolve against the HDFS root directory
client = Client("http://10.0.30.9:50070", root='/')
print(client.list('/'))
with client.read('test.txt') as reader:
    content = reader.read()
    print(content)
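
The path resolution described above can be modeled with a few lines of standard-library Python. This is only a simplified sketch of the behavior, not the hdfs library's actual code; the function name resolve and the default home value (taken from the error message) are assumptions made for illustration.

```python
import posixpath


def resolve(path, root=None, home="/user/dr.who"):
    """Approximate how a relative HDFS path becomes an absolute one.

    `home` stands in for the home directory reported by the server;
    the default is the one seen in the error message (an assumption).
    """
    if posixpath.isabs(path):
        # Absolute paths are used as given, regardless of root.
        return posixpath.normpath(path)
    if root and posixpath.isabs(root):
        base = root
    else:
        # No root, or a relative root: fall back to the home directory.
        base = posixpath.join(home, root or "")
    return posixpath.normpath(posixpath.join(base, path))


print(resolve("test.txt"))            # /user/dr.who/test.txt
print(resolve("test.txt", root="/"))  # /test.txt
print(resolve("/test.txt"))           # /test.txt
```

As the last line suggests, passing the absolute path '/test.txt' to client.read should avoid the problem as well, even without root='/'.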

Running the read with root='/' set gets past the path error, but a new problem appears.

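3. Problem 2: the last line of the error is shown below. Here hmaster is the hostname of the Hadoop machine, and port 50075 is the default DataNode HTTP port in Hadoop 2.x, so the NameNode has redirected the read to a DataNode addressed by hostname. The getaddrinfo failure means the machine running the Python program cannot resolve that hostname to an IP address.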

requests.exceptions.ConnectionError: HTTPConnectionPool(host='hmaster', port=50075): Max retries exceeded with url: /webhdfs/v1/test.txt?op=OPEN&namenoderpcaddress=hMaster:9000&offset=0 (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
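
To see which hostname needs a mapping, the host and port can be pulled out of a message like the one above. The helper below is a hypothetical convenience (the name unreachable_host is not part of requests or hdfs), and it assumes the message keeps the HTTPConnectionPool(host='...', port=...) shape shown here.

```python
import re


def unreachable_host(message):
    """Return (host, port) from an HTTPConnectionPool error message, or None."""
    match = re.search(r"HTTPConnectionPool\(host='([^']+)', port=(\d+)\)", message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


msg = "HTTPConnectionPool(host='hmaster', port=50075): Max retries exceeded"
print(unreachable_host(msg))  # ('hmaster', 50075)
```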

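4. Fix for problem 2: add a hostname-to-IP mapping to the hosts file on the machine that runs the Python program. On the Windows system I use, the hosts file is at C:\Windows\System32\drivers\etc\hosts. Append a line of the form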

<ip> <hostname>

For the setup in this post, that is

10.0.30.9 hmaster

Remember to save the file after editing it. The program then reads the file successfully.
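
A quick way to confirm that the hosts entry took effect is to ask the operating system to resolve the name from Python. This is a minimal sketch using only the standard socket module; the function name can_resolve is an assumption for illustration.

```python
import socket


def can_resolve(hostname):
    """Return True if this machine can map hostname to an IP address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Same failure class as the "getaddrinfo failed" in the error above.
        return False
```

Once the hosts entry is saved, can_resolve('hmaster') should return True. On a cluster with several DataNodes, each DataNode hostname would presumably need its own line in the hosts file, since the redirect can point at any of them.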

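5. With the hdfs and pyhdfs libraries, operations other than reading a file can hit the same error (for example writing a file, which WebHDFS also redirects to a DataNode). The fix is the same.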
