hdfs data flow-part reading

The DistributedFileSystem returns a FSDataInputStream (an input stream that supports
file seeks) to the client for it to read data from. FSDataInputStream in turn wraps a
DFSInputStream, which manages the datanode and namenode I/O.

The client then calls read() on the stream (step 3). DFSInputStream, which has stored
the datanode addresses for the first few blocks in the file, then connects to the first
(closest) datanode for the first block in the file. Data is streamed from the datanode
back to the client, which calls read() repeatedly on the stream (step 4). When the end
of the block is reached, DFSInputStream will close the connection to the datanode, then
find the best datanode for the next block (step 5). This happens transparently to the
client, which from its point of view is just reading a continuous stream.

 

//TODO anatomy by chinese

 

你可能感兴趣的:(reading)