Hadoop Updates

Hadoop releases come slowly, so here are some notes.

Main new features in 3.0:

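HDFS erasure coding: reportedly first proposed by Twitter, it works somewhat like software RAID. With a layout such as RS(6,3), the raw storage overhead is about 1.5x instead of the 3x of triple replication, which roughly halves the space needed. It suits cold, low-value data. Reconstructing lost blocks costs CPU, so it is a poor fit for important data unless the cluster has plenty of spare capacity and time.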
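
The storage figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming an RS(6,3) layout (6 data cells plus 3 parity cells) compared against 3-way replication; the function name and the 100 TB figure are made up for illustration.

```python
def storage_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data for a (data, parity) layout."""
    return (data_units + parity_units) / data_units


# 3-way replication: one original copy plus two extra copies.
replication = storage_overhead(1, 2)
# Reed-Solomon with 6 data cells and 3 parity cells, i.e. RS(6,3).
rs_6_3 = storage_overhead(6, 3)

logical_tb = 100  # hypothetical amount of user data, in TB
print(f"3x replication: {replication:.1f}x raw, {logical_tb * replication:.0f} TB on disk")
print(f"RS(6,3):        {rs_6_3:.1f}x raw, {logical_tb * rs_6_3:.0f} TB on disk")
```

Under these assumptions the same data takes half the raw space, at the price of extra CPU whenever a lost cell has to be rebuilt from the surviving ones.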

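YARN Timeline Service: in the MapReduce 1 era this function was built into the JobTracker, and writing logs was expensive on large clusters. In the MapReduce 2 era it moved to the history server, which is a single point of failure (SPOF). The new version in this release is still experimental.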

Support for Opportunistic Containers and Distributed Scheduling: low-priority containers.

Support for more than 2 NameNodes: single NameNode -> Active/Standby NameNodes -> multiple NameNodes.

Intra-datanode balancer: a nice feature that balances data across the disks of a single node. Older versions often ran into uneven disk usage.
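
To make the idea concrete, here is a rough sketch of what balancing disks inside one node means: bring every disk to the same used/capacity ratio. This is not the algorithm of the actual HDFS tool, only an illustration; the function name, mount points and sizes are all hypothetical.

```python
def plan_moves(disks: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Return bytes each disk should shed (positive) or receive (negative).

    disks maps a mount point to (used_bytes, capacity_bytes).
    The target is the same used/capacity ratio on every disk.
    """
    total_used = sum(used for used, _ in disks.values())
    total_cap = sum(cap for _, cap in disks.values())
    plan = {}
    for name, (used, cap) in disks.items():
        # Integer arithmetic: the share of the data this disk should hold.
        ideal_used = total_used * cap // total_cap
        plan[name] = used - ideal_used
    return plan


# Hypothetical node: one nearly full disk, one light disk, one newly added disk.
node = {"/data1": (900, 1000), "/data2": (300, 1000), "/data3": (0, 1000)}
plan = plan_moves(node)
for mount, delta in plan.items():
    action = "move out" if delta > 0 else "take in"
    print(f"{mount}: {action} {abs(delta)} bytes")
```

The newly added empty disk is the typical case from older clusters: without an in-node balancer it stays empty while the old disks stay nearly full.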

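YARN Resource Types: constrained by Linux cgroups, older versions only supported CPU and memory. This version generalizes the resource model so that other resource types such as GPUs can finally be scheduled. I have not looked at the underlying implementation yet.

Main updates in 3.1: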

Yarn Service framework provides first class support and APIs to host long running services natively in YARN.

In a nutshell, it serves as a container orchestration platform for managing containerized services on YARN. It supports both docker container and traditional process based containers in YARN. This strengthens the support for Docker containers.

First-class GPU scheduling and isolation (For both docker/non-docker containers) on YARN.

First-class FPGA scheduling and isolation (For both docker/non-docker containers) on YARN

This supports AI workloads on PB-scale data; for TB-scale data, Dask is enough.

Support administrators to specify absolute resources (X Memory, Y VCores, Z GPUs, etc.) to a queue instead of providing percentage based values. This provides better control for admins to configure required amount of resources for a given queue.
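
A small sketch of why absolute values give tighter control than percentages: a percentage share changes whenever the cluster changes size, while an absolute allocation stays put. All names and numbers below are hypothetical, and the code does not use any real YARN API.

```python
def share_from_percent(cluster_total: dict[str, int], percent: int) -> dict[str, int]:
    """Resources a queue gets when configured as a percentage of the cluster."""
    return {name: amount * percent // 100 for name, amount in cluster_total.items()}


# Hypothetical cluster before and after doubling its size.
small = {"memory_mb": 102400, "vcores": 100, "gpus": 8}
large = {name: amount * 2 for name, amount in small.items()}

# Percentage-based queue: its real size follows the cluster.
pct_small = share_from_percent(small, 25)
pct_large = share_from_percent(large, 25)

# Absolute queue: fixed X memory, Y vcores, Z GPUs, whatever the cluster size.
absolute = {"memory_mb": 25600, "vcores": 25, "gpus": 2}

print("25% of small cluster:", pct_small)
print("25% of large cluster:", pct_large)
print("absolute allocation: ", absolute)
```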

Hadoop 1: memory (enforced in the Java process) => Hadoop 2: memory + CPU (enforced with Linux cgroups) => Hadoop 3: memory + CPU + GPU

For PB-scale machine learning and deep learning, Hadoop 3.1 is the first choice.
