Hadoop Big Data Series, Part 1: Overview

Core Components

HDFS: The Hadoop Distributed File System, which provides high-throughput access to application data.

Hadoop YARN: A framework for job scheduling and cluster resource management.

Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
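To make the MapReduce programming model more concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop or YARN at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only illustrate the map, shuffle, and reduce steps that a real cluster would distribute across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop yarn", "hadoop hdfs"]))
```

In an actual Hadoop job, the map and reduce functions would run in parallel on different nodes, with the input read from HDFS and the resources allocated by YARN; this sketch only mirrors the data flow.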

Extended Components

HBase: A scalable, distributed database that supports structured data storage for large tables.

Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.

Mahout: A scalable machine learning and data mining library.

Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.

ZooKeeper: A high-performance coordination service for distributed applications.
