Python 3 and Mainstream Big Data Components

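Big data work is dominated by Java, with Python, Scala and others as secondary languages. This article lists the Python libraries for working with common big data components: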

Required Python version: Python 3.6.

Data sources:
MySQL:
Oracle:
MS SQL Server:
PostgreSQL: pip install psycopg2
MongoDB:
Neo4j:
Redis:
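psycopg2 implements the Python DB-API 2.0 (PEP 249). So do several clients listed below, such as PyHive, impyla and presto-python-client, so query code can be written once against a generic connection object. What follows is a minimal sketch of that idea. `run_query` is a hypothetical helper, not part of any of these libraries. Placeholder syntax differs by driver: psycopg2 uses `%s`, while the standard-library `sqlite3` module, handy for trying the helper locally, uses `?`.

```python
from typing import Any, List, Optional, Sequence, Tuple


def run_query(conn: Any, sql: str,
              params: Optional[Sequence[Any]] = None) -> Tuple[List[str], List[tuple]]:
    """Run one statement on a DB-API 2.0 (PEP 249) connection.

    Hypothetical helper: returns (column names, rows); both lists are empty
    for statements that produce no result set (DDL, INSERT, ...).
    """
    cur = conn.cursor()
    try:
        # Only pass parameters when given, so drivers that use %-formatting
        # (such as psycopg2) leave literal '%' characters in the SQL alone.
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        if cur.description is None:
            return [], []
        columns = [col[0] for col in cur.description]  # first field is the name
        return columns, [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()
```

With psycopg2, `conn` would come from `psycopg2.connect(host=..., dbname=..., user=..., password=...)`, where the connection settings are placeholders. Writes also need an explicit `conn.commit()`, because the helper deliberately leaves transaction control to the caller.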

Big data processing:
Hadoop (HDFS, MapReduce, YARN):
 pip install dask
 pip install mrjob
 pip install pydoop (the default release, pydoop 1.2, is unstable)
 # or install the pre-release instead: pip install --pre pydoop
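mrjob lets you write a Hadoop MapReduce job as a Python class. Its `mapper` yields key/value pairs, and its `reducer` receives one key together with all of that key's values. As a dependency-free illustration of that contract, the sketch below runs a word count through explicit map, shuffle/sort and reduce phases in a single process. It is plain Python, not mrjob's API. The function names `mapper`, `reducer` and `run_locally` are illustrative assumptions.

```python
import re
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

WORD_RE = re.compile(r"[\w']+")


def mapper(_key: Any, line: str) -> Iterator[Tuple[str, int]]:
    # Like an mrjob mapper: the input key is ignored, emit (word, 1) per word
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    # Like an mrjob reducer: receives every value emitted for one key
    yield word, sum(counts)


def run_locally(lines: Iterable[str]) -> Dict[str, int]:
    """Single-process simulation of map -> shuffle/sort -> reduce."""
    groups = defaultdict(list)  # type: Dict[str, List[int]]
    for line in lines:  # map phase
        for key, value in mapper(None, line):
            groups[key].append(value)  # shuffle: group values by key
    result = {}  # type: Dict[str, int]
    for key in sorted(groups):  # Hadoop sorts keys before the reduce phase
        for out_key, out_value in reducer(key, groups[key]):
            result[out_key] = out_value
    return result
```

In mrjob itself, `mapper` and `reducer` would be methods of a subclass of `mrjob.job.MRJob`, and the script would end with a call to that class's `run()`. mrjob then performs the shuffle itself, either locally or on a cluster when started with `-r hadoop`.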
 Hive:
 pyhive, impyla
 HBase:
 happybase
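happybase talks to HBase through the HBase Thrift server. You get a table handle with `happybase.Connection(host).table(name)`, and every row key, `family:qualifier` column name and value is raw bytes. The hypothetical helpers below are a minimal sketch that uses only `Table.put` and `Table.scan(row_prefix=...)`. They take the table as a parameter so they can be exercised without a cluster. The row keys and column names shown are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Row = Dict[bytes, bytes]  # column (b"family:qualifier") -> value, all bytes


def put_rows(table, rows: Dict[bytes, Row]) -> None:
    """Write each row; `table` is assumed to be a happybase.Table."""
    for row_key, data in rows.items():
        table.put(row_key, data)


def scan_prefix(table, prefix: bytes) -> List[Tuple[bytes, Row]]:
    """Return (row_key, data) pairs whose key starts with `prefix`.

    HBase stores rows sorted by key, so a shared prefix such as b"user#"
    turns into a cheap range scan.
    """
    return list(table.scan(row_prefix=prefix))
```

For example, `put_rows(table, {b"user#1": {b"info:name": b"alice"}})` followed by `scan_prefix(table, b"user#")` returns every row whose key begins with `user#`.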
 
 Presto:
 pip install presto
 pip install presto-python-client
 ClickHouse:
 Elasticsearch:
 elasticsearch-py (the official client):
 pip install elasticsearch
 pip install pysolr (a client for Apache Solr, not Elasticsearch)
 pip install elasticsearch-dsl
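Both elasticsearch-py and elasticsearch-dsl end up sending Elasticsearch's JSON query DSL. The hypothetical helper below is a stdlib-only sketch that builds a `bool` query body as a plain dict. It combines a scored full-text `match` with an optional unscored `range` filter. You can pass such a body to the client's search call, but the keyword argument differs between client versions (older releases take `body=`, newer ones take `query=` and friends), so check the version you install. The field names in the tests are placeholders.

```python
from typing import Any, Dict, Optional


def build_search_body(field: str, text: str,
                      range_field: Optional[str] = None,
                      gte: Any = None, lte: Any = None,
                      size: int = 10) -> Dict[str, Any]:
    """Hypothetical helper: bool query = full-text match + optional range filter."""
    bool_query = {"must": [{"match": {field: text}}]}  # type: Dict[str, Any]
    # Keep only the bounds that were actually supplied
    bounds = {k: v for k, v in (("gte", gte), ("lte", lte)) if v is not None}
    if range_field is not None and bounds:
        # filter clauses narrow the hits but do not affect relevance scores
        bool_query["filter"] = [{"range": {range_field: bounds}}]
    return {"query": {"bool": bool_query}, "size": size}
```

With elasticsearch-dsl, the chained form `Search(index=...).query("match", title="hadoop").filter("range", year={"gte": 2015})` produces a similar `bool` structure through its `to_dict()` method.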
 Kafka:
 pip install kafka-python
 pip install pykafka
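Kafka stores message values as bytes. kafka-python's `KafkaProducer` accepts a `value_serializer` callable and `KafkaConsumer` a `value_deserializer` callable to convert between Python objects and those bytes. The sketch below shows a pair of hypothetical JSON codec functions (stdlib only) that could be plugged into those parameters.

```python
import json
from typing import Any


def json_serializer(value: Any) -> bytes:
    """Python object -> compact UTF-8 JSON bytes (usable as value_serializer)."""
    return json.dumps(value, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def json_deserializer(raw: bytes) -> Any:
    """UTF-8 JSON bytes -> Python object (usable as value_deserializer)."""
    return json.loads(raw.decode("utf-8"))
```

A possible usage: `KafkaProducer(bootstrap_servers="localhost:9092", value_serializer=json_serializer)`, then `producer.send("events", {"user": 1})` and `producer.flush()`. On the consumer side, `KafkaConsumer("events", bootstrap_servers="localhost:9092", value_deserializer=json_deserializer)` yields messages whose `.value` is already a dict. The broker address and topic name are placeholders.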
 Spark:
 Flink:
 
 Kylin:
 kylinpy

 Kudu:
 kudu-python
 Impala:
 impyla
 Apache Beam: apache-beam
 Big data visualization:
 pyecharts
 Hue
 Superset
 Scheduling systems:
 luigi
 airflow
 Security:


Druid.io (official client):
pip install pydruid
Website: https://github.com/druid-io/pydruid
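pydruid wraps Druid's HTTP query API. To show what such a client sends, here is a stdlib-only sketch under stated assumptions. It builds a native `timeseries` query and POSTs it as JSON to a broker's `/druid/v2/` endpoint. The broker URL, datasource name and interval are placeholders, and `timeseries_query` and `post_query` are hypothetical helpers, not pydruid functions.

```python
import json
import urllib.request
from typing import Any, Dict, List


def timeseries_query(datasource: str, interval: str,
                     granularity: str = "day") -> Dict[str, Any]:
    """Native Druid timeseries query counting rows per time bucket."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "intervals": [interval],  # ISO-8601 interval, e.g. "2020-01-01/2020-02-01"
        # "count" counts Druid rows; with rollup this can differ from ingested events
        "aggregations": [{"type": "count", "name": "rows"}],
    }


def post_query(broker_url: str, query: Dict[str, Any],
               timeout: float = 30.0) -> List[Dict[str, Any]]:
    """POST a native query to <broker_url>/druid/v2/ and decode the JSON reply."""
    req = urllib.request.Request(
        broker_url.rstrip("/") + "/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `post_query("http://localhost:8082", timeseries_query("wikipedia", "2016-06-27/2016-06-28"))` would return a list of `{"timestamp": ..., "result": {"rows": ...}}` entries, if a broker is listening on that placeholder address.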
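Note:
Installing all of the packages above into the same environment causes dependency conflicts between them, so it is best to install them separately (for example, one virtual environment per component).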
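One way to keep each component's client libraries apart is a separate virtual environment per component. The sketch below uses the standard-library `venv` module. `create_isolated_env` and the directory layout are assumptions for illustration.

```python
import os
import venv


def create_isolated_env(base_dir: str, name: str, with_pip: bool = True) -> str:
    """Create (or recreate) a virtual environment at base_dir/name and return its path."""
    env_dir = os.path.join(base_dir, name)
    # clear=True wipes an existing env of the same name; with_pip bootstraps pip via ensurepip
    venv.create(env_dir, clear=True, with_pip=with_pip)
    return env_dir
```

For example, `create_isolated_env("envs", "kafka")` followed by `envs/kafka/bin/pip install kafka-python` (`envs\kafka\Scripts\pip` on Windows) installs the Kafka client without touching the environments used for Hive, Elasticsearch and the rest.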

 
