Spark-Related Learning Links (Continuously Updated)

Spark

  • Taking aim at Spark 1.6 (problem summary and pitfalls): http://www.tuicool.com/articles/2U36Zb
  • Spark Summit 2017 (February): https://spark-summit.org/east-2017/schedule/
  • Trends for Big Data and Apache Spark in 2017 by Matei Zaharia: https://www.slideshare.net/SparkSummit/trends-for-big-data-and-apache-spark-in-2017-by-matei-zaharia
  • Spark Streaming source code analysis: https://github.com/lw-lin/CoolplaySpark/tree/master/Spark%20Streaming%20源码解析系列
  • When to use Spark RDD/Dataset/DataFrame: https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html
  • Spark configuration reference: http://spark.apache.org/docs/latest/configuration.html
  • Why Spark sorting performs so well (covers several important issues): https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
  • A very good article on Spark shuffle: https://0x0fff.com/spark-architecture-shuffle/
  • Yahoo's optimizations for Spark: https://spark-summit.org/2013/wp-content/uploads/2013/10/Li-AEX-Spark-yahoo.pdf
  • Databricks Spark operations guide: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
  • Spark memory management: https://wongxingjun.github.io/2016/05/26/Spark内存管理/

TODO:

  • Ad click prediction with Spark: http://lxw1234.com/archives/2016/01/595.htm
  • Collected Spark issues: http://blog.leanote.com/post/anglema/总结-spark-问题

Flink

  • Flink Scheduler: http://chenyuzhao.me/2017/02/09/flink-scheduler/
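  • Apache Flink (features, concepts, component stack, architecture, and internals): http://shiyanjun.cn/archives/1508.html
  • Flink internals and implementation (architecture and topology overview): http://wuchong.me/blog/2016/05/03/flink-internals-overview/
  • ResourceManager source code analysis: http://zengzhaozheng.blog.51cto.com/8219051/1438204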

Java

  • Java errors from compiling with a newer JDK and running on an older one: http://www.jianshu.com/p/f4996b1ccf2f

Hadoop

  • ResourceManager high-availability configuration: http://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
  • ResourceManager architecture and internals: http://www.jianshu.com/p/626b66fa65db
  • How the HDFS ZKFailoverController works: http://blog.csdn.net/zkq_1986/article/details/54952738
  • How Hadoop YARN implements memory resource isolation: http://blog.csdn.net/a860mhz/article/details/50618555
  • CGroup configuration for a YARN cluster: http://www.jianshu.com/p/e283ab7e2530
  • HDFS Recovery Processes:http://blog.cloudera.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/

Git

  • Viewing the commit history of a single file: http://www.cnblogs.com/flyme/archive/2011/11/28/2265899.html

Talks

  • Talks by experts on InfoQ: http://www.infoq.com/cn/netease/presentations/

HBase

  • HBase snapshot process in detail: http://www.cnblogs.com/foxmailed/p/3914117.html

MySQL

  • MySQL backup summary: http://www.cnblogs.com/liangshaoye/p/5464794.html
  • MySQL binlog concepts: http://blog.csdn.net/wyzxg/article/details/7412777
  • Paper on database isolation levels: http://www.cs.umb.edu/~poneil/iso.pdf
  • MySQL redo log and recovery: http://www.cnblogs.com/liuhao/p/3714012.html
  • Introduction to isolation levels: http://blog.csdn.net/qq_33290787/article/details/51924963

Maven

  • Maven Plugins: http://maven.apache.org/plugins/index.html

Kafka

  • Kafka configuration options: http://blog.csdn.net/vegetable_bird_001/article/details/51858915
  • Kafka performance parameters and load tuning: http://blog.csdn.net/stark_summer/article/details/50203133

Hive

  • How Hive executes queries: http://tech.meituan.com/hive-sql-to-mapreduce.html
  • Learning Hive together: http://lxw1234.com/archives/2015/09/476.htm

Blogs

  • 大数据田地 (Big Data Field): http://lxw1234.
  • Alibaba's new storage engine: http://www.infoq.com/cn/news/2017/08/ali-polardb?from=timeline&isappinstalled=0#tt_daymode=1&tt_font=m?&tt_from=weixin_moments&utm_source=weixin_moments&utm_medium=toutiao_ios&utm_campaign=client_share&wxshare_count=1
