【Reading】2013-05, 06, 07, 08

  • http://hortonworks.com/blog/moving-hadoop-beyond-batch-with-apache-yarn/ Analyzes why Hadoop YARN came about, mainly from the perspective of SQL in Hadoop.
  • http://blog.cloudera.com/blog/2013/05/cloudera-development-kit-cdk/ Cloudera previously released Cloudera Manager, aimed mainly at administrators and operations staff; now it has finally released the Cloudera Development Kit for developers.
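  • http://blog.cloudera.com/blog/2013/05/extending-the-data-warehouse-with-hadoop/ Presents Cloudera's view that Hadoop will not replace existing data infrastructure such as the data warehouse. Instead, Hadoop complements it, for example by serving as a staging area between transactional systems and the warehouse.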
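  • http://www.cs.umd.edu/users/pugh/java/memoryModel/ Item 66 of Effective Java says: "When multiple threads share mutable data, each thread that reads or writes the data must perform synchronization." This applies even when the data is a primitive such as an int or a long: reads and writes of an int are atomic, but atomicity alone does not make a write visible to other threads (and writes to a non-volatile long or double are not even guaranteed to be atomic). The reason lies in the Java Memory Model: "it does not guarantee that a value written by one thread will be visible to another". This page explains the Memory Model in some detail: because of processor-local caches and code reordering, threads that do not synchronize may see inconsistent data.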
  • http://developer.yahoo.com/blogs/hadoop/next-generation-apache-hadoop-mapreduce-3061.html A post on YARN by Arun C. Murthy; clear and easy to follow.
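  • http://www.eecs.harvard.edu/~mdw/papers/events.pdf A paper from around 2000 that discusses the two basic strategies for building highly concurrent systems, thread-based and event-driven, lays out the strengths and weaknesses of each, and proposes a hybrid approach.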
  • http://act2.me/full-stack-web-development/ A short, concise article on the history of how web development has evolved.
  • http://www.infoq.com/news/2009/08/google-chose-jetty 
  • http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/ Several ways to improve the performance of MapReduce programs.
  • http://drankye.wordpress.com/2012/11/20/understanding-hadoop-kerberos-authentication/ Describes how Hadoop's Kerberos authentication works, along with several of the frameworks and APIs it relies on.
  • http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ Explains why Hadoop is a poor fit for processing large numbers of small files.
  • http://www.programcreek.com/2013/09/top-10-methods-for-java-arrays/ Very practical tips for working with Java arrays.
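
For the Java Memory Model item (Effective Java Item 66), here is a minimal sketch of the stop-flag situation that item discusses. The class and method names are hypothetical. Both the write and the read of the flag go through synchronized methods, so the worker thread is guaranteed to see the main thread's write and to terminate. Without that synchronization (or a volatile field), the JMM would allow the worker to spin forever.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical example: a stop flag shared between two threads.
public class StopFlag {
    private boolean stopRequested; // guarded by "this"

    // Writer side: synchronized so the write is published to other threads.
    public synchronized void requestStop() {
        stopRequested = true;
    }

    // Reader side: must also be synchronized; synchronizing only the writer is not enough.
    public synchronized boolean isStopRequested() {
        return stopRequested;
    }

    // Starts a worker that spins until the flag is set and returns how many loop iterations it ran.
    public static long runWorker(long millisBeforeStop) throws InterruptedException {
        StopFlag flag = new StopFlag();
        long[] count = new long[1];
        Thread worker = new Thread(() -> {
            long i = 0;
            while (!flag.isStopRequested()) {
                i++;
            }
            count[0] = i;
        });
        worker.start();
        TimeUnit.MILLISECONDS.sleep(millisBeforeStop);
        flag.requestStop();
        worker.join(); // join() also makes the worker's write to count[0] visible here
        return count[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped after " + runWorker(100) + " iterations");
    }
}
```

Declaring the field volatile would be a lighter alternative for a plain flag, since only visibility is needed here, not mutual exclusion.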
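
For the thread-based vs event-driven paper, the sketch below shows one simplified reading of a hybrid design, under that assumption: requests travel as events through a chain of stages, and each stage owns its own queue and a small, fixed thread pool. This is not the paper's code. `Stage`, `enqueue` and `handleAll` are hypothetical names, and the queue is simply the one inside `Executors.newFixedThreadPool`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical "stage": events wait in the pool's internal queue and are
// handled by a small, fixed number of threads owned by this stage.
public class Stage<I, O> {
    private final ExecutorService pool;
    private final Function<I, O> handler;

    public Stage(int threads, Function<I, O> handler) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.handler = handler;
    }

    // Enqueue an event; the caller gets a future instead of blocking on the work.
    public CompletableFuture<O> enqueue(I event) {
        return CompletableFuture.supplyAsync(() -> handler.apply(event), pool);
    }

    public void shutdown() {
        pool.shutdown();
    }

    // Two chained stages: trim the request, then build a response.
    public static List<String> handleAll(List<String> requests) throws Exception {
        Stage<String, String> parse = new Stage<>(2, String::trim);
        Stage<String, String> respond = new Stage<>(4, s -> "OK " + s.toUpperCase());
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String r : requests) {
                // The output event of one stage becomes the input event of the next.
                pending.add(parse.enqueue(r).thenCompose(respond::enqueue));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : pending) {
                results.add(f.get()); // collected in request order
            }
            return results;
        } finally {
            parse.shutdown();
            respond.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handleAll(Arrays.asList(" hello ", "world")));
    }
}
```

Because each stage has a bounded number of threads, the total thread count stays fixed no matter how many requests are in flight. The event queues absorb bursts instead of spawning a thread per request.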
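
For the small-files item, the core argument is arithmetic. The article uses a rule of thumb: every file, directory and block is an object in NameNode memory costing about 150 bytes, so 10 million single-block files need about 3 GB. The back-of-the-envelope sketch below follows that rule. It ignores directories and assumes a 64 MB block size; the class name and the comparison case are hypothetical.

```java
// Back-of-the-envelope NameNode heap estimate (hypothetical helper).
public class NameNodeMemory {
    // Rule of thumb from the article: ~150 bytes of heap per file, directory or block object.
    static final long BYTES_PER_OBJECT = 150;

    // Directories are ignored; each file needs ceil(fileSize / blockSize) blocks.
    static long estimateBytes(long files, long fileSizeMb, long blockSizeMb) {
        long blocksPerFile = (fileSizeMb + blockSizeMb - 1) / blockSizeMb;
        long objects = files + files * blocksPerFile;
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // The same 10,000,000 MB of data stored two ways, assuming 64 MB blocks.
        long manySmall = estimateBytes(10_000_000L, 1, 64); // 10 million 1 MB files
        long fewLarge = estimateBytes(10_000L, 1_000, 64);  // 10 thousand 1000 MB files
        System.out.println("small files: " + manySmall + " bytes"); // 3000000000 bytes, about 3 GB
        System.out.println("large files: " + fewLarge + " bytes");  // 25500000 bytes, about 25.5 MB
    }
}
```

Packing the same data into fewer, larger files cuts the object count from 20 million to 170,000. This is why the article recommends container formats such as HAR files or SequenceFiles.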
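
For the Java arrays item, here are a few JDK-only versions of common array tricks. They are in the spirit of the article, not a copy of its list (the article also uses Apache Commons helpers, which are left out here). `ArrayTips` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ArrayTips {
    // Arrays.toString prints the elements instead of the array's type@hash.
    static String show(int[] a) {
        return Arrays.toString(a);
    }

    // Arrays.asList returns a fixed-size view; copy it into an ArrayList to allow add/remove.
    static List<String> toMutableList(String[] a) {
        return new ArrayList<>(Arrays.asList(a));
    }

    // Works for object arrays; for an int[], Arrays.asList would produce a List<int[]> instead.
    static boolean contains(String[] a, String v) {
        return Arrays.asList(a).contains(v);
    }

    static Set<String> toSet(String[] a) {
        return new HashSet<>(Arrays.asList(a));
    }

    // Collections.reverse on the asList view writes through to the underlying array.
    static void reverseInPlace(String[] a) {
        Collections.reverse(Arrays.asList(a));
    }

    // Concatenate two arrays using only the JDK.
    static int[] concat(int[] x, int[] y) {
        int[] out = Arrays.copyOf(x, x.length + y.length);
        System.arraycopy(y, 0, out, x.length, y.length);
        return out;
    }

    public static void main(String[] args) {
        String[] letters = {"a", "b", "c"};
        System.out.println(contains(letters, "b"));   // true
        reverseInPlace(letters);
        System.out.println(String.join(",", letters)); // c,b,a
        System.out.println(show(concat(new int[]{1, 2}, new int[]{3}))); // [1, 2, 3]
        List<String> list = toMutableList(letters);
        list.add("d");
        System.out.println(list);                      // [c, b, a, d]
        System.out.println(toSet(letters).size());     // 3
    }
}
```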
