Spark usage summary

1. RDD: Resilient Distributed Dataset
Reference: http://developer.51cto.com/art/201309/410276_1.htm
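A quick way to get a feel for RDDs is to build one in spark-shell and run a transformation and an action; the snippet below is a minimal sketch (the collection and filter are made up for illustration), using the sc that spark-shell provides:

val nums = sc.parallelize(1 to 100)   // distribute a local collection as an RDD
val evens = nums.filter(_ % 2 == 0)   // transformations are lazy, nothing runs yet
println(evens.count())                // an action triggers computation; prints 50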
2. Using spark-shell

./spark-shell --driver-library-path /usr/local/hadoop-1.1.2/lib/native/Linux-i386-32:/usr/local/hadoop-1.1.2/lib/native/Linux-amd64-64:/usr/local/hadoop-1.1.2/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar
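Once the shell is up, the SparkContext is available as sc; a one-liner like the following (a sanity check, not part of the original notes) confirms the shell is working:

sc.parallelize(1 to 10).reduce(_ + _)   // should print res0: Int = 55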
3. WordCount program
val file = sc.textFile("hdfs://192.168.100.99:9000/user/chaobo/test/tmp/2014/07/07/hive-site.xml.lzo")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
Print the result to the screen: count.collect()
Write the result to HDFS: count.saveAsTextFile("hdfs://192.168.100.99:9000/user/chaobo/result_20140707") (the last directory in the path must not already exist)
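For reference, the same job can be packaged as a standalone application instead of shell input. This is only a sketch: the object name is a placeholder, and input/output paths are taken from args rather than hard-coded:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // brings in pair-RDD operations such as reduceByKey

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    val count = sc.textFile(args(0))   // input path, e.g. an hdfs:// URI
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    count.saveAsTextFile(args(1))      // output directory must not exist yet
    sc.stop()
  }
}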
4. Start the master node
../sbin/start-master.sh
5. Start a worker node
../sbin/start-slave.sh --webui-port 8081
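With the master and a worker running, spark-shell can be pointed at the cluster instead of running locally. The host below reuses the address from the earlier examples and 7077 is the default master port, both assumptions; the actual master URL is shown on the master's web UI:

./spark-shell --master spark://192.168.100.99:7077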
