Create a Maven project in IDEA -> configure the Huawei Cloud mirror. For the Kerberos authentication code, refer to Huawei's sample code.
-> Put the cluster client configuration files from step 3 and the user credentials (user.keytab, krb5.conf) into the project's resources directory, and change every IP address in the client configuration files to the corresponding hostname from the hosts file.
-> Add the following configuration to hdfs-site.xml. When the local test PC and the cluster are not on the same LAN, the NameNode answers a local HDFS access with the addresses of the DataNodes that hold the data, but it may return the DataNodes' private internal IPs, which we cannot use to reach those DataNodes. The configuration below makes the NameNode return DataNode hostnames instead; since we have already mapped every node's public IP in the local hosts file, the local machine can then reach the data in HDFS through those hostnames.
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>only configure this on clients</description>
</property>
-> The Spark code that accesses the cloud HDFS is as follows:
package com.huawei.bigdata.spark.examples

import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext._

import com.huawei.hadoop.security.LoginUtil

object FemaleInfoCollection {
  def main(args: Array[String]) {
    // Security mode: log in with the Kerberos principal and keytab before touching HDFS.
    val userPrincipal = "spark_wang"
    val filePath = System.getProperty("user.dir") + File.separator + "resources" + File.separator
    val userKeyTableFile = filePath + "user.keytab"
    val krbFile = filePath + "krb5.conf"
    val hadoopConf: Configuration = new Configuration()
    // hadoopConf.set("dfs.client.use.datanode.hostname", "true") // already set in hdfs-site.xml
    LoginUtil.login(userPrincipal, userKeyTableFile, krbFile, hadoopConf)

    // Configure the Spark application name.
    val conf = new SparkConf().setAppName("CollectFemaleInfo").setMaster("local")
    // Initialize Spark.
    val sc = new SparkContext(conf)
    // Read data. The HDFS address comes from the client configuration files by default; a full
    // URI such as hdfs://node-master1bcgx:9820/user/spark_wang/female-info.txt also works.
    val text = sc.textFile("/user/spark_wang/female-info.txt")
    // Keep only the records of female netizens.
    val data = text.filter(_.contains("female"))
    // Sum up the time each female netizen spends online.
    val femaleData: RDD[(String, Int)] = data.map { line =>
      val t = line.split(',')
      (t(0), t(2).toInt)
    }.reduceByKey(_ + _)
    // Keep the female netizens whose total online time exceeds the threshold (10 here) and print the results.
    val result = femaleData.filter(line => line._2 > 10)
    result.collect().map(x => x._1 + ',' + x._2).foreach(println)
    sc.stop()
  }
}
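For reference, split(',') together with t(0) and t(2).toInt implies input lines of the form name,gender,online-time; for example (made-up records, not the actual data set):
LiuYang,female,20
CaiXuyu,female,50
WangYifan,male,10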
I just installed RedHat Linux Enterprise AS 4 on an IBM Xserver machine and configured dual-NIC bonding to improve network reliability.
1. Environment
The RedHat Linux Enterprise AS 4 machine has a dual-port Intel Gigabit NIC installed; ifconfig -a shows the two interfaces eth0 and eth1.
2. Dual-NIC bonding steps:
2.1 Modify /etc/sysconfig/network
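A typical RHEL 4 bonding setup along these lines touches the files sketched below (the IP address, netmask, and bonding mode are illustrative assumptions, not the author's exact values):

# /etc/sysconfig/network-scripts/ifcfg-bond0 (address is a placeholder)
DEVICE=bond0
BOOTPROTO=static
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
ONBOOT=yes

# /etc/modprobe.conf: load the bonding driver;
# mode=1 (active-backup) favors reliability, miimon=100 checks the link every 100 ms
alias bond0 bonding
options bond0 miimon=100 mode=1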
1. AdviceMethods.java
package com.bijian.study.spring.aop.schema;

public class AdviceMethods {
    public void preGreeting() {
        System.out.println("--how are you!--");
    }
}
2. beans.xml
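A minimal schema-based AOP configuration that wires preGreeting() in as before advice might look like the sketch below (the bean id and the pointcut expression are assumptions for illustration):

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/aop
           http://www.springframework.org/schema/aop/spring-aop.xsd">

    <bean id="adviceMethods" class="com.bijian.study.spring.aop.schema.AdviceMethods"/>

    <!-- Run preGreeting() before every method matched by the pointcut. -->
    <aop:config>
        <aop:aspect ref="adviceMethods">
            <aop:before method="preGreeting"
                        pointcut="execution(* com.bijian.study.spring.aop.schema.*.*(..))"/>
        </aop:aspect>
    </aop:config>
</beans>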
Data reliability for real-time computation, Spark Streaming included, comes in three levels:
1. At most once: each record is received at most once and may not be received at all.
2. At least once: each record is received at least once and may be received repeatedly.
3. Exactly once: each record is guaranteed to be processed, and processed only once.
For the details, read http://spark.apache.org/docs/latest a few times.
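As one concrete illustration, checkpointing plus StreamingContext.getOrCreate is the standard Spark Streaming pattern for recovering the driver and getting at-least-once behavior; a minimal sketch, where the checkpoint directory, host, and port are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedWordCount {
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint" // placeholder path

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointedWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir) // persist DStream metadata and lineage
    ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart, rebuild the context from the checkpoint instead of from scratch.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}

Exactly-once additionally requires idempotent or transactional output on top of this kind of recovery, which is why the docs above reward a repeated read.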
For the detailed approach, see: http://zhedahht.blog.163.com/blog/static/25411174200712895228171/
import java.util.ArrayDeque;
import java.util.Deque;

public class MinStack {
    // Maybe a plain array would do rather than a collection; two stacks keep it simple:
    // "mins" tracks the minimum of everything pushed so far, so min() is O(1).
    private final Deque<Integer> data = new ArrayDeque<>();
    private final Deque<Integer> mins = new ArrayDeque<>();
    public void push(int x) { data.push(x); mins.push(mins.isEmpty() ? x : Math.min(x, mins.peek())); }
    public int pop() { mins.pop(); return data.pop(); }
    public int min() { return mins.peek(); }
}
Detailed usage notes for the date conversion function
DATE_FORMAT(date, format) formats the date value according to the format string. The following specifiers may be used in the format string.
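For instance (standard examples from the MySQL manual; %W is the weekday name, %M the month name, %Y the four-digit year):
SELECT DATE_FORMAT('2009-10-04 22:23:00', '%W %M %Y');
-- -> 'Sunday October 2009'
SELECT DATE_FORMAT('2007-10-04 22:23:00', '%H:%i:%s');
-- -> '22:23:00'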