Change the hostnames of the three machines (takes effect without a reboot; just reconnect):
hostnamectl set-hostname master
hostnamectl set-hostname slave1
hostnamectl set-hostname slave2
Turn off the firewall on all three machines:
systemctl stop firewalld.service
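To confirm the firewall is stopped, and, if the competition rules allow, keep it from starting again on boot, these standard systemctl commands can be used:
systemctl status firewalld.service
systemctl disable firewalld.service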
IP mapping:
vim /etc/hosts
<internal IP 1> master
<internal IP 2> slave1
<internal IP 3> slave2
Change the time zone on all three machines:
tzselect    (choose Asia, China, Beijing Time, then confirm; typically the sequence 5, 9, 1, 1)
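As an alternative (a hedged sketch; tzselect itself only prints the TZ value to put in the profile, it does not change the system setting), on a systemd system such as CentOS 7 the time zone can be set directly:
timedatectl set-timezone Asia/Shanghai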
Install the ntp service on all three machines:
yum install -y ntp
Use master as the NTP server and edit the NTP configuration file (run on master):
vim /etc/ntp.conf
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Restart the ntp service on master:
systemctl restart ntpd.service
Sync slave1 and slave2 against master:
ntpdate master
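To check that the sync worked, run date on all three machines and compare; on master, ntpq -p (from the ntp package) lists the configured time sources:
date
ntpq -p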
For the competition, the following must be added to the environment variables and take effect (all three machines):
vim /etc/profile
TZ='Asia/Shanghai'; export TZ
Save and exit
source /etc/profile
Schedule slave1 and slave2 to sync time with the master node every half hour between 10:00 and 17:00 (24-hour clock), using root's crontab:
crontab -e
Add the following:
*/30 10-17 * * * /usr/sbin/ntpdate master
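To confirm the job was saved, list root's crontab:
crontab -l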
Passwordless SSH (method 1):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id localhost
master:
cat id_rsa.pub >> authorized_keys
(Note: run this under ~/.ssh/. Check authorized_keys first; if the key is already in authorized_keys, there is no need to append it again.)
ssh master
exit
slave1, slave2:
scp master:~/.ssh/id_rsa.pub ~/.ssh/master_rsa.pub
cat master_rsa.pub >> authorized_keys
master:
ssh slave1
ssh slave2
Method 2:
ssh-keygen -t rsa
All three machines copy their keys to the first machine (master).
Command: ssh-copy-id master
Copy the first machine's authorized_keys to the other machines:
scp /root/.ssh/authorized_keys slave1:/root/.ssh
scp /root/.ssh/authorized_keys slave2:/root/.ssh
ssh-copy-id localhost
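To verify passwordless login (with either method), run a remote command from master; no password prompt should appear:
ssh slave1 date
ssh slave2 date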
JDK installation:
mkdir -p /usr/java
tar -zxvf /usr/package/jdk-8u171-linux-x64.tar.gz -C /usr/java/
vim /etc/profile
Add the following:
export JAVA_HOME=/usr/java/jdk1.8.0_171
export CLASSPATH=$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
export PATH JAVA_HOME CLASSPATH
Save and exit
source /etc/profile
scp -r /usr/java/ slave1:/usr/
scp -r /usr/java/ slave2:/usr/
Configure the environment variables on the slave nodes as well and make them take effect.
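A quick check that the JDK is visible on each node:
java -version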
ZooKeeper installation:
mkdir -p /usr/zookeeper
tar -zxvf /usr/package/zookeeper-3.4.10.tar.gz -C /usr/zookeeper/
vim /etc/hosts
192.168.57.110 master master.root
192.168.57.111 slave1 slave1.root
192.168.57.112 slave2 slave2.root
Go to the configuration directory conf:
zoo.cfg:
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
## Modify the following:
dataDir=/usr/zookeeper/zookeeper-3.4.10/zkdata
## Add the following:
dataLogDir=/usr/zookeeper/zookeeper-3.4.10/zkdatalog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
myid:
Go to the ZooKeeper home directory (/usr/zookeeper/zookeeper-3.4.10):
mkdir zkdata
mkdir zkdatalog
vim zkdata/myid
1
Save and exit
scp -r /usr/zookeeper/ root@slave1:/usr/
scp -r /usr/zookeeper/ root@slave2:/usr/
On slave1, change myid to 2.
On slave2, change myid to 3 (see the sketch below).
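For example (a minimal sketch, assuming the paths copied above), the file can simply be overwritten:
echo 2 > /usr/zookeeper/zookeeper-3.4.10/zkdata/myid    (on slave1)
echo 3 > /usr/zookeeper/zookeeper-3.4.10/zkdata/myid    (on slave2)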
Configure environment variables on all three machines:
vim /etc/profile
## ZOOKEEPER
export ZOOKEEPER_HOME=/usr/zookeeper/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
source /etc/profile
Start ZooKeeper on all three machines:
zkServer.sh start
zkServer.sh status
If the cluster is healthy, status reports Mode: leader on one node and Mode: follower on the other two.
Hadoop installation:
mkdir -p /usr/hadoop
tar -zxvf /usr/package/hadoop-2.7.3.tar.gz -C /usr/hadoop/
Configure environment variables on all three machines:
vim /etc/profile
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile
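A quick sanity check that the variables took effect:
hadoop version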
Go to the configuration directory $HADOOP_HOME/etc/hadoop:
hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_171
core-site.xml (add the following inside <configuration>):
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/hadoop/hadoop-2.7.3/hdfs/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>60</value>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
</property>
yarn-site.xml (add the following inside <configuration>):
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:18040</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:18030</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:18088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:18025</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:18141</value>
</property>
<property>
  <name>yarn.resourcemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
yarn-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_171
slaves:
vim slaves
slave1
slave2
master:
vim master
master
hdfs-site.xml (add the following inside <configuration>):
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/hadoop/hadoop-2.7.3/hdfs/name</value>
  <final>true</final>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/hadoop/hadoop-2.7.3/hdfs/data</value>
  <final>true</final>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>master:9001</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
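The stock Hadoop 2.7.3 distribution ships only mapred-site.xml.template; if mapred-site.xml is missing here, create it from the template before editing:
cp mapred-site.xml.template mapred-site.xml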
mapred-site.xml (add the following inside <configuration>):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
scp -r /usr/hadoop/ root@slave1:/usr/
scp -r /usr/hadoop/ root@slave2:/usr/
source /etc/profile
Verification: format and start only after all the previous steps have been verified.
hadoop namenode -format
sbin/start-all.sh
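After start-up, jps on each node lists the running daemons; one would expect NameNode, SecondaryNameNode and ResourceManager on master (plus QuorumPeerMain if ZooKeeper is running), and DataNode and NodeManager on the slaves:
jps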
On slave2, start the MySQL service and reset the password:
systemctl start mysqld.service
grep "temporary password" /var/log/mysqld.log 查看初始密码
mysql -uroot -p
set global validate_password_policy=0;
set global validate_password_length=4;
alter user 'root'@'localhost' identified by '123456';
\q
mysql -uroot -p123456
create user 'root'@'%' identified by '123456';
grant all privileges on *.* to 'root'@'%' with grant option;
flush privileges;
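To confirm remote access works (assuming the mysql client is installed on the node you test from), a quick check from master or slave1:
mysql -h slave2 -uroot -p123456 -e 'show databases;'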
Extract and install on master and slave1:
mkdir -p /usr/hive
tar -zxvf /usr/package/apache-hive-2.1.1-bin.tar.gz -C /usr/hive
Set the Hive environment variable ($HIVE_HOME) on master and slave1 in /etc/profile:
vim /etc/profile
## HIVE
export HIVE_HOME=/usr/hive/apache-hive-2.1.1-bin
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile
Set up the Hive runtime environment on master and slave1 (in the conf directory):
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
export HIVE_CONF_DIR=/usr/hive/apache-hive-2.1.1-bin/conf
export HIVE_AUX_JARS_PATH=/usr/hive/apache-hive-2.1.1-bin/lib
Save and exit
Resolve the jline version conflict (on master and slave1):
cp /usr/hive/apache-hive-2.1.1-bin/lib/jline-2.12.jar /usr/hadoop/hadoop-2.7.3/share/hadoop/yarn/lib/
Copy the MySQL JDBC driver on slave1 (the dependency jar is stored under /usr/package/):
cp /usr/package/mysql-connector-java-5.1.47-bin.jar /usr/hive/apache-hive-2.1.1-bin/lib/
Configure hive-site.xml on slave1 (inside <configuration>):
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive_remote/warehouse</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://slave2:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.schema.autoCreateAll</name>
  <value>true</value>
</property>
Configure hive-site.xml on master (inside <configuration>):
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive_remote/warehouse</value>
</property>
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://slave1:9083</value>
</property>
Initialize the metastore database, start the metastore service, and open the client environment (slave1 and master).
On slave1:
Initialize the schema: bin/schematool -dbType mysql -initSchema
Start the metastore service: bin/hive --service metastore
If initialization fails: first check that the account has permission to connect to MySQL on slave2 (grant it if not), drop the hive database in MySQL if it already exists, delete the local metastore_db directory under the Hive home, then rerun bin/schematool -dbType mysql -initSchema and start the metastore again with bin/hive --service metastore.
On master (the client side needs no initialization), simply start the client: bin/hive
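If the metastore should keep running after the terminal is closed, one common approach (a hedged sketch, run on slave1 from $HIVE_HOME) is to background it with nohup:
nohup bin/hive --service metastore &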
Full startup sequence for reference (format the NameNode only on the first start):
hadoop namenode -format
sbin/start-all.sh
systemctl start mysqld.service
bin/schematool -dbType mysql -initSchema
bin/hive
create external table person(age double, workclass string, fnlwgt string, edu string, edu_num double, marital_status string, occupation string, relationship string, race string, sex string, gain string, loss string, hours double, native string, income string) row format delimited fields terminated by ',';
load data local inpath '/root/college/person.csv' into table person;
insert overwrite local directory '/root/person00' row format delimited fields terminated by ',' select count(*) from person;
insert overwrite local directory '/root/person03' row format delimited fields terminated by ',' select round(avg(age)) from person;
insert overwrite local directory '/root/person04' row format delimited fields terminated by ',' select count(*) from person where age between 35 and 40 and marital_status == 'Never-married';
insert overwrite local directory '/root/person05' row format delimited fields terminated by ',' select count(*) from person where hours between 20 and 30 and occupation == 'Tech-support';
insert overwrite local directory '/root/person06' row format delimited fields terminated by ',' select * from(select count(*) as s from person group by race) w order by w.s desc;
select * from(select count(*) as s from person group by race) w order by w.s desc;
select count(*) as x from person group by race order by x desc;
create table student(Id Int, Name String, Age Int, Sex string) row format delimited fields terminated by ',';
alter table student add columns (address string);
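To confirm the new column was added, the table can be described (run from the shell, with the hive client on the PATH):
hive -e 'desc student;'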
Scala installation:
mkdir -p /usr/scala
tar -zxvf <scala tarball> -C /usr/scala
vim /etc/profile
export SCALA_HOME=/usr/scala/scala-2.11.12
export PATH=$SCALA_HOME/bin:$PATH
Save and exit
source /etc/profile
scala -version
scp -r /usr/scala root@slave1:/usr/
scp -r /usr/scala root@slave2:/usr/
Configure the environment variables on the slave nodes as well.
Spark installation:
mkdir -p /usr/spark
tar -zxvf <spark tarball> -C /usr/spark
Go to the conf directory:
spark-env.sh:
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export SPARK_MASTER_IP=master
export SCALA_HOME=/usr/scala/scala-2.11.12
export SPARK_WORKER_MEMORY=8g
export JAVA_HOME=/usr/java/jdk1.8.0_171
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
export HADOOP_CONF_DIR=/usr/hadoop/hadoop-2.7.3/etc/hadoop
slaves:
cp slaves.template slaves
vim slaves
slave1
slave2
vim /etc/profile
export SPARK_HOME=/usr/spark/spark-2.4.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
source /etc/profile
scp -r /usr/spark root@slave1:/usr/
scp -r /usr/spark root@slave2:/usr/
Configure the environment variables on the slave nodes as well and make them take effect.
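To bring up the standalone cluster and check it (a hedged sketch; Spark's start-all.sh shares its name with Hadoop's, so the full path avoids ambiguity), run on master and then verify with jps, expecting a Master process on master and Worker processes on the slaves; the web UI listens on master:8080 by default:
/usr/spark/spark-2.4.0-bin-hadoop2.7/sbin/start-all.sh
jps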
TenMain.java (job driver; the job counts records whose date field, column 6, is 2014-12-11 and whose behavior type, column 4, is 4):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.net.URI;

public class TenMain extends Configured implements Tool {
    public int run(String[] strings) throws Exception {
        Job job = Job.getInstance(super.getConf(), "mapreduce_x");

        // Input: plain-text records under the dataset directory
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("hdfs://mycluster/bigdatacase/dataset"));

        // Mapper emits (1, 1) for every matching record
        job.setMapperClass(TenMapper.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Reducer sums the ones and writes the total count
        job.setReducerClass(TenReduce.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);

        job.setOutputFormatClass(TextOutputFormat.class);
        Path path = new Path("hdfs://mycluster/usr/output");
        TextOutputFormat.setOutputPath(job, path);

        // Delete the output directory if it already exists
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://mycluster/usr/output"), new Configuration());
        if (fileSystem.exists(path)) {
            fileSystem.delete(path, true);
        }

        boolean b1 = job.waitForCompletion(true);
        return b1 ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        int run = ToolRunner.run(configuration, new TenMain(), args);
        System.exit(run);
    }
}
TenMapper.java:
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class TenMapper extends Mapper<LongWritable, Text, LongWritable, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Fields are tab-separated; column 4 is the behavior type, column 6 the date
        String[] split = value.toString().split("\t");
        if (split.length > 5 && split[5].equals("2014-12-11") && split[3].equals("4")) {
            // Emit a constant key so all counts meet in a single reduce group
            context.write(new LongWritable(1), new LongWritable(1));
        }
    }
}
TenReduce.java:
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class TenReduce extends Reducer<LongWritable, LongWritable, LongWritable, NullWritable> {
    @Override
    protected void reduce(LongWritable key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper to get the total record count
        long sum = 0;
        for (LongWritable i : values) {
            sum += i.get();
        }
        context.write(new LongWritable(sum), NullWritable.get());
    }
}
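To run the job (a hedged sketch; the jar name and the absence of a package declaration are assumptions, adjust to the actual build), package the three classes and submit with hadoop jar:
hadoop jar ten.jar TenMain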