Notes from a beginner learning big data.
Prerequisites:
1. Virtual machine: VMware 10
2. OS: CentOS 6.5 x86_64
3. NetEase (163) yum mirror: CentOS6-Base-163.repo
4. hadoop-2.6.2.tar
--Hadoop core components
jdk-7u79-linux-x64.tar
--Java
hbase-1.1.0.1-bin.tar
--HBase; check which Hadoop versions it supports
sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar
--Sqoop, the data import/export tool
apache-hive-2.0.0-bin.tar
--Hive
ojdbc6.jar
--Oracle JDBC driver
mysql-connector-java-5.1.38.tar
--MySQL JDBC driver
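System installation:
1. Install CentOS 6.5 with as few packages as possible; the installation steps themselves are not covered here.
Three machines in total: master, slave1, slave2
2. Network configuration: /etc/sysconfig/network-scripts/ifcfg-eth0 (the example below is from slave1; IPADDR, HWADDR and UUID differ on each machine)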
DEVICE=eth0
HWADDR=00:0C:29:D2:D1:D8
TYPE=Ethernet
UUID=1a852f8f-0eec-4ff6-a105-9ed88a7df754
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=192.168.10.201
NETMASK=255.255.255.0
GATEWAY=192.168.10.1
DNS1=8.8.8.8
DNS2=114.114.114.114
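An ifcfg file is a plain list of KEY=value pairs, and a misspelled key name is not reported as an error; the setting is simply never applied. The sketch below is a minimal check for that, not part of the original notes. The function name check_ifcfg and the list of expected keys are assumptions for a static-IP setup like this one.

```shell
# check_ifcfg FILE: print every expected key that FILE does not define.
# The key list is an assumption for a static-IP setup; extend it as needed.
check_ifcfg() {
  for key in DEVICE ONBOOT BOOTPROTO IPADDR NETMASK GATEWAY; do
    grep -q "^${key}=" "$1" || echo "missing: $key"
  done
}
```

For example, check_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0 prints nothing when all six keys are present.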
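3. yum repository
cp CentOS6-Base-163.repo /etc/yum.repos.d/
yum clean all
yum list openssh
4. Passwordless SSH login is required, because every distribution step of the Hadoop installation relies on SSH
yum install openssh-server openssh-clients
5. Disable the firewall and SELinux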
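Step 5 lists no commands. On CentOS 6 the usual ones are service, chkconfig and setenforce, plus an edit to /etc/selinux/config. The sketch below wraps only the file edit in a function (set_selinux_disabled is a name chosen here, not a system tool) and leaves the system-changing commands commented out, so sourcing the file changes nothing.

```shell
# set_selinux_disabled FILE: rewrite the SELINUX= line of an SELinux config file.
# FILE is normally /etc/selinux/config; the change takes effect after a reboot.
set_selinux_disabled() {
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Run as root on every node (commented out on purpose):
# service iptables stop        # stop the firewall now
# chkconfig iptables off       # keep it off after reboot
# setenforce 0                 # SELinux permissive until the next reboot
# set_selinux_disabled /etc/selinux/config
```

The pattern is anchored on SELINUX= so the SELINUXTYPE= line in the same file is left alone.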
6. Hostnames
/etc/hosts
--copy this file to the slaves
192.168.10.200 master
192.168.10.201 slave1
192.168.10.202 slave2
/etc/sysconfig/network
--copy this file to the slaves, then set HOSTNAME to slave1 or slave2 there
NETWORKING=yes
HOSTNAME=master
hostname master
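Every node needs an entry for all three names, so after copying /etc/hosts around it can be worth checking each copy. This is a minimal sketch; check_hosts is a hypothetical helper name, and it only looks at the file rather than testing real name resolution.

```shell
# check_hosts FILE NAME...: report every NAME that has no entry in FILE.
check_hosts() {
  file=$1
  shift
  for name in "$@"; do
    # Look for NAME as a whole field after the IP address, skipping comment lines.
    awk -v n="$name" '
      $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }
    ' "$file" || echo "no entry for $name"
  done
}
```

For example, check_hosts /etc/hosts master slave1 slave2 prints nothing when all three names are listed.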
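7. Configure passwordless (key-based) SSH login
/etc/ssh/sshd_config
--copy this file to the slaves
Uncomment the following lines (remove the leading #), then restart sshd with: service sshd restart
RSAAuthentication yes
PubkeyAuthentication yes
PasswordAuthentication yes
Run ssh-keygen -t rsa on each node to generate a key pair (accept the default at every prompt)
cd /root/.ssh/
Merge the public keys of the master and the slaves into authorized_keys
cat id_rsa.pub >> authorized_keys
ssh root@slave1 cat /root/.ssh/id_rsa.pub >> authorized_keys
ssh root@slave2 cat /root/.ssh/id_rsa.pub >> authorized_keys
Copy the merged authorized_keys to the slaves so that every node accepts every key
scp authorized_keys root@slave1:/root/.ssh/
scp authorized_keys root@slave2:/root/.ssh/
Verify the keys (mandatory): ssh to each node and confirm that no password is requested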
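The key verification can be scripted so that it never hangs on a password prompt. The sketch below relies on the standard ssh options BatchMode and ConnectTimeout. The function name check_ssh and the SSH_CMD variable are assumptions introduced here; SSH_CMD exists only so the function can be exercised without a real cluster.

```shell
# check_ssh HOST...: try a non-interactive login to each host and report the result.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# SSH_CMD is a hypothetical override for testing; it defaults to ssh.
check_ssh() {
  for host in "$@"; do
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "root@$host" hostname >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
    fi
  done
}
```

On the master, check_ssh master slave1 slave2 should print three OK lines before continuing with the Hadoop steps.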
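8. Install the JDK
tar zxvf jdk-7u79-linux-x64.tar
After extraction, move the directory to /usr/local/ and rename it java
Configure the environment in /etc/profile
--copy this file to the slaves (this can be done once at the end)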
export JAVA_HOME=/usr/local/java
export JRE_HOME=/usr/local/java/jre
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
. /etc/profile
Run java -version to test
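/etc/profile is edited again in later steps (Hadoop, HBase, Sqoop), and repeating an edit by hand can leave duplicate export lines. One way to avoid that is a small helper that appends a line only if it is not already present. This is a sketch; append_once is a name invented here.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line is already there.
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (single quotes keep $PATH and $JAVA_HOME unexpanded in the file):
# append_once /etc/profile 'export JAVA_HOME=/usr/local/java'
# append_once /etc/profile 'export PATH=$PATH:$JAVA_HOME/bin'
```

grep -x matches whole lines and -F treats the text literally, so characters such as $ and . in the export lines are not interpreted as a pattern.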
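9. Install Hadoop
tar zxvf hadoop-2.6.2.tar
Move the extracted directory to /opt and rename it hadoop
Configure the environment in /etc/profile
--copy this file to the slaves (this can be done once at the end)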
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_LOG_DIR=/opt/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
Edit the configuration files
/opt/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value> <!-- the HDFS host and port matter: hbase.rootdir below must use the same address -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.aboutyun.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.aboutyun.groups</name>
<value>*</value>
</property>
</configuration>
/opt/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value> <!-- number of replicas of each block; the default is 3, and it should not exceed the number of DataNodes -->
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
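The rule that the replica count should not exceed the number of DataNodes can be checked before starting the cluster. The sketch below is an illustration, not a Hadoop tool: get_prop and check_replication are names invented here, and get_prop is a line-based reader that assumes the one-tag-per-line layout used in these notes, not a real XML parser. If the property is absent it falls back to 3, the default mentioned above.

```shell
# get_prop FILE NAME: print the <value> that follows <name>NAME</name> in a *-site.xml file.
# Assumes one tag per line, as in these notes; this is not a real XML parser.
get_prop() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)   # strip <value> and </value>
      exit
    }
  ' "$1"
}

# check_replication HDFS_SITE HOSTS_FILE: compare dfs.replication with the
# number of non-blank lines in HOSTS_FILE (one DataNode hostname per line).
check_replication() {
  repl=$(get_prop "$1" dfs.replication)
  repl=${repl:-3}                                  # default when the property is not set
  nodes=$(grep -c '[^[:space:]]' "$2")
  if [ "$repl" -le "$nodes" ]; then
    echo "ok: replication $repl <= $nodes datanodes"
  else
    echo "too high: replication $repl > $nodes datanodes"
  fi
}
```

A typical call would pass /opt/hadoop/etc/hadoop/hdfs-site.xml and the file that lists the slave hostnames.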
/opt/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
/opt/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
</configuration>
/opt/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/java
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
/opt/hadoop/etc/hadoop/master --this file exists only on the master
Contains the hostname master, or the master's IP address
/opt/hadoop/etc/hadoop/slaves
slave1
slave2
Format HDFS
Run this on the master node only
hadoop namenode -format
Check the output for errors
10. Distribute Hadoop to the slaves
scp -r /opt/hadoop/ root@slave1:/opt/
scp -r /opt/hadoop/ root@slave2:/opt/
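With more than a couple of slaves, the copy is easier to get right as a loop over the list of slave hostnames. The sketch below only prints the scp commands unless DRY_RUN=0 is set. The function name distribute and the DRY_RUN variable are assumptions made for this example.

```shell
# distribute DIR HOSTS_FILE: copy DIR into the same parent directory on every
# host listed in HOSTS_FILE (one hostname per line).
# With DRY_RUN=1 (the default here) the scp commands are only printed.
distribute() {
  dest=$(dirname "$1")/
  while read -r host; do
    [ -n "$host" ] || continue                     # skip blank lines
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "scp -r $1 root@$host:$dest"
    else
      scp -r "$1" "root@$host:$dest" < /dev/null   # keep scp away from the host list on stdin
    fi
  done < "$2"
}
```

For example, distribute /opt/hadoop /opt/hadoop/etc/hadoop/slaves prints one scp line per slave; rerun it with DRY_RUN=0 to perform the copy.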
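11. On the master node, run start-all.sh to start all daemons; if anything fails, check the logs
stop-all.sh stops all daemons
Run jps to list the running Java processes
Web UIs: http://master:50070 (HDFS NameNode)
http://master:8088 (YARN ResourceManager)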
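Reading jps output by eye gets tedious across three machines. The sketch below reads jps output on standard input and lists the daemons that are absent. The function name missing_daemons is invented here; the daemon names are the ones Hadoop 2.x normally shows in jps (NameNode, SecondaryNameNode and ResourceManager on the master, DataNode and NodeManager on each slave).

```shell
# missing_daemons NAME...: read jps output on stdin and print each NAME that is not running.
missing_daemons() {
  running=$(awk '{ print $2 }')                    # second column of jps output is the process name
  for name in "$@"; do
    printf '%s\n' "$running" | grep -qx -- "$name" || echo "not running: $name"
  done
}

# On the master:     jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# On each slave:     jps | missing_daemons DataNode NodeManager
```

grep -x compares whole lines, so a running SecondaryNameNode is not mistaken for NameNode.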
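12. Install HBase. Before installing, check which Hadoop versions your HBase release supports; otherwise you will hit many incompatibilities
tar zxvf hbase-1.1.0.1-bin.tar
Move the extracted directory to /opt and rename it hbase
Edit the environment in /etc/profile --copy this file to the slaves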
export HBASE_HOME=/opt/hbase
export PATH=$HBASE_HOME/bin:$PATH
. /etc/profile
Edit the configuration files
/opt/hbase/conf/hbase-env.sh
export JAVA_HOME=/usr/local/java
export HBASE_CLASSPATH=/opt/hadoop/etc/hadoop
export HBASE_MANAGES_ZK=true
/opt/hbase/conf/hbase-site.xml
<configuration>
<property>
<name>hbase.master</name>
<value>hdfs://master:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/opt/hbase/tmp</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave1,slave2</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/hbase/zookeeper</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
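hbase.rootdir has to point at the same HDFS address that core-site.xml declares, and a mismatched host or port is an easy mistake. The sketch below compares the two values. get_prop and check_rootdir are hypothetical helper names, and get_prop is a line-based reader that assumes one tag per line as in these notes, not a real XML parser.

```shell
# get_prop FILE NAME: print the <value> that follows <name>NAME</name> in a *-site.xml file.
# Assumes one tag per line, as in these notes; this is not a real XML parser.
get_prop() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)   # strip <value> and </value>
      exit
    }
  ' "$1"
}

# check_rootdir CORE_SITE HBASE_SITE: hbase.rootdir should start with the
# HDFS URI configured as fs.default.name in core-site.xml.
check_rootdir() {
  fs=$(get_prop "$1" fs.default.name)
  root=$(get_prop "$2" hbase.rootdir)
  case "$root" in
    "$fs"/*) echo "ok: $root is on $fs" ;;
    *)       echo "mismatch: hbase.rootdir=$root, fs.default.name=$fs" ;;
  esac
}
```

With the files from these notes the call would be check_rootdir /opt/hadoop/etc/hadoop/core-site.xml /opt/hbase/conf/hbase-site.xml.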
/opt/hbase/conf/regionservers
slave1
slave2
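Start HBase: start-hbase.sh
Stop HBase: stop-hbase.sh
Check the output for errors
http://master:16010 (HBase Master web UI)
http://master:16030 (RegionServer web UI port; with the regionservers file above, the RegionServers run on slave1 and slave2)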
13. Install Sqoop (data import/export)
tar zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar
Move the extracted directory to /opt and rename it sqoop
Edit the environment in /etc/profile
export SQOOP_HOME=/opt/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs/
/opt/sqoop/conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=/opt/hadoop
export HBASE_HOME=/opt/hbase
export HIVE_HOME=/opt/hive
Copy the MySQL and Oracle JDBC driver jars into /opt/sqoop/lib/
Run sqoop help to confirm the installation works
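Sqoop can only connect to a database whose driver jar is in its lib directory, so a missing jar is worth ruling out before trying a connection. This is a minimal sketch; check_jdbc_jars is a name invented here, and the wildcard for the MySQL jar is an assumption because the exact jar name inside the connector archive is not given in these notes.

```shell
# check_jdbc_jars LIB_DIR: report whether the Oracle and MySQL driver jars are in LIB_DIR.
check_jdbc_jars() {
  for pattern in 'ojdbc6.jar' 'mysql-connector-java-*.jar'; do
    # $pattern is left unquoted so the shell expands the wildcard; ls fails if nothing matches.
    if ls "$1"/$pattern >/dev/null 2>&1; then
      echo "found: $pattern"
    else
      echo "missing: $pattern"
    fi
  done
}
```

For example, check_jdbc_jars /opt/sqoop/lib should print two found lines before running the commands that follow.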
MySQL:
sqoop list-databases --connect jdbc:mysql://192.168.10.200:3306/ --username t --password 123456
Oracle:
sqoop import --connect jdbc:oracle:thin:@192.168.10.105:1521:pu --username SCOTT -P --table LEMP --hbase-table xemp --column-family x_cf --hbase-row-key empno -m 2 --columns empno,ename,job,mgr,hiredate,sal,comm,deptno