The following walks through setting up a single-node test environment; it is intended for learning and testing only.
1. Uninstall the OpenJDK that ships with the Linux system:
rpm -qa | grep java
List the JDK packages bundled with the system, then run rpm -e --nodeps <package-name> to remove each of them.
Note: packages whose names contain "noarch" do not need to be uninstalled.
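A compact way to do the same removal in one shot, as a sketch (run as root; the grep -v noarch skips the noarch packages mentioned above):
rpm -qa | grep java | grep -v noarch | xargs -r rpm -e --nodeps   # remove the bundled JDK packages
rpm -qa | grep java                                               # verify: only noarch packages (if any) should remain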
2. Transferring files to the Linux machine:
Install lrzsz: yum -y install lrzsz
3. Enable remote root login on CentOS 7
As root, edit the SSH daemon config: vi /etc/ssh/sshd_config
Find the line
#PermitRootLogin yes
and remove the leading "#" so that it reads:
PermitRootLogin yes
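For the change to take effect, restart the SSH daemon:
systemctl restart sshd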
Use ifconfig to check the name of the current network interface; in my case it is ens33, so the corresponding config file is ifcfg-ens33.
The IP address configuration files live in the directory /etc/sysconfig/network-scripts
vim ifcfg-ens33
The main entries to change are:
BOOTPROTO="static" # use a static IP address (the default is dhcp)
IPADDR="192.168.52.100" # the static IP address to assign
NETMASK="255.255.255.0" # subnet mask
GATEWAY="192.168.52.10" # gateway address
DNS1="192.168.52.10" # DNS server
The complete configuration file looks like this:
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static" # 使用静态IP地址,默认为dhcp
IPADDR="192.168.52.50" # 设置的静态IP地址
NETMASK="255.255.255.0" # 子网掩码
GATEWAY="192.168.52.10" # 网关地址
DNS1="192.168.52.10" # DNS服务器
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="95b614cd-79b0-4755-b08d-99f1cca7271b"
DEVICE="ens33"
ONBOOT="yes" #是否开机启用
Check whether the firewall is running: firewall-cmd --state
Stop the firewall: systemctl stop firewalld.service
Disable the firewall at boot: systemctl disable firewalld.service
vim /etc/selinux/config
Change SELINUX=enforcing
to SELINUX=disabled
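The SELINUX=disabled setting only takes effect after a reboot; to relax SELinux immediately for the current session you can additionally run:
setenforce 0
getenforce   # should now report Permissive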
7. Install the JDK
The JDK will be installed under /usr/java:
mkdir /usr/java
# extract the tar archive
tar -xzvf jdk-8u45-linux-x64.tar.gz -C /usr/java/
# important: be sure to fix the owner and group
chown -R root:root /usr/java/jdk1.8.0_45
Configure the environment variables:
vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile
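A quick sanity check that the JDK is on the PATH and JAVA_HOME resolves as expected:
java -version
echo $JAVA_HOME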
First remove any pre-installed MySQL libraries:
rpm -qa | grep mysql
rpm -e mysql-libs-5.1.73-8.el6_8.x86_64 --nodeps
Step 1: install the MySQL packages online
yum install mysql mysql-server mysql-devel
Step 2: start the MySQL service
/etc/init.d/mysqld start
Step 3: run the bundled MySQL setup script
/usr/bin/mysql_secure_installation
Step 4: open the MySQL client and grant remote access
grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;
flush privileges;
Step 5: log in to MySQL and check that it works
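A sketch of that final check, assuming the password 123456 used in the grant above (and set during mysql_secure_installation):
mysql -uroot -p123456 -e "show databases;"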
Apache Hadoop documentation: https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
Cloudera Hadoop (CDH) download index: http://archive.cloudera.com/cdh5/cdh/5/
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2.tar.gz
If you hit a problem in the current Hadoop version, check the changes.log to see whether a later release has already fixed it:
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2-changes.log
2.1 Create the hadoop user
useradd hadoop
[root@bigdata01 ~]# id hadoop
uid=1001(hadoop) gid=1002(hadoop) groups=1002(hadoop)
2.2 Switch to the hadoop user
[root@bigdata01 ~]# su - hadoop
2.3 Create the working directories
[hadoop@bigdata01 ~]$ mkdir app software sourcecode log tmp data lib
[hadoop@bigdata01 ~]$ ll
total 0
drwxrwxr-x 3 hadoop hadoop 50 1 22:17 app
drwxrwxr-x 2 hadoop hadoop 6 1 22:10 data
drwxrwxr-x 2 hadoop hadoop 6 1 22:10 lib
drwxrwxr-x 2 hadoop hadoop 6 1 22:10 log
drwxrwxr-x 2 hadoop hadoop 43 1 22:14 software
drwxrwxr-x 2 hadoop hadoop 6 1 22:10 sourcecode
drwxrwxr-x 2 hadoop hadoop 22 2 12:36 tmp
[hadoop@bigdata01 ~]$
2.4 Upload the tarball to the software directory and extract it into the app directory
[hadoop@bigdata01 software]$ tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C ../app/
[hadoop@bigdata01 app]$ ll
drwxr-xr-x 14 hadoop hadoop 241 Jun 3 19:11 hadoop-2.6.0-cdh5.16.2
2.5 Create a symlink
[hadoop@bigdata01 app]$ ln -s hadoop-2.6.0-cdh5.16.2/ hadoop
[hadoop@bigdata01 app]$ ll
lrwxrwxrwx 1 hadoop hadoop 23 Dec 1 22:17 hadoop -> hadoop-2.6.0-cdh5.16.2/
drwxr-xr-x 14 hadoop hadoop 241 Jun 3 19:11 hadoop-2.6.0-cdh5.16.2
2.6 Check the JDK
[hadoop@bigdata01 app]$ which java
/usr/java/jdk1.8.0_121/bin/java
2.7 Configure environment variables
[hadoop@bigdata01 ~]$ vim .bashrc
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
2.8 Verify the environment variables
[hadoop@bigdata01 ~]$ source .bashrc
[hadoop@bigdata01 ~]$ which hadoop
~/app/hadoop/bin/hadoop
[hadoop@bigdata01 ~]$ echo $HADOOP_HOME
/home/hadoop/app/hadoop
2.9 View the hadoop command help
[hadoop@bigdata01 ~]$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
s3guard manage data on S3
trace view and modify Hadoop tracing settings
or
CLASSNAME run the class named CLASSNAME
Apache Hadoop documentation: https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
Cloudera Hadoop documentation: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2/hadoop-project-dist/hadoop-common/SingleCluster.html
2.10 Set up SSH (passwordless login)
[hadoop@bigdata01 ~]$ cd ~
Run ssh-keygen and press Enter three times; this creates the .ssh directory.
$ ssh-keygen
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
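After this, an ssh to the local hostname should no longer prompt for a password (the very first connection still asks you to confirm the host key, and assumes bigdata01 resolves, e.g. via /etc/hosts):
ssh bigdata01 date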
Pitfall:
The "yes" you type on the first ssh login is recorded in known_hosts, which stores the SSH host keys; if you run into connection problems, delete the corresponding entry from this file.
[hadoop@bigdata01 ~]$ cd .ssh/
[hadoop@bigdata01 .ssh]$ cat known_hosts
bigdata01 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOwZK88+GuH93o6h17DEP19Ly+m79cw1rpjXTcmqlBOviTG0d8mXGmJoBDpPf/pQA49tWqgeVFcsDfBr9YdCK5w=
...
2.11 Format HDFS
[hadoop@bigdata01 ~]$ hdfs namenode -format
When the output contains "... has been successfully formatted", the format succeeded.
[hadoop@bigdata01 ~]$ start-dfs.sh
20/07/13 14:28:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [bigdata01]
bigdata01: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-namenode-bigdata01.out
bigdata01: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-datanode-bigdata01.out
Starting secondary namenodes [bigdata01]
bigdata01: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-secondarynamenode-bigdata01.out
20/07/13 14:28:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@bigdata01 ~]$
[hadoop@bigdata01 ~]$ jps
11536 Jps
11416 SecondaryNameNode
11258 DataNode
11131 NameNode
Configure both the DataNode and the SecondaryNameNode to start on bigdata01:
which host the NameNode starts on is controlled by fs.defaultFS in core-site.xml,
which hosts the DataNodes start on is controlled by the hostnames listed in the slaves file,
and which host the SecondaryNameNode starts on is controlled by hdfs-site.xml (see the snippet below; a sketch of core-site.xml and the slaves file follows it):
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>bigdata01:50090</value>
</property>
<property>
  <name>dfs.namenode.secondary.https-address</name>
  <value>bigdata01:50091</value>
</property>
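For reference, a minimal sketch of the other two pieces in this single-node setup (the host and port mirror the hdfs://bigdata01:8020 warehouse path shown later; treat the exact values as an assumption rather than the files used here). In core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://bigdata01:8020</value>
</property>
and the slaves file simply lists the DataNode hostnames, one per line:
bigdata01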
Test:
[hadoop@bigdata01 hadoop]$ hadoop fs -mkdir /hadooptest
[hadoop@bigdata01 hadoop]$ hadoop fs -ls /
drwxr-xr-x - hadoop supergroup 0 2020-07-14 12:35 /hadooptest
[hadoop@bigdata01 tmp]$ vim test.txt
[hadoop@bigdata01 tmp]$ hadoop fs -put test.txt /hadooptest
[hadoop@bigdata01 tmp]$ hadoop fs -ls /hadooptest
-rw-r--r-- 1 hadoop supergroup 42 2020-07-14 12:36 /hadooptest/test.txt
[hadoop@bigdata01 tmp]$ hadoop fs -cat /hadooptest/test.txt
hadoop hive spark flink impala kudu flume
YARN single-node deployment documentation: https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
etc/hadoop/mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
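The distribution ships only mapred-site.xml.template, so this file is usually created from the template first:
cd $HADOOP_HOME/etc/hadoop
cp mapred-site.xml.template mapred-site.xml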
etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
[hadoop@bigdata01 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-resourcemanager-bigdata01.out
bigdata01: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-nodemanager-bigdata01.out
[hadoop@bigdata01 ~]$ jps
11857 SecondaryNameNode
11570 NameNode
11698 DataNode
12002 ResourceManager
12391 Jps
12105 NodeManager
[hadoop@bigdata01 ~]$ netstat -nlp |grep 12002
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp6 0 0 :::8088 :::* LISTEN 12002/java
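Port 8088 is the ResourceManager web UI, so the YARN console should now be reachable at http://bigdata01:8088/cluster (or via the machine's IP). A quick check from the shell, assuming curl is available:
curl -s -o /dev/null -w "%{http_code}\n" http://bigdata01:8088/cluster   # expect 200 if the UI is up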
[root@bigdata01 ~]# find / -name '*example*.jar'
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce1/hadoop-examples-2.6.0-mr1-cdh5.16.2.jar
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-test-sources.jar
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-sources.jar
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
The last one is the example jar we need.
[hadoop@bigdata01 ~]$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
...
Most commands print help when invoked w/o parameters.
[hadoop@bigdata01 ~]$ hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@bigdata01 ~]$
[hadoop@bigdata01 ~]$ hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount
Usage: wordcount <in> [<in>...] <out>
The usage message shows that an input path and an output path are required.
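The input data read below was prepared beforehand; a sketch of how it could have been created (the directory and file name are taken from the commands that follow):
hadoop fs -mkdir -p /wordcount/test
vim test1.txt                              # a few lines of space-separated words
hadoop fs -put test1.txt /wordcount/test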
[hadoop@bigdata01 ~]$ hadoop fs -cat /wordcount/test/test1.txt
20/07/13 21:45:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop hadoop hadoop spark flume spark flink hive hue
flink hbase kafka kafka spark hadoop hive
[hadoop@bigdata01 ~]$ hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount /wordcount/test /wordcount/output
20/07/13 21:46:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/07/13 21:46:45 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/07/13 21:46:46 INFO input.FileInputFormat: Total input paths to process : 1
20/07/13 21:46:46 INFO mapreduce.JobSubmitter: number of splits:1
20/07/13 21:46:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1575293526101_0001
20/07/13 21:46:46 INFO impl.YarnClientImpl: Submitted application application_1575293526101_0001
20/07/13 21:46:47 INFO mapreduce.Job: The url to track the job: http://bigdata01:8088/proxy/application_1575293526101_0001/
20/07/13 21:46:47 INFO mapreduce.Job: Running job: job_1575293526101_0001
20/07/13 21:46:57 INFO mapreduce.Job: Job job_1575293526101_0001 running in uber mode : false
20/07/13 21:46:57 INFO mapreduce.Job: map 0% reduce 0%
20/07/13 21:47:03 INFO mapreduce.Job: map 100% reduce 0%
20/07/13 21:47:10 INFO mapreduce.Job: map 100% reduce 100%
20/07/13 21:47:10 INFO mapreduce.Job: Job job_1575293526101_0001 completed successfully
20/07/13 21:47:10 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=100
FILE: Number of bytes written=286249
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=209
HDFS: Number of bytes written=62
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4554
Total time spent by all reduces in occupied slots (ms)=2929
Total time spent by all map tasks (ms)=4554
Total time spent by all reduce tasks (ms)=2929
Total vcore-milliseconds taken by all map tasks=4554
Total vcore-milliseconds taken by all reduce tasks=2929
Total megabyte-milliseconds taken by all map tasks=4663296
Total megabyte-milliseconds taken by all reduce tasks=2999296
Map-Reduce Framework
Map input records=2
Map output records=16
Map output bytes=160
Map output materialized bytes=100
Input split bytes=111
Combine input records=16
Combine output records=8
Reduce input groups=8
Reduce shuffle bytes=100
Reduce input records=8
Reduce output records=8
Spilled Records=16
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=108
CPU time spent (ms)=1520
Physical memory (bytes) snapshot=329445376
Virtual memory (bytes) snapshot=5455265792
Total committed heap usage (bytes)=226627584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=98
File Output Format Counters
Bytes Written=6
[hadoop@bigdata01 ~]$ hadoop fs -ls /wordcount
20/07/13 21:48:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2020-07-14 21:47 /wordcount/output
drwxr-xr-x - hadoop supergroup 0 2020-07-14 21:45 /wordcount/test
[hadoop@bigdata01 ~]$ hadoop -ls /wordcount/output
Error: No command named `-ls' was found. Perhaps you meant `hadoop ls'
[hadoop@bigdata01 ~]$ hadoop fs -ls /wordcount/output
20/07/13 21:48:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2020-07-14 21:47 /wordcount/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 62 2020-07-14 21:47 /wordcount/output/part-r-00000
[hadoop@bigdata01 ~]$ hadoop fs -cat /wordcount/output/part-r-00000
20/07/13 21:49:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
flink 2
flume 1
hadoop 4
hbase 1
hive 2
hue 1
kafka 2
spark 3
[hadoop@bigdata01 ~]$
[hadoop@bigdata01 ~]$ hostnamectl
Static hostname: bigdata01
Icon name: computer-vm
Chassis: vm
Machine ID: 928fc74e61be492eb9a51cc408995739
Boot ID: 32e41529ec49471dba619ba744be31b1
Virtualization: vmware
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-957.el7.x86_64
Architecture: x86-64
[hadoop@bigdata01 ~]$
[hadoop@bigdata01 ~]$ hostnamectl --help
hostnamectl [OPTIONS...] COMMAND ...
Query or change system hostname.
-h --help Show this help
--version Show package version
--no-ask-password Do not prompt for password
-H --host=[USER@]HOST Operate on remote host
-M --machine=CONTAINER Operate on local container
--transient Only set transient hostname
--static Only set static hostname
--pretty Only set pretty hostname
Commands:
status Show current hostname settings
set-hostname NAME Set system hostname
set-icon-name NAME Set icon name for host
set-chassis NAME Set chassis type for host
set-deployment NAME Set deployment environment for host
set-location NAME Set location for host
[hadoop@bigdata01 ~]$ hostnamectl set-hostname bigdata01
[hadoop@bigdata01 ~]$ cat /etc/hostname
bigdata01
After changing the hostname, update the mapping between the IP address and the new hostname in the hosts file.
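A minimal sketch of that mapping in /etc/hosts, assuming the static IP 192.168.52.50 configured earlier:
192.168.52.50 bigdata01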
Official reference: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
1. Prerequisites: the JDK, Hadoop, and MySQL installed as above.
2. Download the tarball:
wget http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.16.2.tar.gz
3. Extract: tar -zxvf hive-1.1.0-cdh5.16.2.tar.gz -C ~/app/
4. Fix the owner and group:
chown -R hadoop:hadoop /home/hadoop/app/hive-1.1.0-cdh5.16.2
5. Create a symlink:
ln -s hive-1.1.0-cdh5.16.2/ hive
6. Configure the environment variables:
[hadoop@bigdata01 app]$ cd ~
[hadoop@bigdata01 ~]$ vim .bashrc
export HIVE_HOME=/home/hadoop/app/hive
export PATH=$HIVE_HOME/bin:$PATH
[hadoop@bigdata01 ~]$ source .bashrc
[hadoop@bigdata01 ~]$ which hive
~/app/hive/bin/hive
7. Copy the MySQL JDBC driver jar into $HIVE_HOME/lib/
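For example (the driver jar version here is taken from the Sqoop section below; adjust to whatever driver you actually have):
cp mysql-connector-java-5.1.27-bin.jar $HIVE_HOME/lib/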
8. Hive configuration file:
Hive does not ship a ready-made hive-site.xml, so create one yourself under $HIVE_HOME/conf:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://bigdata01:3306/bigdata_hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
</configuration>
Entering the Hive CLI and running show databases; fails with:
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Check the full log in /tmp/hadoop (hive.log) for details:
Unable to open a test connection to the given database. JDBC url = jdbc:mysql://192.168.52.50:3306/bigdata_hive?createDatabaseIfNotExist=true, username = root. Terminating connection pool (set lazyInit to true
if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Access denied for user 'root'@'bigdata01' (using password: YES)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4237)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4169)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:928)
at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1750)
at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1290)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2493)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2526)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2311)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:834)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:416)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:347)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:501)
at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:298)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:420)
at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:449)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:344)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:300)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:60)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:685)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:663)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:712)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:511)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6517)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:207)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1660)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:68)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:83)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3412)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3431)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3656)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:232)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:216)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:339)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:300)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:275)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.createHiveDB(BaseSemanticAnalyzer.java:201)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.<init>(DDLSemanticAnalyzer.java:222)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:265)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:546)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1358)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1475)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1287)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1277)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:226)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:175)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:389)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:634)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Error analysis: Hive cannot instantiate the metastore client. Likely causes: the MySQL connection failed, so check that the MySQL service is running, that the connection settings in hive-site.xml are correct, and that the MySQL driver jar is in place. Here the problem is the MySQL username/password.
Cause: the MySQL username or password configured in hive-site.xml was wrong.
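A quick way to confirm the credentials outside of Hive, as a sketch using the values from hive-site.xml:
mysql -h bigdata01 -uroot -p123456 -e "show databases;"
If this is denied, re-run the GRANT ... IDENTIFIED BY ... statement from the MySQL setup step and flush privileges.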
Create a table: create table stu(id int, name string, age int);
Inspect the table structure:
desc stu;
desc extended stu;
desc formatted stu;
show create table stu;
Insert data: insert into stu values(1,'tom',30);
Query data: select * from stu;
The stu table is stored by default under this HDFS directory:
hdfs://bigdata01:8020/user/hive/warehouse/stu
hive.metastore.warehouse.dir: /user/hive/warehouse
stu: the table name
So a table's full path is ${hive.metastore.warehouse.dir}/tablename
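You can confirm this on HDFS after inserting a row; a sketch:
hadoop fs -ls /user/hive/warehouse/stu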
Hive's full execution log:
cd $HIVE_HOME/conf
cp hive-log4j.properties.template hive-log4j.properties
hive.log.dir=${java.io.tmpdir}/${user.name}
hive.log.file=hive.log
so the log ends up at ${java.io.tmpdir}/${user.name}/${hive.log.file}, i.e.
/tmp/hadoop/hive.log
[hadoop@bigdata01 conf]$ cat hive-log4j.properties
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Define some default values that can be overridden by system properties
hive.log.threshold=ALL
hive.root.logger=WARN,DRFA
hive.log.dir=${java.io.tmpdir}/${user.name}
hive.log.file=hive.log
# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hive.root.logger}, EventCounter
# Logging Threshold
log4j.threshold=${hive.log.threshold}
#
# Daily Rolling File Appender
#
# Use the PidDailyerRollingFileAppend class instead if you want to use separate log files
# for different CLI session.
#
# log4j.appender.DRFA=org.apache.hadoop.hive.ql.log.PidDailyRollingFileAppender
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hive.log.dir}/${hive.log.file}
# Rollver at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
# 30-day backup
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
# Pattern format: Date LogLevel LoggerName LogMessage
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]: %c{2} (%F:%M(%L)) - %m%n
#
# console
# Add "console" to rootlogger above if you want to use this
#
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} [%t]: %p %c{2}: %m%n
log4j.appender.console.encoding=UTF-8
#custom logging levels
#log4j.logger.xxx=DEBUG
#
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
#
log4j.appender.EventCounter=org.apache.hadoop.hive.shims.HiveEventCounter
log4j.category.DataNucleus=ERROR,DRFA
log4j.category.Datastore=ERROR,DRFA
log4j.category.Datastore.Schema=ERROR,DRFA
log4j.category.JPOX.Datastore=ERROR,DRFA
log4j.category.JPOX.Plugin=ERROR,DRFA
log4j.category.JPOX.MetaData=ERROR,DRFA
log4j.category.JPOX.Query=ERROR,DRFA
log4j.category.JPOX.General=ERROR,DRFA
log4j.category.JPOX.Enhancer=ERROR,DRFA
# Silence useless ZK logs
log4j.logger.org.apache.zookeeper.server.NIOServerCnxn=WARN,DRFA
log4j.logger.org.apache.zookeeper.ClientCnxnSocketNIO=WARN,DRFA
#custom logging levels
log4j.logger.org.apache.hadoop.hive.ql.parse.SemanticAnalyzer=INFO
log4j.logger.org.apache.hadoop.hive.ql.Driver=INFO
log4j.logger.org.apache.hadoop.hive.ql.exec.mr.ExecDriver=INFO
log4j.logger.org.apache.hadoop.hive.ql.exec.mr.MapRedTask=INFO
log4j.logger.org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask=INFO
log4j.logger.org.apache.hadoop.hive.ql.exec.Task=INFO
log4j.logger.org.apache.hadoop.hive.ql.session.SessionState=INFO
[hadoop@bigdata01 conf]$
hive.log.dir=${java.io.tmpdir}/${user.name} is Hive's log directory;
${java.io.tmpdir} resolves to /tmp;
hive.log.file=hive.log is the Hive log file name.
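To follow the log while debugging a Hive problem:
tail -f /tmp/hadoop/hive.log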
Download the CDH 5 tarball:
wget http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.16.2.tar.gz
Extract:
tar -zxvf sqoop-1.4.6-cdh5.16.2.tar.gz -C ~/app/
Configure the environment variables:
export SQOOP_HOME=/home/hadoop/app/sqoop-1.4.6-cdh5.16.2
export PATH=$SQOOP_HOME/bin:$PATH
Apply the changes with source (e.g. source ~/.bashrc, matching the earlier steps).
Configuration file (under $SQOOP_HOME/conf/):
cp sqoop-env-template.sh sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2
export HIVE_HOME=/home/hadoop/app/hive-1.1.0-cdh5.16.2
MySQL driver jar:
cp mysql-connector-java-5.1.27-bin.jar $SQOOP_HOME/lib/
Test that Sqoop works:
List all databases:
sqoop list-databases \
--connect jdbc:mysql://bigdata01:3306 \
--password 123456 \
--username root
List all tables:
sqoop list-tables \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password 123456 \
--username root
sqoop import \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password bigdata --username root \
--table emp \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
Parameters:
sqoop import \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password bigdata --username root \
--table emp \
--columns "id,name" \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
Parameters:
sqoop import \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password bigdata --username root \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query 'select id,name from student where id <=1 and $CONDITIONS'
Parameters:
If the query is wrapped in double quotes, $CONDITIONS must be escaped as \$CONDITIONS so the shell does not expand it as one of its own variables.
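A sketch of the same query in double-quoted form, with the escape applied:
sqoop import \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password bigdata --username root \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "select id,name from student where id <=1 and \$CONDITIONS"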
sqoop import \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password bigdata --username root \
--table emp \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
--where "id > 400"
sqoop import \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password bigdata --username root \
--table emp \
--target-dir /user/company \
--null-string "" \
--null-non-string "0" \
--check-column "id" \
--incremental append \
--fields-terminated-by '\t' \
--last-value 0 \
-m 1
Parameters:
sqoop import \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password bigdata --username root \
--table emp \
--hive-overwrite \
--delete-target-dir \
--null-string "" \
--null-non-string "0" \
--hive-import \
--hive-database default \
--hive-table staff \
--fields-terminated-by '\t' \
--num-mappers 1
Parameters:
First make sure MySQL has a table with the same structure as the Hive table to receive the data;
the table structure and field delimiter must match.
sqoop export \
-Dsqoop.export.records.per.statement=10 \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password bigdata --username root \
--table staff \
--export-dir /user/company/ \
--input-null-string "" \
--input-null-non-string "0" \
--columns "id,name" \
--input-fields-terminated-by '\t' \
-m 1
Parameters:
Note: the target table must already exist in MySQL; Sqoop export does not create it automatically.
sqoop export \
--connect jdbc:mysql://bigdata01:3306/sqoop \
--password bigdata --username root \
--table emp \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff \
--input-fields-terminated-by "\t"
Parameters: