I'm new to big data. During installation and debugging some bugs went unrecorded, and for others I no longer remember exactly how they were fixed, so if anything in the steps below is wrong, corrections are welcome.
The Mac already has a JDK; install Hadoop with Homebrew. Note that everything brew installs lives under /usr/local/Cellar/.
brew install hadoop
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Test it: ssh localhost
If you get ssh: connect to host localhost port 22: Connection refused,
go to System Preferences → Sharing and enable Remote Login.
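If you prefer the terminal, Remote Login can also be checked and switched on with macOS's built-in systemsetup command (requires an admin account); a minimal sketch:
sudo systemsetup -getremotelogin        # prints "Remote Login: On/Off"
sudo systemsetup -setremotelogin on     # enables the sshd service
ssh localhost                           # should now log in without a password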
(1) core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/glenn/.hadoop_tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Note that hadoop.tmp.dir is the base directory of Hadoop's local storage: the namenode, datanode, and mapred state all live under it, and the contents of HDFS are kept there as well. By default it points to a directory under /tmp, which can be cleared at any time and is wiped on every reboot, so HDFS would keep losing its data; the path must be changed.
If it is not changed, the typical symptom is that jps no longer shows the DataNode (or NameNode). The usual reaction is to reformat HDFS with bin/hadoop namenode -format, and after doing that a few times you get: ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs
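When the Incompatible namespaceIDs error does show up, the blunt fix is to wipe the storage directory and reformat; a rough sketch, assuming hadoop.tmp.dir is /Users/glenn/.hadoop_tmp as configured above (this destroys everything currently in HDFS):
stop-all.sh                        # stop all daemons first
rm -rf /Users/glenn/.hadoop_tmp    # drop the stale namenode/datanode state
bin/hadoop namenode -format        # reformat so namenode and datanode namespaceIDs match again
start-all.sh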
(2) hdfs-site.xml
In pseudo-distributed mode there is no need to replicate blocks, so the replication factor is set to 1.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
(3) mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
This apparently follows Hadoop 1.0-style settings... If you need to configure YARN, see other write-ups, e.g.:
http://www.cnblogs.com/micrari/p/5716851.html
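For reference, a minimal YARN setup on Hadoop 2.x usually comes down to two extra settings (standard property names; I have not verified them on this exact install):
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>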
export HADOOP_HOME="/usr/local/Cellar/hadoop/2.8.0"
export PATH=$HADOOP_HOME/sbin:$PATH
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
alias start-hadoop='$HADOOP_HOME/sbin/start-all.sh'
alias stop-hadoop='$HADOOP_HOME/sbin/stop-all.sh'
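After adding the lines above to the shell profile, reload it and make sure the commands resolve (the paths assume Hadoop 2.8.0 from brew):
source ~/.bash_profile      # or ~/.zshrc, depending on your shell
hadoop version              # should print Hadoop 2.8.0
which start-all.sh          # should point into $HADOOP_HOME/sbin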
Format HDFS (if needed): bin/hadoop namenode -format
Start Hadoop: start-hadoop
Bug: "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable"
Cause: the bundled native Hadoop library does not match the platform (32-bit library on a 64-bit system) and would have to be recompiled; it can also simply be ignored, and most functionality still runs fine.
Check on the command line: jps
Jps
SecondaryNameNode
ResourceManager
NodeManager
DataNode
NameNode
At minimum DataNode, ResourceManager, and NameNode should show up.
Check in the browser:
NameNode: http://localhost:50070
ResourceManager: http://localhost:8088
Node information (NodeManager): http://localhost:8042
DataNode: http://localhost:50075
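A quick way to confirm the web UIs are actually up without opening a browser (same ports as above):
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070    # NameNode UI, expect 200
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088     # ResourceManager UI, expect 200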
(1) hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: … could only be replicated to 0 nodes, instead of 1 …
The DataNode did not start properly. Fix:
stop-hadoop
hadoop namenode -format
and check whether the hadoop.tmp.dir path is the problem.
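To confirm the DataNode has really registered with the NameNode after reformatting, the dfsadmin report is handy:
hdfs dfsadmin -report       # should list 1 live datanode; 0 live datanodes reproduces the error above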
(2)It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon
On single-node Hadoop the web interface is not on port 9000; use the web UI ports listed in section 5 above.
Reference: http://www.jianshu.com/p/d19ce17234b7
Hive can treat data on HDFS as database tables. To do that it needs a table schema for the data, and that information is stored in the metastore database. In other words, Hive relies on a database to manage its schemas, so a database has to be configured for the node.
The Hive metastore can be configured in three modes:
(1) Embedded metastore: only one session at a time can access the Derby database files on disk; this is Hive's default configuration.
(2) Local metastore: supports multiple concurrent users, but the metastore service runs in the same process as the Hive service.
(3) Remote metastore: the metastore service and the Hive service run in separate processes, and the database can be placed behind a firewall.
Install with brew: brew install hive
export HIVE_HOME="/usr/local/Cellar/hive/2.1.1"
export PATH=$HIVE_HOME/bin:$PATH
The local metastore configuration is used here.
(1) Install MySQL: brew install mysql
(2) Test MySQL:
mysql.server start
mysql_secure_installation
mysql -u root -p
(3) In MySQL, create the metastore database and the hive user:
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,ALTER,CREATE ON metastore.* TO 'hiveuser'@'localhost';
This creates the database metastore and the local user hiveuser.
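Optionally flush the privileges and confirm the new account can actually reach the metastore database (hiveuser/password as created above):
mysql> FLUSH PRIVILEGES;
mysql> quit;
mysql -u hiveuser -p metastore      # enter 'password'; getting a mysql> prompt means the grants work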
(4) Download the MySQL JDBC connector:
curl -L 'http://www.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.22.tar.gz/from/http://mysql.he.net/'
sudo cp mysql-connector-java-5.1.22/mysql-connector-java-5.1.22-bin.jar /usr/local/Cellar/hive/hive.version.no/libexec/lib/
(1) hive-default.xml
Just copy the template: cp hive-default.xml.template hive-default.xml
(2) hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic
      schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
      proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
  </property>
</configuration>
The key settings are javax.jdo.option.ConnectionURL, which points at the metastore database just created; javax.jdo.option.ConnectionDriverName, the JDBC driver class; javax.jdo.option.ConnectionUserName, the hiveuser account created above (with its password in javax.jdo.option.ConnectionPassword); and hive.metastore.warehouse.dir, the HDFS root directory for Hive's internal (managed) tables.
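With Hive 2.x the metastore schema usually also has to be initialized once before the first start; a sketch using the bundled schematool and the MySQL settings above:
schematool -dbType mysql -initSchema    # creates the metastore tables in the MySQL 'metastore' database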
Start Hadoop: start-hadoop
Start Hive: hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hive/2.1.1/libexec/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/2.1.1/libexec/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
(1) Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStore
Hive could not connect to the metastore. Possibly the database was never created or the MySQL service is not running, or Hadoop was not started first.
(2)Version information not found
hive.metastore.schema.verification is not set to false in hive-site.xml.
(3) metastore_db cannot be created
“ERROR Datastore.Schema (Log4JLogger.java:error(125)) - Failed initialising database.
Failed to create database ‘metastore_db’, see the next exception for details”
Check the path of hive-site.xml; on the first start-up the write permissions may be insufficient, in which case sudo hive works (I'm not sure the second point is right; I don't remember exactly how it was resolved).
Reference: http://www.cnblogs.com/ToDoToTry/p/5349753.html
Install with brew: brew install hbase
(1)hbase-env.sh
Here I mainly enabled HBase's bundled ZooKeeper and pointed it at the Hadoop configuration directory:
export HBASE_MANAGES_ZK=true
export HBASE_CLASSPATH="/usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop"
(2)hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/var/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
</configuration>
The main points: the port in hbase.rootdir must match the one HDFS listens on (fs.default.name above, i.e. 9000); ZooKeeper keeps the defaults; and the HBase master web UI port hbase.master.info.port is set to 60010.
export HBASE_HOME="/usr/local/Cellar/hbase/1.2.6"
export PATH=$HBASE_HOME/bin:$PATH
(1) Check from the shell: hbase shell
(2) Check the services: start-hbase.sh, then open localhost:60010
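A quick smoke test in the HBase shell (t1 and cf are throwaway names, just for illustration):
hbase(main):001:0> create 't1', 'cf'
hbase(main):002:0> put 't1', 'row1', 'cf:a', 'value1'
hbase(main):003:0> scan 't1'
hbase(main):004:0> disable 't1'
hbase(main):005:0> drop 't1'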
(1) The HBase web console does not open
From HBase 1.0 onwards the master info port must be configured manually; add the following to hbase-site.xml:
<property>
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>
Install with brew: brew install scala
Official downloads: http://spark.apache.org/downloads.html
Extract the downloaded package into /usr/local/spark/
(1) Copy the templates:
cp slaves.template slaves
cp spark-env.sh.template spark-env.sh
(2) spark-env.sh
export SCALA_HOME=/usr/local/Cellar/scala/2.12.3
export SPARK_HOME=/usr/local/spark/spark-2.2.0-bin-hadoop2.7
export HADOOP_HOME=/usr/local/Cellar/hadoop/2.8.0
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home
export HADOOP_CONF_DIR=/usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop
export SPARK_WORKER_MEMORY=1g
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_CORES=2
export SPARK_LOCAL_IP=127.0.0.1
export SPARK_MASTER_WEBUI_PORT=1080
export SPARK_HOME=/usr/local/spark/spark-2.2.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
alias start-spark='sudo $SPARK_HOME/sbin/start-all.sh'
alias stop-spark='sudo $SPARK_HOME/sbin/stop-all.sh'
Start Spark: start-spark
Run the demo:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.2.0.jar
Shell test: spark-shell
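Inside spark-shell, a minimal check that the driver is actually working (plain Scala, nothing cluster-specific):
scala> sc.parallelize(1 to 100).sum()    // should return 5050.0
scala> spark.range(10).count()           // should return 10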
(1) java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 0)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
As the message says, the address/port was not set; check whether spark-env.sh contains the following two settings:
export SPARK_LOCAL_IP=127.0.0.1
export SPARK_MASTER_WEBUI_PORT=1080
(2)Directory /usr/local/spark/spark-2.2.0-bin-hadoop2.7/metastore_db cannot be created.
No permission to create the db directory at that path; run sudo spark-shell.
(3)mac root@localhost’s password: localhost: Permission denied, please try again
If you forgot the password, reset the root password:
sudo passwd root
Otherwise the Remote Login service may not be enabled:
sudo launchctl load -w /System/Library/LaunchDaemons/ssh.plist
or just enable Remote Login under System Preferences → Sharing.