Notes on installing Flume on Hadoop

Introduction to Flume

1. Function: data acquisition
	1.1. collecting
	1.2. aggregating
	1.3. moving
2. Moves data from a source to a destination
3. Streaming: like flowing water, dynamic and ordered
		3.1. Usable for both real-time and offline processing
4. How Flume deals with data sources that live on Windows
		4.1. Linux: NFS (Network File System)
		4.2. Mount the Windows directory onto the Linux system via NFS (see the sketch after this list)
5. Three core components: source, channel, sink
6. Channel types
		6.1. file channel (durable, buffers events on disk)
		6.2. memory channel (fast, but events are lost if the agent crashes)
		6.3. kafka channel (buffers events in a Kafka topic)
7. A source, a channel, and a sink together form an agent
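
A minimal sketch of step 4.2, mounting a Windows share onto Linux over NFS. The host name, export path, and mount point below are placeholders; this assumes an NFS server is already exporting the directory on the Windows side:

sudo mkdir -p /mnt/winlogs
sudo mount -t nfs windows-host:/exported/logs /mnt/winlogs
# a Flume spooling directory source can now watch /mnt/winlogs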

Flume deployment

Installation package download
Link: https://pan.baidu.com/s/170PUOjZIl_MSQU6AmzYe6w  Password: gigf

There are three ways to deploy Flume with Hadoop.
If Flume runs on a node inside the Hadoop cluster:

  • Modify flume-env.sh (see the sketch below)
    export JAVA_HOME=/opt/moduels/jdk1.7.0_67
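
Flume ships a flume-env.sh.template in conf/; a quick sketch of creating flume-env.sh from it, using the JDK path from this guide's environment:

cp conf/flume-env.sh.template conf/flume-env.sh
echo 'export JAVA_HOME=/opt/moduels/jdk1.7.0_67' >> conf/flume-env.sh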

If Flume is inside the Hadoop cluster and Hadoop is configured with HA (NameNode high availability):

  • Modify flume-env.sh
    export JAVA_HOME=/opt/moduels/jdk1.7.0_67
  • Copy Hadoop's core-site.xml and hdfs-site.xml into flume/conf so the HDFS sink can resolve the HA nameservice (see the sketch below)
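
A sketch of the copy step, assuming HADOOP_HOME points at the Hadoop installation and the working directory is the Flume home:

cp $HADOOP_HOME/etc/hadoop/core-site.xml conf/
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml conf/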

If Flume is not inside the Hadoop cluster:

  • Modify flume-env.sh
    export JAVA_HOME=/opt/moduels/jdk1.7.0_67
  • Copy Hadoop's core-site.xml and hdfs-site.xml into flume/conf
  • Put the Hadoop-related jars into Flume's lib directory; they are included in the download linked above (see the sketch below).
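
The exact jar set depends on the Hadoop version; for the HDFS sink it typically includes at least hadoop-common, hadoop-hdfs, and hadoop-auth. An illustrative sketch, assuming the jars come from a local Hadoop install (take the real versions from your distribution):

cp $HADOOP_HOME/share/hadoop/common/hadoop-common-*.jar \
   $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-*.jar \
   /opt/cdhmoduels/apache-flume-1.5.0-cdh5.3.6-bin/lib/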

Common option (log to the console at INFO level):
-Dflume.root.logger=INFO,console

flume-hdfs-dir-mem-conf.properties

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'agent'

agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = spooldir
agent.sources.seqGenSrc.spoolDir = /opt/cdhmoduels/apache-flume-1.5.0-cdh5.3.6-bin/file/spooling
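# Note: files placed in the spooling directory must be complete and immutable
# (write them elsewhere, then move them in); once fully ingested, Flume renames
# each file with a .COMPLETED suffix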

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.loggerSink.type = hdfs
agent.sinks.loggerSink.hdfs.path = /flume/event/hdfsdir
agent.sinks.loggerSink.hdfs.filePrefix = hive-log
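# Roll (close the current HDFS file and open a new one) every 10 KB;
# setting rollInterval and rollCount to 0 disables time-based and
# event-count-based rolling, so only rollSize applies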
agent.sinks.loggerSink.hdfs.rollSize = 10240
agent.sinks.loggerSink.hdfs.rollInterval = 0
agent.sinks.loggerSink.hdfs.rollCount = 0
#Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.	
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 1000

Startup command

bin/flume-ng agent --conf conf/ --name agent --conf-file conf/flume-hdfs-dir-mem-conf.properties -Dflume.root.logger=INFO,console
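
To smoke-test the agent, drop a file into the spooling directory and check that it lands in HDFS (the file name and contents below are made up):

echo "hello flume" > /opt/cdhmoduels/apache-flume-1.5.0-cdh5.3.6-bin/file/spooling/test.log
hdfs dfs -ls /flume/event/hdfsdir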
