Big Data Tools: Flume 1.4 Installation and Deployment Guide

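1. Introduction

  flume-ng is a distributed, highly reliable, and efficient log collection system. The name refers to the new generation of Flume: "ng" stands for "next generation". At the time of writing, flume-ng 1.4 is the latest release. flume-ng differs substantially from the original Flume. We had been running Flume 0.9 and had never moved to flume-ng; a recent project required the upgrade, and I ran into a few problems along the way, which I am recording here to share.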

2. Version

  flume-ng 1.4.0

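3. Installation steps

  Downloading and extracting the package, installing the JDK, and setting the environment variables are already covered by plenty of introductory articles, so they are not repeated here. One point worth noting: flume-ng does not need ZooKeeper, so there is nothing to configure for it.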
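For readers who want a starting point anyway, the environment setup could look roughly like the sketch below. Both paths are assumptions chosen for illustration (an install under /opt and a JDK under /usr/java/default); replace them with the locations used on your machine.

```shell
# Assumed install location of the extracted Flume tarball; adjust as needed.
export FLUME_HOME="${FLUME_HOME:-/opt/apache-flume-1.4.0-bin}"

# Placeholder JDK location; only used when JAVA_HOME is not already set.
export JAVA_HOME="${JAVA_HOME:-/usr/java/default}"

# Put the flume-ng launcher script on the PATH.
export PATH="$FLUME_HOME/bin:$PATH"

echo "flume-ng launcher expected at: $FLUME_HOME/bin/flume-ng"
```

Adding these lines to the shell profile of the user that runs Flume makes them persistent across logins.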

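4. The flume-ng bug

  After installation, running flume-ng produces an error message. The cause is mainly a problem in the shell script. The complete modified flume-ng script is reproduced below; the lines that follow each #zhangzl marker are the ones that need to be changed. The complete script: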

#!/bin/bash
#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

################################
# constants
################################

FLUME_AGENT_CLASS="org.apache.flume.node.Application"
FLUME_AVRO_CLIENT_CLASS="org.apache.flume.client.avro.AvroCLIClient"
FLUME_VERSION_CLASS="org.apache.flume.tools.VersionInfo"
FLUME_TOOLS_CLASS="org.apache.flume.tools.FlumeToolsMain"

CLEAN_FLAG=1
################################
# functions
################################

info() {
  if [ ${CLEAN_FLAG} -ne 0 ]; then
    local msg=$1
    echo "Info: $msg" >&2
  fi
}

warn() {
  if [ ${CLEAN_FLAG} -ne 0 ]; then
    local msg=$1
    echo "Warning: $msg" >&2
  fi
}

error() {
  local msg=$1
  local exit_code=$2

  echo "Error: $msg" >&2

  if [ -n "$exit_code" ] ; then
    exit $exit_code
  fi
}

# If avail, add Hadoop paths to the FLUME_CLASSPATH and to the
# FLUME_JAVA_LIBRARY_PATH env vars.
# Requires Flume jars to already be on FLUME_CLASSPATH.
add_hadoop_paths() {
  local HADOOP_IN_PATH=$(PATH="${HADOOP_HOME:-${HADOOP_PREFIX}}/bin:$PATH" \
      which hadoop 2>/dev/null)

  if [ -f "${HADOOP_IN_PATH}" ]; then
    info "Including Hadoop libraries found via ($HADOOP_IN_PATH) for HDFS access"

    # determine hadoop java.library.path and use that for flume
    local HADOOP_CLASSPATH=""
    local HADOOP_JAVA_LIBRARY_PATH=$(HADOOP_CLASSPATH="$FLUME_CLASSPATH" \
        ${HADOOP_IN_PATH} org.apache.flume.tools.GetJavaProperty \
        java.library.path)

    # The patterns are kept in variables and expanded unquoted: a quoted
    # right-hand side of =~ is matched as a literal string on bash 3.2+.
    local lib_path_re='^java\.library\.path=(.*)$'
    local slf4j_re='slf4j-(api|log4j12).*\.jar'

    # look for the line that has the desired property value
    # (considering extraneous output from some GC options that write to stdout)
    # IFS = InternalFieldSeparator (set to recognize only newline char as delimiter)
    IFS=$'\n'
    for line in $HADOOP_JAVA_LIBRARY_PATH; do
      # original: if [[ $line =~ ^java\.library\.path=(.*)$ ]]; then
      #zhangzl
      if [[ "$line" =~ $lib_path_re ]]; then
        HADOOP_JAVA_LIBRARY_PATH=${BASH_REMATCH[1]}
        break
      fi
    done
    unset IFS

    if [ -n "${HADOOP_JAVA_LIBRARY_PATH}" ]; then
      FLUME_JAVA_LIBRARY_PATH="$FLUME_JAVA_LIBRARY_PATH:$HADOOP_JAVA_LIBRARY_PATH"
    fi

    # determine hadoop classpath
    HADOOP_CLASSPATH=$($HADOOP_IN_PATH classpath)

    # hack up and filter hadoop classpath
    local ELEMENTS=$(sed -e 's/:/ /g' <<<${HADOOP_CLASSPATH})
    local ELEMENT
    for ELEMENT in $ELEMENTS; do
      local PIECE
      for PIECE in $(echo $ELEMENT); do
        #zhangzl
        if [[ $PIECE =~ $slf4j_re ]]; then
          info "Excluding $PIECE from classpath"
          continue
        else
          FLUME_CLASSPATH="$FLUME_CLASSPATH:$PIECE"
        fi
      done
    done

  fi
}

add_HBASE_paths() {
  local HBASE_IN_PATH=$(PATH="${HBASE_HOME}/bin:$PATH" \
      which hbase 2>/dev/null)

  if [ -f "${HBASE_IN_PATH}" ]; then
    info "Including HBASE libraries found via ($HBASE_IN_PATH) for HBASE access"

    # determine HBASE java.library.path and use that for flume
    local HBASE_CLASSPATH=""
    local HBASE_JAVA_LIBRARY_PATH=$(HBASE_CLASSPATH="$FLUME_CLASSPATH" \
        ${HBASE_IN_PATH} org.apache.flume.tools.GetJavaProperty \
        java.library.path)

    # The patterns are kept in variables and expanded unquoted: a quoted
    # right-hand side of =~ is matched as a literal string on bash 3.2+.
    local lib_path_re='^java\.library\.path=(.*)$'
    local slf4j_re='slf4j-(api|log4j12).*\.jar'

    # look for the line that has the desired property value
    # (considering extraneous output from some GC options that write to stdout)
    # IFS = InternalFieldSeparator (set to recognize only newline char as delimiter)
    IFS=$'\n'
    for line in $HBASE_JAVA_LIBRARY_PATH; do
      #zhangzl
      if [[ $line =~ $lib_path_re ]]; then
        HBASE_JAVA_LIBRARY_PATH=${BASH_REMATCH[1]}
        break
      fi
    done
    unset IFS

    if [ -n "${HBASE_JAVA_LIBRARY_PATH}" ]; then
      FLUME_JAVA_LIBRARY_PATH="$FLUME_JAVA_LIBRARY_PATH:$HBASE_JAVA_LIBRARY_PATH"
    fi

    # determine HBASE classpath
    HBASE_CLASSPATH=$($HBASE_IN_PATH classpath)

    # hack up and filter HBASE classpath
    local ELEMENTS=$(sed -e 's/:/ /g' <<<${HBASE_CLASSPATH})
    local ELEMENT
    for ELEMENT in $ELEMENTS; do
      local PIECE
      for PIECE in $(echo $ELEMENT); do
        #zhangzl
        if [[ $PIECE =~ $slf4j_re ]]; then
          info "Excluding $PIECE from classpath"
          continue
        else
          FLUME_CLASSPATH="$FLUME_CLASSPATH:$PIECE"
        fi
      done
    done
    FLUME_CLASSPATH="$FLUME_CLASSPATH:$HBASE_HOME/conf"

  fi
}

set_LD_LIBRARY_PATH(){
#Append the FLUME_JAVA_LIBRARY_PATH to whatever the user may have specified in
#flume-env.sh
  if [ -n "${FLUME_JAVA_LIBRARY_PATH}" ]; then
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${FLUME_JAVA_LIBRARY_PATH}"
  fi
}

display_help() {
  cat <<EOF
Usage: $0 <command> [options]...

commands:
  help                  display this help text
  agent                 run a Flume agent
  avro-client           run an avro Flume client
  version               show Flume version info

global options:
  --conf,-c <conf>      use configs in <conf> directory
  --classpath,-C <cp>   append to the classpath
  --dryrun,-d           do not actually start Flume, just print the command
  --plugins-path <dirs> colon-separated list of plugins.d directories. See the
                        plugins.d section in the user guide for more details.
                        Default: \$FLUME_HOME/plugins.d
  -Dproperty=value      sets a Java system property value
  -Xproperty=value      sets a Java -X option

agent options:
  --conf-file,-f <file> specify a config file (required)
  --name,-n <name>      the name of this agent (required)
  --help,-h             display help text

avro-client options:
  --rpcProps,-P <file>   RPC client properties file with server connection params
  --host,-H <host>       hostname to which events will be sent
  --port,-p <port>       port of the avro source
  --dirname <dir>        directory to stream to avro source
  --filename,-F <file>   text file to stream to avro source (default: std input)
  --headerFile,-R <file> File containing event headers as key/value pairs on each new line
  --help,-h              display help text

  Either --rpcProps or both --host and --port must be specified.

Note that if <conf> directory is specified, then it is always included first
in the classpath.

EOF
}

run_flume() {
  local FLUME_APPLICATION_CLASS

  if [ "$#" -gt 0 ]; then
    FLUME_APPLICATION_CLASS=$1
    shift
  else
    error "Must specify flume application class" 1
  fi

  if [ ${CLEAN_FLAG} -ne 0 ]; then
    set -x
  fi
  $EXEC $JAVA_HOME/bin/java $JAVA_OPTS -cp "$FLUME_CLASSPATH" \
      -Djava.library.path=$FLUME_JAVA_LIBRARY_PATH "$FLUME_APPLICATION_CLASS" $*
}

################################
# main
################################

# set default params
FLUME_CLASSPATH=""
FLUME_JAVA_LIBRARY_PATH=""
JAVA_OPTS="-Xmx20m"
LD_LIBRARY_PATH=""

opt_conf=""
opt_classpath=""
opt_plugins_dirs=""
opt_java_props=""
opt_dryrun=""

mode=$1
shift

case "$mode" in
  help)
    display_help
    exit 0
    ;;
  agent)
    opt_agent=1
    ;;
  node)
    opt_agent=1
    warn "The \"node\" command is deprecated. Please use \"agent\" instead."
    ;;
  avro-client)
    opt_avro_client=1
    ;;
  tool)
    opt_tool=1
    ;;
  version)
    opt_version=1
    CLEAN_FLAG=0
    ;;
  *)
    error "Unknown or unspecified command '$mode'"
    echo
    display_help
    exit 1
    ;;
esac

args=""
while [ -n "$*" ] ; do
  arg=$1
  shift

  case "$arg" in
    --conf|-c)
      [ -n "$1" ] || error "Option --conf requires an argument" 1
      opt_conf=$1
      shift
      ;;
    --classpath|-C)
      [ -n "$1" ] || error "Option --classpath requires an argument" 1
      opt_classpath=$1
      shift
      ;;
    --dryrun|-d)
      opt_dryrun="1"
      ;;
    --plugins-path)
      opt_plugins_dirs=$1
      shift
      ;;
    -D*)
      opt_java_props="$opt_java_props $arg"
      ;;
    -X*)
      opt_java_props="$opt_java_props $arg"
      ;;
    *)
      args="$args $arg"
      ;;
  esac
done

# make opt_conf absolute
if [[ -n "$opt_conf" && -d "$opt_conf" ]]; then
  opt_conf=$(cd $opt_conf; pwd)
fi

# allow users to override the default env vars via conf/flume-env.sh
if [ -z "$opt_conf" ]; then
  warn "No configuration directory set! Use --conf <dir> to override."
elif [ -f "$opt_conf/flume-env.sh" ]; then
  info "Sourcing environment configuration script $opt_conf/flume-env.sh"
  source "$opt_conf/flume-env.sh"
fi

# append command-line java options to stock or env script JAVA_OPTS
if [ -n "${opt_java_props}" ]; then
  JAVA_OPTS="${JAVA_OPTS} ${opt_java_props}"
fi

# prepend command-line classpath to env script classpath
if [ -n "${opt_classpath}" ]; then
  if [ -n "${FLUME_CLASSPATH}" ]; then
    FLUME_CLASSPATH="${opt_classpath}:${FLUME_CLASSPATH}"
  else
    FLUME_CLASSPATH="${opt_classpath}"
  fi
fi

if [ -z "${FLUME_HOME}" ]; then
  FLUME_HOME=$(cd $(dirname $0)/..; pwd)
fi

# prepend $FLUME_HOME/lib jars to the specified classpath (if any)
if [ -n "${FLUME_CLASSPATH}" ] ; then
  FLUME_CLASSPATH="${FLUME_HOME}/lib/*:$FLUME_CLASSPATH"
else
  FLUME_CLASSPATH="${FLUME_HOME}/lib/*"
fi

# load plugins.d directories
PLUGINS_DIRS=""
if [ -n "${opt_plugins_dirs}" ]; then
  PLUGINS_DIRS=$(sed -e 's/:/ /g' <<<${opt_plugins_dirs})
else
  PLUGINS_DIRS="${FLUME_HOME}/plugins.d"
fi

unset plugin_lib plugin_libext plugin_native
for PLUGINS_DIR in $PLUGINS_DIRS; do
  if [[ -d ${PLUGINS_DIR} ]]; then
    for plugin in ${PLUGINS_DIR}/*; do
      if [[ -d "$plugin/lib" ]]; then
        plugin_lib="${plugin_lib}${plugin_lib+:}${plugin}/lib/*"
      fi
      if [[ -d "$plugin/libext" ]]; then
        plugin_libext="${plugin_libext}${plugin_libext+:}${plugin}/libext/*"
      fi
      if [[ -d "$plugin/native" ]]; then
        plugin_native="${plugin_native}${plugin_native+:}${plugin}/native"
      fi
    done
  fi
done

if [[ -n "${plugin_lib}" ]]
then
  FLUME_CLASSPATH="${FLUME_CLASSPATH}:${plugin_lib}"
fi

if [[ -n "${plugin_libext}" ]]
then
  FLUME_CLASSPATH="${FLUME_CLASSPATH}:${plugin_libext}"
fi

if [[ -n "${plugin_native}" ]]
then
  if [[ -n "${FLUME_JAVA_LIBRARY_PATH}" ]]
  then
    FLUME_JAVA_LIBRARY_PATH="${FLUME_JAVA_LIBRARY_PATH}:${plugin_native}"
  else
    FLUME_JAVA_LIBRARY_PATH="${plugin_native}"
  fi
fi

# find java
if [ -z "${JAVA_HOME}" ] ; then
  warn "JAVA_HOME is not set!"
  # Try to use Bigtop to autodetect JAVA_HOME if it's available
  if [ -e /usr/libexec/bigtop-detect-javahome ] ; then
    . /usr/libexec/bigtop-detect-javahome
  elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ] ; then
    . /usr/lib/bigtop-utils/bigtop-detect-javahome
  fi

  # Using java from path if bigtop is not installed or couldn't find it
  if [ -z "${JAVA_HOME}" ] ; then
    JAVA_DEFAULT=$(type -p java)
    [ -n "$JAVA_DEFAULT" ] || error "Unable to find java executable. Is it in your PATH?" 1
    JAVA_HOME=$(cd $(dirname $JAVA_DEFAULT)/..; pwd)
  fi
fi

# look for hadoop libs
add_hadoop_paths
add_HBASE_paths

# prepend conf dir to classpath
if [ -n "$opt_conf" ]; then
  FLUME_CLASSPATH="$opt_conf:$FLUME_CLASSPATH"
fi

set_LD_LIBRARY_PATH
# allow dryrun
EXEC="exec"
if [ -n "${opt_dryrun}" ]; then
  warn "Dryrun mode enabled (will not actually initiate startup)"
  EXEC="echo"
fi

# finally, invoke the appropriate command
if [ -n "$opt_agent" ] ; then
  run_flume $FLUME_AGENT_CLASS $args
elif [ -n "$opt_avro_client" ] ; then
  run_flume $FLUME_AVRO_CLIENT_CLASS $args
elif [ -n "${opt_version}" ] ; then
  run_flume $FLUME_VERSION_CLASS $args
elif [ -n "${opt_tool}" ] ; then
  run_flume $FLUME_TOOLS_CLASS $args
else
  error "This message should never appear" 1
fi

exit 0
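The lines marked #zhangzl all use the `=~` regular-expression operator inside `[[ ]]`, and how that operator treats a quoted pattern depends on the bash version: from bash 3.2 onward, a quoted right-hand side is compared as a literal string rather than as a regular expression. The sketch below shows a form that should not depend on the quoting rules, namely keeping the pattern in a variable and expanding it unquoted. The function name `extract_library_path` and the sample output are made up for illustration and are not part of flume-ng.

```shell
#!/usr/bin/env bash

# Print the value of the first "java.library.path=..." line found in the
# multi-line text passed as $1. Returns 1 when no such line exists.
extract_library_path() {
  # Pattern stored in a variable and expanded unquoted below, so bash
  # always interprets it as a regular expression.
  local re='^java\.library\.path=(.*)$'
  local line
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
      return 0
    fi
  done <<<"$1"
  return 1
}

# Hypothetical output: a stray GC log line followed by the property line.
sample=$'[GC 1024K->512K(2048K), 0.0012 secs]\njava.library.path=/usr/lib/hadoop/lib/native'

extract_library_path "$sample"
```

The same idea applies to the slf4j jar filter in the script: the pattern `slf4j-(api|log4j12).*\.jar` can be stored in a variable and used as `[[ $PIECE =~ $re ]]`.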

5. Test configuration file

  Create an example-conf.properties file in the conf directory with the following properties:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
# Write the events to the log
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

6. Run commands

  6.1 Start the agent

[hadoop@hadoop1 conf]$ flume-ng agent -n a1 -f example-conf.properties

  6.2 Start the avro-client to send data to the agent (this needs a separate terminal window)

[hadoop@hadoop1 conf]$ flume-ng avro-client -H localhost -p 44444 -F file01
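As a rough sketch of the whole round trip, the snippet below creates the input file and spells out both commands with the long option names listed in the script's help text. It assumes it is run from the conf directory, that `file01` holds a single line reading "hello world", and that console logging is wanted; the `--conf .` and `-Dflume.root.logger=INFO,console` arguments are additions for illustration and are not part of the commands above.

```shell
# Sample input for the avro-client: one line becomes one event.
printf 'hello world\n' > file01

# Agent command: --conf points at the current directory so flume-env.sh and
# the logging configuration are picked up; the -D option logs to the console.
agent_cmd=(flume-ng agent --conf . --conf-file example-conf.properties \
  --name a1 -Dflume.root.logger=INFO,console)

# Client command: sends the contents of file01 to the avro source on port 44444.
client_cmd=(flume-ng avro-client --host localhost --port 44444 --filename file01)

# Print both commands for review. To execute them, run "${agent_cmd[@]}" in
# one terminal and "${client_cmd[@]}" in a second terminal.
printf '%s\n' "${agent_cmd[*]}" "${client_cmd[*]}"
```

The agent has to be up and listening before the client is started, otherwise the client cannot connect to port 44444.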

7. Viewing the results

14/01/16 22:26:34 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 => /127.0.0.1:44444] OPEN
14/01/16 22:26:34 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 => /127.0.0.1:44444] BOUND: /127.0.0.1:44444
14/01/16 22:26:34 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 => /127.0.0.1:44444] CONNECTED: /127.0.0.1:54289
14/01/16 22:26:36 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 :> /127.0.0.1:44444] DISCONNECTED
14/01/16 22:26:36 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 :> /127.0.0.1:44444] UNBOUND
14/01/16 22:26:36 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 :> /127.0.0.1:44444] CLOSED
14/01/16 22:26:36 INFO ipc.NettyServer: Connection to /127.0.0.1:54289 disconnected.
14/01/16 22:26:38 INFO sink.LoggerSink: Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64                hello world }
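The logger sink prints the event body as hexadecimal bytes followed by a text rendering. To check by hand that the bytes match the text that was sent, they can be decoded in the shell. This is a small sketch; `hex_to_text` is a helper name invented here, not a Flume tool.

```shell
# Decode a list of two-digit hexadecimal byte values into text.
hex_to_text() {
  local byte
  for byte in "$@"; do
    # Convert the hex value to octal, then let printf's %b expand the
    # resulting \0NNN escape into the actual character.
    printf '%b' "\\0$(printf '%o' "0x$byte")"
  done
  printf '\n'
}

# Bytes copied from the LoggerSink line in the output above.
hex_to_text 68 65 6C 6C 6F 20 77 6F 72 6C 64
```

For the bytes in the last log line this prints "hello world", the same text the sink shows after the hex dump.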

 
