In Flume, a Source is where data enters an agent, and there are two ways to deliver data from an application to a Source. The first option is to create a custom client that communicates with one of Flume’s existing Sources like AvroSource or SyslogTcpSource. Here the client should convert its data into messages understood by these Flume Sources. The other option is to write a custom Flume Source that directly talks with your existing client application using some IPC or RPC protocol, and then converts the client data into Flume Events to be sent downstream. Note that all events stored within the Channel of a Flume agent must exist as Flume Events.
Option 1: implement a client that sends data directly to an existing Source.
Option 2: implement a custom Source that exchanges data with your own client directly over an RPC protocol.
Client 1: the default (Avro) RPC client
As of Flume 1.4.0, the default RPC protocol is Avro; the Thrift protocol is also supported. The type configured for the Source in the agent's conf file must match the protocol the application uses to obtain its RpcClient. For example, if the Source's type is set to avro in the configuration file, the client must obtain its RpcClient with RpcClientFactory.getDefaultInstance(hostname, port); if the Source is configured as thrift, it must use RpcClientFactory.getThriftInstance(hostname, port). Remember: the protocol type in the configuration file must match the protocol type used by the client.
Example:
import java.nio.charset.Charset;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FirstRPCClient {
    private static RpcClient client = null;
    private static final String ip = "worker1";
    private static final int port = 41414;

    public static void main(String[] args) {
        // Initialize client with the remote Flume agent's host and port
        client = RpcClientFactory.getDefaultInstance(ip, port);
        // Send 10 events to the remote Flume agent. That agent should be
        // configured to listen with an AvroSource.
        String sampleData = "Hello Flume!";
        for (int i = 0; i < 10; i++) {
            sendDataToFlume(sampleData);
        }
        client.close();
    }

    private static void sendDataToFlume(String data) {
        // Create a Flume Event object that encapsulates the sample data
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
        // Send the event
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // clean up and recreate the client
            client.close();
            client = RpcClientFactory.getDefaultInstance(ip, port);
            // Use RpcClientFactory.getThriftInstance(ip, port) instead
            // if the agent's Source is configured as thrift.
        }
    }
}
Agent configuration (saved as avro_client_case20.conf):
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
# bind/port must match the host/port the client connects to (worker1:41414)
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Client 2: Failover RPC client
This class wraps the default Avro RPC client to provide failover handling. Its hosts property takes a space-separated list of aliases, each mapped to the host:port of a Flume agent; together they form a failover group. The Failover RPC Client currently does not support Thrift. If the currently selected agent fails, the failover client automatically moves on to the next host in the group.
The code below configures three hosts for failover; for convenience, three ports on the same machine are used to simulate them. As before, it sends "hello flume" events to the first Flume agent; when the first agent fails, events go to the second agent, and so on, failing over in order.
The agent configuration reuses the previous one; make two copies of the file:
cp avro_client_case20.conf avro_client_case21.conf
cp avro_client_case20.conf avro_client_case22.conf
Then change the port in avro_client_case21.conf and avro_client_case22.conf respectively, to match the ports the client uses:
a1.sources.r1.port = 41415 and a1.sources.r1.port = 41416
flume-ng agent -c conf -f conf/avro_client_case20.conf -n a1 -Dflume.root.logger=INFO,console
flume-ng agent -c conf -f conf/avro_client_case21.conf -n a1 -Dflume.root.logger=INFO,console
flume-ng agent -c conf -f conf/avro_client_case22.conf -n a1 -Dflume.root.logger=INFO,console
The full implementation:
import java.nio.charset.Charset;
import java.util.Properties;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class SecondRPCClient {
    private static RpcClient client = null;
    private static Properties props = null;

    public static void main(String[] args) throws Exception {
        // Setup properties for the failover
        props = new Properties();
        props.put("client.type", "default_failover");
        // List of hosts (space-separated list of user-chosen host aliases)
        props.put("hosts", "h1 h2 h3");
        // host/port pair for each host alias
        props.put("hosts.h1", "worker1:41414");
        props.put("hosts.h2", "worker1:41415");
        props.put("hosts.h3", "worker1:41416");
        // create the client with failover properties
        client = RpcClientFactory.getInstance(props);
        for (int i = 0; i < 100; i++) {
            Thread.sleep(1000);
            sendDataToFlume("hello flume" + i);
        }
        client.close();
    }

    private static void sendDataToFlume(String data) {
        // Create a Flume Event object that encapsulates the sample data
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
        // Send the event
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // clean up and recreate the client from the same failover properties
            client.close();
            client = RpcClientFactory.getInstance(props);
        }
    }
}
Client 3: LoadBalancing RPC client
Key point 1: The LoadBalancing RPC Client currently does not support Thrift.
Key point 2: If backoff is enabled, the client will temporarily blacklist hosts that fail, causing them to be excluded from being selected as a failover host until a given timeout.
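The load-balancing client is configured the same way as the failover client above, except that client.type is default_loadbalance and the optional backoff and host.selector properties control blacklisting and host selection (these property keys come from the Flume SDK documentation). A minimal sketch of the properties follows; the actual RpcClientFactory.getInstance(props) call is left as a comment because it needs the Flume SDK on the classpath and running agents:

```java
import java.util.Properties;

public class LoadBalancePropsDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Select the load-balancing client instead of the failover one
        props.put("client.type", "default_loadbalance");
        // Same space-separated alias list as the failover client
        props.put("hosts", "h1 h2 h3");
        props.put("hosts.h1", "worker1:41414");
        props.put("hosts.h2", "worker1:41415");
        props.put("hosts.h3", "worker1:41416");
        // Temporarily blacklist hosts that fail (key point 2 above)
        props.put("backoff", "true");
        // Selection strategy: "round_robin" or "random"
        props.put("host.selector", "round_robin");
        // With the Flume SDK and running agents you would then create the client:
        // RpcClient client = RpcClientFactory.getInstance(props);
        System.out.println(props.getProperty("client.type"));
    }
}
```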
Client 4: Secure RPC client - Thrift
For the implementation details of clients 3 and 4, refer to the link below:
http://flume.apache.org/releases/content/1.6.0/FlumeDeveloperGuide.html
Embedded agent
Flume can also be embedded directly inside a client application, but the embedded agent comes with restrictions: not every source, channel, or sink is allowed. Only the File Channel and Memory Channel are permitted as channels, and the Avro Sink is the only supported sink.
Flume has an embedded agent api which allows users to embed an agent in their application. This agent is meant to be lightweight and as such not all sources, sinks, and channels are allowed. Specifically the source used is a special embedded source and events should be sent to the source via the put, putAll methods on the EmbeddedAgent object. Only File Channel and Memory Channel are allowed as channels while Avro Sink is the only supported sink. Interceptors are also supported by the embedded agent.
Note: The embedded agent has a dependency on hadoop-core.jar.
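The embedded agent is configured with a plain Map<String, String>; the keys below follow the Flume Developer Guide. Building the map is ordinary Java, while the EmbeddedAgent calls are left as comments since they need the flume-ng-embedded-agent dependency and a reachable Avro collector (the collector host/port here are placeholders):

```java
import java.util.HashMap;
import java.util.Map;

public class EmbeddedAgentDemo {
    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<String, String>();
        // Only memory and file channels are allowed
        properties.put("channel.type", "memory");
        properties.put("channel.capacity", "200");
        // Avro is the only supported sink type
        properties.put("sinks", "sink1");
        properties.put("sink1.type", "avro");
        properties.put("sink1.hostname", "worker1"); // placeholder collector host
        properties.put("sink1.port", "41414");       // placeholder collector port
        // With flume-ng-embedded-agent on the classpath you would then run:
        // EmbeddedAgent agent = new EmbeddedAgent("myagent");
        // agent.configure(properties);
        // agent.start();
        // agent.put(EventBuilder.withBody("hello".getBytes()));
        // agent.stop();
        System.out.println(properties.get("sink1.type"));
    }
}
```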
Beyond the components Flume ships with, you can also implement your own source, channel, and sink in code; custom components must follow Flume's transaction mechanism when putting events into or taking events from a channel.
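A custom sink, for instance, must wrap every channel take in a transaction: begin, take, commit on success, rollback on failure, and always close. Below is a plain-Java mock of that discipline; MockChannel and processOne are illustrative names, not Flume API, and the real calls (channel.getTransaction(), txn.begin()/commit()/rollback()/close()) are shown as comments:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TxnPatternDemo {
    // Toy stand-in for a Flume channel, holding event bodies as strings
    static class MockChannel {
        private final Deque<String> events = new ArrayDeque<String>();
        void put(String e) { events.addLast(e); }
        String take() { return events.pollFirst(); }
    }

    // Deliver one event following the transactional pattern a custom sink must use
    static String processOne(MockChannel ch) {
        // In real Flume: Transaction txn = channel.getTransaction(); txn.begin();
        boolean committed = false;
        String event = null;
        try {
            event = ch.take();
            // ... deliver the event to its destination here ...
            // txn.commit();
            committed = true;
        } finally {
            if (!committed && event != null) {
                // txn.rollback(): the event must return to the channel on failure
                ch.put(event);
            }
            // txn.close();
        }
        return event;
    }

    public static void main(String[] args) {
        MockChannel ch = new MockChannel();
        ch.put("hello flume");
        System.out.println(processOne(ch)); // prints "hello flume"
    }
}
```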