As mentioned in the earlier section, Flume agent configuration is read from a file that resembles a Java property file format with hierarchical property settings.
Defining the flow
To define the flow within a single agent, you need to link the sources and sinks via a channel. You need to list the sources, sinks and channels for the given agent, and then point the source and sink to a channel. A source instance can specify multiple channels, but a sink instance can only specify one channel. The format is as follows:
# list the sources, sinks and channels for the agent
<agent>.sources = <source>
<agent>.sinks = <sink>
<agent>.channels = <channel1> <channel2>

# set channel for source
<agent>.sources.<source>.channels = <channel1> <channel2> ...

# set channel for sink
<agent>.sinks.<sink>.channel = <channel1>
For example, an agent named agent_foo is reading data from an external avro client and sending it to HDFS via a memory channel. The config file weblog.config could look like:
# list the sources, sinks and channels for the agent
agent_foo.sources = avro-appserver-src-1
agent_foo.sinks = hdfs-sink-1
agent_foo.channels = mem-channel-1

# set channel for source
agent_foo.sources.avro-appserver-src-1.channels = mem-channel-1

# set channel for sink
agent_foo.sinks.hdfs-sink-1.channel = mem-channel-1
This will make the events flow from avro-appserver-src-1 to hdfs-sink-1 through the memory channel mem-channel-1. When the agent is started with weblog.config as its config file, it will instantiate that flow.
Configuring individual components
After defining the flow, you need to set properties of each source, sink and channel. This is done in the same hierarchical namespace fashion where you set the component type and other values for the properties specific to each component:
# properties for sources
<agent>.sources.<source>.<someProperty> = <someValue>

# properties for channels
<agent>.channels.<channel>.<someProperty> = <someValue>

# properties for sinks
<agent>.sinks.<sink>.<someProperty> = <someValue>
The property “type” needs to be set for each component for Flume to understand what kind of object it needs to be. Each source, sink and channel type has its own set of properties required for it to function as intended. All those need to be set as needed. In the previous example, we have a flow from avro-AppSrv-source to hdfs-Cluster1-sink through the memory channel mem-channel-1. Here’s an example that shows configuration of each of those components:
agent_foo.sources = avro-AppSrv-source
agent_foo.sinks = hdfs-Cluster1-sink
agent_foo.channels = mem-channel-1

# set channel for sources, sinks

# properties of avro-AppSrv-source
agent_foo.sources.avro-AppSrv-source.type = avro
agent_foo.sources.avro-AppSrv-source.bind = localhost
agent_foo.sources.avro-AppSrv-source.port = 10000

# properties of mem-channel-1
agent_foo.channels.mem-channel-1.type = memory
agent_foo.channels.mem-channel-1.capacity = 1000
agent_foo.channels.mem-channel-1.transactionCapacity = 100

# properties of hdfs-Cluster1-sink
agent_foo.sinks.hdfs-Cluster1-sink.type = hdfs
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.path = hdfs://namenode/flume/webdata

# ...
Adding multiple flows in an agent
A single Flume agent can contain several independent flows. You can list multiple sources, sinks and channels in a config. These components can be linked to form multiple flows:
# list the sources, sinks and channels for the agent
<agent>.sources = <source1> <source2>
<agent>.sinks = <sink1> <sink2>
<agent>.channels = <channel1> <channel2>
Then you can link the sources and sinks to their corresponding channels (for sources) or channel (for sinks) to set up two different flows. For example, if you need to set up two flows in an agent, one going from an external avro client to external HDFS and another from the output of a tail to an avro sink, then here's a config to do that:
# list the sources, sinks and channels in the agent
agent_foo.sources = avro-AppSrv-source1 exec-tail-source2
agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
agent_foo.channels = mem-channel-1 file-channel-2

# flow #1 configuration
agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1
agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1

# flow #2 configuration
agent_foo.sources.exec-tail-source2.channels = file-channel-2
agent_foo.sinks.avro-forward-sink2.channel = file-channel-2
Configuring a multi-agent flow
To set up a multi-tier flow, you need the avro/thrift sink of the first hop pointing to the avro/thrift source of the next hop. This will result in the first Flume agent forwarding events to the next Flume agent. For example, if you are periodically sending files (1 file per event) using an avro client to a local Flume agent, then this local agent can forward the events to another agent that has access to the final storage.
Weblog agent config:
# list sources, sinks and channels in the agent
agent_foo.sources = avro-AppSrv-source
agent_foo.sinks = avro-forward-sink
agent_foo.channels = file-channel

# define the flow
agent_foo.sources.avro-AppSrv-source.channels = file-channel
agent_foo.sinks.avro-forward-sink.channel = file-channel

# avro sink properties
agent_foo.sinks.avro-forward-sink.type = avro
agent_foo.sinks.avro-forward-sink.hostname = 10.1.1.100
agent_foo.sinks.avro-forward-sink.port = 10000

# configure other pieces
# ...
HDFS agent config:
# list sources, sinks and channels in the agent
agent_foo.sources = avro-collection-source
agent_foo.sinks = hdfs-sink
agent_foo.channels = mem-channel

# define the flow
agent_foo.sources.avro-collection-source.channels = mem-channel
agent_foo.sinks.hdfs-sink.channel = mem-channel

# avro source properties
agent_foo.sources.avro-collection-source.type = avro
agent_foo.sources.avro-collection-source.bind = 10.1.1.100
agent_foo.sources.avro-collection-source.port = 10000

# configure other pieces
# ...
Here we link the avro-forward-sink from the weblog agent to the avro-collection-source of the hdfs agent. This will result in the events coming from the external appserver source eventually getting stored in HDFS.
Fan out flow
As discussed in the previous section, Flume supports fanning out the flow from one source to multiple channels. There are two modes of fan out, replicating and multiplexing. In the replicating flow, the event is sent to all the configured channels. In the multiplexing case, the event is sent only to a subset of qualifying channels. To fan out the flow, one needs to specify a list of channels for a source and the policy for fanning it out. This is done by adding a channel "selector" that can be replicating or multiplexing, and then further specifying the selection rules if it's a multiplexer. If you don't specify a selector, then by default it's replicating:
# List the sources, sinks and channels for the agent
<agent>.sources = <source1>
<agent>.sinks = <sink1> <sink2>
<agent>.channels = <channel1> <channel2>

# set list of channels for source (separated by space)
<agent>.sources.<source1>.channels = <channel1> <channel2>

# set channel for sinks
<agent>.sinks.<sink1>.channel = <channel1>
<agent>.sinks.<sink2>.channel = <channel2>

<agent>.sources.<source1>.selector.type = replicating
The multiplexing selector has a further set of properties to bifurcate the flow. This requires specifying a mapping of an event attribute to a set of channels. The selector checks for each configured attribute in the event header. If it matches the specified value, then that event is sent to all the channels mapped to that value. If there's no match, then the event is sent to the set of channels configured as default:
# Mapping for multiplexing selector
<agent>.sources.<source1>.selector.type = multiplexing
<agent>.sources.<source1>.selector.header = <someHeader>
<agent>.sources.<source1>.selector.mapping.<value1> = <channel1>
<agent>.sources.<source1>.selector.mapping.<value2> = <channel1> <channel2>
<agent>.sources.<source1>.selector.mapping.<value3> = <channel2>
# ...
<agent>.sources.<source1>.selector.default = <channel2>
The mapping allows overlapping the channels for each value.
The following example has a single flow that is multiplexed to two paths. The agent named agent_foo has a single avro source and two channels linked to two sinks:
# list the sources, sinks and channels in the agent
agent_foo.sources = avro-AppSrv-source1
agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
agent_foo.channels = mem-channel-1 file-channel-2

# set channels for source
agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 file-channel-2

# set channel for sinks
agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
agent_foo.sinks.avro-forward-sink2.channel = file-channel-2

# channel selector configuration
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = State
agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
The selector checks for a header called "State". If the value is "CA" then the event is sent to mem-channel-1, if it's "AZ" then it goes to file-channel-2, and if it's "NY" then both. If the "State" header is not set or doesn't match any of the three, then it goes to mem-channel-1, which is designated as 'default'.
The selector also supports optional channels. To specify optional channels for a header, the config parameter ‘optional’ is used in the following way:
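Building on the State-header example above, a sketch of what such a configuration could look like; the discussion below refers to this layout, and the selector.optional line is the only addition to the earlier example:

# channel selector configuration
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = State
agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.optional.CA = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1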
The selector will attempt to write to the required channels first and will fail the transaction if even one of these channels fails to consume the events. The transaction is reattempted on all of the channels. Once all required channels have consumed the events, then the selector will attempt to write to the optional channels. A failure by any of the optional channels to consume the event is simply ignored and not retried.
If there is an overlap between the optional channels and required channels for a specific header, the channel is considered to be required, and a failure in the channel will cause the entire set of required channels to be retried. For instance, in the above example, for the header “CA” mem-channel-1 is considered to be a required channel even though it is marked both as required and optional, and a failure to write to this channel will cause that event to be retried on all channels configured for the selector.
Note that if a header does not have any required channels, then the event will be written to the default channels and an attempt will be made to write it to the optional channels for that header. Specifying optional channels will still cause the event to be written to the default channels if no required channels are specified. If no channels are designated as default and there are no required channels, the selector will attempt to write the events to the optional channels. Any failures are simply ignored in that case.
Flume Sources
Avro Source
Listens on an Avro port and receives events from external Avro client streams. When paired with the built-in Avro Sink on another (previous hop) Flume agent, it can create tiered collection topologies. Required properties are in bold.
Property Name
Default
Description
channels
–
type
–
The component type name, needs to be avro
bind
–
hostname or IP address to listen on
port
–
Port # to bind to
threads
–
Maximum number of worker threads to spawn
selector.type
selector.*
interceptors
–
Space-separated list of interceptors
interceptors.*
compression-type
none
This can be “none” or “deflate”. The compression-type must match the compression-type of matching AvroSource
ssl
false
Set this to true to enable SSL encryption. You must also specify a “keystore” and a “keystore-password”.
keystore
–
This is the path to a Java keystore file. Required for SSL.
keystore-password
–
The password for the Java keystore. Required for SSL.
keystore-type
JKS
The type of the Java keystore. This can be “JKS” or “PKCS12”.
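For instance, a minimal sketch of an Avro source using the properties in the table above (the agent, source and channel names a1, r1, c1 and port 4141 are illustrative):

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141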
Thrift Source
Listens on a Thrift port and receives events from external Thrift client streams. When paired with the built-in ThriftSink on another (previous hop) Flume agent, it can create tiered collection topologies. Required properties are in bold.
Exec Source
Exec source runs a given Unix command on start-up and expects that process to continuously produce data on standard out (stderr is simply discarded, unless the property logStdErr is set to true). If the process exits for any reason, the source also exits and will produce no further data. This means configurations such as cat [named pipe] or tail -F [file] are going to produce the desired results, whereas date will probably not - the former two commands produce streams of data whereas the latter produces a single event and exits.
Required properties are in bold.
Property Name
Default
Description
channels
–
type
–
The component type name, needs to be exec
command
–
The command to execute
shell
–
A shell invocation used to run the command. e.g. /bin/sh -c. Required only for commands relying on shell features like wildcards, back ticks, pipes etc.
restartThrottle
10000
Amount of time (in millis) to wait before attempting a restart
restart
false
Whether the executed cmd should be restarted if it dies
logStdErr
false
Whether the command’s stderr should be logged
batchSize
20
The max number of lines to read and send to the channel at a time
selector.type
replicating
replicating or multiplexing
selector.*
Depends on the selector.type value
interceptors
–
Space-separated list of interceptors
interceptors.*
Warning
The problem with ExecSource and other asynchronous sources is that the source can not guarantee that if there is a failure to put the event into the Channel the client knows about it. In such cases, the data will be lost. For instance, one of the most commonly requested features is the tail -F [file]-like use case where an application writes to a log file on disk and Flume tails the file, sending each line as an event. While this is possible, there's an obvious problem: what happens if the channel fills up and Flume can't send an event? Flume has no way of indicating to the application writing the log file that it needs to retain the log or that the event hasn't been sent, for some reason. If this doesn't make sense, you need only know this: Your application can never guarantee data has been received when using a unidirectional asynchronous interface such as ExecSource! As an extension of this warning - and to be completely clear - there is absolutely zero guarantee of event delivery when using this source. For stronger reliability guarantees, consider the Spooling Directory Source or direct integration with Flume via the SDK.
Note
You can use ExecSource to emulate TailSource from Flume 0.9x (flume og). Just use the unix command tail -F /full/path/to/your/file. The -F parameter is better in this case than -f as it will also follow file rotation.
The ‘shell’ config is used to invoke the ‘command’ through a command shell (such as Bash or Powershell). The ‘command’ is passed as an argument to ‘shell’ for execution. This allows the ‘command’ to use features from the shell such as wildcards, back ticks, pipes, loops, conditionals etc. In the absence of the ‘shell’ config, the ‘command’ will be invoked directly. Common values for ‘shell’ : ‘/bin/sh -c’, ‘/bin/ksh -c’, ‘cmd /c’, ‘powershell -Command’, etc.
agent_foo.sources.tailsource-1.type = exec
agent_foo.sources.tailsource-1.shell = /bin/bash -c
agent_foo.sources.tailsource-1.command = for i in /path/*.txt; do cat $i; done
JMS Source
JMS Source reads messages from a JMS destination such as a queue or topic. Being a JMS application it should work with any JMS provider but has only been tested with ActiveMQ. The JMS source provides configurable batch size, message selector, user/pass, and message to flume event converter. Note that the vendor-provided JMS jars should be included in the Flume classpath using the plugins.d directory (preferred), --classpath on the command line, or via the FLUME_CLASSPATH variable in flume-env.sh.
Property Name
Default
Description
connectionFactory
–
The JNDI name the connection factory should appear as
providerURL
–
The JMS provider URL
destinationName
–
Destination name
destinationType
–
Destination type (queue or topic)
messageSelector
–
Message selector to use when creating the consumer
userName
–
Username for the destination/provider
passwordFile
–
File containing the password for the destination/provider
batchSize
100
Number of messages to consume in one batch
converter.type
DEFAULT
Class to use to convert messages to flume events. See below.
converter.*
–
Converter properties.
converter.charset
UTF-8
Default converter only. Charset to use when converting JMS TextMessages to byte arrays.
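A sketch of how this source might be wired up against ActiveMQ; the agent/source/channel names are illustrative, the type value jms and the initialContextFactory setting are assumptions based on a typical ActiveMQ JNDI setup and may differ for other providers:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = jms
a1.sources.r1.channels = c1
a1.sources.r1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
a1.sources.r1.connectionFactory = GenericConnectionFactory
a1.sources.r1.providerURL = tcp://mqserver:61616
a1.sources.r1.destinationName = BUSINESS_DATA
a1.sources.r1.destinationType = QUEUE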
Converter
The JMS source allows pluggable converters, though it’s likely the default converter will work for most purposes. The default converter is able to convert Bytes, Text, and Object messages to FlumeEvents. In all cases, the properties in the message are added as headers to the FlumeEvent.
BytesMessage:
Bytes of message are copied to body of the FlumeEvent. Cannot convert more than 2GB of data per message.
TextMessage:
Text of message is converted to a byte array and copied to the body of the FlumeEvent. The default converter uses UTF-8 by default but this is configurable.
ObjectMessage:
Object is written out to a ByteArrayOutputStream wrapped in an ObjectOutputStream and the resulting array is copied to the body of the FlumeEvent.
Spooling Directory Source
This source lets you ingest data by placing files to be ingested into a "spooling" directory on disk. This source will watch the specified directory for new files, and will parse events out of new files as they appear. The event parsing logic is pluggable. After a given file has been fully read into the channel, it is renamed to indicate completion (or optionally deleted).
Unlike the Exec source, this source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable, uniquely-named files must be dropped into the spooling directory. Flume tries to detect these problem conditions and will fail loudly if they are violated:
If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
If a file name is reused at a later time, Flume will print an error to its log file and stop processing.
To avoid the above issues, it may be useful to add a unique identifier (such as a timestamp) to log file names when they are moved into the spooling directory.
Despite the reliability guarantees of this source, there are still cases in which events may be duplicated if certain downstream failures occur. This is consistent with the guarantees offered by other Flume components.
Property Name
Default
Description
channels
–
type
–
The component type name, needs to be spooldir.
spoolDir
–
The directory from which to read files from.
fileSuffix
.COMPLETED
Suffix to append to completely ingested files
deletePolicy
never
When to delete completed files: never or immediate
fileHeader
false
Whether to add a header storing the filename
fileHeaderKey
file
Header key to use when appending filename to header
ignorePattern
^$
Regular expression specifying which files to ignore (skip)
trackerDir
.flumespool
Directory to store metadata related to processing of files. If this path is not an absolute path, then it is interpreted as relative to the spoolDir.
batchSize
100
Granularity at which to batch transfer to the channel
inputCharset
UTF-8
Character set used by deserializers that treat the input file as text.
deserializer
LINE
Specify the deserializer used to parse the file into events. Defaults to parsing each line as an event. The class specified must implement EventDeserializer.Builder.
deserializer.*
Varies per event deserializer.
bufferMaxLines
–
(Obsolete) This option is now ignored.
bufferMaxLineLength
5000
(Deprecated) Maximum length of a line in the commit buffer. Use deserializer.maxLineLength instead.
The following event deserializers ship with Flume.
LINE
This deserializer generates one event per line of text input.
Property Name
Default
Description
deserializer.maxLineLength
2048
Maximum number of characters to include in a single event. If a line exceeds this length, it is truncated, and the remaining characters on the line will appear in a subsequent event.
deserializer.outputCharset
UTF-8
Charset to use for encoding events put into the channel.
AVRO
This deserializer is able to read an Avro container file, and it generates one event per Avro record in the file. Each event is annotated with a header that indicates the schema used. The body of the event is the binary Avro record data, not including the schema or the rest of the container file elements.
Note that if the spool directory source must retry putting one of these events onto a channel (for example, because the channel is full), then it will reset and retry from the most recent Avro container file sync point. To reduce potential event duplication in such a failure scenario, write sync markers more frequently in your Avro input files.
Property Name
Default
Description
deserializer.schemaType
HASH
How the schema is represented. By default, or when the value HASH is specified, the Avro schema is hashed and the hash is stored in every event in the event header “flume.avro.schema.hash”. If LITERAL is specified, the JSON-encoded schema itself is stored in every event in the event header “flume.avro.schema.literal”. Using LITERAL mode is relatively inefficient compared to HASH mode.
BlobDeserializer
This deserializer reads a Binary Large Object (BLOB) per event, typically one BLOB per file. For example a PDF or JPG file. Note that this approach is not suitable for very large objects because the entire BLOB is buffered in RAM.
Property Name
Default
Description
deserializer
–
The FQCN of this class: org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
deserializer.maxBlobLength
100000000
The maximum number of bytes to read and buffer for a given request
NetCat Source
A netcat-like source that listens on a given port and turns each line of text into an event. Acts like nc -k -l [host] [port]. In other words, it opens a specified port and listens for data. The expectation is that the supplied data is newline-separated text. Each line of text is turned into a Flume event and sent via the connected channel.
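A sketch of a NetCat source; since the property table for this source is not shown here, the type/bind/port names follow the same pattern as the other sources and should be treated as assumptions, and the names and port are illustrative:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1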
Sequence Generator Source
A simple sequence generator that continuously generates events with a counter that starts from 0 and increments by 1. Useful mainly for testing. Required properties are in bold.
Syslog Sources
Reads syslog data and generates Flume events. The UDP source treats an entire message as a single event. The TCP sources create a new event for each string of characters separated by a newline ('\n').
Required properties are in bold.
Syslog TCP Source
The original, tried-and-true syslog TCP source.
Property Name
Default
Description
channels
–
type
–
The component type name, needs to be syslogtcp
host
–
Host name or IP address to bind to
port
–
Port # to bind to
eventSize
2500
Maximum size of a single event line, in bytes
selector.type
replicating
replicating or multiplexing
selector.*
–
Depends on the selector.type value
interceptors
–
Space-separated list of interceptors
interceptors.*
For example, a syslog TCP source for agent named a1:
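A minimal sketch consistent with the property table above (channel name and port are illustrative):

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1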
Multiport Syslog TCP Source
This is a newer, faster, multi-port capable version of the Syslog TCP source. Note that the ports configuration setting has replaced port. Multi-port capability means that it can listen on many ports at once in an efficient manner. This source uses the Apache Mina library to do that. Provides support for RFC-3164 and many common RFC-5424 formatted messages. Also provides the capability to configure the character set used on a per-port basis.
Property Name
Default
Description
channels
–
type
–
The component type name, needs to be multiport_syslogtcp
host
–
Host name or IP address to bind to.
ports
–
Space-separated list (one or more) of ports to bind to.
eventSize
2500
Maximum size of a single event line, in bytes.
portHeader
–
If specified, the port number will be stored in the header of each event using the header name specified here. This allows for interceptors and channel selectors to customize routing logic based on the incoming port.
charset.default
UTF-8
Default character set used while parsing syslog events into strings.
charset.port.<port>
–
Character set is configurable on a per-port basis.
batchSize
100
Maximum number of events to attempt to process per request loop. Using the default is usually fine.
readBufferSize
1024
Size of the internal Mina read buffer. Provided for performance tuning. Using the default is usually fine.
numProcessors
(auto-detected)
Number of processors available on the system for use while processing messages. Default is to auto-detect # of CPUs using the Java Runtime API. Mina will spawn 2 request-processing threads per detected CPU, which is often reasonable.
selector.type
replicating
replicating, multiplexing, or custom
selector.*
–
Depends on the selector.type value
interceptors
–
Space-separated list of interceptors.
interceptors.*
For example, a multiport syslog TCP source for agent named a1:
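A minimal sketch consistent with the property table above (channel name and port numbers are illustrative):

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = multiport_syslogtcp
a1.sources.r1.channels = c1
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.ports = 10001 10002 10003
a1.sources.r1.portHeader = port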
HTTP Source
A source which accepts Flume Events by HTTP POST and GET. GET should be used for experimentation only. HTTP requests are converted into flume events by a pluggable "handler" which must implement the HTTPSourceHandler interface. This handler takes a HttpServletRequest and returns a list of flume events. All events handled from one Http request are committed to the channel in one transaction, thus allowing for increased efficiency on channels like the file channel. If the handler throws an exception, this source will return an HTTP status of 400. If the channel is full, or the source is unable to append events to the channel, the source will return an HTTP 503 - Temporarily unavailable status.
All events sent in one post request are considered to be one batch and inserted into the channel in one transaction.
A handler is provided out of the box which can handle events represented in JSON format, and supports UTF-8, UTF-16 and UTF-32 character sets. The handler accepts an array of events (even if there is only one event, it has to be sent in an array) and converts them to Flume events based on the encoding specified in the request. If no encoding is specified, UTF-8 is assumed. Events are represented as follows.
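For instance, a list of two events in this JSON format might look like the following; the header names and body contents are purely illustrative:

[{
  "headers" : {
    "timestamp" : "434324343",
    "host" : "random_host.example.com"
  },
  "body" : "random_body"
},
{
  "headers" : {
    "namenode" : "namenode.example.com",
    "datanode" : "random_datanode.example.com"
  },
  "body" : "really_random_body"
}]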
To set the charset, the request must have content type specified as application/json;charset=UTF-8 (replace UTF-8 with UTF-16 or UTF-32 as required).
One way to create an event in the format expected by this handler is to use the JSONEvent class provided in the Flume SDK and use Google Gson to create the JSON string using the Gson#fromJson(Object, Type) method. The type token to pass as the 2nd argument of this method for a list of events can be created with Gson's TypeToken class for a List of JSONEvent objects.
By default HTTPSource splits JSON input into Flume events. As an alternative, BlobHandler is a handler for HTTPSource that returns an event that contains the request parameters as well as the Binary Large Object (BLOB) uploaded with this request. For example a PDF or JPG file. Note that this approach is not suitable for very large objects because it buffers up the entire BLOB in RAM.
Property Name
Default
Description
handler
–
The FQCN of this class: org.apache.flume.sink.solr.morphline.BlobHandler
handler.maxBlobLength
100000000
The maximum number of bytes to read and buffer for a given request
Legacy Sources
The legacy sources allow a Flume 1.x agent to receive events from Flume 0.9.4 agents. They accept events in the Flume 0.9.4 format, convert them to the Flume 1.0 format, and store them in the connected channel. The 0.9.4 event properties like timestamp, pri, host, nanos, etc. get converted to 1.x event header attributes. The legacy source supports both Avro and Thrift RPC connections. To use this bridge between two Flume versions, you need to start a Flume 1.x agent with the avroLegacy or thriftLegacy source. The 0.9.4 agent should have the agent Sink pointing to the host/port of the 1.x agent.
Note
The reliability semantics of Flume 1.x are different from that of Flume 0.9.x. The E2E or DFO mode of a Flume 0.9.x agent will not be supported by the legacy source. The only supported 0.9.x mode is the best effort, though the reliability setting of the 1.x flow will be applicable to the events once they are saved into the Flume 1.x channel by the legacy source.
Required properties are in bold.
Avro Legacy Source
Property Name
Default
Description
channels
–
type
–
The component type name, needs to be org.apache.flume.source.avroLegacy.AvroLegacySource
Custom Source
A custom source is your own implementation of the Source interface. A custom source's class and its dependencies must be included in the agent's classpath when starting the Flume agent. The type of the custom source is its FQCN.
Scribe Source
Scribe is another type of ingest system. To adopt an existing Scribe ingest system, Flume should use the ScribeSource, which is based on Thrift with a compatible transfer protocol. For deploying Scribe, please follow the guide from Facebook. Required properties are in bold.
Property Name
Default
Description
type
–
The component type name, needs to be org.apache.flume.source.scribe.ScribeSource
Flume Sinks
HDFS Sink
This sink writes events into the Hadoop Distributed File System (HDFS). It currently supports creating text and sequence files. It supports compression in both file types. The files can be rolled (close current file and create a new one) periodically based on the elapsed time or size of data or number of events. It also buckets/partitions data by attributes like timestamp or machine where the event originated. The HDFS directory path may contain formatting escape sequences that will be replaced by the HDFS sink to generate a directory/file name to store the events. Using this sink requires Hadoop to be installed so that Flume can use the Hadoop jars to communicate with the HDFS cluster. Note that a version of Hadoop that supports the sync() call is required.
The following are the escape sequences supported:
Alias
Description
%{host}
Substitute value of event header named “host”. Arbitrary header names are supported.
%t
Unix time in milliseconds
%a
locale’s short weekday name (Mon, Tue, ...)
%A
locale’s full weekday name (Monday, Tuesday, ...)
%b
locale’s short month name (Jan, Feb, ...)
%B
locale’s long month name (January, February, ...)
%c
locale’s date and time (Thu Mar 3 23:05:25 2005)
%d
day of month (01)
%D
date; same as %m/%d/%y
%H
hour (00..23)
%I
hour (01..12)
%j
day of year (001..366)
%k
hour ( 0..23)
%m
month (01..12)
%M
minute (00..59)
%p
locale’s equivalent of am or pm
%s
seconds since 1970-01-01 00:00:00 UTC
%S
second (00..59)
%y
last two digits of year (00..99)
%Y
year (2010)
%z
+hhmm numeric timezone (for example, -0400)
The file in use will have the name mangled to include ”.tmp” at the end. Once the file is closed, this extension is removed. This allows excluding partially complete files in the directory. Required properties are in bold.
Note
For all of the time related escape sequences, a header with the key “timestamp” must exist among the headers of the event (unless hdfs.useLocalTimeStamp is set to true). One way to add this automatically is to use the TimestampInterceptor.
Property Name
Default
Description
channel
–
type
–
The component type name, needs to be hdfs
hdfs.path
–
HDFS directory path (e.g. hdfs://namenode/flume/webdata/)
hdfs.filePrefix
FlumeData
Name prefixed to files created by Flume in hdfs directory
hdfs.fileSuffix
–
Suffix to append to file (eg .avro - NOTE: period is not automatically added)
hdfs.inUsePrefix
–
Prefix used for temporary files that Flume actively writes to
hdfs.inUseSuffix
.tmp
Suffix used for temporary files that Flume actively writes to
hdfs.rollInterval
30
Number of seconds to wait before rolling current file (0 = never roll based on time interval)
hdfs.rollSize
1024
File size to trigger roll, in bytes (0: never roll based on file size)
hdfs.rollCount
10
Number of events written to file before it rolled (0 = never roll based on number of events)
hdfs.idleTimeout
0
Timeout after which inactive files get closed (0 = disable automatic closing of idle files)
hdfs.batchSize
100
number of events written to file before it is flushed to HDFS
hdfs.codeC
–
Compression codec. One of the following: gzip, bzip2, lzo, lzop, snappy
hdfs.fileType
SequenceFile
File format: currently SequenceFile, DataStream or CompressedStream. (1) DataStream will not compress the output file; do not set codeC. (2) CompressedStream requires hdfs.codeC to be set to an available codec.
hdfs.maxOpenFiles
5000
Allow only this number of open files. If this number is exceeded, the oldest file is closed.
hdfs.minBlockReplicas
–
Specify minimum number of replicas per HDFS block. If not specified, it comes from the default Hadoop config in the classpath.
hdfs.writeFormat
–
Format for sequence file records. One of “Text” or “Writable” (the default).
hdfs.callTimeout
10000
Number of milliseconds allowed for HDFS operations, such as open, write, flush, close. This number should be increased if many HDFS timeout operations are occurring.
hdfs.threadsPoolSize
10
Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)
hdfs.rollTimerPoolSize
1
Number of threads per HDFS sink for scheduling timed file rolling
hdfs.kerberosPrincipal
–
Kerberos user principal for accessing secure HDFS
hdfs.kerberosKeytab
–
Kerberos keytab for accessing secure HDFS
hdfs.proxyUser
hdfs.round
false
Should the timestamp be rounded down (if true, affects all time based escape sequences except %t)
hdfs.roundValue
1
Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time.
hdfs.roundUnit
second
The unit of the round down value - second, minute or hour.
hdfs.timeZone
Local Time
Name of the timezone that should be used for resolving the directory path, e.g. America/Los_Angeles.
hdfs.useLocalTimeStamp
false
Use the local time (instead of the timestamp from the event header) while replacing the escape sequences.
serializer
TEXT
Other possible options include avro_event or the fully-qualified class name of an implementation of the EventSerializer.Builder interface.
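The rounding behaviour described next assumes a configuration along these lines; the agent, sink and channel names (a1, k1, c1) and the exact path are placeholders chosen to match the example output below:

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute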
The above configuration will round down the timestamp to the last 10th minute. For example, an event with timestamp 11:54:34 AM, June 12, 2012 will cause the hdfs path to become /flume/events/2012-06-12/1150/00.
Logger Sink
Logs events at the INFO level. Typically useful for testing/debugging purposes. Required properties are in bold.
Avro Sink
This sink forms one half of Flume's tiered collection support. Flume events sent to this sink are turned into Avro events and sent to the configured hostname / port pair. The events are taken from the configured Channel in batches of the configured batch size. Required properties are in bold.
Property Name
Default
Description
channel
–
type
–
The component type name, needs to be avro.
hostname
–
The hostname or IP address to bind to.
port
–
The port # to listen on.
batch-size
100
Number of events to batch together for sending.
connect-timeout
20000
Amount of time (ms) to allow for the first (handshake) request.
request-timeout
20000
Amount of time (ms) to allow for requests after the first.
reset-connection-interval
none
Amount of time (s) before the connection to the next hop is reset. This will force the Avro Sink to reconnect to the next hop. This will allow the sink to connect to hosts behind a hardware load-balancer when new hosts are added without having to restart the agent.
compression-type
none
This can be “none” or “deflate”. The compression-type must match the compression-type of matching AvroSource
compression-level
6
The level of compression to compress events. 0 = no compression and 1-9 is compression. The higher the number, the more compression.
ssl
false
Set to true to enable SSL for this AvroSink. When configuring SSL, you can optionally set a “truststore”, “truststore-password”, “truststore-type”, and specify whether to “trust-all-certs”.
trust-all-certs
false
If this is set to true, SSL server certificates for remote servers (Avro Sources) will not be checked. This should NOT be used in production because it makes it easier for an attacker to execute a man-in-the-middle attack and “listen in” on the encrypted connection.
truststore
–
The path to a custom Java truststore file. Flume uses the certificate authority information in this file to determine whether the remote Avro Source’s SSL authentication credentials should be trusted. If not specified, the default Java JSSE certificate authority files (typically “jssecacerts” or “cacerts” in the Oracle JRE) will be used.
truststore-password
–
The password for the specified truststore.
truststore-type
JKS
The type of the Java truststore. This can be “JKS” or other supported Java truststore type.
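A sketch of an Avro sink pointing at a next-hop agent, using the properties from the table above (the names and the host/port values are illustrative):

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 10.10.10.10
a1.sinks.k1.port = 4545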
Thrift Sink
This sink forms one half of Flume's tiered collection support. Flume events sent to this sink are turned into Thrift events and sent to the configured hostname / port pair. The events are taken from the configured Channel in batches of the configured batch size. Required properties are in bold.
Property Name
Default
Description
channel
–
type
–
The component type name, needs to be thrift.
hostname
–
The hostname or IP address to bind to.
port
–
The port # to listen on.
batch-size
100
Number of events to batch together for sending.
connect-timeout
20000
Amount of time (ms) to allow for the first (handshake) request.
request-timeout
20000
Amount of time (ms) to allow for requests after the first.
connection-reset-interval
none
Amount of time (s) before the connection to the next hop is reset. This will force the Thrift Sink to reconnect to the next hop. This will allow the sink to connect to hosts behind a hardware load-balancer when new hosts are added without having to restart the agent.
HBaseSink
This sink writes data to HBase. The Hbase configuration is picked up from the first hbase-site.xml encountered in the classpath. A class implementing HbaseEventSerializer which is specified by the configuration is used to convert the events into HBase puts and/or increments. These puts and increments are then written to HBase. This sink provides the same consistency guarantees as HBase, which is currently row-wise atomicity. In the event of Hbase failing to write certain events, the sink will replay all events in that transaction.
The HBaseSink supports writing data to secure HBase. To write to secure HBase, the user the agent is running as must have write permissions to the table the sink is configured to write to. The principal and keytab to use to authenticate against the KDC can be specified in the configuration. The hbase-site.xml in the Flume agent's classpath must have authentication set to kerberos (for details on how to do this, please refer to the HBase documentation).
For convenience, two serializers are provided with Flume. The SimpleHbaseEventSerializer (org.apache.flume.sink.hbase.SimpleHbaseEventSerializer) writes the event body as-is to HBase, and optionally increments a column in Hbase. This is primarily an example implementation. The RegexHbaseEventSerializer (org.apache.flume.sink.hbase.RegexHbaseEventSerializer) breaks the event body based on the given regex and writes each part into different columns.
The type is the FQCN: org.apache.flume.sink.hbase.HBaseSink.
AsyncHBaseSink
This sink writes data to HBase using an asynchronous model. A class implementing AsyncHbaseEventSerializer which is specified by the configuration is used to convert the events into HBase puts and/or increments. These puts and increments are then written to HBase. This sink provides the same consistency guarantees as HBase, which is currently row-wise atomicity. In the event of Hbase failing to write certain events, the sink will replay all events in that transaction. The type is the FQCN: org.apache.flume.sink.hbase.AsyncHBaseSink. Required properties are in bold.
Property Name
Default
Description
channel
–
type
–
The component type name, needs to be asynchbase
table
–
The name of the table in Hbase to write to.
zookeeperQuorum
–
The quorum spec. This is the value for the property hbase.zookeeper.quorum in hbase-site.xml
znodeParent
/hbase
The base path for the znode for the -ROOT- region. Value of zookeeper.znode.parent in hbase-site.xml
columnFamily
–
The column family in Hbase to write to.
batchSize
100
Number of events to be written per txn.
timeout
60000
The length of time (in milliseconds) the sink waits for acks from hbase for all events in a transaction.
Note that this sink takes the Zookeeper Quorum and parent znode information in the configuration. The Zookeeper Quorum and parent node configuration may be specified in the Flume configuration file. If these are not provided in the configuration, then the sink will read this information from the first hbase-site.xml file in the classpath.
MorphlineSolrSink
This sink extracts data from Flume events, transforms it, and loads it in near-real-time into Apache Solr servers, which in turn serve queries to end users or search applications.
This sink is well suited for use cases that stream raw data into HDFS (via the HdfsSink) and simultaneously extract, transform and load the same data into Solr (via MorphlineSolrSink). In particular, this sink can process arbitrary heterogeneous raw data from disparate data sources and turn it into a data model that is useful to Search applications.
The ETL functionality is customizable using a morphline configuration file that defines a chain of transformation commands that pipe event records from one command to another.
Morphlines can be seen as an evolution of Unix pipelines where the data model is generalized to work with streams of generic records, including arbitrary binary payloads. A morphline command is a bit like a Flume Interceptor. Morphlines can be embedded into Hadoop components such as Flume.
Commands to parse and transform a set of standard data formats such as log files, Avro, CSV, Text, HTML, XML, PDF, Word, Excel, etc. are provided out of the box, and additional custom commands and parsers for additional data formats can be added as morphline plugins. Any kind of data format can be indexed and any Solr documents for any kind of Solr schema can be generated, and any custom ETL logic can be registered and executed.
Morphlines manipulate continuous streams of records. The data model can be described as follows: A record is a set of named fields where each field has an ordered list of one or more values. A value can be any Java Object. That is, a record is essentially a hash table where each hash table entry contains a String key and a list of Java Objects as values. (The implementation uses Guava’s ArrayListMultimap, which is a ListMultimap). Note that a field can have multiple values and any two records need not use common field names.
This sink fills the body of the Flume event into the _attachment_body field of the morphline record, as well as copies the headers of the Flume event into record fields of the same name. The commands can then act on this data.
Routing to a SolrCloud cluster is supported to improve scalability. Indexing load can be spread across a large number of MorphlineSolrSinks for improved scalability. Indexing load can be replicated across multiple MorphlineSolrSinks for high availability, for example using Flume features such as Load balancing Sink Processor. MorphlineInterceptor can also help to implement dynamic routing to multiple Solr collections (e.g. for multi-tenancy).
The morphline and solr jars required for your environment must be placed in the lib directory of the Apache Flume installation.
The type is the FQCN: org.apache.flume.sink.solr.morphline.MorphlineSolrSink
Required properties are in bold.
Property Name
Default
Description
channel
–
type
–
The component type name, needs to be org.apache.flume.sink.solr.morphline.MorphlineSolrSink
morphlineFile
–
The relative or absolute path on the local file system to the morphline configuration file. Example: /etc/flume-ng/conf/morphline.conf
morphlineId
null
Optional name used to identify a morphline if there are multiple morphlines in a morphline config file
batchSize
1000
The maximum number of events to take per flume transaction.
batchDurationMillis
1000
The maximum duration per flume transaction (ms). The transaction commits after this duration or when batchSize is exceeded, whichever comes first.
ElasticSearchSink
This sink writes data to an elasticsearch cluster. By default, events will be written so that the Kibana graphical interface can display them - just as if logstash wrote them.
The elasticsearch and lucene-core jars required for your environment must be placed in the lib directory of the Apache Flume installation. Elasticsearch requires that the major version of the client JAR match that of the server and that both are running the same minor version of the JVM. SerializationExceptions will appear if this is incorrect. To select the required version first determine the version of elasticsearch and the JVM version the target cluster is running. Then select an elasticsearch client library which matches the major version. A 0.19.x client can talk to a 0.19.x cluster; 0.20.x can talk to 0.20.x and 0.90.x can talk to 0.90.x. Once the elasticsearch version has been determined then read the pom.xml file to determine the correct lucene-core JAR version to use. The Flume agent which is running the ElasticSearchSink should also match the JVM the target cluster is running down to the minor version.
Events will be written to a new index every day. The name will be <indexName>-yyyy-MM-dd where <indexName> is the indexName parameter. The sink will start writing to a new index at midnight UTC.
Events are serialized for elasticsearch by the ElasticSearchLogStashEventSerializer by default. This behaviour can be overridden with the serializer parameter. This parameter accepts implementations of org.apache.flume.sink.elasticsearch.ElasticSearchEventSerializer or org.apache.flume.sink.elasticsearch.ElasticSearchIndexRequestBuilderFactory. Implementing ElasticSearchEventSerializer is deprecated in favour of the more powerful ElasticSearchIndexRequestBuilderFactory.
The type is the FQCN: org.apache.flume.sink.elasticsearch.ElasticSearchSink
Required properties are in bold.
Property Name
Default
Description
channel
–
type
–
The component type name, needs to be org.apache.flume.sink.elasticsearch.ElasticSearchSink
hostNames
–
Comma separated list of hostname:port, if the port is not present the default port ‘9300’ will be used
indexName
flume
The name of the index which the date will be appended to. Example ‘flume’ -> ‘flume-yyyy-MM-dd’
indexType
logs
The type to index the document to, defaults to ‘log’
clusterName
elasticsearch
Name of the ElasticSearch cluster to connect to
batchSize
100
Number of events to be written per txn.
ttl
–
TTL in days, when set will cause the expired documents to be deleted automatically, if not set documents will never be automatically deleted
serializer
ElasticSearchLogStashEventSerializer
The ElasticSearchIndexRequestBuilderFactory or ElasticSearchEventSerializer to use. Implementations of either class are accepted but ElasticSearchIndexRequestBuilderFactory is preferred.
Custom Sink
A custom sink is your own implementation of the Sink interface. A custom sink's class and its dependencies must be included in the agent's classpath when starting the Flume agent. The type of the custom sink is its FQCN. Required properties are in bold.
Flume Channels
Channels are the repositories where the events are staged on an agent. The source adds the events and the sink removes them.
Memory Channel
The events are stored in an in-memory queue with configurable max size. It's ideal for flows that need higher throughput and are prepared to lose the staged data in the event of agent failures. Required properties are in bold.
Property Name
Default
Description
type
–
The component type name, needs to be memory
capacity
100
The maximum number of events stored in the channel
transactionCapacity
100
The maximum number of events the channel will take from a source or give to a sink per transaction
keep-alive
3
Timeout in seconds for adding or removing an event
byteCapacityBufferPercentage
20
Defines the percent of buffer between byteCapacity and the estimated total size of all events in the channel, to account for data in headers. See below.
byteCapacity
see description
Maximum total bytes of memory allowed as a sum of all events in this channel. The implementation only counts the Event body, which is the reason for providing the byteCapacityBufferPercentage configuration parameter as well. Defaults to a computed value equal to 80% of the maximum memory available to the JVM (i.e. 80% of the -Xmx value passed on the command line). Note that if you have multiple memory channels on a single JVM, and they happen to hold the same physical events (i.e. if you are using a replicating channel selector from a single source) then those event sizes may be double-counted for channel byteCapacity purposes. Setting this value to 0 will cause this value to fall back to a hard internal limit of about 200 GB.
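A sketch of a memory channel tuned for higher throughput, using the properties from the table above (the channel name and capacity values are illustrative):

a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000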
JDBC Channel
The events are stored in a persistent storage that's backed by a database. The JDBC channel currently supports embedded Derby. This is a durable channel that's ideal for flows where recoverability is important. Required properties are in bold.
Property Name
Default
Description
type
–
The component type name, needs to be jdbc
db.type
DERBY
Database vendor, needs to be DERBY.
driver.class
org.apache.derby.jdbc.EmbeddedDriver
Class for vendor’s JDBC driver
driver.url
(constructed from other properties)
JDBC connection URL
db.username
“sa”
User id for db connection
db.password
–
password for db connection
connection.properties.file
–
JDBC Connection property file path
create.schema
true
If true, then creates db schema if not there
create.index
true
Create indexes to speed up lookups
create.foreignkey
true
transaction.isolation
“READ_COMMITTED”
Isolation level for db session READ_UNCOMMITTED, READ_COMMITTED, SERIALIZABLE, REPEATABLE_READ
maximum.connections
10
Max connections allowed to db
maximum.capacity
0 (unlimited)
Max number of events in the channel
sysprop.*
DB Vendor specific properties
sysprop.user.home
Home path to store embedded Derby database
Example for agent named a1:
a1.channels = c1
a1.channels.c1.type = jdbc
File Channel
Required properties are in bold.
Property Name
Default
Description
type
–
The component type name, needs to be file.
checkpointDir
~/.flume/file-channel/checkpoint
The directory where checkpoint file will be stored
useDualCheckpoints
false
Backup the checkpoint. If this is set to true, backupCheckpointDir must be set
backupCheckpointDir
–
The directory where the checkpoint is backed up to. This directory must not be the same as the data directories or the checkpoint directory
dataDirs
~/.flume/file-channel/data
The directory where log files will be stored
transactionCapacity
1000
The maximum size of transaction supported by the channel
checkpointInterval
30000
Amount of time (in millis) between checkpoints
maxFileSize
2146435071
Max size (in bytes) of a single log file
minimumRequiredSpace
524288000
Minimum Required free space (in bytes). To avoid data corruption, File Channel stops accepting take/put requests when free space drops below this value
capacity
1000000
Maximum capacity of the channel
keep-alive
3
Amount of time (in sec) to wait for a put operation
write-timeout
3
Amount of time (in sec) to wait for a write operation
checkpoint-timeout
600
Expert: Amount of time (in sec) to wait for a checkpoint
encryption.keyProvider.keys
–
List of all keys (e.g. history of the activeKey setting)
encryption.keyProvider.keys.*.passwordFile
–
Path to the optional key password file
Note
By default the File Channel uses paths for checkpoint and data directories that are within the user home as specified above. As a result if you have more than one File Channel instances active within the agent, only one will be able to lock the directories and cause the other channel initialization to fail. It is therefore necessary that you provide explicit paths to all the configured channels, preferably on different disks. Furthermore, as file channel will sync to disk after every commit, coupling it with a sink/source that batches events together may be necessary to provide good performance where multiple disks are not available for checkpoint and data directories.
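A sketch with explicit checkpoint and data directories, as the note recommends (the channel name and paths are illustrative):

a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
a1.channels.c1.dataDirs = /mnt/flume/data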
Pseudo Transaction Channel
The Pseudo Transaction Channel is only for unit testing purposes and is NOT meant for production use.
Required properties are in bold.
Property Name
Default
Description
type
–
The component type name, needs to be org.apache.flume.channel.PseudoTxnMemoryChannel
capacity
50
The max number of events stored in the channel
keep-alive
3
Timeout in seconds for adding or removing an event
Custom Channel
A custom channel is your own implementation of the Channel interface. A custom channel’s class and its dependencies must be included in the agent’s classpath when starting the Flume agent. The type of the custom channel is its FQCN. Required properties are in bold.
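With the default replicating selector, channels can also be marked optional. A sketch for an agent named a1 with source r1 (the names are illustrative); the note below refers to this configuration:

a1.sources = r1
a1.channels = c1 c2 c3
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2 c3
a1.sources.r1.selector.optional = c3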
In the above configuration, c3 is an optional channel. Failure to write to c3 is simply ignored. Since c1 and c2 are not marked optional, failure to write to those channels will cause the transaction to fail.
Multiplexing Channel Selector
Required properties are in bold.
Property Name
Default
Description
selector.type
replicating
The component type name, needs to be multiplexing
selector.header
flume.selector.header
selector.default
–
selector.mapping.*
–
Example for agent named a1 and it’s source called r1:
Custom Channel Selector
A custom channel selector is your own implementation of the ChannelSelector interface. A custom channel selector's class and its dependencies must be included in the agent's classpath when starting the Flume agent. The type of the custom channel selector is its FQCN.
Property Name
Default
Description
selector.type
–
The component type name, needs to be your FQCN
Example for agent named a1 and its source called r1:
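A sketch, assuming a hypothetical selector class org.example.MyChannelSelector on the agent's classpath:

a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.selector.type = org.example.MyChannelSelector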
Flume Sink Processors
Sink groups allow users to group multiple sinks into one entity. Sink processors can be used to provide load balancing capabilities over all sinks inside the group or to achieve failover from one sink to another in case of temporary failure.
Required properties are in bold.
Property Name
Default
Description
sinks
–
Space-separated list of sinks that are participating in the group
processor.type
default
The component type name, needs to be default, failover or load_balance
Default Sink Processor
Default sink processor accepts only a single sink. The user is not forced to create a processor (sink group) for a single sink. Instead, the user can follow the source - channel - sink pattern that was explained above in this user guide.
Failover Sink Processor
Failover Sink Processor maintains a prioritized list of sinks, guaranteeing that so long as one is available events will be processed (delivered).
The failover mechanism works by relegating failed sinks to a pool where they are assigned a cool down period, increasing with sequential failures before they are retried. Once a sink successfully sends an event, it is restored to the live pool.
To configure, set a sink group's processor to failover and set priorities for all individual sinks. All specified priorities must be unique. Furthermore, an upper limit to failover time can be set (in milliseconds) using the maxpenalty property.
Required properties are in bold.
Property Name
Default
Description
sinks
–
Space-separated list of sinks that are participating in the group
processor.type
default
The component type name, needs to be failover
processor.priority.<sinkName>
–
<sinkName> must be one of the sink instances associated with the current sink group
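A sketch of a two-sink failover group; the sinkgroups namespace follows the sink-group configuration pattern, and the group/sink names (g1, k1, k2) and penalty value are illustrative:

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000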
Load balancing Sink Processor
Load balancing sink processor provides the ability to load-balance flow over multiple sinks. It maintains an indexed list of active sinks on which the load must be distributed. The implementation supports distributing load using either round_robin or random selection mechanisms. The choice of selection mechanism defaults to round_robin, but can be overridden via configuration. Custom selection mechanisms are supported via custom classes that inherit from AbstractSinkSelector.
When invoked, this selector picks the next sink using its configured selection mechanism and invokes it. For round_robin and random, in case the selected sink fails to deliver the event, the processor picks the next available sink via its configured selection mechanism. This implementation does not blacklist the failing sink and instead continues to optimistically attempt every available sink. If all sink invocations result in failure, the selector propagates the failure to the sink runner.
If backoff is enabled, the sink processor will blacklist sinks that fail, removing them from selection for a given timeout. When the timeout ends, if the sink is still unresponsive the timeout is increased exponentially to avoid potentially getting stuck in long waits on unresponsive sinks. With this disabled, in round-robin mode all the failed sink's load will be passed to the next sink in line and thus not evenly balanced.
Required properties are in bold.
Property Name
Default
Description
processor.sinks
–
Space-separated list of sinks that are participating in the group
processor.type
default
The component type name, needs to be load_balance
processor.backoff
false
Should failed sinks be backed off exponentially.
processor.selector
round_robin
Selection mechanism. Must be either round_robin, random or FQCN of custom class that inherits from AbstractSinkSelector
processor.selector.maxTimeOut
30000
Used by backoff selectors to limit exponential backoff (in milliseconds)
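A sketch of a load-balancing group using random selection, based on the properties above (group/sink names are illustrative):

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = random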
Custom sink processors are not supported at the moment.
Event Serializers
The file_roll sink and the hdfs sink both support the EventSerializer interface. Details of the EventSerializers that ship with Flume are provided below.
Body Text Serializer
Alias: text. This serializer writes the body of the event to an output stream without any transformation or modification. The event headers are ignored. Configuration options are as follows:
Property Name    Default    Description
appendNewline    true       Whether a newline will be appended to each event at write time. The default of true assumes that events do not contain newlines, for legacy reasons.
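As a sketch, a file_roll sink using the text serializer with the trailing newline suppressed could be configured as follows (the names a1, k1 and c1 and the output directory are illustrative):
a1.sinks = k1
a1.sinks.k1.type = file_roll
a1.sinks.k1.channel = c1
a1.sinks.k1.sink.directory = /var/log/flume
a1.sinks.k1.sink.serializer = text
a1.sinks.k1.sink.serializer.appendNewline = false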
Avro Event Serializer
Alias: avro_event. This serializer serializes Flume events into an Avro container file. The schema used is the same schema used for Flume events in the Avro RPC mechanism. This serializer inherits from the AbstractAvroEventSerializer class. Configuration options are as follows:
Property Name       Default    Description
syncIntervalBytes   2048000    Avro sync interval, in approximate bytes.
compressionCodec    null       Avro compression codec. For supported codecs, see Avro's CodecFactory docs.
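As a sketch, an HDFS sink writing snappy-compressed Avro container files could be configured as follows (the names a1, k1 and c1 and the HDFS path are illustrative):
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.serializer = avro_event
a1.sinks.k1.serializer.compressionCodec = snappy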
Flume Interceptors
Flume has the capability to modify or drop events in-flight. This is done with the help of interceptors. Interceptors are classes that implement the org.apache.flume.interceptor.Interceptor interface. An interceptor can modify or even drop events based on any criteria chosen by the developer of the interceptor. Flume supports chaining of interceptors. This is made possible by specifying a list of interceptor builder class names in the configuration. Interceptors are specified as a whitespace-separated list in the source configuration. The order in which the interceptors are specified is the order in which they are invoked. The list of events returned by one interceptor is passed to the next interceptor in the chain. Interceptors can modify or drop events. If an interceptor needs to drop events, it just does not return those events in the list that it returns. If it is to drop all events, it simply returns an empty list. Interceptors are named components; here is an example of how they are created through configuration.
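The sketch below uses illustrative agent, source and interceptor names (a1, r1, i1, i2); it chains a HostInterceptor followed by a TimestampInterceptor:
a1.sources = r1
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
a1.sources.r1.interceptors.i1.preserveExisting = false
a1.sources.r1.interceptors.i2.type = timestamp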
Note that the interceptor builders are passed to the type config parameter. The interceptors are themselves configurable and can be passed configuration values just like they are passed to any other configurable component. In the above example, events are passed to the HostInterceptor first and the events returned by the HostInterceptor are then passed along to the TimestampInterceptor. You can specify either the fully qualified class name (FQCN) or the alias timestamp. If you have multiple collectors writing to the same HDFS path, then you could also use the HostInterceptor.
Timestamp Interceptor
This interceptor inserts into the event headers the time in milliseconds at which it processes the event. It inserts a header with key timestamp whose value is the relevant timestamp. This interceptor can preserve an existing timestamp header if it is configured to do so (see preserveExisting below).
Property Name       Default    Description
type                –          The component type name, has to be timestamp or the FQCN
preserveExisting    false      If the timestamp already exists, should it be preserved - true or false
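A minimal sketch attaching the timestamp interceptor to a source (the names a1, r1, c1 and i1 are illustrative):
a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i1.preserveExisting = false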
Host Interceptor
This interceptor inserts the hostname or IP address of the host that this agent is running on. It inserts a header with key host (or a configured key) whose value is the hostname or IP address of the host, based on the configuration.
Property Name       Default    Description
type                –          The component type name, has to be host
preserveExisting    false      If the host header already exists, should it be preserved - true or false
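A minimal sketch using the alias form (the names a1, r1 and i1 are illustrative):
a1.sources = r1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
a1.sources.r1.interceptors.i1.preserveExisting = false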
Static Interceptor
The static interceptor allows the user to append a static header with a static value to all events.
The current implementation does not allow specifying multiple headers at one time. Instead, the user can chain multiple static interceptors, each defining one static header.
Property Name       Default    Description
type                –          The component type name, has to be static
preserveExisting    true       If the configured header already exists, should it be preserved - true or false
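As a sketch, a static interceptor that tags every event with a datacenter header could be configured as follows; the key and value properties, which name the header and its value, are assumed here in addition to the properties listed above, and the names a1, r1 and i1 are illustrative:
a1.sources = r1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK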
UUID Interceptor
This interceptor sets a universally unique identifier on all events that are intercepted. An example UUID is b5755073-77a9-43c1-8fad-b7a586fc1b97, which represents a 128-bit value.
Consider using UUIDInterceptor to automatically assign a UUID to an event if no application-level unique key for the event is available. It can be important to assign UUIDs to events as soon as they enter the Flume network; that is, in the first Flume source of the flow. This enables subsequent deduplication of events in the face of replication and redelivery in a Flume network that is designed for high availability and high performance. If an application-level key is available, it is preferable over an auto-generated UUID because it enables subsequent updates and deletes of the event in data stores using that well-known application-level key.
Property Name       Default    Description
type                –          The component type name has to be org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
headerName          id         The name of the Flume header to modify
preserveExisting    true       If the UUID header already exists, should it be preserved - true or false
prefix              ""         The prefix string constant to prepend to each generated UUID
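A minimal sketch (the names a1, r1 and i1 are illustrative, and the header name eventId is an arbitrary example):
a1.sources = r1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
a1.sources.r1.interceptors.i1.headerName = eventId
a1.sources.r1.interceptors.i1.preserveExisting = true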
Morphline Interceptor
This interceptor filters the events through a morphline configuration file that defines a chain of transformation commands that pipe records from one command to another. For example, the morphline can ignore certain events, alter or insert certain event headers via regular expression-based pattern matching, or auto-detect and set a MIME type via Apache Tika on events that are intercepted. This kind of packet sniffing can be used for content-based dynamic routing in a Flume topology. MorphlineInterceptor can also help to implement dynamic routing to multiple Apache Solr collections (e.g. for multi-tenancy).
Currently, there is a restriction in that the morphline of an interceptor must not generate more than one output record for each input event. This interceptor is not intended for heavy-duty ETL processing - if you need that, consider moving ETL processing from the Flume source to a Flume sink, e.g. to a MorphlineSolrSink.
Required properties are in bold.
Property Name    Default    Description
type             –          The component type name has to be org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
morphlineFile    –          The relative or absolute path on the local file system to the morphline configuration file. Example: /etc/flume-ng/conf/morphline.conf
morphlineId      null       Optional name used to identify a morphline if there are multiple morphlines in a morphline config file
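A minimal sketch (the names a1, r1 and i1 are illustrative; the morphline file path and id are examples):
a1.sources = r1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
a1.sources.r1.interceptors.i1.morphlineFile = /etc/flume-ng/conf/morphline.conf
a1.sources.r1.interceptors.i1.morphlineId = morphline1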
Regex Filtering Interceptor
This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression. The supplied regular expression can be used to include events or exclude events.
Property Name    Default    Description
type             –          The component type name, has to be regex_filter
regex            ".*"       Regular expression for matching against events
excludeEvents    false      If true, regex determines events to exclude; otherwise, regex determines events to include.
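As a sketch, the following drops every event whose body matches ^DEBUG (the names a1, r1 and i1 are illustrative, and the pattern is only an example):
a1.sources = r1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = ^DEBUG
a1.sources.r1.interceptors.i1.excludeEvents = true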
Regex Extractor Interceptor
This interceptor extracts regex match groups using a specified regular expression and appends the match groups as headers on the event. It also supports pluggable serializers for formatting the match groups before adding them as event headers.
Property Name            Default    Description
type                     –          The component type name, has to be regex_extractor
regex                    –          Regular expression for matching against events
serializers              –          Space-separated list of serializers for mapping matches to header names and serializing their values. (See example below.) Flume provides built-in support for the following serializers: org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer, org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
serializers.<s1>.type    default    Must be default (org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer), org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer, or the FQCN of a custom class that implements org.apache.flume.interceptor.RegexExtractorInterceptorSerializer
serializers.<s1>.name    –
serializers.*            –          Serializer-specific properties
The serializers are used to map the matches to a header name and a formatted header value; by default, you only need to specify the header name and the default org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer will be used. This serializer simply maps the matches to the specified header name and passes the value through as it was extracted by the regex. You can plug custom serializer implementations into the extractor using the fully qualified class name (FQCN) to format the matches in any way you like.
Example 1:
If the Flume event body contained 1:2:3.4foobar5 and the following configuration was used
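A configuration of roughly this shape fits the example (the agent, source and interceptor names a1, r1 and i1 are illustrative):
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
a1.sources.r1.interceptors.i1.serializers.s1.name = one
a1.sources.r1.interceptors.i1.serializers.s2.name = two
a1.sources.r1.interceptors.i1.serializers.s3.name = three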
The extracted event will contain the same body, but the following headers will have been added: one=>1, two=>2, three=>3.
Flume Properties
Property Name                Default    Description
flume.called.from.service    –          If this property is specified then the Flume agent will continue polling for the config file even if the config file is not found at the expected location. Otherwise, the Flume agent will terminate if the config file doesn't exist at the expected location. No property value is needed when setting this property (e.g. just specifying -Dflume.called.from.service is enough).
Property: flume.called.from.service
Flume periodically polls, every 30 seconds, for changes to the specified config file. A Flume agent loads a new configuration from the config file if either an existing file is polled for the first time, or if an existing file's modification date has changed since the last time it was polled. Renaming or moving a file does not change its modification time. When a Flume agent polls a non-existent file, one of two things happens:
1. When the agent polls a non-existent config file for the first time, the agent behaves according to the flume.called.from.service property. If the property is set, the agent continues polling (always at the same period, every 30 seconds). If the property is not set, the agent immediately terminates.
2. When the agent polls a non-existent config file and this is not the first time the file is polled, the agent makes no config changes for this polling period and continues polling rather than terminating.
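As a sketch, assuming the standard flume-ng launcher script passes -D options through to the agent JVM, the property can be set on the command line as follows (the agent name and config file are the ones used earlier in this guide):
$ bin/flume-ng agent --conf conf --conf-file weblog.config --name agent_foo -Dflume.called.from.service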