Here’s how Filebeat works: When you start Filebeat, it starts one or more inputs that look in the locations you’ve specified for log data. For each log that Filebeat locates, Filebeat starts a harvester. Each harvester reads a single log for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output that you’ve configured for Filebeat.
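In short: once Filebeat starts, it launches one or more inputs that watch the log files specified in the configuration file.
For each log file, Filebeat starts a harvester. The harvester reads the new content of that single log and sends it to libbeat.
libbeat collects these events and sends them to the configured output.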
1, A harvester is responsible for reading the content of a single file.
2, The harvester reads each file, line by line, and sends the content to the output.
3, One harvester is started for each file.
4, The harvester is responsible for opening and closing the file, which means that the file descriptor remains open while the harvester is running. If a file is removed or renamed while it’s being harvested, Filebeat continues to read the file. This has the side effect that the space on your disk is reserved until the harvester closes.
5, Closing a harvester has the following consequences:
>The file handler is closed, freeing up the underlying resources if the file was deleted while the harvester was still reading the file.
>The harvesting of the file will only be started again after scan_frequency has elapsed.
>If the file is moved or removed while the harvester is closed, harvesting of the file will not continue.
6, To control when a harvester is closed, use the close_* configuration options.
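To make the harvester's job more concrete, here is a minimal Python sketch of reading only the new, complete lines of a single file, starting from a remembered offset. It is not Filebeat's actual implementation (Filebeat is written in Go); the function name `read_new_lines` and its return shape are assumptions made for illustration.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines, new_offset) for complete lines found after `offset`.

    Hypothetical helper that mimics a harvester resuming at a saved offset.
    """
    lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            # Stop at end of file, or at a partial line that is still being written.
            if not line.endswith(b"\n"):
                break
            lines.append(line.rstrip(b"\r\n").decode("utf-8"))
            offset += len(line)
    return lines, offset


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        with open(path, "wb") as f:
            f.write(b"first\nsecond\n")
        batch1, offset = read_new_lines(path, 0)
        # The application appends one complete line and one unfinished line.
        with open(path, "ab") as f:
            f.write(b"third\npartial")
        batch2, offset = read_new_lines(path, offset)
    return batch1, batch2, offset
```

In this sketch the offset only advances past complete lines, so the unfinished line is picked up on a later read once its newline has been written.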
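In short: an input manages the harvesters and finds all the sources to read from.
The input checks each file to decide whether a harvester needs to be started, whether one is already running, or whether the file can be ignored.
If the size of a file has changed since its harvester was closed, only the new content is read.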
An input is responsible for managing the harvesters and finding all sources to read from.
If the input type is log, the input finds all files on the drive that match the defined glob paths and starts a harvester for each file. Each input runs in its own goroutine.
The following example configures Filebeat to harvest lines from all log files that match the specified glob patterns:
    filebeat.inputs:
    - type: log
      paths:
        - /var/log/*.log
        - /var/path2/*.log
Filebeat currently supports several input types. Each input type can be defined multiple times. The log input checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored (see ignore_older). New lines are only picked up if the size of the file has changed since the harvester was closed.
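The per-file checks made by the log input can be sketched roughly as follows. This is an illustrative Python approximation, not Filebeat's code: the function `scan`, the `sizes_at_close` and `running` structures, and the use of seconds for `ignore_older` are all assumptions. State is keyed by path only for brevity.

```python
import glob
import os
import tempfile
import time


def scan(patterns, sizes_at_close, running, ignore_older=0.0, now=None):
    """Return the paths for which a new harvester would be started.

    `sizes_at_close` maps path -> file size when its harvester was closed;
    `running` is the set of paths that already have a harvester.
    Both are hypothetical stand-ins for the input's internal state.
    """
    now = time.time() if now is None else now
    to_start = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            info = os.stat(path)
            if path in running:
                continue  # a harvester is already reading this file
            if ignore_older and now - info.st_mtime > ignore_older:
                continue  # not modified recently: ignored (cf. ignore_older)
            if sizes_at_close.get(path) == info.st_size:
                continue  # size unchanged since the harvester was closed
            to_start.append(path)
    return to_start


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.log", "b.log", "c.log", "d.log", "e.txt"):
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(b"line\n")
        now = time.time()
        old = now - 3600
        os.utime(os.path.join(tmp, "d.log"), (old, old))  # last modified an hour ago
        sizes_at_close = {os.path.join(tmp, "b.log"): 5}  # fully read earlier
        running = {os.path.join(tmp, "c.log")}            # currently harvested
        started = scan([os.path.join(tmp, "*.log")], sizes_at_close, running,
                       ignore_older=600, now=now)
        return [os.path.basename(p) for p in started]
```

Here only `a.log` would get a new harvester: `b.log` has not grown, `c.log` is already being read, `d.log` is older than the cutoff, and `e.txt` does not match the glob.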
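Summary: Filebeat records the state of every file and frequently writes it to a registry file on disk. The recorded state holds the offset each harvester last read from, which ensures that every line in the log files is collected. If the output, such as Elasticsearch or Logstash, is unreachable, Filebeat keeps track of the last lines sent and resumes reading the files once the output is reachable again. While Filebeat is running, the state of each input is also kept in memory. When Filebeat restarts, the data in the registry file is used to rebuild that state, and each harvester resumes from its last known position.
For each input, Filebeat records the state of every file it finds. Because files can be renamed or moved, the filename and path are not enough to identify a file. For each file, Filebeat stores unique identifiers so that it can detect whether the file has been harvested before.
If your use case creates a large number of new files every day, you may find that the registry file grows quite large.
By default, the registry file is stored at /var/lib/filebeat/registry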
Filebeat keeps the state of each file and frequently flushes the state to disk in the registry file. The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent. If the output, such as Elasticsearch or Logstash, is not reachable, Filebeat keeps track of the last lines sent and will continue reading the files as soon as the output becomes available again. While Filebeat is running, the state information is also kept in memory for each input. When Filebeat is restarted, data from the registry file is used to rebuild the state, and Filebeat continues each harvester at the last known position.
For each input, Filebeat keeps a state of each file it finds. Because files can be renamed or moved, the filename and path are not enough to identify a file. For each file, Filebeat stores unique identifiers to detect whether a file was harvested previously.
If your use case involves creating a large number of new files every day, you might find that the registry file grows to be too large. See "Registry file is too large?" for details about configuration options that you can set to resolve this issue.
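The idea of identifying files by something other than their path, and persisting offsets so that they survive a restart, can be sketched in Python as below. This is only an illustration under stated assumptions: it uses the device and inode numbers from `os.stat` as the unique identifier (which suits Unix-like systems), and the JSON layout, `file_id`, `save_registry`, and `load_registry` are hypothetical and do not reflect Filebeat's real registry format.

```python
import json
import os
import tempfile


def file_id(path):
    """Identify a file by device and inode rather than by name (assumption)."""
    info = os.stat(path)
    return f"{info.st_dev}-{info.st_ino}"


def save_registry(registry_path, registry):
    # Write to a temporary file and swap it in, so that an interrupted write
    # is less likely to leave a half-written registry behind.
    tmp_path = registry_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(registry, f)
    os.replace(tmp_path, registry_path)


def load_registry(registry_path):
    if not os.path.exists(registry_path):
        return {}
    with open(registry_path) as f:
        return json.load(f)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        rotated = os.path.join(tmp, "app.log.1")
        registry_path = os.path.join(tmp, "registry.json")
        with open(log, "wb") as f:
            f.write(b"one\ntwo\n")
        # Remember that 8 bytes of this file have been read.
        save_registry(registry_path, {file_id(log): {"source": log, "offset": 8}})
        os.rename(log, rotated)                  # e.g. log rotation
        restored = load_registry(registry_path)  # simulated restart
        state = restored.get(file_id(rotated))
        return state["offset"], state["source"] == log
```

Because the state is keyed by the file's identity, the saved offset is still found after the rename, even though the stored path is now out of date.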
Filebeat guarantees that events will be delivered to the configured output at least once and with no data loss. Filebeat is able to achieve this behavior because it stores the delivery state of each event in the registry file.
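In other words: Filebeat makes sure every event is delivered to the output at least once, with no data loss. It can do this because the delivery state of each event is stored in the registry file.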
In situations where the defined output is blocked and has not confirmed all events, Filebeat will keep trying to send events until the output acknowledges that it has received the events.
In other words: if the output is blocked and has not confirmed all events, Filebeat keeps trying to send them until the output acknowledges receipt.
If Filebeat shuts down while it’s in the process of sending events, it does not wait for the output to acknowledge all events before shutting down. Any events that are sent to the output, but not acknowledged before Filebeat shuts down, are sent again when Filebeat is restarted. This ensures that each event is sent at least once, but you can end up with duplicate events being sent to the output. You can configure Filebeat to wait a specific amount of time before shutting down by setting the shutdown_timeout option.
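The at-least-once behavior, including the duplicate that can appear after a shutdown, can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions, not Filebeat's delivery code: the `Output` class, the `ship` function, and the `lose_ack_at` parameter are hypothetical names, and events are sent one at a time instead of in batches.

```python
class Output:
    """Hypothetical output that records every event it receives."""

    def __init__(self):
        self.received = []

    def send(self, event, ack=True):
        self.received.append(event)
        return ack  # False simulates a shutdown before the ack arrives


def ship(events, acked_offset, output, lose_ack_at=None):
    """Send events starting at `acked_offset`; advance only on acknowledgment.

    Returns the new acknowledged offset, which is what would be persisted.
    """
    for index in range(acked_offset, len(events)):
        acked = output.send(events[index], ack=(index != lose_ack_at))
        if not acked:
            break  # shut down without waiting for the acknowledgment
        acked_offset = index + 1
    return acked_offset


def demo():
    events = ["e1", "e2", "e3"]
    output = Output()
    # First run: "e2" reaches the output, but the shutdown happens before the ack.
    offset = ship(events, 0, output, lose_ack_at=1)
    # Restart: sending resumes from the last acknowledged position.
    offset = ship(events, offset, output)
    return output.received, offset
```

In this simulation the output ends up with `e2` twice: nothing is lost, but the event that was sent and never acknowledged is sent again after the restart.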