An Introduction to Fluentd

Thamidu Muthukumarana
Nov 16, 2020
Photo by Solanger Mendoza on Unsplash

All software generates logs, or at least should. Logs can be used to check whether the software is behaving as expected and to investigate problems, among other things. These logs often have to be moved to other locations, commonly to back them up or to send them to a log analysis platform. Even in small projects, moving logs manually is a tedious task that should be automated. In large projects with many subsystems, it helps to have a unified way to send logs to the required destinations.

Fluentd is an open-source data collector that can be used to transport logs in a uniform way. It is a powerful, lightweight tool that lets you collect logs from several sources, parse and filter them, and send them out to various destinations. It is written in Ruby and C. A related project, Fluent Bit, can be considered a lighter version of Fluentd and is written solely in C. At the time of writing, both Fluentd and Fluent Bit are licensed under the Apache License v2.0.

Fluentd treats each logged item as an event. The following section shows what a Fluentd event consists of.

Fluentd events

A Fluentd event has three parts:

  1. Tag — Used for routing. Usually specifies where the event comes from
  2. Time — Time taken from the log event
  3. Record — The log as a JSON object. This JSON object is generated by a parser.

Fluentd uses the MessagePack format to represent events internally. In the output plugins, Fluentd groups events into chunks, which are then queued to be sent.
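To make the event and chunk ideas more concrete, here is a minimal Python sketch. It is not Fluentd's implementation: the Event and ChunkBuffer names are made up for illustration, and a chunk is closed after a fixed number of events, whereas Fluentd itself decides when to flush based on settings such as chunk_limit_size and flush_interval.

```python
from dataclasses import dataclass


@dataclass
class Event:
    tag: str      # routing label, e.g. "audit_logs"
    time: int     # Unix timestamp taken from the log entry
    record: dict  # structured payload produced by a parser


class ChunkBuffer:
    """Hypothetical buffer: stages events and queues a chunk once it is full."""

    def __init__(self, chunk_limit):
        self.chunk_limit = chunk_limit  # assumption: limit counted in events
        self.staged = []                # chunk currently being filled
        self.queue = []                 # full chunks waiting to be sent

    def append(self, event):
        self.staged.append(event)
        if len(self.staged) >= self.chunk_limit:
            self.flush()

    def flush(self):
        # Move the staged chunk to the queue, if it holds anything.
        if self.staged:
            self.queue.append(self.staged)
            self.staged = []


buffer = ChunkBuffer(chunk_limit=2)
for i in range(5):
    buffer.append(Event("audit_logs", 1605484800 + i, {"seq": i}))

print(len(buffer.queue), len(buffer.staged))  # 2 1
```

Five events with a limit of two give two queued chunks and one event still staged, which is roughly the state an output plugin would be in while waiting for the next flush.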

Fluentd Plugins

Fluentd relies on an extensive collection of plugins to do its work, and this extensibility is one of its best features. Plugins are configured in a configuration file. The plugin types are listed below.

Types of Plugins in Fluentd

  • Input Plugins
  • Output Plugins
  • Filter Plugins
  • Parser Plugins
  • Formatter Plugins
  • Buffer Plugins
  • Storage Plugins
  • Service Discovery Plugins

Input Plugins

Input plugins give Fluentd the ability to ingest logs from external sources. Some of the input plugins are:

in_tail — reads data appended to a file as the file grows

in_forward — listens on a TCP socket for an event stream sent via Fluentd’s forward protocol. This plugin is used to send events from one Fluentd instance to another.

There are other input plugins, such as in_http, in_udp, and in_exec. You can learn more about the available input plugins in the Fluentd documentation. Each plugin has its own properties that need to be set in the configuration file. You can also write your own plugins in Ruby using the plugin API provided by Fluentd.
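The idea behind in_tail, reading only what has been appended since the last read, can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the plugin's code: read_new_lines is a made-up helper, and the real plugin also deals with things like log rotation and can persist its read position through its pos_file setting.

```python
import os
import tempfile


def read_new_lines(path, position):
    """Return the complete lines appended after `position` and the new position."""
    with open(path, "rb") as f:
        f.seek(position)
        data = f.read()
    # Only consume up to the last newline, so a half-written line is left for later.
    end = data.rfind(b"\n") + 1
    lines = [line.decode("utf-8") for line in data[:end].splitlines()]
    return lines, position + end


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "audit.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("first\n")
    first_read, pos = read_new_lines(log_path, 0)

    # The application appends more data; only the new lines are returned.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write("second\nthird\n")
    second_read, pos = read_new_lines(log_path, pos)

print(first_read)   # ['first']
print(second_read)  # ['second', 'third']
```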

Output Plugins

Output plugins give Fluentd the ability to push logs to external destinations.

Some of the output plugins are:

out_s3 — Writes records to Amazon S3 storage system

out_elasticsearch — Writes records into Elasticsearch

out_webhdfs — Writes records into Hadoop Distributed File System

You can learn more about the other available output plugins in the Fluentd documentation.

Filter Plugins

Filter plugins give Fluentd the ability to filter out logs, add new data to logs, and remove certain parts of logs (useful for protecting privacy). The available filter plugins are listed in the Fluentd documentation.
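The three kinds of filtering mentioned above can be sketched as plain Python functions. This is roughly the sort of work done by filter plugins such as grep and record_transformer, but the function names, the field names and the "web-01" hostname are all assumptions made up for this example.

```python
import re


def keep_if_matches(record, key, pattern):
    """Return the record if record[key] matches the pattern, otherwise None (drop it)."""
    value = str(record.get(key, ""))
    return record if re.search(pattern, value) else None


def add_fields(record, extra):
    """Return a copy of the record with extra fields added."""
    return {**record, **extra}


def remove_keys(record, keys):
    """Return a copy of the record without the given keys."""
    return {k: v for k, v in record.items() if k not in keys}


def run_filters(record):
    # Filters are applied one after another; a dropped record stops the chain.
    record = keep_if_matches(record, "level", r"^(WARN|ERROR)$")
    if record is None:
        return None
    record = add_fields(record, {"hostname": "web-01"})
    return remove_keys(record, {"password"})


kept = run_filters({"level": "ERROR", "message": "login failed", "password": "secret"})
dropped = run_filters({"level": "INFO", "message": "heartbeat"})

print(kept)     # {'level': 'ERROR', 'message': 'login failed', 'hostname': 'web-01'}
print(dropped)  # None
```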

Parser Plugins

Parser plugins convert unstructured log data into a structured format. If no parser is used, the whole log line is treated as a single string. Fluentd ships with several built-in parsers, which are described in the Fluentd documentation. It is also possible to define your own parser in the configuration file using regular expressions.
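A regular-expression parser boils down to named capture groups that become the keys of the record. The Python sketch below shows the idea; the log line layout and the pattern are assumptions chosen for the example, not a format used by any particular application.

```python
import re

# Assumed line layout: "[<time>] <level> <message>"
LINE_PATTERN = re.compile(r"^\[(?P<time>[^\]]+)\] (?P<level>\w+) (?P<message>.*)$")


def parse_line(line):
    """Turn one log line into a record; fall back to a single string field."""
    match = LINE_PATTERN.match(line)
    if match is None:
        # No structure recognised: keep the whole line as one string.
        return {"message": line}
    return match.groupdict()


parsed = parse_line("[2020-11-16 10:15:00] INFO User admin logged in")
unparsed = parse_line("free-form text")

print(parsed)
print(unparsed)
```

The first call yields separate time, level and message fields, while the second keeps the whole line in a single message field.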

Formatter Plugins

Output plugins that support formatters use them to format the log data before it is sent to the destination. There are several built-in formatter plugins.
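As a rough picture of what a formatter does, the sketch below turns the same record into a JSON line and into an LTSV (labeled tab-separated values) line. The two functions are simplified stand-ins written for this article, not the code of Fluentd's formatter plugins.

```python
import json


def format_json(record):
    """Serialize the record as one JSON object per line."""
    return json.dumps(record) + "\n"


def format_ltsv(record):
    """Serialize the record as tab-separated label:value pairs."""
    return "\t".join(f"{key}:{value}" for key, value in record.items()) + "\n"


record = {"user": "admin", "action": "login"}
json_line = format_json(record)
ltsv_line = format_ltsv(record)

print(json_line, end="")  # {"user": "admin", "action": "login"}
print(ltsv_line, end="")
```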

Buffer Plugins

Buffer plugins are another type of plugin used by output plugins. An output plugin that supports buffering uses a buffer plugin to hold the incoming event stream until the conditions for flushing the buffer are met, and only then sends the events to the destination. Available buffer plugins include buf_file and buf_memory. With buf_file, each chunk (a collection of events bundled together into a single blob) is managed as a file. With buf_memory, chunks are managed as continuous memory blocks.

Storage Plugins

Storage plugins can be used by input, filter, and output plugins to store their internal state as key-value pairs.
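Storing plugin state as key-value pairs can be pictured with a small JSON-backed store. This is only a sketch of the concept: the KeyValueStorage class and the state.json file name are invented for the example, and the "position" key is just one kind of state a plugin might want to survive a restart.

```python
import json
import os
import tempfile


class KeyValueStorage:
    """Hypothetical key-value store that persists plugin state to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.state = json.load(f)

    def get(self, key, default=None):
        return self.state.get(key, default)

    def put(self, key, value):
        self.state[key] = value

    def save(self):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.state, f)


with tempfile.TemporaryDirectory() as tmp:
    state_path = os.path.join(tmp, "state.json")

    storage = KeyValueStorage(state_path)
    storage.put("position", 42)
    storage.save()

    # A fresh instance, as after a restart, sees the saved state.
    restored = KeyValueStorage(state_path).get("position")

print(restored)  # 42
```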

Service Discovery Plugins

Plugins that support the service discovery feature can use these plugins to find their target locations. At the time of writing, service discovery is supported only by the out_forward plugin. The available service discovery plugins are:

static — lets you set the server information in the configuration file itself

file — updates the targets by reading an external file

srv — updates the targets by looking up a DNS SRV record
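The difference between static and file-based discovery can be sketched as follows. Everything here is an assumption made for illustration: the function names, the targets.json file and its layout are hypothetical and do not reflect the file format Fluentd actually expects.

```python
import json
import os
import tempfile


def static_targets():
    """Targets fixed at configuration time; they never change while running."""
    return [("127.0.0.1", 24224)]


def file_targets(path):
    """Targets re-read from an external file, so they can change without a restart."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return [(entry["host"], entry["port"]) for entry in entries]


with tempfile.TemporaryDirectory() as tmp:
    targets_path = os.path.join(tmp, "targets.json")

    with open(targets_path, "w", encoding="utf-8") as f:
        json.dump([{"host": "10.0.0.1", "port": 24224},
                   {"host": "10.0.0.2", "port": 24224}], f)
    before = file_targets(targets_path)

    # An operator edits the file; the next read picks up the new target list.
    with open(targets_path, "w", encoding="utf-8") as f:
        json.dump([{"host": "10.0.0.3", "port": 24224}], f)
    after = file_targets(targets_path)

print(static_targets())
print(before)
print(after)
```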

Example

Let’s read the WSO2 Identity Server audit logs and write them to standard output using Fluentd’s tail and stdout plugins. First, install Fluentd on your system using your preferred installation method. The instructions can be found in the Fluentd documentation.

Then add the following to the configuration file.

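# Read lines appended to the WSO2 Identity Server audit log
<source>
  @type tail
  # Replace <IS-Home> with your Identity Server home directory
  path <IS-Home>/repository/logs/audit.log
  tag audit_logs
  <parse>
    @type json
  </parse>
</source>

# Print every event tagged audit_logs to standard output
<match audit_logs>
  @type stdout
</match>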

The source block is used to set the tail plugin as the input plugin and configure it. The path property is used to set the location of the log file to be read. The value of the tag property is used to set the tags of log events produced by the input plugin. The parse section configures how the logs should be parsed.

The match block tells Fluentd to capture all events with the tag “audit_logs” and send them to the standard output.
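Tag-based routing can be sketched in Python as well. The sketch assumes the commonly documented pattern rules, where * stands for exactly one tag part and ** for zero or more parts, and that an event goes to the first match block whose pattern fits. The function names and the "app.**" rule are made up for the example, and other pattern forms are not handled.

```python
def tag_matches(pattern, tag):
    """Check a dot-separated tag against a pattern with * and ** wildcards."""
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts, tag_parts):
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # ** may consume zero or more tag parts.
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        return _match(rest, tag_parts[1:])
    return False


def route(tag, rules):
    """Return the output of the first rule whose pattern matches the tag."""
    for pattern, output in rules:
        if tag_matches(pattern, tag):
            return output
    return None


rules = [("audit_logs", "stdout"), ("app.**", "file")]

print(route("audit_logs", rules))      # stdout
print(route("app.web.access", rules))  # file
print(route("other", rules))           # None
```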

Then, when you start Fluentd and perform some tasks that generate audit logs, the logs will appear in the standard output.

This was a short introduction to Fluentd and a description of its features. To learn more, refer to the documentation of both Fluentd and Fluent Bit.

Thamidu Muthukumarana

Undergraduate at Department of Computer Science and Engineering, University of Moratuwa