
Deploy Fluentd on Kubernetes

Damian Igbe, PhD
March 24, 2022, 5:25 p.m.


1. Introduction

Fluentd solves a major problem in today's distributed and complex infrastructure: logging. This tutorial is a how-to on deploying logging in your Kubernetes infrastructure. System logs and application logs help you understand the activities inside your Kubernetes cluster. Once logs are collected, they can be used for:

  • Security – logs may be needed for compliance
  • Monitoring – application and system logs help you understand what is happening inside your cluster and detect potential problems early, e.g., rising memory usage
  • Troubleshooting and debugging – logs help you track down and solve problems

Like most modern applications, Kubernetes supports logging to help with debugging and monitoring. Kubernetes usually reads logs from the underlying container engine, such as Docker. How much log data Kubernetes collects therefore depends on the logging level enabled in the underlying container engine.

There are different types of logging:

  1. Local logging: writing to the standard output and standard error streams inside the container itself. The problem with this method of logging is that when the container dies or is evicted, you may lose access to the logs.
  2. Node-level logging: the container engine redirects everything from the container’s stdout and stderr to another location. For example, the Docker container engine redirects the two streams to a logging driver (see the quick check after this list). Log rotation is a good way to ensure that the logs don’t clog the node. This method is better than local logging but still not a complete solution, because the logs remain local to each node. The ideal solution is to send all the logs to a central location for centralized management.
  3. Cluster-level logging: this requires a separate backend to store, analyze, and query logs. The backend can be inside or outside the cluster. A node-level logging agent (e.g., Fluentd) runs on each node and sends log data to a central logging backend. Typically, the logging agent is a container that has access to a directory with log files from all of the application containers on that node. Kubernetes does not provide a native backend to store and analyze logs, but many logging solutions, such as Elasticsearch and Stackdriver, integrate well with a Kubernetes cluster.
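
As a quick illustration of node-level logging (a sketch that assumes a Docker-based node with the default json-file logging driver; the pod and namespace placeholders are yours to fill in):

# On the node: per-container log files written by the container engine
ls /var/log/containers/
# Through the API server: the same stdout/stderr stream for a given pod
kubectl logs <pod-name> --namespace=<namespace>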

2. Fluentd, ElasticSearch, and Kibana

This tutorial discusses how to perform Kubernetes cluster-level logging using Fluentd, Elasticsearch, and Kibana. Fluentd is the logging agent deployed on every node; it sends the standard output and standard error collected from each container to Elasticsearch for analysis. Visualization is done in Kibana. The diagram below (most diagrams are from the Fluentd website) depicts a pictorial view of Fluentd, Elasticsearch, and Kibana.

 

2.1 What is Elasticsearch?

Elasticsearch is a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

2.2 What is Kibana?

Kibana is an open-source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of content indexed on an Elasticsearch cluster. Users can create bar, line, and scatter plots, or pie charts and maps on top of large volumes of data.

2.3 What is Fluentd?

Fluentd is a free and open-source log collector that instantly enables you to have a ‘Log Everything’ architecture. It has three main attributes:

  • It unifies all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations.
  • It treats logs as JSON, a popular machine-readable format.
  • It is extensible and currently has over 600 plugins.

Fluentd agents are deployed on every node to gather all of the logs that are stored within individual nodes in the Kubernetes cluster. The logs can usually be found under the /var/log/containers directory.

Below is the simplified architecture of Fluentd. Fluentd is pluggable, extensible, and reliable. It can do buffering, HA, and load balancing.

Input: Tells Fluentd what to log.

Engine: The main engine handles the common concerns of logging, e.g., buffering, error handling, and message routing.

Output: Where to send the logs, in the correct format, e.g., MongoDB, PostgreSQL, or Elasticsearch.

The input and output are pluggable and plugins can be classified into Read, Parse, Buffer, Write, and Format plugins. Plugins are further discussed below.

As can be seen from the architecture, Fluentd collects logs from the different sources/applications to be logged; it can collect data from a virtually unlimited number of sources. The collected data is then output to the desired storage backend, such as MySQL, MongoDB, or PostgreSQL. This is illustrated with the diagram below.

 
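Here is a minimal td-agent.conf sketch of this input/engine/output flow (the path and tag are hypothetical, and type stdout simply prints events back out, which is handy for testing):

<source>
  type tail                     # input: follow a log file as it grows
  path /var/log/myapp.log       # hypothetical application log
  pos_file /var/log/myapp.pos   # remembers how far the file has been read
  tag myapp.access              # tag used by the engine for routing
  format none                   # treat each line as raw text
</source>

<match myapp.*>
  type stdout                   # output: print each event to Fluentd's own log
</match>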

2.4 Understanding Fluentd Logging

Fluentd works with plugins to accomplish its mission. Some plugins are built in, but custom plugins can be developed because Fluentd is extensible. The different types of plugins are illustrated in the diagram below. A good reference for each of the plugins is on the Fluentd website.

A brief description of each is presented here:

Input: The entry point of data. This interface gathers or receives data from external sources, e.g., log file content, data over TCP, or built-in metrics, and it can periodically pull data from data sources. The input plugin is responsible for generating Fluentd events from the specified data sources. A Fluentd event consists of a tag, a time, and a record:

  • tag: Where an event comes from; used for message routing.
  • time: When an event happened; epoch time.
  • record: The actual log content; a JSON object.

Parser: Parsers let users define their own parser formats to read custom data formats, converting the unstructured data gathered by the Input interface into structured data. Parsers are optional and depend on Input plugins.

Filter: Filter plugins enable Fluentd to modify the event streams produced by the Input plugins. Example use cases are:

  1. Filtering out events by grepping the value of one or more fields.
  2. Enriching events by adding new fields.
  3. Deleting or masking certain fields for privacy and compliance.

Buffer: By default, the data ingested by the Input plugins resides in memory until it is routed and delivered to an output interface.

Output: An output defines a destination for the data. There are three types of output plugins: Non-Buffered, Buffered, and Time Sliced.

  • Non-Buffered output plugins do not buffer data and immediately write out results.
  • Buffered output plugins maintain a queue of chunks (a chunk is a collection of events).
  • Time Sliced output plugins are a type of Buffered plugin where the chunks are keyed by time.

Formatter: Lets the user extend and re-use custom output formats.
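
To make the Filter interface concrete, here is a small illustrative fragment (a sketch only: the myapp.** tag and the level field are hypothetical, and the v0.12-style syntax matches the configuration style used in the rest of this tutorial):

# Keep only events whose "level" field is warn or error
<filter myapp.**>
  type grep
  regexp1 level (warn|error)
</filter>

# Enrich every surviving event with the node's hostname
<filter myapp.**>
  type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>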

2.5 Understanding how Fluentd Sends Kubernetes Logs to ElasticSearch

The installation instructions to deploy Fluentd on Kubernetes are below, but it is important to first understand how Fluentd is configured. Fluentd contacts Elasticsearch on a well-defined URL and port, configured inside the Fluentd container. Three plugins are used here: Input, Filter, and Output. The diagram below depicts the configuration architecture, and the different plugins are explained. The configuration file is called td-agent.conf; the td in td-agent.conf stands for Treasure Data, the company behind Fluentd.

 

The configuration file is located at /etc/td-agent/td-agent.conf

2.5.1 Input Plugin

Here is the configuration in td-agent.conf to collect logs from /var/log/containers:

<source>
  type tail                          # follow files like `tail -f` ("@type tail" in Fluentd v1.x)
  path /var/log/containers/*.log     # every container log on the node
  pos_file fluentd-docker.pos        # remembers how far each file has been read
  time_format %Y-%m-%dT%H:%M:%S
  tag kubernetes.*                   # tag events with their file path for routing
  format json                        # Docker writes log lines as JSON
  read_from_head true                # start reading from the beginning of each file
</source>
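
With this source, a log file such as /var/log/containers/kibana-logging-<...>.log produces events tagged kubernetes.var.log.containers.kibana-logging-<...>.log. This is the tag pattern that the filter in the next section matches on, and you can see such tags on the hits in the Elasticsearch query output in Step 4 below.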

2.5.2 Filter Plugin

To enrich the Docker container logs with Kubernetes-specific information, a filter plugin called kubernetes_metadata is required.

To install the filter plugin:

gem install fluent-plugin-kubernetes_metadata_filter

Here is the configuration in td-agent.conf to scrape the additional Kubernetes metadata:

<filter kubernetes.var.log.containers.**.log>
  type kubernetes_metadata           # enrich events with pod, namespace, labels, and host
</filter>
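
Once this filter runs, every event carries a kubernetes field. Abridged from the query output in Step 4 below, the added metadata looks like this:

"kubernetes": {
  "container_name": "kibana-logging",
  "namespace_name": "kube-system",
  "pod_name": "kibana-logging-5874ff6996-5wqfg",
  "labels": { "k8s-app": "kibana-logging" },
  "host": "fluentdslave1"
}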

2.5.3 Output Plugin

For the output, the Elasticsearch plugin will be installed. Full details of the Elasticsearch plugin can be found here.

Install Ruby and the build dependencies the gem needs, then install the plugin:

sudo apt-get install ruby ruby2.3-dev
sudo apt-get install make libcurl4-gnutls-dev build-essential
gem install fluent-plugin-elasticsearch
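
You can confirm that the plugin is visible to the gem environment:

gem list | grep fluent-plugin-elasticsearch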

The configuration in td-agent.conf to send the log files to Elasticsearch is shown below:

<match **>
  type elasticsearch                 # send every matched event to Elasticsearch
  user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
  password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
  log_level info
  include_tag_key true               # add the Fluentd tag to each stored record
  host elasticsearch-logging         # the Elasticsearch service name in the cluster
  port 9200
  logstash_format true               # write daily logstash-YYYY.MM.DD indices
  # Set the chunk limit the same as for fluentd-gcp.
  buffer_chunk_limit 2M
  # Cap buffer memory usage to 2MiB/chunk * 32 chunks = 64 MiB.
  buffer_queue_limit 32
  flush_interval 5s
  # Never wait longer than 30 seconds between retries.
  max_retry_wait 30
  # Disable the limit on the number of retries (retry forever).
  disable_retry_limit
  # Use multiple threads for processing.
  num_threads 8
</match>
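
Note the logstash_format true setting: it makes the plugin write events into daily logstash-YYYY.MM.DD indices, which is why the hits in the query output in Step 4 below carry "_index":"logstash-2018.06.12".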

Note that Fluentd, Elasticsearch, and Kibana are deployed as separate containers, so the Fluentd configuration above lives in the Fluentd container.

3. Installing Fluentd, Elasticsearch, and Kibana

To deploy these services, let’s use Kubernetes manifest files that are already publicly available. We need to create a deployment and a service for each of the applications. You can find the manifest files cloned to this GitHub location; only a few modifications were made to the YAML templates.

The Kubernetes installation was performed following Kubernetes with KOPS, one of the earlier blog tutorials. One master node and one worker node were used, but you can use as many nodes as desired. Fluentd is deployed as a DaemonSet, so whenever an additional node is added, it joins the cluster and starts sending logs to Elasticsearch.

Step 1: Clone the repository on your master Kubernetes node, and then create the deployment and service objects:

kubectl create -f elastic-search-rc.yaml
kubectl create -f elasticsearch-svc.yaml
kubectl create -f kibana-rc.yaml
kubectl create -f kibana-svc.yaml

Step 2: Create the Fluentd DaemonSet:

kubectl create -f fluentd-daemonset.yaml
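
To confirm that an agent pod is scheduled on every node (a quick check; the DaemonSet name comes from the manifest, so yours may differ):

kubectl get daemonset --namespace=kube-system
kubectl get pods --namespace=kube-system -o wide | grep fluentd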

Step 3: Check that all is well and that all the Kubernetes objects are properly deployed:

$ kubectl get pods --all-namespaces

NAMESPACE     NAME                                     READY   STATUS    RESTARTS   AGE
kube-system   elasticsearch-logging-h68v6              1/1     Unknown   0          59d
kube-system   elasticsearch-logging-mpdkv              1/1     Running   5          59d
kube-system   etcd-fluentdmaster                       1/1     Running   11         63d
kube-system   fluentd-es-1.24-2z7w5                    1/1     Running   5          59d
kube-system   kibana-logging-5874ff6996-5wqfg          1/1     Running   5          59d
kube-system   kube-apiserver-fluentdmaster             1/1     Running   11         63d
kube-system   kube-controller-manager-fluentdmaster    1/1     Running   11         63d
kube-system   kube-dns-6f4fd4bdf-655tv                 3/3     Running   30         63d
kube-system   kube-proxy-4ff9h                         1/1     Running   10         63d
kube-system   kube-proxy-vclr9                         1/1     Running   6          59d
kube-system   kube-scheduler-fluentdmaster             1/1     Running   11         63d
kube-system   weave-net-6w9sd                          2/2     Running   19         59d
kube-system   weave-net-m24wm                          2/2     Running   3          6h

$ kubectl get svc --all-namespaces
NAMESPACE     NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       kubernetes              ClusterIP   10.96.0.1       <none>        443/TCP         63d
kube-system   elasticsearch-logging   ClusterIP   10.111.123.66   <none>        9200/TCP        63d
kube-system   kibana-logging          NodePort    10.96.204.66    <none>        80:30560/TCP    63d
kube-system   kube-dns                ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   63d

Step 4: Test Elasticsearch with basic query searches. If Elasticsearch is not working properly, Kibana will give errors when loaded in the browser. Note that the IP address is the service address of Elasticsearch, as can be seen in the kubectl get svc output above.

curl '10.111.123.66:9200/_search?q=*&pretty'
curl 10.111.123.66:9200/_search?q=*warning

The warning search will give a long output as follows:
igbedo@fluentdmaster:~/Kubernetes-efk-stack$ curl 10.111.123.66:9200/_search?q=*warning
{"took":20,"timed_out":false,"_shards":{"total":6,"successful":6,"failed":0},"hits":{"total":5,"max_score":1.0,"hits":[{"_index":"logstash-2018.06.12","_type":"fluentd","_id":"AWP1MbMPybVhNMr2IQmo","_score":1.0,"_source":{"type":"log","@timestamp":"2018-06-12T18:10:50Z","tags":["warning","elasticsearch","admin"],"pid":6,"message":"No living connections","log":"{\"type\":\"log\",\"@timestamp\":\"2018-06-12T18:10:50Z\",\"tags\":[\"warning\",\"elasticsearch\",\"admin\"],\"pid\":6,\"message\":\"No living connections\"}\n","stream":"stdout","docker":{"container_id":"ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15"},"kubernetes":{"container_name":"kibana-logging","namespace_name":"kube-system","pod_name":"kibana-logging-5874ff6996-5wqfg","pod_id":"d6402b2f-3f9b-11e8-902e-08002728f91c","labels":{"k8s-app":"kibana-logging","pod-template-hash":"1430992552"},"host":"fluentdslave1","master_url":"https://10.96.0.1:443/api"},"tag":"kubernetes.var.log.containers.kibana-logging-5874ff6996-5wqfg_kube-system_kibana-logging-ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15.log"}},{"_index":"logstash-2018.06.12","_type":"fluentd","_id":"AWP1Ma5tybVhNMr2IQkj","_score":1.0,"_source":{"log":"WARNING: Tini has been relocated to /sbin/tini.\n","stream":"stderr","docker":{"container_id":"ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15"},"kubernetes":{"container_name":"kibana-logging","namespace_name":"kube-system","pod_name":"kibana-logging-5874ff6996-5wqfg","pod_id":"d6402b2f-3f9b-11e8-902e-08002728f91c","labels":{"k8s-app":"kibana-logging","pod-template-hash":"1430992552"},"host":"fluentdslave1","master_url":"https://10.96.0.1:443/api"},"@timestamp":"2018-06-12T18:10:06+00:00","tag":"kubernetes.var.log.containers.kibana-logging-5874ff6996-5wqfg_kube-system_kibana-logging-ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15.log"}},{"_index":"logstash-2018.06.12","_type":"fluentd","_id":"AWP1MbMPybVhNMr2IQmu","_score":1.0,"_source":{"type":"log","@timestamp":"2018-06-12T18:10:52Z","tags":["warning","elasticsearch","admin"],"pid":6,"message":"No living connections","log":"{\"type\":\"log\",\"@timestamp\":\"2018-06-12T18:10:52Z\",\"tags\":[\"warning\",\"elasticsearch\",\"admin\"],\"pid\":6,\"message\":\"No living connections\"}\n","stream":"stdout","docker":{"container_id":"ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15"},"kubernetes":{"container_name":"kibana-logging","namespace_name":"kube-system","pod_name":"kibana-logging-5874ff6996-5wqfg","pod_id":"d6402b2f-3f9b-11e8-902e-08002728

You can exec into any of the containers if you need to troubleshoot a service:

$ kubectl exec -it fluentd-es-1.24-2z7w5 --namespace=kube-system -- /bin/bash
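
You can also read the logging agent’s own logs, which is useful for spotting connection errors to Elasticsearch:

kubectl logs fluentd-es-1.24-2z7w5 --namespace=kube-system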

Step 5: If all goes well, point your browser at the Kibana service (obtained with kubectl get svc --all-namespaces) and you will see the Kibana Dashboard. From outside the cluster, use any node’s IP with the NodePort shown above, e.g. http://<node-ip>:30560.


4. Conclusion

This tutorial discussed how to deploy Fluentd, Kibana, and Elasticsearch on a Kubernetes cluster. By following it, you’ll have a fully functional Kubernetes cluster together with logging. Fluentd is fast becoming the standard for logging in modern architectures, in many cases replacing Syslog. If you like the tutorials, do subscribe to our blog and YouTube channel for more coming your way.
