1. Introduction
Fluentd solves a major problem in today's distributed and complex infrastructure: logging. This tutorial is a how-to on deploying logging in your Kubernetes infrastructure. System logs and application logs help you understand the activities inside your Kubernetes cluster. Once logs are collected, they can be used for:
- Security: logs may be needed for compliance
- Monitoring: application and system logs help you understand what is happening inside your cluster and detect potential problems, e.g. monitoring memory usage
- Troubleshooting and debugging: logs help solve problems
Like most modern applications, Kubernetes supports logging to help with debugging and monitoring. Kubernetes usually reads logs from the underlying container engine, such as Docker. How much log data Kubernetes collects therefore depends on the logging level enabled in the underlying container engine.
There are different types of logging:
- Local logging: writing to the standard output and standard error streams inside the container itself. The problem with this method is that when the container dies or is evicted, you may lose access to the logs.
- Node-level logging: the container engine redirects everything from the container's stdout and stderr to another location. For example, the Docker container engine redirects the two streams to a logging driver. Log rotation is a good way to ensure that the logs don't clog the node. This method is better than local logging, but still not a perfect solution, because logs remain localized on every node. The ideal solution is to send all the logs to a central node for centralized management.
- Cluster-level logging: this requires a separate backend to store, analyze, and query logs. The backend can be either inside or outside the cluster. A node-level logging agent (e.g. Fluentd) runs on each node and sends log data to a central logging node. Typically, the logging agent is a container that has access to a directory with log files from all of the application containers on that node. Kubernetes does not provide a native backend to store and analyze logs, but many logging solutions integrate well with a Kubernetes cluster, such as Elasticsearch and Stackdriver.
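The practical difference between local and node-level logs is easy to see from the command line: kubectl logs only returns what the container engine still holds for a pod, so once the pod (and its node-level files) is gone, so are the logs. A small sketch, with a hypothetical pod name:

```
# Read a pod's recent stdout/stderr (served from the node-level log files)
kubectl logs my-app-pod --tail=20

# Logs of the previous, crashed instance survive only as long as the
# container engine keeps them on the node
kubectl logs my-app-pod --previous
```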
2. Fluentd, ElasticSearch, and Kibana
This tutorial discusses how to perform Kubernetes cluster-level logging using Fluentd, Elasticsearch, and Kibana. Fluentd is the logging agent deployed on every node. Fluentd collects the standard output and standard error from each container's logs and sends them to Elasticsearch for analysis. Visualization is done in Kibana. The diagram below (most diagrams are from the Fluentd website) depicts a pictorial view of Fluentd, Elasticsearch, and Kibana.
2.1 What is Elasticsearch?
Elasticsearch is a search engine that provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.
2.2 What is Kibana?
Kibana is an open-source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of content indexed on an Elasticsearch cluster. Users can create bar, line, and scatter plots, or pie charts and maps on top of large volumes of data.
2.3 What is Fluentd?
Fluentd is a free and open-source log collector that instantly enables you to have a 'Log Everything' architecture. It has 3 main attributes:
- Unify all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations
- Fluentd treats logs as JSON, a popular machine-readable format.
- Fluentd is extensible and currently has over 600 plugins.
Fluentd agents are deployed on every node to gather all of the logs that are stored within individual nodes in the Kubernetes cluster. The logs can usually be found under the /var/log/containers directory.
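The file names under /var/log/containers themselves encode the pod, namespace, and container, which is how Fluentd can later attach Kubernetes metadata to each event. As a small sketch (the file name below is hypothetical), the pieces can be pulled apart with plain shell parameter expansion:

```shell
# /var/log/containers file names follow <pod>_<namespace>_<container>-<container-id>.log
f="kibana-logging-5874ff6996-5wqfg_kube-system_kibana-logging-ecb40ac.log"
base="${f%.log}"          # strip the .log suffix
pod="${base%%_*}"         # everything before the first underscore
rest="${base#*_}"         # drop the pod name and its underscore
namespace="${rest%%_*}"   # everything before the next underscore
echo "$pod $namespace"
```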
Below is the simplified architecture of Fluentd. Fluentd is pluggable, extensible, and reliable. It can do buffering, HA, and load balancing.
Input: Tells Fluentd what to log
Engine: The main engine contains the concerns common to all logging, e.g. buffering, error handling, and message routing.
Output: Where to send output logs, in the correct format, e.g. MongoDB, PostgreSQL, or Elasticsearch
The input and output are pluggable and plugins can be classified into Read, Parse, Buffer, Write, and Format plugins. Plugins are further discussed below.
As can be seen from the architecture, Fluentd collects logs from the different sources/applications to be logged. It can collect data from an unlimited number of sources. Collected data is then output to the desired storage backend, such as MySQL, MongoDB, or PostgreSQL. This is illustrated in the diagram.
2.4 Understanding Fluentd Logging
Fluentd works with plugins to accomplish its mission. Some plugins are built in, but custom plugins can be developed since Fluentd is extensible. The different types of plugins are illustrated in the diagram below. A good reference for each of the plugins is on the Fluentd website.
A brief description of each is presented here:
| Interface | Description |
| --- | --- |
| Input | The entry point of data. This interface gathers or receives data from external sources, e.g. log file content, data over TCP, or built-in metrics, and can periodically pull data from data sources. A Fluentd event consists of a tag, a time, and a record. The Input plugin is responsible for generating Fluentd events from the specified data sources. |
| Parser | Parsers enable the user to create their own parser formats to read custom data formats, converting unstructured data gathered from the Input interface into structured data. Parsers are optional and depend on Input plugins. |
| Filter | Filter plugins enable Fluentd to modify the event streams produced by Input plugins. Example use cases are filtering out events by grepping on field values, enriching events by adding new fields, and deleting or masking fields for privacy. |
| Buffer | By default, the data ingested by Input plugins resides in memory until it is routed and delivered to an Output interface. |
| Output | An Output defines a destination for the data. There are three types of Output plugins: Non-Buffered, Buffered, and Time Sliced. |
| Formatter | Lets the user extend and re-use custom output formats. |
2.5 Understanding how Fluentd Sends Kubernetes Logs to ElasticSearch
The installation instructions to deploy Fluentd on Kubernetes are below, but it's important first to understand how Fluentd is configured. Fluentd contacts Elasticsearch on a well-defined URL and port, configured inside the Fluentd container. Three plugins are used here: Input, Filter, and Output. The diagram below depicts the configuration architecture, and the different plugins are explained. The configuration file is called td-agent.conf; the "td" in td-agent.conf stands for Treasure Data, the company behind Fluentd.
The configuration file is located at /etc/td-agent/td-agent.conf
2.5.1 Input Plugin:
Here is the configuration in td-agent.conf to collect logs from /var/log/containers
<source>
type tail
path /var/log/containers/*.log
pos_file fluentd-docker.pos
time_format %Y-%m-%dT%H:%M:%S
tag kubernetes.*
format json
read_from_head true
</source>
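The format json line matters because Docker's json-file logging driver wraps every line a container writes in a small JSON envelope, and that is what the tail input parses. As a quick illustration (the log line below is made up, and a real pipeline should use a proper JSON parser such as jq rather than sed):

```shell
# One line as Docker's json-file driver writes it under /var/log/containers
line='{"log":"hello from app\n","stream":"stdout","time":"2018-06-12T18:10:06Z"}'
# Pull out the stream field with sed (illustration only)
stream=$(printf '%s' "$line" | sed -n 's/.*"stream":"\([^"]*\)".*/\1/p')
echo "$stream"
```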
2.5.2 Filter Plugin
To get more Kubernetes-specific information out of the Docker container logs, a plugin called kubernetes_metadata is required.
To install the filter plugin:
gem install fluent-plugin-kubernetes_metadata_filter
Here is the configuration in td-agent.conf to scrape additional Kubernetes parameters:
<filter kubernetes.var.log.containers.**.log>
type kubernetes_metadata
</filter>
2.5.3 Output Plugin
For the output, the Elasticsearch plugin will be installed. Full details of the Elasticsearch plugin can be found in the plugin's documentation.
Prepare for the ruby gem to run and then install fluent-plugin:
apt install ruby
sudo apt-get install make libcurl4-gnutls-dev
sudo apt-get install build-essential
sudo apt-get install ruby2.3-dev
gem install fluent-plugin-elasticsearch
The configuration in td-agent.conf to send log files to Elasticsearch is here:
<match **>
type elasticsearch
user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
log_level info
include_tag_key true
host elasticsearch-logging
port 9200
logstash_format true
# Set the chunk limit the same as for fluentd-gcp.
buffer_chunk_limit 2M
# Cap buffer memory usage to 2MiB/chunk * 32 chunks = 64 MiB
buffer_queue_limit 32
flush_interval 5s
# Never wait longer than 30 seconds between retries.
max_retry_wait 30
# Disable the limit on the number of retries (retry forever).
disable_retry_limit
# Use multiple threads for processing.
num_threads 8
</match>
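The buffer numbers in this match block are worth double-checking: the maximum memory the buffer can consume is buffer_chunk_limit multiplied by buffer_queue_limit, as the comments above state. A one-line sanity check:

```shell
# buffer_chunk_limit (2 MiB) x buffer_queue_limit (32 chunks) = max buffer memory
chunk_mib=2
queue_chunks=32
total="$((chunk_mib * queue_chunks)) MiB"
echo "$total"
```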
Note that Fluentd, Elasticsearch, and Kibana will be deployed as different containers so the fluentd configurations above will be on the fluentd container.
3. Installing Fluentd, Elasticsearch, and Kibana
To deploy these services, let's use Kubernetes manifest files that are already publicly available. We need to create a deployment and a service for each of the applications. You can find the manifest files cloned to this GitHub location. Only a few modifications were made to the YAML templates.
The Kubernetes installation was performed following Kubernetes with KOPS, one of the earlier blog tutorials. One master and one slave node were used, but you can use as many nodes as desired. Fluentd is deployed as a DaemonSet, so whenever an additional node is added, it joins the cluster and starts sending logs to Elasticsearch on the master node.
Step 1: Clone the repository on your master Kubernetes node and then create the deployments and service objects:
kubectl create -f elastic-search-rc.yaml
kubectl create -f elasticsearch-svc.yaml
kubectl create -f kibana-rc.yaml
kubectl create -f kibana-svc.yaml
Step 2: Create the fluentd daemonsets:
kubectl create -f fluentd-daemonset.yaml
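Because Fluentd runs as a DaemonSet, one Fluentd pod should be scheduled on every node. A quick way to confirm this (the DaemonSet name comes from the manifest above; adjust the grep pattern if yours differs):

```
# DESIRED and READY should both equal the number of nodes in the cluster
kubectl get daemonset -n kube-system

# One fluentd pod per node, spread across the node names
kubectl get pods -n kube-system -o wide | grep fluentd
```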
Step 3: Check that all the Kubernetes objects are properly deployed:
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
kube-system   elasticsearch-logging-h68v6             1/1     Unknown   0          59d
kube-system   elasticsearch-logging-mpdkv             1/1     Running   5          59d
kube-system   etcd-fluentdmaster                      1/1     Running   11         63d
kube-system   fluentd-es-1.24-2z7w5                   1/1     Running   5          59d
kube-system   kibana-logging-5874ff6996-5wqfg         1/1     Running   5          59d
kube-system   kube-apiserver-fluentdmaster            1/1     Running   11         63d
kube-system   kube-controller-manager-fluentdmaster   1/1     Running   11         63d
kube-system   kube-dns-6f4fd4bdf-655tv                3/3     Running   30         63d
kube-system   kube-proxy-4ff9h                        1/1     Running   10         63d
kube-system   kube-proxy-vclr9                        1/1     Running   6          59d
kube-system   kube-scheduler-fluentdmaster            1/1     Running   11         63d
kube-system   weave-net-6w9sd                         2/2     Running   19         59d
kube-system   weave-net-m24wm                         2/2     Running   3          6h

$ kubectl get svc --all-namespaces
NAMESPACE     NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       kubernetes              ClusterIP   10.96.0.1       <none>        443/TCP         63d
kube-system   elasticsearch-logging   ClusterIP   10.111.123.66   <none>        9200/TCP        63d
kube-system   kibana-logging          NodePort    10.96.204.66    <none>        80:30560/TCP    63d
kube-system   kube-dns                ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   63d
Step 4: Test Elasticsearch with basic query searches. If Elasticsearch is not working properly, Kibana will give errors when loaded in the browser. Note that the IP address is the service address of Elasticsearch, as can be seen in the kubectl get svc output above.
curl 10.111.123.66:9200/_search?q=*pretty
curl 10.111.123.66:9200/_search?q=*warning

The warning search will give a long output as follows:

igbedo@fluentdmaster:~/Kubernetes-efk-stack$ curl 10.111.123.66:9200/_search?q=*warning
{"took":20,"timed_out":false,"_shards":{"total":6,"successful":6,"failed":0},"hits":{"total":5,"max_score":1.0,"hits":[{"_index":"logstash-2018.06.12","_type":"fluentd","_id":"AWP1MbMPybVhNMr2IQmo","_score":1.0,"_source":{"type":"log","@timestamp":"2018-06-12T18:10:50Z","tags":["warning","elasticsearch","admin"],"pid":6,"message":"No living connections","log":"{\"type\":\"log\",\"@timestamp\":\"2018-06-12T18:10:50Z\",\"tags\":[\"warning\",\"elasticsearch\",\"admin\"],\"pid\":6,\"message\":\"No living connections\"}\n","stream":"stdout","docker":{"container_id":"ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15"},"kubernetes":{"container_name":"kibana-logging","namespace_name":"kube-system","pod_name":"kibana-logging-5874ff6996-5wqfg","pod_id":"d6402b2f-3f9b-11e8-902e-08002728f91c","labels":{"k8s-app":"kibana-logging","pod-template-hash":"1430992552"},"host":"fluentdslave1","master_url":"https://10.96.0.1:443/api"},"tag":"kubernetes.var.log.containers.kibana-logging-5874ff6996-5wqfg_kube-system_kibana-logging-ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15.log"}},{"_index":"logstash-2018.06.12","_type":"fluentd","_id":"AWP1Ma5tybVhNMr2IQkj","_score":1.0,"_source":{"log":"WARNING: Tini has been relocated to 
/sbin/tini.\n","stream":"stderr","docker":{"container_id":"ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15"},"kubernetes":{"container_name":"kibana-logging","namespace_name":"kube-system","pod_name":"kibana-logging-5874ff6996-5wqfg","pod_id":"d6402b2f-3f9b-11e8-902e-08002728f91c","labels":{"k8s-app":"kibana-logging","pod-template-hash":"1430992552"},"host":"fluentdslave1","master_url":"https://10.96.0.1:443/api"},"@timestamp":"2018-06-12T18:10:06+00:00","tag":"kubernetes.var.log.containers.kibana-logging-5874ff6996-5wqfg_kube-system_kibana-logging-ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15.log"}},{"_index":"logstash-2018.06.12","_type":"fluentd","_id":"AWP1MbMPybVhNMr2IQmu","_score":1.0,"_source":{"type":"log","@timestamp":"2018-06-12T18:10:52Z","tags":["warning","elasticsearch","admin"],"pid":6,"message":"No living connections","log":"{\"type\":\"log\",\"@timestamp\":\"2018-06-12T18:10:52Z\",\"tags\":[\"warning\",\"elasticsearch\",\"admin\"],\"pid\":6,\"message\":\"No living connections\"}\n","stream":"stdout","docker":{"container_id":"ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15"},"kubernetes":{"container_name":"kibana-logging","namespace_name":"kube-system","pod_name":"kibana-logging-5874ff6996-5wqfg","pod_id":"d6402b2f-3f9b-11e8-902e-08002728
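Besides searching, you can ask Elasticsearch directly which indices Fluentd has created; with logstash_format enabled you should see one logstash-YYYY.MM.DD index per day. (The service IP below is the one from this cluster; substitute your own.)

```
# List the indices Fluentd has written to
curl 10.111.123.66:9200/_cat/indices?v

# Overall cluster health (green/yellow/red)
curl 10.111.123.66:9200/_cluster/health?pretty
```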
You can open a shell in any of the containers if you need to troubleshoot a service:
$ kubectl exec -it fluentd-es-1.24-2z7w5 --namespace=kube-system -- /bin/bash
Step 5: If all goes well, put the IP of the Kibana service (obtained with kubectl get svc --all-namespaces) in your URL bar and you will see the Kibana dashboard.
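Since the Kibana service in this deployment is of type NodePort (80:30560/TCP in the service listing above), the dashboard is also reachable on any node's IP at that port, or you can forward the service port to your workstation. Both forms below are sketches; substitute your own node IP:

```
# NodePort access: any node IP, port 30560
curl http://<node-ip>:30560/

# Or forward the service locally and browse http://localhost:5601
kubectl port-forward svc/kibana-logging 5601:80 -n kube-system
```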
4. Conclusion
This tutorial discussed how to deploy Fluentd, Kibana, and Elasticsearch on a Kubernetes cluster. By following it, you'll have a fully functional Kubernetes cluster together with logging. Fluentd is very important and is almost becoming the standard for logging in modern architectures, replacing Syslog. If you like the tutorials, do subscribe to our blog and YouTube channel for more coming your way.