Export Kubernetes Logs to Azure Log Analytics with Fluent Bit

Every container you run in Kubernetes generates log data. No one has time to trawl through individual container logs looking for issues, so in production environments it is common to export these logs to an aggregator for automated analysis.

If you’re using Azure, then Log Analytics may be your log aggregator of choice, so you need a way to export your container logs into Log Analytics. If you are using AKS, you can deploy the Azure Monitor solution, which does this for you. However, if you are running your own cluster, or even using another cloud provider and still want to use Log Analytics, then it’s not quite so simple. This is where Fluent Bit can help.

Fluent Bit is a popular open source tool for processing and forwarding logs. It can take logs from a variety of systems (one of which is Kubernetes), process them, and forward them to multiple destinations. One of these supported destinations is Azure Log Analytics. Other platforms such as AWS Elasticsearch, Kafka and InfluxDB are also supported; you can see the full list here.

While Fluent Bit is not built exclusively for Kubernetes, it does have a native way to deploy and configure it on a Kubernetes cluster using a DaemonSet. While the documentation is pretty good, the example configurations all focus on Elasticsearch and Kafka, so I needed to make some tweaks to get it working with Log Analytics, which are detailed below. You can also find the amended files on my GitHub here.

At present Fluent Bit is Linux only, so it does not work with Windows containers running in Kubernetes.

Fluent Bit is a powerful tool and can do some very useful parsing and filtering of log data before it is exported to your log aggregator. As mentioned, if you are using AKS you can use the Azure Monitor solution, but if you want to process your log data before export, then you may want to consider Fluent Bit even on AKS.
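To give a flavour of what that processing can look like, here is a minimal sketch of a grep filter that would drop health-check noise before it is exported. The match pattern is hypothetical, and this filter is not part of the setup below:

[FILTER]
    Name    grep
    Match   kube.*
    Exclude log healthz

Any record whose log field matches the pattern is discarded before it ever reaches the output stage.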

Deployment

Prerequisites

Before we deploy the Fluent Bit DaemonSet, we need to deploy some prerequisite resources:

  • Namespace
  • Service Account
  • Role
  • Role Binding

The easiest way to do this is using the YAML files stored in Fluent Bit's GitHub repo, but before you run the commands below, make sure you have read these files and understood what they will do on your cluster.

To deploy the resources run the command below:

kubectl create namespace logging
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-service-account.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role-binding.yaml
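To check everything was created, you can query for the new resources. The object names below assume the upstream files' defaults at the time of writing (a service account named fluent-bit, plus a cluster-wide role and binding named fluent-bit-read); have a look at the files themselves if yours differ:

kubectl get serviceaccount fluent-bit -n logging
kubectl get clusterrole fluent-bit-read
kubectl get clusterrolebinding fluent-bit-read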

Config Map

We now need to deploy a config map containing the settings used by the Fluent Bit DaemonSet. This config map defines which of Fluent Bit's input and output plugins are used. In our case, we want to read the Kubernetes container logs as input and use the Azure output. The example on the Fluent Bit GitHub repo is for Elasticsearch, so I have updated it to work with Log Analytics.

The critical part for outputting to Log Analytics is the output plugin. The entry in the config map looks like this:

  output-azure.conf: |
    [OUTPUT]
        Name        azure
        Match       *
        Customer_ID ${FLUENT_AZURE_WORKSPACE_ID}
        Shared_Key  ${FLUENT_AZURE_WORKSPACE_KEY}

Here we specify that we want to use the Azure plugin, match all the data we receive, and provide the details needed to connect to the workspace:

  • Customer_ID - this is the Log Analytics workspace ID, the GUID found on the advanced settings page
  • Shared_Key - the workspace key, again found on the advanced settings page

In the entry above, we have configured these values to come from environment variables attached to the DaemonSet.
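If you would rather grab these values from the command line than the portal, the Azure CLI can return both. The resource group and workspace names below are placeholders for your own:

az monitor log-analytics workspace show \
  --resource-group my-rg --workspace-name my-workspace \
  --query customerId -o tsv
az monitor log-analytics workspace get-shared-keys \
  --resource-group my-rg --workspace-name my-workspace \
  --query primarySharedKey -o tsv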

In addition to the Azure configuration, the full configuration file also defines:

  • The tail input plugin, which reads the container log files on each node
  • The Azure Log Analytics output plugin
  • A Kubernetes filter, which enriches the log records with metadata about their source - the pod name, namespace and labels are added to each log entry
  • Parsers for common log formats - these parse the log data and give it better structure before export. If the application you are interested in does not have a parser, you might want to consider building your own.

The full config map looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-azure.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

  output-azure.conf: |
    [OUTPUT]
        Name        azure
        Match       *
        Customer_ID ${FLUENT_AZURE_WORKSPACE_ID}
        Shared_Key  ${FLUENT_AZURE_WORKSPACE_KEY}

  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
        # Command      |  Decoder | Field | Optional Action
        # =============|==================|=================
        Decode_Field_As   escaped    log

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S

Save this somewhere on your machine, then run the command below to create the config map.

kubectl create -f fluent-bit-configmap.yaml

Where “fluent-bit-configmap.yaml” is the path to the config map file.
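One useful feature of this configuration: because the Kubernetes filter sets K8S-Logging.Parser to On, a pod can suggest which parser should be applied to its logs using an annotation. For example, a hypothetical nginx pod could opt into the nginx parser defined above like this:

apiVersion: v1
kind: Pod
metadata:
  name: my-nginx
  annotations:
    fluentbit.io/parser: nginx
spec:
  containers:
  - name: nginx
    image: nginx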

DaemonSet

Now that we have our configuration set up, we need to create a DaemonSet, which deploys a Fluent Bit pod on every node to collect the required data. Again, the example on GitHub is for Elasticsearch, so we need to amend it slightly. The main change needed is to the environment variables passed into the pod, which now provide the workspace ID and key.

To keep my workspace key private, I have created a secret to store the ID and key. I defined the secret with the YAML file below; the values are base64 encoded versions of the ID and key. Note that the secret must be created in the same namespace as the DaemonSet, and Kubernetes object names must be lowercase, so the secret is named loganalytics:

apiVersion: v1
kind: Secret
metadata:
  name: loganalytics
  namespace: logging
type: Opaque
data:
  WorkspaceID: xxxxxxxxxxxxxx
  WorkspaceKey: xxxxxxxxxxxxxxxxx

I then create the secret by running:

kubectl apply -f ./secret.yaml
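Alternatively, kubectl can create the same secret and do the base64 encoding for you; the two values below are placeholders for your real ID and key:

kubectl create secret generic loganalytics -n logging \
  --from-literal=WorkspaceID=<workspace-id> \
  --from-literal=WorkspaceKey=<workspace-key>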

Now we can amend the DaemonSet YAML to use this secret:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.0.6
        imagePullPolicy: Always
        ports:
          - containerPort: 2020
        env:
        - name: FLUENT_AZURE_WORKSPACE_ID
          valueFrom:
            secretKeyRef:
              name: loganalytics
              key: WorkspaceID
        - name: FLUENT_AZURE_WORKSPACE_KEY
          valueFrom:
            secretKeyRef:
              name: loganalytics
              key: WorkspaceKey
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"

We then deploy this using the command below:

kubectl create -f fluent-bit-ds.yaml

Once that has run, if we run “kubectl get pods -n logging” on the cluster, we should see the DaemonSet pods created.
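If the pods fail to start, or start and then crash, the Fluent Bit container logs usually show why (an invalid workspace key, for example). Two commands I find useful here:

kubectl get daemonset fluent-bit -n logging
kubectl logs -n logging -l k8s-app=fluent-bit-logging --tail=20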


Log Data

Once Fluent Bit has been running for a few minutes, we should start to see data appear in Log Analytics. To check, open your workspace, go to Logs, and under the “Custom Logs” section you should see “fluentbit_CL”.
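The table name comes from the azure output plugin's Log_Type setting, which defaults to fluentbit; Log Analytics then appends the _CL suffix it uses for all custom logs. If you would prefer a different table name, you can set it in the output block, for example:

[OUTPUT]
    Name        azure
    Match       *
    Customer_ID ${FLUENT_AZURE_WORKSPACE_ID}
    Shared_Key  ${FLUENT_AZURE_WORKSPACE_KEY}
    Log_Type    kubelogs

This would create a kubelogs_CL table instead.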


If you select the view icon (the eye to the right), it will create the query below to get some sample data:

fluentbit_CL
| limit 50

Run this query, and you should get some records returned. If you expand a record, you will be able to see the log data, plus all the Kubernetes metadata. The log data itself is in the “log_s” field.


Now the data is in Log Analytics, you can write queries using the Kusto Query Language (KQL) to do whatever you need with the data.
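As a starting point, a query along these lines returns the most recent log lines from a single namespace. The column names are generated by Log Analytics flattening the Kubernetes metadata, so check the schema in your own workspace as they may differ slightly:

fluentbit_CL
| where kubernetes_namespace_name_s == "kube-system"
| project TimeGenerated, kubernetes_pod_name_s, log_s
| order by TimeGenerated desc
| take 50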

Takeaway

Fluent Bit is a powerful tool for collecting and processing log data and sending it where it needs to go. It is reasonably easy to set up and get data exporting to Log Analytics (or any other log aggregator), yet it has the potential to allow some very sophisticated filtering, parsing and processing if you need it.

If you're using AKS and simply want to get log data into Log Analytics, then the Azure Monitor solution is probably the easiest way to do this. However, if you are not using AKS, or if you are but need more powerful processing of your logs, then Fluent Bit might be a useful tool.