Tuesday, March 28, 2017

Kubernetes log collection with a custom ELK stack

logstash / k8s.conf: 
input {
  file {
    path => "/var/log/containers/*.log"
    start_position => "beginning"
  }
}

filter {
  kubernetes {}
  mutate { remove_field => "path" }
  json { source => "message" }
  # Drop the raw line only when it parsed as JSON. mutate is a filter
  # plugin, so this conditional has to live here, not in the output block.
  if "_jsonparsefailure" not in [tags] {
    mutate { remove_field => "message" }
  }
}

output {
  stdout {  # for testing
    codec => json
  }
  elasticsearch {
    hosts => ["elk-e0:9200", "elk-e1:9200", "elk-e2:9200"]
    index => "kube-%{+YYYY.MM.dd}"
  }
}

Installation:
  1. gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
  2. \curl -sSL https://get.rvm.io | bash -s stable 
  3. rvm install jruby 1.7
  4. dpkg -i logstash-5.2.2.deb
  5. /usr/share/logstash/bin/logstash-plugin install logstash-filter-kubernetes-0.3.1.gem
  6. /usr/share/logstash/bin/logstash -f k8s.conf

Testing:
  1. mkdir -p /var/log/containers/
  2. echo '{"log":"\n","stream":"stdout","time":"2017-03-13T09:28:04.20730347Z"}' >> /var/log/containers/redis-master_default_master-ce24440abd65b3702d7dc0588a2a1e099bc41e6b7833456774bb4845d7958429.log 
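
The test file name above follows the <pod>_<namespace>_<container name>-<container id>.log pattern of the symlinks in /var/log/containers/, which is what a metadata filter has to take apart. Below is a minimal POSIX shell sketch of that split, assuming the naming pattern holds. The function name parse_container_log_name is made up for illustration and is not part of logstash-filter-kubernetes.

```shell
#!/bin/sh
# Split a /var/log/containers/ file name into its metadata fields.
# Assumed pattern: <pod>_<namespace>_<container name>-<container id>.log
# (hypothetical helper, runs in a subshell so no variables leak out)
parse_container_log_name() (
  base=${1##*/}                    # drop the directory part
  base=${base%.log}                # drop the .log suffix
  pod=${base%%_*}                  # text before the first underscore
  rest=${base#*_}                  # everything after the first underscore
  namespace=${rest%%_*}            # text before the next underscore
  container=${rest#*_}             # "<container name>-<container id>"
  container_id=${container##*-}    # text after the last dash
  container_name=${container%-*}   # text before the last dash
  printf 'pod=%s namespace=%s container_name=%s container_id=%s\n' \
    "$pod" "$namespace" "$container_name" "$container_id"
)

parse_container_log_name \
  /var/log/containers/redis-master_default_master-ce24440abd65b3702d7dc0588a2a1e099bc41e6b7833456774bb4845d7958429.log
```

For the test file this prints pod=redis-master, namespace=default, container_name=master and the 64-character container id. The split takes the id after the last dash because container names can contain dashes themselves.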

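Rationale:
  1. The kubelet maintains the symlinks in /var/log/containers/.
  2. output.elasticsearch.hosts: If given an array it will load balance requests across the hosts specified in the hosts parameter.
  3. k8s supports fluentd out of the box, so there has to be a way to collect the logs and parse out the tags. The built-in option has these properties:
    1. It does not support a custom ES service, only the bundled one, which does not meet our requirements.
    2. The ES service runs as a container in k8s, as a cluster service.
    3. fluentd is installed automatically on every node, with automatic configuration and discovery (k8s/service: elasticsearch-logging).
    4. The relevant code lives in kubernetes/cluster/addons/fluentd-elasticsearch/.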

References:
  1. https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana/
  2. https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb
  3. https://github.com/vaijab/logstash-filter-kubernetes
  4. https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-hosts
  5. https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/fluentd-elasticsearch/fluentd-es-image/td-agent.conf
  6. http://www.tuicool.com/articles/jEBBZbb



schema:
PUT _template/template_kube
{
  "template": "kube-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "_default_": {
      "properties": {
        "host": {
          "type": "keyword"
        },
        "kubernetes.container_id": {
          "type": "keyword"
        },
        "kubernetes.container_name": {
          "type": "keyword"
        },
        "kubernetes.namespace": {
          "type": "keyword"
        },
        "kubernetes.pod": {
          "type": "keyword"
        },
        "kubernetes.replication_controller": {
          "type": "keyword"
        },
        "log": {
          "type": "text",
          "analyzer": "english"
        },
        "message": {
          "type": "text"
        },
        "stream": {
          "type": "keyword"
        },
        "time": {
          "type": "date"
        },
        "tags": {
          "type": "keyword"
        }
      }
    }
  }
}

Rationale:
  1. Indexes imported from 2.x only support string and not text or keyword.
  2. For the legacy mapping type string the index option only accepts legacy values analyzed (default, treat as full-text field), not_analyzed (treat as keyword field) and no.

refs:
  1. https://www.elastic.co/guide/en/elasticsearch/reference/current/string.html
  2. https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html

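Future work
  1. Possible deeper log analysis: [ https://logz.io/learn/docker-monitoring-elk-stack/ ]
  2. The filter logic is still simpler than the fluentd k8s plugin; if more metadata is needed, that code could be ported.
  3. Log collection for the k8s cluster itself has not been done yet.
  4. Letting users define their own Elasticsearch schema would be very valuable, but it makes billing and resource control harder.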



Reclaiming log storage

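  1. Deleting individual log events puts a heavy load on the server.
  2. Put the date in the index name instead, "kube-%{+YYYY.MM.dd}", and drop whole indices.
  3. crontab
    1. 25 2 * * * curl -XDELETE "localhost:9200/kube-`date --date='30 days ago' +\%Y.\%m.\%d`"
    2. In a crontab, % must be escaped as \%, and the date format must use dots to match the index name.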
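
The cron entry only works if the generated date string matches the index name produced by "kube-%{+YYYY.MM.dd}", which Logstash derives from the event timestamp in UTC. Below is a small sketch that builds the expired index name, assuming GNU date (for --date). expired_index and its two optional arguments are hypothetical and not part of Elasticsearch or Logstash.

```shell
#!/bin/sh
# Print the name of the daily index that is N days older than a reference day.
# Usage: expired_index [days] [reference date]   (hypothetical helper)
# Both arguments are optional: 30 days and "now" are the defaults.
expired_index() (
  days=${1:-30}
  ref=${2:-}
  # -u keeps the calculation in UTC, the clock the index names are based on.
  # The format uses dots to match "kube-%{+YYYY.MM.dd}"; +%F would print dashes.
  date -u --date="$ref $days days ago" +'kube-%Y.%m.%d'
)

expired_index 30 2017-03-28   # prints kube-2017.02.26

# In a cleanup script the result would feed the delete call, for example:
#   curl -XDELETE "localhost:9200/$(expired_index 30)"
```

Putting the helper in a script file and calling that script from cron also avoids the % escaping that an inline date format needs in a crontab line.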

refs
  1. http://orchome.com/477
  2. https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
  3. https://my.oschina.net/ylchou/blog/507075
  4. http://stackoverflow.com/questions/15374752/get-yesterdays-date-in-bash-on-linux-dst-safe/29081965#29081965
