Analyzing nginx access logs with ELK

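Use rsyslog + ELK to analyze nginx access logs.

ELK is the log analysis stack made up of Elasticsearch, Logstash, and Kibana.

To minimize changes to the existing system, rsyslog reads the nginx access_log on the application server and forwards it over the network to Logstash on the ELK log analysis server. The forwarding rule in rsyslog.conf below uses @@, which means TCP; a single @ would send over UDP, and the Logstash input listens on both. Logstash parses each log line and writes it to Elasticsearch for storage and indexing, and Kibana then queries the data in Elasticsearch for analysis.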

nginx.conf

log_format main '$request_time $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"';

Sample nginx log line

0.083 111.112.113.119 - - [25/Jul/2015:17:39:12 +0800] "POST /m/write?&title=xxx&img= HTTP/1.1" 200 83 "-" "Dalvik/1.6.0 (Linux; U; Android 4.4.4; MI 4LTE MIUI/V6.5.3.0.KXGCNCD)" "-"
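
To make the field layout of the main log_format concrete, here is a minimal Python sketch that splits such a line into named fields. The regular expression and the function name parse_access_line are illustrative assumptions, not part of nginx or Logstash; the sketch also assumes the quoted fields contain no raw double quotes.

```python
import re

# Regex mirroring the `main` log_format shown in nginx.conf.
# Group names follow the nginx variable names; the regex itself is an assumption.
LINE_RE = re.compile(
    r'(?P<request_time>\d+\.\d+) '
    r'(?P<remote_addr>\S+) - '
    r'(?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) '
    r'(?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" '
    r'"(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the numeric fields so they can be aggregated later.
    fields["request_time"] = float(fields["request_time"])
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields


SAMPLE = (
    '0.083 111.112.113.119 - - [25/Jul/2015:17:39:12 +0800] '
    '"POST /m/write?&title=xxx&img= HTTP/1.1" 200 83 "-" '
    '"Dalvik/1.6.0 (Linux; U; Android 4.4.4; MI 4LTE MIUI/V6.5.3.0.KXGCNCD)" "-"'
)

if __name__ == "__main__":
    print(parse_access_line(SAMPLE))
```

Putting $request_time first in the format is what lets the parser read the response time as the first token of every line.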

/etc/rsyslog.conf

$ModLoad imuxsock
$ModLoad imklog
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IncludeConfig /etc/rsyslog.d/*.conf
*.info;mail.none;authpriv.none;cron.none /var/log/messages
authpriv.* /var/log/secure
mail.* -/var/log/maillog
cron.* /var/log/cron
*.emerg *
uucp,news.crit /var/log/spooler
local7.* /var/log/boot.log
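# Read the nginx access log with the imfile module
$ModLoad imfile
$InputFileName /var/log/nginx/access.log
$InputFileTag nginx-access:
$InputFileStateFile state-nginx-accesslog
$InputRunFileMonitor
# Poll the file for new lines every 10 seconds
$InputFilePollInterval 10
# Forward to Logstash: @@ means TCP, a single @ means UDP
if $programname == 'nginx-access' then @@YOUR_ELK_IP:514
# Stop processing these lines here (rules listed earlier in this file have already run)
if $programname == 'nginx-access' then ~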

The message that Logstash finally receives looks like this.
It includes some extra information added by rsyslog.

<133>Jul 25 10:37:45 APPSERVER_NAME nginx-access: 0.455 111.112.113.119 - - [25/Jul/2015:10:37:43 +0800] "POST /m/write?&title=xxx&create=1 HTTP/1.1" 200 121 "-" "Dalvik/v3.3.86_update2 (Linux; U; Android 4.4.4; m1 Build/KTU84P)" "-"
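
The leading <133> is the syslog priority value, which encodes facility * 8 + severity. A minimal sketch of decoding it follows; the function names decode_pri and describe_pri are hypothetical helpers for illustration only.

```python
import re

# Severity names in numeric order (0 = emerg ... 7 = debug), per the syslog protocol.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(message):
    """Split the leading <PRI> of a syslog line into (facility, severity) numbers."""
    m = re.match(r"<(\d{1,3})>", message)
    if m is None:
        raise ValueError("no <PRI> prefix")
    pri = int(m.group(1))
    return pri // 8, pri % 8


def describe_pri(message):
    """Return a readable 'facility.severity' string; only local0..local7 are named."""
    facility, severity = decode_pri(message)
    if 16 <= facility <= 23:
        facility_name = "local%d" % (facility - 16)
    else:
        facility_name = str(facility)
    return "%s.%s" % (facility_name, SEVERITIES[severity])


if __name__ == "__main__":
    print(describe_pri("<133>Jul 25 10:37:45 host nginx-access: ..."))
```

For 133 this gives facility 16 (local0) and severity 5 (notice), which are the values imfile assigns when the rsyslog configuration sets no facility or severity for the input file.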

Below is the final Logstash configuration, including the filter.
logstash.conf

input {
  # Listen on both TCP and UDP so either rsyslog forwarding mode works
  tcp {
    port => 514
    type => syslog
  }
  udp {
    port => 514
    type => syslog
  }
}
filter {
  if [type] == "syslog" {
    # Parse the syslog header added by rsyslog, then the nginx "main" log_format
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{NUMBER:request_time} %{IPORHOST:client_ip} \- (%{WORD:remote_user}|-) \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:http_verb} %{NOTSPACE:http_request}(?: HTTP/%{NUMBER:http_version})?)\" %{NUMBER:http_status_code} (?:%{NUMBER:bytes_read}|-) %{QS:referrer} %{QS:agent} \"(%{IP:x_forwarder_for}|-)\"" }
    }
    # Split the query string into arg_* fields, skipping the large text parameters
    kv {
      source => "http_request"
      prefix => "arg_"
      field_split => "&?"
      value_split => "="
      exclude_keys => [ "title", "content" ]
    }
    urldecode {
      field => "http_request"
    }
    # Drop the raw line once it has been parsed
    mutate {
      remove_field => [ "message" ]
    }
    geoip {
      source => "client_ip"
    }
    syslog_pri { }
    # Use the nginx request time as the event timestamp
    date {
      match => [ "timestamp", "d/MMM/y:HH:mm:ss Z", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
output {
  # host and protocol are the Logstash 1.x option names; later versions use hosts instead
  elasticsearch {
    host => "elk-elasticsearch"
    protocol => "http"
  }
}
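
As a rough illustration of what the kv and urldecode steps do to http_request, here is a Python sketch. It only approximates the Logstash filters: the function names are hypothetical, and skipping tokens with an empty value is an assumption of this sketch, not documented behaviour.

```python
import re
from urllib.parse import unquote


def extract_args(http_request, prefix="arg_", exclude=("title", "content")):
    """Approximate the kv filter: split on & or ?, then on =, and prefix the keys."""
    fields = {}
    for token in re.split(r"[&?]", http_request):
        key, sep, value = token.partition("=")
        # Skip tokens without key=value (such as the path), empty values
        # (an assumption of this sketch), and the excluded keys.
        if not sep or not key or not value or key in exclude:
            continue
        fields[prefix + key] = value
    return fields


def decode_request(http_request):
    """Approximate the urldecode filter: turn %XX escapes back into characters."""
    return unquote(http_request)


if __name__ == "__main__":
    print(extract_args("/m/write?&title=xxx&create=1"))
    print(decode_request("/m/write?title=%E4%BD%A0%E5%A5%BD"))
```

Excluding title and content keeps large free-text parameters from becoming separate fields in Elasticsearch, while short parameters such as create remain queryable as arg_create.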

Update 2016.08.01

Since version 1.7.1, nginx can send error_log and access_log to syslog, including a remote syslog server.

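In that case the /etc/rsyslog.conf part above can be dropped; simply add the following line to the nginx configuration file. The last argument is the name of a log_format, so a format called default must be defined with a log_format directive (the format defined earlier in this post is named main).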

access_log syslog:server=syslog.yourdomain.com:514 default;
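
nginx sends these syslog messages as UDP datagrams. Before pointing it at Logstash, it can help to look at the raw datagrams with a throwaway listener. The sketch below is one way to do that in Python; the function names, the 5-second timeout, and the loopback self-test are all assumptions for illustration.

```python
import socket


def open_listener(host="127.0.0.1", port=0, timeout=5.0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # avoid blocking forever if nothing arrives
    sock.bind((host, port))
    return sock


def receive_one(sock, bufsize=65535):
    """Wait for a single datagram and return it as text."""
    data, _addr = sock.recvfrom(bufsize)
    return data.decode("utf-8", errors="replace")


def send_test_line(line, host, port, pri=133):
    """Send one syslog-style datagram, standing in for nginx during a local test."""
    payload = ("<%d>%s" % (pri, line)).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    listener = open_listener()
    port = listener.getsockname()[1]
    send_test_line("Jul 25 10:37:45 host nginx: 0.455 111.112.113.119 - -", "127.0.0.1", port)
    print(receive_one(listener))
    listener.close()
```

To inspect real traffic, bind the listener to an unprivileged port (binding to 514 normally requires root) and point the server= parameter of access_log at that port temporarily.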

Official CHANGES list

Official tutorial: Logging to syslog

Official documentation: ngx_http_log_module

Changes with nginx 1.7.1 27 May 2014
*) Feature: the "error_log" and "access_log" directives now support
logging to syslog.