The ELKBeats Stack: L is for Logstash

Read the first item in this Table of Contents if you haven't been here before.

Logstash is the information gatherer in the ELK stack. You point it at logs and other data sources via its complex and unpleasant configuration, and it converts those sources to JSON and feeds them to Elasticsearch.

Installing Logstash

You've already done the prerequisites, right? Then:

# apt-get update
...
# apt-get install logstash

Logstash is a Ruby product, and the package is huge: 75M compressed. This appears to be because it doesn't bother to use the system Ruby, but instead brings with it a huge portion of JRuby 1.9, which you'll find installed under /opt/logstash/vendor/bundle/jruby/. Undoubtedly this makes the package more self-sufficient, but good lord it's huge.
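
If you're curious how much of that bulk is the bundled JRuby, a quick du on that folder will tell you (purely informational, nothing to configure here):

# du -sh /opt/logstash/vendor/bundle/jruby/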

Configure logstash by creating a basic config /etc/logstash/conf.d/apache.conf:

input {
    file {
        path => '/var/log/apache2/access.log'
    }
}

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}

output {
    stdout { codec => rubydebug }
    #elasticsearch { }
}

The logstash installation creates the folders /etc/logstash/ and /etc/logstash/conf.d/ but puts nothing at all in them. All we're doing here is setting up logstash to look at the local Apache log - this will only work if the Apache log is in its default location, so change the path as appropriate to your situation. "grok" tells logstash how to interpret the incoming data, in this case via a packaged regex named COMBINEDAPACHELOG. Again, if you have a non-standard Apache log format, this may need to be changed. The last section tells Logstash what to do with the output: for the moment we're just dumping it to stdout; later we'll send it to Elasticsearch. It's helpful to know we can do both at once.
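
For later reference, doing both at once is just a matter of listing both plugins in the output block. A sketch, assuming Elasticsearch ends up listening on localhost:9200 (adjust hosts for your setup) - don't enable this until Elasticsearch is actually installed and running:

output {
    stdout { codec => rubydebug }
    elasticsearch { hosts => ["localhost:9200"] }
}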

Aside: About the word "grok." As a long-time science fiction fan, I feel compelled to point out that it originated in the novel Stranger in a Strange Land by Robert Heinlein. I'm going to say, for the purposes of working on the ELK Stack, it means "to deeply understand." Heinlein had all kinds of other ideas about that: "grok" has proven so popular and long-lasting that it even has its own Wikipedia entry.

The configuration can of course be much more complex: you can take input from multiple files of various types, with each type requiring different filtering. This is where the conf.d/ folder setup will come in handy, grouping input, filter, and output for a given filetype together (though I have NOT tested that theory yet).
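
As a sketch of what that per-type grouping might look like (untested here, same caveat): tag each input with a type, then wrap the filters in conditionals on that type. The syslog bits below are assumptions for illustration - the stock SYSLOGLINE grok pattern and the default /var/log/syslog location.

input {
    file {
        path => '/var/log/apache2/access.log'
        type => 'apache-access'
    }
    file {
        path => '/var/log/syslog'
        type => 'syslog'
    }
}

filter {
    if [type] == "apache-access" {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
    }
    if [type] == "syslog" {
        grok {
            match => { "message" => "%{SYSLOGLINE}" }
        }
    }
}

For now, though, we'll stick with the single apache.conf above.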

To test:

# /opt/logstash/bin/logstash -f /etc/logstash/conf.d

Wait for "Logstash startup completed" (takes a few seconds), then point a browser at your local Apache and reload the page a couple times (you could also use curl or wget). This generates new content in the Apache log. You should see logstash analyzing the lines from the log - output will be of the form:

{
        "message" => "192.168.101.203 - - [01/Mar/2016:16:34:17 -0500] \"GET /icons/openlogo-75.png HTTP/1.1\" 304 181 \"http://192.168.101.137/\" \"Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0 Iceweasel/44.0.2\"",
       "@version" => "1",
     "@timestamp" => "2016-03-01T21:34:18.258Z",
           "path" => "/var/log/apache2/access.log",
           "host" => "elktest",
       "clientip" => "192.168.101.203",
          "ident" => "-",
           "auth" => "-",
      "timestamp" => "01/Mar/2016:16:34:17 -0500",
           "verb" => "GET",
        "request" => "/icons/openlogo-75.png",
    "httpversion" => "1.1",
       "response" => "304",
          "bytes" => "181",
       "referrer" => "\"http://192.168.168.137/\"",
          "agent" => "\"Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0 Iceweasel/44.0.2\""
}

This shows that logstash is successfully parsing the Apache logs (it's managed to identify and separate all the various parts). Kill it with Ctrl-C, then start it as a service:

# systemctl start logstash

The source I'm using recommends adding user "logstash" to group "adm" so that it can read the Apache logs. I was hesitant to do this, since the whole point of running logstash as an unprivileged user is that it's ... well, unprivileged, and adding it to "adm" changes that significantly. But you need to find some way to make the Apache log readable to the logstash user (which it is NOT by default). In the end, I added "logstash" to "adm".
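
For the record, the group change itself is one command, plus a check that it took (substitute whatever approach you settle on for your own logs):

# usermod -aG adm logstash
# id logstash

The running service won't pick up the new group membership until it's restarted (systemctl restart logstash).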

With that setting done, you can now run tail -f /var/log/logstash/logstash.stdout, hit Apache, and see logstash.stdout change on the fly.

This leaves us with a running logstash, but it won't start automatically after a reboot. Run this:

# systemctl enable logstash

It should now be running every time the machine boots.
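
If you want to confirm both states - enabled at boot and currently running - systemctl will report them (purely a verification, nothing changes here):

# systemctl is-enabled logstash
# systemctl status logstash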

The executable is /opt/logstash/bin/logstash. It logs to the /var/log/logstash/ folder. One common message to NOT be alarmed about is:

{:timestamp=>"2016-03-03T11:55:16.660000-0500", :message=>"SIGTERM received. Shutting down the pipeline.", :level=>:warn}

This is apparently the standard shutdown message (of course if you didn't shut it down or reboot your system, you shouldn't be seeing this).

Another common message is

{:timestamp=>"2016-03-02T15:14:33.558000-0500", :message=>"You may be interested in the '--configtest' flag which you can\nuse to validate logstash's configuration before you choose\nto restart a running system."}

Sounds like a good suggestion, but how to use that isn't obvious:

# /opt/logstash/bin/logstash --configtest --config /etc/logstash/conf.d/apache.conf

I hope you'll see the message "Configuration OK" - but you'll have to wait a while. On my reasonably equipped machine, this simple test is taking a rather staggering 10-15 seconds. Error messages aren't great, but do offer some guidance.

A parting note about logstash. As you start developing more complex input, filter, and output stanzas, you'll find it easiest to separate them into separate files in the /etc/logstash/conf.d/ folder. Order is important: when logstash reads these files in, it does so in the order the OS reads them, but it doesn't know or care about that ... it only knows that inputs should come before filters, which should come before outputs. Number them so they sort properly and logstash will be happy:

10-beats-input.conf
21-syslog-filter.conf
22-nginx-filter.conf
30-elasticsearch-output.conf
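
Once the configuration is split up like this, the --configtest trick from above should work against the whole folder, checking the files in the same combined form logstash will actually use (expect the same long wait):

# /opt/logstash/bin/logstash --configtest --config /etc/logstash/conf.d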

Continue to The ELKBeats Stack: E is for Elasticsearch, the next article in this series.