Merge pull request #30 from Dutzu/master

Syntax highlighting fix
Holger Reinhardt 2016-02-04 12:55:46 +01:00
commit 86e8ba070f
2 changed files with 18 additions and 15 deletions

@ -73,10 +73,11 @@ This is a pain because if you want to properly visualize a set of log messages g
Let's take a look at what fluentd sends to Elasticsearch. Here is a sample log file with 2 log messages:
~~~java
~~~
2015-11-12 06:34:01,471 [ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO ==== Request ===
2015-11-12 06:34:01,473 [ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO GET /monitor/broker/ HTTP/1.1
~~~
{: .language-java}
A message sent to Elasticsearch from fluentd would contain these values:
@ -87,6 +88,7 @@ A message sent to Elasticsearch from fluentd would contain these values:
2015-11-12 06:34:01 -0800 tag.common: {"message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO GET /monitor/broker/ HTTP/1.1\n","time_as_string":"2015-11-12 06:34:01 -0800"}
~~~
{: .language-java}
I added the `time_as_string` field in there just so you can see the literal string that is sent as the time value.
@ -107,7 +109,7 @@ Next you need to parse the timestamp of your logs into separate date, time and m
</record>
</filter>
~~~
{: .language-xml}
The result is that the above sample will come out like this:
@ -115,7 +117,7 @@ The result is that the above sample will come out like this:
2015-12-12 05:26:15 -0800 akai.common: {"date_string":"2015-11-12","time_string":"06:34:01","msec":"471","message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO ==== Request ===","@timestamp":"2015-11-12T06:34:01.471Z"}
2015-12-12 05:26:15 -0800 akai.common: {"date_string":"2015-11-12","time_string":"06:34:01","msec":"473","message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO GET /monitor/broker/ HTTP/1.1\n","@timestamp":"2015-11-12T06:34:01.473Z"}
~~~
{: .language-java}
*__Note__: you can use the same record_transformer filter to remove the 3 separate time components after creating the `@timestamp` field via the `remove_keys` option.*
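As a rough illustration of what the filter computes, the Python sketch below rebuilds `@timestamp` from the three parsed components and then drops them, mirroring what `remove_keys` would do. The function name `build_timestamp` is hypothetical and the field names are taken from the sample output above; this is not fluentd code, only a model of the transformation.

```python
def build_timestamp(record):
    """Return a copy of record with @timestamp added and the three parts removed."""
    out = dict(record)
    # Join the parts into the ISO-8601-like shape seen in the sample output.
    out["@timestamp"] = "{}T{}.{}Z".format(
        out["date_string"], out["time_string"], out["msec"]
    )
    # Equivalent in spirit to remove_keys: drop the now redundant components.
    for key in ("date_string", "time_string", "msec"):
        del out[key]
    return out


record = {
    "date_string": "2015-11-12",
    "time_string": "06:34:01",
    "msec": "471",
    "message": "[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO ==== Request ===",
}
result = build_timestamp(record)
print(result["@timestamp"])
```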
### Do not analyse
@ -146,14 +148,14 @@ Using this example configuration I tried to create a pie chart showing the numbe
</record>
</filter>
~~~
{: .language-xml}
Sample output from stdout:
~~~
2015-12-12 06:01:35 -0800 clear: {"date_string":"2015-10-15","time_string":"06:37:32","msec":"415","message":"[amelJettyClient(0xdc64419)-706] jetty:test/test INFO totallyAnonymousContent: http://whyAreYouReadingThis?:)/history/3374425?limit=1","@timestamp":"2015-10-15T06:37:32.415Z","sourceProject":"Test-Analyzed-Field"}
~~~
{: .language-java}
And here is the result of trying to use it in a visualization:
{:.center}
@ -175,11 +177,11 @@ curl -XPUT localhost:9200/_template/template_doru -d '{
"settings" : {....
}'
~~~
{: .language-bash}
The main thing to note in the whole template is this section:
~~~ json
~~~
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
@ -192,7 +194,7 @@ The main thing to note in the whole template is this section:
}
}
~~~
{: .language-json}
This tells Elasticsearch that for every string field it receives it should create an analyzed mapping of type string, plus a second field with a `.raw` suffix that is not analyzed.
The `.raw` (`not_analyzed`) field is the one you can safely use in visualizations. Keep in mind that this creates the scenario mentioned before, where storage requirements can inflate by up to 40% because both the analyzed and the not_analyzed fields are stored.
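To see why the `.raw` field matters for a pie chart, here is a small Python sketch. `simple_analyze` is a hypothetical and deliberately crude stand-in for an analyzer (lowercase, then split on anything that is not a letter or digit); real Elasticsearch tokenization is more involved, so treat this only as an approximation of the effect. The value `Other-Project` is made up for the example.

```python
import re
from collections import Counter


def simple_analyze(value):
    """Crude analyzer stand-in: lowercase and split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


values = ["Test-Analyzed-Field", "Test-Analyzed-Field", "Other-Project"]

# Buckets a terms aggregation would roughly see on the analyzed field.
analyzed_buckets = Counter(t for v in values for t in simple_analyze(v))
# Buckets on the untouched value, as with the .raw field.
raw_buckets = Counter(values)

print(analyzed_buckets)
print(raw_buckets)
```

The analyzed buckets count fragments such as `test` and `field` separately, while the raw buckets keep each project name whole, which is what a visualization usually needs.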

@ -26,7 +26,7 @@ The simplest approach is to just parse all messages using the common denominator
In the case of a typical log file a configuration can be something like this (but not necessarily):
~~~ xml
~~~
<source>
type tail
path /var/log/test.log
@ -39,7 +39,7 @@ In the case of a typical log file a configuration can be something like this (bu
format1 /(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}) (?<message>(.|\s)*)/
</source>
~~~
{: .language-xml}
You will notice we still do a bit of parsing. The minimal level would be to just have a multiline format that splits the log contents into separate messages and then pushes the contents on.
We do not put everything into a single field with a greedy regex pattern because we want the pushed timestamp to show the time of the log event, not the time when the log shipper read the message. The rest of the message goes into a second field.
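The split into a `time` field and a `message` field can be tried outside fluentd. The sketch below applies the `format1` pattern in Python; the only change is that Python writes named groups as `(?P<name>...)`. The sample line is an assumption assembled for illustration.

```python
import re

# Same pattern as format1, with Python's (?P<name>...) named-group syntax.
LINE_PATTERN = re.compile(
    r"(?P<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}) (?P<message>(.|\s)*)"
)

# Illustrative log line (assumed, not read from a real file).
line = (
    "2015-10-15 08:21:04,716 "
    "[ ttt-grp-127.0.0.1-8119-test-11] LogInterceptor INFO HTTP/1.1 200 OK"
)

match = LINE_PATTERN.match(line)
fields = {"time": match.group("time"), "message": match.group("message")}
print(fields)
```

Only the timestamp is pulled out; everything after the first space following it stays in `message` for later processing.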
@ -90,6 +90,7 @@ An example of this is shown in the configuration below:
type stdout
</match>
~~~
{: .language-ruby}
This approach is useful when our logfile contains multiline log messages and the messages themselves have different content formats. Still, the important thing to note is that all log messages are prefixed by a standard timestamp; this is key to splitting the messages correctly.
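The role of the timestamp prefix can be sketched in a few lines of Python. This is a simplified model and not how fluentd is implemented: `split_messages` is a hypothetical helper, the sample lines are invented, and the last buffered message is flushed at the end of the input for convenience.

```python
import re

# A line starting with this timestamp shape begins a new message (assumed format).
FIRST_LINE = re.compile(r"\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}")


def split_messages(lines):
    """Group raw lines into messages; continuation lines join the previous one."""
    messages = []
    buffer = []
    for line in lines:
        # A new timestamped line closes whatever has been buffered so far.
        if FIRST_LINE.match(line) and buffer:
            messages.append("\n".join(buffer))
            buffer = []
        buffer.append(line)
    if buffer:
        messages.append("\n".join(buffer))
    return messages


lines = [
    "2015-10-15 08:21:04,716 [main] Worker ERROR something failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Worker.run(Worker.java:42)",
    "2015-10-15 08:21:05,001 [main] Worker INFO recovered",
]
messages = split_messages(lines)
print(len(messages))
```

Without a reliable prefix on the first line of every message, there would be no way to tell a continuation line from the start of the next message.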
@ -99,10 +100,10 @@ Fluentd will continue to read logfile lines and keep them in a buffer until a li
Looking at the example, all our log messages (single or multiline) will take the form:
~~~ json
~~~
{ "time":"2015-10-15 08:21:04,716", "message":"[ ttt-grp-127.0.0.1-8119-test-11] LogInterceptor INFO HTTP/1.1 200 OK" }
~~~
{: .language-json}
Because they are tagged with log.unprocessed, all the messages will be caught by the *rewrite_tag_filter* match tag, and it is at this point that we can pinpoint what type of content each message has and re-tag it for individual processing.
This module is key to the whole mechanism as the *rewrite_tag_filter* takes the role of a router. You can use this module to redirect messages to different processing modules or even outputs depending on the rules you define in it.
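The routing idea can be modelled in Python as follows. The rule list, the tag names `log.interceptor` and `log.other`, and the `retag` function are all hypothetical; the actual plugin is configured through fluentd rules, not through code like this.

```python
import re

# Hypothetical routing rules: (pattern on the message, new tag).
RULES = [
    (re.compile(r"LogInterceptor"), "log.interceptor"),
    (re.compile(r".*"), "log.other"),  # catch-all for everything else
]


def retag(record):
    """Return the new tag for a record based on its message contents."""
    for pattern, tag in RULES:
        if pattern.search(record["message"]):
            return tag
    return "log.unmatched"  # not reached while the catch-all rule is present


record = {
    "time": "2015-10-15 08:21:04,716",
    "message": "[ ttt-grp-127.0.0.1-8119-test-11] LogInterceptor INFO HTTP/1.1 200 OK",
}
print(retag(record))
```

Each new tag can then be matched by its own processing block or output, which is what makes the filter behave like a router.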
@ -159,7 +160,7 @@ An example of this approach can be seen below:
</pattern>
</source>
~~~
{: .language-ruby}
When choosing this path there are multiple issues you need to be aware of:
* Pattern matching is done sequentially: the first pattern that matches the message is used to parse it, and the message is then passed along
* You need to make sure the most specific patterns are higher in the list and the more generic ones lower
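The ordering issue in the points above can be demonstrated with a small Python sketch. Both patterns and the `parse` helper are hypothetical and exist only to show the effect of ordering; they are not taken from the configuration.

```python
import re

# A generic pattern that matches anything, and a more specific HTTP pattern.
GENERIC = ("generic", re.compile(r"(?P<message>.*)", re.DOTALL))
SPECIFIC = ("http", re.compile(r".*INFO (?P<method>GET|POST) (?P<path>\S+) HTTP/1\.1"))


def parse(message, patterns):
    """Return (pattern name, captured fields) for the first pattern that matches."""
    for name, pattern in patterns:
        match = pattern.match(message)
        if match:
            return name, match.groupdict()
    return None, {}


message = "[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO GET /monitor/broker/ HTTP/1.1"

good_order = parse(message, [SPECIFIC, GENERIC])
bad_order = parse(message, [GENERIC, SPECIFIC])
print(good_order[0], bad_order[0])
```

With the generic pattern first, the specific one is never tried, so the `method` and `path` fields are never extracted.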
@ -208,7 +209,7 @@ AKA_ARGO_LOG2 %{AKAIDATESTAMP2:time} %{WORD:argoComponent} *%{LOGLEVEL:logLevel}
AKA_ARGO_SOURCE (GC|CMS)
AKA_ARGO_GC \[%{AKA_ARGO_SOURCE:source} %{AKA_GREEDYMULTILINE:message}
~~~
{: .language-bash}
To use Grok you will need to install the *fluent-plugin-grok-parser* plugin. You can then use grok patterns with any of the regex-based techniques described previously: Multiline and Multi-format.