added language IAL for syntax highlighting

dutzu 2016-02-04 12:38:41 +02:00
parent 02f45f662f
commit 91628ac3bb
1 changed file with 14 additions and 13 deletions


@@ -83,11 +83,12 @@ A message sent to Elasticsearch from fluentd would contain these values:
*-this isn't the exact message, this is the result of the stdout output plugin-*
~~~
2015-11-12 06:34:01 -0800 tag.common: {"message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO ==== Request ===","time_as_string":"2015-11-12 06:34:01 -0800"}
2015-11-12 06:34:01 -0800 tag.common: {"message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO GET /monitor/broker/ HTTP/1.1\n","time_as_string":"2015-11-12 06:34:01 -0800"}
~~~
{: .language-java}
I added the `time_as_string` field in there just so you can see the literal string that is sent as the time value.
@@ -99,7 +100,7 @@ In order to build it yourself you only need the `record_transformer` filter that
Next you need to parse the timestamp of your logs into separate date, time and millisecond components (which is essentially what the better-timestamp plugin asks you to do, to some extent), and then create a filter that matches all the messages you will send to Elasticsearch and builds the `@timestamp` value by appending the three components. This makes use of the fact that fluentd also allows you to run Ruby code within your record_transformer filters to accommodate more specialized log manipulation tasks.
~~~
<filter tag.**>
type record_transformer
enable_ruby true
@@ -108,15 +109,15 @@ Next you need to parse the timestamp of your logs into separate date, time and m
</record>
</filter>
~~~
{: .language-xml}
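The hunk above elides the `<record>` body. A minimal sketch of what it might contain, assuming the parsed components are stored as `date_string`, `time_string` and `msec` (the field names are taken from the sample output below; the exact Ruby expression is an assumption, and older fluentd versions may let you reference the fields directly rather than via `record[...]`):

~~~
<filter tag.**>
  type record_transformer
  enable_ruby true
  <record>
    # assemble an ISO-8601 style timestamp from the three assumed components
    @timestamp ${record["date_string"] + "T" + record["time_string"] + "." + record["msec"] + "Z"}
  </record>
</filter>
~~~
{: .language-xml}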
The result is that the above sample will come out like this:
~~~
2015-12-12 05:26:15 -0800 akai.common: {"date_string":"2015-11-12","time_string":"06:34:01","msec":"471","message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO ==== Request ===","@timestamp":"2015-11-12T06:34:01.471Z"}
2015-12-12 05:26:15 -0800 akai.common: {"date_string":"2015-11-12","time_string":"06:34:01","msec":"473","message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO GET /monitor/broker/ HTTP/1.1\n","@timestamp":"2015-11-12T06:34:01.473Z"}
~~~
{: .language-java}
*__Note__: you can use the same record_transformer filter to remove the 3 separate time components after creating the `@timestamp` field via the `remove_keys` option.*
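A hedged illustration of that cleanup, shown as a separate filter for brevity (in practice the `remove_keys` line can simply be added to the filter above; the field names are the ones assumed in the sketch):

~~~
<filter tag.**>
  type record_transformer
  # drop the helper fields once @timestamp has been built
  remove_keys date_string,time_string,msec
</filter>
~~~
{: .language-xml}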
### Do not analyse
@@ -137,7 +138,7 @@ For instance, by using the record_transformer I would send the hostname and also
Using this example configuration I tried to create a pie chart showing the number of messages per project for a dashboard. Here is what I got.
~~~
<filter tag.**>
type record_transformer
enable_ruby true
@@ -147,14 +148,14 @@ Using this example configuration I tried to create a pie chart showing the numbe
</record>
</filter>
~~~
{: .language-xml}
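Here too the `<record>` body is hidden by the hunk; a rough sketch of what it could look like, based on the `sourceProject` value visible in the sample output below and the hostname mentioned above (both the static value and the `${hostname}` placeholder are illustrative assumptions):

~~~
<filter tag.**>
  type record_transformer
  enable_ruby true
  <record>
    # static project identifier, plus the host the log came from
    sourceProject Test-Analyzed-Field
    hostname ${hostname}
  </record>
</filter>
~~~
{: .language-xml}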
Sample output from stdout:
~~~
2015-12-12 06:01:35 -0800 clear: {"date_string":"2015-10-15","time_string":"06:37:32","msec":"415","message":"[amelJettyClient(0xdc64419)-706] jetty:test/test INFO totallyAnonymousContent: http://whyAreYouReadingThis?:)/history/3374425?limit=1","@timestamp":"2015-10-15T06:37:32.415Z","sourceProject":"Test-Analyzed-Field"}
~~~
{: .language-java}
And here is the result of trying to use it in a visualization:
{:.center}
@@ -170,17 +171,17 @@ And the solution is: When Elasticsearch creates a new index, it will rely on the
And what you basically need to do is a curl PUT of that JSON content to Elasticsearch; then all indices created with the `logstash-*` prefix will use that template. Be aware that with fluent-plugin-elasticsearch you can specify your own index prefix, so make sure to adjust the template to match your prefix:
~~~
curl -XPUT localhost:9200/_template/template_doru -d '{
"template" : "logstash-*",
"settings" : {....
}'
~~~
{: .language-bash}
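For instance, if the fluentd output is configured with a custom prefix (a hypothetical `mylogs` below, via the plugin's `logstash_prefix` option), the template's `"template"` field would have to become `mylogs-*` instead of `logstash-*`:

~~~
<match tag.**>
  type elasticsearch
  host localhost
  port 9200
  logstash_format true
  # indices will be created as mylogs-YYYY.MM.DD, so the ES template must match "mylogs-*"
  logstash_prefix mylogs
</match>
~~~
{: .language-xml}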
The main thing to note in the whole template is this section:
~~~
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
@@ -193,7 +194,7 @@ The main thing to note in the whole template is this section:
}
}
~~~
{: .language-json}
This tells Elasticsearch that for any string field it receives, it should create a mapping of type string that is analyzed, plus an additional field with a `.raw` suffix that will not be analyzed.
The `not_analyzed` `.raw` field is the one you can safely use in visualizations, but do keep in mind that this creates the scenario mentioned before, where storage requirements can inflate by up to 40% because both the analyzed and not_analyzed versions of each field are kept in store.