diff --git a/_posts/2016-01-18-Fluentd-log-parsing b/_posts/2016-01-18-Fluentd-log-parsing
new file mode 100644
index 0000000..d3c7428
--- /dev/null
+++ b/_posts/2016-01-18-Fluentd-log-parsing
@@ -0,0 +1,220 @@
+---
+title: Fluentd log parsing
+subtite: Approaches to log parsing
+description: Description of a couple of approaches to designing your fluentd configuration.
+category: howto
+tags: [fluentd, logs]
+author: Doru Mihai
+author_email: doru.mihai@haufe-lexware.com
+---
+
+# Approaches to log parsing
+
+As you deploy your log shippers to more and more systems, you will need to adapt your solution to parse whatever log format and source each system uses. Luckily, fluentd has a lot of plugins, so you can approach the problem of parsing a log file in different ways.
+
+
+The main reason to parse a log file, rather than just pass along its contents, is that multi-line log messages should be transferred as a single element and not split up into an incoherent sequence.
+
+
+Another reason is that a log file may contain multiple log formats that you want to parse into a common data structure for easy processing.
+Last but not least, you may have multiple log sources (perhaps each using a different technology) whose information you want to parse and aggregate into a common data structure for coherent analysis and visualization of the data.
+
+Below I will enumerate a couple of strategies that can be applied for parsing logs.
+
+## One Regex to rule them all
+The simplest approach is to parse all messages using only their common denominator. This treats each message as a black box and defers any further parsing to a later time or to another component further downstream.
+
+
+For a typical log file, the configuration can look something like this (other setups are possible):
+
+ ~~~xml
+
+ type tail
+ path /var/log/test.log
+ read_from_head true
+ tag test.unprocessed
+ format multiline
+ format_firstline /\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}/
+ # we go with the most generic pattern: we know a message will have
+ # a timestamp in front of it; the rest is just stored in the field 'message'
+ format1 /(?