Have you ever been overwhelmed trying to understand all the different log messages of an application? In this post I’d like to share how to analyze application logs by combining different open source projects to produce an appealing dashboard that summarizes all that information.
We will use logparser, a GitHub repository with a collection of algorithms to parse logs; plotly, to create interactive plots; and pandas, to transform data.
For starters, we will briefly explain the dependencies and datasets used, and later we will show how to obtain a dashboard like the following.
Let us get started!
Dependencies
Logparser
The first and main dependency is the logparser repository, which consists of a collection of algorithms to parse logs. Furthermore, the repository includes articles explaining each algorithm in great detail.
One of the algorithms that works quite well is Spell, and we will use it throughout the post. Spell works well in most scenarios since it is fast, requires no pre-defined log templates or regular expressions, and parses logs in an online streaming fashion, so it can start parsing logs immediately.
The accuracy of Spell is also supported by multiple benchmarks collected in logparser, where it often ranks among the best of the options tested.
This algorithm processes a stream of logs by associating each incoming log with the element of a growing list of log templates that shares the longest common subsequence with it. If the incoming log is too different from all current log templates, a new log template is added to the list. The parameter tau controls how similar a log must be to an existing template to be grouped with it, and thus indirectly the number of log templates.
Remark: In this blog we added some small modifications to the code provided in the logparser repository, since it was developed with Windows machines in mind and behaves slightly differently on our machines running macOS.
Example
As a short example, we can see what happens as the following logs are processed using the Spell algorithm.
The ducks waddled in the park.
The ducks walked in the park.
The ducks flew in the air.
The cat chased birds in the park.
After the first two lines are read, they are associated with the same log template, as shown below. Note that the positions where the log lines differ are replaced by the wildcard <*> in the log template, and the corresponding tokens are considered parameters.
Log Template | Parameter List |
---|---|
The ducks <*> in the park. | ['waddled'] |
The ducks <*> in the park. | ['walked'] |
Then, we note that the third line still shares a significant subsequence with the previous log template, so it is grouped with the previous log messages. Finally, the last line does not share a subsequence longer than half of its length with the log template, so it originates a new log template. A simplified sketch of this matching logic follows the table below.
Log Template | Parameter List |
---|---|
The ducks <*> in the <*>. | ['waddled', 'park'] |
The ducks <*> in the <*>. | ['walked', 'park'] |
The ducks <*> in the <*>. | ['flew', 'air'] |
The cat chased birds in the park. | [] |
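To make the matching step concrete, here is a minimal, simplified sketch of the idea behind Spell, not the logparser implementation (which adds further optimizations for speed); the function names, whitespace tokenization, and tau=0.5 are illustrative choices here.

def lcs_table(a, b):
    """Dynamic-programming table for the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a)):
        for j in range(len(b)):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    return dp

def common_subseq(a, b):
    """One longest common subsequence of a and b, recovered by backtracking."""
    dp, i, j, out = lcs_table(a, b), len(a), len(b), []
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def parse_line(line, templates, tau=0.5):
    """Associate a line with an existing template, or start a new one."""
    tokens = line.split()
    for k, tpl in enumerate(templates):
        common = common_subseq(tokens, [tok for tok in tpl if tok != "<*>"])
        if len(common) > tau * len(tokens):
            # Merge: keep the shared tokens, wildcard every other position.
            merged, it = [], iter(common)
            nxt = next(it, None)
            for tok in tokens:
                if tok == nxt:
                    merged.append(tok)
                    nxt = next(it, None)
                else:
                    merged.append("<*>")
            templates[k] = merged
            return
    templates.append(tokens)

templates = []
for line in ["The ducks waddled in the park.",
             "The ducks walked in the park.",
             "The ducks flew in the air.",
             "The cat chased birds in the park."]:
    parse_line(line, templates)
print([" ".join(tpl) for tpl in templates])
# ['The ducks <*> in the <*>', 'The cat chased birds in the park.']

Running the sketch on the four example lines reproduces the two templates from the table above.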
Cray Data
We use log data shared by USENIX, specifically the log dataset corresponding to a Cray supercomputer. This file has close to 250K log lines.
Here is a sample of the logs.
[Tue Sep 9 20:18:34 2008]:
Boot Node Daemon starting at Tue Sep 9 20:18:34 2008.
[Tue Sep 9 20:18:34 2008]:Connecting to SMW '10.3.1.1'
[Tue Sep 9 20:18:34 2008]:done
[Tue Sep 9 20:18:34 2008]:topo class is RS_TOPO_CLASS_1
[Tue Sep 9 20:18:38 2008]:Event:Tue Sep 9 20:18:40 2008|ec_hsn_load_req Transfer a kernel image over HSN to a node|src:1b:3:s0||svc: svid=:9:s0:9:s0 nid=c0-0c0s0n3 bparams='' kpath='/tmp/boot/kernel.cpio.2.1.31HD-13426' flags=0x3
[Tue Sep 9 20:18:38 2008]:
***** HSN Booting 1 nodes at Tue Sep 9 20:18:38 2008
[Tue Sep 9 20:18:38 2008]:number of interfaces is 8
[Tue Sep 9 20:18:38 2008]:initialized portals event queue
[Tue Sep 9 20:18:41 2008]:loading index-file /tmp/boot_9lDixW/SNL0.load
[Tue Sep 9 20:18:41 2008]:Set comp mdh[0] to 0x0
[Tue Sep 9 20:18:41 2008]:Corresponding md is the following:
[Tue Sep 9 20:18:41 2008]:start is 0x2b20e175b010
As you can see, each log line starts with a timestamp and is followed by a message.
Dashboard
Putting together all the pieces mentioned above, we obtain the following dashboard. You can also produce the same plot using this notebook.
Aside from grouping similar logs, we can also see their timestamps and infer whether the log messages were sporadic or recurrent.
Parsing logs
After downloading the log files, the following code snippet ingests the log messages and produces two CSV files: one listing all the log templates, and another listing all log messages with their corresponding log templates and parameters.
import pandas as pd
from logparser.Spell import Spell

# Directory holding the raw log file; the parser output goes to the same place.
t = "data/0809181018"

# Each line looks like "[<timestamp>]:<message>"; tau is the similarity threshold.
parser = Spell.LogParser(log_format=r"\[<Timestamp>\]:<Content>", indir=t, outdir=t, tau=0.5)
parser.parse("bnd.log")
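Before moving on, it is worth peeking at what the parser wrote. Here is a small sketch, assuming logparser's usual convention of naming the outputs after the input file (bnd.log_templates.csv and bnd.log_structured.csv in our case):

import pandas as pd

# Inspect the parser output; the file names assume logparser's usual
# <logfile>_templates.csv / <logfile>_structured.csv naming convention.
templates = pd.read_csv("data/0809181018/bnd.log_templates.csv")
structured = pd.read_csv("data/0809181018/bnd.log_structured.csv")

print(f"{len(templates)} distinct log templates")
print(structured[["LineId", "Timestamp", "EventTemplate"]].head())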
As an example, here is part of the output file that includes the log messages.
LineId | Timestamp | Content | EventId | EventTemplate | ParameterList |
---|---|---|---|---|---|
11 | Tue Sep 9 20:18:41 2008 | Corresponding md is the following: | 1cbf7853 | Corresponding md is the following | [] |
12 | Tue Sep 9 20:18:41 2008 | start is 0x2b20e175b010 | 365695dc | start is <*> | ['0x2b20e175b010'] |
13 | Tue Sep 9 20:18:41 2008 | length is 0x3ffb28 | f3cb5b83 | length is <*> | ['0x3ffb28'] |
14 | Tue Sep 9 20:18:41 2008 | user_ptr is 0x3e9 | 7d7147f1 | user_ptr is <*> | ['0x3e9'] |
15 | Tue Sep 9 20:18:41 2008 | Set comp mdh[1] to 0x1 | 2bab36e2 | Set comp <*> to <*> | ['mdh[1]', '0x1'] |
Timestamps
In order to convert the Timestamp column into an actual datetime, we first load the structured CSV produced by the parser and then use regular expressions to extract the year, month, day and time from the existing field.
from calendar import month_abbr
from pathlib import Path

# Load the structured logs written by the parser.
df = pd.read_csv(next(Path(t).glob("*_structured.csv")))

# Split the raw field into month, day, time and year, map the month
# abbreviation to its number, zero-pad, and assemble a real datetime.
df = df.assign(
    Timestamp=lambda x0:
    x0["Timestamp"]
    .str.extract(r"\S+\s+(?P<m>\S+)\s+(?P<d>\S+)\s+(?P<t>\S+)\s+(?P<y>\S+)")
    .replace(dict(zip(month_abbr, range(13))))
    .assign(m=lambda x: x["m"].astype(str).str.rjust(2, "0"))
    .assign(d=lambda x: x["d"].astype(str).str.rjust(2, "0"))
    .pipe(lambda x: pd.to_datetime(
        x["y"] + "-" + x["m"] + "-" + x["d"] + " " + x["t"])))
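As a quick sanity check, the same transformation can be applied to a single raw value from the logs; this standalone snippet is only for illustration:

import pandas as pd
from calendar import month_abbr

# One raw Timestamp value from the Cray logs, converted the same way.
raw = pd.Series(["Tue Sep 9 20:18:34 2008"])
parts = (raw
         .str.extract(r"\S+\s+(?P<m>\S+)\s+(?P<d>\S+)\s+(?P<t>\S+)\s+(?P<y>\S+)")
         .replace(dict(zip(month_abbr, range(13)))))
ts = pd.to_datetime(parts["y"] + "-" + parts["m"].astype(str).str.rjust(2, "0")
                    + "-" + parts["d"].str.rjust(2, "0") + " " + parts["t"])
print(ts[0])  # 2008-09-09 20:18:34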
Plot Result
We use the plotly.express module to concisely create the histogram with the different log templates, reusing the DataFrame we just built. Note that we plot time on the x-axis, and we position the legend below the graph instead of next to it.
import plotly.express as px

(
    px.histogram(
        # One histogram trace per log template, binned over time.
        df[["Timestamp", "EventTemplate"]],
        x="Timestamp",
        color="EventTemplate",
        nbins=288)
    .update_layout(height=500, legend=dict(y=-0.5, orientation="h"), title="logs")
    .show()
)
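If you want to share the dashboard outside a notebook, the same figure can be written to a standalone HTML file; here is a small sketch, where the output file name is arbitrary:

# Build the figure once more and write it to a self-contained HTML page.
fig = px.histogram(
    df[["Timestamp", "EventTemplate"]],
    x="Timestamp", color="EventTemplate", nbins=288)
fig.update_layout(height=500, legend=dict(y=-0.5, orientation="h"), title="logs")
fig.write_html("log_dashboard.html", include_plotlyjs="cdn")  # arbitrary file name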
Once again, here is the final result, which can also be produced using this notebook.
Conclusion
Throughout this post we described some basic building blocks to produce insights from log messages.
I hope you enjoyed it!