The purpose of this tool is to parse log files generated by the Percona Audit Plugin for Percona Server and import them into a Clickhouse database.
Use at your own risk.
I needed to perform audit logging on several Percona servers. The Percona Audit Plugin does exactly that, but while generating the log is easy, doing the actual audit -- not so much. When I saw the quite mind-blowing performance of Clickhouse when searching through huge amounts of data, I decided to give it a try and import all my logs into a Clickhouse database. I looked around for a tool to handle the import, but couldn't find a perfect match. I found a couple of projects, but they didn't handle importing logs from different hosts into the same database, and because they were built for handling logs in general, they didn't perform quite as well as I'd like. So, to make a long story short, I made my own importer. It works for me, and perhaps it can work for you too.
- The importer does not interface with Percona Server itself in any way. It only handles the log files you feed it. This means you need to have your Percona Server up and running and configured for audit logging. Please read the Percona documentation on configuring the Audit Plugin.
- The importer connects to Clickhouse like a normal client, so you have to set up a working Clickhouse server in advance and provide the necessary credentials (see config file below).
A few other things that are nice to know before starting:
- The parser itself is pretty CPU intensive, so it's not recommended to run it on a production Percona server. Instead, consider copying the log file to another host and running the parser from there.
- The current importer only supports the JSON log format. If you need to use other formats (like CSV or XML), you will need to adapt the importer accordingly.
- You'll also need a Go environment to compile the source into a binary.
- The parser currently only supports log files in the UTC time zone (timestamps ending in "UTC" or "Z").
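Regarding the UTC limitation: the sketch below shows one way such timestamps can be parsed in Go, handling both the "UTC" and "Z" suffixes mentioned above. The layouts and the sample values are assumptions for illustration only; the actual parsing in main.go may differ.

package main

import (
	"fmt"
	"time"
)

// parseAuditTime parses a timestamp from the audit log, accepting both the
// "YYYY-MM-DDTHH:MM:SS UTC" style and the RFC3339 "...Z" style.
// Illustrative only -- the layouts used by the real importer may differ.
func parseAuditTime(s string) (time.Time, error) {
	layouts := []string{
		"2006-01-02T15:04:05 UTC", // literal " UTC" suffix
		time.RFC3339,              // e.g. 2020-06-16T19:35:01Z
	}
	var t time.Time
	var err error
	for _, layout := range layouts {
		if t, err = time.Parse(layout, s); err == nil {
			return t.UTC(), nil
		}
	}
	return time.Time{}, err
}

func main() {
	for _, s := range []string{"2020-06-16T19:35:01 UTC", "2020-06-16T19:35:01Z"} {
		t, err := parseAuditTime(s)
		fmt.Println(t, err)
	}
}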
If you have a look at the code (and I urge you to do so), you'll see it consists of three .go files:
main.go
: The main file that does the actual parsing.

subfunctions.go
: Definitions of some simple helper functions used by main.go.

structs.go
: The structs into which the JSON records and the configuration parameters are loaded.
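As a rough illustration of what structs.go deals with, each log line is a JSON object that can be unmarshalled into a struct along these lines. The struct, field, and JSON key names here are my own assumptions based on the table layout shown further down; the real file may differ.

package main

import (
	"encoding/json"
	"fmt"
)

// auditRecord is an illustrative stand-in for the struct(s) defined in
// structs.go. The JSON key names are assumptions based on the columns of
// the mysql_audit_log table shown below.
type auditRecord struct {
	Record struct {
		Name         string `json:"name"`
		Record       string `json:"record"`
		Timestamp    string `json:"timestamp"`
		CommandClass string `json:"command_class"`
		ConnectionID string `json:"connection_id"`
		Status       uint32 `json:"status"`
		Sqltext      string `json:"sqltext"`
		User         string `json:"user"`
		Host         string `json:"host"`
		OSUser       string `json:"os_user"`
		IP           string `json:"ip"`
		DB           string `json:"db"`
	} `json:"audit_record"`
}

func main() {
	// A minimal, made-up example line.
	line := `{"audit_record":{"name":"Query","record":"1_2020-06-16T19:35:01","timestamp":"2020-06-16T19:35:01 UTC","command_class":"select","connection_id":"42","status":0,"sqltext":"SELECT 1","user":"app","host":"localhost","os_user":"","ip":"127.0.0.1","db":"test"}}`

	var r auditRecord
	if err := json.Unmarshal([]byte(line), &r); err != nil {
		panic(err)
	}
	fmt.Println(r.Record.User, r.Record.CommandClass, r.Record.Sqltext)
}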
Download the code to your Go environment and build it with:
go build
Move the binary to a location of your choice, and you're ready to go.
Alternatively, download the binary file (clickhouseimporter).
Move the binary to a location of your choice, and you're ready to go.
The following commands will create a database called "audit" with one table called "mysql_audit_log" from the SQL files included in the repository:
[user@host]# clickhouse-client --multiline < createdatabase.sql
[user@host]# clickhouse-client --multiline < createtable.sql
The configuration is defined in a YAML file, similar to this:
# Clickhouse credentials
database:
  user: "username"
  pass: "password"
  hostname: "localhost" # or the hostname of the clickhouse server
  port: 9000
  dbname: "audit"
  tablename: "mysql_audit_log"
# MISC
misc:
  batchsize: 20000 # the number of lines parsed before committing to clickhouse
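If you're curious how such a file could be mapped onto Go structs, here is a minimal sketch. The struct and field names are illustrative (the importer's own structs.go may differ), and the gopkg.in/yaml.v2 dependency is my assumption, not necessarily what the importer uses.

package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v2"
)

// config mirrors the YAML file above. Names are illustrative only.
type config struct {
	Database struct {
		User      string `yaml:"user"`
		Pass      string `yaml:"pass"`
		Hostname  string `yaml:"hostname"`
		Port      int    `yaml:"port"`
		DBName    string `yaml:"dbname"`
		TableName string `yaml:"tablename"`
	} `yaml:"database"`
	Misc struct {
		BatchSize int `yaml:"batchsize"`
	} `yaml:"misc"`
}

func loadConfig(path string) (*config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var c config
	if err := yaml.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	// Path taken from the bash example further down; adjust to your setup.
	c, err := loadConfig("/data/audit/conf.yml")
	if err != nil {
		panic(err)
	}
	fmt.Printf("connecting to %s:%d, batch size %d\n", c.Database.Hostname, c.Database.Port, c.Misc.BatchSize)
}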
The name of the database and table can be changed in the createdatabase.sql and createtable.sql files. Remember to change the names accordingly in the config file.
Changing the structure of the table itself is not recommended.
The structure of the table is almost identical to that of the JSON file, except for the fields os_login and proxy_user. In all the samples I've examined, these fields have always been empty, so for simplicity's sake, they have been left out. Other than that, the structure of the table is like this:
CREATE TABLE audit.mysql_audit_log (
    `time` DateTime,
    `date` Date DEFAULT toDate(time),
    `name` String,
    `record` String,
    `command_class` String,
    `connection_id` String,
    `status` UInt32,
    `sqltext` String,
    `user` LowCardinality(String),
    `host` LowCardinality(String),
    `os_user` String,
    `ip` String,
    `db` LowCardinality(String),
    `dbserver` LowCardinality(String)
) ENGINE = MergeTree() PARTITION BY toDate(time) ORDER BY time SETTINGS index_granularity = 8192;
The table is partitioned by date to make cleanup easy; old partitions can simply be dropped when you no longer need them.
./clickhouseimporter -d database-hostname -a file-to-parse -c path/to/yaml-config-file
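For reference, the three flags could be wired up with the standard flag package roughly as below. This is a sketch, not the actual main.go. Note that -d is the name of the Percona server the log file came from (it ends up in the dbserver column, see the bash example further down).

package main

import (
	"flag"
	"fmt"
)

func main() {
	// -d: name of the Percona server the log originated from (stored in the dbserver column)
	dbserver := flag.String("d", "", "name of the originating database server")
	// -a: the audit log file to parse
	logfile := flag.String("a", "", "path to the audit log file to parse")
	// -c: the YAML configuration file described above
	configfile := flag.String("c", "", "path to the YAML config file")
	flag.Parse()

	if *dbserver == "" || *logfile == "" || *configfile == "" {
		flag.Usage()
		return
	}
	fmt.Printf("parsing %s from %s using config %s\n", *logfile, *dbserver, *configfile)
}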
I can't guarantee that this example will work out of the box in your environment, but it is more or less a copy of what I ended up doing in my own environment. It shows the clickhouseimporter processing a directory with subdirectories containing incoming files from four different servers: server1-4. Obviously, you have to copy the log files from the Percona server(s) to the importer host prior to running it.
The contents of the /data/audit/ingress/ folder look something like this:
/data/audit/ingress/
|-- server1
|   `-- audit.log-20200616193501
|-- server2
|   `-- audit.log-20200616193501
|-- server3
|   |-- audit.log-20200616193001
|   |-- audit.log-20200616193501
|   `-- audit.log-20200616194001
`-- server4
    `-- audit.log-20200616193501
The bash script below will find all folders in /data/audit/ingress/ and scan them individually for files named audit*. Each file will then be parsed line by line, imported into Clickhouse, and deleted afterwards. If you don't want your original files to be deleted, remove or comment out the rm -f line. Note that the name of each folder is assumed to be the hostname the log file originated from; this name becomes the value of the dbserver column.
#!/bin/bash
shopt -s nullglob
INCOMINGDIR="/data/audit/ingress"
CONFIGFILE="/data/audit/conf.yml"
cd "$INCOMINGDIR"
for DBHOST in *; do
    cd "$INCOMINGDIR/$DBHOST"
    # nullglob makes this expand to an empty list when there are no audit* files
    files=( audit* )
    if [ ${#files[@]} -eq 0 ]; then
        echo "no new files for $DBHOST"
    else
        for file in "${files[@]}"; do
            echo "doing $INCOMINGDIR/$DBHOST/$file"
            if ! /data/audit/clickhouseimporter -d "$DBHOST" -a "$INCOMINGDIR/$DBHOST/$file" -c "$CONFIGFILE"; then
                echo "Could not complete task"
                exit 1
            fi
            # comment out the next line if you want to keep the original files
            rm -f "$INCOMINGDIR/$DBHOST/$file"
        done
    fi
done
The performance of this importer depends on almost every factor in the equation:
- The batchsize configured in the config file (read the Clickhouse documentation for guidance on insert batch sizes)
- The CPU on the host
- The IO capacity of the storage system
- The size of the file and the length of the queries within it.
- Probably more factors in your environment.
Tested on a VMware VM with two CPU cores, SAN storage, and a batchsize of 20000, the parser was able to process roughly 60,000 lines of JSON per second with a CPU load of roughly 70-100% on one core, while the Clickhouse server used 15-30% of the other core. If your setup requires more than this, you might consider another approach. Running the importer in parallel with a higher number of CPU cores should be possible, but has not been tested.
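To give an idea of why batchsize matters, Clickhouse prefers a few large batched inserts over many small ones, and a batched-insert loop in Go typically looks something like the sketch below. This is not the importer's actual code: I'm assuming the clickhouse-go driver for database/sql here (the import path and DSN format depend on the driver version you pick), and the row values are placeholders.

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	// Assumption: a ClickHouse driver for database/sql. Adjust the import
	// path and the DSN below to match the driver version you actually use.
	_ "github.com/ClickHouse/clickhouse-go"
)

func main() {
	db, err := sql.Open("clickhouse", "tcp://localhost:9000?username=username&password=password&database=audit")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const batchSize = 20000 // same role as batchsize in the config file

	// One batch = one transaction with a prepared INSERT, committed as a whole.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	stmt, err := tx.Prepare(`INSERT INTO mysql_audit_log
		(time, name, record, command_class, connection_id, status, sqltext, user, host, os_user, ip, db, dbserver)
		VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`)
	if err != nil {
		log.Fatal(err)
	}

	for i := 0; i < batchSize; i++ {
		// In the real importer these values come from the parsed JSON lines.
		if _, err := stmt.Exec(time.Now().UTC(), "Query", "dummy", "select", "42",
			0, "SELECT 1", "app", "localhost", "", "127.0.0.1", "test", "server1"); err != nil {
			log.Fatal(err)
		}
	}
	if err := tx.Commit(); err != nil { // the whole batch hits Clickhouse here
		log.Fatal(err)
	}
	fmt.Println("batch committed")
}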
Disk usage depends on how much activity you have on your Percona servers.
In my particular case I log everything (audit_log_policy set to ALL), and after running this for a couple of months, I have accumulated 18 billion rows using roughly 870 GB of disk space. Even with this amount of data, Clickhouse is still going through it like a hot knife through butter.
Here we count the number of queries per server to see how the activity is distributed across the servers:
SELECT
dbserver,
count()
FROM mysql_audit_log
GROUP BY dbserver
┌─dbserver──────────────────────┬───count()─┐
│ server1 │ 48218977 │
│ server2 │ 7851352 │
│ server3 │ 48069978 │
│ server4 │ 780280 │
│ server5 │ 150245687 │
│ server6 │ 349 │
└───────────────────────────────┴───────────┘
6 rows in set. Elapsed: 1.138 sec. Processed 255.17 million rows, 1.28 GB (224.19 million rows/s., 1.12 GB/s.)
clickhouse-1.stofa.net :)
Next, we count the queries per status code, ignoring the successful ones (status 0), to see which errors occur:
SELECT
status,
count()
FROM mysql_audit_log
WHERE (status != 0)
GROUP BY status
┌─status─┬─count()─┐
│ 1062 │ 219 │
│ 1146 │ 44 │
│ 1045 │ 5 │
│ 1226 │ 191 │
│ 1364 │ 29 │
│ 1406 │ 1 │
│ 1142 │ 42139 │
│ 1156 │ 152 │
│ 1094 │ 2 │
│ 1366 │ 12 │
│ 1048 │ 7 │
│ 1242 │ 31 │
│ 1292 │ 38 │
│ 1158 │ 1 │
│ 1064 │ 1261 │
│ 1356 │ 2 │
│ 1265 │ 2 │
└────────┴─────────┘
17 rows in set. Elapsed: 0.290 sec. Processed 256.85 million rows, 1.26 GB (887.05 million rows/s., 4.34 GB/s.)
clickhouse-1.stofa.net :)
You can now start digging into the different status codes and look for failed queries. You can identify if someone is trying to brute force their way into the database. Frankly, the opportunities are endless once you start combing through the data. You can combine various search parameters in many different ways in search of the elusive needle in a haystack. You're only limited by your imagination. Have fun!
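As an example of building something on top of the data, the sketch below pulls the top sources of "access denied" errors (MySQL error code 1045), which is one way to spot brute-force attempts. Again, the clickhouse-go driver import and the DSN are assumptions; adjust them to your setup, or simply run the SELECT in clickhouse-client.

package main

import (
	"database/sql"
	"fmt"
	"log"

	// Assumption: same ClickHouse driver as in the batching sketch above.
	_ "github.com/ClickHouse/clickhouse-go"
)

func main() {
	db, err := sql.Open("clickhouse", "tcp://localhost:9000?username=username&password=password&database=audit")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Error code 1045 is "Access denied for user" -- repeated hits from the
	// same ip/user combination may indicate a brute-force attempt.
	rows, err := db.Query(`
		SELECT ip, user, count() AS attempts
		FROM mysql_audit_log
		WHERE status = 1045
		GROUP BY ip, user
		ORDER BY attempts DESC
		LIMIT 10`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var ip, user string
		var attempts uint64
		if err := rows.Scan(&ip, &user, &attempts); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-15s %-20s %d\n", ip, user, attempts)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}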