
creating a new table

theopolis edited this page Oct 28, 2014 · 28 revisions

Introduction

The core of osquery is a SQL language where tables represent abstract operating system concepts. osquery provides a simple API for creating new tables. Any new table you write can be used in conjunction with existing tables via sub-queries, joins, etc. This allows for a rich data exploration experience.

Perhaps you want to expose some information about a part of the operating system which isn't currently implemented by osquery. Perhaps you want to use osquery to query something proprietary and internal. All of these use-cases are supported and more, using osquery's table API.

Creating your own table

This guide is going to take you through creating a new, very simple osquery table. We'll show you how to get all the knobs turning and leave the creative programming as an exercise for the reader.

The table that we're going to be implementing is going to be a "time" table. The table will have one row and that row will have three columns:

  • hour
  • minutes
  • seconds

The values of the columns will be determined by the current time, which will be dynamically computed at query time.

Declare the schema for your table

Under the hood, osquery uses a set of lower-level libraries from SQLite core to create what SQLite calls "virtual tables". The API for creating virtual tables is very complex, so osquery comes with a set of tools that make creating new tables as easy as possible.

Instead of writing low-level C code which consumes SQLite APIs, you can write a simple table declaration. osquery developers call these files "table specs", short for "table specifications". For our time table, our spec would look like this:

# use the table_name function to define what the name of
# your table is
table_name("time")

# define your schema using the schema function, which 
# accepts a list of Column instances
schema([
    # each column can be created inline for maximum
    # readability. declare the name of your column
    # as well as the type of the column. Currently
    # supported options are "std::string" and "int"
    Column(name="hour", type="int"),
    Column(name="minutes", type="int"),
    Column(name="seconds", type="int"),
])

# use the implementation function to declare where in
# osquery codebase your table is implemented. the string
# that you pass to this function is made up of two bits
# which are separated by an @ symbol. the first bit is
# the name of the implementation file and the second bit
# is the name of the function which implements the table.
#
# the general pattern here is:
#  "{table_name}@gen{TableName}"
implementation("time@genTime")

Feel free to leave the comments out in your production spec. The function names are pretty intuitive.

As an aside, you may be thinking that the syntax used for declaring the schema of tables is very similar to Python. Well, that's because it is! The build process actually parses the spec files as if they were Python code and generates the necessary C/C++ implementation files from them.

Although it's possible to get fancy here and use inheritance for Column objects, loops in your table spec, and so on, please don't.

If you have a great idea for a new virtual table or would like inspiration, check out the virtual tables label in the osquery issues.

Where do I put the spec?

You may be wondering how osquery handles cross-platform support while still allowing operating-system specific tables. The osquery build process takes care of this by only generating the relevant code based on a directory structure convention.

Creating your implementation

As indicated in the spec file, our implementation will be a function called genTime in the file osquery/tables/utility/time.cpp. Let's go ahead and create that file with the following content:

// Copyright 2004-present Facebook. All Rights Reserved.

#include <ctime>
#include <string>
#include <boost/lexical_cast.hpp>
#include "osquery/database.h"

using std::string;
using boost::lexical_cast;

namespace osquery {
namespace tables {

QueryData genTime() {
  Row r;
  QueryData results;

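  // localtime() returns a pointer to static storage, so read the
  // fields we need before anything else calls it again.
  time_t _time = time(0);
  struct tm* now = localtime(&_time);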

  r["hour"] = lexical_cast<string>(now->tm_hour);
  r["minutes"] = lexical_cast<string>(now->tm_min);
  r["seconds"] = lexical_cast<string>(now->tm_sec);

  results.push_back(r);
  return results;
}
}
}

Let's go through this code example's key points.

  • Your implementation function should be in the osquery::tables namespace.
  • Your implementation function should accept no parameters and return an instance of QueryData.

What's a QueryData and Row?

Data types like QueryData, Row, DiffResults, etc. are osquery's built-in data result types. They're all defined in osquery/database/results.h.

Row is just a typedef for a std::map<std::string, std::string>. That's it. A row of data is just a mapping of strings that represent column names to strings that represent column values. Note that, currently, even if your SQL column type is int and not std::string, you need to convert the ints to strings to comply with the type definition of Row. They'll be cast back to ints later.

QueryData is just a typedef for a std::vector<Row>. Query data is just a list of rows. Simple enough.
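The two typedefs and the int-to-string convention can be sketched in isolation. This is a minimal sketch, not osquery code: the typedefs are redefined locally so the snippet compiles on its own, makeExample and hourOf are hypothetical helper names, and std::to_string (C++11) stands in for the boost::lexical_cast used in the implementation file.

```cpp
#include <map>
#include <string>
#include <vector>

// Local stand-ins for osquery's result types, as described above.
// A real table gets these from osquery's headers instead.
typedef std::map<std::string, std::string> Row;
typedef std::vector<Row> QueryData;

// Hypothetical helper: build a one-row result holding an integer column.
QueryData makeExample(int hour) {
  Row r;
  // Column values are always stored as strings, even for "int" columns.
  r["hour"] = std::to_string(hour);

  QueryData results;
  results.push_back(r);
  return results;
}

// Hypothetical helper: read the integer column back out of a row.
int hourOf(const Row& r) {
  return std::stoi(r.at("hour"));
}
```

Calling makeExample(13) yields a QueryData with a single Row whose "hour" entry is the string "13"; hourOf converts it back to the integer 13.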

To populate the data that will be returned to the user at runtime, your implementation function must generate the data that you'd like to display and populate a QueryData vector with the appropriate Rows. Then, just return the QueryData.

In our case, we used system APIs to create a struct of type tm, which has fields such as tm_hour, tm_min and tm_sec that represent the current time. We can then simply create our three entries in our Row variable: hour, minutes and seconds. Then we push that single row onto the QueryData variable and return it. Note that if we wanted our table to have many rows (a more common use-case), we would just push back more Row maps onto results.
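A multi-row generator follows the same pattern with more push_back calls. The sketch below is illustrative only: genTimeByZone, rowFromTm and the extra "zone" column are hypothetical and not part of the time table in this guide, and the typedefs are redefined locally so the snippet stands alone.

```cpp
#include <ctime>
#include <map>
#include <string>
#include <vector>

// Local stand-ins for osquery's result types (assumption: same shape
// as the typedefs described in this guide).
typedef std::map<std::string, std::string> Row;
typedef std::vector<Row> QueryData;

// Hypothetical helper: turn one broken-down time into a Row.
Row rowFromTm(const std::string& zone, const std::tm& t) {
  Row r;
  r["zone"] = zone;
  r["hour"] = std::to_string(t.tm_hour);
  r["minutes"] = std::to_string(t.tm_min);
  r["seconds"] = std::to_string(t.tm_sec);
  return r;
}

// Hypothetical multi-row generator: one row for UTC, one for local time.
QueryData genTimeByZone() {
  QueryData results;
  std::time_t now = std::time(nullptr);

  // gmtime() and localtime() may share one static buffer, so copy each
  // result before making the next call.
  std::tm utc = *std::gmtime(&now);
  std::tm local = *std::localtime(&now);

  results.push_back(rowFromTm("utc", utc));
  results.push_back(rowFromTm("local", local));
  return results;
}
```

Each call to push_back adds one row to the result, so a table with N rows is just N Row maps appended to the same QueryData.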

Building your code

If you've created a new file, then you need to make sure that CMake builds your code. Open osquery/tables/CMakeLists.txt. Find the line that defines the library osquery_tables and add your file, utility/time.cpp, to the sources which are compiled by that library.

If your table only works on OS X, find the target called osquery_tables_darwin and add your file to that list of sources instead. If your table only works on Linux, find the target called osquery_tables_linux and add your implementation file to that list of sources.

Return to the root of the repository and execute make. This will generate the appropriate code and link everything together properly.

Testing your table

If your code compiled properly, launch the interactive query console by executing ./build/osquery/osqueryi and try querying your table: SELECT * FROM time;.

Getting your query ready for use in osqueryd

You don't have to do anything to make your query work in the osqueryd daemon. All osquery queries work in osqueryd. It's worth noting, however, that osqueryd is a long-running process. If your new table leaks memory or uses a lot of system resources, you will notice poor performance from osqueryd. For more information on ensuring a performant table, see the performance overview.

When in doubt, use the existing open source tables to guide your development.
