-
Notifications
You must be signed in to change notification settings - Fork 0
creating a new table
The core of osquery is a SQL language where tables represent abstract operating system concepts. osquery provides a simple API for creating new tables. Any new table you write can be used in conjunction with existing tables via sub-queries, joins, etc. This allows for a rich data exploration experience.
Perhaps you want to expose some information about a part of the operating system which isn't currently implemented by osquery. Perhaps you want to use osquery to query something proprietary and internal. All of these use-cases are supported and more, using osquery's table API.
This guide is going to take you through creating a new, very simple osquery table. We'll show you how to get all the knobs turning and leave the creative programming as an exercise for the reader.
The table that we're going to be implementing is going to be a "time" table. The table will have one row and that row will have three columns:
- hour
- minute
- second
The values of the columns will be determined by the current time, which will be dynamically computed at query time.
Under the hood, osquery uses a set of lower level libraries from SQLite core to create what SQLite calls "virtual tables". The API for creating virtual tables is very complex. In order to make creating new tables as easy as possible, osquery comes with a set of tools to make this easier.
Instead of writing low-level C code which consumes SQLite APIs, you can write a simple table declaration. osquery developers call these files "table specs", short for "table specifications". For our time table, our spec would look like this:
# use the table_name function to define what the name of
# your table is
table_name("time")
# define your schema using the schema function, which
# accepts a list of Column instances
schema([
# each column can be creating inline for maximum
# readability. declare the name of your column
# as well as the type of the column. Currently
# supported options are "std::string and "int"
Column(name="hour", type="int"),
Column(name="minutes", type="int"),
Column(name="seconds", type="int"),
])
# use the implementation function to declare where in
# osquery codebase your table is implemented. the string
# that you pass to this function is made up of two parts
# separated by an @ symbol. the first part is the name
# of the implementation file and the second part is the
# name of the function that implements the table.
#
# the general pattern here is:
# "{table_name}@gen{TableName}"
implementation("time@genTime")Feel free to leave the comments out in your production spec. The function names are pretty intuitive.
As an aside, you may be thinking that the syntax used for declaring the schema of tables is very similar to Python. Well, that's because it is! The build process actually parses the spec files as if they were Python code and meta-programs necessary C/C++ implementation files.
Although it's possible to get fancy here and try to use inheritance for Column objects, use loops in your table spec, etc. please don't.
If you have a great idea for a new virtual table or would like inspiration, check out the virtual tables label in the osquery issues
You may be wondering how osquery handles cross-platform support while still allowing operating-system specific tables. The osquery build process takes care of this by only generating the relevant code based on a directory structure convention.
- Cross-platform table specs are stored in
osquery/tables/specs/x/ - Mac OS X specific table specs are stored in
osquery/tables/specs/darwin/ - Linux specific table specs are stored in
osquery/tables/specs/linux/
As indicated in the spec file, our implementation will be a function called genTime in the file osquery/tables/utility/time.cpp. Let's go ahead and create that file with the following content:
// Copyright 2004-present Facebook. All Rights Reserved.
#include <ctime>
#include <boost/lexical_cast.hpp>
#include "osquery/database.h"
using std::string;
using boost::lexical_cast;
namespace osquery {
namespace tables {
QueryData genTime() {
Row r;
QueryData results;
time_t _time = time(0);
struct tm* now = localtime(&_time);
r["hour"] = lexical_cast<string>(now->tm_hour);
r["minutes"] = lexical_cast<string>(now->tm_min);
r["seconds"] = lexical_cast<string>(now->tm_sec);
results.push_back(r);
return results;
}
}
}Let's go through this code example's key points.
- Your implementation function should be in the
osquery::tablesnamespace. - Your implementation function should accept no parameters and return an instance of
QueryData
Data types like QueryData, Row, DiffResults, etc. are osquery's built-in data result types. They're all defined in osquery/database/results.h.
Row is just a typedef for a std::map<std::string, std::string>. That's it. A row of data is just a mapping of strings that represent column names to strings that represent column values. Note that, currently, even if your SQL table type is an int and not a std::string, we need to cast the ints as strings to comply with the type definition of the Row object. They'll be casted back to int's later.
QueryData is just a typedef for a std::vector<Row>. Query data is just a list of rows. Simple enough.
To populate the data that will be returned to the user at runtime, your implementation function must generate the data that you'd like to display and populate a QueryData map with the appropriate Rows. Then, just return the QueryData.
In our case, we used system APIs to create a struct of type tm which has fields such as tm_hour, tm_min and tm_sec which represent the current time. We can then simply create our three entires in our Row variable: hour, minutes and seconds. Then we push that single row onto the QueryData variable and return it. Note that if we wanted our table to have many rows (a more common use-case), we would just push back more Row maps onto results.
If you've created a new file, then you need to make sure that CMake builds your code. Open osquery/tables/CMakeLists.txt. Find the line that defines the library osquery_tables and add your file, utility/time.cpp to the sources which are compiled by that library.
If your table only works on OS X, find the target called osquery_tables_darwin and add your file to that list of sources instead. If your table only works on Linux, find the target called osquery_tables_linux and add your implementation file to that list of sources.
Return to the root of the repository and execute make. This will generate the appropriate code and link everything together properly.
If your code compiled properly, launch the interactive query console by executing ./build/[darwin|linux]/osquery/osqueryi and try issuing your table a command: SELECT * FROM time;.
You don't have to do anything to make your query work in the osqueryd daemon. All osquery queries work in osqueryd. It's worth noting, however, that osqueryd is a long-running process. If your new table leaks memory or uses a lot of systems resources, you will notice poor performance from osqueryd. For more information on ensuring a performant table, see the performance overview.
When in doubt, use the existing open source tables to guide your development.