-
Notifications
You must be signed in to change notification settings - Fork 0
creating a new table
SQL tables are used to represent abstract operating system concepts, like running processes.
A table can be used in conjunction with other tables via operations like sub-queries and joins. This allows for a rich data exploration experience.
While osquery ships with a default set of tables, osquery provides an API that allows you to create new tables. This allows you to add to the existing capabilities of the product.
You can explore existing tables [here] (https://github.com/facebook/osquery/tree/master/osquery/tables)
Tables that are up for grabs in terms of development can be found [here] (https://github.com/facebook/osquery/issues?q=is%3Aopen+is%3Aissue+label%3A%22virtual+tables%22)
Lets walk through an exercise where we build a 'time' table.
The table will have one row, and that row will have three columns:
- hour
- minute
- second
Column values will be dynamically computed at query time.
Under the hood, osquery uses libraries from SQLite core to create "virtual tables". The default API for creating virtual tables is relatively complex.
osquery has abstracted this complexity away, allowing you to write a simple table declaration.
For our time exercise, it would look like the following:
# use the table_name function to define what the name of
# your table is
table_name("time")
# define your schema using the schema function, which
# accepts a list of Column instances
schema([
# each column can be creating inline for maximum
# readability. declare the name of your column
# as well as the type of the column. Currently
# supported options are INTEGER, BIGINT, TEXT,
# DATE and DATETIME
Column("hour", INTEGER),
Column("minutes", INTEGER),
Column("seconds", INTEGER),
])
# use the implementation function to declare where in
# osquery codebase your table is implemented. the string
# that you pass to this function is made up of two parts
# separated by an @ symbol. the first part is the name
# of the implementation file and the second part is the
# name of the function that implements the table.
#
# the general pattern here is:
# "{table_name}@gen{TableName}"
implementation("time@genTime")You can leave the comments out in your production spec. Shoot for simplicity, do not go "hard in the paint" and do things like inheritance for Column objects, loops in your table spec, etc.
You might wonder "this syntax looks similar to Python?". Well, it is! The build process actually parses the spec files as Python code and meta-programs necessary C/C++ implementation files.
You may be wondering how osquery handles cross-platform support while still allowing operating-system specific tables. The osquery build process takes care of this by only generating the relevant code based on a directory structure convention.
- Cross-platform table specs are stored in
osquery/tables/specs/x/ - Mac OS X specific table specs are stored in
osquery/tables/specs/darwin/ - Linux specific table specs are stored in
osquery/tables/specs/linux/
As indicated in the spec file, our implementation will be in a function called genTime in the file osquery/tables/utility/time.cpp.
Here is that code:
// Copyright 2004-present Facebook. All Rights Reserved.
#include <ctime>
#include <boost/lexical_cast.hpp>
#include "osquery/database.h"
using std::string;
using boost::lexical_cast;
namespace osquery {
namespace tables {
QueryData genTime() {
Row r;
QueryData results;
time_t _time = time(0);
struct tm* now = localtime(&_time);
r["hour"] = lexical_cast<string>(now->tm_hour);
r["minutes"] = lexical_cast<string>(now->tm_min);
r["seconds"] = lexical_cast<string>(now->tm_sec);
results.push_back(r);
return results;
}
}
}Key points to remember:
- Your implementation function should be in the
osquery::tablesnamespace. - Your implementation function should accept no parameters and return an instance of
QueryData
Data types like QueryData, Row, DiffResults, etc. are osquery's built-in data result types. They're all defined in include/osquery/database/results.h.
Row is just a typedef for a std::map<std::string, std::string>. That's it. A row of data is just a mapping of strings that represent column names to strings that represent column values. Note that, currently, even if your SQL table type is an int and not a std::string, we need to cast the ints as strings to comply with the type definition of the Row object. They'll be casted back to int's later.
QueryData is just a typedef for a std::vector<Row>. Query data is just a list of rows. Simple enough.
To populate the data that will be returned to the user at runtime, your implementation function must generate the data that you'd like to display and populate a QueryData map with the appropriate Rows. Then, just return the QueryData.
In our case, we used system APIs to create a struct of type tm which has fields such as tm_hour, tm_min and tm_sec which represent the current time. We can then create our three entries in our Row variable: hour, minutes and seconds. Then we push that single row onto the QueryData variable and return it. Note that if we wanted our table to have many rows (a more common use-case), we would just push back more Row maps onto results.
If you've created a new file, you'll need to make sure that CMake properly builds your code. Open osquery/tables/CMakeLists.txt. Find the line that defines the library osquery_tables and add your file, utility/time.cpp to the sources which are compiled by that library.
If your table only works on OS X, find the target called osquery_tables_darwin and add your file to that list of sources instead. If your table only works on Linux, find the target called osquery_tables_linux and add your implementation file to that list of sources.
Return to the root of the repository and execute make. This will generate the appropriate code and link everything properly.
If your code compiled properly, launch the interactive query console by executing ./build/[darwin|linux]/osquery/osqueryi and try issuing your new table a command: SELECT * FROM time;.
You don't have to do anything to make your query work in the osqueryd daemon. All osquery queries work in osqueryd. It's worth noting, however, that osqueryd is a long-running process. If your table leaks memory or uses a lot of systems resources, you will notice poor performance from osqueryd. For more information on ensuring a performant table, see performance overview.
When in doubt, use existing open source tables to guide your development.