Skip to content

Latest commit

 

History

History
328 lines (267 loc) · 11 KB

File metadata and controls

328 lines (267 loc) · 11 KB

Lovefield 103: In-field features

Lovefield is designed to handle real-world applications and therefore it has quite a few features that are specifically designed to deal with in-field issues.

Version control and upgrade

For a shipping product, it's very common that developers need to change the database schema after it's released. For example, you need to add an additional column to a table. One of the reasons Lovefield was created in the first place is to resolve this pain.

Due to technical limitations of the web platform, Lovefield was not able to provide more flexible upgrade usage like other RDBMS. It has to be version based. Each database has a version, which must be a positive integer. Lovefield uses the integer to tell if an upgrade is needed. If so, it will call the onUpgrade handler. Let's see how this work from the following example:

// Version 1 of the table.
const builder = lf.schema.create('Sales', 1);
builder
  .createTable('Order')
  .addColumn('id', lf.Type.INTEGER)
  .addColumn('sku', lf.Type.INTEGER)
  .addColumn('amount', lf.Type.NUMBER)
  .addColumn('date', lf.Type.DATE_TIME)
  .addColumn('agentId', lf.Type.STRING)
  .addColumn('isLocal', lf.Type.BOOLEAN)
  .addPrimaryKey(['id']);

builder.createTable('Agent').addColumn('id', lf.Type.STRING);

Say we have version 1 of a database that meets the need of our web app v1. After using it for a while, the users complain that they could not add notes to each order. As a result, we need an additional column to the Order table. On top of that, isLocal confuses to internal function name convention so we want to rename it, and Agent field is never used because of design issues. How can we adjust the database to solve all these issues? This involves two steps:

  1. Create a second version of the database schema that address these issues.
  2. Upgrade the existing database in the field to the new version.

The first step can easily be done by the following code:

// Version 2 of the table.
const builder2 = lf.schema.create('Sales', 2); // MUST use a greater number
builder2
  .createTable('Order')
  .addColumn('id', lf.Type.INTEGER)
  .addColumn('sku', lf.Type.INTEGER)
  .addColumn('amount', lf.Type.NUMBER)
  .addColumn('date', lf.Type.DATE_TIME)
  .addColumn('memo', lf.Type.STRING)
  .addColumn('local', lf.Type.BOOLEAN)
  .addPrimaryKey(['id'])
  .addNullable(['memo']); // You can add additional constraints, too.

// No more agent field and Agent table.

The second version of database schema looks and tests right. Now how can we migrate existing database to the new one? This is when the onUpgrade() callback is for use:

const db = await builder2.connect({
  onUpgrade: onSalesDBUpgrade, // Gives upgrade handling function.
});

// Sales database upgrade logic.
async function onSalesDBUpgrade(rawDB) {
  // You can retrieve the currently persisted database version.
  const oldVersion = rawDB.getVersion();

  if (oldVersion != 1) {
    // Do something. In our example, we'll just give up.
    return Promise.reject(new Error('Unable to handle DB upgrade'));
  }

  // DROP TABLE Agent.
  // This call is synchronous!
  rawDB.dropTable('Agent');

  // Everything else is asynchronous.
  // We need to add a new column |memo| to existing |Order| table.
  // Let's add it for every row with a default value 'N/A'.
  await rawDB.addTableColumn('Order', 'memo', 'N/A');

  // Then rename |isLocal| to |local|
  await rawDB.renameTableColumn('Order', 'isLocal', 'local');

  // Don't forget to remove the |agent| column.
  await rawDB.dropTableColumn('Order', 'agent');

  return Promise.resolve();
}

Although these APIs solve majority of the upgrade needs, Lovefield does not support data transformations (i.e. read the data out and convert them into new data). This is also because of technical limitations posed by the Web standards.

In that case, you'll need to dump existing data, transform, and import again.

Import and export data

Lovefield supports exporting all its data to a JSON object, and importing data from a JSON object. However, it will be the developer's responsibility to check heap usage (i.e. if you run into OOM, there's nothing Lovefield can help you with).

export() can be performed anytime as you wish:

const db = await builder.connect();
const dump = await db.export();

// Dump object format:
// {
//   "name": <database_name>,
//   "version": <database_version>,
//   "tables": {
//     <table_name>: [
//       {
//         <column_name>: <value>,
//         <column_name>: <value>,
//         ...
//       },
//       { ... },
//       ...
//     ],
//     <table_name>: [ ... ],
//     ...
//   }
// }

As to import(), it has quite some limitations/requirements:

  • import() expects exactly the same data format as exports().
  • import() can only be performed on an completely empty database instance.
  • Lovefield does not check data integrity during import. You are held solely responsible for that.

Both import() and export() will lock the whole database so that no no transactions can be performed until the import() or export() are done. The code snippet shows how to use import():

const db = await builder.connect(); // Make sure the database is empty.
await db.import(jsonObject);

Inspect and optimize your query

Lovefield provides two ways for you to understand your query better. The first is to provide equivalent SQL statement, and the second is to show the execution plan generated by Lovefield:

const q = db
  .select(
    c.lastName.as('ln'),
    c.firstName.as('fn'),
    o.amount.as('amount'),
    o.date.as('date')
  )
  .from(c)
  .leftOuterJoin(o, c.id.eq(o.customerId))
  .orderBy(c.lastName)
  .orderBy(c.firstName);

// Use toSql() to get SQL statement.
console.log(q.toSql());
// Result:
// SELECT Customer.lastName AS ln,
//        Customer.firstName AS fn,
//        Order.amount AS amount,
//        Order.date AS date
//   FROM Customer
//   LEFT OUTER JOIN Order ON (Customer.id = Order.customerId)
//   ORDER BY Customer.lastName ASC, Customer.firstName ASC;

// Use explain() to get Lovefield execution plan
console.log(q.explain());
// Result:
// project(Customer.lastName,Customer.firstName,Order.amount,Order.date)
// -order_by(Customer.lastName ASC, Customer.firstName ASC)
// --join(type: outer, impl: hash, join_pred(Customer.id eq Order.customerId))
// ---table_access(Customer)
// ---table_access(Order)

toSql() is for developers that are porting existing code to Web-based solution and want to use Lovefield as a quick replacement. For these developers, they already have good SQL-based implementations and all they need are just making sure that things implemented in Lovefield is as expected (given the syntax are not the same). In this case, toSql() can bring peace of mind of cross checking whether the implementation is as expected or not.

On the other hand, Lovefield query engine can have different execution plan compared to other databases. Therefore, if you wish to tune performance of your queries, you'll need to check the Lovefield execution plan and see if your schema design makes the best use of Lovefield.

Let's start with an example to see how this work. Say we have a schema like the following:

schemaBuilder
  .createTable('Order')
  .addColumn('id', lf.Type.INTEGER)
  .addColumn('description', lf.Type.STRING)
  .addColumn('date', lf.Type.DATE_TIME)
  .addColumn('customerId', lf.Type.INTEGER)
  .addColumn('clerkId', lf.Type.INTEGER)
  .addColumn('amount', lf.Type.NUMBER)
  .addPrimaryKey(['id']);

schemaBuilder
  .createTable('Customer')
  .addColumn('id', lf.Type.INTEGER)
  .addColumn('firstName', lf.Type.STRING)
  .addColumn('lastName', lf.Type.STRING)
  .addColumn('clerk', lf.Type.BOOLEAN)
  .addPrimaryKey(['id']);

And one of our performance hot spot is this query:

// Using node.js here,
// so no need to use .col() API to make TypeScript compiler happy.
const q = db
  .select(
    c.firstName.as('fn'),
    c.lastName.as('ln'),
    o.date.as('date'),
    o.amount.as('amt')
  )
  .from(c, o)
  .where(lf.op.and(c.id.eq(o.customerId), o.customerId.eq(customerId)))
  .orderBy(o.date);

In JavaScript console, you can pause before the query execution, and run

console.log(q.explain());

It shows the following execution plan:

project(Customer.firstName,Customer.lastName,Order.date,Order.amount)
-order_by(Order.date ASC)
--join(type: inner, impl: index_nested_loop, join_pred(Customer.id eq Order.customerId))
---no_op_step(Customer)
---select(value_pred(Order.customerId eq 2))
----table_access(Order)

This is actually the execution plan tree, root node is Project and leaf node is table_access. When you see table_access, this means a full table scan, whose execution time is in direct proportion of number of rows. This is not necessary the wrong/bad thing, because in cases like outer join, this is completely normal. The next node is select(value_pred()), which means comparing each row of previous step with a value. This is something really fishy and we need to be careful about that.

What if we create an index on the customerId column for the inner join?

schemaBuilder
    .createTable('Order')
    ...
    .addPrimaryKey(['id'])
    .addIndex('idxCustomer', ['customerId'], false);  // Add one index here.

The execution plan now becomes:

project(Customer.firstName,Customer.lastName,Order.date,Order.amount)
-order_by(Order.date ASC)
--join(type: inner, impl: index_nested_loop, join_pred(Customer.id eq Order.customerId))
---no_op_step(Customer)
---table_access_by_row_id(Order)
----index_range_scan(Order.idxCustomer, [2, 2], natural)

The leaf node is an index_range_scan() based on another index in Order table. Given Lovefield uses B+ Tree index, we know the runtime complexity is O(log n) for this step instead of previous O(n). A theoretical improvement that worth another in-field trial to see how much time it saves.

To learn more about the optimization you can do, it will require you

  • Have deeper understanding of how a database query engine works.
  • Watch Demetrios' js.la tech talk
  • Read code in lib/proc/pp, pp means physical pass of the query engine.

Inspect your data at runtime

Lovefield has a Chrome Extension that can help you inspect data and debug. To use it, you need to

  • Install and enable the extension
  • Add enableInspector: true in connect, for example:
// This is suggested to be debug only, remove enableInspector before shipping.
const db = await builder.connect({
  enableInspector: true,
});

The screen shot of this tool: Lovefield inspector screenshot

Lovefield-TS ported the all the required code and therefore the extension works for both the original Lovefield and the TypeScript port.