Skip to content
This repository was archived by the owner on Sep 27, 2019. It is now read-only.
This repository was archived by the owner on Sep 27, 2019. It is now read-only.

Mechanism to detect and determine cached statement replan in protocol handler #789

Open
@ChTimTsubasa

Description

@ChTimTsubasa

1. Background and Problems:

1.1 Postgres Extended Query Protocol

Compared to Postgres Simple Query Protocol, this extended one has a major change to support statement[1] caching, which is trying to cache every query's plan and reuse the plan once the same query appears again.

A typical example would be a client sending two same queries to the server:

For the first query, the server would receive a series of packets including PARSE->BIND[2]->DESCRIBE->EXECUTE. A new statement with a plan generated by optimizer would be stored at PARSE stage.

For the second query, instead of receiving 4 packets again, the server would only get BIND->DESCRIBE->EXECUTE packets. The server later would try to find the generated plan in the cache at BIND stage therefore to skip optimizing.

To support caching, current peloton uses a map with statement name string as the key and statement object as the value.

1.2 Problem of re-planning for cached statement

Using the example in 1.1 for a better illustration: After the first query and before the second query, it is possible that the table's schema has changed by either:

  • Another transaction in another connection changes the table schema and commit.

  • Between this two identical queries, the client sends one or more queries that changed the table schema.

The above situations would make the cached statement invalid and could potentially crash the system. The current implementation does not take this into consideration.

2. Proposed Solution

Basically, this solution relies on the recently implemented transactional catalog to store a "PlanVersion" field on each table tuple. This field is retrieved at PARSE stage and retrieved again and compare at BIND stage (or possibly other stages?) The protocol_handler saves a copy of the "PlanVersion" number in the cache, and compare this number when it comes to BIND to decide whether it should re-plan statement.

2.1 On catalog side

For each pg_table tuple, there should be a field storing the "PlanVersion" number.

An function that could get() the "PlanVersion" given the table_oid(s).

Some places either in catalog or other parts in the system would update that "PlanVersion" of a table if the table schema is changed(ABORT, CREAT or ALTER).

@mengranwo @AngLi-Leon @camellyx

2.2 On protocol_handler side

At PARSE stage, protocol_handler would access the PlanVersion field and stores it in its cache with the statement.

At BIND stage, it should access the PlanVersion field again and compare the version to its previously cached PlanVersion, if they are different, it would invoke the replanStatement() function.

Protocol handler should be able to access the most current TXN which is stored in traffic cop, since catalog's get() method is transactional and the TXN object need to be passed in as an argument.

[1] statement is a data structure that contains the query string and the generated query plan tree.
[2] Note that the BIND here is different from what peloton binder is doing. The BIND refers to a stage where a statement is searched from the cache and the parameters of a query are provided. ("SELECT * FROM foo WHERE $1 > 0", the actual value of $1 is provided at BIND stage.)

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions