Postgres Pagination Strategy for List Endpoints

# Postgres Pagination Strategy for List Endpoints

## Background

The `list*` endpoints in Marquez v0.1 have (or will have soon)
implemented result pagination to bound the response sizes. To ensure
Marquez Core's scalability, and to enable a low-latency UI, we should
standardize pagination strategy which will scale with Marquez's data
volume.

### Assumptions

- `list*` endpoint results should always be sorted by the same table column(s)
- Clients do not require random access into the list of objects

### Goals

- Simple to implement within the application and the database
- Performance and database load should be constant regardless of
 cardinality of underlying table

## Strategy

### Keyset Pagination

The **Keyset Pagination** approach is simplest method for both the Marquez application and Postgres database to implement. It requires an index on the column used to sort the data. The application should be passed two values: `after` and `n`:

- `n` is the number of results to return
- `after` is the value that the next `n` results should follow in the sort
 order

There are a few benefits to this approach: Btree indexes can fulfill this query easily with a range scan, regardless of the indexed column cardinality. It also can be combined with filtering in the WHERE clause, which could be desirable client-side. However, the requirement of an after value requires the client to track the last value (or first value if paging backwards) from the previous page and pass it back to the server to fetch another set of results. It also means that the client cannot arbitrarily jump to a page which is not immediately before or after the last page fetched.

<img width="415" alt="screen shot 2019-02-05 at 3 53 31 pm" src="https://user-images.githubusercontent.com/41961267/52314362-23c51680-2967-11e9-88bb-ec2980d472b1.png">1

## Alternatives Considered

### Limit/Offset

The initial Marquez pagination implementation utilized the `LIMIT`, `OFFSET` features in Postgres. This works for rapid testing and development with small data volume, but querying fetching large OFFSETs is increasingly expensive on the database2.

<img width="415" alt="screen shot 2019-02-05 at 3 48 40 pm" src="https://user-images.githubusercontent.com/41961267/52314324-01cb9400-2967-11e9-9362-7211c769fe04.png"> 1

### Internal CTIDs

Postgres tracks an internal ID for each row in a table which corresponds to the physical ordering of the data in the table. For append-only data, the CTID will increase monotonically for new rows and could potentially be used for random access to pages. However, this will cause issues for
tables where data is deleted since Postgres will fill the CTID gaps with new rows. At the moment, Marquez Core is actually append-only and never deletes any data, so this could potentially work. However, paginating with CTIDs cannot be combined with filtering3 which will block useful UI feature for filtering on potentially large lists.

------------------------------------------------------------------------

1. [*https://leopard.in.ua/2014/10/11/postgresql-paginattion\#.XFogos9Kii4*](https://leopard.in.ua/2014/10/11/postgresql-paginattion#.XFogos9Kii4)

2. [*https://www.postgresql.org/docs/9.3/queries-limit.html*](https://www.postgresql.org/docs/9.3/queries-limit.html)

3. [*https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/*](https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Postgres Pagination Strategy for List Endpoints #281

Postgres Pagination Strategy for List Endpoints

Background

Assumptions

Goals

Strategy

Keyset Pagination

Alternatives Considered

Limit/Offset

Internal CTIDs

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Postgres Pagination Strategy for List Endpoints #281

Description

Postgres Pagination Strategy for List Endpoints

Background

Assumptions

Goals

Strategy

Keyset Pagination

Alternatives Considered

Limit/Offset

Internal CTIDs

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions