
KV list() and 'cursor' #109

@cfjello


I think KV list iterators have great potential! In a distributed environment they could be fantastic, especially if they could be shared among processes. And they can! Well, almost.

I found the following behavior when trying out list iterators:

1) You cannot provide a cursor name to the iterator when it is first created:

```ts
const LIMIT = 5;
const keyPart = ["user"];
const cursor = "USERS";
const itor = kv.list<User>({ prefix: keyPart }, { limit: LIMIT, cursor: cursor });
```

This code does not return any results, because it tries to look up an existing cursor/iterator by that name.
Also note that the LIMIT above applies to the list iterator as a whole. It is not an SQL-style fetch limit, so you have to create a new list iterator to fetch the next 5 rows. This is probably by design.
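To make point 1 concrete, here is a minimal in-memory model of cursor-based listing in plain TypeScript. It is not Deno's implementation. It assumes (a reading, not something confirmed in this issue) that a cursor is an opaque string encoding the key of the last entry returned, and that `limit` caps how many entries a single listing yields. `encodeCursor`, `decodeCursor` and `listPage` are hypothetical helper names for this sketch.

```typescript
// In-memory model of cursor-based listing. NOT Deno's implementation.
// Assumption: a cursor encodes the key of the last entry returned, and
// `limit` caps how many entries a single listing yields.

type Key = [string, number];
type Entry<T> = { key: Key; value: T };

// Hypothetical helpers: turn a key position into an opaque string and back.
function encodeCursor(key: Key): string {
  return encodeURIComponent(JSON.stringify(key));
}

function decodeCursor(cursor: string): Key | undefined {
  try {
    const parsed: unknown = JSON.parse(decodeURIComponent(cursor));
    if (
      Array.isArray(parsed) &&
      typeof parsed[0] === "string" &&
      typeof parsed[1] === "number"
    ) {
      return [parsed[0], parsed[1]];
    }
  } catch {
    // Not a string produced by encodeCursor.
  }
  return undefined;
}

function compareKeys(a: Key, b: Key): number {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

// One "list() call": at most `limit` entries under `prefix`, after `cursor`.
function listPage<T>(
  entries: Entry<T>[],
  prefix: string,
  limit: number,
  cursor?: string,
): { items: T[]; cursor: string | undefined } {
  const after = cursor === undefined ? undefined : decodeCursor(cursor);
  if (cursor !== undefined && after === undefined) {
    // A made-up string such as "USERS" names no position in this model,
    // so nothing is returned.
    return { items: [], cursor: undefined };
  }
  const page = entries
    .filter((e) => e.key[0] === prefix)
    .sort((a, b) => compareKeys(a.key, b.key))
    .filter((e) => after === undefined || compareKeys(e.key, after) > 0)
    .slice(0, limit);
  const last = page[page.length - 1];
  return {
    items: page.map((e) => e.value),
    cursor: last === undefined ? undefined : encodeCursor(last.key),
  };
}

const entries: Entry<string>[] = [];
for (let i = 1; i <= 12; i++) {
  entries.push({ key: ["user", i], value: `John_${i}` });
}

const page1 = listPage(entries, "user", 5); // John_1 .. John_5
const page2 = listPage(entries, "user", 5, page1.cursor); // John_6 .. John_10
const bogus = listPage(entries, "user", 5, "USERS"); // no items
console.log(page1.items, page2.items, bogus.items);
```

In this model each call is a fresh listing; the only state carried from one page to the next is the cursor string, which is why a new listing is needed for every batch of `limit` entries.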

2) A newly created list iterator does not have a cursor attribute that can be referenced; the cursor is only assigned after the first fetch:

```ts
export type User = {
  id: number;
  name: string;
  age: number;
};

// Generate 100 users; 20 + (i % 30) gives ages between 20 and 49
const users: User[] = [];
for (let i = 1; i <= 100; i++) {
  users.push({
    id: i,
    name: `John_${i}`,
    age: 20 + (i % 30),
  });
}

// Deno KV is unstable: run with --unstable-kv
const kv = await Deno.openKv("./db.sqlite3");

// Drain an iterator and remember the cursor after the last entry read
async function fetchBatch<T>(
  iterator: Deno.KvListIterator<T>,
): Promise<{ cursor: string; items: T[] }> {
  let cursor = "";
  const items: T[] = [];
  let result = await iterator.next();
  while (!result.done) {
    cursor = iterator.cursor;
    // result.value is the full KvEntry; its .value is the stored item
    items.push(result.value.value as T);
    result = await iterator.next();
  }
  return { cursor, items };
}

// Remove entries from earlier runs. kv.delete() removes one exact key,
// so delete every entry listed under each prefix instead.
for (const prefix of [["user"], ["user_by_age"]]) {
  for await (const entry of kv.list({ prefix })) {
    await kv.delete(entry.key);
  }
}

// Populate the KV store: a primary entry plus an age index per user
for (const user of users) {
  const result = await kv
    .atomic()
    .set(["user", user.id], user)
    .set(["user_by_age", user.age, user.id], user)
    .commit();
  if (!result.ok) {
    throw new Error(`Problem persisting user ${user.name}`);
  }
}

const itor = kv.list<User>({ prefix: ["user"] }, { limit: 5 });
let pageNum = 1;

// const cursor = itor.cursor; // reading the cursor here, before the first fetch, throws
const batch = await fetchBatch<User>(itor);

console.log(`-----------------------\nPage ${pageNum}:`);
for (const u of batch.items) {
  console.log(`${u.name} ${u.age}`);
}

// After the first fetch, the cursor can be read and passed to a new iterator
const cursor = itor.cursor;
const itor2 = kv.list<User>({ prefix: ["user"] }, { limit: 5, cursor });

const batch2 = await fetchBatch<User>(itor2);
console.log(`-----------------------\nPage ${++pageNum}:`);
for (const u of batch2.items) {
  console.log(`${u.name} ${u.age}`);
}
```

As the code shows, it is possible to create a second iterator from the first iterator's cursor, and it fetches the next five rows. The output is:

```
-----------------------
Page 1:
John_1 21
John_2 22
John_3 23
John_4 24
John_5 25
-----------------------
Page 2:
John_6 26
John_7 27
John_8 28
John_9 29
John_10 30
```
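Point 2 can be pictured with a small synchronous model of an iterator whose `cursor` getter throws until an entry has been read. `ModelListIterator` is a hypothetical class, much simpler than `Deno.KvListIterator` (which is asynchronous and reads from the database); it only illustrates why the cursor cannot be read up front and how a second iterator resumes from it. Treating the cursor as the last key handed out is an assumption of the model.

```typescript
// Hypothetical, synchronous model of a list iterator (not Deno's code).
// The cursor is simply the key of the last entry handed out so far.
type Row<T> = { key: number; value: T };

class ModelListIterator<T> {
  private readonly rows: Row<T>[];
  private readonly limit: number;
  private index = 0;
  private lastKey: number | undefined = undefined;

  constructor(rows: Row<T>[], limit: number, cursor?: string) {
    // Resume strictly after the key encoded in the cursor, if one is given.
    const after = cursor === undefined ? -Infinity : Number(cursor);
    this.rows = rows.filter((r) => r.key > after);
    this.limit = limit;
  }

  get cursor(): string {
    if (this.lastKey === undefined) {
      throw new Error("cursor is not available before the first fetch");
    }
    return String(this.lastKey);
  }

  next(): IteratorResult<T, undefined> {
    if (this.index >= this.limit || this.index >= this.rows.length) {
      return { done: true, value: undefined };
    }
    const row = this.rows[this.index++];
    this.lastKey = row.key;
    return { done: false, value: row.value };
  }

  [Symbol.iterator](): this {
    return this;
  }
}

const rows: Row<string>[] = [];
for (let i = 1; i <= 12; i++) rows.push({ key: i, value: `John_${i}` });

const first = new ModelListIterator(rows, 5);
let earlyError = "";
try {
  void first.cursor; // reading the cursor before any fetch
} catch (e) {
  earlyError = (e as Error).message;
}

const firstPage = [...first]; // John_1 .. John_5
const second = new ModelListIterator(rows, 5, first.cursor);
const secondPage = [...second]; // John_6 .. John_10
console.log(earlyError, firstPage, secondPage);
```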

3) This functionality also works across processes, which suggests that these iterators are somehow tracked at a lower level. I stored the cursor in the KV database, fetched it from an independent process (same database), created a new list iterator with that cursor, and voilà: the result was the same as above. Page 1 was produced by the first process and Page 2 by the second process. In my opinion this is absolutely brilliant: a shared iterator in a distributed environment.
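One possible explanation for point 3, stated as an assumption rather than a description of Deno's internals: if the cursor string itself encodes the position of the last key read, nothing has to be tracked between processes, and any process that obtains the string can resume. The sketch models the two processes as two functions that share only a `Map` standing in for the KV database; `readPage`, `processA`, `processB` and the `"users_cursor"` key are all hypothetical.

```typescript
// Model of two processes sharing a cursor through a common store.
// The Map stands in for the shared KV database; nothing else is shared.
const sharedStore = new Map<string, string>();

// Both "processes" see the same ordered data (keys 1..12).
const rows = Array.from({ length: 12 }, (_, i) => ({
  key: i + 1,
  value: `John_${i + 1}`,
}));

// Read up to `limit` rows strictly after the key encoded in `cursor`.
function readPage(
  cursor: string | undefined,
  limit: number,
): { items: string[]; cursor: string | undefined } {
  const after = cursor === undefined ? -Infinity : Number(cursor);
  const page = rows.filter((r) => r.key > after).slice(0, limit);
  const last = page[page.length - 1];
  return {
    items: page.map((r) => r.value),
    cursor: last === undefined ? undefined : String(last.key),
  };
}

// "Process A": reads page 1 and publishes the cursor it ended on.
function processA(): string[] {
  const page = readPage(undefined, 5);
  if (page.cursor !== undefined) sharedStore.set("users_cursor", page.cursor);
  return page.items;
}

// "Process B": holds no iterator state, only what it reads from the store.
function processB(): string[] {
  const cursor = sharedStore.get("users_cursor");
  return readPage(cursor, 5).items;
}

const pageFromA = processA(); // John_1 .. John_5
const pageFromB = processB(); // John_6 .. John_10
console.log(pageFromA, pageFromB);
```

Under this assumption, the "shared iterator" is really a shared position: process B rebuilds its own iterator from a string and never touches process A's iterator object.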

4) However, there was a problem in both the dual-process and the single-process scenario. The first iterator has a private attribute, #count, which is only updated during that iterator's initial run. The first iterator derived from its cursor seems to know this count and picks up at the right row. However, #count is not tracked and updated after that, so this only works the first time.
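A possible reading of point 4, offered only as a hypothesis: each `kv.list()` call creates a new iterator whose own count toward `limit` starts from zero, so if every new iterator is built from the first iterator's cursor, each one resumes at the same place and returns page 2 again. Under that assumption, the cursor to resume from should come from the iterator read most recently. The model contrasts the two patterns; `readPage`, `pagesReusingFirstCursor` and `pagesChainingCursors` are illustrative names, not Deno APIs.

```typescript
// Model of paging with cursors (hypothetical helpers, not Deno's API).
type Page = { items: string[]; cursor: string | undefined };

const rows = Array.from({ length: 12 }, (_, i) => ({
  key: i + 1,
  value: `John_${i + 1}`,
}));

// Each call behaves like a fresh list iterator with its own limit count.
function readPage(cursor: string | undefined, limit: number): Page {
  const after = cursor === undefined ? -Infinity : Number(cursor);
  const page = rows.filter((r) => r.key > after).slice(0, limit);
  const last = page[page.length - 1];
  return {
    items: page.map((r) => r.value),
    cursor: last === undefined ? undefined : String(last.key),
  };
}

// Pattern that repeats page 2: every later page starts from the first cursor.
function pagesReusingFirstCursor(limit: number, pages: number): string[][] {
  const first = readPage(undefined, limit);
  const result: string[][] = [first.items];
  for (let p = 1; p < pages; p++) {
    result.push(readPage(first.cursor, limit).items);
  }
  return result;
}

// Pattern that advances: each page starts from the previous page's cursor.
function pagesChainingCursors(limit: number): string[][] {
  const result: string[][] = [];
  let cursor: string | undefined = undefined;
  for (;;) {
    const page: Page = readPage(cursor, limit);
    if (page.items.length === 0) break;
    result.push(page.items);
    cursor = page.cursor; // carry the most recent cursor forward
  }
  return result;
}

const repeated = pagesReusingFirstCursor(5, 3);
const chained = pagesChainingCursors(5);
console.log(repeated, chained);
```

With 12 rows and a limit of 5, the chained version yields three pages (5, 5 and 2 rows), while the reusing version returns the second page again on every later call.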

I am aware that I am probably squeezing the lemon here, but I would love it if what I described above is actually how it is supposed to work, and only #count needs fixing.

I have attached some sample code:

Iterators.zip
