Paginate results from API /sentences when sort=random#3263

Open
jiru wants to merge 2 commits into dev from api-paginated-random

@jiru (Member) commented Mar 6, 2026

This PR makes it possible to paginate results from the API /sentences endpoint when sort=random is used.

  1. The initial call with sort=random computes a random seed value.
  2. Paginated links include the seed value as sort=random:<seed>, e.g. sort=random:12345.
  3. Subsequent calls read the seed value from the sort= parameter.

This ensures the same sentence won’t appear twice within a complete result set.
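The three steps above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names `parse_sort`, `resolve_seed`, and `next_page_sort` and the 32-bit seed range are assumptions made for the example.

```python
import secrets
from typing import Optional, Tuple


def parse_sort(value: str) -> Tuple[str, Optional[int]]:
    """Split a sort value such as 'random:12345' into ('random', 12345)."""
    name, sep, raw_seed = value.partition(":")
    if name != "random" or not sep:
        return name, None
    return name, int(raw_seed)


def resolve_seed(value: str) -> int:
    """Reuse the seed carried by the request, or draw one on the initial call."""
    name, seed = parse_sort(value)
    if name != "random":
        raise ValueError("not a random sort")
    # Assumed range for illustration: any unsigned 32-bit integer.
    return seed if seed is not None else secrets.randbelow(2**32)


def next_page_sort(seed: int) -> str:
    """Value to put in the sort= parameter of the paginated link."""
    return f"random:{seed}"
```

With this shape, `resolve_seed("random")` draws a fresh seed, while `resolve_seed("random:12345")` returns 12345, so every page of one browsing session shares the same ordering.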

Make the sort value row-dependent for random sort. This makes it possible
to use seek-based pagination for random sorts, too. It is not very
optimized, because the random value has to be computed for every row,
but it works and it is fast enough.
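A row-dependent sort value combined with seek-based pagination might look like the sketch below. It is an assumption-laden illustration: `sort_key`, `fetch_page`, and `walk` are hypothetical names, and SHA-256 stands in for whatever function the search backend actually uses to derive a value from the seed and the row.

```python
import hashlib


def sort_key(seed, sentence_id):
    """Pseudo-random value that depends only on the seed and the row's id."""
    digest = hashlib.sha256(f"{seed}:{sentence_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def fetch_page(ids, seed, after=None, limit=3):
    """Return up to `limit` ids ordered by (sort_key, id), strictly after the cursor."""
    # The key is computed for every row, which is the cost mentioned above.
    keyed = sorted((sort_key(seed, i), i) for i in ids)
    if after is not None:
        keyed = [row for row in keyed if row > after]
    page = keyed[:limit]
    cursor = page[-1] if page else None
    return [i for _, i in page], cursor


def walk(ids, seed, limit=3):
    """Follow the cursor from page to page and collect every id once."""
    seen, cursor = [], None
    while True:
        page, cursor = fetch_page(ids, seed, after=cursor, limit=limit)
        if not page:
            return seen
        seen.extend(page)
```

Because the cursor is the `(sort_key, id)` pair of the last row returned, each request only needs the seed and that pair to find where the next page starts.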

@kumakyoo42 commented

I don't understand all the details, but I wonder whether your algorithm also works when a new sentence is added (or removed) while switching from one page to the next.

@jiru (Member, Author) commented Mar 6, 2026

I didn’t try inserting a new sentence, but it shouldn’t be a problem.

The position of each sentence in the result set is directly derived from the sentence id, so if a sentence is added, it will get inserted somewhere inside the result set, without affecting the order of all the other sentences.

The main website tatoeba.org paginates using page numbers, but the API uses keyset pagination instead: API consumers can only go from one page to the next (and not the other way around), and the position of the next page is based on the position of the last sentence of the current page. A newly added sentence therefore only shifts the rows that come after it, without affecting any ongoing page browsing.
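The argument can be checked with a small simulation. This is a sketch under assumptions, not the API's implementation: `sort_key`, `page_after`, and `browse_with_insert` are hypothetical helpers, SHA-256 stands in for the real per-row random function, and the initial id list is assumed to be non-empty.

```python
import hashlib


def sort_key(seed, sentence_id):
    """Position of a sentence, derived only from the seed and its id."""
    digest = hashlib.sha256(f"{seed}:{sentence_id}".encode()).digest()
    return (int.from_bytes(digest[:8], "big"), sentence_id)


def page_after(ids, seed, cursor, limit):
    """Keyset page: the first `limit` rows strictly after the cursor."""
    ordered = sorted(sort_key(seed, i) for i in ids)
    return [row for row in ordered if cursor is None or row > cursor][:limit]


def browse_with_insert(ids, new_id, seed, limit):
    """Read page one, add `new_id`, then read the remaining pages."""
    ids = list(ids)
    first = page_after(ids, seed, None, limit)
    ids.append(new_id)  # a sentence is added between two requests
    seen, cursor = list(first), first[-1]
    while True:
        page = page_after(ids, seed, cursor, limit)
        if not page:
            return [i for _, i in seen]
        seen.extend(page)
        cursor = page[-1]
```

In this model the pre-existing sentences come back in the same relative order as without the insertion, and no id is returned twice. The new sentence shows up only if its position falls after the cursor of the page already read.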

The initial call with sort=random computes a seed value. Paginated links
include the seed value as sort=random:<seed>, e.g. sort=random:12345.
Subsequent calls read the seed value from the sort= parameter.

This ensures the same sentence won’t appear twice within the complete result.
@jiru jiru force-pushed the api-paginated-random branch from 5ee9dc1 to 79fe164 Compare March 6, 2026 16:15
@jiru (Member, Author) commented Mar 6, 2026

I have deployed this branch on https://api.dev.tatoeba.org/ if you want to test it out.

@kumakyoo42 commented

Unfortunately the test didn't pass… After adding a sentence the "random" order is completely changed. :-(

@jiru (Member, Author) commented Mar 7, 2026

I just tried and it worked for me. Here is my method:

# Get the IDs of the first 50 French sentences using random seed 657749637
curl -s "https://api.dev.tatoeba.org/v1/sentences?sort=random%3A657749637&lang=fra"| jq '.data[].id' > /tmp/ids

# Add a new sentence on dev.tatoeba.org ("Il faut mettre un peu d’ordre dans ce bazar.")

# Wait until the new sentence gets indexed; check whether it has been with this command:
curl -s "https://api.dev.tatoeba.org/v1/sentences?sort=created&lang=fra" -G --data-urlencode "q=Il faut mettre un peu d’ordre dans ce bazar." | jq .

# Get the list of sentence IDs again, using the same random seed
curl -s "https://api.dev.tatoeba.org/v1/sentences?sort=random%3A657749637&lang=fra"| jq '.data[].id' > /tmp/ids.new

# Compare the two lists
diff -u /tmp/ids /tmp/ids.new

@kumakyoo42 commented

I didn't use the API (I haven't looked at it yet), just dev.tatoeba.org. Maybe that is the reason?
