Skip to content

improve search performance by scanning with the timestamp index iteratively #2698

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lilydjwg
Copy link
Contributor

Or else SQLite uses the command index and sort. Even with the timestamp index, SQLite still needs sort.

This is what atuin uses:

select * from history indexed by idx_history_command where deleted_at is null group by command having max(timestamp) order by timestamp desc limit 100;
|--SCAN history USING INDEX idx_history_command
`--USE TEMP B-TREE FOR ORDER BY

Try to tell SQLite to use the timestamp index. Slightly faster.

select * from history indexed by idx_history_timestamp where deleted_at is null group by command having max(timestamp) order by timestamp desc limit 100;
|--SCAN history USING INDEX idx_history_timestamp
|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY

The fastest index to use is...not to use an index at all.

select * from history not indexed where deleted_at is null group by command having max(timestamp) order by timestamp desc limit 100;
QUERY PLAN
|--SCAN history
|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY

The following one is very fast, but it might not fetch enough rows due to duplications. So we do the deduplication ourselves instead and let SQLite just scan using the index and never sort. The downside of this method is that it doesn't perform well when there are too many duplicates, hence #2697.

select * from (select * from history where deleted_at is null order by timestamp desc limit 1000) group by command having max(timestamp);
QUERY PLAN
|--CO-ROUTINE (subquery-1)
|  `--SCAN history USING INDEX idx_history_timestamp |--SCAN (subquery-1)
`--USE TEMP B-TREE FOR GROUP BY

For the following command, the elapsed times are (lowest among several runs):

atuin search --search-mode fuzzy --limit 100 --cmd-only > /dev/null
command index   : 0.154s
timestamp index : 0.139s
no index        : 0.124s
paging          : 0.012s

See also #475, #493, #2015.


BTW, the idx_history_command index is unnecessary since there is idx_history_command_timestamp.

The SQL atuin used is something like:

SELECT * FROM history WHERE (deleted_at IS NULL) GROUP BY command HAVING max(timestamp) ORDER BY timestamp DESC LIMIT 100;

The HAVING max(timestamp) part doesn't seem right. SELECT says:

If a HAVING clause is specified, it is evaluated once for each group of rows as a boolean expression. If the result of evaluating the HAVING clause is false, the group is discarded. If the HAVING clause is an aggregate expression, it is evaluated across all rows in the group.

So HAVING max(timestamp) is true all the time, and we are not sure which row is outputted. In the following section it says

Special processing occurs when the aggregate function is either min() or max().

But it refers to the columns being selected, not in the HAVING clause.

Checks

  • I am happy for maintainers to push small adjustments to this PR, to speed up the review cycle
  • I have checked that there are no existing pull requests for the same thing

…tively

Or else SQLite uses the command index and sort. Even with the timestamp
index, SQLite still needs sort.

This is what atuin uses:

select * from history indexed by idx_history_command where deleted_at is null group by command having max(timestamp) order by timestamp desc limit 100;
|--SCAN history USING INDEX idx_history_command
`--USE TEMP B-TREE FOR ORDER BY

Try to tell SQLite to use the timestamp index. Slightly faster.

select * from history indexed by idx_history_timestamp where deleted_at is null group by command having max(timestamp) order by timestamp desc limit 100;
|--SCAN history USING INDEX idx_history_timestamp
|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY

The fastest index to use is...not to use an index at all.

select * from history not indexed where deleted_at is null group by command having max(timestamp) order by timestamp desc limit 100;
QUERY PLAN
|--SCAN history
|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY

The last one is very fast, but it might not fetch enough rows due to
duplications. So we do the deduplication ourselves instead and let
SQLite just scan using the index and never sort.

select * from (select * from history where deleted_at is null order by timestamp desc limit 1000) group by command having max(timestamp);
QUERY PLAN
|--CO-ROUTINE (subquery-1)
|  `--SCAN history USING INDEX idx_history_timestamp
|--SCAN (subquery-1)
`--USE TEMP B-TREE FOR GROUP BY

For the following command, the elpased times are (lowest among several runs):

atuin search --search-mode fuzzy --limit 100 --cmd-only > /dev/null

command index   : 0.154s
timestamp index : 0.139s
no index        : 0.124s
paging          : 0.012s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant