Commit 50f11f4

Update ROADMAP.md for clarity and organization
1 parent 4841c77 commit 50f11f4

1 file changed: +10 -23 lines changed

ROADMAP.md

Lines changed: 10 additions & 23 deletions
@@ -1,15 +1,9 @@
 # Possible Future Extensions and Improvements:
 
-I'm not sure how much I will be able to work on this in the future, so nothing is guaranteed. Also, I believe in YAGNI and won't implement a lot of this unless I hear from anyone who wants to actually use `genedex`. I you want to use the library, but are missing a specific feature, I'd be happy to hear from you and will give the missing feature a high priority.
+I'm not sure how much I will be able to work on this in the future, so nothing is guaranteed. I won't implement a lot of this unless I hear from anyone who wants to actually use `genedex`. If you want to use the library, but are missing a specific feature, I'd be happy to hear from you and will give the missing feature a high priority.
 
-### Optimizations for existing features
+### Optimizations for Existing Features
 
-- space optimization for rarely occurring symbols (such as the sentinel and N in the human Genome)
-- maybe leverage the fact that such characters (namely N) often occur in runs
-- the sentinel can not be searched. the current sampled suffix array implementation has special handling for it.
-so it technically doesn't have to be stored in the text with rank support. If N also gets special handling,
-the condensed text with rank support will get smaller and maybe faster.
-A text sampled suffix array could be an option, or a "sparse" text with rank support substructure.
 - paired blocks for improved memory usage when using larger alphabets
 - in the search, `lookup_tables::compute_lookup_idx_static_len` still seems to be one of the bottlenecks. this
 should be investigated further, maybe it can be optimized or it's a measuring error.
@@ -19,28 +13,21 @@ I'm not sure how much I will be able to work on this in the future, so nothing i
 - a faster `u32`-SACA to make the low memory mode less painful (`sais-drum` is a start, but optimizing it would be a lot of work)
 - suffix array, lookup table compression using unconventional int widths (e.g. 33 bit)
 
-### Smaller new features
+### Small New Features/Tweaks
 
-- more flexible alphabet API
-- allow alphabet with sentinel included in io representation
-- allow alphabet without sentinel (only usable for single text indexing)
-- optimized version for single text without sentinel
-- functionality to directly retrieve maximal exact matches (MEMs/SMEMs)
-- more documentation tests
 - gate rayon/OpenMP usage behind feature flag
 - API to use batched search with cursors
-- type-erase index storage type and choose automatically for text size (does that work with savefile?)
-- bidirectional FM-Index
-- optimizations for highly repetitive texts such as run length encoding (r-index)
-- optional functionality for text recovery
-- text sampled suffix array (with text ids and optionally other annotations)
+- type-erase index storage type and choose automatically for text size
 - optimized functions for reading directly from input files: both for texts to build the index and queries to search.
 the latter might be more important, because for simple searches, the search can be faster than reading the
 queries from disk.
+- more documentation tests
 
-### Larger new features, might never happen
+### Large New Features
 
+- functionality to directly retrieve maximal exact matches (MEMs/SMEMs), FMD-Index
+- bidirectional FM-Index
 - searches with errors and "degenerate" chars in IUPAC fasta definition (using search schemes, needs bidirectional FM-Index)
-- FMD-Index
-- word-based FM-Indices
+- optimizations for highly repetitive texts such as run length encoding (r-index). This would be simpler, but much less useful than a ropeBWT-based FM-Index
 - ropeBWT/dynamic FM-Index
+- word-based FM-Index
