Commit 2c9b169

Update ROADMAP.md
1 parent 484832d commit 2c9b169

1 file changed

+16
-23
lines changed

ROADMAP.md

Lines changed: 16 additions & 23 deletions
@@ -1,16 +1,8 @@
-# Possible Future Extensions and Improvements (roughly in order of priority):
+# Possible Future Extensions and Improvements:
 
-I'm not sure how much I will be able to work on this in the future, so nothing is guaranteed.
+I'm not sure how much I will be able to work on this in the future, so nothing is guaranteed. Also, I believe in YAGNI and won't implement a lot of this unless I hear from anyone who wants to actually use `genedex`. If you want to use the library, but are missing a specific feature, I'd be happy to hear from you and will give the missing feature a high priority.
 
-### High priority: index for single texts
-
-- more flexible alphabet API
-- allow alphabet with sentinel included in io representation
-- allow alphabet without sentinel (only usable for single text indexing)
-- optimized version for single text without sentinel
-- optimized construction directly from (fasta) file reader
-
-### Nice to have, higher priority
+### Optimizations for existing features
 
 - space optimization for rarely occurring symbols (such as the sentinel and N in the human Genome)
 - maybe leverage the fact that such characters (namely N) often occur in runs
@@ -24,30 +16,31 @@ I'm not sure how much I will be able to work on this in the future, so nothing i
 - the batching of search queries could be improved. Currently, it is not efficient if the queries have very different lengths
 or if many of them quickly get an empty interval, while others need to be searched to the very end.
 - the batched rank function could also be optimized using const currying and other techniques
-- functionality to directly retrieve maximal exact matches
-- more documentation tests
-
-### Large topics, is the goal to eventually support
-
-- bidirectional FM-Index
-- searches with errors and "degenerate" chars in IUPAC fasta definition (using search schemes)
+- a faster `u32`-SACA to make the low memory mode less painful (`sais-drum` is a start, but optimizing it would be a lot of work)
+- suffix array, lookup table compression using unconventional int widths (e.g. 33 bit)
 
-### Nice to have, but low priority
+### Smaller new features
 
-- a faster `u32`-SACA to make the low memory mode less painful (would be a ton of work)
+- more flexible alphabet API
+- allow alphabet with sentinel included in io representation
+- allow alphabet without sentinel (only usable for single text indexing)
+- optimized version for single text without sentinel
+- functionality to directly retrieve maximal exact matches (MEMs/SMEMs)
+- more documentation tests
 - gate rayon/OpenMP usage behind feature flag
 - API to use batched search with cursors
 - type-erase index storage type and choose automatically for text size (does that work with savefile?)
+- bidirectional FM-Index
+- optimizations for highly repetitive texts such as run length encoding (r-index)
 - optional functionality for text recovery
 - text sampled suffix array (with text ids and optionally other annotations)
-- suffix array, lookup table compression using unconventional int widths (e.g. 33 bit)
 - optimized functions for reading directly from input files: both for texts to build the index and queries to search.
 the latter might be more important, because for simple searches, the search can be faster than reading the
 queries from disk.
 
-### Large topics, might never happen
+### Larger new features, might never happen
 
+- searches with errors and "degenerate" chars in IUPAC fasta definition (using search schemes, needs bidirectional FM-Index)
 - FMD-Index
 - word-based FM-Indices
-- optimizations for highly repetitive texts such as run length encoding (r-index)
 - ropeBWT/dynamic FM-Index
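
The roadmap item on "unconventional int widths (e.g. 33 bit)" can be illustrated with a small sketch. A text slightly longer than 2^32 symbols needs 33 bits per suffix array entry, while a plain `u64` array spends 64 bits on each. The `PackedInts` type below is hypothetical and not part of `genedex`; it is only one possible way to pack fixed-width values into `u64` words, assuming a width between 1 and 64 bits.

```rust
/// Hypothetical fixed-width packed integer vector (illustration only, not a genedex type).
pub struct PackedInts {
    words: Vec<u64>,
    width: u32, // bits per stored value, 1..=64
    len: usize, // number of stored values
}

impl PackedInts {
    /// Creates a vector of `len` zeroed values, each `width` bits wide.
    pub fn new(width: u32, len: usize) -> Self {
        assert!((1..=64).contains(&width));
        let total_bits = len * (width as usize);
        // One spare word so that accesses touching `word + 1` stay in bounds.
        let words = vec![0u64; (total_bits + 63) / 64 + 1];
        Self { words, width, len }
    }

    /// Bit mask with the lowest `width` bits set.
    fn mask(&self) -> u64 {
        if self.width == 64 {
            u64::MAX
        } else {
            (1u64 << self.width) - 1
        }
    }

    /// Loads the two adjacent words starting at `word` as one 128-bit value.
    fn load_pair(&self, word: usize) -> u128 {
        (self.words[word] as u128) | ((self.words[word + 1] as u128) << 64)
    }

    /// Stores `value` at index `i`; panics if the value does not fit in `width` bits.
    pub fn set(&mut self, i: usize, value: u64) {
        assert!(i < self.len && value <= self.mask());
        let bit = i * (self.width as usize);
        let word = bit / 64;
        let offset = (bit % 64) as u32;
        let mut pair = self.load_pair(word);
        // Clear the old bits of this slot, then write the new value.
        pair &= !((self.mask() as u128) << offset);
        pair |= (value as u128) << offset;
        self.words[word] = pair as u64;
        self.words[word + 1] = (pair >> 64) as u64;
    }

    /// Reads the value at index `i`.
    pub fn get(&self, i: usize) -> u64 {
        assert!(i < self.len);
        let bit = i * (self.width as usize);
        let word = bit / 64;
        let offset = (bit % 64) as u32;
        ((self.load_pair(word) >> offset) as u64) & self.mask()
    }
}
```

With `PackedInts::new(33, n)`, each entry occupies 33 bits instead of 64, at the cost of a shift and mask on every access. Whether that trade-off pays off for suffix array or lookup table storage would need to be measured.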
