Implement Matryoshka #11

RomeoV · 2025-05-19T17:45:28Z

Comes with:

- OMP reimplementation
- Default to Arnoldi solver
- `ksvd_loop` implementation
- `fasterror` and `fasterror!`

src/ksvd_types.jl

src/atomreplacement.jl

src/matching_pursuit.jl

src/ksvd_loop.jl

src/KSVD.jl

src/ksvd_loop.jl

Project.toml

We had previously removed minibatching because it is tricky to get working with Matryoshka, but we managed to bring it back. To this end, we also introduced `maybeview`, a function that constructs a view if the column indexing is contiguous and copies otherwise. The new minibatching still updates `D` using only a subset of the columns of `Y`, but returns `X` for all columns of `Y`, since it is good to update the sparsity structure for every column.
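A minimal sketch of the view-or-copy idea follows. The name `maybeview_sketch` and its signature are assumptions made for illustration; the package's actual `maybeview` may be implemented differently.

```julia
# Hypothetical sketch, not the package's implementation.
# Returns a view when the column indices are consecutive and ascending,
# and an independent copy otherwise.
function maybeview_sketch(Y::AbstractMatrix, cols::AbstractVector{<:Integer})
    is_contiguous = !isempty(cols) && all(==(1), diff(cols))
    return is_contiguous ? view(Y, :, first(cols):last(cols)) : Y[:, cols]
end

Y = reshape(collect(1.0:12.0), 3, 4)
a = maybeview_sketch(Y, 2:3)      # view: shares memory with Y
b = maybeview_sketch(Y, [2, 3])   # consecutive indices, so also a view
c = maybeview_sketch(Y, [1, 3])   # gap in the indices, so a copy
```

The return type depends on the indices at runtime, which trades type stability for avoiding a copy in the common contiguous-minibatch case.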

RomeoV

Some feedback...

RomeoV added 19 commits April 30, 2025 00:00

Use sqrt(svd tol) for test.

cf7827d

Implement fasterror!

9a11799

Add fasterror shortcut function

ff83f31

fixup! Implement fasterror!

af47145

Format buffer.

42e9c3f

Implement Matryoshka Loop :)

f224aa8

fixup! Implement Matryoshka Loop :)

b05a18c

Change Matryoshka implementation

cf703ac

Fix NaN problem with zero inputs.

4cdf5fe

Fix faulty warning when all x are used.

3aca8e9

Stop dividing by zero if y is zero

10f6108

Un-implement minibatching

bcfa78b

Minibatching is removed because it is not very compatible with the new Matryoshka implementation. It may be put back later.

Implement equal budget per sub-sae for Matryoshka

c2fc10b

For more inspiration on this, refer to Figure 12 in Appendix D of the Matryoshka paper, and take the median in the figure.

Default to ArnoldiSVDSolver

4702fe3

Implement sparse_coding_matryoshka.

e1374c0

Used to call from Python for Matryoshka dictionaries.

Allow passing DtD and DtY

2b9338c

Implement other sparse coding interface.

6712c47

Properly implement OMP.

1009181

About 10x slower than matching pursuit (MP) for 2k dimensions and a dictionary of 4k atoms, but it recovers the true dictionary.
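For context on where the extra cost comes from, here is a textbook-style sketch of orthogonal matching pursuit. It is not the package's optimized implementation, and the name `omp_sketch` is hypothetical. Unlike plain MP, every iteration re-solves a least-squares problem over all atoms selected so far.

```julia
using LinearAlgebra

# Hypothetical, unoptimized OMP sketch: select up to k atoms of D to approximate y.
function omp_sketch(D::AbstractMatrix{Float64}, y::AbstractVector{Float64}, k::Integer)
    residual = copy(y)
    support = Int[]
    coeffs = Float64[]
    for _ in 1:k
        norm(residual) < 1e-12 && break          # y is already fully explained
        correlations = abs.(D' * residual)
        correlations[support] .= 0               # never pick the same atom twice
        push!(support, argmax(correlations))
        coeffs = D[:, support] \ y               # least squares on the whole support
        residual = y - D[:, support] * coeffs
    end
    x = zeros(size(D, 2))
    x[support] = coeffs
    return x
end

D = Matrix(1.0I, 4, 4)          # trivial orthonormal dictionary for illustration
y = [0.0, 3.0, 0.0, -2.0]
x = omp_sketch(D, y, 2)
```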

Try implementing some optimizations

a52fb9a

Brings the time down from 120 ms to 72 ms for the test workload.

RomeoV commented May 19, 2025

Project.toml (outdated, resolved)

RomeoV added 9 commits May 19, 2025 11:41

Relax many compat entries.

5c0776d

Nit: formatting.

8d62932

Implement pretty-printing for types with buffers

b6e80b4

Because these types hold big matrices, printing them floods the terminal. This happens, for example, in the tests. Now it should be fine.
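One common way to get this behavior in Julia is to overload `Base.show` so that only a summary is printed. The sketch below uses a made-up type `BufferedThing`; the actual types and fields in this PR are not shown here and may differ.

```julia
# Hypothetical stand-in for a type that carries large matrices and work buffers.
struct BufferedThing{T}
    D::Matrix{T}
    buffer::Matrix{T}
end

# Compact form, used by `print`, `repr`, and when nested inside containers.
Base.show(io::IO, s::BufferedThing) =
    print(io, "BufferedThing(", join(size(s.D), "×"), ")")

# Multi-purpose REPL display: print sizes instead of matrix contents.
function Base.show(io::IO, ::MIME"text/plain", s::BufferedThing)
    print(io, "BufferedThing{", eltype(s.D), "} with D of size ", size(s.D),
          " and buffer of size ", size(s.buffer))
end

s = BufferedThing(zeros(2, 3), zeros(2, 5))
short = repr(s)
long = repr(MIME("text/plain"), s)
```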

Let legacy matching pursuit take DtD and DtY for API completeness.

39e9ca0

nit: fix matryoshka budget for small k

33184b1

Matryoshka sometimes fails budget assignment for problems that are too small. Use `ceil` so that we always have at least one nonzero (nnz). This may lead to a total nnz count that is slightly over budget, though.
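The effect of the rounding choice can be illustrated with a small sketch. It assumes the total budget is split equally across nested groups, as the earlier "equal budget per sub-sae" commit suggests; the helper `split_budget` and its signature are hypothetical.

```julia
# Hypothetical helper: split a total nonzero budget k equally across ngroups groups.
# `rounding` selects how the fractional share is turned into an integer.
split_budget(k::Integer, ngroups::Integer; rounding=ceil) =
    fill(rounding(Int, k / ngroups), ngroups)

with_floor = split_budget(2, 3; rounding=floor)  # some groups would get no nonzeros
with_ceil = split_budget(2, 3)                   # every group gets at least one
```

With `k = 2` and three groups, `ceil` gives each group one nonzero, so the total is 3 rather than 2. That is the slight budget overshoot described above.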

fixup! Bring back minibatch

fa3b952

When the error is precomputed, we had an issue: we preallocated an error buffer for the entire data matrix but then tried to copy in only the minibatch slice. Now we preallocate an error buffer sized to the minibatch only.
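A sketch of the sizing fix, using made-up dimensions and dense matrices for simplicity (the package's actual buffers and types may differ): the buffer has as many columns as the minibatch, so the slice fits exactly.

```julia
using LinearAlgebra

n, m, N, batch = 4, 6, 10, 3          # illustrative sizes, not from the PR
Y = randn(n, N)                       # data
D = randn(n, m)                       # dictionary
X = randn(m, N)                       # coefficients (dense here for simplicity)

E = Matrix{Float64}(undef, n, batch)  # sized for the minibatch, not for all of Y
cols = 2:4                            # one minibatch of `batch` columns

E .= view(Y, :, cols)                       # copy the slice into the buffer
mul!(E, D, view(X, :, cols), -1.0, 1.0)     # in place: E = Y[:, cols] - D * X[:, cols]
```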

Export some more symbols.

b1736e8

Implement feature grid tests with most supported features.

29fc1f1

RomeoV added 15 commits May 19, 2025 15:28

Add utility for initializing random X.

9b3807e

This is used a lot in testing, and I've been rewriting it from scratch every time. Time to make a proper function!
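As an illustration of what such a utility might look like, the sketch below builds a sparse coefficient matrix with exactly `k` nonzeros per column. The name `random_sparse_X` and its arguments are assumptions; the utility added in this commit may have a different name and interface.

```julia
using SparseArrays, Random

# Hypothetical sketch: m×N sparse matrix with k normally distributed
# nonzeros at distinct random rows in each column (requires k <= m).
function random_sparse_X(rng::AbstractRNG, m::Integer, N::Integer, k::Integer)
    I = Int[]; J = Int[]; V = Float64[]
    for j in 1:N
        rows = randperm(rng, m)[1:k]   # k distinct row indices for column j
        append!(I, rows)
        append!(J, fill(j, k))
        append!(V, randn(rng, k))
    end
    return sparse(I, J, V, m, N)
end

X = random_sparse_X(MersenneTwister(0), 20, 5, 3)
```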

Implement and test permute_D_X! functionality

b8f5b1c

Added the `Hungarian.jl` dependency, which itself has no further dependencies.

Add faster indexing row adjoints of sparse matrices

dbfe184

See JuliaSparse/SparseArrays.jl#628

Speed up permute_D_X! with Xref

df13421

Let sparse_coding take kwargs for convenience.

c5fe23e

Implement optional refit_coeffs for sparse coding.

b16433d

Now, after fitting all the coefficients, we "refit" them once more by solving $A x = b$ restricted to the nonzero elements of $x$ and the corresponding columns of $A$.
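The refit step can be sketched as a least-squares solve on the support. Here `D` plays the role of $A$ and `y` the role of $b$; the function name `refit_coeffs_sketch` is hypothetical and the package's `refit_coeffs` option may work differently.

```julia
using LinearAlgebra

# Hypothetical sketch: keep the sparsity pattern of x, but re-solve for the values.
function refit_coeffs_sketch(D::AbstractMatrix, y::AbstractVector, x::AbstractVector)
    support = findall(!iszero, x)
    x_new = zero(x)
    isempty(support) && return x_new
    x_new[support] = D[:, support] \ y   # least squares using only the selected columns
    return x_new
end

D = [1.0 0.0 1.0;
     0.0 1.0 1.0;
     0.0 0.0 1.0]
x_true = [2.0, 0.0, -1.0]
y = D * x_true
x_rough = [1.5, 0.0, -0.5]               # right support, inaccurate values
x_refit = refit_coeffs_sketch(D, y, x_rough)
```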

fixup! Add utility for initializing random X.

1cfde14

fixup! Implement optional refit_coeffs for sparse coding.

19777f6

nit: remove unused variables.

4a1515c

fixup! Implement optional refit_coeffs for sparse coding.

6fba0b7

Allow passing DtD to high level sparse_coding interface.

cdfc7a9

Completely overhaul sparse_coding test.

fac0d03

Remove unnecessary copy

1600aef

Remove fasterror

b017845

I benchmarked once more, and it's typically not faster than just `Y - D*X` for our problem sizes.

Add try-catch for Arnoldi method.

8984601

RomeoV commented May 20, 2025

RomeoV added 10 commits May 20, 2025 00:20

Provide docstring for MatryoshkaLoop

572f9c2

Fixup docstring for NormalLoop

2aaf05f

Use init_sparse_assignment_mat in some tests.

e340160

nit: remove some visual noise.

eab1b6f

fixup! Use init_sparse_assignment_mat in some tests.

6223e36

test: Fixup some test nits

04eb920

Nit: warn when using permute_D_X! with X reference.

9a37a14

Export some more symbols.

0593240

Merge branch 'master' into matryoshka

156dcf9

Loosen ksvd end-to-end atol a bit more.

112e6fd

RomeoV merged commit a4323d9 into master May 20, 2025
2 of 3 checks passed
