Skip to content

feat: implement Fast-SNP, speed up loopless FVA #1444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: devel
Choose a base branch
from

Conversation

isaf27
Copy link

@isaf27 isaf27 commented Jun 2, 2025

TL;DR

This PR implements Fast-SNP algorithm for adding loopless constraints. It consists of two key optimizations:

  • Consider for loopless constraints only the reactions that can be part of a cycle (loop).
  • Build sparse $N_{\text{int}}$ matrix using Fast-SNP algorithm (https://doi.org/10.1093/bioinformatics/btw555), which takes reaction flux sign (positive/negative) into account.

As a result a 10-100x (or more) speed-up for loopless FVA is achieved.

This PR is somewhat similar to #841, but differs in implementation.

Code modifications

One of the key optimizations is that add_loopless and flux_variability_analysis functions use only reactions that can be a part of a cycle (we call them cyclic). This greatly improves performance due to the number of cyclic reactions is usually much smaller than the number of internal reactions.

Function find_cyclic_reactions

We add a new function find_cyclic_reactions which finds all reactions in the model that can be a part of a cycle.
Also for each such reaction we identify if it can have positive and negative flux in a cycle. This is tested by restricting boundary reaction fluxes to zero and optimizing flux through each of the internal reactions.

There are two methods (method parameter) that can be called:

  • basic: a straightforward procedure that iterates over all of the reactions;
  • optimized: an optimized version inspired by Fast-SNP random weights idea, which discovers many cyclic reactions in one go.

The second method uses less LP programs and usually works at least 2x faster. It is a custom procedure, so we don't have a citation for the algorithm.

Function add_loopless

We introduced new method parameter. If method="fastSNP" we:

  1. Find all cyclic reactions using find_cyclic_reactions.
  2. Calculate $N_{\text{int}}$ for the cyclic reactions using Fast-SNP nullspace basis construction algorithm, taking model flux bounds-based signs into account. As the result, the constraints are constructed using much smaller and sparse $N_{\text{int}}$ matrix which greatly reduces number of binary variables in the corresponding MILP prolem and improves the speed of FVA.

If method="original" then the current SVD-based algorithm is ran on all of the internal reactions.

Function flux_variability_analysis

We made some optimizations for this function if loopless=True.

  1. We always find all cyclic reactions and their possible directions using find_cyclic_reactions function. Reactions which are not cyclic are optimized using the basic FVA mode—their extreme values are guaranteed to be achievable without using cycle fluxes. This greatly improves the performance of the function even when using the CycleFreeFlux method.
  2. We added a new add_loopless regime to this function, which adds loopless constraints (re-using the calculated cyclic reactions and add_loopless function) to solve loopless FVA for the cyclic reactions. In this regime the exact FVA bounds are calculated (unlike for the CycleFreeFlux method).

Performance

FVA quick example

For the quick demonstration of the performance improvement we assembled the following script, which solves FVA for a single reaction with the original and new Fast-SNP loopless constraints.

import time

from cobra.io import load_model
from cobra.flux_analysis.loopless import add_loopless


for model_name, reaction in [
    ('iIT341', 'ACKr'),
    ('iAF692', 'MDHy'),
    ('iNJ661', 'NADTRHD_copy1'),
]:
    print(f'Working with model {model_name}...')

    for method in ['original', 'fastSNP']:
        model = load_model(model_name)

        print(f'Finding loopless FVA bounds for reaction {reaction} using {method} method...')

        start_time = time.time()
        add_loopless(model, method=method)
        print(f'Loopless constraints added in {time.time() - start_time:.2f} seconds.')
        
        bounds = {}
        start_time = time.time()
        for direction in ['min', 'max']:
            rxn = model.reactions.get_by_id(reaction)
            model.solver.objective.set_linear_coefficients(
                {rxn.forward_variable: 1, rxn.reverse_variable: -1}
            )
            model.solver.objective.direction = direction
            
            bounds[direction] = model.slim_optimize()
        
        print(f'FVA bounds calculated in {time.time() - start_time:.2f} seconds.')
        print(f'Loopless FVA bounds for {reaction} ({method}): {bounds}')

Here are the example run times for the tested model/reaction pairs (CPLEX solver was used for MILP):

  • Model iIT341, reaction ACKr: $0.86 \to 0.09$, $9.6\times$ speed-up.
  • Model iAF692, reaction MDHy: $1.35 \to 0.11$, $12.3\times$ speed-up.
  • Model iNJ661, reaction NADTRHD_copy1: $120.61 \to 0.93$, $129\times$ speed-up.

The fastest complete loopless FVA

The most efficient way to calculate loopless FVA for all of the reactions is to directly call flux_variability_analysis function, without using explicit add_loopless:

model = load_model(model_name)
flux_variability_analysis(model, loopless=True, params_add_loopless={})

Discussion points

There are several points regarding the API we wanted to highlight for discussion:

  1. As was mentioned flux_variability_analysis(model, loopless=True, params_add_loopless={}) is the most efficient way to calculate loopless FVA. The function flux_variability_analysis identifies which method (add_loopless-based or CycleFreeFlux) will be used based on the params_add_loopless presence. This form enables backward-compatibility, but is a bit clunky. A better way could be to make loopless parameter deprecated and replace it with a new string-based method parameter to control the behavior, with the corresponding backward compatibility-handling procedure that sets method="CycleFreeFlux" if loopless=True is present.
  2. In add_loopless function the original $N_{\text{int}}$ construction algorithm can also benefit from being run only for the cyclic reactions. We could create another option use_cyclic=True to enable that, however we decided not to implement it. Our thinking is that method=original is provided for the reproducibility of the previous behavior, but ultimately shouldn't be used: the Fast-SNP algorithm should always provide a better performance.
  3. Conversely, flux_variability_analysis(loopless=True) can still be ran when the full loopless FVA is not feasible. Thus we decided to implement running CycleFreeFlux only for the cyclic reactions, which significantly improves the performance. This changes the behavior, but should produce the same results as before.

We would be happy to hear your feedback and are ready to implement any additional changes to API/docs/tests/etc that are needed for this PR to be merged.

@cdiener
Copy link
Member

cdiener commented Jun 3, 2025

Thanks so much. This looks great! Sorry for the issues with the CI. I will try to fix them this week, but you might have to rebase on the devel branch after that.

In general I think it will be a great contribution. I will try to review as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Nullspace calculation similar to COBRA (MATLAB)
2 participants