
[dupcol_action::presolve Performance Optimization Help] Efficiency of the dupcol_action::presolve method on large-scale MILP problems #221

@wozhendeshuai

Description


@jjhforrest
I ran into a performance bottleneck while trying to solve a large MILP with tens of millions of variables. The solver I am using is CBC, and the preprocessing phase alone took up to 6 hours.
After profiling, I found that most of that time is spent processing duplicate columns: the dupcol_action::presolve method alone runs for more than 3 hours and is the main bottleneck of the whole preprocessing stage. I tried enabling multi-threaded processing, but it did not produce any noticeable speed-up in practice.

In summary:

  • MILP model size: tens of millions of variables
  • Preprocessing time: concentrated in the dupcol_action::presolve method, more than 3 hours
  • Attempted optimization: enabling multithreading, with no significant speed-up observed

My current understanding of the dupcol_action::presolve method is as follows:
This method finds and processes duplicate columns, i.e. columns with identical coefficients, in the model. When two or more columns have exactly the same non-zero elements in exactly the same rows, the preprocessor merges them into one column and updates the corresponding lower bound, upper bound, column elements, and (where present) column solution values. Specifically:

  • During merging, the lower and upper bounds are added together to form the new bound range; if a resulting bound exceeds a reasonable range, it is set to infinity to mark it as unbounded.
  • The status of the merged column is determined from the status information of the original columns (for example, when one of them is a basic variable).
  • In some cases, if two columns have different sparsity patterns but contribute identically to the rows (up to a tolerance), one column can be deleted and the remaining column adjusted so the model stays equivalent.
  • The code also handles cases where column merging makes inequality constraints contradictory or overlapping, and the resulting infeasibility checks.
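To make sure I am reading the code the right way, here is a minimal, self-contained sketch of the detection step as I understand it: columns are grouped by a hash of their sparsity pattern, so only columns that fall into the same bucket need an element-by-element comparison. This is my own illustration, not the actual CoinUtils implementation; all names (`SparseColumns`, `findDuplicateColumns`, etc.) are invented for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <unordered_map>
#include <vector>

// Column-major sparse matrix (column starts / row indices / values),
// analogous to what a presolve routine would walk over. Nonzeros are
// assumed to be stored in the same row order within each column.
struct SparseColumns {
  std::vector<int> colStart;   // size nCols + 1
  std::vector<int> rowIndex;   // row index of each nonzero
  std::vector<double> value;   // coefficient of each nonzero
};

// Hash a column's full sparsity pattern (rows and coefficients) so that
// candidate duplicates land in the same bucket.
static std::size_t hashColumn(const SparseColumns& A, int j) {
  std::size_t h = 1469598103934665603ull;  // FNV-style seed
  for (int k = A.colStart[j]; k < A.colStart[j + 1]; ++k) {
    h ^= std::hash<int>{}(A.rowIndex[k]);
    h *= 1099511628211ull;
    h ^= std::hash<double>{}(A.value[k]);
    h *= 1099511628211ull;
  }
  return h;
}

// Exact element-wise comparison, used only within a hash bucket.
static bool columnsEqual(const SparseColumns& A, int j1, int j2) {
  const int len1 = A.colStart[j1 + 1] - A.colStart[j1];
  const int len2 = A.colStart[j2 + 1] - A.colStart[j2];
  if (len1 != len2) return false;
  for (int k = 0; k < len1; ++k) {
    if (A.rowIndex[A.colStart[j1] + k] != A.rowIndex[A.colStart[j2] + k]) return false;
    if (A.value[A.colStart[j1] + k] != A.value[A.colStart[j2] + k]) return false;
  }
  return true;
}

// Return groups of column indices whose coefficient columns are identical.
std::vector<std::vector<int>> findDuplicateColumns(const SparseColumns& A) {
  const int nCols = static_cast<int>(A.colStart.size()) - 1;
  std::unordered_map<std::size_t, std::vector<int>> buckets;
  for (int j = 0; j < nCols; ++j) buckets[hashColumn(A, j)].push_back(j);

  std::vector<std::vector<int>> groups;
  for (auto& kv : buckets) {
    std::vector<int>& cols = kv.second;
    std::vector<bool> used(cols.size(), false);
    for (std::size_t a = 0; a < cols.size(); ++a) {
      if (used[a]) continue;
      std::vector<int> group{cols[a]};
      for (std::size_t b = a + 1; b < cols.size(); ++b)
        if (!used[b] && columnsEqual(A, cols[a], cols[b])) {
          group.push_back(cols[b]);
          used[b] = true;
        }
      if (group.size() > 1) groups.push_back(std::move(group));
    }
  }
  return groups;
}

int main() {
  // 3 rows, 4 columns; columns 1 and 3 are identical duplicates.
  SparseColumns A;
  A.colStart = {0, 2, 4, 5, 7};
  A.rowIndex = {0, 2, 0, 1, 1, 0, 1};
  A.value    = {1.0, 3.0, 2.0, 5.0, 4.0, 2.0, 5.0};
  for (const auto& g : findDuplicateColumns(A)) {
    std::printf("duplicate group:");
    for (int j : g) std::printf(" %d", j);
    std::printf("\n");
  }
  return 0;
}
```

In this sketch the hashing pass is linear in the number of nonzeros, so the expensive part is only the pairwise comparison inside each bucket.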

If any part of my understanding above is inaccurate, I hope you can help me correct it.

I would appreciate any professional guidance on whether there are specific optimization strategies for problems of this scale. If you have suggestions on tuning, parameter settings, or algorithmic improvements suited to large-scale scenarios, please let me know.

If you need more details about the problem instance or the computing environment, I am happy to provide them.

Looking forward to your reply and support, thanks!
