
Conversation

@usarica (Contributor) commented Apr 9, 2021:

The robust fit fails to recognize convergence when the crossing point is on the order of the crossing tolerance. In such cases the POI's NLL comes close enough to the specified level, but NaN is still returned. This PR fixes that issue. It also introduces the runtime define `FITTER_CROSSING_PERSISTENT_TRIALS`: if this value is >0, the crossing minimizations continue until that many trials have been conducted.
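A minimal sketch of the retry behavior this define is described as enabling (illustrative Python, not the actual C++ in combine; the function and variable names here are hypothetical):

```python
# Illustrative sketch only, not the combine implementation: retry the
# crossing minimisation until it returns a finite value or the trial
# budget (the FITTER_CROSSING_PERSISTENT_TRIALS-style define) runs out.
import math

def find_crossing_persistently(minimise_once, max_trials):
    """minimise_once() returns the crossing point, or NaN on failure."""
    result = float("nan")
    trials = 0
    while math.isnan(result) and trials < max_trials:
        result = minimise_once()
        trials += 1
    return result

# Hypothetical minimiser that only succeeds on its third attempt.
attempts = iter([float("nan"), float("nan"), 0.999687])
crossing = find_crossing_persistently(lambda: next(attempts), max_trials=5)
```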

@usarica (Contributor, Author) commented Apr 9, 2021:

The last commit also fixes a residual issue with steep log-likelihood functions: it replaces the guess made between the present and the previous locations with a full quadratic interpolation through the three closest points. The precision requirement is kept the same.
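A toy sketch of the three-point idea (illustrative Python, not the actual combine code; the function names and the toy "NLL" below are made up for the example):

```python
# Toy sketch: approximate the scan function f(x) = deltaNLL(x) - level by
# a parabola through the three points closest to the crossing, and take
# the root of that parabola inside the bracket as the next guess.
import math

def quadratic_crossing(xs, fs):
    """Return the root of the parabola through (xs[i], fs[i]) that lies
    inside [min(xs), max(xs)], or None if no real root falls there."""
    (x0, x1, x2), (f0, f1, f2) = xs, fs
    # Newton's divided differences give the interpolating parabola
    # p(x) = a*x^2 + b*x + c exactly through the three points.
    d01 = (f1 - f0) / (x1 - x0)
    d12 = (f2 - f1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    if a == 0.0:  # collinear points: fall back to the linear crossing
        return x0 - f0 / d01 if d01 != 0.0 else None
    b = d01 - a * (x0 + x1)
    c = f0 - a * x0 * x0 - b * x0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    roots = ((-b + s * math.sqrt(disc)) / (2.0 * a) for s in (1.0, -1.0))
    lo, hi = min(xs), max(xs)
    return next((r for r in roots if lo <= r <= hi), None)

# Steep toy "NLL" with its crossing at x = 0.5: a linear guess between two
# points can overshoot badly, while the parabola recovers it exactly here.
f = lambda x: 40.0 * (x - 0.3) ** 2 - 1.6
x = quadratic_crossing((0.45, 0.52, 0.60), (f(0.45), f(0.52), f(0.60)))
```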

@ajgilbert (Collaborator) commented:

Hi, I would like to understand better what the issues are before considering merging this PR. The robustFit algo is used heavily in CMS and any changes to the algorithm need to be carefully understood and validated (in general we recommend contacting the combine developers first if you encounter a problem like this).

Anyway, the issue that "Robust fit fails to recognize convergence when the crossing point is on the order of the crossing tolerance" - could this be addressed by setting a more appropriate min/max on the parameter being scanned, or is the issue not as simple as that? It's true the algorithm may struggle if the range on the parameter is many orders of magnitude larger than the distance to the crossing.

@usarica (Contributor, Author) commented Apr 12, 2021:

> Hi, I would like to understand better what the issues are before considering merging this PR. The robustFit algo is used heavily in CMS and any changes to the algorithm need to be carefully understood and validated (in general we recommend contacting the combine developers first if you encounter a problem like this).
>
> Anyway, the issue that "Robust fit fails to recognize convergence when the crossing point is on the order of the crossing tolerance" - could this be addressed by setting a more appropriate min/max on the parameter being scanned, or is the issue not as simple as that? It's true the algorithm may struggle if the range on the parameter is many orders of magnitude larger than the distance to the crossing.

I don't think setting the min/max of the parameter manually is desirable when running several hundred jobs, which is the case I am dealing with; it is not a scalable solution. Moreover, as I mentioned, the last commit fixes an important bug: the interpolation equation assumes certain properties that do not hold in general, and I have come across cases where the returned interval is off by an order of magnitude relative to the small uncertainty it is supposed to have. The solution is well tested; the results agree with the Minos algorithm whenever Minos succeeds.

@usarica (Contributor, Author) commented Apr 12, 2021:

By the way, the default behavior is almost exactly the same if `FITTER_CROSSING_PERSISTENT_TRIALS` is not set; the only meaningful change is how the interpolation is done in the first if-condition inside the for-loop.

@ajgilbert (Collaborator) commented:

Well, I think setting sensible ranges on all floating parameters is good practice; it also helps Minuit make reasonable initial steps at the start of the minimisation. If this is really a problem, then I guess you could also play with the stepSize or tolerance parameters to achieve a similar effect.

It would be helpful to have more detail on the bug you mention though. Can you provide a minimal example that we can test?

@usarica (Contributor, Author) commented Apr 12, 2021:

> Well I think setting sensible ranges on all floating parameters is good practice. This also helps minuit make reasonable initial steps at the start of the minimisation. If this is really a problem, then I guess you could also play with the stepSize or tolerance parameters to achieve a similar effect.
>
> It would be helpful to have more detail on the bug you mention though. Can you provide a minimal example that we can test?

I agree with you in general, but it doesn't always work in practice because the different datacards I have to work with also span different ranges (I am running some T&P fits using combine, so each bin has a different signal or background shape, or signal fraction). I put initial guesses on the central value and ranges, but I can't narrow the range finely enough. That is why I decided to improve the functionality of robustFit instead; I still have several hundred datacards, so tuning them by hand was becoming impractical.

Let me upload an example workspace on lxplus and point to it in this thread.

@usarica (Contributor, Author) commented Apr 12, 2021:


Here it is: `/afs/cern.ch/work/u/usarica/public/HCombRobFitFix`. I added the instructions and sample results in the README; you can reproduce them by running the commands included there.

@ajgilbert (Collaborator) commented:

Thanks, so I was able to get to (I think) the correct crossing in a fairly small number of steps using:

```shell
combine /afs/cern.ch/work/u/usarica/public/HCombRobFitFix/higgsCombineSinglesSnapshot2.MultiDimFit.mH120.root -M MultiDimFit --freezeParameters r --redefineSignalPOIs frac_sig -v 1 --alignEdges=1 --saveNLL --saveSpecifiedNuis=rgx{.*} --algo singles --saveWorkspace --X-rtd MINIMIZER_no_analytic -n SinglesFinalSnapshot --snapshotName=MultiDimFit --cminDefaultMinimizerStrategy 0 --robustFit 1 --stepSize 0.0001 --X-rtd FITTER_DYN_STEP
```

which gives:

```
Searching for crossing at nll = 0.494466 in the interval 1, 0
  frac_sig    lvl-here   lvl-there    stepping
  0.999900    +0.37360   +0.49447   -0.000100
  0.999700    +0.02609   +0.37360   -0.000200
  0.999496    -0.45808   +0.02609   -0.000204
  0.999690    +0.00461   -0.45808   -0.000010
  0.999680    -0.01612   +0.00461   -0.000010
  0.999687    +0.00030   -0.01612   -0.000002
```

But then indeed I see the issue with the interpolated value here. So that part should probably be fixed. At this point I'd prefer to apply a minimal fix for this algo (perhaps even just returning rStart at the last point would be sufficient), and then consider your improvements for a new improved algo that could eventually become the default. Will discuss with the other developers.
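For reference, plain arithmetic on the last two rows of the stepping table above shows where a simple linear (secant) interpolation between the two bracketing points would land; this is only an illustration using the logged numbers, not the algorithm's internal computation:

```python
# Secant (linear) interpolation between the last two scan points in the
# log above; values are copied from the table, lvl-here plays the role of
# f(x) = deltaNLL(x) - level.
x1, f1 = 0.999680, -0.01612
x2, f2 = 0.999687, +0.00030
x_cross = x2 - f2 * (x2 - x1) / (f2 - f1)  # lands just below 0.999687
```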

@usarica (Contributor, Author) commented Apr 13, 2021:

> But then indeed I see the issue with the interpolated value here. So that part should probably be fixed. At this point I'd prefer to apply a minimal fix for this algo (perhaps even just returning rStart at the last point would be sufficient), and then consider your improvements for a new improved algo that could eventually become the default. Will discuss with the other developers.

Thanks for looking at this issue. I still don't think this adjustment can be done in a general setting (the minimum and range of frac_sig can be quite different across these workspaces, and there are as many as 100k of them by my last count), but I can certainly wait for the discussion with the other developers.
