Update to a recommended tag - currently the recommended tag is **v10.0.0**: [see release notes](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/releases/tag/v10.0.0)
```sh
cd $CMSSW_BASE/src/HiggsAnalysis/CombinedLimit
git fetch origin
git checkout v10.0.0
scramv1 b clean; scramv1 b # always make a clean build
```
### Combine v9
The nominal installation method is inside CMSSW. The current release targets the CMSSW `11_3_X` series because this release has both python2 and python3 ROOT bindings.
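For reference, a CMSSW-based setup typically looks roughly like the following (a minimal sketch; the release name is only illustrative, use the currently recommended CMSSW release and <span style="font-variant:small-caps;">Combine</span> tag):

```sh
cmsrel CMSSW_11_3_4   # an 11_3_X release; check the documentation for the recommended one
cd CMSSW_11_3_4/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
# then check out the recommended tag and build, as shown above
scramv1 b clean; scramv1 b
```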
@@ -239,7 +262,13 @@ See [contributing.md](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimi
## CombineHarvester/CombineTools
!!! info
    Starting with <span style="font-variant:small-caps;">Combine</span> v10, the CombineTools functionalities for job submission and parallelization (`combineTool.py`), as well as many plotting functions, have been integrated into the <span style="font-variant:small-caps;">Combine</span> package.
    For these tasks you no longer have to follow the instructions below.

CombineTools is an additional package with useful features for <span style="font-variant:small-caps;">Combine</span>, which is used, for example, for the automated datacard validation (see [instructions](docs/part3/validation)).
Since the repository contains a certain amount of analysis-specific code, the following scripts can be used to clone it with a sparse checkout for just the core [`CombineHarvester/CombineTools`](https://github.com/cms-analysis/CombineHarvester/tree/main/CombineTools/) subpackage, speeding up the checkout and compile times:
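The dedicated checkout scripts themselves are not reproduced here; as an illustration of what they do, a manual sparse checkout with plain git commands might look roughly like this (a sketch assuming the default `main` branch):

```sh
cd $CMSSW_BASE/src/
# Clone without checking out any files, then restrict the working tree to CombineTools
git clone --no-checkout https://github.com/cms-analysis/CombineHarvester.git CombineHarvester
cd CombineHarvester
git sparse-checkout init --cone
git sparse-checkout set CombineTools
git checkout main
# Build as usual
cd $CMSSW_BASE/src/ && scramv1 b
```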
**docs/part3/commonstatsmethods.md** (1 addition, 1 deletion)
@@ -685,7 +685,7 @@ The following algorithms are implemented:
- **`AD`**: Compute a goodness-of-fit measure for binned fits using the *Anderson-Darling* test. It is based on the integral of the difference between the cumulative distribution function and the empirical distribution function over all bins. It also gives the tail ends of the distribution a higher weighting.
The output tree will contain a branch called **`limit`**, which contains the value of the test statistic in each toy. You can make a histogram of this test statistic $t$. From the distribution obtained in this way ($f(t)$) and the single value obtained by running on the observed data ($t_{0}$) you can calculate the p-value $p = \int_{t_{0}}^{\infty} f(t)\, dt$. Note: in rare cases the test statistic value for the toys can be undefined (for AD and KS). In this case we set the test statistic value to -1. When plotting the test statistic distribution, those toys should be excluded. This is automatically taken care of if you use the GoF collection script, which is described below.
When generating toys, the default behavior will be used. See the section on [toy generation](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#toy-data-generation) for options that control how nuisance parameters are generated and fitted in these tests. It is recommended to use *frequentist toys* (`--toysFreq`) when running the **`saturated`** model, and the default toys for the other two tests.
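For orientation, running the saturated GoF test on the observed data and on frequentist toys might look roughly as follows (a minimal sketch; `datacard.txt`, the number of toys, and the seed are placeholders):

```sh
# Test-statistic value t_0 on the observed data
combine -M GoodnessOfFit datacard.txt --algo saturated -n .observed

# Test-statistic distribution f(t) from frequentist toys (recommended for the saturated test)
combine -M GoodnessOfFit datacard.txt --algo saturated --toysFreq -t 500 -s 12345 -n .toys
```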
**docs/part4/usefullinks.md** (1 addition, 1 deletion)
@@ -77,7 +77,7 @@ The paper for the <span style="font-variant:small-caps;">Combine</span> tool is
* _What does fit status XYZ mean?_
* <span style="font-variant:small-caps;">Combine</span> reports the fit status in some routines (for example in the `FitDiagnostics` method). These are typically the status of the last call from Minuit. For details on the meanings of these status codes see the [Minuit2Minimizer](https://root.cern.ch/root/html/ROOT__Minuit2__Minuit2Minimizer.html) documentation page.
* _Why does my fit not converge?_
* There are several reasons why some fits may not converge. Often some indication can be obtained from the `RooFitResult` or from the fit status that is printed when using the `--verbose X` (with $X>2$) option. Sometimes, however, it can be that the likelihood for your data is very unusual. You can get a rough idea of what the likelihood looks like as a function of your parameters (POIs and nuisances) using `combineTool.py -M FastScan -w myworkspace.root` (use `--help` for options; see also [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/debugging/#analyzing-the-nll-shape-in-each-parameter)).
* We have often seen that fits in <span style="font-variant:small-caps;">Combine</span> using `RooCBShape` as a parametric function will fail. This is related to an optimization that fails. You can try to fix the problem as described in this issue: [issues#347](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/issues/347) (i.e. add the option `--X-rtd ADDNLL_CBNLL=0`).
* _Why does the fit/fits take so long?_
* The minimization routines are common to many methods in <span style="font-variant:small-caps;">Combine</span>. You can tune the fits using the generic optimization command line options described [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#generic-minimizer-options). For example, setting the default minimizer strategy to 0 can greatly improve the speed, since this avoids running HESSE. In calculations such as `AsymptoticLimits`, HESSE is not needed and hence this can be done; for `FitDiagnostics`, however, the uncertainties and correlations are part of the output, so using strategy 0 may not be particularly accurate. A minimal example of lowering the strategy is sketched below.
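As an illustration, the strategy can be lowered with the generic minimizer option shown here (a minimal sketch; `datacard.txt` is a placeholder):

```sh
# Strategy 0 skips the expensive HESSE step; this is fine for AsymptoticLimits,
# but may degrade the uncertainty estimates reported by e.g. FitDiagnostics.
combine -M AsymptoticLimits datacard.txt --cminDefaultMinimizerStrategy 0
```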
**docs/part5/longexercise.md** (2 additions, 2 deletions)
@@ -12,7 +12,7 @@ You can find a presentation with some more background on likelihoods and extract
If you are not yet familiar with these concepts, or would like to refresh your memory, we recommend that you have a look at these presentations before you start with the exercise.
## Getting started
To get started, you should have a working setup of `Combine`; please follow the instructions from the [home page](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/#within-cmssw-recommended-for-cms-users). Make sure to use the latest recommended release.
Now we will move to the working directory for this tutorial, which contains all the inputs needed to run the exercises below:
```shell
@@ -432,7 +432,7 @@ is perfectly valid and only one `rateParam` will be created. These parameters wi
### B: Nuisance parameter impacts
It is often useful to examine in detail the effects the systematic uncertainties have on the signal strength measurement. This is often referred to as calculating the "impact" of each uncertainty. What this means is to determine the shift in the signal strength, with respect to the best-fit, that is induced if a given nuisance parameter is shifted by its $\pm1\sigma$ post-fit uncertainty values. If the signal strength shifts a lot, it tells us that it has a strong dependency on this systematic uncertainty. In fact, what we are measuring here is strongly related to the correlation coefficient between the signal strength and the nuisance parameter. The `MultiDimFit` method has an algorithm for calculating the impact for a given systematic: `--algo impact -P [parameter name]`, but it is typical to use a higher-level script, `combineTool.py`, to automatically run the impacts for all parameters. Full documentation on this is given [here](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/nonstandard/#nuisance-parameter-impacts). There is a three-step process for running this. First we perform an initial fit for the signal strength and its uncertainty:
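A sketch of all three steps follows (the workspace name, mass value, and output file names are placeholders, not necessarily the exact inputs used in this exercise):

```sh
# 1) Initial fit for the signal strength and its uncertainty
combineTool.py -M Impacts -d workspace.root -m 125 --doInitialFit --robustFit 1

# 2) Fits scanning each nuisance parameter (these can be parallelised or sent to a batch system)
combineTool.py -M Impacts -d workspace.root -m 125 --doFits --robustFit 1

# 3) Collect the results into a JSON file and produce the summary plot
combineTool.py -M Impacts -d workspace.root -m 125 -o impacts.json
plotImpacts.py -i impacts.json -o impacts
```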
**docs/tutorial2023/parametric_exercise.md** (3 additions, 4 deletions)
@@ -1,8 +1,7 @@
# Parametric Models in Combine
## Getting started
To get started, you should have a working setup of `Combine`; please follow the instructions from the [home page](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/#within-cmssw-recommended-for-cms-users). Make sure to use the latest recommended release.
Now let's move to the working directory for this tutorial, which contains all of the inputs and scripts needed to run the parametric fitting exercise:
```shell
@@ -462,7 +461,7 @@ To perform a likelihood scan (i.e. calculate 2NLL at fixed values of the signal
@@ -802,7 +801,7 @@ These methods are not limited to this particular grouping of systematics. We can
### Impacts
It is often useful/required to check the impacts of the nuisance parameters (NP) on the parameter of interest, r. The impact of a NP is defined as the shift $\Delta r$ induced as the NP, $\theta$, is fixed to its $\pm1\sigma$ values, with all other parameters profiled as normal. More information can be found in the combine documentation via this [link](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/nonstandard/#nuisance-parameter-impacts).
Let's calculate the impacts for our analysis. We can use `combineTool.py` to automate the steps. The impacts are calculated in a few stages:
1) Do an initial fit for the parameter of interest, adding the `--robustFit 1` option:
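As a sketch (the workspace name and mass value are placeholders rather than the exact inputs of this exercise), this first stage looks like:

```sh
combineTool.py -M Impacts -d workspace.root -m 125 --robustFit 1 --doInitialFit
```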
**docs/tutorial2023_unfolding/unfolding_exercise.md** (1 addition, 1 deletion)
@@ -1,7 +1,7 @@
# Likelihood Based Unfolding Exercise in Combine
## Getting started
To get started, you should have a working setup of `Combine`; please follow the instructions from the [home page](https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/#within-cmssw-recommended-for-cms-users). Make sure to use the latest recommended release.
After setting up `Combine`, you can access the working directory for this tutorial, which contains all of the inputs and scripts needed to run the unfolding fitting exercise: