@saswatinandan
Contributor

@saswatinandan saswatinandan commented Sep 28, 2025

This PR implements the changes described in the DP note CMS-DP-2025-031. The changes follow the DP note exactly, except for the following:

  • The boolean variables filter_, isSaturated_, and peakFilter_ have been dropped for the collection SiStripApproxCluster_v1 and encoded into compBarycenter_ and compavgCharge_. This reduces the size of the strip cluster collection by 5%.
  • A flag v1 is added to the code so that both the default and the new version can be run. To run v1 online, this line should be added to the HLT configuration file: process.hltSiStripClusters2ApproxClusters = cms.EDProducer("SiStripClusters2ApproxClusters_v1".... instead of process.hltSiStripClusters2ApproxClusters = cms.EDProducer("SiStripClusters2ApproxClusters".....
  • To apply a tight charge cut on strip clusters, these lines should be added to the online configuration:
      process.HLTSiStripClusterChargeCutTight = cms.PSet( value = cms.double( 1945.0 ) )
      process.HLTSiStripClusterChargeCutNone = cms.PSet( value = cms.double( -1.0 ) )
      process.hltSiStripClusterizerForRawPrime.Clusterizer.clusterChargeCut.refToPSet_='HLTSiStripClusterChargeCutTight'
      process.ClusterShapeHitFilterESProducer.clusterChargeCut.refToPSet_='HLTSiStripClusterChargeCutTight'
    This reduces the cluster size by 3.3%.

With all these changes, the strip cluster collection size is reduced by ~21%.
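
The flag encoding mentioned in the first bullet can be sketched as follows. This is a minimal illustration, not the code from the PR: it assumes that the low 15 bits of a 16-bit word hold the compressed barycenter (consistent with the 0x7FFF mask applied to compBarycenter_ in the decoding code) and that bit 15 carries one flag. The helper names, and the choice of which flag lives in which word, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: bits 0-14 hold the compressed value, bit 15 holds one flag.
// The actual bit assignment in SiStripApproximateCluster_v1 may differ.
constexpr std::uint16_t kValueMask = 0x7FFF;
constexpr std::uint16_t kFlagBit = 0x8000;

// Store a 15-bit value and a boolean in a single 16-bit word.
inline std::uint16_t packValueAndFlag(std::uint16_t value, bool flag) {
  assert(value <= kValueMask);  // the value must fit in 15 bits
  return static_cast<std::uint16_t>((value & kValueMask) | (flag ? kFlagBit : 0));
}

// Recover the 15-bit value, ignoring the flag bit.
inline std::uint16_t unpackValue(std::uint16_t packed) {
  return static_cast<std::uint16_t>(packed & kValueMask);
}

// Recover the flag stored in the top bit.
inline bool unpackFlag(std::uint16_t packed) { return (packed & kFlagBit) != 0; }
```

For instance, unpackValue and unpackFlag recover 1234 and true from packValueAndFlag(1234, true), so the flag no longer needs a separate bool data member.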

@cmsbuild
Contributor

cmsbuild commented Sep 28, 2025

cms-bot internal usage

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49015/46203

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@saswatinandan
Contributor Author

This PR is created to include a flag so that the default rawprime can be used in parallel with the changes introduced in the DP note CMS-DP-2025-031. This PR follows the exact algorithm of the DP note. Since another PR was created with modifications and it has already failed, it would be better to continue with this PR in order to proceed quickly.

@mmusich
Contributor

mmusich commented Sep 29, 2025

@saswatinandan this PR is not testable.
Please fix the code quality requirements

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49015/46209

@cmsbuild
Contributor

A new Pull Request was created by @saswatinandan for master.

It involves the following packages:

  • DataFormats/SiStripCluster (reconstruction)
  • RecoLocalTracker/SiStripClusterizer (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @alesaggio, @echabert, @elusian, @felicepantaleo, @gbenelli, @gpetruc, @jlidrych, @missirol, @mmasciov, @mmusich, @mtosi, @robervalwalsh, @rovere, @threus, @yduhm this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mmusich
Contributor

mmusich commented Sep 29, 2025

@saswatinandan this PR lacks a suitable title as well as a suitable description; please provide both.

@mmusich
Contributor

mmusich commented Sep 29, 2025

test parameters:

  • addpkg = DQM/Integration
  • workflows = 161,161.02,161.03,161.1,161.2,161.3,161.4,162,162.02,162.03,162.1,162.2,162.3,162.4

@mmusich
Contributor

mmusich commented Sep 29, 2025

@cmsbuild, please test

@cmsbuild cmsbuild added the hold label Sep 29, 2025
@silviodonato
Contributor

@mmusich @icali @mandrenguyen

unclear the interplay of this with #49013

Let me clarify a bit. I worked on #49013 to add the possibility of running the two versions of raw prime in the same CMSSW release (which was not possible in the original version provided by @saswatinandan). Meanwhile, @saswatinandan implemented this alternative version, which also allows you to run the two versions of rawPrime.

The main differences between the two PRs are:

  • here, SiStripApproximateCluster.h is completely unchanged; the new rawPrime version is implemented in a new file, SiStripApproximateCluster_v1.h
  • in #49013 ("New version of RawPrime (SiStripApproximateCluster)"), SiStripApproximateCluster.h is changed and contains both versions of rawPrime, selected by a version_ flag data member added to SiStripApproximateCluster. Note: as version_ is constant, the impact on the event size is only 0.02%.

I also implemented some other changes, mainly to simplify the code and improve its readability, plus a few others with a very small impact on the performance. You can easily see all the differences by looking at this commit.
Anyway, the underlying "physics" (removal of beginIndices_, saving the distance instead of the value for barycenter_ and detId_, encoding of flags inside spare bits of other variables, optimization of the number of bits used in some variables) is identical, as I took it directly from @saswatinandan's previous code.
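
The idea of saving a distance instead of a value can be sketched as a simple delta encoding. This is only an illustration under stated assumptions: the function names are hypothetical, the example uses generic 32-bit identifiers, and the real code may use a different reference value, bit width, or handling of the first cluster.

```cpp
#include <cstdint>
#include <vector>

// Store the first identifier as-is and every later one as the distance
// from its predecessor. For sorted inputs the distances are small numbers.
std::vector<std::uint32_t> deltaEncode(const std::vector<std::uint32_t>& ids) {
  std::vector<std::uint32_t> deltas;
  deltas.reserve(ids.size());
  std::uint32_t previous = 0;
  for (std::uint32_t id : ids) {
    deltas.push_back(id - previous);  // unsigned arithmetic wraps safely
    previous = id;
  }
  return deltas;
}

// Rebuild the original identifiers by accumulating the distances.
std::vector<std::uint32_t> deltaDecode(const std::vector<std::uint32_t>& deltas) {
  std::vector<std::uint32_t> ids;
  ids.reserve(deltas.size());
  std::uint32_t running = 0;
  for (std::uint32_t delta : deltas) {
    running += delta;
    ids.push_back(running);
  }
  return ids;
}
```

The saving comes from the fact that small distances need fewer bits than full values, which is what makes a narrower data member possible.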

Clearly, there are Pros and Cons for both the implementations.
I believe that both versions are ok, especially for a testing stream.
Let us know which solution you prefer.

@mmusich
Contributor

mmusich commented Sep 30, 2025

@cmsbuild, please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d3e041/48369/summary.html
COMMIT: 1d8ce7c
CMSSW: CMSSW_16_0_X_2025-09-29-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49015/48369/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 63
  • DQMHistoTests: Total histograms compared: 5531720
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 5531700
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 62 files compared)
  • Checked 282 log files, 248 edm output root files, 63 DQM output files
  • TriggerResults: no differences found

@mandrenguyen
Contributor

unhold

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @ftenchini, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Contributor

+1

@cmsbuild cmsbuild merged commit 6562d2d into cms-sw:master Sep 30, 2025
11 checks passed
@mandrenguyen
Contributor

Just to follow up: It seemed that we were being asked to choose between this PR and #49013. Since the physics was said to be identical I simply went with the PR that was fully signed. If there should be more discussion between the implementation in this PR and #49013 let's still have it, and follow up in a subsequent PR, if needed. I was simply trying to make sure we had something in for 15_1_0 which will be built very shortly.

@saswatinandan
Contributor Author

saswatinandan commented Oct 1, 2025

I believe the code in the other PR, #49013, is cleaner than this one and provides a slight improvement in the calculation of the barycenter position. One important point to note is that if both versions are used in the same code, as in #49013, the boolean variables filter_, isSaturated_, and peakFilter_ cannot be dropped, since this would cause conflicts with other code such as code1 and code2. In that case, it might be better to create two separate collections and, if possible, merge the other PR.

@mmusich
Contributor

mmusich commented Oct 1, 2025

One important point to note is that if both versions are used in the same code,

maybe I am missing something, but now since the two data-formats are completely decoupled we could just have different versions of the backward compatibility tests (tagging also @cms-sw/core-l2 ) or the conversion code.
In any case the plan for 2025 PbPb as far as I understand is to just produce at the HLT level a test stream with the new data-format and only repack but not feed it anywhere to downstream reco.

@mandrenguyen
Contributor

I believe the code in the other PR, #49013, is cleaner than this one and provides a slight improvement in the calculation of the barycenter position. One important point to note is that if both versions are used in the same code, as in #49013, the boolean variables filter_, isSaturated_, and peakFilter_ cannot be dropped, since this would cause conflicts with other code such as code1 and code2. In that case, it might be better to create two separate collections and, if possible, merge the other PR.

Apologies for jumping the gun then. Feel free to make a new PR or re-open the closed one (and deal with the conflicts I guess).

@silviodonato
Contributor

maybe I am missing something, but now since the two data-formats are completely decoupled we could just have different versions of the backward compatibility tests (tagging also @cms-sw/core-l2 ) or the conversion code.
In any case the plan for 2025 PbPb as far as I understand is to just produce at the HLT level a test stream with the new data-format and only repack but not feed it anywhere to downstream reco.

Yes, I agree. Let's keep #49015 merged and let's try to have the backport in 15_1_0.

SiStripApprox2ApproxClusters is only used in RecoLocalTracker/SiStripClusterizer/test/step2_RAW2DIGI_L1Reco_RECO_ApproxClusters.py
to estimate the performance of different compression methods, so it is not really relevant here.
And yes, we could write a new TestWriteSiStripApproximateClusterCollection.cc in the future. These are not big issues in my opinion.

Note that the conversion code (SiStripApproximateCluster_v1 --> SiStripCluster) is already in place here, SiStripApprox2Clusters.cc

The only purpose of SiStripApproximateCluster(_v1) is to be converted back to SiStripCluster. Once that is working, all the downstream code will work.

@mmusich
Contributor

mmusich commented Oct 1, 2025

Note that the conversion code (SiStripApproximateCluster_v1 --> SiStripCluster) is already in place here, SiStripApprox2Clusters.cc

yes, I meant two instances of the same class configured with different flags that we can use e.g. in DQM to run the conversion from the two different streams (this is not yet available, but it's straightforward to implement)

Contributor

@makortel makortel left a comment

Sorry to jump in post merge.

#include "assert.h"

class SiStripCluster;
class SiStripApproximateCluster_v1 {

I'm a little bit confused by the use of namespace in v1::SiStripApproximateClusterCollection and postfix in SiStripApproximateCluster_v1. I think a namespace for both would have been clearer.

float _barycenter;
cms_uint16_t compBarycenter = (compBarycenter_ & 0x7FFF);
if (previous_barycenter == -999)
_barycenter = compBarycenter * maxBarycenter_ / maxRange_;

Leading underscore in variable names should be avoided (2.14 in https://cms-sw.github.io/cms_coding_rules.html#2--naming-rules-1)
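
As an illustration of the naming rule, the decoding step quoted above could be written with local names that carry no leading underscore. This is a sketch under assumptions, not the PR code: the constants stand in for maxBarycenter_ and maxRange_, whose real values are not given in this thread, and the compress helper is hypothetical and does not clamp out-of-range input.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-ins for maxBarycenter_ and maxRange_.
constexpr float kMaxBarycenter = 768.f;
constexpr std::uint16_t kMaxRange = 0x7FFF;

// Map a barycenter in [0, kMaxBarycenter] onto a 15-bit integer.
inline std::uint16_t compressBarycenter(float barycenter) {
  return static_cast<std::uint16_t>(std::round(barycenter * kMaxRange / kMaxBarycenter));
}

// Inverse mapping; only the low 15 bits are interpreted as the value.
inline float decodeBarycenter(std::uint16_t compressed) {
  const std::uint16_t value = compressed & 0x7FFF;
  const float barycenter = value * kMaxBarycenter / kMaxRange;  // no leading underscore
  return barycenter;
}
```

With these assumed constants, a round trip reproduces the input to within one quantization step, kMaxBarycenter / kMaxRange.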

Comment on lines +64 to +69
<class name="SiStripApproximateCluster_v1" ClassVersion="7">
<version ClassVersion="7" checksum="1154754493"/>
<version ClassVersion="6" checksum="132211472"/>
<version ClassVersion="5" checksum="3495825183"/>
<version ClassVersion="4" checksum="2854791577"/>
<version ClassVersion="3" checksum="2041370183"/>

For new data format types there should be only one class version. It would also be nice for the indentation to be consistent

  <class name="SiStripApproximateCluster_v1" ClassVersion="3">
    <version ClassVersion="3" checksum="1154754493"/>

Comment on lines +71 to +73
<class name="v1::SiStripApproximateClusterCollection" ClassVersion="4">
<version ClassVersion="4" checksum="2896589077"/>
<version ClassVersion="3" checksum="3101417750"/>

Ditto

  <class name="v1::SiStripApproximateClusterCollection" ClassVersion="3">
    <version ClassVersion="3" checksum="2896589077"/>

Comment on lines 37 to +40
clusterToken_ = consumes(conf.getParameter<edm::InputTag>("inputApproxClusters"));
clusterToken_v1_ = consumes(conf.getParameter<edm::InputTag>("inputApproxClusters"));
tkGeomToken_ = esConsumes();
v1 = conf.getParameter<bool>("v1");

Since v1 dictates which of the two data products is accessed in the produce(), it would be better to do the same for consumes()

if (v1) {
  clusterToken_v1_ = consumes(conf.getParameter<edm::InputTag>("inputApproxClusters"));
} else {
  clusterToken_ = consumes(conf.getParameter<edm::InputTag>("inputApproxClusters"));
}

Then e.g. the clusterToken_v1_.isUninitialized() could be used in produce() and v1 member would not be needed.
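
The suggestion can be mimicked outside CMSSW with a small mock. Everything below is a stand-in for illustration: MockToken imitates only the isUninitialized() query of edm::EDGetTokenT, and MockConverter and selectedFormat() are hypothetical names, not part of the framework or of this PR.

```cpp
#include <optional>
#include <string>

// Stand-in for edm::EDGetTokenT<T>: it is "initialized" once a tag is registered.
struct MockToken {
  std::optional<std::string> tag;
  bool isUninitialized() const { return !tag.has_value(); }
};

class MockConverter {
public:
  // Register only the token for the data product that will actually be read.
  MockConverter(const std::string& inputTag, bool v1) {
    if (v1) {
      tokenV1_.tag = inputTag;
    } else {
      token_.tag = inputTag;
    }
  }

  // Mirrors the branch in produce(): no separate bool member is needed,
  // because the token state already records which format was requested.
  std::string selectedFormat() const { return tokenV1_.isUninitialized() ? "default" : "v1"; }

private:
  MockToken token_;
  MockToken tokenV1_;
};
```

Constructing MockConverter("hltSiStripClusters2ApproxClusters", true) selects the "v1" branch, and passing false selects "default".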

@mmusich
Contributor

mmusich commented Oct 3, 2025

type ngt

@cmsbuild cmsbuild added the ngt label Oct 3, 2025
@fwyzard
Contributor

fwyzard commented Dec 21, 2025

Let me add some more general comments, as I just stumbled upon this PR by chance.

IMHO this is the kind of (technical) changes that should be discussed in a Reconstruction meeting, if we actually had them, before @cms-sw/reconstruction-l2 would sign for them.


Does this PR set a new precedent and policy for data formats?
Or is it a "one off" that may suggest CMS should find a better solution in the future?

I do not see any other file with _v1 or any other version in their names.
I found just a couple of versioning namespaces in other files (RecoLocalCalo/EcalRecProducers/plugins/AmplitudeComputationKernels.h and L1Trigger/L1TMuonEndCapPhase2/interface/EMTFConstants.h), and in each case there seems to be a single version, and there is no version in the file name.


In the current version of the code, DataFormats/SiStripCluster/src/SiStripApproximateCluster_v1.cc is still using names with a trailing underscore for local variables:

bool filter_, isSaturated_, peakFilter_;

Those should be reserved for data members; please fix them by removing the trailing _.


The static analyser does not seem to like the code. That's how I stumbled upon this file, I saw this warning while looking at some unrelated changes:

src/DataFormats/SiStripCluster/src/SiStripApproximateCluster_v1.cc:28:3: warning: Value stored to 'filter_' is never read [deadcode.DeadStores]
   28 |   filter_ = false;
      |   ^         ~~~~~
src/DataFormats/SiStripCluster/src/SiStripApproximateCluster_v1.cc:48:5: warning: Value stored to 'filter_' is never read [deadcode.DeadStores]
   48 |     filter_ = true;
      |     ^         ~~~~
2 warnings generated.

Finally, I'm just confused: if this is the new version of the code, why is it v1 and not v2?
