Fix SNMP cacheNumObjCount -- number of cached objects #2053

Closed
wants to merge 16 commits into from

cvuosalo
Contributor

@cvuosalo cvuosalo commented Apr 10, 2025

SNMP counter cacheNumObjCount used StoreEntry::inUseCount() stats. For
Squid instances using rock cache_dirs or a shared memory cache, the
number of StoreEntry objects in use is usually very different from the
number of cached objects because these caches do not use StoreEntry
objects as a part of their index. For all instances, inUseCount() also
includes ongoing transactions and internal tasks that are not related to
cached objects at all.

We now use the sum of the counters already reported on the "on-disk objects"
and "Hot Object Cache Items" lines in the "Internal Data Structures" section
of the mgr:info cache manager report. Due to floating-point arithmetic,
these stats are approximate, but it is best to keep SNMP and cache
manager reports consistent.
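
A minimal sketch of the new counter, assuming a simplified stand-in for StoreInfoStats. Only `swap.count` (a `double`) is named in this PR; `StatsSketch`, `mem.count`, and `cachedObjectCount()` are hypothetical names used for illustration, not Squid code:

```cpp
// Hypothetical, simplified stand-in for the relevant StoreInfoStats fields.
// Only swap.count appears in this PR; mem.count is an assumed name.
struct StatsSketch {
    struct Swap { double count = 0; }; // "on-disk objects"
    struct Mem { double count = 0; };  // "Hot Object Cache Items"
    Swap swap;
    Mem mem;
};

// Sum of the two counters, which is what SNMP would report in this sketch.
inline double cachedObjectCount(const StatsSketch &stats)
{
    return stats.swap.count + stats.mem.count;
}
```

With 1500000 on-disk objects and 25000 memory-cached objects, this sketch reports 1525000.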

This change does not fix the SNMP Gauge32 overflow bug: Caches with 2^32 or
more objects continue to report wrong (smaller) cacheNumObjCount values.
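
To illustrate why an overflowing count can look smaller rather than obviously broken, here is a sketch of a plain narrowing conversion to 32 bits. The helper name `wrapToGauge32()` is hypothetical, and this is not necessarily the exact conversion path inside Squid:

```cpp
#include <cstdint>

// Converting to a 32-bit unsigned value keeps only the low 32 bits,
// so a count of 2^32 or more turns into a much smaller number.
// Illustration only; not Squid's actual conversion code.
inline std::uint32_t wrapToGauge32(const std::uint64_t objectCount)
{
    return static_cast<std::uint32_t>(objectCount); // value modulo 2^32
}
```

In this sketch, a cache holding 2^32 + 5 objects would be reported as holding 5.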

On MemStore::getStats() and StoreInfoStats changes

To include the number of memory-cached objects while supporting SMP
configurations with shared memory caches, we had to change how cache
manager code aggregates StoreInfoStats::mem data collected from SMP
worker processes. Before these changes, StoreInfoStats::operator +=()
used the mem.shared data member to trigger a special aggregation hack,
but:

  • SNMP-specific code cannot benefit from that StoreInfoStats aggregation
    because SNMP code exchanges simple counters rather than StoreInfoStats
    objects. StoreInfoStats::operator +=() is never called by SNMP code.
    Instead, SNMP uses Snmp::Pdu::aggregate() and friends.

  • We could not accommodate SNMP by simply adding special aggregation
    hacks directly to MemStore::getStats() because that would break
    critical "all workers report about the same stats" expectations of the
    special hack in StoreInfoStats::operator +=().

To make both SNMP and cache manager use cases work, we removed the hack
from StoreInfoStats::operator +=() and hacked MemStore::getStats()
instead, making the first worker responsible for shared memory cache
stats reporting (unlike SMP rock diskers, there is no single kid process
dedicated to managing a shared memory cache). StoreInfoStats::operator +=()
now uses natural aggregation logic without hacks.
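
The reporting rule described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual MemStore::getStats() or StoreInfoStats code: `MemStatsSketch`, `reportMemStats()`, and `aggregate()` are hypothetical names, and worker IDs are assumed to start at 1.

```cpp
#include <vector>

// Hypothetical, simplified model of per-worker memory cache stats.
struct MemStatsSketch {
    double count = 0; // number of memory-cached objects

    // "Natural" aggregation: plain summation, no special cases.
    MemStatsSketch &operator +=(const MemStatsSketch &other) {
        count += other.count;
        return *this;
    }
};

// With a shared memory cache, only the first worker reports the shared
// total; the other workers report zero. With local (non-shared) caches,
// every worker reports its own count.
inline MemStatsSketch reportMemStats(const int workerId, const bool shared,
                                     const double countSeenByWorker)
{
    MemStatsSketch stats;
    if (!shared || workerId == 1)
        stats.count = countSeenByWorker;
    return stats;
}

// Sums per-worker reports. Because the reports no longer overlap, the same
// summation works for whole stats objects and for single SNMP counters.
inline double aggregate(const std::vector<MemStatsSketch> &reports)
{
    MemStatsSketch total;
    for (const auto &report: reports)
        total += report;
    return total.count;
}
```

In this model, three workers sharing a 900-object memory cache aggregate to 900 rather than 2700, while three workers with local caches of 100, 200, and 300 objects aggregate to 600.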

TODO: After these changes, StoreInfoStats::mem.shared becomes
essentially unused because it was only used to enable the special
aggregation hack in StoreInfoStats that no longer exists. Remove?

@cvuosalo cvuosalo marked this pull request as draft April 10, 2025 16:43
@cvuosalo cvuosalo changed the title Correct number of cache objects reported to monitoring Correct SNMP counter for number of cache objects Apr 10, 2025
@yadij
Contributor

yadij commented Apr 11, 2025

This looks reasonable. Have you tested with multiple workers and rock caches to ensure the number is correct when there are multiple cache processes?

@rousskov
Contributor

> This looks reasonable. Have you tested with multiple workers and rock caches to ensure the number is correct when there are multiple cache processes?

It cannot be correct AFAICT because PR code does not communicate with disker processes that currently manage rock cache_dir stats. See Rock::SwapDir::doReportStat() and commit 39c1e1d for more details. If we keep the current official code architecture, then this PR would have to asynchronously aggregate information across diskers like mgr:info currently does. Doing that well (e.g., without code duplication) may be difficult, but I have not checked any details.

Regardless of the design specifics, we should strive to keep cache manager stats and SNMP stats in sync: Both code areas should use the same mechanisms for obtaining statistics and just format/report it differently. Unfortunately, achieving that ideal requires a lot of work.

 Answer = snmp_var_new_integer(Var->name, Var->name_length,
-(snint) StoreEntry::inUseCount(),
+(snint) storestats.swap.count,
Contributor

Could you try using a C++ static_cast instead?

Contributor Author

This style of cast is used extensively throughout the file. I don't want to go against the style convention within this file.

Contributor

  • the type used for this cast is itself wrong [1]
  • lots of other nearby snmp_var_new_integer() callers have similar casting bugs/problems
  • no simple cast can supply a reasonable value -- more code is needed to handle SNMP Gauge32 limitations
  • official code this PR is replacing had similar casting bugs/problems

Given the above facts, I think this PR can leave this bad cast "as is" despite our "no C casts in new/changed code" preference/policy. That said, I am not going to object to fixing this cast in this PR if @kinkie insists on that fix. However, I would then insist on PR code supplying the maximum Gauge32 value instead of overflowing that counter, with a level-2 (or even level-1) error logged to cache.log. @kinkie, do you insist?

Footnotes

  1. snmp_var_new_integer() does not take snint (i.e. int64_t) value; it takes an int value instead. The supplied storestats.swap.count value type is ... double. The old inUseCount() return value type is size_t.
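
A minimal sketch of the "supply the maximum Gauge32 value instead of overflowing" idea mentioned in this comment. The helper `toGauge32()` and its `clamped` flag are hypothetical, not part of this PR, and the sketch leaves aside the separate question of the `int` parameter accepted by snmp_var_new_integer():

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts a floating-point object count to a value
// that fits SNMP Gauge32, saturating at the maximum instead of overflowing.
// Sets `clamped` when saturation happens so that the caller can log an
// error (e.g., to cache.log). NaN and negative inputs are reported as 0.
inline std::uint32_t toGauge32(const double count, bool &clamped)
{
    constexpr auto maxGauge = std::numeric_limits<std::uint32_t>::max();
    clamped = false;
    if (std::isnan(count) || count <= 0)
        return 0;
    if (count >= static_cast<double>(maxGauge)) {
        clamped = true;
        return maxGauge;
    }
    return static_cast<std::uint32_t>(count); // in range: 0 < count < 2^32 - 1
}
```

A caller would check `clamped` after the conversion and emit the error message there, which keeps the conversion itself free of logging dependencies.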

@cvuosalo
Contributor Author

cvuosalo commented Apr 17, 2025

As @rousskov said, the change in this draft PR doesn't provide the correct count of cache objects. I am trying to understand what code changes would be required to achieve the correction, but that will take some time.

LATER EDIT:
Further extensive testing actually shows this PR is correct and does provide the correct count of cache objects. I will elaborate in another comment.

@rousskov
Contributor

> the change in this draft PR doesn't provide the correct count of cache objects. I am trying to understand what code changes would be required to achieve the correction, but that will take some time.

Fortunately, Squid already has an SNMP response aggregation framework that, according to its description, can do what you want in principle. For starting points, see Snmp::Inquirer and Snmp::Pdu::aggregate(). I am not an SNMP expert and do not remember much about that Squid SNMP code. We need to figure out why the SNMP stats aggregation code is not triggered for the "number of cached entries" object and adjust the code accordingly.

N.B. I assume that the relevant SNMP stats aggregation code is not triggered today for the "number of cached entries" object.

@cvuosalo
Contributor Author

cvuosalo commented May 5, 2025

I have tested this PR extensively and used snmpwalk to monitor the value of cacheNumObjCount on both test systems and production systems processing heavy traffic. I have configured debug_options and tracked the internal incrementing of the object count. My tests and tracing have shown this PR is correct. In Squid 6, the SNMP response aggregation framework is in place and working correctly to aggregate the Rock cache statistics. All the features @rousskov mentions above as being necessary have been implemented in the Squid 6 code and are working correctly, as far as I have seen in my review of the code and from my tests. I would like to remove this PR from draft status and propose it for inclusion in Squid 6.

@rousskov rousskov marked this pull request as ready for review May 6, 2025 15:14
@squid-anubis squid-anubis added M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels labels May 6, 2025
@rousskov rousskov changed the title Correct SNMP counter for number of cache objects Fix SNMP cacheNumObjCount -- number of disk cached objects May 6, 2025
Contributor

@rousskov rousskov left a comment

> My tests and tracing have shown this PR is correct.

Thank you for running those tests! I now also believe that this PR handles SMP stats aggregation correctly. This PR should be merged as long as cacheNumObjCount should be limited to on-disk objects (i.e. should exclude memory-cached objects). I left a change request dedicated to addressing that question/concern. That request is the only reason I am not approving this PR now.

I polished PR title/description (i.e. future commit message) and polished the code a bit. @cvuosalo, please check and adjust further as needed, keeping Squid Project commit message requirements in mind.

> I would like to remove this PR from draft status

Done. FWIW, I hope you can change that status yourself, as you see fit.

> and propose it for inclusion in Squid 6.

This PR targets master/v8, as it should. Squid v6 and v7 inclusion may happen after this PR is merged into master. Those decisions are up to v6 and v7 maintainers.


@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label May 6, 2025
@cvuosalo
Contributor Author

cvuosalo commented May 6, 2025

@rousskov I don't have an opinion on including memory-cached objects in the count nor about the casts. I would defer to the Squid experts on these questions. Just let me know what additional changes to make, if any. Is there anything else in the PR that needs revision?

@rousskov
Contributor

rousskov commented May 6, 2025

> Just let me know what additional changes to make, if any. Is there anything else in the PR that needs revision?

IMO, no changes are needed except "add memory-cached objects counter" changes tracked in #2053 (comment). Please see that discussion for specific recommendations.

@squid-anubis squid-anubis added the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label May 13, 2025
@cvuosalo cvuosalo requested a review from rousskov May 13, 2025 21:33
Contributor

@rousskov rousskov left a comment

Thank you for advancing this PR. We are making progress, but more work is needed.

cvuosalo added 2 commits May 16, 2025 22:22
Merge branch 'fix-cache-object-count' of github.com:cvuosalo/squid into fix-cache-object-count
@rousskov rousskov self-requested a review May 19, 2025 13:49
@rousskov rousskov removed the S-waiting-for-author author action is expected (and usually required) label May 19, 2025
@squid-anubis squid-anubis added M-abandoned-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels and removed M-passed-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels labels May 28, 2025
@cvuosalo
Contributor Author

@rousskov After more testing and running with debugging statements, I understand better how this latest version of the PR is working. I think it will work fine for our purposes. I approve.
Thank you for your extensive help in completing this PR.

squid-anubis pushed a commit that referenced this pull request May 29, 2025
@squid-anubis squid-anubis added M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels and removed M-abandoned-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels labels May 29, 2025
@yadij yadij added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels and removed M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels S-could-use-an-approval An approval may speed this PR merger (but is not required) labels May 29, 2025
@yadij
Contributor

yadij commented May 29, 2025

@rousskov, please fix the conflicts so this can be merged.

@yadij yadij added S-waiting-for-author author action is expected (and usually required) backport-to-v6 backport-to-v7 maintainer has approved these changes for v7 backporting labels May 29, 2025
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label May 29, 2025
@rousskov
Contributor

> @rousskov, please fix the conflicts so this can be merged.

What conflicts?

@rousskov rousskov removed the S-waiting-for-author author action is expected (and usually required) label May 29, 2025
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels labels May 29, 2025
squidadm pushed a commit to squidadm/squid that referenced this pull request May 29, 2025
@squidadm squidadm removed the backport-to-v7 maintainer has approved these changes for v7 backporting label May 29, 2025
@squidadm
Collaborator

queued for backport to v7

yadij pushed a commit that referenced this pull request May 29, 2025
squidadm pushed a commit to squidadm/squid that referenced this pull request May 29, 2025
squidadm pushed a commit to squidadm/squid that referenced this pull request May 29, 2025
@squidadm
Collaborator

queued for backport to v6

yadij pushed a commit that referenced this pull request May 29, 2025
Labels
M-merged https://github.com/measurement-factory/anubis#pull-request-labels

7 participants