Skip to content

Refactor binary HNSW stats to use OpenMP reduction instead of destructor sync (#4910)#4910

Closed
sharm235 wants to merge 1 commit into
facebookresearch:mainfrom
sharm235:export-D95911440
Closed

Refactor binary HNSW stats to use OpenMP reduction instead of destructor sync (#4910)#4910
sharm235 wants to merge 1 commit into
facebookresearch:mainfrom
sharm235:export-D95911440

Conversation

@sharm235
Copy link
Copy Markdown

@sharm235 sharm235 commented Mar 11, 2026

Summary:

Problem

Follow-up to the previous fix (replacing #pragma omp critical with #pragma omp atomic in FlatHammingDis::~FlatHammingDis). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (IndexHNSW.cpp).

The previous approach had two issues:

  1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm
  2. FlatHammingDis maintained a redundant ndis counter (incremented on every operator() call) that duplicated what HNSW::search() already tracks internally via its returned HNSWStats

Changes

IndexBinaryHNSW::search() — use #pragma omp for reduction

  • Capture the HNSWStats returned by hnsw.search() per query
  • Accumulate n1, n2, ndis, nhops via #pragma omp for reduction(+: ...), which uses lock-free per-thread copies merged at barrier — zero contention
  • Call hnsw_stats.combine() once, single-threaded, after the parallel region
  • This exactly matches the pattern in hnsw_search() in IndexHNSW.cpp (lines 259-291)

FlatHammingDis — remove stats accumulation entirely

  • Remove size_t ndis field (redundant with HNSW::search() internal tracking)
  • Remove ndis++ from operator() (eliminates per-distance-computation overhead)
  • Remove destructor (no stats to accumulate → no sync needed)

IndexBinaryHNSWCagra::search() — fix missing stats aggregation

  • The base_level_only path's second parallel block (search_level_0) accumulated stats into a per-thread HNSWStats search_stats but never aggregated it into hnsw_stats — this was a pre-existing bug where stats were silently lost
  • Added #pragma omp critical { hnsw_stats.combine(search_stats); } once per thread at the end of the parallel block, matching IndexHNSW::search_level_0_wrapper() (line 461)

Performance Impact

  • Zero lock contention in the main IndexBinaryHNSW::search() path (reduction is lock-free)
  • Removes per-distance-computation ndis++ overhead from FlatHammingDis::operator()
  • IndexBinaryHNSWCagra path: #pragma omp critical once per thread (acceptable, matches float HNSW)
  • No change to search results — only stats accumulation mechanism changes

Reviewed By: mnorris11

Differential Revision: D95911440

@meta-cla meta-cla Bot added the CLA Signed label Mar 11, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented Mar 11, 2026

@sharm235 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95911440.

@meta-codesync meta-codesync Bot changed the title Refactor binary HNSW stats to use OpenMP reduction instead of destructor sync Refactor binary HNSW stats to use OpenMP reduction instead of destructor sync (#4910) Mar 13, 2026
sharm235 pushed a commit to sharm235/faiss that referenced this pull request Mar 13, 2026
…tor sync (facebookresearch#4910)

Summary:

## Problem

Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`).

The previous approach had two issues:
1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm
2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats`

## Changes

### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction`
- Capture the `HNSWStats` returned by `hnsw.search()` per query
- Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention
- Call `hnsw_stats.combine()` once, single-threaded, after the parallel region
- This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291)

### `FlatHammingDis` — remove stats accumulation entirely
- Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking)
- Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead)
- Remove destructor (no stats to accumulate → no sync needed)

### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation
- The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost
- Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461)

## Performance Impact
- **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free)
- Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()`
- `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW)
- No change to search results — only stats accumulation mechanism changes

Reviewed By: mnorris11

Differential Revision: D95911440
sharm235 pushed a commit to sharm235/faiss that referenced this pull request Mar 13, 2026
…tor sync (facebookresearch#4910)

Summary:

## Problem

Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`).

The previous approach had two issues:
1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm
2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats`

## Changes

### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction`
- Capture the `HNSWStats` returned by `hnsw.search()` per query
- Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention
- Call `hnsw_stats.combine()` once, single-threaded, after the parallel region
- This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291)

### `FlatHammingDis` — remove stats accumulation entirely
- Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking)
- Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead)
- Remove destructor (no stats to accumulate → no sync needed)

### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation
- The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost
- Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461)

## Performance Impact
- **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free)
- Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()`
- `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW)
- No change to search results — only stats accumulation mechanism changes

Reviewed By: mnorris11

Differential Revision: D95911440
sharm235 pushed a commit to sharm235/faiss that referenced this pull request Mar 13, 2026
…tor sync (facebookresearch#4910)

Summary:

## Problem

Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`).

The previous approach had two issues:
1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm
2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats`

## Changes

### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction`
- Capture the `HNSWStats` returned by `hnsw.search()` per query
- Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention
- Call `hnsw_stats.combine()` once, single-threaded, after the parallel region
- This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291)

### `FlatHammingDis` — remove stats accumulation entirely
- Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking)
- Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead)
- Remove destructor (no stats to accumulate → no sync needed)

### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation
- The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost
- Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461)

## Performance Impact
- **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free)
- Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()`
- `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW)
- No change to search results — only stats accumulation mechanism changes

Reviewed By: mnorris11

Differential Revision: D95911440
sharm235 pushed a commit to sharm235/faiss that referenced this pull request Mar 13, 2026
…tor sync (facebookresearch#4910)

Summary:
Pull Request resolved: facebookresearch#4910

## Problem

Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`).

The previous approach had two issues:
1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm
2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats`

## Changes

### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction`
- Capture the `HNSWStats` returned by `hnsw.search()` per query
- Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention
- Call `hnsw_stats.combine()` once, single-threaded, after the parallel region
- This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291)

### `FlatHammingDis` — remove stats accumulation entirely
- Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking)
- Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead)
- Remove destructor (no stats to accumulate → no sync needed)

### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation
- The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost
- Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461)

## Performance Impact
- **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free)
- Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()`
- `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW)
- No change to search results — only stats accumulation mechanism changes

Reviewed By: mnorris11

Differential Revision: D95911440
@sharm235 sharm235 force-pushed the export-D95911440 branch 2 times, most recently from 9a89b7d to e3ddfed Compare March 13, 2026 05:34
sharm235 pushed a commit to sharm235/faiss that referenced this pull request Mar 13, 2026
…tor sync (facebookresearch#4910)

Summary:

## Problem

Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`).

The previous approach had two issues:
1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm
2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats`

## Changes

### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction`
- Capture the `HNSWStats` returned by `hnsw.search()` per query
- Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention
- Call `hnsw_stats.combine()` once, single-threaded, after the parallel region
- This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291)

### `FlatHammingDis` — remove stats accumulation entirely
- Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking)
- Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead)
- Remove destructor (no stats to accumulate → no sync needed)

### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation
- The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost
- Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461)

## Performance Impact
- **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free)
- Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()`
- `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW)
- No change to search results — only stats accumulation mechanism changes

Reviewed By: mnorris11

Differential Revision: D95911440
sharm235 pushed a commit to sharm235/faiss that referenced this pull request Mar 13, 2026
…tor sync (facebookresearch#4910)

Summary:

## Problem

Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`).

The previous approach had two issues:
1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm
2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats`

## Changes

### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction`
- Capture the `HNSWStats` returned by `hnsw.search()` per query
- Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention
- Call `hnsw_stats.combine()` once, single-threaded, after the parallel region
- This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291)

### `FlatHammingDis` — remove stats accumulation entirely
- Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking)
- Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead)
- Remove destructor (no stats to accumulate → no sync needed)

### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation
- The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost
- Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461)

## Performance Impact
- **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free)
- Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()`
- `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW)
- No change to search results — only stats accumulation mechanism changes

Reviewed By: mnorris11

Differential Revision: D95911440
…tor sync (facebookresearch#4910)

Summary:
Pull Request resolved: facebookresearch#4910

## Problem

Follow-up to the previous fix (replacing `#pragma omp critical` with `#pragma omp atomic` in `FlatHammingDis::~FlatHammingDis`). This commit implements the architecturally correct fix by matching the pattern already used in the float HNSW path (`IndexHNSW.cpp`).

The previous approach had two issues:
1. Stats accumulation in a destructor is fragile — the timing and frequency of destructor calls depends on the caller's object lifetime management, not on the search algorithm
2. `FlatHammingDis` maintained a redundant `ndis` counter (incremented on every `operator()` call) that duplicated what `HNSW::search()` already tracks internally via its returned `HNSWStats`

## Changes

### `IndexBinaryHNSW::search()` — use `#pragma omp for reduction`
- Capture the `HNSWStats` returned by `hnsw.search()` per query
- Accumulate `n1, n2, ndis, nhops` via `#pragma omp for reduction(+: ...)`, which uses lock-free per-thread copies merged at barrier — zero contention
- Call `hnsw_stats.combine()` once, single-threaded, after the parallel region
- This exactly matches the pattern in `hnsw_search()` in `IndexHNSW.cpp` (lines 259-291)

### `FlatHammingDis` — remove stats accumulation entirely
- Remove `size_t ndis` field (redundant with `HNSW::search()` internal tracking)
- Remove `ndis++` from `operator()` (eliminates per-distance-computation overhead)
- Remove destructor (no stats to accumulate → no sync needed)

### `IndexBinaryHNSWCagra::search()` — fix missing stats aggregation
- The `base_level_only` path's second parallel block (`search_level_0`) accumulated stats into a per-thread `HNSWStats search_stats` but never aggregated it into `hnsw_stats` — this was a pre-existing bug where stats were silently lost
- Added `#pragma omp critical { hnsw_stats.combine(search_stats); }` once per thread at the end of the parallel block, matching `IndexHNSW::search_level_0_wrapper()` (line 461)

## Performance Impact
- **Zero lock contention** in the main `IndexBinaryHNSW::search()` path (reduction is lock-free)
- Removes per-distance-computation `ndis++` overhead from `FlatHammingDis::operator()`
- `IndexBinaryHNSWCagra` path: `#pragma omp critical` once per thread (acceptable, matches float HNSW)
- No change to search results — only stats accumulation mechanism changes

Reviewed By: mnorris11

Differential Revision: D95911440
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented Mar 13, 2026

This pull request has been merged in 5dcef18.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant