Commit b87726a

SuperKMeans and Full Refactor (#12)
* Bye BOND and NARY
* Working refactor
* U8 works
* IVF2 working
* IVF2 working
* Adding SuperKMeans
* Build Index done
* End to end benchmark that creates indexes from scratch
* Refactor structure
* Serialize and deserialize
* Removing old benchmarks
* Refactoring bindings
* Get used memory in bytes
* Get in memory size
* Format fix
* FFTW working but deactivating for now
* Filtered search working
* Test suite
* Fixing bug with PDXTree
* Git Ignore
* Reintroducing FAISS and some readme stuff
* Hierarchical KMeans by default
* Fixing examples
* Fixing examples
* New PDX
* Compile benchmarks flag
* Commiting to hierarchical and cleaning a bit
* Fixing FFTW bug and enhancing performance
* Bumping SuperKMeans
* Adding cohere dataset
* Adding cohere dataset
* AVX512 fix
* AVX512 fix
* AVX512 fix
* Optimizing index creation
* Bump superkmeans
* Fixing unaligned pointer
* Final results and new README
* Removing benchmarks
1 parent 4a2e65e commit b87726a

121 files changed: +8970 -7951 lines changed
.clang-format

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+BasedOnStyle: LLVM
+
+# Indentation
+IndentWidth: 4
+TabWidth: 4
+
+# Braces
+BreakBeforeBraces: Attach
+AllowShortFunctionsOnASingleLine: InlineOnly
+
+ColumnLimit: 100
+
+# Pointer/reference alignment
+PointerAlignment: Left
+AlignConsecutiveAssignments: false
+AlignConsecutiveDeclarations: false
+AlignAfterOpenBracket: BlockIndent
+AlwaysBreakTemplateDeclarations: Yes
+
+# Spaces
+SpaceBeforeParens: ControlStatements
+SpaceAfterCStyleCast: true
+SpacesInParentheses: false
+
+
+BinPackParameters: false
+AllowAllParametersOfDeclarationOnNextLine: false
+AlwaysBreakAfterReturnType: None
+PenaltyReturnTypeOnItsOwnLine: 1024
+
+BinPackArguments: false
+AllowAllArgumentsOnNextLine: true

.clang-tidy

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
Checks: >
2+
-*,
3+
bugprone-*,
4+
clang-analyzer-*,
5+
cppcoreguidelines-virtual-class-destructor,
6+
modernize-pass-by-value,
7+
modernize-use-emplace,
8+
modernize-use-nullptr,
9+
modernize-use-override,
10+
modernize-use-using,
11+
performance-*,
12+
readability-redundant-*,
13+
-bugprone-easily-swappable-parameters,
14+
-performance-avoid-endl
15+
16+
WarningsAsErrors: ''
17+
18+
CheckOptions:
19+
- key: bugprone-narrowing-conversions.WarnOnEquivalentBitWidth
20+
value: false
21+
22+
HeaderFilterRegex: 'include/superkmeans/.*'

.github/workflows/ci.yml

Lines changed: 154 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,154 @@
1+
name: CI
2+
3+
on:
4+
push:
5+
branches: [main]
6+
paths-ignore:
7+
- '**.md'
8+
- 'LICENSE'
9+
- '.gitignore'
10+
pull_request:
11+
branches: [main]
12+
paths-ignore:
13+
- '**.md'
14+
- 'LICENSE'
15+
- '.gitignore'
16+
17+
jobs:
18+
format-check:
19+
runs-on: ubuntu-24.04
20+
steps:
21+
- uses: actions/checkout@v4
22+
23+
- uses: actions/setup-python@v5
24+
with:
25+
python-version: "3.12"
26+
27+
- name: Install clang-format 18.1.8
28+
run: pip install clang-format==18.1.8
29+
30+
- name: Check C++ formatting
31+
run: |
32+
clang-format --version
33+
./scripts/format_check.sh
34+
35+
tidy-check:
36+
runs-on: ubuntu-24.04
37+
env:
38+
CC: clang-18
39+
CXX: clang++-18
40+
steps:
41+
- uses: actions/checkout@v4
42+
with:
43+
submodules: recursive
44+
45+
- name: Install dependencies
46+
run: |
47+
sudo apt-get update
48+
sudo apt-get install -y clang-18 clang-tidy-18 libomp-18-dev libopenblas-dev cmake
49+
sudo ln -sf /usr/bin/clang-tidy-18 /usr/local/bin/clang-tidy
50+
51+
- name: Configure
52+
run: cmake -B build -DPDX_COMPILE_TESTS=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
53+
54+
- name: Build
55+
run: cmake --build build -j$(nproc)
56+
57+
- name: Run clang-tidy
58+
run: |
59+
ln -s build/compile_commands.json compile_commands.json
60+
./scripts/tidy_check.sh
61+
62+
cpp-build-and-test:
63+
runs-on: ubuntu-24.04
64+
env:
65+
CC: clang-18
66+
CXX: clang++-18
67+
steps:
68+
- uses: actions/checkout@v4
69+
with:
70+
submodules: recursive
71+
72+
- name: Install dependencies
73+
run: |
74+
sudo apt-get update
75+
sudo apt-get install -y clang-18 libomp-18-dev libopenblas-dev cmake
76+
77+
- name: Configure
78+
run: cmake -B build -DPDX_COMPILE_TESTS=ON -DCMAKE_BUILD_TYPE=Release
79+
80+
- name: Build tests
81+
run: cmake --build build -j$(nproc) --target tests
82+
83+
- name: Run tests
84+
run: ctest --test-dir build --output-on-failure
85+
86+
python:
87+
runs-on: ubuntu-24.04
88+
env:
89+
CC: clang-18
90+
CXX: clang++-18
91+
steps:
92+
- uses: actions/checkout@v4
93+
with:
94+
submodules: recursive
95+
96+
- name: Install system dependencies
97+
run: |
98+
sudo apt-get update
99+
sudo apt-get install -y clang-18 libomp-18-dev libopenblas-dev cmake
100+
101+
- uses: actions/setup-python@v5
102+
with:
103+
python-version: "3.12"
104+
105+
- name: Install Python bindings
106+
run: pip install .
107+
108+
- name: Verify import
109+
run: python -c "import pdxearch; print('pdxearch imported successfully')"
110+
111+
sanitizers-asan-ubsan:
112+
runs-on: ubuntu-24.04
113+
env:
114+
CC: clang-18
115+
CXX: clang++-18
116+
steps:
117+
- uses: actions/checkout@v4
118+
with:
119+
submodules: recursive
120+
121+
- name: Install dependencies
122+
run: |
123+
sudo apt-get update
124+
sudo apt-get install -y clang-18 libomp-18-dev libopenblas-dev cmake
125+
126+
- name: Configure with ASan + UBSan
127+
run: |
128+
cmake -B build_asan -DPDX_COMPILE_TESTS=ON \
129+
-DCMAKE_BUILD_TYPE=Debug \
130+
-DCMAKE_CXX_FLAGS="-fsanitize=address,undefined -fno-omit-frame-pointer" \
131+
-DCMAKE_C_FLAGS="-fsanitize=address,undefined -fno-omit-frame-pointer"
132+
133+
- name: Build tests
134+
run: cmake --build build_asan -j$(nproc) --target tests
135+
136+
- name: Run tests
137+
run: ctest --test-dir build_asan --output-on-failure
138+
139+
ci-pass:
140+
runs-on: ubuntu-latest
141+
if: always()
142+
needs:
143+
- format-check
144+
- tidy-check
145+
- cpp-build-and-test
146+
- python
147+
- sanitizers-asan-ubsan
148+
steps:
149+
- name: Check all jobs passed
150+
run: |
151+
if [[ "${{ contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled') }}" == "true" ]]; then
152+
echo "One or more jobs failed or were cancelled"
153+
exit 1
154+
fi
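
The `ci-pass` job above acts as a single gate over the five other jobs. A minimal shell sketch of the same decision rule follows. The function name `ci_gate` is hypothetical and not part of the commit; it takes job results as arguments and fails if any of them is `failure` or `cancelled`, which is what the `contains(needs.*.result, ...)` expression checks.

```shell
# Hypothetical helper (not in the repository) mirroring the ci-pass expression:
#   contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
ci_gate() {
    for result in "$@"; do
        case "$result" in
            failure | cancelled)
                echo "One or more jobs failed or were cancelled"
                return 1
                ;;
        esac
    done
    return 0
}

# All five jobs succeeded: the gate passes.
ci_gate success success success success success && echo "gate passes"

# A skipped job is neither 'failure' nor 'cancelled', so the gate still passes.
ci_gate success skipped success success success && echo "gate passes with a skipped job"

# Any single failure trips the gate.
ci_gate success failure success success success || echo "gate fails"
```

Note that under this rule a `skipped` result does not fail the gate. The `if: always()` on `ci-pass` matters for the same reason: without it, the gate job itself would be skipped whenever one of its `needs` fails, and the check would never run.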

.gitignore

Lines changed: 16 additions & 24 deletions
@@ -42,6 +42,11 @@ venv
 /cmake-build-debug/
 /cmake-build-release/
 /cmake-build*/
+/build_debug/
+/build_release/
+/build_relwithdebinfo/
+/tests/cmake_test_discovery_*.json
+compile_commands.json
 /.idea/
 /dummy/
 /Testing/
@@ -62,10 +67,10 @@ pdxearch.egg-info
 /benchmarks/core_indexes/faiss_l0/*
 !/benchmarks/core_indexes/faiss_l0/*.json
 
-/benchmarks/datasets/adsampling_nary
 /benchmarks/datasets/adsampling_pdx
 /benchmarks/datasets/downloaded
-/benchmarks/datasets/nary
+/benchmarks/datasets/raw
+/benchmarks/datasets/faiss
 /benchmarks/datasets/pdx
 /benchmarks/datasets/purescan
 /benchmarks/datasets/queries
@@ -91,25 +96,12 @@ cmake_install.cmake
 /benchmarks/milvus/volumes/
 /benchmarks/python_scripts/indexes
 
-/benchmarks/BenchmarkNaryIVFADSampling
-/benchmarks/BenchmarkNaryIVFADSamplingSIMD
-/benchmarks/BenchmarkPDXADSampling
-/benchmarks/BenchmarkIVF2ADSampling
-/benchmarks/FilteredBenchmarkPDXADSampling
-/benchmarks/FilteredBenchmarkU8IVF2ADSampling
-/benchmarks/BenchmarkASYM_U8PDXADSampling
-/benchmarks/BenchmarkU8PDXADSampling
-/benchmarks/BenchmarkLEP8PDXADSampling
-/benchmarks/BenchmarkPDXIVFBOND
-/benchmarks/BenchmarkPDXBOND
-/benchmarks/BenchmarkPDXLinearScan
-/benchmarks/BenchmarkU*
-/benchmarks/G4*
-/benchmarks/BenchmarkNaryIVFLinearScan
-/benchmarks/KernelPDXL1
-/benchmarks/KernelPDXL2
-/benchmarks/KernelPDXIP
-/benchmarks/KernelNaryL1
-/benchmarks/KernelNaryL2
-/benchmarks/KernelNaryIP
-/benchmarks/BenchmarkU8*
+/benchmarks/BenchmarkEndToEnd
+/benchmarks/BenchmarkSerialization
+/benchmarks/BenchmarkPDXIVF
+/benchmarks/BenchmarkFiltered
+/benchmarks/BenchmarkSpecialFilters
+
+# Test binaries (but keep the committed test data)
+*.bin
+!tests/test_data.bin
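
The last two rules rely on `.gitignore` negation: `*.bin` ignores every binary, and the later `!tests/test_data.bin` re-includes one file. The sketch below checks that interaction in a throwaway repository, assuming `git` is installed. The file `tests/scratch.bin` and the helper `is_ignored` are made up for illustration and do not appear in the commit.

```shell
# Build a scratch repository containing only the two rules in question.
repo=$(mktemp -d)
git init -q "$repo"
printf '%s\n' '*.bin' '!tests/test_data.bin' > "$repo/.gitignore"
mkdir -p "$repo/tests"
touch "$repo/tests/test_data.bin" "$repo/tests/scratch.bin"

# `git check-ignore -q` exits 0 when the path is ignored and 1 when it is not.
is_ignored() {
    git -C "$repo" check-ignore -q "$1"
}

for path in tests/scratch.bin tests/test_data.bin; do
    if is_ignored "$path"; then
        echo "$path: ignored"
    else
        echo "$path: not ignored"
    fi
done
```

With these two rules, `tests/scratch.bin` should be reported as ignored and `tests/test_data.bin` as not ignored. The order matters: the negated pattern has to come after `*.bin`, because the last matching pattern decides the outcome.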

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -8,3 +8,6 @@
 [submodule "extern/findFFTW"]
 	path = extern/findFFTW
 	url = https://github.com/egpbos/findfftw.git
+[submodule "extern/SuperKMeans"]
+	path = extern/SuperKMeans
+	url = https://github.com/lkuffo/SuperKMeans
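
Because SuperKMeans arrives as a new submodule, clones made before this commit will most likely need to fetch it explicitly; the CI jobs do this with `submodules: recursive` on checkout. The sketch below reads the new entry back with `git config -f`, using a temporary copy of the two entries shown in the diff rather than the real `.gitmodules`. The variable names are illustrative only.

```shell
# Write the two submodule entries from the diff into a temporary file.
gitmodules=$(mktemp)
cat > "$gitmodules" <<'EOF'
[submodule "extern/findFFTW"]
    path = extern/findFFTW
    url = https://github.com/egpbos/findfftw.git
[submodule "extern/SuperKMeans"]
    path = extern/SuperKMeans
    url = https://github.com/lkuffo/SuperKMeans
EOF

# .gitmodules uses git-config syntax, so `git config -f` can query it directly.
superkmeans_url=$(git config -f "$gitmodules" --get submodule.extern/SuperKMeans.url)
echo "SuperKMeans URL: $superkmeans_url"

# In an existing clone, the submodule itself would then be fetched with:
#   git submodule update --init --recursive
```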
