PyWoWLib provides Python bindings for the C++ WoWIndex library, enabling efficient approximate nearest neighbor (ANN) search with attribute filtering capabilities. This implementation uses pybind11 and focuses on indices where attributes are restricted to predefined Plain Old Data (POD) types (integers, floats, fixed-size strings) chosen at index creation time.
- Fast ANN Search: Leverages the underlying C++
WoWIndexfor efficient similarity search. - Attribute Filtering: Supports filtering search results based on associated attributes during KNN search using range, set, or bitset filters.
- Predefined POD Attribute Types: Ensures type safety and efficient memory handling by requiring attributes to be one of several supported C++ POD types (
int32,int64,float,double, fixed-size strings, etc.). - Fixed-Size String Support: Includes bindings for
FixedString<N>types for safe handling of string attributes within the C++ core. - Simplified Python API: Provides a factory function (
pywowlib.WoWIndex) to create index instances by specifying the desired attribute type as a string (e.g.,"int32","string16"). - Batch Insertion: Supports efficient, parallel insertion of multiple vectors using OpenMP (requires compiler support).
- Filter Factories: Helper functions (
pywowlib.WoWRangeFilter,pywowlib.WoWSetFilter) to create filters corresponding to the index's attribute type.
- Python: 3.7 or higher.
- C++ Compiler: A modern C++ compiler supporting C++17.
- CMake: Recommended for managing the build process (pybind11 integrates well with it).
- (Optional but Recommended) OpenMP: Required for parallel batch insertion (
bulk_insert). Ensure your compiler supports it and the necessary runtime libraries are installed.- Linux: Often included with GCC/Clang (e.g.,
libgomp1). - macOS: Install via Homebrew (
brew install libomp). - Windows: Included with Visual Studio C++ workloads.
- Linux: Often included with GCC/Clang (e.g.,
- Pip: Python package installer.
Install necessary Python packages for building:
pip install pybind11>=2.6 numpy setuptools wheel-
Clone the repository:
git clone https://github.com/nju-websoft/WoW.git cd WoW/python -
Install the package:
pip install . -v- The
-vflag provides verbose output, useful for seeing compiler messages and confirming OpenMP flags are used. - This command invokes
setup.py, compiles the C++ extension, and installs thepywowlibpackage into your Python environment.
For development: Use an editable install to make changes to the Python wrapper (
__init__.py) effective immediately (C++ changes still require reinstalling/rebuilding):pip install -e . -v - The
import pywowlib
import numpy as np
# 1. Define Index Parameters
num_items = 1000
dim = 64
attribute_type = "int32" # Choose the attribute type for this index
# 2. Create Index using the factory
index = pywowlib.WoWIndex(
max_elements=num_items + 10, # Allow some extra space
vec_d=dim,
M=16, # Maximum number of connections per node
efc=100, # Beam search width during construction
o=4, # Window boosting base, suggested to be 2 or 4
space_name='l2', # Distance space ('l2', 'ip', 'cosine')
att_type=attribute_type
)
print(f"Created index with attribute type: {attribute_type}")
# 3. Prepare Data
# Vector IDs (labels) should be 0 to N-1
vector_ids = list(range(num_items))
# Vectors (NumPy array, correct dtype)
vectors = np.random.rand(num_items, dim).astype(np.float32)
# Attributes (must match att_type, e.g., Python ints for "int32")
attributes = [np.int32(i % 50) for i in range(num_items)]
# 4. Insert Data (using parallel batch insert)
print("Inserting data...")
index.bulk_insert(vector_ids, vectors, attributes, threads=4) # Use 'bulk_insert'
print("Insertion complete.")
# 5. Prepare Query and Filter
query_vector = np.random.rand(dim).astype(np.float32)
# Create a filter matching the attribute type
# Find items where attribute (int32) is between 10 and 20
my_filter = pywowlib.WoWRangeFilter(
att_type=attribute_type,
lower_bound=np.int32(10),
upper_bound=np.int32(20)
)
# 6. Search
k = 5 # Number of neighbors
efs = 50 # Beam search width for the search phase, should be larger than k.
print("\nSearching...")
results = index.searchKNN(query_vector, efs=efs, k=k, filter=my_filter)
# 7. Process Results
print("\nSearch Results ([Distance, Vector ID]):")
if results:
for distance, vec_id in results:
print(f" [{distance:.4f}, {vec_id}]")
else:
print(" No results found matching the filter.")
# 8. Save Index (Optional)
# index.save("./my_index.wow")Use the factory function to create or load an index instance.
index = pywowlib.WoWIndex(
max_elements: int, # Max number of vectors the index can hold
vec_d: int, # Dimension of the vectors
M: int, # WoW parameter (connections per node)
efc: int, # WoW parameter (efConstruction)
space_name: str, # Distance space: 'l2', 'ip', 'cosine'
att_type: str, # Specifies the C++ attribute type (see below)
o: int = 4, # (Not recommended to set manually) WoW parameter, window boosting base
wp: int = 11, # (Not recommended to set manually) WoW parameter, expected number of windows
auto_raise_wp: bool = True, # (Not recommended to set manually) WoW parameter, auto-raise window count according to the number of inserted vectors by calculating log_o(max_elements/2)
)
index = pywowlib.WoWIndexLoad(
location: str, # Path to the saved index file
att_type: str, # Must match the type of the saved index!
space_name: str, # Must match the space of the saved index
)att_type(string): Specifies the C++ POD type used for storing and filtering attributes. This choice determines which underlying C++ index specialization is created. Supported values:"int32":int32_t"int64":int64_t"uint32":uint32_t"uint64":uint64_t"float32"or"float":float"float64"or"double":double"string16": Fixed-size string (16 bytes incl. null terminator)."string32": Fixed-size string (32 bytes incl. null terminator)."label": Uses the internalLabelType(typicallyuint64_t) as the attribute (equivalent to"uint64"ifLabelTypeisuint64_t). Creates the same index type asWoWIndexSequentialinternally.
location(string, optional): If provided, attempts to load a previously saved index from the given path instead of creating a new one.- Warning: When loading, you must still provide an
att_typestring that exactly matches the C++ attribute type the index was originally saved with. This factory cannot verify the type from the file. Using the wrongatt_typeduring load will lead to undefined behavior or crashes.
- Warning: When loading, you must still provide an
The att_type chosen during index creation dictates the type of attribute data you pass to insert and use in filters.
- Numeric Types: Pass standard Python
intorfloat. Pybind11 handles conversion to the corresponding C++ type (e.g., Pythonint-> C++int32_tifatt_type="int32"). Be mindful of potential overflow if Python integers exceed the C++ type's range. - String Types (
"stringN"): Pass standard Pythonstr.- These map to C++
FixedString<N>. - Strings longer than
N-1characters will be silently truncated during insertion or when used in filters. - You can also explicitly create
FixedStringobjects in Python if needed:fs16 = pywowlib.FixedString16("my_tag") print(fs16.value) # Output: my_tag print(fs16.capacity()) # Output: 16 fs16_long = pywowlib.FixedString16("this_string_is_definitely_longer_than_15") print(fs16_long.value) # Output: this_string_is_ (truncated)
- These map to C++
-
Single Insert:
index.insert( label: int, # Vector ID (typically 0 to max_elements-1) vector: np.ndarray, # 1D NumPy array, dtype=np.float32, shape=(vec_d,) attribute: Any, # Python object matching the index's 'att_type' # (e.g., int for "int32", str for "string16") replace_deleted: bool = False # Optional flag (behavior defined by C++ core) )
-
Batch Parallel Insert: Uses OpenMP in the C++ binding layer for parallelism.
WoWIndex::insertmethod is thread-safe.index.bulk_insert( # Renamed from insert_batch_parallel for better Python feel vector_ids: List[int], # List of Vector IDs vectors_batch: np.ndarray, # 2D NumPy array, shape=(N, vec_d), dtype=np.float32 attributes_batch: List[Any], # List of attributes matching 'att_type' (len=N) replace_deleted: bool = False, # Optional flag threads: int = 4 # Number of OpenMP threads to suggest )
Use factory functions matching the index att_type.
-
Range Filter: Selects items whose attribute falls within
[lower_bound, upper_bound].range_filter = pywowlib.WoWRangeFilter( att_type: str, # Must match index's att_type lower_bound: Any, # Value matching att_type upper_bound: Any # Value matching att_type ) # Example: # filter_int = pywowlib.WoWRangeFilter("int32", 10, 20) # filter_str = pywowlib.WoWRangeFilter("string16", "cat", "dog")
-
Set Filter: Selects items whose attribute is present in the specified set.
set_filter = pywowlib.WoWSetFilter( att_type: str # Must match index's att_type ) set_filter.add(attribute_value: Any) # Add allowed attributes (must match att_type) # Example: # filter_float = pywowlib.WoWSetFilter("float32") # filter_float.add(1.0) # filter_float.add(3.14) # print(filter_float.allowed_set) # View the set (read-only property)
-
Bitset Label Filter: Filters directly by Vector ID (label), independent of
att_type. Useful for excluding specific known items or only searching within a subset of IDs.
Attention!!! Bitset label filters are only compatible with indices created with att_type="label". And is created by inserting labels as attribute values. They are not compatible with other attribute types and the result quality is not guaranteed.
```python
bitset_filter = pywowlib.WoWBitsetLabelFilter(
max_label: int # Max Vector ID expected + 1
)
bitset_filter.set(vector_id_to_allow: int)
bitset_filter.reset(vector_id_to_disallow: int)
# bitset_filter.test(vector_id: int) -> bool
# bitset_filter.clear()
```
results: List[Tuple[float, int]] = index.searchKNN(
query_vec: np.ndarray, # 1D NumPy array, shape=(vec_d,), dtype=np.float32
efs: int, # HNSW search parameter (exploration factor)
k: int, # Number of nearest neighbors to return
filter: Any # A filter object created via factories or WoWBitsetLabelFilter
)- Returns a list of
(distance, vector_id)tuples, sorted by distance. - The
filterobject must be compatible with the index (e.g., useWoWRangeFilter("int32", ...)with an index created usingatt_type="int32"). Passing an incompatible filter will likely result in a C++ runtime error or incorrect results. - Passing
filter=Nonegives you the nearest neighbors without any filtering, the performance is equivalent to searching on the bottom level of the HNSW graph.
- Save:
index.save("path/to/my_index.wow")
- Load: Use the
locationargument in thepywowlib.WoWIndexfactory.loaded_index = pywowlib.WoWIndexLoad( location="path/to/my_index.wow", att_type="int32", # MUST match the type of the saved index! space_name="l2" # Must match space of the saved index # Other parameters like max_elements, vec_d, M, etc. are read from the file )
Please check wow/example/example_arbitrary,sequential.py for a complete example demonstrating the usage of the library, including index creation, insertion, filtering, and searching.
If you need to build the package locally (e.g., for development or if a pre-built wheel is not available for your platform):
- Ensure all prerequisites listed under Installation are met.
- Clone the repository.
- Navigate to the root directory (containing
setup.py). - (Optional) Clean previous builds:
rm -rf build dist pywowlib.egg-info pywowlib/_pywowlib_core*.so python setup.py clean --all - Build the extension:
- Editable install (recommended for development):
pip install -e . -v - Standard install:
pip install . -v - Build extension in place (for testing without installing):
(Make sure your Python path includes the current directory or the
python setup.py build_ext --inplace
pywowlibsubdirectory).
- Editable install (recommended for development):
Check the build output carefully for any C++ compiler warnings or errors.