Indexing system for efficiently finding the nearest embeddings
TFSimilarity.indexer.Indexer(
embedding_size: int,
embedding_output: int = None,
stat_buffer_size: int = 1000
) -> None
The indexer indexes known embeddings and makes them searchable using an [Approximate Nearest Neighbors Search](https://en.wikipedia.org/wiki/Nearest_neighbor_search), implemented via the Search() classes, with associated data lookup via the Store() classes.
The indexer can also evaluate the quality of the constructed index and calibrate the SimilarityModel.match() function via the Evaluator() classes.
Args | |
---|---|
embedding_size | Size of the embeddings that will be stored. It is usually equal to the size of the output layer. |
distance | Distance used to compute embedding proximity. Defaults to 'cosine'. |
kv_store | How to store the indexed records. Defaults to 'memory'. |
search | Which Search() framework to use to perform KNN search. Defaults to 'nmslib'. |
evaluator | What type of Evaluator() to use to evaluate index performance. Defaults to the in-memory one. |
embedding_output | Which model output head predicts the embeddings that should be indexed. Defaults to None, which is for single-output models. For multi-head models, the caller, usually the SimilarityModel() class, is responsible for passing the correct one. |
stat_buffer_size | Size of the sliding window buffer used to compute index performance. Defaults to 1000. |
Raises | |
---|---|
ValueError | Invalid search framework or key value store. |
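The `stat_buffer_size` argument bounds how many recent measurements feed the performance statistics. A minimal sketch of such a sliding window, assuming a `collections.deque` with `maxlen`; the `StatWindow` class is hypothetical and is not the library's implementation:

```python
from collections import deque


class StatWindow:
    """Hypothetical sliding window that keeps only the most recent measurements."""

    def __init__(self, stat_buffer_size: int = 1000) -> None:
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._buffer = deque(maxlen=stat_buffer_size)

    def record(self, value: float) -> None:
        self._buffer.append(value)

    def mean(self) -> float:
        # Average over the window only, not over the full history.
        return sum(self._buffer) / len(self._buffer) if self._buffer else 0.0


window = StatWindow(stat_buffer_size=3)
for value in [1.0, 2.0, 3.0, 4.0]:
    window.record(value)
window_mean = window.mean()  # averages the last three values only
```

A smaller buffer makes the reported statistics react faster to recent behavior, while a larger one smooths them out.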
add(
prediction: <a href="../../TFSimilarity/callbacks/FloatTensor.md">TFSimilarity.callbacks.FloatTensor</a>,
label: Optional[int] = None,
data: <a href="../../TFSimilarity/callbacks/Tensor.md">TFSimilarity.callbacks.Tensor</a> = None,
build: bool = True,
verbose: int = 1
)
Add a single embedding to the indexer
Args | |
---|---|
prediction | TF similarity model prediction, may be a multi-headed output. |
label | Label(s) associated with the embedding. Defaults to None. |
data | Input data associated with the embedding. Defaults to None. |
build | Rebuild the index after insertion. Defaults to True. Set it to False to add multiple batches/points and build the index manually once afterward. |
verbose | Display progress if set to 1. Defaults to 1. |
batch_add(
predictions: <a href="../../TFSimilarity/callbacks/FloatTensor.md">TFSimilarity.callbacks.FloatTensor</a>,
labels: Optional[Sequence[int]] = None,
data: Optional[<a href="../../TFSimilarity/callbacks/Tensor.md">TFSimilarity.callbacks.Tensor</a>] = None,
build: bool = True,
verbose: int = 1
)
Add a batch of embeddings to the indexer
Args | |
---|---|
predictions | TF similarity model predictions, may be a multi-headed output. |
labels | Labels associated with the embeddings. Defaults to None. |
data | Input data associated with the embeddings. Defaults to None. |
build | Rebuild the index after insertion. Defaults to True. Set it to False to add multiple batches/points and build the index manually once afterward. |
verbose | Display progress if set to 1. Defaults to 1. |
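The `build` flag matters when inserting many batches, since rebuilding after every call repeats work. The toy class below is a hypothetical stand-in, not the Indexer itself; its `build()` method and `num_builds` counter are assumptions made only to show the pattern of passing `build=False` and rebuilding once at the end:

```python
from typing import List, Optional, Sequence


class ToyIndex:
    """Hypothetical stand-in that counts how often the index is rebuilt."""

    def __init__(self) -> None:
        self.embeddings: List[Sequence[float]] = []
        self.labels: List[Optional[int]] = []
        self.num_builds = 0

    def batch_add(self, predictions, labels=None, build: bool = True) -> None:
        labels = labels if labels is not None else [None] * len(predictions)
        self.embeddings.extend(predictions)
        self.labels.extend(labels)
        if build:
            self.build()

    def build(self) -> None:
        # A real search backend would reconstruct its data structure here.
        self.num_builds += 1


batches = [
    ([[0.0, 1.0], [1.0, 0.0]], [0, 1]),
    ([[0.5, 0.5]], [2]),
    ([[0.9, 0.1]], [1]),
]

index = ToyIndex()
for predictions, labels in batches:
    index.batch_add(predictions, labels, build=False)  # defer the rebuild
index.build()  # single rebuild once everything is inserted
```

With the default `build=True`, the same loop would trigger one rebuild per batch instead of one in total.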
batch_lookup(
predictions: <a href="../../TFSimilarity/callbacks/FloatTensor.md">TFSimilarity.callbacks.FloatTensor</a>,
k: int = 5,
verbose: int = 1
) -> List[List[Lookup]]
Find the k closest matches for a set of embeddings
Args | |
---|---|
predictions | TF similarity model predictions, may be a multi-headed output. |
k | Number of nearest neighbors to lookup. Defaults to 5. |
verbose | Be verbose. Defaults to 1. |
Returns | |
---|---|
List of lists of the k nearest neighbors: List[List[Lookup]]. |
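To make the nested return shape concrete, here is a brute-force sketch of a k-nearest-neighbor lookup over cosine distance. It is an exact search in plain Python, not the approximate search that the indexer delegates to its Search() backend, and it assumes non-zero vectors. The `ToyLookup` tuple and its field names are assumptions made for illustration; the real Lookup type may differ.

```python
import math
from typing import List, NamedTuple, Sequence


class ToyLookup(NamedTuple):
    """Hypothetical result record; the real Lookup type may differ."""
    rank: int
    distance: float
    label: int


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def toy_batch_lookup(queries, indexed, labels, k: int = 5) -> List[List[ToyLookup]]:
    results = []
    for query in queries:
        # Score every indexed embedding, then keep the k closest.
        scored = sorted(
            (cosine_distance(query, emb), label)
            for emb, label in zip(indexed, labels)
        )
        results.append(
            [ToyLookup(rank, dist, label)
             for rank, (dist, label) in enumerate(scored[:k])]
        )
    return results


indexed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [10, 20, 30]
neighbors = toy_batch_lookup([[1.0, 0.1], [0.1, 1.0]], indexed, labels, k=2)
```

The outer list has one entry per query and each inner list holds that query's neighbors ordered from closest to farthest.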
calibrate(
predictions: <a href="../../TFSimilarity/callbacks/FloatTensor.md">TFSimilarity.callbacks.FloatTensor</a>,
target_labels: Sequence[int],
thresholds_targets: MutableMapping[str, float],
calibration_metric: Union[str, <a href="../../TFSimilarity/callbacks/ClassificationMetric.md">TFSimilarity.callbacks.ClassificationMetric</a>] = f1_score,
k: int = 1,
matcher: Union[str, <a href="../../TFSimilarity/callbacks/ClassificationMatch.md">TFSimilarity.callbacks.ClassificationMatch</a>] = match_nearest,
extra_metrics: Sequence[Union[str, ClassificationMetric]] = [precision, recall],
rounding: int = 2,
verbose: int = 1
) -> <a href="../../TFSimilarity/indexer/CalibrationResults.md">TFSimilarity.indexer.CalibrationResults</a>
Calibrate model thresholds using a test dataset.
Args | |
---|---|
predictions | TF similarity model predictions, may be a multi-headed output. |
target_labels | Sequence of the expected labels associated with the embedded queries. |
thresholds_targets | Dict of performance targets to (if possible) meet with respect to the calibration_metric. |
calibration_metric | [ClassificationMetric()](metrics/overview.md) used to evaluate the performance of the index. Defaults to f1_score. |
k | How many neighbors to use during the calibration. Defaults to 1. |
matcher | 'match_nearest', 'match_majority_vote' or ClassificationMatch object. Defines the classification matching, e.g., match_nearest will count a True Positive if the query_label is equal to the label of the nearest neighbor and the distance is less than or equal to the distance threshold. Defaults to 'match_nearest'. |
extra_metrics | List of additional tf.similarity.classification_metrics.ClassificationMetric() to compute and report. Defaults to ['precision', 'recall']. |
rounding | Metric rounding. Defaults to 2 digits. |
verbose | Be verbose and display calibration results. Defaults to 1. |
Returns | |
---|---|
CalibrationResults containing the thresholds and cutpoints Dicts. |
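One plausible way to turn `thresholds_targets` into cutpoints is sketched below: given the calibration metric measured at several distance thresholds, keep for each named target the largest distance that still meets it. The selection rule, the `pick_cutpoints` helper, and the input numbers are all assumptions for illustration; the library's actual calibration logic may differ.

```python
from typing import Dict, Mapping, Sequence


def pick_cutpoints(
    distance_thresholds: Sequence[float],
    metric_values: Sequence[float],
    thresholds_targets: Mapping[str, float],
) -> Dict[str, float]:
    """Hypothetical rule: for each named target, keep the largest distance
    threshold whose metric value still meets the target."""
    cutpoints = {}
    for name, target in thresholds_targets.items():
        candidates = [
            dist
            for dist, value in zip(distance_thresholds, metric_values)
            if value >= target
        ]
        if candidates:  # a target that is never met yields no cutpoint
            cutpoints[name] = max(candidates)
    return cutpoints


# Illustrative metric values measured at increasing distance thresholds.
distance_thresholds = [0.1, 0.2, 0.3, 0.4]
metric_values = [0.99, 0.95, 0.90, 0.70]
cutpoints = pick_cutpoints(
    distance_thresholds,
    metric_values,
    {"very_likely": 0.95, "likely": 0.80, "certain": 1.0},
)
```

Here the `certain` target is never reached, so it produces no cutpoint, which mirrors the "if possible" wording in the `thresholds_targets` description.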
evaluate_classification(
predictions: <a href="../../TFSimilarity/callbacks/FloatTensor.md">TFSimilarity.callbacks.FloatTensor</a>,
target_labels: Sequence[int],
distance_thresholds: Union[Sequence[float], <a href="../../TFSimilarity/callbacks/FloatTensor.md">TFSimilarity.callbacks.FloatTensor</a>],
metrics: Sequence[Union[str, ClassificationMetric]] = [f1],
matcher: Union[str, <a href="../../TFSimilarity/callbacks/ClassificationMatch.md">TFSimilarity.callbacks.ClassificationMatch</a>] = match_nearest,
k: int = 1,
verbose: int = 1
) -> Dict[str, np.ndarray]
Evaluate the classification performance.
Compute the classification metrics given a set of queries, lookups, and distance thresholds.
Args | |
---|---|
predictions | TF similarity model predictions, may be a multi-headed output. |
target_labels | Sequence of expected labels for the lookups. |
distance_thresholds | A 1D tensor denoting the distance points at which the metrics are computed. |
metrics | The set of classification metrics. |
matcher | 'match_nearest', 'match_majority_vote' or ClassificationMatch object. Defines the classification matching, e.g., match_nearest will count a True Positive if the query_label is equal to the label of the nearest neighbor and the distance is less than or equal to the distance threshold. |
distance_rounding | How many digits to consider when deciding whether the distance changed. Defaults to 8. |
verbose | Be verbose. Defaults to 1. |
Returns | |
---|---|
A Mapping from metric name to the list of values computed for each distance threshold. |
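The sketch below shows how a match_nearest-style rule can yield one metric value per distance threshold. The true-positive rule follows the `matcher` description above; treating a close neighbor with the wrong label as a false positive and a rejected correct neighbor as a false negative is an assumption, as is the `toy_evaluate` helper itself. It returns plain lists rather than `np.ndarray` to stay dependency-free.

```python
from typing import Dict, List, Sequence


def toy_evaluate(
    nearest_labels: Sequence[int],
    nearest_distances: Sequence[float],
    target_labels: Sequence[int],
    distance_thresholds: Sequence[float],
) -> Dict[str, List[float]]:
    """Precision and recall at each threshold using a match_nearest-style rule."""
    results: Dict[str, List[float]] = {"precision": [], "recall": []}
    for threshold in distance_thresholds:
        tp = fp = fn = 0
        for label, dist, target in zip(nearest_labels, nearest_distances, target_labels):
            within = dist <= threshold
            if within and label == target:
                tp += 1  # correct label and close enough
            elif within:
                fp += 1  # close enough but wrong label
            elif label == target:
                fn += 1  # correct label rejected by the threshold
        results["precision"].append(tp / (tp + fp) if tp + fp else 0.0)
        results["recall"].append(tp / (tp + fn) if tp + fn else 0.0)
    return results


metrics = toy_evaluate(
    nearest_labels=[1, 1, 2, 3],
    nearest_distances=[0.1, 0.3, 0.2, 0.5],
    target_labels=[1, 1, 3, 3],
    distance_thresholds=[0.15, 0.35, 0.6],
)
```

Loosening the threshold admits more neighbors, so recall rises while precision can fall, which is the trade-off that calibration explores.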
evaluate_retrieval(
predictions: <a href="../../TFSimilarity/callbacks/FloatTensor.md">TFSimilarity.callbacks.FloatTensor</a>,
target_labels: Sequence[int],
retrieval_metrics: Sequence[<a href="../../TFSimilarity/indexer/RetrievalMetric.md">TFSimilarity.indexer.RetrievalMetric</a>],
verbose: int = 1
) -> Dict[str, np.ndarray]
Evaluate the quality of the index against a test dataset.
Args | |
---|---|
predictions | TF similarity model predictions, may be a multi-headed output. |
target_labels | Sequence of the expected labels associated with the embedded queries. |
retrieval_metrics | List of [RetrievalMetric()](retrieval_metrics/overview.md) to compute. |
verbose | Display results if set to 1, otherwise results are returned silently. Defaults to 1. |
Returns | |
---|---|
Dictionary of metric results where keys are the metric names and values are the metrics values. |
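As an illustration of what a retrieval metric measures, the sketch below computes recall@k, one common choice, from the neighbor labels returned for each query. The `recall_at_k` helper and the result key names are hypothetical and do not correspond to the library's RetrievalMetric classes.

```python
from typing import Sequence


def recall_at_k(
    neighbor_labels: Sequence[Sequence[int]],
    target_labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose target label appears among the top k neighbors."""
    hits = sum(
        1
        for neighbors, target in zip(neighbor_labels, target_labels)
        if target in neighbors[:k]
    )
    return hits / len(target_labels)


# Labels of the neighbors returned for three queries, closest first.
neighbor_labels = [[1, 2, 1], [3, 2, 2], [5, 5, 4]]
target_labels = [1, 2, 4]
retrieval_results = {
    "recall@1": recall_at_k(neighbor_labels, target_labels, k=1),
    "recall@3": recall_at_k(neighbor_labels, target_labels, k=3),
}
```

The resulting dictionary has the same shape as the documented return value: metric names as keys and metric values as values.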
get_calibration_metric()
<b>@staticmethod</b>
load(
path: Union[str, <a href="../../TFSimilarity/callbacks/Path.md">TFSimilarity.callbacks.Path</a>],
verbose: int = 1
)
Load Index data from a checkpoint and initialize underlying structure with the reloaded data.
Args | |
---|---|
path | Directory where the checkpoint is located. |
verbose | Be verbose. Defaults to 1. |
Returns | |
---|---|
Initialized index |
match(
predictions: <a href="../../TFSimilarity/callbacks/FloatTensor.md">TFSimilarity.callbacks.FloatTensor</a>,
no_match_label: int = -1,
k=1,
matcher: Union[str, <a href="../../TFSimilarity/callbacks/ClassificationMatch.md">TFSimilarity.callbacks.ClassificationMatch</a>] = match_nearest,
verbose: int = 1
) -> Dict[str, List[int]]
Match embeddings against the various cutpoint thresholds
Args | |
---|---|
predictions | TF similarity model predictions, may be a multi-headed output. |
no_match_label | What label value to assign when there is no match. Defaults to -1. |
k | How many neighbors to use during the matching. Defaults to 1. |
matcher | 'match_nearest', 'match_majority_vote' or ClassificationMatch object. Defines the classification matching, e.g., match_nearest will count a True Positive if the query_label is equal to the label of the nearest neighbor and the distance is less than or equal to the distance threshold. |
verbose | Display progress. Defaults to 1. |
- It is up to the SimilarityModel.match() code to decide which of the cutpoint results to use or show to users. This function returns all of them, as there is little performance downside to doing so and it makes the code clearer and simpler.
- The calling function is responsible for returning the list of matched classes, which allows implementations to apply additional criteria if they choose to.
Returns | |
---|---|
Dict of cutpoint names mapped to lists of matches. |
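The returned structure can be sketched as follows: for every cutpoint, each query keeps the label of its nearest neighbor if that neighbor is within the cutpoint distance, and gets `no_match_label` otherwise. The `toy_match` helper, the cutpoint names, and the assumption that a cutpoint is a single maximum distance are illustrative only.

```python
from typing import Dict, List, Mapping, Sequence


def toy_match(
    nearest_labels: Sequence[int],
    nearest_distances: Sequence[float],
    cutpoints: Mapping[str, float],
    no_match_label: int = -1,
) -> Dict[str, List[int]]:
    """For every cutpoint, keep the nearest label only if it is close enough."""
    return {
        name: [
            label if dist <= max_distance else no_match_label
            for label, dist in zip(nearest_labels, nearest_distances)
        ]
        for name, max_distance in cutpoints.items()
    }


matches = toy_match(
    nearest_labels=[7, 3, 7],
    nearest_distances=[0.05, 0.25, 0.60],
    cutpoints={"strict": 0.1, "loose": 0.3},
)
```

Returning the results for every cutpoint at once leaves the choice of which one to surface to the caller, as the notes above describe.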
print_stats()
Display statistics in a terminal-friendly fashion
reset() -> None
Reinitialize the indexer
save(
path: str, compression: bool = True
)
Save the index to disk
Args | |
---|---|
path | Directory where the index is saved. |
compression | Store index data compressed. Defaults to True. |
single_lookup(
prediction: <a href="../../TFSimilarity/callbacks/FloatTensor.md">TFSimilarity.callbacks.FloatTensor</a>,
k: int = 5
) -> List[<a href="../../TFSimilarity/indexer/Lookup.md">TFSimilarity.indexer.Lookup</a>]
Find the k closest matches of a given embedding
Args | |
---|---|
prediction | TF similarity model prediction, may be a multi-headed output. |
k | Number of nearest neighbors to lookup. Defaults to 5. |
Returns | |
---|---|
List of the k nearest neighbors: List[Lookup]. |
size() -> int
Return the index size
stats()
Return index statistics
to_data_frame(
num_items: int = 0
) -> <a href="../../TFSimilarity/indexer/PandasDataFrame.md">TFSimilarity.indexer.PandasDataFrame</a>
Export data as pandas dataframe
Args | |
---|---|
num_items | Number of items to export to the dataframe. Defaults to 0 (unlimited). |
Returns | |
---|---|
pd.DataFrame | a pandas dataframe. |
Class Variables | |
---|---|
DATA | 3 |
DISTANCES | 1 |
EMBEDDINGS | 0 |
LABELS | 2 |
RANKS | 4 |