TFSimilarity.indexer.Indexer

Indexing system that efficiently finds the nearest embeddings by indexing known embeddings and making them searchable using [Approximate Nearest Neighbors Search](https://en.wikipedia.org/wiki/Nearest_neighbor_search), implemented via the Search() classes, with the associated data looked up via the Store() classes.

TFSimilarity.indexer.Indexer(
    embedding_size: int,
    embedding_output: int = None,
    stat_buffer_size: int = 1000
) -> None

The indexer can also evaluate the quality of the constructed index and calibrate the SimilarityModel.match() function via the Evaluator() classes.
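The search step can be illustrated with an exact, brute-force k-nearest-neighbor search under cosine distance (the documented default distance). This is a conceptual sketch only, not how the Search() classes are implemented; the helper names `cosine_distance` and `exact_knn` are hypothetical. An ANN backend such as the default 'nmslib' approximates this result without scanning every stored embedding.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_knn(query: Sequence[float],
              embeddings: Sequence[Sequence[float]],
              k: int = 5) -> List[Tuple[int, float]]:
    """Brute-force k-NN: (index, distance) pairs sorted by distance.

    An ANN library approximates this result while avoiding the full
    linear scan over every stored embedding.
    """
    scored = [(i, cosine_distance(query, e)) for i, e in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


index = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
result = exact_knn([1.0, 0.1], index, k=2)
print([i for i, _ in result])  # nearest first: [0, 2]
```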

Args
embedding_size	Size of the embeddings that will be stored. It is usually equivalent to the size of the output layer.
distance	Distance used to compute embeddings proximity. Defaults to 'cosine'.
kv_store	How to store the indexed records. Defaults to 'memory'.
search	Which Search() framework to use to perform KNN search. Defaults to 'nmslib'.
evaluator	What type of Evaluator() to use to evaluate index performance. Defaults to the in-memory evaluator.
embedding_output	Which model output head predicts the embeddings that should be indexed. Defaults to None, which is for single-output models. For multi-head models, the caller, usually the SimilarityModel() class, is responsible for passing the correct one.
stat_buffer_size	Size of the sliding-window buffer used to compute index performance. Defaults to 1000.

Raises
ValueError	Invalid search framework or key value store.

Methods

add

add(
    prediction: TFSimilarity.callbacks.FloatTensor,
    label: Optional[int] = None,
    data: TFSimilarity.callbacks.Tensor = None,
    build: bool = True,
    verbose: int = 1
)

Add a single embedding to the indexer

Args
prediction	TF similarity model prediction, may be a multi-headed output.
label	Label(s) associated with the embedding. Defaults to None.
data	Input data associated with the embedding. Defaults to None.
build	Rebuild the index after insertion. Defaults to True. Set it to False if you want to add multiple batches/points and build the index manually once afterward.
verbose	Display progress if set to 1. Defaults to 1.
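The trade-off behind `build=False` can be sketched with a toy index that only counts rebuilds. `ToyIndex` and its `build()` method are hypothetical and model the cost only; this page does not document how a manual build is triggered on the real Indexer.

```python
from typing import Any, List, Optional, Sequence


class ToyIndex:
    """Hypothetical stand-in that records how often the index is rebuilt."""

    def __init__(self) -> None:
        self.embeddings: List[Sequence[float]] = []
        self.labels: List[Optional[int]] = []
        self.data: List[Any] = []
        self.num_builds = 0

    def add(self, prediction: Sequence[float], label: Optional[int] = None,
            data: Any = None, build: bool = True) -> None:
        # Store the embedding alongside its optional label and payload.
        self.embeddings.append(prediction)
        self.labels.append(label)
        self.data.append(data)
        if build:
            self.build()

    def build(self) -> None:
        # A real ANN index would (re)construct its search structure here,
        # which is expensive; this sketch only counts the call.
        self.num_builds += 1


index = ToyIndex()
for i, vec in enumerate([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]):
    index.add(vec, label=i, build=False)  # defer the costly rebuild
index.build()                             # build once after all insertions
print(len(index.embeddings), index.num_builds)  # 3 1
```

With the default `build=True`, the same three insertions would trigger three rebuilds instead of one.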

batch_add

batch_add(
    predictions: TFSimilarity.callbacks.FloatTensor,
    labels: Optional[Sequence[int]] = None,
    data: Optional[TFSimilarity.callbacks.Tensor] = None,
    build: bool = True,
    verbose: int = 1
)

Add a batch of embeddings to the indexer

Args
predictions	TF similarity model predictions, may be a multi-headed output.
labels	Label(s) associated with the embeddings. Defaults to None.
data	Input data associated with the embeddings. Defaults to None.
build	Rebuild the index after insertion. Defaults to True. Set it to False if you want to add multiple batches/points and build the index manually once afterward.
verbose	Display progress if set to 1. Defaults to 1.
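Because `labels` and `data` are optional, a batch insert has to align each embedding with its label and payload. A minimal sketch of that alignment, assuming missing values become None per item; the helper `pair_batch` is hypothetical, and the length check is an assumption of this sketch rather than documented library behavior.

```python
from typing import Any, List, Optional, Sequence, Tuple


def pair_batch(predictions: Sequence[Sequence[float]],
               labels: Optional[Sequence[int]] = None,
               data: Optional[Sequence[Any]] = None
               ) -> List[Tuple[Sequence[float], Optional[int], Any]]:
    """Align a batch of embeddings with optional labels and data.

    Missing labels/data become None for every item, mirroring the
    documented defaults; mismatched lengths are rejected.
    """
    n = len(predictions)
    aligned_labels = list(labels) if labels is not None else [None] * n
    aligned_data = list(data) if data is not None else [None] * n
    if len(aligned_labels) != n or len(aligned_data) != n:
        raise ValueError("labels and data must match the number of predictions")
    return list(zip(predictions, aligned_labels, aligned_data))


records = pair_batch([[1.0, 0.0], [0.0, 1.0]], labels=[3, 7])
print(records)  # [([1.0, 0.0], 3, None), ([0.0, 1.0], 7, None)]
```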

batch_lookup

batch_lookup(
    predictions: TFSimilarity.callbacks.FloatTensor,
    k: int = 5,
    verbose: int = 1
) -> List[List[Lookup]]

Find the k closest matches for a set of embeddings

Args
predictions	TF similarity model predictions, may be a multi-headed output.
k	Number of nearest neighbors to lookup. Defaults to 5.
verbose	Be verbose. Defaults to 1.

Returns
List[List[Lookup]]: for each query embedding, the list of its k nearest neighbors.
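The nested return type means one inner list per query, each ordered from closest to farthest. The sketch below shows that shape with an exact search; `ToyLookup` is a hypothetical, simplified record (not the library's Lookup class), and Euclidean distance is used only to keep the example short.

```python
import math
from typing import List, NamedTuple, Optional, Sequence


class ToyLookup(NamedTuple):
    """Hypothetical, simplified stand-in for a Lookup record."""
    rank: int  # 1 = closest neighbor
    distance: float
    label: Optional[int]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_lookup(queries: Sequence[Sequence[float]],
                 embeddings: Sequence[Sequence[float]],
                 labels: Sequence[Optional[int]],
                 k: int = 5) -> List[List[ToyLookup]]:
    """Return, for each query, its k nearest stored embeddings (exact search)."""
    results = []
    for q in queries:
        dists = [l2(q, e) for e in embeddings]
        order = sorted(range(len(embeddings)), key=dists.__getitem__)
        results.append([ToyLookup(rank=r + 1, distance=dists[i], label=labels[i])
                        for r, i in enumerate(order[:k])])
    return results


embeddings = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
labels = [0, 1, 2]
results = batch_lookup([[0.9, 0.0], [4.0, 5.0]], embeddings, labels, k=2)
print([[lk.label for lk in row] for row in results])  # [[1, 0], [2, 1]]
```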

calibrate

calibrate(
    predictions: TFSimilarity.callbacks.FloatTensor,
    target_labels: Sequence[int],
    thresholds_targets: MutableMapping[str, float],
    calibration_metric: Union[str, TFSimilarity.callbacks.ClassificationMetric] = 'f1_score',
    k: int = 1,
    matcher: Union[str, TFSimilarity.callbacks.ClassificationMatch] = 'match_nearest',
    extra_metrics: Sequence[Union[str, ClassificationMetric]] = ['precision', 'recall'],
    rounding: int = 2,
    verbose: int = 1
) -> TFSimilarity.indexer.CalibrationResults

Calibrate model thresholds using a test dataset.

Args
predictions	TF similarity model predictions, may be a multi-headed output.
target_labels	Sequence of the expected labels associated with the embedded queries.
thresholds_targets	Dict of performance targets to meet, if possible, with respect to the calibration_metric.
calibration_metric	[ClassificationMetric()](metrics/overview.md) used to evaluate the performance of the index. Defaults to 'f1_score'.
k	How many neighbors to use during the calibration. Defaults to 1.
matcher	'match_nearest', 'match_majority_vote' or ClassificationMatch object. Defines the classification matching, e.g., match_nearest will count a True Positive if the query_label is equal to the label of the nearest neighbor and the distance is less than or equal to the distance threshold. Defaults to 'match_nearest'.
extra_metrics	List of additional tf.similarity.classification_metrics.ClassificationMetric() to compute and report. Defaults to ['precision', 'recall'].
rounding	Metric rounding. Defaults to 2 digits.
verbose	Be verbose and display calibration results. Defaults to 1.

Returns
CalibrationResults containing the thresholds and cutpoints Dicts.
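A simplified sketch of the idea behind calibration with 'match_nearest': every observed nearest-neighbor distance is tried as a threshold, and the one with the best F1 (the default calibration_metric) is kept. The helper names and the FP/FN counting are assumptions of this sketch; the real calibrate() also handles thresholds_targets, extra_metrics, rounding, and k > 1, none of which are modeled here.

```python
from typing import Sequence, Tuple


def f1_at(threshold: float, distances: Sequence[float],
          correct: Sequence[bool]) -> float:
    """F1 when a query is accepted iff its nearest distance <= threshold.

    Assumed counting: TP = accepted and label matches, FP = accepted and
    label differs, FN = rejected although the label matches.
    """
    tp = fp = fn = 0
    for d, ok in zip(distances, correct):
        accepted = d <= threshold
        if accepted and ok:
            tp += 1
        elif accepted:
            fp += 1
        elif ok:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def calibrate_best_f1(distances: Sequence[float],
                      correct: Sequence[bool]) -> Tuple[float, float]:
    """Try every observed distance as a threshold; keep the best F1."""
    best = (0.0, -1.0)  # (threshold, f1)
    for t in sorted(set(distances)):
        score = f1_at(t, distances, correct)
        if score > best[1]:
            best = (t, score)
    return best


# Distance to each query's nearest neighbor, and whether its label matched.
distances = [0.1, 0.2, 0.3, 0.6, 0.8]
correct = [True, True, False, True, False]
threshold, score = calibrate_best_f1(distances, correct)
print(threshold)  # 0.6, whose F1 is 6/7
```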

evaluate_classification

evaluate_classification(
    predictions: TFSimilarity.callbacks.FloatTensor,
    target_labels: Sequence[int],
    distance_thresholds: Union[Sequence[float], TFSimilarity.callbacks.FloatTensor],
    metrics: Sequence[Union[str, ClassificationMetric]] = ['f1'],
    matcher: Union[str, TFSimilarity.callbacks.ClassificationMatch] = 'match_nearest',
    k: int = 1,
    verbose: int = 1
) -> Dict[str, np.ndarray]

Evaluate the classification performance.

Compute the classification metrics given a set of queries, lookups, and distance thresholds.

Args
predictions	TF similarity model predictions, may be a multi-headed output.
target_labels	Sequence of expected labels for the lookups.
distance_thresholds	A 1D tensor of the distance points at which the metrics are computed.
metrics	The set of classification metrics.
matcher	'match_nearest', 'match_majority_vote' or ClassificationMatch object. Defines the classification matching, e.g., match_nearest will count a True Positive if the query_label is equal to the label of the nearest neighbor and the distance is less than or equal to the distance threshold.
k	How many neighbors to use during the evaluation. Defaults to 1.
verbose	Be verbose. Defaults to 1.

Returns
A Mapping from metric name to the list of values computed for each distance threshold.
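The returned mapping holds one value per distance threshold for each metric. A minimal sketch of that shape for nearest-neighbor matching, computing precision and recall; the function name `evaluate_at_thresholds` is hypothetical and the TP/FP/FN counting follows the same assumptions as a simple 'match_nearest' matcher, not the library's code.

```python
from typing import Dict, List, Sequence


def evaluate_at_thresholds(distances: Sequence[float],
                           correct: Sequence[bool],
                           thresholds: Sequence[float]) -> Dict[str, List[float]]:
    """Precision and recall of nearest-neighbor matching at each threshold."""
    results: Dict[str, List[float]] = {"precision": [], "recall": []}
    pairs = list(zip(distances, correct))
    for t in thresholds:
        tp = sum(1 for d, ok in pairs if d <= t and ok)
        fp = sum(1 for d, ok in pairs if d <= t and not ok)
        fn = sum(1 for d, ok in pairs if d > t and ok)
        results["precision"].append(tp / (tp + fp) if tp + fp else 0.0)
        results["recall"].append(tp / (tp + fn) if tp + fn else 0.0)
    return results


metrics = evaluate_at_thresholds([0.1, 0.2, 0.3, 0.6], [True, True, False, True],
                                 thresholds=[0.25, 0.7])
print(metrics)  # precision [1.0, 0.75], recall [2/3, 1.0]
```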

evaluate_retrieval

evaluate_retrieval(
    predictions: TFSimilarity.callbacks.FloatTensor,
    target_labels: Sequence[int],
    retrieval_metrics: Sequence[TFSimilarity.indexer.RetrievalMetric],
    verbose: int = 1
) -> Dict[str, np.ndarray]

Evaluate the quality of the index against a test dataset.

Args
predictions	TF similarity model predictions, may be a multi-headed output.
target_labels	Sequence of the expected labels associated with the embedded queries.
retrieval_metrics	List of [RetrievalMetric()](retrieval_metrics/overview.md) objects to compute.
verbose	Display results if set to 1; otherwise, results are returned silently. Defaults to 1.

Returns
Dictionary of metric results where keys are the metric names and values are the metric values.
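As an illustration of a retrieval metric, the sketch below computes a recall-at-k style score: the fraction of queries that have at least one same-label neighbor among their top k results, returned as a name-to-value dictionary. The function `recall_at_k` is a hypothetical stand-in, not one of the library's RetrievalMetric classes.

```python
from typing import Dict, Sequence


def recall_at_k(query_labels: Sequence[int],
                retrieved_labels: Sequence[Sequence[int]],
                k: int) -> float:
    """Fraction of queries with at least one same-label neighbor in the top k."""
    hits = sum(1 for q, row in zip(query_labels, retrieved_labels) if q in row[:k])
    return hits / len(query_labels)


query_labels = [0, 1, 2]
retrieved = [[0, 1, 2], [2, 1, 0], [1, 0, 0]]  # neighbor labels, nearest first
results: Dict[str, float] = {f"recall@{k}": recall_at_k(query_labels, retrieved, k)
                             for k in (1, 2)}
print(results)  # recall@1 = 1/3, recall@2 = 2/3
```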

get_calibration_metric

get_calibration_metric()

load

@staticmethod
load(
    path: Union[str, TFSimilarity.callbacks.Path],
    verbose: int = 1
)

Load index data from a checkpoint and initialize the underlying structure with the reloaded data.

Args
path	Directory where the checkpoint is located.
verbose	Be verbose. Defaults to 1.

Returns
Initialized index

match

match(
    predictions: TFSimilarity.callbacks.FloatTensor,
    no_match_label: int = -1,
    k=1,
    matcher: Union[str, TFSimilarity.callbacks.ClassificationMatch] = 'match_nearest',
    verbose: int = 1
) -> Dict[str, List[int]]

Match embeddings against the various cutpoint thresholds.

Args
predictions	TF similarity model predictions, may be a multi-headed output.
no_match_label	What label value to assign when there is no match. Defaults to -1.
k	How many neighbors to use when matching. Defaults to 1.
matcher	'match_nearest', 'match_majority_vote' or ClassificationMatch object. Defines the classification matching, e.g., match_nearest will count a True Positive if the query_label is equal to the label of the nearest neighbor and the distance is less than or equal to the distance threshold.
verbose	Display progress. Defaults to 1.

Notes

It is up to the SimilarityModel.match() code to decide which of the cutpoint results to use or show to users. This function returns all of them, as there is little performance downside to doing so and it makes the code clearer and simpler.
The calling function is responsible for returning the list of matched classes, which allows implementations to apply additional criteria if they choose to.

Returns
Dict of cutpoint names mapped to lists of matches.
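A minimal sketch of the returned structure under nearest-neighbor matching: for every cutpoint, each query takes its nearest neighbor's label when the distance is within that cutpoint's threshold and `no_match_label` otherwise. The function `match_cutpoints` and the cutpoint names "strict" and "loose" are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def match_cutpoints(nearest: Sequence[Tuple[int, float]],
                    cutpoints: Dict[str, float],
                    no_match_label: int = -1) -> Dict[str, List[int]]:
    """For every cutpoint, label each query by its nearest neighbor.

    `nearest` holds one (label, distance) pair per query. A query whose
    nearest distance exceeds the cutpoint's threshold gets `no_match_label`.
    """
    return {
        name: [label if dist <= threshold else no_match_label
               for label, dist in nearest]
        for name, threshold in cutpoints.items()
    }


nearest = [(4, 0.05), (2, 0.30), (7, 0.60)]
matches = match_cutpoints(nearest, {"strict": 0.1, "loose": 0.5})
print(matches)  # {'strict': [4, -1, -1], 'loose': [4, 2, -1]}
```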

print_stats

print_stats()

Display statistics in a terminal-friendly format.

reset

reset() -> None

Reinitialize the indexer

save

save(
    path: str, compression: bool = True
)

Save the index to disk

Args
path	Directory where the index will be saved.
compression	Store index data compressed. Defaults to True.
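The save/load round trip with optional compression can be sketched with the standard library. This is not the Indexer's checkpoint format: the file names `index.json.gz` and `index.json` and the helpers `save_index` and `load_index` are hypothetical, and the real load() is a static method that returns an initialized index rather than a plain dict.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_index(path: str, records: Dict[str, Any], compression: bool = True) -> Path:
    """Write index records to `path`, gzip-compressed when requested."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(records).encode("utf-8")
    if compression:
        target = directory / "index.json.gz"
        target.write_bytes(gzip.compress(payload))
    else:
        target = directory / "index.json"
        target.write_bytes(payload)
    return target


def load_index(path: str) -> Dict[str, Any]:
    """Read back whichever file save_index() produced."""
    directory = Path(path)
    compressed = directory / "index.json.gz"
    if compressed.exists():
        return json.loads(gzip.decompress(compressed.read_bytes()))
    return json.loads((directory / "index.json").read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    original = {"embeddings": [[1.0, 0.0]], "labels": [3]}
    save_index(tmp, original, compression=True)
    assert load_index(tmp) == original  # round trip preserves the records
```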

single_lookup

single_lookup(
    prediction: TFSimilarity.callbacks.FloatTensor,
    k: int = 5
) -> List[TFSimilarity.indexer.Lookup]

Find the k closest matches of a given embedding

Args
prediction	TF similarity model prediction, may be a multi-headed output.
k	Number of nearest neighbors to lookup. Defaults to 5.

Returns
List[Lookup]: the k nearest neighbors of the embedding.

size

size() -> int

Return the index size

stats

stats()

Return index statistics.
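The constructor's stat_buffer_size sets the size of the sliding-window buffer used to compute index performance. A sketch of such a buffer, assuming it tracks per-lookup latency; the class `LookupStats` and the statistic names are hypothetical, since the fields returned by the real stats() are not listed here.

```python
import statistics
from collections import deque


class LookupStats:
    """Hypothetical tracker of recent lookup latencies in a sliding window."""

    def __init__(self, stat_buffer_size: int = 1000) -> None:
        # deque(maxlen=...) drops the oldest entry once the buffer is full.
        self.lookup_times = deque(maxlen=stat_buffer_size)
        self.num_lookups = 0

    def record(self, seconds: float) -> None:
        self.lookup_times.append(seconds)
        self.num_lookups += 1

    def stats(self) -> dict:
        times = list(self.lookup_times)
        return {
            "num_lookups": self.num_lookups,  # all lookups ever recorded
            "window_size": len(times),        # lookups still in the window
            "mean_lookup_time": statistics.mean(times) if times else 0.0,
        }


tracker = LookupStats(stat_buffer_size=3)
for t in [0.5, 0.1, 0.2, 0.3]:
    tracker.record(t)
print(tracker.stats())  # the oldest entry (0.5) has been evicted
```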

to_data_frame

to_data_frame(
    num_items: int = 0
) -> TFSimilarity.indexer.PandasDataFrame

Export the index data as a pandas DataFrame.

Args
num_items	Number of items to export to the DataFrame. Defaults to 0 (unlimited).

Returns
pd.DataFrame	A pandas DataFrame.

Class Variables
DATA	3
DISTANCES	1
EMBEDDINGS	0
LABELS	2
RANKS	4
