Find the closest pairs in an array.
Closely compares distances of arrays/embeddings and sorts them.
pip install closely
or install from source:
git clone https://github.com/justinshenk/closely
cd closely
pip install .

import closely
# X is an n x m numpy array
pairs, distances = closely.solve(X, n=1)
You can specify how many pairs you want to identify with n.
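To make the role of n concrete, here is a minimal brute-force sketch of what "the n closest pairs" means. It is an illustration under stated assumptions, not closely's own implementation: the helper name n_closest_pairs is hypothetical, it uses only the standard library, and it assumes Euclidean distance and Python 3.8+ (for math.dist).

```python
import heapq
import math
from itertools import combinations


def n_closest_pairs(points, n=1):
    """Hypothetical helper: brute-force the n closest index pairs.

    Returns (pairs, distances), both ordered from closest to farthest.
    """
    # Every unordered pair of row indices, tagged with its Euclidean distance.
    candidates = (
        (math.dist(points[i], points[j]), (i, j))
        for i, j in combinations(range(len(points)), 2)
    )
    # Keep only the n smallest distances.
    best = heapq.nsmallest(n, candidates)
    pairs = [pair for _, pair in best]
    distances = [dist for dist, _ in best]
    return pairs, distances


points = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.0, 5.3), (9.0, 1.0)]
pairs, distances = n_closest_pairs(points, n=2)
print(pairs)
```

With n=2 the sketch reports two index pairs, closest first. Checking every pair costs O(k^2) distance evaluations for k points, which is fine for small inputs like the 100-point demo in this README.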
The distance metric can be changed from the default, euclidean, to any metric supported by scipy.spatial.distance.cdist, e.g. cosine or hamming:
closely.solve(X, metric='cosine')

import closely
import numpy as np
import matplotlib.pyplot as plt
# Create dataset
X = np.random.random((100,2))
pairs, distances = closely.solve(X, n=1)
# Plot points
z, y = np.split(X, 2, axis=1)
fig, ax = plt.subplots()
ax.scatter(z, y)
for i, txt in enumerate(X):
    if i in pairs:
        ax.annotate(i, (z[i], y[i]), color='red')
    else:
        ax.annotate(i, (z[i], y[i]))
plt.show()

Check pairs:
In [10]: pairs
Out[10]:
array([[ 7, 16],
       [96, 50]])
Python code for ordering distance matrices modified from Andriy Lazorenko, packaged and made useful for >2 features by Justin Shenk.
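
The idea of ordering a distance matrix, and the effect of swapping the metric, can be sketched with scipy.spatial.distance.cdist directly. This is only an illustration and may differ from what closely does internally; the function name ordered_pairs and the sample data are assumptions made for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist


def ordered_pairs(X, metric="euclidean"):
    """Hypothetical helper: all index pairs of X, sorted by distance."""
    # Full pairwise distance matrix between the rows of X.
    D = cdist(X, X, metric=metric)
    # Upper triangle without the diagonal: each unordered pair exactly once.
    i, j = np.triu_indices(len(X), k=1)
    # Order the pairs from smallest to largest distance.
    order = np.argsort(D[i, j], kind="stable")
    pairs = np.column_stack((i[order], j[order]))
    return pairs, D[i, j][order]


X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.1], [0.0, 3.0]])
pairs_euc, dist_euc = ordered_pairs(X)                   # default metric
pairs_cos, dist_cos = ordered_pairs(X, metric="cosine")  # direction only
print(pairs_euc[0], pairs_cos[0])
```

On this sample the closest pair differs by metric: rows 0 and 2 are nearest in Euclidean distance, while rows 1 and 3 point in the same direction and so are nearest under cosine distance.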
