Similarity Search in FAISS Returning Raw, Unintelligible Data #4120
Rajat-2001
started this conversation in
General
Replies: 1 comment
I see the vector representation of the text when running the same code, e.g.
Summary
When performing a similarity search with FAISS (Facebook AI Similarity Search), the results come back as raw, low-level data that is not human-readable or useful without additional processing. Instead of meaningful text or relevant objects, the output consists of unintelligible characters and symbols.
Example Output:
Rank: 1, Distance: 1.629706859588623, Text: M *M 4M JM pM M M N qN N N N O TO \O ]O {O O P hP ~P P P IQ lQ Q Q Q Q FR XR ~R R R =S S S T T ;T |T T T T [U \U U U +V KV UV dV uV V V W W W W $X 4X X X X
Rank: 2, Distance: 1.6545774936676025, Text: F F F F F G G H H PH RH nH I -I EI HI ZI I I I J J J J K K =L DL #M oM M M M ;N N N N sO O O LP P P P *Q 7Q TQ _Q Q Q R dR R ;S kS T KT T T T T T !U #U
This behavior appears to be expected from FAISS: the index stores only high-dimensional vectors, and a search returns distances and integer indices rather than the original items. However, that is not helpful to end users until the results are translated back into meaningful data such as text, image references, or other objects.
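The translation step mentioned above can be sketched in a few lines. This is only an illustration, not part of FAISS: `attach_texts` and the `texts` list are hypothetical names, and the sketch assumes `texts[i]` holds the original string for the i-th vector added to the index.

```python
def attach_texts(distances, indices, texts):
    """Pair each hit with its source text.

    distances, indices: one row of the D and I arrays returned by index.search.
    texts: hypothetical list where texts[i] is the string behind vector i.
    """
    hits = []
    for rank, (dist, idx) in enumerate(zip(distances, indices), start=1):
        idx = int(idx)
        if idx < 0:
            # FAISS pads the result with -1 when fewer than k neighbours exist.
            continue
        hits.append((rank, float(dist), texts[idx]))
    return hits


# Plain lists stand in for D[0] and I[0] here.
texts = ["intro to FAISS", "cooking pasta", "vector databases"]
for rank, dist, text in attach_texts([0.12, 0.87, 3.4e38], [2, 0, -1], texts):
    print(f"Rank: {rank}, Distance: {dist:.3f}, Text: {text!r}")
```

With a real index, the same helper would be called as `attach_texts(D[0], I[0], texts)`.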
Platform
OS: Linux/Ubuntu 22.04
Faiss version: 1.7.2
Faiss compilation options: Compiled with CUDA support
Reproduction instructions
Install FAISS:
pip install faiss-cpu (for CPU version)
pip install faiss-gpu (for GPU version, if applicable)
Create a FAISS index and add data:
import faiss
import numpy as np

# Create random data to simulate a vector search
d = 512  # Dimensionality of the vectors
nb = 1000000  # Number of vectors (adjust as needed)
np.random.seed(1234)
data = np.random.random((nb, d)).astype('float32')

# Create FAISS index using L2 distance
index = faiss.IndexFlatL2(d)
index.add(data)

# Perform a search with a random query vector
query = np.random.random((1, d)).astype('float32')
D, I = index.search(query, k=5)

# Output the results (This is where the raw data appears)
for rank, (distance, idx) in enumerate(zip(D[0], I[0])):
    print(f"Rank: {rank+1}, Distance: {distance}, Text: {data[idx]}")
Expected Output: The output should ideally show human-readable data or objects that are similar to the input query.
Example Expected Output:
Rank: 1, Distance: 0.923, Text: "Some relevant text or object description"
Rank: 2, Distance: 1.023, Text: "Another relevant item"
Actual Output: Instead of meaningful text or objects, the output is raw data that is not interpretable without further processing, for example:
Rank: 1, Distance: 1.629706859588623, Text: M *M 4M JM pM M M N qN N N N O TO \O ]O {O O P hP ~P P P IQ lQ Q Q Q Q FR XR ~R R R =S S S T T ;T |T T T T [U \U U U +V KV UV dV uV V V W W W W $X 4X X X X
Rank: 2, Distance: 1.6545774936676025, Text: F F F F F G G H H PH RH nH I -I EI HI ZI I I I J J J J K K =L DL #M oM M M M ;N N N N sO O O LP P P P *Q 7Q TQ _Q Q Q R dR R ;S kS T KT T T T T T !U #U
The output here is raw data that represents the internal vector space from FAISS, which is not directly human-readable.