1. The role of Faiss
The straightforward solution to TopK similarity retrieval is brute-force search: compute the similarity between the query and every vector, then take the TopK. When the number of vectors is large, however, this is extremely time-consuming. Faiss solves this problem well.
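To make that cost concrete, here is a minimal pure-Python sketch of brute-force TopK search under squared Euclidean distance. The function name `brute_force_topk` is hypothetical and not part of Faiss. Every query is compared with every database vector, so the work grows as nq × nb × d. That is exactly what becomes prohibitive at scale.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_topk(database, queries, k):
    """Exhaustive TopK search: compare every query with every database vector.

    Returns (distances, ids): for each query, k entries ordered nearest first.
    """
    all_dists, all_ids = [], []
    for q in queries:
        # Score all candidates, then keep the k smallest distances.
        scored = ((squared_l2(q, v), i) for i, v in enumerate(database))
        best = heapq.nsmallest(k, scored)
        all_dists.append([dist for dist, _ in best])
        all_ids.append([i for _, i in best])
    return all_dists, all_ids


random.seed(1234)
d, nb, nq, k = 8, 1000, 3, 4
xb = [[random.random() for _ in range(d)] for _ in range(nb)]
xq = [[random.random() for _ in range(d)] for _ in range(nq)]
D, I = brute_force_topk(xb, xq, k)
print(I)  # ids of the 4 nearest database vectors for each of the 3 queries
```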
2. Introduction to Faiss
The full name of Faiss is Facebook AI Similarity Search. It is a library developed by Facebook's AI team for large-scale similarity search. It is written in C++, has a Python interface, and can retrieve results in milliseconds from a billion-scale index. Faiss packages our own set of candidate vectors into an index database, which speeds up retrieval of the TopK most similar vectors. Some of its indexes can also be built on the GPU, which is a strong enhancement.
3. Quick start
Using Faiss generally takes three steps:
- Step 1: Construct the vector library. In practice the vectors can be built by averaging word vectors, or taken directly as sentence vectors from a pre-trained model such as BERT. Here random vectors stand in for them.
```python
import numpy as np

d = 64                            # vector dimension
nb = 100000                       # number of vectors in the index (database)
nq = 10000                        # number of query vectors
np.random.seed(1234)              # make the random data reproducible
xb = np.random.random((nb, d)).astype('float32')  # vectors of the index vector library
xq = np.random.random((nq, d)).astype('float32')  # query vectors to be retrieved
```
- Step 2: Build the index and add the vectors to it. Here we use the brute-force index IndexFlatL2. It measures similarity with the L2 norm of the difference, i.e. Euclidean distance, and the distances it returns are squared L2 distances.
```python
import faiss

index = faiss.IndexFlatL2(d)  # the vector dimension d must be given when creating the index
print(index.is_trained)       # True: this type of index needs no training, just add vectors
index.add(xb)                 # add the vectors of the vector library to the index
print(index.ntotal)           # total number of vectors in the index: 100000
```
- Step 3: Search for the TopK most similar vectors of each query.
```python
k = 4                           # the K in TopK
D, I = index.search(xb[:5], k)  # sanity check: use the first 5 database vectors as queries
print(I)                        # I: ids of the TopK most similar vectors for each query
print(D)                        # D: the corresponding squared L2 distances
D, I = index.search(xq, k)      # actual search over all nq query vectors
print(I[:5])
print(D[:5])
```

The sanity check prints output like the following; the exact values depend on the random data. In the first matrix, I, the first column is 0, 1, 2, 3, 4: each query's nearest neighbor is itself. The second matrix, D, confirms this, since the distance between each vector and itself is zero.

```
[[  0 393 363  78]
 [  1 555 277 364]
 [  2 304 101  13]
 [  3 173  18 182]
 [  4 288 370 531]]
[[0.         7.17517328 7.2076292  7.25116253]
 [0.         6.32356453 6.6845808  6.79994535]
 [0.         5.79640865 6.39173603 7.28151226]
 [0.         7.27790546 7.52798653 7.66284657]
 [0.         6.76380348 7.29512024 7.36881447]]
```
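Step 1 mentions building sentence vectors by averaging word vectors. Below is a minimal sketch of that mean pooling. The tiny `word_vectors` table is hypothetical toy data standing in for real embeddings. Skipping out-of-vocabulary words is one common choice among several, not a Faiss requirement.

```python
def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of the known tokens; return a zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors column by column, one dimension at a time.
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical 3-dimensional word vectors, for illustration only.
word_vectors = {
    "faiss": [1.0, 0.0, 2.0],
    "is": [0.0, 1.0, 0.0],
    "fast": [2.0, 2.0, 1.0],
}
print(sentence_vector(["faiss", "is", "fast"], word_vectors, 3))  # [1.0, 1.0, 1.0]
```

The resulting sentence vectors would then be stacked into a float32 matrix like `xb` above.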
In practice, indexes are often built with the factory method faiss.index_factory, which supports almost all index types. Step 2 above can be written with it as follows:
```python
dim, measure = 64, faiss.METRIC_L2
param = 'Flat'                                    # the type of index to build
index = faiss.index_factory(dim, param, measure)  # behaves like faiss.IndexFlatL2(dim)
```
- dim: the dimension of the vectors.
- param: a string describing the type of index to build.
- measure: the similarity metric. The two main ones are Euclidean distance (faiss.METRIC_L2) and inner product (faiss.METRIC_INNER_PRODUCT). To compute cosine similarity, normalize the vectors to unit length and use the inner-product metric, faiss.METRIC_INNER_PRODUCT.
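The cosine trick works because the inner product of two unit-length vectors equals their cosine similarity. The pure-Python sketch below checks this numerically. In Faiss itself you would typically normalize a float32 matrix in place with `faiss.normalize_L2` before adding it to an inner-product index.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector v to unit L2 norm (faiss.normalize_L2 does this per row)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
print(cosine(a, b))                     # cosine similarity computed directly
print(dot(normalize(a), normalize(b)))  # inner product after normalizing: the same value
```

Because the two quantities agree for every pair, ranking by inner product over normalized vectors is the same as ranking by cosine similarity.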
Some indexes can store integer IDs: each vector can be given an ID, and a query returns the IDs and similarities (or distances) of the matching vectors. If no IDs are specified, vectors are numbered from 0 in the order they were added. IndexFlatL2 does not support specifying IDs directly, but this can be done by wrapping it with IDMap, as follows:
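```python
ids = [2, 10, 100, ...]                  # one integer id per vector in save_embedding
ids = np.array(ids, dtype='int64')       # Faiss expects 64-bit integer ids
index = faiss.index_factory(768, "IDMap,Flat")
index.add_with_ids(save_embedding, ids)  # specify ids; save_embedding is the vector library
```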
The following is equivalent, wrapping an IndexFlatL2 explicitly with IndexIDMap:
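```python
index = faiss.IndexFlatL2(d)
ids = np.arange(100000, 200000, dtype='int64')  # one 64-bit id per vector in xb (nb = 100000)
index2 = faiss.IndexIDMap(index)                # wrap the flat index so it accepts custom ids
index2.add_with_ids(xb, ids)
```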
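Conceptually, an ID map is a thin wrapper. It keeps the user-supplied id for each internal position and translates search results back to those ids. Below is a minimal pure-Python sketch over a brute-force index, handling one query at a time for brevity. The class `IdMappedFlatIndex` and its methods are hypothetical. They only mirror the idea behind `IndexIDMap`, not its implementation.

```python
import heapq


class IdMappedFlatIndex:
    """Hypothetical toy index: brute-force L2 search plus a position -> user-id table."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []  # stored vectors, in insertion order
        self.ids = []      # user id of each stored vector (same order)

    def add_with_ids(self, vectors, ids):
        if len(vectors) != len(ids):
            raise ValueError("need exactly one id per vector")
        for v, i in zip(vectors, ids):
            if len(v) != self.dim:
                raise ValueError("vector has the wrong dimension")
            self.vectors.append(list(v))
            self.ids.append(int(i))

    def search(self, query, k):
        """Return (distances, user_ids) of the k nearest stored vectors."""
        scored = (
            (sum((x - y) ** 2 for x, y in zip(query, v)), pos)
            for pos, v in enumerate(self.vectors)
        )
        best = heapq.nsmallest(k, scored)
        # Translate internal positions back to the ids supplied by the caller.
        return [dist for dist, _ in best], [self.ids[pos] for _, pos in best]


index = IdMappedFlatIndex(2)
index.add_with_ids([[0.0, 0.0], [5.0, 5.0], [1.0, 1.0]], [100000, 100001, 100002])
print(index.search([0.9, 0.9], 2))
```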
4. Common Faiss indexes: advantages, disadvantages and use cases
4.1 Flat: Brute-Force Search
- Advantages: The most accurate of all Faiss indexes, with the highest recall bar none, because every query is compared with every vector.
- Disadvantages: Slow, with a large memory footprint.
- Usage: The candidate set is small (under about 500,000 vectors) and memory is not a concern.
```python
dim, measure = 64, faiss.METRIC_L2
param = 'Flat'
index = faiss.index_factory(dim, param, measure)
print(index.is_trained)  # True: a flat index needs no training
index.add(xb)            # add the vectors to the index
```
4.2 IVFx,Flat: Inverted-File Brute-Force Search
- Advantages: Uses an inverted file. The vectors are clustered with k-means, and a query is compared only with the vectors in its nearest clusters instead of all of them, so it is faster than Flat.
- Disadvantages: Still not very fast, and recall drops compared with Flat.
- Usage: Same as Flat.
- Parameter: x in IVFx is the number of k-means cluster centers.
```python
dim, measure = 64, faiss.METRIC_L2
param = 'IVF100,Flat'    # 100 k-means cluster centers
index = faiss.index_factory(dim, param, measure)
print(index.is_trained)  # False: the inverted index must train its k-means first
index.train(xb)          # so train the index first, then add the vectors
index.add(xb)
```
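The inverted-file idea can be sketched in pure Python in three steps:

1. Run a few k-means iterations to get `nlist` centroids.
2. Put each database vector in the list of its nearest centroid.
3. At query time, scan only the `nprobe` lists whose centroids are closest to the query.

This is a simplified, hypothetical illustration (random initialization, a fixed number of iterations), not Faiss's implementation. In Faiss, the number of lists visited is the index's `nprobe` attribute, which defaults to 1. Raising it trades speed for recall.

```python
import heapq
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Position of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(data, nlist, iters=10, seed=0):
    """Plain Lloyd's k-means, initialized with nlist randomly chosen data points."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in data:
            buckets[nearest(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ToyIVF:
    """Hypothetical inverted-file index: one list of (id, vector) per k-means centroid."""

    def __init__(self, nlist):
        self.nlist = nlist
        self.centroids = []
        self.lists = []
        self.ntotal = 0

    def train(self, data):
        self.centroids = kmeans(data, self.nlist)
        self.lists = [[] for _ in range(self.nlist)]

    def add(self, data):
        # Ids are assigned sequentially from 0, in insertion order.
        for v in data:
            self.lists[nearest(v, self.centroids)].append((self.ntotal, v))
            self.ntotal += 1

    def search(self, query, k, nprobe=1):
        # Scan only the nprobe lists whose centroids are closest to the query.
        probe = heapq.nsmallest(
            nprobe, range(self.nlist), key=lambda c: sq_dist(query, self.centroids[c])
        )
        candidates = ((sq_dist(query, v), i) for c in probe for i, v in self.lists[c])
        best = heapq.nsmallest(k, candidates)
        return [dist for dist, _ in best], [i for _, i in best]


random.seed(1234)
xb = [[random.random() for _ in range(8)] for _ in range(1000)]
ivf = ToyIVF(nlist=16)
ivf.train(xb)
ivf.add(xb)
print(ivf.search(xb[0], k=4, nprobe=4))
```

With `nprobe` equal to `nlist`, every list is scanned and the result matches brute-force search exactly. Smaller values scan fewer vectors and may miss true neighbors that fell into other clusters.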
4.3 HNSWx: Graph-Based Search
- Advantages: This is an improved graph-based search method with fast retrieval. It can return results within seconds even over a billion vectors, and its recall is close to Flat, reaching as high as 97%. Search time grows roughly logarithmically with the number of vectors (about O(log n)), so the size of the candidate set matters little. It also supports adding vectors in batches, which makes it well suited to online tasks that need millisecond-level responses.
- Disadvantages: Building the index is extremely slow and uses a lot of memory. It has the largest footprint in Faiss, more than the memory taken by the original vectors.
- Parameter: x in HNSWx is the maximum number of neighbors each node is connected to when the graph is built. The larger x is, the denser the graph and the more accurate the query, but the slower the index construction. x is typically an integer between 4 and 64.
- Usage: Memory is plentiful and there is ample time to build the index.
```python
dim, measure = 64, faiss.METRIC_L2
param = 'HNSW64'         # each node links to up to 64 neighbors
index = faiss.index_factory(dim, param, measure)
print(index.is_trained)  # True: HNSW needs no training
index.add(xb)
```
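HNSW itself is too involved for a short example, but its core idea, greedy search over a proximity graph, can be sketched. The code below first builds a single-layer graph by linking each vector to its `m` nearest neighbors. It does this by brute force, which is only acceptable for a toy. It then answers a query with a best-first beam search of width `ef`, starting from a fixed entry point.

This is a simplified, hypothetical illustration. Real HNSW adds a hierarchy of layers, a neighbor-selection heuristic and incremental insertion. The names `m` and `ef` only echo HNSW's parameters: in Faiss, the x in HNSWx plays the role of `m`, and the search beam width is exposed as `index.hnsw.efSearch`.

```python
import heapq
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(data, m):
    """Link every vector to its m nearest neighbors (brute force), plus reverse edges."""
    graph = [set() for _ in data]
    for i, v in enumerate(data):
        others = (j for j in range(len(data)) if j != i)
        for j in heapq.nsmallest(m, others, key=lambda c: sq_dist(v, data[c])):
            graph[i].add(j)
            graph[j].add(i)  # undirected edges keep the graph better connected
    return graph


def graph_search(data, graph, query, k, ef, entry=0):
    """Best-first beam search of width ef over the graph, starting from `entry`."""
    d0 = sq_dist(query, data[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: nodes waiting to be expanded
    results = [(-d0, entry)]    # max-heap via negation: the ef best nodes so far
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # nearest unexpanded node is farther than the worst kept result
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            d = sq_dist(query, data[nbr])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nbr))
                heapq.heappush(results, (-d, nbr))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    best = sorted((-neg, i) for neg, i in results)[:k]
    return [dist for dist, _ in best], [i for _, i in best]


random.seed(0)
data = [[random.random() for _ in range(4)] for _ in range(300)]
graph = build_knn_graph(data, m=8)
print(graph_search(data, graph, data[123], k=4, ef=32))
```

A larger `ef` explores more of the graph, which raises recall at the cost of speed. A larger `m` makes the graph denser, which is the same trade-off the HNSWx parameter describes.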
4.4 PQx: Product Quantization
- Advantages: Product quantization improves on plain k-means. Each vector is cut into x segments, and k-means is run separately on each segment, so a vector can be stored as x short centroid codes. As a result it is fast, uses little memory, and keeps a relatively high recall.
- Disadvantages: Recall is much lower than with brute-force search.
- Usage: Memory is extremely scarce, retrieval must be fast, and recall matters less.
- Parameter: x in PQx is the number of segments each vector is cut into, so the vector dimension must be divisible by x. The larger x is, the finer the split and the higher the time complexity.
```python
dim, measure = 64, faiss.METRIC_L2
param = 'PQ16'           # cut each 64-d vector into 16 segments of 4 dimensions
index = faiss.index_factory(dim, param, measure)
print(index.is_trained)  # False: PQ must train a k-means codebook for each segment
index.train(xb)          # so train the index first, then add the vectors
index.add(xb)
```
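The product-quantization steps described above can be sketched in pure Python: split each vector, run k-means per segment, and store one centroid id per segment. The sketch answers queries with asymmetric distance computation. The query stays uncompressed, and a per-segment table of query-to-centroid distances turns each database distance into `m` table lookups.

All names here are hypothetical. `ksub` (centroids per segment) is kept tiny for speed, whereas Faiss's `PQ16` uses 256 centroids per segment (8 bits) by default.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, k, iters=8, seed=0):
    """Plain Lloyd's k-means, initialized with k randomly chosen data points."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in data:
            best = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            buckets[best].append(v)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ToyPQ:
    """Hypothetical product quantizer: m segments, ksub centroids per segment."""

    def __init__(self, dim, m, ksub):
        if dim % m != 0:
            raise ValueError("the vector dimension must be divisible by m")
        self.m, self.dsub, self.ksub = m, dim // m, ksub
        self.codebooks = []  # codebooks[s] holds the ksub centroids of segment s
        self.codes = []      # one tuple of m centroid ids per stored vector

    def _segments(self, v):
        return [v[s * self.dsub:(s + 1) * self.dsub] for s in range(self.m)]

    def train(self, data):
        # Run k-means independently on each segment of the training vectors.
        per_segment = zip(*(self._segments(v) for v in data))
        self.codebooks = [kmeans(list(seg), self.ksub) for seg in per_segment]

    def add(self, data):
        # Encode each vector as the id of its nearest centroid in every segment.
        for v in data:
            self.codes.append(tuple(
                min(range(self.ksub), key=lambda c: sq_dist(sub, book[c]))
                for sub, book in zip(self._segments(v), self.codebooks)
            ))

    def search(self, query, k):
        # Asymmetric distance: the query stays exact, database vectors use their codes.
        tables = [
            [sq_dist(sub, centroid) for centroid in book]
            for sub, book in zip(self._segments(query), self.codebooks)
        ]
        scored = sorted(
            (sum(tables[s][c] for s, c in enumerate(code)), i)
            for i, code in enumerate(self.codes)
        )[:k]
        return [dist for dist, _ in scored], [i for _, i in scored]


random.seed(1234)
xb = [[random.random() for _ in range(8)] for _ in range(500)]
pq = ToyPQ(dim=8, m=4, ksub=16)
pq.train(xb)
pq.add(xb)
print(len(pq.codes[0]))     # 4: each vector is stored as 4 small integers
print(pq.search(xb[0], 4))  # approximate (distances, ids)
```

The memory saving comes from storing `m` small integers per vector instead of `dim` floats. The loss of recall comes from the same source: distances are computed to the centroid reconstruction, not to the original vector.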
4.5 IVFx,PQy: Inverted-File Product Quantization
- Advantages: Widely used in industry, and acceptable on every metric.
- Disadvantages: It combines the strengths of several methods and, naturally, inherits their weaknesses too.
- Usage: Same as PQx.
- Parameter: in IVFxPQy, x and y have the same meaning as in IVFx and PQx above.
```python
dim, measure = 64, faiss.METRIC_L2
param = 'IVF100,PQ16'    # 100 inverted lists, vectors encoded with 16-segment PQ
index = faiss.index_factory(dim, param, measure)
print(index.is_trained)  # False: both the k-means clusters and the PQ codebooks need training
index.train(xb)          # so train the index first, then add the vectors
index.add(xb)
```
5. Using GPUs
Reference: "Running on GPUs" in the Faiss wiki.
5.1 Using a Single GPU
```python
res = faiss.StandardGpuResources()   # declare GPU resources
index_flat = faiss.IndexFlatL2(d)    # build a flat (CPU) index
gpu_index_flat = faiss.index_cpu_to_gpu(res, 0, index_flat)  # copy the CPU index to GPU 0
# The remaining steps are the same as on the CPU
gpu_index_flat.add(xb)               # add vectors to the index
print(gpu_index_flat.ntotal)
k = 4                                # we want to see 4 nearest neighbors
D, I = gpu_index_flat.search(xq, k)  # actual search
print(I[:5])                         # neighbors of the first 5 queries
print(I[-5:])                        # neighbors of the last 5 queries
```
5.2 Using Multiple GPUs
```python
ngpus = faiss.get_num_gpus()
print("number of GPUs:", ngpus)
cpu_index = faiss.IndexFlatL2(d)
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)  # build the index across all GPUs
gpu_index.add(xb)               # add vectors to the index
print(gpu_index.ntotal)
k = 4                           # we want to see 4 nearest neighbors
D, I = gpu_index.search(xq, k)  # actual search
print(I[:5])                    # neighbors of the first 5 queries
print(I[-5:])                   # neighbors of the last 5 queries
```