Provide GPU-accelerated vector indexes with RAFT#413
Provide GPU-accelerated vector indexes with RAFT#413wphicks wants to merge 30 commits intoRedisAI:mainfrom
Conversation
|
@wphicks also, probably need to sign the CLA as well. |
DvirDukhan
left a comment
There was a problem hiding this comment.
Thanks for the work so far
Please see comments
CMakeLists.txt
Outdated
| set_property(GLOBAL PROPERTY USE_FOLDERS ON) | ||
|
|
||
| set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions -fPIC ${CLANG_SAN_FLAGS} ${LLVM_CXX_FLAGS} ${COV_CXX_FLAGS}") | ||
| set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fexceptions -fPIC -pthread ${CLANG_SAN_FLAGS} ${LLVM_CXX_FLAGS} ${COV_CXX_FLAGS}") |
There was a problem hiding this comment.
I actually meant to ask you about that. I was having trouble compiling without it (even before making any of my other changes), but I'll have to recreate the failing build log. Will either post it here or as a separate issue.
src/VecSim/vec_sim_common.h
Outdated
| double preferredShmemCarveout; // Fraction of GPU's unified memory / L1 | ||
| // cache to be used as shared memory | ||
|
|
||
| } IVFParams; |
src/VecSim/vec_sim_common.h
Outdated
|
|
||
| } IVFParams; | ||
|
|
||
| typedef struct { |
There was a problem hiding this comment.
I don't believe we need this struct definition as is
you can send the IVFParams as primaryIndexParams when creating TieredIndexParams
if you need something specific for you tiered index management layer, this is the case where you need to create a specific struct
see
union {
TieredHNSWParams tieredHnswParams;
} specificParams;
at TieredIndexParams
| public: | ||
| BruteForceIndex(const BFParams *params, const AbstractIndexInitParams &abstractInitParams); | ||
|
|
||
| void clear() { |
There was a problem hiding this comment.
The clear() function is used in the Tiered Raft IVF index: The vectors from the flat buffers are all transferred to the backend index, and then the flat buffer is cleared.
src/VecSim/algorithms/ivf/ivf.cuh
Outdated
src/VecSim/algorithms/ivf/ivf.cuh
Outdated
| // Copy label data to previously allocated device buffer | ||
| raft::copy(label_gpu.data_handle(), label, batch_size, res_.get_stream()); | ||
|
|
||
| if (std::holds_alternative<raft::neighbors::ivf_flat::index_params>(build_params_)) { |
There was a problem hiding this comment.
Not unless we want to split IVF flat and IVF PQ back out into separate indexes again.
One possible alternate spelling would be to use std::visit with a constexpr if to distinguish the types. This has the disadvantage that compilers sometimes have trouble creating an efficient std::visit over small variants, but it has the advantage of providing obviously idiomatic access to the variant.
If performance is what we're worried about, I would avoid std::visit but add a compile-time assumption (via __builtin_unreachable, __assume or similar based on the compiler) that the variant contains the expected type prior to the std::get call. That usually allows the compiler to avoid the check that it must ordinarily make to see if it needs to throw a std::bad_variant_access.
If readability is what we are concerned about, the bodies of those if clauses could be separated into their own functions.
There was a problem hiding this comment.
Can you please share a suggestion for the last proposal, so we can iterate? (could be here or on slack). I'm thinking about modularity and code reusability here.
src/VecSim/algorithms/ivf/ivf.cuh
Outdated
| index_{std::nullopt} {} | ||
| auto addVector(const void *vector_data, labelType label, | ||
| bool overwrite_allowed = true) override { | ||
| return addVectorBatch(vector_data, &label, 1, overwrite_allowed); |
There was a problem hiding this comment.
Mandatory due to our requirements? I guess batch size 1 is not efficient here?
There was a problem hiding this comment.
That's correct. As a matter of fact, I would recommend that this index type only be used in a tiered index to allow for efficient insertions.
src/VecSim/algorithms/ivf/ivf.cuh
Outdated
| res_.get_stream()); | ||
|
|
||
| // Perform correct search based on index type | ||
| if (std::holds_alternative<raft::neighbors::ivf_flat::index>(index_)) { |
| auto indexSize() { | ||
| auto frontend_lock = std::scoped_lock(this->flatIndexGuard); | ||
| auto backend_lock = std::scoped_lock(this->mainIndexGuard); | ||
| return (getBackendIndex().indexSize() + this->frontendIndex.indexSize()); |
There was a problem hiding this comment.
Assuming transfer is "atomic" and does not cause duplications?
|
|
||
| auto backend_lock = std::scoped_lock(this->mainIndexGuard); | ||
| this->flatBuffer->clear(); | ||
| frontend_lock.unlock(); |
There was a problem hiding this comment.
needs to be called after addVectorBatch otherwise we can have both indexes empty
There was a problem hiding this comment.
Doesn't the fact that we have a lock on the backend index mean that queries against that index will not execute until we have completed the transfer? Since we do not clear the frontend index before acquiring the backend lock, and we unlock the frontend after the index has been cleared, my understanding was that this should be safe. Are queries allowed to be made against either the frontend or backend index without acquiring the corresponding lock?
There was a problem hiding this comment.
Yes you are right, I missed the backend lock,
Additional question regarding this:
when calling
void clear() {
idToLabelMapping.clear();
vectorBlocks.clear();
count = idType{};
}you are not clearing the label set of the frontend index. Can this cause issues when calling multiple times to transferToBackend with an increasing set of IDs?
alonre24
left a comment
There was a problem hiding this comment.
Some small questions/comments after doing a high-level review - before we move on to POC
| this->idToLabelMapping.shrink_to_fit(); | ||
| this->vectorBlocks.clear(); | ||
| this->vectorBlocks.shrink_to_fit(); | ||
| this->count = idType{}; |
| this->idToLabelMapping.shrink_to_fit(); | ||
| this->vectorBlocks.clear(); | ||
| this->vectorBlocks.shrink_to_fit(); | ||
| this->count = idType{}; |
| //.multi = raftIvfParams->multi, | ||
| //.logCtx = params->logCtx |
|
|
||
| double getDistanceFrom_Unsafe(labelType label, const void *blob) const override { | ||
| auto flat_dist = this->frontendIndex->getDistanceFrom_Unsafe(label, blob); | ||
| auto raft_dist = this->backendIndex->getDistanceFrom_Unsafe(label, blob); |
There was a problem hiding this comment.
This is not implemented in the backend index...
|
|
||
| // delete in place. TODO: Add async job for this | ||
| this->mainIndexGuard.lock(); | ||
| num_deleted_vectors += this->backendIndex->deleteVector(label); |
There was a problem hiding this comment.
How do we reclaim memory after deletion? As far as I understand in the backend index, we only mark vectors as deleted
Co-authored-by: GuyAv46 <47632673+GuyAv46@users.noreply.github.com>
This PR is a replacement for #377, which fell out of date with changes on main. This PR provides access to GPU-accelerated index types provided by the RAPIDS RAFT library. It introduces two new index types: IVF and Tiered IVF, which provide access to both IVF Flat and IVFPQ indexes depending on how they are configured.
The intention of this initial integration is to provide GPU-based performance improvements for building indexes and batched searches. A later PR will introduce another index type which will offer perf improvement (relative to CPU HNSW) for single-query searches as well.
Main objects this PR modified
This PR does not substantively modify existing objects. A
clearmethod is added to the brute force index to allow it to be reset with a single call.This PR is still a work in progress. Remaining tasks:
Tasks to be tackled in later PRs:
Questions for this PR:
Mark if applicable