microsoft / SPTAG

A distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index build, search, and distributed online serving toolkits for large-scale vector search scenarios.

4,981 stars
610 forks
140 issues
C++, Python, Cuda

Repository Overview (README excerpt)

SPTAG: A library for fast approximate nearest neighbor search

**SPTAG**

SPTAG (Space Partition Tree And Graph) is a library for large-scale approximate nearest neighbor search over vectors, released by Microsoft Research (MSR) and Microsoft Bing.

**What's NEW**

• Result Iterator with Relaxed Monotonicity Signal Support
• New Research Paper SPFresh: Incremental In-Place Update for Billion-Scale Vector Search - _published in SOSP 2023_
• New Research Paper VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity - _published in OSDI 2023_

**Introduction**

This library assumes that samples are represented as vectors and that vectors can be compared by L2 distance or cosine distance. The vectors returned for a query are those with the smallest L2 or cosine distance to the query vector.

SPTAG provides two methods: kd-tree and relative neighborhood graph (SPTAG-KDT), and balanced k-means tree and relative neighborhood graph (SPTAG-BKT). SPTAG-KDT has the lower index building cost, while SPTAG-BKT gives better search accuracy on very high-dimensional data.

**How it works**

SPTAG is inspired by the NGS approach [WangL12]. It contains two basic modules: an index builder and a searcher. The relative neighborhood graph (RNG) is built on the k-nearest neighbor graph [WangWZTG12, WangWJLZZH14] to boost connectivity. Balanced k-means trees are used in place of kd-trees to avoid the inaccurate distance bound estimation that kd-trees suffer from on very high-dimensional vectors.

A search begins in the space partition trees, which supply several seeds from which to start the search in the RNG. The searches in the trees and the graph are then conducted iteratively.
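The search procedure described above can be illustrated with a small, self-contained sketch. It is not SPTAG's implementation or API: every function name here is hypothetical, a brute-force k-nearest-neighbor graph stands in for the RNG, the seed is fixed by hand rather than taken from a kd-tree or balanced k-means tree, and the tree and graph searches are not interleaved. It only shows the two distance measures and a best-first graph walk that starts from seeds.

```python
import heapq
import math


def l2_distance(a, b):
    """Squared Euclidean distance (ranks neighbors the same way as plain L2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def build_knn_graph(vectors, k, dist):
    """Brute-force k-nearest-neighbor graph, a simplified stand-in for the RNG."""
    graph = []
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: dist(v, vectors[j]))
        graph.append(others[:k])
    return graph


def graph_search(vectors, graph, query, seeds, k, dist, budget=64):
    """Best-first search from the seeds, expanding at most `budget` nodes."""
    visited = set(seeds)
    candidates = [(dist(query, vectors[s]), s) for s in visited]
    heapq.heapify(candidates)
    found = list(candidates)
    expanded = 0
    while candidates and expanded < budget:
        _, node = heapq.heappop(candidates)  # closest unexpanded candidate
        expanded += 1
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                item = (dist(query, vectors[nb]), nb)
                heapq.heappush(candidates, item)
                found.append(item)
    return [idx for _, idx in heapq.nsmallest(k, found)]


# Toy data: a 5x5 grid of 2-D points, indexed as i * 5 + j.
vectors = [(float(i), float(j)) for i in range(5) for j in range(5)]
graph = build_knn_graph(vectors, k=4, dist=l2_distance)
query = (3.2, 2.9)
# In SPTAG the seeds would come from the tree search; here one is fixed by hand.
top3 = graph_search(vectors, graph, query, seeds=[0], k=3, dist=l2_distance)
print(top3)
```

In this sketch the `budget` parameter caps how many nodes are expanded, so a smaller value trades accuracy for speed. Good seeds matter for the same reason: the closer the starting points are to the query, the fewer expansions are needed to reach its true neighbors.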
## **Highlights**

• Fresh update: Support online vector deletion and insertion
• Distributed serving: Search over multiple machines

## **Build**

**Requirements**

• swig >= 4.0.2
• cmake >= 3.12.0
• boost >= 1.67.0

**Fast clone**

**Install**

> For Linux:

> Compile SPDK

> Compile isal-l_crypto

> Build RocksDB

> Build SPTAG

It will generate a Release folder in the code directory which contains all the build targets.

> For Windows:

It will generate a SPTAGLib.sln in the build directory. Compiling the ALL_BUILD project in Visual Studio (2019 or later) will generate a Release directory which contains all the build targets. For detailed instructions on installing Windows binaries, please see here

> Using Docker:

Will build a docker container with binaries in .

**Verify**

Run SPTAGTest (or Test.exe) in the Release folder to verify that all the tests pass.

**Usage**

Detailed usage can be found in Get started. There is also an end-to-end tutorial on building an online vector search service with the Python wrapper in Python Tutorial. Detailed parameter tuning guidance can be found in Parameters.

**References**

Please cite SPTAG in your publications if it helps your research:

**Contribute**

This project welcomes contributions and suggestions from all users. We use GitHub issues for tracking suggestions and bugs.

**License**

The entire codebase is under the MIT license.