
qdrant-client — High-Performance Vector Database

Store and search vector embeddings with the Qdrant Python client. Covers collections, CRUD, filtered vector search, payload indexing, batch upsert, sparse/dense hybrid search, and integrations.



What it is#

Qdrant is an open-source vector database and similarity search engine written in Rust, with a Python client (qdrant-client) that provides both a REST and a gRPC interface. It is designed for high-throughput production workloads and offers fine-grained payload filtering, named vectors (store multiple embeddings per point), sparse vectors for hybrid search, on-disk HNSW indexing, and built-in quantisation for memory efficiency. The Python client can connect to a remote Qdrant server or run an in-memory/local-file instance without a separate process.

Install#

pip install qdrant-client
pip install "qdrant-client[fastembed]"   # adds local embedding generation via FastEmbed

Output: (none — exits 0 on success)

Quick example#

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")   # in-memory, no server needed

# Create a collection
client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# Upsert points (id + vector + payload)
client.upsert(
    collection_name="articles",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.9, 0.7], payload={"title": "Attention Is All You Need"}),
        PointStruct(id=2, vector=[0.3, 0.8, 0.1, 0.2], payload={"title": "BERT Pre-training"}),
        PointStruct(id=3, vector=[0.9, 0.1, 0.3, 0.5], payload={"title": "GPT-3 Language Model"}),
    ],
)

# Search by vector similarity
results = client.search(
    collection_name="articles",
    query_vector=[0.1, 0.2, 0.8, 0.7],
    limit=2,
)
for r in results:
    print(f"[{r.score:.3f}] {r.payload['title']}")

Output:

[0.998] Attention Is All You Need
[0.598] GPT-3 Language Model
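The ranking can be checked independently of Qdrant. The sketch below recomputes plain cosine similarity for the three example vectors with the standard library only; the helper name `cosine` is illustrative and not part of qdrant-client.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.2, 0.8, 0.7]
points = {
    "Attention Is All You Need": [0.1, 0.2, 0.9, 0.7],
    "BERT Pre-training": [0.3, 0.8, 0.1, 0.2],
    "GPT-3 Language Model": [0.9, 0.1, 0.3, 0.5],
}

# Sort by similarity, highest first, exactly as a COSINE collection would rank them
ranked = sorted(((cosine(query, vec), title) for title, vec in points.items()), reverse=True)
for score, title in ranked:
    print(f"[{score:.3f}] {title}")
```

The first two lines printed correspond to what a search with limit=2 returns for this data.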

When / why to use it#

  • High-throughput RAG pipelines where query latency matters — Qdrant benchmarks among the fastest open-source vector databases on the ANN (approximate nearest neighbour) benchmarks.
  • Production workloads that need payload filtering with low latency — Qdrant filters are applied during the HNSW graph traversal, not as a post-processing step.
  • Memory-constrained deployments — built-in scalar, product, and binary quantisation reduce RAM usage by 4–32×.
  • Multi-vector search — store dense and sparse vectors per point and perform hybrid search in one query.
  • When you want a self-hosted solution with a Python-first API and no external JVM/Node dependencies.
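To make the 4–32× figure concrete, here is a back-of-the-envelope sketch of raw vector storage at different precisions. The collection size and dimension are hypothetical, and the estimate ignores the HNSW graph, payloads, and any original vectors kept for rescoring.

```python
def vector_memory_mib(n_vectors: int, dim: int, bits_per_component: int) -> float:
    """Raw vector storage in MiB, ignoring index and payload overhead."""
    return n_vectors * dim * bits_per_component / 8 / (1024 ** 2)

# Hypothetical collection: one million 1536-dimensional embeddings
n, dim = 1_000_000, 1536

float32_mib = vector_memory_mib(n, dim, 32)  # unquantised float32
int8_mib = vector_memory_mib(n, dim, 8)      # scalar quantisation to int8
binary_mib = vector_memory_mib(n, dim, 1)    # binary quantisation, 1 bit per component

print(f"float32: {float32_mib:.0f} MiB")
print(f"int8:    {int8_mib:.0f} MiB ({float32_mib / int8_mib:.0f}x smaller)")
print(f"binary:  {binary_mib:.0f} MiB ({float32_mib / binary_mib:.0f}x smaller)")
```

Product quantisation falls between these two extremes, depending on the chosen compression ratio.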

Common pitfalls#

[!WARNING] Point IDs must be unsigned integers or UUIDs — Qdrant rejects string IDs that are not valid UUIDs. Use str(uuid.uuid4()) or an integer counter. Passing arbitrary strings raises a validation error.
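If your documents already have string identifiers, one common workaround is to derive a deterministic UUID from each of them, so the same external ID always maps to the same point. The sketch below uses only the standard library; the namespace URL and the helper name `point_id_for` are placeholders.

```python
import uuid

# Hypothetical application namespace; any fixed UUID works as long as it never changes
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-app")

def point_id_for(external_id: str) -> str:
    """Map an arbitrary string ID to a stable UUID string usable as a point ID."""
    return str(uuid.uuid5(NAMESPACE, external_id))

pid = point_id_for("doc-42#chunk-3")
print(pid)                                      # a valid UUID string
print(pid == point_id_for("doc-42#chunk-3"))    # True: same input, same ID
```

Because the mapping is deterministic, re-running an import upserts over existing points instead of creating duplicates. Keeping the original string in the payload makes it easy to map results back.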

[!WARNING] Collection vector size is fixed at creation — you cannot change VectorParams.size after creating a collection. If your embedding model changes to a different dimension, you must recreate the collection and re-index all points.

[!WARNING] In-memory client does not persist: QdrantClient(":memory:") is ideal for testing, but data is lost when the process exits. Use QdrantClient(path="./qdrant_storage") for local persistence or connect to a running Qdrant server for production.

[!TIP] Use client.upload_points() for bulk imports — it streams points to the server in configurable batches and is significantly faster than calling upsert() in a loop.
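As a conceptual illustration of batched streaming (not the client's actual implementation), the sketch below splits a lazily generated stream of points into fixed-size batches. The names `batched` and `fake_points` are made up for this example, and the dictionaries stand in for PointStruct objects.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items without materialising the input."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def fake_points(n: int) -> Iterator[dict]:
    """Hypothetical point source; a real pipeline would yield PointStruct objects."""
    for i in range(n):
        yield {"id": i, "vector": [float(i)] * 4, "payload": {"doc_id": i}}

sizes = [len(batch) for batch in batched(fake_points(1000), 256)]
print(sizes)  # [256, 256, 256, 232]
```

Since upload_points() takes an iterable of points, passing a generator rather than a fully built list keeps memory usage flat for large imports.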

[!TIP] Enable payload indexing for fields you filter on frequently — create_payload_index() creates a keyword or range index and makes filtered queries orders of magnitude faster on large collections.

Connecting to Qdrant#

from qdrant_client import QdrantClient

# In-memory (testing only — data lost on exit)
client = QdrantClient(":memory:")

# Local file persistence (no server needed)
client = QdrantClient(path="./qdrant_storage")

# Remote Qdrant server (Docker or Qdrant Cloud)
client = QdrantClient(
    host="localhost",
    port=6333,               # REST; gRPC default is 6334
    prefer_grpc=True,        # faster for large payloads
    timeout=10.0,
)

# Qdrant Cloud
client = QdrantClient(
    url="https://your-cluster.qdrant.tech",
    api_key="your-api-key",
)

print(client.get_collections())

Output:

CollectionsResponse(collections=[])

Creating collections#

A collection defines the vector dimensions, distance metric, and optional index settings.

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff, OptimizersConfigDiff,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)

client = QdrantClient(":memory:")

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,                  # must match your embedding model
        distance=Distance.COSINE,   # COSINE | EUCLID | DOT | MANHATTAN
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                       # edges per node in the HNSW graph
        ef_construct=100,           # higher = more accurate but slower build
        full_scan_threshold=10_000, # size in KB; smaller segments are scanned exhaustively
    ),
    # The scalar settings must be wrapped in ScalarQuantization(scalar=...)
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,   # 4× memory reduction, ~1% accuracy loss
            quantile=0.99,
            always_ram=True,
        ),
    ),
    optimizers_config=OptimizersConfigDiff(
        default_segment_number=5,
        memmap_threshold=20_000,    # size in KB; larger segments are memory-mapped from disk
    ),
)

info = client.get_collection("documents")
print(f"Status: {info.status.value}, points_count: {info.points_count}")

Output:

Status: green, points_count: 0

CRUD operations#

import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointIdsList, PointStruct, VectorParams

client = QdrantClient(":memory:")
client.create_collection("docs", vectors_config=VectorParams(size=4, distance=Distance.COSINE))

# Insert (upsert — insert or update by ID)
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(
            id=str(uuid.uuid4()),
            vector=[0.1, 0.2, 0.9, 0.7],
            payload={"title": "Transformers", "year": 2017, "topic": "nlp"},
        ),
        PointStruct(
            id=1,                    # integer IDs are also valid
            vector=[0.3, 0.8, 0.1, 0.2],
            payload={"title": "BERT", "year": 2018, "topic": "nlp"},
        ),
    ],
)

# Retrieve by ID
points = client.retrieve(collection_name="docs", ids=[1], with_payload=True, with_vectors=True)
print(points[0].payload)

# Update payload (partial — only listed fields are changed)
client.set_payload(
    collection_name="docs",
    payload={"year": 2019},
    points=PointIdsList(points=[1]),
)

# Delete points
client.delete(
    collection_name="docs",
    points_selector=PointIdsList(points=[1]),
)

print("After delete:", client.count("docs").count)

Output:

{'title': 'BERT', 'year': 2018, 'topic': 'nlp'}
After delete: 1

Batch upsert#

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import numpy as np, uuid

client = QdrantClient(path="./qdrant_storage")
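# Start from an empty collection (recreate_collection is deprecated in recent clients)
if client.collection_exists("embeddings"):
    client.delete_collection("embeddings")
client.create_collection(
    collection_name="embeddings",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)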

n = 10_000
vectors  = np.random.rand(n, 384).tolist()
payloads = [{"doc_id": i, "chunk": i % 20} for i in range(n)]
ids      = [str(uuid.uuid4()) for _ in range(n)]

client.upload_points(
    collection_name="embeddings",
    points=[
        PointStruct(id=ids[i], vector=vectors[i], payload=payloads[i])
        for i in range(n)
    ],
    batch_size=256,
    parallel=4,        # number of parallel upload workers (processes)
)

print(f"Total: {client.count('embeddings').count}")

Output:

Total: 10000

Vector search#

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, ScoredPoint
import numpy as np

client = QdrantClient(":memory:")
client.create_collection("docs", vectors_config=VectorParams(size=4, distance=Distance.COSINE))
client.upsert("docs", points=[
    PointStruct(id=i, vector=np.random.rand(4).tolist(), payload={"title": f"Doc {i}", "category": "nlp" if i % 2 == 0 else "cv"})
    for i in range(20)
])

query_vec = np.random.rand(4).tolist()

# Basic search
results: list[ScoredPoint] = client.search(
    collection_name="docs",
    query_vector=query_vec,
    limit=5,
    with_payload=True,
)
for r in results:
    print(f"[{r.score:.4f}] id={r.id} | {r.payload['title']}")

Output (illustrative; the vectors are random, so scores and IDs differ on every run):

[0.9912] id=7  | Doc 7
[0.9845] id=14 | Doc 14
[0.9801] id=3  | Doc 3
[0.9734] id=11 | Doc 11
[0.9621] id=0  | Doc 0

Filtered search#

Qdrant applies payload filters during graph traversal rather than as a post-filtering step, so a filtered query does not have to over-fetch and discard candidates. Latency usually stays close to that of an unfiltered query, especially when the filtered fields have a payload index.

from qdrant_client.models import Filter, FieldCondition, HasIdCondition, MatchValue

# Exact match filter on a payload field
nlp_results = client.search(
    collection_name="docs",
    query_vector=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="nlp"))],
    ),
    limit=3,
)

# Point-ID filter. IDs are not payload fields, so FieldCondition(key="id", ...)
# would match nothing; HasIdCondition restricts the search to IDs 5 through 14.
# For numeric payload fields, use FieldCondition(key=..., range=Range(gte=..., lt=...)).
id_results = client.search(
    collection_name="docs",
    query_vector=query_vec,
    query_filter=Filter(
        must=[HasIdCondition(has_id=list(range(5, 15)))],
    ),
    limit=3,
)

# Combined AND / NOT (add should=[...] for OR)
combined = client.search(
    collection_name="docs",
    query_vector=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="nlp"))],
        must_not=[HasIdCondition(has_id=[7])],
    ),
    limit=3,
)
for r in combined:
    print(f"[{r.score:.4f}] {r.payload['title']}")

Output (illustrative; results depend on the random vectors):

[0.9845] Doc 14
[0.9621] Doc 0
[0.9412] Doc 2
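The way must, should, and must_not combine can be modelled in a few lines of plain Python. This is only a sketch of the boolean semantics, not of how Qdrant evaluates filters internally; the helpers `match`, `in_range`, and `passes` and the sample payloads (including the "year" field) are invented for the illustration.

```python
from typing import Any, Callable, Dict, Optional, Sequence

Payload = Dict[str, Any]
Condition = Callable[[Payload], bool]

def match(key: str, value: Any) -> Condition:
    """Exact-match condition, analogous to MatchValue."""
    return lambda payload: payload.get(key) == value

def in_range(key: str, gte: Optional[float] = None, lt: Optional[float] = None) -> Condition:
    """Numeric range condition, analogous to Range(gte=..., lt=...)."""
    def check(payload: Payload) -> bool:
        value = payload.get(key)
        if not isinstance(value, (int, float)):
            return False
        if gte is not None and value < gte:
            return False
        if lt is not None and value >= lt:
            return False
        return True
    return check

def passes(payload: Payload, must: Sequence[Condition] = (),
           should: Sequence[Condition] = (), must_not: Sequence[Condition] = ()) -> bool:
    """must = all hold (AND), should = at least one holds (OR), must_not = none hold (NOT)."""
    if not all(cond(payload) for cond in must):
        return False
    if should and not any(cond(payload) for cond in should):
        return False
    return not any(cond(payload) for cond in must_not)

docs = [
    {"title": "Doc 0", "category": "nlp", "year": 2017},
    {"title": "Doc 1", "category": "cv", "year": 2018},
    {"title": "Doc 2", "category": "nlp", "year": 2019},
    {"title": "Doc 3", "category": "nlp", "year": 2021},
]

# category == "nlp" AND NOT (2019 <= year < 2021)
and_not = [d["title"] for d in docs
           if passes(d, must=[match("category", "nlp")],
                     must_not=[in_range("year", gte=2019, lt=2021)])]

# category == "cv" OR year >= 2021
either = [d["title"] for d in docs
          if passes(d, should=[match("category", "cv"), in_range("year", gte=2021)])]

print(and_not)  # ['Doc 0', 'Doc 3']
print(either)   # ['Doc 1', 'Doc 3']
```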

Payload indexing#

Creating a payload index speeds up filtered queries on frequently-used fields.

from qdrant_client.models import PayloadSchemaType

# Keyword index — for exact-match filters
client.create_payload_index(
    collection_name="docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD,
)

# Integer index — for range filters
client.create_payload_index(
    collection_name="docs",
    field_name="year",
    field_schema=PayloadSchemaType.INTEGER,
)

print("Indexes created")

Named vectors — multiple embeddings per point#

Named vectors let you store more than one embedding (e.g. dense and sparse, or embeddings from different models) per point and query each independently.

import numpy as np

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, NamedVector

client = QdrantClient(":memory:")

client.create_collection(
    collection_name="multi_vec",
    vectors_config={
        "title_emb":   VectorParams(size=384, distance=Distance.COSINE),
        "content_emb": VectorParams(size=384, distance=Distance.COSINE),
    },
)

client.upsert(
    "multi_vec",
    points=[
        PointStruct(
            id=1,
            vector={
                "title_emb":   np.random.rand(384).tolist(),
                "content_emb": np.random.rand(384).tolist(),
            },
            payload={"title": "Attention Is All You Need"},
        ),
    ],
)

# Search using a specific named vector
results = client.search(
    collection_name="multi_vec",
    query_vector=NamedVector(name="content_emb", vector=np.random.rand(384).tolist()),
    limit=1,
)
print(results[0].payload["title"])

Output:

Attention Is All You Need

Hybrid search with sparse vectors#

Sparse vectors (like BM25 or SPLADE) can be combined with dense vectors for hybrid search. Requires Qdrant 1.10+.

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, SparseVectorParams, PointStruct,
    SparseVector, Prefetch, FusionQuery, Fusion,
)

client = QdrantClient(":memory:")

client.create_collection(
    collection_name="hybrid",
    vectors_config={"dense": VectorParams(size=4, distance=Distance.COSINE)},
    sparse_vectors_config={"sparse": SparseVectorParams()},
)

client.upsert(
    "hybrid",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense":  [0.1, 0.2, 0.9, 0.7],
                "sparse": SparseVector(indices=[10, 42, 789], values=[0.8, 0.4, 0.6]),
            },
            payload={"title": "Attention Is All You Need"},
        ),
    ],
)

# Hybrid query: each Prefetch runs one sub-search against the vector named in
# `using`; the outer FusionQuery then merges the candidate lists with RRF.
results = client.query_points(
    collection_name="hybrid",
    prefetch=[
        Prefetch(query=[0.1, 0.2, 0.8, 0.7], using="dense", limit=10),
        Prefetch(query=SparseVector(indices=[10, 42], values=[0.9, 0.3]), using="sparse", limit=10),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=3,
)
for r in results.points:
    print(f"[{r.score:.4f}] {r.payload['title']}")

Output (illustrative; the fused score depends on the RRF ranking constant used by your Qdrant version):

[0.0161] Attention Is All You Need
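Reciprocal Rank Fusion itself is simple enough to sketch in plain Python: every result list contributes 1 / (k + rank) to a document's score, and documents are re-sorted by the sum. The constant k=60 below is the value commonly quoted for RRF and is an assumption here; Qdrant's internal constant may differ, so absolute scores will not necessarily match the server's.

```python
from collections import defaultdict
from typing import Dict, Hashable, List

def rrf(rankings: List[List[Hashable]], k: int = 60) -> Dict[Hashable, float]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) per document, rank starting at 1."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))

dense_hits = [1, 3, 2]    # hypothetical IDs ranked by the dense sub-search
sparse_hits = [1, 4, 3]   # hypothetical IDs ranked by the sparse sub-search

fused = rrf([dense_hits, sparse_hits])
for doc_id, score in fused.items():
    print(f"id={doc_id} score={score:.4f}")
```

Because only ranks are used, RRF needs no score normalisation between the dense and sparse searches, which is why it is a convenient default for hybrid retrieval.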

LangChain integration#

from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings
import os

embeddings = OpenAIEmbeddings(api_key=os.environ["OPENAI_API_KEY"])

# from_texts creates its own client; location=":memory:" selects the in-memory
# mode (pass url=... instead to use a running Qdrant server)
vectorstore = QdrantVectorStore.from_texts(
    texts=[
        "Transformers use self-attention to process sequences in parallel.",
        "BERT is pre-trained with masked language modelling.",
    ],
    embedding=embeddings,
    location=":memory:",
    collection_name="langchain_demo",
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
docs = retriever.invoke("What is self-attention?")
for doc in docs:
    print(doc.page_content)

Output:

Transformers use self-attention to process sequences in parallel.
BERT is pre-trained with masked language modelling.

LlamaIndex integration#

from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import Document, Settings, StorageContext, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from qdrant_client import QdrantClient
import os

Settings.embed_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])

client = QdrantClient(":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="llama_demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    [Document(text="Transformers use self-attention to process sequences in parallel.")],
    storage_context=storage_context,
)

query_engine = index.as_query_engine()
response = query_engine.query("What mechanism do transformers use?")
print(response)

Output (the answer is generated by the configured LLM, so the exact wording varies):

Transformers use self-attention to process sequences in parallel.

Quick reference#

Task | Code
In-memory client | QdrantClient(":memory:")
Persistent client | QdrantClient(path="./qdrant_storage")
Remote client | QdrantClient(url=..., api_key=...)
Create collection | client.create_collection("name", vectors_config=VectorParams(size=n, distance=Distance.COSINE))
Upsert points | client.upsert("name", points=[PointStruct(id=..., vector=..., payload=...)])
Bulk import | client.upload_points("name", points=[...], batch_size=256, parallel=4)
Search | client.search("name", query_vector=[...], limit=k)
Filtered search | client.search(..., query_filter=Filter(must=[FieldCondition(...)]))
Exact match filter | FieldCondition(key="field", match=MatchValue(value="val"))
Range filter | FieldCondition(key="num", range=Range(gte=0, lt=100))
Payload index | client.create_payload_index("name", "field", PayloadSchemaType.KEYWORD)
Retrieve by ID | client.retrieve("name", ids=[1, 2], with_payload=True)
Update payload | client.set_payload("name", payload={...}, points=PointIdsList(points=[id]))
Delete points | client.delete("name", points_selector=PointIdsList(points=[id]))
Point count | client.count("name").count
Collection info | client.get_collection("name")
Named vectors | vectors_config={"title_emb": VectorParams(...), "content_emb": VectorParams(...)}; sparse vectors go in sparse_vectors_config={"sparse": SparseVectorParams()}