chromadb — Embedded Vector Database

Package-level reference for chromadb on PyPI — install variants, server/client split, embedding-function extras, and alternative vector stores.

chromadb#

What it is#

chromadb is the Python distribution of Chroma, an open-source vector database for AI applications. The package ships the embedded in-process engine, the persistent SQLite-backed store (DuckDB was the backend only before 0.4), an HTTP client/server, and a small library of pluggable embedding functions in a single import. The same collection API works whether you run in memory for a notebook prototype (chromadb.Client()), persist to local disk (chromadb.PersistentClient), or point at a remote Chroma server (chromadb.HttpClient).

Reach for chromadb when you want zero-infrastructure RAG storage in Python and are happy to scale up later. Reach for qdrant-client, weaviate-client, or hosted services like Pinecone when you need strict multi-tenant isolation, advanced filtering, or production-grade clustering from day one.

Install#

pip install chromadb

Output: pip's usual download and install log; exits 0 on success

uv add chromadb

Output: dependency resolved + added to pyproject.toml

poetry add chromadb

Output: updated lockfile + virtualenv install

pip install chromadb-client      # thin HTTP-only client (no server, no embedding deps)

Output: installs the slim client that talks to a remote chroma run server

Versioning & Python support#

  • Chroma releases are pre-1.0 and move quickly: the 0.3.x → 0.4.x jump in 2023 replaced the DuckDB + Parquet store with SQLite, changed the on-disk layout, and required a migration tool. Always read the changelog before bumping.
  • Recent versions support Python 3.8+ on Linux, macOS, and Windows. Wheels are published for common architectures; building from source needs a C++ toolchain because of the HNSW index code.
  • Client/server version skew matters. If you run the Chroma server in Docker, the chromadb (or chromadb-client) version in your application should track the server’s minor version. Cross-minor combinations sometimes work and sometimes fail with opaque protocol errors — pinning both to the same minor is the safe path.
  • Roadmap targets a 1.0 once the storage format and tenant model stabilise; until then treat every minor as a potential breaking release in CI.
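The minor-version pinning rule above can be enforced with a small start-up check. The sketch below is a minimal illustration, not part of chromadb: minor_of and same_minor are hypothetical helper names, and the commented usage assumes chromadb.__version__ and client.get_version() report the local and server versions in your release.

```python
def minor_of(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '0.5.20'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def same_minor(client_version: str, server_version: str) -> bool:
    """True when both sides share the same major.minor pair."""
    return minor_of(client_version) == minor_of(server_version)


# Hypothetical usage at application start-up (assumes these attributes exist
# in your chromadb version):
# if not same_minor(chromadb.__version__, client.get_version()):
#     raise RuntimeError("chromadb client/server minor versions differ")

print(same_minor("0.5.3", "0.5.20"))   # True
print(same_minor("0.4.24", "0.5.0"))   # False
```

Failing fast at start-up turns an opaque protocol error at query time into an explicit deployment error.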

Package metadata#

  • Maintainer: Chroma (the company, formerly Chroma Inc.) and community contributors
  • Project home: github.com/chroma-core/chroma
  • Docs: docs.trychroma.com
  • PyPI: pypi.org/project/chromadb
  • License: Apache-2.0
  • Governance: company-led with open contributions; commercial Chroma Cloud offering tracks the open core
  • First released: 2022
  • Downloads: several million per month on PyPI; the default vector store in many LangChain and LlamaIndex tutorials

Optional dependencies & extras#

chromadb ships as one PyPI package that bundles the server, the local persistent store, and the in-process engine. There are no published feature extras in the usual chromadb[xxx] form — instead, optional functionality lives in companion packages and in the small built-in chromadb.utils.embedding_functions module, which loads its own extras lazily.

Common companions to install alongside:

  • chromadb-client — slim HTTP-only client (no server, no embedding deps). Use it in lightweight containers that only talk to a remote chroma run server.
  • sentence-transformers — local CPU/GPU embeddings via the built-in SentenceTransformerEmbeddingFunction.
  • openai — required if you use the built-in OpenAIEmbeddingFunction.
  • cohere, google-generativeai, voyageai — each backs the corresponding built-in embedding function.
  • onnxruntime — used by the bundled default MiniLM embedding model.
  • tiktoken — token counting when you mix Chroma with OpenAI chat models.
  • langchain-chroma or llama-index-vector-stores-chroma — framework adapters in the LangChain / LlamaIndex ecosystems.

Alternatives#

| Package | Trade-off |
| --- | --- |
| qdrant-client | Rust-backed Qdrant server with rich payload filtering and gRPC support. Use when you want a stronger production story than embedded Chroma. |
| weaviate-client | Schema-first, GraphQL-style queries, hybrid search out of the box. Use for hybrid (vector + BM25) workloads. |
| pymilvus | Milvus client. Use when you need very large-scale clustered vector storage. |
| pinecone-client | Fully hosted SaaS; no self-hosting required. Use when you want to outsource ops. |
| lancedb | Embedded columnar vector DB on Lance/Arrow. Use when your data is already columnar and you want zero-copy queries. |
| faiss-cpu / faiss-gpu | Library, not a database: raw ANN indexes. Use when you only need similarity search, not metadata storage. |

Common gotchas#

  1. 0.3.x → 0.4.x persistent-store migration. The on-disk format changed from DuckDB + Parquet to SQLite; stores written by 0.3.x need the maintainer-provided migration tool (chroma-migrate). Snapshot the directory before upgrading.
  2. Client-API churn. Client.persist(), Client(Settings(persist_directory=...)), and PersistentClient(path=...) have been the recommended entrypoints at different times. Pin a version and copy-paste from that version’s docs, not Stack Overflow.
  3. Default embedding function downloads a model on first use. The bundled MiniLM ONNX model is fetched from a CDN — in air-gapped environments, pass an explicit embedding_function= or pre-cache the model.
  4. HNSW index parameters are set at collection creation. hnsw:space, hnsw:M, and hnsw:construction_ef cannot be retroactively changed without rebuilding the collection. Decide on cosine vs L2 distance up front.
  5. Tenancy model is recent and still maturing. Tenants and databases inside a single Chroma server are usable but underdocumented; production multi-tenant designs should test isolation carefully.
  6. Client/server version mismatch is silent. A chromadb-client from 2024 talking to a 2026 server may appear to work for add but fail on a newer query parameter. Match minor versions.
  7. In-memory Client() does not persist. Calling Client() with no path gives you an ephemeral store that vanishes on process exit. Use PersistentClient(path=...) or run the HTTP server for durability.

Real-world recipes#

The recipes below are package-level vignettes — they focus on the install footprint and the client/server topology each pattern requires, rather than re-teaching collection methods (the companion sections/ai/chromadb covers the API surface).

Persistent local store for a single-process app: the smallest possible Chroma deployment. PersistentClient writes to a directory holding a chroma.sqlite3 file plus per-collection HNSW index files; restarting the process re-opens the same data.

import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

client = chromadb.PersistentClient(path=".chroma")
emb = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
docs = client.get_or_create_collection("kb", embedding_function=emb)
docs.upsert(
    ids=["a", "b"],
    documents=["Chroma is an embedded vector DB.", "Pinecone is a hosted SaaS."],
    metadatas=[{"source": "intro"}, {"source": "intro"}],
)
print(docs.query(query_texts=["embedded database"], n_results=1))

Output: the closest match with its id, distance, and metadata; the .chroma/ directory holds chroma.sqlite3 plus one subdirectory of HNSW index files per collection

Client-server split for multi-process serving — run chroma run --path .chroma --host 0.0.0.0 --port 8000 in one container, then point every app at it with the slim client. This is the only safe way to share a Chroma store across workers.

import chromadb

client = chromadb.HttpClient(host="chroma.internal", port=8000)
docs = client.get_collection("kb")
print(docs.count())

Output: count of documents in the remote collection; the worker image only needs chromadb-client (~5 MB) — no ONNX, no sentence-transformers, no server dependencies

Custom embedding function for an in-house model — Chroma’s embedding-function interface is a duck-typed callable. Implement __call__(self, input: list[str]) -> list[list[float]] and pass it as embedding_function=.

import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbedder(EmbeddingFunction):
    def __init__(self, model):
        self.model = model
    def __call__(self, input: Documents) -> Embeddings:
        return self.model.encode(input).tolist()

client = chromadb.PersistentClient(path=".chroma")
col = client.get_or_create_collection(
    "docs",
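    # my_local_model is a placeholder for your own model object;
    # its .encode(list[str]) must return a NumPy array (hence .tolist() above).
    embedding_function=MyEmbedder(my_local_model),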
    metadata={"hnsw:space": "cosine"},
)

Output: the collection materialises with a custom embedder; metadata pins the distance function at creation time (cannot be changed retroactively)
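To make the callable contract concrete without a real model, here is a toy embedder that satisfies the __call__(self, input) shape described above. It is a sketch only: HashingEmbedder is a hypothetical name, the hashing scheme is a stand-in with no semantic quality, and whether a given Chroma version accepts a plain callable or insists on subclassing EmbeddingFunction is something to verify against that version's docs.

```python
import hashlib
import math


class HashingEmbedder:
    """Toy embedder: hashes each token into one of `dim` buckets."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def __call__(self, input: list[str]) -> list[list[float]]:
        vectors = []
        for text in input:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % self.dim
                vec[bucket] += 1.0
            # Scale to unit length; an empty text stays an all-zero vector.
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            vectors.append([x / norm for x in vec])
        return vectors


embedder = HashingEmbedder(dim=64)
vectors = embedder(["Chroma is an embedded vector DB.", ""])
print(len(vectors), len(vectors[0]))  # 2 64
```

The point is the shape of the contract: a list of strings in, one fixed-length list of floats out per string, in the same order.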

Hybrid filter + vector query — Chroma supports a structured where= filter over metadata and a where_document= substring/regex filter, applied as a prefilter to the ANN search.

results = col.query(
    query_texts=["how does HNSW work"],
    n_results=5,
    where={"section": {"$in": ["intro", "tuning"]}},
    where_document={"$contains": "HNSW"},
)

Output: the top-5 hits restricted to documents whose metadata section is intro or tuning AND whose body literally contains the substring HNSW

Multi-tenant via collection-per-tenant — Chroma’s tenants/databases primitive is still maturing; a robust pattern today is one collection per tenant with a name prefix and shared embedding function.

def get_tenant_collection(tenant_id: str):
    return client.get_or_create_collection(
        name=f"tenant_{tenant_id}",
        embedding_function=emb,
    )

Output: every tenant’s data lives in its own collection, so a faulty where clause cannot leak rows across tenants; note that in embedded mode all collections still share one store directory on disk
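Tenant ids often contain characters that are not valid in collection names. One way to sidestep that is to derive the name from a hash of the id, as in the sketch below. The helper tenant_collection_name and the SAFE_NAME pattern are hypothetical; the pattern encodes a conservative reading of Chroma's naming rules (short, alphanumeric at both ends, only letters, digits, underscores, or hyphens in between), which you should check against your version's docs.

```python
import hashlib
import re

# Assumed, conservative naming constraint; verify against your Chroma version.
SAFE_NAME = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def tenant_collection_name(tenant_id: str) -> str:
    """Map any tenant id to a stable, conservative collection name."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()[:32]
    name = f"tenant_{digest}"
    assert SAFE_NAME.match(name), name
    return name


print(tenant_collection_name("acme/../corp"))
```

The mapping is one-way, so keep the original tenant id in your own records (or in collection metadata) if you need to list tenants later.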

Production deployment#

The two production topologies are embedded persistent (single process owns the directory) and client-server (the chroma run HTTP server fronts a shared volume). Pick early — the on-disk format is the same but the failure modes are not.

Topology checklist:

| Concern | Embedded PersistentClient | Client-server (chroma run) |
| --- | --- | --- |
| Concurrency | one writer only; file lock contention if you fork workers | many readers + writers via HTTP |
| Container image | full chromadb (~150 MB) | apps use chromadb-client (~5 MB), server image runs separately |
| Backups | snapshot the directory while idle | snapshot the volume; coordinate with server quiesce |
| Auth | none; process-local | static API key (via env var) or proxy fronting the server |
| Telemetry | sends usage pings unless disabled | same; set ANONYMIZED_TELEMETRY=False |
| Failure mode if disk full | hangs on SQLite write | server returns 500, client retries |

Pinning client and server. When you run the server in Docker (chromadb/chroma:0.5.x), pin the application’s chromadb-client to the same minor. Cross-minor combinations silently fail on newer query parameters (include=["embeddings"] was added late in 0.4.x; older clients ignore it).

Telemetry opt-out. Chroma sends anonymous PostHog pings on startup. Disable per-process with the ANONYMIZED_TELEMETRY=False environment variable, or in code via Settings(anonymized_telemetry=False). Required for many compliance reviews.

Backups. The persistent directory is roughly safe to tar while the process is idle. For zero-downtime backups, run chroma run with a filesystem that supports atomic snapshots (ZFS, btrfs, LVM, EBS snapshot) and snapshot the volume rather than copying the live directory.
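For the idle-process case, a timestamped tarball is usually enough. The sketch below uses only the standard library; snapshot_store is a hypothetical helper name, and it assumes nothing is writing to the directory while the archive is built and that backup_dir lives outside the store directory.

```python
import tarfile
import time
from pathlib import Path


def snapshot_store(store_dir: str, backup_dir: str) -> Path:
    """Write a timestamped .tar.gz of store_dir into backup_dir."""
    src = Path(store_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such store directory: {src}")
    dest_root = Path(backup_dir)
    dest_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    label = src.name or "chroma"
    archive = dest_root / f"{label}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the store root.
        tar.add(src, arcname=label)
    return archive


# Hypothetical usage: snapshot_store(".chroma", "backups")
```

Restoring is the reverse: stop the process, extract the archive, and point PersistentClient(path=...) at the extracted directory.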

Multi-tenancy strategy. Two paths exist; both work in production today:

  1. Collection-per-tenant — strong isolation; tenant deletion is a single delete_collection. Limit: ~thousands of collections per server before metadata lookups slow down.
  2. Filter-per-tenant — one shared collection with tenant_id in metadata, queried via where={"tenant_id": "..."}. Cheaper at scale, but a missing where clause leaks rows. Add an assertion in your query wrapper.
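The query wrapper suggested for the filter-per-tenant path could look like the sketch below. tenant_query is a hypothetical helper; it only assumes the collection object exposes query(query_texts=..., n_results=..., where=...) as used elsewhere on this page, and it combines the tenant filter with any caller-supplied filter through the $and operator.

```python
from typing import Any, Optional


def tenant_query(
    collection: Any,
    tenant_id: str,
    query_texts: list[str],
    n_results: int = 5,
    where: Optional[dict] = None,
) -> dict:
    """Run collection.query with a tenant_id filter that cannot be omitted."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every query")
    tenant_filter = {"tenant_id": tenant_id}
    # Combine with the caller's filter instead of letting it replace ours.
    combined = {"$and": [tenant_filter, where]} if where else tenant_filter
    return collection.query(
        query_texts=query_texts, n_results=n_results, where=combined
    )
```

Routing every read through one function like this means a forgotten filter becomes a ValueError rather than a cross-tenant leak.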

The newer tenants/databases primitive (client.create_tenant(...)) is still maturing — test isolation explicitly before relying on it for regulated workloads.

Index tuning & retrieval quality#

Chroma uses HNSW (Hierarchical Navigable Small World) under the hood. The three parameters worth knowing are hnsw:space (distance metric), hnsw:M (graph connectivity), and hnsw:construction_ef (build-time candidate pool). All three are set at collection creation time and cannot be changed without rebuilding.

col = client.create_collection(
    name="tuned_kb",
    embedding_function=emb,
    metadata={
        "hnsw:space": "cosine",         # cosine | l2 | ip
        "hnsw:M": 32,                   # default 16; higher = better recall, more RAM
        "hnsw:construction_ef": 200,    # default 100; higher = better index, slower build
        "hnsw:search_ef": 100,          # default 10; raise at query time for higher recall
    },
)

Output: a collection whose HNSW index is built with a larger candidate pool and higher graph degree, which typically improves recall at the cost of more memory, a slower build, and higher p95 query latency

Trade-off table:

| Parameter | Low value | High value | When to raise |
| --- | --- | --- | --- |
| M | 8–16 | 48–64 | corpora over ~1M vectors; recall plateau |
| construction_ef | 100 | 400+ | quality matters more than build time |
| search_ef | 10 | 100–500 | tuned per-query; p95 latency target |

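Distance metric choice. Chroma’s own default for hnsw:space is l2, so set the metric explicitly. For most modern sentence embeddings (MiniLM, BGE, OpenAI text-embedding-3-*), cosine is the sensible choice: these models are trained and evaluated with cosine similarity, and on unit-length vectors cosine, l2, and ip rank neighbours identically. Use ip (inner product) only when vector magnitude is meant to carry signal, and l2 only when the embedding model recommends it.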
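A small worked example shows why the metric choice matters less once vectors are unit length: for unit vectors a and b, the squared L2 distance equals 2 − 2(a·b), which is exactly twice the cosine distance, so both metrics order neighbours the same way. The helper names below are illustrative only, and the code makes no claim about how Chroma itself scales the distances it reports.

```python
import math


def normalise(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalise([3.0, 4.0])   # [0.6, 0.8]
b = normalise([1.0, 0.0])   # [1.0, 0.0]
# For unit vectors, squared L2 equals twice the cosine distance.
print(round(squared_l2(a, b), 6), round(2 * cosine_distance(a, b), 6))  # 0.8 0.8
```

With unnormalised vectors the identity no longer holds, which is when the choice between cosine, l2, and ip starts to change results.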

Hybrid filter + vector. Chroma applies metadata where= and document where_document= filters as prefilters — the ANN search runs over the filtered subset. This is fast when filters are selective (the search index is restricted to a small candidate set) and slow when filters match most of the corpus (the prefilter scan dominates).

Reranking pattern. Chroma does not ship a built-in reranker. The standard pattern is to over-retrieve (n_results=50), then rerank in your application with a cross-encoder (cross-encoder/ms-marco-MiniLM-L-6-v2 or Cohere Rerank), and keep the top 5–10.
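The over-retrieve-then-rerank step can be sketched with a pluggable scoring function. Everything here is illustrative: rerank and token_overlap are hypothetical names, and token_overlap is a deliberately crude stand-in so the example runs without a model. In a real pipeline the scorer would wrap a cross-encoder or a hosted rerank API.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_k best."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def token_overlap(query: str, doc: str) -> float:
    """Stand-in scorer for the demo, not a real cross-encoder."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)


candidates = [
    "Pinecone is a hosted SaaS.",
    "HNSW builds a layered proximity graph.",
    "How HNSW search works in practice.",
]
top = rerank("how does hnsw work", candidates, token_overlap, top_k=2)
print([doc for doc, _ in top])
```

The candidates list would normally come from the documents field of an over-retrieved query result; only the reranked top_k reach the prompt.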

Version migration guide#

The 0.3.x → 0.4.x boundary is the largest on-disk change to date. Later minor bumps also rename APIs.

0.3.x → 0.4.x checklist:

  • On-disk persistent-store format changed from DuckDB + Parquet to SQLite. Stores written by 0.3.x need the maintainer-provided migration tool (chroma-migrate). Snapshot the directory first.
  • Client(Settings(persist_directory=...)) removed. Use PersistentClient(path=...) for embedded persistence, or HttpClient(host=..., port=...) for the server.
  • Client.persist() removed. Persistent clients write through automatically, so the explicit flush is gone.
  • Settings(chroma_db_impl=...) removed. Backend is implicit from which client class you instantiate.
  • Telemetry opt-out is ANONYMIZED_TELEMETRY=False in the environment, or Settings(anonymized_telemetry=False) in code.

0.5.x minor-to-minor:

  • Tenants/databases primitive evolved across 0.5 minors — client.create_tenant(...) and client.create_database(...) signatures shifted. Pin if you depend on the new isolation model.
  • get_or_create_collection metadata validation tightened — invalid hnsw:* keys now raise instead of being silently dropped.
  • include= parameter on query() added options ("embeddings", "distances", "metadatas", "documents", "data"); older clients send unrecognised values and 0.5 servers may reject them.

Client/server pinning. Always run the same minor on both sides. The Docker image tag (chromadb/chroma:0.5.x) and the chromadb-client minor must match — cross-minor combinations sometimes work for add()/get() and silently fail on newer query parameters.

The roadmap targets a 1.0 once tenancy and the on-disk format stabilise; treat every pre-1.0 minor as a potential breaking release in CI.

Troubleshooting common errors#

The error catalogue below is what trips up new users most often. Most are environmental rather than code bugs.

  • DuplicateIDError on add(...) — the ID already exists in the collection. Switch to upsert(...), which inserts or updates.
  • InvalidCollectionException — collection name doesn’t exist or was deleted. Use get_or_create_collection for idempotent code.
  • ValueError: Expected metadata to be a non-empty dict — Chroma rejects empty metadatas dicts. Either omit metadatas= entirely or pass at least one key per row.
  • ConnectionError against chroma run — the server is not listening on the expected port, or the firewall is blocking it. curl http://chroma:8000/api/v1/heartbeat should return {"nanosecond heartbeat": ...}.
  • Could not connect to tenant with HttpClient — the server is on an older minor that doesn’t know about tenants. Either upgrade the server or instantiate without tenant=.
  • Default embedder downloads ONNX model on first run — air-gapped environments need the model pre-cached, or use a custom embedding_function=. Symptom: hang on first add() while a CDN times out.
  • OperationalError: database is locked — two PersistentClient instances opened the same directory. Embedded mode is single-writer; move to chroma run or coordinate access.
  • Cross-version protocol error: a chromadb-client from 2024 hitting a 2026 server (or vice versa) fails with opaque 400 responses on newer query parameters. Match minors.

Performance tuning#

Chroma’s performance levers are about telling the engine what to skip more than telling it to go faster. The HNSW index handles vector search; everything else (filters, payload returns, embedding-function calls) is in your control.

| Lever | Mechanism | When it helps |
| --- | --- | --- |
| HttpClient for shared workloads | server-process serialises writes | multi-worker apps |
| Higher hnsw:M and hnsw:construction_ef | better index | recall-bound queries |
| hnsw:search_ef per query | runtime quality knob | latency-budget tuning |
| Pre-filter with where= | shrink the search set | selective metadata predicates |
| include= query parameter | drop unused fields | reduce network payload |
| Batch add/upsert | amortise round-trips | bulk ingestion |
| Custom embedding_function | bypass default ONNX download | air-gapped / faster embedders |
| chromadb-client slim package | skip server deps in app images | smaller worker images |

Bulk ingestion pattern. Batch documents into upsert(...) calls of a few hundred at a time. Smaller batches are network-overhead-bound; larger batches occupy the server’s index build budget and stall reads.

def chunks(iterable, n):
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= n:
            yield buf
            buf = []
    if buf:
        yield buf

for batch in chunks(iter_rows(), 500):
    col.upsert(
        ids=[r["id"] for r in batch],
        documents=[r["text"] for r in batch],
        metadatas=[r["meta"] for r in batch],
    )

Output: streams documents into the collection in 500-row batches; the server returns once each batch is persisted

Query-time search_ef. Set per collection at creation, or tune per query if your version supports it. Raising search_ef improves recall with diminishing returns, at the cost of higher query latency, so production deployments derive it from a p95 latency budget rather than a fixed default.

Avoid the default embedder in production. The bundled MiniLM-via-ONNX function downloads a model on first call and re-loads it per process. Pass an explicit embedding_function= that uses a long-lived embedding service (OpenAI, Cohere, local sentence-transformers) — both faster and easier to upgrade.

Embeddings & chunking strategy#

Chroma stores vectors and metadata; it does not produce embeddings (the bundled MiniLM ONNX function is a default for convenience, not a recommendation). The embedding choice dominates retrieval quality far more than HNSW tuning, so it deserves explicit attention.

Embedding-model choice. A short-list current at writing:

| Model | Dimensions | Cost | Strengths |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | 384 | local CPU/GPU | small, fast, the Chroma default |
| BAAI/bge-small-en-v1.5 | 384 | local | better on MTEB than MiniLM at same dim |
| BAAI/bge-large-en-v1.5 | 1024 | local GPU | strong open-weight; needs GPU at scale |
| text-embedding-3-small | 1536 (or shorter via Matryoshka) | OpenAI API | fast, accurate, hosted |
| text-embedding-3-large | 3072 | OpenAI API | best-in-class accuracy, dimensions matter for storage |
| voyage-3-large | 1024+ | Voyage AI API | strong on retrieval benchmarks |
| embed-multilingual-v3 | 1024 | Cohere API | non-English content |

Higher-dimensional embeddings improve recall but cost storage and RAM linearly. For corpora over ~1M chunks, prefer dim ≤ 1024 with quantization, or a Matryoshka model trimmed to a shorter dimension.
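The linear storage cost is easy to put numbers on. The arithmetic below assumes 4-byte (float32) values and counts only the raw vectors; the HNSW graph links, metadata, and documents come on top. raw_vector_bytes is a hypothetical helper name.

```python
def raw_vector_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the raw vectors alone (float32 by default)."""
    return n_vectors * dim * bytes_per_value


for dim in (384, 1024, 1536, 3072):
    gib = raw_vector_bytes(1_000_000, dim) / 2**30
    print(f"dim={dim:>4}: {gib:.2f} GiB per million vectors")
```

Under these assumptions a million vectors need about 1.43 GiB at 384 dimensions, 3.81 GiB at 1024, 5.72 GiB at 1536, and 11.44 GiB at 3072, which is why trimming dimensions pays off quickly on large corpora.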

Chunking decisions live upstream. Chroma stores whatever chunks you give it; bad chunking can’t be fixed at retrieval time. The standard heuristics:

  • 300–500 character chunks for narrow factual questions.
  • 800–1500 character chunks for general RAG with modern long-context LMs.
  • ~100 character overlap to avoid edge-case context truncation.
  • Respect document structure (titles, sections) — unstructured’s chunk_by_title is a sensible default.
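As a baseline for the size and overlap heuristics above, a fixed-window character chunker fits in a few lines. This is a minimal sketch that ignores document structure; chunk_text is a hypothetical helper, and a structure-aware splitter is usually the better choice when titles and sections are available.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
parts = chunk_text(sample, size=1000, overlap=100)
print([len(p) for p in parts])  # [1000, 1000, 700]
```

Each chunk starts 900 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next.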

HyDE / query rewriting. When user queries are short and noisy, embedding them directly gives weak retrieval. The Hypothetical Document Embeddings pattern (HyDE) asks an LM to generate a plausible answer, embeds that, and queries Chroma with the synthetic embedding — often better than embedding the raw query.
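A sketch of the HyDE flow, with the language model passed in as a plain function so the example stays self-contained. hyde_query, the prompt wording, and the two stub objects are all assumptions for illustration; the only thing assumed of the collection is a query(query_texts=..., n_results=...) method.

```python
from typing import Any, Callable


def hyde_query(
    collection: Any,
    question: str,
    generate: Callable[[str], str],
    n_results: int = 5,
) -> dict:
    """Query with an LM-written hypothetical answer instead of the raw question."""
    prompt = f"Write a short passage that answers: {question}"
    hypothetical = generate(prompt)
    # The collection's own embedding function embeds the synthetic passage.
    return collection.query(query_texts=[hypothetical], n_results=n_results)


class EchoCollection:
    """Stub that returns the arguments it was queried with."""

    def query(self, **kwargs):
        return kwargs


def fake_lm(prompt: str) -> str:
    """Stub generator standing in for a real LM call."""
    return "HNSW is a layered graph index for nearest-neighbour search."


print(hyde_query(EchoCollection(), "what is hnsw?", fake_lm, n_results=3))
```

Because the hypothetical passage is embedded by the collection's usual embedding function, nothing about the stored vectors needs to change.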

Parent-document retrieval. Embed small chunks for precision; return the parent document (or a wider window) to the LM for context. Store parent_id in metadata and look it up after retrieval.
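The lookup step after retrieval might look like the following. parents_for is a hypothetical helper, the parent store is a plain dict standing in for wherever full documents live, and the result shape (one inner list of metadata dicts per query text) is assumed to match what collection.query returns.

```python
def parents_for(result: dict, parent_store: dict[str, str]) -> list[str]:
    """Return parent documents for the first query, best hit first, deduplicated."""
    seen: set[str] = set()
    parents: list[str] = []
    for meta in result["metadatas"][0]:
        parent_id = (meta or {}).get("parent_id")
        if parent_id is None or parent_id in seen or parent_id not in parent_store:
            continue
        seen.add(parent_id)
        parents.append(parent_store[parent_id])
    return parents


# Assumed result shape: one inner list per query text.
fake_result = {"metadatas": [[{"parent_id": "doc1"}, {"parent_id": "doc2"},
                              {"parent_id": "doc1"}, None]]}
store = {"doc1": "Full text of document 1", "doc2": "Full text of document 2"}
print(parents_for(fake_result, store))
```

Deduplicating matters because several small chunks from the same parent often rank together, and sending the parent twice wastes context.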

Security considerations#

Chroma’s default setup has minimal security — appropriate for embedded use, dangerous for a network-exposed server.

  • No auth by default on chroma run. Place behind an authenticated reverse proxy (nginx with basic auth, an API gateway, or a VPC) before exposing to a network. Recent versions support a static API key via CHROMA_SERVER_AUTHN_PROVIDER/CHROMA_SERVER_AUTHN_CREDENTIALS; verify the version’s docs.
  • TLS not built in — the HTTP server speaks plaintext. Terminate TLS at a proxy.
  • Telemetry phones home unless disabled with ANONYMIZED_TELEMETRY=False. Required for many compliance reviews.
  • Multi-tenant leakage risk under filter-per-tenant. A missing where={"tenant_id": ...} clause returns rows across tenants. Either wrap every query in code that enforces the filter, or use collection-per-tenant for hard isolation.
  • Prompt injection via retrieved content. Documents in Chroma are returned verbatim to the LM. A malicious uploaded document can contain instructions the LM follows (“ignore previous instructions and…”). Validate upload provenance; consider a sanitisation pass on retrieved content before prompt assembly.
  • PII in metadata. Metadata is returned in every query response; don’t store anything you wouldn’t want in logs.
  • Backup security. The persistent directory is unencrypted: a plaintext SQLite file plus HNSW index files. Encrypt at rest at the volume layer (LUKS, EBS encryption, etc.).

When NOT to use this#

Chroma is the easiest vector DB to start with; it stops being the right tool at a clear scale boundary.

  • Corpora over ~10 million vectors with strict p95 latency. Qdrant or Milvus have more mature distributed stories. Chroma works but tuning gets painful.
  • Hybrid (BM25 + vector) is a core requirement. Weaviate ships hybrid out of the box; Chroma needs application-side BM25 (e.g. rank-bm25 + post-merge), which is more code.
  • Strict multi-tenant isolation with thousands of tenants. Collection-per-tenant slows down past low thousands; filter-per-tenant is leaky if a query forgets the where clause. Postgres-with-pgvector or a hosted service may fit better.
  • You want a fully-hosted, managed service. Pinecone, Weaviate Cloud, and Qdrant Cloud all front their own engines. Chroma Cloud exists but is younger.
  • Sparse-vector workloads (e.g. SPLADE). Qdrant and Weaviate have first-class sparse support; Chroma is dense-only at this writing.
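For the application-side hybrid merge mentioned in the BM25 bullet above, one common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). This is a swapped-in illustration rather than anything Chroma provides: reciprocal_rank_fusion is a hypothetical helper, k=60 is the conventional constant, and the two input lists stand in for ids ranked by a BM25 scorer and by a vector query.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists; each id scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ids = ["a", "b", "c"]      # stand-in: ids ranked by a keyword scorer
vector_ids = ["c", "a", "d"]    # stand-in: ids ranked by a vector query
print(reciprocal_rank_fusion([bm25_ids, vector_ids]))  # ['a', 'c', 'b', 'd']
```

RRF only needs ranks, not comparable scores, which is why it is a convenient merge when the two retrievers report distances on different scales.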

See also#

  • AI: chromadb — collections, queries, embedding functions, framework integration
  • Concept: RAG — retrieval-augmented generation patterns
  • Concept: API — REST design fundamentals