scipy#

What it is#

scipy is the scientific-computing companion to numpy — a sibling project under the same NumFOCUS umbrella. It bundles production implementations of algorithms numpy intentionally does not ship: numerical optimization (scipy.optimize), statistics (scipy.stats), signal processing (scipy.signal), sparse matrices (scipy.sparse), interpolation, integration, FFTs, and special functions.

On PyPI scipy sits one rung below numpy in import-graph centrality — depended on by scikit-learn, statsmodels, scikit-image, networkx, and most domain-specific scientific stacks. Reach for scipy whenever numpy alone is not enough and you need a battle-tested algorithm rather than a hand-rolled one.

Install#

pip install scipy

Output: (none — exits 0 on success)

uv add scipy

Output: dependency resolved, lockfile updated; pulls numpy automatically

poetry add scipy

Output: installed into the project venv

pip install scipy --only-binary=:all:

Output: forces a wheel install — avoids accidentally compiling SciPy from source on niche platforms (~30 min build)

Versioning & Python support#

scipy follows the SPEC 0 support window (matched with numpy): Python versions are supported for about three years after their release and numpy versions for about two, which in practice means the latest three or four Python minor versions. Releases are roughly twice a year; the public API is conservatively versioned, with a DeprecationWarning at least one minor release before removal.

SciPy line	Python support	Numpy requirement
1.11.x	3.9 – 3.12	numpy >= 1.21
1.13.x	3.9 – 3.12	numpy >= 1.22.4, first line to support numpy 2.x
1.14.x	3.10 – 3.13	numpy >= 1.23.5, supports numpy 2.x
1.15.x+	3.10 – 3.13	numpy 2.x preferred

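A scipy install can force numpy upward: if you pip install scipy after pinning numpy==1.21 and the scipy release being installed requires a newer numpy, the resolver will upgrade numpy or fail. An installed numpy that already satisfies scipy's declared range is left alone.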
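
To see which numpy range the installed scipy actually declares, the package metadata can be queried with the standard library. This is a minimal sketch; the helper names `installed` and `numpy_requirements_of_scipy` are hypothetical, and the exact requirement strings depend on the release you have installed.

```python
from importlib.metadata import PackageNotFoundError, requires, version


def installed(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def numpy_requirements_of_scipy():
    """Return the numpy requirement strings declared by the installed scipy."""
    try:
        declared = requires("scipy") or []
    except PackageNotFoundError:
        return []
    # Keep only numpy's own requirement and skip anything gated behind an
    # extra (contributor-only dependencies carry an 'extra ==' marker).
    return [r for r in declared
            if r.lower().startswith("numpy") and "extra ==" not in r]


print("scipy:", installed("scipy"))
print("numpy:", installed("numpy"))
print("scipy declares:", numpy_requirements_of_scipy())
```

Running this before and after an upgrade shows whether the resolver had a reason to move numpy.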

Package metadata#

Maintainer: SciPy steering council under NumFOCUS sponsorship
Project home: github.com/scipy/scipy
Docs: docs.scipy.org
License: BSD-3-Clause (source); binary wheels also bundle compiler runtime libraries under GPL-with-exception / LGPL terms (see Gotchas)
PyPI: pypi.org/project/scipy
Governance: steering council, as set out in the project's governance document
First released: 2001 (Travis Oliphant, Pearu Peterson, Eric Jones)
Downloads: > 100 M / month on PyPI

Optional dependencies & extras#

scipy ships no feature extras: every submodule is imported as scipy.<subpackage> and is part of the same wheel (the only extras in its metadata, such as test and doc, target contributors). Companion packages typically installed alongside:

pip install numpy scipy matplotlib pandas scikit-learn jupyter

Output: installs the analytical / modelling stack

Submodule	Purpose
`scipy.optimize`	minimisation, curve fitting, root finding
`scipy.stats`	distributions, hypothesis tests, descriptive stats
`scipy.signal`	filters, windows, spectral analysis (spectrogram)
`scipy.sparse`	sparse matrices and operations
`scipy.spatial`	KD-trees, distance, geometry
`scipy.integrate`	quad, ODE solvers (`solve_ivp`)
`scipy.interpolate`	splines, RBF, gridded interpolation
`scipy.linalg`	BLAS/LAPACK wrappers: decompositions, solvers, matrix functions
`scipy.special`	gamma, bessel, erf, …
`scipy.fft`	FFT (preferred over the legacy `scipy.fftpack`)
`scipy.io`	MATLAB `.mat`, WAV, NetCDF readers

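The wheel is tens of megabytes and larger once unpacked, significantly bigger than numpy, because it contains many compiled Fortran/C/C++ extension modules alongside a bundled BLAS/LAPACK (OpenBLAS on most wheels).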

Alternatives#

Package	One-line trade-off
numpy	core arrays; scipy needed for algorithms numpy intentionally omits
statsmodels	richer regression / time-series stats than `scipy.stats`
scikit-learn	ML on top of scipy/numpy; partly overlapping (e.g. SVD)
pymc	Bayesian modelling; statsmodels/scipy for frequentist
jax.scipy	scipy-shaped API with autograd + GPU/TPU
cupyx.scipy	scipy mirror running on NVIDIA GPUs
sympy	symbolic math; scipy is numerical

Common gotchas#

Large install (tens of MB as a wheel, more once unpacked). The wheel bundles many compiled extension modules plus BLAS/LAPACK (OpenBLAS on most wheels). CI containers and serverless deployments feel this; on slim images, make sure a prebuilt wheel is installed (--only-binary=:all:) rather than a source build (--no-binary), which also needs compilers in the image.
License corners in binary wheels. SciPy's own source is BSD-3-Clause. Binary wheels additionally ship compiler runtime libraries such as libgfortran (GPL with the GCC Runtime Library Exception) and, on some platforms, libquadmath (LGPL). If you re-ship scipy plus your code, audit the LICENSE.txt inside the wheel. Most users are not affected; only those building closed-source redistributions need to check.
numpy upgrade coupling. Upgrading scipy frequently upgrades numpy. Pin both in lockfiles.
scipy.fftpack is legacy; scipy.fft supersedes it. The old module still imports but gets no new features, so new code should use scipy.fft.
scipy.misc is gone. Imageio / Pillow replaced the toy demo helpers years ago. Any tutorial that imports scipy.misc.imread is pre-2020.
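Default optimisation tolerances. scipy.optimize.minimize defaults can be loose for ill-conditioned problems. Tighten them with the generic tol= argument or with method-specific options; the option names differ per solver (for example ftol and gtol for L-BFGS-B, xatol and fatol for Nelder-Mead), and an unknown name only produces a warning instead of tightening anything.
scipy.stats distribution calls carry per-call overhead. Each norm.pdf(x, loc=5, scale=2) call validates and broadcasts its arguments, and constructing a frozen distribution is itself relatively expensive. Create the frozen object once outside the loop (rv = norm(loc=5, scale=2)), reuse rv.pdf(x), and pass whole arrays rather than scalars where possible.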
Build-from-source is painful. On platforms without a wheel (uncommon platforms, niche Python versions), scipy needs a Fortran compiler and a BLAS library — easily 30 minutes to compile. Use --only-binary=:all: to fail fast instead.
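
The tolerance gotcha above is easiest to see with two solvers side by side. The sketch below assumes SciPy's built-in Rosenbrock test function (rosen, rosen_der) and a hypothetical three-dimensional starting point; the tolerance values are illustrative, not recommendations.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([1.3, 0.7, 0.8])  # hypothetical starting point

# Nelder-Mead names its tolerances xatol / fatol.
nm = minimize(
    rosen, x0, method="Nelder-Mead",
    options={"xatol": 1e-10, "fatol": 1e-10, "maxiter": 20_000, "maxfev": 20_000},
)

# L-BFGS-B names them ftol / gtol; an exact gradient is supplied via jac=.
lb = minimize(
    rosen, x0, method="L-BFGS-B", jac=rosen_der,
    options={"ftol": 1e-14, "gtol": 1e-10, "maxiter": 5_000},
)

print("Nelder-Mead:", nm.x, nm.fun)
print("L-BFGS-B:   ", lb.x, lb.fun)
```

Both runs should end near the minimiser (1, 1, 1); swapping the option names between the two methods would leave the defaults in force.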

Real-world recipes#

scipy’s submodules cover so much ground that the recipes here are organised by submodule rather than by pipeline. Each shows the packaging-level context — what’s pulled in, what trade-offs you’re making — rather than re-teaching the API (sections/python/scipy covers the API).

Curve fitting with scipy.optimize.curve_fit:

import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_true = logistic(x, 1.0, 1.5, 5.0)
y_obs = y_true + rng.normal(scale=0.05, size=x.size)

(L, k, x0), cov = curve_fit(logistic, x, y_obs, p0=[1.0, 1.0, 5.0])
print(f"L={L:.3f}, k={k:.3f}, x0={x0:.3f}")

Output: parameter estimates close to the true (1.0, 1.5, 5.0); cov is the parameter covariance for confidence intervals
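
As a rough illustration of how cov turns into interval estimates, the sketch below repeats the same fit and derives one-sigma standard errors from the diagonal of the covariance matrix. The 1.96 multiplier assumes approximately normal parameter errors, which is an assumption of this sketch and not something curve_fit guarantees.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))


rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_obs = logistic(x, 1.0, 1.5, 5.0) + rng.normal(scale=0.05, size=x.size)

popt, cov = curve_fit(logistic, x, y_obs, p0=[1.0, 1.0, 5.0])

# Square roots of the covariance diagonal are the one-sigma standard errors.
stderr = np.sqrt(np.diag(cov))
lower = popt - 1.96 * stderr  # normal approximation to a 95% interval
upper = popt + 1.96 * stderr

for name, est, lo, hi in zip(("L", "k", "x0"), popt, lower, upper):
    print(f"{name} = {est:.3f}  (approx. 95% CI {lo:.3f} to {hi:.3f})")
```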

Sparse linear-system solve with scipy.sparse:

import numpy as np
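from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

N = 10_000
# Build the bidiagonal matrix directly in sparse form: 4 on the main diagonal,
# 1 on the first superdiagonal. np.diag would allocate the 800 MB dense array first.
A = diags([4.0, 1.0], offsets=[0, 1], shape=(N, N), format="csr")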
b = np.arange(N, dtype=float)
x = spsolve(A, b)
print(x[:5])

Output: the first 5 values of the solution; the sparse solver handles the 10000x10000 system in milliseconds despite the dense equivalent being 800 MB

Distribution fitting with scipy.stats:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Maximum-likelihood fit (the default for norm.fit)
params = stats.norm.fit(samples)
print(f"mu={params[0]:.3f}, sigma={params[1]:.3f}")

# Goodness of fit
ks_stat, ks_p = stats.kstest(samples, "norm", args=params)
print(f"KS={ks_stat:.4f}, p={ks_p:.4f}")

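Output: estimated parameters close to (5.0, 2.0) plus a KS test statistic and p-value for the null that the sample is normal. Because the parameters were estimated from the same sample, the p-value is optimistic; treat it as a rough check.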

Signal processing — filter design:

import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 1000.0
t = np.arange(int(fs * 2)) / fs          # 2 s sampled at exactly fs
sig = np.sin(2 * np.pi * 5 * t)
noise = np.random.default_rng(0).normal(scale=0.5, size=sig.size)
noisy = sig + noise

sos = butter(N=4, Wn=10, btype="low", fs=fs, output="sos")
clean = sosfiltfilt(sos, noisy)
print(clean[:5])

Output: the first 5 samples of the low-pass-filtered signal; second-order-sections (sos) format is numerically more stable than b, a for higher-order filters

ODE solve with scipy.integrate.solve_ivp:

import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, y, sigma=10.0, rho=28.0, beta=8 / 3):
    x, y_, z = y
    return [sigma * (y_ - x), x * (rho - z) - y_, x * y_ - beta * z]

sol = solve_ivp(lorenz, t_span=(0, 5), y0=[1.0, 1.0, 1.0], dense_output=True, max_step=0.01)
print(sol.t.size, sol.y.shape)

Output: number of time steps taken and the (3, N) state trajectory; dense_output=True builds an interpolant for arbitrary-t evaluation
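
To show what the dense-output interpolant is for, the sketch below reruns the same Lorenz solve and evaluates sol.sol at a handful of query times that are not the solver's own steps. The query grid t_query is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp


def lorenz(t, state, sigma=10.0, rho=28.0, beta=8 / 3):
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]


sol = solve_ivp(lorenz, t_span=(0, 5), y0=[1.0, 1.0, 1.0],
                dense_output=True, max_step=0.01)

t_query = np.linspace(0, 5, 11)  # arbitrary times inside t_span
states = sol.sol(t_query)        # interpolated states, shape (3, len(t_query))

print(states.shape)
print(states[:, 0])              # state at t=0, i.e. the initial condition
```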

Performance tuning#

scipy performance breaks down into two layers: the BLAS/LAPACK library scipy is linked against (dense linear algebra, eigensolvers) and scipy's own compiled and Python-level algorithms (optimisation, integration, statistics, FFT). Tuning differs by submodule.

import numpy as np
import scipy

np.show_config()
scipy.show_config()   # prints directly and returns None, so no print() wrapper

Output: the BLAS the build was linked against (OpenBLAS on most wheels, Accelerate on macOS) — confirms whether you have a fast or vendor-default backend

Tuning levers by submodule:

Submodule	Tuning lever	When it matters
`scipy.linalg`	BLAS thread cap (`OPENBLAS_NUM_THREADS`)	inside parallel CV / multiprocessing
`scipy.optimize`	`tol=`, `options={"maxiter": ...}`	ill-conditioned problems
`scipy.optimize`	Provide analytical `jac=`, `hess=`	high-dim minimisation
`scipy.fft`	`workers=-1` keyword	large FFTs on multi-core
`scipy.sparse`	Pick CSR for row ops, CSC for column ops	avoid format conversion churn
`scipy.stats`	Freeze distributions: `rv = norm(5, 2)`; `rv.pdf(x)`	hot loops over a fixed distribution
`scipy.integrate.solve_ivp`	`method="LSODA"` adaptive	stiff ODEs
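
The analytical-gradient lever in the table can be sketched by counting objective evaluations with and without jac=. The example assumes SciPy's built-in Rosenbrock function and a hypothetical 20-dimensional starting point; exact counts vary by version, so only the comparison matters.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.full(20, 1.5)  # hypothetical 20-dimensional starting point

# Without jac=, BFGS estimates the gradient by finite differences,
# which costs roughly one extra objective evaluation per dimension.
numeric = minimize(rosen, x0, method="BFGS")

# With jac=, each gradient is a single call to rosen_der.
analytic = minimize(rosen, x0, method="BFGS", jac=rosen_der)

print("objective evaluations, finite-difference gradient:", numeric.nfev)
print("objective evaluations, analytic gradient:         ", analytic.nfev)
```

The gap grows with dimension, which is why the lever is listed under high-dimensional minimisation.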

Sparse matrix format gotcha:

from scipy.sparse import csr_array, csc_array

# CSR is fast for row slicing, slow for column slicing
A = csr_array([[1, 0, 0], [0, 0, 2], [0, 3, 0]])
print(A[0])         # row slice — fast
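print(A[:, 0])      # column slice: works, but CSR has to scan every row

Output: the first row, then the first column. Column slicing on CSR is slower than on CSC but raises no warning; SparseEfficiencyWarning appears when you change the sparsity structure of a CSR/CSC array (for example A[0, 1] = 5). Switch to csc_array when column access dominates.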
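
A minimal sketch of one place where SparseEfficiencyWarning does show up in practice: writing a new nonzero into a CSR array. It also shows the usual workaround, assembling in LIL and converting once. The matrix values are the same toy values as above.

```python
import warnings

import numpy as np
from scipy.sparse import SparseEfficiencyWarning, csr_array, lil_array

A = csr_array([[1, 0, 0], [0, 0, 2], [0, 3, 0]])

# Writing to a position that is currently zero forces CSR to rebuild its
# index arrays, which is what SparseEfficiencyWarning flags.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    A[0, 2] = 5
print([w.category.__name__ for w in caught])

# Cheaper pattern: assemble in LIL (built for incremental writes), convert once.
B = lil_array((3, 3))
B[0, 0] = 1
B[1, 2] = 2
B[2, 1] = 3
B[0, 2] = 5
B = B.tocsr()

print(np.array_equal(A.toarray(), B.toarray()))
```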

Memory & dataset-size scaling#

scipy’s algorithms are generally in-RAM. The main scaling stories are sparse representations (massive memory reduction for matrices with mostly zeros) and chunked spectral / signal processing.

import numpy as np
from scipy.sparse import csr_array

N = 1_000_000
data = np.ones(3_000_000)
row = np.random.default_rng(0).integers(0, N, 3_000_000)
col = np.random.default_rng(1).integers(0, N, 3_000_000)
A = csr_array((data, (row, col)), shape=(N, N))
print(f"sparse: {A.data.nbytes / 1e6:.1f} MB, dense would be: {N * N * 8 / 1e9:.0f} GB")

Output: the sparse matrix’s data footprint vs the (8 TB) dense equivalent — sparse storage is the only way this fits in any RAM at all
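
The data array is only part of a CSR array's footprint; the column indices and row pointers count too. The sketch below uses a smaller, hypothetical N so it runs quickly, and adds up all three arrays.

```python
import numpy as np
from scipy.sparse import csr_array

N = 100_000      # smaller than the example above, purely to keep this quick
nnz = 300_000
rng = np.random.default_rng(0)
rows = rng.integers(0, N, nnz)
cols = rng.integers(0, N, nnz)
A = csr_array((np.ones(nnz), (rows, cols)), shape=(N, N))

# CSR keeps three arrays: values, column indices, and row pointers.
total = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print(f"data only: {A.data.nbytes / 1e6:.1f} MB")
print(f"data + indices + indptr: {total / 1e6:.1f} MB")
```

The total is still tiny next to the dense equivalent, but it is the number to budget for.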

For genuinely huge problems:

Iterative linear solvers (scipy.sparse.linalg.cg, gmres, bicgstab) for systems too big for direct factorisation.
Partial eigensolvers: scipy.sparse.linalg.eigsh computes a few eigenpairs iteratively from matrix-vector products, for cases where a full dense eigendecomposition would not fit in memory.
scipy.signal chunked filtering — process audio / time series in overlapping blocks rather than loading the full waveform.
PyAMG, petsc4py — external libraries for very large sparse problems; pip-installable alongside scipy.
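
As an illustration of the iterative-solver route, here is a minimal conjugate-gradient sketch on a symmetric positive-definite tridiagonal system. The matrix is a hypothetical stand-in chosen to be well conditioned; real problems usually need a preconditioner, which this sketch omits.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

N = 100_000
# Symmetric, diagonally dominant tridiagonal matrix, stored sparsely.
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(N, N), format="csr")
b = np.ones(N)

x, info = cg(A, b)  # info == 0 signals convergence at the default tolerance

residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(info, residual)
```

Only matrix-vector products are needed, so memory stays proportional to the number of nonzeros.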

scipy does not have a streaming or out-of-core model of its own; the path past one node is usually a domain-specific library (PETSc, Trilinos) or hand-rolled chunking.
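
Hand-rolled chunking for a causal filter can be sketched with scipy.signal.sosfilt and its zi state argument. Note that this variant carries the filter state from block to block instead of overlapping blocks; the block size and the random stand-in signal are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 1000.0
sos = butter(N=4, Wn=10, btype="low", fs=fs, output="sos")

rng = np.random.default_rng(0)
signal = rng.normal(size=10_000)  # stand-in for a stream too long to hold at once
block_size = 1_000                # hypothetical block size

# Filter state carried between blocks: shape (n_sections, 2), starting at rest.
zi = np.zeros((sos.shape[0], 2))
blocks = []
for start in range(0, signal.size, block_size):
    block = signal[start:start + block_size]
    filtered, zi = sosfilt(sos, block, zi=zi)
    blocks.append(filtered)
chunked = np.concatenate(blocks)

# The block-wise result matches filtering the whole signal in one call.
print(np.allclose(chunked, sosfilt(sos, signal)))
```

Zero-phase filtering (sosfiltfilt) runs the filter in both directions, so it cannot be streamed this way without overlap handling.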

Version migration guide#

scipy ships roughly twice a year. The recent breaks worth knowing about:

1.10 → 1.11:

Dropped Python 3.8 support.
scipy.misc (already empty) finally removed.
Several scipy.stats API tightenings (keyword-only arguments).

1.11 → 1.13:

scipy.fftpack remains legacy; new code should always use scipy.fft.
scipy.sparse has both a matrix-style (csr_matrix) and an array-style (csr_array) API; the array API has been available since 1.8 and matured over these releases. New code should use csr_array: it follows NumPy’s ndarray semantics rather than the legacy MATLAB-style matrix semantics.

1.13 → 1.14:

NumPy 2.x support (first added in 1.13) continues; the minimum numpy rises to 1.23.5 and Python 3.9 is dropped.
Some scipy.stats distribution methods became keyword-only (stats.norm.fit(data, loc=...)).

1.14 → 1.15:

Several optimisation method names tightened.
scipy.linalg.solve(..., assume_a="pos") takes the symmetric-positive-definite path (faster than the general solver); the option itself predates 1.15, so it also works on older releases.
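
A small sketch of the assume_a="pos" path next to the general one. The matrix is constructed to be symmetric positive definite; passing assume_a="pos" for a matrix that is not SPD is an error, so the flag is a promise the caller makes.

```python
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(0)
M = rng.normal(size=(500, 500))
A = M @ M.T + 500 * np.eye(500)  # symmetric positive definite by construction
b = rng.normal(size=500)

x_general = solve(A, b, assume_a="gen")  # general (LU-based) path
x_spd = solve(A, b, assume_a="pos")      # Cholesky-based path for SPD input

print(np.allclose(x_general, x_spd))
```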

from scipy.sparse import csr_matrix, csr_array

# Legacy matrix-style (avoid in new code)
m = csr_matrix([[1, 0], [0, 2]])
print(type(m), m * m)            # legacy: `*` is matrix multiplication

# Modern array-style
a = csr_array([[1, 0], [0, 2]])
print(type(a), a @ a, a * a)     # `@` is matmul; `*` is element-wise

Output: the matrix prints with csr_matrix-style repr; the array prints with csr_array-style and supports the same operator semantics as numpy

Pin scipy in production: upgrades can change numerical results slightly, and pickles of objects that embed scipy or numpy state (saved sklearn models, for instance) are not guaranteed to load across versions.

Interop with adjacent ecosystems#

scipy lives upstream of most of the scientific stack. The main interop concerns are sparse-matrix exchange and the Array API.

Library	scipy → other	other → scipy	Zero-copy?
numpy	n/a — scipy IS numpy underneath	n/a	Yes (same buffers)
scikit-learn	sklearn accepts scipy.sparse directly	sklearn returns numpy arrays	Yes
pandas	`pd.DataFrame.sparse.from_spmatrix(m)`	`df.sparse.to_coo()`	Partial
networkx	`nx.from_scipy_sparse_array(A)`	`nx.to_scipy_sparse_array(G)`	Copy
pytorch	`torch.from_numpy(A.toarray())`	dense round-trip	Copy
jax	`jnp.asarray(A.toarray())`	dense	Copy
matlab (.mat)	`scipy.io.savemat` / `loadmat`	n/a	Copy

from scipy.sparse import csr_array
from sklearn.linear_model import LogisticRegression
import numpy as np

# 1000 documents x 50000 vocab: 400 MB as dense float64, well under 1 MB as sparse
rng = np.random.default_rng(0)
nnz = 10_000
X = csr_array((rng.normal(size=nnz), (rng.integers(1000, size=nnz), rng.integers(50_000, size=nnz))), shape=(1000, 50_000))
y = rng.integers(0, 2, size=1000)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.coef_.shape)

Output: shape (1, 50000) — sklearn consumed the sparse matrix without densifying it, which is the difference between fits-in-RAM and 400 MB of zeros
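
For the MATLAB row of the interop table, a round trip through scipy.io can be sketched without touching disk by using an in-memory buffer. The variable names "dense" and "sparse" are arbitrary; the container type returned for the sparse entry may differ between SciPy versions, so the sketch only checks that it is sparse.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat
from scipy.sparse import csc_matrix, issparse

dense = np.arange(6.0).reshape(2, 3)
sparse = csc_matrix([[1.0, 0.0], [0.0, 2.0]])

buffer = io.BytesIO()  # in-memory stand-in for a .mat file
savemat(buffer, {"dense": dense, "sparse": sparse})
buffer.seek(0)

loaded = loadmat(buffer)
print(loaded["dense"])
print(issparse(loaded["sparse"]))
print(loaded["sparse"].toarray())
```

Both directions copy the data, as the table notes.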

Troubleshooting common errors#

The errors below cover the recurring frictions; most are about scipy assumptions (BLAS, sparse format) rather than the API itself.

ImportError: cannot import name 'imread' from 'scipy.misc' — scipy.misc removed; use imageio.v3.imread instead.
OptimizeWarning: Covariance of the parameters could not be estimated — your fit is degenerate or your initial p0= was too far off. Try better initial guesses or use a more robust method (method="trf").
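SparseEfficiencyWarning: raised when an operation is expensive for the current format, most often changing the sparsity structure of a CSR/CSC array (A[i, j] = v) or handing spsolve a format other than CSC/CSR. Assemble in LIL or COO and convert once, or convert explicitly (A.tocsc()) before the solve.
LinAlgError: SVD did not converge. Usually NaN/inf values in the input or a badly ill-conditioned matrix. Check np.isfinite(A).all(), inspect np.linalg.cond(A), and add regularisation (Tikhonov) if needed.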
scipy.stats distribution slow — you are creating the frozen object inside a hot loop. rv = norm(loc=5, scale=2) once, then rv.pdf(x).
solve_ivp returns success=False — adaptive step failed. Try method="LSODA" (stiff/non-stiff auto-switching) or relax rtol/atol.
scipy.signal.lfilter numerically unstable on high-order filters: design the filter with output="sos" and switch to sosfilt (or sosfiltfilt for zero-phase filtering).
scipy build-from-source begins — wheel missing. Use pip install scipy --only-binary=:all: to fail fast, then investigate why no wheel matched your platform.
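Mixing numpy 2.x with an old scipy: releases before scipy 1.13 were built against numpy 1.x and fail to import under numpy 2.x. Upgrade scipy to 1.13 or later, or pin numpy<2; pin both in lockfiles.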
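
The solve_ivp advice above can be illustrated on a small stiff test problem by comparing how many right-hand-side evaluations an explicit and a stiff-aware method need. The equation and its rate constant of 1000 are a hypothetical textbook-style example, not taken from any particular application.

```python
import numpy as np
from scipy.integrate import solve_ivp


def stiff(t, y):
    # Fast relaxation towards cos(t): the large rate constant makes this stiff.
    return -1000.0 * (y - np.cos(t))


explicit = solve_ivp(stiff, (0.0, 10.0), [0.0], method="RK45")
implicit = solve_ivp(stiff, (0.0, 10.0), [0.0], method="LSODA")

print("RK45  success:", explicit.success, "rhs evaluations:", explicit.nfev)
print("LSODA success:", implicit.success, "rhs evaluations:", implicit.nfev)
```

An explicit method is forced into tiny steps for stability, so a sharply higher evaluation count is a useful hint that the problem is stiff.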

When NOT to use this#

scipy is the right answer most of the time it’s reached for; the cases below are where a specialised library wins.

Pure statistics with R-style formulas: statsmodels — smf.ols("y ~ x1 + x2", data=df) plus richer regression diagnostics.
Bayesian modelling: pymc, NumPyro. scipy.stats is frequentist.
GPU computation: CuPy + cupyx.scipy, or JAX (jax.scipy).
Symbolic math: SymPy.
Production ML: scikit-learn directly; scipy is the foundation, sklearn is the layered API.
Distributed compute: scipy is single-process. Use dask-glm, dask-ml, or ray.
