scipy#
What it is#
scipy is the scientific-computing companion to numpy — a sibling project under the same NumFOCUS umbrella. It bundles production implementations of algorithms numpy intentionally does not ship: numerical optimization (scipy.optimize), statistics (scipy.stats), signal processing (scipy.signal), sparse matrices (scipy.sparse), interpolation, integration, FFTs, and special functions.
On PyPI scipy sits one rung below numpy in import-graph centrality — depended on by scikit-learn, statsmodels, scikit-image, networkx, and most domain-specific scientific stacks. Reach for scipy whenever numpy alone is not enough and you need a battle-tested algorithm rather than a hand-rolled one.
Install#
pip install scipy
Output: (none — exits 0 on success)
uv add scipy
Output: dependency resolved, lockfile updated; pulls numpy automatically
poetry add scipy
Output: installed into the project venv
pip install scipy --only-binary=:all:
Output: forces a wheel install — avoids accidentally compiling SciPy from source on niche platforms (~30 min build)
Versioning & Python support#
scipy follows the SPEC 0 support window (matched with numpy) — the latest three Python minor versions plus the most recent numpy versions. Releases are roughly twice a year; the public API is conservatively versioned with DeprecationWarning one minor before removal.
| SciPy line | Python support | Numpy requirement |
|---|---|---|
| 1.11.x | 3.9 – 3.12 | numpy >= 1.21.6 |
| 1.13.x | 3.9 – 3.12 | numpy >= 1.22.4, first line to support numpy 2.x |
| 1.14.x | 3.10 – 3.13 | numpy >= 1.23.5, supports numpy 2.x |
| 1.15.x | 3.10 – 3.13 | numpy >= 1.23.5, supports numpy 2.x |
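A scipy install can force numpy upward: if the pinned numpy is older than the minimum that scipy release declares (for example numpy==1.21 with a recent scipy), the resolver will upgrade numpy, fall back to an older scipy, or fail, depending on how strict the pin is.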
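To make the coupling concrete, the sketch below checks a scipy/numpy pair against two rows of the table above. The helper names `parse` and `numpy_ok` and the hard-coded dictionary are hypothetical, hand-copied for illustration rather than read from package metadata, so treat this as a sanity check and not a replacement for the resolver.

```python
# Hypothetical lookup: minimum numpy per scipy line, hand-copied from the
# 1.13.x and 1.14.x rows of the table (an assumption, not package metadata).
MIN_NUMPY = {
    (1, 13): (1, 22, 4),
    (1, 14): (1, 23, 5),
}


def parse(version):
    """Turn '1.14.1' or '2.0.0rc1' into a tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def numpy_ok(scipy_version, numpy_version):
    """True if numpy_version meets the minimum listed for that scipy line."""
    line = parse(scipy_version)[:2]
    minimum = MIN_NUMPY.get(line)
    if minimum is None:
        raise KeyError(f"no entry for scipy {line[0]}.{line[1]}.x")
    return parse(numpy_version) >= minimum


print(numpy_ok("1.14.1", "1.21.0"))  # False: numpy is below 1.23.5
print(numpy_ok("1.14.1", "2.0.0"))   # True
```

In a real project the lockfile and the resolver remain the source of truth; a check like this is mainly useful as an early, readable failure in CI.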
Package metadata#
- Maintainer: SciPy steering council under NumFOCUS sponsorship
- Project home: github.com/scipy/scipy
- Docs: docs.scipy.org
- License: BSD-3-Clause (core); a few bundled algorithms have GPL/LGPL upstream (see Gotchas)
- PyPI: pypi.org/project/scipy
- Governance: SciPy Enhancement Proposals (SPEPs); steering council
- First released: 2001 (Travis Oliphant, Pearu Peterson, Eric Jones)
- Downloads: > 100 M / month on PyPI
Optional dependencies & extras#
scipy ships no pip extras — submodules are imported as scipy.<subpackage> and are part of the same wheel. Companion packages typically installed alongside:
pip install numpy scipy matplotlib pandas scikit-learn jupyter
Output: installs the analytical / modelling stack
| Submodule | Purpose |
|---|---|
| scipy.optimize | minimisation, curve fitting, root finding |
| scipy.stats | distributions, hypothesis tests, descriptive stats |
| scipy.signal | filters, spectral analysis, spectrogram |
| scipy.sparse | sparse matrices and operations |
| scipy.spatial | KD-trees, distance, geometry |
| scipy.integrate | quad, ODE solvers (solve_ivp) |
| scipy.interpolate | splines, RBF, gridded interpolation |
| scipy.linalg | BLAS/LAPACK wrappers, decompositions, matrix functions |
| scipy.special | gamma, bessel, erf, … |
| scipy.fft | FFT (preferred over the legacy scipy.fftpack) |
| scipy.io | MATLAB .mat, WAV, NetCDF readers |
Install size is ~50 MB unpacked, significantly larger than numpy, because the wheel bundles BLAS/LAPACK together with a large number of compiled Fortran, C, and C++ extension modules.
Alternatives#
| Package | One-line trade-off |
|---|---|
| numpy | core arrays; scipy needed for algorithms numpy intentionally omits |
| statsmodels | richer regression / time-series stats than scipy.stats |
| scikit-learn | ML on top of scipy/numpy; partly overlapping (e.g. SVD) |
| pymc | Bayesian modelling; statsmodels/scipy for frequentist |
| jax.scipy | scipy-shaped API with autograd + GPU/TPU |
| cupyx.scipy | scipy mirror running on NVIDIA GPUs |
| sympy | symbolic math; scipy is numerical |
Common gotchas#
- Large install (~50 MB unpacked). Includes a bundled BLAS/LAPACK. CI containers and serverless deployments feel this; slim images should install the prebuilt `scipy` wheels rather than building with `--no-binary`.
- GPL/LGPL submodule corners. A handful of algorithms wrap GPL or LGPL upstream code (some optimisation routines and special functions historically). The scipy wheel itself stays BSD-3, but if you re-ship scipy plus your code, audit the license report. Most users are not affected; only those building closed-source redistributions need to check.
- numpy upgrade coupling. Upgrading scipy frequently upgrades numpy. Pin both in lockfiles.
- `scipy.fftpack` is legacy, superseded by `scipy.fft`. New code should not import the old one.
- `scipy.misc` is gone. Imageio / Pillow replaced the toy demo helpers years ago. Any tutorial that imports `scipy.misc.imread` is pre-2020.
- Default optimisation tolerances. `scipy.optimize.minimize` defaults are loose for ill-conditioned problems. Tighten them with `tol=` or with method-specific options, for example `options={"ftol": 1e-10, "gtol": 1e-10}` with `method="L-BFGS-B"`; option names differ between methods.
- `scipy.stats` distributions are slow to instantiate. `norm.pdf(x, loc=5, scale=2)` re-processes its arguments on every call, and building a frozen distribution is costlier still. Freeze the distribution once, outside the loop: `rv = norm(loc=5, scale=2)`, then `rv.pdf(x)` in hot loops.
- Build-from-source is painful. On platforms without a wheel (uncommon platforms, niche Python versions), scipy needs a Fortran compiler and a BLAS library, and can easily take 30 minutes to compile. Use `--only-binary=:all:` to fail fast instead.
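The frozen-distribution advice in the list above can be sketched as follows. This is a minimal illustration, assuming the same loc=5, scale=2 normal used in the text; the array `x` and the variable names are invented for the example.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(0.0, 10.0, 1_000)

# Freeze once, outside any loop: parameters are validated and stored here
rv = norm(loc=5.0, scale=2.0)

# Reuse the frozen object wherever the same distribution is evaluated
pdf_frozen = rv.pdf(x)

# Equivalent unfrozen call, which re-processes loc/scale on every invocation
pdf_direct = norm.pdf(x, loc=5.0, scale=2.0)

print(np.allclose(pdf_frozen, pdf_direct))  # True
```

The two calls return the same values; the difference only shows up as per-call overhead when the evaluation sits inside a tight loop.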
Real-world recipes#
scipy’s submodules cover so much ground that the recipes here are organised by submodule rather than by pipeline. Each shows the packaging-level context — what’s pulled in, what trade-offs you’re making — rather than re-teaching the API (sections/python/scipy covers the API).
Curve fitting with scipy.optimize.curve_fit:
```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, L, k, x0):
    """Three-parameter logistic curve."""
    return L / (1 + np.exp(-k * (x - x0)))


# Synthetic data: the true curve plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_true = logistic(x, 1.0, 1.5, 5.0)
y_obs = y_true + rng.normal(scale=0.05, size=x.size)

(L, k, x0), cov = curve_fit(logistic, x, y_obs, p0=[1.0, 1.0, 5.0])
print(f"L={L:.3f}, k={k:.3f}, x0={x0:.3f}")
```
Output: parameter estimates close to the true (1.0, 1.5, 5.0); cov is the parameter covariance for confidence intervals
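Since the covariance matrix is mentioned as the route to confidence intervals, here is one hedged way to use it. The sketch repeats the same synthetic fit and assumes a normal approximation (1.96 standard errors for roughly 95%); the names `popt`, `pcov`, and `perr` are conventional rather than taken from the recipe above.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))


rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_obs = logistic(x, 1.0, 1.5, 5.0) + rng.normal(scale=0.05, size=x.size)

popt, pcov = curve_fit(logistic, x, y_obs, p0=[1.0, 1.0, 5.0])

# One-sigma standard errors are the square roots of the covariance diagonal
perr = np.sqrt(np.diag(pcov))

# Approximate 95% half-widths under a normal approximation
for name, value, err in zip(("L", "k", "x0"), popt, perr):
    print(f"{name} = {value:.3f} +/- {1.96 * err:.3f}")
```

These intervals ignore parameter correlations (the off-diagonal entries of `pcov`), so they are a first look rather than a full uncertainty analysis.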
Sparse linear-system solve with scipy.sparse:
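```python
import numpy as np
from scipy.sparse import diags_array  # SciPy 1.11+
from scipy.sparse.linalg import spsolve

N = 10_000
# Build the system directly in sparse form (np.diag would allocate the
# full 800 MB dense matrix first): 4 on the main diagonal, 1 above it
A = diags_array([np.full(N, 4.0), np.ones(N - 1)], offsets=[0, 1], format="csr")
b = np.arange(N, dtype=float)
x = spsolve(A, b)
print(x[:5])
```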
Output: the first 5 values of the solution; the sparse solver handles the 10000x10000 system in milliseconds despite the dense equivalent being 800 MB
Distribution fitting with scipy.stats:
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Maximum-likelihood fit; for the normal this returns (loc, scale)
params = stats.norm.fit(samples)
print(f"mu={params[0]:.3f}, sigma={params[1]:.3f}")

# Goodness of fit. The p-value is optimistic because the parameters
# were estimated from the same sample that is being tested.
ks_stat, ks_p = stats.kstest(samples, "norm", args=params)
print(f"KS={ks_stat:.4f}, p={ks_p:.4f}")
```
Output: estimated parameters close to (5.0, 2.0) plus a KS test statistic and p-value for the null that the sample is normal
Signal processing — filter design:
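```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 1000.0                          # sampling rate in Hz
t = np.arange(int(fs * 2)) / fs      # 2 s sampled at exactly fs
sig = np.sin(2 * np.pi * 5 * t)      # 5 Hz tone
noise = np.random.default_rng(0).normal(scale=0.5, size=sig.size)
noisy = sig + noise

# 4th-order Butterworth low-pass at 10 Hz, as second-order sections
sos = butter(N=4, Wn=10, btype="low", fs=fs, output="sos")
clean = sosfiltfilt(sos, noisy)      # forward-backward pass: zero phase
print(clean[:5])
```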
Output: the first 5 samples of the low-pass-filtered signal; second-order-sections (sos) format is numerically more stable than b, a for higher-order filters
ODE solve with scipy.integrate.solve_ivp:
```python
import numpy as np
from scipy.integrate import solve_ivp


def lorenz(t, y, sigma=10.0, rho=28.0, beta=8 / 3):
    """Right-hand side of the Lorenz system."""
    x, y_, z = y
    return [sigma * (y_ - x), x * (rho - z) - y_, x * y_ - beta * z]


sol = solve_ivp(lorenz, t_span=(0, 5), y0=[1.0, 1.0, 1.0],
                dense_output=True, max_step=0.01)
print(sol.t.size, sol.y.shape)
```
Output: number of time steps taken and the (3, N) state trajectory; dense_output=True builds an interpolant for arbitrary-t evaluation
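To show what the dense-output interpolant is for, the sketch below evaluates `sol.sol` at times the solver did not necessarily step on. It swaps the Lorenz system for a simple decay equation (an assumption made so the result can be compared with the exact solution exp(-t)); the tolerances are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp


def decay(t, y):
    # dy/dt = -y, with exact solution y(t) = exp(-t) for y(0) = 1
    return -y


sol = solve_ivp(decay, t_span=(0.0, 5.0), y0=[1.0],
                dense_output=True, rtol=1e-8, atol=1e-10)

# sol.sol is a callable interpolant; it returns shape (n_states, n_times)
t_query = np.array([0.5, 1.0, 2.5])
y_query = sol.sol(t_query)[0]

max_err = np.abs(y_query - np.exp(-t_query)).max()
print(f"max interpolation error: {max_err:.2e}")
```

Without `dense_output=True`, `sol.sol` is `None` and only the solver's own time points in `sol.t` are available (or those requested through `t_eval`).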
Performance tuning#
scipy performance breaks down into two layers: the BLAS/LAPACK library the build links against (linear algebra, eigensolvers) and scipy's own algorithms (optimisation, integration, statistics, FFT). Tuning differs by submodule.
```python
import numpy as np
import scipy

# Both calls print directly and return None, so do not wrap them in print()
np.show_config()
scipy.show_config()
```
Output: the BLAS/LAPACK each build was linked against (OpenBLAS on most wheels, Accelerate on some macOS wheels), which confirms whether you have a fast or vendor-default backend
Tuning levers by submodule:
| Submodule | Tuning lever | When it matters |
|---|---|---|
| scipy.linalg | BLAS thread cap (OPENBLAS_NUM_THREADS) | inside parallel CV / multiprocessing |
| scipy.optimize | tol=, options={"maxiter": ...} | ill-conditioned problems |
| scipy.optimize | Provide analytical jac=, hess= | high-dim minimisation |
| scipy.fft | workers=-1 keyword | large FFTs on multi-core |
| scipy.sparse | Pick CSR for row ops, CSC for column ops | avoid format conversion churn |
| scipy.stats | Freeze distributions: rv = norm(5, 2); rv.pdf(x) | hot loops over a fixed distribution |
| scipy.integrate.solve_ivp | method="LSODA" (automatic stiffness switching) | stiff ODEs |
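As one illustration of the `jac=` lever in the table, the sketch below minimises the Rosenbrock function with and without an analytical gradient and compares the number of function evaluations. It uses scipy's built-in `rosen` and `rosen_der` test functions; the starting point and the choice of BFGS are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])

# Without jac=, BFGS estimates the gradient by finite differences,
# which costs extra function evaluations per iteration
res_numeric = minimize(rosen, x0, method="BFGS")

# With an analytical gradient, far fewer function calls are needed
res_analytic = minimize(rosen, x0, method="BFGS", jac=rosen_der)

print("function evaluations (numeric jac): ", res_numeric.nfev)
print("function evaluations (analytic jac):", res_analytic.nfev)
```

The gap grows with dimension, since a forward-difference gradient needs roughly one extra function call per parameter.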
Sparse matrix format gotcha:
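```python
from scipy.sparse import csr_array

# CSR stores rows contiguously: row slicing is cheap, column slicing is not
A = csr_array([[1, 0, 0], [0, 0, 2], [0, 3, 0]])
print(A[0])     # row slice: fast
print(A[:, 0])  # column slice: works, but has to scan every row
```
Output: the first row, then the first column. Both work on CSR and no warning is raised, but column slicing is the slow path. SparseEfficiencyWarning appears when an operation changes the sparsity structure of a CSR/CSC array (for example assigning a new nonzero). Switch to csc_array when column access dominates.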
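When column access dominates, a one-time explicit conversion is usually the simplest fix. The sketch below is a minimal illustration on the same 3x3 array; the variable names are illustrative.

```python
from scipy.sparse import csr_array

A = csr_array([[1, 0, 0], [0, 0, 2], [0, 3, 0]])

# Convert once, outside any loop, then reuse the CSC copy for column access
A_csc = A.tocsc()
print(A_csc.format)  # csc

# A list index keeps the result two-dimensional: an (n_rows, 1) column
col1 = A_csc[:, [1]].toarray().ravel()
print(col1)  # [0 0 3]
```

The conversion itself costs time and memory proportional to the number of stored entries, so it pays off only when it is amortised over many column operations.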
Memory & dataset-size scaling#
scipy’s algorithms are generally in-RAM. The main scaling stories are sparse representations (massive memory reduction for matrices with mostly zeros) and chunked spectral / signal processing.
```python
import numpy as np
from scipy.sparse import csr_array

N = 1_000_000
data = np.ones(3_000_000)
row = np.random.default_rng(0).integers(0, N, 3_000_000)
col = np.random.default_rng(1).integers(0, N, 3_000_000)
A = csr_array((data, (row, col)), shape=(N, N))
print(f"sparse: {A.data.nbytes / 1e6:.1f} MB, dense would be: {N * N * 8 / 1e9:.0f} GB")
```
Output: the sparse matrix’s data footprint vs the (8 TB) dense equivalent — sparse storage is the only way this fits in any RAM at all
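The figure printed above counts only the stored values. A CSR array also keeps column indices and row pointers, so a fuller estimate sums all three buffers. The sketch below does this on a smaller array; the sizes are arbitrary choices for illustration.

```python
import numpy as np
from scipy.sparse import csr_array

rng = np.random.default_rng(0)
n, nnz = 10_000, 50_000
row = rng.integers(0, n, nnz)
col = rng.integers(0, n, nnz)
A = csr_array((np.ones(nnz), (row, col)), shape=(n, n))

# CSR keeps three arrays: values, column indices, and row pointers
sparse_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
dense_bytes = n * n * A.dtype.itemsize

print(f"sparse total: {sparse_bytes / 1e6:.2f} MB")
print(f"dense:        {dense_bytes / 1e6:.0f} MB")
```

The index arrays typically add roughly half as much again on top of the values when 32-bit indices suffice, which still leaves the sparse form orders of magnitude smaller at this density.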
For genuinely huge problems:
- Iterative linear solvers (`scipy.sparse.linalg.cg`, `gmres`, `bicgstab`) for systems too big for direct factorisation.
- Partial eigensolvers: `scipy.sparse.linalg.eigsh` computes a few eigenpairs iteratively, for cases where a full dense eigendecomposition would not fit. The sparse matrix (or a `LinearOperator`) still has to be available in memory.
- `scipy.signal` chunked filtering: process audio / time series in overlapping blocks rather than loading the full waveform.
- PyAMG, petsc4py: external libraries for very large sparse problems; pip-installable alongside scipy.
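As a minimal sketch of the iterative-solver route, the example below solves a symmetric positive-definite tridiagonal system with conjugate gradients. The matrix is an assumption chosen to be well conditioned so that default settings converge quickly; real problems often need a preconditioner.

```python
import numpy as np
from scipy.sparse import csr_array, diags
from scipy.sparse.linalg import cg

n = 100_000
off = -np.ones(n - 1)
# Symmetric, strictly diagonally dominant tridiagonal matrix,
# hence positive definite, which is what cg requires
A = csr_array(diags([off, np.full(n, 4.0), off], offsets=[-1, 0, 1]))
b = np.ones(n)

# info == 0 means the default relative tolerance was reached
x, info = cg(A, b)

rel_residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(info, f"{rel_residual:.1e}")
```

Only matrix-vector products are needed, so memory stays proportional to the number of nonzeros, unlike a direct factorisation, which can fill in.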
scipy does not have a streaming or out-of-core model of its own; the path past one node is usually a domain-specific library (PETSc, Trilinos) or hand-rolled chunking.
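One hand-rolled chunking pattern for filtering is sketched below. Note that it swaps overlapping blocks for a different technique: a causal `sosfilt` pass that carries the filter state `zi` from one block to the next, which reproduces the whole-array result exactly. The block size and the random test signal are arbitrary assumptions; zero-phase `sosfiltfilt` cannot be streamed this way because it needs the full signal for its backward pass.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 1000.0
rng = np.random.default_rng(0)
signal = rng.normal(size=10_000)
sos = butter(N=4, Wn=10, btype="low", fs=fs, output="sos")

# Filter state carried between blocks: two values per second-order section
zi = np.zeros((sos.shape[0], 2))
blocks = []
for start in range(0, signal.size, 1024):
    block = signal[start:start + 1024]
    filtered, zi = sosfilt(sos, block, zi=zi)
    blocks.append(filtered)
chunked = np.concatenate(blocks)

# Same result as filtering the whole array in one call
full = sosfilt(sos, signal)
print(np.allclose(chunked, full))  # True
```

In a real pipeline each block would be read from disk or a stream instead of sliced from an in-memory array.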
Version migration guide#
scipy ships roughly twice a year. The recent breaks worth knowing about:
1.10 → 1.11:
- Dropped Python 3.8 support.
- `scipy.misc` (already empty) finally removed.
- Several `scipy.stats` API tightenings (keyword-only arguments).

1.11 → 1.13:
- First release line that supports NumPy 2.x.
- `scipy.fftpack` remains legacy; new code should always use `scipy.fft`.
- `scipy.sparse` offers both a matrix-style (`csr_matrix`) and an array-style (`csr_array`) API. New code should use `csr_array`: it follows NumPy's `ndarray` semantics rather than the legacy MATLAB-style matrix semantics.

1.13 → 1.14:
- Dropped Python 3.9 support; minimum numpy raised to 1.23.5 (numpy 1.x and 2.x are both still accepted).
- Some `scipy.stats` distribution methods became keyword-only (`stats.norm.fit(data, loc=...)`).

1.14 → 1.15:
- Several optimisation method names tightened.
- `scipy.linalg.solve(..., assume_a="pos")` for the symmetric-positive-definite path (faster than the general solver).
```python
from scipy.sparse import csr_matrix, csr_array

# Legacy matrix-style (avoid in new code)
m = csr_matrix([[1, 0], [0, 2]])
print(type(m), m * m)         # legacy: `*` is matrix multiplication

# Modern array-style
a = csr_array([[1, 0], [0, 2]])
print(type(a), a @ a, a * a)  # `@` is matmul; `*` is element-wise
```
Output: the matrix prints with csr_matrix-style repr; the array prints with csr_array-style and supports the same operator semantics as numpy
Pin scipy in production. Pre-1.0 scipy versions are gone from most index mirrors, and within the post-1.0 line, pickles that embed scipy objects (saved sklearn models, for instance) are not guaranteed to load after a minor-version change.
Interop with adjacent ecosystems#
scipy lives upstream of most of the scientific stack. The main interop concerns are sparse-matrix exchange and the Array API.
| Library | scipy → other | other → scipy | Zero-copy? |
|---|---|---|---|
| numpy | n/a — scipy IS numpy underneath | n/a | Yes (same buffers) |
| scikit-learn | sklearn accepts scipy.sparse directly | sklearn returns numpy arrays | Yes |
| pandas | pd.DataFrame.sparse.from_spmatrix(m) | df.sparse.to_coo() | Partial |
| networkx | nx.from_scipy_sparse_array(A) | nx.to_scipy_sparse_array(G) | Copy |
| pytorch | torch.from_numpy(A.toarray()) | dense round-trip | Copy |
| jax | jnp.asarray(A.toarray()) | dense | Copy |
| matlab (.mat) | scipy.io.savemat / loadmat | n/a | Copy |
```python
import numpy as np
from scipy.sparse import csr_array
from sklearn.linear_model import LogisticRegression

# 1000 documents x 50000 vocab: 400 MB as dense float64, tiny as sparse
rng = np.random.default_rng(0)
nnz = 10_000
rows = rng.integers(1000, size=nnz)
cols = rng.integers(50_000, size=nnz)
X = csr_array((rng.normal(size=nnz), (rows, cols)), shape=(1000, 50_000))
y = rng.integers(0, 2, size=1000)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.coef_.shape)
```
Output: shape (1, 50000) — sklearn consumed the sparse matrix without densifying it, which is the difference between fits-in-RAM and 400 MB of zeros
Troubleshooting common errors#
The errors below cover the recurring frictions; most are about scipy assumptions (BLAS, sparse format) rather than the API itself.
- `ImportError: cannot import name 'imread' from 'scipy.misc'`: `scipy.misc.imread` was removed; use `imageio.v3.imread` instead.
- `OptimizeWarning: Covariance of the parameters could not be estimated`: your fit is degenerate or your initial `p0=` was too far off. Try better initial guesses or a more robust method (`method="trf"`).
- `SparseEfficiencyWarning`: an operation is expensive for the current format, such as inserting new nonzeros into a CSR/CSC array or passing `spsolve` a matrix that is not CSC/CSR. Convert once explicitly (`A.tocsc()`) or stick to one format throughout.
- `LinAlgError: SVD did not converge`: ill-conditioned matrix, or NaN/inf values in the input. Inspect with `np.linalg.cond(A)`; add regularisation (Tikhonov).
- `scipy.stats` distribution slow: you are creating the frozen object inside a hot loop. Create `rv = norm(loc=5, scale=2)` once, then call `rv.pdf(x)`.
- `solve_ivp` returns `success=False`: adaptive step failed. Try `method="LSODA"` (stiff/non-stiff auto-switching) or relax `rtol` / `atol`.
- `scipy.signal.lfilter` numerically unstable on high-order filters: switch to `sosfiltfilt` with second-order sections.
- scipy build-from-source begins: wheel missing. Use `pip install scipy --only-binary=:all:` to fail fast, then investigate why no wheel matched your platform.
- Mixing numpy 1.x and scipy 1.14+: scipy 1.14 still accepts numpy 1.x from 1.23.5 upward alongside 2.x, but rejects older numpy releases. Pin both.
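The conditioning check and Tikhonov fix mentioned in the list above might look like the sketch below. It uses the Hilbert matrix from `scipy.linalg.hilbert` as a stand-in for an ill-conditioned problem, and the regularisation strength `lam` is an arbitrary illustrative value, not a recommendation.

```python
import numpy as np
from scipy.linalg import hilbert, solve

A = hilbert(8)        # classic ill-conditioned test matrix
b = A @ np.ones(8)
cond_A = np.linalg.cond(A)
print(f"cond(A) = {cond_A:.2e}")

# Tikhonov (ridge) regularisation: solve (A^T A + lam * I) x = A^T b
lam = 1e-6            # illustrative value; tune for your problem
lhs = A.T @ A + lam * np.eye(A.shape[1])
cond_lhs = np.linalg.cond(lhs)

# The regularised system is symmetric positive definite
x_reg = solve(lhs, A.T @ b, assume_a="pos")
print(f"cond(regularised) = {cond_lhs:.2e}")
print(x_reg)
```

Regularisation trades a small bias in the solution for a much better-conditioned system; how large `lam` should be depends on the noise level in the data.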
When NOT to use this#
scipy is the right answer most of the time it’s reached for; the cases below are where a specialised library wins.
- Pure statistics with R-style formulas: statsmodels, with `smf.ols("y ~ x1 + x2", data=df)` plus richer regression diagnostics.
- Bayesian modelling: pymc, NumPyro. scipy.stats is frequentist.
- GPU computation: CuPy + cupyx.scipy, or JAX (`jax.scipy`).
- Symbolic math: SymPy.
- Production ML: scikit-learn directly; scipy is the foundation, sklearn is the layered API.
- Distributed compute: scipy is single-process. Use dask-glm, dask-ml, or ray.
See also#
- sections/python/scipy — full API tutorial (stats, optimize, signal, sparse)
- sections/python/numpy — the array foundation scipy is built on
- sections/python/scikit-learn — ML on top of scipy
- sections/packages-pip/pip-numpy — sibling foundation
- sections/packages-pip/pip-scikit-learn — downstream consumer