Regular Expressions

JavaScript has built-in RegExp support (ES2018+ with named groups, lookbehind, dotAll). Covers literal syntax, flags, character classes, methods, named captures, and common patterns.

Regular Expressions#

What it is#

JavaScript has built-in regular expression support with a Perl-derived syntax similar to PCRE. ES2018 added named capture groups, lookbehind assertions, Unicode property escapes, and the s (dotAll) flag. ES2022 added the d (hasIndices) flag. ES2024 added the v (unicodeSets) flag. RegExp literals are compiled when the script is parsed; new RegExp() compiles its pattern at runtime, which is useful for dynamic patterns.

Literal syntax vs RegExp constructor#

// Literal — compiled at parse time; use for static patterns
const re = /hello/i;

// Constructor — evaluated at runtime; use for dynamic patterns
const word = "hello";
const re2 = new RegExp(word, "i");

// Escape special characters when building from user input
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const userInput = "file.txt";
const safe = new RegExp(escapeRegExp(userInput), "g");

Flags#

| Flag | Name | Effect |
| --- | --- | --- |
| g | global | Find all matches (not just the first); advances lastIndex |
| i | ignoreCase | Case-insensitive matching |
| m | multiline | ^ and $ match start/end of each line, not the whole string |
| s | dotAll | . matches newline characters too (ES2018) |
| u | unicode | Full Unicode mode; enables \u{…} escapes and \p{…} properties |
| v | unicodeSets | Superset of u; enables set operations [A--B], [A&&B] (ES2024) |
| d | hasIndices | Adds .indices array to match results with start/end positions (ES2022) |
| y | sticky | Match only at lastIndex position; does not advance past non-matches |

const str = "Hello\nWorld";
/^world/i.test(str);   // false — ^ matches start of string only
/^world/im.test(str);  // true  — m flag makes ^ match start of line
/hello.world/is.test(str); // true: s lets . match \n (i is needed because the text is capitalized)

Character classes and syntax#

Character classes define sets of characters that a position may match. Square brackets [...] match any one character in the set; shorthand escapes like \d, \w, and \s expand to common sets; anchors and quantifiers control position and repetition.

// Character classes
/[aeiou]/     // any vowel
/[^aeiou]/    // any non-vowel
/[a-z]/       // a through z
/[a-zA-Z0-9]/ // alphanumeric

// Shorthand classes
/\d/   // digit: [0-9]
/\D/   // non-digit
/\w/   // word char: [a-zA-Z0-9_]
/\W/   // non-word char
/\s/   // whitespace (space, tab, newline, etc.)
/\S/   // non-whitespace
/./    // any char except newline (unless s flag)

// Anchors
/^start/   // start of string (or line with m flag)
/end$/     // end of string (or line with m flag)
/\bword\b/ // word boundary
/\Bword\B/ // non-word boundary

// Quantifiers
/a*/    // 0 or more
/a+/    // 1 or more
/a?/    // 0 or 1
/a{3}/  // exactly 3
/a{2,5}/  // 2 to 5
/a{2,}/   // 2 or more

// Quantifier greediness
/a+/    // greedy: matches as many as possible
/a+?/   // lazy: matches as few as possible
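
To make the difference concrete, here is a small sketch on a made-up HTML-like string (the sample text is an assumption for illustration). The greedy form runs to the last > in the input, while the lazy form stops at the first one.

```javascript
// Hypothetical sample input, not taken from a real page
const sample = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ takes as much as it can, then backs off to the LAST ">"
const greedyMatch = sample.match(/<.+>/)[0];

// Lazy: .+? takes as little as it can, stopping at the FIRST ">"
const lazyMatch = sample.match(/<.+?>/)[0];

console.log(greedyMatch); // <b>bold</b> and <i>italic</i>
console.log(lazyMatch);   // <b>
```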

Groups#

Parentheses group sub-expressions and, by default, capture the matched text as a numbered group. Use (?:...) to group without capturing (cheaper, no slot allocated); use (?<name>...) for named captures accessible via match.groups.

// Capturing group — captured in match result
/(foo)(bar)/.exec("foobar");
// ['foobar', 'foo', 'bar', index: 0, ...]

// Non-capturing group — groups without capture
/(?:foo)(bar)/.exec("foobar");
// ['foobar', 'bar', index: 0, ...]

// Named capturing group (ES2018)
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/.exec("2026-04-26");
// match.groups = { year: '2026', month: '04', day: '26' }

// Backreferences to captured group
/(['"]).*?\1/.test('"quoted"');  // true — \1 refers to group 1
/(?<q>['"]).*?\k<q>/.test('"quoted"'); // named backreference
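
A common use of backreferences is spotting accidentally doubled words. The sketch below is one possible approach; the draft sentence is invented for illustration.

```javascript
// \b(\w+)\s+\1\b : a word, some whitespace, then the same word again
const doubledWord = /\b(\w+)\s+\1\b/gi;

// Hypothetical input text
const draft = "This is is a test test of the the parser";

// List every doubled pair, then collapse each pair to a single word
const found = [...draft.matchAll(doubledWord)].map((m) => m[0]);
const fixed = draft.replace(doubledWord, "$1");

console.log(found); // [ 'is is', 'test test', 'the the' ]
console.log(fixed); // This is a test of the parser
```

The leading \b matters here: without it, the "is" inside "This" could pair up with the following "is".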

Lookahead and lookbehind#

Lookarounds are zero-width assertions that match a position based on what precedes or follows it, without consuming characters. Lookaheads ((?=...) / (?!...)) have been available since ES3; lookbehinds ((?<=...) / (?<!...)) require ES2018 or later.

// Positive lookahead — match X only if followed by Y
/\d+(?= dollars)/.exec("100 dollars"); // ['100']

// Negative lookahead — match X only if NOT followed by Y
/\d+(?! dollars)/.exec("100 euros");  // ['100']

// Positive lookbehind (ES2018) — match X only if preceded by Y
/(?<=\$)\d+/.exec("$42");   // ['42']

// Negative lookbehind (ES2018) — match X only if NOT preceded by Y
/(?<!\$)\d+/.exec("42 USD"); // ['42']
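
Because lookarounds match positions rather than characters, they work well for inserting text between characters. The following sketch assumes a hypothetical helper named groupThousands and made-up sample strings.

```javascript
// Insert a comma at every position that is followed by one or more complete
// groups of three digits running to the end of the number.
// \B prevents a comma at the very start of the digit run.
function groupThousands(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

console.log(groupThousands("1234567")); // 1,234,567
console.log(groupThousands("999"));     // 999

// Lookbehind keeps the currency sign out of the match itself
const dollarAmounts = "Items: $42, €17, $8".match(/(?<=\$)\d+/g);
console.log(dollarAmounts); // [ '42', '8' ]
```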

String methods with RegExp#

Strings expose match, matchAll, replace, replaceAll, search, and split, all of which accept a RegExp. match / matchAll extract matches; replace / replaceAll substitute them; search returns the match index; split uses the pattern as a delimiter.

const str = "The quick brown fox";

// test() — boolean check
/quick/.test(str);   // true

// match() — returns first match (no g flag) or all matches (g flag)
str.match(/\w+/);    // ['The', index: 0, input: '...', groups: undefined]
str.match(/\w+/g);   // ['The', 'quick', 'brown', 'fox']

// matchAll() — returns iterator of all matches WITH groups (requires g flag)
const re = /(?<word>\w+)/g;
for (const match of str.matchAll(re)) {
  console.log(match.groups.word, "at", match.index);
}

// search() — returns index of first match, or -1
str.search(/fox/);   // 16

// replace() — replace first match (no g) or all (g flag)
str.replace(/\b\w/, (c) => c.toUpperCase()); // "The quick brown fox" (first letter was already uppercase)
"foo foo foo".replace(/foo/, "bar");    // "bar foo foo"
"foo foo foo".replace(/foo/g, "bar");   // "bar bar bar"

// replaceAll() — always replaces all (string or regex with g flag)
"foo foo foo".replaceAll("foo", "bar"); // "bar bar bar"

// split() — split on pattern
"one1two2three".split(/\d/); // ['one', 'two', 'three']

RegExp.prototype methods#

test(str) returns a boolean; exec(str) returns the next match object (including capture groups) and advances lastIndex when the g or y flag is set. Prefer str.matchAll(re) over manual exec loops for cleaner iteration.

const re = /(\d+)/g;
const str = "foo123bar456";

// exec() — returns next match each call (stateful with g flag)
let match;
while ((match = re.exec(str)) !== null) {
  console.log(`Found ${match[1]} at index ${match.index}`);
}

Output:

Found 123 at index 3
Found 456 at index 9

// test(): returns a boolean
/^\d+$/.test("12345");  // true
/^\d+$/.test("123ab");  // false

[!WARNING] Using .exec() or .test() with a regex that has the g or y flag advances lastIndex. If you reuse the same regex object, always reset re.lastIndex = 0 between operations, or use str.match(re) instead.

Named capture groups#

Named groups ((?<name>...)) make complex patterns self-documenting and let you access captures by name via match.groups instead of by index. They also work in replace() substitution strings as $<name>.

const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-04-26".match(dateRe);

const { year, month, day } = match.groups;
console.log(year, month, day);  // 2026 04 26

// Named groups in replace()
"2026-04-26".replace(dateRe, "$<day>/$<month>/$<year>");
// "26/04/2026"

Output:

2026 04 26
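
match() returns null when nothing matches, so destructuring match.groups directly throws on bad input. A minimal sketch of a null-safe wrapper, assuming a hypothetical helper named parseIsoDate that also converts the captured strings to numbers:

```javascript
const isoDateRe = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Hypothetical helper: returns numeric fields, or null when the text does not match
function parseIsoDate(text) {
  const groups = text.match(isoDateRe)?.groups;
  if (!groups) return null;
  return {
    year: Number(groups.year),
    month: Number(groups.month),
    day: Number(groups.day),
  };
}

console.log(parseIsoDate("2026-04-26")); // { year: 2026, month: 4, day: 26 }
console.log(parseIsoDate("26/04/2026")); // null
```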

.replace() with a function callback#

When you pass a function as the second argument to .replace(), it is called for each match and its return value becomes the replacement string. The callback receives the full match, any capture group strings, the match offset, the original string, and (when the pattern has named groups) a final groups object.

// Callback receives: (fullMatch, ...captureGroups, offset, originalStr[, groups])
"hello world".replace(/(\w+)/g, (match) => match.toUpperCase());
// "HELLO WORLD"

// Convert kebab-case to camelCase
"my-variable-name".replace(/-([a-z])/g, (_, char) => char.toUpperCase());
// "myVariableName"

// Pad all numbers to 3 digits
"item1 costs 5 dollars and item12 costs 99 dollars"
  .replace(/\d+/g, (n) => n.padStart(3, "0"));
// "item001 costs 005 dollars and item012 costs 099 dollars"
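
When the pattern has named groups, the callback's last argument is the groups object. The sketch below uses that for simple placeholder filling; fillTemplate and the sample values are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: replaces {key} placeholders with entries from `values`
// and leaves unknown placeholders untouched.
function fillTemplate(template, values) {
  return template.replace(/\{(?<key>\w+)\}/g, (...args) => {
    const fullMatch = args[0];
    const { key } = args.at(-1); // last argument: the named groups object
    return Object.hasOwn(values, key) ? String(values[key]) : fullMatch;
  });
}

const greeting = fillTemplate("Hi {name}, you have {count} new {things}", {
  name: "Ada",
  count: 3,
});
console.log(greeting); // Hi Ada, you have 3 new {things}
```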

Unicode support#

The u flag enables full Unicode mode: characters outside the Basic Multilingual Plane (most emoji, supplementary scripts) are matched as single code points instead of two surrogate halves, \u{HHHH} extended escapes work, and \p{Property} Unicode property escapes become available for matching categories like letters, digits, or scripts.
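
As a small sketch of property escapes on running text (the sample sentence is made up), the pattern below collects words in any script. It combines \p{L} with \p{M} so that letters followed by combining marks stay in one piece. ASCII-only \w+ would cut naïve and café at the accented letters and skip the Greek and Japanese words entirely.

```javascript
// Hypothetical sample sentence mixing Latin, Greek, and Japanese
const sentence = "naïve café in Ελλάδα, 東京";

// Letters plus combining marks, in Unicode mode
const words = sentence.match(/[\p{L}\p{M}]+/gu);

console.log(words.length); // 5
console.log(words);
```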

// \u{…} extended Unicode escapes — requires u or v flag
/\u{1F600}/u.test("😀");  // true
/\u{1F600}/.test("😀");   // false (u flag required)

// \p{…} Unicode property escapes — requires u or v flag
/\p{Letter}/u.test("é");        // true
/\p{Decimal_Number}/u.test("٣"); // true (Arabic digit)
/\p{Script=Greek}/u.test("α");  // true

// Without u flag, . does not match surrogate pairs correctly
"😀".match(/./);   // matches only half the emoji (lone surrogate)
"😀".match(/./u);  // matches the full emoji

d flag — match indices (ES2022)#

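The d flag adds an indices array to each match result, holding [start, end] offsets for the full match and for every capture group.

// The d flag goes on the regex itself; exec() takes only the input string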
const re = /(?<name>\w+)/d;
const m = re.exec("hello world");
console.log(m.indices[0]);        // [0, 5] — start/end of full match
console.log(m.indices.groups.name); // [0, 5] — start/end of named group
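
One practical use of the offsets is pointing at a match in diagnostic output. A minimal sketch, assuming a hypothetical helper named markMatch and a regex that already carries the d flag:

```javascript
// Hypothetical helper: returns the text plus a caret line under the first match,
// or null when there is no match (or the regex lacks the d flag).
function markMatch(text, re) {
  const m = re.exec(text);
  if (!m || !m.indices) return null;
  const [start, end] = m.indices[0];
  return text + "\n" + " ".repeat(start) + "^".repeat(end - start);
}

const marked = markMatch("hello world", /world/d);
console.log(marked);
// hello world
//       ^^^^^
```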

Common pattern templates#

// Email (simplified — not RFC-complete)
const email = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
email.test("user@example.com"); // true

// URL (http/https)
const url = /^https?:\/\/[^\s/$.?#].[^\s]*$/i;
url.test("https://example.com/path?q=1"); // true

// IPv4 address (loose: also accepts out-of-range octets such as 999)
const ipv4 = /^(\d{1,3}\.){3}\d{1,3}$/;
ipv4.test("192.168.1.255"); // true

// Date YYYY-MM-DD
const isoDate = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
isoDate.test("2026-04-26"); // true

// UUID v4
const uuid = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
uuid.test("550e8400-e29b-41d4-a716-446655440000"); // true

// URL slug (lowercase, hyphens, alphanumeric)
const slug = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
slug.test("my-article-title"); // true
slug.test("My Article Title"); // false

// Hex color (#rgb or #rrggbb)
const hexColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
hexColor.test("#1a2b3c"); // true
hexColor.test("#abc");    // true

Sticky flag (y)#

With the y flag, the regex must match at exactly lastIndex — it does not scan forward. Each successful match advances lastIndex to the end of that match, making it efficient for sequential tokeniser/lexer implementations that process a string left-to-right.

const re = /\d+/y;
re.lastIndex = 4;
re.exec("abc 123 456"); // ['123'] — matched at exactly position 4
re.lastIndex;           // 7 — advanced past match
re.exec("abc 123 456"); // null: index 7 is a space and y never scans ahead; lastIndex resets to 0

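The y flag is useful for tokenizer/lexer implementations where you process a string sequentially and need to match at a specific position. Because a failed sticky match returns null and resets lastIndex to 0, such code normally sets lastIndex itself before every attempt.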
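
A side-by-side sketch of g and y on the same made-up input shows the difference: g scans forward from lastIndex, while y only tries the exact position.

```javascript
const input = "abc 123";
const scanning = /\d+/g; // g: searches forward from lastIndex
const anchored = /\d+/y; // y: must match exactly at lastIndex

const viaGlobal = scanning.exec(input); // skips "abc " and finds the digits
const viaSticky = anchored.exec(input); // index 0 is "a", so no match

anchored.lastIndex = 4;                 // point the cursor at the first digit
const viaStickyAt4 = anchored.exec(input);

console.log(viaGlobal[0], viaGlobal.index);        // 123 4
console.log(viaSticky);                            // null
console.log(viaStickyAt4[0], anchored.lastIndex);  // 123 7
```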

v flag — unicodeSets (ES2024)#

The v flag is a superset of u. It enables set notation inside character classes — intersection (&&), subtraction (--), and string-literal alternatives (\q{abc}). It also disallows several quirks that u permitted, making patterns stricter but more predictable.

// Set difference — letters minus ASCII letters (i.e. non-ASCII letters)
/[\p{Letter}--[a-zA-Z]]/v.test("é");     // true
/[\p{Letter}--[a-zA-Z]]/v.test("a");     // false

// Set intersection — uppercase letters that are also ASCII
/[\p{Uppercase}&&[a-zA-Z]]/v.test("A");  // true
/[\p{Uppercase}&&[a-zA-Z]]/v.test("Α");  // false (Greek Alpha is uppercase but not ASCII)

// Strings inside a character class — match either of two multi-codepoint sequences
/^[\q{👨‍👩‍👧|🚀}]$/v.test("🚀");           // true

// A negated class must not contain multi-character strings
// /[^\q{foo|bar}]/v                     // SyntaxError
/[^\q{a|b}]/v.test("c");                 // true: single-character alternatives may be negated

The v flag is the right default for new code — it unlocks set operations and tightens the grammar without losing anything u could do. It is mutually exclusive with u; specifying both throws SyntaxError.

RegExp lastIndex and statefulness#

A RegExp with g or y is stateful: it carries a lastIndex cursor that advances on every successful match. Reusing the same object across operations without resetting lastIndex causes silent skipped matches and is one of the most common regex bugs.

const re = /foo/g;
re.test("foo foo");   // true, lastIndex now 3
re.test("foo foo");   // true, lastIndex now 7
re.test("foo foo");   // false — lastIndex (7) is past the end
re.test("foo foo");   // true again: the failed call reset lastIndex to 0
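
The same cursor effect is easy to hit when a shared g regex is used as a predicate. In this sketch (the labels are invented), every second item is dropped because each successful test() leaves lastIndex past the end of the next short string.

```javascript
const labels = ["a1", "b2", "c3"];

// BUGGY: one shared g regex keeps its lastIndex between calls
const hasDigitGlobal = /\d/g;
const buggy = labels.filter((s) => hasDigitGlobal.test(s));

// FIXED: without g (or y) the regex carries no cursor
const hasDigit = /\d/;
const correct = labels.filter((s) => hasDigit.test(s));

console.log(buggy);   // [ 'a1', 'c3' ]
console.log(correct); // [ 'a1', 'b2', 'c3' ]
```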

Cures:

  • Use a fresh literal at the call site: /foo/g.test(...) allocates a new object each time.
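  • Or use a string method: with a g regex, str.match(re) and str.replace(re, ...) reset lastIndex to 0 before matching. str.matchAll(re) works on an internal clone and never modifies re, although the clone starts from re's current lastIndex.
  • For sticky matching, explicitly assign re.lastIndex = 0 between unrelated operations.

// SAFE: with a g regex, match() and replace() ignore any leftover lastIndex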
const re = /\d+/g;
"a1 b2 c3".match(re);  // ['1', '2', '3'] — no matter what re.lastIndex was before

String.prototype.matchAll vs RegExp.prototype.exec#

matchAll(regex) returns an iterator of full Match objects (the same shape exec returns). It is the modern, stateless replacement for the historical while ((m = re.exec(str))) loop.

const text = "Alice: 30, Bob: 25, Carol: 35";
const re = /(?<name>\w+): (?<age>\d+)/g;

// Modern: matchAll iterates over an internal clone, so re.lastIndex is never modified
for (const m of text.matchAll(re)) {
  console.log(`${m.groups.name} is ${m.groups.age}`);
}

Output:

Alice is 30
Bob is 25
Carol is 35

[!WARNING] matchAll requires the g flag. Passing a non-global regex throws a TypeError (the exact message varies by engine).

matchAll also exposes .indices when the regex has the d flag — start/end offsets for the full match and every named group.

const re = /(?<name>\w+):/gd;
for (const m of "alice: 30, bob: 25".matchAll(re)) {
  console.log(m.indices.groups.name);   // [start, end] tuple
}

Output:

[ 0, 5 ]
[ 11, 14 ]
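
Since matchAll returns an iterator, it combines naturally with Array.from and a mapping function. A short sketch that turns matches into plain records (the roster text and field names are illustrative assumptions):

```javascript
const roster = "Alice: 30, Bob: 25, Carol: 35";
const entryRe = /(?<name>\w+): (?<age>\d+)/g;

// Array.from accepts any iterable plus a mapper, so no manual loop is needed
const people = Array.from(roster.matchAll(entryRe), (m) => ({
  name: m.groups.name,
  age: Number(m.groups.age), // captures are always strings; convert explicitly
}));

const oldest = people.reduce((a, b) => (b.age > a.age ? b : a));

console.log(people.length); // 3
console.log(oldest.name);   // Carol
```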

.replace() with named-group substitution#

When the replacement is a string, you can reference named groups with $<name> and numeric groups with $1 through $9. $& is the full match; $` and $' are the preceding and following text.

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;

// Reorder named groups
"2026-05-25".replace(re, "$<day>/$<month>/$<year>");
// "25/05/2026"

// Substitution metasequences
"hello world".replace(/o/g, "[$&]");
// "hell[o] w[o]rld"

"abc-def".replace(/-/, "<$`|$'>");
// "abc<abc|def>def"  ($` = preceding, $' = following)
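
These metasequences also fire when the replacement string comes from somewhere else, such as user input. The sketch below (all strings are made up) shows the effect, the $$ escape for a literal dollar sign, and a function replacement as one way to insert text verbatim.

```javascript
// Pretend this replacement text came from a form field
const userText = "$& and $1";

// As a string, "$&" and "$1" are both expanded to the matched text
const interpreted = "x=VALUE".replace(/(VALUE)/, userText);

// A function's return value is inserted as-is, with no expansion
const literal = "x=VALUE".replace(/(VALUE)/, () => userText);

// "$$" produces a single literal "$"
const dollars = "cost: N".replace("N", "$$5");

console.log(interpreted); // x=VALUE and VALUE
console.log(literal);     // x=$& and $1
console.log(dollars);     // cost: $5
```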

Patterns and anti-patterns: catastrophic backtracking#

ECMAScript regex uses a backtracking NFA engine, the same family used by Python re and PCRE. Patterns that allow the same character to be matched in multiple ways take exponential time on near-miss inputs. The classic shape is nested quantifiers with overlapping alternation.

// DANGEROUS — runs effectively forever on a non-match input
// /^(a+)+$/.test("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!")

// SAFE — one quantifier, no overlap
/^a+$/.test("aaaaaaaaaaaaaaaa");   // true

// SAFE — replace alternation with a negated class
/^[^!]+$/.test("aaaaaaaaaaaaaaaa"); // true

// SAFE — anchor with explicit non-overlapping classes
const ipPart = /(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)/;
const ipv4 = new RegExp(`^${ipPart.source}(\\.${ipPart.source}){3}$`);
ipv4.test("192.168.1.255");  // true
ipv4.test("256.0.0.1");      // false

JavaScript does not support possessive quantifiers (a++) or atomic groups ((?>...)) — the two PCRE features that exist precisely to prevent backtracking. The mitigation in JS is to refactor the pattern so the engine has no choice to backtrack: use negated classes ([^>]) instead of .+?, anchor with ^/\b, and never nest quantifiers on the same character class.
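
One workaround that is sometimes used is to imitate an atomic group with a lookahead plus a backreference. This is a sketch of that idiom, not a built-in feature: the lookahead captures the longest run without consuming it, \1 then consumes exactly that text, and the engine cannot go back into a completed lookahead to try a shorter run.

```javascript
// Plain quantifier: a+ gives back one "a" so the trailing "ab" can match
const backtracking = /a+ab/.test("aaab");

// Atomic-like: the captured run is fixed, so nothing is given back
const atomicLike = /(?=(a+))\1ab/.test("aaab");

// The same idiom applied to the nested-quantifier shape discussed above
const nearMiss = "a".repeat(35) + "!";
const safeNested = /^(?:(?=(a+))\1)+$/;

console.log(backtracking);              // true
console.log(atomicLike);                // false
console.log(safeNested.test(nearMiss)); // false, without the exponential blow-up
console.log(safeNested.test("aaaa"));   // true
```

The idiom costs a capture group and is harder to read, so simplifying the pattern is usually the first choice.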

Differences from Python re#

For cross-language work — porting a regex from Python to JS or vice versa — these are the most common surprises. The full reference for Python is the re article; the PCRE comparison lives in linux/pcre.

| Feature | JavaScript | Python re |
| --- | --- | --- |
| Named group syntax | (?<name>…) | (?P<name>…) |
| Named backreference (pattern) | \k<name> | (?P=name) |
| Named backreference (replacement) | $<name> | \g<name> |
| Verbose / x flag (whitespace + comments) | not supported | re.X / re.VERBOSE |
| Variable-length lookbehind | supported (ES2018) | not supported (fixed width only) |
| Atomic groups / possessive quantifiers | not supported | supported since 3.11 |
| Unicode property escapes \p{…} | requires u or v flag | not supported by re (available in the third-party regex module) |
| \w matches Unicode letters | no (ASCII only, even with u; use \p{L} with u or v) | yes for str patterns |
| Set operations in character classes | v flag ([A--B], [A&&B]) | not supported |
| Recursive patterns (?R) | not supported | not supported |
| Sticky y flag | yes | no flag; Pattern.match(string, pos) anchors at pos |

// Same date pattern in both languages — note the (?<…>) vs (?P<…>) difference
const jsDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
// Python:    re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

Real-world recipes#

Stable line splitter (handles CR, LF, CRLF)#

A robust line splitter needs to accept Unix (LF), Windows (CRLF), and classic Mac (CR) line endings, and must treat CRLF as a single break rather than two.

const text = "a\r\nb\nc\rd\r\n";
const lines = text.split(/\r\n|\r|\n/);
console.log(lines);

Output:

[ 'a', 'b', 'c', 'd', '' ]

The trailing empty string comes from the final \r\n — drop it with a filter if undesired.
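
Filtering out every empty string would also remove intentional blank lines. An alternative sketch, assuming a hypothetical helper named splitLines, strips one terminating line break before splitting:

```javascript
const NEWLINE = /\r\n|\r|\n/;

// Hypothetical helper: drops a single trailing line break, then splits.
// Blank lines in the middle of the text are preserved.
function splitLines(text) {
  return text.replace(/(?:\r\n|\r|\n)$/, "").split(NEWLINE);
}

console.log(splitLines("a\r\nb\nc\rd\r\n")); // [ 'a', 'b', 'c', 'd' ]
console.log(splitLines("a\n\nb\n"));         // [ 'a', '', 'b' ]
```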

URL slugifier#

Lowercase, strip diacritics, collapse non-alphanumerics to single hyphens, trim hyphens. The Unicode normalization step ensures café becomes cafe, not caf.

function slugify(title) {
  return title
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")    // strip combining marks (U+0300 to U+036F)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

console.log(slugify("Hello, World! — JavaScript 2026"));

Output:

hello-world-javascript-2026

Replace emoji with shortcodes#

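String.prototype.replace with a callback receives each matched character. The pattern uses \p{Emoji_Presentation} in Unicode mode, which matches characters that render as emoji by default; plain \p{Emoji} would also match digits, # and *. The map below is illustrative; production code would use a full table from an emoji library.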

const SHORTCODES = { "🚀": ":rocket:", "🔥": ":fire:", "✨": ":sparkles:" };

const shortcoded = "ship it 🚀 🔥 ✨".replace(
  /\p{Emoji_Presentation}/gu,
  (e) => SHORTCODES[e] ?? e
);

console.log(shortcoded);

Output:

ship it :rocket: :fire: :sparkles:

Strip ANSI escape sequences#

Common when sanitising captured CLI output before writing to a log file or rendering it in HTML.

const ANSI = /\x1b\[[0-9;]*m/g;
const colored = "\x1b[31mERROR\x1b[0m: \x1b[1mfile\x1b[0m missing";
console.log(colored.replace(ANSI, ""));

Output:

ERROR: file missing

Extract structured records with named groups#

Parse an Apache-like log line into named fields in a single pass.

const line = '192.168.1.5 - alice [25/May/2026:13:00:42 +0000] "GET /api HTTP/1.1" 200 1234';

const LOG = /^(?<ip>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<proto>[^"]+)" (?<status>\d+) (?<bytes>\d+)$/;

const m = line.match(LOG);
console.log(m.groups);

Output:

{
  ip: '192.168.1.5',
  user: 'alice',
  time: '25/May/2026:13:00:42 +0000',
  method: 'GET',
  path: '/api',
  proto: 'HTTP/1.1',
  status: '200',
  bytes: '1234'
}
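
Every captured field is a string, including status and bytes. If numeric fields are needed, a thin wrapper can convert them. This is a sketch with a hypothetical helper named parseLogLine; it assumes the bytes field is always numeric, which real access logs do not guarantee.

```javascript
const ACCESS_LOG = /^(?<ip>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<proto>[^"]+)" (?<status>\d+) (?<bytes>\d+)$/;

// Hypothetical helper: returns null for lines that do not match the pattern
function parseLogLine(line) {
  const groups = line.match(ACCESS_LOG)?.groups;
  if (!groups) return null;
  return {
    ...groups,
    status: Number(groups.status),
    bytes: Number(groups.bytes),
  };
}

const entry = parseLogLine(
  '192.168.1.5 - alice [25/May/2026:13:00:42 +0000] "GET /api HTTP/1.1" 200 1234'
);
console.log(entry.status + 1);        // 201 (numeric addition, not "2001")
console.log(parseLogLine("garbage")); // null
```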

Tokenize source code with y flag#

The sticky flag is ideal for hand-written lexers. Each pattern matches at exactly the current cursor; advancing lastIndex consumes the token.

const SPEC = [
  ["NUM",   /\d+/y],
  ["IDENT", /[a-zA-Z_]\w*/y],
  ["OP",    /[+\-*/=]/y],
  ["WS",    /\s+/y],
];

function* tokenize(input) {
  let i = 0;
  while (i < input.length) {
    let matched = false;
    for (const [type, re] of SPEC) {
      re.lastIndex = i;
      const m = re.exec(input);
      if (m && m.index === i) {
        if (type !== "WS") yield { type, value: m[0], index: i };
        i = re.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError(`unexpected char at ${i}: ${input[i]}`);
  }
}

console.log([...tokenize("x = 10 + 20")]);

Output:

[
  { type: 'IDENT', value: 'x', index: 0 },
  { type: 'OP', value: '=', index: 2 },
  { type: 'NUM', value: '10', index: 4 },
  { type: 'OP', value: '+', index: 7 },
  { type: 'NUM', value: '20', index: 9 }
]

Escape user input for inclusion in a regex#

Whenever a user-supplied string ends up inside a new RegExp(...), escape it. The function below escapes every syntax character of the ECMAScript pattern grammar (^ $ \ . * + ? ( ) [ ] { } |).

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const needle = "C++ (cpp)";
const haystack = "I love C++ (cpp) and JavaScript";
const re = new RegExp(escapeRegExp(needle), "g");
console.log(haystack.replace(re, "[REDACTED]"));

Output:

I love [REDACTED] and JavaScript

Strip HTML tags (lightweight)#

Quick-and-dirty plain-text extraction. For untrusted HTML, use a real parser (DOMParser in the browser, parse5 or cheerio in Node) — this regex does not handle nested CDATA or arbitrary attribute payloads safely.

const html = "<p>Hello <strong>world</strong>!</p>";
console.log(html.replace(/<[^>]+>/g, ""));

Output:

Hello world!

Validate semantic version strings#

A near-spec-compliant SemVer 2 pattern, broken into named groups for downstream parsing.

const SEMVER = /^v?(?<major>0|[1-9]\d*)\.(?<minor>0|[1-9]\d*)\.(?<patch>0|[1-9]\d*)(?:-(?<pre>[\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?(?:\+(?<build>[\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$/;

for (const v of ["1.2.3", "v1.2.3-beta.4+exp", "1.02.3"]) {
  const m = v.match(SEMVER);
  console.log(v, "→", m ? m.groups : null);
}

Output:

1.2.3 → { major: '1', minor: '2', patch: '3', pre: undefined, build: undefined }
v1.2.3-beta.4+exp → { major: '1', minor: '2', patch: '3', pre: 'beta.4', build: 'exp' }
1.02.3 → null

Common pitfalls#

  1. Reusing a g or y regex object without resetting lastIndex: half your matches silently disappear. Either re-create the regex or use a stateless string method.
  2. new RegExp from user input without escaping: turns a literal search box into a denial-of-service vector via catastrophic patterns. Always run input through an escapeRegExp helper.
  3. Capture groups when you only need grouping: each capture adds bookkeeping and shifts the numbered references. Use (?:…) unless you actually need the capture.
  4. Greedy quantifiers on <tag>-like patterns: <.+> on <a><b> matches the whole thing. Use a negated class <[^>]+> for safety and speed.
  5. /\d/ on Arabic / Bengali digits: \d matches only ASCII [0-9], with or without the u flag. Use \p{Decimal_Number} with the u or v flag to match every Unicode decimal digit.
  6. Forgetting g on replaceAll-with-regex: "aaa".replaceAll(/a/, "b") throws a TypeError because the regex must be global. Use /a/g or the string form.
  7. .match() with a g regex throws away capture groups: it returns just the full matches. Use .matchAll() or .exec() to keep groups.
  8. Escaping in string and template literals: new RegExp(`\d+`) compiles the pattern d+ because the backslash is eaten by the literal. Either use String.raw`\d+` or double-escape as `\\d+`.
  9. Variable-length lookbehind portability: V8 (Chrome, Node) supports it; some older engines don't. Test target runtimes if you ship to legacy browsers.
  10. A g regex literal in a hot loop: /foo/g inside a function body allocates a new RegExp on every call. Hoist constant patterns to module scope, and then mind the lastIndex caveat from item 1.
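
The escaping pitfall in item 8 is easy to verify by inspecting the source property of the resulting regex. A brief sketch:

```javascript
// The string literal drops the backslash, so the pattern becomes "d+"
const eaten = new RegExp("\d+");

// Two ways to keep the backslash
const doubleEscaped = new RegExp("\\d+");
const viaRaw = new RegExp(String.raw`\d+`);

console.log(eaten.source);         // d+
console.log(doubleEscaped.source); // \d+
console.log(viaRaw.source);        // \d+

console.log(eaten.test("123"));    // false: it looks for the letter d
console.log(viaRaw.test("123"));   // true
```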