
cut, paste & join — Column Tools

Extract columns by delimiter or byte position (cut), merge files column-wise (paste), and join on a common key field (join). Essential for tab/CSV/field-delimited data.



What it is#

cut, paste, and join are POSIX standard text utilities included in every Unix and Linux system for working with column-delimited data. cut extracts specific fields or byte ranges from each line; paste merges multiple files side by side column-wise; join performs a relational inner or outer join on two sorted files that share a common key field. Reach for these tools when you need fast, dependency-free column extraction or merging in a shell pipeline; for more complex transformations, awk gives you full field-level programming.

cut#

Extract specific fields or byte/character ranges from each line.

Syntax#

cut OPTION... [FILE...]


By delimiter field (-d / -f)#

-d sets the field delimiter (a single character) and -f selects which fields to output. Fields are numbered from 1; use N-M for a range, N- for field N to end, and a comma-separated list for non-contiguous fields.

cut -d: -f1           /etc/passwd    # first field (username)

Output:

root
daemon
bin
sys
man
nobody
cut -d: -f1,7         /etc/passwd    # fields 1 and 7

Output:

root:/bin/bash
daemon:/usr/sbin/nologin
bin:/usr/sbin/nologin
sys:/usr/sbin/nologin
man:/usr/sbin/nologin
nobody:/usr/sbin/nologin
cut -d: -f1-3         /etc/passwd    # fields 1 through 3

Output:

root:x:0
daemon:x:1
bin:x:2
sys:x:3
man:x:6
nobody:x:65534
cut -d: -f3-          /etc/passwd    # field 3 to end
cut -d, -f2           data.csv       # CSV second column

Output:

Alice
Frank
Carol
Dave
Eve
cut -d$'\t' -f1,3     data.tsv       # tab-delimited fields 1 and 3
cut -d' ' -f2-        sentence.txt   # all words except first

# Change the output delimiter portably (POSIX cut has no option for it): pipe through tr or use awk
cut -d: -f1,3 /etc/passwd | tr ':' '\t'

# GNU coreutils ≥ 9.11: -O is the short form of --output-delimiter
cut -d: -f1,3 -O $'\t' /etc/passwd


By character position (-c)#

-c selects output by character position rather than field, making it ideal for fixed-width data where columns align at known offsets. For purely ASCII input -c and -b are equivalent; they diverge only with multibyte encodings.

cut -c1       file.txt    # first character of each line
cut -c1-10    file.txt    # characters 1–10

Output:

The quick
Lorem ipsu
Filesystem
2026-04-24
cut -c5-      file.txt    # character 5 to end
cut -c1,5,10  file.txt    # characters 1, 5, and 10
cut -c-80     file.txt    # max 80 chars (truncate long lines)
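
To make the fixed-width case concrete, here is a minimal sketch. The log layout is hypothetical (date in columns 1-10, level in columns 12-16, message from column 18 onward), and the sample lines are invented for the demo. The input is pure ASCII, so -c and -b would behave identically here.

```shell
# Hypothetical fixed-width log: date = cols 1-10, level = cols 12-16, message = col 18 onward
log='2026-04-24 DEBUG cache warmed
2026-04-25 ERROR disk full'

dates=$(printf '%s\n' "$log" | cut -c1-10)      # first ten characters of each line
levels=$(printf '%s\n' "$log" | cut -c12-16)    # a five-character window
messages=$(printf '%s\n' "$log" | cut -c18-)    # character 18 to end of line

printf '%s\n' "$dates" "$levels" "$messages"
```

Because the offsets are hard-coded, this only works while every record keeps the same layout; once the columns drift, a delimiter-based split with -f is the safer choice.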


By byte position (-b)#

-b selects by raw byte offset, which matters when the input contains multibyte UTF-8 characters — a single character may occupy 2–4 bytes, so -b and -c will give different results. Use -b when you need exact binary slicing of a stream.

cut -b1-4   binary.dat   # bytes 1–4 (differs from -c for multibyte chars)


Suppress undelimited lines#

cut -d: -f1 -s /etc/passwd   # -s: skip lines without the delimiter


paste#

Merge files horizontally (column-by-column).

Syntax#

paste [OPTIONS] [FILE...]


paste file1.txt file2.txt            # merge side by side (tab-delimited)
paste -d, file1.txt file2.txt        # use comma as delimiter
paste -d'\t' names.txt emails.txt    # explicit tab

# Serial mode (-s): transpose — each file becomes one tab-joined line
paste -s file.txt
paste -s -d, file.txt                # comma-separated

# Combine N columns from same file
paste - - < file.txt             # 2 lines → 1 row (2 columns)
paste - - - < file.txt           # 3 lines → 1 row (3 columns)
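
The three paste modes above are easier to compare when run against the same small inputs. The sketch below uses made-up files in a temporary directory (names.txt and emails.txt are assumptions for the demo) and captures each result in a variable before printing it.

```shell
# Work in a throwaway directory; all file names here are invented for the demo
dir=$(mktemp -d)
printf '%s\n' alice bob carol > "$dir/names.txt"
printf '%s\n' a@example.com b@example.com c@example.com > "$dir/emails.txt"

side_by_side=$(paste -d, "$dir/names.txt" "$dir/emails.txt")   # one output row per input line
serial=$(paste -s -d, "$dir/names.txt")                        # whole file collapsed to one line
pairs=$(printf '%s\n' 1 2 3 4 | paste -d' ' - -)               # every 2 stdin lines become 1 row

printf '%s\n' "$side_by_side" "$serial" "$pairs"
rm -r "$dir"
```

With these inputs, side_by_side holds three name,email rows, serial holds alice,bob,carol, and pairs holds the two rows "1 2" and "3 4".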


Practical paste examples#

# Create a CSV from two column files
paste -d, ids.txt names.txt > combined.csv

# Add line numbers to a file
seq 1 $(wc -l < file.txt) | paste -d'\t' - file.txt

# Interleave lines from two files
paste -d'\n' file1.txt file2.txt

# Collapse a column of values into a single comma-separated line
paste -s -d, values.txt


join#

Join lines from two files on a common key field (like SQL inner join).

Syntax#

join [OPTIONS] FILE1 FILE2


Both files must be sorted on the join key first.

# Join on field 1 (default)
join sorted1.txt sorted2.txt

# Join on specific fields
join -1 2 -2 1 file1.txt file2.txt   # field 2 of f1, field 1 of f2

# Use a comma as the field separator (applies to input and output)
join -t, file1.csv file2.csv

# Include unmatched lines (outer join)
join -a 1 file1.txt file2.txt        # + unmatched from file1
join -a 2 file1.txt file2.txt        # + unmatched from file2
join -a 1 -a 2 file1.txt file2.txt   # full outer join

# Fill missing fields
join -a 1 -e 'N/A' -o 0,1.2,2.2 file1.txt file2.txt

# Suppress matched lines (anti-join)
join -v 1 file1.txt file2.txt    # lines in f1 not in f2
join -v 2 file1.txt file2.txt    # lines in f2 not in f1
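
When the key is not the first field, each file has to be sorted on its own join field before join sees it. The following is a minimal sketch with invented data: orders.txt carries the customer ID in field 2 and customers.txt carries it in field 1. Forcing LC_ALL=C on both sort and join is an assumption made here so that the two tools agree on collation.

```shell
dir=$(mktemp -d)
# Hypothetical inputs: customer ID is field 2 of orders.txt and field 1 of customers.txt
printf '%s\n' 'o1 c20' 'o2 c10' 'o3 c20' > "$dir/orders.txt"
printf '%s\n' 'c20 Frank' 'c10 Alice'    > "$dir/customers.txt"

# Sort each file on its own join field, with the same locale that join will use
LC_ALL=C sort -k2,2 "$dir/orders.txt"    > "$dir/orders.sorted"
LC_ALL=C sort -k1,1 "$dir/customers.txt" > "$dir/customers.sorted"

# Output layout: join field first, then the other fields of file 1, then those of file 2
joined=$(LC_ALL=C join -1 2 -2 1 "$dir/orders.sorted" "$dir/customers.sorted")

printf '%s\n' "$joined"
rm -r "$dir"
```

Each order line is matched with its customer, and a key that repeats in one file (c20 here) produces one output row per occurrence.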


join example#

# employees.txt (sorted by ID):
# 101 Alice
# 102 Frank
# 103 Carol

# salaries.txt (sorted by ID):
# 101 75000
# 102 82000
# 104 91000

join employees.txt salaries.txt
# 101 Alice 75000
# 102 Frank 82000

join -a 1 employees.txt salaries.txt
# 101 Alice 75000
# 102 Frank 82000
# 103 Carol           ← Carol has no salary record
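
The same two files can also illustrate the -e/-o and -v options listed earlier. This sketch recreates employees.txt and salaries.txt in a temporary directory (the directory handling is an addition for the demo) and stores the results in variables.

```shell
dir=$(mktemp -d)
printf '%s\n' '101 Alice' '102 Frank' '103 Carol' > "$dir/employees.txt"
printf '%s\n' '101 75000' '102 82000' '104 91000' > "$dir/salaries.txt"

# Left outer join: -o fixes the column layout, -e fills the fields that are missing
left=$(join -a 1 -e 'N/A' -o 0,1.2,2.2 "$dir/employees.txt" "$dir/salaries.txt")

# Anti-join: lines of salaries.txt whose ID has no match in employees.txt
orphans=$(join -v 2 "$dir/employees.txt" "$dir/salaries.txt")

printf '%s\n' "$left" "$orphans"
rm -r "$dir"
```

Note that -e only takes effect together with -o: without an explicit output format there is no "missing field" for join to fill, which is why Carol's row simply ends early in the plain -a 1 output.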


Practical pipelines#

# Extract second column from CSV, remove header, sort, count unique
tail -n +2 data.csv | cut -d, -f2 | sort | uniq -c | sort -rn

# Get all usernames from /etc/passwd
cut -d: -f1 /etc/passwd | sort

Output:

bin
daemon
mail
man
nobody
root
sys
www-data

# Get the home directories of users with /bin/bash shell
grep '/bin/bash$' /etc/passwd | cut -d: -f6

# Compare two lists (IDs in file1 not in file2)
join -v 1 <(sort ids1.txt) <(sort ids2.txt)

# Build a quick lookup from key=value file
cut -d= -f1,2 config.env | tr '=' '\t'

# Transpose a single-space-delimited matrix: each column becomes one output row
# (for small matrices; use awk for larger ones)
for i in $(seq 1 "$(awk '{print NF; exit}' matrix.txt)"); do
  cut -d' ' -f"$i" matrix.txt | paste -s -d' ' -
done


What’s new in GNU coreutils 9.11 (April 2026)#

Coreutils 9.11 is the first release to ship a fully multi-byte aware cut: -c now slices on logical characters in any UTF-8 locale without surprises, and three new options close long-standing gaps with BSD/macOS and BusyBox/Toybox cut. Check your version with cut --version; -w, -O, and -F fail with an “unrecognized option” error on coreutils ≤ 9.10, and on BSD systems unless their cut already provides its own implementation.

| Option | Meaning | Pre-9.11 workaround |
| --- | --- | --- |
| -w | Treat any run of whitespace (tabs + spaces) as the field separator | tr -s ' \t' '\t' \| cut -f… |
| -O STRING | Set the output delimiter (any string) | --output-delimiter=STRING (still works) or tr |
| -F LIST | BSD-style alias: combines -w + -O in a single flag | awk '{print $N}' |

# -w: split on any whitespace run, no need to tr -s first
echo 'alpha   beta   gamma' | cut -w -f2          # → beta

# -O: short alias for --output-delimiter
cut -d: -f1,5,7 -O '|' /etc/passwd | head -2

Output:

root|root|/bin/bash
daemon|daemon|/usr/sbin/nologin
# -F: BSD/macOS shorthand for "split on whitespace, emit with this delimiter"
ps aux | cut -F ',' -f1,2,11 | head -3


Multi-byte awareness. Before 9.11, cut -c on a UTF-8 stream sometimes truncated mid-codepoint when the locale wasn’t C.UTF-8. From 9.11 onward, -c always counts whole characters regardless of locale; only -b still slices on raw bytes. The café/-b warnings in the section below still apply — -b is for binary or strict-ASCII data only.

The three selection modes#

cut exposes exactly three mutually-exclusive selection modes, and forgetting which one you used is the most common source of confusion. Pick -f for field-delimited data (the common case), -c for fixed-width text where you count characters, and -b only when you genuinely need raw byte offsets — for example, slicing a binary blob.

| Flag | Selects by | Best for | Pitfalls |
| --- | --- | --- | --- |
| -f LIST | Field number (1-based) | CSV, TSV, and /etc/passwd-style data | Needs -d unless the delimiter is TAB (the default) |
| -c LIST | Character position | Fixed-width reports, terminal output | Counts codepoints, so results depend on the locale |
| -b LIST | Byte offset | Binary or strictly ASCII fixed-width data | Cuts multibyte UTF-8 characters into invalid byte fragments |

# These three commands look similar but behave differently
cut -f1   data.tsv      # field 1 (tab is default delimiter)
cut -c1-5 data.tsv      # first 5 characters of each line
cut -b1-5 data.tsv      # first 5 bytes of each line


-f always needs -d (or implicitly TAB)#

-f without -d assumes the field delimiter is a literal tab. Spaces are not delimiters, which trips up people who try cut -f2 file.txt on space-separated input. Use tr -s ' ' '\t' first if the data is space-padded, or switch to awk, whose default field splitting collapses runs of whitespace.

# Wrong: the line contains no tab, so cut prints it unchanged
echo "alpha beta gamma" | cut -f2     # prints whole line

# Right — set a space delimiter
echo "alpha beta gamma" | cut -d' ' -f2    # → beta

# Or normalise whitespace to tabs first
echo "alpha   beta   gamma" | tr -s ' ' '\t' | cut -f2    # → beta


-d only accepts a single character#

cut’s -d flag takes exactly one character: no multi-character delimiters, no regex, and no escape sequences beyond what the shell itself expands. This is one of the most common reasons people move from cut to awk -F, because a delimiter like :: or a comma followed by a space can’t be expressed natively.

# Wrong: GNU cut rejects this with "the delimiter must be a single character"
cut -d'::' -f1 file

# Right alternatives
awk -F'::' '{print $1}' file              # awk handles multi-char
awk -F', *' '{print $1}' file              # regex delimiter
sed -E 's/, +/\t/g' file | cut -f1         # normalise then cut
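
A worked run of the two alternatives, as a minimal sketch on an invented record with :: as the separator. The tab is produced with printf so the sed replacement does not rely on \t being understood, which is a portability assumption made for this example.

```shell
line='alice::1001::/home/alice'   # made-up record with a two-character delimiter

# awk accepts a multi-character field separator directly
# (a separator longer than one character is treated as a regular expression)
with_awk=$(printf '%s\n' "$line" | awk -F'::' '{print $2}')

# Alternative: rewrite the delimiter to a single tab, then let cut do the slicing
tab=$(printf '\t')
with_cut=$(printf '%s\n' "$line" | sed "s/::/$tab/g" | cut -f2)

printf '%s\n' "$with_awk" "$with_cut"
```

Both variables end up holding 1001. The sed-then-cut route is only safe when the data itself never contains a tab.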


Multi-byte characters: -c vs -b#

In a UTF-8 locale, -c operates on logical characters (codepoints) while -b operates on raw bytes. A character like é occupies one codepoint but two bytes, so cut -c1 returns the full é while cut -b1 returns only its leading byte — which is not a valid character on its own. This is why -b should be reserved for binary or strictly-ASCII data.

echo 'café' | cut -c1-3         # → caf  (three characters)
echo 'café' | cut -c1-4         # → café (four characters)
echo 'café' | cut -b1-3         # → caf  (three bytes — still ASCII)
echo 'café' | cut -b1-5         # → café (the 'é' takes bytes 4–5)
echo 'café' | cut -b1-4         # → caf? (truncates 'é' mid-byte)
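
Counting bytes makes the -b results above less mysterious. In this sketch the word is built from octal escapes so the byte sequence is unambiguous regardless of how the script file is encoded, and LC_ALL=C is set only to make the byte-oriented view explicit; both choices are assumptions of the example rather than requirements of cut.

```shell
# 'café' spelled with octal escapes: é is the two bytes 0xC3 0xA9 in UTF-8
word=$(printf 'caf\303\251')

# Four characters, but five bytes
total_bytes=$(printf '%s' "$word" | wc -c | tr -d ' ')

# -b1-4 keeps c, a, f and only the first byte of é
kept_bytes=$(printf '%s\n' "$word" | LC_ALL=C cut -b1-4 | tr -d '\n' | wc -c | tr -d ' ')

# -b1-5 keeps both bytes of é, so the word survives intact
intact=$(printf '%s\n' "$word" | LC_ALL=C cut -b1-5)

printf 'total=%s kept=%s\n' "$total_bytes" "$kept_bytes"
```

The byte range 1-4 ends in the middle of é, which is why a terminal shows a broken character after "caf".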


[!TIP] If you see mojibake (é instead of é) in cut -b output, you have truncated a multi-byte character. Switch to -c or set LC_ALL=C.UTF-8 and use -c.

--complement#

--complement inverts the field/character selection so you keep everything except the listed fields. This is faster to type than enumerating the columns you want when only one or two need to be dropped.

# Drop the password placeholder (field 2) from /etc/passwd
cut -d: -f2 --complement /etc/passwd | head -3

Output:

root:0:0:root:/root:/bin/bash
daemon:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:2:2:bin:/bin:/usr/sbin/nologin
# Drop the first column of a CSV
cut -d, -f1 --complement data.csv > data_no_id.csv

# Keep all characters except the first 8 (e.g. strip a timestamp prefix)
cut -c1-8 --complement log.txt


--output-delimiter#

Without --output-delimiter, cut reuses the input delimiter on output, which is fine for round-tripping but means you can’t simultaneously split on : and emit tabs. --output-delimiter=STRING accepts any string (not limited to one character), so it doubles as a quick way to convert delimiters.

# Convert :-delimited /etc/passwd fields to TSV
cut -d: -f1,5,7 --output-delimiter=$'\t' /etc/passwd | head -3

Output:

root	root	/bin/bash
daemon	daemon	/usr/sbin/nologin
bin	bin	/usr/sbin/nologin
# Multi-character output delimiter
cut -d, -f1,2,3 --output-delimiter=' | ' data.csv | head -3

Output:

id | name | role
1 | Alice | admin
2 | Frank | user

-s — suppress lines without the delimiter#

By default, when -f is used, lines that do not contain the delimiter are passed through unchanged (which is rarely what you want). -s discards those lines silently, which is essential when grepping a log file where some lines are headers and others are field-delimited entries.

# A mixed file: header text + CSV rows
printf '%s\n' '# Report generated 2026-04-24' 'id,name,role' '1,Alice,admin' '2,Frank,user' > mixed.csv

cut -d, -f2 mixed.csv          # prints all lines (header is unchanged)
cut -d, -f2 -s mixed.csv       # skips lines without ',' — keeps CSV rows only

Output (cut -d, -f2 -s mixed.csv):

name
Alice
Frank

Collapsing repeated delimiters#

cut treats every delimiter character as a separate boundary, so two consecutive delimiters produce an empty field. This is unlike awk’s default whitespace-splitting, which collapses runs. To get awk-like behaviour with cut, pre-process with tr -s.

# A line with runs of spaces
echo 'alpha   beta   gamma' | cut -d' ' -f2     # → '' (empty field!)

# Collapse runs of spaces to single spaces first
echo 'alpha   beta   gamma' | tr -s ' ' | cut -d' ' -f2   # → beta

# Or just use awk
echo 'alpha   beta   gamma' | awk '{print $2}'    # → beta


cut vs awk: when to use which#

cut is faster, smaller, and pipeline-safe for simple delimiter splits. awk is the right tool the moment you need any of: multi-character delimiters, regex delimiters, runs-of-whitespace, field reordering, conditional row filtering, or arithmetic. As a rule of thumb: if the problem fits on a postcard with cut, use cut; otherwise reach for awk.

| Need | cut | awk |
| --- | --- | --- |
| Extract column 1 of a CSV | cut -d, -f1 | awk -F, '{print $1}' |
| Reorder columns | not possible | awk -F, '{print $3,$1}' |
| Multi-char delimiter | not possible | awk -F'::' |
| Collapse whitespace | needs tr -s first | default behaviour |
| Filter rows by value | needs grep first | awk -F, '$3>100' |
| Change output delimiter | --output-delimiter | BEGIN{OFS="…"} |
| Sum a column | cut … \| paste -sd+ \| bc | awk '{s+=$2} END{print s}' |
| Speed on giant files | faster | slightly slower |

# Same task three ways — pick the shortest that does the job
cut -d, -f1,3 data.csv                       # cut: simple slice
awk -F, '{print $1,$3}' data.csv             # awk: same, space-joined
awk -F, -v OFS=, '{print $1,$3}' data.csv    # awk: keep CSV format
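
The "sum a column" and "filter rows" rows of the table can be tried on a few lines of invented data. This is a sketch, not a benchmark: the CSV content is made up, and shell arithmetic stands in for bc so the example has no dependency beyond cut, paste, and awk.

```shell
# Made-up sample data: id,name,amount
csv='1,Alice,120
2,Frank,80
3,Carol,250'

# cut + paste build the expression "120+80+250"; shell arithmetic evaluates it
# (piping the same string to bc would give the same result)
expr_str=$(printf '%s\n' "$csv" | cut -d, -f3 | paste -sd+ -)
sum_cut=$(( $expr_str ))

# awk does the extraction and the arithmetic in one step
sum_awk=$(printf '%s\n' "$csv" | awk -F, '{s += $3} END {print s}')

# Row filtering by value is where cut alone runs out; awk handles it directly
big=$(printf '%s\n' "$csv" | awk -F, '$3 > 100 {print $2}')

printf '%s\n' "$sum_cut" "$sum_awk" "$big"
```

Both sums come out at 450, and the filter keeps Alice and Carol. The cut route needs three processes for what awk does in one, which is the practical meaning of the rule of thumb above.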


Pairing cut with paste#

cut … | paste is the canonical “extract two columns, then recombine them with a new delimiter” pattern. Process substitution (<(cmd)) lets paste read multiple cut pipelines in parallel without temp files.

# Build a new TSV from columns 1 and 7 of /etc/passwd
paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f7 /etc/passwd) | head -3

Output:

root	/bin/bash
daemon	/usr/sbin/nologin
bin	/usr/sbin/nologin
# Swap columns 1 and 2 (cut can't reorder, but paste can)
paste <(cut -d, -f2 data.csv) <(cut -d, -f1 data.csv) | tr '\t' ','


Recipes#

# 1. Extract a single CSV column safely (assumes no embedded commas)
cut -d, -f2 data.csv

# 2. Get the home directory of every user in /etc/passwd
cut -d: -f1,6 /etc/passwd
#   alice:/home/alice
#   carol:/home/carol

# 3. Build a `users.txt` from /etc/passwd field 1
cut -d: -f1 /etc/passwd | sort -u > users.txt

# 4. Drop the last character of each line (assumes fixed-width input: every line is as long as the first)
cut -c1-$(( $(awk '{print length; exit}' file) - 1 )) file

# 5. Strip a fixed-width prefix from every line (here the first 24 characters, e.g. log timestamps)
cut -c25- access.log

# 6. Extract the first word of every line
cut -d' ' -f1 sentences.txt

# 7. Pull host names from an SSH config
grep -E '^Host ' ~/.ssh/config | cut -d' ' -f2-

# 8. Get just the PID column from ps
ps aux | tr -s ' ' | cut -d' ' -f2

# 9. Extract the host name from URLs (drops the scheme and the path)
cut -d/ -f3 urls.txt          # https://example.com/path → example.com

# 10. Recombine after editing — split, mutate, paste back
paste -d, \
  <(cut -d, -f1 data.csv) \
  <(cut -d, -f2 data.csv | tr '[:lower:]' '[:upper:]') \
  <(cut -d, -f3- data.csv)

Output (recipe 2):

root:/root
daemon:/usr/sbin
bin:/bin
alice:/home/alice
carol:/home/carol

CSV caveats#

cut is not a CSV parser: it splits naively on every comma, so embedded commas inside quoted fields ("Smith, John") will be torn apart. For correctness with real-world CSV, use a dedicated tool such as qsv or csvkit’s csvcut, or GNU awk with FPAT. cut is only safe on TSV or “well-behaved” comma-separated input.

# DANGER — embedded commas destroy field alignment
echo '1,"Smith, John",admin' | cut -d, -f2   # → "Smith   (broken)

# Safe with qsv
echo '1,"Smith, John",admin' | qsv select 2  # → Smith, John

# Safe with GNU awk's FPAT for quoted CSV (FPAT is a gawk extension)
echo '1,"Smith, John",admin' | awk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")"} {print $2}'
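
Before trusting cut on a CSV file, a quick pre-flight check can catch the dangerous cases. The helper below is a hypothetical sketch (the name csv_is_cut_safe and the sample files are invented here): it succeeds only if no line contains a double quote and every line has the same number of comma-separated fields. It is a heuristic, not a CSV validator.

```shell
# csv_is_cut_safe FILE  (hypothetical helper name)
# Exit status 0 if FILE looks safe for cut -d, and 1 otherwise
csv_is_cut_safe() {
  awk -F, '
    /"/        { bad = 1 }     # quoting present: cut may split inside a field
    NR == 1    { want = NF }   # the first line sets the expected field count
    NF != want { bad = 1 }     # ragged row
    END        { exit bad + 0 }
  ' "$1"
}

dir=$(mktemp -d)
printf '%s\n' 'id,name,role' '1,Alice,admin'         > "$dir/plain.csv"
printf '%s\n' 'id,name,role' '1,"Smith, John",admin' > "$dir/quoted.csv"

csv_is_cut_safe "$dir/plain.csv"  && plain=safe  || plain=unsafe
csv_is_cut_safe "$dir/quoted.csv" && quoted=safe || quoted=unsafe

printf 'plain=%s quoted=%s\n' "$plain" "$quoted"
rm -r "$dir"
```

A file that fails the check is a signal to switch to one of the CSV-aware tools mentioned above rather than to patch the cut command.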


[!TIP] cut always outputs fields in the order they appear in the input, regardless of the order specified with -f. To reorder fields, use awk '{print $3, $1}' instead.

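[!TIP] join requires both inputs to be sorted on the join field. Use join <(sort f1) <(sort f2) when joining on field 1; when the key is another field, pre-sort on that field, for example sort -k2,2 for a file joined with -1 2.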

[!TIP] cut -d': ' -f2 does not work: -d takes a single character, and GNU cut rejects a two-character argument with "the delimiter must be a single character". Use awk -F': ' or pre-substitute with sed 's/: /\t/g' and then cut -f2.
