sed

1 Invoke

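  • In a shell script, use the shebang #!/bin/sed -f or #!/bin/sed -nf.
  • On the command line: sed [OPTION]... [SCRIPT] [INPUT]...
    SCRIPT
    the sed commands; omitted when the script is given with -e or -f
    INPUT
    if omitted, or given as -, sed reads stdin. Multiple input files are concatenated and processed as a single stream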
    -n
    by default each line of input is echoed to the standard output after all of the commands have been applied to it. The -n option suppresses this behavior
    -e
    script given as a string. Can be repeated: sed -e 'xxx' -e 'xxx' -e 'xxx' file.txt
    -f
    read the script from a file
    -E
    use extended regular expressions
    -i
    edit the input files in place
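
A small sketch of how -n and repeated -e behave. The three-line sample input is made up for illustration.

```shell
# Sample input, generated inline so the example is self-contained.
input=$(printf 'alpha\nbeta\ngamma\n')

# Without -n every line is auto-printed, and p prints the matching line once more.
with_autoprint=$(printf '%s\n' "$input" | sed '/beta/p')

# With -n only the explicit p produces output.
without_autoprint=$(printf '%s\n' "$input" | sed -n '/beta/p')

# Several -e expressions are applied in order to each line.
chained=$(printf '%s\n' "$input" | sed -e 's/a/A/' -e 's/$/!/')

printf '%s\n---\n%s\n---\n%s\n' "$with_autoprint" "$without_autoprint" "$chained"
```

The first result contains beta twice, the second contains only beta, and the third shows both substitutions applied to every line.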

2 How Sed Works

sed maintains two data buffers: the pattern space and the hold space, both initially empty. In each cycle it reads one line from the input, removes the trailing newline, and puts the line into the pattern space. Commands whose address matches are then executed. After the last command, the pattern space is printed (unless -n is used) and then emptied (unless a special command like 'D' is used). The next cycle then begins with the next line.

The hold space, on the other hand, holds data between cycles.

h
hold. replace hold space with pattern space
H
Hold. append line from pattern space to hold space, with a newline before it
g
Replace the contents of the pattern space with the contents of the hold space.
G
Append line from hold space to pattern space, with a newline before it
x
Exchange the contents of the hold and pattern spaces.
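
As an illustration of the hold-space commands, here are two classic one-liners on made-up input: reversing the line order, and double-spacing.

```shell
# Reverse the order of lines (like tac) using the hold space.
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the pattern space (all lines so far, newest first) to the hold space
#   $p   on the last line, print the accumulated result
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')

# Double-space the input: G appends a newline plus the (empty) hold space.
spaced=$(printf 'a\nb\n' | sed G)

printf '%s\n---\n%s\n' "$reversed" "$spaced"
```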

3 Commands

A command has the form [addr]X[options]

options
specific for different commands
X
single-letter command
[addr]
X will only be applied to the matched lines by addr

Commands can be separated by semi-colon or newline.

  • Most commonly used command:
    • d: delete
    • D: Delete. If the pattern space contains a newline, delete up to and including the first newline and restart the cycle without reading new input; otherwise behave like d
    • p: print pattern buffer
    • P: print line from pattern space until first newline
    • =: print line number
    • n: (next) print the pattern space (unless -n is used), then replace it with the next line of input
    • N: append a newline and the next line of input to the pattern space
    • s/REGEXP/REPLACEMENT/FLAGS: if REGEXP is matched, the matched part is replaced by REPLACEMENT
      • REPLACEMENT:
        • \1-\9: refers to the capture group
        • &: refers to whole match
      • flag:
        • g: replace all matches
        • NUMBER: replace only the NUMBER-th match
        • p: if substitution was made, print the new pattern space
        • w FILE: if substitution was made, write the result to FILE
        • I: case insensitive
  • Not so common:
    • a TEXT: append TEXT after a line. TEXT is output after the line's newline, as a new line of its own.
    • i TEXT: insert TEXT before a line, starting a new line.
    • b LABEL: branch unconditionally to LABEL
    • c TEXT: change line to TEXT
    • l: Print the pattern space in an unambiguous form. Print newline as '\n'.
    • #: begin a line comment
    • q [EXIT-CODE]: quit with code
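    • { COMMANDS }: group commands, separated by ;, so that they share one address.
    • r FILE: append the contents of FILE after the current line
    • w FILE: write the pattern space to FILE
    • t LABEL: (test) branch to LABEL only if an s command has made a substitution since the last input line was read or the last t branch was taken
    • :LABEL: define a label for b and t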
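
A few of these commands in action, as a rough sketch on made-up input. The one-line form of a is assumed to be run with GNU sed; the other two lines use only standard commands.

```shell
# N appends the next input line to the pattern space, separated by a newline;
# s then turns that embedded newline into a space, joining lines in pairs.
joined=$(printf '1\n2\n3\n4\n' | sed 'N;s/\n/ /')

# a appends a new line after the addressed line (one-line form, GNU sed).
appended=$(printf 'x\ny\n' | sed '1a inserted')

# t branches back to label "a" as long as the s command keeps succeeding,
# squeezing every run of spaces down to a single space.
squeezed=$(printf 'a  b   c\n' | sed -e ':a' -e 's/  / /' -e 'ta')

printf '%s\n' "$joined" "$appended" "$squeezed"
```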

4 Address

  • LINE: single line number
    • NUMBER: line start from 1
    • $: last line
    • FIRST~STEP: matches every STEP-th line starting with line FIRST
  • regular expression:
    • /REGEXP/:
    • /REGEXP/I: case insensitive
  • range:
    • LINE,LINE:
    • LINE,/REGEXP/: start at LINE; REGEXP is first tested on the line after LINE, so the range spans at least two lines. LINE can be 0 (GNU extension), in which case REGEXP is also tested on the first line.
    • ADDR,+N: ADDR and the N lines following it.
    • ADDR,~N: ADDR and the lines following it, up to and including the next line whose line number is a multiple of N.
    • appending ! to an address or range inverts it: the command applies to the lines that do not match.
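
The address forms can be tried out on the numbers 1 to 10 (a made-up input). The FIRST~STEP, ADDR,+N and 0,/REGEXP/ forms are assumed to be run with GNU sed.

```shell
nums=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

range=$(printf '%s\n' "$nums" | sed -n '2,4p')      # lines 2 to 4
last=$(printf '%s\n' "$nums" | sed -n '$p')         # last line
step=$(printf '%s\n' "$nums" | sed -n '1~3p')       # every 3rd line, starting at line 1
plus=$(printf '%s\n' "$nums" | sed -n '/5/,+2p')    # the match and the 2 lines after it
inverted=$(printf '%s\n' "$nums" | sed -n '2,9!p')  # everything outside lines 2 to 9

# 1,/x/ starts testing the regexp on line 2, so the range runs to the second x.
from_one=$(printf 'x\ny\nx\nz\n' | sed '1,/x/d')
# 0,/x/ also tests line 1, so the range ends there.
from_zero=$(printf 'x\ny\nx\nz\n' | sed '0,/x/d')

printf '%s\n' "$range" "$last" "$step" "$plus" "$inverted" "$from_one" "$from_zero"
```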

4.1 Regular expression

4.1.1 Basic and Extended RE

Basic          Extended
\+             +
\?             ?
\{I,J\}        {I,J}
\(capture\)    (capture)
\|             |
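
To make the correspondence concrete, the same substitution can be written in both syntaxes. This is a sketch on a made-up string; note that \+ in basic mode is a GNU extension, so GNU sed is assumed.

```shell
text='aaa bb c'

# Basic syntax: the interval, the groups and + all need backslashes.
bre=$(printf '%s\n' "$text" | sed 's/\(a\{2,3\}\) \(b\+\)/[\1|\2]/')

# Extended syntax (-E): the same pattern without the backslashes.
ere=$(printf '%s\n' "$text" | sed -E 's/(a{2,3}) (b+)/[\1|\2]/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands produce the same line, with the two captured groups wrapped in brackets.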

4.1.2 Common

Bracket expressions (usable in both basic and extended RE): put one of the following class names inside a bracket expression, e.g. [[:space:]]

  • :space:
  • :digit:
  • :alnum:
  • :alpha:
  • :lower:
  • :upper:

Extension

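  • \w: a word character (letter, digit or underscore)
  • \W: a non-word character
  • \b: a word boundary
  • \B: anywhere except a word boundary
  • \s: a whitespace character
  • \S: a non-whitespace character
  • \<: the beginning of a word
  • \>: the end of a word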
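
A short sketch of a character class and two of the escapes on a made-up line. The \b and \w escapes are GNU extensions, so GNU sed is assumed for the last two commands.

```shell
line='id 42, name foo_bar'

# POSIX class inside a bracket expression: note the doubled brackets.
digits_masked=$(printf '%s\n' "$line" | sed 's/[[:digit:]]/#/g')

# \b anchors the match at word boundaries.
word_replaced=$(printf '%s\n' "$line" | sed 's/\bname\b/NAME/')

# \w matches letters, digits and underscore, so foo_bar counts as one word.
words_masked=$(printf '%s\n' "$line" | sed -E 's/\w+/X/g')

printf '%s\n' "$digits_masked" "$word_replaced" "$words_masked"
```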

5 Examples

print

# add line numbers first,
# then use grep,
# then just print the number
cat -n file | grep 'PATTERN' | awk '{print $1}'
# the equivalent in sed
sed -n '/PATTERN/ =' file

substitute

s/pattern/&/ # '&' stands for the whole match
# capture groups: \(...\) in basic mode, (...) in extended mode (-E); refer to them with \1 \2
s/(a)b/\1/
s//string/ # empty regexp: reuse the last regexp used at run time
s/xxx/xxx/g # substitute globally: all
# there will not be recursion. sed will not examine the generated string
s/loop/loop loop/g # will NOT run forever
s/xxx/xxx/2 # only substitute the second match
s/xxx/xxx/g2 # substitute 2,3,4,...
s/xxx/xxx/p # print the line if a substitution was made, even if -n is used
s/xxx/xxx/Ip # ignore case; flags can be combined (no space between them)
s/a/A/2pw /tmp/file # combine more
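
The flags can be checked on small made-up inputs. The sketch below assumes GNU sed for the combined NUMBER plus g flag, written here as 2g.

```shell
s='one one one'

second=$(printf '%s\n' "$s" | sed 's/one/ONE/2')        # only the 2nd match
from_second=$(printf '%s\n' "$s" | sed 's/one/ONE/2g')  # the 2nd match and all later ones
all=$(printf '%s\n' "$s" | sed 's/one/ONE/g')           # every match

# & inserts the whole match into the replacement.
wrapped=$(printf 'abc\n' | sed 's/b/[&]/')

# The replacement text is not rescanned, so this terminates.
no_recursion=$(printf 'loop\n' | sed 's/loop/loop loop/g')

# An empty regexp reuses the last one, here the address /foo/.
reused=$(printf 'foo bar\n' | sed '/foo/s//baz/')

printf '%s\n' "$second" "$from_second" "$all" "$wrapped" "$no_recursion" "$reused"
```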

delete

# -i: make the change in the original file
# /REGEXP/d: delete each line that matches
sed -i '/@slice/d' "$ClassName.java"
sed -i '/@slice/d' xx.java # already removes every matching line; no vi-style g prefix is needed
sed '/^$/d' # remove all empty lines
sed '11,$ d' # only output first 10 lines
sed '1,/^$/ d' # delete everything up to the first blank line.
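
The delete commands above can be tried on a small mail-like input, which is made up for illustration:

```shell
mail=$(printf 'From: a\nTo: b\n\nbody line 1\n\nbody line 2\n')

# Delete everything up to and including the first blank line (the header).
body=$(printf '%s\n' "$mail" | sed '1,/^$/d')

# Delete every empty line.
no_blank=$(printf '%s\n' "$mail" | sed '/^$/d')

# Keep only the first 3 lines by deleting from line 4 to the end.
head3=$(printf '%s\n' 1 2 3 4 5 | sed '4,$d')

printf '%s\n---\n%s\n---\n%s\n' "$body" "$no_blank" "$head3"
```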

6 Trouble Shooting

6.1 GNU sed on Mac

The sed shipped with macOS is BSD sed, which differs from the GNU sed found on Linux. So, use GNU! On Mac, install it with

brew install gnu-sed

This makes a gsed command available. To write a cross-platform script, use

echo "OSTYPE: $OSTYPE"
SED=sed
if [[ "$OSTYPE" == "linux-gnu"* ]]; then
    SED=sed
elif [[ "$OSTYPE" == "darwin"* ]]; then
    SED=gsed
fi
# single quotes keep the shell from touching $ (the last-line address) in the sed script
"$SED" -E -e '460,$ s/REG[0-9]{1,2}//g' compress42.c.orig > compress42.bugsig.c

6.2 About the regular expression version

-E will enable extra features, such as:

  • a{1,2}

See re_format(7) for details.

There's no \d, so use [0-9] instead. [:digit:] also works, but only inside a bracket expression, i.e. written as [[:digit:]]; a bare [:digit:] is not a digit class.
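
A quick check that the digit class works when it sits inside a bracket expression. The input string is made up; it reuses the REG pattern from the script above.

```shell
src='a=REG7; b=REG12; c=REGX'

# [[:digit:]] is the class [:digit:] placed inside a bracket expression [...].
with_class=$(printf '%s\n' "$src" | sed -E 's/REG[[:digit:]]{1,2}//g')

# The explicit range gives the same result.
with_range=$(printf '%s\n' "$src" | sed -E 's/REG[0-9]{1,2}//g')

printf '%s\n%s\n' "$with_class" "$with_range"
```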