sed
1 Invoke
- In a shell script, use the shebang #!/bin/sed -f or #!/bin/sed -nf.
- On the command line: sed [OPTION] [INPUT]
- INPUT
- if it is omitted, or is -, sed reads stdin. If several files are given, they are concatenated and processed as one stream.
- -n
- By default each line of input is echoed to the standard output after all of the commands have been applied to it. The -n option suppresses this behavior.
- -e
- script given as a string. Can be repeated: sed -e 'xxx' -e 'xxx' -e 'xxx' file.txt
- -f
- script file
- -E
- use extended regular expressions
- -i
- edit the file in place
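The options above can be tried on a few lines of input. This is a minimal sketch; the sample text and the variable names are made up for illustration.

```shell
# Without -n every line is echoed; with -n only what p prints appears.
only_second=$(printf 'one\ntwo\nthree\n' | sed -n '2p')
echo "$only_second"     # two

# Several -e expressions are applied in order to each line.
both=$(printf 'one two\n' | sed -e 's/one/1/' -e 's/two/2/')
echo "$both"            # 1 2

# "-" as INPUT means stdin.
from_stdin=$(printf 'x\n' | sed 's/x/y/' -)
echo "$from_stdin"      # y
```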
2 How Sed Works
sed maintains two data buffers, the pattern space and the hold space,
both initially empty. In each cycle it reads one line from the input,
removes the trailing newline, and places the line in the pattern space.
The commands whose addresses match are then executed. After the last
command, the pattern space is printed (unless -n is used) with the
newline added back, and the pattern space is deleted (unless a special
command such as 'D' is used). The next cycle then begins with the next
line.
The hold space, on the other hand, holds data between cycles.
- h
- hold. Replace the contents of the hold space with the contents of the pattern space.
- H
- Hold. Append a newline to the hold space, then append the contents of the pattern space.
- g
- Replace the contents of the pattern space with the contents of the hold space.
- G
- Append a newline to the pattern space, then append the contents of the hold space.
- x
- Exchange the contents of the hold and pattern spaces.
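As an illustration of how the hold space carries data between cycles, here is a small sketch that reverses the order of the input lines and another that double-spaces them. The input and the variable names are only examples.

```shell
# Reverse the line order (like tac):
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space (the reversed text so far) into the hold space
#   $p   on the last line, print the accumulated result
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')
echo "$reversed"

# G on every line appends the (empty) hold space, i.e. a blank line.
spaced=$(printf 'a\nb\n' | sed G)
echo "$spaced"
```

The first command prints c, b, a on separate lines; the second prints a and b, each followed by an empty line.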
3 Commands
A command has the form [addr]X[options]
- options
- specific to each command
- X
- single-letter command
- [addr]
- X is applied only to the lines matched by addr
Commands can be separated by a semicolon or a newline.
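A short sketch of the [addr]X[options] form and of the two separators, using made-up input and variable names:

```shell
# Address 2, command d, no options: delete the second line.
no_second=$(printf 'a\nb\nc\n' | sed '2d')
echo "$no_second"

# Two commands separated by a semicolon: delete line 1, then uppercase "c".
combined=$(printf 'a\nb\nc\n' | sed '1d;s/c/C/')
echo "$combined"

# The same two commands separated by a newline.
combined_nl=$(printf 'a\nb\nc\n' | sed '1d
s/c/C/')
echo "$combined_nl"
```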
- Most commonly used commands:
- d: delete the pattern space and start the next cycle
- D: Delete. Delete the pattern space up to the first newline and restart the cycle without reading new input (if there is no newline, it acts like d)
- p: print the pattern space
- P: print the pattern space up to the first newline
- =: print the current line number
- n: (next) print the pattern space (unless -n is used), then replace it with the next input line
- N: append a newline and the next input line to the pattern space
- s/REGEXP/REPLACEMENT/FLAGS: if REGEXP matches, the matched
part is replaced by REPLACEMENT
- REPLACEMENT:
- \1-\9: refer to the corresponding capture groups
- &: refers to the whole match
- FLAGS:
- g: replace all matches
- N (a number): replace only the N-th match
- p: if a substitution was made, print the new pattern space
- w FILE: if a substitution was made, write the result to FILE
- I: case insensitive
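- Not so common:
- a TEXT: append TEXT after the line. It is output after the line's newline, so it starts a new line.
- i TEXT: insert TEXT before the line, as a line of its own.
- c TEXT: change (replace) the line with TEXT
- l: print the pattern space in an unambiguous form, e.g. a newline is printed as '\n'
- #: begin a comment that runs to the end of the line
- q [EXIT-CODE]: quit, optionally with the given exit code
- { COMMANDS }: group commands, separated by ;, so that they share one address
- r FILE: read FILE and append its contents to the output
- w FILE: write the pattern space to FILE
- :LABEL: define a label
- b LABEL: branch unconditionally to LABEL
- t LABEL: test. Branch to LABEL only if a substitution has succeeded since the last input line was read or the last t was taken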
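A few of the commands above in action. This is a sketch with invented input and variable names; the last example relies on GNU sed accepting a semicolon right after a label.

```shell
# Swap two words using capture groups (basic RE, so groups are \( \)).
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')
echo "$swapped"      # world hello

# & is the whole match: wrap every run of digits in angle brackets.
wrapped=$(echo 'a1 b22' | sed 's/[0-9][0-9]*/<&>/g')
echo "$wrapped"      # a<1> b<22>

# = with the $ address prints the number of the last line, i.e. a line count.
count=$(printf 'a\nb\nc\n' | sed -n '$=')
echo "$count"        # 3

# { } shares one address: on line 2, print the line and quit.
second=$(printf 'a\nb\nc\n' | sed -n '2{p;q;}')
echo "$second"       # b

# Label, N and b: pull every line into the pattern space, then join them.
joined=$(printf 'a\nb\nc\n' | sed ':a;N;$!ba;s/\n/ /g')
echo "$joined"       # a b c
```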
4 Address
- LINE: a single line
- NUMBER: a line number, counting from 1
- $: the last line
- FIRST~STEP: every STEP-th line, starting with line FIRST
- regular expression:
- /REGEXP/: lines that match REGEXP
- /REGEXP/I: the same, case insensitive
- range:
- LINE,LINE: from the first line to the second, inclusive
- LINE,/REGEXP/: starts at LINE; REGEXP is first checked on the following line, so the range spans at least two lines. LINE can be 0, in which case REGEXP may already match on the first line of input.
- ADDR,+N: ADDR and the N lines following it
- ADDR,~N: ADDR and the lines following it, up to and including the next line whose number is a multiple of N
- appending ! to an address or range inverts it
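The address forms can be checked against a numbered input. The sketch below assumes GNU sed for the FIRST~STEP, ADDR,+N, ADDR,~N and 0,/REGEXP/ forms; the variable names are illustrative only.

```shell
nums=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

third=$(echo "$nums" | sed -n '3p')            # single line: 3
last=$(echo "$nums" | sed -n '$p')             # last line: 10
by_regex=$(echo "$nums" | sed -n '/4/,/6/p')   # regex range: 4 5 6
inverted=$(echo "$nums" | sed -n '3,$!p')      # everything outside 3,$: 1 2

# GNU sed extensions
stepped=$(echo "$nums" | sed -n '1~3p')        # 1 4 7 10
plus=$(echo "$nums" | sed -n '8,+1p')          # 8 9
multiple=$(echo "$nums" | sed -n '5,~4p')      # 5 6 7 8

# 1,/a/ checks the regex from line 2 on; 0,/a/ may already end on line 1.
from_one=$(printf 'a\nb\na\nc\n' | sed '1,/a/d')    # c
from_zero=$(printf 'a\nb\na\nc\n' | sed '0,/a/d')   # b a c

echo "$by_regex"
echo "$from_zero"
```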
4.1 Regular expression
4.1.1 Basic and Extended RE
Basic | Extended |
---|---|
\+ | + |
\? | ? |
\{I,J\} | {I,J} |
\(capture\) | (capture) |
\<alter> | <alter> |
<alter> stands for the vertical bar, so alternation is written \| in basic and | in extended RE.
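To make the table concrete, here is the same substitution written once in basic and once in extended syntax. It is a sketch on invented input; the \| form in basic mode assumes GNU sed.

```shell
input='ab-123'

# Basic RE: groups and intervals need backslashes.
basic=$(echo "$input" | sed 's/\([a-z]\{1,3\}\)-\([0-9]\{1,\}\)/\2-\1/')
# Extended RE (-E): the same pattern without the backslashes.
extended=$(echo "$input" | sed -E 's/([a-z]{1,3})-([0-9]+)/\2-\1/')
echo "$basic"       # 123-ab
echo "$extended"    # 123-ab

# Alternation: \| in basic mode (GNU sed), | with -E.
basic_alt=$(echo 'cat' | sed 's/cat\|dog/pet/')
extended_alt=$(echo 'dog' | sed -E 's/cat|dog/pet/')
echo "$basic_alt $extended_alt"    # pet pet
```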
4.1.2 Common
Bracket expressions (usable in both basic and extended RE)
Put one of the following inside [[ ]], e.g. [[:digit:]]:
- :space:
- :digit:
- :alnum:
- :alpha:
- :lower:
- :upper:
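GNU extensions
- \w: a word character (letter, digit, or underscore)
- \W: a non-word character
- \b: a word boundary
- \B: not a word boundary
- \s: a whitespace character
- \S: a non-whitespace character
- \<: the beginning of a word
- \>: the end of a word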
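A brief sketch of the character classes and of the backslash extensions. The classes are portable; \b and \w are assumed to be available only in GNU sed. Inputs and variable names are examples.

```shell
# [[:space:]] matches any whitespace: collapse each run to a single space.
squeezed=$(printf 'a \t  b\n' | sed 's/[[:space:]][[:space:]]*/ /g')
echo "$squeezed"      # a b

# Several classes can share one bracket expression.
masked=$(echo 'Room 42b' | sed 's/[[:digit:][:lower:]]/x/g')
echo "$masked"        # Rxxx xxx

# GNU sed: \b keeps the match to whole words.
whole_word=$(echo 'cat concat' | sed 's/\bcat\b/dog/g')
echo "$whole_word"    # dog concat

# GNU sed: \w matches word characters.
words=$(echo 'foo bar' | sed 's/\w\w*/<&>/g')
echo "$words"         # <foo> <bar>
```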
5 Examples
    # add line numbers first,
    # then use grep,
    # then just print the number
    cat -n file | grep 'PATTERN' | awk '{print $1}'
    # the equivalent
    sed -n '/PATTERN/ =' file
substitute
    # '&' stands for the whole match
    s/pattern/&/
    # \1, \2 refer to capture groups: ( ) with -E, \( \) in basic mode
    s/(a)b/\1/
    # an empty pattern reuses the last regex used at run time
    s//string/
    # substitute globally: all matches
    s/xxx/xxx/g
    # there is no recursion: sed does not re-examine the generated string,
    # so this will NOT run forever
    s/loop/loop loop/g
    # only substitute the second match
    s/xxx/xxx/2
    # substitute the 2nd, 3rd, 4th, ... matches
    s/xxx/xxx/g2
    # prints the line even if -n is used
    s/xxx/xxx/p
    # ignore case; flags can be combined
    s/xxx/xxx/Ip
    # combine more; w must come last because the rest of the line is the file name
    s/a/A/2pw /tmp/file
delete
    # -i: make the change in the original file
    # /PATTERN/d: delete the line if it matches
    sed -i '/@slice/d' "$ClassName.java"
    # d already applies to every matching line; a vim-style g prefix
    # ('g/@slice/d') is not valid sed
    sed -i '/@slice/d' xx.java   # remove all
    sed '/^$/d'      # remove all empty lines
    sed '11,$ d'     # only output first 10 lines
    sed '1,/^$/ d'   # delete everything up to the first blank line.
6 Troubleshooting
6.1 GNU sed on Mac
The sed shipped with macOS is BSD sed, which differs from GNU sed on Linux, so use GNU sed on both. On a Mac, install it with
brew install gnu-sed
This makes a gsed command available.
To write a cross-platform script, use
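    echo "OSTYPE: $OSTYPE"
    # default to sed; on macOS use gsed from Homebrew
    SED=sed
    if [[ "$OSTYPE" == "linux-gnu"* ]]; then
        SED=sed
    elif [[ "$OSTYPE" == "darwin"* ]]; then
        SED=gsed
    fi
    # single quotes keep the shell away from $ (the last-line address)
    "$SED" -E -e '460,$ s/REG[0-9]{1,2}//g' compress42.c.orig > compress42.bugsig.c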
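An alternative sketch that does not depend on OSTYPE: probe for gsed and fall back to plain sed. pick_sed is a hypothetical helper name, and the approach assumes that a gsed on the PATH is GNU sed.

```shell
# pick_sed (hypothetical helper): prefer gsed when it is installed,
# otherwise fall back to the system sed.
pick_sed() {
    if command -v gsed >/dev/null 2>&1; then
        echo gsed
    else
        echo sed
    fi
}

SED=$(pick_sed)
cleaned=$(echo 'xREG12y' | "$SED" -E 's/REG[0-9]{1,2}//g')
echo "$cleaned"    # xy
```

This works in plain sh as well as bash, since it avoids [[ ]] and OSTYPE.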
6.2 About the regular expression version
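-E enables extended regular expressions, which allow unescaped forms such as:
- a{1,2}
See re_format(7) for details.
There's no \d, so use [0-9] instead. The man page says [:digit:] can be used; it only works inside a bracket expression, so write [[:digit:]]. A bare [:digit:] is not a character class, which is why it appears not to work.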
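A quick check that the two spellings of "a digit" behave the same under -E. This is a sketch on invented input with illustrative variable names.

```shell
# Range form.
with_range=$(echo 'REG7 REG12' | sed -E 's/REG[0-9]{1,2}/X/g')
# Class form: note the double brackets.
with_class=$(echo 'REG7 REG12' | sed -E 's/REG[[:digit:]]{1,2}/X/g')
echo "$with_range"    # X X
echo "$with_class"    # X X
```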