awk
1 tmp
match($0, /#include <(.*)>/, a) {print a[1]}
- sum a column of numbers (the first field of every line)
awk '{s+=$1} END {print s}' mydatafile
The awk program must be wrapped in single quotes, not double quotes: inside double quotes the shell expands $0, $1, ... before awk ever sees them.
awk '{print $0}'
2 Introduction
For a script file, use the shebang #! /bin/awk -f (adjust the path if awk is installed elsewhere). In a terminal, invoke awk as awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ... with the program in single quotes. Run a script file with awk -f script.awk INPUT-FILES.
The "program" is a list of statements, each of the form pattern {action}. A missing { action } means print the line; a missing pattern always matches. Pattern-action statements are separated by newlines or semicolons.
The program starts by running all BEGIN actions in order. Each input file is then processed in turn by reading data up to the next record separator, which typically means one line at a time. Every pattern is tested against each record, in order; when a pattern matches, its action is executed.
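The processing order can be seen in a minimal sketch. The two input lines are made up for illustration, and the END rule (run after the last record) is included to show the full cycle.

```shell
# BEGIN runs before any input is read, the pattern-less rule runs once
# per record, and END runs after the last record.
out=$(printf 'a\nb\n' | awk 'BEGIN {print "start"} {print "line: " $0} END {print "end"}')
echo "$out"
```

This prints "start", then one "line: ..." entry per input line, then "end".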
Invoking options:
-F FS
- set the field separator FS
-v VAR=VAL
- set variable before the program begins
awk first reads a record from the input, using RS as the separator. It then splits each record into fields using FS.
awk is line-based: each line is split into the fields $1, $2, … by the separator FS, and $0 refers to the entire line. Function parameters are local; all other variables are global.
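As a small sketch of the -F and -v options together with field references: the colon-separated input and the variable name prefix are hypothetical, chosen only for illustration.

```shell
# -F: sets FS to ":"; -v assigns the (hypothetical) variable prefix
# before the program starts. $1 and $3 are the first and third fields.
out=$(printf 'root:x:0\nalice:x:1000\n' | awk -F: -v prefix='user=' '{print prefix $1, $3}')
echo "$out"
```

Writing prefix $1 with no operator concatenates the two strings; the comma makes print insert OFS (a space by default) before $3.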
3 patterns
Patterns are arbitrary Boolean combinations (with !, || and &&) of regular expressions and relational expressions.
A pattern may consist of two patterns separated by a comma; this is called a range pattern. In this case, the action is performed for all lines from an occurrence of the first pattern through the next occurrence of the second, inclusive.
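A range pattern might look like the following sketch. The marker lines start and stop are assumptions made up for the example; since no action is given, matching lines are printed.

```shell
# Print everything from a line matching ^start$ through the next line
# matching ^stop$, both included.
out=$(printf 'a\nstart\nb\nstop\nc\n' | awk '/^start$/,/^stop$/')
echo "$out"
```

Lines a and c fall outside the range and are not printed.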
Special patterns: BEGIN and END. They cannot be combined with other patterns.
We can also match part of the input with the regular expression comparison operators ~ and !~. EXP ~ /REGEXP/ is true if EXP matches the regular expression. This can be used in patterns, as well as in the conditions of if, while, for and do statements.
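A minimal sketch of a pattern that combines a regular expression match on one field with a relational expression on another; the name/age input is invented for illustration.

```shell
# Select records whose first field starts with "a" AND whose second
# field is greater than 35, then print the first field.
out=$(printf 'alice 30\nbob 25\nanna 41\n' | awk '$1 ~ /^a/ && $2 > 35 {print $1}')
echo "$out"
```

Only anna satisfies both conditions; alice matches the regular expression but fails the numeric test.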
Case sensitivity is controlled by the IGNORECASE variable (a gawk extension). You can set it when invoking awk, or set it inside the awk program. Another way to ignore case, which works in any awk, is to call the tolower or toupper function before the comparison.
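The tolower approach can be sketched as follows; the log-like input lines are hypothetical.

```shell
# Lowercase the whole record before matching, so "Error" and "ERROR"
# both match /^error/. The original line is what gets printed.
out=$(printf 'Error: disk\nok\nERROR: net\n' | awk 'tolower($0) ~ /^error/')
echo "$out"
```

tolower returns a new string and leaves $0 unchanged, so the output keeps the original capitalization.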
4 Actions
An action is a sequence of statements. Statements are terminated by semicolons, newlines or right braces.
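References to user-defined variables do not use $. E.g. myvar=$2 assigns the value of field $2 to myvar. Afterwards, refer to the variable directly as myvar: there is no need for a $ as with shell variables ($myvar would mean the field whose number is stored in myvar).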
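A short sketch of the difference between a variable and a field reference through that variable; the variable name n and the input are illustrative only.

```shell
# n holds the number 2. Plain n prints the value 2, while $n prints
# field number 2 of the current record.
out=$(printf 'x 7\n' | awk '{n = 2; print n, $n}')
echo "$out"
```

The result is "2 7": the variable's value, followed by the second field.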
Expressions are very similar to C. The additions are:
- $N: field reference
- expr expr: string concatenation
- string [!]~ pattern: ERE match
- expr in array: test whether expr is a subscript of array
- (index, index, ...) in array: the same test for a multi-dimensional subscript
4.1 Variables
- Except for function parameters, all variables are global. An uninitialized variable has both a numeric value of 0 and a string value of the empty string.
- Field variables can be referenced by $N or $expr, where expr is a numeric expression. They are assignable. A reference to a non-existent field yields an uninitialized value. NF is the number of fields. Assigning to a non-existent field increases NF.
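The two points above can be checked with a small sketch. Setting OFS to "-" is an assumption made only so the rebuilt record is easy to read; x is a hypothetical variable that is never assigned.

```shell
# Assigning to $5 on a 3-field record raises NF to 5 and rebuilds $0
# with OFS between fields; the new field 4 is empty.
out=$(printf 'a b c\n' | awk 'BEGIN {OFS = "-"} {$5 = "e"; print NF; print $0}')
echo "$out"

# An uninitialized variable compares equal to both 0 and "".
uninit=$(awk 'BEGIN {a = (x == 0); b = (x == ""); print a, b}')
echo "$uninit"
```

The first command prints 5 and then a-b-c--e; the second prints 1 1, meaning both comparisons are true.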
Important Variables
NF
- number of fields in the current record
NR
- number of the current record, counted across all input files
FNR
- number of the current record within the current file
FILENAME
- the name of the current input file
FS (Field Separator)
- regular expression used to separate fields; also settable by option -Ffs.
RS
- input record separator (default newline)
OFS
- output field separator (default: a single space)
ORS
- output record separator (default newline)
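To see how NR, FNR and FILENAME differ, here is a sketch that reads two small files. The file names one.txt and two.txt are made up and live in a temporary directory.

```shell
# Create two throwaway input files (hypothetical names).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
out=$(cd "$dir" && awk '{print FILENAME, NR, FNR}' one.txt two.txt)
echo "$out"

rm -rf "$dir"
```

For the single line of two.txt, NR is 3 while FNR is 1.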
4.2 Output
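print and printf write to standard output by default. Redirection is supported:
- > expr : write to the file named by expr (truncated when first opened)
- >> expr : append to the file named by expr
- | expr : pipe to the command given by expr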
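A minimal sketch of the pipe and file redirections. The sort command, the temporary file, and the variable name f are assumptions for illustration.

```shell
# Pipe every record to an external "sort"; its output appears once awk
# exits and the pipe is closed.
sorted=$(printf 'b\na\nc\n' | awk '{print $0 | "sort"}')
echo "$sorted"

# Write the first field of each record to the file named by f.
tmp=$(mktemp)
printf 'x 1\ny 2\n' | awk -v f="$tmp" '{print $1 > f}'
saved=$(cat "$tmp")
rm -f "$tmp"
echo "$saved"
```

Unlike in the shell, > truncates the file only when it is first opened; later prints in the same run append to it, so both x and y end up in the file.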
4.3 Control Structure
- if (condition) then-body [else else-body]
- while (condition) body
- do body while (condition)
- for (init; cond; inc) body
- switch (expr) {case val: body default: body} (gawk extension)
- break
- continue
- next: stop processing the current record immediately and go on to the next one
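Several of these statements can be combined in one sketch: skip comment lines with next, then sum only the odd fields of each remaining record. The input and the "#" comment convention are hypothetical.

```shell
out=$(printf '1 2 3\n# skip\n4 5\n' | awk '
/^#/ { next }                         # drop comment lines entirely
{
    sum = 0
    for (i = 1; i <= NF; i++) {
        if ($i % 2 == 0) continue     # ignore even fields
        sum += $i
    }
    print sum
}')
echo "$out"
```

The first record gives 1 + 3 = 4, the comment line is skipped, and the last record gives 5.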
4.4 String Functions
- sub(ere, repl[, in])
- substitute the first match of ere inside in (or $0) with repl. Returns the number of substitutions. & can be used in repl to refer to the matched text.
- gsub(ere, repl[, in])
- like sub, but for all matches
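- index(s,t)
- position (starting at 1) of the first occurrence of t inside s, or 0 if t does not occur
- length([s])
- return the length of s (or $0)
- match(s, ere)
- return the position of the first match of ere in s, or 0 if there is none; also sets RSTART and RLENGTH
- split(s, a[, fs])
- split s into array a using separator fs (or FS); return the number of elements
- sprintf(fmt, expr, expr, …)
- return the string that printf would print for the same arguments
- substr(s, m[, n])
- return the substring of s starting at position m, with length at most n
- tolower(s)
- return s in lowercase
- toupper(s)
- return s in uppercase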
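The functions above can be tried out in a BEGIN block, which needs no input. This is only a sketch; all strings and the array name parts are made up for illustration.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "[&]", s)           # & stands for the matched text
    print n, s
    print index("hello world", "w"), length("abc"), substr("abcdef", 2, 3), toupper("ab")
    k = split("a:b:c", parts, ":")    # k is the number of pieces
    print k, parts[2]
    print match("foobar", /o+/), RSTART, RLENGTH
}')
echo "$out"
```

Note that gsub modifies s in place and returns the count, and that match reports the start and length of the match through RSTART and RLENGTH.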
5 User-defined Functions
function name([param, ...]) {statements}
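A minimal sketch of a user-defined function; the name max and the two-column numeric input are hypothetical.

```shell
# Define max once, then call it for every record.
out=$(printf '3 9\n7 2\n' | awk '
function max(a, b) { return (a > b) ? a : b }
{ print max($1, $2) }')
echo "$out"
```

The parameters a and b are local to the function, so calling it does not disturb any global variables.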
6 Examples
awk 'NR==10 {print}' input.txt # print the 10th line; prints nothing if the input has fewer than 10 lines