awk

1 tmp

match($0, /#include <(.*)>/, a) {print a[1]}  # gawk only: the third argument to match() is a gawk extension
  • sum the first column of numbers
awk '{s+=$1} END {print s}' mydatafile

Quote the program with single quotes, not double quotes. Inside double quotes the shell expands $1, $0 and similar before awk ever sees them.

awk '{print $0}'
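A small sketch of the quoting difference, assuming a POSIX shell. The sample input and the variable names single and escaped are made up for illustration.

```shell
# Single quotes: awk receives $2 literally and prints the second field.
single=$(echo 'a b c' | awk '{ print $2 }')
echo "$single"

# Double quotes also work, but only if the dollar sign is escaped
# so that the shell leaves it alone.
escaped=$(echo 'a b c' | awk "{ print \$2 }")
echo "$escaped"
```

Both commands print b. Single quotes avoid the escaping altogether.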

2 Introduction

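For a script file, use the shebang #!/bin/awk -f (adjust the path to wherever awk is installed, often /usr/bin/awk). In a terminal, invoke awk as awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 .... Put the program in single quotes so the shell passes it through unchanged. Run a script file with awk -f script.awk INPUT-FILES.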

The program is a list of statements, each of the form pattern { action }. A missing { action } means print the line; a missing pattern always matches. Pattern-action statements are separated by newlines or semicolons.
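The two defaults can be seen side by side in a minimal sketch. The input lines and the variable names are illustrative only.

```shell
# Pattern only: the default action prints every matching line.
pattern_only=$(printf 'apple\nbanana\ncherry\n' | awk '/an/')
echo "$pattern_only"

# Action only: with no pattern, the action runs for every line.
action_only=$(printf 'apple\nbanana\n' | awk '{ print length($0) }')
echo "$action_only"
```

The first command prints banana; the second prints the length of each line, 5 and 6.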

The program starts by running the actions of all BEGIN patterns in order. Each input file is then processed in turn by reading data up to a record separator, which typically means one line at a time. Every pattern is tested against each record, and when a pattern matches, its action is executed. After all input has been read, the actions of the END patterns run.
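A sketch of the execution order on a two-line input. The messages are invented for the example.

```shell
order=$(printf 'x\ny\n' | awk '
BEGIN { print "start" }                     # runs once, before any input
      { print "line: " $0 }                 # runs for every record
END   { print "end after " NR " records" }  # runs once, after all input
')
echo "$order"
```

This prints start, then one line per record, then end after 2 records.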

Invoking options:

-F FS
set FS
-v VAR=VAL
set variable before the program begins
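As a rough illustration of both options together, the following assumes colon-separated input in the style of /etc/passwd. The data and the variable name min are hypothetical.

```shell
# -F: splits fields on colons; -v min=1000 sets the awk variable min
# before the program starts.
users=$(printf 'root:x:0\nuser:x:1000\n' | awk -F: -v min=1000 '$3 >= min { print $1 }')
echo "$users"
```

Only the line whose third field is at least 1000 is selected, so this prints user.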

awk first reads a record from the input, using RS as the separator. It then uses FS to split each record into fields.

awk is line-based by default: each line is split into fields $1, $2, … by the separator FS, and $0 refers to the entire line. Function parameters are local; all other variables are global.
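Records do not have to be lines. A minimal sketch with a semicolon as record separator and a comma as field separator (the input is made up, and deliberately has no trailing newline so the last record stays clean):

```shell
fields=$(printf 'a,b;c,d' | awk 'BEGIN { RS = ";"; FS = "," } { print NR ": " $2 }')
echo "$fields"
```

The input is read as two records, a,b and c,d, and the second field of each is printed with its record number.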

3 Patterns

Patterns are arbitrary Boolean combinations (with ! || &&) of regular expressions and relational expressions.

A pattern may consist of two patterns separated by a comma; this is called a range pattern. The action is performed for all lines from an occurrence of the first pattern through the next occurrence of the second, inclusive.
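A short sketch of a range pattern. The marker words start and stop are arbitrary choices for the example.

```shell
# Print everything from the line matching /start/ through the line
# matching /stop/, both included.
block=$(printf 'a\nstart\nb\nstop\nc\n' | awk '/start/,/stop/')
echo "$block"
```

The lines a and c fall outside the range and are not printed.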

Special patterns: BEGIN and END. They cannot be combined with other patterns.

We can also match part of the input with the regular expression comparison operators ~ and !~. EXP ~ /REGEXP/ is true if EXP matches the regular expression. This can be used in patterns as well as in conditions for if, while, for and do statements.
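For instance, the match operators can test a single field rather than the whole line. The names and ids below are invented for the sketch.

```shell
ids=$(printf 'alice 42\nbob x7\n' | awk '
$2 ~  /^[0-9]+$/ { print $1 " has a numeric id" }
$2 !~ /^[0-9]+$/ { print $1 " does not" }
')
echo "$ids"
```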

Case sensitivity is controlled by the IGNORECASE variable (a gawk extension). You can set it when invoking awk or inside the awk program. A portable way to ignore case is to call tolower or toupper on both sides before comparing.
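A sketch of the tolower approach, which should work in any POSIX awk. The log-like input is made up.

```shell
# Lowercase the line before matching, so Error, ERROR and error all match.
errors=$(printf 'Error: disk\nok\nERROR: net\n' | awk 'tolower($0) ~ /error/')
echo "$errors"
```

The original lines are printed unchanged; only the comparison uses the lowercased copy.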

4 Actions

An action is a sequence of statements. Statements are terminated by semicolons, newlines or right braces.

References to user-defined variables do not use $. E.g. myvar=$2 assigns the value of field 2 to myvar; afterwards, refer to it as plain myvar, not $myvar as in the shell. In awk, $myvar means the field whose number is the value of myvar.
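The difference between a variable and a field reference through that variable, as a minimal sketch (n is an arbitrary name):

```shell
vars=$(printf 'x y z\n' | awk '{ n = 3; print n; print $n }')
echo "$vars"
```

The first print shows the value of n, which is 3. The second shows field number 3, which is z.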

Expressions are very similar to those of C. The additions are:

  • $N: field reference
  • expr expr: string concatenation
  • string [!]~ pattern: ERE match
  • expr in array
  • (index) in array

4.1 Variables

  • Except for function parameters, all variables are global. An uninitialized variable has both a numeric value of 0 and a string value of the empty string.
  • Field variables can be referenced by $N or $expr, where expr is a numeric expression. They are assignable. A reference to a nonexistent field yields the uninitialized value. NF is the number of fields. Assigning to a nonexistent field increases NF.
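A sketch of what assigning past the last field does. OFS is set to a hyphen here only to make the empty intermediate fields visible.

```shell
grown=$(printf 'a b\n' | awk 'BEGIN { OFS = "-" } { $5 = "e"; print NF; print $0 }')
echo "$grown"
```

NF becomes 5, fields 3 and 4 are created empty, and $0 is rebuilt with OFS between the fields, giving a-b---e.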

Important Variables

NF
number of fields in the current record
NR
ordinal number of the current record, counted across all input files
FNR
ordinal number of the current record within the current file
FILENAME
the name of the current input file
FS (Field Separator)
regular expression used to separate fields; also settable by option -Ffs.
RS
input record separator (default newline)
OFS
output field separator (default blank)
ORS
output record separator (default newline)
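The difference between NR and FNR only shows up with more than one input file. This sketch assumes mktemp -d is available for a scratch directory; the file names one.txt and two.txt are hypothetical.

```shell
dir=$(mktemp -d)                  # scratch directory, removed below
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"

# NR keeps counting across files, FNR restarts at 1 for each file.
counts=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)
echo "$counts"

rm -r "$dir"
```

The third output line is two.txt 3 1: the third record overall, but the first of its file.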

4.2 Output

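print and printf write to standard output by default. Output can be redirected:

  • > expr : write to the file named expr, truncating it when first opened
  • >> expr : append to the file named expr
  • | expr : pipe to the command expr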
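A sketch that sends one field to a file and another through a command. It assumes mktemp -d and sort are available; the variable out and the file name names.txt are made up.

```shell
dir=$(mktemp -d)   # scratch directory, removed at the end

sorted=$(printf 'b 2\na 1\n' | awk -v out="$dir/names.txt" '{
    print $1 > out          # first field goes to a file
    print $2 | "sort -n"    # second field goes through a command
}')
names=$(cat "$dir/names.txt")

echo "$names"
echo "$sorted"
rm -r "$dir"
```

The file receives b and a in input order, while the piped values come back sorted as 1 and 2. Within one awk run, > truncates the file only on the first write; later writes to the same name append.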

4.3 Control Structure

  • if (condition) then-body [else else-body]
  • while (condition) body
  • do body while (condition)
  • for (init;cond;inc) body
  • switch (expr) {case val: body default: body} (gawk extension)
  • break
  • continue
  • next: stop current record immediately and go on to next
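Several of these statements combined in one small sketch. The input and the labels big and small are invented for the example.

```shell
bars=$(printf '# sizes\n3\n1\n' | awk '
/^#/ { next }                       # skip comment lines, run no further rules
{
    stars = ""
    for (i = 1; i <= $1; i++)       # build one star per unit
        stars = stars "*"
    if ($1 > 2)
        print stars, "big"
    else
        print stars, "small"
}')
echo "$bars"
```

Because of next, the comment line never reaches the second rule.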

4.4 String Functions

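sub(ere, repl[, in])
substitute repl for the first match of ere in in (or $0). Returns the number of substitutions made. & in repl stands for the matched text.
gsub(ere, repl[, in])
like sub, but replaces all matches
index(s, t)
position of the first occurrence of t in s, or 0 if t does not occur
length([s])
length of s (or of $0)
match(s, ere)
position of the first match of ere in s, or 0 if there is none; also sets RSTART and RLENGTH
split(s, a[, fs])
split s into array a using separator fs (or FS); returns the number of elements
sprintf(fmt, expr, expr, …)
format like printf, but return the resulting string instead of printing it
substr(s, m[, n])
substring of s starting at position m (1-based), at most n characters long
tolower(s)
s converted to lowercase
toupper(s)
s converted to uppercase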
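A quick tour of these functions in a BEGIN block, so no input is needed. The strings are arbitrary and the comments show the value each call should produce.

```shell
strings=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "0", s)            # s becomes "hell0 w0rld", n is 2
    print n, s
    print index(s, "w"), length(s)   # 7 11
    print substr(s, 1, 5)            # hell0
    print toupper(substr(s, 7))      # W0RLD
    if (match(s, /w[0-9a-z]+/))
        print RSTART, RLENGTH        # 7 5
    k = split("a:b:c", parts, ":")
    print k, parts[2]                # 3 b
    print sprintf("%03d", 7)         # 007
}')
echo "$strings"
```

String positions are 1-based throughout, which is why index and match can use 0 to mean "not found".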

5 User-defined Functions

function name([param, ...]) {statements}
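A minimal sketch of a user-defined function. The name sum_fields is hypothetical. Since only parameters are local, a common idiom is to declare extra parameters that callers never pass and use them as local variables.

```shell
sums=$(printf '3 9\n7 2\n' | awk '
# i and total are extra parameters used as locals; callers pass nothing.
function sum_fields(    i, total) {
    for (i = 1; i <= NF; i++)
        total += $i
    return total
}
{ print sum_fields() }
')
echo "$sums"
```

Each call starts with total uninitialized, so the sums are independent: 12 for the first line and 9 for the second.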

6 Examples

awk 'NR==10 {print}' input.txt # output the 10th line, or nothing if there are fewer than 10 lines