Introduction

awk is a powerful text-analysis tool. Compared with grep, which searches, and sed, which edits, awk is especially strong at analyzing data and generating reports. Simply put, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then processes the resulting fields.

awk comes in three versions: awk, nawk, and gawk. Unless stated otherwise, awk here refers to gawk, the GNU version of AWK.

The name awk comes from the first letters of the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, and generate reports, among many other things.

 

Usage

awk 'pattern {action}' {filenames}

Although the operations can be complex, the syntax always has this form: pattern is what awk looks for in the data, and action is a series of commands executed when a match is found. Curly braces ({}) do not have to appear in every program, but they are used to group a series of instructions under a particular pattern. The pattern is typically a regular expression enclosed in slashes.

The basic function of the awk language is to browse files or strings and extract information from them according to specified rules; once the information is extracted, other text operations can be performed on it. Complete awk scripts are usually used to format the information in text files.

Usually, awk processes a file one line at a time: it receives each line of the file and then executes the corresponding commands to process the text.
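As a minimal sketch of the pattern-and-action form described above, the following runs awk on a few lines of invented inline data (the fruit names and numbers are purely illustrative, not from any real file):

```shell
# Sample input is made up for illustration; each line is "name quantity".
data='apple 3
banana 5
cherry 7'

# The pattern /an/ selects lines containing "an"; the action prints field 2.
out=$(printf '%s\n' "$data" | awk '/an/ {print $2}')
echo "$out"
```

Only the banana line matches the pattern, so the action runs once, on that line.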

 

Calling awk

There are three ways to call awk:

 

1. Command-line mode
awk [-F  field-separator]  'commands'  input-file(s)
Here, commands is the actual awk program, and [-F field-separator] is optional. input-file(s) are the files to be processed.
In awk, each item on a line that is delimited by the field separator is called a field. Unless a separator is named with -F, the default field separator is whitespace.
2. Shell script mode
Put all the awk commands into a file and make that file executable, with the awk command interpreter named on the first line of the script; the program can then be invoked by typing the script's name.
This is equivalent to the first line of a shell script: #!/bin/sh
which can be replaced by: #!/bin/awk -f (the -f is needed so that awk reads the script file as its program; the path to awk may differ on your system)
3. Put all the awk commands into a separate file and then call:
awk -f awk-script-file input-file(s)
Here, the -f option loads the awk script in awk-script-file, and input-file(s) is the same as above.
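The following sketch compares the first and third calling methods on the same input. The temporary files and their contents are assumptions made for illustration; mktemp is used so that no real files are touched.

```shell
# Hypothetical script and input files, created only for this sketch.
script=$(mktemp)
input=$(mktemp)
printf 'a:1\nb:2\n' > "$input"

# Method 1: the program is given inline on the command line.
inline=$(awk -F ':' '{print $1}' "$input")

# Method 3: the same program is stored in a file and loaded with -f.
printf '%s\n' 'BEGIN { FS = ":" }' '{ print $1 }' > "$script"
fromfile=$(awk -f "$script" "$input")

echo "$inline"
echo "$fromfile"
rm -f "$script" "$input"
```

Both calls print the first field of every line, so the two results should be identical. Method 2 would use the same script file with an interpreter line at the top and the executable bit set.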

 

This chapter focuses on the command-line mode.

 

Introductory examples

Suppose the output of last -n 5 is as follows:

# last -n 5    <== show only the first five lines
root     pts/1   192.168.1.100  Tue Feb 10 11:21   still logged in
root     pts/1   192.168.1.100  Tue Feb 10 00:46 - 02:28  (01:41)
root     pts/1   192.168.1.100  Mon Feb  9 11:41 - 18:30  (06:48)
dmtsai   pts/1   192.168.1.100  Mon Feb  9 11:41 - 11:41  (00:00)
root     tty1                   Fri Sep  5 14:09 - 14:10  (00:01)

To display only the accounts of the five most recent logins:

#last -n 5 | awk  '{print $1}'
root
root
root
dmtsai
root

The awk workflow is as follows: it reads in one record, delimited by the newline character '\n', then splits the record into fields according to the specified field separator and fills in the field variables: $0 represents the whole record, $1 the first field, and $n the nth field. The default field separator is whitespace (spaces or tabs), so $1 is the login user, $3 is the login user's IP address, and so on.
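To make the field numbering concrete, here is a small sketch that feeds awk a single record in the same layout as the last output above. The labels user= and ip= are added only for illustration.

```shell
# One record copied from the sample "last" output; runs of spaces count as one separator.
line='root     pts/1   192.168.1.100  Tue Feb 10 11:21   still logged in'

# $1 is the first field (the user) and $3 the third (the address).
out=$(printf '%s\n' "$line" | awk '{print "user=" $1 ", ip=" $3}')
echo "$out"
```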

 

To display only the accounts in /etc/passwd:

#cat /etc/passwd |awk  -F ':'  '{print $1}'  
root
daemon
bin
sys

This is an example of awk with only an action: the action {print $1} is executed for every line.

-F specifies ':' as the field separator.

 

To display only the accounts in /etc/passwd and their corresponding shells, separated by a tab:

#cat /etc/passwd |awk  -F ':'  '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh

 

To display only the accounts in /etc/passwd and their corresponding shells, separated by a comma, with the header line "name,shell" added before all rows and "blue,/bin/nosh" added as the last line:

 

cat /etc/passwd |awk  -F ':'  'BEGIN {print "name,shell"}  {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh

 

The awk workflow is as follows: first the BEGIN block is executed; then the file is read one record at a time, with records delimited by the newline character \n. Each record is split into fields according to the specified field separator, and the field variables are filled in: $0 represents the whole record, $1 the first field, and $n the nth field. The action corresponding to the matching pattern is then executed. After that the second record is read, and so on until all records have been read; finally, the END block is executed.
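The order of execution can be seen in a short sketch with two invented input records; the words "start", "record" and "end" are only markers chosen for this illustration.

```shell
# Two made-up records with ':' as the field separator.
out=$(printf 'a:x\nb:y\n' | awk -F ':' '
    BEGIN { print "start" }           # runs once, before any input is read
          { print "record: " $1 }     # runs once for every input record
    END   { print "end" }             # runs once, after the last record
')
echo "$out"
```

The output shows the BEGIN line first, then one line per record, then the END line.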

 

Search /etc/passwd for all lines containing the keyword root:

#awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash

This is an example of using a pattern: the action is executed only for lines that match the pattern (here, root). No action is specified, so the default is to print each matching line.

The search supports regular expressions, for example to find lines that begin with root: awk -F: '/^root/' /etc/passwd

 

Search /etc/passwd for all lines containing the keyword root and display the corresponding shell:

# awk -F: '/root/{print $7}' /etc/passwd             
/bin/bash

Here the action {print $7} is specified.
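Note that /root/ matches the keyword anywhere in the line, not only in the account name. As a hedged sketch going slightly beyond the examples above, a pattern can also be a comparison on a single field. The three passwd-style lines below are invented for illustration.

```shell
# Made-up passwd-style data; the second line contains "root" only in its home directory.
data='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

# /root/ matches any line that contains "root" somewhere.
anywhere=$(printf '%s\n' "$data" | awk -F ':' '/root/ {print $1}')

# $1 == "root" matches only when the first field is exactly "root".
exact=$(printf '%s\n' "$data" | awk -F ':' '$1 == "root" {print $7}')

echo "$anywhere"
echo "$exact"
```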

 

awk built-in variables

awk has many built-in variables that hold environment information, and these variables can be changed. Here are some of the most commonly used ones.

 

ARGC                Number of command-line arguments
ARGV                Array of command-line arguments
ENVIRON             Associative array of the system environment variables
FILENAME            Name of the file awk is currently reading
FNR                 Number of records read from the current file
FS                  Input field separator, equivalent to the -F command-line option
NF                  Number of fields in the current record
NR                  Number of records read so far
OFS                 Output field separator
ORS                 Output record separator
RS                  Input record separator

 

In addition, $0 is the entire record, $1 is the first field of the current line, $2 is the second field of the current line, and so on.

 

For /etc/passwd, print the file name, the line number of each line, the number of columns in each line, and the complete content of the line:
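Because these variables can be assigned as well as read, a short sketch may help. It sets FS and OFS in a BEGIN block and prints NR and NF for two invented records.

```shell
# Made-up input: the first record has three fields, the second has two.
out=$(printf 'a:b:c\nd:e\n' | awk '
    BEGIN { FS = ":"; OFS = "-" }     # input separator ":", output separator "-"
    { print NR, NF, $1 }              # record number, field count, first field
')
echo "$out"
```

Setting FS in BEGIN has the same effect as passing -F ':' on the command line, and the commas in print are replaced by the value of OFS.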

#awk  -F ':'  '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

 

Using printf instead of print can make the code more concise and easier to read:

 awk  -F ':'  '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd

 

print and printf

awk provides two ways to print output: print and printf.

The arguments to print can be variables, numbers, or strings. Strings must be enclosed in double quotation marks, and arguments are separated by commas. Without the commas, the arguments are concatenated and cannot be told apart. The comma acts as the output field separator, which is a space by default.

printf is used in much the same way as printf in C and can format strings. When the output is complex, printf is more convenient and makes the code easier to understand.
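The difference between the comma, plain concatenation, and printf can be sketched on one invented input line:

```shell
# A single made-up record with two whitespace-separated fields.
out=$(echo 'root /bin/bash' | awk '{
    print $1, $2                    # comma: fields joined by OFS, a space by default
    print $1 $2                     # no comma: the two fields are concatenated
    printf("%-8s|%s\n", $1, $2)     # C-style format: $1 left-aligned in 8 columns
}')
echo "$out"
```

Unlike print, printf does not append a newline on its own, which is why the format string ends with \n.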

 

awk programming

Variables and assignment

In addition to awk's built-in variables, you can also define your own variables.

The following counts the number of accounts in /etc/passwd:

awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is  40

count is a user-defined variable. The earlier action blocks {} each contained only a single print, but print is just one statement; an action {} can contain multiple statements, separated by semicolons (;).

 

count is not initialized here. Although it defaults to 0, it is better to initialize it to 0 explicitly:

awk 'BEGIN {count=0;print "[start]user count is ", count} {count=count+1;print $0;} END{print "[end]user count is ", count}' /etc/passwd
[start]user count is  0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is  40

 

Count the number of bytes occupied by the files in a directory:

ls -l |awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size}'
[end]size is  8657198

 

To display the result in megabytes (M):

ls -l |awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}' 
[end]size is  8.25889 M

Note that the total does not include the contents of subdirectories.

 

Conditional statements

The conditional statements in awk are borrowed from C. See the forms below:

 

if (expression) {
    statement;
    statement;
    ... ...
}
if (expression) {
    statement;
} else {
    statement2;
}
if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}

 

 

Count the number of bytes occupied by the files in a directory, filtering out entries whose size is 4096 (usually directories):

ls -l |awk 'BEGIN {size=0;print "[start]size is ", size} {if($5!=4096){size=size+$5;}} END{print "[end]size is ", size/1024/1024,"M"}' 
[start]size is  0
[end]size is  8.22339 M
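The if / else if / else form can be sketched on a few invented byte counts. The thresholds and the labels small, medium and large are arbitrary choices for this illustration.

```shell
# Made-up sizes in bytes, one per line.
out=$(printf '10\n5000\n2000000\n' | awk '{
    if ($1 < 1024) {
        label = "small"         # under 1 KiB
    } else if ($1 < 1048576) {
        label = "medium"        # under 1 MiB
    } else {
        label = "large"
    }
    print $1, label
}')
echo "$out"
```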

 

Loop statements

Loop statements in awk are also borrowed from C: while, do/while, for, break, and continue are supported, and the semantics of these keywords are the same as in C.
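As a small sketch of these keywords, the following sums the fields of one invented record with a for loop, skips one value with continue, and then runs a plain while loop. The variable names sum and n are chosen only for illustration.

```shell
# One made-up record with three numeric fields.
out=$(echo '3 4 5' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++) {     # visit every field of the record
        if ($i == 4) continue       # skip the value 4, just to show continue
        sum += $i
    }
    n = 0
    while (n < 2) n++               # a plain while loop counting to 2
    print sum, n
}')
echo "$out"
```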

 

Arrays

Because array subscripts in awk can be numbers or strings, a subscript is often called a key. Values and keys are stored internally in a hash table of key/value pairs. Because a hash table is not stored sequentially, when you display the contents of an array you will find that the elements do not appear in the order you might expect. Like variables, arrays are created automatically when they are first used, and awk also determines automatically whether an element holds a number or a string. Generally speaking, arrays in awk are used to collect information from records: computing sums, counting words, tracking how many times a pattern is matched, and so on.

 

Display the accounts in /etc/passwd:

 

awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
......

 

Here a for loop is used to traverse the array.
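The example above uses numeric subscripts. As a sketch of string keys, the following counts how many accounts use each shell in three invented passwd-style lines. Since the for (key in array) form visits keys in no guaranteed order, the output is piped through sort to make it stable.

```shell
# Made-up passwd-style data; field 7 is the login shell.
data='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh'

out=$(printf '%s\n' "$data" | awk -F ':' '
    { shells[$7]++ }                               # the shell path is the array key
    END { for (s in shells) print s, shells[s] }   # traversal order is unspecified
' | LC_ALL=C sort)
echo "$out"
```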

 

There is much more to awk programming; only simple and common usages are listed here. For more, please refer to http://www.gnu.org/software/gawk/manual/gawk.html