Linux tips: a brief introduction to the awk command

 

Among Linux commands, awk is often used to process text. The examples below introduce common uses of the awk command.

GNU gawk

awk is both a command and a programming language, and it has several different implementations.

On the Linux system used in this article, awk is implemented by GNU gawk (other distributions may install a different implementation, such as mawk, by default).

So when you run the awk command in the shell, the program that actually runs is gawk, as shown below:

$ ls -l /usr/bin/awk
lrwxrwxrwx 1 root root 21 Feb  1  2019 /usr/bin/awk -> /etc/alternatives/awk
$ ls -l /etc/alternatives/awk
lrwxrwxrwx 1 root root 13 Mar  8  2019 /etc/alternatives/awk -> /usr/bin/gawk
$ ls -l /usr/bin/gawk
-rwxr-xr-x 1 root root 441512 Jul  3  2013 /usr/bin/gawk

As you can see, /usr/bin/awk eventually links to /usr/bin/gawk, and /usr/bin/gawk is a regular file that does not link to anything else.
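To check the same thing on your own system without following each link by hand, a small sketch like the one below may help. It assumes readlink supports -f (GNU coreutils and BusyBox do). The --version check is only a hint about the implementation, since not every awk accepts that option.

```shell
# Find the awk the shell will run and resolve every symbolic link in its path.
# Assumption: readlink supports -f (GNU coreutils, BusyBox).
awk_path=$(command -v awk)
real_awk=$(readlink -f "$awk_path")
echo "awk on PATH: $awk_path"
echo "resolves to: $real_awk"

# gawk accepts --version; some other implementations do not, so check first.
# stdin comes from /dev/null so the check can never wait for keyboard input.
if awk --version </dev/null >/dev/null 2>&1; then
  awk --version </dev/null | head -n 1
else
  echo "this awk does not accept --version"
fi
```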

In the rest of this article, unless stated otherwise, awk refers to GNU gawk.

awk command format

The explanation in man awk (which here shows the same content as man gawk) is rather hard to follow and not very clear. The GNU gawk online manual, https://www.gnu.org/software/gawk/manual/gawk.html, explains it better.

The following is quoted from that manual. Its basic introduction to awk is:

The basic function of awk is to search files for lines (or other units of text) that contain certain patterns.

When a line matches one of the patterns, awk performs specified actions on that line.

awk continues to process input lines in this way until it reaches the end of the input files.

The basic usage of the awk command is as follows:

There are several ways to run an awk program.

If the program is short, it is easiest to include it in the command that runs awk, like this:

awk 'program' input-file1 input-file2 …

where program consists of a series of patterns and actions. An awk program looks like this:

pattern { action }

 

When the program is long, it is usually more convenient to put it in a file and run it with a command like this:

awk -f program-file input-file1 input-file2 …

 

There are single quotes around program so the shell won’t interpret any awk characters as special shell characters.

The quotes also cause the shell to treat all of program as a single argument for awk, and allow program to be more than one line long.

 

You can also run awk without any input files. If you type the following command line:

awk 'program'

awk applies the program to the standard input, which usually means whatever you type on the keyboard.

In other words, awk searches the given files for lines that contain specific patterns and processes the matching lines in a specific way. That processing is specified by the program argument.

As mentioned above, the program argument should be enclosed in single quotes so that the shell does not expand special characters in it.

If no file name is given, awk reads standard input by default.

If no pattern is given, awk processes every line.
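The three ways of running awk described above can be compared with a short sketch. It recreates the testawk file used in the examples below inside a temporary directory; the directory and the program file name third.awk are assumptions made only for this illustration. All three commands should print the same two words.

```shell
# Recreate the sample file in a temporary directory (same two lines as testawk).
dir=$(mktemp -d)
printf '%s\n' 'This is a test string.' 'This is another TEST string.' > "$dir/testawk"

# 1. Short program given on the command line, in single quotes.
inline=$(awk '{print $3}' "$dir/testawk")

# 2. The same program stored in a file and run with -f.
#    third.awk is a hypothetical file name chosen for this example.
printf '%s\n' '{print $3}' > "$dir/third.awk"
from_file=$(awk -f "$dir/third.awk" "$dir/testawk")

# 3. No input file argument: awk reads standard input instead.
from_stdin=$(awk '{print $3}' < "$dir/testawk")

printf '%s\n' "$inline" "$from_file" "$from_stdin"
rm -r "$dir"
```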

Note: every argument after awk's program argument is treated as a file name. Even if you enclose the argument in quotes, it is still a file name, not a string. (The exceptions are arguments of the form var=value, which awk treats as variable assignments, and a lone -, which stands for standard input.)

This means awk cannot process a string value given as a command-line argument. For example:

$ cat testawk
This is a test string.
This is another TEST string.
$ awk '{print $3}' testawk
a
another
$ awk '{print $3}' "testawk"
a
another
$ awk '{print $3}' "This is a test string."
awk: fatal: cannot open file `This is a test string.' for reading (No such file or directory)

As you can see, awk '{print $3}' testawk prints the third field (column) of each line in the testawk file.

awk '{print $3}' "testawk" also prints the third field of the testawk file. Enclosing testawk in double quotes does not mean "print the third field of the string testawk": the shell removes the quotes before awk runs, so awk receives exactly the same argument as before.

awk '{print $3}' "This is a test string." fails with an error saying that it cannot open a file named This is a test string. awk does not process the contents of the string "This is a test string." itself; it treats the string as a file name and tries to read that file.

If you really need awk to process a string, you can use the pipe operator | to connect it to awk's standard input.

A typical approach is to print the string with echo and pass that output through the pipe to awk's standard input.

Here is an example:

$ echo "This is a test string." | awk '{print $4}'
test
$ value="This is a new test string."
$ echo "$value" | awk '{print $4}'
new

As you can see, in echo "This is a test string." | awk '{print $4}', echo first outputs the string, and the pipe operator | connects that output to awk's standard input, so awk processes the string without any error.

echo "$value" | awk '{print $4}' prints the fourth field of the value variable's content. This is how you can process the value of a variable.
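Besides a pipe, a shell value can also be handed to awk as a variable with the -v option, which assigns it before the program starts. The sketch below assumes the string only needs splitting on blanks. Note that -v processes backslash escape sequences in the value, so a value containing \ may not arrive unchanged. The BEGIN rule runs before any input is read, so no file or standard input is needed.

```shell
value="This is a new test string."

# -v copies the shell value into the awk variable s before the program starts.
# Only a BEGIN rule is given, so awk reads no file and no standard input.
# split() breaks s into the array w on runs of blanks.
fourth=$(awk -v s="$value" 'BEGIN { split(s, w, " "); print w[4] }')
echo "$fourth"
```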

Note: the pipe operator | feeds the string into awk's standard input, so awk can process it. The input redirection operator <, however, does not let awk process a string.

Redirection works on files, so the given string is treated as a file name. For example:

$ awk '{print $4}' < "This is a test string."
-bash: This is a test string.: No such file or directory

As you can see, the string "This is a test string." to the right of the input redirection operator < is treated as a file name, and bash reports that the file does not exist.

This error does not come from awk; bash reports it while setting up the redirection, before awk even runs.
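If you do want to use a redirection, a here-document (<<) is one whose content is the text itself rather than a file name, so awk reads the string from standard input. A minimal sketch (in bash, the here-string awk '{print $4}' <<< "This is a test string." does the same, but it is not available in POSIX sh):

```shell
# The here-document body is the input text itself; awk reads it from standard input.
heredoc_word=$(awk '{print $4}' <<EOF
This is a test string.
EOF
)
echo "$heredoc_word"
```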

awk program

The key to using awk is knowing how to write the program argument.

The GNU gawk online manual describes it as follows:

  • Programs in awk consist of pattern–action pairs.
  • An action without a pattern always runs.
  • An awk program generally looks like this: [pattern] { action }
  • Patterns in awk control the execution of rules -- a rule is executed when its pattern matches the current input record.
  • The purpose of the action is to tell awk what to do once a match for the pattern is found.
  • An action consists of one or more awk statements, enclosed in braces (‘{…}’).

In other words, awk's program argument is made up of patterns and actions. A pattern specifies what to match: the action after it runs on matching lines, and lines that do not match are skipped. An action specifies what to do with a matching line, and its statements must be enclosed in braces {}. If no pattern is given, every line is processed.
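To see how several pattern-action pairs work together, the sketch below puts two rules in one program, using a temporary copy of the testawk file (an assumption for illustration). The first rule has a pattern; the second has none, so it runs on every line. NR is awk's built-in count of lines read so far.

```shell
dir=$(mktemp -d)
printf '%s\n' 'This is a test string.' 'This is another TEST string.' > "$dir/testawk"

# Two rules in one program; every line is checked against both, in order.
# Rule 1 has a pattern, so its action runs only on lines containing "another".
# Rule 2 has no pattern, so its action runs on every line.
out=$(awk '
  /another/ { print "matched: " $0 }
            { print "line " NR }
' "$dir/testawk")

printf '%s\n' "$out"
rm -r "$dir"
```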

Some of the pattern forms are described below:

  • /regular expression/
    A regular expression. It matches when the text of the input record fits the regular expression.
  • expression
    A single expression. It matches when its value is nonzero (if a number) or non-null (if a string).

For example:

$ awk '/a.*/ {print $0}' testawk
This is a test string.
This is another TEST string.
$ awk '/test/ {print $0}' testawk
This is a test string.
$ awk 'test {print $0}' testawk
$ awk '"NONE" {print $0}' testawk
This is a test string.
This is another TEST string.
$ awk '$3 == "another" {print $0}' testawk
This is another TEST string.

As you can see, awk '/a.*/ {print $0}' testawk uses the regular expression a.* to match lines that contain the character 'a', and prints each whole line. (The .* adds nothing here; /a/ matches the same lines.)

awk '/test/ {print $0}' testawk prints the lines that contain the string "test". The second line is not printed, because it contains "TEST" in upper case and regular expression matching is case-sensitive.

awk 'test {print $0}' testawk prints nothing. Without slashes, test is not a regular expression but the name of an awk variable. The variable has never been assigned, so its value is the empty string (numerically zero), which counts as false, and the pattern never matches.

awk '"NONE" {print $0}' testawk prints every line of testawk, even though the file does not contain the string "NONE". According to the description above, a pattern that is a non-empty string constant in double quotes always matches, whatever the content of the string is.
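The difference between a variable and a string constant used as a pattern can be checked directly. In this sketch (again over a temporary copy of testawk), giving the variable test a nonzero value with -v makes the same pattern match every line, while the empty string constant "" never matches.

```shell
dir=$(mktemp -d)
printf '%s\n' 'This is a test string.' 'This is another TEST string.' > "$dir/testawk"

# test is an unassigned variable: empty string / zero, so the pattern is false.
unset_var=$(awk 'test {print $0}' "$dir/testawk")
# The same pattern with test set to 1 (nonzero) is true on every line.
set_var=$(awk -v test=1 'test {print $0}' "$dir/testawk")
# An empty string constant is null, so it never matches.
empty_str=$(awk '"" {print $0}' "$dir/testawk")

printf '[%s]\n[%s]\n[%s]\n' "$unset_var" "$set_var" "$empty_str"
rm -r "$dir"
```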

awk '$3 == "another" {print $0}' testawk matches the lines whose third field is the string "another", and prints each whole line.

In other words, to match a string, writing the pattern as /regular expression/ is simpler. Writing it as an expression requires knowing how awk expressions are written.
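A few expression patterns, as a sketch over a temporary copy of testawk: $0 ~ /test/ is the expression form of /test/, tolower() works around case-sensitive matching, and NR selects a line by number.

```shell
dir=$(mktemp -d)
printf '%s\n' 'This is a test string.' 'This is another TEST string.' > "$dir/testawk"

# Expression form of /test/: the ~ operator matches a value against a regex.
with_tilde=$(awk '$0 ~ /test/' "$dir/testawk")
# Regex matching is case-sensitive, so /test/ misses "TEST";
# lower-casing the line first makes both lines match.
any_case=$(awk 'tolower($0) ~ /test/' "$dir/testawk")
# NR is the current line number, so this pattern selects only line 2.
line2_word4=$(awk 'NR == 2 {print $4}' "$dir/testawk")

printf '%s\n' "$with_tilde" "$any_case" "$line2_word4"
rm -r "$dir"
```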

Getting the fields of a line

When awk reads a line, it splits the line into fields (words) using the field separator. $number gets the number-th field, counting from 1: $1 is the first field, $2 the second, $3 the third, and so on. The built-in variable NF holds the number of fields, so $NF gets the last field.

In particular, $0 gets the entire line, including any whitespace at the beginning or end of the line.

Taking "This is a test string." as an example, the fields correspond as follows:

Figure: how awk gets the fields of a line

$0    This is a test string.
$1    This
$2    is
$3    a
$4    test
$5    string.
$NF   string.   (NF is 5 here, so $NF is the same as $5)
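The table can be checked with a short sketch. The leading and trailing spaces in the hypothetical line variable are added on purpose, to show that they do not change the fields but are kept in $0. $(NF-1), the second-to-last field, uses the same idea as $NF.

```shell
# Hypothetical input: the example sentence with extra spaces at both ends.
line='  This is a test string.  '

nf=$(printf '%s\n' "$line" | awk '{print NF}')
first=$(printf '%s\n' "$line" | awk '{print $1}')
third=$(printf '%s\n' "$line" | awk '{print $3}')
last=$(printf '%s\n' "$line" | awk '{print $NF}')
second_last=$(printf '%s\n' "$line" | awk '{print $(NF-1)}')
# Brackets make the whitespace kept in $0 visible.
whole=$(printf '%s\n' "$line" | awk '{print "[" $0 "]"}')

printf '%s\n' "$nf" "$first" "$third" "$last" "$second_last" "$whole"
```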

Using the -F option to set the field separator

By default, awk splits a line into fields on whitespace: runs of spaces and tabs separate fields, and leading and trailing whitespace is ignored. To split on another character, use the -F option to specify the field separator.

The GNU gawk online manual describes the -F option as follows:

-F fs

--field-separator fs

Set the FS variable to fs.

For a directory path such as "clang/utils/analyzer/", if you want to split it on / to get each directory name, use the -F option to set the field separator to /.

Here is an example:

$ echo "clang/utils/analyzer/" | awk -F '/' '{print $1, $2}'
clang utils
$ echo "clang/utils/analyzer/" | awk -F '/' '{print "Last word is: " $NF}'
Last word is:

As you can see, after -F '/' sets the separator, the input is split on /, and the resulting fields do not include the '/' character.

Because the input ends with '/', the last field after splitting is empty, so $NF is empty.

awk is very handy when you need to split lines on a specific character: set the separator with -F, then use $number to get the number-th field.
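Two related details, shown on the same path: -F '/' has the same effect as setting the FS variable with -v FS='/', and because of the trailing '/', NF counts an empty last field, so the last directory name is $(NF-1) rather than $NF. A sketch:

```shell
path='clang/utils/analyzer/'

# -F '/' and -v FS='/' both set the field separator before input is read.
with_F=$(echo "$path" | awk -F '/' '{print $2}')
with_FS=$(echo "$path" | awk -v FS='/' '{print $2}')
# The trailing '/' produces an empty 4th field, so NF is 4
# and the last directory name is $(NF-1).
count=$(echo "$path" | awk -F '/' '{print NF}')
last_dir=$(echo "$path" | awk -F '/' '{print $(NF-1)}')

printf '%s\n' "$with_F" "$with_FS" "$count" "$last_dir"
```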

The print statement

All the previous examples use the print statement to print content.

The GNU gawk online manual explains the print statement as follows:

Use the print statement to produce output with simple, standardized formatting.

You specify only the strings or numbers to print, in a list separated by commas.

They are output, separated by single spaces, followed by a newline.

The statement looks like this:

print item1, item2, …

 

The entire list of items may be optionally enclosed in parentheses.

The simple statement ‘print’ with no items is equivalent to ‘print $0’: it prints the entire current record.

In other words, the print statement prints the given strings or numbers. In the statement, items are separated by commas ','; in the output, they are separated by a single space (more precisely, by the output field separator OFS, which is a space by default).

In testing, separating items with anything other than a comma does not give this effect: the printed items run together. For example:

$ awk '/test/ {print $3, $5}' testawk
a string.
$ awk '/test/ {print $3 $5}' testawk
astring.
$ awk '/test/ {print $3_$5}' testawk
astring.

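As you can see, with $3, $5 after print, the two strings are printed separated by a space. With $3 $5 or $3_$5, the two strings are printed joined together, without the space or the underscore '_'. That is because writing items next to each other is string concatenation in awk, and in $3_$5 the _ is read as the name of an unassigned (empty) variable, not as a literal character. So the comma is the only way to get print's separator between items; to put a different literal character between them, write it as a quoted string, for example print $3 "_" $5.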
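Both ways of controlling what appears between items are shown in the sketch below, over a temporary copy of testawk: a quoted literal between items, and the OFS variable (set here with -v), which print puts between comma-separated items.

```shell
dir=$(mktemp -d)
printf '%s\n' 'This is a test string.' 'This is another TEST string.' > "$dir/testawk"

# A literal separator must be a quoted string; items side by side are concatenated.
quoted=$(awk '/test/ {print $3 "_" $5}' "$dir/testawk")
# With commas, print puts OFS between items; here OFS is set to "-" with -v.
with_ofs=$(awk -v OFS='-' '/test/ {print $3, $5}' "$dir/testawk")

printf '%s\n' "$quoted" "$with_ofs"
rm -r "$dir"
```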

If print is given no items, it is equivalent to print $0 and prints the entire line. If the action is omitted entirely, braces included, it defaults to { print $0 }. If only empty braces {} are given, the action is empty and does nothing.

For example:

$ awk '/test/ {print}' testawk
This is a test string.
$ awk '/test/' testawk
This is a test string.
$ awk '/test/ {}' testawk

As you can see, awk '/test/ {print}' testawk gives print no items, so it prints the whole line.

awk '/test/' testawk provides no action, and also prints the whole line.

awk '/test/ {}' testawk provides an action, but the action is empty, so nothing is printed.
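The default-action rules can be summarized in one sketch over a temporary copy of testawk. The last command uses the expression pattern 1, which is nonzero and so matches every line; with the default action, awk 1 prints every line, much like cat.

```shell
dir=$(mktemp -d)
printf '%s\n' 'This is a test string.' 'This is another TEST string.' > "$dir/testawk"

# No action at all: the default action prints the whole matching line.
pattern_only=$(awk '/test/' "$dir/testawk")
# print with no items is the same as print $0.
bare_print=$(awk '/test/ {print}' "$dir/testawk")
explicit=$(awk '/test/ {print $0}' "$dir/testawk")
# Empty braces: an empty action, so nothing is printed.
empty_action=$(awk '/test/ {}' "$dir/testawk")
# Pattern 1 is always true; with the default action it prints every line.
every_line=$(awk 1 "$dir/testawk")

printf '%s\n' "$pattern_only" "$bare_print" "$explicit" "[$empty_action]" "$every_line"
rm -r "$dir"
```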