Performance tools: awk, grep, and sed, the three swordsmen of Linux text processing

Zee_ 7D 2021-06-23 16:01:48


Linux has many tools for text processing, for example: sort, cut, split, join, paste, comm, uniq, column, rev, tac, tr, nl, pr, head, tail, and so on. The lazy way to learn Linux text processing (not necessarily the best way) might be to learn just grep, sed, and awk.

With these three tools you can solve 99% of the text-processing problems on a Linux system, instead of memorizing all of the different commands and parameters above.

And if you have learned and used all three, you will know the difference between them. The difference, really, is which problems each tool is good at solving.

An even lazier way might be to learn a scripting language (Python, Perl, or Ruby) and use it for all text processing.


awk, grep, and sed are three sharp tools for text manipulation in Linux, and mastering them is one of the essential Linux skills.

All three process text, but each has a different focus. awk is the most powerful, but also the most complex. grep is best suited to simply searching or matching text, sed is best suited to editing the text it matches, and awk is best suited to formatting text and performing more complex transformations on it.

A brief summary :

  • grep: searching for and locating data

  • awk: slicing data into fields

  • sed: modifying data
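As a quick illustration (a hypothetical pipeline, not from the original tutorial), the three can be chained so that each does what it is best at: grep locates, awk slices, sed modifies:

```shell
# grep finds the matching lines, awk picks out the first field,
# sed rewrites the text that awk hands it.
printf 'boot 10\nmachine 20\nbook 30\n' \
  | grep 'boo' \
  | awk '{print $1}' \
  | sed -e 's/boo/BOO/'
# Output:
# BOOt
# BOOk
```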

grep = global regular expression print

In the simplest terms, grep (global regular expression print) finds lines in a file that match a given string. Starting with the first line of the file, grep copies a line into a buffer, compares it against the search string, and, if the comparison succeeds, prints the line to the screen. grep repeats this process until it has searched every line of the file.

  Note: at no point in this process does grep store lines, change lines, or search only part of the file.

Sample data file

Please cut and paste the following data into a file named "sampler.log":

  1. boot

  2. book

  3. booze

  4. machine

  5. boots

  6. bungie

  7. bark

  8. aardvark

  9. broken$tuff

  10. robots

A simple example

grep The simplest example is :

  1. grep "boo" sampler.log

In this case, grep will traverse each line of the file "sampler.log" and print every line that contains the word "boo":

  1. boot

  2. book

  3. booze

  4. boots

But if you are working on a large file, it is often more useful to know which line of the file each match came from: if you then need to open the file in an editor, it is easier to track down a specific string and make changes. You can get this with the -n parameter:

  1. grep -n "boo" sampler.log

This produces a more useful result, showing which lines matched the search string:

  1. 1:boot

  2. 2:book

  3. 3:booze

  4. 5:boots

Another interesting parameter is -v, which inverts the result. In other words, grep prints all the lines that do NOT match the search string, rather than the lines that do.

In the following case, grep will print every line that does not contain the string "boo", and display the line numbers as in the previous example:

  1. grep -vn "boo" sampler.log

  2. 4:machine

  3. 6:bungie

  4. 7:bark

  5. 8:aardvark

  6. 9:broken$tuff

  7. 10:robots

The -c option tells grep to suppress printing of the matching lines and instead show only a count of how many lines matched the query. For example, the command below prints the number 4, because "boo" appears in 4 lines of sampler.log:

  1. grep -c "boo" sampler.log

  2. 4

The -l option prints only the names of the files that contain a line matching the search string. This is very useful if you want to search multiple files for the same string, like this:

  1. grep -l "boo" *

For searching non-code files, a more useful option is -i, which ignores case. With this option, upper and lower case are treated as equal when matching the search string. In the following example, even though the search string is uppercase, the lines containing "boo" are still printed:

  1. grep -i "BOO" sampler.log

  2. boot

  3. book

  4. booze

  5. boots

The -x option matches only whole lines exactly. In other words, the following search returns nothing, because no line consists solely of "boo":

  1. grep -x "boo" sampler.log

Finally, -A lets you print extra lines of trailing context, so you get additional lines after the one containing the search string, for example:

  1. grep -A2 "mach" sampler.log

  2. machine

  3. boots

  4. bungie

Regular expressions

Regular expressions are a compact way to describe complex patterns in text .

You can use search patterns with grep. Other tools use regular expressions (regexps) in more sophisticated ways, and the ordinary strings grep has used so far are in fact very simple regular expressions. If you have ever used wildcards such as '*' or '?' on the command line, to list file names for example, the idea will be familiar, and you can search with basic regular expressions in grep.

For example, to search the file for lines ending with the letter e:

  1. grep "e$" sampler.log

  2. booze

  3. machine

  4. bungie

If you need the more extensive (extended) regular expression syntax, you must use grep -E.

For example, the regular expression operator ? matches the previous character 0 or 1 times:

  1. grep -E "boots?" sampler.log

  2. boot

  3. boots

You can also combine multiple searches with the alternation operator (|), which means "or", so you can do this:

  1. grep -E "boot|boots" sampler.log

  2. boot

  3. boots

Special characters

What if you want to search for a special character? Say you want to find all the lines containing the dollar character "$". You cannot simply run grep "$" sampler.log, because '$' is interpreted as the regular expression matching the end of a line, so you would get every line instead. The solution is to "escape" the symbol, so you would use:

  1. grep '\$' sampler.log

  2. broken$tuff

You can also use the "-F" option, which stands for "fixed strings" (or "fast"), because it searches only for literal strings, not regular expressions.
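A small sketch of -F with the sample data: the literal '$' needs no escaping, because -F disables regular expression interpretation entirely:

```shell
# With -F the pattern '$' is a literal dollar sign,
# not the end-of-line regexp anchor.
printf 'boot\nbroken$tuff\nrobots\n' | grep -F '$'
# Output:
# broken$tuff
```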

awk = Aho, Weinberger and Kernighan

awk is a pattern scanning and processing language, named after its creators Aho, Weinberger, and Kernighan.

AWK is quite complex, so this is not a complete guide, but it should give you a flavor of what awk can do. It is easy to use and highly recommended.

AWK basics

An awk program operates on each line of its input file. It can have an optional BEGIN{ } section of commands executed before anything in the file is processed, then the main { } section is run on every line of the file, and finally an optional END{ } section runs after the whole file has been read:

  1. BEGIN { …. initialization awk commands …}

  2. { …. awk commands for each line of the file…}

  3. END { …. finalization awk commands …}

For each line of the input file, awk checks whether there are any pattern-matching instructions; if so, it only runs the action on lines that match the pattern, otherwise it runs the action on all lines. These 'pattern-matching' commands can contain the same regular expressions used with grep.

awk commands can perform quite complex mathematical and string operations, and awk also supports associative arrays. AWK sees each line as made up of multiple fields, each separated by a "field separator". By default this is one or more whitespace characters, so the line:

  1. this is a line of text

contains 6 fields. Within awk, the first field is referred to as $1, the second as $2, and so on, while the whole line is referred to as $0.

The field separator is set by the awk internal variable FS, so if you set FS=":" then awk will split lines on ':', which is very useful for files like /etc/passwd. Other useful internal variables are NR, the current record number (i.e. the line number of the input), and NF, the number of fields in the current line.
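A minimal sketch of FS, NR and NF together, using a couple of /etc/passwd-style lines fed in with printf so the example is self-contained:

```shell
# FS=":" splits each record on colons; NR is the line number,
# NF the number of fields in the current line.
printf 'root:x:0:0\ndaemon:x:1:1\n' \
  | awk 'BEGIN {FS=":"} {print NR, $1, NF}'
# Output:
# 1 root 4
# 2 daemon 4
```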

AWK can operate on any file, including std-in, in which case it is often used with the '|' pipe, for example in combination with grep or other commands.

For example, if I list all the files in the current directory:

  1. ls -l

  2. Total usage 140

  3. -rw-r--r-- 1 root root 55121 1 month   3 17:03 combined_log_format.log

  4. -rw-r--r-- 1 root root 80644 1 month   3 17:03 combined_log_format_w_resp_time.log

  5. -rw-r--r-- 1 root root    71 1 month   3 17:55 sampler.log

I can see the file sizes reported in one column of the output (the fifth field). If I want to know the total size of the files in this directory, I can do:

  1. ls -l | awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}'

  2. 135836

Note that 'print sum' prints the value of the variable sum, so if sum is 2 then 'print sum' outputs '2', whereas 'print $sum' would print '1', because the second field ($2) of those lines contains the value '1'.

It is therefore very simple to write an awk command that computes the mean and standard deviation of a column of numbers: accumulate 'sumx' and 'sumx2' in the main section, then use the standard formula to compute the mean and standard deviation in the END section.
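A sketch of that idea (the variable names sumx/sumx2 follow the text; this uses the population form of the standard deviation, sqrt(sumx2/n - mean^2)):

```shell
# Accumulate the sum and the sum of squares on each line,
# then compute the mean and standard deviation in END.
printf '2\n4\n4\n4\n5\n5\n7\n9\n' | awk '
  {sumx += $1; sumx2 += $1*$1; n++}
  END {mean = sumx/n; print mean, sqrt(sumx2/n - mean*mean)}'
# Output:
# 5 2
```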

AWK supports loops ('for' and 'while') and branching (using 'if'). So if you want to trim a file and only operate on every 3rd line, you can do this:

  1. ls -l | awk '{for (i=1;i<3;i++) {getline}; print NR,$0}'

  2. 3 -rw-r--r-- 1 root root 80644 1 month   3 17:03 combined_log_format_w_resp_time.log

  3. 4 -rw-r--r-- 1 root root    71 1 month   3 17:55 sampler.log

The for loop uses the "getline" command to step through the file, and only prints out every 3rd line.

Note that since the file has 4 lines, which is not divisible by 3, the final iteration runs out of input early: the getline calls fail at the end of the file, so the last "print $0" prints line 4. You can see that we printed that line too, using the NR variable to output its line number.

AWK Pattern matching

AWK is a line-oriented language: the pattern comes first, then the action. Action statements are enclosed in { and }. Either the pattern or the action may be missing, but of course not both. If the pattern is missing, the action runs on every input record. If the action is missing, the entire matching record is printed.

AWK patterns include regular expressions (using the same syntax as "grep -E") and combinations using the special symbols "&&" for "logical AND", "||" for "logical OR", and "!" for "logical NOT".

You can also use relational patterns, pattern groups, ranges, and so on.
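Two hypothetical examples of patterns restricting the action, one with a plain regexp and one combining a relational test with '&&' (the input lines are invented for illustration):

```shell
# The action runs only on lines matching the /boo/ pattern.
printf 'boot\nmachine\nbig bad wolf\n' | awk '/boo/ {print "regexp:", $0}'
# A relational pattern (more than two fields) ANDed with a regexp.
printf 'boot\nmachine\nbig bad wolf\n' | awk 'NF > 2 && /bad/ {print "combo:", $0}'
# Output:
# regexp: boot
# combo: big bad wolf
```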

AWK control statements

  1. if (condition) statement [ else statement ]

  2. while (condition) statement

  3. do statement while (condition)

  4. for (expr1; expr2; expr3) statement

  5. for (var in array) statement

  6. break

  7. continue

  8. exit [ expression ]
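A small sketch combining 'if', 'else' and 'exit' from the list above (the input numbers are made up for illustration):

```shell
# Label each number even or odd, and stop reading when a 0 appears.
printf '3\n8\n0\n5\n' | awk '
  {
    if ($1 == 0) exit
    if ($1 % 2 == 0) print $1, "even"
    else             print $1, "odd"
  }'
# Output:
# 3 odd
# 8 even
```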

AWK input/output statements

Note: the printf command lets you specify the output format much more precisely, in a C-like way. For example, you can print an integer with a given width, floating-point numbers, strings, and so on.
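For example (with a made-up two-column input): %-8s left-justifies a string in an 8-character field, and %6.2f prints a floating-point number 6 characters wide with 2 decimal places:

```shell
# C-style formatting: fixed-width columns regardless of input length.
printf 'boot 3.14159\nbungie 2.5\n' \
  | awk '{printf "%-8s %6.2f\n", $1, $2}'
# Output:
# boot       3.14
# bungie     2.50
```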

AWK Mathematical functions
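No examples survive in this section of the original, so here is a small illustrative sketch of a few built-in math functions:

```shell
# int truncates toward zero; sqrt, exp and log behave as in C.
awk 'BEGIN {print int(3.9), sqrt(16), exp(0), log(1)}'
# Output:
# 3 4 1 0
```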

AWK string functions
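Again, no examples survive here in the original, so the following is a small sketch of common string functions (length, substr, toupper, and gsub for substitution):

```shell
# length/substr/toupper inspect the record; gsub edits it in place.
echo 'broken$tuff' | awk '{
  print length($0), substr($0, 1, 6), toupper($0)
  gsub(/\$/, "s")
  print
}'
# Output:
# 11 broken BROKEN$TUFF
# brokenstuff
```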

AWK command line usage

You can pass variables into an awk program with the '-v' flag, which may be used as many times as needed, for example:

  1. awk -v skip=3 '{for (i=1;i<skip;i++) {getline}; print $0}' sampler.log

  2. booze

  3. bungie

  4. broken$tuff

  5. robots

(As in the earlier example, the last line "robots" is also printed: the final getline calls fail at the end of the file and leave $0 unchanged.)

You can also use the editor to write awk Program , Then save it as a script file , for example :

  1. $ cat awk_strip

  2. #!/usr/bin/awk -f

  3. #only print out every 3rd line of input file

  4. BEGIN {skip=3}

  5. {for (i=1;i<skip;i++)

  6. {getline};

  7. print $0}

You can then make it executable and use it as a new command:

  1. chmod u+x awk_strip

  2. ./awk_strip sampler.log

sed = stream editor

sed performs basic text transformations on an input stream (a file, or input from a pipeline) in a single pass through the stream, so it is very efficient. What particularly distinguishes sed from other kinds of editors is its ability to filter text in a pipeline.

sed Basics

sed can be used on the command line or in a shell script to edit files non-interactively. Perhaps its most useful function is "search and replace" of one string with another. You can embed sed commands in the command line that invokes sed using the '-e' option, or put them in a separate file and have sed read them with the '-f' option. The latter option is most common when the sed commands are complex and involve a lot of regexps. For example:


  1. sed -e 's/input/output/' sampler.log

will echo every line of sampler.log to standard output, changing the first occurrence of 'input' on each line to 'output'. Note that sed is line oriented, so if you want to change every occurrence within each line, you need to make it a 'greedy' (global) search and replace, as shown below:

  1. sed -e 's/input/output/g' sampler.log

  2. boot

  3. book

  4. booze

  5. machine

  6. boots

  7. bungie

  8. bark

  9. aardvark

  10. broken$tuff

  11. robots

The expression in /.../ can be a literal string or a regular expression. Note that by default the output is written to stdout. You can redirect it to a new file, or, if you want to edit the existing file in place, you should use the '-i' flag:

  1. sed -e 's/input/output/' sampler.log  > new_file

  2. sed -i -e 's/input/output/' sampler.log  

sed and regular expressions

What if a character you want to use in a search command is a special symbol, such as '/' (in a file name, for example) or '*'? Then you must escape the symbol, just as with grep (and awk). Say you want to edit a shell script to reference /usr/local/bin instead of /bin; you can do this:

  1. sed -e 's/\/bin/\/usr\/local\/bin/' my_script > new_script

What if you want to use a wildcard as part of your search, and reuse what it matched in the output string? You use the special symbol '&', which stands for the pattern that was found. So, say every line of your file starts with a number and you want to wrap that number in parentheses:

  1. sed -e 's/[0-9]*/(&)/'

where [0-9] is the regexp range for a single digit and '*' is a repeat count, meaning any number of digits. You can also use positional commands in a regexp, and you can even save portions of the match in a pattern buffer, so that they can be reused elsewhere.
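As a sketch of the pattern-buffer idea: \( \) saves part of the match, and \1, \2 refer to the saved pieces in the replacement, here swapping two words (the input line is invented for illustration):

```shell
# Capture two words and emit them in reverse order.
echo 'hello world' | sed -e 's/\([a-z]*\) \([a-z]*\)/\2 \1/'
# Output:
# world hello
```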

Other sed commands

The general form is

  1. sed -e '/pattern/ command' sampler.log

where 'pattern' is a regular expression and 'command' can be 's' (search and replace), 'p' (print), 'd' (delete), 'i' (insert), 'a' (append), and so on. Note that the default action is to print every line regardless of matches, so if you want to suppress that, you must invoke sed with the '-n' flag and then use the 'p' command to control what is printed. So, if you want to list all the subdirectories, you can use:

  1. ls -l | sed -n -e '/^d/ p'

Because each line of a long listing starts with a 'd' symbol if it is a directory, this prints only the lines beginning with 'd'. Similarly, if you want to delete all the comment lines beginning with '#', you can use:

  1. sed -e '/^#/ d' sampler.log

You can also use the scope form

  1. sed -e '1,100 command' sampler.log

which applies "command" to lines 1 through 100. You can also use the special line number '$' to mean the "end" of the file. So, if you want to delete all but the first 10 lines of a file, you can use:

  1. sed -e '11,$ d' sampler.log

You can also use the pattern-range form, where the first regular expression defines the start of the range and the second its end. So, for example, if you want to print all the lines from 'boot' to 'machine', you can do this:

  1. sed -n -e '/boot$/,/mach/p' sampler.log

  2. boot

  3. book

  4. booze

  5. machine

which prints (-n) only the lines in the range delimited by the two regexps.


The three Linux swordsmen awk, sed and grep are widely used in performance modeling, performance monitoring, and performance analysis. They are also frequent interview questions for test positions at major Internet companies, and among the essential skills for mid-level and senior test engineers.

This article was created by [Zee_ 7D]. Please include a link to the original when reposting. Thanks.
