Brief introduction

grep (globally search for a regular expression (RE) and print the matching lines) is a powerful text search tool: it searches text using regular expressions and prints the lines that match.

The Unix grep family includes grep, egrep, and fgrep. egrep and fgrep differ only slightly from grep. egrep is an extension of grep that supports more regular expression metacharacters. fgrep stands for fixed grep or fast grep: it treats every character in the pattern literally, so regular expression metacharacters match themselves and are no longer special. Linux uses the GNU version of grep, which is more powerful: the -G, -E, and -F command-line options select basic regular expressions (the default), egrep behavior, and fgrep behavior, respectively.

 

Common grep usage


# grep [-acinv] [--color=auto] 'search string' filename
Options and parameters:
-a : treat binary files as text when searching
-c : count the lines that contain the 'search string'
-i : ignore case, so upper- and lowercase are treated as the same
-n : also print the line number of each match
-v : invert the selection, i.e. print the lines that do NOT contain the 'search string'
--color=auto : highlight the matched keyword in color
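
To make the options above concrete, here is a minimal sketch that runs on a throwaway file. The file contents and variable names are made up for illustration; they are not part of the original examples.

```shell
# Hypothetical sample file; names and contents are illustrative only.
sample=$(mktemp)
printf '%s\n' 'Root login ok' 'user alice' 'root shell' 'user bob' > "$sample"

count=$(grep -c 'root' "$sample")       # lines containing "root" (case-sensitive)
count_i=$(grep -ci 'root' "$sample")    # same count, ignoring case
numbered=$(grep -n 'user' "$sample")    # matching lines prefixed with line numbers
inverted=$(grep -v 'user' "$sample")    # lines that do NOT contain "user"

echo "$count"
echo "$count_i"
echo "$numbered"
echo "$inverted"
rm -f "$sample"
```

Note that -c counts matching lines, not individual occurrences: a line containing the pattern twice still counts once.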


 

Print the lines of /etc/passwd that contain root:


# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
or
# cat /etc/passwd | grep root 
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin


 

Print the lines of /etc/passwd that contain root, along with their line numbers in /etc/passwd:

# grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
30:operator:x:11:0:operator:/root:/sbin/nologin

When displaying matches, grep can use --color=auto to highlight the matched keyword in color, which is very handy. Typing --color=auto every time is tedious, though, so this is a good job for an alias. Add this line to ~/.bashrc: alias grep='grep --color=auto', then run source ~/.bashrc to make it take effect immediately. From then on, every grep invocation highlights matches automatically.

 

Print the lines of /etc/passwd that do not contain root:

# grep -v root /etc/passwd
# prints every line of /etc/passwd except the root and operator lines shown above

 

Print the lines of /etc/passwd that contain neither root nor nologin:

# grep -v root /etc/passwd | grep -v nologin
# prints only the lines that contain neither root nor nologin
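
As a rough illustration of excluding two patterns, the sketch below compares the two-stage pipeline with a single grep that takes both patterns via -e. The passwd-style file is a made-up sample, not the real /etc/passwd.

```shell
# Hypothetical passwd-style sample; not the real /etc/passwd.
pw=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'alice:x:1000:1000::/home/alice:/bin/bash' > "$pw"

piped=$(grep -v root "$pw" | grep -v nologin)   # two filters chained with a pipe
single=$(grep -v -e root -e nologin "$pw")      # one grep, two patterns

echo "$piped"
echo "$single"
rm -f "$pw"
```

With -v, a line is printed only if it matches none of the -e patterns, so both forms should select the same lines.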

 

Use dmesg to list kernel messages, then use grep to find the lines containing eth, highlighting the matched keyword and adding line numbers:

# dmesg | grep -n --color=auto 'eth'
247:eth0: RealTek RTL8139 at 0xee846000, 00:90:cc:a6:34:84, IRQ 10
248:eth0: Identified 8139 chip type 'RTL-8139C'
294:eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
305:eth0: no IPv6 routers present
# Besides eth being highlighted in color, each line is prefixed with its line number.


 

Use dmesg to list kernel messages, then use grep to find the lines containing eth, also showing the two lines before and the three lines after each matching line:

# dmesg | grep -n -A3 -B2 --color=auto 'eth'
245-PCI: setting IRQ 10 as level-triggered
246-ACPI: PCI Interrupt 0000:00:0e.0[A] -> Link [LNKB] ...
247:eth0: RealTek RTL8139 at 0xee846000, 00:90:cc:a6:34:84, IRQ 10
248:eth0: Identified 8139 chip type 'RTL-8139C'
249-input: PC Speaker as /class/input/input2
250-ACPI: PCI Interrupt 0000:00:01.4[B] -> Link [LNKB] ...
251-hdb: ATAPI 48X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache, UDMA(66)
# As shown above, the two lines before line 247 and the three lines after line 248 are also displayed.
# This lets you capture the context around a keyword for analysis.
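
The context options can be tried without dmesg. This is a small sketch on an invented five-line log; the log text is hypothetical. With -n, matching lines are marked with ":" after the line number and context lines with "-".

```shell
# Hypothetical log file standing in for dmesg output.
log=$(mktemp)
printf '%s\n' 'boot: start' 'pci: irq 10' 'eth0: link up' 'usb: hub found' 'boot: done' > "$log"

ctx=$(grep -n -B1 -A1 'eth0' "$log")   # one line before and one line after the match
both=$(grep -n -C1 'eth0' "$log")      # -C1 is shorthand for -B1 -A1

echo "$ctx"
rm -f "$log"
```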


 

Search a directory recursively by file content

# grep 'energywise' *           # search the files in the current directory for lines containing 'energywise'
# grep -r 'energywise' *        # search the current directory and its subdirectories for lines containing 'energywise'

# grep -l -r 'energywise' *     # same recursive search, but print only the names of matching files, not the matching lines

These commands are very useful for finding files.
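
Here is a self-contained sketch of the difference between a flat and a recursive search. The directory layout and file names are invented for the example. It passes "." rather than "*" to the recursive search, since the shell's "*" does not expand to hidden files at the top level.

```shell
# Hypothetical directory tree created just for this demonstration.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'energywise on\n' > "$dir/a.conf"
printf 'nothing here\n' > "$dir/b.conf"
printf 'energywise level 10\n' > "$dir/sub/c.conf"

top_only=$(cd "$dir" && grep -l 'energywise' *.conf)       # current directory only
recursive=$(cd "$dir" && grep -rl 'energywise' . | sort)   # descends into subdirectories

echo "$top_only"
echo "$recursive"
rm -rf "$dir"
```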

 

grep and regular expressions

  Character classes

Searching with a character class: suppose I want to find the words test or taste. They share the form 't?st', so I can search like this:

# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.


However many characters appear inside [], the brackets stand for exactly one character. So the example above matches either of the two strings "tast" or "test".

 

Negated character class [^]: suppose I want lines containing oo, but not oo preceded by g:

# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

Lines 2 and 3 are no surprise, because foo and Foo both qualify.

Line 18 clearly contains the goo of google, but it also contains the too of tools later in the line, so it is listed as well. In other words, although line 18 contains something we do not want (goo), it also contains something we do want (too), so it matches the search.

Line 19 matches for a similar reason: inside goooooogle, the oo can be preceded by another o, for example go(ooo)oogle, so this line also qualifies.
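
One way to see which part of a line actually matched is the -o option, which prints only the matched text. This is a sketch assuming a grep that supports -o (GNU and BSD grep do; it is not part of POSIX). The two input lines are copied from the sample output above.

```shell
# Print only the text matched by '[^g]oo' on line 18 of the sample.
matched=$(printf '%s\n' 'google is the best tools for search keyword.' | grep -o '[^g]oo')

# For line 19, show just the first match found inside "goooooogle".
first=$(printf '%s\n' 'goooooogle yes!' | grep -o '[^g]oo' | head -n 1)

echo "$matched"
echo "$first"
```

The first command reports too (from tools), and the second reports ooo, which confirms that the character before oo was an o rather than a g.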

 

Ranges in character classes: next, suppose I do not want a lowercase letter before oo. I could write [^abcd....z]oo, but that is inconvenient. Because the lowercase letters are encoded consecutively in ASCII, the pattern can be shortened to:

# grep -n '[^a-z]oo' regular_express.txt
3:Football game is not use feet only.

In other words, when a set of characters is consecutive, such as uppercase letters, lowercase letters, or digits, it can be written as [A-Z], [a-z], or [0-9]. If the set should include both letters and digits, just combine the ranges: [a-zA-Z0-9].

To get the lines that contain digits:

# grep -n '[0-9]' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.
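
A hedged sketch of ranges in practice follows. In some locales a range such as [a-z] may match more than the 26 ASCII lowercase letters, so the example pins LC_ALL=C for the range and also shows the POSIX class [[:digit:]] as an alternative to [0-9]. The sample lines are invented.

```shell
# Hypothetical three-line sample file.
f=$(mktemp)
printf '%s\n' 'Football game' 'apple food' 'room 101' > "$f"

# LC_ALL=C makes [a-z] mean exactly the ASCII lowercase letters.
not_lower=$(LC_ALL=C grep -n '[^a-z]oo' "$f")
# [[:digit:]] is the locale-independent spelling of "any digit".
digits=$(grep -n '[[:digit:]]' "$f")

echo "$not_lower"
echo "$digits"
rm -f "$f"
```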

 

Line anchors: start of line ^ and end of line $
Start of line: what if I want only the lines that begin with the? This calls for an anchor:

# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.

 


Only line 12 is listed, because it is the only line that begins with the. What if I want the lines that begin with a lowercase letter? That looks like this:

# grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.


 

If I want the lines that do not begin with an English letter:

# grep -n '^[^a-zA-Z]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird

The ^ symbol means different things inside and outside a character class (the brackets []). As the first character inside [], it negates the class; outside [], it anchors the match to the start of the line.

 

Next, to find the lines that end with a period (.):

# grep -n '\.$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.



Note that because the period has a special meaning of its own (introduced below), it must be escaped with a backslash (\) to remove that meaning.

 

Find blank lines:

# grep -n '^$' regular_express.txt
22:

Because the pattern is just the start of the line followed immediately by the end of the line (^$), it finds the empty lines.
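
The anchors combine well with -c and -v. As a sketch, the following counts blank lines, finds lines ending in a period, and strips blank lines and "#" comment lines from a made-up configuration file. The file contents are hypothetical.

```shell
# Hypothetical configuration file with a comment and blank lines.
conf=$(mktemp)
printf '%s\n' '# settings' '' 'name=demo' '' 'debug=1.' > "$conf"

blank=$(grep -c '^$' "$conf")                    # number of empty lines
dot_end=$(grep -n '\.$' "$conf")                 # lines ending with a literal period
active=$(grep -v '^$' "$conf" | grep -v '^#')    # drop empty lines and comment lines

echo "$blank"
echo "$dot_end"
echo "$active"
rm -f "$conf"
```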

 

Any character . and repetition *
These two symbols have the following meanings in regular expressions:

. (period): matches exactly one arbitrary character;
* (asterisk): repeats the preceding character zero or more times, so it is always used in combination with another character.

Suppose I need to find strings of the form g??d, that is, four characters that start with g and end with d. I can do this:

# grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.
16:The world <Happy> is the same with "glad".

Because there must be exactly two characters between g and d, the god on line 13 and the gd on line 14 are not listed.

 

What if I want to list oo, ooo, oooo, and so on, that is, two or more consecutive o's?

Because * means "repeat the preceding RE character zero or more times", o* means "nothing at all, or one or more o's". As a result, grep -n 'o*' regular_express.txt prints every line of the file.

For a string of at least two o's we need ooo*, that is:

# grep -n 'ooo*' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!
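
To check the claim that o* matches every line, here is a small sketch that counts matches on an invented four-line file, one line of which is empty.

```shell
# Hypothetical sample: "good", "god", "gd", and one empty line.
f=$(mktemp)
printf '%s\n' 'good' 'god' 'gd' '' > "$f"

any=$(grep -c 'o*' "$f")     # zero or more o's: matches every line, even the empty one
one=$(grep -c 'oo*' "$f")    # at least one o
two=$(grep -c 'ooo*' "$f")   # at least two consecutive o's

echo "$any $one $two"
rm -f "$f"
```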


 

What if I want strings that begin and end with g, with at least one o between the two g's, that is, gog, goog, gooog, and so on?

# grep -n 'goo*g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!

 

To find lines that contain a g followed later by another g, with any characters (or none) in between:

# grep -n 'g.*g' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.


Because the pattern asks for a g, then anything, then another g, lines 1, 14, and 20 also qualify. The RE .* is the usual way to say "zero or more of any character".

 

What if I want the lines that contain a number of any length? Since only digits are involved, the pattern becomes:

# grep -n '[0-9][0-9]*' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.

 

 

Limiting the repeat count to a range: {}
With . and * we can match zero to infinitely many repeated characters. What if I want to limit the number of repetitions to a range?

For example, suppose I want strings of two to five consecutive o's. This calls for the interval operator {}. In the basic regular expressions that grep uses by default, an unescaped { or } is treated as a literal character, so the braces must be written as \{ and \} to act as an interval. For instance, to find two consecutive o's:

# grep -n 'o\{2\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!


 

Suppose we want a g followed by 2 to 5 o's and then another g:

# grep -n 'go\{2,5\}g' regular_express.txt
18:google is the best tools for search keyword.

 

What about two or more o's, as in goooo....g? Besides gooo*g, this also works:

# grep -n 'go\{2,\}g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
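
The interval syntax differs between basic and extended regular expressions. The sketch below, on an invented three-line file, shows the escaped braces of the default mode next to the unescaped braces of grep -E.

```shell
# Hypothetical sample with 1, 2, and 6 o's between the g's.
f=$(mktemp)
printf '%s\n' 'gog' 'google' 'goooooogle' > "$f"

bre=$(grep -n 'go\{2,5\}g' "$f")     # basic RE: braces escaped
ere=$(grep -nE 'go{2,5}g' "$f")      # extended RE: braces unescaped
open=$(grep -n 'go\{2,\}g' "$f")     # two or more o's, no upper bound

echo "$bre"
echo "$ere"
echo "$open"
rm -f "$f"
```

Only google satisfies the 2-to-5 interval, since goooooogle has six o's between its g's; the open-ended form accepts both.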

 

Extended grep (grep -E or egrep):
The main benefit of extended grep is an additional set of regular expression metacharacters.

 

Print all lines containing NW or EA. With plain grep instead of egrep, the unescaped | is an ordinary character, so this pattern would return no results.

    # egrep 'NW|EA' testfile     
    northwest       NW      Charles Main        3.0     .98     3       34
    eastern         EA      TB Savage           4.4     .84     5       20

 

With GNU grep, the backslash-escaped forms \|, \+, and \? act as alternation and repetition operators in basic regular expressions, so the -E option is not needed. These escaped forms are a GNU extension.

# grep 'NW\|EA' testfile
northwest       NW      Charles Main        3.0     .98     3       34
eastern         EA      TB Savage           4.4     .84     5       20
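
As a sketch of alternation that does not rely on the GNU-only \| form, the following compares grep -E with multiple -e patterns, and shows that the unescaped | is literal in the default mode. The three-line file is a made-up stand-in for testfile.

```shell
# Hypothetical stand-in for testfile.
f=$(mktemp)
printf '%s\n' 'northwest NW' 'eastern EA' 'central CT' > "$f"

ere=$(grep -E 'NW|EA' "$f")                # extended RE alternation
multi=$(grep -e 'NW' -e 'EA' "$f")         # two patterns, either may match
literal=$(grep -c 'NW|EA' "$f" || true)    # basic RE: | is literal, so nothing matches

echo "$ere"
echo "$multi"
echo "$literal"
rm -f "$f"
```

The "|| true" is there only because grep exits with a nonzero status when nothing matches.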

 

Search for all lines containing one or more 3s.

# egrep '3+' testfile
# grep -E '3+' testfile
# grep '3\+' testfile
# All three commands produce the same output:
northwest       NW      Charles Main          3.0     .98     3       34
western         WE      Sharon Gray           5.3     .97     5       23
northeast       NE      AM Main Jr.           5.1     .94     3       13
central         CT      Ann Stephens          5.7     .94     5       13


 

Search for all lines containing a 2, followed by zero or one period, followed by a digit.

# egrep '2\.?[0-9]' testfile
# grep -E '2\.?[0-9]' testfile
# grep '2\.\?[0-9]' testfile
# Match a 2, followed by zero or one period, followed by a digit from 0 to 9.
western         WE       Sharon Gray          5.3     .97     5       23
southwest       SW      Lewis Dalsass         2.7     .8      2       18
eastern         EA       TB Savage             4.4     .84     5       20


 

Search for lines containing one or more consecutive occurrences of no.

# egrep '(no)+' testfile
# grep -E '(no)+' testfile
# grep '\(no\)\+' testfile   # all three commands return the same result
northwest       NW      Charles Main        3.0     .98     3       34
northeast       NE       AM Main Jr.        5.1     .94     3       13
north           NO      Margot Weber        4.5     .89     5       9
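
The extended operators +, ?, and () can be tried together on a small invented file. This is only a sketch: the sample words are hypothetical, and the -o option used to show the matched text is a GNU/BSD extension.

```shell
# Hypothetical sample lines.
f=$(mktemp)
printf '%s\n' 'nono' 'yes' 'color' 'colour' 'version 2.7' 'build 23' > "$f"

groups=$(grep -nE '(no)+' "$f")         # one or more repetitions of the group "no"
whole=$(grep -oE '(no)+' "$f")          # -o shows the group matched twice in a row
optional=$(grep -nE 'colou?r' "$f")     # ? makes the preceding u optional
dotted=$(grep -nE '2\.?[0-9]' "$f")     # a 2, an optional period, then a digit

echo "$groups"
echo "$whole"
echo "$optional"
echo "$dotted"
rm -f "$f"
```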

 

Searching without regular expressions

fgrep is faster than grep but less flexible: it can only search for fixed text, not regular expressions.

To find the lines containing an asterisk in a file or in command output:

# fgrep '*' /etc/profile
for i in /etc/profile.d/*.sh ; do
or
# grep -F '*' /etc/profile
for i in /etc/profile.d/*.sh ; do
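
The effect of fixed-string matching is easiest to see with a pattern that contains a period. The sketch below uses an invented file and the grep -F spelling; recent GNU grep releases print a warning that the egrep and fgrep commands are obsolescent, so -E and -F are the safer forms in scripts.

```shell
# Hypothetical sample: a literal "a.b", a near miss, and a line with an asterisk.
f=$(mktemp)
printf '%s\n' 'a.b' 'axb' 'for i in *.sh ; do' > "$f"

regex=$(grep -c 'a.b' "$f")      # as a regex, . matches any character
fixed=$(grep -cF 'a.b' "$f")     # as a fixed string, only the literal a.b matches
star=$(grep -nF '*.sh' "$f")     # metacharacters need no escaping with -F

echo "$regex $fixed"
echo "$star"
rm -f "$f"
```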

 

References: http://vbird.dic.ksu.edu.tw/linux_basic/0330regularex_2.php

       http://www.cnblogs.com/stephen-liu74/archive/2011/11/14/2243694.html