sort is a very commonly used command in Linux, and its job is sorting. Concentrate for five minutes and get sort under control, starting now!

1 How sort works

 

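sort treats each line of the file as a unit and compares the lines with one another. The comparison runs from the first character onward, comparing ASCII code values in turn, and the lines are finally output in ascending order. (Strictly speaking, the order follows the current locale's collation rules; it is plain ASCII order in the C locale.)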

$ cat seq.txt
banana
apple
pear
orange
$ sort seq.txt
apple
banana
orange
pear
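
As a small illustration of the character-by-character comparison, the sketch below assumes GNU sort and forces the C locale with LC_ALL=C (an assumption that is not part of the original example) so that the order follows raw ASCII values. The sample words are made up, and other locales may order them differently.

```shell
# Force the C locale so the comparison uses raw byte (ASCII) values.
# Uppercase letters (codes 65-90) are smaller than lowercase letters (97-122),
# so the capitalized words come first.
sorted=$(printf 'banana\nApple\napple\nBanana\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
# Apple
# Banana
# apple
# banana
```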

2 The -u option of sort

Its function is very simple: it removes duplicate lines from the output.

$ cat seq.txt
banana
apple
pear
orange
pear
$ sort seq.txt
apple
banana
orange
pear
pear
$ sort -u seq.txt
apple
banana
orange
pear

Because it was repeated, the second pear was ruthlessly removed by the -u option.

3 The -r option of sort

sort sorts in ascending order by default. To get descending order, just add -r.

$ cat number.txt
1
3
5
2
4
$ sort number.txt
1
2
3
4
5
$ sort -r number.txt
5
4
3
2
1

4 The -o option of sort

Because sort writes its result to standard output by default, you need redirection to write the result to a file, in the form sort filename > newfile.

However, if you want to write the sorted result back to the original file, redirection will not work.

$ sort -r number.txt > number.txt
$ cat number.txt
$

See, number.txt has been emptied: the shell truncates the file for the redirection before sort gets a chance to read it.

This is where the -o option comes in. It solves the problem and lets you safely write the result back to the original file. This may also be the only advantage -o has over redirection.

$ cat number.txt
1
3
5
2
4
$ sort -r number.txt -o number.txt
$ cat number.txt
5
4
3
2
1
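
The difference between redirection and -o can be reproduced safely in a scratch directory. This is a minimal sketch: the temporary directory and the variable names are assumptions made for illustration, and -o is placed before the file operand for portability.

```shell
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
printf '1\n3\n5\n2\n4\n' > "$workdir/number.txt"

# Redirection: the shell truncates the target before sort starts reading it,
# so sort sees an empty file and writes nothing back.
sort -r "$workdir/number.txt" > "$workdir/number.txt"
if [ -s "$workdir/number.txt" ]; then emptied=no; else emptied=yes; fi

# -o: sort reads all of its input first and only then writes the output file.
printf '1\n3\n5\n2\n4\n' > "$workdir/number.txt"
sort -r -o "$workdir/number.txt" "$workdir/number.txt"
in_place=$(cat "$workdir/number.txt")

printf 'emptied by redirection: %s\n' "$emptied"
printf '%s\n' "$in_place"
rm -rf "$workdir"
```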

5 The -n option of sort

Have you ever run into a case where 10 is smaller than 2? I have. It happens because sort compares numbers character by character: it compares 1 with 2, 1 is obviously smaller, so 10 is placed before 2. This is simply sort's consistent behavior.

If we want to change this, we use the -n option to tell sort: "sort by numeric value"!

$ cat number.txt
1
10
19
11
2
5
$ sort number.txt
1
10
11
19
2
5
$ sort -n number.txt
1
2
5
10
11
19

6 The -t and -k options of sort

Suppose there is a file like this:

$ cat facebook.txt
banana:30:5.5
apple:10:2.5
pear:90:2.3
orange:20:3.4

This file has three columns separated by colons. The first column is the kind of fruit, the second is the quantity, and the third is the price.

Now I want to sort by the quantity of fruit, that is, by the second column. How can this be done with sort?

Fortunately, sort provides the -t option, which is followed by the separator character. (Does that remind you of the -d option of cut and paste? Resonance ~~)

Once the separator is specified, you can use -k to specify the column.

$ sort -n -k 2 -t : facebook.txt
apple:10:2.5
orange:20:3.4
banana:30:5.5
pear:90:2.3

We used a colon as the separator and sorted the second column in ascending numeric order. The result is very satisfactory.

7 Other common sort options

-f converts all lowercase letters to uppercase for the comparison, that is, it ignores case.

-c checks whether the file is already sorted. If it is not, it prints information about the first out-of-order line and returns 1.

-C also checks whether the file is sorted. If it is not, it prints nothing and only returns 1.

-M sorts by month, for example JAN is less than FEB, and so on.

-b ignores all leading blanks on each line and compares from the first visible character.
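
The options above can be tried on a few made-up lines. This sketch assumes GNU sort and forces the C locale with LC_ALL=C so the results do not depend on the local language settings; the sample data and variable names are illustrative only.

```shell
# -f: fold lowercase to uppercase, so "apple" sorts before "Banana".
folded=$(printf 'cherry\nBanana\napple\n' | LC_ALL=C sort -f)

# -c: the first out-of-order line is reported on stderr and the status is 1.
if printf 'b\na\n' | LC_ALL=C sort -c 2>/dev/null; then
  check_status=0
else
  check_status=$?
fi

# -C: the same check without any message; sorted input gives status 0.
if printf 'a\nb\n' | LC_ALL=C sort -C; then
  quiet_status=0
else
  quiet_status=$?
fi

# -M: compare by month abbreviation rather than alphabetically.
months=$(printf 'MAR\nJAN\nFEB\n' | LC_ALL=C sort -M)

# -b: ignore leading blanks when locating the start of the sort key.
unblanked=$(printf '  b\n a\nc\n' | LC_ALL=C sort -b)

printf '%s\n' "$folded" "$check_status" "$quiet_status" "$months" "$unblanked"
```

Checking the exit status rather than the message text keeps the -c test independent of how a particular sort implementation words its diagnostic.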

Sometimes, when studying scripts, you will find the sort command followed by things like -k1,2 or -k1.2 -k3.4, which look rather odd. Today we will sort out the -k option!

1 Preparing the material

$ cat facebook.txt
google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500

 

The first field is the company name, the second is the number of employees, and the third is the average employee salary. (Apart from the company names, don't believe any of it; the numbers are all made up ^_^)

2 I want to sort this file alphabetically by company name, that is, by the first field (this facebook.txt file has three fields):

$ sort -t ' ' -k 1 facebook.txt
baidu 100 5000
google 110 5000
guge 50 3000
sohu 100 4500

See, simply using -k 1 does the job. (Strictly speaking, this is not rigorous; you will see why later.)

3 I want to sort facebook.txt by the number of employees:

$ sort -n -t ' ' -k 2 facebook.txt
guge 50 3000
baidu 100 5000
sohu 100 4500
google 110 5000

No explanation needed; I believe you can understand it.

However, there is a problem here: baidu and sohu have the same number of employees, 100 each. What happens then? By default, when the keys compare equal, sort falls back to comparing the whole line in ascending order, starting from the first field, so baidu comes before sohu.

4 I want to sort facebook.txt by the number of employees, and for companies with the same number of employees, by average salary in ascending order:

$ sort -n -t ' ' -k 2 -k 3 facebook.txt
guge 50 3000
sohu 100 4500
baidu 100 5000
google 110 5000

See, adding -k 2 -k 3 solved the problem. Yes, sort supports this: it lets you set the priority of the fields used for sorting. The lines are sorted by the 2nd field first and, if that is equal, by the 3rd field. (If you like, you can keep going this way and set many levels of priority.)

5 I want to sort facebook.txt by employee salary in descending order, and for companies with the same salary, by the number of employees in ascending order (this one is a bit harder):

$ sort -n -t ' ' -k 3r -k 2 facebook.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

There is a small trick here. Look closely: a lowercase r was added after -k 3. Thinking back to the previous article, can you guess the answer? Here it is: r does the same thing as the -r option, meaning reverse order. Because sort sorts in ascending order by default, we add r so that the third field (the average employee salary) is sorted in descending order. You can also add n here, which means this field is compared by numeric value. In fact, a key that carries its own modifier does not inherit the global -n, so writing n on the key itself is the safer choice. For example:

$ sort -t ' ' -k 3nr -k 2n facebook.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

See, we removed the leading -n option and attached n to each -k option instead.
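
A short sketch of why attaching n to the key can matter. It assumes GNU sort in the C locale and uses a made-up two-line sample; the behavior shown is that a key with its own modifier does not pick up the global -n.

```shell
# Hypothetical sample: a label and a number, separated by one space.
data='a 9
b 10'

# The key has its own modifier (r), so the global -n is not inherited.
# Field 2 is compared as text and reversed: "9" sorts after "10" as text,
# so the reversed order puts 9 first.
text_reverse=$(printf '%s\n' "$data" | LC_ALL=C sort -n -t ' ' -k 2,2r)

# With n written on the key itself, the comparison is numeric and descending.
numeric_reverse=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 2,2nr)

printf '%s\n' "$text_reverse" '--' "$numeric_reverse"
```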

6 The precise syntax of the -k option

To go any further, we need a little theory. You need to understand the syntax of the -k option, which is as follows:

[ FStart [ .CStart ] ] [ Modifier ] [ , [ FEnd [ .CEnd ] ][ Modifier ] ]

This syntax is divided by the comma (",") into two parts, the Start part and the End part.

Let me plant one idea first: "if the End part is not set, End is taken to be the end of the line". This concept is very important, but it is often overlooked.

The Start part itself consists of three parts. The Modifier part is the kind of option we discussed earlier, such as n and r. Let's focus on FStart and .CStart.

.CStart can be omitted; if it is, the key starts at the beginning of the field. The earlier -k 2 and -k 3 examples both omit .CStart.

In FStart.CStart, FStart is the field to use, and CStart is the character position within the FStart field at which the sort key begins.

Likewise, in the End part you can set FEnd.CEnd. If .CEnd is omitted, the key ends at the "field end", that is, the last character of that field. Setting CEnd to 0 (zero) also means ending at the field end.
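
The rule that an omitted End part means "end of line" can be seen on a tiny made-up sample. This is a sketch assuming GNU sort in the C locale; the data and variable names are illustrative.

```shell
# Hypothetical sample: the second field is identical, the third differs.
data='a 1 z
b 1 y'

# -k 2: End is omitted, so the key runs from field 2 to the end of the line.
# The keys are "1 z" and "1 y", and "1 y" sorts first.
to_line_end=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 2)

# -k 2,2: the key is exactly field 2. Both keys are "1", so the tie is
# broken by comparing the whole lines and "a 1 z" sorts first.
field_only=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 2,2)

printf '%s\n' "$to_line_end" '--' "$field_only"
```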

7 On a whim, sort starting from the second letter of the company name:

$ sort -t ' ' -k 1.2 facebook.txt
baidu 100 5000
sohu 100 4500
google 110 5000
guge 50 3000

See, we used -k 1.2, which means sorting on the string that starts at the second character of the first field and, since the End part is omitted, runs to the end of the line. You will find that baidu tops the list because its second letter is a. The second character of both sohu and google is o, but the h of sohu comes before the o of google, so they are second and third. guge has to settle for fourth.

8 Another whim: sort only on the second letter of the company name, and if that is the same, sort by employee salary in descending order:

$ sort -t ' ' -k 1.2,1.2 -k 3,3nr facebook.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

Because only the second letter is to be sorted, we wrote -k 1.2,1.2, which means we sort "only" on the second letter. (If you ask "why not just use -k 1.2?", of course that will not do, because omitting the End part means sorting on the string from the second letter to the end of the line.) For the employee salary we also used -k 3,3, which is the most precise form. It means we sort "only" on this field, because if you omit the trailing 3 it becomes "sort on everything from the start of field 3 to the end of the last field".

9 What other options can be used in the modifier part?

You can use b, d, f, i, n, or r.

You are surely familiar with n and r.

b means the leading blanks of the field are ignored.

d means the field is sorted in dictionary order (that is, only blanks, letters, and digits are considered).

f means case is ignored when sorting the field.

i means "non-printable characters" are ignored and only printable characters are sorted. (Some ASCII characters are non-printable, for example \a is the bell, \b is backspace, \n is newline, \r is carriage return, and so on.)
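
As a quick illustration of one of these modifiers, the sketch below compares a plain key with a d (dictionary order) key. It assumes GNU sort in the C locale; the two sample words are made up.

```shell
# Plain comparison: '-' (code 45) is smaller than 'b' (code 98),
# so "a-c" sorts before "ab".
plain=$(printf 'ab\na-c\n' | LC_ALL=C sort -k 1,1)

# With d on the key, '-' is skipped, so the keys compared are "ab" and "ac"
# and "ab" sorts first.
dictionary=$(printf 'ab\na-c\n' | LC_ALL=C sort -k 1,1d)

printf '%s\n' "$plain" '--' "$dictionary"
```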

10 Examples of using -k together with -u:

$ cat facebook.txt
google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500

This is the original facebook.txt file.

$ sort -n -k 2 facebook.txt
guge 50 3000
baidu 100 5000
sohu 100 4500
google 110 5000

$ sort -n -k 2 -u facebook.txt
guge 50 3000
baidu 100 5000
google 110 5000

When numeric sorting on the employee-count field is combined with -u, the sohu line is deleted! It turns out that -u only looks at the field set by -k: when it finds lines whose key is the same, it deletes all but the first of them.

$ sort -k 1 -u facebook.txt
baidu 100 5000
google 110 5000
guge 50 3000
sohu 100 4500

$ sort -k 1.1,1.1 -u facebook.txt
baidu 100 5000
google 110 5000
sohu 100 4500

The same goes for this example: guge, which also starts with g, did not survive.

$ sort -n -k 2 -k 3 -u facebook.txt
guge 50 3000
sohu 100 4500
baidu 100 5000
google 110 5000

Oh! With two levels of priority set here, -u did not delete any line. It turns out that -u weighs all the -k options together: a line is deleted only when every key is the same, and as long as one level differs it is not deleted :) (If you don't believe it, add a line sina 100 4500 yourself and give it a try.)
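
The suggested experiment can be sketched as follows, assuming GNU sort in the C locale. The extra sina line has the same values as sohu in both key fields, so -u keeps only one of the two.

```shell
# The original data plus the extra line "sina 100 4500".
data='google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500
sina 100 4500'

# Both keys (employee count and salary) are identical for sohu and sina,
# so -u treats them as duplicates and keeps a single line.
deduped=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k 2 -k 3 -u)
printf '%s\n' "$deduped"
```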

11 The strangest sort:

$ sort -n -k 2.2,3.1 facebook.txt
guge 50 3000
baidu 100 5000
sohu 100 4500
google 110 5000

This asks sort to use the key that runs from the second character of the second field to the first character of the third field.

Be careful when counting characters here. Because no -t is given, fields are separated at the transition from a non-blank to a blank character, and each field includes the blanks in front of it. The second field of the first line is therefore " 50" (with a leading space), and its second character is 5. The keys actually extracted are "50 " for guge, "100 " for baidu and sohu, and "110 " for google.

With -n, sort reads only the leading number of each key, so the values compared are 50, 100, 100 and 110.

That is why guge must come first and google must come last. But why does baidu come before sohu? (You can experiment and think about it yourself.)

The answer: with -n, "the cross-field part of the key is an illusion". The comparison stops at the end of the leading number, so the beginning of the third field never takes part. When the two keys 100 and 100 turn out equal, sort automatically falls back to comparing the whole line, starting with the first field, so of course baidu comes before sohu. An example proves it:

$ sort -n -k 2.2,3.1 -k 1,1r facebook.txt
guge 50 3000
sohu 100 4500
baidu 100 5000
google 110 5000
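
How character positions are counted when no -t is given can be checked with a small made-up sample. This sketch assumes GNU sort in the C locale: without b, a field includes the blank in front of it, so that blank counts as character 1.

```shell
# Hypothetical sample: a label and a two-digit number.
data='x 21
y 12'

# No -t and no b: field 2 is " 21" and " 12", leading blank included,
# so character 2 is the first digit. The keys are "2" and "1".
with_blank=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2.2,2.2)

# With b on both positions the blank is skipped before counting,
# so character 2 is the second digit. The keys are "1" and "2".
skip_blank=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2.2b,2.2b)

printf '%s\n' "$with_blank" '--' "$skip_blank"
```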

12 Sometimes you will see symbols like +1 -2 after the sort command. What are they?

About this syntax, the latest sort manual explains:

On older systems, `sort' supports an obsolete origin-zero syntax `+POS1 [-POS2]' for specifying sort keys.  POSIX 1003.1-2001 (*note Standards conformance::) does not allow this; use `-k' instead.

So this ancient notation has been retired. From now on you can justifiably look down on scripts that use it!

(In case you run into old scripts, a few more words on this notation: the plus sign introduces the Start part and the minus sign introduces the End part. The most important point is that this notation counts from 0. What we called the first field before is field 0 here, and what we called the 2nd character is character 1 here. For example, +1 -2 corresponds to -k 2,2. Got it?)