Linux Text-Processing Commands Explained

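I. Selection commands:

Cutting and Pasting (extracting specified content):

•    cut -b list [-n] [file-list] (select by byte position)

•    cut -c list [file-list] (select by character position; needed for Chinese and other multibyte text)

•    cut -f list [-d char] [-s] [file-list] (select fields; -d sets the field delimiter)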


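1. who | cut -b 3  // extract the 3rd byte of each line of who output (separate multiple positions with commas, e.g. -b 3,5)

2. For Chinese text, -c is meant to count whole characters so that the output stays readable, whereas -b counts raw bytes (8 bits each) and can split a multibyte character, producing garbled output. Note that GNU coreutils cut has historically treated -c the same as -b, so check how your cut behaves on multibyte input.

3. cut -f1,2 student_record  // extract the first and second fields

4. Without -d, cut cannot extract fields from a file whose fields are not separated by tabs; a line that contains no tab is printed whole.

The default delimiter for -f is the tab character, so when the file is tab-separated you can omit -d and select fields with -f alone.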

s12507@Linux:/tmp$ cat /tmp/cut.demo

one yi

two er

three san

s12507@Linux:/tmp$ cut -f1 cut.demo

one yi

two er

three san

5. Extract with an explicit delimiter (the delimiter must be a single character; here, one space)

s12507@Linux:/tmp$ cut -f1 -d" " cut.demo  // or equivalently: cut -d' ' -f1 cut.demo

one

two

three
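
The tab-versus-space behavior described in items 4 and 5 can be reproduced with a short sketch. The file names and contents below are made up for illustration; only the cut options come from the text above.

```shell
# Hypothetical sample data: one tab-separated file, one space-separated file.
tmp=$(mktemp -d)
printf 'one\tyi\ntwo\ter\n' > "$tmp/tabs.txt"
printf 'one yi\ntwo er\n'   > "$tmp/spaces.txt"

# The default delimiter is the tab, so -d can be omitted here.
tab_fields=$(cut -f1 "$tmp/tabs.txt")

# No tab in the line: cut prints the whole line unchanged ...
no_delim=$(cut -f1 "$tmp/spaces.txt")

# ... unless -s is given, which suppresses lines without the delimiter.
suppressed=$(cut -s -f1 "$tmp/spaces.txt")

# With -d ' ' the single space becomes the delimiter.
space_fields=$(cut -d ' ' -f1 "$tmp/spaces.txt")

printf '%s\n' "$tab_fields" "$no_delim" "$space_fields"
rm -rf "$tmp"
```

Because -d accepts exactly one character, columns padded with several spaces produce empty fields between the extra spaces; awk is usually the better fit for that kind of input.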


Grep: on Linux, grep is a powerful text-search tool. It searches text using regular expressions and prints the lines that match.

Options:

-i    Ignore the case of letters

-n    Print line numbers along with matched lines

-v    Print nonmatching lines

-c    Print only the number of matching lines

-F    Search for the given pattern as a fixed string, not as a regular expression

-l    When searching several files, print only the names of files with matching lines

eg:

s12507@Linux:/tmp$ grep John /tmp/students  // keyword search

John   Doe     ECE     3.54   [email protected] 111.222.3333

John   Clark   ECE     2.68   [email protected]        111.111.5555

John   Lee     EE      2.64   [email protected]  111.111.2222


s12507@Linux:/tmp$ grep '^R' /tmp/students  // using a regular expression (lines beginning with R)

Rick   Marsh   CS      23.34  [email protected]   111.222.6666


s12507@Linux:/tmp$ grep -i john /tmp/students  // ignore case with -i (--ignore-case)

John   Doe     ECE     3.54   [email protected] 111.222.3333

John   Clark   ECE     2.68   [email protected]        111.111.5555

John   Lee     EE      2.64   [email protected]  111.111.2222


s12507@Linux:/tmp$ grep -n printf /tmp/hello.c  // also show the line number of each match

4:     printf("hello world!\n");

 

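Note: some regular-expression syntax only works with particular commands. Alternation with |, for example, needs extended regular expressions (egrep, i.e. grep -E).

s12507@Linux:/tmp$ egrep -w 'CS|MBA' /tmp/students  // students in the CS or MBA department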

Al     Davis   CS      2.63   [email protected]      111.222.2222

Ahmad  Rashid  MBA     3.74   [email protected]   111.222.4444

Rick   Marsh   CS      23.34  [email protected]   111.222.6666

James  Adam    CS      2.77   [email protected]   111.222.7777

Jake   Zulu    CS      3.00    [email protected]  111.111.9999
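
To see the grep options side by side, here is a small sketch on made-up records (the file students.txt and its four lines are hypothetical, not the /tmp/students file used above).

```shell
# Hypothetical sample records: name, department, GPA.
tmp=$(mktemp -d)
printf '%s\n' 'John Doe ECE 3.54' 'Al Davis CS 2.63' \
              'Ahmad Rashid MBA 3.74' 'john smith EECS 3.10' > "$tmp/students.txt"

any_case=$(grep -ic john "$tmp/students.txt")          # -i -c: count lines matching john in any case
numbered=$(grep -n Davis "$tmp/students.txt")          # -n: prefix each match with its line number
not_john=$(grep -vc John "$tmp/students.txt")          # -v -c: count lines that do not contain John
as_regex=$(grep -c 'A.' "$tmp/students.txt")           # as a regex, "." matches any character
as_fixed=$(grep -cF 'A.' "$tmp/students.txt")          # -F: "." is a literal dot
whole_word=$(grep -E -w 'CS|MBA' "$tmp/students.txt")  # -w keeps EECS from matching CS

printf '%s\n' "$any_case" "$numbered" "$not_john" "$as_regex" "$as_fixed" "$whole_word"
rm -rf "$tmp"
```

grep -E is the modern spelling of egrep; both enable the | alternation used here.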


Exercises (with grep, on /tmp/databook):

1.      Print all lines containing the string San. grep 'San' /tmp/databook

2.      Print all lines where the person's first name starts with J. grep '^J' /tmp/databook

3.      Print all lines ending in 700. grep '700$' /tmp/databook

4.      Print all lines that don't contain 834. grep -v '834' /tmp/databook

5.      Print all lines where birthdays are in December. grep ':12/' /tmp/databook

6.      Print all lines where the phone number is in the 408 area code. grep ':408-' /tmp/databook

7.      Print all lines containing an uppercase letter, followed by four lowercase letters, a comma, a space, and one uppercase letter.

grep '[A-Z][a-z]\{4\}, [A-Z]' /tmp/databook

8.      Print lines where the last name begins with K or k. grep '^[A-Za-z]* [Kk]' /tmp/databook

9.      Print lines preceded by a line number where the salary is a six-figure number. grep -n ':[0-9]\{6\}$' /tmp/databook

10.  Print lines containing Lincoln or lincoln (remember that grep is case-sensitive by default). grep '[Ll]incoln' /tmp/databook
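
A few of these exercises can be tried on a stand-in file. The sketch below assumes each record has the colon-separated layout name:phone:address:birthday:salary; that layout and the three records are assumptions made for illustration, since the real /tmp/databook is not shown.

```shell
# Made-up records in the assumed layout name:phone:address:birthday:salary.
tmp=$(mktemp -d)
cat > "$tmp/databook" <<'EOF'
Jane Kline:408-555-0101:1 Elm St, San Mateo, CA:12/05/70:120000
Bob Stone:916-555-0102:2 Oak Ave, Reno, NV:3/12/65:45700
Amy king:408-555-0103:3 Pine Rd, Lincoln, NE:11/30/80:83400
EOF

december=$(grep -c ':12/' "$tmp/databook")             # birthday field begins with 12/
area_408=$(grep -c ':408-' "$tmp/databook")            # phone field begins with 408-
k_names=$(grep -c '^[A-Za-z]* [Kk]' "$tmp/databook")   # first name, a space, then K or k
six_fig=$(grep -n ':[0-9]\{6\}$' "$tmp/databook" | cut -d: -f1)  # line numbers of six-figure salaries

printf '%s\n' "$december" "$area_408" "$k_names" "$six_fig"
rm -rf "$tmp"
```

Anchoring on the colon matters: the second record contains /12/ as a day of the month, and only the leading : ties the pattern to the start of the birthday field.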


II. Sorting commands:

sort: Ordering a set of items according to some criteria

-b  Ignore leading blanks

-f  Consider lowercase and uppercase letters to be equivalent (fold case when sorting)

-d  Sort in dictionary order (consider only blanks and alphanumeric characters)

-r  Sort in reverse order

-k  Specify a field as the sort key

-n  Compare according to string numerical value

-t  Specify the field separator character

 

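1. Given only a file name, sort compares whole lines, starting from the first character of each line

s12507@Linux:/tmp$ sort /tmp/myStudent

2. Sort by the second field (-k2 uses the key that runs from field 2 to the end of the line; -k2,2 restricts it to field 2 alone)

s12507@Linux:/tmp$ sort -k2 myStudent

By the third field: s12507@Linux:/tmp$ sort -k3 myStudent

3. Specify the field separator: -t

s12507@Linux:/tmp$ sort -t: -k4 myStudent  // sets the field separator to ":"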
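
The interaction of -t, -k, -n and -r is easiest to see on a tiny input. The three name:number rows below are invented for this sketch; LC_ALL=C is set only to make the text comparison predictable.

```shell
# Hypothetical rows: name and a number, separated by a colon.
rows=$(printf '%s\n' 'bob:30' 'alice:4' 'carol:100')

# Field 2 compared as text: "100" < "30" < "4".
by_text=$(printf '%s\n' "$rows" | LC_ALL=C sort -t: -k2,2)

# Field 2 compared as a number: 4 < 30 < 100.
by_number=$(printf '%s\n' "$rows" | LC_ALL=C sort -t: -k2,2n)

# Same numeric key, reversed.
descending=$(printf '%s\n' "$rows" | LC_ALL=C sort -t: -k2,2nr)

printf '%s\n---\n%s\n---\n%s\n' "$by_text" "$by_number" "$descending"
```

Writing the key as -k2,2 rather than -k2 keeps the comparison inside field 2, which matters as soon as a file has more than two fields.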

4. Mind the locale:

(1) With a Chinese locale, collation follows the locale's rules and the lowercase entry sorts first

s12507@Linux:/tmp$ echo $LANG

zh_CN.UTF-8   // Chinese

s12507@Linux:/tmp$ sort /tmp/sort.demo

alan

Alice

Jack

Tom

(2) With the C locale, lines are compared by byte value, so lowercase letters sort after all uppercase letters

s12507@Linux:/tmp$ echo $LANG

C

s12507@Linux:/tmp$ sort /tmp/sort.demo

Alice

Jack

Tom

alan
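
You do not have to change LANG for the whole session to get the byte-order result; the locale can be set for a single command. This sketch reuses the four names from sort.demo and only relies on the C locale, which is always available.

```shell
names=$(printf '%s\n' Tom alan Jack Alice)

# Byte order: every uppercase letter sorts before every lowercase letter.
c_order=$(printf '%s\n' "$names" | LC_ALL=C sort)

# -f folds lowercase to uppercase before comparing, so case no longer decides the order.
folded=$(printf '%s\n' "$names" | LC_ALL=C sort -f)

printf '%s\n---\n%s\n' "$c_order" "$folded"
```

Setting LC_ALL=C in scripts is a common way to make sort output identical across machines with different default locales.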


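III. Stream editor (sed):

sed is a capable file-processing tool that works as a filter in a pipeline. It processes its input line by line and can substitute, delete, insert, or select lines. Its basic usage is as follows.
The command-line form is:  sed [-nefri] 'command' input-file

-n: quiet (silent) mode. By default sed prints every input line to the screen; with -n, only the lines that a command explicitly prints (for example with p) are shown.

-i: edit the file in place instead of printing the result to the screen.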
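
The effect of -n and -i shows up clearly on a three-line file. This is a sketch that assumes GNU sed (BSD sed expects a backup suffix argument after -i); the file f.txt is made up.

```shell
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/f.txt"

# Without -n every line is printed once, and p prints line 2 a second time.
default_out=$(sed '2p' "$tmp/f.txt")

# With -n only the explicitly printed line appears.
quiet_out=$(sed -n '2p' "$tmp/f.txt")

# -i rewrites the file itself and prints nothing (GNU sed syntax).
sed -i '2d' "$tmp/f.txt"
after_edit=$(cat "$tmp/f.txt")

printf '%s\n---\n%s\n---\n%s\n' "$default_out" "$quiet_out" "$after_edit"
rm -rf "$tmp"
```

Since -i overwrites the input, it is worth running the same command without -i first to check the result.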


Delete blank lines:

sed '/^$/d' student_record

 

Delete the first ten lines:

s12507@Linux:/tmp$ sed '1,10d' /tmp/students

 

Delete the students in the Computer Science (CS) department:

s12507@Linux:/tmp$ sed '/\<CS/d' /tmp/students

 

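Change the EECS department to CS (without the g flag, only the first match on each line is replaced):

s12507@Linux:/tmp$ sed 's/EECS/CS/' /tmp/students

Replace every match on every line (g flag):

s12507@Linux:/tmp$ sed 's/EECS/CS/g' /tmp/students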
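
The difference between the two substitutions only shows when a line contains the pattern more than once. A minimal sketch on invented input:

```shell
input=$(printf '%s\n' 'EECS EECS' 'EECS')

# No flag: the first match on each line is replaced; later matches on that line are kept.
first_only=$(printf '%s\n' "$input" | sed 's/EECS/CS/')

# g flag: every match on each line is replaced.
every_match=$(printf '%s\n' "$input" | sed 's/EECS/CS/g')

printf '%s\n---\n%s\n' "$first_only" "$every_match"
```

Either way sed visits every line; the flag only controls how many matches per line are rewritten.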

 

Print lines 2 through 5:

s12507@Linux:/tmp$ sed -n '2,5p' /tmp/students

James  Davis   ECE     3.71   [email protected]       111.222.1111

Al     Davis   CS      2.63   [email protected]      111.222.2222

Ahmad  Rashid  MBA     3.74   [email protected]   111.222.4444

Sam    Chu     ECE     3.68   [email protected]  111.222.5555

 

Insert a line before each matching line:

s12507@Linux:/tmp$ sed '/\<CS/i good students' /tmp/students

 

Append a line after each matching line:

s12507@Linux:/tmp$ sed '/\<CS/a good students' /tmp/students
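
Where the inserted text lands is easier to see on two lines of made-up input. The one-line forms of i and a used here are a GNU sed convenience; other sed implementations may require a backslash and a newline after the command letter.

```shell
input=$(printf '%s\n' 'Al CS' 'Bo EE')

# i: the new line goes before each line that matches.
inserted=$(printf '%s\n' "$input" | sed '/\<CS/i good students')

# a: the new line goes after each line that matches.
appended=$(printf '%s\n' "$input" | sed '/\<CS/a good students')

printf '%s\n---\n%s\n' "$inserted" "$appended"
```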


Print all students in the CS department:

s12507@Linux:/tmp$ sed -n '/\<CS/p' /tmp/students

Al     Davis   CS      2.63   [email protected]      111.222.2222

Rick   Marsh   CS      23.34  [email protected]   111.222.6666

James  Adam    CS      2.77   [email protected]   111.222.7777

Jake   Zulu    CS      3.00   [email protected]  111.111.9999

Print the students who are not in the CS department:

s12507@Linux:/tmp$ sed -n '/\<CS/!p' /tmp/students
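
The p and !p forms behave like grep and grep -v. The sketch below uses three invented lines and the GNU word-start anchor \< from the commands above, which is what keeps EECS from counting as CS.

```shell
input=$(printf '%s\n' 'Al CS' 'Bo EECS' 'Cy EE')

# Print only lines where CS starts a word.
cs_only=$(printf '%s\n' "$input" | sed -n '/\<CS/p')

# ! negates the address: print every other line.
others=$(printf '%s\n' "$input" | sed -n '/\<CS/!p')

# The same selection written with grep -v.
others_grep=$(printf '%s\n' "$input" | grep -v '\<CS')

printf '%s\n---\n%s\n' "$cs_only" "$others"
```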

 

Homework (with sed, on /tmp/databook):

1.       Change Jon's name to Jonathan. sed 's/Jon/Jonathan/' /tmp/databook

2.       Delete the first three lines. sed '1,3d' /tmp/databook

3.       Print lines 5 through 10. sed -n '5,10p' /tmp/databook

4.       Delete lines containing Lane. sed '/Lane/d' /tmp/databook

5.       Print all lines where the birthdays are in November or December. sed -n '/:1[12]\//p' /tmp/databook

6.       Replace the line containing Jose with JOSE HAS RETIRED. sed 's/.*Jose.*/JOSE HAS RETIRED/' /tmp/databook

7.       Change Popeye's birthday to 11/14/46. sed '/Popeye/s/[0-9]*\/[0-9]*\/[0-9]*/11\/14\/46/' /tmp/databook

8.       Delete all blank lines. sed '/^$/d' /tmp/databook
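
Two of these tasks are worth trying on stand-in data: replacing a whole line, and editing only the lines that match an address. The records below are invented and assume a colon-separated layout with an M/D/Y birthday field.

```shell
# Replacing the whole line needs .* on both sides of the name.
jose='Mr Jose Ruiz:555-0100:9/3/60:40000'
whole_line=$(printf '%s\n' "$jose" | sed 's/.*Jose.*/JOSE HAS RETIRED/')
from_match=$(printf '%s\n' "$jose" | sed 's/Jose.*/JOSE HAS RETIRED/')   # keeps the text before Jose

# /Popeye/ restricts the substitution to matching lines; the first
# digits/digits/digits run on the line is the birthday.
popeye='Popeye Sailor:555-0111:3/19/35:22350'
birthday=$(printf '%s\n' "$popeye" | sed '/Popeye/s/[0-9]*\/[0-9]*\/[0-9]*/11\/14\/46/')

printf '%s\n' "$whole_line" "$from_match" "$birthday"
```

A different delimiter such as s|...|...| avoids escaping every slash in date patterns.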


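IV. Awk:

awk is a powerful text-analysis tool. Where grep searches and sed edits, awk is strongest at analyzing data and generating reports, and it is a small programming language in its own right.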

Form of the awk command:

awk 'pattern' filename

awk '{action}' filename

awk 'pattern {action}' filename
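
The three forms differ in which part is left to its default: a missing action means "print the whole line", and a missing pattern means "every line". A sketch on three made-up records:

```shell
data=$(printf '%s\n' 'John Doe ECE 3.54' 'Al Davis CS 2.63' 'Jake Zulu CS 3.00')

# Pattern only: matching lines are printed whole.
pattern_only=$(printf '%s\n' "$data" | awk '/^J/')

# Action only: the action runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{print $2}')

# Pattern and action: the action runs only on matching lines.
both=$(printf '%s\n' "$data" | awk '$3 == "CS" {print $1, $2}')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```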

Parameter:

-F  Specify the field separator

Awk's patterns are similar to C expressions:

–  Relational expression

–  Conditional expression

–  Arithmetic expression

–  Compound patterns

–  Range patterns

Awk's main action is print

 

Awk's pattern [two figures from the original post; images not recovered]

Eg:

1. All lines beginning with J:

s12507@Linux:/tmp$ awk '/^J/' databook

 

2. Print the first and second columns

s12507@Linux:~$ awk '{print $1,$2}' /tmp/student_record

John Doe

James Davis


3. Print the students in the CS department

s12507@Linux:~$ awk '/\<CS/{print $1,$2}' /tmp/student_record

Al Davis

Rick Marsh

James Adam

Jake Zulu

 

4. Print the records with a GPA above 3.5:

s12507@Linux:~$ awk '$4>3.5' /tmp/student_record

John   Doe     ECE     3.54

James  Davis   ECE     3.71

Sam    Chu     ECE     3.68

Arun   Roy     SS      3.86

Art    Pohm    ECE     4.00

Nabeel Ali     EE      3.56

Tom    Nelson  ECE     3.81

Pat    King    SS      3.77

John   Lee     EE      3.64

Sunil  Raj     ECE     3.86

Diane  Rover   ECE     3.87

Aziz   Inan    EECS    3.75

 

Print only the names of those with a GPA above 3.5:

s12507@Linux:~$ awk '$4>3.5{print $1,$2}' /tmp/student_record

John Doe

James Davis

Sam Chu

Arun Roy

Art Pohm

Nabeel Ali

Tom Nelson

Pat King

John Lee

Sunil Raj

Diane Rover

Aziz Inan

 

5. Print the records whose last name begins with R:

s12507@Linux:~$ awk '$2~/^R/' /tmp/student_record

Ahmad  Rashid  MBA     3.04

Arun   Roy     SS      3.86

Sunil  Raj     ECE     3.86

Charles Right   EECS   3.31

Diane  Rover   ECE     3.87

 

Print only the names:

s12507@Linux:~$ awk '$2~/^R/{print $1,$2}' /tmp/student_record

Ahmad Rashid

Arun Roy

Sunil Raj

Charles Right

Diane Rover

 

6. Print the records whose GPA plus 0.5 is greater than 4:

s12507@Linux:~$ awk '$4+0.5>4' /tmp/student_record

John   Doe     ECE     3.54

James  Davis   ECE     3.71

Sam    Chu     ECE     3.68

Arun   Roy     SS      3.86

Art    Pohm    ECE     4.00

Nabeel Ali     EE      3.56

Tom    Nelson  ECE     3.81

Pat    King    SS      3.77

John   Lee     EE      3.64

Sunil  Raj     ECE     3.86

Diane  Rover   ECE     3.87

Aziz   Inan    EECS    3.75

 

7. Print the lines for CS students with a GPA above 2.5:

s12507@Linux:~$ awk '$3=="CS" && $4>2.5' /tmp/student_record

Al     Davis   CS      2.63

James  Adam    CS      2.77

Jake   Zulu    CS      3.00

 

8. Using a colon as the separator, print the user name and home directory

s12507@Linux:~$ awk -F ':' '1541{print $1,$6}' /etc/passwd

s12486 /home/s12486

s12488 /home/s12488

s12490 /home/s12490

s12492 /home/s12492

s12493 /home/s12493

s12494 /home/s12494

 

9. Students with a GPA of 2.77

s12507@Linux:~$ awk '$4~2.77' /tmp/student_record
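
One caveat about the ~ operator: it performs a regular-expression match, so 2.77 is treated as a pattern in which the dot matches any character and the match may occur anywhere in the field. The sketch below, on invented records, contrasts it with a numeric comparison.

```shell
data=$(printf '%s\n' 'James Adam CS 2.77' 'Ann Bell EE 12.770' 'Cal Dorn CS 2377')

# Regex match: also hits 12.770 (substring) and 2377 (the dot matches "3").
regex_hits=$(printf '%s\n' "$data" | awk '$4 ~ 2.77 {print $1}')

# Numeric comparison: only the record whose GPA equals 2.77.
exact=$(printf '%s\n' "$data" | awk '$4 == 2.77 {print $1}')

# An anchored regex with an escaped dot gives the same single hit.
anchored=$(printf '%s\n' "$data" | awk '$4 ~ /^2\.77$/ {print $1}')

printf '%s\n---\n%s\n---\n%s\n' "$regex_hits" "$exact" "$anchored"
```

On a file where every GPA has the form d.dd the two approaches return the same lines, which is why the shorter form often appears to work.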


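10. Some common built-in variables and special patterns:

NF: the number of fields in the current line

NR: the number of the line (record) currently being processed, e.g. {print NR}, NR==1 ( ), NR>2 ( )

FS: the current field separator; the default is a single space, which splits on runs of blanks and tabs, e.g. BEGIN {FS=":"}

BEGIN: a special pattern whose action runs before any input is read; typically used to set variables

END: a special pattern whose action runs after all input has been read; typically used to print final results

Awk also does arithmetic, so tasks such as summing a column are straightforward.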
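
These pieces fit together in a typical report: set FS in BEGIN, accumulate in the main action, and print a summary in END. The two colon-separated records are invented for this sketch.

```shell
report=$(printf '%s\n' 'al:cs:2.5' 'bo:ee:3.5' | awk '
  BEGIN { FS = ":"; print "start" }     # runs once, before the first line is read
  { print NR, $1, NF; total += $3 }     # runs for every line; total starts out as 0
  END { print "avg", total / NR }       # runs once, after the last line
')

printf '%s\n' "$report"
```

In the END block NR still holds the number of the last record read, which is why it works as the divisor for the average.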

 

Homework (with awk, on /tmp/donors):

1.Print all the phone numbers

2.Print Dan's phone number

3.Print Susan's name and phone number

4.Print all last names beginning with D

5.Print all first names beginning with either a C or E.

6.Print all first names containing only four characters.

7.Print the first names of all those in the 916 area code.

8.Print Main’s campaign contributions. Each value should be printed with a leading dollar sign; e.g., $250 $100 $175.

9.Print second name followed with a comma and first name

10.Print the first and last names of those who contributed more than $100 in the second month.

11.Print the names and phone numbers of those who contributed less than $85 in the last month.

12.Print the names of those who contributed between $75 and $150 in the first month.

13.Print the names of those who contributed less than $800 over the three-month period.

14.Print the names and addresses of those with an average monthly contribution greater than $200.

15.Print the first name of those not in the 916 area code.

16.Print each record preceded by the number of the record.

17.Print the name and total contribution of each person.

18.Add $10 to Chet's second contribution.

19.Change Nancy McNeil's name to Louise McInnes.
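
As a starting point for these exercises, here is a sketch of four of them. The layout of /tmp/donors is not shown in the text, so the code assumes one record per line in the form name:phone:month1:month2:month3; that layout and both records are hypothetical.

```shell
# Made-up records in the assumed layout name:phone:month1:month2:month3.
tmp=$(mktemp -d)
cat > "$tmp/donors" <<'EOF'
Dan Lee:(916) 555-0100:250:100:175
Susan Day:(408) 555-0101:50:195:135
EOF

phones=$(awk -F: '{print $2}' "$tmp/donors")                  # exercise 1
big_second=$(awk -F: '$4 > 100 {print $1}' "$tmp/donors")     # exercise 10
totals=$(awk -F: '{print $1, $3 + $4 + $5}' "$tmp/donors")    # exercise 17
# Exercise 15: split the name field on the space to get the first name.
not_916=$(awk -F: '$2 !~ /^\(916\)/ {split($1, name, " "); print name[1]}' "$tmp/donors")

printf '%s\n' "$phones" "$big_second" "$totals" "$not_916"
rm -rf "$tmp"
```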



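V. Other commands:

1. Counting: wc    -l counts lines, -w counts words, -m counts characters; with no option, wc prints the line, word, and byte counts.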


2. Two-way redirection: tee  (writes its input to a file and also to standard output)    -a appends to the file instead of overwriting it.


3. File splitting: split    -b size of each piece (e.g. 200K)    -l number of lines per piece (e.g. 10)


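4. De-duplication: uniq    -c prefixes each line with the number of times it repeats (uniq only collapses adjacent duplicates, so sort the input first)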
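
The first four commands combine naturally in one pipeline. This sketch runs on a made-up six-line file; the tr and awk calls only normalize the padding that wc and uniq -c add to their numbers.

```shell
tmp=$(mktemp -d)
printf '%s\n' pear apple pear fig apple pear > "$tmp/fruit.txt"

# wc -l: count the lines.
line_count=$(wc -l < "$tmp/fruit.txt" | tr -d ' ')

# tee saves the sorted list to a file while passing it on to uniq -c.
counts=$(sort "$tmp/fruit.txt" | tee "$tmp/sorted.txt" | uniq -c | awk '{print $1, $2}')

# tee -a appends to the file instead of overwriting it.
echo kiwi | tee -a "$tmp/sorted.txt" > /dev/null
sorted_lines=$(wc -l < "$tmp/sorted.txt" | tr -d ' ')

# split -l 2 writes pieces of two lines each, named part_aa, part_ab, ...
split -l 2 "$tmp/fruit.txt" "$tmp/part_"
pieces=$(ls "$tmp"/part_* | wc -l | tr -d ' ')

printf '%s\n' "$line_count" "$counts" "$sorted_lines" "$pieces"
rm -rf "$tmp"
```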


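5. Character conversion:
tr translates characters; -d deletes the listed characters
col (with -x) and expand convert tabs to spaces
join  joins related lines: join -t ':' -1 4 A.txt -2 3 B.txt uses : as the separator and joins the lines where field 4 of A.txt equals field 3 of B.txt (both files must be sorted on the join field)
paste  pastes the corresponding lines of several files side by side, separated by tabs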
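
Each of these tools can be tried on a line or two of input. The files and field values below are invented; the join call uses the same options as the example above.

```shell
tmp=$(mktemp -d)

upper=$(printf 'hello\n' | tr 'a-z' 'A-Z')          # translate lowercase to uppercase
letters=$(printf 'a1b2c3\n' | tr -d '0-9')          # -d deletes the listed characters
spaced=$(printf 'a\tb\n' | expand -t 4)             # tab becomes spaces up to the next 4-column stop

printf '1\n2\n' > "$tmp/nums.txt"
printf 'x\ny\n' > "$tmp/lets.txt"
pasted=$(paste "$tmp/nums.txt" "$tmp/lets.txt")     # line N of each file, joined by a tab

printf 'a:b:c:k1\nd:e:f:k2\n' > "$tmp/A.txt"        # join key is field 4
printf 'x:y:k1\nz:w:k2\n'     > "$tmp/B.txt"        # join key is field 3
joined=$(join -t ':' -1 4 -2 3 "$tmp/A.txt" "$tmp/B.txt")

printf '%s\n' "$upper" "$letters" "$spaced" "$pasted" "$joined"
rm -rf "$tmp"
```

join prints the shared key first, followed by the remaining fields of the first file and then those of the second.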

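6. Text comparison:
diff: shows the differences between the old and new versions of a file (line by line)
cmp: compares two files (byte by byte)
patch: updates and restores files, based on the output of diff.

diff -Naur old.txt new.txt > A.patch
patch -pN < A.patch    (update)
patch -R -pN < A.patch    (restore; N is a number giving how many leading directory levels to strip from the file names in the patch, 0 for the current directory)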
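
A full round trip, from creating the patch to applying and reverting it, might look like the sketch below. It assumes the patch program is installed and uses two made-up two-line files in a scratch directory.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a\nb\n' > old.txt
printf 'a\nB\n' > new.txt

# Unified diff with file names in the header; diff exits with status 1 when the files differ.
diff -Naur old.txt new.txt > A.patch
added=$(grep -c '^+B' A.patch)

# Whoever receives the patch usually has only the old file.
rm new.txt

patch -p0 < A.patch        # update: old.txt now has the new content
updated=$(cat old.txt)

patch -R -p0 < A.patch     # revert: old.txt is back to the original
reverted=$(cat old.txt)

printf '%s\n---\n%s\n' "$updated" "$reverted"
cd "$OLDPWD" && rm -rf "$tmp"
```

The -u output format matters here: it records the file names, so patch can work out which file to modify without asking.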
