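Common Linux Commands for Text Processing

Commands commonly used on Linux to view, manipulate, and summarize text:

• head/tail, cat/tac, less/more
• wc, sort, uniq
• cut, paste

Type these commands out repeatedly until they become muscle memory; fluency with them pays off in later bioinformatics analysis. Each command is described in detail below (these are mainly my notes from following Teacher Guo (小郭老师) of 生信技能树).

The examples below walk through the common options and usage of the 10 commands: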

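1. cat: concatenate files and print their contents to the screen

Common options

-A    ## show all characters, including special ones such as tabs
-n    ## number every output line
-b    ## number only non-blank lines

How the options are used:

#### -A option
cat readme.txt       ## without -A, for comparison
cat -A readme.txt    ## -A makes whitespace characters visible, e.g. the end of each line is shown as $
# An example that shows a tab
cat > hope           ## redirect what you type into a file named hope
wish to master Linux very well 66    ## the gap before 66 was typed with the Tab key
^C                   ## press Enter, then Ctrl-C
cat -A hope          ## view it with -A
wish to master Linux very well^I66$    ## ^I marks the tab and $ marks the end of the line
\t    ## the escape sequence for a tab
\n    ## the escape sequence for a newline

#### -n option
cat readme.txt       ## show the file as-is
cat -n readme.txt    ## with -n, each line is prefixed with its line number

#### -b option: check the help text for details
cat --help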
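
As a small illustration of the difference between -n and -b, the sketch below numbers a three-line sample whose middle line is blank. The temporary file and its contents are made up for this example.

```shell
# Hypothetical sample file: three lines, the middle one blank.
sample=$(mktemp)
printf 'first\n\nthird\n' > "$sample"

# -n numbers every line, including the blank one.
all_numbered=$(cat -n "$sample")
# -b numbers only the non-blank lines.
nonblank_numbered=$(cat -b "$sample")

printf '%s\n' "--- cat -n ---" "$all_numbered" "--- cat -b ---" "$nonblank_numbered"
rm -f "$sample"
```

With -n the last line is numbered 3; with -b it is numbered 2, because the blank line is skipped when counting.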

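Common usage

cat > file    # note the >: the text typed below is saved to file; if file does not exist it is created, so this is effectively "create and edit"
welcome to Biotrainee!    # the text you type
^C            ## press Ctrl-C to exit

cat file      ## view what was written
#welcome to Biotrainee!

cat readme.txt
cat -n readme.txt

## Redirection (editing a file)
cat > file    ### if file already has content, the redirect empties it first
Welcome to Biotrainee() !
^C            ## press Ctrl-C to exit
## View the result
cat file
Welcome to Biotrainee() !
cp /trainnee2/Aug1/readme.txt ./    ## copy readme.txt from /trainnee2/Aug1 into the current directory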

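Notes:

cat > file

When you finish typing, start a new line and then press Ctrl-C to exit. A line is only handed to cat once Enter is pressed, so a half-typed line would be lost. (Ctrl-D, which signals end of input, is the more conventional way to finish.)

Only the line currently being typed can be edited. On the terminal used in the course, Backspace only worked while holding Ctrl, unlike in normal use, and the same applied to the up and down arrow keys.

If you are on the third line and notice a mistake in the first line, you cannot go back and fix it. You have to redirect again, retype the first line, and paste in the rest.

[Figure: viewing and editing text with cat]

cat can both view and write text. To write text, add >; to exit, start a new line and press Ctrl-C.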
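
When a typo in an earlier line would otherwise mean starting over, the same file can be written non-interactively with a here-document. This is an alternative that the course notes do not cover; the sketch below writes to a throwaway temporary file, and the name EOF is just a conventional end marker.

```shell
# Write two lines into a temporary file without interactive typing.
outfile=$(mktemp)
cat > "$outfile" <<'EOF'
Welcome to Biotrainee() !
Have a fun with it.
EOF

# Show what was written and count the lines.
cat "$outfile"
line_count=$(wc -l < "$outfile" | tr -d ' ')
echo "lines written: $line_count"
```

Because the text lives in the script, a mistake can be corrected in an editor and the command rerun.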

Other:

zcat prints compressed text files; tac prints a file in reverse line order

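2. head and tail

Often combined with the pipe |

head / tail -n: show the first / last n lines of a file; the default is 10 (in R, head and tail default to 6).

## Show the first 20 lines
head -n 20 Data/example.fq
## Show the last 10 lines of .bashrc
tail ~/.bashrc
## Show only line 20: take the first 20 lines, then the last one of those
head -n 20 Data/example.fq | tail -n 1
## | is the pipe: the output of the command on its left becomes the input of the command on its right (the R equivalent is %>%)

ls Data
cat Data/example.gtf | head -n 3    ## first 3 lines; long lines wrap on a narrow screen, so the output looks messy
cat Data/example.gtf | tail -n 3    ## last 3 lines

#### The pipe is not required here
head -n 3 Data/example.gtf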
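
The head | tail combination can be tried without any data file. In this sketch, seq 100 stands in for a 100-line file (an assumption made only so the example is self-contained).

```shell
# Line 20 only: keep the first 20 lines, then the last one of those.
line_20=$(seq 100 | head -n 20 | tail -n 1)
echo "$line_20"

# Lines 18 to 20: keep the first 20 lines, then the last three of those.
lines_18_to_20=$(seq 100 | head -n 20 | tail -n 3)
echo "$lines_18_to_20"
```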

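3. less and more

less is used far more often than more

3.1 less

Common options:

-N    ## show line numbers
-S    ## do not wrap long lines (one line of text per screen row)

zless    ## view compressed files

Usage:

less [options] filename

Key bindings in less, part 1:

While in less:

• Arrow keys: move through the text

• Enter: move down one line

• Space: next page

• q: quit

Note: these shortcuts only work with the keyboard in plain English input mode; explore the other keys when you have time

To leave less, always press q

Key bindings in less, part 2:

less Data/example.gtf
## Shows part of the file in a full-screen view, as if a new window had opened.
## Do not press keys at random, since each key has a function. A few simple ones:
## Space pages down, and the up/down arrows scroll up and down.
## The left/right arrows are mainly useful together with -S (see below).

Press / to search: type the search term, e.g. gene, and press Enter. Lowercase n jumps to the next match and uppercase N (Shift+n) jumps to the previous one. Press q to quit.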

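Using the options:

#### -N: show line numbers
less Data/example.gtf       ## first without options, for comparison; remember to press q to quit
less -N Data/example.gtf    ## -N prefixes each line with its number

#### -S: one line per row
less -S Data/example.gtf    ## neatly aligned; up/down scroll, left/right scroll sideways, Space pages, and / still searches

#### -N and -S can be combined
less -NS Data/example.gtf   ## line numbers plus unwrapped lines

[Figure: file viewed with less, no options]

[Figure: file viewed with less -N, showing line numbers]

[Figure: file viewed with less -S, one line per row, neatly aligned]

Comparison: showing line numbers in less versus cat: less -N, cat -n

less -N Data/example.fq
zless -N Data/reads.1.fq.gz

cat -n Data/example.fq
## note the difference in case between the two options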

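3.2 more

Options: none covered here
Usage: more filename

more pages through a file: Space advances one page and Enter advances one line. more is rarely used; mastering less is enough

more Data/example.gtf    ## shows progress while viewing; press Space to page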

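4. zless and zcat

View compressed files

ls Data
## two compressed plain-text files are listed in red
zless Data/reads.1.fq.gz
## zless decompresses the text on the fly and otherwise behaves like less, including its key bindings. On the course server, less is configured so that it can also open compressed text
zless -N Data/reads.1.fq.gz
less -N Data/reads.1.fq.gz    # works because of the server's less configuration

cat Data/reads.1.fq.gz     ## prints garbage: cat cannot display compressed text directly
zcat Data/reads.1.fq.gz    ## zcat floods the screen, making earlier output (and commands) hard to find

zcat Data/reads.1.fq.gz | less    ## pipe the output into less instead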

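5. wc: count lines, words, and bytes

Common options:

-l    ## count lines
-w    ## count words (whitespace-separated strings)
-c    ## count bytes

Common usage

cat -n readme.txt
cat readme.txt | wc    ## with no options: lines, words, and bytes
wc -l readme.txt

[Figure: wc usage]

cat -n readme.txt
wc -l readme.txt
cat readme.txt | wc -l
wc -w readme.txt
wc -c readme.txt

##### How wc and cat interact:
cat > file
hello world
^C    ## press Enter, then Ctrl-C to exit
wc file
#1  2 13 file    ## 1 line, 2 words, 13 bytes; the space counts as a character, and so does the newline \n, even though it is invisible

wc includes the newline in its count; other tools do not necessarily count it.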
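
The newline's contribution can be checked directly. In this sketch printf produces exactly the 11 characters of "hello world", with or without one trailing newline, so the byte count is 12 or 11; a larger count for the same visible text would point to an extra invisible byte such as a trailing space.

```shell
# "hello world" is 11 characters; the \n adds one more byte.
with_newline=$(printf 'hello world\n' | wc -c | tr -d ' ')
without_newline=$(printf 'hello world' | wc -c | tr -d ' ')
echo "with newline: $with_newline, without: $without_newline"

# wc -l counts newline characters, so text with no final newline reports 0 lines.
lines_without_newline=$(printf 'hello world' | wc -l | tr -d ' ')
echo "lines without trailing newline: $lines_without_newline"
```

The tr -d ' ' step only strips the padding that some wc implementations put in front of the number.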

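6. cut: extract columns from text

Common options:

-d    ## set the field delimiter (a single character); the default is a tab, \t
# the part before the first delimiter is column 1, the part after it is column 2, and so on
-f    ## which columns (fields) to output

Common usage:

less -S Data/example.gtf    ### a table
less -S Data/example.gtf | cut -f 1    ## extract column 1; this only prints it and does not modify the file
## to see only the first few lines of that column, pipe the result into head
less -S Data/example.gtf | cut -f 1,3-5,7    ### keep column 1, columns 3 to 5, and column 7
less -S Data/example.gtf | cut -f 1,3-5
less -S Data/example.gtf | cut -d 'h' -f 1   ## the delimiter is the single character h, so a line that starts with chr yields just c

[Figure: common cut usage]

Remember: when viewing with less, press q to quit

cat readme.txt
# Welcome to Biotrainee()!
# This is your personal account in our Cloud.
# Have a fun with it.
# Please feel free to contact with me(email to [email protected])
# (http://www.biotrainee.com/thread-1376-1-1.html)
cat readme.txt | cut -d "h" -f 1
# Welcome to Biotrainee()!        ## no "h" on this line, so the whole line is printed
#   T
# Have a fun wit                  ## matching is case-sensitive: the capital H is not a delimiter
# Please feel free to contact wit
# (
cat readme.txt | cut -d "h" -f 1,2    ## keep fields 1 and 2
# Welcome to Biotrainee()!
#   This is your personal account in our Cloud.    ## the h in This is kept, since it joins fields 1 and 2
# Have a fun with it.
# Please feel free to contact with me(email to [email protected])
# (http://www.biotrainee.com/t
cat readme.txt | cut -d "h" -f 2
# Welcome to Biotrainee()!
#   is is your personal account in our Cloud.
# it.
# me(email to [email protected])
# ttp://www.biotrainee.com/t

#### Compare what cut -d "h" -f 1 and cut -d "h" -f 2 print for the line "Have a fun with it."

cut splits each line on the delimiter character, not on a keyword

[Figure: example of cut usage]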
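
The same behaviour can be reproduced on a tiny inline table. This is a minimal sketch: the tab-separated rows are made up to resemble a GTF file, and the -s option (suppress lines that contain no delimiter) is shown as an extra that the notes do not mention.

```shell
# Made-up tab-separated table standing in for a GTF-like file.
table=$(printf 'chr1\tgene\t100\t200\nchr1\texon\t100\t150\nchr2\tgene\t300\t400\n')

# Columns 1 and 3, first two rows only.
first_rows=$(printf '%s\n' "$table" | cut -f 1,3 | head -n 2)
printf '%s\n' "$first_rows"

# With -d h, the text before the first h is field 1 and the text after it is field 2.
field_1=$(printf 'Have a fun with it.\n' | cut -d h -f 1)
field_2=$(printf 'Have a fun with it.\n' | cut -d h -f 2)
printf '[%s] [%s]\n' "$field_1" "$field_2"

# A line without the delimiter is printed whole unless -s is given.
no_delim=$(printf 'Welcome\n' | cut -d h -f 2)
no_delim_suppressed=$(printf 'Welcome\n' | cut -s -d h -f 2)
printf '[%s] [%s]\n' "$no_delim" "$no_delim_suppressed"
```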

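7. sort: sorting

Common options:

-n    ## sort numerically, smallest first
-V    ## when strings contain numbers, order them by numeric value
-r    ## reverse the order
-k    ## choose the field to sort on
-t    ## set the field separator

Common usage:

less -S Data/example.gtf | sort -k 3 | less -S
### -k 3 sorts on the text from column 3 to the end of the line (use -k 3,3 for column 3 only). Piping into less -S at the end makes the result easier to read. The default is lexical (dictionary) order; see the sort documentation
less -S Data/example.gtf | sort -k 4 | less -S
## before sorting, 1737 comes first; after sorting, 100816 comes first. This is lexical order: the values are compared character by character from the left

Before sorting:

[Figure: before sorting]

After sorting:

[Figure: after sorting]

[Figure: how lexical sorting compares values]

Explanation: the first characters, 1 and 1, are equal; the second characters are 0 and 7, and 0 sorts before 7. So in lexical order 100816 is smaller than 1737 and comes first.

When a table is viewed on the command line, columns can appear misaligned because of how tabs line up. Lexical order can also differ from what we expect, so add -n to sort by numeric value.

less -S Data/example.gtf | sort -n -k 4 | less -S

[Figure: misaligned columns caused by tabs]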
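
The difference between lexical and numeric order is easy to reproduce with the two values discussed above. The second half of this sketch sorts a made-up tab-separated table on its second column only; the table contents and variable names are illustrative.

```shell
# The two start coordinates from the discussion above.
lexical=$(printf '1737\n100816\n' | sort)
numeric=$(printf '1737\n100816\n' | sort -n)
printf 'lexical:\n%s\nnumeric:\n%s\n' "$lexical" "$numeric"

# Sort a tab-separated table numerically on column 2 only.
# -t sets the separator; -k 2,2n restricts the key to field 2 and compares it as a number.
tab=$(printf '\t')
by_column=$(printf 'b\t1737\na\t100816\nc\t5\n' | sort -t "$tab" -k 2,2n)
printf '%s\n' "$by_column"
```

Lexical order puts 100816 first; numeric order puts 1737 first.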

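8. uniq: remove duplicate lines

Common options:

-c    ## prefix each line with the number of consecutive times it occurred

uniq only collapses duplicates that are adjacent. Two identical lines that are not next to each other are both kept.

uniq is therefore usually combined with sort

less -S Data/example.gtf | cut -f 1-5    ## take only the first 5 columns
less -S Data/example.gtf | cut -f 1-5 | wc -l    ## count how many lines there are
#237
less -S Data/example.gtf | cut -f 1-5 | uniq
less -S Data/example.gtf | cut -f 1-5 | uniq | wc
 #   220    1100    6698
less -S Data/example.gtf | cut -f 1-5 | uniq | wc -l
## 220: 220 lines remain after removing duplicates. With -l only the line count is printed; without it wc prints lines, words, and bytes
### Which lines were collapsed? Add -c
less -S Data/example.gtf | cut -f 1-5 | uniq -c
# shows how many times each line repeats, counting adjacent repeats only
less -S Data/example.gtf | cut -f 3 | sort | uniq -c
# shows how many times each value occurs overall
less -S Data/example.gtf | cut -f 1-5 | sort | uniq | wc
# sort, remove duplicates, then count
#220    1100    6698

[Figure: counting occurrences]

Try another example

less -S Data/example.gtf | cut -f 3    # take column 3 of this file
less -S Data/example.gtf | cut -f 3 | uniq
# the full unsorted list is shown; only adjacent repeats are merged
less -S Data/example.gtf | cut -f 3 | sort | uniq
# CDS
# UTR
# exon
# gene
# start_codon
# stop_codon
# transcript

less -S Data/example.gtf | cut -f 3 | sort | uniq -c
# 29 CDS
# 27 UTR
# 111 exon
# 20 gene
# 7 start_codon
# 9 stop_codon
# 34 transcript
# counts how many times each value occurs

In short, uniq is almost always used together with sort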
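
Why the sort step matters can be seen on four lines. In this sketch the feature names are made up, with one "exon" separated from the other two; the final sort by count is an optional extra step, not something the notes require.

```shell
# Made-up feature column with a non-adjacent repeat of "exon".
features=$(printf 'exon\ngene\nexon\nexon\n')

# Without sorting, only the adjacent pair of "exon" lines is merged.
unsorted_count=$(printf '%s\n' "$features" | uniq | wc -l | tr -d ' ')
# After sorting, all repeats are adjacent, so each value appears once.
sorted_count=$(printf '%s\n' "$features" | sort | uniq | wc -l | tr -d ' ')
echo "uniq alone: $unsorted_count lines; sort | uniq: $sorted_count lines"

# Frequency table, most frequent value first.
printf '%s\n' "$features" | sort | uniq -c | sort -k 1,1nr
top_feature=$(printf '%s\n' "$features" | sort | uniq -c | sort -k 1,1nr | head -n 1 | awk '{ print $2 }')
echo "most frequent: $top_feature"
```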

9. paste: merge text

Common options:

-d    ## set the delimiter (default is a tab: \t)
-s    ## serial mode: join all lines of each file into a single line (without -s, paste puts the files side by side as columns, much like cbind in R)

Common usage 1:

paste file1 file2

cat > file
1
2
3
4
5
6
^C
cat file
# 1
# 2
# 3
# 4
# 5
# 6
paste file readme.txt
# 1 Welcome to Biotrainee()!
#  2    This is your personal account in our Cloud.
# 3 Have a fun with it.
# 4 Please feel free to contact with me(email to [email protected])
# 5 (http://www.biotrainee.com/thread-1376-1-1.html)
# 6
paste -d "." file readme.txt
# 1.Welcome to Biotrainee()!
#   2.This is your personal account in our Cloud.
# 3.Have a fun with it.
# 4.Please feel free to contact with me(email to [email protected])
# 5.(http://www.biotrainee.com/thread-1376-1-1.html)
# 6.

-s: serial merge demonstration

# each file becomes one line
paste -s file readme.txt
# 1 2   3   4   5   6
# Welcome to Biotrainee()!  This is your personal account in our Cloud. Have a fun with it. Please feel free to contact with me(email to [email protected])    (http://www.biotrainee.com/thread-1376-1-1.html)
cat > file2
a
b
c
d
f
e
^C
cat file
cat file2
paste -s file file2
# 1 2   3   4   5   6
# a b   c   d   f   e

Common usage 2: paste - -

## give paste two - arguments; each - takes the next line from standard input
seq 20
seq 20 | paste - -
# 1 2
# 3 4
# 5 6
# 7 8
# 9 10
# 11    12
# 13    14
# 15    16
# 17    18
# 19    20
# paste - - turns every two input lines into one row with two columns

### three: - - -
seq 20 | paste - - -
# 1 2   3
# 4 5   6
# 7 8   9
# 10    11  12
# 13    14  15
# 16    17  18
# 19    20
# every three lines become one row with three columns

### more - can be added in the same way (for the nitpickers)
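
A compact sketch of both modes follows. The FASTQ-like records at the end are made up; putting each four-line record on one row is shown only as one plausible use of repeated - arguments, not as something the course prescribes.

```shell
# Two dashes: consecutive input lines are paired into two tab-separated columns.
pairs=$(seq 6 | paste - -)
printf '%s\n' "$pairs"

# -s joins all input lines into one line; -d sets the joining character.
joined=$(seq 5 | paste -s -d , -)
echo "$joined"

# Hypothetical FASTQ-like input: four lines per record become one row each.
records=$(printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | paste - - - -)
record_count=$(printf '%s\n' "$records" | wc -l | tr -d ' ')
echo "records: $record_count"
```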

10. tr: replace characters

Common options:

-d    ## delete the given characters
-s    ## squeeze runs of a repeated character into one (used less often, so not covered; look it up and experiment)

Common usage:

cat readme.txt | tr "a" "A"    # replace lowercase "a" with uppercase "A"
# Welcome to BiotrAinee()!
#   This is your personAl Account in our Cloud.
# HAve A fun with it.
# PleAse feel free to contAct with me(emAil to [email protected])
# (http://www.biotrAinee.com/threAd-1376-1-1.html)

cat readme.txt | tr "abc" "ABC"    ### characters are mapped one to one by position: a to A, b to B, c to C

#### Delete a single character
## -d deletes the given character, e.g. lowercase a
cat readme.txt | tr -d "a"
# Welcome to Biotrinee()!
#   This is your personl ccount in our Cloud.
# Hve  fun with it.
# Plese feel free to contct with me(emil to [email protected])
# (http://www.biotrinee.com/thred-1376-1-1.html)

#### Several unwanted characters can be deleted at once
cat readme.txt | tr -d "abc"
# Welome to Biotrinee()!
#   This is your personl ount in our Cloud.
# Hve  fun with it.
# Plese feel free to ontt with me(emil to [email protected])
# (http://www.iotrinee.om/thred-1376-1-1.html)
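
For a quick self-contained check, the sketch below covers replacement with a character range, deletion, and the -s option that the notes leave as an exercise. The input strings are short stand-ins chosen for this example.

```shell
# Map a range of characters: lowercase to uppercase.
upper=$(printf 'chr1\n' | tr 'a-z' 'A-Z')
# Delete every a, b, and c.
deleted=$(printf 'Welcome to Biotrainee()!\n' | tr -d 'abc')
# Squeeze each run of spaces into a single space.
squeezed=$(printf 'gene_id    ENSG\n' | tr -s ' ')
printf '%s\n' "$upper" "$deleted" "$squeezed"
```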

Additional notes:

Make it a habit not to modify raw data; save modified output separately, using > (redirect) or >> (append)

###### > redirect to a new file
### save the modified text with >; the original file is left untouched
cat readme.txt | tr "a" "A" > file.txt
cat file.txt
cat readme.txt
### the target of > must have a different name from the input, otherwise the original is overwritten

###### >> append to a file
### >> appends: the new output is added to the end of the target file
cat file    ## look at file first
# 1
# 2
# 3
# 4
# 5
# 6
cat readme.txt | tr "a" "A" >> file
cat file
# 1
# 2
# 3
# 4
# 5
# 6
# Welcome to BiotrAinee()!
#   This is your personAl Account in our Cloud.
# HAve A fun with it.
# PleAse feel free to contAct with me(emAil to [email protected])
# (http://www.biotrAinee.com/threAd-1376-1-1.html)
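
The two operators can be compared safely in a scratch directory. In this sketch the directory and the file names raw.txt and modified.txt are hypothetical; nothing outside the temporary directory is touched.

```shell
workdir=$(mktemp -d)
printf 'raw data\n' > "$workdir/raw.txt"

# > creates (or empties) the target; the input file is left untouched.
tr 'a' 'A' < "$workdir/raw.txt" > "$workdir/modified.txt"
# >> adds to the end of the target instead of replacing it.
printf 'second run\n' >> "$workdir/modified.txt"

raw_after=$(cat "$workdir/raw.txt")
modified_after=$(cat "$workdir/modified.txt")
printf 'raw.txt:\n%s\nmodified.txt:\n%s\n' "$raw_after" "$modified_after"
```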

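Summary from Teacher Guo (小郭老师) of 生信技能树:

[Figure: summary of common Linux text commands, from Teacher Guo's slides]

Other:

Of the 10 commands covered today, the first five are the ones used most often to view text. less is the most used of all because it is so capable: once inside less, many keys have a function, so do not press keys at random, and look up the rest of its usage later.

There is a lot of material today and it need not be memorized immediately. The goal is that when a task comes up, the matching command comes to mind, and several commands can then be chained with the pipe | (data passes from one command to the next through |).

wc counts the newline as well; a newline occupies one character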

Exercises:

1. View example.gtf with less, then pipe it to wc

less -S Data/example.gtf | wc
#237    6944   77781

2. Extract column 9 of example.gtf

less -S Data/example.gtf | cut -f 9 | less -S
## fields are separated by tabs, \t

3. Building on step 2, extract the first semicolon-separated field

less -S Data/example.gtf | cut -f 9 | cut -d ";" -f 1
# gene_id "ENSG00000184731"
# gene_id "ENSG00000184731"
# gene_id "ENSG00000184731"

4. Building on step 3, sort, remove duplicates, and count

## less -S Data/example.gtf | sort | cut -d ";" -f 9 | wc    # my own attempt
less -S Data/example.gtf | cut -f 9 | cut -d ";" -f 1 | sort | uniq -c    ## model answer
##   237     474    2133

5. Building on step 4, replace spaces with tabs

## my own attempt (incorrect: "\t" reaches cut as two characters, not as a tab)
less -S Data/example.gtf | sort | cut -d "\t" -f 9 | wc
## model answer
less -S Data/example.gtf | cut -f 9 | cut -d ";" -f 1 | sort | uniq -c | tr ' ' '\t'
## cat -A makes the tabs visible
less -S Data/example.gtf | cut -f 9 | cut -d ";" -f 1 | sort | uniq -c | tr ' ' '\t' | cat -A
^I^I^I^I^I^I8^Igene_id^I"ENSG00000177693"$
^I^I^I^I^I15^Igene_id^I"ENSG00000184731"$
^I^I^I^I^I^I3^Igene_id^I"ENSG00000221311"$
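
Because uniq -c pads its counts with leading spaces, replacing every space produces the run of leading tabs visible in the cat -A output above. If a clean three-column table is wanted, one option (an alternative, not the course answer) is to let awk rebuild each line with a tab as the output separator. The input lines in this sketch are made up.

```shell
# Made-up stand-in for the gene_id field extracted in step 3.
ids=$(printf 'gene_id "B"\ngene_id "A"\ngene_id "B"\n')

# Assigning $1 = $1 makes awk rebuild the line using OFS (a tab),
# which drops the leading padding and turns each gap into a single tab.
counts=$(printf '%s\n' "$ids" | sort | uniq -c | awk 'BEGIN { OFS = "\t" } { $1 = $1; print }')
printf '%s\n' "$counts"
```

Each output row then has exactly three tab-separated fields: the count, gene_id, and the quoted identifier.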

Extra tips:

Clear the screen: Ctrl-L

Show previously entered commands: history
