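Created: Larry Wall, 1987
Practical Extraction and Report Language
Perl third-party package website
Recommended book: Learning Perl (the "Llama" book; its Chinese edition is nicknamed the "little camel")
Bioinformatics work calls for learning Perl
Perl environment setup: setting up an Eclipse programming environment for bioinformatics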
Integers
Floating-point numbers
Addition, subtraction, multiplication, division: + - * /
Remainder: %   Exponentiation: **
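A quick sketch of the arithmetic operators in action. The values in the comments follow Perl's documented behavior: `/` always divides in floating point, `int` truncates, and with a positive right operand `%` returns a non-negative remainder.

```perl
use strict;
use warnings;

my $sum       = 7 + 2;       # 9
my $quotient  = 7 / 2;       # 3.5: / does floating-point division
my $int_part  = int(7 / 2);  # 3: int() truncates toward zero
my $remainder = 7 % 2;       # 1
my $power     = 2 ** 10;     # 1024
my $neg_mod   = -7 % 3;      # 2: with a positive right operand the result is non-negative
print "$sum $quotient $int_part $remainder $power $neg_mod\n";
```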
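Characters enclosed in single or double quotes form a string
Single vs. double quotes in Perl: escape sequences such as \t and \n (and variable interpolation) are processed only inside double quotes
\t is a tab
\n is a newline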
print "one day\n";
print "one \"day\"\n"; # escape a double quote
print "one \\day\n"; # escape a backslash
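A minimal sketch contrasting the two kinds of quotes; the variable `$gene` and its value are made up for illustration.

```perl
use strict;
use warnings;

my $gene   = "BRCA1";
my $double = "gene:\t$gene\n";   # \t, \n and $gene are all expanded
my $single = 'gene:\t$gene\n';   # kept literally: backslash-t, $gene, backslash-n
print $double;
print $single, "\n";
print length($double), " ", length($single), "\n";   # 12 14
```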
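q and qq make code clearer by removing the need to escape quotes:
qq(string in "qq") # same as "string in \"qq\""
q(string in 'q') # same as 'string in \'q\''
The () delimiters can be replaced by other paired characters such as <>, {} or [], or by a repeated character such as /, e.g. qq/string in "qq"/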
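The single/double distinction carries over to `q` and `qq`. A small sketch, with an illustrative variable `$name`:

```perl
use strict;
use warnings;

my $name    = "perl";
my $with_qq = qq{she said "use $name"};  # behaves like "...": $name is interpolated
my $with_q  = q{she said 'use $name'};   # behaves like '...': $name stays literal
my $slashes = qq/he said "$name"/;       # any delimiter works if it does not appear in the text
print "$with_qq\n$with_q\n$slashes\n";
```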
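print "hello"." world"."\n"; # . concatenates strings: hello world
print length("hello"); # 5
print substr("ATCCGG",3,3); # start at offset 3 (offsets count from 0, so this is the 4th character) and take 3 characters: CGG
print substr("ATCCGG",3,-3); # start at offset 3 and leave off the last 3 characters; for this 6-character string the result is empty
Perl automatically converts between strings and numbers where an operator needs it:
print 12+"12"; # 24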
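my $str = "aa:bb:cc";
my @split_str = split(/:/, $str, 2);
# split(/separator/, string, maximum number of fields) returns a list
print "@split_str"; # aa bb:cc
chomp removes a trailing newline:
my $str = "aa:bb:cc\n";
chomp $str; # $str is now "aa:bb:cc"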
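A sketch combining `chomp`, `split` with and without a field limit, and `join`; the colon-separated record is invented example data, not a real file format.

```perl
use strict;
use warnings;

my $record = "chr1:1000:A:G\n";
chomp $record;                        # removes only the trailing "\n"
my @all    = split /:/, $record;      # ("chr1", "1000", "A", "G")
my @two    = split /:/, $record, 2;   # ("chr1", "1000:A:G"): at most 2 fields
my $joined = join ",", @all;          # "chr1,1000,A,G"
print scalar(@all), " fields; first two: @two; joined: $joined\n";
```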
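A variable stores data in memory; creating a variable reserves space for it in memory
Scalar variables start with $; declare a variable with my: my $name;
Perl variable naming rules: a name starts with a letter or underscore, followed by letters, digits or underscores, and names are case-sensitive
The assignment operator is =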
my $number = 10;
my $number1 = $number + 20;
my $name = "abc\n";
print $name;
my $love = "I love " . $name;
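A variable created without a value defaults to undef:
my $number = undef;
# equivalent to: my $number;
Note: an empty string "" is defined, so it is not the same as undef
Variable interpolation works only in double-quoted strings, not in single-quoted ones: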
my $name="abc";
print "this is $name\n"; # same as: print "this is " . $name . "\n";
Avoiding ambiguity about where the variable name ends during interpolation:
my $name = "abc";
my $names = "efg";
print "this is $names\n"; # this is efg
print "this is ${name}s\n"; #this is abcs
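A short sketch of the interpolation rules, including arrays and single elements; the names are illustrative.

```perl
use strict;
use warnings;

my $name  = "abc";
my @bases = ("A", "C", "G");
my $s1 = "name=${name}s";     # braces mark where the variable name ends: name=abcs
my $s2 = "bases=@bases";      # arrays interpolate with spaces between elements
my $s3 = "first=$bases[0]";   # single array elements interpolate too
my $s4 = 'no $name here';     # single quotes: nothing is interpolated
print "$s1\n$s2\n$s3\n$s4\n";
```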
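Comparison | Numeric | String |
---|---|---|
Equal | == | eq |
Not equal | != | ne |
Less than | < | lt |
Greater than | > | gt |
Less than or equal | <= | le |
Greater than or equal | >= | ge |
Boolean expressions:
35 == 35.0 # true
"35" eq "35.0" # false (compared as strings)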
my $num = 10;
if($num > 20){
#if true
print "$num is larger than 20\n";
}
print "\$num is $num\n";
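Watch out for Perl's automatic conversion between strings and numbers
Always false: 0, undef, "" (the empty string) and the string "0"; every other value is true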
my $var = 0;
if($var){
print "$var is true\n";
}else{
print "$var is false\n";
}
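To check which values count as false, a small loop can test each candidate. Note that the strings "0.0" and "00" are true, because only the exact string "0" is false; the candidate list is chosen for illustration.

```perl
use strict;
use warnings;

# Candidate values to test; only the first four are false
my @candidates = (0, '0', '', undef, '0.0', '00', ' ', 'a');
my @labels;
for my $v (@candidates) {
    my $shown = defined $v ? "'$v'" : 'undef';
    push @labels, $v ? "$shown:true" : "$shown:false";
}
print join("  ", @labels), "\n";
```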
my $num = 100;
if ($num > 200){
print "\$num > 200\n";
}elsif($num >100){
print "\$num >100\n";
}else{
print "\$num is very small\n";
}
my $num = 100;
if($num > 100 and $num < 200){
print "\$num >100 and \$num <200\n";
}
if(($num > 100 || $num >80) and $num < 200){
print "\$num >80 and \$num <200\n";
}
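All conditions must be true: and / &&
At least one condition must be true: or / ||
When combining conditions, use () to make the evaluation order explicit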
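`&&` binds more tightly than `||` (and the word forms `and`/`or` bind more loosely than both), so parentheses can change the result. A minimal sketch reusing `$num = 100`:

```perl
use strict;
use warnings;

my $num = 100;
# && binds first, so this is read as: $num > 50 || ($num > 80 && $num < 90)
my $implicit = ($num > 50 || $num > 80 && $num < 90) ? 1 : 0;
# Parentheses force the || to be evaluated first
my $explicit = (($num > 50 || $num > 80) && $num < 90) ? 1 : 0;
print "implicit: $implicit, explicit: $explicit\n";   # implicit: 1, explicit: 0
```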
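List: an ordered collection of scalars
Array: a variable that stores a list
Creating an array uses the @ sigil: my @array = (22, "perl");
Lists are written with ():
Expression | Meaning |
---|---|
(1,2,3) | a list of three numbers |
(1,2,3,) | same as above; the trailing comma is ignored |
(1,"perl") | scalar types can be mixed |
() | empty list |
(1..5) | the range operator .. produces consecutive numbers: (1,2,3,4,5) |
(1.5..5.7) | same as above; the fractional parts are dropped |
(1,5..8) | (1,5,6,7,8) |
($m..$n) | the bounds can be variables; the range is determined by the current values of $m and $n |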
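my ($ID, $gene, $num) = ("ID1", "ATTCG", undef);
If the list has more elements than there are variables, the extra elements are ignored (if it has fewer, the leftover variables get undef)
my @giant = 1..1000; # an array of 1000 elements
my @merge = (@giant, @giant); # concatenate two arrays
my $name = "perl";
my @merge1 = (@giant, $name, "perl", @giant); # arrays and scalars can be mixed
my @tiny = (); # empty list
my @rocks = qw(bedrock slate lava); # qw lets you omit the commas and quotes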
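print @rocks; # no separator between elements: bedrockslatelava
print "@rocks\n"; # inside double quotes an array is interpolated with spaces between elements: bedrock slate lava
join combines array elements into one string with a separator:
my @rocks = qw(bedrock slate lava);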
my $join_string = join("\t", @rocks);
print "join: $join_string\n"; # join: bedrock<TAB>slate<TAB>lava
print join("-", @rocks)."\n"; #bedrock-slate-lava
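Expression | Meaning |
---|---|
$array[0] | one element (a scalar) |
$#array | index of the last element (a scalar) |
$array[$#array] | the last element |
@array[0..2] | a slice of several elements (a list) |
my @names = (0..10);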
print "@names[2..5]"; #2 3 4 5
print "@names[(2,4,6)]\n"; #2 4 6
print "@names[2,4,6]\n"; #2 4 6
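A sketch of a few more ways to index an array, including negative indices, which count from the end:

```perl
use strict;
use warnings;

my @names      = (0..10);
my $last_index = $#names;          # 10
my $count      = @names;           # 11: array in scalar context
my $last       = $names[-1];       # 10: negative indices count from the end
my @slice      = @names[2..4];     # (2, 3, 4)
my @tail       = @names[-3..-1];   # (8, 9, 10)
print "$last_index $count $last @slice | @tail\n";
```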
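push adds elements to the end of an array
pop removes the last element, so the original array loses one element
my @names = (0..10);
pop(@names); # discard the last element (10)
pop @names; # same effect without parentheses (discards 9)
my $last = pop(@names);
print "\$last is $last\n"; # 8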
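push(@names, 100); # append 100 to the end
push @names, 100; # same effect
push @names, 1..3; # a list can be appended
my @num = 1..5;
push @names, @num; # an array can be appended, which merges the arrays
unshift adds elements to the beginning of an array
shift removes the first element
my @names = (0..10);
shift @names; # removes 0
my @num = 1..5;
unshift @names, @num; # an array can be added at the beginning, which also merges the arrays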
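Together the four functions let an array act as a queue or a stack. A sketch with made-up read names:

```perl
use strict;
use warnings;

my @queue = ("read1", "read2");
push @queue, "read3";        # add to the end:      read1 read2 read3
unshift @queue, "read0";     # add to the front:    read0 read1 read2 read3
my $first = shift @queue;    # remove from front:   read0
my $last  = pop @queue;      # remove from end:     read3
print "first=$first last=$last remaining=@queue\n";
```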
my @names = (0..10);
for(my $i = 0; $i < @names; $i ++){
print "$i : $names[$i]\n";
}
my @names = (0..10);
# loop directly over the elements, in order
for my $value(@names){
print "value : $value\n";
}
# loop over the indices 0..$#names
for my $i(0..$#names){
print "value: $names[$i]\n";
}
# print the index as well, using a manual counter
my $i=0;
for my $value(@names){
print "$i: $value\n";
$i++;
}
Sorting uses the special variables $a and $b
Sort as strings with cmp:
my @languages = qw(perl python r java);
my @languages_sorted = sort {$a cmp $b} @languages;
print "@languages_sorted\n";
Sort numerically with <=>:
my @names = (3,7,32,1,5,16);
my @names_sorted = sort {$a <=> $b} @names;
print "@names_sorted\n";
If the array mixes strings and numbers, use cmp
Descending order: sort {$b cmp $a} @languages
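Why the comparison operator matters: sorting numbers with `cmp` compares them character by character. A sketch using the same numbers as above:

```perl
use strict;
use warnings;

my @nums       = (3, 7, 32, 1, 5, 16);
my @as_strings = sort { $a cmp $b } @nums;   # character by character
my @as_numbers = sort { $a <=> $b } @nums;   # by numeric value
my @descending = sort { $b <=> $a } @nums;   # swapping $a and $b reverses the order
print "cmp: @as_strings\n<=>: @as_numbers\ndesc: @descending\n";
```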
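Perl's most distinctive feature is that its code is context-sensitive.
Every expression is evaluated either in scalar context or in list context.
my $scalar = "mendeleev"; # assignment to a scalar: scalar context
my @array = ("alpha","beta","gamma","pie"); # list context
my ($perl,$python,$R) = ("perl","python","R"); # list context
my $perl = ("perl","python","R"); # scalar context: the comma operator returns the last element, not the length of the list
print "$perl\n"; # R
my ($perl) = ("perl","python","R"); # list context
print "$perl\n"; # perl: the first element is assigned to the scalar
List context is used for assigning to arrays and to several scalars at once
Use scalar to force scalar context and get an array's length:
my @name = qw(perl R python shell);
print @name,"\n"; # perlRpythonshell
print scalar @name,"\n"; # 4: scalar forces scalar context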
Getting the length of an array and of a string:
my @names = (0..10);
my $num = @names; # array in scalar context: 11
print "$num\n";
my $name = "perl";
print length($name)."\n"; # 4
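A sketch contrasting the ways of getting a count. The `() =` form forces a list assignment, and a list assignment in scalar context evaluates to the number of elements on its right-hand side:

```perl
use strict;
use warnings;

my @langs   = ("perl", "python", "R");
my $count   = @langs;                      # array in scalar context: 3
my ($first) = @langs;                      # list assignment: "perl"
my $n       = () = ("perl", "python", "R");  # count of the list: 3
my $len     = length("perl");              # string length: 4
print "$count $first $n $len\n";
```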
Array: indices map to values; the elements are ordered
Hash: keys map to values (keys are unique, values may repeat); there is no order
my %last_name = ("zhangsan"=>"zhang", "lisi"=>"li", "wangwu"=>"wang");
my %last_name = (
"zhangsan"=>"zhang",
"lisi"=>"li",
"wangwu"=>"wang"
);
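my %last_name = (); # empty hash
my $value = $last_name{"zhangsan"}; # read one value
Note: a single hash value is accessed with $ and {}
Compare accessing array values: $array[0] / @array[0..3]
$last_name{"zhaosi"} = "zhao"; # add a key-value pair
If the key already exists, its old value is overwritten
delete $last_name{"zhaosi"}; # remove the key and its value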
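if (exists $last_name{"zhaosi"}){
print "value exists: \$last_name{'zhaosi'}\n";
}else{
print "value does not exist: \$last_name{'zhaosi'}\n";
}
Why check with exists?
Reading a missing key does not stop the program; it returns undef, and using that undef later triggers "uninitialized value" warnings under use warnings. exists reports whether the key is present at all, even when its value is undef or false.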
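A sketch of how `exists` and `defined` differ, assuming a hash in which one key is stored with an `undef` value (the keys are illustrative):

```perl
use strict;
use warnings;

my %last_name = ("zhangsan" => "zhang", "unknown" => undef);
for my $key ("zhangsan", "unknown", "zhaosi") {
    my $e = exists  $last_name{$key} ? "yes" : "no";
    my $d = defined $last_name{$key} ? "yes" : "no";
    print "$key: exists=$e defined=$d\n";
}
# Testing a missing key with exists/defined does not add it to the hash
print "keys: ", join(",", sort keys %last_name), "\n";
```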
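keys returns all keys of a hash
values returns all values of a hash
Both return lists, in no particular order
my %last_name = ("zhangsan"=>"zhang", "lisi"=>"li", "wangwu"=>"wang");
my @my_key = keys %last_name;
my @my_value = values %last_name;
Getting key-value pairs with while and each: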
my %last_name = ("zhangsan"=>"zhang", "lisi"=>"li", "wangwu"=>"wang");
while (my ($k, $v) = each %last_name){
print "$k => $v \n";
}
Getting key-value pairs with foreach:
my %last_name = ("zhangsan"=>"zhang", "lisi"=>"li", "wangwu"=>"wang");
for my $k ( keys %last_name){
my $v = $last_name{$k};
print "$k => $v \n";
print "another: $k => $last_name{$k}\n";
}
Looping over key-value pairs in sorted key order:
my %last_name = ("zhangsan"=>"zhang", "lisi"=>"li", "wangwu"=>"wang");
for my $k ( sort {$a cmp $b} keys %last_name){
print "another: $k => $last_name{$k}\n";
}
# {$a cmp $b} can be omitted: sort compares as strings by default
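A common bioinformatics use of a hash is counting. This sketch counts bases in a made-up sequence and sorts the keys by count (descending), breaking ties alphabetically:

```perl
use strict;
use warnings;

my $seq = "ATGCGCGATAT";
my %count;
$count{$_}++ for split //, $seq;   # one entry per base; ++ treats a missing value as 0
# Descending by count, then alphabetical for equal counts
my @order = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
print join(" ", map { "$_=$count{$_}" } @order), "\n";   # A=3 G=3 T=3 C=2
```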
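Note: open is used here for text files, i.e. files a plain-text editor such as Notepad can open
open (IN, "D:/out.txt") || die "cannot open file: D:\\out.txt";
# when opening the output handle OUT: >> appends to the file, > overwrites it (or creates a new file)
my $line1 = <IN>; # scalar context: reads only one line, as a string
print OUT $line1 . "\n";
my @lines = <IN>; # list context: each line becomes one element of the list
print OUT "@lines" ;
# rarely done in practice: the whole file is held in memory
# read the data line by line
while(my $line = <IN>){
print OUT $line;
}
close(IN);
close(OUT);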
Perl special variable $!: holds the error message of the last failed system call, e.g. why open failed
open (IN, "
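Perl special variable $_: the default variable ("the usual place"), used when no variable is named
open (IN, "D:/out.txt") || die "cannot open file: D:\\out.txt";
# read the data line by line
while(my $line = <IN>){
print OUT $line;
}
# read the data line by line, same effect: each line is stored in $_
while(<IN>){
print OUT "$_";
}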
close(IN);
close(OUT);
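A sketch of the same read loop using a lexical filehandle and the three-argument form of `open`, a widely recommended alternative to bareword handles like `IN`. To stay self-contained it first writes a small, invented FASTA-style file with the core module File::Temp:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write invented example data to a temporary file
my ($fh_out, $path) = tempfile();
print {$fh_out} ">seq1\nATCG\n>seq2\nGGCC\n";
close $fh_out or die "cannot close $path: $!";

# Three-argument open with a lexical filehandle; $! explains any failure
open(my $in, '<', $path) or die "cannot open $path: $!";
my @headers;
while (my $line = <$in>) {
    chomp $line;
    push @headers, $line if $line =~ /^>/;   # keep only header lines
}
close $in;
unlink $path;
print "headers: @headers\n";
```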
Regular expression rules
# match
my $bar = "I am a robot. Welcome to robot site.";
if ($bar =~ m/robot/){
print "matched robot\n";
}else{
print "did not match robot\n";
}
# transliteration (tr)
# build the reverse complement of a DNA sequence
my $DNA = "ATTGGCCAT";
$DNA =~ tr/ATCG/TAGC/; # complement each base
$DNA =~ tr/A-Z/a-z/; # convert to lowercase
$DNA = reverse($DNA); # reverse the string
print $DNA."\n"; # atggccaat
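The steps above can be wrapped in a reusable subroutine. This sketch (the name `revcomp` is hypothetical) reverses first, then complements upper- and lowercase bases in a single `tr///`:

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of a DNA string, preserving case
sub revcomp {
    my ($dna) = @_;
    (my $rc = reverse $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $rc = revcomp("ATTGGCCAT");
print "$rc\n";   # ATGGCCAAT
```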
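# substitution (each result below assumes the original value of $z)
my $z = "I am a robot\n Welcome to Robot site\n robot";
$z =~ s/robot/AAA/; # replaces only the first robot
$z =~ s/robot/AAA/gi; # modifier g replaces every match, i ignores case
$z =~ s/robot$/AAA/gi;
# $ anchors at the end of the string; without m a string has only one end: I am a robot\n Welcome to Robot site\n AAA
$z =~ s/robot$/AAA/gim;
# modifier m (multiline mode): $ also matches before each newline, so there are several line ends: I am a AAA\n Welcome to Robot site\n AAA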
# capture
my $f = "0/1:20,6:26:99 0/0:20,5:25:99";
$f =~ /([01]\/[01])/; # capture the first genotype, 0/1
my $value = $1;
print $value."\n";
# capture every genotype: 0/1 and 0/0
my $f = "0/1:20,6:26:99 0/0:20,5:25:99";
my @gene = ($f =~ /([01]\/[01])/g); # list context: returns all captures
print "@gene\n";
# capture and count
my $f = "0/1:20,6:26:99 0/0:20,5:25:99";
my $num = () = $f =~ /([01]\/[01])/g; # a list assignment in scalar context gives the number of matches: 2
print "$num\n";
# capture and substitute
my $f = "0/1:20,6:26:99 0/1:20,5:25:99";
$f =~ s/([01])\/([01])/$2\/$1/g; # swap the two alleles: every 0/1 becomes 1/0
print $f."\n";
my $DNA = "AATAAT AAGCCG";
$DNA =~ s/AA(.)(.)\2\1/GGGG/g;
# inside the pattern (the first pair of slashes), refer back to earlier captures directly with \1, \2
print $DNA."\n"; # GGGG GGGG
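In scalar context, `//g` resumes each search where the previous one stopped, so a `while` loop can visit every match. This sketch tallies genotypes in a hash; the four-sample string is invented data:

```perl
use strict;
use warnings;

my $f = "0/1:20,6:26:99 0/0:20,5:25:99 1/1:3,30:33:99 0/1:15,12:27:99";
my %genotype_count;
# Each pass of the loop picks up the next genotype after the previous match
while ($f =~ /([01])\/([01])/g) {
    $genotype_count{"$1/$2"}++;
}
print join(", ", map { "$_ x$genotype_count{$_}" } sort keys %genotype_count), "\n";
```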
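References make it possible to read and store complex data structures
ref($refname) returns the type of reference, e.g. SCALAR, ARRAY or HASH
# reference to a scalar
my $value = "ATCG";
my $valueRef = \$value;
# dereference
my $deref = ${$valueRef};
print "$value\n$valueRef\n$deref\n"; # $valueRef prints like SCALAR(0x...)
# anonymous reference
$valueRef = \"ATCG";
print "$valueRef\n${$valueRef}\n";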
# reference to an array
my @seq = qw(ATCG GCTA AAGG);
my $arrayRef = \@seq;
# dereference
my @deref = @{$arrayRef};
print "$arrayRef\n@deref\n";
# access array elements through the reference
print $arrayRef->[0], $arrayRef->[1], $arrayRef->[2]."\n";
# anonymous array reference
my $seq = ["ATTTAC", "AACCGG","GGCCTT"]; # note: [] not ()
print $seq->[0], $seq->[1], $seq->[2]."\n";
# multi-dimensional array (an array of array references)
my $seqs = [["ATTTAC", "AACCGG","GGCCTT"],["ATTTAC", "AACCGG","GGCCTT"],["ATTTAC", "AACCGG","GGCCTT"]];
print $seqs."\n"; # ARRAY(0x...)
print $seqs->[0], $seqs->[1], $seqs->[2]."\n"; # the three inner references
print $seqs->[0][1], $seqs->[1]->[0], $seqs->[2]."\n"; # the arrow between subscripts is optional
# reference to a hash
use Data::Dumper;
my %hash = ("k1"=>"abc1","k2"=>"abc2","k3"=>"abc3");
my $refhash = \%hash;
# dereference
my %deref = %{$refhash};
print $refhash."\n";
print Dumper (\%deref)."\n"; # print the hash contents
# two ways to loop over the key-value pairs
for my $k(keys %{$refhash}){
print "key: $k\t$$refhash{$k}\n";
}
for my $k(keys %$refhash){
# when dereferencing a simple scalar variable, the {} can be omitted
print "key: $k\t$refhash->{$k}\n";
}
# anonymous hash reference
my $refhash1 = {"k1"=>"abc1","k2"=>"abc2","k3"=>"abc3"}; # note: {} not ()
print $refhash1."\n";
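References let a hash hold arrays. A sketch of a hash of arrays (the gene names and sequences are hypothetical) that also uses `ref` to report each value's type:

```perl
use strict;
use warnings;

# Hypothetical data: gene name => reference to an array of sequences
my %seqs_by_gene = (
    geneA => ["ATTTAC", "AACCGG"],
    geneB => ["GGCCTT"],
);
push @{ $seqs_by_gene{geneB} }, "TTAAGG";   # dereference to grow the inner array
for my $gene (sort keys %seqs_by_gene) {
    my $list = $seqs_by_gene{$gene};          # an array reference
    print "$gene (", ref($list), "): ", scalar(@$list), " seqs, first=$list->[0]\n";
}
print ref(\%seqs_by_gene), "\n";             # HASH
```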