Sorting a document by a specific field with awk
----------------------------------------------------

1. Problem definition

Suppose we need to sort the following document by a specific field:
lemo@debian:~/Testspace/awk$ cat file 
Name    Sex Salary  
Lemo    man 4000  
Jok woman   3000  
Job man 6000  
Petty   man 9000

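I want to sort it by Salary, from highest to lowest. The first idea might be a database, but that means importing the data into MySQL, SQLite or the like. When no database is at hand, we can use awk instead. awk is a small programming language that is very well suited to processing structured, column-oriented text like this.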
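As a minimal sketch of why awk fits this kind of data: it splits every input line into fields on runs of blanks, so each column is directly addressable as `$1`, `$2`, `$3`. The example assumes the sample data is saved in a file called `file` (as in the transcript above); the shell variable `pairs` is just an illustrative name.

```shell
# Recreate the sample input shown above (the name "file" follows the article).
printf '%s\n' 'Name    Sex Salary' 'Lemo    man 4000' 'Jok woman   3000' \
    'Job man 6000' 'Petty   man 9000' > file

# awk splits each line on runs of blanks: $1 is Name, $2 is Sex, $3 is Salary.
# NR > 1 skips the header line.
pairs=$(awk 'NR > 1 { print $1, $3 }' file)
printf '%s\n' "$pairs"
```

This prints each name with its salary, one per line (`Lemo 4000`, `Jok 3000`, ...), without any database involved.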


2. Code walkthrough

# @auth lemo.lu
# @date 2011.11.16
# Sorts the records by the field whose name is passed in fldName.
# Requires gawk: asort() is a gawk extension.
# BEGIN block
BEGIN {
    print "BEGIN SORT"
}
##############################################
# NR - number of the current record, starting from 1
# NF - number of fields in the current record
# $0 - the whole line
# $1, $2 - the first column is $1, the second column is $2
##############################################
NR == 1 {  # the first record is the header line
    for (i = 1; i <= NF; i++) {
        if ($i == fldName) {
            fldNr = i   # column number of the field to sort on
        }
    }
    head = $0   # keep the header (field names) for printing later
    next        # skip to the next record, like "continue"
}
# Prefix each line with its sort key, right-justified in a 32-character
# field ("%32s"): shorter values are padded with leading blanks, so for
# non-negative integers byte-wise string comparison matches numeric order.
# If a key is longer than 32 characters, the prefix grows and
# substr(..., 33) below no longer strips it correctly.
{ values[NR] = sprintf("%32s%s", $fldNr, $0) }
# END block
END {
    n = asort(values)  # sorts the values (ascending) and returns the number of elements
    print head
    if (order == "des") {
        for (i = n; i >= 1; i--) {
            print substr(values[i], 33)  # strip the 32-character key prefix
        }
    } else {
        for (i = 1; i <= n; i++) {
            print substr(values[i], 33)  # strip the 32-character key prefix
        }
    }
}
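Save this program as tst.awk (the name used in the commands below) and run it with gawk. Because asort() is a gawk extension, other awk implementations such as mawk do not provide it.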
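The `"%32s"` padding is what makes a plain string sort work on numbers. The sketch below compares `900` and `4000` with and without padding. It assumes byte-wise string comparison, which is why it sets `LC_ALL=C`: some awk builds compare strings using the locale's collation rules, which may ignore leading blanks. The variable names `result`, `r1`, `r2`, `a`, `b` are illustrative only.

```shell
# Byte-wise comparison is assumed; some awks compare strings with the
# locale's collation rules, which may ignore leading blanks.
export LC_ALL=C

result=$(awk 'BEGIN {
    # Unpadded, "900" and "4000" compare character by character: "9" > "4".
    r1 = ("900" < "4000") ? "unpadded: 900 first" : "unpadded: 4000 first"
    # Right-justified to 32 columns, "900" gets one more leading blank,
    # and a blank sorts before any digit.
    a = sprintf("%32s", "900")
    b = sprintf("%32s", "4000")
    r2 = (a < b) ? "padded: 900 first" : "padded: 4000 first"
    print r1
    print r2
}')
printf '%s\n' "$result"
```

Without padding, `4000` sorts before `900`; with padding, the order is numeric. The trick only holds for non-negative integers, since a minus sign or a decimal point breaks the "more digits means larger" assumption.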

3. Results

Here is the result in ascending order:


lemo@debian:~/Testspace/awk$ gawk -v fldName="Salary" -f tst.awk file 
BEGIN SORT 
Name    Sex Salary  
Jok woman   3000  
Lemo    man 4000  
Job man 6000  
Petty   man 9000
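The `-v fldName="Salary"` option in the command above assigns the awk variable before any input is read, and the `NR == 1` block then resolves that name to a column number. A minimal sketch of just that lookup step, assuming the same sample `file` (the shell variable `col` is illustrative):

```shell
printf '%s\n' 'Name    Sex Salary' 'Lemo    man 4000' 'Jok woman   3000' \
    'Job man 6000' 'Petty   man 9000' > file

# -v sets fldName before the first record is read. On the header line the
# loop prints the position of the matching field name, then awk stops.
col=$(awk -v fldName="Salary" 'NR == 1 {
    for (i = 1; i <= NF; i++) if ($i == fldName) print i
    exit
}' file)
echo "Salary is column $col"   # Salary is column 3
```

Passing `-v fldName="Sex"` instead would select column 2, which is how the same script can sort on any named column.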



And here is the result in descending order:


lemo@debian:~/Testspace/awk$ gawk -v fldName="Salary" -v order="des" -f tst.awk file 
BEGIN SORT 
Name    Sex Salary  
Petty   man 9000 
Job man 6000  
Lemo    man 4000  
Jok woman   3000
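If only a POSIX awk (mawk, BusyBox awk) is available, asort() is missing. The following is a hypothetical portable variant, not the author's script: it keeps the same `fldName`/`order` parameters but replaces asort() with a hand-written insertion sort that compares keys as numbers, which also removes the 32-character key limit. Insertion sort is fine for small files; for large inputs piping through sort(1) would be the usual choice.

```shell
printf '%s\n' 'Name    Sex Salary' 'Lemo    man 4000' 'Jok woman   3000' \
    'Job man 6000' 'Petty   man 9000' > file

# Hypothetical portable variant: no asort(), keys compared numerically.
sorted=$(awk -v fldName="Salary" -v order="des" '
NR == 1 {
    for (i = 1; i <= NF; i++) if ($i == fldName) fldNr = i
    head = $0
    next
}
{
    n++
    key[n] = $fldNr + 0   # "+ 0" forces a numeric key
    line[n] = $0
}
END {
    # Insertion sort on key[] (ascending), moving line[] along with it.
    for (i = 2; i <= n; i++) {
        k = key[i]; l = line[i]
        for (j = i - 1; j >= 1 && key[j] > k; j--) {
            key[j + 1] = key[j]; line[j + 1] = line[j]
        }
        key[j + 1] = k; line[j + 1] = l
    }
    print head
    if (order == "des") { for (i = n; i >= 1; i--) print line[i] }
    else                { for (i = 1; i <= n; i++) print line[i] }
}' file)
printf '%s\n' "$sorted"
```

With `order="des"` this prints the same rows in the same order as the gawk transcript above, minus the "BEGIN SORT" banner.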

Pretty simple, isn't it? In practice awk is often combined with sed, a stream editor that handles a series of edits across one or more files very well.
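As a small sketch of such a combination, assuming the same sample `file`: sed removes the header line and awk totals the Salary column. The variable names `total` and `sum` are illustrative.

```shell
printf '%s\n' 'Name    Sex Salary' 'Lemo    man 4000' 'Jok woman   3000' \
    'Job man 6000' 'Petty   man 9000' > file

# sed deletes line 1 (the header); awk then adds up the third column.
total=$(sed '1d' file | awk '{ sum += $3 } END { print sum }')
echo "Total salary: $total"   # Total salary: 22000
```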


Reposted from: http://blog.csdn.net/amuseme_lu/article/details/6985434
