[sed & awk] Data Retrieval

  Author: zhanhailiang  Date: 2012-12-14

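The task is to look up the full phrase for an acronym in a formatted text file. For example, given "Basic" as input, the script returns its expansion, "Beginner's All-Purpose Symbolic Instruction Code".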

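The acronym list below can be treated as a simple database. Each line holds an acronym and its description, separated by a tab: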

zhanhailiang@linux-06bq:~> cat acronyms
BASIC   Beginner's All-Purpose Symbolic Instruction Code
CICS    Customer Information Control System
COBOL   Common Business Oriented Language
DBMS    Data Base Management System
GIGO    Garbage In, Garbage Out
GIRL    Generalized Information Retrieval Language

Write a script named acro that takes its first command-line argument (the acronym) and passes it to an awk program, as follows:

zhanhailiang@linux-06bq:~> cat acro
#!/bin/sh

awk 'BEGIN {FS="\t";}; tolower($1) == tolower(search) {print $2;}' search="$1" acronyms

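The first argument on the shell command line ($1) is assigned to the variable search, which is passed to the awk program as a command-line parameter. The following shows the script finding a specific acronym in the list (the match is case-insensitive).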

zhanhailiang@linux-06bq:~> ./acro Gigo
Garbage In, Garbage Out
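
When awk actually performs the search=... assignment matters when a program has a BEGIN block. The sketch below is an illustration, not part of the original acro script: the function names operand_form and option_form are hypothetical. It compares a var=value operand placed after the program with the -v option. An operand is assigned only once awk reaches it in the argument list, after BEGIN has run, whereas -v assigns before BEGIN.

```shell
#!/bin/sh
# Operand form: "search=..." sits among the file operands, so awk assigns it
# just before reading the next input ("-" is standard input), after BEGIN.
operand_form() {
    echo "x" | awk 'BEGIN { printf "BEGIN=[%s] ", search }
                    { print "main=[" search "]" }' search="$1" -
}

# Option form: -v assigns the variable before BEGIN is executed.
option_form() {
    echo "x" | awk -v search="$1" 'BEGIN { printf "BEGIN=[%s] ", search }
                    { print "main=[" search "]" }'
}

operand_form Gigo   # prints: BEGIN=[] main=[Gigo]
option_form Gigo    # prints: BEGIN=[Gigo] main=[Gigo]
```

For the one-line acro script this makes no difference, because search is only read in the main rule, which runs after the assignment.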

The same lookup can be implemented with an associative array:

#!/bin/sh

#awk 'BEGIN {FS="\t";}; tolower($1) == tolower(search) {print $2;}' search=$1 acronyms

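# Pass search with -v so it is already set when BEGIN runs; a var=value
# operand after the program is only assigned after BEGIN has finished.
awk -v search="$1" 'BEGIN {
    FS = "\t";
    search = tolower(search);
}
{
    array[tolower($1)] = $2;
}
END {
    if (search in array)
        print array[search];
}' acronyms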
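
To try the associative-array approach without the acronyms file, the following sketch builds a two-line sample file and wraps the lookup in a function. It assumes mktemp is available. The names acro_lookup, table and key are hypothetical and chosen only for this illustration, and the "not found" message is an addition that the script above does not print.

```shell
#!/bin/sh
# Create a small tab-separated sample file (temporary path chosen by mktemp).
data=$(mktemp)
trap 'rm -f "$data"' EXIT
printf 'GIGO\tGarbage In, Garbage Out\nDBMS\tData Base Management System\n' > "$data"

# Look up one acronym, ignoring case; print its description or a miss message.
acro_lookup() {
    awk -v search="$1" '
        BEGIN { FS = "\t"; key = tolower(search) }
        { table[tolower($1)] = $2 }      # index each entry by lower-cased acronym
        END {
            if (key in table)            # "in" tests membership without creating the element
                print table[key]
            else
                print search " not found."
        }' "$data"
}

acro_lookup gigo    # prints: Garbage In, Garbage Out
acro_lookup Perl    # prints: Perl not found.
```

Testing with "in" rather than comparing table[key] to the empty string avoids creating an empty element for every key that is looked up and missed.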

Next comes a more elaborate lookup method that is also more interactive:

zhanhailiang@linux-06bq:~> cat glossary 
BASIC   Beginner's All-Purpose Symbolic Instruction Code
CICS    Customer Information Control System
COBOL   Common Business Oriented Language
DBMS    Data Base Management System
GIGO    Garbage In, Garbage Out
GIRL    Generalized Information Retrieval Language
zhanhailiang@linux-06bq:~> cat lookup 
awk '
    BEGIN {
        FS = "\t";
        OFS = "\t";
        printf("Enter a glossary term: ");
    };

    FILENAME == "glossary" {
        entry[tolower($1)] = $2;
        next;
    };

    tolower($0) ~ /^(quit|q|exit|x)$/ {
        exit;
    };

    $0 != "" {
        if(tolower($0) in entry) {
            print entry[tolower($0)];
        } else {
            print $0 " not found.";
        }
    };

    {
        printf("Enter another glossary term (q to quit): ");
    };
' glossary - # read terms from standard input and look each one up in glossary
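
The two-input technique used by lookup (load the glossary file first, then read queries from "-") can also be driven from a pipe, which is handy for testing. The sketch below is a trimmed variant, not the original script: the prompts are dropped, the glossary path is passed in through a hypothetical variable gloss instead of the literal name "glossary", and mktemp is assumed to be available.

```shell
#!/bin/sh
# Build a sample glossary (temporary path chosen by mktemp).
glossary=$(mktemp)
trap 'rm -f "$glossary"' EXIT
printf 'COBOL\tCommon Business Oriented Language\nGIRL\tGeneralized Information Retrieval Language\n' > "$glossary"

# Read the glossary file first, then answer one query per line of stdin.
lookup_terms() {
    awk -v gloss="$glossary" '
        BEGIN { FS = "\t" }
        FILENAME == gloss { entry[tolower($1)] = $2; next }   # loading phase
        tolower($0) ~ /^(quit|q|exit|x)$/ { exit }            # stop on a quit word
        $0 != "" {
            key = tolower($0)
            if (key in entry)
                print entry[key]
            else
                print $0 " not found."
        }
    ' "$glossary" -
}

# "q" ends the session, so "girl" is never looked up.
printf 'cobol\nPerl\nq\ngirl\n' | lookup_terms
```

Comparing FILENAME against a variable keeps the loading rule working when the glossary lives at a different path.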
