Batch Lookup of English Words Using Online Dictionaries

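Lately I have run into many unfamiliar English words. The explanations in my reference books are full of mistakes. Many online dictionaries, on the other hand, not only explain a word but also give plenty of example sentences, so I decided to look these words up in batch with an online dictionary. How can that be done?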

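The first problem is how to fetch the explanation of a word automatically. After some searching I found that curl can do it, for example: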

curl -s "http://www.google.com/dictionary?aq=f&langpair=en|en&q="$1"&hl=en" | html2text -nobs | sed '1,/^ *Dictionary]/d' | head -n -5 | less
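The sed and head stages of that one-liner simply cut a window out of the html2text output: sed deletes everything from line 1 through the first line matching Dictionary], and head -n -5 drops the last five lines. As a sketch, that window can be pulled out into a small filter so it can be tried on saved text without touching the network. The name trim_page and its two parameters are hypothetical, and head -n -N is a GNU coreutils extension:

```bash
# trim_page MARKER_REGEX TAIL_LINES
# Hypothetical filter: reads text on stdin, deletes line 1 through the first
# later line matching MARKER_REGEX (a sed basic regex, must not contain "/"),
# then drops the last TAIL_LINES lines. "head -n -N" needs GNU coreutils.
trim_page() {
    local marker=$1 tail_lines=$2
    sed "1,/$marker/d" | head -n "-$tail_lines"
}
```

Used as ... | html2text -nobs | trim_page '^ *Dictionary]' 5 | less, it behaves like the original sed/head pair.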

See http://ubuntuforums.org/showthread.php?t=1591389 and http://stackoverflow.com/questions/1617152/using-google-as-a-dictionary-lookup-via-bash-how-can-one-grab-the-first-definiti.

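After trying Google Dictionary, I found that html2text failed on the page downloaded by curl with the error "Input recoding failed due to invalid input sequence". With the Python version of html2text, a lot of JavaScript and HTML code was still left over. So I turned to Baidu Dictionary instead: its result pages contain no JavaScript, and html2text can usually convert them cleanly.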

After conversion the file looks like this (UTF-8 encoded):

新闻 网页 贴吧 知道 MP3 图片 视频 词典 [antiseptic ] 设置 | 帮助 [查百度词典] 把百度设为首页 语法标注解释 antiseptic英音:[,ænti'septik]美音:[,æntə'sɛptɪk] ***** ***** **** 以下结果由[bddict/source/img/logo.gif]译典通提供词典解释 **** 形容词 a. 1. 抗菌的,防腐的 2. 使用抗菌剂的,使用防腐剂的 antiseptic treatment 防腐处理 3. 未受感染的,无菌的,消过毒的 The technician had on an antiseptic white jacket. 那个技术员穿着消毒白色夹克。 4. 非常整洁的 5. 冷淡的,缺乏热情的 He nodded an antiseptic greeting. 他冷冷地点头打了个招呼。 名词 n. 1. 抗菌剂,防腐剂[C] **** 以下结果来自互联网网络释义 **** antiseptic 1. 防腐剂/消毒药 大学英语相似词辨析(13):ante-,a... antiseptic 防腐剂/消毒药 http://www.english-ex... 2. 防霉剂 北京译邦达翻译公司-2008 Januar... 防霉剂 antiseptic http://www.t-bond.com... 3. 防腐的;防腐剂 石油英语|能源动力行业英语第1521页 antiseptic防腐的;防腐剂 http://www.b2b99.com/... 4. 抗菌剂, 防腐剂 SAT化学词汇表Chapter 6_SAT... antiseptic抗菌剂, 防腐剂 http://www.24en.com/s... Antiseptic 1. 防腐剂 词博英语社区(生物词汇 A-F (1)[c... Antiseptic 防腐剂 http://www.cibo.biz/f... 显示更多网络释义结果 ©2011 Baidu 此内容系百度根据您的指令自动搜索的结果,不代表百度赞成被搜索网站的内容或立场 [http://c.baidu.com/c.gif?t=0&q=antiseptic&p=0&pn=0] [wd ] [s] **** 搜索框提示 **** *** 是否希望搜索汉字和英语时显示搜索框提示 *** #显示 o不显示 [Unknown INPUT type]

Obviously this is hard to read. To extract the useful information, the text needs further processing; the script below is adapted from http://blog.csdn.net/jallin2001/archive/2009/11/13/4808618.aspx.

#!/usr/bin/perl -w
############### censor.pl #################
# Handle the explanations got from online dictionary.
# Inputs:
#   ARGV[0] -- temporary file containing the explanations
#   ARGV[1] -- keyword
############################################
use strict;
use Encode;

my $syntax   = Encode::decode('utf8', '语法标注解释 ');
my $internet = Encode::decode('utf8', '以下结果来自互联网网络释义');
my $yingyin  = Encode::decode('utf8', '英音');
my $meiyin   = Encode::decode('utf8', '美音');
my $keyword  = Encode::decode('utf8', $ARGV[1]);
my $write_flag = 0;

open(my $exp, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
while (my $nextline = <$exp>) {
    chomp($nextline);
    $nextline = Encode::decode('utf8', $nextline);
    if ($nextline =~ m/$syntax/) {
        # The dictionary entry starts here
        $write_flag = 1;
        $nextline =~ s/$syntax//;
    } elsif ($nextline =~ m/$internet/) {
        # The web definitions start here: stop printing
        $write_flag = 0;
    }
    # Exclude lines containing ****
    if ($write_flag == 1 && $nextline !~ m/\*\*\*\*/) {
        # Add a space between the keyword and 英音/美音
        $nextline =~ s/\Q$keyword\E($yingyin|$meiyin)/$keyword $1/;
        print Encode::encode('utf8', $nextline), "\n";
    }
}
close($exp);

Running the script above gives output like this:

antiseptic 英音:[,ænti'septik]美音:[,æntə'sɛptɪk]   形容词 a. 1. 抗菌的,防腐的 2. 使用抗菌剂的,使用防腐剂的 antiseptic treatment 防腐处理 3. 未受感染的,无菌的,消过毒的 The technician had on an antiseptic white jacket. 那个技术员穿着消毒白色夹克。 4. 非常整洁的 5. 冷淡的,缺乏热情的 He nodded an antiseptic greeting. 他冷冷地点头打了个招呼。 名词 n. 1. 抗菌剂,防腐剂[C]
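censor.pl is essentially a small state machine: start printing at the line containing 语法标注解释, stop at the web-definitions marker, and skip the **** separator lines. If Perl is not at hand, the same filter can be sketched in awk. This is a swapped-in tool rather than the script above; censor_awk is a hypothetical name, and the sketch assumes UTF-8 input with the same Baidu marker strings (plus the 此内容系百度... footer marker that the later version of censor.pl also stops at):

```bash
# censor_awk FILE KEYWORD
# Hypothetical awk version of censor.pl: prints the Baidu dictionary entry
# found in the html2text output FILE, without the **** separator lines.
censor_awk() {
    awk -v kw="$2" '
        index($0, "语法标注解释 ") {             # the entry starts here
            on = 1
            sub(/语法标注解释 /, "")
        }
        index($0, "以下结果来自互联网网络释义") || index($0, "此内容系百度根据您的指令自动搜索的结果") {
            on = 0                                # web definitions or footer: stop
        }
        on && !index($0, "****") {
            # put a space between the keyword and 英音/美音
            p = index($0, kw "英音")
            if (!p) p = index($0, kw "美音")
            if (p) $0 = substr($0, 1, p + length(kw) - 1) " " substr($0, p + length(kw))
            print
        }
    ' "$1"
}
```

It can replace ./censor.pl tmp "$word" in the batch script below as censor_awk tmp "$word".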

 

To look up a whole batch of English words automatically, write them into a file (./words below) and run the following script:

#!/bin/bash
# Command line look up using Baidu Dictionary - command line dictionary
# Reads the words from ./words and appends the cleaned-up explanations to ./mywords
WORDS=$(cat ./words)
for word in $WORDS
do
    # Google Dictionary alternative:
    # curl -s -A 'Mozilla/4.0' "http://www.google.com/dictionary?langpair=zh-CN|en&hl=en&aq=f&q=$word" >> test.html
    curl -s -A 'Mozilla/4.0' "http://dict.baidu.com/s?tn=dict&wd=$word" | html2text > tmp 2>/dev/null
    echo "------------------ $word -------------------" >> mywords
    ./censor.pl tmp "$word" >> mywords
    rm -f tmp
done
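Because $(cat ./words) is split on any whitespace, a phrase such as give up would be looked up as two separate words, and a word listed twice is fetched twice. A minimal sketch of a more tolerant loop, assuming bash 4 or later for the associative array. batch_lookup and its LOOKUP_CMD argument are hypothetical; the lookup (for example a function wrapping the curl | html2text | censor.pl pipeline, which would then also have to URL-encode phrases) is passed in so the loop itself can be tested offline:

```bash
# batch_lookup WORDFILE LOOKUP_CMD
# Hypothetical helper: calls LOOKUP_CMD once per non-empty, not yet seen line
# of WORDFILE, printing a separator header before each result.
batch_lookup() {
    local wordfile=$1 lookup=$2 word
    local -A seen=()
    # read trims leading/trailing blanks but keeps inner spaces ("give up")
    while read -r word || [ -n "$word" ]; do
        [ -z "$word" ] && continue                # skip blank lines
        [ -n "${seen[$word]+x}" ] && continue     # skip duplicates
        seen[$word]=1
        echo "------------------ $word -------------------"
        "$lookup" "$word" < /dev/null             # keep the lookup off the word list
    done < "$wordfile"
}
```

Usage: batch_lookup ./words my_lookup >> mywords, where my_lookup is whatever function performs a single query.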

 

Update 2011-01-02:

At last I found a way to query Google Dictionary.

Google Dictionary's explanation of the word abandon is available at http://www.google.com/dictionary?langpair=en|zh-CN&q=abandon&hl=en&aq=f, and the page reads as follows:

abandon in Chinese (Simplified) - Google Dictionary

Dictionary

Found in dictionary: English > Chinese (Simplified).

  • abandon   /əˈbændən/ DJ    /ə'bændən/ KK
    • verb
      • to leave somebody, especially somebody you are responsible for, with no intention of returning (不顾责任、义务等)离弃,遗弃,抛弃 ~ sb (to sth) VN
      • to leave a thing or place, especially because it is impossible or dangerous to stay 不得已而放弃;舍弃 ~ sth (to sb/sth) VN
      • to stop supporting or helping somebody; to stop believing in something 停止(支持或帮助);放弃(信念) VN
      • to stop doing something, especially before it is finished; to stop having something 中止;放弃;不再有 VN
      • to feel an emotion so strongly that you can feel nothing else 陷入,沉湎于(某种情感) ~ yourself to sth literary VN
    • noun
      • an uncontrolled way of behaving that shows that somebody does not care what other people think 放任;放纵 uncountable written

English dictionary

  • a·ban·don
    • Complete lack of inhibition or restraint

Related phrases

Related languages

Synonyms

Web translations

abandon

  1. 放弃 abandon放弃离弃遗弃They were accused of abandoning their own principles i.eol.cn - Related search
  2. 抛弃 抛弃放弃 abandon 阿巴诺喹 Abanoquil 减轻减少消除 abate. 在线英语学习. 阿贝氏 www.scientrans.com - Related search
  3. 遗弃 abandon放弃离弃遗弃They were accused of abandoning their own principles i.eol.cn - Related search
  4. 舍弃 舍弃 abandon 缩写地址呼号 abbreviated address calling 异常终止倾印 abend dump www.scientrans.com - Related search
  5. 丢弃 abandon vt丢弃放弃抛弃 ability n能力能耐本领 able a有能力的出色的 abnormal a不 www.51jnjj.cn - Related search

Usage examples

  • "People died for this tournament, others were injured. We can't abandon them and leave like cowards," Alaixys Romao told French sports agency L'Equipe. "If we stay here, it's for them. But also so as not to give satisfaction to the rebels....
    Jan 10, 2010 -  Alaixys Romao -  BBC Sport (blog)
  • "The president would have us believe there are two choices: keep all of our troops in Iraq or abandon these Iraqis," Obama said. "I reject this choice."
    Sep 12, 2007 -  Barack Obama -  Forbes
  • "I am deeply disappointed that the governor has decided to abandon the state and her constituents before her term has concluded," Murkowski said.
    Jul 3, 2009 -  Lisa Murkowski -  Politico

Web definitions


Results partly provided by Dr.eye.

The usage examples, images and web definitions on this page were selected automatically by a computer program. They do not necessarily reflect the views of Google Inc. or its employees.


©2009 Google - Google Home - All About Google

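Note that the page source contains a line holding the Chinese definition of abandon. So we can fetch the Google Dictionary page with curl and search the downloaded page directly for that line; the script below matches it with the Perl pattern ($wd.*:.*)- Google, where $wd is the word being queried.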
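The search step itself does not need Perl. A sketch with grep and sed, assuming, as the Perl pattern in the script below does, that the wanted line contains the word, then a colon, then - Google; extract_definition is a hypothetical name, and since the page layout belongs to Google the pattern may need adjusting:

```bash
# extract_definition WORD FILE
# Hypothetical helper: prints the text from WORD up to (not including) the
# last "- Google" on each matching line, like the capture group in
# perl -ne 'print "$1\n" if m/(WORD.*:.*)- Google/'.
extract_definition() {
    local word=$1 file=$2
    grep -o -- "$word.*:.*- Google" "$file" | sed 's/- Google$//'
}
```

grep -o prints only the matched part of the line, and its greedy .* runs to the last "- Google", which is the same span the Perl capture group selects.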

 

#!/bin/bash
# Command line look up using Google Dictionary - command line dictionary
# Usage: gd -i input -o output [-w word]

# query - fetch the Google Dictionary page of one word and print the line
# that carries its Chinese definition
function query {
    typeset wd=$1
    typeset TMPFILE=tmp.$wd
    typeset -i i=0
    while ((i<5))                                  # try at most 5 times
    do
        curl -s -A 'Mozilla/4.0' "http://www.google.com/dictionary?langpair=zh-CN|en&hl=en&aq=f&q=$wd" >"$TMPFILE" 2>/dev/null
        if [ -s "$TMPFILE" ]; then
            break
        elif ((i==4)); then
            echo "$wd" >> failed.gd.$input         # record words that could not be fetched
        fi
        ((i=i+1))
        sleep 1
    done
    perl -ne "print \"\$1\n\" if (m/($wd.*:.*)- Google/);" "$TMPFILE"
    rm -f "$TMPFILE"
}

##############################################
# MAIN LINE STARTS HERE
##############################################
typeset input=
typeset output=
typeset word=
typeset -i flag=0
while getopts 'i:o:w:' OPT
do
    case $OPT in
        i) input=${OPTARG};;
        o) output=${OPTARG};;
        w) word=${OPTARG}; flag=1;;
        *) echo "ERROR: Invalid argument [$OPT]" >&2 ;;
    esac
done
shift $((OPTIND - 1))

if ((flag==0)); then
    # Eliminate the phonetic symbols (the original list is kept as $input.bk)
    perl -i.bk -p -e 's/^(\w+).*[\[\/].*[\]\/]/$1/;' "$input"
    WORDS=$(cat "$input")
    for word in $WORDS
    do
        word_exp=$(query "$word")
        if [ -n "$word_exp" ]; then
            echo "$word_exp" >> "$output"
        else
            echo "$word:" >> "$output"
        fi
    done
else
    word_exp=$(query "$word")
    if [ -n "$word_exp" ]; then
        echo "$word_exp"
    else
        echo "$word:"
    fi
fi
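Before the lookup loop, the script strips phonetic transcriptions from the word list in place with perl -i.bk, so a line such as abandon [ə'bændən] or abandon /ə'bændən/ is reduced to abandon. Only the first run of word characters survives, so a phrase followed by a transcription keeps just its first word. A sed-only sketch of the same substitution that writes to stdout instead of editing the file; strip_phonetics is a hypothetical name, and -E assumes GNU or BSD sed:

```bash
# strip_phonetics [FILE...]
# Hypothetical filter: on lines that start with a word and contain a
# [...] or /.../ transcription, keep only that leading word.
# "#" is used as the s delimiter so "/" can appear inside the brackets.
strip_phonetics() {
    sed -E 's#^([[:alnum:]_]+).*[[/].*[]/]#\1#' "$@"
}
```

For example, strip_phonetics words.txt > words.clean leaves words.txt untouched, unlike perl -i.bk.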

 

Update 2011-01-03:

Here is the complete program:

#!/bin/bash
# Command line look up using Google Dictionary and/or Baidu Dictionary
# Usage: od [-g | -b] -i input -o output [-w word]
#   -g  Google Dictionary only   -b  Baidu Dictionary only   (default: both)
# Note: the name clashes with the standard od(1) utility, so call it by path.

# gd - Query words using Google Dictionary
function gd {
    typeset wd=$1
    typeset TMPFILE=tmp.gd.$wd
    typeset -i i=0
    while ((i<5))
    do
        curl -s -A 'Mozilla/4.0' "http://www.google.com/dictionary?langpair=zh-CN|en&hl=en&aq=f&q=$wd" >"$TMPFILE" 2>/dev/null
        if [ -s "$TMPFILE" ]; then
            break
        elif ((i==4)); then
            echo "$wd" >> failed.gd.$input
        fi
        ((i=i+1))
        sleep 1
    done
    # Print only the line that carries the Chinese definition
    perl -ne "print \"\$1\n\" if (m/($wd.*:.*)- Google/);" "$TMPFILE"
    rm -f "$TMPFILE"
}

# bd - Query words using Baidu Dictionary
function bd {
    typeset wd=$1
    typeset TMPFILE=tmp.bd.$wd
    typeset -i i=0
    while ((i<5))
    do
        curl -s -A 'Mozilla/4.0' "http://dict.baidu.com/s?tn=dict&wd=$wd" | html2text > "$TMPFILE" 2>/dev/null
        if [ -s "$TMPFILE" ]; then
            break
        elif ((i==4)); then
            echo "$wd" >> failed.bd.$input
        fi
        ((i=i+1))
        sleep 1
    done
    # Keep only the dictionary entry
    ./censor.pl "$TMPFILE" "$wd"
    rm -f "$TMPFILE"
}

##############################################
# MAIN LINE STARTS HERE
##############################################
typeset input=
typeset output=
typeset word=
typeset dict="gd bd"
typeset -i word_flag=0
while getopts 'gbi:o:w:' OPT
do
    case $OPT in
        g) dict=gd;;
        b) dict=bd;;
        i) input=${OPTARG};;
        o) output=${OPTARG};;
        w) word=${OPTARG}; word_flag=1;;
        *) echo "ERROR: Invalid argument [$OPT]" >&2 ;;
    esac
done
shift $((OPTIND - 1))

if ((word_flag==0)); then
    # perl -i.bk -p -e 's/^(\w+).*[\[\/].*[\]\/]/$1/;' "$input"   # Eliminate the phonetic symbols
    WORDS=$(cat "$input")
    for word in $WORDS
    do
        case $dict in
            *bd*) echo "------------------ $word -------------------" >> "$output";;
        esac
        for d in $dict
        do
            word_exp=$($d "$word")
            if [ -n "$word_exp" ]; then
                echo "$word_exp" >> "$output"
            else
                echo "$word:" >> "$output"
            fi
        done
    done
else
    for d in $dict
    do
        $d "$word"
    done
fi

#!/usr/bin/perl -w
############### censor.pl #################
# Handle the explanations got from online dictionary.
# Inputs:
#   ARGV[0] -- temporary file containing the explanations
#   ARGV[1] -- keyword
############################################
use strict;
use Encode;

my $syntax   = Encode::decode('utf8', '语法标注解释 ');
my $internet = Encode::decode('utf8', '以下结果来自互联网网络释义');
my $yingyin  = Encode::decode('utf8', '英音');
my $meiyin   = Encode::decode('utf8', '美音');
my $baidu    = Encode::decode('utf8', '此内容系百度根据您的指令自动搜索的结果');
my $keyword  = Encode::decode('utf8', $ARGV[1]);
my $write_flag = 0;

open(my $exp, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
while (my $nextline = <$exp>) {
    chomp($nextline);
    $nextline = Encode::decode('utf8', $nextline);
    if ($nextline =~ m/$syntax/) {
        # The dictionary entry starts here
        $write_flag = 1;
        $nextline =~ s/$syntax//;
    } elsif ($nextline =~ m/$internet/ || $nextline =~ m/$baidu/) {
        # Web definitions or the Baidu footer: stop printing
        $write_flag = 0;
    }
    # Exclude lines containing ****
    if ($write_flag == 1 && $nextline !~ m/\*\*\*\*/) {
        # Add a space between the keyword and 英音/美音
        $nextline =~ s/\Q$keyword\E($yingyin|$meiyin)/$keyword $1/;
        print Encode::encode('utf8', $nextline), "\n";
    }
}
close($exp);
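gd and bd repeat the same retry loop: run the fetch, stop as soon as the temporary file is non-empty, otherwise sleep a second and try again, up to five times. A sketch of how that loop could be factored into one helper. fetch_with_retry and the RETRY_DELAY variable are hypothetical names, and because bd fetches through a pipe, its curl | html2text pipeline would first have to be wrapped in a small function before being passed in:

```bash
# fetch_with_retry OUTFILE TRIES CMD [ARGS...]
# Hypothetical helper: runs CMD with stdout redirected to OUTFILE until
# OUTFILE is non-empty, at most TRIES times, sleeping RETRY_DELAY seconds
# (default 1) between attempts. Returns 0 on success, 1 if all attempts fail.
fetch_with_retry() {
    local outfile=$1 tries=$2 i
    shift 2
    for ((i = 1; i <= tries; i++)); do
        "$@" > "$outfile" 2>/dev/null
        [ -s "$outfile" ] && return 0            # got something: done
        ((i < tries)) && sleep "${RETRY_DELAY:-1}"
    done
    return 1                                     # every attempt came back empty
}
```

In gd, for instance, the while loop could shrink to fetch_with_retry "$TMPFILE" 5 curl -s -A 'Mozilla/4.0' "$url" || echo "$wd" >> failed.gd.$input, with url (a hypothetical variable) holding the Google Dictionary query URL.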
