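Of the six table lookup techniques in SAS, the hash object is the most commonly used. It performs well in benchmarks, is already widely used inside SAS products, and is popular with SAS programmers abroad. (I am less sure about the situation in China, where there are almost no articles or papers on the subject, whereas programmers abroad have presented more than ten papers on it at SAS user conferences alone.) The pitfalls and tips I have collected while using hash objects are summarized below.
This article works through a real lookup problem to illustrate those pitfalls and tips, and ends with the hash code the author recommends.
Data table: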
Date | AreaCode | PhoneNum | ToAcronym | FromAcronym | SecretCode |
---|---|---|---|---|---|
2006-12-21 | 407 | 312-9088 | AFAIK | DQMOT | 103 |
2006-12-21 | 407 | 324-6674 | BEG | TU | 101 |
2006-12-21 | 407 | 312-9088 | BFN | SYS | 101 |
2006-12-21 | 407 | 312-9088 | BTDT | IHU | 102 |
2006-12-22 | 407 | 312-9088 | C&G | AFAIK | 103 |
Code to generate the sample data:
data detail;
input Date date9.
AreaCode
PhoneNum $8.
ToAcronym $
FromAcronym $
SecretCode;
format Date YYMMDD10.;
datalines;
21DEC2006 407 312-9088 AFAIK DQMOT 103
21DEC2006 407 324-6674 BEG TU 101
21DEC2006 407 312-9088 BFN SYS 101
21DEC2006 407 312-9088 BTDT IHU 102
22DEC2006 407 312-9088 C&G AFAIK 103
;
run;
Lookup table:
Acronym | Meaning |
---|---|
AFAIK | as far as I know |
AFK | away from keyboard |
ASAP | as soon as possible |
BEG | big evil grin |
BFN | bye for now |
BTDT | been there done that |
DQMOT | don't quote me on this. |
IHU | i hear you |
SYS | see you soon |
Code to generate the lookup table:
data lookup_1;
input Acronym $
Meaning $30.;
datalines;
AFAIK as far as I know
AFK away from keyboard
ASAP as soon as possible
BEG big evil grin
BFN bye for now
BTDT been there done that
DQMOT don't quote me on this.
IHU i hear you
SYS see you soon
;
run;
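1. Tip - Define the hash object's key and data variables without hard-coding them
A hash object requires its key variables (definekey) and data variables (definedata) to be declared before use. To keep the code general, we usually do not want to hard-code variable types and lengths in the program, so that the code does not have to be rewritten when the structure of the lookup table changes.
Code that cannot adapt to different data structures:
length lookup_key $8;
length lookup_data $30;
if _n_ = 1 then
do;
declare hash hashLookup(dataset: 'lookup_dataset');
hashLookup.definekey('lookup_key');
hashLookup.definedata('lookup_data');
hashLookup.definedone();
end;
The solution is shown below. The SET statement is never executed because its condition is always false, but the compiler still reads the data set descriptor and creates the variables with the types and lengths they have in the lookup table:
if 0 then set lookup_dataset;
if _n_ = 1 then
do;
declare hash hashLookup(dataset: 'lookup_dataset');
hashLookup.definekey('lookup_key');
hashLookup.definedata('lookup_data');
hashLookup.definedone();
end;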
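To see what the never-executed SET statement contributes, the following minimal sketch prints the lengths the compiler assigned to the variables. It assumes the lookup_1 data set created earlier already exists; lenAcronym and lenMeaning are names introduced here only for illustration.

```sas
/* Sketch: "if 0 then set" defines variables at compile time only. */
data _null_;
    if 0 then set lookup_1;          /* never executed, no rows are read */
    lenAcronym = vlength(Acronym);   /* length taken from lookup_1 */
    lenMeaning = vlength(Meaning);
    put lenAcronym= lenMeaning=;
    stop;                            /* no read ever happens, so end the step explicitly */
run;
```

If the lengths in lookup_1 change later, this step reports the new values without any edit to the code, which is the property the tip relies on.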
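2. Tip - Suppress the uninitialized-variable notes in the log
If the code does not use call missing to initialize the variables used by the hash object, the SAS log shows the following notes:
NOTE: Variable lookup_key is uninitialized.
NOTE: Variable lookup_data is uninitialized.
Solution:
call missing(lookup_key, lookup_data);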
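The notes are easiest to picture when the variables are declared with a LENGTH statement and never assigned by ordinary DATA step code. The sketch below is an illustration under that assumption, not the author's code: it reuses lookup_1 from earlier, and the lengths $8 and $30 are assumed to match that data set.

```sas
/* Sketch: declare the host variables by hand, then initialize them */
/* with call missing so the compiler sees an assignment.            */
data _null_;
    length Acronym $8 Meaning $30;   /* assumed to match lookup_1 */
    declare hash hashLookup(dataset: 'lookup_1');
    hashLookup.definekey('Acronym');
    hashLookup.definedata('Meaning');
    hashLookup.definedone();
    call missing(Acronym, Meaning);  /* counts as initialization */
    rc = hashLookup.find(key: 'BFN');
    if rc = 0 then put Meaning=;
run;
```

Because this step has no SET or INPUT statement it runs exactly once, so no `_n_ = 1` guard is needed around the hash definition.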
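3. Pitfall - The find method returns 0 on success
The find() method returns 0 when it finds a matching key and a nonzero error code otherwise, which differs from the convention in many other languages. Because SAS treats 0 as false and any other nonmissing number as true, the first condition below is true only when the lookup fails.
Incorrect code:
if hashLookup.find(key:lookup_key) then finaldata = lookup_data;
Correct code:
if hashLookup.find(key:lookup_key) = 0 then finaldata = lookup_data;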
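The return codes can be inspected directly. This is a small sketch that assumes the lookup_1 data set from earlier, in which BEG is present and C&G is not; rcFound and rcNotFound are illustrative names.

```sas
/* Sketch: print find() return codes for a key that exists and one that does not. */
data _null_;
    if 0 then set lookup_1;
    declare hash hashLookup(dataset: 'lookup_1');
    hashLookup.definekey('Acronym');
    hashLookup.definedata('Meaning');
    hashLookup.definedone();
    rcFound    = hashLookup.find(key: 'BEG');   /* key exists: 0 */
    rcNotFound = hashLookup.find(key: 'C&G');   /* key absent: nonzero */
    put rcFound= rcNotFound=;
    stop;
run;
```

The single quotes around 'C&G' keep the macro processor from trying to resolve &G.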
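4. Pitfall - A failed lookup returns an incorrect value
When a lookup fails, the code below returns the value from the last successful match and so produces incorrect data. On failure find() leaves the data variable untouched, and because that variable comes from a SET statement it is retained from the previous row.
rc = hashLookup.find(key:lookup_key);
finaldata = lookup_data;
Result: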
Date | AreaCode | PhoneNum | ToAcronym | FromAcronym | SecretCode | ToMeaning |
---|---|---|---|---|---|---|
2006-12-21 | 407 | 312-9088 | AFAIK | DQMOT | 103 | as far as I know |
2006-12-21 | 407 | 324-6674 | BEG | TU | 101 | big evil grin |
2006-12-21 | 407 | 312-9088 | BFN | SYS | 101 | bye for now |
2006-12-21 | 407 | 312-9088 | BTDT | IHU | 102 | been there done that |
2006-12-22 | 407 | 312-9088 | C&G | AFAIK | 103 | been there done that |
The correct code is as follows:
rc = hashLookup.find(key:lookup_key);
if rc = 0 then finaldata = lookup_data;
Result:
Date | AreaCode | PhoneNum | ToAcronym | FromAcronym | SecretCode | ToMeaning |
---|---|---|---|---|---|---|
2006-12-21 | 407 | 312-9088 | AFAIK | DQMOT | 103 | as far as I know |
2006-12-21 | 407 | 324-6674 | BEG | TU | 101 | big evil grin |
2006-12-21 | 407 | 312-9088 | BFN | SYS | 101 | bye for now |
2006-12-21 | 407 | 312-9088 | BTDT | IHU | 102 | been there done that |
2006-12-22 | 407 | 312-9088 | C&G | AFAIK | 103 | |
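Another way to avoid the stale value, offered here as a sketch rather than as the author's method, is to reset the data variable before every lookup so that a failed find() leaves it missing. The output data set name results_alt is hypothetical; detail and lookup_1 are the data sets created earlier.

```sas
/* Sketch: clear the retained data variable before each lookup. */
data results_alt;
    if 0 then set lookup_1;
    drop Acronym rc;
    rename Meaning = ToMeaning;
    if _n_ = 1 then do;
        declare hash hashLookup(dataset: 'lookup_1');
        hashLookup.definekey('Acronym');
        hashLookup.definedata('Meaning');
        hashLookup.definedone();
    end;
    set detail;
    call missing(Meaning);                 /* discard the previous row's match */
    rc = hashLookup.find(key: ToAcronym);  /* fills Meaning only on success */
run;
```

Checking the return code is more explicit about intent, while resetting the variable protects every later use of Meaning in the same step.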
Complete code
data results;
if 0 then set lookup_1;
drop Acronym Meaning rc;
if _n_ = 1 then
do;
declare hash hashLookup(dataset: 'lookup_1');
hashLookup.definekey('Acronym');
hashLookup.definedata('Meaning');
hashLookup.definedone();
call missing(Acronym, Meaning);
end;
set detail;
rc = hashLookup.find(key: ToAcronym);
if rc = 0 then ToMeaning = Meaning;
run;
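The same hash object can serve more than one lookup per row. As a usage sketch that goes beyond the original example, the step below also translates FromAcronym; the names results_both and FromMeaning are introduced here for illustration.

```sas
/* Sketch: reuse one hash object for two lookups on each detail row. */
data results_both;
    if 0 then set lookup_1;
    drop Acronym Meaning rc;
    if _n_ = 1 then do;
        declare hash hashLookup(dataset: 'lookup_1');
        hashLookup.definekey('Acronym');
        hashLookup.definedata('Meaning');
        hashLookup.definedone();
        call missing(Acronym, Meaning);
    end;
    set detail;
    rc = hashLookup.find(key: ToAcronym);
    if rc = 0 then ToMeaning = Meaning;
    rc = hashLookup.find(key: FromAcronym);
    if rc = 0 then FromMeaning = Meaning;   /* stays missing when the key is absent, e.g. TU */
run;
```

Each assignment is guarded by its own return-code check, so a failed second lookup cannot pick up the meaning found by the first.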