Scalability of Table Lookup Techniques(摘录)

author:Rick Langston, SAS Institute Inc., Cary, NC

 

 

THE FIVE TECHNIQUES
For the purposes of comparing resource utilization, we created a SAS data set of key/label pairs that consists of
incrementing numbers from 1 to n, with labels being the number using the Z8. format. For example, the key value of 1
had the label 00000001. The base table is the SAS data set that contains the keys to look up, and the lookup table is
the SAS data set that contains the key/label pairs. For simplicity, we used the same table for the base table and the
lookup table. The base table was sorted using a random value to properly exemplify real-world scenarios where the
key is not in sorted order in the base table. For further simplicity, the base table used start and label as variable
names so that it could be treated as a CNTLIN= data set.


Using the base table named temp, we tried five standard techniques for table lookup. The following SAS code was
used for each technique:

 

/*-----PROC FORMAT and PUT function-----*/
proc format cntlin=temp; run;
data _null_; set temp(rename=(label=shouldbe)) end=eof;
    label=put(start,$testfmt.);
    if label=shouldbe then matched+1;
    if eof;
    if matched=_n_ then put 'all matched';
    else put 'not all matched';
run;

/*-----hash object-----*/
data _null_; set temp(rename=(label=shouldbe)) end=eof;
      length label $20;
      retain label ' ';

      if _n_=1 then do;
          declare hash ht(dataset:"temp");
          ht.defineKey("start");
          ht.defineData("label");
          ht.defineDone();
          end;

     rc = ht.find();
     if rc = 0 then do;
         if label=shouldbe then matched+1;
         end;
     if eof;
     if matched=_n_ then put 'all matched';
     else put 'not all matched';
run;

/*-----merge-----*/
proc sort data=temp out=temp2(drop=random rename=(label=shouldbe)); by start;
run;
proc sort data=temp out=lookup(drop=random); by start;
run;
data _null_; merge temp2(in=want) lookup end=eof; by start;
       if label=shouldbe then matched+1;
       if eof;
       if matched=_n_ then put 'all matched';
       else put 'not all matched';
run;

/*-----key= usage-----*/
data lookup(index=(start)); set temp(keep=start label);
      run;
data _null_; set temp(keep=start label rename=(label=shouldbe)) end=eof;
      set lookup key=start;
      if label=shouldbe then matched+1;
      if eof;
      if matched=_n_ then put 'all matched';
      else put 'not all matched';
run;


/*-----SQL inner join-----*/
proc sql;
      create table merged as
            select label,shouldbe from temp a inner join temp(rename=(label=shouldbe)) b
                 on a.start = b.start;
quit;

data _null_; set merged end=eof;
     if label=shouldbe then matched+1;
     if eof;
     if matched=_n_ then put 'all matched';
     else put 'not all matched';
run;

 

 

后记

还有几篇文章对此作了阐述,只有细微的不同:

 

http://www2.sas.com/proceedings/forum2008/095-2008.pdf

http://www2.sas.com/proceedings/sugi27/p011-27.pdf

 

你可能感兴趣的:(table,null,Random,dataset,Scalability,Numbers)