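Today I read a paper, "Introduction to PROC TABULATE" by Thomas J. Winn, Jr. It goes into much more detail than many books, including The Little SAS Book, and more than the help documentation. The paper breaks table building into five steps, summarized below (each step maps onto part of the basic PROC TABULATE syntax):
Basic PROC TABULATE syntax: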
PROC TABULATE <option-list>;
CLASS <class-variable-list>;
VAR <analysis-variable-list>;
TABLE <<page-expression,>
row-expression,>
column-expression
</ table-option-list>;
BY <NOTSORTED> <DESCENDING> variable-1
< ... <DESCENDING> variable-n>;
FORMAT variable-list-1 format-1
<... variable-list-n format-n>;
FREQ variable;
KEYLABEL keyword-1='description-1'
< ... keyword-n='description-n'>;
LABEL variable-1='label-1'
< ... variable-n='label-n'>;
WEIGHT variable;
Step 1: Specify the classification variables. This step uses the CLASS statement.
Example: PROC TABULATE DATA=sashelp.prdsale;
CLASS country division product;
Step 2: Specify the analysis variables. This step uses the VAR statement.
Example: PROC TABULATE DATA=sashelp.prdsale;
CLASS country division product;
VAR actual;
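The two fragments above will not run by themselves, because PROC TABULATE also needs at least one TABLE statement. A minimal sketch that completes them, assuming the SASHELP sample library is available (the choice of row and column here is only for illustration and is not taken from the paper):

```sas
/* Minimal sketch: CLASS and VAR only declare variables;    */
/* the TABLE statement is what actually produces a table.   */
PROC TABULATE DATA=sashelp.prdsale;
  CLASS country division product;  /* classification variables */
  VAR actual;                      /* analysis variable         */
  /* Rows: country. Column: ACTUAL, summarized with SUM,    */
  /* the default statistic for an analysis variable.        */
  TABLE country, actual;
RUN;
```

Class variables that are declared but not used in a TABLE statement are allowed, but observations with a missing value in any declared class variable are still excluded unless the MISSING option is set.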
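Step 3: Define the table dimensions. This step uses the TABLE statement. Commas separate the dimensions: two commas give a three-dimensional table (page, row, column), one comma gives a two-dimensional table (row, column), and no comma gives a one-dimensional table that has only a column dimension and therefore a single row. An asterisk (*) crosses elements, most often class variables; a blank between elements places them side by side; parentheses group elements.
General forms: TABLE page-expression, row-expression, column-expression . . .;
TABLE row-expression, column-expression . . .; or
TABLE column-expression . . .;
Examples: TABLE var1, var2; TABLE var1, var2 var3; TABLE var1, var2*var3;
TABLE var1, var2 * (var3 var4) var5 ; and so on.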
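To make the generic forms concrete, here is a sketch of the same patterns on sashelp.prdsale. The variable choices are my own assumptions for illustration, not examples from the paper:

```sas
PROC TABULATE DATA=sashelp.prdsale;
  CLASS country division product;
  VAR actual predict;
  /* No comma: one dimension, a single row of columns */
  TABLE country;
  /* One comma: rows by columns */
  TABLE country, division;
  /* Two commas: one page per country, rows by columns on each page */
  TABLE country, division, product;
  /* Crossing (*), concatenation (blank), and grouping (parentheses), */
  /* mirroring var1, var2 * (var3 var4) var5                          */
  TABLE country, division*(actual predict) product;
RUN;
```

In the last table, the analysis variables under each division are summarized with SUM, while the product columns show counts (N), since those are the defaults for analysis and class variables respectively.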
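Step 4: Specify the statistics, such as N, MIN, MAX, MEAN, STD, VAR, MEDIAN, SKEWNESS, SUM, PCTN, PCTSUM, and so on. This step is the most complicated and has the most material, so I give only one example.
Example: PROC TABULATE DATA=tabwkshp.empldata;/* empldata is a data set the author built by merging the SAS 9.1 sample data sets empinfo, jobcodes, and salary */
CLASS jobcode location gender;
TABLE jobcode, location*gender; /* frequency table with jobcode as rows and location crossed with gender as columns */
TABLE jobcode*PCTN, location*gender;/* table of percentages, same layout as above */
TABLE jobcode, location*gender*PCTN;/* same as above */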
TABLE jobcode PCTN, location*gender;/* frequency table, plus one extra row (PCTN is in the row dimension) giving each column's percentage of the total count */
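Since tabwkshp.empldata is the paper author's own merged data set, the example above cannot be run as is. A rough equivalent on sashelp.prdsale, with variables chosen by me purely for illustration, might look like this:

```sas
PROC TABULATE DATA=sashelp.prdsale;
  CLASS country division;
  VAR actual;
  /* Several statistics for one analysis variable within each division */
  TABLE country, division*actual*(N MEAN SUM);
  /* Counts and percentages of the overall count, side by side */
  TABLE country, division*(N PCTN);
RUN;
```

Statistics have to be requested in a single dimension of a given table; here they all sit in the column dimension.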
Step 5: Set the table's formats and labels.
Example: PROC FORMAT;
VALUE salfmt low-<12000 = 'Less than $12,000'
12000-<24000 = '$12,000 - $23,999'
24000-<48000 = '$24,000 - $47,999'
48000-<72000 = '$48,000 - $71,999'
72000-<96000 = '$72,000 - $95,999'
96000-<120000 = '$96,000 - $119,999'
120000-high = '$120,000 or more';
RUN;
ODS HTML BODY='tables.htm';
PROC TABULATE DATA=tabwkshp.empldata MISSING FORMAT=9.1;
CLASS title location gender salary;
FORMAT salary salfmt.;
LABEL title = 'Job Title';
KEYLABEL PCTN = 'Percent' ALL = 'Total';
TABLE title*salary ALL, (location*gender ALL)*PCTN
/ RTS=50 MISSTEXT='0';
TITLE 'Tabular Summary of Employee Information';
RUN;
ODS HTML CLOSE;
RUN;
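The same formatting ideas can be tried without the author's data set. The sketch below applies them to sashelp.prdsale; the format name actfmt, its cutoff of 500, and the labels are hypothetical choices of mine, not values from the paper:

```sas
/* Hypothetical range format used to group ACTUAL into two bands */
PROC FORMAT;
  VALUE actfmt low-<500 = 'Under $500'
               500-high = '$500 or more';
RUN;

PROC TABULATE DATA=sashelp.prdsale MISSING FORMAT=9.1;
  CLASS country division actual;   /* ACTUAL is treated as a class variable here */
  FORMAT actual actfmt.;           /* class levels are grouped by formatted value */
  LABEL actual = 'Actual Sales';
  KEYLABEL PCTN = 'Percent' ALL = 'Total';
  TABLE country*actual ALL, (division ALL)*PCTN
        / RTS=30 MISSTEXT='0';
  TITLE 'Sales Bands by Country and Division';
RUN;
TITLE;  /* clear the title so it does not carry over to later output */
```

RTS= sets the width of the row title space, and MISSTEXT= supplies the text printed in cells that would otherwise be empty.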