Sorting, Printing, and Summarizing Data
1. General form of a PROC step
PROC whatever
DATA=
BY
TITLE
FOOTNOTE
LABEL
OUT=
2. Subsetting data in a PROC step
WHERE condition;
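A minimal sketch of the WHERE statement inside a procedure. The data set `students` and its variables are hypothetical and exist only for illustration.

```sas
/* Hypothetical example data: name, age, and height of three students */
DATA students;
   INPUT Name $ Age Height;
   DATALINES;
Alice 13 56.5
Bob 14 64.3
Carol 12 59.8
;
RUN;

/* Only observations that satisfy the WHERE condition are printed */
PROC PRINT DATA=students;
   WHERE Age >= 13;
   TITLE 'Students aged 13 or older';
RUN;
```

The WHERE statement subsets the data for this procedure only; the data set itself is left unchanged.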
3. Sorting a data set
PROC SORT DATA=input-data-set OUT=output-data-set;
BY variable-list;
RUN;
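As a small illustration of the form above, the sketch below sorts a hypothetical data set `sales` by region and, within each region, by descending amount. All names are assumptions made for the example.

```sas
/* Hypothetical example data */
DATA sales;
   INPUT Region $ Amount;
   DATALINES;
West 200
East 150
West 120
East 300
;
RUN;

/* OUT= writes the sorted copy to a new data set, leaving sales untouched */
PROC SORT DATA=sales OUT=sales_sorted;
   BY Region DESCENDING Amount;
RUN;
```

Without OUT=, PROC SORT replaces the input data set with the sorted version.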
4. Printing a data set
PROC PRINT DATA= data-set NOOBS LABEL;
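Other optional statements:
BY variable-list: starts a new section in the output for each new value of the BY variables and prints the values of the BY variables at the top of each section;
ID variable-list: the observation numbers are not printed; the variables in the ID list appear on the left-hand side of the output instead;
SUM variable-list: prints sums for the variables in the list;
VAR variable-list: specifies which variables to print and in what order.
FORMAT statement: assigns formats to the variables being printed.
PUT statement: formats can also be applied when writing values with a PUT statement in a DATA step.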
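The statements listed above can be combined in one PROC PRINT step. The following is a sketch only; the data set `sales` and its variables are hypothetical.

```sas
/* Hypothetical example data */
DATA sales;
   INPUT Region $ Rep $ Amount;
   DATALINES;
East Kim 150
East Lee 300
West Ann 200
West Raj 120
;
RUN;

/* A BY statement requires the data to be sorted by the BY variables */
PROC SORT DATA=sales;
   BY Region;
RUN;

PROC PRINT DATA=sales LABEL;
   BY Region;                        /* one section per region        */
   ID Rep;                           /* replaces the Obs column       */
   VAR Amount;                       /* variables to print            */
   SUM Amount;                       /* totals for Amount             */
   FORMAT Amount DOLLAR8.2;
   LABEL Amount = 'Sales amount';    /* used because LABEL is set     */
RUN;
```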
5. Creating user-defined formats with PROC FORMAT
PROC FORMAT;
VALUE name range-1='formatted-text-1'
range-2='formatted-text-2';
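A short sketch of a numeric and a character format and how they might be applied. The format names `agegrp` and `$sexfmt`, the cut-off values, and the data set `people` are all assumptions chosen for illustration.

```sas
PROC FORMAT;
   /* Numeric format: groups ages into ranges */
   VALUE agegrp LOW-<13 = 'Child'
                13-19   = 'Teen'
                20-HIGH = 'Adult';
   /* Character format: names start with a dollar sign */
   VALUE $sexfmt 'F' = 'Female'
                 'M' = 'Male';
RUN;

/* Hypothetical example data */
DATA people;
   INPUT Name $ Sex $ Age;
   DATALINES;
Ann F 11
Bob M 16
Cy M 34
;
RUN;

/* Refer to a format by its name followed by a period */
PROC PRINT DATA=people NOOBS;
   FORMAT Age agegrp. Sex $sexfmt.;
RUN;
```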
6. Summarizing data with PROC MEANS
(1) Simplest form of the MEANS procedure
PROC MEANS options;
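With no additional options, MEANS prints statistics for all numeric variables: the number of non-missing values, the mean, the standard deviation, the minimum, and the maximum.
(2) Descriptive statistics available in MEANS
Beyond these defaults, MEANS offers more than 30 different statistics, for example:
MAX / MIN / MEAN / MEDIAN / N / NMISS / RANGE / STDDEV / SUM
(3) Optional statements in the MEANS procedure
BY variable-list: performs a separate analysis for each level of the BY variables; the data must first be sorted by those variables;
CLASS variable-list: also performs a separate analysis for each level of the listed variables, but the output is more compact than with BY and the data do not need to be sorted;
VAR variable-list: specifies which numeric variables to analyze. By default, SAS analyzes all numeric variables.
(4) Writing summary statistics to a SAS data set
Use ODS or the OUTPUT statement:
OUTPUT OUT=data-set output-statistic-list;
where each entry in output-statistic-list has the form
statistic(variable-list)=name-list
The new data set contains every variable defined in output-statistic-list, the variables named in the BY and CLASS statements, and two additional variables, _TYPE_ and _FREQ_. If there is no BY or CLASS statement, the new data set has just one observation.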
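The sketch below writes per-region statistics to a data set with the OUTPUT statement. The data set names `sales` and `region_stats` and the output variable names are hypothetical.

```sas
/* Hypothetical example data */
DATA sales;
   INPUT Region $ Amount;
   DATALINES;
East 150
East 300
West 200
West 120
;
RUN;

/* NOPRINT suppresses the printed report; only the data set is created */
PROC MEANS DATA=sales NOPRINT;
   CLASS Region;
   VAR Amount;
   OUTPUT OUT=region_stats MEAN(Amount)=AvgAmount SUM(Amount)=TotalAmount;
RUN;

/* Inspect the result, including the automatic _TYPE_ and _FREQ_ variables */
PROC PRINT DATA=region_stats;
RUN;
```

With a CLASS statement the output data set also holds an overall summary row (`_TYPE_` = 0) in addition to one row per region.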
7. Counting categorical variables with PROC FREQ
(1) Basic form
PROC FREQ;
TABLES variable-combinations;
You can request several tables in a single TABLES statement, and you can use several TABLES statements in one FREQ step.
(2) Options for the TABLES statement (placed after a slash)
LIST: prints cross-tabulations in list format rather than grid;
MISSING: includes missing values in frequency statistics;
NOCOL: suppresses printing of column percentages in cross-tabulations;
NOROW: suppresses printing of row percentages in cross-tabulations;
OUT=data-set: writes a data set containing frequencies.
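A brief sketch of one-way and two-way tables with some of the options above. The data set `orders` is hypothetical, and the period in the last data line stands for a missing product.

```sas
/* Hypothetical example data; "." is read as a missing value */
DATA orders;
   INPUT Region $ Product $;
   DATALINES;
East Tea
East Coffee
West Tea
West Tea
West .
;
RUN;

PROC FREQ DATA=orders;
   /* One statement, two tables: a one-way table and a cross-tabulation */
   TABLES Region Region*Product / NOCOL NOROW;
   /* A second TABLES statement that counts missing values and saves the counts */
   TABLES Product / MISSING OUT=product_counts;
RUN;
```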
8. Using PROC TABULATE
(1) Simplest form
PROC TABULATE;
CLASS classification-variable-list;
TABLE page-dimension, row-dimension, column-dimension;
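By default, TABULATE excludes observations that have missing values for CLASS variables; to include them, add the MISSING option to the PROC statement.
When no other statistics are requested, TABULATE prints only counts for the variables listed in the CLASS statement.
(2) Adding statistics
In the TABULATE procedure, the CLASS statement lists categorical variables, while the VAR statement lists continuous variables.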
PROC TABULATE;
VAR analysis-variable-list;
CLASS classification-variable-list;
TABLE page-dimension, row-dimension, column-dimension;
You can use both a CLASS statement and a VAR statement, or just one of them. However, every variable that appears in the TABLE statement must be listed in either the CLASS or the VAR statement.
In the TABLE statement, each dimension can contain statistic keywords as well as variable names. The main keywords are:
ALL: adds a row, column, or page showing the total;
MAX: highest value;
MIN: lowest value;
MEAN: the arithmetic mean;
MEDIAN: the median;
N: number of non-missing values;
NMISS: number of missing values;
P90: the 90th percentile;
PCTN: the percentage of observations for that group;
PCTSUM: the percentage of a total sum represented by that group;
STDDEV: the standard deviation;
SUM: the sum.
(3) Concatenating, crossing, and grouping:
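Within a single dimension, variables and keywords can be concatenated, crossed, and grouped.
Concatenating: list the variables or keywords separated by spaces;
Crossing: join the variables or keywords with an asterisk (*);
Grouping: enclose the variables or keywords in parentheses.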
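The three operations can be seen together in one TABLE statement. This is a sketch under assumed names; the data set `sales` and its variables are hypothetical.

```sas
/* Hypothetical example data */
DATA sales;
   INPUT Region $ Product $ Amount;
   DATALINES;
East Tea 150
East Coffee 300
West Tea 200
West Coffee 120
;
RUN;

PROC TABULATE DATA=sales;
   CLASS Region Product;
   VAR Amount;
   /* Rows:    Region and ALL are concatenated (separated by a space)     */
   /* Columns: Product is crossed (*) with Amount, which is crossed with  */
   /*          the grouped keywords (MEAN SUM)                            */
   TABLE Region ALL, Product*Amount*(MEAN SUM);
RUN;
```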
(4) Enhancing the Appearance of PROC TABULATE Output
FORMAT= option:
Used in the PROC TABULATE statement; it sets the format for all data cells in the table.
PROC TABULATE FORMAT=COMMA10.0;
BOX= and MISSTEXT= options:
Used in the TABLE statement.
The BOX= option allows you to write a brief phrase in the normally empty box that appears in the upper left corner of every TABULATE report.
The MISSTEXT= option specifies a value for SAS to print in empty data cells.
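A minimal sketch combining the three options. The data set `sales` is hypothetical and deliberately has no West/Coffee row, so that one cell is empty and MISSTEXT= has something to fill.

```sas
/* Hypothetical example data; there is no West/Coffee combination */
DATA sales;
   INPUT Region $ Product $ Amount;
   DATALINES;
East Tea 1500
East Coffee 3000
West Tea 2000
;
RUN;

/* FORMAT= on the PROC statement applies to every data cell */
PROC TABULATE DATA=sales FORMAT=COMMA10.0;
   CLASS Region Product;
   VAR Amount;
   /* TABLE statement options follow a slash */
   TABLE Region, Product*Amount*SUM / BOX='Total sales' MISSTEXT='none';
RUN;
```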
(5) Changing Headers in PROC TABULATE Output
TABULATE reports have two basic types of headers: headers that are the values of variables listed in a CLASS statement, and headers that are the names of variables and keywords. You use different methods to change different types of headers.
CLASS variable values
To change headers which are the values of variables listed in a CLASS statement, use the FORMAT procedure to create a user-defined format. Then assign the format to the variable in a FORMAT statement.
Variable names and keywords
To change headers which are the names of variables or keywords, put an equal sign after the variable or keyword followed by the new header enclosed in quotation marks. You can eliminate a header entirely by setting it equal to blank, and SAS will remove the box for that header.
TABLE Region='', MEAN=''*Sales='Mean Sales by Region';
This statement tells SAS to remove the headers for Region and MEAN, and to change the header for the variable Sales to "Mean Sales by Region."
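Both header techniques can be sketched in one step. The format name `$regfmt`, the region codes, and the data set `sales` are assumptions made for illustration.

```sas
/* User-defined format that supplies the headers for CLASS variable values */
PROC FORMAT;
   VALUE $regfmt 'E' = 'Eastern region'
                 'W' = 'Western region';
RUN;

/* Hypothetical example data */
DATA sales;
   INPUT Region $ Sales;
   DATALINES;
E 150
E 300
W 200
W 120
;
RUN;

PROC TABULATE DATA=sales;
   CLASS Region;
   VAR Sales;
   FORMAT Region $regfmt.;   /* changes the headers for the Region values */
   /* Blank headers for Region and MEAN, new header for Sales */
   TABLE Region='', MEAN=''*Sales='Mean Sales by Region';
RUN;
```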
(6) Specifying Multiple Formats for Data Cells in PROC TABULATE Output
To apply a format to an individual variable, cross the format with the variable name. The general form is
variable-name*FORMAT=formatw.d
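As a sketch of the form above, the following step gives each analysis variable its own cell format. The data set `sales` and the chosen formats are hypothetical.

```sas
/* Hypothetical example data */
DATA sales;
   INPUT Region $ Amount Units;
   DATALINES;
East 1500.5 12
East 3000.25 30
West 2000 18
;
RUN;

PROC TABULATE DATA=sales;
   CLASS Region;
   VAR Amount Units;
   /* Each variable is crossed with its own FORMAT= specification */
   TABLE Region, MEAN*(Amount*FORMAT=DOLLAR12.2 Units*FORMAT=8.1);
RUN;
```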
9. Producing Simple Output with PROC REPORT
(1) General form
PROC REPORT NOWINDOWS;
COLUMN variable-list;
To visually separate the headers and data, use the HEADLINE or HEADSKIP options.
(2) Numeric versus character data
If you have at least one character variable in your report, you will get a detail report with one row per observation. If your report includes only numeric variables, then PROC REPORT will sum those variables.
(3) Using DEFINE Statements in PROC REPORT
DEFINE variable / options 'column-header';
In a DEFINE statement, you specify the variable name followed by a slash and any options for that particular variable.
Usage Options
ACROSS: creates a column for each unique value of the variable;
ANALYSIS: calculates statistics for the variable. This is the default usage for numeric variables, and the default statistic is sum;
DISPLAY: creates one row for each observation in the data set. This is the default usage for character variables;
GROUP: creates a row for each unique value of the variable;
ORDER: creates one row for each observation with rows arranged according to the values of the order variable.
Changing column headers
DEFINE Age / ORDER 'Age at/Admission';
Missing data
By default, observations are excluded from reports if they have missing values for order, group, or across variables. If you want to keep these observations, add the MISSING option to your PROC statement like this:
PROC REPORT NOWINDOWS MISSING;
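A minimal sketch of a detail report that uses DEFINE statements, split column headers, and the MISSING option. The data set `patients` is hypothetical; the period in the second data line is a missing ward.

```sas
/* Hypothetical example data; "." is read as a missing value */
DATA patients;
   INPUT Name $ Ward $ Age;
   DATALINES;
Ann A 34
Bob . 51
Cy B 27
;
RUN;

/* MISSING keeps the observation whose order variable (Ward) is missing */
PROC REPORT DATA=patients NOWINDOWS HEADLINE MISSING;
   COLUMN Ward Name Age;
   DEFINE Ward / ORDER 'Hospital/Ward';      /* a slash splits the header */
   DEFINE Name / DISPLAY 'Patient';
   DEFINE Age  / DISPLAY 'Age at/Admission';
RUN;
```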
(4) Creating Summary Reports with PROC REPORT
Two different usage types cause the REPORT procedure to "roll up" data into summary groups based on the values of a variable. While the GROUP usage type produces summary rows, the ACROSS usage type produces summary columns.
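The sketch below uses both usage types in one report: one row per region and one column per product. The data set `sales` and its variables are hypothetical.

```sas
/* Hypothetical example data */
DATA sales;
   INPUT Region $ Product $ Amount;
   DATALINES;
East Tea 150
East Coffee 300
East Tea 50
West Tea 200
West Coffee 120
;
RUN;

PROC REPORT DATA=sales NOWINDOWS;
   /* The comma places Amount underneath each value of Product */
   COLUMN Region Product,Amount;
   DEFINE Region  / GROUP;          /* summary rows    */
   DEFINE Product / ACROSS;         /* summary columns */
   DEFINE Amount  / ANALYSIS SUM;
RUN;
```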
(5) Adding Summary Breaks to PROC REPORT Output
Two kinds of statements allow you to insert breaks into a report. The BREAK statement adds a break for each unique value of the variable you specify, while the RBREAK statement does the same for the entire report (or BY-group if you are using a BY statement).
BREAK location variable / options;
RBREAK location / options;
where location has two possible values, BEFORE or AFTER, depending on whether you want the break to precede or follow that particular section of the report. The options that come after the slash tell SAS what kind of break to insert.
Possible options:
OL: draws a line over the break;
PAGE: starts a new page;
SKIP: inserts a blank line;
SUMMARIZE: inserts sums of numeric variables;
UL: draws a line under the break.
You can use an RBREAK statement in any report, but you can use BREAK only if you have at least one group or order variable.
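A short sketch with one break per region and one for the whole report. The data set `sales` is hypothetical. Note that RBREAK takes a location but no variable.

```sas
/* Hypothetical example data */
DATA sales;
   INPUT Region $ Product $ Amount;
   DATALINES;
East Tea 150
East Coffee 300
West Tea 200
West Coffee 120
;
RUN;

PROC REPORT DATA=sales NOWINDOWS;
   COLUMN Region Product Amount;
   DEFINE Region  / ORDER;          /* BREAK needs a group or order variable */
   DEFINE Product / DISPLAY;
   DEFINE Amount  / ANALYSIS SUM;
   BREAK AFTER Region / SUMMARIZE SKIP;   /* subtotal after each region */
   RBREAK AFTER / SUMMARIZE OL;           /* grand total at the end     */
RUN;
```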
(6) Adding Statistics to PROC REPORT Output
There are several ways to request statistics in the REPORT procedure. An easy method is to insert statistics keywords directly into the COLUMN statement along with the variable names. These are a few of the statistics PROC REPORT can compute:
MAX / MIN / MEAN / MEDIAN / N / NMISS / P90 / PCTN / PCTSUM / STD / SUM
To request a statistic for a particular variable, insert a comma between the statistic and variable in the COLUMN statement. One statistic, N, does not require a comma because it does not apply to a particular variable. To request multiple statistics or statistics for multiple variables, put parentheses around the statistics or variables.
COLUMN Age, MEDIAN N;
COLUMN Age, (MIN MAX) (Height Weight), MEAN;
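To make the comma and parenthesis rules concrete, here is a sketch that requests statistics by group. The data set `kids` and its variables are hypothetical.

```sas
/* Hypothetical example data */
DATA kids;
   INPUT Sex $ Height Weight;
   DATALINES;
F 56.5 84
F 59.8 90
M 64.3 102
M 62.1 99
;
RUN;

PROC REPORT DATA=kids NOWINDOWS;
   /* N stands alone; other statistics are tied to a variable by a comma; */
   /* parentheses request several statistics for one variable             */
   COLUMN Sex N Height,(MIN MAX) Weight,MEAN;
   DEFINE Sex / GROUP;
RUN;
```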