Fifty Ways to Lose Your Data (and How to Avoid Them)

Fifty Ways to Lose Your Data (and How to Avoid Them)

Linda Jolley and Jane Stroupe, SAS Institute Inc., Cary, NC

 

This paper groups possible destructive techniques and their

solutions into the following categories:

NOVICE PROBLEMS

Many errors that beginning SAS® programmers encounter fall into other categories; however, three are identified as

common misconceptions.

1. Assuming data must be created before it can be used.

data perm.DataSet;

proc print data = perm.DataSet;

run;

Solution: Refer to the permanent data set in the PRINT procedure. It is not necessary to have a DATA step in

order to use a data set name in the PROC statement.

proc print data = perm.DataSet;

run;

2. Not keeping all of the variables that you want to keep.

data NewData;

set OldData; /*The data set OldData contains the variables X, Y, and Z. */

keep Z; /*The only variables kept should be the variables W and Z. */

W = X + Y;

run;

Solution: Specify all of the variables you want to keep (even new ones that are created in the DATA step) in the

KEEP statement.

data NewData;

set OldData;

keep Z W;

W = X + Y;

run;

SAS Global Forum 2009 Foundations and Fundamentals

2

3. Not knowing the names of your permanent data sets and creating a new one with the name of an existing

data set.

data perm.old_data;

set other_data;

if x = . then x = 5;

run;

Solution: Look at the names of the existing permanent data sets before creating a new one.

proc datasets lib = perm nods;

contents data = _all_;

run;

data perm.new_name;

set other_data;

if x = . then x = 5;

run;

NOT KNOWING YOUR DATA

The first rule of programming should be 'know thy data.' Unfortunately, many programmers rush into their

programming tasks without considering the values of their data. The following are examples of ignoring that rule.

4. Not knowing the case of character data.

data you;

set perm.Names;

where FirstName contains 'il';

run;

Solution: Use the UPCASE function when comparing character variables with constants.

data you;

set perm.Names;

where UPCASE(FirstName) contains 'IL';

run;

5. Incorrectly choosing the length of numeric data.

data five;

length num5 5 num8 8;

do num8 = 10000000000 to 10000000000000 by 100000000000;

num5 = num8;

output;

end;

run;

Here are the observations six through eight in the data set FIVE. Notice that the values of NUM5 and NUM8 are

not equal starting with observation seven, even though the assignment statement NUM5=NUM8 makes them

equal.

Obs num5 num8

6 510000000000 510000000000

7 609999998976 610000000000

8 709999998976 710000000000

SAS Global Forum 2009 Foundations and Fundamentals

3

Solution: Do not change the length of numeric data if the values are large numbers.

data five;

do num8 = 10000000000 to 10000000000000 by 100000000000;

num5 = num8;

output;

end;

run;

6. Incorrectly determining the length of character data.

trainer.dat

Linda Jolley, Technical Trainer

Jane Stroupe, Technical Trainer

data instructor;

infile 'trainer.dat' dlm = ',';

input Name $ Job;

run;

The resulting data follows:

Name Job

Linda Jo Technica

Jane Str Technica

Solution: When using list input, specify the length of the character data. List input is used to read this data since

it is free format data. The default length of character data is 8, which can sometimes result in data truncation.

data instructor;

length Name $ 12 Job $ 17;

infile 'trainer.dat' dlm = ',';

input A $ B $;

run;

Alternate Solution: Use the colon modifier to specify an informat in the INPUT statement.

data instructor;

infile datalines dlm = ',';

input Name : $12. Job : $17.;

run;

7. Having large numeric data that results in precision problems.

data BigNumber_Num;

input Num;

datalines;

12345678901234567890

;

run;

Numeric data can store 16 or 17 significant digits in a length of 8; therefore, if a numeric value has more than 16

or 17 digits, you lose precision. The resulting value of NUM in this example is 12345678901234567168.

Solution: Create a character value instead of a numeric value. The value of this character variable would be

12345678901234567890.

data BigNumber_Char;

input Num : $20.;

datalines;

12345678901234567890

;

run;

SAS Global Forum 2009 Foundations and Fundamentals

4

8. Creating a numeric variable from a character variable.

data Try_Again;

set BigNumber_Char;

NewNum = input(Num, 20.);

format NewNum 20.;

run;

Solution: The problem continues to be the length of the original number. Even if you create a character value

instead of a numeric one for the variable NUM and then attempt to use the INPUT function to create a numeric

variable from the character value, you lose precision.

9. Using PROC IMPORT to determine the type of variable.

proc import out = work.teens datafile = "teens.xls"

dbms = excel replace;

run;

The Excel spreadsheet:

Solution: Use the MIXED=YES option in the PROC IMPORT statement. The MIXED=YES option assigns a SAS

character type for the column and converts all numeric data values to character data values when mixed data

types are found.

proc import out = work.teens

datafile = "teens.xls"

dbms = excel replace;

mixed=yes;

run;

10. Having raw data with long record lengths.

data LongRecl;

infile 'LongRec.dat';

input Char $480.;

Len = length(Char);

run;

Solution: Use the LRECL= option in the INFILE statement since the record length is longer than the default (256

bytes in Windows and UNIX).

data LongRecl;

infile 'LongRec.dat' lrecl = 480;

input Char $480.;

Len = length(Char);

run;

11. Not knowing that the raw data has varying lengths.

numbers.dat

143

36921

2

4509

SAS Global Forum 2009 Foundations and Fundamentals

5

data varying;

infile 'numbers.dat';

input Num 6.;

run;

The result of the DATA step follows:

Obs Num

1 36921

2 4509

Solution: See Number 12.

12. Not knowing that the raw data has varying lengths (Version 2).

data varying;

infile 'numbers.dat' missover;

input Num 6.;

run;

The result of the DATA step follows:

Obs Num

1 .

2 .

3 .

4 .

Solution to both: Use the TRUNCOVER option in the INFILE statement.

data varying;

infile 'numbers.dat' truncover;

input Num 6.;

run;

The result of the DATA step follows:

Obs Num

1 143

2 36921

3 2

4 4509

13. Having multiple delimiters treated as one delimiter.

fullnames.dat

Thomas, John Marshall

Sherrill, Franklin Woodford

data multiple;

infile 'fullnames.dat' dlm = ', ';

input LastName : $8. FirstMName : $25.;

run;

SAS Global Forum 2009 Foundations and Fundamentals

6

The result of this DATA step follows:

First

Obs LastName MName

1 Thomas John

2 Sherrill Franklin

Solution: Use the DLMSTR= option instead of the DLM= option in the INFILE statement. This SAS 9.2 option

specifies that the delimiter must be the exact string. In this example, the delimiter must be a comma followed by

a blank.

data multiple;

infile 'fullnames.dat' dlmstr = ', ';

input LastName : $8. FirstMName : $25.;

run;

The result of this DATA step follows:

Obs LastName FirstMName

1 Thomas John Marshall

2 Sherrill Franklin Woodford

14. Having class variables that are missing when creating statistics on an analysis variable.

proc summary data = perm.class nway;

output out = summary mean = AvgAge;

class Group;

var Age;

run;

For most procedures that calculate statistics, missing CLASS variables are not included in the output data sets.

The data used for this example is

Obs Name Age Group

1 Michelle 15

2 Steve 15

3 James 12 A

4 Jeffrey 13 A

5 John 12 A

20 Louise 12 D

21 Mary 15 D

The resulting data created by the SUMMARY procedure is as follows:

Obs Group _TYPE_ _FREQ_ AvgAge

1 A 1 4 12.0000

2 B 1 6 14.3333

3 C 1 1 11.0000

4 D 1 8 13.5000

SAS Global Forum 2009 Foundations and Fundamentals

7

Solution: Use the MISSING option in the PROC SUMMARY statement.

proc summary data = perm.class missing nway;

output out = summary mean = AvgAge;

class Group;

var Age;

run;

The resulting data created by this SUMMARY procedure is as follows:

Obs Group _TYPE_ _FREQ_ AvgAge

1 1 2 15.0000

2 A 1 4 12.0000

3 B 1 6 14.3333

4 C 1 1 11.0000

5 D 1 8 13.5000

SYNTAX ERRORS

Often syntax errors stop execution of a step; therefore, a DATA step might not remove data from your data set.

However, there are some syntax errors that have a destructive behavior. Watch out for them!

15. Using the SQL procedure with too many semicolons.

proc sql;

delete from test;

where name='John';

quit;

Although this PROC SQL step causes a syntax error, it is too late. All of the rows of data have been deleted.

Solution: Put a semicolon only at the end of the SQL DELETE statement.

proc sql;

delete from test

where name='John';

quit;

Alternate Solution: Use the NOEXEC option in the PROC SQL statement. This option checks the syntax of a

PROC SQL step without actually executing it.

proc sql noexec;

delete from test

where name='John';

quit;

16. Omitting a semicolon and misspelling a keyword.

data two

ser perm.one;

run;

This program creates three data sets: TWO, SER, and PERM.ONE. Since there are no other statements in the

DATA step, the data set PERM.ONE is created with no observations.

Solution: Add a semicolon at the end of the DATA statement and/or spell SET correctly. (The default SAS

system option DATASTMTCHK = COREKEYWORDS prohibits the SET keyword as a one-level SAS data set

name.)

data two;

set perm.one;

run;

OR

SAS Global Forum 2009 Foundations and Fundamentals

8

data two

set perm.one;

run;

17. Hitting the incorrect key on the keyboard causing more misspellings.

proc sql;

delect * from perm.one;

quit;

Log

1171 proc sql;

1172 delect * from one;

------

1

WARNING 1-322: Assuming the symbol DELETE was misspelled as delect.

NOTE: 3 rows were deleted from WORK.ONE.

1173 quit;

Solution: Watch your fingers as you type. SQL interprets DELECT as DELETE and deletes all the rows of data

from the data set PERM.ONE. The 'S' and the 'D' are next to each other on a standard keyboard.

proc sql;

select * from one;

quit;

18. Merging without a BY statement.

data set CHAR1 data set CHAR2

one two one three

A B A X

C D B Y

E F C Z

data char3;

merge char1 char2;

run;

Without a BY statement, the merge combines the first observations of each data set, the second observations,

and so forth. Therefore, there is incorrect data, and much of it is lost. This is the resulting data.

Obs one two three

1 A B X

2 B D Y

3 C F Z

Solution: Use a BY statement.

data char3;

merge char1 char2;

by a; /* the input data is already sorted */

run;

SAS Global Forum 2009 Foundations and Fundamentals

9

The resulting data set now contains the correct data:

Obs one two three

1 A B X

2 B Y

3 C D Z

19. Merging on one common variable when there is more than one common variable.

data set CHAR1 data set CHAR4

one two one two

A B A X

C D B Y

E F C Z

data char5;

merge char1 char4;

by one; /* the data is already sorted */

run;

When the two data sets are merged only on the variable A, the values of B from the data set CHAR4 overwrite

the values of B from the data set CHAR1. The resulting data set follows:

Obs one two

1 A X

2 B Y

3 C Z

Solution: Merge on all like-named variables.

data char6;

merge char1 char4;

by one two; /* the data is already sorted */

run;

The resulting data follows:

Obs one two

1 A B

2 A X

3 B Y

4 C D

5 C Z

6 E F

Alternate Solution: Rename all of the like-named variables except the variable on which the merge should be

done.

data char7;

merge char1 char4(rename=(Two=NewTwo));

by one; /* the data is already sorted */

run;

SAS Global Forum 2009 Foundations and Fundamentals

10

The resulting data follows:

New

Obs one two Two

1 A B X

2 B Y

3 C D Z

4 E F

20. Forgetting to reset an option.

options obs = 10;

proc print data = perm.one;

run;

proc sort data = perm.one;

by x;

run;

Solution: Reset the option. In the case of the OBS = option, use OBS = MAX to read all of the data.

options obs = max;

proc sort data = perm.one;

by x;

run;

21. Reading in delimited data incorrectly.

locations.dat

Libertyville, Lake, IL

Kansas City,,KS

data locations;

infile 'locations.dat';

input City : $12. County : $10. State : $2.;

run;

The problem here is that no delimiter has been specified; therefore, the default delimiter of a blank is being used.

The resulting data set follows:

Obs City County State

1 Libertyville Lake, IL

Solution: See Number 23.

22. Reading in delimited data incorrectly (Version 2).

data locations;

infile 'locations.dat' dlm = ',';

input City : $12. County : $10. State : $2.;

run;

This is a good start to fixing the problem. However, multiple delimiters are treated as one delimiter, causing the

INPUT statement to reach the end of the line in the second observation. The resulting data set is no different

than it was for the previous example.

Solution: See Number 23.

SAS Global Forum 2009 Foundations and Fundamentals

11

23. Reading in delimited data incorrectly (Version 3).

data locations;

infile 'locations.dat' dlm = ',' missover;

input City : $12. County : $10. State : $2.;

run;

The MISSOVER option prevents the INPUT statement from reaching the end of the line. However, for the

second observation, the value of COUNTY is read in as KS. The resulting data follows:

Obs City County State

1 Libertyville Lake IL

2 Kansas City KS

Solution to all three: Use the DSD option in the INFILE statement.

data locations;

infile 'locations.dat' dlm = ',' missover dsd;

input City : $12. County : $10. State : $2.;

run;

The DSD option (Delimiter-Sensitive Data) uses a comma as the default delimiter, treats consecutive commas as

a missing value, and reads a character value enclosed in quotation marks and removes the quotation marks from

the character string before the value is stored. The resulting data set follows:

Obs City County State

1 Libertyville Lake IL

2 Kansas City KS

24. Reading mixed record types incorrectly.

holiday.dat

US,Independence Day,04Jul2009

FR,Bastille Day,14-7-2009

data Holiday;

infile 'holiday.dat' dsd;

input Country : $2. Holiday : $16.;

if Country = 'US' then input Date : date9.;

else input Date : ddmmyy10.;

format date mmddyy10.;

run;

The resulting data follows:

Obs Country Holiday Date

1 US Independence Day .

Solution: Use multiple INPUT statements with a single trailing @ sign and check a condition before reading

more data. The single trailing @ sign holds the current record in the input buffer so that the second INPUT

statement can continue to read from that record.

data Holiday;

infile 'holiday.dat' dsd;

input Country : $2. Holiday : $16. @;

if Country = 'US' then input Date : date9.;

else input Date : ddmmyy10.;

format date mmddyy10.;

run;

SAS Global Forum 2009 Foundations and Fundamentals

12

Alternate Solution: Use the DATASTYLE= SAS system option and the ANYDTDTE informat on the Date

variable. The DATESTYLE= SAS system option identifies the order of month, day, and year. The default value of

LOCALE uses the style that has been specified by the LOCALE= system option. The ANYDTDTE informat reads

dates in various informats.

options datestyle = locale;

data Holiday;

infile 'holiday.dat' dsd;

input Country : $2. Holiday : $16. Date : anydtdte10.;

format date mmddyy10.;

run;

25. Using the NODUPKEY option with PROC SORT and losing the duplicate observations.

data test;

infile datalines @@;

input Name $;

datalines;

John Bill Sam Bill Tom Joe Chris Bill John Kent Chris

;

run;

proc sort data = test nodupkey;

by Name;

run;

Solution: Use the DUPOUT= option in the PROC SORT statement and name a data set that contains the

duplicate observations.

proc sort data = test nodupkey dupout = extras;

run;

26. Merging to make changes to data.

data set Grocery data set Update

ID Name Job Salary ID Name Job Salary

1065 Bob Clerk 35090 1065 Manager 45125

1112 Sue Manager 55890 1117 33589

1117 Sam Bagger 31298 1350 Bagger

1350 Peg Stocker 25897 1350 30876

data updated_grocery;

merge grocery update;

by idnum;

run;

Because the MERGE processes data sequentially and the like-named variables in the data set Update overwrite

the variables from Grocery, the resulting data set is improperly updated. The resulting data set follows.

Obs ID Name Job Salary

1 1065 Manager 45125

2 1112 Sue Manager 55890

3 1117 33589

4 1350 Bagger

5 1350 30876

SAS Global Forum 2009 Foundations and Fundamentals

13

Solution: Use the UPDATE statement instead of the MERGE statement.

data updated_grocery;

update grocery update;

by idnum;

run;

The UPDATE statement changes the values of selected observations in a master file by applying transactions. If

the data in the transaction data set is missing, the UPDATE statement does not overwrite existing values in the

master data set with missing ones in the transaction data set. The resulting data set follows.

Obs ID Name Job Salary

1 1065 Bob Manager 45125

2 1112 Sue Manager 55890

3 1117 Sam Bagger 33589

4 1350 Peg Bagger 30876

27. Reading hierarchical files and losing the last observation.

pets.dat

1 Michelle

Max cat

Fuschia cat

2 Theresa

Paisley dog

Sitka dog

3 Jane

4 Nancy

Winston cat

data pets;

retain Name;

keep Name NumDogs NumCats NumOther;

infile 'pets.dat';

input @1 FirstChar : $1. @;

if FirstChar in ('1','2','3','4','5','6','7','8','9') then do;

if _n_ ne 1 then output;

input Name : $8.;

NumDogs = 0;

NumCats = 0;

NumOther = 0;

end;

else do;

input @1 PetName : $15. Pet : $3.;

if upcase(Pet) = 'DOG' then NumDogs + 1;

else if upcase(Pet) = 'CAT' then NumCats + 1;

else NumOther + 1;

end;

run;

The resulting data set follows.

Num Num Num

Obs Name Dogs Cats Other

1 Michelle 0 2 0

2 Theresa 2 0 0

3 Jane 0 0 0

SAS Global Forum 2009 Foundations and Fundamentals

14

Solution: Specify the END= option in the INFILE statement and use conditional logic to output the last

observation.

data pets;

retain Name;

keep Name NumDogs NumCats NumOther;

infile 'pets.dat' end = LastObs;

input @1 FirstChar : $1. @;

if FirstChar in ('1','2','3','4','5','6','7','8','9') then do;

if _n_ ne 1 then output;

input Name : $8.;

NumDogs = 0;

NumCats = 0;

NumOther = 0;

end;

else do;

input @1 PetName : $15. Pet : $3.;

if upcase(Pet) = 'DOG' then NumDogs + 1;

else if upcase(Pet) = 'CAT' then NumCats + 1;

else NumOther + 1;

end;

if LastObs then output;

run;

The resulting data set follows.

Num Num Num

Obs Name Dogs Cats Other

1 Michelle 0 2 0

2 Theresa 2 0 0

3 Jane 0 0 0

4 Nancy 0 1 0

28. Allowing automatic conversion of numeric to character values.

Data set Numbers

NumValue

1234

5498

5698

data character;

set Numbers;

CharValue = substr(NumValue, 1, 4);

run;

When automatic conversion of numeric data to character values occurs, the digits are written out in 12 spaces

and are right aligned; therefore, the first four characters would be blank. The resulting data set follows.

Obs NumValue CharValue

1 1234

2 5498

3 5698

SAS Global Forum 2009 Foundations and Fundamentals

15

Alternate Solution: Use the PUT function with a specific number of digits if you know the number of digits.

data character;

set Numbers;

CharValue = substr(put(NumValue, 4.), 1, 4);

run;

Alternate Solution: Use LEFT function along with the PUT function with a specific number of digits if you do not

know the number of digits

data character;

set Numbers;

CharValue = substr(left(put(NumValue, 4.)), 1, 4);

run;

29. Not specifying the KEY variable as a DATA item when using the hash object with the OUTPUT method.

data _null_ ;

length List $100;

if _n_ = 1 then do;

declare hash ph();

ph.DefineKey('Date');

ph.DefineData('List');

ph.DefineDone();

end;

set PhoneCalls end = last;

if ph.find() ne 0 then do;

List = PhoneNum;

ph.add();

end;

else do;

List = catx(', ',List, PhoneNum);

ph.replace();

end;

if last then ph.output(dataset:'PhoneList');

run;

A partial listing of the data set PhoneCalls follows:

Obs Date PhoneNum

1 21DEC2006 324-6674

2 21DEC2006 312-5098

3 21DEC2006 312-9088

4 21DEC2006 324-3452

5 21DEC2006 312-7483

6 21DEC2006 324-8943

7 21DEC2006 312-7456

The variable DATE is not in the resulting data set.

Obs List

1 324-6674, 312-5098, 312-9088, 324-3452, 312-7483, 324-8943, 312-7456, 324-4544

2 312-5098, 312-4755, 389-0098, 324-6674

3 312-9088, 324-3452, 312-7456, 312-5098, 324-8943, 312-5098

SAS Global Forum 2009 Foundations and Fundamentals

16

Solution: Add the KEY variable as one of the DATA columns in the DEFINEDATA method. By default, the KEY

variable is not included as part of the data.

data _null_ ;

length List $100;

if _n_ = 1 then do;

declare hash ph();

ph.DefineKey('Date');

ph.DefineData('Date', 'List');

ph.DefineDone();

end;

set PhoneCalls end = last;

if ph.find() ne 0 then do;

List = PhoneNum;

ph.add();

end;

else do;

List = catx(', ',List, PhoneNum);

ph.replace();

end;

if last then ph.output(dataset:'PhoneList');

run;

30. Having duplicate keys when using a hash object. Only the first one is loaded into the hash object.

data all_inquiries;

drop rc;

length Product_ID 8;

if _N_ = 1 then do;

declare hash h(dataset:'orders');

h.defineKey('Order_ID');

h.defineData('Order_ID', 'Product_ID');

h.defineDone();

call missing(Product_ID);

end;

set inquiries;

rc=h.find();

run;

Solution: Use the MULTIDATA : ‘YES’ parameter in the DECLARE statement. MULTIDATA is a SAS 9.2

enhancement to the hash object syntax which allows multiple data items for each non-unique key.

data all_inquiries;

drop rc;

length Product_ID 8;

if _N_ = 1 then do;

declare hash h(dataset:'orders', multidata:'yes');

h.defineKey('Order_ID');

h.defineData('Order_ID', 'Product_ID');

h.defineDone();

call missing(Product_ID);

end;

set inquiries;

rc = h.find();

do while (rc = 0);

output;

rc=h.find_next();

end;

run;

SAS Global Forum 2009 Foundations and Fundamentals

17

31. Not including automatic variables in your data.

proc sort data = perm.class out = class;

by Group;

run;

data Summary;

set class nobs = NumObs;

keep Group AvgAge TotAge NumObs Count;

by Group;

if first.Group then do;

TotAge = 0;

Count = 0;

end;

TotAge + Age;

Count + 1;

if last.Group then do;

AvgAge = TotAge / Count;

output;

end;

run;

Solution: Create a DATA step variable that contains the value of the NOBS= variable, NUMOBS.

data Summary;

set class nobs = NumObs;

keep Group AvgAge TotAge TotObs Count;

TotObs = NumObs;

by Group;

if first.Group then do;

TotAge = 0;

Count = 0;

end;

TotAge + Age;

Count + 1;

if last.Group then do;

AvgAge = TotAge / Count;

output;

end;

run;

32. Reading from multiple raw data files in one DATA step.

There are three raw data files with the same fields but a different number of records. The DATA step stops when

execution reaches the end of file for the raw data file with the least number of records. Therefore, the DATA step

never reads the rest of the other raw data files.

Bill 32

John 38

Beth 33

Sue 35

data names;

infile 'names1.dat';

input Name $ Age;

output;

infile 'names2.dat';

input Name $ Age;

output;

infile 'names3.dat';

input Name $ Age;

output;

run;

SAS Global Forum 2009 Foundations and Fundamentals

18

Solution: Use the FILEVAR = option in the INFILE statement to specify a variable whose change in value

causes the INFILE statement to close the current input file and open a new one. In addition, use a DO WHILE

loop to read all of the records of each raw data file.

data routes;

length Readit $10;

input ReadIt;

infile in filevar = ReadIt end = LastFile pad;

do while(LastFile = 0);

input Name $ Age;

output;

end;

datalines;

names1.dat

names2.dat

names3.dat

;

run;

BAD LOGIC

33. Having a never met condition on a WHERE statement.

proc sort data = one;

by x;

where x = 12;

run;

Solution: Create a data set using the OUT= option. In this case, the data set TWO would be empty, but the

integrity of the data set ONE is maintained.

proc sort data = one out = two;

by x;

where x = 12;

run;

34. Omitting a condition on an IF statement.

data one;

set one;

if x then delete;

run;

Solution: By not specifying a value for the variable X, the condition IF X is true as long as X is not 0 and X is not

missing. Therefore, all of the observations with a nonzero value of X are deleted.

data one;

set one;

if x = 12 then delete;

run;

35. Specifying a false condition when using the DATA step with a MODIFY statement and a WHERE

statement.

data one;

modify one;

x = x * 2;

where x = 12;

run;

SAS Global Forum 2009 Foundations and Fundamentals

19

Solution: Use the IF statement instead of the WHERE statement.

data one;

modify one;

x = x * 2;

if x = 12;

run;

36. Omitting a LENGTH statement.

data char;

input a $ b $;

datalines;

ONE TWO

THREE FOUR

FIVE SIX

SEVEN EIGHT

NINE TEN

;

data NewChar;

set char;

A = A||B;

run;

Solution: Use a LENGTH statement before the SET statement.

data NewChar;

length A $ 16;

set char;

A = A||B;

run;

37. Using the SCAN function with multiple delimiters.

data test;

input pets : $20.;

pet2 = scan(pets, 2, ',');

datalines;

cat,dog,fish

dog,,bird

;

run;

Solution: By default, the SCAN function treats multiple delimiters as one delimiter. In addition, any delimiters

before or after the string are ignored. Use the 'M' option as the fourth option in the SCAN function (a 9.2

enhancement). It specifies that multiple consecutive delimiters (and any at the beginning or end of the first

argument) are treated as missing values between the delimiters; therefore, the variable being created has a

missing value.

data test;

input pets : $20.;

pet2=scan(pets, 2, ',', 'M');

datalines;

cat,dog,fish

dog,,bird

;

run;

SAS Global Forum 2009 Foundations and Fundamentals

20

38. Creating a random sample with the RANUNI function when you need to recreate the same sample later.

data subset;

do I = 1 to 10;

x = int(num * ranuni(0));

set Seuss point = x nobs = num;

output;

end;

stop;

run;

Solution: Use a positive integer as the seed in the RANUNI function.

data subset;

do I = 1 to 10;

x = int(num * ranuni(12345));

set Seuss point = x nobs = num;

output;

end;

stop;

run;

INTENTIONALLY DELETING DATA SETS

39. Using the DATASETS procedure to delete the entire data set.

proc datasets lib = work;

delete one;

run;

40. Using the SQL procedure to drop a data set.

proc sql;

drop table one;

quit;

41. Using the Explorer window to delete a data set.

Right-click on a SAS data set in the Explorer window and select DELETE. Be careful because this does not put

the data set into the Windows recycle bin.

42. Using the VIEWTABLE window to delete rows from a SAS data set.

• Double-click on the data set in the Explorer window.

• From the Menu bar, select EDIT

你可能感兴趣的:(转载sas文章)