[Learning] SAS in Seven Days (3): Calling the Basic Procedures (Formats, Counting, Summary Statistics, Sorting, etc.) (Part 1)

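With the basic functions out of the way, it is time to start tinkering with the models in SAS, which means starting to write PROCs. Honestly, the more SAS I learn, the more it feels like Stata, both in the look of the output and in the syntax. Calling a model without () takes some getting used to. If there is a difference between SAS and Stata, it is probably just that Stata leans toward econometric models while SAS serves most statistical models.

PROC basics: CONTENTS

Start with the most basic PROC, CONTENTS, which displays the main attributes of a data set. For example: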

LIBNAME tropical 'c:\MySASLib';
PROC CONTENTS DATA = tropical.banana;
RUN;

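Two statements matter most here: TITLE and FOOTNOTE. The former prints a title at the top of the output, the latter a footnote at the bottom. Usage is straightforward: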

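* Double quotes allow an apostrophe inside the title;
TITLE "Here's another title";
* Inside single quotes, write the apostrophe twice;
TITLE 'Here''s another title';
FOOTNOTE3 'This is the third footnote';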

Finally, there is the LABEL statement, which is very much like Stata's:

LABEL ReceiveDate = 'Date order was received'
      ShipDate    = 'Date merchandise was shipped';

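It attaches a descriptive label to a variable. Labeling variables in R is actually quite a hassle: only a few packages can do it, and it is hardly worth the effort. In general I try to make variable names long enough to be self-explanatory, or else build a separate table that stores the variable names and their labels.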
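To see the labels in action, here is a minimal sketch. The data set orders and its two rows are made up for illustration; only the two variable names and the label texts come from the LABEL statement above. Note that PROC PRINT uses labels as column headings only when its LABEL option is given.

```sas
* Hypothetical data: the orders data set and its values are invented for this sketch;
DATA orders;
  INPUT OrderID ReceiveDate :MMDDYY10. ShipDate :MMDDYY10.;
  DATALINES;
101 03/21/2008 03/24/2008
102 03/22/2008 03/25/2008
;
RUN;

* The LABEL option tells PROC PRINT to show labels instead of variable names;
PROC PRINT DATA = orders LABEL;
  LABEL ReceiveDate = 'Date order was received'
        ShipDate    = 'Date merchandise was shipped';
  FORMAT ReceiveDate ShipDate MMDDYY10.;
RUN;
```

A LABEL statement inside a PROC applies only to that step; placing the same statement in the DATA step would store the labels with the data set.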

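Subsetting in a SAS PROC: WHERE

To work on a subset of the data inside a PROC, use the WHERE statement directly. The idea feels a lot like SQL. Usage is fairly simple (most things in SAS are, except for certain models):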

PROC PRINT DATA = 'c:\MySASLib\style';
  WHERE Genre = 'Impressionism';
  TITLE 'Major Impressionist Painters';
  FOOTNOTE 'F = France N = Netherlands U = US';
RUN;

The final result is:

          Major Impressionist Painters          1

Obs    Name                     Genre            Origin
  1    Mary Cassatt             Impressionism    U
  3    Edgar Degas              Impressionism    F
  5    Claude Monet             Impressionism    F
  6    Pierre Auguste Renoir    Impressionism    F

          F = France N = Netherlands U = US
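WHERE conditions can also be combined, much as in SQL. The sketch below is self-contained so it can be run without the original file: rows 1, 3, 5 and 6 mirror the output above, while the two "Painter" rows, the variable lengths and the title are placeholders invented for illustration.

```sas
* Rows 2 and 4 are hypothetical placeholders and not the original data;
DATA style;
  INFILE DATALINES DSD;
  INPUT Name :$25. Genre :$20. Origin :$1.;
  DATALINES;
Mary Cassatt,Impressionism,U
Painter B,Other genre,N
Edgar Degas,Impressionism,F
Painter D,Other genre,N
Claude Monet,Impressionism,F
Pierre Auguste Renoir,Impressionism,F
;
RUN;

* Keep only the rows that satisfy both conditions;
PROC PRINT DATA = style;
  WHERE Genre = 'Impressionism' AND Origin = 'F';
  TITLE 'French Impressionist Painters';
RUN;
```

With this data the listing should contain observations 3, 5 and 6. As in the output above, the Obs column keeps the original observation numbers rather than renumbering the subset.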

Sorting data in a SAS PROC: SORT

Sorting is even simpler: just use PROC SORT.

DATA marine;
  INFILE 'c:\MyRawData\Lengths.dat';
  INPUT Name $ Family $ Length @@;
RUN;

* Sort the data;
PROC SORT DATA = marine OUT = seasort NODUPKEY;
  BY Family DESCENDING Length;
RUN;

PROC PRINT DATA = seasort;
  TITLE 'Whales and Sharks';
RUN;

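The data are now sorted by Family and, within each family, by Length in descending order. Missing values sort before everything else, so humpback, which has no Family value, comes first.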

          Whales and Sharks          1

Obs    Name        Family    Length
  1    humpback                50.0
  2    whale       shark       40.0
  3    basking     shark       30.0
  4    mako        shark       12.0
  5    dwarf       shark        0.5
  6    blue        whale      100.0
  7    sperm       whale       60.0
  8    gray        whale       50.0
  9    killer      whale       30.0
 10    beluga      whale       15.0
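The PROC SORT call above uses two options that the text does not spell out: OUT= writes the sorted rows to a new data set instead of overwriting the input, and NODUPKEY drops observations whose BY values repeat those of an earlier observation. A small sketch with made-up data (the data set names and all values are hypothetical):

```sas
* Hypothetical data: two sharks share the same Family and Length;
DATA fish;
  INPUT Name $ Family $ Length;
  DATALINES;
mako shark 12
tiger shark 12
blue whale 100
gray whale 50
;
RUN;

* OUT= leaves fish untouched and NODUPKEY keeps one row per Family and Length pair;
PROC SORT DATA = fish OUT = fishsort NODUPKEY;
  BY Family DESCENDING Length;
RUN;

PROC PRINT DATA = fishsort;
RUN;
```

Here fishsort should end up with three observations, because only one of the two 12-unit sharks survives. The duplicate check looks only at the BY variables, not at Name, so NODUPKEY is worth using only when that is really the intent.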

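Outputting data in a SAS PROC: PRINT

The simplest way to output data is probably PRINT which, as the name suggests, just prints the data. You can choose which variables to print (VAR) and also request summary statistics such as totals (SUM):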

DATA sales;
  INFILE 'c:\MyRawData\Candy.dat';
  INPUT Name $ 1-11 Class @15 DateReturned MMDDYY10. CandyType $
        Quantity;
  Profit = Quantity * 1.25;
RUN;

* PROC PRINT with a BY statement needs data sorted by the same variable;
PROC SORT DATA = sales;
  BY Class;
RUN;

PROC PRINT DATA = sales;
  BY Class;
  SUM Profit;
  VAR Name DateReturned CandyType Profit;
  TITLE 'Candy Sales for Field Trip by Class';
RUN;

The result is:

      Candy Sales for Field Trip by Class      1

Class=14

                  Date      Candy
Obs    Name       Returned  Type    Profit
  1    Nathan     17612     CD       23.75
  2    Matthew    17612     CD       17.50
  3    Claire     17613     CD       13.75
  4    Chris      17616     CD        7.50
  5    Stephen    17616     CD       12.50
Class                                75.00

Class=21

                  Date      Candy
Obs    Name       Returned  Type    Profit
  6    Adriana    17612     MP        8.75
  7    Caitlin    17615     CD       11.25
  8    Ian        17615     MP       22.50
  9    Anthony    17616     MP       16.25
 10    Erika      17616     MP       21.25
Class                                80.00
                                    ======
                                    155.00
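The DateReturned column shows values such as 17612 because SAS stores a date as the number of days since 1 January 1960, and without a format PROC PRINT shows that raw number. A minimal sketch that writes the raw and the formatted value to the SAS log (the variable names d and e are arbitrary):

```sas
DATA _NULL_;
  d = 17612;            * the raw value printed for Nathan above;
  PUT d= d DATE9.;      * writes d=17612 21MAR2008;
  e = '21MAR2008'D;     * a date literal is stored as the same kind of number;
  PUT e=;               * writes e=17612;
RUN;
```

Because no FILE statement is used, PUT writes to the log rather than to an output file.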

Changing output formats in a SAS PROC: FORMAT

Basically a FORMAT statement is all it takes; formats can also be applied in a PUT statement.

DATA sales;
  INFILE 'c:\MyRawData\Candy.dat';
  INPUT Name $ 1-11 Class @15 DateReturned MMDDYY10. CandyType $
        Quantity;
  Profit = Quantity * 1.25;
RUN;

PROC PRINT DATA = sales;
  VAR Name DateReturned CandyType Profit;
  FORMAT DateReturned DATE9. Profit DOLLAR6.2;
  TITLE 'Candy Sale Data Using Formats';
RUN;

The output is:

        Candy Sale Data Using Formats        1

                  Date       Candy
Obs    Name       Returned   Type    Profit
  1    Adriana    21MAR2008  MP       $8.75
  2    Nathan     21MAR2008  CD      $23.75
  3    Matthew    21MAR2008  CD      $17.50
  4    Claire     22MAR2008  CD      $13.75
  5    Caitlin    24MAR2008  CD      $11.25
  6    Ian        24MAR2008  MP      $22.50
  7    Chris      25MAR2008  CD       $7.50
  8    Anthony    25MAR2008  MP      $16.25
  9    Stephen    25MAR2008  CD      $12.50
 10    Erika      25MAR2008  MP      $21.25

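Commonly used formats include:

  • Character: $HEXw. and $w.

  • Date: DATEw. (writes ddmmmyy or ddmmmyyyy), DATETIMEw.d (writes ddmmmyy:hh:mm:ss), DAYw. (writes the day of the month), EURDFDDw., JULIANw., MMDDYYw. (writes mmddyy or mmddyyyy), TIMEw.d (writes hh:mm:ss), WEEKDATEw. (writes the day of the week followed by the date), WORDDATEw. (writes the date with the month name spelled out).

  • Numeric: BESTw. (SAS chooses the representation automatically), COMMAw.d (comma-separated thousands), DOLLARw.d (currency), Ew. (scientific notation), PDw.d, w.d (standard decimal).

Sample output is shown below.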
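A few of the formats listed above can be tried out with PUT in a DATA _NULL_ step, which writes the results to the SAS log. This is only a sketch: the two values are chosen for illustration, and the comments give the text each format should produce, apart from any leading blanks added by right alignment.

```sas
DATA _NULL_;
  d = '21MAR2008'D;     * a SAS date value;
  x = 1234.5;           * an ordinary number;
  PUT d DATE9.;         * 21MAR2008;
  PUT d MMDDYY10.;      * 03/21/2008;
  PUT d WORDDATE18.;    * March 21, 2008;
  PUT d WEEKDATE29.;    * weekday name followed by the date;
  PUT x COMMA8.1;       * 1,234.5;
  PUT x DOLLAR9.2;      * $1,234.50;
  PUT x 8.2;            * 1234.50;
  PUT x E10.;           * scientific notation;
RUN;
```

In each format name, w is the total width and d the number of decimal places, so the width must be large enough to hold the dollar sign, commas and decimal point as well as the digits.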

Of course, PROC FORMAT also lets you define custom output formats for factor-type (categorical) variables, for example:

DATA carsurvey;
  INFILE 'c:\MyRawData\Cars.dat';
  INPUT Age Sex Income Color $;
RUN;

PROC FORMAT;
  VALUE gender 1 = 'Male'
               2 = 'Female';
  VALUE agegroup 13 -< 20  = 'Teen'
                 20 -< 65  = 'Adult'
                 65 - HIGH = 'Senior';
  VALUE $col 'W' = 'Moon White'
             'B' = 'Sky Blue'
             'Y' = 'Sunburst Yellow'
             'G' = 'Rain Cloud Gray';
RUN;

* Print data using user-defined and standard (DOLLAR8.) formats;
PROC PRINT DATA = carsurvey;
  FORMAT Sex gender. Age agegroup. Color $col. Income DOLLAR8.;
  TITLE 'Survey Results Printed with User-Defined Formats';
RUN;

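This converts the numeric codes 1 and 2 into the corresponding text Male and Female, and it can also discretize a continuous variable into ranges. The output is: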

   Survey Results Printed with User-Defined Formats   1

Obs    Age       Sex       Income     Color
  1    Teen      Male      $14,000    Sunburst Yellow
  2    Adult     Male      $65,000    Rain Cloud Gray
  3    Senior    Female    $35,000    Sky Blue
  4    Adult     Male      $44,000    Sunburst Yellow
  5    Adult     Female    $83,000    Moon White
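A FORMAT statement only changes how values are displayed; the stored Age is still a number. If the discretized category is needed as a real variable, one common approach is the PUT function, which returns the formatted text. The sketch below reuses the agegroup format; the data set ages, the variable AgeCat and the four test values are hypothetical.

```sas
PROC FORMAT;
  VALUE agegroup 13 -< 20  = 'Teen'
                 20 -< 65  = 'Adult'
                 65 - HIGH = 'Senior';
RUN;

* Hypothetical ages chosen to sit on the range boundaries;
DATA ages;
  INPUT Age @@;
  LENGTH AgeCat $ 6;
  AgeCat = PUT(Age, agegroup.);   * store the label as a character variable;
  DATALINES;
19 20 64 65
;
RUN;

PROC PRINT DATA = ages;
RUN;
```

The range 13 -< 20 includes 13 but excludes 20, so the four ages should come out as Teen, Adult, Adult and Senior.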

Custom output can also include simple text concatenation, for example:

* Write a report with FILE and PUT statements;
DATA _NULL_;
  INFILE 'c:\MyRawData\Candy.dat';
  INPUT Name $ 1-11 Class @15 DateReturned MMDDYY10.
        CandyType $ Quantity;
  Profit = Quantity * 1.25;
  FILE 'c:\MyRawData\Student.txt' PRINT;
  TITLE;
  PUT @5 'Candy sales report for ' Name 'from classroom ' Class
    // @5 'Congratulations! You sold ' Quantity 'boxes of candy'
     / @5 'and earned ' Profit DOLLAR6.2 ' for our field trip.';
  PUT _PAGE_;
RUN;

This produces a series of consecutive reports, one per student (note that DATA _NULL_; does not create any SAS data set):

    Candy sales report for Adriana from classroom 21

    Congratulations! You sold 7 boxes of candy
    and earned $8.75 for our field trip.

    Candy sales report for Nathan from classroom 14

    Congratulations! You sold 19 boxes of candy
    and earned $23.75 for our field trip.

    Candy sales report for Matthew from classroom 14

    Congratulations! You sold 14 boxes of candy
    and earned $17.50 for our field trip.
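The layout of each report comes from the pointer controls in the PUT statement: @5 moves to column 5, a single slash starts a new line, and a double slash starts a new line after leaving one blank. A minimal sketch that shows these controls in isolation, with the values typed in directly (taken from the first report above) and the wording of the output lines invented for illustration:

```sas
DATA _NULL_;
  * Values from the first report above and entered by hand for this sketch;
  Name = 'Adriana';
  Quantity = 7;
  Profit = Quantity * 1.25;
  * Three output lines with one blank line before the last;
  PUT @5 'Sold by ' Name
    / @5 'Boxes: ' Quantity
   // @5 'Profit: ' Profit DOLLAR6.2;
RUN;
```

Since there is no FILE statement here, the lines go to the SAS log; adding a FILE statement as in the report program would send them to a text file instead.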

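First published on the WeChat official account PPV课数据科学社区: [Learning] SAS in Seven Days (3): Calling the Basic Procedures (Formats, Counting, Summary Statistics, Sorting, etc.) (Part 1)

Original article by ppvke. If you repost it, please credit the source: http://www.ppvke.com/archives/31024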
