
[Study] Reading Notes on R in Action (Chapter 6)

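A reading club is an activity for broadening horizons, thinking at a larger scale, exchanging knowledge, and enriching life. The PPV课 R language reading club takes "learn, share, improve" as its motto: members work together to read specialist R books closely and share what they learn, with the aim of studying and researching R. Books are recommended by mentors or by club members and chosen through discussion. The club reads one book per month, closely, and everyone shares their notes.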

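Chapter 6: Basic Graphs

Chapter outline

1 Bar plots, box plots, and dot plots

2 Pie charts and fan plots

3 Histograms and kernel density plots

The chapter's content is summarized below.

Data visualization is an effective way to understand data. R provides a rich set of plotting functions, and the graphs in this chapter help in understanding both categorical and continuous variables. They serve two purposes:

1 Visualizing the distribution of a variable

2 Comparing results across groups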

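Bar plots

A bar plot shows the frequency distribution of a categorical variable with vertical or horizontal bars. The form is as follows.

barplot(height), where height is a vector or matrix.

An example follows.

Data source: the Arthritis dataset from the vcd package.

The Arthritis dataset is described as follows.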

Data from Koch & Edwards (1988) from a double-blind clinical trial investigating a new treatment for rheumatoid arthritis.

> library(vcd)

> Arthritis

ID Treatment Sex Age Improved

1 57 Treated Male 27 Some

2 46 Treated Male 29 None

3 77 Treated Male 30 None

4 17 Treated Male 32 Marked

5 36 Treated Male 46 Marked

6 23 Treated Male 58 Marked

7 75 Treated Male 59 None

8 39 Treated Male 59 Marked

9 33 Treated Male 63 None

10 55 Treated Male 63 None

11 30 Treated Male 64 None

12 5 Treated Male 64 Some

13 63 Treated Male 69 None

14 83 Treated Male 70 Marked

15 66 Treated Female 23 None

16 40 Treated Female 32 None

17 6 Treated Female 37 Some

18 7 Treated Female 41 None

19 72 Treated Female 41 Marked

20 37 Treated Female 48 None

21 82 Treated Female 48 Marked

22 53 Treated Female 55 Marked

23 79 Treated Female 55 Marked

24 26 Treated Female 56 Marked

25 28 Treated Female 57 Marked

26 60 Treated Female 57 Marked

27 22 Treated Female 57 Marked

28 27 Treated Female 58 None

29 2 Treated Female 59 Marked

30 59 Treated Female 59 Marked

31 62 Treated Female 60 Marked

32 84 Treated Female 61 Marked

33 64 Treated Female 62 Some

34 34 Treated Female 62 Marked

35 58 Treated Female 66 Marked

36 13 Treated Female 67 Marked

37 61 Treated Female 68 Some

38 65 Treated Female 68 Marked

39 11 Treated Female 69 None

40 56 Treated Female 69 Some

41 43 Treated Female 70 Some

42 9 Placebo Male 37 None

43 14 Placebo Male 44 None

44 73 Placebo Male 50 None

45 74 Placebo Male 51 None

46 25 Placebo Male 52 None

47 18 Placebo Male 53 None

48 21 Placebo Male 59 None

49 52 Placebo Male 59 None

50 45 Placebo Male 62 None

51 41 Placebo Male 62 None

52 8 Placebo Male 63 Marked

53 80 Placebo Female 23 None

54 12 Placebo Female 30 None

55 29 Placebo Female 30 None

56 50 Placebo Female 31 Some

57 38 Placebo Female 32 None

58 35 Placebo Female 33 Marked

59 51 Placebo Female 37 None

60 54 Placebo Female 44 None

61 76 Placebo Female 45 None

62 16 Placebo Female 46 None

63 69 Placebo Female 48 None

64 31 Placebo Female 49 None

65 20 Placebo Female 51 None

66 68 Placebo Female 53 None

67 81 Placebo Female 54 None

68 4 Placebo Female 54 None

69 78 Placebo Female 54 Marked

70 70 Placebo Female 55 Marked

71 49 Placebo Female 57 None

72 10 Placebo Female 57 Some

73 47 Placebo Female 58 Some

74 44 Placebo Female 59 Some

75 24 Placebo Female 59 Marked

76 48 Placebo Female 61 None

77 19 Placebo Female 63 Some

78 3 Placebo Female 64 None

79 67 Placebo Female 65 Marked

80 32 Placebo Female 66 None

81 42 Placebo Female 66 None

82 15 Placebo Female 66 Some

83 71 Placebo Female 68 Some

84 1 Placebo Female 74 Marked

> rm(list=ls())

> library(vcd)

> counts <- table(Arthritis$Improved)

> counts

None Some Marked

42 14 28

> par(mfrow=c(1,2))

> barplot(counts, main="Simple Bar Plot", xlab="Improvement", ylab="Frequency")

> barplot(counts, main="Horizontal Bar Plot", xlab="Frequency", ylab="Improvement", horiz=TRUE)

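The result is shown in Figure 1.

Figure 1: Simple vertical and horizontal bar plots.

Note: if the categorical variable is a factor, there is no need to tabulate it with table() first; plot() draws the bar plot directly from the factor. barplot() itself still requires a numeric vector or matrix, such as the output of table().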
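A minimal sketch of that note, assuming the built-in mtcars data as a stand-in for Arthritis so that it runs without the vcd package. It contrasts plot() on a factor with barplot() on the tabulated counts, and shows that handing the factor itself to barplot() fails.

```r
# Assumption: mtcars$cyl stands in for a categorical variable
# such as Arthritis$Improved.
cyl <- factor(mtcars$cyl)

# plot() on a factor tabulates it internally and draws a bar plot.
plot(cyl, main = "plot() on a factor",
     xlab = "Cylinders", ylab = "Frequency")

# barplot() needs a numeric vector or matrix, so tabulate first.
counts <- table(cyl)
barplot(counts, main = "barplot() on table()",
        xlab = "Cylinders", ylab = "Frequency")

# Passing the factor directly to barplot() raises an error.
res <- try(barplot(cyl), silent = TRUE)
failed <- inherits(res, "try-error")
```

Here failed is TRUE, which is why the tabulation step is needed whenever barplot() is used.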

Stacked and grouped bar plots

An example follows.

> rm(list=ls())

> library(vcd)

Loading required package: grid

> counts <- table(Arthritis$Improved, Arthritis$Treatment)

> counts

Placebo Treated

None 29 13

Some 7 7

Marked 7 21

> barplot(counts, main="Stacked Bar Plot",

+ xlab="Treatment", ylab="Frequency",

+ col=c("red", "yellow", "green"),

+ legend=rownames(counts))

> barplot(counts, main="Grouped Bar Plot",

+ xlab="Treatment", ylab="Frequency",

+ col=c("red", "yellow", "green"),

+ legend=rownames(counts), beside=TRUE)

The result is shown in Figure 2.

Figure 2: Stacked and grouped bar plots.
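The difference between the two layouts can be seen in what barplot() returns. The sketch below assumes a cylinders-by-gears table from the built-in mtcars data in place of the Improved-by-Treatment table, so it runs without vcd; the variable names are illustrative only.

```r
# Assumption: a 3 x 3 table from mtcars stands in for
# table(Arthritis$Improved, Arthritis$Treatment).
counts <- table(mtcars$cyl, mtcars$gear)

op <- par(mfrow = c(1, 2))

# Default (beside = FALSE): one bar per column, rows stacked inside it.
mid_stacked <- barplot(counts, main = "Stacked",
                       xlab = "Gears", ylab = "Frequency",
                       col = c("red", "yellow", "green"),
                       legend.text = rownames(counts))

# beside = TRUE: one bar per cell, placed side by side within each column.
mid_grouped <- barplot(counts, main = "Grouped",
                       xlab = "Gears", ylab = "Frequency",
                       col = c("red", "yellow", "green"),
                       legend.text = rownames(counts), beside = TRUE)

par(op)  # restore the previous layout
```

barplot() returns the bar midpoints: a vector with one entry per column for the stacked plot, and a matrix with one entry per cell for the grouped plot. These midpoints are handy when adding text or error bars on top of the bars.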

Mean bar plots: each bar represents the mean of a group.

An example follows.

> rm(list=ls())

> states <- data.frame(state.region,state.x77)

> means <- aggregate(states$Illiteracy, by=list(state.region),FUN=mean)

> means

Group.1 x

1 Northeast 1.000000

2 South 1.737500

3 North Central 0.700000

4 West 1.023077

> means <- means[order(means$x),]

> means

Group.1 x

3 North Central 0.700000

1 Northeast 1.000000

4 West 1.023077

2 South 1.737500

> barplot(means$x, names.arg=means$Group.1)

> title("Mean Illiteracy Rate")

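Extension: the barplot2() function in the gplots package draws enhanced bar plots, for example with confidence intervals superimposed on the bars. See http://addictedtor.free.fr/graphiques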
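As an alternative sketch of the same mean bar plot, tapply() returns a named vector, so the bar labels come along automatically and names.arg is not needed. This is a variation for illustration, not the approach used in the book.

```r
# Mean illiteracy rate per region, as a named numeric vector.
means <- tapply(state.x77[, "Illiteracy"], state.region, mean)

# Sort ascending so the bars run from lowest to highest mean.
means <- sort(means)

# barplot() uses the vector's names as bar labels.
barplot(means, main = "Mean Illiteracy Rate",
        ylab = "Illiteracy (percent of population)")
```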

Pie charts

Use the pie() function, in the following form.

pie(x, labels)

An example follows.

> rm(list=ls())

> slices <- c(10, 12, 4, 16, 8)

> lbls <- c("US", "UK", "Australia", "Germany", "France")

> pie(slices, labels = lbls, main="Simple Pie Chart")

The result is shown in Figure 3.

Figure 3: Pie chart.

Extension: the fan.plot() function in the plotrix package.
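Slice areas are hard to compare by eye, so it often helps to put the percentages in the labels. A minimal sketch using the same slices and labels as above; the names pct and lbls_pct are introduced here for illustration.

```r
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")

# Convert each slice to a whole-number percentage of the total.
pct <- round(slices / sum(slices) * 100)

# Append the percentage to each label, e.g. "US 20%".
lbls_pct <- paste0(lbls, " ", pct, "%")

pie(slices, labels = lbls_pct, col = rainbow(length(lbls_pct)),
    main = "Pie Chart with Percentages")
```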

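Histograms

A histogram shows the distribution of a continuous variable by dividing its range into bins and plotting the frequency of values in each bin. The form is as follows.

hist(x)

An example follows.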

> rm(list=ls())

> par(mfrow=c(1,2))

> hist(mtcars$mpg)

> hist(mtcars$mpg,

+ freq=FALSE,

+ breaks=12,

+ col="red",

+ xlab="Miles Per Gallon",

+ main="Histogram, rug plot, density curve")

> rug(jitter(mtcars$mpg))

> lines(density(mtcars$mpg), col="blue", lwd=2)

The result is shown in Figure 4.

Figure 4: Histograms.
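To see what the breaks and freq options actually do, it can help to inspect the object that hist() returns. The sketch below computes the histogram without drawing it, then reuses the computed breaks for the plot; the variable names are illustrative.

```r
# plot = FALSE returns the histogram object without drawing anything.
# breaks = 12 is only a suggestion: hist() picks "pretty" break points,
# so the actual number of bins may differ.
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)

n_bins <- length(h$counts)                # number of bins actually used
total  <- sum(h$counts)                   # every observation lands in one bin
area   <- sum(h$density * diff(h$breaks)) # total area on the density scale

# With freq = FALSE the bars show densities, so a density curve
# can be overlaid on the same scale.
hist(mtcars$mpg, breaks = h$breaks, freq = FALSE, col = "red",
     xlab = "Miles Per Gallon", main = "Histogram on the density scale")
lines(density(mtcars$mpg), col = "blue", lwd = 2)
```

On the density scale the bar areas sum to 1, which is what allows the kernel density curve to be drawn over the histogram.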

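Kernel density plots

A kernel density plot is an effective way to show the distribution of a continuous variable. The form is as follows.

plot(density(x))

An example follows.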

> rm(list=ls())

> d <- density(mtcars$mpg)

> plot(d)

> plot(d, main="Kernel Density of Miles Per Gallon")

> polygon(d, col="red", border="blue")

> rug(mtcars$mpg, col="brown")

The result is shown in Figure 5.

Figure 5: Kernel density plot.

Extension: the sm.density.compare() function in the sm package.
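Comparing densities across groups is also possible in base R. The sketch below is a rough stand-in for that kind of comparison, not the sm function itself: it estimates one density per cylinder count and overlays the curves. The grouping by mtcars$cyl and the colors are assumptions made for illustration.

```r
# One kernel density estimate per group (4, 6, and 8 cylinders).
groups <- split(mtcars$mpg, mtcars$cyl)
dens <- lapply(groups, density)

# Common axis limits so that every curve fits in the plot.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))

cols <- c("red", "blue", "darkgreen")

# Draw the first curve, then add the others on top.
plot(dens[[1]], xlim = xlim, ylim = ylim, col = cols[1], lwd = 2,
     main = "MPG Density by Number of Cylinders",
     xlab = "Miles Per Gallon")
for (i in 2:length(dens)) {
  lines(dens[[i]], col = cols[i], lwd = 2)
}
legend("topright", legend = paste(names(dens), "cylinders"),
       col = cols, lwd = 2)
```

Note that each call to density() chooses its own bandwidth here, so the curves are smoothed by different amounts.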

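Box plots

A box plot describes the distribution of a continuous variable through its five-number summary: the minimum, the lower quartile (25th percentile), the median, the upper quartile (75th percentile), and the maximum. By default the whiskers in R stop at the most extreme points within 1.5 times the interquartile range of the box, and any points beyond that are drawn individually as possible outliers. Use the boxplot() function.

An example follows.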

> rm(list=ls())

> boxplot(mtcars$mpg, main="Box plot", ylab="Miles per Gallon")

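The resulting plot, annotated to explain its parts, is shown in Figure 6.

Figure 6: Box plot.

Extension: the vioplot() function in the vioplot package.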
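The numbers behind a box plot can be inspected directly, and the formula interface draws one box per group. A minimal sketch, assuming a comparison of mileage across cylinder counts in mtcars; the labels attached to five are added here for readability.

```r
# Five-number summary of mpg (Tukey's hinges stand in for the quartiles).
five <- fivenum(mtcars$mpg)
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")

# Formula interface: one box per level of cyl.
bp <- boxplot(mpg ~ cyl, data = mtcars,
              main = "Car Mileage Data",
              xlab = "Number of Cylinders",
              ylab = "Miles Per Gallon")

# bp$stats holds one column per group: lower whisker, lower hinge,
# median, upper hinge, upper whisker.
group_medians <- bp$stats[3, ]
```

boxplot() returns its statistics invisibly, so assigning the result is a convenient way to check what was drawn.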

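Dot plots

A dot plot provides a way to plot many labeled values on a simple horizontal scale. The form is as follows.

dotchart(x, labels=)

An example follows.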

> dotchart(mtcars$mpg, labels=row.names(mtcars),

+ cex=.7,

+ main="Gas Mileage for Car Models",

+ xlab="Miles Per Gallon")

The result is shown in Figure 7.

Figure 7: Dot plot.
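Dot plots usually become easier to read once the values are sorted and grouped. The sketch below sorts the cars by mileage and groups them by number of cylinders; the color assignment is an arbitrary choice for illustration.

```r
# Sort the cars by mileage, lowest first.
x <- mtcars[order(mtcars$mpg), ]

# Group by number of cylinders and give each group its own color.
x$cyl <- factor(x$cyl)
x$color <- c("red", "blue", "darkgreen")[as.integer(x$cyl)]

dotchart(x$mpg, labels = row.names(x), cex = 0.7,
         groups = x$cyl,      # one block of dots per cylinder count
         gcolor = "black",    # color of the group labels
         color = x$color,     # color of the dots and car labels
         pch = 19,
         main = "Gas Mileage for Car Models\ngrouped by cylinder",
         xlab = "Miles Per Gallon")
```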

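Summary

1 Data visualization techniques

2 Drawing several commonly used graph types in R (bar plots, pie charts, fan plots, histograms, kernel density plots, box plots, and dot plots)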

Resource

1 http://www.wangluqing.com/2014/06/r-in-action-note8/

2 R in Action, Part 2, Chapter 6

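Articles in this column are provided by the PPV课 R language reading club. When reposting, please credit the PPV课 R language reading club.

All rights reserved.

Originally published on the WeChat official account PPV课数据科学社区: [Study] Reading Notes on R in Action (Chapter 6)

Original article by ppvke. When reposting, please cite the source: http://www.ppvke.com/archives/29917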
