
R: Randomly Splitting a Dataset into Equal Parts for K-Fold Cross-Validation

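Today, while reading Professor Wu Xizhi's book 《复杂数据统计方法》, I came across the problem of splitting a dataset into subsets by some factor and then randomly dividing each subset into n equal parts. Professor Wu's approach is easy enough to follow, but I still found it somewhat tedious, so I wrote a function of my own. From now on, whenever this problem comes up, I only need to call the function.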

Here we use the iris dataset that ships with R:

> str(iris)
'data.frame':	150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

  

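The structure of the iris dataset is shown above. Species is a factor with three levels, so the data can be split into three subsets by Species. For five-fold cross-validation, each of these subsets has to be divided into five equal parts. The R code is as follows: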

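fiveDivide<-function(col,data,n=5)
{
  #col: name (string) of a factor column; each group of the data frame
  #     defined by this factor is divided into n partitions
  #data: a data.frame
  #n: number of partitions per group, default 5
  #returns a list with one element per factor level; each element is
  #a list of n data.frames
  #sample(x) generates a random permutation of 1..x, which is then
  #cut into n partitions
  group_num=length(levels(data[,col]))  #number of factor levels
  lst1=list() #the original data split by factor level into group_num subsets
  lst2=list() #the n equal-sized data frames of the current group
  lst3=list() #result: one list of n data frames per group
  for(i in 1:group_num)
  {
    lst1[[i]]=data[data[,col]==levels(data[,col])[i],]  #split the original data set by factor level into group_num subsets
  }
  for(k in 1:group_num)  #randomly split each subset into n equal parts; this is where sample() is needed
  {
    od=sample(nrow(lst1[[k]]))
    newdata=lst1[[k]][od,]
    #NOTE: the original listing breaks off after the line above; the rest
    #is a reconstruction consistent with the output shown below. It assumes
    #the group size is divisible by n (leftover rows would be dropped).
    size=nrow(newdata)%/%n
    for(j in 1:n)
    {
      lst2[[j]]=newdata[((j-1)*size+1):(j*size),]
    }
    lst3[[k]]=lst2
  }
  return(lst3)
}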
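
For comparison, the same stratified split can probably be written more compactly with base R's split() and sample(). The sketch below is an alternative, not the function from this post: the name stratifiedParts is hypothetical, and instead of slicing a shuffled data frame it shuffles a vector of part labels. It assumes each group has at least n rows; when a group size is not divisible by n, the parts differ in size by at most one row.

```r
# Hypothetical helper (not from the original post): split each level of
# a factor column into n random parts of near-equal size.
stratifiedParts <- function(col, data, n = 5) {
  groups <- split(data, data[[col]])        # one data frame per factor level
  lapply(groups, function(g) {
    # build part labels 1..n as evenly as possible, then shuffle them
    part <- sample(rep(seq_len(n), length.out = nrow(g)))
    unname(split(g, part))                  # list of n data frames
  })
}

set.seed(1)                                 # make the random split reproducible
parts <- stratifiedParts("Species", iris, 5)
sapply(parts, function(p) sapply(p, nrow))  # rows per part, one column per species
```

For iris every entry of the resulting matrix is 10. Calling set.seed() first is optional; without it the split changes on every run.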

  Applying it to iris:

> rep=fiveDivide("Species",iris,5)
> str(rep)
List of 3
 $ :List of 5
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.8 5.2 4.8 4.7 5.5 5.1 4.8 4.4 4.8 4.9
  .. ..$ Sepal.Width : num [1:10] 3 3.5 3.4 3.2 3.5 3.7 3.1 3 3.4 3
  .. ..$ Petal.Length: num [1:10] 1.4 1.5 1.6 1.6 1.3 1.5 1.6 1.3 1.9 1.4
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.2 0.2 0.2 0.4 0.2 0.2 0.2 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5 4.7 4.8 5.2 5.1 5.1 4.9 5.4 5 5.5
  .. ..$ Sepal.Width : num [1:10] 3.5 3.2 3 3.4 3.5 3.8 3.1 3.4 3.5 4.2
  .. ..$ Petal.Length: num [1:10] 1.3 1.3 1.4 1.4 1.4 1.5 1.5 1.7 1.6 1.4
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.1 0.2 0.2 0.3 0.1 0.2 0.6 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.4 4.3 4.9 5.4 4.4 4.6 5.1 5 5.1 5.1
  .. ..$ Sepal.Width : num [1:10] 3.9 3 3.6 3.9 3.2 3.6 3.4 3.4 3.8 3.8
  .. ..$ Petal.Length: num [1:10] 1.3 1.1 1.4 1.7 1.3 1 1.5 1.6 1.9 1.6
  .. ..$ Petal.Width : num [1:10] 0.4 0.1 0.1 0.4 0.2 0.2 0.2 0.4 0.4 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.4 4.5 5.3 5 5 5.1 5.4 5.2 5.1 5.4
  .. ..$ Sepal.Width : num [1:10] 2.9 2.3 3.7 3.3 3.4 3.3 3.7 4.1 3.5 3.4
  .. ..$ Petal.Length: num [1:10] 1.4 1.3 1.5 1.4 1.5 1.7 1.5 1.5 1.4 1.5
  .. ..$ Petal.Width : num [1:10] 0.2 0.3 0.2 0.2 0.2 0.5 0.2 0.1 0.3 0.4
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.6 5.8 5 5 5 4.6 5.7 4.9 5.7 4.6
  .. ..$ Sepal.Width : num [1:10] 3.4 4 3.6 3.2 3 3.2 4.4 3.1 3.8 3.1
  .. ..$ Petal.Length: num [1:10] 1.4 1.2 1.4 1.2 1.6 1.4 1.5 1.5 1.7 1.5
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.2 0.2 0.2 0.2 0.4 0.2 0.3 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
 $ :List of 5
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.2 6 5.8 6.3 5.5 5.8 5.8 6.1 6.2 5.6
  .. ..$ Sepal.Width : num [1:10] 2.9 3.4 2.7 3.3 2.6 2.6 2.7 3 2.2 3
  .. ..$ Petal.Length: num [1:10] 4.3 4.5 3.9 4.7 4.4 4 4.1 4.6 4.5 4.1
  .. ..$ Petal.Width : num [1:10] 1.3 1.6 1.2 1.6 1.2 1.2 1 1.4 1.5 1.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.4 5.6 5.7 6.6 6 6.4 5.9 6.9 6.7 5.5
  .. ..$ Sepal.Width : num [1:10] 3.2 2.5 2.8 3 2.2 2.9 3 3.1 3.1 2.5
  .. ..$ Petal.Length: num [1:10] 4.5 3.9 4.5 4.4 4 4.3 4.2 4.9 4.4 4
  .. ..$ Petal.Width : num [1:10] 1.5 1.1 1.3 1.4 1 1.3 1.5 1.5 1.4 1.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.5 5.2 6.8 6 5.7 5 6.3 5.7 5.5 5.6
  .. ..$ Sepal.Width : num [1:10] 2.8 2.7 2.8 2.9 2.9 2.3 2.5 2.8 2.3 3
  .. ..$ Petal.Length: num [1:10] 4.6 3.9 4.8 4.5 4.2 3.3 4.9 4.1 4 4.5
  .. ..$ Petal.Width : num [1:10] 1.5 1.4 1.4 1.5 1.3 1 1.5 1.3 1.3 1.5
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.6 6.7 5 6.7 5.9 6.1 5.7 5.4 6 5.1
  .. ..$ Sepal.Width : num [1:10] 2.9 3 2 3.1 3.2 2.8 2.6 3 2.7 2.5
  .. ..$ Petal.Length: num [1:10] 4.6 5 3.5 4.7 4.8 4 3.5 4.5 5.1 3
  .. ..$ Petal.Width : num [1:10] 1.3 1.7 1 1.5 1.8 1.3 1 1.5 1.6 1.1
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.6 6.1 6.3 7 4.9 5.7 5.5 5.5 6.1 5.6
  .. ..$ Sepal.Width : num [1:10] 2.7 2.9 2.3 3.2 2.4 3 2.4 2.4 2.8 2.9
  .. ..$ Petal.Length: num [1:10] 4.2 4.7 4.4 4.7 3.3 4.2 3.8 3.7 4.7 3.6
  .. ..$ Petal.Width : num [1:10] 1.3 1.4 1.3 1.4 1 1.2 1.1 1 1.2 1.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
 $ :List of 5
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.9 6.7 6.1 6.4 6.4 6.7 5.7 6.5 6.4 6.3
  .. ..$ Sepal.Width : num [1:10] 3.2 2.5 2.6 2.8 3.1 3.3 2.5 3 2.7 2.9
  .. ..$ Petal.Length: num [1:10] 5.7 5.8 5.6 5.6 5.5 5.7 5 5.5 5.3 5.6
  .. ..$ Petal.Width : num [1:10] 2.3 1.8 1.4 2.1 1.8 2.1 2 1.8 1.9 1.8
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.8 7.7 6.5 6.4 7.4 6.3 6.8 6 6.7 6.8
  .. ..$ Sepal.Width : num [1:10] 2.8 2.8 3.2 3.2 2.8 3.3 3 2.2 3.3 3.2
  .. ..$ Petal.Length: num [1:10] 5.1 6.7 5.1 5.3 6.1 6 5.5 5 5.7 5.9
  .. ..$ Petal.Width : num [1:10] 2.4 2 2 2.3 1.9 2.5 2.1 1.5 2.5 2.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.8 6.2 6 6.1 7.7 5.6 6.3 7.3 7.2 6.9
  .. ..$ Sepal.Width : num [1:10] 2.7 2.8 3 3 2.6 2.8 2.8 2.9 3 3.1
  .. ..$ Petal.Length: num [1:10] 5.1 4.8 4.8 4.9 6.9 4.9 5.1 6.3 5.8 5.4
  .. ..$ Petal.Width : num [1:10] 1.9 1.8 1.8 1.8 2.3 2 1.5 1.8 1.6 2.1
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.7 7.2 7.2 6.3 6.3 6.5 6.3 7.7 7.9 6.5
  .. ..$ Sepal.Width : num [1:10] 3 3.2 3.6 2.7 2.5 3 3.4 3.8 3.8 3
  .. ..$ Petal.Length: num [1:10] 5.2 6 6.1 4.9 5 5.8 5.6 6.7 6.4 5.2
  .. ..$ Petal.Width : num [1:10] 2.3 1.8 2.5 1.8 1.9 2.2 2.4 2.2 2 2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':	10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 7.7 6.4 6.2 6.9 6.7 7.1 5.8 4.9 5.9 7.6
  .. ..$ Sepal.Width : num [1:10] 3 2.8 3.4 3.1 3.1 3 2.7 2.5 3 3
  .. ..$ Petal.Length: num [1:10] 6.1 5.6 5.4 5.1 5.6 5.9 5.1 4.5 5.1 6.6
  .. ..$ Petal.Width : num [1:10] 2.3 2.2 2.3 2.3 2.4 2.1 1.9 1.7 1.8 2.1
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  


  

  After the split, the data look like this:

> rep
[[1]]
[[1]][[1]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
46          4.8         3.0          1.4         0.3  setosa
28          5.2         3.5          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa
30          4.7         3.2          1.6         0.2  setosa
37          5.5         3.5          1.3         0.2  setosa
22          5.1         3.7          1.5         0.4  setosa
31          4.8         3.1          1.6         0.2  setosa
39          4.4         3.0          1.3         0.2  setosa
25          4.8         3.4          1.9         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa

[[1]][[2]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
41          5.0         3.5          1.3         0.3  setosa
3           4.7         3.2          1.3         0.2  setosa
13          4.8         3.0          1.4         0.1  setosa
29          5.2         3.4          1.4         0.2  setosa
1           5.1         3.5          1.4         0.2  setosa
20          5.1         3.8          1.5         0.3  setosa
10          4.9         3.1          1.5         0.1  setosa
21          5.4         3.4          1.7         0.2  setosa
44          5.0         3.5          1.6         0.6  setosa
34          5.5         4.2          1.4         0.2  setosa

[[1]][[3]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
17          5.4         3.9          1.3         0.4  setosa
14          4.3         3.0          1.1         0.1  setosa
38          4.9         3.6          1.4         0.1  setosa
6           5.4         3.9          1.7         0.4  setosa
43          4.4         3.2          1.3         0.2  setosa
23          4.6         3.6          1.0         0.2  setosa
40          5.1         3.4          1.5         0.2  setosa
27          5.0         3.4          1.6         0.4  setosa
45          5.1         3.8          1.9         0.4  setosa
47          5.1         3.8          1.6         0.2  setosa

[[1]][[4]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
9           4.4         2.9          1.4         0.2  setosa
42          4.5         2.3          1.3         0.3  setosa
49          5.3         3.7          1.5         0.2  setosa
50          5.0         3.3          1.4         0.2  setosa
8           5.0         3.4          1.5         0.2  setosa
24          5.1         3.3          1.7         0.5  setosa
11          5.4         3.7          1.5         0.2  setosa
33          5.2         4.1          1.5         0.1  setosa
18          5.1         3.5          1.4         0.3  setosa
32          5.4         3.4          1.5         0.4  setosa

[[1]][[5]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
7           4.6         3.4          1.4         0.3  setosa
15          5.8         4.0          1.2         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
36          5.0         3.2          1.2         0.2  setosa
26          5.0         3.0          1.6         0.2  setosa
48          4.6         3.2          1.4         0.2  setosa
16          5.7         4.4          1.5         0.4  setosa
35          4.9         3.1          1.5         0.2  setosa
19          5.7         3.8          1.7         0.3  setosa
4           4.6         3.1          1.5         0.2  setosa


[[2]]
[[2]][[1]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
98          6.2         2.9          4.3         1.3 versicolor
86          6.0         3.4          4.5         1.6 versicolor
83          5.8         2.7          3.9         1.2 versicolor
57          6.3         3.3          4.7         1.6 versicolor
91          5.5         2.6          4.4         1.2 versicolor
93          5.8         2.6          4.0         1.2 versicolor
68          5.8         2.7          4.1         1.0 versicolor
92          6.1         3.0          4.6         1.4 versicolor
69          6.2         2.2          4.5         1.5 versicolor
89          5.6         3.0          4.1         1.3 versicolor

[[2]][[2]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
52          6.4         3.2          4.5         1.5 versicolor
70          5.6         2.5          3.9         1.1 versicolor
56          5.7         2.8          4.5         1.3 versicolor
76          6.6         3.0          4.4         1.4 versicolor
63          6.0         2.2          4.0         1.0 versicolor
75          6.4         2.9          4.3         1.3 versicolor
62          5.9         3.0          4.2         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
66          6.7         3.1          4.4         1.4 versicolor
90          5.5         2.5          4.0         1.3 versicolor

[[2]][[3]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
55           6.5         2.8          4.6         1.5 versicolor
60           5.2         2.7          3.9         1.4 versicolor
77           6.8         2.8          4.8         1.4 versicolor
79           6.0         2.9          4.5         1.5 versicolor
97           5.7         2.9          4.2         1.3 versicolor
94           5.0         2.3          3.3         1.0 versicolor
73           6.3         2.5          4.9         1.5 versicolor
100          5.7         2.8          4.1         1.3 versicolor
54           5.5         2.3          4.0         1.3 versicolor
67           5.6         3.0          4.5         1.5 versicolor

[[2]][[4]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
59          6.6         2.9          4.6         1.3 versicolor
78          6.7         3.0          5.0         1.7 versicolor
61          5.0         2.0          3.5         1.0 versicolor
87          6.7         3.1          4.7         1.5 versicolor
71          5.9         3.2          4.8         1.8 versicolor
72          6.1         2.8          4.0         1.3 versicolor
80          5.7         2.6          3.5         1.0 versicolor
85          5.4         3.0          4.5         1.5 versicolor
84          6.0         2.7          5.1         1.6 versicolor
99          5.1         2.5          3.0         1.1 versicolor

[[2]][[5]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
95          5.6         2.7          4.2         1.3 versicolor
64          6.1         2.9          4.7         1.4 versicolor
88          6.3         2.3          4.4         1.3 versicolor
51          7.0         3.2          4.7         1.4 versicolor
58          4.9         2.4          3.3         1.0 versicolor
96          5.7         3.0          4.2         1.2 versicolor
81          5.5         2.4          3.8         1.1 versicolor
82          5.5         2.4          3.7         1.0 versicolor
74          6.1         2.8          4.7         1.2 versicolor
65          5.6         2.9          3.6         1.3 versicolor


[[3]]
[[3]][[1]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
121          6.9         3.2          5.7         2.3 virginica
109          6.7         2.5          5.8         1.8 virginica
135          6.1         2.6          5.6         1.4 virginica
129          6.4         2.8          5.6         2.1 virginica
138          6.4         3.1          5.5         1.8 virginica
125          6.7         3.3          5.7         2.1 virginica
114          5.7         2.5          5.0         2.0 virginica
117          6.5         3.0          5.5         1.8 virginica
112          6.4         2.7          5.3         1.9 virginica
104          6.3         2.9          5.6         1.8 virginica

[[3]][[2]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
115          5.8         2.8          5.1         2.4 virginica
123          7.7         2.8          6.7         2.0 virginica
111          6.5         3.2          5.1         2.0 virginica
116          6.4         3.2          5.3         2.3 virginica
131          7.4         2.8          6.1         1.9 virginica
101          6.3         3.3          6.0         2.5 virginica
113          6.8         3.0          5.5         2.1 virginica
120          6.0         2.2          5.0         1.5 virginica
145          6.7         3.3          5.7         2.5 virginica
144          6.8         3.2          5.9         2.3 virginica

[[3]][[3]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
143          5.8         2.7          5.1         1.9 virginica
127          6.2         2.8          4.8         1.8 virginica
139          6.0         3.0          4.8         1.8 virginica
128          6.1         3.0          4.9         1.8 virginica
119          7.7         2.6          6.9         2.3 virginica
122          5.6         2.8          4.9         2.0 virginica
134          6.3         2.8          5.1         1.5 virginica
108          7.3         2.9          6.3         1.8 virginica
130          7.2         3.0          5.8         1.6 virginica
140          6.9         3.1          5.4         2.1 virginica

[[3]][[4]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
146          6.7         3.0          5.2         2.3 virginica
126          7.2         3.2          6.0         1.8 virginica
110          7.2         3.6          6.1         2.5 virginica
124          6.3         2.7          4.9         1.8 virginica
147          6.3         2.5          5.0         1.9 virginica
105          6.5         3.0          5.8         2.2 virginica
137          6.3         3.4          5.6         2.4 virginica
118          7.7         3.8          6.7         2.2 virginica
132          7.9         3.8          6.4         2.0 virginica
148          6.5         3.0          5.2         2.0 virginica

[[3]][[5]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
136          7.7         3.0          6.1         2.3 virginica
133          6.4         2.8          5.6         2.2 virginica
149          6.2         3.4          5.4         2.3 virginica
142          6.9         3.1          5.1         2.3 virginica
141          6.7         3.1          5.6         2.4 virginica
103          7.1         3.0          5.9         2.1 virginica
102          5.8         2.7          5.1         1.9 virginica
107          4.9         2.5          4.5         1.7 virginica
150          5.9         3.0          5.1         1.8 virginica
106          7.6         3.0          6.6         2.1 virginica
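
One way such a nested list could then be used for five-fold cross-validation: for fold i, the i-th part of every species forms the test set and the remaining parts form the training set. The sketch below is only an illustration under that assumption. So that it runs on its own, it builds a stand-in list named parts with the same shape as the result printed above (parts[[k]][[j]] is the j-th part of the k-th species), and the helper foldData is a hypothetical name, not something from the book or this post.

```r
# Stand-in for the nested list printed above: parts[[k]][[j]] is the
# j-th random part of the k-th species (built here so the sketch runs alone).
set.seed(42)
parts <- lapply(split(iris, iris$Species), function(g) {
  unname(split(g, sample(rep(1:5, length.out = nrow(g)))))
})

# Hypothetical helper: fold i takes the i-th part of every group as the
# test set and all remaining parts as the training set.
foldData <- function(parts, i) {
  test  <- do.call(rbind, unname(lapply(parts, function(p) p[[i]])))
  train <- do.call(rbind, unname(lapply(parts, function(p) do.call(rbind, p[-i]))))
  list(train = train, test = test)
}

for (i in 1:5) {
  fd <- foldData(parts, i)
  cat("fold", i, ": train", nrow(fd$train), "rows, test", nrow(fd$test), "rows\n")
}
```

With iris, every fold ends up with 120 training rows and 30 test rows, and each test set contains 10 rows of every species, so the class proportions stay the same in every fold.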

  
