r - Nonparametric bootstrapping on the highest level of clustered data using the boot() function from {boot} in R


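I have two-level hierarchical data and I am trying to perform nonparametric bootstrap sampling at the highest level: randomly sampling the highest-level clusters with replacement while keeping the original within-cluster data intact.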

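I want to do this with the boot() function from the {boot} package, because I then want to build confidence intervals with boot.ci(), which requires a boot object.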

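Here is my unsuccessful attempt. Running the debugger on the boot call showed that the random sampling does not happen at the cluster level (= subject); boot draws indices over all 15 rows instead.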

### create a very simple two-level dataset with 'subject' as clustering variable
rho <- 0.4
dat <- expand.grid(
 trial=factor(1:5),
 subject=factor(1:3)
 )
sig <- rho * tcrossprod(model.matrix(~ 0 + subject, dat))
diag(sig) <- 1
set.seed(17); dat$value <- chol(sig) %*% rnorm(15, 0, 1)
### my statistic function (adapted from here: http://biostat.mc.vanderbilt.edu/wiki/Main/HowToBootstrapCorrelatedData)
resamp.mean <- function(data, i){
 cluster <- c('subject', 'trial')
 # sample the clustering factor
 cls <- unique(data[[cluster[1]]])[i] 
 # subset on the sampled clustering factors
 sub <- lapply(cls, function(b) subset(data, data[[cluster[1]]]==b)) 
 sub.2 <- do.call(rbind, sub) # join and return samples
 mean((sub.2$value)) # calculate the statistic
}
debugonce(boot)
set.seed(17); dat.boot <- boot(data = dat, statistic = resamp.mean, 4)
### stepping through the debugger until object 'i' was assigned
### investigating 'i'
# Browse[2]> head(i)
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 3 7 12 13 10 14 14 15 12 12 12 4 5 9 10
[2,] 15 9 3 13 4 10 2 4 6 11 10 4 9 4 3
[3,] 8 4 7 15 10 12 9 8 9 12 4 15 14 10 4
[4,] 12 3 1 15 8 13 9 1 4 13 9 13 2 11 2
### which is not what I was hoping for.
### I would like something that looks like this, supposing indices = c(2, 2, 1) for the first resample: 
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 6 7 8 9 10 6 7 8 9 10 1 2 3 4 5
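
To make the target concrete, here is a minimal sketch of how a vector of sampled subject indices could be expanded into row indices. It assumes the same layout as `dat` above (5 trials nested in 3 subjects); the names `idx`, `subj` and `rows` are only illustrative.

```r
# Same design as in the question: trial varies fastest within subject
dat <- expand.grid(trial = factor(1:5), subject = factor(1:3))

# Assumed cluster-level resample: subject 2 twice, then subject 1
idx <- c(2, 2, 1)

# Labels of the sampled subjects, as character to avoid factor-level issues
subj <- unique(as.character(dat$subject))[idx]

# Collect all rows of each sampled subject, repeating rows for repeated subjects
rows <- unlist(lapply(subj, function(b) which(dat$subject == b)))
rows
# 6 7 8 9 10 6 7 8 9 10 1 2 3 4 5
```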

Any help would be greatly appreciated.


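I think the problem stems from the modified statistic function, specifically the cls object inside it. Can you try this version? Uncomment the print statement to see which subjects have been sampled. It does not use the indices argument that boot expects; instead it calls sample(), as in the original source of the function.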

resamp.mean <- function(dat, 
 indices, 
 cluster = c('subject', 'trial'), 
 replace = TRUE){
 # boot expects an indices argument but the sampling happens
 # via sample() as in the original source of the function
 # sample the clustering factor
 cls <- sample(unique(dat[[cluster[1]]]), replace=replace)
 # subset on the sampled clustering factors
 sub <- lapply(cls, function(b) subset(dat, dat[[cluster[1]]]==b))
 # join and return samples
 sub <- do.call(rbind, sub)
 # UNCOMMENT HERE TO SEE SAMPLED SUBJECTS 
 # print(sub)
 mean(sub$value)
} 

One resample produced inside resamp.mean, before the mean of value is computed, looks like this:

    trial subject      value
1       1       1 -1.1581291
2       2       1 -0.1458287
3       3       1 -0.2134525
4       4       1 -0.5796521
5       5       1  0.6501587
11      1       3  2.6678441
12      2       3  1.3945740
13      3       3  1.4849435
14      4       3  0.4086737
15      5       3  1.3399146
111     1       1 -1.1581291
121     2       1 -0.1458287
131     3       1 -0.2134525
141     4       1 -0.5796521
151     5       1  0.6501587
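
Because the function above draws its own sample and ignores the indices that boot supplies, the recorded resamples are not tied to boot's index bookkeeping. As a hedged alternative (not part of the answer above), one could hand boot() the vector of unique subject labels as `data`, so that its indices refer to clusters, and forward the full data frame through boot()'s `...` argument. The helper name `cluster.mean` and its `full` argument are hypothetical, and the percentile interval is just one choice of `type`.

```r
library(boot)

# Rebuild the example data from the question
rho <- 0.4
dat <- expand.grid(trial = factor(1:5), subject = factor(1:3))
sig <- rho * tcrossprod(model.matrix(~ 0 + subject, dat))
diag(sig) <- 1
set.seed(17); dat$value <- chol(sig) %*% rnorm(15, 0, 1)

# Hypothetical statistic: 'ids' holds the unique cluster labels that boot()
# resamples via 'i'; 'full' is the complete data set passed through '...'
cluster.mean <- function(ids, i, full, cluster = "subject") {
  # rows of every sampled cluster, repeated when a cluster is drawn twice
  rows <- unlist(lapply(ids[i], function(b) which(full[[cluster]] == b)))
  mean(full$value[rows])
}

# Resample the 3 subject labels, not the 15 rows
ids <- as.character(unique(dat$subject))
set.seed(17)
dat.boot <- boot(data = ids, statistic = cluster.mean, R = 200, full = dat)

# The result is an ordinary boot object, so boot.ci() accepts it
boot.ci(dat.boot, type = "perc")
```

With only three subjects there are very few distinct resamples, so any interval from this toy example is illustrative only.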
...