lung73               package:scaleboot               R Documentation

_C_l_u_s_t_e_r_i_n_g _o_f _7_3 _L_u_n_g _T_u_m_o_r_s

_D_e_s_c_r_i_p_t_i_o_n:

     Bootstrap analysis of hierarchical clustering for the DNA
     microarray data set of 73 lung tissue samples, each with 916
     observed genes.

_U_s_a_g_e:

     data(lung73)

     lung73.pvclust

     lung73.sb

_F_o_r_m_a_t:

     'lung73.pvclust' is an object of class '"pvclust"' defined in the
     'pvclust' package of Suzuki and Shimodaira (2006).

     'lung73.sb' is an object of class '"scalebootv"' of length 72.

_D_e_t_a_i_l_s:

     The microarray dataset of Garber et al. (2001) was reanalyzed in
     Suzuki and Shimodaira (2006), and is available as 'data(lung)' in
     the 'pvclust' package. We reanalyze it once more with the script
     shown in Examples. The result of 'pvclust' is stored in
     'lung73.pvclust', and the result of fitting models to the
     bootstrap probabilities with the 'scaleboot' package is stored in
     'lung73.sb'. The AU p-values obtained with the 'scaleboot' package
     are sometimes very different from those obtained with the
     'pvclust' package. For example, 'pvclust' with its default
     parameter values gave an AU p-value of 0.70 for Edge-67, but
     'sbfit' gives an AU p-value (named "k.3") of 0.95 for the same
     edge. Note that the raw bootstrap probability (i.e., the ordinary
     bootstrap probability with scale=1) is 0.04.
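
     The scale values used by the script in Examples can be inspected
     without running the bootstrap. The base R sketch below rebuilds
     the 'sa' vector and the relative sample sizes 'r=1/sa' passed to
     'pvclust', and compares their range with the 'pvclust' default
     'r=seq(.5,1.4,by=.1)'. The resample size 'round(r*n)' with n=916
     genes and the names 'r.default', 'sa.default' and 'n.boot' are
     assumptions made for illustration; the exact rounding used inside
     'pvclust' may differ.

```r
# Scales used in the Examples script: 13 values, log-spaced from 1/9 to 9
sa <- 9^seq(-1, 1, length.out = 13)
r  <- 1 / sa                      # relative sample sizes passed to pvclust

# Default relative sample sizes of pvclust, for comparison
r.default  <- seq(0.5, 1.4, by = 0.1)
sa.default <- 1 / r.default       # corresponding scales

n      <- 916                     # number of genes in the lung data
n.boot <- round(r * n)            # assumed resample sizes (illustration only)

print(range(sa))                  # range of scales in the script
print(range(sa.default))          # range of scales implied by the default r
print(data.frame(sa = signif(sa, 3), r = signif(r, 3), n.boot = n.boot))
```

     The middle element of 'sa' equals 1, which corresponds to the
     ordinary bootstrap with the original sample size.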

     The AU p-values are shown by the 'summary' method. For example,
     for edges 60 to 70:


     > summary(lung73.sb[60:70])

     Corrected P-values (percent):
        raw          k.1          k.2          k.3          model  aic
     60 20.21 (0.40) 20.29 (0.18) 71.40 (0.20) 78.98 (0.44) sing.3  80.46
     61 58.45 (0.49) 55.08 (0.17) 63.15 (0.24) 56.34 (0.38) poly.3 575.85
     62 95.68 (0.20) 95.92 (0.10) 98.64 (0.10) 98.61 (0.12) poly.3 -12.01
     63 58.31 (0.49) 57.30 (0.17) 82.09 (0.20) 81.74 (0.28) poly.3  20.74
     64 15.81 (0.36) 15.58 (0.16) 75.36 (0.21) 84.86 (0.37) sing.3  71.47
     65  2.96 (0.17)  2.80 (0.07) 76.73 (0.51) 94.88 (0.20) sing.3  33.34
     66 15.75 (0.36) 15.92 (0.16) 78.02 (0.20) 87.98 (0.29) sing.3   7.30
     67  3.63 (0.19)  3.31 (0.07) 77.02 (0.47) 95.10 (0.17) sing.3  25.11
     68 26.20 (0.44) 27.06 (0.17) 83.06 (0.18) 84.90 (0.27) poly.3   8.67
     69 29.49 (0.46) 29.65 (0.17) 75.37 (0.22) 75.83 (0.34) poly.3 -14.09
     70 28.31 (0.45) 29.04 (0.19) 76.62 (0.17) 81.54 (0.37) sing.3   0.99


     Shown above are four types of p-values, together with the selected
     model and its AIC value. "raw" is the ordinary bootstrap
     probability, "k.1" is equivalent to "raw" but calculated from the
     multiscale bootstrap, "k.2" is equivalent to the third-order AU
     p-value of CONSEL, and "k.3" is an improved version of the AU
     p-value. By default, "k.3" is used when copying the p-values back
     to an object of class '"pvclust"'.
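
     To make the difference between the p-value types concrete, the
     base R sketch below copies the "raw", "k.2" and "k.3" percentages
     for edges 60 to 70 from the summary output and compares them. The
     95 percent threshold and the names 'pv', 'sig.k2', 'sig.k3' and
     'max.gap.edge' are assumptions chosen for illustration, not part
     of the 'scaleboot' package.

```r
# Percentages copied from the summary output for edges 60 to 70
pv <- data.frame(
  edge = 60:70,
  raw  = c(20.21, 58.45, 95.68, 58.31, 15.81,  2.96,
           15.75,  3.63, 26.20, 29.49, 28.31),
  k.2  = c(71.40, 63.15, 98.64, 82.09, 75.36, 76.73,
           78.02, 77.02, 83.06, 75.37, 76.62),
  k.3  = c(78.98, 56.34, 98.61, 81.74, 84.86, 94.88,
           87.98, 95.10, 84.90, 75.83, 81.54)
)

# Edges whose AU p-value reaches 95 percent under each definition
# (the threshold is an illustrative choice)
sig.k2 <- pv$edge[pv$k.2 >= 95]
sig.k3 <- pv$edge[pv$k.3 >= 95]

# Edge with the largest gap between k.3 and the raw bootstrap probability
gap          <- pv$k.3 - pv$raw
max.gap.edge <- pv$edge[which.max(gap)]

print(sig.k2)        # 62
print(sig.k3)        # 62 67
print(max.gap.edge)  # 65
```

     In this subset, edge 67 passes the threshold only with "k.3",
     which matches the discrepancy described for Edge-67 above.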

     See Examples below for details.
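
     The values in parentheses in the summary output are presumably
     standard errors. As a rough check, the base R sketch below
     compares those printed for the "raw" column with the binomial
     standard error sqrt(p(1-p)/B), taking B=10000 from 'nboot' in the
     Examples script. This is an observation about the printed
     numbers, not a statement of how 'scaleboot' computes them; the
     names 'se.reported' and 'se.binom' are illustrative.

```r
B <- 10000   # number of bootstrap replicates (nboot in the Examples script)

# "raw" bootstrap probabilities for edges 60 to 70, as proportions
raw <- c(20.21, 58.45, 95.68, 58.31, 15.81, 2.96,
         15.75, 3.63, 26.20, 29.49, 28.31) / 100

# Values printed in parentheses next to "raw" (percent)
se.reported <- c(0.40, 0.49, 0.20, 0.49, 0.36, 0.17,
                 0.36, 0.19, 0.44, 0.46, 0.45)

# Binomial standard error of a proportion estimated from B replicates (percent)
se.binom <- 100 * sqrt(raw * (1 - raw) / B)

print(cbind(edge = 60:70, reported = se.reported,
            binomial = round(se.binom, 2)))
```

     For these eleven edges the two columns agree to two decimal
     places.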

_N_o_t_e:

     The microarray dataset is not included in 'data(lung73)', but it
     is found in 'data(lung)' of the 'pvclust' package.

_S_o_u_r_c_e:

     Garber, M. E. et al. (2001) Diversity of gene expression in
     adenocarcinoma of the lung, _Proceedings of the National Academy
     of Sciences_, 98, 13784-13789 (dataset is available from <URL:
     http://genome-www.stanford.edu/lung_cancer/adeno/>).

_R_e_f_e_r_e_n_c_e_s:

     Suzuki, R. and Shimodaira, H. (2006). pvclust: An R package for
     hierarchical clustering with p-values, _Bioinformatics_, 22,
     1540-1542 (software is available from CRAN or <URL:
     http://www.is.titech.ac.jp/~shimo/prog/pvclust/>).

_S_e_e _A_l_s_o:

     'sbpvclust', 'sbfit.pvclust'

_E_x_a_m_p_l_e_s:

     ## Not run: 
     ## script to create lung73.pvclust and lung73.sb
     ## multiscale bootstrap resampling of hierarchical clustering
     library(pvclust)
     library(scaleboot)
     data(lung)
     sa <- 9^seq(-1,1,length.out=13) # wider range of scales than pvclust default
     lung73.pvclust <- pvclust(lung,r=1/sa,nboot=10000) 
     lung73.sb <- sbfit(lung73.pvclust) # model fitting
     ## End(Not run)

     ## Not run: 
     ## Parallel version of the above script
     ## parPvclust took 80 minutes using 40 CPUs
     library(snow)
     library(pvclust)
     library(scaleboot)
     data(lung)
     cl <- makeCluster(40) # launch 40 worker processes, one per CPU
     sa <- 9^seq(-1,1,length.out=13) # wider range of scales than pvclust default
     lung73.pvclust <- parPvclust(cl,lung,r=1/sa,nboot=10000) 
     lung73.sb <- sbfit(lung73.pvclust,cluster=cl) # model fitting
     ## End(Not run)

     ## replace au/bp entries in pvclust object
     data(lung73)
     lung73.new <- sbpvclust(lung73.pvclust,lung73.sb) # au <- k.3

     ## Not run: 
     library(pvclust)
     plot(lung73.new) # draw dendrogram with the new au/bp values
     pvrect(lung73.new)
     ## End(Not run)

     ## diagnose edges 61,...,69
     lung73.sb[61:69] # print fitting details
     plot(lung73.sb[61:69]) # plot curve fitting
     summary(lung73.sb[61:69]) # print au p-values
     ## diagnose edge 67
     lung73.sb[[67]] # print fitting
     plot(lung73.sb[[67]],legend="topleft") # plot curve fitting
     summary(lung73.sb[[67]]) # print au p-values

