Title: | LP Nonparametric High Dimensional K-Sample Comparison |
---|---|
Description: | LP nonparametric high-dimensional K-sample comparison method that includes (i) confirmatory test, (ii) exploratory analysis, and (iii) options to output a data-driven LP-transformed matrix for classification. The primary reference is Mukhopadhyay, S. and Wang, K. (2020, Biometrika); <arXiv:1810.01724>. |
Authors: | Subhadeep Mukhopadhyay, Kaijun Wang |
Maintainer: | Kaijun Wang |
License: | GPL-2 |
Version: | 2.1 |
Built: | 2025-02-14 04:11:56 UTC |
Source: | https://github.com/cran/LPKsample |
This package performs high-dimensional K-sample comparison using the graph-based LP nonparametric (GLP) method.
Mukhopadhyay, S. and Wang, K.
Maintainer: Kaijun Wang
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Mukhopadhyay, S. (2017+), "Unified Statistical Theory of Spectral Graph Analysis".
Mukhopadhyay, S. and Parzen, E. (2014), "LP Approach to Statistical Modeling", arXiv:1405.2601.
This function performs the GLP multivariate K-sample learning.
GLP(X,y,m.max=4,components=NULL,alpha=0.05,c.poly=0.5,clust.alg='kmeans',perm=0, combine.criterion='pvalue',multiple.comparison=TRUE, compress.algorithm=FALSE,nbasis=8, return.LPT=FALSE,return.clust=FALSE)
X |
An n-by-d data matrix (n observations in rows, d variables in columns). |
y |
A length-n vector of group labels. |
m.max |
An integer, maximum order of LP component to investigate, default: 4. |
components |
A vector specifying which components to test. If provided with any value other than NULL, the test will only examine the components mentioned in this argument, ignoring the m.max settings. |
alpha |
Numeric, significance level, default: 0.05. |
c.poly |
Numeric, parameter for polynomial kernel, default: 0.5. |
perm |
Number of permutations for approximating p-value, set to 0 to use asymptotic p-value. |
combine.criterion |
How to obtain the overall test result from the component-wise results: 'pvalue' uses Fisher's method to combine the p-values from each component; 'kernel' computes an overall kernel from the component-wise kernels. Default: 'pvalue'. |
multiple.comparison |
Set to TRUE to use adjustment for multiple comparisons when determining which components are significant. |
compress.algorithm |
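Logical, whether to use the smooth compression of Laplacian spectra for testing the null hypothesis. Recommended for large n. Default: FALSE. |
nbasis |
Number of bases used for the approximation when compress.algorithm=TRUE, default: 8. |
clust.alg |
Clustering algorithm used for graph community detection, default: 'kmeans'. |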
return.LPT |
Logical, whether to return the data-driven covariate matrix, default: FALSE. |
return.clust |
Logical, whether to return the class labels assigned by graph community detection, default: FALSE. |
A list containing the following items:
GLP |
Overall GLP statistic. |
pval |
Overall p-value. |
table |
The GLP component table indicating the significance of each component. |
components |
Significant eLP components for the data set. |
LPT |
(optional) matrix of data driven covariates. |
clust |
(optional) class labels assigned by graph community detection. |
Mukhopadhyay, S. and Wang, K.
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Mukhopadhyay, S. and Wang, K. (2020), "Towards a unified statistical theory of spectral graph analysis", arXiv:1901.07090.
    ## 1. Multivariate normal distribution with only a mean difference:
    ## generate data, n1 = n2 = 10, dimension 25
    X1 <- matrix(rnorm(250, mean = 0, sd = 1), 10, 25)
    X2 <- matrix(rnorm(250, mean = 0.5, sd = 1), 10, 25)
    y <- c(rep(1, 10), rep(2, 10))
    X <- rbind(X1, X2)
    ## GLP test:
    locdiff.test <- GLP(X, y, m.max = 4)

    ## Not run:
    ## 2. Leukemia data example
    data(leukemia)
    ## pass the list elements directly: with attach(), the X defined above
    ## would mask leukemia$X
    leukemia.test <- GLP(leukemia$X, leukemia$class, components = 1:4)
    ## confirmatory results:
    leukemia.test$GLP   # overall statistic
    #[1] 0.2092378
    leukemia.test$pval  # overall p-value
    #[1] 0.0001038647
    ## exploratory outputs:
    leukemia.test$table # rows as shown in Table 3 of reference
    #     component    comp.GLP       pvalue
    #[1,]         1 0.209237826 0.0001038647
    #[2,]         2 0.022145514 0.2066876581
    #[3,]         3 0.002025545 0.7025436476
    #[4,]         4 0.033361702 0.1211769396
    ## End(Not run)
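The 'pvalue' option of combine.criterion is described as using Fisher's method. The block below is a minimal base R sketch of that textbook rule, applied to the four component p-values printed in the leukemia table. The helper name fisher_combine is hypothetical and not part of LPKsample. The rule assumes independent component tests, and the sketch is not expected to reproduce leukemia.test$pval, since the package may select or adjust components before combining.

```r
# Hypothetical helper (not part of LPKsample): Fisher's method for combining
# independent p-values. Under the joint null, -2 * sum(log(p)) follows a
# chi-square distribution with 2 * length(p) degrees of freedom.
fisher_combine <- function(p) {
  stopifnot(is.numeric(p), length(p) >= 1, all(p > 0 & p <= 1))
  stat <- -2 * sum(log(p))
  df <- 2 * length(p)
  list(statistic = stat,
       df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Component-wise p-values copied from the leukemia example table.
p_comp <- c(0.0001038647, 0.2066876581, 0.7025436476, 0.1211769396)
res <- fisher_combine(p_comp)
res$statistic
res$p.value
```

A single p-value is returned unchanged by this rule, which is a quick sanity check on the implementation.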
Gene expression data for two classes: Acute lymphoblastic leukemia (ALL) and Acute myeloid leukemia (AML), over n=72 observations, and d=7128 genes.
data("leukemia")
A list containing the following items:
class: a vector of class labels
X: 72-by-7128 matrix, gene expressions for each observation
http://statweb.stanford.edu/~ckirby/brad/LSI/datasets-and-programs/datasets.html
data(leukemia)
This function computes the LP comeans between x and y.
LP.comean(x, y, perm=0)
x |
vector, observations of a univariate random variable |
y |
vector, observations of another univariate random variable |
perm |
Number of permutations for approximating p-value, set to 0 to use asymptotic p-value. |
A list containing:
LPINFOR |
The test statistic based on LP comeans |
p.val |
Test p-value |
LP.matrix |
LP comean matrix |
Mukhopadhyay, S. and Wang, K.
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Parzen, E. and Mukhopadhyay, S. (2012) "Modeling, Dependence, Classification, United Statistical Science, Many Cultures".
    # example: LP comean for two simple vectors:
    y <- c(1, 2, 3, 4, 5)
    z <- c(0, -1, -1, 3, 4)
    comeanYZ <- LP.comean(y, z)
    # sum-of-squares statistic of the LP comeans:
    comeanYZ$LPINFOR
    # p-value:
    comeanYZ$p.val
    # comean matrix:
    comeanYZ$LP.matrix
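The perm argument is described as the number of permutations used to approximate the p-value. The following base R sketch shows the generic permutation scheme such an option typically relies on. Both helper names (perm_pvalue, spearman_sq) are hypothetical, and the squared Spearman correlation is only a stand-in statistic chosen for illustration; it is not the package's LPINFOR, and LP.comean may organize its permutations differently.

```r
# Hypothetical helper: permutation p-value for an association statistic.
# stat_fun(x, y) must return a single number that is large under dependence.
perm_pvalue <- function(x, y, stat_fun, perm = 999) {
  stopifnot(length(x) == length(y), length(y) > 1, perm >= 1)
  observed <- stat_fun(x, y)
  # Shuffling y breaks any x-y association, giving draws from the null.
  permuted <- replicate(perm, stat_fun(x, sample(y)))
  # The add-one correction keeps the estimate strictly positive.
  (1 + sum(permuted >= observed)) / (perm + 1)
}

# Stand-in statistic (an assumption for illustration only).
spearman_sq <- function(x, y) cor(x, y, method = "spearman")^2

set.seed(1)
y <- c(1, 2, 3, 4, 5)
z <- c(0, -1, -1, 3, 4)
p_yz <- perm_pvalue(y, z, spearman_sq, perm = 999)
p_yz
```

With only five observations there are just 120 distinct permutations, so the attainable p-values are coarse; the asymptotic option (perm=0) and the permutation option can disagree noticeably at this sample size.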
Empirical LP Transformation on the data
LPT(x, k)
LP.Poly(x, m)
x |
A column vector of the data |
k |
An integer, order of LP component for transformation |
m |
An integer, maximum order of LP component for transformation |
Given a data vector x, the LPT(x, k) function computes the eLP component of order k for x, while the LP.Poly(x, m) function computes all components up to order m.
A vector containing the elements of the k-th order component of the eLP transformation on x (LPT); or a matrix whose columns are the 1st to m-th order components of the eLP transformation on x (LP.Poly).
Mukhopadhyay, S. and Wang, K.
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Mukhopadhyay, S. and Parzen, E. (2014) "LP Approach to Statistical Modeling", arXiv:1405.2601.
    x <- runif(10)
    LPT(x, 1)
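To make the idea of an empirical LP transformation concrete, here is a base R sketch under a stated assumption: that the components are obtained by orthonormalizing powers of the mid-rank (mid-distribution) transform of x, as described in the cited LP papers. The function elp_sketch is hypothetical and is not the package's LPT or LP.Poly; their scaling, sign conventions, and handling of ties may differ.

```r
# Hypothetical sketch (not the package's LP.Poly): orthonormal polynomial
# scores built from the mid-rank transform of x.
elp_sketch <- function(x, m) {
  n <- length(x)
  # Mid-rank transform: average ranks handle ties, values lie in (0, 1).
  u <- (rank(x, ties.method = "average") - 0.5) / n
  # The number of components cannot exceed (distinct values - 1).
  m <- min(m, length(unique(u)) - 1)
  stopifnot(m >= 1)
  # poly() orthogonalizes 1, u, u^2, ...; its columns sum to zero and have
  # unit length, so rescaling by sqrt(n) gives mean 0 and mean square 1.
  S <- poly(u, degree = m) * sqrt(n)
  S <- matrix(as.numeric(S), nrow = n)  # drop the attributes added by poly()
  colnames(S) <- paste0("T", seq_len(m))
  S
}

set.seed(42)
x <- runif(10)
S <- elp_sketch(x, 3)
round(crossprod(S) / length(x), 10)  # approximately the 3-by-3 identity
```

Because the scores depend on x only through its ranks, the sketch is unchanged by any strictly increasing transformation of the data.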
Given a data matrix X and eLP order k, this function generates the similarity matrix W for graph analysis.
W.Gen(X, k, c.poly = 0.5)
X |
An n-by-d data matrix (n observations in rows, d variables in columns). |
k |
An integer, order of LP component |
c.poly |
Numeric, parameter for polynomial kernel |
An n-by-n similarity matrix generated from the k-th order eLP transformation of X.
Mukhopadhyay, S. and Wang, K.
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
    # example: 6 observations on 3 features:
    x <- rbind(matrix(runif(9), 3, 3), matrix(runif(9) + 1, 3, 3))
    # LP similarity matrix:
    simmat <- W.Gen(x, 1)$W
    image(simmat)
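The c.poly argument is described as the parameter of a polynomial kernel. The sketch below shows what a polynomial-kernel similarity matrix looks like in base R. It assumes a kernel of the form (c + inner product)^degree with degree 2 and uses standardized column ranks as stand-in features; both choices are assumptions made for illustration, and the helper poly_kernel_sketch is hypothetical. W.Gen may use a different kernel form or feature construction.

```r
# Hypothetical sketch (not the package's W.Gen): polynomial-kernel similarity
# between the rows of a feature matrix S. The degree is an assumed parameter.
poly_kernel_sketch <- function(S, c.poly = 0.5, degree = 2) {
  G <- tcrossprod(S)      # n-by-n matrix of inner products between rows
  (c.poly + G)^degree     # elementwise polynomial kernel
}

set.seed(7)
# Same layout as the example above: 6 observations on 3 features,
# with the last three rows shifted upward by 1.
x <- rbind(matrix(runif(9), 3, 3), matrix(runif(9) + 1, 3, 3))
# Stand-in features: column-wise ranks, centered and scaled.
S <- scale(apply(x, 2, rank))
W <- poly_kernel_sketch(S, c.poly = 0.5)
round(W, 2)
```

Rows from the same shifted block tend to have large positive inner products, so the corresponding entries of W are typically the largest, which is the block structure a graph-based method can exploit.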