Title: | Relevance-Integrated Statistical Inference Engine |
---|---|
Description: | Provides methods to perform customized inference at the individual level by taking contextual covariates into account. Three main functions are provided in this package: (i) LASER(), which generates specially designed artificial relevant samples for a given case; (ii) g2l.proc(), which computes the customized fdr(z|x); and (iii) rEB.proc(), which performs empirical Bayes inference based on LASERs. The details can be found in Mukhopadhyay, S., and Wang, K. (2021, <arXiv:2004.09588>). |
Authors: | Subhadeep Mukhopadhyay, Kaijun Wang |
Maintainer: | Kaijun Wang <[email protected]> |
License: | GPL-2 |
Version: | 3.3 |
Built: | 2025-01-25 03:49:24 UTC |
Source: | https://github.com/cran/LPRelevance |
How to individualize a global inference method? The goal of this package is to provide a systematic recipe for converting classical global inference algorithms into customized ones. It provides methods that perform individual-level inference by taking contextual covariates into account. At the heart of our solution is the concept of "artificially-designed relevant samples", called LASERs, which pave the way to an inference mechanism that is simultaneously efficiently estimable and contextually relevant, and that therefore works at both the macroscopic (overall, simultaneous) and the microscopic (individual-level) scale.
Subhadeep Mukhopadhyay, Kaijun Wang
Maintainer: Kaijun Wang <[email protected]>
Mukhopadhyay, S., and Wang, K (2021) "On The Problem of Relevance in Statistical Inference". <arXiv:2004.09588>
A diffusion tensor imaging study comparing brain activity of six dyslexic children versus six normal controls. Two-sample tests produced z-values at voxels (3-dimensional brain locations), with each z_i ~ N(0, 1) under the null hypothesis of no difference between the dyslexic and normal children.
data(data.dti)
A data frame with 15443 observations on the following 4 variables.
coordx
A list of x coordinates
coordy
A list of y coordinates
coordz
A list of z coordinates
z
The z-values.
http://statweb.stanford.edu/~ckirby/brad/LSI/datasets-and-programs/datasets.html
Efron, B. (2012). "Large-scale inference: empirical Bayes methods for estimation, testing, and prediction". Cambridge University Press.
A large-scale heterogeneous dataset used in our paper.
data("funnel")
A data frame with 3565 observations on the following 3 variables.
x
A list of covariate values.
z
A list of z-values.
tags
Binary vector of labels, 1 indicates a data point is a signal.
Mukhopadhyay, S., and Wang, K (2021) "On The Problem of Relevance in Statistical Inference". <arXiv:2004.09588>
This function performs customized fdr analyses tailored to each individual case.
g2l.proc(X, z, X.target = NULL, z.target = NULL, m = c(4, 6), alpha = 0.1, nbag = NULL, nsample = length(z), lp.reg.method = "lm", null.scale = "QQ", approx.method = "direct", ngrid = 2000, centering = TRUE, coef.smooth = "BIC", fdr.method = "locfdr", plot = TRUE, rel.null = "custom", locfdr.df = 10, fdr.th.fixed = NULL, parallel = FALSE, ...)
X |
A vector or matrix of covariate values, one row per observation. |
z |
A vector of z-values, one per observation in X. |
X.target |
Covariate values of the target case(s), in the same layout as X. |
z.target |
A vector of observed z-values for the target case(s), one per target case. |
m |
An ordered pair. The first number is how many LP-nonparametric basis functions to construct for each covariate in X; the second is how many to construct for z. |
alpha |
Confidence level for determining signals. |
nbag |
Number of bags of parametric bootstrap samples to use for each target case. For each bag, a new set of relevance samples is generated and analyzed, and the resulting fdr curves are aggregated by taking their mean. The default is NULL. |
nsample |
Number of relevance samples generated for each case. The default is the size of the input z-statistic. |
lp.reg.method |
Method for estimating the relevance function and its conditional LP-Fourier coefficients. We currently support three options: lm (inbuilt with subset selection), glmnet, and knn. |
null.scale |
Method for estimating the null standard deviation from the LASER samples. Available options: "IQR", "QQ", and "locfdr".
approx.method |
Method used to approximate the customized fdr curve; the default is "direct". When set to "indirect", the customized fdr is computed by modifying the pooled fdr using the relevance density function. |
ngrid |
Number of grid points used to compute the customized fdr curve. |
centering |
Whether to perform regression-adjustment to center the data, default is TRUE. |
coef.smooth |
Specifies the method to use for LP coefficient smoothing (AIC or BIC). Uses BIC by default. |
fdr.method |
Method for controlling false discoveries (either "locfdr" or "BH"), default choice is "locfdr". |
plot |
Whether to include plots in the results; the default is TRUE. |
rel.null |
How the relevant null changes with x: "custom" allows it to vary with x, while "th" keeps it fixed. |
locfdr.df |
Degrees of freedom to use for locfdr. |
fdr.th.fixed |
Use a fixed fdr threshold for finding signals. The default is NULL. |
parallel |
Use parallel computing to obtain the relevance samples; mainly useful when nsample is very large. |
... |
Extra parameters to pass to other functions. Currently only supports the arguments for |
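The null.scale argument selects how the null standard deviation is estimated from the LASER samples. As a rough illustration only, the sketch below shows two generic robust estimators in base R: an IQR-based one and a quantile-quantile (QQ) regression over the central quantiles. The helper names null_sd_iqr() and null_sd_qq(), and the 0.25 to 0.75 quantile range, are hypothetical choices; this is not LPRelevance's internal code.

```r
# Hypothetical helpers, not part of LPRelevance: two generic robust
# estimates of a null standard deviation from a sample of z-values.

# IQR-based estimate: for N(mu, sigma^2), IQR = 2 * qnorm(0.75) * sigma,
# so sigma is estimated by IQR / 1.349 (approximately).
null_sd_iqr <- function(z) {
  IQR(z) / (2 * qnorm(0.75))
}

# QQ-based estimate (assumption): slope of a least-squares fit of the
# central sample quantiles on the matching standard normal quantiles.
null_sd_qq <- function(z, probs = seq(0.25, 0.75, by = 0.01)) {
  q_sample <- quantile(z, probs = probs, names = FALSE)
  q_normal <- qnorm(probs)
  unname(coef(lm(q_sample ~ q_normal))[2])
}

# Mostly-null sample with sd 1.5, plus 2% large "signals" in the right tail
set.seed(1)
z_sim <- c(rnorm(4900, mean = 0, sd = 1.5), rnorm(100, mean = 6))
null_sd_iqr(z_sim)
null_sd_qq(z_sim)
sd(z_sim)  # plain sd, inflated by the signals
```

Because both estimators rely only on the central part of the distribution, a small fraction of large signals inflates them far less than it inflates sd(z).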
A list containing the following items:
macro |
Available when X.target is NULL (global inference). |
$result |
A list of global inference results: |
$X |
Matrix of covariates, same as input |
$z |
Vector of observations, same as input |
$probnull |
A vector of estimated null probabilities, one per observation. |
$signal |
A binary vector with one entry per observation; 1 indicates a discovered signal. |
plots |
A list of plots for global inference: |
$signal_x |
A plot of signals discovered, marked in red |
$dps_xz |
A scatterplot of z on x, colored based on the discovery propensity scores, only available when |
$dps_x |
A scatterplot of discovery propensity scores on x, only available when |
micro |
Available when X.target is supplied (not NULL). |
$result |
Customized estimates of the null probabilities for the target cases. |
$result$signal |
A binary vector with one entry per target case; 1 indicates a discovered signal. |
$global |
Pooled (global) estimates of the null probabilities for the target cases. |
$plots |
Customized fdr plots for the target cases. |
m.lp |
Same as input m. |
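When fdr.method = "BH", discoveries are made by controlling the false discovery rate at level alpha. As a point of reference only, the standard Benjamini-Hochberg step-up rule can be written in base R as below; it returns a 0/1 vector in the same spirit as the $signal component. The helper names, the two-sided p-values, and the sd0 scaling are assumptions, and this is not necessarily how g2l.proc() applies BH internally.

```r
# Hypothetical helpers, not part of LPRelevance.

# Benjamini-Hochberg via stats::p.adjust: two-sided p-values under an
# assumed N(0, sd0^2) null, then flag cases whose adjusted p-value <= alpha.
bh_signal <- function(z, alpha = 0.1, sd0 = 1) {
  p <- 2 * pnorm(-abs(z) / sd0)
  as.integer(p.adjust(p, method = "BH") <= alpha)
}

# The same step-up rule written out: find the largest k with
# p_(k) <= alpha * k / n and reject the k smallest p-values.
bh_manual <- function(p, alpha = 0.1) {
  n <- length(p)
  o <- order(p)
  passed <- which(p[o] <= alpha * seq_len(n) / n)
  out <- integer(n)
  if (length(passed) > 0) out[o[seq_len(max(passed))]] <- 1L
  out
}

set.seed(2)
z_sim <- c(rnorm(950), rnorm(50, mean = 4))   # 50 non-null cases at the end
sig <- bh_signal(z_sim, alpha = 0.1)
sum(sig)                                      # number of discoveries
```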
Subhadeep Mukhopadhyay, Kaijun Wang
Maintainer: Kaijun Wang <[email protected]>
Mukhopadhyay, S., and Wang, K (2021) "On The Problem of Relevance in Statistical Inference". <arXiv:2004.09588>
library(LPRelevance)
data(funnel)
X <- funnel$x
z <- funnel$z
## macro-inference using locfdr and LASER:
g2l_macro <- g2l.proc(X, z)
g2l_macro$macro$plots
# Microinference for the DTI data: case A with x = (18, 55) and z = 3.95
data(data.dti)
X <- cbind(data.dti$coordx, data.dti$coordy)
z <- data.dti$z
g2l_x <- g2l.proc(X, z, X.target = c(18, 55), z.target = 3.95, nsample = 3000)
g2l_x$micro$plots$fdr.1 + ggplot2::coord_cartesian(xlim = c(0, 4))
g2l_x$micro$result[4]
This data set records the age and kidney function of volunteers. Higher scores indicate better function.
data(kidney)
A data frame with 157 observations on the following 2 variables.
x
A list of patients' age.
z
A list of kidney scores.
http://statweb.stanford.edu/~ckirby/brad/LSI/datasets-and-programs/datasets.html
Efron, B. (2012). "Large-scale inference: empirical Bayes methods for estimation, testing, and prediction". Cambridge University Press.
Lemley, K. V., Lafayette, R. A., Derby, G., Blouch, K. L., Anderson, L., Efron, B., & Myers, B. D. (2007). "Prediction of early progression in recently diagnosed IgA nephropathy." Nephrology Dialysis Transplantation, 23(1), 213-222.
This function generates the artificial relevance samples (LASERs). These are "sharpened" z-samples manufactured by the relevance function.
LASER( X,z, X.target, m=c(4,6), nsample=length(z), lp.reg.method='lm', coef.smooth='BIC', centering=TRUE,parallel=FALSE,...)
X |
A vector or matrix of covariate values, one row per observation. |
z |
A vector of z-values, one per observation in X. |
X.target |
Covariate value(s) of the target case. |
m |
An ordered pair. The first number is how many LP-nonparametric basis functions to construct for each covariate in X; the second is how many to construct for z. |
nsample |
Number of relevance samples to generate for each case. |
lp.reg.method |
Method for estimating the relevance function and its conditional LP-Fourier coefficients. We currently support three options: lm (inbuilt with subset selection), glmnet, and knn. |
centering |
Whether to perform regression-adjustment to center the data, default is TRUE. |
coef.smooth |
Specifies the method to use for LP coefficient smoothing (AIC or BIC). Uses BIC by default. |
parallel |
Use parallel computing to obtain the relevance samples; mainly useful when nsample is very large. |
... |
Extra parameters to pass to other functions. Currently only supports the arguments for |
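The m argument sets how many LP-nonparametric basis functions are built for X and for z. A common construction in the LP literature starts from the mid-distribution (mid-rank) transform and orthonormalizes its powers; the sketch below follows that recipe using base R's qr(). It is an illustration under that assumption, with a hypothetical helper name lp_basis(); LASER()'s own basis construction may differ in details such as scaling and sign.

```r
# Hypothetical helper, not part of LPRelevance: m orthonormal LP-type
# score functions of a sample x, built by Gram-Schmidt (via QR) on powers
# of the centered mid-rank transform.
lp_basis <- function(x, m = 4) {
  n <- length(x)
  # Mid-distribution transform F(x) - p(x)/2; with average ranks for ties
  # this equals (rank - 0.5) / n.
  u <- (rank(x) - 0.5) / n
  powers <- outer(u - 0.5, seq_len(m), `^`)   # (u - 1/2)^1, ..., (u - 1/2)^m
  qrd <- qr(cbind(1, powers))                 # constant column first
  s <- sign(diag(qr.R(qrd)))                  # make the R diagonal positive
  Tm <- sqrt(n) * qr.Q(qrd) %*% diag(s)       # scaled so mean(T_j^2) = 1
  Tm[, -1, drop = FALSE]                      # drop the constant column
}

set.seed(3)
x <- rexp(500)
B <- lp_basis(x, m = 4)
round(crossprod(B) / nrow(B), 8)   # approximately the 4 x 4 identity matrix
```

Each column has mean zero and unit mean square over the sample, and the first column is an increasing function of x, so it plays the role of a rank-based location score.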
A list containing the following items:
data |
The relevance samples generated at X.target. |
LPcoef |
Parameters of the relevance function |
Subhadeep Mukhopadhyay, Kaijun Wang
Maintainer: Kaijun Wang <[email protected]>
Mukhopadhyay, S., and Wang, K (2021) "On The Problem of Relevance in Statistical Inference". <arXiv:2004.09588>
library(LPRelevance)
data(funnel)
X <- funnel$x
z <- funnel$z
# LASER samples for the target case x = 30
z.laser.x30 <- LASER(X, z, X.target = 30, m = c(4, 8))$data
hist(z.laser.x30, 50)
Performs custom-tailored Finite Bayes inference via LASERs.
rEB.Finite.Bayes(X,z,X.target,z.target,m=c(4,6),m.EB=8, B=10, centering=TRUE, nsample=min(1000,length(z)), g.method='DL',LP.type='L2', sd0=NULL, theta.set.prior=seq(-2.5*sd(z),2.5*sd(z),length.out=500), theta.set.post=seq(z.target-2.5*sd(z),z.target+2.5*sd(z),length.out=500), post.alpha=0.8, plot=TRUE, ...)
X |
A vector or matrix of covariate values, one row per observation. |
z |
A vector of z-values, one per observation in X. |
X.target |
A vector of covariate values for the target case, with one entry per covariate. |
z.target |
The observed z-value of the target case. |
m |
An ordered pair. The first number is how many LP-nonparametric basis functions to construct for each covariate in X; the second is how many to construct for z. |
m.EB |
The truncation point reflecting the concentration of true nonparametric prior density |
B |
Number of bags of bootstrap samples for Finite Bayes. |
centering |
Whether to perform regression-adjustment to center the data, default is TRUE. |
nsample |
Number of relevance samples generated for the target case. |
g.method |
Suggested method for finding parameter estimates |
LP.type |
User selects either "L2" for LP-orthogonal series representation of relevance density function |
sd0 |
Fixed standard deviation for |
theta.set.prior |
The set of grid points at which to compute the prior density. |
theta.set.post |
The set of grid points at which to compute the posterior density. |
post.alpha |
The alpha level for posterior HPD interval. |
plot |
Whether to display plots for prior and posterior of Relevance Finite Bayes. |
... |
Extra parameters to pass to LASER function. |
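The post.alpha argument controls the posterior HPD interval. As a generic illustration, not the package's code, the sketch below extracts a highest-posterior-density interval from a density evaluated on an evenly spaced grid such as theta.set.post. The helper name hpd_grid() is hypothetical, and treating the level as the coverage probability (0.8 for the default) is an assumption.

```r
# Hypothetical helper, not part of LPRelevance: HPD interval from a density
# evaluated on an evenly spaced grid.
hpd_grid <- function(theta, dens, level = 0.8) {
  dtheta <- theta[2] - theta[1]
  w <- dens / sum(dens * dtheta)        # normalize so the grid mass is 1
  o <- order(w, decreasing = TRUE)      # highest-density grid points first
  cum_mass <- cumsum(w[o]) * dtheta
  k <- which(cum_mass >= level)[1]      # smallest set reaching the level
  range(theta[o[seq_len(k)]])           # hull of that set (exact if unimodal)
}

theta <- seq(-4, 6, length.out = 2001)
dens <- dnorm(theta, mean = 1, sd = 0.5)
hpd <- hpd_grid(theta, dens, level = 0.8)
hpd   # compare with 1 + c(-1, 1) * qnorm(0.9) * 0.5
```

For a multimodal posterior the highest-density set can be a union of intervals; the final range() step then reports only its hull.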
A list containing the following items:
prior |
Relevant Finite Bayes prior results. |
$prior.fit |
Prior density curve estimation. |
posterior |
Relevant empirical Bayes posterior results. |
$post.fit |
Posterior density curve estimation. |
$post.mode |
Posterior mode for |
$post.mean |
Posterior mean for |
$post.mean.sd |
Standard error for the posterior mean. |
$HPD.interval |
The HPD interval for posterior |
g.par |
Parameters for |
LP.coef |
Reports the LP-coefficients of the relevance function |
sd0 |
Initial estimate for null standard errors. |
plots |
The plots for prior and posterior density. |
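The prior and posterior components are density curves on the theta.set.prior and theta.set.post grids. Assuming the usual normal-means model z | theta ~ N(theta, sd0^2), the sketch below shows how a posterior mode and mean follow from any prior evaluated on a grid. The N(0, 1) prior is a stand-in chosen only because its conjugate answer is known; it is not the relevance prior that rEB.Finite.Bayes() estimates, and grid_posterior() is a hypothetical name.

```r
# Hypothetical helper, not part of LPRelevance: grid posterior under the
# normal-means model z | theta ~ N(theta, sd0^2) with prior density g(theta).
grid_posterior <- function(z0, theta, prior, sd0 = 1) {
  dtheta <- theta[2] - theta[1]
  post <- prior * dnorm(z0, mean = theta, sd = sd0)  # Bayes rule, unnormalized
  post <- post / sum(post * dtheta)                  # normalize on the grid
  list(density = post,
       mode = theta[which.max(post)],
       mean = sum(theta * post * dtheta))
}

theta <- seq(-4, 8, length.out = 2401)
prior <- dnorm(theta)                     # stand-in prior g = N(0, 1)
fit <- grid_posterior(z0 = 4.49, theta = theta, prior = prior, sd0 = 1)
c(mode = fit$mode, mean = fit$mean)       # conjugate answer: 4.49 / 2 = 2.245
```

With an estimated prior in place of the stand-in, the same two lines of Bayes rule give the posterior curve, and its mode and mean.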
Subhadeep Mukhopadhyay, Kaijun Wang
Maintainer: Kaijun Wang <[email protected]>
Mukhopadhyay, S., and Wang, K (2021) "On The Problem of Relevance in Statistical Inference". <arXiv:2004.09588>
library(LPRelevance)
data(funnel)
X <- funnel$x
z <- funnel$z
X.target <- 30
z.target <- 4.49
rFB.out <- rEB.Finite.Bayes(X, z, X.target, z.target, B = 5, nsample = 1000,
                            m = c(4, 8), m.EB = 8,
                            theta.set.prior = seq(-4, 4, length.out = 500),
                            theta.set.post = seq(0, 5, length.out = 500),
                            post.alpha = 0.8, parallel = FALSE)
rFB.out$plots$prior
rFB.out$plots$post
Performs custom-tailored empirical Bayes inference via LASERs.
rEB.proc(X, z, X.target, z.target, m = c(4, 6), nbag = NULL, centering = TRUE, lp.reg.method = "lm", coef.smooth = "BIC", nsample = min(length(z),2000), theta.set.prior = NULL, theta.set.post = NULL, LP.type = "L2", g.method = "DL", sd0 = NULL, m.EB = 8, parallel = FALSE, avg.method = "mean", post.curve = "HPD", post.alpha = 0.8, color = "red", ...)
X |
A vector or matrix of covariate values, one row per observation. |
z |
A vector of z-values, one per observation in X. |
X.target |
A vector of covariate values for the target case, with one entry per covariate. |
z.target |
The observed z-value of the target case. |
m |
An ordered pair. The first number is how many LP-nonparametric basis functions to construct for each covariate in X; the second is how many to construct for z. |
nbag |
Number of bags of parametric bootstrap samples to use. The default is NULL. |
centering |
Whether to perform regression-adjustment to center the data, default is TRUE. |
lp.reg.method |
Method for estimating the relevance function and its conditional LP-Fourier coefficients. We currently support three options: lm (inbuilt with subset selection), glmnet, and knn. |
coef.smooth |
Specifies the method to use for LP coefficient smoothing (AIC or BIC). Uses BIC by default. |
nsample |
Number of relevance samples generated for the target case. |
theta.set.prior |
The set of grid points at which to compute the prior density. |
theta.set.post |
The set of grid points at which to compute the posterior density. |
LP.type |
User selects either "L2" for LP-orthogonal series representation of relevance density function |
g.method |
Suggested method for finding parameter estimates |
sd0 |
Fixed standard deviation for |
m.EB |
The truncation point reflecting the concentration of true nonparametric prior density |
parallel |
Use parallel computing to obtain the relevance samples; mainly useful when nsample is very large. |
avg.method |
For parametric bootstrapping, this specifies how the results from different bags are aggregated. The default is "mean". |
post.curve |
For plotting, this specifies what to show on the posterior curve. The default is "HPD". |
post.alpha |
Confidence level to use when plotting the posterior confidence band, or the alpha level for the HPD interval. |
color |
The color of the plots. |
... |
Extra parameters to pass to other functions. Currently only supports the arguments for |
A list containing the following items:
result |
Contains relevant empirical Bayes prior and posterior results. |
sd0 |
Initial estimate for null standard errors. |
prior |
Relevant empirical Bayes prior results. |
$g.par |
Parameters for |
$g.method |
Method used for finding the parameter estimates |
$LP.coef |
Reports the LP-coefficients of the relevance function |
posterior |
Relevant empirical Bayes posterior results. |
$post.mode |
Posterior mode for |
$post.mean |
Posterior mean for |
$post.mean.sd |
Standard error for the posterior mean, when using parametric bootstrap. |
$HPD.interval |
The HPD interval for posterior |
$post.alpha |
Same as input post.alpha. |
plots |
The plots for prior and posterior density. |
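When nbag is set, results from several bags of parametric bootstrap samples are combined according to avg.method, and $post.mean.sd reports a standard error for the posterior mean. One generic way to do this, sketched below, is to aggregate the per-bag posterior curves pointwise and to use the spread (standard deviation) of the per-bag posterior means as a bootstrap-style standard error. The helper aggregate_bags() and its "median" option are assumptions for illustration; only "mean" appears in the documented usage, and rEB.proc() may combine bags differently.

```r
# Hypothetical helper, not part of LPRelevance: combine results from B bags.
# post_curves: grid-by-bag matrix of posterior densities (one column per bag)
# post_means:  vector of per-bag posterior means
aggregate_bags <- function(post_curves, post_means,
                           avg_method = c("mean", "median")) {
  avg_method <- match.arg(avg_method)
  f <- if (avg_method == "mean") mean else median
  list(curve = apply(post_curves, 1, f),   # pointwise aggregate over bags
       post.mean = f(post_means),          # aggregated posterior mean
       post.mean.sd = sd(post_means))      # spread across bags
}

curves <- cbind(c(0.1, 0.5, 0.2), c(0.3, 0.7, 0.4))   # 3 grid points, 2 bags
out <- aggregate_bags(curves, post_means = c(2.0, 2.4))
out$curve
out$post.mean.sd
```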
Subhadeep Mukhopadhyay, Kaijun Wang
Maintainer: Kaijun Wang <[email protected]>
Mukhopadhyay, S., and Wang, K (2021) "On The Problem of Relevance in Statistical Inference". <arXiv:2004.09588>
library(LPRelevance)
data(funnel)
X <- funnel$x
z <- funnel$z
X.target <- 60
z.target <- 4.49
rEB.out <- rEB.proc(X, z, X.target, z.target, m = c(4, 8),
                    theta.set.prior = seq(-2, 2, length.out = 200),
                    theta.set.post = seq(-2, 5, length.out = 200),
                    centering = TRUE, m.EB = 6, nsample = 1000)
rEB.out$plots$rEB.post
rEB.out$plots$rEB.prior