Issue Information
Wiley Online Library » Statistical Analysis and Data Mining
Statistical Analysis and Data Mining: The ASA Data Science Journal, Volume 17, Issue 3, June 2024
Bayesian relative composite quantile regression approach of ordinal latent regression model with L1/2 regularization
by Tian Yu‐Zhu, Wu Chun‐Ho, Tai Ling‐Nan, Mian Zhi‐Bao, Tian Mao‐Zai
Abstract Ordinal data frequently occur in various fields such as knowledge level assessment, credit rating, clinical disease diagnosis, and psychological evaluation. Classic models such as cumulative logistic regression and probit regression are often used to model such ordinal data. However, these approaches describe the conditional mean of the response variable given a set of predictor variables, which often results in non-robust estimates. As a viable alternative, the composite quantile regression (CQR) approach is often employed to obtain more robust and re…
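For reference, the standard (frequentist, linear) composite quantile regression objective is sketched below. It is shown only to illustrate what "composite" means here; it is not the Bayesian relative CQR formulation for the ordinal latent model proposed in the paper. The symbols $K$ (number of quantile levels), $b_k$ (level-specific intercepts), and the choice $\tau_k = k/(K+1)$ are the usual textbook conventions, assumed rather than taken from the abstract.

```latex
\hat{\beta}_{\mathrm{CQR}}
  = \arg\min_{b_1,\dots,b_K,\;\beta}
    \sum_{k=1}^{K} \sum_{i=1}^{n}
    \rho_{\tau_k}\!\left(y_i - b_k - x_i^{\top}\beta\right),
\qquad
\rho_{\tau}(u) = u\,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr),
\qquad
\tau_k = \frac{k}{K+1}.
```

The slope vector $\beta$ is shared across all $K$ quantile levels while only the intercepts vary, so information from many quantiles is pooled into one estimate of $\beta$. This pooling is what makes CQR less sensitive to heavy tails and outliers than a mean-based fit.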
Transfer learning under the Cox model with interval‐censored data
by Mengqi Xie, Tao Hu, Jie Zhou
Abstract Transfer learning, which borrows information from related data to address limited sample sizes, has gained increasing attention in recent years. Our method aims to use data from other population groups as a complement to enhance risk factor discernment and failure time prediction among underrepresented subgroups. However, the literature lacks an effective way to transfer knowledge from the source to the target for risk assessment with interval-censored data while accommodating population incomparability and privacy constraints. Our objective is to bridge this gap by developing a transfer…
A treeless absolutely random forest with closed‐form estimators of expected proximities
by Eugene Laska, Ziqiang Lin, Carole Siegel, Charles Marmar
Abstract We introduce a simple variant of a purely random forest, called an absolute random forest (ARF), for clustering. At every node, units are split on a randomly chosen feature at a random threshold drawn from a uniform distribution whose support, the range of the selected feature in the root node, does not change. This enables closed-form estimators of parameters, such as pairwise proximities, to be obtained without having to grow a forest. The probabilistic structure corresponding to an ARF is called a treeless absolute random forest (TARF). With high probability, th…
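Why a fixed threshold distribution allows closed-form proximities can be seen with a small sketch. It assumes (our simplification, not necessarily the paper's estimator) that every node at every level is split, that each split picks one of the $p$ features uniformly, and that the threshold is uniform on that feature's root-node range $R_j$. Under these assumptions, one split separates units $i$ and $k$ with probability $q = \frac{1}{p}\sum_j |x_{ij} - x_{kj}| / R_j$, independently of every other unit. The probability that the pair still shares a node after `depth` levels is therefore $(1-q)^{\text{depth}}$. The function names below are hypothetical, and the Monte Carlo routine is included only to check the formula.

```python
import numpy as np

def arf_closed_form_proximity(x_i, x_k, ranges, depth):
    """Hypothetical closed-form chance that two units share a node after
    `depth` ARF-style splits (uniform feature, threshold uniform on root range)."""
    diffs = np.abs(np.asarray(x_i, float) - np.asarray(x_k, float)) / np.asarray(ranges, float)
    p_separate = diffs.mean()  # probability that a single split falls between them
    return (1.0 - p_separate) ** depth

def arf_simulated_proximity(x_i, x_k, lows, highs, depth, n_trees, rng):
    """Monte Carlo check: follow only the path shared by the pair in each tree."""
    x_i, x_k = np.asarray(x_i, float), np.asarray(x_k, float)
    together = 0
    for _ in range(n_trees):
        same_node = True
        for _ in range(depth):
            j = rng.integers(len(x_i))              # random feature
            t = rng.uniform(lows[j], highs[j])      # threshold on the fixed root range
            if (x_i[j] < t) != (x_k[j] < t):        # the split separates the pair
                same_node = False
                break
        together += same_node
    return together / n_trees

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
lows, highs = X.min(axis=0), X.max(axis=0)
exact = arf_closed_form_proximity(X[0], X[1], highs - lows, depth=4)
approx = arf_simulated_proximity(X[0], X[1], lows, highs, depth=4, n_trees=20000, rng=rng)
print(round(exact, 3), round(approx, 3))
```

The two printed values should agree up to Monte Carlo error. The closed form never grows a tree, which mirrors the "treeless" idea in the abstract.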
Bayesian shrinkage models for integration and analysis of multiplatform high‐dimensional genomics data
by Hao Xue, Sounak Chakraborty, Tanujit Dey
Abstract With the increasing availability of biomedical data from multiple platforms for the same patients in clinical research, such as epigenomics, gene expression, and clinical features, there is a growing need for statistical methods that can jointly analyze data from different platforms to provide complementary information for clinical studies. In this paper, we propose a two-stage hierarchical Bayesian model that integrates high-dimensional biomedical data from diverse platforms to select biomarkers associated with clinical outcomes of interest. In the first stage, we use Expectation Maxi…
Randomized multiarm bandits: An improved adaptive data collection method
by Zhigen Zhao, Tong Wang, Bo Ji
Abstract In many scientific experiments, multiarmed bandits are used as an adaptive data collection method. However, this adaptive process can introduce dependence that renders many commonly used statistical inference methods invalid. One example is the sample mean, which is a natural estimator of the mean parameter but can be biased. This can cause test statistics based on this estimator to have an inflated type I error rate, and the resulting confidence intervals may have coverage probabilities well below their nominal values. To address this issue, we propose an alterna…
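The bias of the sample mean under adaptive collection can be reproduced with a short simulation. The sketch assumes a plain greedy policy: one initial pull per arm, then always the arm with the highest current sample mean. It uses two arms with identical true means of 0 and unit-variance Gaussian rewards. The policy, the settings, and the function name are illustrative choices, not taken from the paper, and the paper's proposed randomized method is not shown.

```python
import numpy as np

def greedy_bandit_sample_means(n_arms, horizon, rng):
    """Hypothetical helper: run a greedy bandit on arms whose true means are
    all 0 and return each arm's final sample mean."""
    sums = rng.standard_normal(n_arms)   # one initial pull per arm
    counts = np.ones(n_arms)
    for _ in range(horizon - n_arms):
        arm = int(np.argmax(sums / counts))  # exploit the current leader
        sums[arm] += rng.standard_normal()
        counts[arm] += 1
    return sums / counts

rng = np.random.default_rng(1)
reps = 5000
estimates = np.array([greedy_bandit_sample_means(2, 20, rng) for _ in range(reps)])
bias = estimates.mean()  # the true mean is 0, so this is the empirical bias
print(f"average sample mean across arms and runs: {bias:.3f}")
```

An arm that starts with an unluckily low draw tends to be abandoned before its estimate can recover. As a result, the averaged sample mean comes out below the true value of 0, and naive tests and intervals built on it inherit this distortion.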
Expert‐in‐the‐loop design of integral nuclear data experiments
by Isaac Michaud, Michael Grosskopf, Jesson Hutchinson, Scott Vander Wiel
Abstract Nuclear data are fundamental inputs to radiation transport codes used for reactor design and criticality safety. Designing experiments to reduce nuclear data uncertainty has been a challenge for many years, but advances in the sensitivity calculations of radiation transport codes over the last two decades have made optimal experimental design possible. The design of integral nuclear experiments poses numerous challenges not emphasized in classical optimal design, in particular, constrained design spaces (in both a statistical and an engineering sense), severely under-determined sys…
Hub‐aware random walk graph embedding methods for classification
by Aleksandar Tomčić, Miloš Savić, Miloš Radovanović
Abstract Over the last two decades, we have witnessed a huge increase in valuable big data structured in the form of graphs or networks. To apply traditional machine learning and data analytic techniques to such data, it is necessary to transform graphs into vector-based representations that preserve their most essential structural properties. For this purpose, a large number of graph embedding methods have been proposed in the literature. Most of them produce general-purpose embeddings suitable for a variety of applications such as node clustering, node classification, graph visualizatio…
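Random-walk embedding methods start by sampling walks over the graph. DeepWalk-style methods then treat the walks as "sentences" for a skip-gram model. The sketch below generates plain uniform random walks from an adjacency list. It is the generic baseline step, not the hub-aware sampling scheme proposed in the paper, and the function name, parameters, and toy graph are hypothetical.

```python
import random

def uniform_random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks (DeepWalk-style baseline, an assumption)
    starting from every node of an adjacency-list graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Small undirected toy graph: node 0 is a hub adjacent to every other node.
graph = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
walks = uniform_random_walks(graph, walk_length=5, walks_per_node=2)
print(walks[:3])
```

On a connected, non-bipartite undirected graph, a uniform walk visits each node in the long run in proportion to its degree. High-degree hubs such as node 0 therefore appear very often in the sampled walks.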
Compositional variable selection in quantile regression for microbiome data with false discovery rate control
by Runze Li, Jin Mu, Songshan Yang, Cong Ye, Xiang Zhan
Abstract Advances in high-throughput sequencing technologies have stimulated intensive research interest in identifying specific microbial taxa that are associated with disease conditions. Such knowledge is invaluable both for understanding biology and, from a biomedical perspective, for therapeutic development, as the microbiome is inherently modifiable. Despite the availability of massive data, analysis of microbiome compositional data remains difficult. The fact that the relative abundances of all components of a microbial community sum to one poses challenges for statistical a…
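The sum-to-one constraint means the parts of a composition cannot vary freely: if one taxon's share rises, the others must fall. A common generic remedy in compositional data analysis is a log-ratio transform. The sketch below shows closure to proportions and the centered log-ratio (clr) transform. It illustrates the constraint only and is not the paper's variable-selection method. The example counts are made up, and zero counts, which are common in real microbiome data, would need separate handling that is not shown.

```python
import numpy as np

def closure(counts):
    """Rescale each row of nonnegative counts so that it sums to one."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(compositions):
    """Centered log-ratio transform; assumes all parts are strictly positive."""
    logs = np.log(np.asarray(compositions, dtype=float))
    return logs - logs.mean(axis=1, keepdims=True)

# Two made-up samples with three taxa each.
counts = np.array([[10, 30, 60], [5, 5, 90]])
props = closure(counts)   # each row now sums to one
z = clr(props)            # each row now sums to zero
print(props)
print(z)
```

The clr values do not depend on sequencing depth: applying `clr` to the raw counts gives the same result as applying it to the proportions, because the common scale factor cancels in the log ratios.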
Smart data augmentation: One equation is all you need
by Yuhao Zhang, Lu Tang, Yuxiao Huang, Yan Ma
Abstract Class imbalance is a common and critical challenge in machine learning classification problems, resulting in low prediction accuracy. While numerous methods, especially data augmentation methods, have been proposed to address this issue, a method that works well on one dataset may perform poorly on another. To the best of our knowledge, there is still no single best approach to handling class imbalance that can be applied uniformly. In this paper, we propose an approach named smart data augmentation (SDA), which aims to augment imbalanced data in an optimal way to maximize downst…