Wiley Online Library » Statistical Analysis and Data Mining on Feedspot

Wiley Online Library » Statistical Analysis and Data Mining

Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact.

Issue Information

13h ago

Statistical Analysis and Data Mining: The ASA Data Science Journal, Volume 17, Issue 3, June 2024

Bayesian relative composite quantile regression approach of ordinal latent regression model with L1/2 regularization

by Tian Yu‐Zhu, Wu Chun‐Ho, Tai Ling‐Nan, Mian Zhi‐Bao, Tian Mao‐Zai

1w ago

Abstract: Ordinal data frequently occur in various fields such as knowledge level assessment, credit rating, clinical disease diagnosis, and psychological evaluation. The classic models including cumulative logistic regression or probit regression are often used to model such ordinal data. But these modeling approaches conditionally depict the mean characteristic of response variable on a cluster of predictive variables, which often results in non-robust estimation results. As a considerable alternative, composite quantile regression (CQR) approach is usually employed to gain more robust and re...

Transfer learning under the Cox model with interval‐censored data

by Mengqi Xie, Tao Hu, Jie Zhou

2w ago

Abstract: Transfer learning, focusing on information borrowing to address limited sample size issues, has gained increasing attention in recent years. Our method aims to utilize data from other population groups as a complement to enhance risk factor discernment and failure time prediction among underrepresented subgroups. However, a literature gap exists in effective knowledge transfer from the source to the target for risk assessment with interval-censored data while accommodating population incomparability and privacy constraints. Our objective is to bridge this gap by developing a transfer...

A treeless absolutely random forest with closed‐form estimators of expected proximities

by Eugene Laska, Ziqiang Lin, Carole Siegel, Charles Marmar

2w ago

Abstract: We introduce a simple variant of a purely random forest, called an absolute random forest (ARF) used for clustering. At every node, splits of units are determined by a randomly chosen feature and a random threshold drawn from a uniform distribution whose support, the range of the selected feature in the root node, does not change. This enables closed-form estimators of parameters, such as pairwise proximities, to be obtained without having to grow a forest. The probabilistic structure corresponding to an ARF is called a treeless absolute random forest (TARF). With high probability, th...
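
The splitting rule described in this abstract can be illustrated with a small sketch. It is not the authors' estimator: it assumes that "proximity" means the probability that two units are never separated along a path of fixed depth, and that the feature at each node is chosen uniformly at random. Under those assumptions, because the threshold range never changes, the splits along a path are independent and identically distributed, so the probability has a simple closed form that a simulation can check. All function names and parameters here are hypothetical.

```python
import random


def separation_prob(x, y, lo, hi):
    """Probability that a single random split separates x from y.

    Assumes the feature is picked uniformly at random and the threshold is
    Uniform(lo[j], hi[j]), where lo/hi are the feature ranges at the root.
    """
    p = len(x)
    return sum(abs(a - b) / (h - l) for a, b, l, h in zip(x, y, lo, hi)) / p


def closed_form_proximity(x, y, lo, hi, depth):
    """Probability that x and y stay together through `depth` i.i.d. splits."""
    return (1.0 - separation_prob(x, y, lo, hi)) ** depth


def simulated_proximity(x, y, lo, hi, depth, n_trees=20000, seed=0):
    """Monte Carlo check: follow the pair down `n_trees` random paths."""
    rng = random.Random(seed)
    together = 0
    for _ in range(n_trees):
        for _ in range(depth):
            j = rng.randrange(len(x))       # random feature (assumed uniform)
            t = rng.uniform(lo[j], hi[j])   # threshold over the root range
            if (x[j] <= t) != (y[j] <= t):  # the split separates the pair
                break
        else:
            together += 1                   # never separated on this path
    return together / n_trees


if __name__ == "__main__":
    x, y = [0.2, 0.5], [0.4, 0.5]
    lo, hi = [0.0, 0.0], [1.0, 1.0]
    print(closed_form_proximity(x, y, lo, hi, depth=3))
    print(simulated_proximity(x, y, lo, hi, depth=3))
```

The fixed threshold range is what makes the closed form possible in this sketch; if the range shrank with each node, the split probabilities would depend on the path taken and the product form would no longer hold.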

Bayesian shrinkage models for integration and analysis of multiplatform high‐dimensional genomics data

by Hao Xue, Sounak Chakraborty, Tanujit Dey

2w ago

Abstract: With the increasing availability of biomedical data from multiple platforms of the same patients in clinical research, such as epigenomics, gene expression, and clinical features, there is a growing need for statistical methods that can jointly analyze data from different platforms to provide complementary information for clinical studies. In this paper, we propose a two-stage hierarchical Bayesian model that integrates high-dimensional biomedical data from diverse platforms to select biomarkers associated with clinical outcomes of interest. In the first stage, we use Expectation Maxi...

Randomized multiarm bandits: An improved adaptive data collection method

by Zhigen Zhao, Tong Wang, Bo Ji

2w ago

Abstract: In many scientific experiments, multiarmed bandits are used as an adaptive data collection method. However, this adaptive process can lead to a dependence that renders many commonly used statistical inference methods invalid. An example of this is the sample mean, which is a natural estimator of the mean parameter but can be biased. This can cause test statistics based on this estimator to have an inflated type I error rate, and the resulting confidence intervals may have significantly lower coverage probabilities than their nominal values. To address this issue, we propose an alterna...
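
The bias of the sample mean under adaptive collection, as stated in this abstract, can be seen in a small simulation. The sketch below illustrates the problem only, not the authors' proposed method, which the truncated abstract does not describe. It assumes a two-armed greedy rule with both arms drawing from N(0, 1), so any nonzero average of the final sample mean is pure bias. The function names, horizon, and run count are hypothetical choices.

```python
import random


def greedy_arm0_mean(horizon, rng):
    """Run one greedy two-armed bandit and return arm 0's final sample mean.

    Both arms have true mean 0 (an assumption made for illustration).
    Each arm is pulled once, then the arm with the higher sample mean is
    pulled for the remaining rounds.
    """
    sums = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    counts = [1, 1]
    for _ in range(horizon - 2):
        arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        sums[arm] += rng.gauss(0.0, 1.0)
        counts[arm] += 1
    return sums[0] / counts[0]


def estimate_bias(horizon=20, n_runs=5000, seed=0):
    """Average arm 0's sample mean over many runs; the true mean is 0."""
    rng = random.Random(seed)
    total = sum(greedy_arm0_mean(horizon, rng) for _ in range(n_runs))
    return total / n_runs


if __name__ == "__main__":
    # A clearly negative value indicates the sample mean underestimates
    # the true mean when the data are collected adaptively.
    print(estimate_bias())
```

The mechanism in this sketch is that an arm with an unluckily low early average stops being pulled, so its low estimate is never corrected, while a luckily high average keeps getting sampled and regresses toward the truth.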

Expert‐in‐the‐loop design of integral nuclear data experiments

by Isaac Michaud, Michael Grosskopf, Jesson Hutchinson, Scott Vander Wiel

2w ago

Abstract: Nuclear data are fundamental inputs to radiation transport codes used for reactor design and criticality safety. The design of experiments to reduce nuclear data uncertainty has been a challenge for many years, but advances in the sensitivity calculations of radiation transport codes within the last two decades have made optimal experimental design possible. The design of integral nuclear experiments poses numerous challenges not emphasized in classical optimal design, in particular, constrained design spaces (in both a statistical and engineering sense), severely under-determined sys...

Hub‐aware random walk graph embedding methods for classification

by Aleksandar Tomčić, Miloš Savić, Miloš Radovanović

3w ago

Abstract: In the last two decades, we are witnessing a huge increase of valuable big data structured in the form of graphs or networks. To apply traditional machine learning and data analytic techniques to such data it is necessary to transform graphs into vector-based representations that preserve the most essential structural properties of graphs. For this purpose, a large number of graph embedding methods have been proposed in the literature. Most of them produce general-purpose embeddings suitable for a variety of applications such as node clustering, node classification, graph visualizatio...

Compositional variable selection in quantile regression for microbiome data with false discovery rate control

by Runze Li, Jin Mu, Songshan Yang, Cong Ye, Xiang Zhan

1M ago

Abstract: Advancement in high-throughput sequencing technologies has stimulated intensive research interests to identify specific microbial taxa that are associated with disease conditions. Such knowledge is invaluable both from the perspective of understanding biology and from the biomedical perspective of therapeutic development, as the microbiome is inherently modifiable. Despite availability of massive data, analysis of microbiome compositional data remains difficult. The nature that relative abundances of all components of a microbial community sum to one poses challenges for statistical a...

Smart data augmentation: One equation is all you need

by Yuhao Zhang, Lu Tang, Yuxiao Huang, Yan Ma

1M ago

Abstract: Class imbalance is a common and critical challenge in machine learning classification problems, resulting in low prediction accuracy. While numerous methods, especially data augmentation methods, have been proposed to address this issue, a method that works well on one dataset may perform poorly on another. To the best of our knowledge, there is still no one single best approach for handling class imbalance that can be uniformly applied. In this paper, we propose an approach named smart data augmentation (SDA), which aims to augment imbalanced data in an optimal way to maximize downst...
