Two‐sample testing for random graphs
Wiley Online Library » Statistical Analysis and Data Mining
by Xiaoyi Wen
2w ago
Abstract Two-sample hypothesis testing is widely used to examine random graphs in diverse fields such as social sciences, neuroscience, and genetics. We advance a spectral-based two-sample hypothesis testing methodology for latent position random graphs. We propose two distinct asymptotically normal statistics, each optimally designed for one of two models: the elementary Erdős–Rényi model and the more complex latent position random graph model. For the latter, the spectral embedding of the adjacency matrix is used to estimate the test statistic. T ..read more
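For the elementary Erdős–Rényi case, the idea of an asymptotically normal two-sample statistic can be illustrated with a standard pooled two-proportion z-test on edge densities. This is only a sketch under stated assumptions: it is not the statistic proposed in the paper (the abstract is truncated before any statistic is defined), each graph is assumed to be undirected with independent edges, and all function names are hypothetical.

```python
import math
import random


def sample_er_edge_count(n, p, rng):
    """Edge count of one undirected Erdős–Rényi graph G(n, p) (illustrative sampler)."""
    pairs = n * (n - 1) // 2  # number of possible undirected edges
    return sum(1 for _ in range(pairs) if rng.random() < p)


def er_two_sample_z(edges_a, n_a, edges_b, n_b):
    """Pooled two-proportion z statistic for H0: both graphs share one edge probability."""
    m_a = n_a * (n_a - 1) // 2
    m_b = n_b * (n_b - 1) // 2
    p_a = edges_a / m_a
    p_b = edges_b / m_b
    pooled = (edges_a + edges_b) / (m_a + m_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / m_a + 1 / m_b))
    if se == 0:
        # Both graphs are empty or both are complete: no evidence of a difference.
        return 0.0
    return (p_a - p_b) / se


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    rng = random.Random(0)
    e_a = sample_er_edge_count(60, 0.10, rng)
    e_b = sample_er_edge_count(60, 0.15, rng)
    z = er_two_sample_z(e_a, 60, e_b, 60)
    print(z, two_sided_p_value(z))
```

The normal approximation is reasonable only when both graphs have many possible edges; the latent position model described in the abstract needs a different statistic, which this sketch does not attempt.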
Issue Information
2w ago
Statistical Analysis and Data Mining: The ASA Data Science Journal, Volume 17, Issue 4, August 2024
Robust multitask learning in high dimensions under memory constraint
by Canyi Chen, Bingzhen Chen, Lingchen Kong, Liping Zhu
3w ago
Abstract We investigate multitask learning in the context of multivariate linear regression with high dimensional covariates and heavy-tailed noise, while under the constraint of limited memory. To tackle the computational complexity arising from the non-smoothness of the quantile loss, we reformulate it as an equivalent least squares loss, which yields robust solutions even in the presence of heavy-tailed noise. We incorporate a group lasso penalty into the quantile loss to produce sparse solutions, and an accelerated proximal sub-gradient descent algorithm to speed up the computation while e ..read more
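Two building blocks named in this abstract have standard closed forms: the quantile (pinball) loss and the proximal operator of the group lasso penalty, which is block soft-thresholding. The sketch below shows only these textbook pieces. It does not reproduce the authors' least squares reformulation or their accelerated algorithm, and the function names are hypothetical.

```python
import math


def pinball_loss(residual, tau):
    """Quantile (pinball) loss at level tau in (0, 1) for a single residual."""
    return max(tau * residual, (tau - 1) * residual)


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2 (block soft-thresholding).

    The whole group is set to zero when its Euclidean norm is at most the
    threshold; otherwise every coefficient is shrunk by the same factor.
    """
    norm = math.sqrt(sum(x * x for x in group))
    if norm <= threshold:
        return [0.0] * len(group)
    scale = 1 - threshold / norm
    return [scale * x for x in group]


if __name__ == "__main__":
    print(pinball_loss(2.0, 0.5))
    print(group_soft_threshold([3.0, 4.0], 2.5))
```

Zeroing entire groups at once is what makes the group lasso penalty produce solutions that are sparse at the group level.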
Gaussian process selections in semiparametric multi‐kernel machine regression for multi‐pathway analysis
by Jiali Lin, Inyoung Kim
1M ago
Abstract Analyzing correlated high-dimensional data is a challenging problem in genomics, proteomics, and other related areas. For example, it is important to identify significant genetic pathway effects associated with biomarkers in which a gene pathway is a set of genes that functionally works together to regulate a certain biological process. A pathway-based analysis can detect a subtle change in expression level that cannot be found using a gene-based analysis. Here, we refer to pathway as a set and gene as an element in a set. However, it is challenging to select automatically which pathw ..read more
Bayesian batch optimization for molybdenum versus tungsten inertial confinement fusion double shell target design
by Nomita N. Vazirani, Ryan Sacks, Brian M. Haines, Michael J. Grosskopf, David J. Stark, Paul A. Bradley
1M ago
Abstract Access to reliable, clean energy sources is a major concern for national security. Much research is focused on the “grand challenge” of producing energy via controlled fusion reactions in a laboratory setting. For fusion experiments, specifically inertial confinement fusion (ICF), to produce sufficient energy, the fusion reactions in the ICF fuel need to become self-sustaining and burn deuterium-tritium (DT) fuel efficiently. The recent record-breaking NIF ignition shot was able to achieve this goal as well as produce more energy than used to drive the experiment. This achievement bri ..read more
Assessment of the real‐time pattern recognition capability of machine learning algorithms
by Elias Polytarchos, Cleopatra Bardaki, Katerina Pramatari
1M ago
Abstract Nowadays, data streams from different sources, such as blockchain-based and traditional financial transactions, social networks, and interconnected Internet of Things (IoT) devices, are becoming increasingly large in volume, and the need to recognize patterns in these streams in real time, while adapting to their velocity and veracity, is emerging. Established machine learning algorithms used for pattern recognition have not been designed to take into account the volume, velocity, diversity, and accuracy of the data streams. This research contributes with an approach for assessin ..read more
Sequential metamodel‐based approaches to level‐set estimation under heteroscedasticity
by Yutong Zhang, Xi Chen
1M ago
Abstract This paper proposes two sequential metamodel-based methods for level-set estimation (LSE) that leverage the uniform bound built on stochastic kriging: predictive variance reduction (PVR) and expected classification improvement (ECI). We show that PVR and ECI possess desirable theoretical performance guarantees and provide closed-form expressions for their respective sequential sampling criteria to seek the next design point for performing simulation runs, allowing computationally efficient one-iteration look-ahead updates. To enhance understanding, we reveal the connection between PVR ..read more
Towards accelerating particle‐resolved direct numerical simulation with neural operators
by Mohammad Atif, Vanessa López‐Marrero, Tao Zhang, Abdullah Al Muti Sharfuddin, Kwangmin Yu, Jiaqi Yang, Fan Yang, Foluso Ladeinde, Yangang Liu, Meifeng Lin, Lingda Li
1M ago
Abstract We present our ongoing work aimed at accelerating a particle-resolved direct numerical simulation model designed to study aerosol–cloud–turbulence interactions. The dynamical model consists of two main components—a set of fluid dynamics equations for air velocity, temperature, and humidity, coupled with a set of equations for particle (i.e., cloud droplet) tracing. Rather than attempting to replace the original numerical solution method in its entirety with a machine learning (ML) method, we consider developing a hybrid approach. We exploit the potential of neural operator learning to ..read more
Nonparametric mean and variance adaptive classification rule for high‐dimensional data with heteroscedastic variances
by Seungyeon Oh, Hoyoung Park
1M ago
Abstract In this study, we introduce an innovative methodology aimed at enhancing Fisher's Linear Discriminant Analysis (LDA) in the context of high-dimensional data classification scenarios, specifically addressing situations where each feature exhibits distinct variances. Our approach leverages Nonparametric Maximum Likelihood Estimation (NPMLE) techniques to estimate both the mean and variance parameters. By accommodating varying variances among features, our proposed method leads to notable improvements in classification performance. In particular, unlike numerous prior studies that assume ..read more
Semiparametric estimation of average treatment effects in observational studies
by Jun Wang, Yujiao Guo
2M ago
Abstract We propose a semiparametric method to estimate average treatment effects in observational studies based on the assumption of unconfoundedness. We assume that the propensity score model and the outcome model are general single-index models, which are estimated by the kernel method, with the unknown index parameter estimated via the linearized maximum rank correlation method. The proposed estimator is computationally tractable, allows for large-dimensional covariates, and does not involve the approximation of link functions. We show that the proposed estimator is consistent and asymptotically normall ..read more