An S-Curve Method for Abrupt and Gradual Changepoint Analysis
Journal of Data Science
by Lan Jiang, Collin Kennedy, Norman Matloff
1w ago
Changepoint analysis has had a striking variety of applications, and a rich methodology has been developed. Our contribution here is a new approach that uses nonlinear regression analysis as an intermediate computational device. The tool is quite versatile, covering a number of different changepoint scenarios. It is largely free of parametric model assumptions, and has the major advantage of providing standard errors for formal statistical inference. Both abrupt and gradual changes are covered.
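As a rough illustration of the idea in this abstract, the sketch below fits an S-shaped curve to a noisy series by least squares and reads the changepoint off the curve's midpoint. The logistic form, the grid search, the function names and the synthetic data are all assumptions made for illustration; they are not taken from the paper, and the sketch does not compute the standard errors the authors describe.

```python
import math
import random


def s_curve(t, c, s):
    """Logistic S-curve centred at c with width s (assumed form, not the paper's)."""
    z = (t - c) / s
    # Clamp the argument so exp() cannot overflow for very abrupt curves.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_changepoint(ts, ys, centers, widths):
    """Grid-search c and s; for each pair, solve y ~ a + b * s_curve by least squares."""
    n = len(ts)
    y_mean = sum(ys) / n
    best = None
    for c in centers:
        for s in widths:
            g = [s_curve(t, c, s) for t in ts]
            g_mean = sum(g) / n
            sgg = sum((gi - g_mean) ** 2 for gi in g)
            if sgg < 1e-12:
                continue  # curve is flat over the sample, so b is not identifiable
            sgy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, ys))
            b = sgy / sgg
            a = y_mean - b * g_mean
            sse = sum((yi - (a + b * gi)) ** 2 for gi, yi in zip(g, ys))
            if best is None or sse < best[0]:
                best = (sse, c, s, a, b)
    sse, c, s, a, b = best
    return {"c": c, "s": s, "a": a, "b": b, "sse": sse}


# Synthetic series: the mean jumps from 0 to 2 at t = 60, plus Gaussian noise.
random.seed(0)
ts = list(range(100))
ys = [(2.0 if t >= 60 else 0.0) + random.gauss(0.0, 0.3) for t in ts]

fit = fit_changepoint(
    ts,
    ys,
    centers=[x + 0.5 for x in range(5, 95)],
    widths=[0.1, 1.0, 5.0, 10.0],
)
print(fit)
```

In this sketch, `c` is the estimated changepoint location and `b` the size of the shift; a small fitted width `s` suggests an abrupt change and a large one a gradual change.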
Dynamic Network Poisson Autoregression with Application to COVID-19 Count Data
Journal of Data Science
by Manabu Asai, Amanda M. Y. Chu, Mike K. P. So
2w ago
There is growing interest in accommodating network structure in panel data models. We consider dynamic network Poisson autoregressive (DN-PAR) models for panel count data, enabling their use with a time-varying network structure. We develop a Bayesian Markov chain Monte Carlo technique for estimating the DN-PAR model, and conduct Monte Carlo experiments to examine the properties of the posterior quantities and compare dynamic and constant network models. The Monte Carlo results indicate that the bias in the DN-PAR models is negligible, while the constant network model suffers from bias ...
Physician Effects in Critical Care: A Causal Inference Approach Through Propensity Weighting with Parametric and Super Learning Methods
Journal of Data Science
by Yuan Bian, Yu Shi, Hui Guo, Grace Y. Yi, Wenqing He
2w ago
Physician performance is critical to caring for patients admitted to the intensive care unit (ICU), who are in life-threatening situations and require high-level medical care and interventions. Evaluating physicians is crucial for ensuring a high standard of medical care and fostering continuous performance improvement. The non-randomized nature of ICU data often results in imbalance in patient covariates across physician groups, making direct comparisons of the patients’ survival probabilities for each physician misleading. In this article, we utilize the propensity weighting method to addres ...
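To make the covariate-imbalance point concrete, here is a minimal sketch of inverse propensity weighting on made-up data. Everything in it is an assumption for illustration: the two physician groups, the single binary severity covariate, the stratum-fraction propensity estimate (the article's title mentions parametric and super learning estimators instead) and the function names.

```python
from collections import defaultdict


def ipw_means(records):
    """records: (x, d, y) tuples with covariate x, group d in {0, 1}, outcome y.

    Returns the naive and the inverse-propensity-weighted mean outcome per group.
    """
    # Estimate the propensity e(x) = P(d = 1 | x) as the empirical fraction per stratum.
    n_x = defaultdict(int)
    n_x1 = defaultdict(int)
    for x, d, _ in records:
        n_x[x] += 1
        n_x1[x] += d
    e = {x: n_x1[x] / n_x[x] for x in n_x}

    naive, weighted = {}, {}
    for g in (0, 1):
        rows = [(x, y) for x, d, y in records if d == g]
        naive[g] = sum(y for _, y in rows) / len(rows)
        # Weight each patient by 1 / P(observed group | x); assumes 0 < e(x) < 1.
        ws = [(1 / e[x] if g == 1 else 1 / (1 - e[x]), y) for x, y in rows]
        weighted[g] = sum(w * y for w, y in ws) / sum(w for w, _ in ws)
    return naive, weighted


def make_group(x, d, n, survivors):
    """Hypothetical helper: n patients in stratum x and group d, `survivors` with y = 1."""
    return [(x, d, 1 if i < survivors else 0) for i in range(n)]


# Synthetic data: group 1 sees mostly severe cases (x = 1), group 0 mostly mild ones.
# Survival depends only on severity: 50% for severe cases, 90% for mild cases.
records = (
    make_group(1, 1, 80, 40) + make_group(0, 1, 20, 18)
    + make_group(1, 0, 20, 10) + make_group(0, 0, 80, 72)
)

naive, weighted = ipw_means(records)
print("naive:", naive)
print("weighted:", weighted)
```

In this synthetic data survival depends only on severity, so the raw gap between the two groups (0.58 versus 0.82) disappears after weighting (0.70 for both).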
Visual Analytics for NASCAR Motorsports
Journal of Data Science
by Kornelia Bastin, Christopher G. Healey
2w ago
The National Association of Stock Car Auto Racing (NASCAR) is ranked among the top ten most popular sports in the United States. NASCAR events are characterized by on-track racing punctuated by pit stops since cars must refuel, replace tires, and modify their setup throughout a race. A well-executed pit stop can allow drivers to gain multiple seconds on their opponents. Strategies around when to pit and what to perform during a pit stop are under constant evaluation. One currently unexplored area is publicly available communication between each driver and their pit crew during the race. Due ...
A Two-Stage Classification for Dealing with Unseen Clusters in the Testing Data
Journal of Data Science
by Jung Wun Lee, Ofer Harel
2w ago
Classification is an important statistical tool that has grown in importance since the emergence of the data science revolution. However, a training data set that does not capture all underlying population subgroups (or clusters) will result in biased estimates or misclassification. In this paper, we introduce a statistical and computational solution to a possible bias in classification when implemented on estimated population clusters. An unseen-cluster problem denotes the case in which the training data does not contain all underlying clusters in the population. Such a scenario may occu ...
Unified Robust Boosting
Journal of Data Science
by Zhu Wang
2w ago
Boosting is a popular algorithm in supervised machine learning with wide applications in regression and classification problems. It combines weak learners, such as regression trees, to obtain accurate predictions. However, in the presence of outliers, traditional boosting may yield inferior results since the algorithm optimizes a convex loss function. Recent literature has proposed boosting algorithms that optimize robust nonconvex loss functions. Nevertheless, there is a lack of weighted estimation to indicate the outlier status of observations. This article introduces the iteratively reweigh ...
Editorial: Inquire, Investigate, Implement, Innovate – Symposium on Data Science and Statistics 2023
Journal of Data Science
by Emily Dodwell, Amanda A. Koepke
1M ago
Producing Fast and Convenient Machine Learning Benchmarks in R with the stressor Package
Journal of Data Science
by Sam Haycock, Brennan Bean, Emily Burchfield
1M ago
The programming overhead required to implement machine learning workflows creates a barrier for many discipline-specific researchers with limited programming experience. The stressor package provides an R interface to Python’s PyCaret package, which automatically tunes and trains 14-18 machine learning (ML) models for use in accuracy comparisons. In addition to providing an R interface to PyCaret, stressor also contains functions that facilitate synthetic data generation and variants of cross-validation that allow for easy benchmarking of the ability of machine-learning models to extrapolate o ...
Spatial-Temporal Extreme Modeling for Point-to-Area Random Effects (PARE)
Journal of Data Science
by Carlynn Fagnant, Julia C. Schedler, Katherine B. Ensor
1M ago
One measurement modality for rainfall is a fixed location rain gauge. However, extreme rainfall, flooding, and other climate extremes often occur at larger spatial scales and affect more than one location in a community. For example, in 2017 Hurricane Harvey impacted all of Houston and the surrounding region, causing widespread flooding. Flood risk modeling requires an understanding of rainfall for hydrologic regions, which may contain one or more rain gauges. Further, policy changes to address the risks and damages of natural hazards such as severe flooding are usually made at the community/neigh ...
Evaluating Perceptual Judgements on 3D Printed Bar Charts
Journal of Data Science
by Tyler Wiederich, Susan VanderPlas
2M ago
Graphical design principles typically recommend minimizing the dimensionality of a visualization (for instance, using only 2 dimensions for bar charts rather than providing a 3D rendering) because the extra complexity may result in a decrease in accuracy. This advice has been oft repeated, but the underlying experimental evidence is focused on fixed 2D projections of 3D charts. In this paper, we describe an experiment that attempts to establish whether the decrease in accuracy extends to 3D virtual renderings and 3D printed charts. We replicate the grouped bar chart comparisons in the 1984 ...