Visualizing {dplyr}’s mutate(), summarize(), group_by(), and ungroup() with animations
Andrew Heiss Blog
by Andrew Heiss
2w ago
I’ve used Garrick Aden-Buie’s tidyexplain animations since he first made them in 2018. They’re incredibly useful for teaching—being able to see which rows left_join() includes when merging two datasets, or which cells end up where when pivoting longer or pivoting wider, is so valuable. Check them all out; they’re fantastic. [Figure: left_join() animation by Garrick Aden-Buie] One set of animations that I’ve always wished existed (but that doesn’t) would show how {dplyr}’s mutate(), summarize(), group_by(), and ungroup() work. Unlike other, more straightforward {dplyr} functions like filter() and select(), these mu …
Demystifying causal inference estimands: ATE, ATT, and ATU
3w ago
In my causal inference class, I spend just one week talking about the Rubin causal model and potential outcomes. This view of causality argues that for any kind of intervention (passing a new policy, participating in a nonprofit program, taking a specific kind of medicine, etc.), people will have one of two possible outcomes: (1) what would happen if they receive the intervention or treatment, and (2) what would happen if they do not receive the treatment. These two outcomes are potential outcomes. Both are plausible, but only one will happen in real life. These potential outcomes lead to a bunch of …
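As a rough illustration of how these estimands differ, here is a minimal sketch in plain Python (not code from the post). It assumes a small, made-up dataset in which both potential outcomes are known for every person, which is only possible in a simulation. The ATE averages the individual effects over everyone, the ATT over only the treated, and the ATU over only the untreated.

```python
# Hypothetical toy data: both potential outcomes are known for every unit,
# which only happens in a simulation (in real life we observe just one).
units = [
    {"y1": 10, "y0": 6, "treated": True},   # y1 = outcome if treated
    {"y1": 8, "y0": 5, "treated": True},    # y0 = outcome if untreated
    {"y1": 7, "y0": 6, "treated": False},
    {"y1": 5, "y0": 4, "treated": False},
]


def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def effect(unit):
    """Individual causal effect: difference between the two potential outcomes."""
    return unit["y1"] - unit["y0"]


# ATE: average effect across everyone
ate = mean(effect(u) for u in units)
# ATT: average effect among those who actually received the treatment
att = mean(effect(u) for u in units if u["treated"])
# ATU: average effect among those who did not receive the treatment
atu = mean(effect(u) for u in units if not u["treated"])

print(f"ATE = {ate}, ATT = {att}, ATU = {atu}")
```

With these hypothetical numbers it prints `ATE = 2.25, ATT = 3.5, ATU = 1.0`. Because half the units are treated, the ATE here is the 50/50 weighted average of the ATT and ATU; in general, the weights are the shares of treated and untreated units.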
DIY API with Make and {plumber}
3M ago
Complete tutorial and code: See the full tutorial here. You can also see the tutorial’s code here and the code for the final API here. For years, I’ve tracked all sorts of data about myself (and my family) through Google Forms, Airtable, and devices like Fitbits: personal goals, progress of research projects, current health status, books read, and so on. It’s nice to have all this data, but it’s hard to use it all immediately. I often look at it at the end of the year, or every few months, but having an instant snapshot is helpful too. That …
How to create separate bibliographies in a Quarto document
4M ago
tl;dr: If you want to skip the explanation and justification for why you might want separate bibliographies, jump down to the example section, or just go see some example files at GitHub. Why use separate bibliographies? In academic articles, it’s common to have a supplemental appendix with extra tables, figures, robustness checks, additional math, proofs, and other details. Putting content in the appendix is important for providing additional evidence for the paper’s argument—and for placating reviewers who want to see a dozen more robustness checks. Also, journal articles have wor …
How to use natural and base 10 log scales in ggplot2
4M ago
I always forget how to deal with logged values in ggplot—particularly things that use the natural log. The {scales} package was invented in part to allow users to adjust axes and scales in plots, including adjusting axes to account for logged values, but some new developments in {scales} have made existing answers (like this one on Stack Overflow) somewhat obsolete (e.g., the trans_breaks() and trans_format() functions used there are superseded and deprecated). So here’s a quick overview of how to use 2022-era {scales} to adjust axis breaks and labels to use both base 1 …
Guide to understanding the intuition behind the Dirichlet distribution
4M ago
I’ve been finishing up a project that uses ordered Beta regression (Kubinec 2022), a neat combination of Beta regression and ordered logistic regression that you can use for modeling continuous outcomes that are bounded on both sides (in my project, for instance, we’re modeling a variable that can only be between 1 and 32). It’s possible to use something like zero-one-inflated Beta regression for outcomes like this, but that kind of model requires a lot more complexity and computing power (i.e., you need separate simultaneous models to predict if the outcome is (a) zero-or-one vs. no …
Manually generate predicted values for logistic regression with matrix multiplication in R
4M ago
In a project I’m working on, I need to generate predictions from a logistic regression model. That’s typically a really straightforward task—we can just use predict() to plug a dataset of values into a model, which will spit out predictions either on the (gross, uninterpretable) log odds scale or on the (nice, interpretable) percentage-point scale.¹ [¹ And for pro-level predictions, use predictions() from {marginaleffects}.] However, in this project I cannot use predict()—I’m working with a big matrix of posterior coefficient draws from a Bayesian model fit with raw Stan code, so ther …
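The core idea behind doing this by hand can be sketched without predict(): multiply a design matrix of predictor values (with a leading column of 1s for the intercept) by the transposed matrix of coefficient draws to get log odds, then apply the inverse logit to get probabilities. This is a minimal plain-Python sketch, not the post's own code; the predictor values and coefficient draws below are hypothetical stand-ins for draws extracted from a Stan fit.

```python
import math


def transpose(m):
    """Flip rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [
        [sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
        for row in a
    ]


def inv_logit(x):
    """Convert log odds to a probability."""
    return 1 / (1 + math.exp(-x))


# Hypothetical design matrix: one row per scenario, an intercept column of 1s
# followed by a single predictor x = 0, 1, 2.
X = [
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
]

# Hypothetical posterior draws of (intercept, slope), one row per draw.
# In practice these would come from the fitted Stan model.
beta_draws = [
    [-1.0, 0.5],
    [-0.8, 0.4],
    [-1.2, 0.6],
]

# (scenarios x coefficients) @ (coefficients x draws) -> scenarios x draws
log_odds = matmul(X, transpose(beta_draws))

# Apply the inverse logit element-wise to move to the probability scale
probs = [[inv_logit(v) for v in row] for row in log_odds]

# Posterior mean predicted probability for each scenario
mean_probs = [sum(row) / len(row) for row in probs]

for x_row, p in zip(X, mean_probs):
    print(f"x = {x_row[1]:.0f}: mean predicted probability = {p:.3f}")
```

Each column of `probs` corresponds to one posterior draw, so summarizing across a row (here with the mean, but quantiles would work for credible intervals) gives a posterior summary of the predicted probability for that scenario.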
The ultimate practical guide to multilevel multinomial conjoint analysis with R
4M ago
I recently posted a guide (mostly for future-me) about how to analyze conjoint survey data with R. I explore two different estimands that social scientists are interested in—causal average marginal component effects (AMCEs) and descriptive marginal means—and show how to find them with R, with both frequentist and Bayesian approaches. However, that post is a little wrong. It’s not wrong wrong, but it is a bit oversimplified. When political scientists, psychologists, economists, and other social scientists analyze conjoint data, they overwhelmingly do it with ordinary least squares (OLS) regress …
How to fill maps with density gradients with R, {ggplot2}, and {sf}
4M ago
The students in my summer data visualization class are finishing up their final projects this week, and I’ve been answering a bunch of questions on our class Slack. Often these are relatively standard reminders of how to tinker with specific ggplot layers (changing the colors of a legend, adding line breaks in labels, etc.), but today one student had a fascinating and tricky question that led me down a really fun dataviz rabbit hole. She was making a map with hundreds of points representing specific locations of events. This led to overplotting—it’s really hard to stick hundreds of dots on a smal …
The ultimate practical guide to conjoint analysis with R
4M ago
In my research, I study international nongovernmental organizations (INGOs) and look at how lots of different institutional and organizational factors influence INGO behavior. For instance, many authoritarian regimes have passed anti-NGO laws and engaged in other forms of legal crackdown, which has forced NGOs to change their programming strategies and their sources of funding. In one ongoing project with Suparna Chaudhry and Marc Dotson, we look at how individual private donors feel about NGOs that face legal crackdown abroad and how the experience of crackdown interacts with donor characteri …