MRAN is getting shutdown - what else is there for reproducibility with R, or why reproducibility is on a continuum?
Bruno Rodrigues
3w ago
You expect me to read this long ass blog post?

If you’re too busy to read this blog post, know that I respect your time. The table below summarizes this blog post:

Need | Solution
I want to start a project and make it reproducible. | {renv} and Docker
There’s an old script lying around that I want to run. | {groundhog} and Docker
I want to work inside an environment that enables me to run code in a reproducible way. | Docker and the Posit CRAN mirror

But this table doesn’t show the whole picture, especially the issues with relying so much on Docker. So if you’re interested in mak ...
Functional programming explains why containerization is needed for reproducibility
Bruno Rodrigues
2M ago
I’ve had some discussions online and in the real world about this blog post and I’d like to restate why containerization is needed for reproducibility, and do so through the lens of functional programming. When setting up a pipeline, whether you’re a functional programming enthusiast or not, you’re aiming to set it up so that the pipeline is the composition of (potentially) many referentially transparent and pure functions. As a reminder: referentially transparent functions are functions that always return the same output for the same given input. So for example f(x, y) := x + y is ref ...
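To make the definition of referential transparency concrete, here is a minimal R sketch. It is not taken from the post; the names `f`, `g` and `offset` are hypothetical and only serve as illustration.

```r
# Referentially transparent: the result depends only on the arguments.
f <- function(x, y) x + y

# Not referentially transparent: the result also depends on a global
# variable (hypothetical name `offset`) that can change between calls.
offset <- 1
g <- function(x) x + offset

f(1, 2)       # 3, on every call
g(1)          # 2 while offset is 1

offset <- 10
g(1)          # 11: same input, different output
```

The second function is the kind of hidden dependency on outside state that a pipeline built from pure functions avoids.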
Reproducibility with Docker and Github Actions for the average R enjoyer
Bruno Rodrigues
2M ago
This blog post is a summary of Chapters 9 and 10 of this ebook I wrote for a course. The goal is the following: we want to write a pipeline that produces some plots. We want the code to be executed inside a Docker container for reproducibility, and we want this container to be executed on GitHub Actions. GitHub Actions is a Continuous Integration and Continuous Delivery service from GitHub that allows you to execute arbitrary code on events (like pushing code to a repo). It’s pretty neat. For example, you could be writing a paper using LaTeX and get the PDF compiled on GitHub Actions each t ...
Open source is a hard requirement for reproducibility
Bruno Rodrigues
2M ago
Open source is a hard requirement for reproducibility. No ifs or buts. And I’m not only talking about the code you typed for your research paper/report/analysis. I’m talking about the whole ecosystem that you used to type your code. (I won’t be talking about making the data available, because I think this is another blog post on its own.) Is your code open? That’s good. But is it code for a proprietary program, like Stata, SAS or MATLAB? Then your project is not reproducible. It doesn’t matter if this code is well documented and written and available on GitHub. This project is not reproduc ...
How to deal with annoying medium sized data inside a Shiny app
Bruno Rodrigues
3M ago
This blog post is taken from a chapter of my ebook on building reproducible analytical pipelines, which you can read here. If you want to follow along, you can start by downloading the data I use here. This is a smaller dataset made from the one you can get here. Uncompressed it’ll be a 2.4GB file. Not big data in any sense, but big enough to be annoying to handle without the use of some optimization strategies (I’ve seen such data described as medium sized data before). One such strategy is only letting the computations run once the user gives the green light by clicking on an action button ...
A Linux Live USB as a statistical programming dev environment
Bruno Rodrigues
3M ago
This blog post is divided into two parts: in the first part I’ll show you how to create a Linux Live USB with persistent storage that can be used as a development environment, and in the second part I’ll show you the easiest way to set up RStudio and R in Ubuntu.

Making your own, portable, development environment based on Ubuntu or Debian

I’m currently teaching a course at the University of Luxembourg, which focuses on setting up reproducible analytical pipelines (if you’re interested, you can find the course notes here). The problem is that my work laptop runs Windows, and I didn’t want to teach ...
What's the fastest way to search and replace strings in a data frame?
Bruno Rodrigues
5M ago
I’ve tweeted this:

Just changed like 100 grepl calls to stringi::stri_detect and my pipeline now runs 4 times faster #RStats - Bruno Rodrigues (@brodriguesco), July 20, 2022

Much discussion ensued. Some people were surprised, because in their experience, grepl() was faster than alternatives, especially if you set the perl parameter in grepl() to TRUE. My use case was quite simple: I have a relatively large data set (half a million lines) with one column containing several misspellings of city names. So I painstakingly wrote some code to correct the spelling of the major cities (those that came up o ...
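As a rough illustration of the comparison being discussed, the sketch below times grepl() with and without perl = TRUE, and {stringi} if it happens to be installed. The data and the pattern are hypothetical stand-ins, not the data set from the post, and timings will differ by machine, so no results are claimed here.

```r
# Hypothetical data: 40,000 city names, some of them misspelled.
cities <- rep(
  c("Luxembourg", "Luxemburg", "Esch-sur-Alzette", "Differdange"),
  times = 1e4
)

# Same pattern, two regex engines available in base R.
default_match <- grepl("Luxemb", cities)
perl_match <- grepl("Luxemb", cities, perl = TRUE)

# Both engines must agree on which elements match.
stopifnot(identical(default_match, perl_match))

# Elapsed time over 20 repetitions; values vary by machine and data.
time_default <- system.time(for (i in 1:20) grepl("Luxemb", cities))
time_perl <- system.time(for (i in 1:20) grepl("Luxemb", cities, perl = TRUE))
print(c(default = time_default[["elapsed"]], perl = time_perl[["elapsed"]]))

# Optional comparison, run only if {stringi} is installed.
if (requireNamespace("stringi", quietly = TRUE)) {
  stringi_match <- stringi::stri_detect_regex(cities, "Luxemb")
  stopifnot(identical(default_match, stringi_match))
  time_stringi <- system.time(
    for (i in 1:20) stringi::stri_detect_regex(cities, "Luxemb")
  )
  print(c(stringi = time_stringi[["elapsed"]]))
}
```

Checking that all approaches return identical results before comparing their speed keeps the benchmark honest.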
R will always be arcane to those who do not make a serious effort to learn it...
Bruno Rodrigues
5M ago
R will always be arcane to those who do not make a serious effort to learn it. It is not meant to be intuitive and easy for casual users to just plunge into. It is far too complex and powerful for that. But the rewards are great for serious data analysts who put in the effort. - Berton Gunter, R-help, August 2007

I posted this quote on Twitter the other day and it sparked some discussion. Personally I agree with this quote, and I’ll explain why. Just like any tool aimed at professionals, R requires people to spend time to actually master it. There are no ifs or buts. Just like I don’t want ...
Add logging to your functions using my newest package `{loud}`
Bruno Rodrigues
5M ago
UPDATE: {loud} has been superseded by {chronicle}, read about it here.

This is a short blog post to announce the early alpha, hyper unstable, use at your own peril, package I’ve been working on for the past 6 hours or so (actually longer if I add all the research/study time). This package provides the function loudly() which allows you to do cool stuff like:

    # First two lines install the package
    # install.packages("devtools")
    # devtools::install_github("b-rodrigues/loud")
    library(loud)
    ## Loading required package: rlang
    loud_sqrt <- loudly(sqrt)
    loud_sqrt(1:10)
    ## $result
    ## [1 ...
Speedrunning row-oriented workflows
Bruno Rodrigues
5M ago
If you haven’t, you should read this first. This is part two. Speedrunning is the… hrm… - sport? art? - of playing games from start to finish as fast as possible. Speedrunning requires an insane amount of knowledge of the game being played, as well as an enormous amount of skill. Also, contrary to what you might think, it is a community effort. Players do speedrun the game alone, and it is a ferocious competition, each one of them aiming for the top spot on the leaderboards. But discovering the strategies that will allow the top players to shave off, sometimes literally, hundredths of seco ...
