That’s so Random
1,919 FOLLOWERS
Edwin Thoen is a statistician turned data scientist, currently working at Funda. After bachelor programs in psychology and history he did the Statistical Science program at Leiden University, where he completed his MSc with professor Hein Putter. From the moment he started programming in R he was hooked, and it has remained his weapon of choice ever since. He is the author and maintainer of..
That’s so Random | A playground for data analysis and R programming.
4y ago
“Overengineering is the act of designing a product to be more robust or have more features than often necessary for its intended use, or for a process to be unnecessarily complex or inefficient.” This is how the Wikipedia page on overengineering starts. It is the diligent engineer, wanting to make sure that every possible feature is incorporated in the product, who creates an overengineered product. We find overengineering in real-world products as well as in software. It is a relevant concept in data science as well. First of all, because software engineering is very much a part of data sc ..read more
That’s so Random | A playground for data analysis and R programming.
4y ago
A few weeks ago, Miles McBain took us on a tour of his project organisation in this blogpost. Not surprisingly, given Miles’ frequent shoutouts about the package, it is completely centered around drake. About a year ago on Twitter, he convinced me to take this package for a spin. I was immediately sold. It cured a number of pains I had over the years in machine learning projects: storing intermediate results, reproducibility, having a single version of the truth, forgetting in which order steps should be applied, etc. Following Miles’ example, I’d like to share my drake-centered workflow. As ..read more
That’s so Random | A playground for data analysis and R programming.
5y ago
Two years ago, I wrote about meta-learning to fight imposter feelings. In that post I made a distinction between impostering because you don’t feel you are up to the job, and impostering because you feel you ought to know something which you don’t. The meta-learning post focuses on how you define yourself as a data scientist and what, as a consequence, you decide to learn (and, more importantly, what not). Staying sane while doing data science is something that has always interested me. Imposter feelings are a major foe to the joy this work can bring. I came across two more insights on the topic that I fou ..read more
That’s so Random | A playground for data analysis and R programming.
5y ago
We did a large machine learning project at work recently. It involved two data scientists, two backend engineers and a data engineer, all working on-and-off on the R code during the project. Many aspects of the project were new and interesting to me, among them doing data science in an agilish way, keeping track of the different model versions, and dealing with directories, data and code on different machines. I planned to do a series of write-ups this summer, describing each of them, but then this happened:
Let me know if you write this up somewhere and I could summarize and/or link t ..read more
That’s so Random | A playground for data analysis and R programming.
5y ago
I have been meaning to write this for a while, but with the dplyr vs data.table feud rising to new levels on Twitter over the last couple of days, it all of a sudden seems more relevant. For those who don’t know what I am talking about: there are different ways of doing data science. There are the two major languages, R and Python, each with its own implementations for analysing data. Then within R there are the different flavours of using the base language or applying the functions of the tidyverse. Within the tidyverse there is the dplyr package to do data wrangling, of which the functionality of the ..read more
That’s so Random | A playground for data analysis and R programming.
5y ago
Yesterday v0.5.0 of the padr package hit CRAN. You will find the main changes in the thicken function, which has gained two new arguments. First of all, thanks to an idea from Adam Stone, you can now drop the original datetime variable from the data frame by using drop = TRUE. This argument defaults to FALSE to ensure backwards compatibility. Without setting drop to TRUE, the datetime variable is returned alongside the added, thickened variable:
library(padr)
thicken(coffee, interval = "hour")
## time_stamp amount time_stamp_hour
## 1 2016-07-07 09:11:21 3.14 2016-07 ..read more
That’s so Random | A playground for data analysis and R programming.
6y ago
The European tennis season is in full swing, with Roland Garros starting today and Wimbledon taking place in a few weeks. For a sports buff like me, it is the essence of summer (together with the Tour de France). Time to dive into some tennis data. As a follower of both the men’s and the women’s tour, it occurred to me that tournaments on the latter are less predictable. My gut feeling was that the favourite wins more frequently in men’s matches than in women’s matches. Of course gut feelings are what makes the world go round, unless you are a data scientist. So let’s analyse all the ..read more