Streamline - tidy data as a service
Simply Statistics
3y ago
TL;DR: We started a company called Streamline Data Science (https://streamlinedatascience.io/) that offers tidy data as a service. We are looking for customers, partnerships, and employees as we scale up after closing our funding round! For most of my career, I have worked in the muck of data cleaning. In the world of genomics, a lot of my research has focused on batch effects, synthesizing big genomic data into usable formats, and generally making data easier to use. A couple of years ago, we also started a company called Problem Forward Data Science. Problem Forward offered fractional data science s…
The Four Jobs of the Data Scientist
3y ago
In 2019 I wrote a post about The Tentpoles of Data Science that tried to distill the key skills of the data scientist. In the post I wrote: When I ask myself the question “What is data science?” I tend to think of the following five components. Data science is (1) the application of design thinking to data problems; (2) the creation and management of workflows for transforming and processing data; (3) the negotiation of human relationships to identify context, allocate resources, and characterize audiences for data analysis products; (4) the application of statistical methods to quantify evid…
Palantir Shows Its Cards
3y ago
File this under long-term follow-up: just about four years ago I wrote about Palantir, the previously secretive but now soon-to-be-public data science company, and how its valuation was a commentary on the value of data science more generally. Recently, Palantir filed to go public and therefore submitted a registration statement (S-1) describing its business. It’s a fascinating read, if you’re into that kind of thing. But the important thing is that Palantir itself summarized the question I asked more than four years ago. In their enumeration of risk factors, one risk factor they hig…
Asymptotics of Reproducibility
3y ago
Every once in a while, I see a tweet or post that asks whether one should use tool X or software Y in order to “make their data analysis reproducible”. I think this is a reasonable question because, in part, there are so many good tools out there! This is undeniably a good thing and quite a contrast to just 10 years ago, when there were comparatively few choices. The question of toolset, though, is not worth focusing on too much because it’s the wrong question to ask. Of course, you should choose a tool/software package that is reasonably usable by a large percentage of your audience…
Amplifying people I trust on COVID-19
3y ago
Like a lot of people, I’ve been glued to various media channels trying to learn the latest on what is going on with COVID-19. Like a lot of people, I have also been frustrated with misinformation and the deluge of preprints and peer-reviewed material. Some of this information is critically important and some is hard to trust. As a biostatistician at a very visible school of public health, I have also received a number of media requests, but I’ve been hesitant to do any interviews or talk about COVID-19. The reason is that even though I have a PhD in Biostatistics and I work in a School…
Is Artificial Intelligence Revolutionizing Environmental Health?
3y ago
NOTE: This post was written by Kevin Elliott, Michigan State University; Nicole Kleinstreuer, National Institutes of Health; Patrick McMullen, ScitoVation; Gary Miller, Columbia University; Bhramar Mukherjee, University of Michigan; Roger D. Peng, Johns Hopkins University; Melissa Perry, The George Washington University; Reza Rasoulpour, Corteva Agriscience; and Elizabeth Boyle, National Academies of Sciences, Engineering, and Medicine. The full summary for the workshop on which this post is based can be obtained here. On June 6 and 7, 2019, the National Academy of Sciences, Engineering, and M…
You can replicate almost any plot with R
3y ago
Although R is great for quickly turning data into plots, it is not widely used for making publication-ready figures. But with enough tinkering, you can make almost any plot in R. For examples, check out the FlowingData blog or the Fundamentals of Data Visualization book. Here I show five charts from the lay press that I use as examples in my data science courses. In the past I would show the originals, but I decided to replicate them in R so that class notes can be generated with just R code (there was a lot of googling involved). Below I show the original figures followed by R code and…
The data deluge means no reasonable expectation of privacy - now what?
3y ago
Today a couple of different things reminded me of something that I suppose many people are talking about, but that has been on my mind as well. The idea is that many of our societies’ social norms are based on the reasonable expectation of privacy, but the reasonable expectation of privacy is increasingly a thing of the past. Three types of data I’ve been thinking about are: Obviously identifying data: Data like cellphone GPS traces and public social media posts are obviously identifiable information and reduce privacy. Data that can be inferred from public data: We can also now infer a…
More datasets for teaching data science: The expanded dslabs package
3y ago
Introduction
We have expanded the dslabs package, which we previously introduced as a package containing realistic, interesting, and approachable datasets for use in introductory data science courses. This release adds 7 new datasets on climate change, astronomy, life expectancy, and breast cancer diagnosis. They are used in improved problem sets and new projects within the HarvardX Data Science Professional Certificate Program, which teaches beginning R programming, data visualization, data wrangling, statistics, and machine learning to students with no prior coding background. You c…