Big Company Bachelor's
Data Piques
3M ago
I recently wrapped up 4 years at a Big Company. During that time, I switched teams once, I transitioned from an Individual Contributor (IC) to an Engineering Manager (EM), the company name changed from Square to Block, and the number of employees increased from something like 5,000 to 15,000. Those numbers may seem like small potatoes compared to many other Big Companies. For me, having only worked (full-time) at $\le$ Series C startups prior to joining Square, this was an increase of one or two orders of magnitude in company size ..read more
Do you actually need a vector database?
1y ago
Spoiler alert: the answer is maybe! Although, my inclusion of the word “actually” betrays my bias. Vector databases are having their day right now. Three different vector DB companies have raised money at valuations up to $700 million (paywall link). Surprisingly, their rise in popularity is not for their “original” purpose in recommendation systems, but rather as an auxiliary tool for Large Language Models (LLMs). Many online examples of combining embeddings with LLMs will show you how they store the embeddings in a vector database ..read more
Bio
1y ago
Welcome! I am a New York City-based data scientist. On this site, you can learn more about me, read my data blog, wade through some idle thoughts, or take a tour through an old, hand-coded website from my Physics days. Enjoy ..read more
Data scientists work alone and that's bad
1y ago
In Need of a Good Editr
Growing up, I had always considered myself a decent writer based on my decent grades in English class. My sophomore year English teacher made it very clear that I did not, in fact, know how to properly write. All of my essays were returned riddled with red-inked edits culminating in low scores. This was disheartening. Thankfully, there was a solution! These essay edits directly told me what I needed to do to improve my writing ..read more
Ego, Identity, and Rationalization
1y ago
Physics is a macho field. Not physically. Take a look at your average physics student. Physics is academically macho. Physics majors love to scoff at the other sciences. Biology’s just bullshit. Where’s the math?! Chemistry is all memorization, whereas Physics derives the quantum numbers from first principles. “Soft” science? That’s not science. Even within Physics, there’s a hierarchy of hardcoreness. Theoretical Physics is clearly at the top. Those who can’t cut it end up becoming experimentalists ..read more
ML Monitoring with Materialize
1y ago
In my last post, I strongly encouraged monitoring Machine Learning (ML) models with streaming databases. In this post, I will demonstrate an example of how to do this with Materialize. If you would like to skip to the code, I have put everything in this post into AIspy, a repo on GitHub.
DTCase Study
Let’s assume that we are a machine learning practitioner who works for Down To Clown, a Direct To Consumer (DTC) company that sells clown supplies ..read more
Let's Continue Bundling into the Database
2y ago
A very silly blog post came out a couple months ago about The Unbundling of Airflow. I didn’t fully read the article, but I saw its title and skimmed it enough to think that it might’ve been too thin of an argument to hold water but just thick enough to clickbait the VC world with the word “unbundling” while simultaneously Cunningham’s Law-ing the data world. There was certainly a Twitter discourse ..read more
Bayesian Rock Climbing Rankings
2y ago
(Note: this is not a 343 minute read.) Just like every other scientist, engineer, or Matt, I’m pretty into rock climbing. Being carless in NYC, I primarily climb indoors. One of the first things that you learn when going to a climbing gym is that you don’t get to grab on to every “hold” (the bright plastic things on the wall). Different colored holds correspond to different “routes”, and you challenge yourself by only using the holds for a particular route ..read more
Everything Gets a Package: My Python Data Science Setup
2y ago
I make Python packages for everything. Big projects obviously get a package, but so does every tiny analysis. Spinning up a quick jupyter notebook to check something out? Build a package first. Oh yeah, and every package gets its own virtual environment. Let’s back up a little bit so that I can tell you why I do this. After that, I’ll show you how I do this. Notably, my workflow is set up to make it simple to stay consistent ..read more
Autoretraining is Easy if You Skip the Hard Parts
2y ago
You Can Not Measure What You Do Not Care To Manage
When I started my first data scientist job in 2015, the team I joined had a recommendation system that would run every night to compute new recommendations for all users of our platform. This was the easiest way to handle the cold start problem. At the next company I worked at, we had a rule that every machine learning model must be set up to automatically retrain (“autoretrain”) on fresh data on a periodic basis ..read more
