jbencook
The goal of this website is to help people who want to become better data scientists become better programmers - anyone on a journey like mine. I mostly do this with bite-sized tricks, solutions, and quick starts. I hope you like it!
jbencook
1y ago
The key to unlocking the power of machine learning (ML) lies in having high-quality labeled data. In this email, we’ll explore the significance of labeled data, its impact on the performance of ML models, and how you can capitalize on this natural resource of the modern age to drive your business forward.
First, let’s briefly discuss what labeled data is. In the context of ML, labeled data refers to datasets where each data point (e.g., an image, a text snippet, or a sound file) is accompanied by a corresponding label or annotation. These labels help the ML algorithm understand patterns in th ..read more
jbencook
1y ago
As you think through the ways machine learning (ML) can be used to accelerate your business, it can be helpful to see how other companies have done it. Today, I want to share an example of how General Electric (GE) harnessed the power of ML to transform a large part of their business operations.
Predictive Maintenance
Predictive maintenance uses ML and other data science algorithms to monitor equipment and detect potential failures before they become critical. The idea is to take a proactive approach so that GE's customers can reduce unplanned downtime and improve overall efficiency.
Their eff ..read more
jbencook
1y ago
As an entrepreneur looking to harness the power of machine learning (ML) in your business, understanding the data science process is crucial. This process can be broken down into three main steps:
Proof of concept (evaluate technical feasibility)
Minimum viable product (scale up dataset size)
Deployment (run the algorithm in production)
The goal is to move through these stages as quickly as possible so that you can gather feedback from real-world users. The longer you spend “in the lab” perfecting your algorithm, the less likely you are to build something your customers actually care about ..read more
jbencook
1y ago
How do utility companies monitor thousands of miles of electrical wire to find small imperfections that threaten the entire system? For the entire history of electrical infrastructure, the only answer has been ‘very slowly.’
Now, Sparrow’s computer vision capabilities, combined with Fast Forward’s thermal imaging system, can accomplish what used to take over a decade in less than a month. Here’s how they do it.
How it Started
Dusty Birge, CEO of Fast Forward, began inspecting power lines using a drone and crowd-sourced labor. He knew that automation would be needed to take the next step forwar ..read more
jbencook
1y ago
Overview
This post is going to showcase the development of a vehicle speed detector using Sparrow Computing’s open-source libraries and PyTorch Lightning.
The exciting news here is that we could make this speed detector for any traffic feed without prior knowledge about the site (no calibration required) or specialized imaging equipment (no depth sensors required). Better yet, we only needed ~300 annotated images to reach a decent speed estimate. To estimate speed, we will detect all vehicles coming toward the camera using an object detection model. We will also predict the locations of the b ..read more
jbencook
2y ago
The TorchVision datasets subpackage is a convenient utility for accessing well-known public image and video datasets. You can use these tools to start training new computer vision models very quickly.
TorchVision Datasets Example
To get started, all you have to do is import one of the Dataset classes. Then, instantiate it and access one of the samples with indexing:
from torchvision import datasets
dataset = datasets.MNIST(root="./", download=True)
img, label = dataset[10]
img.size
# Expected result
# (28, 28)
Indexing the dataset returns a tuple containing a Pillow image and an integer label.
The TorchVis ..read more
jbencook
2y ago
The np.any() function tests whether any element in a NumPy array evaluates to true:
import numpy as np
np.any(np.array([[1, 0], [0, 0]]))
# Expected result
# True
The input can have any shape, and the data type does not have to be boolean: any nonzero element counts as true. If none of the elements evaluate to true, the function returns False:
np.any(np.array([[0, 0], [0, 0]]))
# Expected result
# False
Passing in a value for the axis argument makes np.any() a reducing operation. Say we want to know which rows in a matrix have any truthy elements. We can do that by passing in axis=-1:
np.any(np.zeros((2, 3)), axis ..read more
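The excerpt is cut off here, so the following is only a minimal sketch of the axis behavior described above, not the post's own example. The `matrix` values are made up for illustration.

```python
import numpy as np

# Hypothetical example matrix: only the second row has a nonzero element.
matrix = np.array([[0, 0, 0],
                   [0, 5, 0]])

# Reduce over the last axis: one boolean per row.
rows_with_any = np.any(matrix, axis=-1)
print(rows_with_any.tolist())  # [False, True]

# Reduce over the first axis: one boolean per column.
cols_with_any = np.any(matrix, axis=0)
print(cols_with_any.tolist())  # [False, True, False]
```

With `axis=-1` the result has one entry per row, so it can be used directly as a boolean mask to select the rows that contain at least one truthy element.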
jbencook
2y ago
PyTorch comes with powerful data loading capabilities out of the box. But with great power comes great responsibility, and that makes data loading in PyTorch a fairly advanced topic.
One of the best ways to learn advanced topics is to start with the happy path. Then add complexity when you find out you need it. Let’s run through a quick start example.
What is a PyTorch DataLoader?
The PyTorch DataLoader class gives you an iterable over a Dataset. It’s useful because it can parallelize data loading and automatically shuffle and batch individual samples, all out of the box. This sets you up for a ..read more
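To make the batching idea above concrete, here is a small sketch that wraps a toy in-memory dataset in a `DataLoader`. The data, the batch size, and the choice of `TensorDataset` are assumptions made for illustration and are not taken from the original post.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples with 3 features each, plus integer labels.
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# shuffle=False keeps the order deterministic for this illustration;
# shuffle=True would reorder the samples on every pass over the data.
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# Collect the shape of every batch the loader yields.
batch_shapes = []
for x, y in loader:
    batch_shapes.append((tuple(x.shape), tuple(y.shape)))

print(batch_shapes)
# [((4, 3), (4,)), ((4, 3), (4,)), ((2, 3), (2,))]
```

The last batch is smaller because 10 samples do not divide evenly into batches of 4. Passing `num_workers` greater than zero is how the loader parallelizes loading, which matters more for datasets that read from disk than for this in-memory toy.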
jbencook
2y ago
Anyone familiar with Python will know about the list append method:
a = [1, 2, 3]
a.append(4)
print(a)
# Expected result
# [1, 2, 3, 4]
But what if you want to append to a NumPy array? In that case, you have a couple of options. The most common thing you’ll see in idiomatic NumPy code is the np.concatenate() operation, which concatenates two or more arrays along a given axis.
NumPy does have an np.append() operation that you can use instead, but you have to be a little careful because the API has some weirdness in it.
For 1-D arrays, np.append() works as you might expect (the same as Python ..read more
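The excerpt ends before it spells out what the weirdness is. One behavior that often surprises people, shown here as a sketch with made-up arrays, is that `np.append()` flattens its inputs when no `axis` is given, whereas `np.concatenate()` joins along an existing axis by default.

```python
import numpy as np

# Hypothetical 2-D inputs for illustration.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Without axis, np.append flattens both inputs before joining them.
flat = np.append(a, b)
print(flat.tolist())  # [1, 2, 3, 4, 5, 6]

# With axis=0, the result stays 2-D and b becomes a new row.
stacked = np.append(a, b, axis=0)
print(stacked.tolist())  # [[1, 2], [3, 4], [5, 6]]

# np.concatenate takes a sequence of arrays and gives the same 2-D result.
joined = np.concatenate([a, b], axis=0)
print(np.array_equal(stacked, joined))  # True

# Unlike list.append, neither call modifies the original array in place.
print(a.shape)  # (2, 2)
```

Because both functions return a new array each time, growing an array inside a loop copies the data on every iteration. Collecting pieces in a Python list and concatenating once at the end is usually the cheaper pattern.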
jbencook
3y ago
When you’re building a production machine learning system, reproducibility is a proxy for the effectiveness of your development process. But without locking all your Python dependencies, your builds are not actually repeatable. If you work long enough in a Python project without locking, you will eventually get a broken build because of a transitive dependency (that is, a dependency of a dependency).
But a broken build isn’t the most dangerous problem. A transitive dependency can change in a way that affects an algorithm’s results without you ever knowing it. Imagine being unable to reproduce ..read more