Software Engineering for Data Scientists (New book!)
Open Source Automation
by Andrew Treadway
1y ago
I'm very excited to announce that the early-access preview (MEAP) of my upcoming book, Software Engineering for Data Scientists, is available now! Check it out at this link. Use promo code au35tre to save 30% on this book and any products sold from Manning. Why Software Engineering for Data Scientists? Data science and software engineering have been converging, especially over the last decade. Software Engineering for Data Scientists will help you learn more about software engineering and how it can make your life easier as a data scientist! This book covers the foll ..read more
How to stop long-running code in Python
Open Source Automation
by Andrew Treadway
1y ago
Ever had long-running code and no idea when it will finish? If so, Python’s stopit library is for you. In a previous post, we talked about how to create a progress bar to monitor Python code. This post will show you how to automatically stop long-running code with the stopit package. Getting started with stopit To get started with stopit, you can install it via pip: pip install stopit In our first example, we’ll use a context manager to stop the code we want to execute after a timeout limit is reached. import stopit with stopit.ThreadingTimeout(5) as c ..read more
Faster alternatives to pandas
Open Source Automation
by Andrew Treadway
1y ago
Background If you’ve done any type of data analysis in Python, chances are you’ve used pandas. Though it is widely used in the data world, if you’ve run into memory or computational issues with it, you’re not alone. This post discusses several faster alternatives to pandas. R’s data.table in Python If you’ve used R, you’re probably familiar with the data.table package. A port of this library is also available in Python. In this example, we show how you can read in a CSV file faster than with standard pandas. For our purposes, we’ll be using an open source dataset from the UCI repository ..read more
Automated EDA with Python
Open Source Automation
by Andrew Treadway
1y ago
In this post, we will investigate the pandas_profiling and sweetviz packages, which can be used to speed up EDA (exploratory data analysis) with Python. In a previous article, we talked about an analogous package in R (see this link). Getting started with pandas_profiling pandas_profiling can be installed using pip, like this: pip install pandas-profiling[notebook] Next, let’s read in our dataset. The data we’ll be using is a heart attack-related dataset, which can be found here. import pandas as pd heart_data = pd.read_csv("heart.csv") heart_data.head() Now, let’s import ProfileR ..read more
How to plot XGBoost trees in R
Open Source Automation
by Andrew Treadway
1y ago
In this post, we’re going to cover how to plot XGBoost trees in R. XGBoost is a very popular machine learning algorithm, which is frequently used in Kaggle competitions and has many practical use cases. Let’s start by loading the packages we’ll need. Note that plotting XGBoost trees requires the DiagrammeR package to be installed, so even if you have xgboost installed already, you’ll need to make sure you have DiagrammeR also. # load libraries library(xgboost) library(caret) library(dplyr) library(DiagrammeR) Next, let’s read in our dataset. In this post, we’ll be using this customer chu ..read more
Python collections tutorial
Open Source Automation
by Andrew Treadway
1y ago
In this post, we’ll discuss the underrated Python collections module, which is part of the standard library. Collections allows you to utilize several data structures beyond base Python. How to get a count of all the elements in a list One very useful tool in collections is the Counter class, which you can use to return a count of all the elements in a list. nums = [3, 3, 4, 1, 10, 10, 10, 10, 5] collections.Counter(nums) The Counter object that gets returned is also modifiable. Let’s define a variable equal to the result above. counts = collections.Counter(nums) counts[20 ..read more
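The excerpt above is cut off, so here is a minimal, self-contained sketch of the Counter behavior it describes, using the same nums list. The specific lookups and the increment on the key 3 are illustrative choices, not taken from the original post.

```python
from collections import Counter

nums = [3, 3, 4, 1, 10, 10, 10, 10, 5]

# Count how many times each element appears in the list
counts = Counter(nums)
print(counts[10])             # 10 appears four times
print(counts.most_common(1))  # the single most frequent element and its count

# A Counter can be modified like a dict (illustrative key, not from the post)
counts[3] += 1
print(counts[3])

# Looking up a key that was never seen returns 0 instead of raising KeyError
print(counts[99])
```

Because Counter is a dict subclass, the usual dict operations (iteration, .items(), deletion) also work on the result.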
How to create PDF files with Python
Open Source Automation
by Andrew Treadway
1y ago
In a previous article, we talked about several ways to read PDF files with Python. This post will cover two packages for creating PDF files with Python: pdfkit and ReportLab. Create PDF files with Python and pdfkit pdfkit was the first library I learned for creating PDF files. A nice feature of pdfkit is that you can use it to create PDF files from URLs. To get started, you’ll need to install it along with a utility called wkhtmltopdf. Use pip to install pdfkit from PyPI: pip install pdfkit Once you’re set up, you can start using pdfkit. In the example below, we download Wikip ..read more
Faster data exploration with DataExplorer
Open Source Automation
by Andrew Treadway
1y ago
Data exploration is an important part of the modeling process. It can also take up a fair amount of time. The awesome DataExplorer package in R aims to make this process easier. To get started with DataExplorer, you’ll need to install it as shown below: install.packages("DataExplorer") Let’s use DataExplorer to explore a dataset on diabetes. # load DataExplorer library(DataExplorer) # read in dataset diabetes_data <- read.csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv", header = FALSE) # fix column names names(diabetes_data) <- c("number_ ..read more
How to get stock earnings data with Python
Open Source Automation
by Andrew Treadway
1y ago
In this post, we’ll walk through a few examples for getting stock earnings data with Python. We will be using yahoo_fin, which was recently updated. The latest version now includes functionality to easily pull earnings calendar information for individual stocks or dates. If you need to install yahoo_fin, you can use pip: pip install yahoo_fin If you already have it installed and need to upgrade, you can update your version like this: pip install yahoo_fin --upgrade To get started, let’s import yahoo_fin: import yahoo_fin.stock_info as si Getting stock earnings calendar data The f ..read more
Technical analysis with Python
Open Source Automation
by Andrew Treadway
1y ago
In this post, we will introduce how to do technical analysis with Python. Python has several libraries for performing technical analysis of investments. We’re going to compare three libraries – ta, pandas_ta, and bta-lib. The ta library for technical analysis One of the nicest features of the ta package is that it allows you to add dozens of technical indicators all at once. To get started, install the ta library using pip: pip install ta Next, let’s import the packages we need. We’ll be using yahoo_fin to pull in stock price data. Now, data contains the historical prices for AAPL. # lo ..read more