ALL YOUR BASE ARE BELONG TO US
Physics. Data Science. General Geekery. Probably Coffee.
2y ago
This is a little intro to using an LSTM model for time series data. It’s not in any way a thorough introduction to how LSTMs work, which is pretty complex and far too much info for a short blog like this one…
I’m going to demonstrate it using some financial (stock market) data, where I’ll predict the adjusted close price of a stock from the other trading info. I’ll also add some error bars on those predictions because, well, why not?
Getting the Data
I’m going to use the YahooFinancials Python library to find time series data on particular stocks. It’s pip installable and has more reliable and …
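As a rough sketch of what this step might look like: the yahoofinancials package provides a `YahooFinancials` class whose `get_historical_price_data(start, end, interval)` method returns a nested dict keyed by ticker, with the daily records under `"prices"`. The helper names, the choice of input columns, and the ticker and dates in the usage comment are placeholders of mine rather than anything from the post. The split into features and target simply follows the stated aim of predicting the adjusted close from the other trading info.

```python
# Assumed input columns; the post does not say exactly which ones were used.
FEATURES = ("open", "high", "low", "close", "volume")


def fetch_prices(ticker, start, end):
    """Download daily price records for one ticker (needs network access)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from yahoofinancials import YahooFinancials

    raw = YahooFinancials(ticker).get_historical_price_data(start, end, "daily")
    return raw[ticker]["prices"]


def split_features_target(prices):
    """Split daily records into input features and the adjclose target."""
    needed = FEATURES + ("adjclose",)
    # Skip any record with a missing value, in case some days are incomplete.
    rows = [p for p in prices if all(p.get(k) is not None for k in needed)]
    X = [[float(p[k]) for k in FEATURES] for p in rows]
    y = [float(p["adjclose"]) for p in rows]
    return X, y


# Hypothetical usage (ticker and dates are placeholders):
# X, y = split_features_target(fetch_prices("AAPL", "2018-01-01", "2019-01-01"))
```

Keeping the download separate from the reshaping means the second function can be checked on a hand-written list of records without touching the network.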
5y ago
A while ago I wrote about how to extract text from PDF documents in Python using the PDFMiner library. However, in a recent project I had some trouble using PDFMiner to extract text, possibly because the documents I was working with were scanned PDFs. In this case the answer is to use OCR-based text extraction, and that’s exactly what the textract library is able to do by making use of the tesseract OCR algorithms.
Using textract is extremely straightforward:
import textract
pdffile = "myfile.pdf"
text = textract.process(pdffile, method='tesseract', language='eng')
et voilà.
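One detail worth knowing: `textract.process` returns a byte string rather than a `str`, so the result usually needs decoding before any further text processing. Here is a minimal sketch, assuming the output is UTF-8. `clean_extracted_text` is a hypothetical helper of mine, not part of textract, and the whitespace tidying is just one reasonable choice for OCR output.

```python
import re


def clean_extracted_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode extracted bytes and tidy up the whitespace."""
    # Replace undecodable bytes instead of raising, since OCR output can be messy.
    text = raw.decode(encoding, errors="replace")
    # Collapse runs of spaces and tabs inside each line and trim the ends.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    # Keep at most one blank line between blocks of text.
    kept = []
    for line in lines:
        if line or (kept and kept[-1]):
            kept.append(line)
    return "\n".join(kept).strip()


# Hypothetical usage with the call shown above:
# text = clean_extracted_text(
#     textract.process("myfile.pdf", method="tesseract", language="eng")
# )
```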
However… althou…
5y ago
In my previous post I talked about how to use random forest classification to separate true pulsar candidates from RFI. That classification used numerical features extracted from the processed data.
Ultimately it would be interesting to just be able to use the data itself, rather than extracted features. To help me do that I’ve created a PyTorch torchvision dataset in the same format as the well-known CIFAR10 dataset, so that I can easily load and manipulate those data for use with image-based classifiers (e.g. CNNs).
Here I’m going to explain how I did that, just in case anyone else wants to …
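For orientation, the Python version of CIFAR10 stores each batch as a pickled dict with a `data` entry and a matching `labels` list. Each row of `data` holds the 3072 values of one 32×32 RGB image: all the red values first, then green, then blue, each channel in row-major order. The sketch below builds one such batch in pure Python. It illustrates that layout and is not the author's code; the function names, the batch label and the labelling scheme are placeholders. The real files hold `data` as a uint8 NumPy array, which a full implementation would need to reproduce.

```python
import pickle

SIZE = 32  # CIFAR10 images are 32x32 pixels with 3 colour channels


def to_cifar_row(image):
    """Flatten a 32x32 image of (r, g, b) pixels into one channel-major row."""
    assert len(image) == SIZE and all(len(row) == SIZE for row in image)
    # Channel is the outermost loop, so the row is R plane, G plane, B plane.
    return [px[c] for c in range(3) for row in image for px in row]


def make_batch(images, labels, name="hypothetical training batch"):
    """Assemble a dict with the same keys as a CIFAR10 batch file."""
    assert len(images) == len(labels)
    return {
        "batch_label": name,
        "data": [to_cifar_row(im) for im in images],
        "labels": list(labels),
    }


def save_batch(batch, path):
    """Pickle the batch to disk, as the CIFAR10 python files are."""
    with open(path, "wb") as f:
        pickle.dump(batch, f)
```

With NumPy, the same flattening of a height × width × channel array is `img.transpose(2, 0, 1).reshape(-1)`.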