My name is David Walsh. I'm a 33-year-old web developer and software engineer from Madison, Wisconsin. This blog is targeted toward all levels of web designers and developers. Follow this blog for tutorials on Node.js.
Images are a great way to communicate without text, but oftentimes images are used/abused to spread text within social media and advertisements. Text in images also presents an accessibility issue. The truth is that it’s important, for any number of reasons, to be able to detect text in image files. The amazing open source tool that makes detecting text in images possible is tesseract OCR!
I recommend using Homebrew to install tesseract:
brew install tesseract
To read text from an image, run the following from the command line:
tesseract ~/Downloads/MyImage.png ~/Downloads/MyImage -l eng
The command above extracts detected text in the English language (-l eng) into a text file (MyImage.txt). tesseract appends the .txt extension to the output name itself, so the second argument is given without it. The process is very quick and there are dozens of supported languages.
Let’s look at the following example:
The following text is detected:
~- TOUR SQUAD
CECH MUSTAFI GUENDOUZI oziL
LENO SOKRATIS NELSON IWOBI
MARTINEZ MAVROPANOS SMITHROWE = NKETIAH
BELLERIN OSEI-TUTU WILLOCK PEREZ
KOLASINAC ELNENY RAMSEY LACAZETTE
CHAMBERS MAITLAND-NILES MKHITARYAN AUBAMEYANG
There are a number of utilities in different programming languages that plug into tesseract’s functionality, but it’s important to know the underlying tool! tesseract is an unbelievable tool that you should take advantage of if you need an open source utility for detecting text in an image!
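Since this blog leans on Node.js, here is a minimal sketch of what such a wrapper might look like: it simply shells out to the tesseract command shown above. The function names `buildTesseractArgs` and `detectText` are made up for this example, and the sketch assumes tesseract is installed and on your PATH. It passes `stdout` as the output name so tesseract prints the text instead of writing a file.

```javascript
const { execFile } = require('child_process');

// Build the argument list for the tesseract CLI (hypothetical helper).
// "stdout" as the output name tells tesseract to print the detected text.
function buildTesseractArgs(imagePath, lang = 'eng') {
  return [imagePath, 'stdout', '-l', lang];
}

// Run tesseract on an image and resolve with the detected text (hypothetical helper).
// execFile does not use a shell, so "~" is not expanded; pass a full path.
function detectText(imagePath, lang = 'eng') {
  return new Promise((resolve, reject) => {
    execFile('tesseract', buildTesseractArgs(imagePath, lang), (error, stdout) => {
      if (error) {
        reject(error);
        return;
      }
      resolve(stdout.trim());
    });
  });
}

// Example usage (the path is a placeholder):
// detectText('/Users/me/Downloads/MyImage.png')
//   .then((text) => console.log(text))
//   .catch((err) => console.error(err));
```

Keeping the argument building separate from the process call makes it easy to check what will be run without having tesseract installed.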
Convert HTML to Markdown with Node.js
Start by installing Turndown:
yarn add turndown
Then use Turndown’s simple API to convert HTML to markdown:
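var TurndownService = require('turndown');
var turndownService = new TurndownService();
// Placeholder HTML for illustration; pass in whatever markup you need to convert
var markdown = turndownService.turndown('<h1>Hello world!</h1><p>Some <strong>sample</strong> HTML</p>');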
Most developers look for a Markdown to HTML solution, so it’s rare to find myself in a position to need to convert HTML to Markdown. I look forward to migrating my site’s content to Markdown so that writing posts is much less stressful in the future!
URL shorteners are a dime a dozen these days, and it is quite nice to have a pretty URL instead of a mile-long string, but there are some downsides to URL shorteners: they can mask dangerous URLs, and getting to the endpoint can be slow since you end up making multiple requests. And what if a shortener sold out to a porn company?! Whoa!