Adventures Converting Large PDF Files Into Text


Yesterday, I discovered that programmatically searching for text in PDF files is more complicated than you would imagine. A friend of mine who works at a non-profit came to me with a problem she thought could be automated. Her organization investigates the accounting practices of public institutions and works with lots of old government files as a result. This particular task involved searching a large number of PDF files created between 1995 and the present day. She needed a way to search through each state's financial records, year by year, and record the number of times certain search terms occurred in all of the PDFs. To do this, she was opening each file, pressing Ctrl+F to find the search term, and then manually recording each count in a spreadsheet. To me, this sounded like a microcosm of hell on earth. Writing a script for this seemed pretty straightforward. Then I started to learn about PDF files. They are tricky, to say the least.
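
Setting the PDF part aside for a moment, the counting and recording half of the job is the easy bit. Here is a minimal sketch of it, assuming the text has already been pulled out of each PDF into a plain string (which is exactly the step that turns out to be tricky). The function names, the file labels, and the CSV layout are all made up for illustration, and matching is a simple case-insensitive substring count, roughly what Ctrl+F does.

```python
import csv
import re


def count_terms(text, terms):
    """Count case-insensitive, non-overlapping occurrences of each term in text."""
    return {
        term: len(re.findall(re.escape(term), text, flags=re.IGNORECASE))
        for term in terms
    }


def write_counts(documents, terms, out):
    """Write one CSV row per document with a column of counts per search term.

    documents: iterable of (label, extracted_text) pairs (hypothetical input;
               the extraction from PDF is assumed to have happened already).
    out:       any writable text file object.
    """
    writer = csv.writer(out)
    writer.writerow(["file"] + list(terms))
    for label, text in documents:
        counts = count_terms(text, terms)
        writer.writerow([label] + [counts[term] for term in terms])
```

Calling `write_counts` with an open file and a list of `(label, text)` pairs produces a spreadsheet-friendly CSV, one row per file, which replaces the manual tallying step.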


Part 1: Wrangling Redis, Gevent, SocketIO and Django

Wiring panel for electric door bell and buzzer


So the other day I was digging around a storage space underneath the step in our apartment and found something awesome: the little box and all the wiring for our door's buzzer. It was completely exposed and ready to be fiddled with. This discovery, paired with seeing this bad boy, the Spark Core, got me thinking of all sorts of cool things you could do with small WiFi-enabled microcontrollers.