Every month when I get my credit card statement I like to go through it and categorise the transactions (a bit like you can in Netbank), so that I can see where and how I am spending my money. The statements come as PDFs, and getting the data into a spreadsheet involved copying and pasting it from the PDF into Notepad++, running a few regular expressions (set up as a macro) to format it, and then copying and pasting the result into the spreadsheet. It wasn’t a difficult process, but I thought it would be fun to try to automate it, even if that turned out like xkcd 1319.
Architecture and language
The 4 months from first commit (Feb 11th) to last (Jun 24th) is longer than I expected at the start. I think the biggest factor was that learning new libraries and tools hindered my momentum early on. From the start I decided to treat this as an opportunity to try new tools and libraries, which was good but frustrating, especially when it seemed to take 2 hours of research to make 5 minutes of actual progress.
The main offenders here were webpack and pdf.js; I found the documentation for both pretty unhelpful at times. For webpack I had to look at how other people configured it and dig into what it was actually doing, and for pdf.js I read through all the provided examples and built a prototype to check it could do what I wanted. Using TypeScript also meant that setting up webpack, jasmine and karma was a little trickier. I found this boilerplate helpful.
Once I had webpack configured it worked well. The continuous test runner was very quick: by the time I had switched to the console after saving a change, the tests had completed. This fast feedback is something I now miss in my normal C# and Visual Studio development environment.
I took an inside-out approach to coding the StatementParser and CsvConverter classes, writing tests as I went, but didn’t create any tests for PdfScraper. I probably should have created some dummy PDF files and approached that from a TDD perspective as well.
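To give a feel for the kind of logic those two classes cover, here is a minimal sketch of parsing statement lines and turning them into CSV. It is not the project’s actual code: the line layout (`<day> <Mon> <description> <amount>`), the `Transaction` shape and the function names `parseStatementLines` and `toCsv` are all assumptions made for illustration.

```typescript
// Hypothetical shape for one parsed transaction (an assumption, not the project's type).
interface Transaction {
  date: string;
  description: string;
  amount: number;
}

// Assumed line layout: "<day> <Mon> <description> <amount>",
// e.g. "12 Mar WOOLWORTHS SYDNEY 45.60". Real statements will differ.
const LINE_PATTERN = /^(\d{1,2} [A-Z][a-z]{2})\s+(.+?)\s+(-?[\d,]+\.\d{2})$/;

// Turn raw text lines into transactions, skipping anything that
// does not look like a transaction (headers, totals, blank lines).
function parseStatementLines(lines: string[]): Transaction[] {
  const transactions: Transaction[] = [];
  for (const line of lines) {
    const match = LINE_PATTERN.exec(line.trim());
    if (match === null) {
      continue;
    }
    const [, date = "", description = "", amount = "0"] = match;
    transactions.push({
      date,
      description,
      // Strip thousands separators before converting to a number.
      amount: parseFloat(amount.replace(/,/g, "")),
    });
  }
  return transactions;
}

// Quote a field only when it contains a comma, quote or line break.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

// Render transactions as CSV text with a header row.
function toCsv(transactions: Transaction[]): string {
  const rows = transactions.map((t) =>
    [t.date, t.description, t.amount.toFixed(2)].map(escapeCsvField).join(",")
  );
  return ["Date,Description,Amount", ...rows].join("\n");
}
```

Because both functions are pure (text in, data out), they are easy to test from the inside out without touching a PDF at all, which is what made this part of the code a natural fit for TDD.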
Design is definitely not a strong point of mine. Making something look like an existing design is easy, but coming up with an appealing design of my own is challenging. It looks alright now, but that took several hours on its own (yes, even for how simple it is). I decided to use material design concepts and referenced Materialize for the CSS.
I rhetorically asked earlier what you can’t do in a browser now, but there are still a few quirks. Converting the CSV data into a file and downloading it took two lines of code (thanks, data URIs), but giving the file a meaningful name required eight and relies on a bit of a browser hack that may break with future updates.
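For anyone curious what that looks like, here is a minimal sketch of one common way to do it: build a data URI from the CSV text, then click a temporary link whose `download` attribute carries the file name. This is an assumption about the general technique, not a copy of the project’s code, and the names `toCsvDataUri`, `downloadCsv`, `AnchorLike` and `DocumentLike` are made up for the example.

```typescript
// Minimal structural types so the sketch does not depend on the DOM typings.
// In a browser, the real `document` and its <a> elements fit these shapes.
interface AnchorLike {
  href: string;
  download: string;
  click(): void;
}

interface DocumentLike {
  createElement(tagName: "a"): AnchorLike;
}

// Encode CSV text as a data URI. Navigating to this alone triggers a
// download in many browsers, but the file name is chosen by the browser.
function toCsvDataUri(csv: string): string {
  return "data:text/csv;charset=utf-8," + encodeURIComponent(csv);
}

// The naming workaround: a temporary link with a `download` attribute,
// clicked from script. Some browsers may also need the link attached
// to the page before the click is honoured.
function downloadCsv(doc: DocumentLike, csv: string, fileName: string): void {
  const link = doc.createElement("a");
  link.href = toCsvDataUri(csv);
  link.download = fileName;
  link.click();
}
```

In a page this would be called as something like `downloadCsv(document, csvText, "statement.csv")`. Passing the document in, rather than reaching for the global, also makes the function testable with a fake.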
Where to now?
When I started the project I was using Excel, but I have since moved to Google Sheets. Google Sheets has a nice API for making changes, so the next step is to fork this project and create a version that adds the data directly into the sheet.
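As a rough idea of what that fork might involve, here is a small sketch that builds a request for the Sheets API v4 `values.append` method, which adds rows after the existing data in a range. Authentication is left out entirely, and the helper name `buildAppendRequest`, the spreadsheet ID and the range below are placeholders rather than anything from the project.

```typescript
type CellValue = string | number;

// Everything needed to send the request with fetch or XMLHttpRequest.
interface AppendRequest {
  url: string;
  method: "POST";
  body: string;
}

// Build a values.append request for the given sheet range.
// USER_ENTERED asks Sheets to parse values as if typed into the UI,
// so amounts arrive as numbers rather than text.
function buildAppendRequest(
  spreadsheetId: string,
  range: string,
  rows: CellValue[][]
): AppendRequest {
  const url =
    "https://sheets.googleapis.com/v4/spreadsheets/" +
    encodeURIComponent(spreadsheetId) +
    "/values/" +
    encodeURIComponent(range) +
    ":append?valueInputOption=USER_ENTERED";
  return { url, method: "POST", body: JSON.stringify({ values: rows }) };
}
```

The request would still need an OAuth access token in its `Authorization` header before Google would accept it, which is likely where most of the real work in the fork will be.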