median blog

Beware the temptation of the universal CSV importer

I work on B2B software in an industry traditionally driven by spreadsheets. We regularly need to import data into our system from customers or partners. While we do build integrations, that data very often arrives as spreadsheets.

Every time a new need comes up, it is tempting to try to 'solve' this problem once and for all. As a younger engineer I would have spent a long time on that problem.

Now I am much more willing to just build a lot of one-off ingestion tools. In reality, getting data from one format to another is very rarely just a matter of reshuffling and renaming columns. Data is messy: there are spaces where there shouldn't be, and each source comes with rules of its own that make no sense for any other source.
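To make that concrete, here is a minimal sketch of what one of these one-off scripts might look like. Everything specific in it is an assumption for illustration: the column names ("Part No", "Price"), the blank padding rows, and the thousands separators in prices all belong to a hypothetical vendor, not to any real export.

```python
import csv
import io
from decimal import Decimal


def parse_vendor(csv_text):
    """Parse one (hypothetical) vendor's export into clean records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Headers and values both arrive with stray whitespace.
        # Keys of None hold overflow cells from over-long rows; drop them.
        row = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }

        # Assumed rule for this vendor: the sheet is padded with rows
        # that have no part number, so skip them.
        if not row.get("Part No"):
            continue

        # Assumed rule for this vendor: prices use thousands separators,
        # e.g. "1,200.50". Decimal avoids float rounding on the way to cents.
        price = Decimal(row["Price"].replace(",", ""))

        records.append({
            "sku": row["Part No"].upper(),
            "price_cents": int(price * 100),
        })
    return records


if __name__ == "__main__":
    sample = 'Part No , Price\n ab-1 ,"1,200.50"\n,\n cd-2,3.20\n'
    for record in parse_vendor(sample):
        print(record)
```

None of these rules would survive contact with a second vendor's file, which is the point: the script is short enough that writing another one is cheaper than generalising this one.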

These customers, partners and other sources of data may one day stop creating spreadsheets. Maybe they will all have some form of structured integration with a whole suite of guarantees. Until then I'll keep building my parse_vendor.py scripts, committing them, putting them to work and moving on to more interesting problems.