
Some cleaning is best done by hand

When I’m not writing and editing my bestselling novels and short stories, I work a day job with numbers, spreadsheets and databases, and I have found my niche there. Think of someone whose job is to write, copy-edit and improve a manuscript. What skills could that person possibly bring to database administration? It seems like a completely unrelated field.

It’s not unrelated. Think about these skills:
1. Ability to pick out an error in a sea of words and numbers
2. Patience to comb through manuscripts that need cleanup work
3. Ability to track complex relationships between characters, setting and themes
4. Pattern recognition
5. Search and replace wizardry

So why does this matter?

I make this point because every large company has massive piles of bad, dirty data in need of a good scrub. If the data is dirty for one simple reason, an automated process using tools like Excel, Python or SQL can handle it. Some tools will even build mappings that convert data from one format to another.
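To show what “dirty for one simple reason” looks like, here is a minimal Python sketch. It assumes a hypothetical list of customer names whose only problem is stray whitespace and inconsistent capitalization, so a single rule applied to every record fixes the whole pile. The function name and sample data are made up for illustration.

```python
import re


def normalize_name(raw: str) -> str:
    """Trim the ends, collapse internal runs of whitespace, and title-case.

    Hypothetical rule for illustration: it assumes the ONLY problem in the
    column is spacing and capitalization.
    """
    collapsed = re.sub(r"\s+", " ", raw.strip())
    return collapsed.title()


# Made-up sample records with one consistent kind of dirt.
records = ["  jane   DOE", "JOHN smith ", "mary-ann o'neil"]
cleaned = [normalize_name(r) for r in records]
print(cleaned)  # ['Jane Doe', 'John Smith', "Mary-Ann O'Neil"]
```

Note that the same rule turns “mcdonald” into “Mcdonald”. That is exactly the kind of case where one blanket rule starts to fight another, and a person has to look.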

When the data is dirty for many different reasons, some of which conflict with each other, automation isn’t an option. This is where you need a bespoke data scrubber.

A what?!?

Okay, you got me. I made that job title up, but it is self-explanatory. A bespoke data scrubber looks at the data and makes manual changes using carefully considered search-and-replace operations. Sometimes it’s faster to find the ten records in a million that are causing the problem and change them by hand than to spend all day writing a Python script. And who is better at finding those problems quickly than a copy editor and writer?
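Finding those ten records doesn’t have to mean reading a million rows. Here is a minimal Python sketch that only *flags* suspect records so a human can decide how to fix each one. It assumes a hypothetical `order_date` field that should always look like `YYYY-MM-DD`; the field name, the rule and the sample rows are all made up for illustration.

```python
import re

# Hypothetical rule: every order date should look like YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def find_suspect_rows(rows, field, pattern):
    """Return (index, value) pairs whose field does not fully match the pattern.

    Missing fields are treated as empty strings, so they get flagged too.
    Nothing is changed; the output is a short list for manual review.
    """
    return [
        (i, row.get(field, ""))
        for i, row in enumerate(rows)
        if not pattern.fullmatch(str(row.get(field, "")))
    ]


rows = [
    {"id": 1, "order_date": "2023-04-01"},
    {"id": 2, "order_date": "04/02/2023"},
    {"id": 3, "order_date": "2023-04-03"},
    {"id": 4, "order_date": "April 4th"},
]

for index, value in find_suspect_rows(rows, "order_date", DATE_PATTERN):
    print(f"row {index}: {value!r} needs a human look")
```

The script narrows the haystack; the scrubber still decides whether “04/02/2023” means April 2 or February 4.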

So what else do I need?
Knowledge of data tools, especially Python and Tableau, is helpful but not necessary if the data can be broken into smaller chunks. Every organization needs one of these unsung heroes. The problem is that these heroes have no formal position in the organization, and busy managers often call them “utility people.” This is a branding problem. From now on, if you’re a data cleaner who does primarily manual cleaning, rebrand yourself as “bespoke”. It’s fancy Savile Row terminology for “custom”. So be proud of your tailored approach to data!

* * *