I received an email the other day. It was carefully crafted and obviously intended to sound as though it had been written especially for me. It lost me, however, in the first two words: "Dear Tannis00". Dirty data lost the sale.
When it comes right down to it, data is an asset. It has value, just like a chair or a desk. If you caught an employee sawing off the corner of his desk because he kept bumping into it, you would be upset. Cutting corners with data is no different.
Clean Data vs Dirty Data
There is a lot of talk about clean data, but what is it? The very existence of the term suggests that data, like your car, tends to get dirty. Perhaps the easiest way to understand clean data is to start by looking at what makes data dirty.
- Extra stuff stuck where it shouldn’t be, like mud on your car. For example, a customer’s name entered as “William Smith (prefers Bill)”. That’s great information for a human reader, but it can’t be used for other purposes such as a mail merge: “Dear Mr. Smith (prefers Bill)…”
- Missing or damaged stuff, like rust, dents, or a missing tail light on your car. For example, a customer’s phone number entered without the area code, or a missing email address.
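Both kinds of dirt can often be caught automatically. The following Python sketch shows one illustrative way to do it; it is not a prescribed method. The record layout (`name`, `phone`, and `email` keys) and the `find_problems` function are hypothetical. The phone rule assumes North American numbers, which need 10 digits including the area code.

```python
import re

# Hypothetical customer record layout; field names are illustrative only.
PAREN_NOTE = re.compile(r"\(.*?\)")  # e.g. "(prefers Bill)" stuck in a name
DIGITS = re.compile(r"\d")


def find_problems(record: dict) -> list:
    """Return a list of human-readable problems found in one customer record."""
    problems = []
    name = record.get("name", "")
    # "Extra stuff": parenthetical notes or digits (e.g. "Tannis00") in a name field.
    if PAREN_NOTE.search(name) or DIGITS.search(name):
        problems.append("name contains extra text")
    # "Missing stuff": assumes North American numbers (10 digits with area code).
    phone_digits = "".join(DIGITS.findall(record.get("phone", "")))
    if len(phone_digits) == 11 and phone_digits.startswith("1"):
        phone_digits = phone_digits[1:]  # drop a leading country code
    if len(phone_digits) != 10:
        problems.append("phone missing area code or incomplete")
    if not record.get("email", "").strip():
        problems.append("email missing")
    return problems


if __name__ == "__main__":
    records = [
        {"name": "William Smith (prefers Bill)", "phone": "555-1234", "email": ""},
        {"name": "Jane Doe", "phone": "(403) 555-1234", "email": "jane@example.com"},
    ]
    for r in records:
        print(r["name"], find_problems(r))
```

A check like this only flags records for a person to review. It cannot tell you what the correct value should be.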
How does data get dirty?
The usual response to this question is that it is human error. While that does happen in the form of keying errors and contextual misunderstandings, it is not the biggest cause. So what are the causes?
- Human error (as mentioned)
- Trying to capture potentially useful data that the system wasn’t designed for (such as the preferred name in the example above)
- Data that is not required for the task at hand, so it is skipped. For example, a customer’s email address may not be needed to produce an invoice.
- Defaults on a data entry form. One of my clients had the majority of their customers listed as coming from Alberta because that was the first province in the list.
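An unchanged default tends to show up as one value dominating a field. A rough check, sketched in Python below, measures how much of a field the most common value occupies. The function name `dominant_value` and the 50% threshold are arbitrary choices for illustration. A dominant value is a prompt to investigate, not proof of a problem: an Alberta-based business may genuinely have mostly Alberta customers.

```python
from collections import Counter


def dominant_value(values, threshold=0.5):
    """Return (value, share) if one value makes up more than `threshold`
    of the non-empty entries, otherwise None.

    A suspiciously dominant value can be a sign that a form default
    (such as the first province in a drop-down list) was never changed."""
    filled = [v for v in values if v]
    if not filled:
        return None
    value, count = Counter(filled).most_common(1)[0]
    share = count / len(filled)
    return (value, share) if share > threshold else None


if __name__ == "__main__":
    provinces = ["AB", "AB", "AB", "BC", "ON", "AB"]
    print(dominant_value(provinces))
```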
In truth, the biggest cause of dirty data is a lack of understanding of the value of data.
What are the benefits of Clean Data?
Like precious metals, purer data is more valuable. Clean data is obviously more reliable when you use it to make decisions, but the benefits go further.
Clean data is more flexible, meaning that it can be used for more than its original purpose. An example is customer data that was originally captured for the invoicing process but can also be used for a targeted email campaign or to notify customers of an event you are running in their area. Without complete data (an email address or a mailing address), neither of these would be possible.
Clean data is easier to integrate with other data. For example, you may have the opportunity to acquire demographic data that you want to use for a marketing campaign. Linking that data to your current customer base could yield valuable insights. But if the demographic data is organized by postal code and your customer records are missing postal codes, you can’t make the link.
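The postal code example can be sketched in Python as a simple lookup join. This sketch assumes Canadian-style postal codes (such as "T2P 1J9") and demographic data keyed by postal code. The pattern check is simplified (real codes exclude some letters), and `normalize_postal` and `link_demographics` are hypothetical helpers. The useful part is the `unlinked` list, which shows exactly which customers incomplete data has cut out of the analysis.

```python
import re

# Simplified Canadian postal code pattern: letter-digit-letter digit-letter-digit.
POSTAL = re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$")


def normalize_postal(code):
    """Uppercase and remove spaces; return None if the result isn't a valid code."""
    if not code:
        return None
    cleaned = code.replace(" ", "").upper()
    return cleaned if POSTAL.match(cleaned) else None


def link_demographics(customers, demographics):
    """Attach demographic fields to customers by postal code.

    `demographics` is assumed to be a dict keyed by normalized postal code.
    Returns (linked, unlinked): customers that could be matched, and those
    whose postal code is missing, malformed, or absent from the data."""
    linked, unlinked = [], []
    for customer in customers:
        key = normalize_postal(customer.get("postal_code"))
        if key is not None and key in demographics:
            linked.append({**customer, **demographics[key]})
        else:
            unlinked.append(customer)
    return linked, unlinked


if __name__ == "__main__":
    customers = [
        {"name": "Jane Doe", "postal_code": "t2p 1j9"},
        {"name": "William Smith", "postal_code": ""},
    ]
    linked, unlinked = link_demographics(customers, {"T2P1J9": {"segment": "urban"}})
    print("linked:", linked)
    print("unlinked:", unlinked)
```

Normalizing before matching matters. Without it, "t2p 1j9" and "T2P1J9" would look like different places.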
How do I get clean data?
The answer will obviously depend on your particular situation. It’s about automated checks and repeatable processes, but more than anything it’s about understanding and business culture. Data is the new currency. Make sure your employees are not short-changing you: treat data yourself with the same respect you give to money.