The 6 Dimensions of Data Quality Explained
Most organizations today, from startups to enterprises, profess to be data-driven. Putting data at the center of business operations and making data-informed decisions are sound strategic objectives, but they are easier said than done. Few organizations seem to have grasped what that entails, let alone harnessed the potential of data to achieve business goals.
The enigma called 'data'
Companies struggle to put their data to work. A report commissioned by Seagate revealed that only 32 percent of the data available to enterprises was properly used. The remaining 68 percent was left unexploited, which shows the extent of the missed opportunity despite all the talk around data and its value.
The data organizations were able to use was not without flaws, either: a study of 75 executives in Ireland found that 47 percent of newly created data records contained a work-impacting error, and only 3 percent of the organizations in the study had 97 or more error-free records out of every 100.
What happens when data quality decreases? Confidence in analyses dependent on that data plunges; customers lose trust in the organization; costs increase as more time and resources are expended; and the organization's reputation takes a big hit.
Data is piling up at an ever-faster pace as you read these lines. Without a holistic and systematic view, no organization can keep up with that growth or tap the potential of data. Improving the quality of the data should be the first step in that regard. Such an endeavor should start with understanding the six dimensions of data quality:
Accuracy
This dimension measures how well the data maps to reality. A compromise here can seriously hamper business operations. Suppose, for example, that the bank account number of a customer asking for a refund is recorded incorrectly in your database: the refund will take longer than it should, and you can expect winning back that frustrated customer to be a formidable challenge.
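As a rough illustration, accuracy can be measured by comparing stored values against a trusted reference, such as details confirmed directly with customers. The sketch below is a minimal example under that assumption; the customer IDs, the "iban" field, and the verified values are all hypothetical.

```python
# A minimal sketch of measuring accuracy against a trusted reference.
# Customer IDs, the "iban" field, and the values are made up for illustration.

records = {
    "C001": {"iban": "DE89370400440532013000"},
    "C002": {"iban": "DE89370400440532013001"},  # mistyped digit
}

verified_source = {  # stands in for values confirmed with the customers
    "C001": "DE89370400440532013000",
    "C002": "DE89370400440532013999",
}

matches = sum(
    1 for customer_id, record in records.items()
    if record["iban"] == verified_source.get(customer_id)
)
accuracy = matches / len(records)
print(f"Accuracy: {accuracy:.0%}")  # 50%: one record does not match reality
```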
Completeness
This dimension refers to the data conveying enough information to make a productive engagement possible. If you run a pizza parlor and are trying to improve your delivery performance, getting the customer's address right, with no parts missing, is a prerequisite. The data in question would be deemed complete if it allowed business operations to run smoothly, even without other details such as the customer's gender or age.
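A simple way to operationalize this is to define the fields a record cannot do without and flag anything that lacks them. The sketch below assumes plain Python dictionaries and hypothetical field names for a delivery address.

```python
# A minimal sketch of a completeness check for delivery orders.
# The required field names are assumptions for illustration.

REQUIRED_FIELDS = ["customer_name", "street", "house_number", "city", "postal_code"]

orders = [
    {"customer_name": "A. Rossi", "street": "Via Roma", "house_number": "12",
     "city": "Milan", "postal_code": "20121"},
    {"customer_name": "B. Conti", "street": "Via Verdi", "city": "Rome"},  # incomplete
]

def missing_fields(order):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not order.get(field)]

for index, order in enumerate(orders):
    gaps = missing_fields(order)
    if gaps:
        print(f"Order {index} is incomplete; missing: {', '.join(gaps)}")
```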
Consistency
Consistency refers to a situation where data stored in different locations matches. If two different departments like customer support and sales have different data for a customer's phone number, confusion ensues. The presence of hundreds of inconsistent data entries can paralyze business operations at a company.
Sorting out conflicting pieces of data can be a challenge. This is where data lineage comes into play: understanding how data was moved from one location to another and how it was transformed goes a long way toward preventing discrepancies. That is why having a data governance policy in place is critical.
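As a small illustration, a consistency check can be as simple as comparing the values two systems hold for the same customer. The sketch below uses made-up Python dictionaries standing in for the support and sales databases.

```python
# A minimal sketch of a cross-system consistency check.
# The customer IDs and phone numbers are made up for illustration.

support_db = {"cust_42": "+1-555-0100", "cust_43": "+1-555-0111"}
sales_db   = {"cust_42": "+1-555-0100", "cust_43": "+1-555-0199"}

# Compare every customer present in both systems and report mismatches.
for customer_id in support_db.keys() & sales_db.keys():
    if support_db[customer_id] != sales_db[customer_id]:
        print(f"{customer_id}: support has {support_db[customer_id]}, "
              f"sales has {sales_db[customer_id]}")
```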
Validity
Data is considered valid when it conforms to the predefined format requirements of a specific domain and falls within the specified ranges. Different domains may require different formats for things like birthdays (MM/DD/YYYY in the U.S. or DD/MM/YYYY in Europe) or phone numbers (starting with either a zero or a plus sign before the international dialing code). Entries incompatible with these requirements are invalid and provide no benefit to the data user.
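In practice, validity checks boil down to format rules. The sketch below assumes U.S.-style MM/DD/YYYY birthdays and phone numbers that must begin with a zero or a plus sign; real rules would be tailored to each domain.

```python
# A minimal sketch of validity checks for two common formats.
# The exact rules (date format, digit counts) are assumptions for illustration.

import re
from datetime import datetime

def is_valid_birthday(value: str) -> bool:
    """Accept only dates that parse as MM/DD/YYYY."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False

def is_valid_phone(value: str) -> bool:
    """Accept numbers starting with 0 or +, followed by 7 to 14 digits."""
    return re.fullmatch(r"(0|\+)\d{7,14}", value) is not None

print(is_valid_birthday("12/31/1999"))  # True
print(is_valid_birthday("02/29/2023"))  # False: 2023 is not a leap year
print(is_valid_phone("+442071838750"))  # True
print(is_valid_phone("442071838750"))   # False: missing leading 0 or +
```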
Timeliness
This dimension refers to the lag between the actual time of an event and the time it was captured as data in a system. Some bureaucratic jobs are insensitive to this lag, or are even defined by working with outdated data, while professionals such as air traffic controllers rely on real-time data. Organizations should define the maximum data lag they can tolerate and take the necessary measures to ensure data is captured with acceptable promptness.
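One way to keep an eye on this dimension is to compute the lag for each record and flag anything beyond the tolerated maximum. The sketch below assumes a hypothetical one-hour threshold and illustrative timestamps.

```python
# A minimal sketch of a timeliness check: flag records whose capture lag
# exceeds a defined tolerance. The threshold and timestamps are assumptions.

from datetime import datetime, timedelta

MAX_LAG = timedelta(hours=1)

events = [
    {"id": 1, "occurred_at": datetime(2024, 5, 1, 9, 0),
     "captured_at": datetime(2024, 5, 1, 9, 10)},
    {"id": 2, "occurred_at": datetime(2024, 5, 1, 9, 0),
     "captured_at": datetime(2024, 5, 1, 12, 45)},
]

for event in events:
    lag = event["captured_at"] - event["occurred_at"]
    if lag > MAX_LAG:
        print(f"Event {event['id']} was captured {lag} after it occurred (too late)")
```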
Uniqueness
Uniqueness is about data being stored in a single location without redundancy or duplication. It ensures that one identifier (for example, a name like James Thompson) corresponds to a single individual and that the individual, in turn, is identified by a single identifier (strictly as James Thompson, not also as "Jim" or "Jimmy"). Ensuring that data entries are unique prevents confusion: verifying whether a "James Thompson" record and a "Jim Thompson" record belong to the same person takes time and effort, making data processing needlessly difficult.
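A basic uniqueness check groups records by a normalized key and reports any key that maps to more than one record. The sketch below uses lowercased email addresses as that key, an assumption made for illustration; real entity matching is usually more involved.

```python
# A minimal sketch of a duplicate check keyed on a normalized email address.
# The customer records and the choice of key are assumptions for illustration.

from collections import defaultdict

customers = [
    {"id": 1, "name": "James Thompson", "email": "j.thompson@example.com"},
    {"id": 2, "name": "Jim Thompson",   "email": "J.Thompson@example.com"},
    {"id": 3, "name": "Ana Silva",      "email": "ana.silva@example.com"},
]

by_email = defaultdict(list)
for customer in customers:
    by_email[customer["email"].strip().lower()].append(customer["id"])

duplicates = {email: ids for email, ids in by_email.items() if len(ids) > 1}
print(duplicates)  # {'j.thompson@example.com': [1, 2]}
```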
Final thoughts
Any attempt at making data-informed decisions is doomed to be a futile exercise in trendiness so long as data quality remains low. Improving the quality of the data at hand and making sure it is reliable, accessible, and up to date for its users is the first step in transforming your organization into a data-driven one. How that can be achieved deserves a post of its own.