A couple of weeks ago, I was in a meeting and someone said something that really resonated with me: “Our data has grown somewhat organically.”
I love that statement, because it’s true of every organization regardless of size. Data really is organic. Not in the chemical sense, nor in the marketing sense (as in the "organic" section of your local grocery store), but in how it behaves. Data usually starts small, but it grows. It doesn't just grow in volume, but in scope, and in different directions. Data is like a tree in your yard. If it is taken care of, it will be healthy and strong. But it can also grow in ways that are not useful, like a hedge taking over a sidewalk, or a large branch of your favourite tree that threatens your house whenever the wind blows.
What is Data Architecture?
I'm going to switch analogies at this point, because tree trimming doesn't work too well for this part. Data architecture, like building architecture, has many components:
Policies, Rules, and Standards (building code)
Like building architecture, data architecture starts with a set of policies, rules, and standards. Some of these may come from frameworks (such as TOGAF), data organizations such as DAMA (Data Management Association), or industry-specific organizations that create data standards for particular industries. Many will be unique to your organization. Together they set up a framework within which your data architecture is built, so they are ideally established first.
Data Inventories (bill of materials)
Like the inventory of your favourite retail store, data inventories are a way to categorize and keep track of the different types of data that your organization has.
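To make that a little more concrete, here is a minimal sketch of what a very small data inventory might look like. Everything in it is an assumption for illustration only: the dataset names, the categories, and the `by_category` helper are all hypothetical, and a real inventory would normally track much more (owners, sensitivity, retention, and so on).

```python
from collections import defaultdict

# Hypothetical inventory entries: (dataset name, category, system where it lives).
# None of these names come from a real organization.
INVENTORY = [
    ("customers", "customer", "Salesforce"),
    ("invoices", "financial", "QuickBooks"),
    ("price_list", "product", "Excel"),
    ("payments", "financial", "QuickBooks"),
]


def by_category(entries):
    """Group dataset names by category, preserving the order they were listed in."""
    grouped = defaultdict(list)
    for name, category, _system in entries:
        grouped[category].append(name)
    return dict(grouped)


if __name__ == "__main__":
    for category, names in by_category(INVENTORY).items():
        print(f"{category}: {', '.join(names)}")
```

Even a list this simple answers the two questions an inventory exists for: what kinds of data do we have, and where does each one live?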
Data Models (building plans)
These models, usually presented as diagrams, are what many people think of when they think of data architecture. Data models provide a way of visualizing and testing the structure of your data. They are built to be consistent with the policies, rules, and standards, and to handle all or a specific part of the data inventory.
Meta-data (all the notes that add meaning to the plans)
Literally data about data. This includes all the descriptive stuff that helps give meaning and context to your data, like the little help balloon in an application that tells you that the field on your data entry screen labelled PC actually contains a postal code.
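The postal code example can be sketched as a tiny data dictionary. This is only an illustration under stated assumptions: the `FieldMetadata` structure, the `DATA_DICTIONARY` contents, and the `describe` helper are hypothetical names I made up, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Descriptive information about one field (hypothetical structure)."""
    name: str         # the terse label shown on the screen or column
    description: str  # what the field actually contains
    data_type: str    # expected type of the values


# Hypothetical data dictionary keyed by field label.
DATA_DICTIONARY = {
    "PC": FieldMetadata("PC", "Postal code of the customer's mailing address", "string"),
}


def describe(field_name: str) -> str:
    """Return help text for a field, or a fallback when no metadata is recorded."""
    meta = DATA_DICTIONARY.get(field_name)
    if meta is None:
        return f"{field_name}: no description recorded"
    return f"{meta.name}: {meta.description} ({meta.data_type})"


if __name__ == "__main__":
    print(describe("PC"))
```

The fallback branch matters as much as the happy path: a field with no recorded description is exactly the kind of gap that metadata work is meant to surface.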
Integration (utilities hookup)
Integration includes plans for all the ways that your data can be connected between your systems, or with outside data (such as a demographics dataset that you may purchase). Data integration should be firmly embedded in the data models.
Databases (the physical building)
Databases are the physical (if such a word can be used to describe something inside a computer system) instantiation of the data models. In large organizations, this includes Oracle, SQL Server, MySQL, and Hadoop, to name a few. In smaller organizations the database may not be as obvious. It could be Microsoft Access or an Excel spreadsheet, or a proprietary solution like QuickBooks or Salesforce. It could even be several of these. In any case, this is where the data lives; the container that you keep it in.
Data (the people who occupy the building)
At the end of the day, data architecture is about data. If done well, the architecture will allow the data to grow and mature over time. If done poorly, like a house designed for one person, it will need to be modified; walls removed or added; new doors added to change the flow. Or it may need to be abandoned in favour of a new and bigger building that provides room to expand.
Who needs a Data Architecture?
Every organization that has data. As a data architect, I’ve spent much of my time working with large organizations. They tend to have a lot of data that they have built up over a number of years, or even decades. One of the main reasons that large organizations hire a data architect is to help them make sense of what years without a data architect have left them. To help them trim those large looming branches without destroying the house. Sometimes that means taking down the whole tree. But they are not the only ones who need a data architecture.
Small and medium-sized companies often proclaim that their data is not very complex, or that they don’t have much data anyway. In truth, that is the perfect time to start building a data architecture; to help shape that tree before it becomes an issue. Even if all an organization's data is in commercial off-the-shelf packages (we call those COTS solutions because acronyms are cool), a data architecture is what will help you tie it all together.
With smaller organizations we are obviously not talking about the huge, complex data architectures of government, banks, or national retailers. The data architecture should match the needs of the organization.
While it's unlikely that a bad (or "organic") data architecture will result in a hole in your roof, it is very likely that your data will be less flexible and less available than it could be. Data is one of your company's most valuable assets. Locking it down like that is like gluing your laptop to your desk: it just doesn't make sense.