I have to admit that I’ve been putting off completing part two of this article (see part one here). It’s a simple question, but the answer doesn’t seem to be an easy one.
Last week I attended the World Wide Data Vault Consortium (WWDVC for short); more on that in some upcoming articles. While at the WWDVC, I had the privilege of mingling with some rather brilliant people, many of whom were data architects. So, I asked them some questions.
How long have you been a Data Architect?
This should be an even easier question.
“Well, um… I’ve had the title Data Architect for about two years, but I’ve kind of been doing it for more like… I don’t know. I guess about ten years”.
The actual numbers differed from person to person, but the answers pretty much all followed the same pattern. The fact that a group of people who self-identify as Data Architects have a hard time answering this simple question is actually very telling.
What about people who live with a Data Architect?
My wife joined me on the trip. Not to attend the conference, mind you, but it was held in beautiful Stowe, Vermont, at a very nice resort named Stoweflake. Several of the attendees also brought their families along. This gave me the chance to ask some people who were close to Data Architects what a Data Architect does.
“I’m not really sure, but it has something to do with organizing data”, was the most common response.
My favourite response was, “My daughter says that her daddy draws little boxes, and puts words in them, and it makes him happy.”
So my point here is (again) that this simple question is, for some reason, very difficult to answer.
What is data anyway?
How about taking a step back? What the heck is data? We all use it, but do we understand it? We often use the term to mean several different things. This brings up the concept of the data pyramid.
Data makes up the base of the pyramid. The facts. Just the facts. Information is built on top of data using calculations, aggregations, and models. Knowledge is derived from information, and Wisdom comes from the long-term application of knowledge.
A Data Architect works in the first two levels – that’s why they are sometimes called Information Architects. The point here is that a Data Architect’s job deals with making sure that these foundational levels can support the load of the upper levels. But what does that mean?
What does that mean to a business?
Perhaps this is the real question. Every business uses data to make decisions. Sometimes that data is easy to get, the right decisions get made, and life is good. Sometimes it’s hard to get, but once you have it (as long as getting it didn’t take too much time), life is still OK. Sometimes the data required to make an informed decision is too hard to get, or doesn’t exist at all; bad decisions get made, and life is not good. The decision doesn’t go away just because you don’t have the data to support it.
A Data Architect – a real Data Architect – not only designs ways to hold the data so that it’s easy to get when you need it, but also helps figure out what’s missing (hopefully before a crisis). Then they build strategies and roadmaps that guide the business as it starts filling in the gaps.
A couple of real-world examples...
Company 1 - Understanding the source
All our reports tell us that everything is fine, but out in the field we can see that it's not. How can our reports be so off the mark?
This company had a lot of data, and companies with a lot of data tend to have a lot of reports. Some of the reports were fairly complex, involving lots of calculations and combining data from multiple sources.
When someone asks for a report that's 80-90% duplication of an existing report, it's human nature to take the shortcut: copy the existing report and add the missing 10-20%. In this case, the issue was that the business rules (filters, calculations, etc.) of the reports being copied were not fully understood. A report that was thought to have 90% of the data may really have had only 50%.
At its root, the problem was the difficulty of getting to the data. The answer was to build a single data store (similar to a data warehouse) that contained all the data required for reporting. The data was complete, with no filters applied, so that every report could be sourced from it.
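To make the idea concrete, here is a minimal sketch, assuming a hypothetical set of order records. The shared store keeps every row with no business rules baked in, and each report declares its own filter explicitly instead of silently inheriting one from a copied report. The `Order` fields (`region`, `status`, `amount`) and the `report` helper are illustrative assumptions, not the company's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    region: str
    status: str
    amount: float


# Hypothetical shared store: complete, unfiltered rows.
STORE = [
    Order("north", "shipped", 100.0),
    Order("north", "cancelled", 40.0),
    Order("south", "shipped", 250.0),
    Order("south", "pending", 60.0),
]


def report(rows, keep):
    """Sum order amounts for the rows that pass this report's own, explicit filter."""
    return sum(r.amount for r in rows if keep(r))


# Each report states its rule openly; nothing is hidden inside a copied report.
shipped_revenue = report(STORE, lambda r: r.status == "shipped")
all_bookings = report(STORE, lambda r: True)
north_pipeline = report(STORE, lambda r: r.region == "north" and r.status != "cancelled")
```

Because the store itself is unfiltered, a report that needs "90% of what another report shows" starts from the full data and adds only the filters it actually wants, rather than inheriting rules nobody remembers.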
Company 2 - When is an Asset not an Asset?
This company was having a difficult time reconciling data from different parts of their organization. They would spend the first half of their budget meetings trying to figure out why the numbers from accounting didn’t match the numbers from operations. After all, operations ordered and installed the stuff, and then told accounting what they did.
How did the numbers get messed up? It turned out in this case that the problem was hiding in the terminology. Operations started counting assets as soon as they took possession of them. Accounting only counted assets once they were installed and turned on, because that’s when they started earning money. Both groups used the term asset and thought they were using it the same way, but in practice they meant very different things. Creating two terms, Idle Assets and Activated Assets (and two places to store the data), solved the problem. A new data store is not always the answer.
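A minimal sketch of what "two terms" can look like in practice, assuming each asset record carries a hypothetical `received` date and an optional `activated` date. Instead of one ambiguous asset count, the two states are defined explicitly, so operations and accounting each get a number that means what they think it means. The `Asset` record, the `AssetState` enum, and the sample dates are illustrative, not taken from the company's system.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class AssetState(Enum):
    IDLE = "idle"            # in our possession, not yet earning money
    ACTIVATED = "activated"  # installed and turned on


@dataclass(frozen=True)
class Asset:
    asset_id: str
    received: date
    activated: Optional[date] = None


def state_on(asset: Asset, as_of: date) -> Optional[AssetState]:
    """Classify an asset as of a date; None if it had not been received yet."""
    if asset.received > as_of:
        return None
    if asset.activated is not None and asset.activated <= as_of:
        return AssetState.ACTIVATED
    return AssetState.IDLE


def count_by_state(assets, as_of):
    """Count assets in each state as of the given date."""
    counts = {s: 0 for s in AssetState}
    for a in assets:
        s = state_on(a, as_of)
        if s is not None:
            counts[s] += 1
    return counts


# Hypothetical sample records.
assets = [
    Asset("A1", date(2016, 1, 5), date(2016, 2, 1)),
    Asset("A2", date(2016, 1, 20)),
    Asset("A3", date(2016, 3, 1), date(2016, 3, 15)),
]
counts = count_by_state(assets, date(2016, 2, 15))

# The old, shared word "asset" meant idle + activated to operations,
# but only activated to accounting.
operations_view = counts[AssetState.IDLE] + counts[AssetState.ACTIVATED]
accounting_view = counts[AssetState.ACTIVATED]
```

With the states named separately, the gap between the two departments' figures is no longer a reconciliation puzzle; it is simply the idle count.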