Shoehorning into Oblivion: The COTS System Strikes Back

When I was first getting started in the business of Information Technology, there weren’t many Commercial Off-The-Shelf (COTS) systems out there. Systems were written in-house to automate business processes, resulting in a pretty good alignment between the business’s view of its data and the way it was stored. Reporting needs were usually basic, consisting mostly of what we now call operational reports.
 
Over time, starting with standard processes like accounting, companies discovered that they could replace their aging in-house systems more quickly, and for less money, by buying packages of pre-written, commercially available software. Sure, these packages didn’t always match the business terminology exactly, but they were close enough given the cost savings. Much to the chagrin of I.T. organizations everywhere, these systems were installed, and the data (which was becoming more important than ever before) was shoehorned into the pre-made containers provided by the packages.
 
Fast forward to today. I work as a Data Architect, consulting for a wide variety of companies that are struggling with data they no longer understand and can’t easily access. Reporting often takes the form of thousand-line queries that try to untangle the data from its pre-fab home, where User_Field_264 from the Customer_Extra_Extension table represents the Account Start Date.
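
To make the name-mapping half of the problem concrete, here is a minimal sketch in Python of the kind of lookup a reporting layer ends up carrying around. Only User_Field_264 and its meaning come from the example above; the function name, the second column, and the sample values are made up for illustration.

```python
# Hypothetical mapping for the customer extension table.
# Only User_Field_264 -> "Account Start Date" comes from the example in the text.
CUSTOMER_EXTRA_COLUMNS = {
    "User_Field_264": "Account Start Date",
}


def to_business_row(row, column_names=CUSTOMER_EXTRA_COLUMNS):
    """Rename known COTS columns to business terms; leave unknown columns as-is."""
    return {column_names.get(col, col): value for col, value in row.items()}


# Illustrative row; the values are invented.
row = {"User_Field_264": "2019-03-01", "User_Field_265": "X"}
renamed = to_business_row(row)
```

Unmapped columns keep their system names on purpose, so the gaps in the mapping stay visible instead of being silently dropped.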
 
If a simple name mapping were all that was required, we wouldn’t have much of a problem. The real scourge is more insidious. COTS systems are coded to maximize performance and flexibility so that they can be made to work for the widest possible range of customers. They are designed to accept the data and process it in pre-determined ways. They are not designed to care for, and provide easy access to, their customers’ valuable data. Over time, I.T. stops talking about the Customer Sub-account data and starts talking about the Customer_Extra_Extension data. Not long after that, the business starts doing the same, and the meaning of the data starts to get lost.
 
When I go into a new company, I don’t start by looking at their systems. I start by meeting with the I.T. and business people who use the data on a daily basis, and I ask them one simple question, “Tell me about your data”. This is often met with blank stares. “What is your main piece of data? The thing you use most.”
 
“Oh!”, they exclaim, “That would be the Entities.”
 
“Great!”, I say, “What is an Entity?”
 
More blank stares. “Well, an Entity is... Well, everything is an Entity.”
 
Bingo. There is the problem right there. A multi-purpose system construct has overtaken the real meaning of the data. These Entities all have similar processing requirements, but not necessarily similar business meaning.
 
It takes time, but together we build a Conceptual model of the Business data: a model that depicts the different bits of data that the business works with, the data that I.T. is constantly struggling to pull out of their cost-effective COTS system so they can meet the business’s reporting needs. We gradually find out that the business doesn’t have Entities. The business has Fund Portfolios, and Sleeves, and Series, and Benchmarks, and Indexes, and Client Portfolios. The business doesn’t really care that all of these concepts have been shoehorned into a construct called Entity. They just want their data.
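
One way to picture the result is a small sketch, again in Python, that gives each business concept its own name and sorts generic Entity rows back into them. The concept names are the ones listed above; the `entity_type` field and the two-letter codes are assumptions I have invented for illustration, since every COTS package tags its records differently.

```python
from enum import Enum


class BusinessConcept(Enum):
    """The concepts the business actually works with, per the conceptual model."""
    FUND_PORTFOLIO = "Fund Portfolio"
    SLEEVE = "Sleeve"
    SERIES = "Series"
    BENCHMARK = "Benchmark"
    INDEX = "Index"
    CLIENT_PORTFOLIO = "Client Portfolio"


# Hypothetical type codes; a real system would have its own discriminator.
ENTITY_TYPE_CODES = {
    "FP": BusinessConcept.FUND_PORTFOLIO,
    "SL": BusinessConcept.SLEEVE,
    "SE": BusinessConcept.SERIES,
    "BM": BusinessConcept.BENCHMARK,
    "IX": BusinessConcept.INDEX,
    "CP": BusinessConcept.CLIENT_PORTFOLIO,
}


def classify(entity_row):
    """Return the business concept for a generic Entity row, or fail loudly."""
    code = entity_row.get("entity_type")
    concept = ENTITY_TYPE_CODES.get(code)
    if concept is None:
        raise ValueError(f"Entity type {code!r} has no business meaning assigned yet")
    return concept
```

Raising an error for an unrecognized code is a deliberate choice in this sketch: an Entity nobody can name is exactly the conversation the modeling sessions are meant to start.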
 
Not surprisingly, one of the more important artifacts that comes out of this process is a Business Glossary: a document that lists and clarifies the business terms that have been forgotten. This includes the contextual terms, as at one company where Accounting called something an Asset, and Operations called something an Asset, but not all of Operations’ Assets were Accounting’s Assets and vice versa. Before that clarification, Accounting kept telling Operations that their reports were wrong, and Operations kept telling Accounting that there was no way their numbers were right.
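
A glossary that handles contextual terms can be sketched as a lookup keyed by both the term and the department using it. This is only an illustration under my own assumptions: the structure and function names are hypothetical, and the definitions are placeholders, because the real wording has to come from the business.

```python
# Glossary keyed by (term, context). The definitions are placeholders only.
GLOSSARY = {
    ("Asset", "Accounting"): "Placeholder: Accounting's definition goes here.",
    ("Asset", "Operations"): "Placeholder: Operations' definition goes here.",
}


def contexts_for(term):
    """List the contexts (departments) that define the given term."""
    return sorted(ctx for (t, ctx) in GLOSSARY if t == term)


def define(term, context=None):
    """Look up a term; insist on a context when more than one definition exists."""
    contexts = contexts_for(term)
    if not contexts:
        raise KeyError(f"{term!r} is not in the glossary")
    if context is None:
        if len(contexts) > 1:
            raise ValueError(f"{term!r} is ambiguous; specify one of {contexts}")
        context = contexts[0]
    return GLOSSARY[(term, context)]
```

Refusing to answer "what is an Asset?" without a context is the point of the design: it forces the question that Accounting and Operations had stopped asking each other.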
 
So the Scourge of the COTS System isn’t lost I.T. jobs, nor loss of control of the company’s systems. It’s loss of control of the company’s data. Lost or muddled knowledge of the data. Loss of terminology that accurately describes the data. It’s confusion, and miscommunication, and rework. It’s duplication of data, and faked data that’s been added to trick the system into processing it the way the business needs. It’s the need for a reporting architecture on top of the COTS systems to untangle the data and put it back in a form that reflects the way the business needs to use it.