How to avoid the most common CMDB mistake

Being lucky enough to work for something of a pioneering ITIL adopter back in the late 1990s, I guess I was a CMDB practitioner earlier than most. Back in those days, there was little “off-the-shelf” CMDB technology, and even automated discovery was in its infancy, with little adoption. But we were driving our ITSM initiative ambitiously, and every year the ITIL auditors would come and score us. We needed a CMDB pretty quickly, and it was clear we needed to build it ourselves.

This was great, because it meant we got to make all the usual CMDB mistakes early.  In our case, we only started to get it right third time.

On our first attempt, we built a tool that wasn’t really up to the task of holding the data we needed to hold.  That was definitely our fault, although ITSM platforms were pretty simplistic back then too.  With no off-the-shelf CMDBs to buy or to benchmark against, we were pretty much on our own, and we got our design wrong. We abandoned that, and tried again.

The next system was much better, but our implementation was marred by a second big error.  This mistake was driven by the received wisdom of its day, inspired by ITIL version 2 itself and its definition of the CMDB as…

“A database that contains all relevant details of each CI and details of the important relationships between CI”

Great, that’s clear. We need everything.

Unfortunately, despite ITIL having long cleared up its story, and technology bringing us clever tricks like data federation, the same fundamental mistake still gets made today.  In fact, it gets made a lot.

This type of failure is characterized by an approach that is founded on the data sources for the CMDB, not the required outputs.

  1. The organization identifies a bunch of data sources, and decides these will be the basis of its CMDB. These might be discovery tools, other management systems, spreadsheets, or more typically a combination of some or all of these things.

     [Figure: Simple representation of three data sources]

  2. The organization spends a lot of time, money and effort integrating the data sources into a single data store.
  3. With this hefty new database built, the organization tries to derive some outputs from the new CMDB.  And here, it hits a problem:

[Figure: Representation of a multi-source CMDB with overlaid output requirements, showing gaps and overlaps]

Suddenly, the organization faces the realization that, having made its CMDB out of “everything”, its “everything” is both too much and not enough. It’s too much, because a whole bunch of data is being included, expensively, for which there is no actual end result. At the same time, it’s not enough, because the outputs the organization actually needs are not completely supported by the data in the CMDB anyway, due to two key problems, gaps and overlaps:

[Figure: Close-up representation of the CMDB failing to support its output requirements due to gaps and overlaps]
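
The “too much and not enough” problem comes down to simple set arithmetic. Here is a minimal sketch of that comparison, in which every source name and attribute name is a made-up assumption rather than anything from a real tool. It compares the attributes the outputs require against the attributes each source happens to supply:

```python
from collections import Counter

# Hypothetical example data: attribute and source names are illustrative
# assumptions, not taken from any real CMDB or discovery product.
required = {"hostname", "owner", "location", "service", "os_version"}

sources = {
    "discovery": {"hostname", "os_version", "ip_address", "installed_patches"},
    "asset_register": {"hostname", "location", "purchase_date"},
    "spreadsheet": {"service", "location"},
}


def analyse_coverage(required, sources):
    """Classify attributes as gaps, overlaps, or surplus."""
    # Count how many sources supply each attribute.
    supply = Counter(attr for attrs in sources.values() for attr in attrs)
    supplied = set(supply)
    return {
        # Required by an output, but no source provides it.
        "gaps": required - supplied,
        # Required, and provided by more than one source.
        "overlaps": {attr for attr in required if supply[attr] > 1},
        # Provided by a source, but no output needs it.
        "surplus": supplied - required,
    }


report = analyse_coverage(required, sources)
for category, attrs in report.items():
    print(f"{category}: {sorted(attrs)}")
```

With this sample data, `owner` is a gap, `hostname` and `location` are overlaps, and the remaining discovered and purchased-asset attributes are surplus that no output currently uses.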

This mistake is devastating to a project, but it’s completely avoidable if a few fundamentals are followed in any CMDB initiative:

  • Focus on the requirements, not the data you happen to own.
  • Identify what data is needed to support those requirements.  This is the data your CMDB needs.
  • If there are overlaps (i.e. items of required CMDB data which could be sourced from more than one place), you need to determine the best source. A good CMDB tool needs to support effective reconciliation of multiple sources, with appropriate priority for the best source of each item.
  • If there are gaps, where no obvious source is available, there are two basic choices: either re-think the requirement, or find a way to get that data.
  • Know how every piece of data is kept accurate. As soon as governance fails, trust is lost in the CMDB. That’s fatal.
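
As a rough illustration of the reconciliation point above, here is a minimal sketch of per-attribute source precedence. It is not how any particular CMDB product does it; the `PRECEDENCE` table, the source names, and the sample values are all hypothetical. Each attribute lists its sources best-first, and the merged record remembers which source supplied each value, so that it stays clear who is responsible for keeping it accurate.

```python
# Hypothetical precedence rules: for each required attribute, candidate
# sources are listed best-first. All names are illustrative assumptions.
PRECEDENCE = {
    "hostname": ["discovery", "asset_register"],
    "location": ["asset_register", "spreadsheet"],
    "owner": ["asset_register"],
}


def reconcile(records, precedence):
    """Merge per-source records for one CI into a single record.

    records: {source_name: {attribute: value}}
    Returns (merged, unresolved). merged maps each attribute to a
    (value, source) pair; unresolved lists attributes for which no
    listed source had a usable value (the gaps).
    """
    merged, unresolved = {}, []
    for attr, ranked_sources in precedence.items():
        for source in ranked_sources:
            value = records.get(source, {}).get(attr)
            if value not in (None, ""):
                # Take the first usable value and record where it came from.
                merged[attr] = (value, source)
                break
        else:
            # No source in the ranking supplied this attribute.
            unresolved.append(attr)
    return merged, unresolved


records = {
    "discovery": {"hostname": "web01.example.com"},
    "asset_register": {"hostname": "WEB-01", "location": ""},
    "spreadsheet": {"location": "London DC2"},
}

merged, unresolved = reconcile(records, PRECEDENCE)
print(merged)
print(unresolved)
```

Here the discovery tool wins for `hostname`, the spreadsheet fills in `location` because the preferred source left it blank, and `owner` comes back unresolved: a gap that needs either a new source or a re-think of the requirement.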

Of course, as the middle diagram above illustrates, some data sources might actually provide more data than the initial requirements set requires.  For example, automated discovery tools may gather a lot more information than is initially needed.  This isn’t necessarily a bad thing: future requirements or investigative data mining might each benefit from this data.

If there’s little extra cost to maintaining this extra data (as might be the case if it’s automatically supplied by discovery tools), then it might be worth hanging on to. If it’s complex, manual, time-consuming, and doesn’t support any outputs, then why bother?