The Association of Metadata with Data Governance

Historically, we hear metadata defined as “data about data.” I prefer to describe metadata as data that defines and describes other data within a given context or set of circumstances, for particular purposes, and from specific viewpoints. All metadata is data, but that does not mean all data is metadata. In other words, metadata exists only to describe the characteristics surrounding a particular data construct. With enough characteristics stored in a database (metadata repository) or a data governance tool such as Collibra, the Data Governance organization is able to formulate the appropriate level of conceptual models that both business and IT can read, understand, and use to make quantifiable data management decisions.

For Data Governance to succeed at managing data assets and related resources, it must rely on metadata. Metadata is used to describe matters such as who does what with the data, who needs the data, who produces the data, and even who is accountable for the data (i.e., the data steward). Frankly, we are collecting metadata all the time. We have it in our data models, classification schemes (data protection), databases, glossaries, issues logs, process flows, system maps, business rules, quality rules, privacy rules, steward assignments, reports, ETL processes, data lineage (lifecycle), and even on our organizational charts. Yes, metadata is simply about collecting information about data, activities, and people.

As you would expect, metadata does not govern itself. It too is data that needs to be governed using the same activities the organization uses for non-metadata.

Metadata is captured based on information asset requirements and is described and used in various models and business intelligence reports. Traversing the evolutionary lifecycle of metadata is relatively straightforward. You define what the business needs to know, design and build related models and reports, and create and distribute informational assets (models and reports) to appropriate parties. We use a three-stage model to convey how metadata moves through this lifecycle: definition, generation, and consumption.

[Figure: the three-stage metadata lifecycle: definition, generation, and consumption]

The Definition Stage is where Data Governance establishes the requirements (what to collect) and standards for the creation of conceptual models. These requirements are nothing more than questions that need to be answered through metadata; they solidify what metadata is required and what that metadata means across the enterprise. Some example questions metadata helps to answer are:

  • What customer or product data is linked to what system, and what processes use that system?
  • Who is accountable for that system?
  • Who are the users of that system and where are they located?
  • What sales channels are linked to that system?
  • Where is all of the PII, PCI, and other sensitive data?
  • What is the meaning of “customer” across all lines of business?
  • What is the most reliable data source?
  • Who are the data stewards for that master data domain?
  • Who has the authority to create, read, update, and delete data-related resources?
  • Where does the data reside and how does it flow through the system?
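To make a couple of these questions concrete, here is a minimal sketch of how metadata records might answer them. The `DataElement` schema, the sample entries, and the helper names are all assumptions made for illustration; a real metadata repository or governance tool would define its own structure.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One metadata record describing a single data element (hypothetical schema)."""
    name: str
    system: str
    steward: str
    classification: str = "internal"  # e.g. "PII", "PCI", "internal"
    processes: list[str] = field(default_factory=list)


# Illustrative entries only; not drawn from any real catalog.
CATALOG = [
    DataElement("customer_email", "CRM", "A. Rivera", "PII", ["onboarding", "marketing"]),
    DataElement("card_number", "Billing", "J. Chen", "PCI", ["payments"]),
    DataElement("product_sku", "Inventory", "J. Chen", "internal", ["fulfillment"]),
]


def sensitive_elements(catalog, classes=("PII", "PCI")):
    """Answer: where is all of the PII, PCI, and other sensitive data?"""
    return [(e.name, e.system) for e in catalog if e.classification in classes]


def stewards_for_system(catalog, system):
    """Answer: who is accountable for the data held in this system?"""
    return sorted({e.steward for e in catalog if e.system == system})


print(sensitive_elements(CATALOG))
print(stewards_for_system(CATALOG, "Billing"))
```

The point of the sketch is that each question from the Definition Stage becomes a simple lookup once the required characteristics are captured in one agreed place.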

The Generation Stage is where the Data Governance organization produces conceptual models and reports to answer questions derived from the Definition Stage. These models can simply be spreadsheets in the short term. However, it’s recommended that the Data Governance organization depend on more sophisticated models that conceptualize the relationships of a particular data resource or data element across all business and IT viewpoints. This spares users the pain caused when copies of metadata live in multiple spreadsheets and one copy deviates from the others, which forces research and erodes the level of trust required for a successful Data Governance program. In this stage it’s important that all participants agree on where the source of metadata shall reside, how it should be accessed, and when it’s appropriate to collect it from any specific data management process (e.g., data quality management, master data management, data architecture, and so on).

The Consumption Stage is probably the most important. It’s responsible for providing the right conceptual models to the right users at the right time to make the right decisions. For example, users receive the conceptual models before any project work begins so that they can perform an Impact Assessment on any potential changes, thus allowing both business and IT to locate the impacted data element, system, data steward, and so on. Metadata definitions unlock the value of data, turning enterprise information into assets.
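An Impact Assessment of this kind can be sketched as a walk over lineage metadata. The example below assumes lineage is stored as a simple mapping from each element to the elements it feeds; the element names, steward assignments, and function names are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each key feeds the elements listed as its value.
LINEAGE = {
    "crm.customer_email": ["dw.dim_customer"],
    "dw.dim_customer": ["report.churn", "report.revenue"],
    "billing.invoice_total": ["report.revenue"],
}

# Hypothetical steward assignments.
STEWARDS = {
    "dw.dim_customer": "A. Rivera",
    "report.churn": "J. Chen",
    "report.revenue": "M. Okafor",
}


def impacted_by(element, lineage):
    """Return every downstream element reachable from `element`, breadth-first."""
    seen, queue, order = {element}, deque([element]), []
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def impact_assessment(element, lineage, stewards):
    """Pair each impacted element with its steward ("unassigned" if none is recorded)."""
    return {e: stewards.get(e, "unassigned") for e in impacted_by(element, lineage)}


print(impact_assessment("crm.customer_email", LINEAGE, STEWARDS))
```

Running the assessment before project work begins tells both business and IT which downstream elements a change would touch and whom to contact about each one.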

Metadata is absolutely needed to keep data governance running smoothly. You will know you have the correct level of metadata when your quality dashboard reflects an improvement in data quality, users understand the conceptual models and use them for their designed purposes, data is being protected and you can prove it, and, last but not least, analytical reporting capabilities are indeed more reliable.

Since all metadata is really data, successful data governance mandates that companies govern metadata itself using the same established data governance principles. In that sense, metadata issues are tracked and metadata repositories are kept up to date as part of everyday duties, including recording metadata updates related to both issues and projects. Not all companies seem to grasp that data governance requires metadata updates to happen as part of typical issue resolution and project life cycles. A conceptual model reflecting all the projects in flight or planned that will impact a particular data element is a useful model to mandate, especially when you are concerned about data protection and maintaining BCBS 239 compliance.

Where does one start? Take inventory of data-related assets such as existing systems, critical processes, users, stewards, data sets, and so on. In conjunction, flesh out the business glossary and work with IT to define the data dictionary, which probably has many technical permutations of the business terms defined in the glossary. Along the way, the Data Governance team must set time aside to build an inventory of questions that both the business and IT agree are worth answering on a specific cadence. At the end of the day, the enterprise will start to feel as if it’s speaking a common vocabulary, that it’s able to better manage data risks and data challenges, and that it can trust financial and operational reports.
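As a rough illustration of how a glossary and a data dictionary relate, the sketch below groups physical columns under the business term they implement and flags columns whose term has not yet been defined. The terms, definitions, column names, and function names are assumptions chosen for the example.

```python
# Hypothetical business glossary: term -> agreed definition.
GLOSSARY = {
    "Customer": "A party that has purchased at least one product.",
    "Product": "An item or service offered for sale.",
}

# Hypothetical data dictionary: physical column -> business term it implements.
DATA_DICTIONARY = {
    "crm.cust_id": "Customer",
    "billing.customer_no": "Customer",
    "web.visitor_id": "Prospect",
}


def technical_permutations(glossary, dictionary):
    """Group physical columns under the glossary term they implement."""
    grouped = {term: [] for term in glossary}
    for column, term in sorted(dictionary.items()):
        if term in grouped:
            grouped[term].append(column)
    return grouped


def unmapped_columns(glossary, dictionary):
    """List columns whose business term is missing from the glossary."""
    return sorted(c for c, t in dictionary.items() if t not in glossary)


print(technical_permutations(GLOSSARY, DATA_DICTIONARY))
print(unmapped_columns(GLOSSARY, DATA_DICTIONARY))
```

A term with no columns, or a column with no term, is a natural candidate for the inventory of questions the business and IT agree to answer.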
