by Chandran Saravana, Predictive Analytics, SAP

I’ve been interacting with a few customers over the last few months, and our core discussions have focused on data governance in the new age of data discovery and predictive modeling. One recurring topic was the gap between the governed and the ungoverned, which many of them see widening. Several executives worried about the well-known perils of ungoverned data discovery, predictive modeling, and, ultimately, the decisions that result.

These discussions inspired me to write this blog post. The intent is not to boil the ocean and cover every aspect of data governance, but to focus on the new age of data discovery and predictive modeling and its impact on data governance.

Data Discovery and Predictive Modeling Impact Data Governance

I see two worlds – one with complete control, focused on trusted information and served by traditional business intelligence (BI), and another with ultimate flexibility and self-service (enabling users outside IT), served by the new age of data discovery and predictive modeling. The rise of self-service data discovery and ad-hoc analysis in the enterprise stems primarily from the inefficiencies of IT (and the limitations of existing tools), most often around speed and agility.

The gap between these two worlds is widening in the context of data governance. The new age of data discovery and predictive modeling raises many issues; here is a subset:

  • Unapproved or unknown data sources
  • Good analysis, bad data, and bad decisions
  • Insecure data storage
  • Proliferation of unmanaged analysis and predictive modeling

Traditional Data Governance Is Limited

Traditionally, data governance focuses on data lifecycle management, data quality, and data security and privacy, supported by data architecture, metadata management, and master data management. This serves the traditional BI world nicely, but it does not serve the new age of self-service-driven data discovery and predictive modeling.

In fact, traditional data governance is one of the indirect drivers of self-service-driven data discovery and predictive modeling, and it has to evolve to address this new age’s needs. The new age of data discovery and predictive modeling touches many types of data – inside the enterprise and out, ranging from structured to semi-structured and unstructured.

It’s more critical now than ever that users be given the freedom of self-service data discovery and predictive modeling without compromising security, privacy, and compliance. At the same time, speed and agility must be leveraged to produce analysis and influence decision makers with trusted information.

In the context of governance, leaders need to consider many aspects in this new age of data discovery and predictive modeling:

  • Build a normalized enterprise data store for self-service
  • Make normalized datasets easier for end users to access
  • Restrict access to source datasets by URI: example link – http://mydataset.v1
  • Have self-service tools process data at the source (that is, without making a copy of the dataset on the local system): for example, cloud processing
  • Give self-service end users the ability to publish new datasets, and build a community that enforces traditional data governance (most importantly, one that enriches data quality)
  • Provide a collaboration platform for collaborating on data, models, findings, and more
  • Make those findings and insights easily accessible to end users
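To make a couple of these aspects concrete – restricting source-dataset access by URI and letting end users publish new datasets with their lineage recorded – here is a minimal sketch of a hypothetical dataset catalog. All class and method names are illustrative assumptions, not part of any SAP product or real API:

```python
# Hypothetical sketch of a governed dataset catalog: datasets are
# registered under a URI, read access is checked per user, and
# publishing a new dataset records its lineage for governance review.

class DatasetCatalog:
    def __init__(self):
        self._datasets = {}   # uri -> metadata dict
        self._acl = {}        # uri -> set of users allowed to read

    def publish(self, uri, owner, source_uri=None):
        """Register a new dataset; lineage (source_uri) is kept for audits."""
        self._datasets[uri] = {"owner": owner, "lineage": source_uri}
        self._acl[uri] = {owner}

    def grant(self, uri, user):
        """Grant an additional user read access to a dataset."""
        self._acl[uri].add(user)

    def can_read(self, uri, user):
        """Restrict source-dataset access by URI: only granted users pass."""
        return user in self._acl.get(uri, set())


catalog = DatasetCatalog()
catalog.publish("http://mydataset.v1", owner="alice")
catalog.grant("http://mydataset.v1", "bob")

print(catalog.can_read("http://mydataset.v1", "bob"))    # True
print(catalog.can_read("http://mydataset.v1", "carol"))  # False
```

The point of the sketch is the design choice: access control and lineage live with the catalog entry itself, so self-service users keep their freedom while governance retains a single place to audit who can see what and where each published dataset came from.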

What are your thoughts on the future of data governance?