In this five-part series, I’m walking you through some of the details around trusted data discovery, or governed data discovery. The goal has been to get you past the mom-and-apple-pie idea of trusted data discovery and into some specific areas you can address at your company.

Today, in the final blog of the series, we’re discussing the foundational data pieces of trusted data discovery. Your data foundation should ensure confidence in your reporting landscape by putting in place the solutions and processes needed to actively manage your core asset: your data.

Common Use Cases

Let’s just take a look at one common use case. In this case, you are populating your data warehouse with clean, governed data from many, many systems. And along the way, you’re gathering metadata to make sure that you have visibility into how data is moving, how it’s being transformed, and who is using that data.
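To make that metadata-gathering step concrete, here is a minimal sketch of capturing lineage alongside a load. The function and catalog names are illustrative assumptions, not an SAP API; real tooling records far richer metadata automatically.

```python
from datetime import datetime, timezone

def load_with_lineage(records, source_system, transform, catalog):
    """Apply a transformation and record lineage metadata alongside the load."""
    transformed = [transform(r) for r in records]
    catalog.append({
        "source": source_system,
        "transformation": transform.__name__,
        "row_count": len(transformed),
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    })
    return transformed

# Usage: normalize country codes from a hypothetical CRM extract.
catalog = []

def upper_country(rec):
    return {**rec, "country": rec["country"].upper()}

rows = load_with_lineage(
    [{"id": 1, "country": "us"}, {"id": 2, "country": "de"}],
    "crm", upper_country, catalog)
```

The point of the catalog is that every load leaves a trail: which system the data came from, how it was transformed, and when.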


You aren’t confined to just the structured data that we all know and love. Many of the insights and innovation you’re looking for come from pairing new sources with your classic, structured sources. So how can you blend those unstructured sources with these classic sources? By using the same tools and the same process.


Notice that the target remains the data warehouse. But now your sources are joined, cleaned, and fit-for-use.

Detail: How Can You Tell Where the Data Came From on a Report?

If your business wants more efficient and collaborative business processes for their analytics, consider starting here. To trust the data in a report, it’s nice to know where it came from. You can identify where data came from, which reports use the data, and who would be impacted by potential changes in the data sources. You can even identify which objects are not conforming to standards.
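That impact analysis is, at heart, a walk over a lineage graph. As a sketch (the field and report names below are made up for illustration, and a real lineage tool derives these edges automatically):

```python
from collections import defaultdict

# Edges point downstream: a source field -> the objects that consume it.
lineage = defaultdict(set)

def add_edge(upstream, downstream):
    lineage[upstream].add(downstream)

def impacted(node):
    """Return every downstream object affected by a change to `node`."""
    seen, stack = set(), [node]
    while stack:
        for nxt in lineage[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

add_edge("erp.orders.amount", "dw.fact_sales.amount")
add_edge("dw.fact_sales.amount", "report.quarterly_revenue")
```

Asking `impacted("erp.orders.amount")` tells you both the warehouse column and the report that would break if that source field changed.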

Normally, you have your Business Intelligence Competency Center (BICC) involved here, as well as your Information Governance team. With these teams in place, you can also provide a data dashboard for data stewards to monitor the quality of their BI data over time.

SAP does this with SAP Information Steward.

Detail: How Can You Tell What the Data Means?

Before consuming varied data in a self-service scenario, you need to know what data you are actually consuming. This leads to one of the best practices – establish a business glossary. This glossary should include these items:

  • Name of the field, and common aliases
  • Owner of the field
  • Definition
  • Business impact if the field is incorrect
  • Allowable values and common mistakes
  • Key reports, data warehouses, and business processes that use this data element
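The glossary items above map naturally to a simple record structure. Here is a minimal sketch in Python; the class and field values are illustrative assumptions, not how SAP Information Steward stores entries:

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    name: str
    aliases: list = field(default_factory=list)
    owner: str = ""
    definition: str = ""
    business_impact: str = ""
    allowable_values: list = field(default_factory=list)
    used_by: list = field(default_factory=list)  # key reports, warehouses, processes

glossary = {}

def register(entry):
    glossary[entry.name] = entry
    for alias in entry.aliases:
        glossary[alias] = entry  # lookups by any alias resolve to one definition

register(GlossaryEntry(
    name="customer_tier",
    aliases=["cust_tier", "tier"],
    owner="Sales Ops",
    definition="Contractual service level assigned to the customer.",
    business_impact="Mis-tiered customers get the wrong discounts.",
    allowable_values=["gold", "silver", "bronze"],
    used_by=["Quarterly Revenue report"],
))
```

Registering aliases against the same entry is what gets everyone speaking the same language: whichever name a report author uses, they land on one agreed definition.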


Once you have this glossary, you can expose the definitions within your transactional systems, within your BI reporting, and on portals. Suddenly, everyone is speaking the same language! Check out this hilarious video from Organic Valley on the benefits they achieved by using SAP Information Steward’s Metapedia for this purpose!

Detail: How Can I Make Sure The Data Is Fit-for-Use?

Fit data inspires confident decision making with trusted analytics, and lets you work with both structured and unstructured data in the same environment. Fit-for-use requires the definitions (described in the Business Glossary section above), and then the transformations to re-form the data. Those transformations are done with data quality software. But your team has to make some decisions. Where is it most important to fix the data quality?

  • Interactively, as data is entered into the source, with a data quality firewall. (The best option if you can do it: the data is then fit for every consumer of that data.)
  • At the source. (Ideal, but the most expensive. And sometimes, the source is out of your control.)
  • En route. (After the source, but before populating the data warehouse.)
  • As highly governed, high-value enterprise data assets managed by the information governance team.
  • At the target, monitoring data quality so it stays clean.
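To show what the firewall style looks like in miniature, here is a sketch of validating a record at entry time. The rules and field names are illustrative assumptions; a real data quality firewall applies far richer rules (address standardization, reference data checks, and so on):

```python
import re

# Illustrative per-field validation rules.
RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "country": lambda v: v in {"US", "DE", "FR"},
}

def firewall(record):
    """Reject a record at entry time if any field fails its rule."""
    errors = [f for f, ok in RULES.items()
              if f in record and not ok(record[f])]
    return (len(errors) == 0, errors)

ok, errs = firewall({"email": "a@b.com", "country": "US"})
bad, bad_errs = firewall({"email": "not-an-email", "country": "US"})
```

The payoff of checking at entry is exactly the point made above: bad values never reach the source system, so every downstream consumer inherits clean data.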

You need a single tool that will support all of these different styles, as the style choice will vary with domain and source application. SAP does this with SAP Data Services, which supports a large number of data sources, and has in-depth integration into SAP tools.

Detail: How Can I Prepare The Data in Advance of Using Analytics Tools?

All of this great functionality is nice, but sometimes it can be overwhelming. What if you just want to do a bit of data quality before moving a relatively simple data set into your data warehouse, and use it for reporting?

We do that, too. For this use case, we use a brand-new solution based on the same great SAP Data Services technology: SAP Agile Data Preparation. With this solution, your business analysts can clean the data in an easy user interface without IT having to get involved.

Use this solution when you need a bit more data preparation than the Prepare room of SAP Lumira can handle, like automatically correcting addresses and finding duplicates across the sources. For example, notice how the spreadsheets are merged according to how you want to join the data.


For an in-depth demo, try this YouTube video.


All of the data foundation issues described in this blog need to be addressed to give your end-users TRUST in the data they are consuming in analytics. They need to not only have easy access to these enterprise, curated sources, but also need to know that the data in these sources is indeed fit-for-use.

Thanks for reading this series. Does your organization have trust in the data discovery numbers? Follow the principles in this series, and soon they will!

All blogs in this series: