What Comes After FAIRification?

Once your data has been FAIRified, the next step is to determine how to extract value from FAIR data. This involves addressing key questions such as: How can the value of FAIR data be realized? How can the return on investment (ROI) for FAIRification efforts be justified?

FAIR data is a stepping stone to better knowing and understanding your data.

Why Data Catalogs Often Fail to Deliver Value

Despite their potential, data catalogs frequently fall short of expectations due to several common challenges:

  • Customer-Centricity: It is essential to identify the true users of a data catalog. Who needs it? Were these users actively seeking a catalog, or was the need imposed upon them without their input?
  • Relevance: The information and metadata showcased in the data catalog must be meaningful and useful to its users. Additionally, mechanisms to capture user feedback and suggestions are often missing, leading to a disconnect between the catalog and its audience.
  • Standardization: Different organizational silos often have their own methods for organizing and categorizing data, resulting in inconsistencies that hinder the catalog’s effectiveness.
  • Maintenance: As datasets are added, updated, or removed, the metadata within the catalog may not be consistently updated, leading to outdated or inaccurate information.
  • Integration:
    • Many data catalogs lack the ability to integrate seamlessly with other data management tools and systems.
    • They also often fail to link datasets with other relevant information and knowledge. In reality, data is interconnected, and isolated datasets that do not integrate with others are of limited utility.

To address these challenges, we propose an enhanced set of metadata to complement the existing FAIR metadata framework. This extended metadata set enables the creation of data inventories, catalogs, and marketplaces.
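As a concrete illustration, such an extended record might pair standard FAIR/DCAT-style descriptive fields with inventory- and marketplace-oriented fields. The sketch below is a minimal, hypothetical structure; none of the field names are taken from a published standard:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtendedDatasetMetadata:
    # FAIR / DCAT-style descriptive metadata
    identifier: str
    title: str
    description: str
    license: str
    access_url: str
    # Inventory-oriented fields (hypothetical): readiness for cataloging
    metadata_completeness: float = 0.0   # fraction between 0.0 and 1.0
    quality_checked: bool = False
    # Marketplace-oriented fields (hypothetical): terms of exchange
    usage_conditions: str = "internal"
    contract_id: Optional[str] = None

record = ExtendedDatasetMetadata(
    identifier="doi:10.1234/example",
    title="Clinical imaging cohort",
    description="Anonymized MRI scans with annotations",
    license="CC-BY-4.0",
    access_url="https://example.org/data/mri-cohort",
)
print(record.metadata_completeness)  # 0.0 until assessed
```

The inventory-oriented fields feed the data inventory, while the descriptive and marketplace-oriented fields surface in the catalog and marketplace respectively.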

 

Catalog vs. Inventory vs. Marketplace

Data Catalog

A data catalog serves as a customer-facing resource, functioning as a centralized directory that allows users to locate data products available for consumption. It is an enterprise-wide asset that provides a comprehensive reference for the description, location, and usage conditions of any dataset. Only datasets that have been FAIRified are eligible for inclusion, as the application of FAIR metadata is a prerequisite for effective cataloging.

Data Inventory

In contrast, a data inventory is not intended for customer use. It comprises technical information about datasets that is relevant to data management professionals rather than business users. The inventory focuses on aspects such as availability, quality, and completeness of metadata, ultimately indicating whether a data asset is ready to be cataloged.
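The inventory's readiness signal can be sketched as a simple completeness check over required metadata fields. The field list and threshold below are illustrative assumptions, not a prescribed rule:

```python
# Hypothetical minimum set of metadata fields a catalog entry needs
REQUIRED_FIELDS = ["identifier", "title", "description", "license", "access_url"]

def catalog_ready(metadata: dict, threshold: float = 1.0) -> bool:
    """Return True when enough required metadata fields are present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS if metadata.get(f))
    return present / len(REQUIRED_FIELDS) >= threshold

complete = {"identifier": "ds-001", "title": "Sales 2023",
            "description": "Monthly sales figures", "license": "CC0",
            "access_url": "https://example.org/sales-2023"}
partial = {"identifier": "ds-002", "title": "Drafts"}

print(catalog_ready(complete))  # True
print(catalog_ready(partial))   # False
```

In practice the check would also cover quality and availability signals, but the principle is the same: the inventory decides, field by field, whether an asset is ready to be cataloged.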

Data Marketplace

A data marketplace is an online platform that enables the exchange of datasets between data providers and consumers. In this environment, data providers can present their datasets, allowing consumers to compare, select, and access the data that meets their needs. Additionally, data marketplaces facilitate transactions and the establishment of data contracts, which outline the rights and responsibilities of both parties involved in the exchange.
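A data contract can be represented as a small structured record listing the parties and their agreed terms. All fields below are hypothetical, for illustration only:

```python
from datetime import date

# Hypothetical data contract between a provider and a consumer
contract = {
    "dataset_id": "ds-001",
    "provider": "Acme Data Co.",
    "consumer": "Research Lab B",
    "permitted_use": "non-commercial analysis",
    "redistribution_allowed": False,
    "valid_until": date(2026, 12, 31).isoformat(),
}

def is_active(contract: dict, today: date) -> bool:
    """ISO date strings compare correctly as plain strings."""
    return today.isoformat() <= contract["valid_until"]

print(is_active(contract, date(2025, 1, 1)))  # True
```

A marketplace would persist such records alongside each transaction, so that both parties' rights and responsibilities remain auditable.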

 

From Knowing to Understanding Data

What does it mean to “know” data and then to “understand” data?

First, “E” for Existing: it is important to know whether a dataset exists at all. At some point, someone collected this data, somehow and for some reason.

Second, “FAR” for Find, Access, and Reuse: you need to locate the data. It may not be readily available, so understanding how to access it and whether it can be reused is crucial. Often, data does not adhere to the FAIR principles, making this process challenging and time-consuming. Ideally, metadata that clarifies these aspects should be easily accessible in a data catalog.

Third, “C” for Context: if a dataset was not created with your needs in mind, do not expect it to match them. In the best-case scenario, only a portion of the dataset (a subset) may fulfill some of your needs.

Fourth, “P” for Profiling: to determine which subset of a dataset will meet your requirements, you need to engage in data profiling: identify which tables, columns, documents, images, … can add value to your work.
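A first pass at profiling a tabular dataset can be sketched with the standard library alone: for each column, infer a rough type and count missing values. The inference rules below are deliberately simplistic assumptions, not a full profiler:

```python
import csv
import io

def profile_csv(text: str) -> dict:
    """Build a per-column profile: inferred type and missing-value count."""
    rows = list(csv.DictReader(io.StringIO(text)))
    profile = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        missing = sum(1 for v in values if v == "")
        non_empty = [v for v in values if v != ""]
        # Crude type inference: numeric if every non-empty value parses as a number
        if all(v.lstrip("-").replace(".", "", 1).isdigit() for v in non_empty):
            inferred = "numeric"
        else:
            inferred = "text"
        profile[col] = {"type": inferred, "missing": missing, "rows": len(values)}
    return profile

sample = "id,age,city\n1,34,Paris\n2,,Lyon\n3,29,\n"
print(profile_csv(sample))
```

A production profiler would add value distributions, uniqueness, and cross-column statistics, but even this level of summary already tells a prospector which columns are usable.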

Finally, “I” for Interoperable: once you have identified the relevant subset, you must assess whether it is interoperable with your data. How does it compare to your data? Is it possible to integrate it seamlessly? Do both datasets adhere to the same standards?
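Once both datasets are profiled, interoperability can be assessed by comparing the two profiles: which columns are shared, and whether the shared columns agree on type. A sketch, assuming a simple per-column profile of the form {column: {"type": ...}}:

```python
def compare_profiles(mine: dict, theirs: dict) -> dict:
    """Compare two per-column profiles and report overlaps and conflicts."""
    shared = set(mine) & set(theirs)
    type_conflicts = sorted(c for c in shared if mine[c]["type"] != theirs[c]["type"])
    return {
        "shared_columns": sorted(shared),
        "only_in_mine": sorted(set(mine) - set(theirs)),
        "only_in_theirs": sorted(set(theirs) - set(mine)),
        "type_conflicts": type_conflicts,
    }

mine = {"id": {"type": "numeric"}, "age": {"type": "numeric"}}
theirs = {"id": {"type": "text"}, "city": {"type": "text"}}
print(compare_profiles(mine, theirs))
```

Type conflicts on shared columns are exactly the places where the two datasets fail to adhere to the same standards and where integration work will concentrate.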

 

Understand without Opening

Gaining insight into the content of a dataset should not necessitate accessing, opening, or examining the data itself.

Comprehensive metadata should adequately describe any dataset in sufficient detail, allowing data prospectors to determine its relevance without needing to open it.

This approach requires data profiling to go beyond simple statistics into data modeling, where a model is reverse-engineered from the profile derived from the dataset’s structure and content.
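The reverse-engineering step can be sketched as deriving a minimal schema from a per-column profile: each column becomes a typed, possibly nullable field that a data prospector can read without ever opening the data. This assumes a profile shaped like {column: {"type": ..., "missing": ..., "rows": ...}}:

```python
def schema_from_profile(profile: dict) -> list:
    """Derive a minimal schema (name, type, nullable flag) from a column profile."""
    return [
        {"name": col, "type": stats["type"], "nullable": stats["missing"] > 0}
        for col, stats in profile.items()
    ]

profile = {
    "id": {"type": "numeric", "missing": 0, "rows": 3},
    "age": {"type": "numeric", "missing": 1, "rows": 3},
}
print(schema_from_profile(profile))
```

Published as metadata, such a reverse-engineered schema lets a prospector judge relevance and interoperability from the catalog entry alone.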

For this reason, BioMedima has been developing a data profiling ontology.