The Role of FAIR Principles in the AI Era

The Rise, Evolution, and Exhaustion of FAIR

When the FAIR Principles were introduced in 2016, they landed with unusual clarity and ambition. They were not a technology, not a data model, not a compliance checklist. They were a story, a compelling narrative about how the scientific world could finally break free from decades of fragmented, system-siloed, and inaccessible data. FAIR promised a future in which data could be found, accessed, interoperated, and reused by both humans and machines.

FAIR quickly inspired methodologies, FAIRification frameworks, implementation profiles, FAIR metrics, maturity assessments, community initiatives across ELIXIR, GO FAIR, RDA, NIH, and IMI, and many domain-specific applications, from clinical data to omics, imaging, chemistry, manufacturing, and regulatory submissions. FAIRification became a recognised process blending a mindset, data governance, semantics, and technology.

However, data FAIRification is a demanding and often underestimated undertaking. At its core, a FAIRification program is not a technical project but a cultural transformation, and changing organisational culture is arguably the hardest challenge any enterprise can face. It is a long, sometimes painful journey that requires genuine executive sponsorship, sustained motivation, long-term commitment, and significant resources across teams. FAIRification progresses through three phases (diagnostic, treatment, and monitoring) and rests on three foundational pillars: mindset and culture, metadata and semantic practices, and technological capabilities. These elements must evolve together for FAIR to succeed. More on each of these in the next section.

A FAIR Story

The original FAIR story and promise was simple, bold, and transformative: to make data findable, accessible, interoperable, and reusable by both humans and machines so that scientific knowledge could flow freely across disciplines, systems, and organisations. FAIR envisioned a world where data carried a minimal message about itself through metadata, identifiers, and semantics, liberated from system silos, platform dependencies, and individual interpretation. It promised faster discovery, stronger reproducibility, smoother collaboration, and a foundation for automation and machine reasoning, ultimately envisioning billions saved in time, effort, and money by reducing the burden of managing data before any value can be extracted from it. At its core, FAIR offered a new way of working: data that explains itself, ready to be reused, combined, and understood far beyond its original purpose.

People are naturally drawn to stories that promise hope and a better future, yet they are equally wary of change and the unfamiliar. As a result, they often cling to comfortable routines and reshape narratives to fit personal wishes (aka “wishful thinking”), crafting elaborate rationalisations that make convenient but unrealistic ideas seem credible.

A story retold many times, subtly tweaked at each repetition by the narrator’s cognitive biases, eventually loses its original shape. The FAIR story suffered precisely this fate. Misinterpretations multiplied, not only because messages naturally drift as they pass from person to person, but also because people intentionally bent the narrative to fit their own interests, strategies, habits, or the technical constraints of legacy systems:

  • “We are FAIR because we have a data catalogue”
  • “We assign IDs, so data is FAIR”
  • “My search engine is FAIR because it lets users find data”
  • “Our system has drop-down lists, so we use controlled vocabularies, therefore data is interoperable”
  • “We use APIs, so our data is accessible”
  • “We tagged our data with keywords; data is reusable”

Over time, FAIR became a convenient label to stick onto IT systems, legacy datasets enhanced with a few identifiers, or questionable data management practices. It often turned into a sophisticated rhetoric used to justify whatever new data initiative an organisation wanted to pursue. The original mission (machine-actionable interoperability enabled by semantics, persistent identifiers, and rich metadata) was gradually sidelined in favour of cataloguing exercises, storage projects, and system upgrades. The term was stretched, diluted, and in some organisations, almost completely stripped of its meaning.

The result: FAIR fatigue. The excitement faded, momentum slowed, and the term followed the trajectory of other famous buzzwords such as “cyber” in the 90s, “i-everything” in the 2000s, and “digital” in the 2010s. A term that was once brilliant became devoid of meaning, mere background music that one hears in any data discussion.

And the tragedy? Very little of the original FAIR vision has truly been achieved. What started as a movement to bring semantic order to data entropy has often been reduced to a branding exercise. FAIR became a mirror reflecting organisational inertia rather than a vehicle for change.

The Essence of FAIR: Mindset, Resources, and Technology

Although FAIR has often been misapplied, its core remains simple and powerful: FAIR is fundamentally about cultural transformation, and it revolves around three pillars: individual mindset and corporate culture, digital resources, and technology.

Mindset/Culture: Data as an Asset

A genuine FAIRification initiative is not about buying, building, or upgrading IT systems (a system-centric mindset). It is about conceiving and treating data as an asset, with the goal of extracting maximum value from it (a data-centric mindset). FAIRification rests on a few guiding principles:

  1. Data is an asset or product, not a by-product of IT systems.
  2. Data must be shared and reused, not confined to silos.
  3. Producers are responsible for ensuring FAIRification, not consumers.
  4. Consumers are responsible for articulating their data needs clearly.

FAIR requires a culture in which people see themselves not as data traders moving information around, but as data stewards accountable for the clarity, metadata, and meaning of the assets they produce and exchange.

Digital Resources: Metadata and Identifiers

FAIR calls for a small number of non-negotiables:

  • URI-based identification schemes, ideally Globally Unique, Persistent, and Resolvable Identifiers (GUPRIs)
  • A minimal accurate metadata set
  • Authoritative sources (machine and human)

This is the semantic scaffolding that makes data discoverable, linkable, interpretable, and reusable.
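The non-negotiables above can be made concrete with a small sketch. The field names, example URIs, and the notion of a "required set" below are illustrative assumptions, not a mandated FAIR metadata schema; the point is only that a minimal record pairs a GUPRI-style identifier with a handful of descriptive fields that can be checked mechanically.

```python
from urllib.parse import urlparse

# Illustrative minimal metadata set; the exact fields an organisation
# requires would come from its own FAIRification profile.
REQUIRED_FIELDS = {"identifier", "title", "creator", "license", "issued"}

def is_gupri_like(identifier: str) -> bool:
    """Check that an identifier is at least a resolvable-looking URI
    (scheme + authority), a cheap proxy for a Globally Unique,
    Persistent, and Resolvable Identifier."""
    parsed = urlparse(identifier)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def missing_fields(record: dict) -> set:
    """Return the required metadata fields the record lacks."""
    return REQUIRED_FIELDS - record.keys()

# Hypothetical record; all URIs are placeholders.
record = {
    "identifier": "https://example.org/dataset/42",
    "title": "Assay results, batch 42",
    "creator": "https://orcid.org/0000-0000-0000-0000",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "issued": "2024-01-15",
}

assert is_gupri_like(record["identifier"])
assert not missing_fields(record)
```

A check like this is deliberately trivial: the value lies not in the code but in agreeing, organisation-wide, on what the minimal set is and on URIs as the identification scheme.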

The FAIRification approach must be anchored in an organisation’s digital strategy, which in turn is shaped by its business strategy, needs, and context, all of which differ from one organisation to another. FAIRification should address the three foundational pillars outlined above, but it does not necessarily entail activities or resources often mistaken for FAIR work.

FAIR does not necessarily require:

  • moving all data into a lake or central repository,
  • buying a new IT system or platform,
  • introducing new file formats or storage technologies,
  • inventing custom reference or master data models,
  • replacing existing architecture with a single monolithic system,
  • re-engineering data pipelines,
  • enforcing organisation-specific taxonomies instead of community standards.

FAIR requires metadata management and persistent identifiers, not wholesale data transformation or large-scale infrastructure rebuilds.

Technology: Fit-for-Purpose Semantic Infrastructure

Although not explicitly stated in the FAIR principles, FAIR is fundamentally rooted in Semantic Web standards, not because they are fashionable, but because they are uniquely fit-for-purpose:

  • URIs provide global and persistent identity, enabling unambiguous referencing across systems
  • Semantic Web standards provide rich, shared vocabularies and ontologies that can express various flavours of FAIR principles in a machine-readable and interoperable way

The Semantic Web technology stack is built for machine-actionability, which is the core intent of FAIR. A relational database is excellent at storing and retrieving data, but it fails to provide semantically rich metadata that explains what the data means (which is, incidentally, exactly what generative, LLM-based AI needs). FAIR is not about infrastructure; it is about establishing a minimal, standard meaning for data, for humans and machines alike.
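What "machine-actionable" means here can be shown in a few lines. The sketch below mimics the Semantic Web's subject-predicate-object triple model in plain Python (a real implementation would use an RDF store); all URIs are illustrative placeholders. The key property is that any agent can query the data with generic pattern matching, with no knowledge of a proprietary schema.

```python
# Hypothetical namespace and facts; in practice these would be RDF
# triples using shared vocabularies and GUPRIs.
EX = "https://example.org/"

triples = {
    (EX + "dataset/42", EX + "hasTitle", "Assay results, batch 42"),
    (EX + "dataset/42", EX + "measuredIn", EX + "unit/mg-per-L"),
    (EX + "dataset/42", EX + "producedBy", EX + "lab/berlin"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# A machine can ask "what do we know about dataset/42?" without
# understanding any system-specific storage format.
facts = match(s=EX + "dataset/42")
```

The same `match` call works for any subject, predicate, or object, which is precisely the system-independence that tables locked inside one application's schema cannot offer.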

FAIR in AI Era: Why Semantics Matter More Than Ever

Riding the generative AI hype, much of the modern digital world dreams of in silico intelligence and ever-greater assistance with cognitive tasks. Yet in this excitement, many forget a simple truth: a system is only as smart as the data available to it. If “data is the new oil,” why focus solely on pumps and pipes (the logic and the AI) while paying less attention to the quality of the oil itself? Auto-regressive models and neural architectures may promise intelligence, but they rely entirely on the quality, clarity, diversity and meaning of the data they consume, whether during training or prompting. The fundamental reality has not changed: regardless of how advanced AI becomes, its power is limited by the data it consumes and by the data it produces, since it will eventually consume that too.

AI cannot invent accurate context or infer purpose and meaning from ambiguous, generic, or poorly described data. If data was not collected, and ideally annotated, with a specific purpose in mind, one cannot expect an AI system trained or prompted with that data to reliably solve problems or answer questions related to that purpose.

Why FAIR Data Is Critical for AI

FAIR data provides the essential metadata needed to understand a dataset without opening it. This foundation can be expanded to support trustworthiness through provenance and licensing, describe the surrounding business architecture, track data lineage from source-of-truth to consumption, quantify data value, and, most importantly, establish and guarantee trust in data by specifying the five Ws (who, what, when, where, why) of a dataset throughout its lifecycle.

These elements are exactly what AI systems need to:

  • reason over data,
  • ground their responses,
  • reduce hallucinations,
  • integrate knowledge across datasets,
  • and automate actions reliably.

AI needs semantics. FAIR provides the minimal semantics necessary to make that possible.
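As a sketch of how such metadata reaches an AI system in practice, the snippet below renders a five-Ws provenance record as grounding context for a prompt. The field names, values, and output format are all illustrative assumptions, not a standard; the point is that provenance captured as structured metadata can be serialised mechanically instead of being rediscovered by hand for every query.

```python
from dataclasses import dataclass

@dataclass
class FiveWs:
    """Hypothetical five-Ws provenance record for one dataset."""
    who: str    # producer of the data
    what: str   # what the dataset contains
    when: str   # when it was produced
    where: str  # where it was produced
    why: str    # purpose it was collected for

    def to_prompt_context(self) -> str:
        """Serialise provenance into text an LLM can be grounded on."""
        return "\n".join(f"{field}: {value}"
                         for field, value in vars(self).items())

# Placeholder values for illustration only.
prov = FiveWs(
    who="Analytical lab, site B",
    what="HPLC purity measurements",
    when="2024-03-01 to 2024-03-15",
    where="Berlin facility",
    why="Batch release testing",
)
context = prov.to_prompt_context()
```

Prepending `context` to a prompt (or attaching it to retrieved chunks in a RAG pipeline) is one simple way FAIR-style metadata helps a model ground its answers rather than invent context.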

The Collapse of FAIR Semantics

FAIR started as a visionary story: “Let’s make data understandable to machines.” As explained above, the story was retold, stretched, and twisted. FAIR principles were invoked in conversation as “my system is FAIR”, “my data is in a catalogue”, “my structured data is FAIR”, gradually eroding their original meaning. Why?

It is important to acknowledge that data FAIRification runs counter to the dominant logic of the software and platform industry. Most commercial software is built on a strategy of differentiation, offering a unique, proprietary solution with its own pre-cooked, IP-protected data model and logic, designed to deliver value that competitors cannot replicate. FAIR takes the opposite approach. It does not seek uniqueness and ring-fencing, but rather openness and sharing. These represent two fundamentally different mindsets: one centred on proprietary systems, and the other centred on open, interoperable (meta)data.

Yet this framing oversimplifies the landscape. Not all software is inherently anti-FAIR, and not all FAIR advocates fully appreciate the economic and architectural realities software vendors must navigate. Modern platforms increasingly offer APIs, export capabilities, semantic layers, and standards-based interfaces that can enable FAIRification rather than hinder it. Conceptually, software is not the enemy of FAIR; it is the infrastructure through which FAIR can become real.

I personally think that FAIR principles and software must co-evolve to meet the novel needs of generative AI. FAIR provides principles (identifiers, metadata, semantics, interoperability), but it cannot operationalise itself without capable tools, automation, and workflows. Conversely, software that neglects FAIR principles produces isolated, rigid data that limits users’ freedom to operate and to generate value.

The real tension, in individual mindsets and corporate cultures alike, is between system-centric thinking and data-centric thinking. Vendors, data producers, and consumers all play a role in aligning incentives and responsibilities. The future is not a choice between proprietary tools and FAIR ideals; it is a shared path where software produces FAIR data by design, FAIR guides how software should treat data, and both coexist to support flexible, interoperable, AI-ready ecosystems.

We Need a New Story

The storytelling analogy is powerful: FAIR today resembles a story whose ending no longer matches its beginning. The original plot has been overwritten by side characters embodying a system-centric mindset. As a result, the initial FAIR story is, in many places, effectively dead. But the need is not. The challenge is not. And the vision is certainly not.

What FAIR set out to achieve is still urgently needed, perhaps now more than ever. But the acronym “FAIR,” and the enthusiasm it once carried, have been distorted, overused, and ultimately exhausted. As the term was stretched beyond recognition, the promise of better data gradually faded, along with the motivation of FAIR advocates and implementers. “FAIR” became a buzzword that served commercial purposes and entrenched individual interests far more than it served the original mission. As the story was repeatedly tweaked, belief in a truly FAIRified data future steadily diminished. So what now?

A New Story

In the new AI era we’ve entered, we need a refreshed vision, a renewed narrative, a new vocabulary that captures the importance of semantics, identifiers, and machine-actionable metadata as per the original FAIR principles.

The AI hype has created a new momentum around data: to fully benefit from generative AI, we must avoid Garbage In, Garbage Out (GIGO). Avoiding GIGO requires not only improving data quality but also providing high-quality data context: who generated the data, when and where it was created, for what purpose, following which protocol, and with what expertise. Much of this information was meant to be supplied by FAIR metadata. Yet because these descriptive elements are still widely missing, organisations are once again turning their attention to data and metadata quality, searching for a guiding principle capable of bringing order to data chaos: structure, meaning, and trust, all of which are essential for AI value and reliability.

Whether we revive FAIR or coin something new matters less than this: we must restore the narrative that data meaning and semantics can (and should) be understood by machines, not treated as incidental by-products of IT systems to be stored, catalogued, and retrieved.

The next chapter is waiting.