Two years ago, the IDMP ontology (IDMP-O) won the Bio-IT World Innovative Practices Award [https://pistoiaalliance.org/news/pistoia-alliance-wins-2024-bio-it-world-innovative-practices-award-for-idmp-pre-competitive-collaboration/]. We built the first version at Novartis, released it into the pre-competitive space, merged it with similar work done at Accurids [https://accurids.com/], and watched it being further developed inside the Pistoia Alliance [https://pistoiaalliance.org/project/idmp-o/], where it has been extended since 2019. A summary of the story is published by Heiner Oberkampf at [https://www.linkedin.com/posts/heiner-oberkampf_idmp-ontology-bioitworld-activity-7175829175658815490-szqc/]. That part worked. What has not worked is the next step: getting enterprise systems to produce IDMP-compliant data consistently, across the full regulatory landscape, without humans supporting data reconciliation by hand.
The imbalance is visible in the literature. Fewer than 20 papers address IDMP ontology implementation. Several hundred address IDMP regulatory requirements. The industry has written at length about what must be submitted, and almost nothing about how systems should be built to produce those submissions reliably.
I used to think the ontology was the hard part. I was wrong. The ontology defines concepts and how they relate, in a shared meaning. It does not tell systems how to behave. That is a different artifact, and the industry has yet to build it.
How it started
The first version was a business architecture exercise. My problem at the time was mundane and familiar: regulatory, clinical, safety and manufacturing teams were using the same words to mean different things, and their tabular data and documents could not be reconciled without someone reading each row. I needed stable containers. The ISO IDMP suite (11615 for medicinal products, 11616 for pharmaceutical products, 11238 for substances, 11239 for dose forms, routes and units of presentation, and 11240 for units of measurement) defined those containers, so I used them.
Releasing the model pre-competitively felt like the natural move. Pistoia gave it a home, and the work became collaborative.
Where my path diverged
Inside Pistoia the modeling took on a different character. The direction became descriptive: a careful, academically rigorous effort to represent every regulatory possibility in an ontology. The work is sound. It answers the questions of meaning: what concepts exist, what a Medicinal Product is, how it differs from a Pharmaceutical Product, which identifier applies where (MPID, PhPID, SubID), and how a substance, a strength and a dose form hang together. As a rich set of specifications and a communication artifact, it has real value.
But I was not trying to describe regulatory meaning. I was trying to constrain how systems handle medicinal product (MP) data: whether a given SubID is valid as an active-substance identifier in an EU submission, whether an activity-based strength is mandatory for a given biologic, whether a packaging hierarchy is complete for the dose form declared, whether a controlled term is the one allowed for this field in this jurisdiction. Those are questions about what “must be”, not about what “is”.
A descriptive ontology treats flexibility as a feature. When you are governing enterprise data, flexibility is the problem you came to solve.
By late 2021 I stopped contributing to the modeling work. The decision had nothing to do with quality. It had to do with purpose. I needed something the ontology was not designed to be.
What I actually need when I work on a use case
When I help organizations deal with cross-jurisdiction product equivalence, SKU-to-licence decoupling, SmPC text processing to reverse-engineer tabular data, regulation-change or label-change impact analysis, PV signal propagation, or GS1 traceability, the full IDMP-O is not what helps me. It is too big (often too big for an LLM context window), and too permissive to make any particular dataset valid.
What I need is an execution artifact: scoped to one use case, unambiguous, formally constrained, machine-processable and context-rich enough to enable AI. SHACL for validation, SKOS for controlled terms, and narrative explanations of constraints where they fit. Every rule is written once, in a machine-readable format, enforced algorithmically, and not interpreted by a business expert reading a PDF or a developer reading a TTL file before a system update or release.
Rigidity is the point. Rigidity is what makes the rule binding.
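As a minimal sketch of what such an execution artifact can look like, here is a SKOS term list paired with a SHACL shape. The namespace, class and property names are illustrative, not the actual IDMP-O IRIs:

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <https://example.org/idmp-guide#> .   # hypothetical namespace

# Controlled terms (SKOS): the only dose form this use case allows.
ex:DoseFormScheme a skos:ConceptScheme .
ex:filmCoatedTablet a skos:Concept ;
    skos:inScheme ex:DoseFormScheme ;
    skos:prefLabel "Film-coated tablet"@en .

# Validation shape (SHACL): every medicinal product record in scope
# must carry exactly one dose form drawn from the controlled scheme.
ex:MedicinalProductShape a sh:NodeShape ;
    sh:targetClass ex:MedicinalProduct ;
    sh:property [
        sh:path ex:hasDoseForm ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:node [
            sh:property [
                sh:path skos:inScheme ;
                sh:hasValue ex:DoseFormScheme ;
            ]
        ] ;
        sh:message "Dose form missing or outside the controlled term list for this use case." ;
    ] .
```

The shape does what a PDF cannot: a record without a dose form, or with one outside the scheme, fails validation mechanically.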
The shape of the work, per use case
For each use case I work on, the recipe is the same:
- Trim IDMP-O to the concepts the use case actually needs. Everything else is noise.
- Add an implementation guide (machine-readable, semantic-standard-compliant) that says how those concepts must be structured, populated, and validated.
- Attach the context: the specific business requirement, the success criteria, and the parameters that decide what “correct data” means here, e.g. the jurisdiction (EMA, FDA, Swissmedic…), the submission type (initial MAA, variation, renewal, PSUR), the product modality (small molecule, biologic, ATMP, combination product), the effective-date window, the target downstream system… Without this, an AI agent will produce data that conforms to IDMP in general and to no specific application in particular.
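That attached context is itself a machine-readable object, not prose. A minimal sketch, with illustrative field names and values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UseCaseContext:
    """Parameters that decide what 'correct data' means for one use case.
    Field names and values are illustrative; a real deployment would bind
    them to controlled vocabularies rather than free strings."""
    jurisdiction: str       # e.g. "EMA", "FDA", "Swissmedic"
    submission_type: str    # e.g. "initial MAA", "variation", "renewal", "PSUR"
    product_modality: str   # e.g. "small molecule", "biologic", "ATMP"
    effective_from: str     # ISO date opening the effective-date window
    effective_to: str       # ISO date closing the window
    target_system: str      # downstream system the data must land in

ctx = UseCaseContext(
    jurisdiction="EMA",
    submission_type="variation",
    product_modality="biologic",
    effective_from="2025-01-01",
    effective_to="2026-12-31",
    target_system="RIM",
)
print(ctx.jurisdiction, ctx.product_modality)
```

Because the context is frozen and explicit, an agent cannot silently produce data for the wrong jurisdiction or modality.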
That guide becomes the thing a developer or an LLM agent can actually act on. It answers the production question: how do I produce IDMP-compliant data this system will treat as valid?
The gap that makes everything else hard
Most IDMP work in the industry has translated ISO standards into machine-readable artifacts. That work mostly stops at meaning. It does not specify how data should be structured, validated, or operated on inside a running system, for a specific purpose.
The missing layer is the machine-executable implementation layer: the equivalent of a formal implementation guide, written for machines. Without explicit constraints, systems cannot reason reliably over IDMP data. They can look up terms. They cannot enforce validity.
I see the consequence regularly. An organisation tells me it is “IDMP-ready” because it has adopted ISO standards' terms and definitions or, in a more data-centric manner, the IDMP-O. In practice it holds a shared vocabulary and descriptive specifications, but no machine-executable rule that can refuse a non-compliant record.
A recent example. A pharma company had every RIM field mapped to an IDMP-O class, the graph queries ran, the readiness dashboard was green. Six months in, an EU variation for a pegylated granulocyte colony-stimulating factor left the submission pipeline with mass concentration populated and activity-based strength (IU) missing. Nothing in the stack refused the record, even though EMA requires activity-based strength expression for modified proteins of this kind (EMA/CHMP/BWP/85290/2012). A regulatory reviewer caught it by hand at final QC. The ontology carried the concept of “strength” and knew two sub-types existed. No machine-executable rule tied “biologic + EU submission” to “activity-based strength is mandatory”. That single missing constraint is the difference between an ontology and an implementation layer, and it is the reason the manual reconciliation is still required.
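The missing constraint is small enough to write down. A minimal sketch of the rule that would have refused the record; the field names and logic are illustrative, not the company's actual stack:

```python
def check_strength_expression(record: dict) -> list[str]:
    """Refuse a submission record when an EU biologic lacks an
    activity-based strength. Field names are illustrative."""
    errors = []
    is_eu_biologic = (
        record.get("jurisdiction") == "EMA"
        and record.get("modality") == "biologic"
    )
    if is_eu_biologic and "activity_strength_iu" not in record:
        errors.append(
            "EU biologic submission requires an activity-based strength (IU); "
            "mass concentration alone is not sufficient."
        )
    return errors

# The record that slipped through: mass concentration present, IU missing.
bad = {"jurisdiction": "EMA", "modality": "biologic",
       "mass_concentration_mg_ml": 10.0}
print(check_strength_expression(bad))  # non-empty list: record refused
```

Four lines of condition. The point is not the code; it is that nothing in a descriptive ontology is the place where such a condition lives.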
What the combined layer actually delivers
Across several use cases, when pairing a fit-for-purpose subset of IDMP-O with a machine-readable implementation guide (SHACL, SKOS, explicit rules interpretable as AI prompts), several things change:
- Cross-jurisdiction equivalence stops being a judgement call and becomes a rule the system can enforce.
- Impact analysis becomes deterministic. Change an excipient, and the system tells you every submission, dataset and supply-chain element that now needs to move.
- SmPC-to-IDMP extraction by an LLM agent becomes auditable. The agent is not left to “interpret” the SmPC; it fills a constrained shape, and the shape either validates or not.
- Pharmacovigilance signal-to-label propagation stops depending on someone remembering where a substance is used. The graph answers the question.
- IMP-to-MP transitions and M&A product lineage become queries rather than side projects.
- Traceability across business domains becomes computable, because GS1 identifiers, IDMP concepts and validation rules live in the same governed layer.
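The first of these, cross-jurisdiction equivalence as an enforceable rule, can be sketched as a deterministic comparison over a normalized product core. The normalization and fields below are illustrative; a real profile would compare PhPID-level components against controlled terms:

```python
def product_core(record: dict) -> tuple:
    """Reduce a jurisdiction-specific record to the components that
    decide pharmaceutical equivalence. Illustrative, not PhPID itself."""
    return (
        record["substance_id"].upper(),           # case-normalized identifier
        round(float(record["strength_mg"]), 3),   # numeric strength, not a string
        record["dose_form_code"].lower(),         # controlled dose-form code
    )

def equivalent(a: dict, b: dict) -> bool:
    # Deterministic: two records are equivalent iff their cores match.
    return product_core(a) == product_core(b)

eu = {"substance_id": "sub-001", "strength_mg": "10.0", "dose_form_code": "TAB"}
us = {"substance_id": "SUB-001", "strength_mg": "10", "dose_form_code": "tab"}
print(equivalent(eu, us))  # True: same core despite surface differences
```

The judgement call disappears because the comparison is over typed, normalized components rather than free-text labels.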
The ontology still carries the meaning. The implementation guide translates that meaning into data transformations across business domains.
Where I am less certain
This approach has limits. Writing an implementation guide per use case is not cheap, and the moment the business requirement shifts, the guide has to be revised, unless you implement AI-based business-monitoring capabilities.
The industry does not yet have a shared convention for versioning these guides, nor tooling mature enough to diff them the way we diff code. Raphael Sergent has proposed to tackle this as part of ISO/TC 215/SC 2, and thanks to Sheila Elz, ISO/TS 21405:2026 now provides a methodology and framework for the development and representation of IDMP ontologies. That is a step in the right direction, but it does not yet reach the implementation-guide layer I am describing here.
If every pharma company writes its own implementation guides in isolation, we risk recreating the interoperability problem one layer up, in the rules rather than the terms.
I think the right path is collaborative, e.g. shared patterns and shared SHACL profiles, and that part is far from solved.
Why it is worth doing anyway
The real cost inside every IDMP programme I have seen is not the ontology work. It is cross-system reconciliation: lining up RIM, PLM, ERP, pharmacovigilance, and regulatory submissions by hand because the alignment rules are not enforced anywhere. That is where the money and the fatigue go.
The industry has been making the same diagnosis for years. The post-mortems on the missed 2016–2017 EMA deadline consistently point to the same root cause: siloed data across regulatory, manufacturing and commercial functions, and the mapping, cleaning and integration work required to bring it into IDMP shape [https://pharmaceuticalmanufacturer.media/pharmaceutical-industry-insights/mastering-idmp-%E2%80%94-what%E2%80%99s-involved-in-master-data-management-a/] [https://www.linkedin.com/pulse/why-idmp-compliance-pharmaceutical-industry-has-been-delayed-crone-uo0re/]. IQVIA’s industry-readiness studies on IDMP reach a similar conclusion: the blocker is not the standard, it is the data-quality and mapping debt that surrounds it. The phased rollout now running into 2025–2026 is, at ground level, a reconciliation-debt problem, not a standards problem.
A canonical product backbone with explicit constraints collapses most of that work. Dossier checking, SmPC-to-IDMP extraction, jurisdiction-agnostic product cores, label-change propagation, E2B reporting, IMP-to-MP transition, M&A traceability… each stops being a recurring project and becomes a repeatable operation.
The arithmetic* follows:
- Mid-size pharma, ~50 products, ~$1M/year maintenance: 30–50% reduction → $300–500K/year
- Top-10 pharma, 500+ products: $1–4M/year
- Across ~500 MAHs: $100M+/year industry-wide
These are not savings from better documentation. They come from taking manual reconciliation out of the daily workflow.
*based on a ~$20K/product/year IDMP business-as-usual cost (industry consensus range $10–40K) and a 30–50% reconciliation share reported in the IDMP-O Benchmark Report (2024) — “Accelerating Digital Transformation in Pharma with IDMP” and IQVIA readiness studies: 1) Navigating IDMP through the Extended EudraVigilance Medicinal Product Dictionary (XEVMPD), 2) Is Your Organization IDMP Ready? and 3) New Challenges in Pharmacovigilance
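For the mid-size case, the arithmetic checks out directly from the footnote's assumptions (the figures below are those assumptions, not new data):

```python
products = 50                  # mid-size pharma portfolio
bau_cost_per_product = 20_000  # ~$20K/product/year BAU cost (footnote consensus)
reconciliation_pct = (30, 50)  # 30-50% of BAU spent on manual reconciliation

annual_bau = products * bau_cost_per_product               # $1M/year
savings = tuple(annual_bau * p // 100 for p in reconciliation_pct)
print(f"BAU ${annual_bau:,}/year, savings ${savings[0]:,}-${savings[1]:,}/year")
# -> BAU $1,000,000/year, savings $300,000-$500,000/year
```

The larger figures scale the same way, with per-product cost varying across the footnote's $10–40K range.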
What we are doing differently
At MIGx AG, leveraging BioMedima assets, we focus on serializing the creation of fit-for-purpose semantic layers (IDMP is one example among others) to maximize value generation at the level of concrete use cases, rather than leveraging ontologies as overarching, static, descriptive artifacts.
Building on IDMP-O as a foundation, we systematically trim it to use case-specific entities, enrich it with executable implementation instructions (constraints, rules, and validation logic) tailored to the specific requirements and success criteria for one use case (e.g., regulatory submissions, system integration, pharmacovigilance, or supply chain traceability).
Our approach is underpinned by a thorough, AI-enabled Knowledge Elicitation methodology that ensures every semantic layer is operational, machine-actionable, and AI-ready by design, enabling immediate automation or use-case-specific insight generation. In doing so with IDMP-O, we transform compliance specifications from static description into an industrialized foundation for data governance and product intelligence, one that delivers measurable business value, use case by use case.