Note 1: This post is part 2 of a three-part series on healthcare, knowledge graphs, and lessons for other industries. Part 1, “What Is a Knowledge Graph — and Why It Matters,” is available here.
Note 2: All images by author.
In Part 1, we described how structured knowledge enabled healthcare’s progress. This article examines why healthcare, more than any other industry, was able to build that structure at scale.
Healthcare is the most mature industry in the use of knowledge graphs for a few fundamental reasons. At its core, medicine is grounded in empirical science (biology, chemistry, pharmacology), which makes it possible to establish a shared understanding of the types of things that exist, how they interact, and what causes what. In other words, healthcare lends itself naturally to ontology.
The industry also benefits from a deep culture of shared controlled vocabularies. Scientists and clinicians are natural librarians. By necessity, they meticulously list and categorize everything they can find, from genes to diseases. This emphasis on classification is reinforced by a commitment to empirical, reproducible observation, where data must be comparable across institutions, studies, and time.
Finally, there are structural forces that have accelerated maturity: strict regulation, strong pre-competitive collaboration, sustained public funding, and open data standards. All of these factors incentivize shared standards and reusable knowledge rather than isolated, proprietary models.
Together, these factors created the conditions for healthcare to build durable, shared semantic infrastructure, allowing knowledge to accumulate across institutions, generations, and technologies.
Ontologies
Humans have always tried to understand how the world works. When we observe and record the same thing repeatedly, and agree that it is true, we develop a shared understanding of reality. Science formalizes this process as the scientific method: scientists develop a hypothesis, conduct an experiment, and evaluate the results empirically. In this way, humans have been building an implicit medical ontology for thousands of years.
Otzi the Iceman, discovered in 1991, lived about 5,300 years ago and was found with an antibacterial fungus in his leggings, likely to treat his whipworm infection (Kirsch and Ogas 4). Even prehistoric humans had some understanding that plants could be used to treat ailments.

Eventually, scientists realized that it wasn’t the plant itself that was treating the ailment, but compounds inside the plant, and that they could modify the molecular structure of these compounds in the lab to make them stronger or more effective. This was the birth of organic chemistry and how Bayer invented aspirin (by tweaking willow bark) and heroin (by tweaking opium from poppies) (Hager 75; Kirsch and Ogas 69). This added a new class to the ontology: compounds. With each new scientific breakthrough, our understanding of the natural world evolved, and we updated our ontology accordingly.

Over time, medicine developed a layered ontology, in which each new class didn’t replace the previous one but extended it. The ontology grew to include pathogens after Fritz Schaudinn and Erich Hoffmann discovered that the underlying cause of syphilis was a bacterium called Treponema pallidum. We learned that microbes could be found almost everywhere and that some of them, like the Penicillium mold that produces penicillin, could kill bacteria, so microbes were added to our ontology.

We learned that DNA contains genes, which encode proteins, which interact with biological processes and risk factors. Every major advance in medicine added new classes of things to our shared understanding of reality and forced us to reason about how those classes interact. Long before computers, healthcare had already built a layered ontology. Knowledge graphs didn’t introduce this way of thinking; they simply gave it a formal, computational substrate.
Today, we have ontologies for anatomy (Uberon), gene function (Gene Ontology), chemical compounds (ChEBI), and hundreds of other domains. Repositories such as BioPortal and the OBO Foundry provide access to well over a thousand biomedical ontologies.
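To make the idea of a layered ontology more concrete, here is a minimal Python sketch that stores classes, the relations allowed between them, and a few instance-level facts as subject-predicate-object triples. The class names, relation names, and the domain/range check are illustrative assumptions, not taken from any published ontology; the check simply flags facts whose subject or object is not of the class a relation expects.

```python
# Minimal, hypothetical sketch of a layered ontology stored as triples.
# Class and relation names are illustrative only.

# Schema layer: relation -> (domain class, range class)
SCHEMA = {
    "contains": ("DNA", "Gene"),
    "encodes": ("Gene", "Protein"),
    "participates_in": ("Protein", "BiologicalProcess"),
    "causes": ("Pathogen", "Disease"),
    "treats": ("Compound", "Disease"),
}

# Instance layer: each named thing is assigned exactly one class.
TYPES = {
    "INS": "Gene",
    "insulin": "Protein",
    "glucose homeostasis": "BiologicalProcess",
    "Treponema pallidum": "Pathogen",
    "syphilis": "Disease",
    "penicillin": "Compound",
}

# Facts: (subject, relation, object) triples about instances.
FACTS = [
    ("INS", "encodes", "insulin"),
    ("insulin", "participates_in", "glucose homeostasis"),
    ("Treponema pallidum", "causes", "syphilis"),
    ("penicillin", "treats", "syphilis"),
]


def violations(facts, types, schema):
    """Return facts that use an unknown relation or connect the wrong classes."""
    bad = []
    for subject, relation, obj in facts:
        expected = schema.get(relation)
        if expected is None or (types.get(subject), types.get(obj)) != expected:
            bad.append((subject, relation, obj))
    return bad


print(violations(FACTS, TYPES, SCHEMA))  # []
print(violations([("syphilis", "treats", "penicillin")], TYPES, SCHEMA))
# [('syphilis', 'treats', 'penicillin')]
```

Adding a new layer, such as a hypothetical RiskFactor class, would mean adding entries to SCHEMA and TYPES; the existing facts stay valid, which mirrors how each new class extended the earlier ones rather than replacing them.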
Controlled vocabularies
Once a class of things was defined, medicine immediately began naming and cataloging every instance it could find. Scientists are great at cataloging and defining instances of classes. De materia medica, the first pharmacopoeia, was completed in 70 CE. It described about 600 plants and about 1,000 medicines. When chemists began working with organic compounds in the lab, they created thousands of new molecules that needed to be cataloged. In response, the first volume of the Beilstein Handbook of Organic Chemistry was released in 1881. This handbook cataloged all known organic compounds, their reactions, and their properties, and grew to contain millions of entries.

This pattern repeats throughout the history of medicine. Every time our understanding of the natural world improved and a new class was added to the ontology, scientists began cataloging all of the instances of that class. Following Louis Pasteur’s finding in 1861 that germs cause disease, people began cataloging all the pathogens they could find. In 1923, the first edition of Bergey’s Manual of Determinative Bacteriology was published, which contained a few thousand distinct bacterial species.

The same pattern repeated with the discovery of genes, proteins, risk factors, and adverse effects. Today, we have rich controlled vocabularies for conditions and procedures (SNOMED CT), diseases (ICD-11), adverse effects (MedDRA), drugs (RxNorm), compounds (ChEBI and PubChem), proteins (UniProt), and genes (NCBI Gene). Most large pharma companies work with dozens of these third-party controlled vocabularies.
Somewhat confusingly, ontologies and controlled vocabularies are often blended in practice. Large controlled vocabularies frequently contain instances from multiple classes along with a lightweight semantic model (ontology) that relates them. SNOMED CT, for example, includes instances of diseases, symptoms, procedures, and clinical findings, as well as formally defined relationships such as “has intent” and “due to.” In doing so, it combines a controlled vocabulary with ontological structure, effectively functioning as a knowledge graph in its own right.
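The dual role described above can be sketched in a few lines of Python, assuming a toy set of concept labels. The same data yields a flat list of terms (the vocabulary view) and also supports following is-a and attribute links (the ontology view). The concept labels and the relation names `due_to` and `has_intent` are simplified stand-ins chosen for illustration; they are not real SNOMED CT identifiers or its exact attribute model.

```python
# Hypothetical sketch: one structure that is both a vocabulary (named concepts)
# and a lightweight ontology (is-a hierarchy plus attribute relationships).
# Labels are illustrative; real SNOMED CT concepts use numeric identifiers.
from collections import defaultdict

IS_A = [
    ("Syphilitic aortitis", "Aortitis"),
    ("Aortitis", "Disease"),
    ("Syphilis", "Bacterial infectious disease"),
    ("Bacterial infectious disease", "Disease"),
    ("Penicillin therapy", "Drug therapy"),
    ("Drug therapy", "Procedure"),
]

ATTRIBUTES = [
    ("Syphilitic aortitis", "due_to", "Syphilis"),
    ("Penicillin therapy", "has_intent", "Therapeutic intent"),
]


def vocabulary(is_a, attributes):
    """Vocabulary view: every named concept, sorted alphabetically."""
    names = {c for pair in is_a for c in pair}
    names |= {s for s, _, _ in attributes} | {o for _, _, o in attributes}
    return sorted(names)


def ancestors(concept, is_a):
    """Ontology view: all supertypes reachable through is-a links."""
    parents = defaultdict(list)
    for child, parent in is_a:
        parents[child].append(parent)
    found, stack = [], [concept]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


def related(concept, relation, attributes):
    """Follow one attribute relationship outward from a concept."""
    return [o for s, r, o in attributes if s == concept and r == relation]


cause = related("Syphilitic aortitis", "due_to", ATTRIBUTES)[0]
print(cause, ancestors(cause, IS_A))
# Syphilis ['Bacterial infectious disease', 'Disease']
print(len(vocabulary(IS_A, ATTRIBUTES)))  # 9
```

Chaining an attribute link with an is-a traversal is what lets such a vocabulary answer questions a flat term list cannot, such as which conditions are due to a bacterial disease.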
Regulation
Following a mass poisoning in 1937 that killed 107 people due to an improperly prepared “elixir,” the US government gave the Food and Drug Administration (FDA) increased regulatory powers (Kirsch and Ogas 97). The Federal Food, Drug, and Cosmetic Act of 1938 set requirements for how drugs should be labeled and required drug manufacturers to submit safety data and a statement of “intended use” to the FDA. This helped the US largely avoid the thalidomide tragedy of the late 1950s in Europe, where a tranquilizer was prescribed to pregnant women to treat anxiety, trouble sleeping, and morning sickness despite never having been tested on pregnant women. This caused the “biggest anthropogenic medical disaster ever,” during which thousands of women suffered miscarriages and more than 10,000 babies were born with severe deformities.
While the US largely avoided this thanks to the caution of FDA reviewers, the episode also exposed gaps in the system. The Kefauver-Harris Amendments to the Federal Food, Drug, and Cosmetic Act in 1962 required proof that drugs were both safe and effective. The increased power of the FDA in 1938, and again in 1962, forced healthcare to standardize the meaning of terms. Drug companies had to agree upon indications (what the drug is meant for), conditions (what the drug treats), adverse effects (what other conditions have been associated with the drug), and clinical outcomes. Increased regulatory pressure also required replicable, well-controlled studies for every claim made about a drug. Regulation didn’t just demand safer drugs; it demanded shared meaning.
Observational data
These regulatory changes didn’t just affect approval processes; they fundamentally reshaped how medical observations were generated, structured, and compared. To make clinical evidence comparable, reviewable, and replicable, data standards for clinical trials were codified through organizations like the Clinical Data Interchange Standards Consortium (CDISC). CDISC defines how clinical observations, endpoints, and populations must be represented for regulatory review. Likewise, the FDA made the use of shared terminologies from controlled vocabularies mandatory rather than merely best practice.
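As a rough illustration of what this kind of standardization buys, the sketch below codes verbatim adverse-event reports into rows shaped like the AE (adverse events) domain of CDISC’s SDTM standard. It is a simplified sketch under stated assumptions: only a few AE variables are shown, the USUBJID format is simplified, and the three-entry lookup table is a hypothetical stand-in for a licensed MedDRA dictionary.

```python
from collections import Counter

# Hypothetical stand-in for a licensed MedDRA dictionary:
# normalized verbatim term -> (preferred term, system organ class)
MEDDRA_LOOKUP = {
    "headache": ("Headache", "Nervous system disorders"),
    "head ache": ("Headache", "Nervous system disorders"),
    "felt sick to stomach": ("Nausea", "Gastrointestinal disorders"),
}


def to_ae_row(study_id, subject_id, seq, verbatim, severity):
    """Build a simplified SDTM-style AE record from a verbatim report."""
    pt, soc = MEDDRA_LOOKUP.get(verbatim.strip().lower(), (None, None))
    return {
        "STUDYID": study_id,
        "DOMAIN": "AE",
        "USUBJID": f"{study_id}-{subject_id}",  # simplified identifier format
        "AESEQ": seq,
        "AETERM": verbatim,   # reported term, kept exactly as recorded
        "AEDECOD": pt,        # dictionary-derived term; None = needs manual coding
        "AEBODSYS": soc,
        "AESEV": severity.upper(),
    }


# Three sites describe events in their own words.
reports = [
    ("001", "Head ache", "mild"),
    ("002", "headache", "moderate"),
    ("003", "felt sick to stomach", "mild"),
]
rows = [to_ae_row("STUDY01", subj, 1, term, sev) for subj, term, sev in reports]
counts = Counter(row["AEDECOD"] for row in rows)
print(counts)  # Counter({'Headache': 2, 'Nausea': 1})
```

Because two differently worded reports of the same event resolve to one dictionary term, they can be counted together across sites and studies, which is the comparability that regulators came to require.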
Pre-competitive collaboration
One of the factors that has allowed healthcare to lead in knowledge graphs is pre-competitive collaboration. Much of the work of healthcare is grounded in natural sciences like biology and chemistry, which are treated as a public good. Companies still compete on products, but most consider a large portion of their research “pre-competitive.” Organizations like the Pistoia Alliance facilitate this collaboration by providing neutral forums to align on shared semantics and infrastructure (see the data standards section below).
Public funding
Public funding has been critical to building healthcare’s knowledge infrastructure. Governments and public research institutions have invested heavily in the creation and maintenance of ontologies, controlled vocabularies, and large-scale observational data that no single company could afford to build alone. Agencies such as the National Institutes of Health (NIH) fund many of these assets as public goods, leaving healthcare with a rich, open knowledge base ready to be connected and reasoned over using knowledge graphs.
Data standards
Healthcare also embraced open data standards early, ensuring that shared knowledge could be represented and reused across systems and vendors. Standards from the World Wide Web Consortium (W3C) made medical knowledge machine-readable and interoperable, allowing semantic models to be shared independently of any single system or vendor. By anchoring meaning in open standards rather than proprietary schemas, healthcare enabled knowledge graphs to function as shared, long-lived infrastructure rather than isolated implementations. Standards ensured that meaning could survive system upgrades, vendor changes, and decades of technological churn.
Conclusion
None of these factors alone explains healthcare’s maturity. It is their interaction over decades (ontology shaping vocabularies, regulation enforcing evidence, funding sustaining shared infrastructure, and standards enabling reuse) that made knowledge graphs inevitable rather than optional. Long before modern AI, healthcare invested in agreeing on what things mean and how observations should be interpreted. In the final part of this series, we’ll explore why most other industries lack these conditions, and what they can realistically borrow from healthcare’s path.
About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.
Bibliography
Hager, Thomas. Ten Drugs: How Plants, Powders, and Pills Have Shaped the History of Medicine. Harry N. Abrams, 2019.
Isaacson, Walter. The Code Breaker: Jennifer Doudna, Gene Editing, and the Future of the Human Race. Simon & Schuster, 2021.
Kirsch, Donald R., and Ogi Ogas. The Drug Hunters: The Improbable Quest to Discover New Medicines. Arcade, 2017.