The Living Knowledge Graph: Durability, Roles and Who Owns the Meaning
A companion to "The Semantic Medallion", building out some of what the original walkthrough deliberately scoped out.
When I wrote The Semantic Medallion for Modern Data 101, my focus was on treating the Gold layer of a medallion architecture as a knowledge graph rather than as a set of tables. The maplib library makes the DataFrame-to-RDF transformation tractable enough that “the four lines of Python” is not an exaggeration. But the piece was a walkthrough of thoughtwork, not a complete blueprint.
If you’re still not sure why you should care about knowledge graphs, I encourage you to read my piece Data Engineer! Why should you care about Knowledge Graphs?
A thoughtful response from Bjørn Broum raised a set of questions in his essay Semantically Rich, Temporally Unanchored. A stable IRI, he points out, is not a contract. A shared ontology is agreed meaning at a point in time, and meaning drifts. Operational systems evolve on their own timelines, for their own legitimate reasons, without signalling those changes to the graph that depends on them. An agent reasoning over a Gold-layer graph, he argues, is reasoning about “the last time the pipeline agreed with itself”. Connection is not the same as fidelity, richness is not the same as durability, and the architecture that solves the first does not necessarily solve the second.
These are real concerns, and my thoughts and answers require more space than a comment. Therefore, this thing! The Modern Data 101 piece showed that you can transform tabular data into a knowledge graph in four lines of Python. This one shows what surrounds those four lines once the graph lives in production: 1) the technical primitives that handle durability, 2) the roles that own them, and 3) the governance framework that gives those roles standing.
A knowledge graph is not a deliverable
The deepest misconception about semantic architectures (and it shows up everywhere, not just in this exchange) is the assumption that a knowledge graph is a static artefact produced by a pipeline. Some describe the graph as something that “settles”, is “handed over”, or becomes a “snapshot”. That mental model belongs to data engineering, where you build a thing and ship it. It does not fit knowledge representation, where the artefact is continuously curated and the curation is the work!
A knowledge graph is a living information architecture. The triples in it are not the architecture. The ontology, SHACL shapes, controlled vocabularies, versioning conventions, and provenance metadata: that’s the architecture![1] And they evolve continuously. When Finance extends the customer definition to include subsidiaries for a new consolidation requirement, the response is not to wait for the next pipeline run. It is to update the ontology, version it, deprecate the old term if necessary, and let consumers migrate against an explicit IRI.
That work is governance work, but governance here does not mean meetings and policy documents. It means editing the artefacts. The artefacts are machine-readable, versioned, and queryable. The governance happens in the substrate, not adjacent to it.
This is the move that is easy to miss if you’re reasoning from a data engineering frame. Governance and architecture look like separate concerns; the architecture solves richness, governance has to solve durability, and the second is somehow bolted onto the first. In a working semantic system, governance is architecture, expressed in machine-readable form, maintained by the people who know what the meaning should be.
The durability stack
So, let’s have a look at the primitives. Every concern about temporal fidelity, contract violation, and ontology drift has a corresponding native mechanism in RDF, and the mechanisms compose into a coherent durability stack.
SHACL shapes are the contract layer
A SHACL shape declares what a contributing system has promised about a class of nodes: required properties, constraints that limit how values can be interpreted, and conformance to controlled vocabularies. When CRM contributes a customer node, the shape says: this node must have these properties, with these types, in these ranges, drawn from these vocabularies. When CRM changes the meaning of a field (the “silent” contract-violation scenario, where an identifier still resolves but its semantics have shifted), validation against the shape fails. The failure is loud, machine-readable, and traceable to the specific contributor and shape. SHACL is not optional infrastructure for production knowledge graphs. It is how you turn an identifier into a contract.
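To make the contract idea concrete, here is a minimal sketch in plain Python. It is a stand-in for a SHACL shape, not SHACL itself: in production you would write a real shape and run it through a SHACL engine. Everything named here (PropertyConstraint, validate, the ex: identifiers, the segment codes) is a hypothetical illustration, and the two checks only mirror the intent of sh:minCount and sh:in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyConstraint:
    """One promise a contributor makes about a property (hypothetical model)."""
    path: str                            # the property being constrained
    min_count: int = 0                   # mirrors the intent of sh:minCount
    allowed: Optional[frozenset] = None  # mirrors the intent of sh:in


def validate(node_id, triples, constraints):
    """Return human-readable violations for one node against a list of constraints."""
    violations = []
    for c in constraints:
        values = [o for s, p, o in triples if s == node_id and p == c.path]
        if len(values) < c.min_count:
            violations.append(
                f"{node_id}: {c.path} needs at least {c.min_count} value(s), found {len(values)}"
            )
        if c.allowed is not None:
            for v in values:
                if v not in c.allowed:
                    violations.append(
                        f"{node_id}: {c.path} value {v!r} is outside the controlled vocabulary"
                    )
    return violations


# Assumed contract for customer nodes contributed by CRM.
CUSTOMER_SHAPE = [
    PropertyConstraint("ex:customerId", min_count=1),
    PropertyConstraint("ex:segment", min_count=1,
                       allowed=frozenset({"ex:Retail", "ex:Corporate"})),
]

crm_triples = [
    ("ex:c1", "ex:customerId", "C-001"),
    ("ex:c1", "ex:segment", "ex:Retail"),
    ("ex:c2", "ex:customerId", "C-002"),
    ("ex:c2", "ex:segment", "ex:Partner"),  # CRM quietly introduced a new code
]

report = {node: validate(node, crm_triples, CUSTOMER_SHAPE) for node in ("ex:c1", "ex:c2")}
```

The identifier ex:c2 still resolves, but the report names the node, the property, and the offending value, which is the "loud and traceable" part.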
Named graphs with provenance are the temporal anchor
RDF datasets are not flat sets of triples. They are organised into named graphs, each of which can carry metadata about who asserted what, when, and under what conditions. Using PROV-O, every named graph can record prov:generatedAtTime, prov:wasAttributedTo, prov:wasDerivedFrom, and validity intervals through prov:startedAtTime and prov:endedAtTime. A SPARQL query can then filter on that metadata: “show me only assertions from sources that refreshed in the last 24 hours”, or “show me what we believed about this entity on this date”. The graph is not reasoning about a single coherent snapshot; it is reasoning about claims with metadata attached, and consumers can decide which claims to trust based on that metadata.
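The two example questions can be sketched in a few lines of plain Python. This is an assumption-laden stand-in, not an RDF dataset or a SPARQL query: the named graphs are a dict, the metadata keys merely echo PROV-O terms, the graph names, sources, and dates are hypothetical, and validity end times are ignored for brevity.

```python
from datetime import datetime, timedelta, timezone

# Each named graph carries its own provenance next to its triples (hypothetical data).
named_graphs = {
    "ex:graph/crm": {
        "generatedAtTime": datetime(2024, 5, 2, 6, 0, tzinfo=timezone.utc),
        "wasAttributedTo": "ex:CRM",
        "triples": [("ex:c1", "ex:segment", "ex:Corporate")],
    },
    "ex:graph/erp": {
        "generatedAtTime": datetime(2024, 4, 20, 6, 0, tzinfo=timezone.utc),
        "wasAttributedTo": "ex:ERP",
        "triples": [("ex:c1", "ex:creditLimit", 50000)],
    },
}


def fresh_claims(graphs, now, max_age):
    """Claims from graphs generated within max_age of now, with their source attached."""
    return [
        (name, g["wasAttributedTo"], triple)
        for name, g in graphs.items()
        if now - g["generatedAtTime"] <= max_age
        for triple in g["triples"]
    ]


def believed_at(graphs, when):
    """Claims from graphs that already existed at the given moment."""
    return [
        (name, g["wasAttributedTo"], triple)
        for name, g in graphs.items()
        if g["generatedAtTime"] <= when
        for triple in g["triples"]
    ]


now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
recent = fresh_claims(named_graphs, now, timedelta(hours=24))
earlier = believed_at(named_graphs, datetime(2024, 4, 25, tzinfo=timezone.utc))
```

Here recent holds only the CRM claim and earlier holds only the ERP claim. The consumer, not the pipeline, decides which claims count.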
Versioned ontologies
An OWL ontology is not just a vocabulary; it is a versioned artefact. owl:versionIRI lets consumers pin to a specific version. owl:priorVersion and owl:incompatibleWith express how versions relate to each other. owl:deprecated marks terms that should not be used in new work, with rdfs:seeAlso pointing to the replacement. Ontologies evolve! Consumers can detect that a term they depend on has been deprecated, validation tools can flag triples using deprecated terms, and migration becomes a traceable action rather than a diff.
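A rough sketch of what "detecting a deprecated term" can look like, again in plain Python rather than OWL tooling. The ontology is reduced to a dict whose keys echo the OWL annotations; the IRIs, the ex:ConsolidatedCustomer replacement, and both function names are hypothetical.

```python
# Ontology metadata reduced to a dict (hypothetical IRIs and terms).
ontology = {
    "versionIRI": "https://example.org/onto/2.0",
    "priorVersion": "https://example.org/onto/1.0",
    "terms": {
        "ex:Customer": {"deprecated": True, "seeAlso": "ex:ConsolidatedCustomer"},
        "ex:ConsolidatedCustomer": {"deprecated": False},
    },
}


def is_pinned_version_current(pinned_iri, onto):
    """True when a consumer's pinned version matches the published version IRI."""
    return pinned_iri == onto["versionIRI"]


def deprecated_usage(triples, onto):
    """Return (subject, deprecated term, suggested replacement) for each offending triple."""
    findings = []
    for s, p, o in triples:
        for term in (p, o):
            meta = onto["terms"].get(term)
            if meta and meta.get("deprecated"):
                findings.append((s, term, meta.get("seeAlso")))
    return findings


data = [
    ("ex:c1", "rdf:type", "ex:Customer"),
    ("ex:c2", "rdf:type", "ex:ConsolidatedCustomer"),
]

findings = deprecated_usage(data, ontology)
still_current = is_pinned_version_current("https://example.org/onto/1.0", ontology)
```

Each finding names the node, the deprecated term, and where to migrate, so the migration is a list of actions rather than something to infer from a diff.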
Controlled vocabularies, the meaning of things
SKOS lets you express concept hierarchies with skos:broader, skos:narrower, and skos:related, and attach labels with skos:prefLabel and skos:altLabel for human readability in multiple languages. When the meaning of a category shifts, you don’t redefine the category in place. You introduce the new concept, mark the old one with a skos:changeNote, and let the change carry forward through the graph. The vocabulary is curated by people who know what the concepts mean, in a form machines can also read. This is elementary logic.
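The "don't redefine in place" rule, sketched under the same assumptions as before: plain Python, a dict standing in for a SKOS concept scheme, keys that echo SKOS property names, and hypothetical concepts, labels, and function name.

```python
import copy

# A tiny concept scheme (hypothetical concepts and labels).
scheme = {
    "ex:RiskCategory": {
        "prefLabel": {"en": "Risk category"},
        "broader": None,
        "changeNote": [],
    },
    "ex:HighRisk": {
        "prefLabel": {"en": "High risk", "nb": "Høy risiko"},
        "broader": "ex:RiskCategory",
        "changeNote": [],
    },
}


def introduce_successor(scheme, old_id, new_id, pref_label, note):
    """Return a new scheme: new_id is added, old_id keeps its definition and gains a change note."""
    updated = copy.deepcopy(scheme)  # leave the published scheme untouched
    updated[new_id] = {
        "prefLabel": pref_label,
        "broader": updated[old_id]["broader"],
        "related": [old_id],
        "changeNote": [],
    }
    updated[old_id]["changeNote"].append(note)
    return updated


revised = introduce_successor(
    scheme,
    old_id="ex:HighRisk",
    new_id="ex:HighRisk2024",
    pref_label={"en": "High risk (2024 definition)"},
    note="Superseded by ex:HighRisk2024 after the regulatory definition changed.",
)
```

The old concept keeps its labels and its place in the hierarchy, so data that already points at it still means what it meant when it was asserted.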
Each primitive solves a specific durability concern. And together they compose into a stack that handles contracts, time, ontology evolution, and conceptual meaning. All in a form where the governance is expressed in the artefacts, rather than adjacent to them.
The roles that own the stack
The durability stack is not self-maintaining, though. Someone has to write the SHACL shapes, curate the SKOS vocabularies, and version the ontologies. Keepers of the knowledge! The question of who does this is where many critiques of semantic architectures go wrong, I think.
A working semantic system involves at least four distinct roles, with different skills, different responsibilities, and different relationships to the knowledge graph. (And of course, one person can hold several of these roles.)
The data engineer
She builds and operates the pipeline. She configures ingestion from source systems, manages the Bronze and Silver layers, handles scheduling and orchestration, and integrates transformation into the workflow. Her craft is reliability, throughput and operational discipline. She is not, and should not be, the one who decides what “customer” means.
The knowledge engineer
The knowledge engineer writes the SHACL shapes, the OTTR templates (or any other mapping), and the SPARQL queries that downstream consumers run. They translate domain knowledge into machine-readable constraints and translate operational requirements into ontology patterns. Their craft is at the interface between domain meaning and formal representation. They are the ones who turn a domain expert’s “a fragment must have a provenance designation” into a SHACL shape that screams when it does not.
The ontologist
She designs and maintains the ontology itself: the class hierarchy, the properties, alignment with external ontologies, and versioning strategy. She works with domain experts to capture meaning and with knowledge engineers to make that meaning enforceable. Her craft is conceptual modelling: getting the structure of meaning right so that everything downstream can rest on it. And often, the ontologist is also known as the information architect.
The domain expert
The domain expert is the source of authority on meaning. The compliance officer who knows when a regulatory definition has shifted. The finance guy who knows when subsidiaries need to be folded into the consolidation. They are not engineers, and they should not have to become engineers to express their expertise. But they are often the users of the data, data products, and data systems.
In a join-logic architecture, where the Gold layer is Parquet tables joined in dbt, the meaning of “customer” lives in SQL scripts. The domain expert cannot inspect that code, and cannot validate or correct it. They file tickets and review PRs they cannot read. Every translation step between their knowledge and the system is a place where meaning gets lost, like a game of telephone. So, meaning drifts. Not because governance is hard, but because the people who know what the meaning should be have no instruments to express it with. Bring the meaning of the data closer to the source of that meaning!
In a semantic architecture, they do! A SKOS vocabulary is editable by anyone who can use a structured editor, like VocBench, PoolParty, WebProtégé, a well-designed web form, or even Excel with mapping flavours(!!). The compliance officer can maintain the concept hierarchy for regulatory categories directly. The finance guy can flag, in a machine-readable form, that the customer concept has been extended. A SHACL shape, written once by a knowledge engineer in consultation with the domain expert, expresses constraints both can read and validate. Ontology evolution becomes a conversation between domain experts, knowledge engineers, and ontologists, conducted on the same substrate all three can inspect.
This is what makes governance logic easier to maintain than join logic in a real organisation. ❤️ The maintenance load moves to the people who actually know the answer, and the engineering team stops being the bottleneck for every semantic change. The domain expert stops being a passive reviewer of work they can’t fully see. This foundation, the knowledge graph, makes the right division of labour possible, which is exactly what data engineering has been trying and failing to do for two decades with documentation, data dictionaries, and business glossaries that nobody maintains because nobody can.[2]
The governance that owns the roles
The roles need something that gives them standing. A knowledge engineer who writes a SHACL shape that fails when CRM violates its contract has accomplished nothing if no one has the authority to require CRM to fix the violation. The same goes for domain experts flagging drifting ontologies or vocabularies: somebody has to be responsible for acting on it. The architecture alone does not generate this authority. But luckily, frameworks exist! And for the Norwegian public sector (and private sector, for that matter), where I spend most of my time, this is called Orden i eget hus.[3]
Orden i eget hus is a framework developed by a range of public organisations in Norway, maintained by the Digitalisation Agency. It is a framework for information management, and it names the authority structure explicitly. It defines three roles:
data owner (no: dataeier), accountable for a dataset and its quality
data steward (no: dataansvarlig), responsible for day-to-day management
data coordinator (no: datakoordinator), facilitates work across the organisation
It defines a process:
cataloging data,
describing it in a concept catalog (no: begrepskatalog) using application profiles for RDF[4] standards, maintained as an ongoing service,
assessing access levels,
publishing descriptions for internal and external reuse (at data.norge.no).
It defines a maturity model, a five-point scoring system across defined axes, which lets organisations measure where they are and where they need to be. And it is anchored in legislation. For Norwegian public organisations, this is not optional.
This framework maps onto our roles. The data owner owns the SHACL shapes for their domain and has the standing to require systems to conform, because their accountability for data quality is formally assigned. The concept catalog is a SKOS vocabulary (SKOS-AP-NO for Norway, SKOS-AP for the EU), published as machine-readable concepts through, for example, Felles datakatalog (data.norge.no). The maturity model's "structured form, systematically maintained" criterion is exactly what versioned ontologies and named graphs provide.
The governance structure is not hypothetical, and where it exists, it composes naturally with the technical primitives. Machine-readable knowledge representation.
Where this leaves the architecture
Semantic richness and semantic durability are different properties. But the picture in which richness is the deliverable and durability is the “missing half” that has to come from the outside does not match how production semantic systems actually work.
Durability is not external.
SHACL shapes are the contract layer. Named graphs with PROV-O are the temporal anchor. Versioned ontologies with deprecation axioms are the drift signal. SKOS vocabularies are the meaning layer. Each primitive solves a specific durability concern, and together they compose into a stack that handles the failure modes critics name.
Durability is not external to the roles.
Knowledge engineers write the shapes, ontologists evolve the model, domain experts own the meaning, data engineers operate the pipeline. The right division of labour is possible because the substrate is machine-readable in a form domain experts can also read.
Durability is not external to governance.
Frameworks like Orden i eget hus assign accountability, define processes, measure maturity, and provide legal standing. The technical stack and the governance framework wire together at every level. The data owner owns the shape, the concept catalog is the SKOS vocabulary, the maturity model measures structured maintenance.
Either half without the other would be decoration: SHACL shapes nobody owns, or owners with nothing to enforce. The interesting work, the work I find most worth doing, is the seam where the substrate meets the structure. That seam is where a knowledge graph stops being a tutorial example and starts being an architecture.
Relevant content from the author
From SQL Constraints to SHACL Shapes: Declarative Data Validation (Substack)
What is an: Ontologist or Knowledge / Data Engineer? w/ Katariina Kari & Veronika Heimsbakk (Ashleigh N. Faith, YouTube)
Data Engineer! Why should you care about Knowledge Graphs? (Substack)
Other highlighted relevant readings
Jessica Talisman, MLS, Where Provenance Ends, Knowledge Decays (Substack) - The title speaks for itself.
Tara Raafat, Human-Centered Knowledge Graph and Metadata Leadership (Episode 41 of Knowledge Graph Insights by Larry Swanson) - On the people who make a successful knowledge graph team, and beyond.
Teodora Petkova, Into the Heart of a UX-driven Knowledge Graph: IKEA’s RDF Way Forward (teodorapetkova.com) - On bridging the gap between users and data.
About the author
Ontologist who codes in C, Java and Python. Has been a knowledge graph practitioner for over a decade, and has given numerous talks on the topic around the globe. Knowledge Graph Specialist at Data Treehouse. Author of SHACL for the Practitioner and awarded amongst Norway’s Top 50 Women in Tech 2024.
[1] All of this happens to be represented as triples too. Power of the triple!
[2] Satt litt på spissen ("to put it a bit pointedly"), as we say in Norway.
[3] Direct translation: order in one’s own house.
[4] Resource Description Framework (standardised in 1999 by W3C), foundational technology in a knowledge graph. Read more in A brief introduction to the syntax of knowledge graphs.


