why a knowledge graph vs a relational db

why foreign keys can never function as relationship rules

Start for Free

Timebase Atlas lets you start for free and prove value before ever making a commercial decision.

a knowledge graph is key to manufacturing

What a Knowledge Graph Actually Is, and Why Foreign Keys Are Not Edges

There is a conversation that happens in nearly every manufacturing digital transformation program, usually around the time someone proposes building a plant model. An architect looks at the requirement, looks at the stack, and says something reasonable: we already run Postgres, or Timescale, or Influx. Tables for equipment, tables for processes, foreign keys for the relationships. A knowledge graph is just nodes and edges, and an edge is just a foreign key. Why introduce a new class of system for something the relational model has handled since 1970?

It is a fair question, and the instinct behind it is sound. Good architects resist adding technology. The problem is that the equivalence at the heart of the argument, that a foreign key is an edge by another name, is wrong in a way that stays invisible during design and becomes expensive after deployment.

What a Knowledge Graph Actually Is

Strip away the vendor language and a knowledge graph is two things.

The first is an ontology: a set of types that describe what exists in your world and how those things can relate. 'Tank' and 'Pump' are entity types. 'Feeds', 'isPartOf', and 'interlocksWith' are relationship types, defined with the same rigor as the entity types they connect.

The second is an instance model: the real things, Tank-101 and Pump-7 and the Line 2 pasteurizer, each an instance of a type, connected by instances of those relationships.

The property that makes this a knowledge graph rather than just a diagram is that the relationships are first-class objects. An edge has a type. It can carry properties of its own. Tank-101 feeds Reactor-3 is not a mere association between two rows; it is a fact with identity, and it can carry qualifiers such as "during fermentation" or "since the 2024 repipe." The meaning of the operation lives in the structure itself, and the structure can be queried directly: give me everything downstream of Pump-7, show me every interlock that touches Line 2, find any path between this alarm and that batch.

That is the whole idea. Entities, typed relationships as real objects, and a query surface that traverses structure. Everything that follows in this article about relational databases comes down to how each of those three elements maps, or fails to map, onto tables.
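
To make those three elements concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not how any particular graph engine is implemented: the class and method names (`KnowledgeGraph`, `relate`, `edges_touching`) are invented for this example, as are the 'Reactor' entity type and the interlock between Pump-7 and Tank-101.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A relationship instance: a record with identity, a type, and properties."""
    edge_id: str
    rel_type: str
    source: str
    target: str
    properties: dict = field(default_factory=dict)


class KnowledgeGraph:
    def __init__(self):
        self.entity_types = set()  # ontology: what kinds of things exist
        self.rel_types = set()     # ontology: how those things may relate
        self.entities = {}         # instance model: entity name -> entity type
        self.edges = []            # instance model: relationship instances

    def add_entity(self, name, entity_type):
        if entity_type not in self.entity_types:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entities[name] = entity_type

    def relate(self, rel_type, source, target, **properties):
        if rel_type not in self.rel_types:
            raise ValueError(f"unknown relationship type: {rel_type}")
        for name in (source, target):
            if name not in self.entities:
                raise ValueError(f"unknown entity: {name}")
        edge = Edge(f"e{len(self.edges) + 1}", rel_type, source, target, properties)
        self.edges.append(edge)
        return edge

    def edges_touching(self, name, rel_type=None):
        """Query relationships directly: every edge that starts or ends at `name`."""
        return [
            e for e in self.edges
            if name in (e.source, e.target) and rel_type in (None, e.rel_type)
        ]


graph = KnowledgeGraph()
graph.entity_types.update({"Tank", "Pump", "Reactor"})  # 'Reactor' is assumed
graph.rel_types.update({"feeds", "isPartOf", "interlocksWith"})

graph.add_entity("Tank-101", "Tank")
graph.add_entity("Pump-7", "Pump")
graph.add_entity("Reactor-3", "Reactor")

# The edge is a fact with its own identity and a qualifier.
feed = graph.relate("feeds", "Tank-101", "Reactor-3", condition="during fermentation")
graph.relate("interlocksWith", "Pump-7", "Tank-101")  # hypothetical interlock

print(feed.edge_id, feed.properties)
print([e.rel_type for e in graph.edges_touching("Tank-101")])
```

The design point is that `Edge` is an ordinary record: it can be created, qualified, and queried like any entity, and the ontology sets decide which types are allowed before an instance is accepted.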

Foreign Keys Are Constraints. Edges Are Facts.

Here is where the architect's equivalence breaks. A foreign key is not a record in a relational database. It is a constraint on records: a rule saying that a value in this column must match a value in that one. It has no type beyond the columns it joins, it cannot carry properties, and it has no identity as data; the most the database can report about it is catalog metadata naming the columns involved. The relationship it encodes exists only in the schema, which means the meaning of the relationship exists only in the head of whoever designed the schema and whatever documentation they left behind.

An edge in a knowledge graph is the opposite. It is data. It exists as a record with a type, properties, and a lifecycle. You can create it, qualify it, version it, and query it without touching the schema. This sounds like a fine technical distinction, but it drives a consequence that decides the fate of the whole modeling effort.
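
A small sketch may help show the difference. It uses Python's built-in `sqlite3` module as a stand-in for any relational engine (the article's examples are Postgres-family systems, so treat the catalog query as SQLite-specific). The table and column names, and the `edge-0001` identifier, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reactor (id TEXT PRIMARY KEY);
    CREATE TABLE tank (
        id TEXT PRIMARY KEY,
        feeds_reactor_id TEXT REFERENCES reactor(id)  -- the "relationship"
    );
""")

# What the database knows about the foreign key is catalog metadata:
# which column points at which table and column, plus update/delete actions.
cursor = conn.execute("PRAGMA foreign_key_list(tank)")
fk_fields = [column[0] for column in cursor.description]
fk_rows = cursor.fetchall()
print(fk_fields)
print(fk_rows)

# The same relationship held as data: a record with identity, type, and qualifiers.
edge = {
    "id": "edge-0001",  # hypothetical identifier
    "type": "feeds",
    "source": "Tank-101",
    "target": "Reactor-3",
    "properties": {"condition": "during fermentation", "since": "the 2024 repipe"},
}
print(edge["properties"]["condition"])
```

The constraint has nowhere to put "during fermentation" without a schema change, while the edge record accepts the qualifier as one more property.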

how relationships are handled

In a relational database, relationships are schema. In a knowledge graph, relationships are data. That single difference determines who can extend the model, how often it will break, and whether it survives contact with a real plant.

Relationships as schema means every new kind of relationship is a design change. The plant discovers it needs to represent "shares a CIP circuit with," and the answer is a new join table, a migration, a change ticket, and a wait for the person who owns the schema. Relationships as data means the same discovery is an authoring action: a domain expert defines the relationship type and starts using it, and the model grows the way knowledge actually grows, incrementally and from the edges of the organization inward.
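
The contrast can be sketched in a few lines of Python. The first half uses the standard `sqlite3` module to stand in for a relational model, where the new relationship arrives as DDL; the second half is a deliberately tiny in-memory model where it arrives as data. All table names, function names, and the Tank-102 instance are assumptions for illustration.

```python
import sqlite3


def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


# Relationships as schema: a new relationship type is a design change.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE equipment (id TEXT PRIMARY KEY)")
db.execute(
    "CREATE TABLE feeds ("
    " source_id TEXT REFERENCES equipment(id),"
    " target_id TEXT REFERENCES equipment(id))"
)
tables_before = table_names(db)

# "Shares a CIP circuit with" needs a new join table, i.e. a migration.
db.execute(
    "CREATE TABLE shares_cip_circuit_with ("
    " equipment_a TEXT REFERENCES equipment(id),"
    " equipment_b TEXT REFERENCES equipment(id))"
)
tables_after = table_names(db)

# Relationships as data: the same discovery is an authoring action.
rel_types = {"feeds"}
edges = []


def define_relationship_type(name):
    rel_types.add(name)


def relate(rel_type, source, target):
    if rel_type not in rel_types:
        raise ValueError(f"unknown relationship type: {rel_type}")
    edges.append((source, rel_type, target))


define_relationship_type("sharesCipCircuitWith")
relate("sharesCipCircuitWith", "Tank-101", "Tank-102")

print(tables_before, "->", tables_after)
print(sorted(rel_types), edges)
```

In the first half the structure of the database changed; in the second, only its contents did.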

The Costs That Hide at Design Time

The relational approach does not fail at the whiteboard. On the contrary, the whiteboard phase goes beautifully, because a static snapshot of a plant maps onto tables well enough. The costs are structural, and they arrive in three forms.

Relationship types multiply, and each one is a table
A plant model does not have one kind of relationship. It has dozens: physical containment, material flow, electrical supply, control relationships, interlocks, procedural dependencies, calibration references. In a relational design, every many-to-many relationship type becomes its own join table, and the model's expressiveness is capped by how many of these anyone is willing to create and maintain. Teams respond in one of two ways. Either they under-model, collapsing distinct relationships into a generic "related_to" table that throws away the meaning the model existed to capture, or they reach for the entity-attribute-value pattern, storing everything as generic triples in a few giant tables. EAV is the relational world's confession that the schema could not hold the domain. It abandons typing, ruins query performance, and produces a database that is technically Postgres and practically an unindexed pile of assertions.

There is a third path, the diligent one, and it is worth playing out to its conclusion. Suppose the team refuses both shortcuts and commits to modeling every relationship properly. Material flow gets a table. Containment gets a table. Interlocks, CIP circuits, calibration references, electrical supply, each gets its own. Then the requirements arrive that any real relationship carries. Relationships need qualifiers, because Tank-101 feeds Reactor-3 only during fermentation, so each table grows condition columns. Relationships need validity windows, because the plant changes, so each table grows effective-from and effective-to timestamps. Some need direction, some need ordinal position, some need a reference to the change order that created them. Twenty relationship types become twenty tables carrying nearly identical scaffolding, each with its own indexes, its own insert paths, and its own variant of every query. Eventually a sharp engineer notices the duplication and consolidates: one table, with source, target, a relationship-type column, a JSONB field for qualifiers, and a pair of validity timestamps. It feels like a clean refactor. It is actually a surrender, because that table is an edge store. The team has hand-built a graph database inside Postgres, minus the type system, minus the traversal engine, minus the query language, on a storage engine tuned for exactly the access patterns the model no longer has. The diligent path and the EAV shortcut converge on the same place.
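
For illustration, the consolidated table at the end of that path might look like the following sketch. It uses Python's `sqlite3` with qualifiers stored as JSON text in place of Postgres JSONB; the column names, dates, and the Tank-102 instance are assumptions. The last insert shows what "minus the type system" means in practice: nothing rejects a misspelled relationship type.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE edge (
        id         INTEGER PRIMARY KEY,
        source     TEXT NOT NULL,
        target     TEXT NOT NULL,
        rel_type   TEXT NOT NULL,             -- free text: no ontology behind it
        qualifiers TEXT NOT NULL DEFAULT '{}', -- JSON stand-in for JSONB
        valid_from TEXT NOT NULL,
        valid_to   TEXT                        -- NULL means still in effect
    )
""")

insert_sql = (
    "INSERT INTO edge (source, target, rel_type, qualifiers, valid_from, valid_to) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)
rows = [
    ("Tank-101", "Reactor-3", "feeds",
     json.dumps({"condition": "during fermentation"}), "2024-01-01", None),
    ("Tank-101", "Tank-102", "shares_cip_circuit_with", "{}", "2024-01-01", None),
    # A typo in the relationship type is accepted without complaint.
    ("Pump-7", "Tank-101", "fedes", "{}", "2024-01-01", None),
]
conn.executemany(insert_sql, rows)

rel_types_in_use = [
    row[0] for row in conn.execute("SELECT DISTINCT rel_type FROM edge ORDER BY rel_type")
]
condition = json.loads(
    conn.execute("SELECT qualifiers FROM edge WHERE rel_type = 'feeds'").fetchone()[0]
)["condition"]

print(rel_types_in_use)
print(condition)
```

A check constraint or lookup table could catch the typo, but each such addition is another piece of a graph engine rebuilt by hand.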

right idea, wrong tool

Taken seriously, the relational approach to knowledge modeling does not compete with a knowledge graph. It slowly and expensively becomes a worse one.

Traversal is the query you need and the query SQL punishes
The questions a plant model exists to answer are path questions. What is downstream of this valve? What does this interlock ultimately protect? Which batches passed through equipment that shares a lubrication system with the pump that failed? These are variable-depth traversals, and in SQL they become recursive common table expressions: hard to write, harder to review, and prone to performance collapse as the model grows, because the join cost compounds with every hop. A graph engine treats traversal as its native operation. The query says "follow feeds relationships downstream from Pump-7 to any depth," and the engine walks the structure. The difference is not elegance. It is whether the questions that justify the model are ones your engineers will actually be able to ask.
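
As a rough side-by-side, the sketch below asks the same question twice: once as a recursive common table expression (run through Python's `sqlite3`), and once as a plain breadth-first walk over an adjacency map, which is closer to how a traversal reads in a graph model. The `feeds` rows are made-up sample data, and the walk is a simplification, not a description of any specific graph engine's internals.

```python
import sqlite3
from collections import deque

FEEDS = [
    ("Pump-7", "Tank-101"),
    ("Tank-101", "Reactor-3"),
    ("Reactor-3", "Filler-3"),
    ("Pump-9", "Tank-205"),  # unrelated branch, should not appear in the result
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feeds (source TEXT, target TEXT)")
conn.executemany("INSERT INTO feeds VALUES (?, ?)", FEEDS)

# Variable-depth traversal in SQL: a recursive CTE.
# UNION (not UNION ALL) drops repeated nodes, so a cycle cannot loop forever.
DOWNSTREAM_SQL = """
WITH RECURSIVE downstream(node) AS (
    SELECT target FROM feeds WHERE source = ?
    UNION
    SELECT f.target
    FROM feeds AS f
    JOIN downstream AS d ON f.source = d.node
)
SELECT node FROM downstream
"""
sql_result = {row[0] for row in conn.execute(DOWNSTREAM_SQL, ("Pump-7",))}


def downstream(edges, start):
    """Follow 'feeds' edges from `start` to any depth."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


walk_result = downstream(FEEDS, "Pump-7")
print(sorted(sql_result))
print(sorted(walk_result))
```

Both return the same set here. The gap shows up when the question gains a second relationship type, a qualifier, or a time window, each of which adds joins to the CTE.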

The schema assumes uniformity the plant does not have
Relational modeling wants populations of similar things: many rows, same columns. A plant is the opposite, a long tail of nearly-unique equipment where this tank has a jacket and that one does not, this line has an inline densitometer and its sister line uses lab samples. The relational responses are all bad: hundreds of near-empty nullable columns, a subtype table for every variation, or the EAV escape hatch again. A graph with a proper ontology handles variation natively, because types can be extended and instances can carry the properties they actually have.
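
One way to picture "types can be extended" is the sketch below, where a subtype declares only the extra properties it adds and each instance carries only what its type declares. The `EntityType` class, the `JacketedTank` subtype, and the property names are hypothetical, chosen to mirror the jacketed-tank example above.

```python
class EntityType:
    def __init__(self, name, parent=None, properties=()):
        self.name = name
        self.parent = parent
        self.properties = set(properties)

    def all_properties(self):
        """Properties declared on this type plus everything inherited."""
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties


def make_instance(name, entity_type, **values):
    unknown = set(values) - entity_type.all_properties()
    if unknown:
        raise ValueError(f"{entity_type.name} does not declare: {sorted(unknown)}")
    return {"name": name, "type": entity_type.name, **values}


tank = EntityType("Tank", properties={"volume_l"})
# Extending the type adds one property; no other tank gains an empty column.
jacketed_tank = EntityType("JacketedTank", parent=tank, properties={"jacket_setpoint_c"})

plain = make_instance("Tank-102", tank, volume_l=5000)
jacketed = make_instance("Tank-101", jacketed_tank, volume_l=5000, jacket_setpoint_c=72.0)

print(plain)
print(jacketed)
```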

What Happens After Deployment

Suppose the team pushes through anyway. The schema is designed, the migrations run, the model is loaded, and for a few months it works, because for a few months it describes the plant as it was on day one. Then the real world starts.

The first thing that happens is change. A pump is replaced, a line is repiped, a recipe moves to different equipment. In the relational model, updating the current state is easy, but the previous state is destroyed by the update. When quality asks how the system was connected in March, when Batch 47 ran, the honest answer is that the database no longer knows. Retrofitting structural history onto a relational model means temporal tables and validity intervals on every join table, which roughly doubles the schema's complexity and makes every traversal query time-qualified. A knowledge graph built for operations treats time as intrinsic: relationships have lifespans, and "the plant as of March" is a query parameter, not an archaeology project.
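
A minimal sketch of "time as a query parameter" follows. Each edge carries a lifespan, and the as-of date simply filters which edges are in effect. The `TimedEdge` class, the Pump-9 replacement, and all dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedEdge:
    rel_type: str
    source: str
    target: str
    valid_from: date
    valid_to: Optional[date] = None  # None means still in effect

    def active_on(self, day):
        # Half-open interval: valid_from is included, valid_to is excluded.
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def as_of(edges, day):
    """Return the structure as it stood on `day`."""
    return [e for e in edges if e.active_on(day)]


edges = [
    # Hypothetical history: Pump-7 was replaced by Pump-9 in April 2024.
    TimedEdge("feeds", "Pump-7", "Tank-101", date(2023, 1, 10), date(2024, 4, 15)),
    TimedEdge("feeds", "Pump-9", "Tank-101", date(2024, 4, 15)),
]

march = [e.source for e in as_of(edges, date(2024, 3, 12))]
june = [e.source for e in as_of(edges, date(2024, 6, 1))]
print(march, june)
```

Replacing the pump closes one edge and opens another; nothing is overwritten, so the March configuration stays answerable.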

The second thing that happens is exceptions. Every plant model meets the one tank that violates the pattern, the temporary bypass that becomes permanent, the relationship that only holds during a particular product run. Each exception in a relational model is a schema decision, and schema decisions queue behind one owner. The migration backlog becomes the rate limiter on how fast the organization can describe its own reality, and domain experts learn that getting their knowledge into the model takes a ticket and a month. They stop trying. The model freezes at its day-one snapshot and quietly becomes wrong.

the slow decay of accuracy

A relational plant model is accurate on the day it ships and decays from there, because every change to reality requires a change to schema. The model ends up describing the plant the architect designed for, not the plant you run.

The third thing that happens is the quietest and the worst. The schema itself becomes tribal knowledge. Only the original architect (often a contractor) knows why the join tables are shaped the way they are, which columns are trustworthy, and how to write the recursive query that answers a downstream-impact question. The system built to externalize operational knowledge has manufactured a new dependency on one person's head. Anyone who has read about knowledge governance will recognize the pattern: the failure mode was not eliminated, only relocated.

the correct positioning of solutions

Where relational databases and knowledge graphs best fit

Where Relational and Time-Series Databases Still Belong

None of this is an argument against Postgres, Timescale, or Influx, which are superb at what they were built for. Time-series databases in particular are the right home for the value stream: millions of timestamped measurements, compressed, retained, and served fast. That job does not go away when a knowledge graph arrives, and pretending otherwise would be as naive as the foreign-key argument in reverse.

The same holds for the transactional systems of record. The MES, the LIMS, the batching system, and the ERP each exist to guarantee the integrity of a specific class of record, and relational databases are precisely the right engine for that job. Each of these systems validates its own records against its own rules: the LIMS enforces what makes a lab result valid, the MES enforces what makes a production event complete, the ERP enforces what makes a financial transaction balance. That segmentation by function is not an accident of procurement history. It is what makes each record trustworthy, because validation happens inside the system that owns the domain, independent of every other database and every other function's rules. Collapsing those records into one shared store, or asking one system to validate another's records, breaks exactly the guarantees the records exist to provide.

So the division of labor is clean, and it has three parts rather than two. The time-series store holds what happened, the values. The transactional systems hold what was done and attested, each validating its own records within its own function. The knowledge graph holds what things are, how they relate, and under what conditions the values and records mean anything, relating across those systems without absorbing or overruling any of them. Store values in the system built for values, keep records in the systems that validate them, and model meaning in the system built for meaning. The mistake is not 'using Timescale', and it is certainly not 'running a proper LIMS'. The mistake is asking any of them to be an ontology.
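
The three-part split can be sketched with three plain Python structures standing in for the three kinds of system. Everything here is hypothetical sample data: the tag name `TT-101.PV`, the batch window, the readings, and the idea that the batch record names its unit directly. The point is only that the graph contributes the link between a tag and a piece of equipment, while values and records stay where they are.

```python
from datetime import datetime

# Time-series store: what happened (values per tag).
TIMESERIES = {
    "TT-101.PV": [
        (datetime(2024, 3, 4, 8, 0), 71.8),
        (datetime(2024, 3, 4, 9, 0), 72.4),
        (datetime(2024, 3, 4, 13, 0), 20.1),
    ],
}

# Transactional record: what was done and attested, owned by its own system.
BATCH_RECORDS = {
    "Batch-47": {
        "unit": "Tank-101",
        "start": datetime(2024, 3, 4, 7, 30),
        "end": datetime(2024, 3, 4, 10, 0),
    },
}

# Knowledge graph: what things are and how they relate.
GRAPH_EDGES = [("TT-101.PV", "measures", "Tank-101")]


def tags_measuring(unit):
    return [s for s, rel, t in GRAPH_EDGES if rel == "measures" and t == unit]


def readings_for_batch(batch_id):
    """Use the graph to decide which stored values are relevant to a record."""
    record = BATCH_RECORDS[batch_id]
    return {
        tag: [
            (ts, value)
            for ts, value in TIMESERIES.get(tag, [])
            if record["start"] <= ts <= record["end"]
        ]
        for tag in tags_measuring(record["unit"])
    }


batch_readings = readings_for_batch("Batch-47")
print(batch_readings)
```

None of the three structures was copied into another; the function only relates them.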

Building on a Graph That Was Built for This

The last honest objection is that general-purpose graph databases carry their own risks: another server to run, a data engineering skill set OT teams do not have, and a blank canvas where a plant model should be. This is the gap a manufacturing knowledge platform exists to close. Timebase Atlas is built around a knowledge graph engineered for operations, running on a custom backend designed for millions of objects and relationships rather than a repurposed general-purpose store, deployable on-prem, in the cloud, or at the edge. The ontology work does not start from zero, because standards libraries for ISA-95, ISA-88, and ISA-5.1 ship as pluggable starting points. Structure is queryable over Cypher while the underlying historian data flows through untouched, so questions like "which batches on Filler 3 last week ran with lab deviations during a temperature excursion" become one query across structure and values together.

Most importantly, the model is federated by construction. Relationship types and model slices are authored by the domain experts who hold the knowledge, in their own language, without a schema owner in the loop, and the slices compose into one coherent graph. That is the property the relational approach can never offer, because it is a property of treating relationships as data. The architect's instinct to avoid new technology is right most of the time. This is the case where the new thing is not an alternative implementation of the old thing. It is a different answer to what a relationship is, and your model's ability to survive everything that comes after deployment follows from that answer.

finally understand your plant - completely

Build Your Model. Free to Start.
No Conversation Required.

Download Timebase Atlas and create the governed digital blueprint of your plant that every system, application, and AI initiative can rely on.

FAQs

Why do AI pilots in manufacturing fail even when the underlying data is clean and well-organized?

The most common failure mode is not dirty data or a weak model — it is a data environment that stores values without meaning. An AI system can pattern-match on numbers, but it cannot reason about causes, relationships, or process context unless that information is explicitly modeled. A historian full of clean tag values still cannot tell an AI agent why a temperature reading matters, what upstream condition caused it, or whether it is normal for the current product run. The missing layer is a semantic model of how the facility actually works, and most digital transformation programs skip it entirely because it is difficult to scope and difficult to demonstrate in a 90-day pilot.

What is the difference between data and operational knowledge in a manufacturing context?

Data is what your systems record: a temperature value, an alarm event, a work order completion. Operational knowledge is the structured understanding of why things happen the way they do — how equipment is interconnected, what conditions govern which outcomes, which process steps depend on which utilities, and what "normal" looks like for a specific product under specific conditions. Most of that knowledge currently lives in the heads of experienced engineers, not in any system. When those engineers retire or move on, the knowledge goes with them. Operational knowledge is the context that makes data meaningful, and it is almost entirely absent from conventional manufacturing data architectures.

What is a manufacturing knowledge graph, and how is it different from an asset framework or data hierarchy?

A data hierarchy organizes assets into a tree structure — enterprise, site, area, line, unit, tag — and gives every data point an address. That is genuinely useful for navigation and reporting. A knowledge graph does something different: it makes relationships first-class objects. Instead of a node having one parent, a node in a knowledge graph can have any number of named relationships to any other node — "feeds," "is upstream of," "is required for," "failed during." Real manufacturing facilities are networks, not trees. A heat exchanger serves multiple process lines. A CIP circuit touches equipment across different hierarchy levels. A batch recipe spans physical units in a sequence that changes by product. A data hierarchy cannot express any of that without workarounds. A knowledge graph can, and that structural difference determines whether the model can actually be reasoned on.
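
A few lines of Python show the structural limit. In the sketch, a hierarchy is a map from node to its single parent, and a graph is a list of named edges; the `HX-1` heat exchanger, the line names, and the `serves` relationship are hypothetical.

```python
# Hierarchy: every node has exactly one parent slot.
parent_of = {"Line-1": "Area-A", "Line-2": "Area-A"}
parent_of["HX-1"] = "Line-1"
parent_of["HX-1"] = "Line-2"  # overwrites: the tree cannot hold both

# Graph: a node can have any number of named relationships.
edges = [
    ("HX-1", "isPartOf", "Area-A"),
    ("HX-1", "serves", "Line-1"),
    ("HX-1", "serves", "Line-2"),
]


def related(node, rel_type):
    return [target for source, rel, target in edges if source == node and rel == rel_type]


print(parent_of["HX-1"])
print(related("HX-1", "serves"))
```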

Why does the single-architect model fail for multi-site manufacturers?

Building a complete data model for a 14-site operation requires capturing knowledge that is distributed across hundreds of domain experts — process engineers, maintenance teams, QA leads, operators — each of whom understands their piece of the operation deeply. A central data architect cannot extract and formalize all of that knowledge on their own. Hierarchical modeling tools compound the problem because they require the structure to be defined before it can be populated, which creates a bottleneck at the design stage that scales badly across sites and disciplines. A federated approach — where each domain expert authors their own slice of the model and those slices compose into a coherent whole — is the only architecture that works at enterprise scale.

What does knowledge governance mean in a manufacturing context, and why is it different from data governance?

Data governance in manufacturing typically addresses structure: naming conventions, tag standards, data quality rules, access controls. Those things matter, but they govern the shape of data, not its meaning. Knowledge governance addresses the operational layer — ensuring that the structured understanding of how the facility works is captured, versioned, owned, and kept current as equipment changes, processes evolve, and personnel turn over. Most organizations have no owner for that work and no system capable of holding it. The consequence is that knowledge accumulates informally in people, documents, and spreadsheets, and the organization becomes progressively more dependent on individuals whose departure creates operational risk.

What is tribal knowledge, and why is it a strategic risk for manufacturers running more than 10 sites?

Tribal knowledge is the operational understanding that experienced engineers carry but that no system records: which alarm thresholds were set for legacy reasons and which actually matter, how a specific piece of equipment behaves under edge conditions, what the last senior process engineer knew about Reactor-3 that made her the first call whenever something went wrong. At one site, this is a manageable risk. Across 14 sites, it becomes a systematic vulnerability. Each site accumulates its own body of implicit knowledge, none of it is cross-referenced, and the organization has no way to determine what it knows collectively or where its critical knowledge dependencies are concentrated.

What is a Unified Namespace, and what does it actually give a manufacturer versus what it promises?

A Unified Namespace, as most manufacturers have implemented it, is a broker-based architecture — typically MQTT — that gives every data source a consistent topic taxonomy and routes messages to consumers through a single bus. What it delivers is better-organized transport and a naming convention. What it does not deliver is a queryable model of relationships, structural history, or the operational context that makes data meaningful. A UNS as implemented today can tell you what value a tag had at a given moment. It cannot tell you which processes were affected by a change in that value, what the equipment state was upstream, or how the facility was configured during a specific production run. The namespace part of "Unified Namespace" requires a semantic layer that the broker cannot provide.

How does a knowledge graph complete a UNS investment rather than replace it?

The broker layer of a UNS investment handles transport well and does not need to be replaced. What a knowledge graph adds above the broker is the semantic namespace — a structured, queryable model of the facility that gives meaning to the data flowing through the broker. The broker stays as the data highway. The knowledge graph becomes the map. Together, they deliver what the UNS movement was pointing toward: a single coherent model of the operation that any consumer, human or AI, can query without building a new integration project. Organizations that have already invested in MQTT infrastructure are not starting over — they are adding the layer that makes the investment useful at depth.

What does "AI-ready" actually require from a manufacturing data environment?

An AI agent reasoning on manufacturing data needs four things that raw data environments do not provide: a structured model of the facility's equipment and processes, explicit relationships between entities so the agent can traverse causes rather than just retrieve values, a time-aware structure that records how the facility was configured during past events (not just what values were recorded), and a governed model that a domain expert has validated rather than one inferred by the AI from raw data alone. Most vendors describe their platform as AI-ready without addressing any of these requirements. The honest test is whether an AI agent using the platform can answer a specific operational question by reasoning through relationships, not just by pattern-matching on historical values.

Why does the AI corpus problem make the knowledge modeling decision more urgent than it appears?

A facility's knowledge graph — if properly structured and governed — is exactly the shape of dataset needed to fine-tune a language model into one that understands that specific operation. The relationships, conditions, equipment logic, and process context captured in the graph constitute a training corpus that no generic model has and no competitor can replicate. Organizations that begin building their knowledge model now are, without any additional work, building the AI training data that will differentiate their future operational AI from everyone else's. Organizations that defer the modeling work are also deferring that advantage. The gap between the two compounds over time.

How should a multi-site manufacturer evaluate whether their current modeling tools are sufficient?

Three questions cut through most of the noise. First, can the model tell you which processes are affected by a failure in a specific shared utility without someone doing manual analysis? Second, can it tell you which equipment was in what state during a specific batch from three months ago, tracing through actual equipment relationships rather than reconstructing the picture from raw logs? Third, can domain experts at different sites author their sections of the model independently without breaking structures that others have built? If the answers are no, the current tooling is performing as designed for navigation and reporting, but it is not a knowledge model and will not support the AI and operational intelligence use cases the organization is building toward.

What is the actual organizational work required to build a manufacturing knowledge model, and why can't a tool do it automatically?

A knowledge model captures how a facility actually works — the relationships, conditions, process logic, and operational context that make data meaningful. That knowledge does not exist in any data source. It exists in the minds of process engineers, maintenance leads, operators, and QA teams who have accumulated it over years of working with the equipment. A tool can provide the structure to capture and formalize that knowledge, and an AI agent can assist the process by partnering with domain experts during modeling sessions. But the knowledge itself has to come from the people who hold it. No automated process can infer operational relationships from tag values alone with the accuracy and specificity a governed model requires. The work is an organizational commitment, not a software deployment.