The Hidden Cost of Dirty Data (Part 2): When Workarounds Become Structural Risk

Another common pattern emerges when data quality issues are discovered downstream.

In many organisations, engineers and analysts don’t own the source systems. Fixing data at the origin requires coordination, influence, and time. Fixing it downstream is faster and firmly within their control.

So they do what makes sense in the moment. They engineer around the problem.

Transformations are added. Rules are layered in. Pipelines compensate.
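To make this concrete, here is a minimal sketch of what that layering can look like. Everything in it is hypothetical — the field names, the rules, and the source-system quirks they compensate for — but the shape is familiar: each rule exists because fixing the origin was out of reach.

```python
# A hypothetical downstream cleanup step. Each rule compensates for a
# known source-system quirk instead of fixing it at the origin — and
# each one is knowledge that now lives only in this pipeline.
def patch_record(record: dict) -> dict:
    fixed = dict(record)
    # Rule 1: the source sometimes emits country as free text.
    fixed["country"] = {"uk": "GB", "united kingdom": "GB"}.get(
        str(fixed.get("country", "")).strip().lower(), fixed.get("country")
    )
    # Rule 2: a sentinel date means "unknown" upstream.
    if fixed.get("signup_date") == "1900-01-01":
        fixed["signup_date"] = None
    # Rule 3: negative amounts were a known export bug, so flip the sign.
    if isinstance(fixed.get("amount"), (int, float)) and fixed["amount"] < 0:
        fixed["amount"] = abs(fixed["amount"])
    return fixed
```

Each rule is individually sensible. Collectively, they are an undocumented model of the source system's defects — one that every other consumer of the same data has to rediscover on their own.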

I’ve seen this play out in environments where the root cause was well understood, but addressing it upstream was considered too hard or out of scope. The fastest path was to fix the data where it landed, even though everyone knew the issue would return.

This isn’t poor engineering. It’s a rational response to organisational boundaries.

The hidden cost is structural.

Problems are solved once, for one use case. The source remains unchanged. Other consumers continue using the same low-quality data. Over time, complexity grows and fragility increases. The organisation becomes dependent on fixes that only a few people fully understand.

When AI is introduced, these compensations become liabilities.

Models trained on heavily patched data are harder to validate, harder to explain, and harder to trust. The effort required to defend outcomes increases rather than decreases.

Silent upstream changes

In many enterprise environments, changes upstream happen quietly.

Logic inside database views is updated. Source system behaviour changes. Schemas evolve. Often, these changes are made with good intent and limited visibility into downstream impact.
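One lightweight defence is to record the schema a pipeline expects and compare it against what actually arrives, so a renamed or dropped column fails loudly at ingestion rather than surfacing weeks later as an unexplained metric shift. A sketch, with hypothetical column names:

```python
# Compare the columns a pipeline expects against what actually arrived.
# A silent upstream change shows up here as an explicit diff instead of
# as quietly drifting numbers downstream.
def schema_diff(expected: set[str], actual: set[str]) -> dict:
    return {
        "missing": sorted(expected - actual),     # dropped or renamed upstream
        "unexpected": sorted(actual - expected),  # new columns nobody announced
    }
```

This catches structural change, not behavioural change — a view whose logic is rewritten while keeping the same columns passes this check — but it removes one whole class of silent breakage cheaply.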

The effect is rarely immediate failure. Instead, metrics drift. Model performance degrades slowly. Teams sense that results are “off” without being able to point to a clear cause.

I’ve seen this most often where views are treated as stable sources, even though they’re far easier to change than the underlying tables. Without clear lineage or communication, downstream teams are left reacting rather than anticipating.


AI amplifies this problem.

Small upstream changes can have disproportionate effects when models retrain automatically. Without visibility into how data changes over time, diagnosing issues becomes slow and reactive.
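Gaining that visibility doesn't have to start with heavy tooling. A crude first step is to compare each new batch of a key column against a trusted baseline and flag when its mean moves too far. The sketch below is deliberately simplistic — production systems use proper drift tests such as PSI or Kolmogorov–Smirnov — but it shows the shape of the idea:

```python
from statistics import mean, stdev

# Flag a column when the new batch's mean moves more than `threshold`
# baseline standard deviations away from the baseline mean. A crude
# stand-in for real drift tests, kept minimal to show the idea.
def mean_drifted(
    baseline: list[float], batch: list[float], threshold: float = 3.0
) -> bool:
    base_sd = stdev(baseline)
    if base_sd == 0:
        return mean(batch) != mean(baseline)
    return abs(mean(batch) - mean(baseline)) > threshold * base_sd
```

Even a check this naive changes the conversation: instead of "the model feels off", a team can say "this input shifted on this date", which is the difference between reacting and anticipating.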

The hidden cost here is confidence. When teams can’t easily explain why outputs change, trust erodes, even if the data is technically correct.

In Part 3, I’ll focus on what minimum viable data quality actually looks like — and the engineering practices that make AI reliable rather than fragile.