Start with Why: Driving Data Architecture with Purpose

At Dot Collective, my team recently conducted a strategic engagement with a UK regulator facing serious data sprawl. They wanted tools that fitted their use cases, but this wasn't about picking the tools; it was about understanding the organisational and stakeholder needs. We applied a principle that I call "Why First Architecture": before we decide what to build or how to deliver, we agree on why the change matters and what makes it necessary.
Context: Traditional in the cloud → Cloud native technologies
The client’s estate was fragmented: largely traditional architecture running in the cloud, with tightly coupled services, batch ETL and long release cycles. Stable, but slow to change. The priorities of the enterprise data warehouse team were not aligned with those of the reporting team. Manual file movement was widespread: inbound data and data between systems travelled as CSV extracts over SFTP, email and SharePoint, and teams often loaded data directly into BI tools to move fast. Those shortcuts bypassed governed pipelines and ignored guardrails, creating audit gaps and fragile dependencies.
Our goal was not live or streaming data. It was disciplined, governed batch: scheduled, reproducible and auditable, fit for regulatory scrutiny. The engagement's recommendations were to bring all critical manual feeds into a modern data lakehouse through governed interfaces and orchestrated batch pipelines, and to serve analytics from certified, curated datasets rather than direct loads into BI tools. Design was anchored to regulatory non-functional requirements, including retention and disposition, security classification, RPO/RTO for batch loads, and reproducibility for audit purposes.
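To make "reproducible and auditable" concrete, here is a minimal sketch of the kind of batch load we mean: each run is keyed by a logical date, writes into its own partition, and records file checksums in an audit manifest so any historical load can be re-verified. The paths, the feed name and the manifest layout are illustrative assumptions, not the client's implementation.

```python
import hashlib
import json
from datetime import date
from pathlib import Path

RAW_ZONE = Path("lakehouse/raw/returns")        # hypothetical raw landing zone
MANIFESTS = Path("lakehouse/_audit/manifests")  # hypothetical audit area


def sha256_of(path: Path) -> str:
    """Checksum each inbound file so a run can be verified later."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_batch(logical_date: date, inbound_dir: Path) -> Path:
    """Load one partition keyed by the logical date.

    Re-running the same logical date rewrites the same partition, so the
    job is idempotent and any historical run can be reproduced for audit
    from the manifest it wrote.
    """
    partition = RAW_ZONE / f"load_date={logical_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)

    manifest = {"logical_date": logical_date.isoformat(), "files": []}
    for src in sorted(inbound_dir.glob("*.csv")):
        (partition / src.name).write_bytes(src.read_bytes())
        manifest["files"].append({"name": src.name, "sha256": sha256_of(src)})

    MANIFESTS.mkdir(parents=True, exist_ok=True)
    manifest_path = MANIFESTS / f"{logical_date.isoformat()}.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path


if __name__ == "__main__":
    run_batch(date(2024, 3, 31), Path("inbound/regulatory_returns"))
```

In practice this logic would sit inside an orchestrator on a fixed schedule with retries and alerting; the point of the sketch is that reproducibility comes from keying every run to a logical date and keeping an immutable record of what was loaded.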
What cloud native brings to regulators
- Elasticity & cost control: Scale scheduled workloads for month/quarter peaks and scale down afterwards.
- Governed ingestion: Replace ad-hoc file drops with contracted interfaces, validations, quarantines and orchestrated batch pipelines into the lakehouse.
- Unified lakehouse foundation: open formats and ACID/table governance to manage raw, curated and consumption layers from all platforms with auditability.
- Data products (certified datasets): curated, versioned, documented; consumed by BI, not loaded into BI.
- Compliance by design: policy as code, end-to-end lineage, and access controls embedded in batch pipelines (a minimal sketch follows this list).
- Safer change: automated checks and reproducible runs for audits.
- Interoperability: open formats and well-documented interfaces for industry collaboration.
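The "compliance by design" point deserves a concrete illustration. Below is a minimal policy-as-code sketch: the policy itself is plain data that can be version-controlled and peer-reviewed, and the pipeline evaluates datasets against it before certifying or serving them. The classifications, roles and required metadata fields are assumptions for the example, not the regulator's actual policy.

```python
from dataclasses import dataclass

# Illustrative policy, expressed as data so it can be version-controlled and
# reviewed like any other code change (classifications and rules are assumed).
POLICY = {
    "certified_dataset_requires": ["owner", "retention_period", "schema_version"],
    "classification_access": {
        "public": {"analyst", "engineer", "external"},
        "official": {"analyst", "engineer"},
        "official-sensitive": {"engineer"},
    },
}


@dataclass
class Dataset:
    name: str
    classification: str
    metadata: dict


def violations(dataset: Dataset, reader_role: str) -> list[str]:
    """Evaluate one dataset against the policy before publishing or reading it."""
    problems = []
    for field in POLICY["certified_dataset_requires"]:
        if field not in dataset.metadata:
            problems.append(f"{dataset.name}: missing required metadata '{field}'")
    allowed = POLICY["classification_access"].get(dataset.classification, set())
    if reader_role not in allowed:
        problems.append(
            f"{dataset.name}: role '{reader_role}' may not read "
            f"'{dataset.classification}' data"
        )
    return problems


if __name__ == "__main__":
    ds = Dataset("housing_returns", "official-sensitive", {"owner": "data-office"})
    for problem in violations(ds, reader_role="analyst"):
        print(problem)
```

Because the policy is code, a change to access rules goes through the same review, testing and audit trail as a change to a pipeline, which is exactly what regulatory scrutiny expects.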
Challenges
- Fragmented architecture: siloed domains and unclear flows which were difficult to evolve and integrate.
- Legacy + manual movement: CSV/SFTP/email drops and direct-to-BI loads, embedded in organisational culture, bypassing lineage and controls.
- Weak strategic alignment: delivery without a shared, evolving roadmap.
Approach
Understand the Now
We mapped capabilities and pain points, explicitly tagging manual handoffs and direct BI loads as high-risk shadow pipelines with no lineage or SLOs.
Capability & Maturity Assessment
We evaluated the stack against data platform capabilities with a focus on data ingestion, storage, processing, access, integration, governance and quality.
Target Operating Model
A modern data lakehouse with governed, orchestrated batch ingestion replacing manual transfers. Data lands in Raw, is validated and transformed into Curated layers, and is exposed as Certified datasets with named owners and SLOs for batch freshness. BI tools consume from the certified layer only, with no direct loads, delivered through a phased roadmap. Inbound files follow a simple contract (name, checksum, schema version); non-conforming files are now quarantined with automated feedback to publishers, as sketched below.
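As a sketch of what that contract and quarantine step can look like in practice: the naming pattern, the .sha256 sidecar file and the folder layout below are assumptions for illustration; the real contract covered name, checksum and schema version.

```python
import hashlib
import re
import shutil
from pathlib import Path

# Illustrative contract: <feed>__<YYYYMMDD>__v<schema>.csv plus a sidecar
# <same name>.sha256 holding the expected checksum.
NAME_PATTERN = re.compile(r"^(?P<feed>[a-z_]+)__(?P<date>\d{8})__v(?P<schema>\d+)\.csv$")
SUPPORTED_SCHEMAS = {"housing_returns": {1, 2}}  # assumed feed and versions

LANDING = Path("landing")
QUARANTINE = Path("quarantine")


def check_contract(path: Path) -> list[str]:
    """Return the list of contract violations for one inbound file."""
    problems = []
    match = NAME_PATTERN.match(path.name)
    if not match:
        return [f"{path.name}: file name does not match the agreed pattern"]

    feed, schema = match["feed"], int(match["schema"])
    if schema not in SUPPORTED_SCHEMAS.get(feed, set()):
        problems.append(f"{path.name}: unknown feed or unsupported schema v{schema}")

    sidecar = path.parent / (path.name + ".sha256")
    if not sidecar.exists():
        problems.append(f"{path.name}: missing checksum sidecar")
    elif hashlib.sha256(path.read_bytes()).hexdigest() != sidecar.read_text().strip():
        problems.append(f"{path.name}: checksum mismatch")
    return problems


def ingest_or_quarantine(path: Path) -> None:
    """Quarantine non-conforming files and leave feedback for the publisher."""
    problems = check_contract(path)
    if problems:
        QUARANTINE.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), QUARANTINE / path.name)
        (QUARANTINE / f"{path.name}.feedback.txt").write_text("\n".join(problems))
    # else: hand the file to the orchestrated batch pipeline (not shown)


if __name__ == "__main__":
    for inbound in LANDING.glob("*.csv"):
        ingest_or_quarantine(inbound)
```

The feedback file closes the loop with publishers automatically, replacing the email back-and-forth that manual CSV drops usually generate.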
Options & Costing
We presented options and trade-offs for scheduled ingestion/orchestration, catalog/lineage, policy enforcement, and cost governance so leadership could decide with confidence.
Impact
- Manual feeds unified into a modern data lakehouse, eliminating ad-hoc file drops and direct BI loads.
- A traceable link from business goals to design choices, anchored in regulatory outcomes.
- Governance enablement by design: auditable lineage, least-privileged access, reproducible runs across raw, curated and consumption layers.
- Risk reduction via legacy remediation and retirement of manual, uncontrolled feeds.
- A culture shift to architecture-led, batch-native cloud operations, with less duplication and faster, safer change.
Lessons Learned
- Start with Why. Tools without purpose add complexity.
- Evolve deliberately. Move from traditional-in-the-cloud to cloud native on a lakehouse foundation, aligned with the organisation's existing tech stack.
- Governance enables value. Policy-as-code, lineage, and access controls accelerate safe delivery.
- Show, don't tell. Clear definitions and working examples transform abstract issues into priorities.
- Architecture through conversation. Engage early and often to align expectations.
Data architecture provides a managed decision framework. When you start with Why and evolve deliberately towards a new organisational approach, in this case a governed, cloud native approach built on a modern data lakehouse, the transformation becomes strategic, auditable, and sustainable.
If you’re rethinking your data estate and want architecture grounded in business outcomes, discover how we partner with organisations to get there. Get in touch.
Author
Glenn is the Why-First Data Architect. You’ll find him where data complexity meets business urgency — building clarity as a Senior/Enterprise Data Architect, Solutions Architect, or Business Intelligence Lead. He is now focused on sectors where data shapes ecosystems and resilience: Aviation and Airports | Maritime and Ports | Rail and Transport Networks | Public Sector and Government (Housing, Compliance, Resilience). These are the industries where his strengths in data architecture, governance, and interoperability create the greatest impact — ensuring data flows across hubs, networks, and platforms to support both business outcomes and public trust.


