Lecture 4: System Integration Design Principles

Anyone can wire two systems together. The discipline of integration architecture is about wiring them together in ways that remain maintainable, reliable, and evolvable over years. The principles in this lecture are the difference between integrations that work in production and integrations that become maintenance nightmares. These principles apply regardless of which technology you use.

Principle 1: Loose Coupling

Coupling measures how much one system depends on the internal details of another. Loose coupling means systems depend only on each other's agreed interfaces — not on implementation details, deployment locations, or internal data structures.

Why Loose Coupling Matters

Tightly coupled systems:

Break when the other system changes internally
Cannot be deployed independently
Cannot be replaced without rewriting the integration

Loosely coupled systems:

Are resilient to internal changes in either system
Can be deployed, scaled, and maintained independently
Can be replaced or upgraded without affecting the other side

Types of Coupling to Avoid

Temporal coupling — System A only works when System B is running at the same time. Solved by: asynchronous messaging.

Location coupling — System A hardcodes System B's URL or IP address. Solved by: service discovery or API gateway routing.

Schema coupling — System A sends raw database rows to System B. Solved by: canonical data models, API-specific response schemas.

Technology coupling — System A requires System B to use the same programming language, framework, or database. Solved by: protocol-level integration (HTTP, AMQP) rather than library-level integration.

Achieving Loose Coupling

Define integration contracts (API schemas, message schemas) as the only coupling point
Use API gateways or message brokers as intermediaries rather than direct system-to-system connections
Design canonical data models that are independent of any system's internal representation

Principle 2: Contract-First Design

Design the integration interface (the contract) before writing any implementation code. The contract is the agreement between producer and consumer.

What a contract specifies:

The operations available (endpoints, message types)
The structure of requests and responses (schemas)
The HTTP methods, status codes, or message formats
Authentication requirements
SLAs (latency, availability)

Why contract-first:

Both sides can develop against the contract simultaneously
The contract can be used to generate test mocks for consumer-side testing before the server is built
Breaking changes are visible — a contract change must be explicitly approved
Documentation is derived from the contract, not written separately

Tools: OpenAPI (REST), AsyncAPI (events and messaging), Protobuf (gRPC), JSON Schema, XML Schema (SOAP/WSDL).
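As a rough illustration of why one contract can serve both sides, the sketch below validates messages against a contract and generates a mock from the same definition. It assumes a hand-rolled contract format (field name mapped to a Python type); `ORDER_CREATED_CONTRACT`, `validate`, and `make_mock` are hypothetical names, and a real project would more likely rely on OpenAPI or JSON Schema tooling.

```python
from typing import Any

# Hypothetical contract: required field name -> expected Python type.
ORDER_CREATED_CONTRACT: dict[str, type] = {
    "order_id": str,
    "customer_id": str,
    "total_cents": int,
}

def validate(message: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return a list of contract violations (empty means the message conforms)."""
    errors = []
    for field, expected in contract.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors

def make_mock(contract: dict[str, type]) -> dict[str, Any]:
    """Build a placeholder message from the contract so consumers can test early."""
    defaults = {str: "example", int: 0, float: 0.0, bool: False}
    return {field: defaults[expected] for field, expected in contract.items()}

# The consumer team can test against the mock before the producer exists.
mock = make_mock(ORDER_CREATED_CONTRACT)
mock_errors = validate(mock, ORDER_CREATED_CONTRACT)
bad_errors = validate({"order_id": 42}, ORDER_CREATED_CONTRACT)
```

Because the mock and the validator are both derived from the same definition, a change to the contract shows up on both sides at once.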

Principle 3: Idempotency

An operation is idempotent if performing it multiple times produces the same result as performing it once.

Why It Matters

Networks are unreliable. A request may reach the server and be processed while the response is lost, so the client cannot tell whether the request succeeded. If the client retries:

A non-idempotent operation creates duplicates (two orders, two payments, two emails)
An idempotent operation is safe to retry — the result is the same regardless

Natural Idempotency

Some operations are naturally idempotent:

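GET: reading data has no side effects, so repeating the read changes nothing on the server (the data returned may differ if another client modified it in between)
PUT: replacing a resource with the same data twice leaves it in the same state
DELETE: deleting a resource twice leaves it deleted (the second call changes nothing, although it may return 404)

POST (create) is not naturally idempotent: calling it twice creates two records.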

Making Non-Idempotent Operations Idempotent

Use an idempotency key — a unique identifier generated by the client and sent with the request. The server stores the key and the result of the first successful execution. On re-delivery, it returns the cached result without re-executing.

Example request (HTTP):

POST /payments HTTP/1.1
Idempotency-Key: client-generated-uuid-abc123

{ "amount": 150.00, "currency": "EUR" }

Apply idempotency to all state-changing operations in integration flows. Design message consumers to be idempotent — in messaging systems, at-least-once delivery means duplicates are possible.
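The server side of the idempotency-key mechanism might look roughly like the sketch below. It is an in-memory illustration only: `IdempotentHandler` and `create_payment` are hypothetical names, and a production version would need a durable store, an atomic check-and-set, and an expiry policy for old keys.

```python
import uuid
from typing import Callable

class IdempotentHandler:
    """Runs an operation at most once per idempotency key (in-memory sketch)."""

    def __init__(self, operation: Callable[[dict], dict]) -> None:
        self._operation = operation
        self._results: dict[str, dict] = {}  # key -> result of first successful run

    def handle(self, idempotency_key: str, payload: dict) -> dict:
        if idempotency_key in self._results:
            # Retry of a request already processed: replay the stored result.
            return self._results[idempotency_key]
        result = self._operation(payload)  # if this raises, nothing is stored
        self._results[idempotency_key] = result
        return result

payments: list[dict] = []

def create_payment(payload: dict) -> dict:
    """Hypothetical non-idempotent operation: every call records a payment."""
    payments.append(payload)
    return {"payment_id": len(payments), "status": "created"}

handler = IdempotentHandler(create_payment)
key = str(uuid.uuid4())  # generated once by the client, reused on every retry
first = handler.handle(key, {"amount": 150.00, "currency": "EUR"})
retry = handler.handle(key, {"amount": 150.00, "currency": "EUR"})
```

The retry returns the same response as the first call, and only one payment is recorded.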

Principle 4: Design for Failure

Assume everything will fail at some point. Network calls time out. Services crash. Databases become unavailable. Data arrives malformed. Your integration must handle all of these without losing data or leaving systems in an inconsistent state.

Failure Categories

| Category  | Examples |
|-----------|----------|
| Transient | Network timeout, brief service downtime |
| Permanent | Service shut down, endpoint removed |
| Data      | Missing required field, invalid format, business rule violation |
| Systemic  | Target system overloaded, schema mismatch |

Retry Policies

For transient failures, retry — but with exponential backoff and jitter:

Attempt 1 → fail → wait 1 second + random jitter
Attempt 2 → fail → wait 2 seconds + jitter
Attempt 3 → fail → wait 4 seconds + jitter
Attempt 4 → fail → wait 8 seconds + jitter
Attempt 5 → fail → route to Dead Letter Queue

Exponential backoff: increases wait time between retries so a recovering service is not immediately overwhelmed
Jitter: random offset prevents all retrying clients from hitting the server at the same instant
Maximum retry count: after N attempts, stop retrying and escalate

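Never retry permanent failures (for example 404 Not Found or 400 Bad Request): these will not resolve by retrying. Among 4xx responses, 408 Request Timeout and 429 Too Many Requests are the usual exceptions and can be retried after a delay.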
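The retry schedule above can be sketched as follows. `TransientError`, `backoff_delay`, and `call_with_retry` are hypothetical names, and the jitter range of 0 to 0.5 seconds is an assumption chosen for illustration. Any exception other than `TransientError` propagates immediately, which is how permanent failures skip the retry loop.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""

def backoff_delay(attempt: int, base: float = 1.0, max_jitter: float = 0.5) -> float:
    """Delay after the given failed attempt (1-based): base * 2^(attempt-1) + jitter."""
    return base * 2 ** (attempt - 1) + random.uniform(0, max_jitter)

def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: the caller escalates, e.g. to a DLQ
            sleep(backoff_delay(attempt, base))
    raise ValueError("max_attempts must be at least 1")

# Usage: an operation that fails twice, then succeeds.
# Delays are recorded instead of slept so the example runs instantly.
attempts = {"count": 0}

def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("timeout")
    return "ok"

delays: list[float] = []
result = call_with_retry(flaky, sleep=delays.append)
```

Passing the `sleep` function as a parameter keeps the retry logic testable without real waiting.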

Circuit Breaker

A circuit breaker prevents cascading failures by stopping calls to a failing dependency:

CLOSED (normal): calls pass through
  → if failure rate exceeds threshold:
    OPEN (failing): calls fail immediately, no contact with failing service
      → after timeout:
        HALF-OPEN: one test call allowed
          → success: return to CLOSED
          → failure: return to OPEN

This gives the failing service time to recover while protecting the calling system from being blocked.
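A minimal, single-threaded sketch of this state machine is shown below. It simplifies the trip condition to a count of consecutive failures rather than a failure rate, and the names `CircuitBreaker` and `CircuitOpenError` plus the default thresholds are assumptions for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the dependency."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0                      # consecutive failures seen
        self._opened_at: Optional[float] = None
        self.state = "CLOSED"

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency unavailable, failing fast")
            self.state = "HALF_OPEN"  # timeout elapsed: allow one trial call
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
                self.state = "OPEN"
                self._opened_at = self._clock()
            raise
        # Success: reset the failure count and close the circuit.
        self._failures = 0
        self.state = "CLOSED"
        return result
```

A concurrent implementation would additionally need locking so that only one caller performs the trial call in the half-open state.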

Dead Letter Queue (DLQ)

Messages that cannot be processed after exhausting retries must go somewhere. The DLQ is that place — a holding area for messages requiring human investigation.

DLQ rules:

Never silently discard messages
Alert when any message enters the DLQ
Preserve the original message, error reason, and timestamp
Build a mechanism to resubmit after fixing the root cause
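The four rules can be sketched with an in-memory queue, as below. `DeadLetter`, `DeadLetterQueue`, and `consume` are hypothetical names; the default `alert` hook simply prints and stands in for a real alerting integration, and the retry loop omits backoff to stay short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

@dataclass
class DeadLetter:
    message: dict        # original message, unmodified
    error: str           # why processing failed
    failed_at: datetime  # when it was dead-lettered

@dataclass
class DeadLetterQueue:
    entries: list[DeadLetter] = field(default_factory=list)
    alert: Callable[[DeadLetter], None] = print  # stand-in for a real alerting hook

    def put(self, message: dict, error: Exception) -> None:
        entry = DeadLetter(message, repr(error), datetime.now(timezone.utc))
        self.entries.append(entry)
        self.alert(entry)  # alert on every arrival

    def resubmit_all(self, handler: Callable[[dict], None]) -> int:
        """Replay entries after the root cause is fixed; failures stay in the DLQ."""
        remaining, replayed = [], 0
        for entry in self.entries:
            try:
                handler(entry.message)
                replayed += 1
            except Exception:
                remaining.append(entry)
        self.entries = remaining
        return replayed

def consume(message: dict, handler: Callable[[dict], None], dlq: DeadLetterQueue,
            max_attempts: int = 3) -> bool:
    """Try the handler up to max_attempts times, then dead-letter the message."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception as exc:
            if attempt == max_attempts:
                dlq.put(message, exc)  # never silently discard
    return False
```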

Timeout Configuration

Always configure timeouts. Without them, connections can hang indefinitely:

Connection timeout: time to establish the TCP connection (2–5 seconds)
Read timeout: time to receive the complete response (10–60 seconds depending on operation)
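As a low-level illustration, the sketch below sets both timeouts on a plain TCP socket using Python's standard library. `fetch_with_timeouts` is a hypothetical helper and the default values are assumptions within the ranges above. Note that a socket timeout bounds each blocking read rather than the whole exchange, so a total deadline for a multi-read response would have to be tracked separately.

```python
import socket

def fetch_with_timeouts(host: str, port: int, request: bytes,
                        connect_timeout: float = 3.0,
                        read_timeout: float = 30.0) -> bytes:
    """Send a request over TCP with separate connect and read timeouts."""
    # Connection timeout: bounds the time spent establishing the TCP connection.
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        # Read timeout: bounds each blocking send/recv on the connected socket.
        sock.settimeout(read_timeout)
        sock.sendall(request)
        # Raises socket.timeout if no data arrives within read_timeout.
        return sock.recv(4096)
```

Most HTTP client libraries expose the same two settings under similar names; the socket version is used here only because it needs no third-party dependency.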

Principle 5: Reliability and Delivery Guarantees

Choose your delivery guarantee based on what data loss means for your use case:

At-most-once: the message is sent once and never retried, so it arrives zero or one times. Message loss is possible. Use only for non-critical, high-volume data (telemetry, metrics).

At-least-once: message retried until acknowledged. No loss, but duplicates possible. Consumer must be idempotent. Default choice for business integrations.

Exactly-once: delivered precisely once, no duplicates. Requires coordination between broker and consumer, which is expensive. Use only when the cost of deduplication on the consumer side is higher than the infrastructure cost.
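To show why at-least-once delivery produces duplicates, the toy simulation below loses one acknowledgement and lets the broker redeliver. `AtLeastOnceQueue` and `IdempotentConsumer` are hypothetical names, and the broker is reduced to an in-memory queue that removes a message only after an acknowledgement.

```python
from collections import deque
from typing import Callable

class AtLeastOnceQueue:
    """Toy broker: a message stays queued until the consumer acknowledges it."""

    def __init__(self) -> None:
        self._pending = deque()

    def publish(self, message: dict) -> None:
        self._pending.append(message)

    def deliver(self, consumer: Callable[[dict], bool]) -> None:
        """Deliver the oldest message; remove it only if the consumer returns True (ack)."""
        if self._pending and consumer(self._pending[0]):
            self._pending.popleft()

    def size(self) -> int:
        return len(self._pending)

class IdempotentConsumer:
    """Applies each message ID once, even when the broker redelivers it."""

    def __init__(self, lose_first_ack: bool = False) -> None:
        self.processed_ids: set = set()
        self.balance = 0
        self._lose_next_ack = lose_first_ack

    def __call__(self, message: dict) -> bool:
        if message["id"] not in self.processed_ids:  # deduplicate by message ID
            self.processed_ids.add(message["id"])
            self.balance += message["amount"]
        if self._lose_next_ack:
            self._lose_next_ack = False
            return False  # the work was done, but the broker never sees the ack
        return True

queue = AtLeastOnceQueue()
consumer = IdempotentConsumer(lose_first_ack=True)
queue.publish({"id": "m-1", "amount": 100})
queue.deliver(consumer)  # processed, acknowledgement lost
queue.deliver(consumer)  # redelivered duplicate: ignored, then acknowledged
```

The message is delivered twice but applied once, which is the practical effect most business integrations need.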

Principle 6: Separation of Concerns

Each component in an integration should do one thing well. Do not mix:

Routing logic with transformation logic
Business rules with protocol handling
Error handling with main processing flow

Why: mixed concerns make code harder to test, harder to change, and harder to debug. When something goes wrong, you want to isolate the problem quickly.

Practical application:

Separate validators (check input correctness) from transformers (convert format)
Separate routing decisions from message processing
Keep integration flow logic out of business domain services
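One way to picture this separation is a flow composed of single-purpose functions, as in the sketch below. The order fields, the destination names, and the high-value threshold are all hypothetical.

```python
from typing import Callable

def validate(order: dict) -> dict:
    """Validator: checks input correctness only."""
    if "id" not in order or order.get("total", -1) < 0:
        raise ValueError(f"invalid order: {order}")
    return order

def transform(order: dict) -> dict:
    """Transformer: converts format only (here, to string IDs and integer cents)."""
    return {"order_id": str(order["id"]), "total_cents": round(order["total"] * 100)}

def route(message: dict) -> str:
    """Router: picks a destination only."""
    return "high-value-orders" if message["total_cents"] >= 100_000 else "orders"

def process(order: dict, send: Callable[[str, dict], None]) -> None:
    """Flow: composes the steps; each one can be tested and replaced in isolation."""
    message = transform(validate(order))
    send(route(message), message)
```

When a message ends up in the wrong destination, only `route` needs inspecting; when a field is wrong, only `transform`.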

Principle 7: Observability by Design

You cannot manage what you cannot see. Design observability into integrations from the start — do not add it as an afterthought.

The three pillars of observability:

Logs: structured records of what happened. Every integration transaction should produce a log entry with:

Correlation ID (links the full transaction across systems)
Source and target system
Message type and ID
Timestamp of each step
Success or failure status
Error detail if failed
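A structured log entry carrying these fields could be produced roughly as follows. `log_step` and the JSON key names are assumptions, and the system names in the usage lines are made up for illustration.

```python
import json
import logging
import uuid
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger("integration")

def log_step(correlation_id: str, source: str, target: str, message_type: str,
             message_id: str, status: str, error: Optional[str] = None) -> str:
    """Emit one JSON log line for an integration step and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id,
        "source": source,
        "target": target,
        "message_type": message_type,
        "message_id": message_id,
        "status": status,
        "error": error,
    }
    line = json.dumps(entry)
    logger.info(line)
    return line

# The correlation ID is created once and travels with the message across systems.
correlation_id = str(uuid.uuid4())
step1 = log_step(correlation_id, "webshop", "erp", "OrderCreated", "m-1", "success")
step2 = log_step(correlation_id, "erp", "warehouse", "OrderCreated", "m-1",
                 "failure", error="timeout after 30s")
```

Searching the logs for one correlation ID then returns every step of the transaction, regardless of which system wrote it.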

Metrics: numerical measurements over time. Essential metrics:

Throughput (messages/second)
Processing latency (p50, p95, p99)
Error rate (percentage of failed messages)
Queue depth (how many messages are waiting)
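Latency percentiles and error rate can be computed from raw samples with the standard library, as in the sketch below. `latency_percentiles` and `error_rate` are hypothetical helpers; a real system would normally export these through a metrics library rather than compute them by hand.

```python
import statistics

def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """Return p50, p95 and p99 of the recorded latencies (needs at least two samples)."""
    # n=100 yields 99 cut points; cut point i (1-based) is the i-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

def error_rate(failed: int, total: int) -> float:
    """Percentage of failed messages; 0.0 when nothing was processed."""
    return 100.0 * failed / total if total else 0.0
```

Percentiles are preferred over averages because a small number of very slow messages can hide behind a healthy-looking mean.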

Traces: the end-to-end journey of a single transaction across all systems and services. Distributed tracing tools such as OpenTelemetry reconstruct this journey from the spans recorded by each service.

Principle 8: Canonical Data Model

When many systems need to exchange data, define a Canonical Data Model (CDM) — a neutral, shared representation of each key data entity.

Instead of building N×(N-1) pairwise transformations between N systems:

System A → System B format
System A → System C format
System B → System A format
System B → System C format
... (grows quadratically)

Build N adapters, one per system, each translating to and from the CDM (2N one-way mappings, which grows linearly):

System A ↔ CDM
System B ↔ CDM
System C ↔ CDM

The CDM also acts as the authoritative definition of shared business concepts — what fields an "order" has, what a "customer" looks like, what date formats are used.
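The sketch below shows one canonical entity with an adapter on each side, plus the mapping arithmetic. The CRM and billing field names are hypothetical, as are all function names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CanonicalCustomer:
    """Neutral representation, independent of any one system's schema."""
    customer_id: str
    full_name: str
    email: str

# Adapters: each system only knows its own format and the canonical one.
def crm_to_canonical(record: dict) -> CanonicalCustomer:
    return CanonicalCustomer(
        customer_id=str(record["CustNo"]),
        full_name=f'{record["FirstName"]} {record["LastName"]}',
        email=record["Email"].lower(),
    )

def canonical_to_billing(customer: CanonicalCustomer) -> dict:
    return {"account": customer.customer_id, "name": customer.full_name,
            "contact_email": customer.email}

def pairwise_mappings(n: int) -> int:
    """One-way mappings needed when every system maps directly to every other."""
    return n * (n - 1)

def cdm_mappings(n: int) -> int:
    """One-way mappings needed via the CDM: one to and one from it per system."""
    return 2 * n

crm_record = {"CustNo": 1042, "FirstName": "Ada", "LastName": "Lovelace",
              "Email": "Ada@Example.com"}
billing_record = canonical_to_billing(crm_to_canonical(crm_record))
```

With three systems the two approaches need the same number of one-way mappings (six); the CDM pays off from the fourth system onward, and with ten systems it is 20 mappings instead of 90.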

Practical tip: start small. Define a CDM for the most-exchanged entities first (customer, order, product). Expand as new integrations are added.

Principle 9: Security by Design

Security controls must be designed in — not bolted on at the end:

Authenticate every caller — no integration should accept data without verifying identity
Authorise every operation — authentication confirms who you are; authorisation confirms what you are allowed to do
Encrypt in transit — all integration traffic must use TLS
Encrypt sensitive data at rest — messages stored in queues containing PII must be encrypted
Principle of least privilege — each integration service account has only the permissions it needs
Secrets management — credentials and keys are stored in a secrets vault, not in configuration files or source code
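The difference between authentication and authorisation, and the effect of least privilege, can be sketched as below. The `ACCOUNTS` registry, the key value, and the operation names are hypothetical; the key appears inline only to keep the example self-contained and would come from a secrets vault in practice.

```python
import hmac

# Hypothetical registry: service account -> (API key, permitted operations).
# In practice the keys are loaded from a secrets vault, never from source code.
ACCOUNTS = {
    "billing-sync": ("key-loaded-from-vault", {"orders:read"}),
}

def authenticate(account: str, presented_key: str) -> bool:
    """Who is calling? Constant-time comparison avoids leaking the key via timing."""
    entry = ACCOUNTS.get(account)
    return entry is not None and hmac.compare_digest(entry[0], presented_key)

def authorise(account: str, operation: str) -> bool:
    """What may they do? Least privilege: only explicitly granted operations."""
    return operation in ACCOUNTS[account][1]

def handle(account: str, presented_key: str, operation: str) -> int:
    """Return an HTTP-style status code for the request."""
    if not authenticate(account, presented_key):
        return 401  # identity not verified
    if not authorise(account, operation):
        return 403  # identity verified, operation not permitted
    return 200
```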

Principle 10: Evolvability

Integrations must be designed to change over time:

Version your interfaces. Changes are inevitable. Version numbers let you evolve without breaking existing consumers.

Tolerate unknown fields. Consumers should ignore fields they do not recognise. This allows producers to add new fields without breaking existing consumers (Postel's Law: be conservative in what you send, liberal in what you accept).

Backward compatibility first. Prefer backward-compatible changes (adding optional fields) over breaking changes (removing fields, changing types).

Document change history. When an interface changes, document what changed, why, and when. This makes debugging future issues much faster.
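Tolerating unknown fields and adding optional fields with defaults can be sketched as a "tolerant reader". `OrderV1` and `parse_order` are hypothetical names, and the fields are chosen for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class OrderV1:
    order_id: str
    total_cents: int
    currency: str = "EUR"  # optional field added later; older producers omit it

def parse_order(payload: dict) -> OrderV1:
    """Tolerant reader: keep the fields this consumer knows, ignore the rest."""
    known = {f.name for f in fields(OrderV1)}
    return OrderV1(**{k: v for k, v in payload.items() if k in known})

# A newer producer sends an extra field; an older one omits the optional field.
from_new_producer = parse_order({"order_id": "o-1", "total_cents": 500,
                                 "currency": "USD", "gift_wrap": True})
from_old_producer = parse_order({"order_id": "o-2", "total_cents": 900})
```

Both payloads parse without a consumer change, which is what makes adding an optional field a backward-compatible change.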

Putting Principles Together: Integration Design Checklist

Use this checklist when reviewing an integration design before build:

Coupling and contract:

[ ] Is there a formal interface contract (OpenAPI, AsyncAPI, schema)?
[ ] Does the integration depend only on the contract, not on implementation details?

Reliability:

[ ] Is a delivery guarantee defined (at-most-once, at-least-once, exactly-once)?
[ ] Is the consumer idempotent where at-least-once delivery is used?
[ ] Are retry policies defined with exponential backoff and a maximum?
[ ] Is a Dead Letter Queue configured?
[ ] Are timeouts configured?

Failure handling:

[ ] Are all failure categories identified (transient, permanent, data, systemic)?
[ ] Is the error handling path as explicitly designed as the happy path?
[ ] Is there alerting for DLQ messages and error rate spikes?

Observability:

[ ] Does every transaction produce a structured log with correlation ID?
[ ] Are throughput, latency, and error rate metrics exported?
[ ] Are alerts defined for availability and performance thresholds?

Security:

[ ] Is authentication defined?
[ ] Is TLS required for all connections?
[ ] Is the service account scoped to minimum required permissions?

Evolvability:

[ ] Is the interface versioned?
[ ] Is there a deprecation policy?

Lecture 4 Summary

Loose coupling is achieved by depending only on contracts, not implementations. Use API gateways and message brokers as intermediaries.
Design contracts first. Contracts enable parallel development, test mocks, and explicit change management.
Idempotency makes retries safe. Use idempotency keys for POST operations; design message consumers to be idempotent.
Design for failure: define retry policies with backoff, implement circuit breakers, configure DLQs, and always set timeouts.
Choose a delivery guarantee that matches your data criticality. At-least-once is the right default for business integrations.
Build observability in from the start: structured logs with correlation IDs, metrics for throughput/latency/error rate, and distributed traces.

Next: Lecture 5 — System Integration Monitoring