Lecture 4: System Integration Design Principles
Apply the core design principles that distinguish robust, maintainable integrations from brittle ones: loose coupling, idempotency, reliability patterns, scalability, contract-first design, and designing for failure.
Anyone can wire two systems together. The discipline of integration architecture is about wiring them together in ways that remain maintainable, reliable, and evolvable over years. The principles in this lecture are the difference between integrations that work in production and integrations that become maintenance nightmares. These principles apply regardless of which technology you use.
Principle 1: Loose Coupling
Coupling measures how much one system depends on the internal details of another. Loose coupling means systems depend only on each other's agreed interfaces — not on implementation details, deployment locations, or internal data structures.
Why Loose Coupling Matters
Tightly coupled systems:
- Break when the other system changes internally
- Cannot be deployed independently
- Cannot be replaced without rewriting the integration
Loosely coupled systems:
- Are resilient to internal changes in either system
- Can be deployed, scaled, and maintained independently
- Can be replaced or upgraded without affecting the other side
Types of Coupling to Avoid
Temporal coupling — System A only works when System B is running at the same time. Solved by: asynchronous messaging.
Location coupling — System A hardcodes System B's URL or IP address. Solved by: service discovery or API gateway routing.
Schema coupling — System A sends raw database rows to System B. Solved by: canonical data models, API-specific response schemas.
Technology coupling — System A requires System B to use the same programming language, framework, or database. Solved by: protocol-level integration (HTTP, AMQP) rather than library-level integration.
Achieving Loose Coupling
- Define integration contracts (API schemas, message schemas) as the only coupling point
- Use API gateways or message brokers as intermediaries rather than direct system-to-system connections
- Design canonical data models that are independent of any system's internal representation
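As a rough illustration of avoiding schema coupling, the sketch below maps an internal database row to a contract-level response instead of exposing the row itself. The column names, the `CustomerResponse` type, and the `to_contract` helper are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, asdict

# Hypothetical internal row as System A might store it. Column names are
# illustrative and could change whenever the database schema changes.
internal_row = {
    "cust_id": 42,
    "fname": "Ada",
    "lname": "Lovelace",
    "pwd_hash": "not-a-real-hash",
    "crt_ts": "2024-01-05T10:00:00Z",
}

@dataclass(frozen=True)
class CustomerResponse:
    """Contract-level view of a customer: the only shape consumers see."""
    id: int
    full_name: str
    created_at: str

def to_contract(row: dict) -> CustomerResponse:
    # Map internal columns to contract fields. Internal-only columns
    # (such as pwd_hash) never cross the integration boundary.
    return CustomerResponse(
        id=row["cust_id"],
        full_name=f'{row["fname"]} {row["lname"]}',
        created_at=row["crt_ts"],
    )

payload = asdict(to_contract(internal_row))
```

With this mapping in place, System A can rename or drop internal columns and only `to_contract` has to change; consumers keep seeing the same three fields.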
Principle 2: Contract-First Design
Design the integration interface (the contract) before writing any implementation code. The contract is the agreement between producer and consumer.
What a contract specifies:
- The operations available (endpoints, message types)
- The structure of requests and responses (schemas)
- The HTTP methods, status codes, or message formats
- Authentication requirements
- SLAs (latency, availability)
Why contract-first:
- Both sides can develop against the contract simultaneously
- The contract can be used to generate test mocks for consumer-side testing before the server is built
- Breaking changes are visible — a contract change must be explicitly approved
- Documentation is derived from the contract, not written separately
Tools: OpenAPI (REST), AsyncAPI (events and messaging), Protobuf (gRPC), JSON Schema, XML Schema (SOAP/WSDL).
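To make the idea concrete, here is a minimal sketch of a contract that is written first and then used both to check messages and to generate a mock for consumer-side tests. The dictionary format is a simplified stand-in for a real schema language such as JSON Schema, and `ORDER_CONTRACT`, `violations`, and `mock_from_contract` are hypothetical names.

```python
# Hypothetical contract for an "order created" message, agreed before any
# producer or consumer code exists.
ORDER_CONTRACT = {
    "order_id": str,
    "quantity": int,
    "currency": str,
}

def violations(message: dict, contract: dict) -> list[str]:
    """Return contract violations; an empty list means the message conforms."""
    problems = []
    for field, expected_type in contract.items():
        if field not in message:
            problems.append(f"missing field: {field}")
        elif not isinstance(message[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems

def mock_from_contract(contract: dict) -> dict:
    """Build a placeholder message so consumers can test before the producer exists."""
    defaults = {str: "example", int: 1}
    return {field: defaults[t] for field, t in contract.items()}
```

Because both sides validate against the same definition, a change to `ORDER_CONTRACT` shows up immediately as failing checks on whichever side has not caught up.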
Principle 3: Idempotency
An operation is idempotent if performing it multiple times leaves the system in the same state as performing it once.
Why It Matters
Networks are unreliable. A request may reach the server while the response is lost on the way back, so the client cannot tell whether the server processed it. If the client retries:
- A non-idempotent operation creates duplicates (two orders, two payments, two emails)
- An idempotent operation is safe to retry — the result is the same regardless
Natural Idempotency
Some operations are naturally idempotent:
- GET: reading data twice returns the same data
- PUT: replacing a resource with the same data twice has the same result
- DELETE: deleting a resource twice leaves it deleted (the second call is a no-op)
POST (create) is not naturally idempotent — calling it twice creates two records.
Making Non-Idempotent Operations Idempotent
Use an idempotency key — a unique identifier generated by the client and sent with the request. The server stores the key and the result of the first successful execution. On re-delivery, it returns the cached result without re-executing.
POST /payments HTTP/1.1
Idempotency-Key: client-generated-uuid-abc123
{ "amount": 150.00, "currency": "EUR" }
Apply idempotency to all state-changing operations in integration flows. Design message consumers to be idempotent as well, because in messaging systems at-least-once delivery means duplicates are possible.
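The server side of an idempotency key could look roughly like the sketch below. It is an in-memory illustration under stated assumptions: `PaymentService` and its fields are hypothetical, and a real service would keep the key-to-result mapping in a durable store and make the check-and-store step atomic.

```python
import uuid

class PaymentService:
    """In-memory sketch; a real service would persist keys in a durable store."""

    def __init__(self):
        self._results = {}   # idempotency key -> result of the first execution
        self.charges = []    # side effects actually performed

    def create_payment(self, idempotency_key: str, amount: float, currency: str) -> dict:
        # A repeated key means a retry: return the stored result, do not charge again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        payment = {"payment_id": str(uuid.uuid4()), "amount": amount, "currency": currency}
        self.charges.append(payment)
        self._results[idempotency_key] = payment
        return payment

service = PaymentService()
key = str(uuid.uuid4())      # generated once by the client, reused on every retry
first = service.create_payment(key, 150.00, "EUR")
retry = service.create_payment(key, 150.00, "EUR")
```

Here `retry` is the same stored result as `first`, and only one charge is recorded even though the request was sent twice.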
Principle 4: Design for Failure
Assume everything will fail at some point. Network calls time out. Services crash. Databases become unavailable. Data arrives malformed. Your integration must handle all of these without losing data or leaving systems in an inconsistent state.
Failure Categories
| Category | Examples |
|----------|---------|
| Transient | Network timeout, brief service downtime |
| Permanent | Service shut down, endpoint removed |
| Data | Missing required field, invalid format, business rule violation |
| Systemic | Target system overloaded, schema mismatch |
Retry Policies
For transient failures, retry — but with exponential backoff and jitter:
Attempt 1 → fail → wait 1 second + random jitter
Attempt 2 → fail → wait 2 seconds + jitter
Attempt 3 → fail → wait 4 seconds + jitter
Attempt 4 → fail → wait 8 seconds + jitter
Attempt 5 → fail → route to Dead Letter Queue
- Exponential backoff: increases wait time between retries so a recovering service is not immediately overwhelmed
- Jitter: random offset prevents all retrying clients from hitting the server at the same instant
- Maximum retry count: after N attempts, stop retrying and escalate
Never retry permanent failures such as 404 Not Found or 400 Bad Request: these will not resolve by retrying.
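The retry policy described above can be sketched as follows. This is one possible shape, not a definitive implementation: the exception classes, `backoff_delays`, `call_with_retry`, and the half-second jitter ceiling are assumptions made for the example.

```python
import random
import time

class TransientError(Exception):
    """Worth retrying, for example a timeout."""

class PermanentError(Exception):
    """Retrying will not help, for example a 400 or 404 response."""

def backoff_delays(max_attempts: int, base: float = 1.0) -> list[float]:
    # The wait after attempt k (1-based) is base * 2**(k-1); no wait after the last attempt.
    return [base * 2 ** k for k in range(max_attempts - 1)]

def call_with_retry(operation, max_attempts=5, base=1.0, max_jitter=0.5, sleep=time.sleep):
    delays = backoff_delays(max_attempts, base)
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except PermanentError:
            raise                      # never retry permanent failures
        except TransientError:
            if attempt == max_attempts:
                raise                  # caller escalates, e.g. routes to a DLQ
            # Random jitter spreads out clients that failed at the same moment.
            sleep(delays[attempt - 1] + random.uniform(0, max_jitter))
```

With the defaults, the base waits are 1, 2, 4, and 8 seconds, and the fifth failure is re-raised so the caller can route the message to a Dead Letter Queue. The `sleep` parameter is injectable so the policy can be tested without real waiting.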
Circuit Breaker
A circuit breaker prevents cascading failures by stopping calls to a failing dependency:
CLOSED (normal): calls pass through
→ if failure rate exceeds threshold:
OPEN (failing): calls fail immediately, no contact with failing service
→ after timeout:
HALF-OPEN: one test call allowed
→ success: return to CLOSED
→ failure: return to OPEN
This gives the failing service time to recover while protecting the calling system from being blocked.
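A minimal sketch of the three states might look like this. It simplifies the trigger to a count of consecutive failures rather than a failure rate, is not thread-safe, and uses hypothetical names (`CircuitBreaker`, `CircuitOpenError`) and default values chosen only for illustration.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the dependency."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, operation):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "HALF_OPEN"          # timeout elapsed: allow one test call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        self.state = "CLOSED"                 # success closes the circuit
        self.failures = 0
        return result
```

Callers wrap each outbound call as `breaker.call(lambda: client.fetch(...))`, where `client.fetch` stands for whatever call reaches the dependency, and treat `CircuitOpenError` as a signal to fall back or fail quickly.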
Dead Letter Queue (DLQ)
Messages that cannot be processed after exhausting retries must go somewhere. The DLQ is that place — a holding area for messages requiring human investigation.
DLQ rules:
- Never silently discard messages
- Alert when any message enters the DLQ
- Preserve the original message, error reason, and timestamp
- Build a mechanism to resubmit after fixing the root cause
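The DLQ rules above could be sketched in memory like this. `DeadLetter`, `DeadLetterQueue`, and the `alert` callback are hypothetical; a real DLQ would normally be a durable queue provided by the broker.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DeadLetter:
    original_message: dict     # preserved unchanged for later resubmission
    error_reason: str
    failed_at: datetime

class DeadLetterQueue:
    def __init__(self, alert):
        self._entries = []
        self._alert = alert    # hypothetical callback, e.g. a pager or chat hook

    def add(self, message: dict, error: Exception) -> None:
        entry = DeadLetter(dict(message), repr(error), datetime.now(timezone.utc))
        self._entries.append(entry)
        self._alert(f"DLQ received message: {entry.error_reason}")

    def resubmit_all(self, handler) -> int:
        """Replay entries through handler; entries that fail again stay queued."""
        remaining, succeeded = [], 0
        for entry in self._entries:
            try:
                handler(entry.original_message)
                succeeded += 1
            except Exception:
                remaining.append(entry)
        self._entries = remaining
        return succeeded

    def __len__(self):
        return len(self._entries)
```

Nothing is discarded silently: a message leaves the queue only when a resubmission succeeds.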
Timeout Configuration
Always configure timeouts. Without them, connections can hang indefinitely:
- Connection timeout: maximum time to establish the TCP connection (typically 2–5 seconds)
- Read timeout: maximum time to wait for the response (typically 10–60 seconds, depending on the operation)
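At the socket level, the two timeouts can be set separately, as in the sketch below. The `open_connection` helper and its default values are assumptions for illustration. Note that a socket read timeout applies to each blocking operation, not to the response as a whole, so higher-level clients may need an additional overall deadline.

```python
import socket

def open_connection(host: str, port: int,
                    connect_timeout: float = 3.0,
                    read_timeout: float = 30.0) -> socket.socket:
    # create_connection raises socket.timeout if the TCP connection
    # is not established within connect_timeout seconds.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # From here on, each blocking recv() or send() raises socket.timeout
    # after read_timeout seconds instead of hanging indefinitely.
    sock.settimeout(read_timeout)
    return sock
```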
Principle 5: Reliability and Delivery Guarantees
Choose your delivery guarantee based on what data loss means for your use case:
At-most-once: each message is delivered zero or one times; if delivery fails, there is no retry. Messages can be lost. Use only for non-critical, high-volume data (telemetry, metrics).
At-least-once: message retried until acknowledged. No loss, but duplicates possible. Consumer must be idempotent. Default choice for business integrations.
Exactly-once: delivered precisely once, no duplicates. Requires broker + consumer coordination (expensive). Use only when the cost of deduplication on the consumer side is higher than the infrastructure cost.
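The following sketch shows why at-least-once delivery needs an idempotent consumer. The broker is simulated by a loop that redelivers while acknowledgements are lost; `IdempotentConsumer`, `deliver_at_least_once`, and the message fields are hypothetical.

```python
class IdempotentConsumer:
    """Deduplicates on message ID so redelivered messages have no extra effect."""

    def __init__(self):
        self._seen_ids = set()     # in production this would be a durable store
        self.stock = 100

    def handle(self, message: dict) -> bool:
        """Process the message and return True to acknowledge it."""
        if message["id"] in self._seen_ids:
            return True            # duplicate: acknowledge without reprocessing
        self.stock -= message["quantity"]
        self._seen_ids.add(message["id"])
        return True

def deliver_at_least_once(message: dict, consumer, acks_lost: int) -> int:
    """Simulate a broker that redelivers while acknowledgements are lost in transit."""
    deliveries = 0
    while True:
        deliveries += 1
        acked = consumer.handle(message)
        if acked and deliveries > acks_lost:
            return deliveries
```

If two acknowledgements are lost, the message is delivered three times, yet the stock is reduced only once.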
Principle 6: Separation of Concerns
Each component in an integration should do one thing well. Do not mix:
- Routing logic with transformation logic
- Business rules with protocol handling
- Error handling with main processing flow
Why: mixed concerns make code harder to test, harder to change, and harder to debug. When something goes wrong, you want to isolate the problem quickly.
Practical application:
- Separate validators (check input correctness) from transformers (convert format)
- Separate routing decisions from message processing
- Keep integration flow logic out of business domain services
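One way to picture this separation is a flow that only composes small single-purpose steps. The function names, field names, and the routing rule below are invented for the example.

```python
def validate(message: dict) -> None:
    """Validator: checks input correctness only."""
    if "order_id" not in message or message.get("quantity", 0) <= 0:
        raise ValueError("order_id and a positive quantity are required")

def transform(message: dict) -> dict:
    """Transformer: converts the source format to the target format only."""
    return {"orderId": message["order_id"], "qty": message["quantity"]}

def route(message: dict) -> str:
    """Router: picks a destination without touching the payload."""
    return "bulk-orders" if message["qty"] >= 100 else "standard-orders"

def process(message: dict) -> tuple[str, dict]:
    # The flow only composes the steps; each step can be tested on its own.
    validate(message)
    outgoing = transform(message)
    return route(outgoing), outgoing
```

When a message ends up in the wrong queue, only `route` needs inspecting; when a field is mapped incorrectly, only `transform` does.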
Principle 7: Observability by Design
You cannot manage what you cannot see. Design observability into integrations from the start — do not add it as an afterthought.
The three pillars of observability:
Logs: structured records of what happened. Every integration transaction should produce a log entry with:
- Correlation ID (links the full transaction across systems)
- Source and target system
- Message type and ID
- Timestamp of each step
- Success or failure status
- Error detail if failed
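A structured log entry carrying the fields listed above might be produced as in this sketch. The field names, the `log_entry` helper, and the example system names are assumptions; the point is one machine-readable record per step, all sharing a correlation ID.

```python
import json
import uuid
from datetime import datetime, timezone

def log_entry(correlation_id, source, target, message_type, message_id, status, error=None) -> str:
    """Render one integration step as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id,
        "source": source,
        "target": target,
        "message_type": message_type,
        "message_id": message_id,
        "status": status,
    }
    if error is not None:
        record["error"] = error
    return json.dumps(record)

correlation_id = str(uuid.uuid4())   # created once, then passed along to every system
line = log_entry(correlation_id, "crm", "billing", "OrderCreated", "m-1",
                 "failure", error="timeout after 30s")
```

Searching the logs of every system for the same `correlation_id` then reconstructs the full transaction.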
Metrics: numerical measurements over time. Essential metrics:
- Throughput (messages/second)
- Processing latency (p50, p95, p99)
- Error rate (percentage of failed messages)
- Queue depth (how many messages are waiting)
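As a small worked example, latency percentiles and error rate can be computed from raw samples as below. The nearest-rank method is one common definition of a percentile (monitoring systems often use approximations instead), and the sample values are made up.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def error_rate(failed: int, total: int) -> float:
    """Percentage of failed messages; 0.0 when nothing was processed."""
    return 100.0 * failed / total if total else 0.0

latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 17, 900]   # hypothetical samples
summary = {p: percentile(latencies_ms, p) for p in (50, 95, 99)}
```

For these samples the p50 is 15 ms while p95 and p99 are both 900 ms, which is why tail percentiles are tracked alongside the median: a few slow transactions are invisible in the median.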
Traces: the end-to-end journey of a single transaction across all systems and services. Distributed tracing tools (OpenTelemetry) reconstruct this journey from individual service spans.
Principle 8: Canonical Data Model
When many systems need to exchange data, define a Canonical Data Model (CDM) — a neutral, shared representation of each key data entity.
Instead of building N×(N-1) pairwise transformations between N systems:
System A → System B format
System A → System C format
System B → System A format
System B → System C format
... (grows quadratically)
Build N adapters, one per system, each translating to and from the CDM (2N one-way mappings, which grows linearly):
System A ↔ CDM
System B ↔ CDM
System C ↔ CDM
The CDM also acts as the authoritative definition of shared business concepts: what fields an "order" has, what a "customer" looks like, what date formats are used.
Practical tip: start small. Define a CDM for the most-exchanged entities first (customer, order, product). Expand as new integrations are added.
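The arithmetic and the adapter idea can be sketched together. Counting each direction separately, point-to-point integration needs N×(N-1) mappings while a CDM needs one mapping to and one from the model per system. The record formats and adapter names below are hypothetical.

```python
def pairwise_transformations(n: int) -> int:
    return n * (n - 1)          # one mapping per ordered pair of systems

def cdm_transformations(n: int) -> int:
    return 2 * n                # one mapping to and one from the CDM per system

# Hypothetical adapters for two systems with different customer formats.
def crm_to_cdm(record: dict) -> dict:
    return {"customer_id": str(record["CustNo"]), "name": record["FullName"]}

def cdm_to_billing(customer: dict) -> dict:
    return {"account": customer["customer_id"], "display_name": customer["name"]}

billing_record = cdm_to_billing(crm_to_cdm({"CustNo": 1007, "FullName": "Ada Lovelace"}))
```

For 10 systems that is 90 direct mappings versus 20 through the CDM, and adding an eleventh system means writing two new mappings instead of twenty.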
Principle 9: Security by Design
Security controls must be designed in — not bolted on at the end:
- Authenticate every caller — no integration should accept data without verifying identity
- Authorise every operation — authentication confirms who you are; authorisation confirms what you are allowed to do
- Encrypt in transit — all integration traffic must use TLS
- Encrypt sensitive data at rest — messages stored in queues containing PII must be encrypted
- Principle of least privilege — each integration service account has only the permissions it needs
- Secrets management — credentials and keys are stored in a secrets vault, not in configuration files or source code
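The difference between authentication and authorisation, combined with least privilege, can be illustrated with a deliberately simple API-key check. The lecture does not prescribe a mechanism, so the key registry, scope names, and the `BILLING_API_KEY` environment variable (standing in for a secrets vault) are all assumptions.

```python
import hmac
import os

# Hypothetical registry: API key -> caller identity and granted scopes.
# The key is read from the environment as a stand-in for a secrets vault.
API_KEYS = {
    os.environ.get("BILLING_API_KEY", "dev-only-placeholder"): {
        "caller": "billing-service",
        "scopes": {"orders:read"},     # least privilege: read access only
    }
}

def authenticate(presented_key: str):
    """Return the caller record for a valid key, else None."""
    for known_key, caller in API_KEYS.items():
        # compare_digest avoids leaking key contents through timing differences.
        if hmac.compare_digest(presented_key.encode(), known_key.encode()):
            return caller
    return None

def authorise(caller: dict, required_scope: str) -> bool:
    return required_scope in caller["scopes"]

def handle_request(presented_key: str, required_scope: str) -> int:
    caller = authenticate(presented_key)
    if caller is None:
        return 401                     # identity not verified
    if not authorise(caller, required_scope):
        return 403                     # identified, but not permitted
    return 200
```

A valid key asking for `orders:write` is rejected with 403 rather than 401: the caller is known, but the service account was never granted that permission.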
Principle 10: Evolvability
Integrations must be designed to change over time:
Version your interfaces. Changes are inevitable. Version numbers let you evolve without breaking existing consumers.
Tolerate unknown fields. Consumers should ignore fields they do not recognise. This allows producers to add new fields without breaking existing consumers (Postel's Law: be conservative in what you send, liberal in what you accept).
Backward compatibility first. Prefer backward-compatible changes (adding optional fields) over breaking changes (removing fields, changing types).
Document change history. When an interface changes, document what changed, why, and when. This makes debugging future issues much faster.
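A tolerant consumer can be sketched as follows: it keeps the fields it knows, ignores the rest, and relies on defaults for optional fields added later. `OrderV1`, `parse_order`, and the field names are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class OrderV1:
    order_id: str
    quantity: int
    note: str = ""          # optional field added later with a default: backward compatible

def parse_order(payload: dict) -> OrderV1:
    """Tolerant reader: keep the fields this consumer knows, ignore the rest."""
    known = {f.name for f in fields(OrderV1)}
    return OrderV1(**{k: v for k, v in payload.items() if k in known})

# A newer producer added "gift_wrap"; an older producer never sends "note".
newer = parse_order({"order_id": "A1", "quantity": 2, "note": "fragile", "gift_wrap": True})
older = parse_order({"order_id": "A2", "quantity": 1})
```

Both payloads parse without error, so the producer can add fields on its own schedule and consumers can adopt them when ready.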
Putting Principles Together: Integration Design Checklist
Use this checklist when reviewing an integration design before build:
Coupling and contract:
- [ ] Is there a formal interface contract (OpenAPI, AsyncAPI, schema)?
- [ ] Does the integration depend only on the contract, not on implementation details?
Reliability:
- [ ] Is a delivery guarantee defined (at-most-once, at-least-once, or exactly-once)?
- [ ] Is the consumer idempotent where at-least-once delivery is used?
- [ ] Are retry policies defined with exponential backoff and a maximum?
- [ ] Is a Dead Letter Queue configured?
- [ ] Are timeouts configured?
Failure handling:
- [ ] Are all failure categories identified (transient, permanent, data, systemic)?
- [ ] Is the error handling path as explicitly designed as the happy path?
- [ ] Is there alerting for DLQ messages and error rate spikes?
Observability:
- [ ] Does every transaction produce a structured log with correlation ID?
- [ ] Are throughput, latency, and error rate metrics exported?
- [ ] Are alerts defined for availability and performance thresholds?
Security:
- [ ] Is authentication defined?
- [ ] Is TLS required for all connections?
- [ ] Is the service account scoped to minimum required permissions?
Evolvability:
- [ ] Is the interface versioned?
- [ ] Is there a deprecation policy?
Lecture 4 Summary
- Loose coupling is achieved by depending only on contracts, not implementations. Use API gateways and message brokers as intermediaries.
- Design contracts first. Contracts enable parallel development, test mocks, and explicit change management.
- Idempotency makes retries safe. Use idempotency keys for POST operations; design message consumers to be idempotent.
- Design for failure: define retry policies with backoff, implement circuit breakers, configure DLQs, and always set timeouts.
- Choose a delivery guarantee that matches your data criticality. At-least-once is the right default for business integrations.
- Build observability in from the start: structured logs with correlation IDs, metrics for throughput/latency/error rate, and distributed traces.
Next: Lecture 5 — System Integration Monitoring