
Deliberate Change Management for Linux & Unix Systems

Changes are where most outages are created. Managing them carefully is a core operational responsibility.

Last reviewed: May 2026

Description: the work of being deliberate § 1

Change management exists to reduce the risk introduced by well-intentioned action.

Most serious incidents are not caused by hardware failure or software bugs, but by changes made without sufficient context, preparation, or rollback paths.

The goal is not to avoid change, but to make it deliberate, understandable, and survivable when assumptions prove wrong.

Three principles § 2
01

Clarity Before Execution

Changes are evaluated in the context of the whole system, not as isolated technical tasks. Affected components, dependencies, and downstream consumers are identified before anything is touched.

02

Reversibility First

Whenever possible, changes are designed so they can be undone quickly if outcomes differ from expectations. Where reversibility is impossible, that constraint is named explicitly before execution begins.
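
As an illustration of designing the undo path first, here is a minimal Python sketch for a single configuration file. The function name `apply_reversibly` and the `verify` callback are hypothetical, introduced only for this example; a real change would usually also involve reloading a service and watching its health.

```python
import shutil
from pathlib import Path


def apply_reversibly(target: Path, new_text: str, verify) -> bool:
    """Replace the contents of `target`, restoring the original if `verify` fails.

    `verify` is a hypothetical caller-supplied check that receives the path
    and returns True when the new state is acceptable.
    Returns True if the change was kept, False if it was rolled back.
    """
    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)  # create the rollback path before touching anything
    target.write_text(new_text)
    try:
        ok = bool(verify(target))
    except Exception:
        ok = False  # a check that crashes is treated as a failed check
    if not ok:
        shutil.copy2(backup, target)  # undo: restore the saved copy
    return ok
```

The backup is taken before the write, so the undo step never depends on the change having gone well.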

03

Calm Over Speed

Speed is rarely the primary constraint. Acting calmly under uncertainty reduces long-term damage. Pressure to move fast is a signal to slow down, not to skip steps.

The five-step process § 3

Every non-trivial change moves through the same stages. The depth of each stage scales with the blast radius and reversibility of the change, not the urgency surrounding it.

  1. Proposal: Purpose, scope, and expected impact described in plain language. Affected systems and dependencies identified.
  2. Risk Assessment: Failure modes, blast radius, and recovery options considered. High-risk changes may be broken into smaller steps or deferred.
  3. Preparation: Backups, snapshots, and verification steps confirmed. Rollback procedures defined where applicable.
  4. Execution: Changes applied deliberately, with monitoring in place to observe real-world impact as it happens.
  5. Verification & Documentation: Outcome validated and documented. New assumptions or operational knowledge recorded for future work.
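
The five stages can be sketched as a small record that only advances in order. This is a hedged illustration, not a prescribed tool: the names `Stage` and `ChangeRecord` are assumptions made for the example, and the only rules it enforces are that no stage is skipped and that each stage leaves a written note.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    PROPOSAL = 1
    RISK_ASSESSMENT = 2
    PREPARATION = 3
    EXECUTION = 4
    VERIFICATION = 5


@dataclass
class ChangeRecord:
    """Hypothetical record of one change; `notes` maps each finished stage to its write-up."""
    summary: str
    notes: dict = field(default_factory=dict)

    def complete(self, stage: Stage, note: str) -> None:
        if not note.strip():
            raise ValueError(f"{stage.name} needs a written note")
        if len(self.notes) >= len(Stage):
            raise ValueError("all stages are already complete")
        expected = Stage(len(self.notes) + 1)  # the next stage in sequence
        if stage is not expected:
            raise ValueError(f"expected {expected.name}, got {stage.name}")
        self.notes[stage] = note

    @property
    def done(self) -> bool:
        return len(self.notes) == len(Stage)
```

Calling `complete(Stage.EXECUTION, ...)` on a record that has only a proposal raises `ValueError`. The depth of each note can still scale with the blast radius; the sketch constrains only the order.
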
Emergency changes § 4

During incidents, some changes may be required quickly to stabilize systems. Even under pressure, the priority is to stop further damage and preserve future recovery options.

Permanent fixes are often deferred until systems are stable and there is time to reason clearly about long-term impact. Emergency changes are documented after the fact with the same rigor as planned ones, including what was skipped and why.

What it is not § 5

Change management is not a bureaucratic approval process. It is not an excuse to avoid necessary work, nor a promise that changes are risk-free.

It is a disciplined approach to accepting that systems are complex, and that mistakes are most costly when made casually.

Most avoided outages are the result of decisions made before changes are applied.
In practice, this means § 6
  • Before a change is made, someone has written down what it is meant to do, what could go wrong, and how to undo it.
  • Backups and verification steps are confirmed before execution, not assumed.
  • High-risk changes are broken into smaller, observable steps rather than executed as a single leap.
  • After execution, the outcome is recorded along with anything that surprised the operator.
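
One way to confirm a backup rather than assume it is a pre-flight check like the Python sketch below. The function name `backup_problems` and the 24-hour default are assumptions chosen for illustration. The check covers only existence, size, and age; it says nothing about whether the backup can actually be restored, which only a test restore can show.

```python
import time
from pathlib import Path


def backup_problems(path: Path, max_age_hours: float = 24.0, now=None) -> list:
    """Return a list of reasons not to trust the backup at `path` (empty list = none found).

    `now` is an optional Unix timestamp, accepted so the age check can be tested.
    """
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    stat = path.stat()
    if stat.st_size == 0:
        problems.append(f"{path} is empty")
    current = time.time() if now is None else now
    age_hours = (current - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{path} is {age_hours:.1f} hours old")
    return problems
```

An operator would proceed only when the returned list is empty, and would record any findings alongside the change.
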
See also § 7

Change management is most effective when it sits inside a long-term operational relationship: stewardship defines the context, incident handling takes over when changes go wrong, and disaster recovery planning assumes some changes will fail in ways that exceed normal recovery.
