DevOps Practices

Sharon Watkins
2 min read · May 25, 2021

DevOps practices reflect the idea of continuous improvement and automation. Many practices focus on one or more development cycle phases. These practices include:

Chaos Monkey- First developed by Netflix, Chaos Monkey is a tool for chaos engineering, the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions. Chaos Monkey randomly terminates instances in production to ensure that engineers implement their services to be resilient to instance failures.
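To make the idea concrete, here is a minimal sketch of random instance termination. It is not Netflix's implementation: the fleet is a plain dictionary, and the names `pick_victim` and `unleash` are hypothetical helpers invented for this illustration.

```python
import random


def pick_victim(instances, rng):
    """Pick one running instance at random, or None if nothing is running."""
    running = [name for name, state in instances.items() if state == "running"]
    return rng.choice(running) if running else None


def unleash(instances, rng):
    """Terminate one randomly chosen running instance and return its name."""
    victim = pick_victim(instances, rng)
    if victim is not None:
        # Assumption: a real tool would call a cloud API here instead.
        instances[victim] = "terminated"
    return victim


# Hypothetical three-node fleet used only for this example.
fleet = {"web-1": "running", "web-2": "running", "web-3": "running"}
victim = unleash(fleet, random.Random())
print(f"terminated {victim}; the remaining instances must keep serving traffic")
```

Passing the random generator in as a parameter keeps the sketch testable, since a seeded generator makes the choice repeatable.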

Blue/Green Deployment- In this setup there are two systems: one is live (blue) and one is not (green). An upgrade is rolled out to production by the following process:

  1. The offline system, green, is upgraded.
  2. The green system is tested.
  3. Production is pointed to the green system.
  4. If there is a problem, production is pointed back to the blue system.
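
The four steps above can be sketched in a few lines. This is an illustration under stated assumptions, not a real router: `BlueGreenRouter`, `deploy`, `rollback`, and the `health_check` callback are hypothetical names, and "pointing production" is modeled as changing a single attribute.

```python
class BlueGreenRouter:
    """Tracks which of two environments receives production traffic."""

    def __init__(self):
        self.versions = {"blue": "v1", "green": "v1"}
        self.live = "blue"

    @property
    def idle(self):
        """The environment that is not currently serving production."""
        return "green" if self.live == "blue" else "blue"

    def deploy(self, new_version, health_check):
        target = self.idle
        self.versions[target] = new_version   # step 1: upgrade the offline system
        if not health_check(new_version):     # step 2: test it
            return False                      # production was never touched
        self.live = target                    # step 3: point production at it
        return True

    def rollback(self):
        """Step 4: point production back at the previous system."""
        self.live = self.idle


router = BlueGreenRouter()
router.deploy("v2", health_check=lambda version: True)
print(router.live, router.versions[router.live])
router.rollback()
print(router.live, router.versions[router.live])
```

Because the old system is left untouched during the upgrade, the rollback is only a pointer change and not a second deployment.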

Andon Cords- Originally used by Toyota on the production line, andon cords are a mechanism to halt or upgrade a deployment to stop a bug from propagating downstream.

Cloud- The cloud gives you an entirely API-driven way to create and control infrastructure. This allows you to treat system infrastructure as if it were any other program component. As soon as you conceive of a new deployment strategy or disaster recovery plan, you can try it out without waiting on anyone. The cloud approach to infrastructure can make your other DevOps changes move along at high velocity.
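
One way to picture infrastructure as a program component is to describe the servers you want as data and let code work out the changes. The sketch below assumes a hypothetical `plan` function and made-up server names; a real setup would pass the resulting actions to a cloud provider's API.

```python
def plan(desired, actual):
    """Return the create/delete actions that would turn actual into desired."""
    to_create = sorted(set(desired) - set(actual))
    to_delete = sorted(set(actual) - set(desired))
    return [("create", name) for name in to_create] + [
        ("delete", name) for name in to_delete
    ]


# Hypothetical inventory: what we want versus what currently exists.
desired = {"web-1", "web-2", "db-1"}
actual = {"web-1", "old-cache"}

for action, name in plan(desired, actual):
    print(action, name)
```

Since the plan is just a function of two sets, a new deployment layout can be tried by editing `desired` and reviewing the printed actions before anything is changed.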

Embedded Teams- Dev teams want to ship new code and the ops team wants to keep the service up. This leads to a conflict of interest. Some organizations embed an ops engineer on each dev team, which makes the team responsible for all of its own work and enables the disciplines to coordinate closely.

Blameless Postmortems- This principle makes a team look at what went wrong without “scapegoating” a particular team or person. It rules out human error as an acceptable explanation for failure.

Public Status Pages- Communication during outages has been shown to increase customer satisfaction and retain trust. Every service that runs a public or private page or application should have a status page that details when there is a problem and what is being done about it.
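
As a small illustration, a status page can be driven by a structured record of what is wrong and what is being done. The `Incident` fields and the `render_status` function below are assumptions made for this sketch, not a standard format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class Incident:
    component: str   # which part of the service is affected
    problem: str     # what is wrong
    action: str      # what is being done about it
    started_at: str  # ISO 8601 timestamp


def render_status(incidents):
    """Build the JSON document a status page could display."""
    state = "degraded" if incidents else "operational"
    payload = {"status": state, "incidents": [asdict(i) for i in incidents]}
    return json.dumps(payload, indent=2)


# Hypothetical incident used only for this example.
incident = Incident(
    component="checkout",
    problem="Elevated error rate on payments",
    action="Rolling back the latest release",
    started_at=datetime.now(timezone.utc).isoformat(),
)
print(render_status([incident]))
```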

Developers on Call- Teams have begun to put devs on call for the services they have created. This creates a very fast feedback loop: logging and deployment improve rapidly, and core application problems get solved quickly.
