Ansible and chef basics

So far we have used Terraform for infrastructure provisioning and for initial setup through provisioners. Now let’s look at Ansible, an open source automation platform that handles configuration management, application deployment, and infrastructure orchestration. Ansible is procedural rather than declarative: we define the steps we want performed, and Ansible runs through each of them in order. In Terraform, we specify the state we want to reach, and it gets us there by creating, modifying, or destroying resources as needed. Ansible doesn’t keep any state, so we have to decide how to track created resources ourselves, using tags or other properties. Terraform keeps the state of the infrastructure, so we don’t need to worry about creating duplicate resources. Personally, I recommend Terraform for provisioning infrastructure and Ansible for configuring software, as Terraform is much more intuitive for infrastructure orchestration.

Once upon a time, managing servers reliably and efficiently was a challenge. System administrators managed servers by hand: installing software, changing configuration, and managing services manually. As the number of servers grew and the services on them became more complex, scaling these manual processes became time-consuming and hard. Then came Ansible, which lets us define groups of machines, how to configure them, and which actions to run on them. All of this can be triggered from a central location, which can be your local system (called the controller machine). Ansible uses SSH to connect to remote hosts and do the setup, so no agent needs to be installed on a remote host beforehand. It’s simple, agentless, powerful and flexible. Automation is written in YAML in the form of Ansible playbooks. A playbook is a file where automation is defined through tasks, and a task is a single step to be performed, such as installing a package.

Ansible works by connecting over SSH to the remote hosts defined in an inventory file, which contains information about the servers to be managed. It then executes the modules or tasks defined in a playbook. A playbook is made up of one or more plays, and each play maps a set of hosts to the tasks to run on them. We can also use roles, a predefined way of organising playbook content so that provisioning logic can be shared and reused.

Let’s have a look at some of the terminology used in ansible:

  1. Controller Machine: The machine where Ansible is installed
  2. Inventory: Information about the servers to be managed
  3. Playbook: A YAML file in which automation is defined as tasks
  4. Task: A single procedure to be executed
  5. Module: Predefined commands executed directly on remote hosts
  6. Play: A section of a playbook that maps a set of hosts to tasks
  7. Role: A predefined way of organizing playbooks
  8. Handlers: Tasks with unique names that are executed only if notified by another task

As I am using macOS, I will install pip first using easy_install and then install Ansible using pip. For other platforms, see the Ansible installation documentation.

sudo easy_install pip
sudo pip install ansible

Once the above commands have finished, run the command below to make sure that Ansible is installed properly.

ansible --version

The output should be something like below.

ansible 2.5.3
config file = None
configured module search path = [u'/Users/mitesh/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /Library/Python/2.7/site-packages/ansible
executable location = /usr/local/bin/ansible
python version = 2.7.10 (default, Oct 6 2017, 22:29:07) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)]

Ansible reads the SSH key from ~/.ssh/id_rsa. We need to make sure the public key is set up on all remote hosts, as we already did using Terraform while creating the remote EC2 instance.

To run Ansible commands, we need an inventory file, which is expected at the default path “/etc/ansible/hosts”. We can change this path by creating an Ansible config file (ansible.cfg) in the Ansible workspace and defining the inventory file path there. In the same config file, we also need to define the username to use for SSH.

File: ansible.cfg
[defaults]
inventory = ./inventory
remote_user = ec2-user

Create an inventory file and add the (dummy) IP address of a remote host.

File: inventory
[all]
18.191.176.209

[server]
18.191.176.209

Once this is done, let’s run the command below to ping all the given remote hosts.

ansible all -m ping

Ansible runs its ping module against the remote host and gives the output below:

18.191.176.209 | SUCCESS => {
"changed": false,
"ping": "pong"
}

We can also create groups in the inventory file and run Ansible commands against a group by replacing all with the group name. In the example below, server is the group name specified in the inventory file.

ansible server -m ping
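To make the group idea more concrete, here is a rough shell sketch that lists the hosts found under a given group header in an INI-style inventory. It only illustrates the file layout and is not how Ansible itself parses inventories. The function name hosts_in_group and the temporary inventory file are made up for this example.

```shell
#!/usr/bin/env bash

# hosts_in_group GROUP FILE: print the hosts listed under [GROUP] in FILE.
hosts_in_group() {
  local group="$1" file="$2"
  awk -v g="[$group]" '
    /^\[.*\]$/ { in_group = ($0 == g); next }  # a section header switches groups
    in_group && NF { print $1 }                # non-empty line inside the group
  ' "$file"
}

# Demo inventory written to a temporary file.
inventory_file=$(mktemp)
cat > "$inventory_file" <<'EOF'
[all]
18.191.176.209

[server]
18.191.176.209
EOF

hosts_in_group server "$inventory_file"
rm -f "$inventory_file"
```

Running it prints the single dummy address that sits under the server group.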

Let’s look at playbooks, which execute a series of actions. We need to make sure our playbooks are idempotent so that they can run more than once without side effects. Ansible executes a playbook sequentially, from top to bottom.

A sample playbook looks like this:

---
- hosts: [hosts]
  tasks:
    - [first task]
    - [second task]

We are going to create a directory on our remote node using a playbook that targets all hosts. The playbook below creates a test directory under /home/ec2-user.

---
- hosts: all
  tasks:
    - name: Creates directory
      file: path=/home/ec2-user/test state=directory

When we run the above playbook with the command “ansible-playbook playbook.yml”, we get the result below. The first result is Gathering Facts. This happens because Ansible runs a special module named “setup” before executing any task. This module connects to the remote host and gathers all kinds of information, such as IP address, disk space, and CPU details. Once this is done, our Creates directory task runs and creates the test directory.

PLAY [all] ************************************************************

TASK [Gathering Facts] ************************************************
ok: [18.191.176.209]

TASK [Creates directory] **********************************************
changed: [18.191.176.209]

PLAY RECAP ************************************************************
18.191.176.209 : ok=2 changed=1 unreachable=0 failed=0
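The difference between ok and changed in the recap is the heart of idempotency. As a minimal sketch in plain shell (not how the file module is implemented), the hypothetical function ensure_directory below creates a directory only when it is missing and reports which case it hit:

```shell
#!/usr/bin/env bash

# ensure_directory PATH: create PATH only if it is missing and report
# "changed" or "ok", loosely mirroring the statuses in the play output.
ensure_directory() {
  local path="$1"
  if [[ -d "$path" ]]; then
    echo "ok"
  else
    mkdir -p "$path"
    echo "changed"
  fi
}

workdir=$(mktemp -d)
ensure_directory "$workdir/test"   # first run creates the directory
ensure_directory "$workdir/test"   # second run finds nothing to do
rm -rf "$workdir"
```

The first call reports changed and the second reports ok, which is the behaviour we want from a playbook that is run more than once.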

There are many more modules and commands that can be executed on remote hosts. With Ansible, we can do server setup, software installation, and a lot more.

Terraform basics

Introduction to Terraform

Welcome to the intro guide to Terraform! This guide is the best place to start with Terraform. It covers what Terraform is, what problems it can solve, and how it compares to existing software, and it contains a quick start for using Terraform.

If you are already familiar with the basics of Terraform, the documentation provides a better reference guide for all available features as well as internals.

What is Terraform?

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.

Configuration files describe to Terraform the components needed to run a single application or your entire datacenter. Terraform generates an execution plan describing what it will do to reach the desired state, and then executes it to build the described infrastructure. As the configuration changes, Terraform is able to determine what changed and create incremental execution plans which can be applied.

The infrastructure Terraform can manage includes low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc.

Examples work best to showcase Terraform. Please see the use cases.

The key features of Terraform are:

Infrastructure as Code

Infrastructure is described using a high-level configuration syntax. This allows a blueprint of your datacenter to be versioned and treated as you would any other code. Additionally, infrastructure can be shared and re-used.

Execution Plans

Terraform has a “planning” step where it generates an execution plan. The execution plan shows what Terraform will do when you call apply. This lets you avoid any surprises when Terraform manipulates infrastructure.
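As a very rough illustration of the planning idea, the shell sketch below compares a list of desired resources with a list of existing ones and prints what would be created (+) and destroyed (-). It is a toy model under simplifying assumptions: real plans also detect in-place modifications, and the function name plan and the resource names are invented for the example.

```shell
#!/usr/bin/env bash

# plan DESIRED CURRENT: both arguments are files with one resource name
# per line. Prints "+ name" for resources to create and "- name" for
# resources to destroy.
plan() {
  local desired="$1" current="$2"
  # Lines only in the desired list must be created.
  comm -23 <(sort "$desired") <(sort "$current") | sed 's/^/+ /'
  # Lines only in the current list must be destroyed.
  comm -13 <(sort "$desired") <(sort "$current") | sed 's/^/- /'
}

desired=$(mktemp)
current=$(mktemp)
printf 'vpc\nsubnet\ninstance\n' > "$desired"
printf 'vpc\nold_bucket\n' > "$current"

plan "$desired" "$current"
rm -f "$desired" "$current"
```

Nothing is changed by computing the plan, which is the point: the list of actions can be reviewed before anything is applied.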

Resource Graph

Terraform builds a graph of all your resources, and parallelizes the creation and modification of any non-dependent resources. Because of this, Terraform builds infrastructure as efficiently as possible, and operators get insight into dependencies in their infrastructure.
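The ordering side of a dependency graph can be sketched with the standard tsort utility, which performs a topological sort. This is only an analogy, not Terraform's implementation, and it shows ordering rather than parallelism. The wrapper name creation_order and the resource names are assumptions for the example.

```shell
#!/usr/bin/env bash

# creation_order: read "dependency dependent" pairs on stdin and print one
# valid creation order, with dependencies listed before their dependents.
creation_order() {
  tsort
}

creation_order <<'EOF'
vpc subnet
subnet instance
vpc security_group
security_group instance
EOF
```

In this input, subnet and security_group depend only on vpc and not on each other, so those are the kind of resources that could be created in parallel.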

Change Automation

Complex changesets can be applied to your infrastructure with minimal human interaction. With the previously mentioned execution plan and resource graph, you know exactly what Terraform will change and in what order, avoiding many possible human errors.

Bash scripting basics

Bash scripting is one of the fastest ways to automate repetitive tasks on Linux and macOS. If you regularly rename files, parse logs, deploy services, back up directories, or glue command-line tools together, a small Bash script can save a surprising amount of time.

In this updated article, I want to go beyond a basic introduction. We will cover the core building blocks of Bash scripts, some best practices that make scripts safer in production, and a few interesting features that many beginners do not discover early enough.

1. What is a Bash script?

Bash stands for Bourne Again SHell. It is both an interactive shell and a scripting language. A Bash script is simply a text file that contains shell commands executed in order.

Bash is especially useful when you want to:

  • combine multiple terminal commands into one reusable script
  • automate system administration tasks
  • wrap existing tools such as grep, awk, sed, tar, find, docker, or kubectl
  • build small deployment, backup, cleanup, or monitoring utilities
  • run scheduled jobs using cron

2. The smallest possible Bash script

#!/usr/bin/env bash

echo "Hello from Bash"

The first line is called the shebang. It tells the operating system which interpreter should run the file.

To execute the script:

chmod +x hello.sh
./hello.sh

I generally prefer #!/usr/bin/env bash over #!/bin/bash because it is a bit more portable across environments.

3. Variables and command-line arguments

Variables in Bash are simple, but there is one rule beginners often forget: do not put spaces around =.

#!/usr/bin/env bash

name="Thanh"
role="DevOps Engineer"

echo "Name: $name"
echo "Role: $role"

You can also receive input from command-line arguments:

#!/usr/bin/env bash

echo "Script name: $0"
echo "First argument: $1"
echo "Second argument: $2"
echo "Argument count: $#"
echo "All arguments: $@"
echo "Current process id: $$"

Run it like this:

./demo.sh deploy production
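One detail worth seeing in action is how "$@" differs from "$*" when arguments are passed on to another command. The sketch below uses two small helper functions, count_args and show_difference, which are names I made up for the demonstration:

```shell
#!/usr/bin/env bash

# count_args: print how many arguments were received.
count_args() {
  echo "$#"
}

show_difference() {
  count_args "$@"   # each original argument stays a separate word
  count_args "$*"   # all arguments are joined into a single word
}

show_difference deploy "production eu"
```

This prints 2 and then 1, which is why "$@" is usually what you want when forwarding arguments.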

4. Reading input from the user

Bash can also ask the user for input interactively:

#!/usr/bin/env bash

read -rp "Enter your username: " username
echo "Welcome, $username"

Useful options for read:

  • -r: prevents backslash escaping from being interpreted unexpectedly
  • -p: prints a prompt
  • -s: hides input, useful for passwords

Example:

read -rsp "Enter password: " password
echo
echo "Password received"

5. Conditions with if, elif, and else

Conditional logic is one of the most common things you will use in shell automation.

#!/usr/bin/env bash

if [[ $# -eq 0 ]]; then
  echo "Please provide at least one argument"
elif [[ $1 == "start" ]]; then
  echo "Starting the service"
else
  echo "Unknown command"
fi

In modern Bash, prefer [[ ... ]] instead of [ ... ] when possible. It is generally safer and easier to read.
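Here is a small sketch of why [[ ... ]] is more forgiving. The function is_log_file is a hypothetical helper; it relies on two Bash behaviours: the left-hand variable is not word-split inside [[ ]], and the right-hand side of == is treated as a glob pattern.

```shell
#!/usr/bin/env bash

# is_log_file NAME: succeed if NAME ends in .log.
is_log_file() {
  # No quotes needed around $1 here, even if it is empty or has spaces.
  [[ $1 == *.log ]]
}

is_log_file "app server.log" && echo "log file"
is_log_file "" || echo "not a log file"
```

The same test written as [ $1 == *.log ] would misbehave on the empty string and on the name containing a space.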

6. Case statements are cleaner than long if chains

If you are building a script with subcommands such as start, stop, restart, or status, case is usually the better tool.

#!/usr/bin/env bash

case "$1" in
  start)
    echo "Starting service"
    ;;
  stop)
    echo "Stopping service"
    ;;
  restart)
    echo "Restarting service"
    ;;
  status)
    echo "Checking status"
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|status}"
    ;;
esac

7. Loops: for and while

A for loop is good when iterating over a list:

#!/usr/bin/env bash

for file in *.log; do
  echo "Processing: $file"
done

A while loop is often better when reading a file line by line:

#!/usr/bin/env bash

while IFS= read -r line; do
  echo "Line: $line"
done < input.txt

This pattern avoids several common parsing issues and is much safer than many ad-hoc alternatives.

8. Functions make scripts easier to maintain

Even in small scripts, functions are worth using. They help you avoid duplication and make the script easier to read later.

#!/usr/bin/env bash

log_info() {
  echo "[INFO] $1"
}

backup_file() {
  local source_file="$1"
  cp "$source_file" "$source_file.bak"
}

log_info "Starting backup"
backup_file "config.yaml"

9. A few interesting Bash features many beginners miss

9.1 Strict mode

This is one of the most useful things you can add at the top of production scripts:

#!/usr/bin/env bash
set -euo pipefail

IFS=$'\n\t'

  • set -e: exit immediately when a command fails
  • set -u: treat unset variables as errors
  • set -o pipefail: fail a pipeline if any command inside it fails
  • IFS=$'\n\t': split words only on newlines and tabs, which avoids bad splitting behavior around spaces

This is not magic, but it prevents many silent failures.
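To see what pipefail actually changes, here is a minimal sketch. The helper pipeline_status is a name I invented; it runs the pipeline false | true in a subshell, with or without pipefail, and prints the exit status of the pipeline.

```shell
#!/usr/bin/env bash

# pipeline_status MODE: run `false | true` in a subshell, enabling pipefail
# when MODE is "on", and print the pipeline's exit status.
pipeline_status() {
  (
    if [[ "$1" == "on" ]]; then
      set -o pipefail
    fi
    status=0
    false | true || status=$?
    echo "$status"
  )
}

pipeline_status off   # prints 0: only the last command's status counts
pipeline_status on    # prints 1: the failure of `false` is reported
```

Without pipefail, the failure at the start of the pipeline is silently hidden by the successful last command.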

9.2 Arrays

Bash supports arrays, which are useful when working with multiple files, services, or environments:

#!/usr/bin/env bash

services=(nginx redis postgres)

for service in "${services[@]}"; do
  echo "Checking $service"
done
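A few more array operations are worth knowing early. This short sketch reuses the same services array and shows appending, counting, indexing, and listing indices:

```shell
#!/usr/bin/env bash

services=(nginx redis postgres)

services+=(rabbitmq)              # append an element
echo "Count: ${#services[@]}"     # number of elements
echo "First: ${services[0]}"      # indexing starts at 0
echo "Indices: ${!services[@]}"   # list of valid indices
```

Always keep the quotes around "${services[@]}" when looping, so that elements containing spaces stay intact.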

9.3 Here documents

Heredocs are a neat way to generate files or multi-line output:

cat <<EOF > config.env
APP_ENV=production
APP_DEBUG=false
APP_PORT=8080
EOF

This is extremely useful for quick config generation in CI/CD or provisioning scripts.
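One heredoc detail that is easy to miss: quoting the delimiter turns off variable expansion. The sketch below uses two made-up functions, render_expanded and render_literal, to show the difference.

```shell
#!/usr/bin/env bash

APP_PORT=8080

# Unquoted delimiter: variables inside the heredoc are expanded.
render_expanded() {
  cat <<EOF
port=$APP_PORT
EOF
}

# Quoted delimiter: the body is written out literally.
render_literal() {
  cat <<'EOF'
port=$APP_PORT
EOF
}

render_expanded   # prints port=8080
render_literal    # prints port=$APP_PORT
```

Use the quoted form when the generated file should itself contain shell variables, for example when writing another script.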

9.4 Traps

You can use trap to clean up temporary files even if the script exits early:

#!/usr/bin/env bash

TMP_FILE=$(mktemp)
trap 'rm -f "$TMP_FILE"' EXIT

echo "temporary data" > "$TMP_FILE"
cat "$TMP_FILE"

This is one of those small details that makes scripts much more reliable.
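To convince yourself that the cleanup really runs on failure, here is a small sketch. The function run_with_cleanup is hypothetical: it creates a temporary file in a subshell, prints its path, and then exits with an error on purpose.

```shell
#!/usr/bin/env bash

# run_with_cleanup: create a temp file, register cleanup, then fail early.
run_with_cleanup() {
  (
    tmp_file=$(mktemp)
    trap 'rm -f "$tmp_file"' EXIT
    echo "$tmp_file"   # report the path so the caller can check it
    exit 1             # simulate an early failure
  )
}

path=$(run_with_cleanup) || true

if [[ ! -e "$path" ]]; then
  echo "temp file was cleaned up"
fi
```

Even though the subshell exits with status 1, the EXIT trap fires and the file is gone by the time the caller checks.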

9.5 Debugging mode

When debugging Bash, use:

bash -x your_script.sh

Or inside the script:

set -x

This prints commands as they execute, which is very helpful when tracking down quoting or branching issues.

10. Common Bash mistakes

Bash is powerful, but it is also easy to write fragile scripts if you rush. Here are some very common mistakes:

  • Unquoted variables: write "$file", not $file
  • Using for f in $(ls): this breaks with spaces and special characters
  • Writing var = value: spaces make it invalid in Bash
  • Ignoring exit codes: check failures when your script touches production systems
  • Parsing text carelessly: shell scripts often fail because of unexpected spaces, tabs, or newlines
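The for f in $(ls) problem is easy to reproduce. In this sketch, the two counting functions (names invented for the example) loop over the same directory, once through ls and once through a glob, and the directory contains one file with a space in its name.

```shell
#!/usr/bin/env bash

# Count loop iterations when iterating over the output of ls.
count_with_ls() {
  local n=0 f
  for f in $(ls "$1"); do
    n=$((n + 1))
  done
  echo "$n"
}

# Count loop iterations when iterating over a glob.
count_with_glob() {
  local n=0 f
  for f in "$1"/*; do
    n=$((n + 1))
  done
  echo "$n"
}

workdir=$(mktemp -d)
touch "$workdir/my report.txt" "$workdir/notes.txt"

count_with_ls "$workdir"     # prints 3: "my" and "report.txt" are split apart
count_with_glob "$workdir"   # prints 2
rm -rf "$workdir"
```

Note that the glob version would count 1 in an empty directory, because an unmatched pattern is left as a literal string by default.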

A great external resource for these issues is Bash Pitfalls.

11. Example 1: backup a directory

Here is a practical example that creates a timestamped backup archive:

#!/usr/bin/env bash
set -euo pipefail

SOURCE_DIR="${1:-}"
BACKUP_DIR="${2:-./backup}"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

if [[ -z "$SOURCE_DIR" ]]; then
  echo "Usage: $0 <source_dir> [backup_dir]"
  exit 1
fi

if [[ ! -d "$SOURCE_DIR" ]]; then
  echo "Directory does not exist: $SOURCE_DIR"
  exit 1
fi

mkdir -p "$BACKUP_DIR"
ARCHIVE_NAME="backup_${TIMESTAMP}.tar.gz"

tar -czf "$BACKUP_DIR/$ARCHIVE_NAME" "$SOURCE_DIR"

echo "Backup created at: $BACKUP_DIR/$ARCHIVE_NAME"

Run it like this:

./backup.sh /var/log ./artifacts

12. Example 2: check if a service is running

#!/usr/bin/env bash

service_name="$1"

if pgrep -x "$service_name" >/dev/null; then
  echo "$service_name is running"
else
  echo "$service_name is not running"
fi

Example:

./check-service.sh nginx

13. Example 3: batch rename files

#!/usr/bin/env bash

for file in *.txt; do
  # Skip the literal pattern when no .txt files exist
  [[ -e "$file" ]] || continue
  mv "$file" "old_$file"
done

This is a tiny script, but it demonstrates why Bash is so productive for file operations.

14. Example 4: a simple deployment helper

#!/usr/bin/env bash
set -euo pipefail

APP_DIR="/opt/myapp"

echo "Pulling latest code..."
git -C "$APP_DIR" pull

echo "Installing dependencies..."
npm --prefix "$APP_DIR" install

echo "Restarting service..."
systemctl restart myapp

echo "Deployment complete"

This is the kind of script many engineers write very early in their DevOps journey.

15. When should you use Bash and when should you switch to Python?

Bash is a great choice when you are:

  • gluing command-line tools together
  • working with files, directories, processes, and environment variables
  • writing short automation scripts for CI/CD, DevOps, or local tooling

However, if your logic becomes complex, if you need data structures beyond basic arrays, or if you need better testing and maintainability, Python is often the better long-term choice.

16. Final thoughts

Bash scripting is not just about putting commands into a .sh file. The real value comes from writing scripts that are safe, readable, and useful under real operational pressure. Start small, automate something annoying, and improve your scripts over time. That is how Bash becomes genuinely powerful.

Kubernetes in practice

Kubernetes, often shortened to K8s, is a container orchestration platform designed to run and manage applications across clusters of machines. It became popular because running one container is easy, but running many services reliably in production is much harder.

What Kubernetes Actually Solves

At a high level, Kubernetes helps teams deploy applications consistently, scale them, recover from failures, and expose them through stable networking. It gives a standard control model for distributed applications.

Important Core Objects

Pods

A pod is the smallest deployable unit in Kubernetes. It usually contains one application container plus any closely related helper containers.

Deployments

Deployments manage the desired number of pod replicas and support rolling updates.
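The desired-state idea behind a Deployment can be sketched as a tiny reconcile loop in shell. This is a toy model, not how the Kubernetes controllers are implemented, and the function name reconcile is an assumption for the example.

```shell
#!/usr/bin/env bash

# reconcile DESIRED CURRENT: print the actions needed to move from CURRENT
# running replicas to DESIRED replicas.
reconcile() {
  local desired="$1" current="$2"
  while (( current < desired )); do
    current=$((current + 1))
    echo "start replica $current"
  done
  while (( current > desired )); do
    echo "stop replica $current"
    current=$((current - 1))
  done
}

reconcile 2 0   # two replicas wanted, none running
```

When the current count already matches the desired count, the function prints nothing, which mirrors a controller that has no work to do.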

Services

Services provide stable access to a group of pods even when pod IPs change.

ConfigMaps and Secrets

These help separate configuration and sensitive values from the container image.

Ingress

Ingress provides HTTP routing so external users can reach services inside the cluster.

A Small Real-World Example

Imagine a web application with three parts:

  • a frontend service,
  • a backend API,
  • and a worker processing background jobs.

With Kubernetes, you can package each one as a container, run them as deployments, expose the frontend and API with services, and scale the worker independently when background load increases.

Example Deployment Manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      containers:
        - name: api
          image: myorg/demo-api:1.0.0
          ports:
            - containerPort: 8080

Why Teams Like Kubernetes

  • It standardizes deployment patterns.
  • It supports rolling updates and self-healing.
  • It works well with cloud-native tooling.
  • It helps large teams manage many services consistently.

Why Kubernetes Can Be Painful

  • It adds operational complexity.
  • Debugging networking and configuration issues can be time-consuming.
  • Not every small project needs cluster-level orchestration.

Kubernetes is powerful, but it is not a badge of maturity by itself. The right question is whether its operational model matches your scale and team capability.

Final Thoughts

Kubernetes is best understood as an operational platform, not just a trendy technology. If your system already needs scaling, resilience, and service coordination, Kubernetes can be a strong solution. If not, simpler deployment models may be better.

Infrastructure as code basics

Infrastructure as Code, usually shortened to IaC, is the practice of managing infrastructure through code instead of manual clicks in a cloud console. Instead of creating servers, networks, IAM roles, and storage by hand, teams define them in version-controlled files and apply them consistently across environments.

Why Infrastructure as Code matters

Manual infrastructure changes are slow, difficult to review, and easy to forget. IaC solves this by making infrastructure reproducible. A team can create the same environment for development, staging, and production with far less drift.

  • Infrastructure becomes reviewable through pull requests
  • Changes can be repeated safely in multiple environments
  • Disaster recovery becomes faster because environments can be rebuilt
  • Documentation improves because the code itself describes the system

Popular Infrastructure as Code tools

Different tools solve different layers of the problem:

  • Terraform for cloud resources such as VPCs, EC2, IAM, S3, and Kubernetes clusters
  • Ansible for configuration and software setup on existing hosts
  • CloudFormation if you are heavily invested in AWS-native tooling
  • Helm for packaging and deploying applications on Kubernetes

A simple Terraform example

provider "aws" {
  region = "eu-central-1"
}

resource "aws_s3_bucket" "logs" {
  bucket = "my-team-logs-example"
}

Even this tiny example shows the core idea: infrastructure is declared, reviewed, and applied in a consistent way.

Best practices

  • Keep state management secure and backed up
  • Separate environments clearly
  • Review all IaC changes via pull requests
  • Use modules or reusable components to avoid duplication
  • Never hardcode secrets in IaC files
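One common way to keep secrets out of IaC files is to pass them through the environment. Terraform reads environment variables named TF_VAR_<name> as values for input variables. The sketch below assumes a Terraform variable called db_password and a source environment variable called DB_PASSWORD, both of which are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash

# export_db_password: copy DB_PASSWORD from the environment into
# TF_VAR_db_password, and fail with a message if it is not set.
export_db_password() {
  if [[ -z "${DB_PASSWORD:-}" ]]; then
    echo "DB_PASSWORD must be set in the environment" >&2
    return 1
  fi
  export TF_VAR_db_password="$DB_PASSWORD"
}

# Stop early instead of running with an empty secret.
export_db_password || echo "refusing to continue without a secret"
```

In practice the value of DB_PASSWORD would be injected by a CI system or a secret manager, so it never appears in version control.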

Final thoughts

Infrastructure as Code is not only about automation. It is about making infrastructure predictable, testable, and maintainable. Teams that adopt IaC well usually move faster and recover from mistakes more confidently.

High availability with corosync and pacemaker

Once upon a time, I needed to set up high availability for my servers. I had two servers: a main server, let’s say A (public IP 1.0.0.1, private IP 2.0.0.1), and a backup server, let’s say B (public IP 1.0.0.2, private IP 2.0.0.2). I also had a public IP (1.0.0.3) that is used as the address of my APIs. The two servers are in the same private network.

Goal

Servers A and B run in an active/passive configuration. Server A normally holds the public IP (1.0.0.3). Whenever server A goes down, server B takes over this public IP and becomes the main server.

Solution

After some research, I decided to use Corosync and Pacemaker to set up high availability for my servers.

Corosync is an open source program that provides cluster membership and messaging capabilities, often referred to as the messaging layer, to client servers.

Pacemaker is an open source cluster resource manager (CRM), a system that coordinates resources and services that are managed and made highly available by a cluster. In essence, Corosync enables servers to communicate as a cluster, while Pacemaker provides the ability to control how the cluster behaves.

Synchronizing time between servers

Whenever you have multiple servers communicating with each other, especially with clustering software, it is important to ensure their clocks are synchronized. Let’s use NTP (Network Time Protocol) to synchronize our servers. On both servers, run these commands, selecting the same timezone on both:

Configure Firewall

Corosync uses UDP transport on ports 5404, 5405 and 5406. If you are running a firewall, ensure that communication on those ports is allowed between the servers.

If you use ufw, you could allow traffic on these ports with these commands on both servers:

Or if you use iptables, you could allow traffic on these ports and eth1 (the private network interface) with these commands:

Install Corosync and Pacemaker

Corosync is a dependency of Pacemaker, so we can install both of them using one command. Run this command on both servers:

Configure Authorization Key for two servers

Corosync must be configured so that our servers can communicate as a cluster.

On server A (main server), run these commands:

This will generate a 128-byte cluster authorization key and write it to /etc/corosync/authkey on server A. Now we need to run this command on server A to copy the authkey to server B (the backup server):

Then, on server B, run these commands:

Configure Corosync cluster

On both servers, open corosync.conf and write the configuration below:

You can read through the configuration and try to understand it. If you can’t, don’t worry about it :). There are only a few things you need to remember:

  • server_A_private_IP_address: Private IP of server A
  • server_B_private_IP_address: Private IP of server B
  • private_binding_IP_address: The private IP that both server A and server B bind to. To find this address, run ifconfig on server A (or server B) and look at the private interface (usually eth1). You will see something like the output below, where the IP 2.0.0.255 is the value for private_binding_IP_address. Because the two servers are in the same private network, this value must be the same on both servers:

Enable and run Corosync

Next, we need to configure Corosync to allow the Pacemaker service. On both servers, create the pcmk file in Corosync’s service directory with the commands below:

Then add this configuration to the pcmk file:

Finally, open the file /etc/default/corosync and add this line (if there is already a line START=no, change it to YES as below):

Now, start Corosync on both servers:

Let’s check that everything is working with this command:

This should output something like this (if not, wait 1 minute and run the command again):

Enable and Start Pacemaker

Pacemaker, which depends on the messaging capabilities of Corosync, is now ready to be started. On both servers, enable Pacemaker to start on system boot with this command:

Because Pacemaker needs to start after Corosync, we set Pacemaker’s start priority to 20, which is higher than Corosync’s (19 by default).

Now let’s start Pacemaker:

To interact with Pacemaker, we will use the crm utility. Check Pacemaker’s status:

This should output something like this (if not, wait for 30 seconds and run the command again):

Configure Pacemaker and add our Public IP as a Resource

First we need to configure some properties. We can run Pacemaker (crm) commands from either server, as it automatically synchronizes all cluster-related changes across all member nodes. Let’s run these commands on server A:

Now we will add our public IP (1.0.0.3) as a Resource with this command:

NOTE: The setting resource-stickiness="100" means that whenever a server takes over the resource (our public IP, 1.0.0.3) because the other server is down, it keeps the resource even after the other server comes back online.

Check Pacemaker’s status again with the command ‘sudo crm status’ and you will see:

So we have one resource running, and the primary node (server A) holds it. This means server A is handling our public IP (1.0.0.3). To double-check, run this command:

You should see:

Testing: simulate server A going down

Now we simulate the situation where server A is down. Server B should take over the public IP (1.0.0.3) in this case.

Of course you can shut down server A, but if you don’t want to shut it down, you can put the primary node into standby with this command:

Let’s open server B and check Pacemaker’s status with the command ‘sudo crm status’. You should see:

Check server B’s IP with:

You should see server B is now taking our public IP:

Now, to bring server A online again:

Because we set resource-stickiness="100", we need to put the secondary node into standby and then bring it online again so that the primary node takes our public IP back, as in the default setup.
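
The effect of stickiness can be illustrated with a toy shell model. It is only a sketch of the behaviour described above and has nothing to do with how Pacemaker actually scores resource placement. The function name next_owner and the yes/no flags are made up for the example.

```shell
#!/usr/bin/env bash

# next_owner CURRENT A_ONLINE B_ONLINE: decide which node should hold the
# public IP. With stickiness, the current owner keeps it while it is online.
next_owner() {
  local current="$1" a_online="$2" b_online="$3"
  if [[ "$current" == "A" && "$a_online" == "yes" ]]; then
    echo "A"
  elif [[ "$current" == "B" && "$b_online" == "yes" ]]; then
    echo "B"
  elif [[ "$a_online" == "yes" ]]; then
    echo "A"      # current owner is down: fail over to A
  elif [[ "$b_online" == "yes" ]]; then
    echo "B"      # current owner is down: fail over to B
  else
    echo "none"
  fi
}

owner="A"
owner=$(next_owner "$owner" no yes)    # server A goes down
echo "$owner"                          # prints B
owner=$(next_owner "$owner" yes yes)   # server A comes back online
echo "$owner"                          # prints B: the IP stays where it is
```

The IP only returns to server A once server B stops being eligible, which matches the standby and online steps described above.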