DevOps with Kubernetes
University of Helsinki course on Kubernetes
Chapter 2 - Kubernetes Basics
First deploy
Learning goals:
- Create and run a Kubernetes cluster locally with k3d
- Deploy a simple application to Kubernetes
Terminology:
- Microservices: Small, autonomous services that work together.
- Monolith: A single, self-contained service that implements all of the application's functionality.
- Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications. It groups containers into logical units for easy management and discovery.
Microservices:
Top three reasons for using them:
- Zero-downtime independent deployability
- Isolation of data and processing around that data
- They reflect the organizational structure
Basic Kubernetes concepts:
- POD == the smallest deployable unit in the Kubernetes object model. The pod sees the container(s) it contains; Kubernetes only sees the pod
- NODE == a machine (real or virtual) that runs pods
- CLUSTER == a group of nodes overseen by a MASTER NODE (the control plane)
- DEPLOYMENT == a .yaml declaration of the desired state for an application's pods => Kubernetes then selects the machines and propagates the containers in each pod
K3s is a lightweight Kubernetes distribution developed by Rancher Labs. It’s designed to be easy to install, resource-efficient, and suitable for local development, edge, and IoT environments. It removes some non-essential features and dependencies to reduce complexity.
K3d is a tool that runs K3s clusters in Docker containers. It makes it easy to spin up and manage local Kubernetes clusters for testing and development.
Differences from full Kubernetes:
- K3s is much smaller and faster to start, with a reduced binary size.
- It omits some advanced features (like in-tree cloud providers, some storage drivers).
- K3s is ideal for local development, CI, and edge use cases, while full Kubernetes is used for production-grade, large-scale deployments.
- K3d lets you run K3s clusters inside Docker containers, making it even easier to experiment locally.
For most learning and development scenarios, k3s/k3d is sufficient and much simpler to use than a full Kubernetes distribution. If you use k3d, you don't need to install k3s separately.
Containers in k3d Cluster
When you run the command k3d cluster create -a 2, the following containers are created as part of the Kubernetes cluster:
- Server container: Acts as the control plane, managing the cluster state and scheduling workloads.
- Agent containers: Two worker nodes that run the actual workloads (pods).
- Load balancer container: Proxies traffic to the server container, ensuring external requests are routed correctly.
- Tools container: Used internally by k3d for cluster management tasks.
These containers collectively form the Kubernetes cluster managed by k3d.
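If you want to see these containers yourself, you can list them with Docker (assuming the default cluster name k3s-default):

```bash
# List the containers that make up the k3d cluster
docker ps --filter "name=k3d-k3s-default"
```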
If we run the command k3d kubeconfig get k3s-default then we can see the auto-generated kubeconfig file, located at ~/.kube/config.
Some more basic k3d commands: k3d cluster start, k3d cluster stop, k3d cluster delete.
Common k3d Troubleshooting
Connection Refused Error:
When running kubectl get nodes, you might encounter:
```
The connection to the server 0.0.0.0:63096 was refused - did you specify the right host or port?
```
This typically means your k3d cluster is stopped. Check cluster status:
```bash
k3d cluster list
```
If you see 0/1 servers and 0/2 agents, the cluster is stopped. Start it with:
```bash
k3d cluster start k3s-default
```
After starting, verify the cluster is running:
- Servers should show 1/1
- Agents should show 2/2
- kubectl get nodes should now work successfully
kubectl and its role in k3d and k3s
kubectl is the command-line tool used to interact with Kubernetes clusters. It works seamlessly with k3d and k3s as follows:
- Cluster creation: k3d creates a Kubernetes cluster by running k3s inside Docker containers. It also generates a kubeconfig file that contains the connection details for the cluster.
- Configuration: The kubeconfig file is typically located at ~/.kube/config. kubectl uses this file to connect to the Kubernetes API server running in the k3s server container.
- Interaction: You use kubectl commands (e.g., kubectl get pods, kubectl apply -f deployment.yaml) to manage Kubernetes resources. kubectl communicates with the Kubernetes API server, which processes the commands and manages the cluster accordingly.
In summary, kubectl is the tool you use to interact with the Kubernetes cluster created by k3d and powered by k3s. It relies on the kubeconfig file for connection details and authentication.
kubectl communicates with the Kubernetes API server running inside the k3s server container. k3d is responsible for setting up and managing the infrastructure (containers) that run the k3s cluster, but it does not process Kubernetes commands itself.
In this setup:
- kubectl sends commands to the Kubernetes API server.
- The API server processes these commands and manages the cluster resources.
- k3d ensures the k3s cluster infrastructure is running smoothly, providing the environment for the Kubernetes cluster.
A useful command is kubectl explain <resource>, e.g., kubectl explain pod. Another good command to know is kubectl get <resource>, e.g., kubectl get pods.
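A few examples of how these might be used (resource names here are just for illustration):

```bash
kubectl explain pod                   # show the Pod resource and its top-level fields
kubectl explain pod.spec.containers   # drill down into a nested field
kubectl get pods                      # list pods in the current namespace
kubectl get pods -o wide              # include node and IP information
```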
Ex. 1.1 - First Application Deploy
Goal: Create a simple application that outputs a timestamp and UUID every 5 seconds, containerize it, and deploy it to Kubernetes.
- Create simple app: Generates a UUID on startup, outputs timestamp + UUID every 5 seconds (a sketch of the app follows at the end of this exercise)
- Containerize the app:
  - docker build -t your-dockerhub-username/ex-1-1:latest .
  - docker login
  - docker push your-dockerhub-username/ex-1-1:latest
Kubernetes Deployment
- Created cluster: k3d cluster create k3s-default -a 2
- Deployed app: kubectl create deployment log-output --image=aljazkovac/kubernetes-1-1
- Initial issue: The wrong image name caused ImagePullBackOff - a lesson learned about exact image naming
Testing and Scaling
- Scaling experiment: kubectl scale deployment log-output --replicas=3
- Key insight: Each pod is independent with its own UUID - they don't share log files
- Multi-pod logging: kubectl logs -f -l app=log-output --prefix=true shows which pod generated each log line
Essential Commands Learned
- kubectl logs -f deployment/log-output - stream logs from the deployment
- kubectl logs -f -l app=log-output --prefix=true - stream logs with pod names
- kubectl scale deployment <name> --replicas=N - scale the application
- kubectl get pods - check pod status
Result:
✅ Successfully deployed and scaled a containerized application, understanding pod independence and basic Kubernetes orchestration.
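For reference, a minimal sketch of such a log-output app in Node.js (illustrative only; the actual code is in the release linked below):

```javascript
const crypto = require("crypto");

// Generated once on startup; stays the same for the lifetime of the process
const appId = crypto.randomUUID();

// Output a timestamp and the UUID every 5 seconds
setInterval(() => {
  console.log(`${new Date().toISOString()}: ${appId}`);
}, 5000);
```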
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.1/log_output
Exercise 1.2: TODO Application
Objective: Create a web server that outputs “Server started in port NNNN” when started, uses PORT environment variable, and deploy to Kubernetes.
Application Development
- Created Express.js server: Simple web server with configurable port
- PORT environment variable: const port = process.env.PORT || 3000;
- Startup message: Logs "Server started in port 3000" as required
Docker Containerization
- Dockerfile: Node.js 24-alpine base with npm install and app copy (a Dockerfile sketch follows below)
- Local build: docker build -t todo-app .
- Docker Hub push: Tagged and pushed as aljazkovac/todo-app:latest
Kubernetes Deployment
- Reused existing cluster: Used the same k3s-default cluster from Exercise 1.1
- Deployed app: kubectl create deployment todo-app --image=aljazkovac/todo-app:latest
- No networking yet: As expected, external access is not configured (covered in future exercises)
Essential Commands Learned
- docker tag <local-image> <dockerhub-username>/<image>:latest - tag for the registry
- docker push <dockerhub-username>/<image>:latest - push to Docker Hub
- kubectl create deployment <name> --image=<image> - deploy from the registry
- kubectl logs deployment/<name> - check application logs
Result: ✅ Successfully created and deployed a simple web server to Kubernetes, confirming proper startup message and environment variable usage.
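Based on the description above, the Dockerfile is presumably something along these lines (a sketch, not necessarily the exact file in the repo):

```dockerfile
FROM node:24-alpine

WORKDIR /app

# Install dependencies first to take advantage of Docker layer caching
COPY package*.json ./
RUN npm install

# Copy the application code
COPY . .

# Entry point file name is an assumption
CMD ["node", "app.js"]
```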
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.2/todo_app
Exercise 1.3: Declarative Deployment Manifests
Objective: Move the “Log output” app to a declarative Kubernetes manifest and verify it runs by restarting and following logs.
- Manifests folder: Created devops-with-kubernetes/log_output/manifests/ and added deployment.yaml.
- Deployment spec: An apps/v1 Deployment named log-output, label app=log-output, 1 replica, image aljazkovac/kubernetes-1-1:latest.
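Based on that description, the manifest presumably looks roughly like this (a sketch; the exact file is in the repo):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: log-output
spec:
  replicas: 1
  selector:
    matchLabels:
      app: log-output
  template:
    metadata:
      labels:
        app: log-output
    spec:
      containers:
        - name: log-output
          image: aljazkovac/kubernetes-1-1:latest
```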
Apply & verify:
```bash
# Apply the declarative deployment
kubectl apply -f devops-with-kubernetes/log_output/manifests/deployment.yaml

# Wait for rollout to complete
kubectl rollout status deployment/log-output

# Inspect pods
kubectl get pods -l app=log-output

# Follow logs (shows timestamp + UUID)
kubectl logs -f -l app=log-output --prefix=true
```
Restart test:
```bash
# Trigger a rolling restart and watch logs
kubectl rollout restart deployment/log-output
kubectl rollout status deployment/log-output
kubectl logs -f -l app=log-output --prefix=true
```
Result:
✅ Deployment applied successfully; pods emit periodic timestamp + UUID as before using the declarative manifest.
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.3/log_output
Exercise 1.4: Declarative Deployment for TODO app
Objective: Create a deployment.yaml for the course project you started in Exercise 1.2 (todo-app). You won’t have access to the port yet — that comes later.
- Manifests folder: Created devops-with-kubernetes/todo_app/manifests/ and added deployment.yaml.
- Deployment spec: An apps/v1 Deployment named todo-app, label app=todo-app, 1 replica, image aljazkovac/todo-app:latest, with resource requests/limits.
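The resource requests/limits part of the container spec might look like this (values illustrative):

```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi
```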
Apply & verify (Deployment):
```bash
# Apply the declarative deployment
kubectl apply -f devops-with-kubernetes/todo_app/manifests/deployment.yaml

# Wait for rollout to complete
kubectl rollout status deployment/todo-app

# Inspect pods
kubectl get pods -l app=todo-app

# Check logs to verify the startup message (no external port yet)
kubectl logs -l app=todo-app
```
Result:
✅ The todo-app runs via a declarative Deployment, and logs confirm the server starts with the given port. External access will be added in a later exercise.
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.4/todo_app
Introduction to Debugging
Some useful commands:
- kubectl describe
- kubectl logs
- kubectl delete
- kubectl get events
Using Lens, the Kubernetes IDE, can also make for a smoother debugging experience.
Introduction to Networking
The kubectl port-forward command is used to forward a local port to a pod. It is not meant for production use.
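For example, you could forward a local port to a pod managed by a deployment like this (names and ports illustrative):

```bash
# Forward local port 3000 to port 8080 of a pod in the todo-app deployment
kubectl port-forward deployment/todo-app 3000:8080
```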
Exercise 1.5: Port forwarding for the TODO app
Objective: Return a simple HTML website and use port-forwarding to reach it from your local machine
- Create a simple HTML website
- Build a new Docker image and push: docker build -t aljazkovac/todo-app:latest . && docker push aljazkovac/todo-app:latest
- Apply the new deployment: kubectl apply -f todo_app/manifests/deployment.yaml
- Restart the deployment: kubectl rollout restart deployment/todo-app
- Port forward: kubectl port-forward todo-app-66579f8fd6-j72f8 3000:8080 (<local port>:<pod port>)
- Check at localhost:3000: Go to localhost:3000 and make sure you see the HTML website.
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.5/todo_app
Exercise 1.6: Use a NodePort service for the TODO app
Objective: Use a NodePort service to reach your TODO-app from your local machine
- Prepare a service.yaml file
- Delete the existing Kubernetes cluster: k3d cluster delete k3s-default
- Create a new Kubernetes cluster and open ports on the Docker container and the Kubernetes node: k3d cluster create k3s-default --port "3000:30080@agent:0" --agents 2
- Apply the deployment: kubectl apply -f manifests/deployment.yaml
- Apply the service: kubectl apply -f manifests/service.yaml
- Check at localhost:3000: Go to localhost:3000 and make sure you see the HTML website.
Here is the complete chain of port-forwarding:
```
Browser (localhost:3000)
  ↓ (Docker port mapping)
Docker container: k3d-k3s-default-agent-0, port 30080
  ↓ (this container IS the Kubernetes node)
Kubernetes node, port 30080
  ↓ (NodePort service routing)
Service nodePort: 30080 → targetPort: 8080
  ↓
TODO app listening on port 8080
```
Important: It doesn't matter which node has the port mapping - the Kubernetes NodePort service handles the cross-node routing. In this case, we opened the port on node agent-0 while the pod was running on agent-1, but we could still access it at localhost:3000.
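For reference, a NodePort service matching that chain might look like this (a sketch; the cluster-internal port value and names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: todo-app-svc
spec:
  type: NodePort
  selector:
    app: todo-app
  ports:
    - port: 2345         # cluster-internal port (illustrative)
      targetPort: 8080   # container port
      nodePort: 30080    # port opened on every node
```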
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.6/todo_app
Exercise 1.7: Add HTTP Endpoint to Log Output App
Objective: “Log output” application currently outputs a timestamp and a random string (that it creates on startup) to the logs. Add an endpoint to request the current status (timestamp and the random string) and an Ingress so that you can access it with a browser.
Solution:
I extended the log_output/app.js to include an HTTP server with a /status endpoint that returns the current timestamp and the application’s UUID as JSON.
The key changes were:
- Added an HTTP server using Node.js' built-in http module
- Created a /status endpoint that returns {timestamp, appId}
- Kept the existing 5-second logging functionality
- The random string (UUID) is stored in memory for the application lifetime
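A minimal version of such an endpoint using the built-in http module might look like this (illustrative, not the exact repo code):

```javascript
const http = require("http");
const crypto = require("crypto");

// Generated once; lives in memory for the lifetime of the process
const appId = crypto.randomUUID();

const server = http.createServer((req, res) => {
  if (req.url === "/status") {
    const timestamp = new Date().toISOString();
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ timestamp, appId }));
  } else {
    res.writeHead(404);
    res.end();
  }
});

server.listen(3000);
```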
I also needed to update the Kubernetes manifests:
- Updated deployment.yaml: Added containerPort: 3000 to expose the HTTP server port
- Created service.yaml: A ClusterIP service exposing port 2345, targeting container port 3000
- Created ingress.yaml: An Ingress resource to route HTTP traffic from the browser to the service
Networking and Port Configuration:
The networking flow works as follows:
- k3d Port Mapping: k3d maps host port 3000 to the cluster’s LoadBalancer port 80
- Ingress (Traefik): Receives requests on port 80 and routes them based on ingress rules
- Service: Exposes the deployment on cluster port 2345 and forwards to container targetPort 3000
- Container: The Node.js app listens on port 3000 inside the container
The complete flow: localhost:3000 → Traefik LoadBalancer:80 → log-output-svc:2345 → container:3000
This differs from direct port forwarding (kubectl port-forward) because:
- Ingress routing: Uses HTTP path-based routing instead of direct port mapping
- Service abstraction: The Service provides load balancing and service discovery
- Production-ready: Ingress is designed for production use, while port-forward is for development
After building and pushing the updated Docker image (aljazkovac/log-output:latest) and applying the manifests, the endpoint is accessible at:
```bash
curl http://localhost:3000/status
# Returns: {"timestamp":"2025-08-21 19:47:06","appId":"f67b6cb3-9982-40d9-b50f-0eb85059bbae"}
```
Key Insight: The random string (UUID) is stored in memory and persists for the lifetime of the application. Each restart generates a new UUID, but it remains constant while the container is running.
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.7/log_output
Exercise 1.8: Use Ingress for the TODO app
Objective: Use an Ingress to reach your TODO app from your local machine
- Delete the existing cluster: k3d cluster delete k3s-default
- Create a new cluster with the port mapping to port 80 (where the Ingress listens): k3d cluster create k3s-default --port "3000:80@loadbalancer" --agents 2
- Create a service file
- Create an ingress file: make sure you reference your service correctly
- Apply all manifests: kubectl apply -f manifests/
- Check at http://localhost:3000
The traffic flow: localhost:3000 → k3d loadbalancer:80 → Ingress → Service(2345) → Pod(8080)
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.8/todo_app
Exercise 1.9: Ping-Pong Application with Shared Ingress
Objective: Develop a second application that responds with “pong X” to GET requests and increases a counter. Create a deployment for it and have it share the same Ingress with the “Log output” application by routing requests directed to ‘/pingpong’ to it.
- Create the ping-pong application: An Express.js app that handles the /pingpong endpoint directly
- Build and push the Docker image: docker build -t aljazkovac/pingpong:latest ./pingpong && docker push aljazkovac/pingpong:latest
- Create deployment and service manifests: Deploy with resource limits and expose on port 2346
- Update the existing Ingress: Add a new path rule for /pingpong to route to pingpong-svc
- Apply the manifests: kubectl apply -f pingpong/manifests/
- Test both endpoints:
  - curl http://localhost:3000/status - returns the log-output status
  - curl http://localhost:3000/pingpong - returns "pong 0", "pong 1", etc.
The traffic flow with shared Ingress:
```
localhost:3000 → k3d loadbalancer:80 → Ingress
  ├─ /status → log-output-svc:2345 → Pod:3000
  └─ /pingpong → pingpong-svc:2346 → Pod:9000
```
Key implementation details:
- The ping-pong app listens on /pingpong directly (not /), avoiding the need for path rewriting
- Both applications share the same Ingress resource with path-based routing
- The counter is stored in memory and may reset on pod restart
- Port 9000 (where the ping-pong container listens) is not directly accessible from outside the cluster - you must go through the Ingress at localhost:3000/pingpong. This is why attempting to access localhost:9000 directly doesn't work.
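The shared Ingress presumably contains two path rules along these lines (a sketch based on the flow above; the metadata name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shared-ingress       # name illustrative
spec:
  rules:
    - http:
        paths:
          - path: /status
            pathType: Prefix
            backend:
              service:
                name: log-output-svc
                port:
                  number: 2345
          - path: /pingpong
            pathType: Prefix
            backend:
              service:
                name: pingpong-svc
                port:
                  number: 2346
```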
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.9/pingpong
Introduction to Storage
There are two really hard things in Kubernetes: networking and storage.
There are several types of storage in Kubernetes:
- emptyDir volume: shared filesystem within a pod => lifecycle tied to the pod => not to be used for backing up a database, but can be used for cache.
- persistent volume: a local persistent volume is tied to a specific node, so it should not be used in production
Exercise 1.10: Multi-Container Pod with Shared Storage
Objective: Split the log-output application into two applications: one that writes timestamped logs to a file every 5 seconds, and another that reads from that file and serves the content via HTTP endpoint. Both applications should run in the same pod and share data through a volume.
- Restructure the application: Split log_output/ into log-writer/ and log-reader/ subdirectories
- Create the log-writer app: Writes timestamp: appId to /shared/logs.txt every 5 seconds, serves status on port 3001
- Create the log-reader app: Reads from /shared/logs.txt and serves aggregated data via the /status endpoint on port 3000
- Build and push images: docker build and docker push both applications
- Update the deployment: A multi-container pod with an emptyDir volume mounted at /shared in both containers
- Deploy and test: kubectl apply -f log_output/manifests/
The traffic flow with multi-container pod:
```
localhost:3000/status → Ingress → Service:2345 → log-reader:3000 → /shared/logs.txt
                                                                         ↑
                                   log-writer:3001 → /shared/logs.txt (writes every 5 seconds)
```
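The multi-container deployment presumably looks roughly like this (a sketch based on the details below; container image names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: log-output
spec:
  replicas: 1
  selector:
    matchLabels:
      app: log-output
  template:
    metadata:
      labels:
        app: log-output
    spec:
      volumes:
        - name: shared-logs
          emptyDir: {}            # lifecycle tied to the pod
      containers:
        - name: log-writer
          image: aljazkovac/log-writer:latest   # image name illustrative
          volumeMounts:
            - name: shared-logs
              mountPath: /shared
        - name: log-reader
          image: aljazkovac/log-reader:latest   # image name illustrative
          volumeMounts:
            - name: shared-logs
              mountPath: /shared
```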
Key implementation details:
- **emptyDir volume**: Shared storage mounted at `/shared` in both containers, lifecycle tied to the pod
- **File-based communication**: log-writer appends to `/shared/logs.txt`, log-reader reads entire file and counts lines
- **Port separation**: log-writer (3001) and log-reader (3000) use different ports to avoid conflicts
- **Service routing**: Only log-reader exposed externally; log-writer's HTTP server accessible only within pod
- **Real-time updates**: Each request to `/status` shows current file state with increasing `totalLogs` count
The `totalLogs` count increases over time as the writer continuously appends new entries. The log-reader serves the most recent log entry and total count from the shared file.
**Release:**
Link to the GitHub release for this exercise: `https://github.com/aljazkovac/devops-with-kubernetes/tree/2.1`
---
#### Scaling Deployments
The `kubectl scale` command allows you to dynamically adjust the number of replicas (pods) for a deployment. This is essential for managing resource consumption and handling varying workloads.
**Scale up a deployment:**
```bash
kubectl scale deployment <deployment-name> --replicas=<number>
```

Scale down to zero (stop all pods):

```bash
kubectl scale deployment <deployment-name> --replicas=0
```

Scale back up:

```bash
kubectl scale deployment <deployment-name> --replicas=1
```
Practical Examples:
```bash
# Scale the log-output deployment to 3 replicas
kubectl scale deployment log-output --replicas=3

# Scale down the todo-app to save resources
kubectl scale deployment todo-app --replicas=0

# Scale back up when needed
kubectl scale deployment todo-app --replicas=1

# Check current replica status
kubectl get deployment <deployment-name>
```
Use Cases:
- Resource Management: Scale to zero when testing applications to free up CPU and memory
- Load Handling: Scale up replicas to handle increased traffic
- Development: Quickly stop/start applications during development cycles
- Cost Optimization: Scale down non-production environments when not in use
The scaling approach is much more efficient than deleting and recreating deployments, as it maintains your configuration while allowing precise control over resource usage.
Exercise 1.11: Shared Persistent Volume Storage
Objective: Enable data sharing between “Ping-pong” and “Log output” applications using persistent volumes. Save the number of requests to the ping-pong application into a file in the shared volume and display it alongside the timestamp and random string when accessing the log output application.
Expected final output:
```
2020-03-30T12:15:17.705Z: 8523ecb1-c716-4cb6-a044-b9e83bb98e43.
Ping / Pongs: 3
```
Implementation Summary: This exercise demonstrates persistent data sharing between two separate Kubernetes deployments using PersistentVolumes and PersistentVolumeClaims. The key challenge was enabling the ping-pong application to save its request counter to shared storage that the log-output application could read and display.
Step-by-Step Process:
- Create cluster-admin storage infrastructure:
```bash
docker exec k3d-k3s-default-agent-0 mkdir -p /tmp/kube
```
- Modify the Ping-Pong and Log-Reader applications
- Update Kubernetes Deployments
- Rebuild and Deploy Updated Images
- Test Persistent Storage
How the Applications Work Together:
The system consists of three main components working together through shared persistent storage:
Ping-Pong Application: Runs as a separate deployment, handles /pingpong requests by incrementing an in-memory counter, returning “pong X” responses, and persistently saving the counter value to /shared/pingpong-counter.txt in the format “Ping / Pongs: X”.
Log-Writer Component: Continues its original function of writing timestamped UUID entries to /shared/logs.txt every 5 seconds, but now uses writeFile instead of appendFile to maintain only the latest entry.
Log-Reader Component: Enhanced to read from both shared files - combines the latest log entry from logs.txt with the current ping counter from pingpong-counter.txt, serving both pieces of information through the /status endpoint.
Data Flow: When a user hits /pingpong, the ping-pong app increments its counter and saves it to shared storage. When accessing /status, the log-reader reads both the latest timestamp/UUID and the current ping count from shared files, presenting them as a unified response.
Deployment Configuration and Node Scheduling:
Node Affinity Solution: The PersistentVolume includes nodeAffinity constraints that force any pods using this volume to schedule on k3d-k3s-default-agent-0. This ensures both applications can access the same hostPath directory. Since hostPath storage is node-local (each node has its own /tmp/kube directory), pods on different nodes would see different file systems. The nodeAffinity constraint solves this by ensuring co-location.
ReadWriteOnce vs ReadWriteMany: We use ReadWriteOnce access mode, which allows multiple pods on the same node to share the volume. This works because our nodeAffinity ensures both pods run on the same node.
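A PersistentVolume along those lines might be declared like this (a sketch; the name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-pv              # name illustrative
spec:
  storageClassName: manual
  capacity:
    storage: 100Mi             # size illustrative
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/kube
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k3d-k3s-default-agent-0
```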
Container Inspection and Debugging:
You can inspect the shared volume contents from either pod using kubectl exec commands to list directory contents and view file contents. This helps verify that data is being written and read correctly.
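For example (deployment, container, and file names follow the description above but are still illustrative):

```bash
# List the shared directory from one of the pods
kubectl exec deploy/log-output -c log-reader -- ls -l /shared

# Print the current ping-pong counter file
kubectl exec deploy/log-output -c log-reader -- cat /shared/pingpong-counter.txt
```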
Key Kubernetes Concepts Demonstrated:
- PersistentVolume vs emptyDir: Unlike emptyDir volumes, which are tied to the pod lifecycle, PersistentVolumes provide data persistence that survives pod restarts and rescheduling.
- Storage Classes and Manual Provisioning: The manual storage class indicates that storage is manually provisioned rather than dynamically allocated by a storage controller.
- Cross-Application Data Sharing: This exercise demonstrates how separate deployments can share data through persistent volumes, enabling microservices to communicate via shared file systems.
- Node Affinity for Storage Locality: When using node-local storage like hostPath, node affinity constraints ensure pods can access the same underlying storage.
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.11
Exercise 1.12: Random Image from Lorem Picsum
Objective: Add a random picture from Lorem Picsum to the TODO app that refreshes every 10 minutes and is cached in a persistent volume to avoid repeated API calls.
Requirements:
- Display a random image from
https://picsum.photos/1200in the project - Cache the image for 10 minutes
- After 10 minutes, serve the cached image once more, then fetch a new image on the next request
- Store images in a persistent volume so they survive container crashes
Implementation Summary: This exercise focused on integrating external API calls with persistent storage and implementing smart caching logic. The main challenge was ensuring the image fetching logic executed properly within the Express.js middleware stack.
Key Technical Issues and Solutions:
Express.js Middleware Order Problem: The initial implementation placed app.use(express.static()) before the route handlers, causing Express to serve index.html directly from the public directory without executing the image fetching logic in the “/” route handler.
Solution: Moved the express.static() middleware after the route handlers. This ensures that custom route handlers (like “/” for image fetching) execute first, and static file serving only happens if no routes match.
Caching Logic Implementation: The application implements a three-phase caching strategy: images fresh for less than 10 minutes are served from cache, expired images are served once more from cache while marking them as “served after expiry”, and subsequent requests trigger a new API fetch.
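The decision logic might look roughly like this (a sketch, assuming metadata with fetchedAt and servedAfterExpiry fields; the caller would set servedAfterExpiry when serving an expired image):

```javascript
const TEN_MINUTES = 10 * 60 * 1000;

// Decide whether to fetch a new image, based on cached metadata.
// meta = { fetchedAt: <ms timestamp>, servedAfterExpiry: <bool> } or null on first run.
function needsNewImage(meta, now = Date.now()) {
  if (!meta) return true;                    // nothing cached yet
  const age = now - meta.fetchedAt;
  if (age < TEN_MINUTES) return false;       // still fresh: serve from cache
  if (!meta.servedAfterExpiry) return false; // expired: serve the cached image one more time
  return true;                               // already served once after expiry: fetch a new one
}
```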
Persistent Storage Integration: Used PersistentVolume and PersistentVolumeClaim to mount /app/images directory, ensuring cached images survive pod restarts and container crashes. The volume mount allows the application to maintain its cache across deployments.
Application Workflow:
Image Fetching Process: On each request to the root path, the application checks if a new image is needed based on cache age and usage. If required, it downloads a new image from Lorem Picsum using axios with streaming, saves it to the persistent volume, and updates metadata with fetch timestamp.
Cache Management: Metadata stored in JSON format tracks when images were fetched and whether they’ve been served after expiry. This enables the “serve expired image once” requirement while ensuring fresh content delivery.
Integration with HTML: The HTML page includes an image element that references /image endpoint, which serves the cached image file directly from the persistent volume.
Debugging and Deployment:
Container Orchestration: The deployment uses imagePullPolicy: Always to ensure latest code changes are pulled, combined with kubectl rollout restart to trigger immediate deployment updates.
Networking Flow: Requests flow through the ingress controller to the service (port 2345) to the container (port 8080), where the Express application handles both the HTML serving and image caching logic.
Kubernetes Resource Configuration:
The solution uses existing persistent volume infrastructure from previous exercises, mounting the image storage at /app/images in the container. This ensures cached images persist across pod restarts while maintaining the 10-minute caching behavior.
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.12
Exercise 1.13: TODO App Input Functionality
Objective: Add real todo functionality to the project by implementing an input field with character validation, a send button, and a list of hardcoded todos.
Requirements:
- Add an input field that doesn’t accept todos over 140 characters
- Add a send button (functionality not required yet)
- Display a list of existing todos with hardcoded content
Implementation Summary: This exercise transformed the basic TODO app from a simple image display into an interactive web application with form inputs and validation. The focus was on frontend development with proper user experience enhancements while maintaining the existing image caching functionality.
Key Technical Issues and Solutions:
Local Development Path Issues: The application initially used absolute paths (/app/images) designed for containerized environments, causing filesystem errors when running locally with npm start.
Solution: Changed to relative paths (./images) that automatically resolve to the correct location in both environments - local development uses the project directory while Docker containers use the /app working directory set by WORKDIR.
Docker Volume Mounting for Development: Managing the development workflow between local changes and containerized testing required setting up proper volume mounts for real-time file synchronization.
Solution: Created a docker-compose.yml configuration with volume mounts (.:/app and /app/node_modules) enabling live code reloading while preserving container-specific dependencies.
Express.js Path Resolution for sendFile: The res.sendFile() method requires absolute paths, but relative paths from ./images caused “path must be absolute” errors even in the container environment.
Solution: Used path.resolve() instead of path.join() to ensure all file paths are converted to absolute paths before being passed to Express.js methods.
Application Workflow:
User Interface Design: The application now features a clean, responsive TODO interface with input validation, character counting, and visual feedback. The design maintains consistency with the existing image display while adding dedicated todo functionality sections.
Input Validation Layer: Implements both HTML-level validation (maxlength="140") for bulletproof character limits and JavaScript enhancements for real-time user feedback including character counters and visual warnings.
State Management: The send button dynamically enables/disables based on input content, provides visual feedback through color changes, and shows character count progression with red warning colors when approaching the 140 character limit.
Debugging and Deployment:
Development Environment Setup: Successfully configured Docker Compose for streamlined development with automatic file synchronization, eliminating the need to rebuild containers after each code change.
Browser Development Tools Integration: Leveraged browser console debugging to understand DOM element properties and troubleshoot JavaScript event handling, demonstrating practical web development debugging techniques.
Container vs Local Development: Resolved path resolution differences between local Node.js execution and containerized deployment, ensuring consistent behavior across development environments.
Kubernetes Resource Configuration:
The exercise builds upon existing Kubernetes infrastructure with persistent volume mounting for image storage. The container paths now work seamlessly in both development (Docker Compose) and production (Kubernetes) environments through consistent relative path usage.
The deployment continues to use the established ingress routing, service configuration, and persistent volume claims from previous exercises, demonstrating how frontend enhancements integrate with existing infrastructure.
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/1.13
Chapter 3 - More Building Blocks
Networking Between Pods
Exercise 2.2: Microservices Architecture with Todo Backend
Objective:
Create a separate backend service (todo-backend) that handles todo data management through REST API endpoints. This service should provide GET /todos and POST /todos endpoints with in-memory storage, while the existing todo-app serves the frontend and acts as a proxy to the backend service.
Requirements:
- Create a new todo-backend microservice with RESTful API endpoints
- Implement GET /todos endpoint for fetching all todos from memory
- Implement POST /todos endpoint for creating new todos
- Modify todo-app to communicate with todo-backend via HTTP
- Make todo list dynamic by fetching data from backend API
- Deploy both services as separate Kubernetes deployments
- Enable communication between services using Kubernetes networking
Implementation Summary:
This exercise successfully implemented a microservices architecture by separating the todo application into two distinct services: a frontend service (todo-app) that handles user interface and static content, and a backend service (todo-backend) that manages todo data through REST API endpoints.
The todo-backend service was built as a lightweight Express.js API that stores todos in memory and exposes two core endpoints: GET /todos returns all todos in JSON format, while POST /todos creates new todos with auto-generated IDs and timestamps. The service includes proper input validation and HTTP status codes (400 for validation errors, 201 for successful creation).
The todo-app service was enhanced to act as both a frontend server and API proxy. It serves the HTML interface and handles form submissions server-side (traditional web approach), while also providing an /api/todos endpoint that acts as a bridge between browser JavaScript and the todo-backend microservice for dynamic content loading.
Key Technical Issues and Solutions:
Architecture Pattern Decision: The implementation uses a hybrid rendering approach that combines server-side and client-side techniques. Form submissions are handled server-side with redirects (traditional web pattern), while todo list population happens client-side via JavaScript fetch calls (modern SPA pattern).
Service-to-Service Communication: The todo-app communicates with todo-backend using internal Kubernetes service discovery (todo-backend-svc:3001). This enables secure, cluster-internal communication without exposing the backend API externally.
Container Port Configuration: Resolved confusion about Kubernetes containerPort declarations by adding documentation explaining that while not strictly required for functionality, containerPort serves as important metadata for tooling, monitoring, and team communication.
Networking Architecture: Implemented proper microservices networking where only todo-app is exposed externally via Ingress, while todo-backend remains internal. The backend service uses ClusterIP (port 3001) for internal communication only.
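A proxy endpoint in the todo-app along these lines illustrates the pattern (a sketch; the service URL matches the description above, but field names and error handling are simplified assumptions):

```javascript
const express = require("express");
const axios = require("axios");

const app = express();
app.use(express.urlencoded({ extended: true }));

// Cluster-internal base URL of the backend, resolved via Kubernetes service discovery
const BACKEND = "http://todo-backend-svc:3001";

// Proxy: the browser fetches /api/todos, todo-app forwards the call to todo-backend
app.get("/api/todos", async (req, res) => {
  const response = await axios.get(`${BACKEND}/todos`);
  res.json(response.data);
});

// Form submission: forward to the backend, then redirect back to the page
app.post("/todos", async (req, res) => {
  await axios.post(`${BACKEND}/todos`, { text: req.body.todo }); // field names illustrative
  res.redirect("/");
});

app.listen(process.env.PORT || 3000);
```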
Application Workflow:
The application follows a two-phase loading pattern that optimizes both performance and user experience:
Phase 1 - Server-Side HTML Delivery:
- User visits localhost:3000 → todo-app serves static HTML immediately
- Browser receives the complete page structure, including forms and containers
- Page renders instantly with empty todo list placeholder
Phase 2 - Client-Side Dynamic Content:
- Browser JavaScript executes
DOMContentLoadedevent → triggersloadTodos() - JavaScript makes AJAX call:
fetch('/api/todos')→ todo-app/api/todosendpoint - todo-app acts as proxy:
axios.get('todo-backend-svc:3001/todos')→ todo-backend service - Data flows back: todo-backend → todo-app → browser → DOM updates
- User sees todos appear dynamically without page refresh
Form Submission Flow:
- User submits form →
POST /todos→ todo-app server - todo-app validates and forwards:
axios.post('todo-backend-svc:3001/todos')→ todo-backend - todo-backend creates todo, returns data → todo-app redirects browser to
/ - Browser reloads page → triggers dynamic loading cycle again with updated data
Debugging and Deployment:
Docker Image Management: Built and pushed separate Docker images for both services using consistent multi-stage build patterns with Node.js 24-alpine base images and production-only dependency installation.
Kubernetes Resource Management: Deployed services as independent deployments with separate service definitions, enabling independent scaling and management. Used kubectl rollout restart to deploy updated code without downtime.
Service Communication Testing: Verified internal service discovery by confirming that todo-backend-svc resolves correctly within the cluster while remaining inaccessible from external traffic.
Ingress Configuration: Removed conflicting ingress rules and ensured only todo-app ingress handles external traffic routing, preventing interference between different applications in the cluster.
Kubernetes Resource Configuration:
The microservices architecture required distinct Kubernetes resources for each service:
todo-backend deployment and service:
- Deployment: Runs on port 3001 with resource limits (100m CPU, 128Mi memory)
- Service: ClusterIP type exposing port 3001 for internal cluster communication
- No external access - purely internal API service
todo-app deployment and service:
- Enhanced deployment: Updated image with proxy endpoints and dynamic frontend
- Service: Continues using existing ClusterIP on port 2345
- Ingress: Routes external traffic from localhost:3000 to the todo-app service
- Persistent volume: Maintains the image caching functionality from previous exercises
The networking architecture ensures secure microservices communication where:
- External users access only the todo-app frontend via Ingress
- Internal API calls flow through Kubernetes service discovery
- todo-backend remains protected within the cluster perimeter
Understanding Client-Side vs Server-Side Rendering:
A fundamental concept demonstrated in this exercise is the distinction between client-side and server-side rendering - this refers to where HTML assembly happens, not where data comes from.
Server-Side Rendering: The server builds complete HTML with data before sending to browser. Example: res.send('<ul><li>Todo 1</li><li>Todo 2</li></ul>') - HTML is assembled on the server.
Client-Side Rendering: The browser JavaScript builds HTML elements dynamically. Example:
```javascript
data.todos.forEach((todo) => {
  const li = document.createElement("li"); // HTML created in browser
  li.textContent = todo.text;
  todoList.appendChild(li);
});
```
Key Insight: Both approaches typically fetch data from backend APIs for security reasons. Direct database access from browsers would be a massive security vulnerability. The “client-side” part refers to DOM manipulation and HTML generation happening in the browser, while data still comes from secure backend endpoints.
Benefits of Client-Side Rendering:
- Instant updates without page refreshes (better user experience)
- Reduced server load (server only sends data, not complete HTML)
- Rich interactivity (drag-and-drop, real-time updates, animations)
- Offline capabilities with service workers and local storage
Benefits of Server-Side Rendering:
- Excellent SEO (search engines see complete HTML immediately)
- Faster initial page loads (complete content sent immediately)
- Simpler development (no complex client-side state management)
- Works without JavaScript (progressive enhancement)
Our Hybrid Approach: Combines benefits by serving HTML structure immediately (fast initial load) while using JavaScript for dynamic updates (better interactivity). Form submissions use server-side redirects for reliability, while todo loading uses client-side rendering for smooth updates.
How Browsers Work:
A browser is fundamentally a universal code interpreter and execution environment that downloads code from servers worldwide and transforms it into interactive visual experiences.
Core Browser Components:
1. Multi-Language Runtime Environment:
- HTML Parser: Converts markup into DOM tree structure
- CSS Engine: Applies styling and layout rules
- JavaScript Engine: (V8, SpiderMonkey) Executes application logic
- Network Stack: Handles HTTP/HTTPS requests, DNS resolution, security
2. Operating System for Web Applications: Browsers provide system-level services like file system access, camera/microphone APIs, notifications, local storage, and networking - essentially acting as a platform for web applications.
3. Security Sandbox: Prevents malicious code from accessing your computer through same-origin policies, content security policies, and process isolation.
Browser Execution Model:
When you visit localhost:3000, your browser:
- Downloads code: HTML, CSS, JavaScript files from the todo-app server
- Parses and interprets: Uses your CPU to build DOM trees and execute JavaScript
- Renders interface: Uses your GPU to display visual elements
- Manages interactions: Handles clicks, form submissions, API calls using your local resources
Key Insight: Browsers are local desktop applications (like Chrome.exe) that download and execute code from remote servers, but all the processing happens on your own computer. When you visit a website, you’re essentially downloading a temporary application that runs on your machine using your CPU, memory, and graphics card.
The browser acts as a universal application platform that can instantly run applications from any server worldwide without installation, making the web the most accessible software distribution platform ever created.
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/2.2
Organizing a cluster
Namespaces
We can use namespaces to organize a cluster and keep resources separated. With namespaces you can split a cluster into several virtual clusters. Most commonly, namespaces are used to separate environments, e.g., development, staging, and production. "DNS entry for services includes the namespace so you can still have projects communicate with each other if needed through service.namespace address. e.g. if a service called cat-pictures is in a namespace ns-test, it could be found from other namespaces via http://cat-pictures.ns-test."
Useful Commands:
- kubectl get namespace
- kubectl get all --all-namespaces
- kubectl get pods -n <namespace>
- kubectl create namespace <name>
All commands are run against the current active namespace! You can switch between them easily using the kubens tool.
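Without kubens, you can also change the namespace that kubectl targets by default (namespace name illustrative):

```bash
# Point the current context at the "exercises" namespace
kubectl config set-context --current --namespace=exercises
```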
Useful Tools:
- Kubectx and Kubens == kubectx is a tool to switch between contexts (clusters) on kubectl faster; kubens is a tool to switch between Kubernetes namespaces (and configure them for kubectl) easily.
Kubernetes comes with three namespaces out-of-the-box:
- default = can be used out-of-the-box, but should be avoided in large production systems (it cannot actually be deleted)
- kube-system = good to leave alone
- kube-public = not used for much
Services can communicate across namespaces like so: <service-name>.<namespace-name>.
Namespaces act as deletion boundaries in Kubernetes - deleting a namespace is like rm -rf for everything inside it. This makes namespaces powerful for environment cleanup (dev/test/staging) but dangerous if used accidentally. Always double-check which namespace you’re targeting!
Labels:
We can use labels to separate applications from others inside a namespace, and to group different resources together. They can be added to almost anything. They are key-value pairs.
We can use them in combination with other tools to group objects, e.g., nodeSelector.
Exercise 2.3: Keep them separated
Objective: Move the “Log output” and “Ping-pong” to a new namespace called “exercises”.
This was just about adding the namespace to all the manifests files. A good way of creating namespaces is having a namespace.yaml file where you can define all your namespaces.
Exercise 2.4: Keep them separated
Objective: Move the “Todo App” and “Todo Backend” to a new namespace called “project”.
This was just about adding the namespace to all the manifests files. If things get stuck in a “terminating” state while you are deleting or moving them you need to figure out the dependencies and sort them out.
Configuring Applications
Exercise 2.5 and Exercise 2.6.: Documentation and ConfigMaps
Objective: Use a ConfigMap to inject the container with environment variables
ConfigMaps are a practical way to inject data into a pod. It was interesting to look inside a pod and see that even the environment variables are mapped as files.
I was also wondering about how to update a config map, especially one that has been created partially declaratively (using a configmap.yaml) and partially imperatively (using the kubectl create configmap command).
This seems to be a good way: append --dry-run=client -o yaml | kubectl apply -f - to the create command, which:
- Generates the ConfigMap YAML
- Pipes it to kubectl apply
- Updates the existing ConfigMap instead of failing with “already exists”
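For example (ConfigMap name and keys are illustrative):

```bash
kubectl create configmap log-output-config --from-literal=MESSAGE="hello world" \
  --dry-run=client -o yaml | kubectl apply -f -
```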
StatefulSets and Jobs
StatefulSets are similar to deployments but are “sticky”, meaning that they maintain a persistent storage and a stable, unique network identity for each pod.
Useful command: kubectl get all --all-namespaces == a way to see all the resources in all the namespaces
Exercise 2.7: PostgreSQL StatefulSet for Persistent Counter Storage
Objective: Run a PostgreSQL database as a StatefulSet (with one replica) and save the Ping-pong application counter into the database. This replaces the in-memory counter with persistent database storage that survives pod restarts.
Requirements:
- Deploy PostgreSQL as a StatefulSet with persistent storage
- Modify the ping-pong application to use PostgreSQL for counter persistence
- Ensure the database is operational before the application tries to connect
- Test that counter values persist across pod restarts
The final architecture implements a complete database persistence layer:
Database Layer:
- PostgreSQL StatefulSet: Single replica with persistent volume for data storage
- Counter Table: Stores application state with auto-incrementing counter values
- Connection Management: Retry logic handles database startup delays
Application Layer:
- Database Initialization: Creates counter table and initial row on startup
- State Persistence: All counter operations (increment, read) use PostgreSQL queries
- Error Handling: Graceful degradation with database connection failures
Networking and Service Discovery:
- Internal Communication: ping-pong app connects to
postgres-svc:5432 - Environment Configuration: Database credentials shared via ConfigMap
- Service Abstraction: PostgreSQL service provides stable endpoint for database access
Data Persistence Comparison:
Before (In-Memory Counter):
- Counter stored in a JavaScript variable (let counter = 0)
- Pod restart: Counter resets to 0 ❌
- Cluster destruction: Counter lost forever ❌
- Scaling: Each replica has a separate counter ❌
After (PostgreSQL Database):
- Counter stored in PostgreSQL table on persistent volume
- Pod restart: Counter survives (reads from database) ✅
- Pod scaling: All replicas share same database ✅
- Cluster destruction: Data survives with proper storage configuration ⚠️
Storage Persistence Levels:
Current Setup (local-path):
- k3d cluster destruction: Data is LOST ❌ (local-path stores data on cluster nodes)
- Pod restarts: Data survives ✅ (the persistent volume remains intact)
- Node failures: Data may be lost ⚠️ (depends on node-local storage)
Production Setup (external storage):
- Cluster destruction: Data survives ✅ (external storage systems)
- Node failures: Data survives ✅ (storage independent of nodes)
- Disaster recovery: Possible with proper backup strategies ✅
Key Kubernetes Concepts Demonstrated:
StatefulSet vs Deployment: StatefulSets provide stable network identities, ordered deployment/scaling, and persistent storage associations that survive pod rescheduling.
Key Insights Summary:
Database Configuration & Environment Variables: • PostgreSQL initialization: postgres:13 image uses POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD env vars for automatic database/user creation • Configuration consistency: Database credentials must match between StatefulSet and application containers • Environment variable priority: process.env values take precedence over hardcoded defaults - without deployment env vars, you use hardcoded values, not ConfigMap values
Storage & Persistence: • StorageClass differences: local-path (dynamic provisioning, automatic) vs manual (static provisioning, requires pre-created PV) • StatefulSet persistence: StatefulSets automatically recreate pods and maintain persistent volumes • Resource visibility: Kubernetes resources are namespace-scoped and can’t see across namespace boundaries
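The env section of the PostgreSQL container in the StatefulSet presumably looks something like this (ConfigMap and key names are illustrative; in production the password would belong in a Secret rather than a ConfigMap):

```yaml
env:
  - name: POSTGRES_DB
    valueFrom:
      configMapKeyRef:
        name: postgres-config   # ConfigMap name illustrative
        key: POSTGRES_DB
  - name: POSTGRES_USER
    valueFrom:
      configMapKeyRef:
        name: postgres-config
        key: POSTGRES_USER
  - name: POSTGRES_PASSWORD
    valueFrom:
      configMapKeyRef:
        name: postgres-config
        key: POSTGRES_PASSWORD
```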
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/2.7
Exercise 2.9: Wikipedia Reading Reminder CronJob
Objective: Create a CronJob that generates a new todo every hour to remind you to read a random Wikipedia article. The job should fetch a random Wikipedia URL and POST it as a todo to the existing todo application.
Requirements:
- The CronJob runs every hour (0 * * * *)
- Fetch a random Wikipedia article URL from https://en.wikipedia.org/wiki/Special:Random
- Extract the actual article URL from the redirect response
- POST the todo to the todo-app service in the format "Read <article URL>"
- Use cluster-internal service communication
CronJob Architecture:
The implementation uses a lightweight container approach with the curlimages/curl image and inline shell scripting rather than building a custom Docker image. This design choice prioritizes simplicity and maintainability - the entire job logic is contained within the Kubernetes manifest, making it easy to modify without rebuilding containers.
Wikipedia URL Resolution Process:
The most technically interesting aspect involves HTTP redirect parsing to extract random Wikipedia URLs:
```bash
WIKI_URL=$(curl -s -I "https://en.wikipedia.org/wiki/Special:Random" | grep -i "^location:" | sed 's/location: //i' | tr -d '\r\n')
```
This command chain demonstrates several HTTP optimization patterns:
- HEAD request only (-I): Fetches headers without downloading the full page content
- Location header extraction: Parses the redirect target from the HTTP 302 response
- Bandwidth efficiency: Minimal data transfer compared to following redirects with full page downloads
Service-to-Service Communication:
The CronJob POSTs todos using internal Kubernetes service discovery:
```bash
curl -X POST "http://todo-app-svc.project.svc.cluster.local:2345/todos" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "todo=$TODO_TEXT"
```
Key networking insights:
- Full DNS names: Uses complete Kubernetes FQDN for cross-namespace reliability
- Internal-only traffic: CronJob communicates directly with todo-app, which proxies to todo-backend
- Form data compatibility: Matches the existing HTML form submission format for seamless integration
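Putting those pieces together, the CronJob manifest is presumably shaped roughly like this (an abridged sketch; the full version lives in the linked release):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: wikipedia-todo-cronjob
  namespace: project
spec:
  schedule: "0 * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: wikipedia-todo
              image: curlimages/curl
              command: ["/bin/sh", "-c"]
              args:
                - |
                  WIKI_URL=$(curl -s -I "https://en.wikipedia.org/wiki/Special:Random" | grep -i "^location:" | sed 's/location: //i' | tr -d '\r\n')
                  curl -X POST "http://todo-app-svc.project.svc.cluster.local:2345/todos" \
                    -H "Content-Type: application/x-www-form-urlencoded" \
                    -d "todo=Read $WIKI_URL"
```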
Deep Dive: CronJob Lifecycle and Resource Management:
Understanding the relationship between CronJobs, Jobs, and Pods reveals important Kubernetes design principles for batch workload management.
Three-Tier Execution Model:
```
CronJob (wikipedia-todo-cronjob)
  ↓ Every hour creates...
Job (wikipedia-todo-cronjob-29317560)
  ↓ Which creates...
Pod (wikipedia-todo-cronjob-29317560-dkz6x)
  ↓ Which runs...
Container (curlimages/curl + our script)
```
CronJob = Scheduler/Template:
- Purpose: Defines when and how to run recurring tasks
- Lifecycle: Permanent until explicitly deleted
- Responsibility: Creates Jobs according to the schedule (0 * * * *)
Job = Execution Record:
- Purpose: Manages individual execution attempts with retry logic
- Lifecycle: Controlled by history limits (successfulJobsHistoryLimit: 3)
- Responsibility: Creates and monitors Pods until successful completion
Pod = Runtime Environment:
- Purpose: Provides isolated execution environment for the script
- Lifecycle: Created fresh for each execution, deleted with parent Job
- Responsibility: Runs the actual container and captures logs/exit codes
Resource Cleanup and History Management:
The CronJob configuration controls how long execution history is retained:
```yaml
successfulJobsHistoryLimit: 3  # Keep 3 successful Jobs
failedJobsHistoryLimit: 1      # Keep 1 failed Job for debugging
```
Cleanup Timeline:
- Job completion: Pod status changes to “Completed” but remains accessible
- History retention: Jobs and their Pods persist for debugging/audit purposes
- Automatic cleanup: When history limits are exceeded, oldest Jobs (and Pods) are deleted
- Resource efficiency: Completed Pods consume no CPU/memory, only etcd metadata
Why New Pods for Each Execution:
Kubernetes Jobs create fresh Pods for each execution rather than reusing containers, demonstrating several design principles:
Fresh Execution Environment Benefits:
- State isolation: No leftover files, environment variables, or memory state
- Failure isolation: Crashes or corruption don’t affect subsequent executions
- Resource cleanup: Each Pod gets dedicated CPU/memory that’s released on completion
- Debugging clarity: Each execution has distinct logs and resource metrics
Release:
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/2.9
Monitoring
Exercise 2.10: Set up Monitoring
Objective: Set up monitoring for the project. Use Prometheus for metrics, Loki for logs, and Grafana for dashboards.
Used Helm to install the kube-prometheus stack. Otherwise just followed instructions in the course and in the docs to set everything up.
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/2.10
Chapter 4 - To the Cloud
Introduction to Google Kubernetes Engine
Exercise 3.1: Pingpong GKE
Objective: Set up the pingpong app in the Google Kubernetes Engine
Created the cluster with: gcloud container clusters create dwk-cluster --zone=europe-north1-b --cluster-version=1.32 --disk-size=32 --num-nodes=3 --machine-type=e2-micro (or --machine-type=e2-small). P.S. Delete the cluster whenever you're not using it: gcloud container clusters delete dwk-cluster --zone=europe-north1-b
Then I removed the service file and replaced it with a loadbalancer config. I had trouble with the container crashing. The logs from kubectl logs showed an exec format error. This indicated the Docker image was built for the wrong CPU architecture (e.g., ARM on an Apple Silicon Mac) for the x86/amd64 GKE nodes.
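One way to avoid the architecture mismatch is to build explicitly for the target platform, e.g. with buildx (image name and tag illustrative):

```bash
# Build an amd64 image on an Apple Silicon machine and push it in one step
docker buildx build --platform linux/amd64 -t aljazkovac/pingpong:latest --push .
```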
Another problem was that the database was stuck in a pending stage. kubectl describe pod showed an unbound immediate PersistentVolumeClaims error. The statefulset.yaml was requesting a storageClassName: local-path, which is common for local clusters but doesn’t exist on GKE. I removed the storageClassName line from the statefulset.yaml. This allowed Kubernetes to use the default standard-rwo storage class provided by GKE.
After fixing the storage class, the postgres pod went into an Error state. The logs showed initdb: error: directory "/var/lib/postgresql/data" exists but is not empty. This happens because the new persistent disk comes with a lost+found directory, which the postgres initdb script doesn’t like. I added a subPath: postgres to the volumeMount in the statefulset.yaml. This mounts a clean subdirectory from the persistent disk into the container, allowing the database to initialize correctly.
Key Takeaways:
- Always check pod events with kubectl describe when a pod is Pending.
- An exec format error in the logs almost always means a CPU architecture mismatch in your Docker image.
- Ensure the storageClassName in your manifests matches what your cloud provider offers.
- StatefulSets are largely immutable; you often need to delete and re-apply them to make changes to their pod or volume specifications.
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.1
Exercise 3.2: Pingpong and LogOutput in GKE
Objective: Set up the pingpong app and the logoutput app in the Google Kubernetes Engine
In this exercise, we deployed two applications, “ping-pong” and “log-output,” into a GKE cluster and exposed them through a single Ingress. This process uncovered critical issues related to resource allocation, storage configuration, health checks, and deployment strategies, providing a realistic debugging experience.
The first step was to establish a unified entry point for both applications.
- Action: We configured a single ingress.yaml to handle routing for both services, with the log-output app at the root (/) and the pingpong app at /pingpong (see the sketch after this list).
- Action: We replaced the pingpong app’s LoadBalancer service with a ClusterIP service, as it no longer needed a dedicated external IP.
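A minimal sketch of the shared ingress.yaml, assuming the two ClusterIP services are named log-output-svc and pingpong-svc (names that appear later in these notes) and listen on port 80; the Ingress name is a placeholder:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dwk-ingress               # placeholder name
spec:
  rules:
    - http:
        paths:
          - path: /pingpong
            pathType: Prefix
            backend:
              service:
                name: pingpong-svc
                port:
                  number: 80      # assumed service port
          - path: /
            pathType: Prefix
            backend:
              service:
                name: log-output-svc
                port:
                  number: 80      # assumed service port
```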
The initial deployment failed with all application pods stuck in a Pending state.
- Diagnosis 1: Insufficient Memory. Using kubectl describe pod, we found the error Insufficient memory. An analysis of the cluster nodes (kubectl describe nodes) revealed the e2-micro instances were too small; after accounting for GKE’s system pods, there was not enough memory left for our applications.
- Solution 1: The cluster was recreated with larger e2-small nodes, which provided sufficient memory.
- Diagnosis 2: Incompatible Storage. The log-output pod and its PersistentVolumeClaim (PVC) were still Pending. We found the PersistentVolume was defined with hostPath and a nodeAffinity for a k3d node, making it incompatible with GKE.
- Solution 2: We abandoned the static provisioning model. We deleted the persistentvolume.yaml and persistentvolumeclaim.yaml files and replaced them with a single, dynamic PVC manifest that requests storage from GKE’s default StorageClass (see the sketch after this list).
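A sketch of the single dynamic PVC that replaced the static pair; the claim name and size are illustrative. Omitting storageClassName makes GKE provision a disk from its default StorageClass:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: log-output-claim          # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi                # illustrative size
```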
With the pods scheduled, the Ingress failed to become healthy, returning a “Server Error”. kubectl describe ingress showed the backends were UNHEALTHY.
- Diagnosis 1 (log-output): The log-output app’s / route was returning a 302 redirect. The health checker requires a 200 OK and treats a redirect as a failure.
- Solution 1: We modified the / route in log-reader/app.js to respond directly with 200 OK.
- Diagnosis 2 (Stale Images): The backends remained unhealthy. We realized that after fixing the code, the Docker images had not been rebuilt and pushed.
- Solution 2: We rebuilt all application images using docker buildx to create multi-architecture images and pushed them with new, unique tags (e.g., :v2, :v3). Using unique tags is a best practice to avoid caching issues with the :latest tag (see the example after this list).
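An illustrative multi-architecture build and push with docker buildx; the registry user, image name, and tag are placeholders rather than the exact ones used in the exercise:

```sh
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t <registry-user>/log-output:v2 \
  --push .
```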
The applications would run for a while and then become unavailable. kubectl get pods showed a high RESTARTS count.
- Diagnosis: The logs of the previous containers (kubectl logs --previous) showed no errors, indicating a “silent crash.” This led us to describe the pod, which revealed the termination reason: OOMKilled. The containers were using more memory than their configured limit.
- Solution: We edited the deployment.yaml files to increase the memory limit for the crashing containers (e.g., from 32Mi to 128Mi), while keeping the memory request low. This allowed the applications to run without being killed for exceeding their memory allowance (see the snippet after this list).
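The relevant container snippet in a deployment.yaml after the change might look roughly like this; the limit increase to 128Mi is from the notes above, while the request and CPU values are illustrative:

```yaml
resources:
  requests:
    memory: "32Mi"    # keep the scheduling request low
    cpu: "50m"        # illustrative
  limits:
    memory: "128Mi"   # raised from 32Mi so the container is no longer OOMKilled
    cpu: "100m"       # illustrative
```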
This exercise was a comprehensive tour of the entire application lifecycle on Kubernetes, from initial deployment and configuration to debugging complex issues with scheduling, storage, health checks, and resource limits.
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.2
Exercise 3.3: Use Gateway API Instead of Ingress
In this exercise, we migrated the traffic management for the log-output and pingpong applications from the traditional Ingress API to the newer and more powerful Gateway API. This process led to one final, crucial lesson in Kubernetes resource allocation.
First, we replaced the Ingress resource with two new, more expressive resources:
- A Gateway resource: This defined the entry point for our cluster. (The Gateway API first had to be enabled on the cluster with gcloud container clusters update clustername --location=europe-north1-b --gateway-api=standard.)
- An HTTPRoute resource: This defined the actual routing rules. We configured it to attach to our Gateway and specified how paths should be directed to our internal ClusterIP services (see the sketch after this list):
  - Requests to / were routed to the log-output-svc.
  - Requests to /pingpong and /counter were routed to the pingpong-svc.
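A hedged sketch of the two resources. The GatewayClass, resource names, and service ports are assumptions; only the routing rules reflect the notes above:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: dwk-gateway                                   # placeholder name
spec:
  gatewayClassName: gke-l7-global-external-managed    # assumed GKE class
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: dwk-route                                     # placeholder name
spec:
  parentRefs:
    - name: dwk-gateway
  rules:
    - matches:                                        # /pingpong and /counter -> pingpong-svc
        - path:
            type: PathPrefix
            value: /pingpong
        - path:
            type: PathPrefix
            value: /counter
      backendRefs:
        - name: pingpong-svc
          port: 80                                    # assumed port
    - matches:                                        # everything else (/) -> log-output-svc
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: log-output-svc
          port: 80                                    # assumed port
```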
Even with a correct Gateway API configuration, the deployment failed.
- Symptom: After applying the manifests, the log-output pod was stuck in the Pending state.
- Diagnosis: Using kubectl describe pod, we discovered the familiar error: Insufficient memory. Although the e2-small cluster was larger than our first attempt, it was still not enough. After the GKE system pods, the postgres pod, and the pingpong pod were scheduled, there was not enough free memory left on any node to accommodate the log-output pod’s request.
This confirmed that the total resource demand of the Kubernetes system combined with the full suite of applications was too great for the e2-small nodes.
- Action: The cluster was deleted and recreated one last time using the e2-medium machine type.
- Outcome: The e2-medium nodes provided a substantial amount of memory, creating a large enough buffer to comfortably run all the GKE system pods and all of our application pods. When the manifests were applied to this new cluster, all pods started quickly and without issue.
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.3
Exercise 3.4: Use rewrite on the /pingpong route
The goal was to make the pingpong application more portable by having its main logic served from its root path (/) internally, while still being accessible from the /pingpong path externally.
This was accomplished with a two-part solution: one change in the application code and one in the Kubernetes configuration.
First, we modified the pingpong/app.js file. The core logic for incrementing the database counter was moved from the app.get('/pingpong', ...) route to the app.get('/', ...) route. This made the application self-contained, serving its primary function from its own root path.
Second, we edited the HTTPRoute resource (route.yaml). We added a filters section to the rule that matches the /pingpong path.
This filter intercepts any incoming request for /pingpong, replaces that prefix with /, and forwards the modified request to the pingpong service.
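The rewritten rule in route.yaml could look roughly like this, using the Gateway API URLRewrite filter with ReplacePrefixMatch; the service name and port are carried over from the earlier sketch and remain assumptions:

```yaml
- matches:
    - path:
        type: PathPrefix
        value: /pingpong
  filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /    # strip the /pingpong prefix before forwarding
  backendRefs:
    - name: pingpong-svc
      port: 80
```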
Since the GKE health checker probes the / path, and the main application logic was now also at /, every health check would increment the counter.
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.4
Deployment Pipeline
Kustomize is a tool for configuration customization, baked into kubectl. Alternatively, we could use Helm or Helmsman.
Add a file kustomization.yaml and apply it with kubectl apply -k . (to preview the rendered manifests without applying them, run kubectl kustomize . instead). Read the Kustomize Cheat Sheet.
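A minimal kustomization.yaml sketch; the resource file names and the image override are placeholders, not necessarily the ones used in the todo-project:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml     # placeholder manifest names
  - service.yaml
images:
  - name: todo-app      # placeholder image override
    newTag: v1
```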
Exercise 3.5: Deploy the TODO project to the GKE
To enable the gateway-API from the start, I added the appropriate flag when creating the cluster: gcloud container clusters create dwk-cluster --zone=europe-north1-b --cluster-version=1.32 --disk-size=32 --num-nodes=3 --machine-type=e2-medium --gateway-api=standard
- Kustomize Setup: We organized the todo-project with a Kustomize base and moved all manifests into it, renaming them for clarity.
- GKE Preparation: We identified the need to replace Ingress with Gateway and adjust PersistentVolume handling for GKE.
- PostgreSQL Fix: We resolved a CrashLoopBackOff in PostgreSQL caused by the lost+found directory by correctly implementing subPath for its volume.
- Backend Image Fix: We fixed an exec format error in the backend by rebuilding and pushing multi-architecture Docker images.
- Backend Config Fix: We resolved a CreateContainerConfigError in the backend by correcting case sensitivity mismatches in ConfigMap and Deployment environment variable keys.
- Frontend Volume Fix: We overcame a Multi-Attach error for the frontend’s volume by deleting a lingering old ReplicaSet.
- API Routing Fix: Finally, we resolved the “Error loading todos” by correcting the HTTPRoute to ensure all traffic, including API calls, correctly flowed through the frontend application, which acts as a proxy.
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.5
Exercise 3.6: Deploy the TODO project to the GKE with GitHub Actions
We followed the instructions for setting up the necessary GKE resources (Artifact Registry) and authentication, and for preparing a GitHub Actions (GHA) workflow. We ran into a few issues:
- Changing Deployment Strategy: Updated todo-app-deployment.yaml to use the Recreate strategy, resolving ReadWriteOnce PVC conflicts (see the sketch after this list).
- Correcting Workflow Paths: Adjusted docker build commands and kustomize paths in the workflow to correctly reference project subdirectories and the kustomization base.
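The change to todo-app-deployment.yaml amounts to roughly the following fragment; everything outside the strategy field is omitted or assumed:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: todo-app                  # assumed name
spec:
  strategy:
    type: Recreate                # stop the old pod before starting the new one,
                                  # so the ReadWriteOnce PVC is never attached twice
  # selector, template, etc. unchanged
```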
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.6
Exercise 3.7: Each branch should create a separate deployment
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.7
Exercise 3.8: Deleting a branch should delete the environment
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.8
GKE Features
Exercise 3.10: Backup project database to Google Cloud
To create a Kubernetes CronJob that backs up the todo database to Google Cloud Storage, we performed the following:
- Initial CronJob Setup: Started with a basic CronJob manifest (todo-backend-dbdump-cronjob.yaml) and integrated it into Kustomize (a sketch of how the pieces fit together follows this list).
- GCS Credentials: Created a Kubernetes Secret (gcs-credentials) from a Google Cloud Service Account key, and configured the CronJob to use it for GCS authentication.
- Database Credentials: Created a Kubernetes Secret (todo-backend-postgres-credentials) for the PostgreSQL password, as the CronJob was configured to retrieve it securely from a Secret, not the insecure ConfigMap.
- Diagnosed Node Access Scopes: Discovered that the GKE cluster’s default node pool had devstorage.read_only access scopes, preventing GCS write operations despite correct IAM roles.
- Created Specialized Node Pool: Created a new GKE node pool (backup-pool) with broader cloud-platform access scopes to allow GCS write operations.
- Targeted Backup Pod: Modified the CronJob to include a nodeSelector, ensuring the backup pod runs exclusively on the new backup-pool.
- Verification: Applied all changes and successfully ran a test job, confirming the database backup was uploaded to the Google Cloud Storage bucket.
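Putting the steps above together, a hedged sketch could look like the following. The schedule, backup image, bucket name, database host, and Secret key names are assumptions; the Secret names, the backup-pool node pool, and the nodeSelector come from the notes above:

```sh
# New node pool with broader access scopes (zone and sizing assumed from earlier steps)
gcloud container node-pools create backup-pool \
  --cluster=dwk-cluster --zone=europe-north1-b \
  --num-nodes=1 --machine-type=e2-small \
  --scopes=https://www.googleapis.com/auth/cloud-platform
```

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: todo-backend-dbdump
spec:
  schedule: "0 3 * * *"                     # assumed: daily at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            cloud.google.com/gke-nodepool: backup-pool   # run only on the new pool
          restartPolicy: OnFailure
          containers:
            - name: dbdump
              image: <backup-image>          # assumed image containing pg_dump, gcloud and gsutil
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: todo-backend-postgres-credentials
                      key: password          # assumed key name
              volumeMounts:
                - name: gcs-credentials
                  mountPath: /secrets
                  readOnly: true
              command: ["/bin/sh", "-c"]
              args:
                - |
                  gcloud auth activate-service-account --key-file=/secrets/key.json
                  pg_dump -h <db-host> -U postgres tododb > /tmp/dump.sql
                  gsutil cp /tmp/dump.sql gs://<backup-bucket>/dump-$(date +%F).sql
          volumes:
            - name: gcs-credentials
              secret:
                secretName: gcs-credentials
```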
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.10
Exercise 3.11: Resource limits
Objective: Set sensible resource requests and limits for the project.
Notes:
- Vertical scaling (more resources) vs. Horizontal scaling (more pods or nodes).
- Resources can have requests (guaranteed minimum) and limits (hard maximum) defined for specific containers.
- HorizontalPodAutoscaler: Automatically scales pods horizontally based on CPU/memory usage.
- VerticalPodAutoscaler: Automatically scales pods vertically (adjusts requests/limits).
- PodDisruptionBudget: Determines how many pods must always be available during voluntary disruptions.
- ResourceQuotas: Put a hard cap on total aggregate resource consumption (CPU and memory) for a specific namespace.
- LimitRange: Similar to ResourceQuotas but applies to individual containers, creating default values and min/max constraints.
Implementation:
I verified that todo-app and todo-backend deployments already had sensible resource requests and limits defined. However, the postgres StatefulSet was missing them, so I added:
- Requests: cpu: 200m, memory: 256Mi
- Limits: cpu: 500m, memory: 512Mi
Additionally, to ensure the stability of the project namespace, I implemented the following (sketched after this list):
- ResourceQuota: Capped the namespace at 20 Pods, 2 CPU cores, and 2GB RAM (requests). This prevents any single project from consuming all cluster resources.
- LimitRange: Defined default requests/limits for any new containers and set min/max constraints to prevent “tiny” or “monster” pods.
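A sketch of the two objects. The quota numbers (20 Pods, 2 CPU, 2Gi of requested memory) come from the notes above, while the namespace name and the LimitRange default/min/max values are illustrative:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: project-quota
  namespace: project              # illustrative namespace
spec:
  hard:
    pods: "20"
    requests.cpu: "2"
    requests.memory: 2Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: project-limits
  namespace: project
spec:
  limits:
    - type: Container
      defaultRequest:             # applied when a container declares no requests
        cpu: 100m
        memory: 128Mi
      default:                    # applied when a container declares no limits
        cpu: 200m
        memory: 256Mi
      min:                        # reject "tiny" containers below this
        cpu: 50m
        memory: 64Mi
      max:                        # reject "monster" containers above this
        cpu: "1"
        memory: 1Gi
```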
Link to the GitHub release for this exercise: https://github.com/aljazkovac/devops-with-kubernetes/tree/3.11
Exercise 3.12: Logging
Objective: Enable logging on the cluster
Ran the command: gcloud container clusters update dwk-cluster --zone europe-north1-b --logging=SYSTEM,WORKLOAD --monitoring=SYSTEM