Linux Interview Questions and Answers for SRE

 



Table of Contents

  1. Introduction

  2. Core Linux Concepts

  3. Process Management

  4. System Performance and Monitoring

  5. Networking in Linux

  6. File Systems and Storage

  7. Security and Permissions

  8. Automation and Scripting

  9. Troubleshooting Scenarios

  10. Advanced Systemd and Boot Process

  11. 10 FAQs About Linux Interview for SRE

  12. Conclusion


Introduction

In Site Reliability Engineering (SRE), Linux is more than an operating system: it is the foundation on which the reliability, scalability, and performance of production systems rest. Whether you are preparing for a senior SRE interview or brushing up on essential Linux topics, this guide walks through the Linux interview questions most commonly asked for SRE roles.

It covers everything from system internals to real-world troubleshooting, the kind of questions you can expect in a high-level technical interview.


Core Linux Concepts

Q1: What happens when you execute a command in Linux?

Answer: When you execute a command:

  1. The shell first checks whether the name is an alias, function, or builtin. If it is not, the shell searches the directories listed in $PATH for an executable.

  2. If found, the shell creates a child process using fork().

  3. The child process replaces its own image with the new program using execve().

  4. The parent shell waits for the child to complete and collects its exit status (unless the command runs in the background).

This understanding is essential for SREs who troubleshoot "command not found" errors, unexpected exit codes, or high CPU usage from shell processes.
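The steps above can be observed from the shell itself. The following is a minimal sketch, assuming a POSIX shell; the `sh -c 'exit 3'` child is only a stand-in for a real command.

```shell
# Step 1: ask the shell how it resolves a name (a builtin, or a file found via $PATH).
type cd
command -v sh

# Steps 2-3: running a command in the background makes the shell fork a child,
# which then execs the program (here: sh, told to exit with status 3).
sh -c 'exit 3' &
child_pid=$!

# Step 4: the parent waits for the child and collects its exit status.
status=0
wait "$child_pid" || status=$?
echo "child $child_pid exited with status $status"
```

On a real system, `strace -f -e trace=execve,wait4 sh -c 'ls'` is one way to watch the same sequence at the system-call level, if strace is installed.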

Q2: What are runlevels or targets?

Answer: In traditional SysV init, runlevels define system states (e.g., 3 = multi-user with networking, 5 = multi-user with GUI). With systemd, they are replaced by targets such as multi-user.target and graphical.target.
To check the default target that the system boots into, use:

systemctl get-default
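For readers coming from SysV init, the conventional mapping between runlevels and systemd targets can be sketched as a small lookup. The function name `runlevel_to_target` is illustrative and not part of systemd.

```shell
# Map a SysV runlevel number to the systemd target that conventionally replaces it.
runlevel_to_target() {
    case "$1" in
        0)     echo "poweroff.target" ;;
        1)     echo "rescue.target" ;;
        2|3|4) echo "multi-user.target" ;;
        5)     echo "graphical.target" ;;
        6)     echo "reboot.target" ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

runlevel_to_target 3
runlevel_to_target 5
```

On a systemd host, `systemctl set-default multi-user.target` changes the default target, and `systemctl isolate rescue.target` switches targets at runtime.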


Process Management

Q3: How do you identify zombie and orphan processes?

Answer:

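  • Zombie: A process that has exited but whose parent has not yet read its exit status with wait(), so it still has an entry in the process table. It shows up with a Z in the STAT column.

    ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'
    
  • Orphan: A process whose parent has exited. It is adopted by init (PID 1) or the nearest subreaper, which reaps it when it finishes.

A zombie uses no memory or CPU, only a process table slot, and it cannot be killed; it disappears once its parent reaps it or the parent itself exits. A large number of zombies points to a buggy parent and can exhaust the PID space, and orphans may keep running unnoticed, so SREs should monitor for both on long-running production systems.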
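A slightly more robust way to spot zombies is to match on the STAT column only, rather than grepping whole lines for the letter Z. The helper names below (`filter_zombies`, `list_zombies`) are made up for this sketch.

```shell
# Reads "PID PPID STAT COMMAND" rows on stdin and prints only zombies
# (STAT starts with Z) as "PID PPID COMMAND".
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Lists zombies on the live system; the PPID is the parent to investigate.
# Defined here but not called, since it depends on the host.
list_zombies() {
    ps -eo pid=,ppid=,stat=,comm= | filter_zombies
}

# Demonstration on canned input so the result does not depend on the host.
printf '%s\n' '100 1 Ss sshd' '245 100 Z worker' '246 100 S+ bash' | filter_zombies
```

The PPID matters more than the zombie's own PID: fixing or restarting the parent is what actually clears the entry.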

Q4: How do you limit process resources?

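Answer: Use ulimit or cgroups:

  • ulimit sets per-process limits (CPU time, file size, number of open files, etc.) that child processes inherit; persistent per-user values usually live in /etc/security/limits.conf.

  • For advanced control, cgroups (Control Groups) apply hierarchical limits on CPU, memory, and I/O to groups of processes, such as a systemd service or a container.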
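As a small illustration of how ulimit values are scoped, the sketch below lowers the soft open-file limit inside a subshell only. The value 64 is arbitrary and chosen just for the demonstration.

```shell
# Lower the soft limit on open files inside a subshell only;
# the parent shell keeps its original limit.
limited=$(
    ulimit -S -n 64
    ulimit -S -n
)

echo "soft open-file limit inside subshell: $limited"
echo "soft open-file limit in parent shell: $(ulimit -S -n)"
```

On a systemd host, a cgroup-based equivalent would be along the lines of `systemd-run --scope -p MemoryMax=512M <command>`; treat the property and value as an illustration rather than a recommendation.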


System Performance and Monitoring

Q5: What tools do you use to monitor system performance?

Answer:

  • top/htop: Live CPU/memory usage.

  • vmstat/iostat: VM and IO statistics.

  • sar: Historical system activity.

  • perf: Performance counters for deep profiling.

  • dstat, atop, bpftrace, and netdata are also powerful tools in the SRE toolkit.

Q6: What are load average numbers in Linux?

Answer:
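Displayed as 1.52, 2.03, 1.78 in uptime output, they show the average number of processes that are runnable or in uninterruptible sleep (usually waiting on disk I/O) over the last 1, 5, and 15 minutes. A value consistently higher than the number of CPU cores may indicate CPU contention or, when CPU usage is low, an I/O bottleneck.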
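To make the comparison with the core count concrete, a load average can be normalized per core. This is a rough sketch; `load_per_core` is a hypothetical helper, and the reading from /proc/loadavg assumes a Linux host.

```shell
# Divide a load average by the number of CPU cores; values near or above 1.00
# suggest the machine is saturated.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Live reading, when /proc/loadavg is available.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
if [ -r /proc/loadavg ]; then
    read -r load1 load5 load15 rest < /proc/loadavg
    echo "1-minute load per core: $(load_per_core "$load1" "$cores")"
fi

# Fixed example: a load of 2.00 on a 4-core machine.
load_per_core 2.00 4
```

The same load of 2.00 would mean a saturated machine on 2 cores but a half-idle one on 4, which is why the raw number alone says little.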


Networking in Linux

Q7: How do you troubleshoot network issues in Linux?

Answer:

  • ping, traceroute, and mtr for latency and routing issues.

  • ss (or the older netstat) to inspect open ports and connections.

  • tcpdump or Wireshark for packet-level analysis.

  • ip addr, ip route for interface and routing diagnostics.
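When checking which ports a host is listening on, the output of ss is easy to post-process. The sketch below assumes the column layout of `ss -H -tln` (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port); `listening_ports` is a name invented for this example.

```shell
# Reads `ss -H -tln` style rows on stdin and prints the unique
# listening ports in numeric order.
listening_ports() {
    awk '{ n = split($4, parts, ":"); print parts[n] }' | sort -n | uniq
}

# Canned sample so the result does not depend on the host.
printf '%s\n' \
    'LISTEN 0 128 0.0.0.0:22 0.0.0.0:*' \
    'LISTEN 0 511 127.0.0.1:8080 0.0.0.0:*' \
    'LISTEN 0 128 [::]:22 [::]:*' | listening_ports
```

On a live system the equivalent call would be `ss -H -tln | listening_ports`.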

Q8: What is the difference between TCP and UDP?

Answer:

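  • TCP: Reliable, connection-oriented, ordered delivery with retransmission and congestion control. Use cases: HTTP/1.1 and HTTP/2, SSH.

  • UDP: Connectionless, with no delivery or ordering guarantees, but lower overhead and latency. Use cases: DNS queries, VoIP.

Understanding both protocols helps in optimizing performance and debugging failures such as retransmissions, timeouts, and dropped datagrams.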


File Systems and Storage

Q9: How do you check disk usage and inode usage?

Answer:

  • Disk usage:

    df -h
    
  • Inode usage:

    df -i
    

SREs often encounter "disk full" scenarios caused by inode exhaustion rather than a lack of free blocks: df -h still shows space available, yet new files cannot be created.
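Both checks can be scripted the same way. The sketch below assumes the POSIX output format of `df -P` and mount points without spaces; `over_threshold` is a hypothetical helper name.

```shell
# Reads `df -P` (or `df -Pi`) output on stdin and prints "mountpoint usage%"
# for every filesystem at or above the given percentage.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= limit) print $6, $5
    }'
}

# Canned sample so the result does not depend on the host.
printf '%s\n' \
    'Filesystem 1024-blocks Used Available Capacity Mounted on' \
    '/dev/sda1 1000 950 50 95% /' \
    '/dev/sdb1 1000 100 900 10% /data' | over_threshold 90
```

On a live system, `df -P | over_threshold 90` would flag block usage, and with GNU df, `df -Pi | over_threshold 90` should do the same for inodes.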

Q10: How does the ext4 journaling filesystem work?

Answer:
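ext4 records pending changes in a journal (a reserved area on disk) before applying them to the main filesystem. After a crash, it replays the committed transactions in the journal to restore a consistent state without a full fsck. The data= mount option selects the journaling mode: ordered (the default; journals metadata only, but writes data blocks before committing the related metadata), writeback (journals metadata only, with no ordering guarantee for data), and journal (journals both data and metadata).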


Security and Permissions

Q11: How does Linux handle file permissions?

Answer:

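  • Three types: read (r), write (w), and execute (x).

  • Permissions are set separately for owner, group, and others, and are often written in octal (e.g., 644 = rw-r--r--).

  • Use chmod to change permissions, chown to change ownership, and umask to set the default mask applied to newly created files.

  • For finer-grained access, use ACLs (this example grants user john read and write access):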

    setfacl -m u:john:rw file.txt
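The effect of umask and chmod on the permission bits can be checked in a scratch directory. This is a minimal sketch; the file name `report.txt` is arbitrary.

```shell
workdir=$(mktemp -d)

# umask 027 removes write for group and all bits for others from the 666 default,
# so a newly created file ends up as 640 (rw-r-----).
( umask 027; touch "$workdir/report.txt" )
default_mode=$(ls -l "$workdir/report.txt" | cut -c1-10)

# chmod 750: owner rwx, group r-x, others none.
chmod 750 "$workdir/report.txt"
explicit_mode=$(ls -l "$workdir/report.txt" | cut -c1-10)

echo "after umask 027: $default_mode"
echo "after chmod 750: $explicit_mode"
rm -rf "$workdir"
```

The umask runs in a subshell here so that it does not change the defaults of the calling shell.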
    

Q12: What is SELinux or AppArmor?

Answer:
Both are Mandatory Access Control (MAC) systems for Linux:

  • SELinux: Label-based and more granular; default in RHEL and Fedora.

  • AppArmor: Path-based and easier to manage; default in Ubuntu.

They enforce policies that restrict what a program can do beyond standard file permissions, even when it runs as root.


Automation and Scripting

Q13: What’s your approach to automating Linux tasks?

Answer:

  • Shell scripting for basic automation.

  • Ansible or Chef for infrastructure automation.

  • Cron jobs or systemd timers for scheduled execution.

  • Python for advanced file/network operations.

Example cron job (runs the backup script every day at 01:00):

0 1 * * * /usr/local/bin/backup.sh

Q14: How do you debug a broken Bash script?

Answer:
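Add set -x inside the script to trace each command as it runs, or run the whole script with bash -x script.sh.
Use bash -n script.sh to check the syntax without executing anything.
Check exit codes using $?, and use trap for signal handling and cleanup.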
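These techniques can be tried together on a throwaway script. The sketch below keeps a deliberately failing script in a variable and runs it in a child bash, so the surrounding shell is not affected; the script contents are invented for the demonstration.

```shell
# A tiny script that fails half-way through.
script='
set -eu
trap "echo cleanup ran" EXIT
echo "step 1"
false
echo "never reached"
'

# Normal run: capture stdout and the exit code.
rc=0
output=$(bash -c "$script") || rc=$?
echo "exit code: $rc"
echo "$output"

# Traced run: bash -x prints each command to stderr, prefixed with "+".
trace=$(bash -x -c "$script" 2>&1 >/dev/null) || true
echo "$trace"
```

The trace shows that `false` was the last command executed before the EXIT trap fired, which is usually the fastest way to locate the failing line.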


Troubleshooting Scenarios

Q15: A system is running slow. Where do you begin?

Answer:

  1. Check CPU, memory, and IO: top, vmstat, iostat.

  2. Check dmesg and the system log (/var/log/messages on RHEL, /var/log/syslog on Debian/Ubuntu, or journalctl) for hardware errors and OOM kills.

  3. Use ps to find rogue processes.

  4. Use iotop for IO bottlenecks.

  5. Examine network usage using iftop.
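Before reaching for the interactive tools above, a quick non-interactive snapshot from /proc can help decide where to look first. This is a sketch assuming a Linux host; `mem_available_pct` is a hypothetical helper.

```shell
# Reads /proc/meminfo-style lines on stdin and prints available memory
# as a whole percentage of total memory.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

if [ -r /proc/meminfo ]; then
    echo "memory available: $(mem_available_pct < /proc/meminfo)%"
fi
if [ -r /proc/loadavg ]; then
    echo "load averages: $(cut -d' ' -f1-3 /proc/loadavg)"
fi
```

A low available-memory percentage points toward memory pressure or swapping, while a high load with plenty of memory points toward CPU or I/O.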

Q16: Your app cannot write logs. How do you debug?

Answer:

  1. Check disk and inode space (df -h, df -i).

  2. Verify file/directory permissions.

  3. Look for SELinux/AppArmor denials in the audit or kernel logs.

  4. Check whether logrotate recreated the log file with the wrong owner or permissions.
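The first two checks can be combined into a simple write probe run as the application's user. The function name `check_log_dir` and the probe file name are assumptions made for this sketch.

```shell
# Reports whether a directory exists and whether the current user
# can actually create a file in it.
check_log_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    probe="$dir/.write_probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable"
        return 0
    fi
    echo "not writable"
    return 1
}

scratch=$(mktemp -d)
check_log_dir "$scratch"
rm -rf "$scratch"
```

An actual write attempt catches cases that a permission listing misses, such as a full disk, exhausted inodes, or a read-only mount.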


Advanced Systemd and Boot Process

Q17: How does the Linux boot process work?

Answer:

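  1. BIOS/UEFI firmware initializes the hardware and loads the bootloader (GRUB).

  2. The bootloader loads the kernel and the initramfs into memory.

  3. The kernel initializes hardware, uses the initramfs to load the drivers it needs, and mounts the root filesystem.

  4. The kernel starts init (systemd on most modern distributions) as PID 1, which brings up user space.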

Q18: How do you analyze systemd boot failures?

Answer:

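  • Use journalctl -xb for logs from the current boot, or journalctl -b -1 for the previous boot (if the journal is persistent).

  • systemctl --failed to list failed units, and systemctl status <unit> for details on a single unit.

  • systemctl list-dependencies <unit> to see the dependency tree of a unit.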
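These commands chain together naturally. The sketch below assumes a systemd host for the live part; the helper names are illustrative, and only the canned example is executed.

```shell
# Reads `systemctl list-units --state=failed --no-legend --plain` rows
# (UNIT LOAD ACTIVE SUB DESCRIPTION) and prints just the unit names.
failed_unit_names() {
    awk 'NF { print $1 }'
}

# For each failed unit, show error-level log lines from the current boot.
# Only meaningful on a systemd host; defined here but not called.
show_failed_unit_logs() {
    systemctl list-units --state=failed --no-legend --plain |
        failed_unit_names |
        while read -r unit; do
            echo "== $unit =="
            journalctl -b -u "$unit" -p err --no-pager
        done
}

# Canned sample so the result does not depend on the host.
printf '%s\n' 'nginx.service loaded failed failed A high performance web server' | failed_unit_names
```

For slow rather than failed boots, `systemd-analyze blame` and `systemd-analyze critical-chain` show where the time went.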


10 FAQs About Linux Interview for SRE

Q1: What should I focus on for a Linux-based SRE interview?

Answer: Emphasize troubleshooting, performance monitoring, networking, and scripting.

Q2: Will I be asked kernel-level questions?

Answer: Not always. For senior roles, knowledge of kernel tuning (e.g., sysctl, memory management) is preferred.

Q3: Is scripting mandatory?

Answer: Yes. Bash is a must; Python is a strong bonus.

Q4: How deep should I know systemd?

Answer: Understand unit files, dependencies, logging, and debugging failed services.

Q5: Will containerization topics be included?

Answer: Often yes, especially namespaces, cgroups, and container networking.

Q6: What’s the role of Linux in incident response?

Answer: Linux logs, metrics, and performance tools are core to triaging and resolving production issues.

Q7: How do I prepare for hands-on scenarios?

Answer: Practice real-time debugging and try Linux scenarios using VMs or online sandboxes.

Q8: Are cloud-related Linux skills important?

Answer: Yes, especially running and debugging Linux in AWS, GCP, or Azure environments.

Q9: Should I know logrotate, rsyslog, journald?

Answer: Absolutely. Logging is a key part of observability in SRE.

Q10: What are some mock tasks I can try?

Answer:

  • Fix a broken cron job.

  • Write a script to restart a service on failure.

  • Set up monitoring for disk usage using Prometheus node exporter.


Conclusion

Preparing for a Linux interview for SRE roles requires a strong foundation in system internals, practical experience with troubleshooting tools, and exposure to real-world debugging scenarios. Whether you're managing servers at scale, maintaining uptime for critical applications, or automating infrastructure, Linux mastery is essential for a senior SRE. Use this guide as a reference, and practice each command in a terminal until you can reach for it with confidence.

Stay curious, keep learning, and best of luck with your next SRE interview!
