Linux Interview Questions and Answers for SRE
Table of Contents
- Introduction
- Core Linux Concepts
- Process Management
- System Performance and Monitoring
- Networking in Linux
- File Systems and Storage
- Security and Permissions
- Automation and Scripting
- Troubleshooting Scenarios
- Advanced Systemd and Boot Process
- 10 FAQs About Linux Interview for SRE
- Conclusion
Introduction
In the world of Site Reliability Engineering (SRE), Linux is not just an operating system—it's the very foundation upon which the reliability, scalability, and performance of systems rest. Whether you are preparing for your next senior SRE interview or brushing up on essential Linux topics, this guide will walk you through the most commonly asked Linux interview questions tailored specifically for SRE roles.
This article covers everything from system internals to real-world troubleshooting: exactly the kind of questions you can expect in a high-level technical interview.
Core Linux Concepts
Q1: What happens when you execute a command in Linux?
Answer: When you execute a command:
- The shell first checks whether the name is an alias, function, or builtin; otherwise it searches the directories listed in $PATH.
- If an executable is found, the shell creates a child process using fork().
- The child process replaces itself with the new program using execve().
- The parent shell waits for the child to complete (unless the command was run in the background).
This understanding is essential for SREs who troubleshoot user command failures or investigate high CPU usage from shell processes.
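The same sequence can be observed from a script. The following is a minimal sketch in POSIX shell, not a view into the shell's internals: the subshell stands in for the fork, exec for the execve() call, and the exit status 3 is an arbitrary value chosen for illustration.

```shell
# Step 1: resolve the command name the way the shell would.
resolved=$(command -v ls)
echo "ls resolves to: $resolved"

# Steps 2-3: "( ... ) &" forks a child; exec then replaces that child
# with a new program (here a tiny sh that exits with status 3).
( exec sh -c 'exit 3' ) &
child_pid=$!

# Step 4: the parent waits and collects the child's exit status.
status=0
wait "$child_pid" || status=$?
echo "child $child_pid exited with status $status"
```

If the parent never calls wait, the finished child stays in the process table until the parent exits.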
Q2: What are runlevels or targets?
Answer: In traditional SysV init, runlevels define system states (e.g., on Red Hat-style systems, 3 = multi-user and 5 = GUI). With systemd, they are replaced by targets such as graphical.target and multi-user.target.
To check the current default target, use:
systemctl get-default
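As a memory aid, the usual runlevel-to-target correspondence can be written as a small lookup. This is only a sketch: runlevel_to_target is a name invented here, and the mapping follows the runlevelN.target aliases that systemd provides for compatibility.

```shell
# Map a SysV runlevel number to the systemd target that replaces it.
# runlevel_to_target is a hypothetical helper name used only in this sketch.
runlevel_to_target() {
  case "$1" in
    0) echo "poweroff.target" ;;
    1) echo "rescue.target" ;;
    2|3|4) echo "multi-user.target" ;;
    5) echo "graphical.target" ;;
    6) echo "reboot.target" ;;
    *) echo "unknown runlevel: $1" >&2; return 1 ;;
  esac
}

runlevel_to_target 3
runlevel_to_target 5
```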
Process Management
Q3: How do you identify zombie and orphan processes?
Answer:
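- Zombie: A process that has finished executing but still has an entry in the process table because its parent has not yet collected its exit status. It shows up with a Z in the STAT column: ps aux | awk '$8 ~ /^Z/'
- Orphan: A process whose parent has exited. Orphans are adopted by init (PID 1).
Both types can cause issues in long-running production systems: zombies hold a PID until the parent reaps them with wait() and cannot be killed directly, while orphans keep running and may consume resources unnoticed. SREs should monitor for both.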
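A small filter makes the zombie check repeatable. The sketch below assumes input in the four-column layout produced by ps -eo pid,ppid,stat,comm; find_zombies is a hypothetical helper name, and the sample rows are made up for illustration.

```shell
# find_zombies is a hypothetical helper: it reads "PID PPID STAT COMMAND" rows
# on stdin and prints the PID, parent PID, and command of every zombie.
find_zombies() {
  awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Made-up sample rows. On a live host: ps -eo pid,ppid,stat,comm | find_zombies
find_zombies <<'EOF'
PID PPID STAT COMMAND
1 0 Ss init
812 640 Z worker
813 640 S worker
EOF
```

The parent PID in the second column is the process to investigate, since only the parent (or its exit) can clear the zombie.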
Q4: How do you limit process resources?
Answer: Use ulimit or cgroups:
- ulimit can limit CPU time, file sizes, open files, etc. for the current shell and the processes it starts.
- For advanced control, cgroups (Control Groups) allow hierarchical management of resource limits across users or services.
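To see that ulimit applies per shell, a limit can be lowered inside a subshell and compared with the caller's value. This is a sketch; the value 64 is arbitrary and assumes the existing hard limit for open files is at least that high.

```shell
# Lower the soft open-files limit inside a subshell so the change
# does not leak into the calling shell. The value 64 is arbitrary.
limited=$( ulimit -Sn 64; ulimit -Sn )
echo "soft open-files limit inside subshell: $limited"
echo "soft open-files limit in this shell:   $(ulimit -Sn)"
```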
System Performance and Monitoring
Q5: What tools do you use to monitor system performance?
Answer:
- top/htop: Live CPU/memory usage.
- vmstat/iostat: Virtual memory and I/O statistics.
- sar: Historical system activity.
- perf: Performance counters for deep profiling.
- dstat, atop, bpftrace, and netdata are also powerful tools in the SRE toolkit.
Q6: What are load average numbers in Linux?
Answer:
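Displayed as 1.52, 2.03, 1.78 in uptime output, they show the average number of processes that were runnable or in uninterruptible sleep (typically waiting on disk I/O) over the last 1, 5, and 15 minutes. A value persistently higher than the number of CPU cores may indicate CPU contention or, if CPU usage is low, stalled I/O.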
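Comparing load to core count is simple arithmetic. The sketch below uses a hypothetical helper, load_status, and treats a load per core above 1 as "over"; that threshold is a rule of thumb assumed here, not a hard limit.

```shell
# load_status is a hypothetical helper: given a load average and a core count,
# it prints the load per core and a rough verdict.
load_status() {
  awk -v load="$1" -v cores="$2" 'BEGIN {
    per_core = load / cores
    verdict = (per_core > 1) ? "over" : "ok"
    printf "%.2f %s\n", per_core, verdict
  }'
}

# Live use on Linux: the first field of /proc/loadavg is the 1-minute average.
if [ -r /proc/loadavg ]; then
  read -r one_min _rest < /proc/loadavg
  load_status "$one_min" "$(nproc 2>/dev/null || echo 1)"
fi
```

For example, a 1-minute load of 6 on a 4-core host gives 1.50 per core, which is worth a closer look with top or vmstat.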
Networking in Linux
Q7: How do you troubleshoot network issues in Linux?
Answer:
- ping, traceroute, and mtr for latency and routing issues.
- netstat or ss to inspect open ports and connections.
- tcpdump or wireshark for packet-level analysis.
- ip addr and ip route for interface and routing diagnostics.
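When checking open ports, it often helps to reduce the socket listing to just the listening port numbers. The sketch below assumes the column layout that ss -tln prints when only TCP is selected (state in the first column, local address in the fourth); listening_ports is a hypothetical helper and the sample rows are made up.

```shell
# listening_ports is a hypothetical helper: it reads `ss -tln` style rows on
# stdin and prints the unique local ports in LISTEN state, sorted numerically.
listening_ports() {
  awk '$1 == "LISTEN" { n = split($4, part, ":"); print part[n] }' | sort -n -u
}

# Made-up sample rows. On a live host: ss -tln | listening_ports
listening_ports <<'EOF'
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*
LISTEN  0       511     127.0.0.1:8080      0.0.0.0:*
LISTEN  0       128     [::]:22             [::]:*
EOF
```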
Q8: What is the difference between TCP and UDP?
Answer:
- TCP: Reliable, connection-oriented, ordered delivery. Use cases: HTTP, SSH.
- UDP: Unreliable, connectionless, low-latency. Use cases: DNS, VoIP.
Understanding protocols helps in optimizing performance and debugging failures.
File Systems and Storage
Q9: How do you check disk usage and inode usage?
Answer:
- Disk usage: df -h
- Inode usage: df -i
SREs often encounter "disk full" errors caused by inode exhaustion rather than a lack of free blocks, typically when a directory fills with a very large number of small files.
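Both checks can share one filter, because df -P and df -iP put the use percentage in the fifth column. The sketch below assumes that layout and mount points without spaces; over_threshold is a hypothetical helper and 90 is an arbitrary limit.

```shell
# over_threshold is a hypothetical helper: it reads `df -P` or `df -iP` style
# rows on stdin and prints mount points whose use% is at or above the limit.
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)
    if (pct + 0 >= limit) print $6, $5
  }'
}

echo "Block usage >= 90%:"; df -P  | over_threshold 90
echo "Inode usage >= 90%:"; df -iP | over_threshold 90
```

A filesystem that appears only in the second list is the classic case of "No space left on device" while df -h still shows free space.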
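Q10: How does the ext4 journaling filesystem work?
Answer:
ext4 maintains a journal (a reserved area on disk) that records pending changes before they are written to their final location. After a crash, the journal is replayed on mount to bring the filesystem back to a consistent state. ext4 supports three journaling modes: ordered (the default: only metadata is journaled, but data is written before the metadata is committed), writeback (only metadata is journaled, with no ordering guarantee), and journal (both data and metadata are journaled).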
Security and Permissions
Q11: How does Linux handle file permissions?
Answer:
- Three types: read (r), write (w), and execute (x).
- Permissions are set for owner, group, and others.
- Use chmod, chown, and umask to manage permissions.
- For ACLs:
setfacl -m u:john:rw file.txt
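The effect of chmod and umask is easiest to see on scratch files. The sketch below assumes a stat that supports -c (GNU coreutils or BusyBox); the file names are arbitrary.

```shell
# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)

# chmod 640: owner read/write, group read, others nothing.
touch "$workdir/report.txt"
chmod 640 "$workdir/report.txt"
explicit_mode=$(stat -c '%a' "$workdir/report.txt")
symbolic_mode=$(stat -c '%A' "$workdir/report.txt")
echo "after chmod 640: $explicit_mode ($symbolic_mode)"

# umask 027 clears group-write and all 'other' bits on new files (666 -> 640).
umask_mode=$( umask 027; touch "$workdir/new.txt"; stat -c '%a' "$workdir/new.txt" )
echo "created under umask 027: $umask_mode"

rm -rf "$workdir"
```

The umask is set inside a subshell so the calling shell keeps its own default.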
Q12: What is SELinux or AppArmor?
Answer:
Both are Mandatory Access Control (MAC) systems for Linux:
- SELinux: More granular, default in RHEL.
- AppArmor: Easier to manage, default in Ubuntu.
They provide policies to restrict program capabilities beyond file permissions.
Automation and Scripting
Q13: What’s your approach to automating Linux tasks?
Answer:
- Shell scripting for basic automation.
- Ansible or Chef for infrastructure automation.
- Cron jobs or systemd timers for scheduled execution.
- Python for advanced file/network operations.
Example cron job (runs the backup script every day at 01:00):
0 1 * * * /usr/local/bin/backup.sh
Q14: How do you debug a broken Bash script?
Answer:
Use set -x inside the script to trace each command as it runs, or run the whole script with bash -x script.sh.
Check exit codes using $? and use trap for signal handling and cleanup.
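These three techniques combine naturally. In the sketch below, run_job is a hypothetical job that fails on purpose with exit code 3; the trap removes its scratch file no matter how the job ends, and set -x sends the trace to stderr.

```shell
# run_job is a hypothetical job. Its body is a subshell ( ... ), so the trap
# and the tracing stay local to it.
run_job() (
  scratch=$(mktemp)
  trap 'rm -f "$scratch"' EXIT   # cleanup runs however the subshell exits
  set -x                         # trace every command to stderr from here on
  echo "$scratch"                # report the path so the caller can check cleanup
  exit 3                         # simulate a failing step
)

status=0
scratch_path=$(run_job) || status=$?
echo "exit code: $status"
[ -e "$scratch_path" ] || echo "scratch file was removed by the trap"
```

Capturing the status with "|| status=$?" keeps the caller alive even when it runs under set -e.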
Troubleshooting Scenarios
Q15: A system is running slow. Where do you begin?
Answer:
- Check CPU, memory, and IO: top, vmstat, iostat.
- Check dmesg and /var/log/messages for hardware issues.
- Use ps to find rogue processes.
- Use iotop for IO bottlenecks.
- Examine network usage using iftop.
Q16: Your app cannot write logs. How do you debug?
Answer:
- Check disk and inode space (df -h, df -i).
- Verify file/directory permissions and ownership.
- Look for SELinux/AppArmor blocks.
- Check whether logrotate recreated the log file with the wrong owner or mode (the create directive).
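The first two checks can be scripted as a quick probe. This is a sketch only: check_log_dir is a hypothetical helper, it must be run as the application's user to be meaningful, and it does not detect SELinux or AppArmor denials.

```shell
# check_log_dir is a hypothetical helper that reports why writes to a log
# directory might fail: missing directory, no write permission, or a failed
# write (for example a full disk or exhausted inodes).
check_log_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then echo "missing"; return 1; fi
  if [ ! -w "$dir" ]; then echo "not-writable"; return 1; fi
  probe="$dir/.write-probe.$$"
  if ! touch "$probe" 2>/dev/null; then echo "write-failed"; return 1; fi
  rm -f "$probe"
  echo "ok"
}

# Example against a scratch directory standing in for the real log directory.
demo_dir=$(mktemp -d)
check_log_dir "$demo_dir"
rmdir "$demo_dir"
```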
Advanced Systemd and Boot Process
Q17: How does the Linux boot process work?
Answer:
- BIOS/UEFI firmware loads the bootloader (GRUB).
- The bootloader loads the kernel.
- The kernel initializes hardware and mounts the root filesystem.
- init (or systemd) is started as PID 1 to bring up user space.
Q18: How do you analyze systemd boot failures?
Answer:
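- Use journalctl -xb for logs from the current boot, with explanatory text where available.
- Use systemctl status for unit status, and systemctl --failed to list failed units.
- Use systemctl list-dependencies for service trees.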
10 FAQs About Linux Interview for SRE
Q1: What should I focus on for a Linux-based SRE interview?
Answer: Emphasize troubleshooting, performance monitoring, networking, and scripting.
Q2: Will I be asked kernel-level questions?
Answer: Not always. For senior roles, knowledge of kernel tuning (e.g., sysctl, memory management) is preferred.
Q3: Is scripting mandatory?
Answer: Yes. Bash is a must; Python is a strong bonus.
Q4: How deeply should I know systemd?
Answer: Understand unit files, dependencies, logging, and debugging failed services.
Q5: Will containerization topics be included?
Answer: Often yes, especially namespaces, cgroups, and container networking.
Q6: What’s the role of Linux in incident response?
Answer: Linux logs, metrics, and performance tools are core to triaging and resolving production issues.
Q7: How do I prepare for hands-on scenarios?
Answer: Practice real-time debugging and try Linux scenarios using VMs or online sandboxes.
Q8: Are cloud-related Linux skills important?
Answer: Yes, especially running Linux in AWS/GCP/Azure environments.
Q9: Should I know logrotate, rsyslog, journald?
Answer: Absolutely. Logging is a key part of observability in SRE.
Q10: What are some mock tasks I can try?
Answer:
- Fix a broken cron job.
- Write a script to restart a service on failure.
- Set up monitoring for disk usage using Prometheus node exporter.
Conclusion
Preparing for a Linux interview for SRE roles requires a strong foundation in system internals, practical experience with troubleshooting tools, and real-world debugging scenarios. Whether you're managing servers at scale, maintaining uptime for critical applications, or automating infrastructure, Linux mastery is essential for a senior SRE. Use this guide as a reference and practice deeply on the terminal to enhance your confidence and command.
Stay curious, keep learning, and best of luck for your next SRE interview!