Linux Basics Every System Administrator Needs to Know

For a Linux system administrator, the real value lies not in memorizing commands but in understanding context, risk, and impact whenever you touch production systems.

Prologue: the mistake that changed my perspective

I remember that day clearly. It was a Friday evening in 2013. The sky was dark, and the server room felt even colder.

I was a junior IT support technician, enthusiastic and, honestly, a bit careless. We had a web app permission error, a tight deadline, and my boss calling every few minutes. In a panic, I typed the classic beginner sin: chmod -R 777 /var/www.

“Done,” I thought. “The app is running. I am a genius.”

Three days later, on Monday morning, that server had become part of a spam botnet. The website was defaced, the database was corrupted, and our IP address was blacklisted everywhere. I spent the next 48 hours rebuilding the server with no sleep and a lot of shame.

1.1 the core lesson

That day taught me one hard rule: In Linux, “it works” is not enough. You must know why it works.

This article is not a list of commands. It is a survival foundation to keep you safe from 3 a.m. panic calls.

Mindset: avoid copy-paste engineering

The biggest trap for beginner sysadmins is dependency on instant tutorials.

You see an error, paste it into Google, open the first Stack Overflow result, and paste the fix into production.

Stop. Breathe.

I have watched new admins do this. Sometimes I want to slap their keyboard. “Read first,” I say. “You just told the system to delete root.”

2.1 a risky habit

Linux is obedient. If you tell it to destroy itself, it will do it without hesitation.

System administration is not about memorizing 1,000 commands. It is about understanding the system's anatomy. Let us break it down properly.

Identity, privilege, and context

Before touching production, know who is executing a command and under what context.

3.1 core commands

whoami, id, groups
sudo -l
umask

3.2 practical rules

Run commands as a regular user first.
Use sudo only when truly required.
Separate admin, service, and app accounts.
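
Those rules are easier to follow when the context is printed before anything risky runs. A minimal sketch, assuming a POSIX shell. preflight is a hypothetical helper name, not a standard tool:

```shell
# preflight (hypothetical helper): show who is about to run commands, and where.
preflight() {
    echo "user:   $(id -un) (uid $(id -u))"
    echo "groups: $(id -Gn)"
    echo "umask:  $(umask)"
    echo "host:   $(uname -n)"
    if [ "$(id -u)" -eq 0 ]; then
        echo "WARNING: running as root; drop to a regular user unless root is truly required" >&2
    fi
}

preflight
```

Calling it at the top of a maintenance script costs nothing and catches the classic "wrong user, wrong host" mistake early.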

File permissions and ownership

Back to my fatal mistake: chmod 777.

Linux enforces strict ownership and permissions. Every file has an owning user and an owning group, and permissions are set separately for the owner, the group, and everyone else (others).

Read (4): view content.
Write (2): modify content. On a directory, it allows creating, renaming, and deleting entries.
Execute (1): run a program or enter a directory.
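
The three values add up, one digit per class of user: 6 = 4 + 2 (read and write), 5 = 4 + 1 (read and execute), 7 = 4 + 2 + 1 (everything). A small sketch that decodes a three-digit mode into the letters ls shows. decode_digit and decode_mode are hypothetical helper names:

```shell
# decode_digit (hypothetical helper): turn one octal digit into r/w/x letters
# by testing the 4, 2, and 1 bits.
decode_digit() {
    d=$1
    out=""
    if [ $((d & 4)) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $((d & 2)) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $((d & 1)) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
    printf '%s' "$out"
}

# decode_mode (hypothetical helper): decode owner, group, and others digits.
decode_mode() {
    u=$(printf '%s' "$1" | cut -c1)
    g=$(printf '%s' "$1" | cut -c2)
    o=$(printf '%s' "$1" | cut -c3)
    printf '%s%s%s\n' "$(decode_digit "$u")" "$(decode_digit "$g")" "$(decode_digit "$o")"
}

decode_mode 640   # rw-r-----
decode_mode 755   # rwxr-xr-x
decode_mode 777   # rwxrwxrwx
```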

4.1 why 777 is dangerous

777 means everyone can read, write, and execute. It is like leaving your house unlocked with a sign saying, “Take whatever you want.”

4.2 least-privilege practice

Never use 777 in production.

Apply least privilege so the app can run without exposing the system.

Config files: 640 when they hold secrets such as passwords or keys, 644 only when the content is safe for every local user to read.
Executable scripts: 755 (rwxr-xr-x).
Public upload directories: allow write only for the service account (for example www-data).
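
To see those modes applied, here is a sketch on a throwaway directory. The file names are made up, and stat -c assumes GNU coreutils. On a real server the upload directory would also be chown'ed to the service account, which needs root and is left out here:

```shell
# Build a scratch tree so nothing real is touched.
demo=$(mktemp -d)
touch "$demo/app.conf" "$demo/deploy.sh"
mkdir "$demo/uploads"

chmod 640 "$demo/app.conf"    # owner rw, group r, others nothing
chmod 755 "$demo/deploy.sh"   # owner rwx, group and others r-x
chmod 750 "$demo/uploads"     # only the owner may write; group may list and enter

# Read the modes back (GNU stat: %a prints the octal mode).
conf_mode=$(stat -c '%a' "$demo/app.conf")
script_mode=$(stat -c '%a' "$demo/deploy.sh")
upload_mode=$(stat -c '%a' "$demo/uploads")
rm -rf "$demo"

echo "app.conf=$conf_mode deploy.sh=$script_mode uploads=$upload_mode"
```

Reading the mode back after every change is the habit that matters. It turns "I think I fixed it" into something you can verify.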

4.3 permission checklist

ls -lah
chown -R user:group [path]
chmod u=rw,g=r,o= [file] (the symbolic form of 640)

Also learn setuid, setgid, and the sticky bit (/tmp uses it). These can be escalation paths if ignored.
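
A quick audit for these bits can be sketched with find. audit_perms is a hypothetical helper name, and the demo runs on a scratch directory rather than a real web root:

```shell
# audit_perms (hypothetical helper): list risky permission bits under a directory.
# -xdev keeps find on one filesystem; -perm -MODE matches when all MODE bits are set.
audit_perms() {
    echo "world-writable files:"
    find "$1" -xdev -type f -perm -0002 2>/dev/null
    echo "setuid or setgid files:"
    find "$1" -xdev -type f \( -perm -4000 -o -perm -2000 \) 2>/dev/null
    echo "world-writable directories without the sticky bit:"
    find "$1" -xdev -type d -perm -0002 ! -perm -1000 2>/dev/null
}

# Demo: one file anyone can write to, one that is locked down.
scratch=$(mktemp -d)
touch "$scratch/open.txt" "$scratch/closed.txt"
chmod 666 "$scratch/open.txt"
chmod 640 "$scratch/closed.txt"
report=$(audit_perms "$scratch")
rm -rf "$scratch"
echo "$report"
```

Only open.txt should appear in the report. On a real server, point it at the web root or an upload directory and question every line it prints.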

During client audits, I often think: “Who gave root ownership to this public folder?”

Processes and load average

Slow server? Do not restart first. Diagnose.

5.1 reading load average

Use top, htop, or uptime. The load average is reported over 1-, 5-, and 15-minute windows.

Supermarket analogy:

Your CPU is a cashier.

Load 0.0: idle.
Load 1.0 (single core): one person at the register, nobody waiting. The cashier is fully busy.
Load 5.0 (single core): five people in line. One is served, four are waiting.

If you have 4 cores, a load of 4 is acceptable. A load of 20 is a disaster.

Technical context:

Load average = runnable tasks + tasks in uninterruptible sleep (usually waiting on I/O).
High load can mean CPU saturation or slow storage.
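
The cashier arithmetic is just load divided by cores. A minimal sketch, where load_per_core is a hypothetical helper name and the live reading assumes Linux (/proc/loadavg) with GNU coreutils (nproc):

```shell
# load_per_core (hypothetical helper): divide a load average by a core count.
# awk does the floating-point division that sh arithmetic cannot.
load_per_core() {
    awk -v avg="$1" -v n="$2" 'BEGIN { printf "%.2f\n", avg / n }'
}

load_per_core 4 4     # 1.00: every core busy, nobody waiting
load_per_core 20 4    # 5.00: five tasks per core, four of them queuing

# Live reading, only where the Linux-specific pieces exist.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    echo "now: load(1m)=$load1 per_core=$(load_per_core "$load1" "$(nproc)")"
fi
```

A value above 1.00 per core only says tasks are waiting. It does not say whether they wait for CPU or for disk, which is what the next tools are for.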

5.2 diagnosis tools

Use a combination:

top or htop
vmstat 1 for r (run queue) and wa (CPU time waiting on I/O)
iostat -xz 1 for disk latency and utilization

5.3 killing with etiquette

Find the culprit: ps aux | grep [process_name]. Then be careful with kill.

kill -15 [PID] (SIGTERM): graceful shutdown.
kill -9 [PID] (SIGKILL): force stop. Last resort only.

I once ran kill -9 on a database during heavy transactions. Corrupted data followed.
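
The etiquette can be scripted: ask politely, wait, and escalate only if the process ignores you. A sketch, assuming a POSIX shell. stop_gracefully is a hypothetical helper name and the 10-second default grace period is an arbitrary choice:

```shell
# stop_gracefully (hypothetical helper): SIGTERM first, SIGKILL only if the
# process is still alive after the grace period (seconds, default 10).
stop_gracefully() {
    pid=$1
    grace=${2:-10}

    # If the signal cannot be delivered, the process is gone or not ours.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    while kill -0 "$pid" 2>/dev/null; do      # kill -0 only tests for existence
        if [ "$waited" -ge "$grace" ]; then
            echo "pid $pid ignored SIGTERM for ${grace}s, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demo on a harmless background process.
sleep 300 &
victim=$!
stop_gracefully "$victim" 5 && echo "pid $victim exited after SIGTERM"
```

For a database, the grace period should be generous. The return code tells you whether the last resort was needed, which is worth logging.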

Memory, cache, and OOM

Linux uses otherwise idle RAM as page cache and releases it when applications need memory. A low free figure is normal.

6.1 OOM signals

Key commands:

free -h (read the available column, not free)
cat /proc/meminfo
dmesg -T | grep -i oom

Common signals:

processes die without clear errors
logs show Out of memory: Kill process (newer kernels print Killed process)
heavy swap and high si/so in vmstat
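
The number worth watching is MemAvailable relative to MemTotal, not MemFree. A sketch that computes it, where mem_available_pct is a hypothetical helper name and the sample values are made up for illustration:

```shell
# mem_available_pct (hypothetical helper): read a meminfo-format file
# (default /proc/meminfo) and print MemAvailable as a whole percentage of MemTotal.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Made-up sample: MemFree looks alarming, MemAvailable tells the real story.
sample=$(mktemp)
cat > "$sample" <<'EOF'
MemTotal:        8000000 kB
MemFree:          200000 kB
MemAvailable:    2000000 kB
EOF
sample_pct=$(mem_available_pct "$sample")
rm -f "$sample"
echo "sample: ${sample_pct}% available"   # sample: 25% available

# Live reading on Linux. Very old kernels lack MemAvailable and would report 0.
if [ -r /proc/meminfo ]; then
    echo "this host: $(mem_available_pct)% available"
fi
```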

6.2 mitigation strategy

Do not just add RAM. Check for memory leaks, profile the app, and apply cgroup limits.

Log files and investigation

Your black box is /var/log.

7.1 key log locations

/var/log/syslog (Debian, Ubuntu) or /var/log/messages (RHEL family)
/var/log/nginx/error.log
journalctl -xe

7.2 reading in real time

Use tail -f to watch logs live. Reproduce the error and read the output.

There is a quiet satisfaction when your eyes catch the exact line: “Out of Memory: Kill process mysqld”.
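
When the error cannot be reproduced live, a filtered look at the end of the log is the next best thing. A sketch: recent_errors is a hypothetical helper name, the keyword list is a starting point rather than a standard, and the sample log is invented for illustration:

```shell
# recent_errors (hypothetical helper): scan the last N lines of a log
# (default 200) for common failure keywords, case-insensitively.
recent_errors() {
    tail -n "${2:-200}" "$1" | grep -iE 'error|crit|fail|denied|out of memory'
}

# Invented sample log, for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
Jan 10 03:12:01 web1 CRON[811]: (root) CMD (run-parts /etc/cron.hourly)
Jan 10 03:12:44 web1 kernel: Out of memory: Kill process 1234 (mysqld) score 912 or sacrifice child
Jan 10 03:12:45 web1 systemd[1]: mysql.service: Main process exited, code=killed, status=9/KILL
EOF
hits=$(recent_errors "$log")
rm -f "$log"
echo "$hits"
```

Only the kernel line matches here. The systemd line right after it is the consequence, which is why reading a few lines of context around every hit matters more than the filter itself.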

Disk usage and inodes

“Disk full even though files look small.” Classic.

8.1 size vs inodes

A disk can be full for two reasons:

Size full: large files.
Inodes full: too many small files.

Check space with df -h and inodes with df -i.

8.2 cleanup tactics

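Use du -h --max-depth=1 / | sort -hr to find what takes the space. For inode-heavy folders, find . -type f -delete removes every file under the current directory without asking, so cd into the exact folder first and run the same find without -delete to preview what will go.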

I once found 4 million spam queue files that crashed a mail server.
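
du will not find a folder like that, because the files are tiny. Counting files per directory will. A sketch, where count_files_per_dir is a hypothetical helper name and the demo tree is made up. It skips hidden directories and assumes file names without newlines:

```shell
# count_files_per_dir (hypothetical helper): for each immediate subdirectory
# of $1, print "<file count> <directory>", largest first.
count_files_per_dir() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue
        printf '%s %s\n' "$(find "$d" -xdev -type f | wc -l | tr -d ' ')" "$d"
    done | sort -rn
}

# Demo on a scratch tree: a busy queue and a quiet archive.
spool=$(mktemp -d)
mkdir "$spool/queue" "$spool/archive"
touch "$spool/queue/msg1" "$spool/queue/msg2" "$spool/queue/msg3" "$spool/archive/old1"
ranking=$(count_files_per_dir "$spool")
rm -rf "$spool"
echo "$ranking"
```

The queue directory comes out on top with 3 files. On a real server, start at the mount point that df -i reports as full and walk down one level at a time.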

Filesystem and mounts

Production issues often come from bad mounts or unstable storage.

9.1 essential tools

lsblk, blkid
mount, findmnt
/etc/fstab

9.2 safer practices

Mount by UUID, not /dev/sdX.
Use nofail and x-systemd.automount for non-critical disks.
Watch I/O errors with dmesg -T | grep -i error.
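
The first rule is easy to check mechanically. A sketch that flags fstab entries still using kernel-assigned device names: unstable_fstab_sources is a hypothetical helper name, and the sample fstab (including its UUIDs) is a made-up placeholder:

```shell
# unstable_fstab_sources (hypothetical helper): print fstab entries whose
# first field is a kernel-assigned name such as /dev/sdb1. Those names can
# change between boots; UUID= and LABEL= do not.
unstable_fstab_sources() {
    awk '!/^[[:space:]]*#/ && $1 ~ /^\/dev\/(sd|hd|vd|xvd|nvme)/ { print $1, "->", $2 }' "${1:-/etc/fstab}"
}

# Made-up sample; the UUIDs are placeholders, not real devices.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# <source>                                  <mount>  <type> <options>                   <dump> <pass>
UUID=0a1b2c3d-1111-2222-3333-444455556666   /        ext4   defaults                    0      1
/dev/sdb1                                   /backup  ext4   defaults,nofail             0      2
UUID=9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff   /data    xfs    nofail,x-systemd.automount  0      0
EOF
flagged=$(unstable_fstab_sources "$sample")
rm -f "$sample"
echo "$flagged"   # /dev/sdb1 -> /backup
```

Run without an argument, it reads /etc/fstab. Use blkid to look up the UUID for anything it flags.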

Networking for sysadmins

Sometimes the problem is not the server, but the network.

10.1 must-have tools

ping
curl -v
ss -tulpn

10.2 DNS and ports

Extra essentials:

DNS debug: dig +short domain, resolvectl status, /etc/resolv.conf
TCP: ss -tan state established
Firewall: ufw status or iptables -L -n -v / nft list ruleset
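
I usually check name resolution before anything else, because a DNS failure looks like every other failure. A sketch, where check_name is a hypothetical helper name. It uses getent, which resolves the way applications do (through /etc/nsswitch.conf, so /etc/hosts counts), while dig queries DNS servers directly:

```shell
# check_name (hypothetical helper): resolve a host name the way applications do.
check_name() {
    if addr=$(getent hosts "$1" 2>/dev/null); then
        echo "resolves: $addr"
    else
        echo "cannot resolve $1; check /etc/resolv.conf and try: dig +short $1" >&2
        return 1
    fi
}

# .invalid is a reserved TLD that never resolves, so this exercises the failure path.
check_name no-such-host.invalid || echo "name resolution is the first suspect"

# Show listening TCP sockets if ss (iproute2) is installed.
if command -v ss >/dev/null 2>&1; then
    ss -tln || echo "ss could not read socket information" >&2
fi
```

If getent and dig disagree, the problem is usually in /etc/hosts or nsswitch.conf rather than in DNS.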

Service management with systemd

In modern distros, systemd is the control center. Do not just restart; read the status first.

11.1 status and logs

systemctl status service
journalctl -u service -b
systemctl show service

11.2 troubleshooting flow

Check status
Read logs
Fix config and permissions
Restart after the fix
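
The first two steps of that flow fit in a small function that deliberately never restarts anything. A sketch, where triage_service is a hypothetical helper name and 50 journal lines is an arbitrary amount:

```shell
# triage_service (hypothetical helper): status first, logs second, no restart.
# Returns 0 if the unit is active, 1 if not, 2 if systemctl is unavailable.
triage_service() {
    svc=$1
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found; this host does not appear to use systemd" >&2
        return 2
    fi
    if systemctl is-active --quiet "$svc"; then
        echo "$svc is active"
        return 0
    fi
    echo "$svc is not active; status and recent journal entries follow" >&2
    systemctl status "$svc" --no-pager
    journalctl -u "$svc" -b --no-pager -n 50
    return 1
}

# Usage (the unit name is only an example):
# triage_service nginx
```

Leaving the restart out is the point. The restart belongs after the config or permission fix, not before you have read the output.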

Packages and updates

Blind updates in production are risky.

12.1 version control

apt-cache policy or dnf info
apt-mark hold

12.2 update strategy

Snapshot the VM before major updates.
Read changelogs for critical services.
Test in staging before production.

Closing: mindset over memorization

Being a senior sysadmin is not about memorizing every tar flag. It is about calmness and method.

When production goes down, follow this flow:

Check connectivity.
Check CPU and RAM.
Read logs.
Isolate the issue.
Apply the fix.

Do not guess. Do not shoot in the dark.

Open your terminal. Type whoami. Take responsibility for every command you run.
