I wrote this wiki for my interns and junior admins as a quick overview of basic things to check and use when they encounter an issue. The intent of this guide is to provide nothing more than the fundamental tools one uses to operate a Linux server, and to act as a central jumping-off point (use the links to learn more). It was originally written around 2014/15, but it remains relevant because of the core nature of these utilities, and they will most likely still be relevant to troubleshooting and navigating a Linux server in 10 or 20 years.

This was written during the height of CentOS 6. Some concepts may be stale and out of date.

Summary

There is a practically infinite number of tools and ways to resolve an issue. 10 times out of 10, if you have a problem, someone else has already encountered it and documented how to resolve it somewhere out there; search for it on google.com.

  • EVERYTHING on Linux is a file. Understanding this fact is key to understanding Linux, permissions, access, drivers, kernels, and debugging.
  • Follow through! Do not just hit enter and walk away; understand the consequences (good and bad) of what you are doing before you change something.
  • Start small. The first thing every aspiring Linux admin should build is a LAMP or LAPP stack; then move on to more complicated services and setups.
  • Take notes, so that you remember where you got stuck and what you did. That turns your work into a repeatable process that can be duplicated many times over (this is what the wiki is for).
  • Don’t panic! When you mess up, learn how to fix it; this will make you a better admin. Trial by fire is the way of life for all of us.
  • Take snapshots and backups, use SVN/Git, and copy files before any major change. Always start in Testing/DEV and work your way up to BTA/PROD deployments.
  • Ask questions and discuss what you are doing before and after you change something (even if it is just research). Measure twice, cut once: this lets everyone know where you are and what you are doing, so that if something bad happens we can quickly triage the situation.

Bash - Shell Scripting

Bash scripting is the bread and butter of systems administration. Too often we have to repeat a set of commands across many machines, and everyone's goal should be to automate as much of that as possible. Any command that you can type on the command line is valid in a bash script, along with basic if/elif/else logic, for loops, variable substitution, and arithmetic expressions.

#!/bin/bash

Wikipedia:Bash (Unix shell)
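A minimal sketch of those building blocks in one script: a function, if/elif/else, a for loop, variable substitution, and $(( )) arithmetic. The mount points and percentages are made-up sample data (not read from a real system), and the 75/90 thresholds are arbitrary choices for illustration.

```shell
#!/bin/bash
# Sketch of basic bash building blocks; the data below is hypothetical.

# classify_usage: turn an integer disk-usage percentage into a status word
classify_usage() {
    pct=$1
    if [ "$pct" -ge 90 ]; then
        echo "CRITICAL"
    elif [ "$pct" -ge 75 ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

warnings=0
# each entry is "mountpoint:percent" (sample values, not real df output)
for entry in /:43 /boot:24 /mnt/remote_data:91; do
    mount=${entry%:*}        # variable substitution: everything before the last colon
    pct=${entry##*:}         # everything after the last colon
    status=$(classify_usage "$pct")
    echo "$mount is ${pct}% full: $status"
    if [ "$status" != "OK" ]; then
        warnings=$(( warnings + 1 ))   # arithmetic expansion
    fi
done
echo "$warnings filesystem(s) need attention"
```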

Man Pages

If you are ever unsure what a command does, man pages are the way to read, locally, how to use any command in Linux. A man page is literally the command's manual in text format.

$ man ifconfig
 IFCONFIG(8)                Linux Programmer’s Manual               IFCONFIG(8)
 
 NAME
        ifconfig - configure a network interface
 
 SYNOPSIS
        ifconfig [interface]
        ifconfig interface [aftype] options | address ...
 
 DESCRIPTION
        Ifconfig  is  used to configure the kernel-resident network interfaces.
        It is used at boot time to set up interfaces as necessary.  After that,
        it  is  usually  only  needed  when  debugging or when system tuning is
        needed.
 
        If no arguments are given, ifconfig displays the  status  of  the  cur-
        rently  active interfaces.  If a single interface argument is given, it
        displays the status of the given interface only; if a single  -a  argu- 
        ment  is  given,  it  displays the status of all interfaces, even those
        that are down.  Otherwise, it configures an interface.
  [there is much more to this document in full]

/var/log/messages

/var/log/messages is the default catch-all for system errors. Many services also set up their own logs, so always check first whether the service has its own log file or directory (httpd, cron, secure, php-fpm, mysql, postgresql). If it does not exist or is empty, check /var/log/messages: it will probably have the errors, warnings, INFO messages, emergencies, kernel panics, and any other frightening logs you need to resolve the issue.
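The check-the-service-log-first habit can be written down as a small helper. pick_log is a hypothetical name, not an existing tool; this sketch prints the service's log path if that file (or any file inside that log directory) is non-empty, and otherwise falls back to /var/log/messages.

```shell
# pick_log SERVICE_LOG [FALLBACK]
# Prints SERVICE_LOG if it is a non-empty file, or a directory containing at
# least one non-empty file; otherwise prints FALLBACK (default /var/log/messages).
pick_log() {
    svc_log=$1
    fallback=${2:-/var/log/messages}
    if [ -d "$svc_log" ]; then
        # -size +0c matches files larger than zero bytes
        if [ -n "$(find "$svc_log" -type f -size +0c 2>/dev/null | head -n 1)" ]; then
            echo "$svc_log"
            return
        fi
    elif [ -s "$svc_log" ]; then
        echo "$svc_log"
        return
    fi
    echo "$fallback"
}
```

For example, tail -n 50 "$(pick_log /var/log/cron)" shows the cron log if it has content, and the last lines of /var/log/messages otherwise.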

journalctl

The journalctl command reads system and application logs collected by systemd's journal. This is the newer, sexier way to quickly access logs, but the journal does not keep logs forever: depending on the distribution, it may be kept only in memory (and lost at reboot) or capped at a maximum size, after which the oldest entries are discarded.

$ journalctl -f       # follow the log as new entries arrive
$ journalctl -n 100   # show the last 100 journal entries

If you have checked both the service log and /var/log/messages and still cannot find any information about the service you are trying to fix, you may have to enable debugging mode(s) or level(s) on the service itself. This differs for every service and can usually be found in the product's documentation.

Machine Info

The following is a collection of commands, and their output, for gathering basic information about the machine.

Get the host name, kernel version, and kernel build date with uname

$ uname -a
Linux devsql005.example.com 3.10.0-123.8.1.el7.x86_64 #1 SMP Mon Sep 22 19:06:58 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

OS release info can be found in the following files

$ cat /etc/redhat-release   # shows the official CentOS/Red Hat version number
  CentOS Linux release 7.0.1406 (Core)

or for Ubuntu

$ cat /etc/lsb-release                                             
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"

Use uptime to get the amount of time since the server started, along with the load averages

$ uptime
 20:40:43 up 1 day, 22:36,  0 users,  load average: 0.23, 0.16, 0.18

IP address and network information

$ ip addr; 
 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
 2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
     link/ether 00:50:56:84:59:79 brd ff:ff:ff:ff:ff:ff
     inet 10.0.0.[?]/24 brd 10.0.0.255 scope global ens160
        valid_lft forever preferred_lft forever
     inet6 fe80::250:56ff:fe84:5979/64 scope link
        valid_lft forever preferred_lft forever
$ ifconfig;
 ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 10.0.0.[?]  netmask 255.255.255.0  broadcast 10.0.50.255
         inet6 fe80::250:56ff:fe84:5979  prefixlen 64  scopeid 0x20<link>
         ether 00:50:56:84:59:79  txqueuelen 1000  (Ethernet)
         RX packets 1675151509  bytes 296413707167 (276.0 GiB)
         RX errors 0  dropped 235945  overruns 0  frame 0
         TX packets 1428044385  bytes 471991954342 (439.5 GiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
 lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
         inet 127.0.0.1  netmask 255.0.0.0
         inet6 ::1  prefixlen 128  scopeid 0x10<host>
         loop  txqueuelen 0  (Local Loopback)
         RX packets 409040766  bytes 330694444196 (307.9 GiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 409040766  bytes 330694444196 (307.9 GiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Find the directory you are in

$ pwd 
/var/lib/pgsql/9.3/data

Show local users

$ cat /etc/passwd;
 root:x:0:0:root:/root:/bin/bash
 bin:x:1:1:bin:/bin:/sbin/nologin
 daemon:x:2:2:daemon:/sbin:/sbin/nologin
 adm:x:3:4:adm:/var/adm:/sbin/nologin
 lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
 sync:x:5:0:sync:/sbin:/bin/sync
 shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
 halt:x:7:0:halt:/sbin:/sbin/halt
 mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
 ....

Show local groups

$ cat /etc/group;
 root:x:0:
 bin:x:1:
 daemon:x:2:
 sys:x:3:
 adm:x:4:
 tty:x:5:
 disk:x:6:
 ....

Find which groups the user ‘apache’ is in

$ groups apache;
 apache : apache svc_account1

grep/awk/sed

Learn these three tools; they are the Swiss Army knives of the systems administrator's war chest. They can do almost anything, and you will see many example combinations and use cases in the sections below.

You can be like Captain Planet: when you combine the powers of all three tools, you can create very powerful and flexible scripts that do all the heavy lifting for you.

grep

grep is a tool mainly used for searching or weeding out information.

$ grep -r mysql   # searches every file under the current directory, recursively, for lines containing the word mysql
$ grep user@fqdn.com /var/log/maillog   # grep can read the file directly; there is no need to pipe it through cat

https://en.wikipedia.org/wiki/Grep
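A few more grep flags come up constantly. This sketch wraps them in two hypothetical helper functions: -r recurses, -i ignores case, -l prints only file names, -n adds line numbers, and -F treats the pattern as a fixed string. -F matters for patterns like user@fqdn.com, because an unescaped . in a regular expression matches any character.

```shell
# files_mentioning DIR PATTERN: names of files under DIR containing PATTERN, in any case
files_mentioning() {
    grep -ril -- "$2" "$1"
}

# lines_mentioning DIR TEXT: matching lines (with file name and line number),
# treating TEXT literally rather than as a regular expression
lines_mentioning() {
    grep -rnF -- "$2" "$1"
}
```

For example, files_mentioning /etc mysql lists config files that mention MySQL in any capitalization.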

awk

I generally pipe a command's output into awk to pull out the parts of the returned information I need. But awk is its own programming language and can be just as powerful as Perl or any other scripting language.

$ df -h -t nfs -P | grep /vol/ | awk '{ print $5 " " $6 }'
# df -h lists the mounts (-t nfs limits it to NFS, -P keeps each mount on one line), grep /vol/ keeps the mounted volumes, then awk prints fields $5 and $6 of each line, which are always the used percentage and the mount point in question.
$ grep "request ID" history | awk 'match($0, "is") { print substr($0, RSTART + 3, 50) }'
# this searches for "request ID" in a file called history; awk finds the first "is" on each line and prints up to 50 characters starting just after "is " (RSTART points at the "i"), which in this example is always the request ID.

http://en.wikipedia.org/wiki/AWK
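Building on the df example, awk can also do the filtering itself instead of grep. A minimal sketch, assuming df -P output (one filesystem per line, use% in field 5, mount point in field 6); over_threshold is a hypothetical name and the limit is whatever percentage you choose. Adding 0 to a field such as "66%" makes awk read it as the number 66.

```shell
# over_threshold LIMIT < df-output
# Prints "use% mountpoint" for every filesystem at or above LIMIT percent.
over_threshold() {
    # NR > 1 skips df's header line; "$5 + 0" turns "66%" into 66
    awk -v limit="$1" 'NR > 1 && $5 + 0 >= limit + 0 { print $5, $6 }'
}
```

Usage: df -P | over_threshold 80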

sed

sed is used for inline text editing and manipulation

$ sed -n 51,61p sbr/index.html | sed -i '50r /dev/stdin' testsed
# this takes lines 51-61 of index.html and appends them after line 50 in testsed (GNU sed reads the piped text via /dev/stdin)

$ IP=$(nslookup "$HOSTNAME" | grep Address | grep -v "#53" | awk '{ print $2 }')
# looks up the host name in DNS, keeps the Address lines, ignores the DNS server's own address (the one ending in #53), and prints the second value on the remaining line
$ echo "$IP"
$ sed -i 's/0\.0\.0\.0/'"$IP"'/g' /etc/monitrc
# edits the file in place, replacing every 0.0.0.0 with the value of $IP (the dots are escaped because an unescaped . matches any character)
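Two details make that substitution safer: escaping the dots (an unescaped 0.0.0.0 can also match digits such as 0203040 inside a longer number) and double-quoting the variable. A minimal sketch, with replace_ip as a hypothetical helper; the .bak suffix on -i keeps a copy of the original file, which fits the "copy files before any major changes" rule.

```shell
# replace_ip FILE NEW_IP
# Replaces every literal 0.0.0.0 in FILE with NEW_IP, keeping FILE.bak as a backup.
replace_ip() {
    sed -i.bak 's/0\.0\.0\.0/'"$2"'/g' "$1"
}
```

Usage: replace_ip /etc/monitrc "$IP"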

CBSG

$ curl -s http://cbsg.sourceforge.net/cgi-bin/live | grep -Eo '^<li>.*</li>' | sed 's,</\?li>,,g' | shuf -n 1

https://en.wikipedia.org/wiki/Sed

Disk is full

The following commands and explanations cover the many things you can do to find and remove large log/temporary files, files with unusual names, and similar scenarios.

Below are the areas most likely to fill up; they should be the first places you look for space that can be reclaimed:

  1. /var/log
  2. /var/spool/{clientmqueue,mqueue,mail}
  3. /tmp
  4. /var/tmp

df

df -h shows the local and mounted filesystems and the amount of space on each, used vs. available, in human-readable units.

$ df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/sda3                                12G  4.9G  6.7G  43% /
/dev/sda1                               497M  115M  383M  24% /boot
10.10.0.248:/vol/remote_data            1.9T  1.2T  628G  66% /mnt/remote_data

du

du -sh /var/log adds up the size of every file under a directory and prints the total; du -shx /var/log/* prints one total per entry and does not cross into other filesystems.

$ du -sh /var/log
113M    /var/log
$ du -shx /var/log/*
4.0K    /var/log/alternatives.log
100K    /var/log/apt
0       /var/log/btmp
4.0K    /var/log/dist-upgrade
92K     /var/log/dpkg.log
4.0K    /var/log/journal
4.0K    /var/log/landscape
4.0K    /var/log/lastlog
4.0K    /var/log/unattended-upgrades
0       /var/log/wtmp
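When hunting for what filled the disk, it helps to sort du's output so the biggest entries come first. A minimal sketch; biggest is a hypothetical helper name. It uses du -k (sizes in KiB) rather than -h because sort -n cannot compare human-readable suffixes like 113M and 4.0K (GNU sort's -h option can, if you prefer).

```shell
# biggest DIR [COUNT]
# Lists the COUNT largest entries directly under DIR (default 10), largest first,
# sizes in KiB, without crossing into other filesystems (-x).
biggest() {
    du -xsk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Usage: biggest /var/log 5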

Clearing large Log files

Do not just rm a log file that a process (say, apache) still has open. Removing the file does not free the space: the process keeps writing to the now-deleted file, and the space is not released until the process closes it or is restarted. Remember, we wish to have an uptime of 24/7/365. Instead, clear the file out to size 0; this frees the space and allows the process to continue writing to it.

$ cat /dev/null > /var/log/messages
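The same truncation can be written as : > file (or truncate -s 0 file with GNU coreutils). The sketch below, using a hypothetical clear_log helper and a scratch file standing in for a real log, shows why it is safe for a running process: the file keeps its inode, and a writer that has it open in append mode simply carries on writing to the now-empty file. One caveat: a program that opened its log without append mode keeps writing at its old offset, which can leave a sparse file that looks as large as before.

```shell
# clear_log FILE: empty FILE in place; its inode and any open handles stay valid
clear_log() {
    : > "$1"
}

# Demonstration with a scratch file standing in for /var/log/messages:
log=$(mktemp)
exec 3>>"$log"          # pretend a daemon holds the log open (append mode)
echo "old line" >&3
clear_log "$log"        # size drops to 0, same inode
echo "new line" >&3     # the "daemon" keeps writing; no restart needed
exec 3>&-               # close the handle
cat "$log"              # prints: new line
```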

No Space left to Delete

Rarely, but it does happen, a volume fills up so completely that rm itself fails with "No space left on device" (on some filesystems, deleting a file needs a little free space for metadata). Another way to try is to find the file's inode number and use find to delete the file.

Let’s assume you did all the steps above and found the large offending file.

$ rm /var/tmp/3a8066e5-a90c-4ae5-bdc6-47e117acf354.error
 rm: remove regular file ‘/var/tmp/3a8066e5-a90c-4ae5-bdc6-47e117acf354.error’? y
 rm: cannot remove ‘/var/tmp/3a8066e5-a90c-4ae5-bdc6-47e117acf354.error’: No space left on device
$ ls -li /var/tmp/3a8066e5-a90c-4ae5-bdc6-47e117acf354.error
 56436168 -rw-r--r-- 1 gerbn308 zxdev 0 May 13 11:17 /var/tmp/3a8066e5-a90c-4ae5-bdc6-47e117acf354.error
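$ find /var/tmp -xdev -inum 56436168 -delete
# inode numbers are only unique within one filesystem, so point find at the right directory and use -xdev to stay on that filesystem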
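The inode trick is also the easiest way to remove a file whose name is hard to type (leading dashes, spaces, control characters). A minimal sketch with a hypothetical delete_by_inode helper; swap -delete for -print first if you want to see what would be removed.

```shell
# delete_by_inode DIR INODE
# Deletes whatever file under DIR has inode number INODE. Inode numbers are only
# unique per filesystem, so -xdev keeps find from wandering onto other mounts.
delete_by_inode() {
    find "$1" -xdev -inum "$2" -delete
}

# Example: look up the inode with ls -i, then delete by number
#   $ ls -li /var/tmp
#   $ delete_by_inode /var/tmp 56436168
```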

find

find is really, really useful; below are some of the ways find has solved strange issues for me.

find date range

$ ls -ltr
# lists files in time/date order, oldest -> newest (remove the r for newest -> oldest); on many systems ll is an alias for ls -l, so ll -tr does the same
$ find . -type f -newer file_xyz.txt ! -newer recent_zzy.txt -exec ls -l {} \; -print &> output.txt
# check output.txt to make sure you got the correct range of data: grep it for the earliest and latest dates
$ find . -type f -newer file_xyz.txt ! -newer recent_zzy.txt -delete
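Before running a -delete over a date range, it is worth seeing exactly which files the range covers. The hypothetical files_between helper below only prints. Note that the upper marker file itself is included (! -newer means "not strictly newer than"), while the lower marker is not. If there is no convenient marker file, touch -t can create one for any date, e.g. touch -t 201405010000 /tmp/start (a hypothetical path).

```shell
# files_between DIR OLDER_MARKER NEWER_MARKER
# Prints regular files under DIR modified after OLDER_MARKER and no later than NEWER_MARKER.
files_between() {
    find "$1" -type f -newer "$2" ! -newer "$3"
}
```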

find a specific file type

In this example we are finding all mp3 files and moving them to a USB drive mounted on /mnt/mp3

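$ find / -path /mnt/mp3 -prune -o -type f -iname "*.mp3" -print | wc -l
# count them first: let's assume there is a mix of file types with no standard naming convention, and there are thousands of them
$ find / -path /mnt/mp3 -prune -o -type f -iname "*.mp3" -exec mv {} /mnt/mp3/ \;
# -iname is case-insensitive, -prune skips the destination itself, and -exec needs its closing \; (you can have any command in the -exec section)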
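A reusable version of the move, as a hypothetical collect_by_pattern helper. Three details are worth checking against your case: -type f skips directories whose names happen to end in .mp3, -prune stops find from descending into the destination when it lives inside the source tree (as /mnt/mp3 does inside /), and mv -n refuses to overwrite when two files share a name, which is likely with thousands of files from different directories.

```shell
# collect_by_pattern SRC DEST PATTERN
# Moves every regular file under SRC whose name matches PATTERN (case-insensitive)
# into DEST. DEST is pruned in case it lives inside SRC; mv -n never overwrites.
# Give DEST without a trailing slash so -path matches it exactly.
collect_by_pattern() {
    find "$1" -path "$2" -prune -o -type f -iname "$3" -exec mv -n {} "$2"/ \;
}
```

Usage: collect_by_pattern / /mnt/mp3 '*.mp3'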

find anything older than 12 hours

Sometimes you have to relieve some pressure so the file system does not fill up, without clearing out files that may be actively worked on.

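$ find . -type f | wc -l
# counts the files under the current directory (ls -R | wc -l also counts directory names and blank lines)
$ find . -type f -mmin +720 -delete
# finds and deletes every file last modified more than 720 minutes (12 hours) ago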
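Because -delete cannot be undone, a safer habit is a dry run first: list what would go, check it, then rerun with -delete. A minimal sketch with a hypothetical list_older_than helper (the age is in minutes, like -mmin).

```shell
# list_older_than DIR MINUTES
# Prints regular files under DIR last modified more than MINUTES minutes ago.
# Once the list looks right, the same find with -delete appended removes them.
list_older_than() {
    find "$1" -type f -mmin +"$2"
}
```

Usage: list_older_than /var/tmp 720 | wc -l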

find like grep

find can be used much in the same way as grep to search for the name of files in a directory.

$ cd /usr/local/lib
$ find ./ -name "*gdal*" -print
# prints all files containing the word gdal in their name
$ find ./ -name "*gdal*" -delete
# deletes all the files containing the word gdal in their name

Server seems slow

This is the holy grail of complaints as there are literally millions of things that could cause a server to be slow.

top

top gives you a Task Manager-like peek into the activity of the server. While top is running, press u to show only a specific user's processes, and c to toggle the full command line of each process. By default, top sorts processes by %CPU, highest first.

$ top -u apache
  top - 08:37:14 up 41 days, 20:30,  3 users,  load average: 0.00, 0.04, 0.08
  Tasks: 140 total,   3 running, 136 sleeping,   0 stopped,   1 zombie
  %Cpu(s):  1.7 us,  2.0 sy,  0.0 ni, 95.9 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
  KiB Mem:   1885520 total,  1339856 used,   545664 free,        0 buffers
  KiB Swap:  4095996 total,    52368 used,  4043628 free.   462444 cached Mem
  
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  11462 apache    20   0 1005776  12212   4072 S  0.0  0.6   1:13.79 httpd
  11492 apache    20   0 1005640  12688   4208 S  0.0  0.7   1:13.22 httpd
  11493 apache    20   0 1005636  12124   4104 S  0.0  0.6   1:13.60 httpd
  12802 apache    20   0  254124   1636    836 S  0.0  0.1   0:00.00 httpd
  12803 apache    20   0  255300   1532    672 S  0.0  0.1   0:14.62 httpd
  12804 apache    20   0  255300   1508    652 S  0.0  0.1   0:00.18 httpd
  12806 apache    20   0 1005940  12892   3548 S  0.0  0.7   4:35.21 httpd
  19201 apache    20   0  656156  13036   1832 S  0.0  0.7   0:00.03 php-fpm
  19202 apache    20   0  656156  13036   1832 S  0.0  0.7   0:00.03 php-fpm

free

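The free command gives a snapshot of current memory and swap usage. Remember to subtract buffers and cached memory from used to get an accurate picture of the RAM actually in use; in the older output below, the -/+ buffers/cache line does that subtraction for you. Newer versions of free drop that line and show an available column instead.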

$ free
              total       used       free     shared    buffers     cached
 Mem:       3924876    3663056     261820          0     298528    1755512
 -/+ buffers/cache:    1609016    2315860
 Swap:      4194296     235456    3958840
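Working that subtraction through with the numbers above: 3663056 used - 298528 buffers - 1755512 cached = 1609016 KiB, which is exactly the used figure on the -/+ buffers/cache line. A small awk sketch does the same arithmetic; it assumes the older free layout shown here (buffers and cached in columns 6 and 7 of the Mem: line), so do not use it on newer versions whose columns differ. real_used_kib is a hypothetical name.

```shell
# real_used_kib < free-output
# Prints used minus buffers minus cached from the Mem: line of the older
# procps "free" layout: Mem: total used free shared buffers cached
real_used_kib() {
    awk '$1 == "Mem:" { print $3 - $6 - $7 }'
}
```

Usage on an older system: free | real_used_kib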

ps

The ps (process status) command gives a very detailed account of every process running at the exact moment you enter the command. It does not refresh like top, but it can clue you in to process trees and zombie/dead/defunct processes that might not show up in top. The output below is truncated, as it can be quite lengthy.

$ ps fax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:18 /sbin/init
  480 ?        S<s    0:00 /sbin/udevd -d
 5799 ?        S<     0:00  \_ /sbin/udevd -d
 1057 ?        S      1:29 /opt/chef-server/embedded/service/bookshelf/erts-5.9.3.1/bin/epmd -daemon
 1349 ?        Sl   174:03 /usr/sbin/vmtoolsd
 1836 ?        S<sl   0:47 auditd
 1854 ?        Ss     0:00 /sbin/portreserve
 ...
 2194 ?        Ssl    1:07 hald
 2195 ?        S      0:00  \_ hald-runner
 2224 ?        S      0:00      \_ hald-addon-input: Listening on /dev/input/event2 /dev/input/event0
 2235 ?        S      0:00      \_ hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
 2255 ?        Ssl    1:44 automount --pid-file /var/run/autofs.pid
 2307 ?        Ss     0:00 rpc.rquotad
 3078 tty2     Ss+    0:00 /sbin/mingetty /dev/tty2
 3081 tty3     Ss+    0:00 /sbin/mingetty /dev/tty3
 3083 tty4     Ss+    0:00 /sbin/mingetty /dev/tty4
 3092 tty5     Ss+    0:00 /sbin/mingetty /dev/tty5
 3095 tty6     Ss+    0:00 /sbin/mingetty /dev/tty6
 4846 tty1     Ss+    0:00 /sbin/mingetty /dev/tty1
21685 ?        Ss     3:27 /usr/sbin/httpd
 7578 ?        S      0:18  \_ /usr/sbin/httpd
 7579 ?        S      0:00  \_ /usr/sbin/httpd
 7580 ?        S      0:00  \_ /usr/sbin/httpd
 7581 ?        S      0:00  \_ /usr/sbin/httpd
 7582 ?        S      0:00  \_ /usr/sbin/httpd
 7583 ?        S      0:00  \_ /usr/sbin/httpd
 7584 ?        S      0:00  \_ /usr/sbin/httpd
 7585 ?        S      0:00  \_ /usr/sbin/httpd
 7586 ?        S      0:00  \_ /usr/sbin/httpd
 6584 ?        Sl    24:30 /usr/bin/monit
 6133 ?        Ss     0:07 /usr/sbin/sssd -f -D
 6135 ?        S      0:02  \_ /usr/libexec/sssd/sssd_nss --debug-to-files
 6136 ?        S      0:02  \_ /usr/libexec/sssd/sssd_pam --debug-to-files
 6137 ?        S      0:01  \_ /usr/libexec/sssd/sssd_ssh --debug-to-files
 6166 ?        S      0:05  \_ /usr/libexec/sssd/sssd_be --domain default --debug-to-files

$ ps faux   # the u adds the user and resource usage (%CPU, %MEM, start time) to the process tree

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2951  0.0  0.0 117296  1256 ?        Ss    2014   1:35 crond
root      6584  0.0  0.0 179628  2872 ?        Sl   Mar17  24:30 /usr/bin/monit
jenkins   5454  1.4 14.7 2503936 577244 ?      Ssl  Apr09 700:00 /etc/alternatives/java -Djava.awt.headless=true
root     28121  0.0  0.0 101428  3364 ?        Ss   May11   0:09 /var/cfengine/bin/cf-execd
root     28130  0.6  0.1 366888  4236 ?        Ss   May11  24:15 /var/cfengine/bin/cf-serverd
root     28141  0.0  0.1  35624  5284 ?        Ss   May11   0:59 /var/cfengine/bin/cf-monitord
root     17905  0.0  0.0  22180   988 ?        Ss   May12   0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
dhcpd    18334  0.0  0.1  49000  4376 ?        Ss   May12   0:02 /usr/sbin/dhcpd -user dhcpd -group dhcpd
root      6133  0.0  0.0 199608  2372 ?        Ss   May12   0:07 /usr/sbin/sssd -f -D
root      6135  0.0  0.3 201872 14588 ?        S    May12   0:02  \_ /usr/libexec/sssd/sssd_nss --debug-to-files
root      6136  0.0  0.0 192212  2808 ?        S    May12   0:02  \_ /usr/libexec/sssd/sssd_pam --debug-to-files
root      6137  0.0  0.0 189892  2688 ?        S    May12   0:01  \_ /usr/libexec/sssd/sssd_ssh --debug-to-files
root      6166  0.0  0.1 229764  6916 ?        S    May12   0:05  \_ /usr/libexec/sssd/sssd_be --domain default
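To pick zombies out of a long listing, ask ps for just the state, PID, parent PID and command, and let awk keep the rows whose state starts with Z. This sketch assumes the procps ps shipped with CentOS/Ubuntu (the trailing = after each field name suppresses the header); list_zombies is a hypothetical helper. The parent PID matters because a zombie is already dead; it is the parent that has to reap it.

```shell
# list_zombies < "STAT PID PPID COMMAND" lines
# Typical input: ps -eo stat=,pid=,ppid=,comm=
list_zombies() {
    awk '$1 ~ /^Z/ { printf "zombie pid=%s parent=%s cmd=%s\n", $2, $3, $4 }'
}

# Usage on a live system:
#   ps -eo stat=,pid=,ppid=,comm= | list_zombies
```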

kill

Now is the time to learn how to stop runaway processes. Always try a service restart first (for example, service httpd restart): the service exits “correctly” and starts again, and if there are more problems you should be able to see them in a log file. If you just kill the process, it may not have time to show you its errors.

$ kill PID PID2 PID3
# kills any number of specific processes, using their PID numbers from top or ps fax (this sends SIGTERM, which lets the process clean up)
$ kill -9 PID PID2 PID3
# this REALLY kills it (SIGKILL) if the above does not work; sometimes you have to resort to the most drastic measure. Use it sparingly: a regular kill allows the process to stop and let go of its files before exiting, while -9 kills it immediately and may leave behind stale lock/PID files or half-written data. Note that a true zombie (state Z in ps) is already dead and cannot be killed; it goes away when its parent process reaps it or exits.
$ killall httpd
# kills all processes named exactly httpd
$ killall -u user1
# kills all processes running as the user1 user
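Why a plain kill is gentler than kill -9 can be shown in a few lines. In this sketch, start_worker and its lock file are hypothetical stand-ins for a real daemon: the worker traps SIGTERM and removes its lock file on the way out, so a plain kill leaves nothing behind, while SIGKILL cannot be trapped, so after kill -9 the stale lock file remains.

```shell
# start_worker LOCKFILE: start a background "daemon" that owns LOCKFILE.
# Sets worker_pid to its PID and returns once the lock file exists.
start_worker() {
    lock=$1
    (
        trap 'rm -f "$lock"; exit 0' TERM   # clean-up handler, runs on SIGTERM only
        touch "$lock"
        while :; do sleep 1; done            # pretend to do work
    ) &
    worker_pid=$!
    while [ ! -e "$lock" ]; do sleep 1; done # wait until the worker is up
}

demo=$(mktemp -d)

start_worker "$demo/term.lock"
kill "$worker_pid"                 # SIGTERM: the trap removes the lock file
wait "$worker_pid" || true

start_worker "$demo/kill.lock"
kill -9 "$worker_pid"              # SIGKILL: no trap runs, the lock file is left behind
wait "$worker_pid" 2>/dev/null || true

ls "$demo"                         # only kill.lock remains
```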

Kill - my favorite one liner

$ ps fax | grep '[h]ttpd' | awk '{print $1}' | xargs kill
# substitute any process name in the grep to kill a bunch at one time, good for when things are going really crazy; the [h] trick keeps grep from matching its own process (pkill httpd does a similar job)

Networking

Often you will have to determine whether a service is working correctly by checking whether it is listening on the correct port, or whether it is responding at all.

ping

The most common network debugging tool is ping. It is good for a quick up/down test of a machine, but NOTE: if the machine is very busy it may not respond to ping right away, so slow ping times may have nothing to do with the network or the NIC (and some hosts or firewalls block ping entirely).

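$ ping 8.8.8.8
# on Linux, ping keeps going until you press Ctrl-C (-t is the Windows flag for this; on Linux -t sets the TTL)
$ ping -c 4 8.8.8.8
# -c stops after 4 packets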

telnet

Telnet is a good way to query a specific remote listening port to see if a service is responding as it should

$ telnet sql_host 5432
Trying sql_host...
Connected to sql_host.
Escape character is '^]'.
^]
telnet> quit
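On minimal installs telnet is often not installed. bash itself can open a TCP connection through its special /dev/tcp/HOST/PORT path (a feature enabled in most distribution builds of bash), which makes a quick, scriptable port check. A minimal sketch; port_open is a hypothetical name, and timeout comes from GNU coreutils.

```shell
# port_open HOST PORT [SECONDS]
# Succeeds if a TCP connection to HOST:PORT can be opened within SECONDS (default 3).
port_open() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage:
#   port_open sql_host 5432 && echo "listening" || echo "closed or filtered"
```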

netstat

netstat produces a list of listening interfaces, ports, and connections to and from the machine.

$ netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:5308            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:46341           0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN
tcp6       0      0 :::443                  :::*                    LISTEN
tcp6       0      0 :::34459                :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
udp        0      0 0.0.0.0:53348           0.0.0.0:*
udp        0      0 0.0.0.0:111             0.0.0.0:*
udp        0      0 0.0.0.0:123             0.0.0.0:*
udp        0      0 0.0.0.0:36748           0.0.0.0:*
udp6       0      0 :::111                  :::*
udp6       0      0 :::55414                :::*
udp6       0      0 :::48913                :::*
raw6       0      0 :::58                   :::*                    7
$ netstat -anetu | grep 514; 
tcp        0      0 0.0.0.0:514                 0.0.0.0:*                   LISTEN
tcp        0      0 :::514                      :::*                        LISTEN
tcp        0      0 :::5514                     :::*                        LISTEN
tcp        0      0 ::ffff:127.0.0.1:44946      ::ffff:127.0.0.1:9300       ESTABLISHED 497        3320514
udp        0      0 0.0.0.0:514                 0.0.0.0:*                       
udp        0      0 :::514                      :::*                            
udp        0      0 :::5514                     :::*       

ss

The ss command can do everything netstat does and more, and should be preferred going forward: netstat (part of the net-tools package) is deprecated in favor of ss from iproute2.

$ ss -apnetu | grep 443;
 tcp    LISTEN     0      128                   :::443                  :::*      ino:202599741 sk:ffff88013beba100
 tcp    ESTAB      0      0      ::ffff:10.0.20.63:443    ::ffff:70.198.42.193:2400   timer:(keepalive,119min,0) uid:48 ino:228626831 sk:ffff880011629880
 tcp    TIME-WAIT  0      0      ::ffff:10.0.20.63:443    ::ffff:70.198.42.193:2426   timer:(timewait,48sec,0) ino:0 sk:ffff88013cb3c940
 tcp    ESTAB      0      0      ::ffff:10.0.20.63:443    ::ffff:70.198.42.193:2421   timer:(keepalive,119min,0) uid:48 ino:228626830 sk:ffff8800027380c0
 tcp    TIME-WAIT  0      0      ::ffff:10.0.20.63:443    ::ffff:70.198.42.193:2410   timer:(timewait,47sec,0) ino:0 sk:ffff88013cb3ca80
 tcp    ESTAB      0      0      ::ffff:10.0.20.63:443    ::ffff:70.198.42.193:2414   timer:(keepalive,119min,0) uid:48 ino:228626829 sk:ffff88006031c100
 tcp    TIME-WAIT  0      0      ::ffff:10.0.20.63:443    ::ffff:70.198.42.193:2401   timer:(timewait,48sec,0) ino:0 sk:ffff880017550e80

Find all incoming connections from unique IP Addresses

$ netstat -tapn | awk '{print $5}' | sed 's/::ffff://' | sed 's/:.*//' | sort | uniq -c | sort -n   # -n sorts the counts numerically; a plain sort puts 22 before 3
     1 10.0.20.63
     1 10.0.50.232
     1 10.0.50.234
     1 10.0.50.244
     1 10.0.5.54
     1 10.10.1.231
     2 10.0.5.154
     2 129.82.224.115
     3 10.0.20.201
     4 10.0.20.30
     9 0.0.0.0
    22 129.82.224.137
    39 10.0.20.52
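The same count can be done in a single awk program. This sketch (count_peers is a hypothetical name) assumes netstat -tn style input with the foreign address in field 5. It skips the header lines, strips only the final :port so IPv6 addresses survive (sed 's/:.*//' cuts them at the first colon), unwraps ::ffff: IPv4-mapped addresses, and sorts by count, highest first.

```shell
# count_peers < netstat output
# Prints "count address" for each foreign address, busiest first.
count_peers() {
    awk '$1 ~ /^(tcp|udp)/ {
             addr = $5
             sub(/:[^:]*$/, "", addr)      # drop the trailing :port
             sub(/^::ffff:/, "", addr)     # unwrap IPv4-mapped IPv6 addresses
             count[addr]++
         }
         END { for (a in count) print count[a], a }' | sort -rn
}

# Usage:
#   netstat -tn | count_peers | head
```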

Find all outgoing connections to unique IP Addresses

$ ss -tapw | awk '{print $5}' | sed 's/::ffff://' | sed 's/:.*//' | sort | uniq -c | sort -n
     1 10.10.1.63
     1 Local
     3 10.0.50.63
    12 10.0.20.63

lsof

lsof can also determine which process is using a specific port. We once had a process running on the sendmail port, which kept sendmail from starting on one of our production web servers. lsof was key to tracking that down, and it has become a quicker way to check specific ports than netstat.

$ lsof -i :514
COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
rsyslogd 4835 root    1u  IPv4 3354631      0t0  TCP *:shell (LISTEN)
rsyslogd 4835 root    2u  IPv6 3354632      0t0  TCP *:shell (LISTEN)
rsyslogd 4835 root    3u  IPv4 3354623      0t0  UDP *:syslog
rsyslogd 4835 root    4u  IPv6 3354624      0t0  UDP *:syslog

Copying large number of files

rsync

rsync is the method to use when copying a large number of files, big or small. The trailing / on the source path is very important: it is the difference between copying the directory itself and copying just everything within the directory to the new location.

$ rsync -av /nfs/data othermachine:/new/dataarea/ --progress
# no trailing slash on the source: copies the local directory and all of its content to othermachine:/new/dataarea/data
$ rsync -av othermachine:/new/dataarea/ /nfs/data --progress
# trailing slash on the source: copies everything FROM under dataarea/ on othermachine into /nfs/data/ (-a already implies -r)
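The trailing-slash rule is easy to check locally before pointing rsync at real data. A throwaway sketch (demo_trailing_slash is a hypothetical name) that builds a scratch tree and copies it both ways.

```shell
# demo_trailing_slash SCRATCH_DIR
# Creates SCRATCH_DIR/src/file.txt and copies src two ways to show the difference.
demo_trailing_slash() {
    base=$1
    mkdir -p "$base/src" "$base/with_dir" "$base/contents_only"
    echo data > "$base/src/file.txt"
    rsync -a "$base/src"  "$base/with_dir/"       # no slash: creates with_dir/src/file.txt
    rsync -a "$base/src/" "$base/contents_only/"  # slash: creates contents_only/file.txt
}
```

Usage: d=$(mktemp -d); demo_trailing_slash "$d"; find "$d"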

screen

It is highly recommended to run any long-running operation in a screen session, so that if your connection to the machine times out, the operation does not end. A screen session is also private: no one else can see what you do inside it unless you allow it.

$ screen -S copyfiles
# start a new session named copyfiles
$ rsync -av /nfs/data othermachine:/new/dataarea/ --progress
# example command
Ctrl-a d
# while the rsync is running, Ctrl-a followed by d detaches but leaves the operation running, and you can continue doing whatever you want
$ screen -r copyfiles
# reattaches to the rsync screen session
$ screen -d -R session_share
# attaches to (creating it if needed) a session you can share with someone else, useful for training; see http://technonstop.com/screen-commands-for-terminal-sharing
$ screen -x session_share
# allows your friend on the same machine to connect to your session
Ctrl-d
# inside screen, exits the shell in the current window; when the last window closes, the session terminates

I need my script to run at startup

Scenario: On many of our servers we need NFS mounts to other servers/NAS devices to be present at startup so that users/services/websites can get to their data.

$ vim /usr/local/bin/nfs_mounts.sh
# put the NFS mount commands in the new file nfs_mounts.sh
$ chmod 700 /usr/local/bin/nfs_mounts.sh
# changes permissions so only the root user can read and execute it
$ vim /etc/rc.local
# add this line at the end:
/usr/local/bin/nfs_mounts.sh &
# make sure there is an & after the command, so a slow mount does not hold up the boot
$ chmod +x /etc/rc.local
# the rc.local file must be executable (on CentOS 7 the real file, /etc/rc.d/rc.local, is not executable by default)

Side note: do not use /etc/fstab to auto-mount these NFS volumes; if they are not available at boot time, the system can hang until they are. Our nfs_mounts shell script is the way to get around that.
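The original nfs_mounts.sh is not reproduced here, so the following is only a sketch of what such a script might contain, under stated assumptions: the server names, exports and mount points are placeholders, already-mounted paths are skipped (by checking /proc/mounts) so the script is safe to run twice, and MOUNT_CMD is a made-up hook that lets you dry-run it with MOUNT_CMD=echo.

```shell
#!/bin/bash
# Sketch of /usr/local/bin/nfs_mounts.sh. Entries are "<server>:<export> <mount point>";
# every value below is a placeholder.
NFS_MOUNTS="nas01:/vol/remote_data /mnt/remote_data
nas02:/vol/home /nethome"

# Hypothetical override hook: MOUNT_CMD=echo prints the commands instead of running them.
MOUNT_CMD=${MOUNT_CMD:-mount}

# is_mounted DIR: true if DIR appears as a mount point in /proc/mounts
is_mounted() {
    awk -v d="$1" '$2 == d { found = 1 } END { exit !found }' /proc/mounts 2>/dev/null
}

# mount_all: mount each entry that is not already mounted; failures are reported, not fatal
mount_all() {
    printf '%s\n' "$NFS_MOUNTS" | while read -r src dst; do
        [ -n "$src" ] || continue
        if is_mounted "$dst"; then
            echo "already mounted: $dst"
        else
            $MOUNT_CMD -t nfs "$src" "$dst" || echo "mount failed: $src -> $dst" >&2
        fi
    done
}

# The real script would end with a call to mount_all (left out here so the
# functions can be loaded and dry-run safely).
```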

PATH

Sometimes a problem occurs when you build a project from source and its executables are not located in /usr/bin or /usr/local/bin: the shell does not automatically know where to find them by name, so you have to type out the full path to run them.

env

$ env | grep PATH
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/var/cfengine/bin:/root/bin

Add new location to PATH

To temporarily add a new location to path type the following:

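$ export PATH=/home/user/new_path:$PATH
# including $PATH keeps the old PATH along with the new entry, and order matters (directories are searched left to right). If you forget $PATH, most commands will no longer be found; since export only affects the current shell, just open a new shell or log out and back in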

To permanently add a new location to path there are 2 options:

User Specific paths

Use this if the user always logs on to the same machine and needs certain paths.

$ vim /nethome/username/.bashrc
add export command to the file, examples of some below
 # .bashrc
 export SVN_EDITOR=vim
 export PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/opt/grads-2.0.1.oga.1/Contents:/opt/grib2/wgrib2:/home/gempak/NAWIPS/os/linux64/bin:/home/ldm/bin
 export GAVERSION=2.0.1.oga.1
 source /home/gempak/NAWIPS/Gemenviron.profile
 export LAPS_DATA_ROOT=/wxrnd/lapsdata
 export LAPS_SRC_ROOT=/opt/laps-0-50-19
 export LAPSINSTALLROOT=/opt/laps-0-50-19

Machine Specific paths

This is probably the “best practices” way, and I would recommend it from now on instead of .bashrc, as it gives anyone who logs in to the machine the same PATH settings.

$ touch /etc/profile.d/postgresql.sh
$ vim /etc/profile.d/postgresql.sh   # the file name must end in .sh, because /etc/profile only reads *.sh files from /etc/profile.d
 export PATH=/usr/pgsql-9.3/bin:$PATH
 export MANPATH=$MANPATH:/usr/pgsql-9.3/share/man
# save, log out and log back in
$ env | grep PATH;
# and you should now see the new PATH settings

Repeat this for any/all source built files that have non-standard paths or settings.
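Files in /etc/profile.d can be read more than once (for example by nested login shells), so a plain export PATH=...:$PATH can add the same directory repeatedly. A minimal sketch of a guard; path_prepend is a hypothetical name, though RHEL/CentOS's own /etc/profile uses a similar function called pathmunge.

```shell
# path_prepend DIR: put DIR at the front of PATH unless it is already there
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already in PATH: do nothing
        *) PATH="$1${PATH:+:$PATH}" ;;    # prepend, avoiding a stray ':' when PATH is empty
    esac
    export PATH
}

# e.g. in /etc/profile.d/postgresql.sh:
#   path_prepend /usr/pgsql-9.3/bin
```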

Change Hostname

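The following files and commands are what you need to change the hostname of a machine. On CentOS 6 the name is set by the HOSTNAME= line in /etc/sysconfig/network rather than in /etc/hostname; on systemd-based systems such as CentOS 7, hostnamectl set-hostname newhostname.example.com updates /etc/hostname and the running hostname in one step.

$ vim /etc/hostname
# edit to newhostname.example.com
$ vim /etc/hosts
# add "newhostname.example.com newhostname" right after the IP address on both lines in the file
$ hostname newhostname.example.com
# sets the running hostname immediately
$ export HOSTNAME=newhostname.example.com
# updates the HOSTNAME variable in the current shell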