Friday, 10 August 2012

Load Status script -For Monitoring the server


This script appends the output of w, pstree -apu and mysqladmin proc to the
file /loadtest.txt each time it runs.



Suppose you want to monitor a server for one or two days to make sure there
are no load spikes. This script lets you do that.

Script: /load.sh

==========

#!/bin/bash
#Script written by MANOJ
#Script for checking the Load status...
cd /
touch loadtest.txt

echo  "LOAD STATUS RUN ON" `date` >> /loadtest.txt
echo "*******************" >> /loadtest.txt
echo "*******************" >> /loadtest.txt

echo " W Results  " >> /loadtest.txt

w >> /loadtest.txt
echo "*******************" >> /loadtest.txt
echo "*******************" >> /loadtest.txt

echo "PSTREE Results" >> /loadtest.txt
echo "*******************" >> /loadtest.txt

pstree -apu >> /loadtest.txt
echo "*******************" >> /loadtest.txt
echo "*******************" >> /loadtest.txt
echo "MYSQLADMIN Results" >> /loadtest.txt
echo "*******************" >> /loadtest.txt

mysqladmin proc >> /loadtest.txt

echo "*******************" >> /loadtest.txt
echo "*******************" >> /loadtest.txt

==========

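Now run the script from cron every 5 or 10 minutes. For example, to run it
every 5 minutes (/var/spool/cron/root is the root crontab on RHEL/CentOS-style
systems; on other systems use crontab -e instead):

echo "*/5 * * * * sh /load.sh" >>/var/spool/cron/root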
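
Appending with echo adds a duplicate line every time the command is repeated.
A minimal sketch of a repeat-safe alternative is shown here; the helper name
add_cron_entry is hypothetical (not part of the original script), and the
sketch assumes the crontab command is available for the root user.

```shell
#!/bin/bash
# add_cron_entry (hypothetical helper): reads an existing crontab on stdin
# and writes it to stdout with the given entry present exactly once.
add_cron_entry() {
    local entry="$1"
    # Drop any line identical to the entry (-x whole line, -F fixed string).
    # grep exits non-zero when it prints nothing, so ignore its status.
    grep -vxF -- "$entry" || true
    # Append the entry once at the end.
    printf '%s\n' "$entry"
}

# Usage (not run here, because it rewrites the current user's crontab):
#   crontab -l 2>/dev/null | add_cron_entry "*/5 * * * * /bin/bash /load.sh" | crontab -
```

Running the usage line twice leaves a single entry in the crontab.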


Advantages:


1. It can be used for load monitoring and for finding the cause of the load.

2. When server load is high, the server may be unresponsive and you may not be
able to SSH in. Most of the time, the log written by the script will show you
the reason. :)

3. You can check whether the server went down because of a load spike.



How to check the load status:


If the log file is not too large, you can use the following grep command:

grep "load av" /loadtest.txt

Otherwise, search only the last 10000 lines:

tail -10000 /loadtest.txt |grep "load av"



Example

I will illustrate with one example:

===============

root@[/]# tail -10000 /loadtest.txt |grep "load av"

 14:00:02 up 28 days, 23:20,  0 users,  load average: 1.32, 1.54, 1.69
 14:05:02 up 28 days, 23:25,  0 users,  load average: 2.31, 4.02, 3.00
 14:10:01 up 28 days, 23:30,  0 users,  load average: 1.85, 2.91, 2.80
 14:15:01 up 28 days, 23:35,  0 users,  load average: 3.23, 2.72, 2.73
 14:20:03 up 28 days, 23:40,  0 users,  load average: 4.95, 3.73, 3.11
 14:25:02 up 28 days, 23:45,  0 users,  load average: 5.21, 5.17, 3.99
 14:30:35 up 28 days, 23:51,  0 users,  load average: 45.81, 33.03, 16.14
 14:35:02 up 28 days, 23:55,  0 users,  load average: 4.86, 21.42, 16.05
 14:40:01 up 29 days, 0 min,  0 users,  load average: 3.32, 9.75, 12.43

We can see that there is a load spike at "14:30:35 up 28 days, 23:51,": the
1-minute load average jumps to 45.81.
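
With a longer log, the spike is easier to find by filtering on the 1-minute
value than by reading every line. The following is a small sketch, assuming
the "load average: a, b, c" format shown above; the function name high_load
and the default threshold of 10 are illustrative choices, not part of the
original script.

```shell
#!/bin/bash
# high_load (hypothetical helper): print only the "load average" lines whose
# 1-minute value is above a threshold.
#   $1 = threshold (default 10), $2 = log file (default /loadtest.txt)
high_load() {
    local threshold="${1:-10}" logfile="${2:-/loadtest.txt}"
    grep "load average" "$logfile" |
        awk -v t="$threshold" -F'load average: ' '
            {
                split($2, a, ", ")          # a[1] is the 1-minute average
                if (a[1] + 0 > t) print     # force a numeric comparison
            }'
}
```

For example, "high_load 10 /loadtest.txt" lists every run where the 1-minute
load average was above 10.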


How to find the cause:


Grep the lines that follow "14:30:35 up 28 days, 23:51,  0 users," (plus a few
lines before it) to see the full snapshot taken at that time.

Command:  grep -A600 -B20 "14:30:35 up 28 days, 23:51,  0 users," /loadtest.txt


From the pstree results I was able to see that the load spike was caused by
heavy usage from a PHP script.
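
The 600-line window in the grep command is a guess at how long one run's
output is. As an alternative sketch, the whole run can be cut out exactly by
using the "LOAD STATUS RUN ON" line that the script writes at the start of
every run. The function name snapshot_for is hypothetical.

```shell
#!/bin/bash
# snapshot_for (hypothetical helper): print the complete run (from its
# "LOAD STATUS RUN ON" header up to the next header) that contains a string.
#   $1 = text to look for, $2 = log file (default /loadtest.txt)
snapshot_for() {
    local pattern="$1" logfile="${2:-/loadtest.txt}"
    awk -v pat="$pattern" '
        /^LOAD STATUS RUN ON/ {          # a new run starts here
            if (hit) printf "%s", buf    # print the previous run if it matched
            buf = ""; hit = 0
        }
        { buf = buf $0 "\n" }            # collect every line of the current run
        index($0, pat) { hit = 1 }       # plain substring match, not a regex
        END { if (hit) printf "%s", buf }
    ' "$logfile"
}
```

For example, snapshot_for "14:30:35 up 28 days" /loadtest.txt prints the w,
pstree and mysqladmin output recorded during the spike.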
