Universal System Monitoring

Author : james@2longbeans.net
Date : 30th Nov 2007

Description

The Universal System Monitoring (USM) system is a software solution for monitoring operating system statistics. It is aimed at system administrators who want to review system load and utilization over periods ranging from several minutes to a year. So far it has been tested on Linux, Solaris and FreeBSD. Its design and development have been motivated by several factors. They include :

However, unlike other monitoring or management solutions, USM is not designed for :

Once Upon A Time ...

Most of the common monitoring systems I've implemented revolve around RRDTool. This typically involves writing a script which periodically collects the data of interest, and then updates it into a local RRD database. In order to view graphs of the data captured, my script would also periodically render image files by pulling out data from the local RRD database. To make it convenient to view these graphs, I'd then write a webpage containing the graphs that are of interest to me, and this is in turn published by a webserver. My webpage makes my browser auto refresh every minute, while my script keeps regenerating new image files in the background.

This mostly works, but there are some design flaws. Firstly, if a second server is set up, the whole process has to be repeated. This means I now have 2 different URLs at which I view my graphs. If I had 50 servers, I'd have 50 URLs. This also means that I have 50 webservers running, with 50 scripts pulling out data and rendering graphs. I most probably would not be watching all 50 servers at the same time; realistically I'd watch about 5 or so boxes. This means that I have 45 servers rendering graphs which never get looked at, which translates to a lot of CPU cycles "wasted" on work that is never used. Then there is the problem of running 50 webservers, which is simply ridiculous. Consolidating them into a single webserver would be great, but then I'd have to invent a mechanism to transfer the rendered graphs over the network from the other 49 servers. Again, if I wasn't looking at any graphs, it would all be a waste of CPU and network resources.

Architecture Of USM

USM consists of exactly 2 Perl scripts. The scripts do, however, require access to external tools (eg, vmstat, iostat, rrdtool, etc). The first script, usm.pl, is designed to run on each server you want to collect statistics on. This script runs as a daemon process. The second script, usm.cgi, is run by a webserver, and thus is only found on servers running httpd. You decide where you want to place your httpd instances. This design allows an unlimited number of servers to run USM. If nobody is interested in watching, not a single packet goes across the network. If engineer A wants to check on servers I,J,K, then data flows from I,J,K to A's webserver. If, at the same time, engineer B wants to check on servers X,Y,Z, data flows from X,Y,Z to B's webserver. Data transfer only occurs if somebody wants to watch graphs, and the graphs are only rendered when a web browser requests them. To obtain data over the network, we could use NFS, FTP or some other mechanism. However, in the interest of security and efficiency, USM uses its own methods for data transfer. More on this later. The following diagram illustrates all the moving parts in the system.

usm.pl : This script is actually a daemon. To make deployment easy, it is designed for (almost) zero configuration. Only the path of the RRD database is specified on the command line. If the database does not exist, it is created. From this point on, usm.pl just sits in the background recording data at 10-second intervals. In addition, usm.pl opens a TCP and a UDP port and listens on both. The TCP port is used for interacting (reliably) with usm.pl, while the UDP port is used for transferring snapshots of current OS statistics (like SNMP).

usm.cgi : This script has a dual personality. It begins its life when httpd spawns it like any CGI. This is usually in response to the enduser submitting an HTML form which specifies the graphs desired. The script is capable of behaving like a CGI, and also of fork()'ing and becoming a daemon. The CGI instance and the daemon instance serve different purposes, and the following timeline illustrates their roles, as well as the interaction with remote servers running usm.pl.

  1. usm.cgi (CGI instance): usm.cgi is spawned by httpd. It checks if the daemon's control pipe exists. If absent, it forks and the child becomes a daemon.

  2. usm.cgi (CGI instance): Wait for control pipe to get created.
     usm.cgi (Daemon instance): Daemon starts up and creates a control pipe as well as a UDP socket. Requests from CGI will arrive on this pipe. The UDP socket will be used later.

  3. usm.cgi (CGI instance): Compose a list of servers user wants graphs rendered for. Send this via the control pipe. Create a UDP socket, the daemon will send a reply to this socket later.
     usm.cgi (Daemon instance): Receives a list of servers. We'll need local copies of the RRD data for each of these servers.

  4. usm.cgi (CGI instance): Wait for reply from daemon.
     usm.cgi (Daemon instance): Check if we have a local copy of each RRD required. If non-existent, connect to the remote server running usm.pl and pull a copy of its database over.
     usm.pl (remote box): Accepts an incoming TCP connection. Responds by transferring my RRD file to the requesting usm.cgi daemon.

  5. usm.cgi (CGI instance): Wait for reply from daemon.
     usm.cgi (Daemon instance): Keep a list of all the servers for which we've got a copy of their RRD database. Now send a UDP packet to the CGI to indicate that its required databases are present.

  6. usm.cgi (CGI instance): We now generate an rrdcgi executable script. This script contains directives which will become the graphs requested by the enduser. When complete, exec() the script.

  7. usm.cgi (CGI instance): We're now an rrdcgi script (see the manpage for rrdcgi). Parse the contents of the script, generate PNG graphs as requested. Output the parsed HTML to standard output, which is actually connected to the httpd process which created us.

  8. usm.cgi (CGI instance): Process exits.
     usm.cgi (Daemon instance): Go through the servers for which we hold a copy of their RRD database. Every 10 seconds, send a UDP packet, requesting a snapshot of their current OS stats.
     usm.pl (remote box): Respond to the request by sending back a snapshot of current OS stats as a UDP packet.
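
Steps 1 and 2 of the timeline (check for the control pipe, start the daemon if it is missing, then wait for the pipe to appear) can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl code in usm.cgi; the function name, the `spawn` callback, the timeout and the polling interval are all assumptions made for the example.

```python
import os
import time


def ensure_daemon(pipe_path, spawn, timeout=5.0, poll=0.05):
    """Return True once the control pipe exists, starting the daemon if needed.

    `spawn` is a hypothetical callable that starts the daemon; in USM this
    role is played by usm.cgi forking a child of itself.
    """
    if not os.path.exists(pipe_path):
        spawn()  # step 1: no control pipe, so assume no daemon is running
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:  # step 2: wait for the pipe to appear
        if os.path.exists(pipe_path):
            return True
        time.sleep(poll)
    return False  # the daemon never created its pipe within the timeout
```

Using the existence of the pipe as the "daemon is alive" signal keeps the CGI side stateless, at the cost of needing some clean-up if a daemon dies without removing its pipe.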

Implementing usm.pl

When designing the usm.pl script, one of the main design goals was to ensure that a single script could be used on diverse operating systems. Thus I began by examining the common performance metrics shared between Linux, Solaris and FreeBSD. I also wanted to ensure consistency between OSes. That is to say, if I measured something in Linux, I would like to measure that item in FreeBSD and Solaris as well. For this reason, items which may be hardware specific have been left out (eg, system temperature gauges). The units of measurement had to be standardized as well. I finally arrived at 15 core attributes :

  1. RAM used in bytes.
  2. Swap used in bytes (sum of all swap devices).
  3. Forks since boot.
  4. Load average.
  5. Percentage CPU time in user space.
  6. Percentage CPU time servicing system calls.
  7. Percentage CPU time idle.
  8. Current context switches per second.
  9. Current number of hardware interrupts per second.
  10. Bytes read from disk per second (sum of all block devices).
  11. Bytes written to disk per second (sum of all block devices).
  12. Total packets arrived on the network (sum of all interfaces).
  13. Total packets written to the network (sum of all interfaces).
  14. Current number of processes (including kernel processes).
  15. Percentage disk usage on the root filesystem.

In order to capture these attributes, usm.pl has to use slightly different techniques depending on which OS it runs on. For example, on Linux, swap devices can be read from /proc/swaps, while on Solaris that information is obtained from the command swap -l. Similarly, disks on Linux are typically named hda, sda, etc. On Solaris, they could be called cmdk0, sd0, etc. And on FreeBSD, they could be named dad0, ad0, etc. In order to correctly identify these devices, usm.pl makes use of preset regular expressions. Unfortunately, this means that it might be necessary to modify these regular expressions to match future hardware.
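
To make the regular-expression approach concrete, here is a small Python sketch of per-OS disk name matching. The patterns are hypothetical and built only from the device names mentioned above; they are not the expressions shipped in usm.pl, and the OS keys (for instance "SunOS" for Solaris) are assumptions.

```python
import re

# Hypothetical patterns, derived only from the example device names
# in the text (hda/sda, cmdk0/sd0, dad0/ad0).
DISK_PATTERNS = {
    "Linux": re.compile(r"^(?:hd|sd)[a-z]+$"),
    "SunOS": re.compile(r"^(?:cmdk|sd)\d+$"),
    "FreeBSD": re.compile(r"^(?:dad|ad)\d+$"),
}


def is_whole_disk(os_name, device):
    """True if `device` looks like a whole disk on the given OS."""
    pattern = DISK_PATTERNS.get(os_name)
    return bool(pattern and pattern.match(device))
```

A device name the patterns do not anticipate (say, nvme0n1 on Linux) is silently ignored by this sketch, which is exactly the maintenance cost described above.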

When you want to obtain statistics from a server running usm.pl, you can simply telnet to its TCP port. Upon accepting an incoming TCP connection, usm.pl forks a child, and the child exits when the TCP session terminates. Access control should be handled by your machine's packet filter. There are some features to reduce the effects of abuse. These include restricting the number of accepted TCP connections to 10 per 10-second block, and disconnecting idle sessions. The interactive telnet interface has been designed so that you don't need to use usm.cgi to pull data. You could always write your own client. The following example shows a telnet session with a server running usm.pl.

    % telnet kelly 55355
    Trying 10.1.1.1...
    Connected to kelly.
    Escape character is '^]'.
    usm> showver
    byteorder: 1234
    conf_debug: 0
    conf_interval: 10
    conf_cli_idle_timeout: 60
    conf_control_port: 55355
    conf_update_check: 600
    conf_accepts_per_loop: 10
    conf_rrd_file: /tmp/usm.rrd
    glob_myname: /home/jamsie/BII/work/usm-0.1/usm.pl
    glob_os: Linux
    glob_pagesize: 0
    glob_loops: 6730
    glob_accepts: 4
    glob_version_check: 1196498238
    glob_version: 1196230148, 14:9:8 28/11/2007
    usm> curstats
    ram_used: 399446016
    swap_used: 0
    forks: 938126
    load_average: 0.410000
    user_cpu: 1
    system_cpu: 5
    idle_cpu: 94
    ctx_switches: 3636
    interrupts: 915
    disk_io_in: 0
    disk_io_out: 154419
    network_pkts_in: 366646698
    network_pkts_out: 339475706
    processes: 111
    rootfs_usage: 81
    usm>
    You have idled too long. Goodbye.
    Connection closed by foreign host.
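
Since the output is plain "name: value" lines, writing your own client mostly comes down to parsing text. The following Python sketch parses a captured session such as the one above into a dictionary; the function name is hypothetical, and the parsing rules are inferred from this single transcript rather than from a protocol specification.

```python
def parse_stats(text):
    """Parse 'key: value' lines from a usm session into a dict."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("usm>"):
            continue  # skip blank lines and prompt/command echoes
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'key: value' line (e.g. the idle message)
        try:
            # Values with a decimal point become floats, the rest ints.
            stats[key] = float(value) if "." in value else int(value)
        except ValueError:
            stats[key] = value  # keep non-numeric values as strings
    return stats
```

Fed the `curstats` part of the session, this yields entries such as `ram_used` and `idle_cpu` as integers and `load_average` as a float.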
    

The usm.pl script is also designed for auto-updating. This means that if you overwrite the usm.pl file with a newer one, the running usm.pl will automatically exec() the new script over itself. This makes upgrades easier if a large number of servers run usm.pl off a shared network filesystem. The script can also be run in debug mode, by setting the environment variable DEBUG to non-zero (1 prints some messages, 2 prints more detail). It does not daemonize if run in debug mode.
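
One way such auto-updating could work is to compare the script file's modification time against the time recorded at start-up, and re-exec when it is newer. The Python sketch below illustrates that idea only; it is an assumption about the mechanism, not a transcription of usm.pl, and both function names are hypothetical.

```python
import os
import sys


def script_changed(path, loaded_mtime):
    """True if the file at `path` was modified after `loaded_mtime`."""
    try:
        return os.stat(path).st_mtime > loaded_mtime
    except OSError:
        # The file may be briefly missing while it is being replaced;
        # report "unchanged" and let the next periodic check retry.
        return False


def maybe_reexec(path, loaded_mtime):
    """Replace the running process with the newer script, if there is one."""
    if script_changed(path, loaded_mtime):
        os.execv(sys.executable, [sys.executable, path] + sys.argv[1:])
```

A daemon would record `os.stat(path).st_mtime` once at start-up and call `maybe_reexec` periodically from its main loop.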

Implementing usm.cgi

Since this script is a CGI, it is designed to be run from httpd, with all necessary information set in the QUERY_STRING environment variable. The information passed to usm.cgi is typically collected by an HTML form, which submits it with an HTTP GET request. An example of such an HTML form is :

One possible QUERY_STRING would be :

    servers=202.6.243.39&duration=1h&network_pkts_in=on&network_pkts_out=on
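
For illustration, a QUERY_STRING of this shape can be split into its parts with a few lines of Python. The interpretation here is inferred from the single example above: `servers` and `duration` are treated as special keys and every other key set to "on" is taken as a requested graph. The comma-separated server list and the default duration are assumptions, not documented behaviour of usm.cgi.

```python
from urllib.parse import parse_qs


def parse_request(query_string):
    """Split a usm.cgi-style query string into (servers, duration, graphs)."""
    fields = parse_qs(query_string)
    # Assumption: several servers may arrive comma-separated or as
    # repeated servers= keys.
    servers = [s for value in fields.pop("servers", [])
               for s in value.split(",") if s]
    # Assumption: fall back to one hour when no duration is given.
    duration = fields.pop("duration", ["1h"])[0]
    # Checkbox-style inputs show up as name=on.
    graphs = [name for name, values in fields.items() if values[-1] == "on"]
    return servers, duration, graphs
```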

The QUERY_STRING is typically constructed by form input elements and the following tokens are used :

When usm.cgi is invoked for the very first time by httpd, its daemon instance does not exist yet, so usm.cgi calls fork() to create it. From this point on, a single daemon services the multiple instances of usm.cgi that httpd creates as endusers request graphs. The main purpose of the daemon is to ensure that local copies of the required RRD databases are available and that their data is up to date. When a (new) local RRD database is requested, the daemon starts by connecting to the TCP port of the usm.pl running on the remote machine. If the endianness of the webserver machine and the remote machine is compatible, the RRD file is copied over directly, incurring about 5MB of network traffic. If the endianness is not compatible, the RRD file cannot be copied over, and the data must be imported by running "rrdtool dump" on the remote machine and connecting the data stream to an "rrdtool restore" on the webserver machine. This incurs about 10-15MB of network traffic. To avoid repeating such an expensive data transfer, it makes sense to keep the local RRD databases up to date. Every 10 seconds, the usm.cgi daemon requests remote servers to send their statistics. These statistics are then updated into the local RRD copies.
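
The copy-versus-dump decision can be sketched as a simple comparison of byte-order markers. The Python below is illustrative only: it assumes the remote side reports a marker like the `byteorder: 1234` line seen in the showver output, that "1234" denotes little-endian and "4321" big-endian, and that a plain equality check is enough; the function names are hypothetical.

```python
import sys


def local_byteorder():
    # Assumed encoding: "1234" for little-endian, "4321" for big-endian.
    return "1234" if sys.byteorder == "little" else "4321"


def choose_transfer(remote_byteorder, local=None):
    """Pick how to fetch a remote RRD database."""
    local = local or local_byteorder()
    if remote_byteorder == local:
        return "copy"  # raw RRD file, about 5MB according to the text
    # Otherwise: rrdtool dump on the remote side feeding rrdtool restore
    # on the webserver, about 10-15MB according to the text.
    return "dump-restore"
```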

The usm.cgi daemon will linger as long as there is demand for graphs to be rendered. That is to say, the daemon keeps timers recording when each local database was last requested. If a particular server's database has not been requested in a very long time, it makes no sense to keep updating it, so the local copy is deleted. Similarly, if no enduser requests graphs for a very long time, then there is no reason for the daemon to exist. It deletes whatever files it has created and then exits.
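
The timer bookkeeping described here can be sketched as a small table of last-request times. This Python example is a simplified illustration with hypothetical names: it uses one idle limit for everything and treats "no databases left" as the signal to exit, whereas the real daemon may keep separate timers. Deleting the files themselves is left out.

```python
import time


class DatabaseCache:
    """Track when each server's local RRD copy was last requested."""

    def __init__(self, idle_limit, clock=time.monotonic):
        self.idle_limit = idle_limit  # seconds a copy may sit unrequested
        self.clock = clock            # injectable so it can be tested
        self.last_request = {}

    def touch(self, server):
        """Record that a graph for `server` was just requested."""
        self.last_request[server] = self.clock()

    def expire(self):
        """Forget servers idle for longer than the limit; return their names."""
        now = self.clock()
        stale = [s for s, t in self.last_request.items()
                 if now - t > self.idle_limit]
        for server in stale:
            del self.last_request[server]
        return stale

    def should_exit(self):
        """True when nothing is being tracked any more."""
        return not self.last_request
```

The daemon's main loop would call `expire()` on each pass, delete the local copies for the returned servers, and shut down once `should_exit()` is true.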