Torrus User Guide

Quick start guide

The steps below explain how to get Torrus running.

Install Torrus. Follow the Torrus Installation Instructions document; all prerequisites and necessary steps are described there.

What is where. The executables reside in /opt/t3/t/torrus/bin/. You normally don't need to access this directory, because the command-line wrapper, torrus, is installed in a usual execution path (/opt/t3/t/bin). All site-specific behaviour is controlled by configuration files in /opt/t3/t/etc/torrus/conf/. Usually you need to change torrus-siteconfig.pl only. In this file, you must list your XML configuration sources. The datasource tree configuration is read from XML files. They are searched for in several directories, normally /opt/t3/t/torrus/xmlconfig/ and /opt/t3/t/etc/torrus/xmlconfig/. The first one contains the files that come with the Torrus distribution, and the second one is for your local site-specific XML files. Global site-specific XML configuration parameters may be defined in site-global.xml. The XML configuration is compiled into an internal database representation by the torrus compilexml command. The database itself is a Git repository, and it resides in /opt/t3/t/var/gitrepo/. It is safe to re-compile the configuration while the Torrus daemons are running.

The datasource trees. A Torrus configuration consists of a number of trees. Each tree is independent of the others. A tree may run multiple Collector processes and one Monitor process. The web interface access control lists also differentiate user rights by datasource tree.

Inside the tree. A tree defines the hierarchy of Torrus datasources. The structure of the tree is defined solely by XML configuration files. The tree consists of nodes, each being either a subtree or a leaf. Subtrees contain child subtrees and/or leaves. A leaf represents a datasource: normally this is a numerical value that changes over time. The leaf is the entity that may be presented as a graph. There are also leaves of a special type: multigraph. They are not numerical values, and are designed for drawing several values in one graph. Each node has a path, a string that consists of slashes and node names and uniquely identifies the node. The path of a subtree always ends with a slash, and the root of the tree has a path consisting of a single slash.
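
The path rules above can be checked mechanically. The following is a minimal shell sketch, not a Torrus command: the function name node_kind and the example node names are hypothetical, and the function looks only at the shape of the string. It does not consult the compiled configuration, so it cannot tell whether a node actually exists.

```shell
# Hypothetical helper, not part of Torrus: classify a node path using
# only the slash rules described above.
node_kind() {
  case "$1" in
    /)    echo "root" ;;      # the tree root is a single slash
    /*/)  echo "subtree" ;;   # subtree paths always end with a slash
    /*)   echo "leaf" ;;      # any other path under the root is a leaf
    *)    echo "invalid" ;;   # a path must start with a slash
  esac
}

node_kind /                          # prints: root
node_kind /Routers/rtr01/            # prints: subtree
node_kind /Routers/rtr01/CPU_Usage   # prints: leaf
```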

Trees configuration. The trees are defined in torrus-siteconfig.pl. See Torrus Installation Instructions for a basic example of tree configuration.

Round-robin databases. RRDtool is the primary type of storage for the collector data. Each leaf represents a datasource stored in an RRD file. Several leaves may refer to different datasources within the same RRD file, and more than one leaf may refer to the same datasource within an RRD file. RRD files are created and updated either by the collector or by external programs.

Define the targets. If you only want to collect some standard SNMP counters from network devices, there are tools called torrus genddx and torrus devdiscover.

torrus genddx creates a basic discovery instructions file, and it is designed to be run only once to produce an initial DDX file. Further on, you either edit the DDX files manually or generate them with some other tools.

torrus devdiscover uses the discovery instructions to explore the SNMP device capabilities and information: interface names, input/output counters, CPU and memory usage, temperature sensors, and many other vendor-specific statistics sources.

Torrus is much more than just an SNMP collector. When you decide to use it in a more advanced way, you will have to read the rest of this guide, the Torrus XML Configuration Guide, and probably some other documents too.

Build the hierarchy. By default, torrus genddx will put all your devices into one hierarchy: /Routers/<hostname>/.... The subtree name, Routers, may be changed with a command-line option of torrus genddx. This program may also read the device names (or IP addresses, if you don't use DNS) from space-delimited text files.

  torrus genddx \
    --hostfile=myrouters.txt \
    --domain=example.net \
    --community=MySecretSNMPCommunity \
    --out=myrouters.ddx \
    --discout=myrouters.xml \
    --subtree=/My_Routers \
    --datadir=/data1/torrus/collector_rrd

  torrus genddx \
    --hostfile=myswitches.txt \
    --domain=example.net \
    --community=MySecretSNMPCommunity \
    --out=myswitches.ddx \
    --discout=myswitches.xml \
    --subtree=/My_Switches \
    --datadir=/data1/torrus/collector_rrd

  torrus devdiscover  --in=myrouters.ddx

  torrus devdiscover  --in=myswitches.ddx

In the example above, the names of the routers and switches are read from myrouters.txt and myswitches.txt in the user's current directory. They form a hierarchy with two subtrees: /My_Routers/ and /My_Switches/. genddx writes the discovery instruction XML files to myrouters.ddx and myswitches.ddx respectively. By default, you would find them in /opt/t3/t/etc/torrus/discovery/. The result of devdiscover is the Torrus configuration files myrouters.xml and myswitches.xml, placed into /opt/t3/t/etc/torrus/xmlconfig/. The collector will place the RRD files into /data1/torrus/collector_rrd. Make sure that this directory exists, has enough free space, and is writable by the torrus user.
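
The directory requirements above can be verified before the collector is started. The following is a minimal sketch and not part of Torrus: the function name check_datadir is hypothetical. It tests access for the user who runs it, so it gives a meaningful answer only when executed as the torrus user.

```shell
# Hypothetical pre-flight check, not a Torrus command: verify that the
# directory given to --datadir exists and is writable by the current user.
check_datadir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir"
    return 1
  fi
  echo "ok: $dir"
}

# The data directory used in the genddx commands above.
check_datadir /data1/torrus/collector_rrd || true

# Free space can be inspected with the standard df utility:
#   df -h /data1/torrus/collector_rrd
```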

Note: the genddx utility is designed as a one-time helper, so that you create your basic discovery instructions files from scratch. Further on, the discovery files should be maintained separately.

Another useful utility is called ttproclist. It can be used to generate a DDX file from a template and a list of SNMP hosts. It is very useful if you want to monitor many devices of similar type or function.

You can also define a bundle file in your DDX file. devdiscover will create it after all devices have been discovered, and it will contain <include> statements for all XML files. This makes it practical to use one XML file per SNMP host, and to use the bundle file for inclusion in the tree configuration.

Add your XML files to the tree configuration. For each tree, /opt/t3/t/etc/torrus/conf/torrus-siteconfig.pl lists the XML files that have to be compiled for it. In the example above, you would add myrouters.xml and myswitches.xml to the xmlfiles array in the tree configuration.

See Torrus SNMP Discovery User Guide for more details on how genddx and devdiscover interact and how you can customize the discovery process.

Tip: in most cases, your hierarchy division will be different. It might be arranged by geographical location or by customer name. There is a configuration statement that allows you to include other XML files into the configuration, which gives you great flexibility in building the data hierarchies.

Compile the configuration. After the XML configuration is prepared, you need to execute the compiler:

  torrus compile --tree=treename --verbose

For most of the processes that you run within Torrus, you need to specify the tree name with the --tree option. Some programs accept the --all option, which causes them to process all existing trees. With the --verbose option, the compiler reports the files being processed and other actions that may take a long time. It also reports any errors in your configuration.

Build the search database. The search database is updated by executing the following command:

  torrus bs --all --verbose

For users that are allowed to display all the trees, you can enable the global search across all trees:

  torrus acledit --addgroup=staff --permit=GlobalSearch --for='*'

Launch the collector. Assuming that compilation went smoothly, you may now launch the data collector:

  torrus collector --tree=treename

Without additional options, the collector will fork as a daemon process, and write only error messages in its log file, /opt/t3/t/var/log/collector.treename.log.
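
The compile, search database, and collector steps above can be chained for a first start of one tree. The following is a minimal sketch: the function name torrus_bringup and the TORRUS override variable are hypothetical conveniences, while the commands and options are exactly those shown above. Setting TORRUS to echo prints the commands instead of running them.

```shell
# Hypothetical wrapper: run the steps above in order for one tree and
# stop at the first command that fails.
TORRUS=${TORRUS:-torrus}

torrus_bringup() {
  tree=$1
  $TORRUS compile --tree="$tree" --verbose || return 1
  $TORRUS bs --all --verbose               || return 1
  $TORRUS collector --tree="$tree"         || return 1
  # Log file location as described above.
  echo "collector log: /opt/t3/t/var/log/collector.$tree.log"
}

# Dry run: print the commands instead of executing them.
TORRUS=echo torrus_bringup treename
```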

See the Torrus installation guide for details about startup scripts.

The monitor daemon is used for monitoring the thresholds in the data files. For more details, see the Torrus XML configuration guide, in the section about monitor definitions.

Define the ACLs. By default, user authentication is enabled in the web interface. You can change this by setting $Torrus::CGI::authorizeUsers = 0 in your torrus-siteconfig.pl. To make use of user authentication, you need to create groups and user accounts. Each user belongs to one or more groups, and each group has access to a set of datasource trees. See Torrus Installation Instructions for a basic example.

Browse with your browser. Provided that you followed the installation guide to the end and your HTTP server is running, your Torrus hierarchy should be visible in your favorite web browser.

Configuration guidelines

The XML configuration is described in full detail in the Torrus XML Configuration Guide. The guidelines below will help you to read that document.

Tree structure. The tree structure is defined by the structure of <subtree> and <leaf> XML elements. The rule is simple: child XML elements of a <subtree> element define the child nodes in the configuration tree.

Parameters. Each node has a number of parameters. They are defined by the <param> XML element. Parameters are inherited: a child node has all of its parent's parameters, some of which may be overridden.

Additive configuration. The whole XML configuration is additive. It means that you may define your subtree several times across your XML configuration, and the new parameters and child nodes will be added to previously defined ones.

Templates. Some pieces of configuration may be written as templates, and then re-used in multiple places.

Incremental compiler. Subsequent runs of torrus compile process only the changed files, so the compilation time is much shorter. The collector also picks up the changes incrementally, so its schedule is not disturbed by re-initialization.

The configsnapshot utility generates one large XML file back from the compiled configuration. Its main purpose is backup of the configuration, but it can also be used for studying the relationships between templates and input files.

Handling SNMP errors

During the SNMP discovery process, some SNMP devices may not be reachable. By default, devdiscover reports the error and does not write the output XML file containing that device. It also skips writing the bundle files that contain the affected output file.

When devdiscover is executed with the --forcebundle option, the bundle files are written, and the output files related to the unreachable devices are left out of the bundles. This ensures that you always get a configuration that compiles and lets the collector run.

Another option, --fallback=DAYS, if given together with --forcebundle, tells the discovery engine to reuse old XML files if the related SNMP devices are not reachable and the files are not older than DAYS.
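
The two options can be combined in a periodic re-discovery job. The following is a minimal sketch: the function name rediscover, the TORRUS override variable, and the 7-day value are assumptions chosen for illustration, and the DDX file names are the ones from the quick start. Only the devdiscover options named above are used.

```shell
# Hypothetical wrapper: re-run discovery for several DDX files while
# tolerating unreachable devices.
TORRUS=${TORRUS:-torrus}

rediscover() {
  days=$1        # maximum age of old XML files to reuse
  shift
  for ddx in "$@"; do
    $TORRUS devdiscover --in="$ddx" --forcebundle --fallback="$days" \
      || echo "devdiscover failed for $ddx" >&2
  done
}

# Dry run: print the commands instead of executing them.
TORRUS=echo rediscover 7 myrouters.ddx myswitches.ddx
```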

If an SNMP device is unreachable at the time of collector initialization, the collector reports the error and waits for the period specified in $Torrus::Collector::SNMP::unreachableRetryDelay, which is 10 minutes by default. It then keeps retrying at that interval for the period defined in $Torrus::Collector::SNMP::unreachableTimeout, which is 6 hours by default. If the device does not become available within the timeout, it is excluded from collection and is tried again only at the next collector initialization (at collector process start or after recompiling the configuration).

If a device becomes unreachable during the normal collector cycle, it is retried in every collector cycle (usually every 5 minutes) for the period defined in $Torrus::Collector::SNMP::unreachableTimeout. After the timeout it is excluded from collection.
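
To get a feeling for these defaults, the approximate number of attempts before a device is dropped can be computed from the timeout and the retry interval. This is a rough sketch only: the function name retries_within is hypothetical, and the exact count depends on how the collector treats the first and the last attempt, which is not specified here.

```shell
# Hypothetical helper: how many retry intervals fit into the timeout.
# Both arguments are in seconds.
retries_within() {
  timeout=$1
  interval=$2
  echo $(( timeout / interval ))
}

# Initialization: 6-hour timeout, 10-minute retry delay.
retries_within 21600 600    # prints: 36

# Normal running: 6-hour timeout, 5-minute collector cycle.
retries_within 21600 300    # prints: 72
```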

If the hardware configuration of a device changes after devdiscover has been executed, the collector may not find some values in SNMP tables, such as interface names in ifTable. It then excludes such datasources from collection immediately.

Tips and tricks

Comments, descriptions, and legends

torrus devdiscover will extract some useful information from your SNMP devices, and place it in the XML configuration:

Grouping the datasources alternatively

In most cases, you would want to have several different groupings of your datasources.

For instance, the default devdiscover gives only one level of freedom: the subtree name above the host level. It's reasonable to use this name for grouping by geographical location. Thus, the hierarchy would be characterised as /[location]/[hostname]/[interface]/[counter].

Let's say you would like to have alternative grouping, such as:

Torrus provides two different ways for organising your datasources:

Automating XML generation

It is quite a common task to have Torrus monitor a certain set of devices that devdiscover does not (yet) support. Of course, it is quite a pain to maintain a manually written XML file, especially if there is more than one device of the same type.

In such case you may benefit from the approach suggested by Christian Schnidrig:

Imagine you have 50 gizmos which are able to speak SNMP and which you would like to put into some Torrus tree structure. A good designer's approach would be to keep the data and the presentation separate, so that a single change to the presentation is applied to all 50 devices. To do that, let's create two files: gizmos.data and gizmos.tmpl. The first one would contain data about our devices:

    [%
      gizmos = [
        {
          name => 'atwork'
          color => 'blue',
          location => 'Javastrasse 2, 8604 Hegnau'
          description => 'My gizmo @ Sun'
          community => 'blabla',
          hands => [
              {name => 'Left'}
              {name => 'Right'}
            ],
        }
        {
          name => 'athome'
          color => 'gray',
          location => 'Riedstrasse 120, 8604 Hegnau'
          description => 'My gizmo @ Home'
          community => 'blabla',
          hands => [
              {name => 'Upper'}
              {name => 'Lower'}
            ],
        }
      ]

    %]

Then gizmos.tmpl would contain the XML template that would produce the Torrus configuration file:

    [% PROCESS $data %]
    <?xml version="1.0"?>
    <configuration>
      <datasources>
        <subtree name="SNMP">
          <subtree name="Gizmos">
          [% FOREACH g = gizmos %]
          <!-- ******************************************************* -->
          <!-- [% g.name %] -->
          <subtree name="[% g.color %]">
              <alias>/ByName/[% g.name %]/</alias>

              <param name="snmp-community"  value="[% g.community %]" />
              <param name="comment"         value="[% g.description %]" />
              <param name="snmp-host"       value="[% g.name %]" />
              <param name="legend">
                Description: [% g.description %]
                Location:    [% g.location %]
              </param>

              [% FOREACH h = g.hands %]
              <leaf name="[% h.name %]Hand">
                <!-- do something, my fantasy exhausted here -->
              </leaf>
              [% END %]
          </subtree>
          [% END %]
        </subtree>
      </subtree>
    </datasources>
    </configuration>

See xmlconfig/examples/servers.data and xmlconfig/examples/servers.tmpl for a more useful example of the described approach.

Finally, you generate the Torrus configuration with the tpage utility, which is a standard part of the Template-Toolkit package:

  tpage --define data=gizmos.data gizmos.tmpl > gizmos.xml

Several Torrus instances on one server

Sometimes it is necessary to have a separate instance of Torrus for testing purposes on the same server as the production installation. In the example below, a completely autonomous installation of Torrus is installed in /usr/testtorrus directory on a FreeBSD system.

Watching the collector failures

The Torrus distribution includes a script, examples/rrdup_notify.sh, which provides a simple way of telling whether the collector is running properly: it checks the modification time of the RRD files, and if any file is older than a given threshold, it sends an e-mail warning.

Copy the script file to some place in your system and edit it so that it fits your requirements: you might want to change the maximum age parameter (default is 1 hour), the notification e-mail address, and the directory paths where to look for RRD files. Then chmod it so that it's executable, and add it to crontab. Depending on your operational requirements, it might run every hour, a few times a day, or even during business hours only.

The script writes the number of aged files in the e-mail subject, and lists the file names in the body. In a relatively large installation, you might want to amend the script in order to avoid overly large e-mail messages.
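
The core of such a check can be sketched in a few lines. This is not the rrdup_notify.sh script itself, only an illustration of the same idea: the function name stale_rrds is hypothetical, the .rrd file suffix is an assumption, and the -mmin test is an extension supported by GNU and BSD find rather than a POSIX feature.

```shell
# Hypothetical sketch: list RRD files that have not been modified
# within the last N minutes (default 60, matching the 1 hour above).
stale_rrds() {
  dir=$1
  max_age_min=${2:-60}
  find "$dir" -type f -name '*.rrd' -mmin +"$max_age_min" | sort
}

# Possible use in a cron job (the address is a placeholder):
#   stale=$(stale_rrds /data1/torrus/collector_rrd 60)
#   if [ -n "$stale" ]; then
#     count=$(printf '%s\n' "$stale" | wc -l)
#     printf '%s\n' "$stale" | mail -s "$count aged RRD files" admin@example.net
#   fi
```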

Viewing external RRD files

Some external program may create its own RRD files, and you may want to display and monitor them in Torrus.

Also, some collector-generated RRDs may become outdated, for example after a module is removed from a router and its interface counters are no longer updated.

The easiest way to use such files would be to utilize the torrus rrddir2xml command. It generates the XML configuration file that represents all RRD files found in a given directory. It can also scan the directory recursively.

See also the examples in the Torrus distribution. There are some templates for use with Smokeping, OpenNMS, and Flowscan.

Torrus usage scenarios

Scenario 1. Netflow Traffic Analyser

Cisco routers are capable of exporting the traffic statistics data in Netflow UDP packets.

A cflowd or flow-tools daemon collects Netflow packets into flow files.

FlowScan software analyses the flow files and stores the statistics into numerous RRD files.

Torrus is used to monitor the thresholds and display the graphs in a convenient form.

Scenario 2. Backbone Traffic Statistics

CiscoWorks2000 or NMSTOOLS software is used to provide the list of all devices in the network.

Torrus's devdiscover builds the XML configuration to monitor the router interfaces, CPU and memory usage, and temperature sensors.

Data importing scripts generate configuration for alternative grouping of the datasources: by location, by customer connection, by device type, by service type, etc.

Troubleshooting guidelines

SNMP Error: Received tooBig(1)

For some devices, the collector may issue the following error messages:

 [27-May-2004 10:15:17*] SNMP Error for XX.XX.XX.XX:161:public: Received
 tooBig(1) error-status at error-index 0

For better performance, the SNMP collector sends several SNMP requests in one UDP datagram. The SNMP agent then tries to send the replies to all requests in a single datagram, and this error indicates that it failed to do so. In most cases, this is caused by limitations or bugs in the agent software.

The number of requests per datagram is controlled by the parameter snmp-oids-per-pdu, and it may be set in the discovery input XML or in Torrus configuration XML. The default value is 40, and setting it to 10 generally works.
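
Lowering snmp-oids-per-pdu trades datagram size for datagram count. The following rough sketch shows the effect; the function name pdus_needed and the figure of 200 OIDs per device are assumptions chosen only for illustration.

```shell
# Hypothetical helper: number of request datagrams needed to poll a
# given number of OIDs, rounding up.
pdus_needed() {
  oids=$1
  per_pdu=$2
  echo $(( (oids + per_pdu - 1) / per_pdu ))
}

pdus_needed 200 40   # default setting, prints: 5
pdus_needed 200 10   # reduced setting, prints: 20
```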

Author

Copyright (c) 2002-2017 Stanislav Sinyagin <ssinyagin@k-open.com>