Overview

Sometimes, things go wrong in Riak. How can you know what's wrong? Riaknostic is here to help.

$ riak-admin diag
15:34:52.736 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.
15:34:52.736 [notice] Data directory /srv/riak/data/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.

Riaknostic, which is invoked via the above command, is a small suite of diagnostic checks that can be run against your Riak node to discover common problems and recommend how to resolve them. These checks are derived from the experience of the Basho Client Services Team as well as numerous public discussions on the mailing list, IRC room, and other online media.

Installation

After downloading the package, expand it in the directory for your platform:

Platform                                  Directory
Linux (Red Hat, CentOS, Debian, Ubuntu)   /usr/lib/riak/lib
Linux (Fedora)                            /usr/lib64/riak/lib
Solaris, OpenSolaris                      /opt/riak/lib
Mac OS X or self-built                    $RIAK/lib (where $RIAK=rel/riak for source installs, or the directory where you unpacked the package)
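
If you script the installation across several machines, the table above can be expressed as a small lookup. This is only a sketch: riak_lib_dir and its platform labels are hypothetical names chosen for illustration, not something shipped with Riak or Riaknostic, and the fallback to rel/riak when $RIAK is unset is an assumption based on the source-install case in the table.

```shell
# Hypothetical helper (not part of Riaknostic): print the directory in
# which to expand the package, given a platform label of our own choosing.
riak_lib_dir() {
    case "$1" in
        redhat|centos|debian|ubuntu) echo "/usr/lib/riak/lib" ;;
        fedora)                      echo "/usr/lib64/riak/lib" ;;
        solaris|opensolaris)         echo "/opt/riak/lib" ;;
        # Mac OS X or self-built: $RIAK is wherever the release lives;
        # assume rel/riak (a source install) when it is unset.
        macosx|source)               echo "${RIAK:-rel/riak}/lib" ;;
        *)
            echo "unknown platform: $1" >&2
            return 1
            ;;
    esac
}
```

With this in place, a provisioning script could change into "$(riak_lib_dir fedora)" before unpacking instead of hard-coding the path.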

For example, on Linux you might do this:

$ wget https://github.com/basho/riaknostic/downloads/riaknostic-1.0.2.tar.gz -P /tmp
$ cd /usr/lib/riak/lib
$ sudo tar xzvf /tmp/riaknostic-1.0.2.tar.gz

The package expands to a riaknostic/ directory, which contains the riaknostic script, source code in the src/ directory, and documentation. Now try it out!

Usage

In most cases, you can just run the riak-admin diag command as given at the top of the page. However, sometimes you might want extra detail, or want to run only specific checks. For that, there are command-line options. Add --help to list them:

$ riak-admin diag --help
Usage: riak-admin diag [-d <level>] [-l] [-h] [check_name ...]

  -d, --level		Minimum message severity level (default: notice)
  -l, --list		Describe available diagnostic tasks
  -h, --help		Display help/usage
  check_name		A specific check to run

To get an idea of what checks will be run, use the --list option:

$ riak-admin diag --list
Available diagnostic checks:

  disk                 Data directory permissions and atime
  dumps                Find crash dumps
  memory_use           Measure memory usage
  nodes_connected      Cluster node liveness
  ring_membership      Cluster membership validity
  ring_size            Ring size valid

If you want all the gory details about what Riaknostic is doing, you can run the checks at a more verbose logging level with the --level option:

$ riak-admin diag --level debug
18:34:19.708 [debug] Lager installed handler lager_console_backend into lager_event
18:34:19.720 [debug] Lager installed handler error_logger_lager_h into error_logger
18:34:19.720 [info] Application lager started on node nonode@nohost
18:34:20.736 [debug] Not connected to the local Riak node, trying to connect. alive:false connect_failed:undefined
18:34:20.737 [debug] Starting distributed Erlang.
18:34:20.740 [debug] Supervisor net_sup started erl_epmd:start_link() at pid <0.42.0>
18:34:20.742 [debug] Supervisor net_sup started auth:start_link() at pid <0.43.0>
18:34:20.771 [debug] Supervisor net_sup started net_kernel:start_link(['riak_diag87813@127.0.0.1',longnames]) at pid <0.44.0>
18:34:20.771 [debug] Supervisor kernel_sup started erl_distribution:start_link(['riak_diag87813@127.0.0.1',longnames]) at pid <0.41.0>
18:34:20.781 [debug] Supervisor inet_gethost_native_sup started undefined at pid <0.49.0>
18:34:20.782 [debug] Supervisor kernel_safe_sup started inet_gethost_native:start_link() at pid <0.48.0>
18:34:20.834 [debug] Connected to local Riak node 'riak@127.0.0.1'.
18:34:20.939 [debug] Local RPC: os:getpid([]) [5000]
18:34:20.939 [debug] Running shell command: ps -o pmem,rss,command -p 83144
18:34:20.946 [debug] Shell command output: 
%MEM    RSS COMMAND
 0.4  31004 /srv/riak/erts-5.8.4/bin/beam.smp -K true -A 64 -W w -- -root /srv/riak/rel/riak -progname riak -- -home /Users/sean -- -boot /srv/riak/releases/1.0.2/riak -embedded -config /srv/riak/etc/app.config -name riak@127.0.0.1 -setcookie riak -- console

18:34:20.960 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.
18:34:20.961 [notice] Data directory /srv/riak/data/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
18:34:20.961 [info] Riak process is using 0.4% of available RAM, totalling 31004 KB of real memory.

Most of the time you'll want the default level (notice), but any syslog severity name will do. From most to least verbose, they are: debug, info, notice, warning, error, critical, alert, emergency.
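
The ordering of those levels matters when you post-process saved output, for example a debug run you captured to a file and later want to trim. The following is a minimal sketch of that idea and not how Riaknostic filters messages internally; filter_by_level is a hypothetical helper, and it assumes each message line carries a [level] tag in the format shown in the transcripts above.

```shell
# Hypothetical helper: read diag output on stdin and print only the lines
# whose [level] tag is at or above the given minimum severity.
# Untagged lines (such as multi-line shell command output) are dropped.
filter_by_level() {
    awk -v min="$1" '
        BEGIN {
            # Severity names ordered from most to least verbose.
            n = split("debug info notice warning error critical alert emergency", names, " ")
            for (i = 1; i <= n; i++) rank[names[i]] = i
            if (!(min in rank)) {
                print "unknown severity: " min > "/dev/stderr"
                exit 2
            }
        }
        # The first [word] on the line is taken to be the level tag.
        match($0, /\[[a-z]+\]/) {
            level = substr($0, RSTART + 1, RLENGTH - 2)
            if ((level in rank) && rank[level] >= rank[min]) print
        }
    '
}
```

Piping a saved debug run through filter_by_level warning would keep only the warning-and-above lines, which is roughly what passing --level warning to the command does at run time.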

Finally, if you want to run just a single diagnostic or a list of specific ones, you can pass their name(s):

$ riak-admin diag dumps
18:41:24.083 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.

Contributing

Have an idea for a diagnostic? Want to improve the way Riaknostic works? Fork the GitHub repository and send us a pull request with your changes! The code is documented with edoc, so give the API Docs a read before you contribute.

If you want to run the riaknostic script while developing and you don't have it hooked up to your local Riak, you can invoke it directly like so:

$ ./riaknostic --etc ~/code/riak/rel/riak/etc --base ~/code/riak/rel/riak --user `whoami` [other options]

Those extra options are usually assigned by the riak-admin script for you, but here's how to set them:

--etc   The Riak configuration directory. In the example above, it's in the generated directory of a source checkout of Riak.
--base  The "base" directory of Riak, usually the root of the generated directory, or /usr/lib/riak on Linux, for example. Check the riak-admin script to see how the RUNNER_BASE_DIR variable is assigned on your platform.
--user  The user/UID the Riak node runs as. In a source checkout, it's the current user; on most systems, it's riak.
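
If you run the script this way often, the three options can be wrapped in a shell function. This is a sketch for a source checkout only: run_riaknostic_dev and the RIAKNOSTIC variable are hypothetical names introduced here, and the function assumes the configuration directory is the etc/ subdirectory of the release directory, as in the example above.

```shell
# Hypothetical wrapper for development use. The first argument is the
# generated Riak release directory (for example ~/code/riak/rel/riak);
# any remaining arguments are passed through to the riaknostic script.
run_riaknostic_dev() {
    if [ $# -lt 1 ]; then
        echo "usage: run_riaknostic_dev <riak-release-dir> [riaknostic options]" >&2
        return 1
    fi
    rel_dir=$1
    shift
    # RIAKNOSTIC (our own variable) may point at the script; it defaults
    # to ./riaknostic in the current directory.
    "${RIAKNOSTIC:-./riaknostic}" \
        --etc "$rel_dir/etc" \
        --base "$rel_dir" \
        --user "$(whoami)" \
        "$@"
}
```

From inside the riaknostic/ directory, run_riaknostic_dev ~/code/riak/rel/riak dumps would then run just the dumps check against your checkout.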