Overview
Sometimes, things go wrong in Riak. How can you know what's wrong? Riaknostic is here to help.
    $ riak-admin diag
    15:34:52.736 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.
    15:34:52.736 [notice] Data directory /srv/riak/data/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
Riaknostic, which is invoked via the above command, is a small suite of diagnostic checks that can be run against your Riak node to discover common problems and recommend how to resolve them. These checks are derived from the experience of the Basho Client Services Team as well as numerous public discussions on the mailing list, IRC room, and other online media.
Installation
After downloading the package, expand it in the directory below according to your platform:
Platform | Directory |
---|---|
Linux (Redhat, CentOS, Debian, Ubuntu) | /usr/lib/riak/lib |
Linux (Fedora) | /usr/lib64/riak/lib |
Solaris, OpenSolaris | /opt/riak/lib |
Mac OS/X or Self-built | $RIAK/lib (where $RIAK=rel/riak for source installs, or the directory where you unpacked the package) |
For example, on Linux, I might do this:
    $ wget https://github.com/basho/riaknostic/downloads/riaknostic-1.0.2.tar.gz -P /tmp
    $ cd /usr/lib/riak/lib
    $ sudo tar xzvf /tmp/riaknostic-1.0.2.tar.gz
The package will expand to a riaknostic/ directory which contains the riaknostic script, source code in the src/ directory, and documentation. Now try it out!
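
If you install on several kinds of machines, the platform table above can be captured in a small helper. This is only a sketch: the function name riak_lib_dir and the platform labels it accepts are made up for illustration and are not part of Riak or Riaknostic.

```shell
# Hypothetical helper: print the Riak lib directory for a platform label.
# The labels (ubuntu, fedora, ...) are our own; the paths come from the
# table above.
riak_lib_dir() {
    case "$1" in
        redhat|centos|debian|ubuntu) echo "/usr/lib/riak/lib" ;;
        fedora)                      echo "/usr/lib64/riak/lib" ;;
        solaris|opensolaris)         echo "/opt/riak/lib" ;;
        *)
            # Mac OS X or self-built: the caller must set $RIAK.
            if [ -n "$RIAK" ]; then
                echo "$RIAK/lib"
            else
                echo "set RIAK to your Riak directory" >&2
                return 1
            fi
            ;;
    esac
}

# Example use (assumes the tarball was already downloaded to /tmp):
#   cd "$(riak_lib_dir ubuntu)" && sudo tar xzvf /tmp/riaknostic-1.0.2.tar.gz
```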
Usage
For most cases, you can just run the riak-admin diag command as given at the top of the page. However, sometimes you might want to know some extra detail or run only specific checks. For that, there are command-line options. Add --help to get the options:
    $ riak-admin diag --help
    Usage: riak-admin diag [-d <level>] [-l] [-h] [check_name ...]

      -d, --level    Minimum message severity level (default: notice)
      -l, --list     Describe available diagnostic tasks
      -h, --help     Display help/usage
      check_name     A specific check to run
To get an idea of what checks will be run, use the --list option:
    $ riak-admin diag --list
    Available diagnostic checks:

      disk               Data directory permissions and atime
      dumps              Find crash dumps
      memory_use         Measure memory usage
      nodes_connected    Cluster node liveness
      ring_membership    Cluster membership validity
      ring_size          Ring size valid
If you want all the gory details about what Riaknostic is doing, you can run the checks at a more verbose logging level with the --level option:
    $ riak-admin diag --level debug
    18:34:19.708 [debug] Lager installed handler lager_console_backend into lager_event
    18:34:19.720 [debug] Lager installed handler error_logger_lager_h into error_logger
    18:34:19.720 [info] Application lager started on node nonode@nohost
    18:34:20.736 [debug] Not connected to the local Riak node, trying to connect. alive:false connect_failed:undefined
    18:34:20.737 [debug] Starting distributed Erlang.
    18:34:20.740 [debug] Supervisor net_sup started erl_epmd:start_link() at pid <0.42.0>
    18:34:20.742 [debug] Supervisor net_sup started auth:start_link() at pid <0.43.0>
    18:34:20.771 [debug] Supervisor net_sup started net_kernel:start_link(['riak_diag87813@127.0.0.1',longnames]) at pid <0.44.0>
    18:34:20.771 [debug] Supervisor kernel_sup started erl_distribution:start_link(['riak_diag87813@127.0.0.1',longnames]) at pid <0.41.0>
    18:34:20.781 [debug] Supervisor inet_gethost_native_sup started undefined at pid <0.49.0>
    18:34:20.782 [debug] Supervisor kernel_safe_sup started inet_gethost_native:start_link() at pid <0.48.0>
    18:34:20.834 [debug] Connected to local Riak node 'riak@127.0.0.1'.
    18:34:20.939 [debug] Local RPC: os:getpid([]) [5000]
    18:34:20.939 [debug] Running shell command: ps -o pmem,rss,command -p 83144
    18:34:20.946 [debug] Shell command output:
    %MEM RSS COMMAND
    0.4 31004 /srv/riak/erts-5.8.4/bin/beam.smp -K true -A 64 -W w -- -root /srv/riak/rel/riak -progname riak -- -home /Users/sean -- -boot /srv/riak/releases/1.0.2/riak -embedded -config /srv/riak/etc/app.config -name riak@127.0.0.1 -setcookie riak -- console
    18:34:20.960 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.
    18:34:20.961 [notice] Data directory /srv/riak/data/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
    18:34:20.961 [info] Riak process is using 0.4% of available RAM, totalling 31004 KB of real memory.
Most of the time you'll want to use the default (notice), but any syslog severity name will do (from most to least verbose): debug, info, notice, warning, error, critical, alert, emergency.
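
To make the ordering of severity names concrete, here is a rough sketch that applies the same "minimum level" idea to output you have already saved to a file. It is not how Riaknostic implements --level; the function names level_rank and filter_min_level are hypothetical, and the sketch assumes each log line carries its level in the first pair of square brackets, as in the samples above.

```shell
# Rank severity names from most verbose (0) to least verbose (7).
level_rank() {
    case "$1" in
        debug)     echo 0 ;;
        info)      echo 1 ;;
        notice)    echo 2 ;;
        warning)   echo 3 ;;
        error)     echo 4 ;;
        critical)  echo 5 ;;
        alert)     echo 6 ;;
        emergency) echo 7 ;;
        *)         return 1 ;;
    esac
}

# Read log lines on stdin and print only those whose [level] is at or
# above the minimum level given as the first argument.
filter_min_level() {
    min=$(level_rank "$1") || { echo "unknown level: $1" >&2; return 1; }
    while IFS= read -r line; do
        # Take the text between the first '[' and the following ']'.
        lvl=${line#*\[}
        lvl=${lvl%%\]*}
        # Skip lines without a recognizable level (e.g. raw ps output).
        rank=$(level_rank "$lvl") || continue
        if [ "$rank" -ge "$min" ]; then
            printf '%s\n' "$line"
        fi
    done
    return 0
}

# Example use on saved output (diag.log is an assumed file name):
#   filter_min_level warning < diag.log
```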
Finally, if you want to run just a single diagnostic or a list of specific ones, you can pass their name(s):
    $ riak-admin diag dumps
    18:41:24.083 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.
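
Since the usage line accepts several check names, a script can assemble the command from a list. The sketch below is an assumption-laden illustration: diag_command and KNOWN_CHECKS are hypothetical names, and the list of checks is simply copied from the --list output shown earlier, so it may not match your installed version.

```shell
# Check names copied from the --list output earlier on this page;
# your version of Riaknostic may offer a different set.
KNOWN_CHECKS="disk dumps memory_use nodes_connected ring_membership ring_size"

# Print a riak-admin diag command line for the given check names,
# refusing any name that is not in KNOWN_CHECKS.
diag_command() {
    cmd="riak-admin diag"
    for name in "$@"; do
        case " $KNOWN_CHECKS " in
            *" $name "*) cmd="$cmd $name" ;;
            *) echo "unknown check: $name" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$cmd"
}

# Example use: print the command that runs only the dumps and disk checks.
#   diag_command dumps disk
```

Printing the command rather than running it keeps the sketch safe to try on a machine without Riak installed.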
Contributing
Have an idea for a diagnostic? Want to improve the way Riaknostic works? Fork the GitHub repository and send us a pull request with your changes! The code is documented with edoc, so give the API Docs a read before you contribute.
If you want to run the riaknostic script while developing and you don't have it hooked up to your local Riak, you can invoke it directly like so:
$ ./riaknostic --etc ~/code/riak/rel/riak/etc --base ~/code/riak/rel/riak --user `whoami` [other options]
Those extra options are usually assigned by the riak-admin script for you, but here's how to set them:
Option | Description |
---|---|
--etc | Where your Riak configuration directory is. In the example above, it's in the generated directory of a source checkout of Riak. |
--base | The "base" directory of Riak, usually the root of the generated directory, or /usr/lib/riak on Linux, for example. Scan the riak-admin script for how the RUNNER_BASE_DIR variable is assigned on your platform. |
--user | What user/UID the Riak node runs as. In a source checkout, it's the current user; on most systems, it's riak. |
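
When the configuration directory sits directly under the base directory, as in the source-checkout example above, the three options can be derived from a single path. The helper below is a sketch under that assumption; riaknostic_args is a hypothetical name, and packaged installs may keep their configuration elsewhere.

```shell
# Hypothetical helper: print --etc, --base and --user for a Riak root
# directory, assuming the configuration lives in <root>/etc.
# The user defaults to the current user, as in a source checkout.
riaknostic_args() {
    root=${1:?usage: riaknostic_args <riak-root> [user]}
    user=${2:-$(whoami)}
    printf '%s\n' "--etc $root/etc --base $root --user $user"
}

# Example use from the riaknostic directory. The unquoted substitution
# relies on word splitting, so it breaks on paths containing spaces:
#   ./riaknostic $(riaknostic_args ~/code/riak/rel/riak) --level debug
#
# To see how riak-admin assigns the base directory on your platform:
#   grep -n RUNNER_BASE_DIR "$(command -v riak-admin)"
```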