Troubleshooting

If you run into issues installing the Host Agent or getting metrics reported, try the troubleshooting techniques below.


Restart the agent

The agent supports standard service control commands including status, start, stop and restart. For example, you can restart the agent by running:

sudo service appoptics-snapteld restart
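
To restart only when the agent is actually down, the status and restart commands can be combined. The following is a minimal sketch, not part of the agent tooling: the ensure_running function and the SERVICE_CMD override are hypothetical names, and it assumes that the status command exits non-zero when the service is not running, which is the usual init-script convention but is not confirmed here.

```shell
# Restart a service only when its status check reports a problem.
# Assumption: "service <name> status" exits non-zero when the service is down.
# SERVICE_CMD is a hypothetical override, useful for trying the function
# without sudo; it defaults to "sudo service".
ensure_running() {
    name="$1"
    svc="${SERVICE_CMD:-sudo service}"
    if $svc "$name" status >/dev/null 2>&1; then
        echo "$name is running"
    else
        echo "$name is not running, restarting"
        $svc "$name" restart
    fi
}
```

With the function defined, ensure_running appoptics-snapteld either reports that the agent is running or restarts it.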

View the agent log

The agent log file is located at /var/log/appoptics/snapteld.log. By default only messages at or above the warning level are reported. To increase logging verbosity:

  • Set the log level to 1 (debug) in the agent config file

  • Restart the agent

  • Check for new messages in the log file, for example:

    tail -f /var/log/appoptics/snapteld.log
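
At debug level the log can be noisy, so it may help to filter it by level. The sketch below assumes that each log entry carries a level=<name> field (for example level=error); that format is an assumption and should be checked against your own log file. The function name log_lines_at_level is hypothetical.

```shell
# Print only the lines from stdin whose level field matches the first argument.
# Assumption: log entries contain a "level=<name>" field followed by a space
# or the end of the line.
log_lines_at_level() {
    grep -E "level=$1([[:space:]]|\$)"
}
```

For example: tail -f /var/log/appoptics/snapteld.log | log_lines_at_level error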
    

Check the loaded plugins

The Host Agent includes the snaptel command-line tool for interacting with the Snap daemon on which the agent is based. Out of the box, the agent automatically loads two plugins: one that collects system metrics (aosystem) and one that publishes to AppOptics (publisher-appoptics). To check that they are loaded, run snaptel plugin list and confirm that both show loaded in the STATUS column:

$ snaptel plugin list

NAME                  VERSION         TYPE            SIGNED          STATUS          LOADED TIME
aosystem              20              collector       false           loaded          Thu, 23 Nov 2017 18:34:54 UTC
publisher-appoptics   5               publisher       false           loaded          Thu, 23 Nov 2017 18:34:54 UTC
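
The same check can be scripted. The sketch below reads the plugin list on standard input and reports whether each named plugin is loaded. The function name check_plugins_loaded is hypothetical, and the column positions (NAME in field 1, STATUS in field 5) are taken from the sample output above, so they may need adjusting if your output differs.

```shell
# Read "snaptel plugin list" output on stdin and verify that every plugin
# named as an argument has STATUS "loaded". Returns non-zero if any is missing.
# Assumption: NAME is field 1 and STATUS is field 5, as in the sample output.
check_plugins_loaded() {
    awk -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        $5 == "loaded" { loaded[$1] = 1 }
        END {
            rc = 0
            for (i = 1; i <= n; i++) {
                if (names[i] in loaded) {
                    print names[i] ": loaded"
                } else {
                    print names[i] ": NOT loaded"
                    rc = 1
                }
            }
            exit rc
        }
    '
}
```

For example: snaptel plugin list | check_plugins_loaded aosystem publisher-appoptics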

Check the state of tasks

As with loaded plugins, you can use the snaptel command-line tool to check the state of tasks, which define the metrics collection and publishing jobs run by the agent. Out of the box, the agent automatically defines a task that reports system metrics every minute.

  • Use snaptel task list to get a list:

    $ snaptel task list
    ID                                           NAME                                            STATE           HIT     MISS    FAIL    CREATED                 LAST FAILURE
    53c0afb1-1e47-471f-b1c0-af69207842eb         Task-53c0afb1-1e47-471f-b1c0-af69207842eb       Running         5       0       0       9:45PM 11-23-2017
    193e1a04-4c9b-495e-a9de-06862c57b092         Task-193e1a04-4c9b-495e-a9de-06862c57b092       Ended           10      0       0       9:45PM 11-23-2017
    
  • The output above shows one running task and one that has ended. To confirm that the running task is the one reporting system metrics, either use the snaptel task export <task id> command to print the task details to the console, or use the snaptel task watch <task id> command, which prints the metrics gathered at each task interval. An example of the watch command:

    $ snaptel task watch 53c0afb1-1e47-471f-b1c0-af69207842eb
    
    Watching Task (53c0afb1-1e47-471f-b1c0-af69207842eb):
    NAMESPACE                   DATA                    TIMESTAMP
    /system/cpu/guest           0                       2017-11-23 21:58:00.004377321 +0000 UTC
    /system/cpu/idle            99.5330998828381        2017-11-23 21:58:00.004382944 +0000 UTC
    ...
    (ctl-c to quit)
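
When many tasks are defined, it can be quicker to list only those that may need a closer look. The sketch below reads the task list on standard input and prints each task that is not in the Running state or that has a non-zero FAIL count. The function name tasks_needing_attention is hypothetical, and it assumes that task names contain no spaces, so that STATE is field 3 and FAIL is field 6 as in the sample output above.

```shell
# Read "snaptel task list" output on stdin and print the ID, state and failure
# count of each task that is not Running or that has recorded failures.
# Assumption: task names contain no spaces (STATE is field 3, FAIL is field 6).
tasks_needing_attention() {
    awk '
        $1 == "ID" || NF < 6 { next }    # skip the header and blank lines
        $3 != "Running" || $6 > 0 {
            printf "%s state=%s fail=%s\n", $1, $3, $6
        }
    '
}
```

For example: snaptel task list | tasks_needing_attention. A task in the Ended state is not necessarily a problem; the output is only a starting point for snaptel task export or snaptel task watch.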
    

Run the plugin directly

If you’re experiencing issues with a specific integration and the agent logs are not providing much help, you can run the plugin binary directly, independently of the snapteld service, to attempt a collection. This can reveal errors or permission issues that are otherwise obscured.

$ sudo -u appoptics /opt/appoptics/bin/<plugin_binary> --config '<config to use in JSON format>'

For example, the following run reveals a config issue with the Consul plugin. The Consul service is listening on port 8500 on the host, but the configured address omits the port, so the plugin tries port 80.

$ sudo -u appoptics /opt/appoptics/bin/snap-plugin-collector-bridge-consul --config '{"address": "localhost"}'

...

Config Policy:
NAMESPACE          KEY                     TYPE      REQUIRED      DEFAULT    MINIMUM    MAXIMUM
bridge.consul      datacentre              string    false
bridge.consul      ssl_key                 string    false
bridge.consul      token                   string    false
bridge.consul      password                string    false
bridge.consul      address                 string    false
bridge.consul      username                string    false
bridge.consul      ssl_ca                  string    false
bridge.consul      scheme                  string    false
bridge.consul      ssl_cert                string    false
bridge.consul      insecure_skip_verify    bool      false         false

printConfigPolicy took 3.238904ms

2017/12/13 13:05:49 Bridge.init: configured telegraf input consul
Metric catalog will be updated to include:
    Namespace: /consul/*/all

printMetricTypes took 446.017µs

2017/12/13 13:05:49 Error gathering /consul/*/all: Get http://localhost/v1/health/state/any: dial tcp 127.0.0.1:80: getsockopt: connection refused
Metrics that can be collected right now are:

...

Adding the port to the address in the config resolves the issue, and the plugin reports a successful collection.

$ sudo -u appoptics /opt/appoptics/bin/snap-plugin-collector-bridge-consul --config '{"address": "localhost:8500"}'

...

Config Policy:
NAMESPACE          KEY                     TYPE      REQUIRED      DEFAULT    MINIMUM    MAXIMUM
bridge.consul      datacentre              string    false
bridge.consul      ssl_key                 string    false
bridge.consul      token                   string    false
bridge.consul      password                string    false
bridge.consul      address                 string    false
bridge.consul      username                string    false
bridge.consul      ssl_ca                  string    false
bridge.consul      scheme                  string    false
bridge.consul      ssl_cert                string    false
bridge.consul      insecure_skip_verify    bool      false         false

printConfigPolicy took 2.408139ms

2017/12/13 13:12:31 Bridge.init: configured telegraf input consul
Metric catalog will be updated to include:
    Namespace: /consul/*/all

printMetricTypes took 314.801µs

Metrics that can be collected right now are:
    Namespace: /consul/consul_health_checks/service_id  Type: string      Value:
    Namespace: /consul/consul_health_checks/status      Type: string      Value: passing
    Namespace: /consul/consul_health_checks/passing     Type: int         Value: 1
    Namespace: /consul/consul_health_checks/critical    Type: int         Value: 0
    Namespace: /consul/consul_health_checks/warning     Type: int         Value: 0
    Namespace: /consul/consul_health_checks/check_name  Type: string      Value: Serf Health Status

printCollectMetrics took 2.14235ms

...
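
Because the plugin prints a lot of output, the failing line is easy to miss. The sketch below scans the output for the "Error gathering" message seen in the failing run above and returns a non-zero status when it is present. The function name check_collection_output is hypothetical, and other plugins may word their errors differently, so treat the search string as an assumption.

```shell
# Read plugin output on stdin; print any "Error gathering" lines and return
# non-zero if at least one was found.
# Assumption: collection failures are reported with the text "Error gathering",
# as in the Consul example above.
check_collection_output() {
    if errors=$(grep -F "Error gathering"); then
        printf 'collection failed:\n%s\n' "$errors"
        return 1
    fi
    echo "no collection errors found"
}
```

For example: sudo -u appoptics /opt/appoptics/bin/<plugin_binary> --config '<config to use in JSON format>' 2>&1 | check_collection_output. The 2>&1 is included in case the error lines are written to standard error rather than standard output.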