How to control metric aggregation

A metric is a series of measurements, each made up of a key (the metric name), a value, a tag set, and a timestamp. With these measurements you can track timings, rates, counts, and any other quantity whose individual measurements fit the metric model. When you combine individual measurements across either time or space (multiple data streams), however, the correct way to combine them depends on what the metric measures. For that reason we provide fine-grained controls that let you configure aggregation on each dimension independently.
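
The metric model can be pictured as a small record type. The sketch below is illustrative only: the names `Measurement` and `stream_id` are hypothetical and not part of any API. It shows how two measurements of the same metric belong to different data streams because their tag sets differ.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Measurement:
    """Hypothetical record for one point: key, value, tag set, timestamp."""
    key: str
    value: float
    tags: Dict[str, str]
    timestamp: int  # Unix epoch seconds


def stream_id(m: Measurement) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
    """A stream is identified by the metric key plus its (sorted) tag set."""
    return (m.key, tuple(sorted(m.tags.items())))


# Two streams of the same metric, distinguished only by their tag sets.
points = [
    Measurement("app.request.time", 120.0, {"host": "web-1"}, 1700000000),
    Measurement("app.request.time", 95.0, {"host": "web-2"}, 1700000000),
]
```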

Summarization over Time

As your measurements age, we automatically summarize them into single measurements that cover larger periods of time. You can control the function used to summarize a metric's measurements over time through the metric's attributes.
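
Conceptually, summarization over time buckets points into longer periods and reduces each bucket with the chosen function. The following is a minimal sketch under that assumption, not the service's actual rollup code; `summarize_over_time` and the `SUMMARIZERS` table are hypothetical names.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical table of summarization functions, keyed by name.
SUMMARIZERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "average": mean,
    "sum": sum,
    "min": min,
    "max": max,
}


def summarize_over_time(
    points: List[Tuple[int, float]], period: int, fn: str = "average"
) -> List[Tuple[int, float]]:
    """Roll (timestamp, value) points up into one point per `period` seconds."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        # Align each point to the start of its period.
        buckets[ts - ts % period].append(value)
    summarize = SUMMARIZERS[fn]
    return [(start, summarize(vals)) for start, vals in sorted(buckets.items())]


raw = [(0, 2.0), (30, 4.0), (60, 6.0), (90, 8.0)]
by_average = summarize_over_time(raw, 60)        # [(0, 3.0), (60, 7.0)]
by_sum = summarize_over_time(raw, 60, "sum")     # [(0, 6.0), (60, 14.0)]
```

The same raw points produce very different rolled-up values depending on the function, which is why the choice matters for each kind of metric.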


Aggregation across Data Streams

When the metric you are reporting represents a large number of data streams, it's often useful to aggregate the same metric across those streams into a single representation. We support this through wildcards. When adding the metric to a chart, choose the aggregation method you want under “Graph as”. When you then select a tag and use a * character to match multiple streams, the matching streams are aggregated with the method you chose.
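
Wildcard aggregation can be sketched as: select every stream whose tag value matches the pattern, then combine them point by point with the "Graph as" function. This is an illustration only; `aggregate_streams` is a hypothetical helper, and it assumes the matched streams share the same timestamps.

```python
from fnmatch import fnmatchcase
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical "Graph as" choices.
COMBINERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "average": mean,
    "sum": sum,
    "min": min,
    "max": max,
}


def aggregate_streams(
    streams: Dict[str, List[float]], pattern: str, fn: str = "average"
) -> List[float]:
    """Combine all streams whose tag value matches `pattern` (e.g. "web-*")."""
    matched = [vals for tag_value, vals in streams.items() if fnmatchcase(tag_value, pattern)]
    if not matched:
        return []
    combine = COMBINERS[fn]
    # zip pairs the i-th point of every matched stream (assumes aligned timestamps).
    return [combine(column) for column in zip(*matched)]


# Streams keyed by the value of a "host" tag.
streams = {
    "web-1": [10.0, 20.0],
    "web-2": [30.0, 40.0],
    "db-1": [100.0, 100.0],
}
```

Here `aggregate_streams(streams, "web-*")` averages only the two web hosts, while passing `"sum"` adds them instead; `db-1` never matches the pattern.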


Example Use Cases

That's all well and good, but what do these features mean for your data? Here are broad characterizations of the types of metrics you might be tracking in gauges, and how you should configure these options for each.

Timings

A timing is a metric that tracks how long it takes, on average (or at a given percentile), to perform a task, e.g. processing an application request or a background job. As we summarize across time, our primary interest is still the characteristics of a single measurement. Therefore, you should leave the summarization at its default of average, and also combine wildcard tag values with average for metrics like these.
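
A small worked example shows why average fits timings on both dimensions. The per-host values below are made-up sample data. Averaging keeps the unit "milliseconds per request"; summing would yield a number that describes no real request.

```python
from statistics import mean

# Hypothetical per-minute average request durations (ms) for two hosts.
timings_ms = {
    "web-1": [120.0, 100.0, 110.0],
    "web-2": [80.0, 90.0, 70.0],
}

# Summarize each host over time with average: still "ms per request".
per_host = {host: mean(values) for host, values in timings_ms.items()}

# Combine the wildcard match (web-*) with average as well.
overall = mean(list(per_host.values()))
```

This is an unweighted mean of means; if the hosts served very different request volumes, it is only an approximation of the true per-request average.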

Counts

A count is a metric that tracks how many times an event has occurred, e.g. jobs processed. There are typically two types of counters: one that tracks an absolute, monotonically increasing counter value, and one that reports only the incremental count observed since the previous report. With counters, our primary interest is the total number of events. Therefore, you should use sum as the summarization over time and also aggregate data streams with sum.
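
For incremental counts, summing on both dimensions preserves the total number of events. The sketch below uses made-up data: each worker reports jobs processed every 30 seconds. It rolls pairs of reports into 60-second totals and then sums across workers.

```python
# Hypothetical incremental counts: jobs processed in each 30 s report, per worker.
jobs = {
    "worker-1": [5, 3, 4, 6],
    "worker-2": [2, 2, 1, 3],
}

# Summarize over time with sum: every two 30 s reports become one 60 s total.
per_minute = {
    worker: [sum(counts[i:i + 2]) for i in range(0, len(counts), 2)]
    for worker, counts in jobs.items()
}

# Aggregate streams with sum: total jobs across all workers, per minute.
total_per_minute = [sum(column) for column in zip(*per_minute.values())]
```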

For metrics that report the absolute value of a counter, we recommend using the composite derive() function to highlight change over time.
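
The idea behind deriving an absolute counter is to look at the difference between consecutive readings rather than the ever-growing raw value. The following Python sketch illustrates that idea only. It is not the derive() implementation or its syntax. The helper name `counter_changes` is hypothetical, and so is its handling of counter resets: a drop in value is treated as a reset, and that interval is skipped.

```python
from typing import List, Tuple


def counter_changes(readings: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Turn absolute counter readings into the change since the previous reading."""
    changes = []
    for (_, prev), (ts, cur) in zip(readings, readings[1:]):
        # Assumption: a decrease means the counter restarted, so skip that interval.
        if cur >= prev:
            changes.append((ts, cur - prev))
    return changes


readings = [(0, 100), (60, 130), (120, 175), (180, 10), (240, 25)]
```

With these readings, `counter_changes(readings)` yields `[(60, 30), (120, 45), (240, 15)]`. The change spanning the reset at t=180 is dropped.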

Rates

A rate is a metric that tracks how many times an event occurs in a given period of time, e.g. requests/minute or jobs processed/minute. As we summarize across time, we want to preserve the unit of time the rate is measured over, even while we also summarize over multiple streams of tag sets. Therefore, to summarize a rate you should graph it as the average.
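
A quick calculation with made-up readings shows why averaging preserves the rate's unit. Four one-minute readings of requests/minute are rolled into a single four-minute point.

```python
from statistics import mean

# Hypothetical requests/minute readings for one stream, one per minute.
rpm = [600.0, 660.0, 540.0, 600.0]

averaged = mean(rpm)  # still requests per minute
summed = sum(rpm)     # requests per four minutes: a different unit
```

`averaged` is 600.0 requests/minute, matching the individual readings. `summed` is 2400.0, which would be misread as a per-minute rate four times too high.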

You can calculate a rate over intervals other than a minute with a little more configuration. First, you should ideally always report rates that align with your reporting interval. In other words, if you submit a measurement every 30 seconds, it should represent a 30-second rate; likewise, if you report a measurement every 17 seconds, it should represent a 17-second rate.
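
One way to produce aligned rates is to count the events in each reporting window and report that count at the end of the window. Each value is then an `interval`-second rate. This is a sketch under that assumption. `aligned_rates` is a hypothetical helper, and it works on a list of event timestamps rather than a live reporter.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def aligned_rates(
    event_times: Iterable[float], interval: int, start: int, end: int
) -> List[Tuple[int, int]]:
    """Return (report_time, events_in_window) for each `interval`-second window."""
    # Index each event by the reporting window it falls into.
    counts = Counter(
        int((t - start) // interval) for t in event_times if start <= t < end
    )
    n_windows = (end - start) // interval
    # Report at the end of each window; empty windows report 0.
    return [(start + (i + 1) * interval, counts.get(i, 0)) for i in range(n_windows)]


events = [1, 5, 29, 31, 59, 61]
```

Reporting every 30 seconds, `aligned_rates(events, 30, 0, 90)` gives `[(30, 3), (60, 2), (90, 1)]`. Each value is a 30-second rate that matches the 30-second reporting interval.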

Once you are reporting an aligned rate, first change the metric's summarization to sum, then use the display transform to calculate any rate you want. Because each measurement covers exactly its reporting interval, the summed value for a summarized point is the total number of events in the period that point covers. The display transform is a linear function that is applied to every measurement before it is rendered on screen. There are two variables you can use in a display transform: x (the current measurement value) and p (the number of seconds in the current resolution). Setting a display transform of x/p therefore always displays a 1-second rate. To display a rate over a longer interval, multiply that 1-second rate by the interval's length in seconds. For example, to calculate a 10-second rate, use a display transform of: x/p*10
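
The display transform arithmetic can be checked with a few made-up numbers. The helper `display_transform` is hypothetical and simply evaluates x/p*interval. The point it illustrates is that the displayed rate stays the same whatever resolution p the chart is showing.

```python
def display_transform(x: float, p: int, interval: int = 1) -> float:
    """Evaluate x/p*interval: summed value x over p seconds as an `interval`-second rate."""
    return x / p * interval


# Hypothetical summed values for the same steady traffic at two resolutions.
one_sec_at_60 = display_transform(1200, 60)        # x/p at 60 s resolution
one_sec_at_300 = display_transform(6000, 300)      # x/p at 300 s resolution
ten_sec_at_60 = display_transform(1200, 60, 10)    # x/p*10
```

Both `one_sec_at_60` and `one_sec_at_300` come out to 20.0 events per second, and `ten_sec_at_60` is 200.0 events per 10 seconds.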