An emerging set of conventions, standards and concepts around timeseries metrics metadata
We have pretty good timeseries collection agents, storage and dashboards.
But linking a timeseries to nothing more than a string "name" and maybe a few tags,
without further metadata, is profoundly limiting, especially when the names are not standardized and omit information.
Metrics 2.0 aims for self-describing, standardized metrics using orthogonal tags for every dimension.
"metrics" being the pieces of information that point to, and describe timeseries of data.
A traditional metric, as it might come out of collectd into graphite:

    collectd.dfs1.df.srv-node-dfs10.df-complex.used

The same metric in a system that supports a few tags:

    diskspace._srv_node_dfs10.byte_used { host: dfs1 }

And in metrics 2.0, with every dimension as an orthogonal tag, plus extrinsic metadata:

    { host: dfs1 what: diskspace mountpoint: srv/node/dfs10 unit: B type: used metric_type: gauge }
    meta: { agent: diamond, processed_by: statsd2 }
If you have a handful of metrics, you don't need to think about this and can stick with simple names for your metrics.
However, as we grow our number of metrics and/or want to make more sense out of them, we need to be more systematic.
Here are the reasons, the concepts, and their benefits.
Generating timeseries metrics is easy.
Add a statsd call to an app, write to graphite from a cron job, or add a plugin to your monitoring agent. Give it a name and done! Or plug in another monitoring system of your choice.
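For example, the whole "easy" path can be this short. A minimal sketch using the common Python statsd client; the host, port, and metric name are illustrative assumptions:

    import statsd

    # One UDP packet per call; nothing to declare up front.
    client = statsd.StatsClient('localhost', 8125)

    # One line and the timeseries exists, but everything we know
    # about it is squeezed into a single dotted string.
    client.incr('webapp.dfs1.requests.error')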
Not so fast!
How often will someone need this data in visualizations, processing or alerting? How quickly will they find it? How often will somebody wonder what the metric means? So much information about the metric is available at the time it is added, yet it gets dumbed down or left out.
Diving into code and asking around, when all you want to do is graph data you know you have, is cumbersome.
Metrics 2.0 aims to retain all of this information, so that metrics are self-describing and more easily found and understood.
Standardization enables compatibility between tools, easier searching for metrics, automatic data conversion, and more. Imagine swapping out monitoring agents without hassle, or a dashboard automatically converting data into the requested unit.
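The unit conversion part is easy to sketch: because the unit is an explicit, standardized tag rather than something baked into a name, a dashboard can convert any series generically. The function below is a hypothetical illustration, not part of the spec:

    # Hypothetical sketch: scaling driven entirely by the "unit" tag.
    # Only byte-based units are handled, to keep the example short.
    FACTORS = {'B': 1, 'KB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3}

    def convert(value, from_unit, to_unit):
        """Convert a datapoint between byte-based units."""
        return value * FACTORS[from_unit] / FACTORS[to_unit]

    # Any series tagged unit=B can be displayed in GB with zero
    # per-metric configuration:
    print(convert(5368709120, 'B', 'GB'))  # -> 5.0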
Being limited to strings for metric identifiers (even when modeled in a tree, as in graphite) is very limiting when you try to use several different metrics for the same information need. There is simply no way to organize an entire tree of metrics from different apps and environments that is optimal and allows all correlations and aggregations; you cannot even predict all the correlations somebody might want to do in the future. Some systems add tags, which help, but only for a handful of properties, which diminishes their value. A user should be able to correlate on, and aggregate across, any chosen dimension(s), and the only way to enable this is with orthogonal dimensions, i.e. independent tag key/value pairs.
This means that dashboard apps can automatically provide dashboards (for example, a mysql dashboard for every host, by looking at host=X and service=mysql), that you can search for metrics on any of their attributes, and that you can group them into graphs or aggregate them by any given dimension (tag key).
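A minimal sketch of why orthogonal tags make this trivial; the metric dicts and helper below are illustrative assumptions, not structures defined by the spec:

    # Each metric is identified by a set of independent tags.
    metrics = [
        {'what': 'diskspace', 'type': 'used', 'host': 'dfs1', 'mountpoint': 'srv/node/dfs10'},
        {'what': 'diskspace', 'type': 'used', 'host': 'dfs1', 'mountpoint': 'srv/node/dfs11'},
        {'what': 'diskspace', 'type': 'used', 'host': 'dfs2', 'mountpoint': 'srv/node/dfs10'},
    ]

    def group_by(metrics, key):
        """Group metrics on any chosen dimension; no tree layout to fight."""
        groups = {}
        for m in metrics:
            groups.setdefault(m.get(key), []).append(m)
        return groups

    # The same metrics grouped per host for one graph...
    by_host = group_by(metrics, 'host')
    # ...or per mountpoint for another, without renaming anything.
    by_mountpoint = group_by(metrics, 'mountpoint')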
The information that comprises the metric identifier is what we call intrinsic properties: change a value and you get a different series, nothing new there. But sometimes we want to include information about a metric that is not part of its identity. Extrinsic properties let us attach such information (in the network protocol, in the database, etc.), and it can change without changing the metric identifier. In other words, it is metadata. Some examples: you can include the source of a metric (filename and line number), so you know who to contact in case the metric source goes berserk; you can include which agent the data is coming from, without being forced to recreate graphs when you switch to a different agent; you can even include comments if the tags are not sufficient.
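Here is a sketch of the distinction in code; the dict representation and identity function are assumptions for illustration, not mandated by the spec:

    # Intrinsic tags form the identity; extrinsic "meta" does not.
    metric = {
        'tags': {'host': 'dfs1', 'what': 'diskspace', 'unit': 'B', 'type': 'used'},
        'meta': {'agent': 'diamond', 'processed_by': 'statsd2'},
    }

    def identity(metric):
        """Derive the series key from intrinsic tags only."""
        return ' '.join('%s=%s' % kv for kv in sorted(metric['tags'].items()))

    key_before = identity(metric)
    metric['meta']['agent'] = 'collectd'   # switch agents...
    assert identity(metric) == key_before  # ...same series, graphs intact.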
If metrics describe themselves, and do so in a standardized way, does that mean advanced dashboards and processing engines can leverage this to build visualizations or alerting rules automatically? Yes it does! Graph-Explorer is an example of a dashboard tool (for graphite) that does this. For any given information need, expressed as a query, it automatically generates graph definitions and alerting rules: it takes care of fetching the metrics that match the given conditions, and of aggregation, grouping, and processing (such as unit conversion, scaling, and deriving/integrating). It can even fill in the graph title, y-axis labels, and legend entries, and apply sensible coloring, by looking at the metric tags and seeing which of them the series on a graph have in common versus which set them apart.
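The titling trick is simple enough to sketch. This is hypothetical code, not Graph-Explorer's actual implementation: tags shared by every series on a graph become the title, and the remaining tags label each series.

    def title_and_legend(series_tags):
        """Split tags into a shared graph title and per-series legend entries."""
        shared = dict(series_tags[0])
        for tags in series_tags[1:]:
            shared = {k: v for k, v in shared.items() if tags.get(k) == v}
        title = ' '.join('%s=%s' % kv for kv in sorted(shared.items()))
        legends = [
            ' '.join('%s=%s' % (k, v) for k, v in sorted(tags.items()) if k not in shared)
            for tags in series_tags
        ]
        return title, legends

    title, legends = title_and_legend([
        {'what': 'diskspace', 'unit': 'B', 'host': 'dfs1'},
        {'what': 'diskspace', 'unit': 'B', 'host': 'dfs2'},
    ])
    # title   -> 'unit=B what=diskspace'
    # legends -> ['host=dfs1', 'host=dfs2']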
There are some more interesting benefits, related to rollups and the correctness of visualizations. Check out the media page to learn more.