This specification lives at github.com/metrics20/spec, which is where change requests can be made.
Its end goals are tooling interoperability, correctness, and user-friendliness. See http://metrics20.org/ for more details.
It does not dictate transport protocols or storage mechanisms (beyond imposing the minimum requirements needed to support the spec), since that area is in heavy flux and spans a broad technical spectrum where varying tradeoffs make sense (e.g. simplicity vs. high performance). However, the metrics2.0 project and website also aim to bring projects together under shared implementations and formats (see http://metrics20.org/implementations/).
Tag key | use |
---|---|
host | physical or virtual machine |
http_method | the HTTP method, like PUT, GET, etc. |
http_code | 200, 404, etc |
device | block device, network device, … |
unit | the unit something is expressed in (b/s, MB, etc). See below. |
what | the thing being measured, if the other tags don’t suffice. often same as metric key. |
type | further describe the metric. type is a very generic word, only use it if you really don’t know anything better. |
result | values: ok, fail, … (for http requests, http_code is probably more useful) |
stat | to clarify the statistical view |
bin_max | if your metrics are separated into bins by some numeric value, upper limit of a bin (like (statsd) histograms) |
direction | in/out (not ‘tx’ or ‘rx’, more consistent) |
mtype | type of metric in terms of how the data should be interpreted. See below. |
unit | the unit in which the magnitude is measured. See below. |
file | file (that generated a metric) |
line | line (that generated a metric) |
env | environment |
Value | Meaning |
---|---|
_sum_ | represents the sum of all other (would-be) metrics summed across this tag. (equivalence) |
_avg_ | represents the avg of all other (would-be) metrics averaged across this tag. (equivalence) |
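As an illustration of the `_sum_` value, the sketch below sums a set of tagged metrics across one tag and labels the result with `_sum_` for that tag. The function name `sum_across` and the representation of a metric as a `(tags, value)` pair are assumptions made for this example, not part of the spec.

```python
from collections import defaultdict


def sum_across(metrics, tag):
    """Sum (tags, value) pairs across `tag`; the result carries tag=_sum_."""
    totals = defaultdict(float)
    for tags, value in metrics:
        # Group by every tag except the one being summed across.
        rest = tuple(sorted((k, v) for k, v in tags.items() if k != tag))
        totals[rest] += value
    return [({**dict(rest), tag: "_sum_"}, total) for rest, total in totals.items()]
```

For example, summing two per-host series across `host` yields a single series whose `host` tag is `_sum_`.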
Unit | Meaning |
---|---|
s | second (time) |
Hz | frequency (1/s) |
For the full listing, see the SI website.
The most common prefixes are in the table below:
Prefix | Meaning |
---|---|
n | nano, 10^-9 |
μ | micro, 10^-6 |
m | milli, 10^-3 |
c | centi, 10^-2 |
d | deci, 10^-1 |
k | kilo, 10^3 |
M | mega, 10^6 |
G | giga, 10^9 |
T | tera, 10^12 |
P | peta, 10^15 |
Ki | kibi, 1024 |
Mi | mebi, 1024^2 |
Gi | gibi, 1024^3 |
Ti | tebi, 1024^4 |
Pi | pebi, 1024^5 |
Ei | exbi, 1024^6 |
Symbol | Meaning |
---|---|
b | bit |
B | byte |
M | minute (strftime) |
h | hour (strftime) |
d | day (strftime) |
w | week (strftime) |
mo | month (not ’m’ like in strftime because that would be SI conflict) |
err | errors |
warn | warnings |
conn | connections |
event | events (TCP events etc) |
ino | inodes |
email messages | |
jiff | jiffies (i.e. for cpu usage) |
job | job (as in job queue) |
file | file (not ‘F’, which is farad) |
load | cpu load |
metric | a metric line like in the statsd or graphite protocol |
msg | message (like in message queues) |
P | probability (between 0 and 1) |
page | page (as in memory segment) |
pckt | network packet |
process | process |
req | http requests, database queries, etc |
sock | sockets |
thread | thread |
ticket | upload tickets, kerberos tickets, .. |
Any combination of a prefix with any of the units is supported, e.g. kHz, MB/s, etc.
Note that for consistency and clarity, ‘Mb/s’ should be used instead of ‘Mbps’, and likewise for similar network metrics. 1
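The prefix and unit combination can be sketched as a small parser that splits a unit such as `MB/s` into a multiplier and a base unit. This is a minimal sketch: the helper name `split_unit` is hypothetical, and `BASE_UNITS` lists only a few of the units from the tables above, so symbols that collide with prefixes (such as `d` or `M`) are not handled.

```python
# Multipliers for the prefixes listed in the prefix table.
PREFIXES = {
    "n": 1e-9, "μ": 1e-6, "m": 1e-3, "c": 1e-2, "d": 1e-1,
    "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "Ti": 1024**4, "Pi": 1024**5, "Ei": 1024**6,
}

# Assumption: only a small subset of base units, to keep the sketch short.
BASE_UNITS = {"s", "Hz", "b", "B"}


def split_unit(unit):
    """Split e.g. 'MB/s' into (multiplier, 'B/s')."""
    numerator, sep, denominator = unit.partition("/")
    if numerator in BASE_UNITS:
        return 1, unit
    # Try longer prefixes first so that 'Mi' wins over 'M'.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        base = numerator[len(prefix):]
        if numerator.startswith(prefix) and base in BASE_UNITS:
            return PREFIXES[prefix], base + sep + denominator
    raise ValueError(f"unrecognized unit: {unit!r}")
```

A value can then be normalized with `multiplier, base = split_unit("MB/s")` followed by `value * multiplier`.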
Symbol | Meaning |
---|---|
min | lowest value seen |
max | highest value seen |
mean | standard mean |
std | standard deviation |
*_NUM | the NUM percentile of the stat |
Symbol | Meaning |
---|---|
rate | a number per second (implies that unit ends in ‘/s’) |
count | a number per a given interval (such as a statsd flushInterval) |
gauge | values at each point in time |
counter | keeps increasing over time (but might wrap/reset at some point), i.e. a gauge with the added notion of “I usually want to derive this to see the rate” |
timestamp | value represents a unix timestamp, so basically a gauge or counter, but we know we can also render the “age” at each point. |
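Deriving a counter to see the rate could look like the sketch below. It assumes samples arrive as `(unix_timestamp, value)` pairs in time order. The function name and the choice to emit `None` when the counter wraps or resets are assumptions for illustration, not requirements of the spec.

```python
def counter_to_rate(points):
    """Turn (timestamp, counter) pairs into (timestamp, per-second rate) pairs."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        if v1 < v0:
            rates.append((t1, None))  # wrap or reset: rate unknown here
        else:
            rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates
```

The resulting series would carry mtype=rate, and its unit would gain a ‘/s’ suffix (for example B becomes B/s).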
This comes from the structured_metrics toolkit which upgrades a metric from the traditional form:
stats.timers.dfsproxy1.proxy-server.object.GET.206.timing.upper_90
Into: 2
{
what=response_time
http_code=206
http_method=GET
host=dfsproxy1
service=proxy-server
stat=upper_90
swift_type=object
target_type=gauge
unit=ms
}
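The upgrade shown above can be approximated with a regular expression that names each position in the traditional key. This is only a sketch for keys shaped like the example. The pattern, the function name `upgrade`, and the fixed `what`, `target_type` and `unit` values are assumptions, and it is not the actual structured_metrics implementation.

```python
import re

# Hypothetical pattern for keys shaped like the statsd timer example.
TIMER_KEY = re.compile(
    r"^stats\.timers\.(?P<host>[^.]+)\.(?P<service>[^.]+)\."
    r"(?P<swift_type>[^.]+)\.(?P<http_method>[^.]+)\.(?P<http_code>\d+)\."
    r"timing\.(?P<stat>[^.]+)$"
)


def upgrade(key):
    """Return a tag dict for a matching key, or None if the key does not match."""
    match = TIMER_KEY.match(key)
    if match is None:
        return None
    tags = match.groupdict()
    # Tags implied by the key's shape rather than spelled out in it.
    tags.update(what="response_time", target_type="gauge", unit="ms")
    return tags
```

Calling `upgrade("stats.timers.dfsproxy1.proxy-server.object.GET.206.timing.upper_90")` yields the same tags as the example, with every value as a string.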
A hypothetical monitoring agent “diamond2” could submit native metrics 2.0 to track used disk space on a given mountpoint (file system) on a given server, like so: 3
{
mountpoint=/srv/node/dfs3
what=disk_space
host=dfs4
target_type=gauge
type=used
unit=B
}
meta: {
agent=diamond2
}
A hypothetical storage system could hence use something like this as the id for the corresponding series:
id=mountpoint=/srv/node/dfs3,what=disk_space,host=dfs4,target_type=gauge,type=used,unit=B
Note that if we switch to a different agent, the id will stay the same because meta tags are not used to generate the identifier for the storage system.
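A minimal sketch of how such an id could be built follows. The function name `series_id` is hypothetical, and keeping the tags in insertion order is an assumption made to match the example id. The spec text here does not state an ordering, and a real store would probably want a canonical one.

```python
def series_id(tags, meta=None):
    """Build a series id from the identifying tags; meta is deliberately ignored."""
    return ",".join(f"{key}={value}" for key, value in tags.items())


tags = {
    "mountpoint": "/srv/node/dfs3",
    "what": "disk_space",
    "host": "dfs4",
    "target_type": "gauge",
    "type": "used",
    "unit": "B",
}
id_diamond2 = series_id(tags, meta={"agent": "diamond2"})
id_other = series_id(tags, meta={"agent": "some_other_agent"})
```

Both calls return the same string, since only the identifying tags contribute to the id.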
target_type was the old name for mtype, still used by tools such as structured_metrics and graph-explorer. They should be updated. They also still use ‘server’ instead of ‘host’, and lower/upper instead of min/max.