Thursday, April 18, 2024

Timestream live dashboards in Grafana

 Amazon Timestream is a strange beast.  It is a time series "database" but under the hood it behaves like a bunch of flat files that are partitioned by time and something else.  That something could be the measurement name (this is the default) but it could also be dimension that you specify.  You better specify them carefully, though, because once the table is created you can't ever change them.

Partitioning is automatic.  If you insert data as time, measurement_name, value, then at the beginning all your rows will go into a few (maybe one) partitions.  As your data grows, partitions get larger and eventually new partitions are created.  Distinct measurements can be within a single partition or spread across many partitions.  There is no way to really tell how this works - under the hood, partitions may be split (but they probably aren't) or new partitions might be appended and indexed.  It keeps track of what measurements and time ranges are within each partition using an index.  This mean that certain queries, like asking for data for a specific measurement and time range, are faster - it can figure out what blocks to look in using the index.  But you pay for each gigabyte scanned.  If you ask for every measurement for a short time range, you can't really predict how many partitions will be scanned.

Amazon tells you to trust them - they will make new partitions when needed in order to minimize query costs.  In Grafana, though, you still have to be very careful.  In the case where you have only a few measurements, then a given time range (say the last 15 minutes) might be in a single partition, with lots of measure_names in it.  If you query them one by one, then you might be paying to scan the same data multiple times, and paying $0.01/GB scanned every time.

In my case, measure_name has pretty low cardinality - there are maybe 50 distinct measurements.  What that means is that my partitions are likely by time only.  For queries that get the most recent data - for example the one below - that isn't necessarily a good thing.


SELECT DISTINCT
       measure_name,
       COALESCE(
           FIRST_VALUE(measure_value::double) OVER (
               PARTITION BY measure_name
               ORDER BY time DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ),
           CASE WHEN FIRST_VALUE(measure_value::boolean) OVER (
               PARTITION BY measure_name
               ORDER BY time DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) THEN 1 ELSE 0 END
       ) AS "value",
       FIRST_VALUE(time) OVER (
           PARTITION BY measure_name
           ORDER BY time DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS "last update"
FROM $__database.$__table
WHERE time >= ago(5m)
ORDER BY measure_name


I thought I could put this query into a table, hide the table, then have 20 individual gauges refer to its data using the table visualization as the datasource for the remaining dials and gauges.  Turns out that's not how Grafana works.  Instead of fetching the data once, it runs this query once per gauge.  The result is that my development (of the dashboards) is costing me $0.20/hr in timestream queries.  Not a huge deal, but still kind of annoying.

What Grafana really needs is the concept of a 'dashboard query' that can be a datasource for multiple visualizations on the dashboard.  I'll be putting this in as a feature request soon.

The point is this: Timestream works very well with Grafana, but construct your dashboards and queries very, very carefully.


No comments:

Post a Comment