Friday, December 17, 2021

Relative solar panel power output in Flux

The problem: I want to display solar generation data as a percentage of the max output each individual inverter has generated over its lifetime.  That will let me more easily compare panels of widely varying power outputs, as we have some huge arrays (2 megawatts) and some tiny ones (1000-2000 watts).

In an SQL-like query language, what I'm looking for is something like this:

SELECT uuid, _time, W / MAX(W) ...

but how do you do that with Flux?  The challenge comes back to thinking about queries in terms of streams that you later join together.  I solved the problem this way:

//
// For each inverter, find the maximum power this year
// The result is a series of maximums (one per tag set, which
// in this case resolves to individual inverters)
//
wmax = from(bucket: "vpp")
  |> range(start: 2021-01-01T00:00:00Z, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["_field"] == "W")
  |> max()

//
// Now pull out just the power output - this is the time series data
//
wdata = from(bucket: "vpp")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["_field"] == "W")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)

//
// Finally, join these together on uuid. The result will contain columns
// including _time_wdata, _value_wmax, and _value_wdata. Follow the join
// with a map() that actually calculates the percentage (and as a side
// effect filters out extra tags/columns so that it displays nicely in
// Grafana)
//
join(tables: {wdata: wdata, wmax: wmax}, on: ["uuid"])
  |> map(fn: (r) => ({
      uuid: r.uuid,
      _time: r._time_wdata,
      _value: r._value_wdata / r._value_wmax
    })
  )
  |> yield()

The result gives you a set of relative power curves, each normalized to its system's maximum output this year.



For performance reasons, it's best to avoid any kind of map() in the wmax or wdata queries.  The goal for queries like this is to use the pushdown pattern: use functions that let the query be performed (and the result set reduced in size) at the storage layer, rather than in memory.  This blog post does a good job of explaining this, and the pushdown patterns are explained here.  Note that map() can't be pushed down, so in general I think the best strategy is to save map() for the very end - after the results have been filtered and stats like min(), max(), and mean() have already run on the storage side.
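As a sketch of the difference (using the same vpp bucket and W field as above), the first pipeline below is made entirely of pushdown-capable functions, while the second breaks the chain at map(), forcing every raw row into memory before max() runs:

// Pushed down: from |> range |> filter |> max can all execute at
// the storage layer, so only one row per series leaves storage
from(bucket: "vpp")
  |> range(start: 2021-01-01T00:00:00Z, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["_field"] == "W")
  |> max()

// Not pushed down: map() interrupts the pushdown chain, so all the
// raw rows are read into memory before max() sees them
from(bucket: "vpp")
  |> range(start: 2021-01-01T00:00:00Z, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["_field"] == "W")
  |> map(fn: (r) => ({r with _value: r._value / 1000.0}))
  |> max()

On a big array the difference matters, since a year of raw W points per inverter can be a lot of data to haul out of storage.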

Flux is a different way of thinking.  If you have spent decades thinking in SQL (like me), it takes a little time to get used to.  Flux tends to be wordier, but what you get is something much more flexible, so I think it's worth it.

I hope these Flux notes help others.  I've grown to like Flux a lot, but I found the learning curve a little steep, and whenever I'm faced with a new type of problem I have to really think about the data model and what comes out of each step in the stream I set up.  I tend to bug the InfluxData staff, but fortunately they have been very patient and helpful.








Wednesday, December 8, 2021

Closing the Initial Gap on Grafana State Timelines (Flux queries)

I have an application that feeds periodic state into an InfluxDB 2 table.  This data can be queried rather simply:

from(bucket: "vpp")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r.uuid == "${uuid}" and r._field == "State")
  |> keep(columns: ["_time", "_value"])


This sort of works, but it causes an unsightly gap in the data.  A state exists there, but it is before v.timeRangeStart, so the query doesn't see it.



My Flux solution to this problem is to fetch an initial value up to 24 hours prior to v.timeRangeStart.  That proved to be a little tricky, but Flux does provide an experimental way to manipulate times:

import "experimental"

initialStateTimeRangeStart = experimental.subDuration(
    d:24h,
    from: v.timeRangeStart
)
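(As an aside, and worth verifying against the docs for your Flux version: newer Flux also ships a stable date.sub() in the date package, which should compute the same offset without the experimental import.)

import "date"

// Assumed stable equivalent of experimental.subDuration()
initialStateTimeRangeStart = date.sub(d: 24h, from: v.timeRangeStart)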

Using this you can then get an initial state:

initial = from(bucket: "vpp")
  |> range(start: initialStateTimeRangeStart, stop: v.timeRangeStart)
  |> filter(fn: (r) => r.uuid == "${uuid}" and r._field == "State")
  |> last()
  |> keep(columns: ["_time", "_value"])


Here is the complete query, which fetches the initial state and appends it to the recent data:

import "experimental"

initialStateTimeRangeStart = experimental.subDuration(
  d: 24h,
  from: v.timeRangeStart
)

initial = from(bucket: "vpp")
  |> range(start: initialStateTimeRangeStart, stop: v.timeRangeStart)
  |> filter(fn: (r) => r.uuid == "${uuid}" and r._field == "State")
  |> last()
  |> keep(columns: ["_time", "_value"])

recent = from(bucket: "vpp")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r.uuid == "${uuid}" and r._field == "State")
  |> keep(columns: ["_time", "_value"])

union(tables: [initial, recent])
  |> sort(columns: ["_time"])
  |> yield()


with a pleasant result that extends the previous state onto the graph:



A few things to watch out for:

  • It appears to me that v.timeRangeStart is a 'time' in Grafana and a 'duration' in Chronograf (or InfluxDB 2). For most queries you won't notice this, as range() accepts either, but in this case, in Chronograf you have to convert the duration to a time first, so the time calculation is a bit different (see below).
  • The documentation for union() warns you that the sort order is not necessarily maintained, and they are right.  That sort() at the end is required.
  • Multiple yields let you see what's happening along the way, so feel free to add initial |> yield(name: "initial") and recent |> yield(name: "recent") if it helps with debugging.
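For example, dropping a named yield in right after a stream definition lets you inspect that stream's results on their own:

initial = from(bucket: "vpp")
  |> range(start: initialStateTimeRangeStart, stop: v.timeRangeStart)
  |> filter(fn: (r) => r.uuid == "${uuid}" and r._field == "State")
  |> last()
  |> keep(columns: ["_time", "_value"])

// Show the initial-state stream as its own result set while debugging
initial |> yield(name: "initial")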

As I mentioned, in the Chronograf or InfluxDB Data Explorer you need to modify the time offset calculation:

initialStateTimeRangeStart = experimental.subDuration(
   d: 24h,
   from: experimental.addDuration(
      d: duration(v: v.timeRangeStart),
      to: now()
   )
)

I don't find this ideal, but it's still useful to know how to do it.  I find it easier to develop queries in the InfluxDB Data Explorer, as it seems to give more meaningful error messages when things go haywire, and it's a little better at presenting the raw query results.

Note: I'm not sure that anybody reads any of this, but I know the search indexes cover it, so perhaps somebody will find it useful some day.  If you're that person, feel free to comment here so I know someone is listening - thanks in advance!