Wednesday, February 7, 2024

IPv6 Only Ubuntu instance on Amazon Web Services

Pay Per IPv4 Address?


You're probably aware of AWS's plan to charge for public IPv4 addresses.  The cost for an address is $0.005 per hour, which works out to about $3.65/month.  That's not going to break the bank, but if you run a few tiny instances (t3.micro or even t3.nano) you might pay more for the IP address than for the virtual machine it's attached to.


Secondarily, I wondered whether I could create an EC2 instance that has only IPv6, and whether there would be problems.  I decided to try.

Configuring IPv6 on EC2

Setting up IPv6 on EC2 takes a little work, most of it being done in the VPC.  Amazon has documentation that explains it.  The gist is:

  1. Associate a public IPv6 address block with your VPC.  To do this, open the VPC console, select the VPC your instances are on, and edit the CIDR blocks.  Add an "Amazon-provided IPv6 CIDR block" associated with your EC2 region or AZ (the network border group).
  2. Create at least one subnet within that block. Typically this will be a subset of the block (for example a /64).  Your instances will choose an address from within this subnet.
  3. Make sure your EC2 instance belongs to this VPC/subnet (if it doesn't already), and have the subnet auto-assign it an IPv6 address.
  4. Update the routing tables to include an IPv6 default route (::/0).

My instructions here are very incomplete, but the Amazon instructions are good (though long), so follow those.
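
For reference, here's a rough sketch of the same steps using the AWS CLI.  The resource IDs below are placeholders, and the /64 is just an example - the actual prefix comes from the block Amazon assigns you:

# 1. Ask Amazon to assign an IPv6 block to the VPC (IDs are placeholders)
aws ec2 associate-vpc-cidr-block --vpc-id vpc-0123456789abcdef0 \
    --amazon-provided-ipv6-cidr-block

# 2. Carve a /64 out of that block for one subnet
aws ec2 associate-subnet-cidr-block --subnet-id subnet-0123456789abcdef0 \
    --ipv6-cidr-block 2600:1f18:1234:5600::/64

# 3. Have new instances in the subnet auto-assign an IPv6 address
aws ec2 modify-subnet-attribute --subnet-id subnet-0123456789abcdef0 \
    --assign-ipv6-address-on-creation

# 4. Add an IPv6 default route through the internet gateway
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
    --destination-ipv6-cidr-block ::/0 --gateway-id igw-0123456789abcdef0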

APT Updates


The first pain point I ran into was apt breaking.  There are two reasons for this.  The first is that you have to specifically configure apt to use IPv6, which you can do by adding a file:

root# cat /etc/apt/apt.conf.d/1000-force-ipv6-transport
Acquire::ForceIPv6 "true";

Second, if you built your Ubuntu instance from one of the EC2 templates, it's going to have apt repositories configured that don't support IPv6.  In my case this was us-east-1.ec2.ports.ubuntu.com.  It surprises me that a repository that's obviously hosted within AWS wouldn't have an IPv6 address.

To fix this, I changed /etc/apt/sources.list to point toward a more generic Ubuntu package source:

deb http://ports.ubuntu.com/ubuntu-ports/ jammy main restricted
deb http://ports.ubuntu.com/ubuntu-ports/ jammy-updates main restricted
deb http://ports.ubuntu.com/ubuntu-ports/ jammy universe
deb http://ports.ubuntu.com/ubuntu-ports/ jammy-updates universe

This probably has some speed implications, and most likely some cost implications as well, since the package repository is no longer in the same AWS region as your instance.  But that will likely cost you pennies at most, and speed isn't that important for routine patching and updates.

Is it connectable?


All the testing I have done so far has been fine.  I work at a place that has good IPv6 infrastructure, as does my cable internet service (Spectrum).  Mobile devices seem to be ahead of the curve with respect to IPv6.  So, for my use case, IPv6 only seems fine.  I do have concerns, though, about connectivity - how many IPv4-only clients are out there?  No idea.  I don't think I would risk it for a truly "production" application until I understand the answer to that question.
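
If you want to spot-check reachability yourself, a few generic client-side checks go a long way (the hostname here is a placeholder for your instance's DNS name):

# Does the name resolve to an IPv6 address?
dig AAAA ipv6-test.example.com +short

# Can we actually reach it over IPv6?  (-6 forces IPv6)
ping -6 -c 3 ipv6-test.example.com
curl -6 -v https://ipv6-test.example.com/
ssh -6 ubuntu@ipv6-test.example.com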

Docker


Docker containers running within the EC2 instance are a little more problematic.  If those containers need to reach out to the internet for any reason, they won't work unless you specifically enable IPv6 for them (or set up some kind of proxy).  That's an issue I was able to solve, and I will explain how another day.
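
I'll save the details for that future post, but the core of it is enabling IPv6 in the Docker daemon configuration.  A minimal sketch of /etc/docker/daemon.json might look like the following - the ULA prefix is an arbitrary example, and depending on your Docker version this alone may not be enough for outbound traffic:

# /etc/docker/daemon.json (sketch only; pick your own IPv6 prefix)
{
  "ipv6": true,
  "fixed-cidr-v6": "fd00:dead:beef::/64"
}

Restart the Docker daemon after changing it (sudo systemctl restart docker).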




Friday, December 17, 2021

Relative solar panel power output in Flux

The problem: I want to display solar generation data as a percentage of the max output each individual inverter has generated over its lifetime.  That will let me more easily compare panels of widely varying power outputs, as we have some huge arrays (2 megawatts) and some tiny ones (1000-2000 watts).

In an SQL-like query language, what I'm looking for is something like this:

SELECT uuid, _time, W / MAX(W) ...

but how do you do that with flux?  The challenge gets back to being able to think about queries in terms of streams that you later join together.  I solved the problem this way:

//
// For each inverter, find the maximum power this year
// The result is a series of maximums (one per tag set, which
// in this case resolves to individual inverters)
//
wmax = from(bucket: "vpp")
  |> range(start: 2021-01-01T00:00:00Z, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["_field"] == "W")
  |> max()

//
// Now pull out just the power output - this is the time series data
//
wdata = from(bucket: "vpp")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["_field"] == "W")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)

//
// Finally, join these together on uuid. The result will contain columns
// including _time_wdata, _value_wmax and _value_wdata. Follow the join
// with a map() that actually calculates the percentage (and as a side
// effect filters out extra tags/columns so that it displays nicely in
// grafana)
//
join(tables: {wdata: wdata, wmax: wmax}, on: ["uuid"])
  |> map(fn: (r) => ({
      uuid: r.uuid,
      _time: r._time_wdata,
      _value: r._value_wdata / r._value_wmax
    }))
  |> yield()

The result gives you a set of relative power curves, normalized to each system's max this year.



For performance reasons, it's best to avoid any kind of map() in the wmax or wdata queries.  The goal for things like this is to use the pushdown pattern, meaning use functions that let the query be performed (and the result set reduced in size) at the storage layer rather than in memory.  This blog post does a good job of explaining this, and the pushdown patterns are documented here.  Note that map() can't be pushed down, so in general I think the best strategy is to avoid map() until the very end - after the results have been filtered and stats like min(), max(), and mean() have already run on the storage side.
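
As a hypothetical illustration using the same bucket and field as above, the first chain below is built entirely from pushdown-capable functions, while inserting a map() in the middle forces the rest of the work into memory:

// Pushed down: range, filter, and the bare max() all run at the storage layer
from(bucket: "vpp")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["_field"] == "W")
  |> max()

// Not pushed down: map() breaks the chain, so the filtered data is pulled
// into memory and the map() and max() run there
from(bucket: "vpp")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["_field"] == "W")
  |> map(fn: (r) => ({r with _value: r._value / 1000.0}))
  |> max()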

Flux is a different way of thinking.  If you have spent decades thinking in SQL (like me), it takes a little time to get used to this.  Flux tends to be wordier, but what you get out of it is something that's much more flexible, so I think it's worth it.

I hope these flux notes help others.  I've grown to like flux a lot, but I found the learning curve to be a little steep, and whenever I'm faced with a new type of problem I have to really think about the data model and what is coming out of each step in the stream I set up.  I tend to bug the InfluxData staff, but fortunately they have been very patient and helpful.








Wednesday, December 8, 2021

Closing the Initial Gap on Grafana State Timelines (Flux queries)

I have an application that feeds periodic state into an influxdb 2 table.  This data can be queried rather simply:

from(bucket: "vpp")
  |> range(start: vTimeRangeStart, stop: v.timeRangeStart)
  |> filter(fn: (r) => r.uuid == "${uuid}" and r._field == "State")|
  |> last()
  |> keep(columns: ["_time", "_value"])


This sort of works, but it causes an unsightly gap at the start of the graph.  A state exists there, but it was recorded before v.timeRangeStart, so the query doesn't see it.



My flux solution to this problem is to fetch an initial value from up to 24 hours prior to v.timeRangeStart.  That proved to be a little tricky, but Flux does provide an experimental way to manipulate times:

import "experimental"

initialStateTimeRangeStart = experimental.subDuration(
    d: 24h,
    from: v.timeRangeStart
)

Using this you can then get an initial state:

initial = from(bucket: "vpp")
  |> range(start: initialStateTimeRangeStart, stop: v.timeRangeStart)
  |> filter(fn: (r) => r.uuid == "${uuid}" and r._field == "State")
  |> last()
  |> keep(columns: ["_time", "_value"])


Here is the complete query, which fetches the initial state and appends it to the recent data:

import "experimental"

initialStateTimeRangeStart = experimental.subDuration(
  d: 24h,
  from: v.timeRangeStart
)

initial = from(bucket: "vpp")
  |> range(start: initialStateTimeRangeStart, stop: v.timeRangeStart)
  |> filter(fn: (r) => r.uuid == "${uuid}" and r._field == "State")
  |> last()
  |> keep(columns: ["_time", "_value"])

recent = from(bucket: "vpp")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r.uuid == "${uuid}" and r._field == "State")
  |> keep(columns: ["_time", "_value"])

union(tables: [initial, recent])
  |> sort(columns: ["_time"])
  |> yield()


with a pleasant result that extends the previous state onto the graph:



A few things to watch out for:

  • It appears to me that v.timeRangeStart is a 'time' in grafana and a 'duration' in chronograf (or influx 2).  For most queries you won't notice this, as range() accepts either, but in this case, in chronograf you have to convert the duration to a time first, so the time calculation is a bit different (see below)
  • The documentation for union warns you that the sort order is not necessarily maintained, and they are right.  That sort() at the end is required.
  • Multiple yields let you see what's happening along the way, so feel free to add |> yield(name: "initial") to initial and |> yield(name: "recent") to recent if it helps with debugging (see the short sketch after this list).
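
For example, tacking these two lines onto the query above (the names are arbitrary) shows each intermediate table as its own result set:

initial |> yield(name: "initial")
recent |> yield(name: "recent")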

As I mentioned, in the chronograf or influx data explorer you need to modify the time offset calculation:

initialTimeRangeStart = experimental.subDuration(
   d: 24h,
   from: experimental.addDuration(
      d: duration(v: v.timeRangeStart),
      to: now()
   )
)

I don't find this ideal, but it's still useful to know how to do it.  I find it easier to develop queries in the influxdb data explorer, as it seems to give more meaningful error messages when things go haywire, and it's a little better at presenting the raw query results.

Note: I'm not sure that anybody reads any of this, but I know the search engines index it all, so perhaps somebody will find it useful some day.  If you're that person, feel free to comment here so I know someone is listening.  Thanks in advance!

Thursday, January 30, 2020

Influx 2.0 queries with Grafana


Influx 2.0 (flux) queries in grafana are a little on the messy side.  Here are a couple of examples that might be useful.


Here is an example of summing up a number of fields for display on a "single stat" panel widget:

from(bucket: "powermon")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "modbus")
|> filter(fn: (r) => r._field == "pv1" or r._field == "pv2" or r._field == "pv3" or r._field == "pv4" or r._field == "pv5" or r._field == "pv6" or r._field == "pv7")
|> last()
|> keep(columns: ["_value"])
|> sum()


When playing with graphs, things are more or less as you'd expect:

from(bucket: "powermon")
|> range(${range})
|> filter(fn: (r) => r._measurement == "modbus")
|> filter(fn: (r) => r._field == "pv1" or r._field == "pv2" or r._field == "pv3" or r._field == "pv4" or r._field == "pv5" or r._field == "pv6" or r._field == "pv7")
|> aggregateWindow(every: 1m, fn: (tables=<-, column) => {
     return tables
       |> mean(column: column)
       |> map(fn: (r) => ({r with _value: r._value * 1000.0}))
   })
|> yield(name: "mean")
 
 



Wednesday, July 12, 2017

Table-driven holes in Solidworks

I recently had occasion to create a series of holes in Solidworks.  The hole definition isn't complicated, but it has a few parts to it - the specific shape I am going for cups an LED and couples it to an acrylic light pipe.  This is embedded into a large plastic plate, which could be a base plate or perhaps a cover.  The hole shape, in section view, looks like this:




The hole shape is made by drawing a 2D profile and then revolving it around the centerline, which in the view below is on the left side.  The 2D shape, in profile, makes up one half of the hole.


I needed nine such holes but not on a regular pattern.  It turns out there is a way to duplicate features in SolidWorks but using it is a little confusing - the manual page and tutorial are pretty short.  I'm documenting the process here so I can give some tips (and remind myself how to do this a year from now when I probably will have forgotten the whole thing).

The first thing you need to do is to create a coordinate system.  This was, for me, one of the confusing parts.  The coordinate system defines an orientation (which direction is X, Y, and Z, including which way is positive and negative) and an origin (what point corresponds to coordinates 0,0,0).  There are a few ways to define a coordinate system.  For this, I'm setting an origin, Y, and Z.  From those the software figures out which way is X.

To begin, I've drawn the 'base' into which all these holes will be extruded:


I started the hole by using the Hole Wizard, but the Hole Wizard isn't magical.  All it does is create two 2D sketches and then revolve the second one around a centerline:


The first sketch does nothing but define the center point of the hole.  The second sketch is the 2D hole shape listed above.

Next, I created a reference point that marks the center of the hole - essentially this brings the center point of the hole out of the sketch and up to the 3D model.



Now I have my base with one hole in it:


The next step is to define a coordinate system.  In my case, the origin is the center of the hole.  This means that in my table, the first hole will be at location 0,0 and all the "copies" will be relative to that:



Next, I create the table.  I did this in Excel.  By default the coordinates use the same units as your document, but you can specify 'in' or 'mm' (or any other SolidWorks unit) if you like:


Finally, create the pattern:



A couple of things to note:

  • The coordinate system is placed at the point we created, so that's 0,0 - all the coordinates in the table will be relative to that
  • The "Feature" is the hole, created by the hole wizard (then its shape modified by me)


The result is my baseplate with all of the holes replicated at my X,Y locations:


Now don't ask me what it is (it's a secret) - I'll tell you next month!


Monday, May 15, 2017

Which is better?

I am making a Spring Boot app using Spring Data that has, as part of its function, an embedded wiki. Individual wiki pages have workflows associated with them. Rather than maintain a list of parent/child relationships for pages, I'm using an implied hierarchical structure based on a 'path' column that serves as the coordinates for each page. The hard part is finding only the direct descendants of each page. To do this, I look for paths that include the 'parent' but have no other path separators. In other words, descendants of /labs would include /labs/electronics and /labs/bio but would not include /labs/electronics/equipment because, while equipment is a descendant of /labs, it is not a direct descendant.

To accomplish this, I do two things:
  • Find the paths that begin with the parent's path, including trailing slash, for example /labs/. This selects all descendants of the parent.
  • Of those, find the ones who have NO OTHER SLASHES beyond the parent. This selects only the direct descendants.
Here are two ways to do it, both produce identical results:

Using JPQL
@Query("select w from WikiPage w " +
    " where w.path like concat(:path, '/', '%') " +
    " AND LOCATE('/', w.path, length(:path)+2) = 0")
public List<WikiPage>
    findDirectChildrenOfUsingAnnotationQuery(@Param("path") String path);

Using the Java-based Query DSL
@Override
public List<WikiPage> findDirectChildrenOfUsingQueryDSL(String searchPath) {
    CriteriaBuilder builder = em.getCriteriaBuilder();
    CriteriaQuery<WikiPage> query = builder.createQuery(WikiPage.class);
    Root<WikiPage> wikiPage_root = query.from(WikiPage.class);

    Expression<String> wikiPage_path = wikiPage_root.get("path");
    Predicate startsWithPath = builder.like(wikiPage_path,  searchPath + "/%");

    Expression<Integer> slashLocation = builder.locate(wikiPage_path, "/",
        searchPath.length()+2);
    Predicate noMoreDescendants = builder.equal(slashLocation, 0);

    query = query.where(builder.and(startsWithPath, noMoreDescendants));
    return em.createQuery(query.select(wikiPage_root)).getResultList();
}

Reasons to use the Java DSL
  • It is more refactorable ... but a good IDE like IntelliJ can refactor JPQL just fine
  • It's more typesafe ... but not completely so, and again, IntelliJ does a pretty good job. You could even argue that IntelliJ does better, because the Java DSL can't tell that "path" is a valid attribute of WikiPage, whereas in JPQL IntelliJ can check that
  • Compile-time errors if you screw up ... but again, IntelliJ already tells me if my JPQL is invalid
  • Certain operations - like the concat and strlen - are arguably better handled in Java than by the db engine

The Java DSL is definitely harder to read. But I have a distaste for 'magical strings' which includes JPQL in a @Query annotation.

Thoughts?