Lessons in Grafana - Bonus Entry: Achieving Acceptable Averages

2026-03-02

Welcome! This is a bonus entry. If you skip it, the rest of the series will make sense, but I thought that these details were worth sharing. This is in part because I found it difficult to find a well-published way to solve this problem, and in part because I enjoyed the process of learning it. I’m hopeful that you will as well.

  • 1) A Vision

  • 2) Litter Logs

  • 2.5) Achieving Acceptable Averages (you are here)

  • 3) Mental Metadata

  • 4) Noticing Notes

  • 5) Graphing Goatcounter

  • 6) Bridging Gadgets? Biometrics Galore!

  • 7) Plotting Prospective Plans

As part of the previous and next posts, I needed a way to implement a rolling average. When dealing with a time series that has inherent measurement error, this is incredibly useful: it helps you get closer to the true values you are trying to represent, since values near each other in time should be highly correlated in the cases relevant here. Unfortunately, doing this well was tricky.

Why Not Just Use Mean?

The easy solution here would be to use a rolling mean. You might say something like “take the average of the last 7 entries”. If you were doing this by hand, that is almost certainly the solution you would use. This, however, runs into several problems.

The Problem of Window Size

If you set a fixed number of entries as your window for averaging, you risk being over- or under-sensitive to recent changes. An inflexible window also causes problems whenever there is an extreme value in your data: while it sits inside the window it has a disproportionate influence on the average, and the moment it slides out, the average jumps.
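To make that jump concrete, here is a small Python sketch; the data and window size are made up for illustration:

```python
def rolling_mean(values, window):
    """Plain fixed-window rolling mean over the newest `window` entries."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

data = [10, 10, 40, 10, 10, 10]  # one spike at index 2
print(rolling_mean(data, window=3))
# [10.0, 10.0, 20.0, 20.0, 20.0, 10.0]
```

The spike inflates the average for as long as it is inside the window, then the average snaps back the instant it leaves. Nothing about the data changed at that step; only the window boundary moved.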

The Problem of Weighting

It would be a mistake to not apply some weighting to your average. A simple mean would have you believe that a point 1 week ago has the same correlation to now as a point 1 hour ago. This is nonsense.

Applying a weight based on the index can somewhat work around that. The naive but efficient approach would be something like:

def our_mean(iterable):
    # Expects entries ordered newest-to-oldest, so the most recent
    # value gets weight 1, the next 1/2, then 1/3, and so on.
    total = 0
    weight = 0
    for idx, val in enumerate(iterable):
        delta = 1 / (idx + 1)
        weight += delta
        total += val * delta
    return total / weight

This is wonderful. It solves the problem at hand, and even has a nice diminishing profile. It’s nearly what we want, but it introduces another problem.

The Problem of Inconsistent Frequency

Who ever said you get data at a constant rate? The use case we described in the previous entry, pet weights over time, certainly doesn’t arrive on a schedule. If you want older entries to have proportionally less influence, the index alone can’t express that: it says nothing about how much time passed between samples.

Unfortunately, adapting a linearly-weighted mean isn’t going to solve this problem. Anything you do will end up being fairly brittle. The only solution that solves all of these problems is an Exponential Moving Average.

The Simple Version

An exponential average captures our intuition for what a running average should be remarkably well. It has two key properties:

  1. All past influences affect the current average

  2. Past measurements affect the current average much less when they are older

In math parlance, it is implemented like so:

\[\begin{split}\begin{aligned} Y_0 &= X_0 \\ Y_n &= Y_{n-1} + α \cdot (X_n - Y_{n-1}) \end{aligned}\end{split}\]

The \(α\) here is an abstract weighting parameter between 0 and 1. Higher values of \(α\) give the newest measurement more influence on the current average, making old values decay faster; lower values smooth more aggressively and react more slowly.
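In code, the fixed-rate recursion above is only a few lines. A minimal Python sketch:

```python
def ema(samples, alpha):
    """Fixed-rate EMA: Y_0 = X_0; Y_n = Y_{n-1} + alpha * (X_n - Y_{n-1})."""
    avg = None
    out = []
    for x in samples:
        avg = x if avg is None else avg + alpha * (x - avg)
        out.append(avg)
    return out

# alpha = 0.5 moves the average halfway toward each new sample:
print(ema([0, 10, 10, 10], alpha=0.5))  # [0, 5.0, 7.5, 8.75]
```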

Adding Complexity

Thing is, the implementation above only works for fixed data rates. That case is easy to implement, and lets you precompute \(α\) in a way that runs much more efficiently. But it doesn’t work for any of our use cases, as all of them have variable sample rates. Instead, we have to compute \(α\) at each step using the formula it’s derived from [Sac]:

\[\begin{split}\begin{aligned} Y_0 &= X_0 \\ Y_n &= Y_{n-1} + α \cdot (X_n - Y_{n-1}) \\ α &= 1 - e^{-Δt/τ} \end{aligned}\end{split}\]

Now it’s revealed that our weighting factor is actually \(τ\), which was previously hidden inside a precomputed \(α\). You can think of it as analogous to a half-life: a sample’s influence on the running average decays by a constant factor \(e\) for every additional \(τ\) it sits in the past, giving us a lovely exponential curve.

Figuring out what \(τ\) represented under the hood was not obvious to me. The cited StackOverflow post leaves a lot of that to be inferred by the reader, and I wasn’t able to find other resources on this. I am making this blog post primarily to explain this specific variable for those who implement this in the future.

First, let’s give it a unit. We divide a duration, \(Δt\), by \(τ\), and the exponent must come out as a dimensionless real number, so \(τ\) has to be an interval as well. The next question is what exact relationship it has to the function.

Through some playing around, you can find that using \(τ' = \tfrac{τ}{\ln(2)}\) as the weighting factor gives you an exact half-life of \(τ\): a sample’s influence is cut in half for every \(τ\) it ages. This is pretty wonderful to know, and gives you a lot more flexibility in how to define units. No more guess-and-check; you can reason about it directly.
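A quick way to convince yourself of the half-life relationship is to run one update of the irregular-interval recursion in Python. This is a minimal sketch, assuming timestamps and tau are plain numbers in the same time unit:

```python
import math

def ema_step(prev, prev_ts, val, ts, tau):
    """One update of the irregular-interval EMA: alpha = 1 - exp(-dt/tau)."""
    if prev is None:
        return val
    alpha = 1 - math.exp(-(ts - prev_ts) / tau)
    return prev + alpha * (val - prev)

# With a weighting factor of half_life / ln(2), a gap of exactly
# `half_life` time units moves the average exactly halfway toward
# the new sample.
half_life = 4.0
tau = half_life / math.log(2)
print(ema_step(0.0, 0.0, 10.0, half_life, tau))  # ≈ 5.0, halfway from 0 to 10
```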

Final Implementation

Implementing this in PostgreSQL requires a good deal of boilerplate, because of how aggregate functions work [Serb]. We’ll end up doing it in five steps:

  1. Implement the main function

  2. Add a data type to hold the state in the format aggregate functions expect

  3. Add a step function for that data type

  4. Add a final function to extract the value we want

  5. Add an aggregate function to work on a series

This will create a control flow that looks as follows:

flowchart LR
   subgraph I[Impl]
      I1{NULL<br>Prev?}
      I1 -->|No| I2b{{Math}}
      I1 -->|Yes| I2a{{Value}}
   end
   subgraph S[Step]
      S1{NULL<br>State?}
      S1 -->|No| I1
      S1 -->|Yes| S2a{{Value}}
      S2a & I2a & I2b --> S2b[Wrap<br>State]
   end
   subgraph A[Aggregate]
      AS((Start))
      AS --> A1{{Blank<br>State}} --> S1
      S1 -.- A4([Repeat<br>As Needed]) -...- A2
      S2b --> A2{Fork} --> A3[Extract<br>State] --> AR((Return))
   end

The Main Function

This is the main function that implements our average. It takes in two time-value pairs and computes their weighted average. If the earlier pair is NULL, it returns the newer value. I’ve added as many qualifiers as possible to help the planner [Sera]. IMMUTABLE tells the planner that the function modifies nothing and always returns the same result for the same inputs. PARALLEL SAFE tells it that this function can run in parallel with no restrictions. LEAKPROOF tells the planner that the function has no side effects, and that no information about its arguments escapes except through the return value (not even via error messages or logging).

CREATE FUNCTION lgbot._ema_func (
   val NUMERIC,
   ts TIMESTAMP WITH TIME ZONE,
   prev NUMERIC,
   prev_ts TIMESTAMP WITH TIME ZONE,
   tau INTERVAL
)
RETURNS NUMERIC
LANGUAGE SQL
IMMUTABLE
LEAKPROOF
PARALLEL SAFE
AS $$
   SELECT CASE
      WHEN prev IS NULL THEN val
      ELSE prev + (1 - exp(
               -EXTRACT(EPOCH FROM (ts - prev_ts)) / EXTRACT(EPOCH FROM tau)
            )) * (val - prev)
   END;
$$;

If you aren’t familiar with SQL development, it might throw you to see the EXTRACT() function being called. This is happening because, unlike in some other languages, PostgreSQL can’t directly divide one interval by another. If you try, you will get:

postgres=# SELECT (interval '3 days') / (interval '1 day');
ERROR:  operator does not exist: interval / interval
LINE 1: SELECT (interval '3 days') / (interval '1 day');

The way we get around this is by extracting the seconds component of each interval. This lets us do the division cleanly, with the units canceling out as expected.
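As an aside, Python’s datetime module makes the same unit-cancelling trick visible. Its timedelta type happens to support interval-by-interval division directly, but the total_seconds() route mirrors what EXTRACT(EPOCH FROM ...) does in SQL:

```python
from datetime import timedelta

dt = timedelta(days=3)
tau = timedelta(days=1)

# Unlike PostgreSQL, Python allows interval / interval directly:
print(dt / tau)  # 3.0

# But the SQL workaround has a direct analogue: pull the seconds
# out of each interval, then divide plain numbers.
print(dt.total_seconds() / tau.total_seconds())  # 3.0
```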

The Data Type

Because aggregate functions frustratingly require the state to be a single argument, we need to wrap it in a custom type. This is annoying, but tolerable.

CREATE TYPE lgbot._ema_state AS (
   val NUMERIC,
   ts TIMESTAMP WITH TIME ZONE
);

The Step Function

The purpose of a step function is to take us from the previous state to the next one. Strictly speaking, this could have the main function folded in. It would likely perform faster if it did, even. But when I was writing this, I appreciated having that be cleanly separated. It made it easier to reason about what was going on.

In this case, the step function has two jobs. It needs to guard against a NULL state, and it needs to translate inputs/outputs so that the main function and aggregate function understand each other.

CREATE FUNCTION lgbot._ema_step_func (
   state lgbot._ema_state,
   val NUMERIC,
   ts TIMESTAMP WITH TIME ZONE,
   tau INTERVAL
)
RETURNS lgbot._ema_state
LANGUAGE SQL
IMMUTABLE
LEAKPROOF
PARALLEL SAFE
AS $$
   SELECT CASE
      WHEN state IS NULL THEN ROW(val, ts)::lgbot._ema_state
      ELSE ROW(
            lgbot._ema_func(val, ts, state.val, state.ts, tau),
            ts
      )::lgbot._ema_state
   END;
$$;

The Final Function

This is the last step before we can define the aggregate. It has a very, very simple job: take in a state, and extract the value. This gives us the final output in a way that the end-user doesn’t need to think about.

CREATE FUNCTION lgbot._ema_final(state lgbot._ema_state)
RETURNS NUMERIC
LANGUAGE SQL
IMMUTABLE
LEAKPROOF
PARALLEL SAFE
AS $$
   SELECT state.val;
$$;

The Aggregate Function

With all of these pieces in place, we can define the aggregate function. It’s actually very clean to read. It needs to know the input arguments, plus a reference to the step function, the state type, and (optionally) the final function. Lastly, we indicate that the aggregate can safely run in parallel.

CREATE AGGREGATE lgbot.ema(
   val NUMERIC,
   ts  TIMESTAMP WITH TIME ZONE,
   tau INTERVAL
) (
   SFUNC     = lgbot._ema_step_func,
   STYPE     = lgbot._ema_state,
   FINALFUNC = lgbot._ema_final,
   PARALLEL  = SAFE
);
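To make the five-step control flow concrete outside of SQL, here is a hypothetical Python mirror of the whole pipeline: the state is a (value, timestamp) tuple, the step function guards against the empty state, and a final step extracts the value. All names here are mine, and timestamps are plain numbers for simplicity; this is a sketch of the structure, not the actual schema:

```python
import math
from functools import reduce

def ema_func(val, ts, prev, prev_ts, tau):
    """Mirror of lgbot._ema_func: one weighted-average step."""
    if prev is None:
        return val
    alpha = 1 - math.exp(-(ts - prev_ts) / tau)
    return prev + alpha * (val - prev)

def ema_step(state, row, tau):
    """Mirror of lgbot._ema_step_func: advance the (val, ts) state,
    guarding against the empty initial state."""
    val, ts = row
    if state is None:
        return (val, ts)
    return (ema_func(val, ts, state[0], state[1], tau), ts)

def ema(rows, tau):
    """Mirror of the lgbot.ema aggregate: fold the step function over
    the rows, then extract the value (the 'final function')."""
    state = reduce(lambda s, row: ema_step(s, row, tau), rows, None)
    return None if state is None else state[0]

# A lone sample is returned unchanged, just like the SQL version:
print(ema([(10.0, 0.0)], tau=5.0))  # 10.0
```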

Main Takeaways

This would have been a lot simpler if I had settled for an easier-to-implement average function, but I am very glad that I didn’t. I learned a lot as a result, and I think I have a better fit to the data I track because of it. I’m hopeful that the next person who runs into this problem will have an easy resource for working through it.

If you enjoyed this, stick around for next week’s entry discussing psychological data collection: Mental Metadata.

Acknowledgments

Thank you to Jason Sachs (@jason-s@stackoverflow.com), who is one of the very few people that have addressed this question online. I have endless gratitude for my friends @Ultralee0@linktr.ee and Ruby, who’ve done an exhaustive test reading of this series. Lastly, I’d like to shout out @xy@mastodon.mit.edu for encouraging me to make this its own post.

Citations

[Sac]

Jason Sachs. Answer: exponential moving average sampled at varying times. Author link: https://stackoverflow.com/users/44330/jason-s and editor link: https://stackoverflow.com/users/107294/cjs. URL: https://stackoverflow.com/a/1027808 (visited on 2026-02-11).

[Sera]

Redrock Data Services. CREATE FUNCTION - SQL commands. URL: https://www.rockdata.net/docs/15/sql-createfunction.html (visited on 2026-02-17).

[Serb]

Redrock Data Services. PostgreSQL tutorial: custom aggregate functions. URL: https://www.rockdata.net/tutorial/ddl-create-aggregate/ (visited on 2026-02-11).

Cite As

@online{appleton_blog_0008,
  title    = { Lessons in Grafana - Bonus Entry: Achieving Acceptable Averages },
  author   = { Olivia Appleton-Crocker },
  editor   = { Ultralee0 and Ruby },
  language = { English },
  version  = { 1 },
  date     = { 2026-03-02 },
  url      = { https://blog.oliviaappleton.com/posts/0008-lessons-in-grafana-02-5 }
}

Tags: grafana, data-collection, data-visualization, databases, scraping, postgres, postgresql, python, exponential-moving-average, ema
