Active users is part of almost every business's key metrics suite, especially for online business doing subscription, e-commerce, or social. It answers one of the most fundamental questions for every business leader: how many people are using my product, and is that number growing?
Typically it's defined as unique users who have taken some action (visited, clicked, purchased) within a fixed time window (1 day, 7 days, 30 days, etc). Due to the fixed time window, active users metrics (whether daily, weekly, or monthly) are typically reported with a time dimension.
A common report may look like this:
.png)
A user is active in a given time window if they did some action within that time window. For example, if a user places an order on 2021-07-13, they'd be considered a daily active user (DAU) on 2021-07-13, but not on 2021-07-14. On 2021-07-31, they'd be considered a monthly active user (MAU).
In this post, we'll define active users as the number of posters on Hacker News. There is a publicly available, daily updated BigQuery table called bigquery-public-data.hacker_news.full that contains a row for every story post and comment on Hacker News. Each row contains a by column (the username), the timestamp of the action (timestamp), a title value if they posted a story or a text value if they posted a comment. To calculate active users, we'll only need to use the by and timestamp columns.
Since this dataset is publicly available, you should be able to run all queries as is in your BigQuery project.



To save end-users your time, maybe you save this query as a view in your data warehouse called active_users_by_date. Now, users only need to run select * from active_users_by_date.



The corresponding SGQL queries are straightforward:

The SQL approach here has the advantage of being familiar, but it doesn't allow a lot of flexibility in the analysis. For example, what happens if you actually want weekly active users (WAU)? You have three options:

However, this assumes you have edit access to the view, which is not always the case since the original author probably doesn't want anybody to go in and adjust active user definitions.
In the Supergrain approach, you can just write:

If WAU is an important metric, it's still best for this to be represented at the SML layer like MAU and DAU. However, in the interim, queriers are not blocked. This is the core advantage of Supergrain: it balances flexibility with consistency of definitions.
Defining and calculating active users requires more boilerplate code than you think e.g. date spines. Using SQL views to lock in definitions works, but requires you to make a choice between consistent metrics and query flexibility.
The Supergrain approach requires more upfront work, but gives you both consistent metrics definitions and query-time flexibility.