How to build Annual Recurring Revenue
A step-by-step guide to creating an ARR build that makes reporting on and answering questions about ARR a breeze.
This post is an adaptation of Chapter 5 of our book, The Ultimate Guide to ARR.
Join thousands of founders and finance professionals from Uber, Atlassian, Plaid, Retool, Intercom, and more, and get your free copy today.
Bobby's original script for building out ARR at Intercom was thousands of lines of Python.
Here's how that script calculated expansion. It would sort customers' charges by billing number, calculate the difference between the periods, and return the delta or the expansion amount.
This ultimately turned into a multi-thousand-line Python script that pieced various pricing models together. It accounted—pretty horrifically and manually—for all the ways in which somebody could pay us (monthly, quarterly, annually) using all the different methods.
In some ways, it was a pure miracle that this worked, and we were able to keep it all together. The script was brittle, though. It'd break anytime we changed something about our business. Worst of all, nobody could use it but Bobby, meaning nobody else could do ARR analysis—which became a major problem as we scaled.
The most intimidating part was not knowing how to turn this Python script and its output into something that would scale. For Bobby, the Finance team, and everyone else that would ultimately need to report in some way on ARR.
We ultimately settled on the following format—the table we outline in this chapter. It was a complete game changer in how we and the rest of the company accessed ARR reporting.
Now it's time to turn all the logic you've defined into a source of truth.
This exercise aims to create a set of tables that you and your team can use to report on ARR easily and efficiently. All of your logic will be applied in various SQL queries.
As you get stuck into this, here are three principles to remember:
You are ultimately building towards a one-stop-shop reporting table.
This table should be comprehensive enough to quickly answer +90% of ARR questions. Be thoughtful about what fields that output table should include to streamline reporting pulls.The queries in this exercise will be complex. Be intentional about making them comprehensible. Stay organized in how you write your query. Drop in comments. Modularize the query into as many bite-size components as possible. The goal is to make this query approachable to others. There's no way to avoid it—this query will be long, but that doesn't mean it should be impossible for others to understand.
Your ending ARR table should play nicely with other tables you build around it. This ARR table will help you answer the large majority of questions related to ARR, but it's also the centerpiece for other questions you'll want to ask about the business. That's why you'll want to create a design framework that makes it simple to join additional datasets to the ARR Build.
First, we'll outline the process and pieces you'll put together as part of this exercise. If this isn't quite obvious or intuitive at first, don't worry.
We'll walk through each table in more detail and show examples to help illustrate.
It starts with your Base ARR table. Think of this as creating the ingredients and all the raw data from which you are going to transform your data.
Within your query, you'll apply the considerations and business logic outlined in the previous table. For example, here, you'll define start and end dates, trial periods, etc.
That creates a Clean ARR table, which now has all of your considerations and business logic applied.
Next, you'll create a simple Date table. This table outputs every day of every year across which you have ARR data.
Now, join your Clean ARR table to your Date table, which creates what we call a Padded ARR table. In other words, it creates a table that shows your Cleaned ARR table with all its business logic applied by customer, for every day.
From this Padded ARR table, you'll calculate your Net New ARR—Or the net differences for each customer daily. You'll use those net calculations to categorize ARR into its component parts: Gross New, Expansion, Contraction, and Churn.
From here, you create your Final ARR table. This is the end state. It serves as your source of truth—housing by customer, ARR in its component parts, for every day.
Here’s how it looks at a surface level:
Let’s dig into each table in more detail.
The Base ARR Table
This is the query's building block. It’s where you’ll join across various other tables to bring in the comprehensive set of fields needed to power your ARR definitions.
Note that no calculations are made at this stage. Instead, think of the Base ARR table as a flat file with the raw material to power analysis later on in the query.
Here are some examples of the types of data you’ll want to include.
Date fields
The start and end dates associated with each primary object across the customer base.
Product and Price make-up
The IDs and names of product SKUs associated with the primary object and the frequency at which those products are billed (monthly, quarterly, annually, etc.). This detail also allows you to differentiate between service-based (e.g., implementation) and product-based revenue.
Customer and timing fields
Make sure your logic for defining a customer is included in this table. For this example, we’re assuming this happens based on the customer_id
.
If you’ve decided to limit the dataset to only customers who have successfully paid an invoice, you’ll also want to bring payment data to this table.
Here’s an example of what the output should look like:
The Clean ARR Table
This will get you from the raw Base ARR table to a clean view of the primary object data. The query can get lengthy here, especially with the invoice-based method, as there are more edge cases to address in the data.
Every company will have unique nuances to contend with, but accounting for the below will get you ~95% of the way there.
Accounting for billing frequency (invoice or subscription)
As discussed earlier, you’ll want to derive the ARR amounts directly in the query by taking the product SKU multiplied by quantity. If you’re working with invoices, this will require an extra step of adjusting for invoice duration.
Overlapping and backdated invoices (invoice specific)
Overlapping invoices can cause double counting as you typically only want to honor a single invoice on any given day for each subscription. Here’s how to account for the two edge cases discussed in the previous chapter, Defining Your ARR:
Corrective invoices (same day): A quick window function can be used to honor the last invoice created on any given day. It’s a good idea to review a handful of examples to ensure this logic is correct and better understand why corrective invoices are being issued.
Backdated invoices: To avoid restating historicals, create a
start_date_clean
field that defaults to the created date when the created date > the start date on the invoice. As with corrective invoices, review a handful of these examples to ensure the ARR logic matches your current business logic.Amendments: The logic used to account for billing frequency can also be applied to calculating ARR values for amendments. The only additional piece needed is adjusting the “end date” of the original invoice (pre-amendment) to one day prior to the amended invoice’s issue date.
Removing non-recurring items (invoice or subscription)
Clean up the dataset for non-recurring products or services and test accounts.
Non-recurring items: Identify any products that are one-time in nature and remove them from your calculations. In your ARR Build, you only want to focus on recurring items.
Discounts: Remove recurring fixed dollar or percent-off discounts from the derived amounts. To enable discount reporting, it may be helpful to surface both gross and net amount fields in your calculations. This can be enlightening from a macro perspective but also useful in enforcing policies on a customer-by-customer basis.
Test accounts: Lastly, always be sure to remove any test accounts from the data. Ideally, you have a way to remove these with some kind of 100% off coupon, but keep a close eye out for any emails from your company’s domain or customer names matching your employees.
Ultimately, you want to aggregate the data into a single record per primary object id. Here’s an example of what the output should look like:
The Date Table
The Clean ARR table returns a single record for each invoice or subscription object, depending on which you use to generate your build. However, the end goal of the ARR Build is to pull ARR across the entire customer base at any point in time, not just when an invoice or a subscription was created. In order to make this possible, the table needs to expand from a single object record to a record for every day that object was active.
This process is called padding and requires creating a Date table in SQL that the Clean ARR table can join to. Most data warehouses have functions that allow this. Adding a few other dimensions to the date table, such as month_ending
or month_beginning
fields, is helpful in making reporting easier.
SELECT
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', '2100-12-31')
)::DATE as date
,DATE_TRUNC('week',date)::DATE AS week
,DATE_TRUNC('month',date)::DATE AS month
,CASE WHEN date = DATE_TRUNC('week',date)::DATE THEN TRUE ELSE FALSE END AS week_beginning
,CASE WHEN date = (DATE_TRUNC('week',date)::DATE + 6) THEN TRUE ELSE FALSE END AS week_ending
,CASE WHEN date = DATE_TRUNC('month',date)::DATE THEN TRUE ELSE FALSE END AS month_beginning
,EXTRACT(MONTH FROM date) <> EXTRACT(MONTH FROM (date + INTERVAL '1 day')) AS month_ending
FROM TABLE (generator(rowcount => 365.25 * (2100 - 2020)))
ORDER BY 1
Here’s an example of how we build these in Snowflake and the corresponding output:
The Padded ARR Table
Next, join the Date and Clean ARR tables together where the dates from the Date table are between the primary object’s start and end dates. We call this the Padded ARR table.
Here’s what the logic looks like:
FROM date_table d
LEFT JOIN clean_arr_table c ON d.date BETWEEN c.period_start_date_clean
AND c.period_end_date_clean
The end result returns a record each date the primary object was active. Extrapolating this across all customers makes it possible to pull ARR on any date.
The Final ARR Table
With the Padded ARR table in place, you have the foundation to calculate Net New ARR. This is derived by taking the ARR on any given date and subtracting it from the ARR on the prior date. The easiest way to derive this is through the LAG function in SQL:
annualized_amount - COALESCE(LAG(annualized_amount,1) OVER
(PARTITION BY customer_id ORDER BY date ASC),0) as net_new_arr
Once Net New ARR is calculated, you can start dropping these values into each ARR component part’s calculation depending on
Where you’ve drawn the line for counting a customer
The state of the customer at the time of Net New ARR
Here are some examples of how to think about this:
,SUM(CASE WHEN subscription_cancelled_at IS NULL AND invoice_number = 1 AND net_new_arr > 0 THEN net_new_arr
ELSE 0 END) AS gross_new_arr
,SUM(CASE WHEN subscription_cancelled_at IS NULL AND invoice_number > 1 AND net_new_arr > 0 THEN net_new_arr
ELSE 0 END) AS expansion_arr
,SUM(CASE WHEN subscription_cancelled_at IS NULL AND invoice_number > 1 AND net_new_arr < 0 THEN net_new_arr
ELSE 0 END) AS contraction_arr
,SUM(CASE WHEN subscription_cancelled_at IS NULL AND invoice_number > 1 AND subscription_id_lag <> subscription_id THEN total_arr
ELSE 0 END) AS restart_arr
,SUM(CASE WHEN date = subscription_cancelled_at THEN annualized_amount_lag * -1 ELSE 0 END) AS churn_arr
Data output
The Final ARR table brings everything together on a per-customer, per-day basis and can easily be aggregated from there. Here’s how it should look:
Incorporating additional datasets
With the ARR Build complete, you may consider bringing in other datasets to join against the Final ARR table for other reporting needs, such as product usage, customer acquisition, and firmographic data.
The most efficient way to do this is to establish separate, modular groupings of data and eventually join those together in one large “Comprehensive” ARR table. Under this structure, modular data sets are grouped together when they share a similar classification or upstream data source.
There are a few advantages to organizing your data like this:
You create a one-stop-shop table. This single table can handle ~90% of your reporting, from board decks to go-to-market updates.
You simplify the SQL writing. Creating modular datasets creates a single source of truth, rather than rewriting it each time you want to query the data. Also, updating logic happens in one location and automatically flows through to your downstream sources
You gain a lot of efficiency in debugging and comprehension. The rest of the team can get up to speed on your ARR table more quickly, introducing redundancy and better data comprehension.
Here’s every field we ended up with in our table at Intercom:
Now that you’ve built a one-stop shop for your ARR reporting, you’re ready to start operationalizing ARR to drive better business outcomes.