How to create a formula to show accumulated amount per account

Garret_McGraw · December 20, 2022, 1:18am

I have a table that describes a financial ledger and it looks like the example below. I would like to have the final column “cum_sum” be a formula computed from date, account, amount, and it will also use the “manualSort” column data for equal date values. Here is an example:

+------------+----------+-----------+----------+
|    date    |  account |   amount  | cum_sum  |
+------------+----------+-----------+----------+
| 2022-01-10 |    22    |    1.30   |  19.54   |
+------------+----------+-----------+----------+
| 2022-01-08 |    22    |   14.00   |  14.00   |
+------------+----------+-----------+----------+
| 2022-01-08 |    22    |    4.24   |  18.24   |
+------------+----------+-----------+----------+
| 2022-01-11 |    13    |    3.40   |   3.40   |
+------------+----------+-----------+----------+
| 2022-01-21 |    22    |    4.24   |  23.78   |
+------------+----------+-----------+----------+

I’m having difficulty coming up with an efficient formula for this. I did get a formula working with a large table lookup and filtering data using python list comprehension but it was WAY too slow for my table which contains 15K rows. Inserting a new row took 20ish seconds just because of my badly performing formula so I deleted it.

The data should accumulate the “amount” column even if there are multiple rows with the same date. Equal dates should be ordered based on the table order (“manualSort” internal column).
Every distinct “account” needs to have a separate accumulation.
Hopefully it’s possible for out-of-order dates to have correctly computed data (as shown in the above example where the first row is a later date than subsequent rows)

Just for reference, here is the self-referencing (and badly performing) formula that was too slow:

=SUM([r.amount for r in Postings.lookupRecords(account=$account) if r.transaction_date < $transaction_date or (r.transaction_date == $transaction_date and r.manualSort <= $manualSort)])

Obviously having so many lookups and a python list comprehension was not a good idea. Any ideas on how to speed this up without hardcoding this value (it would quickly become out-of-sync) and without killing my grist server? Thanks in advance! This software has been really helpful to me.

jperon · December 21, 2022, 2:53pm

I think this one should do the trick, as it avoids looping over big ranges:

acc = table.lookupRecords(account=$account,sort_by="date")
idx = acc.id.index($id) + 1
sum(acc.amount[0:idx])

Garret_McGraw · December 22, 2022, 2:26am

Hi jperon,

I tried your approach and it’s still super slow. Simply the fact that large lists of records are being looked up for each row in the table makes it really slow. Even if grist internally caches similar “lookupRecords” function calls, it’s still a crazy amount of data to process. Some of my accounts have thousands of records under each one. I think I would need to use “lookupOne” to look up a single record but it would need to allow me to use “>” or “<” conditions rather than simply “=” conditions. Unfortunately, I don’t think that is currently possible in grist. If I could do that, I could just find the id of the previous series in my cumulative sum and then create a running tally, each record only referencing the previous one in the series.

Thanks for your suggestion!

Garret

jperon · December 29, 2022, 3:35pm

Sorry for the delay; what about this solution:

Add a new column (that could be hidden) with name acc_n, with this formula:

acc = table.lookupRecords(account=$account,sort_by="date")
acc.id.index($id)

Change the formula within cum_sum as follows:

prev = table.lookupOne(account=$account, acc_n=$acc_n-1)
return prev and prev.cum_sum + $amount or $amount

Garret_McGraw · December 30, 2022, 4:53pm

I did try a formula very similar to this. It still pulls too much data and it’s very slow to update when new records are added. Pulling in 100-3000 records (using lookupRecords) for EACH row in a 15000 row table is just slow and memory intensive. The only way to speed it up (I think) would be to figure out how to use “lookupOne” instead of “lookupRecords” in both columns. I think this would require a new feature supporting “greater-than” types of lookups as a parameter to lookupOne.

@natalie-grist @paul-grist @anais-grist @dmitry-grist feel free to correct me if I’m missing something here.

dmitry-grist · December 31, 2022, 7:11am

UPDATE 2024: There is now a simpler way, see How to create a formula to show accumulated amount per account - #11 by dmitry-grist!

Hi Garret, this is indeed tricky, but I believe exploring this example will help: Cumulative - Efficient - Grist.

Everything is on one screen: on the left is a table of transactions for a couple different accounts, sorted by date. On the right is a summary (grouped by account) and transactions for the account selected in the summary.

The key to this approach is in using the sortedcontainers library which allows for more efficient operations with ordered lists. Three formulas are involved:

In the summary table, a SortedKeyList of transactions is created for each account, sorting by Date, disambiguated using manualSort. (The whole extra class definition of SList is unfortunate, but addresses a subtle issue: that SortedKeyLists must never appear equal to Grist when they aren’t in fact identical. It also produces a more concise representation.)
```
from sortedcontainers import SortedKeyList
class SList(SortedKeyList):
  def __eq__(self, other): return self is other
  def __repr__(self): return "SList<{}, #{}>".format(len(self), id(self))

SList($group, key=lambda r: (r.Date, r.manualSort))
```
In the Transactions table, we construct a reference to the “Previous” transaction, according to that sort order, within the current account. The fact that lst is a SortedKeyList makes the operation many times faster than it would be with regular lists:
```
 lst = Transactions_summary_Account.lookupOne(Account=$Account).Sorted_Transactions
 index = lst.index(rec)
 return lst[index - 1] if index > 0 else None
```
Once the notion of “Previous” is available, the formula for cumulative sum is easy:
```
($Previous.Cumulative or 0) + $Amount
```

In reality, it’s similar to @jperon’s solution, with two optimizations: the lookupRecords and the sorting happens once per account, in the summary table (rather than for each transaction), and the lookup in the sorted list (which is done for each transaction) is extra efficient (sublinear) thanks to using the sortedcontainers library.

Give it a shot, I think it should work efficiently for large tables.

Garret_McGraw · January 3, 2023, 3:42am

@dmitry-grist this approach looks really cool! I haven’t had too much time to work on this but as soon as I do, I will try this approach on my dataset and report back. Thanks so much for taking the time to reply!

Garret_McGraw · January 5, 2023, 6:22pm

Thanks so much, this works perfectly for me. I never realized that summary table cells could actually contain such arbitrary types. That is a really powerful feature!

Thanks again,
Garret

Elliott_Lee · June 12, 2024, 12:09pm

@dmitry-grist thanks for this cumulative sum approach. I ran into the issues that if there’s an empty row, it the cumulative sum stops when I need it to continue. In our data we have months as rows and columns are budget, invoice amounts, and paid amounts. There are some months we don’t have an invoice for a client. Unfortunately these empty rows can’t be removed because they may have data in other columns and they are following a time series, so the budget burndown charts would look incorrect.

How might we create a cumulative sum for each client, in an efficient manner, that can handle empty rows?

dmitry-grist · June 26, 2024, 6:24pm

If you are using this formula from the suggestion above:

($Previous.Cumulative or 0) + $Amount

Try changing to the following, to handle empty rows:

($Previous.Cumulative or 0) + ($Amount or 0)

To explain: empty values in numeric columns are represented by the special Python value of None. An expression like ($Amount or 0) says to use $Amount when it is truthy (i.e. a non-zero non-None value), and 0 otherwise. So in effect it changes None values to zeroes.

dmitry-grist · October 17, 2024, 7:43pm

As of a few months ago, Grist has a much better way of doing this: the PREVIOUS function.

Unlike the previously-proposed workaround, no need for helper columns in summary tables to create helper data structures. You can create a “Previous” column like so:

PREVIOUS(rec, group_by="Account", order_by="Date")

And use it to calculate cumulative values, e.g. in a column named “Cumulative”:

($Previous.Cumulative or 0) + $Amount