How to create a formula to show accumulated amount per account

I have a table that describes a financial ledger; it looks like the example below. I would like the final column, “cum_sum”, to be a formula computed from date, account, and amount, also using the “manualSort” internal column to break ties between equal dates. Here is an example:

+------------+----------+-----------+----------+
|    date    |  account |   amount  | cum_sum  |
+------------+----------+-----------+----------+
| 2022-01-10 |    22    |    1.30   |  19.54   |
+------------+----------+-----------+----------+
| 2022-01-08 |    22    |   14.00   |  14.00   |
+------------+----------+-----------+----------+
| 2022-01-08 |    22    |    4.24   |  18.24   |
+------------+----------+-----------+----------+
| 2022-01-11 |    13    |    3.40   |   3.40   |
+------------+----------+-----------+----------+
| 2022-01-21 |    22    |    4.24   |  23.78   |
+------------+----------+-----------+----------+

I’m having difficulty coming up with an efficient formula for this. I did get a formula working with a large table lookup, filtering the data with a Python list comprehension, but it was WAY too slow for my table, which contains 15K rows. Inserting a new row took around 20 seconds just because of my badly performing formula, so I deleted it.

  • The data should accumulate the “amount” column even if there are multiple rows with the same date. Equal dates should be ordered based on the table order (“manualSort” internal column).
  • Every distinct “account” needs to have a separate accumulation.
  • Hopefully it’s possible for out-of-order dates to have correctly computed data (as shown in the example above, where the first row has a later date than subsequent rows).
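To make the requirements concrete, here is a plain-Python sketch (ordinary lists and dicts, not a Grist formula) of the computation I’m after, using the example rows above. The manualSort values are made up for illustration; in Grist they are maintained internally.

```python
# Plain-Python sketch of the desired computation (not a Grist formula):
# group rows by account, order each group by (date, manualSort),
# and accumulate "amount" within each group.
from collections import defaultdict

rows = [
    # (date, account, amount, manualSort) -- manualSort values are invented
    ("2022-01-10", 22, 1.30, 1),
    ("2022-01-08", 22, 14.00, 2),
    ("2022-01-08", 22, 4.24, 3),
    ("2022-01-11", 13, 3.40, 4),
    ("2022-01-21", 22, 4.24, 5),
]

cum_sum = {}                # manualSort -> running total for that row
totals = defaultdict(float)  # per-account accumulator
for date, account, amount, sort_key in sorted(rows, key=lambda r: (r[0], r[3])):
    totals[account] += amount
    cum_sum[sort_key] = round(totals[account], 2)

print(cum_sum)  # {2: 14.0, 3: 18.24, 1: 19.54, 4: 3.4, 5: 23.78}
```

This reproduces the cum_sum column of the example table, including the out-of-order first row.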

Just for reference, here is the self-referencing (and badly performing) formula that was too slow:

=SUM([r.amount for r in Postings.lookupRecords(account=$account)
      if r.transaction_date < $transaction_date
      or (r.transaction_date == $transaction_date and r.manualSort <= $manualSort)])

Obviously, having so many lookups inside a Python list comprehension was not a good idea. Any ideas on how to speed this up without hardcoding the value (it would quickly become out of sync) and without killing my Grist server? Thanks in advance! This software has been really helpful to me.

I think this one should do the trick, as it avoids looping over big ranges:

acc = table.lookupRecords(account=$account, sort_by="date")
idx = acc.id.index($id) + 1
sum(acc.amount[0:idx])

Hi jperon,

I tried your approach and it’s still super slow. The mere fact that a large list of records is looked up for every row in the table makes it really slow. Even if Grist internally caches similar “lookupRecords” calls, it’s still a huge amount of data to process; some of my accounts have thousands of records each. I think I would need “lookupOne” to look up a single record, but it would need to allow “>” or “<” conditions rather than only “=” conditions. Unfortunately, I don’t think that is currently possible in Grist. If I could do that, I could find the previous record in the series and keep a running tally, with each record referencing only the one before it.

Thanks for your suggestion!

Garret

Sorry for the delay; what about this solution:

  1. Add a new column (which could be hidden) named acc_n, with this formula:
     acc = table.lookupRecords(account=$account, sort_by="date")
     acc.id.index($id)
  2. Change the formula within cum_sum as follows:
     prev = table.lookupOne(account=$account, acc_n=$acc_n - 1)
     return prev and prev.cum_sum + $amount or $amount
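The two-step idea above can be sketched in plain Python (illustrative data and dicts, not the Grist API), to show how each row chains to its predecessor via acc_n:

```python
# Plain-Python sketch of the two-column idea: first assign each row its
# per-account index acc_n, then compute cum_sum by chaining each row to
# the row with acc_n - 1 in the same account.
rows = [
    {"id": 1, "date": "2022-01-10", "account": 22, "amount": 1.30},
    {"id": 2, "date": "2022-01-08", "account": 22, "amount": 14.00},
    {"id": 3, "date": "2022-01-08", "account": 22, "amount": 4.24},
    {"id": 4, "date": "2022-01-11", "account": 13, "amount": 3.40},
]

# Step 1: acc_n = position of the row within its account, sorted by date.
for row in rows:
    group = sorted((r for r in rows if r["account"] == row["account"]),
                   key=lambda r: r["date"])
    row["acc_n"] = [r["id"] for r in group].index(row["id"])

# Step 2: cum_sum = previous row's cum_sum (if any) plus this row's amount.
# Processing in acc_n order guarantees predecessors are computed first.
for row in sorted(rows, key=lambda r: r["acc_n"]):
    prev = next((r for r in rows
                 if r["account"] == row["account"]
                 and r["acc_n"] == row["acc_n"] - 1), None)
    row["cum_sum"] = (prev["cum_sum"] if prev else 0) + row["amount"]

print([(r["id"], round(r["cum_sum"], 2)) for r in rows])
# [(1, 19.54), (2, 14.0), (3, 18.24), (4, 3.4)]
```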

I did try a formula very similar to this. It still pulls too much data and is very slow to update when new records are added. Pulling in 100–3000 records (via lookupRecords) for EACH row of a 15,000-row table is just slow and memory-intensive. The only way to speed it up (I think) would be to use “lookupOne” instead of “lookupRecords” in both columns, which I believe would require a new feature supporting “greater-than” types of lookups as a parameter to lookupOne.

@natalie-grist @paul-grist @anais-grist @dmitry-grist feel free to correct me if I’m missing something here.


Hi Garret, this is indeed tricky, but I believe exploring this example will help: Cumulative - Efficient - Grist.

Everything is on one screen: on the left is a table of transactions for a couple different accounts, sorted by date. On the right is a summary (grouped by account) and transactions for the account selected in the summary.

The key to this approach is in using the sortedcontainers library which allows for more efficient operations with ordered lists. Three formulas are involved:

  1. In the summary table, a SortedKeyList of transactions is created for each account, sorting by Date, disambiguated using manualSort. (The whole extra class definition of SList is unfortunate, but addresses a subtle issue: that SortedKeyLists must never appear equal to Grist when they aren’t in fact identical. It also produces a more concise representation.)
    from sortedcontainers import SortedKeyList
    class SList(SortedKeyList):
      def __eq__(self, other): return self is other
      def __repr__(self): return "SList<{}, #{}>".format(len(self), id(self))
    
    SList($group, key=lambda r: (r.Date, r.manualSort))
    
  2. In the Transactions table, we construct a reference to the “Previous” transaction, according to that sort order, within the current account. The fact that lst is a SortedKeyList makes the operation many times faster than it would be with regular lists:
     lst = Transactions_summary_Account.lookupOne(Account=$Account).Sorted_Transactions
     index = lst.index(rec)
     return lst[index - 1] if index > 0 else None
    
  3. Once the notion of “Previous” is available, the formula for cumulative sum is easy:
    ($Previous.Cumulative or 0) + $Amount
    

In reality, it’s similar to @jperon’s solution, with two optimizations: the lookupRecords call and the sorting happen once per account, in the summary table (rather than once per transaction), and the lookup in the sorted list (which is done for each transaction) is extra efficient (sublinear) thanks to the sortedcontainers library.
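To see why that per-transaction lookup is sublinear, here is a rough stdlib sketch of the “find my position, then take the previous element” step using bisect on an already-sorted list. SortedKeyList uses the same bisection principle internally (plus machinery for efficient inserts); the data below is invented for illustration.

```python
# Stdlib sketch of the sublinear "previous transaction" lookup.
# sortedcontainers' SortedKeyList.index works on the same bisection idea.
import bisect

# One account's transactions, already sorted by (Date, manualSort).
txns = [
    (("2022-01-08", 2), 14.00),
    (("2022-01-08", 3), 4.24),
    (("2022-01-10", 1), 1.30),
    (("2022-01-21", 5), 4.24),
]
keys = [k for k, _ in txns]

def previous_txn(key):
    """Return the transaction just before `key` in sort order, or None."""
    i = bisect.bisect_left(keys, key)  # O(log n) instead of a linear scan
    return txns[i - 1] if i > 0 else None

print(previous_txn(("2022-01-10", 1)))  # (('2022-01-08', 3), 4.24)
```

With thousands of transactions per account, replacing a linear scan with bisection is what keeps the per-row cost low.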

Give it a shot, I think it should work efficiently for large tables.


@dmitry-grist this approach looks really cool! I haven’t had too much time to work on this but as soon as I do, I will try this approach on my dataset and report back. Thanks so much for taking the time to reply!

Thanks so much, this works perfectly for me. I never realized that summary table cells could actually contain such arbitrary types. That is a really powerful feature!

Thanks again,
Garret
