Speeding up a len(Table.lookupRecords()) with multiple conditions

Pastoral_Shackle · April 16, 2025, 5:58pm

Hi all! Happy to be redirected but I didn’t see anything similar.

I know how i’d attack this problem efficiently in Python, and I know how I’d do it in SQL, but the combination is tripping me up.

I’m hoping to add a column to a People table. For each person row, if that person is still in a queue, I want to figure out what the length of the queue was at the time they joined it. Current approach is:

if $still_on_queue:
  return len([i for i in People.lookupRecords(was_on_queue=True) if (
    (i.queue_join_datetime<$queue_join_datetime) and
    (
      (i.queue_leave_datetime is None) or
      (i.queue_leave_datetime>$queue_join_datetime)
    )
  )])

The lookupRecords only captures people who were at one point on the queue (a minority) and iterates through them filtering, in order, for

joined the queue prior to the current person AND
is either still on the queue OR left the queue after the current person joined it

The speed is a blocker since I get a slowdown every time the table that queue_join_datetime and queue_leave_datetime are calculated from has a new record added.

It’s not clear to me where efficiency gains are. I could create more helper columns, but they would be additional list comprehensions around lookupRecords calls. It seems like addition more conditions on the lookupRecords call would help, but everything that’s left is conditioned on the queue_join_datetime value of the current record.

Any help greatly appreciated!

Leonard_Gallion · April 18, 2025, 2:20pm

File this under “Half baked, probably bad idea” and one which really doesn’t answer your question, but my initial reaction would be to add an Action Button to “Check My Place in Line” so it is only processed for that one person on demand. Alternately you might use the API integration to just calculate once at midnight and label it “Your Approximate Position in Line” or some such.

Pastoral_Shackle · April 18, 2025, 3:06pm

No that’s perfect! I was thinking that lazy table evaluation may help me here, and tinkered with it for a few minutes before thinking it was too much trouble, but manual evaluation like this may be just the trick. And given the problem I’m trying to solve (calculation at a single point in time), it really only needs to be done once for each person (unless something changes and gets backdated). Thanks!

Pastoral_Shackle · April 18, 2025, 5:47pm

And now that I say that and think about it for a bit I wonder if I can just make it a trigger formula. On a first try it doesn’t seem to be behaving quite the way I wanted, but I’ll poke at it some more.

Pastoral_Shackle · April 18, 2025, 9:50pm

Ok, was able to implement this with an Action button, and recalculations for each row on demand are very fast, but document reloads are still slow. Even slower, actually. But again, might be doing this wrong!

Here’s how I implemented the action button to fill in a few columns at once. When I time this, it’s the action button column (the one that this formula is in) that’s the slowest by an order of magnitude. It seems like on reload, or whenever the source tables get updated, it’s refreshing this whole column? I wouldn’t expect this formula to take any time if the button isn’t pressed, but it seems like it’s revalidating the formula when the tables it pulls from change.

if $backlog_join_datetime and not $backlog_leave_datetime:
  blwj = len([i for i in People.lookupRecords(was_on_backlog=True) if (
    (i.backlog_join_datetime<$backlog_join_datetime) and
    (
      (i.backlog_leave_datetime is None) or
      (i.backlog_leave_datetime>$backlog_join_datetime)
    )
  )])
  pob = len([i for i in People.lookupRecords(still_on_backlog=True) if (
    i.backlog_join_datetime<$backlog_join_datetime
  )])
  prog = (blwj-pob)/blwj
else:
  blwg = None
  pob = None
  prog = None
return {
  "button":"recalc backlog progress",
  "description":"recalculate backlog progress on demand",
  "actions":[[
    "UpdateRecord",
    "People",
    $id,
    {
      "backlog_length_when_joined": blwj,
      "position_on_backlog": pob,
      "backlog_progress": prog
    }
  ]]
}

Leonard_Gallion · April 18, 2025, 11:58pm

After thinking about it for a while, going back to the example in your first method, I wonder if lambdas would provide a performance boost. This makes the SQL engine do the work instead of dragging items in one by one for Python to process. Maybe something like this?

position = People.filter(
was_on_queue=True,
queue_join_datetime=lambda dt: dt < $queue_join_datetime,
queue_leave_datetime=lambda dt: dt is None or dt > $queue_join_datetime
).count()

Pastoral_Shackle · April 19, 2025, 2:03am

No dice, unfortunately-- got a couple AttributeError, starting with filter as an unknown method of UserTable. Looks like pyspark-like syntax but doesn’t seem implemented here, unless i’m really misunderstanding things. And this kind of lambdas inside a lookupRecords doesn’t seem to do the trick-- ended up not matching on anything.

Appreciate the help!

Leonard_Gallion · April 19, 2025, 2:24am

Yes I was trying carry over some other ideas to Grist. However Grist does support the lambda syntax in some situations so I was hoping it would work