I’m happy to present Pygrister (GitHub - ricpol/pygrister), a Python wrapper for the Grist API. It grew out of a specific need for another project, after I realised that the existing Python library is lagging rather far behind.
Pygrister covers all the documented Grist APIs, and comes with documentation and a test suite, which is also useful as a set of working examples.
At the moment it is still pretty raw, and I haven’t published it on PyPI yet. You may install it directly from GitHub if you want to take it for a spin. Careful, though: it should work, more or less, but it’s not ready for production.
Feedback is always appreciated!
Wow, glad I made it into the newsletter!
In the meantime, I have done quite a bit of work on Pygrister, making the APIs more uniform and simpler, adding support for self-hosted Grist URL patterns, and so on.
I released version 0.3 on GitHub just a few hours ago, and I feel that overall the APIs are more stable now. I think I’m ready to release the thing on PyPI in the next few days…
Wow,
Pygrister sounds like a much-needed update for interacting with the Grist API. You deserve appreciation for adding support for self-hosted URLs and for the effort to make the APIs more stable. Looking forward to seeing it on PyPI soon.
All the best!
Ta-da! It’s on PyPI now (Pygrister · PyPI), so you can install it with python -m pip install pygrister
…
What’s next? Probably supporting a few of the undocumented APIs, and adding at least a basic type system for records… I should also look into Requests’ support for keep-alive connections, and see if Pygrister could benefit from Requests sessions… we’ll see…
Oh well, after much debate and self-doubt, I finally figured out a way to provide type support in Pygrister.
It’s committed to the repo, but not yet released on PyPI… so if you want to try it out, you’ll have to install directly from GitHub.
Writing the documentation is the part that took me the longest, and that says a lot about my doubts about all this… Feedback is always appreciated!
Hey @Riccardo_Polignieri.
Really nice documentation, really informative. I didn’t realise many of those inconsistency issues existed between the SQLite layer / GUI / API.
Reading it through, it feels like you’ve put in a lot of work to bypass/fix many of the inconsistencies on Grist’s side.
Given you’ve just done a deep dive into all of this, are there any changes we could make to Grist’s API that would help simplify type handling and reduce some of this complexity?
Ha! Now that’s a tough question…
Broadly speaking, I think that Grist usually does the good, common-sense thing, given the wide array of possibilities supported, and that consistency is the hobgoblin etc. etc. Truth is, most of what I changed in Pygrister is also a matter of taste. I don’t know how much of this actually needs to be fixed (plus, if you start changing the APIs now, I’ll have to fix Pygrister too!)
That being said, in writing Pygrister I came across 3 kinds of “personal distaste”:
- a few API responses are just too complex/nested: those I simplify a little;
- a few APIs are, in fact, inconsistent;
- recently, I surveyed quite a few oddities in the type system specifically, but these are not really a problem with the API, and I chose not to get involved (Pygrister only provides a hook for custom converters… let the user deal with every specific case!).
As for the first two problems, if you just browse my code (start from line 300) you’ll be able to spot them very easily:
- every time a Pygrister function returns something like return self.apicall(url, ...), this basically means that I agree with the underlying Grist API and return its result unchanged (TBH, this is the most common case);
- when I start writing things like return res['whatever'] or worse, this is where I think the original API needs some massaging.
Now, most of the time it’s just because Grist APIs like to return an object with a label attached that just says what the object is. For instance, {'users': [<an array of users>]} instead of just [<an array of users>]. Or, even worse to me, {'tables': [{'id': 'tablea'}, {'id': 'tableb'}, ...]} instead of just ['tablea', 'tableb'], when you are just returning a list of IDs. In such cases I want to simplify and cut to the chase, but maybe Grist has its own good reasons too (maybe it’s even some sort of standard somewhere? My ignorance about REST API design is impossible to overestimate).
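The kind of flattening described above might look like this in Python. This is a minimal sketch: the helper names are hypothetical (not Pygrister’s actual functions), and the inputs simply mirror the two example payloads quoted above.

```python
from typing import Any


def unwrap(response: dict[str, Any], key: str) -> Any:
    """Drop a single-key wrapper such as {'users': [...]} and return the payload."""
    return response[key]


def table_ids(response: dict[str, Any]) -> list[str]:
    """Turn {'tables': [{'id': 'tablea'}, ...]} into a plain list of IDs."""
    return [table["id"] for table in response["tables"]]


users = unwrap({"users": ["alice", "bob"]}, "users")
print(users)  # ['alice', 'bob']
print(table_ids({"tables": [{"id": "tablea"}, {"id": "tableb"}]}))  # ['tablea', 'tableb']
```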
The format of records is a particular case where I got annoyed by the Grist APIs, as I documented. To me, a “record” must be a dictionary, and a “list of records” just a list of dictionaries. This is also a very common way of doing things in the popular sqlite3 Python library. Working my way around the various Grist record formats resulted in a few nasty one-liners that really made my day!
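As an illustration of the “list of dictionaries” idea, here is a minimal sketch of two such conversions. It assumes the records endpoint returns {'records': [{'id': ..., 'fields': {...}}, ...]} and, as a further assumption, that a column-oriented payload maps each column name to a list of values. The helper names are hypothetical, not Pygrister’s code.

```python
from typing import Any

Record = dict[str, Any]


def records_to_dicts(payload: dict[str, Any]) -> list[Record]:
    """Flatten {'records': [{'id': 1, 'fields': {...}}, ...]} into [{'id': 1, ...}, ...]."""
    return [{"id": rec["id"], **rec["fields"]} for rec in payload["records"]]


def columns_to_dicts(payload: dict[str, list[Any]]) -> list[Record]:
    """Turn a column-oriented payload {'id': [1, 2], 'Name': [...]} into row dicts."""
    names = list(payload)
    # zip(*values) walks the columns in parallel, yielding one tuple per row.
    return [dict(zip(names, row)) for row in zip(*payload.values())]


rows = records_to_dicts({"records": [
    {"id": 1, "fields": {"Name": "Alice", "Age": 30}},
    {"id": 2, "fields": {"Name": "Bob", "Age": 25}},
]})
print(rows)  # [{'id': 1, 'Name': 'Alice', 'Age': 30}, {'id': 2, 'Name': 'Bob', 'Age': 25}]
print(columns_to_dicts({"id": [1, 2], "Name": ["Alice", "Bob"]}))
# [{'id': 1, 'Name': 'Alice'}, {'id': 2, 'Name': 'Bob'}]
```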
A couple of times I just gave up: for the sake of simplicity, in Pygrister both the GET workspaces/id/access and docs/id/access APIs leave the maxInheritedRole bit out of the returned result. This is because, at that point, I had a rather elegant and consistent schema in mind about what kind of objects the various Pygrister functions should return (it is detailed here), and these two Grist APIs just did not fit.
Then we have the occasional case of plain inconsistency in the Grist API that I “fixed” in Pygrister. For instance, a couple of webhook APIs return a rather odd success message, {success: true}, which I simply chose to ignore.
This is about the output values of the Grist APIs… It’s not that the parameters didn’t raise my eyebrows too, at times: but here I chose a more conservative approach, always staying close to the original API (mainly because, if I started changing too many things, I would have to write my own docstrings instead of just pointing to the Grist API reference). (The notable exception is the already mentioned unified record format: add_records, for instance, accepts a “Pygrister record list”, then converts it to Grist’s own format.)
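Going the other way, a conversion of the kind add_records performs could be sketched as follows. This is a hypothetical helper, not Pygrister’s code; it assumes Grist’s records endpoints accept a body of the form {'records': [{'fields': {...}}, ...]}, with an 'id' alongside 'fields' when updating existing rows.

```python
from typing import Any


def to_grist_records(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap a flat list of dicts into a {'records': [{'fields': {...}}, ...]} body.

    An 'id' key, if present, stays at the top level of each record; every
    other key goes into 'fields'.
    """
    out = []
    for rec in records:
        fields = {k: v for k, v in rec.items() if k != "id"}
        out.append({"id": rec["id"], "fields": fields} if "id" in rec else {"fields": fields})
    return {"records": out}


print(to_grist_records([{"Name": "Alice", "Age": 30}]))
# {'records': [{'fields': {'Name': 'Alice', 'Age': 30}}]}
```

Keeping the conversion in one place means the rest of the client only ever deals with plain dictionaries.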
I remember being mildly annoyed by the widgetOptions field in column APIs, which required separate treatment.
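The widgetOptions quirk is that it is passed as a JSON-encoded string rather than as a nested object. A minimal sketch of the special-casing this requires; the helper name and the "decimals" option key are illustrative assumptions, not taken from Pygrister or the Grist reference.

```python
import json
from typing import Any


def prepare_column_fields(fields: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the column fields with widgetOptions serialized to a JSON string."""
    prepared = dict(fields)
    options = prepared.get("widgetOptions")
    if isinstance(options, dict):
        # Only dicts are encoded; a value that is already a string passes through unchanged.
        prepared["widgetOptions"] = json.dumps(options)
    return prepared


col = prepare_column_fields({"label": "Price", "widgetOptions": {"decimals": 2}})
print(col)  # {'label': 'Price', 'widgetOptions': '{"decimals": 2}'}
```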
I am not a fan of the complex filter option, which has to be URL-encoded into the request URL.
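To show what that encoding involves: the filter is a JSON object mapping column IDs to lists of allowed values, which then has to be percent-encoded as a query parameter. A minimal sketch using only the standard library; the base URL, document ID, table and column names are hypothetical placeholders.

```python
import json
from typing import Any, Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def records_url(base: str, doc_id: str, table_id: str,
                filter_: Optional[dict[str, list[Any]]] = None) -> str:
    """Build a records URL, JSON-encoding and then URL-encoding the filter."""
    url = f"{base}/api/docs/{doc_id}/tables/{table_id}/records"
    if filter_:
        # Two layers of encoding: dict -> JSON text -> percent-encoded query string.
        url += "?" + urlencode({"filter": json.dumps(filter_)})
    return url


url = records_url("https://grist.example.com", "DOC_ID", "Pets", {"Kind": ["cat", "dog"]})
print(url)
# Decoding the query string gives back the original filter object.
query = parse_qs(urlsplit(url).query)
print(json.loads(query["filter"][0]))  # {'Kind': ['cat', 'dog']}
```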
I plainly dislike the choice of doubling parameters in the docs/id/attachments API (Pygrister will just put them in the URL).
Sometimes I wish the Grist documentation were better: for instance, parameters’ defaults are missing, and this forced me to always include some of them (well, I didn’t take the time to test, tbh)… Or, I never quite figured out the noparse parameter in the records APIs…
Or, what’s the matter with the PATCH docs/id/tables API, which has a completely different body schema than its POST version? (I even left an angry note in my tests, from a moment when I hadn’t realized this yet. At some point I’ll have to revisit it and write a better example/test, but I mean… grrrr…)
I am not speaking here of the overall API design… It has been noted that some needed APIs are missing and so on… But that’s beyond the scope of Pygrister’s concerns…
Then there is the whole matter of the SQLite/Grist GUI/Grist API type system to consider. As I said, I have decided that Pygrister should not be involved. However, while researching a viable strategy, I made a test table for types and found a few oddities. But this post is getting way too long. I’ll add more about this in a few days…
Thanks for taking the time to write up such a detailed response @Riccardo_Polignieri. And apologies for the long wait on this, I got caught up with other bits and it slipped my mind!
I’ll have a look through your code, especially for those first two problems (complexity and inconsistency). See if there’s any work we can schedule on our side to sort those out.
When it comes to overly nested objects - I can see the frustration there! Personally, I’m a fan of key-value structures in API design, as they provide an avenue for extension without making breaking changes.
For example, a response like {'users': [<an array of users>]} means we could potentially add pagination later (for example), without having to rework the data structure and forcing any users / client libraries to make significant updates. The same goes for the other example with tables.
Otherwise we’d have to have multiple versions of endpoints every time we wanted to add additional data, and client libraries would be forced to rework their internal data structures accordingly.
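To make the extensibility argument concrete: a client that reads the wrapped key keeps working if the server later adds a sibling key, whereas a bare list leaves no room for one. A minimal sketch; the "nextPage" key is purely hypothetical and not part of the Grist API.

```python
from typing import Any


def list_user_names(response: dict[str, Any]) -> list[str]:
    # The client only looks at the 'users' key, so any extra keys are ignored.
    return [user["name"] for user in response["users"]]


today = {"users": [{"name": "Alice"}, {"name": "Bob"}]}
# Hypothetical future response with pagination info added alongside 'users'.
later = {"users": [{"name": "Alice"}, {"name": "Bob"}], "nextPage": "token-123"}

print(list_user_names(today) == list_user_names(later))  # True
```

Had the original response been a bare list, adding "nextPage" would have meant changing its type to an object, breaking every client that indexed into it as a list.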
Note that these are just my own thoughts on the matter, and I’m sure there are people with more API design experience who might have a differing opinion.
I think the way you’ve handled it is close to ideal - the API is more verbose than necessary (right now), but the client library abstracts that away for any users.
When it comes to record formats, I’m guessing that it’s to separate user-defined fields from Grist metadata fields (like ‘id’). This means that to enumerate all your fields, you can do something like response["fields"].items(). The alternatives (to me) would be to:
- Add a list of user-defined fields to the response, userDefinedFields: [<my column names>], so you can iterate them by name (for field in response["userDefinedFields"]:).
- Add an explicit metadata object, where everything else is a user field: { "metadata": { "id": 123 }, myField: 3 }. But then you need to filter out the metadata field when iterating anyway!
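The three record layouts discussed here, and how a client would enumerate the user fields in each, could be sketched like this. The payloads are illustrative only: the second and third shapes are the hypothetical alternatives from the list above, not existing Grist formats.

```python
from typing import Any

# Current shape: user fields nested under "fields", metadata ("id") outside.
current = {"id": 123, "fields": {"myField": 3, "other": "x"}}

# Alternative 1 (hypothetical): flat record plus an explicit list of user field names.
with_names = {"id": 123, "myField": 3, "other": "x",
              "userDefinedFields": ["myField", "other"]}

# Alternative 2 (hypothetical): metadata object, everything else is a user field.
with_metadata = {"metadata": {"id": 123}, "myField": 3, "other": "x"}


def user_fields_current(rec: dict[str, Any]) -> dict[str, Any]:
    return dict(rec["fields"].items())


def user_fields_named(rec: dict[str, Any]) -> dict[str, Any]:
    return {name: rec[name] for name in rec["userDefinedFields"]}


def user_fields_metadata(rec: dict[str, Any]) -> dict[str, Any]:
    # The metadata key still has to be filtered out explicitly.
    return {k: v for k, v in rec.items() if k != "metadata"}


print(user_fields_current(current))  # {'myField': 3, 'other': 'x'}
```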
I’d love to know what the cleanest approach from your perspective here would be!
With webhook inconsistencies - that’s worth investigating. I’d have thought using HTTP status codes for errors would be cleaner. I’ll chat with the team about that one.
widgetOptions is a pretty weird edge case too. I’ll have a chat about that as well. No idea why that one gets passed as a string.
Same with the PATCH / POST versions of docs/id/tables: that seems like a mistake rather than a design choice.
Parameter defaults / undocumented options aren’t ideal. Pretty sure we can just schedule work to sort that out. The sources for the doc pages are here, if you’re interested in contributing your own findings too!
Finally, when it comes to the type system… yeah, reading through your write-up, I’ve no idea what to suggest there right now. And I suspect any work there would be big enough that it won’t be done any time soon either! That said, if there’s anything you think could be cleaned up, or small improvements that could be made, please do mention them, and I’ll bring them to the team.
Thanks again for the detailed post - it’s really interesting to get these insights from someone who’s spent a long time diving into the APIs and figuring out the best ways to make them accessible.
(As an extra note, I love that you’ve linked your code where relevant. It’s incredibly helpful!)
Hi there, finally I have time to come back to my grist-related projects (I started translating the docs, btw!). So, here’s my 2 cents…
When it comes to overly nested objects - I can see the frustration there! Personally, I’m a fan of key-value structures in API design, as they provide an avenue for extension without making breaking changes.
I get it… not sure if I agree, but I kinda get it… My gut instinct is: if you need something like {'users': [<an array of users>]} because in the future you may want to extend the API to the point where it returns both “an array of users” and “an array of something else” in the same response, then I’m afraid it’s another API.
But I’m no web api expert… I’ll just keep Pygrister as flat as possible for the time being, and worry about this later…
When it comes to record formats, I’m guessing that it’s to separate user-defined fields from Grist metadata fields (like ‘id’).
Uhm, I don’t think so? Grist’s own fields already come mixed together with user-defined ones (except for “id”, that is). If you want to filter those out, you can use the “hidden=false” option in your request.
On the other hand, since the “id” field is the real primary key, you always want it in the mix when possible. In my (and Pygrister’s) opinion, a record
- either includes an “id”, which means it’s already stored in the table,
- or not, which means it’s about to be.
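Under this definition, a client can decide what to do with each record just by checking for the "id" key. A minimal sketch of that partitioning; the helper name is hypothetical, and Pygrister’s own functions may handle this differently.

```python
from typing import Any

Record = dict[str, Any]


def split_by_id(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Separate records already stored (they have an 'id') from new ones (no 'id')."""
    stored = [rec for rec in records if "id" in rec]
    new = [rec for rec in records if "id" not in rec]
    return stored, new


stored, new = split_by_id([{"id": 1, "Name": "Alice"}, {"Name": "Carol"}])
print(stored)  # [{'id': 1, 'Name': 'Alice'}]
print(new)     # [{'Name': 'Carol'}]
```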
Parameter defaults / undocumented options isn’t ideal. … if you’re interesting in contributing your own findings too!
Problem is, I’m not fluent in TypeScript, and I had quite a few difficulties navigating the source code looking for the API endpoints… I’ll give it another shot when I have more time…
Finally, when it comes to the type system… yeah, reading through your write-up, I’ve no idea what to suggest there right now.
TBH, it’s not a big deal anyway. I mean, perhaps developer-wise you should work toward a more formal specification for your own sake… but from the point of view of a user, meh… When your backend is SQLite, you sorta expect things to be fuzzy, and it’s really a feature, not a bug. Besides, the Grist GUI does its job, and the APIs deal in JSON, so who cares.
The only problem I have as a user is when I download the SQLite db to run queries against it. Then it’s pretty annoying having to deal with the quirkiness, esp. the occasional value encoded as bytes, and esp. because the marshalling algorithm is not documented. Ideally, it would be nice to have a stipulation like “everything that is not of the expected type for a column is stored as text (and complex data is serialized as JSON)”. But I understand JSON is slow. At the very least, you should document your marshalling algorithm and update the doc page on types…
In any case, when I was investigating Grist’s type system, I made a test table… The values here are either inserted in the GUI or passed through the APIs. It’s far from complete, but it helped me find a few quirks… If you download the SQLite db, you’ll find which values are actually stored as bytes, for example.
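When inspecting a downloaded SQLite file, SQLite’s built-in typeof() function shows the storage class each value actually has, so values stored as bytes show up as 'blob'. A minimal sketch using Python’s sqlite3 module; the in-memory table stands in for a downloaded Grist document, and the file, table and column names are made up for illustration.

```python
import sqlite3


def storage_classes(conn: sqlite3.Connection, table: str, column: str) -> dict[str, int]:
    """Count the values in table.column by SQLite storage class."""
    # Identifiers cannot be bound as parameters, so they are double-quoted here instead.
    query = f'SELECT typeof("{column}"), count(*) FROM "{table}" GROUP BY 1'
    return dict(conn.execute(query).fetchall())


conn = sqlite3.connect(":memory:")  # in practice, e.g. sqlite3.connect("downloaded.grist")
conn.execute('CREATE TABLE "Test" ("Value")')  # no declared type, so no type affinity
conn.executemany('INSERT INTO "Test" VALUES (?)',
                 [(1.5,), ("abc",), (b"\x00\x01",), (None,)])
print(storage_classes(conn, "Test", "Value"))  # counts for 'real', 'text', 'blob', 'null'
```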
@Riccardo_Polignieri, just a token of my appreciation for your superb work and delivery of Pygrister. I am testing your GristApi while solving the Advent of Code 2024 in a competition with friends in Norway, the Netherlands, and Germany. Your API is super-stable, clearly coded and fun to use.
Ha! Thanks, this is me blushing…
There are still a few tweaks to add, but yes, I think the API is pretty stable by now… Maybe in a few weeks I’ll take a leap in the dark and take it out of beta… also because I don’t like this “perpetual beta” thing at all…
Hello, thank you very much for this API! I peeked at the code (I’m not an expert) and was wondering: would there be any benefit to using requests.Session inside the library for persistence? The Requests docs mention a possible performance improvement: Advanced Usage (Requests 2.32.3 documentation)
Ehm, good question! When I started writing Pygrister, implementing sessions was in my todo list… then I totally forgot about that.
I’ll look into it at some point, but it’s kinda tricky. Integrating sessions in my code would not be difficult in itself… the problem is, you don’t want to always enforce sessions: users should be free to use them or not. So I have to figure out a way to offer the choice without breaking the existing API: maybe a switch like “grist.use_sessions = True”… but I’m not sure yet.
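One common way to make sessions opt-in without changing an existing interface is to let the client hold an optional requests.Session and fall back to plain module-level calls when none is set. A minimal sketch under that assumption: the ApiClient class and its methods are hypothetical, not Pygrister’s actual API, and the Bearer header simply reflects how Grist API keys are usually sent.

```python
from typing import Any, Optional

import requests


class ApiClient:
    """Hypothetical client: uses a requests.Session if one is open, else plain requests."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self.base_url = base_url
        self.headers = {"Authorization": f"Bearer {api_key}"}
        self.session: Optional[requests.Session] = None

    def open_session(self) -> None:
        # A Session keeps connections alive and reuses them across calls.
        self.session = requests.Session()

    def close_session(self) -> None:
        if self.session is not None:
            self.session.close()
            self.session = None

    def _requester(self) -> Any:
        # Both the requests module and a Session expose .request(method, url, ...).
        return self.session if self.session is not None else requests

    def get(self, path: str) -> Any:
        response = self._requester().request(
            "GET", f"{self.base_url}{path}", headers=self.headers)
        response.raise_for_status()
        return response.json()
```

Callers who never call open_session() keep the old behaviour, so existing code is unaffected; everything else goes through the same get() method either way.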
As always, writing code that more-or-less works is easy. The hard part to figure out is the interface for your code.
Done! Just added support for sessions, see the documentation. It’s not on PyPI yet, so you will have to install directly from GitHub if you want to try it out. The important thing is, it doesn’t affect the rest of Pygrister’s api… just open a session, then use the GristApi class as before.