Hello Grist Community,
I’m running a self-hosted Enterprise instance of Grist and encountering 429 “Rate limit reached for gpt-4o … TPM” errors when using the built-in Assistant. This happens despite lowering token usage and switching models for bulk operations.
Has anyone faced this issue and resolved it while still using an OpenAI API key? If so, what configuration changes, token limits, retry logic, or model-rotation strategies worked for you?
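For what it's worth, since Grist itself doesn't expose retry settings, the only retry logic I can picture is a backoff wrapper around direct API calls made outside of Grist. A minimal sketch (the curl usage in the comment is an assumption about your provider's endpoint, not something Grist reads):

```shell
# Retry a command that prints an HTTP status code, backing off on 429.
retry_429() {
  attempt=1
  delay=2
  while [ "$attempt" -le 5 ]; do
    code=$("$@")                 # wrapped command must print the status code
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    sleep "$delay"               # back off: 2s, 4s, 8s, ...
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
  echo "$code"
  return 1
}

# Example usage (placeholder endpoint and key; adapt to your provider):
# retry_429 curl -s -o /dev/null -w '%{http_code}' \
#   -H "Authorization: Bearer $OPENAI_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{"model":"gpt-4o","messages":[{"role":"user","content":"ping"}]}' \
#   https://api.openai.com/v1/chat/completions
```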
If not, has anyone successfully connected the Assistant to an alternative AI provider—such as via OpenRouter or DeepSeek—to bypass OpenAI’s TPM constraints? I attempted to set up a proxy but ran into DNS resolution and outbound connectivity issues.
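In case it helps others reproduce the connectivity problem, checks along these lines can be run from inside the container (container name taken from my docker command below; openrouter.ai is just the host I was testing against, and this assumes the image ships getent and curl):

```text
# Check DNS resolution and outbound HTTPS from inside the container
docker exec grist-omnibus-test getent hosts openrouter.ai
docker exec grist-omnibus-test curl -sS -o /dev/null -w '%{http_code}\n' https://openrouter.ai/api/v1/models
```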
Below is a redacted view of my environment and the Docker commands I used to restart Grist:
```text
# /opt/grist-ee/grist-ee.env
GRIST_EDITION=enterprise
URL=http://<your-domain>:8484
GRIST_HTTP_PORT=8484
GRIST_HTTPS_PORT=8443
HOST_HTTP_PORT=8484
HOST_HTTPS_PORT=8444
EMAIL=<your-admin-email>
PASSWORD=<your-admin-password>
TEAM=<team-name>
PERSIST_DIR=/persist
GRIST_DISABLE_SANDBOX=true
GRIST_ACTIVATION=<your-activation-key>
GRIST_ASSISTANT_ENABLED=true
OPENAI_API_KEY=<your-openai-key>
OPENAI_MODEL=gpt-3.5-turbo
ASSISTANT_MAX_TOKENS=1000
```

```text
# Stop and remove existing container
docker stop grist-omnibus-test
docker rm grist-omnibus-test

# Start with updated configuration
docker run -d \
  --name grist-omnibus-test \
  --env-file /opt/grist-ee/grist-ee.env \
  --add-host <your-domain>:<your-ip> \
  -p 8484:80 \
  -p 8444:443 \
  -v /media/.../data:/persist \
  gristlabs/grist-omnibus:latest
```
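For reference, the OpenRouter experiment replaced the OpenAI lines with something like the following (ASSISTANT_CHAT_COMPLETION_ENDPOINT is the variable I understand Grist reads for a custom endpoint; the model name is a placeholder):

```text
# Experimental: point the Assistant at OpenRouter instead of OpenAI
ASSISTANT_CHAT_COMPLETION_ENDPOINT=https://openrouter.ai/api/v1/chat/completions
OPENAI_API_KEY=<your-openrouter-key>
OPENAI_MODEL=<provider/model-name>
```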
Thank you for any insights on staying within rate limits or integrating with OpenRouter/DeepSeek!
My understanding is that it's currently not possible to suppress Grist's built-in function-calling (tools) and response_format fields when proxying through OpenRouter (or any other non-OpenAI endpoint). Grist always includes these parameters in its API calls, and providers such as OpenRouter's Fireworks backend reject requests containing both.
In particular, Grist appears to include response_format even for models that don't support structured outputs, so any non-OpenAI endpoint (including OpenRouter) rejects the request with that error.
At present, the only fully supported configuration seems to be pointing Grist at a truly OpenAI-compatible endpoint that accepts these parameters, namely OpenAI's own API or a service with full OpenAI parity (such as Azure OpenAI). Custom or proxy endpoints will keep hitting this conflict until Grist provides a setting to disable these fields.
Could someone confirm this assumption?
Hi @ROBIN_ANGELE.
You’re correct. Those parameters are currently always set for requests made to the LLM endpoint.
response_format is really only used internally by us for evaluating model performance in tests. We could relax it to being optional without any perceptible changes to how the assistant works.
tools is more critical, however. Without it, there is no way for the LLM to read or modify your Grist document.
What kind of use case do you have in mind, if you don’t mind answering?
George
Hi George, thanks for confirming! That matches what I’ve observed.
My main use case is to keep using Grist’s Assistant (with full read/write capability on my documents) but without bumping into OpenAI’s TPM limits. For bulk operations it’s easy to hit 429s even with reduced token usage. That’s why I’ve been experimenting with OpenRouter and DeepSeek: they offer bigger rate limits or cheaper pricing.
Right now the roadblock seems to be exactly what you described: Grist always sends tools and response_format in the API payload, so any endpoint that isn't 100% OpenAI-compatible (e.g. OpenRouter's Fireworks backend) rejects the call. So far I've only been able to get the Assistant working with my own OpenAI API key.
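Schematically, the request shape that gets rejected looks like this (field names per the OpenAI Chat Completions API; I'm omitting Grist's actual tool definitions and message content):

```text
{
  "model": "...",
  "messages": [ ... ],
  "tools": [ ... ],
  "response_format": { ... }
}
```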
It's entirely possible I've overlooked something or misconfigured my setup. If anyone in the community has managed to run the AI Assistant against a non-OpenAI provider (DeepSeek, or any provider with higher tokens-per-minute limits than OpenAI), I'd be really happy to hear how you did it (settings, proxy, anything).
If/when response_format becomes optional (or if there's a toggle to disable it), it might open the door to proxies or other providers. Until then, the only stable configuration seems to be pointing Grist at OpenAI's own API (or a fully OpenAI-compatible service such as Azure OpenAI).
Happy to beta-test or file a PR if you ever add a setting to relax response_format or allow custom payloads — it would help self-hosters like me use alternative LLMs without losing the Assistant’s ability to read/write docs.
Thanks again for your work on this!
Got it.
Will look into making response_format opt-in as part of the next round of improvements to the assistant.
George
Hi @georgegevoian
Do you have any news on making response_format opt-in?
Quick follow-up question: has anyone here (or in the broader community) successfully run the full self-hosted Enterprise AI Assistant with a local model such as Ollama or another self-hosted LLM? Specifically, using ASSISTANT_CHAT_COMPLETION_ENDPOINT pointed at a local /v1/chat/completions endpoint with tool-calling support.
I’ve seen a community setup for the legacy Formula Assistant with Ollama, but I’m curious if the full Assistant works reliably with local models for self-hosters.
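Concretely, the kind of configuration I have in mind would be something like this (host.docker.internal and the dummy key are assumptions on my part; Ollama exposes an OpenAI-compatible /v1/chat/completions endpoint on port 11434):

```text
ASSISTANT_CHAT_COMPLETION_ENDPOINT=http://host.docker.internal:11434/v1/chat/completions
OPENAI_API_KEY=ollama   # placeholder; local servers typically ignore the key
OPENAI_MODEL=<local-model-name>
```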
Would love to hear any success stories, configs, or pitfalls!
Thanks!
Hi @ROBIN_ANGELE.
It’s opt-in as of a few months ago. The env variable that controls it is ASSISTANT_STRUCTURED_OUTPUT and it defaults to false.
The latest Grist images and the prior few releases from earlier in the year should have the change.
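In other words, no configuration change is needed to disable it; opting back in would look like:

```text
# Re-enable structured output for endpoints that support it (default: false)
ASSISTANT_STRUCTURED_OUTPUT=true
```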
George