Self-hosted backups of standalone single-container instance

Hi all,

I have been successfully running Grist in a Docker container in what is probably the simplest possible configuration: mounting a local folder as /persist and exposing port 8484 to localhost, all running on my laptop. This has been working incredibly well for me, so I’ve been enthusing about it to others. We are now considering self-hosting it for me and my colleagues, putting it on a small server somewhere, but we are wondering how to automate backups.

Ideally, we’d not have to enable any kind of S3 storage: the plain sqlite-on-disk seems to be working well for us so far.

So is it safe to just rsync the /persist folder (e.g. from a cron job), or is there a risk of SQLite-related data corruption there?

If simple rsyncing isn’t advisable, I could imagine using the sqlite3 .backup command on the various SQLite files found in /persist. If that’s a viable strategy, are there any cross-database considerations I should bear in mind, or is it sufficient to take independent .backup copies of home.sqlite3 and */*.grist?

I’ve written a tool for Grist backups:

https://community.getgrist.com/t/small-stand-alone-application-for-periodially-grist-backups-sqlite-xlsx-csv

It might be what you’re looking for.

Thank you, yes, this looks lovely. I don’t think it’s quite what I’m after, though, as it uses the API rather than taking advantage of colocation on the server itself. If it turns out there’s no sensible way of running backups on the server machine, I will of course revisit using the API.

I would advise against just copying the SQLite database while it’s in use. That’s not good practice in general.

What I’ve used in the past for robust backups is:

VACUUM INTO

The benefit of using the Grist API is that it can do whatever Grist does (or will do in the future) on export, such as cleaning up old logs.


Yes, I think plain rsync is probably a non-starter. But I’m still hopeful about the vacuum option (which I presume is largely the same as .backup from the command-line tool, or at least comparable). So I think my questions still stand.

I think you can get away with rsync if you set GRIST_SQLITE_MODE=wal. Without that, you might get unlucky and copy a database in an inconsistent state. With it, you could lose some of the latest writes that haven’t been flushed if you’re unlucky, but you should have the bulk of the database always in a consistent and readable form.
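For concreteness, here’s an untested sketch of how that could look in a docker run; the image name, host paths, and port binding are just examples, and GRIST_SQLITE_MODE=wal is the only setting it adds over a plain setup:

```shell
# Hypothetical docker run for the setup described above.  Image name
# and host paths are placeholders.  GRIST_SQLITE_MODE=wal puts the
# document databases in SQLite WAL mode, so an rsync of /persist
# should see a consistent (if slightly stale) snapshot.
docker run -d \
  -e GRIST_SQLITE_MODE=wal \
  -v /srv/grist/persist:/persist \
  -p 127.0.0.1:8484:8484 \
  gristlabs/grist
```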

The SQLite backup API is great, and it’s what Grist uses internally for wrangling external storage of these documents. I’d expect .backup (or VACUUM INTO) to work fine for a small setup like this.
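As a quick sanity check of the VACUUM INTO route (throwaway file names here, nothing Grist-specific — note VACUUM INTO needs SQLite 3.27 or newer, and fails if the target file already exists):

```shell
# Create a throwaway database, snapshot it with VACUUM INTO, and read
# the snapshot back to confirm it is a complete, independent copy.
sqlite3 demo.sqlite3 "CREATE TABLE t(x); INSERT INTO t VALUES (1);"
sqlite3 demo.sqlite3 "VACUUM INTO 'demo-backup.sqlite3'"
sqlite3 demo-backup.sqlite3 "SELECT count(*) FROM t;"   # prints 1
```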


Then something like this untested code might work
(I don’t know whether it locks the database, tbh):

ssh yourserver 'sqlite3 "your.grist" "VACUUM INTO \"/tmp/your.grist.backup\""'
rsync yourserver:/tmp/your.grist.backup /your/pool/

Thanks! So I’ve just tried out the following small script, and it seems to work OK (though of course we won’t know if it works under load until there’s some actual, er, load):

Run it from the directory mounted as /persist.

#!/bin/sh

set -e

v="$(date +%Y%m%d-%H%M%S)"
target="$HOME/backups/grist/$v"

find . -iname '*.sqlite3' -o -iname '*.grist' | while IFS= read -r p
do
    printf "%s...\n" "$p"
    mkdir -p "$target/$(dirname "$p")"
    sqlite3 "$p" ".backup \"$target/$p\""
done

After running it, simply spinning up a Docker container with /persist pointing at the new backup directory containing home.sqlite3 seems to work fine (though to see the actual docs I also had to set the GRIST_DEFAULT_EMAIL container environment variable, because the backed-up system uses OIDC).
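In case it’s useful to anyone else, I’m running it from cron along these lines (the script location and host directory are specific to my setup, obviously):

```shell
# Hypothetical crontab entry: nightly backup at 03:15.  Assumes the
# script above is saved as /usr/local/bin/grist-backup.sh and the
# host directory mounted as /persist is /srv/grist/persist.
15 3 * * * cd /srv/grist/persist && /usr/local/bin/grist-backup.sh
```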
