External attachments on S3 are not cleaned up after re-internalizing attachments?

Hello, I am working with a self-hosted 1.7.6 instance using compose.

Everything (psql, saml, minio, redis) works fine, and I do not have any technical problems.

But I have found something odd when toggling doc attachments:

  • I create a doc –> OK
    • attachments default to internal –> OK
    • the Grist file appears in minio –> OK
  • I add a few attachments –> OK
    • the minio doc gets bigger and has more revisions –> OK
  • I switch to external attachments –> OK
    • the minio doc gets smaller with some new revisions –> OK
    • a new folder appears on minio for the attachments –> OK
  • I switch back to internal attachments –> OK
    • the minio doc gets bigger with some new revisions –> OK
    • the attachment folder and the previously attached files are still there –> WRONG?

My questions are the following:

  • Is this normal / expected behaviour?
  • If yes:
    • Why is it done this way?
    • Is the pruning done automatically?
      • If yes, when is it done and by what?
      • If no, how can I trigger it (or do it manually)?

If it is not cleaned up automatically, I worry it would eat up a lot of space in the long run.

Just to make things more visual, here is the minio layout I am seeing for a doc:

$ mcli tree --files dev/grist-rc
dev/grist-rc
└─ docs
   ├─ s8naqXEQQqzPzqbu82j63n.grist --> about 80 megabytes
   ...
   └─ assets
      └─ unversioned
         ├─ s8naqXEQQqzPzqbu82j63n
         │  └─ meta.json
         ...

# toggling attachments to external

$ mcli tree --files dev/grist-rc
dev/grist-rc
└─ docs
    ...
    ├─ s8naqXEQQqzPzqbu82j63n.grist --> back to a few kilobytes
    ...
    ├─ assets
    │  └─ unversioned
    │     ...
    │     ├─ s8naqXEQQqzPzqbu82j63n
    │     │  └─ meta.json
    │     ...
    └─ attachments
        └─ s8naqXEQQqzPzqbu82j63n
            ├─ 90305afdb21feeccc8eb72f65542c64468c09e03.cfb
            ├─ c5b9678e6d578f8ba399d3a24ec98798b6356def.bmp
            └─ f61d5adfb3198387ab1e76e6703c4931e437f851.bmp

# toggling attachments back to internal

$ mcli tree --files dev/grist-rc
dev/grist-rc
└─ docs
    ...
    ├─ s8naqXEQQqzPzqbu82j63n.grist --> back to about 80 megabytes
    ...
    ├─ assets
    │  └─ unversioned
    │     ...
    │     ├─ s8naqXEQQqzPzqbu82j63n
    │     │  └─ meta.json
    │     ...
    └─ attachments
        └─ s8naqXEQQqzPzqbu82j63n
            ├─ 90305afdb21feeccc8eb72f65542c64468c09e03.cfb --> still there
            ├─ c5b9678e6d578f8ba399d3a24ec98798b6356def.bmp --> still there
            └─ f61d5adfb3198387ab1e76e6703c4931e437f851.bmp --> still there

Thanks in advance for your feedback, and have a nice day

Nicolas

Hi Nicolas,

It is a bit awkward. The external attachments feature was built for a client with a specific development budget, and pruning external attachments at a finer granularity than whole-document deletion wasn’t a big deal for them. I do hope someone gets to building this out, since I am seeing people put the feature to work for use cases that aren’t practical without pruning. It wouldn’t be that hard to build; the non-trivial part is knowing which attachments might still be in use by older versions of the document.

Regular internal attachments do get pruned in a straightforward way, since there’s no issue related to older versions of the document (they would have their own copies of the attachments). But that isn’t practical for the kinds of use cases where people worry about significant disk space consumption.
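
To illustrate why older versions complicate pruning of external attachments, here is a rough sketch in Python. It is not Grist's actual code: the function name, the version labels, and the idea that you can obtain a list of referenced files per retained document version are all assumptions made for illustration. The point is only that a stored file is safe to delete when no retained version references it.

```python
from typing import Dict, Iterable, Set


def find_orphaned_attachments(
    stored_objects: Iterable[str],
    references_by_version: Dict[str, Iterable[str]],
) -> Set[str]:
    """Return stored objects that no retained document version references.

    Hypothetical helper: how to collect `references_by_version` from real
    document snapshots is exactly the non-trivial part and is not shown.
    """
    still_referenced: Set[str] = set()
    for refs in references_by_version.values():
        still_referenced.update(refs)
    return set(stored_objects) - still_referenced


# File names taken from the listing earlier in the thread.
stored = {
    "90305afdb21feeccc8eb72f65542c64468c09e03.cfb",
    "c5b9678e6d578f8ba399d3a24ec98798b6356def.bmp",
    "f61d5adfb3198387ab1e76e6703c4931e437f851.bmp",
}

# Hypothetical version history: an older snapshot taken while attachments
# were external still points at the stored files; the current (internal)
# version points at none of them.
versions = {
    "older-snapshot": set(stored),
    "current": set(),
}

# While the older snapshot is retained, nothing is safe to delete.
print(find_orphaned_attachments(stored, versions))

# Once only the current version is retained, all three files are orphans.
print(find_orphaned_attachments(stored, {"current": set()}))
```

With the older snapshot still retained, the first call returns an empty set, which matches what Nicolas observed: the files stay in the bucket even though the current document no longer needs them.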