Verify Your Backups

Let me tell you how I was an idiot, ignored best practices, and luckily didn’t lose anything of value. And yes, this is a story about backups.

But first some context: in my homelab I try new stuff all the time. Recently I picked up longhorn, which is a pretty neat volume manager for Kubernetes. The idea is simple: you deploy longhorn, you assign disks, and then you can dynamically allocate volumes which map to persistent volumes in Kubernetes. This allows workloads to move between nodes, and the volume is automatically remounted wherever they land. Longhorn also offers recurring tasks so you can automatically back up your volumes on a schedule. Perfect. So I set up a new service (forgejo) to play around and provisioned a postgres instance backed by a longhorn volume. Easy enough with a persistent volume claim. I verified the volume was created and postgres was running, manually triggered a backup, and happily noted that the backup had some stuff in it, about 60MB. I even set up a recurring backup schedule. Then I started adding some data to forgejo and overall was happy.

Fast forward a week or two. I open up my forgejo instance and try to sign in but my credentials are rejected. Strange. After some poking around I connect to the postgres instance and… it’s empty. Very strange. My first reaction was that something must have gone wrong with longhorn. I faintly remember having heard rumors that longhorn sometimes loses data. Unfortunate, but first order of business was to restore the data. No problem, I thought, I have backups! So I open the longhorn UI, click around, read up on how to restore backups, and create a new volume from the backup. But when I poke inside the volume it’s empty, aside from the filesystem’s lost+found directory. Weird. Now cold sweat is starting to appear on my forehead. The backup clearly has a non-zero size, so something must be in there. So I spent an hour or two researching backups in longhorn and issues with them, but couldn’t find anything that would explain this. In the end I traced through debug logs of the instance manager and saw that it dutifully did its job, and yet… the volume was empty.

At that point it dawned on me that it might be empty because it might actually be empty. So I double checked my service definition and here’s what I found, boiled down to the essentials:

          env:
            - name: PGDATA
              value: "/var/lib/postgres/18/docker/data"
          volumeMounts:
            - mountPath: "/var/lib/postgresql/18/docker"
              name: postgres
  volumeClaimTemplates:
    - metadata:
        name: postgres
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: '1Gi'
        storageClassName: 'longhorn'

Did you spot it? At first glance everything seems in order, but when you look closely you’ll notice that PGDATA instructs postgres to store data in /var/lib/postgres/18/docker/data whereas the mountPath is actually /var/lib/postgresql/18/docker. One says “postgres”, the other “postgresql”, so the data directory never ends up on the volume.
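A mismatch like this is easy to miss by eye but trivial to catch mechanically. Here is a minimal sketch in Python of such a check, assuming you have already pulled the PGDATA value and the mountPath entries out of your manifest as plain strings. The function name is_on_volume is made up for this example and is not part of any Kubernetes or longhorn tooling.

```python
from pathlib import PurePosixPath


def is_on_volume(data_dir, mount_paths):
    """Return True if data_dir is a mount path itself or lies below one.

    Hypothetical helper for illustration: it compares whole path
    components, so "postgres" and "postgresql" are never confused.
    """
    data = PurePosixPath(data_dir)
    for mount in mount_paths:
        mount_path = PurePosixPath(mount)
        if data == mount_path or mount_path in data.parents:
            return True
    return False


# Values copied from the manifest above.
mounts = ["/var/lib/postgresql/18/docker"]
print(is_on_volume("/var/lib/postgres/18/docker/data", mounts))    # False
print(is_on_volume("/var/lib/postgresql/18/docker/data", mounts))  # True
```

Comparing path components instead of doing a string prefix match is deliberate: a naive startswith check would happily accept /var/lib/postgresql as being "inside" /var/lib/postgres.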

<insert endless facepalm meme>

So postgres was happily writing its data to ephemeral storage and not to the actual volume. And longhorn dutifully backed up an empty volume. The backup, by the way, reported a non-zero size because longhorn operates at the block level and not the filesystem level, and when you create a filesystem on a block device, it takes up some space even if it’s empty. When the postgres pod was eventually restarted, everything in its ephemeral storage got wiped and the new pod started fresh.
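Since the size of a block-level backup says nothing about what is inside it, a useful check has to look at the files. Below is a rough Python sketch of what that could look like, assuming the restored volume is mounted at some local directory. Both function names are invented for this example. PG_VERSION is the small version file postgres keeps at the top of its data directory, so finding it is a decent hint that a real database landed on the volume.

```python
from pathlib import Path


def visible_entries(restore_root):
    """Top-level entries of a restored volume, ignoring lost+found."""
    return sorted(
        p.name for p in Path(restore_root).iterdir() if p.name != "lost+found"
    )


def has_marker(restore_root, marker="PG_VERSION"):
    """True if a file named `marker` exists anywhere below restore_root."""
    return any(p.is_file() for p in Path(restore_root).rglob(marker))


def backup_looks_sane(restore_root):
    """A restored volume passes only if it has content and a postgres marker."""
    return bool(visible_entries(restore_root)) and has_marker(restore_root)
```

Running backup_looks_sane against my restored volume would have returned False right away, because the only thing in there was lost+found. It is no substitute for actually starting postgres on the restored data, but it would have caught this particular mistake in seconds.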

Ultimately not much data of value was lost, just a few tickets I had created to test things out in forgejo. It was, however, an invaluable reminder of why we must verify our backups. An unverified backup is (probably) worthless. So learn from my mistakes and verify your backups.

Seriously, go verify your backups today.

For those looking for a happy ending here, I’m glad to report that the mistakes have been addressed, forgejo and its postgres instances are running again, and I’m now verifying my backups.

ilikeorangutans

Jakob Külzer’s personal blog


2026-04-25

Post Edits
  • 2026-04-25: edits for clarity, added links
  • 2026-04-25: post on verifying backups