[Proposal] Network data retention policy

As our Network instances grow in number of observations (8300+ on production and 4000 on dev) and more stations come online, we face a growing challenge with the amount of data our servers will need to store.

Although it is not an issue right now, we would like to be proactive about it and include @all in the thought process that will lead us to a policy. After discussions with @fredy, @comzeradd and @BOCTOK-1, the proposal is this:

  • All vetted (verified with data) observations stay as-is and will never be deleted
  • On the development instance of the network, all unvetted or bad observations more than 1 month old will be deleted automatically every day (see the sketch after this list)
  • As more and more automatic decoders come online in satnogs-client (making audio recordings obsolete, unless audio is the payload), we will be examining ways to store audio more efficiently (a dedicated server? uploads to archive.org?). That said, we don’t expect any changes to our current flow for the foreseeable future.
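For anyone curious what the daily cleanup could look like, here is a minimal sketch as a Django management command run from cron. The model and field names (`Observation`, `vetted_status`, `end`, the import path) are assumptions for illustration, not the actual satnogs-network schema:

```python
# Hypothetical sketch of the daily dev-instance cleanup.
# Model, field names, and status values are assumptions, not the
# real satnogs-network schema.
from datetime import timedelta

from django.core.management.base import BaseCommand
from django.utils import timezone

from network.base.models import Observation  # assumed app path


class Command(BaseCommand):
    help = 'Delete unvetted or bad observations older than one month'

    def handle(self, *args, **options):
        cutoff = timezone.now() - timedelta(days=30)
        # Select observations that ended over a month ago and were
        # never vetted good.
        stale = Observation.objects.filter(
            end__lt=cutoff,
            vetted_status__in=['unknown', 'bad'],  # assumed values
        )
        count = stale.count()
        stale.delete()
        self.stdout.write(f'Deleted {count} stale observations')
```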

For reference, audio recordings account for 10x the size of everything else we store in our DB :wink:
Once again, I would like to emphasize that all vetted and decoded payloads are and will be forever safe!

What do you all think?

+1

You didn’t mention culling bad data from prod, but it is implied. Is that part of the plan?

Just for clarity’s sake - the audio recordings are not stored in the DB itself but as external files (lest any DBA read this and twitch) :slight_smile:

If it came down to storage, we could adjust the architecture to keep pretty much everything indefinitely by moving those audio files to S3 and using Glacier for archival of really old data.
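The Glacier part could even be a plain S3 lifecycle rule rather than anything in our code. A sketch with boto3, where the bucket name, key prefix, and 365-day threshold are all made up for illustration:

```python
# Sketch of an S3 lifecycle rule that transitions older audio files
# to Glacier. Bucket name, prefix, and age threshold are assumptions.
import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='satnogs-network-audio',  # hypothetical bucket
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'archive-old-audio',
            'Filter': {'Prefix': 'data/'},  # assumed key prefix
            'Status': 'Enabled',
            'Transitions': [{
                'Days': 365,  # archive audio older than a year
                'StorageClass': 'GLACIER',
            }],
        }]
    },
)
```

New uploads would stay in regular S3 for fast access, and S3 would move them to Glacier automatically once they age past the threshold.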

We could archive the oldest observations’ audio in some form of compressed file (maybe not zip, but one of those types of things).

Those are already lossy-compressed (Ogg), so there is no further useful compression we can apply to them.
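As a sanity check, you can gzip one of the Ogg files yourself and watch the ratio stay close to 1.0, since a lossy codec leaves almost no redundancy for a general-purpose compressor to remove. A quick sketch (the file path is a placeholder):

```python
# Measure how much gzip saves on an already-compressed Ogg file.
# Expect a ratio near 1.0.
import gzip

path = 'observation.ogg'  # placeholder file name
with open(path, 'rb') as f:
    raw = f.read()
compressed = gzip.compress(raw)
print(f'original: {len(raw)} bytes')
print(f'gzipped:  {len(compressed)} bytes')
print(f'ratio:    {len(compressed) / len(raw):.3f}')
```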

So compressing multiple files into one tar.gz, for example, won’t help?