Over 18 million frames in db today… getting this data decoded and usable should be a top priority for LSF. Thankfully, I don’t think it will be too hard if we pick some tools off the shelf, and doing so gives us the added benefit of crowd-sourced visualizations.
I set up a proof of concept last weekend, on the thinking that the data in these frames is really a bunch of time series data (albeit irregular in frequency). That makes time series databases a natural fit for handling our decoded data. For this POC I went with InfluxDB, after recommendations from colleagues.
I took a month of data from db for unisat-6 (getting “all of the data” for it made db choke and I never got the email). I went with unisat-6 because we already have a decoder written for it. I modified this decoder a bit to read the data from the .csv and discard any frames that were not the telemetry we were looking for (“U S 6”).
Then, each data point was entered into influxdb with the timestamp of the frame, tagged with the satellite name. We have a ton of data for unisat-6 so in all I think some 40,000 frames were imported.
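To make the import step concrete, here is a minimal sketch of how a decoded frame can be shaped into the point structure that the influxdb Python client's `write_points()` accepts. The measurement name and telemetry field names here are hypothetical, not the schema the POC actually used:

```python
from datetime import datetime, timezone

def frame_to_point(satellite, fields, timestamp):
    """Build an InfluxDB-style point dict from one decoded frame.
    Tagged with the satellite name; timestamped with the frame's time."""
    return {
        "measurement": "telemetry",   # hypothetical measurement name
        "tags": {"satellite": satellite},
        "time": timestamp.isoformat(),
        "fields": fields,             # decoded telemetry values
    }

point = frame_to_point(
    "unisat-6",
    {"uptime": 123456, "battery_voltage": 7.9},  # made-up field names/values
    datetime(2017, 6, 1, 12, 0, tzinfo=timezone.utc),
)
# an InfluxDBClient instance would then take client.write_points([point])
```

One dict per frame, batched into a list, is all the client needs; the tag is what later lets Grafana filter per satellite.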
I slapped Grafana on top of this, and with a few clicks I had a dashboard I could visualize, dive into, change the time scope of, etc.:
Now, there are some design issues in this model that need discussion and consideration (the data schema, tagging, etc.), which is why I haven’t opened this up for the broader community to pitch in on yet. But I feel this is a feasible way of getting our data out there, and it shouldn’t be too hard. In fact, the visualization and data usage opportunities here are endless, especially if we open up the data via an API. Here’s roughly how I see it working:
- raw frames remain in db as they are today
- we stand up a TSDB (datawarehouse.satnogs.org? warehouse.satnogs.org?)
- add a “decoded” field to db
- create per-satellite decoder scripts (or decode functions that we store alongside the satellite in db?).
- the decoder scripts run, and as each frame is decoded it is marked as such in the “decoded” field in db. This lets us start decoding a satellite we have been capturing for a while, while continuing to decode new frames as they come in (select where ‘decoded’ = 0)
- we stand up a Grafana instance with user accounts for the community
- create general dashboards for each satellite in Grafana and import the dashboard iframe back into the satellite display in db.satnogs.org
- if someone wants a different type of dashboard, or one that mixes data between different satellites, they can build it themselves within Grafana. The opportunities for data mining and discovery here are endless. Alerts can be set up within Grafana so sat owners are notified when events or anomalies occur.
- eventually, or in parallel, middleware needs to be written to copy frames from network.satnogs.org into db.satnogs.org (I would propose we do this by writing a SIDS exporter for network.satnogs.org just to keep things simple in the db). But I don’t want this to block or derail the decoding and visualization work as we already have a majority of our data in db.
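The “select where ‘decoded’ = 0” flow above can be sketched as a small worker loop. This is purely illustrative: it uses an in-memory SQLite table as a stand-in, and the `frames` table name and columns are assumptions, not db.satnogs.org’s real schema:

```python
import sqlite3  # in-memory stand-in for db's actual database

def decode_pending(conn, decoder):
    """Decode only frames not yet marked, i.e. `select where decoded = 0`,
    then flip the flag so the same frame is never decoded twice."""
    rows = conn.execute(
        "SELECT id, frame FROM frames WHERE decoded = 0"
    ).fetchall()
    for frame_id, raw in rows:
        fields = decoder(raw)  # push `fields` to the TSDB at this point
        conn.execute("UPDATE frames SET decoded = 1 WHERE id = ?", (frame_id,))
    conn.commit()

# toy demo: two undecoded frames, a no-op decoder
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (id INTEGER PRIMARY KEY, frame BLOB, decoded INTEGER)")
conn.executemany("INSERT INTO frames (frame, decoded) VALUES (?, 0)",
                 [(b"\x01",), (b"\x02",)])
decode_pending(conn, decoder=lambda raw: {})
```

Because the loop only ever touches rows with `decoded = 0`, it can run continuously against the backlog while new frames keep arriving.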
Some open questions:

- Is the per-satellite Python method right? I hear of people looking into Kaitai, but will that work in our environment? The Python struct.unpack method does work today. It may be more work up front writing the decoders, but I think we get more flexibility than we would with Kaitai (e.g. skipping ahead to the “US6” marker in a unisat-6 packet to begin telemetry).
- TSDB schema design… in my POC every data point was unique. For instance, there’s a field called “uptime”, so picking the latest data is a matter of selecting the “uptime” field where the “satellite” tag is “unisat-6”. Uptime is a common field and could be shared across a lot of different satellites, while some satellites may use different terminology (and maybe that’s okay). Obviously there will be fields unique to each satellite. And maybe in the TSDB world we don’t care, as long as the fields are programmatically decoded and properly tagged.
- tagging needs to be thoroughly discussed and agreed upon before we import 18M records. Does the submitter become a tag? Location? (I forget what all we collect in db, but I know it’s not all exposed through the csv exporter.)
- how do we handle data of different types from satellites? In the unisat-6 example I only decoded telemetry packets. Maybe that’s good enough for now and all we care about.
- Is a TSDB the right choice for satellites we won’t be collecting frequent data from? (I still think it is; dashboards can show “last data is ##” without the need for graphing.)
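On the struct.unpack flexibility point: here is what a per-satellite decoder in that style could look like, seeking to the “US6” marker and then unpacking fields. The field layout (offsets, types, scaling) is invented for the sketch and is not unisat-6’s real telemetry format:

```python
import struct

MARKER = b"US6"

def decode_us6(frame: bytes):
    """Illustrative unisat-6-style decoder: find the 'US6' marker wherever
    it sits in the frame, then unpack telemetry that follows it.
    Layout is hypothetical: uint32 uptime, int16 temperature (x0.1 C)."""
    start = frame.find(MARKER)
    if start == -1:
        return None  # not a telemetry frame we care about; discard
    body = frame[start + len(MARKER):]
    uptime, temp_raw = struct.unpack_from(">Ih", body, 0)
    return {"uptime": uptime, "temperature": temp_raw / 10.0}
```

This is exactly the “skip ahead to the marker” behavior that is awkward to express declaratively: a couple of lines of Python handle variable-offset payloads, frame filtering, and value scaling in one place.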