I’m having fun with a small side project that tries to recognise RF signals with a machine learning algorithm, with the aim of optimising ground station usage for space applications. I was wondering whether it would be possible to get the raw I/Q data from as many observations of as many different small satellites as possible, collected over a few weeks or a month, ideally from a well-equipped station (to give recordings of consistent quality), something similar to the EIRSAT-1 ground station.
Is the raw data from observations stored anywhere? Or is this avoided because of the size of the files? Would anyone be willing to help me out with this request?
For this kind of work, I think the raw I/Q files are only half of the dataset.
The associated metadata is just as important if the goal is to make the data reusable and comparable across stations or processing pipelines. Some public IQ archives already expose useful observation-level metadata, such as observation id, date/time, satellite, status and sometimes decoded telemetry links. That is already very helpful.
For ML-oriented reuse, though, I would still try to keep a small machine-readable manifest per recording with satellite/NORAD id, observation id if available, station id, timestamp, center frequency, sample rate, sample format, Doppler correction status, pass information and any known decoder/result associated with that recording. Without that context, the files may still be useful, but reproducing results or comparing different observations later becomes much harder.
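To make that concrete, here is a minimal sketch of such a per-recording manifest in Python. The `RecordingManifest` class, its field names and all example values are my own placeholders for illustration, not an existing SatNOGS or station format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RecordingManifest:
    """Hypothetical per-recording manifest; field names are illustrative only."""
    norad_id: int
    station_id: int
    start_time_utc: str           # ISO 8601 timestamp of the first sample
    center_frequency_hz: float
    sample_rate_hz: float
    sample_format: str            # e.g. "cf32_le" for complex float32, little endian
    doppler_corrected: bool
    observation_id: Optional[int] = None       # only if the recording has one
    pass_info: Optional[dict] = None           # free-form pass details
    decoder_result_url: Optional[str] = None   # link to decoded output, if any


def manifest_to_json(manifest: RecordingManifest) -> str:
    """Serialise the manifest as stable, human-readable JSON."""
    return json.dumps(asdict(manifest), indent=2, sort_keys=True)


# All values below are made-up placeholders, not a real observation.
example = RecordingManifest(
    norad_id=99999,
    station_id=1,
    start_time_utc="2024-01-01T12:00:00Z",
    center_frequency_hz=437.1e6,
    sample_rate_hz=48000.0,
    sample_format="cf32_le",
    doppler_corrected=False,
    observation_id=1234567,
    pass_info={"max_elevation_deg": 45.0},
)
print(manifest_to_json(example))
```

Writing one such JSON file next to each I/Q file keeps the context attached to the recording even if the two are later copied away from the original archive.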
Good point, yes. SigMF is probably the right direction here, much better than inventing an ad-hoc manifest format. The useful part would probably be a small SatNOGS-specific SigMF extension or convention for the mission/observation context that is not purely capture-level: observation id, station id, satellite/NORAD id, Doppler correction status, and possibly decoder/result links when available. That would keep the IQ recordings aligned with an existing open standard, while making them easier to reuse across ML experiments and downstream tools.
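As a rough sketch of what that could look like, the snippet below builds a SigMF metadata dictionary with the standard `core:` fields and adds the observation context under a `satnogs:` namespace. That namespace, its version number and its key names are purely hypothetical (no such extension is defined as far as I know), and the example values are placeholders.

```python
import json


def build_sigmf_meta(datatype, sample_rate_hz, center_frequency_hz,
                     start_time_utc, satnogs_context):
    """Build a SigMF metadata dict with a hypothetical 'satnogs' extension."""
    meta = {
        "global": {
            "core:datatype": datatype,
            "core:sample_rate": sample_rate_hz,
            "core:version": "1.0.0",
            # Declares the (hypothetical) extension namespace used below.
            "core:extensions": [
                {"name": "satnogs", "version": "0.0.1", "optional": True}
            ],
        },
        "captures": [
            {
                "core:sample_start": 0,
                "core:frequency": center_frequency_hz,
                "core:datetime": start_time_utc,
            }
        ],
        "annotations": [],
    }
    # Only write context that is actually known; skip missing values.
    for key, value in satnogs_context.items():
        if value is not None:
            meta["global"][f"satnogs:{key}"] = value
    return meta


# Placeholder values, not a real observation.
meta = build_sigmf_meta(
    datatype="cf32_le",
    sample_rate_hz=48000.0,
    center_frequency_hz=437.1e6,
    start_time_utc="2024-01-01T12:00:00Z",
    satnogs_context={
        "observation_id": 1234567,
        "station_id": 1,
        "norad_id": 99999,
        "doppler_corrected": False,
        "decoder_result_url": None,  # unknown at capture time, so omitted
    },
)
print(json.dumps(meta, indent=2))
```

Marking the extension as optional means a generic SigMF reader can still load the recording and simply ignore the `satnogs:` keys.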
I’ve updated the stations to upload .tar.zst archives with sigmf-data. For now, the metadata is just linked by a URL to the observations on satnogs-network. It is not that easy to get everything from within a post-observation script.
Linking the SigMF recording back to the SatNOGS observation URL already solves a large part of the provenance problem, especially if the post-observation script does not have easy access to all the context locally.
Maybe a practical split could be:
keep the station-side writer simple and robust: SigMF data, core capture metadata, checksum, observation URL;
optionally enrich the archive later from the SatNOGS Network API, using the observation URL as the key.
That second step could add SatNOGS-specific context such as observation id, station id, satellite/NORAD id, transmitter/mode, Doppler correction status, and decoder/result links if available.
This would keep the capture side clean while still allowing ML or downstream tools to work with richer metadata when needed.
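A minimal sketch of that split, using only the Python standard library, could look like the following. Several things here are assumptions on my side: the `satnogs:` keys are hypothetical, the API endpoint layout in `fetch_observation` and the field names in `FIELD_MAP` should be checked against the actual SatNOGS Network API, and the example record is fake so that nothing is fetched over the network.

```python
import copy
import hashlib
import io
import json
import re
import urllib.request


def sha512_of_stream(stream, chunk_size=1 << 20):
    """Station side: hash the I/Q data in chunks so large files fit in memory."""
    digest = hashlib.sha512()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def observation_id_from_url(url):
    """Extract the numeric id from a URL ending in /observations/<id>/."""
    match = re.search(r"/observations/(\d+)/?$", url)
    if match is None:
        raise ValueError(f"no observation id found in {url!r}")
    return int(match.group(1))


def fetch_observation(obs_id, api_base="https://network.satnogs.org/api"):
    """Later step: fetch one observation record (assumed endpoint layout)."""
    url = f"{api_base}/observations/{obs_id}/"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Assumed API field name -> hypothetical "satnogs:" metadata key.
FIELD_MAP = {
    "id": "observation_id",
    "ground_station": "station_id",
    "norad_cat_id": "norad_id",
    "transmitter_mode": "mode",
}


def enrich(meta, observation):
    """Return a copy of the metadata with the known observation fields added."""
    enriched = copy.deepcopy(meta)
    for api_field, key in FIELD_MAP.items():
        if observation.get(api_field) is not None:
            enriched["global"][f"satnogs:{key}"] = observation[api_field]
    return enriched


# Station side: only checksum and observation URL (placeholder data and URL).
obs_url = "https://network.satnogs.org/observations/1234567/"
station_meta = {
    "global": {
        "core:sha512": sha512_of_stream(io.BytesIO(b"\x00" * 16)),
        "satnogs:observation_url": obs_url,
    },
    "captures": [],
    "annotations": [],
}

# Enrichment side: a fake record stands in for fetch_observation(...).
fake_record = {"id": observation_id_from_url(obs_url), "ground_station": 1,
               "norad_cat_id": 99999, "transmitter_mode": None}
enriched = enrich(station_meta, fake_record)
print(json.dumps(enriched, indent=2))
```

Because `enrich` works on a copy, the metadata written by the station stays untouched, and the enrichment can be re-run later if the observation record changes.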
F.