I’m having fun with a small side project looking to recognise RF signals with a machine learning algorithm to optimise ground station usage for space applications. I was wondering if it were possible to have the raw I/Q data from as many observations from as many different small satellites as possible over the course of a few weeks/a month for example from a well equipped station (to give consistent quality recordings) - something similar to the EIRSAT-1 ground station.
Is there somewhere where the raw data from observations is stored? Or is this avoided due to the size of the files? Would anyone be willing to help me out with this request?
For this kind of work, I think the raw I/Q files are only half of the dataset.
The associated metadata is just as important if the goal is to make the data reusable and comparable across stations or processing pipelines. Some public IQ archives already expose useful observation-level metadata, such as observation id, date/time, satellite, status and sometimes decoded telemetry links. That is already very helpful.
For ML-oriented reuse, though, I would still try to keep a small machine-readable manifest per recording with satellite/NORAD id, observation id if available, station id, timestamp, center frequency, sample rate, sample format, Doppler correction status, pass information and any known decoder/result associated with that recording. Without that context, the files may still be useful, but reproducing results or comparing different observations later becomes much harder.
Good point, yes. SigMF is probably the right direction here, much better than inventing an ad-hoc manifest format. The useful part would probably be a small SatNOGS-specific SigMF extension or convention for the mission/observation context that is not purely capture-level: observation id, station id, satellite/NORAD id, Doppler correction status, and possibly decoder/result links when available. That would keep the IQ recordings aligned with an existing open standard, while making them easier to reuse across ML experiments and downstream tools.
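A minimal sketch of what such a convention could look like, using only the Python standard library. The `core:*` keys are standard SigMF fields; the `satnogs:*` namespace, its key names, the helper `build_sigmf_meta` and all example values are hypothetical and only illustrate the idea. The `cf32_le` datatype is also an assumption about the recording format.

```python
import json


def build_sigmf_meta(sample_rate_hz, center_freq_hz, start_utc, observation_url,
                     norad_cat_id=None, station_id=None, doppler_corrected=None):
    """Return a SigMF-style metadata dict with hypothetical satnogs:* fields."""
    global_info = {
        "core:datatype": "cf32_le",  # assumption: complex float32, little-endian
        "core:sample_rate": sample_rate_hz,
        "core:version": "1.2.0",
        # Declare the (hypothetical) extension namespace used below.
        "core:extensions": [
            {"name": "satnogs", "version": "0.0.1", "optional": True}
        ],
        "satnogs:observation_url": observation_url,
    }
    # Context that may not be known on the station side is simply left out.
    optional = {
        "satnogs:norad_cat_id": norad_cat_id,
        "satnogs:station_id": station_id,
        "satnogs:doppler_corrected": doppler_corrected,
    }
    global_info.update({k: v for k, v in optional.items() if v is not None})
    capture = {
        "core:sample_start": 0,
        "core:frequency": center_freq_hz,
        "core:datetime": start_utc,
    }
    return {"global": global_info, "captures": [capture], "annotations": []}


# Placeholder values, not taken from a real observation.
meta = build_sigmf_meta(
    sample_rate_hz=48_000,
    center_freq_hz=437_100_000,
    start_utc="2024-01-01T12:00:00Z",
    observation_url="https://network.satnogs.org/observations/1234567/",
    station_id=42,
)
print(json.dumps(meta, indent=2))
```

Marking the extension as optional means generic SigMF tools could still read the recording and simply ignore the SatNOGS-specific keys.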
I’ve updated the stations to upload .tar.zst archives with sigmf-data. For now, the metadata is just linked via a URL to the observation on satnogs-network. It is not that easy to gather everything from within a post-observation script.
Linking the SigMF recording back to the SatNOGS observation URL already solves a large part of the provenance problem, especially if the post-observation script does not have easy access to all the context locally.
Maybe a practical split could be:
keep the station-side writer simple and robust: SigMF data, core capture metadata, checksum, observation URL;
optionally enrich the archive later from the SatNOGS Network API, using the observation URL as the key.
That second step could add SatNOGS-specific context such as observation id, station id, satellite/NORAD id, transmitter/mode, Doppler correction status, and decoder/result links if available.
This would keep the capture side clean while still allowing ML or downstream tools to work with richer metadata when needed.
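The second step could be sketched roughly as follows. The API root, the response field names in `FIELD_MAP` and the `satnogs:*` keys are assumptions for illustration and should be checked against a real SatNOGS Network API response; the example uses a hand-written stand-in instead of a network call.

```python
import copy
import re

# Assumed API root for single observations.
API_BASE = "https://network.satnogs.org/api/observations/"

# Assumed API field name -> hypothetical SigMF extension key.
FIELD_MAP = {
    "id": "satnogs:observation_id",
    "ground_station": "satnogs:station_id",
    "norad_cat_id": "satnogs:norad_cat_id",
    "transmitter_mode": "satnogs:transmitter_mode",
}


def observation_id_from_url(url):
    """Extract the numeric observation id from an observation URL."""
    match = re.search(r"/observations/(\d+)", url)
    if match is None:
        raise ValueError(f"no observation id in {url!r}")
    return int(match.group(1))


def api_url_for(observation_id):
    """Build the API URL that would be queried for this observation."""
    return f"{API_BASE}{observation_id}/"


def enrich(meta, observation):
    """Return a copy of meta with the mapped observation fields added."""
    enriched = copy.deepcopy(meta)
    for api_field, sigmf_key in FIELD_MAP.items():
        value = observation.get(api_field)
        if value is not None:
            enriched["global"][sigmf_key] = value
    return enriched


# Station-side metadata as uploaded: only the observation URL is known.
station_meta = {
    "global": {
        "satnogs:observation_url": "https://network.satnogs.org/observations/1234567/"
    },
    "captures": [],
    "annotations": [],
}
obs_id = observation_id_from_url(station_meta["global"]["satnogs:observation_url"])

# Stand-in for the JSON the API might return; values are placeholders.
fake_response = {"id": obs_id, "ground_station": 42, "norad_cat_id": 99999}
enriched_meta = enrich(station_meta, fake_response)
```

In a real enrichment job the stand-in would be replaced by something like `requests.get(api_url_for(obs_id), timeout=30).json()`. Because `enrich` works on a copy, the archive written by the station stays untouched and the enrichment can be repeated later if more context becomes available.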
F.
I use the sigmf-python package. With that one, it is fairly easy to generate the correct metadata. It even calculates the SHA-512 sum of the IQ data and gives you the option to validate everything.
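For anyone who wants to check an archive without installing sigmf-python, here is a standard-library sketch of what the checksum comparison amounts to. It is not how sigmf-python implements it; the helper names are made up for this example. It relies only on the SigMF `core:sha512` field in the global object.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large IQ recordings need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(meta, data_path):
    """Compare the recorded core:sha512 value with the actual data file."""
    expected = meta.get("global", {}).get("core:sha512")
    return expected is not None and expected == sha512_of_file(data_path)
```

`meta` here is the parsed content of the `.sigmf-meta` file, for example from `json.load`. A missing `core:sha512` field is treated as a failed check rather than a pass.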
I published the script I currently use in production here:
Please feel free to re-publish your changes, or send them to me, if you make any. I would be especially interested in integrating or using a native alternative to the find_baudrate.py script, which is currently borrowed from the sa2kng docker stack.
If GitHub is easier for you, I can publish it there too.
For now I would not touch the sample-rate detection part without understanding the station setup better.
A small low-risk contribution could be documentation around sigmf_packer.py: expected SatNOGS post-script arguments, required environment variables, generated .sigmf-data/.sigmf-meta/.tar.zst archive layout, and the current find_samp_rate.py dependency.
I also noticed two small pyproject details: the package name seems to have a typo, and requests/urllib3 are imported by the script but not listed as dependencies.
That kind of cleanup may make the script easier for others to inspect or reuse, without changing the working station-side behavior.
Thank you very much for the review and your feedback.
Very good catches with the package name typo and the missing dependencies. I just added those.
The post-script arguments are the standard ones that the SatNOGS client calls your script with. I’d love to see that interface reformed somehow (perhaps passing some JSON), but at the moment it works that way.
I paid quite a lot of attention to the tar archive, so that it contains only the IQ data and the SigMF meta file, without a top-level directory. I found it much cleaner and more minimalistic that way.
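A flat layout like this can be sketched with the standard `tarfile` module. This is not the code from the published script; `pack_flat` and `member_names` are names made up for the example, and zstd compression is left out (it could be applied afterwards, for instance with the `zstd` command-line tool).

```python
import tarfile
from pathlib import Path


def pack_flat(tar_path, *files):
    """Write files into a tar archive under their base names only."""
    with tarfile.open(tar_path, "w") as archive:
        for file_path in map(Path, files):
            # arcname drops the directory part, so no top-level folder appears.
            archive.add(file_path, arcname=file_path.name)


def member_names(tar_path):
    """List the member names of an archive, to verify the layout."""
    with tarfile.open(tar_path, "r") as archive:
        return archive.getnames()
```

Calling `pack_flat("obs.tar", "/tmp/obs.sigmf-data", "/tmp/obs.sigmf-meta")` would produce an archive whose members are exactly `obs.sigmf-data` and `obs.sigmf-meta`, regardless of where the source files were located.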
Unfortunately, I forgot to mention that my station runs directly, without the Docker container, with the Python app installed directly via pip. I therefore assume that the .env file and /tmp are freely accessible. It might be a bit more complex within the Docker container boundaries, but it shouldn’t be too difficult to adjust accordingly.