I’ve been using the `fetch_frames_from_network.py` script from the satnogs-decoders repository to download observations of CUTE from the SatNOGS DB. However, downloading even about a week of data with this script takes an excruciatingly long time. So, I took a look at the script to see how it operates:
- The script starts by running a locally defined method.
- This method then invokes `fetch_observation_data_from_id` (defined in `satnogs_api_client.py`) to get a list of observations.
- `fetch_frames_from_network` then loops through all of the observations returned by `fetch_observation_data_from_id` and skips those that don’t have
- For each item in `demoddata` (in each observation), `fetch_frames_from_network` downloads the contents of the observation and stores it in RAM.
- `fetch_frames_from_network` then returns a (possibly large) array of filenames and data.
- Back in global scope (`if __name__ == '__main__'`), the script loops over all of the filename/data pairs and writes each of them to disk.
So, the first (and most apparent) drawback of the script is that it defers writing `demoddata` to disk until after all of the observations have been downloaded into RAM. From an end-user perspective, this can make it look as though the script has hung, since one expects the output directory to populate gradually with observation data. Instead, the output directory remains empty until all of the data has been downloaded, and then it suddenly explodes with new files. Fixing this problem is fairly easy: refactor `fetch_frames_from_network.py` so that each observation is downloaded at the same time its data is written to disk. That is, the script should (1) get a list of observations, (2) get the `demoddata` URLs for each observation, and (3) for each `demoddata` URL, download the data, write it to a file, and close the file before moving on to the next one.
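A minimal sketch of step (3), assuming the `demoddata` URLs have already been collected as (filename, URL) pairs. `download_and_write` and the injected `fetch` callable are hypothetical names, not part of satnogs-decoders; in practice `fetch` could be something like `lambda url: requests.get(url, timeout=30).content`.

```python
import os
from typing import Callable, Iterable, Tuple


def download_and_write(items: Iterable[Tuple[str, str]],
                       fetch: Callable[[str], bytes],
                       output_dir: str) -> int:
    """Download each (filename, url) pair and write it to disk right away.

    `fetch` is a hypothetical callable that returns the raw bytes at `url`.
    Only one file's contents is held in memory at a time, so the output
    directory fills up while the loop is still running.
    """
    os.makedirs(output_dir, exist_ok=True)
    count = 0
    for filename, url in items:
        data = fetch(url)  # download one file...
        with open(os.path.join(output_dir, filename), "wb") as f:
            f.write(data)  # ...write it...
        # ...and the `with` block has closed it before the next URL is fetched.
        count += 1
    return count
```

Because `fetch` is passed in, the loop can be exercised offline, and a shared `requests.Session` could be plugged in later to reuse connections.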
I went ahead and modified `fetch_frames_from_network.py` to behave this way, only to find that there is a much more glaring bottleneck: the `fetch_observation_data_from_id()` method in `satnogs_api_client.py`. I see two problems with this method:
- There is no way for the client to ask the server to filter observations by the presence of `demoddata`. At the time of writing, CUTE (object 49263) returns 846 observations spread over 33 pages of results, but only 163 of these observations have it.
- Having to fetch the list of observations one page at a time seems to slow things down considerably; ideally, the complete list of observations could be downloaded in one go instead.
However, all of these issues aside, it is not clear to me that `fetch_frames_from_network` is even the right tool to be using. It looks as though it has not been touched in about two years. There seem to be three API libraries floating around:
- `satnogs_api_client.py` in `satnogs-decoders/contrib/manage/`
Looking at the documentation for the SatNOGS DB API, it appears as though there are methods for querying artifacts and observations, but these seem to assume that you already know the artifact/observation ID.
So, before I dig much further into this: is `fetch_frames_from_network` old/unmaintained, and is there some newer/better tool that I should be using? If not, and if it seems worthwhile to the community, I could push my modified version of this script (which at least interleaves fetching observations with writing to disk) to my fork of satnogs-decoders.
I suppose one other way to make the script seem a little less slow would be to fetch one page of observations at a time, then fetch all of the `demoddata` from that page of results before advancing to the next page (rather than trying to get a list of all observations first).
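The page-at-a-time idea could look roughly like this. It is a sketch under several assumptions: `fetch_page` and `fetch_file` are hypothetical callables (not functions from `satnogs_api_client.py`), each observation is a dict whose `demoddata` field is a list of dicts carrying the file URL under `payload_demod` (my reading of the Network API response, worth double-checking), and files are named after the last path component of their URL. Filtering on `demoddata` still happens client-side, since the server can’t do it.

```python
import os
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signatures, not taken from satnogs_api_client.py:
#   fetch_page(url) -> (observations on that page, next-page URL or None)
#   fetch_file(url) -> raw bytes of one demoddata file
PageFetcher = Callable[[str], Tuple[List[dict], Optional[str]]]
FileFetcher = Callable[[str], bytes]


def demod_urls(observation: dict) -> Iterator[str]:
    """Yield the demoddata URLs of one observation (assumes the `payload_demod` key)."""
    for item in observation.get("demoddata") or []:
        yield item["payload_demod"]


def download_page_by_page(first_url: str, fetch_page: PageFetcher,
                          fetch_file: FileFetcher, output_dir: str) -> int:
    """Fetch one page of observations, save all of its demoddata, then move on."""
    os.makedirs(output_dir, exist_ok=True)
    saved = 0
    url: Optional[str] = first_url
    while url is not None:
        observations, url = fetch_page(url)
        for obs in observations:
            # Client-side filter: observations without demoddata yield no URLs.
            for file_url in demod_urls(obs):
                data = fetch_file(file_url)
                path = os.path.join(output_dir, os.path.basename(file_url))
                with open(path, "wb") as f:
                    f.write(data)
                saved += 1
    return saved
```

With `requests`, a `fetch_page` adapter might return `r.json()` together with `r.links.get("next", {}).get("url")`, assuming the API advertises the next page in a `Link` header.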
Also (and I might be imagining things), I swear that at some point in my experimentation it seemed as though the network API returned all of the observations in the date range I selected, regardless of NORAD ID (something like 16,000 observations), rather than just those for CUTE. There is a note in `satnogs_api_client.py` that says:
```python
# Current prod is broken and can't filter on NORAD ID correctly, use client-side filtering instead
observations = list(filter(lambda o: o['norad_cat_id'] == norad_id, observations))
```
However, I have not been able to reproduce this behavior.
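If the symptom shows up again, one quick way to tell whether the server actually applied the NORAD ID filter is to count the observations that belong to other satellites, before the client-side filter runs. This is a small diagnostic sketch; `foreign_norad_ids` is a hypothetical helper, and the only field it relies on is `norad_cat_id`, which the quoted note already uses.

```python
from collections import Counter
from typing import Iterable


def foreign_norad_ids(observations: Iterable[dict], norad_id: int) -> Counter:
    """Count observations whose norad_cat_id differs from the one requested.

    An empty Counter means the server honored the filter; a large one would
    reproduce the "thousands of observations for other satellites" symptom.
    """
    return Counter(o.get("norad_cat_id") for o in observations
                   if o.get("norad_cat_id") != norad_id)
```

Logging `foreign_norad_ids(observations, 49263)` just before the client-side `filter` line would record how often, if ever, the server ignores the NORAD ID.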