Hi all,
I’ve been using the fetch_frames_from_network.py script from the satnogs-decoders repository to download observations of CUTE from the SatNOGS DB. However, I find that using this script to download even only about a week of data takes an excruciatingly long time to complete. So, I took a look at the script to see how it operates:
- The script starts by running a locally defined method, fetch_frames_from_network.
- This method then invokes fetch_observation_data_from_id (defined in satnogs_api_client.py) to get a list of observations.
- fetch_frames_from_network then loops through all of the observations returned by fetch_observation_data_from_id and skips those that don’t have demoddata.
- For each item in demoddata (in each observation), fetch_frames_from_network downloads the contents and holds them in RAM.
- fetch_frames_from_network then returns a (possibly large) array of filenames and demoddata.
- Back in global scope (if __name__ == '__main__'), the script loops over all of the filename/data pairs and writes each of these to disk.
So, the first (and most apparent) drawback of the script is that it defers writing demoddata to disk until after all of the observations have been downloaded (into RAM). From an end-user perspective, this can make it look as though the script has hung, as the expectation is that the output directory should slowly populate with observation data. Instead, the output directory will remain empty until after all the data has been downloaded, and then it will suddenly explode with new files. Fixing this problem is pretty easy, and just consists of refactoring fetch_frames_from_network.py so that each download happens at the same time the data is written to disk: i.e., the script should (1) get a list of observations, (2) get the demoddata URLs for each observation, and (3) for each demoddata URL, download the data, write it to a file, then close the file before advancing to the next demoddata.
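A minimal sketch of that interleaved download-and-write loop, not the actual patch to fetch_frames_from_network.py. The names http_get and save_frames are made up for illustration, and the payload_demod key is my assumption about how each demoddata entry carries its URL:

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen


def http_get(url):
    """Return the body of url as bytes using a plain stdlib HTTP GET."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def save_frames(observations, out_dir, fetch=http_get):
    """Download each demoddata payload and write it to disk right away.

    Assumes each observation is a dict whose optional 'demoddata' list
    holds dicts with a 'payload_demod' URL (an assumption, see above).
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for observation in observations:
        # Observations without demoddata are skipped.
        for item in observation.get('demoddata') or []:
            url = item['payload_demod']
            # Name the file after the last path component of the URL.
            filename = os.path.basename(urlsplit(url).path)
            path = os.path.join(out_dir, filename)
            data = fetch(url)
            # The file is closed before the next download starts.
            with open(path, 'wb') as handle:
                handle.write(data)
            written.append(path)
    return written
```

With this shape, only one payload is held in memory at a time, and the output directory fills up while the script runs. Passing fetch as a parameter is just there so the loop can be exercised without touching the network.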
I went ahead and modified fetch_frames_from_network.py to behave in this manner, only to find that there is actually a much more glaring bottleneck, which is the fetch_observation_data_from_id() method in satnogs_api_client.py. There are two problems that I see with this method:
- There is no way for the client to request the server to filter observations by presence of demoddata. At the time of writing, CUTE (object 49263) returns 846 observations spread over 33 pages of results, but only 163 of these observations have demoddata.
- Having to fetch the list of observations one page at a time seems to really slow things down. It would be ideal if a complete list of observations could be downloaded instead.
However, all of these issues aside, it is not clear to me that fetch_frames_from_network is even the right tool to be using. It looks as though it has not been touched in about two years. It also looks like there are three API libraries floating around:
- satnogs-db-api-client
- satnogs-network-api-client
- satnogs_api_client.py in satnogs-decoders/contrib/manage/
Looking at the documentation for the SatNOGS DB API, it appears as though there are methods for querying artifacts and observations, but these seem to assume that you already know the artifact/observation ID.
So, before I dig too much further into this: is fetch_frames_from_network old/unmaintained, and is there some newer/better tool that I should be using? Otherwise, if it seems worthwhile to the community, I could push my modified version of this script (which at least interleaves fetching observations with writing to disk) to my fork of satnogs-decoders.
I suppose one other way to make the script seem a little less slow would be to fetch one page of observations at a time, then fetch all of the demoddata from that page of results before advancing to the next page (rather than trying to get a list of all observations first).
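Here is a rough sketch of that page-at-a-time idea using generators. Everything here is hypothetical: get_page is a placeholder callable that takes a URL and returns a (results, next_url) pair, with next_url set to None on the last page, and payload_demod is again my assumption for the key holding each demoddata URL.

```python
def iter_observations(first_url, get_page):
    """Yield observations lazily, requesting a page only when it is needed.

    get_page is a hypothetical callable: url -> (list_of_observations,
    next_url_or_None). How the next URL is obtained (for example from a
    Link header) depends on the API and is left to the caller.
    """
    url = first_url
    while url:
        results, url = get_page(url)
        yield from results


def iter_demoddata_urls(observations):
    """Yield every demoddata URL, skipping observations that have none."""
    for observation in observations:
        for item in observation.get('demoddata') or []:
            yield item['payload_demod']
```

Chaining the two, as in iter_demoddata_urls(iter_observations(first_url, get_page)), means the second page is not requested until every demoddata URL from the first page has been consumed, so downloads can start as soon as the first page arrives.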
Also, and I might be imagining things, but I swear that at some point in my experimentation, it seemed as though the network API returned all of the observations (regardless of NORAD ID) in the date range I selected (something like 16,000 observations), rather than just those for CUTE. There is a note in satnogs_api_client.py which says:
# Current prod is broken and can't filter on NORAD ID correctly, use client-side filtering instead
observations = list(filter(lambda o: o['norad_cat_id'] == norad_id, observations))
However, I have not been able to reproduce this behavior.
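One way to catch this if it happens again would be to have the client-side filter also report what it threw away. This is only a sketch; filter_by_norad_id is a made-up helper, not something in satnogs_api_client.py:

```python
from collections import Counter


def filter_by_norad_id(observations, norad_id):
    """Return (matching, unexpected) for a list of observation dicts.

    matching holds the observations whose 'norad_cat_id' equals norad_id;
    unexpected counts the other NORAD IDs that came back, which should be
    empty if the server-side filter is working.
    """
    matching = []
    unexpected = Counter()
    for observation in observations:
        observed_id = observation.get('norad_cat_id')
        if observed_id == norad_id:
            matching.append(observation)
        else:
            unexpected[observed_id] += 1
    return matching, unexpected
```

If unexpected ever comes back non-empty, that would confirm that the server ignored the NORAD ID filter for that request.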
Thoughts?
Thanks,
Nick