Workflow for uploading observation artifacts

Hello SatNOGS team,

I’ve got 8500+ frames waiting to upload, each between 60 and 512 bytes, so call it 5 MBytes total – Bobcat/Grifex and ELFIN observations scheduled pretty close to each other. Then there are about 8 OGG/PNG files backlogged behind the frames, totaling about 100 MBytes. So roughly 105 MBytes of SatNOGS data in total.

What’s happening is that the 8500+ tiny frames/files (~5 MBytes total) are taking 2-3 hours to upload – meanwhile more and more observations/frames are being captured and piling up behind them. Maybe 1 tiny frame/file every 3-5 seconds. A data traffic jam! (beep beep)

A 9600 baud modem (~ 960 bytes/sec) would be moving these frames faster. :upside_down_face:

Even with my upload bandwidth cap of 1.5 Mbit/sec, all this data should easily upload in 15 to 20 minutes.
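For reference, a quick back-of-envelope with the figures above (a rough sketch in Python; the numbers are just the ones quoted in this post):

# Rough sanity check: how long ~105 MBytes should take at a 1.5 Mbit/sec cap.
total_bytes = 105 * 1024 * 1024      # ~5 MBytes of frames + ~100 MBytes of OGG/PNG
upload_bps = 1.5e6                   # 1.5 Mbit/sec upstream cap
minutes = total_bytes * 8 / upload_bps / 60
print(f"At line rate: ~{minutes:.0f} minutes")   # roughly 10 minutes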

Am I bumping up against a SatNOGS workflow ceiling, or do I need to pony up for better internet service to move 100 MBytes of data in less than 3 hours?

I’m certainly willing to do the latter, but I think something else has to be coming into play here – hence this post. Any ideas?

My traceroute to network.satnogs.org shows no packet loss through any of the ~15 hops from me to the DB. Ping time to network.satnogs.org was < 160 ms (I know this isn’t an indication of throughput).

I also used speedtest.net against various servers in the US and am maintaining my 1.5 Mbit/sec speeds across the board.

As always, looking forward to your collective insights on why these are taking so long to upload.

satcolintel5


Update

2 hours later – it went from 8500 frames down to 2000 frames, but it still hasn’t caught up.

The failed observations are actually successful, just waiting for their respective OGG/PNG to upload – which in turn are waiting on the 2000+ frames left over from 2 hours ago.


Circling back on this. Thousands of valid data frames dropped on the floor for the 5th day in a row for station 2134. Internet connection is fine.

Am I doing something wrong? Help!

I don’t think you are doing something wrong. We need to debug this issue; I’m not sure yet how, but I’ll come back with something. A quick guess is that the combination of several observations with a lot of data back to back, and the client uploading files one by one, creates this issue.


Trying to understand the location of the problem:
What is the observable frames-per-second upload speed for other stations?
Do other stations observe the same issue (backlog of frames)?

Sorry for not providing this sooner. I found this exception in syslog while looking for clues as to what is going on:

Oct 25 09:48:28 satnogs satnogs-client[20602]: satnogsclient.scheduler.tasks - INFO - Trying to GET observation jobs from the network
Oct 25 09:48:28 satnogs satnogs-client[20602]: urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): network.satnogs.org:443
Oct 25 09:48:28 satnogs satnogs-client[20602]: satnogsclient.scheduler.tasks - INFO - Post data started
Oct 25 09:48:28 satnogs satnogs-client[20602]: apscheduler.executors.default - ERROR - Job "post_data (trigger: interval[0:03:00], next run at: 2021-10-25 13:51:28 UTC)" raised an exception
Oct 25 09:48:28 satnogs satnogs-client[20602]: Traceback (most recent call last):
Oct 25 09:48:28 satnogs satnogs-client[20602]:   File "/var/lib/satnogs/lib/python3.7/site-packages/apscheduler/executors/base.py", line 125, in run_job
Oct 25 09:48:28 satnogs satnogs-client[20602]:     retval = job.func(*job.args, **job.kwargs)
Oct 25 09:48:28 satnogs satnogs-client[20602]:   File "/var/lib/satnogs/lib/python3.7/site-packages/satnogsclient/scheduler/tasks.py", line 100, in post_data
Oct 25 09:48:28 satnogs satnogs-client[20602]:     observation = {'demoddata': (fil, base64.b64decode(data['pdu']))}
Oct 25 09:48:28 satnogs satnogs-client[20602]: TypeError: 'int' object is not subscriptable

That makes sense… it seems that one of the files causes an error in the post_data function, which uploads the data. This error stops the uploading task, so the rest of the files are not uploaded and have to wait for the next run of post_data.
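To illustrate the failure mode, here is a simplified sketch (not the actual satnogs-client code; it assumes the data file content is parsed as JSON before the 'pdu' field is read, which is consistent with the traceback above):

import base64
import json

# Simplified illustration of the per-file step in post_data (hypothetical names).
def decode_pdu(file_contents):
    data = json.loads(file_contents)        # a file holding just a bare number parses to an int
    return base64.b64decode(data['pdu'])    # int['pdu'] raises TypeError: 'int' object is not subscriptable

# A defensive variant would skip the bad file instead of aborting the whole upload task:
def decode_pdu_safe(file_contents):
    try:
        return base64.b64decode(json.loads(file_contents)['pdu'])
    except (ValueError, TypeError, KeyError) as err:
        print(f"Skipping malformed data file: {err}")
        return None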

Do you use any 3rd party script for generating data?

For finding the file, if you are familiar with the Linux command line, I suggest you create a directory inside /tmp/.satnogs/data/ and move all the data files there. Then move some of them back each time until you hit the error. By repeating this a few times you will be able to narrow down the data files and find the problematic one.
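If it helps, here is a rough way to script one round of that bisection (a sketch only; the path is the default mentioned above, adjust it for your station):

#!/usr/bin/env python3
# Park half of the data files in a subdirectory, let post_data run,
# then repeat on whichever half still triggers the error.
import shutil
from pathlib import Path

DATA_DIR = Path('/tmp/.satnogs/data')   # adjust to your station's data path
HOLD_DIR = DATA_DIR / 'hold'            # temporary parking directory
HOLD_DIR.mkdir(exist_ok=True)

files = sorted(p for p in DATA_DIR.iterdir() if p.is_file())
moved = files[len(files) // 2:]         # park the second half
for path in moved:
    shutil.move(str(path), str(HOLD_DIR / path.name))
print(f"Parked {len(moved)} files in {HOLD_DIR}; if the next upload run "
      "succeeds, the problematic file is among the parked ones.")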

If you are not very familiar, I suggest you enable Sentry logging by setting SENTRY_ENABLED to True in sudo satnogs-setup under Advanced → Debug and hitting Apply. This will allow the client to log any errors to the remote platform Sentry, which the developers of SatNOGS Client have access to. After the file is found in the logs, you can disable this option if you want.


@fredy, yes I have been using the gr_satellites add-on for SatNOGS for a few weeks now.

I had tried moving the older frame files into /data/.satnogs/data/tmp and then back to /data/.satnogs/data, but I did this with huge blocks of files at a time – so finding an errant frame file would have been hard. I will try your suggested approach and let you know the result.

Some of my frames are well over a day old – are they too old to be uploaded now (even if I can find the errant file(s) causing the issue)?

No problem at all.

I’ll let you know if we see any errors in Sentry.


Moving 100 frame files at a time seems to be working; about 4000 backlogged frames have uploaded in the last couple of hours. None have been deleted, so that’s good news.

Sentry has not been activated yet – sorry I was not clear on that. Once my current observation is complete, I will enable it in the satnogs-client.


Sentry has been enabled.


The file that causes the issue is data_4900171_2021-10-24T10-04-37_g0. But it may not be the only one, so I suggest you move it out and let the rest upload, to see if there are any others.


@fredy Nice find! I relocated it to a tmp/ dir and the remaining 500 frames are now uploading. Should be all caught up in moments.

The file was 1 byte.

root@satnogs:/data/satnogs/data/tmp# ls -l
total 4
-rw-rw-r-- 1 satnogs satnogs 1 Oct 24 13:40 data_4900171_2021-10-24T10-04-37_g0
root@satnogs:/data/satnogs/data/tmp# od data_4900171_2021-10-24T10-04-37_g0  -c
0000000   8
0000001

Thank you, much appreciated! And sorry I didn’t catch this myself. Now I know what to look for if this happens again.
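For anyone else hitting this, one way to look for such files (my own sketch; the path and the 10-byte threshold are just guesses for a typical station):

#!/usr/bin/env python3
# Flag suspiciously small demod data files before they jam the upload queue.
from pathlib import Path

DATA_DIR = Path('/tmp/.satnogs/data')   # adjust to your station's data path
THRESHOLD = 10                          # bytes; the culprit here was 1 byte

for path in sorted(DATA_DIR.glob('data_*')):
    size = path.stat().st_size
    if size < THRESHOLD:
        print(f"Suspicious: {path.name} ({size} bytes)")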

Cheers


Glad we figured it out!


Thank you very much @satcolintel5 for your extensive debug information (log, the actual file content, the file name and detailed description of the problem)! :ok_hand: :100:

Based on it, I was able to trace this issue back to this existing bug in satnogs-client:


I just wrote a fix for this issue, see satnogs-client#554.


Circling back on this – the slow upload of a lot of tiny frames.

Hours to upload ~3000 frames (BOBCAT-1, ~100 MB total). Sentry is still enabled if that helps.

Wondering if the upload workflow for high frame counts needs attention (per-frame transaction cost). Batch upload vs. one frame at a time?
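A rough latency model of why one-request-per-frame hurts, with assumed numbers (two round trips of overhead per HTTPS request, the ~160 ms ping measured earlier); this is a toy model, not how the client is actually implemented:

# Toy model: per-request overhead vs. payload time for tiny frames.
frames = 3000
frame_bytes = 300                    # typical frame size in this thread (60-512 bytes)
link_bps = 1.5e6                     # 1.5 Mbit/sec upstream
rtt = 0.16                           # ~160 ms ping to network.satnogs.org
overhead_per_request = 2 * rtt       # assumption: request/response plus an extra round trip

per_frame = frames * (overhead_per_request + frame_bytes * 8 / link_bps)
batched = overhead_per_request + frames * frame_bytes * 8 / link_bps
print(f"One request per frame: ~{per_frame / 60:.0f} min")   # ~16 min even in this toy model
print(f"One batched request:   ~{batched:.0f} s")            # a few seconds

The real pipeline clearly has even more per-file overhead than this, but the ratio is the point: for 60-512 byte frames the round trips dominate, not the bandwidth.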


We have similar experiences. We are following some data-burster satellites (UVSQ-SAT, GRIFEX, LEDSAT), and sometimes we have many thousands of frames (the record-breaker so far is UVSQ-SAT, with 7145 packets: Observation #5045128).

Uploading these piles takes tens of minutes, and if a new observation starts during this interval, then it (I mean the new observation of the next satellite) sometimes appears in the database as failed.


No errors in Sentry, so the issue is simply that this is a lot of frames.

The plan is to create a stream in order to upload the data (not the audio and waterfall) in real time, as they are demodulated. This will need changes in SatNOGS Client, SatNOGS Network and SatNOGS Router. For the latter there is progress, but not yet for Client and Network. My estimate for completion is around the end of January 2022, mainly due to limited developer resources, unless there is extra help on this.

Currently the check for marking an observation without any artifacts as failed runs ~30 min after the end of the observation. We can increase this value – is there any suggestion on how long this delay should be?
Just to be clear, the observation status changes whenever one of the artifacts is uploaded, even after this 30 min period.


@fredy + core SatNOGS team, much appreciated and thanks for the insight. Cheers