Since the beginning of our Network operations we have had a vetting system. Vetting of observations is crucial for our operations and provides extremely useful statistics for satellites and ground stations, which in turn give us great insights and guidance for future observations and help identify issues (and will also feed our future auto-scheduling algorithms).
Although the guidelines and processes have changed in the past (with the introduction of new states like “Failed”), the general concept has remained the same. You can see the outline and our vetting guidelines here: https://wiki.satnogs.org/Operation#Rating_observations
The recent growth of the network and the growing diversity of ground stations and satellites are forcing us to re-evaluate those processes and guidelines. Thoughts of introducing more crowd-sourcing around the process, and possibly AI, further complicate the discussion.
The main issue at hand (among various others) is the difference between the “Failed” and “Bad” states. Coincidentally, this is also the most debatable per-observation decision, as evident in various other threads (like this), and not without reason.
In theory the concept is (although admittedly not easily digestible in our guidelines) that observations should be marked “Failed” when there is a problem with the station (e.g. no artifacts returned, a problematic RF line, etc.), while “Bad” should be reserved for when the satellite is malfunctioning and thus we could not pick up a signal.
The problem is that in many situations it is really hard to tell what is happening, and deciding requires knowledge and research on the part of the vetting user. The vetter should know the capabilities of the station, know whether the satellite is performing well, and from that be able to deduce that it is a ground station issue. In some extreme cases it is even harder to tell, since only a handful of stations could determine the status of the satellite (e.g. the KickSat situation, where it was picked up by only 3 stations, one of them being the Dwingeloo Radio Telescope).
Although there are clear cases of “Failed” (like when no artifacts, waterfall, audio, etc. are returned), there is always some amount of uncertainty in “Bad” observations (it could be the station failing, or it could be the satellite).
Scale comes to the rescue?
In principle, since we operate at large scale (3k+ observations per day across more than 200 stations), erroneous vetting should not be a problem. We will always have some level of uncertainty, and that should be fine.
In theory we could adopt a much stricter vetting guideline (as suggested by @acinonyx and @fredy before) and mark as “Failed” only the absolutely obvious cases (like malformed waterfalls or missing artifacts). This could even be automated, since those cases are easily detectable. Everything else (except the auto-vetted good observations) would then be up to the users to vet as “Good” or “Bad” (and gradually an AI system could supplement this). That scenario removes any uncertainty between “Failed” and “Bad”, but it introduces a crucial issue with the statistics and the way we calculate them.
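To sketch what such automated “Failed” detection could look like, here is a minimal example in Python. The observation fields (`waterfall`, `audio`, `waterfall_size`) and the size threshold are assumptions for illustration, not the actual SatNOGS schema:

```python
# Hypothetical sketch: auto-vet only the unambiguous "Failed" cases,
# leaving everything else for human (or future AI) Good/Bad vetting.

MIN_WATERFALL_BYTES = 1024  # assumed threshold for a malformed/empty waterfall


def auto_vet(observation):
    """Return "failed" for obviously broken observations, else None
    (None meaning: needs Good/Bad vetting by a user)."""
    # Missing artifacts: the station returned nothing usable.
    if not observation.get("waterfall") and not observation.get("audio"):
        return "failed"
    # Malformed waterfall: file present but implausibly small.
    if observation.get("waterfall") and observation.get("waterfall_size", 0) < MIN_WATERFALL_BYTES:
        return "failed"
    return None  # ambiguous: could be station or satellite


print(auto_vet({"waterfall": None, "audio": None}))              # failed
print(auto_vet({"waterfall": "obs.png", "waterfall_size": 12}))  # failed
print(auto_vet({"waterfall": "obs.png", "waterfall_size": 80000}))  # None
```

The point of the sketch is that the auto-vetter only ever emits “Failed” or abstains; it never tries to distinguish “Good” from “Bad”, which is exactly where the uncertainty lives.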
At present, transmitter (and thus satellite) statistics are calculated based on the Good vs. Bad states of observations, while ground station statistics are calculated based on Good+Bad vs. Failed states. If we changed our policy to the scenario above, this would skew the transmitter statistics to unrealistic numbers.
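To make the skew concrete, here is a rough sketch of the two statistics as described above. The function names and the observation counts are purely illustrative, not the actual Network implementation:

```python
def transmitter_success_rate(good, bad):
    # Transmitter (and thus satellite) stats: Good vs. Bad only.
    return good / (good + bad)


def station_success_rate(good, bad, failed):
    # Ground station stats: (Good + Bad) vs. Failed.
    return (good + bad) / (good + bad + failed)


# Illustrative counts: 60 good, 10 bad, 30 failed observations.
good, bad, failed = 60, 10, 30
print(transmitter_success_rate(good, bad))  # ~0.857 under the current policy

# Under the stricter policy, most of today's "Failed" observations would
# be vetted "Bad" instead, deflating the transmitter statistics:
print(transmitter_success_rate(good, bad + failed))  # 0.6
```

Station-side failures (an unplugged antenna, a broken rotator) would suddenly count against the transmitter, which is the unrealistic skew in question.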
I am opening up this thread to gather feedback and ideas on the way forward, so we can collectively determine what policy makes sense to follow and implement its technical aspects in our Network. Please do chime in!