Standardization of frame formats for SatNOGS DB

They do, all the time :slight_smile: Just look around in the DB.

Thanks for your pointers @pierros. All this is very relevant to improving the usefulness of the DB. However, to try to get the discussion back on track, I would say that the goal here is:

For every burst of RF transmitted by a satellite, there should be exactly one way to transform that data into one or several segments of bytes (frames or packets) with the goal of storing those segments in the DB.

We know what the RF bursts of each of the satellites we are able to decode look like. Now we need to decide on the most appropriate transformation into “segments of bytes” for storage in the DB.

Other side-topics that are very important but I think could be treated in separate threads are:

  • How to identify different satellites given one of the packets
  • How to protect the DB against erroneous entries
  • What tag to use for satellites (i.e., NORAD, unique satellite ID, etc.)
  • Transmitters (what is a transmitter? does it need to be physically separate, or is it enough that it uses a different mode or setting? if so, what really constitutes a “mode”?)

In fact it seems to me that some of these questions are related to the metadata of DB entries, so we could open up a new thread for the DB metadata, while this thread could focus on the data itself.

Issue 317 in satnogs-db looks quite interesting. I already left lots of comments there that I believe are still open (there was no confirmation of agreement or rebuttal). Once all that is implemented, if it includes all the information that gr-satellites needs to choose and configure a matching decoder for each satellite and signal, then you’re right that there will be no need for SatYAML and it will most likely disappear.

So I think that it is a reasonable design driver that the proposal in issue 317 covers at least as much information as SatYAML currently does. It will also be necessary to prepare for the future. SatYAML is a “live format”, but the proposal in issue 317 will be much slower to evolve to a new version. I wonder how that plays with the fact that at any moment a satellite can be launched that uses a protocol which is different (totally or just slightly) from the rest we’ve seen so far. With SatYAML I can just extend or change the format, because it’s only used by gr-satellites and not intended to be backwards compatible (for instance, see what I did here).

Coming back to my original points, we are seeking input on the following:

A. Input/opinion about whether it is best to swap the endianness of the CSP header or not
B. Confirmation to agree on including the CRC-32C from CSP frames in the DB, or an opinion against. If this is agreed, then we’ll need to check with Andy about the cases in which SatNOGS drops this CRC-32C.
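To make point B concrete: keeping the CRC-32C lets any consumer verify a frame later, and a verified CRC can always be stripped. The sketch below is a plain bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). It assumes the CRC is appended big-endian and covers every byte before it; that coverage is an assumption, since CSP versions differ on whether the header is included. The function names are illustrative only.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: reflected poly 0x82F63B78, init and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def has_trailing_crc32c(frame: bytes) -> bool:
    """True if the last 4 bytes are the big-endian CRC-32C of everything before them.

    Assumption: the CRC covers the whole preceding frame, header included.
    """
    if len(frame) < 4:
        return False
    body, trailer = frame[:-4], frame[-4:]
    return crc32c(body) == int.from_bytes(trailer, "big")


def strip_crc32c(frame: bytes) -> bytes:
    """Drop a trailing CRC-32C after verifying it; raise if it does not check."""
    if not has_trailing_crc32c(frame):
        raise ValueError("trailing CRC-32C missing or invalid")
    return frame[:-4]
```

As a sanity check, `crc32c(b"123456789")` gives the standard CRC-32C check value 0xE3069283.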
C. Input about what to store for AAUSAT-4, especially regarding whether the FSM should be included. My opinion here is that we should only include the useful information (user bytes) of the Reed-Solomon codeword (this includes a 16-bit “frame length field” followed by a CSP packet, so I think it’s not exactly what @dernasherbrezon proposed).
D. Input about what to store for MOBITEX frames, and confirmation about whether SatNOGS currently implements a MOBITEX decoder. My choice would be to store exactly the output of the gr-tnc_nx decoder, mainly because that code is not my own and I’m too lazy to change it, but that’s not a good reason by any means.
E. Try to identify if there are any other cases in which different applications are formatting frames differently.

Maybe they are different because we aren’t decoding in a similar way?
And that is exactly what we are trying to achieve with this subject.

So please, those of you building encoders and decoders, share your thoughts.

@EA4GPZ, thank you VERY much for bringing this topic up. Below are some notes I made while reading this thread. Some of the earlier questions have been answered later on in the “read through”, but I left them in for completeness.

Why are frames for AX.25 not including the CRC? It makes sense that the HDLC flag is not included.

What do you mean by swapping the CSP header? I think the DB should not flip “random” bytes in the packet. The decoders are satellite specific, and hence the frames should just be stored as is. :S I really think the decoder should handle any quirks, not the demodulator. In my mind the demodulator is the part that takes care of converting a continuous byte stream into headers, data and checksums. For AAUSAT4 this means that the demodulator performs the FEC decoding, that is, RS and convolutional decoding. This is essentially the same point @dernasherbrezon made.

Re. C: I just made my AAUSAT4 SatNOGS decoder match what was already in the DB (submitted via SIDS?), DK3WN’s decoder I guess. I never really looked into UZ7HO’s soundmodem, but I assume it just demodulates and makes a bit stream? I agree that there is no reason to include the FSM byte and the length, as those are just used by the decoder to figure out what to run the FEC decoding on. I would like to see the HMAC included, but as most of the “original” DB entries did not have it, I stripped them as well.

I am an operator of AAUSAT4. How does the “generic” GMSK demodulator handle different baud rates? In SatNOGS you sort of specify this as a radio config, but for me it would be nicer if it could support multiple rates by default. Sometimes we test higher rates for a while, until the radio performs a fallback to a lower rate. I don’t want to lose out on 4k2, for example.

@DL4PD has a valid point about the “outer identification” for frames. I don’t understand @dernasherbrezon’s reply to that… where he mentions it will be “naturally solved in the future”. EA4GPZ’s point about the ID not being FEC protected is indeed valid. For AAUSAT4, the callsign is used as an ID, but also as a sync word, hence there could be bit errors here. “Bandwidth is precious”, as EA4GPZ puts it. :slight_smile:

I meant that identification is currently not a pressing matter. That’s why CSP saves bytes on identification. But later on (in the future) we might face a situation where multiple satellites transmit on the same frequency. And that would require changes in the protocols, not in the database. Which, as @EA4GPZ pointed out, is not related to the current topic.

Hi Nick, thanks for your reply!

Why are frames for AX.25 not including the CRC? It makes sense that the HDLC flag is not included

There are three reasons. First, AX.25 frames over a KISS link are sent without a CRC, as it is added by the TNC. Second, tradition, mainly motivated by the first reason, since the first existing systems for telemetry collection in a database mirrored what KISS did. Third, once the frame has been checked for correctness (and with AX.25 this is mandatory, to prevent false decodes), the CRC is useless and can be recomputed from the data.
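The third reason can be illustrated directly: the AX.25 frame check sequence is a CRC-16/X-25 over the frame contents (without the HDLC flags), so a consumer that wants it back can recompute it from the stored bytes. A minimal sketch, assuming the usual low-byte-first placement of the FCS at the end of the frame; the function names are illustrative.

```python
def crc16_x25(data: bytes) -> int:
    """CRC-16/X-25 (AX.25/HDLC FCS): reflected poly 0x8408, init and final XOR 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x8408
            else:
                crc >>= 1
    return crc ^ 0xFFFF


def append_fcs(frame: bytes) -> bytes:
    """Recompute the FCS of a frame stored without it and append it, low byte first."""
    return frame + crc16_x25(frame).to_bytes(2, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Check a frame whose last two bytes are its FCS (low byte first)."""
    if len(frame_with_fcs) < 2:
        return False
    body, fcs = frame_with_fcs[:-2], frame_with_fcs[-2:]
    return crc16_x25(body) == int.from_bytes(fcs, "little")
```

For reference, `crc16_x25(b"123456789")` returns the standard check value 0x906E.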

What do you mean by swapping CSP header?

Swapping the endianness of the 32-bit header, as done here.
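To show what this swap does in practice, here is a minimal Python sketch. It assumes the CSP 1.x header layout (2-bit priority, 5-bit source and destination addresses, 6-bit destination and source ports, then a flags byte whose low bits are HMAC, XTEA, RDP and CRC); the function and type names are illustrative, not taken from gr-satellites or libcsp. Only the first four bytes change, which is why a swapped header read in the wrong byte order yields plausible-looking but wrong addresses and ports.

```python
from typing import NamedTuple


class CspHeader(NamedTuple):
    priority: int
    source: int
    destination: int
    dest_port: int
    source_port: int
    hmac: bool
    xtea: bool
    rdp: bool
    crc: bool


def swap_csp_header(frame: bytes) -> bytes:
    """Reverse the byte order of the leading 32-bit CSP header, leaving the payload intact."""
    if len(frame) < 4:
        raise ValueError("frame is shorter than a CSP header")
    return frame[3::-1] + frame[4:]


def parse_csp_header(frame: bytes, byteorder: str = "big") -> CspHeader:
    """Decode the (assumed CSP 1.x) header fields, reading the first 4 bytes in `byteorder`."""
    word = int.from_bytes(frame[:4], byteorder)
    return CspHeader(
        priority=(word >> 30) & 0x3,
        source=(word >> 25) & 0x1F,
        destination=(word >> 20) & 0x1F,
        dest_port=(word >> 14) & 0x3F,
        source_port=(word >> 8) & 0x3F,
        hmac=bool(word & 0x08),
        xtea=bool(word & 0x04),
        rdp=bool(word & 0x02),
        crc=bool(word & 0x01),
    )
```

Whichever convention is agreed, a consumer only needs to know it once per satellite: parsing a swapped frame with `byteorder="little"` recovers the same fields as parsing the unswapped frame with `"big"`.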

I think the DB should not flip “random” bytes in the packet.

I agree with this. The DB should store frames unaltered, as they are received from decoders via the SiDS protocol. However, the question is what decoders should send to the DB.

Can you please try to define “decoder” and “demodulator” in your message more precisely? I think they might have different meanings to different people. Here we are talking about the following pieces of software, all of which may produce frames/packets that can get sent to the DB:

  • SatNOGS (the observations that appear on SatNOGS network)
  • The various soundmodems from UZ7HO
  • gr-satellites
  • jradio

Are these “decoders”, “demodulators”, or do they include both? In gr-satellites I have a precise definition for “demodulator”: a block that gets a stream of IQ samples representing an RF waveform and outputs a stream of soft symbols, without any consideration for frame boundaries. I don’t have a precise definition for “decoder”, but loosely I call the whole software that gets IQ samples and outputs frames a “decoder”. This is quite different from your definition, since I consider all the frame boundary detection, FEC decoding and CRC checking part of another decoding step I call the “deframer”.
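The terminology above can be written down as function types, which makes the boundary between the stages explicit. This is a hypothetical sketch, not the gr-satellites API: the toy stages assume one sample per symbol and do no sync, FEC or CRC checking; they only show what data flows between a demodulator and a deframer.

```python
from typing import Callable, List, Sequence

# Illustrative types for the terminology (not the gr-satellites API):
#   demodulator: IQ samples -> soft symbols, no notion of frame boundaries
#   deframer:    soft symbols -> frames (sync, FEC decoding, CRC checking)
#   decoder:     the whole chain, IQ samples -> frames
Demodulator = Callable[[Sequence[complex]], List[float]]
Deframer = Callable[[Sequence[float]], List[bytes]]
Decoder = Callable[[Sequence[complex]], List[bytes]]


def make_decoder(demodulate: Demodulator, deframe: Deframer) -> Decoder:
    """Compose a demodulator and a deframer into a decoder."""
    def decode(iq: Sequence[complex]) -> List[bytes]:
        return deframe(demodulate(iq))
    return decode


def toy_demodulator(iq: Sequence[complex]) -> List[float]:
    # Soft symbol = real part of each sample (assumes one sample per symbol).
    return [s.real for s in iq]


def toy_deframer(soft: Sequence[float]) -> List[bytes]:
    # Hard decisions, then pack bits MSB first; no sync, FEC or CRC here.
    bits = [1 if s > 0 else 0 for s in soft]
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return [bytes(out)] if out else []
```

With these toy stages, `make_decoder(toy_demodulator, toy_deframer)` applied to the symbols of `10100101` yields `[b"\xa5"]`.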

I never really looked into UZ7HO’s soundmodem, but I assume it just demodulates and makes a bit stream?

No. UZ7HO’s soundmodem outputs packets, already FEC decoded and CRC checked. Its functionality is very similar to SatNOGS or gr-satellites.

I would like to see the HMAC included, but as most of the “original” DB entries did not have it, I stripped them as well.

Is the HMAC part of the Reed-Solomon codeword user data? In gr-satellites the output of the AAUSAT4 deframer is the output of the Reed-Solomon decoder. This is what gets sent to the DB currently. Nothing is stripped. I agree that having the HMAC is useful for you satellite operators, who have the cryptographic key to check it, so I also think it should be included. Should I change something to include the HMAC in gr-satellites?

how does the “generic” GMSK demodulator handle different baud rates?

I am at a loss here. What is the “generic GMSK demodulator”? gr-satellites has a generic FSK demodulator, but it always needs to be instantiated with a particular baudrate. For satellites that can transmit using several baudrates, gr-satellites instantiates several copies of the complete decoder chain, using a different baudrate in each of the FSK demodulators.
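The multi-baudrate approach described above amounts to fanning the same samples out to one complete decoder chain per baudrate and merging the frames. A minimal sequential sketch follows; in gr-satellites the chains run in parallel inside a flowgraph, and `decode_at` here is a hypothetical stand-in for one configured chain, not a real function.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical: one complete decoder chain configured for a given baudrate.
DecodeAt = Callable[[Sequence[complex], int], List[bytes]]


def decode_all_rates(iq: Sequence[complex], baudrates: Iterable[int],
                     decode_at: DecodeAt) -> List[bytes]:
    """Run one decoder chain per baudrate over the same samples and merge the frames.

    Duplicates are dropped (first occurrence kept, order preserved) in case
    more than one chain yields the same frame.
    """
    seen = set()
    frames: List[bytes] = []
    for rate in baudrates:
        for frame in decode_at(iq, rate):
            if frame not in seen:
                seen.add(frame)
                frames.append(frame)
    return frames
```

A burst normally decodes only in the chain matching its baudrate, so the other chains simply return nothing; the cost is the CPU time of running every chain on every sample.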

With UZ7HO soundmodem I think you must choose the baudrate together with the decoder type (the decoder type is somehow a combination of baudrate + protocol details). I think you can have two different decoders running in parallel.

With SatNOGS you must choose the decoder type, including baudrate, for a given observation. I don’t think it supports simultaneous decoding by trying several baudrates.

Yes, I sometimes call all this demodulator or just receiver. I agree with your terms though.

From the packets I saw decoded by SA2KNG’s setup, it looks like the HMAC is already included, but I think the header is flipped, which is unexpected and the opposite of what I have in my AAUSAT4 receiver for SatNOGS and what’s already in the DB. I only looked at the raw bytes to conclude this; I didn’t decode them to double check.

For reference:
SA2KNG https://network.satnogs.org/observations/3301580/
vs
OZ3RF (me) https://network.satnogs.org/observations/3279667/

Also this issue has been created to track the decision here for gr-satellites:


Thanks for the responses to the other questions.

I just saw the issue that Jan opened on gr-satellites. The fact that he ran the same SatNOGS observation through gr-satellites has been extremely helpful to identify the differences.

Besides stripping vs. preserving the HMAC, from your last comment I now see that there is another difference: gr-satellites has an additional 00 56 at the beginning of the frame (I think this is what made you think the header is flipped; if you look at the next four bytes, you’ll see the header is not flipped).

This additional 00 56 field is in fact the “length” field, as shown on Jan’s webpage. As I said above, gr-satellites sends to the DB the user data inside the Reed-Solomon codeword, and this length field is part of the Reed-Solomon codeword. Should gr-satellites strip this field? As I replied to Jan in the issue, I leave the decision about the standard frame format for AAUSAT-4 in the DB to you, @nickoe, since it makes a lot of sense that satellite operators should be able to define these things.
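If the agreed AAUSAT-4 format leaves out this field, stripping it is a small step for a decoder (or a consumer converting old entries). The sketch below assumes, without confirmation from AAUSAT-4 documentation, that the field is big-endian and counts the packet bytes that follow, so `00 56` would mean 86 bytes; anything beyond that count is treated as padding. The function name is hypothetical.

```python
from typing import Tuple


def split_length_field(rs_user_data: bytes) -> Tuple[int, bytes]:
    """Split Reed-Solomon user data into (length, CSP packet).

    Assumption: the first two bytes are a big-endian length counting the
    packet bytes that follow; bytes beyond that length are discarded.
    """
    if len(rs_user_data) < 2:
        raise ValueError("too short to contain a length field")
    length = int.from_bytes(rs_user_data[:2], "big")
    packet = rs_user_data[2:]
    if length > len(packet):
        raise ValueError(f"length field {length} exceeds {len(packet)} available bytes")
    return length, packet[:length]
```

If the field turns out to count something else (for example, including itself), only the slicing in the last line needs to change.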

By the way: the figure on Jan’s page was previously hosted on AAUSAT-4’s webpage, but that webpage is no longer available. I didn’t know that Jan had a copy of this figure on his webpage. The figure is very useful as documentation.

I suppose not. But with complex (IQ) data of sufficient bandwidth, gr-satellites with SatYAML should be able to run all of them at the same time. I just watched your YouTube video TceMth67r9c.

My point is still valid: your proposed future is now, and we see several satellites transmitting on the same frequency. Some of them are just crossing paths…

Really great to see that we are working towards a solution for AAUSAT, one down.

Thank you all for the effort. Now let’s get back to the original question and see if we can work towards consensus and overcome the challenges that we will encounter.

Let me give some insight on what we are trying to achieve with the new artifacts format and the unique satellite IDs we plan to roll out, with respect to the telemetry data.

Actually, it’s all about data modeling. We need a comprehensive model for telemetry data that will allow us to describe all these modulation and encoding/decoding quirks. In other words, it should be possible for people to submit data to the DB indicating:

  • the endianness of the CSP header
  • possible frame encapsulation hierarchy, so that decoders can be chained
  • whether CRC, HMAC, etc. is included or not
  • etc.

All this information will be part of the artifact metadata and will be available to the DB and people who want to make further analysis.

We plan to tackle the “missing identification” issue of CSP and other similar protocols by introducing the concepts of “intention of satellites to be observed” and “unique satellite IDs”. On each telemetry data artifact, the user (or app) will be able to attach one or more SatNOGS satellite IDs which will be the user’s intention of the object(s) they attempted to observe. Once the data is submitted to DB, a process of actual association to an ID will be triggered based on statistics or ML, returning the confidence level of the system for this association. Cases in which identification information is completely missing will get at least a hint based on the “intention” submitted.

Of course, SIDS cannot support all the above. So a new format, based on HDF5 files, will be used. SIDS interface will continue to operate, but data still submitted to it will have to be isolated.

I think I should also make clear what a satellite transmitter is in SatNOGS. A transmitter is not the radio hardware of the satellite but rather the “transmission”. This means that if the same radio is used to transmit in two different modes or frequencies, these are two separate transmitters in SatNOGS. We also plan to move away from modes for transmitters and define all transmitter parameters using a schema.

Hi @Acinonyx, thanks for your explanation about the new artifacts format.

After reading your comment, I’m worried that the new system that you are proposing might place an unnecessary burden and complexity on decoder software that wants to submit data to SatNOGS DB. When I think about it, the job of a decoder is to get RF samples at its input, do some DSP and peel some of the outermost protocol layers, and obtain a sequence of bytes that we may call a “frame” (and with complex protocols it is a good question how many protocol layers the decoder should peel to obtain this “frame”).

When this frame is obtained, I think it is much simpler and more efficient for the decoder to send it as a sequence of bytes to the DB for storage, as is currently done with SIDS.

Think about the various metadata you’ve mentioned in your comment: CSP header endianness, encapsulation hierarchy, presence of CRC, HMAC, etc. Once a decoder has obtained this “frame”, by what reasonable means would the decoder be able to know all this metadata? This isn’t marked explicitly in the “frame” bytes and often does not depend much on decisions that the decoder has taken in the decoding process, but rather on the format of the packets that are being transmitted by the satellite. With satellites using the Amateur Satellite service we often face incomplete or nonexistent documentation from the satellite designers, so it is not at all simple to know all these details for decoded frames.

The only reasonable way I can think of to generate this metadata at the decoder is to implement in the decoder the knowledge about what each satellite is transmitting, in the most accurate way possible, given the lack of documentation. This is probably going to cause more incompatibilities in the long run between different decoder software packages. With SIDS, we just need to agree on how to store frames as sequences of bytes, and we are already having trouble with slightly different formats. With this artifact metadata we would need to agree on how to unambiguously interpret and assign all the metadata fields.

Additionally, I think that your idea for the metadata is to have a full description of the protocol stack and variations included in that “frame”, which as I said is a difficult task, due to the usual lack of documentation. I think that this full knowledge of the protocol stack should be the duty of the consumer of the frames (whichever application reads frames from the DB, such as the SatNOGS dashboards or any other study of the values of the telemetry fields). Decoders should just “capture” over-the-air frames to be stored in a DB.

In any case, since apparently the SIDS interface is going to continue to exist, we had better agree on the format of frames to be sent through the SIDS interface, which was the original topic of this thread.

I think I wasn’t clear enough. The metadata shall contain the parameters the decoder used, not the full protocol of the transmission. The decoder knows its quirks and should inform the DB about them. So, in the case of the CSP header, if a decoder assumes that it’s big endian and decides to flip the bytes, it should inform the DB that it has done so. Same for the CRC: if the decoder strips the CRC, it should indicate that the data is without CRC. If it peels any layers, it should tell which layers it has peeled.
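As an illustration of this proposal, per-frame metadata could record what the decoder did, and a consumer would undo those steps to bring all frames to one convention. The sketch below is hypothetical: the field names are invented for the example and are not the planned SatNOGS artifact schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecoderQuirks:
    """Hypothetical per-frame metadata describing what a decoder did to the bytes."""
    csp_header_swapped: bool = False
    crc_stripped: bool = False
    peeled_layers: List[str] = field(default_factory=list)


def restore_csp_header(frame: bytes, quirks: DecoderQuirks) -> bytes:
    """Consumer side: undo a reported CSP header swap so all frames share one byte order."""
    if quirks.csp_header_swapped:
        return frame[3::-1] + frame[4:]
    return frame
```

Each metadata field is only useful if consumers implement the matching undo step, as `restore_csp_header` does for the header swap.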

If all applications submitting frames to the DB can agree on a common definition of the frame format for each satellite, then there is no need to indicate any specific details about the decoding process (e.g. CSP endianness…) in metadata. [There are other use cases for metadata though, e.g. SNR, bits-corrected-by-fec, …]

But there will always be cases where different applications behave differently (as we have just seen with AAUSAT-4, fixed according to gr-satellites#208 (comment), thanks to all the people involved!).
@Acinonyx proposes to document all decoder quirks in metadata, but who decides what this involves? Once all applications have agreed on this, they could just as well have agreed on a common format (as we have just seen demonstrated successfully). IMO we should keep relying on mutual agreement on the frame format of each satellite, as done up to now.

But to be able to understand frames in the DB that were produced when not all applications agreed on the same frame format, and without having all decoder “quirks” documented alongside the frames in metadata, the decoder application name and version are needed.

For the upcoming artifact format, the application name & version can be easily transmitted as part of the metadata (and will be stored as such by DB). For SiDS there already exists a different solution though, as described by @EA4GPZ earlier:

To prevent “garbage frames” in the DB via SiDS, IMHO either this rejection of frames should be implemented, or this version field value should be stored alongside each frame in the DB (so “garbage frames” are not lost, but the consumer gets the ability to detect different applications and versions and deal with their frames).
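Storing the application name and version with each frame would let a consumer keep a small table of known deviations and convert affected frames to the agreed format. A minimal sketch, assuming the name and version are available as strings next to each frame; the registry entry is a placeholder, not a real application or release.

```python
from typing import Callable, Dict, Tuple

Normalizer = Callable[[bytes], bytes]

# Hypothetical registry: (application, version) -> function that converts that
# application's frames into the agreed common format. The entry below is a
# placeholder, not a real application or release.
NORMALIZERS: Dict[Tuple[str, str], Normalizer] = {
    ("example-decoder", "1.0"): lambda frame: frame[2:],  # e.g. drop an extra 2-byte field
}


def normalize(frame: bytes, application: str, version: str) -> bytes:
    """Return the frame in the common format; frames without a known rule pass through unchanged."""
    fix = NORMALIZERS.get((application, version))
    return fix(frame) if fix is not None else frame
```

The table only grows when a deviation is discovered, so in the common case, where all applications follow the agreed format, it stays empty.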

[OT: A third option is to deprecate SiDS and rely on the new artifact + metadata (application name & version, not decoder details) solution not only for SatNOGS stations but for all submitting applications. For this solution to work, this new transport must be documented and advertised to third parties. Let’s discuss this in a different thread in the future though]

That means that the DB should not accept/process any data from a newer version unless the format somehow becomes known to the DB. Otherwise, it wouldn’t be able to make heads or tails of it.

@kerel’s comment is right in line with my thoughts. To me, “document all decoder quirks in metadata” seems to shift the problem to a different place. Instead of having to agree on a common format, we need to agree on a common metadata format to describe format variations.

Supporting different applications sending different frame formats to the DB by using metadata might seem like a good idea, because in principle, depending on the decoder, it might make more sense to use one format or another. But this not only requires that the metadata is an accurate description of the format that was used, but also requires applications that consume DB entries to implement all the possible format quirks according to the metadata.

In my opinion it is much simpler to agree on a common format for each satellite and have all applications that produce frames for the DB and all applications that consume frames from the DB use that format. This is not easy, as it requires good coordination and understanding of the “corner cases” of each format, but I think it can be done. A successful example is what we’ve done with AAUSAT-4. In the course of a few days we have coordinated with @nickoe, decided the common format we want to use, and modified gr-satellites, UZ7HO Soundmodem, and Mike’s Telemetry Parser to use this format (the SatNOGS decoder is being modified also). There will always be bugs where some application didn’t implement the format correctly in some particular version, but hopefully this can be detected soon enough and its impact limited by the use of the SIDS version field.

The agreement should be on a common data model; not on how decoders should decode. There is a lot more information that can be useful besides the end result of the processed data. We should solve this problem by removing restrictions in a future-proof way. Unfortunately, current SIDS is very restrictive on that matter. We need a more flexible format that can hold “objects” and has a hierarchical structure.

As long as there is SiDS and no other transport documented and advertised to third parties, we must also agree on how decoders should decode.

And even with a new artifacts-based transport, there must be an agreement between decoder applications and the consumers on the expected content of the artifacts. Due to the flexibility added by additional fields, decoders are allowed to deviate from the standard frame format, but I expect there to always be some form of agreement.

After reading the reply from @Acinonyx I immediately had to think about:

Occam’s Razor

In simpler language, Occam’s razor states that the simplest explanation is preferable to one that is more complex. Simple theories are easier to verify. Simple solutions are easier to execute.

By introducing even more complexity, the chance that it is going to be successful will decrease.

But apart from that, please create a separate topic where this can be discussed, so that in the meantime we can move towards a common frame format that could ultimately be incorporated into a data model.