After many long discussions, we are almost ready to start implementing the (SatNOGS) Satellite ID. This is a quick FAQ and an overview of how we are going to proceed. Any feedback or questions are more than welcome.
Why another ID, and why not use the NORAD ID?
We need an ID that will allow us to uniquely identify satellites and to uniquely relate Artifacts to them. NORAD ID can not be used, as a satellite can have 0, 1 or more than one possible NORAD IDs, which makes it unsuitable as a unique identifier.
What is an Artifact?
For those not familiar with the term, which has come into use lately, an artifact is the combination of an observation’s result and its metadata. We are still looking for the best way to store artifacts in DB, but we are currently experimenting with HDF5 for transferring artifacts from clients to DB. As for the metadata, the Metasat project is in progress. Metasat focuses on describing the structure and the fields of the metadata related to space missions.
Are we going to change SiDS, or how will SiDS change?
The DB endpoint for Artifacts will not be compatible with SiDS. The SiDS endpoint will remain until it is fully deprecated. The goal is to move to better protocols that will allow us to keep observation results in a better and more scientific way, and implementing the Satellite ID is one step towards that.
Who is going to create these IDs?
There will be two ways to create a Satellite ID:
Creating a Satellite object in DB
Receiving an Artifact with a new Satellite ID
Will this be a UUID?
Technically yes; practically no, due to the nature of DB and how we are going to accept Artifacts. It will be possible to merge or split “Satellite” objects, as there have been cases that demand this and there will be more.
How would clients that send Artifacts be able to generate an ID?
For generating these IDs, we are going to use a library called Nano ID, which is available in several programming languages, so any client will be able to generate Satellite IDs with it. Note: generating an ID on a client will only be necessary when there isn’t one available in DB. Developers and users of such clients will also be encouraged to create satellite entries directly in DB instead of going the “Sending Artifacts” way.
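To make the client side concrete, here is a minimal sketch of what generating such an ID amounts to. It is not the Nano ID library itself: it is a standard-library stand-in that draws 21 characters from the same 64-symbol URL-safe set (A-Z, a-z, 0-9, `_` and `-`) that Nano ID uses by default. The function name `generate_satellite_id` is made up for this example; a real client would presumably call the Nano ID port for its language.

```python
import secrets
import string

# Same 64-symbol set as Nano ID's default alphabet (the order differs,
# which does not matter for randomly drawn IDs).
ALPHABET = string.ascii_letters + string.digits + "_-"
DEFAULT_SIZE = 21  # Nano ID's default length


def generate_satellite_id(size: int = DEFAULT_SIZE) -> str:
    """Return `size` characters drawn uniformly at random from ALPHABET.

    Hypothetical helper: a stand-in for a Nano ID call, using a
    cryptographically secure random source.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(size))


if __name__ == "__main__":
    print(generate_satellite_id())
```

Since 64 is a power of two, each character carries exactly 6 bits, so a 21-character ID has 126 bits of randomness.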
How will the producer of this artifact generate a valid ID? Does the process rely on specific attributes? How do we avoid x stations producing x UUIDs for the same tracked object, making the merging process a PITA?
Not sure if this answers your question… Nano ID ensures that the ID will be unique enough to make collisions extremely unlikely. But even on that very rare occasion we will have the split mechanism.
Unfortunately you can not, just as right now you can not avoid stations generating random NORAD IDs. Merging would be a simple action. To be honest, I don’t expect many clients/stations to generate their own IDs; usually it will be easier and more convenient to use an existing one or create a new one in DB.
However, if we end up in a strange situation where independent stations generate many IDs for their artifacts, then we can create an extra toolset that will allow us to recognize duplicates, for example by checking transmitters, NORAD IDs, TLEs, demodulated or decoded data etc.
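As a rough illustration of what such a duplicate-spotting tool could look like, the sketch below flags pairs of entries that share a possible NORAD ID or have a downlink frequency within a small tolerance of each other. Everything here is assumed for the example: the dictionary keys (`sat_id`, `norad_ids`, `downlinks_hz`), the 5 kHz tolerance, and the sample entries are not part of any SatNOGS DB schema, and real matching would need TLEs and decoded data as well.

```python
from itertools import combinations


def likely_duplicates(entries, freq_tolerance_hz=5_000):
    """Return (sat_id, sat_id) pairs that look like the same satellite.

    Two entries are flagged when they share a possible NORAD ID or when
    any of their downlink frequencies are within freq_tolerance_hz.
    """
    pairs = []
    for a, b in combinations(entries, 2):
        shared_norad = set(a["norad_ids"]) & set(b["norad_ids"])
        close_freq = any(
            abs(fa - fb) <= freq_tolerance_hz
            for fa in a["downlinks_hz"]
            for fb in b["downlinks_hz"]
        )
        if shared_norad or close_freq:
            pairs.append((a["sat_id"], b["sat_id"]))
    return pairs


# Hypothetical entries, made up for this example.
ENTRIES = [
    {"sat_id": "AAAA-0001", "norad_ids": [99001, 99002], "downlinks_hz": [437_200_000]},
    {"sat_id": "BBBB-0002", "norad_ids": [99002], "downlinks_hz": [145_900_000]},
    {"sat_id": "CCCC-0003", "norad_ids": [], "downlinks_hz": [437_201_500]},
    {"sat_id": "DDDD-0004", "norad_ids": [], "downlinks_hz": [401_500_000]},
]

if __name__ == "__main__":
    print(likely_duplicates(ENTRIES))
```

The output of such a tool would only be a list of candidates for a human to review before merging.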
A User Story I’ve been thinking about for a few days:
I discovered a previously unknown satellite which is transmitting in an amateur band, is not tracked by NORAD and thus does not have a NORAD ID. I was able to reverse engineer the protocol and decoded some frames. Now I want to upload those frames (& artifacts for doppler analysis) to SatNOGS.
I generate a new SatNOGS Satellite ID using Nano ID and upload the frames & artifacts.
SatNOGS DB accepts this data and creates a new satellite.
Now a second observer receives frames from this satellite and wants to upload those to DB. She has to browse the SatNOGS DB Web UI for this satellite and copy the SatNOGS Satellite ID into her software/client.
How to avoid manually handling the Nano ID (21 char string) in this case? The satellite doesn’t have a unique name (maybe it is dubbed “Mystery Sat”).
Currently (using SiDS) our temporary assigned “NORAD ID” would be used which is much easier to work with (but which is only valid for a limited time).
Let’s split the example above into different use cases:
The observer knows about the mystery satellite, so the intention is to observe this satellite. In this case the observer’s client/software should have requested the satellites from DB, so that the observer is able to choose the mystery one from a list of satellites (with their names or other properties). Artifacts from the observations will have the right ID, as the selected satellite already has an ID.
The observer doesn’t know about the mystery satellite but receives it. Then there are two options:
a) The observer searches DB for the mystery satellite, finds it and uses its ID to upload the observation’s artifacts.
b) The observer generates a new ID (satellite entry) and uploads the artifacts to DB. When we realize in DB that the two satellites are the same one, we associate/merge the two entries. In this case both observers will be able to continue using the IDs they initially used, but in DB the entries will be associated and count as one satellite.
@kerel let me know if I missed something in the cases above.
The observer’s software might be some non-interactive CLI tool, so I assume ultimately the user will have to manually handle the Satellite ID. This is a UX issue when the Satellite ID is a long alphanumeric string, but not a show-stopper.
The proposed support of truncated Satellite IDs on UI would fix this issue.
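For illustration, truncated IDs could be resolved the way short commit hashes are: accept any prefix as long as exactly one known ID starts with it. This is only a sketch of that idea; the function name `resolve_truncated_id` and the sample IDs are invented for the example and do not describe how SatNOGS DB does or will do it.

```python
from collections.abc import Iterable


def resolve_truncated_id(prefix: str, known_ids: Iterable[str]) -> str:
    """Return the single known ID that starts with `prefix`.

    Raises LookupError if no ID matches or if more than one ID matches.
    """
    matches = [sat_id for sat_id in known_ids if sat_id.startswith(prefix)]
    if not matches:
        raise LookupError(f"no Satellite ID starts with {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"{prefix!r} is ambiguous: {len(matches)} matches")
    return matches[0]


# Hypothetical 21-character IDs, made up for this example.
KNOWN_IDS = [
    "V1StGXR8_Z5jdHi6B-myT",
    "V1Sx9pQ2mLk07cAeRt_Yu",
    "b7Kq2-Zp0aWm4XcNf83Lr",
]

if __name__ == "__main__":
    print(resolve_truncated_id("b7K", KNOWN_IDS))
    print(resolve_truncated_id("V1St", KNOWN_IDS))
```

With these sample IDs, `V1S` is ambiguous and has to be extended by one more character, which is the same behaviour a CLI user would see.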
Having an ID which is “memorizable” would be a nice feature. Inspired by car license plates, which are somewhat possible to memorize, I would suggest a few improvements:
1. Split into fixed size segments
2. Use a limited character set per segment
3. Keep it truncate-able
4. Make sure it is extendable to increase the size in the future in a backwards compatible way, if required
Some facts:
Standard Nano ID size/charset is equivalent to 126 bits (64^21 = 2^126).
IPv6 ULAs are 120 bits.
I am setting a target of a 1% chance of collision on ~10 billion generated IDs. This is close to 72 bits in size.
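The 72-bit figure can be checked with the usual birthday approximation, p ≈ n² / (2N), where n is the number of generated IDs and N the size of the ID space. The sketch below only redoes that arithmetic; the function names are made up for the example.

```python
import math


def bits_for_collision_target(n_ids: float, p_collision: float) -> float:
    """Bits of ID space needed so that n_ids random IDs collide with
    probability of about p_collision (birthday approximation)."""
    space = n_ids ** 2 / (2 * p_collision)
    return math.log2(space)


def collision_probability(n_ids: float, bits: float) -> float:
    """Approximate probability of at least one collision among n_ids
    random IDs drawn from a space of 2**bits values."""
    return 1 - math.exp(-(n_ids ** 2) / (2 * 2.0 ** bits))


if __name__ == "__main__":
    # 10 billion IDs at a 1% collision target.
    print(round(bits_for_collision_target(1e10, 0.01), 1))  # 72.1
    # Standard Nano ID: 21 characters from a 64-symbol alphabet.
    print(math.log2(64 ** 21))  # 126.0
```

Going the other way, `collision_probability(1e10, 72)` comes out at roughly 1%, which matches the target above.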
With the full Nano ID charset, the equivalent number of characters for 72 bits is 12. That could be split into 4 segments of 3 characters each to satisfy (1). It also satisfies (3) and (4) easily. But (2) is a problem, since the large (64) character set of uppercase, lowercase, numbers and two symbols makes it very difficult to memorize, even truncated.
What I’m proposing is this form: CCCC-NNNN-NNNN-NNNN-NNNN
The above form can be truncated into its segments. E.g. JAHK-2812-9284-1248-4812 can be truncated to JAHK-2812 (which is “memorizable”, as a typical license plate is equivalent to ~32 bits) or to JAHK-2812-9284 if a collision occurs.
Also, if at some point we get close to collisions, the size can be increased to 85 bits by adding an additional numeric segment, while the existing 72-bit IDs can be converted to 85-bit IDs by adding a -0000 segment at the end.
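A small sketch of this format, reading C as an uppercase letter A-Z and N as a digit (that reading is my assumption, based on the JAHK-2812 example). It generates an ID, truncates it to a number of segments, extends it with a -0000 segment, and computes the size in bits so the ~32, ~72 and ~85 bit figures can be verified. All function names are invented for the example.

```python
import math
import secrets
import string

LETTERS = string.ascii_uppercase  # assumed charset for the CCCC segment
DIGITS = string.digits            # assumed charset for each NNNN segment


def generate_segmented_id(numeric_segments: int = 4) -> str:
    """Return a random ID of the form CCCC-NNNN-...-NNNN."""
    head = "".join(secrets.choice(LETTERS) for _ in range(4))
    tail = [
        "".join(secrets.choice(DIGITS) for _ in range(4))
        for _ in range(numeric_segments)
    ]
    return "-".join([head, *tail])


def truncate(sat_id: str, segments: int) -> str:
    """Keep only the first `segments` dash-separated segments."""
    return "-".join(sat_id.split("-")[:segments])


def extend(sat_id: str) -> str:
    """Convert an ID to the longer form by appending a -0000 segment."""
    return sat_id + "-0000"


def entropy_bits(numeric_segments: int) -> float:
    """Size in bits of CCCC followed by `numeric_segments` NNNN segments."""
    return 4 * math.log2(len(LETTERS)) + 4 * numeric_segments * math.log2(len(DIGITS))


if __name__ == "__main__":
    print(truncate("JAHK-2812-9284-1248-4812", 2))  # JAHK-2812
    print(extend("JAHK-2812-9284-1248-4812"))       # JAHK-2812-9284-1248-4812-0000
    print(round(entropy_bits(1)), round(entropy_bits(4)), round(entropy_bits(5)))  # 32 72 85
```

Padding with -0000 keeps every existing ID valid in the larger space, at the cost that converted IDs all share the same last segment.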
Oh! This could be exploited in so many ways. I would assume no one would bother to map his/her generated satellite ID back to the standard one. Why? Look at the vetting algorithm right now. Half of the observations are not vetted, and half of the positively vetted observations don’t contain anything.
NORAD ID can not be used, as a satellite can have 0, 1 or more than one possible NORAD IDs
Can you elaborate on this? It doesn’t look correct. Do you refer to the situation just after launch, when a satellite cannot be reliably identified?
To be honest, I don’t expect many DB clients to avoid the already generated Satellite IDs. But in any case, new Satellite IDs/entries will be stored as suggested satellites in DB. The operations team will review these new entries and either keep them, merge them with another entry or reject them, depending on the rest of the metadata and the actual data of the artifact. The suggestion/review process will be similar to the current one for transmitters, and clients will be able to use the new ID all the time, before and after review.
The main reason for allowing clients to generate their own IDs is to cover cases where the satellite is not known or access to the IDs is not possible. In these cases the client(s) need to be able to group their results/artifacts under an ID.
We have 3 states:
Before launch/deployment: 0 NORAD IDs assigned, which means there is no way (except with a temporary NORAD ID) to test the integration of the mission with SatNOGS at its different levels (Client, Network, DB, Dashboards etc).
After launch/deployment but before identification: 0 to many (possible) NORAD IDs. In this state there are again issues with the SatNOGS integration. Also, external stations/clients/observers are not sure which NORAD ID they should use to send data to SatNOGS DB (the temporary one, or one of the possible NORAD IDs?).
After identification: 1 NORAD ID, but this may never happen. If the satellite is identified, then there is no problem, but there are plenty of examples of satellites that have never been identified, and you still need to be able to refer to them somehow. In these cases it would also be useful to be able to document which NORAD IDs could be the satellite, allowing anyone to choose one of the objects to track (currently we are limited to the one NORAD ID we set to follow).
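One way to picture the three states is a satellite entry whose Satellite ID never changes while its NORAD information does. The sketch below is a hypothetical data structure, not the SatNOGS DB model: the class name, the field names and the candidate NORAD IDs (99001 to 99003) are all made up, and only 25544 for the ISS is a real number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SatelliteEntry:
    sat_id: str                      # stable Satellite ID, never changes
    name: str = ""
    norad_id: Optional[int] = None   # set only once the object is identified
    candidate_norad_ids: list[int] = field(default_factory=list)

    def trackable_norad_ids(self) -> list[int]:
        """NORAD IDs a client could choose to track for this satellite."""
        if self.norad_id is not None:
            return [self.norad_id]
        return list(self.candidate_norad_ids)


# State 1: before launch/deployment, no NORAD ID at all.
pre_launch = SatelliteEntry(sat_id="AAAA-0001", name="Mystery Sat")

# State 2: deployed but not identified, several possible objects.
unidentified = SatelliteEntry(
    sat_id="BBBB-0002",
    candidate_norad_ids=[99001, 99002, 99003],  # hypothetical values
)

# State 3: identified, exactly one NORAD ID.
identified = SatelliteEntry(sat_id="CCCC-0003", name="ISS", norad_id=25544)

if __name__ == "__main__":
    for entry in (pre_launch, unidentified, identified):
        print(entry.sat_id, entry.trackable_norad_ids())
```

In all three states the artifacts can be attached to `sat_id`, so nothing has to be re-keyed when the identification changes.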
I hope I answered your questions; let me know if I missed any or if there are more questions.
I do see a DoS vulnerability in the review of new satellites when (malicious) users submit many artifacts with unknown Satellite IDs.
But as just discussed with @fredy, we do not consider this a blocking issue for the implementation of this proposal (especially given that we have been discussing SatNOGS Satellite ID proposals for a long time and have to settle on some solution).
My 50c for ID format: noradid[-company name/owner code-some incrementing number-some random stuff / crc]
pros:
compatible with norad id
easy for group by company name
unique
before the launch: 0-{company}-1-{crc}. After successful match with norad: {norad id}
cons:
hard to calculate auto increment consistently?
have to merge different IDs into a single one.
How do you plan to merge IDs? Let’s assume somebody generated an ID for object XYZ, then somebody else generated an ID for object ABC. At some point later, it turns out that XYZ and ABC are the same satellite. Will you support such use cases?
It will be, but you will still have the issues of the NORAD ID changing and of people using the wrong NORAD ID while identification isn’t settled. So, even if some people use 0 (I guess five 0s for length compatibility) as the NORAD ID, others will use one of the possible ones, and then you will have confusion.
Another thing with NORAD IDs is that they are soon going to be deprecated or extended; organizations are already trying to move away from them by stopping the use of TLEs.
Unfortunately the company name is not always known and in some cases can not be found at all, so I guess we would again need something like the 0s in the NORAD ID for these cases.
Another issue is the length, which is different for each company name; the solution here would be to cut the name to a few characters.
Finally, you may face cases like two companies participating in one mission, and then you will need some policy for which one to use, which can raise other unnecessary issues.
That’s the main reason we don’t want to use an incremental number. The problem is the requirement that anyone can create an ID, so there is a possibility of IDs with the same incremental number; in that case it loses its meaning and becomes like a random number again.
Yes, this is exactly what the merge process will be. XYZ and ABC will be merged by associating the newer entry with the older one. Both will continue to exist as entries and clients will be able to send data using either ID; however, searches and listings will point out that the older one is the original and the other one is merged.
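A minimal sketch of that merge behaviour, assuming each entry simply records when it was created: the newer entry gets a pointer to the older one, both IDs stay usable, and lookups follow the pointer to the original. The class `SatelliteRegistry`, its methods and the integer timestamps are invented for the example and say nothing about the actual DB implementation.

```python
class SatelliteRegistry:
    """Toy in-memory registry illustrating merge-by-association."""

    def __init__(self) -> None:
        self._created: dict[str, int] = {}      # sat_id -> creation time
        self._merged_into: dict[str, str] = {}  # merged id -> original id

    def add(self, sat_id: str, created: int) -> None:
        self._created[sat_id] = created

    def resolve(self, sat_id: str) -> str:
        """Follow merge pointers until the original entry is reached."""
        while sat_id in self._merged_into:
            sat_id = self._merged_into[sat_id]
        return sat_id

    def merge(self, first: str, second: str) -> str:
        """Associate the newer entry with the older one; return the original."""
        a, b = self.resolve(first), self.resolve(second)
        if a == b:
            return a  # already merged
        original, merged = sorted((a, b), key=self._created.__getitem__)
        self._merged_into[merged] = original
        return original


registry = SatelliteRegistry()
registry.add("XYZ", created=1)
registry.add("ABC", created=2)
registry.merge("ABC", "XYZ")

if __name__ == "__main__":
    print(registry.resolve("ABC"))  # XYZ
    print(registry.resolve("XYZ"))  # XYZ
```

Because the merged entry is kept and only points at the original, a client that keeps sending data under ABC does not need to change anything.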
So to summarize:
1. We need an ID that can be fairly easily generated by anyone (this is why we suggest using Nano ID, which is cross-platform)
2. From 1, we need to make sure that collisions are almost impossible
3. It should be extendable in an easy way
4. Part or all of it should be easily communicated between humans (truncating as @Acinonyx described)
5. It should be easy to remember (again, truncation is the solution) if you use it daily (like it is easy to remember that 25544 is the ISS)
The NORAD ID is changing… dependence on it is actually a con.
This is just ambiguous. Many times it is not clear who is the operator. Plus: lack of info about an object should not be an obstacle for us on assigning an ID.
Respectfully, I think you are missing the point and the driving factor of our proposal. We don’t want changing IDs, and we don’t want the ID to depend on any information about the object. E.g. we should be able to assign an ID to a random object that we are picking up a transmission from.