Too many failed uploads

I have been running my station for a while now. Overall it is working fine, but in approximately 20% of all observations the upload fails. In the log I see:
Assertion 'close_nointr(fd) != -EBADF' failed at ../src/basic/fd-util.c:67, function safe_close(). Aborting.
satnogsclient.observer.observer - ERROR - No waterfall data file found.
These messages appear at the beginning of an observation. Does anyone have an idea what the reason could be? I have a fast and reliable internet connection, and it is frustrating to see every fifth observation lost.
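In case it helps, this is roughly how I pull the client log around the failure (a sketch; satnogs-client.service is the unit name my Ansible-based install uses, yours may differ):

journalctl -u satnogs-client.service --since "1 hour ago" | grep -B 20 "close_nointr"
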
My configuration (the same problem also occurs without the pre/post observation scripts):

{
    "versions": {
        "satnogs-client": "1.6",
        "satnogs-client-ansible": "202205101826",
        "satnogs-flowgraphs": "1.4-1",
        "gr-satnogs": "2.3.1.1-1",
        "gr-soapy": "2.1.3.1-1",
        "gnuradio": "3.8.2.0-14satnogs2",
        "satnogs-config": "0.12"
    },
    "state": {
        "is-applied": true,
        "pending-tags": null
    },
    "system": {
        "date": "2022-05-23T06:01:55.147093+00:00",
        "distribution": {
            "DESCRIPTION": "Raspbian GNU/Linux 10 (buster)",
            "RELEASE": "10",
            "CODENAME": "buster",
            "ID": "Raspbian"
        },
        "pending-updates": false,
        "platform": {
            "system": "Linux",
            "node": "satnogs",
            "release": "5.15.36-v7l+",
            "version": "#1548 SMP Thu Apr 28 16:59:01 BST 2022",
            "machine": "armv7l",
            "processor": ""
        },
        "memory": {
            "total": 4025397248,
            "available": 3684831232,
            "percent": 8.5,
            "used": 175738880,
            "free": 151048192,
            "active": 280428544,
            "inactive": 3464810496,
            "buffers": 228237312,
            "cached": 3470372864,
            "shared": 8658944,
            "slab": 84508672
        },
        "disk": {
            "total": 31068966912,
            "used": 3641868288,
            "free": 26109968384,
            "percent": 12.2
        }
    },
    "configuration": {
        "satnogs_antenna": "RX",
        "satnogs_api_token": "[redacted]",
        "satnogs_gain_mode": "Settings Field",
        "satnogs_other_settings": "LNA=14,MIX=8,VGA=8",
        "satnogs_post_observation_script": "/usr/local/bin/satnogs-post {{ID}} {{FREQ}} {{TLE}} {{TIMESTAMP}} {{BAUD}} {{SCRIPT_NAME}}",
        "satnogs_pre_observation_script": "/usr/local/bin/satnogs-pre {{ID}} {{FREQ}} {{TLE}} {{TIMESTAMP}} {{BAUD}} {{SCRIPT_NAME}}",
        "satnogs_rx_samp_rate": "3e6",
        "satnogs_soapy_rx_device": "airspy",
        "satnogs_station_elev": "50",
        "satnogs_station_id": "2650",
        "satnogs_station_lat": "1.3380",
        "satnogs_station_lon": "103.934",
        "udp_dump_host": "127.0.0.1"
    }
}

Check the ‘dmesg’ logs to see if the USB device for the Airspy is showing any errors. Maybe the device is timing out when an observation is supposed to stop, which in turn kills the next observation when it tries to start.

I have an Airspy R2 and from time to time have issues with the micro USB connector not being firmly engaged, although I don’t get that particular error when it happens.

Scott

I don’t find a ‘dmesg’ log, but in the messages log I see something like:

May 23 21:08:15 satnogs kernel: [  277.181609] usb 1-1.1.4: usbfs: interface 0 claimed by usbfs while 'satnogs_cw_deco' sets config #1
May 23 21:26:54 satnogs kernel: [ 1396.403950] usb 1-1.1.4: usbfs: interface 0 claimed by usbfs while 'satnogs_fsk.py' sets config #1
May 23 21:40:40 satnogs kernel: [ 2222.407746] usb 1-1.1.4: usbfs: interface 0 claimed by usbfs while 'satnogs_fm.py' sets config #1

usb 1-1.1.4 is the Airspy
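For what it's worth, I confirmed which device sits on 1-1.1.4 with lsusb (the tree view maps the bus and port numbers to the device, at least on my Pi):

lsusb -t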

From the command line, as root:

dmesg | grep -i usb

This dumps the kernel (ring buffer) logs, which sometimes yield more information about what is happening with a device than what gets logged to ‘syslog’ or ‘messages’.
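If you want to line the kernel messages up with the observation times, dmesg can also print human-readable timestamps (the util-linux dmesg shipped with Raspbian supports this, as far as I know):

dmesg -T | grep -i usb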

Thanks for the info. Yes, I ran dmesg and get similar output:
usb 1-1.1: usbfs: interface 0 claimed by usbfs while 'satnogs_amsat_f' sets config #1
All other output related to USB looks normal.
I also changed the USB cable between the SDR and my Raspberry Pi. It looks like a slight improvement: I had two missed passes out of 30. Still, it is a bit strange, as it should run flawlessly. In the log it always says:

May 24 22:42:44 satnogs satnogs-client[388]: Assertion 'close_nointr(fd) != -EBADF' failed at ../src/basic/fd-util.c:67, function safe_close(). Aborting.
May 24 22:42:48 satnogs satnogs-client[388]: satnogsclient.observer.observer - ERROR - No waterfall data file found

Some more logs would help, particularly the lines just before that error. The “-EBADF”, as far as I can tell, means it is trying to close a file that isn’t open or doesn’t exist. Are you running out of disk space, or seeing any similar kinds of errors? It may only give that error because the data doesn’t get generated at all, so there is no file to close. In other words, the “real” error is happening before that. That’s my guess.
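A quick way to rule out the disk-space theory (plain commands, nothing SatNOGS-specific; the client keeps its temporary observation data under /tmp on a stock install, if I remember right):

df -h        # overall free space per filesystem
df -h /tmp   # where the temporary observation data lives, I believe
df -i        # inode exhaustion can produce similar "cannot create file" failures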

Yes, exactly. It looks like it happens when the program starts the observation. There is enough disk space, and usually the next observation after the error works fine. The problem is that the whole issue happens only sporadically.
Which additional logs should I check?

Perhaps /var/log/daemon.log or kern.log, or similar, for more context around the error.
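Something like this would show what leads up to the assertion (adjust the file names to whatever your syslog setup actually writes):

grep -B 20 "close_nointr" /var/log/daemon.log /var/log/kern.log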

None of the logs show any abnormal messages. I also tried the debug mode of satnogs-client, but the only two messages are:
Assertion 'close_nointr(fd) != -EBADF' failed at ../src/basic/fd-util.c:67, function safe_close(). Aborting.
satnogsclient.observer.observer - ERROR - No waterfall data file found.
Nothing more.

What are the lines before -EBADF?

For example:

May 28 17:53:42 satnogs satnogs-client[360]: gr-satellites: Observation: 5996012, Norad: 41460, Name: 0_AAUSAT_4, Script: satnogs_fsk.py
May 28 17:53:42 satnogs satnogs-client[360]: gr-satellites: Starting observation 5996012
May 28 17:53:43 satnogs satnogs-client[360]: gr-satellites: running at 48000 sps
May 28 17:53:45 satnogs satnogs-client[360]: Assertion 'close_nointr(fd) != -EBADF' failed at ../src/basic/fd-util.c:67, function safe_close(). Aborting.
May 28 17:53:49 satnogs satnogs-client[360]: gr-satellites: Observation: 5996012, Norad: 41460, Name: 0_AAUSAT_4, Script: satnogs_fsk.py
May 28 17:53:49 satnogs satnogs-client[360]: gr-satellites: Stopping observation 5996012
May 28 17:53:49 satnogs satnogs-client[360]: satnogsclient.observer.observer - ERROR - No waterfall data file found

When successful, the log looks like this:

May 29 05:51:14 satnogs satnogs-client[360]: gr-satellites: Starting observation 5996006
May 29 05:51:15 satnogs satnogs-client[360]: gr-satellites: running at 57600 sps
May 29 05:51:17 satnogs satnogs-client[360]: [INFO] [UHD] linux; GNU C++ version 8.3.0; Boost_106700; UHD_3.15.0.0-4satnogs1
May 29 05:51:18 satnogs satnogs-client[360]: [INFO] Using format CF32.
May 29 06:00:37 satnogs satnogs-client[360]: gr-satellites: Observation: 5996006, Norad: 40014, Name: BUGSAT-1_TITA, Script: satnogs_fsk.py
May 29 06:00:37 satnogs satnogs-client[360]: gr-satellites: Stopping observation 5996006

It looks like the failure may be in gr-satellites, not in satnogs-client itself.
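If the flowgraph process really is aborting on that assertion, it might leave a trace that coredumpctl can list (only if systemd-coredump happens to be installed, which I don't think it is by default on Raspbian):

coredumpctl list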