Skip to content

Bug Write Up

sfd_init.sh fails because it incorrectly checks ntp status. The failure condition can be replicated by using an ntp server of time.google.com on template import for the OVA.

SFD version is 1.1.0.

admin@sfd.local@sfd:~$ cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
admin@sfd.local@sfd:~$ diff -c sfd_init.sh.original sfd_init.sh.patch
*** sfd_init.sh.original 2020-02-04 21:58:22.327755522 +0000
--- sfd_init.sh.patch 2020-02-04 21:59:38.651751964 +0000
***************
*** 188,194 ****
echo "Checking if NTP is configured."
until [[ $cnt -ge $max_tries ]]
do
! ntpq -pn | awk '{print $2}' | xargs | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL"
[[ "$?" -eq 0 ]] && echo "NTP synchronized successfully." && return 0
cnt=$[$cnt+1]
echo "Waiting for NTP to get configured. Sleeping for 10 seconds..."
--- 188,194 ----
echo "Checking if NTP is configured."
until [[ $cnt -ge $max_tries ]]
do
! ntpq -pn | awk '{print $1}' | sed -n '/^\*/p'
[[ "$?" -eq 0 ]] && echo "NTP synchronized successfully." && return 0
cnt=$[$cnt+1]
echo "Waiting for NTP to get configured. Sleeping for 10 seconds..."

See below for a detailed explanation of my troubleshooting and additional information.

Based on the regex looking for an IP I’m guessing the intent was to verify that the server is synched to a remote NTP server. However there are two problems. The first is that it is looking at the refid (column 2) which is not the server you are synched to, but the server that the server you are pointing at is synched to. Second, there is no guarentee that an IP address will be in this column (or PPS/GPS/LOCAL/LOCL). This was my case. Instead, I would check for the line that has the synchronized server (starts with *). Using ntpq -pn | awk '{print $1}' | sed -n '/^\*/p'. You don’t really need to check beyond that because it won’t synchronize unless everything is working. Though if you want to take it a step further you could do: ntpq -pn | awk '{print $1}' | sed -n '/^\*/p' | sed 's/\*//g' | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL".

  • Page 17 step 1 of the manual doesn’t work. There is no web server listening. It looks like on the first run the SFD service failed. There is no obvious log information output from the service. Suggest that sfd_init.sh print the log path to stdout on failure so that getting the service status will more obviously provide direction on where the problem is.

    # Dumped listening ports
    admin@sfd.local@sfd:~$ ss -ltn
    State Recv-Q Send-Q Local Address:Port Peer Address:Port
    LISTEN 0 128 127.0.0.1:39125 *:*
    LISTEN 0 128 *:22 *:*
    LISTEN 0 128 :::9100 :::*
    LISTEN 0 128 :::5555 :::*
    LISTEN 0 128 :::22 :::*
    # Checked service status after figuring out SFD ran a service
    admin@sfd.local@sfd:~$ systemctl status sfd
    ● sfd.service - NFC Bootstrap Service
    Loaded: loaded (/etc/systemd/system/sfd.service; enabled; vendor preset: enabled)
    Active: failed (Result: exit-code) since Tue 2020-02-04 20:38:10 UTC; 18min ago
    Main PID: 1327 (code=exited, status=1/FAILURE)
  • Entire system will hard fail if NTP cannot start. I realized this is the cause of the failure above. This should not be the default behavior. Should continue and issue a warning. Most customers working on servers would not know how to troubleshoot this.

    ● ntp.service - LSB: Start NTP daemon
    Loaded: loaded (/etc/init.d/ntp; bad; vendor preset: enabled)
    Active: active (running) since Wed 2020-02-05 03:50:34 UTC; 10s ago
    Docs: man:systemd-sysv-generator(8)
    Process: 2235 ExecStop=/etc/init.d/ntp stop (code=exited, status=0/SUCCESS)
    Process: 2248 ExecStart=/etc/init.d/ntp start (code=exited, status=0/SUCCESS)
    Tasks: 2
    Memory: 1.5M
    CPU: 12ms
    CGroup: /system.slice/ntp.service
    └─2260 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 112:117
    Feb 05 03:50:34 sfd ntpd[2260]: Listen and drop on 1 v4wildcard 0.0.0.0:123
    Feb 05 03:50:34 sfd ntpd[2260]: Listen normally on 2 lo 127.0.0.1:123
    Feb 05 03:50:34 sfd ntpd[2260]: Listen normally on 3 ens192 192.168.1.31:123
    Feb 05 03:50:34 sfd ntpd[2260]: Listen normally on 4 lo [::1]:123
    Feb 05 03:50:34 sfd ntpd[2260]: Listening on routing socket on fd #21 for interface updates
    Feb 05 03:50:35 sfd ntpd[2260]: Soliciting pool server 216.239.35.12
    Feb 05 03:50:36 sfd ntpd[2260]: Soliciting pool server 216.239.35.4
    Feb 05 03:50:37 sfd ntpd[2260]: Soliciting pool server 216.239.35.0
    Feb 05 03:50:38 sfd ntpd[2260]: Soliciting pool server 216.239.35.8
    Feb 05 03:50:39 sfd ntpd[2260]: Soliciting pool server 2001:4860:4806::
    -----RESTART NTP - END-----
    Checking if NTP service is running.
    NTP service is running.
    Checking if NTP is configured.
    Waiting for NTP to get configured. Sleeping for 10 seconds...
    ... SNIP ...
    Waiting for NTP to get configured. Sleeping for 10 seconds...
    NTP failed to get synced.
    NTP config failed
    # This failure seems to be erroneous. Checking the NTP service confirms it is working.
    admin@sfd.local@sfd:~$ systemctl status ntp
    ● ntp.service - LSB: Start NTP daemon
    Loaded: loaded (/etc/init.d/ntp; bad; vendor preset: enabled)
    Active: active (running) since Wed 2020-02-05 03:50:34 UTC; 6h left
    Docs: man:systemd-sysv-generator(8)
    Process: 2235 ExecStop=/etc/init.d/ntp stop (code=exited, status=0/SUCCESS)
    Process: 2248 ExecStart=/etc/init.d/ntp start (code=exited, status=0/SUCCESS)
    Tasks: 2
    Memory: 1.5M
    CPU: 81ms
    CGroup: /system.slice/ntp.service
    └─2260 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 112:117
    admin@sfd.local@sfd:~$ ntpstat
    The program 'ntpstat' is currently not installed. You can install it by typing:
    sudo apt install ntpstat
    admin@sfd.local@sfd:~$ ntp -qn
    No command 'ntp' found, but there are 21 similar ones
    ntp: command not found
    admin@sfd.local@sfd:~$ ntpq -pn
    remote refid st t when poll reach delay offset jitter
    ==============================================================================
    time.google.com .POOL. 16 p - 64 0 0.000 0.000 0.000
    +216.239.35.12 .GOOG. 1 u 54 64 375 64.340 -91.086 8.997
    *216.239.35.4 .GOOG. 1 u 62 64 377 62.412 -83.944 8.385
    +216.239.35.0 .GOOG. 1 u 42 64 377 63.484 -83.978 10.080
    -216.239.35.8 .GOOG. 1 u 21 64 377 72.245 -73.087 15.674
  • Problem isolated to:

    max_tries=30 # check counter till 30. (total 300 seconds)
    echo "Checking if NTP is configured."
    until [[ $cnt -ge $max_tries ]]
    do
    ntpq -pn | awk '{print $2}' | xargs | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL"
    [[ "$?" -eq 0 ]] && echo "NTP synchronized successfully." && return 0
    cnt=$[$cnt+1]
    echo "Waiting for NTP to get configured. Sleeping for 10 seconds..."
    sleep 10
    done
  • The offending line is ntpq -pn | awk '{print $2}' | xargs | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL". ntpq -pn has output formatted like this:

    admin@sfd.local@sfd:~$ ntpq -pn
    remote refid st t when poll reach delay offset jitter
    ==============================================================================
    time.google.com .POOL. 16 p - 64 0 0.000 0.000 0.000
    +216.239.35.12 .GOOG. 1 u 54 64 375 64.340 -91.086 8.997
    *216.239.35.4 .GOOG. 1 u 62 64 377 62.412 -83.944 8.385
    +216.239.35.0 .GOOG. 1 u 42 64 377 63.484 -83.978 10.080
    -216.239.35.8 .GOOG. 1 u 21 64 377 72.245 -73.087 15.674
  • Based on the regex looking for an IP I’m guessing the intent was to verify that the server is synched to a remote NTP server. However there are two problems. The first is that it is looking at the refid which is not the server you are synched to, but the server that the server you are pointing at is synched to. Second, there is no guarentee that an IP address will be in this column (or PPS/GPS/LOCAL/LOCL). This was my case. See below for my ntpq -pn results. Instead, I would check for the line that has the synchronized server (starts with *). Using ntpq -pn | awk '{print $1}' | sed -n '/^\*/p'. You don’t really need to check beyond that because it won’t synchronize unless everything is working. Though if you want to take it a step further you could do: ntpq -pn | awk '{print $1}' | sed -n '/^\*/p' | sed 's/\*//g' | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL".

My ntpq -pn with time.google.com as my server

Section titled “My ntpq -pn with time.google.com as my server”
admin@sfd.local@sfd:~$ ntpq -pn
remote refid st t when poll reach delay offset jitter
==============================================================================
time.google.com .POOL. 16 p - 64 0 0.000 0.000 0.000
*216.239.35.12 .GOOG. 1 u 38 64 377 67.370 -0.549 23.161
+216.239.35.4 .GOOG. 1 u 29 64 377 75.466 -1.691 22.089
+216.239.35.0 .GOOG. 1 u 36 64 377 67.731 -1.434 24.201
+216.239.35.8 .GOOG. 1 u 38 64 377 76.705 -0.042 22.436