Bug Write Up
sfd_init.sh fails because it incorrectly checks ntp status. The failure condition can be replicated by using an ntp server
of time.google.com on template import for the OVA.
Version
Section titled “Version”SFD version is 1.1.0.
Ubuntu
Section titled “Ubuntu”admin@sfd.local@sfd:~$ cat /etc/*-releaseDISTRIB_ID=UbuntuDISTRIB_RELEASE=16.04DISTRIB_CODENAME=xenialDISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"NAME="Ubuntu"VERSION="16.04.6 LTS (Xenial Xerus)"ID=ubuntuID_LIKE=debianPRETTY_NAME="Ubuntu 16.04.6 LTS"VERSION_ID="16.04"HOME_URL="http://www.ubuntu.com/"SUPPORT_URL="http://help.ubuntu.com/"BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"VERSION_CODENAME=xenialUBUNTU_CODENAME=xenialSuggested Patch
Section titled “Suggested Patch”admin@sfd.local@sfd:~$ diff -c sfd_init.sh.original sfd_init.sh.patch*** sfd_init.sh.original 2020-02-04 21:58:22.327755522 +0000--- sfd_init.sh.patch 2020-02-04 21:59:38.651751964 +0000****************** 188,194 **** echo "Checking if NTP is configured." until [[ $cnt -ge $max_tries ]] do! ntpq -pn | awk '{print $2}' | xargs | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL" [[ "$?" -eq 0 ]] && echo "NTP synchronized successfully." && return 0 cnt=$[$cnt+1] echo "Waiting for NTP to get configured. Sleeping for 10 seconds..."--- 188,194 ---- echo "Checking if NTP is configured." until [[ $cnt -ge $max_tries ]] do! ntpq -pn | awk '{print $1}' | sed -n '/^\*/p' [[ "$?" -eq 0 ]] && echo "NTP synchronized successfully." && return 0 cnt=$[$cnt+1] echo "Waiting for NTP to get configured. Sleeping for 10 seconds..."Explanation
Section titled “Explanation”See below for a detailed explanation of my troubleshooting and additional information.
Based on the regex looking for an IP I’m guessing the intent was to verify that the server is synched to a remote NTP server. However there are two problems. The first is that it is looking at the refid (column 2) which is not the server you are synched to, but the server that the server you are pointing at is synched to. Second, there is no guarentee that an IP address will be in this column (or PPS/GPS/LOCAL/LOCL). This was my case. Instead, I would check for the line that has the synchronized server (starts with *). Using ntpq -pn | awk '{print $1}' | sed -n '/^\*/p'. You don’t really need to check beyond that because it won’t synchronize unless everything is working. Though if you want to take it a step further you could do: ntpq -pn | awk '{print $1}' | sed -n '/^\*/p' | sed 's/\*//g' | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL".
Detailed Explanation
Section titled “Detailed Explanation”-
Page 17 step 1 of the manual doesn’t work. There is no web server listening. It looks like on the first run the SFD service failed. There is no obvious log information output from the service. Suggest that
sfd_init.shprint the log path to stdout on failure so that getting the service status will more obviously provide direction on where the problem is.# Dumped listening portsadmin@sfd.local@sfd:~$ ss -ltnState Recv-Q Send-Q Local Address:Port Peer Address:PortLISTEN 0 128 127.0.0.1:39125 *:*LISTEN 0 128 *:22 *:*LISTEN 0 128 :::9100 :::*LISTEN 0 128 :::5555 :::*LISTEN 0 128 :::22 :::*# Checked service status after figuring out SFD ran a serviceadmin@sfd.local@sfd:~$ systemctl status sfd● sfd.service - NFC Bootstrap ServiceLoaded: loaded (/etc/systemd/system/sfd.service; enabled; vendor preset: enabled)Active: failed (Result: exit-code) since Tue 2020-02-04 20:38:10 UTC; 18min agoMain PID: 1327 (code=exited, status=1/FAILURE) -
Entire system will hard fail if NTP cannot start. I realized this is the cause of the failure above. This should not be the default behavior. Should continue and issue a warning. Most customers working on servers would not know how to troubleshoot this.
● ntp.service - LSB: Start NTP daemonLoaded: loaded (/etc/init.d/ntp; bad; vendor preset: enabled)Active: active (running) since Wed 2020-02-05 03:50:34 UTC; 10s agoDocs: man:systemd-sysv-generator(8)Process: 2235 ExecStop=/etc/init.d/ntp stop (code=exited, status=0/SUCCESS)Process: 2248 ExecStart=/etc/init.d/ntp start (code=exited, status=0/SUCCESS)Tasks: 2Memory: 1.5MCPU: 12msCGroup: /system.slice/ntp.service└─2260 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 112:117Feb 05 03:50:34 sfd ntpd[2260]: Listen and drop on 1 v4wildcard 0.0.0.0:123Feb 05 03:50:34 sfd ntpd[2260]: Listen normally on 2 lo 127.0.0.1:123Feb 05 03:50:34 sfd ntpd[2260]: Listen normally on 3 ens192 192.168.1.31:123Feb 05 03:50:34 sfd ntpd[2260]: Listen normally on 4 lo [::1]:123Feb 05 03:50:34 sfd ntpd[2260]: Listening on routing socket on fd #21 for interface updatesFeb 05 03:50:35 sfd ntpd[2260]: Soliciting pool server 216.239.35.12Feb 05 03:50:36 sfd ntpd[2260]: Soliciting pool server 216.239.35.4Feb 05 03:50:37 sfd ntpd[2260]: Soliciting pool server 216.239.35.0Feb 05 03:50:38 sfd ntpd[2260]: Soliciting pool server 216.239.35.8Feb 05 03:50:39 sfd ntpd[2260]: Soliciting pool server 2001:4860:4806::-----RESTART NTP - END-----Checking if NTP service is running.NTP service is running.Checking if NTP is configured.Waiting for NTP to get configured. Sleeping for 10 seconds...... SNIP ...Waiting for NTP to get configured. Sleeping for 10 seconds...NTP failed to get synced.NTP config failed# This failure seems to be erroneous. Checking the NTP service confirms it is working.admin@sfd.local@sfd:~$ systemctl status ntp● ntp.service - LSB: Start NTP daemonLoaded: loaded (/etc/init.d/ntp; bad; vendor preset: enabled)Active: active (running) since Wed 2020-02-05 03:50:34 UTC; 6h leftDocs: man:systemd-sysv-generator(8)Process: 2235 ExecStop=/etc/init.d/ntp stop (code=exited, status=0/SUCCESS)Process: 2248 ExecStart=/etc/init.d/ntp start (code=exited, status=0/SUCCESS)Tasks: 2Memory: 1.5MCPU: 81msCGroup: /system.slice/ntp.service└─2260 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 112:117admin@sfd.local@sfd:~$ ntpstatThe program 'ntpstat' is currently not installed. You can install it by typing:sudo apt install ntpstatadmin@sfd.local@sfd:~$ ntp -qnNo command 'ntp' found, but there are 21 similar onesntp: command not foundadmin@sfd.local@sfd:~$ ntpq -pnremote refid st t when poll reach delay offset jitter==============================================================================time.google.com .POOL. 16 p - 64 0 0.000 0.000 0.000+216.239.35.12 .GOOG. 1 u 54 64 375 64.340 -91.086 8.997*216.239.35.4 .GOOG. 1 u 62 64 377 62.412 -83.944 8.385+216.239.35.0 .GOOG. 1 u 42 64 377 63.484 -83.978 10.080-216.239.35.8 .GOOG. 1 u 21 64 377 72.245 -73.087 15.674 -
Problem isolated to:
max_tries=30 # check counter till 30. (total 300 seconds)echo "Checking if NTP is configured."until [[ $cnt -ge $max_tries ]]dontpq -pn | awk '{print $2}' | xargs | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL"[[ "$?" -eq 0 ]] && echo "NTP synchronized successfully." && return 0cnt=$[$cnt+1]echo "Waiting for NTP to get configured. Sleeping for 10 seconds..."sleep 10done -
The offending line is
ntpq -pn | awk '{print $2}' | xargs | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL".ntpq -pnhas output formatted like this:admin@sfd.local@sfd:~$ ntpq -pnremote refid st t when poll reach delay offset jitter==============================================================================time.google.com .POOL. 16 p - 64 0 0.000 0.000 0.000+216.239.35.12 .GOOG. 1 u 54 64 375 64.340 -91.086 8.997*216.239.35.4 .GOOG. 1 u 62 64 377 62.412 -83.944 8.385+216.239.35.0 .GOOG. 1 u 42 64 377 63.484 -83.978 10.080-216.239.35.8 .GOOG. 1 u 21 64 377 72.245 -73.087 15.674 -
Based on the regex looking for an IP I’m guessing the intent was to verify that the server is synched to a remote NTP server. However there are two problems. The first is that it is looking at the refid which is not the server you are synched to, but the server that the server you are pointing at is synched to. Second, there is no guarentee that an IP address will be in this column (or PPS/GPS/LOCAL/LOCL). This was my case. See below for my
ntpq -pnresults. Instead, I would check for the line that has the synchronized server (starts with *). Usingntpq -pn | awk '{print $1}' | sed -n '/^\*/p'. You don’t really need to check beyond that because it won’t synchronize unless everything is working. Though if you want to take it a step further you could do:ntpq -pn | awk '{print $1}' | sed -n '/^\*/p' | sed 's/\*//g' | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}|PPS|GPS|LOCAL|LOCL".
My ntpq -pn with time.google.com as my server
Section titled “My ntpq -pn with time.google.com as my server” admin@sfd.local@sfd:~$ ntpq -pn remote refid st t when poll reach delay offset jitter ============================================================================== time.google.com .POOL. 16 p - 64 0 0.000 0.000 0.000 *216.239.35.12 .GOOG. 1 u 38 64 377 67.370 -0.549 23.161 +216.239.35.4 .GOOG. 1 u 29 64 377 75.466 -1.691 22.089 +216.239.35.0 .GOOG. 1 u 36 64 377 67.731 -1.434 24.201 +216.239.35.8 .GOOG. 1 u 38 64 377 76.705 -0.042 22.436