Slow (initial) ssh logins

I’d been annoyed about a problem when connecting to my NexentaStor appliances over ssh, in that always the initial connection would take a good 20 seconds before I got the password prompt.
Subsequent logins would then run in a reasonable response time, so once that first login of the day had been completed this wouldn’t be an issue … but still, that initial delay really started to bother me.

The traditional way to start debugging such issues is to use the -v switch when invoking ssh and so that’s what I did and discovered we appeared to be waiting here:

debug1: Remote protocol version 2.0, remote software version Sun_SSH_1.5
debug1: no match: Sun_SSH_1.5
debug1: Authenticating to 10.0.0.1:22 as 'root'
debug1: SSH2_MSG_KEXINIT sent

Searching Google for SSH_MSG_KEXINIT and slow ssh logins revealed a number of hits, all of which were wrong for this particular problem.   I tried most of the ideas, including:

  • Removing mdns from the /etc/nsswitch.conf file for the hosts: entry
  • Ensuring client and server could resolve each other in the local /etc/hosts file
  • Reviewing Illumos Bug #1983 (sshd problems)
  • Disabling and unloading the crypto pkcs11_tpm.so module
  • Attempting to prevent reverse DNS lookups

Having exhausted this list and reviewing a bunch of non-Solaris/Illumos based ssh issues, I went back to basics and decided to truss sshd on the appliance after rebooting the appliance and then logging into the console.

# truss -o /tmp/sshd.truss -Dealf -rall -vall -wall -p `pgrep sshd`

Then I attempted the initial ssh from my Mac, got the delay, logged in and then stopped the truss so I could review it.
The truss was huge, much bigger than I’d anticipated, which meant analysing it was going to take much longer than I’d planned.

# ls -lh /tmp/sshd.truss
-rw-r--r-- 1 root root 137M Dec 7 13:21 /tmp/sshd.truss

Frustratingly there were no traces of the ‘SSH2_MSG_KEXINIT’ string (nor any KEX related string), which is probably because these messages are client side, rather than server side debug data.
Using the -D switch turns on time stamp deltas, useful for when you’re searching for a long pause between functions but alas where there were interesting delays between functions, these were uneventful forkx() and vforkx() calls:

# awk '$2 ~ /[0-9]*\.[0-9][0-9][0-9][0-9]/ {print $0}' /tmp/sshd.truss | sort -n -k2 | tail
1305/1: 45.9956 forkx() (returning as child ...) = 1272
1307/1: 46.0128 forkx() (returning as child ...) = 1305
1309/1: 46.0365 vforkx() (returning as child ...) = 1305
1311/1: 46.0539 forkx() (returning as child ...) = 1272
1313/1: 46.6702 forkx() (returning as child ...) = 1272
1315/1: 46.7773 forkx() (returning as child ...) = 1272
1317/1: 46.8037 forkx() (returning as child ...) = 1272
1319/1: 62.3412 forkx() (returning as child ...) = 1272
1321/1: 62.3736 forkx() (returning as child ...) = 1319
1323/1: 62.3934 forkx() (returning as child ...) = 1319

Still as I worked through the code I couldn’t help notice that we seemed to be doing an awful lot of work reading locale files:

1261/1:          0.0001 open("/usr/lib/locale//en_US.UTF-8/LC_CTYPE/LCL_DATA", O_RDONLY) = 5
    ...
1261/1: 0.0002 read(5, 0x0807BC9C, 95232) = 94904
1261/1: R u n e M a g 1 U T F - 8\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
1261/1: \0\0\0\0\0\0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0
1261/1: \0\0\0 \0\0\0 \0\0\0 h\0\0\0 (\0\0\0 (\0\0\0 (\0\0\0 (\0\0\0
1261/1: \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0
1261/1: \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0
1261/1: \0\0\0 \0\0\0 H80\0\010A0\0\010A0\0\010A0\0\010A0\0\010A0\0\0
    ...
1261/1: A3A7\0\0A2A7\0\0A5A7\0\0A5A7\0\0A4A7\0\0A7A7\0\0A7A7\0\0A6A7\0\0
1261/1: A9A7\0\0A9A7\0\0A8A7\0\0 AFF\0\0 ZFF\0\0 !FF\0\0
1261/1: 0.0054 brk(0x080949D0) = 0
1261/1: 0.0001 brk(0x080B29D0) = 0
1261/1: 0.0005 llseek(5, 0, SEEK_CUR) = 94904
1261/1: 0.0003 close(5) = 0
1261/1: 0.0003 open("/usr/lib/locale//en_US.UTF-8/LC_NUMERIC/LCL_DATA", O_RDONLY) = 5
    .
    .
    .
1261/1:          0.0941 open("/usr/lib/locale//el_CY.UTF-8/LC_CTYPE/LCL_DATA", O_RDONLY) = 7
    ...
1261/1: 0.0074 read(7, 0x080B1AFC, 95232) = 94904
1261/1: R u n e M a g 1 U T F - 8\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
1261/1: \0\0\0\0\0\0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0
1261/1: \0\0\0 \0\0\0 \0\0\0 h\0\0\0 (\0\0\0 (\0\0\0 (\0\0\0 (\0\0\0
1261/1: \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0
1261/1: \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0 \0\0\0
1261/1: \0\0\0 \0\0\0 H80\0\010A0\0\010A0\0\010A0\0\010A0\0\010A0\0\0
1261/1: 10A0\0\010A0\0\010A0\0\010A0\0\010A0\0\010A0\0\010A0\0\010A0\0\0
1261/1: 10A0\0\010A0\0\084A0\0\084A0\0\084A0\0\084A0\0\084A0\0\084A0\0\0
1261/1: 84A0\0\084A0\0\084A0\0\084A0\0\010A0\0\010A0\0\010A0\0\010A0\0\0
    ...

In addition whilst the ssh to the server had been running I also happened to notice that the disk access light was flashing quite heavily.

So just how many locales are installed on this system?

# cd /usr/lib/locale
# ls -1 | wc -l
 223
# locale -a
C
POSIX
af_ZA.UTF-8
ar_AE.UTF-8
ar_BH.UTF-8
ar_DZ.UTF-8
ar_EG.UTF-8
ar_IQ.UTF-8
ar_JO.UTF-8
    .
    .
    .
zh_MO.UTF-8
zh_SG.UTF-8
zh_TW.UTF-8

# find . -type f -print | wc -l
 1342

# grep 'open.*/usr/lib/locale' /tmp/sshd.truss | wc -l
 2787

# dpkg -l | grep -i locale
ii library-perl-5-encode-lo 40-0-3 Determine the locale encoding
ii library-perl-5-gettext-l 40-0-0 Gettext-Locale PERL module
ii locale-af 40-3-3 Afrikaans language support
ii locale-ar 40-3-3 Arabic language support
ii locale-as 40-3-3 Assamese language support
ii locale-az 40-3-3 Azerbaijani language support
ii locale-be 40-3-3 Belarusian language support
ii locale-bg 40-3-3 Bulgarian language support
ii locale-bg-extra 40-3-3 Bulgarian language support extra files
ii locale-bn 40-3-3 Bengali language support
    .
    .
    .
ii locale-zh-sg 40-3-3 Singapore Chinese language support
ii locale-zh-tw 40-3-3 Traditional Chinese language support
ii system-library-iconv-utf 1.0.2 Iconv modules for UTF-8 Locale
ii text-locale 40-3-3 System Localization

Quite a few it turns out … could these be the cause of the initial slow down as we have to read them from disk and subsequently the data is cached in the ARC?
Whilst it’s a good thing there are so many locales available, for my particular requirements I really don’t need so many, so I set about removing ones that I’ll never need.

# dpkg -f locale-vi locale-ur locale-uk locale-ug locale-tr-extra locale-tr locale-th-extra locale-th

(repeat for all non-required locale packages)

After completing this then rebooting the appliance, an ssh to this system completed much much quicker taking just a couple of seconds.