Tuesday, September 22, 2015

Lesson Learned: Issues with Signature Algorithms

Over the weekend, a client's internally signed Exchange and Lync certificates expired. On Sunday I went through their entire deployment and replaced all of the certificates with new certs issued by their PKI.

Monday morning, they reported that their Lync clients were working but none of their phones could sign in. I immediately realized I had not replaced the certificate on their F5, and I did not have its management IP or credentials. The IT person who had this information was on a plane and would be unreachable for a couple of hours. To avoid an extended outage, I changed DNS for the VIP that I thought their phones (CX600) were using so that it pointed directly at one of the FEs and skipped the F5. After the change, they reported that all of their phones were still down. We could not get logs from the phones, so I assumed the problem was still somehow connected to the F5, but without logs I could not be certain.

Luckily, I was able to obtain the F5 information from a colleague and began the certificate replacement process. However, the F5 would NOT accept the certificate. I reached out to one of our MVPs, Jeff Guillet, and we still could not get it to take the certificate. I then escalated to F5 support, and we attempted to export and import the certificate in a multitude of ways: the certificate by itself, with no extended properties, imported via text file, imported via the CLI. Nothing seemed to work. When we pulled a packet capture, we saw the client hello but no server hello in response:

Running via CLI: tcpdump -n -i 0.0:nnn -s0 -w /var/tmp/1-1475239048.pcap host or -vvv

We then ran an OpenSSL command on the F5 that dumps the certificate information when a connection to the VIP is attempted. This resulted in no certificate being sent:

[admin@sac-f5-02:Active:Changes Pending] ~ # openssl s_client -connect

At this point, we swapped the old expired certificate back in and confirmed that we got output again, with a certificate warning. Running the same command showed the old cert and chain:

[admin@sac-f5-02:Active:Changes Pending] ~ # openssl s_client -connect
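The difference between "no certificate sent" and "old cert and chain shown" can also be checked mechanically rather than by eye. The following is a minimal sketch, not the command we ran: the function name served_cert is made up for illustration, and the host and port in the usage comment are placeholders, since the real target is not shown above. It only looks for a PEM certificate block in whatever openssl s_client printed.

```shell
# Hypothetical helper: read `openssl s_client` output on stdin and report
# whether the server presented a certificate (s_client prints the leaf
# certificate as a PEM block when it receives one).
served_cert() {
  if grep -q -- '-----BEGIN CERTIFICATE-----'; then
    echo "certificate presented"
  else
    echo "no certificate presented"
    return 1
  fi
}

# Usage (host and port are placeholders, not values from this incident):
#   openssl s_client -connect vip.example.local:443 </dev/null 2>&1 | served_cert
```

A non-zero return from the helper makes it easy to use in a loop while re-importing certificates.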

We then attempted a couple other variations of importing and exporting the certificate. We enabled debug logging on the SSL components, and then dumped the SSL log to the CLI:

tmsh modify /sys db log.ssl.level value Debug
tailf /var/log/ltm |grep -i 'ssl'

However, nothing showed up. I verified that logging was working by hitting another one of the VIPs, and that connection showed up in the logs. We then rebooted the passive F5 and failed over to it once it came back online, in an attempt to answer the age-old question “Did you reboot?” This did not change anything either. We once again tried a series of imports and exports on the unit, just to make sure the fix wasn’t a combination of the reboot, the failover, and a fresh import. No luck.

We tried one other command that essentially makes a connection and then dumps the output of that connection:

At this point, our client had been without phones for a little more than half the day. We had already escalated at F5 and had three of their support engineers on the call. They sent out an all-support announcement, as we had stumped most of their support staff and engineering was also out of ideas. Finally someone got back to them and asked, “What signature algorithm is being used?” We immediately pulled the certificate information from the F5:

openssl x509 -in /config/filestore/files_d/Common_d/certificate_d/\:Common\:Lync2013-Web-int-2015-V4.crt_51169_1  -noout -text
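Since the signature algorithm turned out to be the whole story, it is worth pulling just that field out of the dump. Here is a rough sketch under a few assumptions: the function names sig_alg and check_sig_alg are made up for illustration, the path in the usage comment is a placeholder, and the check simply flags any algorithm name containing "pss" (OpenSSL typically prints RSASSA-PSS as rsassaPss).

```shell
# Hypothetical helper: read `openssl x509 -noout -text` output on stdin and
# print the first "Signature Algorithm:" value.
sig_alg() {
  sed -n 's/^[[:space:]]*Signature Algorithm:[[:space:]]*//p' | head -n 1
}

# Hypothetical helper: warn (and return 1) if the algorithm looks like
# RSASSA-PSS, which the devices in this post would not accept.
check_sig_alg() {
  alg=$(sig_alg)
  case "$alg" in
    *[Pp][Ss][Ss]*) echo "WARN: $alg"; return 1 ;;
    *)              echo "OK: $alg";   return 0 ;;
  esac
}

# Usage (the path is a placeholder):
#   openssl x509 -in /path/to/cert.crt -noout -text | check_sig_alg
```

Running this against every certificate a new CA issues, before deploying any of them, would have caught the problem on Sunday.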

We responded to the individual who asked, who then pointed out that F5 does not support the RSASSA-PSS algorithm. We were able to find a posting on F5’s support forums in which another user described similar output when using RSASSA-PSS:

We wondered why this had suddenly started occurring. We had recently migrated their PKI from a single root/issuing server to a two-tier PKI. However, it was supposed to be a 1:1 migration, with no settings or configuration changed other than making it two tiers. We decided to check a certificate issued by their old root/issuing CA:

Then we looked at the certificates issued by the new intermediate/issuing CA:

A quick search on the internet also showed that Adobe, Citrix, Cisco, Firefox, and VMware products all either do not support this algorithm or have various issues with its use. Various blog posts and forum entries suggested that you had to rebuild your PKI if this was the case. At this point we thought we had two options: purchase a third-party certificate for the F5 with just the pool name, or bypass SSL on the F5.

After we informed the client, they elected to go with a GoDaddy certificate. After obtaining, installing, and verifying it, we asked the client to test a phone. They reported back that the phones were still not able to sign in. We then pulled the DHCP options and, lo and behold, they were pointing to the FE pool directly and not the F5. So all of this work on the F5, while important, was not the root cause of the phone issue. I immediately thought: if all of these companies and their devices don’t support RSASSA-PSS, then maybe the phones don’t either. Sure enough, the Polycom CX line of phones does NOT support it!

EDIT: I have also been informed that the VVX line, running at least 5.3.1, also does not support RSASSA-PSS.

We knew at this point that we were looking at changing their PKI, because we needed a .local name on the FE pool’s internal cert and could not obtain that from GoDaddy. We opened a ticket with Microsoft PSS and, while waiting on the callback, started looking deeper into how the PKI was set up. We noticed that their root CA and intermediate/issuing CA certificates were both using the sha1RSA signature algorithm and not RSASSA-PSS:

We thought this was strange: why was the intermediate CA issuing certs with a different signature algorithm than the one its own certificate used? We tried cloning the web server template and selecting different cryptography providers, but this did not work either. We did a bit more research and noticed that one of the common “resolutions” when rebuilding the PKI was to disable alternatesignaturealgorithm by setting its value in the registry to 0. Since the root and intermediate CAs were using sha1RSA, we decided to try disabling it on the intermediate.

We made the change using the following command and then restarted the certificate service:

certutil -setreg csp\alternatesignaturealgorithm 0
net stop certsvc && net start certsvc

We then reissued the FE pool cert and, lo and behold, we finally had a certificate using an acceptable signature algorithm:

We immediately assigned the Lync FE pool to use this new cert and were able to confirm with the client that their phones were able to sign in!

Lesson Learned: Check your signature algorithm when migrating PKIs, and check that everything that will use the certificates is compatible with it!

A big thanks to Jeff Guillet, Rick Steele, and Scott Winslow for assisting in this effort!

1 comment:

  1. I had to add 'ca' to the command
    certutil -setreg ca\csp\alternatesignaturealgorithm 0