Leopard DNS Issues (and work-around)
Shortly after Leopard was released, we began receiving reports from a small number of users of problems, typically a spinning color wheel on startup. We were unable to replicate the problem in-house and eventually determined that it was specific to the user’s network connection (the same machine would often work at the office but not at home, or vice versa). With additional help and debugging from affected users, we tracked the problem down to changes in DNS resolution in Leopard. We’re documenting the problem here to help Leopard users who may be having problems with other applications, as well as application developers or ISPs who may start getting complaints.
Note: the rest of this post is technical info. For users on Leopard having trouble with Jungle Disk, you can download the latest Beta version for a fix. This problem only affects some users with old or misconfigured DNS servers - most Leopard users should be fine already. We’ll be releasing a non-beta version shortly incorporating the change.
Summary
The DNS resolver in Leopard has been changed to first attempt SRV requests for lookups initiated by the getaddrinfo() function. If the user’s DNS server drops these requests the DNS lookup may take an extended period of time to complete (30 seconds to several minutes) as Leopard tries different domain requests and eventually falls back to making an A record request. This can result in application freezes or timeouts, as was occurring with Jungle Disk.
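One rough way to check whether a machine is affected is to time a single getaddrinfo() call. The following is a minimal Python 3 sketch, not part of Jungle Disk; the function name is made up for this illustration, and it relies on Python's socket.getaddrinfo, which hands the lookup to the system getaddrinfo().

```python
import socket
import time


def time_getaddrinfo(host, port):
    """Return (seconds elapsed, number of addresses) for one getaddrinfo() call.

    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return time.monotonic() - start, len(results)
```

Calling time_getaddrinfo("s3.amazonaws.com", 80) on both the working and the failing network and comparing the two times is one way to spot the delay. Results may be cached, so repeat runs can look faster than the first.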
Details
There are two primary BSD APIs for domain name resolution - gethostbyname() and getaddrinfo(). getaddrinfo() is the more robust API, with better support for things like IPv6. Jungle Disk uses libcurl for HTTP access (as do many other apps), which by default uses getaddrinfo() when compiled with IPv6 support.
On Mac OS X 10.4 (Tiger) both APIs perform similarly - they do an A record lookup of the provided domain. On 10.5 (Leopard) the getaddrinfo() API attempts an SRV lookup first when provided with a well-known port. For example, when performing a lookup for “s3.amazonaws.com” on port 80, you will first see requests for:
- SRV _http._tcp.s3.amazonaws.com
- SRV _http._tcp.s3.amazonaws.com.yourlocaldomain.com
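The pattern in these two names can be sketched in a few lines of Python 3. This only models the names we observed, not Leopard's actual resolver logic. The function name and the port table are hypothetical, and the entries other than port 80 are assumptions rather than something we captured.

```python
# Hypothetical, abbreviated table of well-known TCP ports and service names.
# Only the port 80 entry corresponds to traffic described in this post.
WELL_KNOWN_TCP_SERVICES = {80: "http", 443: "https", 21: "ftp"}


def srv_query_names(host, port, search_domains=()):
    """List the SRV names for host:port in the order described above.

    Returns an empty list when the port is not in the table.
    """
    service = WELL_KNOWN_TCP_SERVICES.get(port)
    if service is None:
        return []
    base = f"_{service}._tcp.{host}"
    # First the name as given, then once more per local search domain.
    return [base] + [f"{base}.{domain}" for domain in search_domains]
```

With the arguments "s3.amazonaws.com", 80 and ["yourlocaldomain.com"], this returns the two names listed above.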
Depending on your DNS server, you will get one of three possible responses:
- The correct SRV response (in this case a CNAME record for s3-directional-w.amazonaws.com)
- NXDOMAIN (if your server does not support or understand SRV requests)
- No response at all
In the second case, Leopard will fall back to an A record query after the SRV requests fail, and although there is a slight delay, it is not generally visible to the user. In the last case, Leopard will retry the query several times over a period of several minutes before finally falling back to an A record query. During this retry period the application will appear frozen or unresponsive (depending on which thread the lookup occurs on).
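Because the freeze depends on which thread performs the lookup, one general mitigation for application developers is to run the lookup on a worker thread and stop waiting after a deadline. This is not the change we made in Jungle Disk. The Python 3 sketch below uses hypothetical names, and stalled_resolver is only a stand-in that simulates a DNS server dropping queries.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as LookupTimeout


def resolve_with_deadline(host, port, deadline_seconds, resolver=socket.getaddrinfo):
    """Run the lookup on a worker thread; return None if it misses the deadline."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(resolver, host, port)
    try:
        return future.result(timeout=deadline_seconds)
    except LookupTimeout:
        return None
    finally:
        # Do not block on the worker; a stalled lookup finishes in the background.
        executor.shutdown(wait=False)


def stalled_resolver(host, port, stall_seconds=0.5):
    """Hypothetical stand-in for a lookup stuck behind an unresponsive DNS server."""
    time.sleep(stall_seconds)
    return []
```

Note that the lookup itself is not cancelled: the caller simply stops waiting, so the interface stays responsive while the worker thread runs to completion. A call such as resolve_with_deadline("s3.amazonaws.com", 80, 0.05, resolver=stalled_resolver) returns None rather than blocking for the full stall.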
It’s important to note that this problem does not appear to be a bug in Leopard - it’s caused by old, buggy, or misconfigured DNS servers. The change in Leopard to use the latest IETF recommendations for DNS lookups is simply bringing the DNS server problem to the surface. It’s unclear how many users or applications will be affected by this change, since it only appears with some DNS servers and only for applications using getaddrinfo() (most applications still use gethostbyname()). For many users these days, their DNS server is actually their home router, which then proxies the request (possibly with a local cache) - so updating router firmware may address the issue.
To work around the issue in Jungle Disk we’ve switched off IPv6 support in libcurl, which changes it to use the gethostbyname() function. It’s not clear if there is a way to disable SRV lookups system-wide on Leopard to fix other applications using getaddrinfo(). Anyone with further information on this issue is encouraged to post in the comments.
If you’d like to see the behavior for yourself, try the following test on Tiger and Leopard:
- Open up two terminal windows
- In one window, run “sudo tcpdump port 53”
- In the second window, run “curl http://s3.amazonaws.com” (or another domain)
On Tiger you will see an initial request for “A? s3.amazonaws.com” and a reply. On Leopard you will see a request for “SRV? _http._tcp.s3.amazonaws.com”. Depending on your DNS server you will then see either the correct response, an NXDOMAIN response, or a series of timeouts and retries.
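If you would rather query a DNS server directly than watch tcpdump, the Python 3 sketch below sends one SRV query over UDP and reports which of the three outcomes occurred. All function names are hypothetical, the server address is one you supply (for example, your router), and “answered” here only means the server replied without an error code, which may still include no SRV records.

```python
import socket
import struct

SRV_TYPE = 33        # DNS record type SRV
IN_CLASS = 1         # DNS class IN
RCODE_NXDOMAIN = 3   # response code for "no such domain"


def build_srv_query(name, query_id=0x1234):
    """Encode a single-question DNS query for the SRV record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", SRV_TYPE, IN_CLASS)


def classify_reply(reply):
    """Map a raw DNS reply to an outcome using the response code in its header."""
    _query_id, flags = struct.unpack("!HH", reply[:4])
    rcode = flags & 0x000F
    if rcode == 0:
        return "answered"
    if rcode == RCODE_NXDOMAIN:
        return "NXDOMAIN"
    return f"error rcode {rcode}"


def probe_srv(server_ip, name, timeout_seconds=5.0):
    """Send one SRV query over UDP and report how the server reacted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_seconds)
        sock.sendto(build_srv_query(name), (server_ip, 53))
        try:
            reply, _addr = sock.recvfrom(512)
        except socket.timeout:
            return "no response"
    return classify_reply(reply)
```

Calling probe_srv with your DNS server's address and the name “_http._tcp.s3.amazonaws.com” should return “no response” for a server that drops SRV queries, which is the case that causes the long delay.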


