Monday, June 23, 2008

Exim is a bit nuts

I'm using Nagios to monitor servers, but was having some trouble getting emails to exit the Ubuntu JeOS server that I had set up to run it under VMware. (Most of my stuff is Windows, and Nagios is a linux program). It turns out that a program called exim is used to send emails, and it's a bit crazy.

All attempts to send email to myself resulted in replies which contained the error:
all relevant MX records point to non-existent hosts
Thanks to this entry on the PkgExim4UserFAQ, I was able to get a clue

A probable cause for this might be that all MX records for the offending domain point to site local or link local IP addresses, which are ignored by the dnslookup router to protect from misconfigured external domains. The default configuration has relaxed checking for domains that the local system is configured to allow relaying to, so adding the offending domain to dc_relay_domains will most probably help. Please note that this entry might be necessary anyway to bypass relay control for the domains in question.

Please note that no domain on the public Internet should have MX records pointing to site local or link local IP addresses, so you might check your externally visible MX records.

If this doesn't help, try analyzing the output of exim -d -bt some.local.part@the.offending.domain.example

Well, I did the requisite test, to find this among the output

ignored host mail.xxxx.com [10.0.0.x]

Ignored host?? Clearly not the same as one that is non-existent. So... the first error was a lie.

All the relevant records pointed correctly to a very much alive and well host, but exim chose to ignore it because it was local.

In order to get around this, you have to follow the suggestion of Marc Haber and tell it that it is going to relay email for your local domain (which sounds like a very bad idea) in order to get it to work.

I don't know why they did it this way, but I'm posting this here to help others figure it out.