[9fans] small dns improvements

it must be that time of year.  dns is driving folks bats.  :-)

i've been spending some time looking at why ndb/dns fails.  as is well known,
there are very long-standing locking problems.  in the past, i've gotten hung
up on those and not made any progress.  while imho the long-term strategy
should be to replace ndb/dns with an easier-to-maintain structure, i only have
a few weeks to fix as much as possible.  so i decided to see if there were
simple changes we could make to improve things.

geoff has made a few big improvements.  some sites which were broken for a
long time are now working.  tomshardware.com is one that i've used as a test,
and it finally works.  (although the results don't seem worth the effort.  ☺)

but a number of other lookups are still broken for me, and there seem to be
some straightforward reasons for that, which i think i've fixed:

1.  we're sending the RD (recursion desired) bit when we ourselves are acting
as a recursive server.  this looks okay by the standard, but many servers
return Srvfail (code 2, Rserver in the dns code) rather than ignoring this
bit.  turning this off helps a lot (example: ocsp.netsolssl.com).

2.  we're ignoring status codes that we should be treating as fatal (like
Srvfail).

3.  we're not using edns0.  this is kind of a sticky bit.  some servers insist
on sending enormous answers but don't answer via tcp.  on the other hand, some
servers insist on sending enormous answers, but return nasty errors when given
edns0 queries.  what seems to work best is to send udp/no edns0, then
udp/edns0, and finally tcp.

4.  we get confused attaching the name servers to an answer for an
out-of-bailiwick cname record.  (this is largely a problem with logging, but
has the potential to corrupt the database.)

if anyone would like to try a 386 executable (amd64 available on request),
i've put a copy at
i've put a copy at
	http://ftp.quanstro.net/other/^(dns dnsdebug)

i'd be happy to hear of any dns lookup problems.  please let me know
which version of dns you're using.

thanks,

- erik

quanstro, 1/16/2012 5:13:59 PM

that one's not inherently fatal, in the sense that it shouldn't stop the
search.

On 16 January 2012 17:13, erik quanstrom <quanstro@quanstro.net> wrote:

> 2.  we're ignoring status codes that we should be treating as fatal (like
> Srvfail)

charles.forsyth, 1/16/2012 6:02:30 PM


On Mon Jan 16 13:03:38 EST 2012, charles.forsyth@gmail.com wrote:

> that one's not inherently fatal, in the sense that it shouldn't stop the
> search.
> 
> On 16 January 2012 17:13, erik quanstrom <quanstro@quanstro.net> wrote:
> 
> > 2.  we're ignoring status codes that we should be treating as fatal (like
> > Srvfail)

not clear enough.  we were persisting in asking the same question in the
same manner of a server returning srvfail, thus preventing us from asking
the same question in a different way, or of a different server.

we persisted long enough that we timed out the query before asking a reasonable
question of a capable server.

- erik

quanstro, 1/16/2012 6:05:38 PM

On Mon Jan 16 13:03:20 EST 2012, charles.forsyth@gmail.com wrote:

> that one's not inherently fatal, in the sense that it shouldn't stop the
> search.
> 
> On 16 January 2012 17:13, erik quanstrom <quanstro@quanstro.net> wrote:
> 
> > 2.  we're ignoring status codes that we should be treating as fatal (like
> > Srvfail)

also, i forgot that a server can return Srvfail along with some RRs.  these
all need to be ignored.  we weren't ignoring them in the past.

- erik

quanstro3716, 1/16/2012 6:07:49 PM

ah.

On 16 January 2012 18:05, erik quanstrom <quanstro@quanstro.net> wrote:

> we were persisting in asking the same question in the
> same manner of a server returning srvfail, thus preventing us from asking
> the same question in a different way, or of a different server.
>

charles.forsyth, 1/16/2012 6:13:13 PM

On Mon Jan 16 13:14:01 EST 2012, charles.forsyth@gmail.com wrote:

> ah.
> 
> On 16 January 2012 18:05, erik quanstrom <quanstro@quanstro.net> wrote:
> 
> > we were persisting in asking the same question in the
> > same manner of a server returning srvfail, thus preventing us from asking
> > the same question in a different way, or of a different server.

thanks for asking the question.  the way i wrote it wasn't very clear.

here are just a few domains that i've had trouble with that work for
me now:

reject queries with the RD flag
	ocsp.netsolssl.com
	ocsp.trust-secure.com

hangs
	c.l.britecove.com
	world-100.bc.gapx.yahoodns.net

if you have a linux box, dig +trace is similar to dnsdebug.  if dig +trace
fails for a query, there's no point in debugging it.

- erik

quanstro, 1/16/2012 6:19:34 PM
