it must be that time of year.  dns is driving folks bats.  :-)

i've been spending some time looking at why ndb/dns fails.  as is well known,
there are very long-standing locking problems.  in the past, i've gotten hung up on
those and not made any progress.  while imho, the long-term strategy should be
to replace ndb/dns with an easier-to-maintain structure, i only have a few weeks
to fix as much as possible.  so i decided to see if there were simple things we could
do to improve things.

geoff has made a few big improvements.  some sites which were broken for a long
time are now working.  tomshardware.com is one that i've used as a test, and it
finally works.  (although the results don't seem worth the effort. ☺)

but there are a number of other lookups that are still broken for me, and
there seem to be some straightforward reasons that i think i've fixed:

1.  we're sending the RD (recursion desired) bit when we ourselves are acting as
a recursive server.  this looks okay by the standard, but many servers return Srvfail
(code 2, Rserver in the dns code) rather than ignoring this bit.  turning this off
helps a lot (example: ocsp.netsolssl.com).

2.  we're ignoring status codes that we should be treating as fatal (like Srvfail).

3.  we're not using edns0.  this is kind of a sticky bit.  some servers insist on sending
enormous answers but don't answer via tcp.  on the other hand, some servers insist
on sending enormous answers, but return nasty errors when given edns0 queries.
what seems to work best is to send udp/no edns0, udp/edns0 and finally tcp.

4.  we get confused attaching the name servers to an answer for an out-of-bailiwick
cname record.  (this is largely a problem with logging, but has the potential to
corrupt the database.)
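
for reference, a minimal sketch of what items 1 and 3 look like on the wire. this is python, not the plan 9 c in ndb/dns, and build_query and encode_name are made-up names for illustration. the 4096-byte udp payload size is an assumption, not a value taken from the fix.

```python
import struct

RD = 0x0100  # recursion-desired bit in the header flags word

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.split("."):
        if label:  # skip the empty label left by a trailing dot
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name, qid, qtype=1, rd=False, edns0=False, udp_size=4096):
    """Build one DNS query; RD is clear unless the caller asks for it."""
    flags = RD if rd else 0
    arcount = 1 if edns0 else 0
    msg = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, arcount)
    msg += encode_name(name) + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    if edns0:
        # OPT pseudo-RR: root name, type 41, class carries the udp payload
        # size, ttl 0, zero-length rdata
        msg += b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return msg
```

with rd=False the flags word goes out as zero, which is what a recursive resolver talking to an authoritative server would send. edns0=True only appends the OPT record to the additional section.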

if anyone would like to try a 386 executable (amd64 available on request),
i've put a copy at

http://ftp.quanstro.net/other/^(dns dnsdebug)

i'd be happy to hear of any dns lookup problems.  please let me know
which version of dns you're using.

thanks,
- erik
quanstro (3877)
1/16/2012 5:13:59 PM
that one's not inherently fatal, in the sense that it shouldn't stop the
search.
On 16 January 2012 17:13, erik quanstrom <quanstro@quanstro.net> wrote:
> 2. we're ignoring status codes that we should be treating as fatal (like
> Srvfail)
charles.forsyth (162)
1/16/2012 6:02:30 PM
On Mon Jan 16 13:03:38 EST 2012, charles.forsyth@gmail.com wrote:
> that one's not inherently fatal, in the sense that it shouldn't stop the
> search.
>
> On 16 January 2012 17:13, erik quanstrom <quanstro@quanstro.net> wrote:
>
> > 2. we're ignoring status codes that we should be treating as fatal (like
> > Srvfail)
sorry, i wasn't clear enough.  we were persisting in asking the same question in the
same manner of a server returning srvfail, thus preventing us from asking
the same question in a different way, or of a different server.
we persisted long enough that we timed out the query before asking a reasonable
question of a capable server.
- erik
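
a rough sketch of the behaviour described above, in python rather than the ndb/dns c code. each server is asked at most once per manner of asking, and a Srvfail moves on instead of repeating the query. resolve and the ask callback are hypothetical names. treating a format error (rcode 1) from an edns0 query as "try the next way" is an assumption, since the thread doesn't say which errors edns0 provokes.

```python
FORMERR = 1
SERVFAIL = 2  # "Rserver" in the plan 9 dns code

# order suggested in the thread: plain udp, then udp with edns0, then tcp
MODES = ("udp", "udp+edns0", "tcp")

def resolve(servers, ask):
    """Try each server in each mode once and return the first usable reply.

    ask(server, mode) is a hypothetical transport callback returning
    (rcode, truncated, rrs), or None on timeout.
    """
    for server in servers:
        for mode in MODES:
            reply = ask(server, mode)
            if reply is None:
                continue  # timeout: ask in a different way
            rcode, truncated, rrs = reply
            if rcode == SERVFAIL:
                continue  # discard any RRs and never repeat this query
            if mode == "udp+edns0" and rcode == FORMERR:
                continue  # assumed: server choked on the OPT record
            if truncated and mode != "tcp":
                continue  # answer didn't fit; escalate
            return server, mode, rcode, rrs
    return None  # nobody gave a usable answer
```

because the loops are bounded by len(servers) * len(MODES), a server that always returns Srvfail costs at most three queries before a different server gets asked.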
quanstro (3877)
1/16/2012 6:05:38 PM
On Mon Jan 16 13:03:20 EST 2012, charles.forsyth@gmail.com wrote:
> that one's not inherently fatal, in the sense that it shouldn't stop the
> search.
>
> On 16 January 2012 17:13, erik quanstrom <quanstro@quanstro.net> wrote:
>
> > 2. we're ignoring status codes that we should be treating as fatal (like
> > Srvfail)
also, i forgot that it's possible to return Srvfail and return some RRs. these
all need to be ignored. we weren't ignoring them in the past.
- erik
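
a small sketch of that last point, again in python and not the actual ndb/dns code: read the rcode from the 12-byte header and, on Srvfail, report zero records in every section so the caller never looks at whatever RRs came along. usable_counts is a made-up name.

```python
import struct

SERVFAIL = 2

def usable_counts(msg):
    """Return (rcode, ancount, nscount, arcount) the caller should act on."""
    if len(msg) < 12:
        raise ValueError("short dns header")
    _id, flags, _qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    rcode = flags & 0x000F  # low four bits of the flags word
    if rcode == SERVFAIL:
        return rcode, 0, 0, 0  # ignore every RR in a Srvfail reply
    return rcode, an, ns, ar
```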
quanstro3716 (244)
1/16/2012 6:07:49 PM
ah.
On 16 January 2012 18:05, erik quanstrom <quanstro@quanstro.net> wrote:
> we were persisting in asking the same question in the
> same manner of a server returning srvfail, thus preventing us from asking
> the same question in a different way, or of a different server.
>
charles.forsyth (162)
1/16/2012 6:13:13 PM
On Mon Jan 16 13:14:01 EST 2012, charles.forsyth@gmail.com wrote:
> ah.
>
> On 16 January 2012 18:05, erik quanstrom <quanstro@quanstro.net> wrote:
>
> > we were persisting in asking the same question in the
> > same manner of a server returning srvfail, thus preventing us from asking
> > the same question in a different way, or of a different server.
thanks for asking the question. the way i wrote it wasn't very clear.
here are just a few domains that i've had trouble with that work for
me now:

reject queries with the RD flag:
	ocsp.netsolssl.com
	ocsp.trust-secure.com

hangs:
	c.l.britecove.com
	world-100.bc.gapx.yahoodns.net

if you have a linux box, dig +trace is similar to dnsdebug. if dig +trace
fails for a query, there's no point in debugging it.
- erik
quanstro (3877)
1/16/2012 6:19:34 PM