Interesting A10 GSLB interop problem
George Bonser
gbonser at seven.com
Tue Oct 25 19:15:17 CEST 2011
> >
> > The problem isn't so much the mishandling of the AAAA records, per se,
> > as it is the fact that the mishandling of them messes up future v4 A
> > record requests by clients using the same DNS server due to the
> > caching of the CNAME. I have even reduced the cname TTL to 10 seconds
> > but you still end up with any v4 clients that make a request to the
> > local DNS server getting scattered to the failover VIPs during that 10
> > second period after an AAAA request. That can be a substantial number
> > of requests.
> >
>
> What happens if you are replacing the A record, and not shifting to a
> CNAME? Surely it wouldn't reply to an AAAA request with an A record.
> I'm wondering if this is something specific to the usage of a CNAME.
>
>
> Jack
The idea here is that you have an A record that points to a VIP. If the VIP is down, the system returns a CNAME to a different name that acts as a fallback. In this specific case, what that CNAME does is scatter clients across the remaining VIPs that provide the service for a period of at least 2 hours.
So basically, each client has a VIP that serves its geographical area. If the service IP in its geographical area becomes unavailable, the client gets a CNAME with a TTL of 7200 seconds that round-robins it to one of the remaining IPs for the same service, outside its geographical area. So if the East Coast VIP fails, a client might get sent to the Midwest or the West Coast for two hours, and then it will try again. This prevents a service that has been down for a while from suddenly getting slammed by all of its users coming back at once; they come back gradually over a period of two hours once the service is restored.
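The failover policy above can be sketched as a toy model. This is not A10's implementation: the `Vip`/`Gslb` classes, the region names, the 10-second healthy TTL, the documentation-range addresses and `fallback.svc.example.com.` are all hypothetical; only the 7200-second failover TTL comes from the post.

```python
import itertools
from dataclasses import dataclass, field

# From the post: the failover CNAME carries a 7200 s (two-hour) TTL.
FAILOVER_TTL = 7200
# Hypothetical: TTL for a normal, healthy A answer.
NORMAL_TTL = 10
# Hypothetical name that the failover CNAME points at.
FALLBACK_NAME = "fallback.svc.example.com."


@dataclass
class Vip:
    region: str
    ipv4: str
    up: bool = True


@dataclass
class Gslb:
    """Toy model of the regional failover policy (not A10's code)."""
    vips: dict  # region -> Vip
    _rr: itertools.count = field(default_factory=itertools.count)

    def answer_a(self, region):
        """A query from a client whose resolver maps to `region`."""
        vip = self.vips[region]
        if vip.up:
            return ("A", vip.ipv4, NORMAL_TTL)
        # Local VIP is down: hand out the long-lived failover CNAME so the
        # client's resolver sticks with the fallback name for two hours.
        return ("CNAME", FALLBACK_NAME, FAILOVER_TTL)

    def answer_fallback(self):
        """A query for FALLBACK_NAME: round-robin over the VIPs still up."""
        live = [v for _, v in sorted(self.vips.items()) if v.up]
        vip = live[next(self._rr) % len(live)]
        return ("A", vip.ipv4, NORMAL_TTL)
```

With the East Coast VIP marked down, `answer_a("east")` returns the failover CNAME, and successive `answer_fallback()` calls alternate between the Midwest and West VIPs. Because each resolver caches the CNAME at a different moment, the two-hour expiries are spread out, which is what gives the gradual return described above.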
The problem is that if a client makes an AAAA request and there is no IPv6 address associated with the service VIP, the system handles it as if the VIP were down and gives out the CNAME, instead of handling it as if there simply isn't a v6 IP and returning NOERROR with a reference to the A record.
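This causes the client's DNS server to cache the 7200-second failover CNAME. Because a cached CNAME applies to the name regardless of the record type being asked for, all clients using that DNS server, including v4-only clients asking for the A record, go into failure-mitigation mode when there is no failure.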
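A minimal simulation of that cache interaction, assuming a toy resolver that caches only CNAMEs (keyed by owner name, not by query type) and simply returns the cached CNAME rather than chasing its target. `buggy_gslb`, `CachingResolver`, the names and the addresses are hypothetical stand-ins for the reported behavior, not real A10 or resolver APIs.

```python
# From the post: TTL of the failover CNAME.
FAILOVER_TTL = 7200
SERVICE = "svc.example.com."                 # hypothetical service name
FALLBACK_NAME = "fallback.svc.example.com."  # hypothetical failover target


def buggy_gslb(name, qtype):
    """Authoritative side as reported: a missing IPv6 address is treated
    exactly like a down VIP, so AAAA queries get the failover CNAME."""
    if qtype == "AAAA":
        return ("CNAME", FALLBACK_NAME, FAILOVER_TTL)
    return ("A", "192.0.2.10", 10)


class CachingResolver:
    """Toy recursive resolver that only caches CNAMEs.

    The cache is keyed by owner name, not by query type, so once a CNAME
    is cached it answers A queries as well as AAAA queries until expiry.
    """

    def __init__(self, authoritative):
        self.authoritative = authoritative
        self.cname_cache = {}  # owner name -> (target, expiry time)

    def resolve(self, name, qtype, now):
        hit = self.cname_cache.get(name)
        if hit is not None and now < hit[1]:
            return ("CNAME", hit[0])
        rtype, value, ttl = self.authoritative(name, qtype)
        if rtype == "CNAME":
            self.cname_cache[name] = (value, now + ttl)
        return (rtype, value)


r = CachingResolver(buggy_gslb)
print(r.resolve(SERVICE, "A", now=0))     # v4 client: normal A answer
print(r.resolve(SERVICE, "AAAA", now=5))  # dual-stack client: failover CNAME
print(r.resolve(SERVICE, "A", now=6))     # v4 client now on the failover path
```

The third lookup is the problem case: a v4-only client that never asked for AAAA gets the failover CNAME, and keeps getting it until the cached entry expires 7200 seconds after the AAAA query. Shortening the CNAME TTL, as tried in the quoted text, only narrows that window.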
What I believe the correct response should be is:
If an AAAA request arrives for a gslb service-ip that does not have an associated IPv6 address, it should get a NOERROR response with the A record in the "additional" section.
If an AAAA request arrives for a gslb service-ip that DOES have an associated IPv6 address and that service-ip is down, then it should get the CNAME.
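The two rules above can be sketched as an authoritative decision function. This is a hypothetical model, not A10 configuration or code: `ServiceIp`, `Response`, `gslb_answer`, the fallback name and the 10-second healthy TTL are illustrative assumptions. The rule-1 reply (NOERROR with an empty answer section) is what is usually called a NODATA response; the A record in the additional section follows the proposal above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

FAILOVER_TTL = 7200                          # from the post
FALLBACK_NAME = "fallback.svc.example.com."  # hypothetical

Record = Tuple[str, str, int]  # (type, value, ttl)


@dataclass
class ServiceIp:
    ipv4: str
    ipv6: Optional[str] = None
    up: bool = True
    ttl: int = 10  # hypothetical TTL for healthy answers


@dataclass
class Response:
    rcode: str
    answer: List[Record] = field(default_factory=list)
    additional: List[Record] = field(default_factory=list)


def gslb_answer(svc: ServiceIp, qtype: str) -> Response:
    """Hypothetical GSLB reply for A/AAAA queries, following the two rules."""
    if qtype not in ("A", "AAAA"):
        raise ValueError(f"unsupported qtype: {qtype}")
    if qtype == "AAAA" and svc.ipv6 is None:
        # Rule 1: no IPv6 address is not a failure. Reply NOERROR with an
        # empty answer section and the A record in the additional section.
        return Response("NOERROR", additional=[("A", svc.ipv4, svc.ttl)])
    if not svc.up:
        # Rule 2 (and the existing A behaviour): only a real outage gets
        # the long-lived failover CNAME.
        return Response("NOERROR", answer=[("CNAME", FALLBACK_NAME, FAILOVER_TTL)])
    addr = svc.ipv6 if qtype == "AAAA" else svc.ipv4
    return Response("NOERROR", answer=[(qtype, addr, svc.ttl)])
```

Since a v4-only service never returns a CNAME for AAAA under this logic, a resolver has nothing long-lived to cache from that query, and v4 clients sharing it keep getting their regional A record.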
More information about the ipv6-ops mailing list