Some very nice broken IPv6 networks at Google and Akamai (Was: Some very nice IPv6 growth as measured by Google)

Tore Anderson tore at fud.no
Sun Nov 9 12:00:56 CET 2014


* Jeroen Massar

> On 2014-11-08 18:38, Tore Anderson wrote:
> > Yannis: «We're enabling IPv6 on our CPEs»
> > Jeroen: «And then getting broken connectivity to Google»
> > 
> > I'm not a native speaker of English, but I struggle to understand it
> > any other way than you're saying there's something broken about
> > Yannis' deployment. I mean, your reply wasn't even a standalone
> > statement, but a continuation of Yannis' sentence. :-P
> 
> That statement is correct though. As Google and Akamai IPv6 are
> currently broken, enabling IPv6 thus breaks connectivity to those
> sites.

Only if Google and Akamai are universally broken, which does not seem
to have been the case. I tested Google from the RING at 23:20 UTC
yesterday:

redpilllinpro@redpilllinpro01:~$ ring-all -t 120 -n 0 'wget -q -6 --timeout=10 -O /dev/null https://lh6.googleusercontent.com/-msg_m1V-b-Y/Ufo23yPxnXI/AAAAAAAAAMw/Mv5WbEC_xzc/w387-h688-no/13%2B-%2B1 && echo OK || echo FAILED'  | egrep '(OK|FAILED)$'| sort | uniq -c
     10 FAILED
    255 OK

And Akamai just now (10:30 UTC):

redpilllinpro@redpilllinpro01:~$ ring-all -t 120 -n 0 'wget -q -6 --header "User-Agent: foo" --timeout=10 -O /dev/null http://www.akamai.com/images/img/banners/entertainment-home-page-banner-932x251.jpg && echo OK || echo FAILED'  | egrep '(OK|FAILED)$'| sort | uniq -c
     10 FAILED
    252 OK

Both files are comfortably larger than 1500 bytes, so the transfers do
involve full-sized packets. Note that (some of) the FAILED results might
be explained by the RING node in question having generally defective
IPv6 connectivity, so they are not necessarily Akamai/Google-specific.

I'll investigate the failing nodes further and let you know if I find
something that points to Google/Akamai-specific problems.
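
The obvious first check on a failing node is whether its IPv6
connectivity is broken in general or only towards these sites; roughly
something along these lines (target names and sizes are just examples):

# Can the node push full-sized (1500-byte) IPv6 packets at all?
# (1452 bytes of payload + 8 bytes ICMPv6 + 40 bytes IPv6 = 1500)
$ ping6 -M do -s 1452 -c 3 www.google.com
# Does path MTU discovery towards the site look sane?
$ tracepath6 -n www.akamai.com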

> No, PMTUD is fine in both IPv4 and IPv6.
> 
> What is broken is people wrongly recommending to break and/or
> filtering ICMP and thus indeed breaking PMTUD.

There's a critical mass of broken PMTUD on the internet (for whatever
reason). It does not matter whose fault it is; the end result is the
same - the mechanism cannot be relied upon if you actually care about
service quality.

From where I'm sitting, Google is advertising me an IPv6 TCP MSS of
1386. That speaks volumes. An MSS of 1386 implies a 1446-byte MTU
(1386 + 40 bytes of IPv6 header + 20 bytes of TCP header), and I don't
believe for a second that my local Google cluster is on links with an
MTU of 1446; the clamped TCP MSS must have been configured
intentionally, and the only reason I can think of for doing so is to
avoid depending on PMTUD.
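
For reference, the advertised MSS is easy to confirm from the client
side. A rough sketch (the interface name is an assumption - use your
uplink - and any Google URL will do):

# In one window: capture the IPv6 handshake towards Google
$ sudo tcpdump -c 4 -ni eth0 'ip6 and tcp port 443'
# In another window: open a connection; the SYN-ACK from Google will
# show the advertised MSS, e.g. «options [mss 1386, ...]»
$ wget -q -6 -O /dev/null https://www.google.com/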

What works fine in theory sometimes fails operationally (cf. 6to4).
Insisting that there is no problem because it's just everyone else who
keeps screwing it up doesn't change operational realities.

> I also have to note that in the 10+ years of having IPv6 we rarely saw
> PMTU issues, and if we did, contacting the site that was filtering
> fixed the issue.

Looking at it from the content side, users on IPv6 tunnels are a tiny,
tiny minority, yet they still manage to be responsible for a majority of
trouble reports. Our stuff reacts to ICMPv6 PTBs, so it's not *all*
tunnel users that get in trouble at the same time; it's just that
they're susceptible to problems such as:

* Dropping ICMPv6 PTBs emitted by their CPE/tunnel ingress in their
  computer's personal/local firewall (a minimal allow rule is sketched
  further below).
* The ISP's tunnel ingress router rate-limiting ICMPv6 generation. For
  example, Juniper has a hard 50 pps ICMP generation limit per FPC, and
  at least one Cisco platform defaults to «100/10». Given enough traffic
  on the tunnel router, this limit will be exceeded more or less
  continuously. See the thread «MTU handling in 6rd deployments», btw.

Native users are immune to these problems, because they do not need to
rely on PMTUD.
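
To illustrate the first point in the list above: on a Linux host, a
single rule inserted ahead of any DROP rules is enough to keep PTBs
flowing (a generic sketch, not taken from any particular CPE or
firewall product):

# Accept Packet Too Big messages before anything else can drop them
$ sudo ip6tables -I INPUT 1 -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT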

> The two 'workarounds' you mention are all on the *USER* side (RA MTU)
> or in-network, where you do not know if the *USER* has a smaller MTU.

LAN RA MTU, yes. TCP MSS, no - it can be done in the ISP's tunnel
router.
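
On a Linux-based tunnel router that is a single rule. A sketch only -
the interface name and the usual 6rd MTU of 1480 are assumptions, and
production 6rd deployments are typically on vendor gear with an
equivalent knob:

# Rewrite the MSS in TCP SYNs forwarded out the 6rd tunnel so that
# full-sized segments fit within the 1480-byte tunnel MTU (1480 - 60 = 1420)
$ sudo ip6tables -t mangle -A FORWARD -o tun6rd -p tcp \
    --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1420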

> Hence touching it in the network is a no-no.

It appears to me that the ISPs that are deploying tunnels (6RD) for
their users consider these a "yes-yes". Presumably because they've
realised that reducing reliance on PMTUD is in their customers' best
interest, as it gives the best user experience.

Is there *any* ISP in the world that does 6RD that does *not* do TCP MSS
clamping and/or reduced LAN RA MTUs? (Or, for that matter, does IPv4
through PPPoE and does not do TCP MSS clamping?)

For what it's worth, the vast majority of tunneled IPv6 traffic we see
comes from ISPs with 6RD, which generally works fine due to these
workarounds. Thankfully.

> > «this must be a major issue for everybody using IPv6 tunnels»
> > «MTU 1480 MSS 1220 = fix»
> > «the 1480MTU and 1220MSS numbers worked for my pfsense firewall»
> > «The only thing that worked here is 1280 MTU / 1220 MSS»
> > «clamping the MSS to 1220 seems to have fixed the problem for me»
> > «I changed the MSS setting [...] for the moment Google pages are
> > loading much better»
> > 
> > This is all perfectly consistent with common PMTUD malfunctioning /
> > tunnel suckage.
> 
> NOTHING to do with tunnels, everything to do with somebody not
> understanding PMTUD and breaking it, be that on purpose or not.

It has everything to do with tunnels, because tunnels depend on PMTUD
working correctly: the tunnel MTU is below 1500 (1480 for typical
6in4/6rd), so full-sized packets from the content side can only get
through if the PTBs generated at the tunnel ingress actually make it
back. When PMTUD doesn't work, the entire setup fails.

It doesn't matter whose fault it is; the end result is the same.

> Tested failing also on MTU=1500 links
> 
> > (Assuming there's only a single problem at play here.)
> 
> That is indeed an assumption, as we can't see the Google/Akamai end of
> the connection.

If you see failures on MTU=1500 links, I think there must be at least
two distinct problems at play. When users report «MTU 1480 MSS 1220 =
fix», that is extremely indicative of a PMTUD problem - an MSS of 1220
corresponds to 1280-byte packets, the IPv6 minimum MTU, so no PTB ever
needs to be generated or delivered.

With MTU=1500 links, on the other hand, PMTUD isn't necessary, so there
must be some other root cause.

> As you are wearing the hat of a hoster though, you should as there are
> eyeballs that you want to reach that are behind tunnels and other
> linktypes with a lower MTU than 1500.
> 
> Hence, I can only suggest to do your testing also from behind a node
> that has a lower MTU. eg by configuring a monitoring node with a
> tunnel into your own network and setting the MTU lower, or do the
> MTU-in-RA-trick for that interface.

The problem is that even if tunneled user A has no problems with PMTUD,
tunneled user B might still have them. So testing from a node that has a
lower MTU can't tell me with any degree of certainty that «tunneled
users are fine».

Tore

