MTU = 1280 everywhere? / QUIC (Was: Some very nice ...)

Tue Nov 11 10:42:57 CET 2014

On 2014-11-11 02:18, Lorenzo Colitti wrote:
> On Sun, Nov 9, 2014 at 8:10 PM, Jeroen Massar <jeroen at massar.ch
> <mailto:jeroen at massar.ch>> wrote:
> 
>     > Another fun question is why folks are relying on PMTUD instead of
>     > adjusting their MTU settings (e.g., via RAs).
> 
>     Because why would anybody want to penalize their INTERNAL network?
> 
> 
> Lowering the MTU from 1500 to 1280 is only a 1% penalty in throughput.
> I'd argue that that 1% is way less important than the latency penalty.

1%? You do mean at least: (1500 - 1280 = 220) / 1500 * 100% = 14.7%

And that is only if looking at the packet size.

But that includes an IPv6 and TCP header, thus another 20 + 20 is lost
for every packet (ignoring TCP options etc).

Thus we can either send:
1500 - 40 = 1460
or
1280 - 40 = 1240

1460 / 1240 * 100 = 117.74193548 ~= 118% => 18% bigger packets

According to a random google search for "youtube video size" apparently
a 25 minute clip is about 200 MiB.

Hence, with optimal splitting you are sending at MTU = 1500 and at
MTU = 1280 respectively:
(200 * 1024 * 1024) / 1460 = 143640+X packets
(200 * 1024 * 1024) / 1240 = 169125+X packets

Ignoring of course the small packets for ACKs and other such stuff.

(169125 / 143640) * 100 = 117.7422723 ~= 118% => 18% more packets

Hence 18% more packets than with an MTU of 1500.

But, let's look at your "small" page: https://www.google.com

According to Chrome, that is 441 KiB in 6.85 seconds here (that is
excluding ads/socialcrap, thank you AdBlock, Disconnect & Ghostery)

(441 * 1024) / 1460 = 309+X packets
(441 * 1024) / 1240 = 364+X packets

(364 / 309) * 100 = 117.79935275 ~= 118% => 18% more packets

Like magic, the same 18%, which matches the 18% larger payload size.

(ignoring the fact that those were 23 separate requests, hence all the
TCP overhead of SYN/ACK packets, which are mostly empty, etc).

Thus maybe your 1% is a worst-case situation where not all packets are
being 'filled' and my 18% is the best-case situation where all packets
are fully 'filled'.
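
For anybody who wants to redo the packet counts with their own numbers,
here is a minimal sketch in Python. It assumes every packet is completely
filled and uses the 40 bytes of per-packet header overhead from the
calculation in this mail; the function names are made up for this example.
An IPv6 header alone is already 40 bytes, so pass header_bytes=60 if you
want IPv6 + TCP without options.

```python
import math

# Per-packet header overhead as used in this mail (an assumption carried over;
# use 60 for a 40-byte IPv6 header plus a 20-byte TCP header).
HEADER_BYTES = 40


def packets_needed(payload_bytes, mtu, header_bytes=HEADER_BYTES):
    """Packets needed to carry payload_bytes when every packet is full."""
    per_packet = mtu - header_bytes
    return math.ceil(payload_bytes / per_packet)


def extra_packets_percent(payload_bytes, small_mtu=1280, large_mtu=1500,
                          header_bytes=HEADER_BYTES):
    """How many percent more packets small_mtu needs compared to large_mtu."""
    small = packets_needed(payload_bytes, small_mtu, header_bytes)
    large = packets_needed(payload_bytes, large_mtu, header_bytes)
    return (small / large - 1) * 100


if __name__ == "__main__":
    transfers = {
        "200 MiB video": 200 * 1024 * 1024,
        "441 KiB page": 441 * 1024,
    }
    for name, size in transfers.items():
        print(name,
              packets_needed(size, 1500),
              packets_needed(size, 1280),
              f"{extra_packets_percent(size):.1f}% more packets")
```

The counts come out one higher than the "+X" figures above because the
last, partially filled packet is rounded up; ACKs and handshakes are still
ignored.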

As the problem with data transfer is the speed of light in a lot of
cases, having the need to send more packets thus hurts. Sending 18% more
packets thus definitely impacts your speed of transfer.

Thus limiting the whole world to the same low MTU hurts the whole world.

Can we thus please NOT limit the world at 1280 and actually find the
culprits and resolve this properly?

>     Because you can't know if that is always the case.
> 
> 
> I'm not saying that PMTUD shouldn't work. I'm saying that if you know
> that your Internet connection has an MTU of 1280, setting an MTU of 1500
> on your host is a bad idea, because you know for sure that you will
> experience a 1-RTT delay every time you talk to a new destination.

Please realize that while Google might think they are the center of the
Internet, a lot of people use *LOCAL* resources, that are not accessed
over their puny WAN link.

>     As you work at Google, ever heard of this QUIC protocol that does not
> 
>     use TCP?
> 
>     Maybe you want to ask your colleagues about that :)
> 
> 
> Does QUIC work from behind your tunnel? If so, maybe my colleagues have
> already solved that problem.

From:
https://docs.google.com/document/d/1RNHkx_VvKWyWg6Lr8SZ-saqsQx7rFV-ev2jRFUoVD34/mobilebasic
"UDP PACKET FRAGMENTATION" but IPv6 does not fragment in the network...
No mention of handling ICMP PTBs either, let alone any mention of IPv6.

So it is at least not mentioned anywhere in the "design document".

But there are apparently things like 'resending important packets' or
even sending them multiple times etc.

But what if you could just avoid the loss altogether instead of doing
guesswork and retransmissions?

Looking at Chromium source directly:
https://src.chromium.org/svn/trunk/src/net/quic/quic_protocol.h

8<----------------------------
// Default and initial maximum size in bytes of a QUIC packet.
const QuicByteCount kDefaultMaxPacketSize = 1350;

// The maximum packet size of any QUIC packet, based on ethernet's max size,
// minus the IP and UDP headers. IPv6 has a 40 byte header, UPD adds an
// additional 8 bytes.  This is a total overhead of 48 bytes.  Ethernet's
// max packet size is 1500 bytes,  1500 - 48 = 1452.
const QuicByteCount kMaxPacketSize = 1452;
// Default maximum packet size used in Linux TCP implementations.
const QuicByteCount kDefaultTCPMSS = 1460;
----------------------------->8

(Btw, note the "UPD" typo there, too minor to report that as a 'bug' ;)

Seems it was 1200, but got upped to a magic 1350:
https://codereview.chromium.org/427673005/
https://codereview.chromium.org/420313005/

Not much background there; but those sizes are likely

And just in case, checking the current source:

$ git clone https://chromium.googlesource.com/chromium/src.git
...
remote: Sending approximately 2.54 GiB ...
...
Receiving objects: 100% (3272908/3272908), 2.53 GiB | 13.56 MiB/s, done.
...

(Wireless home networks behind tunnels are so slow...)

Going through that, it does not seem to let the OS handle fragmentation,
as it seems they are just stuffing packets up to 1350.

As in "solved", not really... ignoring is the right word.
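
To make the size problem concrete, a small sketch in Python. It assumes
that the QUIC constants quoted above count the UDP payload, as the comment
in that header suggests, that IPv6 is used without extension headers
(40 byte IPv6 header + 8 byte UDP header), and the function names are
made up for this example.

```python
IPV6_HEADER = 40      # fixed IPv6 header, no extension headers assumed
UDP_HEADER = 8
IPV6_MIN_MTU = 1280   # minimum link MTU that IPv6 requires


def on_wire_size(quic_packet_bytes):
    """Size of the IPv6 packet that carries one QUIC packet of this size."""
    return quic_packet_bytes + IPV6_HEADER + UDP_HEADER


def fits(quic_packet_bytes, path_mtu):
    """True if the packet can cross a path with this MTU unfragmented."""
    return on_wire_size(quic_packet_bytes) <= path_mtu


def max_quic_packet(path_mtu):
    """Largest QUIC packet that still fits in one IPv6 packet on this path."""
    return path_mtu - IPV6_HEADER - UDP_HEADER


if __name__ == "__main__":
    # 1200 = old default, 1350 = kDefaultMaxPacketSize, 1452 = kMaxPacketSize
    for size in (1200, 1350, 1452):
        print(size, on_wire_size(size),
              "fits 1500:", fits(size, 1500),
              "fits 1280:", fits(size, IPV6_MIN_MTU))
    print("max QUIC packet at MTU 1280:", max_quic_packet(IPV6_MIN_MTU))
```

Under those assumptions a 1350 byte QUIC packet becomes 1398 bytes on the
wire: fine on a 1500 path, too big for a 1280 tunnel, while the old 1200
would have fit.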

Now of course, that code might not match at all what is used in Chrome
or the server side.

Bcc'd some folks who might actually know how that is handled without me
having to try and understand all the code (lots of stuff happening
there...).

>     > (Some parts of) Google infrastructure do not do
>     > PMTUD for the latency reasons above and for reasons similar to those
>     > listed
>     > in https://tools.ietf.org/html/draft-v6ops-jaeggli-pmtud-ecmp-problem-00 .
> 
>     As such, you are ON PURPOSE breaking PMTUD, instead trying to fix it
>     with some other bandaid.
> 
> 
> The draft explains some of the reasons why infrastructure is often built
> this way.

Yes, and it is great that it is documented. Thank you also for admitting
that that is the problem. And it is not something that anybody who does
not touch loadbalancers on a daily basis will think of either.

Hence why I don't think the original IPv6 design accounted for this kind
of setup, that is quite common nowadays, but was not back then.

Thus can we please PROPERLY fix this MTU/ICMPv6-PTB problem before we
stick the world to a 1280 MTU for as long as IPv6 will exist (which will
outlast our lifetimes hopefully...)?

See my proposal about stuffing the MTU in the flow-label that can avoid
that nightmare. You were one of many in the BCC, as you should be caring
about it since it affects your company's network a lot.

Greets,
 Jeroen