ipv6 next-hop link-local
Francis Dupont
Francis.Dupont at fdupont.fr
Sat Feb 19 14:56:08 CET 2011
In your previous mail you wrote:
> => can I kindly ask you to read the RFC before saying it is stupid?
Well, re-reading the RFC 3 times, and trying to fully understand it, I
need to modify this statement - this RFC is actually trying to clean up
the problems caused by architectural designs (so apologies to the
authors).
=> our goal was to fill the gap between BGP, which uses only global
addresses, and IGPs, which use only link-local addresses, and IMHO
if you follow what is written it should always work, even in the uncommon
case where a link-local address is needed.
It doesn't help, though, as it still says (section 3):
"A BGP speaker shall advertise to its peer in the Network Address of
Next Hop field the global IPv6 address of the next hop, potentially
followed by the link-local IPv6 address of the next hop.
=> note this means the common case is a global address and the
exception clarified in the next paragraph is a global address *and*
a link-local address.
...
The link-local address shall be included in the Next Hop field if and
only if the BGP speaker shares a common subnet with the entity
identified by the global IPv6 address carried in the Network Address
of Next Hop field and the peer the route is being advertised to."
well, there you go, and this is exactly what happened in the scenario
we've seen - link-local nexthop advertised, Cisco peers using the LL
next-hop, Juniper peers using the global next-hop.
=> according to your description, it seems the Cisco behavior is incorrect.
Global next-hop was working (obviously!, as otherwise the BGP session
would not have been established), link-local ND was broken - Juniper
peers worked, Cisco peers had black-holing.
(Unfortunately, I can't seem to find the text reference anymore that
says that receivers are basically free to decide which nexthop type to
use
=> there is no such freedom: the only thing allowed by the RFC is to
ignore a following link-local address when it is not needed (it is
formally a sender mistake but without a bad consequence if the receiver
is correct).
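A rough sketch of the receiver side may make this concrete. It assumes the RFC 2545 encoding, in which the Network Address of Next Hop field is 16 octets (global address only) or 32 octets (global address followed by a link-local address). The function name parse_next_hop is hypothetical and not taken from any real implementation.

```python
import ipaddress
from typing import Optional, Tuple


def parse_next_hop(
    field: bytes,
) -> Tuple[ipaddress.IPv6Address, Optional[ipaddress.IPv6Address]]:
    """Split a next hop field into (global, link_local or None).

    Hypothetical helper: 16 octets means a global address only,
    32 octets means a global address followed by a link-local one.
    """
    if len(field) == 16:
        return ipaddress.IPv6Address(field), None
    if len(field) == 32:
        global_addr = ipaddress.IPv6Address(field[:16])
        link_local = ipaddress.IPv6Address(field[16:])
        # The second address is expected to fall in fe80::/10.
        if not link_local.is_link_local:
            raise ValueError("second next hop address is not link-local")
        return global_addr, link_local
    raise ValueError(f"unexpected next hop length: {len(field)}")
```

A receiver that does not need the link-local address can simply drop the second element of the returned pair and forward via the global address.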
- RFC4760 seems to tell me that a conforming implementation must
only ever send a single next-hop in MP_REACH_NLRI, so maybe that was in
one of the previous versions of [BGP-4])
=> there is no document overruling RFC 2545 as far as I know.
After reading the RFC two more times, I seem to understand where the
initial idea comes from - networks that share eBGP routers and "other
stuff", and where you want to send ICMP redirects and/or RIPng updates
with a next-hop pointing to "other routers".
=> the common case of such sharing is a third-party next hop. As far as
I know it is a rare case, even with route servers.
Our operational problems come from networks that only have eBGP
speakers - namely, exchange point meshes - and link-local next-hops
have no reason for existence there. No RIPng, no ICMP redirects.
=> yes, you don't seem to be in the case where a link-local address
is needed so the next-hop should contain only a global address.
So what I would have wished for is some strong words in this RFC
that discourage use of received link-local next-hop, unless other
protocols are in use that require them. Or something that would
encourage router vendors to add a switch to their implementation
to give the network admin the choice...
=> hmm, do you mean the "shall" words should be uppercased?
BTW, as I implemented BGP4 for IPv6 a long time ago, I remember that the
condition for adding a link-local address to next-hops can be
determined only from the whole configuration, so the proper way
is IMHO to read and implement what is in the RFC, and *not* to
interpret it in an imaginative way.
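The sender-side condition quoted from section 3 can be sketched as follows. This is a simplified illustration only: it assumes the relevant configuration has already been reduced to a single shared subnet, and the names build_next_hop and shared_subnet are hypothetical. A real implementation would have to derive this from its whole configuration.

```python
import ipaddress
from typing import Optional


def build_next_hop(
    shared_subnet: ipaddress.IPv6Network,
    next_hop_global: ipaddress.IPv6Address,
    next_hop_link_local: Optional[ipaddress.IPv6Address],
    peer: ipaddress.IPv6Address,
) -> bytes:
    """Encode the next hop field for one peer (hypothetical helper).

    shared_subnet is assumed to be a subnet the advertising speaker is
    attached to. The link-local address is appended only when both the
    next hop and the peer are on that same subnet.
    """
    field = next_hop_global.packed
    on_common_subnet = next_hop_global in shared_subnet and peer in shared_subnet
    if on_common_subnet and next_hop_link_local is not None:
        field += next_hop_link_local.packed
    return field
```

With a peer outside shared_subnet the result is 16 octets (global address only); with a peer on it, 32 octets.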
(Basically, this is what I hoped to find in the Cisco BGP implementation
- a switch like "neighbour 2001:db8::1 always-use-global-nexthop", but that
one doesn't exist)
=> file a bug report? build a test case to check implementation
conformance and send an announcement to the NANOG list (:-)?
Regards
Francis.Dupont at fdupont.fr
PS: if you believe RFC 2545 needs support and/or improvement, please
say so. It was written before any IPv6 interdomain routing went
into production, so even if I think its wording is fine, obviously
it was not followed as it was intended, something which can perhaps
be fixed by a new version.