Connectivity

Maintaining productivity, ensuring customer satisfaction, improving network performance, and reducing downtime are all key benefits of connectivity troubleshooting. Network outages can halt business operations, preventing employees from accessing critical applications and collaborating effectively. This leads to lost productivity and revenue.

Additionally, these problems can disrupt online services, negatively impacting customer satisfaction and damaging the business’s reputation. Troubleshooting helps identify and resolve bottlenecks, congestion, and other issues that affect network performance, ensuring efficient and reliable operations. By quickly identifying and resolving issues, businesses can minimize downtime and reduce the financial impact of connectivity problems.

Allocating IP netmasks

The whole netmask business is a complicated one. An IP interface has an IP address and usually a netmask associated with it. Netmasks on point-to-point interfaces like a PPP link are generally not used.

If you set the Framed-IP-Netmask attribute in a radius profile, you are setting the netmask of the interface on the side of the NAS. The Framed-IP-Netmask attribute is NOT something you can set to influence the netmask on the side of the dialin user. And usually, that makes no sense anyway even if you could set it.

The result of this on most NAS is that they start to route a subnet (the subnet that contains the assigned IP address and that is as big as the netmask indicates) to that PPP interface and thus to the user. If that is exactly what you want, then that’s fine, but if you do not intend to route a whole subnet to the user, then by all means do NOT use the Framed-IP-Netmask attribute.

Many NAS interpret a left-out Framed-IP-Netmask as if it were set to 255.255.255.255, but to be certain you should set the Framed-IP-Netmask to 255.255.255.255.

The following entries do almost the same on most NASs:

user Cleartext-Password := "blegh"
	Service-Type = Framed-User,
	Framed-Protocol = PPP,
	Framed-IP-Address = 192.168.5.78,
	Framed-IP-Netmask = 255.255.255.240

user Cleartext-Password := "blegh"
	Service-Type = Framed-User,
	Framed-Protocol = PPP,
	Framed-IP-Address = 192.168.5.78,
	Framed-Route = "192.168.5.64/28 0.0.0.0 1"

The result is that the end user gets IP address 192.168.5.78 and that the whole network with IP addresses 192.168.5.64 - 195.64.5.79 is routed over the PPP link to the user (see the RADIUS RFCs for the exact syntax of the Framed-Route attribute).

Fragmentation issues

802.1X authentication methods like EAP-TLS transmit large UDP packets that need IP fragmentation to reach their destination. If the network used for 802.1X mishandles IP fragments or has an issue with Path MTU Discovery (PMTUD), this issue shows up as unreliable or non-functional 802.1X authentication.

Before attempting to troubleshoot possible IP or EAP fragmentation issues, it’s important to have a comprehensive understanding of the normal behavior of IP networks regarding fragmentation and reassembly, forwarding of fragments, and critical network services like PMTUD.

Debugging network problems without understanding IP networking usually leads to making mistakes. Often, random changes are made until something appears to work. The final result may have issues or cause network instability.

This section outlines how to identify, investigate, and resolve fragmentation issues, including common scenarios with broken or misconfigured network devices.

Identifying broken Path MTU Discovery

The tracepath tool provides a useful indication of any path MTU restrictions to a destination.

In the example below, the tracepath tool sends maximum size UDP packets marked "Don’t Fragment" to the destination with increasing TTLs. At each stage it records information about the hop, based on ICMP responses, reducing the payload size if necessary, as indicated by the "Next-Hop MTU" field of an ICMP "Fragmentation Needed" or ICMP "Too Big" responses.

$ tracepath -m 20 110.60.100.30
 1?: [LOCALHOST]                         pmtu 1500
 1:  _gateway                              0.589ms
 2:  200.100.50.20                         8.486ms pmtu 1492
 3:  30.50.70.90                           9.267ms
 4:  no reply                             10.117ms
 8:  ae23.example.com                     10.806ms
 9:  ae28.example.com                     11.419ms
10:  ae31.example.com                     13.986ms
11:  ae29.example.com                     15.739ms
12:  ae20.example.com                     15.486ms
13:  ae25.example.com                     17.442ms asymm 11
14:  140.90.30.1                          17.718ms asymm 13
15:  no reply  => UDP is filtered as target is known to be at this hop
16:  no reply
17:  no reply
18:  no reply
19:  no reply
20:  no reply

The MTU was first reduced to the default Ethernet MTU (1500 bytes) to allow packet transmission via the source’s uplink interface. The source reaches the local gateway successfully using the 1500-byte MTU on hop 1. However, to reach the host on hop 2, another reduction in the path MTU to 1492 bytes was required. This reduction was learned from an ICMP "Fragmentation Needed" response. In this scenario, hop 2 is probably a host located on the remote side of a PPP connection. Notably, the PPPoE header consumes 8 bytes of overhead, which caused the MTU reduction.

Some network devices are configured to not respond with an ICMP “TTL exceeded in transit” message when dropping a packet because the TTL reaches 0, as in hop 4. If all hosts outside a specific LAN exhibit “no reply,” it’s likely that an critical ICMP response is being filtered by a firewall. Erroneously dropping ICMP “TTL exceeded in transit” messages is a fundamental IP network issue that must be fixed.

Also note, that if connections to the destination host are filtered to prevent the return of an ICMP “destination unreachable” response for closed UDP ports, such as by a host-based firewall, network firewall, or router ACL, the trace will stop receiving replies either at or immediately prior to the hop where the filters are applied. The trace continues to report “no reply” until the hop count is exhausted. This does not necessarily indicate a fundamental network issue (beyond silent filtering of high-numbered ports).

Symptoms of broken PMTUD

Run the tracepath command from a host that’s on the same LAN as the NAS, and target the RADIUS Server. Then, do the same thing in reverse.

Example: MTU size

If the trace stops reporting path information at an intermediate hop en route to the destination that doesn’t perform packet filtering, it’s likely that there’s a path MTU restriction. However, a device along the already-probed segment of the path is preventing ICMP "Fragmentation Needed" or ICMP "Too Big" responses generated by the restrictive network device from reaching the source.

In this type of scenario, a broken PMTUD shows a trace like the following:

$ tracepath -m 20 110.60.100.30
 1?: [LOCALHOST]                         pmtu 1500
 1:  _gateway                              0.589ms
 2:  no reply
 3:  no reply
 ...
19:  no reply
20:  no reply

Execute the trace with a lower MTU e.g. 1400 bytes, allowing the packets to reach the destination. If the packets get closer, but still don’t reach the target, reduce the MTU again and try transmitting.

Example: Missing ICMP Messages

This behaviour is a strong indication that PMTUD is broken between those hosts. The source is not receiving the indications that it needs to reduce the packet size to the destination, therefore it will likely continue to send RADIUS packets that are too big to reach their destination, rather than perform IP fragmentation with a viable fragment size. Broken PMTUD is a fundamental network issue that should be fixed.

$ tracepath -m 20 -l 1400 110.60.100.30
 1?: [LOCALHOST]
 1:  _gateway                              0.589ms
 2:  200.100.50.20                         8.486ms
 3:  30.50.70.90                           9.267ms
 ...

Take captures of all network devices along the path to determine where IP packets are being dropped due to an MTU restriction. Determine if ICMP "Fragmentation Needed" or ICMP "Too Big" responses are being generated (as is required), and --- if so --- where these ICMP responses are being dropped prior to reaching the source.

Identifying broken fragment handling

The following capture taken with the tcpdump utility on a NAS shows a supplicant performing EAP-TLS and is in the process of sending the client certificate chain:

... IP (id 53297, offset 0, flags [+], proto UDP (17), length 1500)
  10.0.0.50.46521 > 10.0.0.51.1812: RADIUS, length: 1472
    Access-Request (1), id: 0x09, Authenticator: e0422b49...
      User-Name Attribute (1), length: 11, Value: anonymous
      ...
      EAP-Message Attribute (79), length: 255, Value: [REDACTED]
      EAP-Message Attribute (79), length: 255, Value: [REDACTED]
      EAP-Message Attribute (79), length: 255, Value: [REDACTED]
      EAP-Message Attribute (79), length: 255, Value: [REDACTED]
      EAP-Message Attribute (79), length: 255, Value: [REDACTED]
      EAP-Message Attribute (79) (bogus, goes past end of packet)

... IP (id 53297, offset 1480, flags [], proto UDP (17), length 98)
  10.145.0.50 > 10.145.0.51: ip-proto-17

It shows the transmission of a single RADIUS packet, containing a large EAP message fragment, to the RADIUS Server as a set of IP fragments. A single IPv4 packet would have a length that would exceed the 1500 byte MTU of the path to the destination, so the NAS performs IP fragmentation.

Note the first fragment (with offset 0), has an overall frame of length 1500 bytes to fill the path MTU to the destination, and has a more fragments indication (“flags [+]”). The second fragment has offset 1480 and has ID 53297 which matches the initial fragment.

Symptoms of bad fragment handling

Take simulataneous captures at both the NAS and the RADIUS server and look for an instances of fragments being generated at source for an IP packet.

If the network is functioning correctly, the capture taken at the destination will show the arrival of all fragments. It is okay for these fragments to arrive out of order.

In the rare case that an in-path device is performing IP fragment reassembly (and the local MTU exceeds that which was discovered by the sender) then it is also possible to observe a single, complete reassembled packet.

In even rarer cases, for IPv4 packets you might even observe a different arrangement of fragments representing the original packet, either because an in-path host has performed further fragmentation of the fragments, or because fragment reassembly has occurred and then the IP packet has been subsequently refragmented using a different IP fragment size.

Each of these scenarios is fine provided that the destination host is provided with a complete set of fragments representing the original IP packet containing the RADIUS request.

In the case of RADIUS requests being sent to the RADIUS Server, debugging the RADIUS Server (radiusd -X) will show it processing the RADIUS request from the reassembled IP packet. If tcpdump shows some IP fragments arriving but FreeRADIUS does not receive the RADIUS request, then something has gone wrong in the network resulting in the operating system failing to reassemble the original IP packet --- due to either missing or incorrectly formatted IP fragments.

Missing or broken IP fragments always infers the existance of one or more network devices that exhibit impaired IP behaviour. Impaired IP fragment handling is a fundamental network issue that should be fixed.

Captures should be taken at network devices along the path to determine where IP fragments are being dropped, or incorrectly routed.

The FreeRADIUS radsniff tool is not a substitute for tcpdump tool when diagnosing IP fragmentation issues. The radsniff tool processes raw data read from a network interface and does not perform userland IP fragment reassembly. Therefore its output can be misleading:

...
(3) Access-Request Id 6 eth0:1.1.1.1:53320 -> 2.2.2.2:1812
(4) Access-Challenge Id 6 eth0:1.1.1.1:53320 <- 2.2.2.2:1812
(5) Packet too small by 82 bytes, ... should be 1562 bytes
(6) **noreq** Access-Challenge Id 7 eth0:1.1.1.1:53320 <- 2.2.2.2:1812
...
(11) Access-Request Id 10 eth0:10.145.0.50:53320 -> 10.145.0.51:1812
(12) Access-Accept Id 10 eth0:10.145.0.50:53320 <- 10.145.0.51:1812

Packet (5) was an Access-Request that was received as a set of IP fragments, and only the first fragment was processed and declared incomplete i.e. Packet too small... Therefore, the Access-Challenge response in packet (6) didn’t match to any request.

This example output is normal when RADIUS requests are delivered as a set of IP fragments, and not a fault. It can be seen that the conversation eventually completes with an Access-Accept.

Identifying impaired network devices

Network RADIUS encounters various scenarios where a AAA service is degraded or broken due to faulty or incorrectly configured network devices.

An issue is likely to be due to one of these common cases for which potential solutions are provided.

Correct IP networking functionality may vary between a device’s firmware versions. Because of this, EAP-based authentication methods should always be carefully tested prior to production network upgrades being undertaken.

Access networks that do not support a standard Ethernet MTU

Supplicants and authenticators anticipate that the MTU of the network over which EAPoL is performed is a standard size for the link type. Some supplicants will generate EAPoL frames that are the full 1500 bytes of a standard Ethernet MTU and cannot be configured to do otherwise. Even when a supplicant can be configured to use a smaller EAP fragment size, it might not be practical to do so, for example in BYOD environments.

Solution: Increase the access network’s MTU so that it meets the standard for the link type technology. If resizing the MTU isn’t possible, configure all supplicants and authenticators to use a smaller fragment size for EAP messages. Also, configure the NAS to advertise the smaller MTU of the EAPoL network in the Framed-MTU attribute of RADIUS requests sent to the RADIUS Server.

NAS doesn’t perform IP fragmentation correctly

Some wireless lan controllers (WLCs) and switches (that do not support asymmetric fragmentation/reassembly) are unable to encapsulate a large EAP message generated by a supplicant into a RADIUS Access-Request that would need to span multiple IP fragments to satisfy the path MTU to the RADIUS Server.

Solution: Upgrade or replace the NAS with a device that performs proper IP fragmentation.

NAS doesn’t perform IP fragment reassembly correctly

Some WLCs are unable to de-encapsulate an EAP message from a RADIUS Access-Challenge that is received as a set of IP fragments, even though the EAP message would fit within the link MTU for the EAPoL interface.

Solution: Upgrade or replace the NAS with a device that performs proper IP fragment reassembly.

Devices drop IP fragments

Some firewalls, routers and network load balancers simply drop all IP fragments on egress or ingress as a matter of policy, for reasons other than a link MTU restriction.

Solution: Reconfigure the malfunctioning network device to permit IP fragments to and from the RADIUS servers.

Devices that sometimes drop IP fragments

Some firewalls drop IP fragments for an extended period of time in reaction to some global network condition, such as during a fragment-based network attack. Services that depend on IP fragmentation may therefore work at some times but not others.

Solution: Override such protections for traffic to and from the RADIUS servers, and disable virtual reassembly if necessary to protect the resources of the firewall. Ensure that the RADIUS Server’s operating system is up to date and that the host has sufficient resources to mitigate fragment-based network attacks by itself.

Devices that attempt "virtual reassembly" on an incomplete packet stream

Firewalls and routers may be configured to perform "virtual reassembly" of complete IP packets using all IP fragments for policy inspection purposes. If traffic takes multiple paths such that a single device does not see all IP fragments then reassembly will fail, fragments will be dropped, and excessive resources consumed.

Solution: Disable virtual reassembly for packets involving the RADIUS servers or amend the routing policy to ensure that all fragments to a destination are forwarded via the same path.

Devices that steer IP fragments of the same packet to different backends

Stateless routers and load balancers, as well as load balancers with broken flow cache lookup for IP fragments, may steer subsequent IP fragments to a different backend than the initial fragment.

Solution: Configure the device to steer packets based on the Layer 2 addresses only, and not the Layer 4 information which is only present in the initial fragment. Note: This configuration is normally required for EAP since the source port is not guaranteed to remain the same throughout the authentication exchange.

Devices perform Network Address Translation with broken flow cache lookup

NAT devices with broken flow cache lookup may either drop or incorrectly rewrite IP fragments and ICMP responses.

Solution: Upgrade or replace the broken device.

Load balancers having pathological IP fragment handling when a backend is degraded

Some load balancers route fragments to the correct backend except when a backend is offline, in which case they route fragments incorrectly. A single backend becoming unavailable results in degradation of the entire service.

Solution: Upgrade or replace the broken load balancer.

Devices that filter ICMP "Fragmentation Needed" and "Too Big" messages

Some routers and firewalls may filter critical ICMP responses, breaking PMTUD, and resulting in authenticators and/or RADIUS servers continuously sending oversized IP packets. These packets are too large for the path and do not reach their destination.

Solution: Configure devices so as not to filter ICMP messages that are essential for basic network services.

Devices steering ICMP to a different backend than the corresponding application data

Some devices performing ECMP routing and other forms of network load balancing with broken flow caches will route an ICMP message to a different backend than to where the application data that originated the ICMP response is sent. This breaks PMTUD and results in RADIUS servers continuing to send oversized IP packets instead of performing IP fragmentation.

Solution: Either use a device that performs flow tracking to match ICMP messages with their associated data flows and steer them to the same backend, or broadcast ICMP messages required for PMTUD to all backends.