Note: This is a simplified process for MTR trace analysis that covers the most common issues and misconceptions. Please provide full MTR trace data in response to any support ticket in which it is requested. Self-diagnosis using this article is a great first step, but further information can be derived from the full trace data later in the support process.
Issues on the 1st hop: More often than not, issues appearing on the first hop are caused by link saturation. If the site has an ADSL2 connection that's syncing at 11 Mbps and that capacity is fully used, this may manifest as high or fluctuating latency, or as packet loss on the connection. Be sure to check the modem/router's throughput statistics before investigating other areas.
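As a rough illustration of the saturation check, the sketch below turns two readings of a router's byte counter into an average utilisation figure. The function name, the counter readings and the 60-second interval are all hypothetical; only the 11 Mbps sync speed comes from the example above. It also assumes the counter did not wrap or reset between readings.

```python
def utilisation_percent(bytes_start: int, bytes_end: int,
                        seconds: float, sync_mbps: float) -> float:
    """Average link utilisation over the interval, as a percentage of sync speed."""
    if seconds <= 0 or sync_mbps <= 0:
        raise ValueError("seconds and sync_mbps must be positive")
    # Convert the byte delta to megabits (1 Mb = 1,000,000 bits).
    megabits = (bytes_end - bytes_start) * 8 / 1_000_000
    return megabits / seconds / sync_mbps * 100


# Hypothetical readings: 75,000,000 bytes moved in 60 seconds on an 11 Mbps sync.
usage = utilisation_percent(0, 75_000_000, 60, 11)
print(f"{usage:.1f}% of sync speed in use")
```

A figure that sits close to 100% while the problem is occurring suggests saturation is worth ruling out before looking further along the path.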
Step 1 - Check for flow-on packet loss
Packet loss can appear in MTR results even when a connection is perfectly healthy; this is particularly common on carrier hops. Routers on the internet typically handle anything up to millions of packets per second, so on occasion they need to prioritize how they use their resources. ICMP packets directed at the router itself (such as the ones used in an MTR trace) are quite often treated with the lowest priority. Although the router may be doing its job just fine, it may drop direct ICMP requests, which an MTR trace reports as packet loss.
Consider the following examples...
Fig 1.1 - A healthy example showing packet loss
Host                        | Loss % | Snt | Last | Avg | Best | Wrst
Localrouter.localdomain     | 0.0%   | 100 | 0.5  | 0.4 | 0.5  | 0.6
Isplns.someisp              | 0.0%   | 100 | 40   | 45  | 40   | 50
Carrierhop1.carrier         | 0.0%   | 100 | 42   | 45  | 40   | 55
Carrierhop2.carrier         | 60.0%  | 100 | 44   | 45  | 40   | 55
Neuralrouter1.neural.net.au | 0.0%   | 100 | 46   | 45  | 40   | 55
Awesomeserver.neural.net.au | 0.0%   | 100 | 48   | 45  | 40   | 55
In this example, the host Carrierhop2.carrier is showing 60% packet loss, but all subsequent hops show 0% loss. The connection is healthy and no action is required.
A connection that does in fact have a faulty hop will look more like this...
Fig 1.2 - A connection with a real faulty hop
Host                        | Loss % | Snt | Last | Avg | Best | Wrst
Localrouter.localdomain     | 0.0%   | 100 | 0.5  | 0.4 | 0.5  | 0.6
Isplns.someisp              | 0.0%   | 100 | 40   | 45  | 40   | 50
Carrierhop1.carrier         | 0.0%   | 100 | 42   | 45  | 40   | 55
Carrierhop2.carrier         | 60.0%  | 100 | 44   | 45  | 40   | 55
Neuralrouter1.neural.net.au | 65.0%  | 100 | 46   | 45  | 40   | 55
Awesomeserver.neural.net.au | 65.0%  | 100 | 48   | 45  | 40   | 55
Notice in this example that Carrierhop2.carrier shows loss, as does everything after it, including the last hop. By far the most important loss figure is the one for the last hop: even if multiple prior hops show packet loss, if the last hop doesn't, there is probably no packet loss problem on the connection.
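The rule of thumb from Step 1 can be sketched in a few lines of Python. This is only an illustration, not a diagnostic tool: the function name is hypothetical, and the loss figures are copied from Fig 1.1 and Fig 1.2. It reports the first hop from which loss carries through to the last hop, or nothing if the last hop is clean.

```python
from typing import List, Optional, Tuple

Hop = Tuple[str, float]  # (host name, loss percentage)


def first_real_loss_hop(hops: List[Hop]) -> Optional[str]:
    """Return the first hop from which loss persists through to the final hop,
    or None if the final hop shows no loss."""
    if not hops or hops[-1][1] <= 0:
        return None
    # Walk backwards from the final hop while the previous hop also shows loss.
    index = len(hops) - 1
    while index > 0 and hops[index - 1][1] > 0:
        index -= 1
    return hops[index][0]


fig_1_1 = [
    ("Localrouter.localdomain", 0.0),
    ("Isplns.someisp", 0.0),
    ("Carrierhop1.carrier", 0.0),
    ("Carrierhop2.carrier", 60.0),
    ("Neuralrouter1.neural.net.au", 0.0),
    ("Awesomeserver.neural.net.au", 0.0),
]
fig_1_2 = fig_1_1[:4] + [
    ("Neuralrouter1.neural.net.au", 65.0),
    ("Awesomeserver.neural.net.au", 65.0),
]

print(first_real_loss_hop(fig_1_1))  # None
print(first_real_loss_hop(fig_1_2))  # Carrierhop2.carrier
```

Loss that appears on a middle hop but not on the last hop is ignored, which matches the reading of Fig 1.1.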
Step 2 - Check for latency fluctuations
Even on connections with no packet loss, latency fluctuations can cause interruptions to service, particularly on latency-sensitive services such as VoIP and faxing.
Although an average latency figure may be a healthy 40-80 ms, there may be peaks that are more difficult to detect using a standard ping. This is where the data from an MTR trace can be invaluable in identifying a periodic latency fluctuation issue.
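To see why an average can hide a spike, here is a small worked example. The sample values are hypothetical: 99 round trips at 40 ms and a single 800 ms spike.

```python
from statistics import mean

# Hypothetical samples: 99 round trips at 40 ms plus one 800 ms spike.
samples_ms = [40.0] * 99 + [800.0]

average_ms = mean(samples_ms)  # (99 * 40 + 800) / 100
worst_ms = max(samples_ms)

print(f"Avg: {average_ms:.1f} ms, Wrst: {worst_ms:.1f} ms")
# Avg: 47.6 ms, Wrst: 800.0 ms
```

The average still looks healthy, while the worst value exposes the spike. This is why the "Wrst" column matters when reading a trace.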
Consider the following examples...
Fig 2.1 - Fluctuation in latency (first-hop connection or contention issue)
Host                        | Loss % | Snt | Last | Avg | Best | Wrst
Localrouter.localdomain     | 0.0%   | 100 | 0.5  | 0.4 | 0.5  | 0.6
Isplns.someisp              | 0.0%   | 100 | 40   | 45  | 40   | 700
Carrierhop1.carrier         | 0.0%   | 100 | 42   | 45  | 40   | 800
Carrierhop2.carrier         | 0.0%   | 100 | 44   | 45  | 40   | 800
Neuralrouter1.neural.net.au | 0.0%   | 100 | 46   | 45  | 40   | 800
Awesomeserver.neural.net.au | 0.0%   | 100 | 48   | 45  | 40   | 800
In this example, all hops beyond the first hop (the first hop being the router accessed via LAN) have high values for "Wrst" (worst / highest latency). This indicates that the latency on the internet connection at the site is spiking. Regardless of the average, best and last values, if the worst value is consistently high across multiple MTR traces, there may be an issue to look into on the connection at the site itself.
Fig 2.2 - Fluctuation in latency (ISP, carrier, or Neural issue)
Host                        | Loss % | Snt | Last | Avg | Best | Wrst
Localrouter.localdomain     | 0.0%   | 100 | 0.5  | 0.4 | 0.5  | 0.6
Isplns.someisp              | 0.0%   | 100 | 40   | 45  | 40   | 50
Carrierhop1.carrier         | 0.0%   | 100 | 42   | 45  | 40   | 55
Carrierhop2.carrier         | 0.0%   | 100 | 44   | 45  | 40   | 550
Neuralrouter1.neural.net.au | 0.0%   | 100 | 46   | 45  | 40   | 550
Awesomeserver.neural.net.au | 0.0%   | 100 | 48   | 45  | 40   | 550
In this example, only hops further away from the site connection are showing high worst-case latency values. In this instance you'll need to escalate the issue to whoever operates the first hop showing the problem. For example, if the hop belongs to your ISP, they will need to be advised. If it's further on (a carrier or on the Control Networks' network), you will need to advise us via our support channels.
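The same reading of the "Wrst" column can be sketched in Python. The function name and the 500 ms threshold are hypothetical, chosen only so that the example figures separate cleanly; the worst-latency values are copied from Fig 2.1 and Fig 2.2.

```python
from typing import List, Optional


def first_spiking_hop_index(worst_ms: List[float],
                            threshold_ms: float) -> Optional[int]:
    """Return the zero-based index of the first hop from which worst latency
    stays at or above the threshold through to the final hop, or None if the
    final hop is below the threshold."""
    if not worst_ms or worst_ms[-1] < threshold_ms:
        return None
    # Walk backwards from the final hop while the previous hop also spikes.
    index = len(worst_ms) - 1
    while index > 0 and worst_ms[index - 1] >= threshold_ms:
        index -= 1
    return index


THRESHOLD_MS = 500.0  # hypothetical cut-off for a "high" worst value

fig_2_1 = [0.6, 700, 800, 800, 800, 800]
fig_2_2 = [0.6, 50, 55, 550, 550, 550]

print(first_spiking_hop_index(fig_2_1, THRESHOLD_MS))  # 1
print(first_spiking_hop_index(fig_2_2, THRESHOLD_MS))  # 3
```

An index of 1 is the first hop beyond the LAN router, which points at the site's own connection as in Fig 2.1. A higher index, such as 3 (Carrierhop2.carrier in Fig 2.2), points further along the path.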
These are the most common issues that can be identified with an MTR trace. This is by no means an exhaustive list, nor is it accurate in all cases. If in doubt, or if requested by our support team, please send the MTR trace through in reply to your support ticket, or open a new one via ithelpdesk@controlnetworks.com.au
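If you use the command-line `mtr` tool, a report suitable for attaching to a ticket could be captured along the following lines. This is a minimal sketch that assumes `mtr` is installed and on the PATH (on some systems it needs elevated privileges). The function names, the host `host.example` and the output file name are hypothetical. The 100 cycles match the "Snt" value in the figures above.

```python
import subprocess
from typing import List


def build_mtr_command(host: str, cycles: int = 100) -> List[str]:
    """Build an mtr command that prints a text report after `cycles` probes per hop."""
    return ["mtr", "--report", "--report-cycles", str(cycles), host]


def save_mtr_report(host: str, path: str, cycles: int = 100) -> None:
    """Run mtr and write its report to `path` (requires mtr to be installed)."""
    result = subprocess.run(
        build_mtr_command(host, cycles),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mtr exits with an error
    )
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)


# Show the command that would be run; host.example is a placeholder.
print(" ".join(build_mtr_command("host.example")))
# To capture a report: save_mtr_report("host.example", "mtr-report.txt")
```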