- Published on Monday, 11 March 2013 16:18
I have home internet service from Cox Communications in California. It's cheap but not very reliable. I get random outages often, and on average once a month there's a really long service outage that lasts anywhere from several hours to over a day. Obviously this is not the sort of internet service you want to run any kind of business on.
Still, it's cheap ($30/month), and since I can use Verizon tethering as a backup internet connection, I haven't taken the time to find another internet service provider. I do want to know whenever this connection goes down and when it comes back up, though. I use Nagios for enterprise service and systems monitoring. Today I discovered that a default setting in my Nagios installation was preventing it from notifying me about service outages.
The Nagios flap detection actually DISABLED notifications of the internet connectivity disruption BECAUSE the internet connectivity was unstable. This is exactly the opposite of what I wanted to have happen! Whenever the Cox internet connection drops, I want to know about it right away (the notifications come to my phone, which is on Verizon 4G). Today the connection kept dropping and recovering, and it was so unstable that Nagios decided to suppress the notifications!
The disruption happened at the same time that errors like this one started showing up in the router's event log: "SYNC Timing Synchronization Failure - Failed to acquire QAM/QPSK symbol timing". Clearly this is a problem with the Cox network and not with anything in my residence or any of my equipment. Every time I've called Cox, they have tried really hard to blame me or my equipment for whatever is wrong. Of course it has never been my equipment when I've called their technical support, because otherwise I would just fix it myself.
The Nagios flap detection is apparently enabled by default, and the web documentation is not exactly consistent with what I found in the files. When I started noticing the intermittent internet connection, the first thing I checked was whether Nagios had noticed. Of course it had noticed, but it had disabled notifications because it had determined that the host was flapping!
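To make it clearer why an unstable link trips this logic, here is a rough Python sketch of how flap detection scores a host. It is my reading of the documented behavior, not the actual Nagios code: Nagios keeps the last 21 check results, counts the state changes between them with newer changes weighted more heavily, and compares the resulting percentage against a low and a high threshold. The function names, the 0.75 to 1.25 weighting range, and the 5%/20% thresholds are assumptions for illustration; check your own nagios.cfg for the real threshold values.

```python
def percent_state_change(history, low_weight=0.75, high_weight=1.25):
    """Weighted percent state change for a list of check states, oldest first.

    Assumption: weights rise linearly from low_weight (oldest transition)
    to high_weight (newest transition), so recent changes count more.
    """
    slots = len(history) - 1  # number of possible state changes
    if slots < 2:
        raise ValueError("need at least 3 recorded states")
    weighted_changes = 0.0
    for i in range(1, len(history)):
        if history[i] != history[i - 1]:
            step = (high_weight - low_weight) / (slots - 1)
            weighted_changes += low_weight + (i - 1) * step
    return weighted_changes * 100.0 / slots


def update_flapping(was_flapping, percent, low_threshold=5.0, high_threshold=20.0):
    """Hysteresis: start flapping above the high threshold, stop below the low one.

    The threshold values are illustrative defaults, not read from any config.
    """
    if not was_flapping and percent > high_threshold:
        return True
    if was_flapping and percent < low_threshold:
        return False
    return was_flapping


# A link that drops and recovers on every other check over the last 21 checks.
unstable = ["UP", "UP", "DOWN", "DOWN"] * 5 + ["UP"]
pct = percent_state_change(unstable)
print(round(pct, 1), update_flapping(False, pct))
```

A connection that bounces like this scores far above the high threshold, so it gets marked as flapping, and that is the point at which the notifications stop. A single clean outage produces only one or two state changes and stays well under the threshold.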
To permanently disable Nagios flap detection, I edited the /usr/nagios/etc/nagios.cfg main Nagios configuration file as follows:
[root@agadez etc]# grep flap_detection nagios.cfg
Once I had changed enable_flap_detection from 1 to 0, I restarted Nagios (/etc/rc.d/init.d/nagios restart). Of course, the internet connection has been stable ever since, so I haven't gotten any notifications from it. In the future, though, I expect to be notified at the first sign of trouble!
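Since a stable connection gives no feedback on whether the change took effect, a quick script can confirm what the file actually says. This is only a sketch: the read_directive helper is hypothetical and assumes the simple name=value lines with # comments that nagios.cfg uses. It returns every occurrence of the directive rather than guessing which one Nagios would honor if it appeared more than once.

```python
def read_directive(config_text, name):
    """Return all values assigned to `name` in name=value style config text."""
    values = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            values.append(value.strip())
    return values


# Made-up sample text for illustration, not a copy of my real nagios.cfg.
sample = """
# flap detection master switch
enable_flap_detection=0
"""
print(read_directive(sample, "enable_flap_detection"))
```

To run it against the real file, read /usr/nagios/etc/nagios.cfg with open() and pass the contents in. A result of ['0'] means flap detection is switched off globally.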
P.S. I could go on and on about Cox's technical support and their persistent efforts to blame any outage on me, my equipment, or something in my residence. One time the router was reporting "Network Access: Denied" in the Status section of the Modem Status page. The Cox technical support person tried to route my call to the $15/hour home networking technical support people, because she refused to believe that I was looking at the router's status page and not at some Windows error message on my laptop saying it couldn't connect to the wifi network. That problem took them about 7 hours to resolve, and it started working again without my doing anything, so I'm reasonably sure that some authorization system of Cox's had de-authorized my cable modem and that they eventually found and fixed the problem on their own.