
Monitoring, a good habit to solve incidents

This is a tale of an incident, one of many that people in the IT world go through every day. A couple of weeks ago I noticed that the upload speed of my Internet connection had dropped by roughly 10 Mbps. It was the first time I had seen such a drop with my ISP over a wired connection.

The drop wasn’t very noticeable in the overall bandwidth or in browsing performance, but it persisted. I began running the SpeedTest tool more often, and the lower upload speed was always there, more evident at some times than at others.

Days went by, and one morning, while listening to Blur FM on my Bluetooth speaker, I noticed poor audio quality with lots of dropouts, similar to a buffering effect. I restarted the server and everything seemed fine except for one thing: when I opened Task Manager, I found an unfamiliar process, System interrupts, taking a large share of the CPU. This is not a regular program but an entry Windows uses to show how much CPU time is spent handling hardware interrupts, so a high value often points to a problem between the CPU and the hardware or its drivers.

System Interrupts taking a high CPU quota. Screenshot from How and Wow.

My worries grew when I found that CPU usage was always above 70%. System interrupts was taking a large share, along with the virtual machine that runs this blog’s web server. Blur FM’s streams played smoothly, so I assumed that was the price to pay for running Debian and SAM Broadcaster, encoding five audio streams in real time, on a low-powered CPU.

Another day, another incident. My main desktop PC restarted all of a sudden, and not just once but a couple of times in a single working day. Because the power stabilizer made a “click” sound each time the PC restarted, I suspected something related to the electrical supply. Long story short, the faulty part was the power strip.

You might be wondering: what does this have to do with monitoring? As it turns out, it helped me solve the upload speed issue. While the server was powered off, I noticed the fiber connection was working at full speed again. How was this possible when Task Manager hadn’t shown any relevant network activity? It seems that the traffic routed through the VPN was not part of the network activity Task Manager displays.
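Since Task Manager on the host didn’t reveal the VPN traffic, one way to look for it is from inside the Debian VM itself. The following Python sketch reads the per-interface byte counters that Linux exposes in `/proc/net/dev` and turns two readings into a throughput figure. It is a minimal sketch, assuming the tunnel traffic passes through the VM’s own network interfaces; the function names are hypothetical.

```python
from __future__ import annotations

import time


def parse_net_dev(text: str) -> dict[str, tuple[int, int]]:
    """Return {interface: (rx_bytes, tx_bytes)} parsed from /proc/net/dev contents."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def sample_rates(interval: float = 1.0,
                 path: str = "/proc/net/dev") -> dict[str, tuple[float, float]]:
    """Measure (rx, tx) throughput in bytes per second over `interval` seconds."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_net_dev(f.read())
    # Difference between the two readings, divided by the elapsed time.
    return {
        iface: ((after[iface][0] - rx) / interval, (after[iface][1] - tx) / interval)
        for iface, (rx, tx) in before.items()
        if iface in after
    }
```

Calling `sample_rates()` periodically inside the VM and watching a tunnel interface (a name such as `tun0` is only an example) would show whether the VPN is pushing data while the host looks idle. Multiplying bytes per second by 8 gives bits per second, the unit SpeedTest reports.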

How monitoring helped to solve the incident

Keeping an eye on Task Manager raised the first alarm: the CPU usage was unusually high. Even though Task Manager didn’t show any network activity, the logs of the Amazon Lightsail instance did. It took me a week to figure out what was happening, which matched the period during which the instance’s CPU utilization chart showed a spike.

This is what my metrics graphs looked like. Something’s not going well here.
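Spotting a spike like this by eye works, but the same check can be automated. The following Python sketch scans a series of CPU utilization samples, for example values exported from the instance’s metrics graphs at a fixed interval, and reports every stretch where usage stays above a threshold for several consecutive samples. The 70% default mirrors the figure mentioned earlier; `min_samples`, the sampling interval and the function name are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Episode:
    start: int    # index of the first sample above the threshold
    end: int      # index of the last sample above the threshold
    peak: float   # highest utilization seen during the episode


def sustained_high_cpu(samples: list[float], threshold: float = 70.0,
                       min_samples: int = 3) -> list[Episode]:
    """Return runs of at least `min_samples` consecutive samples above `threshold`."""
    episodes = []
    start = None
    # A trailing -inf sentinel closes a run that lasts until the final sample.
    for i, value in enumerate(samples + [float("-inf")]):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_samples:
                episodes.append(Episode(start, i - 1, max(samples[start:i])))
            start = None
    return episodes
```

Requiring several consecutive samples filters out short bursts, which matter less than the kind of sustained load that showed up in the graphs.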

I came to the conclusion that the server had been compromised all that time, probably by someone mining cryptocurrency, who knows.

The way to the solution, step-by-step

I needed to make some changes to my server setup to avoid similar issues in the future. These are the steps I followed to get everything back up and running:

  • New instance launched for the VPN tunnel. This time I picked Debian instead of Ubuntu: when I moved to the cloud, Debian became my go-to distro after a failed experience with Ubuntu.
  • Firewalls set up on both the instance running the VPN tunnel and the home server.
  • New IP address attached to avoid possible issues in the future.
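The post doesn’t list the exact firewall rules, so the following Python sketch only illustrates the default-deny idea behind them: a packet is accepted only when an explicit rule matches its protocol, port and source address. The rules themselves are hypothetical: SSH (22/tcp) and OpenVPN’s default port (1194/udp), both reachable only from a single home address taken from the 203.0.113.0/24 documentation range.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    protocol: str                 # "tcp" or "udp"
    port: int
    source: str = "0.0.0.0/0"     # CIDR block allowed to reach this port


def is_allowed(rules: list[Rule], protocol: str, port: int, src_ip: str) -> bool:
    """Default-deny: accept a packet only if some rule explicitly matches it."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if (rule.protocol == protocol and rule.port == port
                and addr in ipaddress.ip_network(rule.source)):
            return True
    return False  # nothing matched, so the packet is dropped


# Hypothetical rules for the VPN instance: SSH and the VPN port are
# reachable only from the (example) home address.
VPN_INSTANCE_RULES = [
    Rule("tcp", 22, "203.0.113.10/32"),
    Rule("udp", 1194, "203.0.113.10/32"),
]
```

On a real machine the same policy would be written in the firewall tool itself, such as Lightsail’s instance firewall or a host firewall on Debian; the sketch just makes the matching logic explicit.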

With these three steps, CPU utilization dropped drastically, not only on the Amazon Lightsail instance but also on the PC running this blog’s web server. Now Debian and SAM Broadcaster together use about 30% of the CPU on average, with Debian using less than 1% when idle.

Final notes

I was so enthusiastic about Debian that I decided to try out Debian 10 Buster with the GNOME desktop environment. Since Ubuntu is based on Debian, making the switch was relatively easy.

Even though Debian needed a little extra configuration to fit my needs, VMware runs much better on Debian 10 than it did on Ubuntu 20. Transitions are smoother, and the overall experience is almost like running the OS natively.

This tale ends with a screenshot of my current Debian 10 Buster setup. And they lived happily ever after…

Debian 10 Buster with GNOME 3.30, Dash to Dock and GNOME Tweaks.
