“With this notification we would like to inform you that our in-house Website Performance Monitoring System (WPMS) has signaled that your account constantly uses a large amount of the server’s CPU resources. These excessive requests consume an abnormally high amount of CPU resources and endanger the overall performance of the server. Your account consume more then 55703.75 CPU seconds and 102195.00 CPU executions for the last 24 hours. (…) Unfortunately, your website’s server resource usage is not suitable for this server and that is why we will no longer be able to host it there.“
This is the message that waited for me in my inbox when I woke up last Thursday morning (grammar errors and typos included). A quick visit to mor10.com confirmed my panic: The site was down, replaced by a 503 error. Logging into my site admin panel, I discovered the hosting provider had locked down my site, banning access to any incoming visitors. And for good reason. A quick inspection of the resource logs showed something dramatic had happened during the night. Here are screenshots of the weekly stats and the execution log for the preceding 24 hours:
Thus began what would become a 16-hour battle with arrogant and ignorant tech support “specialists”.
Searching for a cause
Looking at the server loads it was apparent something was amiss. The causes for these types of spikes are few and should be easy enough to spot with some further investigation:
- Sudden popularity of an article (formerly known as the “Slashdot Effect” or “Digg Effect”)
- Denial of Service attack
- Scripting error
- Hack
Considering the sudden and extreme spike in activity the Slashdot Effect was rather unlikely. This was further supported by checks of Google Analytics and Cloudflare stats which showed only a marginal increase in traffic during the same time period.
The next obvious cause would be a DDoS attack. However, when I inspected the traffic logs for mor10.com, everything looked normal. The logs for the past 24 hours matched the logs for the previous 24 hours in size and variance. Odd.
Having eliminated options one and two, the remaining options became a major concern. To get some clarity I contacted tech support and requested more information. An hour later I got a response saying they suspected the sudden spike was caused by a “plugin”. (I say “plugin” in quotation marks because I would later realize that the tech support staff use the term “plugin” to mean any application that runs on the server.) Knowing what plugins run on my site, I could quickly eliminate that as an option, but I chose to manually remove all of them and re-install fresh versions just to make sure. While logged in to the back end I also cleared out the theme directory, uploaded a clean version of the currently active theme, and did a full inspection of any and all activity on the file server in the last 24 hours. Nothing stood out and nothing had been changed apart from the .htaccess file (which the host used to lock down the site) and the files I had just changed.
Having eliminated any obvious script issues on the site I contacted tech support again and asked them to take the site back online to see if things were now working properly.
Enter the Infinite Tech Support Guru Incompetence Loop
Four and a half hours later tech support responds:
“We have checked your case in details and monitored your account for quite some time, unfortunately the issue does not appear to be resolved. Your account generated over 30 script executions for the period of 20 seconds. You can see the time-stamp as well. My recommendation is to disable any plugins, such as chatboxes or shoutboxes or any other poorly written plugins.”
Chatboxes or shoutboxes? This was the first indication that the tech support “guru” I was talking to had no idea what was going on. Had he taken one second to scan the files on my site he would have known there were no such scripts on the site. And since I had already checked all my scripts I knew this was not a script issue. I asked for more information:
“I have no idea what’s causing this. It’s certainly not something that is being done on purpose. It would be helpful if you could assist me in figuring out where this is coming from.
For example, what is the request being sent? Simply saying that index.php is doing something is not enough information to be able to troubleshoot the problem.“
The response, from tech support “guru” number 2, came an hour later and was less than satisfactory:
“(…) the file index.php is the first file that is loaded when a visitor opens the home page of your website.“
Really? Not only was this not an answer to my question, but it was a clear assumption that I had no idea what I was doing. I was getting quite annoyed at this point and responded:
“I know what the index.php file is and what it does. I’m asking what it is the file is trying to do on the server. If you are telling me that the reason for the takedown is that the site is being requested from outside at a high rate I’m going to respond that that’s what’s known as a DDoS attack. Which you already know.
What I need to know is whether there is something on my end that is causing this or if it’s someone on the outside. Seeing as I did not make any changes to the site yesterday when the spike started there is good evidence to suggest this is not related to the site itself but rather someone on the outside. That is further supported by the fact that when the site is down you don’t have the same server load issues.
So, if it is true that my site is being targeted by a DDoS attack it is uncalled for of your service to blame me and ask me to fix it. If on the other hand you are sitting on info that shows this issue is caused by something on my end you need to provide me with that info so I can troubleshoot the issue. Telling me to deactivate plugins I obviously don’t have on my site is not helpful. Telling me exactly what is causing this server load on the other hand would be.“
Two hours later tech support “guru” number 3 sends me this response:
“During my investigation I found that the most executed script on your account is index.php.
I have checked the domlogs for this domain and it seems that the request to it are legitimate. My advice is to contact a professional developer which can dedicate the time to inspect the code of the mor10.com website and optimize it. Are you using feed plugins on your website?“
At this point I was yelling at my computer. Are you kidding me? Again with the index.php? Again with the ridiculous suggestions about plugins obviously not installed? Contact a professional developer? What’s next? Are you going to ask me to run a virus scan? My rather annoyed response:
“You keep giving me the same useless information. Everything that happens on a WordPress site happens through the index.php file. It’s like telling me there is an accident on a road on planet earth.
What I need to know is this:
1. What are you seeing? A script running queries to the database? A script calling other files? A script sending content out from the site?
2. What is the script doing? Is it a GET or a PULL request? Or is it pushing data elsewhere?
It is impossible to debug anything when all you give me is “the most executed script is index.php”. You advertise yourself as a WordPress host but the info I’m getting from you guys indicates you don’t know much about WordPress.
To be able to sort this out I need information about what is happening on your end for the simple reason that the clone of the site we set up on a local server is not causing any issues. I cannot reproduce the issue you are describing, and since you are not describing the issue in any usable way it is impossible to do any further investigation.“
The response, from Tech Support “Guru” #3, came in 43 minutes later. Are you ready for it?
“I have checked and one of the IP addresses which accessed your website the most is your own IP address.
Could you please scan your local computer with updated antivirus software and provide us with the scan results.“
I wish I had a video of myself as I received this message. It would have gone viral. It’s the closest I’ve ever come to experiencing a cerebral nuclear explosion.
Let’s take a moment to dissect this response:
- The site is currently locked down and only accessible through my IP.
- The host has requested I check all my files – an action that causes a fair bit of traffic on the server from my IP address.
- Upon inspection tech support observes said traffic from my IP address.
- And flags it as problematic.
- And concludes there must be a virus running on my computer.
The lack of understanding of the internet required to come up with this preposterous nonsense is beyond comprehension.
F*ck you, I’ll just do it myself
At this point two things became abundantly clear to me:
- These “tech support gurus” were no more able to solve my issues than some random 5-year-old kid on the street
- The problem was likely something they had overlooked
At this point I’d thoroughly eliminated both 3) Scripting Error and 4) Hack from my list of problems, and since stats clearly showed 1) Slashdot Effect could not be the culprit the only remaining explanation would have to be 2) DDoS. The problem was that the server logs were normal.
How can a DDoS attack happen without there being a server log to document it? That’s simply not possible. Something is off.
I went back to the server logs for my entire hosting account to get a fresh copy of the log file and look at what happened just when the problems started. But due to either anger or frustration or just not looking very closely, I accidentally downloaded the log for designisphilosophy.com instead of mor10.com. Designisphilosophy.com used to be the domain name for this site, and when I made the switch to mor10.com I set the old domain to automatically forward all traffic. The log for the last 24 hours was 29 MB. And when I opened it I saw this:
To be more precise, the log showed 120,000 entries from one single IP address hitting the same file on the server, all happening within a 12-hour time span. This one IP address hit the server on average more than 2 times per second (120,000 requests over 43,200 seconds) and triggered a redirect every time.
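For anyone wanting to run the same check on their own logs, here is a minimal sketch of how a single noisy client can be spotted. It assumes the access log uses the Apache Common or Combined Log Format; the sample lines, IP addresses, paths and the function names `top_clients` and `average_rate` are made up for illustration and are not taken from my actual logs.

```python
import re
from collections import Counter

# Start of a Common/Combined Log Format line:
# client IP, identity, user, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')

def top_clients(lines, n=5):
    """Return the n most frequent (ip, path) pairs with their hit counts."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that are not in the assumed format
            ip, _timestamp, _method, path = match.groups()
            hits[(ip, path)] += 1
    return hits.most_common(n)

def average_rate(hit_count, hours):
    """Average requests per second over a window of the given length."""
    return hit_count / (hours * 3600)

# Made-up sample data (documentation-range IPs, hypothetical paths).
sample = [
    '203.0.113.7 - - [01/Jan/2000:00:00:01 +0000] "GET /index.php HTTP/1.1" 301 0',
    '203.0.113.7 - - [01/Jan/2000:00:00:01 +0000] "GET /index.php HTTP/1.1" 301 0',
    '198.51.100.4 - - [01/Jan/2000:00:00:02 +0000] "GET /about/ HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2000:00:00:02 +0000] "GET /index.php HTTP/1.1" 301 0',
    'this line is not a valid log entry',
]

for (ip, path), count in top_clients(sample):
    print(ip, path, count)

# The figures quoted above: 120,000 hits in a 12 hour window.
print(round(average_rate(120_000, 12), 2))
```

On a real log you would pass an open file instead of `sample`, for example `top_clients(open("access.log"))`, where the file name is again just a placeholder. Check the log of every domain on the account, including old ones that only redirect.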
I think at this point my neighbor probably thought I was trying out for a world record in Loudest Repetitive Swearing.
Here I had been running around in circles for 16 hours trying to find a non-existent problem pointed out by tech support while the answer to the “mystery” was sitting in a file tech support should have checked immediately. Pardon my English here, but ARE YOU F U C K I N G KIDDING ME WITH THIS?
The message I sent to tech support cannot be republished here. Needless to say they immediately put my site back online and blocked the offending IP address.
Don’t Trust Tech Support!
If you’re still with me at this point I’m sure you are both baffled and frustrated. And you should be. This is not the first time I’ve encountered complete idiocy at the hands of tech support and it is also not the first time tech support has blamed me for something that had nothing to do with me. The lesson to take away from this is that when something happens tech support is going to blame you for it. They are not going to make any effort to figure out what is going on, and they are likely to kill your site and cancel your service unless you yourself figure out what the cause is and solve it. As I wrapped up my Dies Irae I was stuck with a single thought: What if this happened to someone who didn’t have my skillset? They would have followed the advice of tech support: spent unknown amounts of money hiring a professional developer, run a virus scan on their system, and likely turned up nothing. Their site would have been moved off the server or, even worse, moved to a more expensive hosting option, and nothing would have been resolved.
In the end the customer would end up paying for tech support’s utter ignorance.
Rest assured I am escalating this to the highest levels of management in the hosting company. And I will publish whatever response I get.
Diagnosis, Aftermath and Fallout
Once the site got back up and online on Friday morning I started a careful inspection of all server logs. This process has continued until today and the news is not good for me or for anyone else:
First, the designisphilosophy.com domain is under repeated attack and I have taken the drastic step of taking it down for the time being. While the original attack was from a single home address in Los Angeles, California, the DDoS attack now originates from hundreds of individual IP addresses all over North America.
Second, a new wave of attacks is now targeting mor10.com. They are less extensive (usually limited to around 6,000 hits per IP), but they are ongoing. I am working on culling these attacks with various tools.
Third, close inspection of the server logs uncovered extensive and complex attacks on all my WordPress sites, including hidden ones sitting in unlisted random domains. These attacks mostly target the login panel but also other areas of the sites. In the last 24 hours over 200 different IP addresses have run systematic attacks on key files and locations on all these sites. This pattern is repeated across multiple domains on multiple hosts and is also affecting sites run by clients and friends. The attacks are not related to a specific plugin or theme but seem to be targeting any WordPress site.
Fourth, a large number of the attacks are being perpetrated by other WordPress sites. I suspect this is due to the DDoS vulnerability documented last year that has become headline news in the past week, but I can’t be sure.
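A rough way to see how many different addresses are probing the login panel is to filter the log by target path before counting. The sketch below again assumes Common or Combined Log Format. The watched paths are my assumption: `/wp-login.php` is the standard WordPress login script, and the list can be extended. The function name `login_probes` and all sample data are made up.

```python
import re
from collections import defaultdict

# Client IP, identity, user, [timestamp], "METHOD path ..."
REQUEST = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)')

# Assumption: the paths worth watching on a WordPress site.
WATCHED_PREFIXES = ("/wp-login.php", "/wp-admin")

def login_probes(lines, watched=WATCHED_PREFIXES):
    """Map each client IP to its number of requests for watched paths."""
    counts = defaultdict(int)
    for line in lines:
        match = REQUEST.match(line)
        # str.startswith accepts a tuple of prefixes
        if match and match.group(3).startswith(watched):
            counts[match.group(1)] += 1
    return dict(counts)

# Made-up sample data (documentation-range IPs).
sample = [
    '203.0.113.7 - - [01/Jan/2000:00:00:01 +0000] "POST /wp-login.php HTTP/1.1" 200 0',
    '203.0.113.7 - - [01/Jan/2000:00:00:02 +0000] "GET /wp-admin/ HTTP/1.1" 302 0',
    '198.51.100.4 - - [01/Jan/2000:00:00:03 +0000] "POST /wp-login.php HTTP/1.1" 200 0',
    '192.0.2.9 - - [01/Jan/2000:00:00:04 +0000] "GET /about/ HTTP/1.1" 200 512',
]

probes = login_probes(sample)
print(len(probes), "distinct addresses hit the watched paths")
for ip, count in sorted(probes.items(), key=lambda item: -item[1]):
    print(ip, count)
```

Keep in mind that your own logins show up in this count too, so a hit on a watched path is a hint to look closer, not proof of an attack.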
Bottom line: The web is infested with bots and zombies and your site is probably under attack right now.
I will go into more detail about how to protect yourself from this type of nonsense in a future post, but for now just be aware that when tech support tells you they have taken your site down for using “excessive resources” they are probably too ignorant to know it is a DDoS attack.
To be continued…