The day has finally arrived. You feared it would come and tried to prepare for it, but now that it is here, you are frozen, unsure what to do. You go to your website and find it offline or, even worse, get a call from a client complaining about downtime. Something is terribly wrong, and it is up to you to fix it.
Despite the urgency of the situation, the first thing to do is not panic. Panicking will not solve anything and may keep you from seeing a simple solution. The following steps can help you quickly find out what is wrong and resolve it.
1. Find out the scope of the outage. Are all sites on the server down or just one? Is it limited to sites that share a particular IP address? In some cases, it may be as simple as a domain you forgot to renew.
2. SSH into the server. If you can still reach the server via SSH, the machine is still running and connected to the network. Check whether your web server (e.g. Apache) has crashed. You should also check your database server if your sites depend on it.
3. If you cannot SSH into it, try a ping. If that does not work, there is a good chance the network is either down or something is wrong with the server itself. Check with your hosting provider to find out what is wrong. If you have a good one, like managed server host 34SP.com, they will help you resolve the situation quickly. In some instances, it may require a reboot.
4. In some rare cases, your server host or data center manager may tell you your server is running just fine. If so, it may be a problem on your end or somewhere in between. A traceroute may reveal a network problem somewhere along the way. Remember, although it appears to happen instantly, a web connection often has to go through several hops to get from your computer to a remote server.
5. When something is wrong with the server and a network connection is not possible, even after a reboot, you will need to either go to the physical server to work on it or use remote console access, such as a KVM-over-IP switch. There may be an error with the hard drive or another device that prevents proper booting.
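The first few checks above can be partly automated. The following is a minimal Python sketch, not a definitive tool: it assumes SSH listens on port 22 and the web server on port 80, and the function names `port_open`, `diagnose`, and `triage` are made up for illustration. It only tells you which step of the list to look at next.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def diagnose(resolves, ssh_up, http_up):
    """Map the three check results to a suggested next step (illustrative only)."""
    if not resolves:
        return "Name does not resolve: check domain renewal and DNS records."
    if ssh_up and http_up:
        return "Server and web port respond: check the individual site or application."
    if ssh_up:
        return "Machine is up but the web port is closed: check the web and database servers."
    return "No SSH: try ping and traceroute, then contact your hosting provider."


def triage(host, ssh_port=22, http_port=80):
    """Run the checks against host. The port numbers are assumptions; adjust as needed."""
    try:
        socket.gethostbyname(host)
    except socket.gaierror:
        return diagnose(False, False, False)
    return diagnose(True, port_open(host, ssh_port), port_open(host, http_port))
```

Calling `triage("www.example.com")` with your own hostname returns one line of advice. Running it from a machine outside your own network helps separate a server problem from a local connectivity problem.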
After you have restarted Apache or rebooted your server, you could simply go back to business as usual, but you may inadvertently ignore a looming issue. Something brought your server down, and you should do what you can to find out why. Check the server logs, especially the Apache logs, kernel logs, and security logs. If a spammer or hacker has compromised your system, their activities could temporarily bring down your server.
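A quick first pass over a large log can be scripted. The sketch below is illustrative: it counts how many lines mention a few keywords. The keyword list is an assumption, not a complete set of warning signs, and log file locations vary by system, so the path is left to you.

```python
from collections import Counter

# Illustrative keywords only; tune this list for your own server and services.
KEYWORDS = ("error", "segfault", "out of memory", "failed password")


def count_keyword_hits(lines, keywords=KEYWORDS):
    """Count how many lines contain each keyword, ignoring case."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts


def scan_file(path, keywords=KEYWORDS):
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as handle:
        return count_keyword_hits(handle, keywords)
```

A sudden jump in one of these counts around the time of the outage is a hint about where to read more closely. It does not replace reading the log entries themselves.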
If you find out the crash was caused by a kernel error or something else related to the system, you should check the documentation and forums for your operating system. If you have paid support, this might be a good time to use it. You may even end up filing a bug report about the incident.
In the event that the problem was a security issue, it is time to strengthen your firewall, intrusion detection, and application security. If there is a vulnerability in one of your scripts or web applications, this will probably not be the last time your server is attacked.
Once you have your server running again, keep monitoring it to make sure the same problems do not crop up again. If your server was down for an extended period of time, you can then turn to damage control with your users. With luck, being better prepared will keep the next outage shorter or prevent it altogether.
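Ongoing monitoring can start as a small script. The following Python sketch assumes a plain HTTP check is enough for your site. The names `site_is_up` and `monitor`, the 60-second interval, and the three-failure threshold are all arbitrary choices for illustration, and the "alert" is only a printed message.

```python
import time
import urllib.error
import urllib.request


def site_is_up(url, timeout=5.0):
    """Return True if the URL answers with a 2xx or 3xx status within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # HTTP error statuses, DNS failures, refused connections, and timeouts.
        return False


def monitor(url, check=site_is_up, interval=60, alert_after=3,
            max_checks=None, sleep=time.sleep):
    """Poll url and report once each time it fails alert_after checks in a row.

    Returns the number of alerts raised. With max_checks=None it runs forever.
    """
    failures = 0
    alerts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if check(url):
            failures = 0
        else:
            failures += 1
            if failures == alert_after:
                print(f"ALERT: {url} failed {failures} checks in a row")
                alerts += 1
        checks += 1
        sleep(interval)
    return alerts
```

Requiring several consecutive failures before alerting avoids a false alarm on a single dropped request. A monitor like this is most useful when it runs on a different machine from the server it watches.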