In a perfect world, everything works as expected and will continue to work indefinitely. Obviously, that world doesn't exist, so we have to plan ahead for failure.
Unfortunately, there is no guide that tells you what to plan for; the possibilities are too numerous to list in a single publication. Most of the time, a failure has to happen to you at least once before you plan how to recover from it the next time.
This is where experience comes into play, as textbooks usually tell you only what works, not what fails. The best you can do is play through failure scenarios in your head and work out how you would solve each one.
For example, if a router fails and you are not in the office, how will you access it? What if the VPN fails? You may have a really expensive network management system sending you email notifications, but what if it's the email server that's down? How will you know?
Your best defense is to build redundancy into the plan from the start. That gives you time to resolve the issue while operations continue normally. But if redundancy isn't an option (the costs are too high, or it simply isn't available), then you'll need to plan how you will access the equipment and resolve the problem.
When building new solutions, make sure a DR (Disaster Recovery) plan is in your documentation before the solution goes into production. Try to list every scenario you can think of and how you will resolve it.
For those who think dial-up is dead, think again. Sometimes dial-up is the only way to reach something OOB (Out Of Band). I have numerous sites where I dial into the router if we lose the primary link. Larger companies may have multiple links to different carriers, but even that may prove useless if a construction crew cuts all of your local loops.
Redundancy is just like insurance. How much insurance do you need? Some might call it overkill, until you need it. You just have to find the balance between the costs and the risks involved, should you be down for any length of time.
Again, nothing is ever perfect. Buying name-brand equipment, which tends to be more stable than generic or cheaper brands, will never guarantee 100% uptime. And even if the equipment IS stable, you always have to plan for the human component.