Protect Your Customers

Healthy Server Reseller News

March 10th, 2010

In this issue
  • Article: "Server Recovery Secrets of the Pros"
  • Carroll-Net Server Recovery Kit includes Memtest86+

Service For Your Customer
http://reseller.carroll.net

Server Recovery Secrets of the Pros

Boy, when things go wrong, they go wrong fast. When a server starts to come unraveled, it falls apart quickly. One minute the server’s humming along, and the next you’re up to your elbows trying to get it to even respond to a Ctrl-Alt-Del.

Most server recovery is focused on hard drive recovery. That’s fair enough: a hard drive crash is the most common cause of server failure. But with all the attention paid to hard drives, most people ignore failures of RAM, CPUs and motherboards, the so-called green board failures.

Green board failures are typically caused by either power problems or excess heat. Power problems are the silent killer. Inadequate UPSs and failing power supplies rarely give any warning. A single overvoltage event from the utility service, and the damage is done without any indication that it has happened.

Excess heat is usually caused by a failed fan or a failed data room cooling system. Occasionally a failing fan will make high-pitched noises before it fails completely, but just as often the fan will simply stop moving air. A failure of data room cooling is much more serious because it means possible damage to every server in the room.

One thing all green board failures have in common is that they are difficult to troubleshoot. With hard drive failures, it’s usually pretty straightforward to troubleshoot the issue and identify a course of action. With green board failures, it’s always a murky mess.

Enter Memtest86

One tool that makes green board troubleshooting simpler is Memtest86. Memtest86 was designed to test banks of memory. But because of the tight link between memory, motherboards and CPUs, Memtest86 ends up being an effective test of all three.

Memtest86 is a standalone memory test for x86 architecture servers. It was originally designed to address the shortcomings of BIOS-based memory tests. BIOS tests are largely superficial and rarely identify anything other than catastrophic memory failure.

Memtest86 testing is based on some pretty simple concepts. Memory devices are composed of lots of memory cells packed tightly together. Finding subtle or intermittent errors means writing information to one area of memory, then checking the areas around it to see if they change. If nearby areas change, then memory is failing.
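
To make the idea concrete, here is a minimal Python sketch of a "write here, check the neighbors" test. It is only an illustration and is not Memtest86's actual algorithm. The SimulatedMemory class, the faulty_pair parameter and the 0x55 background pattern are all hypothetical, invented for this example; the class stands in for real RAM so that a fault can be injected on purpose.

```python
class SimulatedMemory:
    """Toy stand-in for RAM: a list of byte cells with an optional injected fault."""

    def __init__(self, size, faulty_pair=None):
        self.cells = [0] * size
        # faulty_pair = (aggressor, victim): every write to the aggressor
        # flips one bit in the victim cell (hypothetical fault model).
        self.faulty_pair = faulty_pair

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF
        if self.faulty_pair and addr == self.faulty_pair[0]:
            self.cells[self.faulty_pair[1]] ^= 0x01

    def read(self, addr):
        return self.cells[addr]


def neighbor_disturb_test(mem, size, background=0x55):
    """Write to each cell in turn and check that the cells next to it did not change."""
    errors = []
    # Fill all of memory with a known background pattern.
    for addr in range(size):
        mem.write(addr, background)
    for addr in range(size):
        # Disturb one cell with the inverse of the background pattern.
        mem.write(addr, background ^ 0xFF)
        # A neighbor that no longer holds the background value is an error.
        for neighbor in (addr - 1, addr + 1):
            if 0 <= neighbor < size and mem.read(neighbor) != background:
                errors.append((addr, neighbor))
        # Restore the cell before moving on.
        mem.write(addr, background)
    return errors


healthy = SimulatedMemory(8)
faulty = SimulatedMemory(8, faulty_pair=(4, 5))
print(neighbor_disturb_test(healthy, 8))  # no (written, changed) pairs expected
print(neighbor_disturb_test(faulty, 8))   # reports the write to cell 4 disturbing cell 5
```

Each reported pair names the cell that was written and the neighbor that changed as a result.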

Memtest86 has nine built-in tests, each designed to check different attributes of memory. The simplest way to use the program is to run it and watch for errors. Errors are shown in flashing red and clearly indicate a green board failure.

Of course, knowing what to do with the error report can be somewhat difficult. Motherboard vendors often don’t make it easy to identify which memory addresses correspond to which memory banks.

In general, there are three things you can do when an error is reported: 1) remove banks of memory, 2) rotate banks of memory and 3) replace banks of memory. Usually, simple trial and error will help you isolate which bank is causing the trouble.
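
The trial-and-error approach can be sketched in a few lines of Python. This is only an outline of the "remove banks" strategy under stated assumptions: isolate_bad_banks and passes_test are hypothetical names, and passes_test stands in for the manual step of installing just the listed banks and running Memtest86 on them.

```python
def isolate_bad_banks(banks, passes_test):
    """Test each bank with all the others removed and return the ones that fail."""
    suspects = []
    for bank in banks:
        # passes_test is a stand-in for "install only this bank and run the memory test".
        if not passes_test([bank]):
            suspects.append(bank)
    return suspects


# Stand-in for the manual test: pretend the bank labeled "C" is the bad one.
def pretend_test(installed):
    return "C" not in installed


print(isolate_bad_banks(["A", "B", "C", "D"], pretend_test))
```

A bank flagged this way is only a suspect. Rotating it into a different slot, or testing it in another system, helps tell a bad bank from a bad slot or a compatibility problem.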

One thing to keep in mind: it’s not uncommon for memory to be incompatible with certain systems. Just because a particular bank doesn’t work in one server doesn’t mean the bank is bad. You might want to follow up and test the bank in another system as a tiebreaker.

Once started, Memtest86 will run until stopped. It automatically runs through tests 1 through 8, then returns to the first. The one test that requires manual selection is test 9, the so-called ‘Bit Fade Test’.

Bit Fade Test

The Bit Fade Test is an attempt to determine whether memory will hold its value. The test is quite simple: write something to an area of memory, wait 90 minutes, then return and confirm the value is still there. The test runs twice, so it takes 3 hours from start to finish.
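
The write, wait and verify cycle can be sketched in Python as follows. This is a rough illustration, not the Memtest86 implementation: the function names are hypothetical, the 0xFF and 0x00 patterns are assumptions (the description above does not say which values are written), and a Python script running under an operating system does not exercise physical memory the way a standalone tester does. With the default hold time, two passes of 90 minutes add up to the 3 hours mentioned above.

```python
import time


def bit_fade_check(memory, pattern, hold_seconds, sleep=time.sleep):
    """Fill memory with a pattern, wait, then return the addresses whose value changed."""
    for addr in range(len(memory)):
        memory[addr] = pattern
    sleep(hold_seconds)
    return [addr for addr in range(len(memory)) if memory[addr] != pattern]


def bit_fade_test(memory, hold_seconds=90 * 60, sleep=time.sleep):
    """Run the fill/wait/verify cycle twice (2 x 90 minutes = 3 hours by default)."""
    errors = []
    for pattern in (0xFF, 0x00):  # assumed patterns: all ones, then all zeros
        errors.extend(bit_fade_check(memory, pattern, hold_seconds, sleep))
    return errors


# Demo on a small buffer, with a fake wait that simulates one cell losing a bit.
memory = bytearray(16)


def fake_sleep(seconds):
    memory[3] ^= 0x10  # the simulated cell at address 3 changes while "waiting"


print(bit_fade_test(memory, hold_seconds=0, sleep=fake_sleep))
```

Passing the wait in as a parameter is what lets the demo finish immediately; a real run would simply leave the 90 minute default in place.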

The Bit Fade Test is quite effective at finding memory that is starting to go bad. If you are experiencing unexplained crashes and feel you’ve eliminated most other causes, consider taking the server down over a weekend and running this test.
 
 

Healthy Server Video

We've created a video to help clarify the benefits of Offsite Backup to your customers.


http://carroll.net/flash_movie/movie.htm

 

KLEO Bare Metal Backup
for Servers


Download your FREE Copy Today!

 

Carroll-Net Server Recovery Kit includes Memtest86+

Every copy of the Carroll-Net Server Recovery Kit includes Memtest86+. During boot, you can activate the memory test by typing ‘memtest’ at the boot prompt. The program will launch and begin testing memory in less than 2 seconds.

  • The top right corner displays the current status.

  • Any errors detected are displayed in red in the center of the screen.

  • You can stop testing at any time by pressing ESC, or by simply turning the server off (it’s completely safe to power off the server during testing).

You can download your free copy of the Carroll-Net Server Recovery Kit with Memtest86+ at http://www.kleobackup.net

 
reseller.carroll.net
Copyright (c) Carroll-Net, Inc., 2010