AMM "Correctable ECC memory error logging limit reached" - IBM BladeCenter HS22

This is the remediation procedure to follow when the error above keeps appearing.


Source

RETAIN tip: H196525

Symptom

The Error Light Emitting Diode (LED) is illuminated on the chassis and the BladeCenter HS22 blade server front information panel. The Advanced Management Module (AMM) system status indicates that there is a "correctable ECC memory error logging limit reached" error. The AMM logs the following errors:

19 E Blade_05 12/08/09, 11:29:06 (octans012)
Correctable memory error logging limit reached

20 E Blade_05 12/08/09, 11:29:05 (octans012)
Correctable memory error logging limit reached on DIMM 5
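
When many blades report this error, it can help to extract the affected blade and DIMM number from the AMM log text automatically. The following is a minimal sketch in Python. It assumes that every entry has the two-line layout of the samples above (a header line with index, severity, source, date and time, followed by a message line). The names `AmmEvent` and `parse_amm_events` are hypothetical helpers introduced for illustration and are not part of any IBM tool.

```python
import re
from typing import List, NamedTuple, Optional


class AmmEvent(NamedTuple):
    index: int
    severity: str
    source: str
    timestamp: str
    message: str
    dimm: Optional[int]  # None when the message names no DIMM


# Header layout ASSUMED from the two sample entries:
# "<index> <severity> <source> <MM/DD/YY>, <HH:MM:SS> (<name>)"
_HEADER = re.compile(
    r"^(\d+)\s+(\w)\s+(\S+)\s+(\d{2}/\d{2}/\d{2}),\s+(\d{2}:\d{2}:\d{2})"
)
_DIMM = re.compile(r"on DIMM (\d+)")


def parse_amm_events(text: str) -> List[AmmEvent]:
    """Pair each header line with the message line that follows it."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    events = []
    i = 0
    while i < len(lines):
        header = _HEADER.match(lines[i])
        if header and i + 1 < len(lines):
            message = lines[i + 1]
            dimm = _DIMM.search(message)
            events.append(
                AmmEvent(
                    index=int(header.group(1)),
                    severity=header.group(2),
                    source=header.group(3),
                    timestamp=f"{header.group(4)} {header.group(5)}",
                    message=message,
                    dimm=int(dimm.group(1)) if dimm else None,
                )
            )
            i += 2  # skip the message line just consumed
        else:
            i += 1  # not a header; ignore the line
    return events
```

Calling `parse_amm_events` on the two sample entries yields two events, and only the second carries a DIMM number.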

The memory errors occur in the following BladeCenter HS22 configuration:

- CPU C-States [Enable]

- Thermal Mode [Normal] double refresh rate

- 4 Gigabyte (GB) Samsung VLP DIMMs installed, Option part number 44T1488, replacement part number (FRU) 44T1498.

Affected configurations

The system may be any of the following IBM servers:

  • BladeCenter HS22, Type 1936, any model
  • BladeCenter HS22, Type 7870, any model

This tip is not software specific.
This tip is not option specific.

The system has the symptom described above.

Solution

Choose one of the following two (2) methods to resolve the errors:

Method 1:

Change Thermal Mode setting (preferred method)

  1. Boot the blade into the F1 "System Configuration and Boot Management" screen. Highlight "System Settings." Press Enter and select Memory. Select Thermal Mode and change the setting to "Performance."
  2. Press the Esc key twice to get to "System Configuration and Boot Management" and then select Save Settings and Exit Setup.
  3. Follow the instructions on the next screen to exit the "Setup Utility."
  4. Power the blade off and then back on for the changes to take effect.

Changing "Normal" mode to "Performance" mode affects the way that the Dual In-Line Memory Modules (DIMMs) are refreshed. As a result, the DIMM temperature warning message is triggered at a temperature 10 degrees lower. This has no impact in most industry-standard data centers.

Method 2:

Disable CPU C-State

  1. Boot the blade into the F1 "System Configuration and Boot Management" screen. Highlight System Settings, press Enter, and select Processors. Select CPU C-States, and then change the setting to "Disable."
  2. Press the Esc key twice to get to "System Configuration and Boot Management" and then select Save Settings and Exit Setup.
  3. Follow the instructions on the next screen to exit the "Setup Utility."
  4. Power the blade off and then back on for the changes to take effect.

If the LED stays on after the changes have been made, do one of the following to turn it off:

  1. Using the IPMItool application (which is a third party application available for Windows and Linux):
    1. ipmitool sel list (to verify the log contains messages)
    2. ipmitool sel clear
    3. ipmitool sel list (to verify the log is now empty)
    4. Restart the IMM. This can be done via the AMM GUI interface (select Blade Tasks, Power/Restart, and Restart Blade System Mgmt Processor for the appropriate blade) or with the ASU command line tool (asu rebootimm).
  2. Fully power the blade off, then power it back on (do not restart the blade). This can be done with the AMM or locally at the blade.
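
The three ipmitool commands in option 1 can be scripted. The following is a minimal sketch in Python, assuming `ipmitool` is installed and on the PATH of the operating system running on the blade. The helper names (`run_command`, `count_entries`, `clear_sel`) are hypothetical and introduced only for illustration. The sketch counts non-empty output lines before and after clearing. Some management controllers may record a "log cleared" event of their own after `sel clear`, so a small non-zero count afterwards does not necessarily mean the clear failed.

```python
import subprocess
from typing import Callable, Sequence, Tuple

Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if it exits non-zero."""
    result = subprocess.run(
        list(argv), capture_output=True, text=True, check=True
    )
    return result.stdout


def count_entries(output: str) -> int:
    """Count non-empty lines of `ipmitool sel list` output."""
    return sum(1 for line in output.splitlines() if line.strip())


def clear_sel(run: Runner = run_command) -> Tuple[int, int]:
    """List the SEL, clear it, list it again; return (before, after) counts."""
    before = count_entries(run(["ipmitool", "sel", "list"]))
    run(["ipmitool", "sel", "clear"])
    after = count_entries(run(["ipmitool", "sel", "list"]))
    return before, after


if __name__ == "__main__":
    before, after = clear_sel()
    print(f"SEL entries before clear: {before}, after clear: {after}")
```

The `run` parameter exists so the sequence can be exercised with a stub instead of real hardware. The sketch does not restart the IMM; that step still has to be done through the AMM or with `asu rebootimm`, as described in the list above.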

Additional information

This error message usually indicates a failing DIMM. However, a very rare condition has been identified with Samsung DIMMs that can cause a false error. With either of the recommended workarounds above in place, the false "correctable ECC memory error logging limit reached" error should not occur.

Note: The false "correctable ECC memory error logging limit reached" error does not indicate defective DIMMs.
