AMM "Correctable ECC memory error logging limit reached" - IBM BladeCenter HS22

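Symptom: The AMM repeatedly logs "Correctable ECC memory error logging limit reached" for an IBM BladeCenter HS22 blade, and the accumulating entries eventually fill the event log.

Cause: Abnormal system behavior.

Solution: With the base firmware already at the latest level, apply the following RETAIN tip.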
RETAIN tip: H196525

Symptom

The Error Light Emitting Diode (LED) is illuminated on the chassis and the BladeCenter HS22 blade server front information panel. The Advanced Management Module (AMM) system status indicates that there is a "correctable ECC memory error logging limit reached" error. The AMM logs the following errors:

19 E Blade_05 12/08/09, 11:29:06 (octans012) Correctable memory error logging limit reached
20 E Blade_05 12/08/09, 11:29:05 (octans012) Correctable memory error logging limit reached on DIMM 5
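
When the log is filling up, it can help to pull the blade and DIMM number out of each entry automatically. The following is a minimal Python sketch, not an IBM tool. The field layout is inferred only from the two sample lines above, the MM/DD/YY date order is an assumption, and the function name `parse_amm_line` is hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from the two sample AMM lines above (assumption):
# index, severity, source, "MM/DD/YY, HH:MM:SS", (name), message
LINE_RE = re.compile(
    r"^(?P<index>\d+)\s+(?P<severity>\S+)\s+(?P<source>\S+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2}),\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\((?P<name>[^)]*)\)\s+(?P<message>.*)$"
)
DIMM_RE = re.compile(r"on DIMM (\d+)\s*$")


def parse_amm_line(line):
    """Parse one AMM event log line; return a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["index"] = int(entry["index"])
    # Date order MM/DD/YY is assumed, not confirmed by the source.
    stamp = entry.pop("date") + " " + entry.pop("time")
    entry["timestamp"] = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
    # Only some entries name a specific DIMM.
    dimm = DIMM_RE.search(entry["message"])
    entry["dimm"] = int(dimm.group(1)) if dimm else None
    return entry


if __name__ == "__main__":
    sample = ("20 E Blade_05 12/08/09, 11:29:05 (octans012) "
              "Correctable memory error logging limit reached on DIMM 5")
    print(parse_amm_line(sample))
```

Entries that do not name a DIMM get `dimm` set to `None`, so the results can be grouped by blade first and by DIMM only where that information exists.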

The memory errors occur in the following BladeCenter HS22 configuration:

  • CPU C-States [Enable]
  • Thermal Mode [Normal] double refresh rate
  • 4 GB Samsung VLP DIMMs installed, Option 44T1488, Field Replaceable Unit (FRU) 44T1498

Affected configurations

The system may be any of the following IBM servers:

  • BladeCenter HS22, Type 1936, any model
  • BladeCenter HS22, Type 7870, any model

This tip is not software specific.
This tip is not option specific.

The system has the symptom described above.
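
The conditions listed above can be expressed as a simple check, for example when screening an inventory of blades. This is only an illustrative Python sketch. The `BladeConfig` class, its field names, and `tip_applies` are hypothetical. The machine types and the FRU number come from the tip itself.

```python
from dataclasses import dataclass, field

# Values taken from the tip text above.
AFFECTED_MACHINE_TYPES = {"1936", "7870"}
AFFECTED_DIMM_FRU = "44T1498"  # 4 GB Samsung VLP DIMM, Option 44T1488


@dataclass
class BladeConfig:
    """Hypothetical inventory record for one blade."""
    machine_type: str
    c_states_enabled: bool
    thermal_mode: str
    dimm_frus: list = field(default_factory=list)


def tip_applies(cfg):
    """Return True when the blade matches every condition named in the tip."""
    return (
        cfg.machine_type in AFFECTED_MACHINE_TYPES
        and cfg.c_states_enabled
        and cfg.thermal_mode.strip().lower() == "normal"
        and AFFECTED_DIMM_FRU in cfg.dimm_frus
    )


if __name__ == "__main__":
    blade = BladeConfig("7870", True, "Normal", ["44T1498", "44T1498"])
    print(tip_applies(blade))
```

A match only means the blade is a candidate for the false error. As the tip notes further down, the same message usually indicates a genuinely failing DIMM.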

Solution

Choose one of the following two (2) methods to resolve the errors:

Method 1: Change Thermal Mode setting (preferred method)

  1. Boot the blade into the F1 "System Configuration and Boot Management" screen. Highlight "System Settings". Press Enter and select "Memory". Select "Thermal Mode" and change the setting to "Performance".
  2. Press the Esc key twice to get to "System Configuration and Boot Management" and then select "Save Settings" and "Exit Setup".
  3. Follow the instructions on the next screen to exit the "Setup Utility".
  4. Power the blade off for the changes to take effect and restart.

Changing "Normal" mode to "Performance" mode affects the way that the Dual In-Line Memory Modules (DIMMs) are refreshed. As a result, the DIMM temperature warning message is triggered at a temperature 10 degrees lower than before. This has no impact in most industry-standard data centers.

Method 2: Disable CPU C-State

  1. Boot the blade into the F1 "System Configuration and Boot Management" screen. Highlight "System Settings," press Enter, and select "Processors". Select "CPU C-States", and then change the setting to "Disable".
  2. Press the Esc key twice to get to "System Configuration and Boot Management" and then select "Save Settings" and "Exit Setup".
  3. Follow the instructions on the next screen to exit the "Setup Utility".
  4. Power the blade off for the changes to take effect and restart.

If the LED stays on after the changes have been made, do one of the following to turn it off:

  1. Using the IPMItool application (a third-party application available for Windows and Linux):
    1. ipmitool sel list (to verify the log contains messages)
    2. ipmitool sel clear
    3. ipmitool sel list (to verify the log is now empty)
    4. Restart the IMM. This can be done via the AMM GUI interface (Blade Tasks --> Power/Restart --> select "Restart Blade System Mgmt Processor" for the appropriate blade) or with the ASU command line tool (asu rebootimm).
  2. Fully power the blade off, then power it back on (do not restart the blade). This can be done with the AMM or locally at the blade.
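
The three ipmitool steps above (list, clear, list again) can be scripted. The following is a minimal Python sketch and assumes `ipmitool` is installed and on the PATH of the blade's operating system. The function names `run_ipmitool` and `clear_sel` are hypothetical. The sketch returns the log lines before and after the clear without judging whether the log is "empty", since some management controllers may record a fresh entry when the log is cleared. Restarting the IMM (step 4) is left out.

```python
import subprocess


def run_ipmitool(args):
    """Run ipmitool with the given arguments and return its standard output."""
    result = subprocess.run(
        ["ipmitool", *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ipmitool reports a failure
    )
    return result.stdout


def clear_sel(run=run_ipmitool):
    """List the SEL, clear it, and list it again.

    Returns (lines_before, lines_after) so the caller can compare them.
    The `run` parameter lets a caller substitute a stub for dry runs.
    """
    before = [ln for ln in run(["sel", "list"]).splitlines() if ln.strip()]
    run(["sel", "clear"])
    after = [ln for ln in run(["sel", "list"]).splitlines() if ln.strip()]
    return before, after


if __name__ == "__main__":
    before, after = clear_sel()
    print(f"entries before clear: {len(before)}")
    print(f"entries after clear:  {len(after)}")
```

Passing a stub as `run` makes it possible to check the command sequence on a machine without ipmitool before running it against a real blade.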

Additional information

This error message usually indicates a failing DIMM. However, a very rare condition has been identified with Samsung DIMMs that can cause a false error. After either of the recommended workarounds above is implemented, the false "correctable ECC memory error logging limit reached" error should no longer occur.

Note: The false "correctable ECC memory error logging limit reached" error does not indicate defective DIMMs.

A firmware fix will probably be released later, but for now this workaround has to be used as a temporary measure.
