Commit cb51a37
EDAC/ghes: Setup DIMM label from DMI and use it in error reports
The ghes driver reports errors with 'unknown label' even if the actual
DIMM label is known, e.g.:
EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0
module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0
page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location:
node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM
location:N0 DIMM_A0 status(0x0000000000000400): Storage error in
DRAM memory)
Fix this by using struct dimm_info's label string in error reports:
EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0
rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0
page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location:
node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM
location:N0 DIMM_A0 status(0x0000000000000400): Storage error in
DRAM memory)
The labels are initialized by reading the bank and device strings
from DMI. Now, the label information can also read from sysfs. E.g. a
ThunderX2 system will show the following:
/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
/sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
/sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
/sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
/sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
/sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
/sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
/sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
/sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0
/sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0
/sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0
/sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0
/sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0
/sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0
/sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0
/sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0
Since dimm_labels can be rewritten, that label will be used in a later
error report:
# echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label
# # some error injection here
# dmesg | grep foobar
[ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0
module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0
page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location:
node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM
location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM
memory)
[ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ]
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com1 parent b3a9e3b commit cb51a37
1 file changed
Lines changed: 24 additions & 11 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
87 | 87 | | |
88 | 88 | | |
89 | 89 | | |
90 | | - | |
| 90 | + | |
91 | 91 | | |
92 | 92 | | |
93 | 93 | | |
94 | 94 | | |
95 | 95 | | |
96 | | - | |
| 96 | + | |
97 | 97 | | |
98 | 98 | | |
99 | | - | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
100 | 111 | | |
101 | 112 | | |
102 | 113 | | |
| |||
179 | 190 | | |
180 | 191 | | |
181 | 192 | | |
182 | | - | |
183 | | - | |
184 | | - | |
| 193 | + | |
185 | 194 | | |
186 | 195 | | |
187 | 196 | | |
| |||
228 | 237 | | |
229 | 238 | | |
230 | 239 | | |
231 | | - | |
232 | 240 | | |
233 | 241 | | |
234 | 242 | | |
| |||
345 | 353 | | |
346 | 354 | | |
347 | 355 | | |
348 | | - | |
| 356 | + | |
349 | 357 | | |
350 | 358 | | |
351 | 359 | | |
| |||
354 | 362 | | |
355 | 363 | | |
356 | 364 | | |
357 | | - | |
358 | | - | |
359 | | - | |
| 365 | + | |
| 366 | + | |
| 367 | + | |
| 368 | + | |
| 369 | + | |
360 | 370 | | |
361 | 371 | | |
362 | 372 | | |
363 | 373 | | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
364 | 377 | | |
365 | 378 | | |
366 | 379 | | |
| |||
0 commit comments