Discussion:
[Bug 1704972] [NEW] [LTCTest][Opal][FW910] HMI TFMR HDEC parity error is throwing Severe Machine check interrupt
bugproxy
2017-07-18 09:29:51 UTC
Public bug reported:

== Comment: #0 - PAVAMAN SUBRAMANIYAM <***@in.ibm.com> - 2017-05-22 05:12:38 ==
---Problem Description---
HMI TFMR HDEC parity error is throwing Severe Machine check interrupt

---uname output---
Linux zz376p1 4.10.0-21-generic #23~16.04.1-Ubuntu SMP Tue May 2 12:54:57 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = P9

---System Hang---
The system hangs indefinitely and has to be rebooted to recover.

---Debugger---
A debugger is not configured


Immediately after injecting the above error, we get Severe Machine check interrupt [[Not recovered]

Contact Information = ***@in.ibm.com

Stack trace output:
no

Oops output:
[ 288.655336] Severe Machine check interrupt [[Not recovered]
[ 288.655339] Severe Machine check interrupt [[Not recovered]
[ 288.655342] Severe Machine check interrupt [[Not recovered]
[ 288.655345] Severe Machine check interrupt [[Not recovered]
[ 288.655348] Initiator: CPU
[ 288.655349] Initiator: CPU
[ 288.655352] Error type: Real address [Load/Store (foreign)]
[ 288.655354] Initiator: CPU
[ 288.655357] Effective address: 333035342dfe3030
[ 288.655360] Error type: Real address [Load/Store (foreign)]
[ 288.655366] Error type: Real address [Load/Store (foreign)]
[ 288.655369] Effective address: 333035342e013030
[ 288.655371] Effective address: 333035342e073030
[ 288.655418] opal: Reboot type 1 not supported
[ 288.655420] opal: Reboot type 1 not supported
[ 288.655422] opal: Reboot type 1 not supported
[ 288.655423] Kernel panic - not syncing: PowerNV Unrecovered Machine Check
[ 288.655430] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G M 4.10.0-21-generic #23~16.04.1-Ubuntu
[ 288.655433] Call Trace:
[ 288.655450] Sending IPI to other CPUs
[ 288.656767] Initiator: CPU
[ 288.656834] Error type: Real address [Load/Store (foreign)]
[ 288.656945] Effective address: 333035342e043030
[ 288.657060] opal: Reboot type 1 not supported
[ 298.655034] ERROR: 3 cpu(s) not responding
[ 298.655183] Activate system reset (dumprestart) to stop other cpu(s)


System Dump Info:
The system is not configured to capture a system dump.

*Additional Instructions for ***@in.ibm.com:
-Attach sysctl -a output to the bug.

== Comment: #3 - MAHESH J. SALGAONKAR <***@in.ibm.com> - 2017-06-29 03:23:30 ==
(In reply to comment #2)
We need upstream commit
https://git.kernel.org/powerpc/c/be5c5e843c4afa1c8397cb740b6032 that fixes
this issue.
Hi Breno,
We need this upstream commit to be included in Ubuntu 16.04.3.
Did this patch make it into Ubuntu 16.04.3?

** Affects: ubuntu-power-systems
Importance: Undecided
Status: New

** Affects: linux (Ubuntu)
Importance: Undecided
Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Status: New


** Tags: architecture-ppc64le bugnameltc-154870 severity-critical targetmilestone-inin16043

** Tags added: architecture-ppc64le bugnameltc-154870 severity-critical
targetmilestone-inin16043

** Changed in: ubuntu
Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)

** Package changed: ubuntu => kernel-package (Ubuntu)
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1704972

Title:
[LTCTest][Opal][FW910] HMI TFMR HDEC parity error is throwing Severe
Machine check interrupt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1704972/+subscriptions
--
ubuntu-bugs mailing list
ubuntu-***@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
Frank Heimes
2017-07-18 09:34:17 UTC
** Package changed: kernel-package (Ubuntu) => linux (Ubuntu)

** Also affects: ubuntu-power-systems
Importance: Undecided
Status: New
Frank Heimes
2017-07-18 09:49:01 UTC
** Changed in: ubuntu-power-systems
Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)
Joseph Salisbury
2017-07-19 20:37:42 UTC
Did this issue start happening after an update/upgrade? Was there a
prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v4.13 kernel[0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".


Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc1/

** Tags added: kernel-da-key

** Changed in: linux (Ubuntu)
Importance: Undecided => Medium

** Changed in: linux (Ubuntu)
Status: New => Incomplete
Frank Heimes
2017-07-20 05:33:33 UTC
** Changed in: ubuntu-power-systems
Status: New => Incomplete
bugproxy
2017-07-20 08:59:50 UTC
------- Comment From ***@in.ibm.com 2017-07-20 04:49 EDT-------
This upstream patch has already been tested and submitted for Ubuntu 17.04, but it has not been submitted for 16.04.3. It needs to be integrated into 16.04.3.
Joseph Salisbury
2017-07-21 16:59:26 UTC
** Changed in: linux (Ubuntu)
Status: Incomplete => Triaged

** Changed in: ubuntu-power-systems
Status: Incomplete => Confirmed
Joseph Salisbury
2017-07-21 17:10:27 UTC
I don't see commit be5c5e8 in the 17.04 (Zesty) kernel. Do you know if
this was submitted as an SRU request? Any commit put into 17.04 will
automatically be committed in the 16.04.3 kernel.

** Also affects: linux (Ubuntu Zesty)
Importance: Undecided
Status: New

** Changed in: linux (Ubuntu Zesty)
Status: New => Triaged

** Changed in: linux (Ubuntu Zesty)
Importance: Undecided => Medium

** Changed in: linux (Ubuntu)
Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) => Joseph Salisbury (jsalisbury)

** Changed in: linux (Ubuntu Zesty)
Assignee: (unassigned) => Joseph Salisbury (jsalisbury)
Frank Heimes
2017-07-21 20:05:42 UTC
** Changed in: ubuntu-power-systems
Importance: Undecided => Medium

** Changed in: ubuntu-power-systems
Status: Confirmed => Triaged
Andrew Cloke
2017-07-24 13:50:48 UTC
** Tags added: triage-g
bugproxy
2017-07-25 03:39:32 UTC
------- Comment From ***@in.ibm.com 2017-07-24 23:37 EDT-------
(In reply to comment #11)
I don't see commit be5c5e8 in the 17.04 (Zesty) kernel. Do you know if this
was submitted as an SRU request? Any commit put into 17.04 will
automatically be committed in the 16.04.3 kernel.
Yes, I think this was submitted as an SRU request for 17.04. The external
Canonical Launchpad bug ID for 17.04 is 1684054, and we see the following
comment there:

### External Comment ###

SRU submitted.
bugproxy
2017-07-25 20:19:31 UTC
------- Comment From ***@us.ibm.com 2017-07-25 16:16 EDT-------
*** Bug 156960 has been marked as a duplicate of this bug. ***
bugproxy
2017-08-07 05:59:22 UTC
------- Comment From ***@in.ibm.com 2017-08-07 01:51 EDT-------
Is this patch integrated in 16.04.3?
Joseph Salisbury
2017-08-10 16:02:49 UTC
It looks like it was committed in the master-next branch in bug 1684054.

** Changed in: linux (Ubuntu Zesty)
Status: Triaged => Fix Committed

** Changed in: linux (Ubuntu)
Status: Triaged => Fix Committed

** Changed in: ubuntu-power-systems
Status: Triaged => Fix Committed
bugproxy
2017-09-19 08:49:34 UTC
------- Comment From ***@in.ibm.com 2017-09-19 04:46 EDT-------
Any idea which kernel version the fix is available in?
bugproxy
2017-09-19 09:19:39 UTC
------- Comment From ***@in.ibm.com 2017-09-19 05:09 EDT-------
(In reply to comment #15)
Post by Joseph Salisbury
It looks like it was committed in the master-next branch in bug 1684054
Has this fix been committed to the Ubuntu 16.04.3 release? Which kernel
version of 16.04.3 has this fix?
bugproxy
2017-09-21 09:19:37 UTC
------- Comment From ***@in.ibm.com 2017-09-21 05:16 EDT-------
Hello Canonical,

Please let us know which kernel version of 16.04.3 has this fix.
Joseph Salisbury
2017-09-21 14:46:57 UTC
The fix is in kernel version: 4.10.0-33.37

** Changed in: linux (Ubuntu Zesty)
Status: Fix Committed => Fix Released

** Changed in: linux (Ubuntu)
Status: Fix Committed => Fix Released

** Changed in: ubuntu-power-systems
Status: Fix Committed => Fix Released
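
For anyone checking a test machine against that version, a minimal sketch follows. It assumes the running kernel string comes from `uname -r` (for example `4.10.0-21-generic`, as in the original report) and that comparing the ABI number within the 4.10.0 series is sufficient. The function names are hypothetical and are not part of any Ubuntu tooling.

```python
import re

# Version quoted in this thread as carrying the fix.
FIXED_VERSION = "4.10.0-33.37"


def parse_kernel_version(text):
    """Return (major, minor, patch, abi) from strings such as
    '4.10.0-21-generic' (uname -r) or '4.10.0-33.37' (package version)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", text)
    if match is None:
        raise ValueError("unrecognised kernel version: %r" % text)
    return tuple(int(part) for part in match.groups())


def has_fix(running, fixed=FIXED_VERSION):
    """True if 'running' is at or beyond the ABI number of 'fixed'.

    Assumption: both strings belong to the same upstream series
    (4.10.0 here); other series use unrelated ABI numbering, so a
    mismatch raises instead of guessing.
    """
    run = parse_kernel_version(running)
    fix = parse_kernel_version(fixed)
    if run[:3] != fix[:3]:
        raise ValueError("different kernel series; comparison not meaningful")
    return run[3] >= fix[3]


if __name__ == "__main__":
    import platform
    try:
        print(has_fix(platform.release()))
    except ValueError as err:
        print("cannot compare:", err)
```

The upload number after the dot (`.37`) does not appear in `uname -r`, so the sketch compares only up to the ABI number.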