Discussion:
[Bug 1491797] [NEW] Shuts down when supposed to suspend as a reaction to self-caused overheat, session lost
Harri K. Hiltunen
2015-09-03 11:33:18 UTC
Public bug reported:

Error:
Kernel foolishly shuts down the computer when it overheats.
/var/log/kern.log
W500 kernel: [1448.648529] thermal thermal_zone1: critical temperature reached (100 C), shutting down

Consequence:
Shutting down destroys session in Ubuntu, Gnome, and all applications that can't remember their latest conscious state (most applications).

Attempted repair, failed:
Laptop has sleeping ability, but I can't find the setting for the kernel to make the computer suspend instead of shutting down.

Repair suggestions:
1. Persistence of session, so that everything would reappear after the restart. (this would also make updating less disruptive)
2. Do not heat the machine like crazy; speed up fans or slow down processes. (problematic Lenovo Thinkpad W500 fan on low speed right up to the fiery end)
3. Put the computer to sleep when it's too hot.

(The problem has remained the same from at least Ubuntu 11.10 through
14.04)
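
For anyone wanting to check which trip points the kernel has registered for each thermal zone (such as the 100 C critical point in the log line above), the following is a minimal sketch. It assumes the standard sysfs thermal layout under /sys/class/thermal; the function name list_trip_points is made up for illustration, and the sketch only reads values, it does not change the shutdown behaviour.

```shell
#!/bin/sh
# Print "<zone> <trip type> <temperature> C" for every trip point found.
# $1 is the directory to scan (defaults to the standard sysfs location).
# Trip temperatures are stored in millidegrees Celsius.
list_trip_points() {
    base="${1:-/sys/class/thermal}"
    for type_file in "$base"/thermal_zone*/trip_point_*_type; do
        [ -r "$type_file" ] || continue        # glob matched nothing, or unreadable
        temp_file="${type_file%_type}_temp"
        millideg=$(cat "$temp_file" 2>/dev/null) || continue
        case "$millideg" in
            ''|*[!0-9-]*) continue ;;          # skip anything that is not an integer
        esac
        zone=$(basename "$(dirname "$type_file")")
        echo "$zone $(cat "$type_file") $((millideg / 1000)) C"
    done
}

list_trip_points
```

On a machine without thermal zones the call prints nothing.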

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-62-generic 3.13.0-62.102
ProcVersionSignature: Ubuntu 3.13.0-62.102-generic 3.13.11-ckt24
Uname: Linux 3.13.0-62-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.12
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: user 2171 F.... pulseaudio
CurrentDesktop: Unity
Date: Thu Sep 3 13:42:28 2015
HibernationDevice: RESUME=UUID=991e1383-ff5b-46c1-84c4-c904e1d81256
InstallationDate: Installed on 2013-12-29 (612 days ago)
InstallationMedia: Ubuntu 13.10 "Saucy Salamander" - Release amd64 (20131016.1)
MachineType: LENOVO 4063B22
PccardctlIdent:
Socket 0:
no product info available
PccardctlStatus:
Socket 0:
no card
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-62-generic root=UUID=bd426989-b545-41b3-97b8-de9410f27aa6 ro persistent quiet splash vt.handoff=7
RelatedPackageVersions:
linux-restricted-modules-3.13.0-62-generic N/A
linux-backports-modules-3.13.0-62-generic N/A
linux-firmware 1.127.15
SourcePackage: linux
UpgradeStatus: Upgraded to trusty on 2014-04-27 (494 days ago)
dmi.bios.date: 12/14/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 6FET92WW (3.22 )
dmi.board.name: 4063B22
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6FET92WW(3.22):bd12/14/2011:svnLENOVO:pn4063B22:pvrThinkPadW500:rvnLENOVO:rn4063B22:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 4063B22
dmi.product.version: ThinkPad W500
dmi.sys.vendor: LENOVO

** Affects: linux (Ubuntu)
Importance: Undecided
Status: New


** Tags: amd64 apport-bug trusty
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1491797

Title:
Shuts down when supposed to suspend as a reaction to self-caused
overheat, session lost

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1491797/+subscriptions
--
ubuntu-bugs mailing list
ubuntu-***@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
Brad Figg
2015-09-03 12:00:08 UTC
This change was made by a bot.

** Changed in: linux (Ubuntu)
Status: New => Confirmed

Harri K. Hiltunen
2015-09-03 13:13:39 UTC
As recommended on irc/Freenode/#ubuntu-kernel:
-installed linux-generic-lts-vivid
-ensured that new kernel is running
-ensured that thermald is running
The problem remains the same: overheating with fan speed staying low (3500 out of 5000rpm) after several minutes at 95C.

Harri K. Hiltunen
2015-09-03 13:42:02 UTC
** Description changed:

- Error:
+ Error:
Kernel foolishly shuts down the computer when it overheats.
/var/log/kern.log
W500 kernel: [1448.648529] thermal thermal_zone1: critical temperature reached (100 C), shutting down

Consequence:
Shutting down destroys session in Ubuntu, Gnome, and all applications that can't remember their latest conscious state (most applications).

Attempted repair, failed:
- Laptop has sleeping ability, but I can't find the setting for the kernel to make the computer suspend instead of shutting down.
+ Laptop has suspending ability, but I can't find the setting for the kernel to make the computer suspend instead of shutting down.

Repair suggestions:
1. Persistence of session, so that everything would reappear after the restart. (this would also make updating less disruptive)
2. Do not heat the machine like crazy; speed up fans or slow down processes. (problematic Lenovo Thinkpad W500 fan on low speed right up to the fiery end)
- 3. Put the computer to sleep when it's too hot.
+ 3. Put the computer to suspend when it's too hot.

(The problem has remained the same from at least Ubuntu 11.10 through
14.04)

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-62-generic 3.13.0-62.102
ProcVersionSignature: Ubuntu 3.13.0-62.102-generic 3.13.11-ckt24
Uname: Linux 3.13.0-62-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.12
Architecture: amd64
AudioDevicesInUse:
- USER PID ACCESS COMMAND
- /dev/snd/controlC0: user 2171 F.... pulseaudio
+  USER PID ACCESS COMMAND
+  /dev/snd/controlC0: user 2171 F.... pulseaudio
CurrentDesktop: Unity
Date: Thu Sep 3 13:42:28 2015
HibernationDevice: RESUME=UUID=991e1383-ff5b-46c1-84c4-c904e1d81256
InstallationDate: Installed on 2013-12-29 (612 days ago)
InstallationMedia: Ubuntu 13.10 "Saucy Salamander" - Release amd64 (20131016.1)
MachineType: LENOVO 4063B22
PccardctlIdent:
- Socket 0:
- no product info available
+  Socket 0:
+    no product info available
PccardctlStatus:
- Socket 0:
- no card
+  Socket 0:
+    no card
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-62-generic root=UUID=bd426989-b545-41b3-97b8-de9410f27aa6 ro persistent quiet splash vt.handoff=7
RelatedPackageVersions:
- linux-restricted-modules-3.13.0-62-generic N/A
- linux-backports-modules-3.13.0-62-generic N/A
- linux-firmware 1.127.15
+  linux-restricted-modules-3.13.0-62-generic N/A
+  linux-backports-modules-3.13.0-62-generic N/A
+  linux-firmware 1.127.15
SourcePackage: linux
UpgradeStatus: Upgraded to trusty on 2014-04-27 (494 days ago)
dmi.bios.date: 12/14/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 6FET92WW (3.22 )
dmi.board.name: 4063B22
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6FET92WW(3.22):bd12/14/2011:svnLENOVO:pn4063B22:pvrThinkPadW500:rvnLENOVO:rn4063B22:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 4063B22
dmi.product.version: ThinkPad W500
dmi.sys.vendor: LENOVO

Joseph Salisbury
2015-09-03 14:03:22 UTC
Have you tried cleaning the fans and vents to ensure they are free of
dust?

** Changed in: linux (Ubuntu)
Importance: Undecided => High

Joseph Salisbury
2015-09-03 14:03:29 UTC
Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v4.2 kernel[0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".


Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2-unstable/


** Changed in: linux (Ubuntu)
Status: Confirmed => Incomplete

Harri K. Hiltunen
2015-09-03 15:50:22 UTC
The hardware was disassembled and everything was fine. Fan kept speed well after blowing into it.
I drilled extra holes to the case bottom to aid air flow.

Harri K. Hiltunen
2015-09-03 15:54:18 UTC
Installed kernel Linux 4.2.0-040200-generic .
Ensured it was running.
Problem remains the same: overheating with fan speed staying low (up to 3600 out of 5000rpm) after several minutes at 90-96C.


** Changed in: linux (Ubuntu)
Status: Incomplete => Confirmed

** Tags added: kernel-bug-exists-upstream

Colin Ian King
2015-09-03 16:07:09 UTC
I'd like to get a profile of your machine's CPU clock speed, CPU
utilization and thermal zone temperatures. Can you install the latest
powerstat and run a profile for me? To do so, use:

sudo add-apt-repository ppa:colin-king/white
sudo apt-get update
sudo apt-get install powerstat

then run:

powerstat -Da 1 60

and attach the output to the bug report. Thanks!

Colin Ian King
2015-09-03 16:25:18 UTC
This model of ThinkPad has 8 levels of fan control:
0 = off,
1-2 = 1900 RPM,
3-5 = ~3000 RPM,
6-7 = ~3500 RPM,
and a disengaged mode that runs at ~5100 RPM.

The 6-7 level @ 3500 RPM should be enough to dissipate heat generated by
a loaded CPU. The fact that your fan is running at 3500 RPM is
indicating that the fan control is correctly enabling the highest fan
control level (6-7) and even that is not enough to dump all the heat out
of the laptop. Also, thermald should be actively throttling back the
CPU as a passive mode control, so this should also help reduce the
overheating. The fact that this seems to be occurring across a wide
range of kernel versions suggests to me that perhaps this is
hardware related. For example, is the thermal paste between the CPU and
the thermal pipe/fan unit working correctly?
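
As a quick reference, the level-to-speed mapping listed above can be written as a small lookup. This is only a sketch: the RPM figures are the approximate ones quoted in this comment, and the function name fan_level_rpm is made up for illustration.

```shell
#!/bin/sh
# Map a thinkpad_acpi fan level to the approximate speed quoted above (RPM).
fan_level_rpm() {
    case "$1" in
        0)          echo 0 ;;
        1|2)        echo 1900 ;;
        3|4|5)      echo 3000 ;;
        6|7)        echo 3500 ;;
        disengaged) echo 5100 ;;
        *)          echo "unknown level: $1" >&2; return 1 ;;
    esac
}

fan_level_rpm 7
```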

Harri K. Hiltunen
2015-09-03 16:33:33 UTC
running
"powerstat -Da 1 60"
returns
"Device does not have any RAPL domains, cannot power measure power usage."

Harri K. Hiltunen
2015-09-03 16:44:13 UTC
I might renew the cooler paste some day.

Why isn't the fan being run in disengaged mode at 5100 RPM by the kernel
in a thermal emergency? Surely that would be easier to do than to have
every Lenovo W500's cooler pastes renewed?

Elias Aarnio
2015-09-03 17:04:36 UTC
I also get "Device does not have any RAPL domains, cannot power measure
power usage."

Thinkpad X201, Ubuntu 14.04 LTS.

Colin Ian King
2015-09-03 17:05:16 UTC
OK, no RAPL interface, so can you run "powerstat -za 1 480" instead.

The default mode for the fan is to be controlled by the firmware and not
the kernel, so the kernel has no direct control by default. The
alternative mechanism is to enable the thinkpad_acpi fan control and
twiddle the settings either manually or by software control. By
default though, one would expect the firmware to be able to control the
fan correctly since that is what the system designers intended to be the
default fan control mode.

If you do intend to try twiddling the fan controls manually, I believe
the following instructions may work (but I've not tried these myself, so
I can't vouch that they are correct):

as root, create a new file /etc/modprobe.d/thinkpad_acpi.conf and add
the following line to it:

options thinkpad_acpi fan_control=1

..and reboot.

as root try the following to set the fan to the highest "engaged" fan
speed:

echo level 7 > /proc/acpi/ibm/fan

..hopefully that will crank the fan up to ~3500 RPM in engaged mode.

you can enable disengaged mode using:

echo level disengaged > /proc/acpi/ibm/fan

Essentially, "engaged" mode is where the fan speed is locked to a
defined fan speed. "disengaged" mode can be used to drive the fan
faster (I believe there is no feedback loop between the speed and a
firmware control that sets the speed in disengaged mode). For more
details, see http://www.thinkwiki.org/wiki/How_to_control_fan_speed
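
To confirm that a level change actually took effect, the same file can be read back. The sketch below assumes /proc/acpi/ibm/fan reports "status:", "speed:" and "level:" lines (this layout is an assumption worth verifying on the machine with cat first); the helper name fan_field is made up for illustration.

```shell
#!/bin/sh
# Print one field ("status", "speed" or "level") from a thinkpad_acpi fan file.
# $1 = field name, $2 = file to read (defaults to /proc/acpi/ibm/fan).
fan_field() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/acpi/ibm/fan}"
}

# Only report when the thinkpad_acpi fan interface is actually present.
if [ -r /proc/acpi/ibm/fan ]; then
    echo "level=$(fan_field level) speed=$(fan_field speed) RPM"
fi
```

Reading the file does not require fan_control=1; only writing a level does.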

Harri K. Hiltunen
2015-09-03 17:06:36 UTC
If thermald is supposed to throttle CPU according to temperature, it
doesn't work.

Whatever is throttling the CPU seems to be unaware of temperature,
because the more I load the machine, the more reliably "cpufreq-info -w"
returns "2801000" while temperature is at 90-95 C.

Only when there is less load, does the CPU run at lower frequencies.
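
An observation like this can be captured by sampling temperature and frequency side by side. The following is a rough sketch, assuming the zone from the kernel log (thermal_zone1) and the standard cpufreq sysfs file for CPU 0; the function name sample is made up for illustration, and both paths may differ on other machines.

```shell
#!/bin/sh
# Print the zone temperature (C) and the current CPU0 frequency (MHz).
# $1 = thermal zone temp file (millidegrees C), $2 = cpufreq file (kHz).
sample() {
    temp_file="${1:-/sys/class/thermal/thermal_zone1/temp}"
    freq_file="${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}"
    [ -r "$temp_file" ] && [ -r "$freq_file" ] || return 1
    millideg=$(cat "$temp_file")
    khz=$(cat "$freq_file")
    echo "$((millideg / 1000)) C, $((khz / 1000)) MHz"
}

# Example: take one sample per second until interrupted with Ctrl-C.
# while sample; do sleep 1; done
```

If throttling were working, the MHz column would be expected to drop as the temperature approaches the passive trip point.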

Colin Ian King
2015-09-03 17:19:06 UTC
We can examine the thermald issues later; let's get some idea of what
powerstat says and then I can ponder the next best step to take.

** Changed in: linux (Ubuntu)
Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: linux (Ubuntu)
Status: Confirmed => In Progress

Harri K. Hiltunen
2015-09-03 17:42:17 UTC
Attachment from "powerstat -za 1 480".

** Attachment added: "Intermittent full load over 30% base load"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1491797/+attachment/4456951/+files/powerstat-lenovo-W500-2015.09.03.txt

Colin Ian King
2015-09-03 19:00:34 UTC
Attached is a spreadsheet with the above data and some graphs.

The first graph compares normalized data from the ACPI thermal zones
(acpiz), CPU freq and CPU utilization. This shows a good correlation
between CPU utilization and temperature (as we would expect). CPU frequencies
are scaling according to the load and look OK to me.

The second graph is a comparison of ACPI thermal zones and CPU
utilization in absolute terms (e.g. degrees C and CPU %). Note that
the CPU is loaded never less than 20% and that at this level the
temperature seems to be levelling out to around 70 degrees C which seems
rather high. The CPU is bouncing around 800 MHz - 1.8 GHz at that point,
which I guess is to be expected if one has so many context switches and
IRQs occurring.

So. Some interesting data:

1. The IRQ rate does seem high.
2. The Context switch rate seems high too.
3. The machine is not that idle.
4. Even at a low-ish load, the system is rather hot. (This makes me wonder if there is something physically wrong between the CPU and the heatpipe/fan unit.)
5. Dropping from fully loaded to partially loaded we see heat being dissipated, but not that well: it drops from nearly 100 C down to 65-70 C, which is surprising, as I expected it to go lower.

Hence I think it may be a physical hardware issue with the cooling.


** Attachment added: "LibreOffice spreadsheet"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1491797/+attachment/4456965/+files/powerstat-lenovo-W500-2015.09.03.ods

Harri K. Hiltunen
2015-09-04 03:11:39 UTC
I can't see how that information could be helpful in fixing the problem. It doesn't matter whether there's a common hardware ailment involved in this particular overheating or not. The software is acting like an idiot and wasting all its chances to remedy the situation:
-fan not running at full speed in disengaged mode in a thermal emergency
-CPU not throttling in a thermal emergency (unless the frequency readings are wrong)
-shutting down when supposed to suspend as a reaction to overheat, unnecessarily destroying session
-destroying session in a shut down/restart cycle (I heard rumours this may be fixed later in Snappy with containers)

Colin Ian King
2015-09-04 05:30:29 UTC
Let's take this one point at a time:
* fan not running at full speed in disengaged mode in a thermal emergency
- as mentioned earlier, the default fan mode on the machine is to run under firmware control, in which case it runs in engaged mode with a feedback loop controller so it never exceeds a top speed of 3500 RPM. This matches the original thermal design by the manufacturer. So either they made a mistake and all machines like yours overheat (and we would see lots of owners with your machine reporting this bug) or this issue is particular to your machine.

* CPU not throttling in a thermal emergency (unless the frequency readings are wrong)
- that needs investigation as thermald should be doing that (but as I mentioned earlier, I will examine the thermald issues later)

* shutting down when supposed to suspend as a reaction to overheat, unnecessarily destroying session
- when a critical thermal event occurs one has a very short time window to react. Potentially the silicon may be permanently damaged, so the kernel chooses to power down rather than try to suspend (since this can get stuck and exacerbate the issue). Without the handling of this thermal event, the next step is for the hardware to physically shut itself down which is out of any form of operating system control, so either way, the machine is desperately trying to save itself from breaking.

* destroying session in a shut down/restart cycle (I heard rumours this may be fixed later in Snappy with containers)
- again, in a rush to save your silicon from becoming irreparably damaged shutdown is the fastest mechanism. Snappy containers will not help.

I'd recommend reading https://en.wikipedia.org/wiki/Thermal_design_power, there is a paragraph that states:
"Most modern processors will cause a therm-trip only upon a catastrophic cooling failure, such as a no longer operational fan or an incorrectly mounted heatsink."

So, the next step will be to see if we can see what thermald is doing.

1. Stop thermald so we can re-enable it with full debug on:

sudo systemctl stop thermald (if you are using systemd)

or

sudo service thermald stop (if you are using upstart)

2. Run thermald for a while from the command line and capture debug
output:

sudo thermald --no-daemon --dbus-enable --loglevel=debug | tee thermald.log

..run this say for 5-10 minutes and use your machine, then attach the
thermald.log to the bug report

3. Re-start thermald

sudo systemctl start thermald (if you are using systemd)

or

sudo service thermald start (if you are using upstart)

Harri K. Hiltunen
2015-09-04 06:37:50 UTC
Attachment from "sudo thermald --no-daemon --dbus-enable
--loglevel=debug | tee thermald.log".

** Attachment added: "A few minutes full load over 30% base load"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1491797/+attachment/4457110/+files/thermald-lenovo-W500-2015.09.04.log

Harri K. Hiltunen
2015-09-04 07:06:45 UTC
Responses to the points above I disagree with:
* fan not running at full speed in disengaged mode in a thermal emergency
"... so it never exceeds a top speed of 3500 RPM. This matches the original thermal design by the manufacturer."
-When reality acutely requires full speed, it is madness to refer to original designs.

"... we would see lots of owners with your machine reporting this bug..."
-Where are the 100 automated error reports of my previous overheat crashes? I guess nowhere, because Apport doesn't catch them - it is not considered an error to shut down in a self-caused thermal emergency.
I procrastinated filing this bug report for 3 years, because "surely someone will notice all these crashes any day now". Then it took many days of persistent work to find out how to file a kernel bug report using Apport, because there is no such option in the menu; you have to ask Ubuntu support to find out the trick. It is too difficult, so don't expect people to do it. How many users even know how to launch Apport? There is no "Report a bug" in Gnome menu.

* shutting down when supposed to suspend as a reaction to overheat, unnecessarily destroying session
"when a critical thermal event occurs one has a very short time window to react. ... the machine is desperately trying to save itself from breaking."
-Then why not suspend in 1 second, but instead shut down in 8 seconds?

Colin Ian King
2015-09-04 08:53:18 UTC
From my understanding, I can see that thermald is detecting the passive
trip point temperature being reached at 75 degrees C and then it
attempts to move to cpufreq index position 3; so it seems to be trying
to do some passive CPU frequency switching.

Sensor hwmon :temp 76000
update_set_point 76000,0,98000
pref 0 type 1 temp 76000 trip 99000
Passive Trip point applicable
Trip point applicable < 0:99000
cdev size for this trippoint 2
cdev at index 0:Processor
Need to switch to next cdev
cdev at index 3:cpufreq
Need to switch to next cdev
pref 0 type 2 temp 76000 trip 95000
Passive Trip point applicable
Trip point applicable < 1:95000
cdev size for this trippoint 2
cdev at index 0:Processor
Need to switch to next cdev
cdev at index 3:cpufreq
Need to switch to next cdev

What is really interesting is that your firmware is configured to do
passive (e.g. non-fan) cooling strategies at 75 degrees C. In the
earlier analysis we saw that even at 20% CPU load your machine is close
to this temperature. So, we can conclude that a slightly busy CPU is
already in the thermal danger zone, and the OS is being told to start
attempting passive cooling strategies.

Key facts we can draw from this:

At 20% utilization your machine is already moving into the zone where
the designers of the machine believe that fan control is not sufficient
and hence is triggering passive cooling strategies.

That really tells me that your machine is having some cooling issues
between the CPU and the fan. This implies that one should check that
there is sufficient thermal paste between the CPU and the heat pipe.
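
For anyone wanting to repeat this kind of check on their own log, here is a rough sketch that pulls the highest sensor reading out of a thermald debug log. It assumes the log contains lines of the form "Sensor hwmon :temp 76000" as in the excerpt above (values in millidegrees C); the function name max_sensor_temp is made up for illustration.

```shell
#!/bin/sh
# Print the highest sensor temperature (whole degrees C) seen in a thermald debug log.
# $1 = log file; matching lines end with the temperature in millidegrees C.
max_sensor_temp() {
    awk '/Sensor .*:temp [0-9]+/ {
             t = $NF + 0
             if (t > max) max = t
         }
         END { if (max > 0) printf "%d\n", max / 1000 }' "$1"
}

# Only run against the captured log if it exists in the current directory.
if [ -r thermald.log ]; then
    max_sensor_temp thermald.log
fi
```

Comparing that figure with the trip values in the "pref ... trip" lines gives a quick sense of how close the machine gets to each trip point.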

Harri K. Hiltunen
2015-09-04 13:31:16 UTC
The thermald throttling malfunctioning is interesting.

Yes, I believe my laptop is among the several % of portables having
cooling issues due to old age and/or on-the-edge design. Like I said,
it's a common ailment. Just google "ubuntu laptop overheating" to find
out how common. What are we supposed to do to fix that? Approach thermal
grease manufacturers about the short life of their products? Solving my
problem by overhauling my computer is not fixing the problem; it's a
cover-up.

Now how about fixing all the ways the kernel is behaving badly in all
those overheating machines?

Things for software to do in an overheating case (from the original bug report):
-use the fan up to its maximum speed (not up to what a designer years ago assumed would probably be enough)
-throttle the CPU (thermald troubleshooting in progress)
-in a heat emergency, suspend in 1 second instead of shutting down wasting 8 seconds and my session
Colin Ian King
2015-09-04 13:51:18 UTC
Permalink
* use the fan up to its maximum speed (not up to what a designer years ago assumed would probably be enough)
- see comment #13 for some guidelines. As mentioned before, fan control is under firmware control by default, so one has to enable the thinkpad fan control manually to adjust fan settings outside of firmware control. Have you tried this yet?

* in a heat emergency, suspend in 1 second instead of shutting down wasting 8 seconds and my session
No. The system has had a thermal overrun event that explicitly states to the machine "shut down" because the silicon is at risk from thermal meltdown. This is the policy and will always be the policy.
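
Comment #13 is not reproduced in this part of the thread, so the following is only a sketch of the commonly documented thinkpad_acpi fan interface, not necessarily what that comment describes. It assumes the driver exposes `/proc/acpi/ibm/fan`, that the module was loaded with the `fan_control=1` option so writes are accepted, and that the status file consists of `key: value` lines. The helper names are hypothetical.

```python
FAN_PROC = "/proc/acpi/ibm/fan"  # exposed by the thinkpad_acpi driver

# Levels accepted by the driver: 0-7, plus the three named modes.
VALID_LEVELS = {str(n) for n in range(8)} | {"auto", "disengaged", "full-speed"}


def parse_fan_status(text):
    """Parse 'key: value' lines, skipping the 'commands:' help lines."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() != "commands":
            status[key.strip()] = value.strip()
    return status


def level_command(level):
    """Build the string that selects a fan level, validating it first."""
    level = str(level)
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported fan level: {level!r}")
    return f"level {level}"


def set_fan_level(level, path=FAN_PROC):
    """Write the level command. Needs root and fan_control=1 on the module."""
    with open(path, "w") as fan:
        fan.write(level_command(level))
```

For example, `set_fan_level("disengaged")` hands the fan over to manual control at its unregulated maximum, and `set_fan_level("auto")` returns control to the firmware.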
Colin Ian King
2015-09-04 13:55:42 UTC
Permalink
Can you accept the fact that even on a low CPU load, your machine is
overheating? This looks like a H/W issue. Software can patch up the
issue, but I honestly believe that the problem needs to be fixed by
examining the physical aspects of malfunctioning heat extraction from
your laptop. Working around this problem is akin to saying "my house is
on fire, can you fix it by opening a few windows in the house?".
Harri K. Hiltunen
2015-09-04 14:25:39 UTC
Permalink
I accepted it being a common hardware issue many messages up.

Which is easier: fixing the kernel to be non-destructive for these
machines, or physically overhauling all overheating laptops running
Linux?

The "opening windows in a house fire" parable is inaccurate. What I'm
suggesting for the house fire is "automatically open all water taps for
cooling and to alert people, maximize ventilation to remove smoke, and
text all phones in the approximate GPS location of the house with the
warning: detected house fire growing at #address - vacate the building
immediately bringing your relatives and valuables outside with you".
Colin Ian King
2015-09-04 15:06:47 UTC
Permalink
It is easier if you first check that your specific machine is not the
root issue, then we can consider the wider issue.
Harri K. Hiltunen
2015-09-04 18:29:07 UTC
Permalink
It doesn't matter whether this machine has the common old-age ailment of
dried thermal grease or not. It's part of normal ageing, so Linux needs
to be able to handle it in a non-destructive way.

Now Linux is being bullheadedly intolerant of common ailments. It's like
a doctor letting an old patient suffer from an indefinitely drug-
relievable illness and saying "no, you must buy an expensive life-
threatening surgery to correct the underlying root cause". (This parable
is apt, because in my computer one insert brass nut has popped off the
case and is now spinning in its cavity with the screw, so I would have
to do something drastic to even reach the thermal grease.)

What kind of an attitude is that to the continual improvement of Linux?
https://en.wikipedia.org/wiki/Continual_improvement_process
Will it ever be smart if you avoid making it smarter? Or am I mistaken about the goal of Linux development?

Even my car from 1995 (Nissan) is smarter than this: in the case of
overheating, it can protect itself gracefully in many ways; it doesn't
kill the engine demanding immediate cooling system repair - it starts
doing every possible cooling action and avoiding various heating actions
in the order of least annoyance to the user. Then it informs the user
about the heat problem, so that they can help. Then it files an error
report about having overheated. Quite a lot better than Linux in 2015.

The concept of "failing safely" should be applied here.
https://en.wikipedia.org/wiki/Fail-safe
Users of these ailing (or just dusty) computers are now suffering from abrupt forced shutdowns when they could easily be given lesser evils (noisier fan, slower processing, suspending for a bit every once in a while).

All the opportunities to remedy the situation should be taken. No
single measure always works, so relying on only a few actions makes it
more likely that they all fail on someone's computer.
Colin Ian King
2015-09-04 18:57:00 UTC
Permalink
I therefore suggest speaking to Lenovo about this.

** Changed in: linux (Ubuntu)
Status: In Progress => Incomplete

** Changed in: linux (Ubuntu)
Assignee: Colin Ian King (colin-king) => (unassigned)
Colin Ian King
2015-09-04 19:00:08 UTC
Permalink
Or try fan control as I suggested earlier. See comment #13 for some
guidelines.
Harri K. Hiltunen
2015-09-04 20:55:36 UTC
Permalink
Speaking to Lenovo won't automatically fix the issue, because it is way
harder to get people to install BIOS updates than to accept kernel
updates.

Fan control in disengaged mode works: the fan runs at 4700 RPM, which
keeps the temperature down at 84 C after several minutes at full load.
Unfortunately, in disengaged mode the fan always runs at 4700 RPM, no
matter what the temperature or load.

So, make the system switch between fan control modes according to
temperature and its growth rate to narrowly avoid hitting 100 C (it
needs to be done on many affected Lenovos).
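
The switching idea above can be sketched as a small decision function. This is an illustration only, not an existing kernel or thermald policy. The thresholds (`engage_c`, `release_c`) and the look-ahead window (`horizon_s`) are hypothetical values picked for the example, and the returned strings are the thinkpad_acpi level names `auto` and `disengaged`.

```python
def choose_fan_level(temp_c, rate_c_per_s, current_level,
                     engage_c=85.0, release_c=70.0, horizon_s=30.0):
    """Pick 'disengaged' or 'auto' from temperature and its growth rate.

    The temperature is projected horizon_s seconds ahead using the current
    rate of rise, so a fast climb switches the fan to full speed early.
    Once disengaged, the fan stays there until the temperature drops below
    release_c (hysteresis), which avoids rapid toggling between modes.
    """
    projected = temp_c + max(rate_c_per_s, 0.0) * horizon_s
    if projected >= engage_c:
        return "disengaged"
    if current_level == "disengaged" and temp_c > release_c:
        return "disengaged"  # still cooling down, keep full speed
    return "auto"
```

A daemon could call this every few seconds with the latest reading and write the result to the fan interface only when the level changes.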
Harri K. Hiltunen
2015-09-06 13:50:23 UTC
Permalink
** Description changed:

Error:
- Kernel foolishly shuts down the computer when it overheats.
+ Kernel foolishly shuts down the computer when it overheats because of the underlying bug "Overheat due to slow fans when on 'auto'"
+ ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/751689 ).
/var/log/kern.log
W500 kernel: [1448.648529] thermal thermal_zone1: critical temperature reached (100 C), shutting down
+ (Hardware is ok: when the fan is forced to full speed, no overheating.)

Consequence:
Shutting down destroys session in Ubuntu, Gnome, and all applications that can't remember their latest conscious state (most applications).

Attempted repair, failed:
Laptop has suspending ability, but I can't find the setting for the kernel to make the computer suspend instead of shutting down.

Repair suggestions:
1. Persistence of session, so that everything would reappear after the restart. (this would also make updating less disruptive)
- 2. Do not heat the machine like crazy; speed up fans or slow down processes. (problematic Lenovo Thinkpad W500 fan on low speed right up to the fiery end)
+ 2. Do not heat the machine like crazy; speed up fans or slow down processes. (fix the slow fan bug affecting Lenovo Thinkpads and fix thermald failing to throttle CPU)
3. Put the computer to suspend when it's too hot.

- (The problem has remained the same from at least Ubuntu 11.10 through
- 14.04)
+ The problem has remained the same from at least Ubuntu 11.10 through
+ 14.04 .
+

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-62-generic 3.13.0-62.102
ProcVersionSignature: Ubuntu 3.13.0-62.102-generic 3.13.11-ckt24
Uname: Linux 3.13.0-62-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.12
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: user 2171 F.... pulseaudio
CurrentDesktop: Unity
Date: Thu Sep 3 13:42:28 2015
HibernationDevice: RESUME=UUID=991e1383-ff5b-46c1-84c4-c904e1d81256
InstallationDate: Installed on 2013-12-29 (612 days ago)
InstallationMedia: Ubuntu 13.10 "Saucy Salamander" - Release amd64 (20131016.1)
MachineType: LENOVO 4063B22
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-62-generic root=UUID=bd426989-b545-41b3-97b8-de9410f27aa6 ro persistent quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-62-generic N/A
 linux-backports-modules-3.13.0-62-generic N/A
 linux-firmware 1.127.15
SourcePackage: linux
UpgradeStatus: Upgraded to trusty on 2014-04-27 (494 days ago)
dmi.bios.date: 12/14/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 6FET92WW (3.22 )
dmi.board.name: 4063B22
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6FET92WW(3.22):bd12/14/2011:svnLENOVO:pn4063B22:pvrThinkPadW500:rvnLENOVO:rn4063B22:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 4063B22
dmi.product.version: ThinkPad W500
dmi.sys.vendor: LENOVO
Colin Ian King
2015-09-06 20:25:39 UTC
Permalink
So it clearly appears that the fan in disengaged mode can't keep the CPU
under the first thermal trip level of 75 degrees C.

1. In this scenario, the first thermal trip level is basically saying
"start using passive cooling strategies to keep CPU cool". This implies
throttling back the CPU, for example CPU freq scaling or P-state
limiting.

2. This *clearly* indicates that something is broken at the hardware
level as I have pointed out numerous times.
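
To illustrate the passive cooling mentioned in point 1, here is a rough sketch of frequency capping through the cpufreq sysfs files (`scaling_max_freq`, in kHz). The linear ramp between the passive trip (75 C) and the critical trip (100 C) is an assumption made for this example; the kernel's thermal governors and thermald use their own algorithms. The function names are hypothetical.

```python
from pathlib import Path


def throttled_max_khz(min_khz, max_khz, temp_c, passive_c=75.0, critical_c=100.0):
    """Scale the frequency cap linearly from max_khz at the passive trip
    down to min_khz at the critical trip."""
    if temp_c <= passive_c:
        return max_khz
    if temp_c >= critical_c:
        return min_khz
    fraction = (temp_c - passive_c) / (critical_c - passive_c)
    return int(max_khz - fraction * (max_khz - min_khz))


def apply_cap(khz, cpu_root="/sys/devices/system/cpu"):
    """Write the cap to every CPU's scaling_max_freq. Needs root."""
    for cap_file in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_max_freq"):
        cap_file.write_text(str(khz))
```

With an 800 MHz to 2.8 GHz range (example figures, not taken from this machine), a reading of 87.5 C gives a cap of 1.8 GHz.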

Summary:

Even running in disengaged mode, the fan cannot get the machine below the passive trip level.
Hardware is clearly broken.
Not a software fix issue.
Won't Fix.


** Changed in: linux (Ubuntu)
Status: Incomplete => Won't Fix
Colin Ian King
2015-09-07 12:16:00 UTC
Permalink
** Changed in: linux (Ubuntu)
Importance: High => Wishlist