NVIDIA suspend-then-hibernate

These notes are for an Optimus laptop with an Intel Iris card and a GeForce RTX 3050. YMMV.

I found a systemd issue for this after typing it up, lol:
https://github.com/systemd/systemd/issues/27559

After installing X Windows and configuring it to use the proprietary NVIDIA drivers, the system was hanging on suspend and hibernate. It turns out that NVIDIA requires a couple of services to be enabled that prepare the graphics card for sleep: nvidia-suspend and nvidia-hibernate. There is also an nvidia-resume service, but this has apparently been superseded by a system-sleep script (/usr/lib/systemd/system-sleep/nvidia), which is run during every sleep state change. Alert readers may note that there’s no suspend-then-hibernate service. As one might expect, the suspend-then-hibernate routine still hangs.

Here is the basic flow:
– the nvidia power management services use the Before= setting to ensure they run the /usr/bin/nvidia-sleep.sh helper script to prepare the GPU before the main systemd pm service is run
– the systemd power management services call the systemd tool /usr/lib/systemd/systemd-sleep
– systemd-sleep calls all scripts in the system-sleep directory /usr/lib/systemd/system-sleep for each state change
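
To see what systemd-sleep actually hands to each hook, a throwaway script dropped into the system-sleep directory can log its arguments. This is only a minimal sketch: the file name and the describe_sleep_hook function are made up for illustration. The only parts taken from systemd are the two positional arguments (pre or post, then the sleep command).

```shell
#!/bin/sh
# Hypothetical debug hook, e.g. /usr/lib/systemd/system-sleep/00-log-args
# (the file name is an assumption, any executable in that directory is run).
# systemd-sleep calls each hook with two arguments:
#   $1 = "pre" (before sleeping) or "post" (after waking)
#   $2 = "suspend", "hibernate", "hybrid-sleep" or "suspend-then-hibernate"

describe_sleep_hook() {
    phase=$1
    sleep_cmd=$2
    case "$phase" in
        pre)  echo "about to sleep: $sleep_cmd" ;;
        post) echo "woke up from: $sleep_cmd" ;;
        *)    echo "unexpected phase: $phase" ;;
    esac
}

# Only log when called with both arguments, as systemd-sleep does.
if [ "$#" -ge 2 ]; then
    describe_sleep_hook "$1" "$2" | logger -t sleep-hook-debug
fi
```

The logged lines should then show up in the journal under the sleep-hook-debug tag after a sleep cycle.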

On sleep, the nvidia-sleep script:
– saves the current virtual terminal (VT) number
– switches to an unused VT (63)
– echoes either suspend or hibernate to /proc/driver/nvidia/suspend

The nvidia-sleep script is also called on resume, and in that case it:
– echoes resume to /proc/driver/nvidia/suspend
– switches back to the saved VT
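
Put together, the two lists above amount to roughly the following. This is a paraphrase of the steps as described here, not the script NVIDIA ships: the function name, RUN_DIR, VT_FILE and the DRY_RUN switch are all assumptions, added so the logic can be read and poked at without touching the real GPU.

```shell
#!/bin/bash
# Paraphrase of the nvidia-sleep flow described above; NOT the shipped script.
# RUN_DIR, VT_FILE and DRY_RUN are illustrative assumptions.
RUN_DIR="${RUN_DIR:-/run/nvidia-sleep-sketch}"
VT_FILE="$RUN_DIR/vt_number"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command, 0 = really run it

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$1"
    else
        eval "$1"
    fi
}

nvidia_sleep_sketch() {
    case "$1" in
        suspend|hibernate)
            # remember which virtual terminal is active
            run "mkdir -p '$RUN_DIR'" || return 1
            run "fgconsole > '$VT_FILE'" || return 1
            # switch to an unused VT (63)
            run "chvt 63" || return 1
            # tell the driver which kind of sleep is coming
            run "echo $1 > /proc/driver/nvidia/suspend"
            ;;
        resume)
            # wake the driver, then go back to the saved VT
            run "echo resume > /proc/driver/nvidia/suspend" || return 1
            run "chvt \"\$(cat '$VT_FILE')\""
            ;;
        *)
            echo "usage: nvidia_sleep_sketch suspend|hibernate|resume" >&2
            return 1
            ;;
    esac
}
```

With DRY_RUN left at 1, calling nvidia_sleep_sketch suspend just prints the commands in order, which is a handy way to compare against whatever the installed /usr/bin/nvidia-sleep.sh really does.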

The strategy I used to get it to work is based on https://forums.developer.nvidia.com/t/systemds-suspend-then-hibernate-not-working-in-nvidia-optimus-laptop/213690

This modifies the nvidia system-sleep script, which currently only handles the resume case, so that it also handles the suspend and hibernate cases. That duplicates the functionality of the nvidia-* services, but the script is also run for the suspend-then-hibernate case, so the nvidia-* services can all be disabled.

The problem with this script out of the box, at least with systemd 254.6, is that systemd-sleep prevents the virtual terminal from being switched in the system-sleep context for suspend-then-hibernate (s2h). The most obvious difference between the s2h case and the suspend-only case is a call to freeze_thaw_user_slice before suspend is executed.

https://github.com/systemd/systemd/blob/main/src/sleep/sleep.c

In order to work around this, I had to make the following changes:
– pass along the command that systemd-sleep gives to the system-sleep scripts as arg $2
– modify the nvidia-sleep script to ignore resumes for the s2h command
– modify the systemd-suspend-then-hibernate.service file to run the nvidia-sleep script before and after systemd-sleep to control the suspend and resume outside the systemd-sleep lock
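
The decision logic behind those changes can be sketched as below. Note this is a variation, not my exact edits: it puts the "ignore s2h" check in the system-sleep hook rather than in nvidia-sleep itself, and it assumes the hook should do nothing at all (pre or post) during s2h because the unit file handles the GPU there. The nvidia_hook_action name and the "skip" marker are made up for illustration.

```shell
#!/bin/sh
# Sketch of a modified /usr/lib/systemd/system-sleep/nvidia hook.
# Assumption: during suspend-then-hibernate the hook stays out of the way,
# because nvidia-sleep.sh is called from the unit file, outside systemd-sleep.

nvidia_hook_action() {
    phase=$1      # "pre" or "post"
    sleep_cmd=$2  # "suspend", "hibernate", "suspend-then-hibernate", ...
    case "$sleep_cmd" in
        suspend|hibernate)
            case "$phase" in
                pre)  echo "$sleep_cmd" ;;   # prepare the GPU for this sleep type
                post) echo "resume" ;;       # wake the GPU back up
                *)    echo "skip" ;;
            esac
            ;;
        *)
            # suspend-then-hibernate and anything unrecognised: do nothing here
            echo "skip"
            ;;
    esac
}

# Only act when called with both arguments, as systemd-sleep does.
if [ "$#" -ge 2 ]; then
    action=$(nvidia_hook_action "$1" "$2")
    if [ "$action" != "skip" ]; then
        /usr/bin/nvidia-sleep.sh "$action"
    fi
fi
```

Keeping the decision in one small function makes it easy to check the pre/post and suspend/hibernate/s2h combinations by hand before trusting it with a real sleep cycle.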

There are some issues / areas of improvement with this, namely that modifying systemd-suspend-then-hibernate.service directly is prone to breakage, as is modifying the nvidia scripts. Additionally, the GPU is not notified of the hibernate transition, only the initial suspend.

A couple of systemd units could probably be created to run before and after the systemd-suspend-then-hibernate service, so that its unit file would not need to be modified. /usr/lib/systemd/system-sleep/nvidia would still need to be modified to avoid running the nvidia-sleep resume script inside the system-sleep context during a suspend-then-hibernate action.

Edit: Here’s a drop-in file that does that, without touching the unit file itself: /etc/systemd/system/systemd-suspend-then-hibernate.service.d/nvidia.conf

[Service]
ExecStartPre=/usr/bin/nvidia-sleep.sh suspend
ExecStartPost=/usr/bin/nvidia-sleep.sh resume
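
One way to put that drop-in in place from a shell is sketched below. The install_s2h_dropin function and its optional target-directory argument are illustrative (the argument is only there so it can be tried against a scratch directory first). The default path and the three lines written are the ones shown above.

```shell
#!/bin/sh
# Writes the drop-in shown above. The function name and the optional
# directory argument are illustrative; the default path is the real one.
install_s2h_dropin() {
    dropin_dir=${1:-/etc/systemd/system/systemd-suspend-then-hibernate.service.d}
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/nvidia.conf" <<'EOF'
[Service]
ExecStartPre=/usr/bin/nvidia-sleep.sh suspend
ExecStartPost=/usr/bin/nvidia-sleep.sh resume
EOF
}

# As root, install it and make systemd re-read its unit files:
#   install_s2h_dropin && systemctl daemon-reload
```

Afterwards, systemctl cat systemd-suspend-then-hibernate.service should list the drop-in underneath the stock unit.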
