Tag Archives: linux

NVIDIA suspend-then-hibernate

These notes are for an Optimus laptop with an Intel Iris iGPU and a GeForce RTX 3050. YMMV.

I found a systemd issue for this after typing it up, lol:
https://github.com/systemd/systemd/issues/27559

After installing X and configuring it to use the proprietary NVIDIA drivers, the system was hanging on suspend and hibernate. It turns out that NVIDIA requires a couple of services to be enabled that prepare the graphics card for sleep: nvidia-suspend and nvidia-hibernate. There is also an nvidia-resume service, but this has apparently been superseded by a system-sleep script (/usr/lib/systemd/system-sleep/nvidia), which is run on every sleep state change. Alert readers may note that there's no suspend-then-hibernate service. As one might expect, the suspend-then-hibernate routine still hangs.

Here is the basic flow:
– the nvidia power management services use the Before= setting to ensure they run the /usr/bin/nvidia-sleep.sh helper script to prepare the GPU before the main systemd pm service is run
– the systemd power management services call the systemd tool /usr/lib/systemd/systemd-sleep
– systemd-sleep calls all scripts in the system-sleep directory /usr/lib/systemd/system-sleep for each state change

On sleep, the nvidia-sleep script:
– saves the current virtual terminal number
– switches to an unused VT (63)
– echoes either suspend or hibernate to /proc/driver/nvidia/suspend

The nvidia-sleep script is also called on resume, and in that case it:
– echoes resume to the same proc file
– switches back to the saved VT
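Roughly, that flow looks like this (a simplified sketch based on the description above, not the shipped script; the VT save location is made up):

case "$1" in
    suspend|hibernate)
        # remember the active VT, switch to an unused one, then tell the driver
        fgconsole > /tmp/nvidia-sleep.vt
        chvt 63
        echo "$1" > /proc/driver/nvidia/suspend
        ;;
    resume)
        # tell the driver to resume, then switch back to the saved VT
        echo resume > /proc/driver/nvidia/suspend
        chvt "$(cat /tmp/nvidia-sleep.vt)"
        ;;
esac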

The strategy I used to get it to work is based on https://forums.developer.nvidia.com/t/systemds-suspend-then-hibernate-not-working-in-nvidia-optimus-laptop/213690

This modifies the nvidia system-sleep script, which currently only handles the resume case, to also handle the suspend and hibernate cases. That duplicates the functionality of the nvidia-* services, but it also runs for the suspend-then-hibernate case, so the nvidia-* services can all be disabled.

The problem with this script out of the box, at least with systemd 254.6, is that systemd-sleep prevents the virtual terminal from being switched in the system-sleep context for suspend-then-hibernate. The most obvious difference between the s2h case and the plain suspend case is a call to freeze_thaw_user_slice before the suspend is executed.

https://github.com/systemd/systemd/blob/main/src/sleep/sleep.c

In order to work around this, I had to make the following changes:
– pass along the command given to systemd-sleep as arg $2 to the system-sleep scripts
– modify the nvidia-sleep script to ignore resumes for the s2h case
– modify the systemd-suspend-then-hibernate.service file to run the nvidia-sleep script before and after systemd-sleep, so the suspend and resume happen outside the systemd-sleep lock

There are some issues / areas for improvement with this, namely that modifying systemd-suspend-then-hibernate.service directly is prone to breakage, as is modifying the nvidia scripts. Additionally, the GPU is only notified of the initial suspend, not the hibernate transition.

A couple of systemd services could probably be created to run before and after the systemd-suspend-then-hibernate service, so that unit would not need to be modified directly. /usr/lib/systemd/system-sleep/nvidia would still need to be modified to avoid running the nvidia-sleep resume step inside the system-sleep context during a suspend-then-hibernate.

Edit: Here's a drop-in to do that: /etc/systemd/system/systemd-suspend-then-hibernate.service.d/nvidia.conf

[Service]
ExecStartPre=/usr/bin/nvidia-sleep.sh suspend
ExecStartPost=/usr/bin/nvidia-sleep.sh resume
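After creating the drop-in by hand, run systemctl daemon-reload so systemd picks it up. Alternatively, systemctl edit systemd-suspend-then-hibernate.service creates an override file in that same .d directory and does the reload for you.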

Varnish A/B Testing

Cloudflare doesn't cache any of our PHP responses, so we don't need to worry about that layer. Cloudflare connects to our nginx SSL reverse proxy.

nginx has a split_clients directive, so we could potentially set the header there:

Performing A/B Testing with NGINX and NGINX Plus
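For instance, something like this in the http block would put 10% of clients in a test group and forward the choice upstream (the header name and split key are just examples, chosen to line up with the $http_ab_group map further down):

split_clients "${remote_addr}" $ab_group {
    10%     "a";
    *       "";
}

# and in the server/location that proxies to Varnish:
proxy_set_header AB-Group $ab_group;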

The SSL reverse proxy connects to a varnish server on the same system.

Varnish can also set the a/b testing header:
https://info.varnish-software.com/blog/live-ab-testing-varnish-and-vcs

Wherever the header is set, Varnish will need to Vary on that header.

Varnish then connects to nginx, which then connects to the php-fpm server(s). At this point a different WordPress instance needs to be loaded, and there are a couple of different ways to do that.

The main idea I have now is to bind mount different plugin and theme directories on top of a base directory. The base directory contains the basic WordPress install, the images, etc. (basically everything that isn't themes/plugins). One easy way to do this is with containers, but you should be able to do it outside containers as well.

No Containers Method:
– bind mount the base directory to a branch directory (e.g. /vhosts/base -> /vhosts/test-a)
– bind mount the branch's plugins and themes directories on top (e.g. /usr/src/test-a/plugins -> /vhosts/test-a/wordpress/wp-content/plugins); see the mount sketch below the nginx snippet
– map the header to root directories in nginx (the map needs to be in the http block):
https://serversforhackers.com/c/nginx-mapping-headers

# use a map so there is always a default (don't trust the header)
map $http_ab_group $ati_ab_dir {
    default "all-that-is-interesting";
    a       "test-a";
}

server {
    server_name allthatsinteresting.com;
    server_name www.allthatsinteresting.com;
    root /vhosts/$ati_ab_dir;
}
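A sketch of the mounts for one branch (paths follow the examples above; fstab entries would be needed to make them persistent):

mkdir -p /vhosts/test-a
mount --bind /vhosts/base /vhosts/test-a
mount --bind /usr/src/test-a/plugins /vhosts/test-a/wordpress/wp-content/plugins
mount --bind /usr/src/test-a/themes /vhosts/test-a/wordpress/wp-content/themes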

Containers Method:
– switch nginx config to use containers (basically need to use a reverse proxy instead of fastcgi)
– which backend to proxy to would be mapped from the header
– containers would have the base dir, plugins dir, and themes dir mounted as volumes
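As a rough sketch, each branch would get its own PHP backend container with those mounts (the image name and container paths here are placeholders):

docker run -d --name php-test-a \
    -v /vhosts/base:/var/www/html \
    -v /usr/src/test-a/plugins:/var/www/html/wp-content/plugins \
    -v /usr/src/test-a/themes:/var/www/html/wp-content/themes \
    php:fpm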

The containers method would require more work, and the worker pool would be segmented across the php-fpm backends, which makes it harder to size the max workers per container. Say the server has memory for 128 workers: with 3 separate pools, do you assume all 3 will max out at once, or do you oversubscribe since the load will likely be uneven? It's simpler and safer to have a single PHP backend (not to mention it avoids the added complexity of Docker).

Rust OS

I’ve been working on the Rust OS outlined in these posts:
https://os.phil-opp.com/advanced-paging/

They use a recursive page table mapping to map arbitrary pages into virtual memory. I'd like to use an offset mapping like the Linux kernel uses. In Linux, the kernel lives in a high memory area (above PAGE_OFFSET) and the user environments are mapped into the low memory area. Thinking about page tables makes my head spin, though; it's a real "have to think fourth-dimensionally" kind of situation.

The issue is that on boot the pages are mapped in by the boot loader using its own scheme, and I’m not entirely sure what the mappings are. The kernel will have a bad time if it’s mapped away from where it is running, so if I do move to a different scheme, I’ll need to move the kernel with it. Alternatively, maybe I should modify the boot loader to map it in a way that is consistent with the final result.

Some questions I have:
1) what are the boot loader memory mappings?
2) if I move the kernel, could I just jump to the new location? are the addresses offsets or absolute?
3) could I copy the kernel pages to the equivalent offset pages? (are the equivalent pages guaranteed to be free?)

Some resources:
https://docs.rs/x86_64/0.3.5/x86_64/structures/paging/index.html
https://elinux.org/images/b/b0/Introduction_to_Memory_Management_in_Linux.pdf
https://www.kernel.org/doc/gorman/html/understand/understand006.html
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-828-operating-system-engineering-fall-2012/labs/

Automatic photo capturing

Steps:
1) detect camera is attached
2) capture images
3) copy image to movie dir and add name to image manifest
4) trigger upload
5) move image to backup folder
6) add cron job to make movie

Detecting camera is attached

USB hotplug events are handled by udev.

Here’s how you monitor udev events to write a rule:
udevadm monitor

Get USB bus info with:
lsusb

You can list udev attributes for a device with:
udevadm info -a -p $(udevadm info -q path -n /dev/bus/usb/001/003)

Example rule:
SUBSYSTEMS=="usb", ATTRS{product}=="Some FooDev", RUN+="/pathto/script"

https://unix.stackexchange.com/questions/28548/how-to-run-custom-scripts-upon-usb-device-plug-in
http://weininger.net/how-to-write-udev-rules-for-usb-devices.html
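After dropping the rule into a file under /etc/udev/rules.d/, reload the rules so it takes effect without a reboot:

udevadm control --reload-rules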

Capture images

gphoto2 can capture images from a USB-connected camera and trigger an event when an image is downloaded. The easiest mode seems to be --capture-image-and-download. There's an interval setting (-I) as well that allows continuous captures every x seconds. The --hook-script option lets you execute a script when the image is downloaded. The --filename option can set a filename pattern, but it doesn't allow %s.
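Putting those options together, an invocation might look something like this (interval, filename pattern, and hook path are placeholders):

gphoto2 --capture-image-and-download -I 60 \
    --hook-script /pathto/on-download.sh \
    --filename "capture-%n.%C"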

Resources:
http://www.gphoto.org/doc/manual/ref-gphoto2-cli.html
http://www.gphoto.org/doc/manual/using-gphoto2.html
http://www.gphoto.org/doc/manual/
http://photolifetoys.blogspot.com/2012/08/control-your-camera-with-gphoto2-via.html

Upload images to tumblr

This is fairly straightforward with the tumblr.js node library:

var tumblr = require('tumblr.js');
var client = tumblr.createClient({ /* api keys */ });

client.photo(blogname, { 'state': 'published', 'tags': tags, 'data': fullpath }, function(err) {
    if (err) {
        console.log(err);
        process.exit(1);
    }
});

The only tricky thing is that if you upload too quickly then tumblr will block you. Once every 10 minutes seems good. Maybe a cron job is the best way to do the rate limiting?
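If cron handles it, a crontab entry like this would keep uploads to one every 10 minutes (the script name is made up):

*/10 * * * * /pathto/upload-next-image.sh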

Creating movies

Before uploading to tumblr, copy the file to the movie directory, and if the time falls in the 14-hour window, add the filename to the image list for ffmpeg.

You can do something like this in ffmpeg to read in a list of image filenames and create a film:
cat $(cat manifest) | ffmpeg -framerate $in_fr -i - -r $out_fr -preset ultrafast $out_file

UAS error

Oct 1 18:32:37 wacko kernel: [257893.680902] sd 6:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
Oct 1 18:32:37 wacko kernel: [257893.680907] sd 6:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 1d 1c 58 80 00 00 08 00
Oct 1 18:32:37 wacko kernel: [257893.680957] scsi host6: uas_eh_bus_reset_handler start
Oct 1 18:32:37 wacko kernel: [257893.847751] usb 4-2: reset SuperSpeed USB device number 2 using xhci_hcd
Oct 1 18:32:42 wacko kernel: [257898.858136] usb 4-2: device descriptor read/8, error -110
Oct 1 18:32:42 wacko kernel: [257898.959042] usb 4-2: reset SuperSpeed USB device number 2 using xhci_hcd
Oct 1 18:32:47 wacko kernel: [257903.970129] usb 4-2: device descriptor read/8, error -110
Oct 1 18:32:47 wacko kernel: [257904.122216] scsi host6: uas_post_reset: alloc streams error -19 after reset
Oct 1 18:32:47 wacko kernel: [257904.122225] usb 4-2: USB disconnect, device number 2
Oct 1 18:32:47 wacko kernel: [257904.125987] sd 6:0:0:0: [sdc] Synchronizing SCSI cache

Ubuntu bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1591521

Similar but not exactly the same:
https://bbs.archlinux.org/viewtopic.php?id=215780

Possibly related to this commit:
https://github.com/torvalds/linux/commit/616f0e6cab4698309ff9e48ee2a85b5eb78cf31a

UAS wiki:
https://en.wikipedia.org/wiki/USB_Attached_SCSI

SCSI EH info:
https://www.kernel.org/doc/Documentation/scsi/scsi_eh.txt
http://events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf

Could try USB passthrough on vagrant:
https://github.com/vagrant-libvirt/vagrant-libvirt

dev.scsi.logging_level:
http://cateee.net/lkddb/web-lkddb/SCSI_LOGGING.html
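The knob can be poked at runtime; for example, this sets the error/EH logging bits (the lowest three bits of the bitmask described at that link) to their maximum, assuming the kernel was built with SCSI logging support:

sysctl -w dev.scsi.logging_level=7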

Using Docker to build Varnish VMODs

Docker is containers. Let’s use it to create a build environment we can delete after we’re done. Part one is figuring out what we need to build vmods.

I like using Fedora:
mkdir /tmp/vmods
docker run -v /tmp/vmods:/data -it fedora:24 bash
dnf install -y git
dnf install -y autoconf automake libtool m4 perl-Thread-Queue make
dnf install -y varnish-libs-devel.x86_64 varnish
# dnf install -y dnf-plugins-core /usr/bin/rpmbuild
cd /usr/src
# dnf download --source varnish # no longer have to build Varnish from source to compile VMODs
# rpm -ivh varnish*src.rpm
# dnf install -y groff jemalloc-devel libedit-devel ncurses-devel pcre-devel python-docutils

dnf install -y GeoIP-devel.x86_64
git clone https://github.com/varnish/libvmod-geoip.git
cd libvmod-geoip
bash autogen.sh
./configure VMOD_DIR=/data
make
make install

dnf install -y libmhash-devel
git clone https://github.com/varnish/libvmod-digest
cd libvmod-digest
bash autogen.sh
./configure VMOD_DIR=/data
make
make install

So we split that into a Dockerfile and a script to build the VMODs:

FROM fedora:24
RUN dnf update -y
RUN dnf install -y git autoconf automake libtool m4 perl-Thread-Queue make python-docutils
RUN dnf install -y varnish-libs-devel.x86_64 varnish

ADD build_vmods.sh /

CMD bash /build_vmods.sh


build_vmods.sh:

OUT_DIR=/data

usage() {
    echo "github URL must be set for each VMOD"
    exit 1
}

build() {
    [ -z "$URL" ] && usage
    [ ! -z "$DEPS" ] && dnf install -y $DEPS

    src_dir=$(mktemp -d)

    git clone $URL $src_dir
    cd $src_dir
    bash autogen.sh
    ./configure VMOD_DIR=$OUT_DIR
    make
    make install
}

DEPS="GeoIP-devel.x86_64"
URL="https://github.com/varnish/libvmod-geoip.git"
build

DEPS="libmhash-devel"
URL="https://github.com/varnish/libvmod-digest"
build
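Then building everything is just a matter of (the image tag is arbitrary):

docker build -t vmod-build .
mkdir -p /tmp/vmods
docker run --rm -v /tmp/vmods:/data vmod-build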

ZFS Lustre on CentOS 7

Install epel and DKMS:
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install dkms

Install ZFS: (found https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS)
yum install http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum install zfs

Install libzfs-devel:
yum install libzfs2-devel.x86_64

Run configure in lustre dir, look for:
checking whether to enable zfs… yes

Create zpool:
zpool create mdt /dev/loop5
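If /dev/loop5 doesn't exist yet, it can be backed by a plain file first (the size and path here are arbitrary):

truncate -s 10G /tmp/test-mdt
losetup /dev/loop5 /tmp/test-mdt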

Format fs, Lustre creates device in zpool:
mkfs.lustre --backfstype=zfs --mgs --mdt --index=0 --fsname test --reformat mdt/mdt

Can also directly use backing loop file:
mkfs.lustre --backfstype=zfs --mgs --mdt --index=0 --fsname test --reformat mdt/mdt /tmp/test-mdt
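After formatting, the target mounts with the zpool/dataset as the device (the mount point is arbitrary):

mkdir -p /mnt/mdt
mount -t lustre mdt/mdt /mnt/mdt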

references:
http://lustre.ornl.gov/ecosystem-2016/documents/tutorials/Stearman-LLNL-ZFS.pdf
http://warpmech.com/?news=tutorial-getting-started-with-lustre-over-zfs

Using the wp tool to update WordPress noindex

wp post meta update <id> _yoast_wpseo_meta-robots-noindex 1

You can use url_to_postid to convert a URL to its post id in the wp shell, or by creating a wp command.

for url in $(cat /tmp/urls-to-noindex); do
    id=$(wp url2id $url)
    if [ "$id" == "0" ]; then
        echo "$url - no post found"
    else
        wp post meta update $id _yoast_wpseo_meta-robots-noindex 1
    fi
done

Create scaffolding for a command with wp scaffold plugin.
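For example (the plugin slug is made up):

wp scaffold plugin url2id-command --activate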

command.php snippet:

$url2id_command = function( $args ) {
    if ( count( $args ) == 0 ) {
        WP_CLI::error( "no url" );
        return;
    }
    $post_id = url_to_postid( $args[0] );
    echo "$post_id\n";
};
WP_CLI::add_command( 'url2id', $url2id_command );

*BONUS* how to get the post publish date:
wp post get <id> --field=post_date
(or post_modified)