Discussion:
[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name
Jason Hobbs
2018-05-18 12:03:55 UTC
Christian, thanks for digging in. Yes, I really just set up base
OpenStack and hit this condition. I'm not doing anything to set up
devices for passthrough or anything along those lines, and I'm not trying
to start instances.
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1771662

Title:
libvirtError: Node device not found: no node device with matching name

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-compute/+bug/1771662/+subscriptions
--
ubuntu-bugs mailing list
ubuntu-***@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu
Christian Ehrhardt
2018-05-18 07:14:48 UTC
I deployed a fresh Cavium system with 18.04 to get my own view of this
(without OpenStack/charms in the way).

1. start a basic guest
$ sudo apt install uvtool-libvirt qemu-efi-aarch64
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=arm64 label=daily release=bionic
$ uvt-kvm create --password=ubuntu b1 release=bionic arch=arm64 label=daily

=> Just works, nothing special in the logs.
Since it was stated that the special VFs/PFs are not used, this already breaks the argument made in the bug report: my guest just works on this system.

2. check the odd PF/VF situation

Please note that I had only the initial renames to the new naming scheme, but no others:
$ dmesg | grep renamed
[ 10.450002] thunder-nicvf 0002:01:00.2 enP2p1s0f2: renamed from eth1
[ 10.489989] thunder-nicvf 0002:01:00.1 enP2p1s0f1: renamed from eth0
[ 10.629936] thunder-nicvf 0002:01:00.4 enP2p1s0f4: renamed from eth3
[ 10.877936] thunder-nicvf 0002:01:00.3 enP2p1s0f3: renamed from eth2
[ 10.957933] thunder-nicvf 0002:01:00.5 enP2p1s0f5: renamed from eth4

None of the devices has phys_port_id, but that is not fatal.
On other platforms I found the same; e.g. on ppc64el some devices have it and some don't:
/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/0003:09:00.0/net/enP3p9s0f0/phys_port_id: Operation not supported
/sys/devices/pci0005:00/0005:00:00.0/0005:01:00.3/net/enP5p1s0f3/phys_port_id 0400000000334233343130363730453131

libvirt will then just use NULL, which essentially means there is only one
physical port, and that is fine.
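
The same fallback can be reproduced from userspace when checking a host by hand. This is a minimal Python sketch, not libvirt's C code; it assumes the usual /sys/class/net/<ifname> links into the device directories shown above, and read_phys_port_id is a hypothetical helper. A missing attribute, or one whose read fails with "Operation not supported" as in the ppc64el example, is mapped to None, the equivalent of libvirt's NULL:

```python
import errno
import os
from typing import Optional


def read_phys_port_id(ifname: str, sysfs_net: str = "/sys/class/net") -> Optional[str]:
    """Return an interface's phys_port_id, or None if it has none.

    Hypothetical helper: a missing attribute, or a read failing with
    EOPNOTSUPP ("Operation not supported"), is treated as "no physical
    port ID" rather than as an error.
    """
    path = os.path.join(sysfs_net, ifname, "phys_port_id")
    try:
        with open(path) as f:
            value = f.read().strip()
    except FileNotFoundError:
        # Attribute not present at all.
        return None
    except OSError as exc:
        # Drivers without phys_port_id support fail the read with EOPNOTSUPP.
        if exc.errno == errno.EOPNOTSUPP:
            return None
        raise
    # An empty value is also treated as "no ID".
    return value or None
```

On the Cavium host, read_phys_port_id("enP2p1s0f4") would be expected to return None, matching the observation that none of the devices has a phys_port_id.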

More interesting is that it later checks physfn, which exists on Cavium (but not on ppc64, for example):
$ ll /sys/devices/pci0002:00/0002:00:02.0/0002:01:01.4/physfn
lrwxrwxrwx 1 root root 0 May 18 06:23 /sys/devices/pci0002:00/0002:00:02.0/0002:01:01.4/physfn -> ../0002:01:00.0/

If this did NOT exist, it would give up here.
But it does exist, so it tries to go on with it and then fails because it doesn't find anything.
That matches what we read in the upstream mail discussion referenced in the report.

But none of this should matter since, per jhobbs, those devices should
not be used at all.

FYI, the code in libvirt around that:
virNetDevGetPhysicalFunction
-> virNetDevGetPhysPortID
-> virNetDevSysfsFile
This gives you something like
/sys/devices/pci0002:00/0002:00:02.0/0002:01:00.4/net/enP2p1s0f4/phys_port_id
-> virNetDevSysfsDeviceFile
-> virPCIGetNetName
If none of these functions fails BUT no path is returned, the reported message appears.
On other HW it either works OR simply doesn't find the paths and gives up before reaching the error message.
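
The physfn part of that chain can be approximated in Python to check a host by hand. This is a minimal sketch under stated assumptions, not libvirt's implementation: it assumes the standard sysfs layout (a physfn symlink in the VF's PCI directory, and a net/ subdirectory under a PF that has a netdev). pf_net_names and NoPFNetName are hypothetical names, and the error text only mimics the libvirt message from the attach attempt in step 4:

```python
import os
from typing import List, Optional


class NoPFNetName(RuntimeError):
    """VF has a physfn link, but the PF behind it exposes no net device."""


def pf_net_names(vf_pci_dir: str) -> Optional[List[str]]:
    """Resolve a VF's physfn link and list the PF's network device names.

    vf_pci_dir is a PCI device directory, e.g. /sys/bus/pci/devices/0002:01:00.2.
    Returns None if there is no physfn link (not a VF, nothing to look up).
    """
    physfn = os.path.join(vf_pci_dir, "physfn")
    if not os.path.exists(physfn):
        # As on the ppc64 host: no physfn, so give up quietly without an error.
        return None
    pf_dir = os.path.realpath(physfn)  # e.g. .../0002:01:00.0
    net_dir = os.path.join(pf_dir, "net")
    names = sorted(os.listdir(net_dir)) if os.path.isdir(net_dir) else []
    if not names:
        # The Cavium situation: physfn exists, but the PF has no netdev name.
        raise NoPFNetName(
            f"The PF device for VF {vf_pci_dir} has no network device name")
    return names


if __name__ == "__main__":
    import sys

    # Usage: python3 pf_check.py /sys/bus/pci/devices/0002:01:00.2 ...
    for dev in sys.argv[1:]:
        try:
            print(dev, pf_net_names(dev))
        except NoPFNetName as exc:
            print(dev, "error:", exc)
```

Returning None versus raising mirrors the two outcomes described above: no physfn means silently giving up, while a physfn without a PF netdev is the error case.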


3. check libvirt capabilities and status
As I asked before, we need to know which libvirt action fails, as everything I tried just works.

Also general probing like one would expect on an initial nova node setup:
$ virsh capabilities
$ virsh domcapabilities
$ virsh sysinfo
$ virsh nodeinfo
all work just fine without the reported errors.

4. Let's even use those devices now
The host uses enP2p1s0f1, that is:
0002:01:00.1 Ethernet controller: Cavium, Inc. THUNDERX Network Interface Controller virtual function (rev 09)
So let's use its siblings.
As a passthrough host interface:
0002:01:00.2 Ethernet controller: Cavium, Inc. THUNDERX Network Interface Controller virtual function (rev 09)
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0002' bus='0x1' slot='0x0' function='0x2'/>
  </source>
</interface>
As a passthrough generic hostdev:
0002:01:00.3 Ethernet controller: Cavium, Inc. THUNDERX Network Interface Controller virtual function (rev 09)
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0002' bus='0x1' slot='0x0' function='0x3'/>
  </source>
</hostdev>
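
Writing the address attributes by hand is error-prone. A small Python helper can turn a PCI address as printed by lspci (e.g. 0002:01:00.3) into either snippet. This is a minimal sketch: pci_address_attrs, interface_xml and hostdev_xml are hypothetical helpers, and the output simply mirrors the formatting used here (4-digit hex domain, unpadded hex for bus/slot/function):

```python
import re

# lspci-style PCI address: domain:bus:slot.function, e.g. 0002:01:00.3
_PCI_RE = re.compile(
    r"^(?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<slot>[0-9a-fA-F]{2})\.(?P<function>[0-7])$"
)


def pci_address_attrs(bdf: str) -> dict:
    """Split 'DDDD:BB:SS.F' into libvirt <address> attribute values (hex strings)."""
    m = _PCI_RE.match(bdf)
    if m is None:
        raise ValueError(f"not a PCI address: {bdf!r}")
    return {
        "domain": "0x%04x" % int(m["domain"], 16),
        "bus": hex(int(m["bus"], 16)),
        "slot": hex(int(m["slot"], 16)),
        "function": hex(int(m["function"], 16)),
    }


def _address(bdf: str) -> str:
    a = pci_address_attrs(bdf)
    return (f"    <address type='pci' domain='{a['domain']}' bus='{a['bus']}' "
            f"slot='{a['slot']}' function='{a['function']}'/>\n")


def interface_xml(bdf: str) -> str:
    """Host-interface passthrough: libvirt handles the device as an SR-IOV VF."""
    return ("<interface type='hostdev' managed='yes'>\n  <source>\n"
            + _address(bdf) + "  </source>\n</interface>\n")


def hostdev_xml(bdf: str) -> str:
    """Generic PCI passthrough via VFIO: the device is handled as plain PCI."""
    return ("<hostdev mode='subsystem' type='pci' managed='yes'>\n"
            "  <driver name='vfio'/>\n  <source>\n"
            + _address(bdf) + "  </source>\n</hostdev>\n")


if __name__ == "__main__":
    print(hostdev_xml("0002:01:00.3"), end="")
```

Redirecting the output to hostdev.xml gives a file usable with virsh attach-device b1 hostdev.xml.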

Note: please follow the upstream mailing list discussion on the
difference between these two.

$ virsh attach-device b1 interface.xml
error: Failed to attach device from interface.xml
error: internal error: The PF device for VF /sys/bus/pci/devices/0002:01:00.2 has no network device name
And in the log:
4624: error : virPCIGetVirtualFunctionInfo:3016 : internal error: The PF device for VF /sys/bus/pci/devices/0002:01:00.2 has no network device name

As outlined in the mail thread, these special devices can still be attached if you let libvirt handle them not as VFs but as generic PCI devices.
$ virsh attach-device b1 hostdev.xml
Device attached successfully
My guest can work fine with this now.

Et voilà: when you attach it as a hostdev, you get (due to unplugging/plugging on the host) the device renames you have seen.
[ 3222.919212] vfio-pci 0002:01:00.3: enabling device (0004 -> 0006)
[ 3229.172142] thunder-nicvf 0002:01:00.3: enabling device (0004 -> 0006)
[ 3229.219106] thunder-nicvf 0002:01:00.3 enP2p1s0f3: renamed from eth0
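
To tell the boot-time renames apart from those caused by re-plugging a device, the dmesg output can be grouped per PCI address. This is a minimal Python sketch that assumes the "renamed from" line format quoted here; rename_history and replugged_devices are hypothetical helpers. An address renamed more than once suggests the device was probed again after boot:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# e.g. "[ 3229.219106] thunder-nicvf 0002:01:00.3 enP2p1s0f3: renamed from eth0"
_RENAME_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<new>\S+): renamed from (?P<old>\S+)$"
)


def rename_history(dmesg_text: str) -> Dict[str, List[Tuple[float, str, str]]]:
    """Group 'renamed from' events by PCI address as (timestamp, old, new)."""
    history: Dict[str, List[Tuple[float, str, str]]] = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = _RENAME_RE.match(line.strip())
        if m:
            history[m["pci"]].append((float(m["ts"]), m["old"], m["new"]))
    return dict(history)


def replugged_devices(dmesg_text: str) -> List[str]:
    """PCI addresses renamed more than once, i.e. probed again after boot."""
    return sorted(pci for pci, events in rename_history(dmesg_text).items()
                  if len(events) > 1)
```

Given the boot-time lines plus the ones after the hostdev attach, replugged_devices() would be expected to flag only 0002:01:00.3; "enabling device" lines do not match the pattern and are ignored.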


IMHO this is your error, but you have said multiple times that you are not doing that.
I assume you really want to use the VFs as passthrough devices, which is a whole other story than "just set up openstack".

If you really just set up the base nova node, then a total +1 on Ryan's comment:
"At this point, we can compare the logs to Xenial, but I think the next
step is back to the charms/nova-compute to determine how a node reports
back to openstack that a compute node is ready."

** Changed in: libvirt (Ubuntu)
Status: New => Incomplete
Ryan Harper
2018-05-17 21:19:35 UTC
Thanks for the logs.

I generally don't see anything *fatal* to libvirt. In the nova logs, I
can see that virsh capabilities returns host information. It certainly
is failing to find the VFs on the SR-IOV device; it's not clear if that's
because the device is misbehaving (we can see kernel events
indicating the driver is being reset: enP2p1s0f1 renamed to eth0, then eth0
renamed to enP2p1s0f1, which can only happen if the driver has been
reset) or if the probing of the device's PCI address space is triggering a
reset.

Note that netplan has no skin in this game; it applies a DHCP and DNS
config to enP2p1s0f3, which stays up the whole time, and juju even bridges
en..f3, etc. The other interfaces found during boot are set to "manual"
config; that is, netplan writes a .link file for setting the name, but
note that the name is the predictable name it gets from the default udev
policy anyhow.

At this point, we can compare the logs to Xenial, but I think the next
step is back to the charms/nova-compute to determine how a node reports
back to openstack that a compute node is ready.
Jason Hobbs
2018-05-17 19:42:08 UTC
All of /var/log and /etc from the bionic deploy.

** Attachment added: "bionic-var-log-and-etc.tgz"
https://bugs.launchpad.net/charm-nova-compute/+bug/1771662/+attachment/5141000/+files/bionic-var-log-and-etc.tgz
Jason Hobbs
2018-05-17 19:36:30 UTC
@rharper here are the logs you asked for from the bionic deploy

** Attachment added: "bionic-logs.tgz"
https://bugs.launchpad.net/charm-nova-compute/+bug/1771662/+attachment/5140998/+files/bionic-logs.tgz
Ryan Harper
2018-05-17 19:06:15 UTC
Some package level deltas that may be relevant:

ii linux-firmware 1.173
ii linux-firmware 1.157.18

ii pciutils 1:3.3.1-1.1ubuntu1.2
ii pciutils 1:3.5.2-1ubuntu

libvirt0:arm64 4.0.0-1ubuntu7~cloud0
libvirt0:arm64 4.0.0-1ubuntu8

Less likely to have an impact, since it is guest firmware, but nonetheless a delta:

qemu-efi-aarch64 0~20180205.c0d9813c-2
qemu-efi 0~20160408.ffea0a2c-2