sneaky pci_dev refcounts

Normal friends and family, just skip this post. Trust me.

Recently, while doing some PCI slot symlink work, I noticed that we weren’t calling pci_release_dev() after taking a card offline (via sysfs). Well, at least not the first time, due to a leaked refcount in pci_get_dev_by_id(). gregkh fixed it, so that’s good, but I had a slightly difficult time tracking down the exact problem.

See, I had instrumented pci_dev_get() and pci_dev_put() to call dump_stack() every time the device that I cared about was touched:

--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -789,6 +789,11 @@ struct pci_dev *pci_dev_get(struct pci_dev *dev)
 {
        if (dev)
                get_device(&dev->dev);
+       if (dev && (dev->bus->number == 1) && (PCI_SLOT(dev->devfn) == 2)) {
+               dump_stack();
+               printk(KERN_INFO "dev refcount: %dn",
+                       atomic_read(&dev->dev.kobj.kref.refcount));
+       }
        return dev;
 }
 

[mutatis mutandis for pci_dev_put()]

But I wasn’t getting all the information I needed. Other pieces of code were playing with that device’s refcount too. So I did this instead (thanks to willy for the hint!):

@@ -983,7 +984,20 @@ int device_register(struct device *dev)
  */
 struct device *get_device(struct device *dev)
 {
-       return dev ? to_dev(kobject_get(&dev->kobj)) : NULL;
+
+       if (!dev)
+               return NULL;
+
+       if (dev->bus == &pci_bus_type) {
+               struct pci_dev *pci_dev = to_pci_dev(dev);
+               if ((pci_dev->bus->number == 1) && (PCI_SLOT(pci_dev->devfn) == 2)) {
+                       dump_stack();
+                       printk(KERN_INFO "dev refcount: %dn",
+                              atomic_read(&pci_dev->dev.kobj.kref.refcount));
+               }
+       }
+
+       return to_dev(kobject_get(&dev->kobj));
 }

Armed with that patch, this is the list of code paths (as of 2.6.27) that can affect a struct pci_dev’s refcount:

* pci_dev_get
* pci_dev_put
* device_add -> get_device
* device_add -> klist_add_tail -> ... -> get_device
* device_del -> put_device
* device_del -> klist_del -> ... -> put_device
* bus_attach_device -> klist_add_tail -> ... -> get_device
* bus_remove_device -> klist_del -> ... -> put_device
* acpi_platform_notify -> acpi_bind_one -> get_device

There are lots of code paths that can potentially directly call device_add/del so if you’re debugging that, you’ll need to figure out which ones affect you. I included that acpi_bind_one() as an example of a surprising place where we might directly call get_device().

The only surprising gotcha might be the calls to klist_add_tail/klist_del. Those calls will be non-obvious to grep for while debugging your device’s lifetime. So if you are wondering why your struct pci_dev isn’t calling its ->release() properly, just patch get_device() directly, and be prepared to stare at the output for a while. Good luck.

[Note that device_add bumps the refcount by 2, but then calls put_device() during cleanup, for a net gain of +1.]