[dm-devel] [QUESTION]: multipath device with wrong path lead to metadata err

Martin Wilck mwilck at suse.com
Thu Feb 4 14:56:14 UTC 2021


On Thu, 2021-02-04 at 19:25 +0800, lixiaokeng wrote:
> 
> Hi Martin,
> 
> On 2021/1/27 7:11, Martin Wilck wrote:
> > So we can only conclude that (if there's no kernel refcounting bug,
> > which I doubt) either orphan_path()->uninitialize_path() had been
> > called (closing the fd), or that opening the sd device had failed in
> > the first place (in which case the path WWID should have been nulled
> > in pathinfo()). In both cases it makes little sense that the path
> > should still be part of a struct multipath.
> 
> I have an idea.
> 
> If pp->fd < 0 ("Couldn't open device node"), pathinfo() returns
> PATHINFO_FAILED. Don't close(pp->fd) in orphan_path(). It may solve the
> problem (device with wrong path). I will take some time to test it.

Do you have evidence that the fd had been closed in your error case?
The path in question wasn't orphaned, if I understood correctly. You
said it was still a member of a map. In that case, the fd *must* be open.

> However, I don’t know if there are potential risks. Do you have
> suggestions about this?

Other than resource usage ... users might be irritated because if we do
this and a device is removed and reappears, it will *always* have a
different device node attached. But the device nodes are random today,
anyway. If we missed a delete event, we might keep this fd open
forever, because a re-added path would never get the same sysfs path
again; not sure if that might hurt in some scenarios. We shouldn't miss
delete events anyway, of course.

So no, at least off the top of my head, I can't think of anything
serious. Famous last words ;-)

We must make sure to close the fd in the free_path() code path, of
course.
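
To illustrate the intended fd lifecycle, here is a toy model in Python.
It is only a sketch: the real orphan_path() and free_path() are C
functions in multipath-tools, and the class, its fields and open_path()
below are hypothetical stand-ins that merely borrow the names. The point
is that orphaning leaves the fd open while freeing always closes it.

```python
import os


class Path:
    """Hypothetical stand-in for a path object; not the real struct path."""

    def __init__(self, dev_node):
        self.dev_node = dev_node
        self.fd = -1          # -1 means "no fd open"
        self.in_map = False   # True while the path belongs to a map


def open_path(pp, flags=os.O_RDONLY):
    """Open the device node and remember the fd (assumed helper)."""
    pp.fd = os.open(pp.dev_node, flags)
    pp.in_map = True


def orphan_path(pp):
    """Detach the path from its map, but deliberately keep the fd open."""
    pp.in_map = False


def free_path(pp):
    """Final teardown: this is where the fd must be closed."""
    if pp.fd >= 0:
        os.close(pp.fd)
        pp.fd = -1
    pp.in_map = False
```

In this model an orphaned path still holds its fd until free_path() runs,
so the fd is only released once the path object itself goes away.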

Btw, I just double-checked that the kernel really behaves as I thought.
You can run e.g. in python:

>>> import os
>>> f=os.open("/dev/sdh", os.O_RDWR|os.O_EXCL)

This will keep an fd to the device open. Now if you delete the device
and re-add it by scanning the scsi host, it will get a new device ID.

echo 1 >/sys/block/sdh/device/delete 
echo - - - >/sys/class/scsi_host/host2/scan

If you close the fd in python and repeat the delete/re-add (and nothing
else happened in the meantime), it will become "sdh" again.
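
The experiment above could be scripted roughly as follows. This is a
sketch under assumptions: it must run as root, "sdh" and "host2" are just
the example names from above, the helper names are made up for
illustration, and the 2 second sleep is a guess at how long the rescan
needs before the new node shows up in /sys/block.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def hold_open(path, flags=os.O_RDWR | os.O_EXCL):
    """Keep an fd to `path` open for the duration of the with-block."""
    fd = os.open(path, flags)
    try:
        yield fd
    finally:
        os.close(fd)


def sysfs_write(path, value):
    """Equivalent of `echo value >path`."""
    with open(path, "w") as f:
        f.write(value)


def delete_and_rescan(dev, host):
    """Delete the SCSI device, then rescan the host so it is re-added."""
    sysfs_write(f"/sys/block/{dev}/device/delete", "1")
    sysfs_write(f"/sys/class/scsi_host/{host}/scan", "- - -")


def block_devices():
    """Names currently present under /sys/block."""
    return sorted(os.listdir("/sys/block"))


def demo(dev="sdh", host="host2"):
    """Show which device names appear while the old node is held open."""
    before = set(block_devices())
    with hold_open(f"/dev/{dev}"):
        delete_and_rescan(dev, host)
        time.sleep(2)  # assumed settle time for the rescan
        after = set(block_devices())
    print("names that appeared while the fd was held:", sorted(after - before))
```

Calling demo() as root on a test system should list the name the
re-added device received while the old fd was still held.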

Cheers,
Martin





