<div dir="ltr">Hello, <div> Something else to check is your MPIO configuration. I have seen this same symptom when the Linux MPIO feature "queue_if_no_path" was enabled, which queues I/O instead of failing it while no path is available. </div><div><br></div><div> Here is the relevant excerpt from an /etc/multipath.conf file showing it enabled: </div><div><br></div><div><span style="color:rgb(17,17,17);font-family:Roboto,sans-serif;font-size:16px"> failback immediate</span><br style="box-sizing:border-box;color:rgb(17,17,17);font-family:Roboto,sans-serif;font-size:16px"><span style="color:rgb(17,17,17);font-family:Roboto,sans-serif;font-size:16px"> features "1 queue_if_no_path"</span><br></div><div><span style="color:rgb(17,17,17);font-family:Roboto,sans-serif;font-size:16px"><br></span></div><div><span style="color:rgb(17,17,17);font-family:Roboto,sans-serif;font-size:16px"> Also, in the past, some versions of Linux multipathd would wait a very long time before moving all I/O to the remaining path. </span></div><div><span style="color:rgb(17,17,17);font-family:Roboto,sans-serif;font-size:16px"><br></span></div><div><span style="color:rgb(17,17,17);font-family:Roboto,sans-serif;font-size:16px"> Regards,</span></div><div><span style="color:rgb(17,17,17);font-family:Roboto,sans-serif;font-size:16px">Don</span></div><div><span style="color:rgb(17,17,17);font-family:Roboto,sans-serif;font-size:16px"> </span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Feb 15, 2022 at 10:49 AM Zhengyuan Liu &lt;<a href="mailto:liuzhengyuang521@gmail.com">liuzhengyuang521@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi, all<br> <br> We have an online server which uses multipath + iscsi to attach storage<br> from Storage Server. 
There are two NICs on the server; each carries about 20 iscsi sessions,<br> and each session includes about 50 iscsi devices (yes, that is about<br> 2*20*50=2000 iscsi block devices in total on the server). The problem is: once a NIC faults, it takes too long<br> (nearly 80s) for multipath to switch to the other, healthy NIC link, because it<br> first needs to block all iscsi devices over the faulted NIC. The call stack is<br> shown below:<br> <br> void iscsi_block_session(struct iscsi_cls_session *session)<br> {<br> queue_work(iscsi_eh_timer_workq, &session->block_work);<br> }<br> <br> __iscsi_block_session() -> scsi_target_block() -> target_block() -><br> device_block() -> scsi_internal_device_block() -> scsi_stop_queue() -><br> blk_mq_quiesce_queue() -> synchronize_rcu()<br> <br> The sessions and devices are all processed sequentially, and we have<br> traced that each synchronize_rcu() call takes about 80ms, so the total cost<br> is about 80s (80ms * 20 * 50). That is longer than the application can<br> tolerate, and it may interrupt service.<br> <br> So my question is: can we optimize the procedure to reduce the time spent<br> blocking all iscsi devices? 
I'm not sure if it is a good idea to increase the<br> workqueue's max_active of iscsi_eh_timer_workq to improve concurrency.<br> <br> Thanks in advance.<br> </blockquote></div>