[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH v5 3/3] libvirtd: fix crash on termination



On Mon, Nov 06, 2017 at 03:20:13PM -0500, John Ferlan wrote:


On 11/03/2017 05:20 AM, Nikolay Shirokovskiy wrote:


On 03.11.2017 11:42, Martin Kletzander wrote:
On Fri, Nov 03, 2017 at 11:07:36AM +0300, Nikolay Shirokovskiy wrote:
On 02.11.2017 19:32, Martin Kletzander wrote:
This just makes the window of opportunity (between daemonServerClose()
and the actual removal of the virNetServerPtr from the hash) smaller.
That's why I don't see it as a fix, rather as a workaround.

What I think should be done is the order of what's done with services,
servers, drivers, daemon, workers, etc.

I read the other threds and I think we're on the same note here, it's
just that there is lot of confusion and we're all approaching it
differently.

Trying to catch up, but this thread has gone beyond where I was
previously. I still have to push the first two patches of this series...

Yes, libvirtd cleanup path is awful and there's quite a few "issues"
that have cropped up because previous patches didn't take care of
refcnt'ing properly and just "added" cleanup "somewhere" in the list of
things to be cleaned up... Fortunately to a degree I think this is still
all edge condition type stuff, but getting it right would be good...

For me theory has always been to cleanup in the inverse order that
something was created - for sure libvirtd doesn't follow that at all.
For libvirtd the shutdown/cleanup logic just seemed too haphazard and
fixing never made it to the top of the list of things to think about.

Another "caveat" in libvirtd cleanup logic is that driversInitialized is
set as part of a thread that could be running or have run if the code
enters the cleanup: path as a result of virNetlinkEventServiceStart
failure before we enter the event loop (e.g. virNetDaemonRun).

Still like Erik - I do like Martin's suggestion and would like to see it
as a formal patch - although perhaps with a few tweaks.

IIRC, this particular patch "worked" because by removing the
dmn->servers refcnt earlier made sure that we didn't have to wait for
virObjectUnref(dmn) to occur after virStateCleanup.


Would it be to daring to ask you guys to send one series that includes all the
related patchsets?  We can then discuss there, it would be easier to follow as
well.  I don't want anyone to do any extra work, but we might save some review
bandwidth by resending those patches in one series.  Or maybe it's just me and
I have a hard time keeping up with everything at the same time =)


So first let's agree on what should be done when virNetDaemonClose() is
called and the loop in virNetDaemonRun() exits.  My opinion it is:

1) Stop accepting new calls and services:
   virNetServerServiceToggle(false);

This not really necessary as this has effect only if loop is running.


Sure, good point.


2) Wait for all the workers to finish:
   First part of virThreadPoolFree()

I was thinking of splitting this function to finish/free too.
After finish it is not really usable because there are no
more threads in pool anymore so we have to introduce state
finished and check it in every thread pool api function to
be safe. So this will introduce a bit of complexity only
for the sake of some scheme IMO.


Yeah, it's similar to the daemon when you are between close and dispose, but for
the daemon we need it.

What function do you think we would need to check this for?  The finish would
end all the threads and since the event loop is not running no new jobs could be
posted.

virThreadPoolSendJob is already good in this respect as it checks @quit at the beginning.
Looks like we need only to add same check to virThreadPoolSetParameters. So
not so much complexity.

I can imagine thread pool function being called from another
thread pools thread. AFAIK there are not such situation though. So just to be on a safe side.



3) Kill off clients:
   virNetServerClientClose()

4) Clean up drivers:
   virStateCleanup()

I think it is better for net daemon/servers not to know about
hypervisor drivers directly.


Sure, I was only talking about the libvirtd for the sake of simplicity.


5) Clean up services, servers, daemon, etc.
   including the second part of virThreadPoolFree()

The last thing is what should be done in virNetDaemonDispose(), but
unfortunately we (mis)use it for more than that.  Numbers 1-4 should be,
IMO in virNet{Daemon,Server}Close().

Would this solve your problem?  I mean I'm not against more reference
counting, it should be done properly, but I think the above would do the
trick and also make sense from the naming point of view.

Since both the daemon and server will probably not ever be consistent
after calling the Close function, it might make sense to just include
virNetDaemonClose() into virNetDaemonDispose() and I'm not against that.
But the order of the steps should still be as described above IMO.

This won't work IMO. If we are already in virNetDaemonDispose then
@dmn object is invalid but we don't yet call virStateCleanup yet
(if it is moved to virNetDaemonClose) and this is the problem 85c3a182 dealt with.


It would solve it if order above is followed.  Sure, virNetDaemon should not

It is ok if virStateCleanup is called from virNetDaemonClose somehow. I was
only against calling virNetDaemonClose from virNetDaemonDispose in this case.

know about any drivers, but we can simply register a cleanup callback for the
daemon which the Dispose function will call after the thread pools are cleaned
up (just thought of that now).

Well to me it is too generic for this case. We can code this order just
as order of statements.

1. close netservers (which imply finishing rpc thread pool)
2. close hypervisor drivers
3. unref @dmn

Or we can use refcounts and take point 3 out of consideration.
Also there is a patch series [1] that introduces 0 step to help 1 step to finish actually
in the situation when loop is down.

0. shutdown hypervirsor drivers.

[1] https://www.redhat.com/archives/libvir-list/2017-October/msg01134.html

Right this still exists out there, but it's hard enough to focus on the
issues being discussed here without adding more into the mix.  If we can
get the shutdown order right as Martin has proposed, then address this
afterwards, it'll perhaps make things easier.


Or just put it all in a series as hinted above ;)



Let me know what you think.  And don't forget to have a nice day!

Martin

P.S.: The order of clean ups in libvirtd's main() is absolutely terrible and
     unreadable.  Also, even though I guess it goes without saying, if proper
     reference counting is in place, the order of virObjectUnref()s and
     VIR_FREE()s should not matter.  I think it chould be changed like this:

It makes sense that in cleanup section we can first make all cleanup and
then all dispose. It is only not clear to me can we call virStateCleanup
before virNetlinkShutdown. Also I would keep comment for @dmn unref
until proper refcount for @dmn is implemented.


I don't think virStateCleanup() and virNetlinkShutdown() are dependent.  The
comment can stay, I was just trying to show that it's also a mess.  What I would
like, actually, is to have the daemon running with all the references and
pointers (that the main() function doesn't need anymore) freed, but both this
and what I suggested below is unrelated to what we're trying to fix, I don't
know why I started talking about it.


So you cut off Martin's patch here, but if one essentially did inverse
order of startup they'd get the following: (the // comments are my notes
about startup order oddity).

cleanup:
   virNetDaemonClose(dmn);

   virNetlinkEventServiceStopAll();

   if (driversInitialized) {
       /* Set via daemonRunStateInit in daemonStateInit */
       driversInitialized = false;
       virStateCleanup();
   }

   virObjectUnref(adminProgram);
   virObjectUnref(srvAdm);
   virObjectUnref(qemuProgram);
   virObjectUnref(lxcProgram);
   virObjectUnref(remoteProgram);

   // FWIW: Not exact here, but the order of startup should be
   // changed since srv is added to dmn, so realistically srv
   // depends on dmn existence.
   virObjectUnref(srv);

We can unref all of the above whenever we please.  @dmn keeps references for
everything it needs, we just have @srv in order to set some things up, we can
unref it right after we add it to @dmn.  The dependencies here are not the
concern thanks to (hopefully) proper refcnt'ing.

   virObjectUnref(dmn);

   virNetlinkShutdown();

   // This one is also strange - seems startup should move the claiming
   // to before run_dir setup after forking into background.
   if (pid_file_fd != -1)
       virPidFileReleasePath(pid_file, pid_file_fd);

During start we should quit immediately, so it should be done before forking
into background, otherwise init scripts will not know whether starting the
service failed or not.


   VIR_FREE(run_dir);

Frees can be together, again, most of them could be freed before virNetDaemonRun
since all the string are just used to initialize stuff and I believe it's
strdup'd everywhere.


   if (statuswrite != -1) {
       if (ret != 0) {
           /* Tell parent of daemon what failed */
           char status = ret;
           while (write(statuswrite, &status, 1) == -1 &&
                  errno == EINTR)
               ;
       }
       VIR_FORCE_CLOSE(statuswrite);
   }

   VIR_FREE(sock_file);
   VIR_FREE(sock_file_ro);
   VIR_FREE(sock_file_adm);

   VIR_FREE(pid_file);

   VIR_FREE(remote_config_file);
   daemonConfigFree(config);


Yes, this does differ from Martin's order...


Sure, no problem, I just quickly sorted the lines by hand, I just wanted to show
that the readability was bad as well.

John

Attachment: signature.asc
Description: Digital signature


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]