[PATCH] rpc: avoid crash when system time jump back

Daniel P. Berrangé berrange at redhat.com
Mon Feb 8 19:00:56 UTC 2021


On Mon, Feb 08, 2021 at 07:50:11PM +0100, Ján Tomko wrote:
> On a Monday in 2021, BiaoxiangYe wrote:
> > From: BiaoXiang Ye <yebiaoxiang at huawei.com>
> > 
> > Setting the system time backward would lead to a
> > multiplication overflow in function virKeepAliveStart.
> > FunctionvirKeepAliveTimerInternal got the same bug too.
> > 
> > Backtrace below:
> > #0  0x0000ffffae898470 in raise () from /usr/lib64/libc.so.6
> > #1  0x0000ffffae89981c in abort () from /usr/lib64/libc.so.6
> > #2  0x0000ffffaf9a36a8 in __mulvsi3 () from /usr/lib64/libvirt.so.0
> > #3  0x0000ffffaf8fd9e8 in virKeepAliveStart (ka=0xaaaaf954ce10, interval=interval at entry=0,
> >     count=count at entry=0) at ../../src/rpc/virkeepalive.c:283
> > #4  0x0000ffffaf908560 in virNetServerClientStartKeepAlive (client=0xaaaaf954cbe0)
> >     at ../../src/rpc/virnetserverclient.c:1628
> > #5  0x0000aaaac57eb6dc in remoteDispatchConnectSupportsFeature (server=0xaaaaf95309d0,
> >     msg=0xaaaaf9549d90, ret=0xffff8c007fc0, args=0xffff8c002e70, rerr=0xffff9ea054a0,
> >     client=0xaaaaf954cbe0) at ../../src/remote/remote_daemon_dispatch.c:5063
> > #6  remoteDispatchConnectSupportsFeatureHelper (server=0xaaaaf95309d0, client=0xaaaaf954cbe0,
> >     msg=0xaaaaf9549d90, rerr=0xffff9ea054a0, args=0xffff8c002e70, ret=0xffff8c007fc0)
> >     at ./remote/remote_daemon_dispatch_stubs.h:3503
> > #7  0x0000ffffaf9053a4 in virNetServerProgramDispatchCall(msg=0xaaaaf9549d90, client=0xaaaaf954cbe0,
> >     server=0x0, prog=0xaaaaf953a170) at ../../src/rpc/virnetserverprogram.c:451
> > #8  virNetServerProgramDispatch (prog=0xaaaaf953a170, server=0x0, server at entry=0xaaaaf95309d0,
> >     client=0xaaaaf954cbe0, msg=0xaaaaf9549d90) at ../../src/rpc/virnetserverprogram.c:306
> > #9  0x0000ffffaf90a6bc in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>,
> >     client=<optimized out>, srv=0xaaaaf95309d0) at ../../src/rpc/virnetserver.c:137
> > #10 virNetServerHandleJob (jobOpaque=0xaaaaf950df80, opaque=0xaaaaf95309d0)
> >     at ../../src/rpc/virnetserver.c:154
> > #11 0x0000ffffaf812e14 in virThreadPoolWorker (opaque=<optimized out>)
> >     at ../../src/util/virthreadpool.c:163
> > #12 0x0000ffffaf81237c in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:246
> > #13 0x0000ffffaea327ac in ?? () from /usr/lib64/libpthread.so.0
> > #14 0x0000ffffae93747c in ?? () from /usr/lib64/libc.so.6
> > (gdb) frame 3
> > #3  0x0000ffffaf8fd9e8 in virKeepAliveStart (ka=0xaaaaf954ce10, interval=interval at entry=0,
> >     count=count at entry=0) at ../../src/rpc/virkeepalive.c:283
> > 283	        timeout = ka->interval - delay;
> > (gdb) list
> > 278    now = time(NULL);
> > 279    delay = now - ka->lastPacketReceived; <='delay' got a negative value
> > 280    if (delay > ka->interval)
> > 281        timeout = 0;
> > 282    else
> > 283        timeout = ka->interval - delay;
> > 284    ka->intervalStart = now - (ka->interval - timeout);
> > 285    ka->timer = virEventAddTimeout(timeout * 1000, virKeepAliveTimer, <= multiplication overflow
> > 286                                   ka, virObjectFreeCallback);
> > 287    if (ka->timer < 0)
> > (gdb) p now
> > $2 = 18288001
> > (gdb) p ka->lastPacketReceived
> > $3 = 1609430405
> > 
> > Signed-off-by: BiaoXiang Ye <yebiaoxiang at huawei.com>
> > ---
> > src/rpc/virkeepalive.c | 20 ++++++++++++++++++++
> > 1 file changed, 20 insertions(+)
> > 
> > diff --git a/src/rpc/virkeepalive.c b/src/rpc/virkeepalive.c
> > index 860b91b6b1..e662484ea6 100644
> > --- a/src/rpc/virkeepalive.c
> > +++ b/src/rpc/virkeepalive.c
> > @@ -119,12 +119,25 @@ virKeepAliveTimerInternal(virKeepAlivePtr ka,
> >     if (ka->interval <= 0 || ka->intervalStart == 0)
> >         return false;
> > 
> > +    if (now < ka->intervalStart) {
> > +        VIR_WARN("system time jump back detected. now=%ld,"
> > +                 " intervalStart=%ld",
> > +                 now, ka->intervalStart);
> > +        ka->intervalStart = now;
> > +    }
> > +
> 
> Instead of detecting a jump backwards, we could use a monotonic timer
> that does not go backwards.
> 
> g_get_monotonic_time () from GLib should do that
> https://developer.gnome.org/glib/stable/glib-Date-and-Time-Functions.html#g-get-monotonic-time
> 
> (Although they say the timer might not tick while the machine is
> suspended - not sure whether we need that here or not)

Your connections are doomed no matter what if you suspend for any
non-negligible amount of time, as the low level TCP connection
will get timed out by the OS most likely.

IOW regardless of which timer we use for keepalive, we're
going to start counting the keepalive failures from when we
resume. It is probably only going to show a difference for any
keepalive responses that we had received, but not yet processed
at time of suspend.

So I think using monotonic timer would likely be ok here.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list