[Cluster-devel] [PATCH dlm-tool 13/14] dlm_controld: plock log lock state

Alexander Aring aahringo at redhat.com
Fri Mar 3 22:43:08 UTC 2023


Hi,

On Fri, Mar 3, 2023 at 5:20 PM Alexander Aring <aahringo at redhat.com> wrote:
>
> Hi,
>
> On Fri, Mar 3, 2023 at 11:08 AM Andreas Gruenbacher <agruenba at redhat.com> wrote:
> >
> > On Fri, Mar 3, 2023 at 4:53 PM Andreas Gruenbacher <agruenba at redhat.com> wrote:
> > > diff --git a/dlm_controld/plock.c b/dlm_controld/plock.c
> > > index 39bdd1f6..588bcaaa 100644
> > > --- a/dlm_controld/plock.c
> > > +++ b/dlm_controld/plock.c
> > > @@ -8,6 +8,7 @@
> > >
> > >  #include "dlm_daemon.h"
> > >  #include <linux/dlm_plock.h>
> > > +#include <sys/sdt.h>
> > >
>
> does this require an additional dependency?
>
> > >  /* FIXME: remove this once everyone is using the version of
> > >   * dlm_plock.h which defines it */
> > > @@ -211,6 +212,11 @@ static uint64_t dt_usec(const struct timeval *start, const struct timeval *stop)
> > >  static void plock_print_start_waiter(const struct lockspace *ls,
> > >                                      struct lock_waiter *w)
> > >  {
> > > +       const struct dlm_plock_info *info = &w->info;
> > > +
> > > +       DTRACE_PROBE7(dlm_controld, plock_wait_begin, info->number, w, info->start,
> > > +                     info->end, info->nodeid, info->pid, info->owner);
> > > +
> > >         log_plock(ls, "state waiter start %llx %p %llx-%llx %d/%u/%llx",
> > >                   (unsigned long long)w->info.number,
> > >                   w,
> >
> > An additional question I have about those events is which information
> > to log. We need to be able to identify which inode the request is for
> > (info->number), the locking range (info->start and info->end), whether
> > it is read or write lock, and which context in the kernel the request
> > refers to (this seems to be info->owner, but I'm not entirely sure
> > about that). The pid may be interesting as well, but are w or
>
> w is always true for waiters, it's just stored there as the structure
> is used in other places. It means if it's not a trylock or not. When
> we have a waiter we always have no trylock, so w == 1.
>
> > info->nodeid really useful? We should try to avoid exposing
>
> yes it is useful. So far I see you need a nodeid/pid kombination to
> see which pid had the lock state for a specific time. nodeid will tell
> you on which node the pid runs on, the same pid _could_ occur twice,
> without nodeid you don't know which process in the cluster had the
> state. Either lock or in this case contention state.
>
> > unnecessary implementation details like the addresses of objects
> > inside dlm_controld.
> >
>
> I am using it as a unique handle (probably the nodeid needs to be
> considered here as well, because the same argument like pid just more
> unlikely) to look up the start trace/log to the end trace. I used a

no, we don't need the nodeid here. We don't trace multiple
applications into one file, then yes. But we separate them in their
own trace/log file.
And using the pointer is valid because this structure should be
add/del to the list and cannot be e.g. added twice.

- Alex



More information about the Cluster-devel mailing list