[libvirt] [PATCH 00/12] Structured syslog (Lumberjack/CEE) support

Miloslav Trmac mitr at redhat.com
Mon Oct 8 18:44:04 UTC 2012


Hello,
Thanks for the review, and I'm sorry about the late response.

----- Original Message -----
> While I agree with the effort to try to standardize on metadata to
> be collected with log messages, I can't say I like the Lumberjack
> design, or this proposed usage in libvirt.
> 
> First it is not possible for libvirt to detect whether messages it
> sends to syslog() are going to be processed by a syslog daemon which
> actually understands the JSON format. This means we can't turn on
> use of this JSON format automatically, without causing all existing
> distros to get their logs filled with JSON data.

True, but that's an inevitable consequence of being compatible with the syslog protocol.[1]

Yes, it does make the question of a default awkward.  However, considering that journal has been switched on by default, and MESSAGE= of everything sent to journal goes to syslog, is there still a chance of enabling a syslog output by default at the same time?  Doing so would cause each message to appear in syslog twice on systems that have journald.


> I also don't like the use of JSON for this purpose in the first place.
I'm afraid this is not really open to a change, however I'll try to explain why I don't see JSON as a problem.

> Application logging facilities really need to be simple, reliable, and
> safe to use in any context. For example in between a fork() and execve()
> where POSIX requires only use of async signal safe functions. Now I know
> that technically speaking syslog() itself is not async signal safe in
> general, but the way libvirt uses it for logging makes it safe(ish) in
> our case. We can not, however, imagine that JSON (or really any general
> JSON library) is going to be async signal safe.

You can also view JSON syntax as a sequence of (fixed field identifier string, field value with some characters escaped) pairs, which is pretty much the minimum standard for _any_ representation of data in a text format.  Choosing JSON was just simpler than trying to committee-invent something new.[2]

The journal protocol has a similarly inconvenient case of handling \n - it's only not as noticeable because there are fewer characters to escape, and \n is rarer.


Re: async-signal-safe use: Yes, using a JSON library is more convenient and generally preferred to writing a yet another JSON encoder, and yes, a typical JSON library will not be async-signal-safe.  However, applications are not required to use all JSON facilities.  An application that wants to log a static string between fork() and execve() can just as easily log a static JSON string.

If the multi-threaded application wanted to log something dynamic at that inconvenient time, the application author would have to write string formatting code manually in any case (sprintf() is also not async signal safe) - and I don't think JSON escaping is really difficult.  Sure, it is more difficult than no escaping, but there needs to be _some_ escaping if we want to keep the existing syslog transport.

Regarding libvirt specifically, virLogVMessage is rather far from async-signal-safe (e.g. virVasprintf(), virLogLock()), so this doesn't seem to be a concern for libvirt.


> This patch series goes to great lengths to introduce a whole series of
> new JSON APIs in libvirt, to avoid memory allocation from the heap, but
> this work is negated by the fact that the YAJL APIs used at the end all
> allocate from the heap anyway.

The major problem here was not memory allocation itself (as noted above, other parts of the log path allocate memory), but possible recursion through virReportOOMError() -> virLogMessage() and deadlock on virLogMutex.  Alocating data on the stack is primarily a way to avoid OOM, not signal safety.

In retrospect, the virJSON* data structure is unnecessarily complex for the simple list of key-type-value triples that libvirt actually needs for logging, making the code rather more complex than necessary.  I'll replace that, see the other reply.


> Finally alot of the metadata items listed in the "standard fields" doc
> above would be better filled in by the server (from SCM_RIGHTS), than
> the client, so that the server can prevent an application from lying
> about its uid/gid/pid etc.

rsyslog does add these fields automatically, and the patch set does not add them manually.  This is a bug in the FieldList wiki page, I'll get it fixed.


> By way of comparison the systemd journal APIs deal with all theses
> issues.
> 
> The actual journald client side logging API is possible to implement
> and use without doing any memory allocation at all,

virLogOutputToJournald can be implemented like that, but not the libsystemd-journal API: you'll find a fairly liberal use of alloca() in journal-send.c.  If alloca() is allowed, a JSON formatting API can be implemented without "memory allocation" as well.

> in a completely
> async signal safe manner, using only socket() + sendmsg() with a
> stack allocated iovec[] for storing individual metadata fields to be
> sent.

Looking at how libvirt currently sends journal data, is the /run/systemd/journal/socket protocol a stable ABI?  I can't find anything documenting it within the systemd git tree, and it is not mentioned at http://www.freedesktop.org/wiki/Software/systemd/InterfaceStabilityPromise .


> The API is soooo simple to use too, for example, a traditional
> syslog() call, can be replaced with something like:
> 
>     sd_journal_send("MESSAGE=%s", str,
>                     "PRIORITY=%d", prio,
>                     "CODE_LINE=%lld", linenr,
>                     "CODE_FUNC=%s", funcname,
>                     ...arbirary other key=value metadata...
>                     NULL);

There actually is a library (libumberlog) providing just such a simple API.  A corresponding call is
> ul_syslog(prio, str,
>           "code_line", "%s", service,
>           "code_func", "%s", auth_method,
>           ... arbitrary other metadata...
>           NULL);

However, I thought that libvirt would prefer not adding yet another dependency, especially when the extensions for virRaiseErrorFull require a per-field data structure in any case.[3]
    Mirek


[1] Whether JSON is desired or not also ultimately depends on the final recipient of the data, not on the system that generates it - an "old" system could generate JSON that will be forwarded to a log aggregation server.
[2] Especially because the major syslog implementations already supported the JSON format for other reasons.
[3] And, to be honest, the library does not nicely support the case of virRaiseErrorFull adding more data.  It would be possible, but ugly - we'll make it easier in the future.




More information about the libvir-list mailing list