[Freeipa-devel] [PATCH] initial commit of log watcher (lwatch)

Fri Jul 24 14:55:31 UTC 2009

>> 10)  Such code will not produce right results since there are multiple
>> ways why function can return NULL and not in all cases
>> the errno is set.
>>
>>          if ((pw = new_path_watch(existing_directory_ancestor)) == NULL) {
>> +            error = errno;
>> +            log_msg(LOG_ERROR, _("%s: could not allocate new watch for
>> \"%s\" (%s)\n"),
>> +                    __FUNCTION__, existing_directory_ancestor,
>> error_string(error));
>> +            return error;
>>
>> Better approach to such cases would be either
>>
>> pw = new_path_watch(existing_directory_ancestor,&error);
>> if (!pw) {
>> ...
>>
>> or which I prefer more:
>>
>> error = new_path_watch(&pw, existing_directory_ancestor);
>> if(error) {
>> ...
>
> This second approach is more in line with the rest of the code in the
> SSSD. We do this all over the code. Also, our preference is for this
> error code to remain in line with the errno set, so things like ENOENT
> and ENOMEM are standard. (We also defined EOK which is identical to
> EXIT_SUCCESS for convenience)

Hopefully you'll have noticed the 2nd approach, the preferred one used 
in SSSD, is the one I used for virtually all function definitions. The 
function cited above is an anomaly. It's good your review caught it 
(BTW, the reason it was coded that way originally was it was originally 
just an "allocator" function whose only failure would have been no 
memory, but it grew new functionality and it should have been modified 
at that point to return an error status just like everything else does.

>> 12) In search for some functions I looked at path_utils.
>> First in general I am a bit scared about using static buffers to hold paths.
>> Is it still an Ok practice? In the SSSD all code is allocated in
>> collections
>> and ini the only place  where the static buffer was used was some
>> parsing of the
>> file where the reasonable length of the line in the ini file is less
>> than 100 bytes
>> while buffer is about 64K. Should we use dynamic memory for paths?
>> Start with malloc(MAX_PATH) and then realloc if needed?
>>
>
> The idea behind MAX_PATH is that it's defined on every system to be the
> largest legitimate value of a path that the system can handle
> internally. malloc(MAX_PATH) doesn't gain anything over path[MAX_PATH]
> except a need to manage the memory. I generally prefer, however, only to
> allocate memory as-needed. This will come in handy in the future if
> anyone should want to include this daemon on a limited-memory device
> (such as a Linux-based router). That said, for speed and ease of use,
> you can't beat this static buffer. I just balk at always using a full
> memory page for every path, when in most cases you're going to need<
> 100 bytes. Considering that the whole purpose of this tool is to easily
> manage large numbers of paths, that 4kiB (on Linux; 8kiB or more on some
> other systems) can really add up.

I think one can make two equally valid arguments with respect to storing 
path names. Pre-declaring a variable or structure member of sufficient 
size to hold any pathname has the advantage of simplicity, simplicity 
tends to lead toward more robust and efficient code (less mistakes to 
make and fewer cycles spent trying to optimize memory use).

However, Stephen is right on the money when he observes in most cases 
the bulk of the memory allocated won't be used and this *may* be a 
liability.

First of all one needs to be aware that much of the use of the construct 
"path[MAX_PATH]" occurs in automatic variables on the stack. In this 
context it's hard to argue the unused memory in the string is a 
liability, the memory comes into existence and is freed by a simple 
adjustment of the stack pointer and has a very short lifetime (yes, it 
does cost extra stack space). The alternative is more complex memory 
management, either alloca on the stack or from the heap, both require 
more complex logic with a greater opportunity for program error and many 
more processor cycles (don't forget these path variables only come into 
existence because we need to manipulate them which means one must always 
be doing bounds checking and reallocations on virtually all operations).

There are places where buffers of size MAX_PATH are members of heap 
allocated objects. These are better candidates for memory optimization, 
however it does incur much more complex program logic. In fact the 
problem really reduces to:

Dynamic String Objects

E.g. in C++ std::string. It's not just a matter of dynamic buffer 
management, it also requires functions to be bound to the underlying 
buffer which can perform various string operations.

To the best of my knowledge we don't have a library available to us in C 
which has been blessed for our use which implements this. Just like we 
were required to write our own hash table implementation we would be 
required to write our own dynamic string library.

So when the choice came as to whether to write a new string library or 
declare paths using MAX_PATH and utilize the existing clib string 
functions (plus some customs functions) it seemed to me to be clearly a 
win to opt for simplicity, processor efficiency, shorter development 
time and proven library correctness.

That's not to say there isn't a good argument on the other side too. 
Perhaps it would be possible to use macros in the current implemention 
which would allow switching to dynamic operations later down the road.

Of course we could also just wait and see if it's an issue in practice. 
The golden rule of optimization is *not* to optimize in the beginning, 
rather keep things simple, concentrate on program correctness, and only 
later come back and tweak specific problem areas.

-- 
John Dennis <jdennis at redhat.com>

Looking to carve out IT costs?
www.redhat.com/carveoutcosts/