OT: Requesting C advice
Matthew Saltzman
mjs at clemson.edu
Fri Jun 1 23:02:15 UTC 2007
On Fri, 1 Jun 2007, Les wrote:
> On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
>>
>>> I know why their programs failed. I also know that C uses a pushdown
>> [insert before "C" in the line above: "some particular implementations of"]
>>> stack for variables in subroutines. You can check it out with a very
>>> simple program using pointers:
>>>
>>> #include <sttlib.h>
>>>
>>> int i,j,k;
>>>
>>> main()
>>> {
>>> int mi,mj,mk;
>>> int *x;
>>> mi=4;mj=5;mk=6;
>>> x=&mk;
>>> printf ("%d %d %d\n",*x++,*X++;*X++);
>>> x=&i;
>>> printf ("%d %d %d\n",*x++,*x++,*x++);
>>> i-1;j=2;k=3;
>>> printf ("%d %d %d\n",*x++,*x++,*x++);
>>> )
>>>
>>> Just an exercise, you understand. Compile and run this with several C
>>> packages, or if the package you choose supports it, have it compile as
>>> K&R C, and try it.
>>
>> Of course, several constructs here are undefined, so there is no such
>> thing as "correct" or "incorrect" behavior.
>>
>> After correcting obvious typos and adding #include <stdio.h> so it would
>> compile, I got (using gcc-4.1.1-51.fc6 with no options):
>>
>> $ ./a.out
>> 5 4 6
>> 0 0 0
>> 0 0 0
>
> OOPS, forgot to reset the X pointer between the last two print
> statements. This bit of code is intended to show that globals are on a
> heap and locals are on a stack.
Fixed that. Now I get:
$ ./a.out
5 4 6
0 0 0
0 2 1
But I confess, I don't see how this code proves your point. It does
demonstrate that globals are initialized by default, though.
>
>>
>> Was that what you were expecting?
>>
>>
>>>
>>> I cannot vouch for every compiler, only Microsoft, Sun, and Instant C
>>> off the top of my head. I have used a few other packages as well. But
>>> any really good programmer NEVER relies on system initialization. It is
>>> destined to fail you at bad times.
>>
>> How much effort are you willing to expend to defend against potentially
>> buggy compilers (as opposed to undefined or implementation-defined
>> behaviors)? The Intel fdiv bug would seem to prove that you should NEVER
>> rely on arithmetic instructions to provide the correct answer. There's an
>> economic tradeoff between protecting yourself from all conceivable errors
>> and actually getting work done.
>>
>
> There is a difference between implementation differences and hardware
> errors, which is what the Intel error was. They had
> a bug in their silicon compiler that caused that, IIRC.
I could just as easily reference some other obscure compiler bug or
implementation-defined behavior and make the same point. The thing about
a standard is that there are clear requirements about what is
implementation-defined and what is not. Static initialization in ISO C is
not one of those implementation-defined things.
I will concede that explicit initializations--even to default
values--might be a useful self-documentation tool.
>
>>> One case is as has been pointed out
>>> here, that NULL is sometimes 0, sometimes 0x80000000, and sometimes
>>> 0xffffffff. Even NULL as a char may be 0xFF, 0xFF00, or 0x8000 depending
>>> on the implementation. But strings always end in a character NULL or
>>> 0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers.
>>> They may do otherwise on some others. It can byte (;-) you if you are
>>> not careful.
>>
>> In your source code, NULL is *always* written 0 (or sometimes (void *) 0
>> to indicate that it's intended to stand for a null pointer value, not a
>> NUL character value). The string terminator character is *always* written
>> '\0'. The machine's representation of that value is immaterial. If you
>> type-pun to try to look at the actual machine's representation, your
>> program's behavior is undefined and you deserve what you get. It's the
>> compiler's responsibility to ensure that things work as expected, no
>> matter what the machine's representation is. (For example, '\0' == 0 must
>> return 1.)
>>
>
> '\0' is an escape forcing the 0, so of course this will be equal.
OK. But the main point is that it doesn't matter what bit pattern
represents a null pointer. Your source code will always use the value 0
to represent it. For example,
    int *p;
    /* ...code that sets p... */
    if ( p == 0 )   /* *not* if ( p == 0x80000000 ) or
                            if ( p == 0xffffffff ) */
    { /* ...handle null pointer value... */ }
>
>>>
>>> And since that is so, how are those variables initialized? And to
>>> what value? What is a pointer set to when it is initialized? Hint: on
>>> Cyber, the supposed default for assigned pointers used to be the address
>>> of the pointer. Again, system dependencies may get you.
>>
>> Pre-ANSI/ISO compilers might have initialized static memory to
>> all-bits-zero even when that was not the correct representation of the
>> default for the type being initialized. ANSI/ISO compilers are not
>> allowed to do that. The required default initializations are well
>> defined. (This is the sort of thing that motivates the creation of
>> standards in the first place.)
>>
>>>
>>> And those systems that used the first location to store the return
>>> address are not re-entrant, without other supporting code in the
>>> background. I think I used one of those once as well.
>>
>> There's no requirement for re-entrancy in K&R or ANSI/ISO. In fact
>> several standard library routines are known to not be re-entrant.
>>
>
> This is true, but knowing that the base code is not reentrant due to
> design constraints or due to hardware constraints makes the difference
> on modern multithreaded systems, where the same executable memory can be
> used for the program (if the hardware allows that).
Sure, you need to know that you can compile re-entrant code if you need
it.
>
>>>
>>> PS. A stack doesn't necessarily mean a processor call and return
>>> stack. It is any memory-addressing mechanism where the data is written
>>> to the current location, then the pointer incremented (or decremented,
>>> depending on the architecture).
>>
>> But usually in the context of discussions about compiler architectures,
>> call stacks are exactly what is meant.
>>
>
> I am not sure that is true, because in some implementations, the data
> heap and stack are in the same segment of memory, while the runtime
> stack for the processor is somewhere else. For high-security systems,
> this should be a requirement. It prevents obvious means of
> inserting malicious code through variable initialization, and then stack
> manipulation. I say should be, because it has been tossed around from
> time to time, but I am unsure if it has ever been formalized.
>
> One system I worked on looked like this:
> init jump
> heap
> variable stack (push down)
> program entrance
> program
> local libraries
> relocation table
> symbol table (if not removed)
> machine stack
>
> Unfortunately I no longer remember which system that was, just the
> fact that some standard libraries at that time would not run on it
> because they manipulated the stack.
>
> Regards,
> Les H
>
--
Matthew Saltzman
Clemson University Math Sciences
mjs AT clemson DOT edu
http://www.math.clemson.edu/~mjs