Compiling -- gcc -- Lex & Yacc

Rick Stevens ricks at
Tue Jul 22 21:12:00 UTC 2008

William Case wrote:
> Hi Rick and Kevin;
> On Tue, 2008-07-22 at 10:56 -0700, Rick Stevens wrote: 
>> William Case wrote:
>>> Hi;
>>> I am working my way through the compiling process.  I want to be precise
>>> about my question so that responders do not waste time on answering the
>>> wrong question.
>> cpp, was CREATED using lex and yacc (or more correctly, their Gnu
>> replacements flex and bison), but it doesn't use them itself.  cpp
>> expects C or C++ source, with appropriate escapes to put in in-line
>> assembly code.  It does eventually call the assembler ("as") and the
>> linker ("ld").  Your makefile may also send the code to the librarian
>> ("ar") to create libraries.
> You and Kevin have corrected some of the terminology and mis-perceptions
> about what process is doing what.  I thank you for that.  But it still
> begs the question.  How do you know that?
> Is it simply randomly acquired knowledge or is there some source you or
> I can go to and see what is being done?  An in depth explanation of each
> process I can and have googled for or read about.  It is the source of
> the kind of gcc/compiler information that you report, I seem to be
> unable to locate.

My knowledge is from 35+ years of experience, plus some (quite old)
schooling back when we still chiseled stuff on rocks and had teachers
with large whips.

> Again as I said, I can locate *lots* of generalized information about
> compiling but info that is particular to my system seems unlocateable.
> For example, the gnu site offers me the same 'info gcc' document that I
> already have -- but nothing additional.
> Another way of saying it.
> You seem to say that cpp which I have read as the preprocessing process
> in a C compiler has been expanded to include code that does lexicon and
> parsing analysis for gcc.  

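cpp is only the preprocessor; the compiler proper (cc1 for C, cc1plus
for C++ in gcc) does the parsing specific to the language.  That parser
can either be hand coded or generated by yacc/bison (with lex/flex
generating the lexer that feeds it) based on rules that you create.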
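
To make the "hand coded" option concrete, here is a rough sketch of a
tiny lexer and recursive-descent parser for integer arithmetic.  It is
in Python only to keep it short, and it has nothing to do with how gcc
is actually written.  The names (tokenize, Parser, evaluate) and the
little grammar are made up for illustration.  The tokenize() half is the
kind of code lex/flex would generate for you; the Parser half is the
kind of code yacc/bison would generate.

```python
import re


def tokenize(text):
    """Lexer: split '1 + 2*3' into (kind, value) tokens."""
    tokens = []
    # \d+ grabs whole numbers, \S grabs any other single non-space character.
    for piece in re.findall(r"\d+|\S", text):
        if piece.isdigit():
            tokens.append(("NUM", int(piece)))
        elif piece in "+-*/()":
            tokens.append(("OP", piece))
        else:
            raise SyntaxError(f"unexpected character {piece!r}")
    return tokens


class Parser:
    """Recursive-descent parser that evaluates as it parses."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.advance()
            rhs = self.factor()
            # Integer division, to keep the toy language integer-only.
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self):
        # factor := NUM | '(' expr ')'
        kind, val = self.advance()
        if kind == "NUM":
            return val
        if (kind, val) == ("OP", "("):
            value = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {val!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return result
```

Calling evaluate("1 + 2 * 3") runs the lexer first and then the parser,
and the grammar rules are what give "*" higher precedence than "+".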

Different languages use different mechanisms.  Typically, C/C++
compilers parse C/C++ source code and generate assembly code that will
perform the intended operations.  They hand the assembly code off to an
assembler to generate the actual machine code.  The linker/loader either
links the code to the appropriate libraries at runtime (thereby using
shared libraries) or creates a "statically linked" executable that is
not dependent on having the libraries available at runtime.
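
If you want to watch those stages happen one at a time, the gcc driver
will stop after each one if you ask it to.  The sketch below just builds
the four command lines; the -E, -S, -c and -o options are standard gcc
options, but the helper names (stage_commands, run_stages) and the file
name hello.c are made up for illustration, and the intermediate file
names are simply the conventional .i/.s/.o suffixes.

```python
import shutil
import subprocess
from pathlib import Path


def stage_commands(source, cc="gcc"):
    """Return one command per stage: preprocess, compile, assemble, link."""
    stem = Path(source).stem
    return [
        [cc, "-E", source, "-o", f"{stem}.i"],       # preprocess only
        [cc, "-S", f"{stem}.i", "-o", f"{stem}.s"],  # compile to assembly
        [cc, "-c", f"{stem}.s", "-o", f"{stem}.o"],  # assemble to object code
        [cc, f"{stem}.o", "-o", stem],               # link into an executable
    ]


def run_stages(source, cc="gcc"):
    """Run each stage in turn, echoing the command first."""
    if shutil.which(cc) is None:
        raise RuntimeError(f"{cc} not found on PATH")
    for cmd in stage_commands(source, cc):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Running run_stages("hello.c") in a directory that has a hello.c should
leave hello.i, hello.s, hello.o and hello behind so you can look at each
one.  Running "gcc -v hello.c" is another way in: it prints the
sub-programs the driver invokes along the way.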

For example, back in the '80s, the Whitesmiths C compiler consisted of
three parts.  The C preprocessor (cpp) tokenized the source into pieces
that could be handled by the second phase, "cp1".  cp1 did the actual
syntax checks and such, did optimizations and generated yet another set
of tokens that were processed by the third phase, "cp2".  cp2 came in
different versions for different target machines: "cp2-11" was used on
PDP-11s, "cp2-vax" was for VAX-11 machines, "cp2-386" was for Intel
machines, etc.  They generated the actual assembly code for the various
assemblers and actually invoked the assemblers and linkers according to
the architectures they were designed for.

There are also "hybrid" systems, such as Java.  Java is a "compiling
interpreter" (as was UCSD P-System Pascal), where the compiler generates
opcodes for a pseudo machine (one that doesn't even really exist).  The
runtime system (the Java Virtual Machine or "JVM") executes those
opcodes in an interpretive manner.  The nice thing about pseudo-code
systems is that the pseudocode is portable between architectures; the
JVM is the only thing that actually has to know how the hardware works.
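
The idea is easier to see with a toy.  The sketch below is a made-up
pseudo machine, nothing like the real JVM instruction set: the opcode
names (PUSH, ADD, SUB, MUL) and the function names (compile_ast, run)
are invented for illustration.  compile_ast() plays the compiler and
turns an expression into opcodes; run() plays the runtime and
interprets them on a stack.

```python
import operator

# Opcodes for two-operand arithmetic on the toy stack machine.
BINARY_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
}

# Map source-level operators to opcode names.
OPCODE_FOR = {"+": "ADD", "-": "SUB", "*": "MUL"}


def compile_ast(node):
    """Compile a nested tuple like ('+', 2, ('*', 3, 4)) into opcodes."""
    if isinstance(node, int):
        return [("PUSH", node)]
    op, left, right = node
    # Emit code for both operands first, then the operation itself.
    return compile_ast(left) + compile_ast(right) + [(OPCODE_FOR[op], None)]


def run(program):
    """Interpret the opcodes one at a time on a value stack."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[opcode](left, right))
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()
```

The opcode list is the portable part: the same list would run on any
machine that has a run() written for it, which is the whole point of
the pseudo-machine approach.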

BASIC has been available in both an interpreter and a compiler form for
many years.  You could run the interpreted version of the program until
you got the bugs worked out, then you could compile it into machine code
so it'd run faster.

> "cpp, was CREATED using lex and yacc (or more correctly, their Gnu
> replacements flex and bison), but it doesn't use them itself.  cpp
> expects C or C++ source, with appropriate escapes to put in in-line
> assembly code."
> If so where is an official site that says so and that perhaps links to a
> fuller description of what this expanded cpp does?  If I don't have the
> above quite right I will get it sorted out eventually.  It is the source
> of your compiler knowledge I am trying to find.

You have to keep in mind that things like lex and yacc are tools used to
design and implement compilers.  They can be used for other things.  For
example, building VS-FTPD from source uses flex/bison to generate its
configuration file parser.

I think many will agree that the seminal basic textbook for this is
"Principles of Compiler Design" by Alfred Aho and Jeff Ullman.  It
covers the concepts of compiler design, parser design, state machines
and a number of other areas such as the tools used to build those bits
(lex/flex and yacc/bison).  The book is often called the "Dragon Book,"
since the covers on most editions depict a representation of St. George
slaying a dragon.

> Perhaps there is no such site, which would strike me as odd because the
> gcc compiler is so important to making the C code my system uses,
> usable.

You really don't need to know all the guts of it, just how to use it.
Think of it like the powertrain of your car.  Most people don't fully
understand exactly how the engine, transmission, and differential work,
but they can drive a car.  Compilers and the like are sorta like that.

Do what the old lady said, "If it's going to happen anyway, just lay
back and enjoy it."
- Rick Stevens, Systems Engineer                       rps2 at -
- Hosting Consulting, Inc.                                           -
-                                                                    -
-              Careful!  Ugly strikes 9 out of 10 people!            -
