[bitlug] Re: [Beowulf] Which is better GNU C or JAVA (for network programing)(fwd)

  • From: Peeyush Prasad <peeyush@xxxxxxxxxx>
  • To: bitlug@xxxxxxxxxxxxx, <bitcompsci02@xxxxxxxxxxxxxxx>,<kiggaboys@xxxxxxxxxxxxxxx>
  • Date: Wed, 11 Feb 2004 13:18:26 +0530 (IST)


-- 


---------- Forwarded message ----------
Date: Wed, 21 Jan 2004 11:46:15 -0500 (EST)
From: Robert G. Brown <rgb@xxxxxxxxxxxx>
To: Jakob Oestergaard <jakob@xxxxxxxxxxxxx>
Cc: prakash borade <hpcatcnc@xxxxxxxxx>, mail-plug@xxxxxxxxxxx,
     beowulf@xxxxxxxxxxx
Subject: Re: [Beowulf] Which is better GNU C  or  JAVA (for network
    programing)

On Wed, 21 Jan 2004, Jakob Oestergaard wrote:

>
> Well, the bait is out, let's see if someone bites   ;)
>

Having been accused of having early alzheimers and forgetting some silly
little symbol, what was it, oh yeah, a "++" in my even handed and
totally objective diatribe, I'll have to at least nibble:-)

>  It is better to light a flame thrower than curse the darkness.
>   - Terry Pratchett, "Men at Arms"

Alas, my flame thrower is in the shop.  The best I can do is make a
nifty lamp with a handy wine bottle, some gasoline, and some detergent
flakes.  There, let me stick this handy rag in the neck like this, now
<click source="lighter"> where were we again?  My memory is failing me.

Oh yes.  The pluses.

> > There.  Let us bask for a moment in the serenity of our knowledge that
> > we have the complete freedom to choose, and that there are no wrong
> > answers.
>
> I find your lack of faith disturbing...  ;)

Trying an old jedi mind trick on me, are you?

At least your metaphor is correct.  You are indeed attracted to the
Power of the Dark Side...:-)

> > Now we can give the correct answer, which is "C".
>
> Typing a little fast there, I think...  The correct answer for anything
> larger than 1KLOC is "C++" - of course, you knew that, you were just a
> little fast on the keyboard   ;)
>
> (KLOC = Kilo-Lines-Of-Code)

Well, we COULD make a list of all the major programs involving > 1KLOC
(exclusive of comments, even) that have been written entirely in C (plus
a small admixture, in some cases, of assembler).  Unfortunately, the
list would be too long (I'd be here literally all day typing), and would
include things like the kernels of pretty much all the major operating
systems (certainly all the Unix derivatives and variants) and gcc
itself, including its g++ extension.

To be fair, that doesn't prove that your observation is wrong -- there
is a rather huge list of fortran sources available in my primary field
of endeavor, for example, which doesn't stop fortran from being a cruel
joke.  However, to be fair the OTHER way most of that fortran was
written by scientists (a.k.a. "complete idiots where advanced computer
programming is concerned") to do what amounts to simple arithmetic
problems, or maybe even complex arithmetic problems, fine, with a
trivial data interface (read the data in, write the data out).  Quite a
bit of the aforementioned C sources and operating systems and advanced
tools were written not only by computer professionals, but by "brilliant"
computing professionals.  Class O stars in a sea of A, B, F and G (where
fortran programmers are at best class M, or maybe white dwarfs).

What you probably mean is that IF everybody knew how to program in C++,
they would have written all of this in C++, right?  Let's see, is there
a major operating system out there that favors C++?  Hmmmm, I believe
there is.  Are it and its applications pretty much perpetually broken,
to the extent that a lot of its programmers have bolted from C++ and use
things like Visual Basic instead?  Could be.

This isn't intended to be purely a joke observation.  I would rather
advance it as evidence that contrary to expectations it is MORE
difficult to write and maintain a complex application in C++.  The very
features of C++ that are intended to make an application "portable" and
"extensible" are a deadly trap, because portability and extensibility
are largely an illusion.  The more you use them, the more difficult it
is to go in and work under the hood, and if you DON'T go in and work
under the hood, things you've ported or extended often break.

To go all gooey and metaphorical, programming seems to be a holistic
enterprise with lots of layering and feathering of the brush strokes to
achieve a particular effect with C providing access to the entire
palette (it was somebody on this list, I believe, that referred to C as
a "thin veneer of upper-level language syntax on top of naked
assembler").  C++, in the same metaphor, is paint by numbers.  It
ENCOURAGES you to create blocks to be filled in with particular colors,
and adds a penalty to those that want to go in and feather.

In some cases, those paint-by-numbers blocks can without doubt be very
useful, I'm not arguing that.  It is a question of balance.  A very good
C++ programmer (and I'm certain that you are one:-) very likely has
developed a very good sense of this balance, as suggested by your
observation that a good C++ programmer writes what amounts to procedural
C where it is appropriate (which IMHO is a LARGE block of most code) and
reserves C++ extensions for where its structural blocking makes sense.

Who could argue with that?  Of course, a very good programmer in ANY
language is going to use procedural methodology where appropriate, and
create "objects" where they make sense.  We must not make the mistake of
comparing good programming practice with the language features.

Here are the real questions.  Presuming "good programmers, best
practice" in all cases:

 a) Do C's basic object features (typedefs, structs and unions, for
example) suffice to meet the needs for object creation and manipulation
in those code segments where they are needed?  I would say of course
they do.  IMO, protection and inheritance are a nuisance more often than
an advantage because, as I observed, real "objects" are almost never
portable between programs (exceptions exist for graphical objects --
graphics is one place where OO methodology is very appropriate -- and
MAYBE for DB applications where data objects are being manipulated by
multiple binaries in a set).  "protection" in particular I think of as
being a nuisance and more likely to lead to problems than to solutions.
In a large project it often translates into "live with the bugs behind
this layer, you fool, mwaahhahahaha".  Result: stagnation, ugly hacks,
bitterness and pain.  In single-developer projects, just who and what
are you protecting your structs FROM?  Yourself?

 b) Do C++'s basic object features increase or decrease the efficiencies
of the eventual linked binaries one produces?  As you say, C++ is more
than an "extension" of C, it has some real differences.  In particular,
it forces (or at least, "encourages") a certain programming discipline on
the programmer, one that pushes them away from the "thin veneer" aspect
of pure C.  I think it is clear that each additional layer of constraint
INCREASES the intrinsic complexity of the compiler itself, thickens the
veneer, and narrows the range of choices into particular channels.
Those channels in C++ were deliberately engineered to favor somebody's
idea of "good programming practice", which amounts to a particular
tradeoff between ultimate speed and flexibility and code that can be
extended, scaled, maintained.  So I would expect C++ to be as efficient
as C only in the limit that one programs it like C and to deviate away
as one uses the more complex and narrow features.  Efficient "enough",
sure, why not?  CPU clocks and Moore's Law give us cycles to burn in
most cases.  And in many of the cases where one uses objects, they ARE
cases where bleeding edge efficiency IS less important than the ability
to create an API and "reuse" data objects via library calls (for
example).  Still
I think C would have to win here.

 c) Do C's type-checking, etc. features suffice to make programming
particularly easy or safe?  Here I will give C++ a win, as the answer
for C at least is no.  C is not particularly easy and it most definitely
is not safe.  It is a THIN veneer on top of assembler, and assembler is
as unsafe as it gets (short of writing in naked machine code), so thin
that one can easily inline assembler to access and manipulate CPU
registers etc and have a pretty good idea of how to smoothly move data
around and switch from one programming mode to the other.  Again the art
metaphor is appropriate -- with color-by-numbers you are "safer" from
creating really, really ugly pictures (although it is always possible,
of course:-).  With nothing but the canvas, the paints, and a set of
brushes and palette knives you can create anything from Da Vinci or
Rembrandt to a three-year-old's picture of a dog (big blob of muddy
brown in the middle of the canvas that may or may not even have "eyes").
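To make (a) concrete, here is a minimal sketch of an "object" built from
nothing but a typedef, a struct and a union.  The shape type and the
function name are hypothetical, invented purely for illustration; the
only claim is that plain C can express this much without any help from
the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "object": a shape that is either a circle or a rectangle. */
typedef enum { SHAPE_CIRCLE, SHAPE_RECT } shape_kind;

typedef struct {
    shape_kind kind;                  /* tag: which union member is live */
    union {
        struct { double radius; } circle;
        struct { double width, height; } rect;
    } u;
} shape;

/* A "method" is just a function that takes a pointer to the struct. */
double shape_area(const shape *s)
{
    assert(s != NULL);
    switch (s->kind) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->u.circle.radius * s->u.circle.radius;
    case SHAPE_RECT:
        return s->u.rect.width * s->u.rect.height;
    }
    return 0.0;   /* only reached if the tag holds an invalid value */
}
```

Nothing stops a caller from setting the tag to one kind and filling in
the other member of the union.  That is the "unsafe" half of (c): the
discipline lives in the programmer, not in the compiler.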

This is why I think that the C++ vs C issue is almost entirely
determined by an individual's personal taste and preferences, along with
(maybe) the amount and kind of code that they write.  I personally
prefer to eat my vegetables (objects) raw -- if I need a struct, I make
a struct.  If I want to allocate a struct, I use malloc or write a
constructor, depending on the complexity of the struct.  If I want to
de-allocate a struct, I either use free or I write a destructor, again
depending on the complexity (whether or not I have to recursively free
the contents of the struct, and at how many levels, how many times in
the code).  If I want to change the struct (either structurally or by
accessing or altering its contents), I change the struct, and am of
course responsible for changing all the points in my program that are
affected by the change.  If I want to "protect" the struct, well, I
don't change it, or write to it, or read it, or whatever.  My choice,
unchanneled by the compiler.
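A minimal sketch of the "write a constructor" and "write a destructor"
case described above, for a struct complex enough that plain malloc and
free are not sufficient.  The record type, its fields and the function
names are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record that owns two heap allocations. */
typedef struct {
    char   *name;
    size_t  nvalues;
    double *values;
} record;

/* "Constructor": allocate the struct and everything it owns.
 * Returns NULL on allocation failure without leaking anything. */
record *record_new(const char *name, size_t nvalues)
{
    assert(name != NULL);
    record *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->name = malloc(strlen(name) + 1);
    /* Ask for at least one element so a zero count is never passed to calloc. */
    r->values = calloc(nvalues ? nvalues : 1, sizeof *r->values);
    if (r->name == NULL || r->values == NULL) {
        free(r->name);      /* free(NULL) is a harmless no-op */
        free(r->values);
        free(r);
        return NULL;
    }
    strcpy(r->name, name);
    r->nvalues = nvalues;
    return r;
}

/* "Destructor": free the contents first, then the struct itself. */
void record_free(record *r)
{
    if (r == NULL)
        return;
    free(r->name);
    free(r->values);
    free(r);
}
```

A caller pairs every record_new() with exactly one record_free().  If a
field is added to the struct later, these two functions are the points
in the program that have to change with it.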

Do I have to deal with sometimes screwing up data typing?  Absolutely.
Do I have to occasionally deal with working my way backwards through my
programs to redo all my structs?  Of course (I'm doing it right now with
a program I'm writing).  This process is INEVITABLE in real-world
program design because we DON'T know a priori what the best shape for
the data objects is until we've programmed them at least once the wrong
way, in most cases.  The really interesting question is whether or not
this PROCESS is any easier in C++ than in C.  I can't see why it would
be, but perhaps it is.  I suspect it is still mostly a matter of skill
and style (and how one apportions time between active code development
and "planning" a program in the first place) and, of course, personal
taste.

> > In order to give it, I have to infer an implicit "better than" and a
> > certain degree of generality in your question, as in "which language is
> > BETTER suited to write an efficient networking program using linux
> > system calls, in GENERAL".
> >
> > With this qualifier, C wins hands down.  A variety of reasons:
> >
> >   a) The operating system is written in C (plus a bit of assembler)
>
> This ought not to be a good argument - I guess what makes this a good
> argument is, that the operating system provides very good C APIs.
>
> So, any language that provides easy direct access to those C APIs has a
> chance of being "the one true language".

You miss my point entirely.  The argument is that C is a thin veneer on
top of assembler -- so thin that one CAN write an operating system in
it.  Imagine writing an operating system in LISP.  Wait, don't do that.
The results are too horrible to imagine.  Imagine doing it on top of
fortran, instead.  That's bad, but you won't have nightmares for more
than a week or two afterwards. (IIRC, somebody actually did this once.)

The API issue is moot.  Hell, perl and python have direct access to most
of the C APIs.

IN THE CONTEXT of the reply, of course, there was also the suggestion
that the C APIs are a good way of writing network code since the network
drivers and kernel structs those APIs provide access to were all written
in C, so your access is pretty much "naked".  You can often read or
write directly any register or value or memory location that isn't in
the protected part of the kernel, if you dare (or need to to achieve
enough efficiency in your particular application).  But C++ I'm sure
provides the same degree of naked access and C and C++ share a common
underlying data organization, really, and even Fortran (with somewhat
different data organization) probably does pretty well.
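As a small illustration of how thin the C socket API is, here is a
sketch that pushes a message through a connected pair of sockets using
nothing but system calls.  The function name echo_once is hypothetical,
and socketpair() stands in for a real network connection only so that
the example needs no peer; it does not touch any driver or kernel
struct directly.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg into one end of a connected socket pair and read it back
 * from the other end.  Returns the number of bytes received, or -1 on
 * error.  buf is always NUL-terminated on success. */
ssize_t echo_once(const char *msg, char *buf, size_t buflen)
{
    assert(msg != NULL && buf != NULL && buflen > 0);

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    ssize_t n = -1;
    size_t len = strlen(msg);
    if (send(sv[0], msg, len, 0) == (ssize_t)len)
        n = recv(sv[1], buf, buflen - 1, 0);
    if (n >= 0)
        buf[n] = '\0';

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

The descriptors are plain ints and the buffers are plain memory; there
is no translation layer between the program and the system call.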

In other languages (especially scripting languages e.g. java, perl,
python), the access is typically "wrapped" in a translation layer that
is required because one has NO CONTROL over the way the interpreter
actually creates data objects.  They are the "ultimate" in OO
programming -- instantly created by the interpreter in real time,
manipulable according to a wide set of rules, they go away when they
aren't being used without leaking memory (usually:-) but heaven help you
if you look beneath the hood and try to manipulate the raw memory
addresses.

This is a lovely tradeoff in a lot of cases, which is why I hesitate to
actually stick a rod up java's ass and barbeque it on a spit.  I really
do believe the BASIC argument associated with OO programming, that the
data and what you want to do with it should dictate program design
features, including choice of language and programming technique.  For
many problems perl or python provide a very good data interface where
one DOESN'T have to mess with lots of data issues that are a major chore
in C.  Wanna make a hash?  Use it.  Wanna make your hash into an array?
Change the little % to a @ and hack your dereferencing a bit.  Is your
hash actually stored as a linked list, as an array of structs?  Don't
ask, don't tell, don't try to look.  All this and direct syntactical
manipulation of regular expressions, what's not to love?

This is what I was trying to keep an open mind to WRT java.  Perhaps
there are aspects of its data manipulation methodologies and programming
features that are excellent fits to particular problems.  I just don't
know, and would rather have my teeth drilled with a half-charged
portable black and decker screwdriver (allen bit) than learn YAPL just
to find out.  Hell, I'm only starting to learn python under extreme
duress as I swore perl was going to be the last language I ever learned
and that was before PHP and now python.  Somebody would have to pay me a
LOT OF MONEY to get me to learn java.  Yessir, a whole lot.

[Anybody reading this who happens to have a lot of money is welcome to
contact me to arrange for a transfer...;-)]

> I prefer to think of "C++" as "A better C", rather than a "C extension",
> as not all C is valid C++, and therefore C++ is not really an extension.

Funny that.  I tend to think of C++ as "A broken C extension" for
exactly the same reason;-)

If they hadn't broken it, then there really would be no reason not to
use it.  If C were a strict subset of C++ all the way down to the
compiler level, so that pure C code was compiled as efficiently as pure
C is anyway, then sure, I'd replace all the gcc's in my makefiles with
g++'s.  That would certainly lower the barrier to using C++; C
programmers could program in C to their heart's content and IF AND WHEN
their program needs a "C++ object" they could just add it without
breaking or even tweaking the rest of their code.

Breaking C was (IMO) a silly design decision.  So was adding crap like a
completely different stdin/stdout interface, which makes most C++ code,
even that which doesn't use objects in any way, mostly
non-back-portable-or-compatible to C.  It is very clear that these were
all deliberate design decisions INTENDED to break C and FORCE
programmers to make what amounts to a religious choice instead of
smoothly extending the palette of their programming possibilities.  I'd
honestly have to say that it is this particular aspect of C++ more than
any other that irritates me the most.  There is a HUGE CODE BASE in C.
For a reason.
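For anyone who has not run into the breakage, here is a short sketch of
C that a C compiler accepts and a C++ compiler either rejects or
silently reinterprets.  The function names are hypothetical; the three
incompatibilities shown are well-known ones, not an exhaustive list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Valid C, rejected by a C++ compiler for two independent reasons:
 *   1. "new" is an ordinary identifier in C but a keyword in C++;
 *   2. C converts the void * returned by malloc() implicitly, while
 *      C++ demands an explicit cast. */
int *copy_ints(const int *src, size_t n)
{
    assert(src != NULL && n > 0);
    int *new = malloc(n * sizeof *new);
    if (new != NULL)
        memcpy(new, src, n * sizeof *new);
    return new;
}

/* Compiles in both languages but means different things: a character
 * constant has type int in C and type char in C++.  Returns 1 in C; in
 * C++ it returns 0 wherever int is wider than char. */
int char_constant_is_int(void)
{
    return sizeof 'a' == sizeof(int);
}
```

Making copy_ints() acceptable to g++ means renaming the variable and
adding a cast, which is exactly the "going back and fixing everything"
chore for a large existing C code base.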

> >   c) Nearly all decent books on network programming (e.g. Stevens)
> > provide excellent C templates for doing lots of network-based things
> >   d) You can do "anything" with C plus (in a few, very rare cases, a bit
> > of inlined assembler)
>
> Amen!  This goes for C++ as well though.

Y'know, you won't believe this, but I actually added the (and C++)
references in my original reply just thinking of you...;-)

Last time we had this discussion you were profound and passionate and
highly articulate in advancing C++ -- so much so that you almost
convinced me.  Alas, that silly barrier (which I recall your saying took
you YEARS to get over yourself)... I just bounce right off of it in a
state of total irritation every time I try.  Another problem is that
every time I work with a student C++ programmer coming out of Duke's CPS
department (which now teaches C++ as standard fare) I observe that while
they are just great at creating objects and so forth they are clueless
about things like pointers and malloc and the actual structure of a
multidimensional array of structs and how to make one and manipulate its
contents.  As a consequence I have to spend weeks teaching them about
all of the virtues of "C as a thin veneer" in order to get them to where
they can cut advanced code at all, in either one.

You may not have encountered this as you did C first.  Or maybe you
never have cause to allocate triangular arrays of structs or to build
your own linked list of structs, or maybe C++ does this and eats your
meatloaf for you and I'm just too ignorant to know how.
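For reference, this is the sort of thing meant by a triangular array of
structs: a minimal sketch in which row i holds exactly i + 1 elements.
The cell type and the function names are hypothetical, and the
two-block layout is just one of several reasonable ways to do it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type. */
typedef struct {
    int row, col;
    double value;
} cell;

/* Allocate a lower-triangular array: t[i][j] is valid for
 * 0 <= j <= i < n.  One block holds the n row pointers, a second
 * holds all n(n+1)/2 cells contiguously. */
cell **tri_new(size_t n)
{
    assert(n > 0);
    cell **rows = malloc(n * sizeof *rows);
    cell *cells = calloc(n * (n + 1) / 2, sizeof *cells);
    if (rows == NULL || cells == NULL) {
        free(rows);
        free(cells);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        rows[i] = cells;              /* row i starts here */
        for (size_t j = 0; j <= i; j++) {
            rows[i][j].row = (int)i;
            rows[i][j].col = (int)j;
        }
        cells += i + 1;               /* step past the i + 1 cells of row i */
    }
    return rows;
}

void tri_free(cell **rows)
{
    if (rows == NULL)
        return;
    free(rows[0]);   /* rows[0] is the start of the contiguous cell block */
    free(rows);      /* then the row-pointer block */
}
```

Because the cells are one contiguous block, t[i][j] costs two pointer
dereferences and the whole structure is released with two calls to
free().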

It's funny.  Back in the 80's all the CPS departments taught pascal
because it provided strong type checking and absolutely forced one to
use a rigorous bottom-up program design methodology.  Precisely the same
features that caused all real programmers to run from it screaming, as
somehow all the programmers I've ever met work top down initially, then
top, bottom, middle, wherever until the project is finished, often with
a few epiphany-driven total project reorganizations in the middle as
experience teaches one what the REAL data objects are that they are
manipulating and whether they need easily maintained code that can run
relatively slowly or whether they need the core loop to run faster than
humanly possible if they have to hand-code it in assembler to get it
there.

Now they teach C++, and as you observed, teach it badly.  I humbly
submit that the REASON they teach it badly is that they aren't teaching
"C++" at all, they are teaching structured programming and programming
discipline with a language that simply doesn't permit (or if you prefer
encourage) the student to use methodologies with far more power but far
less externally enforced structure.  Pure pedagogy, in other words.

I personally think the world would be a better place if they FIRST
taught the students to code in naked C with no nets, and taught them
that the reason for learning and using good programming discipline is
BECAUSE the bare machine that they are working with comes with no nets.

Then by all means, teach them C++ and object oriented design principles.
I suspect that students who learn C++ in this order are, as you appear
to be, really good programmers who can get the most out of C++ and its
object oriented features without "forgetting" how to manipulate raw
blocks of memory without any sort of OO interface per se when the code
structure and extensibility requirements don't warrant all the extra
overhead associated with setting one up.

> C, in my opinion, would be somewhat like C++, except for larger
> problems it doesn't fare quite as well (not poorly by any means, just not
> as well).

For CERTAIN larger problems, you could be right.  I do consider things
like writing operating systems and compilers to be "larger problems"
though, and C seems to do very well here;-) Your arguments from last
time were very compelling, as I said.

> The only very very large problem with C++ is, that almost no people know
> the language.  There is a mountain of absolutely crap learning material
> out there.  This is why you see examples of "C versus C++" where the C++
> code is several times larger or even less efficient than the C example,
> because the author felt that shovelling bad OO and braindead design
> after the problem seemed like a good idea.

Again, I think that is because most C++ programmers learn C++ >>first<<
as part of a course on structured programming, which teaches them to use
C++ the way it was "intended" to be used.  If one learns C first, on the
other hand, you already KNOW how to write tight code and can freely
switch over to using C++ constructs where they make sense.  In fact,
since you actually understand what a struct IS and how data allocation
WORKS (a thing that, believe it or not, is largely hidden from the
student in most C++ classrooms as this is precisely the under-the-hood
stuff considered anathema in a world where programmers are expected to
be commodity items and write commodity -- interchangeable -- code) you
can probably even do very clever end runs around most of the silliness
that they teach as good practice whenever and wherever it suits you.

> I believe that with the state of compilers today, nobody should have to
> start a large project in C - except if it absolutely needs to run on a
> platform for which no decent C++ compiler is available (maybe Novell
> NetWare - but that's the only such platform that comes to mind...)

Give me a few days paring C++ down to the bare minimum extension of C so
that it is a pure superset (so ALL C CODE just plain compiles perfectly
and even produces the same runfile, so that the I/O language
"extensions" are moved off into a library that you can link or not,
mostly not, and so that things like classes and protection and
inheritance are suddenly pure and transparent extensions of the basic
ideas of structs and unions and typedefs), and I wouldn't even argue.  In
fact, I'd say that the argument itself becomes silly and moot.  One
wouldn't even have to change Makefiles, as C++ would just be a feature
of gcc that is automatically processed when and where it is encountered.
So the kernel would suddenly be written in "C++", sort of, and there
would be a very low barrier to converting C into C++ because basically,
there would be no conversion at all, only extension.

This of course will never happen.  Until it does, an excellent
reason to choose C instead of C++ is if you happen to know C very well
but not know (or care for) C++ as well.  Or if your project is derived
from a code base already written in C, and you don't feel like going
back and fixing everything to make a C++ compiler happy with it
(especially when doing so might well "break" it so that a C compiler is
no longer happy with it).

Note that this isn't a "theoretical" argument about should -- it is a
real-world argument on why the choice you describe is NOT made, over and
over again, by most people writing code (judging strictly on the basis
of the number of GPL projects written in C vs C++).  Even in a world
where most schools have been TEACHING C++ for close to a decade, the
majority of people who become computer scientists and systems
programmers and go to work on the guts of computer systems underneath the
hood seem at some point to cross over to C and stay there, while the
majority of people who stick with C++ end up programming for Windows for
some big corporation (which might well be Microsoft) -- and produce
shitty, broken, huge, inefficient code as often as not.

My next door neighbor is an interesting example that comes to mind.  He
is a professional programmer.  A decade ago he taught CPS at Duke, but
got irritated because he had to teach the students to program in C++,
and in such a way that they never learned how data structures really
work.  Believe it or not, I've had to actually teach SEVERAL students
who have FINISHED the intro computer courses here just how memory on a
computer works -- it is deliberately taught in such a way that one
DOESN'T learn that, one learns instead to "think" only about the
compiler-provided memory schema.  He once spent a whole afternoon in my
yard ranting at me about how a struct or union was all the object
support any programmer could ever need.  So he quit and took over
Scholastic Books software division, made a modest fortune, and STILL
lives next door writing software and clipping coupons.  Other faculty I
know seem to view C++ the same way -- a good thing to use to teach
students because it forces them to code a certain way, but THEY don't
use it for THEIR research projects.

Professionals seem more often than not to prefer having the entire
palette and all the brushes, and don't like even MAKING color by numbers
codelets.

This is very definitely not exhaustive or definitive.  I also know
students where the exact opposite is true -- they learned C++, learned
about data structs and malloc and C, and continue to code in C++ (as you
do) with strictly enriched abilities.  Of course they are (like you;-)
Very Bright Guys...and could probably even write good Fortran code if
they ever learned it.  No, wait, that is an oxymoron -- couldn't happen.

> Seriously though, I think that the language-flamewars are fun and
> informative, since so much happens in the space of compilers and
> real-world projects out there.  So, I think it's useful to get an update
> every now and then, from people who have strong feelings about their
> languages - oh, and a discussion in the form of a friendly flame-fest is
> always good fun too   ;)
>
>  / jakob

I agree, and hope you realize that I'm deliberately overarguing the case
for C above for precisely that reason.  I really do believe that it is
as much a matter of taste and your particular experience and application
space as anything else and wouldn't be surprised if some java coder DOES
speak up for java as that's what THEY are really good at and they've
discovered some feature that makes it all worthwhile.

We've already heard from the python crowd, after all, and THAT'S a
language where they even eschew such obviously useful constructs as {}'s
and line terminators such as ; in favor of indentation as a matter of
deliberate design/dogma.  It really does come down in the end to
religious belief/personal taste.

In the best spirit of tongue-in-cheek, I cannot do better than end with:

  http://www.phy.duke.edu/~rgb/Beowulf/c++_interview/c++_interview.html

Read it and weep...:-)

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx



_______________________________________________
Beowulf mailing list, Beowulf@xxxxxxxxxxx
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf

