[joho] JOHO - Oct. 15, 2004

  • From: "David Weinberger" <self@xxxxxxxxxxx>
  • To: <joho@xxxxxxxxxxxxx>
  • Date: Sat, 16 Oct 2004 17:55:26 -0400

Journal of the
Hyperlinked Organization 
October 15, 2004
Editor: David Weinberger (self@xxxxxxxxxxx)

To unsubscribe: send a message to
joho-request@xxxxxxxxxxxxx with "unsubscribe"
in the subject line.
For the fully glorious illustrated and
hyperlink-saturated online version of JOHO, please
To view this issue correctly, please use a
monospaced font such as Courier and stretch your
window until it all makes sense.

  | CONTENTS                                    |
  |                                             |
  | SERVERS: Will facts become as interesting   |
  | as styrofoam peanuts?                       |
  |                                             |
  | THE END OF DATA: In the new world of        |
  | classification and categorization, data     |
  | and metadata are indistinguishable.         |
  |                                             |
  | WALKING THE WALK: O'Reilly's foo camp is    |
  | brilliant marketing in which the product    |
  | is never mentioned                          |
  |                                             |
  | COOL TOOL: Open source Audacity sounds good |
  |                                             |
  | WHAT I'M PLAYING: Far Cry                   |
  |                                             |
  | EMAIL: How much of an anti-Semitic          |
  | misogynist was Melvil Dewey?                |
  |                                             |
  | BOGUS CONTEST: Name the metadata bundles    |
  | discussed in "The end of data" article      |
  |                                             |
  | I've started a series of discussions at     |
  | the Harvard Berkman Center, just about      |
  | every other Wednesday evening at 6pm. The   |
  | first one was on facts and the locus of     |
  | authority on the Web. The next one will be  |
  | on Nov. 3, so we'll undoubtedly talk about  |
  | whether technology can do anything to save  |
  | us. It's free, open to the public,          |
  | we serve pizza. See you there?              |
|           ELECTION CHAT                             |
|                                                     |
|I set up an IRC chat during the last                 |
|presidential debate, and about 50 people             |
|jumped on board, out-snarking one another            |
|about the candidates. (Note: This was not a          |
|fair and balanced crowd.) Kevin Marks [1], one       |
|of the participants, then surprised us by            |
|posting a QuickTime movie [2] that plays the         |
|audio of the debate and shows the chat,              |
|synchronized with the audio. Lots of bad             |
|language and comments we regret. Ulp.                |
|                                                     |
|I plan on setting up another chat on election        |
|night. You're invited. Check my blog [3] for         |
|details.                                             |
|                                                     |
|[1] http://epeus.blogspot.com/                       |
|[2] http://homepage.mac.com/kevinmarks/johodebate.mov|
|[3] http://www.hyperorg.com/blogger                  |

The Wikipedia had to freeze the George W. Bush entry
[1] a few weeks ago because people were altering it
to suit their political viewpoints at an alarming
rate. So, the editors pared the page down to the non-controversial "core" of
facts. There was still a lot of information there -- much more than merely
"He was born, he drank, he became president" -- and occasional
acknowledgements of controversies, such as whether Bush satisfactorily
completed his National Guard service.

But, most interesting to me, towards the top, on the
right, the Wikipedia ran one of the staples of its
biographical entries: A fact box. [See Web version
for a screen capture.]

I find this two-tiered view of facts, quite common
in reference works, fascinating. And in the context
of a bottom-up work such as the Wikipedia, in the
midst of a dust-up over what constitutes a
factual account of the life of W, you have to ask:
What's happening to facts?

                        * * *

I don't like facts and I never have.
Psychologically, metaphysically and sociologically,
I'm uncomfortable in their stern, disapproving,
Cheney-like presence.

Psychologically, I freeze when I have to recite one.
They are, for me, simply opportunities to be wrong
in public. My hesitation is noticeable, leading
people to think I must be struggling to make up the
fact, which actually is frequently the case. That's
why JOHO has been 100% fact free since its
inception. That's my pledge to you.

I also have a metaphysical problem with facts. Of
course I understand that there's a real world that
existed before I was born and into which I will be
buried (or smudged, depending on the cause of my
demise). But facts aren't the same thing as reality.
They are one way reality -- the way the world is
apart from our awareness of it -- shows itself to
us. Without us, the universe would carry on fine,
but facts wouldn't emerge from the darkness. Because
experience is cultural, facts are cultural
artifacts: They're expressed in language, they have
a grammar, they are deeply contextual. Facts don't
like us saying that, but it's true: "The Titanic
sank in 1912" is only a fact because of a context
that implicitly includes an understanding of how
names stand for things, a decision to mark time by
trips around the sun, a convention that numbers
years from the birth of a guy I don't care much
about, and a historical-cultural context that says
that the sinking of a large ship is worth making an
explicit proposition about.

Now, you probably snort at that line of thought
because you think I'm running from the pure, brutal
"Look, it happened!" that facts express. But I'm
not. It was sad when the great ship went down (down
to the bottom of the...), and it happened on a date
we agree on. But facts are not context-free meteors
that slam into our planet unbidden. They are instead
a way of conjuring up the world in one of its
infinite facets. They are a way of speaking, a form
of rhetoric, and thus should not be treated as if
they are the end-all of thought and discussion. But, sociologically, that's
often how they're used: They are the knuckle sandwich of rhetoric. Facts
are, of course, peculiarly important, but they are not the only peculiar and
important things we say to one another. And they are not quite as
reality-based, muscular and manly as they pretend. Inside every fact is a
value struggling to get out.

So, when the Web started heating up the Internet, I
was among those who thought that we were going to
see a merging of voice and facts, and, more
particularly, voice and objectivity. (Objectivity is
the mood in which we get all factual.) To a greater
extent than I'd hoped, that's happening: Just read
your 50 favorite blogs. Many Big-Time Journalists go
to absurd lengths to hide their political sympathies
-- one editor boasts he doesn't even vote -- but it's
reversed on blogs: If we don't know who you're
voting for, how can we trust what you write?

And yet...There are classes of facts I don't want
wrapped in voice. If I post a question about the
battery life of a laptop, I'll trust the people who
write in response more than I trust the computer
company's site, but I trust the company site more
for the dimensions of the machine. The company is
liable for its answer in a way that a random blogger
isn't; if I have to buy a new carrying case because
the number was wrong, the blogger can say, "Sorry,
dude, I misread the measuring tape," whereas I'll
expect the company to compensate me one way or
another.

Similarly, I count on mainstream newspapers to
provide fact-based stories that "cover" an event: I
don't expect in the foreseeable future to be
counting on webloggers to tell me how many troops
attacked Samara, how this was coordinated with other
simultaneous battles, or how many civilians were
killed. Of course I expect bloggers to fact check
the media's ass but good, which implies that I don't
have full confidence in the media's ability to
deliver the facts. (PS: there's no such thing as
"the" facts because which facts are relevant is not
itself a matter of fact.) But covering events seems
to require the type of centralization that only a
news bureau can provide. (Hint: Any sentence of mine
that is of the form "only a _____ can provide" is
likely to turn false particularly quickly.) Further,
news organizations stand behind their stories in a
way that someone talking over the virtual back fence
doesn't have to. (Of course, sometimes the news
media stand behind their stories Rather longer than
they should.)

The role of facts in discourse may look immutable,
but it is exactly the sort of thing that can change;
I've been reading Foucault recently and it's
startling how such deep structures can transform
rapidly. (It's also startling how unbelievably
brilliant Foucault was.) I don't know what will
happen, but my hunch is that we are heading towards
commoditizing facts, driving down their value so
that they don't provide differentiating value. For
example, take the table of Bush facts at the
Wikipedia. With the right API, the Wikipedia could
become a Fact Server that delivers the undisputed
facts about any of its 1,000,000+ topics to any
application that asks politely, making facts cheaper
than popcorn.

Now, it would be irresponsible for a fact server to
serve up dubious or putative facts, but if it only
serves the commoditized facts, it won't have all
that much value. So, perhaps fact servers will
deliver facts along with metadata about how reliable
the facts are: It's 0.99 certain that Bush was born
in 1946 but it's 0.4 that he completed his National
Guard duty. Will this sharpen the line between the
two tiers of facts -- the reliability of lower-class
facts will always be subject to argument while
0.99s are beyond serious dispute -- or will it tar
all facts with the welcome brush of human

There are bunches of other questions, many of which
take on an Hegelian cast. For example, the Wikipedia
fact box gives Bush's date of birth but not his
race. That's because our culture does not count race
as relevant (haha!), and, no, you can't always tell
from the photo. The Wikipedia fact box also does not
state who W's parents are, yet in some cultures
knowing your parentage is as important as
knowing the year you were born. But, if Wikipedia
acts as a fact server, it won't have to decide which
0.99 facts to include in the fact box. It will
simply serve up all facts the requesting app wants.
Thus, Bush's date of birth, race and parentage will
show up as equal; if your culture values parentage,
your app will make a big deal of that. If some other
culture considers listing the date of birth to be a
type of ageism, its apps will ignore that datum.
Undoubtedly, some app will find intense value in the
0.99 fact that Bush is white. So, the
commoditization of facts may result in the formation
of cultural fact boxes that divide us on the basis
of a consensus core of 0.99s that we all agree on:
Cultures united in a core of commoditized facts from
which they select the fact boxes that divide us.
Weird. Or is it the way the world has always
implicitly worked?
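
To make the fact-server idea a little more concrete,
here is a minimal Python sketch. Everything in it is
hypothetical: the Fact class, the serve_facts
function and the attribute names are invented for
illustration, and none of it is a real Wikipedia
API. The 0.99 and 0.4 figures for birth year, race
and Guard service come from the discussion above;
the 0.99 attached to parentage is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    confidence: float  # 0.0 to 1.0: how settled the fact is taken to be


# Hypothetical fact store. The confidence for "parents" is an assumption.
FACTS = [
    Fact("George W. Bush", "year_of_birth", "1946", 0.99),
    Fact("George W. Bush", "completed_guard_service", "yes", 0.40),
    Fact("George W. Bush", "race", "white", 0.99),
    Fact("George W. Bush", "parents",
         "George H. W. Bush and Barbara Bush", 0.99),
]


def serve_facts(subject, wanted=None, min_confidence=0.0):
    """Return facts about a subject.

    `wanted` is the set of attributes the requesting app cares about
    (None means all of them); `min_confidence` filters out facts that
    are less settled than the app is willing to accept.
    """
    return [
        fact for fact in FACTS
        if fact.subject == subject
        and fact.confidence >= min_confidence
        and (wanted is None or fact.attribute in wanted)
    ]


# An app from a culture that values parentage asks only for that.
parentage_app = serve_facts("George W. Bush", wanted={"parents"})

# Another app takes everything, but only the undisputed core.
settled_only = serve_facts("George W. Bush", min_confidence=0.99)
```

Both apps draw on the same commoditized core; the
only difference is which facts each one asks for.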

The delivery of facts with probabilities as part of
them could lead to unpredictable consequences.
Building doubt into facts could transform their
rhetorical and social role. Will we recognize facts
as being as perpetually subject to argument as are
opinions? Will their source of authority become an
integral part of them, as opposed to being an
outside reference? Will the recognition that they're
socially conditioned degrade them so that all facts
are equal, no matter how contradictory or stupid --
appending a huge "Whatever!" to all factual
discussions? Are we heading towards a more
sophisticated, nuanced way of thinking that will put
facts in their place, or towards a new age of
stupidity and obstinacy? And in the new world of
facts, what will be the sound of voices conversing
and voices testifying?

I believe we are currently inventing a new and
important life for facts. We just don't yet know
what it will be.

[1] http://en.wikipedia.org/wiki/George_W_Bush


To forestall rants about how I don't believe in
facts and think that, for example, the date the
Titanic went down is subject to debate, let me state
for the record: The Titanic sank on April 15, 1912.
We should reject any explanation of facts that lets
someone claim that the date of its sinking is up for
grabs, relative or unknowable. Facts are crucial in
disciplines I care a lot about, including science
and journalism. Nevertheless, facts are a form of
understanding and a form of rhetoric, and thus they
are always infected with slimy humanity.


Here's an idea for the book I am perpetually working
on working on. (No, that's not a typo. I've been
working for over a year on a proposal that would
enable me to work on the book.)

There used to be a difference between data and
metadata. Data was the suitcase and metadata was the
name tag on it. Data was the folder and metadata was
its label. Data was the contents of the book and
metadata was the Dewey Decimal number on its spine.
But, in the Third Age of Order (see the previous
issue [1]), everything is becoming metadata.

For example, imagine you're at a large corporation
doing a Third Order treatment of its digital library
of research articles. Instead of (or, in addition
to) designing a large, complex, hierarchical
taxonomy, you focus on adding enough metadata to
each article so that people will be able to sort and
classify them any which way they want. If someone
wants to find all the articles that talk about
hydrocarbons written in Italian in 1965 and that
have more than 30 footnotes, they'll be able to. If
someone wants to make a browsable hierarchy based
not on topic but on gender or on the number of co-
authors, they'll be able to. You build enriched
objects first so users can for ever after taxonomize
the way they want to, instead of the way you think
they'll want to.
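
A rough sketch of what that might mean in practice,
in Python. The records, the field names and the
helper functions select and group_by are all made up
for illustration; a real library system would
presumably sit on a database rather than a list of
dictionaries.

```python
from collections import defaultdict

# Hypothetical article records; every title and value is invented.
ARTICLES = [
    {"title": "Idrocarburi I", "topic": "hydrocarbons",
     "language": "Italian", "year": 1965, "footnotes": 42,
     "authors": ["Rossi", "Bianchi"]},
    {"title": "Idrocarburi II", "topic": "hydrocarbons",
     "language": "Italian", "year": 1965, "footnotes": 12,
     "authors": ["Rossi"]},
    {"title": "Polymers Today", "topic": "polymers",
     "language": "English", "year": 1971, "footnotes": 35,
     "authors": ["Smith", "Jones", "Lee"]},
]


def select(articles, predicate):
    """Keep the articles for which the user's own test is true."""
    return [a for a in articles if predicate(a)]


def group_by(articles, key):
    """Build a one-level browsable hierarchy of titles on any key."""
    groups = defaultdict(list)
    for article in articles:
        groups[key(article)].append(article["title"])
    return dict(groups)


# Hydrocarbons, in Italian, from 1965, with more than 30 footnotes.
hits = select(
    ARTICLES,
    lambda a: a["topic"] == "hydrocarbons"
    and a["language"] == "Italian"
    and a["year"] == 1965
    and a["footnotes"] > 30,
)

# A hierarchy based not on topic but on how many authors there are.
by_author_count = group_by(ARTICLES, lambda a: len(a["authors"]))
```

Neither the query nor the grouping was anticipated
when the records were written; the user supplies
both after the fact.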

Now take a closer look at these information objects.
They look like contents tagged with lots of
metadata, but in fact they're all metadata. If I'm
looking for an article about hydrocarbons written by
Barbara Rodriguez, then the article's topic
("hydrocarbons") and author's name ("Rodriguez,
Barbara") are metadata, and the content is the data.
But, I could just as well be trying to remember the
name of the author who wrote an article that
included the phrase "Hydrocarbons are the burros of
the cosmos" sometime in the 1960s, in which case
the content and date are metadata and the author's
name is the data. What's data and what's metadata
depends on the person doing the asking.
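
The same swap can be shown in a few lines of Python.
This is only a sketch: the record layout and the
functions ask and matches are invented for
illustration. The point is that one function serves
both questions, and only the caller decides which
fields are the handle and which is the payload.

```python
# Hypothetical record; the field names are invented for illustration.
ARTICLE = {
    "author": "Rodriguez, Barbara",
    "topic": "hydrocarbons",
    "decade": "1960s",
    "text": "... Hydrocarbons are the burros of the cosmos ...",
}


def matches(actual, expected):
    # Substring match, so a half-remembered phrase can find the full text.
    return expected in actual


def ask(records, known, wanted):
    """Use the `known` fields as metadata; return `wanted` as the data."""
    return [
        record[wanted]
        for record in records
        if all(matches(record[field], value)
               for field, value in known.items())
    ]


# Topic and author are the metadata; the content is the data.
content = ask(
    [ARTICLE],
    known={"topic": "hydrocarbons", "author": "Rodriguez, Barbara"},
    wanted="text",
)

# Content and date are the metadata; the author's name is the data.
author = ask(
    [ARTICLE],
    known={"text": "burros of the cosmos", "decade": "1960s"},
    wanted="author",
)
```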

So, in the Third Age of Order, all data is metadata.
Contents are labels. Data is all surface and no
insides. It's all handles and no suitcase. It's a
folder whose content is just another label. It's all
sticker and no bumper.

Why does this matter? It changes the primary job of
information architects. It makes stores of
information more useful to users. It enables
research that otherwise would be difficult, thus
making our culture smarter overall. But, most
interestingly (at least to me), this does the ol'
Einsteinian reverse flip to Aristotle. Aristotle
assumed that of the 10 categories by which one could
understand a thing, one must be primary: Where that
thing fits into the tree of knowledge. So, you could
say that Alcibiades is made of flesh or lived in
Greece, but if you really want to understand him,
you have to say that he is an animal of a particular
kind. But, now that everything is metadata, no
particular way of understanding something is any
more inherently valuable than any other; it all
depends on what you're trying to do. The old
framework for knowledge -- and authority -- is
getting a pretty good shake.

Right? Wrong? Old? Obvious? Pointless? Stop me
before I make a fool of myself to someone not as
nice as you...

[1] http://www.hyperorg.com/backissues/joho-sep03-04.html

My friend Robert Morris who teaches computer science
at U. Mass Boston, and who has always been
unnecessarily generous to me with what he knows,
says the above is pretty much old news:

        The short answer is that in the business, nobody
        anymore contends there is a difference between
        data and metadata other than in a context such
        as you mention, namely the metadata is usually
        that part which helps you locate and use the
        other part and which you can often ignore if you
        already know those things.

Bob points to Life Science IDs (LSIDs) [1] as an
example of a standard that does sort of distinguish
data from metadata.

        An LSID is an immutable, permanent, globally
        unique key to a piece of information. The LSID
        spec requires that getData always return the same
        bytes for the entire future of the universe,
        whereas getMetadata may return things about the
        information that could change.

LSIDs are being supported by the Interoperable
Informatics Infrastructure Consortium (I3C) [2]. An
LSID server sits in front of your database or
application so you can continue to use your existing infrastructure.
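
The contract Bob describes can be sketched in a few
lines of Python. This is a toy in-memory stand-in,
not the LSID specification or any real LSID library:
the ToyLsidStore class, its method names and the
example identifier are assumptions made for
illustration. The only part taken from the
description above is the split between data that
never changes and metadata that may.

```python
class ToyLsidStore:
    """Toy stand-in for an LSID-style server (illustrative only)."""

    def __init__(self):
        self._data = {}
        self._metadata = {}

    def register(self, lsid, data, metadata):
        # Once an identifier has data, that data can never be replaced.
        if lsid in self._data:
            raise ValueError(f"{lsid} is already registered")
        self._data[lsid] = bytes(data)
        self._metadata[lsid] = dict(metadata)

    def get_data(self, lsid):
        # Always the same bytes for a given identifier.
        return self._data[lsid]

    def get_metadata(self, lsid):
        # Returns a copy of whatever is currently known about the data.
        return dict(self._metadata[lsid])

    def update_metadata(self, lsid, **changes):
        # Metadata, unlike data, is allowed to change over time.
        self._metadata[lsid].update(changes)


store = ToyLsidStore()
lsid = "urn:lsid:example.org:specimens:1234"  # made-up identifier
store.register(lsid, b"ACGT", {"curator": "unknown"})

before = store.get_data(lsid)
store.update_metadata(lsid, curator="someone new")
after = store.get_data(lsid)  # unchanged by the metadata update
```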

Sounds like the architecture for a life sciences
fact server...

[1] http://www.bio-itworld.com/archive/011204/lsid.html
[2] http://www.i3c.org/

| Middle World Resources                      |


It's no surprise that O'Reilly Publications is a
cool company. It's geeky and Tim O'Reilly is, IMO, a
hero of the Web. Even so, I'm often impressed with
just how right they get just about everything they
touch. I don't want to rave about Foo Camp, the
free-form weekend camp-out for nerds and geeks, but
one of the many reasons it succeeds is that even
though the company pays for it and lets us occupy
its building and grounds, Tim keeps the weekend
free of overt O'Reilly commercial messages. In
true end-to-end fashion, O'Reilly gets out of the
center and allows the ends -- the attendees -- to

As a result, we love O'Reilly all the more.


Bit by bit, I'm replacing my desktop apps with open
source ones. The latest one to go has been
PolderBits [1], a fine sound recorder/editor for
which I was happy to pay $29. But Audacity [2] is at
least as good for my minimal needs. And Audacity is
open source and free.

I'm not doing sound editing, so I have no opinion
about how Audacity stacks up in that regard. But
it's terrific for recording onto your computer off a
microphone and -- more important -- for recording
whatever sounds you're streaming. So, if you're
listening to radio over the Internet and they're
playing a song you'd like to keep, just press the
Audacity "record" button. (That's known as the
"analog hole" to people who want to plug your every
orifice with Digital Rights Management controls.)

Then you can do a whole bunch of manipulation of the
sounds, but I don't.

[1] http://www.polderbits.com/
[2] http://audacity.sourceforge.net/


In order to postpone the pleasure of Doom 3, I'm
playing Far Cry, yet another shooter. Lots of people
like it more than I do. And there are many elements
to admire: The graphics are detailed and the island
on which it's set is beautifully drawn. The enemy AI
is the best I've seen; not only don't they get stuck
running into palm trees, but when you shoot at them
from a distance, they do things you might do,
assuming you're not a pants-wetting civilian like
me. I even don't mind their we-know-best save
system, especially since you can save anywhere you
want if you look up the code on the Web. But, I'm
just not finding it all that engaging. Painkiller,
which I finally and regretfully finished, has humor
and imagination going for it. Far Cry has a
beautifully rendered tropical isle and not enough


I got great mail from bunches of you about the piece
in the previous issue about Dewey. If I try to
respond to it all here, I'll never get this issue
out, and in the new fast-paced world of the Web, I'm
trying to pare JOHO down so that it can come out
more often than Punxsutawney Phil.

Several of you took issue with my statement that
Melvil Dewey was "a progressive on social issues."
Of course, you're right. But you're not as right as
some of you think. Although it's certainly true that
he was forced to resign as NY State Librarian in
1905, Wayne A. Wiegand paints a complex picture in
his biography, Irrepressible Reformer (1996). The
anti-Semitism that got him fired seems to have been
rather conventional: He and his wife created a
gated, semi-utopian community at Lake Placid that
casually excluded everyone except white Christians.
In his day-to-day dealings, he seems to have
expressed no hatred of Jews. He also was accused of
being a sexual harasser, although -- in part because
the language of the day was so circumspect -- it's
hard to tell from Wiegand's book just how lecherous
he was; that he made at least some of the women who
worked for him intensely uncomfortable seems
certain, and it may have been much worse than that.
And, indeed, the fact that women worked cheaper than
men undoubtedly was important to him as he staffed
up. The reality seems at best disturbing.

Wiegand argues that Dewey's forced resignation was
due not just to his anti-Semitism and his abuse of
women but also to the fact that he was egomaniacal
and a shady bookkeeper who made lots of enemies for
good and bad reasons.

In short: Dewey was complex. 

(Thanks for the correction.)


You know those objects I talked about in the article
above, the ones that are all metadata and no data? I
want to give them a name. It should be something
that businesspeople can talk about without
embarrassment. At the moment, believe it or not, the
best I've come up with is extradata; at least that
would let me talk about data, metadata and
extradata. So, you do better. You might take it in a
completely different direction. For example, you
might suggest "i-objects," "data monads" or
"chrontent," which I'd then reject and possibly
laugh at.

So, go ahead. I could use a good laugh.


That's it for JOHO. Sorry for the delay. There's
just too much going on. And don't forget that I'm
writing absurd amounts of paranoid drivel over at my
blog [1]. Just think how much worse it's going to get
after November 2 when I am terminally depressed. So,
read my blog now, before the Great Depression

And, if you're American, don't forget to vote.
Depending, of course.

[1] http://www.hyperorg.com/blogger


JOHO is a free, independent newsletter written and
produced by David Weinberger. If you write him with
corrections or criticisms, it will probably turn out
to have been your fault.

To unsubscribe, send an email to joho-request@xxxxxxxxxxxxx
with "unsubscribe" in the subject line. If you have
more than one email address, you must send the
unsubscribe request from the email address you want
unsubscribed. There's more information about
subscribing, changing your address, etc., at
http://www.hyperorg.com/forms/adminhome.html. In
case of confusion, you can always send mail to me at
self@xxxxxxxxxxx. There is no need for harshness or
recriminations. Sometimes things just don't work out
between people.

Any email sent to JOHO may be published in JOHO and
snarkily commented on unless the email explicitly
states that it's not for publication.

The Journal of the Hyperlinked Organization is a
publication of Evident Marketing, Inc. "Hyperlinked
Organization" is a trademark of Open Text. For
information about trademarks owned by Evident
Marketing, Inc., please see our Preemptive
Trademarks?? page at

This issue of JOHO is licensed under a creative
commons license: http://creativecommons.org/licenses/by-nc-sa/1.0
