Responding to Philippe Martin now. I appreciate the extreme care and thought that went into this post, Philippe, as well as your personal expertise in areas extremely relevant to this project. At this point, this sort of sweeping re-envisioning of what I have in mind is appropriate and important to consider. Even if we do not opt to take your advice, I think we will learn something important.

> From: Philippe MARTIN [mailto:phmartin@xxxxxxxxxxxxx]
> Sent: Thursday, May 11, 2006 10:54 PM
> Subject: still not fine-grained and structured enough to be scalable ...
> For example, I see some similarities between this project and
> Synview (1985: http://portal.acm.org/ft_gateway.cfm?id=637116&type=pdf),
> ScholOnto (1999: http://citeseer.ist.psu.edu/shum99representing.html),
> MathWorld (1999-2006: http://mathworld.wolfram.com/) and the
> Open Directory Project (1998-2006: http://dmoz.org/), although
> the ODP is more coarse-grained (it is much more about whole documents
> than document elements).
> This project is far more coarse-grained and far less
> ambitious than the HALO/Aristotle projects (see
> http://www.projecthalo.com/ and
> http://www.edge.org/3rd_culture/hillis04/hillis04_index.html),
> the QED project (http://www-unix.mcs.anl.gov/qed/) and the
> OpenGALEN project (http://www.opengalen.org/).

I can't claim to be familiar with all of these projects, but it seems most of the projects listed here, especially the latter ones, are indeed much finer-grained. They are also as much technical projects (no doubt designed to prove or demonstrate something of interest mainly to information theorists) as they are reference projects; e.g., they are attempts to build ontologies, or to provide scalable technical models of knowledge, that don't have *immediate* uses. MathWorld and the ODP have at least developed a great deal of usable content--which is what impresses me most, frankly. I want Textop to be like that: very useful.
> Nevertheless, is this project achievable and worth
> achieving exactly as it is currently described, that is, with
> a classic, rather coarse-grained and loosely structured approach?
> From my viewpoint, it is not.

I love a definite proposition, and that is one!

> The first problem is the informal hierarchy of topics (which
> may contain "many thousands if not millions of outline
> headers"). It is well recognised that informal hierarchies of
> topics are very much arbitrary (there is no "right place" for
> a node; placing nodes is a matter of personal
> preferences/goals), hence it is difficult to retrieve
> information or know where to insert information,
> and this leads to many redundancies and inconsistencies (in
> the same way that Web documents are often redundant and inconsistent, and
> their content difficult to retrieve and compare). This is because
> there are no formal/precise/semantic/meaningful relations
> (such as category subsumption, statement specialization, mereological
> relations, ...) between the nodes of the hierarchy.

I suspect that this problem becomes much more tractable when one is dealing with text chunks that are individuated precisely by the fact that they make (or are taken to make) definite, classifiable arguments, propositions, definitions, etc. That the items I propose to classify are these sorts of "text chunks" is crucial to remember.

That's only part of the solution. Another part is that it is then up to the designers of the project to *designate* what the parent-child relations shall mean. I have followed a certain pattern with the Leviathan that I have found useful, and which I might explain sometime; but it is clear enough to me that the fact that there is a variety of choices of rules does not imply that there is no distinguishing the *quality* of rules, or (more generally) no way to settle upon a set of rational rules.
Perhaps the more difficult problem is one that Kunal Sen identified: how to get people to agree on how to create an outline.

> When indexing "interesting documents", superficial and
> informal hierarchies of topics such as those of Yahoo or the
> ODP may make sense (since documents are about many ideas),
> but when categorising individual ideas, concepts or objects,
> using informal hierarchies cannot work.

Not to be merely contrary, but I actually think it is the reverse. Please do consult the work on Hobbes' Leviathan I've done so far (http://www.textop.org/outline_help.html). Whereas websites and books and even encyclopedia articles concern very many different topics, and thus are inherently problematic to classify, chunks of text are a different matter altogether. Something I have confirmed to my own satisfaction is that chunking texts in the way I do makes it possible to organize the results into an outline with much more satisfactory results than classifying websites or books.

And bear in mind, the items being categorized here are decidedly *not* "ideas, concepts or objects," but chunks of text. That's an important difference. I very much suspect that you are thinking of the Collation Project (that's what we're discussing) as an ontology, which *is* about "ideas, concepts or objects." But the outline of the Collation Project *is not* an ontology, nor is it meant to be one. Again, consult the example.

The fact that we're talking about outlining text chunks, not "ideas, concepts or objects," makes a difference both in theory and in practice. In theory, we *should* expect relatively unclear concepts to require filing in multiple places, and an outline built out of concepts to be confusing and redundant, for the reason that concepts do not enter into *unique* semantic, logical, and other relations with each other.
But propositions, definitions, arguments, explanations, etc.--human thought chunked at that level--*do* fall into more definite relations. Consider, for example, "realism" as a concept. This might fall under many other concepts in an ontology. But contrast that with a paragraph articulating what someone means by "realism" in a particular case. It is realism *about Platonic universals*, for example. Philosophers know where to put that, at least in relation to a cluster of other related concepts. Even more definitely can they say what relation a specific point about realism about Platonic universals bears to other points.

> For the Textop project, the minimal support that should be used is
> (i) an updatable lexical ontology (and hence semantic network) of
> English such as, for example, the one browsable and updatable at
> http://www.webkb.org (although it is derived from WordNet and
> many improvements still need to be made before it can be a
> "good" support),

Perhaps. As much as I love ontologies generally, and the project of building ontologies, and as much as I admire those who have the technical chops to build coherent ontologies, I'm not sure what the benefit of a formal or even a semi-formal ontology would be *in this context*. I'm looking at this as a practical project that will have a definite human use; it's going to be a reference work. So how would an updatable lexical ontology be *of use* in this context? And how can we expect people constructing the outline simply to buy into the ontology wholesale?

Besides, though I'm not absolutely sure of this, I wouldn't be at all surprised if a usable ontology fell out of the careful examination of texts in metaphysics, logic, and semantics. By exploring the logical and other relations of various definitions, arguments, etc., *in all their glorious detail* (that's the important part), one is *least* apt to leave some relevant consideration out. But it is necessary in any case to follow the text where it leads.
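To keep the contrast concrete--an outline of nodes holding filed text chunks, rather than an ontology of concepts--here is a rough, purely hypothetical sketch in Python of the sort of structure I have in mind. Every name and detail is invented for illustration; this is not actual Textop software.

```python
# Hypothetical sketch of the Collation Project's data model: outline
# nodes (with identifiers distinct from their header wording) under
# which text chunks are filed via summaries. All names are invented.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """A chunk of source text making one definite argument, claim, etc."""
    text: str
    summary: str    # the summary under which this chunk is filed
    author: str
    year: int
    language: str   # language of the original source

@dataclass
class Node:
    """An outline node; its id is stable even if the header is reworded."""
    node_id: str
    header: str
    children: list["Node"] = field(default_factory=list)
    chunks: list[Chunk] = field(default_factory=list)
    cross_refs: list[str] = field(default_factory=list)  # ids of related nodes

    def file(self, chunk: Chunk) -> None:
        """File a chunk under this node. The same chunk may also be
        filed under other nodes, under different summaries."""
        self.chunks.append(chunk)

    def chunks_where(self, predicate) -> list[Chunk]:
        """Filter displayed chunks, e.g. by century or original language."""
        return [c for c in self.chunks if predicate(c)]

# Example: two instances of the argument from design, then a view
# restricted to 18th-century sources.
design = Node("n-0412", "The argument from design")
design.file(Chunk("...", "Paley's watchmaker analogy", "Paley", 1802, "English"))
design.file(Chunk("...", "Cleanthes states the design argument", "Hume", 1779, "English"))
eighteenth = design.chunks_where(lambda c: 1700 <= c.year <= 1799)
```

The point of the sketch is only that node identity, filing, and user-side filtering require nothing like a formal ontology: the parent-child relations mean whatever the project's designers designate them to mean.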
It has been one of my rules to create nodes only when necessary to place a text correctly. That's why you'll see that some areas of my outline of the Leviathan are very well developed, and some of them are not.

> (ii) an updatable conceptual/semantic network of individual
> statements connected by conceptual intersentential relations
> (specialization relations, argumentation relations, rhetoric
> relations, ...).

Well, in any case, I'm trying to analyze the actual words of actual texts--text chunks--and to "collate" them all together. What (ii) proposes is to create a network of *statements*. So what I guess I need to know is how this network of statements is to be generated: by summarizing texts, for example? Or by producing them ourselves, *a priori* as it were, and then expecting texts to fall neatly into the network?

> Even in the hypertext community (for which the linked
> document elements are not necessarily fine-grained), the need
> for typed hyperlinks was finally recognised in the early 90's,
> as it is again nowadays by the creators of "semantic wikis".

Well, the actual structure I propose is a hierarchical outline structure, so the elements of the structure are nodes. To create this, and to file chunks into it as I've done for the Leviathan, it seems to me (at least) that neither (i) nor (ii) is necessary. And I'm not even sure why any particular attempted ontology would be of that much help. Nodes will, however, have to have unique identifiers, distinct from the words that make up the header that lives at a node. Have a look at the proposed screenshot I have devised for the contributor interface: http://www.textop.org/screenshot.html There can also be cross-references, of course.

> The second (although related) problem of this project is that
> a "paragraph" is not a fine-grained enough unit of
> information to support a scalable
> indexation/retrieval/comparison of information and a
> "democratic" cooperation between the information providers.
> Indeed, for each idea/topic/statement there will be thousands
> of paragraphs (from different documents) about (or giving an
> argument for) that particular idea/topic/statement, and simply
> listing all of these paragraphs will not permit
> comparing/organising the various underlying
> ideas/topics/statements/arguments/objections.

Of course there will be thousands of paragraphs, if the "idea/topic/statement" is broad enough. For example, if the topic were "arguments for the existence of God," there would probably be tens of thousands of paragraphs. But philosophers, at least, actually have names for different kinds of arguments (e.g., the argument from design vs. the argument from first causes for the existence of God). Part of the plan is to lump arguments (and other linguistic entities) together, when there are relatively few of them, under a specific node, and to split them and find distinctions when there are many (whether or not there are names to go with the distinguished types).

Granted, it might turn out that there are *very* many instances of the argument from design (just for example) in the literature (there must be literally hundreds of instances of the basic argument being stated, to say nothing of the discussion of surrounding matters), and that even if we try to split these into types, we will find that there really aren't any meaningfully different types that do not themselves each have dozens of instances. Well, in that case, so be it, I say. Humanity ought to stop all this wasteful duplication of effort already.

Note, we (users) will be able to filter from which sources chunks will be displayed. So if there are a zillion arguments from design, then just show me the ones from the 18th century; or just the ones with French as the original source; or just the ones from some list of "the Great Books." (And such filtering is exactly how some academics' minds work: it doesn't really matter if it didn't happen within the last 30 years or so. It makes their work more tractable. The Collation Project, and Textop generally, is going to embarrass such people terribly. We'll demonstrate as never before that there's nothing new under the sun.)

Anyway, your observation above really goes to one of the more interesting aspects of the Collation Project: if we build the outline *around* the texts, i.e., if we use our summaries of text chunks to decide what outline nodes shall exist (as I've done with Hobbes), then we are (together) exploring the dialectical territory in *enormously fine* detail--something that, to my knowledge, has never quite been done before, certainly not to the extent I'm proposing.

> To do so, the above cited updatable conceptual/semantic network of
> individual statements is required (the unit of information
> should be a sentence, not a set of sentences).

Well, then, consider the thing to be organized into the outline to be not the text chunk but a summary of the text chunk. (And, by the way, I have actually put chunks into different parts of the outline under different summaries. It seemed like the right thing to do at the time.)

> And with such a network where each node has a recorded creator,
> it is possible to calculate a value for the "originality" and
> "usefulness" of each statement (and hence also for each creator
> of statements) based on votes and the argumentation tree
> associated to each statement; thus, there is no need for a
> committee to decide which statements are "correct" and
> "interesting" and remove the other ones; instead, each user
> can filter out (or change the presentation of) statements
> with low originality/usefulness (a base algorithm is given in
> sections 2.1 and 2.2 of the articles accessible from
> http://www.webkb.org/doc/papers/iccs05/ but, ideally, options
> should be provided to each user for the calculated values to
> better reflect what that user believes is original or useful).
> (Note: Section 2.1 of this article also shows small examples of
> the above cited semantic network of statements, but it is better
> to see the more complete examples accessible from
> http://www.webkb.org/kb/classif/sd.html#examples).

As Kunal wisely said, to do this justice we'd have to read your references. So perhaps I'm not understanding at all, but this is not really sounding very much like what I'm proposing to do. Perhaps you can start at a more basic level. Are you saying that the project should *rate* particular statements (outline nodes, I guess) as "correct" or "interesting"? I would see that as an unnecessary distraction--a side project, perhaps. The project's aim is not to get at the truth, but to elicit the structure of various points made in actual books. Hopefully, the result will help individuals decide what they think is true.

> To conclude, I believe that a (logic-based) semantic network of
> categories and statements is needed for this project to be
> scalable

This I don't see that you've proven. The feasibility of building an outline of the sort I propose--which may turn out to have all sorts of imperfections, but does the job--seems to be much more a practical question.

> and of more interest,

Forgive me, but I don't see how you've proven this either. It would be immodest of me to claim that the outline I've built of the Leviathan is of interest even to philosophers, but I am very sure I could go on--strictly by myself if I wanted to--and incorporate, say, Locke's Essay, Hume's Treatise, some Reid, some Mill, some Russell, and I would have a very nice outline of the history of English-language philosophy that would be of considerable interest to historians of philosophy as a reference, especially if specialists were to clean it up and help me with it. Why *wouldn't* such a thing be of considerable interest?
What I don't understand is how the interest of the result of this work would *increase* if the outline were somehow based on, say, your ontology. I don't mean to claim it *wouldn't*, but I don't see how you've supported this.

> whether or not the statements are formal,
> informal or semi-formal (i.e., semi-structured or using some
> formal terms). Thus, I believe the required interface is not
> the one currently envisaged but one that permits creating a
> semantic network, with some of the nodes being pointers to
> parts of documents. Then, however, there would be few
> differences between that project and mine, and the tools and
> syntaxes I am developing could be re-used.

It would be very interesting indeed to have an elaboration of this. I think that's probably a useful way to move this discussion forward. What would the input interface be like? How would *the collation of texts* proceed? What's the overall procedure? How would the *result* look different from what I've illustrated? If you like, you can ignore all my other replies and focus on these questions, because that's really what I'm interested in: other clear, viable options.

> It is clear that the path that I am advocating (that is,
> precision-oriented knowledge entering) is more demanding for
> information providers (they have to be analytic, precise and
> careful when writing statements), at least until a certain amount of
> information has been represented. But I do not see any escape from that.

I actually would say that the technical barriers even to what *I* propose are very high: actually teaching people to use the software and understand the system will be a challenge, to say the least. To require further that they be logicians and severely analytically minded is to limit the number of participants *much* more.

Philippe, I apologize for this long and vigorous reply, but I hope you will take that as a sign of my respect for and interest in your ideas.
Furthermore, I really do believe you've raised some very interesting issues.

--Larry

====
textop - a Textop (http://www.textop.org) mailing list. To post, send a mail to textop@xxxxxxxxxxxxx, or just reply to this post. To unsubscribe, send a mail to textop-request@xxxxxxxxxxxxx with 'unsubscribe' in the header.