## [contentanalysis] 关于Lucene的讨论

• From: "Wight911" <wight911@xxxxxxxxx>
• To: <contentanalysis@xxxxxxxxxxxxx>
• Date: Sun, 13 Jul 2008 18:53:49 +0800

```下面片段是我在一篇SIGMOD文章中看到的、关于倒排表索引（inverted list）的描

Conceptually, an inverted list is a two-dimensional table,

where the i-th row represents indexed keyword Ki and the

j-th column represents instance Ij . The cell at the i-th row

and j-th column, denoted (Ki, Ij), records the number of

occurrences, called occurrence count, of keyword Ki in the

attributes of instance Ij . If the cell (Ki,Ij) is not zero, we

say instance Ij is indexed on Ki. The keywords are ordered

in alphabetic order, and the instances are ordered by their

identifiers. Table 1 shows the inverted list for our example

triple base.

…(这里省略一小段无关文字)

In practice, an inverted list is seldom stored as a matrix.

There are multiple ways to store an inverted list, such as a

sorted array, a prefix B-tree or a Patricia trie. In addition,

[42] describes techniques for compression of inverted lists.

The extensions we describe are orthogonal to these physical

implementations.

```

### Other related posts:

• » [contentanalysis] 关于Lucene的讨论