[retroforth] Re: Faster find

  • From: "Helmar Wodtke" <helmwo@xxxxxx>
  • To: retroforth@xxxxxxxxxxxxx
  • Date: Thu, 27 Jan 2005 21:57:50 +0100

retroforth@xxxxxxxxxxxxx schrieb am 27.01.05 21:28:47:
> 
> Dear all,
> 
> Some days ago I suggested a faster find, and gave that a try. Well, I'm 
> still
> novice with fasm, so I would like to ask for your comments. Expect (quite)
> some errors in my attempt below.
I hope I understand what you want. You want to build a tree where each node has 
possibly as much kids as the alphabete?
Why not making a "less perfect" hashing algorithm and resolve collisions? This 
would not take so much memory.
Did you make a test how much memory is needed for a normal wordlist? I could 
imagine it's a lot.

Bis dann,
Helmar

> Best Regards,
> Ton
> 
> Consider the ASCII tabel.
> 
> ...
> 040   32    20    SPACE          
> 041   33    21  !
> ...
> 176   126   7E   ~
> 177   127   7F    DEL
> 
> The character up to space are not important for words.  A words is always
> delimeted by a space.  I don't care for 8 bit characters (now).
> 
> Consider the first character in a words. This from 33 upto 126. These is
> an array with a  total of 96 characters. The space is in index 0. An 'A'
> (65) is at 'A 32 - . this is 33.
> 
> The first character array always exists. Suppose we want to store a '!'.
> Then index[1]  ( '! 32 - . ) contains a pointer to the next array. This
> array contains the second character following the '!'. The index[0]
> (space) contains the (Forth) dictionary address of '!'.
> 
> The index[0] of the first character is not used, so can contain a count of 
> the allocated character arrays.
> 
> Suppose we want to search a word: ABC then tfind ( a # -- xt )
>  - Start at the base of database, where the first character is.
>  - Loop # times. 
>  - Exit if the address is 0 during the loop
>  - If after # interations an address is found not equal zero, we have it.
> 
> Suppose we want to search a word: ABC then tfind ( a # -- xt )
>  - Start at the base of database, where the first character is.
>  - Loop # times. 
>  - Check for a zero pointer, then create new char array.
> 
> tree rb 20000h
> 
> ; ----------------------------------------------------------------------
> code 'tfind', tfind  ; ( a # - xt)
>  push ebx
>  push ecx
>  push edx
>         mov ebx, tree
>         upop ecx  ; #
>         upop edx  ; a
> .a: sub byte [edx],32
>  inc ebx,[edx]  ; step to index
>  inc byte [edx],32
>  cmp [ebx],0  ; 
>  je .end   ;
>  mov ebx,[ebx]  ; new pointer
>  dec [ecx],1  ; # 1 -
>  cmp [ecx],0  ; # 0 =
>  je .found
>  jmp .a
> .found: clc
>  jmp ret
> .end: stc
> .ret pop ebx
>  pop ecx
>  pop edx
> next
> ; ----------------------------------------------------------------------
> code 'tstore',tstore  ; ( a # - )
>  push ebx
>  push ecx
>  push edx
>         mov ebx, tree
>         upop ecx  ; #
>         upop edx  ; a
> .a: sub byte [edx],32
>  inc ebx,[edx]  ;
>  inc byte [edx],32
>  cmp [ebx],0  ; empty pointer?
>  je new
>  mov ebx,[ebx]  ;
>  dec [ecx],1  ; # 1 -
>  cmp [ecx],0  ; # 0 =
>  jne a
>  jmp end
> .new inc [tree],1
>  mov [ebx],(tree+[tree]*96)
>  jmp a
> .end: pop ebx
>  pop ecx
>  pop edx
> next
> ; ----------------------------------------------------------------------
> 
> 
> 



-- Binary/unsupported file stripped by Ecartis --
-- Type: application/x-pkcs7-signature
-- File: smime.p7s
-- Desc: S/MIME Cryptographic Signature



Other related posts: