Re: [nvda-translations] symbols.dic and char_descriptions.dic - is there a limit on number of symbols/words that can be held there?

From: James Teh <jamie@xxxxxxxxxxxx>
To: nvda-translations@xxxxxxxxxxxxx
Date: Wed, 19 Sep 2012 17:04:27 +1000

changeset:main,5470 may improve this somewhat, though I'm not certain whether it will fix your particular problem.


Jamie

On 19/09/2012 4:03 PM, Joseph Lee wrote:

Hi,
Thanks. I'll ask other Korean users to send me the log file and attach the
logs.
Cheers,
Joseph

-----Original Message-----
From: nvda-translations-bounce@xxxxxxxxxxxxx
[mailto:nvda-translations-bounce@xxxxxxxxxxxxx] On Behalf Of James Teh
Sent: Tuesday, September 18, 2012 10:51 PM
To: nvda-translations@xxxxxxxxxxxxx
Subject: Re: [nvda-translations] symbols.dic and char_descriptions.dic - is
there a limit on number of symbols/words that can be held there?

Hi Joseph,

This is a complex question to answer. I've done my best below.

1. Data for both is loaded into memory. Limits aside, obviously, the more
you add, the longer it will take to load and the more memory it will
consume. It could consume quite a bit of memory - I'm not really sure how
much - because the data is more than just characters. This probably isn't
the cause of any problems, but it's important nevertheless. You should
carefully consider whether you really need all of these, and even if you do,
whether there's a more programmatic way to determine names/descriptions.

2. Symbols and character descriptions are handled very differently. They are
also used differently: a symbols entry is what should be spoken for the symbol
in normal speech, whereas a character description is a longer phrase that
distinguishes the character from others that sound similar.
The two files should not simply contain the same data.

3. There shouldn't be a limit on the number of character descriptions.
Still, you should do any tests for these separately from symbols and report
them separately to avoid confusion and make debugging easier.

4. Regarding symbols, to provide some background, we use Python's regular
expression engine for symbols. It imposes some limits that we cannot change.
I'm not even sure what these limits are or how they work.
However:

5. I know for certain that there is a limit of about 90 (I can't remember
the exact number) complex symbols. This should be more than enough, as
complex symbols should only be used for sentence endings and other special
cases.

6. I just did some brief testing and it seems that there is probably a limit
on the number of simple symbols as well. (I wasn't aware of this
before.) I can probably improve this for symbols that are just one character
in length.
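To illustrate points 4 to 6, here is a minimal sketch of how a symbol table can be folded into a single Python regular expression. This is not NVDA's actual code: the function names, the group names (c0, c1, simple) and the sample data are all hypothetical. It only shows the general technique, where each complex symbol costs one named group and all simple symbols share one large alternation, so both the group count and the overall pattern size grow with the table. Older CPython releases capped a pattern at 100 groups, which may be related to the "about 90" figure above, but that is a guess.

```python
import re

def build_pattern(simple, complex_sources):
    """Combine symbols into one compiled regex (illustrative only).

    simple: iterable of literal symbol strings.
    complex_sources: list of regex source strings, one per complex symbol.
    """
    parts = []
    # Each complex symbol gets its own named group: c0, c1, ...
    for i, src in enumerate(complex_sources):
        parts.append("(?P<c%d>%s)" % (i, src))
    # All simple symbols share one group. Longest first, so a
    # multi-character symbol is tried before any symbol that is its prefix.
    literals = sorted(simple, key=len, reverse=True)
    if literals:
        alternation = "|".join(re.escape(s) for s in literals)
        parts.append("(?P<simple>%s)" % alternation)
    return re.compile("|".join(parts))

def replace_symbols(text, simple, complex_symbols):
    """Replace symbols in text.

    simple: dict mapping literal symbol -> replacement.
    complex_symbols: list of (regex source, replacement) pairs.
    """
    if not simple and not complex_symbols:
        return text
    pattern = build_pattern(simple, [src for src, _ in complex_symbols])

    def repl(match):
        # lastgroup names whichever top-level alternative matched.
        if match.lastgroup == "simple":
            return " %s " % simple[match.group()]
        index = int(match.lastgroup[1:])
        return " %s " % complex_symbols[index][1]

    return pattern.sub(repl, text)

# Hypothetical sample data, not taken from any real symbols.dic.
simple_symbols = {"漢": "han", "!": "bang"}
sentence_end = [(r"\.(?=\s|$)", "dot")]
print(replace_symbols("漢! Hi.", simple_symbols, sentence_end))
```

With 25000 single-character entries the shared alternation becomes very long. For symbols that are exactly one character, a plain dictionary lookup per character would sidestep the regex entirely, which is one possible shape for the improvement mentioned in point 6.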

Once you've done further testing, please file a ticket in the tracker with
steps to reproduce, a log file (with log level set to input/output) and
sample data. Please see:
http://www.nvda-project.org/wiki/ReportingIssues
for more details.
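When preparing steps to reproduce, it may help to narrow down roughly how many entries trigger the failure. The sketch below is one way to do that with a binary search over prefixes of the entry list. It assumes the problem is monotonic (once a prefix fails, every longer prefix fails too), and the `fails` callback is hypothetical: in practice it would mean writing the prefix to the dictionary file, reloading NVDA and checking by ear or from the log.

```python
def smallest_failing_count(entries, fails):
    """Return the smallest n for which fails(entries[:n]) is true.

    entries: list of dictionary lines (any format).
    fails: callable taking a list of entries and returning True if
           that set reproduces the problem (hypothetical; may be manual).
    Returns None if even the full list does not fail.
    """
    if not fails(entries):
        return None
    lo, hi = 0, len(entries)
    # Invariant: the answer lies in the range lo..hi inclusive.
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(entries[:mid]):
            hi = mid          # mid entries already fail; look lower
        else:
            lo = mid + 1      # mid entries are fine; need more
    return hi

# Simulated check: pretend anything over 300 entries breaks.
sample = ["entry %d" % i for i in range(1000)]
print(smallest_failing_count(sample, lambda subset: len(subset) > 300))
```

For 25000 entries this needs about 15 reloads rather than thousands, and the resulting prefix makes compact sample data for the ticket.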

Thanks,
Jamie

On 19/09/2012 3:13 PM, Joseph Lee wrote:

Hi, mostly for James:
One of the Korean translators found a database of over 25000 Hanja
(Chinese characters used in Korea) online. After downloading
the file (which contains both the spoken name and the description of
each character), he ran into a problem: possibly a buffering issue, or
he may have hit the limit on the number of symbols/chars that can be
held in symbols.dic and character_descriptions.dic (note that we're
using a dev version of eSpeak and added Korean voices and pronunciation
files).
Here's his procedure (from a post on the Korean NVDA users forum on
Facebook):
1. Open symbols.dic to insert Hanja symbols and their Korean
pronunciations.
2. Insert a small number of chars (perhaps fewer than 20) in the
dictionary, save it and apply it in NVDA.
3. Reopen the symbols file and insert a larger number of characters,
save it and apply the newly modified symbols file in NVDA. So far, this
works fine.
4. Open the symbols file again and insert hundreds of characters at
once, save the symbols file and apply the modification in NVDA. When this
happens, eSpeak (Korean voice) has a hard time with the newly
added Hanja characters, and/or previously entered characters are not
spoken at all, with an error beep while NVDA is running.
Repeat steps 1 through 3 in the character_descriptions.dic file. He plans
to reproduce this issue and, if possible, grab the log file and post
it on FB; I plan to forward the log to you to see whether it is NVDA
related, eSpeak related, both or neither.
Thanks.
Cheers,
Joseph


--
James Teh
Director, NV Access Limited
Email: jamie@xxxxxxxxxxxx
Web site: http://www.nvaccess.org/
Phone: +61 7 5667 8372



