Re: [nvda-translations] symbols.dic and char_descriptions.dic - is there a limit on number of symbols/words that can be held there?

  • From: James Teh <jamie@xxxxxxxxxxxx>
  • To: nvda-translations@xxxxxxxxxxxxx
  • Date: Wed, 19 Sep 2012 15:51:08 +1000

Hi Joseph,

This is a complex question to answer. I've done my best below.

1. Data for both is loaded into memory. Limits aside, obviously, the more you add, the longer it will take to load and the more memory it will consume. It could consume quite a bit of memory - I'm not really sure how much - because the data is more than just characters. This probably isn't the cause of any problems, but it's important nevertheless. You should carefully consider whether you really need all of these, and even if you do, whether there's a more programmatic way to determine names/descriptions.
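To make that last suggestion a little more concrete, here's a very rough sketch of what I mean by "programmatic" (my own illustration, not anything NVDA currently does). For Hanja the generic Unicode names are probably too vague to serve as spoken names, so this would only ever be a fallback:

    import unicodedata

    # Hypothetical helper: fall back to the Unicode character name when no
    # hand-written description exists. For CJK ideographs this just yields
    # "CJK UNIFIED IDEOGRAPH-XXXX", so it is only a fallback, not a
    # substitute for real spoken names.
    def fallback_description(ch):
        return unicodedata.name(ch, u"U+%04X" % ord(ch))

    print(fallback_description(u"\u6f22"))  # CJK UNIFIED IDEOGRAPH-6F22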

2. Symbols and character descriptions are handled very differently. They are also used differently: a symbol entry gives what should be spoken for the symbol in normal speech, whereas a character description is a longer description used to distinguish the character from others that sound similar. They should not simply be the same data.
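As a purely hypothetical illustration (the entries below are made up, but follow my understanding of the tab-separated layout of these files), a symbols file (symbols.dic) entry gives the short spoken replacement, while a character descriptions file entry gives one or more distinguishing words:

    # Hypothetical entry in the symbols file:
    # symbol <TAB> spoken replacement <TAB> level
    ★	black star	some

    # Hypothetical entry in the character descriptions file:
    # character <TAB> one or more tab-separated descriptive words
    b	bravo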

3. There shouldn't be a limit on the number of character descriptions. Still, you should do any tests for these separately from symbols and report them separately to avoid confusion and make debugging easier.

4. Regarding symbols, some background: we use Python's regular expression engine to process them. It imposes some limits that we cannot change; I'm not even sure exactly what those limits are or how they work. However:

5. I know for certain that there is a limit of about 90 (I can't remember the exact number) complex symbols. This should be more than enough, as complex symbols should only be used for sentence endings and other special cases.
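To illustrate (this is my own sketch, not NVDA's actual code): complex symbols each end up as a separate group in one combined regular expression, and the Python versions we use refuse to compile a pattern with more than 100 groups, which is presumably where the roughly-90 figure comes from once internally used groups are accounted for:

    import re

    # Sketch only: combine many "complex symbol" patterns into one
    # expression, one named group per symbol, so a match can be traced back
    # to the symbol it belongs to.
    patterns = dict(("sym%d" % i, r"\.") for i in range(150))
    combined = "|".join("(?P<%s>%s)" % (name, pat)
                        for name, pat in patterns.items())

    try:
        re.compile(combined)
        print("compiled fine (newer Python raised the group limit)")
    except (AssertionError, re.error) as e:
        # On the Python we currently use, this fails with an error along the
        # lines of "sorry, but this version only supports 100 named groups".
        print("hit the group limit: %s" % e)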

6. I just did some brief testing and it seems that there is probably a limit on the number of simple symbols as well. (I wasn't aware of this before.) I can probably improve this for symbols that are just one character in length.
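The improvement I have in mind is along these lines (again, just a sketch under my own assumptions, not something I've implemented): all one-character symbols can be matched by a single character class, so they add only one group to the pattern no matter how many of them there are:

    import re

    # Hypothetical sample data: one-character symbols mapped to spoken text.
    replacements = {u"\u3002": u"ideographic full stop",
                    u"\u3001": u"ideographic comma"}

    # One character class covers every single-character symbol, so the
    # pattern needs only one group regardless of how many symbols there are.
    char_class = u"[%s]" % u"".join(re.escape(c) for c in replacements)
    pattern = re.compile(char_class, re.UNICODE)

    text = u"\uc548\ub155\u3002"  # Korean sample text plus a full stop
    print(pattern.sub(lambda m: u" %s " % replacements[m.group(0)], text))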

Once you've done further testing, please file a ticket in the tracker with steps to reproduce, a log file (with log level set to input/output) and sample data. Please see:
http://www.nvda-project.org/wiki/ReportingIssues
for more details.

Thanks,
Jamie

On 19/09/2012 3:13 PM, Joseph Lee wrote:
Hi, mostly for James:
One of the Korean translators found a database of over 25,000 Hanja (Chinese
characters used in Korea) online. After downloading the file (which contains
both the spoken word name and the description for each character), he ran
into what may be a buffering issue, or he may have hit a limit on the number
of symbols/chars that can be held in symbols.dic and character_descriptions.dic
(note that we're using a dev version of eSpeak with added Korean voices and
pronunciation files).
Here's his procedure (from a post on the Korean NVDA users forum on Facebook):
1. Open symbols.dic and insert Hanja symbols with their Korean pronunciations.
2. Insert a small number of characters (perhaps fewer than 20) in the
dictionary, save it and apply it in NVDA.
3. Reopen the symbols file, insert a larger number of characters, save it
and apply the newly modified symbols file in NVDA. So far, this works fine.
4. Open the symbols file again, insert hundreds of characters at once,
save the symbols file and apply the modification to NVDA. When this happens,
eSpeak (Korean voice) has a hard time with the newly added Hanja characters,
and/or previously entered characters are not spoken at all and produce an
error beep while NVDA is running.
Repeat steps 1 through 3 with the character_descriptions.dic file. He plans
to reproduce this issue and, if possible, grab the log file and post it on
Facebook; I plan to forward the log to you to see whether it is NVDA-related,
eSpeak-related, both, or neither.
Thanks.
Cheers,
Joseph



--
James Teh
Director, NV Access Limited
Email: jamie@xxxxxxxxxxxx
Web site: http://www.nvaccess.org/
Phone: +61 7 5667 8372
