[haiku-bugs] Re: [Haiku] #16221: Web+: Garbled text on Github
- From: "Haiku" <trac@xxxxxxxxxxxx>
- To: undisclosed-recipients: ;
- Date: Tue, 29 Jun 2021 15:01:43 -0000
#16221: Web+: Garbled text on Github
---------------------------+----------------------------
Reporter: bitigchi | Owner: pulkomandy
Type: bug | Status: reopened
Priority: normal | Milestone: R1/beta3
Component: Kits/Web Kit | Version: R1/Development
Resolution: | Keywords:
Blocked By: 16213 | Blocking:
Platform: All |
---------------------------+----------------------------
Comment (by madmax):
Replying to [comment:26 pulkomandy]:
Please bear with me through another wall of text, I'll try to err on the
"too much" side this time.
>> We identity-pair them with the value of the character, the 32 bit
>> codepoint, as if they were the same (see GlyphPageTreeNodeHaiku or
>> FontHaiku, for example). So [emojis] don't work because they are in plane
>> 1 and that bit is lost before giving the string to Haiku.
> What else could we do? There is no notion of "planes" anywhere in
> WebKit, right? So where should that come from? At which point does webkit
> remove that information, and where does it store it instead?
That'd be in the characters, and it'd be our code that removes it when we
put unicode codepoint values into 16 bit glyphs. Details below.
>> Then in FontHaiku instead of drawing the glyphs with the given
>> advances, as it seems what the WebKit infrastructure expects, we rebuild
>> the string to use Haiku API to draw it
> This is not an "instead". We do use the advances, but we put the glyphs
> back into a string.
I was not very clear with that one, sorry. We do use the advances, yes,
but we don't put the glyph back when we have signaled that we don't have a
glyph for that character. That's the cause of the garbled text. Details
below.
I think WebKit expects the font thing to be more font file level, not
BFont level. When working through some text, it would call the GlyphPage
code when needed (and it does so with ranges of characters, not strides of
real text) to have a character <-> internal-glyph map. That's quite a
guess, but please assume it's correct enough for the moment. When we
`setGlyphForIndex(i, character)` we do have say 0x1f3af in `character`,
but it's cast to the 16 bit Glyph type of the second parameter of
setGlyphForIndex, so the map becomes "codepoint 0x1f3af is glyph 0xf3af",
and we can't recover 0x1f3af when we are given glyphs to draw. When we
don't have a glyph, `setGlyphForIndex(i, 0)` would be right for an
implementation as WebKit expects, but it creates problems for us, as we
won't be able to put the original character back and, as we'll see, we
don't put *any* character back for that glyph, mismatching the string with
the advances.
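The truncation described above can be reproduced outside WebKit. This is a minimal sketch, assuming a 16 bit glyph type as the comment describes; `Glyph16` and `toGlyph` are hypothetical names for illustration, not WebKit's API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 16 bit glyph type.
using Glyph16 = std::uint16_t;

// Identity-pairing a code point with a glyph: the conversion keeps only
// the low 16 bits, so anything above U+FFFF loses its plane.
constexpr Glyph16 toGlyph(char32_t codepoint)
{
    return static_cast<Glyph16>(codepoint);
}

// U+1F3AF (plane 1) and U+F3AF (plane 0) end up as the same glyph value.
static_assert(toGlyph(0x1F3AF) == 0xF3AF, "the plane bit is lost");
static_assert(toGlyph(0x1F3AF) == toGlyph(0xF3AF), "aliased code points");
```

Once the value is 0xf3af there is nothing left in the glyph to tell the two code points apart, which is why the original character cannot be recovered at draw time.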
When it finally wants the text to be drawn, `drawGlyphs` is called with
the needed glyphs, not characters. The typical lower-level-than-BFont
implementation would go through the glyphs array, retrieve each one from
the font and draw it in the given positions. That would include glyph 0,
the "missing glyph" glyph. We instead rebuild the text string and use the
advances to call Haiku's DrawString. But we have aliased characters >
65535, and we don't really know which character produced a glyph 0, which
causes yet another problem. Let me run an example.
Say we have text with characters 1, 2, 3, 4 and 5. We don't have a glyph
for char 2. Say glyphs' advances are equal to their codepoint (just to
have something easy with different widths). drawGlyphs would be called
with glyphs [1, 0, 3, 4, 5] and advances [1, 0, 3, 4, 5]. The offsets we
calculate are then [0, 1, 1, 4, 8].
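The offsets in this example are running sums of the advances. A small sketch of that arithmetic, where `offsetsFromAdvances` is a hypothetical helper and not the actual FontHaiku code:

```cpp
#include <cassert>
#include <vector>

// The offset of each glyph is the sum of the advances of all glyphs before it.
std::vector<float> offsetsFromAdvances(const std::vector<float>& advances)
{
    std::vector<float> offsets;
    offsets.reserve(advances.size());
    float x = 0;
    for (float advance : advances) {
        offsets.push_back(x);
        x += advance;
    }
    return offsets;
}
```

For advances [1, 0, 3, 4, 5] this yields [0, 1, 1, 4, 8]: the missing glyph has a slot in the array but takes no width.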
To rebuild the string we go through each of the glyphs/characters, turn
each one of them to an utf8 string with BUnicodeChar::ToUTF8 and append
that to BString. So we get 1, the utf8 string is [1, 0], and our BString
becomes [1, 0]. We get 0, the utf8 string for that is [0], and our BString
stays [1, 0]. And there is our problem. We then get 3, the string becomes
[1, 3, 0], etc. In the end we have the string [1, 3, 4, 5, 0]. So we call
DrawString with the correct advances for the full string, but only with
the characters for which we have glyphs. Character 1 is OK. Character 3 is
OK too if you expect unavailable glyphs to not take any space. Character 4
goes over 3 and also overflows the allotted space, as it was supposed to
be for 3. Character 5 goes where 4 should have been, and is a bit over the
drawn 4.
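The mismatch can be simulated without any Haiku API. The sketch below assumes the i-th character of the rebuilt string is drawn at the i-th offset; `DrawnChar` and `drawDroppingMissing` are hypothetical names, and the empty UTF-8 string for glyph 0 is modelled simply by skipping it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A character and the x position it ends up drawn at.
struct DrawnChar {
    std::uint16_t glyph;
    float x;
};

std::vector<DrawnChar> drawDroppingMissing(
    const std::vector<std::uint16_t>& glyphs,
    const std::vector<float>& offsets)
{
    // Rebuild the text: glyph 0 contributes an empty string, so it vanishes.
    std::vector<std::uint16_t> text;
    for (std::uint16_t glyph : glyphs) {
        if (glyph != 0)
            text.push_back(glyph);
    }

    // Assumption: the i-th character is drawn at the i-th offset, while the
    // offsets array still holds a slot for the dropped glyph.
    std::vector<DrawnChar> drawn;
    for (std::size_t i = 0; i < text.size() && i < offsets.size(); ++i)
        drawn.push_back({text[i], offsets[i]});
    return drawn;
}
```

With glyphs [1, 0, 3, 4, 5] and offsets [0, 1, 1, 4, 8], the characters land at 1@0, 3@1, 4@1 and 5@4: every character after the missing one is shifted one slot to the left.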
Notice that, as long as GetHasGlyphs returns false and nothing is
changed on the WebKit side, the garbling is independent of whether we return
the "missing glyph" glyph or not, as DrawString does not even get the
character.
So what can we do?
Well, to "play by the rules" I guess one would use FreeType-level stuff,
ask it to render bitmaps instead of returning vector glyphs (if possible,
I haven't checked, and it would probably give a subtly different rendering
than Haiku's) or duplicate Haiku's rendering stuff, put each glyph in a
buffer image and finally send that image to be drawn. So I think the
identity mapping, string recovering and call to DrawString is a really
good idea, it just has those two problems that were probably very rare to
notice until this fashion of having emojis everywhere.
One problem should be solved with a 32-bit Glyph type, provided WebKit does
not assume it's 16 bits anywhere else.
For the other one there are at least three possibilities:
- Discard advances for glyph 0 in drawGlyphs: don't increment the new index
we'd have to use to access the offsets array, and decrement numGlyphs at
the end by the number of glyphs we have skipped (or just ask BString
what its length is).
- Still in drawGlyphs, just append a space to the BString when we get glyph
0.
- In GlyphPage::fill, always setGlyphForIndex(i, character), whether we
have the glyph or not.
I think the third one would show the "missing glyph" glyph with no further
changes in WebKit once we make the fallback code return it, and until then
would show nothing, without garbling the rest of the text.
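The first two possibilities can be sketched in the same simulated form. This is an illustration under assumptions rather than a patch: `PlacedChar`, `placeSkippingOffsets` and `placeWithSpace` are hypothetical names, and plain vectors stand in for BString and the offsets array:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A character and the x position it would be drawn at.
struct PlacedChar {
    std::uint16_t glyph;
    float x;
};

// First possibility: drop the offset together with the missing glyph, so
// the remaining characters keep their own positions.
std::vector<PlacedChar> placeSkippingOffsets(
    const std::vector<std::uint16_t>& glyphs,
    const std::vector<float>& offsets)
{
    std::vector<PlacedChar> placed;
    for (std::size_t i = 0; i < glyphs.size() && i < offsets.size(); ++i) {
        if (glyphs[i] == 0)
            continue; // skipped: neither a character nor an offset is emitted
        placed.push_back({glyphs[i], offsets[i]});
    }
    return placed;
}

// Second possibility: keep the slot and fill it with a space, so string
// and offsets stay the same length.
std::vector<PlacedChar> placeWithSpace(
    const std::vector<std::uint16_t>& glyphs,
    const std::vector<float>& offsets)
{
    std::vector<PlacedChar> placed;
    for (std::size_t i = 0; i < glyphs.size() && i < offsets.size(); ++i) {
        std::uint16_t glyph = glyphs[i] == 0 ? std::uint16_t(' ') : glyphs[i];
        placed.push_back({glyph, offsets[i]});
    }
    return placed;
}
```

On the earlier example both variants put 3 at offset 1, 4 at offset 4 and 5 at offset 8. They differ only in whether the string passed on has four or five characters.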
--
Ticket URL: <https://dev.haiku-os.org/ticket/16221#comment:27>
Haiku <https://dev.haiku-os.org>
The Haiku operating system.