Q.: Character entity for ZeroWidthSpace character?

Elsewhere in this group, James Moe recently wrote:

> In general it is safer to use character entities
> ... than numeric escape sequences.

In that spirit, I'd greatly welcome a character entity for the numeric
escape sequence &#8203; requisitioning a ZeroWidthSpace character.

Is there such a beastie? Thanks! Cheers, and Season's Greetings, -- tlvp
-- 
Before replying, take out the trash, please.
tlvp
12/19/2016 2:30:03 AM
comp.authoring.html


19.12.2016, 4:30, tlvp wrote:

> Elsewhere in this group, James Moe recently wrote:
>
>> In general it is safer to use character entities
>> ... than numeric escape sequences.

That’s not true; it’s rather the opposite (though for &hellip;, only as 
regards to the possibility that the document might some day be processed 
by an XHTML processor, which is not required to support named character 
references except those defined in XML).

> In that spirit, I'd greatly welcome a character entity for the numeric
> escape sequence &#8203; requisitioning a ZeroWidthSpace character.

There is: &ZeroWidthSpace;. Reference:
https://www.w3.org/TR/html/syntax.html#named-character-references
But don’t expect browsers to support it. They may, or they may not, 
depending on whether browser vendors have kept up with the fancy 
additions to the list and whether people have updated their browsers.

You can alternatively use &NegativeVeryThinSpace; or &NegativeThinSpace; 
or &NegativeMediumSpace; or &NegativeThickSpace; (I’m not even 
considering asking for the reasons for this apparent insanity; I’m sure 
I would get a long and convincing-looking explanation).
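
The equivalence of these references can be checked outside a browser. The following is a minimal sketch that uses the named-reference table shipped with Python's standard `html` module; the helper name `decodes_to_zwsp` is made up for illustration. It shows only what that table contains, and says nothing about whether a given browser supports each name.

```python
import html

# Named references that the HTML named-character-reference table
# maps to U+200B ZERO WIDTH SPACE.
NAMES = [
    "ZeroWidthSpace",
    "NegativeVeryThinSpace",
    "NegativeThinSpace",
    "NegativeMediumSpace",
    "NegativeThickSpace",
]


def decodes_to_zwsp(reference: str) -> bool:
    """Return True if the character reference decodes to U+200B."""
    return html.unescape(reference) == "\u200b"


for name in NAMES:
    print(f"&{name};", decodes_to_zwsp(f"&{name};"))

# The numeric forms, decimal and hexadecimal, for comparison.
print("&#8203;", decodes_to_zwsp("&#8203;"))
print("&#x200B;", decodes_to_zwsp("&#x200B;"))
```

The numeric forms do not depend on any name table at all, which is the practical argument for preferring them.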

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/
Jukka
12/19/2016 9:06:10 AM
On Mon, 19 Dec 2016 11:06:10 +0200, Jukka K. Korpela wrote:

> 19.12.2016, 4:30, tlvp wrote:
> 
>> Elsewhere in this group, James Moe recently wrote:
>>
>>> In general it is safer to use character entities
>>> ... than numeric escape sequences.
> 
> That’s not true; it’s rather the opposite (though for &hellip;, only as 
> regards to the possibility that the document might some day be processed 
> by an XHTML processor, which is not required to support named character 
> references except those defined in XML).
> 
>> In that spirit, I'd greatly welcome a character entity for the numeric
>> escape sequence &#8203; requisitioning a ZeroWidthSpace character.
> 
> There is: &ZeroWidthSpace;. Reference:
> https://www.w3.org/TR/html/syntax.html#named-character-references
> But don’t expect browsers to support it. They may, or they may not, 
> depending on whether browser vendors have kept up with the fancy 
> additions to the list and whether people have updated their browsers.
> 
> You can alternatively use &NegativeVeryThinSpace; or &NegativeThinSpace; 
> or &NegativeMediumSpace; or &NegativeThickSpace; (I’m not even 
> considering asking for the reasons for this apparent insanity; I’m sure 
> I would get a long and convincing-looking explanation).

Thanks, Yucca. My explanation for looking is simple, and convincing to me
at least: my author colleague sometimes sets local explanatory material off
between em-dashes, and prefers these (a) abutting the words they separate,
but (b) prepared to break away from them should line-flow aesthetics demand
it. One resolution of the conflict those demands spawn is to use the trio
[ZeroWidthSpace][EmDash][ZeroWidthSpace] in place of simply EmDash, i.e.,
&#8203;&emdash;&#8203; , wherever an em-dash would be called for.

That's forced by the observation that "Browsers create soft linebreaks
after hyphens (see above), but not after en dashes or em dashes." (Source:
<https://www.w3.org/wiki/Common_HTML_entities_used_for_typography#HTML_entity_usage_notes>,
item 7.)

Well, &emdash; is memorable. I seek an equally memorable replacement for
&#8203; ... if there is one :-)  ... preferably one that browsers support.
So if there's only the ill-supported &ZeroWidthSpace; I guess I'll throw in
the proverbial towel.

Negative spaces? I'm not sure I need such beasties, but if free ... :-) .

Thanks again. And cheers, -- tlvp
-- 
Before replying, take out the trash, please.
tlvp
12/19/2016 9:42:27 AM
On Mon, 19 Dec 2016 04:42:27 -0500, tlvp mistakenly wrote:

> ... &#8203;&emdash;&#8203; ...

Sorry, I know better: should have been just &mdash; in the middle there.
Sorry to raise such confusion. A thousand apologies! Cheers, -- tlvp
-- 
Before replying, take out the trash, please.
tlvp
12/19/2016 9:55:02 AM
On Mon, 19 Dec 2016 11:06:10 +0200, Jukka K. Korpela wrote:
> 19.12.2016, 4:30, tlvp wrote:
> 
> > Elsewhere in this group, James Moe recently wrote:
> >
> >> In general it is safer to use character entities
> >> ... than numeric escape sequences.
> 
> That’s not true; it’s rather the opposite

Thanks. That seemed wrong to me, but I don't have your depth of 
knowledge. I appreciate the confirmation BEFORE I asked. 



-- 
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
                                       http://BrownMath.com/
                                  http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator:      http://validator.w3.org/
CSS 2.1 spec:   http://www.w3.org/TR/CSS21/
validator:      http://jigsaw.w3.org/css-validator/
Why We Won't Help You: http://preview.tinyurl.com/WhyWont
Stan
12/19/2016 11:58:40 PM
On Mon, 19 Dec 2016 04:55:02 -0500, tlvp wrote:
> On Mon, 19 Dec 2016 04:42:27 -0500, tlvp mistakenly wrote:
> 
> > ... &#8203;&emdash;&#8203; ...
> 
> Sorry, I know better: should have been just &mdash; in the middle there.
> Sorry to raise such confusion. A thousand apologies! Cheers, -- tlvp
> 

Of course we wouldn't need such filigree if browsers(*) didn't feel 
free to break a line BEFORE an em dash. That seems like a pretty 
basic function compared to all the much more complicated stuff they 
implement correctly.

(*) Where "Firefox" is a member of "browsers". I haven't tested in 
any others.

-- 
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
                                       http://BrownMath.com/
                                  http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator:      http://validator.w3.org/
CSS 2.1 spec:   http://www.w3.org/TR/CSS21/
validator:      http://jigsaw.w3.org/css-validator/
Why We Won't Help You: http://preview.tinyurl.com/WhyWont
Stan
12/20/2016 12:00:31 AM
20.12.2016, 2:00, Stan Brown wrote:

> On Mon, 19 Dec 2016 04:55:02 -0500, tlvp wrote:
>> On Mon, 19 Dec 2016 04:42:27 -0500, tlvp mistakenly wrote:
>>
>>> ... &#8203;&emdash;&#8203; ...
>>
>> Sorry, I know better: should have been just &mdash; in the middle there.
>> Sorry to raise such confusion. A thousand apologies! Cheers, -- tlvp
>>
>
> Of course we wouldn't need such filigree if browsers(*) didn't feel
> free to break a line BEFORE an em dash. That seems like a pretty
> basic function compared to all the much more complicated stuff they
> implement correctly.

I’m confused. According to Unicode Line Breaking rules, EM DASH has line 
breaking class B2 [Break Opportunity Before and After], so the behavior 
you have noticed appears to be correct. And if you want to prevent that, 
you need explicit line breaking *prohibition*, whereas &#8203; is ZERO 
WIDTH SPACE, which explicitly *allows* line breaking (so it is redundant 
in &#8203;&mdash;&#8203;, though it can be useful since not all browsers 
implement the Line Breaking rules correctly). (For years, browsers broke 
only on spaces.)

When EM DASH is used in its normal meaning in English texts, namely as a 
punctuation character to set off a parenthetic remark (in US English 
style), it seems appropriate that it has the Break Opportunity Before 
and After property, though perhaps it is better to leave it at the end 
of a line rather than at the start of a new line.

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/
Jukka
12/20/2016 5:04:59 AM
19.12.2016, 11:42, tlvp wrote:

> [...] my author colleague sometimes sets local explanatory material off
> between em-dashes, and prefers these (a) abutting the words they separate,
> but (b) prepared to break away from them should line-flow aesthetics demand
> it. One resolution of the conflict those demands spawn is to use the trio
> [ZeroWidthSpace][EmDash][ZeroWidthSpace] in place of simply EmDash, i.e.,
> &#8203;&emdash;&#8203; , wherever an em-dash would be called for.

It looks messy, but I don’t think you can make it simpler (in HTML 
source). Actually entering the characters involved is impractical even 
if you can define e.g. a macro for it in an editor, since zero width 
spaces are literally unnoticeable (unless an editor chooses to render it 
in some special way). Using <wbr>—<wbr> would be a nicer option, but 
people who define HTML standards have decided to treat the good old 
<wbr> as Bad, Obsolete, and whatever, and the browser support, which was 
excellent, isn’t quite that any more.

> That's forced by the observation that "Browsers create soft linebreaks
> after hyphens (see above), but not after en dashes or em dashes." (Source:
> <https://www.w3.org/wiki/Common_HTML_entities_used_for_typography#HTML_entity_usage_notes>,
> item 7.)

The information is outdated. Chrome implements Unicode line breaking 
rules for EM DASH. Unfortunately other browsers misbehave.

Don’t treat that page as an authority of any kind. For one thing, it 
claims that EN DASH is indistinguishable from MINUS SIGN. (They are two 
distinct characters, and even though people may confuse them and even 
though fonts may have identical or almost identical glyphs for them, a 
well-designed font makes them different.)
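
That the two are distinct characters is easy to confirm from the Unicode character database. Here is a small sketch using Python's standard `html` and `unicodedata` modules; the helper name `describe` is only for illustration.

```python
import html
import unicodedata


def describe(reference: str) -> tuple[str, str, str]:
    """Decode a character reference; return code point, name, category."""
    ch = html.unescape(reference)
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ref in ("&ndash;", "&minus;", "&mdash;"):
    print(ref, *describe(ref))
```

EN DASH is classified as dash punctuation (`Pd`) and MINUS SIGN as a math symbol (`Sm`), so software can tell them apart even when a font draws them alike.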

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/
Jukka
12/20/2016 9:29:26 AM
On 20.12.2016 at 06:04, Jukka K. Korpela wrote:

> When EM DASH is used in its normal meaning in English texts, namely
> as a punctuation character to set off a parenthetic remark (in US
> English style),

In German, the em dash with thin spaces before and after was used in the
same sense until about 50 years ago, now an en dash with normal spaces
is used instead. I like that better than the US way as I find abutting 
characters distracting at a place where phrases are separated rather 
than connected but this may be mere habituation to local customs.

The question where to break lines arises with all these typesetting 
customs in the same way.

> it seems appropriate that it has the Break Opportunity Before
> and After property, though perhaps it is better to leave it at the
> end of a line rather than at the start of a new line.

Some uses of such dashes – but not all – act like a pair of parentheses 
while others more like a semicolon – not obvious to distinguish. In my 
opinion, the line break should happen before the dash for the left 
parenthesis and after the dash in the remaining cases. For some time I 
have forced that behaviour with NBSP characters but now I consider this 
too much effort.

Much more important are NBSP before or after numbers, e.g. 
number&nbsp;17 but 17&nbsp;bottles. There does not seem to be an 
automatic way to enforce that.
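
A fully automatic rule does seem out of reach, since whether a number binds to the word before it or to the word after it depends on meaning. A partial, list-driven preprocessing step is possible, though. The sketch below is only that: the word lists and the function name `bind_numbers` are hypothetical, and it assumes it is run on plain text content, not on raw HTML with tags and attributes.

```python
import re

# Hypothetical word lists; a real project would maintain its own.
LABELS_BEFORE = ("number", "page", "chapter", "figure")  # "number 17"
UNITS_AFTER = ("bottles", "km", "kg", "pages")           # "17 bottles"


def bind_numbers(text: str) -> str:
    """Replace the space between a listed word and a number with &nbsp;."""
    labels = "|".join(map(re.escape, LABELS_BEFORE))
    units = "|".join(map(re.escape, UNITS_AFTER))
    # Listed label, one space, then a digit: bind the label to the number.
    text = re.sub(rf"\b({labels}) (?=\d)", r"\1&nbsp;", text,
                  flags=re.IGNORECASE)
    # Digit, one space, then a listed unit: bind the number to the unit.
    text = re.sub(rf"(\d) (?=(?:{units})\b)", r"\1&nbsp;", text,
                  flags=re.IGNORECASE)
    return text


print(bind_numbers("number 17 but 17 bottles"))
```

Words missing from the lists are left untouched, so the result still needs proofreading.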

-- 
Helmut Richter
Helmut
12/20/2016 10:58:15 AM
On Tue, 20 Dec 2016 07:04:59 +0200, Jukka K. Korpela wrote:

> ... ZERO 
> WIDTH SPACE, which explicitly *allows* line breaking (so it is redundant 
> in &#8203;&mdash;&#8203;, though it can be useful since not all browsers 
> implement the Line Breaking rules correctly. ...

Exactly. I've tested my HTML in older browsers, some of which need specific
"feel free to wrap the line here" instructions before and after each
em-dash. It's necessary also when converting an HTML file -- or a docx file
-- to the Kindle .MOBI format. Cheers, -- tlvp
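
Since typing the padding by hand is tedious, it could be added in a preprocessing pass. What follows is a rough sketch, not a tested production tool: the function name `pad_em_dashes` and its `hint` parameter are invented here, and it assumes em dashes appear in the source either as the literal character or as `&mdash;`.

```python
import re

ZWSP_REF = "&#8203;"  # numeric reference for U+200B ZERO WIDTH SPACE


def pad_em_dashes(source: str, hint: str = ZWSP_REF) -> str:
    """Surround each em dash (literal or &mdash;) with a break hint."""
    h = re.escape(hint)
    # Match the dash together with any hint already next to it, so that
    # running the function twice does not stack hints.
    pattern = re.compile(rf"(?:{h})?(\u2014|&mdash;)(?:{h})?")
    return pattern.sub(lambda m: f"{hint}{m.group(1)}{hint}", source)


print(pad_em_dashes("material&mdash;set off like this&mdash;continues"))
```

Passing `hint="<wbr>"` produces the `<wbr>` variant instead, for targets where that element works better.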
-- 
Before replying, take out the trash, please.
tlvp
12/20/2016 11:23:07 PM
On Tue, 20 Dec 2016 11:58:15 +0100, Helmut Richter wrote:

> In German, the em dash with thin spaces before and after was used in the
> same sense until about 50 years ago, now an en dash with normal spaces
> is used instead.

At least one Polish publisher shares the new German aesthetic you describe.

> ... I like that better than the US way as I find abutting 
> characters distracting at a place where phrases are separated rather 
> than connected

Theoretically, I have to agree with this position. But as a matter of
practice, I've become accustomed to the US way on this, and now accept it
as standard.

Thanks for the added perspective. Cheers, -- tlvp
-- 
Before replying, take out the trash, please.
tlvp
12/20/2016 11:28:24 PM
On Tue, 20 Dec 2016 11:29:26 +0200, Jukka K. Korpela wrote:

>> &#8203;&mdash;&#8203; , wherever an em-dash would be called for.
> 
> It looks messy, but I don’t think you can make it simpler

OK, wishes called back home again as impractical.

> ... Actually entering the characters involved is impractical ...

Absolutely! Proofing HTML with invisible ZWSpaces scattered about would be
a worse nightmare even than living in the USA will be, come Jan. 20 :-{ .

Cheers, -- tlvp
-- 
Before replying, take out the trash, please.
tlvp
12/20/2016 11:33:56 PM
tlvp wrote:
> On Tue, 20 Dec 2016 11:29:26 +0200, Jukka K. Korpela wrote:
>
>>> &#8203;&mdash;&#8203; , wherever an em-dash would be called for.
>>
>> It looks messy, but I don’t think you can make it simpler
>
> OK, wishes called back home again as impractical.
>
>> ... Actually entering the characters involved is impractical ...
>
> Absolutely! Proofing HTML with invisible ZWSpaces scattered about would be
> a worse nightmare even than living in the USA will be, come Jan. 20 :-{ .
>
> Cheers, -- tlvp
>
Better than it has been for the past 8 years under the racist-in-chief


-- 
Frosted Flake
Frosted
12/21/2016 2:34:01 AM