Save file defaults to utf-16



I'm struggling to get emacs 21.2.1 set up on a debian 3.0, and having
trouble with character sets.

For example, when I save a document that contains an accented
character, the default is utf-16. If I catch it I can change to utf-8,
but I would rather change the defaults. The resulting (html) file
can't be displayed in a browser (galeon) and text editors can't cope
(in Nedit, for example, an è (e grave) becomes Ã¨).

Incidentally, what is this Ã¨ (Uppercase A with a tilde, followed by an
orphaned umlaut)? Just garbage?

-- 
      Haines Brown
	brownh@hartford-hwp.com
	kb1grm@arrl.net
	www.hartford-hwp.com
        
Reply Haines 12/8/2003 1:39:09 PM

Hi Haines,


Haines Brown <brownh@teufel.hartford-hwp.com> writes:

> I'm struggling to get emacs 21.2.1 set up on a debian 3.0, and
> having trouble with character sets.
>
> For example, when I save a document that contains an accented
> character, the default is utf-16. If I catch it I can change to
> utf-8, but I would rather change the defaults. The resulting (html)
> file can't be displayed in a browser (galeon) and text editors can't
> cope (in Nedit, for example, an è (e grave) becomes Ã¨).

There must be some misunderstanding.  If I remember right, 21.2
doesn't support UTF-16 OOTB.  Although Debian may have added a package
to do that.

Anyway, do these info nodes help: (info "(emacs)Recognize Coding") or
(info "(emacs)Coding Systems") ?

> Incidentally, what is this Ã¨ (Uppercase A with a tilde, followed by an
> orphaned umlaut)? Just garbage?

That is typically UTF-8, wrongly interpreted as Latin-1.


benny

Reply Benjamin 12/8/2003 2:46:03 PM
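
The effect described above (UTF-8 bytes read as Latin-1) can be
reproduced with a minimal Python 3 sketch. It is an illustration only
and not part of the original exchange; the variable names are made up.

```python
# U+00E8 is "è" (e grave), the character from the original post.
original = "\u00e8"

# UTF-8 stores this character as two bytes: 0xC3 0xA8.
utf8_bytes = original.encode("utf-8")

# A tool that assumes Latin-1 shows each byte as its own character:
# 0xC3 is "Ã" (A with tilde) and 0xA8 is "¨" (a lone diaeresis).
garbled = utf8_bytes.decode("latin-1")

# As long as no byte was lost, the mistake can be undone.
repaired = garbled.encode("latin-1").decode("utf-8")

print(utf8_bytes, garbled, repaired)
```

So the two-character sequence is not random garbage: it is the same
two bytes, displayed under the wrong assumption about the encoding.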


Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:

> There must be some misunderstanding.  If I remember right, 21.2
> doesn't support UTF-16 OOTB.  Although Debian may have added a package
> to do that.

All I can add is that under 21.2.1, when a document has an accented
character, and I save it, I'm prompted that default coding system is
something like utf-16-f (maybe f for French char?), and the result is
a 16-bit file that only emacs knows how to handle. I have to type in
utf-8 to get an exportable document. 

> Anyway, do these info nodes help: (info "(emacs)Recognize Coding") or
> (info "(emacs)Coding Systems") ?

I tried to follow your recommendation, but failed, I regret to
admit. I browsed the manuals with Info, but found nothing on "Coding"
or on "nodes." The Info menu has nothing on "nodes."

> > Incidentally, what is this Ã¨ (Uppercase A with a tilde, followed
> > by an orphaned umlaut)? Just garbage?
> 
> That is typically UTF-8, wrongly interpreted as Latin-1.

That's very useful to know, thanks. I often see this and wonder
what's up. 

-- 
      Haines Brown
	brownh@hartford-hwp.com
	kb1grm@arrl.net
	www.hartford-hwp.com
        
Reply Haines 12/8/2003 4:16:56 PM

Hi Haines,


> Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:
>> There must be some misunderstanding.  If I remember right, 21.2
>> doesn't support UTF-16 OOTB.  Although Debian may have added a
>> package to do that.

Haines Brown <brownh@teufel.hartford-hwp.com> writes:
> All I can add is that under 21.2.1, when a document has an accented
> character, and I save it, I'm prompted that default coding system is
> something like utf-16-f (maybe f for French char?), and the result
> is a 16-bit file that only emacs knows how to handle.

"Something like" can be of course anything.  The version of Emacs 21.2
that I just checked doesn't have a utf-16 coding system.  Your
distribution may have installed whatever it likes and done any
preconfiguration it likes.  You may also want to ask in a Debian
support group about this.

>> Anyway, do these info nodes help: (info "(emacs)Recognize Coding") or
>> (info "(emacs)Coding Systems") ?
>
> I tried to follow your recommendation, but failed, I regret to
> admit.

The tutorial says:

>>>>>>>
   C-h i	Read On-line Manuals (a.k.a. Info).  This command puts
		you into a special buffer called `*info*' where you
		can read on-line manuals for the packages installed on
		your system.  Type m emacs <Return> to read the Emacs
		manual.  If you have never before used Info, type ?
		and Emacs will take you on a guided tour of Info mode
		facilities.  Once you are through with this tutorial,
		you should consult the Emacs Info manual as your
		primary documentation.
<<<<<<

If you have never done the tutorial, you may want to do that now.
Even if you skip some bits that you don't need, it points to a number
of things that you may not yet know.

To get to the nodes that I mentioned, you just need to execute the
Lisp expressions that I wrote.  Put the cursor after the last closing
parenthesis and type C-x C-e.


Back to the original topic, why is it that Emacs is set up to use some
Unicode encoding and your other tools (you mentioned nedit) are not?
Maybe your locale isn't set up correctly?  It should specify the 8-bit
encoding to use, and that should be enough as a default for all your
applications.  (Note that I am not an expert in locales, much less on
Debian systems, so you want to find some reference for that, if you
want to pursue this angle.)


benny
Reply Benjamin 12/8/2003 6:01:51 PM

Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:

> Haines Brown <brownh@teufel.hartford-hwp.com> writes:
> > All I can add is that under 21.2.1, when a document has an accented
> > character, and I save it, I'm prompted that default coding system is
> > something like utf-16-f (maybe f for French char?), and the result
> > is a 16-bit file that only emacs knows how to handle.
> 
> "Something like" can be of course anything. 

Sorry about that ;-( I ran a test and see that it was utf-16-be (big
endian). 

> The version of Emacs 21.2
> that I just checked doesn't have a utf-16 coding system.  Your
> distribution may have installed whatever it likes and done any
> preconfiguration it likes.  You may also want to ask in a Debian
> support group about this.

Yes, thanks. I suspect I set locale up for utf-16. 
 
> >> Anyway, do these info nodes help: (info "(emacs)Recognize Coding") or
> >> (info "(emacs)Coding Systems") ?
> >
> > I tried to follow your recommendation, but failed, I regret to
> > admit.
> 
> The tutorial says:
> 
> >>>>>>>
>    C-h i	Read On-line Manuals (a.k.a. Info).  This command puts
> 		you into a special buffer called `*info*' where you
> 		can read on-line manuals for the packages installed on
> 		your system.  Type m emacs <Return> to read the Emacs
> 		manual.  If you have never before used Info, type ?
> 		and Emacs will take you on a guided tour of Info mode
> 		facilities.  Once you are through with this tutorial,
> 		you should consult the Emacs Info manual as your
> 		primary documentation.

Even with all your help, it was still a long hit-or-miss struggle. I
visit Info once every few months, and then promptly forget what I
learned. If used regularly, it would be easy, but I don't find it all
that intuitive. In any case, I did finally arrive at the proper manual
section. Thanks.  

-- 
      Haines Brown
	brownh@hartford-hwp.com
	kb1grm@arrl.net
	www.hartford-hwp.com
        
Reply Haines 12/8/2003 9:12:55 PM

Hi Haines,


It's good to know that you are making progress with your problem now.

>> Haines Brown <brownh@teufel.hartford-hwp.com> writes:
> I ran a test and see that it was utf-16-be (big endian).

That makes sense.  Well, limited sense, as it isn't what you want
:-((.

> Yes, thanks. I suspect I set locale up for utf-16. 

UTF-16 is, as the name says, a 16-bit encoding.  As I see it, that
doesn't make sense as a locale setting, because the locale is supposed
to tell you what to use for your standard 8-bit encoding.  On Unix,
all applications are supposed to handle 8-bit text for interchange.

> [About info] If used regularly, it would be easy, but I don't find
> it all that intuitive.

Maybe you *should* use it more regularly.  I find it a good investment
of my time to know and improve my tools, and Emacs is certainly one of
my most important tools.


benny
Reply Benjamin 12/8/2003 10:43:30 PM
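
To illustrate why a utf-16-be file trips up tools that expect 8-bit
text, here is a small Python 3 sketch. It is not from the thread, and
the sample string is a made-up HTML fragment.

```python
# Made-up HTML fragment containing "è" (U+00E8).
sample = "<p>\u00e8</p>"

# UTF-16 (big endian, no byte order mark): two bytes per character here,
# so every ASCII character is preceded by a zero byte.
as_utf16be = sample.encode("utf-16-be")

# UTF-8: ASCII characters stay single bytes, "è" takes two bytes.
as_utf8 = sample.encode("utf-8")

# Zero bytes are what most 8-bit text tools cannot cope with.
nul_count = as_utf16be.count(b"\x00")

print(len(sample), len(as_utf16be), len(as_utf8), nul_count)
print(as_utf16be)
print(as_utf8)
```

The UTF-8 version keeps the ASCII markup byte-for-byte unchanged, which
is why it is the safer choice for files shared with other programs.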

Thanks, Benny. This is just a follow up to your helpful note.

Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:

> >> Haines Brown <brownh@teufel.hartford-hwp.com> writes:
> > I ran a test and see that it was utf-16-be (big endian).
> 
> That makes sense.  Well, limited sense, as it isn't what you want
> :-((.

When I save a file that holds an accented character, I get: "No
default coding system to try." It then lists a couple dozen options to
choose from, including utf-8, which I now choose.

The minibuffer prompt reads: "Select coding system (default
utf-16-le):" Someone insisted this is not a default, but isn't it in
fact a default? Is utf-16-le the default solely because it is the most
universal of all possible coding systems (rather than being set
somewhere in emacs or debian)?   

> > Yes, thanks. I suspect I set locale up for utf-16. 
> 
> UTF-16 is, as the name says, a 16-bit encoding.  As I see it, that
> doesn't make sense as a locale setting, because the locale is supposed
> to tell you what to use for your standard 8-bit encoding.  On Unix,
> all applications are supposed to handle 8-bit text for interchange.

Yes, I now find my locales setup is broken (partially installed and
inoperative because of some problem with glibc), and the # locale
command does not report the encoding as it should. So my problem is
apparently with my locales setup in debian, not with emacs.
 
> > [About info] If used regularly, it would be easy, but I don't find
> > it all that intuitive.
> 
> Maybe you *should* use it more regularly.  I find it a good investment
> of my time to know and improve my tools, and Emacs is certainly one of
> my most important tools.

;-( Yes, true in principle. BUT, a) I've a dozen or so (yes, really)
more pressing fires to put out all the time, and b) I find I have no
time to invest (at my age, after 12 or so hours of hard work, my brain
turns into jello, and so nothing much left in there for investment).

-- 
      Haines Brown
	brownh@hartford-hwp.com
	kb1grm@arrl.net
	www.hartford-hwp.com
        
Reply Haines 12/9/2003 1:06:07 PM

Hi Haines,


Haines Brown <brownh@teufel.hartford-hwp.com> writes:
> Is utf-16-le the default solely because it is the most universal of
> all possible coding systems (rather than being set somewhere in
> emacs or debian)?

It is probably the default because the people packaging Emacs for
Debian thought so.  They had to install some additional package like
Mule-UCS to even get that coding system.


benny
Reply Benjamin 12/9/2003 2:22:21 PM

My original difficulty was that when saving a file in emacs, it
reported it could find no default coding system and asked which I
wanted. It suggested utf-16-le by default, but the resulting 16-bit
files are not understood by my other applications. So I need to define
utf-8 as the default. 

I had to take leave of this thread when I discovered that locales had
not been properly installed, and so my system lacked any default
coding system. After fixing it I get:

  $ locale
  LANG=en_US.UTF-8
  LC_CTYPE="en_US.UTF-8"
  LC_NUMERIC="en_US.UTF-8"
  LC_TIME="en_US.UTF-8"
  LC_COLLATE="en_US.UTF-8"
  LC_MONETARY="en_US.UTF-8"
  LC_MESSAGES="en_US.UTF-8"
  LC_PAPER="en_US.UTF-8"
  LC_NAME="en_US.UTF-8"
  LC_ADDRESS="en_US.UTF-8"
  LC_TELEPHONE="en_US.UTF-8"
  LC_MEASUREMENT="en_US.UTF-8"
  LC_IDENTIFICATION="en_US.UTF-8"
  LC_ALL=

I am running emacs 21.2.1 under debian 3.0. Although there have been
statements to the contrary, it is my understanding that this version
does handle MULE-UTC satisfactorily.

Based on reading the documentation, it is my impression that emacs
tries to figure out from a document being saved what an appropriate
coding system would be, based on a priority list. Otherwise, it looks
to the coding system established for the operating system. On my
system, this is now en_US and UTF-8.

Which of the above variables returned by $ locale does it use to find
out the system's coding system? 

If it is one of these variables, then I need to find out why emacs is
not finding it. 

My impression is that setting a value for
`default-buffer-file-coding-system' would be a work-around, but I'd
rather solve the problem than merely remove its symptoms.

-- 
      Haines Brown
	brownh@hartford-hwp.com
	kb1grm@arrl.net
	www.hartford-hwp.com
        
Reply Haines 12/14/2003 7:01:30 PM

Hi Haines,


Haines Brown <brownh@teufel.hartford-hwp.com> writes:
> I had to take leave of this thread

Hi again ;-)

> when I discovered that locales had not been properly installed, and
> so my system lacked any default coding system. After fixing it I
> get:
>
>   $ locale
>   LANG=en_US.UTF-8
>   LC_CTYPE="en_US.UTF-8"
>   [. . . ]

> I am running emacs 21.2.1 under debian 3.0. Although there have been
> statements to the contrary, it is my understanding that this version
> does handle MULE-UTC satisfactorily.

You mean Mule-UCS, I assume?  Yes, Emacs 21.2 should be able to load
that package, which gets you additional coding systems, like the
utf-16-le that you want to avoid.  That you have that coding system
indicates that the package is already installed, and that the Debian
maintainers added configuration to make its coding systems the
default.  How they did that exactly, we don't know yet.  As Mule-UCS
is an external package, these coding systems are not considered by the
stock Emacs by default automatically.

> Which of the above variables returned by $ locale does it use to
> find out the system's coding system?

That's all in the manual (the info pages).  Try searching for those
variables there.  If you don't know how to search in the manual,
*please* take the time to learn that.


benny
Reply Benjamin 12/15/2003 2:02:36 PM
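
As a rough guide to the question about the locale variables, the
following Python 3 sketch shows the usual POSIX convention for picking
the character encoding: the first non-empty value among LC_ALL,
LC_CTYPE and LANG decides. This is an assumption about the general
convention, not a statement of what Emacs 21.2 does internally (the
Emacs manual remains the authority for that), and the function names
are hypothetical.

```python
def effective_ctype_locale(env):
    """Return the first non-empty value of LC_ALL, LC_CTYPE, LANG."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return value
    # With nothing set, the default "C" locale applies.
    return "C"


def codeset(locale_name):
    """Extract the codeset part, e.g. 'en_US.UTF-8' -> 'UTF-8'."""
    base = locale_name.split("@", 1)[0]   # drop an optional @modifier
    if "." in base:
        return base.split(".", 1)[1]
    return None                           # no codeset given


# Settings matching the output quoted in the thread: LANG is set,
# LC_ALL is empty, and the other categories are derived from LANG.
env = {"LANG": "en_US.UTF-8", "LC_ALL": ""}

chosen = effective_ctype_locale(env)
print(chosen, codeset(chosen))
```

With the settings shown in the thread, this convention yields UTF-8,
which matches the expectation that a stock Emacs would default to it.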

Haines Brown <brownh@teufel.hartford-hwp.com> writes:

> My original difficulty was that when saving a file in emacs, it
> reported it could find no default coding system and asked which I
> wanted. It suggested utf-16-le by default, but the resulting 16-bit
> files are not understood by my other applications. So I need to define
> utf-8 as the default. 
>
[...]
>   $ locale
>   LANG=en_US.UTF-8
[...]
>
> I am running emacs 21.2.1 under debian 3.0. Although there have been
> statements to the contrary, it is my understanding that this version
> does handle MULE-UTC satisfactorily.
[...]

mule-ucs is an add-on package and it is possible that it is to blame
for your troubles. Please try to start Emacs from the command line
like this:

emacs -q --no-site-file

With your locale settings Emacs should save files in UTF-8 by default.

    Oliver
-- 
25 Frimaire an 212 de la Révolution
Liberté, Egalité, Fraternité!
Reply Oliver 12/15/2003 4:09:23 PM

Oliver Scholz <alkibiades@gmx.de> writes:

> Haines Brown <brownh@teufel.hartford-hwp.com> writes:
> 
> > I am running emacs 21.2.1 under debian 3.0. Although there have been
> > statements to the contrary, it is my understanding that this version
> > does handle MULE-UTC satisfactorily.
> [...]
> 
> mule-ucs is an add-on package and it is possible that it is to blame
> for your troubles. Please try to start Emacs from the command line
> like this:
> 
> emacs -q --no-site-file

I experimented and found that when I use the --no-site-file argument,
the problem is solved, but not with just the -q argument for the init
file.

So my next job was to find out what the "site-file" is. I took Benny
R's good advice and spent about three quarters of an hour learning to
search Info files, but the search did not turn up anything on
"(no-)site-file." I didn't see anything in the list of files
associated with emacs that looked like it.

I did find that there are such things as site-load.el and
site-init.el, but they are not on my machine. I found that there's a
/etc/emacs/site-start.el, but nothing in it affecting the coding
system. However, another script in /etc/emacs/site-start.d does
reference mule-ucs, which I do have in
/usr/share/emacs/site-lisp/debian-startup.el. While this file does not
look like it has anything relevant, the site-lisp directory does refer
to mule-ucs, but I'm in over my head here.

-- 
      Haines Brown
	brownh@hartford-hwp.com
	kb1grm@arrl.net
	www.hartford-hwp.com
        
Reply Haines 12/15/2003 5:45:02 PM

Haines Brown <brownh@teufel.hartford-hwp.com> writes:

> So my next job was to find out what the "site-file" is. I took Benny
> R's good advice and spent about three quarters of an hour learning to
> search Info files, but the search did not turn up anything on
> "(no-)site-file."

Really? The following worked for me:

C-h i       - start info
m Emacs     - select Emacs manual
i site-file - index search for "site-file".

> /etc/emacs/site-start.el, but nothing in it affecting the coding
> system. However, another script in /etc/emacs/site-start.d does
> reference mule-ucs, which I do have in
> /usr/share/emacs/site-lisp/debian-startup.el. While this file does not
> look like it has anything relevant, the site-lisp directory does refer
> to mule-ucs, but I'm in over my head here.

This sounds like Debian-specific modifications.
/etc/emacs/site-start.el will be either a symbolic link
to /usr/share/emacs/site-lisp/site-start.el, or /etc/emacs is added
to the load-path in the Debian version. debian-startup.el is probably
loaded from there, and all files in /etc/emacs/site-start.d are
probably loaded by debian-startup.el.
Reply jasonr 12/15/2003 10:11:35 PM

jasonr (Jason Rumney) @  f2s.com writes:

> Really? The following worked for me:
> 
> C-h i       - start info
> m Emacs     - select Emacs manual
> i site-file - index search for "site-file".

Yeah, it works for me too, _now_. I suspect I was trying to get
directly to an index without specifying which manual. Thanks for the
guidance. 

> > /etc/emacs/site-start.el, but nothing in it affecting the coding
> > system. However, another script in /etc/emacs/site-start.d does
> > reference mule-ucs, which I do have in
> > /usr/share/emacs/site-lisp/debian-startup.el. While this file does not
> > look like it has anything relevant, the site-lisp directory does refer
> > to mule-ucs, but I'm in over my head here.
> 
> This sounds like Debian-specific modifications.
> /etc/emacs/site-start.el will be either a symbolic link
> to /usr/share/emacs/site-lisp/site-start.el, or /etc/emacs is added
> to the load-path in the Debian version. debian-startup.el is probably
> loaded from there, and all files in /etc/emacs/site-start.d are
> probably loaded by debian-startup.el.

No help for it but to return to the debian group, I guess.

-- 
      Haines Brown
	brownh@hartford-hwp.com
	kb1grm@arrl.net
	www.hartford-hwp.com
        
Reply Haines 12/15/2003 10:34:30 PM

Hi Haines,


Ok, I just had some time on my hands and an Emacs 21.2 and Mule-UCS
lying around so I made a few experiments.

I can reproduce your problem with just initializing Mule-UCS with the
default procedure as documented in that package and triggering the
auto-detection process for a file.

It looks like Emacs is not considering the external UTF-8 coding
system for its defaults which are based on the locale settings (not
surprising, locale detection is probably running long before loading
Mule-UCS), and from what you are reporting, Mule-UCS probably doesn't
consider the locale settings at all.

The remedy on my system is

  (prefer-coding-system 'utf-8)

which you could put into your ~/.emacs. 

Does that help?


> jasonr (Jason Rumney) @  f2s.com writes:
>> C-h i       - start info
>> m Emacs     - select Emacs manual
>> i site-file - index search for "site-file".

Haines Brown <brownh@teufel.hartford-hwp.com> writes:
> Yeah, it works for me too, _now_. I suspect I was trying to get
> directly to an index without specifying which manual. Thanks for the
> guidance.

Hm, yes.  I see that the info help (C-h i ?) doesn't mention that the
info system is partitioned into separate manuals with separate indices
and separate search scopes.

That "page" also doesn't allow current standard keyboard navigation
(cursor, pgup, pgdown) and it doesn't allow copying text or printing
:-((.  That page could use a good overhaul in both content and
usability, I guess.


benny
Reply Benjamin 12/16/2003 11:32:45 AM

Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:

> It looks like Emacs is not considering the external UTF-8 coding
> system for its defaults which are based on the locale settings (not
> surprising, locale detection is probably running long before loading
> Mule-UCS), and from what you are reporting, Mule-UCS probably doesn't
> consider the locale settings at all.
> 
> The remedy on my system is
> 
>   (prefer-coding-system 'utf-8)
> 
> which you could put into your ~/.emacs. 
> 
> Does that help?

Yes it does, and thank you.

It seems you had to resort to this "remedy" as well. It sounds like
there's a bug somewhere. Is that your impression? If so, then I'm more
comfortable adding a work-around.

-- 
      Haines Brown
	brownh@hartford-hwp.com
	kb1grm@arrl.net
	www.hartford-hwp.com
        
Reply Haines 12/16/2003 12:17:38 PM

Hi Haines,


Haines Brown <brownh@teufel.hartford-hwp.com> writes:
> It seems you had to resort to this "remedy" as well. It sounds like
> there's a bug somewhere. Is that your impression? If so, then I'm
> more comfortable adding a work-around.

I think that we cannot expect any of the concerned development teams
to do much about this directly for various reasons.  Debian is the one
immediately responsible IMO.  But they will probably just update to
Emacs 21.3 sooner or later, which comes with its own support for
UTF-8, and where this will then work OOTB.


benny
Reply Benjamin 12/16/2003 12:47:35 PM
