Hi, when I create a Ruby String from a C extension by using "rb_str_new(s, len)"
I get a String with US-ASCII encoding.
I don't want to call String#force_encoding("UTF-8") afterwards; instead I'd like
to use the rb_enc_str_new() function in string.c:
VALUE
rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
{
    VALUE str = rb_str_new(ptr, len);
    rb_enc_associate(str, enc);
    return str;
}
But I have no idea how to set the 'enc' parameter to "UTF-8".
What should I pass as the third 'enc' argument?
Thanks a lot.
-- 
Iñaki Baz Castillo <ibc@aliax.net>

(Posted by ibc, 12/2/2009 3:17:08 PM)

Iñaki Baz Castillo wrote:
> VALUE
> rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
> {
> VALUE str = rb_str_new(ptr, len);
> rb_enc_associate(str, enc);
> return str;
> }
>
> But I have no idea on how to set 'enc' parameter to be "UTF-8".
> How should I fill the third 'enc' argument?
I'd say give it a pointer to an rb_encoding object.
Have a look in encoding.c, this particular function might be useful:
rb_encoding *
rb_enc_find(const char *name)
{
    int idx = rb_enc_find_index(name);
    if (idx < 0) idx = 0;
    return rb_enc_from_index(idx);
}
--
Posted via http://www.ruby-forum.com/.

(Posted by Brian, 12/2/2009 4:45:29 PM)

On Wednesday, 2 December 2009, Brian Candler wrote:
> Iñaki Baz Castillo wrote:
> > VALUE
> > rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
> > {
> > VALUE str = rb_str_new(ptr, len);
> > rb_enc_associate(str, enc);
> > return str;
> > }
> >
> > But I have no idea on how to set 'enc' parameter to be "UTF-8".
> > How should I fill the third 'enc' argument?
>
> I'd say give it a pointer to an rb_encoding object.
>
> Have a look in encoding.c, this particular function might be useful:
>
> rb_encoding *
> rb_enc_find(const char *name)
> {
> int idx = rb_enc_find_index(name);
> if (idx < 0) idx = 0;
> return rb_enc_from_index(idx);
> }
Hmm, it involves allocating memory for the rb_encoding object and so on... not
as trivial as I had hoped :)
But that's the way. Thanks a lot.
-- 
Iñaki Baz Castillo <ibc@aliax.net>

(Posted by utf, 12/2/2009 5:03:21 PM)

Iñaki Baz Castillo wrote:
> On Wednesday, 2 December 2009, Brian Candler wrote:
>> > How should I fill the third 'enc' argument?
>> return rb_enc_from_index(idx);
>> }
>
> Humm, it involves allocating memory for the rb_encoding object
Why? AFAICS, you can just pass a pointer to an existing encoding object.
They are not mutated.
There are other examples, e.g. from io.c:
#ifdef _WIN32
    if (utf16 == (rb_encoding *)-1) {
        utf16 = rb_enc_find("UTF-16LE");
        if (utf16 == rb_ascii8bit_encoding())
            utf16 = NULL;
    }
    if (utf16) {
        VALUE wfname = rb_str_encode(fname, rb_enc_from_encoding(utf16),
                                     0,
                                     Qnil);
        rb_enc_str_buf_cat(wfname, "", 1, utf16); /* workaround */
        data.fname = RSTRING_PTR(wfname);
        data.wchar = 1;
    }
    else {
        data.wchar = 0;
    }
#endif
It looks like rb_enc_from_encoding() takes the pointer to the rb_encoding
object returned from rb_enc_find() and turns it into a VALUE.
--
Posted via http://www.ruby-forum.com/.

(Posted by Brian, 12/2/2009 5:59:17 PM)

On Wednesday, 2 December 2009, Brian Candler wrote:
> Iñaki Baz Castillo wrote:
> > On Wednesday, 2 December 2009, Brian Candler wrote:
> >> > How should I fill the third 'enc' argument?
> >>
> >> return rb_enc_from_index(idx);
> >> }
> >
> > Humm, it involves allocating memory for the rb_encoding object
>
> Why? AFAICS, you can just pass a pointer to an existing encoding object.
> They are not mutated.
>
> There are other examples, e.g. from io.c
>
> #ifdef _WIN32
> if (utf16 == (rb_encoding *)-1) {
> utf16 = rb_enc_find("UTF-16LE");
> if (utf16 == rb_ascii8bit_encoding())
> utf16 = NULL;
> }
> if (utf16) {
> VALUE wfname = rb_str_encode(fname, rb_enc_from_encoding(utf16),
> 0,
> Qnil);
> rb_enc_str_buf_cat(wfname, "", 1, utf16); /* workaround */
> data.fname = RSTRING_PTR(wfname);
> data.wchar = 1;
> }
> else {
> data.wchar = 0;
> }
> #endif
>
> It looks like rb_enc_from_encoding() takes a pointer to the rb_encoding
> object returned from rb_enc_find, and turns it into a VALUE
OK, so the rb_encoding objects already exist and I just have to use a pointer
to one of them.
Thanks a lot.
-- 
Iñaki Baz Castillo <ibc@aliax.net>

(Posted by utf, 12/2/2009 10:01:14 PM)