How to use rb_enc_str_new() to create a String with UTF-8 encoding?


Hi, when I create a Ruby String from a C extension by using "rb_str_new(s,
len)" I get a String with US-ASCII encoding.

I don't want to call String#force_encoding("UTF-8") afterwards, but instead use
the rb_enc_str_new() function in string.c:


  VALUE
  rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
  {
      VALUE str = rb_str_new(ptr, len);
      rb_enc_associate(str, enc);
      return str;
  }

But I have no idea on how to set 'enc' parameter to be "UTF-8".
How should I fill the third 'enc' argument?

Thanks a lot.

-- 
Iñaki Baz Castillo <ibc@aliax.net>

Reply ibc (607) 12/2/2009 3:17:08 PM

Iñaki Baz Castillo wrote:
>   VALUE
>   rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
>   {
>       VALUE str = rb_str_new(ptr, len);
>       rb_enc_associate(str, enc);
>       return str;
>   }
> 
> But I have no idea on how to set 'enc' parameter to be "UTF-8".
> How should I fill the third 'enc' argument?

I'd say give it a pointer to an rb_encoding object.

Have a look in encoding.c, this particular function might be useful:

rb_encoding *
rb_enc_find(const char *name)
{
    int idx = rb_enc_find_index(name);
    if (idx < 0) idx = 0;
    return rb_enc_from_index(idx);
}

-- 
Posted via http://www.ruby-forum.com/.

Reply Brian 12/2/2009 4:45:29 PM


On Wednesday, 2 December 2009, Brian Candler wrote:
> Iñaki Baz Castillo wrote:
> >   VALUE
> >   rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
> >   {
> >       VALUE str = rb_str_new(ptr, len);
> >       rb_enc_associate(str, enc);
> >       return str;
> >   }
> >
> > But I have no idea on how to set 'enc' parameter to be "UTF-8".
> > How should I fill the third 'enc' argument?
>
> I'd say give it a pointer to an rb_encoding object.
>
> Have a look in encoding.c, this particular function might be useful:
>
> rb_encoding *
> rb_enc_find(const char *name)
> {
>     int idx = rb_enc_find_index(name);
>     if (idx < 0) idx = 0;
>     return rb_enc_from_index(idx);
> }

Humm, it involves allocating memory for the rb_encoding object and so... not
so trivial as I desired :)
But that's the way. Thanks a lot.


-- 
Iñaki Baz Castillo <ibc@aliax.net>

Reply utf 12/2/2009 5:03:21 PM

Iñaki Baz Castillo wrote:
> On Wednesday, 2 December 2009, Brian Candler wrote:
>> > How should I fill the third 'enc' argument?
>>     return rb_enc_from_index(idx);
>> }
> 
> Humm, it involves allocating memory for the rb_encoding object

Why? AFAICS, you can just pass a pointer to an existing encoding object. 
They are not mutated.

There are other examples, e.g. from io.c

#ifdef _WIN32
    if (utf16 == (rb_encoding *)-1) {
        utf16 = rb_enc_find("UTF-16LE");
        if (utf16 == rb_ascii8bit_encoding())
            utf16 = NULL;
    }
    if (utf16) {
        VALUE wfname = rb_str_encode(fname, rb_enc_from_encoding(utf16), 0,
                                     Qnil);
        rb_enc_str_buf_cat(wfname, "", 1, utf16); /* workaround */
        data.fname = RSTRING_PTR(wfname);
        data.wchar = 1;
    }
    else {
        data.wchar = 0;
    }
#endif

It looks like rb_enc_from_encoding() takes a pointer to the rb_encoding 
object returned from rb_enc_find(), and turns it into a VALUE (the 
corresponding Ruby-level Encoding object).
-- 
Posted via http://www.ruby-forum.com/.

Reply Brian 12/2/2009 5:59:17 PM

On Wednesday, 2 December 2009, Brian Candler wrote:
> Iñaki Baz Castillo wrote:
> > On Wednesday, 2 December 2009, Brian Candler wrote:
> >> > How should I fill the third 'enc' argument?
> >>
> >>     return rb_enc_from_index(idx);
> >> }
> >
> > Humm, it involves allocating memory for the rb_encoding object
>
> Why? AFAICS, you can just pass a pointer to an existing encoding object.
> They are not mutated.
>
> There are other examples, e.g. from io.c
>
> #ifdef _WIN32
>     if (utf16 == (rb_encoding *)-1) {
>         utf16 = rb_enc_find("UTF-16LE");
>         if (utf16 == rb_ascii8bit_encoding())
>             utf16 = NULL;
>     }
>     if (utf16) {
>         VALUE wfname = rb_str_encode(fname, rb_enc_from_encoding(utf16), 0,
>                                      Qnil);
>         rb_enc_str_buf_cat(wfname, "", 1, utf16); /* workaround */
>         data.fname = RSTRING_PTR(wfname);
>         data.wchar = 1;
>     }
>     else {
>         data.wchar = 0;
>     }
> #endif
>
> It looks like rb_enc_from_encoding() takes a pointer to the rb_encoding
> object returned from rb_enc_find, and turns it into a VALUE

Ok, so the rb_encoding objects already exist and I just have to use a pointer
to one of them.
Thanks a lot.


-- 
Iñaki Baz Castillo <ibc@aliax.net>

Reply utf 12/2/2009 10:01:14 PM
