String copy-on-write question



Hello group,

Ruby implements copy-on-write for strings, so you can do stuff like
this very cheaply:

   str = 0.chr * (2**24)  # 16MiB allocated
   str[100..-1]  # this costs only a small amount of memory

Why does this optimization not apply in this case?

  str[100..-2]   # this costs around 16 MiB of memory
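
One rough way to compare the two slices is to ask ObjectSpace.memsize_of how much memory each one owns. This is only a sketch: the value is documented as a hint, and what it reports depends on the Ruby version and build, so no expected output is shown.

```ruby
require 'objspace'

str    = 0.chr * (2**24)   # 16 MiB buffer, as in the example above
tail   = str[100..-1]      # runs to the end of str
middle = str[100..-2]      # stops one byte before the end of str

# memsize_of is an estimate of what each object owns itself;
# a slice that shares its parent's buffer should report far less.
puts "tail:   #{ObjectSpace.memsize_of(tail)} bytes"
puts "middle: #{ObjectSpace.memsize_of(middle)} bytes"
```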

As a side effect, if using regexps on a large string, the pre-match
and post-match variables behave differently:

  # About 16 MiB allocated (after GC)
  s = 0.chr * (2**23) + "Hello" + 0.chr * (2**23)
  s.scan(/Hello/) { |m| p m }   # This is free
  p $'.size  # This is free
  p $`.size  # This costs another 8MiB.

Any insights?

Lars
Reply larsch (43) 5/5/2008 3:37:02 PM

Lars Christensen wrote:

 Well, it's best if you look at rb_str_substr() in string.c

>    str[100..-1]  # this costs only a small amount of memory

 Ruby just needs to adjust the pointer and the length in the new
 object.

>   str[100..-2]   # this costs around 16 MiB of memory

 One character is missing from the end of the previous string. If Ruby
 did the same thing as before, it would have to
  * adjust the pointer
  * adjust the length
  * add \0 at the end

 This means that it would inevitably modify the original string, which
 is why it duplicates it instead.
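
The rule described above can be sketched with a toy model. This is not MRI's implementation: ToyString and all of its methods are hypothetical names made up for illustration, and the buffer is a plain Array of bytes that always ends with a 0 terminator.

```ruby
# Toy model only: illustrates "share when the slice ends at the
# terminator, copy otherwise". All names here are hypothetical.
class ToyString
  attr_reader :buffer, :offset, :length

  def initialize(buffer, offset, length)
    @buffer = buffer   # Array of byte values ending with 0
    @offset = offset   # where this string starts in the buffer
    @length = length   # number of bytes, not counting the terminator
  end

  def self.from(text)
    new(text.bytes + [0], 0, text.bytesize)
  end

  # Substring of +len+ bytes starting at +start+.
  def substr(start, len)
    if start + len == length
      # Ends where the original ends: the existing terminator can be
      # reused, so only the pointer and length change.
      ToyString.new(buffer, offset + start, len)
    else
      # A terminator would have to be written into the shared buffer,
      # so copy the bytes and terminate the copy instead.
      ToyString.new(buffer[offset + start, len] + [0], 0, len)
    end
  end

  def shares_buffer_with?(other)
    buffer.equal?(other.buffer)
  end

  def to_s
    buffer[offset, length].pack("C*")
  end
end

original = ToyString.from("hello world")
tail     = original.substr(6, 5)   # "world", runs to the end
middle   = original.substr(6, 4)   # "worl", stops one byte early

puts "tail shares buffer:   #{tail.shares_buffer_with?(original)}"
puts "middle shares buffer: #{middle.shares_buffer_with?(original)}"
```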

>   p $'.size  # This is free
>   p $`.size  # This costs another 8MiB.

 Same reason here: $' runs to the end of the string, while $` ends
 in the middle of it.
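
To make the regexp case concrete, here is a small sketch using MatchData#pre_match and MatchData#post_match (the same strings that $` and $' return). The short sample string is made up for illustration; only the positions matter.

```ruby
s = "aaaa" + "Hello" + "bbbb"
m = s.match(/Hello/)

# Byte position where each piece ends inside s.
pre_end  = m.pre_match.bytesize                # ends where the match begins
post_end = m.end(0) + m.post_match.bytesize    # ends where s ends

puts "pre-match ends at byte #{pre_end} of #{s.bytesize}"
puts "post-match ends at byte #{post_end} of #{s.bytesize}"
```

Only the post-match ends at the end of the original string, so by the reasoning above only it can reuse the existing terminator.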


Guy Decoux

Reply decoux (1351) 5/5/2008 4:07:09 PM


On 05.05.2008 18:07, ts wrote:
> Lars Christensen wrote:
> 
>  Well, it's best if you look at rb_str_substr() in string.c
> 
>>    str[100..-1]  # this costs only a small amount of memory
> 
>  Ruby just needs to adjust the pointer and the length in the new
>  object.
> 
>>   str[100..-2]   # this costs around 16 MiB of memory
> 
>  One character is missing from the end of the previous string. If Ruby
>  did the same thing as before, it would have to
>   * adjust the pointer
>   * adjust the length
>   * add \0 at the end
> 
>  This means that it would inevitably modify the original string, which
>  is why it duplicates it instead.
> 
>>   p $'.size  # This is free
>>   p $`.size  # This costs another 8MiB.
> 
>  same reason here.

Interesting.  Do you also happen to know why an additional field that
stores the length is not used?  Is the reason maybe the use of C library
string functions that work on zero-terminated strings?

Cheers

	robert
Reply shortcutter (5766) 5/5/2008 4:15:38 PM

Robert Klemme wrote:
> Interesting.  Do you also happen to know why an additional field that
> stores the length is not used?

 I don't understand: it does have a field which gives the length of
 the string. For example, with

  str = '0' * 200
  str[100 .. -1]

 the first object (in str) will have 200 as its length, and
 the length field in the new object will have the value 100.
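
A quick check of the lengths from the Ruby side. The last string, with an embedded NUL byte, is an added illustration that the reported length does not come from scanning for a terminator.

```ruby
str  = '0' * 200
tail = str[100..-1]

# A string containing a NUL byte in the middle.
with_nul = "ab\0cd"

puts str.length        # length of the original object
puts tail.length       # length of the new object
puts with_nul.length   # counts all bytes, including the NUL
```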
  
>                              Is the reason maybe usage of C library 
> string functions that work on zero terminated strings?

 Only matz knows this :-)


Guy Decoux



Reply decoux (1351) 5/5/2008 4:33:52 PM

On 05.05.2008 18:33, ts wrote:
> Robert Klemme wrote:
>> Interesting.  Do you also happen to know why an additional field that
>> stores the length is not used?
> 
>  I don't understand: it does have a field which gives the length of
>  the string. For example, with

Ah, ok.  This happens when one is too lazy to look into the source. :-)
Somehow I had assumed that the length was not stored because you made
the point that the \0 could not be inserted without altering the
original.  I concluded that there is no length field. :-)

>   str = '0' * 200
>   str[100 .. -1]
> 
>  the first object (in str) will have 200 as its length, and
>  the length field in the new object will have the value 100.
>   
>>                              Is the reason maybe usage of C library 
>> string functions that work on zero terminated strings?
> 
>  Only matz knows this :-)

Well, maybe he'll stop by and enlighten us.

Kind regards

	robert
Reply shortcutter (5766) 5/5/2008 4:46:15 PM
