Bizarre Range behavior

Can someone please explain this behavior in ruby (1.8.6p111):

>> ("2"..."8").to_a
=> ["2", "3", "4", "5", "6", "7"]
>> ("2".."8").to_a
=> ["2", "3", "4", "5", "6", "7", "8"]
>> ("2".."9").to_a
=> ["2", "3", "4", "5", "6", "7", "8", "9"]
>> ("2".."10").to_a
=> []
>> ("2".."11").to_a
=> []
>> ("1".."11").to_a
=> ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]


Cheers,
Scott
-- 
Posted via http://www.ruby-forum.com/.

scott.br (4) 8/4/2009 6:47:05 PM

On Aug 4, 1:47 pm, Scott Briggs <scott...@gmail.com> wrote:
> Can someone please explain this behavior in ruby (1.8.6p111):
>
> >> ("2"..."8").to_a
> => ["2", "3", "4", "5", "6", "7"]
> >> ("2".."8").to_a
> => ["2", "3", "4", "5", "6", "7", "8"]
> >> ("2".."9").to_a
> => ["2", "3", "4", "5", "6", "7", "8", "9"]
> >> ("2".."10").to_a
> => []
> >> ("2".."11").to_a
> => []
> >> ("1".."11").to_a
> => ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]


It gets better.

    >> ("100".."11").to_a
    => ["100"]

It seems you're running not so much into strange Range behavior as
strange String behavior in certain numeric circumstances. Or maybe a
combination of strange Range and String behavior. If you want the
ranges to make more sense, use actual numbers.

    >> (2..11).to_a
    => [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

If you want strings in the result, you can get that with a little bit
of work.

    >> (2..11).to_a.map { |x|  x.to_s }
    => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
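If the endpoints arrive as strings (say, from user input), this advice can be applied by converting them before building the range. A minimal sketch: `numeric_string_range` is a hypothetical helper name, it assumes both strings are plain base-10 integers, and it assumes a Ruby whose `Integer()` accepts a base argument. `Integer(s, 10)` raises `ArgumentError` on non-numeric input instead of guessing.

```ruby
# Hypothetical helper: convert string endpoints to integers, build a
# numeric range, then map each number back to a string.
# Integer(s, 10) parses base 10 (so "08" is 8) and raises ArgumentError
# for input such as "two" instead of silently returning 0.
def numeric_string_range(first, last)
  (Integer(first, 10)..Integer(last, 10)).map { |n| n.to_s }
end

p numeric_string_range("2", "11")
# => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
```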

--
-yossef

ymendel (158) 8/4/2009 7:04:57 PM


On Wed, 5 Aug 2009, Scott Briggs wrote:

> Can someone please explain this behavior in ruby (1.8.6p111):
>
>>> ("2"..."8").to_a
> => ["2", "3", "4", "5", "6", "7"]
>>> ("2".."8").to_a
> => ["2", "3", "4", "5", "6", "7", "8"]
>>> ("2".."9").to_a
> => ["2", "3", "4", "5", "6", "7", "8", "9"]
>>> ("2".."10").to_a
> => []
>>> ("2".."11").to_a
> => []
>>> ("1".."11").to_a
> => ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
>

It's because you're using strings -- "11" comes before "2", hence the 
failure: it's an invalid range, just as (11 .. 2) is invalid.
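A quick way to see the ordering described here: Ruby compares strings character by character, so "11" sorts before "2" for the same reason "ab" sorts before "b". The sample list below is made up for illustration.

```ruby
# String#<=> compares character by character; the first differing
# character decides, and a proper prefix sorts before the longer string.
p "11" <=> "2"   # -1: "1" < "2" at the first character
p "1" <=> "11"   # -1: "1" is a prefix of "11"

words = %w[2 10 11 9 1]
p words.sort                         # => ["1", "10", "11", "2", "9"]

# Ordering by numeric value needs an explicit conversion.
p words.sort_by { |s| s.to_i }       # => ["1", "2", "9", "10", "11"]
```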

Matt

matt2482 (43) 8/4/2009 7:15:04 PM

Matt, that doesn't explain why "1".."11" works and "2".."11" doesn't 
work.

Scott

Matthew K. Williams wrote:
> On Wed, 5 Aug 2009, Scott Briggs wrote:
> 
>>>> ("2".."11").to_a
>> => []
>>>> ("1".."11").to_a
>> => ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
>>
> 
> It's because you're using strings -- "11" comes before "2", hence the
> failure, because it's an invalid range, just as if you had (11 .. 2) is
> invalid.
> 
> Matt

scott.br (4) 8/4/2009 7:26:52 PM

Ah, I should clarify that.  If ruby interprets "11" as the integer 11 
for "1".."11", why doesn't it do the same for "2".."11"?

Scott

Scott Briggs wrote:
> Matt, that doesn't explain why "1".."11" works and "2".."11" doesn't 
> work.
> 
> Scott
> 
> Matthew K. Williams wrote:
>> On Wed, 5 Aug 2009, Scott Briggs wrote:
>> 
>>>>> ("2".."11").to_a
>>> => []
>>>>> ("1".."11").to_a
>>> => ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
>>>
>> 
>> It's because you're using strings -- "11" comes before "2", hence the
>> failure, because it's an invalid range, just as if you had (11 .. 2) is
>> invalid.
>> 
>> Matt

scott.br (4) 8/4/2009 7:29:24 PM

On Aug 4, 2009, at 3:04 PM, Yossef Mendelssohn wrote:
> On Aug 4, 1:47 pm, Scott Briggs <scott...@gmail.com> wrote:
>> Can someone please explain this behavior in ruby (1.8.6p111):
>>
>>>> ("2"..."8").to_a
>>
>> => ["2", "3", "4", "5", "6", "7"]>> ("2".."8").to_a
>>
>> => ["2", "3", "4", "5", "6", "7", "8"]>> ("2".."9").to_a
>>
>> => ["2", "3", "4", "5", "6", "7", "8", "9"]>> ("2".."10").to_a
>> => []
>>>> ("2".."11").to_a
>> => []
>>>> ("1".."11").to_a
>>
>> => ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]

Well, you need to think about String#succ when the Range endpoints are  
Strings.
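A few String#succ results that matter for these ranges, as a quick reference: the rightmost alphanumeric character is incremented with carry, and a carry out of the leftmost character makes the string one character longer.

```ruby
# String#succ increments the rightmost alphanumeric character and
# carries leftward; a carry out of the leftmost character adds one.
%w[2 9 19 99 z az zz].each do |s|
  puts "#{s.inspect}.succ => #{s.succ.inspect}"
end
# "2".succ => "3"
# "9".succ => "10"
# "19".succ => "20"
# "99".succ => "100"
# "z".succ => "aa"
# "az".succ => "ba"
# "zz".succ => "aaa"
```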

>
>
> It gets better.
>
>>> ("100".."11").to_a
>    => ["100"]

Now, that one is odd. I'd have predicted a result of:
  => ["100", "101", "102", "103", "104", "105", "106", "107", "108",  
"109"]
on the basis of starting with "100" and applying #succ until the value  
was > "11", as this loop does:

a = []
v = "100"
loop do
   break if v > "11"   # stop once v sorts after the end string
   a << v
   v = v.succ          # next string in #succ order
end
p a

This loop produced the "right" result for "2".."11" (namely an empty  
array) so the actual result defies (my) explanation.

>
> It seems you're running not so much into strange Range behavior as
> strange String behavior in certain numeric circumstances. Or maybe a
> combination of strange Range and String behavior. If you want the
> ranges to make more sense, use actual numbers.
>
>>> (2..11).to_a
>    => [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>
> If you want strings in the result, you can get that with a little bit
> of work.
>
>>> (2..11).to_a.map { |x|  x.to_s }
>    => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
>
> --
> -yossef



Of course, you can also do things like:

("a".."ah").to_a
=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",  
"n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "aa",  
"ab", "ac", "ad", "ae", "af", "ag", "ah"]

Which might help label your spreadsheet columns.
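Along the lines of that spreadsheet remark, a minimal sketch that builds the first n column labels with String#succ. `column_labels` is a hypothetical helper, and uppercase labels starting at "A" are an assumption.

```ruby
# Hypothetical helper: the first n spreadsheet-style column labels,
# produced by repeatedly applying String#succ starting from "A".
def column_labels(n)
  labels = []
  label = "A"
  n.times do
    labels << label
    label = label.succ   # "Z".succ == "AA", "AZ".succ == "BA"
  end
  labels
end

p column_labels(28).last(3)   # => ["Z", "AA", "AB"]
```

When the last label is known in advance, ("A".."AB").to_a gives the same list.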

-Rob

Rob Biedenharn		http://agileconsultingllc.com
Rob@AgileConsultingLLC.com



Rob7461 (595) 8/4/2009 7:38:33 PM

On Wed, 5 Aug 2009, Scott Briggs wrote:

> Matt, that doesn't explain why "1".."11" works and "2".."11" doesn't
> work.

irb(main):015:0> "1" < "11"
=> true
irb(main):016:0> "2" < "11"
=> false


irb(main):021:0> "11" < "2"
=> true

This is true because it's comparing strings to get the range -- it 
compares the strings one character at a time, and the first position 
where they differ (here "1" vs "2") decides the order, no matter how 
long the strings are.

Try this for an example of how the expansion is occurring:

("a".."cat").to_a

(I'm only putting a portion of it here)

=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", 
"o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "aa", "ab",
....
"caa", "cab", "cac", "cad", "cae", "caf", "cag", "cah", "cai", "caj", 
"cak", "cal", "cam", "can", "cao", "cap", "caq", "car", "cas", "cat"]

In string order, it's going to compare strings of length 1 first, then 
strings of length 2, etc...  Here's another example (with an attempt at an 
explanation):

irb(main):019:0> ("11" .. "2").to_a
=> ["11"]

As we've seen before, "11" < "2", so "11" is part of the range; but its 
successor, "12", is already longer than "2", so it stops and we're done.
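Both effects can be checked by hand: #succ runs through every string of one length before moving to the next length, and the "11".."2" case stops after one element because the successor of "11" is already longer than the end string "2". The count below is simple arithmetic on 26 letters.

```ruby
# Strings from "a" up to "cat" in #succ order:
# 26 one-letter, 26**2 = 676 two-letter, then "aaa".."cat" (1372).
labels = ("a".."cat").to_a
p labels.size              # => 2074
p labels.first(3)          # => ["a", "b", "c"]
p labels.last              # => "cat"

# The "11".."2" case: "11" sorts before "2", but its successor
# is already longer than the end string.
p "11" < "2"                       # => true
p "11".succ                        # => "12"
p "11".succ.length > "2".length    # => true
```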

Matt

>
> Scott
>
> Matthew K. Williams wrote:
>> On Wed, 5 Aug 2009, Scott Briggs wrote:
>>
>>>>> ("2".."11").to_a
>>> => []
>>>>> ("1".."11").to_a
>>> => ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
>>>
>>
>> It's because you're using strings -- "11" comes before "2", hence the
>> failure, because it's an invalid range, just as if you had (11 .. 2) is
>> invalid.
>>
>> Matt
>

matt2482 (43) 8/4/2009 7:40:10 PM

On Aug 4, 2009, at 3:15 PM, Matthew K. Williams wrote:
> On Wed, 5 Aug 2009, Scott Briggs wrote:
> Can someone please explain this behavior in ruby (1.8.6p111):
>>
>>>> ("2"..."8").to_a
>> => ["2", "3", "4", "5", "6", "7"]
>>>> ("2".."8").to_a
>> => ["2", "3", "4", "5", "6", "7", "8"]
>>>> ("2".."9").to_a
>> => ["2", "3", "4", "5", "6", "7", "8", "9"]
>>>> ("2".."10").to_a
>> => []
>>>> ("2".."11").to_a
>> => []
>>>> ("1".."11").to_a
>> => ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
>
> It's because you're using strings -- "11" comes before "2", hence  
> the failure, because it's an invalid range, just as if you had  
> (11 .. 2) is invalid.
>
> Matt


Well, it certainly isn't invalid. You can easily have a Range where  
the end is less than the begin value.

irb> r = 3..-1
=> 3..-1
irb> r.to_a
=> []
irb> "hello"[r]
=> "lo"
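The same example in runnable form: a Range whose end is below its begin is still a valid object; iterating it simply yields nothing, while String#[] reads a negative end as counting back from the last character.

```ruby
r = 3..-1
p r.begin, r.end     # prints 3 and -1 on separate lines
p r.to_a             # => [] (nothing between 3 and -1 going upward)
p "hello"[r]         # => "lo" (index 3 through the last character)
p((11..2).to_a)      # => [] (a "backwards" range is valid, just empty)
```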

-Rob

Rob7461 (595) 8/4/2009 7:42:53 PM

On Wed, 5 Aug 2009, Rob Biedenharn wrote:

>> It gets better.
>> 
>>>> ("100".."11").to_a
>>    => ["100"]
>
> Now, that one is odd. I'd have predicted a result of:
> => ["100", "101", "102", "103", "104", "105", "106", "107", "108", "109"]
> on the basis of starting with "100" and applying #succ until the value
> was > "11" like this loop does:
>

It's doing a comparison of the strings -- it has to do with the 
length of the string.  "100" is longer than "11", but as a string it 
still compares as "less" than "11".

In order to find the range, it's going to compare the two strings --

+ It compares the two strings to decide whether the beginning is less 
than the end

+ It then uses #succ to try to expand the range, but since "100" has more 
characters than "11", it stops...

Hope I've not muddied it too much.....

Matt

matt2482 (43) 8/4/2009 7:45:56 PM

On Wed, 5 Aug 2009, Rob Biedenharn wrote:

> Well, it certainly isn't invalid. You can easily have a Range where the end 
> is less than the begin value.
>
> r = 3..-1
> => 3..-1
> irb> r.to_a
> => []
> irb> "hello"[r]
> => "lo"

I guess the code for substring treats it differently than #to_a -- just 
taking the bounds.  Huh.  That's pretty interesting.  Learn something 
every day.  Makes sense when I stop to think about it, though.

Just don't try "hello"[3,-1]....
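For comparison, the start/length form of String#[] does not reinterpret a negative length the way the Range form reinterprets a negative end; it returns nil:

```ruby
s = "hello"
p s[3..-1]   # => "lo"  (range: index 3 through the last character)
p s[3, 2]    # => "lo"  (start 3, length 2)
p s[3, -1]   # => nil   (a negative length is rejected)
```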

I need to read the rdocs more often....
Matt

matt2482 (43) 8/4/2009 8:04:39 PM

On Aug 4, 2009, at 3:45 PM, Matthew K. Williams wrote:
> On Wed, 5 Aug 2009, Rob Biedenharn wrote:
>>> It gets better.
>>>>> ("100".."11").to_a
>>>   => ["100"]
>>
>> Now, that one is odd. I'd have predicted a result of:
>> => ["100", "101", "102", "103", "104", "105", "106", "107", "108",  
>> "109"]
>> on the basis of starting with "100" and applying #succ until the
>> value was > "11" like this loop does:
>>
>
> It's doing a comparison of the strings -- it has to do with the  
> length of the string.  "100" is longer than "11", it also happens to  
> be less characters (and, based on #succ, it's "less").
>
> In order to find the range, it's going to compare the two strings --
>
> +  it compares for the string lengths to get whether the beginning  
> is less than the end
>
> + It then uses #succ to try to expand the range, but since "100" has  
> more characters than "11", it stops...
>
> Hope I've not muddied it too much.....
>
> Matt


Well, Range#to_a is actually Enumerable#to_a, which uses Range#each,  
defined in range.c.

After checking that the beginning of the range responds to :succ, and  
whether it is a Fixnum (Fixnums are handled specially), it finds that  
Range#begin is a String:

     else if (TYPE(beg) == T_STRING) {
	VALUE args[5];
	long iter[2];

	args[0] = beg;
	args[1] = end;
	args[2] = range;
	iter[0] = 1;
	iter[1] = 1;
	rb_iterate(str_step, (VALUE)args, step_i, (VALUE)iter);
     }

str_step calls rb_str_upto, defined in string.c:
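At the Ruby level this String-stepping logic is what String#upto exposes, so a String range and upto on its first endpoint walk the same sequence. A small check with letter endpoints, assuming a Ruby recent enough that a block-less upto returns an Enumerator and upto accepts the exclusive flag:

```ruby
# Range#each over String endpoints and String#upto walk the same
# #succ sequence; upto's optional second argument excludes the end.
p(("a".."e").to_a)          # => ["a", "b", "c", "d", "e"]
p "a".upto("e").to_a        # => ["a", "b", "c", "d", "e"]
p(("a"..."e").to_a)         # => ["a", "b", "c", "d"]
p "a".upto("e", true).to_a  # => ["a", "b", "c", "d"]
```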

VALUE
rb_str_upto(VALUE beg, VALUE end, int excl)
{
     VALUE current, after_end;
     ID succ = rb_intern("succ");
     int n;

     StringValue(end);
     n = rb_str_cmp(beg, end);
     if (n > 0 || (excl && n == 0)) return beg;
     after_end = rb_funcall(end, succ, 0, 0);
     current = beg;
     while (!rb_str_equal(current, after_end)) {
	rb_yield(current);
	if (!excl && rb_str_equal(current, end)) break;
	current = rb_funcall(current, succ, 0, 0);
	StringValue(current);
	if (excl && rb_str_equal(current, end)) break;
	StringValue(current);
	if (RSTRING_LEN(current) > RSTRING_LEN(end) || RSTRING_LEN(current) == 0)
	    break;
     }

     return beg;
}

Now, not having read a lot of Ruby's C code, I'm not sure what some  
bits are for (like calling StringValue(current) so much), but it does  
ultimately behave almost like Matt said.  The difference is that  
the rb_yield(current) has already happened once before the length  
check (RSTRING_LEN(current) > RSTRING_LEN(end)).  I think the  
RSTRING_LEN(current)==0 is there to catch "".succ == "", but that just  
means that (""..any).to_a is [""] and yet ("".."").to_a is [] (because  
after_end will be "" and the loop is never entered).
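To experiment with this reading of rb_str_upto on any Ruby version, here is a line-by-line Ruby port of the C loop. It is a sketch, not the real implementation: `legacy_str_upto` is a hypothetical name, it collects values into an array instead of yielding, and current Ruby versions may give different results for some of these ranges when using real String ranges.

```ruby
# Hypothetical helper: a Ruby port of the rb_str_upto loop quoted
# above (Ruby 1.8.6), returning the yielded values as an array.
def legacy_str_upto(beg, fin, excl = false)
  result = []
  n = beg <=> fin
  return result if n > 0 || (excl && n == 0)   # begin sorts after end
  after_end = fin.succ
  current = beg
  until current == after_end
    result << current                          # rb_yield(current)
    break if !excl && current == fin
    current = current.succ
    break if excl && current == fin
    # Length check: stop once current is longer than end, or empty.
    break if current.length > fin.length || current.empty?
  end
  result
end

p legacy_str_upto("2", "11")        # => []
p legacy_str_upto("100", "11")      # => ["100"]
p legacy_str_upto("1", "11").last   # => "11"
p legacy_str_upto("", "abc")        # => [""]
p legacy_str_upto("", "")           # => []
```

These usage lines reproduce the 1.8.6 results reported in this thread, including the empty-string cases.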

So we have the odd situation that String is given some special treatment  
and has the unusual property that there are strings a, b such that:
a < b && a.length > b.length
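That property is easy to confirm with a brute-force search over a few numeric-looking strings (the sample list is arbitrary):

```ruby
# Find pairs (a, b) where a sorts before b as a string even though
# a is the longer string: the situation behind ("100".."11") => ["100"].
samples = %w[1 2 10 11 100]
pairs = samples.product(samples).select { |a, b| a < b && a.length > b.length }
p pairs
# => [["10", "2"], ["11", "2"], ["100", "2"], ["100", "11"]]
```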

Knowing this, here's an even more bizarre-looking example:

irb> "19".succ
=> "20"
irb> ("2".."19").to_a
=> []
irb> ("2"..."20").to_a
=> ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",  
"14", "15", "16", "17", "18", "19"]


-Rob


Rob7461 (595) 8/4/2009 8:32:37 PM

Yukihiro Matsumoto wrote:
> What if I sprinkle more magic to the language and change String#upto
> to generate numerical sequences when all characters in edges are
> digits, so that
> 
> irb> ("2".."19").to_a
> => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
> "14", "15", "16", "17", "18", "19"]
> irb> ("2"..."20").to_a
> => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
> "14", "15", "16", "17", "18", "19", "20"]
> 
> Any opinion?

-1 for added complexity with little benefit
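For comparison, the String#upto documentation in current Ruby releases describes a rule along these lines: when both strings contain only ASCII digits they are treated as decimal numbers, with the width of the first string (leading zeros) preserved. A quick check, assuming a modern Ruby interpreter rather than the 1.8.6 used in this thread:

```ruby
# On a modern Ruby, all-digit String endpoints are stepped numerically.
p(("2".."11").to_a)     # => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11"]
p "9".upto("11").to_a   # => ["9", "10", "11"]
p "07".upto("11").to_a  # => ["07", "08", "09", "10", "11"]
p(("100".."11").to_a)   # => [] (100 > 11 numerically)
```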

b.candler (2627) 8/5/2009 11:14:45 AM

[Note:  parts of this message were removed to make it a legal post.]

2009/8/5 Brian Candler <b.candler@pobox.com>

> Yukihiro Matsumoto wrote:
> > What if I sprinkle more magic to the language and change String#upto
> > to generate numerical sequences when all characters in edges are
> > digits, so that
> >
> > irb> ("2".."19").to_a
> > => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
> > "14", "15", "16", "17", "18", "19"]
> > irb> ("2"..."20").to_a
> > => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
> > "14", "15", "16", "17", "18", "19", "20"]
> >
> > Any opinion?
>
> -1 for added complexity with little benefit


I'm also against this. I prefer explicit type conversions here: changing
behaviour because a string happens to look like a number will likely cause
more problems than it solves. In fact this kind of thing shows up in
JavaScript and it usually masks bugs where the developer has failed to
properly handle user input.

If this change were to go ahead, I'd also argue for changing String#+ to
recognise numbers, which might also mean changing Numeric#+ for symmetry.
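For reference, Ruby as it stands keeps the two operations apart: String#+ concatenates, and an Integer argument raises TypeError rather than being converted. A small check:

```ruby
p "2" + "3"        # => "23" (concatenation, not addition)
p 2 + 3            # => 5
begin
  "2" + 3          # no implicit conversion from Integer to String
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
p "2".to_i + 3     # => 5 (explicit conversion)
```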

--
James Coglan
http://jcoglan.com

jcoglan (199) 8/5/2009 11:44:21 AM

Piyush Ranjan wrote:
> +1 for that. and -1 for the change.
> 
> It just makes the developer look too stupid. Can't we let the developers
> understand the difference between a string and an integer ?

If it were 20 years ago, I'd understand this sentiment.  What I don't 
understand is why programming languages seem to insist on using 
semantics that don't adapt to the natural ways that humans interact or 
think.  It's almost as if people prefer to fight against the inevitable 
evolution of programming languages.

In this case, your only argument against introducing this "magic" is 
that people need to understand the difference between a string and an 
integer.  Why is that so critical here?  There's no ambiguity in 
"2".."11".

Ruby has a lot of constructs that make it much easier to use and 
understand from a natural-language point of view.  That is one of 
Ruby's big strengths, and it makes the language more accessible to 
people who are interested in programming without getting bogged down 
in the minutiae of why "2" is greater than "11".

> 
> 
> I vote against. If people want numeric ranges, it's their job to use

scott.br (4) 8/17/2009 1:46:25 PM

On Monday 17 August 2009 08:46:25 am Scott Briggs wrote:
> Piyush Ranjan wrote:
> > +1 for that. and -1 for the change.
> >
> > It just makes the developer look too stupid. Can't we let the developers
> > understand the difference between a string and an integer ?
>
> If it was 20 years ago, I'd understand this sentiment.  What I don't
> understand is why programming languages seem to insist on using
> semantics that don't adapt to the natural ways that humans interact or
> think.

Because the semantics with which humans interact and think are ambiguous, 
often illogical, and often rely on intuition.

We can't give our languages intuition, but the more we try to do so, and the 
more magic we introduce, the less predictable things get.

> There are a lot of constructs in ruby that make it much easier to use
> and understand from a natural language point of view, one of the big
> strengths of ruby, and this in turn makes it more accessible to people
> who are interested in programming and not getting bogged down in the
> minutiae of why "2" is greater than "11".

Programming inevitably leads to at least understanding these minutiae. I use 
Ruby, and I love it for that natural-language expressiveness, and also just 
for the conciseness, even where I know it's less efficient:

(2..11).map(&:to_s)

But there's a case to be made that at a certain point, you need to understand 
what's going on. A simple example: What's the difference between a string and a 
symbol? Someone who uses strings where they should use symbols is making their 
program needlessly inefficient and verbose; someone who does the opposite is 
introducing a rather serious memory leak and potential DoS vulnerability.

You could make the case that we should just use strings, and find ways to make 
them really efficient. But hey, at least the semantics of symbols are adequately 
covered by strings -- the semantics of numbers really aren't.

Put another way: Currently, we're allowed to do:

puts 'Ho! '*3 + 'Merry Christmas!'

Now, suppose we start making + and * smart, so that '2'*'3' == '6'. Now what does 
'2'*3 do? Is it '6', or 6, or '222'? It certainly seems feasible a newbie 
would get stuck here -- for example, what if they want a row of zeros as a 
delimiter -- '0'*80 instead of '-'*80 to make a horizontal line -- did they 
get eighty zeros, or the product of 0*80=0?
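What String#* actually does with these cases today, for reference: only a String times an Integer is defined, so '2'*'3' raises TypeError instead of producing '6'.

```ruby
puts 'Ho! ' * 3 + 'Merry Christmas!'   # Ho! Ho! Ho! Merry Christmas!
p '2' * 3                              # => "222"
zeros = '0' * 80
p zeros.count('0')                     # => 80 (eighty "0" characters)
p '2 ' * 3                             # => "2 2 2 "
begin
  '2' * '3'                            # a String multiplier is not accepted
rescue TypeError
  puts "'2' * '3' raises TypeError"
end
```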

Or suppose they added a space into their number accidentally -- is '2 '*80 
equal to '160' or '2 2 2 2 ...'? Maybe it's just me, but '2 ' seems like a 
much more probable mistake (and a harder one to catch) than saying '2' when 
you mean 2.

By making the easy stuff ridiculously easy (and assuming users are idiots), it 
adds enough ambiguity to drive users crazy later on.

Maybe I'm overreacting, and this would be fine for ranges, but I think "magic" 
only makes sense when it's very well understood and predictable. 'puts' 
calling #to_s on everything, and 'p' calling #inspect on everything, makes 
sense. Range calling #to_i sometimes just seems like it's asking for trouble.

ninja (512) 8/19/2009 1:46:15 AM

2009/8/19 David Masover <ninja@slaphack.com>

> On Monday 17 August 2009 08:46:25 am Scott Briggs wrote:
> > Piyush Ranjan wrote:
> > > +1 for that. and -1 for the change.
> > >
> > > It just makes the developer look too stupid. Can't we let the developers
> > > understand the difference between a string and an integer ?
> >
> > If it was 20 years ago, I'd understand this sentiment.  What I don't
> > understand is why programming languages seem to insist on using
> > semantics that don't adapt to the natural ways that humans interact or
> > think.
>
> Because the semantics with which humans interact and think are ambiguous,
> often illogical, and often rely on intuition.
>
> We can't give our languages intuition, but the more we try to do so, and the
> more magic we introduce, the less predictable things get.
>
> > There are a lot of constructs in ruby that make it much easier to use
> > and understand from a natural language point of view, one of the big
> > strengths of ruby, and this in turn makes it more accessible to people
> > who are interested in programming and not getting bogged down in the
> > minutiae of why "2" is greater than "11".
>
> Programming inevitably leads to at least understanding these minutiae. I use
> Ruby, and I love it for that natural-language expressiveness, and also just
> for the conciseness, even where I know it's less efficient:


I second this. "Magic" (for want of a better word) is only useful when it
gives you a faster way to achieve the same result. To anyone with moderate
or above programming experience, the difference between strings and numbers
is important and I for one would be annoyed at finding strings being
magically handled as numbers when that isn't what I wanted -- especially if
it were happening to user-supplied data.

This isn't an implementation detail that ought to be hidden from the user to
make things easier (like dynamic typing, or automatic garbage collection):
strings and numbers are conceptually different types of data that support
different operations and different semantics. I think trying to do too much
automatic type conversion is likely to end up producing a lot of the
problems that exist with number/string/boolean comparison in PHP and (to a
lesser extent) JavaScript.

David mentions concatenation vs addition -- what about splitting? I can
split "1234" into "12" and "34" and I have two perfectly valid strings; if I
split the number 1234 into 12 and 34 I've not done something meaningful. In
a number the digits have meaning based on their position within the number,
which itself depends on the base used to represent the number. A string is
just a sequence of glyphs, which have no intrinsic meaning at a technical
level.

Ruby's design is said to follow the principle of least surprise; to me this
means that consistency and correctness should be maintained. Blurring the
boundaries between strings and numbers is a frequent cause of bugs for
beginners in some other languages, and I think Ruby does well to enforce
some separation between them to guide people in the right direction.

--
James Coglan
http://jcoglan.com

jcoglan (199) 8/19/2009 8:58:07 AM
