#map, #select semantics

  • Follow


[Note:  parts of this message were removed to make it a legal post.]

I imagine this has come up before, though I can't find anything about it. I
was talking to a friend last night who mentioned that in Smalltalk, the
#collect and #select methods (and some others) return the same type as the
object they're called on, and in Ruby they always return an Array, no matter
what the receiver type is. This seems like a good idea to me, though there
are some questions:

1. What do they mean when applied to a Hash? Do you want a list of values,
or keys, or pairs? Or do you want to map each key and value to a different
key and value, and have #map build a Hash for you? If so, how do you
represent that as a return value? Is it [key, value] or {key => value} ?

2. Is part of the contract of #map that you get back a collection of the
same size as the input? If so, what happens if you #map a Set and produce
duplicates? The output will be smaller than the input (similarly for hash
key collisions). Certainly from an FP viewpoint, a map is a one-to-one
transform from input to output values, so it would seem logical to have as
many output values as input values.

3. There is no uniform method for adding items to a collection. We have
Array#<<, Hash#[]=, Set#add and Set#add?. For linked lists, adding to the
collection would involve changing pointers on existing members of the
collection. Is it possible to come up with a single method/message whose
role is to provide a uniform way to add something to any collection?

These are pretty much of the top of my head, but I'd be very interested in
hearing discussion of pros and cons, and any history that shows why Ruby
differs from Smalltalk. I also heard that Ruby 2.0 will "correct" this
behaviour -- is this true? Any discussion and/or links to further reading
welcome.

P.S. This was raised through a discussion of JS.Class[1], which implements
Enumerable, Hash and Set. The Enumerable methods return native JavaScript
arrays, which don't have the Enumerable methods -- this causes some
frustration and breaks the Ruby-like model the library is designed to
support.

[1] http://jsclass.jcoglan.com

-- 
James Coglan
http://jcoglan.com

0
Reply jcoglan (199) 8/28/2009 9:42:51 AM

Hi --

On Fri, 28 Aug 2009, James Coglan wrote:

> I imagine this has come up before, though I can't find anything about it. I
> was talking to a friend last night who mentioned that in Smalltalk, the
> #collect and #select methods (and some others) return the same type as the
> object they're called on, and in Ruby they always return an Array, no matter
> what the receiver type is. This seems like a good idea to me, though there
> are some questions:

I'm one of the few, I think, who prefer to get everything back in an
array. I like the idea of a uniform structure for result sets from
Enumerable operations. Among other things, it's impossible for it to
work universally the other way, so it's always going to be "return an
array, except for a few special cases."

> 1. What do they mean when applied to a Hash? Do you want a list of values,
> or keys, or pairs? Or do you want to map each key and value to a different
> key and value, and have #map build a Hash for you? If so, how do you
> represent that as a return value? Is it [key, value] or {key => value} ?

Hash#select and #reject return hashes in 1.9:

>> h = { 1 => 2, 3 => 4, 5 => 6 }
=> {1=>2, 3=>4, 5=>6}
>> h.select {|k,v| k > 1 }
=> {3=>4, 5=>6}

#map returns an array. I don't think it can be otherwise, since you're
only returning one value from the code block. Of course it would be
possible to write something like this:

module Enumerable
   def map2hash
     res = {}
     each do |k,v|
       res[k] = yield(v)
     end
     res
   end
end

> 2. Is part of the contract of #map that you get back a collection of the
> same size as the input? If so, what happens if you #map a Set and produce
> duplicates? The output will be smaller than the input (similarly for hash
> key collisions). Certainly from an FP viewpoint, a map is a one-to-one
> transform from input to output values, so it would seem logical to have as
> many output values as input values.

I think any quasi-mappish operation that departs from real map
semantics would have to be a different method. Mapping an enumerable
through a function to an array is just too basic and too useful not to
be available.

> 3. There is no uniform method for adding items to a collection. We have
> Array#<<, Hash#[]=, Set#add and Set#add?. For linked lists, adding to the
> collection would involve changing pointers on existing members of the
> collection. Is it possible to come up with a single method/message whose
> role is to provide a uniform way to add something to any collection?

You could try -- it should be implementable in Ruby, at least for
proof of concept.


David

-- 
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Q: What's the best way to get a really solid knowledge of Ruby?
A: Come to our Ruby training in Edison, New Jersey, September 14-17!
    Instructors: David A. Black and Erik Kastner
    More info and registration: http://rubyurl.com/vmzN

0
Reply dblack (1323) 8/28/2009 10:11:58 AM


On Fri, Aug 28, 2009 at 5:42 AM, James Coglan<jcoglan@googlemail.com> wrote:
> I imagine this has come up before, though I can't find anything about it. I
> was talking to a friend last night who mentioned that in Smalltalk, the
> #collect and #select methods (and some others) return the same type as the
> object they're called on, and in Ruby they always return an Array, no matter
> what the receiver type is. This seems like a good idea to me, though there
> are some questions:

I'm not sure what you mean is a good idea, the Smalltalk way, the Ruby
way, or that there's a difference?

Actually, it's not completely true that in Smalltalk collect and
select return the same class (and in both Ruby and Smalltalk classes
are not types), as the receiver.  In Smalltalk they return an instance
of the species of the receiver.  The abstract Collection class
implements collect and select which calls self species to get the
class to instantiate the result.

The collect method then enumerates the receiver and adds each element
to the result using the add: method.  Select pretty much is the same
except that adds the element conditional on what the block evaluation
returns.

The abstract implementation of species is just self class, but it is
overridden when that's not appropriate. For example the Interval class
(ST's rough equivalent of Range) overrides species to return Array.

I'm going to reorder your questions a bit.

> 2. Is part of the contract of #map that you get back a collection of the
> same size as the input? If so, what happens if you #map a Set and produce
> duplicates? The output will be smaller than the input (similarly for hash
> key collisions). Certainly from an FP viewpoint, a map is a one-to-one
> transform from input to output values, so it would seem logical to have as
> many output values as input values.

Not in Smalltalk.  If you send collect to a Set, you'll get back a Set
of the results of evaluating the block for each element of the
receiver, which means that duplicates will be removed.  So the result
Set may be smaller than the original.

> 1. What do they mean when applied to a Hash? Do you want a list of values,
> or keys, or pairs? Or do you want to map each key and value to a different
> key and value, and have #map build a Hash for you? If so, how do you
> represent that as a return value? Is it [key, value] or {key => value} ?
>
>
> 3. There is no uniform method for adding items to a collection. We have
> Array#<<, Hash#[]=, Set#add and Set#add?. For linked lists, adding to the
> collection would involve changing pointers on existing members of the
> collection. Is it possible to come up with a single method/message whose
> role is to provide a uniform way to add something to any collection?

Smalltalk implements Dictionary (which is the rough equivalent of
Ruby's Hash) as a subclass of Set.  A Smalltalk Dictionary is a Set of
Associations. An instance of Association represents a key-value pair,
and two Associations are considered equal if the keys are equal.  When
you enumerate a Dictionary in Smalltalk with the do: method (the rough
equivalent of #each), the values of the associations are enumerated.
There's also an associationsDo: method which enumerates the
associations.

In Smalltalk there is a uniform method to add an item to a collection
add:  So to add an item to a Dictionary you add an association.

So what about collect/select for Dictionaries.

In Smalltalk, Dictionary overrides collect: to return an instance of
Bag holding the values in the Dictionary, a Bag is just what it sounds
like, it's a bag of objects which is unordered and may have
duplicates.

Dictionary also overrides select: to use associationsDo: instead of
do: it returns an instance of the species of the receiver (a
Dictionary for Dictionary instances, but maybe something else for a
subclass) after adding each association value from evaluating the
block argument for each association in the receiver.  So in this case
you'd get back a Dictionary or something like a Dictionary.

All of this is based on "Blue-Book" Smalltalk-80, there might be some
slight variation in more modern Smalltalk implementations/dialects.

Now, is the Ruby or Smalltalk approach 'better'  I guess if I had my
druthers, I guess I differ a bit from my friend David A. Black here,
and I'd prefer it if Ruby were a bit more Smalltalk like here, and
Ruby 1.9 has made some steps in that direction.

But it is what it is.

-- 
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

0
Reply rick.denatale (1691) 8/28/2009 1:35:22 PM

On Fri, Aug 28, 2009 at 6:11 AM, David A. Black<dblack@rubypal.com> wrote:

> Hash#select and #reject return hashes in 1.9:
>
>>> h =3D { 1 =3D> 2, 3 =3D> 4, 5 =3D> 6 }
>
> =3D> {1=3D>2, 3=3D>4, 5=3D>6}
>>>
>>> h.select {|k,v| k > 1 }
>
> =3D> {3=3D>4, 5=3D>6}
>
> #map returns an array. I don't think it can be otherwise, since you're
> only returning one value from the code block.

Both of these are more like Smalltalk.  As I pointed out in another
posting to this thread, in the second case Smalltalk would return a
Bag of values from the receiver.

I honestly can't recall ever using map with a Dictionary in Smalltalk,
or with a Hash in Ruby for that matter.

> Of course it would be
> possible to write something like this:
>
> module Enumerable
> =A0def map2hash
> =A0 =A0res =3D {}
> =A0 =A0each do |k,v|
> =A0 =A0 =A0res[k] =3D yield(v)
> =A0 =A0end
> =A0 =A0res
> =A0end
> end

It's trickier coming up with a map for Hash which enumerates over the
key,value pairs and allows the block to return a new key,value pair,
what do you do if there are duplicate keys?

Smalltalk finessed that question.

>
>> 2. Is part of the contract of #map that you get back a collection of the
>> same size as the input? If so, what happens if you #map a Set and produc=
e
>> duplicates? The output will be smaller than the input (similarly for has=
h
>> key collisions). Certainly from an FP viewpoint, a map is a one-to-one
>> transform from input to output values, so it would seem logical to have =
as
>> many output values as input values.
>
> I think any quasi-mappish operation that departs from real map
> semantics would have to be a different method. Mapping an enumerable
> through a function to an array is just too basic and too useful not to
> be available.


I think there are two sides to that.  If you think of the world of
collections primarily as arrays, then this makes sense.

OTOH, if you think of collections more abstractly, then having map on
a Set return a Set which might be smaller because of duplication in
the results of the block is useful as well, and if you want the array
behavior you can make an Array from the set ant then map that.

>> 3. There is no uniform method for adding items to a collection. We have
>> Array#<<, Hash#[]=3D, Set#add and Set#add?. For linked lists, adding to =
the
>> collection would involve changing pointers on existing members of the
>> collection. Is it possible to come up with a single method/message whose
>> role is to provide a uniform way to add something to any collection?
>
> You could try -- it should be implementable in Ruby, at least for
> proof of concept.

The difficulty is that for some Ruby classes, I'm thinking of Hash in
particular here, it's not clear what an element is, in Smalltalk every
collection is a collection of elements, with Dictionary elements being
key-value associations.


--=20
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

0
Reply rick.denatale (1691) 8/28/2009 1:51:54 PM

Hi --

On Fri, 28 Aug 2009, Rick DeNatale wrote:

> Now, is the Ruby or Smalltalk approach 'better'  I guess if I had my
> druthers, I guess I differ a bit from my friend David A. Black here,
> and I'd prefer it if Ruby were a bit more Smalltalk like here, and
> Ruby 1.9 has made some steps in that direction.
>
> But it is what it is.

The problem is that it isn't what it is :-) Maybe it's just Hash
that's special-cased. It's certainly impossible to return, say, a file
handle from doing fh.map, meaningless to return a range from
range.map (though range.map is sort of hybrid behavior anyway), and
maybe or maybe not meaningful for other classes, including
user-defined ones, to return an object of the same class.

I guess I like the simplicity of defining each, including Enumerable,
and getting pre-defined behavior. I know that Hash and Array already
override a few things, though, so it's never been quite that simple.
Still, given the underlying mechanism, there's no way to generalize
returning a same-class object.


David

-- 
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Q: What's the best way to get a really solid knowledge of Ruby?
A: Come to our Ruby training in Edison, New Jersey, September 14-17!
    Instructors: David A. Black and Erik Kastner
    More info and registration: http://rubyurl.com/vmzN

0
Reply dblack (1323) 8/28/2009 1:54:28 PM

Hi --

On Fri, 28 Aug 2009, Rick DeNatale wrote:

> On Fri, Aug 28, 2009 at 6:11 AM, David A. Black<dblack@rubypal.com> wrote:
>
>>> 2. Is part of the contract of #map that you get back a collection of the
>>> same size as the input? If so, what happens if you #map a Set and produce
>>> duplicates? The output will be smaller than the input (similarly for hash
>>> key collisions). Certainly from an FP viewpoint, a map is a one-to-one
>>> transform from input to output values, so it would seem logical to have as
>>> many output values as input values.
>>
>> I think any quasi-mappish operation that departs from real map
>> semantics would have to be a different method. Mapping an enumerable
>> through a function to an array is just too basic and too useful not to
>> be available.
>
>
> I think there are two sides to that.  If you think of the world of
> collections primarily as arrays, then this makes sense.

I wouldn't extrapolate a world-view from it :-) None of this has to
be winner-take-all; there's room for all of these methods to exist. I
just like having a foundational map operation that returns a result
set in an array. I'm not ruling out other kinds of functionality, nor
claiming that non-array collections are actually arrays.

For example:

class Hash
   def map2     # or whatever
     h = {}
     each do |k,v|
       k2, v2 = yield(k,v)
       h[k2] = v2
     end
     h
   end
end

h = {1 => 2, 3 => 4 }

p h.map2 {|k,v| [k*10, v-1] }

   # => {30=>3, 10=>1}

Nothing wrong with having such a method. I just don't think it's a
candidate to replace Hash#map.


David

-- 
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Q: What's the best way to get a really solid knowledge of Ruby?
A: Come to our Ruby training in Edison, New Jersey, September 14-17!
    Instructors: David A. Black and Erik Kastner
    More info and registration: http://rubyurl.com/vmzN

0
Reply dblack (1323) 8/28/2009 2:11:39 PM


On Aug 28, 9:54=A0am, "David A. Black" <dbl...@rubypal.com> wrote:

> I guess I like the simplicity of defining each, including Enumerable,
> and getting pre-defined behavior. I know that Hash and Array already
> override a few things, though, so it's never been quite that simple.
> Still, given the underlying mechanism, there's no way to generalize
> returning a same-class object.

Sure there is --or at least it can be made complete predictable. Just
define a special class method, eg.

  class Array
    def self.enumerable_instance; []; end
   end

  class Hash
    def self.enumerable_instance; {}; end
   end

Then a uniform "add". Eg. #<< (btw I love Hash#<<) and the whole thing
can be generalized.

This has been discussed before, but for whatever reason the idea has
been disregarded. I suspect efficiency hacks have come to out way any
of these nice "uniformalizaions".

T.

0
Reply transfire (2969) 8/29/2009 4:50:35 AM


On Aug 29, 5:50=A0am, "David A. Black" <dbl...@rubypal.com> wrote:

> I don't mean that *something* can't be defined for every class; I mean
> specifically that it will never be the case that every Enumerable
> class will return an instance of itself from its iterations.

That's true.  And we would never really want it to. There are many
classes we may make Enumerable just to have the conveniences for some
internal property. For example, I once defined an #each method for
ObjectSpace, and made it enumerable. I sure wouldn't want a new
ObjectSpace returned! ;-)

I think people who suggest the idea, don't really mean exactly what
they think they do. That is to say, they see Set and Hash enumerable
methods returning Arrays and think, why not Sets and Hashes? Then they
take the mental leap to the idea that enumerables ought to return an
instance of the originating class. But on further reflection, they
would realize that the ideal solution is a predictable and
controllable system for determining the return container.

It's a pretty powerful idea. It could even be designed to allow the
container to be changed on the fly. Let's say we have

  [[:a,1], [:b,2]].map{ |(k,v)| [k, v+1] }  #=3D> [[:a,2], [:b,3]]

We might have something like:

  [[:a,1], [:b,2]].using({}).map{ |(k,v)| [k, v+1] }  #=3D>
{:a=3D>2, :b=3D>3}

--


0
Reply transfire (2969) 8/29/2009 12:32:21 PM

7 Replies
13 Views

(page loaded in 0.19 seconds)


Reply: