ML vs. Lisp

Besides the infix syntax and the static type system, can someone
describe the difference between ML and Lisp from a programmer's point
of view?  Should a Lisp programmer be able to get accustomed to .*ML
(insert your favorite dialect prefix for ".*") without a lot of
adjustment?  I can tell you that does NOT happen with Haskell ;).

Thanks.
phr.cx (5493)
2/9/2007 10:02:39 PM

Paul Rubin schrieb:
> Besides the infix syntax and the static type system, can someone
> describe the difference between ML and Lisp from a programmer's point
> of view?  Should a Lisp programmer be able to get accustomed to .*ML
> (insert your favorite dialect prefix for ".*") without a lot of
> adjustment?  I can tell you that does NOT happen with Haskell ;).

Lisp has a lot of introspection, and coming with that are various forms 
of quoting.
*ML doesn't have or need that.

Lisp allows the construction of executable code at runtime.
ML doesn't have that (whether it would need it is something that MLers
and Lispers probably disagree about).

Lisp has macros that work at the syntax tree level.
Macro processors are not available for all *MLs (and they aren't 
considered obligatory).

The default data structure for Lisp is the linked list.
The default data structure for *ML is the tagged union (aka algebraic 
data type); linked lists are one (important) variant of that class of types.
Also, Lisp lists are by default mutable, while ML lists are not. (This 
particular difference isn't as important in practice, because most Lisp 
programs don't mutate lists, and because it's possible to have mutable 
lists in ML, so the two language groups meet on a middle ground of 
"lists are mostly immutable".)
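
As a rough sketch of the ML side of that middle ground (OCaml here;
the mlist type and the set_car helper are invented for illustration):

  (* a Lisp-style mutable list built from ref cells *)
  type 'a mlist = Nil | Cons of 'a ref * 'a mlist ref

  let set_car l v =
    match l with
    | Cons (car, _) -> car := v
    | Nil -> invalid_arg "set_car"

  (* destructive update, like Lisp's set-car! *)
  let () =
    let l = Cons (ref 1, ref (Cons (ref 2, ref Nil))) in
    set_car l 42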

These are just those that come to my mind. I'm pretty sure people with 
more direct experience with both systems can enumerate more differences.

Regards,
Jo
jo427 (1164)
2/9/2007 11:49:11 PM
Paul Rubin <http://phr.cx@nospam.invalid> wrote:
> Besides the infix syntax and the static type system, can someone
> describe the difference between ML and Lisp from a programmer's point
> of view?  

From a programmer's point of view, I'd say that the big difference
between the two kinds of language is that the kind of things that one
would do using macros (e.g. http://mlton.org/ForLoops) or dynamic
typing (e.g. http://mlton.org/Printf) in Lisp or Scheme one does with
combinators in ML.  It takes some time to learn and master.  For
learning, I'd recommend reading articles on combinator libraries.
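
The linked pages are SML; as a rough OCaml flavour of the for-loop
idea (the name upto is invented), a loop can be just a higher-order
function:

  (* apply body to each of lo..hi; such loops compose like any
     other functions *)
  let rec upto lo hi body =
    if lo <= hi then (body lo; upto (lo + 1) hi body)

  let () =
    upto 1 3 (fun i ->
      upto 1 3 (fun j ->
        Printf.printf "%d*%d = %d\n" i j (i * j)))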

> Should a Lisp programmer be able to get accustomed to .*ML
[intentionally cut the rest]

Yes.

-Vesa Karvonen
2/10/2007 12:19:35 AM
On 09 Feb 2007 14:02:39 -0800, Paul Rubin <http://phr.cx@NOSPAM.invalid>
wrote:

>Should a Lisp programmer be able to get accustomed to .*ML (insert your
>favorite dialect prefix for ".*") without a lot of adjustment?  I can
>tell you that does NOT happen with Haskell ;).

If you find that Haskell wreaks havoc on your brain, then ML probably
will, too, though perhaps to a lesser extent.

Steve Schafer
Fenestra Technologies Corp.
http://www.fenestra.com/
steve2740 (70)
2/10/2007 2:44:20 AM
In article <98cqs2hf3m0rkkhgldks2fss038me8n59l@4ax.com>, Steve Schafer
<steve@fenestra.com> wrote:

> On 09 Feb 2007 14:02:39 -0800, Paul Rubin <http://phr.cx@NOSPAM.invalid>
> wrote:
> 
> >Should a Lisp programmer be able to get accustomed to .*ML (insert your
> >favorite dialect prefix for ".*") without a lot of adjustment?  I can
> >tell you that does NOT happen with Haskell ;).
> 
> If you find that Haskell wreaks havoc on your brain, then ML probably
> will, too, though perhaps to a lesser extent.

ML is not too different from Scheme. Almost not worth the effort.

Haskell is really crazy, elegant, mind-blowing, pure, strange....but
enough so that it's _worth it_ to "break on through to the other side."

--Tim May
timcmay (55)
2/10/2007 4:05:24 AM
Tim May <timcmay@removethis.got.net> writes:
> ML is not too different from Scheme. Almost not worth the effort.

From a practical standpoint (of trying to escape from godforsaken
languages like C while still writing tight code) I'm hoping that
compile-time type checking can eliminate some debugging, and
similarly, that getting rid of dynamic typing will make for much
better runtime code.

> Haskell is really crazy, elegant, mind-blowing, pure, strange....but
> enough so that it's _worth it_ to "break on through to the other side."

Yeah, that's why I've been spending time on it and should try to get
back to it.  Ultimately though writing real applications in it seems
dodgy, e.g. Arch is very slow, that poker guy gave up on it and
converted his server to Erlang, etc.
phr.cx (5493)
2/10/2007 4:17:36 AM
On Fri, 09 Feb 2007 20:05:24 -0800, Tim May <timcmay@removethis.got.net>
wrote:

>ML is not too different from Scheme.

I would think that stuff like type inference would be enough to qualify
as "different enough."

Steve Schafer
Fenestra Technologies Corp.
http://www.fenestra.com/
steve2740 (70)
2/10/2007 4:21:21 AM
Vesa Karvonen <vesa.karvonen@cs.helsinki.fi> writes:
> From a programmer's point of view, I'd say that the big difference
> between the two kinds of language is that the kind of things that one
> would do using macros (e.g. http://mlton.org/ForLoops) or dynamic
> typing (e.g. http://mlton.org/Printf) in Lisp or Scheme one does with
> combinators in ML.  It takes some time to learn and master.  For
> learning, I'd recommend reading articles on combinator libraries.

Thanks, do you recommend any particular articles?  I think I
understand the gist of what you're saying, though I don't know ML
syntax enough to really grok those two examples.  
phr.cx (5493)
2/10/2007 4:22:39 AM
In article <7xsldebq8v.fsf@ruckus.brouhaha.com>, Paul Rubin
<http://phr.cx@NOSPAM.invalid> wrote:

> Tim May <timcmay@removethis.got.net> writes:
> > ML is not too different from Scheme. Almost not worth the effort.
> 
> From a practical standpoint (of trying to escape from godforsaken
> languages like C while still writing tight code) I'm hoping that
> compile-time type checking can eliminate some debugging, and
> similarly, getting rid of dynamic typing makes much better runtime
> code.
> 
> > Haskell is really crazy, elegant, mind-blowing, pure, strange....but
> > enough so that it's _worth it_ to "break on through to the other side."
> 
> Yeah, that's why I've been spending time on it and should try to get
> back to it.  Ultimately though writing real applications in it seems
> dodgy, e.g. Arch is very slow, that poker guy gave up on it and
> converted his server to Erlang, etc.

That was _his_ application. Note that neither darcs (the CVS sort of
program) nor Pugs (the language in which Perl 6 is being written) was
written in Erlang.

Popularity is rarely a good measure, in any case. 


--Tim May
timcmay (55)
2/10/2007 6:36:50 AM
Tim May <timcmay@removethis.got.net> writes:
> > dodgy, e.g. Arch is very slow, that poker guy gave up on it and
> > converted his server to Erlang, etc.
> 
> That was _his_ application. Note that neither darcs (the CVS sort of
> program) nor Pugs (the language in which Perl 6 is being written) was
> written in Erlang.

Sorry, yeah, I meant darcs not Arch.  Darcs is well known to be quite
slow and have scaling problems--I dunno whether a comparable
implementation in another language would have had similar problems.
Pugs's slowness can't be held against it since it was supposed to be a
prototype, so ok.  The poker guy gave up on Haskell because 1) he
spent too much time fighting the type system and decided he was more
productive with dynamic types; and 2) his application (a very highly
concurrent, high traffic internet server) kept uncovering bugs in the
GHC runtime that hadn't been subjected to that kind of stress before.

I do get the impression that doing anything serious in Haskell
requires fairly deep wizardry not needed for Lisp, Erlang, Python,
etc.  However, I continue to fool around with Haskell and I still want
to write something nontrivial in it just for educational purposes,
since I don't think the very tiny things I've tried with it so far
really give the flavor.

> Popularity is rarely a good measure, in any case. 

True.  But the experiences of actual users count for something.
phr.cx (5493)
2/10/2007 6:56:52 AM
Tim May <timcmay@removethis.got.net> wrote:
[...]
> ML is not too different from Scheme. Almost not worth the effort.

I disagree.  While there are similarities (strict, impure), ML is
quite different from (R5RS+) Scheme (the list would be slightly
different for Common Lisp):

  - dynamic typing vs static typing with type inference;
  - no module system or namespaces (R5RS) vs a fairly sophisticated
    module system;
  - mutable bindings, strings, cons cells, and vectors vs immutable
    bindings, cons cells, strings, and vectors, with mutation confined
    to ref cells and arrays;
  - no standard pattern matching, algebraic datatypes, or records vs
    all of them in the standard;
  - no specified evaluation order vs fully specified evaluation order
    (SML);
  - no standard exception handling but call/cc vs exception handling
    but no standard call/cc (available in SML/NJ and MLton);
  - often interpreted (or bytecode) vs usually compiled;
  - functions take a list of arguments vs a function always takes a
    single argument;
  - macros vs no macros;
  - symbols ('x) vs algebraic datatypes;
  - prefix syntax vs programmer-definable infix operators.

Programming in ML feels quite different from programming in Scheme.
I use both regularly.
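
To make the symbols-vs-datatypes row concrete, a small illustrative
OCaml sketch (the shape type is invented): where Scheme might dispatch
at runtime on symbols like 'circle, ML declares the alternatives up
front and the compiler checks that every match is exhaustive:

  type shape = Circle of float | Rect of float * float

  let area = function
    | Circle r -> 3.14159 *. r *. r
    | Rect (w, h) -> w *. h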

-Vesa Karvonen
2/10/2007 12:05:22 PM
Vesa Karvonen <vesa.karvonen@cs.helsinki.fi> writes:
> Programming in ML feels quite
> different from programming in Scheme.

Well, ok, Python also feels different from Scheme, but someone used to
one can switch to the other easily.  Most of the differences in your
list are comparable to Scheme-vs-Python differences.  But some are
not, particularly those dealing with the type system and its
consequences, and that's the part I have no real experience with.
phr.cx (5493)
2/10/2007 12:14:25 PM
Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:
(snip)
> Sorry, yeah, I meant darcs not Arch.  Darcs is well known to be quite
> slow and have scaling problems--I dunno whether a comparable
> implementation in another language would have had similar problems.

My impression is that it would: it's partly about design decisions and
suchlike, rather than intrinsically-Haskell stuff.

> Pugs's slowness can't be held against it since it was supposed to be a
> prototype, so ok. 

I know nothing about Pugs.

> The poker guy gave up on Haskell because 1) he
> spent too much time fighting the type system and decided he was more
> productive with dynamic types;

That's an interesting one. I grew up with static stuff (e.g., I learned
SML before I learned Common Lisp, and always liked languages like
Modula-3) (and also untyped stuff, like assembler) and when writing in
Lisp I think in ML, and I don't feel constrained by that, so I wonder if
my mind is just bent to think of problem solutions that don't need
dynamic types. Rather like, coming from an imperative background, FP is
initially hard because you have to push the obvious imperative solution
out of your head first before you have a chance of seeing the functional
one. I may just be a bad Lisp programmer. (-:

> and 2) his application (a very highly concurrent, high traffic
> internet server) kept uncovering bugs in the GHC runtime that hadn't
> been subjected to that kind of stress before.

Mmmm, I can believe that.

FWIW Haskell people do care about performance and progress is being
made. For example, Strings used to be a common reason why applications
were slow, so just lately we have much shinier ByteString stuff instead.
(Actually, I don't know if this is an issue with darcs.) Similarly, now
we also have Sequences where, unlike lists, we can tolerably efficiently
append to the end of a long one.

> I do get the impression that doing anything serious in Haskell
> requires fairly deep wizardry not needed for Lisp, Erlang, Python,
> etc.
(snip)

That's quite possible. Even from an FP background Haskell hurts my head
a lot. I still have much to learn about it. Whereas, with most
languages, after a few weeks I'd usually have a good command of things
that weren't just details.

"Anything serious" is overstating it a bit, though. Once you understand
GADTs and monad transformers you can go a very long way. Much of what
I'm learning now, I don't actually /need/, but it's nice to have. It's
like mathematics, where when you understand a new field, you see how you
could have used it in the past, but in the past you did manage to muddle
through some other way, you weren't actually hamstrung. Like these
people who don't initially notice when 'design patterns' like unfoldr or
Monoid apply, they just write a little function that does the same
thing.
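
As an aside, the unfoldr pattern carries over to ML directly; a
hand-rolled OCaml sketch (there is no list unfold in the standard
library here, so the function is written out):

  let rec unfold f seed =
    match f seed with
    | None -> []
    | Some (x, seed') -> x :: unfold f seed'

  (* unfold (fun i -> if i = 0 then None else Some (i, i - 1)) 3
     = [3; 2; 1] *)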

Mark.

-- 
Functional programming vacancy at http://www.aetion.com/
Mark.Carroll (154)
2/10/2007 12:56:06 PM
Steve Schafer <steve@fenestra.com> writes:

> On Fri, 09 Feb 2007 20:05:24 -0800, Tim May <timcmay@removethis.got.net>
> wrote:
>
>>ML is not too different from Scheme.
>
> I would think that stuff like type inference would be enough to qualify
> as "different enough."

Well, in some ways it doesn't change a lot about how you write the
program; it's more about when you learn about the error.

Mark.

-- 
Functional programming vacancy at http://www.aetion.com/
Mark.Carroll (154)
2/10/2007 1:00:24 PM
Paul Rubin wrote:
> Well, ok, Python also feels different from Scheme, but someone used to
> one can switch to the other easily.  Most of the differences in your
> list are comparable to Scheme-vs-Python differences.  But some are
> not, particularly those dealing with the type system and its
> consequences, and that's the part I have no real experience with.

Just to underline Vesa's point that Scheme/Python and ML are very different
languages, have a look at this schemer's attempt at solving the n-queens
problem in OCaml:

http://curiousprogrammer.wordpress.com/2006/12/22/speed-comparison-plt-scheme-ocaml-and-c/

Compare to my attempt and some optimisations:

http://caml.inria.fr/pub/ml-archives/caml-list/2006/12/e13b5ffc31ccffb8b39e822d2c95ac28.en.html

It is worth looking at the mistakes the schemer made in his ML:

1. He used several pointless types, e.g. type posn = Posn of int * int.
2. He misused pattern matching, e.g. as an explicit replacement for
destructuring bind:

let board_ref b x y =
  match b with
      Board (squares, size) -> Array.get squares (index_of_coord x y size);;

3. He added superfluous parentheses.
4. He didn't understand OCaml's equality (see the note after this list):

  if c != Safe then c

5. He didn't leverage built-in higher-order functions.
6. He added lots of unnecessary boxing.

let rec placement board posn rem =
  match board, posn with
      None, _ -> None
    | _, None -> None
    | Some (Board (squares, size) as b), Some (Posn (x, y) as p) ->

7. His code wasn't factored, e.g. into his own higher-order functions.
8. Finally, he opted for a slower, array-based approach rather than using
the much simpler and faster list-based approach.
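
A note on point 4, for readers new to OCaml: = and <> are structural
equality and inequality, while == and != compare physical identity.
A small sketch (the check type is invented, reusing Safe from the
excerpt above):

  type check = Safe | Unsafe

  let describe c =
    (* structural comparison is what is wanted here, hence <> not != *)
    if c <> Safe then "needs attention" else "fine"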

There are two main sources of bad OCaml code. The main source is people
coming from a C++ background who try to use objects for everything because
they don't know any better. Look at the Glome ray tracer, for example:

http://syn.cs.pdx.edu/~jsnow/glome/

The author used objects because he didn't know how to implement mutual
recursion without them. I made exactly the same mistakes when I came to
OCaml from C++.

The other source of bad OCaml code is programmers used to dynamic typing.
They typically define many unnecessary sum types and try to box all values
in terms of these. This results in slow, unreliable code.

While I agree that a Schemer or Python programmer can sit down and knock out
some ML that might work, I can't emphasise enough that there is no point in
writing ML until you understand the trade-offs involved. ML is a superb
family of languages and it lets you solve many problems with remarkable
brevity. People won't realise that if they just try to "switch to ML
easily".

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
jon (3344)
2/10/2007 2:22:43 PM
Mark.Carroll@Aetion.com (Mark T.B. Carroll) writes:

> Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:
> (snip)
>> Sorry, yeah, I meant darcs not Arch.  Darcs is well known to be quite
>> slow and have scaling problems--I dunno whether a comparable
>> implementation in another language would have had similar problems.
>
> My impression is that it would: it's partly about design decisions and
> suchlike, rather than intrinsically-Haskell stuff.

I'd like to support this point of view. I've been told that the
algorithms Darcs uses scale badly in memory and time in certain
cases. So this would be a specification issue rather than an
implementation issue or a Haskell issue, and unfixable until the
specification is changed (but I think the Darcs people would rather
wait for better hardware, and I'm not so sure they would be wrong :-).

>> Pugs's slowness can't be held against it since it was supposed to be a
>> prototype, so ok. 
>
> I know nothing about Pugs.

Nor do I -- is that a typo?

>
>> The poker guy gave up on Haskell because 1) he
>> spent too much time fighting the type system and decided he was more
>> productive with dynamic types;
>
> That's an interesting one. I grew up with static stuff (e.g., I learned
> SML before I learned Common Lisp, and always liked languages like
> Modula-3) (and also untyped stuff, like assembler) and when writing in
> Lisp I think in ML, and I don't feel constrained by that, so I wonder if
> my mind is just bent to think of problem solutions that don't need
> dynamic types. 

I do think that's possible. There is a "cultural bias" in the choice of
programming tools -- which is the reason why "I don't have problems"
or "I know someone who never came to grips with it" usually has only
limited significance for the general case.


> Rather like, coming from an imperative background, FP is
> initially hard because you have to push the obvious imperative solution
> out of your head first before you have a chance of seeing the functional
> one. I may just be a bad Lisp programmer. (-:
>
>> and 2) his application (a very highly concurrent, high traffic
>> internet server) kept uncovering bugs in the GHC runtime that hadn't
>> been subjected to that kind of stress before.
>
> Mmmm, I can believe that.

But I also believe that the GHC people are rather cooperative in
fixing those bugs (as compared with certain commercial vendors who
want to sell really expensive support contracts before they even
listen to your description of a bug in _their_ product).


> FWIW Haskell people do care about performance and progress is being
> made. For example, Strings used to be a common reason why applications
> were slow, so just lately we have much shinier ByteString stuff instead.
> (Actually, I don't know if this is an issue with darcs.) Similarly, now
> we also have Sequences where, unlike lists, we can tolerably efficiently
> append to the end of a long one.
>
>> I do get the impression that doing anything serious in Haskell
>> requires fairly deep wizardry not needed for Lisp, Erlang, Python,
>> etc.
> (snip)
>
> That's quite possible. Even from an FP background Haskell hurts my head

Yeah. And that is because the language emphasizes _smart_ programming
and there are a lot of really smart people around in the Haskell
scene. But I think one doesn't have to use all that special stuff. One
is well off with two rules:

  - The I/O monad works just like imperative programming: write your
    main and I/O procedures in the I/O monad. Don't bother about what
    a "monad" is in theory. You can just use it.

  - Everything else is pure and lazy. Write your data transformations
    pure and lazy (outside the I/O monad).

You don't have to bother about the special features for a long time.

> a lot. I still have much to learn about it. Whereas, with most
> languages, after a few weeks I'd usually have a good command of things
> that weren't just details.

> "Anything serious" is overstating it a bit, though. 

Yes. You can, e.g., write a complete compiler or an XML transformation
tool in Haskell without much wizardry.

> Once you understand GADTs and monad transformers you can go a very
> long way. Much of what I'm learning now, I don't actually /need/,
> but it's nice to have. It's like mathematics, where when you
> understand a new field, you see how you could have used it in the
> past, but in the past you did manage to muddle through some other way,
> you weren't actually hamstrung. 

:-) Exactly my point, though I'd set the cutoff even earlier: People
don't have GADTs in other languages, so they don't strictly need them
in Haskell to do useful things.

> Like these people who don't initially notice when 'design patterns'
> like unfoldr or Monoid apply, they just write a little function that
> does the same thing.

ACK.

Regards -- Markus

2/10/2007 2:31:17 PM
Paul Rubin wrote:
> Besides the infix syntax and the static type system, can someone
> describe the difference between ML and Lisp from a programmer's point
> of view?  Should a Lisp programmer be able to get accustomed to .*ML
> (insert your favorite dialect prefix for ".*") without a lot of
> adjustment?  I can tell you that does NOT happen with Haskell ;).

We haven't discussed pattern matching much. IMHO, this is one of the biggest
improvements...

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
jon (3344)
2/10/2007 2:31:40 PM
My long answer can be found in the translations (or lack thereof) of
the SICP examples into ML:

http://www.codepoetics.com/wiki/index.php?title=Topics:SICP_in_other_languages

Chris Rathman

On Feb 9, 4:02 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Besides the infix syntax and the static type system, can someone
> describe the difference between ML and Lisp from a programmer's point
> of view?  Should a Lisp programmer be able to get accustomed to .*ML
> (insert your favorite dialect prefix for ".*") without a lot of
> adjustment?  I can tell you that does NOT happen with Haskell ;).
>
> Thanks.


2/10/2007 4:39:36 PM
Chris Rathman wrote:
> My long answer can be found in the translations (or lack thereof) of
> the SICP examples into ML:
> 
> http://www.codepoetics.com/wiki/index.php?title=Topics:SICP_in_other_languages

I'm not sure that is very useful. The OCaml and F# implementations are
quite poor: they fail to leverage the languages' features and, instead,
waste much of their code reimplementing Lisp/Scheme, right down to the
variable names and superfluous parentheses. I think this is fundamentally
because you're translating a book designed to teach Scheme into other
languages.

Look at this excerpt, for example:

(* Due to the typeful nature of OCaml, we need to define a set of variants
   that will allow us to encode the solution shown in SICP.  It is a bit
   messier than the scheme version, but sometimes that's the price you pay
   for type safety *)
type ('a,'b) mpair = MLeft of 'a | MRight of 'b | LSet of ('a -> unit) |
RSet of ('b -> unit)

let cons'' x y =
  let x = ref x and y = ref y in
  let set_x v = x := v
  and set_y v = y := v
  in function
  | `Car -> MLeft !x
  | `Cdr -> MRight !y
  | `Set_car -> LSet set_x
  | `Set_cdr -> RSet set_y
  | _ -> raise (Invalid_argument "cons''")

Is this function ever going to be useful in OCaml? Can you find a single
OCaml program that uses a function like this?

Why car/cdr and not fst/snd?

Why dynamic dispatch rather than four separate functions?

Why the catchall pattern at the end?

Again, look at this comment:

   Note also that this isn't exactly the most efficient way (or even
   the most intuitive way) of representing a queue in OCaml.  But it's
   a close translation of the book.

To me, that says "this is not the way to implement a queue in OCaml".

From my point of view, SICP remains decades out of date and people
interested in learning something modern should look to modern examples.
Take a look at Markus Mottl's queue implementations, for example:

http://www.ocaml.info/ocaml_sources/pure-fun-1.0.6/chp5.ml

module BatchedQueue : QUEUE = struct
  type 'a queue = 'a list * 'a list
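  (* front of the queue, plus the rear in reverse order; checkf below
     restores the invariant that the front is empty only when the
     whole queue is empty *)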

  let empty = [], []
  let is_empty (f, r) = f = []

  let checkf (f, r as q) = if f = [] then List.rev r, f else q

  let snoc (f, r) x = checkf (f, x :: r)
  let head = function [], _ -> raise Empty | x :: _, _ -> x
  let tail = function [], _ -> raise Empty | _ :: f, r -> checkf (f, r)
end

This is the first port of call for a queue implementation in ML.

I'll take a quick look at some specifics. Ex 3.56:

(define (merge s1 s2)
  (cond ((stream-null? s1) s2)
        ((stream-null? s2) s1)
        (else
         (let ((s1car (stream-car s1))
               (s2car (stream-car s2)))
           (cond ((< s1car s2car)
                  (cons-stream s1car (merge (stream-cdr s1) s2)))
                 ((> s1car s2car)
                  (cons-stream s2car (merge s1 (stream-cdr s2))))
                 (else
                  (cons-stream s1car
                               (merge (stream-cdr s1)
                                      (stream-cdr s2)))))))))

In OCaml, this is just:

# let rec merge = parser
    | [<'h1; t1>] -> (parser
          [<'h2; t2>] ->
            if h1<h2 then [<'h1; merge t1 [<'h2; t2>]>] else
            [<'h2; merge [<'h1; t1>] t2>]
        | [<>] -> [<'h1; t1>])
    | [<>] -> (parser [<t>] -> t);;
val merge : 'a Stream.t -> 'a Stream.t -> 'a Stream.t = <fun>

For example:

# let rec f = parser [<'h; t>] -> string_of_int h^f t | [<>] -> "";;
val f : int Stream.t -> string = <fun>
# f(merge [<'1;'3;'5;'7;'9>] [<'2;'4;'6;'8>]);;
- : string = "123456789"

There are many other example functions out there that are prohibitively
tedious or error prone to implement in Scheme, so you won't find them in
SICP.

For anyone wanting to learn OCaml or F# from good code, I recommend:

  http://www.ffconsultancy.com/free/ocaml
  http://www.codecodex.com/wiki/index.php?title=Category:Objective_Caml

and, of course, my book.

Look for any examples that leverage pattern matching, higher-order functions
and so on.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
jon (3344)
2/10/2007 11:02:07 PM
On Feb 10, 5:02 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> I'll take a quick look at some specifics. Ex 3.56:
>
> (define (merge s1 s2)
>   (cond ((stream-null? s1) s2)
>         ((stream-null? s2) s1)
>         (else
>          (let ((s1car (stream-car s1))
>                (s2car (stream-car s2)))
>            (cond ((< s1car s2car)
>                   (cons-stream s1car (merge (stream-cdr s1) s2)))
>                  ((> s1car s2car)
>                   (cons-stream s2car (merge s1 (stream-cdr s2))))
>                  (else
>                   (cons-stream s1car
>                                (merge (stream-cdr s1)
>                                       (stream-cdr s2)))))))))
>
> In OCaml, this is just:
>
> # let rec merge = parser
>     | [<'h1; t1>] -> (parser
>           [<'h2; t2>] ->
>             if h1<h2 then [<'h1; merge t1 [<'h2; t2>]>] else
>             [<'h2; merge [<'h1; t1>] t2>]
>         | [<>] -> [<'h1; t1>])
>     | [<>] -> (parser [<t>] -> t);;
> val merge : 'a Stream.t -> 'a Stream.t -> 'a Stream.t = <fun>
>
> For example:
>
> # let rec f = parser [<'h; t>] -> string_of_int h^f t | [<>] -> "";;
> val f : int Stream.t -> string = <fun>
> # f(merge [<'1;'3;'5;'7;'9>] [<'2;'4;'6;'8>]);;
> - : string = "123456789"

Since the O'Caml and F# translations didn't include this particular
example, I'll show how it was translated into Alice ML:

fun lazy merge (nil, s2) = s2
  | merge(s1, nil) = s1
  | merge (s1, s2) =
      let
         val s1car = hd s1
         val s2car = hd s2
      in
         if (s1car < s2car)
            then s1car :: (merge(tl s1, s2))
            else
               if (s1car > s2car)
                  then s2car :: (merge(s1, tl s2))
                  else s1car :: (merge(tl s1, tl s2))
      end

val result = List.take(merge([1,3,5,7,9], [2,4,6,8,10]), 10);

This isn't necessarily the way you'd actually want to structure the
code in Alice ML, and it retains some hangover Scheme terminology, but
I found it instructive as it shows a nice correlation with the Scheme
code.


2/11/2007 6:24:21 AM
Not sure why this didn't go through the first time, so apologies in
advance if it ends up being a duplicate.

On Feb 10, 5:02 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> I'm not sure that is very useful. The OCaml and F# implementations are quite
> poor, they fail to leverage the languages' features and, instead, waste
> much of their code reimplementing Lisp/Scheme right down to the variable
> names and superfluous parentheses. I think this is fundamentally because
> you're translating a book designed to teach Scheme into other languages.

The translation is mostly a self-education project.  Some might find
it useful in a limited context.  Others will find it muddled.  It is a
"translation", not a methodical attempt to teach any of the other
languages, nor optimized algorithmic design in those PLs.  Being a
translation, it's only natural that a lot of Scheme-isms will permeate
the code (though I can't think of a better language to mimic than
Scheme).

That said, I disagree with your last point.  SICP is designed to teach
programming, and Scheme just happens to be the vehicle used to
accomplish that goal.  Many of the topics in SICP, such as HOFs, are
realized nicely in languages with modern type systems, while others
are not.  It's all well and good to say that more modern languages
have emerged in the last 20 years, but SICP is still quite an
approachable introduction to many of the fundamental concepts of FP.
It won't teach you how to program in ML.  But it will teach you most
of the ideas that form the basis of modern PLs.

> Is this function ever going to be useful in OCaml? Can you find a single
> OCaml program that uses a function like this?

Perhaps it's because the specific example chosen is demonstrating a
technique that does not translate well into an ML idiom.  And perhaps
because there's not an ML idiomatic way to express this particular
technique, most O'Caml code will avoid using it.

> Why car/cdr and not fst/snd?

Are you saying that fst|snd are more meaningful than car|cdr (or my
SML bias that prefers hd|tl)?  For the most part, the use of terms car
and cdr in the translation are limited to those functions in SICP that
are trying to directly define car & cdr (though I'll have to recheck
the O'Caml translation).  Now most Scheme programmers are not going to
waste time defining car and cdr, as they will use the builtin
primitives.  The question is why the authors of SICP waste time
defining these functions in a number of different ways?

> Why dynamic dispatch rather than four separate functions?

Perhaps because dynamic dispatch was the lesson being taught in that
particular example.  The original SICP example that you chose to
highlight is about how message dispatch can be used to implement car
and cdr (fst & snd if you prefer).  Now the O'Caml person may counter
by saying that no one in their right mind would ever want to define
fst and snd as a dispatched closure (shades of OOP here).  But then no
right minded Schemer is ever likely to define car and cdr in that
manner either.  So are the authors wasting our time showing us an
implementation that will never be used?  Well, if the book were a
recipe book, then yes this little foray would be a waste of time.  But
it also happens to teach us a lesson about functions, closures and
dispatch.  And that lesson is much more valuable than simply handing
us a recipe about the best way to do something.

Now, I'll be the first to admit that this particular example doesn't
translate well to most other statically typed languages.  But then the
original question was about the differences between Lisp and ML.  In
that sense, the derivation of car and cdr as a closure in ML may at
least serve as an example (of the negative variety) of what Lisp
idioms to avoid when crossing the border to a typed language.
Personally, I find dynamic dispatch to be a useful programming
technique in dynamic languages.  OTOH, I find working with types to be
a much better experience in ML.  But then the discussion boils down to
one of tradeoffs.  There is much effort put forth by those on both
sides of the aisle to attempt to recover some of that functionality on
the other side of the fence (shades of Oleg).  When I get time, I'd
probably have a second less literal translation that uses classes for
this particular example.  But I still think the literal translation is
useful for showing a bridge between the two languages.

> Why the catchall pattern at the end?

Because the compiler will complain about non-exhaustive pattern
matching.  And it happens to correspond to the SICP example.  If you
are dispatching on messages, as the Lisp is doing in the example, then
you may get an unexpected message.  Again, you chose an example that
doesn't translate well to ML.  The very fact that it corresponds to
message dispatch via functions and closures, tells us that the example
is going to be rather hard to translate.  In short, the example is
exactly about message dispatch - not exactly a core strength of ML.
Even though it might be foreign to ML, the question is whether there
is a way to get message dispatch in the language?

Again, I'm not particularly satisfied with the translation of message
dispatch to ML.  Most ML people will say that you have to re-educate
yourself to think differently, foregoing these things you happen to be
fond of in other languages.  But the question is whether it's possible
to reacquire some of the tools from our toolbox that we have become
accustomed to in other languages?  Perhaps not.  Of course, this is
just a long-winded way to say that ML is not as dynamic as languages
like Scheme and Smalltalk - but that particular discussion may not be
productive.  There is much better material on this particular subject
that other O'Caml users have provided - I just haven't had time to
digest it.

I do however find the overall response rather dismaying in that I
infer (probably incorrectly) that because these techniques are not
available in ML, they should be summarily swept under the rug, never
to be given serious consideration - effectively labelling them as a
distraction (pariah) to the way that ML'ers should think.  It makes me
cringe a little when the ML and Haskell advocates (or advocates of any
other language or paradigm for that matter) say that you have to
completely relearn how you think and program.  It may well be true,
but I find that having to redefine the craft every couple of years
rather grating.  Has the process of thinking about programs really
changed since the release of SICP all those years ago?  It was way
ahead of it's time.  In some ways, I still think it's ahead.

> Again, look at this comment:
>
>    Note also that this isn't exactly the most efficient way (or even
>    the most intuitive way) of representing a queue in OCaml.  But it's
>    a close translation of the book.
>
> To me, that says "this is not the way to implement a queue in OCaml".

Well, that happens to be the connotation of the quote.  The
translation of Scheme to ML is ok in some places, torturous in
others.  It could also be said that the implementation of Queues given
in SICP is likely not the actual the implemetation that resides in
most modern Scheme dialects.  There's a million ways to implement a
queue.  Some are useful.  Some are not.

> From my point of view, SICP remains decades out of date and people
> interested in learning something modern should look to modern examples.
> Take a look at Markus Mottl's queue implementations, for example:

If you want to learn how to implement queues (and a number of other
data structures) in ML, then at least send them to the definitive
source: Purely Functional Data Structures by Chris Okasaki, of which
the link you give is a translation to O'Caml.  I'd note that much of
chapter #3 of SICP is concerned with state (though I still haven't
figured out why streams were located in this chapter).  So, yes, many
of the algorithms and examples are best formulated without state using
a functional approach.

But as long as we're on the subject, and handing out reading
recommendations, there's some interesting material in the book
Concepts, Techniques and Models of Computer Programming by Peter van
Roy and Seif Haridi.

http://www.info.ucl.ac.be/~pvr/book.html

In the book they cover constant time queues using dataflow variables
in chapter #3 (3.4.5 to be exact).  But then that book is written
using Oz, much as SICP is written using Scheme.  Alice ML can get us
real close - the translation page on SICP also has a link to a
translation of many of the book's examples to CTM.  But then if you
just want to live in a single language and learn under a single
paradigm and find recipes for doing things, then SICP and CTM are not
the books to read to learn programming.  But I have to agree to
disagree that SICP is out of date.  It teaches many things that are
relevant today.  It teaches us things we are still striving for.  And
it does so quite elegantly.  It's not without its faults (as the HTDP
people and Wadler have pointed out).  But if I'm looking for modern
techniques, I'd recommend CTM as the best current offering - a book
that I consider being the closest thing we have to a modern SICP.

That's not to say that the SICP translations are necessarily of high
quality (I spent an entire 4 hours trying to come up to speed on
O'Caml and translating chapters #1 & #2, though I confess it was
mostly a translation of the Alice ML code which I spent much more time
on).  They are merely (feeble) attempts at making the ideas locked in
SICP a bit more accessible to myself.   Perhaps not useful to others,
but then you get what you pay for.  And being on a wiki, others more
in tune with these languages can modify it to be more idiomatic and
elegant.

> # let rec merge = parser
>     | [<'h1; t1>] -> (parser
>           [<'h2; t2>] ->
>             if h1<h2 then [<'h1; merge t1 [<'h2; t2>]>] else
>             [<'h2; merge [<'h1; t1>] t2>]
>         | [<>] -> [<'h1; t1>])
>     | [<>] -> (parser [<t>] -> t);;
> val merge : 'a Stream.t -> 'b Stream.t -> 'b Stream.t = <fun>
>
> For example:
>
> # let rec f = parser [<'h; t>] -> string_of_int h^f t | [<>] -> "";;
> val f : int Stream.t -> string = <fun>
> # f(merge [<'1;'3;'5;'7;'9>] [<'2;'4;'6;'8>]);;
> - : string = "123456789"

Well, one of the criticisms of ML and Haskell is that they tend to
value terseness in their definitions.  Yes, a programmer immersed in
ML can easily understand this code.  For those that are not - well it
can look as opaque as APL.

> There are many other example functions out there that are prohibitively
> tedious or error prone to implement in Scheme, so you won't find them in
> SICP.

And there are examples in SICP that are tedious and error prone to
implement in ML (as you point out above).  If you are interested in
programming, I'd highly recommend SICP and CTM - way above most of the
books I've read on any particular PL - whether or not you happen to be
interested in Scheme or Oz.  And as long as you happen to be reading
these books (or have read them in the past), perhaps seeing the ideas
expressed in other languages (however un-idiomatic) might help (or
not).

> For anyone wanting to learn OCaml or F# from good code, I recommend:
>
>  http://www.ffconsultancy.com/free/ocaml
>  http://www.codecodex.com/wiki/index.php?title=Category:Objective_Caml

So the question becomes: how does one learn ML?  The answer is not
simply "buy this book and you will suddenly possess ML-fu".  Indeed, people
learn things in many different ways with different approaches.  What
works for one, may not work for another.  And learning isn't always a
go-or-no-go situation.  People come to ML with different backgrounds
in programming.  I happen to think that some of the all time best
introductory programming texts have come from the Scheme community.  I
happen to also be naive enough to think that there's got to be a way
to leverage that material to other languages.  But in no way do I
think this project amounts to a substitute for a more reasoned
presentation of O'Caml or F# - it's just one more resource in the sea
of resources available on the internet.

>
> and, of course, my book.

I've heard good things about your O'Caml book, though I haven't read
it myself.  I'm actually more interested in your F# book - Any news
you can share on the delivery?

> Look for any examples that leverage pattern matching, higher-order functions
> and so on.

And I would concur if your immediate goal is to learn ML.


2/11/2007 7:11:08 AM
Markus E Leypold <development-2006-8ecbb5cc8aREMOVETHIS@andthatm-e-leypold.de>
wrote:
> Mark.Carroll@Aetion.com (Mark T.B. Carroll) writes:
>> Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:

>>> Pugs's slowness can't be held against it since it was supposed to be a
>>> prototype, so ok. 

>> I know nothing about Pugs.

> Nor do I -- is that a typo?

Pugs is a prototype implementation of the "next generation" of the
Perl language, in Haskell. Google for it for more information.

- Dirk
dthierbach2 (260)
2/11/2007 7:49:00 AM
Paul, 

My comment about Haskell was mainly in terms of what would be the
biggest bang for the buck, not that Haskell would be more popular than
Erlang or OCaml or Scheme, not that it would be faster than C++ or
whatever.

I was a physicist for many years. I started with BASIC in the 1960s.
Boring. Then "FORTRAN IV with WATFIV" in the early 70s. Boring. 

Then various forms of BASIC and Pascal on my home computers (Proc.
Tech. SOL) and work computers (HP 9825A then PDP 11/34A then VAX
11/780, our company's second such machine, installed in my lab). We
also ran an early ADA compiler in our lab's VAX.

All very boring, as a physicist. Some of the early Smalltalk-80 stuff,
just appearing, was slightly interesting.

It took until the mid-80s for Lisp Machines to make programmng truly
interesting. 

Later, I toyed with Squeak, a Smalltalk implementation. Not much
different from ZetaLisp, in important ways.

ML languages were a new frontier. Haskell is the most revolutionary of
all semi-mainstream languages. (I say semi-mainstream because of course
there are many other less popular languages, from Clean to Charity to
Oz to Coq which compete in various ways, but which have relatively
small communities.)

Personally, I favor lazy languages with things like list comprehensions.
This is, I think, a nearly perfect match with set theory and math in
general. I cannot imagine liking a language which does not allow

[ n*n | n <- [1..1000], n `mod` 2 == 1 ]

to succinctly, in set-theoretic terms, list the squares of odd numbers
up to 1000. ("the set of numbers squared such that the numbers are
elements of the list from 1 to 1000 and such that n mod 2 is 1, that
is, the odd numbers")

(The vertical bar, |, matches usual mathematical notation and means
"such that" or "s.t." or "assuming that". A lot like probability
notation, by the way. The "<-" can be read as "taken from" or "element
of", where the "<-" sort of looks like the element-of symbol.)

The treatment of potentially infinite sets is even more elegant.
Likewise, lazy evaluation is more elegant than kludges involving FORCE
and DELAY.

Haskell is almost pure set theory. (I never used SETL, so I can't
compare it.)

Haskell, to me, is just math made for computers.

Whether C++ or Modula-3 or whatever are 2.6 times faster is immaterial. 

To me, of course.


--Tim May
timcmay (55)
2/11/2007 8:05:30 AM
Chris Rathman <Chris.Rathman@comcast.net> wrote:
[...]
> The translation is mostly a self-education project.  Some might find
> it useful in a limited context.  Others will find it muddled.  It is a
> "translation", not a methodical attempt to teach any of the other
> languages, nor optimized algorithmic design in those PLs.  Being a
> translation, it's only natural that a lot of Scheme-isms will permeate
> the code (though I can't think of a better language to mimic than
> Scheme).
[...]
> Perhaps it's because the specific example chosen is demonstrating a
> technique that does not translate well into an ML idiom.  And perhaps
> because there's not an ML idiomatic way to express this particular
> technique, most O'Caml code will avoid using it.

AAAARRGH...  Frankly, your translations of Scheme snippets in SICP to
Alice ML suck.  I'm not sure that I would describe them as
translations (http://en.wikipedia.org/wiki/Translation).  They look
more like transliterations
(http://en.wikipedia.org/wiki/Transliteration) to me.  Your
translations contain lots of irritating superfluous parentheses and
semicolons (see sections "Declarations and expressions" and
"Semicolons" on the page http://mlton.org/StandardMLGotchas).  Many of
the functions are unnecessarily verbosely written and do not follow ML
idioms.  I would definitely NOT RECOMMEND reading your examples if the
goal is to learn how to program in ML.

Here is a note from the pages:

(* Note: The Scheme code is demonstrating both packaging (encapsulation) and
         dynamic dispatch.  ML can handle encapsulation with no problem using
         either functors or records.  The dynamic dispatch does present a
         challenge for any statically typed PL (at least any that want to be
         considered as having a sound type system). *)

That just isn't true.  Dynamic dispatch is easy to achieve in ML.
(Subtyping is more challenging.)  For example, the below code shows a
simple way to define pairs as encapsulated, dynamically dispatched,
mutable objects.  (This relates to the specific example that you are
talking about above.)  First we define an interface for our Pair
objects as a datatype:

  datatype ('a, 'b) pair =
     Pair of {fst : unit -> 'a,
              snd : unit -> 'b,
              setFst : 'a -> unit,
              setSnd : 'b -> unit}

One could use values of type ('a, 'b) pair directly, but it is more
convenient to define helper functions:

  fun fst (Pair c) = #fst c ()
  fun snd (Pair c) = #snd c ()
  fun setFst (Pair c, a) = #setFst c a
  fun setSnd (Pair c, d) = #setSnd c d

A constructor function creates a value (or object) of some interface
type:

  fun simplePair (f, s) = let
         val f = ref f
         val s = ref s
      in
         Pair {fst = fn () => !f,
               snd = fn () => !s,
               setFst = fn f' => f := f',
               setSnd = fn s' => s := s'}
      end

We can, of course, have multiple constructor functions per interface
type.  Here is a "verbose pair" that takes a(nother) pair as an
argument:

  fun verbosePair p =
      Pair {fst = fn () => (print "fst\n" ; fst p),
            snd = fn () => (print "snd\n" ; snd p),
            setFst = fn f' => (print "setFst\n" ; setFst (p, f')),
            setSnd = fn s' => (print "setSnd\n" ; setSnd (p, s'))}

Now, let's create a pair

  val aPair = simplePair (1, 2)

and verbose pair

  val aVerbosePair = verbosePair aPair

and modify the verbose pair

  val () = setFst (aVerbosePair, 10)

As a side-effect, the above prints:

  setFst

Now

  val 10 = fst aPair

evaluates without an error.

-Vesa Karvonen
2/11/2007 3:27:07 PM
Chris Rathman wrote:
> That said, I disagree with your last point.  SICP is designed to teach
> programming, and Scheme just happens to be the vehicle used to
> accomplish that goal.  Many of the topics in SICP, such as HOFs, are
> realized nicely in languages with modern type systems, while others
> are not.

I disagree with your conclusion and I already posted counter examples to
some points. There are very few approaches to problem solving presented in
SICP that cannot be implemented more succinctly and more efficiently in
languages like OCaml and F#.

> It's all well and good to say that more modern languages 
> have emerged in the last 20 years, but SICP is still quite an
> approachable introduction to many of the fundamental concepts of FP.
> It won't teach you how to program in ML.  But it will teach you most
> of the ideas that form the basis of modern PLs.

We'll have to agree to disagree. I found SICP to be very slim on
generally-applicable information.

>> Is this function ever going to be useful in OCaml? Can you find a single
>> OCaml program that uses a function like this?
> 
> Perhaps it's because the specific example chosen is demonstrating a
> technique that does not translate well into an ML idiom.  And perhaps
> because there's not an ML idiomatic way to express this particular
> technique, most O'Caml code will avoid using it.

Forget about "expressing techniques" and consider only solving problems. The
OCaml code that I posted solves the same problem much more succinctly and
robustly. There are many other OCaml solutions already out there that will
also be much faster.

>> Why car/cdr and not fst/snd?
> 
> Are you saying that fst|snd are more meaningful than car|cdr (or my
> SML bias that prefers hd|tl)?

ML has no car and cdr (they were stripped to improve reliability). OCaml
nomenclature is fst and snd. There are built-in fst and snd functions.

> For the most part, the use of terms car 
> and cdr in the translation are limited to those functions in SICP that
> are trying to directly define car & cdr (though I'll have to recheck
> the O'Caml translation).

Yes. Given that car and cdr can be stripped with no adverse effects and were
stripped deliberately, it doesn't make sense (to me) to try to reimplement
them.

> Now most Scheme programmers are not going to 
> waste time defining car and cdr, as they will use the builtin
> primitives.  The question is why the authors of SICP waste time
> defining these functions in a number of different ways?

This is what I meant about SICP being very Lisp/Scheme specific.

>> Why dynamic dispatch rather than four separate functions?
> 
> Perhaps because dynamic dispatch was the lesson being taught in that
> particular example.

In OCaml, you would provide both so that users can leverage static typing
when the message being dispatched is known statically. The result would be
faster and more robust but would be cumbersome or impossible to represent
in Lisp or Scheme.
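
A minimal sketch of "providing both" (the counter example and names
are invented for illustration):

  type counter = { mutable n : int }

  (* static interface: direct calls, checked at compile time *)
  let bump c = c.n <- c.n + 1
  let reset c = c.n <- 0

  (* dynamic interface layered on top, for when the message is only
     known at run time *)
  let send c = function
    | `Bump -> bump c
    | `Reset -> reset c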

> The original SICP example that you chose to 
> highlight is about how message dispatch can be used to implement car
> and cdr (fst & snd if you prefer).  Now the O'Caml person may counter
> by saying that no one in their right mind would ever want to define
> fst and snd as a dispatched closure (shades of OOP here).

Indeed. They would use OOP.

> But then no  
> right minded Schemer is ever likely to define car and cdr in that
> manner either.  So are the authors wasting our time showing us an
> implementation that will never be used?  Well, if the book were a
> recipe book, then yes this little foray would be a waste of time.  But
> it also happens to teach us a lessons about functions, closures and
> dispatch.  And that lesson is much more valuable than simply handing
> us a recipe about the best way to do something.

The result is little more instructive than a Scheme interpreter written in
ML.

> Now, I'll be the first to admit that this particular example doesn't
> translate well to most other statically typed languages.  But then the
> original question was about the differences between Lisp and ML.  In
> that sense, the derivation of car and cdr as a closure in ML may at
> least serve as an example (of the negative variety) of what Lisp
> idioms to avoid when crossing the border to a typed language.

Those Lisp constructs have been superseded and people should use the modern
variants in modern languages. Unfortunately, SICP spends most of its code
reimplementing them in slightly different ways...

> Personally, I find dynamic dispatch to be a useful programming
> technique in dynamic languages.  OTOH, I find working with types to be
> a much better experience in ML.  But then the discussion boils down to
> one of tradeoffs.  There is much effort put forth by those on both
> sides of the aisle to attempt to recover some of that functionality on
> the other side of the fence (shades of Oleg).  When I get time, I'd
> probably have a second less literal translation that uses classes for
> this particular example.  But I still think the literal translation is
> useful for showing a bridge between the two languages.

I think a translation from OCaml to Scheme would close the circle.

>> Why the catchall pattern at the end?
> 
> Because the compiler will complain about non-exhaustive pattern
> matching.  And it happens to correspond to the SICP example.  If you 
> are dispatching on messages, as the Lisp is doing in the example, then
> you may get an unexpected message. Again, you chose an example that 
> doesn't translate well to ML.

I chose an example that has not been translated well, not one that cannot be
translated well. Dynamic dispatch is easy to achieve in OCaml, as you have
done.

> The very fact that it corresponds to 
> message dispatch via functions and closures, tells us that the example
> is going to be rather hard to translate.

No. This is my point: that you found it hard does not imply that it is hard. I
think it would be worth holding off on such comments until the code is peer
reviewed by some expert OCaml programmers.

> the question is whether there is a way to get message dispatch in the
> language? 

Message dispatch can be useful, e.g. when writing an XMLRPC interface.
However, imposing message dispatch all the time is insane. This is why it
was deliberately removed from ML when it evolved from Lisp.

> Again, I'm not particularly satisfied with the translation of message
> dispatch to ML.  Most ML people will say that you have to re-educate 
> yourself to think differently, foregoing these things you happen to be
> fond of in other languages.

If you want to learn new language features (e.g. pattern matching, static
typing, polymorphic variants, OOP) then you have to get educated. There is
no way around it.

What would you say to a C programmer struggling to learn Scheme?

> But the question is whether it's possible 
> to reacquire some of the tools from our toolbox that we have become
> accustomed to in other languages?

F# does that without sacrificing reliability. However, it appears to come at
a grave cost in terms of performance for allocation-intensive programs.

> I do however find the overall response rather dismaying in that I
> infer (probably incorrectly) that because these techniques are not
> available in ML, they should be summarily swept under the rug, never
> to be given serious consideration - effectively labelling them as a
> distraction (pariah) to the way that ML'ers should think.

MLers don't sweep those features under the rug. They hold those features up
and shout "hey guys, look at this awful style that used to be taught before
static typing made better approaches easier".

> It makes me 
> cringe a little when the ML and Haskell advocates (or advocates of any
> other language or paradigm for that matter) say that you have to
> completely relearn how you think and program.

What would you say to a C programmer struggling to learn Scheme?

This is a pretty accurate parallel actually: Let's say a C programmer writes
Scheme code and cites it. The code uses only a handful of Scheme's
features. He always turns off safety. He starts every program by
reimplementing unsafe pointers and pointer arithmetic. He has difficulty
translating the Sieve of Eratosthenes into Scheme using only pointers and
concludes, therefore, that the Sieve of Eratosthenes is an example of
something made much more difficult to write by Scheme's so-called "safety
features".

If you were polite, you'd surely tell him that he must relearn how he thinks
about programs before he can master Scheme?

> It may well be true, 
> but I find that having to redefine the craft every couple of years
> rather grating.

ML is over 30 years old. OCaml adds lots of sexy new features but even that
is over 10 years old.

> Has the process of thinking about programs really 
> changed since the release of SICP all those years ago?

Beyond the first few pages? Yes, of course.

>> Again, look at this comment:
>>
>>    Note also that this isn't exactly the most efficient way (or even the
>> most
>>    intuitive way) of representing a queue in OCaml.  But it's a close
>> translation of
>>    the book.
>>
>> To me, that says "this is not the way to implement a queue in OCaml".
> 
> Well, that happens to be the connotation of the quote.  The
> translation of Scheme to ML is ok in some places, torturous in
> others.  It could also be said that the implementation of Queues given
> in SICP is likely not the actual implementation that resides in
> most modern Scheme dialects.  There's a million ways to implement a
> queue.  Some are useful.  Some are not.

I'd have thought that anyone new to OCaml and wanting to implement a queue
might look at the mutable Queue implementation in the OCaml stdlib first,
and find Markus Mottl's purely functional implementation second.
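
For concreteness, here is a minimal two-list queue in the Okasaki style (my
own sketch; the names queue, push and pop are illustrative, not the stdlib's):

type 'a queue = { front : 'a list; back : 'a list }

let empty = { front = []; back = [] }

(* push onto the back list: O(1) *)
let push x q = { q with back = x :: q.back }

(* pop from the front list; when it runs dry, reverse the back list
   once, giving amortised O(1) pops *)
let pop q =
  match q.front with
  | x :: f -> Some (x, { q with front = f })
  | [] ->
      match List.rev q.back with
      | [] -> None
      | x :: f -> Some (x, { front = f; back = [] })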

>> From my point of view, SICP remains decades out of date and people
>> interested in learning something modern should look to modern examples.
>> Take a look at Markus Mottl's queue implementations, for example:
> 
> If you want to learn how to implement queues (and a number of other
> data structures) in ML, then at least send them to the definitive
> source: Purely Functional Data Structures by Chris Okasaki, of which
> the link you give is a translation to O'Caml.

Yes. That is an excellent book.

> But as long as we're on the subject, and handing out reading
> recommendations, there's some interesting material in the book
> Concepts, Techniques and Models of Computer Programming by Peter van
> Roy and Seif Haridi.
> 
> http://www.info.ucl.ac.be/~pvr/book.html
> 
> In the book they cover constant time queues using dataflow variables
> in chapter #3 (3.4.5 to be exact).  But then that book is written
> using Oz, much as SICP is written using Scheme.

For anyone wanting to implement any data structures in any statically typed
languages, I would not recommend any books written entirely in dynamically
typed languages. Data structures are one of the areas most affected by
static typing. In ML, we want to leverage inference whilst having
well-defined and machine-verified interfaces.
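
For example, a signature pins down a machine-verified interface while the
implementation's types are still inferred (a sketch; QUEUE and its members
are names of my own choosing):

module type QUEUE = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> ('a * 'a t) option
end

(* Any module sealed with [: QUEUE] is checked against this interface by
   the compiler; inside the module, types are inferred as usual. *)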

> Alice ML can get us 
> real close - the translation page on SICP also has a link to a
> translation of many of the book's examples to CTM.  But then if you
> just want to live in a single language and learn under a single
> paradigm and find recipes for doing things, then SICP and CTM are not
> the books to read to learn programming.

But those books presumably omit all discussion of static typing, pattern
matching and many other features that are fundamental to many new FPLs?

> But I have to agree to 
> disagree that SICP is out of date.  It teaches many things that are
> relevant today.  It teaches us things we are still striving for.  And
> it does so quite elegantly.

SICP cannot leverage any of the many improvements invented or adopted over
the intervening 20 years. Pattern matching, type inference and static
typing are now widely supported.

> That's not to say that the SICP translations are necessarily of high
> quality (I spent an entire 4 hours trying to come up to speed on
> O'Caml and translating chapters #1 & #2, though I confess it was
> mostly a translation of the Alice ML code which I spent much more time
> on).  They are merely (feeble) attempts at making the ideas locked in
> SICP a bit more accessible to myself.   Perhaps not useful to others,
> but then you get what you pay for.  And being on a wiki, others more
> in tune with these languages can modify it to be more idiomatic and
> elegant.

Do you mind if I have a go? I think I would start by replacing some of the
data structure examples wholesale.

>> For anyone wanting to learn OCaml or F# from good code, I recommend:
>>
>>  http://www.ffconsultancy.com/free/ocaml
>>  http://www.codecodex.com/wiki/index.php?title=Category:Objective_Caml
> 
> So the question becomes how one learns ML?  The answer is not as simple
> as "buy this book and you will suddenly possess ML-fu".

If you want a good book on ML then check out Larry Paulson's "ML for the
Working Programmer".

> Indeed, people 
> learn things in many different ways with different approaches.  What
> works for one, may not work for another.  And learning isn't always a
> go-or-no-go situation.  People come to ML with different backgrounds
> in programming.  I happen to think that some of the all time best
> introductory programming texts have come from the Scheme community.

There are certainly very few books on ML. There are several on Haskell
though. I bought some recently and am ploughing through them.

> I 
> happen to also be naive enough to think that there's got to be a way
> to leverage that material to other languages.  But in no way do I
> think this project amounts to a substitute for a more reasoned
> presentation of O'Caml or F# - it's just one more resource in a sea of
> resources that's available in the ocean of the internet.

I'm not convinced. I believe that many people complain about Joshua B.
Smith's book "Practical OCaml" because it stuck too closely to "Practical
Common Lisp", in that the examples were good for Lisp but not for OCaml.

>> and, of course, my book.
> 
> I've heard good things about your O'Caml book, though I haven't read
> it myself.  I'm actually more interested in your F# book - Any news
> you can share on the delivery?

Yes, I was going to publish F# for Scientists this month but Microsoft just
agreed to commission me to write 2 extra chapters and have the book
published by a mainstream publisher (e.g. Wiley).

Robert Pickering is publishing Foundations of F# through APress next month,
I believe.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/11/2007 4:13:05 PM
Jon Harrop <jon@ffconsultancy.com> writes:
> For anyone wanting to implement any data structures in any statically typed
> languages, I would not recommend any books written entirely in dynamically
> typed languages. Data structures are one of the areas most affected by
> static typing.

I may be missing something here--I thought the classic book on data
structures was still TAOCP whose examples are all in (untyped)
assembly language.  Data structures seem to me much more pervasively
affected by functional programming (think of graphs) than by static typing.

> SICP cannot leverage any of the many improvements invented or adopted over
> the intervening 20 years. Pattern matching, type inference and static
> typing are now widely supported.

Er, pattern matching has been done in Lisp programs (including
destructuring-bind in CL) since the dawn of time; SICP itself has
examples.  Static typing per se is familiar from the whole
Algol/Fortran family, and dispatch based on static types is implicit
in how most compilers translate arithmetic and explicit in C++ and
Java.  What's new and different in ML is its much more extensive type
system with inference.  However, extensive reliance on type inference
appears to get hopelessly confusing, thus the customary Haskell
practice of declaring the types of most functions.  But even after
that, it's still confusing and reportedly gets in the way (cf. Joel
Reymont's Haskell vs. Erlang posts).  I'm trying to keep an open mind.

> Do you mind if I have a go? I think I would start by replacing some
> of the data structure examples wholesale.

That would be cool; it would also be interesting to see Haskell versions.

> If you want a good book on ML then check out Larry Paulson's "ML for the
> Working Programmer".

I have a borrowed copy of this book but found it pretty unhelpful.  It
discussed the language's abstract qualities reasonably (though
concentrated a bit much on theorem proving), however it didn't say
much about what working programmers actually concern themselves with,
such as how to do I/O, concurrency, etc.

> There are certainly very few books on ML. There are several on Haskell
> though. I bought some recently and am ploughing through them.

I have "The Haskell School of Expression" which is pretty readable but
IMO too superficial and too limited to get anything done with.  I'd be
interested to know what else you find.
0
phr.cx (5493)
2/11/2007 5:30:34 PM
On Feb 11, 9:27 am, Vesa Karvonen <vesa.karvo...@cs.helsinki.fi>
wrote:
> Here is a note from the pages:
>
> (* Note: The Scheme code is demonstrating both packaging (encapsulation) and
>          dynamic dispatch.  ML can handle encapsulation with no problem using
>          either functors or records.  The dynamic dispatch does present a
>          challenge for any statically typed PL (at least any that want to be
>          considered as having a sound type system). *)

Perhaps not my most lucid comment, but it has to do with apply and
generic selectors in section 2.4.3, not the derivation of car and cdr
in section 3.3.1.

> That just isn't true.  Dynamic dispatch is easy to achieve in ML.
> (Subtyping is more challenging.)  For example, the below code shows a
> simple way to define pairs as ecapsulated, dynamically dispatched,
> mutable objects.  (This relates to the specific example that you are
> talking about above.)  First we define an interface for our Pair
> objects as a datatype:

Thanks for pointing out a proper method of translating the code in
question.  I don't know that this quite captures the dynamic dispatch
that is being referred to in the comments above.  It does do
polymorphic dispatch, which is what should be emphasized in the code in
3.3.1, but it's not exactly the dynamic dispatch that the comment in
2.4.3 refers to.



0
2/11/2007 6:06:41 PM
Markus E Leypold <development-2006-8ecbb5cc8aREMOVETHIS@ANDTHATm-e-leypold.de> writes:
> I'd like to support this point of view. I've been told that the
> algorithms Darcs uses are scaling badly in memory and time in certain
> cases. So this would not be an implementation issue, not a Haskell
> issue and unfixable until the specification would be changed (but I
> think the Darcs-people would rather wait for better hardware and I'm
> not so sure they would be wrong :-).

But could it be that coding in Haskell naturally leads one to use
badly-scaling algorithms?

> >> he spent too much time fighting the type system and decided he
> >> was more productive with dynamic types;
> > That's an interesting one. ...  I wonder if my mind is just bent
> > to think of problem solutions that don't need dynamic types.

Well, his application involved sending a lot of serialized data over
network sockets and he had a lot of object types, so I guess he found
it more straightforward on the receiving end to deserialize the
packets into objects of the appropriate type and dispatch them
dynamically, than to have static selection logic all over the place.
I guess in OCaml one would use OOP, but I always saw OOP as a
workaround to static types.

>   - The I/O Monad just works like imperative programming: Write your
>     main and I/O procedures in the I/O monad. Don't bother what a
>     "monad" in the theoretical view is. You can just use it.

Well, what about functions like liftM2?  I understood once what that
did but it escapes me now.

>   - Everything else is pure and lazy. Write your data transformations
>     pure and lazy (outside the I/O monad).
> 
> You don't have to bother about the specialities for a long time. 

But I think I have to use other monads for arrays, concurrency, etc.

> > "Anything serious" is overstating it a bit, though. 
> Yes. You can, i.e. write a complete compiler or XML transformation
> tool in Haskell w/o much wizardry. 

OK, but I think of "serious" as meaning concurrent network
applications, programs that do transactions and therefore need to know
exactly when their I/O completes, programs that use complex OS
interfaces, etc.  SPJ's article "Tackling the awkward squad" describes
some of the issues.

> :-) Exactly my point, though I'd set the cutoff even earlier: People
> don't have GADTs in other languages, so they don't strictly need them
> in Haskell to do useful things.

I don't understand precisely what a GADT is, but in dynamically
typed languages doesn't the whole question go away?
0
phr.cx (5493)
2/11/2007 6:23:27 PM
Paul Rubin schrieb:
> I thought the classic book on data
> structures was still TAOCP whose examples are all in (untyped)
> assembly language.

If you mean Don Knuth's series: it's seriously outdated, particularly on 
topics like abstract data types, data structures, etc.
In all, it reflects the topics and choices that were controversial and 
important in the fifties. (Similar to reading a book on geometry written 
by the old Greek philosophical mathematicians. Even if they contributed 
substantially to the state of the art, neither their problems nor much 
of their results are very relevant today anymore.)

 > Data structures seem to me much more pervasively
> affected by functional programming (think of graphs) than by static typing.

Graphs are even more affected than the usual data structure.
It's easy to model a graph using pointers between vertices if the pointers 
can mutate, and nearly impossible in the cyclic case if they can't.

In comparison to that, most other data structures are only mildly affected.
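
To illustrate the immutable case (an OCaml sketch; the names are mine): if
you give the vertices names and keep the edges in a map instead of in
pointers, even a cyclic graph is just data:

   module IntMap = Map.Make (struct type t = int let compare = compare end)

   (* a directed graph as an adjacency map from vertex to successor list *)
   type graph = int list IntMap.t

   let add_edge u v (g : graph) : graph =
     let succs = try IntMap.find u g with Not_found -> [] in
     IntMap.add u (v :: succs) g

   let two_cycle = add_edge 1 2 (add_edge 2 1 IntMap.empty)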

> Er, pattern matching has been done in Lisp programs (including
> destructuring-bind in CL) since the dawn of time; SICP itself has
> examples.

I think it's the combination of sum types (tagged unions) and pattern 
matching.
Lisp doesn't use sum types (if it has them at all); instead, new 
primitive data types proliferated, making its type system unnecessarily 
complex (from the viewpoint of an ML or Haskell programmer).

 > Static typing per se is familiar from the whole
> Algol/Fortran family,

Now that's doing injustice to entire generations of programming languages.

Algol had a type algebra, while Fortran did not. (I'm not sure if it has 
today. If so, it's not Fortran anymore *g*)

Then, *ML and Haskell have tagged unions and pattern matching (actually 
structure matching), something that none of the Algol languages have.

In addition, *ML and Haskell have type inference.

That's very, very far ahead of what Fortran does. By some metrics, 
Fortran is farther from the *ML family than the Lisp family is!

 > and dispatch based on static types is implicit
> in how most compilers translate arithmetic and explicit in C++ and
> Java.

Again, that's dismissing vital differences.
The combination of tagged types and pattern matching is more than the sum of 
its parts.

 > What's new and different in ML is its much more extensive type
> system with inference.  However, extensive reliance on type inference
> appears to get hopelessly confusing, thus the customary Haskell
> practice of declaring the types of most functions.  But even after
> that, it's still confusing and reportedly gets in the way (cf. Joel
> Reymont's Haskell vs. Erlang posts).  I'm trying to keep an open mind.

I think Joel tried to do something for which you need to know your tools 
well, and exercise them to their limits (I'm assuming we're both talking 
about the poker server guy - my recollection of names is generally bad).
He didn't know Haskell well enough, and almost certainly was unable to 
exercise it to its limits.

He said he was confused by the type errors, but then that seems to be 
the norm for the first few months in Haskell. He never got beyond that 
stage AFAIR.

In other words, I'd say that type inference can be confusing, but it 
seems to be a vast net win after a few months.

Regards,
Jo
0
jo427 (1164)
2/11/2007 6:51:29 PM
Paul Rubin <http://phr.cx@nospam.invalid> wrote:
> Markus E Leypold
> <development-2006-8ecbb5cc8aREMOVETHIS@ANDTHATm-e-leypold.de> writes:

>>   - The I/O Monad just works like imperative programming: Write your
>>     main and I/O procedures in the I/O monad. Don't bother what a
>>     "monad" in the theoretical view is. You can just use it.

I'd like to second this point of view very strongly.

> Well, what about functions like liftM2?  I understood once what that
> did but it escapes me now.

It lifts a pure function to a function that works in a monad. Just
have a look at the type (that's what types are good for, among other
things they *document* what a function is doing):

liftM2 :: (Monad m) => (a -> b -> c) -> (m a -> m b -> m c)

So you have a function f :: a -> b -> c, i.e., with two arguments,
and you transform this into a function that takes two monadic
values as arguments and returns a third monadic value. And there's just
one obvious way to do that :-)
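
If it helps, here is the same idea spelled out in OCaml, specialised to the
option type as the monad (lift_m2 and bind are local definitions for
illustration, not library functions):

(* bind is the option monad's >>=: apply f to the contents, if any *)
let bind m f = match m with None -> None | Some x -> f x

(* lift a pure two-argument function into the option monad *)
let lift_m2 f ma mb =
  bind ma (fun a -> bind mb (fun b -> Some (f a b)))

let _ = lift_m2 (+) (Some 1) (Some 2)  (* Some 3 *)
let _ = lift_m2 (+) None (Some 2)      (* None *)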

But if you cannot remember these functions, you don't have to use them -- just
"think imperatively" and start coding. In the same way, if you have a
function that could be elegantly expressed using, say, fold and map,
you can still implement this function with basic pattern matching,
without the HOFs (but of course it's better style if you do).

- Dirk
0
dthierbach2 (260)
2/11/2007 7:48:34 PM
Joachim Durchholz <jo@durchholz.org> writes:

> Paul Rubin schrieb:
>> I thought the classic book on data
>> structures was still TAOCP whose examples are all in (untyped)
>> assembly language.
>
> If you mean Don Knuth's series: it's seriously outdated, particularly
> on topics like abstract data types, data structures, etc.

Knuth's TAOCP is not about data structures, abstraction or software
engineering. It's about algorithms, and in that area it is not outdated
at all.

> In all, it reflects the topics and choices that were controversial and
> important in the fifties. (Similar to reading a book on geometry
> written by the old Greek philosophical mathematicians. Even if they
> contributed substantially to the state of the art, neither their
> problems nor much of their results are very relevant today anymore.)

I wish the people implementing e.g. pseudo random number generators
today had read Knuth's 2nd volume. Just recently I had a case of
absolutely pathological test data (but not recognizable as such at
first glance) because someone messed up the random number generation.

So I don't agree. TAOCP is not outdated and should be required reading
at universities.

BTW: Which other texts would you recommend as a substitute? (Seriously:
I'm always collecting bibliographic references).

Regards -- Markus


0
2/11/2007 8:20:06 PM
Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:

> Markus E Leypold <development-2006-8ecbb5cc8aREMOVETHIS@ANDTHATm-e-leypold.de> writes:
>> I'd like to support this point of view. I've been told that the
>> algorithms Darcs uses are scaling badly in memory and time in certain
>> cases. So this would not be an implementation issue, not a Haskell
>> issue and unfixable until the specification would be changed (but I
>> think the Darcs-people would rather wait for better hardware and I'm
>> not so sure they would be wrong :-).
>
> But could it be that coding in Haskell naturally leads one to use
> badly-scaling algorithms?

No. What nonsense.

Regards -- Markus



0
2/11/2007 8:21:18 PM
Hi Paul,

Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:

>>   - The I/O Monad just works like imperative programming: Write your
>>     main and I/O procedures in the I/O monad. Don't bother what a
>>     "monad" in the theoretical view is. You can just use it.
>
> Well, what about functions like liftM2?  I understood once what that
> did but it escapes me now.

Do you need it to program on the same level of abstraction as,
e.g., C / C++ / Ada / OCaml? No. So liftM2 is optional in a sense
(OK, as I'm not a hard core Haskeller I might miss a subtle point, but
the point I'd like to make is that people worry about _additional_
features when deciding to migrate from languages which don't have those
features at all).

>
>>   - Everything else is pure and lazy. Write your data transformations
>>     pure and lazy (outside the I/O monad).
>> 
>> You don't have to bother about the specialities for a long time. 

> But I think I have to use other monads for arrays, concurrency, etc.

You don't have to use arrays: many things can be done purely functionally
with lists and maps. Concurrency is hardly what one would need in an
application that needs to do simple things, so certainly (in my eyes)
not necessary for the beginner. And, Paul: Concurrency is also hard in
imperative languages :-).


>> > "Anything serious" is overstating it a bit, though. 
>> Yes. You can, i.e. write a complete compiler or XML transformation
>> tool in Haskell w/o much wizardry. 

> OK, but I think of "serious" as meaning concurrent network
> applications, programs that do transactions and therefore need to know
> exactly when their I/O completes, programs that use complex OS
> interfaces, etc.  SPJ's article "Tackling the awkward squad" describes
> some of the issues.

Yes, there are such programs. But those are not the only serious
programs - that's "systems programming" or whatever. Also serious in my
opinion (and much more frequent) is "data processing", that is, read
your data, derive other data from it and output that. E.g. creating a
report from a database, massaging XML documents (deleting sections,
extracting metadata, etc.). Programming servers is only a tiny part of
information processing.

>> :-) Exactly my point, though I'd set the cutoff even earlier: People
>> don't have GADTs in other languages, so they don't strictly need them
>> in Haskell to do useful things.

> I don't understand precisely what a GADT is, but in dynamically
> typed languages doesn't the whole question go away?

Yes. At the expense of less checking. I wouldn't want that: Less
static typing makes things slow, because to ensure type safety and
exclude execution errors, dynamic checks would have to be introduced.
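
To make that concrete (an OCaml sketch of my own): a dynamically typed
runtime in effect works with one big tagged union, and every primitive
operation has to re-check the tags:

(* every value carries a tag *)
type dyn = DInt of int | DStr of string

(* "+" must inspect the tags on every call; the failure case is the
   run-time type error that static typing rules out up front *)
let add a b =
  match a, b with
  | DInt x, DInt y -> DInt (x + y)
  | _ -> failwith "type error at run time"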

Regards -- Markus
0
2/11/2007 8:30:56 PM
Joachim Durchholz <jo@durchholz.org> writes:
> If you mean Don Knuth's series: it's seriously outdated, particularly
> on topics like abstract data types, data structures, etc.

Well, it says nothing at all about abstract data types, since it
treats all of computing as bit-smashing operations; but I hadn't
thought of it as outdated regarding data structures.  Maybe I'm behind
the times myself.

> I think it's the combination of sum types (tagged unions) and pattern
> matching.

You mean objects with tags in the runtime representation?  Is that
different from dynamic types with some annotation?  I knew ML and
Haskell support (e.g.) numeric types like "Float | Int" but I thought
that eventually the static type system had to resolve what every
instance actually was at compile time.  

> Then, *ML and Haskell have tagged unions and pattern matching
> (actually structure matching), something that none of the Algol
> language have.

Am I missing something?  I think of pattern/structure matching as
separate from the static type system, since it can be done naturally
in Lisp.  I agree that ML and Haskell's type system is richer than
(Algol*)'s and that would include union types.

> The combination of tagged types and pattern matching is more than the sum of its parts.

OK, again I'm missing something, is there a simple example?

> I think Joel tried to do something for which you need to know your
> tools well, and exercise them to their limits (I'm assuming we're both
> talking about the poker server guy - my recollection of names is
> generally bad).

Yes, I didn't remember his name either but re-read his post and found
his name there.

> He said he was confused by the type errors, but then that seems to be
> the norm for the first few months in Haskell. He never got beyond that
> stage AFAIR.

Well, he goes much further, he says the type system actually gets in
the way, giving examples like deserializing packet layouts which can
result in differing record types depending on the packet contents.

Have you seen his post?  This is what got me discouraged about
Haskell, though I haven't given up on it, I haven't messed with it as
much since then.

  http://wagerlabs.com/2006/01/01/haskell-vs-erlang-reloaded

> In other words, I'd say that type inference can be confusing, but it
> seems to be a vast net win after a few months.

I keep hearing this and I do know that I spend a lot of time finding
Lisp and Python bugs at runtime that a static type system could have
caught at compile time.  

I think the next book I look at should be Pierce's TAPL.  There was an
interview with Autrijus (now Audrey) Tang which recommended it highly
and basically said it was the reason Pugs got written.  Any thoughts?
0
phr.cx (5493)
2/11/2007 9:45:13 PM

Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:
>
> Have you seen his post?  This is what got me discouraged about
> Haskell, though I haven't given up on it, I haven't messed with it as
> much since then.
>
>   http://wagerlabs.com/2006/01/01/haskell-vs-erlang-reloaded

What I see here is a (badly quoted) piece from presumably a mail or
newsgroup post which cannot be found elsewhere. Badly quoted, because
I assume that the first paragraph is written by someone else (Joel?),
not by SPJ. After that I stopped reading.

Regards -- Markus

0
2/11/2007 10:03:26 PM
Paul Rubin schrieb:
> Markus E Leypold <development-2006-8ecbb5cc8aREMOVETHIS@ANDTHATm-e-leypold.de> writes:
>> I'd like to support this point of view. I've been told that the
>> algorithms Darcs uses are scaling badly in memory and time in certain
>> cases. So this would not be an implementation issue, not a Haskell
>> issue and unfixable until the specification would be changed (but I
>> think the Darcs-people would rather wait for better hardware and I'm
>> not so sure they would be wrong :-).
> 
> But could it be that coding in Haskell naturally leads one to use
> badly-scaling algorithms?

I wouldn't draw that conclusion from the alleged scaling problems of Darcs.

First, making revision control systems scale well is a generally hard 
problem. CVS doesn't scale well with the number of branches, for 
example. I'm not sure about Subversion (it's generally slow for my 
current setup, but I suspect it's not the server that's slow).

Second, Darcs seems to be in the late "we optimize the hell out of it" 
phase. At least that's what I gather from
http://darcs.net/DarcsWiki/FrequentlyAskedQuestions#head-68209e6e104b04ac71812b5f991dc3b58ada2390

Third, it seems that the scalability problems are either being addressed 
or due to some basic design decisions.

None of these things are particularly Haskell-related, I'd say.


In general, I'd say that it's easy to write inefficient software in 
Haskell, particularly if you don't (yet) know what lazy evaluation 
actually is.
I think that this effect will diminish over time, and it may even 
reverse because the programmer has more time to do algorithmic 
optimization (compared to the usual imperative languages anyway).
I'm not sure how well an average programmer will be able to work with 
Haskell. (Haskell seems to be particularly attractive to people with a 
more mathematical inclination, which is definitely not what the average 
programmer is. So it's difficult to draw conclusions from experience.)

Regards,
Jo
0
jo427 (1164)
2/11/2007 10:48:56 PM
Paul Rubin schrieb:
> Joachim Durchholz <jo@durchholz.org> writes:
>> If you mean Don Knuth's series: it's seriously outdated, particularly
>> on topics like abstract data types, data structures, etc.
> 
> Well, it says nothing at all about abstract data types, since it
> treats all of computing as bit-smashing operations;

I wasn't sure whether he was entirely blank about ADTs.

 > but I hadn't
> thought of it as outdated regarding data structures.  Maybe I'm behind
> the times myself.

Well, it's trees and hash tables and queues IIRC.
I don't recall B* trees, though that may be a lapse of memory. (He dwells 
a lot on the problem of sorting when you have too little RAM and several 
tape decks. Interesting, but not a serious problem on a modern computer.)
Geographical applications of data structures: blank.
No lazy data structures.
No treatment of immutable data structures.
Garbage collection is severely under-represented.

>> I think it's the combination of sum types (tagged unions) and pattern
>> matching.
> 
> You mean objects with tags in the runtime representation?

No. Some people name them "discriminated union".

At the C level, these are unions of structs, with a mandatory common 
"type" field that discriminates which kind of object this is.

 > Is that
> different from dynamic types with some annotation?

Yes.

 > I knew ML and
> Haskell support (e.g.) numeric types like "Float | Int"

Float | Int is not a numeric type. It's a type that may *contain* either 
a Float or an Int, but you have to deconstruct it via pattern matching 
to get at the numeric value inside it.
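
In OCaml notation, for example (a toy definition of my own):

   type num = Int of int | Float of float

   (* the only way to get at the payload is to match on the tag *)
   let to_float = function
     | Int n -> float_of_int n
     | Float x -> x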

 > but I thought
> that eventually the static type system had to resolve what every
> instance actually was at compile time.  

Yes, but in FPLs, the notion of a "static type" is often broader than 
what you usually have. E.g.
   a -> b -> a
is a function that takes something of types a and b, and returns 
something of type a - where a and b can be any types.
The types are ultimately static, but a lot of source code can be very 
agnostic of the types that are actually passed through.
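
For instance, this one-line OCaml definition is inferred to have exactly
that type, with no annotations anywhere:

   (* konst ignores its second argument; inferred: 'a -> 'b -> 'a *)
   let konst a _ = a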

>> Then, *ML and Haskell have tagged unions and pattern matching
>> (actually structure matching), something that none of the Algol
>> language have.
> 
> Am I missing something?  I think of pattern/structure matching as
> separate from the static type system, since it can be done naturally
> in Lisp.

In your typical statically-typed FPL, it's extremely helpful because it 
unifies case discrimination, union-of-structs deconstruction and 
assigning the struct elements to local variables in a single, 
easy-to-grasp operation.

It is also helpful because tagged unions are actually ubiquitous.
E.g. you return a data structure with an error flag? In an FPL, that's a 
tagged union of an Error type and the normal return type.
Got several variants of a record, with different fields reinterpreted 
depending on circumstances? Tagged unions can handle this kind of 
situation just fine, and with none of the "uncleanness" associated with 
this kind of structure.
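
A sketch in OCaml (the type and the names are mine): the error flag and the
payload live in one tagged union, so a caller cannot forget to check it:

   type ('a, 'e) result = Ok of 'a | Error of 'e

   let safe_div x y =
     if y = 0 then Error "division by zero" else Ok (x / y)

   let describe r =
     match r with
     | Ok q -> "quotient: " ^ string_of_int q
     | Error msg -> "error: " ^ msg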

When I last did Lisp (which is several years ago), it didn't use these 
idioms at all, so while pattern matching is possible in Lisp, it's not 
as useful.

> Well, he goes much further, he says the type system actually gets in
> the way, giving examples like deserializing packet layouts which can
> result in differing record types depending on the packet contents.

From what he wrote, it was a bit unclear to me what his actual problems 
were (well, those with the type system at least). I was under the 
impression that he simply didn't understand what Haskell's type system 
is good for, but it's hard to tell whether that's even remotely true 
without taking a look at his actual source code.

> Have you seen his post?

Yes.

Regards,
Jo
0
jo427 (1164)
2/11/2007 11:22:27 PM
Joachim Durchholz <jo@durchholz.org> writes:
> No. Some people name them "discriminated union".
> At the C level, these are unions of structs, with a mandatory common
> "type" field that discriminates which kind of object this is.

Oh, ok, this is basically how Lisp works.

> The types are ultimately static, but a lot of source code can be very
> agnostic of the types that are actually passed through.

Right, that's different from something with a type field that's
dispatched on at runtime.

> It is also helpful because tagged unions are actually ubiquitous.
> E.g. you return a data structure with an error flag? In an FPL, that's
> a tagged union of an Error type and the normal return type.
> Got several variants of a record, with different fields reinterpreted
> depending on circumstances? Tagged unions can handle this kind of
> situation just fine, and with none of the "uncleanness" associated
> with this kind of structure.

Aha, this sounds like exactly what the poker guy had trouble with, and
it wasn't clear to me that it was possible to do in Haskell cleanly.
I think I understand what you're getting at now, though I don't know
the Haskell syntax to express it right now, I should be able to figure
it out from the docs.  The idea is to select a different pattern match
based on the type field, and use the matched stuff as function args,
so the matched args and the function itself are statically typed and
are type-checked straightforwardly.  Thanks!

> When I last did Lisp (which is several years ago), it didn't use these
> idioms at all,

Well of course it's straightforward to write in that style (using
destructuring-bind, for example), but maybe it's more common to use
structure types directly.

>> From what he wrote, it was a bit unclear to me what his actual
> problems were (well, those with the type system at least).

Well, he gave some concrete examples where it took much more code to
do something in Haskell than in Erlang, and much more cutting and
pasting using his method.  Maybe there's a better way but I'm not
familiar enough with Haskell to know this at the moment.
0
phr.cx (5493)
2/12/2007 12:20:23 AM
Paul Rubin wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> For anyone wanting to implement any data structures in any statically
>> typed languages, I would not recommend any books written entirely in
>> dynamically typed languages. Data structures are one of the areas most
>> affected by static typing.
> 
> I may be missing something here--I thought the classic book on data
> structures was still TAOCP whose examples are all in (untyped)
> assembly language.  Data structures seem to me much more pervasively
> affected by functional programming (think of graphs) than by static
> typing.

Does it cover concurrent, lazy or immutable data structures?

>> SICP cannot leverage any of the many improvements invented or adopted
>> over the intervening 20 years. Pattern matching, type inference and
>> static typing are now widely supported.
> 
> Er, pattern matching has been done in Lisp programs (including
> destructuring-bind in CL) since the dawn of time; SICP itself has
> examples.

Turing argument and Greenspun.

> Static typing per se is familiar from the whole 
> Algol/Fortran family, and dispatch based on static types is implicit
> in how most compilers translate arithmetic and explicit in C++ and
> Java.

Numeric towers are rare outside CASs.

> What's new and different in ML is its much more extensive type 
> system with inference.  However, extensive reliance on type inference
> appears to get hopelessly confusing,

It works well for me.

> thus the customary Haskell 
> practice of declaring the types of most functions.  But even after
> that, it's still confusing and reportedly gets in the way (cf. Joel
> Reymont's Haskell vs. Erlang posts).  I'm trying to keep an open mind.

Don't extrapolate something Joel said about Haskell to the whole of type
inference. ML's type system is fundamentally different from Haskell's, for
a start.

>> Do you mind if I have a go? I think I would start by replacing some
>> of the data structure examples wholesale.
> 
> That would be cool; it would also be interesting to see Haskell versions.

I've made some minor alterations already, removing some superfluous
parentheses.

>> If you want a good book on ML then check out Larry Paulson's "ML for the
>> Working Programmer".
> 
> I have a borrowed copy of this book but found it pretty unhelpful.  It
> discussed the language's abstract qualities reasonably (though
> concentrated a bit much on theorem proving), however it didn't say
> much about what working programmers actually concern themselves with,
> such as how to do I/O, concurrency, etc.

IO is different for each member of the ML family of languages. You'll need
to turn to language specific books or manuals if you want that kind of
information.

Many MLs don't support concurrency. F# does.

>> There are certainly very few books on ML. There are several on Haskell
>> though. I bought some recently and am ploughing through them.
> 
> I have "The Haskell School of Expression" which is pretty readable but
> IMO too superficial and too limited to get anything done with.  I'd be
> interested to know what else you find.

I got that and:

http://www.amazon.com/Algorithms-Functional-Programming-Approach-International/dp/0201596040/ref=cm_lmf_tit_5_rsrsrs0/002-3342211-4734423

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 1:07:39 AM
Chris Rathman <Chris.Rathman@comcast.net> wrote:
> On Feb 11, 9:27 am, Vesa Karvonen <vesa.karvo...@cs.helsinki.fi>
> wrote:
> > Here is a note from the pages:
> >
> > (* Note: The Scheme code is demonstrating both packaging (encapsulation) and
> >          dynamic dispatch.  ML can handle encapsulation with no problem using
> >          either functors or records.  The dynamic dispatch does present a
> >          challenge for any statically typed PL (at least any that want to be
> >          considered as having a sound type system). *)

> Perhaps not my most lucid comment, but it has to do with apply and
> generic selectors in section 2.4.3, not the derivation of car and cdr
> in chapter 3.3.1.

Like I said, dynamic dispatch isn't hard to achieve in ML.  Below is a
somewhat ad-hoc and simplistic implementation of generic operations similar
in spirit to the SICP technique.  It could be improved significantly
(e.g. O(1) lookup of operations and coercions can be achieved), but I
leave further improvements to the reader.

(* Some general purpose utilities *)
fun pass x f = f x

structure List = struct
   open List
   fun push rxs x = rxs := x :: !rxs
   fun findSome f =
    fn [] => NONE
     | x::xs =>
       case f x of
          NONE => findSome f xs
        | result => result
end

(* Signature for arithmetic modules *)
signature ARITHMETIC = sig
   type t
   val + : t * t -> t
   val - : t * t -> t
   val * : t * t -> t
   val div : t * t -> t
end

(* Two "specific" arithmetic modules *)
structure FixedArith : ARITHMETIC = struct open FixedInt type t = int end
structure LargeArith : ARITHMETIC = struct open LargeInt type t = int end

(* Helper *)
functor GenBop (type t) = struct
   val cases : (t * t -> t option) list ref = ref []
   val add = List.push cases
end

(* Generic arithmetic module - sealed later *)
structure Arithmetic = struct
   datatype t = In of {from : t -> t option, value : exn}
   fun out (In t) = t

   structure Add = GenBop (type t = t)
   structure Sub = GenBop (type t = t)
   structure Mul = GenBop (type t = t)
   structure Div = GenBop (type t = t)

   local
      fun none () = NONE

      fun tryCases args cases =
          List.findSome (pass args) (!cases)

      fun tryCoerce (l, r) =
          case #from (out r) l of
             SOME l => SOME (l, r)
           | NONE =>
             case #from (out l) r of
                SOME r => SOME (l, r)
              | NONE => NONE

      fun mk cases args =
          case tryCases args cases of
             SOME v => v
           | NONE =>
             case tryCoerce args of
                SOME args => mk cases args
              | NONE => raise Domain
   in
      val op + = mk Add.cases
      val op - = mk Sub.cases
      val op * = mk Mul.cases
      val op div = mk Div.cases
   end
end

(* Signature for a "generic" arithmetic module *)
signature GEN = sig
   type t
   val addCoercion : {unwrap : Arithmetic.t -> 'a option,
                      coerce : 'a -> t} -> unit
   val wrap : t -> Arithmetic.t
   val unwrap : Arithmetic.t -> t option
end

(* Installs a specific arithmetic module to the generic arithmetic module *)
functor InstallBops (A : ARITHMETIC) : GEN = struct
   open Arithmetic
   type t = A.t
   exception T of A.t
   val coercions : (Arithmetic.t -> t option) list ref = ref []
   fun addCoercion {unwrap, coerce} =
       List.push coercions
                 (fn x => case unwrap x of
                             NONE => NONE
                           | SOME x => SOME (coerce x))
   fun from x =
       case List.findSome (pass x) (!coercions) of
          NONE => NONE
        | SOME x => SOME (wrap x)
   and wrap x = In {from = from, value = T x}
   val unwrap = fn In {value = T x, ...} => SOME x | _ => NONE
   val () = let
      fun add adder oper =
          adder (fn (In {value = T l, ...}, In {value = T r, ...}) =>
                    SOME (wrap (oper (l, r)))
                  | _ => NONE)
   in
      add Add.add A.+
    ; add Sub.add A.-
    ; add Mul.add A.*
    ; add Div.add A.div
   end
end

(* Here we seal the generic arithmetic module *)
structure Arithmetic : ARITHMETIC = Arithmetic

(* Here we install a couple of specific arithmetic modules *)
structure GenFixedArith = InstallBops (FixedArith)
structure GenLargeArith = InstallBops (LargeArith)

(* Here we install a coercion *)
val () = GenLargeArith.addCoercion
            {unwrap = GenFixedArith.unwrap,
             coerce = FixedInt.toLarge}

(* Here is a simple example of arithmetic *)
val SOME (11 : LargeInt.int) =
    GenLargeArith.unwrap
       (Arithmetic.+ (GenFixedArith.wrap 10,
                      GenLargeArith.wrap 1))
0
2/12/2007 1:31:07 AM
Jon Harrop <jon@ffconsultancy.com> writes:
> > I may be missing something here--I thought the classic book on data
> > structures was still TAOCP whose examples are all in (untyped)
> Does it cover concurrent, lazy or immutable data structures?

I'd say not all that much; however, "classic" != "latest and most modern".

> > Er, pattern matching has been done in Lisp programs (including
> > destructuring-bind in CL) since the dawn of time; SICP itself has examples.
> Turing argument and Greenspun.

Eh?  Wasn't Mathematica the other example of pattern matching that
you like to cite?  It's dynamically typed just like Lisp.

> Numeric towers are rare outside CASs.
Not sure what you're getting at there.

> Don't extrapolate something Joel said about Haskell to the whole of
> type inference.

Well it's still standard practice in Haskell to declare most or all
functions, even though the compiler can infer the types, because the
declarations make things easier on humans and make it easier to locate
errors once the compiler reports them.  I don't know whether those
issues apply to ML.

> Many MLs don't support concurrency. F# does.

OCaml and Alice ML (I'm not sure about SML) do support concurrency, but
the implementations aren't parallel.

> > I have "The Haskell School of Expression" which is pretty readable but
> I got that and:
> http://www.amazon.com/Algorithms-Functional-Programming-Approach-International/dp/0201596040/ref=cm_lmf_tit_5_rsrsrs0/002-3342211-4734423

Thanks, that looks interesting too.  I have to say though, that most
real-world programming problems I deal with don't have serious
algorithmic issues (in the CS theory sense); they're mostly about
program organization and about OS interfaces and about choosing good
abstractions.
0
phr.cx (5493)
2/12/2007 1:31:32 AM
Chris Rathman wrote:
> The translation is mostly a self-education project.

This one was quite funny:

  let newton_transform g =
    fun x -> x -. ((g x) /. ((deriv g) x))

All of those parentheses are superfluous!

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 1:31:47 AM
On Feb 11, 7:31 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> Chris Rathman wrote:
> > The translation is mostly a self-education project.
>
> This one was quite funny:
>
>   let newton_transform g =
>     fun x -> x -. ((g x) /. ((deriv g) x))
>
> All of those parentheses are superfluous!

Thanks.  My pride takes a hit, but since I'm getting a little help
from those who know ML infinitely better than myself, that's to be
expected.  :-)

Chris

0
2/12/2007 1:59:57 AM
Paul Rubin wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> > I may be missing something here--I thought the classic book on data
>> > structures was still TAOCP whose examples are all in (untyped)
>> Does it cover concurrent, lazy or immutable data structures?
> 
> I'd say not all that much; however, "classic" != "latest and most modern".

You said that you were interested in learning modern FPLs like ML. You won't
find any information about that in books that predate modern FPLs, whether
they are classic or not.

>> > Er, pattern matching has been done in Lisp programs (including
>> > destructuring-bind in CL) since the dawn of time; SICP itself has
>> > examples.
>>
>> Turing argument and Greenspun.
> 
> Eh?

Turing: you can reimplement anything in Lisp, including pattern matching.

Greenspun: you'll end up with a poor implementation of part of a modern FPL.

> Wasn't Mathematica the other example of pattern matching that 
> you like to cite?

Yes. Mathematica's pattern matching is great for symbolic manipulation
whereas ML's pattern matching is great for manipulating general-purpose
data structures.

I was referring specifically to ML's pattern matching though, which offers
static checking and is intimately tied to the type system (it is the only
way to deconstruct sum types).

> It's dynamically typed just like Lisp. 

Mathematica isn't statically typed but I'm not sure it is dynamically typed
either. Regardless, like ML, Mathematica comes with pattern matching. Lisp
does not. ML-style pattern matching makes many things easier and you won't
find any of them described in SICP, or any other book on Scheme.

>> Numeric towers are rare outside CASs.
>
> Not sure what you're getting at there.

You said "dispatch based on static types is implicit in how most compilers
translate arithmetic". What did you mean?

>> Don't extrapolate something Joel said about Haskell to the whole of
>> type inference.
> 
> Well it's still standard practice in Haskell to declare most or all
> functions, even though the compiler can infer the types, because the
> declarations make things easier on humans and make it easier to locate
> errors once the compiler reports them.  I don't know whether those
> issues apply to ML.

MLers add little or no type information beyond definitions (particularly for
sum types). Look at the OCaml stdlib, for example.
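
For example, this complete definition carries no annotations at all, yet the
toplevel infers the fully general type:

# let rec map f = function
    | [] -> []
    | x :: xs -> f x :: map f xs;;
val map : ('a -> 'b) -> 'a list -> 'b list = <fun>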

One feature that makes a huge difference to usability here is the ability to
get the type inferred for any subexpression. Both OCaml/emacs and F#/VS
support this.

> Thanks, that looks interesting too.  I have to say though, that most
> real-world programming problems I deal with don't have serious
> algorithmic issues (in the CS theory sense); they're mostly about
> program organization and about OS interfaces and about choosing good
> abstractions.

I have a lot of data structure related problems and am interested in the
trade-offs involved, e.g. between robustness and performance.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 3:07:05 AM
Paul Rubin wrote:
> Aha, this sounds like exactly what the poker guy had trouble with, and
> it wasn't clear to me that it was possible to do in Haskell cleanly.
> I think I understand what you're getting at now, though I don't know
> the Haskell syntax to express it right now, I should be able to figure
> it out from the docs.  The idea is to select a different pattern match
> based on the type field, and use the matched stuff as function args,
> so the matched args and the function itself are statically typed and
> are type-checked straightforwardly.  Thanks!

Exactly. For example, you can extract the first or second member of a pair
dynamically using pattern matching:

# let get n t = match n, t with
    | `Fst, (x, _) | `Snd, (_, x) -> x;;
val get : [< `Fst | `Snd ] -> 'a * 'a -> 'a = <fun>

For example:

# get `Fst (2, 3);;
- : int = 2
# get `Snd (2, 3);;
- : int = 3

The choice of `Fst and `Snd has been deferred until run-time. This is how
you regain dynamic typing (run-time dispatch) when necessary in ML.
However, you should not write in this style more than is necessary because
it undermines reliability (you're circumventing static typing and forgoing
static checking). Whenever possible, you should use static functions (like
the built-in fst and snd):

# fst (2, 3);;
- : int = 2
# snd (2, 3);;
- : int = 3

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 3:14:30 AM
Jon Harrop <jon@ffconsultancy.com> writes:
> However, you should not write in this style more than is necessary because
> it undermines reliability (you're circumventing static typing and forgoing
> static checking).

Well, sure, it's the essence of dynamic typing.  But I expected that
the matched args themselves would be statically checked.  Can I ask for
another example, the equivalent of

   struct {
      int type;
      union { int ival; float fval; } u;  
   } x;

   if (x.type == 0)  /* use the int part of the union */
      printf ("%d\n", x.u.ival * 3);
   else if (x.type == 1) /* use the float part */
      print("%f\n", sqrt(x.u.fval));
   else
      print ("invalid type tag\n");

I thought the above was the kind of thing we were talking about, where
the matched arg really has a completely different type depending on
the tag field.
0
phr.cx (5493)
2/12/2007 3:37:28 AM
Paul Rubin wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> However, you should not write in this style more than is necessary
>> because it undermines reliability (you're circumventing static typing and
>> forgoing static checking).
> 
> Well, sure, it's the essence of dynamic typing.  But I expected that
> the matched args themselves would be statically checked.

Right. They were in my previous example because they were separated from the
variant type [`Fst|`Snd].

> Can I ask for 
> another example, the equivalent of
> 
>    struct {
>       int type;
>       union { int ival; float fval; } u;
>    } x;
> 
>    if (x.type == 0)  /* use the int part of the union */
>       printf ("%d\n", x.u.ival * 3);
>    else if (x.type == 1) /* use the float part */
>       print("%f\n", sqrt(x.u.fval));
>    else
>       print ("invalid type tag\n");
> 
> I thought the above was the kind of thing we were talking about, where
> the matched arg really has a completely different type depending on
> the tag field.

You bundle the matched args into the variant type constructors.

You can have a closed sum type, which is inextensible but more thoroughly
checked at compile time:

# let print_num = function
    | `Int n -> Printf.sprintf "%d" n
    | `Float x -> Printf.sprintf "%f" x;;
val print_num : [< `Float of float | `Int of int ] -> string = <fun>

or an open sum type, which is more error-prone:

# let print_num = function
    | `Int n -> Printf.sprintf "%d" n
    | `Float x -> Printf.sprintf "%f" x
    | _ -> invalid_arg "print_num";;
val print_num : [> `Float of float | `Int of int ] -> string = <fun>

or an inheritance:

# let print_num print = function
    | `Int n -> Printf.sprintf "%d" n
    | `Float x -> Printf.sprintf "%f" x
    | z -> print z;;
val print_num :
  (([> `Float of float | `Int of int ] as 'a) -> string) -> 'a -> string =
  <fun>

However, this dynamic style is very rare in ML code. It only arises when
absolutely necessary because it is verbose, slow and error-prone.
Specifically, it is useful when interfacing, e.g. printing, remote
procedure calls, marshalling and so on.

ML usually optimises away static type information but some languages (e.g.
F#) keep it and can use it to implement generic print functions, equality
and so on.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 3:50:29 AM
Paul Rubin schrieb:
> Joachim Durchholz <jo@durchholz.org> writes:
>> No. Some people name them "discriminated union".
>> At the C level, these are unions of structs, with a mandatory common
>> "type" field that discriminates which kind of object this is.
> 
> Oh, ok, this is basically how Lisp works.

No.
In Lisp, every data object must have a globally unique type identifier. 
(This makes it more difficult to transplant a data structure from one 
machine to another, for example.)
In a language with structural pattern matching, it's enough if the 
identifier discriminates between the variants of the union.

>> The types are ultimately static, but a lot of source code can be very
>> agnostic of the types that are actually passed through.
> 
> Right, that's different from something with a type field that's
> dispatched on at runtime.

The main difference here is that the set of variants to dispatch upon is 
closed (i.e. cannot be extended by adding code, you have to modify 
existing code).
That's sometimes a disadvantage, but I think this issue isn't central to 
what we're discussing right now.

>> It is also helpful because tagged unions are actually ubiquitous.
>> E.g. you return a data structure with an error flag? In an FPL, that's
>> a tagged union of an Error type and the normal return type.
>> Got several variants of a record, with different fields reinterpreted
>> depending on circumstances? Tagged unions can handle this kind of
>> situation just fine, and with none of the "uncleanness" associated
>> with this kind of structure.
> 
> Aha, this sounds like exactly what the poker guy had trouble with,

Then I have been unclear.
The Poker guy had trouble accessing C structures from Haskell.

 > and
> it wasn't clear to me that it was possible to do in Haskell cleanly.

I don't know about that part of Haskell, but I can well imagine that 
accessing data at the byte level sucks in Haskell (it sucks in most 
higher-level languages, since that's exactly the kind of activity they 
try to abstract away).

> I think I understand what you're getting at now, though I don't know
> the Haskell syntax to express it right now, I should be able to figure
> it out from the docs.  The idea is to select a different pattern match
> based on the type field, and use the matched stuff as function args,
> so the matched args and the function itself are statically typed and
> are type-checked straightforwardly.  Thanks!

Well, actually I wasn't getting at this. In practice, one could define 
e.g. homogeneous lists with elements of type a as
   data List a = Nil | Cons a (List a)
and do pattern matching to discriminate between the degenerate case (an 
empty list, indicated by the constructor Nil) and the normal case (we have a 
first element, plus a - possibly degenerate - list as rest):
   case foo of
     Nil            -> <return what's appropriate for an empty list>
     Cons head rest -> <return what's appropriate for a nonempty list;
                        'head' has the first element, 'rest' the rest>
Now that's slightly more complicated than what Lisp does for linked 
lists, but you can model all union-of-struct data types with that approach.

Note that the discriminator (what you called "type field") is entirely 
implicit. It doesn't have a name at the Haskell level; the only way to 
access it is by doing a pattern match (and you don't get a value, you 
get a decision).
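
The same point in OCaml syntax (Cons and Nil are names of my own choosing;
the built-in list type works the same way):

   type 'a lst = Nil | Cons of 'a * 'a lst

   (* the discriminator is never read as a value; matching on it is
      the only access *)
   let rec length = function
     | Nil -> 0
     | Cons (_, rest) -> 1 + length rest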

>> When I last did Lisp (which is several years ago), it didn't use these
>> idioms at all,
> 
> Well of course it's straightforward to write in that style (using
> destructuring-bind, for example), but maybe it's more common to use
> structure types directly.

I don't know what destructuring-bind is, but from the terminology, I'd 
guess it is indeed pattern matching (which decomposes a structure and 
binds the structural elements to local names).
Is that guess correct?

>> From what he wrote, it was a bit unclear to me what his actual
>> problems were (well, those with the type system at least).
> 
> Well, he gave some concrete examples where it took much more code to
> do something in Haskell than in Erlang, and much more cutting and
> pasting using his method.  Maybe there's a better way but I'm not
> familiar enough with Haskell to know this at the moment.

Neither am I.

My guess is that he hit some real problems, but couldn't reasonably get 
around them because he was held up by the learning curve at the same time.
I do hope that his experience reports are helping the Haskell guys to 
improve the system.

Regards,
Jo
0
jo427 (1164)
2/12/2007 10:24:07 AM
Paul Rubin schrieb:
> Can I ask for another example, the equivalent of
> 
>    struct {
>       int type;
>       union { int ival; float fval; } u;  
>    } x;
> 
>    if (x.type == 0)  /* use the int part of the union */
>       printf ("%d\n", x.u.ival * 3);
>    else if (x.type == 1) /* use the float part */
>       print("%f\n", sqrt(x.u.fval));
>    else
>       print ("invalid type tag\n");

Let me rephrase this slightly to make the translation more straightforward:


 >    enum type_t {ival_t, fval_t }; # syntax probably wrong
 >    struct {
 >       type_t type;
 >       union { int ival; float fval; } u;
 >    } x;
 >
 >    if (x.type == ival_t)  /* use the int part of the union */
 >       printf ("%d\n", x.u.ival * 3);
 >    else if (x.type == fval_t) /* use the float part */
 >       print("%f\n", sqrt(x.u.fval));
 >    else
 >       print ("invalid type tag\n");

That would be

   data type_t = Ival int | Fval float
   ...
   case foo of
     Ival i -> print_int i
     Fval f -> print_float f
     _      -> print_error "invalid type tag"

Note that the last case is superfluous in most FPLs. Since the compiler 
already knows what cases are allowed, and since it's impossible to 
access the type field and write invalid values into it, the compiler can 
infer that the last case will never be run. (For the reverse case, when 
you forget a variant in the case expression, most compilers will give 
you a warning that the pattern match is non-exhaustive.)
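
A minimal OCaml rendering of the same point (names are mine):

   type type_t = Ival of int | Fval of float

   let show = function
     | Ival i -> Printf.printf "%d\n" (i * 3)
     | Fval f -> Printf.printf "%f\n" (sqrt f)

   (* Adding a trailing '| _ -> ...' arm draws "Warning 11: this match
      case is unused"; deleting the Fval arm instead draws "Warning 8:
      this pattern-matching is not exhaustive". *)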

Regards,
Jo
0
jo427 (1164)
2/12/2007 10:33:02 AM
Jon Harrop schrieb:
> ML's type system is fundamentally different from Haskell's, for
> a start.

I'd have said they are fundamentally similar because both are based on 
Hindley-Milner type inference.
They do have fundamental differences in other areas; would you say that 
these differences would have made a difference for Joel's project?

Regards,
Jo
0
jo427 (1164)
2/12/2007 10:35:34 AM
Joachim Durchholz wrote:
> 
>   data type_t = Ival int | Fval float
>   ...
>   case foo of
>     Ival i -> print_int i
>     Fval f -> print_float f
>     _      -> print_error "invalid type tag"
> 
> Note that the last case is superfluous in most FPLs.

Actually, it's even an error in most FPLs, or at least a warning.

- Andreas
0
rossberg (600)
2/12/2007 1:01:03 PM
Paul Rubin wrote:
> Joachim Durchholz <jo@durchholz.org> writes:
>> No. Some people name them "discriminated union".
>> At the C level, these are unions of structs, with a mandatory common
>> "type" field that discriminates which kind of object this is.
>
> Oh, ok, this is basically how Lisp works.

I'd say it is a misconception that the tag in a discriminated union 
value is a "type" field. Just consider the following trivial example:

   datatype t = Left of int | Right of int

Both variants have type int.

Datatype tags are just that, tags, nothing else. As such they have a 
status comparable to record labels (in type theory these are in fact 
dual concepts). You wouldn't say that a record label - or a method name, 
or a struct member name - is a "type selector" either.

[Now, from the POV of type theory, the dynamic tags in Lisp aren't types 
either, but that is another story which has seen enough battles in this 
forum.]
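
A short OCaml sketch of the point (my example):

   type t = Left of int | Right of int

   let describe = function
     | Left n  -> Printf.sprintf "Left %d" n
     | Right n -> Printf.sprintf "Right %d" n

Both arms bind an int; the tags Left and Right discriminate values but 
carry no type information of their own.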

- Andreas
0
rossberg (600)
2/12/2007 1:15:47 PM
Joachim Durchholz wrote:
> Jon Harrop schrieb:
>> ML's type system is fundamentally different from Haskell's, for
>> a start.
> 
> I'd have said they are fundamentally similar because both are based on
> Hindley-Milner type inference.
> They do have fundamental differences in other areas; would you say that
> these differences would have made a difference for Joel's project?

I don't know about Joel's project but, in this context, Haskell's type
system is undecidable so it can require type annotations whereas ML's is
decidable and does not require annotations.

The fact that Haskellers write more type annotations should not be taken as
evidence that type inference doesn't work well. Especially given the fact
that MLers rarely annotate any types.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 2:45:14 PM
Jon Harrop wrote:
> 
> I don't know about Joel's project but, in this context, Haskell's type
> system is undecidable so it can require type annotations whereas ML's is
> decidable and does not require annotations.

Not sure what you mean by "in this context", but just to get this 
straight: Haskell's type system is not undecidable. GHC allows you to 
step into the realm of undecidable type checking by explicitly 
requesting it via a compiler flag, but the average user should hardly 
ever need that.

OCaml's module type system, on the other hand, *is* undecidable. That's 
due to the presence of nested abstract signatures. Nobody ever uses them 
either.

- Andreas
0
rossberg (600)
2/12/2007 3:06:44 PM
Andreas Rossberg wrote:
> Jon Harrop wrote:
>> I don't know about Joel's project but, in this context, Haskell's type
>> system is undecidable so it can require type annotations whereas ML's is
>> decidable and does not require annotations.
> 
> Not sure what you mean by "in this context", but just to get this
> straight: Haskell's type system is not undecidable. GHC allows you to
> step into the realm of undecidable type checking by explicitly
> requesting it via a compiler flag, but the average user should hardly
> ever need that.

My mistake then. I thought that was the default in Haskell.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 5:21:59 PM
Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:
>
>I may be missing something here--I thought the classic book on data
>structures was still TAOCP whose examples are all in (untyped)
>assembly language.

TAOCP, even the updated editions, is horribly old-fashioned. Knuth's
arguments for using a made-up assembly language instead of a made-up
programming language are entirely bogus. You're much better off with
Cormen, Leiserson & Rivest, plus Okasaki if you want to get into
hard-core FP.

Tony.
-- 
f.a.n.finch  <dot@dotat.at>  http://dotat.at/
ROCKALL: NORTHWEST BACKING SOUTH 6 TO GALE 8. ROUGH OR VERY ROUGH,
OCCASIONALLY HIGH. RAIN OR SHOWERS. MODERATE OR GOOD.
0
dot1 (33)
2/12/2007 5:50:00 PM
I've altered a lot of code now and have reached ex 3.16-3.19:

Chris Rathman wrote:
>> Why dynamic dispatch rather than four separate functions?
> 
> Perhaps because dynamic dispatch was the lesson being taught in that
> particular example.
> ...
>> Why the catchall pattern at the end?
> 
> Because the compiler will complain about non-exhaustive pattern
> matching.  And it happens to correspond to the SICP example.  If you
> are dispatching on messages, as the Lisp is doing in the example, then
> you may get an unexpected message.
>
> Again, you chose an example that doesn't translate well to ML.

I believe they are demonstrating how a cons cell can be replaced with a set
of closures that implement the functions that could have acted on the cons
cell (car, cdr, set_car and set_cdr), i.e. an object instead of a value.

This does not require dynamic dispatch. Although "you may get an unexpected
message" in Lisp or Scheme, you can statically define the allowed messages
in the type and have your code checked for correctness at compile time in
ML. So this example is actually more concise, more robust and clearer in
OCaml than in Scheme:

type ('a, 'b) cell =
  {car: unit -> 'a; cdr: unit -> 'b;
   set_car: 'a -> unit; set_cdr: 'b -> unit};;

let cons x y =
  let x = ref x and y = ref y in
  {car = (fun () -> !x); cdr = (fun () -> !y);
   set_car = (fun x' -> x:=x'); set_cdr = (fun y' -> y:=y')};;

let x = cons 1 2;;
let z = cons x x;;
(z.cdr()).set_car 17;;
x.car();;
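
(To trace the last two lines: z.cdr() returns the cell x, whose set_car 
then overwrites its hidden ref, so the final x.car() yields 17 - the 
same aliasing behaviour as SICP's mutable cons, but with the set of 
allowed messages checked statically.)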

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 5:54:01 PM
Tony Finch <dot@dotat.at> writes:
> TAOCP, even the updated editions, is horribly old-fashioned. Knuth's
> arguments for using a made-up assembly language instead of a made-up
> programming language are entirely bogus. You're much better off with
> Cormen, Leiserson & Rivest, plus Okasaki if you want to get into
> hard-core FP.

Ehh, ok, maybe I'm a fuddy duddy then.  I do know about CL&R and have
thought I might splurge for a copy sometime, since it has a lot more
about graph algorithms than Knuth vols 1-3 (the forthcoming Knuth
vol. 4 will cover them).  I just googled and found a blurb for
Okasaki's book (hadn't heard of it before) but am not sure what to
think--should I care about that stuff (for example to support
parallelism) or just use imperative constructions (e.g. with the array
monads in Haskell)?
0
phr.cx (5493)
2/12/2007 6:22:54 PM
On Feb 11, 12:51 pm, Joachim Durchholz <j...@durchholz.org> wrote:
> Paul Rubin schrieb:
>
> > I thought the classic book on data
> > structures was still TAOCP whose examples are all in (untyped)
> > assembly language.
>
> If you mean Don Knuth's series: it's seriously outdated, particularly on
> topics like abstract data types, data structures, etc.
> In all, it reflects the topics and choices that were controversial and
> important in the fifties. (Similar to reading a book on geometry written
> by the old Greek philosophical mathematicians. Even if they contributed
> substantially to the state of the art, neither their problems nor much
> of their results are very relevant today anymore.)
>

Wrong.  It just works at a much lower level than most people these
days (especially functional programmers) are working at.  It's
completely relevant to people working on low-level things that YOU are
not working on, and it doesn't have any bias towards functional
programming or any other.  That's why it uses MIX or MMIX.


0
MarkHaniford (145)
2/12/2007 8:13:47 PM
On Feb 12, 11:54 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> I believe they are demonstrating how a cons cell can be replaced with a set
> of closures that implement the functions that could have acted on the cons
> cell (car, cdr, set_car and set_cdr), i.e. an object instead of a value.

Now that I'm out of *reflexive mode*, and into a more humble state of
mind, I would agree with that description and solution.  Having
backtracked my breadcrumbs (which is what the wiki represents for me),
I see that I was trying to approximate something along those lines in
Alice (though my given solution there was convoluted).

> ...So this example is actually more concise, more robust and clearer in
> OCaml than in Scheme:

Well, I'm trying to understand Scheme as well, so I'm less inclined at
the current time to express an opinion on the matter, one way or the
other.  :-)

With respect to some of the other changes, I can't say that I
understand the ML style that frowns on excessive use of
parentheses.  That is, I'm not aware of any semantic difference
between, say, something like:

let sum_of_squares x y = (square x) + (square y);;

and

let sum_of_squares x y = square x + square y;;

Certainly it would help if I'd actually immerse myself in the culture
and mores of the various languages that I have chosen to explore, as
I've violated the styles and idioms of each and every one of them (the
Haskell code being perhaps the worse offender).  But I'm somewhat
perplexed by how much that grates on these communities.  I'd venture a
guess that it arises from an effort to keep people from bringing their
imperative accents in play while they are writing ML code.

My own inclination is to say that the first is slightly more readable
to someone, such as myself, coming from a variety of other languages.
Though I acknowledge that some of this comes from a mistrust of
compilers in other languages, so I'm probably overspecifying the
solution and should place more trust in the language.

Thanks,
Chris

0
2/12/2007 8:21:00 PM
Markus E Leypold <development-2006-8ecbb5cc8aREMOVETHIS@ANDTHATm-e-leypold.de> writes:
> Do you need it to program on the same level of abstraction as
> i.e. with C / C++ / Ada / OCaml? No. So liftM2 is optional in a sense
> (OK, as I'm not a hard core Haskeller I might miss a subtle point, but
> the point I'd like to make, is, that people worry about _additional_
> features when deciding to migrate from languages which don't have that
> features at all).

Well, if I wanted to code in C, I wouldn't be looking into Haskell ;-).

> > But I think I have to use other monads for arrays, concurrency, etc.
> You don't have to use array: Many things can be done purely functional
> with lists and maps. 

Someone mentioned Chris Okasaki's book on functional data structures
but I think that's fairly esoteric even for here.  Yes, obviously you
can use lists and maps straightforwardly for a lot of things, but that
can lead to horrendous performance impairment compared with using
mutable structures (think of almost any graph algorithm).  Even
something like a simple character string like "hello" in Haskell is
represented with five cons nodes instead of a linear vector, so that
you can pattern match x:xs against it, and "x !! n" is an O(n)
operation instead of O(1).  This was one reason I asked if Haskell
naturally led to selecting poorly scaling algorithms.
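
A small OCaml illustration of the gap (toy values of my own; OCaml 
strings are flat, while Haskell's default String is a list of Char):

   let s = "hello"
   let c = s.[4]                      (* flat string: O(1) indexing *)

   let l = ['h'; 'e'; 'l'; 'l'; 'o']
   let c' = List.nth l 4              (* walks four cons cells: O(n) *)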

> Concurrency is hardly what one would need in an application that
> needs to do simple things, so certainly (in my eyes) not necessary
> for the beginner. And, Paul: Concurrency is also hard in imperative
> languages :-).

Well true, but I'm interested in FP to escape from the hazards of
C-like (or Lisp-like) languages for complex applications.  I'm
therefore trying to get a sense of what it's like on the other side of
the learning curve, not the beginner's side where I am now.  That
means dealing with systems interfaces, performance issues, etc.

> Yes, there are such programs. But "systems programming" or whatever is
> not the only kind of serious program. Also serious in my
> opinion (and much more frequent) is "data processing", that is, read
> your data, derive other data from it and output that. E.g. creating a
> report from a database, massaging XML documents (deleting sections,
> extract meta data etc). Programming servers is only a tiny part of
> information processing.

Sure, that's reasonable, though there's plenty of "systems
programming" on the client side too (think of a writing a word
processor or web browser).  Even stuff like massaging XML documents
leads to systems programming, if you want to do stuff like follow
links from the documents (multi-threaded web crawler), or edit them
graphically (communicate with window system etc).
0
phr.cx (5493)
2/12/2007 9:01:06 PM
Markus E Leypold <development-2006-8ecbb5cc8aREMOVETHIS@ANDTHATm-e-leypold.de> writes:
> >   http://wagerlabs.com/2006/01/01/haskell-vs-erlang-reloaded
> What I see here, is a (badly quoted) piece from presumably a mail or
> news group post which cannot be found elsewhere. Badly quoted, because
> I assume that the first paragraph is written by someone else (Joey?) ,
> not by SPJ. After that I stopped reading.

It was written by Joel Reymont, the guy who wrote a poker server and
then tried writing a test framework for it in Haskell.  It's one of a
series of messages exchanged with SPJ and others, so at the beginning
he quotes a paragraph from an earlier SPJ message.  Basically it's
part of a thread about why he gave up on Haskell.  SPJ asked Joel to
describe his problems more explicitly, so Joel did so in that message.
0
phr.cx (5493)
2/12/2007 9:03:14 PM
MarkHaniford@gmail.com schrieb:
> On Feb 11, 12:51 pm, Joachim Durchholz <j...@durchholz.org> wrote:
>> Paul Rubin schrieb:
>>
>>> I thought the classic book on data
>>> structures was still TAOCP whose examples are all in (untyped)
>>> assembly language.
>> If you mean Don Knuth's series: it's seriously outdated, particularly on
>> topics like abstract data types, data structures, etc.
>> In all, it reflects the topics and choices that were controversial and
>> important in the fifties. (Similar to reading a book on geometry written
>> by the old Greek philosophical mathematicians. Even if they contributed
>> substantially to the state of the art, neither their problems nor much
>> of their results are very relevant today anymore.)
> 
> Wrong.  It just works at a much lower-level than most people these
> days (especially functional programmers) are working at.

Hmm... I have to partially agree.

TAOCP was (and, in some respects, still is) a great work, but it also 
has some horrible flaws.
One excellent example here is the spectral test.
First, the algorithm is worded as two overlapping loops. There's not a 
single assertion in the code, so it's entirely unclear what invariants 
this code is establishing at all.
Second, he's quite handwavy about "the algorithm may overflow but that's 
not usually the case", without any useful way to detect the overflow 
case, and no discussion about potential improvements or anything.
Knuth lost a lot of credit with me for that piece of shoddy work.

Of course, his discussion of random-number generators is excellent - but 
Mersenne Twister wasn't invented when he wrote about random numbers, so 
he sticks with recommending linear congruential PRNGs (and devotes a 
*lot* of space to the selection of "good" parameters).
Instructive to know that a mere number can influence the quality of an 
algorithm, but hardly worth an entire chapter today.

Etc. etc. etc.

There is also a conspicuous lack of important topics.
Nothing about abstract data types.
Nothing about immutability (though there's quite a lot to be said about 
the algorithms).
Nothing about amortized complexity, or methods to compute it.
Nothing about programming paradigms.
Nothing about networking.
Nothing about security.
Nothing about usability.
Nothing about (semi-)automatic reasoning about source code (static 
typing, type inference, etc.)
Of course, each of these topics is enough for a book of its own.
But I suspect that Knuth would be unable to contribute to these.

It's been quite a long time since I stopped hoping for Knuth's next book.

TAOCP is excellent, and partly brilliant. It may still be the prime 
source for the very low-level stuff (though I doubt that - the details 
of MIX don't reflect modern architectures anymore, and MMIX may be an 
improvement, but it's not *that* much of an improvement).
Anyway. TAOCP is rather outdated, and I'd recommend other sources to 
novices in the field.

Regards,
Jo
0
jo427 (1164)
2/12/2007 9:04:06 PM
Markus E Leypold schrieb:
> Joachim Durchholz <jo@durchholz.org> writes:
> 
>> Paul Rubin schrieb:
>>> I thought the classic book on data
>>> structures was still TAOCP whose examples are all in (untyped)
>>> assembly language.
>> If you mean Don Knuth's series: it's seriously outdated, particularly
>> on topics like abstract data types, data structures, etc.
> 
> Knuth TAOCP is not about data structures, abstraction or software
> engineering. It's about algorithms and in that area it is not outdated
> at all.

Limited selection of graph algorithms.
A lot of space for tape sorting - interesting but hardly relevant.
No lazy data structures.
Old set of algorithms for PRNGs.
No cryptography.
I don't remember having read about 3D graphics.
Heck, not even 2D graphics. I learned about that from Smalltalk books.

Of course, there *is* a wealth of algorithms in TAOCP. But the selection 
of algorithms is tailored towards the needs of the sixties or seventies, 
not towards today's needs.

> I wish the people implementing e.g. pseudo random number generators
> today had read Knuth's 2nd volume.

Absolutely no!
TAOCP recommends linear congruential PRNGs. These have comparatively 
short periods and hyperplane anomalies.
Today, I'd start with a Mersenne Twister, which AFAIK doesn't have any 
known anomalies yet.

 > Just recently I had a case of
> abolutely pathological test data (but not recognizable as such at the
> first glance) because someone messed up with random number generation.

Yes, but incompetence is independent of available literature.

> So I don't agree. TAOCP is not outdated and should be required reading
> at universities.

There are better books.

> BTW: Which other texts would you recommend as a substitute? (Serious:
> I'm always collecting bibliographic references).

Today, I'd recommend googling around.
The field of algorithms has diversified so much that it's quite hopeless 
to find a single reference.

I do have a fixed list of recommendations, though each fills just a 
specific need, in no particular order:

For data structures in FPLs: Chris Okasaki, Purely Functional Data 
Structures. It leaves room for improvement, but it's still 
groundbreaking work.

For OOP: Bertrand Meyer, Object-Oriented Software Construction. 
Instructive even in its (sometimes grave) mistakes, because he explains 
his choices so if you disagree about a conclusion, you have a chance to 
find out where his assumptions diverge from yours (and you get a chance 
to reconsider your assumptions, which can help a lot to shed mental 
ballast and acquire new confidence in those assumptions that remain).

For programming paradigms in general: Peter van Roy, Seif Haridi, 
Concepts, Techniques, and Models of Computer Programming. Gives an 
excellent bird's eye overview of most of the programming paradigms that 
are around. (It's missing Prolog, which I personally consider a failure 
though it seems to have sparked some interesting other languages.)

Sorting, basic graph & geometry stuff: Robert Sedgewick, Algorithms in 
C. Not brilliant, just a solid reference of many well-known algorithms.

Cryptography: Bruce Schneier, Applied Cryptography. *The* standard in 
its field. Though I suspect that my copy is quite outdated by now - 
cryptography is a fast-moving field.


I probably forgot to mention a few, but I don't have my bookshelf handy. 
I guess that if I remember these all by heart, these are among the most 
outstanding books anyway :-)
There are also a few odd works that were outstanding in their own right, 
but dwell too much on algorithms for which better algorithms exist 
today. This includes Adele Goldberg's books about Smalltalk's design and 
implementation, Niklaus Wirth's Algorithms and Data Structures, the 
famous "Dragon Book" by Aho/Sethi/Ullman, and probably a few others.
Oh, and the above list doesn't have a good reference work for parsing. 
The only work in that area that I read was Kannapinn's thesis, which is 
fascinating but essentially a one-trick pony.
To be a reasonably competent programmer, one would have to master at 
least one large library, but that's something that's best acquired by 
osmosis and programming, not by reading books :-)

HTH.

Regards,
Jo
0
jo427 (1164)
2/12/2007 9:30:15 PM
"Chris Rathman" <Chris.Rathman@comcast.net> writes:

> On Feb 12, 11:54 am, Jon Harrop <j...@ffconsultancy.com> wrote:
>> I believe they are demonstrating how a cons cell can be replaced with a set
>> of closures that implement the functions that could have acted on the cons
>> cell (car, cdr, set_car and set_cdr), i.e. an object instead of a value.
>
> Now that I'm out of *reflexive mode*, and into a more humble state of
> mind, I would agree with that description and solution.  Having
> backtracked my breadcrumbs (which is what the wiki represents for me),
> I see that I was trying to approximate something along those lines in
> Alice (though my given solution there was convoluted).
>
>> ...So this example is actually more concise, more robust and clearer in
>> OCaml than in Scheme:
>
> Well, I'm trying to understand Scheme as well, so I'm less inclined at
> the current time to express an opinion on the matter, one way or the
> other.  :-)
>
> With respect to some of the other changes, I can't say that I
> understand the ML styles that frowns on excessive usage of
> parenthesis.  That is, I'm not aware of any semantic differences
> between, say, something like:
>
> let sum_of_squares x y = (square x) + (square y);;
>
> and
>
> let sum_of_squares x y = square x + square y;;
>
> Certainly it would help if I'd actually immerse myself in the culture
> and mores of the various languages that I have chosen to explore, as
> I've violated the styles and idioms of each and every one of them (the
> Haskell code being perhaps the worse offender).  But I'm somewhat
> perplexed by how much that grates on these communities.  I'd venture a
> guess that it arises from an effort to keep people from bringing their
> imperative accents in play while they are writing ML code.
>
> My own inclination is to say that the first is slightly more readable
> to someone, such as myself, coming from a variety of other languages.
> Though I acknowledge that some of this comes from a mistrust of
> compilers in other languages, so I'm probably overspecifying the
> solution and should place more trust in the language.
>
> Thanks,
> Chris

I think the argument is whether one should trust one's readers.  I
trust the compiler perfectly well.  However, I find that the next
maintainer of any program I write is likely to be confused if I don't
over parenthesize.  Now, perhaps these are just the confessions of an
"imperative programmer".  Perhaps, I would feel differently if I wrote
a lot of FP and had other FP programmers as my readers.  However, in
my normal world, I find that one can rarely make things too simple and
too explicit.

I, myself, found the superfluous parentheses made reading the line in
question simpler.  However, I am neither a native nor a fluent ML
speaker.  I'm not a native lisp speaker either, and I didn't find
the lisp-like placement of the parentheses distracting.

Just one reader's opinion,
-Chris (Clark)
0
cfc (239)
2/12/2007 9:30:23 PM
Paul Rubin schrieb:
> Someone mentioned Chris Okasaki's book on functional data structures
> but I think that's fairly esoteric even for here.

No, not at all. Okasaki's algorithms made it into several data structure 
libraries.
(Of course, only data structure library programmers will really read 
every line in that book.)

 > Yes, obviously you
> can use lists and maps straightforwardly for a lot of things, but that
> can lead to horrendous performance impairment compared with using
> mutable structures (think of almost any graph algorithm).  Even
> something like a simple character string like "hello" in Haskell is
> represented with five cons nodes instead of a linear vector, so that
> you can pattern match x:xs against it, and "x !! n" is an O(n)
> operation instead of O(1).  This was one reason I asked if Haskell
> naturally led to selecting poorly scaling algorithms.

This kind of problem will hurt at the micro level, but not so much when 
programming in the large.
Of course, you have to do your homework and replace those naive O(N) 
algorithms with O(log N) ones. But I've seen needless O(N^2) algorithms 
in imperative software far too often to be overly worried about that issue.

Regards,
Jo
0
jo427 (1164)
2/12/2007 9:38:57 PM
Joachim Durchholz <jo@durchholz.org> writes:
> Limited selection of graph algorithm.

That's volume 4, in preparation.  Some parts of it have been published.

> A lot of space for tape sorting - interesting but hardly relevant.

Very relevant way back when.  Current edition says no longer relevant,
but left in because theory still beautiful and might come back into
relevance; however, he says he might take that stuff out of the next
edition.

> No lazy data structures.

Not sure what this means: I've heard of lazy evaluation in
higher-level languages, but lazy data structures?  He does discuss
coroutines a fair amount.

> Old set of algorithms for PRNGs.
> No cryptography.

True.  Cryptography is kind of a specialized subtopic but it's vastly
affected how we think of PRNG's, so maybe a forthcoming edition better
treat this.  He does talk about integer factorization a lot, and
mentions RSA cryptography in conjunction with this.

> I don't remember having read about 3D graphics.
> Heck, not even 2D graphics. I learned about that from Smalltalk books.

Ehh again, specialized subtopic, though maybe my definition of
"specialized subtopic" is colored by what's in TAOCP ;) (as a teenager
I more or less lived inside those books).

> Today, I'd start with a Mersenne Twister, which AFAIK doesn't have any
> known anomalies yet.

There are known distinguishers for MT.  If you're really trying to be
careful, you have to use RNG's designed against adversaries, which
means cryptographic RNG's.

> For data structures in FPLs: Chris Okasaki, Purely Functional Data
> Structures. It leaves room for improvement, but it's still
> groundbreaking work.

Sounds interesting.

> For OOP: Bertrand Meyer, Object-Oriented Software

Well, TAOCP is an algorithms book, not really a software development
book.

> For programming paradigms in general: Peter van Roy, Seif Haridi,
> Concepts, Techniques, and Models of Computer Programming. 

Give up on SICP?

> Sorting, basic graph & geometry stuff: Robert Sedgewick, Algorithms in
> C. Not brilliant, just a solid reference of many well-known algorithms.

I like Knuth better for sorting.  For other stuff, maybe CL&R.

> Cryptography: Bruce Schneier, Applied Cryptography. *The* standard in
> its field. Though I suspect that my copy is quite outdated by now -
> cryptography is a fast-moving field.

Nah, no offense to Bruce but that's just a cookbook for programmers.
For the mathematical theory, I'd start with Bellare & Rogaway,
"Introduction to Modern Cryptography".  This book (maybe not yet in
print yet, but online) is wonderful.  It completely demystifies a
subject that most people (including professionals) still often treat
as black magic:

  http://www-cse.ucsd.edu/users/mihir/cse207/classnotes.html

More hardcore is Goldreich, "Foundations of Cryptography", two printed
vols (I don't have them); fragments of it are online:
  http://theory.lcs.mit.edu/~oded/frag.html

> Oh, and the above list doesn't have a good reference work for
> parsing. 

Yeah, that's what Knuth vol 7 was supposed to be, but he may have
changed this plan.
0
phr.cx (5493)
2/12/2007 9:44:49 PM
Paul Rubin wrote:
> But could be that coding in Haskell naturally leads one to use
> badly-scaling algorithms?

Purely functional data structures certainly tend to be slower for the worst
case in the absence of concurrency. A map represented by a balanced binary
tree can be 10x slower than a hash table.

However, purely functional data structures can also be faster:

Many GCs (e.g. OCaml's) treat arrays of pointers as atoms, traversing the
whole array in one go. So the array of pointers in a big hash table causes
the GC to stall the system, drastically worsening worst-case performance
and rendering the program less suitable for soft real-time work.

Inserting in a hash table is O(n) in the worst case (when the table must
be resized) whereas inserting in a map is only O(log n) so, again, maps
can be preferable when you want fast worst-case performance.
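
A hedged micro-sketch of the trade-off in modern OCaml (Map.Make(Int)
assumes a recent stdlib that provides the Int module):

module IntMap = Map.Make(Int)

(* persistent map: O(log n) insert, and the old map m1 remains valid *)
let m1 = IntMap.add 1 "one" IntMap.empty
let m2 = IntMap.add 2 "two" m1

(* mutable hash table: amortised O(1) insert, occasional O(n) resize *)
let h = Hashtbl.create 16
let () = Hashtbl.add h 1 "one"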

In the presence of concurrency, the need for expensive locks is minimal if
you use purely functional data structures. This vastly simplifies
concurrent code, which can be much more valuable than a factor of 2 in
performance.

If the language supports it, the compiler and run-time can leverage
knowledge about immutability to improve performance.

One gem of information that I gleaned from Okasaki's work is that you can
recover much of the performance lost when using purely functional data
structures by amortising costs. However, this often also loses you thread
safety.

>> >> he spent too much time fighting the type system and decided he
>> >> was more productive with dynamic types;
>> > That's an interesting one. ...  I wonder if my mind is just bent
>> > to think of problem solutions that don't need dynamic types.
> 
> Well, his application involved sending a lot of serialized data over
> network sockets and he had a lot of object types, so I guess he found
> it more straightforward on the receiving end to deserialize the
> packets into objects of the appropriate type and dispatch them
> dynamically, than to have static selection logic all over the place.
> I guess in OCaml one would use OOP, but I always saw OOP as a
> workaround to static types.

I wrote an XMLRPC implementation in OCaml in only 30 LOC by leveraging
pattern matching, higher-order functions and currying.

If you're talking about having hundreds of different types then you should
write a compiler to generate the OCaml bindings for you. That is also easy
in OCaml.

I wouldn't use OOP for that in OCaml. Also, any application that uses
marshalling extensively is better suited to a language with type safe
marshalling, e.g. HashCaml, MetaOCaml or F#.

> I don't understand what precisely what a GADT is, but in dynamically
> typed languages doesn't the whole question go away?

No. The best example of GADTs that I've seen is the tagless interpreter.

In a dynamically typed language you get the worst of all worlds. You have
one of three choices:

1. Put up with no static assurances and maximal boxing and unboxing.

2. Split your interpreter for the most common cases, add lots of type
declarations and hope that your compiler optimises away some of the boxing
and unboxing.

3. Sacrifice safety.

Your typical implementation will look like this OCaml:

let int = function `Int n -> n | _ -> invalid_arg "int";;
let bool = function `Bool b -> b | _ -> invalid_arg "bool";;
let pair = function `Pair(a, b) -> a, b | _ -> invalid_arg "pair";;

let rec eval = function
  | `Lit i -> `Int i
  | `Inc f -> `Int(int(eval f) + 1)
  | `IsZ f -> `Bool(int(eval f) = 0)
  | `If(p, t, f) -> eval(if bool(eval p) then t else f)
  | `Pair(a, b) -> `Pair(eval a, eval b)
  | `Fst f -> fst(pair(eval f))
  | `Snd f -> snd(pair(eval f));;
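
(For instance, eval (`If(`IsZ(`Lit 0), `Lit 1, `Lit 2)) returns the
boxed value `Int 1; every intermediate result is tagged and must be
untagged again by int, bool or pair.)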

In the absence of GADTs you can split your interpreter into specialised
versions designed to return unboxed values for expressions known to
evaluate to certain types:

let rec eval = function
  | `Lit i -> `Int i
  | `Inc f -> `Int(eval_int f + 1)
  | `IsZ f -> `Bool(eval_int f = 0)
  | `If(p, t, f) -> eval(if eval_bool p then t else f)
  | `Pair(a, b) -> `Pair(eval a, eval b)
  | `Fst f -> fst(eval_pair f)
  | `Snd f -> snd(eval_pair f)
and eval_int = function
  | `Lit i -> i
  | `Inc f -> eval_int f + 1
  | `If(p, t, f) -> eval_int(if eval_bool p then t else f)
  | `Fst f -> int(fst(eval_pair f))
  | `Snd f -> int(snd(eval_pair f))
and eval_bool = function
  | `IsZ f -> eval_int f = 0
  | `If(p, t, f) -> eval_bool(if eval_bool p then t else f)
  | `Fst f -> bool(fst(eval_pair f))
  | `Snd f -> bool(snd(eval_pair f))
and eval_pair = function
  | `If(p, t, f) -> eval_pair(if eval_bool p then t else f)
  | `Pair(a, b) -> eval a, eval b
  | `Fst f -> pair(fst(eval_pair f))
  | `Snd f -> pair(snd(eval_pair f));;

This removes some of the boxing and unboxing and provides some extra static
checking.

With GADTs, you can remove all superfluous boxing to get maximum performance
and maximum static checking:

{-# LANGUAGE GADTs #-}

data Term a where
    Lit  :: Int -> Term Int
    Inc  :: Term Int -> Term Int
    IsZ  :: Term Int -> Term Bool
    If   :: Term Bool -> Term a -> Term a -> Term a
    Pair :: Term a -> Term b -> Term (a,b)
    Fst  :: Term (a,b) -> Term a
    Snd  :: Term (a,b) -> Term b 

eval :: Term a -> a
eval (Lit i)    = i
eval (Inc t)    = eval t + 1
eval (IsZ t)    = eval t == 0
eval (If b t e) = if eval b then eval t else eval e
eval (Pair a b) = (eval a, eval b)
eval (Fst t)    = fst (eval t)
eval (Snd t)    = snd (eval t)
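
(For comparison, eval (If (IsZ (Lit 0)) (Lit 1) (Lit 2)) here simply has
type Int and returns an unboxed 1; the type parameter of Term now carries
the information that the run-time tags carried above.)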

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 9:55:51 PM
Paul Rubin wrote:
> Markus E Leypold
> <development-2006-8ecbb5cc8aREMOVETHIS@ANDTHATm-e-leypold.de> writes:
>> >   http://wagerlabs.com/2006/01/01/haskell-vs-erlang-reloaded
>> What I see here, is a (badly quoted) piece from presumably a mail or
>> news group post which cannot be found elsewhere. Badly quoted, because
>> I assume that the first paragraph is written by someone else (Joey?) ,
>> not by SPJ. After that I stopped reading.
> 
> It was written by Joel Reymont

That's the guy who bought my book and read it while his house burned down,
whilst sitting on a corpse. Or something:

http://wagerlabs.com/articles/2006/08/23/objective-caml-for-scientists

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 10:01:11 PM
Jon Harrop <jon@ffconsultancy.com> writes:
> That's the guy who bought my book and read it while his house burned down,
> whilst sitting on a corpse.

Heh, quite a recommendation ;-)
0
phr.cx (5493)
2/12/2007 10:15:44 PM
On Feb 12, 3:04 pm, Joachim Durchholz <j...@durchholz.org> wrote:
> There is also a conspicuous lack of important topics.
> Nothing about abstract data types.
> Nothing about immutability (though there's quite a lot to be said about
> the algorithms).
> Nothing about amortized complexity, or methods to compute it.
> Nothing about programming paradigms.
> Nothing about networking.
> Nothing about security.
> Nothing about usability.
> Nothing about (semi-)automatic reasoning about source code (static
> typing, type inference, etc.)
> Of course, each of these topics is enough for a book of its own.
> But I suspect that Knuth would be unable to contribute to these.
>

The compiler volume isn't out yet, as well as some other volumes.  But
neither Knuth nor any other individual can hope to cover all the topics
in computer science/software engineering/usability.  The topics he
does present, though, are still as relevant as they were back when they
were first published.

> It's been quite a long time since I stopped hoping for Knuth's next book.
>

I'm not sure what volume is next - maybe the compiler one, but I'm
sure it'll be relevant and not out-dated.

> TAOCP is excellent, and partly brilliant. It may still be the prime
> source for the very low-level stuff (though I doubt that - the details
> of MIX don't reflect modern architectures anymore, and MMIX may be an
> improvement, but it's not *that* much of an improvement).

I'd argue that many of the code examples would be better served by
providing examples in straight C, but I understand the reasoning behind
MIX.

> Anyway. TAOCP is rather outdated, and I'd recommend other sources to
> novices in the field.
>

TAOCP is not outdated, but I consider TAOCP as reference volumes and
not teaching texts - at least at the undergraduate level.


> Regards,
> Jo


0
MarkHaniford (145)
2/12/2007 10:29:27 PM
Joachim Durchholz <jo@durchholz.org> writes:
>    data type_t = Ival int | Fval float
>    ...
>    case foo of
>      Ival i -> print_int i
>      Fval f -> print_float f
>      _      -> print_error "invalid type tag"
> 
> Note that the last case is superfluous in most FPLs. Since the
> compiler already knows what cases are allowed, and since it's
> impossible to access the type field and write invalid values into it,

Um, ok, but in this case how do you cons up the object in the first
place?

For example, say you want to read 5 bytes from a network socket.  If
the first byte is a 0, the next 4 bytes are an integer (don't worry
about byte order); if the first byte is a 1, the next 4 bytes are a
float.  How would you make one of those type_t's out of the 5 bytes?
0
phr.cx (5493)
2/12/2007 10:33:04 PM
"MarkHaniford@gmail.com" <MarkHaniford@gmail.com> writes:
> TAOCP is not outdated, but I consider TAOCP as reference volumes and
> not teaching texts - at least at the undergraduate level.

I read them in high school and they were revelatory.  I'm just
incapable today of experiencing the level of amazement that they
brought me back then (also Aho & Ullman's dragon book at about the
same time).
0
phr.cx (5493)
2/12/2007 10:36:26 PM
On Feb 12, 3:30 pm, Joachim Durchholz <j...@durchholz.org> wrote:
> Limited selection of graph algorithms.
> A lot of space for tape sorting - interesting but hardly relevant.
> No lazy data structures.
> Old set of algorithms for PRNGs.
> No cryptography.
> I don't remember having read about 3D graphics.
> Heck, not even 2D graphics. I learned about that from Smalltalk books.
>

3d graphics!? 2D graphics!?  Get a grip man.  Knuth can't write a
volume on every subject under the sun.

> Of course, there *is* a wealth of algorithms in TAOCP. But the selection
> of algorithms is tailored towards the needs of the sixties or seventies,
> not towards today's needs.
>

No it's not.  We've already been through this.  It's just at a
different level.

> > I wish the people implementing e.g. pseudo random number generators
> > today had read Knuth's 2nd volume.
>
> Absolutely no!
> TAOCP recommends linear congruential PRNGs. These have comparatively
> short periods and hyperplane anomalies.
> Today, I'd start with a Mersenne Twister, which AFAIK doesn't have any
> known anomalies yet.
>
>  > Just recently I had a case of
>
> > abolutely pathological test data (but not recognizable as such at the
> > first glance) because someone messed up with random number generation.
>
> Yes, but incompetence is independent of available literature.
>
> > So I don't agree. TAOCP is not outdated and should be required reading
> > at universities.
>
> There are better books.
>
> > BTW: Which other texts would you recommend as a substitute? (Serious:
> > I'm always collecting bibliographic references).
>
> Today, I'd recommend googling around.
> The field of algorithms has diversified so much that it's quite hopeless
> to find a single reference.
>
> I do have a fixed list of recommendations, though each fills just a
> specific need, in no particular order:
>
> For data structures in FPLs: Chris Okasaki, Purely Functional Data
> Structures. It leaves room for improvement, but it's still
> groundbreaking work.
>
> For OOP: Bertrand Meyer, Object-Oriented Software Construction.
> Instructive even in its (sometimes grave) mistakes, because he explains
> his choices so if you disagree about a conclusion, you have a chance to
> find out where his assumptions diverge from yours (and you get a chance
> to reconsider your assumptions, which can help a lot to shed mental
> ballast and acquire new confidence in those assumptions that remain).
>
> For programming paradigms in general: Peter van Roy, Seif Haridi,
> Concepts, Techniques, and Models of Computer Programming. Gives an
> excellent bird's eye overview of most of the programming paradigms that
> are around. (It's missing Prolog, which I personally consider a failure
> though it seems to have sparked some interesting other languages.)
>
> Sorting, basic graph & geometry stuff: Robert Sedgewick, Algorithms in
> C. Not brilliant, just a solid reference of many well-known algorithms.
>
> Cryptography: Bruce Schneier, Applied Cryptography. *The* standard in
> its field. Though I suspect that my copy is quite outdated by now -
> cryptography is a fast-moving field.
>
> I probably forgot to mention a few, but I don't have my bookshelf handy.
> I guess that if I remember these all by heart, these are among the most
> outstanding books anyway :-)
> There are also a few odd works that were outstanding in their own right,
> but dwell too much on algorithms for which better algorithms exist
> today. This includes Adele Goldberg's books about Smalltalk's design and
> implementation, Niklaus Wirth's Algorithms and Data Structures, the
> famous "Dragon Book" by Aho/Sethi/Ullman, and probably a few others.
> Oh, and the above list doesn't have a good reference work for parsing.
> The only work in that area that I read was Kannapinn's thesis, which is
> fascinating but essentially a one-trick pony.
> To be a reasonably competent programmer, one would have to master at
> least one large library, but that's something that's best acquired by
> osmosis and programming, not by reading books :-)
>
> HTH.

And you still don't get it.  You say that TAOCP is outdated and
irrelevant and then bring up a book on Smalltalk's design and how
Knuth's books don't cover 2d/3d graphics.  That's all irrelevant
compared to the fundamentals which Knuth is expressing in his volumes.


>
> Regards,
> Jo


0
MarkHaniford (145)
2/12/2007 10:36:56 PM
Jon Harrop wrote:
> One gem of information that I gleaned from Okasaki's work is that you can
> recover much of the performance lost when using purely functional data
> structures by amortising costs. However, this often also loses you thread
> safety.

Just curious: How can a purely functional data structure have any probems
with thread safety?

Cheers
Ben
0
2/12/2007 10:52:13 PM
Paul Rubin wrote:
> Joachim Durchholz <jo@durchholz.org> writes:
>> No lazy data structures.
> 
> Not sure what this means: I've heard of lazy evaluation in
> higher-level languages, but lazy data structures?

Yes. Data structures that are evaluated only when needed. For example, the
stream of natural numbers (this session needs OCaml's -rectypes option,
since the stream's type is recursive):

# let rec succ n () = n, succ(n+1);;
val succ : int -> (unit -> int * 'a as 'a) = <fun>
# let nats = succ 0;;
val nats : unit -> int * 'a as 'a = <fun>
# let zero, rest = nats();;
val zero : int = 0
val rest : unit -> int * 'a as 'a = <fun>
# let one, rest = rest();;
val one : int = 1
val rest : unit -> int * 'a as 'a = <fun>

Lazy data structures are widely used in both pure and impure FPLs,
particularly in the context of parsing. OCaml even has a camlp4 syntax
extension to support lazy stream parsing, which is perfect for writing
simple recursive-descent parsers:

  http://www.ffconsultancy.com/free/ocaml/parsing.html

Lazy data structures are ideal for incremental algorithms that handle large
data structures.

>> I don't remember having read about 3D graphics.
>> Heck, not even 2D graphics. I learned about that from Smalltalk books.
> 
> Ehh again, specialized subtopic, though maybe my definition of
> "specialized subtopic" is colored by what's in TAOCP ;) (as a teenager
> I more or less lived inside those books).

Many data structures and algorithms are best explained in the context of
graphics. Any book on data structures or algorithms should at least touch
on graphics, not least because graphics is the single most effective way to
teach this stuff.

Multiresolution representations (e.g. wavelets) are ideally suited to
functional programming and are fundamental in many disciplines, including
important parts of graphics (e.g. MRI).

Octrees, BSPs and k-D trees underpin LOD algorithms and are used in many
applications that deal with heterogeneous distributions of objects in space
(even databases).

We have already reached the point where consumer GPUs are faster than CPUs.
So leveraging the GPU for general-purpose computation will become
increasingly important.

Implementing GUIs elegantly is one of the major headaches in software
engineering, as I understand it.

So I think graphics is a core topic.

>> For data structures in FPLs: Chris Okasaki, Purely Functional Data
>> Structures. It leaves room for improvement, but it's still
>> groundbreaking work.
> 
> Sounds interesting.

It was a PhD thesis and is quite hardcore. I wish there was a thorough book
on algorithms in SML/OCaml/F#. OCaml for Scientists has a chapter
discussing the most important of OCaml's built-in data structures (arrays,
lists, sets, hash tables, maps and trees) and some extra stuff on
scientific computing (e.g. unbalanced trees). F# for Scientists extends
this with a discussion of ASTs for symbolic manipulation.

But I still want a book that discusses all of the basic data structures
(strings, lists, lazy lists, arrays, sets, maps, hash tables, queues,
stacks, graphs, grab-bags and heaps) with both immutable and mutable
implementations as well as more sophisticated data structures (e.g. skip
lists, functional arrays, deques, tries, vertex/index arrays).

I could do this. Would anyone be interested? I'm erring on the side of a new
book on visualisation instead...

Has anyone read a book on data structures that explained how writing in
continuation-passing style lets you convert from hashtables to maps and
back transparently? I find that quite useful...

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 11:05:04 PM
Joachim Durchholz wrote:
> I do have a fixed list of recommendations, though each fills just a
> specific need, in no particular order:
> 
> For data structures in FPLs: Chris Okasaki, Purely Functional Data
> Structures. It leaves room for improvement, but it's still
> groundbreaking work.

I agree that this is a fantastically good book. Unfortunately it misses one
of the best purely functional data structures that were ever invented: Ralf
Hinze and Ross Paterson's 2-3 finger trees
(http://www.soi.city.ac.uk/~ross/papers/FingerTree.html). Not Okasaki's
fault, of course, 2-3 finger trees didn't exist at the time he wrote his
book.

Cheers
Ben
0
2/12/2007 11:08:18 PM
Paul Rubin wrote:
> For example, say you want to read 5 bytes from a network socket.  If
> the first byte is a 0, the next 4 bytes are an integer (don't worry
> about byte order); if the first byte is a 1, the next 4 bytes are a
> float.  How would you make one of those type_t's out of the 5 bytes?

Use pattern matching to dispatch over the first byte, collect the next four,
pass it to the appropriate conversion function and box the result in Ival
or Fval:

let int_of_bytes a b c d = a lor (b lsl 8) lor (c lsl 16) lor (d lsl 24)
let ival a b c d = Ival(int_of_bytes a b c d)
let fval a b c d = Fval(Int32.float_of_bits(Int32.of_int(int_of_bytes a b c d)))

let rec read = function
  | 0 :: a::b::c::d :: t -> ival a b c d :: read t
  | 1 :: a::b::c::d :: t -> fval a b c d :: read t
  | ...

This is exactly the kind of application where you'd probably use a lazy list
for the byte stream.
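
(With a base case for the empty list added at the elided "...", e.g.
read [0; 42; 0; 0; 0] would yield [Ival 42], read little-endian.)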

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/12/2007 11:09:55 PM
Joachim Durchholz wrote:
> > � �enum type_t {ival_t, fval_t }; # syntax probably wrong

Nicely self-referencing comment... ;-))

Ben
0
2/12/2007 11:21:59 PM
Jon Harrop <jon@ffconsultancy.com> writes:
> Purely functional data structures certainly tend to be slower for
> the worst case in the absence of concurrency. A map represented by a
> balanced binary tree can be 10x slower than a hash table.

It seems to me that in real-world computing there is always mutable
data at the bottom, i.e. any FP implementation is going to have a
runtime garbage collector busily trying to re-use memory cells in the
computer.

Maybe these functional data structures are often putting the
abstraction at the wrong place, and instead the abstractions should be
offered as language primitives like lists are, so the compiler can use
mutable structures in the very common cases where it can tell that the
object won't be shared.  

> Many GCs (e.g. OCaml's) treat arrays of pointers as atoms, traversing the
> whole array in one go. So the array of pointers in a big hash table causes
> the GC to stall the system, drastically worsening worst-case performance
> and rendering the program less suitable for soft real-time work.

That's just a matter of how the GC is implemented, and how arrays are
implemented.

> In the presence of concurrency, the need for expensive locks is
> minimal if you use purely functional data structures.

It pushes the requirements for locking down to the GC.

> This vastly simplifies concurrent code, which can be much more
> valuable than a factor of 2 in performance.

I think the Haskell STM stuff gets rid of the need for visible locks,
even for mutable structures.

> If the language supports it, the compiler and run-time can leverage
> knowledge about immutability to improve performance.

Certainly, any FPL compiler that didn't do that would be awful.

> I wrote an XMLRPC implementation in OCaml in only 30 LOC by leveraging
> pattern matching, higher-order functions and currying.
> 
> If you're talking about having hundreds of different types then you should
> write a compiler to generate the OCaml bindings for you. That is also easy
> in OCaml.

Maybe the poker guy should have done that.

> I wouldn't use OOP for that in OCaml. Also, any application that uses
> marshalling extensively is better suited to a language with type safe
> marshalling, e.g. HashCaml, MetaOCaml or F#.

Yes, that's precisely what the poker guy was trying to do in Haskell.

> [GADT's] Your typical implementation will look like this OCaml:
> 
> let int = function `Int n -> n | _ -> invalid_arg "int";;
> let bool = function `Bool b -> b | _ -> invalid_arg "bool";;
> let pair = function `Pair(a, b) -> a, b | _ -> invalid_arg "pair";;
> 
> let rec eval = function
>   | `Lit i -> `Int i
>   | `Inc f -> `Int(int(eval f) + 1)
>   | `IsZ f -> `Bool(int(eval f) = 0)
>   | `If(p, t, f) -> eval(if bool(eval p) then t else f)
>   | `Pair(a, b) -> `Pair(eval a, eval b)
>   | `Fst f -> fst(pair(eval f))
>   | `Snd f -> snd(pair(eval f));;

I'm confused by this; eval makes a tagged and boxed value, so the
Lisp equivalent would be (the int (eval f)) for Inc f, and so
forth.  "the" is a runtime type assertion that in some cases can
be optimized away by the compiler.

> In the absence of GADTs you can split your interpreter into specialised
> versions designed to return unboxed values for expressions 

Right, this is pretty messy and is maybe what the poker guy got
frustrated with.  So GADT's are the workaround that let you do things
more like dynamically typed languages, but maybe with less looseness
so I guess it's nicer.
0
phr.cx (5493)
2/12/2007 11:52:35 PM
Jon Harrop <jon@ffconsultancy.com> writes:
> let int_of_bytes a b c d = a lor (b lsl 8) lor (c lsl 16) lor (d lsl 24)
> let ival a b c d = Ival(int_of_bytes a b c d)
> let fval a b c d = Fval(Int32.float_of_bits(Int32.of_int(int_of_bytes a b c d)))

OK, thanks, this helps.  I'm not sure how the syntax indicates that
what's coming out is a type_t but I think I understand the example, so
this is pretty good.  I'll see if I can apply the concept to the poker
guy's Haskell/Erlang example.

Now I have to wonder how it is that the Haskell guys omitted this
feature for so long, if it's been in ML the whole time.
0
phr.cx (5493)
2/13/2007 12:14:45 AM
Paul Rubin wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> let int_of_bytes a b c d = a lor (b lsl 8) lor (c lsl 16) lor (d lsl 24)
>> let ival a b c d = Ival(int_of_bytes a b c d)
>> let fval a b c d = Fval(Int32.float_of_bits(Int32.of_int(int_of_bytes a b c d)))
> 
> OK, thanks, this helps.  I'm not sure how the syntax indicates that
> what's coming out is a type_t but I think I understand the example, so
> this is pretty good.

That is a very important point. In this case, we explicitly defined Ival and
Fval so the compiler knows the instant that it sees these constructors that
the result is of type type_t. Thanks to this unambiguous knowledge about
the sum type, static type checking buys you a lot in terms of robustness.
If you make the slightest mistake it is caught.

However, our sum type is not extensible. We could make it extensible by
switching to polymorphic variants:

let ival a b c d = `Ival(int_of_bytes a b c d)
let fval a b c d = `Fval(float_of_bits(ival a b c d))

Note the `.

Now the compiler infers the sum type and leaves it "open", meaning
extensible. However, inferring sum types quickly leads to obfuscated error
messages so you end up adding type annotations. Consequently, non-trivial
polymorphic variants are rare in real OCaml code. However, polymorphic
variants are widely used to implement overlapping sum types (e.g. the GL_
namespace in OpenGL) and simple sum types (as I did in my previous
postings).

For example, you could easily misspell a constructor `IVal instead of `Ival
and this might slip through type checking to give you a run-time error or
even incorrect semantics.
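
A short sketch of why (my example): the inferred sum type is open, so a
misspelt tag is just another tag:

let show = function
  | `Ival n -> string_of_int n
  | `Fval f -> string_of_float f

(* show : [< `Fval of float | `Ival of int ] -> string. A producer that
   builds `IVal 3 still typechecks on its own; if the consumer has a
   catch-all arm, the misspelling only surfaces at run time, if at all. *)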

> I'll see if I can apply the concept to the poker guy's Haskell/Erlang
> example. 

Judicious use of currying can help a lot.

> Now I have to wonder how it is that the Haskell guys omitted this
> feature for so long, if it's been in ML the whole time.

I'm sure this is in Haskell. It is just a question of recognising this
pattern for what it is (dynamic typing).

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 12:19:48 AM
Paul Rubin wrote:
>> Don't extrapolate something Joel said about Haskell to the whole of
>> type inference.
> 
> Well it's still standard practice in Haskell to declare most or all
> functions, even though the compiler can infer the types, because the
> declarations make things easier on humans and make it easier to locate
> errors once the compiler reports them.  I don't know whether those
> issues apply to ML.

First, Joel's problems had nothing to do with type /inference/ AFAIR. They
had something to do with static typing, as such, though. I think for what
he wanted to do, an untyped language such as Erlang is simply more
convenient.

Second, it may be true that "declarations [type signatures] ... make it
easier to locate errors once the compiler reports them." However, they can
also make refactoring harder. There are also situations where the code is
relatively simple but the most general type signature is quite complex
(because of many type class constraints). It's a trade-off. One commonly
used practice is to give type signatures only on module boundaries
(especially for libraries) which has the additional advantage that they make
life easier for auto-documentation tools like haddock. Sometimes a type
signature is necessary because of limitations of the inference algorithm
(polymorphic recursion comes to mind, as well as the monomorphism
restriction).
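
An OCaml analogue of that practice, as a hedged sketch (the module and its
contents are made up for illustration): the implementation carries no
annotations, while the signature documents the boundary:

module Dict : sig
  val find : string -> (string * int) list -> int option
end = struct
  (* no annotations needed here; the types are inferred *)
  let find k xs = try Some (List.assoc k xs) with Not_found -> None
end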

Cheers
Ben
0
2/13/2007 12:22:01 AM
MarkHaniford@gmail.com wrote:
> On Feb 12, 3:30 pm, Joachim Durchholz <j...@durchholz.org> wrote:
>> Limited selection of graph algorithm.
>> A lot of space for tape sorting - interesting but hardly relevant.
>> No lazy data structures.
>> Old set of algorithms for PRNGs.
>> No cryptography.
>> I don't remember having read about 3D graphics.
>> Heck, not even 2D graphics. I learned about that from Smalltalk books.
> 
> 3d graphics!? 2D graphics!?  Get a grip man.  Knuth can't write a
> volume on every subject under the sun.

Look at this quote from TAOCP:

"High-level languages are inadequate for discussing important low-level
details such as coroutine linkage, random number generation,
multi-precision arithmetic and many problems involving efficient usage of
memory."

Note that the information in TAOCP on every one of those topics is now
almost entirely redundant in the context of modern FPLs. Coroutines are
worthless without discussing tail recursion, exceptions, closures and
higher-order functions. His description of RNGs is apparently out of date.
MP remains of little relevance. Memory usage is quite different in FPLs and
is closely tied to the run-time and its representation.

As one of the few book authors on functional programming, I can tell you
that I have done my homework with regard to market research and my readers
don't give two hoots about tape drives but they love anything on graphics.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 12:25:10 AM
Jon Harrop wrote:
> Andreas Rossberg wrote:
>> Jon Harrop wrote:
>>> I don't know about Joel's project but, in this context, Haskell's type
>>> system is undecidable so it can require type annotations whereas ML's is
>>> decidable and does not require annotations.
>> 
>> Not sure what you mean by "in this context", but just to get this
>> straight: Haskell's type system is not undecidable. GHC allows you to
>> step into the realm of undecidable type checking by explicitly
>> requesting it via a compiler flag, but the average user should hardly
>> ever need that.
> 
> My mistake then. I thought that was the default in Haskell.

I think you two are talking about different things. Yes, Haskell's type
system is decidable by default. However, Haskell (by default, too) allows
things like polymorphic recursion which will give you a type error if you
forget the type signature. That is, Haskell's type inference is incomplete
(which is because type inference for polymorphic recursion is undecidable).

I hope I've got this right... ;-)

Ben
0
2/13/2007 12:30:55 AM
Paul Rubin wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> Purely functional data structures certainly tend to be slower for
>> the worst case in the absence of concurrency. A map represented by a
>> balanced binary tree can be 10x slower than a hash table.
> 
> It seems to me that in real-world computing there is always mutable
> data at the bottom, i.e. any FP implementation is going to have a
> runtime garbage collector busily trying to re-use memory cells in the
> computer.

Absolutely. The FPL compiler takes care of producing correct, mutable data
structures.

> Maybe these functional data structures are often putting the
> abstraction at the wrong place, and instead the abstractions should be
> offered as language primitives like lists are, so the compiler can use
> mutable structures in the very common cases where it can tell that the
> object won't be shared.

That is exactly what modern FPL compilers do and that is why OCaml can be as
fast as C++.

>> Many GCs (e.g. OCaml's) treat arrays of pointers as atoms, traversing the
>> whole array in one go. So the array of pointers in a big hash table
>> causes the GC to stall the system, drastically worsening worst-case
>> performance and rendering the program less suitable for soft real-time
>> work.
> 
> That's just a matter of how the GC is implemented, and how arrays are
> implemented.

Yes, but there are important trade-offs involved that drastically affect
performance. This is a hugely complicated subject and, for high-performance
programming, you must be aware of the caveats of your chosen
implementation.

>> In the presence of concurrency, the need for expensive locks is
>> minimal if you use purely functional data structures.
> 
> It pushes the requirements for locking down to the GC.

Exactly. Writing the GC is then the hard part, which is exactly the way we
want it!

>> This vastly simplifies concurrent code, which can be much more
>> valuable than a factor of 2 in performance.
> 
> I think the Haskell STM stuff gets rid of the need for visible locks,
> even for mutable structures.

Impressive.

>> I wouldn't use OOP for that in OCaml. Also, any application that uses
>> marshalling extensively is better suited to a language with type safe
>> marshalling, e.g. HashCaml, MetaOCaml or F#.
> 
> Yes, that's precisely what the poker guy was trying to do in Haskell.

In OCaml, you should really write custom string_of_foo functions for every
type that passes through your interface. You can factor them into HOFs, of
course, so it generally isn't too much work.
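
A minimal sketch of that factoring (the names are assumed for
illustration): each combinator takes printers for the parts and returns a
printer for the whole:

let string_of_pair f g (a, b) = "(" ^ f a ^ ", " ^ g b ^ ")"
let string_of_list f xs = "[" ^ String.concat "; " (List.map f xs) ^ "]"

(* usage: *)
let s =
  string_of_list (string_of_pair string_of_int string_of_float)
    [(1, 2.0); (3, 4.5)]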

>> [GADT's] Your typical implementation will look like this OCaml:
>> 
>> let int = function `Int n -> n | _ -> invalid_arg "int";;
>> let bool = function `Bool b -> b | _ -> invalid_arg "bool";;
>> let pair = function `Pair(a, b) -> a, b | _ -> invalid_arg "pair";;
>> 
>> let rec eval = function
>>   | `Lit i -> `Int i
>>   | `Inc f -> `Int(int(eval f) + 1)
>>   | `IsZ f -> `Bool(int(eval f) = 0)
>>   | `If(p, t, f) -> eval(if bool(eval p) then t else f)
>>   | `Pair(a, b) -> `Pair(eval a, eval b)
>>   | `Fst f -> fst(pair(eval f))
>>   | `Snd f -> snd(pair(eval f));;
> 
> I'm confused by this; eval makes a tagged and boxed value, so the
> Lisp equivalent would be (the int (eval f)) for Inc f, and so
> forth.  "the" is a runtime type assertion that in some cases can
> be optimized away by the compiler.

Exactly. That is no good. It can always be optimised away in this case but
neither Lisp nor OCaml are expressive enough to convey that information to
the compiler, so they end up boxing/unboxing every single subexpression
evaluation.

>> In the absence of GADTs you can split your interpreter into specialised
>> versions designed to return unboxed values for expressions
> 
> Right, this is pretty messy and is maybe what the poker guy got
> frustrated with.  So GADT's are the workaround that let you do things
> more like dynamically typed languages, but maybe with less looseness
> so I guess it's nicer.

No, you can't even begin to do that in a dynamically typed language - it
will just box/unbox everything.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 12:40:21 AM
Benjamin Franksen wrote:
> I think you two are talking about different things. Yes, Haskell's type
> system is decidable by default. However, Haskell (by default, too) allows
> things like polymorphic recursion which will give you a type error if you
> forget the type signature. That is, Haskell's type inference is incomplete
> (which is because type inference for polymorphic recursion is
> undecidable).
> 
> I hope I've got this right... ;-)

That's exactly what I thought. So you can use polymorphic recursion in your
program but Haskell can't infer the type, so you are forced to annotate.
Then in what sense is Haskell's typing decidable?

Note that I'm a physicist and not a type theorist!

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 12:42:49 AM
Paul Rubin wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> Purely functional data structures certainly tend to be slower for
>> the worst case in the absence of concurrency. A map represented by a
>> balanced binary tree can be 10x slower than a hash table.
> 
> It seems to me that in real-world computing there is always mutable
> data at the bottom, i.e. any FP implementation is going to have a
> runtime garbage collector busily trying to re-use memory cells in the
> computer.
> 
> Maybe these functional data structures are often putting the
> abstraction at the wrong place, and instead the abstractions should be
> offered as language primitives like lists are, so the compiler can use
> mutable structures in the very common cases where it can tell that the
> object won't be shared.

It may be a common case that an object isn't shared, although I'd guess this
is rather /because/ most languages aren't purely functional. 

It is something completely different to say that such a case can easily be
told by the compiler (I doubt it).

Purely functional data structures have many advantages, one of them being
persistence, i.e. you can quite easily store a complete 'history' of a data
structure -- think undo/redo in an interactive editor -- /without/ making a
complete copy every time it 'changes'.
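
A small sketch of that with the stdlib's persistent maps: every "update"
returns a new version, old versions stay valid, and the versions share
structure, so the history below costs far less than three copies:

module M = Map.Make(String)

let v0 = M.empty
let v1 = M.add "x" 1 v0
let v2 = M.add "y" 2 v1
let history = [v2; v1; v0]   (* all three versions remain usable *)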

>> In the presence of concurrency, the need for expensive locks is
>> minimal if you use purely functional data structures.
> 
> It pushes the requirements for locking down to the GC.

No, this has nothing to do with GC. In a purely functional language
you /cannot/ 'accidentally' (in the sense of: not understanding that you
violate an invariant) modify something. Thus, data structures can be shared
quite liberally, even between threads; it is just that you don't notice it
because the sharing is never observable in the form of differing values.

>> This vastly simplifies concurrent code, which can be much more
>> valuable than a factor of 2 in performance.
> 
> I think the Haskell STM stuff gets rid of the need for visible locks,
> even for mutable structures.

And STM works so well in Haskell thanks to immutability of all values.

>> If the language supports it, the compiler and run-time can leverage
>> knowledge about immutability to improve performance.
> 
> Certainly, any FPL compiler that didn't do that would be awful.

I think this is quite hard in the presence of side effects if the latter are
not clearly identifiable via a type distinction.

>> I wrote an XMLRPC implementation in OCaml in only 30 LOC by leveraging
>> pattern matching, higher-order functions and currying.
>> 
>> If you're talking about having hundreds of different types then you
>> should write a compiler to generate the OCaml bindings for you. That is
>> also easy in OCaml.
> 
> Maybe the poker guy should have done that.

It is indeed a possibility. Note however that static typing was only one of
Joel's problems.

>> I wouldn't use OOP for that in OCaml. Also, any application that uses
>> marshalling extensively is better suited to a language with type safe
>> marshalling, e.g. HashCaml, MetaOCaml or F#.
> 
> Yes, that's precisely what the poker guy was trying to do in Haskell.

And nowadays we have Data.Binary and Data.ByteString, which would have been
a tremendous help, had it existed then.

>> [GADT's] Your typical implementation will look like this OCaml:
>> 
>> let int = function `Int n -> n | _ -> invalid_arg "int";;
>> let bool = function `Bool b -> b | _ -> invalid_arg "bool";;
>> let pair = function `Pair(a, b) -> a, b | _ -> invalid_arg "pair";;
>> 
>> let rec eval = function
>>   | `Lit i -> `Int i
>>   | `Inc f -> `Int(int(eval f) + 1)
>>   | `IsZ f -> `Bool(int(eval f) = 0)
>>   | `If(p, t, f) -> eval(if bool(eval p) then t else f)
>>   | `Pair(a, b) -> `Pair(eval a, eval b)
>>   | `Fst f -> fst(pair(eval f))
>>   | `Snd f -> snd(pair(eval f));;
> 
> I'm confused by this; eval makes a tagged and boxed value, so the
> Lisp equivalent would be (the int (eval f)) for Inc f, and so
> forth.  "the" is a runtime type assertion that in some cases can
> be optimized away by the compiler.

I doubt there is any lisp dialect which can distinguish (and propagate)
function types at runtime -- you'd need a (potentially) infinite number of
tags to do this...

>> In the absence of GADTs you can split your interpreter into specialised
>> versions designed to return unboxed values for expressions
> 
> Right, this is pretty messy and is maybe what the poker guy got
> frustrated with.  So GADT's are the workaround that let you do things
> more like dynamically typed languages, but maybe with less looseness
> so I guess it's nicer.

No, no, a GADT ensures /more/ static restrictions than a regular ADT, not
less. It says that certain constructors of the data type must
deliver /specialised/ versions of the general type scheme of the data type.
It is these extra restrictions that allow the definition of the 'eval'
function to rest assured that e.g. the result of the 'Inc' constructor is
a 'Term Int' and not just any old 'Term a'. With an ordinary data type this
would have to be checked at runtime.
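
A sketch of such a Term type, written in the GADT syntax of later OCaml
releases (assumed here purely as an illustration): each constructor fixes
the type parameter, so eval can return a plain, unboxed a:

type _ term =
  | Lit : int -> int term
  | Inc : int term -> int term
  | IsZ : int term -> bool term
  | If : bool term * 'a term * 'a term -> 'a term
  | Pair : 'a term * 'b term -> ('a * 'b) term
  | Fst : ('a * 'b) term -> 'a term
  | Snd : ('a * 'b) term -> 'b term

let rec eval : type a. a term -> a = function
  | Lit i -> i
  | Inc t -> eval t + 1
  | IsZ t -> eval t = 0
  | If (p, t, f) -> if eval p then eval t else eval f
  | Pair (a, b) -> (eval a, eval b)
  | Fst t -> fst (eval t)
  | Snd t -> snd (eval t)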

Cheers
Ben
0
2/13/2007 1:12:08 AM
Benjamin Franksen wrote:
> Jon Harrop wrote:
>> One gem of information that I gleaned from Okasaki's work is that you can
>> recover much of the performance lost when using purely functional data
>> structures by amortising costs. However, this often also loses you thread
>> safety.
> 
> Just curious: How can a purely functional data structure have any problems
> with thread safety?

Good question!

If the API makes the data structure appear purely functional by not
providing mutation but the underlying implementation uses mutation then it
can break thread safety. Splay trees are the obvious example. They reorder
themselves transparently and in-place when you use "find".

It is normally the allocation/GC that kills performance. Amortising
operations can greatly reduce this cost but often requires a mini mutable
stack or equivalent internal data structure to remember the operations
being amortised. To avoid allocation when superseding, these are often
mutable.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 1:17:42 AM
Jon Harrop wrote:
> Paul Rubin wrote:
>> Now I have to wonder how it is that the Haskell guys omitted this
>> feature for so long, if it's been in ML the whole time.
> 
> I'm sure this is in Haskell. It is just a question of recognising this
> pattern for what it is (dynamic typing).

Hm, I don't think polymorphic variants are supported directly in Haskell. I
remember reading some papers that proposed extensions to that effect. You
could probably make up something using advanced type class hackery, but it
wouldn't be very intuitive or idiomatic.

GHC (at least) has a type Dynamic, though, which may be what you meant; it
lets you pack data into a generic container and later dispatch on the
actual type of the data contained. However, this involves tagging, of
course (though nicely hidden under the surface) and also excludes
functions.
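
To see the tagging concretely, here is a hand-rolled OCaml sketch of the
same idea (Data.Dynamic does this generically; the two cases here are
assumed just for illustration):

type dyn = DInt of int | DStr of string

let describe = function
  | DInt n -> "an int: " ^ string_of_int n
  | DStr s -> "a string: " ^ s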

Cheers
Ben
0
2/13/2007 1:27:58 AM
On Feb 12, 6:25 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> MarkHanif...@gmail.com wrote:
> > On Feb 12, 3:30 pm, Joachim Durchholz <j...@durchholz.org> wrote:
> >> Limited selection of graph algorithm.
> >> A lot of space for tape sorting - interesting but hardly relevant.
> >> No lazy data structures.
> >> Old set of algorithms for PRNGs.
> >> No cryptography.
> >> I don't remember having read about 3D graphics.
> >> Heck, not even 2D graphics. I learned about that from Smalltalk books.
>
> > 3d graphics!? 2D graphics!?  Get a grip man.  Knuth can't write a
> > volume on every subject under the sun.
>
> Look at this quote from TAOCP:
>
> "High-level languages are inadequate for discussing important low-level
> details such as coroutine linkage, random number generation,
> multi-precision arithmetic and many problems involving efficient usage of
> memory."
>
> Note that the information in TAOCP on every one of those topics is now
> almost entirely redundant in the context of modern FPLs.

TAOCP transcends functional programming; its volumes are not supposed to be
functional programming books.

> Coroutines are
> worthless without discussing tail recursion, exceptions, closures and
> higher-order functions. His description of RNGs is apparently out of date.
> MP remains of little relevance. Memory usage is quite different in FPLs and
> is closely tied to the run-time and its representation.
>
> As one of the few book authors on functional programming,

Few authors?  I don't think so.

> I can tell you
> that I have done my homework with regard to market research and my readers
> don't give two hoots about tape drives but they love anything on graphics.
>

Haha, I guess Knuth should do a poll and write a book on underwater
basket weaving if the responders want it.  Foley, van Dam, and others
can handle graphics just fine.



> --
> Dr Jon D Harrop, Flying Frog Consultancy
> OCaml for Scientists
> http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html...


0
MarkHaniford (145)
2/13/2007 1:30:43 AM
Jon Harrop wrote:
> Benjamin Franksen wrote:
>> I think you two are talking about different things. Yes, Haskell's type
>> system is decidable by default. However, Haskell (by default, too) allows
>> things like polymorphic recursion which will give you a type error if you
>> forget the type signature. That is, Haskell's type inference is
>> incomplete (which is because type inference for polymorphic recursion is
>> undecidable).
>> 
>> I hope I've got this right... ;-)
> 
> That's exactly what I thought. So you can use polymorphic recursion in
> your program but Haskell can't decide the type so you are forced to
> annotate. Then in what sense is Haskell's typing decidable?
> 
> Note that I'm a physicist and not a type theorist!

I'm not a type theorist either, but from what I've read so far, a type
system is usually called 'decidable' resp. 'undecidable' according to
whether type /checking/ is decidable (i.e. assuming everything is annotated
with a
signature). You could say this terminology is merely a convention (and it
is).

In Haskell98 type checking is decidable, i.e. the type checker always
terminates. Type checking can become undecidable if the type system is so
strong that it becomes turing complete. Dependent type systems have a
tendency to become undecidable, if they allow general recursion. See, e.g.
Qi, or Cayenne, or even C++. Haskell's type system gets undecidable if you
lift certain restrictions for class instance declaration; the ghc compiler
switch is named (guess what) -fallow-undecidable-instances ;-)

Cheers
Ben
0
2/13/2007 1:46:08 AM
Benjamin Franksen wrote:
>>> In the presence of concurrency, the need for expensive locks is
>>> minimal if you use purely functional data structures.
>> 
>> It pushes the requirements for locking down to the GC.
> 
> No, this has nothing to do with GC. In a purely functional language
> you /cannot/ 'accidentally' (in the sense of: not understanding that you
> violate an invariant) modify something. Thus, data structures can be
> shared quite liberally, even between threads; it is just that you don't
> notice it because the sharing is never observable in the form of differing
> values.

So the application programmer doesn't have to take out locks but the GC
programmer must, if the GC is to be concurrent. I think that is what Paul
meant.

>> Right, this is pretty messy and is maybe what the poker guy got
>> frustrated with.  So GADT's are the workaround that let you do things
>> more like dynamically typed languages, but maybe with less looseness
>> so I guess it's nicer.
> 
> No, no, a GADT ensures /more/ static restrictions than a regular ADT, not
> less. It says that certain constructors of the data type must
> deliver /specialised/ versions of the general type scheme of the data
> type. It is these extra restrictions that allow the definition of the
> 'eval' function to rest assured that e.g. the result of the 'Inc'
> constructor is a 'Term Int' and not just any old 'Term a'. With a ordinary
> data type this would have to be checked at runtime.

Exactly. I think Paul's confusion arises from the fact that you don't
explicitly box in dynamically typed languages, it is forced upon you behind
your back. This is why succinct ML code is often fast ML code...

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 1:59:11 AM
Paul Rubin wrote:
> Tony Finch <dot@dotat.at> writes:
>> TAOCP, even the updated editions, is horribly old-fashioned. Knuth's
>> arguments for using a made-up assembly language instead of a made-up
>> programming language are entirely bogus. You're much better off with
>> Cormen, Leiserson & Rivest, plus Okasaki if you want to get into
>> hard-core FP.
> 
> Ehh, ok, maybe I'm a fuddy duddy then.  I do know about CL&R and have
> thought I might splurge for a copy sometime, since it has a lot more
> about graph algorithms than Knuth vols 1-3 (the forthcoming Knuth
> vol. 4 will cover them).  I just googled and found a blurb for
> Okasaki's book (hadn't heard of it before) but am not sure what to
> think--should I care about that stuff (for example to support
> parallelism) or just use imperative constructions (e.g. with the array
> monads in Haskell)?

I think the best would be to postpone reading about purely functional data
structures and simply use the ones in the standard libraries. Well, at
least until you feel a bit more familiar with statically typed functional
programming. Okasaki's book is very much concerned with proving amortized
bounds for certain operations, in fact he invented the first practical
method to do so for persistent functional data structures. In practice this
is often (but not always) not the most relevant issue; for instance,
constant factors may -- if large enough -- be more relevant in practice
than asymptotic bounds.

Writing 'imperative' code in Haskell is possible using monads, but I'd limit
it to where necessary. For instance, Data.Sequence (new in ghc6.6) is a
really good sequence implementation; of course it doesn't give constant
time random access (random access is logarithmic in the distance to one of
the end points), but this is not very often necessary. And arrays are bad
for appending or merging operations.

Cheers
Ben
0
2/13/2007 2:00:28 AM
Chris F Clark wrote:
> I think the argument is whether one should trust one's readers.  I
> trust the compiler perfectly well.  However, I find that the next
> maintainer of any program I write is likely to be confused if I don't
> over parenthesize.  Now, perhaps these are just the confessions of an
> "imperative programmer".  Perhaps, I would feel differently if I wrote
> a lot of FP and had other FP programmers as my readers.  However, in
> my normal world, I find that one can rarely make things too simple and
> too explicit.
> 
> I, myself, found the superfluous parenthesis made reading the line in
> question simpler.  However, I am neither a native nor a fluent ML
> speaker.  I'm not even a native lisp speaker either, and I didn't find
> the lisp-like placement of the parenthesis distracting.

In *ML and Haskell, there is a very simple rule: (prefix) function
application ("fun arg1 arg2") binds most tightly and associates to the
left. This is quite easy to get used to and I have never seen Haskell code
that uses extra parentheses as in the given example.
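
For instance, in this tiny made-up example no parentheses are needed at
all:

let f x y = x * y
let g x = x + 1
let r = f 2 3 + g 4    (* parses as ((f 2) 3) + (g 4) = 11 *)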

I agree, however, that the whole issue is superficial; and that it often
makes sense to be a bit more explicit than would be necessary, if only for
the benefit of the reader. This is especially true for Haskell, where
operator binding precedence can be defined (almost) ad libitum.

Cheers
Ben
0
2/13/2007 2:11:36 AM
Benjamin Franksen <benjamin.franksen@bessy.de> writes:
> It may be a common case that an object isn't shared, although I'd guess this
> is rather /because/ most languages aren't purely functional. 

Think of the obvious "f(g(x))" in your favorite language; the
intermediate result g(x) is not shared, and this is a very common
construction.

> Purely functional data structures have many advantages, one of them
> being persistence, i.e. you can quite easily store a complete
> 'history' of a data structure -- think undo/redo in an interactive
> editor -- /without/ making a complete copy every time it 'changes'.

This is sometimes useful, sometimes not, and (maybe) sometimes imposes
too much cost.

> > It pushes the requirements for locking down to the GC.
> No, this has nothing to do with GC. ...  data structures can be shared
> quite liberally, even between threads

What I mean is that underneath, the GC is frantically relocating data
in the computer's memory, and might be multi-threaded itself, so
there's still mutation and locking going on.  It's just concealed from
the high-level abstract layer, maybe at the cost of a lot of low level
locking and thrashing instead of a much smaller amount of high level
data mutation.  Presumably Okasaki's work analyzes some of these cost
tradeoffs.

> I doubt there is any lisp dialect which can distinguish (and
> propagate) function types at runtime -- you'd need a (potentially)
> infinite number of tags to do this...

Right.  You'd just manually assign tags to the types you cared about.

> No, no, a GADT ensures /more/ static restrictions than a regular ADT, not
> less. It says that certain constructors of the data type must
> deliver /specialised/ versions of the general type scheme of the data type.

Now I'm confused again--it's not making a boxed object with a type tag
interpreted by special hair in the runtime?
0
phr.cx (5493)
2/13/2007 2:23:54 AM
Jon Harrop wrote:
> Benjamin Franksen wrote:
>> Jon Harrop wrote:
>>> One gem of information that I gleaned from Okasaki's work is that you
>>> can recover much of the performance lost when using purely functional
>>> data structures by amortising costs. However, this often also loses you
>>> thread safety.
>> 
>> Just curious: How can a purely functional data structure have any problems
>> with thread safety?
> 
> Good question!
> 
> If the API makes the data structure appear purely functional by not
> providing mutation but the underlying implementation uses mutation then it
> can break thread safety. Splay trees are the obvious example. They reorder
> themselves transparently and in-place when you use "find".

Hmm, yes. The internal implementation using mutable structures would have to
be thread-safe, too. And now I think I understand your (and Paul's)
comments w.r.t. GC. Yes, the GC would have to be aware of multiple threads
and do some (hopefully small) amount of locking... at least if the
(internally used) mutable state /can be/ shared between threads. /If/ the
code internally uses the mutable state only in a single threaded way (which
can be statically guaranteed by using the ST monad in Haskell) then this
wouldn't be a problem, right?

> It is normally the allocation/GC that kills performance. Amortising
> operations can greatly reduce this cost but often requires a mini mutable
> stack or equivalent internal data structure to remember the operations
> being amortised. To avoid allocation when superceding, these are often
> mutable.

But what Okasaki shows in his book is that you can use laziness instead of
mutability. Laziness (if used in the right way) allows the (purely
functional) implementation to postpone costly operations and to perform
them incrementally 'on demand'. You need a special sort of cost analysis to
show the amortized time bounds even in case of persistent use, but
developing and applying such an analysis is exactly the point of the book
(as I understood it).
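
For reference, a minimal sketch of the strict starting point (the classic
batched queue, not Okasaki's lazy refinement): the occasional List.rev is
the costly operation whose price the amortised analysis spreads out;
laziness is what then keeps the bounds valid under persistent use:

type 'a queue = { front : 'a list; rear : 'a list }

let empty = { front = []; rear = [] }
let enqueue q x = { q with rear = x :: q.rear }

let rec dequeue q = match q with
  | { front = x :: f; rear = r } -> Some (x, { front = f; rear = r })
  | { front = []; rear = [] }    -> None
  | { front = []; rear = r }     -> dequeue { front = List.rev r; rear = [] }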

Of course it may be that a lazy evaluation strategy places additional
demands on the GC...

Cheers
Ben
0
2/13/2007 2:28:13 AM
MarkHaniford@gmail.com wrote:
>> Note that the information in TAOCP on every one of those topics is now
>> almost entirely redundant in the context of modern FPLs.
> 
> TAOCP transcends functional programming

No more than a book on how to use an abacus "transcends" FP.

> and are not supposed to be functional programming books.

This forum and thread are about FP. TAOCP might still be relevant for
embedded assembly language programmers but that discussion is for another
newsgroup.

>> I can tell you
>> that I have done by homework with regard to market research and my
>> readers don't give two hoots about tape drives but they love anything on
>> graphics.
> 
> Haha, I guess Knuth should do a poll and write a book on underwater
> basket weaving if the responders want it.  Foley, van Dam, and others
> can handle graphics just fine.

You claimed that TAOCP is still relevant, yet people no longer value its
contents.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 2:29:59 AM
Jon Harrop wrote:
> Benjamin Franksen wrote:
>>>> In the presence of concurrency, the need for expensive locks is
>>>> minimal if you use purely functional data structures.
>>> 
>>> It pushes the requirements for locking down to the GC.
>> 
>> No, this has nothing to do with GC. In a purely functional language
>> you /cannot/ 'accidentally' (in the sense of: not understanding that you
>> violate an invariant) modify something. Thus, data structures can be
>> shared quite liberally, even between threads; it is just that you don't
>> notice it because the sharing is never observable in the form of
>> differing values.
> 
> So the application programmer doesn't have to take out locks but the GC
> programmer must, if the GC is to be concurrent. I think that is what Paul
> meant.

Yes, right. I didn't understand this at first.

Cheers
Ben
0
2/13/2007 2:30:40 AM
Benjamin Franksen wrote:
> Hmm, yes. The internal implementation using mutable structures would have
> to be thread-safe, too. And now I think I understand your (and Paul's)
> comments w.r.t. GC. Yes, the GC would have to be aware of multiple threads
> and do some (hopefully small) amount of locking... at least if the
> (internally used) mutable state /can be/ shared between threads. /If/ the
> code internally uses the mutable state only in a single threaded way
> (which can be statically guaranteed by using the ST monad in Haskell) then
> this wouldn't be a problem, right?

In theory, I believe so. In practice, concurrent GC still cripples
performance in real implementations. Look at Java and C#.

I've been benchmarking F# recently and it is much slower than OCaml on
allocation-intensive code because the .NET GC is very slow for this usage
compared to the OCaml GC. Now, this is partly because the .NET GC does not
optimise for rapid recycling (a value-lifetime distribution typical of
FPLs) so it isn't a fair comparison in that sense. But F# can use 1.5 CPUs
(thanks to concurrent GC) and yet it is still 3x slower in real time for
many programs.

I should also note that F# can be faster than OCaml on C#-like code, e.g.
float array operations. These play to the advantages of a run-time that has
been heavily tuned for typical C# code. In OCaml, mutation forces minor GC
collection, IIRC.

>> It is normally the allocation/GC that kills performance. Amortising
>> operations can greatly reduce this cost but often requires a mini mutable
>> stack or equivalent internal data structure to remember the operations
>> being amortised. To avoid allocation when superceding, these are often
>> mutable.
> 
> But what Okasaki shows in his book is that you can use laziness instead of
> mutability. Laziness (if used in the right way) allows the (purely
> functional) implementation to postpone costly operations and to perform
> them incrementally 'on demand'. You need a special sort of cost analysis
> to show the amortized time bounds even in case of persistent use, but
> developing and applying such an analysis is exactly the point of the book
> (as I understood it).

Yes.

> Of course it may be that a lazy evaluation strategy places additional
> demands on the GC...

If it is purely functional then I think it must add to the GCs burden. How
can it accumulate information without allocating or mutating?

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 2:39:26 AM
Benjamin Franksen wrote:
> I agree, however, that the whole issue is superficial; and that it often
> makes sense to be a bit more explicit than would be necessary, if only for
> the benefit of the reader. This is especially true for Haskell, where
> operator binding precedence can be defined (almost) ad libitum.

Can I just note that Chris had parenthesised expressions like:

  (1+(2*3))

I mean, ((f a) b) is bad but the above is ordinary maths!

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 2:43:45 AM
Paul Rubin wrote:
> Benjamin Franksen <benjamin.franksen@bessy.de> writes:
>> It may be a common case that an object isn't shared, although I'd guess
>> this is rather /because/ most languages aren't purely functional.
> 
> Think of the obvious "f(g(x))" in your favorite language; the
> intermediate result g(x) is not shared, and this is a very common
> construction.

Ok, agreed.

>> Purely functional data structures have many advantages, one of them
>> being persistence, i.e. you can quite easily store a complete
>> 'history' of a data structure -- think undo/redo in an interactive
>> editor -- /without/ making a complete copy every time it 'changes'.
> 
> This is sometimes useful, sometimes not, and (maybe) sometimes imposes
> too much cost.

There may be cases where the cost is prohibitive but I'd guess they are
rare. Take a compiler, for instance. You need lots of -- sometimes large --
tables assigning semantic data to names with efficient lookup, insert, and
update. GHC uses a balanced binary tree implementation for finite maps for
those tables. Seems to work fine.

>> > It pushes the requirements for locking down to the GC.
>> No, this has nothing to do with GC. ...  data structures can be shared
>> quite liberally, even between threads
> 
> What I mean is that underneath, the GC is frantically relocating data
> in the computer's memory, and might be multi-threaded itself, so
> there's still mutation and locking going on.  It's just concealed from
> the high-level abstract layer, maybe at the cost of a lot of low level
> locking and thrashing instead of a much smaller amount of high level
> data mutation.  Presumably Okasaki's work analyzes some of these cost
> tradeoffs.

Yes, I didn't understand this at first. You are right, the demands on the GC
are high. And no, Okasaki's work is not concerned with GC costs, at least
from what I remember.

>> I doubt there is any lisp dialect which can distinguish (and
>> propagate) function types at runtime -- you'd need a (potentially)
>> infinite number of tags to do this...
> 
> Right.  You'd just manually assign tags to the types you cared about.

Ok, you can do that any time, of course.

>> No, no, a GADT ensures /more/ static restrictions than a regular ADT, not
>> less. It says that certain constructors of the data type must
>> deliver /specialised/ versions of the general type scheme of the data
>> type.
> 
> Now I'm confused again--it's not making a boxed object with a type tag
> interpreted by special hair in the runtime?

No it is not, it is the opposite. Does the Curry-Howard isomorphism tell you
anything? Types are propositions and programs are proofs. A well-typed
program dispatching on a GADT is a (machine checkable) /proof/ that certain
combinations of values simply cannot occur. The compiler checks this proof
(which is called type checking) and thus 'knows' that it can optimize away
certain tags. The assurance the compiler (and also the programmer!) gets is
in principle the same as with simple types such as Int vs. Char: You cannot
add an Int to a Char, because the type of the function add (Int -> Char)
says so and the compiler checks that the definition of add as well as its
usage never violates this proposition. The GADT is just a more powerful
version of writing down such a proposition, because the type of the result
can depend on the type of input provided. (It is of course much harder to
design such a type checker; GADTs are a very novel concept.)

Cheers
Ben
0
2/13/2007 3:00:27 AM
Joachim Durchholz wrote:
{stuff deleted}
> A lot of space for tape sorting - interesting but hardly relevant.
These days, because of seek times, a disk might as well be treated like a
tape drive that can do random access very slowly. So I wouldn't say
algorithms that assume a tape-like storage model are irrelevant these days.
0
danwang74 (207)
2/13/2007 3:11:32 AM
Jon Harrop wrote:
> Benjamin Franksen wrote:
>> Hmm, yes. The internal implementation using mutable structures would have
>> to be thread-safe, too. And now I think I understand your (and Paul's)
>> comments w.r.t. GC. Yes, the GC would have to be aware of multiple
>> threads and do some (hopefully small) amount of locking... at least if
>> the (internally used) mutable state /can be/ shared between threads. /If/
>> the code internally uses the mutable state only in a single threaded way
>> (which can be statically guaranteed by using the ST monad in Haskell)
>> then this wouldn't be a problem, right?
> 
> In theory, I believe so. In practice, concurrent GC still cripples
> performance in real implementations. Look at Java and C#.
> 
> I've been benchmarking F# recently and it is much slower than OCaml on
> allocation-intensive code because the .NET GC is very slow for this usage
> compared to the OCaml GC. Now, this is partly because the .NET GC does not
> optimise for rapid recycling (a value-lifetime distribution typical of
> FPLs) so it isn't a fair comparison in that sense. But F# can use 1.5 CPUs
> (thanks to concurrent GC) and yet it is still 3x slower in real time for
> many programs.
> 
> I should also note that F# can be faster than OCaml on C#-like code, e.g.
> float array operations. These play to the advantages of a run-time that
> has been heavily tuned for typical C# code. In OCaml, mutation forces
> minor GC collection, IIRC.

What kind of benchmark do you use?

Take a look at
http://shootout.alioth.debian.org/debian/benchmark.php?test=message&lang=all

GHC Haskell is one of the best implementations if it comes to fast
concurrency, GC cost notwithstanding. I've heard though that there are
situations where GHC's GC does become a bottleneck, and it seems that these
situations mostly involve large amounts of mutable data structures.

>>> It is normally the allocation/GC that kills performance. Amortising
>>> operations can greatly reduce this cost but often requires a mini
>>> mutable stack or equivalent internal data structure to remember the
>>> operations being amortised. To avoid allocation when superceding, these
>>> are often mutable.
>> 
>> But what Okasaki shows in his book is that you can use laziness instead
>> of mutability. Laziness (if used in the right way) allows the (purely
>> functional) implementation to postpone costly operations and to perform
>> them incrementally 'on demand'. You need a special sort of cost analysis
>> to show the amortized time bounds even in case of persistent use, but
>> developing and applying such an analysis is exactly the point of the book
>> (as I understood it).
> 
> Yes.
> 
>> Of course it may be that a lazy evaluation strategy places additional
>> demands on the GC...
> 
> If it is purely functional then I think it must add to the GCs burden. How
> can it accumulate information without allocating or mutating?

Unfortunately I am not nearly expert enough to answer this in any detail.
Drop by on @haskell-cafe or @ghc-users. There are enough of the cracks
there who will answer almost any question...

Cheers
Ben
0
2/13/2007 3:17:01 AM
Paul Rubin wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> Purely functional data structures certainly tend to be slower for
>> the worst case in the absence of concurrency. A map represented by a
>> balanced binary tree can be 10x slower than a hash table.
> 
> It seems to me that in real-world computing there is always mutable
> data at the bottom, i.e. any FP implementation is going to have a
> runtime garbage collector busily trying to re-use memory cells in the
> computer.

Many file systems these days manage disk blocks the way a functional
program manages memory. If you want to support easy atomic actions with
low-cost cleanup, the functional style is actually better.
0
danwang74 (207)
2/13/2007 3:19:39 AM
Daniel C. Wang wrote:
> Joachim Durchholz wrote:
>> A lot of space for tape sorting - interesting but hardly relevant.
> These days a disk because of seek times might as well be treated like a
> tape drive which can do random access very slowly.

The controller abstracts away seeking and sweeps the disc to do accesses
from multiple programs efficiently, so seek time is not really comparable
on HDs and tapes. Even so, this topic is very specialised and will not be
of interest to most programmers (unlike graphics ;-).

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 4:06:28 AM
Paul Rubin wrote:
> Benjamin Franksen <benjamin.franksen@bessy.de> writes:
>> It may be a common case that an object isn't shared, although I'd guess
>> this is rather /because/ most languages aren't purely functional.
> 
> Think of the obvious "f(g(x))" in your favorite language; the
> intermediate result g(x) is not shared, and this is a very common
> construction.

I assume you mean that the return value of g(x) can only be accessed from
one thread?

I think that is not true. For example, g(x) could return the same
(referentially transparent) purely functional data structure when called
from two different threads, e.g. if it uses hash consing.

>> Purely functional data structures have many advantages, one of them
>> being persistence, i.e. you can quite easily store a complete
>> 'history' of a data structure -- think undo/redo in an interactive
>> editor -- /without/ making a complete copy every time it 'changes'.
> 
> This is sometimes useful, sometimes not, and (maybe) sometimes imposes
> too much cost.

Yes. It is useful to be able to choose, e.g. betweens maps and hash tables
in OCaml.

>> > It pushes the requirements for locking down to the GC.
>> No, this has nothing to do with GC. ...  data structures can be shared
>> quite liberally, even between threads
> 
> What I mean is that underneath, the GC is frantically relocating data
> in the computer's memory, and might be multi-threaded itself, so
> there's still mutation and locking going on.

Exactly.

> It's just concealed from 
> the high-level abstract layer, maybe at the cost of a lot of low level
> locking and thrashing instead of a much smaller amount of high level
> data mutation.

Possible but unlikely. The set of efficient, concurrent C programs that can
be feasibly written is small compared to the set of efficient, functional
programs that can be feasibly written.

> Presumably Okasaki's work analyzes some of these cost tradeoffs.

I don't think he did.

>> No, no, a GADT ensures /more/ static restrictions than a regular ADT, not
>> less. It says that certain constructors of the data type must
>> deliver /specialised/ versions of the general type scheme of the data
>> type.
> 
> Now I'm confused again--it's not making a boxed object with a type tag
> interpreted by special hair in the runtime?

Exactly. GADTs let you convey the necessary information to the compiler so
that it can completely statically infer and check the types of this
example, prove its type correctness and then optimise away all
boxing/unboxing.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 4:18:38 AM
Chris Rathman wrote:
>
http://www.codepoetics.com/wiki/index.php?title=Topics:SICP_in_other_languages

I've posted this mutable list:

module MList = struct
  type ('a, 'b) cons = {mutable hd: 'a; mutable tl: 'b}
  type 'a t = Nil | Cons of ('a, 'a t) cons

  let empty = Nil
  let is_empty t = t = Nil   (* a full definition: '(=) Nil' would be
                                weakly polymorphic by the value restriction *)
  let cons h t = Cons{hd=h; tl=t}
  let car = function
    | Nil -> raise Not_found
    | Cons c -> c.hd
  let cdr = function
    | Nil -> raise Not_found
    | Cons c -> c.tl
  let set_car t x = match t with
    | Nil -> raise Not_found
    | Cons c -> c.hd <- x
  let set_cdr t x = match t with
    | Nil -> raise Not_found
    | Cons c -> c.tl <- x
  let rec append t1 t2 = match t1 with
    | Nil -> ()
    | Cons({tl=Nil} as c) -> c.tl <- t2
    | Cons c -> append c.tl t2
  let rec last_pair = function
    | Cons {tl=Nil} as c -> c
    | Cons c -> last_pair c.tl
    | Nil -> raise Not_found
  let append' t1 t2 = set_cdr (last_pair t1) t2
end;;

and this mutable Queue (there is a faster, more space-efficient but
obfuscated implementation in the OCaml stdlib):

module Queue = struct
  type 'a cell =
      {mutable prev: 'a cell option;
       elt: 'a;
       mutable next: 'a cell option}
  type 'a t =
      {mutable front: 'a cell option;
       mutable rear: 'a cell option}

  let empty () = {front=None; rear=None}
  (* a function rather than a value: a single shared mutable record
     would make every "empty" queue alias the same cells *)

  let is_empty q = q.front=None

  let front q = match q.front with
    | None -> raise Not_found
    | Some x -> x

  let insert q x = match q.front with
    | None ->
        let cell = Some {prev=None; elt=x; next=None} in
        q.front <- cell;
        q.rear <- cell
    | Some f ->
        let cell = Some {prev=None; elt=x; next=Some f} in
        f.prev <- cell;
        q.front <- cell

  let delete q = match q.rear with
    | None -> raise Not_found
    | Some r -> match r.prev with
      | None ->
          let x = r.elt in
          q.front <- None;
          q.rear <- None;
          x
      | Some r' ->
          let x = r.elt in
          r'.next <- None;
          q.rear <- r.prev;
          x
end;;

Also, memoize can be written without a type annotation (I've no idea what
you were doing wrong!):

let memoize f =
  let m = Hashtbl.create 1 in
  let rec f' x =
    (* hit the cache; on a miss, compute via the open-recursive f (nested
       calls go through f', so they are memoized too) and store the result *)
    try Hashtbl.find m x with Not_found ->
      let f_x = f f' x in
      Hashtbl.add m x f_x;
      f_x in
  f';;

let fib fib = function
  | 0 -> 0
  | 1 -> 1
  | n -> fib (n-1) + fib (n-2);;

let mem_fib = memoize fib;;

I think the OCaml code will be a lot clearer once we get rid of all the
dynamic dispatch and run-time checking.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 4:23:36 AM
On Feb 12, 8:43 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> Can I just note that Chris had parenthesised expressions like:
>
>   (1+(2*3))
>
> I mean, ((f a) b) is bad but the above is ordinary maths!

On a somewhat related subject...  My thinking is that I really need to
go through the motions of reconciling the F# and O'Caml code.  The
material in SICP really shouldn't dictate a separate translation.  The
only reason I put it out there was that I was wanting to ask a
question on the F# mailing list... The question is about module types
in F#.  A cursory glance through the mailing list archives gave some
pointers on the subject, but I got off track and never managed to get
my thoughts together on the subject.  Anyhow, since you are modifying
the O'Caml code, it would probably not make sense to duplicate the
changes for what essentially was put up in order to ask a question
about module types.

Thanks,
Chris

0
2/13/2007 4:42:06 AM
On Feb 12, 10:23 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> let memoize f =
>   let m = Hashtbl.create 1 in
>   let rec f' x =
>     try Hashtbl.find m x with Not_found ->
>       let f_x = f f' x in
>       Hashtbl.add m x f_x;
>       f_x in
>   f';;
>
> let fib fib = function
>   | 0 -> 0
>   | 1 -> 1
>   | n -> fib (n-1) + fib (n-2);;
>
> let mem_fib = memoize fib;;

I've a suspicion that the O'Caml version borrowed a problem from the
Alice translation.  In dabbling on another site (one that is more
ambitious than the SICP site  - http://en.literateprograms.org/Fibonacci_numbers_%28Alice_ML%29
) - I stumbled on the problem that the memoize needs to be a closure
within each recursive call - otherwise I'm just memoizing a single
return value - instead of recursively memoizing down the chain of the
recursion.   I think I got it corrected on the Alice version.  Didn't
think to check the O'Caml.

Thanks,
Chris


0
2/13/2007 5:01:06 AM
Chris Rathman wrote:
> On Feb 12, 8:43 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
>> Can I just note that Chris had parenthesised expressions like:
>>
>>   (1+(2*3))
>>
>> I mean, ((f a) b) is bad but the above is ordinary maths!
> 
> On a somewhat related subject...  My thinking is that I really need to
> go through the motions of reconciling the F# and O'Caml code.

Maybe. F# is different in many respects. Most of the OCaml code that I've
posted won't work in F#. You don't have labelled or optional arguments,
functors, structurally-subtyped objects, polymorphic variants,
monomorphism...

> The 
> material in SICP really shouldn't dictate a separate translation.

I think the relative merits of the two languages will cause the code to
diverge. Some code (e.g. the mutable list and queue implementations) can
stay the same.

> The 
> only reason I put it out there was that I was wanting to ask a
> question on the F# mailing list... The question is about module types
> in F#.  A cursory glance through the mailing list archives gave some
> pointers on the subject, but I got off track and never managed to get
> my thoughts together on the subject.  Anyhow, since you are modifying
> the O'Caml code, it would probably not make sense to duplicate the
> changes for what essentially was put up in order to ask a question
> about module types.

Module types only have limited support in F#: no abstract types, no nested
module types.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 5:03:48 AM
Benjamin Franksen wrote:
> 
> I'm not a type theorist either, but from what I've read so far, a type
> system is usually called 'decidable' resp. 'undecidable' according to
> whether type /checking/ is (i.e. assuming everything is annotated with a
> signature). You could say this terminology is merely a convention (and it
> is).

A type system usually is specified in a declarative manner, by a set of 
potentially non-deterministic deduction rules. To perform type checking 
you need a deterministic algorithm. Such an algorithm usually should 
have at least the following three properties with respect to the 
declarative formulation:

* Soundness. It should classify as well-typed only those programs that are 
well-typed according to the declarative specification.

* Completeness. It should classify as well-typed all programs that are 
well-typed according to the declarative specification.

* Termination. It should terminate on all programs.

The first two properties together establish partial correctness of the 
algorithm. Along with the third you have total correctness. A type 
system only is decidable if there exists a total, correct algorithm to 
check it.

(Note that the soundness property above is unrelated to the soundness 
property of the type system itself, which says that well-typed 
programs cannot go wrong when evaluated.)

The distinction between type checking and type inference is another 
story. A type checking algorithm may require an arbitrary amount of 
"inference", but one usually talks about type checking when all type 
information flows only in one direction, e.g. all types can be derived 
in a purely bottom-up manner when given the types of variables. ML 
cannot be type-checked in that manner, because it requires unification 
and hence arbitrary information flow.

(By this distinction, the so-called "type inference" features in C++0X 
and friends are not really type inference, as they still work simply 
bottom-up.)

With respect to Haskell, the "declarative" type system (which isn't 
really formalised for full Haskell, but at least for significant 
sublanguages) specifies that a definition can only be polymorphically 
recursive if it has a type signature. Hence type inference not only is 
decidable, it also is complete wrt that specification.

You may consider it incomplete wrt a hypothetical superset language of 
Haskell in which polymorphic recursion is always allowed. But 
technically, that's not Haskell anymore. Haskell's declarative type 
system was tailored that way precisely to allow complete inference. That 
may sound like a subtlety but is an important point. The same is true 
for sum types in *ML, or for more sophisticated uses of objects in OCaml.

Hope this clarifies,
- Andreas
0
rossberg (600)
2/13/2007 12:18:32 PM
Andreas Rossberg wrote:
> ...a definition can only be polymorphically
> recursive if it has a type signature...
> 
> ...Haskell's declarative type
> system was tailored that way precisely to allow complete inference...

I don't understand this bit. If Haskell's type system can fail without type
annotations, how can you say that Haskell can do "complete inference"?

Was I at least correct that Haskell will reject some programs until type
annotations are added manually whereas ML never requires type annotations?

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 3:14:38 PM
Benjamin Franksen <benjamin.franksen@bessy.de> wrote:
>Jon Harrop wrote:
>
>> It is normally the allocation/GC that kills performance. Amortising
>> operations can greatly reduce this cost but often requires a mini mutable
>> stack or equivalent internal data structure to remember the operations
>> being amortised. To avoid allocation when superceding, these are often
>> mutable.
>
>But what Okasaki shows in his book is that you can use laziness instead of
>mutability. Laziness (if used in the right way) allows the (purely
>functional) implementation to postpone costly operations and to perform
>them incrementally 'on demand'. You need a special sort of cost analysis to
>show the amortized time bounds even in case of persistent use, but
>developing and applying such an analysis is exactly the point of the book
>(as I understood it).
>
>Of course it may be that a lazy evaluation strategy places additional
>demands on the GC...

Laziness also implies mutation within the RTS (thunks are updated in place
when forced), which will also affect the system's handling of concurrency.

Tony.
-- 
f.a.n.finch  <dot@dotat.at>  http://dotat.at/
FISHER: SOUTHEAST 6 TO GALE 8, BACKING EAST 5. ROUGH OR VERY ROUGH. RAIN OR
SHOWERS. MODERATE OR GOOD.
0
dot1 (33)
2/13/2007 3:44:13 PM
Paul Rubin wrote:
> Besides the infix syntax and the static type system, can someone
> describe the difference between ML and Lisp from a programmer's point
> of view?  Should a Lisp programmer be able to get accustomed to .*ML
> (insert your favorite dialect prefix for ".*") without a lot of
> adjustment?  I can tell you that does NOT happen with Haskell ;).

I recently discussed the benefits of ML's pattern matching with a Lisper and
I think you'll be interested in some of our results. I wrote the following
symbolic simplifier in OCaml:

open Num  (* arbitrary-precision arithmetic: num, Int, +/ and */ *)

type expr =
    Q of num
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

(* rebuild a node only when a child actually changed, preserving sharing *)
let compose f g e f' op g' = if f==f' && g==g' then e else op f' g'

let rec simplify = function
    Q _ | Var _ as e -> e
  | Add (f, g) as e ->
      simplify_aux (compose f g e (simplify f) (fun f g -> Add(f, g)) (simplify g))
  | Mul (f, g) as e ->
      simplify_aux (compose f g e (simplify f) (fun f g -> Mul(f, g)) (simplify g))
and simplify_aux = function
    Add (Q n, Q m) -> Q (n +/ m)
  | Mul (Q n, Q m) -> Q (n */ m)
  | Add (Q (Int 0), e) | Add (e, Q (Int 0)) -> e
  | Mul (Q (Int 1), e) | Mul (e, Q (Int 1)) -> e
  | Mul (Q (Int 0), e) | Mul (e, Q (Int 0)) -> Q (Int 0)
  | Add (f, Add (g, h)) -> simplify (Add (Add (f, g), h))
  | Mul (f, Mul (g, h)) -> simplify (Mul (Mul (f, g), h))
  | e -> e;;

(* e.g. simplify (Add (Q (Int 0), Mul (Q (Int 1), Var "x")))  =  Var "x" *)

Note that it uses arbitrary-precision arithmetic. I asked him to translate
it into Lisp and he came up with the following:

(defun simplify (a)
   (if (atom a)
       a
       (destructuring-bind (op x y) a
        (let* ((f (simplify x))
               (g (simplify y))
               (nf (numberp f))
               (ng (numberp g))
               (+? (eq '+ op))
               (*? (eq '* op)))
          (cond
            ((and +? nf ng)                   (+ f g))
            ((and +? nf (zerop f))            g)
            ((and +? ng (zerop g))            f)
            ((and (listp g) (eq op (first g)))
             (destructuring-bind (op2 u v) g
               (simplify `(,op (,op ,f ,u) ,v))))
            ((and *? nf ng)                   (* f g))
            ((and *? (or (and nf (zerop f))
                         (and ng (zerop g)))) 0)
            ((and *? nf (= 1 f))              g)
            ((and *? ng (= 1 g))              f)
            (t                                `(,op ,f ,g)))))))

Benchmarking these two functions on randomly generated symbolic expressions,
the OCaml is over 100x faster than the Lisp compiled with SBCL:

n       OCaml  SBCL
1000    <.001s 0.004s
10000   <.001s 0.069s
100000  0.006s 0.628s
1000000 0.055s 9.317s

Any Lisper interested in learning ML would do well to understand why the ML
is so concise and efficient compared to the Lisp.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 3:56:15 PM
Jon Harrop <jon@ffconsultancy.com> wrote:
>
>Memory usage is quite different in FPLs and
>is closely tied to the run-time and its representation.

Not just FPLs, but anything with a substantial runtime, including Java and
.NET, etc.

Tony.
-- 
f.a.n.finch  <dot@dotat.at>  http://dotat.at/
FORTH TYNE DOGGER: WEST BACKING SOUTH 4 OR 5 OCCASIONALLY 6 LATER. SLIGHT OR
MODERATE. RAIN. MODERATE OR GOOD.
0
dot1 (33)
2/13/2007 4:10:45 PM
Jon Harrop wrote:
> Andreas Rossberg wrote:
>> ...a definition can only be polymorphically
>> recursive if it has a type signature...
>>
>> ...Haskell's declarative type
>> system was tailored that way precisely to allow complete inference...
> 
> I don't understand this bit. If Haskell's type system can fail without type
> annotations, how can you say that Haskell can do "complete inference"?
> 
> Was I at least correct that Haskell will reject some programs until type
> annotations are added manually

It will reject some "programs", but the point is that these are not 
legal Haskell programs according to the definition of the type system. 
So inference will never fail on legal programs.

Another way to look at it is that Haskell has two different forms of let 
bindings, with different syntax. One includes a type signature in its 
syntax, the other does not. Their typing rules differ.

This certainly is nit-picking to some degree. The difference from other, 
less well-founded languages is that those tend to claim all kinds of 
programs legal qua their language definition, just because you could in 
principle find a typing in many cases, but compilers will fail on them 
anyway.

> whereas ML never require type annotations?

MLs also require occasional annotations. For instance, SML sometimes 
requires annotations to resolve overloading or flexible records. OCaml 
sometimes requires annotations with objects or with polymorphic variants 
(in particular, the situation with polymorphic methods is closely 
related to polymorphic recursion).

Moreover, don't forget that even datatype declarations are (indirect) 
type annotations, though with different syntax. The essence is the same: 
you cannot write any non-trivial ML or Haskell program without giving a 
certain amount of explicit type information.

-- 
Andreas Rossberg, rossberg@ps.uni-sb.de
0
rossberg (600)
2/13/2007 4:26:17 PM
Tony Finch <dot@dotat.at> writes:
> Laziness also implies mutation within the RTC,

What's that?
0
phr.cx (5493)
2/13/2007 5:19:51 PM
Andreas Rossberg <rossberg@ps.uni-sb.de> writes:
> * Termination. It should terminate on all programs.

Is that really enough?  What if it always terminates but has
super-exponential complexity?  Does that ever happen in
would-be-practical type systems?  I think there's a situation like
this with some systems that try to check the correctness of array
subscripts, using the decision procedure for Presburger arithmetic.

> The distinction between type checking and type inference is another
> story. A type checking algorithm may require an arbitrary amount of
> "inference", but one usually talks about type checking when all type
> information flows only in one direction, e.g. all types can be derived
> in a purely bottom-up manner when given the types of variables. ML
> cannot be type-checked in that manner, because it requires unification
> and hence arbitrary information flow.

Aha, maybe this explains why I've found ML (etc.) types to be more
confusing than C/Java/etc. types.  Thanks.
0
phr.cx (5493)
2/13/2007 5:25:10 PM
Jon Harrop <jon@ffconsultancy.com> writes:
> Benchmarking these two functions on randomly generated symbolic expressions,
> the OCaml is over 100x faster than the Lisp compiled with SBCL:

Have you profiled the executions and figured out where they're spending
their time?
0
phr.cx (5493)
2/13/2007 6:19:56 PM
Jon Harrop schrieb:
> 
> One gem of information that I gleaned from Okasaki's work is that you can
> recover much of the performance lost when using purely functional data
> structures by amortising costs. However, this often also loses you thread
> safety.

You don't lose thread safety.
If you have an amortizing data structure, submit it to multiple threads, 
and have each thread create new versions from it, then you lose 
amortization.
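
For concreteness, the standard example is a batched queue (a minimal
sketch in OCaml; the amortized O(1) bound holds only as long as each
version of the queue is consumed once):

(* Front list plus reversed back list.  dequeue reverses the back list
   only when the front runs out, so the O(n) reversal is paid for by
   the n enqueues that built it -- unless several consumers each force
   the same expensive reversal on their own version. *)
type 'a queue = { front : 'a list; back : 'a list }

let empty = { front = []; back = [] }
let enqueue x q = { front = q.front; back = x :: q.back }

let rec dequeue q = match q.front, q.back with
  | [], [] -> None
  | [], back -> dequeue { front = List.rev back; back = [] }
  | x :: front, back -> Some (x, { front = front; back = back })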

Regards,
Jo
0
jo427 (1164)
2/13/2007 7:14:43 PM
Paul Rubin schrieb:
> Tony Finch <dot@dotat.at> writes:
>> Laziness also implies mutation within the RTC,
> 
> What's that?

Laziness is the opposite of strictness.

The
   if expr then expr else expr
operator is strict in its first argument (it always evaluates it), and 
lazy in the second and third arguments (it won't evaluate its operands 
until it needs them).

For user-defined functions, most programming languages use strictness: 
parameters are evaluated, then the results are passed to the function.
Haskell and a few other languages use laziness by default: the 
expressions are simply passed into the function without evaluating them. 
They get automatically evaluated as soon as they are needed (typically 
when they are used inside the condition of an if expression, or in a 
similar place).

This has advantages:
* No expression is ever evaluated unless it's needed.
   You can pass information along "just in case" and don't need to worry
   about the cost of evaluating it - it will be evaluated if and when
   it's needed (and we don't incur additional evaluation cost because if
   it was needed, we'd have evaluated it somewhere anyway).
* You can handle infinite data structures.
   E.g. the list of all integers, starting from n, is
     integers n = n : integers (n + 1)
   The list construction operator (:) is lazy, so if you use (integers 10)
   somewhere in the code, this will simply remain an expression.
   Now if some code accesses (say) the third element of that list, this
   will eventually force the expression to evaluate to
     10 : temp1 : temp2 : integers (temp2 + 1)
   where temp1 and temp2 are the expressions:
     temp1: 10 + 1
     temp2: temp1 + 1
   The tail of the list is still infinite, but finitely represented as
   a call to integers.
   (This also works for infinite acyclic graphs.)

It also has disadvantages:
* The run-time system cannot pass around values, it must pass around
   expressions. These are more complicated, so subroutine calls become
   more expensive.
* If you get handed a data structure, you don't know how long accessing
   any element of it may take: it could be a huge unevaluated expression.
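
To make the mechanics concrete, here is the integers example transcribed
into OCaml, where laziness is explicit rather than the default (a
sketch; lazy_list and nth are made-up names):

type 'a lazy_list = Nil | Cons of 'a * 'a lazy_list Lazy.t

(* The tail is a suspension: nothing past the head exists until
   someone forces it. *)
let rec integers n = Cons (n, lazy (integers (n + 1)))

let rec nth l i = match l with
  | Nil -> invalid_arg "nth"
  | Cons (x, rest) -> if i = 0 then x else nth (Lazy.force rest) (i - 1)

let () = assert (nth (integers 10) 2 = 12)   (* forces only three cells *)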

Regards,
Jo
0
jo427 (1164)
2/13/2007 7:35:13 PM
Joachim Durchholz wrote:
> For programming paradigms in general: Peter van Roy, Seif Haridi,
> Concepts, Techniques, and Models of Computer Programming. Gives an
> excellent bird's eye overview of most of the programming paradigms that
> are around. (It's missing Prolog, which I personally consider a failure
> though it seems to have sparked some interesting other languages.)

What's wrong with section 9.7?

> Cryptography: Bruce Schneier, Applied Cryptography. *The* standard in
> its field. Though I suspect that my copy is quite outdated by now -
> cryptography is a fast-moving field.

This is now very outdated.

> I probably forgot to mention a few, but I don't have my bookshelf handy.

/Introduction to Algorithms/, Cormen, Leiserson and Rivest (and Stein for
the 2nd edition, although I only have the first).

-- 
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
0
2/13/2007 7:40:59 PM
Joachim Durchholz <jo@durchholz.org> writes:
> >> Laziness also implies mutation within the RTC,
> > What's that?
> Laziness is the opposite of strictness.

I mean what's RTC?  I don't recognize the acronym.  I'm familiar with
the basic notion of laziness though still a bit fuzzy on the fine
points of how to implement it.  I'm sorry about the unclarity in my
question leading you to write the unneeded (but very good) explanation
of laziness in general.  Thanks.
0
phr.cx (5493)
2/13/2007 7:43:09 PM
Paul Rubin schrieb:
> What I mean is that underneath, the GC is frantically relocating data
> in the computer's memory, and might be multi-threaded itself, so
> there's still mutation and locking going on.  It's just concealed from
> the high-level abstract layer,

Yes.
Actually, every evaluation step in the Haskell run-time system mutates 
an expression tree with sharing, so there's even more mutation going on 
behind the scene than meets the eye. (In the *MLs there is less 
behind-the-scenes mutation.)

However, you don't have to worry about mutation in your source code.
No aliasing problems anymore!

 > maybe at the cost of a lot of low level
> locking and thrashing instead of a much smaller amount of high level
> data mutation.

No, it's usually the opposite: in a language with mutation, you're 
either forced to use all data in a single-threaded fashion (which places 
quite tight constraints on the design), or you have to copy entire data 
structures "just to make sure it doesn't get overwritten".

E.g. guess what happens first when submitting a data structure to any 
Windows API: it gets copied. Sometimes several levels deep. That's 
necessary because callbacks may modify the copy after the data was 
passed to Windows... if Windows could assume that all data were 
immutable, it would have to do less copying and CPU cache thrashing.


The mutation inside a FPL usually doesn't even need locking: create a 
new block (no locking needed), assign a pointer (no locking needed), and 
(in lazy languages): replace a pointer to an expression with a pointer to 
the value (no locking either).
The garbage collector needs to lock blocks before moving them. However, 
the GC takes just about 10% of the total time, the rest of the time the 
system is running without any locks whatsoever. Try to beat that with 
mutable data...

Regards,
Jo
0
jo427 (1164)
2/13/2007 7:47:53 PM
Benjamin Franksen <benjamin.franksen@bessy.de> writes:

> Jon Harrop wrote:
>> Benjamin Franksen wrote:
>>> Jon Harrop wrote:
>>>> One gem of information that I gleaned from Okasaki's work is that you
>>>> can recover much of the performance lost when using purely functional
>>>> data structures by amortising costs. However, this often also loses you
>>>> thread safety.
>>> 
>>> Just curious: How can a purely functional data structure have any probems
>>> with thread safety?
>> 
>> Good question!
>> 
>> If the API makes the data structure appear purely functional by not
>> providing mutation but the underlying implementation uses mutation then it
>> can break thread safety. Splay trees are the obvious example. They reorder
>> themselves transparently and in-place when you use "find".
>
> Hmm, yes. The internal implementation using mutable structures would have to
> be thread-safe, too. And now I think I understand your (and Paul's)
> comments w.r.t. GC. Yes, the GC would have to be aware of multiple threads
> and do some (hopefully small) amount of locking... at least if the
> (internally used) mutable state /can be/ shared between threads. /If/ the
> code internally uses the mutable state only in a single threaded way (which
> can be statically guaranteed by using the ST monad in Haskell) then this
> wouldn't be a problem, right?
>
>> It is normally the allocation/GC that kills performance. Amortising
>> operations can greatly reduce this cost but often requires a mini mutable
>> stack or equivalent internal data structure to remember the operations
>> being amortised. To avoid allocation when superceding, these are often
>> mutable.
>
> But what Okasaki shows in his book is that you can use laziness instead of
> mutability. Laziness (if used in the right way) allows the (purely
> functional) implementation to postpone costly operations and to perform
> them incrementally 'on demand'. You need a special sort of cost analysis to
> show the amortized time bounds even in case of persistent use, but
> developing and applying such an analysis is exactly the point of the book
> (as I understood it).
>
> Of course it may be that a lazy evaluation strategy places additional
> demands on the GC...

Lazy evaluation strategies use mutability (internally).
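
In OCaml terms (a sketch): forcing a suspension overwrites the thunk
with the computed value, which is precisely a (benign) mutation.

let t = lazy (print_endline "computed"; 42)
let a = Lazy.force t   (* prints "computed", then caches the 42 *)
let b = Lazy.force t   (* prints nothing: t now holds the value *)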

0
find19 (1244)
2/13/2007 8:08:23 PM
Joachim Durchholz <jo@durchholz.org> writes:

[...]

> E.g. guess what happens first when submitting a data structure to any
> Windows API: it gets copied. Sometimes several levels deep. That's
> necessary because callbacks may modify the copy after the data was
> passed to Windows... if Windows could assume that all data were
> immutable, it would have to do less copying and CPU cache thrashing.

I'm guessing it might also have to do with the fact that C/C++ isn't
garbage-collected.  After all, you don't want dangling pointers in
your kernel.

Besides, what do you do when you want to "mutate" an immutable data
structure?  Copy it, I suppose.

> The mutation inside a FPL usually doesn't even need locking: create
> a new block (no locking needed), assign a pointer (no locking
> needed), and (in lazy languages): replace a pointer to an expression
> with a point to the value (no locking either).  The garbage
> collector needs to lock blocks before moving them. However, the GC
> takes just about 10% of the total time, the rest of the time the
> system is running without any locks whatsoever. Try to beat that
> with mutable data...

You make it sound like being purely functional is an advantage
performance-wise.  Purely functional languages do have some
advantages, but better performance (than "impure" languages) isn't one
of them.

--
Wolfram Fenske

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

0
int2k (134)
2/13/2007 9:00:04 PM
On 13 Feb., 18:25, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Andreas Rossberg <rossb...@ps.uni-sb.de> writes:
> > * Termination. It should terminate on all programs.
>
> Is that really enough?  What if it always terminates but has
> super-exponential complexity?

Note that I said "at least" in the part you snipped. ;-) For practical
purposes, you want more, e.g. reasonable performance, understandable
error messages, predictability... Some experimental type systems never
made it into something usable because they failed on one or more of
these points (type inference with subtyping comes to mind).

> Does that ever happen in would-be-practical type systems?

In fact, ML-style type inference has exponential worst-case behaviour.
However, this has never appeared to be a problem in practice, because
the worst case only appears for examples where the types themselves
get exponentially large - i.e. examples you couldn't write down
without type inference in the first place. (The worst case is so rare
that Damas/Milner first conjectured linear complexity, and it took
years until that claim was falsified.)
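
The standard example is a chain of lets, each of which squares the size
of the inferred type (a sketch in OCaml):

let f0 = fun x -> (x, x)      (* 'a -> 'a * 'a                    *)
let f1 = fun y -> f0 (f0 y)   (* 'a -> ('a * 'a) * ('a * 'a)      *)
let f2 = fun y -> f1 (f1 y)   (* result type has 16 copies of 'a  *)
let f3 = fun y -> f2 (f2 y)   (* ... and now 256 copies of 'a     *)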

Certainly, this might be different with other type systems.

> > The distinction between type checking and type inference is another
> > story. A type checking algorithm may require an arbitrary amount of
> > "inference", but one usually talks about type checking when all type
> > information flows only in one direction, e.g. all types can be derived
> > in a purely bottom-up manner when given the types of variables. ML
> > cannot be type-checked in that manner, because it requires unification
> > and hence arbitrary information flow.
>
> Aha, maybe this explains why I've found ML (etc.) types to be more
> confusing than C/Java/etc. types.  Thanks.

The type language itself, or the type inference process and its
results? The former would puzzle me, because I know of no practical
language that has a more regular type algebra than ML. Certainly not C
and Java.

- Andreas

0
rossberg (600)
2/13/2007 9:10:16 PM
int2k@gmx.net wrote:
> Besides, what do you do when you want to "mutate" an immutable data
> structure?  Copy it, I suppose.

You don't need to copy everything; and sometimes you get by with no copying
at all. A very simple example is adding an element to the front of a list;
the new list will just point to the old one, no copying. (This is only the
best possible case, of course; usually some copying will be necessary.)

>> The mutation inside a FPL usually doesn't even need locking: create
>> a new block (no locking needed), assign a pointer (no locking
>> needed), and (in lazy languages): replace a pointer to an expression
>> with a pointer to the value (no locking either).  The garbage
>> collector needs to lock blocks before moving them. However, the GC
>> takes just about 10% of the total time, the rest of the time the
>> system is running without any locks whatsoever. Try to beat that
>> with mutable data...
> 
> You make it sound like being purely functional is an advantage
> performance-wise.  Purely functional languages do have some
> advantages, but better performance (than "impure" languages) isn't one
> of them.

In a highly concurrent setting they can be very efficient. Look at
http://shootout.alioth.debian.org/debian/benchmark.php?test=message&lang=all

This is certainly a special case (however one that's going to become less
special pretty soon if one can believe the 'free lunch is over' story).

Ben
0
2/13/2007 9:30:39 PM
Benjamin Franksen <benjamin.franksen@bessy.de> writes:
> You don't need to copy everything; and sometimes you get by with noe copying
> at all. A very simple example is adding an element to the front of a list;
> the new list will just point to the old one, no copying. (This is only the
> best possible case, of course, usually some copying will be necessary.)

The best case is removing the list's first element (the new list just
points to the old second element), and that's actually very common.
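
A sketch of both cases in OCaml (the assert uses physical equality to
show that no cells are copied):

let xs = [1; 2; 3]
let ys = 0 :: xs            (* add to front: one new cell, xs shared *)
let zs = List.tl ys         (* drop the front: no new cells at all   *)
let () = assert (zs == xs)  (* physically the same list as before    *)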
0
phr.cx (5493)
2/13/2007 9:37:15 PM
Paul Rubin wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> Benchmarking these two functions on randomly generated symbolic
>> expressions, the OCaml is over 100x faster than the Lisp compiled with
>> SBCL:
> 
> Have you profiled the executions and figured out where they're spending
> their time?

Profiling won't help much here because all the time is spent in the only
function. The Lisp is slower because it repeats the same tests and copies
unnecessarily. You should be able to get within 10x the performance of the
OCaml by optimising the Lisp by hand, but it is seriously tedious...

This style of programming (variants + patterns) is ubiquitous in ML and is
one of the main advantages over Lisp, IMHO.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/13/2007 11:12:17 PM
int2k@gmx.net schrieb:
> Joachim Durchholz <jo@durchholz.org> writes:
> 
> [...]
> 
>> E.g. guess what happens first when submitting a data structure to any
>> Windows API: it gets copied. Sometimes several levels deep. That's
>> necessary because callbacks may modify the copy after the data was
>> passed to Windows... if Windows could assume that all data were
>> immutable, it would have to do less copying and CPU cache thrashing.
> 
> I'm guessing it might also have to do with the fact that C/C++ isn't
> garbage-collected.  After all, you don't want dangling pointers in
> your kernel.

Windows has to check that when it gets the data structure first, so 
that's not an issue.
No, it's really that Windows doesn't want to have the data changed 
while it's running.

> Besides, what do you do when you want to "mutate" an immutable data
> structure?  Copy it, I suppose.

Yes.
Though that's a general observation, unrelated to API issues.

>> The mutation inside a FPL usually doesn't even need locking: create
>> a new block (no locking needed), assign a pointer (no locking
>> needed), and (in lazy languages): replace a pointer to an expression
>>> with a pointer to the value (no locking either).  The garbage
>> collector needs to lock blocks before moving them. However, the GC
>> takes just about 10% of the total time, the rest of the time the
>> system is running without any locks whatsoever. Try to beat that
>> with mutable data...
> 
> You make it sound like being purely functional is an advantage
> performance-wise.  Purely functional languages do have some
> advantages, but better performance (than "impure" languages) isn't one
> of them.

Immutability can give massive performance advantages. Simply because you 
can share with impunity, and never have to do a deep copy just to be 
sure that some other part of the program doesn't mutate the data 
structure that you want to save for later reference.
Of course, immutability also has performance disadvantages.
In practice, sometimes you win big, sometimes you lose big, and most of 
the time the difference is somewhere in between.

Regards,
Jo
0
jo427 (1164)
2/13/2007 11:56:11 PM
David Hopwood schrieb:
> Joachim Durchholz wrote:
>> For programming paradigms in general: Peter van Roy, Seif Haridi,
>> Concepts, Techniques, and Models of Computer Programming. Gives an
>> excellent bird's eye overview of most of the programming paradigms that
>> are around. (It's missing Prolog, which I personally consider a failure
>> though it seems to have sparked some interesting other languages.)
> 
> What's wrong with section 9.7?

Nothing that I'm aware of, actually.

Haridi/Roy is good at highlighting the separate choices that can go into 
language design.
My remark was triggered by the feeling that it would be instructive to 
identify these components in languages that were designed as unified 
wholes (say, Prolog, Lisp, and Smalltalk).
Of course, that would be expanding an already-too-thick book. (It might 
be a good course task though.)

Regards,
Jo
0
jo427 (1164)
2/14/2007 12:06:49 AM
int2k@gmx.net wrote:
> Besides, what do you do when you want to "mutate" an immutable data
> structure?  Copy it, I suppose.

No, you derive from it and reference back to most of the original data
structure.

> You make it sound like being purely functional is an advantage
> performance-wise.  Purely functional languages do have some
> advantages, but better performance (than "impure" languages) isn't one
> of them.

In practice, using immutable data structures frees you from some low-level
concerns and gives you more time to perform high-level optimisations. I
often find that judicious use of immutability results in faster code.
Indeed, the only mutable data structures that I make much use of are arrays
and hash tables.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/14/2007 12:17:40 AM
Andreas Rossberg wrote:
>Jon Harrop wrote:
[...]
>> whereas ML never require type annotations?
>
>MLs also require occasional annotations. For instance, SML sometimes 
>requires annotations to resolve overloading or flexible records. OCaml 
>sometimes requires annotations with objects or with polymorphic variants 
>(in particular, the situation with polymorphic methods is closely 
>related to polymorphic recursion).
>
>Moreover, don't forget that even datatype declarations are (indirect) 
>type annotations, though with different syntax. The essence is the same: 
>you cannot write any non-trivial ML or Haskell program without giving a 
>certain amount of explicit type information.

Even counting datatypes, the majority of ML type annotations are probably
to be found in signatures.  One can talk about whether core ML (ML without
the module layer) needs type annotations, but all well-written nontrivial
ML programs use modules so the more interesting question is whether
*those* programs need type annotations, and clearly they do.

I think it's a mistake to value annotation-free type inference as highly
as we (the ML community and, probably, the Haskell community, though
perhaps less strongly) do.  The lack of a decent function-level type
annotation syntax in SML has given me plenty of grief, and I'm not sure
it's possible to get good error messages from unification-based
inference.

The story of refinement types is instructive: despite inference being
possible, it's undesirable.  Efficiency aside, once your type system
captures more precise properties than plain Hindley-Milner, you don't
want all of them inferred.  For example, if you have a function `f'
that's only supposed to take nonempty lists (nonemptiness being an
invariant easily captured by refinement types), you don't want the
type system to allow you to apply `f' to empty lists just because
`f's behavior on empty lists happens to be well-defined; that can
lead to hard-to-find bugs, because you don't think of `f' as taking
empty lists, and you forget that it can.  Or you change `f's
implementation so it does something else with empty lists, and now
other code breaks.  Having to write down in an annotation the
behavior that `f' is supposed to have constrains use of `f', and
makes your intentions clear to yourself and other people (including
yourself a year later when you've forgotten everything).

I guess that's also the story of modules, which, through interfaces,
constrain the behavior of clients and make intentions clear.
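
In OCaml terms, a signature is exactly such an annotation: it hides more
general behaviour that the implementation happens to have (a minimal
sketch; the Counter module is purely illustrative):

module Counter : sig
  type t
  val zero : t
  val next : t -> t
  val value : t -> int
end = struct
  type t = int   (* that t = int is deliberately invisible to clients *)
  let zero = 0
  let next n = n + 1
  let value n = n
end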

-j.
0
joshuad (9)
2/14/2007 1:27:42 AM
Jon Harrop <jon@ffconsultancy.com> writes:
> > Have you profiled the executions and figured out where they're spending
> > their time?
> 
> Profiling won't help much here because all the time is spent in the only
> function. The Lisp is slower because it repeats the same tests and copies
> unnecessarily. You should be able to get within 10x the performance of the
> OCaml by optimising the Lisp by hand, but it is seriously tedious...

Sounds like finer grained profiling is needed.  If the Lisp program is
spending 99% of its time in repeated tests and copies, there's some
algorithmic difference, or some horrible lossage in the SBCL
implementation (which in most regards is pretty good).  Usually a Lisp
program is not 100x slower than the corresponding C program despite
all the type tags and so forth.  So I'm a bit mystified.
0
phr.cx (5493)
2/14/2007 2:58:30 AM
Joachim Durchholz <jo@durchholz.org> writes:

> int2k@gmx.net schrieb:
>> Joachim Durchholz <jo@durchholz.org> writes:
>> [...]
>>
>>> E.g. guess what happens first when submitting a data structure to any
>>> Windows API: it gets copied. Sometimes several levels deep. That's
>>> necessary because callbacks may modify the copy after the data was
>>> passed to Windows... if Windows could assume that all data were
>>> immutable, it would have to do less copying and CPU cache thrashing.
>> I'm guessing it might also have to do with the fact that C/C++ isn't
>> garbage-collected.  After all, you don't want dangling pointers in
>> your kernel.
>
> Windows has to check that when it gets the data structure first, so
> that's not an issue.

Right.  Being the OS, it's in a position to know about this stuff.

> Not, it's really that Windows doesn't want to have the data changed
> while it's running.

Dammit!  You got me there. :-)

>> Besides, what do you do when you want to "mutate" an immutable data
>> structure?  Copy it, I suppose.
>
> Yes.  Though that's a general observation, unrelated to API issues.

But it's related to immutable data structures.  You said before:

> No, it's usually the opposite: in a language with mutation, you're
> either forced to use all data in a single-threaded fashion (which
> places quite tight constraints on the design), or you have to copy
> entire data structures "just to make sure it doesn't get
> overwritten".

This sounded like you were saying that you have to copy *less* data if
you're using immutable data structures.  I really doubt that's
generally true.  Sure, in cases like the one above, it saves you
unnecessary copying but in other cases you'll have to copy more (or at
least allocate more memory, which is an expensive operation) because
you can't mutate.  All in all I'm pretty sure this'll amount to *more*
copying.  The fact that programs in functional languages tend to be
more allocation intensive than ones in non-functional languages is a
hint in that direction.

>>> The mutation inside a FPL usually doesn't even need locking: create
>>> a new block (no locking needed), assign a pointer (no locking
>>> needed), and (in lazy languages): replace a pointer to an expression
>>> with a pointer to the value (no locking either).  The garbage
>>> collector needs to lock blocks before moving them. However, the GC
>>> takes just about 10% of the total time, the rest of the time the
>>> system is running without any locks whatsoever. Try to beat that
>>> with mutable data...
>>
>> You make it sound like being purely functional is an advantage
>> performance-wise.  Purely functional languages do have some
>> advantages, but better performance (than "impure" languages) isn't one
>> of them.
>
> Immutability can give massive performance advantages.

I'm sure it *can*.  The question is if it *usually* does.  I don't
think so.

> Simply because you can share with impunity, and never have to do a
> deep copy just to be sure that some other part of the program
> doesn't mutate the data structure that you want to save for later
> reference.  Of course, immutability also has performance
> disadvantages.  In practice, sometimes you win big, sometimes you
> lose big, and most of the time the difference is somewhere in
> between.

I can agree to that.

--
Wolfram Fenske

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

0
int2k (134)
2/14/2007 6:40:21 AM
Jon Harrop <jon@ffconsultancy.com> writes:

> int2k@gmx.net wrote:
>> Besides, what do you do when you want to "mutate" an immutable data
>> structure?  Copy it, I suppose.
>
> No, you derive from it and reference back to most of the original
> data structure.

But then you introduce another level of indirection--which won't make
your program faster, either.

>> You make it sound like being purely functional is an advantage
>> performance-wise.  Purely functional languages do have some
>> advantages, but better performance (than "impure" languages) isn't one
>> of them.
>
> In practice, using immutable data structures frees you from some
> low-level concerns and gives you more time to perform high-level
> optimisations.

True, but that's a slightly different issue.  Joachim Durchholz talked
about a lot of low-level stuff that the runtime of an FPL doesn't have
to do (mostly, no locking) and that this would be one of the reasons
why working with immutable data structures was faster.  That's what I
object to.  I'm by no means saying "functional languages are slow",
but you can't claim that they (the "pure" ones in particular) are
super fast, either.  If a language enables you to do more high-level
optimisations because you don't have to concern yourself with the
low-level stuff then that's because the language is more high-level.
It doesn't have to do with being functional or not.

> I often find that judicious use of immutability results in faster
> code.  Indeed, the only mutable data structures that I make much use
> of are arrays and hash tables.

Arrays and hash tables, eh?  That's a pretty big exception.  After all
these are the data structures that arguably see the most use, where
performance matters the most.

--
Wolfram Fenske

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

0
int2k (134)
2/14/2007 7:13:00 AM
Joshua Dunfield <joshuad@cs.cmu.edu> wrote:
> Andreas Rossberg wrote:
>>Jon Harrop wrote:
> [...]
>>> whereas ML never require type annotations?

> I think it's a mistake to value annotation-free type inference as highly
> as we (the ML community and, probably, the Haskell community, though
> perhaps less strongly) do.  The lack of a decent function-level type
> annotation syntax in SML has given me plenty of grief, 

I also find myself routinely adding type annotations when writing
in OCaml, mainly just for documentation purposes, but occasionally
to restrict the type and make it simpler (especially when using objects).

> The story of refinement types is instructive: despite inference being
> possible, it's undesirable.  

I don't think so. Inference is nice, because it saves you work (why
write down the type yourself when you can just ask the
compiler/interpreter, and then copy and paste), you can use it to
discover that maybe your function is more general than you thought,
and I also frequently *don't* write out type annotations for small or
obvious functions, or ones with uninteresting type signatures, both in
Haskell and OCaml.

> Efficiency aside, once your type system captures more precise
> properties than plain Hindley-Milner, you don't want all of them
> inferred.

Well, maybe, but in practice I think I still would use the "write the
function, ask for the type, see if it matches your expectation" style
often enough. 

> For example, if you have a function `f' that's only supposed to take
> nonempty lists (nonemptiness being an invariant easily captured by
> refinement types),

That's maybe not a particularly good example, because in practice you
wouldn't encode such information in a type if it involved a lot more
work when writing out types (which again means that I would only use it
in conjunction with type inference). It's just not worth the effort,
unless you're really paranoid about correctness (which would justify
the extra work).

- Dirk
0
dthierbach2 (260)
2/14/2007 7:36:03 AM
Wolfram Fenske schrieb:
> True, but that's a slightly different issue.  Joachim Durchholz talked
> about a lot of low-level stuff that the runtime of an FPL doesn't have
> to do (mostly, no locking) and that this would be one of the reasons
> why working with immutable data structures was faster.

I wasn't clear enough then.

I'm well aware that there are aspects of immutability that hamper 
performance. I just wanted to complement that with the aspects that give 
performance advantages.

 > That's what I
> object to.  I'm by no means saying "functional languages are slow",
> but you can't claim that they (the "pure" ones in particular) are
> super fast, either.

Nobody claimed that!

 > If a language enables you to do more high-level
> optimisations because you don't have to concern yourself with the
> low-level stuff then that's because the language is more high-level.
> It doesn't have to do with being functional or not.

Ah, but I have found again and again that mutability actively hampers 
abstraction.
E.g. it gets quite tedious to debug higher-order functions: if you pass 
several functions to a higher-order function, then it becomes important 
in which order that higher-order function calls them.
I've been doing library design work in imperative languages for almost my 
entire career. Since the day that I knew the advantages of immutability 
I've been actively perceiving the kind of design contortions that 
mutability has been imposing on my designs (mostly in the form of "I 
can't do that because some code might depend on the fact that this piece 
of data doesn't change, and I can't simply copy the data structure 
because the copy may get stale over updates by other code", though the 
influence is more profound than that).

>> I often find that judicious use of immutability results in faster
>> code.  Indeed, the only mutable data structures that I make much use
>> of are arrays and hash tables.
> 
> Arrays and hash tables, eh?  That's a pretty big exception.  After all
> these are the data structures that arguably see the most use, where
> performance matters the most.

Hmm... I have to slightly disagree for staple stuff like business 
applications.

For these, you need a way to search and sort the data, and dictionaries 
are also great for managing relationships between pieces of data. So 
maps (hash tables) are logically central to many applications.

In terms of CPU cycles, they aren't that central anymore. Finding the 
data to process requires a lookup in a possibly far too deep binary 
search tree, but the actual processing still takes longer.

Of course, this can vary greatly. Search engines, databases, graphics, 
matrix stuff - that's all where the lookup can quickly dominate the 
other processing, and in these cases, you need something better than a 
binary search tree.
However, taking a copy of a large array is expensive. In applications 
where that's needed (or even just helpful), the immutability approach 
has advantages.

Take, for an example, an in-memory database. Sounds like it's a prime 
example for mutable data structures, right?
Now add ACID requirements at a high isolation level. Promptly you need to copy 
parts of the tables, and all those fast arrays start to get in the way - 
you either have to lock entire tables, or you have to lock each single 
row before accessing it, and check the locks on each access. That's a 
serious performance loss - while the immutable implementation simply 
takes a snapshot by copying a pointer to the current state of the 
database.
Of course, the immutable solution incurs a similar overhead when 
updating data structures, so the two solutions end up at comparable 
performance. The immutable solution will likely be faster in a massively 
concurrent scenario with heavy update activity, while the mutable 
solution has advantages if the database is used in a single-threaded 
and/or read-only fashion. As I said elsewhere: sometimes you win big, 
sometimes you lose big, and most of the time it's somewhere in the middle.
However, the immutable solution has one advantage: the code is easier to 
reuse, and writing and maintaining it requires far fewer man-hours. The 
immutability team has more time or incurs less cost to reach a given 
level of performance.
In other words: Take the kind of manpower that was thrown at MySQL, 
apply it to any FPL compiler and libraries, and I predict you'll get 
something that's more flexible, has better ACID properties, and is at 
least as fast as MySQL. You'd still generate revenue with query 
optimization services, of course :-) (Of course, as with any prediction, 
it's not worth the bytes needed to store it *g*)

Regards,
Jo
0
jo427 (1164)
2/14/2007 9:23:53 AM
Paul Rubin wrote:
> Sounds like finer grained profiling is needed.  If the Lisp program is
> spending 99% of its time in repeated tests and copies, there's some
> algorithmic difference, or some horrible lossage in the SBCL
> implementation (which in most regards is pretty good).  Usually a Lisp
> program is not 100x slower than the corresponding C program despite
> all the type tags and so forth.  So I'm a bit mystified.

The difference is algorithmic. You can get rid of the repeated tests by
restructuring the COND into many nested "if"s (a decision tree). OCaml does
this for you. Lisp does not.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/14/2007 2:25:15 PM
Joshua Dunfield wrote:
> The story of refinement types is instructive: despite inference being
> possible, it's undesirable.  Efficiency aside, once your type system
> captures more precise properties than plain Hindley-Milner, you don't
> want all of them inferred.  For example, if you have a function `f'
> that's only supposed to take nonempty lists (nonemptiness being an
> invariant easily captured by refinement types), you don't want the
> type system to allow you to apply `f' to empty lists just because
> `f's behavior on empty lists happens to be well-defined; that can
> lead to hard-to-find bugs, because you don't think of `f' as taking
> empty lists, and you forget that it can.  Or you change `f's
> implementation so it does something else with empty lists, and now
> other code breaks.  Having to write down in an annotation the
> behavior that `f' is supposed to have constrains use of `f', and
> makes your intentions clear to yourself and other people (including
> yourself a year later when you've forgotten everything).

Your original point is perfectly valid but your example is not. You can
leverage the static type system to ensure that "f" only gets lists with at
least one element by using the type 'a * 'a list.
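
A minimal sketch of that encoding (the names are illustrative):

type 'a nonempty = 'a * 'a list   (* head paired with the rest *)

let head ((x, _) : 'a nonempty) = x   (* total: there is no empty case *)

let of_list = function
  | [] -> None
  | x :: xs -> Some (x, xs)           (* 'a list -> 'a nonempty option *)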

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/14/2007 2:33:45 PM
Wolfram Fenske wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
> But then you introduce another level of indirection--which won't make
> you program faster, either.

A pointer dereference is much cheaper than taking out a lock or copying an
entire data structure.

>> In practice, using immutable data structures frees you from some
>> low-level concerns and gives you more time to perform high-level
>> optimisations.
> 
> True, but that's a slightly different issue.

If you're allowing infinite time to optimise then machine code is
the "fastest" language. In practice, the time taken to optimise is
relevant.

> Joachim Durchholz talked 
> about a lot of low-level stuff that the runtime of an FPL doesn't have
> to do (mostly, no locking) and that this would be one of the reasons
> why working with immutable data structures was faster.  That's what I
> object to.  I'm by no means saying "functional languages are slow",
> but you can't claim that they (the "pure" ones in particular) are
> super fast, either.

I haven't said anything about purely functional languages. We're talking
about immutable data structures.

> If a language enables you to do more high-level 
> optimisations because you don't have to concern yourself with the
> low-level stuff then that's because the language is more high-level.

Yes.

> It doesn't have to do with being functional or not.

How many low level functional languages do you know?

Indeed, the easiest way to make mutable and immutable data structures
equivalent so that they can be interchanged easily is to present an API in
continuation passing style. You'll have trouble doing that in a
non-functional language.
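
A sketch of the idea (the module names are made up): each operation
hands the resulting structure to a continuation, so callers cannot tell
whether it was updated in place or freshly built.

module type SET = sig
  type t
  val empty : unit -> t
  val add : int -> t -> (t -> 'r) -> 'r
end

module Immutable : SET = struct
  type t = int list
  let empty () = []
  let add x s k = k (x :: s)          (* passes a new version *)
end

module Mutable : SET = struct
  type t = int list ref
  let empty () = ref []
  let add x s k = s := x :: !s; k s   (* mutates, passes the same handle *)
end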

>> I often find that judicious use of immutability results in faster
>> code.  Indeed, the only mutable data structures that I make much use
>> of are arrays and hash tables.
> 
> Arrays and hash tables, eh?  That's a pretty big exception.  After all
> these are the data structures that arguably see the most use, where
> performance matters the most.

That isn't worth saying unless you can quantify it. How often are different
data structures used in different languages?

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/14/2007 3:03:21 PM
Wolfram Fenske wrote:
>> No, it's usually the opposite: in a language with mutation, you're
>> either forced to use all data in a single-threaded fashion (which
>> places quite tight constraints on the design), or you have to copy
>> entire data structures "just to make sure it doesn't get
>> overwritten".
> 
> This sounded like you were saying that you have to copy *less* data if
> you're using immutable data structures.  I really doubt that's
> generally true.  Sure, in cases like the one above, it saves you
> unnecessary copying but in other cases you'll have to copy more (or at
> least allocate more memory, which is an expensive operation) because
> you can't mutate.  All in all I'm pretty sure this'll amount to *more*
> copying.  The fact that programs in functional languages tend to be
> more allocation intensive than ones in non-functional languages is a
> hint in that direction.

Allocation is not copying. You can't draw a meaningful comparison about the
rest without being more precise, e.g. maps are often more memory efficient
than hash tables because hash tables must be kept largely empty in order to
be efficient.

>> Immutability can give massive performance advantages.
> 
> I'm sure it *can*.  The question is if it *usually* does.  I don't
> think so.

That depends entirely upon what you're doing. If you're doing a lot of
set-theoretic operations (e.g. graph theory) then immutable data structures
are highly likely to be easier to use and, consequently, faster. If you're
doing an FFT then you want a mutable array.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/14/2007 3:10:29 PM
In article <eqtoie$l6n$1@wolfberry.srv.cs.cmu.edu>, Joshua Dunfield wrote:
> 
> The lack of a decent function-level type annotation syntax in SML
> has given me plenty of grief, and I'm not sure it's possible to get
> good error messages from unification-based inference.

Good, I don't know, but it's certainly possible to cook up a
unification-using type system that gives adequate error messages.

The trick is to do as much bidirectional type checking as you can, and
only to generate unification constraints when you really need to (eg,
at applications of polymorphic functions). This has two benefits, one
minor, and one major. The minor benefit is that it keeps your
constraint sets small, which means that the unification errors are
(hopefully) easier to understand.

The major benefit is that bidirectionality means that type annotations
will actually keep programs from type checking. IIRC, in SML/NJ a type
annotation doesn't actually restrict the type of a term -- NJ infers
the most general type and then checks at the end if the annotation
is an instance of it. But with a bidirectional algorithm, annotations
will actually restrict the type used in the rest of the program, which
means type errors are caught earlier.

For ML-style languages, IMO this gives you adequate error messages.
I'm not sure how to get up to genuinely good error messages, though.

-- 
Neel R. Krishnaswami
neelk@cs.cmu.edu
0
neelk (298)
2/14/2007 4:22:26 PM
Neelakantan Krishnaswami wrote:
> For ML-style languages, IMO this gives you adequate error messages.
> I'm not sure how to get up to genuinely good error messages, though.

OCaml gives good error messages, IMHO. But I wouldn't want the semantics you
describe because I find it quite useful to annotate a function with 'a
array to indicate that it accepts some kind of an array without specifying
the element type.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/14/2007 5:03:15 PM
In article <45d3425f$0$8740$ed2619ec@ptn-nntp-reader02.plus.net>, Jon Harrop 
wrote:
> Neelakantan Krishnaswami wrote:
>> For ML-style languages, IMO this gives you adequate error messages.
>> I'm not sure how to get up to genuinely good error messages, though.
> 
> OCaml gives good error messages, IMHO. But I wouldn't want the
> semantics you describe because I find it quite useful to annotate a
> function with 'a array to indicate that it accepts some kind of an
> array without specifying the element type.

No, Ocaml does something very close to what I describe, which is *why*
it gives adequate error messages.

You can see the difference by comparing the error messages that
OCaml and SML/NJ give for the following erroneous function:

  # let g : ('a array -> 'a) = fun x -> fst x;;  (* Ocaml *)

  - val g : ('a array -> 'a) = fn x => fst x;    (* SML/NJ *)

Here's New Jersey's error message:

stdIn:1.1-4.23 Error: pattern and expression in val dec don't agree [tycon 
mismatch]
  pattern:    'a array -> 'a
  expression:    'Z * 'Y -> 'Z
  in declaration:
    g : 'a array -> 'a = (fn x => fst x)

You can see that SML/NJ inferred a type for (fn x => fst x), and then
complained when the whole expression didn't have the ascribed type;
it's complaining that 'a array -> 'a is a different type than 
'z * 'y -> 'z. 

On the other hand, Ocaml uses the type annotation to constrain the
type of the function argument, which allows it to give a more precise
error message, right at the point of mis-usage:

  Characters 40-41:
    let g : ('a array -> 'a) = fun x -> fst x;;
                                            ^
  This expression has type 'a array but is here used with type 'b * 'c

Observe that it is complaining that x, an array, is being used at a
pair type. It is flowing the information from a type annotation into
the leaves of the AST, which is exactly what a bidirectional type
checking algorithm makes it easy to do.

-- 
Neel R. Krishnaswami
neelk@cs.cmu.edu
0
neelk (298)
2/14/2007 5:53:24 PM
On Feb 14, 9:25 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> Paul Rubin wrote:
> > Sounds like finer grained profiling is needed.  If the Lisp program is
> > spending 99% of its time in repeated tests and copies, there's some
> > algorithmic difference, or some horrible lossage in the SBCL
> > implementation (which in most regards is pretty good).  Usually a Lisp
> > program is not 100x slower than the corresponding C program despite
> > all the type tags and so forth.  So I'm a bit mystified.
>
> The difference is algorithmic. You can get rid of the repeated tests by
> restructuring the COND into many nested "if"s (a decision tree). OCaml does
> this for you. Lisp does not.

It is not quite that simple.  I had a go at optimizing the Lisp function
this morning; Jon was kind enough to send me his benchmarking code via
private email.  I've played around quite a bit and come up with several
different versions of the original function:

SIMPLIFY-ORIG: The original version posted.
SIMPLIFY-SPEED: Add (DECLARE (OPTIMIZE (SPEED 3))).
SIMPLIFY-INLINE-LIST: Use LIST instead of backquotes.  The compiler
  (SBCL) inlines allocation when LIST is used, but not when using
  backquotes.
SIMPLIFY-SMART-CONS: Avoid consing the original expression back up if
  we failed to simplify anything.
SIMPLIFY-NO-DBIND: Avoid the use of 'destructuring-bind' by manual
  inlining.  Use EQL instead of ZEROP and =.
SIMPLIFY-REARRANGED-TESTS: Move common checks up to be tested sooner.
SIMPLIFY-DECISION-TREE: Avoid checking '+?' and '*?' multiple times.
SIMPLIFY-VEC-EXPR: Input is a vector, rather than a list.

Each version includes all the optimizations of the previous versions.
I'm running SBCL 1.0.2.20 on a 1.66GHz Core Duo laptop with 2GB of RAM.
All times quoted below are in seconds and are the 'real time' reported
by TIME.  For reference, I've also noted the amount of consing each
version did, since I did do some optimizations to reduce consing, and
the percentage improvement relative to the previous version:

SIMPLIFY-ORIG:             7.25s, 430MB consed
SIMPLIFY-SPEED:            7.04s, 430MB consed (3%)
SIMPLIFY-INLINE-LIST:      5.96s, 430MB consed (15%)
SIMPLIFY-SMART-CONS:       4.93s, 48MB consed (17%)
SIMPLIFY-NO-DBIND:         1.78s, 48MB consed (64%)
SIMPLIFY-REARRANGED-TESTS: 1.64s, 44MB consed (8%)
SIMPLIFY-DECISION-TREE:    1.40s, 44MB consed (15%)
SIMPLIFY-VEC-EXPR:         0.46s, 21MB consed (67%)
  (full bounds checking done by the compiler)
SIMPLIFY-VEC-EXPR:         0.26 s, 21MB consed (43%)
  (bounds checking + declarations to eliminate nearly all of
   the bounds checks)
SIMPLIFY-VEC-EXPR:         0.23s, 21MB consed (12%)
  (bounds checking turned off entirely via compiler switches)

The big wins are inlining functions (DESTRUCTURING-BIND is very
expensive), whether by the compiler or by the user, and proper data
structure choices.  This should not come as a surprise to anybody here.
The algorithmic choices that Jon alluded to are actually a pretty small
percentage of the overall improvement.

I didn't test whether using DEFSTRUCT would have the same effect as
using vectors; I suspect that it would, as SBCL is very good about
optimizing DEFSTRUCT and accessors defined from it.  Using DEFSTRUCT
would also make the Lisp more closely approximate the O'Caml code.
It's likely that even without "algorithmic" improvements to the code, a
version using DEFSTRUCT/vectors would give the O'Caml code a run for
its money.

(Lest people scramble to shout that my best posted time is still an
order of magnitude away from Jon's best time, I will point out there's
a bug in the implementation Jon posted and timed earlier.  Fixing the
bug brings the O'Caml times in line with the fastest time I posted
above.)

I will say that everything prior to SIMPLIFY-DECISION-TREE is pretty
clean, pretty close to the original with minor modifications.
SIMPLIFY-DECISION-TREE is somewhat ugly, as it uses a TAGBODY and
explicit GOs.  SIMPLIFY-VEC-EXPR is a bit uglier, given that I didn't
abstract away the allocation of new expressions, so there's MAKE-ARRAYs
and SVREFs all over the place.  DEFSTRUCT would likely help alleviate
some of this.

-Nathan

0
froydnj (14)
2/14/2007 6:52:42 PM
Neelakantan Krishnaswami wrote:
> No, Ocaml does something very close to what I describe, which is *why*
> it gives adequate error messages.

I must have misunderstood your proposal.

> You can compare the difference by comparing the error messages that
> OCaml and SML/NJ give for the following erroneous function:
> 
>   # let g : ('a array -> 'a) = fun x -> fst x;;  (* Ocaml *)
> 
>   - val g : ('a array -> 'a) = fn x => fst x;    (* SML/NJ *)
> 
> Here's New Jersey's error message:
> 
> stdIn:1.1-4.23 Error: pattern and expression in val dec don't agree [tycon
> mismatch]
>   pattern:    'a array -> 'a
>   expression:    'Z * 'Y -> 'Z
>   in declaration:
>     g : 'a array -> 'a = (fn x => fst x)
> 
> You can see that SML/NJ inferred a type for (fn x => fst x), and then
> complained when the whole expression didn't have the ascribed type;
> it's complaining that 'a array -> 'a is a different type than
> 'z * 'y -> 'z.
> 
> On the other hand, Ocaml uses the type annotation to constrain the
> type of the function argument, which allows it give a more precise
> error message, right at the point of mis-usage:
> 
>   Characters 40-41:
>     let g : ('a array -> 'a) = fun x -> fst x;;
>                                             ^
>   This expression has type 'a array but is here used with type 'b * 'c
> 
> Observe that it is complaining that x, an array, is being used at a
> pair type. It is flowing the information from a type annotation into
> the leaves of the AST, which is exactly what a bidirectional type
> checking algorithm makes it easy to do.

Ok. I thought you were advocating type declarations instead of annotations,
so the compiler would object to an unequal type rather than a unifiable
type, i.e. giving a type "'a array" would fail if the type ended up as "int
array".

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/14/2007 8:42:34 PM
Thanks to Nathan for pointing out a major bug in the code that I sent him:
it was just traversing the expr and not simplifying it! There is also an
algorithmic discrepancy between my fixed OCaml and the Lisp, which gives
another performance boost to the OCaml.

So, I believe equivalent OCaml can be written most elegantly in terms of
simplifying constructors:

let rec ( +: ) f g = match f, g with
  | Q n, Q m -> Q (n +/ m)
  | Q (Int 0), e | e, Q (Int 0) -> e
  | f, Add(g, h) -> f +: g +: h
  | f, g -> Add(f, g)

let rec ( *: ) f g = match f, g with
  | Q n, Q m -> Q (n */ m)
  | Q (Int 0), e | e, Q (Int 0) -> Q (Int 0)
  | Q (Int 1), e | e, Q (Int 1) -> e
  | f, Mul(g, h) -> f *: g *: h
  | f, g -> Mul(f, g)

let rec simplify = function
  | Q _ | Var _ as e -> e
  | Add(f, g) -> simplify f +: simplify g
  | Mul(f, g) -> simplify f *: simplify g

On my machine (2.2GHz AMD64 running Debian x86_64), that takes 0.200s to
simplify a 12Mb expression, best of 3.
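
For reference, the variant type this snippet pattern-matches over isn't
shown in the post; a plausible reconstruction (assuming the rational
constants come from OCaml's standard Num library, which supplies the Int
constructor and the +/ and */ operators used above) would be:

open Num

(* Q carries an exact rational constant, Var a symbolic variable. *)
type expr =
  | Q of num
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

(* Example: simplify ((0 + x) * 1) reduces to Var "x". *)
let e = simplify (Mul (Add (Q (Int 0), Var "x"), Q (Int 1)))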

froydnj@gmail.com wrote:
> SIMPLIFY-ORIG:             7.25s, 430MB consed
> SIMPLIFY-SPEED:            7.04s, 430MB consed (3%)
> SIMPLIFY-INLINE-LIST:      5.96s, 430MB consed (15%)
> SIMPLIFY-SMART-CONS:       4.93s, 48MB consed (17%)
> SIMPLIFY-NO-DBIND:         1.78s, 48MB consed (64%)
> SIMPLIFY-REARRANGED-TESTS: 1.64s, 44MB consed (8%)
> SIMPLIFY-DECISION-TREE:    1.40s, 44MB consed (15%)
> SIMPLIFY-VEC-EXPR:         0.46s, 21MB consed (67%)
>   (full bounds checking done by the compiler)
> SIMPLIFY-VEC-EXPR:         0.26 s, 21MB consed (43%)
>   (bounds checking + declarations to eliminate nearly all of
>    the bounds checks)
> SIMPLIFY-VEC-EXPR:         0.23s, 21MB consed (12%)
>   (bounds checking turned off entirely via compiler switches)

The last three seem to require the expression to be transformed into a
different (vector) representation. I haven't got the code to do this but I
can present timings for the others (best of 3):

8.807s
6.651s
4.216s
4.038s
1.589s
1.541s
1.337s (84Mb consed)

In all cases, the expression was simplified down to the same 635kB as the
OCaml.

So, while Nathan's optimisations have yielded a 6.6x improvement in Lisp's
performance, it remains 6.2x slower than the OCaml and is now much longer.
For comparison, here is the fastest Lisp function (that doesn't use
vectors):

(defun simplify-no-redundant-checks (xexpr)
  (declare (optimize (speed 3)))
  (if (atom xexpr)
      xexpr
      (let ((op (first xexpr))
            (z (second xexpr))
            (y (third xexpr)))
        (let* ((f (simplify-no-redundant-checks z))
               (g (simplify-no-redundant-checks y))
               (nf (numberp f))
               (ng (numberp g)))
          (tagbody
           START
             (if (eq '+ op) (go OPTIMIZE-PLUS) (go TEST-MULTIPLY))
           OPTIMIZE-PLUS
             (when (and nf ng) (return-from simplify-no-redundant-checks (+ f g)))
           TEST-PLUS-ZEROS
             (when (eql f 0) (return-from simplify-no-redundant-checks g))
             (when (eql g 0) (return-from simplify-no-redundant-checks f))
             (go REARRANGE-EXPR)
           TEST-MULTIPLY
             (unless (eq '* op) (go REARRANGE-EXPR))
           OPTIMIZE-MULTIPLY
             (when (and nf ng) (return-from simplify-no-redundant-checks (* f g)))
           TEST-MULTIPLY-ZEROS-AND-ONES
             (when (or (eql f 0) (eql g 0)) (return-from simplify-no-redundant-checks 0))
             (when (eql f 1) (return-from simplify-no-redundant-checks g))
             (when (eql g 1) (return-from simplify-no-redundant-checks f))
           REARRANGE-EXPR
             (when (and (listp g) (eq op (first g)))
               (let ((op2 (first g))
                     (u (second g))
                     (v (third g)))
                 (declare (ignore op2))
                 (return-from simplify-no-redundant-checks
                   (simplify-no-redundant-checks (list op (list op f u) v)))))
           MAYBE-CONS-EXPR
             (if (and (eq f z) (eq g y))
                 (return-from simplify-no-redundant-checks xexpr)
                 (return-from simplify-no-redundant-checks (list op f g))))))))

I think these results speak for themselves.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/14/2007 11:15:14 PM
Jon Harrop <jon@ffconsultancy.com> writes:

> Wolfram Fenske wrote:
>>> No, it's usually the opposite: in a language with mutation, you're
>>> either forced to use all data in a single-threaded fashion (which
>>> places quite tight constraints on the design), or you have to copy
>>> entire data structures "just to make sure it doesn't get
>>> overwritten".
>>
>> This sounded like you were saying that you have to copy *less* data if
>> you're using immutable data structures.  I really doubt that's
>> generally true.  Sure, in cases like the one above, it saves you
>> unnecessary copying but in other cases you'll have to copy more (or at
>> least allocate more memory, which is an expensive operation) because
>> you can't mutate.  All in all I'm pretty sure this'll amount to *more*
>> copying.  The fact that programs in functional languages tend to be
>> more allocation intensive than ones in non-functional languages is a
>> hint in that direction.
>
> Allocation is not copying.

High allocation rates come with the side-effect-free style,
i.e. with using immutable data structures/treating data structures as
immutable.  Take, for example, `reverse' vs. `nreverse' in Common
Lisp.  `reverse' allocates a new list as its result.  `nreverse',
OTOH, reverses a list in place, which is more efficient.  But usually
you'll use `reverse' and not `nreverse' because (part of) the original
list might still be used somewhere else.  What did we do just there?
We copied data "just to make sure we don't overwrite it."

> You can't draw a meaningful comparison about the rest without being
> more precise, e.g. maps are often more memory efficient than hash
> tables because hash table must be kept largely empty in order to be
> efficient.

(Not if you use a good hash function, e. g. this one:
<http://www.burtleburtle.net/bob/hash/evahash.html>.)  I see your
point but I don't think this is the issue here (unless, of course, you
were saying that immutable data structures inherently need more
memory).

>>> Immutability can give massive performance advantages.
>>
>> I'm sure it *can*.  The question is if it *usually* does.  I don't
>> think so.
>
> That depends entirely upon what you're doing. If you're doing a lot
> of set-theoretic operations (e.g. graph theory) then immutable data
> structures are highly likely to be easier to use and, consequently,
> faster. If you're doing an FFT then you want a mutable array.

I agree.

--
Wolfram Fenske

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

0
int2k (134)
2/14/2007 11:38:16 PM
Jon Harrop <jon@ffconsultancy.com> writes:

> Wolfram Fenske wrote:
>> Jon Harrop <jon@ffconsultancy.com> writes:
>> But then you introduce another level of indirection--which won't make
>> you program faster, either.
>
> A pointer dereference is much cheaper than taking out a lock or
> copying an entire data structure.

Jumping all over memory will result in poor cache performance.

Also, I'm still not sure that just referring back to the original data
structure is all that's needed.  I can easily imagine that with linked
lists, where you just add a cell in front of some other part of the
list.  But the original example were structures passed into the
Windows API.  These are usually flat records with, I don't know, 10
fields or so.  How would sharing of structure work there?

>>> In practice, using immutable data structures frees you from some
>>> low-level concerns and gives you more time to perform high-level
>>> optimisations.
>>
>> True, but that's a slightly different issue.
>
> If you're allowing infinite time to optimise then machine code is
> the "fastest" language. In practice, the time taken to optimise is
> relevant.

Yes, the Turing-complete argument.  I didn't use it.

[...]

>> If a language enables you to do more high-level optimisations
>> because you don't have to concern yourself with the low-level stuff
>> then that's because the language is more high-level.
>
> Yes.
>
>> It doesn't have to do with being functional or not.
>
> How many low level functional languages do you know?

You misunderstood: "Functional" as in "free of side-effects", not as
in "higher order functions."  After all, this is about immutable data
structures.

PS: I noticed your latest F# News posting, "Avoiding copying in
functional programming languages".  I wonder where that came from. ;-)

--
Wolfram Fenske

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

0
int2k (134)
2/15/2007 12:20:38 AM
Jon Harrop wrote:
> Thanks to Nathan for pointing out a major bug in the code that I sent him:
> it was just traversing the expr and not simplifying it! There is also an
> algorithmic discrepancy between my fixed OCaml and the Lisp, which gives
> another performance boost to the OCaml.
> 
> So, I believe equivalent OCaml can be written most elegantly in terms of
> simplifying constructors:
> 
> let rec ( +: ) f g = match f, g with
>   | Q n, Q m -> Q (n +/ m)
>   | Q (Int 0), e | e, Q (Int 0) -> e
>   | f, Add(g, h) -> f +: g +: h
>   | f, g -> Add(f, g)
> 
> let rec ( *: ) f g = match f, g with
>   | Q n, Q m -> Q (n */ m)
>   | Q (Int 0), e | e, Q (Int 0) -> Q (Int 0)
>   | Q (Int 1), e | e, Q (Int 1) -> e
>   | f, Mul(g, h) -> f *: g *: h
>   | f, g -> Mul(f, g)
> 
> let rec simplify = function
>   | Q _ | Var _ as e -> e
>   | Add(f, g) -> simplify f +: simplify g
>   | Mul(f, g) -> simplify f *: simplify g

A more idiomatic version in Common Lisp is this:

(defstruct add x y)
(defstruct mul x y)

(defgeneric simplify-exp (op arg1 arg2)
   (:method ((exp add) (x number) (y number)) (+ x y))
   (:method ((exp add) (x (eql 0)) y) y)
   (:method ((exp add) x (y (eql 0))) x)
   (:method ((exp add) x (y add))
    (simplify-exp exp (make-add :x x :y (add-x y)) (add-y y)))
   (:method ((exp add) x y) exp)

   (:method ((exp mul) (x number) (y number)) (* x y))
   (:method ((exp mul) (x (eql 0)) y) 0)
   (:method ((exp mul) x (y (eql 0))) 0)
   (:method ((exp mul) (x (eql 1)) y) y)
   (:method ((exp mul) x (y (eql 1))) x)
   (:method ((exp mul) x (y mul))
    (simplify-exp exp (make-mul :x x :y (mul-x y)) (mul-y y)))
   (:method ((exp mul) x y) exp))

(defgeneric simplify (exp)
   (:method (exp) exp)
   (:method ((exp add))
    (simplify-exp exp (simplify (add-x exp)) (simplify (add-y exp))))
   (:method ((exp mul))
    (simplify-exp exp (simplify (mul-x exp)) (simplify (mul-y exp)))))


Since I am not interested in microbenchmarks, I haven't measured the 
runtimes. Can your version easily be extended at runtime?


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/15/2007 12:44:23 AM
Wolfram Fenske wrote:
> High allocation reates come with the side-effect free style,
> i. e. with using immutable data structures/treating data structures as
> immutable.  Take, for example, `reverse' vs. `nreverse' in Common
> Lisp.  `reverse' allocates a new list as its result.  `nreverse',
> OTOH, reverses a list in place, which is more efficient.  But usually
> you'll use `reverse' and not `nreverse' because (part of) the original
> list might still be used somewhere else.  What did we do just there?
> We copied data "just to make sure we don't overwrite it."

There is little point in comparing the performance of functions that do
different things. You're much better off comparing the performance of
programs that do the same thing but by different means. In this case, using
nreverse may force you to use a copy elsewhere.

>> You can't draw a meaningful comparison about the rest without being
>> more precise, e.g. maps are often more memory efficient than hash
>> tables because hash table must be kept largely empty in order to be
>> efficient.
> 
> (Not if you use a good hash function, e. g. this one:
> <http://www.burtleburtle.net/bob/hash/evahash.html>.)

Whatever hash function you use, you want your hash table to be >50% empty.

> I see your 
> point but I don't think this is the issue here (unless, of course, you
> were saying that immutable data structures inherently need more
> memory).

Immutable data structures contain lots of pointers as well as data. They
consume memory. Whether or not they consume more memory depends which
mutable data structure you're comparing them to.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 12:44:55 AM
Wolfram Fenske wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> Wolfram Fenske wrote:
>>> Jon Harrop <jon@ffconsultancy.com> writes:
>>> But then you introduce another level of indirection--which won't make
>>> you program faster, either.
>>
>> A pointer dereference is much cheaper than taking out a lock or
>> copying an entire data structure.
> 
> Jumping all over memory will result in poor cache performance.

Yet hash tables are fast.

> Also, I'm still not sure that just referring back to the original data
> structure is all that's needed.  I can easily imagine that with linked
> lists, where you just add a cell in front of some other part of the
> list.  But the original example were structures passed into the
> Windows API.  These are usually flat records with, I don't know, 10
> fields or so.  How would sharing of structure work there?

I don't know anything about the Windows API.

>>>> In practice, using immutable data structures frees you from some
>>>> low-level concerns and gives you more time to perform high-level
>>>> optimisations.
>>>
>>> True, but that's a slightly different issue.
>>
>> If you're allowing infinite time to optimise then machine code is
>> the "fastest" language. In practice, the time taken to optimise is
>> relevant.
> 
> Yes, the Turing-complete argument.  I didn't use it.

That isn't really the Turing argument (which is to do with computability,
rather than performance) but I thought you were tending in that direction
by disregarding the relevance of higher-level programming.

My point is, if you compare immutable data structures with mutable data
structures on an individual level then the mutable data structures will
always appear to be faster. However, that does not carry through to the
optimisation of whole programs because mutable data structures can be much
more difficult to use.

To see this, just try to implement set union efficiently, or any algorithm
that uses set-theoretic operations.
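
For instance, with OCaml's standard persistent Set module (a minimal
sketch; the of_list helper is mine):

module IS = Set.Make (struct type t = int let compare = compare end)

let of_list l = List.fold_left (fun s x -> IS.add x s) IS.empty l

let a = of_list [1; 2; 3]
let b = of_list [3; 4; 5]

(* union shares subtrees of a and b: no defensive copying, and both
   inputs remain valid and unchanged afterwards. *)
let u = IS.union a b
let () = Printf.printf "%d\n" (IS.cardinal u)   (* prints 5 *)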

>>> If a language enables you to do more high-level optimisations
>>> because you don't have to concern yourself with the low-level stuff
>>> then that's because the language is more high-level.
>>
>> Yes.
>>
>>> It doesn't have to do with being functional or not.
>>
>> How many low level functional languages do you know?
> 
> You misunderstood: "Functional" as in "free of side-effects"

That's "pure" not "functional". Most functional languages (Lisp, Scheme,
SML, OCaml, F#) are impure and allow side-effects. However, many provide
immutable data structures that are much easier to use efficiently than the
data structures provided with C++, Java and so on.

> PS: I noticed your latest F# News posting, "Avoiding copying in
> functional programming languages".  I wonder where that came from. ;-)

That came from a thread on the OCaml beginners list.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 12:56:19 AM
Pascal Costanza wrote:
> Since I am not interested in microbenchmarks, I haven't measured the
> runtimes.

Can you give some examples of simplifying symbolic expressions using this
code? I can't get it to work.

> Can your version easily be extended at runtime? 

No, it is completely inextensible. If you were implementing a CAS then you
would start with an inextensible but fast core and extend it to support
run-time generated replacement rules.
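
Sketched concretely (illustrative only; it assumes an expr type like the
one in the simplifier thread, and all the names here are made up):

(* A run-time rule inspects an expr and returns Some rewritten
   expression, or None if it doesn't apply. *)
let user_rules : (expr -> expr option) list ref = ref []

let add_rule r = user_rules := r :: !user_rules

(* Apply user rules to a fixed point; callers would run the fast,
   inextensible core first and this pass afterwards. *)
let rec apply_user_rules e =
  let step =
    List.fold_left
      (fun acc rule -> match acc with Some _ -> acc | None -> rule e)
      None !user_rules
  in
  match step with
  | Some e' -> apply_user_rules e'
  | None -> e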

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 1:04:11 AM
Jon Harrop <jon@ffconsultancy.com> writes:
> Whatever hash function you use, you want your hash table to be >50% empty.

You haven't been reading your Knuth vol. 3 ;-).
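
(From memory, Knuth's analysis of linear probing gives about
(1/2)(1 + 1/(1 - alpha)) probes for a successful search and
(1/2)(1 + 1/(1 - alpha)^2) for an unsuccessful one at load factor
alpha -- roughly 2.5 and 8.5 probes at 75% full -- and separate
chaining does even better, so ">50% empty" is far from necessary.)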
0
phr.cx (5493)
2/15/2007 1:23:29 AM
On Feb 14, 6:15 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> The last three seem to require the expression to be transformed into a
> different (vector) representation. I haven't got the code to do this but I
> can present timings for the others (best of 3):

The code to convert the representation to vectors is simple:

(defun vectorize (e)
  (if (atom e)
      e
      (map 'vector #'vectorize e)))


> So, while Nathan's optimisations have yielded a 6.6x improvement in Lisp's
> performance, it remains 6.2x slower than the OCaml and is now much longer.

I would like to point out here that the OCaml and Lisp codes that you
are comparing use fundamentally different representations.  Using vectors
more closely approximates the OCaml code.  See also below for more
improvements.

> For comparison, here is the fastest Lisp function (that doesn't use
> vectors):
>
> [...ugly Lisp code snipped...]
>
> I think these results speak for themselves.

You're right, that's not particularly elegant.  This version is much
better:

(defun simplify-common-funcs (xexpr)
  (declare (optimize (speed 3)))
  (if (atom xexpr)
      xexpr
      (let ((op (first xexpr))
            (z (second xexpr))
            (y (third xexpr)))
        (labels ((make-expr (op f g)
                   (list op f g))
                 (optimize-plus (f nf g ng)
                   (cond
                     ((and nf ng) (+ f g))
                     ((eql f 0) g)
                     ((eql g 0) f)
                     (t (optimize-general '+ f g))))
                 (optimize-multiply (f nf g ng)
                   (cond
                     ((and nf ng) (* f g))
                     ((or (eql f 0) (eql g 0)) 0)
                     ((eql f 1) g)
                     ((eql g 1) f)
                     (t (optimize-general '* f g))))
                 (optimize-general (op f g)
                   (cond
                     ((and (listp g) (eq op (first g)))
                      (let ((op2 (first g))
                            (u (second g))
                            (v (third g)))
                        (declare (ignore op2))
                        (make-expr op (make-expr op f u) v)))
                     (t (if (and (eq f z) (eq g y))
                            xexpr
                            (make-expr op f g))))))
          (declare (inline make-expr))
          (let* ((f (simplify-common-funcs z))
                 (g (simplify-common-funcs y))
                 (nf (numberp f))
                 (ng (numberp g)))
            (cond
              ((eq op '+) (optimize-plus f nf g ng))
              ((eq op '*) (optimize-multiply f nf g ng))
              (t (optimize-general op f g))))))))

It also has the advantage of being significantly faster than any
of the other list-based versions benchmarked above.  For example,
your latest OCaml code on three randomly generated one-million-term
expressions (1.66 GHz Core Duo, 2GB RAM) yields the following times:

0.19 seconds
0.16 seconds
2.25 seconds

Yes, I know the last one is a big outlier.  But it was consistent
from run to run.

Now the Lisp function above on the same three expressions:

0.28 seconds
0.26 seconds
0.33 seconds

So now the Lisp code is under a factor of 2 slower.  And
if we use vectors to make the comparison more equitable:

(defun simplify-common-funcs-vec (xexpr)
  (declare (optimize (speed 3)))
  (if (not (typep xexpr 'simple-vector))
      xexpr
      (locally (declare (type (simple-array t (3)) xexpr))
        (let ((op (svref xexpr 0))
              (z (svref xexpr 1))
              (y (svref xexpr 2)))
          (labels ((make-expr (op f g)
                     (let ((vec (make-array 3)))
                       (setf (svref vec 0) op
                             (svref vec 1) f
                             (svref vec 2) g)))
                   (optimize-plus (f nf g ng)
                     (cond
                       ((and nf ng) (+ f g))
                       ((eql f 0) g)
                       ((eql g 0) f)
                       (t (optimize-general '+ f g))))
                   (optimize-multiply (f nf g ng)
                     (cond
                       ((and nf ng) (* f g))
                       ((or (eql f 0) (eql g 0)) 0)
                       ((eql f 1) g)
                       ((eql g 1) f)
                       (t (optimize-general '* f g))))
                   (optimize-general (op f g)
                     (cond
                       ((and (typep g 'simple-vector)
                             (eq op (svref g 0)))
                        (let ((op2 (svref g 0))
                              (u (svref g 1))
                              (v (svref g 2)))
                          (declare (ignore op2))
                          (make-expr op (make-expr op f u) v)))
                       (t (if (and (eq f z) (eq g y))
                              xexpr
                              (make-expr op f g))))))
            (declare (inline make-expr))
            (let* ((f (simplify-common-funcs z))
                   (g (simplify-common-funcs y))
                   (nf (numberp f))
                   (ng (numberp g)))
              (cond
                ((eq op '+) (optimize-plus f nf g ng))
                ((eq op '*) (optimize-multiply f nf g ng))
                (t (optimize-general op f g)))))))))

The Lisp version runs in less than a tenth of a second.

So maybe if I want shorter code, I'll write OCaml, but if I
want speed, I'll use Lisp. ;)

-Nathan

0
froydnj (14)
2/15/2007 1:35:27 AM
Nathan Froyd wrote:
>> So, while Nathan's optimisations have yielded a 6.6x improvement in
>> Lisp's performance, it remains 6.2x slower than the OCaml and is now much
>> longer.
> 
> I would like to point out here that the OCaml and Lisp codes that you
> are comparing use fundamentally different representations.

You mean the Lisp has Θ(1+n) access to the "n"th argument of + or * because
it is a list whereas OCaml has Θ(1) access?

That is true, but there are only 2 arguments and the OCaml has unnecessary
indirections not present in the Lisp, e.g. '(+ 1 2) is
Add(Q(Int 1), Q(Int 2)).

> Using vectors more closely approximates the OCaml code.

Maybe.

> It also has the advantage of being significantly faster than any
> of the other list-based versions benchmarked above.

Unfortunately it has the disadvantage that it produces different results. I
think it is the same bug I had: it isn't recursing when it should to
rewrite already rewritten terms. This is most easily fixed by factoring the
recursion into a separate "rewrite" function.

I just tested your vectorized Lisp and it is also wrong. However, it managed
to reduce this function of "x":

  #(+ #(+ 0 #(+ #(+ 1 #(+ X #(+ 2 #(+ #(+ 0 X) 1)))) 2)) #(+ 1 #(* 0 #(+ X 2))))

down to "7", which is impressively wrong. ;-)

> So maybe if I want shorter code, I'll write OCaml, but if I
> want speed, I'll use Lisp. ;)

Keep trying. Meanwhile, I'll start optimising the OCaml...

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 2:24:41 AM
On Feb 15, 2:35 am, "Nathan Froyd" <froy...@gmail.com> wrote:
> [... full quote of Nathan's list and vector SIMPLIFY versions and
> timings snipped ...]
>
> So maybe if I want shorter code, I'll write OCaml, but if I
> want speed, I'll use Lisp. ;)
>
> -Nathan

Nice stuff.

Or if you want even nicer and more domain-level code, you can use
Lisp, too:

(setf *simplification-rules* (mapcar #'infix->prefix '(
  (x + 0  = x)
  (0 + x  = x)
  (x + x  = 2 * x)
  (x - 0  = x)
  (0 - x  = - x)
  (x - x  = 0)
  (- - x  = x)
  (x * 1  = x)
  (1 * x  = x)
  (x * 0  = 0)
  (0 * x  = 0)
  (x * x  = x ^ 2)
  (x / 0  = undefined)
  (0 / x  = 0)
  (x / 1  = x)
  (x / x  = 1)
  (0 ^ 0  = undefined)
  (x ^ 0  = 1)
  (0 ^ x  = 0)
  (1 ^ x  = 1)
  (x ^ 1  = x)
  (x ^ -1 = 1 / x)
  (x * (y / x) = y)
  ((y / x) * x = y)
  ((y * x) / x = y)
  ((x * y) / x = y)
  (x + - x = 0)
  ((- x) + x = 0)
  (x + y - x = y)
  )))

(setf *simplification-rules*
 (append *simplification-rules* (mapcar #'simp-rule
  '((s * n = n * s)
    (n * (m * x) = (n * m) * x)
    (x * (n * y) = n * (x * y))
    ((n * x) * y = n * (x * y))
    (n + s = s + n)
    ((x + m) + n = x + n + m)
    (x + (y + n) = (x + y) + n)
    ((x + n) + y = (x + y) + n)))))

(setf *simplification-rules*
 (append *simplification-rules* (mapcar #'simp-rule '(
  (log 1         = 0)
  (log 0         = undefined)
  (log e         = 1)
  (sin 0         = 0)
  (sin pi        = 0)
  (cos 0         = 1)
  (cos pi        = -1)
  (sin(pi / 2)   = 1)
  (cos(pi / 2)   = 0)
  (log (e ^ x)   = x)
  (e ^ (log x)   = x)
  ((x ^ y) * (x ^ z) = x ^ (y + z))
  ((x ^ y) / (x ^ z) = x ^ (y - z))
  (log x + log y = log(x * y))
  (log x - log y = log(x / y))
  ((sin x) ^ 2 + (cos x) ^ 2 = 1)
  ))))


(setf *simplification-rules*
 (append *simplification-rules* (mapcar #'simp-rule '(
  (d x / d x       = 1)
  (d (u + v) / d x = (d u / d x) + (d v / d x))
  (d (u - v) / d x = (d u / d x) - (d v / d x))
  (d (- u) / d x   = - (d u / d x))
  (d (u * v) / d x = u * (d v / d x) + v * (d u / d x))
  (d (u / v) / d x = (u * (d v / d x) - v * (d u / d x))
                     / v ^ 2)
  (d (u ^ n) / d x = n * u ^ (n - 1) * (d u / d x))
  (d (u ^ v) / d x = v * u ^ (v - 1) * (d u / d x)
                   + u ^ v * (log u) * (d v / d x))
  (d (log u) / d x = (d u / d x) / u)
  (d (sin u) / d x = (cos u) * (d u / d x))
  (d (cos u) / d x = - (sin u) * (d u / d x))
  (d (exp u) / d x = (exp u) * (d u / d x))
  (d u / d x       = 0)))))


(integration-table
  '((Int log(x) d x = x * log(x) - x)
    (Int exp(x) d x = exp(x))
    (Int sin(x) d x = - cos(x))
    (Int cos(x) d x = sin(x))
    (Int tan(x) d x = - log(cos(x)))
    (Int sinh(x) d x = cosh(x))
    (Int cosh(x) d x = sinh(x))
    (Int tanh(x) d x = log(cosh(x)))
    ))

These are the rules that Norvig used in his book "Paradigms of
Artificial Intelligence Programming". One could also read there about
optimizing the code and implementing it as an interpreter and a rule
compiler (Chapters 9 and 10).

http://norvig.com/paip/README.html
http://norvig.com/paip/macsymar.lisp

0
joswig (506)
2/15/2007 2:41:32 AM
On Feb 15, 2:04 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> Pascal Costanza wrote:
> > Since I am not interested in microbenchmarks, I haven't measured the
> > runtimes.
>
> Can you give some examples of simplifying symbolic expressions using this
> code? I can't get it to work.

? (simplify #S(ADD :X 3 :Y #S(mul :x 3 :y 4)))
15

>
> > Can your version easily be extended at runtime?
>
> No, it is completely inextensible. If you were implementing a CAS then you
> would start with an inextensible but fast core and extend it to support
> run-time generated replacement rules.
>
> --
> Dr Jon D Harrop, Flying Frog Consultancy
> OCaml for Scientists http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet


0
joswig (506)
2/15/2007 2:46:51 AM
Jon Harrop <jon@ffconsultancy.com> writes:

> Wolfram Fenske wrote:

[...]

>> immutable.  Take, for example, `reverse' vs. `nreverse' in Common
>> Lisp.  `reverse' allocates a new list as its result.  `nreverse',
>> OTOH, reverses a list in place, which is more efficient.  But usually
>> you'll use `reverse' and not `nreverse' because (part of) the original
>> list might still be used somewhere else.  What did we do just there?
>> We copied data "just to make sure we don't overwrite it."
>
> There is little point in comparing the performance of functions that
> do different things. You're much better off comparing the
> performance of programs that do the same thing but by different
> means. In this case, using nreverse may force you to use a copy
> elsewhere.

But that's all I'm saying:

  * data is mutable   -> you copy so nobody can modify your version
  * data is immutable -> you copy so as not to mess up someone else's
    version

>>> You can't draw a meaningful comparison about the rest without being
>>> more precise, e.g. maps are often more memory efficient than hash
>>> tables because hash table must be kept largely empty in order to be
>>> efficient.
>>
>> (Not if you use a good hash function, e. g. this one:
>> <http://www.burtleburtle.net/bob/hash/evahash.html>.)
>
> Whatever hash function you use, you want your hash table to be >50%
> empty.

Up to 75% full is just fine.  Above that, performance starts to
worsen, but not dramatically [1].  Anyway, this is off topic.


Footnotes:
[1]  If your implementation doesn't suck.

--
Wolfram Fenske

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

0
int2k (134)
2/15/2007 3:22:04 AM
Dirk Thierbach wrote:
>Joshua Dunfield <joshuad@cs.cmu.edu> wrote:
>> The story of refinement types is instructive: despite inference being
>> possible, it's undesirable.  
>
>I don't think so. Inference is nice, because it saves you work (why
>write down the type yourself when you can just ask the
>compiler/interpreter, and then copy and paste), you can use it to
>discover that maybe your function is more general than you thought,

With plain HM, that's plausible, but when your types start to express
more interesting properties, I think there's a real benefit to being
forced to write them down yourself (before you finish writing the
function, at least, if not before you start writing it).

[...]
>> For example, if you have a function `f' that's only supposed to take
>> nonempty lists (nonemptiness being an invariant easily captured by
>> refinement types),
>
>That's maybe not a particularly good example, because in practice you
>wouldn't encode such information in a type if it would involve lots
>of more work when writing out types (which again means that I would
>only use it in conjunction with type inference). Just not worth the
>effort, unless you're really paranoid about correctness (which 
>would justify the extra work).

Experience with refinement types so far suggests that it's not lots
more work; in Davies' SML-CIDRE*, for example, you need to annotate
functions and not much else.

As for paranoia about correctness, it really depends on the specifics
of the code.  If there can't be a formal specification (and I don't mean
that you just haven't bothered, I mean there *can't* be one because you're
doing GUI prototyping or something and any specification would change
every 5 minutes anyway), refined types won't do any good.  If there's
a well-known specification and the code will be used by thousands of
people, the situation is quite different.  And that's exactly the
situation with data structure libraries.

Those are the extremes, of course, and the tradeoffs in-between are
harder to judge.

* http://www.cs.cmu.edu/~rowan/sorts.html

-j.
0
joshuad (9)
2/15/2007 3:35:47 AM
Wolfram Fenske wrote:
> But that's all I'm saying:
> 
>   * data is mutable   -> you copy so nobody can modify your version

You copy the whole data structure.

>   * data is immutable -> you copy so as not to mess up someone else's
>     version

The implementation copies only the parts that have changed and refers back
to the rest of the original.

So you are using "copy" to refer to two very different operations.
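
To make the distinction concrete, here is a minimal OCaml sketch (the
record type is invented for illustration):

(* Updating one field of an immutable record allocates a new record a
   few words long; every unchanged field is shared with the old value. *)
type config = { name : string; retries : int; hosts : string list }

let c1 = { name = "db"; retries = 3; hosts = ["a"; "b"; "c"] }
let c2 = { c1 with retries = 4 }

(* Physical equality: the hosts list was not copied. *)
let () = assert (c2.hosts == c1.hosts)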

>> Whatever hash function you use, you want your hash table to be >50%
>> empty.
> 
> Up to 75% full is just fine.  Above that, performance starts to
> worsen, but not dramatically [1].  Anyway, this is off topic.

I think we need to compare the actual memory used by some real
implementations for this to be useful.

I just did a test with OCaml and, for 10^6 float->float bindings, the map
uses 3x more memory than the hash table.
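
One way such a measurement can be reproduced (a sketch, not the exact
harness used here) is to compare live heap words via the Gc module:

module FM = Map.Make (struct type t = float let compare = compare end)

let live_words () = Gc.compact (); (Gc.stat ()).Gc.live_words

let () =
  let n = 1_000_000 in
  let base = live_words () in
  let rec build i m =
    if i > n then m
    else build (i + 1) (FM.add (float_of_int i) (float_of_int i) m) in
  let m = build 1 FM.empty in
  Printf.printf "map:  %d words\n" (live_words () - base);
  let base = live_words () in
  let h = Hashtbl.create n in
  for i = 1 to n do Hashtbl.add h (float_of_int i) (float_of_int i) done;
  Printf.printf "hash: %d words\n" (live_words () - base);
  (* keep both structures live until after the measurements *)
  ignore (FM.cardinal m); ignore (Hashtbl.length h)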

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 3:41:12 AM
Jon Harrop <jon@ffconsultancy.com> writes:

> Wolfram Fenske wrote:
>> Jon Harrop <jon@ffconsultancy.com> writes:
>>> Wolfram Fenske wrote:
>>>> Jon Harrop <jon@ffconsultancy.com> writes:

[...]

>>> If you're allowing infinite time to optimise then machine code is
>>> the "fastest" language. In practice, the time taken to optimise is
>>> relevant.
>>
>> Yes, the Turing-complete argument.  I didn't use it.
>
> That isn't really the Turing argument (which is to do with computability,
> rather than performance) but I thought you were tending in that direction
> by disregarding the relevance of higher-level programming.

By no means!

> My point is, if you compare immutable data structures with mutable
> data structures on an individual level then the mutable data
> structures will always appear to be faster. However, that does not
> carry through to the optimisation of whole programs because mutable
> data structures can be much more difficult to use.

I agree.

[...]

>>>> If a language enables you to do more high-level optimisations
>>>> because you don't have to concern yourself with the low-level stuff
>>>> then that's because the language is more high-level.
>>>
>>> Yes.
>>>
>>>> It doesn't have to do with being functional or not.
>>>
>>> How many low level functional languages do you know?
>>
>> You misunderstood: "Functional" as in "free of side-effects"
>
> That's "pure" not "functional".

My bad.

> Most functional languages (Lisp, Scheme, SML, OCaml, F#) are impure
> and allow side-effects.

Yes.

[...]

>> PS: I noticed your latest F# News posting, "Avoiding copying in
>> functional programming languages".  I wonder where that came from. ;-)
>
> That came from a thread on the OCaml beginners list.

I could have sworn ... :-)

--
Wolfram Fenske

A: Yes.
>Q: Are you sure?
>>A: Because it reverses the logical flow of conversation.
>>>Q: Why is top posting frowned upon?

0
int2k (134)
2/15/2007 3:46:35 AM
Jon Harrop wrote:
>Joshua Dunfield wrote:
>>[...]For example, if you have a function `f'
>> that's only supposed to take nonempty lists (nonemptiness being an
>> invariant easily captured by refinement types), you don't want the
>> type system to allow you to apply `f' to empty lists just because
>> `f's behavior on empty lists happens to be well-defined;
[...]
>
>Your original point is perfectly valid but your example is not. You can
>leverage the static type system to ensure that "f" only gets lists with at
>least one element by using the type 'a * 'a list.

Yes, that's true, but of course that "leveraging" doesn't scale.  It only
works because nonempty lists only have one constructor, cons, so you can
strip off the cons and use cons' argument.
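
Concretely, the 'a * 'a list encoding looks like this (a sketch; the
helper names are made up):

(* A nonempty list is its head paired with the possibly-empty rest. *)
type 'a nonempty = 'a * 'a list

let head ((x, _) : 'a nonempty) = x             (* total: no empty case *)
let to_list ((x, xs) : 'a nonempty) = x :: xs

let example : int nonempty = (1, [2; 3])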

-j.
0
joshuad (9)
2/15/2007 3:47:57 AM
Joshua Dunfield wrote:
> Yes, that's true, but of course that "leveraging" doesn't scale.  It only
> works because nonempty lists only have one constructor, cons, so you can
> strip off the cons and use cons' argument.

Then can we have a better counter example? :-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 3:59:40 AM
Pascal Costanza wrote:
> (defstruct add x y)
> (defstruct mul x y)
> 
> (defgeneric simplify-exp (op arg1 arg2)
>    (:method ((exp add) (x number) (y number)) (+ x y))
>    (:method ((exp add) (x (eql 0)) y) y)
>    (:method ((exp add) x (y (eql 0))) x)
>    (:method ((exp add) x (y add))
>     (simplify-exp exp (make-add :x x :y (add-x y)) (add-y y)))
>    (:method ((exp add) x y) exp)
> 
>    (:method ((exp mul) (x number) (y number)) (* x y))
>    (:method ((exp mul) (x (eql 0)) y) 0)
>    (:method ((exp mul) x (y (eql 0))) 0)
>    (:method ((exp mul) (x (eql 1)) y) y)
>    (:method ((exp mul) x (y (eql 1))) x)
>    (:method ((exp mul) x (y mul))
>     (simplify-exp exp (make-mul :x x :y (mul-x y)) (mul-y y)))
>    (:method ((exp mul) x y) exp))
> 
> (defgeneric simplify (exp)
>    (:method (exp) exp)
>    (:method ((exp add))
>     (simplify-exp exp (simplify (add-x exp)) (simplify (add-y exp))))
>    (:method ((exp mul))
>     (simplify-exp exp (simplify (mul-x exp)) (simplify (mul-y exp)))))

This doesn't seem to work. For example:

* expr**

#S(ADD
   :X #S(ADD
         :X 0
         :Y #S(ADD
               :X #S(ADD
                     :X 1
                     :Y #S(ADD
                           :X X
                           :Y #S(ADD
                                 :X 2
                                 :Y #S(ADD :X #S(ADD :X 0 :Y X) :Y 1))))
               :Y 2))
   :Y #S(ADD :X 1 :Y #S(MUL :X 0 :Y #S(ADD :X X :Y 2))))
* (simplify expr**)

#S(ADD
   :X #S(ADD
         :X 0
         :Y #S(ADD
               :X #S(ADD
                     :X 1
                     :Y #S(ADD
                           :X X
                           :Y #S(ADD
                                 :X 2
                                 :Y #S(ADD :X #S(ADD :X 0 :Y X) :Y 1))))
               :Y 2))
   :Y #S(ADD :X 1 :Y #S(MUL :X 0 :Y #S(ADD :X X :Y 2))))

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 4:13:29 AM
Joshua Dunfield <joshuad@cs.cmu.edu> wrote:
> Dirk Thierbach wrote:
>>Joshua Dunfield <joshuad@cs.cmu.edu> wrote:
>>> The story of refinement types is instructive: despite inference being
>>> possible, it's undesirable.  

>>I don't think so. Inference is nice, because it saves you work (why
>>write down the type yourself when you can just ask the
>>compiler/interpreter, and then copy and paste), you can use it to
>>discover that maybe your function is more general than you thought,

> With plain HM, that's plausible, but when your types start to express
> more interesting properties, 

The point I was trying to make is that it's maybe not a good idea
to try to express too many things in the type system itself (unless
it can be integrated seamlessly). If I want more interesting
properties, they should be expressed on a different level, leaving
the type system intact so I am still able to use it in the same
way as before.

> I think there's a real benefit to being forced to write them down
> yourself (before you finish writing the function, at least, if not
> before you start writing it).

Yes, certainly. It more or less amounts to writing pre/postconditions,
or unit tests, QuickCheck properties, or whatever you like to use.

You have to write these yourself, because they express the
specification, and no compiler can properly second guess them :-)

But I am trying to argue that just because you *also* want to play
with more interesting properties, that's no reason to ditch a type
inference system that works well.

>>> For example, if you have a function `f' that's only supposed to take
>>> nonempty lists (nonemptiness being an invariant easily captured by
>>> refinement types),

>>That's maybe not a particularly good example, because in practice you
>>wouldn't encode such information in a type if it would involve lots
>>of more work when writing out types (which again means that I would
>>only use it in conjunction with type inference). Just not worth the
>>effort, unless you're really paranoid about correctness (which 
>>would justify the extra work).

> Experience with refinement types so far suggests that it's not lots
> more work; in Davies' SML-CIDRE*, for example, you need to annotate
> functions and not much else.

I have looked at refinement types, and while they are theoretically
nice, I think actually using them in practice takes too much effort
for the benefit they give.

And in your example, the compiler could actually figure out everything
by itself: Looking at the pattern matches, it can deduce that the
case for the empty list is not defined, so it infers an appropriate
precondition, carries this precondition to all the places where the
function is used, and then tries to prove that the precondition is
satisfied (or generates more preconditions in this process). No need
to annotate anything unless the compiler cannot figure out how to
do the proof. 
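
As a small illustration: OCaml already surfaces the missing case as a
warning; turning that warning into a checked, propagated precondition
is the part that needs extra machinery.

(* The empty-list case is omitted, so the compiler emits warning 8
   ("this pattern-matching is not exhaustive"; [] is unmatched).  A
   refinement-style checker would instead record the precondition
   "argument is nonempty" and check it at every call site. *)
let head (x :: _) = x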

> As for paranoia about correctness, it really depends on the specifics
> of the code.  

"paranoia" was of course meant a bit tongue-in-cheek.

> If there can't be a formal specification (and I don't mean
> that you just haven't bothered, I mean there *can't* be one because you're
> doing GUI prototyping or something and any specification would change
> every 5 minutes anyway), refined types won't do any good.  If there's
> a well-known specification and the code will be used by thousands of
> people, the situation is quite different.  And that's exactly the
> situation with data structure libraries.

There is also the much more common situation where there *is* a
specification, but extensions to the type system like refinement types
are just not going to capture it. So you need something more
expressive, and you do have some options (even without using an
automated proof system, see above). So, still no reason to ditch type
inference :-)

"Cannot be applied to an empty list" is a nice toy example, but
usually, the requirements are much more complicated.

So "no type inference, but a little more expressibility" is not a 
desireablr trade-off, at least not for me.

- Dirk
0
dthierbach2 (260)
2/15/2007 8:53:42 AM
Jon Harrop wrote:
> Pascal Costanza wrote:
>> Since I am not interested in microbenchmarks, I haven't measured the
>> runtimes.
> 
> Can you give some examples of simplifying symbolic expressions using this
> code? I can't get it to work.

For example:

(simplify (make-mul :x (make-add :x 0 :y 0) :y (make-add :x 'x :y 'y)))

[Before you report any benchmark results you have to make sure that 
optimization settings are appropriate...]

>> Can your version easily be extended at runtime? 
> 
> No, it is completely inextensible. If you were implementing a CAS then you
> would start with an inextensible but fast core and extend it to support
> run-time generated replacement rules.

Sounds too complicated to me... ;)


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/15/2007 9:34:18 AM
Jon Harrop wrote:

> This doesn't seem to work. For example:
[...]

I have just transliterated your original OCaml code. Does the example 
work in your code?


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/15/2007 9:36:18 AM
Pascal Costanza wrote:
> Jon Harrop wrote:
> 
>> This doesn't seem to work. For example:
> [...]
> 
> I have just transliterated your original OCaml code. Does the example 
> work in your code?

Does this work better?

(defstruct add x y)
(defstruct mul x y)

(defgeneric simplify-exp (op arg1 arg2)
   (:method ((exp add) (x number) (y number)) (+ x y))
   (:method ((exp add) (x (eql 0)) y) y)
   (:method ((exp add) x (y (eql 0))) x)
   (:method ((exp add) x (y add))
    (let ((new-add (make-add :x x :y (add-x y))))
      (simplify-exp new-add new-add  (add-y y))))
   (:method ((exp add) x y) exp)

   (:method ((exp mul) (x number) (y number)) (* x y))
   (:method ((exp mul) (x (eql 0)) y) 0)
   (:method ((exp mul) x (y (eql 0))) 0)
   (:method ((exp mul) (x (eql 1)) y) y)
   (:method ((exp mul) x (y (eql 1))) x)
   (:method ((exp mul) x (y mul))
    (let ((new-mul (make-mul :x x :y (mul-x y))))
      (simplify-exp new-mul new-mul  (mul-y y))))
   (:method ((exp mul) x y) exp))

(defgeneric simplify (exp)
   (:method (exp) exp)
   (:method ((exp add))
    (simplify-exp exp (simplify (add-x exp)) (simplify (add-y exp))))
   (:method ((exp mul))
    (simplify-exp exp (simplify (mul-x exp)) (simplify (mul-y exp)))))



Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/15/2007 9:52:11 AM
Pascal Costanza wrote:
> Pascal Costanza wrote:
>> Jon Harrop wrote:
>>
>>> This doesn't seem to work. For example:
>> [...]
>>
>> I have just transliterated your original OCaml code. Does the example 
>> work in your code?
> 
> Does this work better?
> 
> (defstruct add x y)
> (defstruct mul x y)
> 
> (defgeneric simplify-exp (op arg1 arg2)
>   (:method ((exp add) (x number) (y number)) (+ x y))
>   (:method ((exp add) (x (eql 0)) y) y)
>   (:method ((exp add) x (y (eql 0))) x)
>   (:method ((exp add) x (y add))
>    (let ((new-add (make-add :x x :y (add-x y))))
>      (simplify-exp new-add new-add  (add-y y))))
>   (:method ((exp add) x y) exp)
> 
>   (:method ((exp mul) (x number) (y number)) (* x y))
>   (:method ((exp mul) (x (eql 0)) y) 0)
>   (:method ((exp mul) x (y (eql 0))) 0)
>   (:method ((exp mul) (x (eql 1)) y) y)
>   (:method ((exp mul) x (y (eql 1))) x)
>   (:method ((exp mul) x (y mul))
>    (let ((new-mul (make-mul :x x :y (mul-x y))))
>      (simplify-exp new-mul new-mul  (mul-y y))))
>   (:method ((exp mul) x y) exp))
> 
> (defgeneric simplify (exp)
>   (:method (exp) exp)
>   (:method ((exp add))
>    (simplify-exp exp (simplify (add-x exp)) (simplify (add-y exp))))
>   (:method ((exp mul))
>    (simplify-exp exp (simplify (mul-x exp)) (simplify (mul-y exp)))))

Replace the latter with this:

(defgeneric simplify (exp)
   (:method (exp) exp)
   (:method ((exp add))
    (let ((new-add (make-add :x (simplify (add-x exp))
                             :y (simplify (add-y exp)))))
      (simplify-exp new-add (add-x new-add) (add-y new-add))))
   (:method ((exp mul))
    (let ((new-mul (make-mul :x (simplify (mul-x exp))
                             :y (simplify (mul-y exp)))))
      (simplify-exp new-mul (mul-x new-mul) (mul-y new-mul)))))


Is that better?


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/15/2007 9:58:27 AM
Pascal Costanza wrote:
> Pascal Costanza wrote:
>> Pascal Costanza wrote:
>>> Jon Harrop wrote:
>>>
>>>> This doesn't seem to work. For example:
>>> [...]
>>>
>>> I have just transliterated your original OCaml code. Does the example 
>>> work in your code?

Finally, this version should work. Sorry for the noise.


(defstruct add x y)
(defstruct mul x y)

(defgeneric simplify-exp (op arg1 arg2)
   (:method ((exp add) (x number) (y number)) (+ x y))
   (:method ((exp add) (x (eql 0)) y) y)
   (:method ((exp add) x (y (eql 0))) x)
   (:method ((exp add) x (y add))
    (simplify-exp exp (make-add :x x :y (add-x y)) (add-y y)))
   (:method ((exp add) x y) (make-add :x x :y y))

   (:method ((exp mul) (x number) (y number)) (* x y))
   (:method ((exp mul) (x (eql 0)) y) 0)
   (:method ((exp mul) x (y (eql 0))) 0)
   (:method ((exp mul) (x (eql 1)) y) y)
   (:method ((exp mul) x (y (eql 1))) x)
   (:method ((exp mul) x (y mul))
    (simplify-exp exp (make-mul :x x :y (mul-x y)) (mul-y y)))
   (:method ((exp mul) x y) (make-mul :x x :y y)))

(defgeneric simplify (exp)
   (:method (exp) exp)
   (:method ((exp add))
    (simplify-exp exp (simplify (add-x exp)) (simplify (add-y exp))))
   (:method ((exp mul))
    (simplify-exp exp (simplify (mul-x exp)) (simplify (mul-y exp)))))


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/15/2007 10:14:59 AM
Jon Harrop wrote:
> 
> Ok. I thought you were advocating type declarations instead of annotations,
> so the compiler would object to an unequal type rather than a unifiable
> type, i.e. giving a type "'a array" would fail if the type ended up as "int
> array".

Your use of terminology is slightly odd, but I reckon you are talking 
about the fact that OCaml interprets type variables in expressions as 
existentially quantified, rather than universally quantified.

If so, then you should know that many people consider this feature a 
design bug. It is confusing and inconsistent with the rest of the 
language, where free type variables are always universal. When you argue 
that a type variable is "unifiable" with another type then you have 
already internalised the existential interpretation. The argument does 
not hold water a priori.

I think existential variables are occasionally useful, and should 
probably be in the language, but I prefer a syntax that clearly 
distinguishes them from ordinary universal type variables. In any case, 
the latter are more important - in OCaml, there currently is no way of 
documenting or enforcing, via type annotations, that something is 
polymorphic.
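
For instance (a small illustration of that last point):

(* OCaml accepts this definition: the annotation's 'a is silently
   unified with int, so nothing enforces that g is polymorphic. *)
let g : 'a -> 'a = fun x -> x + 1
(* val g : int -> int *)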

Btw, if all you care about is being able to give partial type 
annotations, then OCaml already has a more reasonable syntax for that:

# ([|5|] : _ array);;
- : int array = [|5|]

-- 
Andreas Rossberg, rossberg@ps.uni-sb.de
0
rossberg (600)
2/15/2007 10:19:32 AM
On Feb 14, 9:24 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> Unfortunately it has the disadvantage that it produces different results. I
> think it is the same bug I had: it isn't recursing when it should to
> rewrite already rewritten terms. This is most easily fixed by factoring the
> recursion into a separate "rewrite" function.
>
> I just tested your vectorized Lisp and it is also wrong. However, it managed
> to reduce this function of "x":
>
>   #(+ #(+ 0 #(+ #(+ 1 #(+ X #(+ 2 #(+ #(+ 0 X) 1)))) 2)) #(+ 1 #(* 0 #(+ X
> 2))))
>
> down to "7", which is impressively wrong. ;-)

Depends on the value of X, perhaps. :)

But you are right; the last two functions posted (list and vector
versions) each had bugs in them.  The vector versions that I've been
testing all had another bug, too: they constructed a new expression and
failed to return it!  Once I fixed all these bugs, there is no difference
between the list and vector versions in my timings.  So maybe my one-off
about the "fundamentally different representations" didn't matter as much
as I thought. :)

Final, improved list version:

(defun simplify-common-funcs (xexpr)
  (declare (optimize (speed 3)))
  (if (atom xexpr)
      xexpr
      (labels ((make-expr (op f g)
                 (list op f g))
               (optimize-plus (f g)
                 (cond
                   ((eql f 0) g)
                   ((eql g 0) f)
                   ((and (numberp f) (numberp g)) (+ f g))
                   (t (optimize-associatively '+ #'optimize-plus f
g))))
               (optimize-multiply (f g)
                 (cond
                   ((or (eql f 0) (eql g 0)) 0)
                   ((eql f 1) g)
                   ((eql g 1) f)
                   ((and (numberp f) (numberp g)) (* f g))
                   (t (optimize-associatively '* #'optimize-multiply f g))))
               (optimize-associatively (op opt-func f g)
                 (cond
                   ((and (listp g) (eq op (first g)))
                    (let ((a (funcall opt-func f (second g))))
                      (funcall opt-func a (third g))))
                   (t (make-expr op f g)))))
        (declare (inline make-expr optimize-associatively))
        (let* ((op (first xexpr))
               (f (simplify-common-funcs (second xexpr)))
               (g (simplify-common-funcs (third xexpr))))
          (cond
            ((eq op '+) (optimize-plus f g))
            ((eq op '*) (optimize-multiply f g)))))))
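
For reference, a quick sanity check of the fixed version (my own
example, not from the original post):

* (simplify-common-funcs '(+ 0 (* 1 x)))

X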

Clocks in at 0.3 seconds +/- 0.02 seconds on a million-term expression
on my machine.  I don't think I can get much faster than that without
introducing bugs. :)  So it's within a factor of 2 versus the
equivalent OCaml.  A far cry from the 100x slowdown claimed earlier,
at least.

-Nathan

0
froydnj (14)
2/15/2007 3:08:28 PM
Nathan Froyd wrote:
> Clocks in at 0.3 seconds +/- 0.02 seconds on a million-term expression
> on my machine.

I get:

Lisp: 0.337s 34LOC 722b
OCaml: 0.216s 20LOC 474b

> So it's within a factor of 2 versus the equivalent OCaml.  A far cry from
> the 100x slowdown claimed earlier, at least.

True, but after a lot of optimisation the Lisp remains significantly longer
and slower than the unoptimised OCaml.

I think you will at least have to eat your words:

> So maybe if I want shorter code, I'll write OCaml, but if I
> want speed, I'll use Lisp. ;)

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 6:33:13 PM
Pascal Costanza wrote:
> Finally, this version should work. Sorry for the noise.

No worries. Thanks for trying. Still doesn't work:

* expr**

#S(ADD
   :X #S(ADD
         :X 0
         :Y #S(ADD
               :X #S(ADD
                     :X 1
                     :Y #S(ADD
                           :X X
                           :Y #S(ADD
                                 :X 2
                                 :Y #S(ADD :X #S(ADD :X 0 :Y X) :Y 1))))
               :Y 2))
   :Y #S(ADD :X 1 :Y #S(MUL :X 0 :Y #S(ADD :X X :Y 2))))
* (simplify expr**)

#S(ADD
   :X #S(ADD
         :X #S(ADD :X #S(ADD :X 1 :Y #S(ADD :X X :Y #S(ADD :X 2 :Y X))) :Y 1)
         :Y 2)
   :Y 1)

Note that the add nodes haven't been rotated.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 7:21:51 PM
Jon Harrop wrote:
> Pascal Costanza wrote:
>> Finally, this version should work. Sorry for the noise.
> 
> No worries. Thanks for trying. Still doesn't work:
> 
> * expr**
> 
> #S(ADD
>    :X #S(ADD
>          :X 0
>          :Y #S(ADD
>                :X #S(ADD
>                      :X 1
>                      :Y #S(ADD
>                            :X X
>                            :Y #S(ADD
>                                  :X 2
>                                  :Y #S(ADD :X #S(ADD :X 0 :Y X) :Y 1))))
>                :Y 2))
>    :Y #S(ADD :X 1 :Y #S(MUL :X 0 :Y #S(ADD :X X :Y 2))))
> * (simplify expr**)
> 
> #S(ADD
>    :X #S(ADD
>          :X #S(ADD :X #S(ADD :X 1 :Y #S(ADD :X X :Y #S(ADD :X 2 :Y X))) :Y 1)
>          :Y 2)
>    :Y 1)
> 
> Note that the add nodes haven't been rotated.

Two questions:

+ What result do you expect? (Doesn't have to be in the above notation.)

+ Where do you rotate expressions in your original code (from 0:15 of 
today)?


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/15/2007 8:06:01 PM
Wolfram Fenske schrieb:
> But that's all I'm saying:
> 
>   * data is mutable   -> you copy so nobody can modify your version
>   * data is immutable -> you copy so as not to mess up someone else's
>     version

Ah.

Well, the difference is that you can stop copying.

Take a binary search tree. Say, you need a version that has been changed 
in three places.
If the tree is mutable, you have to copy it in its entirety.
If the tree is immutable, the old and the new version of the tree can 
share those subtrees that remain the same.

In general, if you need an updated version of a data structure under an 
immutable regime, you need to copy only those nodes that are on the 
access path to the changed parts. That's often enough just a tiny 
fraction of the entire data structure.
That's a new tool in the toolbox: "copying" multi-megabyte data 
structures becomes affordable. That's useful e.g. for validation: simply 
apply the 
changes, and if validation fails, revert to the old state of things (of 
which you have kept a copy).
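
A minimal OCaml sketch of this path copying (my illustration; the
type and names are made up for the example):

type tree = Leaf | Node of tree * int * tree

(* Rebuild only the nodes on the search path; every untouched
   subtree is shared between the old and the new tree. *)
let rec insert x = function
  | Leaf -> Node (Leaf, x, Leaf)
  | Node (l, y, r) when x < y -> Node (insert x l, y, r)  (* r shared *)
  | Node (l, y, r) when x > y -> Node (l, y, insert x r)  (* l shared *)
  | t -> t                                                (* x already present *)

Reverting after a failed validation then amounts to keeping the old
root around.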

Regards,
Jo
0
jo427 (1164)
2/15/2007 8:49:45 PM
Wolfram Fenske schrieb:
> But the original example were structures passed into the
> Windows API.  These are usually flat records with, I don't know, 10
> fields or so.  How would sharing of structure work there?

The data would be shared between Windows and the application program.

Regards,
Jo
0
jo427 (1164)
2/15/2007 8:56:27 PM
Dirk Thierbach schrieb:
> And in your example, the compiler could actually figure out everything
> by itself: Looking at the pattern matches, it can deduce that the
> case for the empty list is not defined, so it infers an appropriate
> precondition, carries this precondition to all the places where the
> function is used, and then tries to prove that the precondition is
> satisfied (or generates more preconditions in this process). No need
> to annotate anything unless the compiler cannot figure out how to
> do the proof. 

Sure, however, there's a difference between essential and accidental 
preconditions.

I.e. the compiler may infer some preconditions, but miss some.
E.g. if you write a division algorithm, the compiler won't be able to 
infer that the divisor shouldn't be zero; it will happily emit code that 
will instead not terminate (or Do the Wrong Thing, depending on algorithm).
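
To make the contrast concrete, a small sketch (mine, not from the
original post):

(* Inferable from the match alone: the missing case *is* the
   precondition "the list is non-empty". *)
let head = function x :: _ -> x

(* Not inferable syntactically: nothing in the patterns says b <> 0,
   yet the function loops forever when b = 0. *)
let rec div a b = if a < b then 0 else 1 + div (a - b) b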

Also, writing down the expected preconditions is helpful to catch 
errors: if the compiler ends up with more preconditions than those 
listed, either you forgot one, or you still have a bug in your code.
Conveniently, the bug will be triggered in those cases that the 
additional inferred precondition describes, so it should be easy to spot 
the problem.

Of course, most of the time, the compiler will infer the intended 
preconditions just fine :-)
(Postconditions are different.)

Regards,
Jo
0
jo427 (1164)
2/15/2007 9:06:39 PM
Pascal Costanza wrote:
> Two questions:
> 
> + What result do you expect? (Doesn't have to be in the above notation.)

* (convert expr4)

#S(ADD
   :X #S(ADD
         :X #S(ADD :X #S(ADD :X #S(ADD :X #S(ADD :X 1 :Y X) :Y 2) :Y X) :Y 1)
         :Y 2)
   :Y 1)

> + Where do you rotate expressions in your original code (from 0:15 of
> today)?

let rec ( +: ) f g = match f, g with
  | Q n, Q m -> Q (n +/ m)
  | Q (Int 0), e | e, Q (Int 0) -> e
  | f, Add(g, h) -> f +: g +: h              <-- Here
  | f, g -> Add(f, g)

let rec ( *: ) f g = match f, g with
  | Q n, Q m -> Q (n */ m)
  | Q (Int 0), e | e, Q (Int 0) -> Q (Int 0)
  | Q (Int 1), e | e, Q (Int 1) -> e
  | f, Mul(g, h) -> f *: g *: h              <-- Here
  | f, g -> Mul(f, g)

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/15/2007 9:53:47 PM
Jon Harrop wrote:
> Pascal Costanza wrote:
>> Two questions:
>>
>> + What result do you expect? (Doesn't have to be in the above notation.)
> 
> * (convert expr4)
> 
> #S(ADD
>    :X #S(ADD
>          :X #S(ADD :X #S(ADD :X #S(ADD :X #S(ADD :X 1 :Y X) :Y 2) :Y X) :Y 1)
>          :Y 2)
>    :Y 1)
> 
>> + Where do you rotate expressions in your original code (from 0:15 of
>> today)?
> 
> let rec ( +: ) f g = match f, g with
>   | Q n, Q m -> Q (n +/ m)
>   | Q (Int 0), e | e, Q (Int 0) -> e
>   | f, Add(g, h) -> f +: g +: h              <-- Here
>   | f, g -> Add(f, g)
> 
> let rec ( *: ) f g = match f, g with
>   | Q n, Q m -> Q (n */ m)
>   | Q (Int 0), e | e, Q (Int 0) -> Q (Int 0)
>   | Q (Int 1), e | e, Q (Int 1) -> e
>   | f, Mul(g, h) -> f *: g *: h              <-- Here
>   | f, g -> Mul(f, g)

I don't get it.

I assume the following:

"( +: ) a b" is equivalent to "a +: b"

"a +: b +: c" is equivalent to "((a +: b) + :c)"

So the latter is also equivalent to "(( +: ) (( +: ) a b) c)"

However, this would mean that there is no rotation going on. So I must 
be missing something. Could you clarify?


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/15/2007 11:18:03 PM
Pascal Costanza <pc@p-cos.net> writes:

> I assume the following:
> 
> "( +: ) a b" is equivalent to "a +: b"
> 
> "a +: b +: c" is equivalent to "((a +: b) + :c)"
> 
> So the latter is also equivalent to "(( +: ) (( +: ) a b) c)"
> 
> However, this would mean that there is no rotation going on. So I must be
> missing something. Could you clarify?

I'm not Jon but...

 | f,  Mul(g, h) ->  f *: g *: h              <-- Here
   v   ---v----        v    v
   f*   (g*h)    -> (f * g) * h 


'as
0
2/16/2007 12:13:46 AM
Alexander Schmolck wrote:
> Pascal Costanza <pc@p-cos.net> writes:
> 
>> I assume the following:
>>
>> "( +: ) a b" is equivalent to "a +: b"
>>
>> "a +: b +: c" is equivalent to "((a +: b) + :c)"
>>
>> So the latter is also equivalent to "(( +: ) (( +: ) a b) c)"
>>
>> However, this would mean that there is no rotation going on. So I must be
>> missing something. Could you clarify?
> 
> I'm not Jon but...
> 
>  | f,  Mul(g, h) ->  f *: g *: h              <-- Here
>    v   ---v----        v    v
>    f*   (g*h)    -> (f * g) * h 

Thanks a lot for your help. It should have been clear to me before, but 
you helped me to focus better on the actual issue.

Here is another attempt at a correct transliteration. This time, I have 
made further simplifications to the code...

(defstruct add x y)
(defstruct mul x y)

(defgeneric simplify-add (x y)
   (:method ((x number) (y number)) (+ x y))
   (:method ((x (eql 0)) y) y)
   (:method (x (y (eql 0))) x)
   (:method (x (y add))
    (simplify-add (simplify-add x (add-x y)) (add-y y)))
   (:method (x y) (make-add :x x :y y)))

(defgeneric simplify-mul (x y)
   (:method ((x number) (y number)) (* x y))
   (:method ((x (eql 0)) y) 0)
   (:method (x (y (eql 0))) 0)
   (:method ((x (eql 1)) y) y)
   (:method (x (y (eql 1))) x)
   (:method (x (y mul))
    (simplify-mul (simplify-mul x (mul-x y)) (mul-y y)))
   (:method (x y) (make-mul :x x :y y)))

(defgeneric simplify (exp)
   (:method (exp) exp)
   (:method ((exp add))
    (simplify-add (simplify (add-x exp)) (simplify (add-y exp))))
   (:method ((exp mul))
    (simplify-mul (simplify (mul-x exp)) (simplify (mul-y exp)))))
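
For example (my own quick check, using the constructors above):

* (simplify (make-add :x (make-add :x 0 :y 'x) :y (make-mul :x 1 :y 2)))

#S(ADD :X X :Y 2)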


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/16/2007 12:33:14 AM
Pascal Costanza wrote:
> Alexander Schmolck wrote:
>>  | f,  Mul(g, h) ->  f *: g *: h              <-- Here
>>    v   ---v----        v    v
>>    f*   (g*h)    -> (f * g) * h

Exactly.

> Thanks a lot for your help. It should have been clear to me before, but
> you helped me to focus better on the actual issue.
> 
> Here is another attempt at a correct transliteration. This time, I have
> made further simplifications to the code...

Inspecting the results by eye, this seems to be ok. But using EQUAL doesn't
work. Why is that?

Incidentally, time taken for n=10^6 is 0.556s, which is quite good.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/16/2007 3:04:34 AM
Jon Harrop <jon@ffconsultancy.com> writes:

> Pascal Costanza wrote:
> > Alexander Schmolck wrote:
> >>  | f,  Mul(g, h) ->  f *: g *: h              <-- Here
> >>    v   ---v----        v    v
> >>    f*   (g*h)    -> (f * g) * h
> 
> Exactly.
> 
> > Thanks a lot for your help. It should have been clear to me before, but
> > you helped me to focus better on the actual issue.
> > 
> > Here is another attempt at a correct transliteration. This time, I have
> > made further simplifications to the code...
> 
> Inspecting the results by eye, this seems to be ok. But using EQUAL doesn't
> work. Why is that?

EQUAL uses physical equality (EQ) for structures. Try EQUALP.
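
For example (my sketch, using the ADD struct from upthread):

* (equal (make-add :x 1 :y 2) (make-add :x 1 :y 2))

NIL
* (equalp (make-add :x 1 :y 2) (make-add :x 1 :y 2))

T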

'as

0
2/16/2007 3:50:08 AM
Jon Harrop wrote:
> Pascal Costanza wrote:
>> Alexander Schmolck wrote:
>>>  | f,  Mul(g, h) ->  f *: g *: h              <-- Here
>>>    v   ---v----        v    v
>>>    f*   (g*h)    -> (f * g) * h
> 
> Exactly.
> 
>> Thanks a lot for your help. It should have been clear to me before, but
>> you helped me to focus better on the actual issue.
>>
>> Here is another attempt at a correct transliteration. This time, I have
>> made further simplifications to the code...
> 
> Inspecting the results by eye, this seems to be ok. But using EQUAL doesn't
> work. Why is that?

Use EQUALP. EQUAL on structs just performs an identity test.

> Incidentally, time taken for n=10^6 is 0.556s, which is quite good.

That wasn't my intention. ;)

Nevertheless, it's probably possible to squeeze out more by playing 
around with the various standard optimization settings (debug => 0, 
safety => 0, speed => 3, compilation-speed => 0), including locally for 
the generic functions, roughly like this:

(defgeneric foo (...)
   (declare (optimize (speed 3)))
   (:method ...)
   ...)


The principal reason for the good performance is that the methods of a 
generic function are analyzed in one go and turned into discrimination nets. 
You can get a good overview of the ideas behind this in 
http://people.csail.mit.edu/jrb/Projects/partial-dispatch.htm - this is 
probably very similar to the decision trees that OCaml generates. 
However, in CLOS you can change / add method definitions at runtime, and 
the discrimination nets will then be recomputed for you, so you 
basically don't lose any efficiency. It's probably very hard to get the 
same efficiency with manual coding.
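
To illustrate what such a net computes, here is a hand-written
approximation of the dispatch for SIMPLIFY-ADD above (my sketch only,
not what any implementation actually generates; EQL specializers are
more specific than class specializers, hence tested first):

(defun simplify-add-net (x y)
  (cond ((eql x 0) y)
        ((eql y 0) x)
        ((and (numberp x) (numberp y)) (+ x y))
        ((add-p y) (simplify-add (simplify-add x (add-x y)) (add-y y)))
        (t (make-add :x x :y y))))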

You can also read a more CLOS-specific description of method dispatch in 
http://citeseer.ist.psu.edu/kiczales90efficient.html - that paper 
describes the PCL implementation which is essentially the implementation 
used in SBCL, CMUCL and GCL, among others.


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/16/2007 7:26:06 AM
Pascal Costanza wrote:
> The principal reason for the good performance is that the methods of a
> generic function are analyzed in one go and turned into discrimination nets.
> You can get a good overview of the ideas behind this in
> http://people.csail.mit.edu/jrb/Projects/partial-dispatch.htm - this is
> probably very similar to the decision trees that OCaml generates.
> However, in CLOS you can change / add method definitions at runtime, and
> the discrimination nets will then be recomputed for you, so you
> basically don't lose any efficiency.

In theory, yes. In practice, the Lisp remains much slower than the OCaml.

Here are 3 subsequent running times on 3 different random expressions:

Shortest OCaml: 0.270s, 0.269s, 0.287s
Fastest OCaml: 0.128s, 0.166s, 0.167s
Nathan's Lisp: 1.802s, 1.758s, 0.395s
Your Lisp: 2.494s, 0.599s, 2.121s

Although Lisp's fastest time is only 3x slower than the OCaml (using
Nathan's hand-optimised pattern match), the running times are much more
varied for Lisp, up to 20x slower.

> It's probably very hard to get the same efficiency with manual coding.

If you're willing to work from an interactive session, you just write in
this style:

let rule rules = function
  | ...
  | expr -> rules expr

> You can also read a more CLOS-specific description of method dispatch in
> http://citeseer.ist.psu.edu/kiczales90efficient.html - that paper
> describes the PCL implementation which is essentially the implementation
> used in SBCL, CMUCL and GCL, among others.

Thanks.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/16/2007 7:06:47 PM
Jon Harrop wrote:
> Pascal Costanza wrote:
>> The principal reason for the good performance is that the methods of a
>> generic function are analyzed in one go and turned into discrimination nets.
>> You can get a good overview of the ideas behind this in
>> http://people.csail.mit.edu/jrb/Projects/partial-dispatch.htm - this is
>> probably very similar to the decision trees that OCaml generates.
>> However, in CLOS you can change / add method definitions at runtime, and
>> the discrimination nets will then be recomputed for you, so you
>> basically don't lose any efficiency.
> 
> In theory, yes. In practice, the Lisp remains much slower than the OCaml.
> 
> Here are 3 subsequent running times on 3 different random expressions:
> 
> Shortest OCaml: 0.270s, 0.269s, 0.287s
> Fastest OCaml: 0.128s, 0.166s, 0.167s
> Nathan's Lisp: 1.802s, 1.758s, 0.395s
> Your Lisp: 2.494s, 0.599s, 2.121s
> 
> Although Lisp's fastest time is only 3x slower than the OCaml (using
> Nathan's hand-optimised pattern match), the running times are much more
> varied for Lisp, up to 20x slower.

The CLOS generic function caches are typically only built after the 
first invocation of a generic function. So you have to take this into 
account in your benchmarks because in the long run, this upfront cost 
doesn't matter much.

Secondly, it makes a difference whether you call the test code directly 
in the listener, or whether you call it inside some internal test 
function. (I don't know exactly why, I guess it has to do with giving 
the Lisp compiler a chance to optimize away some bookkeeping code when 
calling the timer functions.)

Thirdly, the deviations are hardly interesting. Real-world code spends 
very little time in method dispatch, or similar dispatching mechanisms. 
In the general case, real-world code actually does something 
interesting, so the differences you noted will not matter much in practice.

In this case, the code isn't very interesting - at least, I don't see 
what the simplifications of the expressions actually buy you. They leave 
a lot of constants around that could be folded when the algorithm does 
more interesting shuffling around. (That's probably a case where the 
dispatching mechanisms will already start to matter less.)

Fourthly, I already tried to indicate a few times that CLOS allows you 
to change / add method definitions (and other aspects of CLOS programs) 
at runtime. This may not be interesting to you, but can be interesting 
in some scenarios, and of course, one has to live with some overhead 
here. However, taking the high level of dynamicity into account that 
CLOS buys you, I'd say that the overhead is much lower than one would 
expect.

Finally, to summarize my comments with a quote by Richard Gabriel from 
his book "Performance and Evaluation of Lisp Systems": "Performance is 
an issue, but it is not the only issue."


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/16/2007 7:52:10 PM
In article <53mgbbF1sg5hpU1@mid.individual.net>, Pascal Costanza wrote:
>
> Thirdly, the deviations are hardly interesting. Real-world code
> spends very little time in method dispatch, or similar dispatching
> mechanisms. In the general case, real-world code actually does
> something interesting, so the differences you noted will not matter
> much in practice.

I don't buy this argument, because the cost model of your language
shapes the sorts of programs you write. The claim you're making is
analogous to a C programmer claiming that he doesn't need tail
recursion optimization because in practice state machines are written
with labels and goto, not with recursive function calls.

Another example, unrelated to the specifics of Ocaml or Lisp (and
hence hopefully avoiding a language flamewar), is deforestation.

If you map over a list twice -- for example, (map f (map g lst)) --
then it creates an intermediate list which is immediately traversed
and discarded. As a result, Lisp and ML programmers will go to some
lengths to write the example as (map (compose f g) lst). 

This means the objection that existing programs don't generate a lot
of intermediate structures to deforest away is not a persuasive
argument against spending the effort on implementing it: if
programmers knew that the compiler would make the optimization
automatically, then they could (and would) write programs in a
different style.

(In fact, the ghc Haskell compiler does do this, and unsurprisingly
you see Haskell programmers tending to write code that worries much
less about apparently creating intermediate data structures.)
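
As a sketch of how such a fusion can be stated there, here is the
textbook RULES example (GHC's actual library rules are organised
around foldr/build fusion instead, so take this as illustrative
only):

{-# RULES
"map/map" forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}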


-- 
Neel R. Krishnaswami
neelk@cs.cmu.edu
0
neelk (298)
2/18/2007 1:28:54 AM
Neelakantan Krishnaswami wrote:
> In article <53mgbbF1sg5hpU1@mid.individual.net>, Pascal Costanza wrote:
>> Thirdly, the deviations are hardly interesting. Real-world code
>> spends very little time in method dispatch, or similar dispatching
>> mechanisms. In the general case, real-world code actually does
>> something interesting, so the differences you noted will not matter
>> much in practice.
> 
> I don't buy this argument

Straw clutching.

> As a result, Lisp and ML programmers will go to some
> lengths to write the example as (map (compose f g) lst).

That is misleading. The difference is down to evaluation semantics. Lazy
evaluation results in deforesting in this case. You can get the same effect
in ML as you do in Haskell by using laziness, e.g. a lazy stream in OCaml:

# let rec map f = parser
    | [<'h; t>] -> [<'f h; map f t>]
    | [<>] -> [<>];;
val map : ('a -> 'b) -> 'a Stream.t -> 'b Stream.t = <fun>

# let rec to_list = parser
    | [<'h; t>] -> h::to_list t
    | [<>] -> [];;
val to_list : 'a Stream.t -> 'a list = <fun>

# to_list (map ((+) 2) (map ((/) 17) [<'1; '2; '3; '4; '5>]));;
- : int list = [19; 10; 7; 6; 5]

or in F# by using the lazy Seq module:

> Seq.map ((+) 2) (Seq.map ((/) 17) [1; 2; 3; 4; 5])
val it : int seq = [19; 10; 7; 6; 5]

ML programmers choose the eager "map (compose f g) lst" because it is
faster.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/18/2007 4:25:38 AM
On Feb 16, 1:33 am, Pascal Costanza <p...@p-cos.net> wrote:
> Alexander Schmolck wrote:
> > Pascal Costanza <p...@p-cos.net> writes:
>
> >> I assume the following:
>
> >> "( +: ) a b" is equivalent to "a +: b"
>
> >> "a +: b +: c" is equivalent to "((a +: b) + :c)"
>
> >> So the latter is also equivalent to "(( +: ) (( +: ) a b) c)"
>
> >> However, this would mean that there is no rotation going on. So I must be
> >> missing something. Could you clarify?
>
> > I'm not Jon but...
>
> >  | f,  Mul(g, h) ->  f *: g *: h              <-- Here
> >    v   ---v----        v    v
> >    f*   (g*h)    -> (f * g) * h
>
> Thanks a lot for your help. It should have been clear to me before, but
> you helped me to focus better on the actual issue.
>
> Here is another attempt at a correct transliteration. This time, I have
> made further simplifications to the code...
>
> (defstruct add x y)
> (defstruct mul x y)
>
> (defgeneric simplify-add (x y)
>    (:method ((x number) (y number)) (+ x y))
>    (:method ((x (eql 0)) y) y)
>    (:method (x (y (eql 0))) x)
>    (:method (x (y add))
>     (simplify-add (simplify-add x (add-x y)) (add-y y)))
>    (:method (x y) (make-add :x x :y y)))
>
> (defgeneric simplify-mul (x y)
>    (:method ((x number) (y number)) (* x y))
>    (:method ((x (eql 0)) y) 0)
>    (:method (x (y (eql 0))) 0)
>    (:method ((x (eql 1)) y) y)
>    (:method (x (y (eql 1))) x)
>    (:method (x (y mul))
>     (simplify-mul (simplify-mul x (mul-x y)) (mul-y y)))
>    (:method (x y) (make-mul :x x :y y)))
>
> (defgeneric simplify (exp)
>    (:method (exp) exp)
>    (:method ((exp add))
>     (simplify-add (simplify (add-x exp)) (simplify (add-y exp))))
>    (:method ((exp mul))
>     (simplify-mul (simplify (mul-x exp)) (simplify (mul-y exp)))))
>
> Pascal
>
> --
> My website: http://p-cos.net
> Common Lisp Document Repository: http://cdr.eurolisp.org
> Closer to MOP & ContextL:http://common-lisp.net/project/closer/

This is how it looks with an old pattern-matching library from 1990
(Stephen Adams):

(defpatmatch simplify-add (form)
  ('+ (#'numberp x) (#'numberp y)) => (+ x y)
  ('+ 0 y) => y
  ('+ x 0) => x
  ('+ x ('+ y z)) => (simplify-add `(+ ,(simplify-add `(+ ,x ,y)) ,z))
  other => form)

(defpatmatch simplify-mul (form)
  (('* (#'numberp x) (#'numberp y)) (* x y))
  (('* 0 y) 0)
  (('* x 0) 0)
  (('* 1 y) y)
  (('* x 1) x)
  (('* x ('* y z)) (simplify-mul `(* ,(simplify-mul `(* ,x ,y)) ,z)))
  (other form))

(defpatmatch simplify (form)
  ('+ x y) => (simplify-add `(+ ,(simplify x) ,(simplify y)))
  ('* x y) => (simplify-mul `(* ,(simplify x) ,(simplify y)))
  other => form)


0
joswig (506)
2/19/2007 9:09:19 PM
joswig@corporate-world.lisp.de wrote:
> This is how it looks like with some old pattern matching library from
> 1990 (Stephen Adams):
> 
> (defpatmatch simplify-add (form)
>   ('+ (#'numberp x) (#'numberp y)) => (+ x y)
>   ('+ 0 y) => y
>   ('+ x 0) => x
>   ('+ x ('+ y z)) => (simplify-add `(+ ,(simplify-add `(+ ,x ,y)) ,z))
>   other => form)
> 
> (defpatmatch simplify-mul (form)
>   (('* (#'numberp x) (#'numberp y)) (* x y))
>   (('* 0 y) 0)
>   (('* x 0) 0)
>   (('* 1 y) y)
>   (('* x 1) x)
>   (('* x ('* y z)) (simplify-mul `(* ,(simplify-mul `(* ,x ,y)) ,z)))
>   (other form))
> 
> (defpatmatch simplify (form)
>   ('+ x y) => (simplify-add `(+ ,(simplify x) ,(simplify y)))
>   ('* x y) => (simplify-mul `(* ,(simplify x) ,(simplify y)))
>   other => form)

That's more like it!

Where do I get the library?

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/20/2007 12:33:44 AM
Jon Harrop wrote:
> joswig@corporate-world.lisp.de wrote:
>> This is how it looks like with some old pattern matching library from
>> 1990 (Stephen Adams):
>>
>> (defpatmatch simplify-add (form)
>>   ('+ (#'numberp x) (#'numberp y)) => (+ x y)
>>   ('+ 0 y) => y
>>   ('+ x 0) => x
>>   ('+ x ('+ y z)) => (simplify-add `(+ ,(simplify-add `(+ ,x ,y)) ,z))
>>   other => form)
>>
>> (defpatmatch simplify-mul (form)
>>   (('* (#'numberp x) (#'numberp y)) (* x y))
>>   (('* 0 y) 0)
>>   (('* x 0) 0)
>>   (('* 1 y) y)
>>   (('* x 1) x)
>>   (('* x ('* y z)) (simplify-mul `(* ,(simplify-mul `(* ,x ,y)) ,z)))
>>   (other form))
>>
>> (defpatmatch simplify (form)
>>   ('+ x y) => (simplify-add `(+ ,(simplify x) ,(simplify y)))
>>   ('* x y) => (simplify-mul `(* ,(simplify x) ,(simplify y)))
>>   other => form)
> 
> That's more like it!
> 
> Where do I get the library?

http://www.justfuckinggoogleit.com/search?q=%22Stephen+Adams%22+%22pattern+matcher%22


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/20/2007 12:55:54 AM
Pascal Costanza wrote:
>> Where do I get the library?
>
http://www.justfuckinggoogleit.com/search?q=%22Stephen+Adams%22+%22pattern+matcher%22

What is the name of the library?

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/20/2007 12:57:45 AM
Jon Harrop wrote:
> Pascal Costanza wrote:
>>> Where do I get the library?
> http://www.justfuckinggoogleit.com/search?q=%22Stephen+Adams%22+%22pattern+matcher%22
> 
> What is the name of the library?

The search terms above yield five links. Please do your homework 
yourself, it's not that hard.


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
0
pc56 (3929)
2/20/2007 7:35:36 AM
Pascal Costanza wrote:
> Jon Harrop wrote:
>> Pascal Costanza wrote:
>>>> Where do I get the library?
>>
http://www.justfuckinggoogleit.com/search?q=%22Stephen+Adams%22+%22pattern+matcher%22
>> 
>> What is the name of the library?
> 
> The search terms above yield five links. Please do your homework
> yourself, it's not that hard.

The third link from there is a post by Ralf Moeller:

http://groups.google.com.br/group/comp.lang.lisp/msg/3db7356bde9d765b

giving a pattern matching implementation by Stephen Adams found "several
years ago" by Ralf, before 1993. Might be from 1990. However, it doesn't
implement the defpatmatch construct used by Rainer.

The nearest vaguely relevant link is to a PhD thesis by Guillermo J. Rozas
who mentions pattern matching and thanks Stephen Adams for "liking beer".

Googling for "defpatmatch" gives zero results.

While the Lisp code looks good, it is useless without the library it depends
upon...

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/20/2007 2:41:50 PM
Jon Harrop schrieb:
> Neelakantan Krishnaswami wrote:
>> In article <53mgbbF1sg5hpU1@mid.individual.net>, Pascal Costanza wrote:
>>> Thirdly, the deviations are hardly interesting. Real-world code
>>> spends very little time in method dispatch, or similar dispatching
>>> mechanisms. In the general case, real-world code actually does
>>> something interesting, so the differences you noted will not matter
>>> much in practice.
>> I don't buy this argument
> 
> Straw clutching.

This kind of harsh judgement is appropriate at the end of an argument, 
not while it's in progress.
Well, you obviously *think* it's over, but as you see, there's disagreement.

>> As a result, Lisp and ML programmers will go to some
>> lengths to write the example as (map (compose f g) lst).
> 
> That is misleading. The difference is down to evaluation semantics. Lazy
> evaluation results in deforesting in this case.

No, not in deforestation.

The difference would be:
In a strict language, the entire intermediate list would be constructed 
in one go, then processed by the outer function.
In a lazy language, the intermediate list would be created one node at a 
time.

In both cases, the entire list is created and becomes unreachable 
afterwards.
If there is a garbage collection after each node construction, the lazy 
language will require less memory, since it will never have to store 
more than one node of the intermediate list at a time. (Could be a 
factor if the list is really large and each function is doing really 
massive computations. That would be a very long-running program 
overall, though...)

However, the total number of nodes constructed for the intermediate 
list is exactly the same.
Deforestation transforms the code so that the intermediate nodes aren't 
constructed at all.

Regards,
Jo
0
jo427 (1164)
2/20/2007 5:41:16 PM
Joachim Durchholz wrote:
> This kind of harsh judgement is appropriate at the end of an argument,
> not while it's in progress.
> Well, you obviously *think* it's over, but as you see, there's
> disagreement.

The argument about dispatch is over. Nathan left and Pascal retreated
into "dispatch is not performance critical".

> The difference would be:
> In a strict language, the entire intermediate list would be constructed
> in one go, then processed by the outer function.

O(n) excess allocation.

> In a lazy language, the intermediate list would be created one node at a
> time.

O(1) excess allocation.

> In both cases, the entire list is created and becomes unreachable
> afterwards.

I'd avoid "entire" in the context of potentially infinite implicit data
structures.

> If there is a garbage collection after each node construction, the lazy
> language will require less memory, since it will never have to store
> more than one node of the intermediate list at a time.

Exactly.

> (Could be a 
> factor if the list is really large and each function is doing really
> massive computations. That would be a very long-running program in all
> though...)

My factor of 35 was only for 10^6-element lists.

> However, the total number of nodes constructed for the intermediate
> liste is exactly the same.
> Deforestation transforms the code so that the intermediate nodes aren't
> constructed at all.

Then I'm using a different definition of "deforest". Let's forget about that
word and its definition. Here's my point:

ML programmers perform this transformation to avoid O(n) unnecessary
consing:

  fold f (map g list)
  fold (fun a x -> f a (g x)) list

The performance improvements are often considerable, e.g. the factor of 35
that I quoted.

In contrast, the same transformation in Haskell does not asymptotically
reduce excess consing because there was only O(1) extra consing in the
first place. The transformation improves performance but not by anything
like as much.

My qualm is that this optimisation is all about allocation and the original
eager and lazy versions have completely different allocation profiles, so
it is unfair to compare transformations on them.

So I think the comparison should be restricted to lazy lists in ML, e.g. the
F#:

  Seq.fold f (Seq.map g list)
  Seq.fold (fun a x -> f a (g x)) list

I don't know if that is deforested in F#.

I'd also be interested to hear whether or not Haskell compilers automate the
(much more valuable) deforesting of eager code.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/index.html?usenet
0
jon (3344)
2/20/2007 7:22:16 PM