Sudoku puzzle solver for Apple II

  • Follow


Scott Hemphill and I recently collaborated to create a Sudoku puzzle
solver for Apple II computers.  The machine language solver is Scott's,
with some adaptations by me, and I wrote the interactive Applesoft
front-end.

SUDOKU runs on any Apple II with 80-column firmware and a 65C02
processor, which includes the IIc, the IIc+, the Enhanced //e, and
the IIgs.

It is amazingly fast!  Most Sudoku solvers run on modern PCs thousands
of times faster than a 1MHz Apple II, and take seconds to solve a
puzzle, but Scott's solver is so time and space efficient that it
solves puzzles in seconds *running on an Apple II*!

Check it out at:

http://members.aol.com/mjmahon/Sudoku.html

It is available both as a ShrinkIt disk archive and as a .dsk image.

Enjoy!

(BTW, the execution profile referred to in the paper has not yet
been uploaded, so for the time being, that's a 404. ;-)

-michael

Fast Sudoku solving for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it is seriously underused."
0
Reply Michael 7/12/2006 10:05:59 AM

In article <XtudnVI15qUVVCnZnZ2dnUVZ_oydnZ2d@comcast.com>,
Michael J. Mahon <mjmahon@aol.com> wrote:
>Scott Hemphill and I recently collaborated to create a Sudoku puzzle
>solver for Apple II computers.  The machine language solver is Scott's,
>with some adaptations by me, and I wrote the interactive Applesoft
>front-end.
>
>SUDOKU runs on any Apple II with 80-column firmware and a 65C02
>processor, which includes the IIc, the IIc+, the Enhanced //e, and
>the IIgs.
>
>It is amazingly fast!  Most Sudoku solvers run on modern PCs thousands
>of times faster than a 1MHz Apple II, and take seconds to solve a
>puzzle,

I don't believe that!  Check out this Sudoku solver:
    http://homepage.ntlworld.com/valleyway/solver.html

It's written in Javascript, which of course is a quite inefficient
programming language - yet it needs less than half a second to solve
a sudoku, on a 4+ year old PC running at a moderate 1.8 GHz clock speed.

You must have encountered some really sloppily written sudoku solvers
for the PC if they required several seconds to solve a sudoku....


> but Scott's solver is so time and space efficient that it
>solves puzzles in seconds *running on an Apple II*!
>
>Check it out at:
>
>http://members.aol.com/mjmahon/Sudoku.html
>
>It is available both as a ShrinkIt disk archive and as a .dsk image.

....but why does it require a 65C02 ?  Couldn't it have been written
in plain 6502 assembly language?


>Enjoy!
>
>(BTW, the execution profile referred to in the paper has not yet
>been uploaded, so for the time being, that's a 404. ;-)
>
>-michael
>
>Fast Sudoku solving for Apple II's!
>Home page:  http://members.aol.com/MJMahon/
>
>"The wastebasket is our most important design
>tool--and it is seriously underused."


-- 
----------------------------------------------------------------
Paul Schlyter,  Grev Turegatan 40,  SE-114 38 Stockholm,  SWEDEN
e-mail:  pausch at stockholm dot bostream dot se
WWW:     http://stjarnhimlen.se/
0
Reply pausch 7/12/2006 12:13:07 PM


> I don't believe that!  Check out this Sudoku solver:
>     http://homepage.ntlworld.com/valleyway/solver.html
>
> It's written in Javascript, which of course is a quite inefficient
> programming language - yet it needs less than half a second to solve
> a sudoku, on a 4+ year old PC running at a moderate 1.8 GHz clock speed.

Javascript isn't all that bad. :-P  Sure, it's not exactly suited for
realtime applications, but once the code has been interpreted and all
data structures are created, it's pretty decent actually.  I've found
it to be not much slower than, say, Perl.  Some things in the language
are actually a lot faster than you think!  Dynamically-created data
structures are not a strong-point of javascript when speed is concerned
though.

http://www.jorendorff.com/articles/javascript/speed-test.html

I don't see enough people rag on the speed of, say, Ruby or Python.
:-D

> ...but why does it require a 65C02 ?  Couldn't it have been written
> in plain 6502 assembly language?

*sigh*  Why are there so many anti-65c02 enthusiasts out there?  Can't
you afford a free emulator that can handle the opcodes or shell out $6
for a new 65c02 processor?  But, seriously, I have a strong preference
for 65c02 opcodes in my code as well becase it's a slight bit more
convienent.  Why limp when you can walk?  (or use 65816 and go for a
jog ;-)

-B

0
Reply BLuRry 7/12/2006 2:44:20 PM

Michael J. Mahon wrote:
> Scott Hemphill and I recently collaborated to create a Sudoku puzzle
> solver for Apple II computers.  The machine language solver is Scott's,
> with some adaptations by me, and I wrote the interactive Applesoft
> front-end.

Holy algorithm, Batman!  That is cool stuff.  Very excellent algorithm
design!  A few moot points to consider if you're a total perfectionist:

- Could it be possible to support arrow keys?

- Could it use mousetext for the display, since it's already singled
out to use 65c02, and thus mainly enhanced //'s anyway?

- Maybe make some boxes to uneditable (e.g. keep them hilighted) so
that the original numbers can remain unaltered.

Whether you even care about these suggestions is totally up to you. ;-)
 Awesome stuff, guys!  Keep up the great work!

0
Reply BLuRry 7/12/2006 3:11:08 PM

"BLuRry" (brendan.robert@gmail.com) writes:
>> I don't believe that!  Check out this Sudoku solver:
>>     http://homepage.ntlworld.com/valleyway/solver.html
>>
>> It's written in Javascript, which of course is a quite inefficient
>> programming language - yet it needs less than half a second to solve
>> a sudoku, on a 4+ year old PC running at a moderate 1.8 GHz clock speed.
> 
> Javascript isn't all that bad. :-P  Sure, it's not exactly suited for
> realtime applications, but once the code has been interpreted and all
> data structures are created, it's pretty decent actually.  I've found
> it to be not much slower than, say, Perl.  Some things in the language
> are actually a lot faster than you think!  Dynamically-created data
> structures are not a strong-point of javascript when speed is concerned
> though.
> 
> http://www.jorendorff.com/articles/javascript/speed-test.html
> 
> I don't see enough people rag on the speed of, say, Ruby or Python.
> :-D
> 
>> ...but why does it require a 65C02 ?  Couldn't it have been written
>> in plain 6502 assembly language?
> 
> *sigh*  Why are there so many anti-65c02 enthusiasts out there?  Can't
> you afford a free emulator that can handle the opcodes or shell out $6
> for a new 65c02 processor?  But, seriously, I have a strong preference
> for 65c02 opcodes in my code as well becase it's a slight bit more
> convienent.  Why limp when you can walk?  (or use 65816 and go for a
> jog ;-)
> 
What's the point of having a program for a relatively new game for
a computer that hasn't been made in almost twenty years, but then
you are assuming the most interest will be in running it on an emulator?

At that point, you aren't talking about some functionality of an old
computer, you are doing it for no good reason.

And once you move away from emulators, you do start limiting the
computers that can run the program when you use a 65C02.  In other
words, you may be limiting who can run this program on an actual Apple
II.

But the question still remains, why is it so important to use
the few instructions that are unique to the 65C02?  They may be nice
additions, but lots of people survived without them.

  Michael

0
Reply et472 7/12/2006 3:18:36 PM

pausch@saaf.se (Paul Schlyter) writes:

> In article <XtudnVI15qUVVCnZnZ2dnUVZ_oydnZ2d@comcast.com>,
> Michael J. Mahon <mjmahon@aol.com> wrote:
> >Scott Hemphill and I recently collaborated to create a Sudoku puzzle
> >solver for Apple II computers.  The machine language solver is Scott's,
> >with some adaptations by me, and I wrote the interactive Applesoft
> >front-end.
> >
> >SUDOKU runs on any Apple II with 80-column firmware and a 65C02
> >processor, which includes the IIc, the IIc+, the Enhanced //e, and
> >the IIgs.
> >
> >It is amazingly fast!  Most Sudoku solvers run on modern PCs thousands
> >of times faster than a 1MHz Apple II, and take seconds to solve a
> >puzzle,
> 
> I don't believe that!  Check out this Sudoku solver:
>     http://homepage.ntlworld.com/valleyway/solver.html

I don't know about _most_ Sudoku solvers, but there are a lot of slow ones
out there.  And in terms of taking seconds, I think Michael was talking
about difficult puzzles, not easy ones.  Our most frequently used test
puzzle was one we called "evil01".  I loaded this into the solver you
mentioned and it took about 4 seconds on my 3GHz machine.  Here's the
puzzle:

..2.......
....6....3
..74.8....
......3..2
..8..4..1.
6..5.....
.....1.78.
5....9...
........4.

> It's written in Javascript, which of course is a quite inefficient
> programming language - yet it needs less than half a second to solve
> a sudoku, on a 4+ year old PC running at a moderate 1.8 GHz clock speed.

I don't know how inefficient Javascript is.  Is this a good speed?  My
C program solves the evil01 puzzle in about 2 milliseconds.

> You must have encountered some really sloppily written sudoku solvers
> for the PC if they required several seconds to solve a sudoku....

Yup, if by sloppy you mean not choosing the most efficient algorithms.

> > but Scott's solver is so time and space efficient that it
> >solves puzzles in seconds *running on an Apple II*!
> >
> >Check it out at:
> >
> >http://members.aol.com/mjmahon/Sudoku.html
> >
> >It is available both as a ShrinkIt disk archive and as a .dsk image.
> 
> ...but why does it require a 65C02 ?  Couldn't it have been written
> in plain 6502 assembly language?

It requires a 65C02 because that's the way I initially wrote the
assembly language solver.  As it underwent various transformations
I started migrating it towards the 6502, but we decided to keep the
speed and release the code.  It's public domain, with all sources
provided, so anyone is free to modify it for the 6502.  Furthermore,
we didn't put in all the speed tweaks we thought of, and Michael has
thoughtfully mentioned some of those improvements on his website if
anyone wants to pick up where we left off.

> >Enjoy!
> >
> >(BTW, the execution profile referred to in the paper has not yet
> >been uploaded, so for the time being, that's a 404. ;-)

(I've sent it to Michael, so I imagine it will appear there soon.)

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear
0
Reply Scott 7/12/2006 3:46:04 PM

> What's the point of having a program for a relatively new game for
> a computer that hasn't been made in almost twenty years, but then
> you are assuming the most interest will be in running it on an emulator?

A program is a program...  I write lots of programs that run within a
emulator for a non-existant machine and make good money doing it. ;-)
Applewin + soduku < 1mb of download.  This is still within the realm of
feasibility when you look at how big most games are these days.

> At that point, you aren't talking about some functionality of an old
> computer, you are doing it for no good reason.

Is there ever any "good" reason for playing games?  I do crazy things
with apple //'s because, well, I can.  And because I was frustrated I
couldn't at an earlier age in life when I had a real //.  I'm sure that
others have their own reasons that equally valid.  ;-)

> And once you move away from emulators, you do start limiting the
> computers that can run the program when you use a 65C02.

Well, kind of.  It's easier to find apple //e's and apple //c's on
ebay, and altogether there were more of those produced than older ]['s.
 I agree for the purpose of total nostalgia that you would want
complete portability (which is why I rewrote a2gameserver to support
old ][ computers as well) but ultimately it is harder to find an older,
non 65c02 or 65816 based apple computer than a legacy 6502 machine.

>  In other words, you may be limiting who can run this program on an actual Apple II.

I agree that one should be sensitive to that if there was tangible
penalty for limiting the audience.  Such as, if this were a commercial
product being sold.  But for free, I'll take what I can get!  :-)

> But the question still remains, why is it so important to use
> the few instructions that are unique to the 65C02?  They may be nice
> additions, but lots of people survived without them.

I think that the extended opcodes were used in developing the algorithm
to make it as efficient as possible.  MJ Mahon's page has a pretty
awesome amount of detail around how this thing evolved  TSB was one of
the opcodes used at some point, of which there is no easy equivilant
without storing the accumulator somewhere, loading a value, performing
the bit operations, etc...  Would be pretty messy in 6502, and
definately less optimal.  There are also a lot of table lookups, which
may have been optimized with the extra opcodes (I'm guessing -- I
haven't read thru all the source though it is fascinating work)

0
Reply BLuRry 7/12/2006 6:03:32 PM

In article <m364i29683.fsf@jade.local>,
Scott Hemphill  <hemphill@alumni.caltech.edu> wrote:

> pausch@saaf.se (Paul Schlyter) writes:
> 
>> In article <XtudnVI15qUVVCnZnZ2dnUVZ_oydnZ2d@comcast.com>,
>> Michael J. Mahon <mjmahon@aol.com> wrote:
>>>Scott Hemphill and I recently collaborated to create a Sudoku puzzle
>>>solver for Apple II computers.  The machine language solver is Scott's,
>>>with some adaptations by me, and I wrote the interactive Applesoft
>>>front-end.
>>>
>>>SUDOKU runs on any Apple II with 80-column firmware and a 65C02
>>>processor, which includes the IIc, the IIc+, the Enhanced //e, and
>>>the IIgs.
>>>
>>>It is amazingly fast!  Most Sudoku solvers run on modern PCs thousands
>>>of times faster than a 1MHz Apple II, and take seconds to solve a
>>>puzzle,
>> 
>> I don't believe that!  Check out this Sudoku solver:
>>     http://homepage.ntlworld.com/valleyway/solver.html
> 
> I don't know about _most_ Sudoku solvers, but there are a lot of slow ones
> out there.  And in terms of taking seconds, I think Michael was talking
> about difficult puzzles, not easy ones.  Our most frequently used test
> puzzle was one we called "evil01".  I loaded this into the solver you
> mentioned and it took about 4 seconds on my 3GHz machine.  Here's the
> puzzle:
> 
> .2.......
> ...6....3
> .74.8....
> .....3..2
> .8..4..1.
> 6..5.....
> ....1.78.
> 5....9...
> .......4.

:-) ....this shows the importance of carefully specifying the initial conditions
in your benchmark.  I just typed in some numbers until I got a unique solution,
and used that as a test case.

>> It's written in Javascript, which of course is a quite inefficient
>> programming language - yet it needs less than half a second to solve
>> a sudoku, on a 4+ year old PC running at a moderate 1.8 GHz clock speed.
> 
> I don't know how inefficient Javascript is.  Is this a good speed?  My
> C program solves the evil01 puzzle in about 2 milliseconds.
> 
>> You must have encountered some really sloppily written sudoku solvers
>> for the PC if they required several seconds to solve a sudoku....
> 
> Yup, if by sloppy you mean not choosing the most efficient algorithms.

That's indeed part of sloppiness!

 
>>> but Scott's solver is so time and space efficient that it
>>>solves puzzles in seconds *running on an Apple II*!
>>>
>>>Check it out at:
>>>
>>>http://members.aol.com/mjmahon/Sudoku.html
>>>
>>>It is available both as a ShrinkIt disk archive and as a .dsk image.
>> 
>> ...but why does it require a 65C02 ?  Couldn't it have been written
>> in plain 6502 assembly language?
> 
> It requires a 65C02 because that's the way I initially wrote the
> assembly language solver.  As it underwent various transformations
> I started migrating it towards the 6502, but we decided to keep the
> speed and release the code.

APproximately how much slower do you think a 6502 version would have
been?  10% slower?  Twice as slow?  10 times as slow?

> It's public domain, with all sources
> provided, so anyone is free to modify it for the 6502.  Furthermore,
> we didn't put in all the speed tweaks we thought of, and Michael has
> thoughtfully mentioned some of those improvements on his website if
> anyone wants to pick up where we left off.

What assembler did you use?
 
>>>Enjoy!
>>>
>>>(BTW, the execution profile referred to in the paper has not yet
>>>been uploaded, so for the time being, that's a 404. ;-)
> 
> (I've sent it to Michael, so I imagine it will appear there soon.)
> 
> Scott

-- 
----------------------------------------------------------------
Paul Schlyter,  Grev Turegatan 40,  SE-114 38 Stockholm,  SWEDEN
e-mail:  pausch at stockholm dot bostream dot se
WWW:     http://stjarnhimlen.se/
0
Reply pausch 7/12/2006 8:43:38 PM

In article <1152715460.757037.151750@h48g2000cwc.googlegroups.com>,
BLuRry <brendan.robert@gmail.com> wrote:

>> I don't believe that!  Check out this Sudoku solver:
>>     http://homepage.ntlworld.com/valleyway/solver.html
>>
>> It's written in Javascript, which of course is a quite inefficient
>> programming language - yet it needs less than half a second to solve
>> a sudoku, on a 4+ year old PC running at a moderate 1.8 GHz clock speed.
> 
> Javascript isn't all that bad. :-P  Sure, it's not exactly suited for
> realtime applications, but once the code has been interpreted and all
> data structures are created, it's pretty decent actually.

....you mean when the program has reached the end of execution?
Interpretation doesn't stop until the program reaches its end -- unless
the "interpreter" actually is a semi-compiler....

> I've found it to be not much slower than, say, Perl.

It probably won't be much slower than interpreted BASIC either....

> Some things in the language are actually a lot faster than you think!
> Dynamically-created data structures are not a strong-point of javascript
> when speed is concerned though.

Interesting --- because dynamically created data structures must be
handled at runtime, even in compiled languages.
 
> http://www.jorendorff.com/articles/javascript/speed-test.html
> 
> I don't see enough people rag on the speed of, say, Ruby or Python.
> :-D
>
>> ...but why does it require a 65C02 ?  Couldn't it have been written
>> in plain 6502 assembly language?
> 
> *sigh*  Why are there so many anti-65c02 enthusiasts out there?

I'm not anti-65C02, but my good ol' Apple II+ is.  And there's little
I can do about that, unfortunately.

> Can't you afford a free emulator that can handle the opcodes

I can afford a modern computer, but I see little point in using it
for running some "lobotomy software", such as emulators for older,
less capable, computers.

> or shell out $6 for a new 65c02 processor?

In the early 1980's I shelled out $50 for a 65C02!  That's probably
some $250 in today's money .... however I had little use for it.
I tried it in my Apple II+, which just locked up.  I also tried it on
some Apple IIe's, and there it worked fine.  So the CPU worked OK.
I still have it.....

> But, seriously, I have a strong preference for 65c02 opcodes in my
> code as well becase it's a slight bit more convienent.  Why limp
> when you can walk?  (or use 65816 and go for a jog ;-)

....if so, why not fly with a modern 32-bit or 64-bit CPU ?   ;-)
 
> -B

OK, you program for your own enjoyment, not for the utility of others.
All fine and well, we all choose why we do what we do -- but it's good
to know that it's no use trying your software on Apple II's earlier
than the IIe....

-- 
----------------------------------------------------------------
Paul Schlyter,  Grev Turegatan 40,  SE-114 38 Stockholm,  SWEDEN
e-mail:  pausch at stockholm dot bostream dot se
WWW:     http://stjarnhimlen.se/
0
Reply pausch 7/12/2006 8:43:38 PM

Paul Schlyter wrote:
> In article <XtudnVI15qUVVCnZnZ2dnUVZ_oydnZ2d@comcast.com>,
> Michael J. Mahon <mjmahon@aol.com> wrote:
> 
>>Scott Hemphill and I recently collaborated to create a Sudoku puzzle
>>solver for Apple II computers.  The machine language solver is Scott's,
>>with some adaptations by me, and I wrote the interactive Applesoft
>>front-end.
>>
>>SUDOKU runs on any Apple II with 80-column firmware and a 65C02
>>processor, which includes the IIc, the IIc+, the Enhanced //e, and
>>the IIgs.
>>
>>It is amazingly fast!  Most Sudoku solvers run on modern PCs thousands
>>of times faster than a 1MHz Apple II, and take seconds to solve a
>>puzzle,
> 
> 
> I don't believe that!  Check out this Sudoku solver:
>     http://homepage.ntlworld.com/valleyway/solver.html
> 
> It's written in Javascript, which of course is a quite inefficient
> programming language - yet it needs less than half a second to solve
> a sudoku, on a 4+ year old PC running at a moderate 1.8 GHz clock speed.
> 
> You must have encountered some really sloppily written sudoku solvers
> for the PC if they required several seconds to solve a sudoku....

That may be--I haven't actually tried a bunch of PC Sudoku solvers.

If they are also "tidy" enough, perhaps one of these faster (and
hopefully small) solvers could also have been ported to the Apple II.

I probably should have been satisfied simply to say that it's fast.  ;-)

>>but Scott's solver is so time and space efficient that it
>>solves puzzles in seconds *running on an Apple II*!
>>
>>Check it out at:
>>
>>http://members.aol.com/mjmahon/Sudoku.html
>>
>>It is available both as a ShrinkIt disk archive and as a .dsk image.
> 
> 
> ....but why does it require a 65C02 ?  Couldn't it have been written
> in plain 6502 assembly language?

I usually enforce that restriction on my own code, and am seldom much
constrained by it.  However, in this case, a few 65C02 instructions
relieve register pressure in inner loops, making a quite measurable
difference in performance (but still less than a factor of 2, so a
6502 version is quite practical).

I had it on my list to do a 6502 version, but thought it would be
interesting to get this version out for people to play with.

Note that it also depends on the 80-column firmware control characters,
so that pretty well pushes users toward 65C02 machines.  The only one
that is "popular" and left out is the unenhanced //e.  ;-(

>>Enjoy!
>>
>>(BTW, the execution profile referred to in the paper has not yet
>>been uploaded, so for the time being, that's a 404. ;-)

This has now been uploaded, so the profile is there.  ;-)

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/12/2006 8:44:33 PM

Michael Black wrote:

<snip>

> And once you move away from emulators, you do start limiting the
> computers that can run the program when you use a 65C02.  In other
> words, you may be limiting who can run this program on an actual Apple
> II.

Well, I began the limitation by exploiting the 80-column firmware
(in 40-column mode ;-) to aloow control characters to clear to end
of page, etc.  That could also be re-done, but I left it for later
because I figured that most people had access to machines that had
the firmware--and a 65C02 as well (though I realize that the
unenhanced //e doesn't).

> But the question still remains, why is it so important to use
> the few instructions that are unique to the 65C02?  They may be nice
> additions, but lots of people survived without them.

I don't think that it is, as a rule.

In this case, I looked at the inner loops, where every cycle counts,
and couldn't see any "free" or even "inexpensive" way to do away with
the 65C02isms.  There are only a few, like LDA (ptr) when Y is busy with
some other useful value.  Having to save it and set it to 0, then
restore it, seems a little expensive if you're trying to go fast.

Almost all other cases, like BRA and INA, can be eliminated when the
time comes without penalty, and I've already removed some.

But until I bring myself to remove *all* 65C02 ops, and take the
time penalty, there's little reason to fix a few.

So--I agree with you in principle, but speed and "time to market"
were more important to me as I'm about to leave town for a week.  ;-)

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/12/2006 8:57:41 PM

BLuRry wrote:
> Michael J. Mahon wrote:
> 
>>Scott Hemphill and I recently collaborated to create a Sudoku puzzle
>>solver for Apple II computers.  The machine language solver is Scott's,
>>with some adaptations by me, and I wrote the interactive Applesoft
>>front-end.
> 
> 
> Holy algorithm, Batman!  That is cool stuff.  Very excellent algorithm
> design!  A few moot points to consider if you're a total perfectionist:
> 
> - Could it be possible to support arrow keys?

The arrow keys are already supported--did you try them?

> - Could it use mousetext for the display, since it's already singled
> out to use 65c02, and thus mainly enhanced //'s anyway?

That was both "over the top" and more of a mess in the Applesoft
program.  Maybe someone else would like to try it?

Frankly, I kind of like its ugly little text face.  ;-)

> - Maybe make some boxes to uneditable (e.g. keep them hilighted) so
> that the original numbers can remain unaltered.

Not sure I know what you mean here...  After a complete solution, the
grid is returned to its original state.  Do you mean provide protection
of the user from himself?  ;-)

When I change something I didn't intend to, I just reload the original
puzzle (which I save as soon as I've entered it).

> Whether you even care about these suggestions is totally up to you. ;-)
>  Awesome stuff, guys!  Keep up the great work!

Thanks!  It was great fun working together!

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/12/2006 9:04:20 PM

Michael J. Mahon wrote:

> But until I bring myself to remove *all* 65C02 ops, and take the
> time penalty, there's little reason to fix a few.

I should also note that it is currently ProDOS-based, and is
distributed with a 65C02 version, 2.0.3, so that would also
need to be changed.

A DOS 3.3 version would not be hard (though I used a
subdirectory for the puzzles for convenience).

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/12/2006 9:14:53 PM

Michael J. Mahon wrote:

> (BTW, the execution profile referred to in the paper has not yet
> been uploaded, so for the time being, that's a 404. ;-)

The profile of the solver on "evil01" is there now.  ;-)

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/12/2006 9:18:35 PM

Paul Schlyter wrote:

> APproximately how much slower do you think a 6502 version would have
> been?  10% slower?  Twice as slow?  10 times as slow?

Certainly less than twice as slow.  I'll find out what the current
speed impact would be...

But, as noted, there are other things about the program(s) that
would keep it from running on your ][+, too--all fixable with
some more effort...

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/12/2006 9:23:13 PM

Michael J. Mahon wrote:
> Paul Schlyter wrote:
> 
>> APproximately how much slower do you think a 6502 version would have
>> been?  10% slower?  Twice as slow?  10 times as slow?
> 
> 
> Certainly less than twice as slow.  I'll find out what the current
> speed impact would be...

Paul's message prompted me to re-look at where 65C02 ops are used, and
they are all quite easy to eliminate--most with no or negligible time 
impact.

The only one that has a measurable impact is the LDA (p) in setbox,
and that can be eliminated by just doing an LDA bit rather than a TXA,
which frees up the X register to hold 0, so that the LDA (p) can become
LDA (p,x).  (I don't get to use that addressing mode much--even if it is
with a constant 0 index.  ;-)

The net impact is +1 cycle in a 166K loop, for a total time cost of 0.17 
seconds!

So I think I'll make the changes and replace ProDOS with an earlier
version, so unenhanced //e's will run the program, too.

BTW, this is a good example of why it's worth reevaluating earlier
decisions later in the game.  I had taken a quick look at this a
week ago, and, without really trying, didn't see an obvious fix.
Now it all seems pretty easy.  ;-)

So I guess this is another confirmation of my belief that the 65C02
extensions were largely unnecessary.

> But, as noted, there are other things about the program(s) that
> would keep it from running on your ][+, too--all fixable with
> some more effort...

There's still that 80-column firmware issue...   Hmmm...

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/12/2006 10:34:12 PM

By your logic, no one should ever write an Apple II program using 65c02
language because then not every A2 owner can run it.  How many Apple II
enthusiasts do you know that don't have AT LEAST one 65c02-equipped
Apple in their arsenal?  I think I only have one Apple II that DOESN'T
have a 65c02 in it (or a processor capable of running 65c02 code).

Chris

Michael Black wrote:
> "BLuRry" (brendan.robert@gmail.com) writes:
> >> I don't believe that!  Check out this Sudoku solver:
> >>     http://homepage.ntlworld.com/valleyway/solver.html
> >>
> >> It's written in Javascript, which of course is a quite inefficient
> >> programming language - yet it needs less than half a second to solve
> >> a sudoku, on a 4+ year old PC running at a moderate 1.8 GHz clock speed.
> >
> > Javascript isn't all that bad. :-P  Sure, it's not exactly suited for
> > realtime applications, but once the code has been interpreted and all
> > data structures are created, it's pretty decent actually.  I've found
> > it to be not much slower than, say, Perl.  Some things in the language
> > are actually a lot faster than you think!  Dynamically-created data
> > structures are not a strong-point of javascript when speed is concerned
> > though.
> >
> > http://www.jorendorff.com/articles/javascript/speed-test.html
> >
> > I don't see enough people rag on the speed of, say, Ruby or Python.
> > :-D
> >
> >> ...but why does it require a 65C02 ?  Couldn't it have been written
> >> in plain 6502 assembly language?
> >
> > *sigh*  Why are there so many anti-65c02 enthusiasts out there?  Can't
> > you afford a free emulator that can handle the opcodes or shell out $6
> > for a new 65c02 processor?  But, seriously, I have a strong preference
> > for 65c02 opcodes in my code as well becase it's a slight bit more
> > convienent.  Why limp when you can walk?  (or use 65816 and go for a
> > jog ;-)
> >
> What's the point of having a program for a relatively new game for
> a computer that hasn't been made in almost twenty years, but then
> you are assuming the most interest will be in running it on an emulator?
>
> At that point, you aren't talking about some functionality of an old
> computer, you are doing it for no good reason.
>
> And once you move away from emulators, you do start limiting the
> computers that can run the program when you use a 65C02.  In other
> words, you may be limiting who can run this program on an actual Apple
> II.
>
> But the question still remains, why is it so important to use
> the few instructions that are unique to the 65C02?  They may be nice
> additions, but lots of people survived without them.
> 
>   Michael

0
Reply Chris 7/12/2006 10:34:22 PM

Chris Alaimo wrote:
> By your logic, no one should ever write an Apple II program using 65c02
> language because then not every A2 owner can run it.  How many Apple II
> enthusiasts do you know that don't have AT LEAST one 65c02-equipped
> Apple in their arsenal?  I think I only have one Apple II that DOESN'T
> have a 65c02 in it (or a processor capable of running 65c02 code).

Although it may sound strange, I think I might try to make exactly
that argument.

In the vast majority of cases, the 65C02 operations bring almost no
advantage to the resourceful programmer, while limiting the "market"
for his product by a significant fraction.

Only when rigid space or time constraints require it would I plan
on using 65C02 ops, and I can count the possible cases on one hand.

It seems to me that the 65C02 extensions were "too little, too late".
They provided a very small advantage in very few cases and at the
significant cost of creating a completely avoidable compatibility
problem.

When an architecture, warts and all, has been around long enough to
be widely exploited, and everyone has learned to live with it, it
doesn't make any sense to make *minor* improvements and break
unconditional compatibility.

As long as there are *any* 6502 Apples out there, that is an overriding
reason to write to the 6502, especially since the cost for doing so is
so low.

In the case of the Sudoku solver, a 30-second full solution is
increased to 30.17 seconds to remain completely compatible!

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/12/2006 10:43:46 PM

> > - Could it be possible to support arrow keys?
> The arrow keys are already supported--did you try them?
It didn't work for some strange reason.  Maybe applewin doesn't like my
laptop (of course I am using a beta release... who knows?)

> > - Could it use mousetext for the display, since it's already singled
> > out to use 65c02, and thus mainly enhanced //'s anyway?
> That was both "over the top" and more of a mess in the Applesoft
> program.  Maybe someone else would like to try it?

I wouldn't mind doing that part.  Yeah, it's ugly but I've done
mousetext from applesoft basic once before.

> Frankly, I kind of like its ugly little text face.  ;-)

Oh, it's not that bad.  The +'s and |'s and -'s give it a leet-looking
appearance that is familar and reminiscent of the earlier ][ programs.
Or .NFO files.  I'm working on a 40-col umn DHGR font and a display
method to render a 40-col screen to the DHGR page, reminscient of the
beagle bros and paul lutus type programs that make the hires screen
emulate the text screen.  this might also be an option for a minimally
painful look-n-feel overhaul if I ever finish it (free time has been a
little fleeting lately)

> > - Maybe make some boxes to uneditable (e.g. keep them hilighted) so
> > that the original numbers can remain unaltered.
>
> Not sure I know what you mean here...  After a complete solution, the
> grid is returned to its original state.  Do you mean provide protection
> of the user from himself?  ;-)

Yes, sorry.  That was pitiful grammar on my part.  Reloading the puzzle
makes sense.  However, I've found that keeping the original numbers
"locked" and hilighted is somewhat useful when I'm trying to figure out
where I messed up or determine which numbers are definately parts of
the solution.  Once you're 75% into the puzzle it makes it harder to
backtrack otherwise -- similarly with the numbers you enter and are
completely sure of.  But, like I said, this is relatively moot.

0
Reply BLuRry 7/12/2006 11:40:09 PM

"Michael J. Mahon" <mjmahon@aol.com> wrote in message 
news:XtudnVI15qUVVCnZnZ2dnUVZ_oydnZ2d@comcast.com...
> Scott Hemphill and I recently collaborated to create a Sudoku puzzle
> solver for Apple II computers.  The machine language solver is Scott's,
> with some adaptations by me, and I wrote the interactive Applesoft
> front-end.
>
> SUDOKU runs on any Apple II with 80-column firmware and a 65C02
> processor, which includes the IIc, the IIc+, the Enhanced //e, and
> the IIgs.
>
> It is amazingly fast!  Most Sudoku solvers run on modern PCs thousands
> of times faster than a 1MHz Apple II, and take seconds to solve a
> puzzle, but Scott's solver is so time and space efficient that it
> solves puzzles in seconds *running on an Apple II*!
>
> Check it out at:
>
> http://members.aol.com/mjmahon/Sudoku.html
>
> It is available both as a ShrinkIt disk archive and as a .dsk image.
>
> Enjoy!
>
> (BTW, the execution profile referred to in the paper has not yet
> been uploaded, so for the time being, that's a 404. ;-)
>
> -michael
>
> Fast Sudoku solving for Apple II's!
> Home page:  http://members.aol.com/MJMahon/
>
> "The wastebasket is our most important design
> tool--and it is seriously underused."

Michael,

    I understand that space saving memory and efficient optimization on 
6502/65C02 processor is best choice.  How do you assume that optimized 
Sudoku solving writted in x86 assembly and 6502/65C02 assembly might have 
the impact speed?  Is it possible that x86 assembly runs at 1GHz might be 
faster or almost close to the same speed to 6502/65C02 assembly at 1MHz?
    One paragraph of your website states that optimization is evil.  I guess 
that many programmers and designer of processor are not smart enough to 
study space saving memory and efficient optimization when they should try to 
write algothrims in smaller functions.
    I expect them to spend extra time and money to write advanced x86 
assembly rather than high level language so they can write PC Operating 
System and games.  They can outperform better.  It is wasting time and money 
to develop modern x86 processors by trying to improve and increase the speed 
while they forget all the optimization.  It is too much dependant on speed 
without optimization so software may operate little fair or poor speed.
    How could you predict if improved processor under the design of 6502 may 
run at 1GHz with 32 bits and 64 bits?  They may have better space saving 
memory and efficient speed than x86 processors.  If it does exist, 
programmers may expect modern emulator projects to be written in 6502 
assembly might be slower than x86 processor like RISC processor.  Do you 
expect?

Bryan Parkoff 


0
Reply Bryan 7/13/2006 12:01:27 AM

pausch@saaf.se (Paul Schlyter) writes:

> In article <m364i29683.fsf@jade.local>,
> Scott Hemphill  <hemphill@alumni.caltech.edu> wrote:
> 
> > pausch@saaf.se (Paul Schlyter) writes:

[snip]

> >> You must have encountered some really sloppily written sudoku solvers
> >> for the PC if they required several seconds to solve a sudoku....
> > 
> > Yup, if by sloppy you mean not choosing the most efficient algorithms.
> 
> That's indeed part of sloppiness!

To be fair, some of these programs were explicitly written with the
idea of using the same techniques that humans use to solve the
puzzles, and some use this feature to assign a puzzle difficulty that
is supposed to correspond in some degree to the difficulty a human
would have.

If you are only interested in computer solution, however, and you want
to be able to solve any puzzle, then you have to implement backtracking.
Backtracking is so simple for computers (but painful for humans) that
it makes sense for it to play a major role in computer solution.

[snip]

> APproximately how much slower do you think a 6502 version would have
> been?  10% slower?  Twice as slow?  10 times as slow?

Apparently under 1% slower, and it looks like Michael might put out a
6502 version in the next couple of days.

[snip]

> > It's public domain, with all sources
> > provided, so anyone is free to modify it for the 6502.  Furthermore,
> > we didn't put in all the speed tweaks we thought of, and Michael has
> > thoughtfully mentioned some of those improvements on his website if
> > anyone wants to pick up where we left off.
> 
> What assembler did you use?

Merlin-8.

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear
0
Reply Scott 7/13/2006 2:40:51 AM

BLuRry wrote:
>>>- Could it be possible to support arrow keys?
>>
>>The arrow keys are already supported--did you try them?
> 
> It didn't work for some strange reason.  Maybe applewin doesn't like my
> laptop (of course I am using a beta release... who knows?)
> 
> 
>>>- Could it use mousetext for the display, since it's already singled
>>>out to use 65c02, and thus mainly enhanced //'s anyway?
>>
>>That was both "over the top" and more of a mess in the Applesoft
>>program.  Maybe someone else would like to try it?
> 
> 
> I wouldn't mind doing that part.  Yeah, it's ugly but I've done
> mousetext from applesoft basic once before.

Be forewarned--I'm thinking about having a small M/L routine just
move the screen image into place--is that compatible with mousetext?

And this is kind of going in the wrong direction, since I'm trying
to make it universal, so it runs on *all* Apple II's...

>>Frankly, I kind of like its ugly little text face.  ;-)
> 
> 
> Oh, it's not that bad.  The +'s and |'s and -'s give it a leet-looking
> appearance that is familar and reminiscent of the earlier ][ programs.
> Or .NFO files.  I'm working on a 40-col umn DHGR font and a display
> method to render a 40-col screen to the DHGR page, reminscient of the
> beagle bros and paul lutus type programs that make the hires screen
> emulate the text screen.  this might also be an option for a minimally
> painful look-n-feel overhaul if I ever finish it (free time has been a
> little fleeting lately)

After you use it for awhile, the framing characters drop out of
consciousness entirely, and only the numbers are there.

BTW, animating this screen in HGR or DHR would have big speed effects!

>>>- Maybe make some boxes to uneditable (e.g. keep them hilighted) so
>>>that the original numbers can remain unaltered.
>>
>>Not sure I know what you mean here...  After a complete solution, the
>>grid is returned to its original state.  Do you mean provide protection
>>of the user from himself?  ;-)
> 
> 
> Yes, sorry.  That was pitiful grammar on my part.  Reloading the puzzle
> makes sense.  However, I've found that keeping the original numbers
> "locked" and hilighted is somewhat useful when I'm trying to figure out
> where I messed up or determine which numbers are definately parts of
> the solution.  Once you're 75% into the puzzle it makes it harder to
> backtrack otherwise -- similarly with the numbers you enter and are
> completely sure of.  But, like I said, this is relatively moot.

Um, editing the puzzle after solution has started can't be done.
You have to re-start the solver.

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/13/2006 2:46:31 AM

> BTW, animating this screen in HGR or DHR would have big speed effects!

Yeah, I know.  I remember some of the good ol' softdisk programs being
a big sluggish.  But if optimized (e.g. only update the characters that
have actually changed) it wouldn't be all that unbearable.

> Um, editing the puzzle after solution has started can't be done.
> You have to re-start the solver.

I meant more along the lines of this: http://www.websudoku.com/

(the original provided numbers of the puzzle are not editable, sort of
a way to prevent you from shooting your own feet.)

0
Reply BLuRry 7/13/2006 3:49:54 AM

BLuRry wrote:
> However, I've found that keeping the original numbers
>"locked" and hilighted is somewhat useful when I'm trying to figure out
>where I messed up or determine which numbers are definately parts of
>the solution.

I haven't tried running the program yet...

but you could:

1)inverse the original puzzle
2)inverse numbers you input
3)surround original puzzle digits with - characters or something


Rich

0
Reply aiiadict 7/13/2006 3:56:23 AM

Michael J. Mahon wrote:
> Michael J. Mahon wrote:
> 
>> Paul Schlyter wrote:
>>
>>> APproximately how much slower do you think a 6502 version would have
>>> been?  10% slower?  Twice as slow?  10 times as slow?
>>
>>
>>
>> Certainly less than twice as slow.  I'll find out what the current
>> speed impact would be...
> 
> 
> Paul's message prompted me to re-look at where 65C02 ops are used, and
> they are all quite easy to eliminate--most with no or negligible time 
> impact.
> 
> The only one that has a measurable impact is the LDA (p) in setbox,
> and that can be eliminated by just doing an LDA bit rather than a TXA,
> which frees up the X register to hold 0, so that the LDA (p) can become
> LDA (p,x).  (I don't get to use that addressing mode much--even if it is
> with a constant 0 index.  ;-)
> 
> The net impact is +1 cycle in a 166K loop, for a total time cost of 0.17 
> seconds!
> 
> So I think I'll make the changes and replace ProDOS with an earlier
> version, so unenhanced //e's will run the program, too.
> 
> BTW, this is a good example of why it's worth reevaluating earlier
> decisions later in the game.  I had taken a quick look at this a
> week ago, and, without really trying, didn't see an obvious fix.
> Now it all seems pretty easy.  ;-)
> 
> So I guess this is another confirmation of my belief that the 65C02
> extensions were largely unnecessary.
> 
>> But, as noted, there are other things about the program(s) that
>> would keep it from running on your ][+, too--all fixable with
>> some more effort...
> 
> 
> There's still that 80-column firmware issue...   Hmmm...

OK, I've done the solver, and in 6502 code, it's 0.85% slower
than the 65C02 version!  Looks like a no-brainer...  ;-)

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/13/2006 5:48:50 AM

BLuRry wrote:
>>BTW, animating this screen in HGR or DHR would have big speed effects!
> 
> 
> Yeah, I know.  I remember some of the good ol' softdisk programs being
> a big sluggish.  But if optimized (e.g. only update the characters that
> have actually changed) it wouldn't be all that unbearable.

No, it would still be unbearable.

The animation that SUDOKU does is not 15fps or 30fps.  It's doing
about 525 screen updates per second!  And if you skip any of them,
you have to look at all 81 digits, because you don't know which ones
have changed!  So anything you hope to make up in the straightaway
you will lose in the curves.  ;-)

This is a problem that hardware character generators were invented
to solve, and every Apple II has one of its very own.  ;-)

Every 'setbox' and 'unsetbox' places and removes a digit from the
screen memory, and each such operation requires 37 cycles.  All but
9 of those are spent just getting the screen address to update, and
only 9 are spent changing the screen memory.  If you have to update
7-8 bytes of HGR memory, or 14-16 bytes of DHR memory, with a pattern
that will take over a dozen cycles to set up addressing for, things
will get s-l-o-w-e-r.

(BTW, an accelerator doesn't help as much as you'd like, since all
the screen stores have to run at 1MHz.)

>>Um, editing the puzzle after solution has started can't be done.
>>You have to re-start the solver.
> 
> 
> I meant more along the lines of this: http://www.websudoku.com/
> 
> (the original provided numbers of the puzzle are not editable, sort of
> a way to prevent you from shooting your own feet.)

Ah.  I suppose this could be useful for experimentation with a puzzle.

I never thought of the program as a "paper" for solving puzzles by
hand, only as a machine solver...  To be used as a "coach", a very
different interface and "coaching" functions would be appropriate.

You might want to write such a program.  ;-)

It wouldn't be too hard to add uneditable boxes to the BASIC program,
just use a different representation for the "givens", like inverse text
(though it looks a little sloppy in the grid, since inverse digits tend
to merge with the major dividing "lines").  And the solver would have to
ignore that distinction by masking the grid entries--no problem.

Have at it!  But you might want to wait a couple of days to let me
get a new version re-converged.

Frankly, I personally don't have much interest in making this program
more like some other program--as I'm sure you can understand.  ;-)

Think of this program more like a "prover"--it tells you whether a
puzzle is inconsistent, has a unique solution, or has many solutions--
but it doesn't help you design puzzles or decide what to do next.

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/13/2006 8:21:49 AM

aiiadict@gmail.com wrote:
> BLuRry wrote:
> 
>>However, I've found that keeping the original numbers
>>"locked" and hilighted is somewhat useful when I'm trying to figure out
>>where I messed up or determine which numbers are definately parts of
>>the solution.
> 
> 
> I haven't tried running the program yet...
> 
> but you could:
> 
> 1)inverse the original puzzle
> 2)inverse numbers you input
> 3)surround original puzzle digits with - characters or something

Yep, 1)'s my first idea.  The only snag is that inverse characters
come in contact with the inverse "major" grid lines, and so tend
to "attach" themselves, which is unattractive.

My real issue is that I don't see the purpose, or the purpose is
at odds with our original design intent.

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/13/2006 8:24:58 AM

Bryan Parkoff wrote:
> "Michael J. Mahon" <mjmahon@aol.com> wrote in message 
> news:XtudnVI15qUVVCnZnZ2dnUVZ_oydnZ2d@comcast.com...
> 
>>Scott Hemphill and I recently collaborated to create a Sudoku puzzle
>>solver for Apple II computers.  The machine language solver is Scott's,
>>with some adaptations by me, and I wrote the interactive Applesoft
>>front-end.
>>
>>SUDOKU runs on any Apple II with 80-column firmware and a 65C02
>>processor, which includes the IIc, the IIc+, the Enhanced //e, and
>>the IIgs.
>>
>>It is amazingly fast!  Most Sudoku solvers run on modern PCs thousands
>>of times faster than a 1MHz Apple II, and take seconds to solve a
>>puzzle, but Scott's solver is so time and space efficient that it
>>solves puzzles in seconds *running on an Apple II*!
>>
>>Check it out at:
>>
>>http://members.aol.com/mjmahon/Sudoku.html
>>
>>It is available both as a ShrinkIt disk archive and as a .dsk image.
>>
>>Enjoy!
>>
>>(BTW, the execution profile referred to in the paper has not yet
>>been uploaded, so for the time being, that's a 404. ;-)
>>
>>-michael
>>
>>Fast Sudoku solving for Apple II's!
>>Home page:  http://members.aol.com/MJMahon/
>>
>>"The wastebasket is our most important design
>>tool--and it is seriously underused."
> 
> 
> Michael,
> 
>     I understand that space saving memory and efficient optimization on 
> 6502/65C02 processor is best choice.  How do you assume that optimized 
> Sudoku solving writted in x86 assembly and 6502/65C02 assembly might have 
> the impact speed?  Is it possible that x86 assembly runs at 1GHz might be 
> faster or almost close to the same speed to 6502/65C02 assembly at 1MHz?

Surely you jest!

x86 code would be approximately as efficient in terms of space and ops
required to get to a solution, but would execute those ops about 6000
times faster!  In fact, Scott's compiled C code for the algorithm solves
the 'evil01' puzzle in about 2 milliseconds, while a 1MHz Apple II takes
about 20 seconds--so there's a factor of 10,000.

>     One paragraph of your website states that optimization is evil.  I guess 
> that many programmers and designer of processor are not smart enough to 
> study space saving memory and efficient optimization when they should try to 
> write algothrims in smaller functions.

"*Premature* optimization is the root of all evil" is a well-known
software engineering aphorism.  It refers to the fact that a well-
structured but inefficent program can be optimized later without much
difficulty, but a program that has become twisted and "spaghettified"
in an effort to speed it up a little is very difficult to improve--and
most big improvements are algorithmic or data structure related, not
just coding hacks.

The meaning of the aphorism is to refrain from optimizing until you have
a reasonably well-structured program actually running.  Then you can
measure it and determine exactly which 10% of it actually makes a real
difference in its efficiency.  Optimization isn't bad, but optimizing
before you know what to optimize or why it matters is *very* bad.  It's
a waste of time and it hardens otherwise malleable code.

>     I expect them to spend extra time and money to write advanced x86 
> assembly rather than high level language so they can write PC Operating 
> System and games.  They can outperform better.  It is wasting time and money 
> to develop modern x86 processors by trying to improve and increase the speed 
> while they forget all the optimization.  It is too much dependant on speed 
> without optimization so software may operate little fair or poor speed.

As you know from my previous posts, I really lament the passing of the
age of craftsmanship in most software projects.  But I also recognize
that the vast majority of code that is written is executed very seldom
if at all (think about the percentage of code that is error recovery
code).  It is unreasonable to expect that most of the large applications
that we use today would be written in assembly language.  (However, I
would like to see measurement-targeted dynamic optimization flourish--
maybe in another decade...)

>     How could you predict if improved processor under the design of 6502 may 
> run at 1GHz with 32 bits and 64 bits?  They may have better space saving 
> memory and efficient speed than x86 processors.  If it does exist, 
> programmers may expect modern emulator projects to be written in 6502 
> assembly might be slower than x86 processor like RISC processor.  Do you 
> expect?

There could hardly be a more apples and oranges comparison.  The 6502
architecture is very different from the x86 architecture in *many* ways.
The efficiency of a computer architecture is not an absolute, but is
completely relative to the nature of the workload that it will process,
and the cost point which it must satisfy.

For two very similar architectures, or for two implementations of the
same architecture, relative evaluations are much simpler, but are still
based on predetermined workload samples.

In the case of the Sudoku solver, I think that the empirical ratio of
performance between a 1MHz 6502 and a 3GHz 4-way superscalar x86 is in
rough agreement with their expected op throughput.  Of course, the 6502
requires much less hardware to achieve its performance, and could, in
principle, run 20-50 times faster with essentially the same logic design
in modern technology, but that is all speculation.

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/13/2006 8:47:31 AM

Scott Hemphill <hemphill@hemphills.net> wrote:

> pausch@saaf.se (Paul Schlyter) writes:
> 
> > In article <XtudnVI15qUVVCnZnZ2dnUVZ_oydnZ2d@comcast.com>,
> > Michael J. Mahon <mjmahon@aol.com> wrote:
> > >It is amazingly fast!  Most Sudoku solvers run on modern PCs thousands
> > >of times faster than a 1MHz Apple II, and take seconds to solve a
> > >puzzle,
> > 
> > I don't believe that!  Check out this Sudoku solver:
> >     http://homepage.ntlworld.com/valleyway/solver.html
> 
> I don't know about _most_ Sudoku solvers, but there are a lot of slow ones
> out there.  And in terms of taking seconds, I think Michael was talking
> about difficult puzzles, not easy ones.  Our most frequently used test
> puzzle was one we called "evil01".  I loaded this into the solver you
> mentioned and it took about 4 seconds on my 3GHz machine.  Here's the
> puzzle:
> 
> .2.......
> ...6....3
> .74.8....
> .....3..2
> .8..4..1.
> 6..5.....
> ....1.78.
> 5....9...
> .......4.

That's certainly a tough one. I regularly do the puzzles in the local
paper, and this one took me about 1h10m, which is twice as long as the
typical time I take for a "diabolical" puzzle.

Only one speculative move and backtrack required, but I had to stretch
my usual techniques somewhat to avoid guesswork near the beginning.

The Sudoku program on my PDA took about ten seconds to verify it was
solvable (it usually takes well under a second).

-- 
David Empson
dempson@actrix.gen.nz
0
Reply dempson 7/13/2006 1:08:34 PM

dempson@actrix.gen.nz (David Empson) writes:

> Scott Hemphill <hemphill@hemphills.net> wrote:
> 
> > pausch@saaf.se (Paul Schlyter) writes:
> > 
> > > In article <XtudnVI15qUVVCnZnZ2dnUVZ_oydnZ2d@comcast.com>,
> > > Michael J. Mahon <mjmahon@aol.com> wrote:
> > > >It is amazingly fast!  Most Sudoku solvers run on modern PCs thousands
> > > >of times faster than a 1MHz Apple II, and take seconds to solve a
> > > >puzzle,
> > > 
> > > I don't believe that!  Check out this Sudoku solver:
> > >     http://homepage.ntlworld.com/valleyway/solver.html
> > 
> > I don't know about _most_ Sudoku solvers, but there are a lot of slow ones
> > out there.  And in terms of taking seconds, I think Michael was talking
> > about difficult puzzles, not easy ones.  Our most frequently used test
> > puzzle was one we called "evil01".  I loaded this into the solver you
> > mentioned and it took about 4 seconds on my 3GHz machine.  Here's the
> > puzzle:
> > 
> > .2.......
> > ...6....3
> > .74.8....
> > .....3..2
> > .8..4..1.
> > 6..5.....
> > ....1.78.
> > 5....9...
> > .......4.
> 
> That's certainly a tough one. I regularly do the puzzles in the local
> paper, and this one took me about 1h10m, which is twice as long as the
> typical time I take for a "diabolical" puzzle.
> 
> Only one speculative move and backtrack required, but I had to stretch
> my usual techniques somewhat to avoid guesswork near the beginning.
> 
> The Sudoku program on my PDA took about ten seconds to verify it was
> solvable (it usually takes well under a second).

If you think that one's tough, try this one.  It's much harder, at least
for my algorithm.  I don't know where I got it, but I saved it under the
filename "hints17-2".  I have four files of the form "hints17-x", all
very tough, but this one is the worst:

...1.2.7..
..5.....9.
....4.....
..8...5...
..9.......
.....6...2
...2......
...6.....5
......9.83

Hmmm.  I just noticed that there are only 17 givens in this puzzle, which
is the minimum number known to be required.  (What I mean is that although
no one has proved a minimum number, there are no known puzzles with
as few as 16 givens that result in a unique solution.)  That might account
for the "17" in the filename.

Scott
-- 
Scott Hemphill	hemphill@alumni.caltech.edu
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear
0
Reply Scott 7/13/2006 1:23:34 PM

Michael J. Mahon wrote:
> BLuRry wrote:
> >>BTW, animating this screen in HGR or DHR would have big speed effects!
> >
> >
> > Yeah, I know.  I remember some of the good ol' softdisk programs being
> > a big sluggish.  But if optimized (e.g. only update the characters that
> > have actually changed) it wouldn't be all that unbearable.
>
> No, it would still be unbearable.
>
> The animation that SUDOKU does is not 15fps or 30fps.  It's doing
> about 525 screen updates per second!  And if you skip any of them,
> you have to look at all 81 digits, because you don't know which ones
> have changed!  So anything you hope to make up in the straightaway
> you will lose in the curves.  ;-)

AH!  I see what you mean now -- I didn't realize it was doing screen
updates while searching for the solution.  It's a little too fast to
catch that, you see?  I think it must have been doing all its
calculations during one VBL cycle. ;-)   I was talking more about just
the UI that you interact with to solve the puzzle.

> (BTW, an accelerator doesn't help as much as you'd like, since all
> the screen stores have to run at 1MHz.)

That and having a large font table would not have enough cache locality
to be very efficient unless you happen to be storing the same character
everywhere.  Either way, I wouldn't bet on an accelorator making things
much better -- and also I wouldn't want to expect the average apple //
user to have one.

-B

0
Reply BLuRry 7/13/2006 1:39:53 PM

> Yep, 1)'s my first idea.  The only snag is that inverse characters
> come in contact with the inverse "major" grid lines, and so tend
> to "attach" themselves, which is unattractive.

There's always a simple solution, like surrounding the numbers with
asterisks or periods or some other character that "shades" the cell.

> My real issue is that I don't see the purpose, or the purpose is
> at odds with our original design intent.

well, that is very fair.  As a prover, this program more than meets the
requirements, and very ellegantly for that matter.  I can understand
why you wouldn't want to duplicate features of  other suducko solver
programs out there, too much UI, and a total pain to write in basic.

I wouldn't mind rewriting the ui to be fancier, but honestly I have
other projects backlogged that I promised myself to work on first (e.g.
get a2gameserver to work properly for //e's, rewrite my raycaster
engine for double-hires and fix bugs in it, etc)

-B

0
Reply BLuRry 7/13/2006 1:44:09 PM

Scott Hemphill <hemphill@hemphills.net> wrote:

> If you think that one's tough, try this one.  It's much harder, at least
> for my algorithm.  I don't know where I got it, but I saved it under the
> filename "hints17-2".  I have four files of the form "hints17-x", all
> very tough, but this one is the worst:
> 
> ..1.2.7..
> .5.....9.
> ...4.....
> .8...5...
> .9.......
> ....6...2
> ..2......
> ..6.....5
> .....9.83

Not even symmetrical.  (Shudder.)

Interestingly my PDA was able to verify this one much faster. I don't
have time to look at it now, but I'll get to it in a few days.

-- 
David Empson
dempson@actrix.gen.nz
0
Reply dempson 7/13/2006 1:44:12 PM

In article <NpidnWmquZS-5ijZnZ2dnUVZ_qOdnZ2d@comcast.com>,
Michael J. Mahon <mjmahon@aol.com> wrote:

A 6502 assembly language programmer using 65C02 specific ops would be
compared to one who uses an original 6502 and there utilizes the
undocumented 6502 ops which certainly will fail on a 65C02.  And for
the same reason: marginal speed and space advantages.


> Chris Alaimo wrote:
>> By your logic, no one should ever write an Apple II program using 65c02
>> language because then not every A2 owner can run it.  How many Apple II
>> enthusiasts do you know that don't have AT LEAST one 65c02-equipped
>> Apple in their arsenal?  I think I only have one Apple II that DOESN'T
>> have a 65c02 in it (or a processor capable of running 65c02 code).
> 
> Although it may sound strange, I think I might try to make exactly
> that argument.
> 
> In the vast majority of cases, the 65C02 operations bring almost no
> advantage to the resourceful programmer, while limiting the "market"
> for his product by a significant fraction.
> 
> Only when rigid space or time constraints require it would I plan
> on using 65C02 ops, and I can count the possible cases on one hand.
> 
> It seems to me that the 65C02 extensions were "too little, too late".
> They provided a very small advantage in very few cases and at the
> significant cost of creating a completely avoidable compatibility
> problem.
> 
> When an architecture, warts and all, has been around long enough to
> be widely exploited, and everyone has learned to live with it, it
> doesn't make any sense to make *minor* improvements and break
> unconditional compatibility.
> 
> As long as there are *any* 6502 Apples out there, that is an overriding
> reason to write to the 6502, especially since the cost for doing so is
> so low.
> 
> In the case of the Sudoku solver, a 30-second full solution is
> increased to 30.17 seconds to remain completely compatible!
> 
> -michael
> 
> Fast Sudoku solver for Apple II's!
> Home page:  http://members.aol.com/MJMahon/
> 
> "The wastebasket is our most important design
> tool--and it's seriously underused."

-- 
----------------------------------------------------------------
Paul Schlyter,  Grev Turegatan 40,  SE-114 38 Stockholm,  SWEDEN
e-mail:  pausch at stockholm dot bostream dot se
WWW:     http://stjarnhimlen.se/
0
Reply pausch 7/13/2006 4:14:01 PM

In article <1152743662.106100.139600@p79g2000cwp.googlegroups.com>,
Chris Alaimo <cpalaimo@gmail.com> wrote:

> By your logic, no one should ever write an Apple II program using 65c02
> language because then not every A2 owner can run it.  How many Apple II
> enthusiasts do you know that don't have AT LEAST one 65c02-equipped
> Apple in their arsenal?

I know myself pretty well... :-) ...and I have only one Apple II: my
very first personal computer, bought in January 1980, before the 65C02
even existed.  Perhaps you think my posession of only one single Apple II
disqualifies me as an "Apple II enthusiast" ????



-- 
----------------------------------------------------------------
Paul Schlyter,  Grev Turegatan 40,  SE-114 38 Stockholm,  SWEDEN
e-mail:  pausch at stockholm dot bostream dot se
WWW:     http://stjarnhimlen.se/
0
Reply pausch 7/13/2006 4:14:01 PM

Paul Schlyter wrote:
> In article <NpidnWmquZS-5ijZnZ2dnUVZ_qOdnZ2d@comcast.com>,
> Michael J. Mahon <mjmahon@aol.com> wrote:
> 
> A 6502 assembly language programmer using 65C02 specific ops would be
> compared to one who uses an original 6502 and there utilizes the
> undocumented 6502 ops which certainly will fail on a 65C02.  And for
> the same reason: marginal speed and space advantages.

An interesting comparison.

While quite different in circumstance (since, at least early in the
cycle one might assume that the legitimate extended variant of an old
standard will itself become the "standard"), I do agree that the results
of such "standard-splintering" events is quite similar.  They fragment
the market with potentially negative results.

An example I am too familiar with is Appleworks.  By version 3.0, it
was the most widely used integrated package in the world, and had
become a "platform" for applications in its own right.  Then, as the
Apple II market was losing momentum, two new versions were introduced
(4 and 5), and, while they never became the dominant version, they did
serve to splinter the Appleworks market, so that the cohesion and
synergy was effectively lost.  It was all downhill from there.

Of course, the Appleworks community was destined to decline anyway
as the hardware platform fell behind, but the Appleworks "improvements"
actually hastened the disintegration of its community.

In a world where models become obsolete and are either retired or
replaced by newer models, the older standard simply vanishes from
the stage, and the "evolution with backward compatibility" model
works well enough.

But in the case of retrocomputing, older models *never* disappear,
so writing for the lowest common denominator *where practical*
should be the rule.

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/13/2006 6:05:48 PM

David Empson wrote:
> Scott Hemphill <hemphill@hemphills.net> wrote:
> 
> 
>>If you think that one's tough, try this one.  It's much harder, at least
>>for my algorithm.  I don't know where I got it, but I saved it under the
>>filename "hints17-2".  I have four files of the form "hints17-x", all
>>very tough, but this one is the worst:
>>
>>..1.2.7..
>>.5.....9.
>>...4.....
>>.8...5...
>>.9.......
>>....6...2
>>..2......
>>..6.....5
>>.....9.83
> 
> 
> Not even symmetrical.  (Shudder.)
> 
> Interestingly my PDA was able to verify this one much faster. I don't
> have time to look at it now, but I'll get to it in a few days.

Scott and I have both noticed that the time for a backtrack solution
depends very strongly on the order in which the boxes are scanned for
a trial placement.

This puzzle is on the SUDOKU disk as VEVIL (Very evil ;-).  I also added
the 180-degree rotation of this puzzle as RVEVIL, where the solver will
effectively scan the grid in the reverse order.  The solver takes 3.5
times longer to solve the rotated puzzle!

Scott did an experiment in which the solver scanned all the "evil"
puzzles in reverse (actually, upper left to lower right) and in a 
majority of cases, the solution times changed by a factor of two
or more.

The solver currently scans "backward", from lower right to upper
left, for efficiency and coding convenience.

BTW, I'm glad that the 1MHz Apple II is hanging in there with your
PDA in solution speed.  ;-)

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/13/2006 6:39:19 PM

BLuRry wrote:
> Michael J. Mahon wrote:

>>The animation that SUDOKU does is not 15fps or 30fps.  It's doing
>>about 525 screen updates per second!  And if you skip any of them,
>>you have to look at all 81 digits, because you don't know which ones
>>have changed!  So anything you hope to make up in the straightaway
>>you will lose in the curves.  ;-)
> 
> 
> AH!  I see what you mean now -- I didn't realize it was doing screen
> updates while searching for the solution.  It's a little too fast to
> catch that, you see?  I think it must have been doing all its
> calculations during one VBL cycle. ;-)   I was talking more about just
> the UI that you interact with to solve the puzzle.

I'd forgotten that you were running it in an emulator!

Try it slowed down to 1MHz and you'll see the animation.

>>(BTW, an accelerator doesn't help as much as you'd like, since all
>>the screen stores have to run at 1MHz.)
> 
> 
> That and having a large font table would not have enough cache locality
> to be very efficient unless you happen to be storing the same character
> everywhere.  Either way, I wouldn't bet on an accelorator making things
> much better -- and also I wouldn't want to expect the average apple //
> user to have one.

That's a good point.  The 8KB Zip Chip cache can hold essentially all
of the code and data now.

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/13/2006 6:55:01 PM

BLuRry wrote:
>>Yep, 1)'s my first idea.  The only snag is that inverse characters
>>come in contact with the inverse "major" grid lines, and so tend
>>to "attach" themselves, which is unattractive.
> 
> 
> There's always a simple solution, like surrounding the numbers with
> asterisks or periods or some other character that "shades" the cell.

The current screen layout is only two lines per grid line (2x9=18),
so an inverse character is always going to contact an adjacent
inverse "line".

>>My real issue is that I don't see the purpose, or the purpose is
>>at odds with our original design intent.
> 
> 
> well, that is very fair.  As a prover, this program more than meets the
> requirements, and very ellegantly for that matter.  I can understand
> why you wouldn't want to duplicate features of  other suducko solver
> programs out there, too much UI, and a total pain to write in basic.
> 
> I wouldn't mind rewriting the ui to be fancier, but honestly I have
> other projects backlogged that I promised myself to work on first (e.g.
> get a2gameserver to work properly for //e's, rewrite my raycaster
> engine for double-hires and fix bugs in it, etc)

If you do get a chance, go for it!  It's not like we're drowning in
new software for the Apple II.  ;-)

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/13/2006 6:57:52 PM

David Empson <dempson@actrix.gen.nz> wrote:

> Scott Hemphill <hemphill@hemphills.net> wrote:
> 
> > If you think that one's tough, try this one.  It's much harder, at least
> > for my algorithm.  I don't know where I got it, but I saved it under the
> > filename "hints17-2".  I have four files of the form "hints17-x", all
> > very tough, but this one is the worst:
> > 
> > ..1.2.7..
> > .5.....9.
> > ...4.....
> > .8...5...
> > .9.......
> > ....6...2
> > ..2......
> > ..6.....5
> > .....9.83
> 
> Not even symmetrical.  (Shudder.)
> 
> Interestingly my PDA was able to verify this one much faster. I don't
> have time to look at it now, but I'll get to it in a few days.

That was a doddle. I did it before I went to sleep. It took me about 15
minutes, though I did need one guess. The previous "evil101" was much
harder.

This "hints17-x" one might be harder for a simple brute force solution,
but not if it is solved logically.

-- 
David Empson
dempson@actrix.gen.nz
0
Reply dempson 7/13/2006 9:26:20 PM

In article <yvOdnYZ_3_3jFivZnZ2dnUVZ_sSdnZ2d@comcast.com>,
Michael J. Mahon <mjmahon@aol.com> wrote:

> Paul Schlyter wrote:
>> In article <NpidnWmquZS-5ijZnZ2dnUVZ_qOdnZ2d@comcast.com>,
>> Michael J. Mahon <mjmahon@aol.com> wrote:
>> 
>> A 6502 assembly language programmer using 65C02 specific ops would be
>> compared to one who uses an original 6502 and there utilizes the
>> undocumented 6502 ops which certainly will fail on a 65C02.  And for
>> the same reason: marginal speed and space advantages.
> 
> An interesting comparison.
> 
> While quite different in circumstance (since, at least early in the
> cycle one might assume that the legitimate extended variant of an old
> standard will itself become the "standard"), I do agree that the results
> of such "standard-splintering" events is quite similar.  They fragment
> the market with potentially negative results.
> 
> An example I am too familiar with is Appleworks.  By version 3.0, it
> was the most widely used integrated package in the world, and had
> become a "platform" for applications in its own right.  Then, as the
> Apple II market was losing momentum, two new versions were introduced
> (4 and 5), and, while they never became the dominant version, they did
> serve to splinter the Appleworks market, so that the cohesion and
> synergy was effectively lost.  It was all downhill from there.
> 
> Of course, the Appleworks community was destined to decline anyway
> as the hardware platform fell behind, but the Appleworks "improvements"
> actually hastened the disintegration of its community.
> 
> In a world where models become obsolete and are either retired or
> replaced by newer models, the older standard simply vanishes from
> the stage, and the "evolution with backward compatibility" model
> works well enough.
> 
> But in the case of retrocomputing, older models *never* disappear,
> so writing for the lowest common denominator *where practical*
> should be the rule.

Quite true!

Anyone got a sudoku solver for the ENIAC?  I'd like to try to run it on
this ENIAC simulator:   http://page.mi.fu-berlin.de/~zoppke/eniac/

An Apple II emulator for the ENIAC will be ok too...

:-)


> -michael
> 
> Fast Sudoku solver for Apple II's!
> Home page:  http://members.aol.com/MJMahon/
> 
> "The wastebasket is our most important design
> tool--and it's seriously underused."
-- 
----------------------------------------------------------------
Paul Schlyter,  Grev Turegatan 40,  SE-114 38 Stockholm,  SWEDEN
e-mail:  pausch at stockholm dot bostream dot se
WWW:     http://stjarnhimlen.se/
0
Reply pausch 7/13/2006 9:43:37 PM

Michael J. Mahon wrote:
> Scott Hemphill and I recently collaborated to create a Sudoku puzzle
> solver for Apple II computers.

Very neat!  I had no idea you could get a solver like this
to run so fast on the IIe!

It's fun to watch the solver go over the more complex puzzles...
watch where it's thinking..

Rich

0
Reply aiiadict 7/14/2006 12:37:25 AM

> I'd forgotten that you were running it in an emulator!
>
> Try it slowed down to 1MHz and you'll see the animation.

Actually, I never speed up an emulator, except disk i/o.  I noticed
numbers appear on the screen, but the solver is too fast for me to
really see every little step it takes, unless I load one of the evil
puzzles.   Man this thing is fast.  Still astounded by what you guys
pulled off.  Have a good week!

-B

0
Reply BLuRry 7/14/2006 4:06:43 AM

"Michael J. Mahon" <mjmahon@aol.com> wrote in message
news:GL2dnfZbpopeQyjZnZ2dnUVZ_rKdnZ2d@comcast.com...
> Michael J. Mahon wrote:
> > Michael J. Mahon wrote:
> >
> >> Paul Schlyter wrote:
> >>
> >>> APproximately how much slower do you think a 6502 version would have
> >>> been?  10% slower?  Twice as slow?  10 times as slow?
> >>
> >>
> >>
> >> Certainly less than twice as slow.  I'll find out what the current
> >> speed impact would be...
> >
> >
> > Paul's message prompted me to re-look at where 65C02 ops are used, and
> > they are all quite easy to eliminate--most with no or negligible time
> > impact.
> >
> > The only one that has a measurable impact is the LDA (p) in setbox,
> > and that can be eliminated by just doing an LDA bit rather than a TXA,
> > which frees up the X register to hold 0, so that the LDA (p) can become
> > LDA (p,x).  (I don't get to use that addressing mode much--even if it is
> > with a constant 0 index.  ;-)

Speaking of 'setbox', I'm surprised that you used a lookup table to
determine screen addresses but did not use one for the 'other' array. If I
understand the code correctly, you could replace the
'best*24+other_base_address' calculation with a pre-computed value. Also you
could get rid of three '-1' values in 80 rows of 'other', for a net savings
of 80 bytes to boot.

Heck, you could get rid of that last -1 value in each row as well: add $80
to the last entry, so that the ASL instruction in the main loop of 'setbox'
sets the carry for that entry, clears it for all the others. Then after
you've set the bit of interest, break the loop if the carry flag is set.
That also saves one LDA instruction at the top of the loop (20 executions
instead of 21).

And then you might consider breaking the 'setbox' loop in two loops, one for
setting the low byte and one setting the 9th bit. That would eliminate the
redundant test executed inside the loop: just test it once to decide which
loop to execute. Of course all the loop control code would be duplicated and
the only differences would be the few instructions that actually set bits,
but you just saved a bunch of space by implementing that 'others' lookup
table.

And of course you don't need 65C02 instructions to do any of this :)

Something like this:

    ...
    lda best
    asl
    tax
    lda otherptr,x
    sta p
    lda otherptr+1,x
    sta p+1
    ldx #20-1
    lda bit
    beq    high_bit_loop

:1  txa
    tay
    lda (p),y
    asl
    tay
    lda (bits),y
    ora bit
    sta (bits),y
    dex
    bpl :1
    rts

high_bit_loop:

     txa
    tay
    lda (p),y
    asl
    tay
    iny
    lda (bits),y
    ora #$01
    sta (bits),y
    dex
    bpl high_bit_loop
    rts

....to take advantage of the fact that there are two indirect pointers
already. Can use the X-register for counting directly,
and don't need to put any kind of special coded values in the 'other' table
at all.

And if that works, why not store the neighbor square# times two in the
'other' table and eliminate the ASL instruction in each loop?

Oh, I gotta stop now...

- Anton Treuenfels


0
Reply Anton 7/14/2006 4:32:09 AM

Anton Treuenfels wrote:
> "Michael J. Mahon" <mjmahon@aol.com> wrote in message
> news:GL2dnfZbpopeQyjZnZ2dnUVZ_rKdnZ2d@comcast.com...
> 
>>Michael J. Mahon wrote:
>>
>>>Michael J. Mahon wrote:
>>>
>>>
>>>>Paul Schlyter wrote:
>>>>
>>>>
>>>>>APproximately how much slower do you think a 6502 version would have
>>>>>been?  10% slower?  Twice as slow?  10 times as slow?
>>>>
>>>>
>>>>
>>>>Certainly less than twice as slow.  I'll find out what the current
>>>>speed impact would be...
>>>
>>>
>>>Paul's message prompted me to re-look at where 65C02 ops are used, and
>>>they are all quite easy to eliminate--most with no or negligible time
>>>impact.
>>>
>>>The only one that has a measurable impact is the LDA (p) in setbox,
>>>and that can be eliminated by just doing an LDA bit rather than a TXA,
>>>which frees up the X register to hold 0, so that the LDA (p) can become
>>>LDA (p,x).  (I don't get to use that addressing mode much--even if it is
>>>with a constant 0 index.  ;-)
> 
> 
> Speaking of 'setbox', I'm surprised that you used a lookup table to
> determine screen addresses but did not use one for the 'other' array. If I
> understand the code correctly, you could replace the
> 'best*24+other_base_address' calculation with a pre-computed value. Also you
> could get rid of three '-1' values in 80 rows of 'other', for a net savings
> of 80 bytes to boot.

I think the simple answer is that the pieces of code were written by
different people at different times.  ;-)

> Heck, you could get rid of that last -1 value in each row as well: add $80
> to the last entry, so that the ASL instruction in the main loop of 'setbox'
> sets the carry for that entry, clears it for all the others. Then after
> you've set the bit of interest, break the loop if the carry flag is set.
> That also saves one LDA instruction at the top of the loop (20 executions
> instead of 21).

Switching to count control for the loop is a big win.

> And then you might consider breaking the 'setbox' loop in two loops, one for
> setting the low byte and one setting the 9th bit. That would eliminate the
> redundant test executed inside the loop: just test it once to decide which
> loop to execute. Of course all the loop control code would be duplicated and
> the only differences would be the few instructions that actually set bits,
> but you just saved a bunch of space by implementing that 'others' lookup
> table.
> 
> And of course you don't need 65C02 instructions to do any of this :)

Right.  I've already eliminated the 65C02 instructions, and one of the
paths not (yet?) taken was splitting the loop.

> Something like this:
> 
>     ...
>     lda best
>     asl
>     tax
>     lda otherptr,x
>     sta p
>     lda otherptr+1,x
>     sta p+1
>     ldx #20-1
>     lda bit
>     beq    high_bit_loop
> 
> :1  txa
>     tay
>     lda (p),y
>     asl
>     tay
>     lda (bits),y
>     ora bit
>     sta (bits),y
>     dex
>     bpl :1
>     rts
> 
> high_bit_loop:
> 
>      txa
>     tay
>     lda (p),y
>     asl
>     tay
>     iny
>     lda (bits),y
>     ora #$01
>     sta (bits),y
>     dex
>     bpl high_bit_loop
>     rts
> 
> ...to take advantage of the fact that there are two indirect pointers
> already. Can use the X-register for counting directly,
> and don't need to put any kind of special coded values in the 'other' table
> at all.
> 
> And if that works, why not store the neighbor square# times two in the
> 'other' table and eliminate the ASL instruction in each loop?

It's all good...I'm working on the solver as we "speak".

There's no need to do an OR in the "=9" case, since the only values
that byte can take on are 0 and 1.  So LDA #1; STA (bits),y will do
fine.  This, together with other changes, gets this loop down to 18
cycles per iteration--too bad that it's much less frequent than the
other case.  ;-)

> Oh, I gotta stop now...

Please don't!  ;-)

BTW, as I was thinking about the additional ops in the 65C02, I realized
how much more useful it would have been to provide more symmetry between
the X and Y registers.  For example, the register juggling in the above
loop would be unnecessary if you could post-index by X as well as Y.

Kind of makes you wonder who designed the extensions...

We can avoid this juggling by modifying the addresses of the "neighbor
index" loads, outside the loop.  This allows X to be used directly as
an index.  When combined with storing the index already doubled, it
drops the "<9" loop to 22 cycles per iteration, resulting in a net
saving of about 260 cycles per 'setbox' call.  In our benchmark
puzzle, it's called about 8300 times, for a saving of about 2.3 seconds
out of 30, or 7.6%.  Though I prefer factors of two, I'll take it! ;-)

-michael

Even faster Sudoku solver for *all* Apple II's soon!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/14/2006 10:26:05 AM

BLuRry wrote:
>>I'd forgotten that you were running it in an emulator!
>>
>>Try it slowed down to 1MHz and you'll see the animation.
> 
> 
> Actually, I never speed up an emulator, except disk i/o.  I noticed
> numbers appear on the screen, but the solver is too fast for me to
> really see every little step it takes, unless I load one of the evil
> puzzles.   Man this thing is fast.  Still astounded by what you guys
> pulled off.  Have a good week!

When you try one that takes more than 5 seconds, the animation will
amuse you.  It's actually fun to watch it trying various permutations.
;-)

You have a good one, too!

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/14/2006 10:28:46 AM

aiiadict@gmail.com wrote:
> Michael J. Mahon wrote:
> 
>>Scott Hemphill and I recently collaborated to create a Sudoku puzzle
>>solver for Apple II computers.
> 
> 
> Very neat!  I had no idea you could get a solver like this
> to run so fast on the IIe!
> 
> It's fun to watch the solver go over the more complex puzzles...
> watch where it's thinking..

Glad you like it, Rich!

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/14/2006 10:29:17 AM

Hey, Mike!

Michael J. Mahon wrote:
> Glad you like it, Rich!

    You have the REMs at the end of line numbers 410 and 420 backwards.

Willi

0
Reply A2Pro 7/14/2006 8:10:19 PM

A2Pro wrote:
> Hey, Mike!
> 
> Michael J. Mahon wrote:
> 
>>Glad you like it, Rich!
> 
> 
>     You have the REMs at the end of line numbers 410 and 420 backwards.

Oops--thanks!

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/14/2006 9:08:56 PM

Hi!

Michael J. Mahon wrote:
> A2Pro wrote:
> >     You have the REMs at the end of line numbers 410 and 420 backwards.
>
> Oops--thanks!

    The end of "sudoku.solver.s" has the line 'bitmap ds 0'.  What is
its actual size?  I think it's 162 * 2 but I thought I'd better check
with you.

Willi

0
Reply A2Pro 7/15/2006 12:40:04 AM

"Michael J. Mahon" <mjmahon@aol.com> wrote in message
news:oridnYySprSi7CrZnZ2dnUVZ_sadnZ2d@comcast.com...
> Anton Treuenfels wrote:
> > "Michael J. Mahon" <mjmahon@aol.com> wrote in message
> > news:GL2dnfZbpopeQyjZnZ2dnUVZ_rKdnZ2d@comcast.com...
> >
> >>Michael J. Mahon wrote:

> It's all good...I'm working on the solver as we "speak".
>
> There's no need to do an OR in the "=9" case, since the only values
> that byte can take on are 0 and 1.  So LDA #1; STA (bits),y will do
> fine.  This, together with other changes, gets this loop down to 18
> cycles per iteration--too bad that it's much less frequent than the
> other case.  ;-)
>
> > Oh, I gotta stop now...
>
> Please don't!  ;-)
>

Ah, ya talked me into it...but my next idea is a bit more work...

> BTW, as I was thinking about the additional ops in the 65C02, I realized
> how much more useful it would have been to provide more symmetry between
> the X and Y registers.  For example, the register juggling in the above
> loop would be unnecessary if you could post-index by X as well as Y.
>
> Kind of makes you wonder who designed the extensions...

Fred Bowen and friends designed the 65CE02, which had a Z-register that acts
like the Y-registers and 'usurps' the non-indexed indirect instructions of
the 65C02. So

lda (zp)

is really

lda (zp),z

but performs identically to a 65C02 if you never change the Z-register.

I may add the 65CE02 instructions to my assembler just because I think
they're neat - better than the W65C16S, for instance.

> We can avoid this juggling by modifying the addresses of the "neighbor
> index" loads, outside the loop.  This allows X to be used directly as
> an index.  When combined with storing the index already doubled, it
> drops the "<9" loop to 22 cycles per iteration, resulting in a net
> saving of about 260 cycles per 'setbox' call.  In our benchmark
> puzzle, it's called about 8300 times, for a saving of about 2.3 seconds
> out of 30, or 7.6%.  Though I prefer factors of two, I'll take it! ;-)

Self-modifying code?  How about this instead:

    ldx best
    lda bit
    beq set_high_byte_bit:

set_low_byte_bit::

    ldy  other00,x
    lda (bits),y
    ora bit
    sta (bits),y
    ldy  other01,x
    lda (bits),y
    ora bit
    sta (bits),y
    ...                    ; and so on (loop unrolling)
    ldy other19,x
    lda (bits),y
    ora bit
    sta (bits),y
    rts

set_high_byte_bit:

    lda #$01
    ldy other00,x
    iny
    sta (bits),y
    ldy other01,x
    iny
    sta (bits),y
    ...                    ; same unrolled loop
    ldy other19,x
    iny
    sta (bits),y
    rts

The key here is that the 'other' table has to be re-arranged (which is where
most of the work involved occurs). Instead of 81 20-byte tables, use 20
81-byte tables. The first table holds the first 'other' cell for all 81
cells, the second table the second, and so on. The 'best' value indexes the
cell we want to find the 'others' of directly, once for each of the 20
tables.

No address calculation, no counting, no loop overhead, no self-modifying
code - and it's pure 6502 (of course!).

- Anton Treuenfels


0
Reply Anton 7/15/2006 3:47:12 AM

This thread is cross-posted to three different groups which isn't really 
necessary. Just about every Apple II person who reads one of the 
comp.sys.apple2 groups reads all the others.
0
Reply Sean 7/15/2006 5:54:41 AM

Anton Treuenfels wrote:
> "Michael J. Mahon" <mjmahon@aol.com> wrote in message
> news:oridnYySprSi7CrZnZ2dnUVZ_sadnZ2d@comcast.com...
> 
>>Anton Treuenfels wrote:
>>
>>>"Michael J. Mahon" <mjmahon@aol.com> wrote in message
>>>news:GL2dnfZbpopeQyjZnZ2dnUVZ_rKdnZ2d@comcast.com...
>>>
>>>
>>>>Michael J. Mahon wrote:
> 
> 
>>It's all good...I'm working on the solver as we "speak".
>>
>>There's no need to do an OR in the "=9" case, since the only values
>>that byte can take on are 0 and 1.  So LDA #1; STA (bits),y will do
>>fine.  This, together with other changes, gets this loop down to 18
>>cycles per iteration--too bad that it's much less frequent than the
>>other case.  ;-)
>>
>>
>>>Oh, I gotta stop now...
>>
>>Please don't!  ;-)
>>
> 
> 
> Ah, ya talked me into it...but my next idea is a bit more work...
> 
> 
>>BTW, as I was thinking about the additional ops in the 65C02, I realized
>>how much more useful it would have been to provide more symmetry between
>>the X and Y registers.  For example, the register juggling in the above
>>loop would be unnecessary if you could post-index by X as well as Y.
>>
>>Kind of makes you wonder who designed the extensions...
> 
> 
> Fred Bowen and friends designed the 65CE02, which had a Z-register that acts
> like the Y-registers and 'usurps' the non-indexed indirect instructions of
> the 65C02. So
> 
> lda (zp)
> 
> is really
> 
> lda (zp),z
> 
> but performs identically to a 65C02 if you never change the Z-register.

Nice.  Another index register is a very powerful addition.

> I may add the 65CE02 instructions to my assembler just because I think
> they're neat - better than the W65C16S, for instance.
> 
> 
>>We can avoid this juggling by modifying the addresses of the "neighbor
>>index" loads, outside the loop.  This allows X to be used directly as
>>an index.  When combined with storing the index already doubled, it
>>drops the "<9" loop to 22 cycles per iteration, resulting in a net
>>saving of about 260 cycles per 'setbox' call.  In our benchmark
>>puzzle, it's called about 8300 times, for a saving of about 2.3 seconds
>>out of 30, or 7.6%.  Though I prefer factors of two, I'll take it! ;-)
> 
> 
> Self-modifying code?  How about this instead:
> 
>     ldx best
>     lda bit
>     beq set_high_byte_bit:
> 
> set_low_byte_bit::
> 
>     ldy  other00,x
>     lda (bits),y
>     ora bit
>     sta (bits),y
>     ldy  other01,x
>     lda (bits),y
>     ora bit
>     sta (bits),y
>     ...                    ; and so on (loop unrolling)
>     ldy other19,x
>     lda (bits),y
>     ora bit
>     sta (bits),y
>     rts
> 
> set_high_byte_bit:
> 
>     lda #$01
>     ldy other00,x
>     iny
>     sta (bits),y
>     ldy other01,x
>     iny
>     sta (bits),y
>     ...                    ; same unrolled loop
>     ldy other19,x
>     iny
>     sta (bits),y
>     rts
 >
> The key here is that the 'other' table has to be re-arranged (which is where
> most of the work involved occurs). Instead of 81 20-byte tables, use 20
> 81-byte tables. The first table holds the first 'other' cell for all 81
> cells, the second table the second, and so on. The 'best' value indexes the
> cell we want to find the 'others' of directly, once for each of the 20
> tables.

Transposing the array--always an interesting strategy to evaluate
on a 6502.

> No address calculation, no counting, no loop overhead, no self-modifying
> code - and it's pure 6502 (of course!).

I notice that I have an aversion to self-modifying code, which is
justified in some instances.  But going for speed when all code is
in RAM is not one of those instances.  Code modification is quite
manageable when done in a structured way, and when the modifications
are done orders of magnitude less often than the modified addresses
are used.

BTW, I had no bugs related to the address modification, though I did
find an assembler bug related to incorrectly chosing a zero-page variant
for an instruction with a non-zero page operand of the form "xxx+0*0".

I added a table to multiply an index register by two into the other
index register--4 cycles and doesn't clobber the A reg--no biggie,
but handy in avoiding saves/reloads.

We considered unrolling early in the game, but the space used is pretty
high for the benefit, especially when compared with what can be done
with modified addresses.  I've taken the latter path, in 'select',
'setbox', and the 'bits' save on recursion (not so important), and
got a total speed improvement of a little more than 18%.

Frankly, I'm not sure it was worth the work!

I'll look at the timing of your version of 'setbox'.

I also put the screen painting routine into the solver, and that makes
it much snappier on a 1MHz machine.  And I added a "clear bottom 3
lines" routine so I wouldn't need the 80-column firmware's "clear to
end of screen" control character.

I should have the new version up by Monday, after some more testing.

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/16/2006 6:06:00 AM

Sean Fahey wrote:
> This thread is cross-posted to three different groups which isn't really 
> necessary. Just about every Apple II person who reads one of the 
> comp.sys.apple2 groups reads all the others.

I agree, Sean--it was just all followup to the announcement.

After this, I'll reply only in 'programmer'.

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply mjmahon (7061) 7/16/2006 6:07:54 AM

A2Pro wrote:
> Hi!
> 
> Michael J. Mahon wrote:
> 
>>A2Pro wrote:
>>
>>>    You have the REMs at the end of line numbers 410 and 420 backwards.
>>
>>Oops--thanks!
> 
> 
>     The end of "sudoku.solver.s" has the line 'bitmap ds 0'.  What is
> its actual size?  I think it's 162 * 2 but I thought I'd better check
> with you.

As it says in the comment ( ;-), the worst case recursion is 81 levels.
That won't practically happen, especially with the very effective
"expensive" recursion avoidance that Scott used.  The deepest I have
seen it go is 16 levels.

In the latest version, since space is not cramped, I traded space for
some time saving by page-aligning the 'bits' stack, and making each
"frame" a full page, wasting 94 bytes per level, but saving address
modification time.

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply mjmahon (7061) 7/16/2006 6:12:29 AM

"Michael J. Mahon" <mjmahon@aol.com> wrote in message >

> I notice that I have an aversion to self-modifying code, which is
> justified in some instances.  But going for speed when all code is
> in RAM is not one of those instances.  Code modification is quite
> manageable when done in a structured way, and when the modifications
> are done orders of magnitude less often than the modified addresses
> are used.

Actually the only place I use self-modifying code heavily is in the
instruction fetch loop of an interpreter - and there the modified address is
used immediately and usually only once :)

> BTW, I had no bugs related to the address modification, though I did
> find an assembler bug related to incorrectly chosing a zero-page variant
> for an instruction with a non-zero page operand of the form "xxx+0*0".

A bug in Merlin? Maybe, but IIRC Merlin doesn't have operator precedence.
Evaluated left-to-right, the result of this expression is zero, so Merlin
probably believes zero-page is the correct variant. There's a way to force
the absolute variant by adding a character to the opcode, or something like
that. Just a thought.

- Anton Treuenfels


0
Reply Anton 7/16/2006 10:16:51 PM

Anton Treuenfels wrote:
> "Michael J. Mahon" <mjmahon@aol.com> wrote in message >
> 
>>I notice that I have an aversion to self-modifying code, which is
>>justified in some instances.  But going for speed when all code is
>>in RAM is not one of those instances.  Code modification is quite
>>manageable when done in a structured way, and when the modifications
>>are done orders of magnitude less often than the modified addresses
>>are used.
> 
> 
> Actually the only place I use self-modifying code heavily is in the
> instruction fetch loop of an interpreter - and there the modified address is
> used immediately and usually only once :)

That doesn't give you much performance leverage, but a cycle is
a cycle...

>>BTW, I had no bugs related to the address modification, though I did
>>find an assembler bug related to incorrectly chosing a zero-page variant
>>for an instruction with a non-zero page operand of the form "xxx+0*0".
> 
> 
> A bug in Merlin? Maybe, but IIRC Merlin doesn't have operator precedence.
> Evaluated left-to-right, the result of this expression is zero, so Merlin
> probably believes zero-page is the correct variant. There's a way to force
> the absolute variant by adding a character to the opcode, or something like
> that. Just a thought.

Right you are--silly me!  I use the left-to-right evaluation all over
the place, but when writing a "fill in the blanks" address, my old
(mainframe) convention of using "0*0" hoist me by my own petard!  ;-)
(I've learned not to use "xxx+*-*" because another assembler got
confused about relocation attributes!)

 From now on, it's just "xxx+00".

-michael

Fast Sudoku solver for Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/17/2006 4:29:28 AM

Hi!

Michael J. Mahon wrote:
> As it says in the comment ( ;-), the worst case recursion is 81 levels.
> That won't practically happen, especially with the very effective
> "expensive" recursion avoidance that Scott used.  The deepest I have
> seen it go is 16 levels.
>
> In the latest version, since space is not cramped, I traded space for
> some time saving by page-aligning the 'bits' stack, and making each
> "frame" a full page, wasting 94 bytes per level, but saving address
> modification time.

    I translated the Applesoft front-end to Kyan Pascal and modified
"SUDOKU.SOLVER.S" so that the Kyan assembler could cope with it.  The
grid display is much faster and the program solved a simple puzzle.
But then it blew sky high.  I see in another thread that you've redone
the front-end so I've stopped work on my version and am going back to
working on upgrading the compiler.

Willi

0
Reply A2Pro 7/17/2006 4:35:13 AM

A2Pro wrote:
> Hi!
> 
> Michael J. Mahon wrote:
> 
>>As it says in the comment ( ;-), the worst case recursion is 81 levels.
>>That won't practically happen, especially with the very effective
>>"expensive" recursion avoidance that Scott used.  The deepest I have
>>seen it go is 16 levels.
>>
>>In the latest version, since space is not cramped, I traded space for
>>some time saving by page-aligning the 'bits' stack, and making each
>>"frame" a full page, wasting 94 bytes per level, but saving address
>>modification time.
> 
> 
>     I translated the Applesoft front-end to Kyan Pascal and modified
> "SUDOKU.SOLVER.S" so that the Kyan assembler could cope with it.  The
> grid display is much faster and the program solved a simple puzzle.
> But then it blew sky high.  I see in another thread that you've redone
> the front-end so I've stopped work on my version and am going back to
> working on upgrading the compiler.

It's likely that the 'bitmap' stack grew enough to clobber something--
like the runtime...  When the solver returned to the Pascal program,
it went into never-never land.

I've the "version 1" stack grow as large as 2600 bytes, and its
theoretical max is over 13KB.  The version 2.0 solver is more wasteful
of stack space to maintain page alignment, so it can theoretically get
over 20KB.  That's a fair amount of space to carve out of a "host"
language system.

Since I develop on a //e with an 8MHz Zip Chip, I didn't really notice
how long Applesoft took to paint the screen until I ran it on a stock
machine.  That's when I started writing the grid generator in M/L.  ;-)
It's a lot snappier now.

-michael

New, faster SUDOKU v2.0 solver for *all* Apple II's!
Home page:  http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."
0
Reply Michael 7/17/2006 6:56:47 AM

58 Replies
136 Views

(page loaded in 0.425 seconds)

12/16/2012 10:16:46 AM


Reply: