COMPGROUPS.NET | Search | Post Question | Groups | Stream | About | Register

### Saturn assembly code suggestions?

• Follow

```A few weeks ago I posted here about my first dabblings in assembly
language on the hp50g and got some very useful advice.

I now have a pretty rudimentary "Cows and Bulls" game with my addition
to say how many candidates remain that match the guesses so far.

The first move takes a second or two (down from about 30 seconds
for my first assembly language algorithm).  This delay is due
to the fact that it has to test 3024 ( == 9x8x7x6) candidates the
first time.  Since the first move typically pares the list down
to a few hundred, subsequent moves are virtually instantaneous.

I'm interested to see if any extra speed can be wrung out of the
current algorithm (though I'm guessing it'll never be efficient
enough to be playable on my old 48SX).

I'll quote a small bit of code below.  I think it's pretty clear, but
I'll outline what it's doing afterwards anyway.  I can see a pretty
easy optimisation I could make, using self modifying code, but I don't
fancy that too much.  I'm not all that experienced with the Saturn
instruction set and I hope someone more knowledgeable can suggest a
cleaner way to speed it up...

The code snippet (this function is called 4 times for each of the
3024 candidates first time through):

*
* Remove Pth digit from B, returning it in P.
* At entry:
*   B -- digit array
*   P -- digit to translate
* At exit:
*   B -- array with higher order digits shifted over Pth digit
*   P -- translated digit
*   C -- A, ready to have its digit overwritten
*
PthDig
C=B          W
CSR          W               Close up the higher order digits
P=P-1
GOC          NoCombine
C=B          WP              Keep digits below the selected one
NoCombine
P=P+1
CBEX         W               B updated, C is the original
GOTO         L-enter
Shiftit
CSR          W               Probably faster with smc
L-enter
P=P-1
GONC         Shiftit
P=C          0
C=A          A
RTN

I'm using the lowest 9 nibbles of B as an array.  It will start
with the digits 1-9 in sequence.  Once a digit has been selected,
the higher digits get packed down.  For example I might request
the 3rd, 0th, 6th, 5th digits.  This would translate as

123456789   3rd -> 4
12356789    0th -> 1
2356789     6th -> 9
235678      5th -> 8

Oddly enough, the WP field is *perfect* for closing up the gap
in the array, but the much simpler problem of picking out the
array member in question calls for that slow Shiftit loop.
In short, what I'd like is to be able to extract the P field
into the P register.

P=C		P

The nearest I can see to this is:

P=C		n

with the n in the machine code being overwritten at run time.
I don't fancy this, and I gather it would no longer work from
a port either. (I still don't have much idea of what happens
when a program gets mapped from a port, but I gather it's
not too forgiving of shonky practices)

Oh, and, as I mentioned, I have an SX too, so I'm trying to
do this with "classic" Saturn instructions.  What's life
without a challenge after all?

Any suggestions would be most welcome.

Have fun,

Robert.
--
Robert Swan		| No, not the antarctic adventurer.
swan.r.l@gmail.com	| No, not the Canberra porn monger.
| Yes, that's right, the boring one.
```
 0

```Hi,

I won't dive into this too far, but just a note:
If you use self-modfying code,
I'd suggest moving the code skeleton to
a scratch RAM area, like one of the buffers,
then only modify aprts of the code there,
and call it from your USEROB area main code.

This way you'll avoid various problems regarding
code and checksum consistency.

When writing a code slice for the n-Dames problem
'contest' for the HP-48 in ML a while ago,
I also had a code variant which acted like described above.
The code wasn't actually modifying itself,
but it was modified from the main routine,
and then called from there.

The modified code was very fast and compact,
but the modifying itself, writing it to the destination,
and then calling the code ate up more CPU cycles
than a 'hard-coded' version, and thus the latter was slightly faster.

Generally I don't suggest using self-modifying or similar code
on the Saturn-based machines, as it's very hard to maintain,
can lead to the consistency problems mentioned above
and is also not very clean programming style IMHO;-)

And as you mentioned, it's not bad to use the level 2 instruction set,
to make your code usable on real Satunr machines, too.

HTH

Raymond

"Robert Swan" <roberts@athlonx2.gville> schrieb im Newsbeitrag
news:q2lvh5-amf.ln1@rswan.netspace.net.au...
>A few weeks ago I posted here about my first dabblings in assembly
> language on the hp50g and got some very useful advice.
>
> [..]
> Oh, and, as I mentioned, I have an SX too, so I'm trying to
> do this with "classic" Saturn instructions.  What's life
> without a challenge after all?
>

```
 0

```One simple suggestion I can think of is using a jump table:

.... given P=digit (0-8), B contains the digit array, A and C available
to use (scratch)

Subroutine returns P=requested digit, A and C modified, B is
untouched. It uses RSTK levels, you need to check if that's a problem.

Get_N_Digit:
C=0.A
C=P.0  ; GET DIGIT VALUE
C=C+C.A
A=C.A
C=C+C.A
A=A+C.A    ; MULTIPLY BY 6 NIBBLES (A=DIGIT*6)
GOSUB JUMPOVER
P=C.0
RTN
P=C.1
RTN
P=C.2
RTN
P=C.3
RTN
.... well, you get the idea. These 2 instructions take 6 nibbles
(therefore the multiplication by 6 above)

JUMPOVER:
C=RSTK
A=A+C.A
C=B.W
PC=A

And that should do the trick.

Another one is to store the contents of B in memory, then read the
proper nibble (quite simple, but with memory access overhead). I think
in speed it might be similar to the version above using jump tables.

Hope it helps,
Claudio
```
 0

```On Jun 9, 6:32=A0pm, Claudio Lapilli <pleasedonts...@isp.com> wrote:

> Another one is to store the contents of B in memory, then read the
> proper nibble (quite simple, but with memory access overhead). I think
> in speed it might be similar to the version above using jump tables.
>

Actually, it could even be faster:

D0=3D(5) SaveMisc
C=3DB.W
DAT0=3DC.P
C=3D0.W
C=3DDAT0.1
RTN
```
 0

```hello,

> The first move takes a second or two (down from about 30 seconds
> for my first assembly language algorithm).  This delay is due
> to the fact that it has to test 3024 ( == 9x8x7x6) candidates the
> first time.  Since the first move typically pares the list down
> to a few hundred, subsequent moves are virtually instantaneous.

if there is only 3024 possible cases, it should be much faster than 2
seconds. this is a good news, because it means that there is optimization
room!

>I can see a pretty
> easy optimisation I could make, using self modifying code, but I don't
> fancy that too much.

in my years of experience working on the saturn, I realized that self
modifying code has verry little use and makes prety much no difference. not
to talk about all the issues involved with it either...

> * Remove Pth digit from B, returning it in P.
> * At entry:
> *   B -- digit array
> *   P -- digit to translate
> * At exit:
> *   B -- array with higher order digits shifted over Pth digit
> *   P -- translated digit
> *   C -- A, ready to have its digit overwritten
C=0.B
B+B.WP SKNC { C+8.B } % remove/loose MSB of the digit we want! copy it in Cb
B+B.WP SKNC { C+4.B } % repeat for 2nd bit
B+B.WP SKNC { C+2.B }% and 3rd bit
B+B.WP SKNC { C+1.B }% and last bit
BSR.W % move everything down...
P=C.0 % copy result in P
RTN % return

regards, cyrille

ps: I am using the MASD syntax: SKNC stands for Skip No Carry and skips to
the end of the block if there is no carry. basicaly, it's a fancy GONC
EndOfBlock instructiuon...; *EndOfBlack in masd, you can also ommit the
first part of the instruction in R1=R1+R2 type instructions and just type
R1+R2

cyrille

```
 0

```In article <g2m06b\$aut\$1@usenet01.boi.hp.com>,
cyrille de brebisson <cyrille@hp.com> wrote:
>> The first move takes a second or two (down from about 30 seconds
>> for my first assembly language algorithm).  This delay is due
>> to the fact that it has to test 3024 ( == 9x8x7x6) candidates the
>> first time.  Since the first move typically pares the list down
>> to a few hundred, subsequent moves are virtually instantaneous.
>
>if there is only 3024 possible cases, it should be much faster than 2
>seconds. this is a good news, because it means that there is optimization
>room!

I don't doubt it at all.  Just to be clear about where this function
lies, it is called once for each digit of each candidate, so it's
called 12096 times for the first guess.

I know some of my decisions have sacrificed speed for memory.  E.g.
all this fiddling with decoding a packed representation of
unique-digit numbers could be eliminated by keeping a pre-initialised
grob representing all integers [1234:9876] with a bit cleared for each
invalid number.  The program could work with a copy of this grob,
clearing further bits as guesses eliminate possibilities.  I prefer
being a bit more frugal with memory and allocate just one grob at run
time with 3024 pixels all marked valid, but that leaves me having to
work out which number each pixel represents.

Then again, your code suggestion below shows me I have a long way to
go in understanding just what can be managed with the Saturn
instructions and I would not be surprised if someone like you could
come up with much smarter ways of coding it without blowing out the
memory.

Not that I'm asking for that.  There's nothing important here, I'm
just trying to keep my brain in gear.

>in my years of experience working on the saturn, I realized that self
>modifying code has verry little use and makes prety much no difference. not
>to talk about all the issues involved with it either...

Yeah, I know.  I dabbled with self modifying code on the Z80 many
years ago and it was impressive how little benefit I got.  I thought
it might be different for the Saturn.  I'm kind of relieved that it's
not.

>> * Remove Pth digit from B, returning it in P.
>> * At entry:
>> *   B -- digit array
>> *   P -- digit to translate
>> * At exit:
>> *   B -- array with higher order digits shifted over Pth digit
>> *   P -- translated digit
>> *   C -- A, ready to have its digit overwritten
>C=0.B
>B+B.WP SKNC { C+8.B } % remove/loose MSB of the digit we want! copy it in Cb
>B+B.WP SKNC { C+4.B } % repeat for 2nd bit
>B+B.WP SKNC { C+2.B }% and 3rd bit
>B+B.WP SKNC { C+1.B }% and last bit
>BSR.W % move everything down...
>P=C.0 % copy result in P
>RTN % return

Terrific!  And thanks for including the MASD tips, I have stuck to
HP's syntax (for little reason).  It's a bit of a culture shock to
have brace delimited blocks in assembly language, but it does save
having to make up yet another meaningless label.

I wondered for about a minute whether the same as the above couldn't
be accomplished with less branching using BSL, but then I remembered
how the sticky bit works (i.e. with no instruction that shifts the SB
in).  Strange -- I suppose HP had their reasons at the time.

Anyhow, many thanks Cyrille, and also to Claudio and Raymond.  I've
do tick counts), though I will try Claudio's too, just for the fun of
it.  He wasn't as kind as you though -- his isn't a straight plug-in
replacement for the original code :-)

Have fun,

Robert.
--
Robert Swan		| No, not the antarctic adventurer.
swan.r.l@gmail.com	| No, not the Canberra porn monger.
| Yes, that's right, the boring one.
```
 0

```In article <urd3i5-amr.ln1@rswan.netspace.net.au>,  <swan.r.l@gmail.com> wrote:
>In article <g2m06b\$aut\$1@usenet01.boi.hp.com>,
>cyrille de brebisson <cyrille@hp.com> wrote:
[snip]
>Anyhow, many thanks Cyrille, and also to Claudio and Raymond.  I've
>do tick counts), though I will try Claudio's too, just for the fun of
>it.  He wasn't as kind as you though -- his isn't a straight plug-in
>replacement for the original code :-)

Having tried the three algorithms with the same problem with TEVAL,
Claudio's latest suggestion wins (I didn't try his jump table
suggestion, just his transfer via memory method).  Average results
after 4 iterations are:

Original: 1.32
Claudio:  1.18
Cyrille:  1.33

I still like Cyrille's code because it very cleverly did in one step
what I had been doing in two, and I don't think I would have thought
of it myself no matter how long I stewed over it.  OTOH Claudio's was
simple and elegant; should have been able to think of that myself.

Thanks again,

Robert.
--
Robert Swan		| No, not the antarctic adventurer.
swan.r.l@gmail.com	| No, not the Canberra porn monger.
| Yes, that's right, the boring one.
```
 0

```hello,

>>if there is only 3024 possible cases, it should be much faster than 2
>>seconds. this is a good news, because it means that there is optimization
>>room!
> I don't doubt it at all.  Just to be clear about where this function
> lies, it is called once for each digit of each candidate, so it's
> called 12096 times for the first guess.
but you said that my updated function makes little difference in time, so
where is the time spent? what other things are done in your critical loop
that could be optimized?

> Terrific!  And thanks for including the MASD tips, I have stuck to
> HP's syntax (for little reason).  It's a bit of a culture shock to
> have brace delimited blocks in assembly language, but it does save
> having to make up yet another meaningless label.
Masd syntax is a WHOLE lot better than the HP syntax, and for multiple
reasons.
how compact it is is critical when you program on the calculator itself (as
Masd originally was), and the brackets does make it so much more readable,
especially when you have multiple loops together...
LC 01234
{
B=0.B
{
B+1.B UPNC
}
C-1.A UPNC
}
it's like working in C :-)

regards, cyrille

```
 0

```Hello,

"cyrille de brebisson" <cyrille@hp.com> schrieb im Newsbeitrag
news:g2ojpb\$npv\$1@usenet01.boi.hp.com...
> hello,
>
>> Terrific!  And thanks for including the MASD tips, I have stuck to
>> HP's syntax (for little reason).  It's a bit of a culture shock to
>> have brace delimited blocks in assembly language, but it does save
>> having to make up yet another meaningless label.
> Masd syntax is a WHOLE lot better than the HP syntax, and for multiple
> reasons.
> how compact it is is critical when you program on the calculator itself
> (as Masd originally was), and the brackets does make it so much more
> readable, especially when you have multiple loops together...
> [..]
>
Preferences are different, but I don't think that MASD
is better than the original HP/SASM syntax.
A programmer doesn't automatically write better code
if he/she uses MASD syntax.

I think it's an alternative and a typing aid,
if you do programming on the calc directly.

But if it comes to code debugging and/or disassembling,
it'll be individual lines of code and labels anyway.

So you could have used the classic HP syntax in the first place;-)

After all it's a matter of taste:-)

Regards

Raymond

Ok, maybe I'm slighty biased, because I use
the original HP syntax since about 1984 or 85...

```
 0

```hello,

> Preferences are different, but I don't think that MASD
> is better than the original HP/SASM syntax.
> A programmer doesn't automatically write better code
> if he/she uses MASD syntax.
>
> I think it's an alternative and a typing aid,
> if you do programming on the calc directly.
>
> But if it comes to code debugging and/or disassembling,
> it'll be individual lines of code and labels anyway.
>
> So you could have used the classic HP syntax in the first place;-)
>
> After all it's a matter of taste:-)
>
> Ok, maybe I'm slighty biased, because I use
> the original HP syntax since about 1984 or 85...

I think that you are biased....
the reason for that, is that Masd, with it's skip structures (ie: { and } )
and multiple instructions per line allows to write code that is more
readable, code where you can see the structure of the code much more clearly
than with the HP syntax (that is providing that you read clear code of
course, you can always use MASD to obfulscate yor code :-)

for example: the following code (taken from the *ScrollVGrob entry point
that scrolls verticaly a portion of a graphic)
is much more readable and easier to debug in MASD syntax than in HP syntax
(especially once you get the syntax highlighting, fixed width font and no
word wrap...).
you can very easely see the various levels of lopps of the program, and
having multiple instructions per line allow to have the whole function
available on a single screen. In HP syntax, this would be a beast to read
and understand...

% P, Bb: Nb nibbles (P+1+Bb*16 = nb full nibbles)
% Da: Nb lines (clipped h)
% D0: @ source
% D1: @ destination
D-1.A SKNC { P=0 RTN }
% At least one line?
?ST=0.fGray { C=D.A RSTK=C AD0EX D0=A C=R4.A C+A.A R4=C.A }
% If grey mode then save D in RSTK, R4=pos on source on next plane
{
{
% This is the main scrool loop, one line is copyed at each loop
C=DAT0.S C&B.S A=DAT1.S B=-B-1.S A&B.S B=-B-1.S A!C.S DAT1=A.S D1+1 D0+1
% Copy the first nible of the line
?ST=1.fNoFullNible
% if there is some full nibbles to scrool,
{
% scrool them
C=DAT0.WP DAT1=C.WP CD1EX C+P+1 CD1EX CD0EX C+P+1 CD0EX
% Copy P nibble from source to dest
A=B.B A-1.B SKC { C=DAT0.16 DAT1=C.16 D1+16 D0+16 A-1.B UPNC }
% now, copy Ba*16 nibbles
}
?D=0.S { C=DAT0.S C&D.S A=C.S C=DAT1.S D=-D-1.S C&D.S D=-D-1.S A!C.S
DAT1=A.S } % do the final nibble if needed
% put pointers on Next line
D-1.A EXITC UP
% next line
}
?ST=1.fGray { P=0 RTN }
% if we are not in grey scale, return
A=R4.A AD0EX CD1EX C-A.A A=R4.A A+C.A D1=C C=RSTK D=C.A ST=0.fGray UP
% D0 point on second place source, D1 on second plane destination and Da is
restored
}

ho, and I am not saying that because I actually created the syntax :-)

cyrille

```
 0

```hello,

of couse, microsoft mangeled my code bellow, so the comments that were
nicely lined up at the end of each lines are now on their own separate lines
making things much harder to read... arrrrgggg...

cyrille

"cyrille de brebisson" <cyrille@hp.com> wrote in message
news:g2pi56\$apn\$1@usenet01.boi.hp.com...
> hello,
>
>> Preferences are different, but I don't think that MASD
>> is better than the original HP/SASM syntax.
>> A programmer doesn't automatically write better code
>> if he/she uses MASD syntax.
>>
>> I think it's an alternative and a typing aid,
>> if you do programming on the calc directly.
>>
>> But if it comes to code debugging and/or disassembling,
>> it'll be individual lines of code and labels anyway.
>>
>> So you could have used the classic HP syntax in the first place;-)
>>
>> After all it's a matter of taste:-)
>>
>> Ok, maybe I'm slighty biased, because I use
>> the original HP syntax since about 1984 or 85...
>
> I think that you are biased....
> the reason for that, is that Masd, with it's skip structures (ie: {
> and } ) and multiple instructions per line allows to write code that is
> more readable, code where you can see the structure of the code much more
> clearly than with the HP syntax (that is providing that you read clear
> code of course, you can always use MASD to obfulscate yor code :-)
>
> for example: the following code (taken from the *ScrollVGrob entry point
> that scrolls verticaly a portion of a graphic)
> is much more readable and easier to debug in MASD syntax than in HP syntax
> (especially once you get the syntax highlighting, fixed width font and no
> word wrap...).
> you can very easely see the various levels of lopps of the program, and
> having multiple instructions per line allow to have the whole function
> available on a single screen. In HP syntax, this would be a beast to read
> and understand...
>
> % P, Bb: Nb nibbles (P+1+Bb*16 = nb full nibbles)
> % Da: Nb lines (clipped h)
> % D0: @ source
> % D1: @ destination
> D-1.A SKNC { P=0 RTN } % At least one line?
> ?ST=0.fGray { C=D.A RSTK=C AD0EX D0=A C=R4.A C+A.A R4=C.A } % If grey mode
> then save D in RSTK, R4=pos on source on next plane
> {
>
> {
> % This is the main scrool loop, one line is copyed at each loop
>    C=DAT0.S C&B.S A=DAT1.S B=-B-1.S A&B.S B=-B-1.S A!C.S DAT1=A.S D1+1
> D0+1 % Copy the first nible of the line
>    ?ST=1.fNoFullNible % if there is some full nibbles to scrool,
>
> {                                                                        %
> scrool them
>      C=DAT0.WP DAT1=C.WP CD1EX C+P+1 CD1EX CD0EX C+P+1 CD0EX % Copy P
> nibble from source to dest
>      A=B.B A-1.B SKC { C=DAT0.16 DAT1=C.16 D1+16 D0+16 A-1.B UPNC } % now,
> copy Ba*16 nibbles
>    }
>    ?D=0.S { C=DAT0.S C&D.S A=C.S C=DAT1.S D=-D-1.S C&D.S D=-D-1.S A!C.S
> DAT1=A.S } % do the final nibble if needed
>    D-1.A EXITC UP % next line
>  }
>  ?ST=1.fGray { P=0 RTN } % if we are not in grey scale, return
>  A=R4.A AD0EX CD1EX C-A.A A=R4.A A+C.A D1=C C=RSTK D=C.A ST=0.fGray UP %
> D0 point on second place source, D1 on second plane destination and Da is
> restored
> }
>
> ho, and I am not saying that because I actually created the syntax :-)
>
> cyrille
>

```
 0

```In article <g2ojpb\$npv\$1@usenet01.boi.hp.com>,
cyrille de brebisson <cyrille@hp.com> wrote:

>but you said that my updated function makes little difference in time, so
>where is the time spent? what other things are done in your critical loop
>that could be optimized?

The critical loop is pretty simple.  Given the current 4-digit guess
and its count of "cows" and "bulls", it walks through the bitmap and:

ncandidates = 0
for all 3024 bits in bitmap
if bit set
convert bit index into corresponding 4-unique number
compare this number with guess
if cows/bulls counts match
ncandidates++
else
strike out current bit
return ncandidates

As things stand, only the first guess is slow, so the bit testing's
not the bottleneck.  The index conversion is what we've been looking
into here.  It's by far most complicated part, but I might as well
post here the cbcmp function to see if you have more clever ideas (or
Claudio comes along to trump you :-)

*************************************************************
* Rotate LS 4 nibbles of C left
*
doRot	MACRO
CSL		A
CPEX		4
CPEX		0
ENDM
*************************************************************
* Compare A.A and C.A, result in A.A
* Input:
*   A[A], C[A] - numbers to compare
* Output:
*   A[B] - 16*bulls + cows
* Modified:
*   B
*
cbcmp
B=0		A
GOSUB	mcnt		Bulls
BSL		B
doRot			Count cows for the other 3 positions
GOSUB	mcnt
doRot
GOSUB	mcnt
doRot
GOSUB	mcnt
A=B		A		Return 16*B+C
P=		0
RTN
*************************************************************
* Increment B for each equal digit in A and C
*
mcnt
P=		3
NxtDig
?A#C		P
GOYES	Differ
B=B+1	B
Differ
P=P-1
GONC		NxtDig
RTN

Inlining doRot saved about .03 of a second, so removing 3024x3 GOSUB
RTN pairs didn't help things all that much.

Mcnt does feel like the sort of thing that the image processing
fraternity have some clever way of using magical bit operations to do
the whole thing in parallel rather than a loop.  Beyond my knowledge,
but maybe not beyond yours.

>Masd syntax is a WHOLE lot better than the HP syntax, and for multiple
>reasons.
>how compact it is is critical when you program on the calculator itself (as
>Masd originally was), and the brackets does make it so much more readable,
>especially when you have multiple loops together...
>LC 01234
>{
>  B=0.B
>  {
>     B+1.B UPNC
>  }
>  C-1.A UPNC
>}
>it's like working in C :-)

It certainly seems to be in a different place from any other assembly
language I've seen.  And quite an interesting idea.  Rather than C
as a "machine independent assembly language", MASD is a "machine
dependent C language".

I still have an HP-71B that I got in the mid '80s, along with its
Forth/Assembler ROM.  I didn't get all that deeply into it back then,
but its still the first syntax I learnt for the Saturn and its hard to
get away from that.

Have fun,

Robert.
--
Robert Swan		| No, not the antarctic adventurer.
swan.r.l@gmail.com	| No, not the Canberra porn monger.
| Yes, that's right, the boring one.
```
 0

```Actually the syntax is VERY good :-) I did use it for my last assembly
super-mario game in 2000.

sources codes versionning here :
http://www.hpcalc.org/details.php?id=3058

julien meyer

cyrille de brebisson a �crit :

>> ho, and I am not saying that because I actually created the syntax :-)
>>
>> cyrille
>>
>
>
```
 0

12 Replies
246 Views

Similiar Articles:

7/21/2012 6:19:56 PM