Optimization problem #2


Hi all,

Can anyone give me some hints as to how to solve the following problem, 
preferably in a way that is faster than the way I used to do it, and 
without the bug in the current version?

Problem description:

The program processes an array of N cells, with N up to about 4096. The 
array is filled from the start, it may not be completely full but there 
are no empty cells between full cells. A cell contains a value of 
1..255, so logically every cell would be a byte, but if processing 
efficiency requires each cell to be a word or even a double-word, that 
would be no problem. The values are normalized, they are effectively 
indexes into a table, so if the table contains just T entries, the 
values would go from 1 to T.

Given the above setup, the problem is to find *all* sub-arrays that 
contain 3..T distinct elements and are as short as possible, i.e. if 
there are 42 sub-arrays of three elements containing three distinct 
values, there is no need to find sub-arrays of four elements containing 
three distinct values. Also, a shorter sub-array may *never* be a part 
of a longer sub-array containing *only* distinct values.

An example: suppose the array contains only 56 elements, in this case 
with, for clarity, values a..j:

         1         2         3         4         5     5
....5....0....5....0....5....0....5....0....5....0....56
aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja

in this case the program should find the following sub-arrays:

len aperture from to value
  3     3      40  42 bef
  3     3      44  46 fgh
  7     7      50  56 gihbfja
  8    16      41  56 efffghggggihbfja, containing efghibja
  9    23      34  56 cbbbbbbefffghggggihbfja, containing cbefghija
10    25      32  56 dccbbbbbbefffghggggihbfja, containing dcbefghija

It is possible to find sub-arrays with 4, 5, or 6 distinct values, but 
they are either longer than the series of 7-in-7 (12-25 contain abcd), 
or are part of the 7-in-7 series and as such they should not be included!
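
The "aperture" column of the table above can be cross-checked with a small brute-force routine. The following is only a sketch under stated assumptions, not the original program: the names `min_apertures` and `ALPHABET` are made up for illustration, indexes are 0-based, and the routine only computes, for every count k, the length of the shortest sub-array holding k distinct values. It does not apply the rule that excludes rows lying inside a longer all-distinct run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALPHABET 256   /* hypothetical name: number of possible cell values */

/* For every k, set min_len[k] to the length of the shortest sub-array of
   data[0..n-1] that holds k distinct values, or 0 if no such sub-array
   exists.  Quadratic in n, so useful only as a reference. */
void min_apertures(const unsigned char *data, size_t n,
                   size_t min_len[ALPHABET + 1])
{
    unsigned char seen[ALPHABET];

    assert(data != NULL || n == 0);
    for (size_t k = 0; k <= ALPHABET; k++)
        min_len[k] = 0;

    for (size_t start = 0; start < n; start++) {
        size_t distinct = 0;
        memset(seen, 0, sizeof seen);
        for (size_t end = start; end < n; end++) {
            unsigned char c = data[end];
            if (!seen[c]) {
                /* First time the count reaches this value for this start,
                   so [start, end] is the shortest such window from here. */
                seen[c] = 1;
                distinct++;
                size_t len = end - start + 1;
                if (min_len[distinct] == 0 || len < min_len[distinct])
                    min_len[distinct] = len;
            }
        }
    }
}
```

For the 56-element example this gives 3, 7, 16, 23 and 25 for k = 3, 7, 8, 9 and 10, matching the table. It also reports 4, 5 and 6 for k = 4, 5 and 6; those windows sit inside the 7-in-7 run, which appears to be why they must be filtered out.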

My AD 1996 program used to slide a window that started with three 
elements (plus a sentinel on either side) over the big array and spit 
out the position of the first element if it found three distinct 
elements in the window *and* the two sentinels also contained any of the 
values in the window. The program included some fairly minor 
optimizations, such as immediately sliding the window to the last of a 
series of equal values: in the example above, the three-position-wide 
window would start by covering pos 6..8, and next it would move to 
11..13, etc. If at any moment the program found that all elements 
in the window *and the sentinel on the right* were distinct, the 
window size would be increased by one and the scan would continue.

Taking the above example, a window with an aperture of four elements 
would start its slide at position 6..9. Once it reached position 49, 
with "gihb" visible and "g" and "f" in the two sentinels, the window 
would be widened to five characters as the "f" is distinct from the four 
"visible" characters, with a new sentinel of "j". Given that this is 
again distinct from the now five characters in the window, the process 
would repeat itself, eventually resulting in the string of seven 
distinct characters starting at position 50.

Once the string of seven distinct characters has been found, the window 
is widened to *nine* (if there had been a string of 8-in-8 it would have 
been found by the previous slide!) characters and the process restarts 
at position 6..14, but in the end it fails to find a series of eight 
distinct characters and so the process is repeated with windows of 10, 
11, 12, 13, 14, 15 and finally 16 characters, when a string of eight 
distinct characters is finally found.
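
One way to make a single slide cheap, whatever the window width, is to keep a count of occurrences per value and update it as cells enter and leave the window. This is a sketch of that standard technique, not the 1996 program and not Paul Green's method; the function name `max_distinct` is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Largest number of distinct values found in any window of `width` cells
   of data[0..n-1].  Each cell is added once and removed once, so one
   slide costs O(n) regardless of the width. */
size_t max_distinct(const unsigned char *data, size_t n, size_t width)
{
    size_t occurrences[256] = {0};
    size_t distinct = 0, best = 0;

    assert(width >= 1 && width <= n);

    for (size_t i = 0; i < n; i++) {
        /* Cell i enters the window on the right. */
        if (occurrences[data[i]]++ == 0)
            distinct++;
        /* Once the window is over-full, cell i - width leaves on the left. */
        if (i >= width && --occurrences[data[i - width]] == 0)
            distinct--;
        /* Only full-width windows are considered. */
        if (i + 1 >= width && distinct > best)
            best = distinct;
    }
    return best;
}
```

On the example, widths 7 through 15 all yield 7, and width 16 is the first to yield 8, which agrees with the restarts described above.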

The problem is, for relatively low values of N and C, the process may 
not be overly efficient, but it works. However, once N and C increase, 
(N is theoretically unbounded, C has an upper limit of around 210), the 
process becomes horribly inefficient: using the old IBM V2.3.0 OS PL/I 
compiler, which had a statement count facility, this method required the 
execution of 556,379,518 PL/I statements (and a number of actual machine 
instructions that is at least one order of magnitude greater) for the 
current values of N(2329) and C(66).

As for some figures on required restarts, the final series of 66 is 
found in a sub-array with a length of 1950 elements, and the series of 
65 in a sub-array of only 1889 elements; this means that the slide had 
to be restarted 61 times! Even more restarts, 312(!), are needed when 
going from 61 to 62 elements.

In 1998 I posted the problem to comp.lang.pascal.borland. The post 
should be on Google, but Google refuses to show more than the first 
9,960 (out of 28,517) posts in that group.

Three people claimed that there was a much faster way, and two of them 
backed this up with programs, Brent Beach and Paul Green, with Paul's 
solution being the fastest by a fair margin. The results of his code 
matched my output, I replaced my code with his code, and that seemed to 
be the end of the story...

Until a few months back...

... when I started converting the original Pascal code to assembler. 
While doing so I decided that it would be interesting to use this 
procedure in another part of the code. However, the results were not 
what I expected: result rows were missing. I managed to track down a 
paper copy of the original PG program, typed that in, had it checked for 
typos by a friend, and it turned out to give the same erroneous results. 
PG's email address is no longer valid, and the floppy that contained a 
copy of my email conversation with him and Brent, including a basic 
explanation of the algorithm he used, is unreadable after having been 
stored in a cold loft for the best part of the last decade. The only 
thing I remember about his explanation is the words "clear as mud..." 
near the end.

I have spent the last two weeks trying to figure out how it works and 
staring at the Virtual Pascal IDE "Watch" window, and I've finally 
decided to ask for help.

So, if anyone can give me any clues as to how I could perform the above 
described process in an efficient way, I'd be very grateful. *I'm quite 
capable of writing the code myself*, I would just like to get any clues 
towards a more efficient algorithm. If that means you want to see the 
current Pascal (or PL/I) code and the input data that leads to the 
erroneous results, drop me a line, "robert(a)prino(d)org".

Thanks,

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply spamtrap14 (66) 5/16/2010 6:28:26 PM

In comp.lang.pl1 Robert AH Prins <spamtrap@prino.org> wrote:
 
> Can anyone give me some hints as to solve the following problem, 
> preferably in a way that is faster than the way I used to do it, and 
> without the bug in the current version;
 
> Problem description:
 
> The program processes an array of N cells, with N up to about 4096. The 
> array is filled from the start, it may not be completely full but there 
> are no empty cells between full cells. A cell contains a value of 
> 1..255, so logically every cell would be a byte, but if processing 
> efficiency requires each cell to be a word or even a double-word, that 
> would be no problem. The values are normalized, they are effectively 
> indexes into a table, so if the table contains just T entries, the 
> values would go from 1 to T.

For an alphabet size of 32 or less, that is, values 0..31, and if I
did it in C, I would probably use one bit per value in a 32-bit word
of storage, which makes testing fairly easy using & and |.
As far as I know, PL/I BIT(32) doesn't have the optimization
that uses a 32-bit word, and UNSPEC is likely even worse.
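
A minimal sketch of that one-bit-per-value idea, assuming the values have been renumbered to 0..31. The function names `value_mask` and `popcount32` are made up for illustration, and the bit count uses a plain portable loop rather than any compiler intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OR together one bit per value; every value must be in 0..31. */
uint32_t value_mask(const unsigned char *values, size_t len)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < len; i++) {
        assert(values[i] < 32);
        mask |= (uint32_t)1 << values[i];
    }
    return mask;
}

/* Number of set bits, i.e. the number of distinct values in the mask. */
unsigned popcount32(uint32_t mask)
{
    unsigned bits = 0;

    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        bits++;
    }
    return bits;
}
```

Testing whether a value v is already in a window is then `mask & ((uint32_t)1 << v)`, and two windows share a value exactly when the AND of their masks is non-zero.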
 
> Given the above setup, the problem is to find *all* sub-arrays that 
> contain 3..T distinct elements and are as short as possible, i.e. if 
> there are 42 sub-arrays of three elements containing three distinct 
> values, there is no need to find sub-arrays of four elements containing 
> three distinct values. Also, a shorter sub-array may *never* be a part 
> of a longer sub-array containing *only* distinct values.

I wonder if this can be done using dynamic programming.
DP can speed up many such problems, but it isn't always
so obvious.

For the longer subarrays, it might be that hash tables could
speed up the comparisons.  That would, for example, get you
an O(N) test for a value being in a subarray being considered.
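
With byte-sized values a full hash table is arguably not even needed: a 256-entry table indexed directly by the value behaves like a perfect hash. The sketch below is an illustration under that assumption (the name `mark_present` is hypothetical). Building the table is linear in the window length, after which each membership test is a single array lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mark which byte values occur in data[start .. start+len-1].
   Afterwards present[v] is 1 if v occurs in the window, else 0. */
void mark_present(const unsigned char *data, size_t start, size_t len,
                  unsigned char present[256])
{
    assert(data != NULL);
    memset(present, 0, 256);
    for (size_t i = 0; i < len; i++)
        present[data[start + i]] = 1;
}
```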
 
> An example, suppose only the array contains 56 elements, in this case 
> with, for clarity, values a..j:
 
>          1         2         3         4         5     5
> ....5....0....5....0....5....0....5....0....5....0....56
> aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
 
> in this case the program should find the following sub-arrays:
 
> len aperture from to value
>  3     3      40  42 bef
>  3     3      44  46 fgh
>  7     7      50  56 gihbfja
>  8    16      41  56 efffghggggihbfja, containing efghibja
>  9    23      34  56 cbbbbbbefffghggggihbfja, containing cbefghija
> 10    25      32  56 dccbbbbbbefffghggggihbfja, containing dcbefghija
 
> It is possible to find sub-arrays with 4, 5, or 6 distinct values, but 
> they are either longer than the series of 7-in-7 (12-25 contain abcd), 
> or are part of the 7-in-7 series and as such they should not be included!

> My AD 1996 program used to slide a window that started with three 
> elements (plus a sentinel on either side) over the big array and spit 
> out the position of the first element if it found three distinct 
> elements in the window *and* the two sentinels also contained any of the 
> values in the window. The program included some fairly minor 
> optimizations, such as immediately sliding the window to the last of a 
> series of equal values, in the example above, the three position wide 
> window would start by covering pos 6..8, and next it would move to 
> 11..13, etc. If at any moment the program would find that all elements 
> in the window *and the sentinel on the right* would be distinct, the 
> window size would be increased by one and the scan would continue.

The optimization works if it is faster than a direct test.
For short subarrays, an unrolled compare loop should be pretty fast.
For longer ones, as I said above, a hash table might be fast.
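
For the shortest aperture, the "unrolled compare" amounts to three comparisons per position. A sketch, with the hypothetical name `next_distinct_triple` and 0-based indexes:

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first window of three mutually distinct values that
   starts at or after `from`, or n if there is none. */
size_t next_distinct_triple(const unsigned char *data, size_t n, size_t from)
{
    assert(data != NULL || n == 0);
    for (size_t i = from; i + 2 < n; i++) {
        unsigned char a = data[i], b = data[i + 1], c = data[i + 2];
        if (a != b && b != c && a != c)
            return i;
    }
    return n;
}
```

On the example the first two hits are at indexes 39 and 43, i.e. positions 40 ("bef") and 44 ("fgh") in the 1-based numbering of the original post.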
 
> Taking the above example, a window with an aperture of four elements 
> would start its slide at position 6..9. Once it reached position 49, 
> with "gihb" visible and "g" and "f" in the two sentinels, the window 
> would be widened to five characters as the "f" is distinct from the four 
> "visible" characters, with a new sentinel of "j". Given that this is 
> again distinct from the now five characters in the window, the process 
> would repeat itself, eventually resulting in the string of seven 
> distinct characters starting at position 50.
 
> Once the string of seven distinct characters has been found, the window 
> is widened to *nine* (if there had been a string of 8-in-8 it would have 
> been found by the previous slide!) characters and the process restarts 
> at position 6..14, but in the end it fails to find a series of eight 
> distinct characters and so the process is repeated with windows of 10, 
> 11, 12, 13, 14, 15 and finally 16 characters, when a string of eight 
> distinct characters is finally found.
 
> The problem is, for relatively low values of N and C, the process may 
> not be overly efficient, but it works. However, once N and C increase, 
> (N is theoretically unbounded, C has an upper limit of around 210), the 
> process becomes horribly inefficient: using the old IBM V2.3.0 OS PL/I 
> compiler, which had a statement count facility, this method required the 
> execution of 556,379,518 PL/I statements (and a number of actual machine 
> instruction that is at least one order of magnitude greater) for the 
> current values of N(2329) and C(66).
 
> As for some figures on required restarts, the final series of 66 is 
> found in a sub-array with a length of 1950 elements and with a series of 
> 65 in a sub-array of only 1889 elements, this means that the slide had 
> to be restarted 61 times! Even more restarts, 312(!), are needed when 
> going from 61 to 62 elements.
 
> In 1998 I posted the problem to comp.lang.pascal.borland, the post 
> should be on Google, but Google refuses to show more than the first 
> 9,960 (out of 28,517) posts in that group.
 
> Three people claimed that there was a much faster way, and two of them 
> backed this up with programs, Brent Beach and Paul Green, with Paul's 
> solution being the fastest by a fair margin. The results of his code 
> matched my output, I replaced my code with his code, and that seemed to 
> be the end of the story...
 
> Until a few months back...
 
> ... when I started converting the original Pascal code in assembler. 

Can you post some code so we can see it?  It might give some good,
or not so good, ideas on how to do it better.

> While doing so I decided that it would be interesting to use this 
> procedure in another part of the code. However, the results were not 
> what I expected: result rows were missing. 

-- glen
Reply glen 5/17/2010 12:29:19 AM


In article <85akp6F4bpU1@mid.individual.net>,
	Robert AH Prins <spamtrap@prino.org> writes:

> Given the above setup, the problem is to find *all* sub-arrays that
> contain 3..T distinct elements and are as short as possible, i.e. if
> there are 42 sub-arrays of three elements containing three distinct
> values, there is no need to find sub-arrays of four elements containing
> three distinct values. Also, a shorter sub-array may *never* be a part
> of a longer sub-array containing *only* distinct values.
>
> An example, suppose only the array contains 56 elements, in this case
> with, for clarity, values a..j:
>
>           1         2         3         4         5     5
> ....5....0....5....0....5....0....5....0....5....0....56
> aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
>
> in this case the program should find the following sub-arrays:
>
> len aperture from to value
>   3     3      40  42 bef
>   3     3      44  46 fgh
>   7     7      50  56 gihbfja
>   8    16      41  56 efffghggggihbfja, containing efghibja
>   9    23      34  56 cbbbbbbefffghggggihbfja, containing cbefghija
> 10    25      32  56 dccbbbbbbefffghggggihbfja, containing dcbefghija
>
>
> It is possible to find sub-arrays with 4, 5, or 6 distinct values, but
> they are either longer than the series of 7-in-7 (12-25 contain abcd),

Are you saying that if we tabulate the results, then both the "len" and
"aperture" columns should have increasing values?

If my understanding of the requirements is correct then it may be better
to start with a large window and _decrease_ it in each pass. (As finding
the longest length and corresponding aperture is trivial).
For example, after finding len=10 aperture=25 one should only consider
substrings of 25 or less when searching for sequences of less than 10
distinct values.
Reply free 5/17/2010 4:57:52 AM

Dick Wesseling wrote:
> In article<85akp6F4bpU1@mid.individual.net>,
> 	Robert AH Prins<spamtrap@prino.org>  writes:
>
>> Given the above setup, the problem is to find *all* sub-arrays that
>> contain 3..T distinct elements and are as short as possible, i.e. if
>> there are 42 sub-arrays of three elements containing three distinct
>> values, there is no need to find sub-arrays of four elements containing
>> three distinct values. Also, a shorter sub-array may *never* be a part
>> of a longer sub-array containing *only* distinct values.
>>
>> An example, suppose only the array contains 56 elements, in this case
>> with, for clarity, values a..j:
>>
>>            1         2         3         4         5     5
>> ....5....0....5....0....5....0....5....0....5....0....56
>> aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
>>
>> in this case the program should find the following sub-arrays:
>>
>> len aperture from to value
>>    3     3      40  42 bef
>>    3     3      44  46 fgh
>>    7     7      50  56 gihbfja
>>    8    16      41  56 efffghggggihbfja, containing efghibja
>>    9    23      34  56 cbbbbbbefffghggggihbfja, containing cbefghija
>> 10    25      32  56 dccbbbbbbefffghggggihbfja, containing dcbefghija

I haven't thought this all the way through, but my immediate idea was to 
start by making a full inverted index, i.e. sort all possible substrings 
into alphabetical order.

However, on second consideration this probably doesn't really help. :-(

>>
>>
>> It is possible to find sub-arrays with 4, 5, or 6 distinct values, but
>> they are either longer than the series of 7-in-7 (12-25 contain abcd),
>
> Are you saying that if we tabulate the results, then both the "len" and
> "aperture" columns should have increasing values?
>
> If my understanding of the requirements is correct then it may be better
> to start with a large window and _decrease_ it in each pass. (As finding
> the longest length and corresponding aperture is trivial).
> For example, after finding len=10 aperture=25 one should only consider
> substrings of 25 or less when searching for sequences of less than 10
> distinct values.

This sounds like a good idea!

If we start with a full scan that indexes the locations of each possible 
byte value, then we know that the longest sequence will contain each of 
the unique byte values found, right?
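
The full scan mentioned here could look roughly like the following. It is a sketch only; `index_positions` and `NOT_SEEN` are names invented for illustration, and positions are 0-based.

```c
#include <assert.h>
#include <stddef.h>

#define NOT_SEEN ((size_t)-1)   /* hypothetical marker for absent values */

/* Record the first and last index of every byte value in one pass over
   data[0..n-1].  Returns the number of distinct values found. */
size_t index_positions(const unsigned char *data, size_t n,
                       size_t first[256], size_t last[256])
{
    size_t distinct = 0;

    assert(data != NULL || n == 0);
    for (size_t v = 0; v < 256; v++)
        first[v] = last[v] = NOT_SEEN;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = data[i];
        if (first[c] == NOT_SEEN) {
            first[c] = i;
            distinct++;
        }
        last[c] = i;
    }
    return distinct;
}
```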

We can then start with the full input array, then from each end remove 
bytes until we get to the only remaining entry for a byte value: This 
will be the shortest substring which contains all the unique values.

As you note, this is also the longest substring we need to consider...
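
A caveat on trimming from both ends: it yields a window that cannot be shrunk any further, but not necessarily the shortest one overall. In "cbaaaabaaaac", for instance, trimming the left end first leaves "baaaac", although "cba" at the very start is shorter. A two-pointer scan finds the shortest window directly. The sketch below uses that standard technique; the name `shortest_full_window` is an assumption and indexes are 0-based.

```c
#include <assert.h>
#include <stddef.h>

/* Shortest window containing every value that occurs in data[0..n-1].
   Stores its start index in *start and returns its length (0 if n == 0). */
size_t shortest_full_window(const unsigned char *data, size_t n, size_t *start)
{
    size_t total[256] = {0}, in_window[256] = {0};
    size_t needed = 0, have = 0, best = 0, left = 0;

    assert(start != NULL);
    *start = 0;

    /* Count how many distinct values exist overall. */
    for (size_t i = 0; i < n; i++)
        if (total[data[i]]++ == 0)
            needed++;

    for (size_t right = 0; right < n; right++) {
        if (in_window[data[right]]++ == 0)
            have++;
        /* While the window is complete, record it, then shrink it. */
        while (have == needed) {
            size_t len = right - left + 1;
            if (best == 0 || len < best) {
                best = len;
                *start = left;
            }
            if (--in_window[data[left]] == 0)
                have--;
            left++;
        }
    }
    return best;
}
```

For the 56-element example this returns length 25 starting at index 31, i.e. positions 32..56 in the original numbering.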

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/17/2010 8:09:13 AM

Robert AH Prins wrote:
> On 2010-05-17 00:29, glen herrmannsfeldt wrote:
>> In comp.lang.pl1 Robert AH Prins<spamtrap@prino.org> wrote:
>>
>>> Can anyone give me some hints as to solve the following problem,
>>> preferably in a way that is faster than the way I used to do it, and
>>> without the bug in the current version;
>>
>>> Problem description:
>>
>>> The program processes an array of N cells, with N up to about 4096. The
>>> array is filled from the start, it may not be completely full but there
>>> are no empty cells between full cells. A cell contains a value of
>>> 1..255, so logically every cell would be a byte, but if processing
>>> efficiency requires each cell to be a word or even a double-word, that
>>> would be no problem. The values are normalized, they are effectively
>>> indexes into a table, so if the table contains just T entries, the
>>> values would go from 1 to T.

The maximum array size of 4096 is interesting, it means that we can 
afford to use quite a bit of lookup table space per entry.

OTOH it also means that we cannot afford to use a lot of time to 
initialize those tables, since that would take just as long or longer 
than the obvious window scanning algorithm.

I would still start by determining the first and last position of each 
unique byte, then use that to determine the shortest substring 
which contains all the values: This becomes the limit for a window scan 
that starts with aperture 3 and goes up.

   #include <string.h>

   typedef unsigned char byte;

   /* Supplied by the caller: records one result row. */
   void save(unsigned start, unsigned aperture, unsigned count);

   /* data[0..length-1] is the input array; limit is the length of the
      shortest substring which contains all the values (see above).
    */
   void scan(const byte *data, unsigned length, unsigned limit)
   {
     byte not_found[256]; // Contains 1 for byte values not seen so far

     unsigned current_limit = 3; // Minimum len!
     unsigned next_aperture = 0; // Wait until we find the first substring

     // The aperture <= length test keeps length-aperture from wrapping
     for (unsigned aperture = 3; aperture < limit && aperture <= length;
          aperture++) {
       if (aperture == next_aperture)
         current_limit++;

       for (unsigned start = 0; start <= length-aperture; start++) {
         /* This memset might turn out to be the single most expensive
            part of the algorithm, at least for short lengths/apertures.
            Even using 16 SSE writes, it will take at least 16 cycles
            just to get to L1 cache.

            One possible optimization is to note that at least half
            of all bus write cycles will be idle during the main loop
            below, so we could use two independent not_found buffers
            and alternate between them: Use one and zero the other
            at the same time!
          */
         memset(not_found, 1, sizeof(not_found));

         unsigned count = 0;

         /* The core of the scanning algorithm:
            Count the number of unique byte values by adding together
            all those values not seen previously, while marking each
            entry in the not_found[] array as seen:
          */
         for (unsigned i = 0; i < aperture; i++) {
           byte c = data[start+i];
           count += not_found[c];
           not_found[c] = 0;
         }
         /* The running time for this loop should be ~3 cycles/iteration
            as long as the loop branch is correctly predicted, and at
            least for the shorter apertures the modern multi-level
            predictors should do quite well here.
          */

         /* This next part will be very rarely executed except for
            the shortest apertures/lengths, so we can assume it to
            be skipped:
          */
         if (count >= current_limit) {
            if (count > current_limit)
              current_limit = count;
            save(start, aperture, count);
            next_aperture = aperture+1;
         }
       }
     }
   }
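
Regarding the memset cost noted in the comments above, one alternative to double buffering is to stamp each table entry with the number of the window that last touched it, so the table never needs clearing between windows. This is a sketch of that substitute technique rather than anything proposed in the thread; `stamp`, `current_id` and `count_distinct` are hypothetical names, and it trades the branch-free add for a compare per cell.

```c
#include <assert.h>
#include <stddef.h>

/* stamp[c] holds the id of the last window in which value c was seen. */
static unsigned stamp[256];
static unsigned current_id;

/* Count the distinct values in data[start .. start+len-1] without
   clearing the table: bumping current_id invalidates all old entries. */
size_t count_distinct(const unsigned char *data, size_t start, size_t len)
{
    size_t count = 0;

    assert(data != NULL || len == 0);
    if (++current_id == 0) {              /* id wrapped around: reset once */
        for (size_t v = 0; v < 256; v++)
            stamp[v] = 0;
        current_id = 1;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char c = data[start + i];
        if (stamp[c] != current_id) {     /* not yet seen in this window */
            stamp[c] = current_id;
            count++;
        }
    }
    return count;
}
```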

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/17/2010 9:03:15 AM

On 2010-05-17 04:57, Dick Wesseling wrote:
> In article<85akp6F4bpU1@mid.individual.net>,
> 	Robert AH Prins<spamtrap@prino.org>  writes:
>
>> Given the above setup, the problem is to find *all* sub-arrays that
>> contain 3..T distinct elements and are as short as possible, i.e. if
>> there are 42 sub-arrays of three elements containing three distinct
>> values, there is no need to find sub-arrays of four elements containing
>> three distinct values. Also, a shorter sub-array may *never* be a part
>> of a longer sub-array containing *only* distinct values.
>>
>> An example, suppose only the array contains 56 elements, in this case
>> with, for clarity, values a..j:
>>
>>            1         2         3         4         5     5
>> ....5....0....5....0....5....0....5....0....5....0....56
>> aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
>>
>> in this case the program should find the following sub-arrays:
>>
>> len aperture from to value
>>  3     3      40  42 bef
>>  3     3      44  46 fgh
>>  7     7      50  56 gihbfja
>>  8    16      41  56 efffghggggihbfja, containing efghibja
>>  9    23      34  56 cbbbbbbefffghggggihbfja, containing cbefghija
>> 10    25      32  56 dccbbbbbbefffghggggihbfja, containing dcbefghija
>>
>>
>> It is possible to find sub-arrays with 4, 5, or 6 distinct values, but
>> they are either longer than the series of 7-in-7 (12-25 contain abcd),
>
> Are you saying that if we tabulate the results, then both the "len" and
> "aperture" columns should have increasing values?

If they are temporarily stored in an array or list, then it's trivial to 
sort them in any required order before further processing.

> If my understanding of the requirements is correct then it may be better
> to start with a large window and _decrease_ it in each pass. (As finding
> the longest length and corresponding aperture is trivial).

Even if there may be more than one longest interval? This *is* pretty 
unlikely. However, the current set of data contains two sub-arrays with 
a length of 1879 that contain 64 distinct values.

> For example, after finding len=10 aperture=25 one should only consider
> substrings of 25 or less when searching for sequences of less than 10
> distinct values.

That is pretty obvious, but why would reducing the aperture from 24 down 
to potentially 9 to find substrings with 9 distinct values be any more 
efficient than starting with an aperture of 9 and increasing it to 24? 
If the substring of 9 elements were to be found with an aperture of 16 
or 17, i.e. dead in the middle, starting with the smaller aperture would 
involve less testing...

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/17/2010 9:12:12 AM

On 2010-05-17 00:29, glen herrmannsfeldt wrote:
> In comp.lang.pl1 Robert AH Prins<spamtrap@prino.org>  wrote:
>
>> Can anyone give me some hints as to solve the following problem,
>> preferably in a way that is faster than the way I used to do it, and
>> without the bug in the current version;
>
>> Problem description:
>
>> The program processes an array of N cells, with N up to about 4096. The
>> array is filled from the start, it may not be completely full but there
>> are no empty cells between full cells. A cell contains a value of
>> 1..255, so logically every cell would be a byte, but if processing
>> efficiency requires each cell to be a word or even a double-word, that
>> would be no problem. The values are normalized, they are effectively
>> indexes into a table, so if the table contains just T entries, the
>> values would go from 1 to T.
>
> For alphabetsize of 32 or less, that is, values 0..31, and if I
> did it in C, I would probably use the one bit per 32 bit word
> storage, which makes testing fairly easy using&  and |.
> As far as I know, PL/I BIT(32) doesn't have the optimization
> that uses a 32 bit word, and UNSPEC is likely even worse.

Bit operations, especially on aligned bitstrings, were never very bad, 
even in the old V2.3.0 compiler, and BIT(1) ALIGNED already uses one bit 
per 1 byte.

>> Given the above setup, the problem is to find *all* sub-arrays that
>> contain 3..T distinct elements and are as short as possible, i.e. if
>> there are 42 sub-arrays of three elements containing three distinct
>> values, there is no need to find sub-arrays of four elements containing
>> three distinct values. Also, a shorter sub-array may *never* be a part
>> of a longer sub-array containing *only* distinct values.
>
> I wonder if this can be done using dynamic programming.
> DP can speed up many such problems, but it isn't always
> so obvious.
>
> For the longer subarrays, it might be that hash tables could
> speed up the comparisons.  That would, for example, get you
> an O(N) test for a value being in a subarray being considered.

Never thought of using a hash table, always focused on the sliding 
window and tried to do that as optimally as possible.

<snip>

>> Until a few months back...
>
>> ... when I started converting the original Pascal code in assembler.
>
> Can you post some code so we can see it?  It might give some good,
> or not so good, ideas on how to do it better.

I'll pull the code out of the program; it probably needs a bit of 
massaging to make it runnable. I probably still have the (or at least 
an iteration of the) sliding window somewhere, but I may not be able to 
post it this week as I will be away for a short break.

>> While doing so I decided that it would be interesting to use this
>> procedure in another part of the code. However, the results were not
>> what I expected: result rows were missing.

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/17/2010 9:29:55 AM

"Robert AH Prins" <spamtrap@prino.org> wrote in message news:85akp6F4bpU1@mid.individual.net...
| Hi all,
|
| Can anyone give me some hints as to solve the following problem,
| preferably in a way that is faster than the way I used to do it, and
| without the bug in the current version;
|
| Problem description:
|
| The program processes an array of N cells, with N up to about 4096. The
| array is filled from the start, it may not be completely full but there
| are no empty cells between full cells. A cell contains a value of
| 1..255, so logically every cell would be a byte, but if processing
| efficiency requires each cell to be a word or even a double-word, that
| would be no problem.

In general, byte values are quicker to manipulate than word,
halfword, and doubleword values.

| The values are normalized, they are effectively
| indexes into a table, so if the table contains just T entries, the
| values would go from 1 to T.
|
| Given the above setup, the problem is to find *all* sub-arrays that
| contain 3..T distinct elements and are as short as possible, i.e. if
| there are 42 sub-arrays of three elements containing three distinct
| values, there is no need to find sub-arrays of four elements containing
| three distinct values. Also, a shorter sub-array may *never* be a part
| of a longer sub-array containing *only* distinct values.
|
| An example, suppose only the array contains 56 elements, in this case
| with, for clarity, values a..j:
|
|          1         2         3         4         5     5
| ....5....0....5....0....5....0....5....0....5....0....56
| aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
|
| in this case the program should find the following sub-arrays:
|
| len aperture from to value
|  3     3      40  42 bef
|  3     3      44  46 fgh
|  7     7      50  56 gihbfja
|  8    16      41  56 efffghggggihbfja, containing efghibja
|  9    23      34  56 cbbbbbbefffghggggihbfja, containing cbefghija
| 10    25      32  56 dccbbbbbbefffghggggihbfja, containing dcbefghija
|
| It is possible to find sub-arrays with 4, 5, or 6 distinct values, but
| they are either longer than the series of 7-in-7 (12-25 contain abcd),
| or are part of the 7-in-7 series and as such they should not be included!
|
| My AD 1996 program used to slide a window that started with three
| elements (plus a sentinel on either side) over the big array and spit
| out the position of the first element if it found three distinct
| elements in the window *and* the two sentinels also contained any of the
| values in the window.

This sounds like something adaptable to using INDEX,
especially as your data are byte entries (or maybe I'm
thinking of the wrong part of the search...).


Reply robin 5/17/2010 9:59:14 AM

"glen herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message news:hsq2kv$4gt$1@speranza.aioe.org...

| For alphabetsize of 32 or less, that is, values 0..31, and if I
| did it in C, I would probably use the one bit per 32 bit word
| storage, which makes testing fairly easy using & and |.
| As far as I know, PL/I BIT(32) doesn't have the optimization
| that uses a 32 bit word,

It does if it's unioned with a 31-bit FIXED BINARY.

| and UNSPEC is likely even worse.

Did you try it? 


Reply robin 5/17/2010 10:01:26 AM

Robert AH Prins wrote:
> On 2010-05-17 09:03, Terje Mathisen wrote:
>> Robert AH Prins wrote:
>>> On 2010-05-17 00:29, glen herrmannsfeldt wrote:
>>>> In comp.lang.pl1 Robert AH Prins<spamtrap@prino.org> wrote:
>>>>
>>>>> Can anyone give me some hints as to solve the following problem,
>>>>> preferably in a way that is faster than the way I used to do it, and
>>>>> without the bug in the current version;
>>>>
>>>>> Problem description:
>>>>
>>>>> The program processes an array of N cells, with N up to about 4096. The
>>>>> array is filled from the start, it may not be completely full but there
>>>>> are no empty cells between full cells. A cell contains a value of
>>>>> 1..255, so logically every cell would be a byte, but if processing
>>>>> efficiency requires each cell to be a word or even a double-word, that
>>>>> would be no problem. The values are normalized, they are effectively
>>>>> indexes into a table, so if the table contains just T entries, the
>>>>> values would go from 1 to T.
>>
>> The maximum array size of 4096 is interesting, it means that we can
>> afford to use quite a bit of lookup table space per entry.
>
> Memory is cheap, but cache is of course limited.
>
>> Otoh is also means that we cannot afford to use a lot of time to
>> initialize those tables, since that would take just as long or longer
>> than the obvious window scanning algorithm.
>
> The problem is, the sliding window algorithm is fundamentally wrong. I
> spent a bit of time converting the current PL/I version of the program
> back to something that compiles with the old (AD late 1980's, early
> 1990's) compiler and ran the program with the count option. The old
> sliding window method resulted, as mentioned before, in the execution of
> *556,379,518* PL/I statements. The Paul Green method gets the same
> result by executing a mere *79,386* PL/I statements, a rather staggering
> difference...

That sounds like the difference between an O(n*n) and O(n*log(n))=20
algorithm. :-)
>
> I'm now going to dig through about a dozen boxes with papers that were
> never unpacked after we moved. I have a faint hope that there may be
> printouts of my email exchange with him (PG) in one of them. Hope I have
> the time to do it today or tomorrow, as I'm off to the other side of
> Europe later this week, provided the Eyjafjallajökull allows me...

Britain and several other western parts of Europe seem quite iffy
tomorrow morning, according to
http://weatheronline.co.uk/cgi-app/volcanic?LANG=en&ART=3
>
>> I would still start by determining the first and last position of each
>> unique byte, then use that to determine the shortest substring
>> which contains all the values: This becomes the limit for a window scan
>> that starts with aperture 3 and goes up.
>
> That string may contain all values, but strings to either side of it may
> contain shorter substrings. In the example I gave, all values occur in
> the substring starting at position 32, but if there had been additional
> values b and c at positions "0" and "-1", your window scan would, if I
> read the above correctly, not pick them up as part of a 3-string.

Right, I did realize that a little bit later.

This is an intriguing problem, I'll have to think some more...
:-)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/17/2010 2:50:20 PM
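
The first step described in the quoted text, recording the first and last position of each byte value, could look roughly like this in C. This is only a sketch of the table-building pass, not of a complete method; the name first_last, the ALPHABET constant and the use of -1 for "never seen" are assumptions made for the illustration.

```c
#include <stddef.h>

#define ALPHABET 256    /* assumed: one slot per possible byte value */

/*
 * Record the first and last 0-based index at which each byte value
 * occurs in data[0..n-1]. Slots of values that never occur stay at -1.
 */
void first_last(const unsigned char *data, size_t n,
                long first[ALPHABET], long last[ALPHABET])
{
    for (int v = 0; v < ALPHABET; v++)
        first[v] = last[v] = -1;

    for (size_t i = 0; i < n; i++) {
        unsigned char v = data[i];
        if (first[v] < 0)
            first[v] = (long)i;     /* only the first sighting sets this */
        last[v] = (long)i;          /* every sighting moves this forward */
    }
}
```

On the 56-element example the smallest "last" entry belongs to d (index 31, i.e. position 32), so any sub-array holding all ten values has to start at or before position 32. As the objection quoted above points out, this says nothing yet about shorter sub-arrays with fewer distinct values elsewhere in the array.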

On 2010-05-17 09:03, Terje Mathisen wrote:
> Robert AH Prins wrote:
>> On 2010-05-17 00:29, glen herrmannsfeldt wrote:
>>> In comp.lang.pl1 Robert AH Prins<spamtrap@prino.org> wrote:
>>>
>>>> Can anyone give me some hints as to solve the following problem,
>>>> preferably in a way that is faster than the way I used to do it, and
>>>> without the bug in the current version;
>>>
>>>> Problem description:
>>>
>>>> The program processes an array of N cells, with N up to about 4096. The
>>>> array is filled from the start, it may not be completely full but there
>>>> are no empty cells between full cells. A cell contains a value of
>>>> 1..255, so logically every cell would be a byte, but if processing
>>>> efficiency requires each cell to be a word or even a double-word, that
>>>> would be no problem. The values are normalized, they are effectively
>>>> indexes into a table, so if the table contains just T entries, the
>>>> values would go from 1 to T.
>
> The maximum array size of 4096 is interesting, it means that we can
> afford to use quite a bit of lookup table space per entry.

Memory is cheap, but cache is of course limited.

> Otoh it also means that we cannot afford to use a lot of time to
> initialize those tables, since that would take just as long or longer
> than the obvious window scanning algorithm.

The problem is, the sliding window algorithm is fundamentally wrong. I 
spent a bit of time converting the current PL/I version of the program 
back to something that compiles with the old (AD late 1980's, early 
1990's compiler) and ran the program with the count option. The old 
sliding window method resulted, as mentioned before, in the execution of 
*556,379,518* PL/I statements. The Paul Green method gets the same 
result by executing a mere *79,386* PL/I statements, a rather staggering 
difference...

I'm now going to dig through about a dozen boxes with papers that were 
never unpacked after we moved. I have a faint hope that there may be 
printouts of my email exchange with him (PG) in one of them. Hope I have 
the time to do it today or tomorrow, as I'm off to the other side of 
Europe later this week, provided the Eyjafjallajökull allows me...

> I would still start by determining the first and last position of each
> unique byte, then use that to determine the shortest substring
> which contains all the values: This becomes the limit for a window scan
> that starts with aperture 3 and goes up.

That string may contain all values, but strings to either side of it may 
contain shorter substrings. In the example I gave, all values occur in 
the substring starting at position 32, but if there had been additional 
values b and c at positions "0" and "-1", your window scan would, if I 
read the above correctly, not pick them up as part of a 3-string.

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/17/2010 3:24:18 PM

In article <85c8i8FphiU1@mid.individual.net>,
	Robert AH Prins <spamtrap@prino.org> writes:
> On 2010-05-17 04:57, Dick Wesseling wrote:
>> In article<85akp6F4bpU1@mid.individual.net>,
>> 	Robert AH Prins<spamtrap@prino.org>  writes:
>>
>>> Given the above setup, the problem is to find *all* sub-arrays that
>>> contain 3..T distinct elements and are as short as possible, i.e. if
>>> there are 42 sub-arrays of three elements containing three distinct
>>> values, there is no need to find sub-arrays of four elements containing
>>> three distinct values. Also, a shorter sub-array may *never* be a part
>>> of a longer sub-array containing *only* distinct values.
>>>
>>> An example, suppose only the array contains 56 elements, in this case
>>> with, for clarity, values a..j:
>>>
>>>          1         2         3         4         5     5
>>> ....5....0....5....0....5....0....5....0....5....0....56
>>> aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
>>>
>>> in this case the program should find the following sub-arrays:
>>>
>>> len aperture from to value
>>>  3     3      40  42 bef
>>>  3     3      44  46 fgh
>>>  7     7      50  56 gihbfja
>>>  8    16      41  56 efffghggggihbfja, containing efghibja
>>>  9    23      34  56 cbbbbbbefffghggggihbfja, containing cbefghija
>>> 10    25      32  56 dccbbbbbbefffghggggihbfja, containing dcbefghija
>>>
>>>
>>> It is possible to find sub-arrays with 4, 5, or 6 distinct values, but
>>> they are either longer than the series of 7-in-7 (12-25 contain abcd),
>>
>> Are you saying that if we tabulate the results, then both the "len" and
>> "aperture" columns should have increasing values?
> 
> If they are temporarily stored in an array or list, then it's trivial to 
> sort them in any required order before further processing.
> 

That's not what I meant. I wanted to make sure that I understood the
problem statement before attempting a solution.

>> If my understanding of the requirements is correct then it may be better
>> to start with a large window and _decrease_ it in each pass. (As finding
>> the longest length and corresponding aperture is trivial).
> 
> Even if there may be more than one longest interval? This *is* pretty 
> unlikely. However, the current set of data contains two sub-arrays with 
> a length of 1879 that contain 64 distinct values.
>
>> For example, after finding len=10 aperture=25 one should only consider
>> substrings of 25 or less when searching for sequences of less than 10
>> distinct values.
> 
> That is pretty obvious, but why would reducing the aperture from 24 down 
> to potentially 9 to find substrings with 9 distinct values be any more 
> efficient than starting with an aperture of 9 and increasing it to 24? 
> If the substring of 9 elements were to be found with an aperture of 16 
> or 17, i.e. dead in the middle, starting with the smaller aperture would 
> involve less testing...
> 

The solution that I have in mind is O(N*C) where N is the length of
the input and C is either the size of the alphabet (current version) or
- with a few improvements that are left as an exercise - the number of
different lengths (which is 5 for {3,7,8,9,10} in your example).

My solution involves an exclusion list that is searched linearly,
this may ruin the performance, but it should be possible to use a
binary search tree instead. Also left as an exercise.

Anyway, here is what I had in mind. Compiles with gcc.


#include <stdio.h>
#include <string.h>


int verbose=0;

#if 1
// original example

#define TESTSTR "aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja"

#else

/*
  That string may contain all values, but strings to either side of it may
  contain shorter substrings. In the example I gave, all values occur in
  the substring starting at position 32, but if there had been additional
  values b and c at positions "0" and "-1", your window scan would, if I
  read the above correctly, not pick them up as part of a 3-string.
*/

#define TESTSTR "bcaaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja"
#endif
unsigned char input [] = TESTSTR ;
unsigned N = sizeof(TESTSTR)-1;



#define C 256               // Size of alphabet
#define ONE 1               /* Pascal array bias convention */


void findseqs(unsigned char *data, unsigned len) {
    unsigned i;
    unsigned uniq = 0;              // Nr unique symbols in window
    unsigned apub = ~0;             // Aperture upper bound

    unsigned nrfound = /* to keep gcc happy */ 0;
    unsigned found[len];

    unsigned freq[C];               // Frequency count
    memset(freq, 0, sizeof(freq));  // memset is declared in <string.h>

#define inc(c) do { if(!freq[c]) uniq++; freq[c]++; } while(0)
#define dec(c) do { freq[c]--; if(!freq[c]) uniq--; } while(0)


#define WINSIZE ((r-l)+1)

    // THIS NEEDS TO BE IMPROVED:
    struct {
        unsigned start;
        unsigned end;                   // Exclusive end
    } excluded [len];
    int nrexcluded=0;


    // Start with largest possible sequence: the entire input string

    unsigned l = 0;             // Start of window
    unsigned r = len-1;         // Inclusive end of window

    for (i=0; i<len; i++) {
        inc(data[i]);
    }

    // Trim left & right edges

    while (freq[data[r]]>1) { dec(data[r]); r--; }
    while (freq[data[l]]>1) { dec(data[l]); l++; }

    // From here on we want to find only sequences that are shorter or
    // equal to what we have found above:

    unsigned goal = uniq;

haveseq:
    if (WINSIZE <= apub ) {         // Short enough?

        // A candidate. May be premature if we have shorter sequences
        // with the same nr of unique symbols.

        if (verbose) {
            printf("%s\n", data);
            for (i=0; i<l; i++) printf(" ");
            for (   ; i<=r; i++) printf("-");
            printf("\n");
            printf("Possible sequence: len %u aper %u from %u to %u\n",
                                   uniq, WINSIZE, l, r);
        }

        // Part of exclusion?
        // For now brute force. todo: some sort of binary search

        for (i=0; i<nrexcluded; i++) {
            if ( l >= excluded[i].start &&
                 r <  excluded[i].end) {
                 if (verbose) printf("Skip excluded\n");
                 goto skip;
            }
        }
        if (WINSIZE < apub) {
            nrfound = 0;            // Dump old candidates
            apub = WINSIZE;         // Next matches must be <= this one
        }
        found[nrfound] = l;         // Add to list of tentative results
        nrfound++;
skip:   ;
    }

    // Slide window

    dec(data[l]); l++;
    while (freq[data[l]]>1) { dec(data[l]); l++; }
    while (uniq < goal && r<len-1) { r++; inc(data[r]); }

    // Hit right edge?
    if (uniq == goal) goto haveseq;


    // No more matches of this length. Print results

    for (i=0; i<nrfound; i++) {

        unsigned j;
        printf("Found len %2u aperture %2u pos %u\t", goal, apub, ONE+found[i]);
        for (j=found[i]; j< apub+found[i]; j++) printf("%c", data[j]);
        printf("\n");

        if (apub == goal) {     // *only* distinct values?
            if (verbose) printf("Add exclude\n");
            excluded[nrexcluded].start = found[i];
            excluded[nrexcluded].end   = found[i]+goal;
            nrexcluded++;
        }
    }

    // Now the next shorter sequence.
    //
    // If my understanding of the problem is correct it must have
    // a smaller aperture than the one found above. So in order to
    // find the length and a first estimate of the aperture we slide
    // a fixed size window over the input data

#define RESET   \
    memset(freq, 0, sizeof(freq));  \
    uniq = l = r = 0;           /* reset everything */

    RESET

    apub--;                         // Must be shorter than the previous

    for (r=0; r<apub-1; r++) {
        inc(data[r]);
    }   inc(data[r]);

    unsigned int start = 0;         // Where to start next iteration
    unsigned int end   = 0;         // of main loop

    goal=0;

    // match fixed window at all positions
    while (r < (len-1)) {
        if (uniq > goal ) {

            // Q: does it pay to test exclusion list at this point?
            // Yes! That saves quite a few iterations of the main
            // loop. E.g. sample data is now O(10*N) whereas
            // O(5*N) can be achieved. This is left as an exercise
            // for the reader.

            goal = uniq;
            start = l;
            end   = r;
        }
        dec(data[l]); l++;
        r++; inc(data[r]);
    }

    if (goal < 3) return;

    RESET
    l=start;
    for (r=l; r<end; r++) {
        inc(data[r]);
    }   inc(data[r]);

    while (freq[data[l]]>1) { dec(data[l]); l++; }
    apub=WINSIZE;
    nrfound=0;

    if (verbose) {
        printf("Next goal %u start %u end %u uniq %u \n",
                goal, start, end, uniq );
    }
    goto haveseq;
}


int main(int argc, char **argv) {
    if (argc >1) {
        findseqs((unsigned char *)argv[1], (unsigned)strlen(argv[1]));
    } else {
        findseqs(input, N);
    }
    return 0;
}
Reply ftu 5/17/2010 11:06:23 PM
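
The exclusion list in the program above is searched linearly. One possible way to replace that with a binary search, sketched here under assumptions: collect the excluded spans in an array, sort it by start, and replace each end by the running maximum of the ends seen so far. A window [l, r] then lies inside some excluded span exactly when the last entry with start <= l has a running-maximum end greater than r. The names struct span, prepare_spans and is_excluded are hypothetical and not part of the posted program.

```c
#include <stddef.h>
#include <stdlib.h>

struct span {
    unsigned start;
    unsigned end;       /* exclusive end, as in the posted program */
};

/* qsort comparison: ascending by start */
static int by_start(const void *a, const void *b)
{
    const struct span *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/*
 * Sort spans by start, then overwrite each end with the largest end of
 * any span at or before it. Call again after adding spans.
 */
void prepare_spans(struct span *s, size_t n)
{
    qsort(s, n, sizeof *s, by_start);
    for (size_t i = 1; i < n; i++)
        if (s[i].end < s[i - 1].end)
            s[i].end = s[i - 1].end;
}

/* Nonzero if the window [l, r] (inclusive) lies inside some prepared span. */
int is_excluded(const struct span *s, size_t n, unsigned l, unsigned r)
{
    size_t lo = 0, hi = n;          /* lo ends up as the count of spans with start <= l */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s[mid].start <= l)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo > 0 && s[lo - 1].end > r;
}
```

Each lookup is O(log n) in the number of spans. The price is the re-sort whenever spans are added, which in the posted program happens once per goal, after the results for that goal have been printed.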

Robert AH Prins wrote:
> Hi all,
> 
> Can anyone give me some hints as to solve the following problem, 
> preferably in a way that is faster than the way I used to do it, and 
> without the bug in the current version;
> 
....
I believe the program below does what you want and should be a couple of orders of magnitude faster 
than the method you outlined (based on your statement about how many "PL/I statements" are executed 
on an input of length 2329 with 66 distinct values). For illustrative purposes, the program uses the 
character representation you used in your example for input and output, but the algorithm itself 
will work for values of c up to 32765 and v up to 32764 if you have enough memory and patience.

The program uses four arrays: y(c), a character array into which the input is read; x(c), an integer 
array giving the encoded input with a=1, b=2, etc; a(0:c,v+1), described below; and l(v), with l(j) 
the length of the shortest subsequence with j distinct values.

The basic idea is to build an array, a, with a row for each cell and a column for each value. The 
i,j element of the array gives the relative cell number in the input array starting from cell i of 
the first occurrence of value j, if any, otherwise a(i,j) is greater than max(c,v+1). The array is 
built starting from the last row, c, and working back to row 1. Row c is first set to max(c+1,v+2). 
Then a(c,x(c)) is set to 1. Working backward for i=c-1 to 1, each element of row i is set to the 
minimum of 32766 and the corresponding element of row i+1 plus 1 and then element a(i,x(i)) is set to 1.

Once this construction is complete, it is evident that there is one and only one instance of the 
integer 1 in each row, and all values in each row less than c+1 are unique.

Next each row is sorted. After sorting, the a(i,j) element contains the length of the shortest 
subsequence starting at position i that contains j distinct values unless there is no such 
subsequence, in which case a(i,j) is greater than c. It is also evident that if a(i,j)=j then 
a(i,k)=k for 1<=k<=j. Now if a(i,j)=j the subsequence of length j starting at position i has j 
distinct values, but it should be excluded from the solution set for j distinct values if it is a 
proper subsequence of a longer subsequence of distinct values. This will be so iff a(i,j+1)=j+1 or 
a(i-1,j+1)=j+1.
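
The construction and the row sort described above can be sketched in C as follows. This is not a translation of the PL/I source: it uses 0-based indexes, qsort in place of sorta, omits row 0 and column v+1, and marks "value not found" with the single number c+1 instead of letting it grow toward 32766. The name shortest_by_start is made up for this illustration.

```c
#include <stdlib.h>

/* qsort comparison for plain ints, ascending */
static int cmp_int(const void *p, const void *q)
{
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

/*
 * x[0..c-1] holds values in 1..v (c >= 1, v >= 1 assumed).
 * Returns a malloc'ed c-by-v table t, row-major. After the per-row sort,
 * t[i*v + (j-1)] is the length of the shortest run starting at index i
 * that contains j distinct values, or c+1 if there is no such run.
 * The caller frees the table; NULL is returned if allocation fails.
 */
int *shortest_by_start(const unsigned char *x, int c, int v)
{
    const int none = c + 1;
    int *t = malloc((size_t)c * (size_t)v * sizeof *t);
    if (t == NULL)
        return NULL;

    /* Build from the last cell backwards: entry j-1 of row i is the
       distance from cell i to the first occurrence of value j. */
    for (int i = c - 1; i >= 0; i--) {
        int *row = t + (size_t)i * v;
        for (int j = 0; j < v; j++) {
            if (i == c - 1)
                row[j] = none;
            else    /* row[v + j] is the same column in row i+1 */
                row[j] = (row[v + j] >= none) ? none : row[v + j] + 1;
        }
        row[x[i] - 1] = 1;      /* the cell's own value is at distance 1 */
    }

    /* Sort only after every row is built; rows depend on unsorted neighbours. */
    for (int i = 0; i < c; i++)
        qsort(t + (size_t)i * v, (size_t)v, sizeof *t, cmp_int);
    return t;
}
```

In this layout a run of j distinct values starting at i shows up as t[i*v + (j-1)] == j.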

Based on these considerations, the solution sets for each j are constructed by scanning each column 
starting with j=3 and proceeding to j=v. For this purpose the array a actually has an additional 
row, 0, to contain the subscript of the first cell of the first minimal subsequence for each j, and 
an extra column, v+1, containing values greater than c to ensure that the test for inclusion can be 
carried out for j=v. For each j, l(j) is initialized to c+1 and k (the previous element in the list 
of minimal subsequences) to 0. The scan proceeds as follows: if a(i,j) is less than l(j) the current 
subsequence is shorter than the shortest found so far so l(j) is updated and k reset to 0. If a(i,j) 
is equal to l(j) the inclusion test is applied and if it passes the current subsequence is added to 
the list. Once the entire column has been processed the list is terminated by setting a(k,j)=0.
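
The column scan and the inclusion test might look like this in C. Again this is a sketch under assumptions, not the PL/I program: t is a c-by-v table that has already been sorted row by row as described, so t[i*v + (j-1)] is the shortest length from 0-based start i holding j distinct values and anything above c means "none". The starts are returned in a plain array instead of being threaded through the table itself, and scan_column is a hypothetical name.

```c
#include <stddef.h>

/*
 * Scan column j (1 <= j <= v) of the sorted table t.
 * Stores the shortest length in *best_len (c+1 if column j has no run)
 * and the 0-based start of every accepted run of that length in starts[],
 * which must have room for c entries. Returns the number of starts stored.
 */
int scan_column(const int *t, int c, int v, int j, int *starts, int *best_len)
{
    int best = c + 1;
    int n = 0;

    for (int i = 0; i < c; i++) {
        int len = t[(size_t)i * v + (j - 1)];
        if (len > c)
            continue;                   /* no run with j distinct values from i */
        if (len < best) {               /* strictly shorter: restart the list */
            best = len;
            n = 0;
        }
        if (len == best) {
            /* Reject a run that sits inside a run of j+1 distinct values
               of length j+1, starting either at i or at i-1. */
            int ext_right = j < v && t[(size_t)i * v + j] == j + 1;
            int ext_left  = j < v && i > 0 &&
                            t[(size_t)(i - 1) * v + j] == j + 1;
            if (!ext_right && !ext_left)
                starts[n++] = i;
        }
    }
    *best_len = best;
    return n;
}
```

A return value of 0 with *best_len equal to j corresponds to the case in the first example where 4, 5 and 6 distinct values exist only inside the 7-in-7 run and are therefore not reported.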

It is now a simple matter to read off the solution subsequences. The program utilizes sorta to sort 
the rows. call sorta(addr(first element),n,w,c)  sorts an array of n elements of width w in place 
according to the comparison function c. c(u,v) returns '1'b (bit(1) aligned) if the u-th element may 
precede the v-th element, i.e., x(u)<=x(v).


Source:

%process mar(2,100) offset;
  subsets: proc options(main) reorder;
   dcl
    (c,v,i,j,k,m,ar init((rank('a')-1))) bin fixed,
    (x(c),a(0:c,v+1),l(v)) bin fixed ctl,
    y(c) char(1) ctl,
    sorta entry(ptr,bin fixed(31),bin fixed(31),
     entry(bin fixed(31),bin fixed(31)) returns(bit(1) aligned)),
    vfmt entry(bin float(53),bin fixed(15),bin fixed(15)) returns(char(50) var),
    sysin file input,
    sysprint file print,
    (addr,max,min,rank) builtin;
   get file(sysin) list(c,v);
   put file(sysprint) edit('c: ',vfmt(c,10,0),', v: ',vfmt(v,10,0))(col(1),4 a);
   allocate x,a,l,y;
   get file(sysin) edit(y)(col(1),(c)a(1));
   put file(sysprint) edit('Input: ',y)(col(1),a,(c)a(1));
   do i=1 to c; x(i)=rank(y(i))-ar; end;
   a(c,*)=max(c+1,v+2); a(c,x(c))=1;
   do i=c-1 to 1 by -1; a(i,*)=min(32766,a(i+1,*)+1); a(i,x(i))=1; end;
   do i=1 to c; call sorta(addr(a(i,1)),v,2,comp); end;
   do j=3 to v; l(j)=c+1; k=0; m=j+1; a(0,m)=0;
    do i=1 to c;
     if a(i,j)<l(j) then do; l(j)=a(i,j); k=0; end;
     if a(i,j)=l(j) then if a(i-1,m)~=m then if a(i,m)~=m then do; a(k,j),k=i; end;
     end;
    a(k,j)=0;
    if a(0,j)>0 then do;
     put file(sysprint) edit('Distinct: ',vfmt(j,10,0),', length: ',vfmt(l(j),10,0))(col(1),4 a);
     i=a(0,j); k=0; do while(i~=0); k+=1;
      put file(sysprint) edit(k,': (',vfmt(i,10,0),') ',(y(i+m) do m=0 to l(j)-1))
       (col(1),f(10),3 a,(l(j))a(1));
      i=a(i,j);
      end;
     end;
    end;
   comp: proc(u,v) returns(bit(1) aligned) reorder;
    dcl (u,v) bin fixed(31);
    return(a(i,u)<=a(i,v));
    end comp;
   end subsets;

Input:

  56 10
aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja

Output:

c: 56, v: 10
Input: aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
Distinct: 3, length: 3
          1: (40) bef
          2: (44) fgh
Distinct: 7, length: 7
          1: (50) gihbfja
Distinct: 8, length: 16
          1: (41) efffghggggihbfja
Distinct: 9, length: 23
          1: (34) cbbbbbbefffghggggihbfja
Distinct: 10, length: 25
          1: (32) dccbbbbbbefffghggggihbfja

Here is the output from another example:

c: 48, v: 10
Input: abcabcabcabcdefgabcabccdabcdeabcaaabbbcccdefghij
Distinct: 3, length: 3
          1: (1) abc
          2: (2) bca
          3: (3) cab
          4: (4) abc
          5: (5) bca
          6: (6) cab
          7: (7) abc
          8: (8) bca
          9: (9) cab
         10: (18) bca
         11: (19) cab
         12: (20) abc
         13: (31) bca
Distinct: 4, length: 4
          1: (23) cdab
          2: (24) dabc
Distinct: 5, length: 5
          1: (25) abcde
          2: (26) bcdea
          3: (27) cdeab
          4: (28) deabc
Distinct: 7, length: 7
          1: (10) abcdefg
          2: (11) bcdefga
          3: (12) cdefgab
          4: (13) defgabc
Distinct: 8, length: 8
          1: (41) cdefghij
Distinct: 9, length: 11
          1: (38) bcccdefghij
Distinct: 10, length: 14
          1: (35) abbbcccdefghij
Reply James 5/18/2010 8:12:04 AM

Reply James 5/18/2010 8:27:44 AM

Sorry for the double posting.

When I sent the message the first time, a message from my outgoing mail server appeared saying 
something to the effect that "this account is valid for email only; postings to newsgroups are ignored."

I looked on the newsgroup and sure enough the message was not there. I messed around a bit with the 
copy in my sent folder and tried again. I did not get the message from the isp this time but the 
message did not show up on the newsgroup either. I decided to give up and go to bed and try again in 
the morning. By that time both messages had appeared.

Can anyone explain what was going on? I have never seen this behavior before.
Reply James 5/18/2010 5:56:36 PM

James J. Weinkam wrote:
> Sorry for the double posting.
> 
> When I sent the message the first time, a message from my outgoing mail 
> server appeared saying something to the effect that "this account is 
> valid for email only; postings to newsgroups are ignored."

Hi James,

The NG comp.lang.asm.x86 is moderated. What this means is that your NNTP 
server, instead of posting the article as usual, sends it - by mail - to 
the moderator for approval. I've never seen that message, but I suspect 
that's where it's from. When the moderator (or the "moderator's 
apprentice", me) approves the message, we re-post it to an NNTP server. 
This time, bearing the highly arcane moderator's incantation, it gets 
posted. In the case of a cross-posted message, this delays posting to 
all groups, even the unmoderated ones. We try to keep on top of it, but 
there is inevitably some delay.

I strongly suspected I was approving the same message twice, but would 
rather do that than risk missing some minor edit. If anyone is terribly 
bothered by the double-posting, my abject apologies on behalf of the 
moderation team - and my personal advice: "get a life!" :)

Sorry 'bout that - we appreciate your posting!

Best,
Frank

Reply Frank 5/18/2010 6:40:54 PM

In article <oxAIn.4447$z%6.25@edtnps83>,
	"James J. Weinkam" <jjw@nospicedham.cs.sfu.ca> writes:
> Sorry for the double posting.
> 
> When I sent the message the first time, a message from my outgoing mail server appeared saying 
> something to the effect that "this account is valid for email only; postings to newsgroups are ignored."
> 
> I looked on the newsgroup and sure enough the message was not there.

Which newsgroup? I am reading comp.lang.asm.x86 which is moderated,
so it is not unusual for a message to not show up immediately. I don't know
if the other two groups - comp.lang.pascal.borland and comp.lang.pl1 - are
moderated. What happens when posting to multiple groups when one is moderated
and the others are not?

>                                                                      I messed around a bit with the
> copy in my sent folder and tried again. I did not get the message from the isp this time but the
> message did not show up on the newsgroup either. I decided to give up and go to bed and try again in 
> the morning. By that time both messages had appeared.
> 
> Can anyone explain what was going on? I have never seen this behavior before.

I noticed some strange things too.
My message <4bf1cbef$0$22934$c5fe704e@news6.xs4all.nl> references three
other messages:

References: <85akp6F4bpU1@mid.individual.net>
    <4bf0ccd0$0$22945$c5fe704e@news6.xs4all.nl>
    <85c8i8FphiU1@mid.individual.net>

But the message that appeared in clax86 contained only the first
reference. Did the moderation software remove the other two? Never
seen that before.
Reply free 5/18/2010 6:58:43 PM

In comp.lang.pl1 Dick Wesseling <free@nospicedham.securityaudit.val.newsbank.net> wrote:
(snip)
 
> Which newsgroup? I am reading comp.lang.asm.x86 which is moderated,
(snip)
> What happens when posting to multiple groups when one is moderated
> and the others are not?

Last I knew, it wasn't posted to any until the moderator approved it.

-- glen
Reply glen 5/18/2010 7:09:38 PM

Dick Wesseling wrote:
> In article <oxAIn.4447$z%6.25@edtnps83>,
> 	"James J. Weinkam" <jjw@nospicedham.cs.sfu.ca> writes:
>> Sorry for the double posting.
>>
>> When I sent the message the first time, a message from my outgoing mail server appeared saying 
>> something to the effect that "this account is valid for email only; postings to newsgroups are ignored."
>>
>> I looked on the newsgroup and sure enough the message was not there.
> 
> Which newsgroup? I am reading comp.lang.asm.x86 which is moderated,
> so it is not unusual for a message to not show up immediately. I don't know
> if the other two groups - comp.lang.pascal.borland and comp.lang.pl1 - are
> moderated. What happens when posting to multiple groups when one is moderated
> and the others are not?

All postings are delayed (I'm not sure why, exactly...) A more(?) 
interesting question is, "What if more than one of the groups is 
moderated?" I understand that it is considered "impolite" to approve a 
message to another group, but I *think* if it's approved for one, it's 
approved for all. It is not obvious whether other groups are moderated 
or not. Apologies to any other moderators whose toes are being stepped on!

>> copy in my sent folder and tried again. I did not get the message from the isp this time but the
>> message did not show up on the newsgroup either. I decided to give up and go to bed and try again in 
>> the morning. By that time both messages had appeared.
>>
>> Can anyone explain what was going on? I have never seen this behavior before.
> 
> I noticed some strange things too.
> My message <4bf1cbef$0$22934$c5fe704e@news6.xs4all.nl> references three
> other messages:
> 
> References: <85akp6F4bpU1@mid.individual.net>
>     <4bf0ccd0$0$22945$c5fe704e@news6.xs4all.nl>
>     <85c8i8FphiU1@mid.individual.net>
> 
> But the message that appeared in clax86 contained only the first
> reference. Did the moderation software remove the other two? Never
> seen that before.

Everything, as received by the "submission address" is "supposed" to be 
dumped to "clax.log". As shown there, your header looks like...

+OK 3319 octets
Return-path: <news@news1.news.xs4all.nl>
Envelope-to: clax86-submit@inspiretomorrow.net
Delivery-date: Sun, 16 May 2010 23:58:00 -0500
Received: from moderators.individual.net ([130.133.4.7])
	by srv16.hosting24.com with esmtp (Exim 4.69)
	(envelope-from <news@news1.news.xs4all.nl>)
	id 1ODsOu-0000DM-Az
	for clax86-submit@inspiretomorrow.net; Sun, 16 May 2010 23:58:00 -0500
Received: from news1.news.xs4all.nl ([194.109.133.242])
           by moderators.individual.net (Exim 4.69)
           for comp-lang-asm-x86@moderators.isc.org with esmtp
           (envelope-from <news@news1.news.xs4all.nl>)
           id <1ODsOo-00024O-57>; Mon, 17 May 2010 06:57:59 +0200
Received: (from news@localhost)
	by news1.news.xs4all.nl (8.11.6/8.11.6) id o4H4vqL01125;
	Mon, 17 May 2010 06:57:52 +0200 (CEST)
	(envelope-from news)
To: comp-lang-asm-x86@moderators.isc.org
Mime-Version: 1.0
X-Newsreader: knews 1.0c.0
References: <85akp6F4bpU1@mid.individual.net>
From: free@securityaudit.val.newsbank.net (Dick Wesseling)
Subject: Re: Optimization problem
Newsgroups: comp.lang.pascal.borland,comp.lang.asm.x86,comp.lang.pl1
Content-Type: text/plain; charset=us-ascii
Date: 17 May 2010 04:57:52 GMT
Lines: 40
Message-ID: <4bf0ccd0$0$22945$c5fe704e@news6.xs4all.nl>
NNTP-Posting-Host: 2001:888:11d7:0:210:a7ff:fe0a:d33
X-Trace: 1274072272 news6.xs4all.nl 22945 
[2001:888:11d7:0:210:a7ff:fe0a:d33]:40744
X-Complaints-To: abuse@xs4all.nl

As you can see, only one reference showing. We do not intentionally 
remove any "References:" lines. We *do* remove "Received:" lines, and a 
couple others which, some people feel, reveal too much information about 
the poster (sorry about "outing" you). Possible we're not getting this 
right... it is homemade software...

Again, sorry for the confusion. I suspect "moderation issues" are not of 
interest to the other groups, so further discussion should probably go 
in another thread...

Best,
Frank
0
Reply Frank 5/18/2010 9:35:30 PM

In article <hsv17s$rte$1@speranza.aioe.org>,
	Frank Kotler <fbkotler@nospicedham.myfairpoint.net> writes:
> Dick Wesseling wrote:
>> In article <oxAIn.4447$z%6.25@edtnps83>,
>> 	"James J. Weinkam" <jjw@nospicedham.cs.sfu.ca> writes:
>>> Sorry for the double posting.
>>>
>>> When I sent the message the first time, a message from my outgoing mail server appeared saying 
>>> something to the effect that "this account is valid for email only; postings to newsgroups are ignored."
>>>
>>> I looked on the newsgroup and sure enough the message was not there.
>> 
>> Which newsgroup? I am reading comp.lang.asm.x86 which is moderated,
>> so it is not unusual for a message to not show up immediately. I don't know
>> if the other two groups - comp.lang.pascal.borland and comp.lang.pl1 - are
>> moderated. What happens when posting to multiple groups when one is moderated
>> and the others are not?
> 
> All postings are delayed (I'm not sure why, exactly...) A more(?) 
> interesting question is, "What if more than one of the groups is 
> moderated?" I understand that it is considered "impolite" to approve a 
> message to another group, but I *think* if it's approved for one, it's 
> approved for all. It is not obvious whether other groups are moderated 
> or not. Apologies to any other moderators whose toes are being stepped on!
> 
>>> copy in my sent folder and tried again. I did not get the message from the isp this time but the
>>> message did not show up on the newsgroup either. I decided to give up and go to bed and try again in 
>>> the morning. By that time both messages had appeared.
>>>
>>> Can anyone explain what was going on? I have never seen this behavior before.
>> 
>> I noticed some strange things too.
>> My message <4bf1cbef$0$22934$c5fe704e@news6.xs4all.nl> references three
>> other messages:
>> 
>> References: <85akp6F4bpU1@mid.individual.net>
>>     <4bf0ccd0$0$22945$c5fe704e@news6.xs4all.nl>
>>     <85c8i8FphiU1@mid.individual.net>
>> 
>> But the message that appeared in clax86 contained only the first
>> reference. Did the moderation software remove the other two? Never
>> seen that before.
> 
> Everything, as received by the "submission address" is "supposed" to be 
> dumped to "clax.log". As shown there, your header looks like...
> 
> +OK 3319 octets
> Return-path: <news@news1.news.xs4all.nl>
> Envelope-to: clax86-submit@inspiretomorrow.net
> Delivery-date: Sun, 16 May 2010 23:58:00 -0500
> Received: from moderators.individual.net ([130.133.4.7])
> 	by srv16.hosting24.com with esmtp (Exim 4.69)
> 	(envelope-from <news@news1.news.xs4all.nl>)
> 	id 1ODsOu-0000DM-Az
> 	for clax86-submit@inspiretomorrow.net; Sun, 16 May 2010 23:58:00 -0500
> Received: from news1.news.xs4all.nl ([194.109.133.242])
>            by moderators.individual.net (Exim 4.69)
>            for comp-lang-asm-x86@moderators.isc.org with esmtp
>            (envelope-from <news@news1.news.xs4all.nl>)
>            id <1ODsOo-00024O-57>; Mon, 17 May 2010 06:57:59 +0200
> Received: (from news@localhost)
> 	by news1.news.xs4all.nl (8.11.6/8.11.6) id o4H4vqL01125;
> 	Mon, 17 May 2010 06:57:52 +0200 (CEST)
> 	(envelope-from news)
> To: comp-lang-asm-x86@moderators.isc.org
> Mime-Version: 1.0
> X-Newsreader: knews 1.0c.0
> References: <85akp6F4bpU1@mid.individual.net>
> From: free@securityaudit.val.newsbank.net (Dick Wesseling)
> Subject: Re: Optimization problem
> Newsgroups: comp.lang.pascal.borland,comp.lang.asm.x86,comp.lang.pl1
> Content-Type: text/plain; charset=us-ascii
> Date: 17 May 2010 04:57:52 GMT
> Lines: 40
> Message-ID: <4bf0ccd0$0$22945$c5fe704e@news6.xs4all.nl>

That is not <4bf1cbef$0$22934$c5fe704e@news6.xs4all.nl> !

How about the correct log entry?
0
Reply free 5/19/2010 12:48:56 AM

Dick Wesseling wrote:

....
>> Message-ID: <4bf0ccd0$0$22945$c5fe704e@news6.xs4all.nl>
> 
> That is not <4bf1cbef$0$22934$c5fe704e@news6.xs4all.nl> !
> 
> How about the correct log entry?

Oh, you mean *this* <4bf1cbef$0$22934$c5fe704e@news6.xs4all.nl>! Sorry 
'bout that. (note to self: purge that file more often - way too big!)


+OK 10846 octets
Return-path: <news@news1.news.xs4all.nl>
Envelope-to: clax86-submit@inspiretomorrow.net
Delivery-date: Mon, 17 May 2010 18:06:32 -0500
Received: from moderators.individual.net ([130.133.4.7])
	by srv16.hosting24.com with esmtp (Exim 4.69)
	(envelope-from <news@news1.news.xs4all.nl>)
	id 1OE9OK-0000ou-53
	for clax86-submit@inspiretomorrow.net; Mon, 17 May 2010 18:06:32 -0500
Received: from news1.news.xs4all.nl ([194.109.133.242])
           by moderators.individual.net (Exim 4.69)
           for comp-lang-asm-x86@moderators.isc.org with esmtp
           (envelope-from <news@news1.news.xs4all.nl>)
           id <1OE9OD-0003Pw-RR>; Tue, 18 May 2010 01:06:31 +0200
Received: (from news@localhost)
	by news1.news.xs4all.nl (8.11.6/8.11.6) id o4HN6N802174;
	Tue, 18 May 2010 01:06:23 +0200 (CEST)
	(envelope-from news)
To: comp-lang-asm-x86@moderators.isc.org
Mime-Version: 1.0
X-Newsreader: knews 1.0c.0
References: <85akp6F4bpU1@mid.individual.net>
     <4bf0ccd0$0$22945$c5fe704e@news6.xs4all.nl>
     <85c8i8FphiU1@mid.individual.net>
From: ftu@fi.ruu.nl (Dick Wesseling)
Subject: Re: Optimization problem
Newsgroups: comp.lang.pascal.borland,comp.lang.asm.x86,comp.lang.pl1
Content-Type: text/plain; charset=us-ascii
Date: 17 May 2010 23:06:23 GMT
Lines: 283
Message-ID: <4bf1cbef$0$22934$c5fe704e@news6.xs4all.nl>
NNTP-Posting-Host: 2001:888:11d7:0:210:a7ff:fe0a:d33
X-Trace: 1274137583 news6.xs4all.nl 22934 
[2001:888:11d7:0:210:a7ff:fe0a:d33]:41025
X-Complaints-To: abuse@xs4all.nl

Yeah, I do seem to be truncating your references! Presumably because 
they're on separate lines... I thought I had that handled. I'll look 
into it. Thanks for the heads up!

Best,
Frank
0
Reply Frank 5/19/2010 1:31:38 AM

On 5/18/2010 3:27 AM, James J. Weinkam wrote:
> Robert AH Prins wrote:
>> Hi all,
>>
>> Can anyone give me some hints as to solve the following problem,
>> preferably in a way that is faster than the way I used to do it, and
>> without the bug in the current version;
>>
> ...
> I believe the program below does what you want and should be a couple of
> orders of magnitude faster than the method you outlined (based on your
> statement about how many "PL/I statements" are executed on an input of
> length 2329 with 66 distinct values). For illustrative purposes, the
> program uses the character representation you used in your example for
> input and output, but the algorithm itself will work for values of c up
> to 32765 and v up to 32764 if you have enough memory and patience.
>
> The program uses four arrays: y(c), a character array into which the
> input is read; x(c), an integer array giving the encoded input with a=1,
> b=2, etc; a(0:c,v+1), described below; and l(v), with l(j) the length of
> the shortest subsequence with j distinct values.
>
> The basic idea is to build an array, a, with a row for each cell and a
> column for each value. The i,j element of the array gives the relative
> cell number in the input array starting from cell i of the first
> occurrence of value j, if any, otherwise a(i,j) is greater than
> max(c,v+1). The array is built starting from the last row, c, and working
> back to row 1. Row c is first set to max(c+1,v+2). Then a(c,x(c)) is set
> to 1. Working backward for i=c-1 to 1, each element of row i is set to
> the minimum of 32766 and the corresponding element of row i+1 plus 1 and
> then element a(i,x(i)) is set to 1.
>
> Once this construction is complete, it is evident that there is one and
> only one instance of the integer 1 in each row, and all values in each
> row less than c+1 are unique.
>
> Next each row is sorted. After sorting, the a(i,j) element contains the
> length of the shortest subsequence starting at position i that contains
> j distinct values unless there is no such subsequence, in which case
> a(i,j) is greater than c. It is also evident that if a(i,j)=j then
> a(i,k)=k for 1<=k<=j. Now if a(i,j)=j the subsequence of length j
> starting at position i has j distinct values, but it should be excluded
> from the solution set for j distinct values if it is a proper
> subsequence of a longer subsequence of distinct values. This will be so
> iff a(i,j+1)=j+1 or a(i-1,j+1)=j+1.
>
> Based on these considerations, the solution sets for each j are
> constructed by scanning each column starting with j=3 and proceeding to
> j=v. For this purpose the array a actually has an additional row, 0, to
> contain the subscript of the first cell of the first minimal subsequence
> for each j, and an extra column, v+1, containing values greater than c
> to ensure that the test for inclusion can be carried out for j=v. For
> each j, l(j) is initialized to c+1 and k (the previous element in the
> list of minimal subsequences) to 0. The scan proceeds as follows: if
> a(i,j) is less than l(j) the current subsequence is shorter than the
> shortest found so far so l(j) is updated and k reset to 0. If a(i,j) is
> equal to l(j) the inclusion test is applied and if it passes the current
> subsequence is added to the list. Once the entire column has been
> processed the list is terminated by setting a(k,j)=0.
>
> It is now a simple matter to read off the solution subsequences. The
> program utilizes sorta to sort the rows. call sorta(addr(first
> element),n,w,c) sorts an array of n elements of width w in place
> according to the comparison function c. c(u,v) returns '1'b (bit(1)
> aligned) if the u-th element may precede the v-th element, i.e.,
> x(u)<=x(v).
>
>
> Source:
>
> %process mar(2,100) offset;
> subsets: proc options(main) reorder;
>   dcl
>     (c,v,i,j,k,m,ar init((rank('a')-1))) bin fixed,
>     (x(c),a(0:c,v+1),l(v)) bin fixed ctl,
>     y(c) char(1) ctl,
>     sorta entry(ptr,bin fixed(31),bin fixed(31),
>       entry(bin fixed(31),bin fixed(31)) returns(bit(1) aligned)),
>     vfmt entry(bin float(53),bin fixed(15),bin fixed(15)) returns(char(50) var),
>     sysin file input,
>     sysprint file print,
>     (addr,max,min,rank) builtin;
>   get file(sysin) list(c,v);
>   put file(sysprint) edit('c: ',vfmt(c,10,0),', v: ',vfmt(v,10,0))(col(1),4 a);
>   allocate x,a,l,y;
>   get file(sysin) edit(y)(col(1),(c)a(1));
>   put file(sysprint) edit('Input: ',y)(col(1),a,(c)a(1));
>   do i=1 to c; x(i)=rank(y(i))-ar; end;
>   a(c,*)=max(c+1,v+2); a(c,x(c))=1;
>   do i=c-1 to 1 by -1; a(i,*)=min(32766,a(i+1,*)+1); a(i,x(i))=1; end;
>   do i=1 to c; call sorta(addr(a(i,1)),v,2,comp); end;
>   do j=3 to v; l(j)=c+1; k=0; m=j+1; a(0,m)=0;
>     do i=1 to c;
>       if a(i,j)<l(j) then do; l(j)=a(i,j); k=0; end;
>       if a(i,j)=l(j) then if a(i-1,m)~=m then if a(i,m)~=m then do; a(k,j),k=i; end;
>     end;
>     a(k,j)=0;
>     if a(0,j)>0 then do;
>       put file(sysprint) edit('Distinct: ',vfmt(j,10,0),', length: ',vfmt(l(j),10,0))(col(1),4 a);
>       i=a(0,j); k=0; do while(i~=0); k+=1;
>         put file(sysprint) edit(k,': (',vfmt(i,10,0),') ',(y(i+m) do m=0 to l(j)-1))
>           (col(1),f(10),3 a,(l(j))a(1));
>         i=a(i,j);
>       end;
>     end;
>   end;
>   comp: proc(u,v) returns(bit(1) aligned) reorder;
>     dcl (u,v) bin fixed(31);
>     return(a(i,u)<=a(i,v));
>   end comp;
> end subsets;
>
> Input:
>
> 56 10
> aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
>
> Output:
>
> c: 56, v: 10
> Input: aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
> Distinct: 3, length: 3
> 1: (40) bef
> 2: (44) fgh
> Distinct: 7, length: 7
> 1: (50) gihbfja
> Distinct: 8, length: 16
> 1: (41) efffghggggihbfja
> Distinct: 9, length: 23
> 1: (34) cbbbbbbefffghggggihbfja
> Distinct: 10, length: 25
> 1: (32) dccbbbbbbefffghggggihbfja
>
> Here is the output from another example:
>
> c: 48, v: 10
> Input: abcabcabcabcdefgabcabccdabcdeabcaaabbbcccdefghij
> Distinct: 3, length: 3
> 1: (1) abc
> 2: (2) bca
> 3: (3) cab
> 4: (4) abc
> 5: (5) bca
> 6: (6) cab
> 7: (7) abc
> 8: (8) bca
> 9: (9) cab
> 10: (18) bca
> 11: (19) cab
> 12: (20) abc
> 13: (31) bca
> Distinct: 4, length: 4
> 1: (23) cdab
> 2: (24) dabc
> Distinct: 5, length: 5
> 1: (25) abcde
> 2: (26) bcdea
> 3: (27) cdeab
> 4: (28) deabc
> Distinct: 7, length: 7
> 1: (10) abcdefg
> 2: (11) bcdefga
> 3: (12) cdefgab
> 4: (13) defgabc
> Distinct: 8, length: 8
> 1: (41) cdefghij
> Distinct: 9, length: 11
> 1: (38) bcccdefghij
> Distinct: 10, length: 14
> 1: (35) abbbcccdefghij
>
>
Professor Weinkam's elegant algorithm solves the problem using c.v.lg v 
operations (his notation), i.e., N.T.lg T (RAH Prins' notation).
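
For readers who do not have a PL/I compiler at hand, the table construction can be sketched in C roughly as follows. This is only an illustration under assumptions, not Professor Weinkam's code: the names build_lengths and cmp_int are made up, indices are 0-based, and instead of building the whole array and sorting it afterwards the sketch keeps one running "first occurrence" row and sorts a copy of it for each cell. The cost is still about N.T.lg T.

```c
#include <assert.h>
#include <stdlib.h>

/* Ascending comparison for qsort. */
static int cmp_int(const void *p, const void *q) {
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

/* x[0..c-1] holds values 1..v. On return, len[i*v + (j-1)] is the length
   of the shortest run starting at cell i (0-based) that contains j
   distinct values, or c+1 if no such run exists. */
void build_lengths(const int *x, int c, int v, int *len) {
    assert(c > 0 && v > 0);
    int *first = malloc((size_t)v * sizeof *first);
    assert(first != NULL);
    for (int k = 0; k < v; k++)
        first[k] = c + 1;                /* sentinel: value not seen yet */
    for (int i = c - 1; i >= 0; i--) {   /* work backward from the last cell */
        for (int k = 0; k < v; k++)
            if (first[k] <= c)
                first[k]++;              /* seen values are one cell further away */
        first[x[i] - 1] = 1;             /* the cell's own value is at distance 1 */
        int *row = len + (size_t)i * v;
        for (int k = 0; k < v; k++)
            row[k] = first[k];
        qsort(row, (size_t)v, sizeof *row, cmp_int);
    }
    free(first);
}
```

After the call, a run of j distinct values in j cells starts at cell i exactly when len[i*v + (j-1)] equals j, which is the test the column scan in the PL/I program relies on.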

In the PL/I code that he gave, Professor Weinkam used two procedures 
that are not included in IBM's VAPLI: SORTA and VFMT. However, it is 
easy to supply the missing functions. In place of SORTA one may use 
quicksort or mergesort, for which PL/I code is available at 
http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#PL.2FI and 
http://rosettacode.org/wiki/Sort_an_integer_array#PL.2FI. For VFMT, the 
following will do, although it allocates memory that is freed only by 
exiting the program:

   vfmt: proc(v) returns(char(10) var);
   dcl (l,v) fixed bin;
   dcl str char(10) var ctl;
   allocate str;
   if v < 10 then l=1;
   else if v < 100 then l=2;
   else if v < 1000 then l=3;
   else l=4;
   put string (str) edit (v) (F(l));
   return (str);
   end vfmt;

-- mecej4
0
Reply mecej4 5/19/2010 2:02:31 AM

Frank Kotler wrote:

....
> Yeah, I do seem to be truncating your references! Presumably because 
> they're on separate lines... I thought I had that handled. I'll look 
> into it. Thanks for the heads up!

May have that fixed. Seems I was seeing a space where I was expecting a 
tab... Trouble with trial and error coding is you have to keep posting 
test messages! Couldn't resist trying to fix other things, too. Blew the 
last test... try again...

Best,
Frank

0
Reply Frank 5/19/2010 3:19:03 AM

On May 18, 2:58 pm, f...@nospicedham.securityaudit.val.newsbank.net
(Dick Wesseling) wrote:
>
> I noticed some strange things too.
> My message <4bf1cbef$0$22934$c5fe7...@news6.xs4all.nl> references three
> other messages:
>
> References: <85akp6F4bpU1@mid.individual.net>
>     <4bf0ccd0$0$22945$c5fe704e@news6.xs4all.nl>
>     <85c8i8FphiU1@mid.individual.net>
>
> But the message that appeared in clax86 contained only the first
> reference. Did the moderation software remove the other two? Never
> seen that before.

Wow!  Thanks for helping to debug our "paper mache" moderation
software.  Who writes that kind of thing in ASM?  We must be 'high' or
crazy or something...  {shifty eyes, pretends bewilderment}  ...
yeah, we're nuts.  Bring on the straitjacket!  :)

Nathan.
http://clax.inspiretomorrow.net/index.php
0
Reply Nathan 5/19/2010 3:36:54 AM

"James J. Weinkam" <jjw@nospicedham.cs.sfu.ca> wrote in message news:oZrIn.4426$z%6.1359@edtnps83...

| I believe the program below does what you want and should be a couple of orders of magnitude faster
| than the method you outlined (based on your statement about how many "PL/I statements" are executed
| on an input of length 2329 with 66 distinct values). For illustrative purposes, the program uses the
| character representation you used in your example for input and output, but the algorithm itself
| will work for values of c up to 32765 and v up to 32764 if you have enough memory and patience.

Assuming that it works, a very impressive effort! 


0
Reply robin 5/19/2010 12:54:53 PM

In comp.lang.pascal.borland message <hsv17s$rte$1@speranza.aioe.org>,
Tue, 18 May 2010 17:35:30, Frank Kotler
<fbkotler@nospicedham.myfairpoint.net> posted:

>All postings are delayed (I'm not sure why, exactly...) A more(?)
>interesting question is, "What if more than one of the groups is
>moderated?" I understand that it is considered "impolite" to approve a
>message to another group, but I *think* if it's approved for one, it's
>approved for all. It is not obvious whether other groups are moderated
>or not. Apologies to any other moderators whose toes are being stepped
>on!

The initial submission should be mailed to the moderation address of the
first moderated newsgroup in the Newsgroups line.  Each moderator is
expected either to reject it or to mail it to the moderation address of
the next listed moderated newsgroup, except that the moderator address
of the last listed moderated newsgroup either rejects or posts.

A moderator is expected to consider suitability only for his own group,
but might be allowed to reject to protect his group (the Newsgroups
header might include a troll-haunt such as alt.palin.a.nutter.for.half.a
century).


Moderation team members should know how the system is meant to work,
even if their software actually handles the matter.

-- 
 (c) John Stockton, nr London UK. ?@merlyn.demon.co.uk  Turnpike v6.05   MIME.
 Web  <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
 Proper <= 4-line sig. separator as above, a line exactly "-- " (RFCs 5536/7)
 Do not Mail News to me. Before a reply, quote with ">" or "> " (RFCs 5536/7)
0
Reply Dr 5/19/2010 5:47:09 PM

robin wrote:
> "James J. Weinkam"<jjw@nospicedham.cs.sfu.ca>  wrote in message news:oZrIn.4426$z%6.1359@edtnps83...
>
> | I believe the program below does what you want and should be a couple of orders of magnitude faster
> | than the method you outlined (based on your statement about how many "PL/I statements" are executed
> | on an input of length 2329 with 66 distinct values). For illustrative purposes, the program uses the
> | character representation you used in your example for input and output, but the algorithm itself
> | will work for values of c up to 32765 and v up to 32764 if you have enough memory and patience.
>
> Assuming that it works, a very impressive effort!

Indeed.

I have not tried to read the code (or explanation) except to note that 
it must be far better than O(n*n). :-)

With that given, how about a simple approach:

Scan through the input string from beginning to end, while maintaining a 
set of lookup tables, one for each previous starting position. In each 
lookup table we will have two things: The values seen so far, and the 
total number of different characters.

For each possible length we remember the shortest string that so far has 
resulted in this many unique values, updating it when we find a shorter 
string.

This way the processing would consist of loading data[i], then for each 
of the previous bytes taken as the starting position check if this value 
hasn't been seen before, and if so, update the count and check if this 
count/length is equal or better to the best seen so far:

for (i = 2; i < data_len; i++) {
   byte b = data[i];
   for (j = 0; j < i-2; j++) {
     if (!seen[j][b]) {
       seen[j][b] = 1;
       unique_values = ++count[j];
       len = i-j+1;
       if (len <= shortest[unique_values]) {
         save(unique_values, len, j);
         shortest[unique_values] = len;
       }
     }
   }
}

The save function will generate a list of all possible "winning" 
candidates, so it must be sorted and pruned before the final output but 
that is trivial.

The main (only?) problem with the program above is that it is still 
O(n*n), with data_len = 4096, the total number of iterations will be 
1+2+3+4+...+4093 which is ~8M.

However, since the maximum possible number of unique values is the 
character set size, even using all 256 possible byte values would mean 
that for the longer sequences, starting early in the input, 15/16 of all 
bytes will have been seen previously so the only code executed will be the

   if (!seen[j][b]) {

test & branch.

With 4K 256-byte lookup tables we'll need 1 MB of cache, using bits 
instead would compress this to 128K, but that is still too large to fit 
in L1 cache, so probably not worth it.
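
A runnable rendering of this scan might look like the sketch below. It is written under assumptions and is not the loop above verbatim: the name shortest_windows is made up, the loops are widened so that every start position also records its own first cells (one reading of the intent), and the save/prune bookkeeping is left out, so only the shortest length for each distinct count is kept. The seen tables take n*256 bytes, i.e. the 1 MB mentioned above for a 4K input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* shortest[k], for 1 <= k <= 256, receives the length of the shortest
   window of data[0..n-1] that contains k distinct byte values, or 0 if
   no window contains that many. */
void shortest_windows(const unsigned char *data, int n, int shortest[257]) {
    assert(n > 0);
    /* One 256-entry "seen" table and one counter per start position. */
    unsigned char (*seen)[256] = calloc((size_t)n, sizeof *seen);
    int *count = calloc((size_t)n, sizeof *count);
    assert(seen != NULL && count != NULL);
    memset(shortest, 0, 257 * sizeof shortest[0]);

    for (int i = 0; i < n; i++) {          /* i is the last cell of the window */
        unsigned char b = data[i];
        for (int j = 0; j <= i; j++) {     /* j is the first cell of the window */
            if (!seen[j][b]) {             /* b is new for the window starting at j */
                seen[j][b] = 1;
                int k = ++count[j];
                int len = i - j + 1;
                if (shortest[k] == 0 || len < shortest[k])
                    shortest[k] = len;
            }
        }
    }
    free(count);
    free(seen);
}
```

Run on the 56-cell example from the original post, this gives 3 for three distinct values, 7 for seven, and 16, 23 and 25 for eight, nine and ten, in line with the expected table.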

I expect the running time to be ~ 10-20 cycles/inner loop iteration, or 
80-160M cycles for the 4K input, letting the program finish in less than 
a tenth of a second on any currently existing PC.

Is that fast enough?

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 5/19/2010 6:15:32 PM

Terje Mathisen wrote:

> I have not tried to read the code (or explanation) except to note that
> it must be far better than O(n*n). :-)

I have studied the code and the explanation. I too am impressed by its
elegance, and now I finally understand the problem statement.
I think that there is a solution with equal or better O() and a smaller
memory requirement.

Let's go back to the problem statement:

> Given the above setup, the problem is to find *all* sub-arrays that
> contain 3..T distinct elements and are as short as possible,

It took me a while to understand what is meant by "as short as possible".

In the example it says:

> It is possible to find sub-arrays with 4, 5, or 6 distinct values, but
> they are either longer than the series of 7-in-7 (12-25 contain abcd),
> or are part of the 7-in-7 series and as such they should not be included!

          1         2         3         4         5     5
.....5....0....5....0....5....0....5....0....5....0....56
aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
           --------------                        -------
            4-in-14                               7-in-7


I did not fully understand why 12-25 was to be rejected, after all
"abcd" is not a substring of "gihbfja".
Now I understand. 7-in-7 contains substrings 6-in-6, 5-in-5, 4-in-4
and so on. These are not to be printed because they are proper
substrings of "gihbfja". However, they define "as short as possible"
for lengths <= 7.

In other words, once you know the longest n-in-n sequence all shorter
sequences must consist entirely of unique symbols.

This suggests two different search strategies, one for long sequences and
one for short sequences.

Plan A:

Starting with the longest possible sequence search for n-in-m
sequences until n=m. This requires multiple passes for the input,
one for each unique value of n with m>=n (Or two passes actually, see
my first attempt).

Plan B:

Once the longest n-in-n sequence is known perform ONE more pass
over the input, searching for remaining maximal sequences with
3 <= length < n that consist of unique symbols only.

Finishing touch:
Sort the results if so desired.
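
The single pass of Plan B can be sketched in C along these lines. It is a sketch under assumptions rather than finished code: the name maximal_unique_runs and its parameters are made up, indices are 0-based, and it reports every maximal run of unique symbols that is at least min_len long, the longest n-in-n run included, so no separate length limit is needed.

```c
#include <assert.h>

/* Finds every run of data[0..n-1] that contains no repeated value and
   cannot be extended to the left or to the right without repeating one.
   Runs shorter than min_len are skipped. Up to cap runs are written to
   starts[] and lens[]; the return value is the number written. */
int maximal_unique_runs(const unsigned char *data, int n, int min_len,
                        int *starts, int *lens, int cap) {
    assert(n >= 0 && cap >= 0);
    int last[256];                      /* last index at which each value was seen */
    for (int k = 0; k < 256; k++)
        last[k] = -1;

    int found = 0, left = 0;            /* left = first cell of the current run */
    for (int r = 0; r <= n; r++) {
        /* The run [left, r-1] closes at the end of the input or when
           cell r repeats a value that is already inside the run. */
        int closes = (r == n) || last[data[r]] >= left;
        if (closes) {
            int len = r - left;
            if (len > 0 && len >= min_len && found < cap) {
                starts[found] = left;
                lens[found] = len;
                found++;
            }
            if (r == n)
                break;
            left = last[data[r]] + 1;   /* restart just past the earlier copy */
        }
        last[data[r]] = r;
    }
    return found;
}
```

With min_len = 3 the 56-cell example yields three runs, starting at 0-based cells 39, 43 and 49 with lengths 3, 3 and 7, i.e. "bef", "fgh" and "gihbfja".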
0
Reply free 5/19/2010 7:34:56 PM

Frank Kotler wrote:
> Frank Kotler wrote:
> 
> ...
>> Yeah, I do seem to be truncating your references! Presumably because 
>> they're on separate lines... I thought I had that handled. I'll look 
>> into it. Thanks for the heads up!
> 
> May have that fixed. Seems I was seeing a space where I was expecting a 
> tab... Trouble with trial and error coding is you have to keep posting 
> test messages! Couldn't resist trying to fix other things, too. Blew the 
> last test... try again...

Guess my patch worked, eh?

Date: 19 May 2010 19:34:56 GMT
References: <85akp6F4bpU1@mid.individual.net>
     <oZrIn.4426$z%6.1359@edtnps83>
     <4bf40ffb$0$89667$c30e37c6@exi-reader.telstra.net>
     <6fmdc7-0kv1.ln1@ntp.tmsw.no>


Seems I mostly see multiple "References" all on one line... or with a 
new line starting with TAB. For some reason, I'm seeing "space" here. 
Easily fixed... hope it doesn't catch too much unintended stuff. We 
shall see...
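
For anyone curious what the check amounts to: in the Internet message format a folded header line continues on the next line when that line begins with a space or a tab, so both have to be accepted. A minimal C sketch of the idea follows. It is not the clax moderation code (which is written in assembly); the names is_continuation and count_references are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A folded header continuation line begins with a space or a tab. */
int is_continuation(const char *line) {
    assert(line != NULL);
    return line[0] == ' ' || line[0] == '\t';
}

/* Counts the message IDs ("<...>") in the header field that starts at
   lines[first] and runs through any continuation lines that follow it. */
int count_references(const char *const *lines, int nlines, int first) {
    assert(first >= 0 && first < nlines);
    int ids = 0;
    for (int i = first; i < nlines; i++) {
        if (i > first && !is_continuation(lines[i]))
            break;                      /* next header field starts here */
        for (const char *p = lines[i]; (p = strchr(p, '<')) != NULL; p++)
            ids++;                      /* one '<' per message ID */
    }
    return ids;
}
```

Treating only a tab as a continuation marker would stop after the first line of a space-folded References field, which matches the truncation seen earlier in the thread.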

"My users are my test suite, I shall not want." :)

Best,
Frank
0
Reply Frank 5/19/2010 8:02:06 PM

On 2010-05-19 18:15, Terje Mathisen wrote:
> robin wrote:
>> "James J. Weinkam"<jjw@nospicedham.cs.sfu.ca> wrote in message
>> news:oZrIn.4426$z%6.1359@edtnps83...
>>
>> | I believe the program below does what you want and should be a
>> | couple of orders of magnitude faster than the method you outlined
>> | (based on your statement about how many "PL/I statements" are
>> | executed on an input of length 2329 with 66 distinct values). For
>> | illustrative purposes, the program uses the character
>> | representation you used in your example for input and output, but
>> | the algorithm itself will work for values of c up to 32765 and v up
>> | to 32764 if you have enough memory and patience.
>>
>> Assuming that it works, a very impressive effort!
>
> Indeed.
>
> I have not tried to read the code (or explanation) except to note that
> it must be far better than O(n*n). :-)

But it does contain a sort...

> With that given, how about a simple approach:
>
> Scan through the input string from beginning to end, while maintaining a
> set of lookup tables, one for each previous starting position. In each
> lookup table we will have two things: The values seen so far, and the
> total number of different characters.
>
> For each possible length we remember the shortest string that so far has
> resulted in this many unique values, updating it when we find a shorter
> string.
>
> This way the processing would consist of loading data[i], then for each
> of the previous bytes taken as the starting position check if this value
> hasn't been seen before, and if so, update the count and check if this
> count/length is equal or better to the best seen so far:
>
> for (i = 2; i < data_len; i++) {
>    byte b = data[i];
>    for (j = 0; j < i-2; j++) {
>      if (!seen[j][b]) {
>        seen[j][b] = 1;
>        unique_values = ++count[j];
>        len = i-j+1;
>        if (len <= shortest[unique_values]) {
>          save(unique_values, len, j);
>          shortest[unique_values] = len;
>        }
>      }
>    }
> }
>
> The save function will generate a list of all possible "winning"
> candidates, so it must be sorted and pruned before the final output but
> that is trivial.
>
> The main (only?) problem with the program above is that it is still
> O(n*n), with data_len = 4096, the total number of iterations will be
> 1+2+3+4+...+4093 which is ~8M.
>
> However, since the maximum possible number of unique values is the
> character set size, even using all 256 possible byte values would mean
> that for the longer sequences, starting early in the input, 15/16 of all
> bytes will have been seen previously so the only code executed will be the
>
> if (!seen[j][b]) {
>
> test & branch.
>
> With 4K 256-byte lookup tables we'll need 1 MB of cache, using bits
> instead would compress this to 128K, but that is still too large to fit
> in L1 cache, so probably not worth it.
>
> I expect the running time to be ~ 10-20 cycles/inner loop iteration, or
> 80-160M cycles for the 4K input, letting the program finish in less than
> a tenth of a second on any currently existing PC.
>
> Is that fast enough?

You seem to have come up with a carbon copy of the approach Paul Green 
used in 1998 - I did finally find a printout of our email conversation 
from 1998 in one of the unopened boxes in the loft. I will scan and 
(hopefully) OCR(ish) it when I come back from Vilnius next week.

His code is in Pascal, and until I started running it for parts of the 
input string I had had blind faith in it. Wrongly, as it turns out: 
there are three strings that it does not process correctly, and all my 
debugging hasn't given me any clues as to why.

If there is interest, I can also post my original sliding window code.

FWIW, the number of statements executed is obviously a rather crude way 
of assessing the efficiency of a program. However, I also ran the PL/I 
program with Strobe, a mainframe performance analyzing tool. The result 
was even more devastating, *one single loop* to see if a next value is 
already in the series was responsible for 95% of the CPU time of the 
program. Ouch!

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
0
Reply Robert 5/19/2010 10:19:27 PM

Frank Kotler wrote:
> James J. Weinkam wrote:
>> Sorry for the double posting.
>>
>> When I sent the message the first time, a message from my outgoing 
>> mail server appeared saying something to the effect that "this account 
>> is valid for email only; postings to newsgroups are ignored."
> 
> Hi James,
> 
> The NG comp.lang.asm.x86 is moderated. What this means is that your NNTP 
> server, instead of posting the article as usual, sends it - by mail - to 
> the moderator for approval. I've never seen that message, but I suspect 
> that's where it's from. When the moderator (or the "moderator's 
> apprentice", me) approves the message, we re-post it to an NNTP server. 
> This time, bearing the highly arcane moderator's incantation, it gets 
> posted. In the case of a cross-posted message, this delays posting to 
> all groups, even the unmoderated ones. We try to keep on top of it, but 
> there is inevitably some delay.
> 
> I strongly suspected I was approving the same message twice, but would 
> rather do that than risk missing some minor edit. If anyone is terribly 
> bothered by the double-posting, my abject apologies on behalf of the 
> moderation team - and my personal advice: "get a life!" :)
> 
> Sorry 'bout that - we appreciate your posting!
> 
> Best,
> Frank
> 
Well this just goes to show that you learn something new every day.

Looking at the subsequent posts in this thread, it strikes me that the process being used to deal 
with messages posted to multiple newsgroups, some of which are moderated, is ill-conceived. The 
outcome should be the same as if the message were posted separately to each group. This could easily 
be accomplished by sending the message directly to each unmoderated group and directly to the 
moderator of each moderated group. Why should there be any interaction? How is the ordering decided? 
Just the order the poster happened to choose?

Anyway, thanks for the explanation. However, it still doesn't explain the pop-up I got when I first 
posted the message. Surely in many years of posting to newsgroups, I must have posted a message to 
multiple groups before, some of which were moderated. Nevertheless, I have never seen that message 
from the mail server that the "account is only valid for email and newsgroups will be ignored." 
Moreover, that isn't what eventually happened.
Reply James 5/20/2010 2:17:08 AM

On May 19, 10:17 pm, "James J. Weinkam" <j...@nospicedham.cs.sfu.ca>
wrote:
> Frank Kotler wrote:
> > James J. Weinkam wrote:
> >> Sorry for the double posting.
>
> >> When I sent the message the first time, a message from my outgoing
> >> mail server appeared saying something to the effect that "this account
> >> is valid for email only; postings to newsgroups are ignored."
>
> > Hi James,
>
> > The NG comp.lang.asm.x86 is moderated. What this means is that your NNTP
> > server, instead of posting the article as usual, sends it - by mail - to
> > the moderator for approval. I've never seen that message, but I suspect
> > that's where it's from. When the moderator (or the "moderator's
> > apprentice", me) approves the message, we re-post it to an NNTP server.
> > This time, bearing the highly arcane moderator's incantation, it gets
> > posted. In the case of a cross-posted message, this delays posting to
> > all groups, even the unmoderated ones. We try to keep on top of it, but
> > there is inevitably some delay.
>
> > I strongly suspected I was approving the same message twice, but would
> > rather do that than risk missing some minor edit. If anyone is terribly
> > bothered by the double-posting, my abject apologies on behalf of the
> > moderation team - and my personal advice: "get a life!" :)
>
> > Sorry 'bout that - we appreciate your posting!
>
> > Best,
> > Frank
>
> Well this just goes to show that you learn something new every day.
>
> Looking at the subsequent posts in this thread, it strikes me that the process being used to deal
> with messages posted to multiple news groups some of which are moderated is ill conceived. The
> outcome should be the same as if the message were posted separately to each group. This could easily
> be accomplished by sending the message directly to each unmoderated group and directly to the
> moderator of each moderated group. Why should there be any interaction? How is the ordering decided?
> Just the order the poster happened to choose?
>

Well, I am pretty sure these questions were asked when Usenet was
quite young.  I imagine that policies and 'actual practice' have long
been not-quite-in-sync and surely both have evolved much over time.
If you are interested in the details, there are resources specifically
dedicated to these topics:

The Big-8 Management Board
http://www.big-8.org/dokuwiki/doku.php

Internet Systems Consortium
http://www.isc.org/community/reference

And some newsgroups where the above folks 'hang out' to post
references and answer questions:

news.admin.announce
news.announce.newsgroups
news.groups.questions
news.answers

Usenet as a whole may not be in very good health:
http://news.slashdot.org/story/10/05/18/2342241/Duke-To-Shut-Down-Usenet-Server?art_pos=24

Maybe there will come a day when we'll need to set up our own news
servers?

Heck, maybe people would see the value of it if there was a Facebook
app that allowed them to read/post to Usenet??  I dunno..  maybe it'll
just follow the gopher?  :(

> Anyway, thanks for the explanation. However it still doesn't explain the pop up I got when I first
> posted the message. Surely in many years of posting to news groups, I must have posted a message to
> multiple groups before, some of which were moderated. Nevertheless, I have never seen that message
> from the mail server that the "account is only valid for email and newsgroups will be ignored."
> Moreover that isn't what eventually happened.

Probably just a quirk of your Newsreader -- it is the only item that
can actually produce a 'pop up' on your machine.

Nathan.
Reply Nathan 5/20/2010 5:02:50 AM

In article  <ht1g4p$up8$1@speranza.aioe.org>
           fbkotler@nospicedham.myfairpoint.net "Frank Kotler" writes:

> Frank Kotler wrote:
> > Frank Kotler wrote:
> >
> > ...
> >> Yeah, I do seem to be truncating your references! Presumably because
> >> they're on separate lines... I thought I had that handled. I'll look
> >> into it. Thanks for the heads up!
> >
> > May have that fixed. Seems I was seeing a space where I was expecting a
> > tab... Trouble with trial and error coding is you have to keep posting
> > test messages! Couldn't resist trying to fix other things, too. Blew the
> > last test... try again...
> 
> Guess my patch worked, eh?
> 
> Date: 19 May 2010 19:34:56 GMT
> References: <85akp6F4bpU1@mid.individual.net>
>      <oZrIn.4426$z%6.1359@edtnps83>
>      <4bf40ffb$0$89667$c30e37c6@exi-reader.telstra.net>
>      <6fmdc7-0kv1.ln1@ntp.tmsw.no>

I'm sure the OP will check it ;-)

> Seems I mostly see multiple "References" all on one line... or with a
> new line starting with TAB. For some reason, I'm seeing "space" here.
> Easily fixed... hope it doesn't catch too much unintended stuff. We
> shall see...

It shouldn't - the RFCs for news messages define "whitespace" as tabs 
or spaces (and I think newlines too for when the References: are 
wrapped on to "continuation" lines that start with tab/space).

But while you're in there, Frank, you might also want to check for 
multiple tabs/spaces between the <reference> entries.  The maximum 
allowed length of a header line is 1000 characters (and a References 
header spread over several lines is considered a single header line) 
so there might come a point in a busy thread where adding the latest 
message-id will take the line over the limit; what one is supposed to 
do then is retain as many as possible of the _latest_ references, plus 
the _original_ reference/message-id, by removing entries.  So for 
example if the references look like GFEDCBA (A is original) and 
reference H would blow the 1000 character limit, one should remove 
reference B (and if necessary, C, D...) until it all "fits". This will 
leave the references as HGFEDCA.

Anyway, getting back to checking for multiple spaces, the 
recommendation (which I must say I've never seen...) is to separate 
the reference entries with "3 or more" spaces where removals have 
taken place; so the example above might look like HGFEDC   A.
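
The trimming rule described above can be sketched in a few lines. This is a minimal illustration in C, not code from any newsreader: `refs_to_drop` and its parameters are hypothetical names, the array is assumed to hold the message-IDs oldest first with the original at index 0, and `limit` stands for whatever maximum the unfolded header has to respect (1000 in the description above). It also assumes single spaces between entries and three spaces at the point where entries were removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns how many entries, starting at ids[1], have to be dropped so that
   "References: " plus the remaining IDs fits within limit characters.
   ids[0] (the original) and ids[n-1] (the latest) are always kept. */
size_t refs_to_drop(const char *const *ids, size_t n, size_t limit)
{
    const size_t prefix = strlen("References: ");
    size_t total = prefix;
    size_t k = 0;

    assert(ids != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += strlen(ids[i]) + (i ? 1 : 0);  /* single-space separators */

    if (total <= limit || n < 3)
        return 0;               /* already fits, or nothing can be dropped */

    total += 2;                 /* gap after the original grows from 1 to 3 spaces */
    while (k < n - 2 && total > limit) {
        k++;
        total -= strlen(ids[k]) + 1;            /* drop ids[k] and one separator */
    }
    return k;
}
```

If even the original plus the latest entry do not fit, the sketch simply returns `n - 2`; what a real newsreader should do in that case is outside what the description above covers.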

The only reason I know of this stuff is that my own newsreader was 
trashing References: headers so I had to rewrite that handling to fix 
it! 

Good luck!
Pete
-- 
   "We have not inherited the earth from our ancestors,
    we have borrowed it from our descendants."
Reply pete 5/20/2010 5:29:00 AM

Terje Mathisen wrote:
> robin wrote:
>> "James J. Weinkam"<jjw@nospicedham.cs.sfu.ca> wrote in message
>> news:oZrIn.4426$z%6.1359@edtnps83...
>>
>> | I believe the program below does what you want and should be a
>> couple of orders of magnitude faster
>> | than the method you outlined (based on your statement about how many
>> "PL/I statements" are executed
>> | on an input of length 2329 with 66 distinct values). For
>> illustrative purposes, the program uses the
>> | character representation you used in your example for input and
>> output, but the algorithm itself
>> | will work for values of c up to 32765 and v up to 32764 if you have
>> enough memory and patience.
>>
>> Assuming that it works, a very impressive effort!
>
> Indeed.
>
> I have not tried to read the code (or explanation) except to note that
> it must be far better than O(n*n). :-)
>
> With that given, how about a simple approach:
>
> Scan through the input string from beginning to end, while maintaining a
> set of lookup tables, one for each previous starting position. In each
> lookup table we will have two things: The values seen so far, and the
> total number of different characters.
>
> For each possible length we remember the shortest string that so far has
> resulted in this many unique values, updating it when we find a shorter
> string.
>
> This way the processing would consist of loading data[i], then for each
> of the previous bytes taken as the starting position check if this value
> hasn't been seen before, and if so, update the count and check if this
> count/length is equal or better to the best seen so far:
>
> for (i = 2; i < data_len; i++) {
>   byte b = data[i];
>   for (j = 0; j < i-2; j++) {
>     if (!seen[j][b]) {
>       seen[j][b] = 1;
>       unique_values = ++count[j];
>       len = i-j+1;
>       if (len <= shortest[unique_values]) {
>         save(unique_values, len, j);
>         shortest[unique_values] = len;
>       }
>     }
>   }
> }

I just realized that the actual lookup tables can be skipped, while 
making the program significantly faster:

Simply scan backwards through the input string, starting from the 
previous position:

For each such position increment the count of unique values seen, until 
we reach the first byte, or a duplicate of the current input: At this 
point the scan can terminate at once, which will significantly reduce 
the expected number of iterations, while getting rid of all the big 
lookup tables:

for (i = 1; i < data_len; i++) {   // start at 1 so count[0] also sees data[1]
   byte b = data[i];
   for (j = i-1; j >= 0; j--) {
     if (data[j] == b) break;      // b already occurs in data[j..i-1]: stop
     len = i-j+1;
     unique_values = ++count[j];   // distinct values in data[j..i]
     if (len <= shortest[unique_values]) {
       save(unique_values, len, j);
       shortest[unique_values] = len;
     }
   }
}

>
> The save function will generate a list of all possible "winning"
> candidates, so it must be sorted and pruned before the final output but
> that is trivial.

This is still the same.
>
> The main (only?) problem with the program above is that it is still
> O(n*n), with data_len = 4096, the total number of iterations will be
> 1+2+3+4+...+4093 which is ~8M.

According to the birthday paradox, for random inputs the expected number 
of samples needed to find a collision (i.e. a non-unique value) in a set 
of size n is ~sqrt(n), so in one fell swoop the rewrite above has 
reduced the algorithm from O(n*n) to O(n*sqrt(m)) where (n) is the input 
array length and (m) is the alphabet size.

I.e. for the case of n=4096 and m=256, the expected iteration count is 
now 4096*16=64K, and everything will fit in L1 cache.

Running time should be _well_ below a single ms.

The inner loop can never iterate over more than the last (m) characters, 
so we have 4kB of input and 512 bytes (with 16-bit counters) for the 
active part of the shortest[] table.

Going to huge inputs (more than 64k of data and a 32-bit (utf8) 
alphabet) we end up with n*4 bytes of input and n*4 bytes of shortest[] 
table space, for a total of 8*n.

Assuming a 4GB input array consisting of 1G 32-bit random values, the 
expected iteration count becomes ~1G * sqrt(1G) = 1G*32k or 32e12.

With 5 instructions in the inner loop, say 10 cycles (all accesses are 
sequential so we get maximum use of L1 and L2!), this would take 3-6 hours.

> I expect the running time to be ~ 10-20 cycles/inner loop iteration, or
> 80-160M cycles for the 4K input, letting the program finish in less than
> a tenth of a second on any currently existing PC.
>
> Is that fast enough?

Obviously not. :-)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/20/2010 8:07:47 AM

On 2010-05-18 22:02:31 -0400, mecej4 said:

> On 5/18/2010 3:27 AM, James J. Weinkam wrote:
>> Robert AH Prins wrote:
>>> Hi all,
>>> 
>>> Can anyone give me some hints as to solve the following problem,
>>> preferably in a way that is faster than the way I used to do it, and
>>> without the bug in the current version;
>>> 
>> ...
>> I believe the program below does what you want and should be a couple of
>> orders of magnitude faster
>> than the method you outlined (based on your statement about how many
>> "PL/I statements" are executed
>> on an input of length 2329 with 66 distinct values). For illustrative
>> purposes, the program uses the
>> character representation you used in your example for input and output,
>> but the algorithm itself
>> will work for values of c up to 32765 and v up to 32764 if you have
>> enough memory and patience.
>> 
>> The program uses four arrays: y(c), a character array into which the
>> input is read; x(c), an integer
>> array giving the encoded input with a=1, b=2, etc; a(0:c,v+1), described
>> below; and l(v), with l(j)
>> the length of the shortest subsequence with j distinct values.
>> 
>> The basic idea is to build an array, a, with a row for each cell and a
>> column for each value. The
>> i,j element of the array gives the relative cell number in the input
>> array starting from cell i of
>> the first occurrence of value j, if any, otherwise a(i,j) is greater
>> than max(c,v+1). The array is
>> built starting from the last row, c, and working back to row 1. Row c is
>> first set to max(c+1,v+2).
>> Then a(c,x(c)) is set to 1. Working backward for i=c-1 to 1, each
>> element of row i is set to the
>> minimum of 32766 and the corresponding element of row i+1 plus 1 and
>> then element a(i,x(i)) is set to 1.
>> 
>> Once this construction is complete, it is evident that there is one and
>> only one instance of the
>> integer 1 in each row, and all values in each row less than c+1 are unique.
>> 
>> Next each row is sorted. After sorting, the a(i,j) element contains the
>> length of the shortest
>> subsequence starting at position i that contains j distinct values
>> unless there is no such
>> subsequence, in which case a(i,j) is greater than c. It is also evident
>> that if a(i,j)=j then
>> a(i,k)=k for 1<=k<=j. Now if a(i,j)=j the subsequence of length j
>> starting at position i has j
>> distinct values, but it should be excluded from the solution set for j
>> distinct values if it is a
>> proper subsequence of a longer subsequence of distinct values. This will
>> be so iff a(i,j+1)=j+1 or
>> a(i-1,j+1)=j+1.
>> 
>> Based on these considerations, the solution sets for each j are
>> constructed by scanning each column
>> starting with j=3 and proceeding to j=v. For this purpose the array a
>> actually has an additional
>> row, 0, to contain the subscript of the first cell of the first minimal
>> subsequence for each j, and
>> an extra column, v+1, containing values greater than c to ensure that
>> the test for inclusion can be
>> carried out for j=v. For each j, l(j) is initialized to c+1 and k (the
>> previous element in the list
>> of minimal subsequences) to 0. The scan proceeds as follows: if a(i,j)
>> is less than l(j) the current
>> subsequence is shorter than the shortest found so far so l(j) is updated
>> and k reset to 0. If a(i,j)
>> is equal to l(j) the inclusion test is applied and if it passes the
>> current subsequence is added to
>> the list. Once the entire column has been processed the list is
>> terminated by setting a(k,j)=0.
>> 
>> It is now a simple matter to read off the solution subsequences. The
>> program utilizes sorta to sort
>> the rows. call sorta(addr(first element),n,w,c) sorts an array of n
>> elements of width w in place
>> according to the comparison function c. c(u,v) returns '1'b (bit(1)
>> aligned) if the u-th element may
>> precede the v-th element, i.e., x(u)<=x(v).
>> 
>> 
>> Source:
>> 
>> %process mar(2,100) offset;
>>  subsets: proc options(main) reorder;
>>    dcl
>>      (c,v,i,j,k,m,ar init((rank('a')-1))) bin fixed,
>>      (x(c),a(0:c,v+1),l(v)) bin fixed ctl,
>>      y(c) char(1) ctl,
>>      sorta entry(ptr,bin fixed(31),bin fixed(31),
>>        entry(bin fixed(31),bin fixed(31)) returns(bit(1) aligned)),
>>      vfmt entry(bin float(53),bin fixed(15),bin fixed(15)) returns(char(50) var),
>>      sysin file input,
>>      sysprint file print,
>>      (addr,max,min,rank) builtin;
>>    get file(sysin) list(c,v);
>>    put file(sysprint) edit('c: ',vfmt(c,10,0),', v: ',vfmt(v,10,0))(col(1),4 a);
>>    allocate x,a,l,y;
>>    get file(sysin) edit(y)(col(1),(c)a(1));
>>    put file(sysprint) edit('Input: ',y)(col(1),a,(c)a(1));
>>    do i=1 to c; x(i)=rank(y(i))-ar; end;
>>    a(c,*)=max(c+1,v+2); a(c,x(c))=1;
>>    do i=c-1 to 1 by -1; a(i,*)=min(32766,a(i+1,*)+1); a(i,x(i))=1; end;
>>    do i=1 to c; call sorta(addr(a(i,1)),v,2,comp); end;
>>    do j=3 to v; l(j)=c+1; k=0; m=j+1; a(0,m)=0;
>>      do i=1 to c;
>>        if a(i,j)<l(j) then do; l(j)=a(i,j); k=0; end;
>>        if a(i,j)=l(j) then if a(i-1,m)~=m then if a(i,m)~=m then do;
>>          a(k,j),k=i; end;
>>      end;
>>      a(k,j)=0;
>>      if a(0,j)>0 then do;
>>        put file(sysprint) edit('Distinct: ',vfmt(j,10,0),', length: ',vfmt(l(j),10,0))(col(1),4 a);
>>        i=a(0,j); k=0; do while(i~=0); k+=1;
>>          put file(sysprint) edit(k,': (',vfmt(i,10,0),') ',(y(i+m) do m=0 to l(j)-1))
>>            (col(1),f(10),3 a,(l(j))a(1));
>>          i=a(i,j);
>>        end;
>>      end;
>>    end;
>>    comp: proc(u,v) returns(bit(1) aligned) reorder;
>>      dcl (u,v) bin fixed(31);
>>      return(a(i,u)<=a(i,v));
>>    end comp;
>>  end subsets;
>> 
>> Input:
>> 
>> 56 10
>> aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
>> 
>> Output:
>> 
>> c: 56, v: 10
>> Input: aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
>> Distinct: 3, length: 3
>> 1: (40) bef
>> 2: (44) fgh
>> Distinct: 7, length: 7
>> 1: (50) gihbfja
>> Distinct: 8, length: 16
>> 1: (41) efffghggggihbfja
>> Distinct: 9, length: 23
>> 1: (34) cbbbbbbefffghggggihbfja
>> Distinct: 10, length: 25
>> 1: (32) dccbbbbbbefffghggggihbfja
>> 
>> Here is the output from another example:
>> 
>> c: 48, v: 10
>> Input: abcabcabcabcdefgabcabccdabcdeabcaaabbbcccdefghij
>> Distinct: 3, length: 3
>> 1: (1) abc
>> 2: (2) bca
>> 3: (3) cab
>> 4: (4) abc
>> 5: (5) bca
>> 6: (6) cab
>> 7: (7) abc
>> 8: (8) bca
>> 9: (9) cab
>> 10: (18) bca
>> 11: (19) cab
>> 12: (20) abc
>> 13: (31) bca
>> Distinct: 4, length: 4
>> 1: (23) cdab
>> 2: (24) dabc
>> Distinct: 5, length: 5
>> 1: (25) abcde
>> 2: (26) bcdea
>> 3: (27) cdeab
>> 4: (28) deabc
>> Distinct: 7, length: 7
>> 1: (10) abcdefg
>> 2: (11) bcdefga
>> 3: (12) cdefgab
>> 4: (13) defgabc
>> Distinct: 8, length: 8
>> 1: (41) cdefghij
>> Distinct: 9, length: 11
>> 1: (38) bcccdefghij
>> Distinct: 10, length: 14
>> 1: (35) abbbcccdefghij
>> 
>> 
> Professor Weinkam's elegant algorithm solves the problem using c.v.lg v 
> operations (his notation), i.e., N.T.lg T (RAH Prins' notation).
> 
> In the PL/I code that he gave, Professor Weinkam used two procedures 
> that are not included in IBM's VAPLI: SORTA and VFMT. However, it is 
> easy to supply the missing functions. In place of SORTA one may use 
> quicksort or mergesort, for which PL/I code is available at 
> http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#PL.2FI and 
> http://rosettacode.org/wiki/Sort_an_integer_array#PL.2FI. For VFMT, the 
> following will do, although it allocates memory that is freed only by 
> exiting the program:
> 
>    vfmt: proc(v) returns(char(10) var);
>    dcl (l,v) fixed bin;
>    dcl str char(10) var ctl;
>    allocate str;
>    if v < 10 then l=1;
>    else if v < 100 then l=2;
>    else if v < 1000 then l=3;
>    else l=4;
>    put string (str) edit (v) (F(l));
>    return (str);
>    end vfmt;
> 
> -- mecej4

There is no earthly reason for str to be CONTROLLED.

Indeed, there is no earthly reason for any variable in any of the above 
code to be CONTROLLED. Get the bounds, and then declare the arrays as 
AUTOMATIC in a BEGIN block. CONTROLLED storage is virtually always a 
mistake; why do you think no other language has ever had it? It was a 
botched first draft of BASED, which was itself a botched first draft of 
HANDLE.

-- 
John W Kennedy
"The pathetic hope that the White House will turn a Caligula into a 
Marcus Aurelius is as naïve as the fear that ultimate power inevitably 
corrupts."
  -- James D. Barber (1930-2004)


Reply jwkenne (1358) 5/20/2010 9:00:42 PM

mecej4 wrote:
> On 5/18/2010 3:27 AM, James J. Weinkam wrote
> 
.....

> In the PL/I code that he gave, Professor Weinkam used two procedures 
> that are not included in IBM's VAPLI: SORTA and VFMT. However, it is 
> easy to supply the missing functions. In place of SORTA one may use 
> quicksort or mergesort, for which PL/I code is available at 
> http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#PL.2FI and 
> http://rosettacode.org/wiki/Sort_an_integer_array#PL.2FI. For VFMT, the 
> following will do, although it allocates memory that is freed only by 
> exiting the program:
> 
>   vfmt: proc(v) returns(char(10) var);
>   dcl (l,v) fixed bin;
>   dcl str char(10) var ctl;
>   allocate str;
>   if v < 10 then l=1;
>   else if v < 100 then l=2;
>   else if v < 1000 then l=3;
>   else l=4;
>   put string (str) edit (v) (F(l));
>   return (str);
>   end vfmt;
> 
Since the OP indicated he could handle programming details, I didn't think it necessary to include 
the sort subroutine or the formatting routine, which is purely cosmetic in any case. I will post the 
routines upon request.


> -- mecej4
Reply James 5/20/2010 10:18:30 PM

In <EY0Jn.4632$z%6.1904@edtnps83>, on 05/20/2010
   at 02:17 AM, "James J. Weinkam" <jjw@nospicedham.cs.sfu.ca> said:

>Looking at the subsequent posts in this thread, it strikes me 
>that the process being used to deal  with messages posted to 
>multiple news groups some of which are moderated is ill 
>conceived.

That's the result of taking a superficial look at a complex problem.

>The outcome should be the same as if the message were posted 
>separately to each group.

That would be fundamentally wrong, for reasons that have been hashed
out ad infinitum.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org

Reply Shmuel 5/21/2010 12:55:44 AM

Robert AH Prins wrote:
> 
> His code is in Pascal, and until I started running it for parts of the 
> input string I had had blind faith in it. Wrong, as it turns out as 
> there are three strings that it does not process correctly, and all my 
> debugging hasn't given me any clues as to the why.
> 
Can you send a copy of the data set that fails? Also, if it isn't prying into your business, would 
you describe the application this problem is abstracted from? You indicated in your original post 
that the values (characters in the posted example) were actually indices into a table in the real 
application.

Finally, the code I posted has two bugs that I know of so far; but they didn't prevent the algorithm 
from getting the correct answer. Has anyone spotted them?
Reply James 5/21/2010 1:38:15 AM

/*
Here is the final version of my solution.

Let:    N       number of cells, size of the input
        V       number of values, size of symbol set.
        M       the number of distinct values in the largest sequence.
        T       the size of the largest n-in-n sequence, i.e. the
                largest sequence consisting of distinct values only.

As explained in my previous message <4bf43d60$0$22939$c5fe704e@news6.xs4all.nl>
the algorithm uses two different search strategies:

Plan A

Find the largest sequence with size M

for n from M downwards search n-in-m sequences, keeping only sequences
with minimal value of m.
Print sequence if n<m. (If n=m the same sequence will be found again
in the next step).

Now n=m=T.

Plan B
Prints all maximal n-in-n sequences.

Plan A scans the input 1+(M-T) .. 2+(M-T) times, depending on the
input position of the longest sequence.
Plan B scans the input once.

The running time is therefore bounded by O(N * (3+(M-T))) which
is typically better than or equal to O(N*C).
For small values of N the cost of erasing the frequency array should
also be accounted for.

Two arrays are used:
    found[N]    Intermediate results of plan A.
    freq[C]     Frequency count for all symbols in the sliding window


Finding sequences is done using a straightforward sliding window
scan of the input. The values in freq[] are updated when the window
edges move and the number of unique symbols is updated when a
frequency count changes between zero and non-zero.


The output is unsorted.

*/

#include <stdio.h>
#include <string.h>

#define C 256               // Size of alphabet
#define ONE 1               // Pascal array bias convention

void printres (unsigned char *data, unsigned pos, unsigned l, unsigned w)
{
    unsigned i;
    printf("len %2u aperture %2u pos %2u\t", l, w, ONE+pos);
    for (i=pos; i<pos+w; i++) printf("%c", data[i]);
    printf("\n");
}

void findseqs (unsigned char *data, unsigned len)
{
//  printf("%s\n", data);
    if (len == 0) return;           // Empty input: nothing to scan, and found[0] would be invalid
    unsigned i;
    unsigned uniq = 0;              // Nr unique symbols in window
    unsigned apub = ~0;             // Aperture upper bound (aka "m")

    unsigned nrfound = /* to keep gcc happy */ 0;
    unsigned found[len];            // intermediate results

    unsigned freq[C];               // Frequency count
    memset(freq, 0, sizeof(freq));  // memset is declared in <string.h>, already included

#   define inc(c) do { if(!freq[c]) uniq++; freq[c]++; } while(0)
#   define dec(c) do { freq[c]--; if(!freq[c]) uniq--; } while(0)
#   define WINSIZE ((r-l)+1)

    // Start with largest possible sequence: the entire input string

    unsigned l = 0;                 // Start of window
    unsigned r = len-1;             // Inclusive end of window
    unsigned lim = len-1;           // inclusive end of data

    for (i=0; i<len; i++) {
        inc(data[i]);
    }
    if (uniq < 3) return;

    // Trim left & right edges

    while (freq[data[r]]>1) { dec(data[r]); r--; }
    while (freq[data[l]]>1) { dec(data[l]); l++; }

    // From here on we want to find only sequences that are shorter or
    // equal to what we have found above:

    unsigned goal = uniq;

    while(1) {

        //  data[l..r] contains "goal" unique symbols
        //  freq[]     number of occurrences of each symbol in data[l..r]
        //  data[l]    is a unique symbol
        //  data[r]    is a unique symbol

        if (WINSIZE <= apub) {          // Short enough?

            // A candidate. May be premature if we have shorter sequences
            // with the same nr of unique symbols.

            if (WINSIZE < apub) {
                nrfound = 0;            // Dump old candidates
                apub = WINSIZE;         // Next matches must be <= this one
            }
            found[nrfound] = l;         // Add to list of tentative results
            nrfound++;
        }

        // Slide window to next match.

        dec(data[l]); l++;
        while (freq[data[l]]>1) { dec(data[l]); l++; }
        while (uniq < goal && r<lim)   { r++; inc(data[r]); }

        if (uniq == goal) continue;

        // We've hit the right edge

        memset(freq, 0, sizeof(freq));  // Reset window & statistics
        uniq = l = r = 0;

        if (apub == goal) break;        // If *only* distinct values proceed
                                        // with plan B.

        for (i=0; i<nrfound; i++) {     // Print the results
            printres(data, found[i], goal, apub);
        }

        // Now the next shorter sequence.

        goal--;

        inc(data[l]);
        do {
            r++;
            inc(data[r]);
        } while (uniq != goal);

        while (freq[data[l]] != 1) {
            dec(data[l]);
            l++;
        }
        // Now the precondition holds. Also, the window size is smaller
        // than the aperture upper bound from the previous iteration.
    }

    // Plan B
    // Now we consider only sequences with all symbols distinct.
    // The freq[] array therefore behaves like a bitmap with values 0&1 only.

    freq[data[l]] = 1;

    while (r<lim) {
        unsigned char cl, cr;
        cl = data[l];
        cr = data[r+1];

        if (freq[cr] != 0) {            // Maximal sequence?
            if (WINSIZE >= 3) {
                printres(data, l, WINSIZE, WINSIZE);
            }
            do {                        // Move left edge until
                cl = data[l];           // we have all symbols unique.
                freq[cl] = 0;
                l++;
            } while (cl != cr);
        }
        r++;
        freq[data[r]]=1;
    }

    // Final sequence.

    if (WINSIZE >= 3) {
        printres(data, l, WINSIZE, WINSIZE);
    }
    return;
}

int main(int argc, char **argv)
{
    if (argc > 2) {
        fprintf(stderr, "Usage: %s {string}\n", argv[0]);
        return 1;
    }
    if (argc == 1) {
        // read from standard input
        unsigned char buf[4096];
        size_t inlen = fread(buf, sizeof(buf[0]), sizeof(buf), stdin);
        findseqs( buf, (unsigned) inlen);
    }
    if (argc == 2) {
        findseqs((unsigned char *)argv[1], (unsigned)strlen(argv[1]));
    }

    return 0;
}
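
The freq[]/uniq bookkeeping used throughout the program above can be shown in isolation. The following is a minimal sketch, not part of the program: `min_window_with_k_distinct` is a hypothetical name, and the function only answers one question, namely the length of the shortest window holding a given number of distinct byte values. It uses the same idea of adjusting a unique-symbol counter whenever a frequency count crosses zero.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the shortest window of data[0..n-1] that contains k distinct
   byte values, or 0 if no window does. */
size_t min_window_with_k_distinct(const unsigned char *data, size_t n,
                                  unsigned k)
{
    unsigned freq[256] = {0};   /* occurrences of each symbol in the window */
    unsigned uniq = 0;          /* number of symbols with freq > 0 */
    size_t best = 0;            /* 0 means "nothing found yet" */
    size_t l = 0;               /* left edge of the window */

    assert(data != NULL || n == 0);
    if (k == 0)
        return 0;

    for (size_t r = 0; r < n; r++) {
        if (freq[data[r]]++ == 0)       /* count went from zero to non-zero */
            uniq++;
        while (uniq >= k) {             /* shrink from the left while we can */
            size_t w = r - l + 1;
            if (best == 0 || w < best)
                best = w;
            if (--freq[data[l]] == 0)   /* count went from non-zero to zero */
                uniq--;
            l++;
        }
    }
    return best;
}
```

The shortest window with at least k distinct values always has exactly k of them, since otherwise dropping an end symbol would give a shorter one, so no separate "exactly k" test is needed. On the 56-cell sample from the original post this should give 3, 7, 16, 23 and 25 for k = 3, 7, 8, 9 and 10.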
Reply ftu 5/21/2010 1:46:27 AM

James J. Weinkam wrote:
> Robert AH Prins wrote:
>>
>> His code is in Pascal, and until I started running it for parts of the
>> input string I had had blind faith in it. Wrong, as it turns out as
>> there are three strings that it does not process correctly, and all my
>> debugging hasn't given me any clues as to the why.
>>
> Can you send a copy of the data set that fails? Also, if it isn't prying
> into your business, would you describe the application this problem is
> abstracted from? You indicated in your original post that the values
> (characters in the posted example) were actually indices into a table in
> the real application.

I would also love to see that, I really do believe that my current 
approach is sound.
>
> Finally, the code I posted has two bugs that I know of so far; but they
> didn't prevent the algorithm from getting the correct answer. Has anyone
> spotted them?

No, we have all been too busy debugging our own algorithms! :-)

The raw output (not filtered to only keep the best solution) is as 
follows for the sample input:

The rows are in order:

Nr of unique symbols
String length
Starting position
The found string

   3   5  11 abbbc
   4  14  11 abbbcbcccccccd
   3   4  31 dccb
   4  10  31 dccbbbbbbe
   5  30  11 abbbcbcccccccddddddddccbbbbbbe
   3   3  39 bef
   4   9  33 cbbbbbbef
   5  11  31 dccbbbbbbef
   6  31  11 abbbcbcccccccddddddddccbbbbbbef
   4   6  39 befffg
   6  14  31 dccbbbbbbefffg
   7  34  11 abbbcbcccccccddddddddccbbbbbbefffg
   3   3  43 fgh
   4   6  40 efffgh
   5   7  39 befffgh
   6  13  33 cbbbbbbefffgh
   7  15  31 dccbbbbbbefffgh
   8  35  11 abbbcbcccccccddddddddccbbbbbbefffgh
   6  12  39 befffghggggi
   8  20  31 dccbbbbbbefffghggggi
   9  40  11 abbbcbcccccccddddddddccbbbbbbefffghggggi
   3   3  49 gih
   3   3  50 ihb
   4   4  49 gihb
   3   3  51 hbf
   4   4  50 ihbf
   5   5  49 gihbf
   3   3  52 bfj
   4   4  51 hbfj
   5   5  50 ihbfj
   6   6  49 gihbfj
   7  15  40 efffghggggihbfj
   9  24  31 dccbbbbbbefffghggggihbfj
  10  44  11 abbbcbcccccccddddddddccbbbbbbefffghggggihbfj
   3   3  53 fja
   4   4  52 bfja
   5   5  51 hbfja
   6   6  50 ihbfja
   7   7  49 gihbfja
   8  16  40 efffghggggihbfja
   9  23  33 cbbbbbbefffghggggihbfja
  10  25  31 dccbbbbbbefffghggggihbfja

Filtering is trivial:

First remove all non-shortest strings for a given # of unique symbols
Next remove all strings that either start or end at the same position as 
another string while being exactly one character shorter. I.e. remove 
all pure substrings.

Doing the first stage manually on the set above results in:

   3   3  39 bef
   3   3  43 fgh
   3   3  49 gih
   3   3  50 ihb
   4   4  49 gihb
   3   3  51 hbf
   4   4  50 ihbf
   5   5  49 gihbf
   3   3  52 bfj
   4   4  51 hbfj
   5   5  50 ihbfj
   6   6  49 gihbfj
   3   3  53 fja
   4   4  52 bfja
   5   5  51 hbfja
   6   6  50 ihbfja
   7   7  49 gihbfja
   8  16  40 efffghggggihbfja
   9  23  33 cbbbbbbefffghggggihbfja
  10  25  31 dccbbbbbbefffghggggihbfja

The second stage removes the proper substrings, starting from the shortest:

   3   3  39 bef
   3   3  43 fgh
   7   7  49 gihbfja
   8  16  40 efffghggggihbfja
   9  23  33 cbbbbbbefffghggggihbfja
  10  25  31 dccbbbbbbefffghggggihbfja

Except for the use of zero-based counting of starting position, this 
looks identical to the original request. :-)
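
A minimal sketch of the two filter stages in C, assuming the candidates are
stored as (unique, len, start) triples with zero-based start positions. The
struct and function names are hypothetical and are not taken from the posted
program, and the second stage is a plain quadratic search rather than the
faster filter mentioned elsewhere in the thread:

```c
#include <assert.h>
#include <limits.h>

/* One candidate: distinct-value count, length, zero-based start. */
typedef struct { int unique, len, start; } candidate;

enum { MAX_UNIQUE = 256 };

/* Stage 1: keep only the shortest candidates for each distinct-value
   count. Compacts c[] in place and returns the new element count. */
static int keep_shortest(candidate *c, int n)
{
    int best[MAX_UNIQUE + 1];
    int kept = 0;
    for (int u = 0; u <= MAX_UNIQUE; u++)
        best[u] = INT_MAX;
    for (int k = 0; k < n; k++) {
        assert(c[k].unique >= 0 && c[k].unique <= MAX_UNIQUE);
        if (c[k].len < best[c[k].unique])
            best[c[k].unique] = c[k].len;
    }
    for (int k = 0; k < n; k++)
        if (c[k].len == best[c[k].unique])
            c[kept++] = c[k];
    return kept;
}

/* True when x shares its start or its end with y and y is exactly
   one element longer than x. */
static int is_covered_by(const candidate *x, const candidate *y)
{
    return y->len == x->len + 1 &&
           (y->start == x->start ||
            y->start + y->len == x->start + x->len);
}

/* Stage 2: copy to out[] every candidate of in[] that is not covered by
   another candidate of in[]. out must have room for n entries and must
   not alias in. Returns the number of entries written. */
static int drop_substrings(const candidate *in, int n, candidate *out)
{
    int kept = 0;
    for (int k = 0; k < n; k++) {
        int covered = 0;
        for (int m = 0; m < n && !covered; m++)
            covered = is_covered_by(&in[k], &in[m]);
        if (!covered)
            out[kept++] = in[k];
    }
    return kept;
}
```

Calling keep_shortest() on the raw list and then drop_substrings() on its
result should leave the six rows shown above. Stage 2 compares against all
stage-1 survivors, not only against rows that survive stage 2 themselves, so
chains such as gih, gihb, gihbf are removed in one pass.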

Running time is still O(N*sqrt(C)) where N is the input length and C is 
the number of distinct character values, and total memory use is O(N), 
i.e. the same as the data itself.

With 4K inputs total data size would be 12KB, but only 4.5 KB would be 
within the working set at any given time, so L1 hit rates are very close 
to 100%.

int scan(byte *data, int data_len)
{
	int c = 0;
	for (int i = 2; i < data_len; i++) {
		byte b = data[i];
		for (int j = i-1; j >= 0; j--) {
// A previously seen character cannot end a new longest sequence!
// It also cannot add to the sequence count for this or any
// longer string, so stop scanning!
			if (data[j] == b) break;

			int len = i-j+1;
// Increment the count of unique values seen when starting
// from position [j]
			int unique_values = ++count[j];
			if (len <= shortest[unique_values]) {
				if (len >= 3)
					save(unique_values, len, j);
				shortest[unique_values] = len;
				c++;
			}
		}
	}
	return c;
}
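
For anyone who wants to compile the scan, here is a self-contained sketch.
The post does not show the declarations of count[], shortest[] or save(), so
the versions below are assumptions: count[] starts at 1 for every position,
shortest[] starts at INT_MAX, and save() appends to a fixed-size array and
silently drops entries once it is full. The outer loop also starts at i = 1
rather than 2, so that count[0] is updated when data[1] differs from data[0];
that change is a guess and is not confirmed by the post.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned char byte;

enum { MAX_N = 4096, MAX_SYMBOLS = 256, MAX_FOUND = 16384 };

/* One saved candidate: distinct-value count, length, zero-based start. */
typedef struct { int unique, len, start; } candidate;

static int count[MAX_N];               /* distinct values in data[j..i]   */
static int shortest[MAX_SYMBOLS + 1];  /* shortest length per value count */
static candidate found[MAX_FOUND];     /* hypothetical output buffer      */
static int n_found;

/* Hypothetical save(): append to found[], dropping overflow. */
static void save(int unique, int len, int start)
{
    if (n_found < MAX_FOUND) {
        found[n_found].unique = unique;
        found[n_found].len = len;
        found[n_found].start = start;
        n_found++;
    }
}

static int scan(const byte *data, int data_len)
{
    int c = 0;
    assert(data_len >= 0 && data_len <= MAX_N);
    n_found = 0;
    for (int j = 0; j < data_len; j++)
        count[j] = 1;                  /* each position holds one value */
    for (int u = 0; u <= MAX_SYMBOLS; u++)
        shortest[u] = INT_MAX;

    for (int i = 1; i < data_len; i++) {
        byte b = data[i];
        for (int j = i - 1; j >= 0; j--) {
            /* b already occurs in every window starting at or before j */
            if (data[j] == b) break;

            int len = i - j + 1;
            int unique_values = ++count[j];
            if (len <= shortest[unique_values]) {
                if (len >= 3)
                    save(unique_values, len, j);
                shortest[unique_values] = len;
                c++;
            }
        }
    }
    return c;
}
```

The return value counts every update of shortest[], including the length-2
windows that are never saved, so it can be larger than n_found.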


Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/21/2010 6:56:52 AM

"John W Kennedy" <jwkenne@attglobal.net> wrote in message news:4bf5a2f7$0$22525$607ed4bc@cv.net...
| On 2010-05-18 22:02:31 -0400, mecej4 said:
|
| There is no earthly reason for str to be CONTROLLED.
|
| Indeed, there is no earthly reason for any variable in any of the above
| code to be CONTROLLED. Get the bounds, and then declare the arrays as
| AUTOMATIC in a BEGIN block. CONTROLLED storage is virtually always a
| mistake; why do you think no other language has ever had it?

Fortran has had controlled storage since 1991.

And CONTROLLED storage is/was not a mistake.

| It was a botched first draft of BASED,

No it wasn't.


Reply robin51 (247) 5/21/2010 8:41:03 AM

 >Terje Mathisen wrote:


 > [Snip]
 >
 > With 4K inputs total data size would be 12KB, but only 4.5 KB would be within the working set at
 > any given time, so L1 hit rates are very close to 100%.
 >
 > int scan(byte *data, int data_len)
 > {
 >     int c = 0;
 >     for (int i = 2; i < data_len; i++) {
 >         byte b = data[i];
 >         for (int j = i-1; j >= 0; j--) {
 > // A previously seen character cannot end a new longest sequence!
 > // It also cannot add to the sequence count for this or any
 > // longer string, so stop scanning!
 >             if (data[j] == b) break;
 >
 >             int len = i-j+1;
 > // Increment the count of unique values seen when starting
 > // from position [j]
 >             int unique_values = ++count[j];
 >             if (len <= shortest[unique_values]) {
 >                 if (len >= 3)
 >                     save(unique_values, len, j);
 >                 shortest[unique_values] = len;
 >                 c++;
 >             }
 >         }
 >     }
 >     return c;
 > }
As Terje has pointed out a number of times, taking into account
the hit rate for L1 and L2 cache will have a significant impact
on how fast the code runs.  While many of the other contributions
to this thread have ignored this aspect, a program which requires
only L1 cache for most of its data can run many times faster,
perhaps up to ten times as fast based on some of the tests I did
about a year ago.

Jerome Fine
Reply Jerome 5/21/2010 1:08:31 PM

On 2010-05-21 04:41:03 -0400, robin said:

> "John W Kennedy" <jwkenne@attglobal.net> wrote in message 
> news:4bf5a2f7$0$22525$607ed4bc@cv.net...
> | On 2010-05-18 22:02:31 -0400, mecej4 said:
> |
> | There is no earthly reason for str to be CONTROLLED.
> |
> | Indeed, there is no earthly reason for any variable in any of the above
> | code to be CONTROLLED. Get the bounds, and then declare the arrays as
> | AUTOMATIC in a BEGIN block. CONTROLLED storage is virtually always a
> | mistake; why do you think no other language has ever had it?
> 
> Fortran has had controlled storage since 1991.

No, it has had dynamic (allocatable) arrays, which are not one-tenth as 
complex.

> And CONTROLLED storage is/was not a mistake.

I repeat: it was a botch that was never repeated.

>> It was a botched first draft of BASED,
> 
> No it wasn't.

...which demonstrates conclusively that you know nothing of the 
development of PL/I. Anyone who was there at the time knows how in the 
early days, the syntax for declaring a based variable was:

    DECLARE P POINTER;
    DECLARE 1 RECORD CONTROLLED(P), 2...;

-- 
John W Kennedy
If Bill Gates believes in "intelligent design", why can't he apply it 
to Windows?

Reply jwkenne (1358) 5/21/2010 2:22:38 PM

"John W Kennedy" <jwkenne@attglobal.net> wrote in message news:4bf69720$0$31260$607ed4bc@cv.net...
| On 2010-05-21 04:41:03 -0400, robin said:
|
| > "John W Kennedy" <jwkenne@attglobal.net> wrote in message
| > news:4bf5a2f7$0$22525$607ed4bc@cv.net...
| > | On 2010-05-18 22:02:31 -0400, mecej4 said:
| > |
| > | There is no earthly reason for str to be CONTROLLED.
| > |
| > | Indeed, there is no earthly reason for any variable in any of the above
| > | code to be CONTROLLED. Get the bounds, and then declare the arrays as
| > | AUTOMATIC in a BEGIN block. CONTROLLED storage is virtually always a
| > | mistake; why do you think no other language has ever had it?
| >
| > Fortran has had controlled storage since 1991.
|
| No, it has had dynamic (allocatable) arrays, which are not one-tenth as
| complex.

Fortran's ALLOCATABLE array is an example of controlled storage.
Just like PL/I, storage is allocated and freed explicitly by the
programmer.  The relevant statement corresponding to FREE
is DEALLOCATE.

| > And CONTROLLED storage is/was not a mistake.
|
| I repeat: it was a botch that was never repeated.
|
| >> It was a botched first draft of BASED,
| >
| > No it wasn't.
|
| ...which demonstrates conclusively that you know nothing of the
| development of PL/I. Anyone who was there at the time knows how in the
| early days, the syntax for declaring a based variable was:
|
|    DECLARE P POINTER;
|    DECLARE 1 RECORD CONTROLLED(P), 2...;

We haven't been talking about BASED variables;
the discussion is about controlled storage.


Reply robin51 (247) 5/21/2010 2:59:37 PM

Jerome H. Fine wrote:
>  >Terje Mathisen wrote:
>
>
>  > [Snip]
>  >
>  > With 4K inputs total data size would be 12KB, but only 4.5 KB would
> be within the working set at any given time, so L1 hit rates are very
> close to 100%.
>  >
>  > int scan(byte *data, int data_len)
>  > {
>  > int c = 0;
>  > for (int i = 2; i < data_len; i++) {
>  > byte b = data[i];
>  > for (int j = i-1; j >= 0; j--) {
>  > // A previously seen character cannot end a new longest sequence!
>  > // It also cannot add to the sequence count for this or any
>  > // longer string, so stop scanning!
>  > if (data[j] == b) break;
>  >
>  > int len = i-j+1;
>  > // Increment the count of unique values seen when starting
>  > // from position [j]
>  > int unique_values = ++count[j];
>  > if (len <= shortest[unique_values]) {
>  > if (len >= 3)
>  > save(unique_values, len, j);
>  > shortest[unique_values] = len;
>  > c++;
>  > }
>  > }
>  > }
>  > return c;
>  > }
> As Terje has pointed out a number of times, taking into account
> the hit rate for L1 and L2 cache will have a significant impact
> on how fast the code runs. While many of the other contributions
> to this thread have ignored this aspect, a program which requires
> only L1 cache for most of its data can run many times faster,
> perhaps up to ten times as fast based on some of the tests I did
> about a year ago.

I have now timed my code:

The full scan for potential solutions with the given 56-byte input 
string took about 8K cycles, while my big-O estimate said O(n*sqrt(c)) 
which in our case is O(56*sqrt(10)) ~= 180, i.e. each iteration took a 
little over 40 cycles.

Since the number of potential solutions was quite high (40), the part of 
the inner loop that saved each of them used significant amounts of time, 
including a significant number of mis-predicted branches.

Still, running at 130+ cycles per input byte is fast enough to 
handle significant amounts of data.

I would love to have a copy of the "problematic" 4K input which the 
original poster has alluded to!

Terje

PS. The final filter operation runs in O(n*log(n)) so it should be much 
faster than the scan, at least for large (n). With the 56-byte input 
however, the filter took about twice as long as the scan, i.e. the 
constant terms were significantly higher!

This meant that the total processing time was actually about 25K cycles, 
or 400+ cycles/byte. On my laptop (2.2GHz) this corresponds to "only" 
5.5 MB/s. :-(
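
The per-byte and throughput figures follow from simple division. A small
sketch of that arithmetic, with hypothetical helper names and assuming
1 MB = 10^6 bytes:

```c
#include <assert.h>

/* Average cost per input byte for a run of total_cycles over n_bytes. */
static double cycles_per_byte(double total_cycles, double n_bytes)
{
    assert(n_bytes > 0.0);
    return total_cycles / n_bytes;
}

/* Throughput in MB/s (1 MB = 1e6 bytes) at a given clock rate. */
static double throughput_mb_per_s(double clock_hz, double cyc_per_byte)
{
    assert(cyc_per_byte > 0.0);
    return clock_hz / cyc_per_byte / 1e6;
}
```

cycles_per_byte(8000, 56) is about 143 (the "130+" figure) and
cycles_per_byte(25000, 56) is about 446 (the "400+" figure).
throughput_mb_per_s(2.2e9, 400) gives 5.5; at 446 cycles/byte it would be
about 4.9.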

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/21/2010 3:21:35 PM

On 2010-05-21 10:59:37 -0400, robin said:

> "John W Kennedy" <jwkenne@attglobal.net> wrote in message 
> news:4bf69720$0$31260$607ed4bc@cv.net...
> | On 2010-05-21 04:41:03 -0400, robin said:
> |
> | > "John W Kennedy" <jwkenne@attglobal.net> wrote in message
> | > news:4bf5a2f7$0$22525$607ed4bc@cv.net...
> | > | On 2010-05-18 22:02:31 -0400, mecej4 said:
> | > |
> | > | There is no earthly reason for str to be CONTROLLED.
> | > |
> | > | Indeed, there is no earthly reason for any variable in any of the above
> | > | code to be CONTROLLED. Get the bounds, and then declare the arrays as
> | > | AUTOMATIC in a BEGIN block. CONTROLLED storage is virtually always a
> | > | mistake; why do you think no other language has ever had it?
> | >
> | > Fortran has had controlled storage since 1991.
> |
> | No, it has had dynamic (allocatable) arrays, which are not one-tenth as
> | complex.
> 
> Fortran's ALLOCATABLE array is an example of controlled storage.
> Just like PL/I, storage is allocated and freed explicitly by the
> programmer.  The relevant statement corresponding to FREE
> is DEALLOCATE.

And can multiple allocations be stacked? Can they be in COMMON blocks? 
Can they be SAVE variables?

> 
> | > And CONTROLLED storage is/was not a mistake.
> |
> | I repeat: it was a botch that was never repeated.
> |
> | >> It was a botched first draft of BASED,
> | >
> | > No it wasn't.
> |
> | ...which demonstrates conclusively that you know nothing of the
> | development of PL/I. Anyone who was there at the time knows how in the
> | early days, the syntax for declaring a based variable was:
> |
> |    DECLARE P POINTER;
> |    DECLARE 1 RECORD CONTROLLED(P), 2...;
> 
> We haven't been talking about BASED variables;
> the discussion is about controlled storage.

That's a barefaced lie, and the quotes above prove it.

-- 
John W Kennedy
"There are those who argue that everything breaks even in this old dump 
of a world of ours. I suppose these ginks who argue that way hold that 
because the rich man gets ice in the summer and the poor man gets it in 
the winter things are breaking even for both. Maybe so, but I'll swear 
I can't see it that way."
  -- The last words of Bat Masterson

Reply jwkenne (1358) 5/21/2010 3:43:13 PM

On 21 May, 18:21, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> Jerome H. Fine wrote:
> >  >Terje Mathisen wrote:
>
> >  > [Snip]
>
> >  > With 4K inputs total data size would be 12KB, but only 4.5 KB would
> > be within the working set at any given time, so L1 hit rates are very
> > close to 100%.
>
> >  > int scan(byte *data, int data_len)
> >  > {
> >  > int c = 0;
> >  > for (int i = 2; i < data_len; i++) {
> >  > byte b = data[i];
> >  > for (int j = i-1; j >= 0; j--) {
> >  > // A previously seen character cannot end a new longest sequence!
> >  > // It also cannot add to the sequence count for this or any
> >  > // longer string, so stop scanning!
> >  > if (data[j] == b) break;
>
> >  > int len = i-j+1;
> >  > // Increment the count of unique values seen when starting
> >  > // from position [j]
> >  > int unique_values = ++count[j];
> >  > if (len <= shortest[unique_values]) {
> >  > if (len >= 3)
> >  > save(unique_values, len, j);
> >  > shortest[unique_values] = len;
> >  > c++;
> >  > }
> >  > }
> >  > }
> >  > return c;
> >  > }
> > As Terje has pointed out a number of times, taking into account
> > the hit rate for L1 and L2 cache will have a significant impact
> > on how fast the code runs. While many of the other contributions
> > to this thread have ignored this aspect, a program which requires
> > only L1 cache for most of its data can run many times faster,
> > perhaps up to ten times as fast based on some of the tests I did
> > about a year ago.
>
> I have now timed my code:
>
> The full scan for potential solutions with the given 56-byte input
> string took about 8K cycles, while my big-O estimate said O(n*sqrt(c))
> which in our case is O(56*sqrt(10)) ~= 180, i.e. each iteration took a
> little over 40 cycles.
>
> Since the number of potential solutions was quite high (40), the part of
> the inner loop that saved each of them used significant amounts of time,
> including a significant number of mis-predicted branches.
>
> Still, running at 130+ cycles per input byte is fast enough to
> handle significant amounts of data.
>
> I would love to have a copy of the "problematic" 4K input which the
> original poster has alluded to!

The current set of input data comprises "just" 2329 values, and
right now it might potentially increase to around 3K. The current set
is processed without problems; the problem arose when I started using
the code for subsets of this. The output for three of them was plain
wrong, and this could easily be verified manually. I am right now in
Lithuania and didn't really want to take it, but the shortest
erroneous input set has, IIRC, *only six or seven elements*.

As for the 4k limit, there is a chance that some users may pool their
data, and that might increase the "strings" beyond 4k values. The
values themselves will never exceed 255.

I will post the three problematic strings, and the output that results
from them, on Wednesday when I'm back in Belgium - program will be in
Pascal or PL/I, take your pick. :)


> PS. The final filter operation runs in O(n*log(n)) so it should be much
> faster than the scan, at least for large (n). With the 56-byte input
> however, the filter took about twice as long as the scan, i.e. the
> constant terms were significantly higher!
>
> This meant that the total processing time was actually about 25K cycles,
> or 400+ cycles/byte. On my laptop (2.2GHz) this corresponds to "only"
> 5.5 MB/s. :-(

Robert
Reply prino 5/21/2010 4:34:24 PM

"John W Kennedy" <jwkenne@attglobal.net> wrote in message news:4bf6aa0e$0$22538$607ed4bc@cv.net...
| On 2010-05-21 10:59:37 -0400, robin said:
|
| > "John W Kennedy" <jwkenne@attglobal.net> wrote in message
| > news:4bf69720$0$31260$607ed4bc@cv.net...
| > | On 2010-05-21 04:41:03 -0400, robin said:
| > |
| > | > "John W Kennedy" <jwkenne@attglobal.net> wrote in message
| > | > news:4bf5a2f7$0$22525$607ed4bc@cv.net...
| > | > | On 2010-05-18 22:02:31 -0400, mecej4 said:
| > | > |
| > | > | There is no earthly reason for str to be CONTROLLED.
| > | > |
| > | > | Indeed, there is no earthly reason for any variable in any of the above
| > | > | code to be CONTROLLED. Get the bounds, and then declare the arrays as
| > | > | AUTOMATIC in a BEGIN block. CONTROLLED storage is virtually always a
| > | > | mistake; why do you think no other language has ever had it?
| > | >
| > | > Fortran has had controlled storage since 1991.
| > |
| > | No, it has had dynamic (allocatable) arrays, which are not one-tenth as
| > | complex.
| >
| > Fortran's ALLOCATABLE array is an example of controlled storage.
| > Just like PL/I, storage is allocated and freed explicitly by the
| > programmer.  The relevant statement corresponding to FREE
| > is DEALLOCATE.
|
| And can multiple allocations be stacked? Can they be in COMMON blocks?
| Can they be SAVE variables?

That's irrelevant.  You claimed that "no other language has ever
had it", whereas Fortran does have controlled storage, in point of fact. 


Reply robin51 (247) 5/21/2010 5:35:33 PM

John W Kennedy wrote:
> On 2010-05-21 04:41:03 -0400, robin said:
> 
>> "John W Kennedy" <jwkenne@attglobal.net> wrote in message 
>> news:4bf5a2f7$0$22525$607ed4bc@cv.net...
>> | On 2010-05-18 22:02:31 -0400, mecej4 said:
>> |
>> | There is no earthly reason for str to be CONTROLLED.
>> |
>> | Indeed, there is no earthly reason for any variable in any of the above
>> | code to be CONTROLLED. Get the bounds, and then declare the arrays as
>> | AUTOMATIC in a BEGIN block. CONTROLLED storage is virtually always a
>> | mistake; why do you think no other language has ever had it?
>>
>> Fortran has had controlled storage since 1991.
> 
> No, it has had dynamic (allocatable) arrays, which are not one-tenth as 
> complex.
> 
>> And CONTROLLED storage is/was not a mistake.
> 
> I repeat: it was a botch that was never repeated.
> 
>>> It was a botched first draft of BASED,
>>
>> No it wasn't.
> 
> ...which demonstrates conclusively that you know nothing of the 
> development of PL/I. Anyone who was there at the time knows how in the 
> early days, the syntax for declaring a based variable was:
> 
>    DECLARE P POINTER;
>    DECLARE 1 RECORD CONTROLLED(P), 2...;
> 

It's not a complete botch.  I've looked at modifying some PL/I code that 
used CONTROLLED to BASED for a subset-G type implementation.  It's quite 
a bit more complex.  The PL/I philosophy, for better or worse, is to let 
the compiler do things if possible.
Reply Peter_Flass (934) 5/21/2010 10:01:17 PM

On 2010-05-21 18:01:17 -0400, Peter Flass said:

> John W Kennedy wrote:
>> On 2010-05-21 04:41:03 -0400, robin said:
>> 
>>> "John W Kennedy" <jwkenne@attglobal.net> wrote in message 
>>> news:4bf5a2f7$0$22525$607ed4bc@cv.net...
>>> | On 2010-05-18 22:02:31 -0400, mecej4 said:
>>> |
>>> | There is no earthly reason for str to be CONTROLLED.
>>> |
>>> | Indeed, there is no earthly reason for any variable in any of the above
>>> | code to be CONTROLLED. Get the bounds, and then declare the arrays as
>>> | AUTOMATIC in a BEGIN block. CONTROLLED storage is virtually always a
>>> | mistake; why do you think no other language has ever had it?
>>> 
>>> Fortran has had controlled storage since 1991.
>> 
>> No, it has had dynamic (allocatable) arrays, which are not one-tenth as 
>> complex.
>> 
>>> And CONTROLLED storage is/was not a mistake.
>> 
>> I repeat: it was a botch that was never repeated.
>> 
>>>> It was a botched first draft of BASED,
>>> 
>>> No it wasn't.
>> 
>> ...which demonstrates conclusively that you know nothing of the 
>> development of PL/I. Anyone who was there at the time knows how in the 
>> early days, the syntax for declaring a based variable was:
>> 
>> DECLARE P POINTER;
>> DECLARE 1 RECORD CONTROLLED(P), 2...;
>> 
> 
> It's not a complete botch.  I've looked at modifying some PL/I code 
> that used CONTROLLED to BASED for a subset-G type implementation.  It's 
> quite a bit more complex.  The PL/I philosophy, for better or worse, is 
> to let the compiler do things if possible.

Did you even need BASED?  99 times out of 100, what you really need 
instead of CONTROLLED is:

   GET (M, N);
   BEGIN;
      DCL A(M, N);

In over a quarter century of using PL/I for nearly all my programming, 
including commercial batch programs, system programs, and home-grown 
on-line systems (with just enough assembler to talk to the terminals), 
I never found a program that wasn't simpler, more reliable, and faster 
with CONTROLLED removed.

-- 
John W Kennedy
"Those in the seat of power oft forget their failings and seek only the 
obeisance of others!  Thus is bad government born!  Hold in your heart 
that you and the people are one, human beings all, and good government 
shall arise of its own accord!  Such is the path of virtue!"
  -- Kazuo Koike.  "Lone Wolf and Cub:  Thirteen Strings" (tr. Dana Lewis)

Reply jwkenne (1358) 5/21/2010 10:48:45 PM

John W Kennedy <jwkenne@attglobal.net> wrote:
(snip)
 
> Did you even need BASED?  99 times out of 100, what you really need 
> instead of CONTROLLED is:
 
>   GET (M, N);
>   BEGIN;
>      DCL A(M, N);

I might not believe 99, but I agree that in a large fraction
of the cases that probably works fine, and is more readable.
 
> In over a quarter century of using PL/I for nearly all my programming, 
> including commercial batch programs, system programs, and home-grown 
> on-line systems (with just enough assembler to talk to the terminals), 
> I never found a program that wasn't simpler, more reliable, and faster 
> with CONTROLLED removed.

I only remember doing that one time, with CALL/OS, which doesn't
have CONTROLLED.  I probably would have used CONTROLLED, though,
if it had it.

But it isn't so unusual to have more than one thing allocated,
and to need to deallocate one, or reallocate one, and not the 
others such that BEGIN won't do it.

I might say 2 out of 3, though.

-- glen 
Reply gah (12259) 5/22/2010 12:16:25 AM

"John W Kennedy" <jwkenne@attglobal.net> wrote in message news:4bf70dbe$0$2874$607ed4bc@cv.net...

| Did you even need BASED?  99 times out of 100, what you really need
| instead of CONTROLLED is:
|
|   GET (M, N);
|   BEGIN;
|      DCL A(M, N);
|
| In over a quarter century of using PL/I for nearly all my programming,
| including commercial batch programs, system programs, and home-grown
| on-line systems (with just enough assembler to talk to the terminals),
| I never found a program that wasn't simpler, more reliable, and faster
| with CONTROLLED removed.

Linked lists?  Queueing? etc etc etc. 


Reply robin51 (247) 5/22/2010 12:49:27 AM

John W Kennedy wrote:
> On 2010-05-21 18:01:17 -0400, Peter Flass said:
> 
>> John W Kennedy wrote:
>>> On 2010-05-21 04:41:03 -0400, robin said:
>>>
>>>> "John W Kennedy" <jwkenne@attglobal.net> wrote in message 
>>>> news:4bf5a2f7$0$22525$607ed4bc@cv.net...
>>>> | On 2010-05-18 22:02:31 -0400, mecej4 said:
>>>> |
>>>> | There is no earthly reason for str to be CONTROLLED.
>>>> |
>>>> | Indeed, there is no earthly reason for any variable in any of the 
>>>> above
>>>> | code to be CONTROLLED. Get the bounds, and then declare the arrays as
>>>> | AUTOMATIC in a BEGIN block. CONTROLLED storage is virtually always a
>>>> | mistake; why do you think no other language has ever had it?
>>>>
>>>> Fortran has had controlled storage since 1991.
>>>
>>> No, it has had dynamic (allocatable) arrays, which are not one-tenth 
>>> as complex.
>>>
>>>> And CONTROLLED storage is/was not a mistake.
>>>
>>> I repeat: it was a botch that was never repeated.
>>>
>>>>> It was a botched first draft of BASED,
>>>>
>>>> No it wasn't.
>>>
>>> ...which demonstrates conclusively that you know nothing of the 
>>> development of PL/I. Anyone who was there at the time knows how in 
>>> the early days, the syntax for declaring a based variable was:
>>>
>>> DECLARE P POINTER;
>>> DECLARE 1 RECORD CONTROLLED(P), 2...;
>>>
>>
>> It's not a complete botch.  I've looked at modifying some PL/I code 
>> that used CONTROLLED to BASED for a subset-G type implementation.  
>> It's quite a bit more complex.  The PL/I philosophy, for better or 
>> worse, is to let the compiler do things if possible.
> 
> Did you even need BASED?  99 times out of 100, what you really need 
> instead of CONTROLLED is:
> 
>   GET (M, N);
>   BEGIN;
>      DCL A(M, N);
> 
> In over a quarter century of using PL/I for nearly all my programming, 
> including commercial batch programs, system programs, and home-grown 
> on-line systems (with just enough assembler to talk to the terminals), I 
> never found a program that wasn't simpler, more reliable, and faster 
> with CONTROLLED removed.
> 

The program uses CONTROLLED as it was intended -- as a pushdown stack. 
I can replicate the behavior with BASED, but I'd need an anchor for each 
variable, and have to redefine each to include a link, plus write 
allocate and deallocate routines that deal with the chain.  Given a 
number of such variables, this isn't trivial.  I'll just wait until Iron 
Spring gets CONTROLLED.
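
To give an idea of the bookkeeping involved, here is a rough sketch in C
(not PL/I) of one stacked variable handled by hand: an anchor, a link in
every generation, and allocate/free routines that maintain the chain. All
names are hypothetical, the payload is assumed to be an array of doubles, and
this only illustrates the shape of the work, not how any PL/I compiler
implements CONTROLLED.

```c
#include <assert.h>
#include <stdlib.h>

/* One generation of the stacked variable: link plus payload. */
typedef struct generation {
    struct generation *prev;   /* previously allocated generation  */
    size_t n;                  /* number of doubles in the payload */
    double data[];             /* payload (flexible array member)  */
} generation;

/* One anchor is needed per stacked variable. */
typedef struct { generation *top; } anchor;

/* Push a new generation of n doubles; returns its payload or NULL. */
static double *ctl_allocate(anchor *a, size_t n)
{
    generation *g = malloc(sizeof *g + n * sizeof g->data[0]);
    if (g == NULL)
        return NULL;
    g->prev = a->top;
    g->n = n;
    a->top = g;
    return g->data;
}

/* Pop the current generation; the previous one becomes current. */
static void ctl_free(anchor *a)
{
    generation *g = a->top;
    if (g != NULL) {
        a->top = g->prev;
        free(g);
    }
}

/* Payload of the current generation, or NULL if none is allocated. */
static double *ctl_current(const anchor *a)
{
    return a->top != NULL ? a->top->data : NULL;
}

/* Number of generations currently stacked on this anchor. */
static int ctl_depth(const anchor *a)
{
    int depth = 0;
    for (const generation *g = a->top; g != NULL; g = g->prev)
        depth++;
    return depth;
}
```

Every variable that is stacked this way needs its own anchor and its own
payload layout, which is where the effort multiplies.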
Reply Peter_Flass (934) 5/22/2010 12:54:16 AM

In article <11lic7-2u52.ln1@ntp.tmsw.no>,
	Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> writes:
>
> I have now timed my code:
> 
> The full scan for potential solutions with the given 56-byte input 
> string took about 8K cycles, while my big-O estimate said O(n*sqrt(c)) 
> which in our case is O(56*sqrt(10)) ~= 180, i.e. each iteration took a 
> little over 40 cycles.
> 

That sounds reasonable. I've timed my solution also, and depending on
the CPU and compiler version and switches I get values from 7200 to
9999 cycles.

> Since the number of potential solutions was quite high (40), the part of
> the inner loop that saved each of them used significant amounts of time,
> including a significant number of mis-predicted branches.
> 
> Still, running at 130+ cycles per input byte is fast enough to 
> handle significant amounts of data.
> 

Sounds like you've beaten me. My O() depends on the size of the longest
sequence minus the size of the longest unique sequence, which is good
for the reference data. If I time 4K of random data I get 7250 cycles per
input byte. Since sqrt(C) is 16 for that dataset you ought to get about
800 cycles/byte ( http://www.fi.uu.nl/~ftu/random.bin )
Reply free 5/22/2010 4:13:17 AM

"Robert AH Prins" <spamtrap@prino.org> wrote in message
news:85akp6F4bpU1@mid.individual.net...
> Hi all,

For an answer, you use OT code in an asm group.
Then the problem, from the little I understand,
seems to me to be about searching or changing
DNA strings; I'm against all this.



Reply io_x 5/22/2010 10:36:48 AM

In comp.lang.pascal.borland message <EY0Jn.4632$z%6.1904@edtnps83>, Thu,
20 May 2010 02:17:08, James J. Weinkam <jjw@nospicedham.cs.sfu.ca>
posted:

>
>Looking at the subsequent posts in this thread, it strikes me that the
>process being used to deal with messages posted to multiple news groups
>some of which are moderated is ill conceived.

No.  It is well-defined and appropriate.  All moderators should
understand it.  All moderating software should implement it.

> The outcome should be the same as if the message were posted
>separately to each group.

No.  If you want to do that :
   (a) you are one or more of : foolish, ignorant, or inconsiderate,
   (b) you can do it directly and easily enough.

You should have permission from SFU (commonly considered, IIRC, a
reputable organisation) to cause spam to be directed to that address?

-- 
 (c) John Stockton, nr London UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
    Grandson-Of-RFC1036 is released.  RFC 5536 Netnews Article Format is a
    subset of Internet Message Format which is described in RFC 5322.  The
    RFCs are read together to determine standard Netnews article format.
Reply Dr 5/22/2010 4:52:46 PM

In <4bf70dbe$0$2874$607ed4bc@cv.net>, on 05/21/2010
   at 06:48 PM, John W Kennedy <jwkenne@attglobal.net> said:

>Did you even need BASED?

For linked lists?

>I never found a program that wasn't simpler, more reliable, and
>faster  with CONTROLLED removed.

You've never seen a program that was simpler if it used a stack?

Note: I'm not challenging your account of the history, just looking at
the utility after the fact. Nor am I defending use of based or
controlled in contexts where they are not the appropriate tools for
the job.

BTW, what's the overhead for a BEGIN block with current compilers?
That used to be an excuse[1] for misuse of controlled, although with
the "optimizing" compiler I know that the overhead was reduced.

[1] The same overhead issue applied to the allocate.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org

Reply spamtrap16 (3672) 5/23/2010 12:39:33 AM

"Shmuel (Seymour J.) Metz" <spamtrap@library.lspace.org.invalid> wrote in message 
news:4bf87945$1$fuzhry+tra$mr2ice@news.patriot.net...
| In <4bf70dbe$0$2874$607ed4bc@cv.net>, on 05/21/2010
|   at 06:48 PM, John W Kennedy <jwkenne@attglobal.net> said:
|
| >Did you even need BASED?
|
| For linked lists?
|
| >I never found a program that wasn't simpler, more reliable, and
| >faster  with CONTROLLED removed.
|
| You've never seen a program that was simpler if it used a stack?
|
| Note: I'm not challenging your account of the history, just looking at
| the utility after the fact. Nor am I defending use of based or
| controlled in contexts where they are not the appropriate tools for
| the job.
|
| BTW, what's the overhead for a BEGIN block with current compilers?

A BEGIN block is treated as if it were a procedure.

| That used to be an excuse[1] for misuse of controlled, although with
| the "optimizing" compiler I know that the overhead was reduced.
|
| [1] The same overhead issue applied to the allocate.

Well, no.  Certainly storage is allocated [for ALLOCATE], but only for the
named variables.
But that's the end of it.
For a block, not only must storage be allocated for ALL variables
in the block, including temporaries, but also
there is the prologue and epilogue associated with a block.
This may require that conditions be set up and dis-established, etc.

So, in the end, ALLOCATE has less overhead than a block. 


Reply robin51 (247) 5/23/2010 3:15:37 AM

"Dr J R Stockton" <reply1020@nospicedham.merlyn.demon.co.uk> wrote in 
message news:IJmTEXPevA+LFwGc@invalid.uk.co.demon.merlyn.invalid...
> In comp.lang.pascal.borland message <EY0Jn.4632$z%6.1904@edtnps83>, Thu,
> 20 May 2010 02:17:08, James J. Weinkam <jjw@nospicedham.cs.sfu.ca>
> posted:
>
>>
>>Looking at the subsequent posts in this thread, it strikes me that the
>>process being used to deal with messages posted to multiple news groups
>>some of which are moderated is ill conceived.
>
> No.  It is well-defined and appropriate.  All moderators should
> understand it.  All moderating software should implement it.
>

The word "should" implies a *desired* state.  In practice, well...

For instance, one would expect all such software to pass-along the MIME 
headers.  When we were using PyModerator last year, we discovered that it 
did not do this.  So we patched it.

>> The outcome should be the same as if the message were posted
>>separately to each group.
>
> No.  If you want to do that :
>   (a) you are one or more of : foolish, ignorant, or inconsiderate,

I'm pretty sure we don't do that here at CLAX.  But, it is possible we might 
not be following all of the traditional practice in other areas.  Mainly 
because our tree-house is still a bit in an "under construction" state. 
However, we are none of the above 'colorful words' -- we are simply 
pragmatic.  :)

I am currently in the process of porting our moderation client software from 
Linux to Windows (ASM code just "wants" to be OS agnostic!!), so I intend to 
revisit all of the related "handbooks", RFCs, etc. during this time.

> You should have permission from SFU (commonly considered, IIRC, a
> reputable organisation) to cause spam to be directed to that address?
>

A good point.  But I bet that Mr. Weinkam wants to tell you that it "isn't 
any of your business" and while doing so, he might drop the leading 'S' from 
his organisation's acronym.  :)

Nathan.


Reply Nathan 5/23/2010 5:06:58 AM

Dick Wesseling wrote:
> In article<11lic7-2u52.ln1@ntp.tmsw.no>,
> 	Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com>  writes:
>> Still, running at 130+ cycles per input byte is still fast enough to
>> handle significant amounts of data.
>>
>
> Sounds like you've beaten me. My O() depends on the size of the longest
> sequence minus the size of the longest unique sequence, which is good
> for the reference data. If I time 4K of random data I get 7250 cycles per
> input byte. Since sqrt(C) is 16 for that dataset you ought to get about
> 800 cycles/byte ( http://www.fi.uu.nl/~ftu/random.bin )

It is worse (or better, seen from my point):

With 4K random input the scan took 252K cycles, which means that I 
averaged about 83 cycles/byte. (I.e. very fast!)

The problem was that with totally random data, there are a _lot_ of both 
potential and real solutions, I found 4099 candidates (i.e. just over 
one per starting byte!) and 121 after filtering.

The filter process here took a _lot_ of time though: 8M cycles!

This means that it needed 2000 cycles/input byte. :-(

This simply shows that the problem space (and therefore required 
algorithm as well) almost certainly won't contain totally random data, 
and this will most probably help my code...

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/23/2010 8:37:59 AM

In <4bf89ddf$0$89666$c30e37c6@exi-reader.telstra.net>, on 05/23/2010
   at 01:15 PM, "robin" <robin51@dodo.com.au> said:

>A BEGIN block is treated as if it were a procedure.

Water is wet.

>Well, no.  Certainly storage is allocated [for ALLOCATE], but only
>for the named variables.

The storage for a DSA is pretty much noise; what matters when talking
about overhead is the cost of housekeeping, especially GETMAIN and
FREEMAIN.

>This may require that conditions be set up and dis-established, etc.

Conditions are set up by the ON statement.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org

Reply spamtrap16 (3672) 5/23/2010 1:17:41 PM

Robert AH Prins wrote:

> Hi all,
>
> Can anyone give me some hints as to solve the following problem,
> preferably in a way that is faster than the way I used to do it, and
> without the bug in the current version;
>
> Problem description:
>
> The program processes an array of N cells, with N up to about 4096. The
> array is filled from the start, it may not be completely full but there
> are no empty cells between full cells. A cell contains a value of
> 1..255, so logically every cell would be a byte, but if processing
> efficiency requires each cell to be a word or even a double-word, that
> would be no problem. The values are normalized, they are effectively
> indexes into a table, so if the table contains just T entries, the
> values would go from 1 to T.
>
> Given the above setup, the problem is to find *all* sub-arrays that
> contain 3..T distinct elements and are as short as possible, i.e. if
> there are 42 sub-arrays of three elements containing three distinct
> values, there is no need to find sub-arrays of four elements containing
> three distinct values. Also, a shorter sub-array may *never* be a part
> of a longer sub-array containing *only* distinct values.
>
> An example, suppose only the array contains 56 elements, in this case
> with, for clarity, values a..j:
>
>          1         2         3         4         5     5
> ....5....0....5....0....5....0....5....0....5....0....56
> aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
>
> in this case the program should find the following sub-arrays:
>
> len aperture from to value
>   3     3      40  42 bef
>   3     3      44  46 fgh
>   7     7      50  56 gihbfja
>   8    16      41  56 efffghggggihbfja, containing efghibja
>   9    23      34  56 cbbbbbbefffghggggihbfja, containing cbefghija
>  10    25      32  56 dccbbbbbbefffghggggihbfja, containing dcbefghija
>
> It is possible to find sub-arrays with 4, 5, or 6 distinct values, but
> they are either longer than the series of 7-in-7 (12-25 contain abcd),
> or are part of the 7-in-7 series and as such they should not be included!
>
> My AD 1996 program used to slide a window that started with three
> elements (plus a sentinel on either side) over the big array and spit
> out the position of the first element if it found three distinct
> elements in the window *and* the two sentinels also contained any of the
> values in the window. The program included some fairly minor
> optimizations, such as immediately sliding the window to the last of a
> series of equal values, in the example above, the three position wide
> window would start by covering pos 6..8, and next it would move to
> 11..13, etc. If at any moment the program would find that all elements
> in the window *and the sentinel on the right* would be distinct, the
> window size would be increased by one and the scan would continue.
>
> Taking the above example, a window with an aperture of four elements
> would start its slide at position 6..9. Once it reached position 49,
> with "gihb" visible and "g" and "f" in the two sentinels, the window
> would be widened to five characters as the "f" is distinct from the four
> "visible" characters, with a new sentinel of "j". Given that this is
> again distinct from the now five characters in the window, the process
> would repeat itself, eventually resulting in the string of seven
> distinct characters starting at position 50.
>
> Once the string of seven distinct characters has been found, the window
> is widened to *nine* (if there had been a string of 8-in-8 it would have
> been found by the previous slide!) characters and the process restarts
> at position 6..14, but in the end it fails to find a series of eight
> distinct characters and so the process is repeated with windows of 10,
> 11, 12, 13, 14, 15 and finally 16 characters, when a string of eight
> distinct characters is finally found.
>
> The problem is, for relatively low values of N and C, the process may
> not be overly efficient, but it works. However, once N and C increase,
> (N is theoretically unbounded, C has an upper limit of around 210), the
> process becomes horribly inefficient: using the old IBM V2.3.0 OS PL/I
> compiler, which had a statement count facility, this method required the
> execution of 556,379,518 PL/I statements (and a number of actual machine
> instruction that is at least one order of magnitude greater) for the
> current values of N(2329) and C(66).
>
> As for some figures on required restarts, the final series of 66 is
> found in a sub-array with a length of 1950 elements and with a series of
> 65 in a sub-array of only 1889 elements, this means that the slide had
> to be restarted 61 times! Even more restarts, 312(!), are needed when
> going from 61 to 62 elements.
>
> In 1998 I posted the problem to comp.lang.pascal.borland, the post
> should be on Google, but Google refuses to show more than the first
> 9,960 (out of 28,517) posts in that group.
>
> Three people claimed that there was a much faster way, and two of them
> backed this up with programs, Brent Beach and Paul Green, with Paul's
> solution being the fastest by a fair margin. The results of his code
> matched my output, I replaced my code with his code, and that seemed to
> be the end of the story...
>
> Until a few months back...
>
> ... when I started converting the original Pascal code in assembler.
> While doing so I decided that it would be interesting to use this
> procedure in another part of the code. However, the results were not
> what I expected: result rows were missing.

If you want to do this in assembler, you might
consider to use an alternative approach:

1. Allocate 4,352 bytes of memory.
2. Scan through the input:

EBP = address input
ESI = address input
EDI = address array (1st byte)
ECX = input size

Store EDI, ESI and EBP on the stack or in another
location for reloading. The array must be zeroed
before it is used (2,048 bytes).

....
add EBP,ECX
0:
dec  EBP
movz EAX,BYTE 00[ESI]
movz EBX,BYTE 00[EBP]
cmp DWORD 00[EDI + EAX * 8],0
jne 1f
mov DWORD 00[EDI + EAX * 8],ESI
1:
cmp DWORD 04[EDI + EBX * 8],0
jne 2f
mov DWORD 04[EDI + EBX * 8],EBP
2:
inc ESI
dec ECX
jne 0b
....

Time: ~131,072 clocks or better.

This one-pass function stores the addresses of
the first and last occurrence of each character.
The array is organised as follows:

0000 1st_00, last_00, 1st_01, last_01
0010 1st_02, last_02, 1st_03, last_03
....
07F0 1st_FE, last_FE, 1st_FF, last_FF

Array entries with a value of zero tell us the
corresponding character was not found. If both
entries are equal, the corresponding character
was found only once.
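
For readers who find C easier to follow than the assembler, the one-pass table described above can be sketched roughly as below. This is only an illustration under a few assumptions: it stores 1-based positions instead of addresses (so that zero can still mean "not found"), and the names `occ_t` and `build_occ` are made up for this sketch and do not come from the posted code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef struct {
    unsigned first;  /* 1-based position of the first occurrence, 0 = not found */
    unsigned last;   /* 1-based position of the last occurrence, 0 = not found */
} occ_t;

/* Fill tab[0..255] in a single pass: the forward scan records first
   occurrences while the backward scan records last occurrences. */
void build_occ(const unsigned char *in, size_t n, occ_t tab[256])
{
    size_t i;
    memset(tab, 0, 256 * sizeof tab[0]);
    for (i = 0; i < n; i++) {
        unsigned char f = in[i];          /* byte seen by the forward scan */
        unsigned char b = in[n - 1 - i];  /* byte seen by the backward scan */
        if (tab[f].first == 0) tab[f].first = (unsigned)(i + 1);
        if (tab[b].last  == 0) tab[b].last  = (unsigned)(n - i);
    }
}
```

As in the assembler version, an entry whose two fields are equal and non-zero belongs to a character that occurs exactly once.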

Next, sort out all non-existing characters and
create three work tables:

EDI = address array (1st byte)
EDI + 0x0800 + (n * 4) = 1st occurrence
EDI + 0x0C00 + (n * 4) = last occurrence
EDI + 0x1000 + (n * 1) = associated char

....
xor EAX,EAX
mov EBX,254
0:
mov ESI,DWORD 0x00[EDI + EBX * 8]
mov EBP,DWORD 0x04[EDI + EBX * 8]
mov ECX,DWORD 0x08[EDI + EBX * 8]
mov EDX,DWORD 0x0C[EDI + EBX * 8]
test ECX,ECX
je 1f
mov DWORD 0x0800[EDI + EAX * 4],ECX
mov DWORD 0x0C00[EDI + EAX * 4],EDX
mov BYTE 0x1000[EDI + EAX * 4],BL
inc EAX
1:
decl EBX
test ESI,ESI
je 2f
mov DWORD 0x0800[EDI + EAX * 4],ESI
mov DWORD 0x0C00[EDI + EAX * 4],EBP
mov BYTE 0x1000[EDI + EAX * 4],BL
inc EAX
2:
decl EBX
jns 0b
....

EAX holds the number of characters found on exit.

Time: ~5,120 clocks or better.

Tables 1 and 2 contain the addresses of the
found characters (first and last occurrence),
and table 3 holds the associated characters in
descending order. An address and its associated
character are accessed via the same index to get
the address/char pairs for the output. If you
move an address to another index (e.g. when
sorting), you *have to* move the other address
and the associated character to the same index.
Otherwise, the relationship between them will be
corrupted!

Searching for non-unique patterns is done by
comparing addresses in both tables. Every char
is included at least once in the area between
its first and last address, even if they are
equal. A character included in *any* non-unique
pattern must 'sit' at an address equal to or
above the currently processed character. A found
pattern ends at the highest last address.

I do not know which sequences are of interest,
so I cannot provide code for that. If you look
for sequences m...n, a corresponding substring
starts at m(first) and ends at n(last). Only a
copy operation (from n(start) up to m(end)) is
required to retrieve non-unique substrings for
a print-out (address calculation is trivial).

To search for unique patterns - all characters
differ from each other - the entire input must
be scanned one time:

EBP = address input
ESI = input_size

....
0:
xor EDX,EDX
mov AL,BYTE 00[EBP + EDX]
inc EDX
1:
mov BL,BYTE 0x00[EBP + EDX]
mov CL,BYTE 0x01[EBP + EDX]
cmp AL,BL
je 4f
inc EDX
cmp BL,CL
je 4f
cmp AL,CL
je 4f
inc EDX
cmp EDX,ESI
jae 4f
push EDX
sub EDX,2
je 3f
2:
cmp BL,BYTE 0x00[EBP + EDX]
je 3f
cmp CL,BYTE 0x00[EBP + EDX]
je 3f
dec EDX
jne 2b
3:
mov EAX,EDX
pop EDX
test EAX,EAX
je 1b
4:
cmp EDX,3
jb 5f
.... # store EBP (address 1st char)
.... # store EDX as pattern length
.... # (store EDX chars as found pattern)
5:
add EBP,EDX
sub ESI,EDX
jbe done
jmp 0b

done:
....

With the largest patterns (255 unique chars in
a row), about 3,000,000 clock cycles are quite
realistic. The smaller the found patterns, the
less time is spent: with patterns below 32
chars, it probably takes less than half a
million clocks (as a rule of thumb). Once a
character has passed the inner loop, it is not
touched again. If you search for one specific
pattern, the time can be reduced further (by
using a '1st' address from the table and
subtracting the skipped bytes from ESI, which
decreases the overall number of iterations).

The given clock counts are for 4,096-byte arrays
(growth is linear with increasing size, except
for the unique pattern search).
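
The unique-pattern scan can also be sketched in C. This is a simplified illustration of the same idea, not a translation of the assembler: it uses a 256-entry `seen` table instead of comparing each new byte against the bytes before it, and the names `run_t` and `find_unique_runs` are invented for the sketch. Like the loop above, as far as I can tell, it restarts after each run, so it does not find runs that straddle two chunks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef struct {
    size_t start;  /* 0-based offset of the run in the input */
    size_t len;    /* number of pairwise distinct bytes in the run */
} run_t;

/* Split the input greedily into consecutive runs of pairwise distinct
   bytes and record every run of at least 3 bytes. Returns the number
   of runs stored in out[] (at most max_out). */
size_t find_unique_runs(const unsigned char *in, size_t n,
                        run_t *out, size_t max_out)
{
    size_t found = 0, start = 0;
    while (start < n) {
        unsigned char seen[256];
        size_t len = 0;
        memset(seen, 0, sizeof seen);
        /* extend the run until a byte repeats or the input ends */
        while (start + len < n && !seen[in[start + len]]) {
            seen[in[start + len]] = 1;
            len++;
        }
        if (len >= 3 && found < max_out) {
            out[found].start = start;
            out[found].len = len;
            found++;
        }
        start += len;  /* processed bytes are not touched again */
    }
    return found;
}
```

For the input "aabcdda" this records a single run, "abcd", at offset 1; the chunks "a" and "da" are too short to be stored.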

Clock counts are for AMD Athlon, Family 8+. In
64 bit code, execution times can be reduced to
about 60...75 percent of the given ones. Eight
additional registers are snake oil for complex
functions... ;)


Greetings from Augsburg

Bernhard Schornak
Reply Bernhard 5/23/2010 5:29:56 PM

In article <946nc7-rcj2.ln1@ntp.tmsw.no>,
	Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> writes:
> Dick Wesseling wrote:
>> In article<11lic7-2u52.ln1@ntp.tmsw.no>,
>> 	Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com>  writes:
>>> Still, running at 130+ cycles per input byte is still fast enough to
>>> handle significant amounts of data.
>>>
>>
>> Sounds like you've beaten me. My O() depends on the size of the longest
>> sequence minus the size of the longest unique sequence, which is good
>> for the reference data. If I time 4K of random data I get 7250 cycles per
>> input byte. Since sqrt(C) is 16 for that dataset you ought to get about
>> 800 cycles/byte ( http://www.fi.uu.nl/~ftu/random.bin )
> 
> It is worse (or better, seen from my point):
> 
> With 4K random input the scan took 252K cycles, which means that I 
> averaged about 83 cycles/byte. (I.e. very fast!)
> 

Are you using my data set? I completed and fixed (or broke?) your
code as follows:

#define C 0x100
#define OSIZ 409600
unsigned outpos;

struct {
    unsigned len;
    unsigned pos;
    unsigned w;
} output[OSIZ];

static
void save(unsigned l, unsigned w, unsigned pos)
{
#if 1
    output[outpos].len = l  ;
    output[outpos].pos = pos;
    output[outpos].w   = w  ;
#endif
    outpos++;
}


int scan(byte *data, int data_len)
{
        unsigned count[data_len];
        int      shortest[C+1];
        int i;
        for (i=0; i<C+1; i++) shortest[i] = 0x7FFFFFFF;
        for (i=0; i<data_len; i++) count[i] = 1;

        int c = 0;
	for (i = 1;             // not 2: else "abc" won't work


This version took about 15M cycles on my Athlon, 10M cycles if I
just count the solutions without actually saving them.


> The problem was that with totally random data, there are a _lot_ of both
> potential and real solutions, I found 4099 candidates (i.e. just over
> one per starting byte!) and 121 after filtering.
> 

Then we're definitely not using the same data set. When I ran your
program I got 78059 potential solutions instead. My program found
831 solutions, all of which were in your unfiltered output.

> The filter process here took a _lot_ of time though: 8M cycles!
>
> This means that it needed 2000 cycles/input byte. :-(
> 
> This simply shows that the problem space (and therefore required
> algorithm as well) almost certainly won't contain totally random data,
> and this will most probably help my code...
> 

A slightly optimized version of my program - see below - took
23M cycles on the same data set. Since I only timed the scan phase
of your program and not the filter phase it appears our approaches
are comparable for random data, not as bad as I feared.

Here's my optimized inner loop

#if 0   // 30M cycles for 4K random data
        // Note: previous "final" version posted still had a bug here

        dec(data[l]); l++;
        while (uniq < goal && r<lim)   { r++; inc(data[r]); }
        while (freq[data[l]]>1) { dec(data[l]); l++; }
#else
        // 23M cycles for 4K random data
        // The code generated by gcc still hurts my eyes, but this
        // simplified loop is a good starting point for asm.


        freq[data[l]]=0; l++; uniq--;  // data[l] is unique

        while (r<lim) {              // optimize controlling expression of
            unsigned char cr;        // loop
            r++;
            cr = data[r];
            unsigned frc = freq[cr];
            freq[cr] = frc+1;
            if (frc == 0) {
                uniq++;
                break;
            }
        }
        while (freq[data[l]]>1) {     // "uniq" does not change
            freq[data[l]]--;
            l++;
        }
#endif

Once I get rid of the variable "uniq", which is not essential, the
remaining variables data[], freq[], l and r will fit into the x86
register set and the inner loop becomes 14 instructions per input
byte per iteration of the outermost loop.
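
For anyone trying to follow the fragment above without the rest of the program, here is a self-contained sketch built around the same two-pointer idea with `freq[]`, `l` and `r`. It is not the posted program: it assumes the task for one pass is "find the shortest window that contains a given number of distinct values", and the function name `shortest_window` and its signature are hypothetical.

```c
#include <stddef.h>

/* Return the length of the shortest window of data[0..n-1] that contains
   at least `goal` distinct byte values, or 0 if there is none. If
   best_start is not NULL, it receives the 0-based start of the first
   window of that length. Hypothetical helper, for illustration only. */
size_t shortest_window(const unsigned char *data, size_t n,
                       unsigned goal, size_t *best_start)
{
    unsigned freq[256] = {0};  /* occurrences of each value in [l..r] */
    unsigned uniq = 0;         /* number of distinct values in [l..r] */
    size_t l = 0, r, best = 0;

    if (goal == 0) return 0;
    for (r = 0; r < n; r++) {
        if (freq[data[r]]++ == 0) uniq++;  /* new value enters on the right */
        if (uniq < goal) continue;
        /* drop duplicates on the left; uniq does not change */
        while (freq[data[l]] > 1) { freq[data[l]]--; l++; }
        if (best == 0 || r - l + 1 < best) {
            best = r - l + 1;
            if (best_start) *best_start = l;
        }
        /* data[l] is now unique in the window: give it up and move on */
        freq[data[l]] = 0; l++; uniq--;
    }
    return best;
}
```

On the 56-character reference string from the original post this gives length 3 for three distinct values (first found at position 40, counting from 1), length 7 for seven, and length 16 starting at position 41 for eight, which agrees with the expected output listed there.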
Reply free 5/23/2010 6:14:47 PM

Dick Wesseling wrote:
>> With 4K random input the scan took 252K cycles, which means that I
>> averaged about 83 cycles/byte. (I.e. very fast!)
>>
>
> Are you using my data set? I completed and fixed (or broke?) your

Definitely not the same data set, I am simply using
	(rand() & 63) + 33;
to generate random (but printable) bytes.

With rand & 255 I get slightly different counts.

> int scan(byte *data, int data_len)
> {
>          unsigned count[data_len];
>          int      shortest[C+1];
>          int i;
>          for (i=0; i<C+1; i++) shortest[i] = 0x7FFFFFFF;
>          for (i=0; i<data_len; i++) count[i] = 1;
>
>          int c = 0;
> 	for (i = 1;             // not 2: else "abc" won't work

You're absolutely right, in order to increment the count to 3 the scan 
has to start with the second char. Thanks!

> Once I get rid of the variable "uniq", which is not essential, the
> remaining variables data[], freq[], l and r will fit into the x86
> register set and the inner loop becomes 14 instructions per input
> byte per iteration of the outermost loop.

OK, I'll have to compare that to some hand-optimised asm. :-)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/23/2010 8:11:57 PM

In article <htbonq$61b$1@news.eternal-september.org>,
	Bernhard Schornak <schornak@nospicedham.web.de> writes:

>
> If you want to do this in assembler, you might
> consider to use an alternative approach:
> 
> 1. Allocate 4,352 byte of memory.
> 2. Scan through the input:
> 
[snip]

I haven't studied your table construction and search algorithm
yet. However, the following cannot be correct:

> Searching non-unique patterns is done via com-
> paring addresses in both tables. Every char is
> included at least once in the area between its
> 1st and last address, even if they're equal. A
> character included in *any* non-unique pattern
> must 'sit' at an address equal to or above the
> currently processed character. A found pattern
> ends at the highest last address.
> 
> I do not know which sequences are of interest,
> so I cannot provide code for that. If you look
> for sequences m...n, a corresponding substring
> starts at m(first) and ends at n(last). Only a
> copy operation (from n(start) up to m(end)) is
> required to retrieve non-unique substrings for
> a print-out (address calculation is trivial).
> 
> To search for unique patterns - all characters
> differ from each other - the entire input must
> be scanned one time:
> 
> EBP = address input
> ESI = input_size
> 
> ...
> 0:
> xor EDX,EDX
> mov AL,BYTE 00[EBP + EDX]
> inc EDX
> 1:
> mov BL,BYTE 0x00[EBP + EDX]
> mov CL,BYTE 0x01[EBP + EDX]
> cmp AL,BL
> je 4f
> inc EDX
> cmp BL,CL
> je 4f
> cmp AL,CL
> je 4f
> inc EDX
> cmp EDX,ESI
> jae 4f
> push EDX
> sub EDX,2
> je 3f
> 2:
> cmp BL,BYTE 0x00[EBP + EDX]
> je 3f
> cmp CL,BYTE 0x00[EBP + EDX]
> je 3f
> dec EDX
> jne 2b
> 3:
> mov EAX,EDX
> pop EDX
> test EAX,EAX
> je 1b

When label 1 is reached from 0, AL contains the first input
character. When label 1 is reached via "je 1b", AL is always
zero.
Reply free 5/23/2010 9:06:01 PM

 >Terje Mathisen wrote:

> >Dick Wesseling wrote:
>
>>> With 4K random input the scan took 252K cycles, which means that I
>>> averaged about 83 cycles/byte. (I.e. very fast!)
>>
>> Are you using my data set? I completed and fixed (or broke?) your
>
> Definitely not the same data set, I am simply using
>     (rand() & 63) + 33;
> to generate random (but printable) bytes.
>
> With rand & 255 I get slightly different counts.
>
>> int scan(byte *data, int data_len)
>> {
>>          unsigned count[data_len];
>>          int      shortest[C+1];
>>          int i;
>>          for (i=0; i<C+1; i++) shortest[i] = 0x7FFFFFFF;
>>          for (i=0; i<data_len; i++) count[i] = 1;
>>
>>          int c = 0;
>>     for (i = 1;             // not 2: else "abc" won't work 
>
> You're absolutely right, in order to increment the count to 3 the scan 
> has to start with the second char. Thanks!
>
>> Once I get rid of the variable "uniq", which is not essential, the
>> remaining variables data[], freq[], l and r will fit into the x86
>> register set and the inner loop becomes 14 instructions per input
>> byte per iteration of the outermost loop.
>
> OK, I'll have to compare that to some hand-optimised asm. :-)

I am attempting to follow the code for this problem; however, it becomes
a bit difficult to follow as it changes so often.  This is not a complaint,
as I very much appreciate some real examples.

However, it would be extremely helpful if both Terje and Dick could include
all of the code and some examples, as was done the first few times,
approximately every fifth time the code changes.  This will enable the rest
of us who want to follow the discussion to put it into context.  The problem
is very instructive and I really enjoy the description of what you fellows
are doing.

Jerome Fine
Reply Jerome 5/24/2010 12:07:17 AM

In article <4bf9c369$0$275$14726298@news.sunsite.dk>,
	"Jerome H. Fine" <jhfinedp3k@nospicedham.dp3knospamcompsys.to.dotsrc.org> writes:
>
> I am attempting to follow the code for this problem, however, it becomes 
> a bit difficult to follow as it changes so often.

I know. Never say "final version"....


> This is not a complaint as I very much appreciate some real examples.

I'd be happy to provide some real examples. Tomorrow is a
holiday here, so I can afford to spend time on this puzzle without
feeling guilty about it. Depending on how things are going - I am
now coding the final asm version - I may even do so later tonight/today.
I don't have to bring my offspring to school, so I can stay up as late
as I want to.
Reply free 5/24/2010 2:04:07 AM

"Shmuel (Seymour J.) Metz" <spamtrap@library.lspace.org.invalid> wrote in message 
news:4bf92af5$1$fuzhry+tra$mr2ice@news.patriot.net...
| In <4bf89ddf$0$89666$c30e37c6@exi-reader.telstra.net>, on 05/23/2010
|   at 01:15 PM, "robin" <robin51@dodo.com.au> said:
|
| >A BEGIN block is treated as if it were a procedure.
|
| Water is wet.

I'm referring to the code that is produced.

| >Well, no.  Certainly storage is allocated [for ALLOCATE], but only
| >for the named variables.
|
| The storage for a DSA is pretty much noise;

But it's an extra overhead that ALLOCATE doesn't have
[apart from that required for the specific ALLOCATE].

| what matters when talking
| about overhead is the cost of housekeeping, especially GETMAIN and
| FREEMAIN.
|
| >This may require that conditions be set up and dis-established, etc.
|
| Conditions are set up by the ON statement.

Conditions are set up by condition prefixes. 


Reply robin51 (247) 5/24/2010 2:44:44 AM

Dick Wesseling wrote:

<snip>

 > However, the following cannot be correct:

<snip>

 >> EBP = address input
 >> ESI = input_size
 >>
 >> ...
 >> 0:
 >> xor EDX,EDX
 >> mov AL,BYTE 00[EBP + EDX]
 >> inc EDX
 >> 1:
 >> mov BL,BYTE 0x00[EBP + EDX]
 >> mov CL,BYTE 0x01[EBP + EDX]
 >> cmp AL,BL
 >> je 4f
 >> inc EDX
 >> cmp BL,CL
 >> je 4f
 >> cmp AL,CL
 >> je 4f
 >> inc EDX
 >> cmp EDX,ESI
 >> jae 4f
 >> push EDX
 >> sub EDX,2
 >> je 3f

mov AL,CL # AL = char at -1[EBP + EDX]
.......... # for the next iteration

 >> 2:
 >> cmp BL,BYTE 0x00[EBP + EDX]
 >> je 3f
 >> cmp CL,BYTE 0x00[EBP + EDX]
 >> je 3f
 >> dec EDX
 >> jne 2b
 >> 3:

mov ECX,EDX # ECX is loaded in 1:

 >> pop EDX

test ECX,ECX

 >> je 1b
 >
 > When label 1 is reached from 0 AL contains the first input
 > character, When label 1 is reached via "je 1b" AL is always
 > zero.

Sorry! (I optimised the most important instruction out
of my code.) AL must contain the last character in the
already processed pattern, of course.

The table stuff is not as complicated as it seems. All
substrings can be determined from the addresses stored
in it (no extra scans required). It uses a fast method
that scans through the input array in both directions
simultaneously. Getting the first and last position of
each char in a single pass is (probably) faster than
repeatedly scanning the entire array: the addresses are
sufficient to calculate all substrings rather than
having to search for them.


Greetings from Augsburg

Bernhard Schornak
Reply Bernhard 5/24/2010 3:01:27 AM

 >Dick Wesseling wrote:

>>In article <4bf9c369$0$275$14726298@news.sunsite.dk>,
>	"Jerome H. Fine" writes:
>
>>I am attempting to follow the code for this problem, however, it becomes 
>>a bit difficult to follow as it changes so often.
>>
>I know. Never say "final version"....
>  
>
Having started on an IBM 650 in SOAP around 1960, I fully realize that 
aspect.

>>This is not a complaint as I very much appreciate some real examples.
>>
>I'd be happy to provide some real examples. Tomorrow is a
>holiday here, so I can afford to spend time on this puzzle without
>feeling guilty about it. Depending on how things are going - I am
>now coding the final asm version - I may even do so later tonight/today.
>I don't have to bring my offspring to school, so I can stay up as late
>as I want to.
>  
>
The C code is even more instructive in its own way as it provides an
overview that assembly code is unable to do.  The current C code
along with a few examples as you did with your first post would be
VERY appreciated!  It also provides a comparison with the original
and shows the steps taken to make the code faster and more efficient.

As Terje notes, the efficient use of L1 cache is able to improve the
execution speed, but is rarely considered when an algorithm is
evaluated.  The C code might include a few comments as to just
how much storage is required at any point in the process.  As far
as I can determine, the L1 instruction cache will not be a problem
for this algorithm.  Likewise the L1 data cache if care is taken even
though the total storage requirements are much larger.
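
As a rough illustration of the kind of storage bookkeeping meant here, the working set of a frequency-table scan over the 4K test input can be tallied as below. The sketch assumes the working set is just the input bytes plus one `unsigned` counter per byte value (as with the `data[]` and `freq[]` arrays discussed earlier in the thread); the macro and function names are made up for the example.

```c
#include <stddef.h>

#define MAX_DATA 4096   /* input size used for the timings in this thread */
#define ALPHABET 256    /* one counter per possible byte value */

/* Bytes touched by a frequency-table scan: the input itself plus one
   unsigned counter per symbol. Assumes nothing else is in the hot path. */
size_t working_set_bytes(void)
{
    return MAX_DATA * sizeof(unsigned char) + ALPHABET * sizeof(unsigned);
}
```

With a 4-byte `unsigned` this comes to 5,120 bytes, which is comfortably below the L1 data cache size of typical x86 processors, so only a table of candidate solutions is likely to spill out of L1.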

Perhaps having started with a system that also had an effective L1
cache of about 100 bytes helps to remember how to optimize the
code.  I think that the total memory including the disk which also
held the code and the data was about 20 KBytes.

Jerome Fine
Reply Jerome 5/24/2010 3:44:51 AM

On 2010-05-22 07:54:13 -0400, David W Noon said:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Sat, 22 May 2010 10:49:27 +1000, robin wrote about Re: Optimization
> problem:
>
> >"John W Kennedy" <jwkenne@attglobal.net> wrote in message
> >news:4bf70dbe$0$2874$607ed4bc@cv.net...
> >
> >| Did you even need BASED?  99 times out of 100, what you really need
> >| instead of CONTROLLED is:
> >|
> >|   GET (M, N);
> >|   BEGIN;
> >|      DCL A(M, N);
> >|
> >| In over a quarter century of using PL/I for nearly all my programming,
> >| including commercial batch programs, system programs, and home-grown
> >| on-line systems (with just enough assembler to talk to the terminals),
> >| I never found a program that wasn't simpler, more reliable, and faster
> >| with CONTROLLED removed.
> >
> >Linked lists?  Queueing? etc etc etc.
>
> Lists and queues can be implemented using BASED, which actually gives
> faster access than CONTROLLED.
>
> The real benefit that CONTROLLED offers, to my mind, is block
> independence.  A subroutine or function can maintain its internal state
> without compromising its reentrability.  Each CONTROLLED variable is
> anchored by an external dummy section (on the mainframe, other
> platforms use a work-alike) that is allocated in the Pseudo-Register
> Vector (PRV), which hangs off the Task Control Area -- that magic piece
> of memory addressed by the sacred register 12. This means that state
> information can be placed in dynamic storage anchored at task level,
> rather than block level. Subsequent calls to the subroutine or function
> can retrieve the previous state with ease, but can still be
> safely executed by multiple TCB's concurrently.
>
> Of course, this is a somewhat exotic use case.  But when confronted by
> such a requirement, I want CONTROLLED storage.

But that is a way to force a language designed in the 60s, implemented 
on an operating system designed in the 60s,* to do what more modern 
languages do with modules or packages, and still more modern languages 
do with objects.

* Even TSS/360 had a better approach to reentrance, conceptually much 
the same as is implemented in pretty much all modern operating systems.

-- 
John W Kennedy
"Those in the seat of power oft forget their failings and seek only the 
obeisance of others!  Thus is bad government born!  Hold in your heart 
that you and the people are one, human beings all, and good government 
shall arise of its own accord!  Such is the path of virtue!"
  -- Kazuo Koike.  "Lone Wolf and Cub:  Thirteen Strings" (tr. Dana Lewis)

Reply jwkenne (1358) 5/24/2010 3:46:56 AM

On 2010-05-23 22:44:44 -0400, robin said:

> "Shmuel (Seymour J.) Metz" <spamtrap@library.lspace.org.invalid> wrote 
> in message
> news:4bf92af5$1$fuzhry+tra$mr2ice@news.patriot.net...
> | In <4bf89ddf$0$89666$c30e37c6@exi-reader.telstra.net>, on 05/23/2010
> |   at 01:15 PM, "robin" <robin51@dodo.com.au> said:
> |
> | >A BEGIN block is treated as if it were a procedure.
> |
> | Water is wet.
> 
> I'm referring to the code that is produced.
> 
> | >Well, no.  Certainly storage is allocated [for ALLOCATE], but only
> | >for the named variables.
> |
> | The storage for a DSA is pretty much noise;
> 
> But it's an extra overhead that ALLOCATE doesn't have
> [apart from that required for the specific ALLOCATE].

You don't know what the Hell you're talking about.

-- 
John W Kennedy
"The whole modern world has divided itself into Conservatives and 
Progressives. The business of Progressives is to go on making mistakes. 
The business of the Conservatives is to prevent the mistakes from being 
corrected."
  -- G. K. Chesterton

Reply jwkenne (1358) 5/24/2010 4:11:01 AM

Jerome H. Fine wrote:
> I am attempting to follow the code for this problem, however, it
> becomes a bit difficult to follow as it changes so often. This is not
> a complaint as I very much appreciate some real examples.

OK, here is my complete program, as it currently exists:
(I finally realized that the number of potential solutions can explode 
when working with random data, so I have increased the buffer that 
stores them to the maximum possible count!)

// unique-strings.cpp : Defines the entry point for the console application.
//

//#include "stdafx.h"
#include <stdio.h>
#include <tchar.h>
#include <stdlib.h>
#include <string.h> // memcpy(), needed once the sample data in init() is enabled

typedef unsigned char byte;

#define MAX_DATA 4096
int data_len;

// Data to be scanned
byte data[MAX_DATA+1];

// Remember the shortest sequence with n unique symbols
short shortest[256]; // Alphabet size entries
// Running count of unique symbols seen, from [n] to current pos
short count[MAX_DATA];

typedef struct _solution_t {
	int uniq;
	int len;
	int start;
} solution_t;

solution_t solution[MAX_DATA*256]; // Maximum possible count of possible solutions
int solutions = 0;

void init(void)
{
	for (int i = 0; i < 256; i++) // shortest[] has only 256 entries
		shortest[i] = 32767;

	for (int i = 0; i < MAX_DATA; i++)
		count[i] = 1;

	for (int i = 0; i < MAX_DATA; i++)
		data[i] = (rand() & 63) + 33;
	data_len = MAX_DATA;

	// Here are a couple of small sets with known solutions:
	//memcpy(data, "aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja", 56);
	//data_len = 56;
	//memcpy(data, "abc", 3);
	//data_len = 3;
}

void save(int unique, int len, int start)
{
	solution[solutions].uniq = unique;
	solution[solutions].len = len;
	solution[solutions].start = start;
	solutions++;
}

int solsort(const void *a, const void *b)
{
	solution_t *as = (solution_t *) a;
	solution_t *bs = (solution_t *) b;
	int u = bs->uniq - as->uniq;		// Return longest unique sets first
	if (!u) {
		u = as->len - bs->len;			// Return shortest string first for each set length
		if (!u)
			u = as->start - bs->start;	// Finally order by starting position
	}
	return u;
}

int filter(void)
{
	int prev_uniq = 0x7fffffff; // Max 32-bit int
	int prev_len = 0x7fffffff; // Max 32-bit int

	// First get rid of all non-optimal strings for each unique set length:
	// Sort reverse by set length, forward by string length, then by starting position
	qsort(solution, solutions, sizeof(solution[0]), solsort);

	int sols = 0;
	for (int s = 0; s < solutions; s++) {
		if (solution[s].uniq == prev_uniq) {
			if (solution[s].len > prev_len) { // Filter this entry away!
				solution[s].uniq = 0; // Mark as removed
			}
			else { // Keep this, move down:
				solution[sols++] = solution[s];
			}
		}
		else { // (solution[s].uniq < prev_uniq) This is the first entry for a shorter set:
			prev_uniq = solution[s].uniq;
			prev_len = solution[s].len;
			solution[sols++] = solution[s];
		}
	}
	solutions = sols;

	// For each remaining solution, check if any other overlap-1:

	for (int s = 0; s < solutions; s++) {
		for (int k = s+1; k < solutions; k++) {
			if ((solution[s].len == solution[k].len+1) &&
				((solution[s].start == solution[k].start) ||
				 (solution[s].start+1 == solution[k].start)))
				 solution[k].uniq = 0; // Mark as removed
			if (solution[s].len > solution[k].len+1) break;
		}
	}
	// Compact the final solution set:
	sols = 0;
	for (int s = 0; s < solutions; s++) {
		if (solution[s].uniq)
			solution[sols++] = solution[s];
	}
	solutions = sols;

	return solutions;
}

int scan(byte *data, int data_len)
{
	int c = 0; // # of candidate solutions

	// The first 3-byte solution cannot end before the third entry, but the
	// running counts must be updated from the second byte on, so
	// the loop below starts at i = 1, i.e. the second byte:
	for (int i = 1; i < data_len; i++) {
		byte b = data[i];
		for (int j = i-1; j >= 0; j--) {
			// A previously seen character cannot end a new longest sequence!
			// It also cannot add to the sequence count for this or any
			// longer string, so stop scanning!
			if (data[j] == b) break;

			// Increment the count of unique values seen when starting from position [j]
			int unique_values = ++count[j];
			if (unique_values >= 3) {
				int len = i-j+1;
				if (len <= shortest[unique_values]) {
					save(unique_values, len, j);
					shortest[unique_values] = len;
					c++;
				}
			}
		}
	}
	return c;
}

typedef __int64 int64_t;

int64_t rdtsc(void)
{
	__asm rdtsc
}

int _tmain(int argc, _TCHAR* argv[])
{
	init();
	int64_t t0 = rdtsc();
	int pot = scan(data, data_len);
	int64_t t1 = rdtsc();
	int cnt = filter();
	int64_t t2 = rdtsc();
	// Print the final solutions in opposite order, from shortest to longest:
	printf("len aperture from to value\n");
	for (int s = cnt-1; s >= 0; s--) {
		int start = solution[s].start;
		int len = solution[s].len;
		printf("%3d %6d %5d %3d ", solution[s].uniq, len, start+1, start+len);
		if (len > 20) len = 20;
		for (int s = start; s < start+len; s++) {
			byte b = data[s];
			if ((b >= ' ') && (b < 127))
				printf("%c", b);
			else
				printf("\\%03o", b);
		}
		printf("\n");
	}

	printf("Time to scan: %12I64d (%d potential solutions)\nTime to filter: %10I64d (%d filtered solutions)\n",
		t1-t0, pot, t2-t1, cnt);

	return 0;
}
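
A small brute-force check can be handy for validating rows of the output table against the data. The sketch below is not part of the program above; the name count_distinct and its 0-based start parameter are assumptions made for illustration (the table in the problem statement is 1-based).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the distinct byte values in data[start .. start+len-1].
 * start is 0-based; subtract 1 from the "from" column of the table. */
int count_distinct(const unsigned char *data, size_t start, size_t len)
{
    unsigned char seen[256];
    int distinct = 0;

    assert(data != NULL);
    memset(seen, 0, sizeof seen);
    for (size_t i = start; i < start + len; i++) {
        if (!seen[data[i]]) {       /* first time this value shows up */
            seen[data[i]] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

For the 56-symbol sample, the row "7 7 50 56" corresponds to count_distinct(sample, 49, 7), which should give 7.
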


-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/24/2010 6:09:43 AM


The basic idea is to create a simple algorithm that performs
one sequential scan over the input for each sequence length.
There is no backtracking. Instead intermediate results are
stored in a table "found", and if a shorter string with
the same number of unique symbols is found the table is reset.

If all of the data fits into the L1 cache then performing
multiple passes over the data should perform well. Apart from
the input data the program needs 2 extra arrays:

- freq[C], counts the frequency of each symbol in the window
- found[?], as explained above this array contains tentative results.

freq is heavily used. It adds 1K to the amount of data that
needs to be in the L1 cache.
The cache usage of "found" is more complicated. In theory it
can become quite large, but my intuition is that it should
behave quite modestly in practice.

During each pass the following variables keep track of the state
of the sliding window:
- l, the left margin
- r, the right margin (inclusive)
- uniq, the number of unique symbols within the window.

When the right margin moves, we bring a new symbol into the window,
and freq[new_symbol] must be incremented. If the new symbol does
not yet occur within the window then "uniq" must be incremented
as well:

#   define inc(c) do { if(!freq[c]) uniq++; freq[c]++; } while(0)

When the left margin moves a symbol falls out of the window and
the opposite happens:

#   define dec(c) do { freq[c]--; if(!freq[c]) uniq--; } while(0)

The macros inc() and dec() and the variable uniq are just a convenience
for bootstrapping the algorithm. With careful coding they can be
optimized away from the final version, sacrificing clarity for raw
speed.
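
For readers who prefer functions over macros, the same bookkeeping can be sketched as below. The struct window_t and the names win_init, win_inc and win_dec are made up for this illustration; the program itself keeps freq[] and uniq as plain locals and uses the inc()/dec() macros.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper around the two pieces of window state. */
typedef struct {
    unsigned freq[256];   /* occurrences of each symbol inside the window */
    unsigned uniq;        /* number of symbols with freq != 0 */
} window_t;

void win_init(window_t *w)
{
    memset(w, 0, sizeof *w);
}

/* A symbol enters the window at the right margin. */
void win_inc(window_t *w, unsigned char c)
{
    if (w->freq[c] == 0)
        w->uniq++;
    w->freq[c]++;
}

/* A symbol leaves the window at the left margin. */
void win_dec(window_t *w, unsigned char c)
{
    assert(w->freq[c] > 0);   /* only symbols inside the window may leave */
    w->freq[c]--;
    if (w->freq[c] == 0)
        w->uniq--;
}
```

After win_inc for 'a', 'b', 'a' the window holds two unique symbols; uniq only drops to one once both copies of 'a' have been removed again.
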

Sliding the window needs to maintain a loop invariant. The invariant
is:
- freq[data[l]] must be 1. If the leftmost symbol would occur elsewhere
  in the window then this cannot possibly be the shortest string
  for the current goal.
- freq[data[r]] must be 1 for the same reason.
- uniq equals the goal of the current pass. Remember that each pass
  searches for one particular sequence length only.

That being said, the outline of the program is:

   Select a large initial window such that the loop invariant holds.
   goal = number of unique symbols in this window.

  do
     do
       slide window from left to right maintaining the invariant
     while not at end of input (i.e. invariant still true)
       if window size <  previous
          dump the found array
          and update apub: the current best value for the window
       if window size <= apub
          add to found array
     od
     print results for sequence length "goal".
  while goal not minimal (see below)
     goal = goal -1
     select an initial window such that the loop invariant holds.
  od

The choice for looping from a large goal downwards is arbitrary,
it seemed the easiest way to tackle the problem. If you start
with the window being the entire input stream then you just need
to shrink the margins until freq[l]=freq[r]=1. Setting goal=uniq
completes the loop invariant.
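
The start-up step described above (take the whole input, then shrink both margins) might look like this as a stand-alone function. The name initial_window and its parameter list are assumptions for the sake of the example; the full program does the same thing inline.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts all symbols, trims both margins until each
 * sits on a symbol that occurs once in the window, and returns the number
 * of unique symbols (the first "goal").
 * *l and *r are 0-based, r inclusive. len must be at least 1. */
unsigned initial_window(const unsigned char *data, unsigned len,
                        unsigned freq[256], unsigned *l, unsigned *r)
{
    unsigned uniq = 0;

    assert(len > 0);
    memset(freq, 0, 256 * sizeof freq[0]);
    for (unsigned i = 0; i < len; i++) {
        if (freq[data[i]]++ == 0)
            uniq++;
    }

    *l = 0;
    *r = len - 1;
    /* A margin symbol that also occurs elsewhere in the window can be
     * dropped without losing a unique symbol. */
    while (freq[data[*r]] > 1) { freq[data[*r]]--; (*r)--; }
    while (freq[data[*l]] > 1) { freq[data[*l]]--; (*l)++; }
    return uniq;
}
```

Trimming never changes the number of unique symbols, so the return value can be used as the initial goal directly.
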

The important step is:
   slide window from left to right maintaining the invariant

This is done as follows:

#if 0
   1) dec(data[l]); l++;
   2) while (uniq < goal && r<lim)   { r++; inc(data[r]); }
   3) while (freq[data[l]]>1) { dec(data[l]); l++; }
#else
   ...

1) The window must slide by at least 1 symbol.
   We know that the leftmost symbol is unique. If we increment l
   the number of unique symbols in the window therefore decreases
   by 1.
2) We therefore move the right margin until a new unique symbol
   moves into the window. Now uniq==goal holds again.
3) However, in step 2) we may have skipped over instances of the
   leftmost symbol which is now no longer unique:

      before     aabcdbx
                  ----     uniq=goal=4       aperture=4
      after 1)   aabcdbx
                   ---     uniq=3 goal=4     aperture=3
      after 2)   aabcdbx
                   -----   uniq=goal=4       aperture=5

      now "b" is no longer unique. We must therefore move the
      left margin even further:

      after 3)   aabcdbx
                    ----   uniq=goal=4       aperture=4

      Now the loop invariant holds again, both margins are
      at a unique symbol.
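
Steps 1) to 3) can be tried out in isolation with a sketch like the following. Packaging them as a function called slide, with l, r and uniq passed by pointer, is an assumption made here for testing; the program keeps them as local variables inside one big loop.

```c
#include <assert.h>

/* One slide step on the window data[*l .. *r] (inclusive).
 * lim is the index of the last input symbol.
 * Returns 1 if the window again holds goal unique symbols, 0 if the
 * right margin reached the end of the input first. */
int slide(const unsigned char *data, unsigned lim, unsigned freq[256],
          unsigned *l, unsigned *r, unsigned *uniq, unsigned goal)
{
    assert(freq[data[*l]] == 1);          /* invariant: left margin unique */

    /* 1) drop the unique leftmost symbol */
    freq[data[*l]] = 0;
    (*l)++;
    (*uniq)--;

    /* 2) grow to the right until a new symbol enters the window */
    while (*uniq < goal && *r < lim) {
        (*r)++;
        if (freq[data[*r]]++ == 0)
            (*uniq)++;
    }

    /* 3) shed left-margin symbols that also occur further right */
    while (freq[data[*l]] > 1) {
        freq[data[*l]]--;
        (*l)++;
    }
    return *uniq == goal;
}
```

Starting from the "abcd" window of "aabcdbx" (l=1, r=4, 0-based), one call moves the window to "cdbx" (l=3, r=6), matching the picture above. A second call returns 0 because the input is exhausted.
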

This is the inner loop of the program. Since we know that
data[l] and data[r] are unique we can get rid of the inc/dec
macros and expand and optimize these three statements into:

...
#else
    1)  freq[data[l]]=0; l++; uniq--;  // data[l] is unique

    2)  while (r<lim) {              // optimize controlling expression of
            unsigned char cr;        // loop
            r++;
            cr = data[r];
            unsigned frc = freq[cr];
            freq[cr] = frc+1;
            if (frc == 0) {
                uniq++;
                break;               // See below
            }
        }
    3)  while (freq[data[l]]>1) {     // no reason to update "uniq"
            freq[data[l]]--;
            l++;
        }
#endif

This is better than the first version. Since only one unique
symbol moved out of the window in step 1) we can break out of
step 2) as soon as we find the first new symbol, i.e. a symbol
whose frequency was zero prior to incrementing.
We can still improve upon this version. The variable uniq is
incremented exactly once and decremented exactly once. We therefore
need not change it at all. (If we hit the end of data with r==lim
then "uniq" no longer serves a purpose).

For the purpose of designing an x86 asm program we now have everything
in place, the inner loop now uses 4 variables l, r, freq[] and data[].
This will fit into the x86 register set, leaving 2 or 3 other registers
for holding intermediate results. Now we're all set to try and beat
the C compiler, which is harder than it seems.

Next we need to implement the remainder of the algorithm outline above:

     print results for sequence length "goal".
  while goal not minimal (see below)
     goal = goal -1
     select an initial window such that the loop invariant holds.
  od

printing the results is straightforward. Initializing the window
for the next iteration of the main loop is similar to sliding the
window: initialize l to 0, move r to the right until we have
enough unique symbols, and move l in case the leftmost character
is no longer unique:

        // Now the next shorter sequence.

        goal--;

    1)  inc(data[l]);
    2)  do {
            r++;
            inc(data[r]);
        } while (uniq != goal);

    3)  while (freq[data[l]] != 1) {
            dec(data[l]);
            l++;
        }

Using the same input data as in the previous example:

      after 1)   aabcdbx
                 -         uniq=1 goal=3     aperture=1
      after 2)   aabcdbx
                 ----      uniq=goal=3       aperture=4
      after 3)   aabcdbx
                  ---      uniq=goal=3       aperture=3
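
The restart for the next smaller goal can be checked in isolation as well. The sketch below is an assumption-laden rewrite: the name restart_window is invented, and it uses a plain while loop where the program uses do/while, which makes no difference for goal >= 3.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: build the first window for a smaller goal,
 * starting again at the left end of the input.
 * Assumes the input contains at least goal distinct symbols. */
void restart_window(const unsigned char *data, unsigned len, unsigned goal,
                    unsigned freq[256], unsigned *l, unsigned *r)
{
    unsigned uniq = 1;

    assert(len > 0 && goal >= 1);
    memset(freq, 0, 256 * sizeof freq[0]);
    *l = *r = 0;
    freq[data[0]] = 1;                    /* 1) window is the first symbol */

    while (uniq != goal) {                /* 2) extend the right margin */
        (*r)++;
        assert(*r < len);                 /* caller promised enough symbols */
        if (freq[data[*r]]++ == 0)
            uniq++;
    }
    while (freq[data[*l]] != 1) {         /* 3) make the left margin unique */
        freq[data[*l]]--;
        (*l)++;
    }
}
```

For "aabcdbx" and goal 3 this ends with the window "abc" (l=1, r=3, 0-based).
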

Again, this loop can be heavily optimized to get rid of inc/dec and
uniq, but I'm not doing that in the C prototype; after all, this is not
the inner loop.



So far so good, but the reader may ask:

   How about the requirement that sequences that are
   substrings of longer unique sequences must not be printed?

Good question! This is another reason why looping downwards seemed
the natural thing to do. Let me quote from my previous posting:

<EUREKA>

In the example it says:

> It is possible to find sub-arrays with 4, 5, or 6 distinct values, but
> they are either longer than the series of 7-in-7 (12-25 contain abcd),
> or are part of the 7-in-7 series and as such they should not be included!

          1         2         3         4         5     5
.....5....0....5....0....5....0....5....0....5....0....56
aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja
           --------------                        -------
            4-in-14                               7-in-7


I did not fully understand why 12-25 was to be rejected, after all
"abcd" is not a substring of "gihbfja".
Now I understand. 7-in-7 contains substrings 6-in-6, 5-in-5, 4-in-4
and so on. These are not to be printed because they are proper
substrings of "gihbfja". However, they define "as short as possible"
for lengths <= 7.

IN OTHER WORDS, ONCE YOU KNOW THE LONGEST N-IN-N SEQUENCE ALL SHORTER
SEQUENCES MUST CONSIST ENTIRELY OF UNIQUE SYMBOLS.

This suggests two different search strategies, one for long sequences and
one for short sequences.

</EUREKA>

In other words, as long as you do not have a n-in-n sequence yet you
need not test for substrings, and once you have a n-in-n sequence
you can switch to a different algorithm that needs to handle n-in-n
sequences only, which is much easier than the general n-in-m loop.

Enter plan B. Plan B performs ONE scan of the input to find the
remaining n-in-n sequences. Again a sliding window is used, but
this time the invariant is simply:

   - All symbols in the window must be unique.

This is true for l=r. It remains true as long as the symbol just
beyond the right margin does not occur within the window.
Otherwise print the current window if >= 3, and move l until
all symbols are unique again.

1)  // l=r=0

    while (r<lim) {
        unsigned char cl, cr;
        cl = data[l];
        cr = data[r+1];                // Look ahead

        if (freq[cr] != 0) {            // Maximal sequence?
            if (WINSIZE >= 3) {
                printres(data, l, WINSIZE, WINSIZE);
            }
            do {                        // Move left edge until
                cl = data[l];           // we have all symbols unique.
                freq[cl] = 0;
2)              l++;
            } while (cl != cr);
        }
3)      r++;
        freq[cr]=1;
    }

Example:
      after 1)    abcdabcd
                  -
      after 3)    abcdabcd
                  --
      after 3)    abcdabcd
                  ---
      after 3)    abcdabcd
                  ----
      Look ahead, print "abcd" & proceed:

      after 2)    abcdabcd
                   ---
      after 3)    abcdabcd
                   ----

Another example:
      after 1)    abcdca
                  -
      after 3)    abcdca
                  --
      after 3)    abcdca
                  ---
      after 3)    abcdca
                  ----

      Look ahead, print "abcd" & proceed:

      after 2)    abcdca
                     -
      after 3)    abcdca
                     --
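
To experiment with plan B on its own, the scan can be wrapped in a function that collects the windows instead of printing them. This is only a sketch: the type run_t, the function name distinct_runs and the max_out parameter are inventions for this example, and a 0/1 flag array stands in for freq[].

```c
#include <assert.h>
#include <string.h>

typedef struct { unsigned pos, len; } run_t;   /* hypothetical result type */

/* Collect every maximal window of >= 3 pairwise distinct symbols.
 * out must have room for max_out entries; returns the number stored. */
unsigned distinct_runs(const unsigned char *data, unsigned len,
                       run_t *out, unsigned max_out)
{
    unsigned char in_window[256];       /* freq[] reduced to 0/1 flags */
    unsigned l = 0, r = 0, n = 0;

    if (len == 0) return 0;
    memset(in_window, 0, sizeof in_window);
    in_window[data[0]] = 1;

    while (r + 1 < len) {
        unsigned char cr = data[r + 1];            /* look ahead */
        if (in_window[cr]) {                       /* window is maximal */
            if (r - l + 1 >= 3) {
                assert(n < max_out);
                out[n].pos = l; out[n].len = r - l + 1; n++;
            }
            unsigned char cl;
            do {                                   /* drop up to and including */
                cl = data[l];                      /* the old copy of cr */
                in_window[cl] = 0;
                l++;
            } while (cl != cr);
        }
        r++;
        in_window[cr] = 1;
    }
    if (r - l + 1 >= 3) {                          /* final window */
        assert(n < max_out);
        out[n].pos = l; out[n].len = r - l + 1; n++;
    }
    return n;
}
```

For "abcdca" this yields two windows, "abcd" at position 0 and "dca" at position 3 (0-based). For "abcdabcd" it yields five windows of length 4, one starting at each of the positions 0 to 4.
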

As for the asm, this C program translates into +- 240 x86 instructions.
The inner loop is 15 instructions. Unfortunately most of these are
tightly interlocked, and there are 6 memory references. The L1
hitrate should be high, but there's not much room for improvement.
Processing the sample went from 9000 to 7000 cycles, but for random
data the gain went down to 2% compared to gcc's code on one CPU and
-2% on another.
I may play with the asm code some time later, but for now I think
it is not worth bothering. Perhaps Terje is faring better than I.

Here is the next final final version. A TRACE macro is provided.
If you don't have an ansi c++ compiler handy then a trace for the
sample data can be found at http://www.fi.uu.nl/~ftu/trace.txt




-----------------------------------------------------------------------

#include <portos/profile.h>
/*
Here is the final version of my solution.

Let:    N       number of cells, size of the input
        C       number of values, size of symbol set.
        M       the number of distinct values in the largest sequence.
        T       the size of the largest n-in-n sequence, i.e. the
                largest sequence consisting of distinct values only.

The algorithm uses two different search strategies:

Plan A

Find the largest sequence with size M

for n from M downwards search n-in-m sequences, keeping only sequences
with minimal value of m.
Print sequence if n<m. (If n=m the same sequence will be found again
in the next step).

Now n=m=T.

Plan B
Print all maximal n-in-n sequences.

Plan A scans the input 1+(M-T) .. 2+(M-T) times, depending on the
input position of the longest sequence.
Plan B scans the input once.

The running time is therefore bounded by O(N * (3+(M-T))) which
is typically better than or equal to O(N*C).
For small values of N the cost of erasing the frequency array should
also be accounted for.

Two arrays are used:
    found[N]    Intermediate results of plan A.
    freq[C]     Frequency count for all symbols in the sliding window


Finding sequences is done using a straightforward sliding window
scan of the input. The values in freq[] are updated when the window
edges move and the number of unique symbols is updated when a
frequency count changes between zero and non-zero.


The output is unsorted.


*/

#include <stdio.h>
#include <string.h>
#include <strings.h>        // bzero()

#define C 256               // Size of alphabet
//#define ONE 1               // Pascal array bias convention
#define ONE 0               // C array bias convention

struct {
    unsigned len;
    unsigned pos;
    unsigned w;
} output[4096];
unsigned outpos;

static
void printres (unsigned char *data, unsigned pos, unsigned l, unsigned w)
{
#if 0
    unsigned i;
    printf("len %2d aperture %2d pos %2d\t", l, w, ONE+pos);
    for (i=pos; i<pos+w; i++) printf("%c", data[i]);
    printf("\n");
#else
    output[outpos].len = l  ;
    output[outpos].pos = pos;
    output[outpos].w   = w  ;
    outpos++;
#endif
 }

#if 1
void trace(char *comment, unsigned char *data,
         unsigned l, unsigned r, unsigned uniq, unsigned *freq)
{
   unsigned i;
   printf("%s: %d-%d aperture %d unique %d\n", comment, l, r, (r-l)+1, uniq);
   printf("%s\t", data);
   for (i=0; i<C; i++) {
      if (freq[i]) {
         printf("%c=%d ", i, freq[i]);
      }
   }
   printf("\n");

   for (i=0; i<l;  i++) printf(" ");
   for (   ; i<=r; i++) printf("-");
   printf("\n");
}
#define TRACE(x) trace((x), data, l, r, uniq, freq)
#else
#define TRACE(x)
#endif

void findseqs (unsigned char *data, unsigned len)
{
//  printf("%s\n", data);
    outpos=0;
    unsigned i;
    unsigned uniq = 0;              // Nr unique symbols in window
    unsigned apub = ~0;             // Aperture upper bound (aka "m")

    unsigned nrfound = /* to keep gcc happy */ 0;
    unsigned found[len];            // intermediate results

    unsigned freq[C];               // Frequency count
    bzero(freq, sizeof(freq));

#   define inc(c) do { if(!freq[c]) uniq++; freq[c]++; } while(0)
#   define dec(c) do { freq[c]--; if(!freq[c]) uniq--; } while(0)
#   define WINSIZE ((r-l)+1)

    // Start with largest possible sequence: the entire input string

    unsigned l = 0;                 // Start of window
    unsigned r = len-1;             // Inclusive end of window
    unsigned lim = len-1;           // Inclusive end of data

    for (i=0; i<len; i++) {
        inc(data[i]);
    }
    if (uniq < 3) return;

    // Trim left & right edges

    TRACE("Entire input");
    while (freq[data[r]]>1) { dec(data[r]); r--; }
    TRACE("Initial trim r");
    while (freq[data[l]]>1) { dec(data[l]); l++; }
    TRACE("Initial trim l");

    // From here on we want to find only sequences that are shorter or
    // equal to what we have found above:

    unsigned goal = uniq;

    while (1) {

        //  data[l..r] contains "goal" unique symbols
        //  freq[]     number of occurrences of each symbol in data[l..r]
        //  data[l]    is a unique symbol
        //  data[r]    is a unique symbol

        TRACE("Main loop");

        if (WINSIZE <= apub) {          // Short enough?

            // A candidate. May be premature if we have shorter sequences
            // with the same nr of unique symbols.

            if (WINSIZE < apub) {
                nrfound = 0;            // Dump old candidates
                apub = WINSIZE;         // Next matches must be <= this one
            }
            found[nrfound] = l;         // Add to list of tentative results
            nrfound++;
        }

        // Slide window to next match.

#if 0   // 30M cycles for 4K random data
        // Note: previous "final" version posted still had a bug here

        dec(data[l]); l++;
        while (uniq < goal && r<lim)   { r++; inc(data[r]); }
        while (freq[data[l]]>1) { dec(data[l]); l++; }
#else
        // 23M cycles for 4K random data
        // The code generated by gcc still hurts my eyes, but this
        // simplified loop is a good starting point for asm.


        freq[data[l]]=0; l++; uniq--;  // data[l] is unique
        TRACE("After slide step 1");


        while (r<lim) {              // optimize controlling expression of
            unsigned char cr;        // loop
            r++;
            cr = data[r];
            unsigned frc = freq[cr];
            freq[cr] = frc+1;
            if (frc == 0) {
                uniq++;
                break;
            }
        }
        TRACE("After slide step 2");
        while (freq[data[l]]>1) {     // no reason to update "uniq"
            freq[data[l]]--;
            l++;
        }
        TRACE("After slide step 3");

#endif

        if (uniq == goal) continue;

        // We've hit the right edge

        bzero(freq, sizeof(freq));      // Reset window & statistics
        uniq = l = r = 0;

        if (apub == goal) break;        // If *only* distinct values proceed
                                        // with plan B.

        for (i=0; i<nrfound; i++) {     // Print the results
            printres(data, found[i], goal, apub);
        }

        // Now the next shorter sequence.

        goal--;

        inc(data[l]);
        do {
            r++;
            inc(data[r]);
        } while (uniq != goal);
        TRACE("Find shorter, after slideR");

        while (freq[data[l]] != 1) {
            dec(data[l]);
            l++;
        }
        TRACE("Find shorter, after slideL");

        // Now the precondition holds. Also, the window size is smaller
        // than the aperture upper bound from the previous iteration.
    }

    // Plan B
    // Now we consider only sequences with all symbols distinct.
    // The freq[] array therefore behaves like a bitmap with values 0&1 only.

    freq[data[l]] = 1;

    while (r<lim) {
        unsigned char cl, cr;
        cl = data[l];
        cr = data[r+1];                // Look ahead

        if (freq[cr] != 0) {            // Maximal sequence?
            if (WINSIZE >= 3) {
                printres(data, l, WINSIZE, WINSIZE);
            }
            do {                        // Move left edge until
                cl = data[l];           // we have all symbols unique.
                freq[cl] = 0;
                l++;
            } while (cl != cr);
        }
        r++;
        freq[cr]=1;
    }

    // Final sequence.

    if (WINSIZE >= 3) {
        printres(data, l, WINSIZE, WINSIZE);
    }
    return;
}

void printresults() {
   // .. qsort "output" if desired
   unsigned i;
   for (i=0; i<outpos; i++) {
      printf("len %3d aperture %3d pos %3d\t",
         output[i].len,
         output[i].w,
         ONE+ output[i].pos);
         printf("\n");
   }
}

int main(int argc, char **argv)
{
    if (argc > 2) {
        fprintf(stderr, "Usage: %s {string}\n", argv[0]);
        return 1;
    }
    if (argc == 1) {
        // read from standard input
        unsigned char buf[4096];
        size_t inlen = fread(buf, sizeof(buf[0]), sizeof(buf), stdin);
        findseqs( buf, (unsigned) inlen);
    }
    if (argc == 2) {
        size_t l = strlen(argv[1]);
        findseqs( (unsigned char *)argv[1], l );
    }
    //printresults();
    return 0;
}
Reply free 5/24/2010 7:26:22 AM

"John W Kennedy" <jwkenne@attglobal.net> wrote in message news:4bf9fc55$0$4975$607ed4bc@cv.net...
| On 2010-05-23 22:44:44 -0400, robin said:
|
| > "Shmuel (Seymour J.) Metz" <spamtrap@library.lspace.org.invalid> wrote
| > in message
| > news:4bf92af5$1$fuzhry+tra$mr2ice@news.patriot.net...
| > | In <4bf89ddf$0$89666$c30e37c6@exi-reader.telstra.net>, on 05/23/2010
| > |   at 01:15 PM, "robin" <robin51@dodo.com.au> said:
| > |
| > | >A BEGIN block is treated as if it were a procedure.
| > |
| > | Water is wet.
| >
| > I'm referring to the code that is produced.
| >
| > | >Well, no.  Certainly storage is allocated [for ALLOCATE], but only
| > | >for the named variables.
| > |
| > | The storage for a DSA is pretty much noise;
| >
| > But it's an extra overhead that ALLOCATE doesn't have
| > [apart from the that required for the specificate ALLOCATE].
|
| You don't know what the Hell you're talking about.

Are you just having another bad day? 


Reply robin51 (247) 5/24/2010 8:57:13 AM

 >Dick Wesseling wrote:

>The basic idea is to create a simple algorithm that performs
>one sequential scan over the input for each sequence length.
>There is no backtracking. Instead intermediate results are
>stored in a table "found", and if a shorter string with
>the same number of unique symbols is found the table is reset.
>
>If all of the data fits into the L1 cache then performing
>multiple passes over the data should perform well. Apart from
>the input data the program needs 2 extra arrays:
>
>[Snip]
>  
>
I just want to say THANK  YOU to both Dick and Terje for making the
complete program (the current version that is) available.  The discussion
of L1 cache usage also provides valuable insight to what your concepts
are and how they impact the code.  While the percentage of individuals
who actually consider cache usage when an algorithm is being designed
is certainly higher than usual with the followers of newsgroups, I doubt
that it is always a high priority.  It would be interesting to develop some
tools to measure hit rates.  In that regard, could Intel and AMD add
any microcode which would help the user?  Does anyone have any idea
how much extra time (if any) it might take to maintain a hit rate register
for cache hits and misses as part of the microcode for the CPU, just as
it is possible to obtain the number of cycles?

Just a thought, but nothing gets done if the questions are not asked.

Jerome Fine
Reply Jerome 5/24/2010 10:42:08 AM

robin wrote:
> "Shmuel (Seymour J.) Metz" <spamtrap@library.lspace.org.invalid> wrote in message 
> news:4bf92af5$1$fuzhry+tra$mr2ice@news.patriot.net...
> | In <4bf89ddf$0$89666$c30e37c6@exi-reader.telstra.net>, on 05/23/2010
> |   at 01:15 PM, "robin" <robin51@dodo.com.au> said:
> |
> | >A BEGIN block is treated as if it were a procedure.
> |
> | Water is wet.
> 
> I'm referring to the code that is produced.
> 
> | >Well, no.  Certainly storage is allocated [for ALLOCATE], but only
> | >for the named variables.
> |
> | The storage for a DSA is pretty much noise;
> 
> But it's an extra overhead that ALLOCATE doesn't have
> [apart from the that required for the specificate ALLOCATE].
> 

OTOH, although I haven't looked at this in detail, CONTROLLED storage 
has interactions with Tasking (maybe not with the Enterprise compilers) 
that adds quite another level of overhead.  It makes things much less 
simple than just push and pop.

Reply Peter_Flass (934) 5/24/2010 10:57:18 AM

Jerome H. Fine wrote:
> I just want to say THANK YOU to both Dick and Terje for making the
> complete program (the current version that is) available. The discussion
> of L1 cache usage also provides valuable insight to what your concepts
> are and how they impact the code. While the percentage of individuals
> who actually consider cache usage when an algorithm is being designed
> is certainly higher than usual with the followers of newsgroups, I doubt
> that it is always a high priority. It would be interesting to develop some
> tools to measure hit rates. In that regard, could Intel and AMD add
> any microcode which would help the user? Does anyone have any idea
> how much extra time (if any) it might take to maintain a hit rate register
> for cache hits and misses as part of the microcode for the CPU, just as
> it is possible to obtain the number of cycles?

This has existed since the original Pentium, in the form of the EMON 
counters!

These were "secret" back then, but I reverse engineered them and wrote a 
BYTE article which was published in the summer of 1994. On all later x86 
cpu models the EMON counters, which are different for pretty much every 
cpu generation, have been documented in the cpu manuals.

And Yes, L1 hit/miss rates are one of the most basic counters, included 
on every single version since the original one. :-)

Today the easiest way to access them under Win* is to use Intel's VTune, 
but there are similar (and free!) tools available under Linux.

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/24/2010 1:48:05 PM

"John W Kennedy" <jwkenne@attglobal.net> wrote in message news:4bf9fc55$0$4975$607ed4bc@cv.net...
| On 2010-05-23 22:44:44 -0400, robin said:
|
| > "Shmuel (Seymour J.) Metz" <spamtrap@library.lspace.org.invalid> wrote
| > in message
| > news:4bf92af5$1$fuzhry+tra$mr2ice@news.patriot.net...
| > | In <4bf89ddf$0$89666$c30e37c6@exi-reader.telstra.net>, on 05/23/2010
| > |   at 01:15 PM, "robin" <robin51@dodo.com.au> said:
| > |
| > | >A BEGIN block is treated as if it were a procedure.
| > |
| > | Water is wet.
| >
| > I'm referring to the code that is produced.
| >
| > | >Well, no.  Certainly storage is allocated [for ALLOCATE], but only
| > | >for the named variables.
| > |
| > | The storage for a DSA is pretty much noise;
| >
| > But it's an extra overhead that ALLOCATE doesn't have
| > [apart from the that required for the specificate ALLOCATE].
|
| You don't know what the Hell you're talking about.

The time to execute 1,000,000 ALLOCATE and FREE statements
for a vector of size N on my PC, is 1.4 secs.

The time to do the same with a BEGIN block in which
a vector of size N is declared, is 79 secs.

N = 1,000,000.

BEGIN takes more than 50 times longer.

So who doesn't know what the hell he's talking about?
But I knew that answer before conducting the test. 


Reply robin51 (247) 5/24/2010 3:36:50 PM

In comp.lang.pl1 Jerome H. Fine <jhfinedp3k@nospicedham.dp3knospamcompsys.to.dotsrc.org> wrote:
(snip)

> I just want to say THANK  YOU to both Dick and Terje for making the
> complete program (the current version that is) available.  The discussion
> of L1 cache usage also provides valuable insight to what your concepts
> are and how they impact the code.  While the percentage of individuals
> who actually consider cache usage when an algorithm is being designed
> is certainly higher than usual with the followers of newsgroups, I doubt
> that it is always a high priority.  

The only place I know it commonly done is matrix multiply and
matrix transpose.  In both cases, one matrix is traversed in the
"wrong" direction (for the cache), and so the effect is easily seen.

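The matrix multiply case can be illustrated with two loop orders that compute the same product. This is a textbook-style sketch, not code from anyone in this thread; the function names matmul_ijk and matmul_ikj and the row-major layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* c = a * b for n x n row-major matrices.
 * The inner loop walks b down a column: stride n, poor cache locality. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i*n + k] * b[k*n + j];
            c[i*n + j] = sum;
        }
}

/* Same product with the two inner loops swapped: the inner loop now
 * walks both b and c along a row, stride 1. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    assert(a != c && b != c);             /* output must not alias inputs */
    for (size_t i = 0; i < n*n; i++)
        c[i] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double aik = a[i*n + k];
            for (size_t j = 0; j < n; j++)
                c[i*n + j] += aik * b[k*n + j];
        }
}
```

Both functions give the same result; any speed difference comes from the order in which b is touched, and whether it is visible depends on the matrix size relative to the cache.
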
> It would be interesting to develop some
> tools to measure hit rates.  In that regard, could Intel and AMD add
> any microcode which would help the user?  Does anyone have any idea
> how much extra time (if any) it might take to maintain a hit rate register
> for cache hits and misses as part of the microcode for the CPU, just as
> it is possible to obtain the number of cycles?

-- glen
Reply glen 5/24/2010 4:12:30 PM

In comp.lang.pl1 Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> wrote:
>> Does anyone have any idea how much extra time (if any) it might 
>> take to maintain a hit rate register for cache hits and misses 
>> as part of the microcode for the CPU, just as it is possible 
>> to obtain the number of cycles?
 
> This has existed since the original Pentium, in the form of the EMON 
> counters!

Otherwise, as far as I understand it, it was first done with emulation.

There is somewhere the description of the work IBM did for the 360/85,
I believe the first machine with cache.  (Before the term cache.)

They used emulation or simulation to measure the data access patterns
of actual code to determine the expected hit rate for the cache.
That was somewhere around 1968.  (Presumably earlier, if the machine
came out in 1968.)

-- glen
Reply glen 5/24/2010 4:17:21 PM

On 2010-05-24 06:57:18 -0400, Peter Flass said:

> robin wrote:
>> "Shmuel (Seymour J.) Metz" <spamtrap@library.lspace.org.invalid> wrote 
>> in message news:4bf92af5$1$fuzhry+tra$mr2ice@news.patriot.net...
>> | In <4bf89ddf$0$89666$c30e37c6@exi-reader.telstra.net>, on 05/23/2010
>> |   at 01:15 PM, "robin" <robin51@dodo.com.au> said:
>> |
>> | >A BEGIN block is treated as if it were a procedure.
>> |
>> | Water is wet.
>> 
>> I'm referring to the code that is produced.
>> 
>> | >Well, no.  Certainly storage is allocated [for ALLOCATE], but only
>> | >for the named variables.
>> |
>> | The storage for a DSA is pretty much noise;
>> 
>> But it's an extra overhead that ALLOCATE doesn't have
>> [apart from that required for the specific ALLOCATE].
>> 
> 
> OTOH, although I haven't looked at this in detail, CONTROLLED storage 
> has interactions with Tasking (maybe not with the Enterprise compilers) 
> that adds quite another level of overhead.  It makes things much less 
> simple than just push and pop.

Yes, the old PL/I tasking model imposed a great deal of strangeness and 
overhead (that alone should be enough to dispose of the old canard that 
PL/I was designed specifically for OS/360), and IBM's final approach to 
solving the problem was simply to dump it completely and replace it 
with little more than a direct interface to the underlying operating 
system's thread logic.

-- 
John W Kennedy
"Sweet, was Christ crucified to create this chat?"
  -- Charles Williams.  "Judgement at Chelmsford"

Reply jwkenne (1358) 5/24/2010 6:38:50 PM

On 2010-05-24 08:14:11 -0400, David W Noon said:

> On Sun, 23 May 2010 23:46:56 -0400, John W Kennedy wrote about Re:
> Optimization problem:
>
> >On 2010-05-22 07:54:13 -0400, David W Noon said:
> [snip]
> >> Of course, this is a somewhat exotic use case.  But when confronted
> >> by such a requirement, I want CONTROLLED storage.
> >
> >But that is a way to force a language designed in the 60s, implemented 
> >on an operating system designed in the 60s,* to do what more modern 
> >languages do with modules or packages, and still more modern languages 
> >do with objects.
>
> The 1960s have nothing to do with whether a software design is good or
> bad.

"Nothing", forsooth? As though decades of hindsight are worthless?

> Declaring a variable at package scope effectively makes it STATIC.

"Static" in some sense or other, but in a modern operating system, 
"static" has very different implications from those in OS/360. (This is 
why the MVS C prelinker was invented -- the existing pseudo-register 
kludge was not adequate for porting Unix C programs.) Only a year or 
two after OS/360, TSS/360 introduced PSECTs to avoid the problems in 
reentrance combined with static data; I don't know whether the TSS PL/I 
compiler took advantage of them -- probably not.

In any case, using a package or object also makes clear the distinction 
between things that ought to be stateful and things that ought not.

-- 
John W Kennedy
"Though a Rothschild you may be
In your own capacity,
    As a Company you've come to utter sorrow--
But the Liquidators say,
'Never mind--you needn't pay,'
    So you start another company to-morrow!"
  -- Sir William S. Gilbert.  "Utopia Limited"

Reply jwkenne (1358) 5/24/2010 6:51:54 PM

On Sun, 16 May 2010 18:28:26 +0000
Robert AH Prins <spamtrap@prino.org> wrote:

> Hi all,
> 
> Can anyone give me some hints as to solve the following problem, 
> preferably in a way that is faster than the way I used to do it, and 
> without the bug in the current version;

That depends on whether you are using a hyperthreaded CPU (the cheaper
variant) or a real multicore.

Greets!


-- 
http://maxa.homedns.org/

Sometimes online sometimes not

 Svima je "dozvoljeno" biti idiot i
> mrak, ali samo neki to odaberu,  



Reply Branimir 5/24/2010 8:12:29 PM

John W Kennedy wrote:
> On 2010-05-24 08:14:11 -0400, David W Noon said:
> 
>> On Sun, 23 May 2010 23:46:56 -0400, John W Kennedy wrote about Re:
>> Optimization problem:
>>
>> >On 2010-05-22 07:54:13 -0400, David W Noon said:
>> [snip]
>> >> Of course, this is a somewhat exotic use case.  But when confronted
>> >> by such a requirement, I want CONTROLLED storage.
>> >
>> >But that is a way to force a language designed in the 60s, implemented
>> >on an operating system designed in the 60s,* to do what more modern
>> >languages do with modules or packages, and still more modern languages
>> >do with objects.
>>
>> The 1960s have nothing to do with whether a software design is good or
>> bad.
> 
> "Nothing", forsooth? As though decades of hindsight are worthless?
> 
>> Declaring a variable at package scope effectively makes it STATIC.
> 
> "Static" in some sense or other, but in a modern operating system, 
> "static" has very different implications from those in OS/360. (This is 
> why the MVS C prelinker was invented -- the existing pseudo-register 
> kludge was not adequate for porting Unix C programs.) Only a year or two 
> after OS/360, TSS/360 introduced PSECTs to avoid the problems in 
> reentrance combined with static data; I don't know whether the TSS PL/I 
> compiler took advantage of them -- probably not.

I think it did -- Bitsavers has the TSS PL/I manual.  This is somewhat 
unfortunate, since the TSS PL/I compiler is apparently copyrighted and 
therefore unavailable.

> 
> In any case, using a package or object also makes clear the distinction 
> between things that ought to be stateful and things that ought not.
> 
Reply Peter_Flass (934) 5/24/2010 10:39:29 PM

Peter Flass wrote:
....
> 
> OTOH, although I haven't looked at this in detail, CONTROLLED storage 
> has interactions with Tasking (maybe not with the Enterprise compilers) 
> that adds quite another level of overhead.  It makes things much less 
> simple than just push and pop.
> 
With the F and Optimizing compilers, each task had its own allocation stack, emanating from the 
pseudo register vector, for each controlled variable known within that task. It was not possible for 
one task to refer directly to a generation of a controlled variable allocated in another task. Any 
sharing of controlled data had to be done by parameter passing or pointers, and any needed 
synchronization or mutual exclusion was the responsibility of the programmer. If no sharing was 
taking place there was no tasking-related overhead.

With the Enterprise compiler, vaos390, and the workstation products, there is one system-wide 
allocation stack for each controlled variable. Each thread can reference the current generation of 
all controlled variables known at the point of reference, as well as allocate new generations or 
free existing ones. Again, any required synchronization or mutual exclusion is the programmer's 
responsibility. In the absence of sharing, there is no thread-related overhead.
Reply jjw (604) 5/25/2010 2:19:33 AM

On 2010-05-21 06:56, Terje Mathisen wrote:
> James J. Weinkam wrote:
>> Robert AH Prins wrote:
>>>
>>> His code is in Pascal, and until I started running it for parts of the
>>> input string I had had blind faith in it. Wrong, as it turns out as
>>> there are three strings that it does not process correctly, and all my
>>> debugging hasn't given me any clues as to the why.
>>>
>> Can you send a copy of the data set that fails? Also, if it isn't prying
>> into your business, would you describe the application this problem is
>> abstracted from? You indicated in your original post that the values
>> (characters in the posted example) were actually indices into a table in
>> the real application.
>
> I would also love to see that, I really do believe that my current
> approach is sound.

Group 61 - the x-marked 3-string (FEG) is not found by the Paul Green code:

61  1 A
61  2 A
61  3 B
61  4 C
61  5 D
61  6 E
61  7 F x
61  8 E x
61  9 G x
61 10 G
61 11 G
61 12 G
61 13 G
61 14 G
61 15 G
61 16 G
61 17 G
61 18 G
61 19 H
61 20 I
61 21 C
61 22 D
61 23 A
61 24 A
61 25 A
61 26 A
61 27 A

Group 69, again the x-marked 3-string (HIJ) is not found

69  1 A
69  2 A
69  3 A
69  4 A
69  5 A
69  6 B
69  7 B
69  8 C
69  9 C
69 10 D
69 11 E
69 12 F
69 13 G
69 14 H
69 15 H
69 16 H
69 17 H x
69 18 I x
69 19 J x
69 20 J
69 21 J

I will extract the code from the Pascal program and get it working RSN. 
I'm a bit tired from travel: it takes as long to get from Vilnius to 
Charleroi as from Charleroi to Oostende.

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/25/2010 10:35:24 PM

On 2010-05-23 08:37, Terje Mathisen wrote:
> Dick Wesseling wrote:
>> In article<11lic7-2u52.ln1@ntp.tmsw.no>,
>> Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com> writes:
>>> Still, running at 130+ cycles per input byte is still fast enough to
>>> handle significant amounts of data.
>>>
>>
>> Sounds like you've beaten me. My O() depends on the size of the longest
>> sequence minus the size of the longest unique sequence, which is good
>> for the reference data. If I time 4K of random data I get 7250 cycles per
>> input byte. Since sqrt(C) is 16 for that dataset you ought to get about
>> 800 cycles/byte ( http://www.fi.uu.nl/~ftu/random.bin )
>
> It is worse (or better, seen from my point):
>
> With 4K random input the scan took 252K cycles, which means that I
> averaged about 83 cycles/byte. (I.e. very fast!)
>
> The problem was that with totally random data, there are a _lot_ of both
> potential and real solutions, I found 4099 candidates (i.e. just over
> one per starting byte!) and 121 after filtering.
>
> The filter process here took a _lot_ of time though: 8M cycles!
>
> This means that it needed 2000 cycles/input byte. :-(
>
> This simply shows that the problem space (and therefore required
> algorithm as well) almost certainly won't contain totally random data,
> and this will most probably help my code...

The data is in fact not very random, but you'll be hard pressed to 
figure out what it represents. Here is the current full set, represented 
by 66 printable chars:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-/*

The data itself:

aaabbbbbbbccccadddddddddddddddddddddddddddddddeaaaaaaaabbbcacddddeeeee
eeeebbeebeeeeeefeeegggdggggggggggggggddddddddddddddddccbbbbhbbbbibjjjj
kbbabaaaaaacbcccaddddddcccbaaaaaaaabbbbablmkkkbnnnnnnonnnnbkkkkkkllibb
ibaaaaaaapbbbbccddadddaddddgccccccccccaaaaaaaaaabbbbbabbbdbdddddddcccc
ccccddbbbbbbqlllnhnnnnrhblkaabpbbbbbllknnnnnbbaaaaaabbbbnkaaalbllllhln
nnnsinnkkkkhhpblbbbbbcccbdgdaaaaabbbbbfbtknnnnnnnknbbbbbaaaaaaabbbbucg
gdbhhhhhhhhhhhhhhhhohhhhhhhhrhhhhhhhhhhhhbhhhinjnnnnaaaaaaaiiiiiioojoa
aaaooooojjjjiiiiiioviiiaaaaaabbibrrrrrrrrrrrrrrrnnnnnbrrrrbbaaaiiiiiii
iiiiiiiiiiiiiiiiiiwiiiiiiiiixyiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iihiiiiiiziiiiiiiiiiiooaaaaaaabbbbfbbbrrrrgginnralarabbaaaiiiiiiiiAiii
iiibiiiiiiiiiiiiiiiiiiiiiiiibiiiiiiiiiiiiiiiiiiiiiiiiiiabbbblbrrairrbr
lnnnjnirrrblbbjaaaaabbbbbaiiiiiiiiiiiiiiiBiiiiiiiiiiriiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiCiiiiiiiiiiiiiiiriiiiiiiiiiiiibiiiiiiiiiiiiiiii
aaaDbbbbbbbrrriaiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiriiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiviiiiiiiiiiiiiiiiiiiiiiabhbiblelrrribnbbbo
rrrrabEbbbadcdddddbbbbbbbbBbBbbbaaaaaaiiiiiiijiiiioiiiiaaFahbbbbbbbbbb
bbbbbbbbbbbbbbbGbbbbbbbbbbrrrrrnlrHcbbrbbbaiiiiiiIiiiicoaaaaabbbbbibab
bbbybbbabbbbbbooooaiiiiiiiiiiiiiiiiiiaaaabalbbbbbbblbnrrrrcbbbaiiiiixi
ziiiiiiiiaaaaaabbbabrJbinnaiiiiaaaaaabbGJKLLLLLMLdddddddaaaaaaioiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiNiiiiiiiiiiiiiifiiibbhcbbacccbccO
bbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgdd
dgSdddddjcccccccccjaaaaaaafabbNGGLLLLLLLLGGGrbbbaoiiiiiiiiiiiiiiiaaiii
iibbbbbGGizzzziiiTiiiiijbNbLaGGGGKLLLLLLLLLLDDDDDDDDDDDDDDDDaaaaaabbbG
GGGGGGGLLLLGbaaaaaoooobiaiaoiiiiiiiiiiibaMbbUbGGGLLLLLLGGGGGGNbbooiiii
iiiiiiijibmabbGLLLLLLVbbbioiiiiiiiiiiiiiiiiiiiWiiiiiiXiaaaoooiaiiiiiii
iaaaaaGaVaYGZLLLLLLL0KGGGGGGGGbbooooiaiiiiii1iooaabbGaGGGGGGLLLLLLLLbG
GGGGGbaaaaajwiiiiiiiiiihaoGCGLLLLLLLLLLKbaoiiiiiiioabGLLLLLLLLEGGGbbob
ooon2iiiiijjjiooVbVbr3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaa
aaGGLLLLLLLLGGGGGGGGVGGVGbbbaaNbooiaiiiii0ooaYLLLLKGGGGGGLihiiiiijjoog
5zabbbbGLLLiiiiiiiiziiiiiiooaaaabbbGbooboooommmmoooooooaaoaoiooowooabm
mmmBBBBBBBBBI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9Yoooooo
iGbbbbbaaaaao+oobGbbbLGLLLLLbbbjiiooobbGbbmo-ooaoJbbbbVbbbbbb/baoooooo
o*bbbbbbbbbbbbbbbbbbaotoommooojaoooo-ooahbbbbbbbaaa3ooooobbllbbrbbbbb*
bmaoooooobobbbbbooo (2329 characters)

The top-3 account for 58.3 % of all values, the top-10 for 85.2%.
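
For anyone who wants to recompute such shares on their own copy of the data, a minimal sketch follows. The helper name `top_share` is made up for this example, and the short string at the bottom is a stand-in, not the data set above.

```python
from collections import Counter


def top_share(data, n):
    """Percentage of all cells taken up by the n most frequent values."""
    counts = Counter(data)
    top = sum(count for _, count in counts.most_common(n))
    return 100.0 * top / len(data)


sample = "aaabbc"  # stand-in data, not the real set
print(top_share(sample, 1), top_share(sample, 2))
```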

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply spamtrap14 (66) 5/25/2010 11:04:58 PM

In article <862qk6Fe5gU1@mid.individual.net>,
	Robert AH Prins <spamtrap@prino.org> writes:

> Group 61 - the x-marked 3-string (FEG) is not found by the Paul Green code:

What a relief, mine works:

5641 cycles 6 solutions
AABCDEFEGGGGGGGGGGHICDAAAAA
 len  win  pos
   9   19    1 ABCDEFEGGGGGGGGGGHI
   8   17    6 FEGGGGGGGGGGHICDA
   7    8    1 ABCDEFEG
   6    6    1 ABCDEF
   3    3    6 FEG                      <there it is!
   6    6   17 GHICDA

> Group 69, again the x-marked 3-string (HIJ) is not found

5630 cycles 6 solutions
AAAAABBCCDEFGHHHHIJJJ
 len  win  pos
  10   15    4 ABBCCDEFGHHHHIJ
   9   13    6 BCCDEFGHHHHIJ
   8   10    4 ABBCCDEFGH
   7    8    6 BCCDEFGH
   6    6    8 CDEFGH
   3    3   16 HIJ                      <there it is!

> Will extract the code from the Pascal program, and get it working RSN,
> bit tired from travel, it takes as long to get from Vilnius to Charleroi
> as from Charleroi to Oostende.

If it isn't asking too much, once you've taken some rest would you mind
posting one of the longer (2K or so) data sets?
Earlier I had to disappoint the readers of clax by saying that it wasn't
worth bothering about an asm version because it outperformed the C
version by a small margin only. However, that was for random data.
Random data suffocates branch prediction causing _any_ version to come
to a grinding halt (*).

For real data such as group 61 and 69 and the original example the asm
outperformed the high level version by 30% or so. Still not something to
get excited about, but I'm curious how it would perform on data sets
that are both real and large.

(*) I expect professor Weinkam's solution to perform much better for
random data. If my understanding is correct most of the time is spent
in:

  do i=c-1 to 1 by -1; a(i,*)=min(32766,a(i+1,*)+1); a(i,x(i))=1; end;

a branchless implementation of min() should work fine, and even a
branching one should have a 99% branch prediction success rate
because overflow will happen only for very large data sets.
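
To make the loop above concrete, here is a minimal sketch of how I read it: row i holds, for every value, the distance from position i to that value's next occurrence (1 means "right here"), saturated at 32766. The names `next_occurrence_table` and `branchless_min`, the zero-based indexing and the initialisation of the last row are assumptions for illustration; the one-liner does not show them.

```python
CAP = 32766  # saturation limit taken from the PL/I one-liner


def branchless_min(a, b):
    """min(a, b) without a comparison branch.

    Relies on an arithmetic right shift; valid while |a - b| < 2**31.
    """
    diff = a - b
    return b + (diff & (diff >> 31))


def next_occurrence_table(x, num_values):
    """x holds values 0..num_values-1; returns one row per position."""
    c = len(x)
    rows = [None] * c
    # Assumed start-up state: nothing seen yet except the last cell itself.
    last = [CAP] * num_values
    last[x[c - 1]] = 1
    rows[c - 1] = last
    for i in range(c - 2, -1, -1):
        # a(i,*) = min(32766, a(i+1,*) + 1)
        row = [branchless_min(CAP, d + 1) for d in rows[i + 1]]
        # a(i, x(i)) = 1
        row[x[i]] = 1
        rows[i] = row
    return rows


for row in next_occurrence_table([0, 1, 0, 2], 3):
    print(row)
```

In Python the branch-free form buys nothing; it is shown only to spell out the arithmetic a C or asm version could use.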
Reply free 5/26/2010 3:48:27 AM

Dick Wesseling wrote:
> In article<862qk6Fe5gU1@mid.individual.net>,
> 	Robert AH Prins<spamtrap@prino.org>  writes:
>
>> Group 61 - the x-marked 3-string (FEG) is not found by the Paul Green code:
>
> What a relief, mine works:
>
> 5641 cycles 6 solutions
> AABCDEFEGGGGGGGGGGHICDAAAAA
>   len  win  pos
>     9   19    1 ABCDEFEGGGGGGGGGGHI
>     8   17    6 FEGGGGGGGGGGHICDA
>     7    8    1 ABCDEFEG
>     6    6    1 ABCDEF
>     3    3    6 FEG                      <there it is!
>     6    6   17 GHICDA

I got these:
C:\>\c2\unique-strings\Release\unique-strings.exe
len aperture from to value
   3      3     7   9 FEG
   6      6    18  23 GHICDA
   6      6     2   7 ABCDEF
   7      8     2   9 ABCDEFEG
   8     17     7  23 FEGGGGGGGGGGHICDA
   9     19     2  20 ABCDEFEGGGGGGGGGGHI
Time to scan:         3278 (26 potential solutions)
Time to filter:      10736 (6 filtered solutions)
>
>> Group 69, again the x-marked 3-string (HIJ) is not found
>
> 5630 cycles 6 solutions
> AAAAABBCCDEFGHHHHIJJJ
>   len  win  pos
>    10   15    4 ABBCCDEFGHHHHIJ
>     9   13    6 BCCDEFGHHHHIJ
>     8   10    4 ABBCCDEFGH
>     7    8    6 BCCDEFGH
>     6    6    8 CDEFGH
>     3    3   16 HIJ                      <there it is!

and so did I:

len aperture from to value
   3      3    17  19 HIJ
   6      6     9  14 CDEFGH
   7      8     7  14 BCCDEFGH
   8     10     5  14 ABBCCDEFGH
   9     13     7  19 BCCDEFGHHHHIJ
  10     15     5  19 ABBCCDEFGHHHHIJ
Time to scan:         2453 (25 potential solutions)
Time to filter:       9735 (6 filtered solutions)
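
As an independent cross-check of the two tables, here is a deliberately slow brute-force sketch of the rules as I understand them from the original problem statement: for each distinct-count k, keep only the shortest apertures, and drop any all-distinct window that can still grow by one cell on either side while staying all-distinct. It is not the scan-and-filter code that produced the timings above, and the name `shortest_distinct_windows` is made up for this example. Positions are one-based, as in the from/to columns; the pos column in the quoted tables is zero-based.

```python
def shortest_distinct_windows(data, min_distinct=3):
    """Return (len, aperture, from, to, value) tuples, one-based positions."""
    n = len(data)
    best = {}  # distinct count -> (shortest aperture, [zero-based starts])
    for start in range(n):
        seen = set()
        for end in range(start, n):
            if data[end] in seen:
                continue
            seen.add(data[end])
            k = len(seen)
            if k < min_distinct:
                continue
            # First time this start reaches k distinct values.
            aperture = end - start + 1
            if k not in best or aperture < best[k][0]:
                best[k] = (aperture, [start])
            elif aperture == best[k][0]:
                best[k][1].append(start)
    results = []
    for k in sorted(best):
        aperture, starts = best[k]
        for start in starts:
            end = start + aperture - 1
            window = data[start:end + 1]
            if aperture == k:
                # All-distinct window: drop it if a neighbour extends it
                # into a longer all-distinct window.
                grows_left = start > 0 and data[start - 1] not in window
                grows_right = end + 1 < n and data[end + 1] not in window
                if grows_left or grows_right:
                    continue
            results.append((k, aperture, start + 1, end + 1, window))
    return results


group61 = "AABCDEFE" + "G" * 10 + "HICD" + "A" * 5
for row in shortest_distinct_windows(group61):
    print("%4d %6d %5d %3d %s" % row)
```

The cost is quadratic in the array length, so it is only meant for checking small groups such as 61 and 69 against a faster implementation.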


I have however found a _much_ faster way to filter my potential 
solutions, I believe I can make it run in close to linear time. :-)

Terje
>
>> Will extract the code from the Pascal program, and get it working RSN,
>> bit tired from travel, it takes as long to get from Vilnius to Charleroi
>> as from Charleroi to Oostende.
>
> If it isn't asking too much, once you've taken some rest would you mind
> posting one of the longer (2K or so) data sets?
> Earlier I had to disappoint the readers of clax by saying that it wasn't
> worth bothering about an asm version because it outperformed the C
> version by a small margin only. However, that was for random data.
> Random data suffocates branch prediction causing _any_ version to come
> to a grinding halt (*).
>
> For real data such as group 61 and 69 and the original example the asm
> outperformed the high level version by 30% or so. Still not something to
> get excited about, but I'm curious how it would perform on data sets
> that are both real and large.
>
> (*) I expect professor Weinkam's solution to perform much better for
> random data. If my understanding is correct most of the time is spent
> in:
>
>    do i=c-1 to 1 by -1; a(i,*)=min(32766,a(i+1,*)+1); a(i,x(i))=1; end;
>
> a branchless implementation of min() should work fine, and even a
> branching one should have a 99% branch prediction success rate
> because overflow will happen only for very large data sets.


-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/26/2010 6:23:32 AM

Robert AH Prins wrote:
> On 2010-05-23 08:37, Terje Mathisen wrote:
>> Dick Wesseling wrote:
>>> In article<11lic7-2u52.ln1@ntp.tmsw.no>,
>>> Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com> writes:
>>>> Still, running at 130+ cycles per input byte is still fast enough to
>>>> handle significant amounts of data.
>>>>
>>>
>>> Sounds like you've beaten me. My O() depends on the size of the longest
>>> sequence minus the size of the longest unique sequence, which is good
>>> for the reference data. If I time 4K of random data I get 7250 cycles per
>>> input byte. Since sqrt(C) is 16 for that dataset you ought to get about
>>> 800 cycles/byte ( http://www.fi.uu.nl/~ftu/random.bin )
>>
>> It is worse (or better, seen from my point):
>>
>> With 4K random input the scan took 252K cycles, which means that I
>> averaged about 83 cycles/byte. (I.e. very fast!)
>>
>> The problem was that with totally random data, there are a _lot_ of both
>> potential and real solutions, I found 4099 candidates (i.e. just over
>> one per starting byte!) and 121 after filtering.
>>
>> The filter process here took a _lot_ of time though: 8M cycles!
>>
>> This means that it needed 2000 cycles/input byte. :-(
>>
>> This simply shows that the problem space (and therefore required
>> algorithm as well) almost certainly won't contain totally random data,
>> and this will most probably help my code...
>
> The data is in fact not very random, but you'll be hard pressed to
> figure out what it represents. Here is the current full set, represented
> by 66
>
> abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-/*
>
> printable chars:
>
> aaabbbbbbbccccadddddddddddddddddddddddddddddddeaaaaaaaabbbcacddddeeeee
> eeeebbeebeeeeeefeeegggdggggggggggggggddddddddddddddddccbbbbhbbbbibjjjj
> kbbabaaaaaacbcccaddddddcccbaaaaaaaabbbbablmkkkbnnnnnnonnnnbkkkkkkllibb
> ibaaaaaaapbbbbccddadddaddddgccccccccccaaaaaaaaaabbbbbabbbdbdddddddcccc
> ccccddbbbbbbqlllnhnnnnrhblkaabpbbbbbllknnnnnbbaaaaaabbbbnkaaalbllllhln
> nnnsinnkkkkhhpblbbbbbcccbdgdaaaaabbbbbfbtknnnnnnnknbbbbbaaaaaaabbbbucg
> gdbhhhhhhhhhhhhhhhhohhhhhhhhrhhhhhhhhhhhhbhhhinjnnnnaaaaaaaiiiiiioojoa
> aaaooooojjjjiiiiiioviiiaaaaaabbibrrrrrrrrrrrrrrrnnnnnbrrrrbbaaaiiiiiii
> iiiiiiiiiiiiiiiiiiwiiiiiiiiixyiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> iihiiiiiiziiiiiiiiiiiooaaaaaaabbbbfbbbrrrrgginnralarabbaaaiiiiiiiiAiii
> iiibiiiiiiiiiiiiiiiiiiiiiiiibiiiiiiiiiiiiiiiiiiiiiiiiiiabbbblbrrairrbr
> lnnnjnirrrblbbjaaaaabbbbbaiiiiiiiiiiiiiiiBiiiiiiiiiiriiiiiiiiiiiiiiiii
> iiiiiiiiiiiiiiiiiiiiiiiCiiiiiiiiiiiiiiiriiiiiiiiiiiiibiiiiiiiiiiiiiiii
> aaaDbbbbbbbrrriaiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiriiiiiii
> iiiiiiiiiiiiiiiiiiiiiiiiiiiiviiiiiiiiiiiiiiiiiiiiiiabhbiblelrrribnbbbo
> rrrrabEbbbadcdddddbbbbbbbbBbBbbbaaaaaaiiiiiiijiiiioiiiiaaFahbbbbbbbbbb
> bbbbbbbbbbbbbbbGbbbbbbbbbbrrrrrnlrHcbbrbbbaiiiiiiIiiiicoaaaaabbbbbibab
> bbbybbbabbbbbbooooaiiiiiiiiiiiiiiiiiiaaaabalbbbbbbblbnrrrrcbbbaiiiiixi
> ziiiiiiiiaaaaaabbbabrJbinnaiiiiaaaaaabbGJKLLLLLMLdddddddaaaaaaioiiiiii
> iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiNiiiiiiiiiiiiiifiiibbhcbbacccbccO
> bbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgdd
> dgSdddddjcccccccccjaaaaaaafabbNGGLLLLLLLLGGGrbbbaoiiiiiiiiiiiiiiiaaiii
> iibbbbbGGizzzziiiTiiiiijbNbLaGGGGKLLLLLLLLLLDDDDDDDDDDDDDDDDaaaaaabbbG
> GGGGGGGLLLLGbaaaaaoooobiaiaoiiiiiiiiiiibaMbbUbGGGLLLLLLGGGGGGNbbooiiii
> iiiiiiijibmabbGLLLLLLVbbbioiiiiiiiiiiiiiiiiiiiWiiiiiiXiaaaoooiaiiiiiii
> iaaaaaGaVaYGZLLLLLLL0KGGGGGGGGbbooooiaiiiiii1iooaabbGaGGGGGGLLLLLLLLbG
> GGGGGbaaaaajwiiiiiiiiiihaoGCGLLLLLLLLLLKbaoiiiiiiioabGLLLLLLLLEGGGbbob
> ooon2iiiiijjjiooVbVbr3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaa
> aaGGLLLLLLLLGGGGGGGGVGGVGbbbaaNbooiaiiiii0ooaYLLLLKGGGGGGLihiiiiijjoog
> 5zabbbbGLLLiiiiiiiiziiiiiiooaaaabbbGbooboooommmmoooooooaaoaoiooowooabm
> mmmBBBBBBBBBI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9Yoooooo
> iGbbbbbaaaaao+oobGbbbLGLLLLLbbbjiiooobbGbbmo-ooaoJbbbbVbbbbbb/baoooooo
> o*bbbbbbbbbbbbbbbbbbaotoommooojaoooo-ooahbbbbbbbaaa3ooooobbllbbrbbbbb*
> bmaoooooobobbbbbooo (2329 characters)
>
> The top-3 account for 58.3 % of all values, the top-10 for 85.2%.

Here are my results, with the longer result strings truncated:
len aperture from to value
   3      3  2291 2293 a3o
   3      3  2271 2273 jao
   3      3  2270 2272 oja
   3      3  2241 2243 o*b
   3      3  2201 2203 bji
   3      3  2191 2193 bLG
   3      3  2186 2188 obG
   3      3  2182 2184 ao+
   3      3  2159 2161 oMm
   3      3  2158 2160 moM
   3      3  2117 2119 IjL
   3      3  2089 2091 aoi
   3      3  2066 2068 Gbo
   3      3  2037 2039 bGL
   3      3  2010 2012 LKG
   3      3  2001 2003 i0o
   3      3  1994 1996 oia
   3      3  1984 1986 VGb
   3      3  1955 1957 iba
   3      3  1939 1941 oai
   3      3  1935 1937 4ba
   3      3  1915 1917 brH
   3      3  1906 1908 oVb
   3      3  1903 1905 jio
   3      3  1882 1884 LEG
   3      3  1848 1850 CGL
   3      3  1825 1827 Gba
   3      3  1818 1820 LbG
   3      3  1802 1804 bGa
   3      3  1795 1797 1io
   3      3  1786 1788 oia
   3      3  1757 1759 GaV
   3      3  1741 1743 oia
   3      3  1734 1736 Xia
   3      3  1705 1707 bio
   3      3  1701 1703 LVb
   3      3  1694 1696 bGL
   3      3  1691 1693 mab
   3      3  1671 1673 GNb
   3      3  1655 1657 UbG
   3      3  1651 1653 aMb
   3      3  1637 1639 aoi
   3      3  1636 1638 iao
   3      3  1573 1575 GKL
   3      3  1549 1551 Giz
   3      3  1514 1516 Grb
   3      3  1500 1502 bNG
   3      3  1497 1499 fab
   3      3  1488 1490 cja
   3      3  1478 1480 djc
   3      3  1472 1474 gSd
   3      3  1471 1473 dgS
   3      3  1455 1457 biG
   3      3  1437 1439 bao
   3      3  1399 1401 cOb
   3      3  1392 1394 bac
   3      3  1389 1391 hcb
   3      3  1388 1390 bhc
   3      3  1322 1324 aio
   3      3  1308 1310 MLd
   3      3  1286 1288 nai
   3      3  1259 1261 xiz
   3      3  1252 1254 bai
   3      3  1248 1250 rcb
   3      3  1233 1235 alb
   3      3  1232 1234 bal
   3      3  1208 1210 oai
   3      3  1187 1189 iba
   3      3  1162 1164 bai
   3      3  1151 1153 rnl
   3      3  1049 1051 bor
   3      3  1039 1041 elr
   3      3  1034 1036 hbi
   3      3   924 926 ria
   3      3   913 915 aDb
   3      3   795 797 bai
   3      3   784 786 bja
   3      3   780 782 rbl
   3      3   765 767 air
   3      3   764 766 rai
   3      3   761 763 lbr
   3      3   755 757 iab
   3      3   682 684 rab
   3      3   680 682 lar
   3      3   674 676 gin
   3      3   589 591 xyi
   3      3   588 590 ixy
   3      3   543 545 nbr
   3      3   522 524 ibr
   3      3   509 511 ovi
   3      3   508 510 iov
   3      3   488 490 joa
   3      3   400 402 knb
   3      3   377 379 gda
   3      3   354 356 sin
   3      3   353 355 nsi
   3      3   348 350 hln
   3      3   341 343 alb
   3      3   318 320 lkn
   3      3   309 311 abp
   3      3   296 298 lnh
   3      3   292 294 bql
   3      3   237 239 dgc
   3      3   219 221 apb
   3      3   211 213 iba
   3      3   207 209 lib
   3      3   198 200 nbk
   3      3   186 188 kbn
   3      3   166 168 cba
   3      3   156 158 cad
   3      3   151 153 acb
   3      3   140 142 jkb
   3      3   135 137 ibj
   3      3    60  62 acd
   3      3    58  60 bca
   3      3    46  48 dea
   3      3    14  16 cad
   4      4  2279 2282 oahb
   4      4  2260 2263 baot
   4      4  2232 2235 /bao
   4      4  2218 2221 aoJb
   4      4  2212 2215 bmo-
   4      4  2170 2173 oiGb
   4      4  2162 2165 m9Yo
   4      4  2149 2152 Fb0o
   4      4  2121 2124 oC7j
   4      4  2112 2115 BI6j
   4      4  2097 2100 oabm
   4      4  2017 2020 GLih
   4      4  2004 2007 oaYL
   4      4  1990 1993 aNbo
   4      4  1909 1912 Vbr3
   4      4  1893 1896 on2i
   4      4  1831 1834 ajwi
   4      4  1770 1773 L0KG
   4      4  1649 1652 ibaM
   4      4  1632 1635 obia
   4      4  1621 1624 LGba
   4      4  1563 1566 ijbN
   4      4  1518 1521 baoi
   4      4  1446 1449 iFra
   4      4  1438 1441 aobd
   4      4  1279 1282 abrJ
   4      4  1242 1245 lbnr
   4      4  1174 1177 icoa
   4      4  1108 1111 Fahb
   4      4  1060 1063 badc
   4      4  1054 1057 rabE
   4      4  1043 1046 ribn
   4      4  1036 1039 ible
   4      4  1031 1034 iabh
   4      4   775 778 jnir
   4      4   769 772 brln
   4      4   677 680 nral
   4      4   465 468 hinj
   4      4   421 424 gdbh
   4      4   417 420 bucg
   4      4   374 377 cbdg
   4      4   363 366 hpbl
   4      4   336 339 bnka
   5      5  2310 2314 *bmao
   5      5  2145 2149 0G8bF
   5      5  2125 2129 jbmoa
   5      5  2119 2123 LjoC7
   5      5  1920 1924 alhGb
   5      5  1688 1692 jibma
   5      5  1566 1570 NbLaG
   5      5  1465 1469 LMKgd
   5      5  1443 1447 bRriF
   5      5  1406 1410 PQBub
   5      5  1405 1409 bPQBu
   5      5  1299 1303 bGJKL
   5      5  1281 1285 rJbin
   5      5   389 393 fbtkn
   5      5   180 184 ablmk
   6      6  2029 2034 og5zab
   6      6  1918 1923 HbalhG
   6      6  1870 1875 ioabGL
   6      6  1859 1864 LKbaoi
   6      6  1843 1848 ihaoGC
   6      6  1759 1764 VaYGZL
   6      6  1447 1452 Frainb
   6      6  1152 1157 nlrHcb
   7      7   302 308 nrhblka
   8     10  2025 2034 ijjoog5zab
   8     10  1438 1447 aobdbbRriF
   8     10   302 311 nrhblkaabp
   9     13  2117 2129 IjLjoC7jjbmoa
   9     13  2027 2039 joog5zabbbbGL
   9     13  1439 1451 obdbbRriFrain
  10     15  2025 2039 ijjoog5zabbbbGL
  11     18  2112 2129 BI6jjIjLjoC7jjbmoa
  11     18  2017 2034 GLihiiiiijjoog5zab
  12     21  1903 1923 jiooVbVbr3rrbrHHbalhG
  13     25  2112 2136 BI6jjIjLjoC7jjbmoaoooiooE
  14     28  2112 2139 BI6jjIjLjoC7jjbmoaoooiooEEE0
  15     33  2117 2149 IjLjoC7jjbmoaoooiooEEE0000000G8bF
  16     36  2114 2149 6jjIjLjoC7jjbmoaoooiooEEE0000000G8bF
  16     36  2112 2147 BI6jjIjLjoC7jjbmoaoooiooEEE0000000G8
  17     38  2112 2149 BI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bF
  18     48  2117 2164 IjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9Y
  19     51  2114 2164 6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9Y
  20     53  2112 2164 BI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9Y
  21     70  2095 2164 
wooabmmmmBBBBBBBBBI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9Y
  22     81  1400 1480 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  22     81  1399 1479 
cObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKg
  23     91  1389 1479 
hcbbacccbccObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGG
  23     91  1383 1473 
fiiibbhcbbacccbccObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainb
  24     97  1383 1479 
fiiibbhcbbacccbccObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainb
  25    112  1368 1479 
NiiiiiiiiiiiiiifiiibbhcbbacccbccObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibb
  26    131  2112 2242 
BI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9YooooooiGbbbbbaaaa
  27    148  2095 2242 
wooabmmmmBBBBBBBBBI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9Y
  28    169  2095 2263 
wooabmmmmBBBBBBBBBI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9Y
  29    181  2112 2292 
BI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9YooooooiGbbbbbaaaa
  30    189  2112 2300 
BI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9YooooooiGbbbbbaaaa
  31    193  2112 2304 
BI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9YooooooiGbbbbbaaaa
  32    210  2095 2304 
wooabmmmmBBBBBBBBBI6jjIjLjoC7jjbmoaoooiooEEE0000000G8bFb0oooommmoMmm9Y
  33    252  1991 2242 
Nbooiaiiiii0ooaYLLLLKGGGGGGLihiiiiijjoog5zabbbbGLLLiiiiiiiiziiiiiiooaa
  33    252  1912 2163 
3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaaaaGGLLLLLLLLGGGGGGGGV
  34    267  1894 2160 
n2iiiiijjjiooVbVbr3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaaaaG
  35    270  1894 2163 
n2iiiiijjjiooVbVbr3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaaaaG
  36    291  1894 2184 
n2iiiiijjjiooVbVbr3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaaaaG
  37    314  1991 2304 
Nbooiaiiiii0ooaYLLLLKGGGGGGLihiiiiijjoog5zabbbbGLLLiiiiiiiiziiiiiiooaa
  38    327  1894 2220 
n2iiiiijjjiooVbVbr3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaaaaG
  39    339  1894 2232 
n2iiiiijjjiooVbVbr3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaaaaG
  40    349  1894 2242 
n2iiiiijjjiooVbVbr3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaaaaG
  41    370  1894 2263 
n2iiiiijjjiooVbVbr3rrbrHHbalhGbbGGGLLLbbb4baaoaiiiiwiiYiiiiiiibaaaaaaG
  42    469  1795 2263 
1iooaabbGaGGGGGGLLLLLLLLbGGGGGGbaaaaajwiiiiiiiiiihaoGCGLLLLLLLLLLKbaoi
  43    501  1763 2263 
ZLLLLLLL0KGGGGGGGGbbooooiaiiiiii1iooaabbGaGGGGGGLLLLLLLLbGGGGGGbaaaaaj
  44    516  1727 2242 
WiiiiiiXiaaaoooiaiiiiiiiiaaaaaGaVaYGZLLLLLLL0KGGGGGGGGbbooooiaiiiiii1i
  45    536  1400 1935 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  46    609  1655 2263 
UbGGGLLLLLLGGGGGGNbbooiiiiiiiiiiijibmabbGLLLLLLVbbbioiiiiiiiiiiiiiiiii
  47    664  1600 2263 
DaaaaaabbbGGGGGGGGLLLLGbaaaaaoooobiaiaoiiiiiiiiiiibaMbbUbGGGLLLLLLGGGG
  48    706  1558 2263 
TiiiiijbNbLaGGGGKLLLLLLLLLLDDDDDDDDDDDDDDDDaaaaaabbbGGGGGGGGLLLLGbaaaa
  49    724  1400 2123 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  50    748  1400 2147 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  51    764  1400 2163 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  52    785  1400 2184 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  53    815  1406 2220 
PQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgdddgSdd
  54    821  1400 2220 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  55    833  1400 2232 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  56    843  1400 2242 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  57    864  1400 2263 
ObbbbbPQBubbbiiiiiiiiiiiiiiiiKiiiiiibbaobdbbRriFrainbbbbiGGGLLLLLLMKgd
  58   1005  1259 2263 
xiziiiiiiiiaaaaaabbbabrJbinnaiiiiaaaaaabbGJKLLLLLMLdddddddaaaaaaioiiii
  59   1070  1194 2263 
ybbbabbbbbbooooaiiiiiiiiiiiiiiiiiiaaaabalbbbbbbblbnrrrrcbbbaiiiiixizii
  60   1225  1039 2263 
elrrribnbbborrrrabEbbbadcdddddbbbbbbbbBbBbbbaaaaaaiiiiiiijiiiioiiiiaaF
  61   1255  1009 2263 
viiiiiiiiiiiiiiiiiiiiiiabhbiblelrrribnbbborrrrabEbbbadcdddddbbbbbbbbBb
  62   1567   697 2263 
Aiiiiiibiiiiiiiiiiiiiiiiiiiiiiiibiiiiiiiiiiiiiiiiiiiiiiiiiiabbbblbrrai
  63   1852   391 2242 
tknnnnnnnknbbbbbaaaaaaabbbbucggdbhhhhhhhhhhhhhhhhohhhhhhhhrhhhhhhhhhhh
  64   1879   364 2242 
pblbbbbbcccbdgdaaaaabbbbbfbtknnnnnnnknbbbbbaaaaaaabbbbucggdbhhhhhhhhhh
  64   1879   354 2232 
sinnkkkkhhpblbbbbbcccbdgdaaaaabbbbbfbtknnnnnnnknbbbbbaaaaaaabbbbucggdb
  65   1889   354 2242 
sinnkkkkhhpblbbbbbcccbdgdaaaaabbbbbfbtknnnnnnnknbbbbbaaaaaaabbbbucggdb
  66   1950   293 2242 
qlllnhnnnnrhblkaabpbbbbbllknnnnnbbaaaaaabbbbnkaaalbllllhlnnnnsinnkkkkh

Time to scan:      2285008 (1200 potential solutions)
Time to filter:    2459754 (253 filtered solutions)

Does this seem reasonable?

The total solution time is about 2.2 ms...

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/26/2010 8:43:31 AM

James J. Weinkam wrote:
> Peter Flass wrote:
> ...
>>
>> OTOH, although I haven't looked at this in detail, CONTROLLED storage 
>> has interactions with Tasking (maybe not with the Enterprise 
>> compilers) that adds quite another level of overhead.  It makes things 
>> much less simple than just push and pop.
>>
> With the F and Optimizing compilers, each task had its own allocation 
> stack, emanating from the pseudo register vector, for each controlled 
> variable known within that task. It is not possible for one task to 
> refer directly to a generation of a controlled variable allocated in 
> another task. Any sharing of controlled data had to be done by parameter 
> passing or pointers, and any needed synchronization or mutual exclusion 
> was the responsibility of the programmer. If no sharing is taking place 
> there is no tasking-related overhead.

That's interesting.  The manual describes this behavior, but gives no 
idea that it's just an unfortunate artifact of the implementation.  I 
was thinking it was intended, for some reason or other.  Thanks.
Reply Peter_Flass (934) 5/26/2010 10:45:31 AM

Robert AH Prins wrote:
> On 2010-05-26 08:43, Terje Mathisen wrote:
> A very quick look at the results tells me they are OK. My results
> perform one other, minor massage of the output by removing duplicates,
> so to take an example
>
> 15 33 2117 2149 IjLjoC7jjbmoaoooiooEEE0000000G8bF
>
> would come out as
>
> 15 33 2117 2149 IjLoC7bmaiE0G8F
>
> Or even more correct, as the 4 character strings indexed by the values
> of these characters. Nit-picking...
>
>> Time to scan: 2285008 (1200 potential solutions)
>> Time to filter: 2459754 (253 filtered solutions)
>>
>> Does this seem reasonable?
>>
>> The total solution time is about 2.2 ms...
>
> I think my very first Pascal (Turbo Pascal 3.01a) solution using a
> sliding window without any optimizations ran for around 26 *minutes*
> (OK, on a 16MHz 386) on a far smaller subset (less than half) of this
> data, so I think 2.2 ms is pretty reasonable. ;)

My first usable compiler (on a PC) was TP 1.0. :-)

Currently my best runtime has been:

  65   1889   354 2242 
sinnkkkkhhpblbbbbbcccbdgdaaaaabbbbbfbtknnnnnnnknbbbbbaaaaaaabbbbucggdb
  66   1950   293 2242 
qlllnhnnnnrhblkaabpbbbbbllknnnnnbbaaaaaabbbbnkaaalbllllhlnnnnsinnkkkkh
Time to scan:      1247257 (1200 potential solutions)
Time to filter:    1226621 (253 filtered solutions)

I.e. 2.47M cycles on a 2.2GHz cpu, for a total of about 1.1 ms.
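
The arithmetic above is easy to check with a minimal Python sketch. The cycle counts and the 2.2 GHz clock are taken from the post; the helper name cycles_to_ms is made up for illustration, and the sketch assumes the cycle counter ticks at that fixed clock rate.

```python
def cycles_to_ms(cycles, clock_hz=2.2e9):
    """Convert a cycle count to milliseconds at a fixed clock rate (assumed 2.2 GHz)."""
    return cycles / clock_hz * 1e3


scan_cycles = 1_247_257      # "Time to scan" from the run above
filter_cycles = 1_226_621    # "Time to filter" from the run above
total_cycles = scan_cycles + filter_cycles

print(f"{total_cycles} cycles = {cycles_to_ms(total_cycles):.2f} ms")
```

The two phases add up to 2473878 cycles, which is where the 2.47M figure comes from.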

I have also implemented the promised improved filter function, using 
that I got:

Time to scan:      1048432 (1200 potential solutions)
Time to filter:     982135 (237 filtered solutions)

I.e. 0.9 ms, but I'm missing 16 of the shorter solutions. :-(

I'll have to recheck my logic...

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/26/2010 11:32:00 AM

Robert AH Prins wrote:
> A very quick look at the results tells me they are OK. My results
> perform one other, minor massage of the output by removing duplicates,
> so to take an example
>
> 15 33 2117 2149 IjLjoC7jjbmoaoooiooEEE0000000G8bF
>
> would come out as
>
> 15 33 2117 2149 IjLoC7bmaiE0G8F

Here's with the same stripping of repeating result bytes:

len aperture from to value
   3      3  2291 2293 a3o
   3      3  2271 2273 jao
   3      3  2270 2272 oja
   3      3  2241 2243 o*b
   3      3  2201 2203 bji
   3      3  2191 2193 bLG
   3      3  2186 2188 obG
   3      3  2182 2184 ao+
   3      3  2159 2161 oMm
   3      3  2158 2160 moM
   3      3  2117 2119 IjL
   3      3  2089 2091 aoi
   3      3  2066 2068 Gbo
   3      3  2037 2039 bGL
   3      3  2010 2012 LKG
   3      3  2001 2003 i0o
   3      3  1994 1996 oia
   3      3  1984 1986 VGb
   3      3  1955 1957 iba
   3      3  1939 1941 oai
   3      3  1935 1937 4ba
   3      3  1915 1917 brH
   3      3  1906 1908 oVb
   3      3  1903 1905 jio
   3      3  1882 1884 LEG
   3      3  1848 1850 CGL
   3      3  1825 1827 Gba
   3      3  1818 1820 LbG
   3      3  1802 1804 bGa
   3      3  1795 1797 1io
   3      3  1786 1788 oia
   3      3  1757 1759 GaV
   3      3  1741 1743 oia
   3      3  1734 1736 Xia
   3      3  1705 1707 bio
   3      3  1701 1703 LVb
   3      3  1694 1696 bGL
   3      3  1691 1693 mab
   3      3  1671 1673 GNb
   3      3  1655 1657 UbG
   3      3  1651 1653 aMb
   3      3  1637 1639 aoi
   3      3  1636 1638 iao
   3      3  1573 1575 GKL
   3      3  1549 1551 Giz
   3      3  1514 1516 Grb
   3      3  1500 1502 bNG
   3      3  1497 1499 fab
   3      3  1488 1490 cja
   3      3  1478 1480 djc
   3      3  1472 1474 gSd
   3      3  1471 1473 dgS
   3      3  1455 1457 biG
   3      3  1437 1439 bao
   3      3  1399 1401 cOb
   3      3  1392 1394 bac
   3      3  1389 1391 hcb
   3      3  1388 1390 bhc
   3      3  1322 1324 aio
   3      3  1308 1310 MLd
   3      3  1286 1288 nai
   3      3  1259 1261 xiz
   3      3  1252 1254 bai
   3      3  1248 1250 rcb
   3      3  1233 1235 alb
   3      3  1232 1234 bal
   3      3  1208 1210 oai
   3      3  1187 1189 iba
   3      3  1162 1164 bai
   3      3  1151 1153 rnl
   3      3  1049 1051 bor
   3      3  1039 1041 elr
   3      3  1034 1036 hbi
   3      3   924 926 ria
   3      3   913 915 aDb
   3      3   795 797 bai
   3      3   784 786 bja
   3      3   780 782 rbl
   3      3   765 767 air
   3      3   764 766 rai
   3      3   761 763 lbr
   3      3   755 757 iab
   3      3   682 684 rab
   3      3   680 682 lar
   3      3   674 676 gin
   3      3   589 591 xyi
   3      3   588 590 ixy
   3      3   543 545 nbr
   3      3   522 524 ibr
   3      3   509 511 ovi
   3      3   508 510 iov
   3      3   488 490 joa
   3      3   400 402 knb
   3      3   377 379 gda
   3      3   354 356 sin
   3      3   353 355 nsi
   3      3   348 350 hln
   3      3   341 343 alb
   3      3   318 320 lkn
   3      3   309 311 abp
   3      3   296 298 lnh
   3      3   292 294 bql
   3      3   237 239 dgc
   3      3   219 221 apb
   3      3   211 213 iba
   3      3   207 209 lib
   3      3   198 200 nbk
   3      3   186 188 kbn
   3      3   166 168 cba
   3      3   156 158 cad
   3      3   151 153 acb
   3      3   140 142 jkb
   3      3   135 137 ibj
   3      3    60  62 acd
   3      3    58  60 bca
   3      3    46  48 dea
   3      3    14  16 cad
   4      4  2279 2282 oahb
   4      4  2260 2263 baot
   4      4  2232 2235 /bao
   4      4  2218 2221 aoJb
   4      4  2212 2215 bmo-
   4      4  2170 2173 oiGb
   4      4  2162 2165 m9Yo
   4      4  2149 2152 Fb0o
   4      4  2121 2124 oC7j
   4      4  2112 2115 BI6j
   4      4  2097 2100 oabm
   4      4  2017 2020 GLih
   4      4  2004 2007 oaYL
   4      4  1990 1993 aNbo
   4      4  1909 1912 Vbr3
   4      4  1893 1896 on2i
   4      4  1831 1834 ajwi
   4      4  1770 1773 L0KG
   4      4  1649 1652 ibaM
   4      4  1632 1635 obia
   4      4  1621 1624 LGba
   4      4  1563 1566 ijbN
   4      4  1518 1521 baoi
   4      4  1446 1449 iFra
   4      4  1438 1441 aobd
   4      4  1279 1282 abrJ
   4      4  1242 1245 lbnr
   4      4  1174 1177 icoa
   4      4  1108 1111 Fahb
   4      4  1060 1063 badc
   4      4  1054 1057 rabE
   4      4  1043 1046 ribn
   4      4  1036 1039 ible
   4      4  1031 1034 iabh
   4      4   775 778 jnir
   4      4   769 772 brln
   4      4   677 680 nral
   4      4   465 468 hinj
   4      4   421 424 gdbh
   4      4   417 420 bucg
   4      4   374 377 cbdg
   4      4   363 366 hpbl
   4      4   336 339 bnka
   5      5  2310 2314 *bmao
   5      5  2145 2149 0G8bF
   5      5  2125 2129 jbmoa
   5      5  2119 2123 LjoC7
   5      5  1920 1924 alhGb
   5      5  1688 1692 jibma
   5      5  1566 1570 NbLaG
   5      5  1465 1469 LMKgd
   5      5  1443 1447 bRriF
   5      5  1406 1410 PQBub
   5      5  1405 1409 bPQBu
   5      5  1299 1303 bGJKL
   5      5  1281 1285 rJbin
   5      5   389 393 fbtkn
   5      5   180 184 ablmk
   6      6  2029 2034 og5zab
   6      6  1918 1923 HbalhG
   6      6  1870 1875 ioabGL
   6      6  1859 1864 LKbaoi
   6      6  1843 1848 ihaoGC
   6      6  1759 1764 VaYGZL
   6      6  1447 1452 Frainb
   6      6  1152 1157 nlrHcb
   7      7   302 308 nrhblka
   8     10  2025 2034 ijog5zab
   8     10  1438 1447 aobdRriF
   8     10   302 311 nrhblkap
   9     13  2117 2129 IjLoC7bma
   9     13  2027 2039 jog5zabGL
   9     13  1439 1451 obdRriFan
  10     15  2025 2039 ijog5zabGL
  11     18  2112 2129 BI6jLoC7bma
  11     18  2017 2034 GLihjog5zab
  12     21  1903 1923 jioVbr3HalhG
  13     25  2112 2136 BI6jLoC7bmaiE
  14     28  2112 2139 BI6jLoC7bmaiE0
  15     33  2117 2149 IjLoC7bmaiE0G8F
  16     36  2114 2149 6jILoC7bmaiE0G8F
  16     36  2112 2147 BI6jLoC7bmaiE0G8
  17     38  2112 2149 BI6jLoC7bmaiE0G8F
  18     48  2117 2164 IjLoC7bmaiE0G8FM9Y
  19     51  2114 2164 6jILoC7bmaiE0G8FM9Y
  20     53  2112 2164 BI6jLoC7bmaiE0G8FM9Y
  21     70  2095 2164 woabmBI6jLC7iE0G8FM9Y
  22     81  1400 1480 ObPQBuiKaodRrFnGLMgSjc
  22     81  1399 1479 cObPQBuiKaodRrFnGLMgSj
  23     91  1389 1479 hcbaOPQBuiKodRrFnGLMgSj
  23     91  1383 1473 fibhcaOPQBuKodRrFnGLMgS
  24     97  1383 1479 fibhcaOPQBuKodRrFnGLMgSj
  25    112  1368 1479 NifbhcaOPQBuKodRrFnGLMgSj
  26    131  2112 2242 BI6jLoC7bmaiE0G8FM9Y+-JV/*
  27    148  2095 2242 woabmBI6jLC7iE0G8FM9Y+-JV/*
  28    169  2095 2263 woabmBI6jLC7iE0G8FM9Y+-JV/*t
  29    181  2112 2292 BI6jLoC7bmaiE0G8FM9Y+-JV/*th3
  30    189  2112 2300 BI6jLoC7bmaiE0G8FM9Y+-JV/*th3l
  31    193  2112 2304 BI6jLoC7bmaiE0G8FM9Y+-JV/*th3lr
  32    210  2095 2304 woabmBI6jLC7iE0G8FM9Y+-JV/*th3lr
  33    252  1991 2242 Nboia0YLKGhjg5zmwBI6C7E8FM9+-JV/*
  33    252  1912 2163 3rbHalhGL4oiwYVN0Kjg5zmBI6C7E8FM9
  34    267  1894 2160 n2ijoVbr3HalhGL4wYN0Kg5zmBI6C7E8FM
  35    270  1894 2163 n2ijoVbr3HalhGL4wYN0Kg5zmBI6C7E8FM9
  36    291  1894 2184 n2ijoVbr3HalhGL4wYN0Kg5zmBI6C7E8FM9+
  37    314  1991 2304 Nboia0YLKGhjg5zmwBI6C7E8FM9+-JV/*t3lr
  38    327  1894 2220 n2ijoVbr3HalhGL4wYN0Kg5zmBI6C7E8FM9+-J
  39    339  1894 2232 n2ijoVbr3HalhGL4wYN0Kg5zmBI6C7E8FM9+-J/
  40    349  1894 2242 n2ijoVbr3HalhGL4wYN0Kg5zmBI6C7E8FM9+-J/*
  41    370  1894 2263 n2ijoVbr3HalhGL4wYN0Kg5zmBI6C7E8FM9+-J/*t
  42    469  1795 2263 1ioabGLjwhCKEn2Vr3Hl4YN0g5zmBI678FM9+-J/*t
  43    501  1763 2263 ZL0KGboia1jwhCEn2Vr3Hl4YNg5zmBI678FM9+-J/*t
  44    516  1727 2242 WiXaoGVYZL0Kb1jwhCEn2r3Hl4Ng5zmBI678FM9+-J/*
  45    536  1400 1935 ObPQBuiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl4
  46    609  1655 2263 UbGLNoijmaVWXYZ0K1whCEn2r3Hl4g5zBI678FM9+-J/*t
  47    664  1600 2263 DabGLoiMUNjmVWXYZ0K1whCEn2r3Hl4g5zBI678F9+-J/*t
  48    706  1558 2263 TijbNLaGKDoMUmVWXYZ01whCEn2r3Hl4g5zBI678F9+-J/*t
  49    724  1400 2123 ObPQBuiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl45I67
  50    748  1400 2147 ObPQBuiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl45I678
  51    764  1400 2163 ObPQBuiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl45I6789
  52    785  1400 2184 ObPQBuiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl45I6789+
  53    815  1406 2220 PQBubiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl45I6789+-J
  54    821  1400 2220 ObPQBuiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl45I6789+-J
  55    833  1400 2232 ObPQBuiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl45I6789+-J/
  56    843  1400 2242 ObPQBuiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl45I6789+-J/*
  57    864  1400 2263 ObPQBuiKaodRrFnGLMgSjcfNzTDUmVWXYZ01whCE23Hl45I6789+-J/*t
  58   1005  1259 2263 xizabrJnGKLMdoNfhcOPQBuRFgSjTDUmVWXYZ01wCE23Hl45I6789+-/*t
  59   1070  1194 2263 ybaoilnrcxzJGKLMdNfhOPQBuRFgSjTDUmVWXYZ01wCE23H45I6789+-/*t
  60   1225  1039 2263 elribnoaEdcBjFhGHIyxzJKLMNfOPQuRgSTDUmVWXYZ01wC23456789+-/*t
  61   1255  1009 2263 viabhlernoEdcBjFGHIyxzJKLMNfOPQuRgSTDUmVWXYZ01wC23456789+-/*t
  62   1567   697 2263 AibalrnjBCDvheoEdcFGHIyxzJKLMNfOPQuRgSTUmVWXYZ01w23456789+-/*t
  63   1852   391 2242 tknbaucgdhorijvwxyzflABCDeEFGHIJKLMNOPQRSTUmVWXYZ0123456789+-/*
  64   1879   364 2242 pblcdgaftknuhorijvwxyzABCDeEFGHIJKLMNOPQRSTUmVWXYZ0123456789+-/*
  64   1879   354 2232 sinkhpblcdgaftuorjvwxyzABCDeEFGHIJKLMNOPQRSTUmVWXYZ0123456789+-/
  65   1889   354 2242 sinkhpblcdgaftuorjvwxyzABCDeEFGHIJKLMNOPQRSTUmVWXYZ0123456789+-/*
  66   1950   293 2242 qlnhrbkapsicdgftuojvwxyzABCDeEFGHIJKLMNOPQRSTUmVWXYZ0123456789+-/*
Time to scan:       853094 (1200 potential solutions)
Time to filter:     874588 (253 filtered solutions)

1.7M cycles, 0.75ms total time.

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/26/2010 11:38:07 AM

Robert AH Prins wrote:
> I'm afraid the 2329 set of input data, built up over 29 years, is all
> that I have at the moment. It's likely to grow a bit more in the coming
> years, at a rate of 40 to 80 entries per year (in 2008 there were 138
> new values, but only 5 of them had never been seen, in 2009 there were
> 56 with 3 previously unseen ones, and so far this year there have been
> 67 new entries, but all of them have occurred before) Most of the growth
> will come from the top-10 values with possibly minor additions from the
> values in the remainder of the top-20.
>
> The amount of data from other users is likely to be rather a lot smaller
> and more likely to contain far fewer unique values, I'll see if I can
> get any.

In that case, and with my current code running at sub-ms speed, it 
really seems like this is a completely solved problem.

I really can't see how you could need to re-run the scanner 1000 
times/second, every second, for almost a week while waiting for the next 
addition to your data set!
:-)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/26/2010 11:45:50 AM

On 2010-05-26 08:43, Terje Mathisen wrote:
> Robert AH Prins wrote:
>> On 2010-05-23 08:37, Terje Mathisen wrote:
>>> Dick Wesseling wrote:
>>>> In article<11lic7-2u52.ln1@ntp.tmsw.no>,
>>>> Terje Mathisen<"terje.mathisen at tmsw.no"@giganews.com> writes:
>>>>> Still, running at 130+ cycles per input byte is still fast enough to
>>>>> handle significant amounts of data.
>>>>>
>>>>
>>>> Sounds like you've beaten me. My O() depends on the size of the longest
>>>> sequence minus the size of the longest unique sequence, which is good
>>>> for the reference data. If I time 4K of random data I get 7250 cycles
>>>> per
>>>> input byte. Since sqrt(C) is 16 for that dataset you ought to get about
>>>> 800 cycles/byte ( http://www.fi.uu.nl/~ftu/random.bin )
>>>
>>> It is worse (or better, seen from my point):
>>>
>>> With 4K random input the scan took 252K cycles, which means that I
>>> averaged about 83 cycles/byte. (I.e. very fast!)
>>>
>>> The problem was that with totally random data, there are a _lot_ of both
>>> potential and real solutions, I found 4099 candidates (i.e. just over
>>> one per starting byte!) and 121 after filtering.
>>>
>>> The filter process here took a _lot_ of time though: 8M cycles!
>>>
>>> This means that it needed 2000 cycles/input byte. :-(
>>>
>>> This simply shows that the problem space (and therefore required
>>> algorithm as well) almost certainly won't contain totally random data,
>>> and this will most probably help my code...
>>
>> The data is in fact not very random, but you'll be hard pressed to
>> figure out what it represents. Here is the current full set, represented
>> by 66
>>
>> abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-/*
>>
>> printable chars:
>>
>> aaabbbbbbbccccadddddddddddddddddddddddddddddddeaaaaaaaabbbcacddddeeeee

<SNIP>

>> bmaoooooobobbbbbooo (2329 characters)
>>
>> The top-3 account for 58.3 % of all values, the top-10 for 85.2%.
>
> Here's my results, with the longer result strings truncated:
> len aperture from to value
> 3 3 2291 2293 a3o

<SNIP>

> 66 1950 293 2242
> qlllnhnnnnrhblkaabpbbbbbllknnnnnbbaaaaaabbbbnkaaalbllllhlnnnnsinnkkkkh

A very quick look at the results tells me they are OK. My results 
perform one other, minor massage of the output by removing duplicates, 
so to take an example

15 33  2117 2149 IjLjoC7jjbmoaoooiooEEE0000000G8bF

would come out as

15 33  2117 2149 IjLoC7bmaiE0G8F

Or even more correct, as the 4 character strings indexed by the values 
of these characters. Nit-picking...
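
For what it's worth, the duplicate-stripping step described above can be sketched in a couple of lines of Python. This is only an illustration, not the code that produced any of the results in this thread; the function name strip_repeats is hypothetical. It keeps the first occurrence of each value, in order of appearance.

```python
def strip_repeats(window):
    """Keep only the first occurrence of each value, preserving order."""
    # dict preserves insertion order, so its keys are the distinct
    # values in the order they first appear in the window.
    return "".join(dict.fromkeys(window))


print(strip_repeats("IjLjoC7jjbmoaoooiooEEE0000000G8bF"))
```

For the 33-cell window quoted above this yields the 15 distinct values IjLoC7bmaiE0G8F, so the result length equals the "len" column and the input length equals the "aperture" column.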

> Time to scan: 2285008 (1200 potential solutions)
> Time to filter: 2459754 (253 filtered solutions)
>
> Does this seem reasonable?
>
> The total solution time is about 2.2 ms...

I think my very first Pascal (Turbo Pascal 3.01a) solution using a 
sliding window without any optimizations ran for around 26 *minutes* 
(OK, on a 16MHz 386) on a far smaller subset (less than half) of this 
data, so I think 2.2 ms is pretty reasonable. ;)

Still cutting the Pascal - the PL/I would be easier, but PL/I compilers 
are a bit thin on the ground...

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/26/2010 12:54:58 PM

On 2010-05-26 03:48, Dick Wesseling wrote:
> In article<862qk6Fe5gU1@mid.individual.net>,
> 	Robert AH Prins<spamtrap@prino.org>  writes:
>
>> Group 61 - the x-marked 3-string (FEG) is not found by the Paul Green code:
>
> What a relief, mine works:
>
> 5641 cycles 6 solutions
> AABCDEFEGGGGGGGGGGHICDAAAAA
>   len  win  pos
>     9   19    1 ABCDEFEGGGGGGGGGGHI
>     8   17    6 FEGGGGGGGGGGHICDA
>     7    8    1 ABCDEFEG
>     6    6    1 ABCDEF
>     3    3    6 FEG<there it is!
>     6    6   17 GHICDA
>
>> Group 69, again the x-marked 3-string (HIJ) is not found
>
> 5630 cycles 6 solutions
> AAAAABBCCDEFGHHHHIJJJ
>   len  win  pos
>    10   15    4 ABBCCDEFGHHHHIJ
>     9   13    6 BCCDEFGHHHHIJ
>     8   10    4 ABBCCDEFGH
>     7    8    6 BCCDEFGH
>     6    6    8 CDEFGH
>     3    3   16 HIJ<there it is!
>
>> Will extract the code from the Pascal program, and get it working RSN,
>> bit tired from travel, it takes as long to get from Vilnius to Charleroi
>> as from Charleroi to Oostende.
>
> If it isn't asking too much, once you've taken some rest would you mind
> posting one of the longer (2K or so) data sets?
> Earlier I had to disappoint the readers of clax by saying that it wasn't
> worth bothering about an asm version because it outperformed the C
> version by a small margin only. However, that was for random data.
> Random data suffocates branch prediction causing _any_ version to come
> to a grinding halt (*).

I'm afraid the 2329 set of input data, built up over 29 years, is all 
that I have at the moment. It's likely to grow a bit more in the coming 
years, at a rate of 40 to 80 entries per year (in 2008 there were 138 
new values, but only 5 of them had never been seen, in 2009 there were 
56 with 3 previously unseen ones, and so far this year there have been 
67 new entries, but all of them have occurred before) Most of the growth 
will come from the top-10 values with possibly minor additions from the 
values in the remainder of the top-20.

The amount of data from other users is likely to be rather a lot smaller 
and more likely to contain far fewer unique values, I'll see if I can 
get any.

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/26/2010 1:16:20 PM

In article <864cvsFor7U1@mid.individual.net>,
	Robert AH Prins <spamtrap@prino.org> writes:
>
[snip]
>> Time to scan: 2285008 (1200 potential solutions)
>> Time to filter: 2459754 (253 filtered solutions)
>>
>> Does this seem reasonable?
>>
>> The total solution time is about 2.2 ms...
> 
> I think my very first Pascal (Turbo Pascal 3.01a) solution using a 
> sliding window without any optimizations ran for around 26 *minutes* 
> (OK, on a 16MHz 386) on a far smaller subset (less than half) of this 
> data, so I think 2.2 ms is pretty reasonable. ;)
> 
> Still cutting the Pascal - the PL/I would be easier, but PL/I compilers 
> are a bit thin on the ground...
> 

My results:

      2165682 cycles 253 solutions.

Using 2K of random data instead takes 6 times longer.
Reply free 5/26/2010 1:23:39 PM

Peter Flass <Peter_Flass@yahoo.com> wrote:
> James J. Weinkam wrote:
(snip)

>> With the F and Optimizing compilers, each task had its own allocation 
>> stack, emanating from the pseudo register vector, for each controlled 
>> variable known within that task. It is not possible for one task to 
>> refer directly to a generation of a controlled variable allocated in 
>> another task. Any sharing of controlled data had to be done by parameter 
>> passing or pointers, and any needed synchronization or mutual exclusion 
>> was the responsibility of the programmer. If no sharing is taking place 
>> there is no tasking-related overhead.
 
> That's interesting.  The manual describes this behavior, but gives no 
> idea that it's just an unfortunate artifact of the implementation.  I 
> was thinking it was intended, for some reason or other.  Thanks.

It could probably be argued either way.  Since PL/I, and the F compiler,
were designed before much experience with multitasking systems, it
might not have been obvious which way it should work.

Even more, OS/360 has one address space for all tasks.  Storage
keys might stop you from accessing something, but it is addressable.

It seems to me that the definition of threads is such that they
run in the same address space, which tasks (not counting OS/360)
are allowed to run in different address spaces.

-- glen
Reply gah (12259) 5/26/2010 5:14:32 PM

On Wed, 26 May 2010 17:14:32 +0000 (UTC), glen herrmannsfeldt wrote
about Re: ALLOCATE vs. BEGIN:

[snip]
>It could probably be argued either way.  Since PL/I, and the F
>compiler, were designed before much experience with multitasking
>systems, it might not have been obvious which way it should work.
>
>Even more, OS/360 has one address space for all tasks.  Storage
>keys might stop you from accessing something, but it is addressable.

OS/360 had multi-tasking, at least in its MFT and MVT incarnations;
these became OS/VS1 and OS/VS2, respectively, once paging and a few
other knick-knacks were added.

>It seems to me that the definition of threads is such that they
>run in the same address space,

This is true.

>which tasks (not counting OS/360)
>are allowed to run in different address spaces.

This is not.

Unprivileged code that uses the ATTACH supervisor call creates a new
sub-task in the current address space.  Most privileged code (e.g.
CICS, IMS) does the same. However, there is an obscure parameterization
of ATTACH that JES[23] uses to start initiators in new address spaces,
but all other ATTACHes remain within the address space from which they
were issued.
-- 
Regards,

Dave  [RLU #314465]
=======================================================================
dwnoon@spamtrap.ntlworld.com (David W Noon)
Remove spam trap to reply by e-mail.
=======================================================================

Reply dwnoon (51) 5/26/2010 7:36:09 PM

David W Noon <dwnoon@spamtrap.ntlworld.com> wrote:
(snip, I wrote)
 
>>It seems to me that the definition of threads is such that they
>>run in the same address space,
 
> This is true.
 
>>which tasks (not counting OS/360)
>>are allowed to run in different address spaces.
 
> This is not.
 
> Unprivileged code that uses the ATTACH supervisor call creates a new
> sub-task in the current address space.  

I was thinking of it in the general computer sense, as, for example,
in the wikipedia entry under Computer_multitasking.

Of course IBM and OS/360 were and are (with successors) important
in the development of multitasking and subtasks, and one can't
completely separate from them.

-- glen
Reply gah (12259) 5/26/2010 8:11:30 PM

glen herrmannsfeldt wrote:
> David W Noon <dwnoon@spamtrap.ntlworld.com> wrote:
> (snip, I wrote)
>  
>>> It seems to me that the definition of threads is such that they
>>> run in the same address space,
>  
>> This is true.
>  
>>> which tasks (not counting OS/360)
>>> are allowed to run in different address spaces.
>  
>> This is not.
>  
>> Unprivileged code that uses the ATTACH supervisor call creates a new
>> sub-task in the current address space.  
> 
> I was thinking of it in the general computer sense, as, for example,
> in the wikipedia entry under Computer_multitasking.
> 
> Of course IBM and OS/360 were and are (with successors) important
> in the development of multitasking and subtasks, and one can't
> completely separate from them.
> 

Wiki is probably screwed up.  Those would be "processes".  AFAIK, same 
address space=thread, different address space=process.  Of course, as 
you point out, OS/360++ calls threads tasks, and processes "address spaces."
Reply Peter_Flass (934) 5/26/2010 11:12:30 PM

This message was originally posted on 2010/05/21 at 23:09 PDT.
It has not yet appeared on any of the newsgroups; numerous other messages in this thread have
appeared since then.

When I tried to repost it just now, a message box appeared saying "Sending of message failed. You
may only send to one newsgroup at a time."

Accordingly, I am now posting it to each group separately.


Dick Wesseling wrote:
 > /*
 > Here is the final version of my solution.
 >
 > Let:    N       number of cells, size of the input
 >         V       number of values, size of symbol set.
 >         M       the number of distinct values in the largest sequence.
 >         T       the size of the largest n-in-n sequence, i.e. the
 >                 largest sequence consisting of distinct values only.
 >
 > As explained in my previous message <4bf43d60$0$22939$c5fe704e@news6.xs4all.nl>
 > the algorithm uses two different search strategies:
 >
 > Plan A
 >
 > Find the largest sequence with size M
 >
 > for n from M downwards search n-in-m sequences, keeping only sequences
 > with minimal value of m.
 > Print sequence if n<m. (If n=m the same sequence will be found again
 > in the next step).
 >
 > Now n=m=T.
 >
 > Plan B
 > Prints all maximal n-in-n sequences.
 >
 > Plan A scans the input 1+(M-T) .. 2+(M-T) times, depending on the
 > input position of the longest sequence.
 > Plan B scans the input once.
 >
 > The running time is therefore bounded by O(N * (3+(M-T))) which
 > is typically better than or equal to O(N*C).
 > For small values of N the cost of erasing the frequency array should
 > also be accounted for.
 >
 > Two arrays are used:
 >     found[N]    Intermediate results of plan A.
 >     freq[C]     Frequency count for all symbols in the sliding window
 >
 >
 > Finding sequences is done using a straightforward sliding window
 > scan of the input. The values in freq[] are updated when the window
 > edges move and the number of unique symbols is updated when a
 > frequency count changes between zero and non-zero.
 >
 >
 > The output is unsorted.
....
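
The sliding-window bookkeeping described in the quoted comment can be sketched roughly as follows. This is a minimal Python illustration, not Dick Wesseling's code (which is snipped here) and not my PL/I program; the function name shortest_windows is made up, and a dictionary stands in for the freq[] array. It handles a single number of distinct values, k, and keeps only the windows of minimal width. The sample string is the 56-cell example from the OP's first post.

```python
def shortest_windows(data, k):
    """Find the shortest windows of `data` holding exactly k distinct values.

    Returns (width, [(start, end), ...]) with 1-based inclusive positions,
    or (None, []) if no window holds k distinct values.
    """
    freq = {}        # value -> number of occurrences inside the window
    distinct = 0     # number of values with a non-zero count
    best, found = None, []
    i = 0            # left edge (0-based); the right edge is j
    for j, v in enumerate(data):
        freq[v] = freq.get(v, 0) + 1
        if freq[v] == 1:             # count went from zero to non-zero
            distinct += 1
        if distinct < k:
            continue
        # Trim the left edge while its value is duplicated in the window.
        while freq[data[i]] > 1:
            freq[data[i]] -= 1
            i += 1
        width = j - i + 1
        if best is None or width < best:
            best, found = width, [(i + 1, j + 1)]
        elif width == best:
            found.append((i + 1, j + 1))
        # Drop the (now unique) leftmost value so the scan can continue.
        freq[data[i]] -= 1           # count goes from non-zero to zero
        distinct -= 1
        i += 1
    return best, found


sample = "aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja"
for k in (7, 8, 9, 10):
    print(k, shortest_windows(sample, k))
```

For k = 7, 8, 9 and 10 this reproduces the apertures and positions in the OP's table (7 at 50-56, 16 at 41-56, 23 at 34-56, 25 at 32-56). It does not apply any filtering of shorter windows contained in longer all-distinct ones; that is a separate step.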
When I did my original solution a few days ago, I didn't even consider a moving window approach
because the OP had said that its performance was terrible. He also mentioned complexities such as
having to restart the scan multiple times, etc.

Seeing your sliding window approach inspired me to give that approach a try.

I have come up with a streamlined version of your algorithm that I believe meets the OP's
specifications. All of us who have participated in discussing this problem have been bandying about
terms such as unique, distinct, minimal, shortest, without making the intended interpretation
completely clear. I had to do a bit of reading between the lines to come up with what I believe are
the OP's intended specifications, which is why I only say that I believe my solution meets his
specifications. I also inadvertently added to the confusion: by the time I wrote my first solution,
I did not have the original post in front of me; I remembered that the OP had referred to cells and
values so I used C for the number of cells and V for the number of values. Only later did I discover
that he had used N and T for these quantities. I have renamed some of the variables in my program to
make it consistent with the OP's notation.

Let me suggest the following definitions and nomenclature:

The data consist of a sequence of n cells stored in an array, x, with subscripts running from 1 to n.

Each cell contains one of a set of t values which are integers between 1 and t inclusive.

A subsequence of length l consists of l consecutive cells from the array x.

A k-subsequence is a subsequence of length l>=k which contains exactly k of the t possible values.

A minimal k-subsequence is one for which the removal of the first or last cell would reduce it to a
(k-1)-subsequence.

A shortest k-subsequence is a minimal k-subsequence whose length is less than or equal to that of
any other minimal k-subsequence in the array.

A shortest possible k-subsequence is a k-subsequence of length k.
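
To make the definitions above concrete, here is a small Python sketch of them as predicates on a slice of the array. The function names are made up for illustration and are not taken from my PL/I program; the example strings come from the OP's 56-cell sample.

```python
def distinct_values(cells):
    """Number of distinct values in a run of consecutive cells."""
    return len(set(cells))


def is_k_subsequence(cells, k):
    """Length >= k and exactly k distinct values."""
    return len(cells) >= k and distinct_values(cells) == k


def is_minimal(cells):
    """Removing the first or the last cell loses one distinct value."""
    k = distinct_values(cells)
    return distinct_values(cells[1:]) < k and distinct_values(cells[:-1]) < k


def is_shortest_possible(cells):
    """Every cell holds a different value, so the length equals k."""
    return len(cells) == distinct_values(cells)


print(is_k_subsequence("efffghggggihbfja", 8), is_minimal("efffghggggihbfja"))
print(is_shortest_possible("gihbfja"), is_minimal("befff"))
```

Whether a minimal k-subsequence is also a shortest one cannot be decided from the slice alone, since it depends on every other minimal k-subsequence in the array.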

If a minimal k-subsequence is not shortest possible, it contains at least one repeated value. Since
every minimal 1- or 2-subsequence is shortest possible, shortest k-subsequences are only of interest
for k>=3. The array may, but need not, contain a shortest possible k-subsequence for 3<=k<=t.
If the array contains a shortest possible k-subsequence, it also contains shortest possible
j-subsequences for 3<=j<k.

Assuming that each of the t values appears in the array at least once, the array contains at least
one shortest k-subsequence for 1<=k<=t.

Within this framework, here is my understanding of the OP's requirements: "For 3<=k<=t, list all
shortest k-subsequences except, for k<t, omit any shortest possible k-subsequences which are
contained within a shortest possible (k+1)-subsequence."

A shortest possible k-subsequence, z, is not contained within a shortest possible (k+1)-subsequence
iff each of the immediately preceding and following cells (if it exists) contains a value already
present in z. I shall refer to this as the inclusion test. Note that the inclusion test always
passes for k=t.

In my new program, I have abandoned the use of characters for the external representation and
switched to integers between 1 and t. I have also changed the data representation from bin fixed(15)
to bin fixed(16) unsigned. This allows values of n and t up to 65534. Finally, I added timing of the
input, processing, and output.

To facilitate application of the inclusion test, the data array, x, has sentinel cells x(0) and
x(n+1) both set equal to the nonexistent value 0. The frequency table, f, has an extra entry f(0)
set equal to 1, so that a sentinel always counts as a value already present in the window.

For each k, 3<=k<=t, a window delimited by the subscripts i on the left and j on the right is moved
across the entire data array. The width of the window is w=j-i+1. Initially, the window is at cell 1
(i=1) and is empty (w=0), so the relationship w=j-i+1 gives j=0. Since no subsequence can be longer
than n, the length of the shortest minimal k-subsequence found so far, l, is set to n+1; the number
of shortest minimal k-subsequences found so far, s, is set to 0; the frequency table, f, is
initialized to 0 except for f(0), which is set to 1 as previously explained; finally, the number of
distinct values within the window, d, is set to 0.

The scan of the data consists of repeatedly finding the next minimal k-subsequence and deciding
whether or not to remember it. This is done as follows. First, the window is extended on the right
one cell at a time until the number of distinct values is equal to k or the end of the data is
reached; in the latter case there are no more minimal k-subsequences, so the scan is terminated.
Once k distinct values are in the window, the number of instances of the value at position j is
necessarily 1. Next, cells are removed from the window on the left until the leftmost value has only
1 instance in the window. At this point the window contains the leftmost minimal k-subsequence not
already examined. Next, the width of the window, w, is computed and the disposition of the
subsequence is decided. If w>l, the subsequence is skipped. If w<l, a shorter minimal k-subsequence
has been found, so the previous set of k-subsequences is thrown away and the subsequence becomes the
first element of a new set if it passes the inclusion test; otherwise the set is left empty. If w=l,
the subsequence is added to the set of remembered k-subsequences if it passes the inclusion test. In
both cases the inclusion test is applied only if w=k.

Finally, the first cell is deleted from the window, leaving a (not necessarily minimal)
(k-1)-subsequence, and the loop repeats.
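
As a cross-check on the description above, here is a minimal Python sketch of one scan for a single
k. It is not the PL/I program: the function name shortest_k_subsequences and the returned
(length, starts) pair are illustrative choices, x is assumed to be a list of integers in 1..t, and
start positions are 1-based to match the text.

```python
def shortest_k_subsequences(x, k):
    """Return (length, starts) of the shortest minimal k-subsequences of x.

    starts holds 1-based start positions; windows of width exactly k are
    kept only if they pass the inclusion test.
    """
    n = len(x)
    cells = [0] + list(x) + [0]            # sentinels x(0) and x(n+1) hold 0
    freq = [0] * (max(x, default=0) + 1)   # counts of each value in the window
    freq[0] = 1                            # lets a sentinel pass the test
    best, starts = n + 1, []
    i, j, distinct = 1, 0, 0
    while True:
        # Extend on the right until the window holds k distinct values.
        while distinct < k and j < n:
            j += 1
            v = cells[j]
            freq[v] += 1
            if freq[v] == 1:
                distinct += 1
        if distinct < k:
            break                          # end of data: no more candidates
        # Drop cells on the left while the leftmost value is repeated.
        while freq[cells[i]] > 1:
            freq[cells[i]] -= 1
            i += 1
        width = j - i + 1
        if width < best:                   # shorter than anything so far
            best, starts = width, []
        if width == best and (width > k or
                (freq[cells[i - 1]] > 0 and freq[cells[j + 1]] > 0)):
            starts.append(i)
        # Delete the first cell, leaving k-1 distinct values, and repeat.
        freq[cells[i]] = 0
        distinct -= 1
        i += 1
    return best, starts


print(shortest_k_subsequences([1, 1, 2, 3, 3, 1], 3))   # (3, [2])
```

Calling the function once for each k from 3 to t mirrors the outer loop of the description. A result
with an empty starts list corresponds to a k for which nothing is printed.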

Here is the PL/I source code:

%process mar(2,100) offset;
  subsets: proc options(main) reorder;
   dcl
    (n,t,i,j,k,v,d,w,l,s) bin fixed(16) unsigned,
    (x(0:n+1),a(n),f(0:t)) bin fixed(16) unsigned ctl,
    (ti,to,tc init(0)) bin float(53),
    vfmt entry(bin float(53),bin fixed(15),bin fixed(15)) returns(char(50) var),
    uttime0 entry, uttime entry returns(bin float(53)),
    sysin file input,
    output file output stream;
   call uttime0;
   get file(sysin) list(n,t);
   allocate x,a,f;
   get file(sysin) list((x(i) do i=1 to n)); x(0),x(n+1)=0;
   ti=uttime;
   open file(output) title('/stdout:,type(crlf),recsize(32000)');
   put file(output) edit('n: ',vfmt(n,10,0),', t: ',vfmt(t,10,0))(col(1),4 a);
   put file(output) edit('Input: ',(vfmt(x(i),10,0) do i=1 to n))(col(1),a,(n)(a,x(1)));
   to=uttime;
   do k=3 to t; l=n+1; f(0)=1; do i=1 to t; f(i)=0; end; d=0; i=1; j=0; s=0;
    scan: do forever;
     do j=j+1 to n until(d=k); v=x(j); f(v)+=1; if f(v)=1 then d+=1; end; if d<k then leave scan;
     v=x(i); do while(f(v)>1); f(v)-=1; i+=1; v=x(i); end; w=j-i+1;
     if w>l then;
     else if w<l then do; l=w; s=0;
      if w>k then do; s=1; a(1)=i; end;
      else if f(x(i-1))>0 then if f(x(j+1))>0 then do; s=1; a(1)=i; end;
      end;
     else do;
      if w>k then do; s+=1; a(s)=i; end;
      else if f(x(i-1))>0 then if f(x(j+1))>0 then do; s+=1; a(s)=i; end;
      end;
     f(x(i))=0; d-=1; i+=1;
     end;
    tc+=uttime;
    if s>0 then do;
     put file(output) edit('Distinct: ',vfmt(k,10,0),', length: ',vfmt(l,10,0))(col(1),4 a);
     do j=1 to s; i=a(j);
      put file(output) edit(j,': (',vfmt(i,10,0),') ',(vfmt(x(i+v),10,0) do v=0 to l-1))
       (col(1),f(6),3 a,(l)(a,x(1)));
      end;
     end;
    to+=uttime;
    end;
   put file(output) skip(2) edit('Elapsed time: input ',vfmt(ti,15,8),', processing ',
    vfmt(tc,15,8),', output ',vfmt(to,15,8),'.')(col(1),7 a);
end subsets;

Here is the output from one small example of randomly generated data:

n: 64, t: 12
Input: 11 12 10 9 3 7 10 8 6 7 6 3 5 12 10 10 11 2 12 9 6 4 5 4 5 6 7 3 12 5 4 1 1 8 12 6 7 11 3 8 3
11 11 7 11 6 9 8 9 1 3 11 1 6 9 10 2 7 10 8 4 9 7 5
Distinct: 3, length: 3
      1: (40) 8 3 11
Distinct: 4, length: 4
      1: (7) 10 8 6 7
Distinct: 5, length: 5
      1: (44) 7 11 6 9 8
      2: (48) 8 9 1 3 11
Distinct: 6, length: 6
      1: (1) 11 12 10 9 3 7
      2: (4) 9 3 7 10 8 6
      3: (10) 7 6 3 5 12 10
      4: (24) 4 5 6 7 3 12
      5: (35) 12 6 7 11 3 8
      6: (57) 2 7 10 8 4 9
      7: (59) 10 8 4 9 7 5
Distinct: 7, length: 7
      1: (26) 6 7 3 12 5 4 1
      2: (33) 1 8 12 6 7 11 3
Distinct: 8, length: 8
      1: (16) 10 11 2 12 9 6 4 5
      2: (51) 3 11 1 6 9 10 2 7
Distinct: 9, length: 10
      1: (30) 5 4 1 1 8 12 6 7 11 3
      2: (51) 3 11 1 6 9 10 2 7 10 8
      3: (52) 11 1 6 9 10 2 7 10 8 4
Distinct: 10, length: 11
      1: (51) 3 11 1 6 9 10 2 7 10 8 4
Distinct: 11, length: 14
      1: (51) 3 11 1 6 9 10 2 7 10 8 4 9 7 5
Distinct: 12, length: 19
      1: (16) 10 11 2 12 9 6 4 5 4 5 6 7 3 12 5 4 1 1 8

Elapsed time: input 0, processing 0, output 0.

The table below summarizes some features of the results of some larger examples, a couple of which
go way beyond the limits indicated by the OP.

    n     t  input  processing  output  max shortest possible  shortest minimal t-subsequence
   64    12      0           0       0                   8(2)                           19(1)
 4096   256    .01         .03     .19                  54(1)                         1181(1)
 8192   250    .02         .06     .19                  54(1)                          994(1)
16384   256    .03         .13      .2                  64(2)                          981(1)
32000   250    .06         .21     .31                  64(1)                          931(1)
60000   800    .13        1.18    1.85                 117(1)                         4043(1)
65534  4096    .15        7.42   42.68                 220(1)                        29213(1)

All times (input, processing, and output) are in seconds. The numbers in parentheses are the number
of instances; the first example has 2 shortest possible minimal 8-subsequences and the fourth
example has 2 shortest possible minimal 64-subsequences.

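The program is very fast. The complexity is O(n*t): for each of the t-2 values of k, the subscripts
i and j each advance at most n times, and clearing the frequency table costs a further O(t), which
is no more than O(n) whenever t<=n. For n=4096 and t=256, which is the maximum size contemplated by
the OP, the computational portion of the program requires only 30ms.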


Reply James 5/27/2010 12:48:13 AM

This message was originally posted on 2010/05/21 at 23:09 PDT.
It has not yet appeared on any of the newsgroups; numerous other messages in this thread have
appeared since then.

When I tried to repost it just now, a message box appeared saying "Sending of message failed. You
may only send to one newsgroup at a time."

Accordingly, I am now posting it to each group separately.


Dick Wesseling wrote:
 > /*
 > Here is the final version of my solution.
 >
 > Let:    N       number of cells, size of the input
 >         V       number of values, size of symbol set.
 >         M       the number of distinct values in the largest sequence.
 >         T       the size of the largest n-in-n sequence, i.e. the
 >                 largest sequence consisting of distinct values only.
 >
 > As explained in my previous message <4bf43d60$0$22939$c5fe704e@news6.xs4all.nl>
 > the algorithm uses two different search strategies:
 >
 > Plan A
 >
 > Find the largest sequence with size M
 >
 > for n from M downwards search n-in-m sequences, keeping only sequences
 > with minimal value of m.
 > Print sequence if n<m. (If n=m the same sequence will be found again
 > in the next step).
 >
 > Now n=m=T.
 >
 > Plan B
 > Prints all maximal n-in-n sequences.
 >
 > Plan A scans the input 1+(M-T) .. 2+(M-T) times, depending on the
 > input position of the longest sequence.
 > Plan B scans the input once.
 >
 > The running time is therefore bounded by O(N * (3+(M-T)) which
 > is typically better than or equal to O(N*C).
 > For small values of N the cost of erasing the frequency array should
 > also be accounted for.
 >
 > Two arrays are used:
 >     found[N]    Intermediate results of plan A.
 >     freq[C]     Frequency count for all symbols in the sliding window
 >
 >
 > Finding sequences is done using a straightforward sliding window
 > scan of the input. The values in freq[] are updated when the window
 > edges move and the numer of unique symbols is updated when a
 > frequency count changes between zero and non-zero.
 >
 >
 > The output is unsorted.
....
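
Plan B in the quoted description, finding every maximal run of all-distinct values in a single pass,
might look like the following Python sketch. The quoted post does not show its code, so the
last-position dictionary, the min_len parameter, and the name maximal_distinct_runs are assumptions
made purely for illustration.

```python
def maximal_distinct_runs(x, min_len=3):
    """Return (start, length) for every maximal all-distinct run of x.

    Positions are 1-based; runs shorter than min_len are dropped.
    """
    last = {}      # value -> most recent 1-based position
    runs = []
    start = 1      # left edge of the current all-distinct run
    for j, v in enumerate(x, start=1):
        p = last.get(v, 0)
        if p >= start:
            # v already occurs in the run, so start..j-1 cannot be extended.
            if j - start >= min_len:
                runs.append((start, j - start))
            start = p + 1
        last[v] = j
    if x and len(x) - start + 1 >= min_len:
        runs.append((start, len(x) - start + 1))
    return runs


op_example = "aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja"
print(maximal_distinct_runs(op_example))   # [(40, 3), (44, 3), (50, 7)]
```

On the OP's 56-cell example this gives the runs starting at 40, 44, and 50, which agrees with the
3-in-3 and 7-in-7 rows of his expected output.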
When I did my original solution a few days ago, I didn't even consider a moving window approach
because the OP had said that its performance was terrible. He also mentioned complexities such as
having to restart the scan multiple times, etc.

Seeing your sliding window approach inspired me to give that approach a try.

I have come up with a streamlined version of your algorithm that I believe meets the OP's
specifications. All of us who have participated in discussing this problem have been bandying about
terms such as unique, distinct, minimal, shortest, without making the intended interpretation
completely clear. I had to do a bit of reading between the lines to come up with what I believe are
the OP's intended specifications, which is why I only say that I believe my solution meets his
specifications. I also inadvertently added to the confusion: by the time I wrote my first solution,
I did not have the original post in front of me; I remembered that the OP had referred to cells and
values so I used C for the number of cells and V for the number of values. Only later did I discover
that he had used N and T for these quantities. I have renamed some of the variables in my program to
make it consistent with the OP's notation.

Let me suggest the following definitions and nomenclature:

The data consist of a sequence of n cells stored in an array, x, with subscripts running from 1 to n.

Each cell contains one of a set of t values which are integers between 1 and t inclusive.

A subsequence of length l consists of l consecutive cells from the array x.

A k-subsequence is a subsequence of length l>=k which contains exactly k of the t possible values.

A minimal k-subsequence is one for which the removal of the first or last cell would reduce it to a
(k-1)-subsequence.

A shortest k-subsequence is a minimal k-subsequence whose length is less than or equal to that of
any other minimal k-subsequence in the array.

A shortest possible k-subsequence is a k-subsequence of length k.

If a minimal k-subsequence is not shortest possible, it contains at least one repeated value. Since
every minimal 1- or 2-subsequence is shortest possible, shortest k-subsequences are only of interest
for k>=3. The array may, but need not, contain a shortest possible k-subsequence for 3<=k<=t.
If the array contains a shortest possible k-subsequence, it also contains shortest possible
j-subsequences for 3<=j<k.

Assuming that each of the t values appears in the array at least once, the array contains at least
one shortest k-subsequence for 1<=k<=t.

Within this framework, here is my understanding of the OP's requirements: "For 3<=k<=t, list all
shortest k-subsequences except, for k<t, omit any shortest possible k-subsequences which are
contained within a shortest possible (k+1)-subsequence."

A shortest possible k-subsequence, z, is not contained within a shortest possible (k+1)-subsequence
iff each of the immediately preceding and following cells, where it exists, contains a value already
present in z. I shall refer to this as the inclusion test. Note that the inclusion test always
passes for k=t.

In my new program, I have abandoned the use of characters for the external representation and
switched to integers between 1 and t. I have also changed the data representation from bin fixed(15)
to bin fixed(16) unsigned. This allows values of n and t up to 65534. Finally, I added timing of the
input, processing, and output.

To facilitate application of the inclusion test, the data array, x, has sentinel cells x(0) and
x(n+1) both set equal to the nonexistent value 0. The frequency table, f, has an extra entry f(0)
set equal to 1, so that a sentinel always counts as a value already present in the window.

For each k, 3<=k<=t, a window delimited by the subscripts i on the left and j on the right is moved
across the entire data array. The width of the window is w=j-i+1. Initially, the window is at cell 1
(i=1) and is empty (w=0), so the relationship w=j-i+1 gives j=0. Since no subsequence can be longer
than n, the length of the shortest minimal k-subsequence found so far, l, is set to n+1; the number
of shortest minimal k-subsequences found so far, s, is set to 0; the frequency table, f, is
initialized to 0 except for f(0), which is set to 1 as previously explained; finally, the number of
distinct values within the window, d, is set to 0.

The scan of the data consists of repeatedly finding the next minimal k-subsequence and deciding
whether or not to remember it. This is done as follows. First, the window is extended on the right
one cell at a time until the number of distinct values is equal to k or the end of the data is
reached; in the latter case there are no more minimal k-subsequences, so the scan is terminated.
Once k distinct values are in the window, the number of instances of the value at position j is
necessarily 1. Next, cells are removed from the window on the left until the leftmost value has only
1 instance in the window. At this point the window contains the leftmost minimal k-subsequence not
already examined. Next, the width of the window, w, is computed and the disposition of the
subsequence is decided. If w>l, the subsequence is skipped. If w<l, a shorter minimal k-subsequence
has been found, so the previous set of k-subsequences is thrown away and the subsequence becomes the
first element of a new set if it passes the inclusion test; otherwise the set is left empty. If w=l,
the subsequence is added to the set of remembered k-subsequences if it passes the inclusion test. In
both cases the inclusion test is applied only if w=k.

Finally, the first cell is deleted from the window, leaving a (not necessarily minimal)
(k-1)-subsequence, and the loop repeats.

Here is the PL/I source code:

%process mar(2,100) offset;
  subsets: proc options(main) reorder;
   dcl
    (n,t,i,j,k,v,d,w,l,s) bin fixed(16) unsigned,
    (x(0:n+1),a(n),f(0:t)) bin fixed(16) unsigned ctl,
    (ti,to,tc init(0)) bin float(53),
    vfmt entry(bin float(53),bin fixed(15),bin fixed(15)) returns(char(50) var),
    uttime0 entry, uttime entry returns(bin float(53)),
    sysin file input,
    output file output stream;
   call uttime0;
   get file(sysin) list(n,t);
   allocate x,a,f;
   get file(sysin) list((x(i) do i=1 to n)); x(0),x(n+1)=0;
   ti=uttime;
   open file(output) title('/stdout:,type(crlf),recsize(32000)');
   put file(output) edit('n: ',vfmt(n,10,0),', t: ',vfmt(t,10,0))(col(1),4 a);
   put file(output) edit('Input: ',(vfmt(x(i),10,0) do i=1 to n))(col(1),a,(n)(a,x(1)));
   to=uttime;
   do k=3 to t; l=n+1; f(0)=1; do i=1 to t; f(i)=0; end; d=0; i=1; j=0; s=0;
    scan: do forever;
     do j=j+1 to n until(d=k); v=x(j); f(v)+=1; if f(v)=1 then d+=1; end; if d<k then leave scan;
     v=x(i); do while(f(v)>1); f(v)-=1; i+=1; v=x(i); end; w=j-i+1;
     if w>l then;
     else if w<l then do; l=w; s=0;
      if w>k then do; s=1; a(1)=i; end;
      else if f(x(i-1))>0 then if f(x(j+1))>0 then do; s=1; a(1)=i; end;
      end;
     else do;
      if w>k then do; s+=1; a(s)=i; end;
      else if f(x(i-1))>0 then if f(x(j+1))>0 then do; s+=1; a(s)=i; end;
      end;
     f(x(i))=0; d-=1; i+=1;
     end;
    tc+=uttime;
    if s>0 then do;
     put file(output) edit('Distinct: ',vfmt(k,10,0),', length: ',vfmt(l,10,0))(col(1),4 a);
     do j=1 to s; i=a(j);
      put file(output) edit(j,': (',vfmt(i,10,0),') ',(vfmt(x(i+v),10,0) do v=0 to l-1))
       (col(1),f(6),3 a,(l)(a,x(1)));
      end;
     end;
    to+=uttime;
    end;
   put file(output) skip(2) edit('Elapsed time: input ',vfmt(ti,15,8),', processing ',
    vfmt(tc,15,8),', output ',vfmt(to,15,8),'.')(col(1),7 a);
end subsets;

Here is the output from one small example of randomly generated data:

n: 64, t: 12
Input: 11 12 10 9 3 7 10 8 6 7 6 3 5 12 10 10 11 2 12 9 6 4 5 4 5 6 7 3 12 5 4 1 1 8 12 6 7 11 3 8 3
11 11 7 11 6 9 8 9 1 3 11 1 6 9 10 2 7 10 8 4 9 7 5
Distinct: 3, length: 3
      1: (40) 8 3 11
Distinct: 4, length: 4
      1: (7) 10 8 6 7
Distinct: 5, length: 5
      1: (44) 7 11 6 9 8
      2: (48) 8 9 1 3 11
Distinct: 6, length: 6
      1: (1) 11 12 10 9 3 7
      2: (4) 9 3 7 10 8 6
      3: (10) 7 6 3 5 12 10
      4: (24) 4 5 6 7 3 12
      5: (35) 12 6 7 11 3 8
      6: (57) 2 7 10 8 4 9
      7: (59) 10 8 4 9 7 5
Distinct: 7, length: 7
      1: (26) 6 7 3 12 5 4 1
      2: (33) 1 8 12 6 7 11 3
Distinct: 8, length: 8
      1: (16) 10 11 2 12 9 6 4 5
      2: (51) 3 11 1 6 9 10 2 7
Distinct: 9, length: 10
      1: (30) 5 4 1 1 8 12 6 7 11 3
      2: (51) 3 11 1 6 9 10 2 7 10 8
      3: (52) 11 1 6 9 10 2 7 10 8 4
Distinct: 10, length: 11
      1: (51) 3 11 1 6 9 10 2 7 10 8 4
Distinct: 11, length: 14
      1: (51) 3 11 1 6 9 10 2 7 10 8 4 9 7 5
Distinct: 12, length: 19
      1: (16) 10 11 2 12 9 6 4 5 4 5 6 7 3 12 5 4 1 1 8

Elapsed time: input 0, processing 0, output 0.

The table below summarizes some features of the results of some larger examples, a couple of which
go way beyond the limits indicated by the OP.

    n     t  input  processing  output  max shortest possible  shortest minimal t-subsequence
   64    12      0           0       0                   8(2)                           19(1)
 4096   256    .01         .03     .19                  54(1)                         1181(1)
 8192   250    .02         .06     .19                  54(1)                          994(1)
16384   256    .03         .13      .2                  64(2)                          981(1)
32000   250    .06         .21     .31                  64(1)                          931(1)
60000   800    .13        1.18    1.85                 117(1)                         4043(1)
65534  4096    .15        7.42   42.68                 220(1)                        29213(1)

All times (input, processing, and output) are in seconds. The numbers in parentheses are the number
of instances; the first example has 2 shortest possible minimal 8-subsequences and the fourth
example has 2 shortest possible minimal 64-subsequences.

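The program is very fast. The complexity is O(n*t): for each of the t-2 values of k, the subscripts
i and j each advance at most n times, and clearing the frequency table costs a further O(t), which
is no more than O(n) whenever t<=n. For n=4096 and t=256, which is the maximum size contemplated by
the OP, the computational portion of the program requires only 30ms.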



Reply jjw (604) 5/27/2010 12:49:38 AM

On 2010-05-26 06:45:31 -0400, Peter Flass said:

> James J. Weinkam wrote:
>> Peter Flass wrote:
>> ...
>>> 
>>> OTOH, although I haven't looked at this in detail, CONTROLLED storage 
>>> has interactions with Tasking (maybe not with the Enterprise compilers) 
>>> that adds quite another level of overhead.  It makes things much less 
>>> simple than just push and pop.
>>> 
>> With the F and Optimizing compilers, each task had its own allocation 
>> stack, emanating from the pseudo register vector, for each controlled 
>> variable known within that task. It is not possible for one task to 
>> refer directly to a generation of a controled variable allocated in 
>> another task. Any sharing of controlled data had to be done by 
>> parameter passing or pointers, and any needed synchronization or mutual 
>> exclusion was the responsibility of the programmer. If no sharing is 
>> taking place there is no tasking-related overhead.
> 
> That's interesting.  The manual describes this behavior, but gives no 
> idea that it's just an unfortunate artifact of the implementation.  I 
> was thinking it was intended, for some reason or other.  Thanks.

As far as I can recall, and as far as the available information on 
BitSavers indicates, it isn't an implementation artifact. It is simply 
the way the old PL/I tasking features were designed, and was in place 
earlier than July, 1966. Compared to the contortions that both the F 
and Optimizing runtimes had to go through to make EVENT variables work 
as defined, it's simplicity itself.

The new compiler simply discarded old PL/I tasking outright and 
replaced it with something completely new.

-- 
John W Kennedy
"But now is a new thing which is very old--
that the rich make themselves richer and not poorer,
which is the true Gospel, for the poor's sake."
  -- Charles Williams.  "Judgement at Chelmsford"

Reply jwkenne (1358) 5/27/2010 1:13:23 AM

On 2010-05-26 19:12:30 -0400, Peter Flass said:

> glen herrmannsfeldt wrote:
>> David W Noon <dwnoon@spamtrap.ntlworld.com> wrote:
>> (snip, I wrote)
>> 
>>>> It seems to me that the definition of threads is such that they
>>>> run in the same address space,
>> 
>>> This is true.
>> 
>>>> which tasks (not counting OS/360)
>>>> are allowed to run in different address spaces.
>> 
>>> This is not.
>> 
>>> Unprivileged code that uses the ATTACH supervisor call creates a new
>>> sub-task in the current address space.
>> 
>> I was thinking of it in the general computer sense, as, for example,
>> in the wikipedia entry under Computer_multitasking.
>> 
>> Of course IBM and OS/360 were and are (with successors) important
>> in the development of multitasking and subtasks, and one can't
>> completely separate from them.
>> 
> 
> Wiki is probably screwed up.  Those would be "processes".  AFAIK, same 
> address space=thread, different address space=process.  Of course, as 
> you point out, OS/360++ calls threads tasks, and processes "address 
> spaces."

OS/360 most certainly did not call anything "address spaces". Prior to 
TSO, the term was "job step". However, a job step is not a unit of 
tasking; from that viewpoint, the word "task" was used. (Consider that 
the very terms MFT and MVT use "task" in this way.)

You can't really establish any decent isomorphisms here. The OS/360 
concept set is wholly different.

And old PL/I confused the issue further. PL/I defined a tree of mother 
and daughter tasks, but in the underlying implementation all the 
mothers and daughters were represented by first-generation daughters of 
a supernumerary Control Task.

-- 
John W Kennedy
"There are those who argue that everything breaks even in this old dump 
of a world of ours. I suppose these ginks who argue that way hold that 
because the rich man gets ice in the summer and the poor man gets it in 
the winter things are breaking even for both. Maybe so, but I'll swear 
I can't see it that way."
  -- The last words of Bat Masterson

Reply jwkenne (1358) 5/27/2010 1:28:54 AM

John W Kennedy <jwkenne@attglobal.net> wrote:
(snip)
 
> As far as I can recall, and as far as the available information on 
> BitSavers indicates, it isn't an implementation artifact. It is simply 
> the way the old PL/I tasking features were designed, and was in place 
> earlier than July, 1966. Compared to the contortions that both the F 
> and Optimizing runtimes had to go through to make EVENT variables work 
> as defined, it's simplicity itself.

I always thought that there was some connection between EVENT
variables and the OS/360 ECB (event control block) such as
used by the WAIT macro.

-- glen
Reply gah (12259) 5/27/2010 1:34:30 AM

James J. Weinkam wrote:
> This message was originally posted on 2010/05/21 at 23:09 PDT.
> It has not yet appeared on any of the newsgroups; numerous other 
> messages in this thread have
> appeared since then.
> 
> When I tried to repost it just now, a message box appeared saying 
> "Sending of message failed. You
> may only send to one newsgroup at a time."

Possibly a problem at xs4all(???). I can guarantee that neither this 
message, nor the message you reported earlier, comes from our 
moderation process.

You can see "pending" articles in the clax submission inbox:

http://clax.inspiretomorrow.net/msgs.php

If your posting doesn't appear there, we didn't get it, and there isn't 
much we can do about that. It shouldn't stay there long. It is not our 
usual policy to discuss, in the group, who's on our "whitelist" and 
who's not, but your post "should" be auto-approved within 15 minutes, IF 
all goes well. It's an imperfect world.

> Accordingly, I am now posting it to each group separately.

Unfortunate you had to do that. Some people have a "colorful" opinion of 
the practice. I'm glad you did it, though... better than not posting at 
all!!!

Best,
Frank

Reply Frank 5/27/2010 1:45:31 AM

James J. Weinkam wrote:
> Here is the PL/I source code:
>
> %process mar(2,100) offset;
> subsets: proc options(main) reorder;


This source code really needs some indenting and white space!

As it is, it is almost unreadable. :-(

> The program is vary fast. The complexity is O(n*t). For n=4096 and t=256

Since my approach terminates each scan on the first collision, it is 
O(n*sqrt(t)), while keeping the working set very low.

> which is the maximum size
> contemplated by the OP, the computational portion of the program
> requires only 30ms.

You need to run your program on all of the same data sets given by the 
OP, verifying both the results and the runtime!

As I wrote yesterday, my current program has done the 2300 char set in 
less than a ms total, both scanning and sorting/filtering.

OTOH, all of the competing solutions we've come up with are _far_ faster 
than what could be needed. :-)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/27/2010 5:25:05 AM

Terje Mathisen wrote:
> As I wrote yesterday, my current program has done the 2300 char set in
> less than a ms total, both scanning and sorting/filtering.

I have to reply to my own post here:

I did a few more micro-optimizations, streamlining the filtering process:

  64   1879   364 2242 
pblcdgaftknuhorijvwxyzABCDeEFGHIJKLMNOPQRSTUmVWXYZ0123456789+-/*
  64   1879   354 2232 
sinkhpblcdgaftuorjvwxyzABCDeEFGHIJKLMNOPQRSTUmVWXYZ0123456789+-/
  65   1889   354 2242 
sinkhpblcdgaftuorjvwxyzABCDeEFGHIJKLMNOPQRSTUmVWXYZ0123456789+-/*
  66   1950   293 2242 
qlnhrbkapsicdgftuojvwxyzABCDeEFGHIJKLMNOPQRSTUmVWXYZ0123456789+-/*
Time to scan:       768218 (1200 potential solutions)
Time to filter:     707784 (253 filtered solutions)

I.e. 1.47M cycles total, or about 0.67ms.
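
The arithmetic behind those totals can be checked with a few lines of Python. The clock rate is not
stated in the post; the 2.2 GHz used below is only the value implied by the two quoted figures and
should be read as an assumption.

```python
scan_cycles = 768218       # "Time to scan" from the output above
filter_cycles = 707784     # "Time to filter" from the output above
total_cycles = scan_cycles + filter_cycles

assumed_clock_hz = 2.2e9   # assumption: not given in the post
elapsed_ms = total_cycles / assumed_clock_hz * 1e3

print(total_cycles)            # 1476002
print(round(elapsed_ms, 2))    # 0.67
```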

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/27/2010 6:05:11 AM

John W Kennedy wrote:
> On 2010-05-26 19:12:30 -0400, Peter Flass said:
> 
>> glen herrmannsfeldt wrote:
>>> David W Noon <dwnoon@spamtrap.ntlworld.com> wrote:
>>> (snip, I wrote)
>>>
>>>>> It seems to me that the definition of threads is such that they
>>>>> run in the same address space,
>>>
>>>> This is true.
>>>
>>>>> which tasks (not counting OS/360)
>>>>> are allowed to run in different address spaces.
>>>
>>>> This is not.
>>>
>>>> Unprivileged code that uses the ATTACH supervisor call creates a new
>>>> sub-task in the current address space.
>>>
>>> I was thinking of it in the general computer sense, as, for example,
>>> in the wikipedia entry under Computer_multitasking.
>>>
>>> Of course IBM and OS/360 were and are (with successors) important
>>> in the development of multitasking and subtasks, and one can't
>>> completely separate from them.
>>>
>>
>> Wiki is probably screwed up.  Those would be "processes".  AFAIK, same 
>> address space=thread, different address space=process.  Of course, as 
>> you point out, OS/360++ calls threads tasks, and processes "address 
>> spaces."
> 
> OS/360 most certainly did not call anything "address spaces". Prior to 
> TSO, the term was "job step". 

Right, or "region" or "partition" depending on which flavor of OS you 
were running.

> However, a job step is not a unit of
> tasking; from that viewpoint, the word "task" was used. (Consider that 
> the very terms MFT and MVT use "task" in this way.)
> 
> You can't really establish any decent isomorphisms here. The OS/360 
> concept set is wholly different.
> 
> And old PL/I confused the issue further. PL/I defined a tree of mother 
> and daughter tasks, but in the underlying implementation all the mothers 
> and daughters were represented by first-generation daughters of a 
> supernumerary Control Task.
> 
Reply Peter_Flass (934) 5/27/2010 10:45:50 AM

glen herrmannsfeldt wrote:
> John W Kennedy <jwkenne@attglobal.net> wrote:
> (snip)
>  
>> As far as I can recall, and as far as the available information on 
>> BitSavers indicates, it isn't an implementation artifact. It is simply 
>> the way the old PL/I tasking features were designed, and was in place 
>> earlier than July, 1966. Compared to the contortions that both the F 
>> and Optimizing runtimes had to go through to make EVENT variables work 
>> as defined, it's simplicity itself.
> 
> I always thought that there was some connection between EVENT
> variables and the OS/360 ECB (event contol block) such as
> used by the WAIT macro.
> 

Yes, I did stuff in assembler that posted ECBs that were PL/I EVENT 
variables; no additional processing necessary.
Reply Peter_Flass (934) 5/27/2010 10:47:17 AM

In <htk9su$1ji$1@news.eternal-september.org>, on 05/26/2010
   at 07:12 PM, Peter Flass <Peter_Flass@Yahoo.com> said:

>Wiki is probably screwed up.

Most Wiki editors don't know history. What we now call threads used to
be called lightweight processes (LWP).

>AFAIK, same 
>address space=thread, different address space=process.

MVS can start a new Unix process in the same address space.

>Of course, as you point out, OS/360++ calls threads tasks, 
>and processes "address spaces."

 1. See above.

 2. Nobody had coined the term thread at the time OS/360
    came along.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org

Reply spamtrap16 (3672) 5/27/2010 10:55:54 AM

In <4bfdcac8$0$31283$607ed4bc@cv.net>, on 05/26/2010
   at 09:28 PM, John W Kennedy <jwkenne@attglobal.net> said:

>OS/360 most certainly did not call anything "address spaces". Prior
>to  TSO, the term was "job step".

No, "partition" or "region", depending on which option you were
running. An address space does not terminate when a job step ends, or
even when a batch job ends.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org

Reply spamtrap16 (3672) 5/27/2010 10:59:42 AM

On 2010-05-16 18:28, Robert AH Prins wrote:
> Hi all,
>
> Can anyone give me some hints as to solve the following problem,
> preferably in a way that is faster than the way I used to do it, and
> without the bug in the current version;

Follow-up to the very first post, with compilable source extracted from 
the current Pascal program:

{
This was Paul Green's literal comment to the code:

The method I used was as follows. As I read the data file, I keep a
record of the last occurrence of each string, in a list sorted by the
line numbers of the occurrences. For each line, I read the string and
find the last occurrence. Then I check each possible series ending on
the penultimate line. Then I update the list and continue. Now let's see 
if I can explain this.

The reason this works is because the shortest series, as you describe
it, would both start and end with a string that occurs only once in
the series. If you have a sequence like C A B C D E for example, C is
in it twice; you can shorten the sequence by dropping the C. So that
means you only have to check the length of series that begin with the
last previous occurrence of some string. And we have a list of those
lines. In order to check that the string previous to that is in the
series, we check the next last occurrence to make sure it is at least
two lines before the first string in the series. In order to make sure
the current line is in the series, you skip series that start after
the last occurrence of the current line, because that string must be
included in the series. Finally, I put in one extra test. You don't
have to check a series that extends past the penultimate occurrence of 
the penultimate line, since that means the last element of the series is 
in the series twice. Well that's clear as mud. Does it sound like the 
right method though?

My remarks:

Paul's code was fitted into the existing program, which uses a linked
list because the number of records being read in is not known in
advance. To simulate the reading in, an array is allocated, and the
scanner later works on that array. Going backwards through the array
turned out to be necessary to get the output in the correct order. His
original routine expected four-character strings; for speed these are
cast to longints. The real program uses longints throughout, and only
when the data is printed does it use them as indexes into the array of
4-character strings.

Below are the two sets of data that fail to return the 3-in-3 strings.
I've been tearing my hair out trying to find out why, without any
result. It's probably a case of Edgar Allan Poe's "The Purloined
Letter", with a solution so obvious that one misses it by staring at
the problem too long. (Wishful thinking?)

The two input files that fail are:

A   61  1
A   61  2
B   61  3
C   61  4
D   61  5
E   61  6
F   61  7
E   61  8
G   61  9
G   61 10
G   61 11
G   61 12
G   61 13
G   61 14
G   61 15
G   61 16
G   61 17
G   61 18
H   61 19
I   61 20
C   61 21
D   61 22
A   61 23
A   61 24
A   61 25
A   61 26
A   61 27

Result:

  |    6 |    6 |    2 | A   B   C   D   E   F                   |
  |      |      |   18 | G   H   I   C   D   A                   |
  |    7 |    8 |    2 | A   B   C   D   E   F   G               |
  |    8 |   17 |    7 | F   E   G   H   I   C   D   A           |
  |    9 |   19 |    2 | A   B   C   D   E   F   G   H   I       |

Missing:

  |    3 |    3 |    7 | F   E   G                               |

and

A   69  1
A   69  2
A   69  3
A   69  4
A   69  5
B   69  6
B   69  7
C   69  8
C   69  9
D   69 10
E   69 11
F   69 12
G   69 13
H   69 14
H   69 15
H   69 16
H   69 17
I   69 18
J   69 19
J   69 20
J   69 21

Result:

  |    6 |    6 |    9 | C   D   E   F   G   H                   |
  |    7 |    8 |    7 | B   C   D   E   F   G   H               |
  |    8 |   10 |    5 | A   B   C   D   E   F   G   H           |
  |    9 |   13 |    7 | B   C   D   E   F   G   H   I   J       |
  |   10 |   15 |    5 | A   B   C   D   E   F   G   H   I   J   |

Missing:

  |    3 |    3 |   17 | H   I   J                               |

}
type
   tycona       = array [1..4] of char;
   scanptr      = ^scan_list;
   runptr       = ^run_list;

   scan_list    = record                                     {s  16}
                    scan_nxt : scanptr;                       {   4}
                    id       : longint;                       {   4}
                    sca      : tycona;                        {   4}
                    fil      : array [1..4] of char;          {   4}
                  end;


   spot_rec     = record                                     {s  16}
                    sca   : tycona;                           {   4}
                    atl   : longint;                          {   4}
                    id    : longint;                          {   4}
                    fil   : array [1..4] of char;             {   4}
                  end;

   run_list     = record                                     {s1044}
                    run_nxt: runptr;                          {   4}
                    run_prv: runptr;                          {   4}
                    rlen   : longint;                         {   4}
                    scas   : longint;                         {   4}
                    id     : longint;                         {   4}
                    run    : array [1..256] of tycona;        {1024}
                  end;

   run_data     = record                                     {s  16}
                    run_top: runptr;                          {   4}
                    run_end: runptr;                          {   4}
                    run_cnt: longint;                         {   4}
                    fil    : array [1..4] of char;            {   4}
                  end;

   _sc_ptr      = ^_sc_tab;
   _sc_tab      = array [1..65536 div sizeof(scanptr)] of scanptr;

   _spot_ptr    = ^_spot_arr;
   _spot_arr    = array [0..65536 div sizeof(spot_rec) - 1] of spot_rec;

   _run_ptr     = ^_run_arr;
   _run_arr     = array [0..65536 div sizeof(run_data) - 1] of run_data;

const sca_skel  : string[24]  = ' |      |      |      | ';
const run_ptr  : runptr    = nil;
const sc_tab   : _sc_ptr   = nil;
const spot_arr : _spot_ptr = nil;
const run_arr  : _run_ptr  = nil;
const _atline  : longint = 0;
const _lnls    : longint = 0;
const _lnls0   : longint = 0;
const _lsat    : longint = 0;
const _i       : longint = 0;
const _nct     : longint = 0;
const _nreq    : longint = 0;
const _scount  : longint = 0;
const
   scan_ptr: scanptr = nil;
   scan_top: scanptr = nil;
   scan_end: scanptr = nil;
const
   _line  : string[255]         = '';
   _linex : array[1..4] of char = #0#0#0#0;

var w_tycona   : tycona;
var print      : string;
var scanin     : text;

procedure add_integer_2_line4(_l: longint);
begin
   str(_l:4, print);
   _line:= _line + print + ' | ';
end; {add_integer_2_line4}

procedure dispose_run(_i: longint);
var nxt_run: runptr;

begin
   run_ptr:= run_arr^[_i].run_top;

   while run_ptr <> nil do
     begin
       nxt_run:= run_ptr^.run_nxt;
       dispose(run_ptr);
       run_ptr:= nxt_run;
     end;

   run_arr^[_i].run_top:= nil;
   run_arr^[_i].run_cnt:= 0;
end; {dispose_run}

procedure setup_run(_n: longint);
var _in: longint;

begin
   new(run_ptr);
   fillchar(run_ptr^.run_nxt, sizeof(run_ptr^), #0);
   fillchar(run_ptr^.run    , sizeof(run_ptr^.run), ' ');

   if run_arr^[_n].run_top = nil then
     run_arr^[_n].run_top:= run_ptr
   else
     begin
       run_arr^[_n].run_end^.run_nxt:= run_ptr;
       run_ptr^.run_prv             := run_arr^[_n].run_end;
     end;

   run_arr^[_n].run_end:= run_ptr;

   inc(run_arr^[_n].run_cnt);

   run_ptr^.rlen := _atline - spot_arr^[_n].atl;
   run_ptr^.scas := _n;
   run_ptr^.id   := spot_arr^[1].id;

   for _in:= 1 to _n do
     run_ptr^.run[_in]:= spot_arr^[_in].sca;
end; {setup_run}

procedure check_runs;
var _j: longint;
var _k: longint;
var _l: longint;

begin
   _k:= 3;
   if _k < _lsat then
     _k:= _lsat;

   for _j:= _k to _nct do
     begin
       _l:= spot_arr^[_j].atl;

       if _lnls0 <= _l then
         if (_j = _nct) or
            (_l > spot_arr^[_j + 1].atl + 1) then
           begin
             if (run_arr^[_j].run_top = nil) or
                (run_arr^[_j].run_top^.rlen > _atline - _l) then
               begin
                 dispose_run(_j);
                 setup_run(_j);
               end
             else
               if (run_arr^[_j].run_top <> nil) and
                  (run_arr^[_j].run_top^.rlen = _atline - _l) then
                 setup_run(_j);
           end;
     end;
end; {check_runs}

{
* Scanner:
*
* Part of the code in this procedure was originally written by Paul
* Green as a result of a posting in comp.lang.pascal.borland, asking
* for a faster method of performing scan.
}
procedure scanner;
var _i     : longint;
var _k     : longint;
var _id    : longint;

var scn_in : tycona;

begin;
   _i:= (_scount + 2) * sizeof(run_data);
   getmem(run_arr, _i);
   fillchar(run_arr^, _i, #0);

   _i:= (_scount + 2) * sizeof(spot_rec);
   getmem(spot_arr, _i);
   fillchar(spot_arr^, _i, #0);

   writeln('Scanning');

   for _i:= _scount downto 1 do
     begin
       move(sc_tab^[_i]^.sca[1], w_tycona, sizeof(w_tycona));
       _id:= sc_tab^[_i]^.id;

       inc(_atline);

       _lsat:= 1;
       while (_lsat             <= _nct) and
             (longint(w_tycona) <> longint(spot_arr^[_lsat].sca)) do
         inc(_lsat);

       if _lsat > _nct then
         _lsat:= 0;

       _lnls:= 0;
       if _lsat <> 0 then
         begin
           _lnls:= spot_arr^[_lsat].atl;

           if _nct >= 3 then
             check_runs;
         end;

       _lnls0:= _lnls;

       if _lsat = 0 then
         begin
           inc(_nct);
           _lsat:= _nct;
         end;

       if _lsat > 1 then
         move(spot_arr^[1], spot_arr^[2], sizeof(spot_rec) * pred(_lsat));

       spot_arr^[1].sca := w_tycona;
       spot_arr^[1].id  := _id;
       spot_arr^[1].atl := _atline;
     end;

   inc(_atline);
   _lsat:= 1;
   _lnls:= _atline + 1;
   check_runs;

   for _lnls:= 3 to _nct do
     begin
       run_ptr:= run_arr^[_lnls].run_end;

       if (run_ptr <> nil) and
         ((_lnls = _nct)   or
          (_lnls < _nct)   and
                          (run_arr^[succ(_lnls)].run_end <> nil) and
          (run_ptr^.rlen < run_arr^[succ(_lnls)].run_end^.rlen)) then
         begin
           _line:= sca_skel;
           _line[0]:= #3;

           add_integer_2_line4(_lnls);
           add_integer_2_line4(run_ptr^.rlen);
           add_integer_2_line4(run_ptr^.id);
           _line[0]:= #24;

           while run_ptr <> nil do
             begin
               _k:= 1;

               repeat
                 move(run_ptr^.run[_k], _line[25], 40);
                 inc(_line[0], 39);
                 _line:= _line + ' | ';
                 writeln(_line);

                 _line:= sca_skel;
                 inc(_k, 10);
               until run_ptr^.run[_k][1] = ' ';

               run_ptr:= run_ptr^.run_prv;

               _line[0]:= #17;
               if run_ptr <> nil then
                 add_integer_2_line4(run_ptr^.id);
               _line[0]:= #24;
             end;
         end;

       dispose_run(_lnls);
     end;
end; {scanner}

begin
   assign(scanin, 'scanin.txt');
   reset (scanin);

   _scount:= 0;
   repeat
     _line:= '    ';
     readln(scanin, _line);
     new(scan_ptr);

     if scan_top = nil then
       scan_top:= scan_ptr
     else
       scan_end^.scan_nxt:= scan_ptr;

     scan_end:= scan_ptr;

     inc(_scount);
     scan_ptr^.scan_nxt:= nil;
     scan_ptr^.id:= _scount;
     move(_line[1], scan_ptr^.sca, 4);
   until eof(scanin);

   getmem(sc_tab, _scount * sizeof(scan_ptr));

   scan_ptr:= scan_top;
   _i      := 0;

   while (scan_ptr <> nil) do
     begin
       inc(_i);
       sc_tab^[_i]:= scan_ptr;
       scan_ptr   := scan_ptr^.scan_nxt;
     end;

   scanner;
end.
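
The last-occurrence list that Paul Green describes can be tried out in isolation. What follows is a minimal C sketch of that idea only; it is not a translation of the Pascal above and not a fix for its bug. It keeps the values in most-recently-seen order, so that at every position the k-th entry of the list marks the start of the shortest sub-array ending there with exactly k distinct values. It does none of the final filtering (runs that sit inside a longer all-distinct run are not removed), so it only answers "how short can k distinct values get". The names shortest_by_mru and MAX_VALUES are made up for this example.

```c
#include <string.h>

#define MAX_VALUES 256   /* assumption: cell values fit in one byte */

/* best[k] receives the length of the shortest sub-array that holds
   exactly k distinct values, or 0 if no such sub-array exists. */
static void shortest_by_mru(const unsigned char *data, int n,
                            int best[MAX_VALUES + 1])
{
    unsigned char order[MAX_VALUES]; /* seen values, most recent first */
    int last[MAX_VALUES];            /* last[v] = index of latest v    */
    int known = 0;                   /* number of distinct values seen */

    memset(best, 0, (MAX_VALUES + 1) * sizeof best[0]);

    for (int i = 0; i < n; i++) {
        unsigned char v = data[i];
        int at = 0;

        /* Find v in the list; a new value goes at the end first. */
        while (at < known && order[at] != v)
            at++;
        if (at == known)
            known++;

        /* Move v to the front, shifting the more recent entries down. */
        memmove(order + 1, order, (size_t)at);
        order[0] = v;
        last[v] = i;

        /* The window from last[order[k-1]] to i holds exactly k distinct
           values and is the shortest such window ending at i. */
        for (int k = 1; k <= known; k++) {
            int len = i - last[order[k - 1]] + 1;
            if (best[k] == 0 || len < best[k])
                best[k] = len;
        }
    }
}
```

Fed the first failing data set as the string AABCDEFEGGGGGGGGGGHICDAAAAA, this gives best[3] = 3, best[6] = 6, best[7] = 8, best[8] = 17 and best[9] = 19, which agrees with the first two columns of the result table above and with the missing 3-in-3 line.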

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/27/2010 3:09:59 PM

Robert AH Prins wrote:
> On 2010-05-16 18:28, Robert AH Prins wrote:
>> Hi all,
>>
>> Can anyone give me some hints as to solve the following problem,
>> preferably in a way that is faster than the way I used to do it, and
>> without the bug in the current version;
>
> Follow-up to the very first post, with compilable source extracted from
> the current Pascal program:
>
> {
> This was Paul Green's literal comment to the code:
>
> The method I used was as follows. As I read the data file, I keep a
> record of the last occurance of each string, in a list sorted by the
> line numbers of the occurances. For each line, I read the string and
> find the last occurance. Then I check each possible series ending on
> the penultament line. Then I update the list and continue. Now let's see
> if I can explain this.

Robert, there are two critical issues here:

a) Do you have to run this Pascal version of the program? I.e. is there 
anything that stops you from simply switching to one of the two fast & 
working versions we have written as a result of your OP?

b) How fast is this Pascal algorithm?

If it is significantly faster, then it might be worthwhile to figure out 
where it fails and see if a fix is possible that doesn't make it too slow!

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/27/2010 4:48:21 PM

In article <kac1d7-22h.ln1@ntp.tmsw.no>,
	Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> writes:
> James J. Weinkam wrote:
>> Here is the PL/I source code:
>>
>> %process mar(2,100) offset;
>> subsets: proc options(main) reorder;
> 
> 
> This source code really needs some indenting and white space!
> 
> As it is it is almost unreadable. :-(
> 

So is your reply, unless you put comp.lang.pl1 back in the newsgroups
list.

>> The program is very fast. The complexity is O(n*t). For n=4096 and t=256
>
> Since my approach terminates each scan on the first collision, it is 
> O(n*sqrt(t)), while keeping the working set very low.
> 
>> which is the maximum size
>> contemplated by the OP, the computational portion of the program
>> requires only 30ms.
> 
> You need to run your program on all of the same data sets given by the 
> OP, verifying both the results and the runtime!
> 
> As I wrote yesterday, my current program has done the 2300 char set in 
> less than a ms total, both scanning and sorting/filtering.
> 
> OTOH, all of the competing solutions we've come up with are _far_ faster 
> than what could be needed. :-)
> 
> Terje
> 
Reply free 5/27/2010 7:32:51 PM

On 2010-05-27 16:48, Terje Mathisen wrote:
> Robert AH Prins wrote:
>> On 2010-05-16 18:28, Robert AH Prins wrote:
>>> Hi all,
>>>
>>> Can anyone give me some hints as to solve the following problem,
>>> preferably in a way that is faster than the way I used to do it, and
>>> without the bug in the current version;
>>
>> Follow-up to the very first post, with compilable source extracted from
>> the current Pascal program:
>>
>> {
>> This was Paul Green's literal comment to the code:
>>
>> The method I used was as follows. As I read the data file, I keep a
>> record of the last occurance of each string, in a list sorted by the
>> line numbers of the occurances. For each line, I read the string and
>> find the last occurance. Then I check each possible series ending on
>> the penultament line. Then I update the list and continue. Now let's see
>> if I can explain this.
>
> Robert, there are two critical issues here:
>
> a) Do you have to run this Pascal version of the program? I.e. is there
> anything that stops you from simply switching to one of the two fast &
> working versions we have written as a result of your OP?

No, I can (and quite likely will) change the code any way I like, but 
the code will remain Pascal (and PL/I), as this routine is part of a 
much larger program. It produces just one of the many tables, and the 
post-processing logic (a few more Pascal programs on the PC and a 
bunch of ISPF edit macros on z/OS) relies on the current order of the 
tables.

The "problem" is that I have to translate the two fast solutions to 
Pascal and as far as yours is concerned... I've got some fairly basic 
understanding of C, but even your source is already using language 
constructions above my rather modest knowledge of the language.

BTW, I see (in your 2010-05-24 06:09 posting) that you use a qsort. The 
PG solution does not need any sorts... Or maybe your as yet unpublished 
fastest version has also dispensed with it?

> b) How fast is this Pascal algorithm?
>
> If it is significantly faster, then it might be worthwhile to figure out
> where it fails and see if a fix is possible that doesn't make it too slow!

Just timed it via RDTSC. Actual scan time is around 2.7M cycles, and the 
somewhat fancy printout adds around another 4M cycles. The source is 
compiled with Virtual Pascal 2.1 build 279, and VP is, to put it 
politely, not a very optimizing compiler, so I guess that's pretty 
good. Using VP's assembler output option it would also be pretty easy 
to convert it into in-line assembler, giving considerable scope for 
optimizations.

Sadly it cannot handle two cases and although I'm currently out of work, 
there is more to life than sitting behind a screen for up to 10 hours a 
day, certainly now that Mrs Prins has also returned from Vilnius...

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/28/2010 12:45:02 AM

On 2010-05-26 21:34:30 -0400, glen herrmannsfeldt said:

> John W Kennedy <jwkenne@attglobal.net> wrote:
> (snip)
> 
>> As far as I can recall, and as far as the available information on
>> BitSavers indicates, it isn't an implementation artifact. It is simply
>> the way the old PL/I tasking features were designed, and was in place
>> earlier than July, 1966. Compared to the contortions that both the F
>> and Optimizing runtimes had to go through to make EVENT variables work
>> as defined, it's simplicity itself.
> 
> I always thought that there was some connection between EVENT
> variables and the OS/360 ECB (event control block) such as
> used by the WAIT macro.

Oh, undoubtedly there is a /connection/. But nothing like a one-to-one 
mapping. You see, there are two major problems:

  Only one OS/360 task can wait on a given ECB, but any number of old 
PL/I TASKs could wait on a given EVENT.

  Furthermore, as soon as an EVENT went complete, /all/ the old PL/I 
TASKs waiting on it would become dispatchable, even if the EVENT went 
incomplete again before they could be dispatched.

The contortions the runtimes, both F and Optimizing, had to go through 
to fix this functional mismatch were something extraordinary, and 
rather expensive. I once tried using PL/I tasking to decouple PLISRTD's 
E15 and E35 to produce an interface more resembling the COBOL SORT, 
RELEASE, and RETURN verbs, but I found that the CPU overhead entailed 
was unacceptable, because each operation involved several shoulder taps 
back and forth among the PL/I TASKs and the supernumerary Control Task.

-- 
John W Kennedy
A proud member of the reality-based community.

Reply jwkenne (1358) 5/28/2010 3:19:25 AM

Robert AH Prins wrote:
> On 2010-05-27 16:48, Terje Mathisen wrote:
>> Robert, there are two critical issues here:
>>
>> a) Do you have to run this Pascal version of the program? I.e. is there
>> anything that stops you from simply switching to one of the two fast &
>> working versions we have written as a result of your OP?
>
> No, I can (and quite likely will) change the code any way I like, but
> the code will remain Pascal (and PL/I), as this routine is part of a
> much larger program - this routine produces just one of the many tables
> and the post-processing logic, a few more Pascal programs on the PC and
> a bunch of ISPF edit macro's on z/OS rely on the current order of the
> tables.

OK. I assume you for some reason cannot link in functions written in C 
either?
>
> The "problem" is that I have to translate the two fast solutions to
> Pascal and as far as yours is concerned... I've got some fairly basic
> understanding of C, but even your source is already using language
> constructions above my rather modest knowledge of the language.

That's easy: There's no "funky" C at all in my code, it is really an asm 
algorithm written in C shorthand.
>
> BTW, I see (in your 2010-05-24 06:09 posting) that you use a qsort. The
> PG solution does not need any sorts... Or maybe your as yet unpublished
> fastest version has also dispensed with it?

The qsort is actually slower than a tiny custom (sort) routine to get 
the potential solutions into the order I need for the final filtering.

I have written many such routines in my Pascal days. :-)
>
>> b) How fast is this Pascal algorithm?
>>
>> If it is significantly faster, then it might be worthwhile to figure out
>> where it fails and see if a fix is possible that doesn't make it too
>> slow!
>
> Just timed it via RDTSC. Actual scan time is around 2.7M cycles, and the
> somewhat fancy printout adds around another 4M cycles. Source is

OK, so it is 4-10 times slower than my best (working) code, and it 
doesn't actually work.



> compiled with Virtual Pascal 2.1 build 279, and VP isn't really, to
> express it politely, a very optimizing compiler, so I guess it's pretty
> good - using the assembler output option of VP it would also be pretty
> easy to convert it into in-line assembler, giving considerable scope for
> optimizations.

So maybe an inline asm of my C code would be the easiest? Then you would 
just have to change the function definition. :-?

int scan(byte *data, int data_len)
{
     /* esi -> data
        ecx = data_len
      */
     __asm {
	mov esi,[data]
	mov ecx,[data_len]
	push ebp
	lea ebp,[solution] // solutions = 0

	add ecx,esi	// Points one beyond the end of data array
	inc esi		// Skipping the first byte

	cmp esi,ecx	// If we had less than 3 chars, then there
	 jae outer_done // cannot be any solutions!

outer_loop:		// Scanning from i=1 to data_len
	mov bl,[esi]	// Current byte
	lea edi,[esi-1]	// Start at prev byte
	inc esi		// Get ready for next iteration

inner_loop:		// Scanning backwards from i-1 to 0
	cmp bl,[edi]	// Same as byte we started with?
	 je found_equal	// Yes! (Not a new unique char, so abort)

// Increment count of unique values for this starting position:
	mov edx,count[edi*4]
	mov eax,esi
	inc edx
	sub eax,edi	// String length
	mov count[edi*4],edx
	cmp edx,3	// At least 3 unique values?
	 jb skip_too_short

	cmp eax,shortest[edx*4]
	 ja not_shortest_string

// Save as potential solution:
	mov shortest[edx*4],eax
	mov [ebp.len],eax
	mov [ebp.uniq],edx
	mov [ebp.start],edi
	add ebp,16	// sizeof(solution[0])

skip_too_short:
not_shortest_string:
	sub edi,1
	 jnc inner_loop

found_equal:
	cmp esi,ecx
	 jb outer_loop
outer_done:
	lea eax,[ebp - offset solution]
	pop ebp
	shr eax,4	// EAX has potential solution count
     }
}
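
For anyone who would rather read C than asm, the scan loop above can be sketched roughly as follows. This is an interpretation, not the original C source: it assumes that count[] starts at 1 for every position and that shortest[] starts at a huge value, it uses array indexes where the asm uses pointers, and the candidate struct, MAX_N, MAX_UNIQ and the cap parameter are invented for the example. The output is only a list of candidates; it still needs the sort/filter pass discussed earlier in the thread.

```c
#include <limits.h>

#define MAX_N    4096   /* assumed upper bound on the array length   */
#define MAX_UNIQ 256    /* assumed upper bound on distinct values    */

struct candidate { int len, uniq, start; };   /* hypothetical layout */

/* Returns the number of candidates found; only the first cap of them
   are stored in out. Returns -1 if n is larger than MAX_N. */
static int scan_candidates(const unsigned char *data, int n,
                           struct candidate *out, int cap)
{
    static int count[MAX_N];     /* distinct values in data[j..i]       */
    int shortest[MAX_UNIQ + 1];  /* shortest length per distinct count  */
    int found = 0;

    if (n > MAX_N)
        return -1;
    for (int j = 0; j < n; j++)
        count[j] = 1;            /* each position holds one value */
    for (int k = 0; k <= MAX_UNIQ; k++)
        shortest[k] = INT_MAX;

    for (int i = 1; i < n; i++) {            /* outer_loop */
        unsigned char cur = data[i];
        for (int j = i - 1; j >= 0; j--) {   /* inner_loop, backwards */
            if (data[j] == cur)
                break;                       /* found_equal */

            /* cur is new to every window that starts after its
               previous occurrence, so bump the count for start j. */
            int uniq = ++count[j];
            int len = i - j + 1;

            if (uniq < 3 || len > shortest[uniq])
                continue;                    /* too short / not shortest */

            shortest[uniq] = len;            /* save potential solution */
            if (found < cap) {
                out[found].len = len;
                out[found].uniq = uniq;
                out[found].start = j;
            }
            found++;
        }
    }
    return found;
}
```

The start field is a zero-based index, so the F E G run that begins at line 7 of the first failing data set would show up as a candidate with len 3, uniq 3 and start 6.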

Better?

Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 5/28/2010 8:16:53 AM

On 2010-05-28, Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> wrote:
> So maybe an inline asm of my C code would be the easiest? Then you would 
> just have to change the function definition. :-?

(and assumptions about calling conventions and other ABI stuff).

But it is more readable than PL/I, and that's what counts :-)

/me ducks for cover.

Reply Marco 5/28/2010 9:03:53 AM

On 2010-05-28, Robert AH Prins <spamtrap@prino.org> wrote:
>> where it fails and see if a fix is possible that doesn't make it too slow!
>
> Just timed it via RDTSC. Actual scan time is around 2.7M cycles, and the 
> somewhat fancy printout adds around another 4M cycles. Source is 
> compiled with Virtual Pascal 2.1 build 279, and VP isn't really, to 
> express it politely, a very optimizing compiler, so I guess it's pretty 
> good - using the assembler output option of VP it would also be pretty 
> easy to convert it into in-line assembler, giving considerable scope for 
> optimizations.

Any chance to repeat the timing with Delphi or FPC? It would make a great
table. 

Reply Marco 5/28/2010 9:05:23 AM

On 2010-05-28 08:16, Terje Mathisen wrote:
> Robert AH Prins wrote:
>> On 2010-05-27 16:48, Terje Mathisen wrote:
>>> Robert, there are two critical issues here:
>>>
>>> a) Do you have to run this Pascal version of the program? I.e. is there
>>> anything that stops you from simply switching to one of the two fast &
>>> working versions we have written as a result of your OP?
>>
>> No, I can (and quite likely will) change the code any way I like, but
>> the code will remain Pascal (and PL/I), as this routine is part of a
>> much larger program - this routine produces just one of the many tables
>> and the post-processing logic, a few more Pascal programs on the PC and
>> a bunch of ISPF edit macro's on z/OS rely on the current order of the
>> tables.
>
> OK. I assume you for some reason cannot link in functions written in C
> either?
>>
>> The "problem" is that I have to translate the two fast solutions to
>> Pascal and as far as yours is concerned... I've got some fairly basic
>> understanding of C, but even your source is already using language
>> constructions above my rather modest knowledge of the language.
>
> That's easy: There's no "funky" C at all in my code, it is really an asm
> algorithm written in C shorthand.
>>
>> BTW, I see (in your 2010-05-24 06:09 posting) that you use a qsort. The
>> PG solution does not need any sorts... Or maybe your as yet unpublished
>> fastest version has also dispensed with it?
>
> The qsort is actually slower than a tiny custom (sort) routine to get
> the potential solutions into the order I need for the final filtering.
>
> I have written many such routines in my Pascal days. :-)
>>
>>> b) How fast is this Pascal algorithm?
>>>
>>> If it is significantly faster, then it might be worthwhile to figure out
>>> where it fails and see if a fix is possible that doesn't make it too
>>> slow!
>>
>> Just timed it via RDTSC. Actual scan time is around 2.7M cycles, and the
>> somewhat fancy printout adds around another 4M cycles. Source is
>
> OK, so it is 4-10 times slower than my best (working) code, and it
> doesn't actually work.
>
>
>
>> compiled with Virtual Pascal 2.1 build 279, and VP isn't really, to
>> express it politely, a very optimizing compiler, so I guess it's pretty
>> good - using the assembler output option of VP it would also be pretty
>> easy to convert it into in-line assembler, giving considerable scope for
>> optimizations.
>
> So maybe an inline asm of my C code would be the easiest? Then you would
> just have to change the function definition. :-?
>
> int scan(byte *data, int data_len)
> {
> /* esi -> data
> ecx = data_len
> */
> __asm {
> mov esi,[data]
> mov ecx,[data_len]
> push ebp
> lea ebp,[solution] // solutions = 0
>
> add ecx,esi // Points one beyond the end of data array
> inc esi // Skipping the first byte
>
> cmp esi,ecx // If we had less than 3 chars, then there
> jae outer_done // cannot be any solutions!
>
> outer_loop: // Scanning from i=1 to data_len
> mov bl,[esi] // Current byte
> lea edi,[esi-1] // Start at prev byte
> inc esi // Get ready for next iteration
>
> inner_loop: // Scanning backwards from i-1 to 0
> cmp bl,[edi] // Same as byte we started with?
> je found_equal // Yes! (Not a new unique char, so abort)
>
> // Increment count of unique values for this starting position:
> mov edx,count[edi*4]
> mov eax,esi
> inc edx
> sub eax,edi // String length
> mov count[edi*4],edx
> cmp edx,3 // At least 3 unique values?
> jb skip_too_short
>
> cmp eax,shortest[edx*4]
> ja not_shortest_string
>
> // Save as potential solution:
> mov shortest[edx*4],eax
> mov [ebp.len],eax
> mov [ebp.uniq],edx
> mov [ebp.start],edi
> add ebp,16 // sizeof(solution[0])
>
> not_shortest_string:
> sub edi,1
> jnc inner_loop
>
> found_equal:
> cmp esi,ecx
> jb outer_loop
> outer_done:
> lea eax,[ebp - offset solution]
> pop ebp
> shr eax,4 // EAX has potential solution count
> }
> }
>
> Better?

No! ;) (But thank you anyway!)

I should have done it myself.

Anyway, I need to get back to the C code, as I also need a PL/I version! 
But before doing anything, I will spend one more day trying to find the 
bug in the existing code.

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/28/2010 7:27:45 PM

On 2010-05-28 09:05, Marco van de Voort wrote:
> On 2010-05-28, Robert AH Prins<spamtrap@prino.org>  wrote:
>>> where it fails and see if a fix is possible that doesn't make it too slow!
>>
>> Just timed it via RDTSC. Actual scan time is around 2.7M cycles, and the
>> somewhat fancy printout adds around another 4M cycles. Source is
>> compiled with Virtual Pascal 2.1 build 279, and VP isn't really, to
>> express it politely, a very optimizing compiler, so I guess it's pretty
>> good - using the assembler output option of VP it would also be pretty
>> easy to convert it into in-line assembler, giving considerable scope for
>> optimizations.
>
> Any chance to repeat the timing with Delphi or FPC? It would make a great
> table.

I only have an uninstalled version of Turbo Delphi somewhere and after 
having found a full blown PL/I compiler for Windows, equivalent to 
V3.9.0 on z/OS, I'm not sure if I want to install FPC, when VP does, at 
least at the moment, everything I need.

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 5/28/2010 7:31:54 PM

["Followup-To:" header set to comp.lang.pascal.borland.]
On 2010-05-28, Robert AH Prins <spamtrap@prino.org> wrote:
> On 2010-05-28 09:05, Marco van de Voort wrote:
>> On 2010-05-28, Robert AH Prins<spamtrap@prino.org>  wrote:
>>>> where it fails and see if a fix is possible that doesn't make it too slow!
>>>
>>> Just timed it via RDTSC. Actual scan time is around 2.7M cycles, and the
>>> somewhat fancy printout adds around another 4M cycles. Source is
>>> compiled with Virtual Pascal 2.1 build 279, and VP isn't really, to
>>> express it politely, a very optimizing compiler, so I guess it's pretty
>>> good - using the assembler output option of VP it would also be pretty
>>> easy to convert it into in-line assembler, giving considerable scope for
>>> optimizations.
>>
>> Any chance to repeat the timing with Delphi or FPC? It would make a great
>> table.
>
> I only have an uninstalled version of Turbo Delphi somewhere and after 
> having found a full blown PL/I compiler for Windows, equivalent to 
> V3.9.0 on z/OS, I'm not sure if I want to install FPC, when VP does, at 
> least at the moment, everything I need.

FPC is relatively self-contained (the worst that can happen is that the
installer adds the directory to the path, which can be undone easily).

The question was not so much meant to convince you to swap compilers, just
that it would be interesting to see how the compilers fare relative to
each other (and maybe pit it against a GCC implementation of the same
algorithm).

Reply Marco 5/29/2010 3:17:06 PM

On Fri, 28 May 2010 10:16:53 +0200
Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> wrote:

> 
> int scan(byte *data, int data_len)
> {
>      /* esi -> data
>         ecx = data_len
>       */
>      __asm {
> 	mov esi,[data]
> 	mov ecx,[data_len]
> 	push ebp
> 	lea ebp,[solution] // solutions = 0
> 
> 	add ecx,esi	// Points one beyond the end of data array
> 	inc esi		// Skipping the first byte
> 
> 	cmp esi,ecx	// If we had less than 3 chars, then there
> 	 jae outer_done // cannot be any solutions!
> 
> outer_loop:		// Scanning from i=1 to data_len
> 	mov bl,[esi]	// Current byte
> 	lea edi,[esi-1]	// Start at prev byte
> 	inc esi		// Get ready for next iteration
> 
> inner_loop:		// Scanning backwards from i-1 to 0
> 	cmp bl,[edi]	// Same as byte we started with?
> 	 je found_equal	// Yes! (Not a new unique char, so abort)
> 
> // Increment count of unique values for this starting position:
> 	mov edx,count[edi*4]
> 	mov eax,esi
> 	inc edx
> 	sub eax,edi	// String length
> 	mov count[edi*4],edx
> 	cmp edx,3	// At least 3 unique values?
> 	 jb skip_too_short
> 
> 	cmp eax,shortest[edx*4]
> 	 ja not_shortest_string
> 
> // Save as potential solution:
> 	mov shortest[edx*4],eax
> 	mov [ebp.len],eax
> 	mov [ebp.uniq],edx
> 	mov [ebp.start],edi
> 	add ebp,16	// sizeof(solution[0])
> 
> not_shortest_string:
> 	sub edi,1
> 	 jnc inner_loop
> 
> found_equal:
> 	cmp esi,ecx
> 	 jb outer_loop
> outer_done:
> 	lea eax,[ebp - offset solution]
> 	pop ebp
> 	shr eax,4	// EAX has potential solution count
>      }
> }
> 
> Better?
> 
> Terje

bmaxa@maxa:~/fasm/better$ cat test.asm
format ELF64 executable 
segment writeable executable

dta rb 256
len dw 256

macro sys_read fd, buf, size
{
	mov eax, 3 ; sys_read
	mov ebx, fd
	mov ecx, buf
	mov edx, size
	int 0x80
}
macro sys_write fd, buf, size
{
	mov eax, 4 ; sys_write
	mov ebx, fd
	mov ecx, buf
	mov edx, size
	int 0x80
}

macro sys_exit rc
{
	mov eax,1 ; exit
	mov ebx,rc
	int 0x80
}

macro scan data, data_len, srch
{
	mov edi,data	; address of the buffer, not its first dword
	mov ecx,dword[data_len]
	mov eax,srch
	cld
	repne scasb
}

entry $
	scan dta,len,'a'
	sys_exit 0

;Better?

Greets!


-- 
http://maxa.homedns.org/

Sometimes online sometimes not

 Everyone is "allowed" to be an idiot and in the dark, but only some choose it,


Reply Branimir 5/31/2010 1:05:04 PM

On 2010-05-21 01:38, James J. Weinkam wrote:
> Robert AH Prins wrote:
>>
>> His code is in Pascal, and until I started running it for parts of the
>> input string I had had blind faith in it. Wrong, as it turns out as
>> there are three strings that it does not process correctly, and all my
>> debugging hasn't given me any clues as to the why.
>>
> Can you send a copy of the data set that fails? Also, if it isn't prying
> into your business, would you describe the application this problem is
> abstracted from? You indicated in your original post that the values
> (characters in the posted example) were actually indices into a table in
> the real application.

Somehow I missed this post, apologies!

The values are indices into a table of "DISTINGUISHING SIGNS USED ON 
VEHICLES IN INTERNATIONAL TRAFFIC", or in more understandable language, 
"international licence plate country code" signs - the current full
official list can be found at

http://www.unece.org/trans/conventn/Distsigns.pdf

one also containing informal ones can be found on Wikipedia,

http://en.wikipedia.org/wiki/List_of_international_license_plate_codes

The upper limit of 255 comes from here.

As for the application itself?

Ahum, it's from a program that processes hitchhike data...

The addition of this particular table (one of about 60 or so) was 
inspired in 1995 by a ride with a Danish truck driver who took me from 
somewhere around Jönköping to Stockholm. He was the fourth consecutive 
driver with the fourth distinct nationality, D-NL-S-DK. When I told him 
so, he asked me if this had happened before and I told him I did not 
have a clue, but the eventual addition of the table produced by my 
sliding window code told me that it had in fact happened on 13 previous 
occasions.

The program is used by a few other hitch-hikers and one of them asked me 
if I could change the code to also produce this table for the trips done 
in a single year, and it was this change that resulted in the discovery 
of the erroneous results for several sets of data.

The hitchhiking angle also explains why the data grows at a rather 
leisurely pace, I'm no longer a spring-chicken and I do not hitchhike as 
much as I used to do.

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 6/8/2010 9:11:27 PM

Robert AH Prins wrote:
> Ahum, it's from a program that processes hitchhike data...
<BG>
>
> The addition of this particular table (one of about 60 or so) was
> inspired in 1995 by a ride with a Danish truck driver who took me from
> somewhere around Jönköping to Stockholm. He was the fourth consecutive
> driver with the fourth distinct nationality, D-NL-S-DK. When I told him
> so, he asked me if this had happened before and I told him I did not
> have a clue, but the eventual addition of the table produced by my
> sliding window code told me that it had in fact happened on 13 previous
> occasions.
>
> The program is used by a few other hitch-hikers and one of them asked me
> if I could change the code to also produce this table for the trips done
> in a single year, and it was this change that resulted in the discovery
> of the erroneous results for several sets of data.

So in reality, this program only has to keep up with the rate of new
rides, comparing the current string with the previous records, right?
>
> The hitchhiking angle also explains why the data grows at a rather
> leisurely pace, I'm no longer a spring-chicken and I do not hitchhike as
> much as I used to do.

I think I stopped doing that nearly 25 years ago: My wife & I tramped
around the US for two months on our honeymoon, flying all the
long-distance stretches, hitching rides on many shorter stretches.

In the US you'd have to check state licence plates instead, I guess.:-)

Anyway, I love the idea: Getting a large group of hotshot programmers to
optimize a program that could be handled with a small set of
handwritten/colored cards. :-)

Terje
>
> Robert


-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 6/9/2010 6:11:06 AM

On 2010-06-09 06:11, Terje Mathisen wrote:
> Robert AH Prins wrote:
>> Ahum, it's from a program that processes hitchhike data...
> <BG>
>>
>> The addition of this particular table (one of about 60 or so) was
>> inspired in 1995 by a ride with a Danish truck driver who took me from
>> somewhere around Jönköping to Stockholm. He was the fourth consecutive
>> driver with the fourth distinct nationality, D-NL-S-DK. When I told him
>> so, he asked me if this had happened before and I told him I did not
>> have a clue, but the eventual addition of the table produced by my
>> sliding window code told me that it had in fact happened on 13 previous
>> occasions.
>>
>> The program is used by a few other hitch-hikers and one of them asked me
>> if I could change the code to also produce this table for the trips done
>> in a single year, and it was this change that resulted in the discovery
>> of the erroneous results for several sets of data.
>
> So in reality, this program only has to keep up with the rate of new
> rides, comparing the current string with the previous records, right?

"Yes", he replied with a blush...

>> The hitchhiking angle also explains why the data grows at a rather
>> leisurely pace, I'm no longer a spring-chicken and I do not hitchhike as
>> much as I used to do.
>
> I think I stopped doing that nearly 25 years ago: My wife & I tramped
> around the US for two months on our honeymoon, flying all the
> long-distance stretches, hitching rides on many shorter stretches.

My wife (I met her at the 4th International Hitch-Hikers Congress in 
2000) and I did a bit of hitchhiking on our honeymoon in Japan, and only 
short stretches.

> In the US you'd have to check state licence plates instead, I guess.:-)

Your guess is as good as mine, I don't know if any Americans are using 
the program.

> Anyway, I love the idea: Getting a large group of hotshot programmers to
> optimize a program that could be handled with a small set of
> handwritten/colored cards. :-)

The very first posting started with "Can anyone give me some *hints* as 
to solve the following problem, preferably in a way that is faster than 
the way I used to do it, and without the bug in the current version;" 
(emphasis added)

How could I possibly have known that a large group of hotshot 
programmers would jump on something one of them referred to as "This is 
an intriguing problem, I'll have think some more... :-)" ;)

A few years ago I came across "CSCRX2HT.txt" on 
https://sites.google.com/site/schlabb/home/code-snippets/rexx-to-html 
and found that it had a raft of bugs. I managed to eliminate them, 
thought that it would be fun to write similar routines for a bunch of 
other "legacy" languages, even spent some of my own money on the 
Javascript that produces the z/OS ISPF-like scrolling HTML and in the 
end I made it freely available under the provisions of the GPL V3. See 
http://www.cbttape.org/ftp/cbt/CBT769.zip - the contents are in TSO XMIT 
format, use XMIT Manager, http://www.cbttape.org/njw/index.html to 
extract the REXX code.

I've got a quote somewhere from a Dutch computer scientist, W.L. (Willem 
Louis) van der Poel: "The intrinsic value of software is nil"

I may have cheated a little (or a lot, take your pick) by not telling 
what the actual code was used for - if I had done so, the thread might 
never have developed into the current pretty high quality-for-Usenet one.

The value of my hitchhike program is nil (or at least pretty close to 
it), but to quote another contributor, Jerome Fine, "The problem is 
very instructive and I really enjoy the description of what you fellows 
are doing."

Robert
-- 
Robert AH Prins
spamtrap(a)prino(d)org
Reply Robert 6/9/2010 10:42:18 AM

In comp.lang.pascal.borland message <877iv0FbfdU1@mid.individual.net>,
Tue, 8 Jun 2010 21:11:27, Robert AH Prins <spamtrap@prino.org> posted:

>
>The addition of this particular table (one of about 60 or so) was
>inspired in 1995 by a ride with a Danish truck driver who took me from
>somewhere around Jönköping to Stockholm. He was the fourth consecutive
>driver with the fourth distinct nationality, D-NL-S-DK. When I told him
>so, he asked me if this had happened before and I told him I did not
>have a clue, but the eventual addition of the table produced by my
>sliding window code told me that it had in fact happened on 13 previous
>occasions.


That's a more understandable description than before, IMHO.

It does indeed seem a close match to what I wrote after and because of
reading that earlier description - JavaScript to find the shortest
interval with (all) 35 different Easter Sunday dates.

My script has a sliding window; the front of the window moves forward
until the content of the window becomes interesting, then the back moves
forward until the content is about to become uninteresting, and reports
if necessary.  Passes are so fast that they can be repeated for
different "interesting".

See <URL:http://www.merlyn.demon.co.uk/estrcons.HTM#ESP>, and "Pop Code"
- the code is essentially pascal-compatible.  Sets are not used; an
array of as many integer elements as there are possible elements is
needed.
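
As a rough illustration of the sliding window described above (not the script on the linked page), here is a minimal Python sketch. The function name `shortest_windows` and its parameters are invented for this example, and a dictionary stands in for the integer array. It reports the shortest spans holding `wanted` distinct values; it does not apply the original poster's extra rule that excludes windows lying inside a longer all-distinct run.

```python
def shortest_windows(values, wanted):
    """Return (length, [start indexes]) of the shortest windows that
    contain `wanted` distinct values, or (None, []) if there are none."""
    counts = {}        # how often each value occurs inside the window
    distinct = 0       # number of values with a non-zero count
    back = 0           # index of the back (oldest) end of the window
    best_len = None
    starts = []
    for front, value in enumerate(values):
        # The front moves forward until the window becomes interesting.
        counts[value] = counts.get(value, 0) + 1
        if counts[value] == 1:
            distinct += 1
        if distinct < wanted:
            continue
        # The back moves forward while the value it leaves behind
        # still occurs later in the window.
        while counts[values[back]] > 1:
            counts[values[back]] -= 1
            back += 1
        length = front - back + 1
        if best_len is None or length < best_len:
            best_len, starts = length, [back]
        elif length == best_len:
            starts.append(back)
        # Step the back past one more value so the window becomes
        # uninteresting again and the front has to resume.
        counts[values[back]] -= 1
        distinct -= 1
        back += 1
    return best_len, starts


example = "aaaaaabbbbbabbbcbcccccccddddddddccbbbbbbefffghggggihbfja"
print(shortest_windows(example, 7))  # (7, [49]), i.e. positions 50..56 counted from 1
```

Each element enters and leaves the window at most once, so a pass is linear in the length of the data, which is why repeating it for different values of `wanted` stays cheap.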

FYI, the minimum span is 72 years, for both Gregorian and Julian; the
Julians are in the middle of such a period, but the Gregorians will have
to wait a few millennia for a shortest span.

-- 
 (c) John Stockton, nr London, UK.    ?@merlyn.demon.co.uk     Turnpike v6.05.
 Web  <URL:http://www.merlyn.demon.co.uk/> - w. FAQish topics, links, acronyms
 PAS EXE etc : <URL:http://www.merlyn.demon.co.uk/programs/> - see 00index.htm
 Dates - miscdate.htm estrdate.htm js-dates.htm pas-time.htm critdate.htm etc.
Reply Dr 6/9/2010 8:54:36 PM

Dr J R Stockton wrote:
> In comp.lang.pascal.borland message<877iv0FbfdU1@mid.individual.net>,
> Tue, 8 Jun 2010 21:11:27, Robert AH Prins<spamtrap@prino.org>  posted:
>
>>
>> The addition of this particular table (one of about 60 or so) was
>> inspired in 1995 by a ride with a Danish truck driver who took me from
>> somewhere around Jönköping to Stockholm. He was the fourth consecutive
>> driver with the fourth distinct nationality, D-NL-S-DK. When I told him
>> so, he asked me if this had happened before and I told him I did not
>> have a clue, but the eventual addition of the table produced by my
>> sliding window code told me that it had in fact happened on 13 previous
>> occasions.
>
>
> That's a more understandable description than before, IMHO.
>
> It does indeed seem a close match to what I wrote after and because of
> reading that earlier description - JavaScript to find the shortest
> interval with (all) 35 different Easter Sunday dates.

Nice problem...
>
> My script has a sliding window; the front of the window moves forward
> until the content of the window becomes interesting, then the back moves
> forward until the content is about to become uninteresting, and reports
> if necessary.  Passes are so fast that they can be repeated for
> different "interesting".

I assume you run the scan from any given starting year, then cross out
the various dates as you get to them?

As soon as you've found all possible dates, you optimize by running the
back end forward until you get to the first date which isn't repeated
anywhere later in the interval.

This is very easy to do by using a counting array to cross out each date
as you pass it: Increment when the front end finds a date, decrement
when the back end passes it.

Increment the total count each time a counter goes from 0->1, decrement
for 1->0.

(Actually, in asm you could do a little trick, by using -1 as the
initial count for each date and -35 as the initial day count:

;; Forward scan, for each year
;; EAX = easter_sunday(year)

   add day_count[EAX*4],1	; Generates a carry for -1 -> 0
   adc EBX,0			; Total number of different days
    jc Found_35_Front

Similarly for the tail end:

   sub day_count[EAX*4],1
   sbb EBX,0
    jc Found_35_Tail

This makes the code totally branchless except for each time you actually
find an edge of a new interval.)
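
To see why the carry flag fires at exactly the right moments, the add/adc and sub/sbb pairs can be emulated with 32-bit wrap-around arithmetic in Python. This is only a sketch: `add32`, `sub32` and `CarryCounter` are hypothetical names invented here, and the class models nothing beyond the two instruction pairs shown above.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit registers and memory cells


def add32(a, b, carry_in=0):
    """Return (result, carry_out) of a 32-bit add with carry."""
    total = a + b + carry_in
    return total & MASK, total >> 32


def sub32(a, b, borrow_in=0):
    """Return (result, borrow_out) of a 32-bit subtract with borrow."""
    diff = a - b - borrow_in
    return diff & MASK, int(diff < 0)


class CarryCounter:
    def __init__(self, n_values, target):
        self.count = [MASK] * n_values  # every per-value counter starts at -1
        self.total = (-target) & MASK   # the EBX role, starting at -target

    def front(self, v):
        # add count[v],1 : carry only on the -1 -> 0 step (first occurrence)
        self.count[v], carry = add32(self.count[v], 1)
        # adc total,0    : carry out only when total steps from -1 to 0
        self.total, carry = add32(self.total, 0, carry)
        return bool(carry)  # True exactly when `target` distinct values are in

    def tail(self, v):
        # sub count[v],1 : borrow only on the 0 -> -1 step (last occurrence gone)
        self.count[v], borrow = sub32(self.count[v], 1)
        # sbb total,0    : borrow out only when total steps from 0 to -1
        self.total, borrow = sub32(self.total, 0, borrow)
        return bool(borrow)  # True exactly when the window stops being complete


c = CarryCounter(n_values=4, target=3)
print([c.front(v) for v in (0, 0, 1, 2)])  # [False, False, False, True]
print([c.tail(v) for v in (0, 0)])         # [False, True]
```

Starting the counters at -1 rather than 0 is what makes the first occurrence of a value (and the removal of its last occurrence) coincide with a carry or borrow, so no compare instruction is needed.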

> See<URL:http://www.merlyn.demon.co.uk/estrcons.HTM#ESP>, and "Pop Code"
> - the code is essentially pascal-compatible.  Sets are not used; an
> array of as many integer elements as there are possible elements is
> needed.
>
> FYI, the minimum span is 72 years, for both Gregorian and Julian; the
> Julians are in the middle of such a period, but the Gregorians will have
> to wait a few millennia for a shortest span.
>
:-)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 6/10/2010 5:48:13 AM

In comp.lang.pascal.borland message <vta6e7-4vv2.ln1@ntp.tmsw.no>, Thu,
10 Jun 2010 07:48:13, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> posted:

>Dr J R Stockton wrote:

>> It does indeed seem a close match to what I wrote after and because of
>> reading that earlier description - JavaScript to find the shortest
>> interval with (all) 35 different Easter Sunday dates.
>
>Nice problem...
>>
>> My script has a sliding window; the front of the window moves forward
>> until the content of the window becomes interesting, then the back moves
>> forward until the content is about to become uninteresting, and reports
>> if necessary.  Passes are so fast that they can be repeated for
>> different "interesting".
>
>I assume you run the scan from any given starting year, then cross out
>the various dates as you get to them?

You assume incorrectly, to judge by that sentence.  But read on ...

>As soon as you've found all possible dates, you optimize by running the
>back end forward until you get to the first date which isn't repeated
>anywhere later in the interval.

Yes.

>This is very easy to do by using a counting array to cross out each
>date as you pass it:

I don't cross out.

> Increment when the front end finds a date, decrement when the back end
>passes it.

Yes.

>Increment the total count each time a counter goes from 0->1, decrement
>for 1->0.

Yes.

>(Actually, in asm you could do a little trick, by using -1 as the
>initial count for each date and -35 as the initial day count:

It's fast enough in JavaScript, even for the 5700000 year Gregorian
cycle.  There should be no need to optimise ASM code, fun though that
can be.


>> See<URL:http://www.merlyn.demon.co.uk/estrcons.HTM#ESP>, and "Pop Code"
>> - the code is essentially pascal-compatible.  Sets are not used; an
>> array of as many integer elements as there are possible elements is
>> needed.
>>
>> FYI, the minimum span is 72 years, for both Gregorian and Julian; the
>> Julians are in the middle of such a period, but the Gregorians will have
>> to wait a few millennia for a shortest span.

I did think of adapting it for the maximum span not containing all 35
dates, but that must be given by the longest interval between date
repeats, which I've already done.

Aficionados of ISO 8601 will realise, with a little thought, that the
number of different dates is actually (probably; provably) 35 + 36 + 6.

My page <URL:http://www.merlyn.demon.co.uk/estrdate.htm> links to
various JavaScript full-range algorithms for Easter Sunday in
estralgs.txt - could you beat
<http://www.hugi.scene.org/compo/compoold.htm#compo20> in ASM while
retaining traceability to the formal standards (the Papal Bull, the
British Calendar Act, and whatever Norway relies on?!?)?  The rules for
compo20 give Oudin's algorithm, which I can beat.

-- 
 (c) John Stockton, nr London, UK.    ?@merlyn.demon.co.uk     Turnpike v6.05.
 Web  <URL:http://www.merlyn.demon.co.uk/> - w. FAQish topics, links, acronyms
 PAS EXE etc : <URL:http://www.merlyn.demon.co.uk/programs/> - see 00index.htm
 Dates - miscdate.htm estrdate.htm js-dates.htm pas-time.htm critdate.htm etc.
Reply Dr 6/11/2010 9:28:06 PM
