f



Binary reader speed comparison - Tcl 8.5.8 to Tcl 8.6b1.1

I have a fairly simple binary reader proc that exhibits massive speed
differences between Tcl 8.5.8 and Tcl8.6b1.1.

Here's the proc:

proc readFormatted {filename} {
     set fd [ open $filename r ]
     fconfigure $fd -encoding binary -translation binary

     binary scan [ read $fd 2 ] cc type nextlen

     if {$type != 75} {
        return -error "File is not in the expected format"
     }

     while {![append buffer [read $fd $nextlen] ; eof $fd]} {
        binary scan [read $fd 2] cc lastlen nextlen
        # convert to unsigned value
        set nextlen [expr {$nextlen & 0xff}]
        # 130 is the EOF marker
        if {$nextlen == 130} break
        # 129 is a special continuation marker (actually only 128
bytes)
        if {$nextlen == 129} {set nextlen 128}
     }
     close $fd

     return $buffer
}

Now, the timings on a particularly large file (13.3 MB).

Tcl 8.5.8 - 0.550 secs
Tcl 8.6b1.1 - 14.28 secs

So, 8.6b1 is approx 25x slower than 8.5.8.  I know 8.6 is still beta,
but is the above expected at this stage?

Thanks,

Jeff
0
jgodfrey
4/26/2010 10:09:11 PM
comp.lang.tcl 23429 articles. 2 followers. Post Follow

10 Replies
1188 Views

Similar Articles

[PageSpeed] 41

On Apr 26, 5:09=A0pm, jgodfrey <jeffgodfre...@gmail.com> wrote:
> Now, the timings on a particularly large file (13.3 MB).
>
> Tcl 8.5.8 - 0.550 secs
> Tcl 8.6b1.1 - 14.28 secs
>
> So, 8.6b1 is approx 25x slower than 8.5.8. =A0I know 8.6 is still beta,
> but is the above expected at this stage?

I should add the following:

- Both of the above Tcl builds are from ActiveState
- The above timings were recorded using Win7 Pro x64

Jeff
0
jgodfrey
4/26/2010 10:13:43 PM
On Apr 27, 12:09=A0am, jgodfrey <jeffgodfre...@gmail.com> wrote:
> I have a fairly simple binary reader proc that exhibits massive speed
> differences between Tcl 8.5.8 and Tcl8.6b1.1.
>
> Here's the proc:
>
> proc readFormatted {filename} {
> =A0 =A0 =A0set fd [ open $filename r ]
> =A0 =A0 =A0fconfigure $fd -encoding binary -translation binary
>
> =A0 =A0 =A0binary scan [ read $fd 2 ] cc type nextlen
>
> =A0 =A0 =A0if {$type !=3D 75} {
> =A0 =A0 =A0 =A0 return -error "File is not in the expected format"
> =A0 =A0 =A0}
>
> =A0 =A0 =A0while {![append buffer [read $fd $nextlen] ; eof $fd]} {
> =A0 =A0 =A0 =A0 binary scan [read $fd 2] cc lastlen nextlen
> =A0 =A0 =A0 =A0 # convert to unsigned value
> =A0 =A0 =A0 =A0 set nextlen [expr {$nextlen & 0xff}]
> =A0 =A0 =A0 =A0 # 130 is the EOF marker
> =A0 =A0 =A0 =A0 if {$nextlen =3D=3D 130} break
> =A0 =A0 =A0 =A0 # 129 is a special continuation marker (actually only 128
> bytes)
> =A0 =A0 =A0 =A0 if {$nextlen =3D=3D 129} {set nextlen 128}
> =A0 =A0 =A0}
> =A0 =A0 =A0close $fd
>
> =A0 =A0 =A0return $buffer
>
> }
>
> Now, the timings on a particularly large file (13.3 MB).
>
> Tcl 8.5.8 - 0.550 secs
> Tcl 8.6b1.1 - 14.28 secs
>
> So, 8.6b1 is approx 25x slower than 8.5.8. =A0I know 8.6 is still beta,
> but is the above expected at this stage?

Interesting. Cursory investigation shows this is not a shimmering
issue, since this usage pattern seems to be keeping byte arrays
without string rep (which are the most efficiently concatenated
values), as told by ::tcl::unsupported::representation.

Maybe something with the quadratic process of endlessly reallocating
of the growing value... though I fail to see why it's new to 8.6.

In any case, please file a bug report.

-Alex

PS:

Also, you might add the timings with alternative [append]'s:

     set buffer $buffer[set buffer {}][read ...]

(this form allows for in-place appending, and also tries to stay at
binary level. I say "tries" because it can be spoilt if the {} literal
has been stringified before. A more robust method may involve using
[read $somefd 0] insetad of {}).
0
Alexandre
4/26/2010 11:30:34 PM
Alexandre Ferrieux wrote:
> (this form allows for in-place appending, and also tries to stay at
> binary level. I say "tries" because it can be spoilt if the {} literal
> has been stringified before. A more robust method may involve using
> [read $somefd 0] insetad of {}).

Uh? The literal {} always has a string rep: tclEmptyStringRep.

If you want a "fresh" non-literal {}, [] might be better.

OTOH, append will work in place if the variable's value is unshared (I 
think).
0
miguel
4/27/2010 12:10:47 AM
On Apr 27, 2:10=A0am, miguel sofer <mso...@users.sf.net> wrote:
> Alexandre Ferrieux wrote:
> > (this form allows for in-place appending, and also tries to stay at
> > binary level. I say "tries" because it can be spoilt if the {} literal
> > has been stringified before. A more robust method may involve using
> > [read $somefd 0] insetad of {}).
>
> Uh? The literal {} always has a string rep: tclEmptyStringRep.

Yes, but by "stringified" I mean the String obj type, which defeats
shortcuts in the concatenation code about "being already a byte
array". But re-testing, I see that the problem lies only with
[append], not with direct CONCAT1:

   # $z is [open /dev/zero r], binary mode
   set x {};append x [read $z 10];::tcl::unsupported::representation
$x

  ->

   value is a string with a refcount of 2, object pointer at
0x89805f0, internal representation 0x8985258:0x897fe70, string
representation "..."

But if the variable starts as unset, things are different:

   unset x;append x [read $z 10];::tcl::unsupported::representation $x

   ->

   value is a bytearray with a refcount of 2, object pointer at
0x892e518, internal representation 0x8989680:0x892f5c8, no string
representation.


> If you want a "fresh" non-literal {}, [] might be better.

It turns out [] has the same effect as {} in the trials above :/

>
> OTOH, append will work in place if the variable's value is unshared (I
> think).

Yes, and same for CONCAT1 with the K-free K $x[set x {}].

-Alex
0
Alexandre
4/27/2010 6:51:58 AM
<jeffgodfre...@gmail.com> wrote:
>> So, 8.6b1 is approx 25x slower than 8.5.8.  I know 8.6 is still beta,
>> but is the above expected at this stage?

Alexandre Ferrieux wrote:
> Interesting. Cursory investigation shows this is not a shimmering
> issue, since this usage pattern seems to be keeping byte arrays
> without string rep (which are the most efficiently concatenated
> values), as told by ::tcl::unsupported::representation.

If a bug is filed on this, I'll take a look in a few days.

Meanwhile since Alex appears to be chasing it as well, I offer the
hint of looking at lines 41-47 of tclStringObj.c:

/*
  * Set COMPAT to 1 to restore the shimmering patterns to those of Tcl 8.5.
  * This is an escape hatch in case the changes have some unexpected 
unwelcome
  * impact on performance.  If things go well, this mechanism can go 
away when
  * post-8.6 development begins.
  */
#define COMPAT 0

Might not be the problem, but should be easy to try.

DGP
0
Donald
4/27/2010 12:18:47 PM
Alexandre Ferrieux wrote:

> Maybe something with the quadratic process of endlessly reallocating
> of the growing value... though I fail to see why it's new to 8.6.
> 
> In any case, please file a bug report.

Alex (et al.),

Thanks for the input.  I'll file a bug report this morning.  The issue 
does seem to be the append itself as removing it from the mix places the 
8.5 / 8.6 timings in at least the same ballpark.  That said, 8.6 is 
still about 23% slower in that case, but that's better than 25x 
slower... ;^)

I'll report back with a bug number once it's filed.

Thanks,

Jeff
0
Jeff
4/27/2010 1:45:27 PM
Jeff Godfrey wrote:

> I'll report back with a bug number once it's filed.

OK, bug report filed as SF (2992970).  It seems I can only attach a 
single file to a ticket?  I attached the sample Tcl script, but then 
didn't see a way to also attach the sample binary file it reads.  It may 
have been too large anyway (13.4 MB).  I can make it available by other 
means if necessary - just ask.

Thanks,

Jeff
0
Jeff
4/27/2010 1:59:55 PM
Alexandre Ferrieux wrote:

> Also, you might add the timings with alternative [append]'s:
> 
>      set buffer $buffer[set buffer {}][read ...]
> 

Alex,

Trying something similar to the above...

set rec [read $fd $nextlen]
set buffer $buffer[set buffer {}]$rec

8.5 goes from 0.5 secs to 34.5 secs
8.6b1.1 goes from 14 secs to 34.5 secs

So, in that case, they are approx equal, but much worse.

Jeff
0
Jeff
4/27/2010 2:38:38 PM
On Apr 27, 3:59=A0pm, Jeff Godfrey <jeff_godf...@pobox.com> wrote:
> Jeff Godfrey wrote:
> > I'll report back with a bug number once it's filed.
>
> OK, bug report filed as SF (2992970). =A0It seems I can only attach a
> single file to a ticket? =A0I attached the sample Tcl script, but then
> didn't see a way to also attach the sample binary file it reads. =A0It ma=
y
> have been too large anyway (13.4 MB). =A0I can make it available by other
> means if necessary - just ask.

Thank you very much Jeff. Sequel in the ticket's comments :-)

-Alex

0
Alexandre
4/27/2010 4:05:06 PM
On Apr 27, 6:05=A0pm, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:
> On Apr 27, 3:59=A0pm, Jeff Godfrey <jeff_godf...@pobox.com> wrote:
>
> > Jeff Godfrey wrote:
> > > I'll report back with a bug number once it's filed.
>
> > OK, bug report filed as SF (2992970). =A0It seems I can only attach a
> > single file to a ticket? =A0I attached the sample Tcl script, but then
> > didn't see a way to also attach the sample binary file it reads. =A0It =
may
> > have been too large anyway (13.4 MB). =A0I can make it available by oth=
er
> > means if necessary - just ask.
>
> Thank you very much Jeff. Sequel in the ticket's comments :-)

For the record: the problem was indeed one of quadratic realloc().
Solved in HEAD by Donal, by doubling the realloc size every time until
memory gets scarce.

-Alex
0
Alexandre
5/1/2010 3:34:05 PM
Reply: