


Avoid 'int', 'long' and 'short'...

... #include <cstdint> instead!

/Flibble
Mr Flibble, 6/26/2015 7:39:42 PM


On 26/06/2015 20:39, Mr Flibble wrote:
> ... #include <cstdint> instead!
>
> /Flibble

you mean using int16_t? why is that?
int is the fastest integer so why would i use something else?
JiiPee, 6/26/2015 8:31:33 PM
On 26/06/2015 21:31, JiiPee wrote:
> On 26/06/2015 20:39, Mr Flibble wrote:
>> ... #include <cstdint> instead!
>>
>> /Flibble
>
> you mean using int16_t? why is that?
> int is the fastest integer so why would i use something else?

Because 'int' is both unsafe and non-portable.  If you want the fastest 
integer that is at least a certain size then there is a <cstdint> 
typedef for that.

/Flibble
Mr Flibble, 6/26/2015 9:37:37 PM
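
[For reference, a minimal sketch of the <cstdint> typedefs being referred
to; which underlying types they map to, and hence the sizes printed, vary
per platform:]

  #include <cstdint>
  #include <cstdio>

  int main()
  {
      // fastest signed type with at least 16 bits (often int, or wider)
      std::int_fast16_t fast = 30000;
      // smallest signed type with at least 32 bits
      std::int_least32_t least = 70000;
      // exactly 32 bits; this typedef is optional in the standard
      std::int32_t exact = 70000;

      std::printf("%zu %zu %zu\n", sizeof fast, sizeof least, sizeof exact);
  }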
On 26/06/2015 22:37, Mr Flibble wrote:
> On 26/06/2015 21:31, JiiPee wrote:
>> On 26/06/2015 20:39, Mr Flibble wrote:
>>> ... #include <cstdint> instead!
>>>
>>> /Flibble
>>
>> you mean using int16_t? why is that?
>> int is the fastest integer so why would i use something else?
>
> Because 'int' is both unsafe and non-portable. 

How is it unsafe? I thought int is a one-byte integer on all systems, right? 
int automatically chooses the fastest integer type.

> If you want the fastest integer that is at least a certain size then 
> there is a <cstdint> typedef for that.
>
But if you chose the wrong one from there it would not be the fastest. 
int is always guaranteed to be the fastest.

JiiPee, 6/26/2015 10:44:41 PM
JiiPee wrote:
> On 26/06/2015 22:37, Mr Flibble wrote:
>> On 26/06/2015 21:31, JiiPee wrote:
>>> On 26/06/2015 20:39, Mr Flibble wrote:
>>>> ... #include <cstdint> instead!
>>>>
>>>> /Flibble
>>>
>>> you mean using int16_t? why is that?
>>> int is the fastest integer so why would i use something else?
>>
>> Because 'int' is both unsafe and non-portable.
>
> How is it unsafe? I thought int is a one-byte integer on all systems, right?
> int automatically chooses the fastest integer type.

Please, not again!  We've already had this thread...

-- 
Ian Collins
Ian Collins, 6/26/2015 11:02:51 PM
On 26/06/2015 23:44, JiiPee wrote:
>
> How is it unsafe? I thought int is a one-byte integer on all 
> systems, right? int automatically chooses the fastest integer type.

I forgot the minimum int size; it's actually min 16 bits I think (not 
always 32). So I guess to be portable with any system you cannot use 
bigger ints than that. Otherwise you must use <cstdint>.

Although it's quite rare that a compiler would use 16 bits, right? VC++ 
uses 32 bits and I guess gcc does as well.

Also I read that many processors are optimized for ints, so they run 
fast(est) with int.

But you are right, there can be a risk, although maybe a rare one.
"If you want to you can statically (compile-time) assert the sizeof 
these fundamental types. It will alert people to think about porting 
your code if the sizeof assumptions change."

So one could check the size of the int to be sure the code works.
I don't know if I'll bother changing my int habit, as it might make the 
code more complex (for example, many library functions take int as a 
parameter, so there would possibly be integer conversions). But sure, 
this risk must be taken into consideration.
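
[A minimal sketch of the compile-time check quoted above; the asserted
values are just the common 32-bit-platform assumption, not a guarantee of
the standard:]

  #include <climits>

  // Fail the build if the platform deviates from what the code assumes.
  static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
  static_assert(sizeof(int) == 4, "this code assumes a 4-byte int");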

>
>> If you want the fastest integer that is at least a certain size then 
>> there is a <cstdint> typedef for that.
>>
> But if you chose the wrong one from there it would not be the fastest. 
> int is always guaranteed to be the fastest.

fastest for integers of less than 4 bytes on most machines.

JiiPee, 6/27/2015 12:16:57 AM
On 2015-06-27 02:16, JiiPee wrote:
> On 26/06/2015 23:44, JiiPee wrote:
>>
>> How is it unsafe? I thought int is a one-byte integer on all systems,
>> right? int automatically chooses the fastest integer type.
>
> I forgot the minimum int size; it's actually min 16 bits I think (not
> always 32). So I guess to be portable with any system you cannot use
> bigger ints than that. Otherwise you must use <cstdint>.
>

No. What Mr Flibble wants to forget is that systems with 16 bit ints 
will have lots of other limitations, like 640k of total RAM. Or no hard 
disk.

Even if you use int32_t for your variables, an average program wouldn't 
fit in available memory anyway. Or wouldn't run for other reasons. So 
why bother?


Bo Persson


Bo Persson, 6/27/2015 4:58:59 AM
On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> On 26/06/2015 21:31, JiiPee wrote:
> > On 26/06/2015 20:39, Mr Flibble wrote:
> >> ... #include <cstdint> instead!
> >>
> >> /Flibble
> >
> > you mean using int16_t? why is that?
> > int is the fastest integer so why would i use something else?
> 
> Because 'int' is both unsafe and non-portable.  If you want the fastest 
> integer that is at least a certain size then there is a <cstdint> 
> typedef for that.

Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>? 
I don't think that the latter is in any way safer or more portable than
the former.
 
Öö Tiib, 6/27/2015 6:27:44 AM
On 6/26/2015 11:58 PM, Bo Persson wrote:
> On 2015-06-27 02:16, JiiPee wrote:
>> On 26/06/2015 23:44, JiiPee wrote:
>>>
>>> How is it unsafe? I thought int is a one-byte integer on all systems,
>>> right? int automatically chooses the fastest integer type.
>>
>> I forgot the minimum int size; it's actually min 16 bits I think (not
>> always 32). So I guess to be portable with any system you cannot use
>> bigger ints than that. Otherwise you must use <cstdint>.
>>
>
> No. What Mr Flibble wants to forget is that systems with 16 bit ints
> will have lots of other limitations, like 640k of total RAM. Or no hard
> disk.
>

640kB is pretty massive as far as 16-bit systems go.

the PC was sort of an oddball in that it stayed 16-bit a little longer 
than other systems, but it had significantly more RAM than most other 
16-bit systems that have existed (or other computers that existed 
at the time).


and, to what extent 16-bit processors have lived on (such as in 
microcontrollers), it has generally been with much smaller address 
spaces than on the PC. (640kB of RAM would be huge by MCU standards...).


> Even if you use int32_t for your variables, an average program wouldn't
> fit in available memory anyway. Or wouldn't run for other reasons. So
> why bother?
>

yeah.

practically, 16-bit can mostly be ignored at this point.

if dealing with 16-bit DOS or Win16, there will be much bigger 
funkiness to deal with than assumptions about "sizeof(int)" being wrong. 
chances are, there will be some fairly specific reason why code is 
being targeted at these.


and if dealing with a microcontroller... chances are the code will be 
written specifically for said microcontroller (there not being enough 
RAM or ROM space to try to put any generic code on it, more likely the 
code will need to be written specifically for that use-case).

BGB, 6/27/2015 6:28:17 AM
Öö Tiib <ootiib@hot.ee> wrote:
> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> > On 26/06/2015 21:31, JiiPee wrote:
> > > On 26/06/2015 20:39, Mr Flibble wrote:
> > >> ... #include <cstdint> instead!
> > >>
> > >> /Flibble
> > >
> > > you mean using int16_t? why is that?
> > > int is the fastest integer so why would i use something else?
> > 
> > Because 'int' is both unsafe and non-portable.  If you want the fastest 
> > integer that is at least a certain size then there is a <cstdint> 
> > typedef for that.

> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>? 
> I don't think that the latter is in any way safer or more portable than
> the former.

Especially when some clever-san notices that he actually
can store values larger than 32767 in an int_fast16_t on
his machine. And, since he just learned from Mr Flibble
that the types from <cstdint> are safe and portable (in
contrast to this horribly unsafe and non-portable 'int'
type) concludes that this must work on all architectures
and is such a cute trick, saving lots of memory for sure!
And the cycle repeats again...

                           Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
Jens Thoms Toerring, 6/27/2015 8:45:33 AM
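
[A small illustration of the trap Jens describes; whether the value fits
is platform-dependent:]

  #include <cstdint>

  // int_fast16_t is only guaranteed to hold at least -32767..32767.
  std::int_fast16_t n = 100000; // "works" where the fast type is 32 bits,
                                // silently doesn't fit where it is 16 bits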
BGB <cr88192@hotmail.com> writes:
>written specifically for said microcontroller (there not being enough 
>RAM or ROM space to try to put any generic code on it

  Templates that aren't used aren't instantiated.
  Templates can thus generate less code than non-templates!

  Most problems with »template code bloat« arise from
  misunderstandings of template rules or improper use
  of templates.

Stefan Ram, 6/27/2015 1:46:16 PM
On 6/27/2015 8:46 AM, Stefan Ram wrote:
> BGB <cr88192@hotmail.com> writes:
>> written specifically for said microcontroller (there not being enough
>> RAM or ROM space to try to put any generic code on it
>
>    Templates that aren't used aren't instantiated.
>    Templates can thus generate less code than non-templates!
>
>    Most problems with »template code bloat« arise from
>    misunderstandings of template rules or improper use
>    of templates.
>

errm, wasn't talking about templates, but using 'generic' in the general 
sense, as in, 'general purpose' code.

because of the typically small ROM space, most of the code needs to be 
rather specific to the project.


it depends on the microcontroller though how much ROM space there is, 
but typically it is in the kB range for 16-bit devices (32kB to 128kB is 
fairly typical).

BGB, 6/27/2015 2:01:37 PM
On 27/06/2015 07:27, Öö Tiib wrote:
> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
>> On 26/06/2015 21:31, JiiPee wrote:
>>> On 26/06/2015 20:39, Mr Flibble wrote:
>>>> ... #include <cstdint> instead!
>>>>
>>>> /Flibble
>>>
>>> you mean using int16_t? why is that?
>>> int is the fastest integer so why would i use something else?
>>
>> Because 'int' is both unsafe and non-portable.  If you want the fastest
>> integer that is at least a certain size then there is a <cstdint>
>> typedef for that.
>
> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
> I don't think that the latter is in any way safer or more portable than
> the former.

Because use of the typedef means you are explicitly documenting in code 
that the size of this type can vary and should be treated as such.

/Flibble

Mr Flibble, 6/27/2015 2:46:19 PM
On Saturday, 27 June 2015 17:06:08 UTC+3, BGB  wrote:
> On 6/27/2015 8:46 AM, Stefan Ram wrote:
> > BGB <cr88192@hotmail.com> writes:
> >> written specifically for said microcontroller (there not being enough
> >> RAM or ROM space to try to put any generic code on it
> >
> >    Templates that aren't used aren't instantiated.
> >    Templates can thus generate less code than non-templates!
> >
> >    Most problems with »template code bloat« arise from
> >    misunderstandings of template rules or improper use
> >    of templates.
> >
> 
> errm, wasn't talking about templates, but using 'generic' in the general 
> sense, as in, 'general purpose' code.

Then it is a straw man in the C++ sense. We lately tend to have less and
less general purpose non-template code. Popular C++ libraries tend to
move towards templates and header only.

The ones that don't do it lose in popularity because of the unneeded
binary bloat, intrusive interface, not doing everything that is
possible compile-time and failing to generate competitive binaries.

The libraries that may survive that trend longer are in-house libraries
but those are not general purpose anyway.

> because of the typically small ROM space, most of the code needs to be 
> rather specific to the project.
>
> it depends on the microcontroller though how much ROM space there is, 
> but typically it is in the kB range for 16-bit devices (32kB to 128kB is 
> fairly typical).

If the controller has 2048 bytes of ROM or under then it does only 
very few things. Even in assembler it can't take a lot of weeks to
implement what it does. 32kB however can be quite a lot of C++. Templates
(that is what we mean with "generic" in C++) are a powerful tool on that
level.
Öö Tiib, 6/27/2015 6:01:43 PM
On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
> On 27/06/2015 07:27, Öö Tiib wrote:
> > On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> >> On 26/06/2015 21:31, JiiPee wrote:
> >>> On 26/06/2015 20:39, Mr Flibble wrote:
> >>>> ... #include <cstdint> instead!
> >>>>
> >>>> /Flibble
> >>>
> >>> you mean using int16_t? why is that?
> >>> int is the fastest integer so why would i use something else?
> >>
> >> Because 'int' is both unsafe and non-portable.  If you want the fastest
> >> integer that is at least a certain size then there is a <cstdint>
> >> typedef for that.
> >
> > Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
> > I don't think that the latter is in any way safer or more portable than
> > the former.
> 
> Because use of the typedef means you are explicitly documenting in code 
> that the size of this type can vary and should be treated as such.

Maybe try to elaborate it a bit more. What *is* the special treatment you
have in mind? 

Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t', 
'fpos_t' (etc. just name it) in C or in C++ means that the byte size of
it may (and eventually will) vary per platform or next generation
of same platform. Also if two of above are exactly same type on current
platform then it is wrong to assume that these two are same on some other
platform (unlike 'int' and 'int_fast16_t' that must be the same in all 
platforms where 'int_fast16_t' is defined). That all *must* be known of
course to everyone wanting to use C++ but what is the treatment?

Might be you expect too much of the types of <cstdint>. Exact size of
type matters for example on case of binary exchange and storage formats.
Actually I don't know of many other prominent cases. Are there any?

It may be that you currently deal with some such format ... but most 
things are text these days like (JSON, XML or CSV). It is narrow 
fraction of works where we want to squeeze last out and pick a binary
format.

Fixing physical byte size of types that participate in binary format
is unfortunately not the solution. Alignment and endianness are still
to be taken care of. Also since we took it for to squeeze everything
out of our bandwidth then why not to compress it even further? When
a value is in range 0..1000 then why not to use only 10 bits for it?
Well-made bit-packing can be faster than 'memcpy' on modern platforms
thanks to less cache misses. 

A buffer of 'uint8_t' is therefore pretty much the best underlying
type for a portable binary format. Where 'uint8_t' exists at all there
it is precisely 'unsigned char' so 'uint8_t' can be considered just a
shorthand for 'unsigned char'.
Öö Tiib, 6/27/2015 9:17:27 PM
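
[A minimal sketch of the 10-bit packing idea into a buffer of 'uint8_t';
the LSB-first layout and the function name are illustrative assumptions:]

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Append one value in the range 0..1000 to 'buf' using 10 bits,
  // least significant bit first; 'bitpos' tracks the write position.
  void pack10(std::vector<std::uint8_t>& buf, std::size_t& bitpos,
              std::uint16_t value)
  {
      for (int i = 0; i < 10; ++i, ++bitpos) {
          if (bitpos / 8 >= buf.size())
              buf.push_back(0);
          if (value & (1u << i))
              buf[bitpos / 8] |= static_cast<std::uint8_t>(1u << (bitpos % 8));
      }
  }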
On 27/06/2015 22:17, Öö Tiib wrote:
> On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
>> On 27/06/2015 07:27, Öö Tiib wrote:
>>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
>>>> On 26/06/2015 21:31, JiiPee wrote:
>>>>> On 26/06/2015 20:39, Mr Flibble wrote:
>>>>>> ... #include <cstdint> instead!
>>>>>>
>>>>>> /Flibble
>>>>>
>>>>> you mean using int16_t? why is that?
>>>>> int is the fastest integer so why would i use something else?
>>>>
>>>> Because 'int' is both unsafe and non-portable.  If you want the fastest
>>>> integer that is at least a certain size then there is a <cstdint>
>>>> typedef for that.
>>>
>>> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
>>> I don't think that the latter is in any way safer or more portable than
>>> the former.
>>
>> Because use of the typedef means you are explicitly documenting in code
>> that the size of this type can vary and should be treated as such.
>
> Maybe try to elaborate it a bit more. What *is* the special treatment you
> have in mind?
>
> Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t',
> 'fpos_t' (etc. just name it) in C or in C++ means that the byte size of
> it may (and eventually will) vary per platform or next generation
> of same platform. Also if two of above are exactly same type on current
> platform then it is wrong to assume that these two are same on some other
> platform (unlike 'int' and 'int_fast16_t' that must be the same in all
> platforms where 'int_fast16_t' is defined). That all *must* be known of
> course to everyone wanting to use C++ but what is the treatment?
>
> Might be you expect too much of the types of <cstdint>. Exact size of
> type matters for example on case of binary exchange and storage formats.
> Actually I don't know of many other prominent cases. Are there any?
>
> It may be that you currently deal with some such format ... but most
> things are text these days like (JSON, XML or CSV). It is narrow
> fraction of works where we want to squeeze last out and pick a binary
> format.
>
> Fixing physical byte size of types that participate in binary format
> is unfortunately not the solution. Alignment and endianness are still
> to be taken care of. Also since we took it for to squeeze everything
> out of our bandwidth then why not to compress it even further? When
> a value is in range 0..1000 then why not to use only 10 bits for it?
> Well-made bit-packing can be faster than 'memcpy' on modern platforms
> thanks to less cache misses.
>
> A buffer of 'uint8_t' is therefore pretty much the best underlying
> type for a portable binary format. Where 'uint8_t' exists at all there
> it is precisely 'unsigned char' so 'uint8_t' can be considered just a
> shorthand for 'unsigned char'.

Because the size and value range of 'int' can vary from platform to 
platform, 'int' is inherently unsafe and non-portable.  Use <cstdint> 
typedefs instead.

/Flibble


Mr Flibble, 6/27/2015 9:19:53 PM
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk> wrote:
> On 27/06/2015 07:27, Öö Tiib wrote:
> > On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> >> On 26/06/2015 21:31, JiiPee wrote:
> >>> On 26/06/2015 20:39, Mr Flibble wrote:
> >>>> ... #include <cstdint> instead!
> >>>>
> >>> you mean using int16_t? why is that?
> >>> int is the fastest integer so why would i use something else?
> >>
> >> Because 'int' is both unsafe and non-portable.  If you want the fastest
> >> integer that is at least a certain size then there is a <cstdint>
> >> typedef for that.
> >
> > Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
> > I don't think that the latter is in any way safer or more portable than
> > the former.

> Because use of the typedef means you are explicitly documenting in code 
> that the size of this type can vary and should be treated as such.

That's a wrong expectation. For people that do know that 'int'
has a rather limited range it's just visual noise. And those
that don't know will just be puzzled what this 'magical incan-
tation' is supposed to mean and move on, not any more educated
than before.

This isn't a problem that can be solved by some coding conven-
tion (that most won't even understand what it's all about). All
that this will achieve is that a few adopt it in a cargo-cult
programming way, i.e. using 'int_fast16_t' instead of 'int' be-
cause it looks "cool" - but they will still use it to store
too large values - which, in nearly all cases, is going to work
perfectly well on their systems. Without real hard, non-off-
switchable compiler and runtime range checking you can't
stop that kind of behaviour.

Just because you know what the '16' in 'int_fast16_t' means in
no way guarantees that others also do. For those that don't it's
not any more "explicitly documenting" than a mere 'int'. I've
seen too many people, including some that had worked for "im-
portant companies" in Silicon Valley, that had no idea at all
of any portability issues - they had never even heard of the
existence of architectures with other ranges for ints, endian-
nesses or, God beware, chars with more than 8 bits or
alignment issues etc. (admittedly, they mostly came from a
Java background, so they had been living in a "padded cell"
where they couldn't hurt themselves too much;-).

<cstdint> is an extremely useful addition when you need types
of guaranteed sizes, especially when doing low level stuff,
but it's not a silver bullet to eradicate unawareness of por-
tability issues. For that people have to become aware of the
mere existence of such problems first. Only then the stuff
from <cstdint> becomes meaningful. Just making them write

  #include <cstdint>
  ...
  for ( int_fast16_t i = 0; i < 5; ++i )
  {...}

instead of, more simply,

  for ( int i = 0; i < 5; ++i )
  {...}

won't fix any such problems because they'll also do

  #include <cstdint>
  ...
  for ( int_fast16_t i = 99990; i < 100000; ++i )
  {...}

and get away with it...

On the other hand, making unsubstantiated claims that the
use of 'int' would be inherently "unsafe" or "non-portable"
makes you look a bit "strange" to those that have used
'int' safely and in a portable manner for a quarter of
a century.
                        Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
Jens Thoms Toerring, 6/27/2015 10:05:08 PM
On Saturday, June 27, 2015 at 4:17:38 PM UTC-5, Öö Tiib wrote:
> On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
> > On 27/06/2015 07:27, Öö Tiib wrote:
> > > On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> > >> On 26/06/2015 21:31, JiiPee wrote:
> > >>> On 26/06/2015 20:39, Mr Flibble wrote:
> > >>>> ... #include <cstdint> instead!
> > >>>>
> > >>>> /Flibble
> > >>>
> > >>> you mean using int16_t? why is that?
> > >>> int is the fastest integer so why would i use something else?
> > >>
> > >> Because 'int' is both unsafe and non-portable.  If you want the fastest
> > >> integer that is at least a certain size then there is a <cstdint>
> > >> typedef for that.
> > >
> > > Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
> > > I don't think that the latter is in any way safer or more portable than
> > > the former.
> > 
> > Because use of the typedef means you are explicitly documenting in code 
> > that the size of this type can vary and should be treated as such.
> 
> Maybe try to elaborate it a bit more. What *is* the special treatment you
> have in mind? 
> 
> Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t', 
> 'fpos_t' (etc. just name it) in C or in C++ means that the byte size of
> it may (and eventually will) vary per platform or next generation
> of same platform. Also if two of above are exactly same type on current
> platform then it is wrong to assume that these two are same on some other
> platform (unlike 'int' and 'int_fast16_t' that must be the same in all 
> platforms where 'int_fast16_t' is defined). That all *must* be known of
> course to everyone wanting to use C++ but what is the treatment?
> 
> Might be you expect too much of the types of <cstdint>. Exact size of
> type matters for example on case of binary exchange and storage formats.
> Actually I don't know of many other prominent cases. Are there any?
> 
> It may be that you currently deal with some such format ... but most 
> things are text these days like (JSON, XML or CSV). It is narrow 
> fraction of works where we want to squeeze last out and pick a binary
> format.

There are a lot of games that use binary formats.  Scientific
and medical applications often use binary formats.  Also I 
posted an article here a few weeks ago about bandwidth hitting 
a wall.  This isn't it, but it's related:

https://royalsociety.org/events/2015/05/communication-networks/


Brian
Ebenezer Enterprises - So far G-d has helped us.
http://webEbenezer.net
Brian (woodbrian77), 6/27/2015 10:21:57 PM
On Sunday, 28 June 2015 00:19:52 UTC+3, Mr Flibble  wrote:
> On 27/06/2015 22:17, Öö Tiib wrote:
> > On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
> >> On 27/06/2015 07:27, Öö Tiib wrote:
> >>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> >>>> On 26/06/2015 21:31, JiiPee wrote:
> >>>>> On 26/06/2015 20:39, Mr Flibble wrote:
> >>>>>> ... #include <cstdint> instead!
> >>>>>>
> >>>>>> /Flibble
> >>>>>
> >>>>> you mean using int16_t? why is that?
> >>>>> int is the fastest integer so why would i use something else?
> >>>>
> >>>> Because 'int' is both unsafe and non-portable.  If you want the fastest
> >>>> integer that is at least a certain size then there is a <cstdint>
> >>>> typedef for that.
> >>>
> >>> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
> >>> I don't think that the latter is in any way safer or more portable than
> >>> the former.
> >>
> >> Because use of the typedef means you are explicitly documenting in code
> >> that the size of this type can vary and should be treated as such.
> >
> > Maybe try to elaborate it a bit more. What *is* the special treatment you
> > have in mind?
> >
> > Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t',
> > 'fpos_t' (etc. just name it) in C or in C++ means that the byte size of
> > it may (and eventually will) vary per platform or next generation
> > of same platform. Also if two of above are exactly same type on current
> > platform then it is wrong to assume that these two are same on some other
> > platform (unlike 'int' and 'int_fast16_t' that must be the same in all
> > platforms where 'int_fast16_t' is defined). That all *must* be known of
> > course to everyone wanting to use C++ but what is the treatment?
> >
> > Might be you expect too much of the types of <cstdint>. Exact size of
> > type matters for example on case of binary exchange and storage formats.
> > Actually I don't know of many other prominent cases. Are there any?
> >
> > It may be that you currently deal with some such format ... but most
> > things are text these days like (JSON, XML or CSV). It is narrow
> > fraction of works where we want to squeeze last out and pick a binary
> > format.
> >
> > Fixing physical byte size of types that participate in binary format
> > is unfortunately not the solution. Alignment and endianness are still
> > to be taken care of. Also since we took it for to squeeze everything
> > out of our bandwidth then why not to compress it even further? When
> > a value is in range 0..1000 then why not to use only 10 bits for it?
> > Well-made bit-packing can be faster than 'memcpy' on modern platforms
> > thanks to less cache misses.
> >
> > A buffer of 'uint8_t' is therefore pretty much the best underlying
> > type for a portable binary format. Where 'uint8_t' exists at all there
> > it is precisely 'unsigned char' so 'uint8_t' can be considered just a
> > shorthand for 'unsigned char'.
> 
> Because the size and value range of 'int' can vary from platform to 
> platform, 'int' is inherently unsafe and non-portable.  Use <cstdint> 
> typedefs instead.

Use for what? Stop repeating those pointless round silver bullet dogmas. 

Value range and byte size of types from <cstddef> can and do vary
from platform to platform but not in sync with 'int'. These are 
therefore more safe and more portable for typical usages.

If someone used 'int' instead of 'ptrdiff_t' or 'size_t' then it 
is indeed a mild defect but it cannot be repaired by using anything
from <cstdint>. 

Öö Tiib, 6/27/2015 11:42:31 PM
On Sunday, 28 June 2015 01:22:07 UTC+3, woodb...@gmail.com  wrote:
> On Saturday, June 27, 2015 at 4:17:38 PM UTC-5, Öö Tiib wrote:
> > On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
> > > On 27/06/2015 07:27, Öö Tiib wrote:
> > > > On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> > > >> On 26/06/2015 21:31, JiiPee wrote:
> > > >>> On 26/06/2015 20:39, Mr Flibble wrote:
> > > >>>> ... #include <cstdint> instead!
> > > >>>>
> > > >>>> /Flibble
> > > >>>
> > > >>> you mean using int16_t? why is that?
> > > >>> int is the fastest integer so why would i use something else?
> > > >>
> > > >> Because 'int' is both unsafe and non-portable.  If you want the fastest
> > > >> integer that is at least a certain size then there is a <cstdint>
> > > >> typedef for that.
> > > >
> > > > Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
> > > > I don't think that the latter is in any way safer or more portable than
> > > > the former.
> > > 
> > > Because use of the typedef means you are explicitly documenting in code 
> > > that the size of this type can vary and should be treated as such.
> > 
> > Maybe try to elaborate it a bit more. What *is* the special treatment you
> > have in mind? 
> > 
> > Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t', 
> > 'fpos_t' (etc. just name it) in C or in C++ means that the byte size of
> > it may (and eventually will) vary per platform or next generation
> > of same platform. Also if two of above are exactly same type on current
> > platform then it is wrong to assume that these two are same on some other
> > platform (unlike 'int' and 'int_fast16_t' that must be the same in all 
> > platforms where 'int_fast16_t' is defined). That all *must* be known of
> > course to everyone wanting to use C++ but what is the treatment?
> > 
> > Might be you expect too much of the types of <cstdint>. Exact size of
> > type matters for example on case of binary exchange and storage formats.
> > Actually I don't know of many other prominent cases. Are there any?
> > 
> > It may be that you currently deal with some such format ... but most 
> > things are text these days like (JSON, XML or CSV). It is narrow 
> > fraction of works where we want to squeeze last out and pick a binary
> > format.
> 
> There are a lot of games that use binary formats.  Scientific
> and medical applications often use binary formats.

All programs that use sounds or images technically use binary formats
but those are abstracted far under some low level API from programmers.
I did not mean that.
Öö Tiib, 6/27/2015 11:47:29 PM
On 6/27/2015 1:01 PM, Öö Tiib wrote:
> On Saturday, 27 June 2015 17:06:08 UTC+3, BGB  wrote:
>> On 6/27/2015 8:46 AM, Stefan Ram wrote:
>>> BGB <cr88192@hotmail.com> writes:
>>>> written specifically for said microcontroller (there not being enough
>>>> RAM or ROM space to try to put any generic code on it
>>>
>>>     Templates that aren't used aren't instantiated.
>>>     Templates can thus generate less code than non-templates!
>>>
>>>     Most problems with »template code bloat« arise from
>>>     misunderstandings of template rules or improper use
>>>     of templates.
>>>
>>
>> errm, wasn't talking about templates, but using 'generic' in the general
>> sense, as in, 'general purpose' code.
>
> Then it is a straw man in the C++ sense. We lately tend to have less and
> less general purpose non-template code. Popular C++ libraries tend to
> move towards templates and header only.
>
> The ones that don't do it lose in popularity because of the unneeded
> binary bloat, intrusive interface, not doing everything that is
> possible compile-time and failing to generate competitive binaries.
>
> The libraries that may survive that trend longer are in-house libraries
> but those are not general purpose anyway.
>

I wasn't speaking either for or against templates.


>> because of the typically small ROM space, most of the code needs to be
>> rather specific to the project.
>>
>> it depends on the microcontroller though how much ROM space there is,
>> but typically it is in the kB range for 16-bit devices (32kB to 128kB is
>> fairly typical).
>
> If the controller has 2048 bytes of ROM or under then it does only
> very few things. Even in assembler it can't take a lot of weeks to
> implement what it does. 32kB however can be quite a lot of C++. Templates
> (that is what we mean with "generic" in C++) are a powerful tool on that
> level.
>

ok.

but, yeah, it depends a lot on the MCU.

BGB, 6/28/2015 1:46:14 AM
On 27/06/2015 23:05, Jens Thoms Toerring wrote:
> Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk> wrote:
>> On 27/06/2015 07:27, Öö Tiib wrote:
>>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
>>>> On 26/06/2015 21:31, JiiPee wrote:
>>>>> On 26/06/2015 20:39, Mr Flibble wrote:
>>>>>> ... #include <cstdint> instead!
>>>>>>
>>>>> you mean using int16_t? why is that?
>>>>> int is the fastest integer so why would i use something else?
>>>>
>>>> Because 'int' is both unsafe and non-portable.  If you want the fastest
>>>> integer that is at least a certain size then there is a <cstdint>
>>>> typedef for that.
>>>
>>> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
>>> I don't think that the latter is in any way safer or more portable than
>>> the former.
>
>> Because use of the typedef means you are explicitly documenting in code
>> that the size of this type can vary and should be treated as such.
>
> That's a wrong expectation. For people that do know that 'int'
> has a rather limited range it's just visual noise. And those
> that don't know will just be puzzled what this 'magical incan-
> tation' is supposed to mean and move on, not any more educated
> than before.
>
> This isn't a problem that can be solved by some coding conven-
> tion (that most won't even understand what it's all about). All
> that this will achieve is that a few adopt it in a cargo-cult
> programming way, i.e. using 'int_fast16_t' instead of 'int' be-
> cause it looks "cool" - but they will still use it to store
> too large values - which, in nearly all cases, is going to work
> perfectly well on their systems. Without real hard, non-off-
> switchable compiler and runtime range checking you can't
> stop that kind of behaviour.
>
> Just because you know what the '16' in 'int_fast16_t' means in
> no way guarantees that others also do. For those that don't it's
> not any more "explicitly documenting" than a mere 'int'. I've
> seen too many people, including some that had worked for "im-
> portant companies" in Silicon Valley, that had no idea at all
> of any portability issues - they had never even heard of the
> existence of architectures with other ranges for ints, endian-
> nesses or, God beware, chars with more than 8 bits or
> alignment issues etc. (admittedly, they mostly came from a
> Java background, so they had been living in a "padded cell"
> where they couldn't hurt themselves too much;-).
>
> <cstdint> is an extremely useful addition when you need types
> of guaranteed sizes, especially when doing low level stuff,
> but it's not a silver bullet to eradicate unawareness of por-
> tability issues. For that people have to become aware of the
> mere existence of such problems first. Only then the stuff
> from <cstdint> becomes meaningful. Just making them write
>
>    #include <cstdint>
>    ...
>    for ( int_fast16_t i = 0; i < 5; ++i )
>    {...}
>
> instead of, more simply,
>
>    for ( int i = 0; i < 5; ++i )
>    {...}
>
> won't fix any such problems because they'll also do
>
>    #include <cstdint>
>    ...
>    for ( int_fast16_t i = 99990; i < 100000; ++i )
>    {...}
>
> and get away with it...
>
> On the other hand, making unsubstantiated claims that the
> use of 'int' would be inherently "unsafe" or "non-portable"
> makes you look a bit "strange" to those that have used
> 'int' safely and in a portable manner for a quarter of
> a century.

My claims are not unsubstantiated: see the MISRA C++ coding standard for 
safety critical systems: it bans the use of 'int' BECAUSE it is 
inherently non-portable and therefore unsafe.  Use the typedefs from 
<cstdint> instead; DO NOT use 'int'.

/Flibble

Mr Flibble, 6/28/2015 2:12:07 AM
On 6/27/2015 4:17 PM, Öö Tiib wrote:
> On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
>> On 27/06/2015 07:27, Öö Tiib wrote:
>>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
>>>> On 26/06/2015 21:31, JiiPee wrote:
>>>>> On 26/06/2015 20:39, Mr Flibble wrote:
>>>>>> ... #include <cstdint> instead!
>>>>>>
>>>>>> /Flibble
>>>>>
>>>>> you mean using int16_t? why is that?
>>>>> int is the fastest integer so why would i use something else?
>>>>
>>>> Because 'int' is both unsafe and non-portable.  If you want the fastest
>>>> integer that is at least a certain size then there is a <cstdint>
>>>> typedef for that.
>>>
>>> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
>>> I don't think that the latter is in any way safer or more portable than
>>> the former.
>>
>> Because use of the typedef means you are explicitly documenting in code
>> that the size of this type can vary and should be treated as such.
>
> Maybe try to elaborate it a bit more. What *is* the special treatment you
> have in mind?
>
> Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t',
> 'fpos_t' (etc. just name it) in C or in C++ means that the byte size of
> it may (and eventually will) vary per platform or next generation
> of same platform. Also if two of above are exactly same type on current
> platform then it is wrong to assume that these two are same on some other
> platform (unlike 'int' and 'int_fast16_t' that must be the same in all
> platforms where 'int_fast16_t' is defined). That all *must* be known of
> course to everyone wanting to use C++ but what is the treatment?
>
> Might be you expect too much of the types of <cstdint>. Exact size of
> type matters for example on case of binary exchange and storage formats.
> Actually I don't know of many other prominent cases. Are there any?
>
> It may be that you currently deal with some such format ... but most
> things are text these days like (JSON, XML or CSV). It is narrow
> fraction of works where we want to squeeze last out and pick a binary
> format.
>
> Fixing physical byte size of types that participate in binary format
> is unfortunately not the solution. Alignment and endianness are still
> to be taken care of. Also since we took it for to squeeze everything
> out of our bandwidth then why not to compress it even further? When
> a value is in range 0..1000 then why not to use only 10 bits for it?
> Well-made bit-packing can be faster than 'memcpy' on modern platforms
> thanks to less cache misses.
>

bit-streams and variable-length numbers can also be useful, but are more 
painful to work with.

bit-packing and byte-oriented formats can also do pretty good if done 
well. though, yes, the code will need to take care of things like endianness.

sometimes there are tradeoffs, like one format which could have 
technically been a little more space-efficient with differentially-coded 
variable-width numbers for a table, but then needing to decode the table 
prior to being able to use it. I instead used a design where the field 
was 16/24/32 bits wide (for the whole table) depending on what was 
needed for the target-index, but the fixed-width entries allow for 
random access to the table.
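
[a rough sketch of the width-selection idea described above; the function
name is made up:]

  #include <cstdint>

  // pick one fixed entry width for the whole table based on the largest
  // offset it must hold; fixed-width entries keep random access O(1)
  inline unsigned table_entry_bytes(std::uint32_t max_offset)
  {
      if (max_offset < (1u << 16)) return 2;  // 16-bit entries
      if (max_offset < (1u << 24)) return 3;  // 24-bit entries
      return 4;                               // 32-bit entries
  }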

this sort of thing has in a few cases pointed towards arguably 
old/archaic solutions, like using codepages (particularly 1252 and 437) 
in a few cases due to fixed-width characters being cheaper for random 
access than UTF-8, but less overly expensive vs UTF-16, and sufficient 
in the vast majority of cases (and are the cheapest option in cases when 
they are an option). note that this would be done per-string.


> A buffer of 'uint8_t' is therefore pretty much the best underlying
> type for a portable binary format. Where 'uint8_t' exists at all there
> it is precisely 'unsigned char' so 'uint8_t' can be considered just a
> shorthand for 'unsigned char'.
>

yeah.

pretty much every data format comes down to bytes.

BGB, 6/28/2015 2:21:11 AM
On 6/27/15 10:12 PM, Mr Flibble wrote:
>
> My claims are not unsubstantiated: see the MISRA C++ coding standard for
> safety critical systems: it bans the use of 'int' BECAUSE it is
> inherently non-portable and therefore unsafe.  Use the typedefs from
> <cstdint> instead; DO NOT use 'int'.
>
> /Flibble
>

They also totally banned the use of 'char', including for holding text 
strings. They later realized the silliness of that and changed it, but 
that shows they are not infallible.
Richard Damon, 6/28/2015 1:40:34 PM
On Sunday, 28 June 2015 05:25:39 UTC+3, BGB  wrote:
> On 6/27/2015 4:17 PM, Öö Tiib wrote:
> > On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
> >> On 27/06/2015 07:27, Öö Tiib wrote:
> >>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> >>>> On 26/06/2015 21:31, JiiPee wrote:
> >>>>> On 26/06/2015 20:39, Mr Flibble wrote:
> >>>>>> ... #include <cstdint> instead!
> >>>>>>
> >>>>>> /Flibble
> >>>>>
> >>>>> you mean using int16_t? why is that?
> >>>>> int is the fastest integer so why would i use something else?
> >>>>
> >>>> Because 'int' is both unsafe and non-portable.  If you want the fastest
> >>>> integer that is at least a certain size then there is a <cstdint>
> >>>> typedef for that.
> >>>
> >>> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
> >>> I don't think that the latter is in any way safer or more portable than
> >>> the former.
> >>
> >> Because use of the typedef means you are explicitly documenting in code
> >> that the size of this type can vary and should be treated as such.
> >
> > Maybe try to elaborate it a bit more. What *is* the special treatment you
> > have in mind?
> >
> > Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t',
> > 'fpos_t' (etc. just name it) in C or in C++ means that the byte size of
> > it may (and eventually will) vary per platform or next generation
> > of same platform. Also if two of above are exactly same type on current
> > platform then it is wrong to assume that these two are same on some other
> > platform (unlike 'int' and 'int_fast16_t' that must be the same in all
> > platforms where 'int_fast16_t' is defined). That all *must* be known of
> > course to everyone wanting to use C++ but what is the treatment?
> >
> > Might be you expect too much of the types of <cstdint>. Exact size of
> > type matters for example on case of binary exchange and storage formats.
> > Actually I don't know of many other prominent cases. Are there any?
> >
> > It may be that you currently deal with some such format ... but most
> > things are text these days like (JSON, XML or CSV). It is narrow
> > fraction of works where we want to squeeze last out and pick a binary
> > format.
> >
> > Fixing physical byte size of types that participate in binary format
> > is unfortunately not the solution. Alignment and endianness are still
> > to be taken care of. Also since we took it for to squeeze everything
> > out of our bandwidth then why not to compress it even further? When
> > a value is in range 0..1000 then why not to use only 10 bits for it?
> > Well-made bit-packing can be faster than 'memcpy' on modern platforms
> > thanks to less cache misses.
> 
> bit-streams and variable-length numbers can also be useful, but are more 
> painful to work with.
> 
> bit-packing and byte-oriented formats can also do pretty good if done 
> well. though, yes, the code will need to take care of things like endianness.

We have to take care of endianness in any portable format. It is 
particularly funny since neither C nor C++ contains a standard way of 
detecting endianness at compile time. There are some libraries that use
every known non-standard way for that and so produce minimal code.
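
[At the time, a runtime check was the portable fallback; C++20 later added
std::endian in <bit> for a standard compile-time answer. A small sketch of
the runtime check:]

  #include <cstdint>
  #include <cstring>

  // true on little-endian hosts; decided at run time, not compile time
  inline bool is_little_endian()
  {
      const std::uint32_t one = 1;
      std::uint8_t first;
      std::memcpy(&first, &one, 1);
      return first == 1;
  }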

> sometimes there are tradeoffs, like one format which could have 
> technically been a little more space-efficient with differentially-coded 
> variable-width numbers for a table, but then needing to decode the table 
> prior to being able to use it. I instead used a design where the field 
> was 16/24/32 bits wide (for the whole table) depending on what was 
> needed for the target-index, but the fixed-width entries allow for 
> random access to the table.

It indeed depends on anticipated amounts of data if it is worth that
or not but if we know that a table contains only for example values 0..11
then it feels reasonable at least to consider 4 bit wide entries. The
processors crunch numbers at ungodly speeds but it is 4 times shorter
table than one with 16 bit wide entries.
> this sort of thing has in a few cases pointed towards arguably 
> old/archaic solutions, like using codepages (particularly 1252 and 437) 
> in a few cases due to fixed-width characters being cheaper for random 
> access than UTF-8, but less overly expensive vs UTF-16, and sufficient 
> in the vast majority of cases (and are the cheapest option in cases when 
> they are an option). note that this would be done per-string.

How did we somehow move now from the int32_t of Leigh to binary
representations of text? :) But that is an interesting topic as well. Have
you noticed that with text the need for random access of contents is
actually a rather rare case?  OTOH storage for texts can be significant if
there are a lot of texts or a lot of translations. A number of PC programs
let you download and install translations separately or optionally.

Above can't be done with embedded system so easily since it can affect
price of unit to organize flashing the very same product differently for
each market targeted. When access is sequential then polished Huffman
decoding actually rarely affects performance. So I have seen embedded
systems keeping the text dictionaries Huffman encoded all the time. If
texts are kept Huffman encoded anyway then UCS-2 or UTF-16 are perfectly
fine and there is no need for archaic tricks like Windows-1252 or Code
Page 437.

Öö Tiib, 6/28/2015 1:43:29 PM
On Sun, 28 Jun 2015 06:43:29 -0700 (PDT), Öö Tiib wrote:

>We have to take care of endianness in any portable format. 

I don't agree: for the same unsigned type of the same size,
+, - and * would give the same results, despite endianness;
'number & 8' would be the same, so it would be OK for | and 'not'
too, etc.

where is the problem with the endianness of the number?

Rosario19, 6/28/2015 1:53:28 PM
On Sunday, 28 June 2015 16:40:45 UTC+3, Richard Damon  wrote:
> On 6/27/15 10:12 PM, Mr Flibble wrote:
> >
> > My claims are not unsubstantiated: see the MISRA C++ coding standard for
> > safety critical systems: it bans the use of 'int' BECAUSE it is
> > inherently non-portable and therefore unsafe.  Use the typedefs from
> > <cstdint> instead; DO NOT use 'int'.
> 
> They also totally banned the use of 'char', including for holding text 
> strings. They later realized the sillyness of that and changed it, but 
> that shows they are not infallible.

Argument from authority is logically fallacious anyway, and MISRA isn't
on the list of authorities for me.

My feeling about the MISRA C++ coding standard was that it had some
interesting points and so was worth reading, but it also contained
several apparent controversies that were not thought through properly,
so it cannot be used in practice without fixing it.
Öö Tiib, 6/28/2015 1:54:40 PM
On Sunday, 28 June 2015 16:53:35 UTC+3, Rosario19  wrote:
> On Sun, 28 Jun 2015 06:43:29 -0700 (PDT), Öö Tiib wrote:
> 
> >We have to take care of endianness in any portable format. 
> 
> I don't agree: for the same unsigned type of the same size,
> +, - and * would give the same results, despite endianness;
> 'number & 8' would be the same, so it would be OK for | and 'not'
> too, etc.
> 
> where is the problem with the endianness of the number?

Rosario, we did talk about keeping data in some portable binary
format. We have portable binary formats to achieve that one
computer saves the data to disk or sends it over the internet and
another computer reads or receives it, and both understand it in
the same way.

Different computers may keep the numbers in their own memory with
different endianness. So when a computer reads or writes the bytes
of a binary format then it must take care that those are ordered
correctly. That is what we call taking care about endianness in a
portable format.
Öö Tiib, 6/28/2015 2:10:37 PM
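
[A minimal sketch of that care being taken when writing: the bytes of a
32-bit value are stored in a fixed little-endian order, independent of the
host's byte order; the function name is made up:]

  #include <cstdint>

  // store a 32-bit value least-significant byte first, regardless of
  // the byte order of the machine doing the writing
  inline void put_u32_le(std::uint8_t* out, std::uint32_t v)
  {
      out[0] = static_cast<std::uint8_t>(v);
      out[1] = static_cast<std::uint8_t>(v >> 8);
      out[2] = static_cast<std::uint8_t>(v >> 16);
      out[3] = static_cast<std::uint8_t>(v >> 24);
  }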
On 6/28/2015 8:43 AM, Öö Tiib wrote:
> On Sunday, 28 June 2015 05:25:39 UTC+3, BGB  wrote:
>> On 6/27/2015 4:17 PM, Öö Tiib wrote:
>>> On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
>>>> On 27/06/2015 07:27, Öö Tiib wrote:
>>>>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
>>>>>> On 26/06/2015 21:31, JiiPee wrote:
>>>>>>> On 26/06/2015 20:39, Mr Flibble wrote:
>>>>>>>> ... #include <cstdint> instead!
>>>>>>>>
>>>>>>>> /Flibble
>>>>>>>
>>>>>>> you mean using int16_t? why is that?
>>>>>>> int is the fastest integer so why would i use something else?
>>>>>>
>>>>>> Because 'int' is both unsafe and non-portable.  If you want the fastest
>>>>>> integer that is at least a certain size then there is a <cstdint>
>>>>>> typedef for that.
>>>>>
>>>>> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
>>>>> I don't think that the latter is in any way safer or more portable than
>>>>> the former.
>>>>
>>>> Because use of the typedef means you are explicitly documenting in code
>>>> that the size of this type can vary and should be treated as such.
>>>
>>> Maybe try to elaborate it a bit more. What *is* the special treatment you
>>> have in mind?
>>>
>>> Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t',
>>> 'fpos_t' (etc. just name it) in C or in C++ means that the byte size of
>>> it may (and eventually will) vary per platform or next generation
>>> of same platform. Also if two of above are exactly same type on current
>>> platform then it is wrong to assume that these two are same on some other
>>> platform (unlike 'int' and 'int_fast16_t' that must be the same in all
>>> platforms where 'int_fast16_t' is defined). That all *must* be known of
>>> course to everyone wanting to use C++ but what is the treatment?
>>>
>>> Might be you expect too much of the types of <cstdint>. Exact size of
>>> type matters for example on case of binary exchange and storage formats.
>>> Actually I don't know of many other prominent cases. Are there any?
>>>
>>> It may be that you currently deal with some such format ... but most
>>> things are text these days like (JSON, XML or CSV). It is narrow
>>> fraction of works where we want to squeeze last out and pick a binary
>>> format.
>>>
>>> Fixing physical byte size of types that participate in binary format
>>> is unfortunately not the solution. Alignment and endianness are still
>>> to be taken care of. Also since we took it for to squeeze everything
>>> out of our bandwidth then why not to compress it even further? When
>>> a value is in range 0..1000 then why not to use only 10 bits for it?
>>> Well-made bit-packing can be faster than 'memcpy' on modern platforms
>>> thanks to less cache misses.
>>
>> bit-streams and variable-length numbers can also be useful, but are more
>> painful to work with.
>>
>> bit-packing and byte-oriented formats can also do pretty good if done
>> well. though, yes, the code will need to take care of things like endianness.
>
> We have to take care of endianness in any portable format. It is
> particularly funny since neither C nor C++ contains a standard way of
> detecting endianness at compile time. There are some libraries that use
> every known non-standard way for that and so produce minimal code.
>

yeah.

I prefer fixed endianness formats, personally.

granted, probably most everyone else does as well, as most formats are 
this way. just a few formats exist where the endianness depends on 
whichever computer saved the file, with magic numbers to detect when 
swapping is needed. I find these annoying.


some of my tools (script language and C compiler, as an extension) have 
the ability to specify the endianness for variables and pointer types 
(so, you can be sure the value is stored as either big or little endian, 
regardless of native endianness), and this implicitly also makes them 
safe for misaligned loads/stores.

namely, if you know a value needs to be a particular way, then chances 
are the person is willing to pay whatever CPU cycles are needed to make 
it that way.


typically, for compile-time stuff, there is a mess of #define's and 
#ifdef's for figuring out the target architecture and other things, 
picking appropriate type-sizes and setting values for things like 
whether or not the target supports misaligned access, ...

I guess it could be nicer if more of this were standardized.


>> sometimes there are tradeoffs, like one format which could have
>> technically been a little more space-efficient with differentially-coded
>> variable-width numbers for a table, but then needing to decode the table
>> prior to being able to use it. I instead used a design where the field
>> was 16/24/32 bits wide (for the whole table) depending on what was
>> needed for the target-index, but the fixed-width entries allow for
>> random access to the table.
>
> It indeed depends on anticipated amounts of data if it is worth that
> or not but if we know that a table contains only for example values 0..11
> then it feels reasonable at least to consider 4 bit wide entries. The
> processors crunch numbers at ungodly speeds but it is 4 times shorter
> table than one with 16 bit wide entries.
>

could be, but the table entries in this case were fairly unlikely to be 
that much below 16 bits (so 8 bit or smaller would not have been useful).

these were basically offsets within a TLV lump. where you would have one 
TLV lump which contains lots of payload data (as an array of 
variable-sized sub-lumps packed end-to-end), and a table to say where 
each sub-lump is within that bigger lump (to better allow random access).

in most of the cases, the lumps were in the kB range or maybe up to a few 
MB, so 16 or 24 bits are the most likely cases, and always using 32 bits 
"to be safe" would be a waste.

it would be fairly easy to compress them further, but this would require 
decoding them before they could be used, which was undesirable in this case.


>> this sort of things has in a few cases pointed towards arguably
>> old/archaic solutions, like using codepages (particularly 1252 and 437)
>> in a few cases due to fixed-width characters being cheaper for random
>> access than UTF-8, but less overly expensive vs UTF-16, and sufficient
>> in the vast majority of cases (and are the cheapest option in cases when
>> they are an option). note that this would be done per-string.
>
> How did we somehow move now from the int32_t of Leigh to binary
> representations of text? :) But that is an interesting topic as well. Have
> you noticed that with text the need for random access of contents is
> actually a rather rare case?  OTOH storage for texts can be significant if
> there are a lot of texts or a lot of translations. A number of PC programs
> let you download and install translations separately or optionally.
>

yeah, probably should have been clearer.

this was for string literals/values in a VM.

in the predecessor VM, M-UTF-8 had been used for all the string literals 
(except the UTF-16 ones), which mostly worked (since direct 
per-character access is fairly rare), but it meant doing something like 
"str[idx]" would take 'O(n)' time (and looping over a string 
per-character would be O(n^2)...).

in the use-case for the new VM, I wanted O(1) access here (mostly to 
make things more predictable, *), but also didn't want the nearly pure 
waste that is UTF-16 strings.

however, the language in question uses UTF-16 as its logical model (so, 
from high-level code, it appears as if all strings are UTF-16). in the 
language, strings are immutable, so there is no issue with the use of 
ASCII or similar for the underlying storage.

in C, it isn't really an issue mostly as C makes no attempt to gloss 
over in-memory storage, so you can just return the raw byte values or 
similar.


*: the VM needs to be able to keep timing latencies bounded, which 
basically weighs against doing anything in the VM where the time-cost 
can't be easily predicted in advance. wherever possible, all operations 
need to be kept O(1), with the operation either being able to complete 
in the available time-step (generally 1us per "trace"), or the VM will 
need to halt and defer execution until later (blocking is not allowed, 
and any operations which may result in unexpected behaviors, such as 
halting, throwing an exception, ... effectively need to terminate the 
current trace, which makes them more expensive).

for some related reasons, the VM is also using B-Trees rather than hash 
tables in a few places (more predictable, if slower, than hashes, but 
less memory waste than AVL or BST variants). likewise, because of their 
structure, it is possible to predict in advance (based on the size of 
the tree) approximately how long it will take to perform the operation.


> Above can't be done with embedded system so easily since it can affect
> price of unit to organize flashing the very same product differently for
> each market targeted. When access is sequential then polished Huffman
> decoding actually rarely affects performance. So I have seen embedded
> systems keeping the text dictionaries Huffman encoded all the time. If
> texts are kept Huffman encoded anyway then UCS-2 or UTF-16 are perfectly
> fine and there is no need for archaic tricks like Windows-1252 or Code
> Page 437.
>

granted, but in this case, it is mostly for string literals, rather than 
bulk text storage.

Windows-1252 covers most general use-cases for text (and is fairly easy 
to convert to/from UTF-16, as for most of the range the characters map 1:1).
CP-437 is good mostly for things like ASCII art and text-based UIs.
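
a minimal sketch of that conversion (values from the standard CP-1252 
mapping; only bytes 0x80-0x9F need a table, everything else maps 1:1):

#include <stdint.h>

static const uint16_t cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* convert one CP-1252 byte to its UTF-16 code unit */
uint16_t cp1252_to_utf16(unsigned char b)
{
    return (b >= 0x80 && b <= 0x9F) ? cp1252_high[b - 0x80] : (uint16_t)b;
}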

for literals, it will be the job of the compiler to sort out which 
format to use.


bulk storage will tend to remain in compressed UTF-8.
though a more specialized format could be good.

I had good results before compressing short fragments (such as character 
strings) with a combination of LZ77 and MTF+Rice Coding, which for small 
pieces of data did significantly better than Deflate or LZMA. however, 
the MTF makes it slower per-character than a Huffman-based option.

basically, options like Deflate or LZMA are largely ineffective for 
payloads much under 200-500 bytes or so, but are much more effective as 
payloads get bigger.

0
BGB
6/28/2015 5:17:29 PM
On 26/06/2015 20:39, Mr Flibble wrote:
> ... #include <cstdint> instead!
>
> /Flibble

Just watching Scott Meyers' videos. He also seems to always use:
int a = 9;

not
int_fast32_t a = 9;

if int were wrong, surely they would not teach people to use int, right? :)
0
JiiPee
6/28/2015 7:01:26 PM
On 28/06/2015 20:01, JiiPee wrote:
> On 26/06/2015 20:39, Mr Flibble wrote:
>> ... #include <cstdint> instead!
>>
>> /Flibble
>
> Just watching Scott Meyers' videos. He also seems to always use:
> int a = 9;
>
> not
> int_fast32_t a = 9;
>
> if int were wrong, surely they would not teach people to use int, right? :)

also note that Scott recommends using
auto a = 9;

so using auto, so letting the computer decide the type!!!
What will you say about this??? :)

0
JiiPee
6/28/2015 7:15:54 PM
On Saturday, June 27, 2015 at 6:47:38 PM UTC-5, Öö Tiib wrote:
> On Sunday, 28 June 2015 01:22:07 UTC+3, woodb...@gmail.com  wrote:
> > On Saturday, June 27, 2015 at 4:17:38 PM UTC-5, Öö Tiib wrote:
> > > On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
> > > > On 27/06/2015 07:27, Öö Tiib wrote:
> > > > > On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> > > > >> On 26/06/2015 21:31, JiiPee wrote:
> > > > >>> On 26/06/2015 20:39, Mr Flibble wrote:
> > > > >>>> ... #include <cstdint> instead!
> > > > >>>>
> > > > >>>> /Flibble
> > > > >>>
> > > > >>> you mean using int16_t? why is that?
> > > > >>> int is the fastest integer so why would i use something else?
> > > > >>
> > > > >> Because 'int' is both unsafe and non-portable.  If you want the fastest
> > > > >> integer that is at least a certain size then there is a <cstdint>
> > > > >> typedef for that.
> > > > >
> > > > > Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
> > > > > I don't think that latter is anyway safer or more portable than
> > > > > former.
> > > >
> > > > Because use of the typedef means you are explicitly documenting in code
> > > > that the size of this type can vary and should be treated as such.
> > >
> > > Maybe try to elaborate it a bit more. What *is* the special treatment you
> > > have in mind?
> > >
> > > Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t',
> > > 'fpos_t' (etc., just name it) in C or in C++ means that the byte size of
> > > it may (and eventually will) vary per platform or next generation
> > > of same platform. Also if two of the above are exactly the same type on the
> > > current platform then it is wrong to assume that these two are the same on
> > > some other platform (unlike 'int' and 'int_fast16_t', which must be the
> > > same on all platforms where 'int_fast16_t' is defined). That all *must* be
> > > known of course to everyone wanting to use C++ but what is the treatment?
> > >
> > > Might be you expect too much of the types of <cstdint>. Exact size of
> > > type matters for example in case of binary exchange and storage formats.
> > > Actually I don't know many other prominent cases. Are there any?
> > >
> > > It may be that you currently deal with some such format ... but most
> > > things are text these days (like JSON, XML or CSV). It is a narrow
> > > fraction of works where we want to squeeze the last out and pick a binary
> > > format.
> >
> > There are a lot of games that use binary formats.  Scientific
> > and medical applications often use binary formats.
>
> All programs that use sounds or images technically use binary formats
> but those are abstracted far under some low level API from programmers.
> I did not mean that.

I didn't mean that either.


Brian
Ebenezer Enterprises - In G-d we trust.
http://webEbenezer.net
0
woodbrian77
6/28/2015 7:56:15 PM
On 28-Jun-15 9:15 PM, JiiPee wrote:
>
> also note that Scott recommends using
> auto a = 9;
>
> so using auto, so letting the computer decide the type!!!
> What will you say about this??? :)

Using `auto` to declare a variable without an explicit type communicates 
well to the compiler but not to a human reader. Also it's longer to 
write and read than just `int`. And one can't generally adopt this as a 
convention, e.g. it doesn't work for a variable without an initializer, so 
it can't be enforced as a convention.

Therefore I consider it an abuse of the language.

As to why, I guess that Scott has to use all kinds of fancy features 
just to grab and keep interest from the audience. And maybe so that 
novices can ask "what's the `auto`, huh?", so that he can explain it. 
Explaining things and giving advice is after all how he makes a living.


Cheers & hth.,

- Alf

-- 
Using Thunderbird as Usenet client, Eternal September as NNTP server.
0
Alf
6/28/2015 8:13:33 PM
On 28/06/2015 21:13, Alf P. Steinbach wrote:
> On 28-Jun-15 9:15 PM, JiiPee wrote:
>>
>> also note that Scott recommends using
>> auto a = 9;
>>
>> so using auto, so letting the computer decide the type!!!
>> What will you say about this??? :)
>
> Using `auto` to declare a variable without an explicit type 
> communicates well to the compiler but not to a human reader. 

In Visual Studio you can hover the mouse and see the real type quite 
easily; it also works on other compilers.

> Also it's longer to write and read than just `int`. And one can't 
> generally adopt this as a convention, e.g. it doesn't work for a 
> variable without initializer, so it's not forced by a convention.

think about a function returning the number of elements in a 
container... you could wrongly write:
unsigned long getSize();

if it actually returns a 64-bit integer. auto would find the right type 
straight away.
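
for example (assuming a platform where unsigned long is 32 bits but
size_t is 64 bits, as on 64-bit Windows):

#include <vector>

std::vector<int> v(10);
unsigned long n1 = v.size();  // may silently narrow on such a platform
auto n2 = v.size();           // always std::vector<int>::size_type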

>
> Therefore I consider it an abuse of the language.

in many places it makes coding safer because auto always finds the 
correct type. You can get bugs by writing a wrong type... and people 
have done that, as seen on forums.

>
> As to why, I guess that Scott has to use all kinds of fancy features 
> just to grab and keep interest from the audience. And maybe so that 
> novices can ask "what's the `auto`, huh?", so that he can explain it. 
> Explaining things and giving advice is after all how he makes a living.
>

with complex types it can increase safety, because auto always gets it 
right. We might get the type wrong, which might cause bugs.

But auto is definitely not always best, even if somebody likes it.
0
JiiPee
6/28/2015 8:22:37 PM
On 28/06/2015 21:13, Alf P. Steinbach wrote:
> On 28-Jun-15 9:15 PM, JiiPee wrote:
>>
>> also note that Scott recommends using
>> auto a = 9;
>>
>> so using auto, so letting the computer decide the type!!!
>> What will you say about this??? :)
>
> Using `auto` to declare a variable without an explicit type 
> communicates well to the compiler but not to a human reader. Also it's 
> longer to write and read than just `int`.

but if you take an average over all types, auto would win big 
time. on average auto makes type names much shorter if all types are 
considered.
0
JiiPee
6/28/2015 8:23:52 PM
JiiPee <no@notvalid.com> wrote:
> On 28/06/2015 21:13, Alf P. Steinbach wrote:
> > On 28-Jun-15 9:15 PM, JiiPee wrote:
> >>
> >> also note that Scott recommends using
> >> auto a = 9;
> >>
> >> so using auto, so letting the computer decide the type!!!
> >> What will you say about this??? :)
> >
> > Using `auto` to declare a variable without an explicit type 
> > communicates well to the compiler but not to a human reader. 

> In Visual Studio you can hover the mouse and see the real type quite 
> easily; it also works on other compilers.

Please keep in mind that VS is an IDE with an attached compiler
(besides a lot of other things). So this won't work "on other
compilers", since a compiler is a program to compile code and
not something you can "hover over" with the mouse. You may
be surprised, but not everyone is using an IDE (for various
reasons) - or even a graphical user interface - all of the
time (and thus a mouse or something similar)...

> > Also it's longer to write and read than just `int`. And one can't 
> > generally adopt this as a convention, e.g. it doesn't work for a 
> > variable without an initializer, so it can't be enforced as a convention.

> think about a function returning the number of elements in a 
> container... you could wrongly write:
> unsigned long getSize();

> if it actually returns a 64-bit integer. auto would find the right type 
> straight away.

If you define a variable and assign to it the return value
of a function then it's relatively clear what the type will
be - it can be easily found out by looking at the function
declaration. But something like

auto a = 0;

is a bit different: you have to look very carefully at that
'0' to figure out whether this will end up being an 'int' or
perhaps something else. And it is prone to getting the wrong
type by forgetting (or mis-typing) some character after the
'0' that makes the variable have a different type. There's
definitely a readability issue.

> in many places it makes coding safer because the auto always finds the 
> correct type. You can get bugs by putting a wrong type... and people 
> have done that when reading forums.

Yes, but cases like

auto a = 0.f;

are places where this isn't the case. 'auto' is very useful
in cases like

    for ( auto it = xyz.begin(); it != xyz.end(); ++it )

instead of maybe

    for ( std::map< std::pair< int, char const * >, std::vector< std::string > >::iterator it = xyz.begin( ); it != xyz.end( ); ++it )

since you will be aware of the type of 'xyz', but

auto a = 0ull;

is different since it makes the type of 'a' hard to recognize
at a glance. And you must not forget any part of the 'ull' bit
at the end or you'll get something you never wanted and thus
don't expect. It actually creates a new class of possible bugs.

                            Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
0
jt
6/28/2015 10:33:07 PM
On Sunday, 28 June 2015 20:22:01 UTC+3, BGB  wrote:
> On 6/28/2015 8:43 AM, Öö Tiib wrote:
> > On Sunday, 28 June 2015 05:25:39 UTC+3, BGB  wrote:
> >> On 6/27/2015 4:17 PM, Öö Tiib wrote:
> >>> On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
> >>>> On 27/06/2015 07:27, Öö Tiib wrote:
> >>>>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> >>>>>> On 26/06/2015 21:31, JiiPee wrote:
> >>>>>>> On 26/06/2015 20:39, Mr Flibble wrote:
> >>>>>>>> ... #include <cstdint> instead!
> >>>>>>>>
> >>>>>>>> /Flibble
> >>>>>>>
> >>>>>>> you mean using int16_t? why is that?
> >>>>>>> int is the fastest integer so why would i use something else?
> >>>>>>
> >>>>>> Because 'int' is both unsafe and non-portable.  If you want the fastest
> >>>>>> integer that is at least a certain size then there is a <cstdint>
> >>>>>> typedef for that.
> >>>>>
> >>>>> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
> >>>>> I don't think that latter is anyway safer or more portable than
> >>>>> former.
> >>>>
> >>>> Because use of the typedef means you are explicitly documenting in code
> >>>> that the size of this type can vary and should be treated as such.
> >>>
> >>> Maybe try to elaborate it a bit more. What *is* the special treatment you
> >>> have in mind?
> >>>
> >>> Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t',
> >>> 'fpos_t' (etc., just name it) in C or in C++ means that the byte size of
> >>> it may (and eventually will) vary per platform or next generation
> >>> of same platform. Also if two of the above are exactly the same type on the
> >>> current platform then it is wrong to assume that these two are the same on
> >>> some other platform (unlike 'int' and 'int_fast16_t', which must be the
> >>> same on all platforms where 'int_fast16_t' is defined). That all *must* be
> >>> known of course to everyone wanting to use C++ but what is the treatment?
> >>>
> >>> Might be you expect too much of the types of <cstdint>. Exact size of
> >>> type matters for example in case of binary exchange and storage formats.
> >>> Actually I don't know many other prominent cases. Are there any?
> >>>
> >>> It may be that you currently deal with some such format ... but most
> >>> things are text these days (like JSON, XML or CSV). It is a narrow
> >>> fraction of works where we want to squeeze the last out and pick a binary
> >>> format.
> >>>
> >>> Fixing the physical byte size of types that participate in a binary format
> >>> is unfortunately not the solution. Alignment and endianness are still
> >>> to be taken care of. Also, since we took it to squeeze everything
> >>> out of our bandwidth, then why not compress it even further? When
> >>> a value is in range 0..1000 then why not use only 10 bits for it?
> >>> Well-made bit-packing can be faster than 'memcpy' on modern platforms
> >>> thanks to fewer cache misses.
> >>
> >> bit-streams and variable-length numbers can also be useful, but are more
> >> painful to work with.
> >>
> >> bit-packing and byte-oriented formats can also do pretty good if done
> >> well. though, yes, the code will need to take care of things like endianness.
> >
> > We have to take care of endianness in any portable format. It is
> > particularly funny since neither C nor C++ contain a standard way for
> > detecting endianness at compile time. There are some libraries that use
> > every known non-standard way for that and so produce minimal code.
> >
>
> yeah.
>
> I prefer fixed-endianness formats, personally.
>
> granted, probably most everyone else does as well, as most formats are
> this way. just a few formats exist where the endianness depends on
> whichever computer saved the file, with magic numbers to detect when
> swapping is needed. I find these annoying.
>
> some of my tools (script language and C compiler, as an extension) have
> the ability to specify the endianness for variables and pointer types
> (so, you can be sure the value is stored as either big or little endian,
> regardless of native endianness), and it implicitly also makes it safe for
> misaligned loads/stores.
>
> namely, if you know a value needs to be a particular way, then chances
> are the person is willing to pay whatever CPU cycles are needed to make
> it that way.

After we take care of size and we take care of endianness there is
still alignment to care about. ;)
Note that alignment requirements may come as a surprise to many.
Sometimes even on platforms with relaxed ones. For example one will
get a crash on Win32 because the alignment of XMVECTOR was not 16.

In a real application the code that encodes or decodes information for
some binary format is usually therefore kept separate from the rest of
the code, to abstract such difficulties away.

> typically, for compile-time stuff, there is a mess of #define's and
> #ifdef's for figuring out the target architecture and other things,
> picking appropriate type-sizes and setting values for things like
> whether or not the target supports misaligned access, ...
>
> I guess it could be nicer if more of this were standardized.

Yes, preprocessor metaprogramming is not much better than template
metaprogramming to read. So these things are better kept
away from code, in a separate library.

> >> sometimes there are tradeoffs, like one format which could have
> >> technically been a little more space-efficient with differentially-coded
> >> variable-width numbers for a table, but then needing to decode the table
> >> prior to being able to use it. I instead used a design where the field
> >> was 16/24/32 bits wide (for the whole table) depending on what was
> >> needed for the target-index, but the fixed-width entries allow for
> >> random access to the table.
> >
> > It indeed depends on anticipated amounts of data if it is worth that
> > or not but if we know that a table contains only for example values 0..11
> > then it feels reasonable at least to consider 4-bit wide entries. The
> > processors crunch numbers at ungodly speeds but it is a 4 times shorter
> > table than one with 16-bit wide entries.
> >
>
> could be, but the table entries in this case were fairly unlikely to be
> that much below 16 bits (so 8 bit or smaller would not have been useful).
>
> these were basically offsets within a TLV lump. where you would have one
> TLV lump which contains lots of payload data (as an array of
> variable-sized sub-lumps packed end-to-end), and a table to say where
> each sub-lump is within that bigger lump (to better allow random access).
>
> in most cases, the lumps ranged from a few kB up to maybe a few MB,
> so 16 or 24 bits are the most likely cases, and always using 32 bits "to
> be safe" would be a waste.
>
> it would be fairly easy to compress them further, but this would require
> decoding them before they could be used, which was undesirable in this case.

Interesting. Doesn't TLV mean tag or type, length, value? Most of those
formats encode some sort of tag/type/length in the first few bits/bytes and
the rest are data bytes, potentially also TLV. So why did you use offsets?
Weren't the "sub-lumps" TLV themselves?

> >> this sort of thing has in a few cases pointed towards arguably
> >> old/archaic solutions, like using codepages (particularly 1252 and 437)
> >> in a few cases due to fixed-width characters being cheaper for random
> >> access than UTF-8, but less overly expensive vs UTF-16, and sufficient
> >> in the vast majority of cases (and are the cheapest option in cases when
> >> they are an option). note that this would be done per-string.
> >
> > How did we somehow move from the int32_t of Leigh to binary representations
> > of text? :) But that is an interesting topic as well. Have you noticed that
> > with text the need for random access of contents is actually a rather rare
> > case?  OTOH storage for texts can be significant if there are a lot of texts
> > or a lot of translations. A number of PC programs let you download and install
> > translations separately or optionally.
> >
>
> yeah, probably should have been clearer.
>
> this was for string literals/values in a VM.
>
> in the predecessor VM, M-UTF-8 had been used for all the string literals
> (except the UTF-16 ones), which mostly worked (since direct
> per-character access is fairly rare), but it meant doing something like
> "str[idx]" would take 'O(n)' time (and looping over a string
> per-character would be O(n^2)...).
>
> in the use-case for the new VM, I wanted O(1) access here (mostly to
> make things more predictable, *), but also didn't want the nearly pure
> waste that is UTF-16 strings.
>
> however, the language in question uses UTF-16 as its logical model (so,
> from high-level code, it appears as if all strings are UTF-16). in the
> language, strings are immutable, so there is no issue with the use of
> ASCII or similar for the underlying storage.
>
> in C, it isn't really an issue mostly as C makes no attempt to gloss
> over in-memory storage, so you can just return the raw byte values or
> similar.
>
>
> *: the VM needs to be able to keep timing latencies bounded, which
> basically weighs against doing anything in the VM where the time-cost
> can't be easily predicted in advance. wherever possible, all operations
> need to be kept O(1), with the operation either being able to complete
> in the available time-step (generally 1us per "trace"), or the VM will
> need to halt and defer execution until later (blocking is not allowed,
> and any operations which may result in unexpected behaviors, such as
> halting, throwing an exception, ... effectively need to terminate the
> current trace, which makes them more expensive).
>
> for some related reasons, the VM is also using B-Trees rather than hash
> tables in a few places (more predictable, if slower, than hashes, but
> less memory waste than AVL or BST variants). likewise, because of their
> structure, it is possible to predict in advance (based on the size of
> the tree) approximately how long it will take to perform the operation.

One interesting tree is the splay tree, which often works best in real applications.

> >>> Above can't be done with an embedded system so easily, since it can affect the price
> >>> of a unit to organize flashing the very same product differently for each
> >>> market targeted. When access is sequential then polished Huffman decoding
> >>> actually rarely affects performance. So I have seen embedded systems
> >>> keeping the text dictionaries Huffman encoded all the time. If texts are kept
> >>> Huffman encoded anyway then UCS-2 or UTF-16 are perfectly fine and there
> >>> is no need for archaic tricks like Windows-1252 or Code Page 437.
>
> granted, but in this case, it is mostly for string literals, rather than
> bulk text storage.

The strings of an application do not typically form some sort of bulk but a
sort of dictionary of short texts.

> Windows-1252 covers most general use-cases for text (and is fairly easy
> to convert to/from UTF-16, as for most of the range the characters map 1:1).
> CP-437 is good mostly for things like ASCII art and text-based UIs.
>
> for literals, it will be the job of the compiler to sort out which
> format to use.
>
> bulk storage will tend to remain in compressed UTF-8.
> though a more specialized format could be good.
>
> I had good results before compressing short fragments (such as character
> strings) with a combination of LZ77 and MTF+Rice Coding, which for small
> pieces of data did significantly better than Deflate or LZMA. however,
> the MTF makes it slower per-character than a Huffman-based option.
>
> basically, options like Deflate or LZMA are largely ineffective for
> payloads much under 200-500 bytes or so, but are much more effective as
> payloads get bigger.

Those packing algorithms are all for larger texts. With relatively short
one-liners one can perhaps make some special sub-string-packing but it
will be computationally expensive to pack.
0
Öö Tiib
6/28/2015 11:50:28 PM
On 6/28/2015 6:50 PM, Öö Tiib wrote:
> On Sunday, 28 June 2015 20:22:01 UTC+3, BGB  wrote:
>> On 6/28/2015 8:43 AM, Öö Tiib wrote:
>>> On Sunday, 28 June 2015 05:25:39 UTC+3, BGB  wrote:
>>>> On 6/27/2015 4:17 PM, Öö Tiib wrote:
>>>>> On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
>>>>>> On 27/06/2015 07:27, Öö Tiib wrote:
>>>>>>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
>>>>>>>> On 26/06/2015 21:31, JiiPee wrote:
>>>>>>>>> On 26/06/2015 20:39, Mr Flibble wrote:
>>>>>>>>>> ... #include <cstdint> instead!
>>>>>>>>>>
>>>>>>>>>> /Flibble
>>>>>>>>>
>>>>>>>>> you mean using int16_t? why is that?
>>>>>>>>> int is the fastest integer so why would i use something else?
>>>>>>>>
>>>>>>>> Because 'int' is both unsafe and non-portable.  If you want the fastest
>>>>>>>> integer that is at least a certain size then there is a <cstdint>
>>>>>>>> typedef for that.
>>>>>>>
>>>>>>> Isn't 'int' easier to read than 'int_fast16_t' of <cstdint>?
>>>>>>> I don't think that latter is anyway safer or more portable than
>>>>>>> former.
>>>>>>
>>>>>> Because use of the typedef means you are explicitly documenting in code
>>>>>> that the size of this type can vary and should be treated as such.
>>>>>
>>>>> Maybe try to elaborate it a bit more. What *is* the special treatment you
>>>>> have in mind?
>>>>>
>>>>> Usage of 'int', 'long', 'wchar_t', 'size_t', 'time_t', 'ptrdiff_t',
>>>>> 'fpos_t' (etc., just name it) in C or in C++ means that the byte size of
>>>>> it may (and eventually will) vary per platform or next generation
>>>>> of same platform. Also if two of the above are exactly the same type on the
>>>>> current platform then it is wrong to assume that these two are the same on
>>>>> some other platform (unlike 'int' and 'int_fast16_t', which must be the
>>>>> same on all platforms where 'int_fast16_t' is defined). That all *must* be
>>>>> known of course to everyone wanting to use C++ but what is the treatment?
>>>>>
>>>>> Might be you expect too much of the types of <cstdint>. Exact size of
>>>>> type matters for example in case of binary exchange and storage formats.
>>>>> Actually I don't know many other prominent cases. Are there any?
>>>>>
>>>>> It may be that you currently deal with some such format ... but most
>>>>> things are text these days (like JSON, XML or CSV). It is a narrow
>>>>> fraction of works where we want to squeeze the last out and pick a binary
>>>>> format.
>>>>>
>>>>> Fixing the physical byte size of types that participate in a binary format
>>>>> is unfortunately not the solution. Alignment and endianness are still
>>>>> to be taken care of. Also, since we took it to squeeze everything
>>>>> out of our bandwidth, then why not compress it even further? When
>>>>> a value is in range 0..1000 then why not use only 10 bits for it?
>>>>> Well-made bit-packing can be faster than 'memcpy' on modern platforms
>>>>> thanks to fewer cache misses.
>>>>
>>>> bit-streams and variable-length numbers can also be useful, but are more
>>>> painful to work with.
>>>>
>>>> bit-packing and byte-oriented formats can also do pretty good if done
>>>> well. though, yes, the code will need to take care of things like endianness.
>>>
>>> We have to take care of endianness in any portable format. It is
>>> particularly funny since neither C nor C++ contain a standard way for
>>> detecting endianness at compile time. There are some libraries that use
>>> every known non-standard way for that and so produce minimal code.
>>>
>>
>> yeah.
>>
>> I prefer fixed-endianness formats, personally.
>>
>> granted, probably most everyone else does as well, as most formats are
>> this way. just a few formats exist where the endianness depends on
>> whichever computer saved the file, with magic numbers to detect when
>> swapping is needed. I find these annoying.
>>
>> some of my tools (script language and C compiler, as an extension) have
>> the ability to specify the endianness for variables and pointer types
>> (so, you can be sure the value is stored as either big or little endian,
>> regardless of native endianness), and it implicitly also makes it safe for
>> misaligned loads/stores.
>>
>> namely, if you know a value needs to be a particular way, then chances
>> are the person is willing to pay whatever CPU cycles are needed to make
>> it that way.
>
> After we take care of size and we take care of endianness there is
> still alignment to care about. ;)
> Note that alignment requirements may come as a surprise to many.
> Sometimes even on platforms with relaxed ones. For example one will
> get a crash on Win32 because the alignment of XMVECTOR was not 16.
>
> In a real application the code that encodes or decodes information for
> some binary format is usually therefore kept separate from the rest of
> the code, to abstract such difficulties away.
>

yeah.

this is why the explicit endian variables could also be implicitly 
misaligned. it could also be possible to specify explicit alignment via 
an attribute, but this isn't currently supported.

the ways it is implemented generally gloss over whatever the hardware 
does (so, it can still do unaligned operations even on targets which 
only support aligned loads/stores, by internally using byte operations 
and shifts).

currently, my C compiler/VM doesn't do SIMD, but this might get 
implemented eventually.

keywords were '__bigendian' and '__ltlendian'.

ex:
   __bigendian int x;
or:
   i=*(__bigendian int *)ptr;

possible explicit alignment value:
   __declspec(align(2)) __bigendian int x;
or (if another keyword were added):
   __align(2) __bigendian int x;
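
in portable C, the equivalent effect for a load would be something like
this sketch (hypothetical helper; the keyword form above just hides it):

#include <stdint.h>

/* explicit big-endian 32-bit load, safe for any alignment and any host
   endianness, built from byte operations and shifts */
uint32_t load_be32(const void *ptr)
{
    const unsigned char *p = (const unsigned char *)ptr;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}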


>> typically, for compile-time stuff, there is a mess of #define's and
>> #ifdef's for figuring out the target architecture and other things,
>> picking appropriate type-sizes and setting values for things like
>> whether or not the target supports misaligned access, ...
>>
>> I guess it could be nicer if more of this were standardized.
>
> Yes, preprocessor metaprogramming is not much better than template
> metaprogramming to read. So these things are better kept
> away from code, in a separate library.
>

usually, this is a 'whatever_conf.h' file which is copied/pasted around, 
sometimes with tweaks.

it also does explicit-size types, partly because I often use a compiler 
which until very recently did not support 'stdint' stuff...


>>>> sometimes there are tradeoffs, like one format which could have
>>>> technically been a little more space-efficient with differentially-coded
>>>> variable-width numbers for a table, but then needing to decode the table
>>>> prior to being able to use it. I instead used a design where the field
>>>> was 16/24/32 bits wide (for the whole table) depending on what was
>>>> needed for the target-index, but the fixed-width entries allow for
>>>> random access to the table.
>>>
>>> It indeed depends on anticipated amounts of data if it is worth that
>>> or not but if we know that a table contains only for example values 0..11
>>> then it feels reasonable at least to consider 4-bit wide entries. The
>>> processors crunch numbers at ungodly speeds but it is a 4 times shorter
>>> table than one with 16-bit wide entries.
>>>
>>
>> could be, but the table entries in this case were fairly unlikely to be
>> that much below 16 bits (so 8 bit or smaller would not have been useful).
>>
>> these were basically offsets within a TLV lump. where you would have one
>> TLV lump which contains lots of payload data (as an array of
>> variable-sized sub-lumps packed end-to-end), and a table to say where
>> each sub-lump is within that bigger lump (to better allow random access).
>>
>> in most cases, the lumps ranged from a few kB up to maybe a few MB,
>> so 16 or 24 bits are the most likely cases, and always using 32 bits "to
>> be safe" would be a waste.
>>
>> it would be fairly easy to compress them further, but this would require
>> decoding them before they could be used, which was undesirable in this case.
>
> Interesting. Doesn't TLV mean tag or type, length, value? Most of those
> formats encode some sort of tag/type/length in the first few bits/bytes and
> the rest are data bytes, potentially also TLV. So why did you use offsets?
> Weren't the "sub-lumps" TLV themselves?
>

offsets are used because this saves needing to scan over the payload 
lumps and rebuild the table. granted, the cost of this probably wouldn't 
be huge, but the table will be needed either way, and if it is already 
present in the data, it doesn't need to be built.

note that the file is not read-in/loaded sequentially, but is basically 
loaded into the address space and used in-place.


it is Tag/Length/Value, yes.

the format is vaguely akin to if IFF and ASN.1 BER were fused together.

ASN.1 style tags are used mostly for compact structures (with typically 
about 2 bytes of overhead), with TWOCC for intermediate structures (4 or 
6 bytes overhead), and FOURCC for top-level structures (8 or 12 bytes of 
overhead).


* 0x00-0x1F: Public Primitive (Class=0)
* 0x20-0x3F: Public Composite (Class=1)
* 0x40-0x5F: Private Primitive (Class=2)
* 0x60-0x7F: Private Composite (Class=3)
* 0x80-0x9F: Context Primitive (Class=4)
* 0xA0-0xBF: Context Composite (Class=5)
** ccct-tttt:
*** ccc=class, ttttt=tag
*** tag=0..30, tag encoded directly
*** tag=31, tag is escape coded.
* 0xC0-0xDF: Reserved
* 0xE0-0xFF: Special Markers
** 0xE0, End Of Data
** 0xE1, len:WORD24
** 0xE2, len:BYTE
*** Context Dependent Untagged Data
** 0xE3, len:WORD24, tag:TWOCC
** 0xE4, len:WORD24, tag:FOURCC
** 0xE5, len:BYTE, tag:TWOCC
** 0xE6, len:WORD56, tag:FOURCC
** 0xE7, len:WORD24, tag:EIGHTCC
** 0xE8, len:WORD24, tag:SIXTEENCC
** 0xE9, len:WORD56, tag:EIGHTCC
** 0xEA, len:WORD56, tag:SIXTEENCC
*** Tagged Markers
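
a minimal sketch of splitting a leading tag byte under this layout
(names are illustrative, not from the actual codec):

#include <stdint.h>

/* decode the first tag byte per the table above */
void decode_tag_byte(uint8_t b, int *cls, int *tag, int *is_marker)
{
    *is_marker = (b >= 0xE0);  /* 0xE0-0xFF: special markers */
    *cls = (b >> 5) & 7;       /* ccc: classes 0..5 used, 6 reserved */
    *tag = b & 0x1F;           /* ttttt: 0..30 direct, 31 = escape-coded */
}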

I have used variations on this design for a number of formats (it 
actually started out mostly in some of my video codecs).


>>
>> *: the VM needs to be able to keep timing latencies bounded, which
>> basically weighs against doing anything in the VM where the time-cost
>> can't be easily predicted in advance. wherever possible, all operations
>> need to be kept O(1), with the operation either being able to complete
>> in the available time-step (generally 1us per "trace"), or the VM will
>> need to halt and defer execution until later (blocking is not allowed,
>> and any operations which may result in unexpected behaviors, such as
>> halting, throwing an exception, ... effectively need to terminate the
>> current trace, which makes them more expensive).
>>
>> for some related reasons, the VM is also using B-Trees rather than hash
>> tables in a few places (more predictable, if slower, than hashes, but
>> less memory waste than AVL or BST variants). likewise, because of their
>> structure, it is possible to predict in advance (based on the size of
>> the tree) approximately how long it will take to perform the operation.
>
> One interesting tree is the splay tree, which often works best in real applications.
>

yeah.

splay is fast, but not necessarily all that predictable nor memory 
efficient (it is more like AVL in the memory-efficiency sense, and 
leaves open the possibility of an O(n) worst case).


B-Tree is not necessarily the fastest option, but should be fairly 
predictable in this case.

like, you really don't want a random edge-case throwing a wrench in the 
timing, and fouling up external electronics by not updating the IO pins 
at the correct times or something.

nevermind if it is questionable to do high-level logic and also deal 
with hardware-level IO on the same processor core, but alas... (why have 
a separate ARM chip and an MCU, when you can save some money by just 
doing everything on the main ARM chip?...).


>>> Above can't be done with an embedded system so easily, since it can affect the price
>>> of a unit to organize flashing the very same product differently for each
>>> market targeted. When access is sequential then polished Huffman decoding
>>> actually rarely affects performance. So I have seen embedded systems
>>> keeping the text dictionaries Huffman encoded all the time. If texts are kept
>>> Huffman encoded anyway then UCS-2 or UTF-16 are perfectly fine and there
>>> is no need for archaic tricks like Windows-1252 or Code Page 437.
>>
>> granted, but in this case, it is mostly for string literals, rather than
>> bulk text storage.
>
> The strings of an application do not typically form some sort of bulk but a
> sort of dictionary of short texts.
>

not sure I follow.

in the case of the VM, they are basically a pair of string tables, one 
for ASCII and UTF-8, and the other for UTF-16. each string is terminated 
with a NUL character.

for the ASCII table, string references are in terms of byte offsets, 
with it depending on context how the string is encoded.

if C is compiled to the VM, it really doesn't care, since it has good 
old 'char *', so will use pointers into the string table.

the script-language does care, but will implicitly declare the type as 
part of the process of loading the string into a register (and, in this 
case, the VM will remember the type of string).


though, granted, in this VM, the string type will be handled by 
implicitly using tag bits in the reference. this allows using a 
reference directly to the string-table memory, without needing an 
intermediate structure (so, it is sort of like a normal 'char *' 
pointer, just with a hidden type-tag in the reference).
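
a minimal sketch of such a tagged reference (illustrative layout;
assumes string-table pointers are at least 4-byte aligned, so the low
2 bits are free to hold the encoding):

#include <stdint.h>

typedef uintptr_t strref;  /* low bits: 0=ASCII, 1=UTF-8, 2=UTF-16 */

static const char *strref_ptr(strref r)
    { return (const char *)(r & ~(uintptr_t)3); }
static unsigned strref_tag(strref r)
    { return (unsigned)(r & 3); }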


>> Windows-1252 covers most general use-cases for text (and is fairly easy
>> to convert to/from UTF-16, as for most of the range the characters map 1:1).
>> CP-437 is good mostly for things like ASCII art and text-based UIs.
>>
>> for literals, it will be the job of the compiler to sort out which
>> format to use.
>>
>> bulk storage will tend to remain in compressed UTF-8.
>> though a more specialized format could be good.
>>
>> I had good results before compressing short fragments (such as character
>> strings) with a combination of LZ77 and MTF+Rice Coding, which for small
>> pieces of data did significantly better than Deflate or LZMA. however,
>> the MTF makes it slower per-character than a Huffman-based option.
>>
>> basically, options like Deflate or LZMA are largely ineffective for
>> payloads much under 200-500 bytes or so, but are much more effective as
>> payloads get bigger.
>
> Those packing algorithms are all for larger texts. With relatively short
> one-liners one can perhaps make some special sub-string-packing but it
> will be computationally expensive to pack.
>

well, as noted, I had some limited success with MTF+Rice.
though, there isn't much that can be done with short character strings.

0
BGB
6/29/2015 7:21:30 AM
On Monday, 29 June 2015 10:26:01 UTC+3, BGB  wrote:
> On 6/28/2015 6:50 PM, Öö Tiib wrote:
> > On Sunday, 28 June 2015 20:22:01 UTC+3, BGB  wrote:
> >> On 6/28/2015 8:43 AM, Öö Tiib wrote:
> >>> On Sunday, 28 June 2015 05:25:39 UTC+3, BGB  wrote:
> >>>> On 6/27/2015 4:17 PM, Öö Tiib wrote:
> >>>>> On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
> >>>>>> On 27/06/2015 07:27, Öö Tiib wrote:
> >>>>>>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
> >>>>>>>> On 26/06/2015 21:31, JiiPee wrote:
> >>>>>>>>> On 26/06/2015 20:39, Mr Flibble wrote:

.... Snip for focus
>
> >>
> >> for some related reasons, the VM is also using B-Trees rather than hash
> >> tables in a few places (more predictable, if slower, than hashes, but
> >> less memory waste than AVL or BST variants). likewise, because of their
> >> structure, it is possible to predict in advance (based on the size of
> >> the tree) approximately how long it will take to perform the operation.
> >
> > One interesting tree is the splay tree, which often works best in real applications.
> >
>
> yeah.
>
> splay is fast, but not necessarily all that predictable nor memory
> efficient (it is more like AVL in the memory-efficiency sense, and
> leaves open the possibility of an O(n) worst case).
>
> B-Tree is not necessarily the fastest option, but should be fairly
> predictable in this case.
>
> like, you really don't want a random edge-case throwing a wrench in the
> timing, and fouling up external electronics by not updating the IO pins
> at the correct times or something.

In that case an RB tree is maybe the most stable ... in any case it seems more
stable and generally quicker than AVL.

> nevermind if it is questionable to do high-level logic and also deal
> with hardware-level IO on the same processor core, but alas... (why have
> a separate ARM chip and an MCU, when you can save some money by just
> doing everything on the main ARM chip?...).

You certainly may need a separate processor for more demanding I/O (WiFi
network for example), otherwise yes.

> >>> Above can't be done with an embedded system so easily, since it can affect the price
> >>> of a unit to organize flashing the very same product differently for each
> >>> market targeted. When access is sequential then polished Huffman decoding
> >>> actually rarely affects performance. So I have seen embedded systems
> >>> keeping the text dictionaries Huffman encoded all the time. If texts are kept
> >>> Huffman encoded anyway then UCS-2 or UTF-16 are perfectly fine and there
> >>> is no need for archaic tricks like Windows-1252 or Code Page 437.
> >>
> >> granted, but in this case, it is mostly for string literals, rather than
> >> bulk text storage.
> >
> > The strings of an application do not typically form some sort of bulk but a
> > sort of dictionary of short texts.
>
> not sure I follow.
>
> in the case of the VM, they are basically a pair of string tables, one
> for ASCII and UTF-8, and the other for UTF-16. each string is terminated
> with a NUL character.
>
> for the ASCII table, string references are in terms of byte offsets,
> with it depending on context how the string is encoded.
>
> if C is compiled to the VM, it really doesn't care, since it has good
> old 'char *', so will use pointers into the string table.
>
> the script-language does care, but will implicitly declare the type as
> part of the process of loading the string into a register (and, in this
> case, the VM will remember the type of string).
>
> though, granted, in this VM, the string type will be handled by
> implicitly using tag bits in the reference. this allows using a
> reference directly to the string-table memory, without needing an
> intermediate structure (so, it is sort of like a normal 'char *'
> pointer, just with a hidden type-tag in the reference).

I meant that if we really want to have byte pointers into our string
literals then there is indeed no way to keep things compressed
behind the scenes. However, if we want to have a "text" available on
limited occasions, for example like the GNU gettext library provides,
then we can have the texts compressed most of the time.

> >> Windows-1252 covers most general use-cases for text (and is fairly easy
> >> to convert to/from UTF-16, as for most of the range the characters map 1:1).
> >> CP-437 is good mostly for things like ASCII art and text-based UIs.
> >>
> >> for literals, it will be the job of the compiler to sort out which
> >> format to use.
> >>
> >> bulk storage will tend to remain in compressed UTF-8.
> >> though a more specialized format could be good.
> >>
> >> I had good results before compressing short fragments (such as character
> >> strings) with a combination of LZ77 and MTF+Rice Coding, which for small
> >> pieces of data did significantly better than Deflate or LZMA. however,
> >> the MTF makes it slower per-character than a Huffman-based option.
> >>
> >> basically, options like Deflate or LZMA are largely ineffective for
> >> payloads much under 200-500 bytes or so, but are much more effective as
> >> payloads get bigger.
> >
> > Those packing algorithms are all for larger texts. With relatively short
> > one-liners one can perhaps make some special sub-string-packing but it
> > will be computationally expensive to pack.

If all texts in an application are UTF-16 then Huffman coding compresses about
2.6 times (minus the decoder and its data of course). So the only question is
if there is a big enough amount of texts handled by the application to bother
with it.
0
Öö Tiib
6/29/2015 2:20:17 PM
On 6/29/2015 9:20 AM, Öö Tiib wrote:
> On Monday, 29 June 2015 10:26:01 UTC+3, BGB  wrote:
>> On 6/28/2015 6:50 PM, Öö Tiib wrote:
>>> On Sunday, 28 June 2015 20:22:01 UTC+3, BGB  wrote:
>>>> On 6/28/2015 8:43 AM, Öö Tiib wrote:
>>>>> On Sunday, 28 June 2015 05:25:39 UTC+3, BGB  wrote:
>>>>>> On 6/27/2015 4:17 PM, Öö Tiib wrote:
>>>>>>> On Saturday, 27 June 2015 17:46:19 UTC+3, Mr Flibble  wrote:
>>>>>>>> On 27/06/2015 07:27, Öö Tiib wrote:
>>>>>>>>> On Saturday, 27 June 2015 00:37:40 UTC+3, Mr Flibble  wrote:
>>>>>>>>>> On 26/06/2015 21:31, JiiPee wrote:
>>>>>>>>>>> On 26/06/2015 20:39, Mr Flibble wrote:
>
> ... Snip for focus
>>
>>>>
>>>> for some related reasons, the VM is also using B-Trees rather than hash
>>>> tables in a few places (more predictable, if slower, than hashes, but
>>>> less memory waste than AVL or BST variants). likewise, because of their
>>>> structure, it is possible to predict in advance (based on the size of
>>>> the tree) approximately how long it will take to perform the operation.
>>>
>>> One interesting tree is the splay tree, which often works best in real applications.
>>>
>>
>> yeah.
>>
>> splay is fast, but not necessarily all that predictable nor memory
>> efficient (it is more like AVL in the memory-efficiency sense, and
>> leaves open the possibility of an O(n) worst case).
>>
>> B-Tree is not necessarily the fastest option, but should be fairly
>> predictable in this case.
>>
>> like, you really don't want a random edge-case throwing a wrench in the
>> timing, and fouling up external electronics by not updating the IO pins
>> at the correct times or something.
>
> In that case an RB tree is maybe the most stable ... in any case it seems more
> stable and generally quicker than AVL.
>

could be.

more analysis and testing may be needed.


checking for memory density.
(note: implicitly assuming a slab allocator or similar, so negligible 
overhead apart from alignment; the byte counts below also assume 32-bit 
pointers and an 8-byte Value).

say, tree node for a binary tree (BST1):
struct BiNode1 {
    BiNode1 *left, *right;
    SlotInfo *key;
    byte depth;
    //pad: 3 bytes
    Value val;
};
memory cost: ~24 bytes, about 33% payload.
assume: a node is either an inner node or a leaf.
for an inner node, both left and right are filled and key is a pivot;
inner nodes do not hold a value.
for a leaf, both left and right are NULL.

or (BST2):
struct BiNode2 {
    SlotInfo *key;
    byte depth;
    //pad: 3 bytes
    union {
        struct {
            BiNode2 *left, *right;
        }b;
        Value val;
    }u;
};
memory cost: ~16 bytes, about 50% payload.
same behavior as BST1, but reducing memory by eliminating
mutually-exclusive data.

BST3:
case of BST1, where each node always contains a value, reducing node counts.
each node has a value, and 1 or 2 child nodes (whereas a leaf lacks
child nodes).

for an order-12 B-Tree (BT1):
#define MAX_BT_KEY 12
struct BtNode {
    BtNode *next;
    SlotInfo *key[MAX_BT_KEY];
    byte nkey;
    byte depth;
    //pad: 2 bytes
    union {
        BtNode *child[MAX_BT_KEY];
        Value val[MAX_BT_KEY];
    }u;
};

Cost: 152 bytes, 63% payload.

for an order-6 B-Tree (BT2):
Cost: 96 bytes, 50% payload.

memory estimates (count, type, total bytes, memory efficiency):
  1 item(1L): BST1: 24 bytes (33.3%)
  1 item(1L): BST2: 16 bytes (50.0%)
  1 item(1L): BST3: 24 bytes (33.3%)
  1 item(1L): BT1: 152 bytes ( 5.3%)
  1 item(1L): BT2:  96 bytes ( 8.3%)

  2 items(1N,2L): BST1: 72 bytes (22.2%)
  2 items(1N,2L): BST2: 48 bytes (33.3%)
  2 items(1N,1L): BST3: 48 bytes (33.3%)
  2 items(   1L): BT1: 152 bytes (10.5%)
  2 items(   1L): BT2:  96 bytes (16.6%)

  4 items(3N,4L): BST1: 168 bytes (19.0%)
  4 items(3N,4L): BST2: 112 bytes (28.6%)
  4 items(2N,2L): BST3:  96 bytes (33.3%)
  4 items(   1L): BT1:  152 bytes (21.1%)
  4 items(   1L): BT2:   96 bytes (33.3%)

  8 items(7N,8L): BST1: 360 bytes (17.8%)
  8 items(7N,8L): BST2: 240 bytes (26.6%)
  8 items(4N,4L): BST3: 192 bytes (33.3%)
  8 items(   1L): BT1:  152 bytes (42.1%)
  8 items(1N,2L): BT2:  288 bytes (22.2%)

  16 items(15N,16L): BST1: 744 bytes (17.2%)
  16 items(15N,16L): BST2: 496 bytes (25.8%)
  16 items( 8N, 8L): BST3: 384 bytes (33.3%)
  16 items( 1N, 2L): BT1:  456 bytes (28.1%)
  16 items( 1N, 3L): BT2:  384 bytes (33.3%)

  32 items(31N,32L): BST1: 1512 bytes (16.9%)
  32 items(31N,32L): BST2: 1008 bytes (25.4%)
  32 items(16N,16L): BST3:  768 bytes (33.3%)
  32 items( 1N, 3L): BT1:   608 bytes (42.1%)
  32 items( 1N, 5L): BT2:   576 bytes (44.4%)
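
for reference, the percentages are payload/total with payload =
items * sizeof(Value), under the same 32-bit-pointer, 8-byte-Value
assumption (a quick spot check, not a benchmark):

#include <stdio.h>

int main(void)
{
    /* a few rows from the table above: items, total bytes */
    printf("BST2, 16 items: %.1f%%\n", 100.0 * (16 * 8) / 496);  /* 25.8 */
    printf("BT1,  16 items: %.1f%%\n", 100.0 * (16 * 8) / 456);  /* 28.1 */
    printf("BT2,  32 items: %.1f%%\n", 100.0 * (32 * 8) / 576);  /* 44.4 */
    return 0;
}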


so, it is a lot closer than I was thinking...

BST has an advantage on the memory front for small item counts, but 
loses it as item counts increase.

though, it is possible that a B-Tree could suffer from an unfavorable 
pattern of inserts which could reduce its memory efficiency (causing it 
to under-perform vs a BST variant).


could rig-up some benchmarks and compare them...


>> nevermind if it is questionable to do high-level logic and also deal
>> with hardware-level IO on the same processor core, but alas... (why have
>> a separate ARM chip and an MCU, when you can save some money by just
>> doing everything on the main ARM chip?...).
>
> You certainly may need a separate processor for more demanding I/O (WiFi
> network for example), otherwise yes.
>

a lot of things in my case are things like running motors and dealing 
with external sensors and similar (some of the sensors generate PWM 
signals).

a lot of this stuff runs in the kHz range (say, 25-50 kHz or so), so it 
is necessary to be able to respond quickly enough to keep everything 
running smoothly.

internally, there might also be fun like interpreting G-Code or similar 
(telling the machine the desired positions to move the motors to, what 
RPM the tool spindle should turn, ...).


the same basic code has also been used in a small wheeled robot, and 
should be applicable to other types of robots (such as ones with 
articulated limbs).

networking is used to some extent to control these, but the network 
hardware seems able to handle a lot on its own (it doesn't need anywhere 
near the level of micro-management that motors or sensors do).


>>>>> Above can't be done with an embedded system so easily, since it can affect the price
>>>>> of a unit to organize flashing the very same product differently for each
>>>>> market targeted. When access is sequential then polished Huffman decoding
>>>>> actually rarely affects performance. So I have seen embedded systems
>>>>> keeping the text dictionaries Huffman encoded all the time. If texts are kept
>>>>> Huffman encoded anyway then UCS-2 or UTF-16 are perfectly fine and there
>>>>> is no need for archaic tricks like Windows-1252 or Code Page 437.
>>>>
>>>> granted, but in this case, it is mostly for string literals, rather than
>>>> bulk text storage.
>>>
>>> The strings of an application do not typically form some sort of bulk but a
>>> sort of dictionary of short texts.
>>
>> not sure I follow.
>>
>> in the case of the VM, they are basically a pair of string tables, one
>> for ASCII and UTF-8, and the other for UTF-16. each string is terminated
>> with a NUL character.
>>
>> for the ASCII table, string references are in terms of byte offsets,
>> with it depending on context how the string is encoded.
>>
>> if C is compiled to the VM, it really doesn't care, since it has good
>> old 'char *', so will use pointers into the string table.
>>
>> the script-language does care, but will implicitly declare the type as
>> part of the process of loading the string into a register (and, in this
>> case, the VM will remember the type of string).
>>
>> though, granted, in this VM, the string type will be handled by
>> implicitly using tag bits in the reference. this allows using a
>> reference directly to the string-table memory, without needing an
>> intermediate structure (so, it is sort of like a normal 'char *'
>> pointer, just with a hidden type-tag in the reference).
>
> I meant that if we really want to have byte pointers into our string
> literals then there is indeed no way to keep things compressed
> behind the scenes. However, if we want to have a "text" available on
> limited occasions, for example like the GNU gettext library provides,
> then we can have the texts compressed most of the time.
>

ok.


>>>> Windows-1252 covers most general use-cases for text (and is fairly easy
>>>> to convert to/from UTF-16, as for most of the range the characters map 1:1).
>>>> CP-437 is good mostly for things like ASCII art and text-based UIs.
>>>>
>>>> for literals, it will be the job of the compiler to sort out which
>>>> format to use.
>>>>
>>>> bulk storage will tend to remain in compressed UTF-8.
>>>> though a more specialized format could be good.
>>>>
>>>> I had good results before compressing short fragments (such as character
>>>> strings) with a combination of LZ77 and MTF+Rice Coding, which for small
>>>> pieces of data did significantly better than Deflate or LZMA. however,
>>>> the MTF makes it slower per-character than a Huffman-based option.
>>>>
>>>> basically, options like Deflate or LZMA are largely ineffective for
>>>> payloads much under 200-500 bytes or so, but are much more effective as
>>>> payloads get bigger.
>>>
>>> Those packing algorithms are all for larger texts. With relatively short
>>> one-liners one can perhaps make some special sub-string-packing but it
>>> will be computationally expensive to pack.
>>
>> well, as noted, I had some limited success with MTF+Rice.
>> though, there isn't much that can be done with short character strings.
>
> If all texts in an application are UTF-16 then Huffman coding compresses about
> 2.6 times (minus the decoder and its data of course). So the only question is
> if there is a big enough amount of texts handled by the application to bother
> with it.
>

dunno, it depends on the application.

in the VM, a lot of the string data tends to be things like imported 
function names, type-signatures, ... some amount of this also tends to 
be strings for any printed messages.

bytecode is also a big chunk of memory.

generally the VM's bytecode images are around 1/2 to 1/3 the size of 
native x86 versions of the program (and also smaller than ARM code and 
Deflate-compressed source code).

though, the bytecode was designed somewhat with code density in mind. it 
doesn't so much pull this off with the smallest possible opcodes, but 
rather by trying to allow things to be done with a reasonably small 
number of operations.

nevermind if the current VM goes and craps all over this by translating 
the bytecode into relatively bulky threaded-code, but alas...

0
BGB
6/29/2015 5:18:29 PM

Re: '^=' and '~='?
Duckhye, According to the doc ( http://xrl.us/befwjx ) they, and one other set of characters, and the mnemonic 'NE' all represent 'NOT EQUAL'. Art ------- On Wed, 11 Feb 2009 16:52:40 -0600, Duck-Hye Yang <dyang@CHAPINHALL.ORG> wrote: >Hello, >What is the difference between '^=' and '~='? > >Thanks, >Duckhye ...

Web resources about - Avoid 'int', 'long' and 'short'... - comp.lang.c++

Resources last updated: 1/25/2016 12:43:59 PM