This paper by Martin Burtscher is scheduled for publication in IEEE
Tran. on Computers, with the following abstract:
"Many scientific programs exchange large quantities of double-
precision data between processing nodes and with mass storage devices.
Data compression can reduce the number of bytes that need to be
transferred and stored. However, compression is only likely to be
employed in high-end computing environments if it does not impede the
throughput. This paper describes and evaluates FPC, a fast lossless
compression algorithm for linear streams of 64-bit floating-point
data. FPC works well on hard-to-compress scientific datasets and meets
the throughput demands of high-performance systems. A comparison with
five lossless compression schemes, BZIP2, DFCM, FSD, GZIP, and PLMI,
on four architectures and thirteen datasets shows that FPC compresses
and decompresses one to two orders of magnitude faster than the other
algorithms at the same geometric-mean compression ratio. Moreover, FPC
provides a guaranteed throughput as long as the prediction tables fit
into the L1 data cache. For example, on a 1.6 GHz Itanium 2 server,
the throughput is 670 megabytes per second regardless of what data are
being compressed."
A tricky-to-find preprint http://users.ices.utexas.edu/~burtscher/papers/tr08.pdf
reveals a little more about the technique:
"FPC compresses linear sequences of IEEE 754 double-precision floating-
point values by sequentially predicting each value, xoring the true
value with the predicted value, and leading-zero compressing the
result. As illustrated in Figure 1, it uses variants of an fcm and a
dfcm value predictor to predict the doubles. Both predictors are
effectively hash tables. The more accurate of the two predictions,
i.e., the one that shares more common most significant bits with the
true value, is xored with the true value. Xor turns identical bits
into zeros. Hence, if the predicted and the true value are close, the
xor result has many leading zeros. FPC then counts the number of
leading zero bytes, encodes the count in a three-bit value, and
concatenates it with a single bit that specifies which of the two
predictions was used. The resulting four-bit code and the nonzero
residual bytes are written to the output. The latter are emitted
verbatim without any encoding."
Interesting article, but I have one question: anyone know anything
about these (D)FCM predictors?
|
| Mark Nelson - http://marknelson.us
|
|
|
0
|
|
|
|
Reply
|
Mark
|
8/12/2008 2:42:17 PM |
|
Hy Mark;
> "FPC compresses linear sequences of IEEE 754 double-precision floating-
> point values by sequentially predicting each value, xoring the true
> value with the predicted value, and leading-zero compressing the
> result. As illustrated in Figure 1, it uses variants of an fcm and a
> dfcm value predictor to predict the doubles. Both predictors are
> effectively hash tables. The more accurate of the two predictions,
> i.e., the one that shares more common most significant bits with the
> true value, is xored with the true value. Xor turns identical bits
> into zeros. Hence, if the predicted and the true value are close, the
> xor result has many leading zeros. FPC then counts the number of
> leading zero bytes, encodes the count in a three-bit value, and
> concatenates it with a single bit that specifies which of the two
> predictions was used. The resulting four-bit code and the nonzero
> residual bytes are written to the output. The latter are emitted
> verbatim without any encoding."
A note about this: predicting FP-values without breaking up the
bitfield (thus taking it as a random unified bitfield) is clearly
suboptimal. The only reason why the above approach of XPCM (not the
frequency-approach, only the XOR) works is, because of the strong left-
right ordering inside the FP-bitfield. The most effective approach
would be to construct a tree over the exponent-bits and create a model
of the /random/ mantissa-bitfield on each leaf. It's also possibly
worth to create transition-proabilities (probabilities that transfer
weight to near models, like the inheritance-approach of PPMII) which
adjust +1/-1 exponent according to the nature of the FP, even possibly
adjusted by a second stage (like SSE in PPMZ).
> Interesting article, but I have one question: anyone know anything
> about these (D)FCM predictors?
As far as I understood
http://adsabs.harvard.edu/abs/1981AnTel..36..197O
It's just a naming scheme for defining the nature of the source-data.
The technics (code-modulation) is still the same. It's analogue to
making quality-analisis in pulse- or frequency-space, which in the
case of edge-detection in either space is technically hardly any
different. The interpretation obviously _is_ different.
Well as such (as described above) I couln't exactly see it as (D)FCM,
I would curse it (X)FCM, because (D)ifferencial isn't correct,
e(X)clusive-differiencial? would be.
> | Mark Nelson -http://marknelson.us
> |
Ciao
Niels
|
|
0
|
|
|
|
Reply
|
spamtrap
|
8/12/2008 4:19:56 PM
|
|
> Interesting article, but I have one question: anyone know anything
> about these (D)FCM predictors?
AFAIK /older/ FP-compressors use bit-plane decomposition like
JPEG-2000, CompFits and hCompress.
The only really used format in that area is OpenEXR (I say really,
because it's used in massive production scale since years):
....
// HALF values are simply re-interpreted as 16-bit integers. FLOAT
// values are converted to 24 bits, and the resulting bit patterns
// are interpreted as integers. The compressor then replaces each
// value with the difference between the value and its left neighbor.
// This turns flat fields in the image into zeroes, and ramps into
// strings of similar values. Next, each difference is split into
// 2, 3 or 4 bytes, and the bytes are transposed so that all the
// most significant bytes end up in a contiguous block, followed
// by the second most significant bytes, and so on. The resulting
// string of bytes is compressed with zlib.
This is clearly linear DPCM, no spatial decorrelation or modeling.
I suppose the only maximum-compression tries done on this kind if
data is the stuff you find about 'geo'. Nothing serious(ly and
thoroughly investigated though).
> | Mark Nelson -http://marknelson.us
> |
Ciao
Niels
|
|
0
|
|
|
|
Reply
|
spamtrap
|
8/12/2008 4:34:50 PM
|
|
spamtrap@adsignum.com wrote:
> Hy Mark;
>
>> "FPC compresses linear sequences of IEEE 754 double-precision floating-
>> point values by sequentially predicting each value, xoring the true
>> value with the predicted value, and leading-zero compressing the
>> result. As illustrated in Figure 1, it uses variants of an fcm and a
>> dfcm value predictor to predict the doubles. Both predictors are
>> effectively hash tables. The more accurate of the two predictions,
>> i.e., the one that shares more common most significant bits with the
>> true value, is xored with the true value. Xor turns identical bits
>> into zeros. Hence, if the predicted and the true value are close, the
>> xor result has many leading zeros. FPC then counts the number of
>> leading zero bytes, encodes the count in a three-bit value, and
>> concatenates it with a single bit that specifies which of the two
>> predictions was used. The resulting four-bit code and the nonzero
>> residual bytes are written to the output. The latter are emitted
>> verbatim without any encoding."
>
> A note about this: predicting FP-values without breaking up the
> bitfield (thus taking it as a random unified bitfield) is clearly
> suboptimal. The only reason why the above approach of XPCM (not the
> frequency-approach, only the XOR) works is, because of the strong left-
> right ordering inside the FP-bitfield.
A comment to this comment: If we talk about *positive* floating point
values, the approach isn't really too bad. The way how IEEE numbers are
constructed implies that interpreting a (positive) IEEE floating point
number as integer (i.e. reading its bit-pattern as integer pattern)
defines a logarithmic map from the IEEE (subset of the reals) to the
integers, and is an approximation of a smooth, monotonic "nice" mapping.
Now, if your IEEE numbers follow a nice random distribution, the
corresponding distribution of the "re-interpreted" integers is easily
derived, and and a "naive" difference coding on them is not too bad.
Except that you loose correlation for negative numbers; the IEEE
handling of the sign-bit implies a non-continuous jump in the
"interpretation" mapping described above.
> The most effective approach
> would be to construct a tree over the exponent-bits and create a model
> of the /random/ mantissa-bitfield on each leaf. It's also possibly
> worth to create transition-proabilities (probabilities that transfer
> weight to near models, like the inheritance-approach of PPMII) which
> adjust +1/-1 exponent according to the nature of the FP, even possibly
> adjusted by a second stage (like SSE in PPMZ).
Some side-information on this from my field: Various techniques for
floating point compression (floating-point images) have been tried by my
colleagues in Erlangen (Fraunhofer). Interestingly, separating exponents
from mantissa and sign bit do *not* work very well. "Casting" works even
reasonable, for reasons pointed out above, and with a little "fix-up"
for the sign-bit, it should even work better.
Nevertheless, simply XORing the data (that's what I read), seems to be
rather naive.
So long,
Thomas
|
|
0
|
|
|
|
Reply
|
Thomas
|
8/12/2008 7:02:20 PM
|
|
Hy Thomas;
> Some side-information on this from my field: Various techniques for
> floating point compression (floating-point images) have been tried by my
> colleagues in Erlangen (Fraunhofer). Interestingly, separating exponents
> from mantissa and sign bit do *not* work very well. "Casting" works even
> reasonable, for reasons pointed out above, and with a little "fix-up"
> for the sign-bit, it should even work better.
Well, that's compression, it just works with the right data the right
way. :-)
If the data doesn't contain any extra extractable correlation, a
bigger model isn't worth it. But more than with other datatypes, with
doubles you have a huge sparse-data problem, especially for the
context-model. _If_ there is a linear relationship in the data and you
know it, you may wanna catch it.
> Nevertheless, simply XORing the data (that's what I read), seems to be
> rather naive.
Naja, we should forgive him, he wanted speed. You saw the code at the
end of the PDF.
And he makes emphasis that the hash-tables (may) fit into _L1_ cache.
8-]
Ciao
Niels
|
|
0
|
|
|
|
Reply
|
spamtrap
|
8/12/2008 9:37:03 PM
|
|
>
> Interesting article, but I have one question: anyone know anything
> about these (D)FCM predictors?
>
Take a look at the paper by the same author:
http://users.ices.utexas.edu/~burtscher/papers/can06.pdf
J.
|
|
0
|
|
|
|
Reply
|
Jacek
|
8/12/2008 9:40:40 PM
|
|
Thanks for the pointers everyone.
I'm underwhelmed by the utility of the FCM and DFCM predictors :-)
In particular, it looks to me as though they aren't going to do
particularly well on noisy data, which is exactly what I think you
will normally encounter when moving floating point numbers around.
|
| Mark Nelson - http://marknelson.us
|
|
|
0
|
|
|
|
Reply
|
Mark
|
8/16/2008 9:04:12 PM
|
|
|
6 Replies
287 Views
(page loaded in 0.094 seconds)
|