


Lepton: a JPEG re-compressor

Dropbox -- which I am not affiliated with, even as a customer -- recently
announced a project called "Lepton". It takes a JPEG file and compresses
it further. They are interested in using less disk space for the files
they hold on behalf of customers (or so I gather), so lossless operation
is imperative. But it is important to note that "lossless" here means:
no additional losses beyond those introduced in the original production
of the JPEG file.

They claim 22% reduction on average and decently fast speeds. I built a
copy and threw some files at it and got 21% reduction, with verified
lossless decompress.

https://blogs.dropbox.com/tech/2016/07/lepton-image-compression-saving-22-losslessly-from-images-at-15mbs/ 

https://github.com/dropbox/lepton

I'm posting here in hopes of soliciting comments from people more
knowledgeable than myself.

Elijah
------
would love to see this become widespread, if it's as good as it says it is
Eli
7/18/2016 9:37:33 PM

On 18.07.2016 at 23:37, Eli the Bearded wrote:

> They claim 22% reduction on average and decently fast speeds. I built a
> copy and threw some files at it and got 21% reduction, with verified
> lossless decompress.
> 
> https://blogs.dropbox.com/tech/2016/07/lepton-image-compression-saving-22-losslessly-from-images-at-15mbs/ 
> 
> https://github.com/dropbox/lepton
> 
> I'm posting here in hopes of soliciting comments from people more
> knowledgeable than myself.

This looks like another attempt at JPEG recompression. This has been
seen before, for example in tools like packJPG or the StuffIt JPEG
compressor. Up to now, attempts to commercialize such tools, as useful
as they might seem, have not had much success.

Even within the JPEG standard itself, (smaller) improvements are
possible. For example, one can losslessly transcode a Huffman-coded
JPEG file to an arithmetically coded one, and convert it back. I would
expect an improvement of about 10% on average for this simple method.
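
For illustration, such a transcoder can be written on top of the standard
libjpeg coefficient API, which is essentially what "jpegtran -arithmetic"
does. The following is only a rough, untested sketch; it requires a
libjpeg build with arithmetic coding support, and error handling is
omitted:

/* Sketch: lossless Huffman -> arithmetic transcode of a JPEG file,
 * using the libjpeg coefficient-buffer API (no pixel-domain decode).
 * Requires a libjpeg/libjpeg-turbo build with arithmetic coding enabled. */
#include <stdio.h>
#include <jpeglib.h>

int main(int argc, char **argv)
{
    struct jpeg_decompress_struct srcinfo;
    struct jpeg_compress_struct   dstinfo;
    struct jpeg_error_mgr         srcerr, dsterr;
    jvirt_barray_ptr             *coef_arrays;
    FILE *fin, *fout;

    if (argc < 3) return 1;
    fin  = fopen(argv[1], "rb");
    fout = fopen(argv[2], "wb");
    if (!fin || !fout) return 1;

    srcinfo.err = jpeg_std_error(&srcerr);
    jpeg_create_decompress(&srcinfo);
    dstinfo.err = jpeg_std_error(&dsterr);
    jpeg_create_compress(&dstinfo);

    jpeg_stdio_src(&srcinfo, fin);
    jpeg_read_header(&srcinfo, TRUE);

    /* read the quantized DCT coefficients as-is */
    coef_arrays = jpeg_read_coefficients(&srcinfo);

    /* copy image parameters and quantization tables, then switch the
     * entropy coder from Huffman to arithmetic coding */
    jpeg_copy_critical_parameters(&srcinfo, &dstinfo);
    dstinfo.arith_code = TRUE;

    jpeg_stdio_dest(&dstinfo, fout);
    jpeg_write_coefficients(&dstinfo, coef_arrays);

    jpeg_finish_compress(&dstinfo);
    jpeg_destroy_compress(&dstinfo);
    jpeg_finish_decompress(&srcinfo);
    jpeg_destroy_decompress(&srcinfo);
    fclose(fin);
    fclose(fout);
    return 0;
}

Transcoding back is the same procedure with arith_code left FALSE (and
optimize_coding set to TRUE if optimized Huffman tables are wanted).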

The transcoded file can be used to recover the original without loss,
and, perhaps surprisingly, it even conforms to the JPEG standard. However,
I would not expect that many decoders will actually be able to recover an
image from the arithmetically coded file, so transcoding it back to
Huffman-coded data is likely the most compatible option.

Greetings,
	Thomas

Thomas
7/20/2016 6:58:12 PM
On 7/20/2016 1:58 PM, Thomas Richter wrote:
> On 18.07.2016 at 23:37, Eli the Bearded wrote:
>
>> They claim 22% reduction on average and decently fast speeds. I built a
>> copy and threw some files at it and got 21% reduction, with verified
>> lossless decompress.
>>
>> https://blogs.dropbox.com/tech/2016/07/lepton-image-compression-saving-22-losslessly-from-images-at-15mbs/
>>
>> https://github.com/dropbox/lepton
>>
>> I'm posting here in hopes of soliciting comments from people more
>> knowledgeable than myself.
>
> This looks like another attempt at JPEG recompression. This has been
> seen before, for example in tools like packJPG or the StuffIt JPEG
> compressor. Up to now, attempts to commercialize such tools, as useful
> as they might seem, have not had much success.
>

personally, I haven't had too much of an issue with JPEG's file sizes.


> Even within the JPEG standard itself, (smaller) improvements are
> possible. For example, one can losslessly transcode a Huffman-coded
> JPEG file to an arithmetically coded one, and convert it back. I would
> expect an improvement of about 10% on average for this simple method.
>

IME, feeding a JPEG through a bitwise arithmetic coder generally lands in
this area as well (~5-10%).

generally I have considered it not particularly worthwhile, given the 
relatively steep cost this has in terms of decoder performance.


my own JPEG encoders tend to use trellis quantization, which tends to
improve compression while also making decoding slightly faster at
an otherwise similar quality level. another trick (for improving decode 
speed) is detecting some special cases (ex: no coefficients past a 
certain point), and using alternate versions of the inverse quantization 
and IDCT transform which omit these coefficients.
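
as a sketch of the special-case dispatch (the idct8x8_* names here are
placeholders, not functions from any particular decoder; in practice the
nonzero-coefficient info falls out of the entropy decode rather than a
rescan like this):

/* sketch: pick a dequant+IDCT variant based on which coefficients of the
 * 8x8 block are actually nonzero. blk[] is in natural (row-major) order.
 * the idct8x8_* functions are assumed to exist elsewhere. */
#include <stdint.h>

typedef void (*idct_fn)(const int16_t *blk, uint8_t *dst, int stride);

extern void idct8x8_full(const int16_t *blk, uint8_t *dst, int stride);
extern void idct8x8_low4x4(const int16_t *blk, uint8_t *dst, int stride);  /* low-freq 4x4 only */
extern void idct8x8_dconly(const int16_t *blk, uint8_t *dst, int stride);  /* flat fill */

idct_fn pick_idct(const int16_t *blk)
{
    int x, y, only_dc = 1, only_low = 1;
    for (y = 0; y < 8; y++) {
        for (x = 0; x < 8; x++) {
            if (!blk[y * 8 + x] || (x == 0 && y == 0))
                continue;
            only_dc = 0;                 /* found a nonzero AC coefficient */
            if (x >= 4 || y >= 4)
                only_low = 0;            /* it lies outside the low-freq 4x4 corner */
        }
    }
    if (only_dc)  return idct8x8_dconly;
    if (only_low) return idct8x8_low4x4;
    return idct8x8_full;
}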


potential improvements to the format I can think of are mostly things 
like a slightly tweaked VLC coding, and using a more advanced DC 
predictor. maybe a few "special case" features (mostly involving block 
coding, such as explicitly subsampled blocks).


in the past, I have noted it is pretty hard to get JPEG-like designs 
decoding much faster than about 80-100 megapixels/second. (faster may 
exist, dunno, this is just the fastest I have gotten it).


like, they are nice/elegant, but this is a limiting factor.

in all though, it is still faster than what I have typically managed 
with PNG-like designs.

big factors I think tend to be the number of entropy-coded symbols, and
the tendency of "scanlines" to have a harder time fitting into cache, as
opposed to the small heavily-reused buffers used for a JPEG-style block
transform.

though, yes, PNG-like designs are intuitively pretty simple and seem
like they should be pretty fast, this hasn't really matched my experience.


> The transcoded file can be used to recover the original without loss,
> and, perhaps surprisingly, it even conforms to the JPEG standard. However,
> I would not expect that many decoders will actually be able to recover an
> image from the arithmetically coded file, so transcoding it back to
> Huffman-coded data is likely the most compatible option.
>

yeah.

generally it mostly matters that the original can be recovered I think, 
and in general JPEG works pretty well as a "general purpose" format.


my own recent codec efforts have mostly been focused on speed and 
specialized use-cases, rather than getting the best possible 
quality/bitrate or making something which competes directly with JPEG 
in the general-purpose use-case (sort of like how both PNG and JPEG have 
certain things they do well, but neither can fully address the other's 
use-case).


for example, after resurrecting my previously stalled-out BTIC4B effort, 
mostly re-purposing it for texture compression (want reasonably fast 
decoding for quick load times), I have single-thread RGBA decode speeds 
of often around 250-400 megapixels/second (depends some on image and 
settings), which is a fair bit faster than I have been able to get from 
a JPEG decoder. though, image quality and bitrate are a bit more hit and 
miss vs JPEG, they are usually "not drastically worse".

currently, the RGBA/BGRA/RGBx/BGRx decode paths are the fastest paths, 
with BGRx being the fastest among these (note that the format supports 
an alpha channel, and RGBA/BGRA will try to decode alpha information).

in some cases, the BGRx path can often reach memcpy speeds.

in a current test with my (still in development) HDR / float16 decode 
path, I am getting around 150-200 megapixels/second (to R11_G11_B10). I 
also have basically working BC7 and S3TC decode paths (which can also 
generate mipmapped output, at some speed cost), which are currently 
still a little slower than the RGB decode paths.

still, TBD, I may also go add a BC6H decode path.


IME, WRT quality/bitrate, it does a little better than JPEG at lower 
quality levels (most likely due to Paeth prediction for the block color 
vectors, vs using the prior value as a predictor, where at lower quality 
levels the color-vector or DC coefficients tend to dominate over pixel 
bits or AC coeffs), though it typically does worse at higher quality levels.

while initially the cost of the block-level Paeth predictor was pretty 
steep, I ended up using the trick of letting the Y prediction also select 
the U and V predictors. effectively the color vector works as 2 vectors 
((Y, U, V), (Dy, Du, Dv)), with Y and Dy working as the primary predictors.

prediction accuracy seems still pretty close to that of selecting each 
component independently. in my tests, the 2nd and 3rd place (non-Paeth) 
predictors were: (3A+3B+2C)/8, and (3A+3B-2C)/4, which curiously both 
seemed to do better on average than (A+B)/2. none did quite as good as 
Paeth, but did generally do better than always using the last value.
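
for reference, a sketch of the block-level prediction with the shared
selection trick (the struct layout and names here are just for the
example, not the actual BTIC4B code):

/* classic Paeth selection: pick whichever of left (a), above (b), or
 * above-left (c) is closest to the gradient estimate a+b-c; returns 0/1/2 */
#include <stdlib.h>

typedef struct { int y, u, v; } YUV;   /* one half of a block color vector */

static int paeth_select(int a, int b, int c)
{
    int p  = a + b - c;
    int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
    if (pa <= pb && pa <= pc) return 0;
    if (pb <= pc)             return 1;
    return 2;
}

static int pick(int sel, int a, int b, int c)
{
    return (sel == 0) ? a : (sel == 1) ? b : c;
}

/* predict a block's color vector from its left / above / above-left
 * neighbors; the selection is made once, on Y, and reused for U and V
 * (one selection instead of three, nearly as accurate per the above). */
YUV predict_block_color(YUV left, YUV up, YUV upleft)
{
    YUV pred;
    int sel = paeth_select(left.y, up.y, upleft.y);
    pred.y = pick(sel, left.y, up.y, upleft.y);
    pred.u = pick(sel, left.u, up.u, upleft.u);
    pred.v = pick(sel, left.v, up.v, upleft.v);
    return pred;
}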

compression is working out a bit better than my original estimates, 
though still falling short of clearly "beating" JPEG. there is a 
drawback in that the codec is currently a bit more complicated than JPEG.


some common features:
   both use 8x8 blocks and chroma-subsampled YUV.

some differences:
   it uses fixed-format blocks and interpolation,
     rather than DCT and VLC coding;
   use of STF2+AdRiceLL rather than static Huffman
     (though most values use raw AdRiceLL);
   use of a little-endian bitstream
     this allows for cheaper bitstream operations;
     most bitstream operations are branch-free anyway,
     but this avoids needing to byte-swap
     (see the bit-reader sketch just below).
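
for reference, a sketch of what "cheaper bitstream operations" means here:
with LSB-first packing, the refill can be a plain unaligned 64-bit load on
a little-endian target, with no byte swap and no per-byte loop. this is
just the general technique, not the actual BTIC4B reader, and it assumes
the buffer has at least 8 readable bytes of padding past the end:

/* sketch of a branch-free LSB-first bit reader for little-endian targets */
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *ptr;   /* next byte to load from */
    uint64_t acc;         /* bit accumulator; next bit to read is bit 0 */
    int cnt;              /* number of valid bits currently in acc */
} BitReader;

static inline void br_refill(BitReader *br)
{
    uint64_t chunk;
    memcpy(&chunk, br->ptr, 8);        /* plain load; no byte swap on LE */
    br->acc |= chunk << br->cnt;       /* splice new bits above the old ones */
    br->ptr += (63 - br->cnt) >> 3;    /* advance by the bytes actually consumed */
    br->cnt |= 56;                     /* at least 56 valid bits now */
}

static inline uint32_t br_get(BitReader *br, int n)   /* n in 1..32 */
{
    uint32_t v;
    br_refill(br);
    v = (uint32_t)(br->acc & ((1ull << n) - 1));
    br->acc >>= n;
    br->cnt -= n;
    return v;
}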

STF2 is similar to my prior SMTF trick, but is a little bit faster 
because it eliminates the use of branches. instead, a symbol index I is 
swapped with 7I/8. experimentally, this seemed to be the most effective 
(vs either 3I/4 or 15I/16). also, unlike SMTF, there is no need for the 
table to be able to rotate (my prior SMTF design required several 
branches and zero-extended arithmetic to rotate a table).

note that "STF" here stands for "Swap Towards Front". a prior STF 
swapped I with I-1, but needed to make sure I wasn't 0, whereas STF2 
avoids this check (it just swaps index 0 with itself).
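
as a small sketch of the update step (the rank/pos naming is just for the
example, not the actual code):

/* STF2: after coding the symbol at rank index i, swap it with index
 * (7*i)/8. index 0 swaps with itself, so no branch or range check is
 * needed. the encoder codes pos[sym]; the decoder recovers rank[index]. */
#include <stdint.h>

typedef struct {
    uint8_t rank[256];   /* index -> symbol */
    uint8_t pos[256];    /* symbol -> index */
} Stf2;

static void stf2_init(Stf2 *s)
{
    int i;
    for (i = 0; i < 256; i++)
        s->rank[i] = s->pos[i] = (uint8_t)i;
}

/* called after the symbol at index i has been encoded or decoded */
static void stf2_update(Stf2 *s, int i)
{
    int j = (7 * i) >> 3;    /* 7/8 of the way toward the front */
    uint8_t si = s->rank[i], sj = s->rank[j];
    s->rank[i] = sj;  s->rank[j] = si;
    s->pos[si] = (uint8_t)j;
    s->pos[sj] = (uint8_t)i;
}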


the AdRiceLL coder basically uses length-limited Rice codes, with the K 
value updated dynamically based on Q. this whole process is driven by a 
lookup table (thus isn't that much different than a typical table-driven 
Huffman coder, except that the lookup tables tend to be constant).

for deltas, AdRiceLL is used by itself, as while AdRice with an 
extra-bits suffix can give better compression, it is also slower (and an 
exponential length-limiting scheme can serve a roughly similar effect, 
even if less optimally).
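
as a very rough sketch of the general idea (the adaptation rule, the
length limit, and the helper bit-writer here are all made up for the
example; the actual coder is table-driven and will differ in detail):

/* adaptive length-limited Rice coding: quotient q = v >> k is sent in
 * unary, capped at Q_LIMIT with an escape to a raw value; k is then
 * nudged up or down based on q. assumes v fits in 16 bits and that the
 * output buffer is zero-initialized. */
#include <stdint.h>
#include <stddef.h>

#define Q_LIMIT 12   /* unary length limit before escaping */

typedef struct {      /* minimal LSB-first bit writer, for the sketch only */
    uint8_t *buf;     /* must be zeroed in advance */
    size_t   nbits;
} BitWriter;

static void put_bits(BitWriter *bw, uint32_t v, int n)   /* n in 0..31 */
{
    int i;
    for (i = 0; i < n; i++)
        if ((v >> i) & 1)
            bw->buf[(bw->nbits + i) >> 3] |= (uint8_t)(1u << ((bw->nbits + i) & 7));
    bw->nbits += (size_t)n;
}

static void adrice_put(BitWriter *bw, uint32_t v, int *k)
{
    uint32_t q = v >> *k;
    if (q < Q_LIMIT) {
        put_bits(bw, (1u << q) - 1, (int)q + 1);        /* q one-bits, then a zero */
        put_bits(bw, v & ((1u << *k) - 1), *k);         /* k remainder bits */
    } else {
        put_bits(bw, (1u << Q_LIMIT) - 1, Q_LIMIT + 1); /* escape marker */
        put_bits(bw, v, 16);                            /* raw 16-bit value */
    }
    if (q == 0 && *k > 0)       (*k)--;   /* small values: shrink k */
    else if (q > 1 && *k < 15)  (*k)++;   /* large values: grow k */
}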


the STF2+AdRiceLL combination is pretty close in terms of speed and 
compression to static Huffman (though typically it does 
slightly worse on both fronts on average), but it is adaptive, and its 
low per-context cost tends to make contexts fairly cheap, which tends to 
be a net win.

well, and also I can understand how it works.
it is possible to feed the entropy-coded data through a bitwise 
range-coder for some additional compression, but this tends to come at a 
pretty steep cost in terms of speed.


note, a version of the format I am working on is here:
https://github.com/cr88192/bgbtech_misc/tree/master/tool_btic4b

and a basic spec for the format:
https://github.com/cr88192/bgbtech_misc/wiki/BTIC4B


BGB
7/21/2016 4:43:52 AM
On 7/20/2016 11:43 PM, BGB wrote:
> On 7/20/2016 1:58 PM, Thomas Richter wrote:
>> On 18.07.2016 at 23:37, Eli the Bearded wrote:
>>

>
> my own JPEG encoders tend to use trellis quantization, which tends to
> improve compression while also making decoding slightly faster at
> an otherwise similar quality level. another trick (for improving decode
> speed) is detecting some special cases (ex: no coefficients past a
> certain point), and using alternate versions of the inverse quantization
> and IDCT transform which omit these coefficients.
>
>
> potential improvements to the format I can think of are mostly things
> like a slightly tweaked VLC coding, and using a more advanced DC
> predictor. maybe a few "special case" features (mostly involving block
> coding, such as explicitly subsampled blocks).
>

for another possible customized format:

the explicit subsampling would allow using the trick mentioned above 
more efficiently. this is in addition to fixed block layouts.

another trick is allowing blocks which skip using free-form VLC coding 
in favor of interpolation.

say, for an 8x8 block:
   full resolution, 8x8 VLC;
   1/2 resolution, 4x4 VLC;
   1/4 resolution, 2x2 VLC;
   flat, DC only;
   1/2 resolution, 4x4, 2/3/4 bits/coeff (30/45/60 coeff bits);
   1/4 resolution, 2x2, 2/3/4 bits/coeff (14/21/28 coeff bits);
     a VLC value would be used to encode the AC range.
     ex: block_tag DC_delta AC_range AC_vector_bits

reduced resolution blocks would work by omitting high-frequency coeffs, 
and would be shimmed onto specialized inverse-quantization and IDCT 
functions.

still yet to be seen how well this would work, as most of my prior 
attempts to make DCT-based designs faster have been met with limited 
success (or, they lose too much in terms of compression to really be 
worthwhile).


with normal JPEG images though, one mostly just has to chop in zigzag 
space. the encoder can be "clever", and use heuristics to detect when to 
chop off the end of the blocks.

ex:
   full block (64 coeffs);
   half block (32 coeffs);
   quarter block (16 coeffs);
   flat block (DC only).

downside is zigzag space isn't quite such a nice way to chop it in terms 
of the IDCT transform, and one mostly detects what type of block it is 
by the location of the EOB marker within the block.

making full use of block-chopping on the decoder side can also get a 
little hairy (effectively have to fully expand out the 2D DCT to do so), 
whereas a square-chop could be done with 1D DCT's.
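
say, something like this on the decode side (just a sketch; "eob" is the
zigzag index one past the last nonzero coefficient for the block):

/* sketch: classify an 8x8 block by where the EOB landed during entropy
 * decode; the class then selects which dequant+IDCT variant to run. */
enum block_class {
    BLK_FLAT,      /* DC only */
    BLK_QUARTER,   /* nonzero coeffs only in the first 16 zigzag positions */
    BLK_HALF,      /* nonzero coeffs only in the first 32 zigzag positions */
    BLK_FULL       /* anything beyond that: full 64-coefficient path */
};

static enum block_class classify_block(int eob)   /* eob in 1..64 */
{
    if (eob <= 1)  return BLK_FLAT;
    if (eob <= 16) return BLK_QUARTER;
    if (eob <= 32) return BLK_HALF;
    return BLK_FULL;
}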


the advantage is that all this remains fully compatible; a partial 
disadvantage is that "generic" encoders will tend to produce images which can't be 
decoded as quickly (ex: blocks which could otherwise be decoded more 
quickly are decoded via a slower path because of a few +/- 1's hanging 
out in higher-frequency coeffs or similar).

generally, also, there is logic to detect the use of known fixed layouts 
(such as 4:2:0 or 4:2:2).



>
> in the past, I have noted it is pretty hard to get JPEG-like designs
> decoding much faster than about 80-100 megapixels/second. (faster may
> exist, dunno, this is just the fastest I have gotten it).
>

this was for single-threaded decoding; around 300-400 is typically 
possible via multithreading...

a quick skim implies that ~100 megapixels/second for a single thread, and 
~300 for multithreaded decoding, is a fairly typical limit on "modernish" 
hardware.



but, then, multithreaded color-cell / VQ designs can get a fair bit 
faster than this (gigapixels/second).

so, the issue is:
I have had more luck pushing VQ towards better image quality and 
compression than I have had trying to make DCT or WHT based designs 
speed-competitive with blocky VQ.

but, who knows...

pretty much no one seems to care about blocky VQ, but then again, these 
sorts of codecs tend to result in an implementation which is basically a 
big pile of hair...


in controlled situations, namely where both the encoder and decoder are 
known in advance, and generally images are batch-converted (say, from 
source images stored in PNG or similar), ..., the case for sticking with 
normal JPEG (vs going full custom) is a bit weaker (and the question 
then becomes which format does "best").


I am also kind of left half-considering the idea of hacking HDR data into 
PNG, say, RGBE data in a PNG rather than the usual true-color RGBA data 
(sort of as an alternative to Radiance HDR and OpenEXR, with similar 
levels of annoyance involved...).

BGB
7/21/2016 10:05:15 PM