Re: A Random Data Compression method -draft #7

  • Permalink
  • submit to reddit
  • Email
  • Follow


There with the eggs down my throat and stummie it's now time for the move to 
front to perform it's miracles ;) :)

Let's see... where's my input:

Random input:

 1  2  3  4  5  6  7  8  9  10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

Ok... man...

Now for the table which will always stay the same as according to the 
original mtf algo:

[0, 10, 110, 111]

OK, now for the initial table for the values:

[00, 01, 10, 11]

And now the encoding begins:

1: 11 -> 111   table-> [11, 00, 01, 10]
2: 10 -> 111   table-> [10, 11, 00, 01]
3: 10 -> 0     table-> [10, 11, 00, 01]
4: 01 -> 111   table-> [01, 10, 11, 00]
5: 01 -> 0     table-> [01, 10, 11, 00]
6: 10 -> 10    table-> [10, 01, 11, 00]
7: 11 -> 110   table-> [11, 10, 01, 00]
8: 01 -> 110   table-> [01, 11, 10, 00]
9: 01 -> 0     table-> [01, 11, 10, 00]
10: 01 -> 0     table-> [01, 11, 10, 00]
11: 01 -> 0     table-> [01, 11, 10, 00]
12: 01 -> 0     table-> [01, 11, 10, 00]

Final result: 6x1 + 1x2 + 5x3 bits = 23 bits.

Input was 24 bits, 1 bit saved therefore... not bad.
Much better than huffman code at least for this short code.

Perhaps recursive encoding might provide further compression, let's try:
Endings will be padded with zero's.

Output from last round:

 1  2  3  4  5  6  7  8  9 10 11 12
11 11 11 01 11 01 01 10 11 00 00 00

table[00,01,10,11]

1: 11->111  table->[11,00,01,10]
2: 11->0    table->[11,00,01,10]
3: 11->0    table->[11,00,01,10]
4: 01->110  table->[01,11,00,10]
5: 11->10   table->[11,01,00,10]
6: 01->10   table->[01,11,00,10]
7: 01->0    table->[01,11,00,10]
8: 10->111  table->[10,01,11,00]
9: 11->110  table->[11,10,01,00]
10: 00->111  table->[00,11,10,01]
11: 00->0    table->[00,11,10,01]
12: 00->0    table->[00,11,10,01]

Result of round2: 5x1 + 2x2 + 5x3 = 24 bits.

Back to square one again ! ;) :)

Unless we leave last code at 1 bit hehe.

Now purely for my own interest I will give the bubble to front a try to see 
how it performs:

Random input:

 1  2  3  4  5  6  7  8  9  10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

table[00, 01, 10, 11]

1: 11->111   table->[00,01,11,10]
2: 10->111   table->[00,01,10,11]
3: 10->110   table->[00,10,01,11]
4: 01->110   table->[00,01,10,11]
5: 01->10    table->[01,00,10,11]
6: 10->110   table->[01,10,00,11]
7: 11->111   table->[01,10,11,00]
8: 01->0     table->[01,10,11,00]
9: 01->0     table->[01,10,11,00]
10: 01->0     table->[01,10,11,00]
11: 01->0     table->[01,10,11,00]
12: 01->0     table->[01,10,11,00]

BubbleToFront Result: 5x1 + 1x2 + 6x3 = 25 bits

Slightly worse... then input.

Let's do bubble to front one more time just to see what happens:

 1  2  3  4  5  6  7  8  9 10 11 12 13
11 11 11 11 01 10 10 11 01 11 00 00 00

table[00,01,10,11]

1: 11->111   table->[00,01,11,10]
2: 11->110   table->[00,11,01,10]
3: 11->10    table->[11,00,01,10]
4: 11->0     table->[11,00,01,10]
5: 01->110   table->[11,01,00,10]
6: 10->111   table->[11,01,10,00]
7: 10->110   table->[11,10,01,00]
8: 11->0     table->[11,10,01,00]
9: 01->110   table->[11,01,10,00]
10: 11->0    table->[11,01,10,00]
11: 00->111  table->[11,01,00,10]
12: 00->110  table->[11,00,01,10]
13: 00->10   table->[00,11,01,10]

Result for round 2 of bubble: 3x1 + 2x2 + 8x3 = 31 bits

A big increase for bubble... so bubble is d.e.a.d for this case at least ;) 
:)

Pretty cool... maybe in next post I try his switch idea in table form ;) :)

Bye,
  Skybuck. 


0
Reply Skybuck 1/10/2011 8:55:56 PM

See related articles to this posting

comp.compression 4466 articles. 14 followers. Post

0 Replies
256 Views

Similar Articles

[PageSpeed] 2


Reply:

Similar Artilces:

Re: Compressing data sets (was Re: [SAS-L]) #2 #7
I'm crazy..yes, SAS/Access...not Connect... ________________________________ Bruce A. Johnson bjohnson@solucient.com -----Original Message----- From: Jack Hamilton [mailto:JackHamilton@firsthealth.com] Sent: Tuesday, January 06, 2004 1:22 PM To: SAS-L@LISTSERV.UGA.EDU; Bruce Johnson Subject: Re: [SAS-L] Compressing data sets (was Re: [SAS-L]) A new way to read/write Excel files in 9.1 was mentioned at SUGI, but as I recall it required a server running SAS/Access to PC File Formats. It wasn't SAS/Connect doing the reading and writing, just as SAS/Connect isn't doing the remote...

Best possible compression method for random data?
What is the best known compression method for random data? I mean any random data, example in hex: it would be between 00-FF Size let's say 1 mega > ??? Posted Via Usenet.com Premium Usenet Newsgroup Services ---------------------------------------------------------- ** SPEED ** RETENTION ** COMPLETION ** ANONYMITY ** ---------------------------------------------------------- http://www.usenet.com Serge wrote: ) What is the best known compression method for random data? ) I mean any random data, example in hex: it would be between 00-FF ) Size let's s...

Re: How to pick records randomly from a data sets #7
Rune Runnest� asked in part: > -----Urspr�ngliche Nachricht----- > Von: Rune Runnest� > Gesendet: Donnerstag, 22. M�rz 2007 23:18 > > Quite another challenge: Is there a way to make each of > the dataset names > created in macro CREATE_DS to end > with '_p'. So that the dataset names will > be TP1986_P, tp1991_p, tp1997_p and tp1999_p ? > > > Regards, Rune > <some code snipped here> > > proc sql noprint; > select count(*) into :numrows from one; > %let numrows = &numrows; > select year into :year1 - :year&numr...

Re: How to randomly select records from a large data set? #7
if PROC SURVEYSELECT is not available, try DATA STEP like this: data random; set population; ranNum=ranuni(123); run; proc sort data=random; by ranNum; run; data sample(drop=ranNum); set random(obs=1000); run; On Jan 23, 2008 12:59 AM, Nordlund, Dan (DSHS/RDA) <NordlDJ@dshs.wa.gov> wrote: > > If you can get your proc surveyselect problems sorted out that will > probably be the best way to go. But if not, here is an approach that will > not take a lot of work space since there is no need sort/order your large > dataset. > -- Jiangtang Hu Peking U...

Re: How to randomly select a certain data from a large dataset #7
=0A=0A=0A=0A________________________________=0A=0AFrom: Ari Toikka <toikkar= i@YAHOO.CO.UK>=0ATo: SAS-L@LISTSERV.UGA.EDU=0ASent: Fri, 6 November, 2009 1= 6:19:07=0ASubject: Re: How to randomly select a certain data from a large d= ataset=0A=0Aa simple method to select a=A0sample of exactly 400:=0A=0Adata = sample;=0A=A0=A0=A0=A0 set yourdata nobs=3DN;=0A=A0=A0=A0=A0=A0 if _n_ =3D = 1 then samplesize =3D 400;=0A=A0=A0=A0=A0=A0 retain samplesize;=0A=A0=A0=A0= =A0=A0 drop samplesize;=0A=A0=A0=A0=A0=A0 if ranuni(0) < samplesize / (n- _= n_ +1);=A0=0A=A0=A0=A0=A0=A0 samplesize =3D samplesi...

Re: COMPRESS function working differently in Data Step and Macro #7
Hi, The macro %CMPRESS() function works more like datastep COMPBL function than the COMPRESS function. From the documentation ... "The CMPRES and QCMPRES macros compress multiple blanks and remove leading and trailing blanks." Regards ++ Guido On 31/08/06, Cat <job.alerte@gmail.com> wrote: > Hi Jack, > > %sysfunc(compress()) can be replaced by %cmpress() . > > Regards, > > Catherine. > ------------------------------------- > Biostatistician / SAS pharma programer > www.keyrusbiopharma.fr > > Jack Clark wrote: > > I moved the closi...

Re: Method to distinguish different periods in time series data #7
Following up on the Regime-Shift idea, that suggests quality-control and control charts could be a useful arena as well. There are both parametric and nonparametric constructs for cumsum and exponentially weighted moving averages (EWMA).=20 Warren Schlechte -----Original Message----- From: Samuel Croker [mailto:samuel.croker@GMAIL.COM]=20 Sent: Thursday, October 18, 2007 9:21 AM Subject: Re: Method to distinguish different periods in time series data If you are pretty sure that you know when the change happened, then an intervention analysis is the way to go. I think that you should be ab...

Re: Randomly Select Observations on Infile Step (Large Data Sets) #7
Muthia: The reason that every case has an equal chance of being in the sample is th= at each incoming record has an equal probability of having a random number = in the top 50, even though you don't know in advance the cutpoint for the 5= 0th rank value of ranuni . Note that: 1. Every record gets a ranuni value and, 2. The seed is the same for each. Then as the program steps through the infile, a record whose ranuni score a= t first is determined to be part of the top 50 (and therefore put in the ha= sh table) will be dropped from the table if inclusion of subsequent recor...

Re: Random number generation
Wow, that already was about 400% mpre than I had any idea of. Thanks, Dan! I particularly find interesting the repeat period. While I don't operate on datasets in the 2,000,000,000 range, in the range I do operate (100,000 - 1,000,000) it is a good idea to pay some attention to this, when generating multiple random keys. 1 in 2,000 is unlikely to overlap, but by no means impossible. Can I ask, how you get the updated seed? That's not something I recall having seen before. Thanks, and I look forward to seeing your writeup! -Joe On Feb 6, 2009, at 7:10 PM, "Nordlund, Dan (DSHS...

Determine compression method from compressed data
Hi; Maybe one of you recognizes some characteristic bits in the stream described below. My goal is to find a way to compress a different stream so that the same given decoder will be able to decompress it, but I haven't yet been able to determine the compression method nor its parameters. There is no header (no gzip,arj,...). Data is stored in contiguous chunks; each prefixed with one big-endian 16-bit word specifying the expected length (in bytes) of the chunk when uncompressed, and a second word specifying the number of bytes following in the compressed chunk. The uncompressed length...