Re: A Random Data Compression method -draft #7


With the eggs down my throat and in my tummy, it's now time for move to front 
to perform its miracles ;) :)

Let's see... where's my input:

Random input:

 1  2  3  4  5  6  7  8  9  10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

Ok... man...

Now for the code table, which always stays the same, as in the original MTF 
algorithm:

[0, 10, 110, 111]

OK, now for the initial table for the values:

[00, 01, 10, 11]

And now the encoding begins:

1: 11 -> 111   table-> [11, 00, 01, 10]
2: 10 -> 111   table-> [10, 11, 00, 01]
3: 10 -> 0     table-> [10, 11, 00, 01]
4: 01 -> 111   table-> [01, 10, 11, 00]
5: 01 -> 0     table-> [01, 10, 11, 00]
6: 10 -> 10    table-> [10, 01, 11, 00]
7: 11 -> 110   table-> [11, 10, 01, 00]
8: 01 -> 110   table-> [01, 11, 10, 00]
9: 01 -> 0     table-> [01, 11, 10, 00]
10: 01 -> 0     table-> [01, 11, 10, 00]
11: 01 -> 0     table-> [01, 11, 10, 00]
12: 01 -> 0     table-> [01, 11, 10, 00]

Final result: 6x1 + 1x2 + 5x3 bits = 23 bits.

Input was 24 bits, so 1 bit saved... not bad.
Much better than a Huffman code, at least for this short input.
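The scheme above can be sketched in a few lines of Python (my reconstruction, not the author's code; the name `mtf_encode` is mine): each 2-bit symbol is looked up in the value table, the prefix code for its position is emitted, and the symbol is then moved to the front.

```python
# Move-to-front encoding sketch: a symbol's position in the value table
# is emitted using the fixed prefix-code table [0, 10, 110, 111].
CODES = ["0", "10", "110", "111"]

def mtf_encode(symbols):
    table = ["00", "01", "10", "11"]   # initial value table
    out = []
    for sym in symbols:
        i = table.index(sym)
        out.append(CODES[i])           # code for the symbol's position
        table.insert(0, table.pop(i))  # move the symbol to the front
    return out

codes = mtf_encode("11 10 10 01 01 10 11 01 01 01 01 01".split())
print(sum(len(c) for c in codes))      # 23 bits, vs 24 bits of input
```

This reproduces the 23-bit total from the trace above.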

Perhaps recursive encoding might provide further compression, let's try.
The ending will be padded with zeros.
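The padding and regrouping step can be sketched as follows (a sketch; the round-1 codes are copied from the trace above):

```python
# Concatenate the round-1 output codes, pad the tail with zeros to an even
# number of bits, and regroup into 2-bit symbols for the next round.
round1 = ["111", "111", "0", "111", "0", "10",
          "110", "110", "0", "0", "0", "0"]    # 23 bits total

bits = "".join(round1)
bits += "0" * (-len(bits) % 2)                 # pad to an even length
pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
print(pairs)
```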

Output from last round:

 1  2  3  4  5  6  7  8  9 10 11 12
11 11 11 01 11 01 01 10 11 00 00 00

table[00,01,10,11]

1: 11->111  table->[11,00,01,10]
2: 11->0    table->[11,00,01,10]
3: 11->0    table->[11,00,01,10]
4: 01->110  table->[01,11,00,10]
5: 11->10   table->[11,01,00,10]
6: 01->10   table->[01,11,00,10]
7: 01->0    table->[01,11,00,10]
8: 10->111  table->[10,01,11,00]
9: 11->110  table->[11,10,01,00]
10: 00->111  table->[00,11,10,01]
11: 00->0    table->[00,11,10,01]
12: 00->0    table->[00,11,10,01]

Result of round2: 5x1 + 2x2 + 5x3 = 24 bits.

Back to square one again! ;) :)

Unless we leave the last code at 1 bit, hehe.

Now purely for my own interest I will give the bubble to front a try to see 
how it performs:

Random input:

 1  2  3  4  5  6  7  8  9  10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

table[00, 01, 10, 11]

1: 11->111   table->[00,01,11,10]
2: 10->111   table->[00,01,10,11]
3: 10->110   table->[00,10,01,11]
4: 01->110   table->[00,01,10,11]
5: 01->10    table->[01,00,10,11]
6: 10->110   table->[01,10,00,11]
7: 11->111   table->[01,10,11,00]
8: 01->0     table->[01,10,11,00]
9: 01->0     table->[01,10,11,00]
10: 01->0     table->[01,10,11,00]
11: 01->0     table->[01,10,11,00]
12: 01->0     table->[01,10,11,00]

BubbleToFront Result: 5x1 + 1x2 + 6x3 = 25 bits

Slightly worse... than the input.
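The bubble-to-front variant differs from move-to-front in one line: the matched symbol only swaps with its predecessor instead of jumping to the front. A sketch (again my reconstruction; `btf_encode` is my name for it):

```python
# Bubble-to-front sketch: the matched symbol moves one step toward the
# front, rather than all the way as in move-to-front.
CODES = ["0", "10", "110", "111"]

def btf_encode(symbols):
    table = ["00", "01", "10", "11"]   # initial value table
    out = []
    for sym in symbols:
        i = table.index(sym)
        out.append(CODES[i])
        if i > 0:                      # swap with the predecessor
            table[i - 1], table[i] = table[i], table[i - 1]
    return out

codes = btf_encode("11 10 10 01 01 10 11 01 01 01 01 01".split())
print(sum(len(c) for c in codes))      # 25 bits
```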

Let's do bubble to front one more time just to see what happens:

 1  2  3  4  5  6  7  8  9 10 11 12 13
11 11 11 11 01 10 10 11 01 11 00 00 00

table[00,01,10,11]

1: 11->111   table->[00,01,11,10]
2: 11->110   table->[00,11,01,10]
3: 11->10    table->[11,00,01,10]
4: 11->0     table->[11,00,01,10]
5: 01->110   table->[11,01,00,10]
6: 10->111   table->[11,01,10,00]
7: 10->110   table->[11,10,01,00]
8: 11->0     table->[11,10,01,00]
9: 01->110   table->[11,01,10,00]
10: 11->0    table->[11,01,10,00]
11: 00->111  table->[11,01,00,10]
12: 00->110  table->[11,00,01,10]
13: 00->10   table->[00,11,01,10]

Result for round 2 of bubble: 3x1 + 2x2 + 8x3 = 31 bits

A big increase for bubble... so bubble is d.e.a.d for this case at least ;) 
:)

Pretty cool... maybe in the next post I'll try his switch idea in table form ;) :)

Bye,
  Skybuck. 


Skybuck 1/10/2011 8:55:56 PM
comp.compression