Find repeated sequence in array

Hello,
I have an array of two column vectors that consists of repeated data sequences. For simplicity, let's consider:

x = 
1 1
3 1
5 2
7 2
1 1
3 1
5 2
7 2
1 1
3 1
5 2
7 2
....

I would like to extract the repeating part:

y = 
1 1
3 1
5 2
7 2

The arrays consist of 10^7 to 10^8 elements, and the sequence consists of at most 10^6 elements, usually fewer, around 10^5.

Thanks
Reply Dejan 12/21/2010 5:37:08 PM

"Dejan" wrote in message <ieqok4$hso$1@fred.mathworks.com>...
> Hello,
> I have an array of two column vectors that consists of repeated data sequences. For simplicity, let's consider:
> 
> x = 
> 1 1
> 3 1
> 5 2
> 7 2
> 1 1
> 3 1
> 5 2
> 7 2
> 1 1
> 3 1
> 5 2
> 7 2
> ...
> 
> I would like to extract the repeating part:
> 
> y = 
> 1 1
> 3 1
> 5 2
> 7 2
> 
> The arrays consist of 10^7 to 10^8 elements, and the sequence consists of at most 10^6 elements, usually fewer, around 10^5.
> 
> Thanks

y = unique(x,'rows');
Reply Sean 12/21/2010 5:48:06 PM


"Sean de " <sean.dewolski@nospamplease.umit.maine.edu> wrote in message <ieqp8m$se4$1@fred.mathworks.com>...
> 
> y = unique(x,'rows');

Thanks for your quick answer :)

I'm obviously very new to Matlab, but I didn't think the answer could be that simple, considering the amount of data in question.

The solution works great.

Cheers
Reply Dejan 12/21/2010 6:00:22 PM

> y = unique(x,'rows');

After checking more carefully, I see that it unfortunately reorders the rows in ascending order.

Is there a way to keep the original order of the elements?
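
The reordering is easy to miss with the toy data in the question, because that example already happens to be in ascending row order. A small sketch, using a hypothetical period whose rows are not sorted, makes the effect visible:

```matlab
% Hypothetical period whose rows are NOT in ascending order.
x = repmat([5 2; 1 1; 7 2; 3 1], 2, 1);

% unique(...,'rows') returns the distinct rows sorted,
% not in the order in which they first appear in x.
y = unique(x, 'rows');
```

Here y comes back as [1 1; 3 1; 5 2; 7 2], while the first four rows of x are [5 2; 1 1; 7 2; 3 1].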
Reply Dejan 12/21/2010 6:31:06 PM

[T,H] = unique(x,'rows','first');
x_new = x(H,:)
Reply Matt 12/21/2010 7:38:20 PM

Matt, a small typo:
instead of:
x_new = x(H,:) 
it should be
x_new = T(H,:);

Branko


"Matt Fig" wrote in message <ieqvnc$o25$1@fred.mathworks.com>...
> [T,H] = unique(x,'rows','first');
> x_new = x(H,:)
Reply Branko 12/21/2010 8:03:22 PM

"Branko " <brankobogunovic@gmail.com> wrote in message <ier16a$sqh$1@fred.mathworks.com>...
> Matt small typo:
> instead of :
> x_new = x(H,:) 
> it should be
> x_new = T(H,:);
> 
> Branko


Oops, I did make a mistake.  I think the correct code would be:

[T,H] = unique(x,'rows','first');
[H,H] = sort(H);
x_new = T(H,:)
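
For readers following along, here is a minimal sketch of what these three lines appear to do, written with separate variable names (firstIdx, y and y2 are illustrative names, not from the thread). The second output of unique holds, for each sorted unique row, the index of its first occurrence in x; sorting those indices puts the unique rows back in order of first appearance. The last line assumes MATLAB R2012a or newer, where unique accepts the 'stable' flag; that flag was not available when this thread was written.

```matlab
x = repmat([1 1; 3 1; 5 2; 7 2], 3, 1);    % toy data from the question

% T: sorted unique rows; firstIdx(k): row of x where T(k,:) first occurs.
[T, firstIdx] = unique(x, 'rows', 'first');

% Sorting the first-occurrence indices restores the original order,
% so x can be indexed directly without the second output of sort.
y = x(sort(firstIdx), :);

% Assumption: R2012a or newer, where the 'stable' flag exists.
y2 = unique(x, 'rows', 'stable');
```

Indexing x with the sorted indices gives the same rows as T(H,:) after [H,H] = sort(H), and avoids assigning two outputs to the same variable.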
Reply Matt 12/21/2010 8:32:22 PM

> [T,H] = unique(x,'rows','first');
> [H,H] = sort(H);
> x_new = T(H,:)

Thanks Matt and Branko ;)

Solution works great
Reply Dejan 12/22/2010 5:41:04 AM

Hm, once again I replied before checking properly. Although the result looks correct at first sight, it isn't.

I'll try to explain in my limited English:

-----------------------------------------------------------
% take a small sample, x ~ 5 x 10^5 rows
% x = <518720x2 double>

% apply the suggested code:
[T, H] = unique(x, 'rows', 'first');
[H, H] = sort(H);
y = T(H,:);

% write the contents of y (<131127x2 double>) to a file:
dlmwrite('y.out', y)

% cut the same number of rows as in y from the start of the x sample:
z = x(1:size(y,1), :);
dlmwrite('z.out', z)
-----------------------------------------------------------

Now, comparing both files:
 - there are 11926 differences; put differently,
 - 11926 rows are missing from inside y.out and 11926 rows are appended at the end of y.out; or, put differently,
 - 11926 rows are present inside z.out and 11926 rows are missing from the end of z.out

I hope this isn't confusing.

Why could this be?

Is this an unwanted result of:

[T, H]= unique(x, 'rows', 'first');

or from:

[H, H] = sort(H);

or something else?

Thanks for reading this and for your time
Reply Dejan 12/22/2010 8:09:05 AM
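
One possible explanation, offered as an assumption because the actual data is not shown: unique drops every repeated row, including rows that legitimately occur more than once inside a single period. In that case its output is shorter than the true period, and comparing it with the first rows of x produces exactly this kind of mismatch. The sketch below reproduces the effect on hypothetical toy data and then looks for the period directly, assuming x is exactly periodic starting at row 1 with no noise. The names period, cand, p, u and y are illustrative only.

```matlab
% Hypothetical toy data: the row [1 1] occurs twice inside one period.
period = [1 1; 3 1; 1 1; 7 2];
x = repmat(period, 3, 1);
n = size(x, 1);

% Order-preserving unique rows, as in the thread: only 3 rows survive.
[T, firstIdx] = unique(x, 'rows', 'first');
u = x(sort(firstIdx), :);

% Candidate periods: offsets at which the first row reappears.
cand = find(all(bsxfun(@eq, x, x(1,:)), 2)) - 1;
cand = cand(cand > 0);

p = n;                                  % fallback if nothing repeats
for k = cand.'
    % k is a period if every row equals the row k positions later.
    if isequal(x(1:n-k, :), x(1+k:n, :))
        p = k;
        break
    end
end
y = x(1:p, :);                          % one full period, duplicates kept
```

With this toy input, u has three rows while y recovers all four rows of the period. Each candidate check compares nearly the whole array, so on 10^7 to 10^8 rows the cost depends on how often the first row recurs within a period; if it recurs often, checking a short prefix before the full comparison may be worth trying.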
