
### Find repeated sequence in array


```
Hello,
I have an array of two vectors that consists of a repeated data sequence. For simplicity, let's consider:

x =
1 1
3 1
5 2
7 2
1 1
3 1
5 2
7 2
1 1
3 1
5 2
7 2
....

I would like to extract the repeating part:

y =
1 1
3 1
5 2
7 2

The arrays consist of 10^7 to 10^8 elements, and the sequence consists of at most 10^6 elements, usually fewer, around 10^5.

Thanks
```

```
"Dejan" wrote in message <ieqok4$hso$1@fred.mathworks.com>...
> Hello,
> I have an array of two vectors that consists of a repeated data sequence. For simplicity, let's consider:
>
> x =
> 1 1
> 3 1
> 5 2
> 7 2
> 1 1
> 3 1
> 5 2
> 7 2
> 1 1
> 3 1
> 5 2
> 7 2
> ...
>
> I would like to extract the repeating part:
>
> y =
> 1 1
> 3 1
> 5 2
> 7 2
>
> The arrays consist of 10^7 to 10^8 elements, and the sequence consists of at most 10^6 elements, usually fewer, around 10^5.
>
> Thanks

y = unique(x,'rows');
```

```
"Sean de " <sean.dewolski@nospamplease.umit.maine.edu> wrote in message <ieqp8m$se4$1@fred.mathworks.com>...
>
> y = unique(x,'rows');

I'm obviously very new to Matlab, but I just didn't think the answer could be that simple, considering the amount of data in question.

The solution works great.

Cheers
```

```
> y = unique(x,'rows');

After checking more carefully, it unfortunately reorders the rows in ascending order.

Is there a way to keep the original order of the elements?
```

```
[T,H] = unique(x,'rows','first');
x_new = x(H,:)
```
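As a quick check of what the two lines above return, here is a minimal sketch on made-up data (a reordered version of the example, so that sorting is visible). To the best of my knowledge, `unique` with `'rows'` returns `T` in sorted order, and `H` holds row indices into `x` such that `x(H,:)` reproduces `T`; the `'first'` flag only decides which of several duplicate rows has its index reported, not the order of the output.

```matlab
% Hypothetical data: the 4-row pattern is deliberately not in ascending order.
x = repmat([5 2; 1 1; 7 2; 3 1], 3, 1);

[T, H] = unique(x, 'rows', 'first');

% T is sorted by rows; H lists the first occurrence of each row of T in x.
same = isequal(x(H, :), T);   % x(H,:) is just T again, i.e. still sorted
```

So on this data `x(H,:)` still comes out in ascending order rather than in the order of first appearance.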

```
Matt, small typo:
x_new = x(H,:)
it should be
x_new = T(H,:);

Branko

"Matt Fig" wrote in message <ieqvnc$o25$1@fred.mathworks.com>...
> [T,H] = unique(x,'rows','first');
> x_new = x(H,:)
```

```
"Branko " <brankobogunovic@gmail.com> wrote in message <ier16a$sqh$1@fred.mathworks.com>...
> Matt small typo:
> x_new = x(H,:)
> it should be
> x_new = T(H,:);
>
> Branko

Oops, I did make a mistake.  I think the correct code would be:

[T,H] = unique(x,'rows','first');
[H,H] = sort(H);
x_new = T(H,:)
```
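For comparison, a small sketch of three ways to get the unique rows in order of first appearance, again on made-up data. The first mirrors the code above but uses a separate variable (`order`, a name chosen here) instead of assigning both outputs of `sort` to `H`. The second skips `T` entirely. The third relies on the `'stable'` option of `unique`, which as far as I know exists only in newer releases (R2012a or later), so treat its availability as an assumption.

```matlab
% Hypothetical data: a 4-row pattern that is not in ascending order.
x = repmat([5 2; 1 1; 7 2; 3 1], 3, 1);

[T, H] = unique(x, 'rows', 'first');

% Variant 1: permutation that puts the first-occurrence indices in ascending order.
[~, order] = sort(H);
y1 = T(order, :);

% Variant 2: index x directly with the sorted first-occurrence indices.
y2 = x(sort(H), :);

% Variant 3: let unique keep the order of first appearance itself.
y3 = unique(x, 'rows', 'stable');
```

All three return the distinct rows only. If a row occurs more than once inside one repetition of the pattern, every variant keeps just its first occurrence.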

```
> [T,H] = unique(x,'rows','first');
> [H,H] = sort(H);
> x_new = T(H,:)

Thanks Matt and Branko ;)

Solution works great
```

```
Hm, once again I replied faster than I checked properly. Although the result seems correct at first sight, it isn't.

I'll try to explain in my limited English:

-----------------------------------------------------------
% taking small sample x ~ 5 x 10^5
x = <518720x2 double>

% doing suggested:
[T, H]= unique(x, 'rows', 'first');
[H, H] = sort(H);
y = T(H,:);

% output contents of y [<131127x2 double>] in a file:
dlmwrite('y.out', y)

% cut same number of elements as in "y" from "x" sample:
z = [x(1:131127,1) x(1:131127,2)];
dlmwrite('z.out', z)
-----------------------------------------------------------

Now, comparing the two files:
- there are 11926 differences; in other words,
- 11926 rows are missing from inside y.out and appear appended at its end; or, put differently,
- 11926 rows are present inside z.out and missing from its end

I hope it's not confusing

Why could this be?

Is this an unwanted result of:

[T, H]= unique(x, 'rows', 'first');

or from:

[H, H] = sort(H);

or something else?

```
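One plausible reading of the result above, offered as a guess rather than a diagnosis of the actual data: `unique` returns the distinct rows, not one period of the sequence. If the same pair of values occurs more than once within a single repetition, the later occurrences are dropped, so `y` comes out shorter than the true period, and rows that first appear beyond row 131127 of `x` end up at the end of `y`. Neither the `unique` call nor the `sort` call would be misbehaving in that case.

A minimal sketch of searching for the period directly, on made-up data. It assumes the sequence starts at the first row, repeats exactly (exact equality of values), and that `p` (a name chosen here) is the smallest shift under which `x` matches itself.

```matlab
% Hypothetical data: a 5-row period in which the row [1 1] occurs twice.
period = [1 1; 3 1; 1 1; 5 2; 7 2];
x = repmat(period, 4, 1);
x = x(1:end-2, :);            % the last repetition may be incomplete

n = size(x, 1);

% Candidate shifts: offsets at which the first row of x shows up again.
cand = find(ismember(x, x(1, :), 'rows')) - 1;
cand = cand(cand > 0);

p = n;                        % fallback: no repetition found
for k = 1:numel(cand)
    s = cand(k);
    % s is a period if x equals itself shifted down by s rows.
    if isequal(x(1:n-s, :), x(s+1:n, :))
        p = s;
        break
    end
end

y  = x(1:p, :);               % one full period, original order, duplicates kept
yu = unique(x, 'rows');       % for comparison: distinct rows only
```

On this data `y` has 5 rows while `yu` has only 4. Each candidate check compares nearly the whole array, so on arrays of 10^7 to 10^8 elements with many candidates this could be slow; comparing only a leading chunk first and confirming with the full comparison afterwards would be one way to reduce the cost.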