
### Find and count frequencies of sequences


```Hello all.

I have a column of data, several hundred elements long, consisting of 1s and 4s, e.g.:

a = 1
4
4
4
4
1
1
4
1
4
....

I need to obtain the total frequency for the appearance of several sequences of combinations of 1 and 4, i.e. how many times do sequences "1, 4", "1, 1, 4", "1, 1, 1, 4", and so on, appear in the column.

Iain
```

```Hi, if the set of sequences that you want to check is finite, try this; you can get other ideas from it. Good luck:
a=[
4
4
4
4
1
1
4
1
4];
count = 0;
for i = 3:numel(a)
    % compare each 3-element window of a with the target pattern
    if isequal(a(i-2:i), [1; 1; 4])
        count = count + 1;
    end
end
% "count" now holds the number of times [1 1 4]' occurs in a
```

```Iain:
I don't think you really want to do this.  Think about it.  There are
millions of different possible patterns of a variety of different
lengths from 2 to N (the number of elements).  I don't think you'd
really want them all even if you could get them from some brute force
algorithm (which is possible though).  If you could narrow it down to
a finite number of predefined sequences then you could do it.  For
example you want sequences of only up to 5 digits long.  You could
then use filter or imfilter with the pattern you're looking for and
then look for a particular value, i.e. sum(pattern .* pattern), in the
filtered output.

```
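The filtering idea above can be sketched as follows, with one change that is an assumption of this sketch and not part of the post: the 1s and 4s are first recoded as -1 and +1. With the raw values, a different window can produce the same sum as `sum(pattern .* pattern)` (for example, the window `[4 4 4 4 1]` against the pattern `[1 1 1 1 4]` both give 20). After recoding, the sliding dot product equals the pattern length only on an exact match. `conv` with a flipped pattern is used here in place of `filter` or `imfilter`.

```matlab
a = [1 4 4 4 4 1 1 4 1 4].';        % sample column from the original post
pattern = [1 1 4];                  % sequence to count

s = 2*(a(:).' == 4) - 1;            % recode data:    1 -> -1, 4 -> +1
p = 2*(pattern == 4) - 1;           % recode pattern the same way

% Sliding dot product of s with p (convolution with the flipped pattern).
c = conv(s, fliplr(p), 'valid');

% A window matches exactly when every term of the dot product is +1.
count = sum(c == numel(p));
```

For this sample column, `count` is 1, since `[1 1 4]` occurs once (elements 6 to 8).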

```stu <tmnarges2@live.utm.my> wrote in message <895557605.43596.1296825277597.JavaMail.root@gallium.mathforum.org>...
> Hi, if the sequences that you want to check is finite try this; you can get other ideas from it good luck:
> a=[
> 4
> 4
> 4
> 4
> 1
> 1
> 4
> 1
> 4];
> count=0;
> for i=3:size(a);
>     if a(i-2:i)==[1,1,4]';
>         count=count+1;
>     end
> end
> % the "count" value now has the frequency of [1 1 4]' in a

Thanks for these suggestions.

Apologies, ImageAnalyst, for the lack of clarity in the original post; I didn't mean to imply 2 to N sequence patterns. The maximum sequence length will likely be around 10.

stu - that idea is working very well (so far!), many thanks.
```

```"Iain" wrote in message <iignu3$qg$1@fred.mathworks.com>...
> Hello all.
>
> I have a column of data of several hundred elements long consisting of 1s and 4s, like this e.g.;
>
> a = 1
> 4
> 4
> 4
> 4
> 1
> 1
> 4
> 1
> 4
> ...
>
> I need to obtain the total frequency for the appearance of several sequences of combinations of 1 and 4, i.e. how many times do sequences "1, 4", "1, 1, 4", "1, 1, 1, 4", and so on, appear in the column.
>
> Any suggestions? Thanks in advance.
> Iain

Take a look at STRFIND and CELLFUN

C = [1 1 4 1 1 4 1 1  4 1 4 4 4 1 4 1 4 1 1].' ; % column vector
seq = {[1 4 1],[1 4],[4 1 1 4],[9 9 9]} % sequences
N = cellfun(@(x) numel(strfind(C(:).',x(:).')), seq)

Note that strfind requires row vectors.

hth
Jos
```
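Applied to the sequences named in the original question ("1, 4", "1, 1, 4", "1, 1, 1, 4", and so on), the STRFIND approach might look like the sketch below. The variable names and the cap of 10 (taken from the follow-up that mentions a maximum length of around 10) are assumptions for illustration. It relies on `strfind` accepting numeric row vectors in MATLAB, as in the post above.

```matlab
a = [1 4 4 4 4 1 1 4 1 4].';        % sample column from the original post
maxLen = 10;                        % assumed maximum sequence length

counts = zeros(1, maxLen - 1);      % counts(k): k ones followed by a 4
for k = 1:maxLen-1
    pattern = [ones(1, k) 4];       % e.g. k = 2 gives [1 1 4]
    % strfind needs row vectors, so the column is reshaped first
    counts(k) = numel(strfind(a(:).', pattern));
end
```

Note that these counts overlap: every occurrence of `[1 1 4]` also contains an occurrence of `[1 4]`, so for the sample column `counts(1)` is 3 and `counts(2)` is 1.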