#### PDF in matlab not the same as PDF in Excel

```Ok I am really confused. I have a file with 30000 data points, that varies between 0 and 1. I can create a pdf plot of this data in Excel using frequency command, and say a hundred bins between 0.01 and 1. Now if I try to do the same in Matlab, using either hist or histc, the values I get in the bins are different to excel. Why is this? And how can I generate a pdf plot in Matlab that is the same in Excel? Would appreciate any help you could give me. I've spent many hours on this. Many thanks.
``` 0  S
8/11/2010 11:07:04 PM comp.soft-sys.matlab

```"S " <enxss10@nottingham.ac.uk> wrote in message <i3vaeo$ggp$1@fred.mathworks.com>...
> Ok I am really confused. I have a file with 30000 data points, that varies between 0 and 1. I can create a pdf plot of this data in Excel using frequency command, and say a hundred bins between 0.01 and 1. Now if I try to do the same in Matlab, using either hist or histc, the values I get in the bins are different to excel. Why is this? And how can I generate a pdf plot in Matlab that is the same in Excel? Would appreciate any help you could give me. I've spent many hours on this. Many thanks.
- - - - - - - - - - - - - -
My recommendation would be to undertake your own investigation of why the bin counts are different.  For example, do a histogram with just two bins with both histc and Excel.  If there is a difference in their respective counts, do a sort on your original data and this will allow you to identify the particular data value or values that for matlab went into one bin and in Excel into the other.  If you study these values carefully you may discover just why matlab and Excel treated them differently.

Don't forget to read the histc documentation carefully to find out just how it makes such bin decisions.

Roger Stafford
``` 0  Roger
8/12/2010 12:14:04 AM
```"S "  wrote in message <i3vaeo$ggp$1@fred.mathworks.com>...
> Ok I am really confused. I have a file with 30000 data points, that varies between 0 and 1. I can create a pdf plot of this data in Excel using frequency command, and say a hundred bins between 0.01 and 1. Now if I try to do the same in Matlab, using either hist or histc, the values I get in the bins are different to excel. Why is this? And how can I generate a pdf plot in Matlab that is the same in Excel? Would appreciate any help you could give me. I've spent many hours on this. Many thanks.

Thanks for your suggestion. I have read histc documentation several times. It gives the following equation:
n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1).
What I would like to be able to do is tweak the histc command so that it gives the same frequency distribution as in Excel. Is this possible? At the moment, Excel and Matlab are counting the numbers differently, and I am at a loss as to why. I would appreciate any further advice you could give on this matter. Thanks in advance.
``` 0  Safa
8/12/2010 12:32:20 AM
```> Thanks for your suggestion. I have read histc documentation several times. It gives the following equation:
> n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1).
> What I would like to be able to do, is to tweak the histc command so that it gives the same frequency distribution as in excel. Is this possible? At the moment, the Excel and Matlab are counting the numbers differently, and I am at a loss to why it is doing this. Appreciate any further advice you could give on this matter. Thanks in advance.
- - - - - - - -
I repeat!  This is something you are entirely capable of finding out for yourself with the use of the sort function.  Pin down the individual data value or values where Excel made one decision and histc a different one and then you are well on your way to solving your own problem.

Roger Stafford
``` 0  Roger
8/12/2010 12:55:06 AM
```"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <i3vgp9$io5$1@fred.mathworks.com>...
> > Thanks for your suggestion. I have read histc documentation several times. It gives the following equation:
> > n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1).
> > What I would like to be able to do, is to tweak the histc command so that it gives the same frequency distribution as in excel. Is this possible? At the moment, the Excel and Matlab are counting the numbers differently, and I am at a loss to why it is doing this. Appreciate any further advice you could give on this matter. Thanks in advance.
> - - - - - - - -
>   I repeat!  This something you are entirely capable of finding out for yourself with the use of the sort function.  Pin down the individual data value or values where Excel made one decision and histc a different one and then you are well on your way to solving your own problem.
>
> Roger Stafford

I must say I wasn't happy with the tone of your second message, as this is a serious query about the operation of Matlab. I will respond to it by submitting the following example.

Y=[0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.2]
Bins=[0.2 0.4 0.6 0.8 1]
histc(Y,Bins) gives 4,4,4,4,1

Frequency command in Excel gives 3,4,4,4,4

I am more of an Excel user, and I already know that the frequency command counts the instance of numbers that are less than or equal to the upper limit of each bin. Obviously Matlab is using the formula: n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1). It is unclear however what Matlab does for the last bin, does it just count instances of 1 exactly?

As I mentioned in my previous message, I already looked up histc in the Matlab help, and I requested a way to CHANGE histc so that it matches Excel. histc is a built-in command in Matlab and I don't know how to change the above built-in equation.
Is my query clearer now?
Thank you!
``` 0  Safa
8/12/2010 11:09:04 PM
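The histc counts in the example above can be checked by hand. A minimal sketch, assuming only the `Y` and `Bins` vectors from the post (the name `nHistc` is made up for illustration), that applies the documented histc rule with plain comparisons instead of calling histc itself:

```matlab
Y = [0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 ...
     0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.2];
Bins = [0.2 0.4 0.6 0.8 1];

% Documented histc rule: bin k counts edges(k) <= x < edges(k+1);
% the last bin counts only values equal to edges(end).
nHistc = zeros(1, numel(Bins));
for k = 1:numel(Bins) - 1
    nHistc(k) = sum(Y >= Bins(k) & Y < Bins(k+1));
end
nHistc(end) = sum(Y == Bins(end));

disp(nHistc)   % 4 4 4 4 1
```

The values 0.1 and 0.15 fall below the first edge and 1.2 falls above the last one, so this rule does not count them at all. That accounts for the total of 17 rather than 20.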
```Safa wrote:

> I am more of an Excel user, and I already know that the frequency
> command counts the instance of numbers that are less than or equal to
> the upper limit of each bin. Obviously Matlab is using the formula: n(k)
> counts the value x(i) if edges(k) <= x(i) < edges(k+1). It is unclear
> however what Matlab does for the last bin, does it just count instances
> of 1 exactly?

What it does is precisely documented
http://www.mathworks.com/access/helpdesk/help/techdoc/ref/histc.html

"n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1). The last bin
counts any values of x that match edges(end). Values outside the values in
edges are not counted. Use -inf and inf in edges to include all non-NaN values."

To repeat for emphasis: the last bin counts any values of x that match edges(end).

> As I mentioned in my previous message, I already looked histc in Matlab
> help, and I requested  a way to CHANGE the histc so that it matches
> Excel. Histc is an inbuild command in Matlab and I don't know how to
> change the above inbuilt equation.

Excel does not appear to follow a consistent method with regard to its lower
bound. It appears that you might be able to duplicate Excel's
inconsistent method via

T = histc(Y,[Bins(1)*(1+eps) Bins(2:end)*(1-eps)]);
T(1:end-1)

However, I am basing this on a single example and there might be a deeper more
subtle reason why the 2 is not matched.
``` 0  Walter
8/12/2010 11:32:05 PM
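For comparison, here is a minimal sketch of a FREQUENCY-style count. It assumes the rule described earlier in the thread (each bin counts values less than or equal to its upper limit and greater than the previous limit), plus FREQUENCY's extra trailing element for values above the last bin. The names `edges` and `nExcel` are illustrative only, and this is a different approach from the eps-based suggestion above:

```matlab
Y = [0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 ...
     0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.2];
Bins = [0.2 0.4 0.6 0.8 1];

% Assumed Excel-style rule: bin k counts edges(k) < x <= edges(k+1).
% Padding with -Inf and Inf gives the open-ended first and overflow bins.
edges = [-Inf Bins Inf];
nExcel = zeros(1, numel(Bins) + 1);
for k = 1:numel(nExcel)
    nExcel(k) = sum(Y > edges(k) & Y <= edges(k+1));
end

disp(nExcel)   % 3 4 4 4 4 1
```

The first five elements match the 3,4,4,4,4 reported for Excel, and the sixth is the overflow count for 1.2. With computed rather than typed-in data, values that look equal to a bin limit may differ from it by rounding error, so ties at the limits should be tested before relying on a match.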
```
"Safa " <enxss10@nottingham.ac.uk> wrote in message
news:i41uug$6ln$1@fred.mathworks.com...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in
> message <i3vgp9$io5$1@fred.mathworks.com>...
>> > Thanks for your suggestion. I have read histc documentation several
>> > times. It gives the following equation:
>> > n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1). What I
>> > would like to be able to do, is to tweak the histc command so that it
>> > gives the same frequency distribution as in excel. Is this possible? At
>> > the moment, the Excel and Matlab are counting the numbers differently,
>> > and I am at a loss to why it is doing this. Appreciate any further
>> > advice you could give on this matter. Thanks in advance.
>> - - - - - - - -
>>   I repeat!  This something you are entirely capable of finding out for
>> yourself with the use of the sort function.  Pin down the individual data
>> value or values where Excel made one decision and histc a different one
>> and then you are well on your way to solving your own problem.
>>
>> Roger Stafford
>
> I must say I wasn't happy with the tone of your second message as
> this is a serious query about the operation of Matlab. I will respond to
> it, by submitting the following example.
>
> Y=[0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85
> 0.9 0.95 1 1.2]
> Bins=[0.2 0.4 0.6 0.8 1]
> Histc(Y,Bins) gives 4,4,4,4,1
>
> Frequency command in Excel gives 3,4,4,4,4
>
> I am more of an Excel user, and I already know that the frequency command
> counts the instance of numbers that are less than or equal to the upper
> limit of each bin. Obviously Matlab is using the formula: n(k) counts the
> value x(i) if edges(k) <= x(i) < edges(k+1).

Indeed; that is the documented behavior of the function.

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/histc.html

> It is unclear however what Matlab does for the last bin, does it just
> count instances of 1 exactly?

That too is documented on the page above:

"The last bin counts any values of x that match edges(end). Values outside
the values in edges are not counted."

> As I mentioned in my previous message, I already looked histc in Matlab
> help, and I requested  a way to CHANGE the histc so that it matches Excel.
> Histc is an inbuild command in Matlab and I don't know how to change the
> above inbuilt equation.

We do not provide the source code for HISTC and so you cannot change what
HISTC does short of shadowing it (which I do NOT recommend.)  I suppose you
could create your own HISTC subfunction or private function to shadow it
without making your shadowed version globally visible, but I'd still be wary
of doing so.  What I would recommend is creating your own function that does
the equivalent of Excel's FREQUENCY command, but you would likely need to
test it thoroughly as the documentation for the FREQUENCY command leaves
unsaid (at least in my cursory glance) what the command does in certain
(potentially common) scenarios.

For your particular issue, you will need to be extra careful, as many of the
numbers in your Y vector and almost all of the numbers in your Bins vector
cannot be exactly represented in floating-point double precision.  I'm
assuming that for your real problem (rather than the demonstration example
above) that you will not be typing in the Y vector but are computing it
somehow; if that's the case, you may think the third element is the same as
the first element of the Bins vector, but it may not be.

x = 0:0.1:1;
x == 0.3 % returns all false values, even though it _looks_ like 0.3 is the
         % fourth element of x.  This is the CORRECT behavior.

See question 6.1 in the newsgroup FAQ for more information on this issue
related to floating-point arithmetic.

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
```