Ok I am really confused. I have a file with 30000 data points that vary between 0 and 1. I can create a pdf plot of this data in Excel using the FREQUENCY command with, say, a hundred bins between 0.01 and 1. Now if I try to do the same in Matlab, using either hist or histc, the values I get in the bins are different to Excel. Why is this? And how can I generate a pdf plot in Matlab that is the same as in Excel? Would appreciate any help you could give me. I've spent many hours on this. Many thanks.


8/11/2010 11:07:04 PM

"S " <enxss10@nottingham.ac.uk> wrote in message <i3vaeo$ggp$1@fred.mathworks.com>...
> Ok I am really confused. I have a file with 30000 data points, that varies between 0 and 1. I can create a pdf plot of this data in Excel using frequency command, and say a hundred bins between 0.01 and 1. Now if I try to do the same in Matlab, using either hist or histc, the values I get in the bins are different to excel. Why is this? And how can I generate a pdf plot in Matlab that is the same in Excel? Would appreciate any help you could give me. I've spent many hours on this. Many thanks.
- - - - - - - - - - - - - -
My recommendation would be to undertake your own investigation of why the bin counts are different. For example, do a histogram with just two bins with both histc and Excel. If there is a difference in their respective counts, sort your original data; this will allow you to identify the particular data value or values that went into one bin in Matlab and into the other in Excel. If you study these values carefully you may discover just why Matlab and Excel treated them differently. Don't forget to read the histc documentation carefully to find out just how it makes such bin decisions.

Roger Stafford


8/12/2010 12:14:04 AM

"S " wrote in message <i3vaeo$ggp$1@fred.mathworks.com>...
> Ok I am really confused. I have a file with 30000 data points, that varies between 0 and 1. I can create a pdf plot of this data in Excel using frequency command, and say a hundred bins between 0.01 and 1. Now if I try to do the same in Matlab, using either hist or histc, the values I get in the bins are different to excel. Why is this? And how can I generate a pdf plot in Matlab that is the same in Excel? Would appreciate any help you could give me. I've spent many hours on this. Many thanks.

Thanks for your suggestion. I have read the histc documentation several times. It gives the following rule:

n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1).

What I would like to be able to do is tweak the histc command so that it gives the same frequency distribution as in Excel. Is this possible? At the moment, Excel and Matlab are counting the numbers differently, and I am at a loss as to why. I would appreciate any further advice you could give on this matter. Thanks in advance.


8/12/2010 12:32:20 AM

> Thanks for your suggestion. I have read histc documentation several times. It gives the following equation:
> n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1).
> What I would like to be able to do, is to tweak the histc command so that it gives the same frequency distribution as in excel. Is this possible? At the moment, the Excel and Matlab are counting the numbers differently, and I am at a loss to why it is doing this. Appreciate any further advice you could give on this matter. Thanks in advance.
- - - - - - - -
I repeat! This is something you are entirely capable of finding out for yourself with the use of the sort function. Pin down the individual data value or values where Excel made one decision and histc a different one, and then you are well on your way to solving your own problem.

Roger Stafford


8/12/2010 12:55:06 AM

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <i3vgp9$io5$1@fred.mathworks.com>...
> > Thanks for your suggestion. I have read histc documentation several times. It gives the following equation:
> > n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1).
> > What I would like to be able to do, is to tweak the histc command so that it gives the same frequency distribution as in excel. Is this possible? At the moment, the Excel and Matlab are counting the numbers differently, and I am at a loss to why it is doing this. Appreciate any further advice you could give on this matter. Thanks in advance.
> - - - - - - - -
> I repeat! This something you are entirely capable of finding out for yourself with the use of the sort function. Pin down the individual data value or values where Excel made one decision and histc a different one and then you are well on your way to solving your own problem.
>
> Roger Stafford

I must say I wasn’t happy with the tone of your second message, as this is a serious query about the operation of Matlab. I will respond to it by submitting the following example.

Y = [0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.2]
Bins = [0.2 0.4 0.6 0.8 1]

histc(Y,Bins) gives 4,4,4,4,1
The FREQUENCY command in Excel gives 3,4,4,4,4

I am more of an Excel user, and I already know that the FREQUENCY command counts the instances of numbers that are less than or equal to the upper limit of each bin. Obviously Matlab is using the formula: n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1). It is unclear, however, what Matlab does for the last bin. Does it just count instances of exactly 1?

As I mentioned in my previous message, I have already looked up histc in the Matlab help, and I requested a way to CHANGE histc so that it matches Excel. histc is a built-in command in Matlab and I don't know how to change the above built-in rule. Is my query clearer now? Thank you!


8/12/2010 11:09:04 PM
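
As a rough check of the two counting rules described in the post above, the example can be reproduced with plain logical comparisons instead of histc. This is only a sketch: the variable names nHistc and nExcel are made up for illustration, and the Excel rule is taken to be "greater than the previous bin value and less than or equal to the current one", as the post describes it.

```matlab
Y    = [0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 ...
        0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.2];
Bins = [0.2 0.4 0.6 0.8 1];
nb   = numel(Bins);

% Rule documented for histc: edges(k) <= x < edges(k+1),
% and the last bin counts only x == edges(end).
nHistc = zeros(1, nb);
for k = 1:nb-1
    nHistc(k) = sum(Y >= Bins(k) & Y < Bins(k+1));
end
nHistc(nb) = sum(Y == Bins(nb));

% Rule assumed for Excel FREQUENCY: the first bin counts x <= Bins(1),
% every later bin counts Bins(k-1) < x <= Bins(k).
nExcel = zeros(1, nb);
nExcel(1) = sum(Y <= Bins(1));
for k = 2:nb
    nExcel(k) = sum(Y > Bins(k-1) & Y <= Bins(k));
end

disp(nHistc)   % 4 4 4 4 1
disp(nExcel)   % 3 4 4 4 4
```

The only values that change bins between the two rules are the ones that sit exactly on a bin edge (0.2, 0.4, 0.6, 0.8 and 1 here), plus the values below the first edge, which histc drops and Excel puts in its first bin.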

Safa wrote:
> I am more of an Excel user, and I already know that the frequency
> command counts the instance of numbers that are less than or equal to
> the upper limit of each bin. Obviously Matlab is using the formula: n(k)
> counts the value x(i) if edges(k) <= x(i) < edges(k+1). It is unclear
> however what Matlab does for the last bin, does it just count instances
> of 1 exactly?

What it does is precisely documented:

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/histc.html

"n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1). The last bin counts any values of x that match edges(end). Values outside the values in edges are not counted. Use -inf and inf in edges to include all non-NaN values."

To repeat for emphasis: the last bin counts any values of x that match edges(end).

> As I mentioned in my previous message, I already looked histc in Matlab
> help, and I requested a way to CHANGE the histc so that it matches
> Excel. Histc is an inbuild command in Matlab and I don't know how to
> change the above inbuilt equation.

Excel does not appear to follow a consistent method with regard to its lower bound. You might be able to duplicate Excel's inconsistent method via

    T = histc(Y,[Bins(1)*(1+eps) Bins(2:end)*(1-eps)]);
    T(1:end-1)

However, I am basing this on a single example, and there might be a deeper, more subtle reason why the 2 is not matched.
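
Another possible way to get right-closed bins out of histc itself, offered as a sketch checked only against the 20-value example in this thread, is to negate both the data and the edges. Negation turns the rule edges(k) <= x < edges(k+1) into an "upper limit included" rule, and the -inf and inf edges catch the values outside the bin range. The names edgesNeg, n and nExcelLike are hypothetical.

```matlab
Y    = [0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 ...
        0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.2];
Bins = [0.2 0.4 0.6 0.8 1];

% Negated, reversed edges with -inf and inf added at the ends.
% A bin [-b2, -b1) in the negated data corresponds to b1 < y <= b2.
edgesNeg = [-inf, -fliplr(Bins), inf];
n = histc(-Y, edgesNeg);

% Drop the final "matches edges(end)" bin (nothing equals inf)
% and reverse back to ascending bin order.
nExcelLike = fliplr(n(1:end-1));

disp(nExcelLike)   % 3 4 4 4 4 1
```

The first five counts agree with the FREQUENCY result quoted in the thread. The extra last element counts values above the largest bin (1.2 here), which FREQUENCY also reports as one additional element beyond the bins array. This version assumes Y is a row vector; for a column vector, flipud would be needed in place of fliplr on the result.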


8/12/2010 11:32:05 PM