Loading, pre-processing and plotting CSV file in Matlab


Hello---

I have a CSV file with at least 7000 rows and 72 columns.  Each column 
has a header consisting of 5 rows.  The first column contains dates in 
the form "DD/MM/YY hh:mm" (e.g. "01/03/10 10:45"), where "DD" is the day 
of the month, "MM" is the month, "YY" is the year, "hh" is the hour, and 
"mm" is the minute.  All columns except the first contain numerical 
values expressible in double format.  Missing values in each column are 
indicated by "NAN".

I would like to import this CSV file into Matlab, ignore the first five 
rows, and then plot the first (date) column on the x-axis and another 
column (say the 66th column) on the y-axis of a 2D plot.

Is there a way to do this in Matlab?

Nicholas
Reply n.kinar (156) 5/29/2010 7:33:03 PM

Nicholas Kinar wrote:
....

> Is there a way to do this in Matlab?

yes...

--
Reply dpb 5/29/2010 8:16:03 PM


On 10-05-29 2:16 PM, dpb wrote:
> Nicholas Kinar wrote:
> ...
>
>> Is there a way to do this in Matlab?
>
> yes...
>
> --

Thanks for your response. How might I proceed?  I've been trying to do 
this with the readtext() function available on the file exchange, and 
I'm not having very much luck with the cell arrays.

% read in the data
fileName = 'example.csv';
[infoString, result]= readtext(fileName);
time = infoString(:,1);
depth = infoString(:,66)

Now "time" and "depth" are cell arrays which are non-homogeneous, 
containing both numbers and strings.  How do I strip off the first few 
rows and convert each column into numbers, given that each column also 
contains "NAN" values?

Nicholas
Reply Nicholas 5/29/2010 8:48:01 PM

On 29 May, 22:48, Nicholas Kinar <n.ki...@usask.ca> wrote:
> On 10-05-29 2:16 PM, dpb wrote:
>
> > Nicholas Kinar wrote:
> > ...
>
> >> Is there a way to do this in Matlab?
>
> > yes...
>
> > --
>
> Thanks for your response. How might I proceed?

By explaining what the problem is? This is a fairly trivial
exercise...

Rune
Reply Rune 5/29/2010 9:06:21 PM

Nicholas Kinar wrote:
> On 10-05-29 2:16 PM, dpb wrote:
>> Nicholas Kinar wrote:
>> ...
>>
>>> Is there a way to do this in Matlab?
>>
>> yes...
....

> Thanks for your response. How might I proceed?  I've been trying to do 
> this with the readtext() function available on the file exchange, ...

I'd suggest looking at the file w/ the import wizard first; if it 
handles it, that's probably the least effort from the user's standpoint.

textscan() is more flexible than textread().  Specifically, the 
'CollectOutput' option will place the numerics in an array, which is 
convenient for plotting.

datenum() and friends work on cell arrays, too...
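A minimal sketch of the textscan() route, assuming the layout from the original question (five header rows, a "DD/MM/YY hh:mm" first column, numeric columns with "NAN" for missing values). The temporary file and its three columns are hypothetical stand-ins for the real 72-column file. 'TreatAsEmpty' maps the literal "NAN" to NaN, and the '%[^,]' conversion keeps the space inside the date field instead of splitting on it.

```matlab
% Hypothetical sample file mimicking the described layout (3 columns, not 72)
fname = [tempname() '.csv'];
fid = fopen(fname, 'w');
for k = 1:5
    fprintf(fid, 'header row %d\n', k);        % five header rows, content ignored
end
fprintf(fid, '01/03/10 10:45,1.5,NAN\n');
fprintf(fid, '01/03/10 11:00,NAN,2.25\n');
fclose(fid);

ncols = 3;                                      % would be 72 for the real file
fmt = ['%[^,]' repmat('%f', 1, ncols - 1)];     % date text up to the comma, then numbers
fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'Delimiter', ',', 'HeaderLines', 5, ...
             'TreatAsEmpty', 'NAN', 'CollectOutput', true);
fclose(fid);
delete(fname);

dates = C{1};                                   % cell array of date strings
vals  = C{2};                                   % numeric matrix, NaN where "NAN" was
t = datenum(dates, 'dd/mm/yy HH:MM');           % serial date numbers (days)
```

With the real file you would set ncols = 72 and open it by name; plotting is then plot(t, vals(:, 65)) followed by datetick('x', 'dd/mm/yy'), since column 66 of the file becomes column 65 of vals once the date column is split off.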

--
Reply dpb 5/29/2010 10:12:25 PM

Thank you once again for your response, dpb; this is greatly 
appreciated.  The import wizard seems to handle the file correctly, and 
in Matlab R2010a, I'm also given the ability to generate code to load 
the file:

fileToRead = 'test.csv';
DELIMITER = ',';
HEADERLINES = 3;

% Import the file
newData = importdata(fileToRead, DELIMITER, HEADERLINES);

% Create new variables in the base workspace from those fields.
vars = fieldnames(newData);
for i = 1:length(vars)
     assignin('base', vars{i}, newData.(vars{i}));
end

This code creates the following in the workspace:

DELIMITER, HEADERLINES, data, fileToRead, i, newData, textdata, vars

Note that "textdata" is not listed in the code, but Matlab creates this 
variable in the workspace.


 >> size(textdata)

ans =

         6840          72

Then how would I deal with the "NAN" elements in "textdata" and convert 
the cell array to numerical values? Suppose that I want to grab the 
first and 66th columns of this cell array, remove the "NAN" elements, 
and then convert everything to double?

However, this has given me yet another idea.  Perhaps I could write a 
MEX file in C.  The MEX program would parse the data as I please and 
then return it to Matlab. To me, this seems much easier than trying to 
work with all of these cell arrays.

Nicholas

Reply Nicholas 5/29/2010 11:24:44 PM

>
> By explaining what the problem is? This is a fairly trivial
> exercise...
>
> Rune

Perhaps trivial to you, Rune; I've been using either Fortran/C/C++ for a 
long time, but you know more of Matlab than I do at this time.

Nicholas
Reply Nicholas 5/29/2010 11:27:12 PM

>
> Then how would I deal with the "NAN" elements in "textdata" and convert
> the cell array to numerical values? Suppose that I want to grab the
> first and 66th column of this cell array, remove the "NAN" elements, and
> then convert the matrix all to a value such as double?

Perhaps "remove" is not the right operation.  I might want to convert 
"NAN" to "NaN" so that Matlab can still plot the data.

Nicholas
Reply Nicholas 5/29/2010 11:36:57 PM

Nicholas Kinar wrote:
> 
>>
>> Then how would I deal with the "NAN" elements in "textdata" and convert
>> the cell array to numerical values? Suppose that I want to grab the
>> first and 66th column of this cell array, remove the "NAN" elements, and
>> then convert the matrix all to a value such as double?
> 
> Perhaps "remove" is not the right operation.  I might want to convert 
> "NAN" to "NaN" so that Matlab can still plot the data.

Oh, the "NAN" is embedded ASCII text; I assumed it was a NaN.

To use cell arrays, you simply dereference them w/ curly braces.  If you 
need to convert them to arrays, in this version you need to use for 
loops, unless later versions of Matlab (I'm at a fairly old release now) 
provide something better.

But textscan() does have the 'CollectOutput' option I mentioned earlier, 
which places consecutive fields of the same type in a single array (but 
mixed numeric/text data is an issue it won't handle well, that's true...)

Not knowing precisely the data, something along the lines of the 
following trivial example may help.  Matlab isn't C or C++, so it takes 
a little while getting used to, perhaps even more so if one is adept in 
another language w/ so much to unlearn... :)

 >> c(1)=num2cell(pi);
 >> c(3)=num2cell(2);
 >> c(2)=cellstr('NAN')
c =
     [3.1416]
     'NAN'
     [     2]
 >> for idx=1:length(c), if ischar(c{idx}), c(idx)=num2cell(nan);end,end
 >> c
c =
     [3.1416]
     [   NaN]
     [     2]
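The same replacement can be sketched without a loop: cellfun with ischar (the current name for the obsolete isstr) builds a logical mask of the text cells, one indexed assignment overwrites them, and cell2mat then converts the now-homogeneous cell array to a double array. The variable names are illustrative only.

```matlab
c = {pi, 'NAN', 2};             % mixed numeric/text cell, as in the example above
isText = cellfun(@ischar, c);   % logical mask: true where the cell holds text
c(isText) = {NaN};              % overwrite every text cell with numeric NaN
x = cell2mat(c);                % all cells numeric now, so convert to a double row
```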

--
Reply dpb 5/30/2010 1:48:52 AM

On May 30, 11:27 am, Nicholas Kinar <n.ki...@usask.ca> wrote:
> > By explaining what the problem is? This is a fairly trivial
> > exercise...
>
> > Rune
>
> Perhaps trivial to you, Rune; I've been using either Fortran/C/C++ for a
> long time, but you know more of Matlab than I do at this time.
>
> Nicholas

Oh dear, now you've torn it.
Mentioning Fortran and Rune in the same sentence is like pouring
gasoline on a fire...................

Reply TideMan 5/30/2010 1:50:33 AM

Yes - thank you very much for that example, dpb! It really helped to 
point me in the right direction.  Okay, here is what I did to convert 
the character values into numbers:

% remove the first three rows of textdata
% since this is the header
textdata(1:3, :) = [];

% create a new vector with only numbers
colnum = 66;
[rownum,~] = size( textdata );
depth = zeros( rownum, 1);
for idx=1:rownum
     if strcmp(textdata{idx, colnum}, 'NAN')
         depth( idx ) = nan;
     else
         depth( idx ) = str2double( textdata{idx, colnum} );
     end
end
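A shorter sketch of the same conversion: str2double() accepts a cell array of strings and returns NaN for any entry it cannot parse as a number, so the "NAN" entries come out as NaN without the strcmp test or the loop. The two-row textdata below is a made-up stand-in for the real 6837-row array.

```matlab
% Hypothetical stand-in for textdata after the header rows have been removed
textdata = {'01/03/10 10:45', '1.5'; ...
            '01/03/10 11:00', 'NAN'};
colnum = 2;                                  % 66 for the real file
depth = str2double(textdata(:, colnum));     % unparseable text such as 'NAN' -> NaN
```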

Now one question remains: How do I plot the data, given time/date in the 
first column?

Using Matlab, I extract the column with the timestamp and refer to it as 
the "time" vector.  Now how do I take this cell array and plot it on the 
x-axis of the plot as shown below?

 >> time = textdata(:,1);
 >> class (time)

ans =

cell

 >> time{1,1}

ans =

01/03/10 10:45

 >> time{2,1}

ans =

01/03/10 11:00

 >> length(time)

ans =

         6837

 >> plot(time, depth)
??? Error using ==> plot
Conversion to double from cell is not possible.

Once again, thank you very much for your help!

Nicholas



Reply Nicholas 5/30/2010 3:17:57 AM

Erm - I think that I might have figured it out myself.  What I've done 
is create the time vector by converting the strings into Matlab's 
"datenum" format.  Then I plotted against the time vector, using the 
datetick function to set the dates to a proper display format.

time = datenum( textdata(:,1), 'dd/mm/yy HH:MM' );
plot(time, depth);
datetick('x', 'mm/dd/yy')
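A quick check of what datenum() returns for this format, sketched on two timestamps quoted earlier in the thread. Serial date numbers count days, so the 15-minute sampling interval shows up as 15/1440 of a day. The format string passed to datenum must match the file's day-first order; the datetick format only controls how the tick labels are written, so 'dd/mm/yy' would keep the labels day-first as well.

```matlab
% Two timestamps from the thread, in the file's day-first "DD/MM/YY hh:mm" layout
s = {'01/03/10 10:45'; '01/03/10 11:00'};
t = datenum(s, 'dd/mm/yy HH:MM');   % column vector of serial date numbers (days)
step_minutes = diff(t) * 24 * 60;   % convert the day difference to minutes
```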


Nicholas






Reply Nicholas 5/30/2010 3:31:32 AM

Nicholas Kinar wrote:
....

> Erm - I think that I might have figured it out myself.  What I've done 
> is created the time vector, and converted the string into Matlab's 
> "datenum" format.  Then, I've plotted the time vector, but I've also 
> used the datetick function to set the date to a proper display format.
> 
> time = datenum( textdata(:,1), 'dd/mm/yy HH:MM' );
> plot(time, depth);
> datetick('x', 'mm/dd/yy')
....

Ermmm...yes... :)

I mentioned datenum operated on cells earlier, now, didn't I??? Wonder 
why that was, ya' reckon....  <VBG>

--
Reply dpb 5/30/2010 4:05:17 AM

Yes - thank you very much for your help, dpb; I was actually responding 
to the question posed in my previous post.  I had to use the datetick() 
function to create the plot, but you did indeed point me in the right 
direction with the datenum() function, which generated the vector of 
dates in the first place.

Nicholas
Reply Nicholas 5/30/2010 4:39:25 AM

On 30 May, 01:27, Nicholas Kinar <n.ki...@usask.ca> wrote:
> > By explaining what the problem is? This is a fairly trivial
> > exercise...
>
> > Rune
>
> Perhaps trivial to you, Rune; I've been using either Fortran/C/C++ for a
> long time, but you know more of Matlab than I do at this time.

There are a number of matlab functions that emulate the
file IO functions from C. Look at

FOPEN
FCLOSE
FREAD
FWRITE
FSCANF
FPRINTF

There are some details that differ between C and matlab,
but if you know how to do this in C, it will take you no
time to get the hang of it.

The main difference you might want to be aware of is that
matlab's versions are 'vectorized': a single call can scan and
import many lines of the file, which avoids looping line by line
in the matlab interpreter and so saves run time.
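A sketch of that vectorized style on a made-up two-column numeric file (not the poster's data): fgetl skips a header line much as a C loop would, and a single fscanf call with a [2 Inf] size argument then reads every remaining row. fscanf fills the matrix column by column, hence the transpose at the end.

```matlab
% Write a small hypothetical file: one header line, then three "int,float" rows
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'col_a,col_b\n');
fprintf(fid, '%d,%g\n', [1:3; 0.5 * (1:3)]);   % format is recycled over matrix columns
fclose(fid);

fid = fopen(fname, 'r');
fgetl(fid);                                    % skip the header line, C-style
A = fscanf(fid, '%d,%g', [2 Inf]);             % one call reads all remaining rows (2-by-N)
fclose(fid);
delete(fname);
A = A.';                                       % one row per line of the file
```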

Rune
Reply Rune 5/30/2010 6:52:11 AM

On 30 May, 03:50, TideMan <mul...@gmail.com> wrote:
> On May 30, 11:27 am, Nicholas Kinar <n.ki...@usask.ca> wrote:
>
> > > By explaining what the problem is? This is a fairly trivial
> > > exercise...
>
> > > Rune
>
> > Perhaps trivial to you, Rune; I've been using either Fortran/C/C++ for a
> > long time, but you know more of Matlab than I do at this time.
>
> > Nicholas
>
> Oh dear, now you've torn it.
> Mentioning Fortran and Rune in the same sentence is like pouring
> gasoline on a fire...................

Of course not. This guy does the sensible thing: He moves away
from fortran. No need to flame somebody who both agrees with me
and already does what I would have advised him to...

Rune
Reply Rune 5/30/2010 6:53:32 AM

On 30 May, 01:24, Nicholas Kinar <n.ki...@usask.ca> wrote:

> However, this has given me yet another idea.  Perhaps I could write a
> MEX file in C.  The MEX program would parse the data as I please and
> then return it to Matlab. To me, this seems much easier than trying to
> work with all of these cell arrays.

That would be a very good idea, yes. It would be the preferred way,
as you would be able to re-use your file parser in non-matlab
programs down the line.

However, if you want to return non-trivial data from the files,
you will soon find you need to work with the cell arrays anyway.
You might prefer to work with the cell arrays from the matlab prompt
and not in MEX.

Rune
Reply Rune 5/30/2010 6:57:03 AM
