Reading a large mixed CSV file of unknown types and size



hello.
I have a CSV file. I don't know in advance what size it is (rows, cols) nor do I know the types associated with each col. The CSV contains mixed types.

url = 'http://www.bea.gov/national/nipaweb/csv/NIPATable.csv?TableName=6&FirstYear=1900&LastYear=2011&Freq=Qtr';
str = urlread(url);

The above link is one example of such a file. As it is updated in "real time", we never know how many columns it has, etc.

I would like to get the whole contents of the CSV file loaded into a variable, probably with all the types changed to char, so that I can search that variable for the data I want (e.g. 'Gross domestic product') and then convert the associated data to the correct type.
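A minimal sketch of the "search, then convert" step, assuming the file has already been split into a cell array of char, C, whose second column holds the row labels. That layout is hypothetical; the real BEA table may differ, and labels may carry padding (hence the strtrim).

```matlab
% Hypothetical cell array of char, one CSV field per cell.
C = {'Line', 'Description',            '2010Q1', '2010Q2';
     '1',    'Gross domestic product', '100.5',  '101.2';
     '2',    'Personal consumption',   '70.1',   '70.9'};

% Locate the row whose label column matches exactly (strtrim tolerates padding).
row = find(strcmp(strtrim(C(:, 2)), 'Gross domestic product'));

% Convert that row's data columns from char to double; non-numeric text gives NaN.
gdp = str2double(C(row, 3:end));
```

Because str2double returns NaN for anything it cannot parse, the same call can be applied to the whole cell array to see which fields are numeric.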

MATLAB seems particularly bad at this; there are so many methods (dlmread, textscan, urlwrite, etc.) and lots of very complex File Exchange submissions.

Many Thanks
Reply matlaberboy 6/21/2010 4:22:05 PM

"matlaberboy " <matlaberboy@gmail.NOSPAM.com> wrote in message <hvo3jd$sta$1@fred.mathworks.com>...


The two lines you provide put the contents of the file into the variable named str.
The function strfind helps you locate substrings in the string str.
>> strfind( str,  'Gross domestic product' )
ans =
        3265

The "lines" are separated with char(13) (carriage return), not char(10) (line-feed):
>> is = str == char(10);
sum(is)
is = str == char(13);
sum(is)
ans =
     0
ans =
    38
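A minimal sketch of turning the strfind index into the full line that contains the match, treating either char(13) or char(10) as a line terminator. The short str below is hypothetical sample text, not the BEA file.

```matlab
% Hypothetical sample text using carriage returns as line terminators.
str = sprintf('Line,Description,2010Q1\r1,Gross domestic product,100.5\r2,Other,7\r');

pos = strfind(str, 'Gross domestic product');
if isempty(pos)
    error('Label not found');
end

% Positions of every line terminator, with sentinels at both ends.
breaks = [0, find(str == char(13) | str == char(10)), numel(str) + 1];

% The matched line runs from just after the previous break to just before the next.
lineStart = breaks(find(breaks < pos(1), 1, 'last')) + 1;
lineEnd   = breaks(find(breaks > pos(1), 1, 'first')) - 1;
theLine   = str(lineStart:lineEnd);

% Split the line into its comma-separated fields.
parts = regexp(theLine, ',', 'split');
```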

/ per
Reply per 6/21/2010 5:28:20 PM


I understand that you can search a string.

What I want to do is simple: just put the whole contents of the CSV into a cell/matrix structure.

thank you.
Reply matlaberboy 6/21/2010 5:35:27 PM

On Jun 22, 5:35 am, "matlaberboy " <matlaber...@gmail.NOSPAM.com> wrote:
> What I want to do is simple: just put the whole contents of the CSV into a cell/ matrix structure.

fid=fopen(csvfile,'rt');   % csvfile holds the path to your CSV file
a=fscanf(fid,'%c');        % read every character, commas and line breaks included
fclose(fid);

Now everything, including commas and line terminators, is in the string a.
You can parse it using the string functions strfind, findstr and strmatch.
The first job is to look for the line terminators, normally line-feeds, char(10), though per found that this particular file uses carriage returns, char(13), instead.
These separate the lines.
Then you look for commas, which separate the fields.
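The two steps above (split into lines, then split each line on commas) can be sketched as follows, producing a single cell array of char as originally asked. Assumptions: str already holds the file text (a hypothetical sample is built here), line terminators may be CRLF, CR or LF, and no field contains an embedded or quoted comma; files with quoted fields need a proper CSV parser. Ragged rows are padded with empty strings.

```matlab
% Hypothetical sample text standing in for the downloaded file.
str = sprintf(['Line,Description,2010Q1,2010Q2\r\n' ...
               '1,Gross domestic product,100.5,101.2\r\n' ...
               '2,Personal consumption,70.1,70.9\r\n']);

% Step 1: split into lines on CRLF, CR or LF, and drop empty lines.
textLines = regexp(str, '\r\n|\r|\n', 'split');
textLines = textLines(~cellfun(@isempty, textLines));

% Step 2: split each line on commas (assumes no embedded commas).
parts = cellfun(@(s) regexp(s, ',', 'split'), textLines, ...
                'UniformOutput', false);

% Step 3: copy into one cell array, padding short rows with ''.
nCols = max(cellfun(@numel, parts));
C = repmat({''}, numel(parts), nCols);
for k = 1:numel(parts)
    C(k, 1:numel(parts{k})) = parts{k};
end
```

Every cell of C is char, so strcmp can search it directly and str2double can convert selected cells afterwards.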
Reply TideMan 6/21/2010 8:17:52 PM

TideMan <mulgor@gmail.com> wrote in message <254e945c-512d-45ce-9416-1d7743213148@d37g2000yqm.googlegroups.com>...
> fid=fopen(csvfile,'rt');
> a=fscanf(fid,'%c');
> fclose(fid);

derek, just a thought

     a=fread(fid,inf,'*char').';

is (probably) much(!) faster...
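For completeness, a sketch of the fread approach as a full open/read/close sequence with an error check. It writes a hypothetical sample file to a temporary path (from tempname) so it runs without network access; substitute your own csvfile. The file is opened with 'r' rather than 'rt' so line terminators come back unchanged; on Windows, text mode drops the carriage return from CRLF pairs when reading.

```matlab
% Write a small hypothetical CSV to a temporary file.
csvfile = [tempname '.csv'];
fid = fopen(csvfile, 'w');
fprintf(fid, 'Line,Description,2010Q1\n1,Gross domestic product,100.5\n');
fclose(fid);

% Read the whole file as one char row vector in a single call.
fid = fopen(csvfile, 'r');
if fid < 0
    error('Could not open %s', csvfile);
end
a = fread(fid, inf, '*char').';   % '*char' keeps the output as char; .' makes it a row
fclose(fid);
delete(csvfile);
```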

urs
Reply us 6/21/2010 8:24:21 PM

On Jun 22, 8:24 am, "us " <u...@neurol.unizh.ch> wrote:
> a=fread(fid,inf,'*char').';
>
> is (probably) much(!) faster...

You're right, us:
>> tic;a=fscanf(fid,'%c');toc
Elapsed time is 1.332797 seconds.
>> frewind(fid);
>> tic;b=fread(fid,inf,'*char').';toc
Elapsed time is 0.399964 seconds.
>>
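A sketch for repeating this comparison on your own machine. The generated file is hypothetical (20,000 short CSV lines); timings vary with MATLAB version and hardware, so no figures are quoted here. The final comparison confirms both calls return the same characters.

```matlab
% Build a hypothetical test file; fprintf cycles the format over each column.
csvfile = [tempname '.csv'];
fid = fopen(csvfile, 'w');
k = 1:20000;
fprintf(fid, '%d,Gross domestic product,%.1f,%.1f\n', [k; k + 100; k + 200]);
fclose(fid);

% Time both reads on the same open file, rewinding in between.
fid = fopen(csvfile, 'r');
tic; a = fscanf(fid, '%c'); tFscanf = toc;
frewind(fid);
tic; b = fread(fid, inf, '*char').'; tFread = toc;
fclose(fid);
delete(csvfile);

fprintf('fscanf: %.4f s, fread: %.4f s, same text: %d\n', ...
        tFscanf, tFread, isequal(a(:), b(:)));
```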
Reply TideMan 6/21/2010 8:44:21 PM
