I have a huge CSV file, and I would like to be able to read just specific lines from it.
Is this possible? I have tried fget and fscanf, but they don't seem to address the problem very well.
For example, I have about 500,000 rows and an irregular number of columns. I know which row or rows I need the information from. Is there a way I can jump straight to that line using any of the built-in commands?
The function fgetl does not work because it reads from the first line on to the next.
Posted by smith on 8/25/2010 1:20:09 PM

Hi, have you tried csvread() with the Row and Column options specified?
Wayne
Posted by Wayne on 8/25/2010 1:35:06 PM

"Wayne King" <wmkingty@gmail.com> wrote in message <i5366a$7cc$1@fred.mathworks.com>...
> Hi, have you tried csvread() with the Row and Column options specified?
>
> Wayne

Hi Wayne,
Thanks. Unfortunately, the file contains both text and numeric data, and csvread works only on numeric files.
Posted by smith on 8/25/2010 1:45:21 PM

smith Og wrote:
....
> for example, I have about 500,000 rows, and irregular noumber of
> columns. I know which row or rows I need the information from. Is there
> a way I can just jump to that line using any of the built in commands.
>
> function fgetl, does not work becuase, it start from first line to the
> next.
Well, the thing about sequential files is that they
are...ummmmmh...._sequential_.
You could play games with assuming an average number of characters/line
and fseek() based on that then find the next eol and go from there but
then you're at the mercy of trying to figure out which actual line you
are on since it is only an average.
If the data file will fit in memory you could suck it all in w/ binary
read and search for eol internally instead of reading a line at a
time, but the file is fundamentally a variable-length record sequential
file so you're at the mercy of having to deal with it in that form or
change the file format.
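
A minimal sketch of the read-it-all-into-memory idea, assuming the file fits in RAM. The demo file, its contents, and the variable names (txt, eolPos, targetLine) are made up for illustration; fopen, fread, find, and strrep are the only MATLAB calls it relies on.

```matlab
% Demo file so the sketch is self-contained (contents are made up).
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
fprintf(fid, 'a,1,2\nb,3\nc,4,5,6\n');
fclose(fid);

targetLine = 2;                          % the line we want

% Slurp the whole file into memory as one row vector of characters.
fid = fopen(fileName, 'r');
txt = fread(fid, '*char')';
fclose(fid);
delete(fileName);

% Locate every line feed; line k starts just after line feed k-1.
eolPos = find(txt == char(10));
starts = [1, eolPos + 1];
stops  = [eolPos - 1, numel(txt)];

theLine = txt(starts(targetLine):stops(targetLine));
theLine = strrep(theLine, char(13), '');  % drop CR if the file uses CRLF
```

This still scans every byte to find the line feeds, so it is not a true jump; it only moves the sequential search from line-by-line file reads to one vectorized pass over memory.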
--
Posted by dpb on 8/25/2010 2:22:29 PM

Are you using a very old version of Matlab on a DEC VAX computer with
VMS that dates before OpenVMS, and was the csv file created with a text
editor or a program that was NOT written in C or C++? If that
combination of circumstances does not happen to be the case, then sorry,
your operating system itself has no internal markers for how long "lines"
are, and so there is no method other than starting at the beginning and
reading until you get to the line you want.
[If you do happen to be using an old old Matlab on real VMS not OpenVMS,
then you might be able to create some MEX that uses VMS's RMS (Record
Management Services) calls in order to locate a line. There is no
equivalent service in any modern operating system that I have been
exposed to.]
Posted by Walter on 8/25/2010 2:30:02 PM

I'm not at MATLAB right now, but does it really take so long to read through 500k lines when you're not doing anything to most of them?
Could you post the result of the following for your data file:
fileName = 'myFile.csv'; % your file name here
fid = fopen(fileName);
currLine = 0;

tic
while fid ~= -1
    tmp = fgetl(fid);
    currLine = currLine + 1;
end
toc

fclose(fid);
Posted by Andy on 8/25/2010 2:36:20 PM

Andy,
the solution is subtle; however, fid ~= -1 does not work.
If I know the total number of lines, though, then it works:

fileName = 'myFile.csv'; % your file name here
numLines = 500000;       % total number of lines (placeholder value)
fid = fopen(fileName);
for i = 1:numLines
    tmp = fgetl(fid);
    % pick out the lines of interest here
end
fclose(fid);

Then I can extract whichever lines I want within the for loop.
I still believe, though, that MATLAB should have included a more direct approach to reading lines of interest.
Posted by smith on 8/25/2010 3:35:08 PM

Whoops. That's what I get for posting code that I haven't checked in MATLAB. It should have read something like:
fileName = 'myFile.csv'; % your file name here
fid = fopen(fileName);
currLine = 0;

tic
tmp = fgetl(fid);
while ischar(tmp)   % fgetl returns the number -1 at end of file
    currLine = currLine + 1;
    tmp = fgetl(fid);
end
toc

fclose(fid);
As for MATLAB including ways to directly read lines of interest, it seems to me there are as many ways as there are in any other programming language. You can't get around the fact that there is just no way to identify a particular line of a csv file on disk without reading it in sequentially. What you could do, assuming you have some other program or function that creates your data file regularly, is automatically run a script that reads through the file line by line (as above), extracts the few lines you care about, and stores them separately in a .mat file for your use later. Of course this won't save time overall, but it will push this extra file reading to a time when you don't have to wait for it.
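
A rough sketch of that preprocessing step, under stated assumptions: the demo file, the wanted line numbers, and the variable names (wanted, kept, matName) are all hypothetical, and the wanted list is assumed to be sorted in ascending order.

```matlab
% Demo input so the sketch runs on its own (contents are made up).
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
for k = 1:10
    fprintf(fid, 'row%d,%d\n', k, k^2);
end
fclose(fid);

wanted = [3 7 9];                 % line numbers of interest, assumed sorted
kept   = cell(size(wanted));      % one cell per wanted line

% One sequential pass; stop as soon as the last wanted line is found.
fid = fopen(fileName, 'r');
currLine = 0;
nFound = 0;
tmp = fgetl(fid);
while ischar(tmp) && nFound < numel(wanted)
    currLine = currLine + 1;
    if currLine == wanted(nFound + 1)
        nFound = nFound + 1;
        kept{nFound} = tmp;
    end
    tmp = fgetl(fid);
end
fclose(fid);
delete(fileName);

% Store the extracted lines so later sessions can just load them.
matName = [tempname '.mat'];
save(matName, 'wanted', 'kept');
```

A later session would then only need load(matName) instead of another pass over the large file.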
Posted by Andy on 8/25/2010 3:57:04 PM

"smith Og" <adeog@ymail.com> wrote in message
news:i53d7c$j4$1@fred.mathworks.com...
> "Andy " <myfakeemailaddress@gmail.com> wrote in message
> <i539p4$9v6$1@fred.mathworks.com>...
>> I'm not at MATLAB right now, but does it really take so long to read
>> through 500k lines when you're not doing anything to most of them?
>>
>> Could you post the result of the following for your data file:
>>
>> fileName = 'myFile.csv'; % your file name here
>> fid = fopen(fileName);
>> currLine = 0;
>>
>> tic
>> while fid ~= -1
>> tmp = fgetl(fid);
>> currLine = currLine + 1;
>
> Andy, the solution is subtle, however, fid ~= -1 does not work.
That's correct. You need that for another reason, though.
> however if I know the total number of lines, then it works.
>
> fileName = 'myFile.csv'; % your file name here
> fid = fopen(fileName);
> for i = 1:total number of lines:
> tmp = fgetl(fid)
> fclose(fid)
>
> then I can extract which lines I want, within the for loop.
filename = 'myFile.csv';
[fid, message] = fopen(filename);
if fid == -1
    error('example:failureOpeningFile', 'Error opening file: %s', message);
end
lineToBeRead = 1;
targetLine = 100000;
while ~feof(fid) && lineToBeRead < targetLine
    fgetl(fid); % can throw this line away
    lineToBeRead = lineToBeRead+1;
end
% We reached this point because either:
% 1) We've reached the end of the file
% 2) We've reached the target line
% Handle case 1 here:
if feof(fid)
    fclose(fid);
    error('example:fileTooShort', 'The file did not contain %d lines', targetLine);
end
% If you've reached this point, we're in case 2 and you're now ready to read
% the target line or lines.
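
The same skip-then-read pattern can probably be written more compactly with textscan and its HeaderLines option; this is a sketch of that variant, not part of the code above. The demo file and the names targetLine and theLine are made up, and the file is still read sequentially under the hood.

```matlab
% Demo input so the sketch runs on its own (contents are made up).
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
for k = 1:10
    fprintf(fid, 'row%d,%d\n', k, k^2);
end
fclose(fid);

targetLine = 4;

% Skip targetLine-1 lines, then read exactly one whole line as text.
fid = fopen(fileName, 'r');
C = textscan(fid, '%s', 1, 'Delimiter', '\n', 'HeaderLines', targetLine - 1);
fclose(fid);
delete(fileName);

if isempty(C{1})
    error('example:fileTooShort', 'The file did not contain %d lines', targetLine);
end
theLine = C{1}{1};
```

Because the line comes back as text, mixed text-and-numeric rows are not a problem here; splitting the fields is left to a later step.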
--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
To contact Technical Support use the Contact Us link on
http://www.mathworks.com
Posted by Steven_Lord on 8/25/2010 4:51:05 PM

Thanks Andy. Deeply appreciated
Posted by smith on 8/25/2010 4:56:04 PM

smith Og wrote:
....
> though I still believe MATLAB should have included a more direct
> approach to read lines of interest.
This indicates we have "a failure to communicate" problem....
As noted earlier, a csv file is a variable-length-record _SEQUENTIAL_
file.
Describe precisely how you would propose to determine how many bytes
into that file a particular line of interest begins, when every line
before it can have a different length.
Once you have come to a conclusion on that question you should realize
why the function you request doesn't exist.
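
To make the point concrete, here is a sketch of what it takes to get real byte offsets: one full sequential pass that records where each line starts, after which fseek can jump directly. This index idea is an illustration added here, not something proposed in the thread, and the demo file and the names offsets and theLine are made up.

```matlab
% Demo file so the sketch is self-contained (contents are made up).
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
fprintf(fid, 'a,1,2\nb,3\nc,4,5,6\n');
fclose(fid);

% Pass 1 (sequential, done once): record the byte offset of each line start.
fid = fopen(fileName, 'r');
offsets = zeros(1, 0);
pos = ftell(fid);
tmp = fgetl(fid);
while ischar(tmp)
    offsets(end + 1) = pos;   % offset where the line just read began
    pos = ftell(fid);
    tmp = fgetl(fid);
end

% Afterwards: jump straight to line 3 using the stored offset.
fseek(fid, offsets(3), 'bof');
theLine = fgetl(fid);
fclose(fid);
delete(fileName);
```

The offsets are only valid for as long as the file does not change, and building them costs exactly the sequential read that the original question hoped to avoid.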
--
Posted by dpb on 8/25/2010 5:12:00 PM

Thanks dpb, Steven, Walter.
For some reason I did not see your posts before I replied. Thanks for the enlightenment and the possible solution.
Posted by smith on 8/25/2010 6:47:08 PM

I tested this, and it is true that looping in Matlab to find the line is quite slow. For example, generating a million-line file and fetching the 350,000th line:
---
fileName = 'largefile.txt'; % your file name here
tgt = 350000; % number of the line here
fid = fopen(fileName);
currLine = 1;
tic
tmp = fgets(fid);
while ischar(tmp) && currLine < tgt   % fgets returns -1 at end of file
    tmp = fgets(fid);
    currLine = currLine + 1;
end
toc
fclose(fid);
theline = tmp;
----
This takes about 4 seconds on my computer. If this is the bottleneck in your program then you can always call some external file to do the job. For example this python script reads the nth line in a file:
---
# arg 1: filename, arg 2: line number (1-based); Python 3 syntax
import sys
try:
    linenum = int(sys.argv[2])
    with open(sys.argv[1], 'rt') as f:
        idx = 0
        theline = 1
        while idx < linenum and theline:
            theline = f.readline()
            idx += 1
    print(theline, end='')
    sys.exit(0)
except Exception:
    print("Error")
    sys.exit(1)
---
This script, saved as readnthline.py, can be run from matlab like so:
---
[flag,theline] = unix(sprintf('python readnthline.py %s %d',fileName,tgt));
---
This works just like the matlab script but on my computer takes only 0.2 seconds. Note that this will work unmodified on Linux and probably on Mac. On Windows you would have to have python installed and on your path and call dos instead of unix.
Posted by Patrick on 8/25/2010 6:51:04 PM

On 10-08-25 01:51 PM, Patrick Mineault wrote:
> This takes about 4 seconds on my computer. If this is the bottleneck in
> your program then you can always call some external file to do the job.
> For example this python script reads the nth line in a file:
The equivalent perl script would be
$linenum = pop(@ARGV);
while (<>) { $. == $linenum && chomp && print && exit 0 }
exit 1
The script, saved as readnthline.pl, can be run from matlab like so:
fileName = 'largefile.txt'; % your file name here
tgt = 350000; %number of the line here
[theline,flag] = perl(fileName, tgt);
Posted by Walter on 8/25/2010 7:07:43 PM

smith Og wrote:
> Thanks dpb, Steven, Walter.
>
> For some reason I did not see you guy's post before my replies. Thanks
> for the enlightment, and possible solution.
And, of course, Steven's solution encapsulated _IS_ that function... :)
It's just that TMW didn't ship it w/ base Matlab leaving it as an
"exercise for the student". <vbg>
--
Posted by dpb on 8/25/2010 7:08:32 PM

Walter Roberson <roberson@hushmail.com> wrote in message <i53po0$bd2$1@canopus.cc.umanitoba.ca>...
> The script, saved as readnthline.pl, can be run from matlab like so:
>
> fileName = 'largefile.txt'; % your file name here
> tgt = 350000; %number of the line here
>
> [theline,flag] = perl(fileName, tgt);
+2 geek points for perl. However, the MATLAB part isn't right; it's actually:
[theline,flag]=perl('readnthline.pl',fileName,num2str(tgt));
And that takes about 0.13 s versus 0.22 s for my Python solution. However, the following mex file with some inline assembly is even faster ... (just joking)
Posted by Patrick on 8/25/2010 7:30:11 PM

On 10-08-25 02:30 PM, Patrick Mineault wrote:
> +2 geek points for perl. However the matlab part isn't right, it's
> actually:
>
> [theline,flag]=perl('readnthline.pl',fileName,num2str(tgt));
You are right, and I didn't know before that MATLAB perl() calls only accept
strings... I guess I just never happened to run into that.
Normally if I have a reason to invoke perl, I do so outside of Matlab,
probably doing all the pre-processing in one fell swoop and leaving a
parsing-friendly input file with just the information I need.
Side note: perl is included with the Matlab distribution, which can make it
handy for odd jobs like this one that aren't worth a serious invocation of python.
Posted by Walter on 8/25/2010 7:45:32 PM