Parsing huge text files?
Hi all,
I have huge log files that need to be processed. Each log file is
about 500MB.
These are non-structured mixtures of strings and numbers.
To give an example, the first row might contain trader id, trade_id,
quantity, limit_price, etc.
And then the next row contains the acknowledgement returned back from
the exchange after the trade is sent to the exchange.
The next few rows may contain info related to earlier trades and the
status of the orderbook, etc.
And then the next row may contain the fills info for that trade sent
to the exchange. Of course, we use the "trade_id" to keep track of the
info.
And also, in between rows, the log file contains the status of the
order book (order depth, etc.)
We may want to run queries such as: which trades did a trader make,
and what were the average prices of those trades?
The most stupid way I can think of is to search all of the hundreds of
such log files for a specified trader id, then find the trade_ids,
and then search for the details of each of those
trade_ids...
Therefore there will be a lot of searching and data-extracting on
these huge files...
Is there a way to parse the huge log files conveniently in Matlab, or
read one line at a time to parse the text data?
Are there any tools that can help process such meta-data?
Thanks a lot!
lunamoonmoon (258)
4/13/2011 2:47:11 AM
Luna Moon <lunamoonmoon@gmail.com> writes:
> I have huge log files that need to be processed. Each log file is
> about 500MB.
The 'textscan' command is your friend. If your data does not match a
regular pattern, preprocess the files with a Perl script and perform
the analysis in Matlab.
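A minimal sketch of such a preprocessing step, written in Python as a stand-in for the Perl script. The line layout (`TRADE trader_id=... trade_id=...`), the regular expression, and the names `extract_trades` and `write_csv` are all assumptions made for illustration; the real logs will need their own pattern. The input is consumed one line at a time, so a 500MB file never has to fit in memory.

```python
import csv
import io
import re

# Hypothetical line layout, NOT taken from the real logs:
#   TRADE trader_id=T1 trade_id=42 quantity=100 limit_price=10.5
TRADE_RE = re.compile(
    r"^TRADE\s+trader_id=(\S+)\s+trade_id=(\S+)"
    r"\s+quantity=(\d+)\s+limit_price=([0-9.]+)"
)


def extract_trades(lines):
    """Yield (trader_id, trade_id, quantity, limit_price) for each TRADE line.

    `lines` can be any iterable of strings, including an open file object,
    which Python reads lazily one line at a time.
    """
    for line in lines:
        match = TRADE_RE.match(line)
        if match:
            trader_id, trade_id, quantity, price = match.groups()
            yield trader_id, trade_id, int(quantity), float(price)


def write_csv(lines, out):
    """Write the extracted trades to `out` as comma-separated rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["trader_id", "trade_id", "quantity", "limit_price"])
    writer.writerows(extract_trades(lines))


if __name__ == "__main__":
    # Small in-memory sample; acknowledgement and order book lines are skipped.
    sample = [
        "TRADE trader_id=T1 trade_id=42 quantity=100 limit_price=10.5\n",
        "ACK trade_id=42\n",
        "BOOK depth=5\n",
    ]
    buffer = io.StringIO()
    write_csv(sample, buffer)
    print(buffer.getvalue(), end="")
```

For a real log, pass an open file instead of the sample list. The resulting file has a regular pattern, so a call along the lines of `textscan(fid, '%s %s %f %f', 'Delimiter', ',', 'HeaderLines', 1)` should be able to read it in Matlab.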
--
Ralph Schleicher <http://ralph-schleicher.de>
Development * Consulting * Training
Mathematical Modeling and Simulation
Software Tools
usenet4143 (24)
4/13/2011 5:00:02 AM
On Apr 13, 4:47 am, Luna Moon <lunamoonm...@gmail.com> wrote:
> The most stupid way I can think of is to search all of the hundreds of
> such log files for a specified trader id, then find the trade_ids,
> and then search for the details of each of those
> trade_ids...
Maybe 'stupid', but that's the only way if you only
have these log-files.
> Therefore there will be a lot of searching and data-extracting on
> these huge files...
>
> Is there a way to parse the huge log files conveniently in Matlab, or
> read one line at a time to parse the text data?
>
> Are there any tools that can help process such meta-data?
If you have hundreds of GByte-ish files, expect to
spend a lot of time waiting for the results - hours or
even days every time you do the scan. I would suggest you
set up some sort of database where you store the data, so
that you can easily find whatever info you need in the future.
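A rough sketch of the database idea, using SQLite through Python's standard `sqlite3` module. The table layout, the column names, and the helpers `build_db`, `trades_for` and `average_price` are hypothetical, and "average price" is assumed here to mean a quantity-weighted average of fill prices. The logs are scanned once to load the table; later queries use the index instead of rescanning the files.

```python
import sqlite3


def build_db(fills, path=":memory:"):
    """Create the table, index it by trader, and load the fill rows.

    `fills` is an iterable of (trader_id, trade_id, quantity, price) tuples,
    e.g. produced by a one-off scan of the log files.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE fills ("
        "trader_id TEXT, trade_id TEXT, quantity INTEGER, price REAL)"
    )
    # The index makes per-trader lookups cheap once the data is loaded.
    conn.execute("CREATE INDEX idx_fills_trader ON fills (trader_id)")
    conn.executemany("INSERT INTO fills VALUES (?, ?, ?, ?)", fills)
    conn.commit()
    return conn


def trades_for(conn, trader_id):
    """Return the distinct trade_ids of one trader, sorted."""
    rows = conn.execute(
        "SELECT DISTINCT trade_id FROM fills "
        "WHERE trader_id = ? ORDER BY trade_id",
        (trader_id,),
    )
    return [row[0] for row in rows]


def average_price(conn, trader_id):
    """Quantity-weighted average fill price, or None if there are no fills."""
    row = conn.execute(
        "SELECT SUM(quantity * price) / SUM(quantity) "
        "FROM fills WHERE trader_id = ?",
        (trader_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    conn = build_db([
        ("T1", "42", 100, 10.0),
        ("T1", "43", 300, 12.0),
        ("T2", "44", 50, 9.5),
    ])
    print(trades_for(conn, "T1"), average_price(conn, "T1"))
```

Passing a file name as `path` instead of `":memory:"` keeps the database on disk, so the expensive scan of the logs only has to happen once.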
Rune
allnor (8474)
4/13/2011 5:49:22 AM