Parsing huge text files?

Hi all,

I have huge log files that need to be processed. Each log file is
about 500MB.

These are non-structured mixtures of strings and numbers.

To give an example, the first row might contain trader id, trade_id,
quantity, limit_price, etc.

And then the next row contains the acknowledgement returned back from
the exchange after the trade is sent to the exchange.

The next few rows may contain info related to earlier trades and the
status of the orderbook, etc.

And then the next row may contain the fills info for that trade sent
to the exchange. Of course, we use the "trade_id" to keep track of the
info.

And also, in between rows, the log file contains the status of the
order book (order depth, etc.)

We may want to construct the queries, such as what trades did a trader
do and what are the average prices for his trades...

The most stupid way I can think of is to search the whole hundreds of
such log files for a specified trader id, and then find the trade_ids
and then search the details of the trades for each of the
trade_ids...

Therefore there will be a lot of searching and data-extracting on
these huge files...

Is there a way to parse the huge log files conveniently in Matlab, or
read one line at a time to parse the text data?

Are there any tools that can help process such meta-data?

Thanks a lot!
Reply lunamoonmoon (258) 4/13/2011 2:47:11 AM

Luna Moon <lunamoonmoon@gmail.com> writes:

> I have huge log files that need to be processed. Each log file is
> about 500MB.

The 'textscan' command is your friend.  If your data does not match a
regular pattern, preprocess the files with a Perl script and perform
the analysis in Matlab.
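
A minimal sketch of such a preprocessing step, written in Python rather than Perl. The thread never shows the real log layout, so the line format below (ORDER/ACK/BOOK keywords followed by key=value fields), the regular expression, and the function name extract_orders are all assumptions made for illustration. The idea is to read one line at a time, keep only the order rows, and write them out as regular CSV.

```python
import csv
import io
import re

# Hypothetical line layout: the thread does not show the real log format.
ORDER_RE = re.compile(
    r"^ORDER trader_id=(\S+) trade_id=(\d+) quantity=(\d+) limit_price=([\d.]+)$"
)

def extract_orders(lines, out):
    """Stream lines, write one CSV row per order line, return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for line in lines:  # a file object yields one line at a time
        match = ORDER_RE.match(line.strip())
        if match:
            writer.writerow(match.groups())
            count += 1
    return count

# Small in-memory sample standing in for a 500MB log file.
sample = io.StringIO(
    "ORDER trader_id=T1 trade_id=1001 quantity=100 limit_price=25.50\n"
    "ACK trade_id=1001 status=accepted\n"
    "BOOK depth=5\n"
    "ORDER trader_id=T2 trade_id=1002 quantity=50 limit_price=25.40\n"
)
out = io.StringIO()
n_rows = extract_orders(sample, out)
csv_text = out.getvalue()
```

With real files, the same function can be handed an open file object instead of the in-memory sample, so only one line is held in memory at a time. The resulting CSV has a fixed number of columns per row, which is the kind of regular pattern 'textscan' handles well.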

-- 
Ralph Schleicher  <http://ralph-schleicher.de>

Development * Consulting * Training
Mathematical Modeling and Simulation
Software Tools
Reply usenet4143 (24) 4/13/2011 5:00:02 AM


On Apr 13, 4:47 am, Luna Moon <lunamoonm...@gmail.com> wrote:

> The most stupid way I can think of is to search the whole hundreds of
> such log files for a specified trader id, and then find the trade_ids
> and then search the details of the trades for each of the
> trade_ids...

Maybe 'stupid', but that's the only way if you only
have these log-files.
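
If the order row always appears in the log before its fills, the two searches described above can possibly be folded into a single pass per file. The following Python sketch illustrates that under stated assumptions: the ORDER/FILL keywords, the key=value field layout, and the names parse_fields and average_fill_price are hypothetical, since the real log format is not shown in this thread.

```python
def parse_fields(line):
    """Split 'KIND key=value ...' into (kind, dict). The layout is hypothetical."""
    parts = line.split()
    if not parts:
        return None, {}
    fields = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    return parts[0], fields

def average_fill_price(lines, trader_id):
    """Quantity-weighted average fill price for one trader, in a single pass.

    Assumes each ORDER line precedes the FILL lines for the same trade_id.
    Returns None if the trader has no fills.
    """
    trade_ids = set()
    total_qty = 0
    total_value = 0.0
    for line in lines:
        kind, f = parse_fields(line)
        if kind == "ORDER" and f.get("trader_id") == trader_id:
            trade_ids.add(f["trade_id"])  # remember this trader's trades
        elif kind == "FILL" and f.get("trade_id") in trade_ids:
            qty = int(f["quantity"])
            total_qty += qty
            total_value += qty * float(f["price"])
    return total_value / total_qty if total_qty else None

# Small sample standing in for one log file.
sample_lines = [
    "ORDER trader_id=T1 trade_id=1001 quantity=100 limit_price=26.00",
    "ACK trade_id=1001 status=accepted",
    "BOOK depth=5",
    "FILL trade_id=1001 quantity=60 price=25.0",
    "ORDER trader_id=T2 trade_id=1002 quantity=50 limit_price=30.00",
    "FILL trade_id=1001 quantity=40 price=26.0",
    "FILL trade_id=1002 quantity=50 price=30.0",
]
t1_average = average_fill_price(sample_lines, "T1")
```

This still reads every line of every file for each query, so it does not remove the waiting time; it only avoids scanning the same file twice.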

> Therefore there will be a lot of searching and data-extracting on
> these huge files...
>
> Is there a way to parse the huge log files conveniently in Matlab, or
> read one line at a time to parse the text data?
>
> Any tools that can help processing such meta-data?

If you have hundreds of GByte'ish-sized files, expect to
spend a lot of time waiting for the results - hours or
even days every time you do the scan. I would suggest you
set up some sort of database where you store the data, so
that you can easily find whatever info you need in the future.
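
As one possible illustration of the database idea, here is a small Python sketch using the standard sqlite3 module. The table layout, column names, sample rows, and the helper trader_summary are assumptions invented for this example; a real schema would depend on what the logs actually contain. The logs would be parsed once, loaded into tables, and then queried by trader id without rescanning the files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
conn.executescript("""
CREATE TABLE orders (
    trade_id    INTEGER PRIMARY KEY,
    trader_id   TEXT,
    quantity    INTEGER,
    limit_price REAL
);
CREATE TABLE fills (
    trade_id INTEGER,
    quantity INTEGER,
    price    REAL
);
-- Indexes so lookups by trader and by trade do not scan whole tables.
CREATE INDEX idx_orders_trader ON orders(trader_id);
CREATE INDEX idx_fills_trade ON fills(trade_id);
""")

# Made-up rows standing in for data parsed from the logs.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1001, "T1", 100, 26.0), (1002, "T2", 50, 30.0), (1003, "T1", 10, 20.0)],
)
conn.executemany(
    "INSERT INTO fills VALUES (?, ?, ?)",
    [(1001, 60, 25.0), (1001, 40, 26.0), (1002, 50, 30.0), (1003, 10, 20.0)],
)

def trader_summary(conn, trader_id):
    """Return (number of filled trades, quantity-weighted average fill price)."""
    return conn.execute(
        """
        SELECT COUNT(DISTINCT o.trade_id),
               SUM(f.quantity * f.price) / SUM(f.quantity)
        FROM orders AS o
        JOIN fills AS f ON f.trade_id = o.trade_id
        WHERE o.trader_id = ?
        """,
        (trader_id,),
    ).fetchone()

t1_count, t1_avg = trader_summary(conn, "T1")
```

The expensive part, parsing the log files, then happens once at load time. Questions such as "what trades did a trader do and what are the average prices" become short queries against the indexed tables.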

Rune
Reply allnor (8474) 4/13/2011 5:49:22 AM
