500 million rows every (8 hour) day

  • Follow


I've been asked to investigate how to best build an Oracle DB that
will be able to handle the insertion of 500 million rows every 8 hour
work day.  The data will be batch inserted via sql loader every few
seconds, into two main tables, distributed about 8:1.  Each table is
about 30 columns wide.  Around 6 months of data will be retained.

At the end of the day, a few stored procedures will have to be run
that produce 4 or 5 reports within a few minutes.  Each of the reports
will contain about a million rows.  Afterward, about 50 other reports
will be run.  All of the reports will run against the data inserted
during the day, with the exception of maybe .1% of the data which will
reference data from a previous day.

I fully recognise i'm not providing sufficient detail for anyone to
provide very accurate recommendations.  What I'm curious to know if
it's too tall an order for Oracle, or if it will require hadoop, or
similar technology.  I've done quite a bit of SQL Server work, but not
nearly to this scale.


thx



0
Reply richard_ln (1) 11/24/2009 2:48:53 AM

"Richard Last" <richard_ln@yahoo.com> wrote in message 
news:884bd094-e613-47fe-8c4c-f6e9ea2d69dc@e7g2000vbi.googlegroups.com...
> I've been asked to investigate how to best build an Oracle DB that
> will be able to handle the insertion of 500 million rows every 8 hour
> work day.  The data will be batch inserted via sql loader every few
> seconds, into two main tables, distributed about 8:1.  Each table is
> about 30 columns wide.  Around 6 months of data will be retained.
>
> At the end of the day, a few stored procedures will have to be run
> that produce 4 or 5 reports within a few minutes.  Each of the reports
> will contain about a million rows.  Afterward, about 50 other reports
> will be run.  All of the reports will run against the data inserted
> during the day, with the exception of maybe .1% of the data which will
> reference data from a previous day.
>
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations.  What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology.  I've done quite a bit of SQL Server work, but not
> nearly to this scale.
>
>
> thx
>
>
>

You are in rigth path. Go ahead, just find rigth questions.
Maybe you would use partitioned table.
http://www.psoug.org/reference/partitions.html




0
Reply Jack 11/24/2009 7:21:16 AM


Surely it will depend much on quick hard drives especially for writing and 
enough main memory for the SGA to hold the indexes needed and much of the 
data of one day.

Arne Ortlinghaus
ACS Data Systems 

0
Reply Arne 11/24/2009 8:27:47 AM

"Richard Last" <richard_ln@yahoo.com> wrote in message 
news:884bd094-e613-47fe-8c4c-f6e9ea2d69dc@e7g2000vbi.googlegroups.com...
> I've been asked to investigate how to best build an Oracle DB that
> will be able to handle the insertion of 500 million rows every 8 hour
> work day.  The data will be batch inserted via sql loader every few
> seconds, into two main tables, distributed about 8:1.  Each table is
> about 30 columns wide.  Around 6 months of data will be retained.
>
> At the end of the day, a few stored procedures will have to be run
> that produce 4 or 5 reports within a few minutes.  Each of the reports
> will contain about a million rows.  Afterward, about 50 other reports
> will be run.  All of the reports will run against the data inserted
> during the day, with the exception of maybe .1% of the data which will
> reference data from a previous day.
>
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations.  What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology.  I've done quite a bit of SQL Server work, but not
> nearly to this scale.
>
>
> thx
>
>
>

Some relevant questions:
Data amount in TB?
Which hardware brand you wish to use?
Contact oracle and ask "Oracle Optimized Warehouse"

You will get that optimized system running, delivered and cost efective very 
fast.

Have fun! 


0
Reply Jack 11/24/2009 8:33:10 AM

On Nov 24, 3:27=A0am, "Arne Ortlinghaus" <Arne.Ortlingh...@acs.it>
wrote:
> Surely it will depend much on quick hard drives especially for writing an=
d
> enough main memory for the SGA to hold the indexes needed and much of the
> data of one day.
>
> Arne Ortlinghaus
> ACS Data Systems

What has been mentioned by others... Oracle RAC, 96GB RAM, several
hundred TB of SAN storage.  The hardware budget is bigger than the GDP
of some countries!!!!!




0
Reply Richard 11/24/2009 1:55:01 PM

Richard Last schreef:
> I've been asked to investigate how to best build an Oracle DB that
> will be able to handle the insertion of 500 million rows every 8 hour
> work day.  The data will be batch inserted via sql loader every few
> seconds, into two main tables, distributed about 8:1.  Each table is
> about 30 columns wide.  Around 6 months of data will be retained.
> 
> At the end of the day, a few stored procedures will have to be run
> that produce 4 or 5 reports within a few minutes.  Each of the reports
> will contain about a million rows.  Afterward, about 50 other reports
> will be run.  All of the reports will run against the data inserted
> during the day, with the exception of maybe .1% of the data which will
> reference data from a previous day.
> 
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations.  What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology.  I've done quite a bit of SQL Server work, but not
> nearly to this scale.
> 
> 
> thx
> 
> 
> 

If the budget is high enough, I'd consider a full rack Exadata Server.

Shakespeare
0
Reply Shakespeare 11/24/2009 3:02:19 PM

On Nov 24, 8:55=A0am, Richard Last <richard...@yahoo.com> wrote:
> On Nov 24, 3:27=A0am, "Arne Ortlinghaus" <Arne.Ortlingh...@acs.it>
> wrote:
>
> > Surely it will depend much on quick hard drives especially for writing =
and
> > enough main memory for the SGA to hold the indexes needed and much of t=
he
> > data of one day.
>
> > Arne Ortlinghaus
> > ACS Data Systems
>
> What has been mentioned by others... Oracle RAC, 96GB RAM, several
> hundred TB of SAN storage. =A0The hardware budget is bigger than the GDP
> of some countries!!!!!

Richard, I think the answer depends on what and how the data will be
used for.  That is how will the data be queried?  Will it permanently
reside in the database or is the database just a holding point till
the data is filtered and transferred to its pemanent home?  Is the
initial insert done to the data's final store within the database or
is it moved within the database?

I am not a big fan of using sqlldr direct path loads in a production
environment.  If a direct path load job fails then the indexes become
unusable and require rebuilding.  Having to rebuild large indexes on a
massive single point of insertion table pretty much brings the system
to a halt.  If there will be multiple, concurrent sources of data
intput I would insist on using convention load or using programs
written to use bulk inserts or the older array insert feature of pro*
languages.

HTH -- Mark D Powell --
0
Reply Mark 11/24/2009 3:25:07 PM

On Nov 24, 10:25=A0am, Mark D Powell <Mark.Powe...@hp.com> wrote:

snip

> Richard, I think the answer depends on what and how the data will be
> used for. =A0That is how will the data be queried? =A0Will it permanently
> reside in the database or is the database just a holding point till
> the data is filtered and transferred to its pemanent home? =A0Is the
> initial insert done to the data's final store within the database or
> is it moved within the database?
>
> I am not a big fan of using sqlldr direct path loads in a production
> environment. =A0If a direct path load job fails then the indexes become
> unusable and require rebuilding. =A0Having to rebuild large indexes on a
> massive single point of insertion table pretty much brings the system
> to a halt. =A0If there will be multiple, concurrent sources of data
> intput I would insist on using convention load or using programs
> written to use bulk inserts or the older array insert feature of pro*
> languages.

What the OP seemed to describe is a daily load cycle and if I
understood it correctly I would seriously think about using some kind
of daily staging table that accumulates the input.  It possibly could
have no real useful indexes on it ... just have data dumped into it
( depending on what and where validation is needed ) ... then near the
end of the cycle use it to go elsewhere.

Partitioning might be a possibility.  Inserting 500 million rows over
8 hours is pretty trivial on even most modest servers.

Designing properly the storage structures to handle 6 months of this
stuff then ditch it ( perhaps ) in some fashion or move it out
elsewhere to me sounds like where the business rules would really need
to be understood and clarified.
0
Reply hpuxrac 11/25/2009 12:31:40 AM

On Nov 24, 4:02=A0pm, Shakespeare <what...@xs4all.nl> wrote:
>
> If the budget is high enough, I'd consider a full rack Exadata Server.
>
> Shakespeare

Or even better, move to Teradata ;-)

Cheers.

Carlos.
0
Reply Carlos 11/25/2009 8:13:48 AM

Richard Last wrote,on my timestamp of 24/11/2009 1:48 PM:

> 
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations.  What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology.  I've done quite a bit of SQL Server work, but not
> nearly to this scale.


Here are my not very accurate recommendations, for the reasons you pointed out.

Partition by day.  Make the day's tables separate, manipulate them for the daily 
reporting as needed. Add them to partitioned tables at end of day and run 
reports across days then. At end of 6 months, drop partitions off the start of 
the table.
Make sure no one comes up with some nonsense of storing LOBs at those volumes!

No matter what database technology is used, you'll need some serious disk 
capacity.  As well as memory and CPU. Look at compression at database layer. 
You're definitely looking at 11gr1 or even r2: some improvements in partitioning 
in those releases that would definitely help.

At these sort of volumes, the secret is always the same: storage architecture 
must avoid having to traverse data not immediately needed.  Whatever the db.
0
Reply Noons 11/25/2009 11:53:14 AM

On Nov 25, 5:53=A0am, Noons <wizofo...@yahoo.com.au> wrote:
> Richard Last wrote,on my timestamp of 24/11/2009 1:48 PM:
>
>
>
> > I fully recognise i'm not providing sufficient detail for anyone to
> > provide very accurate recommendations. =A0What I'm curious to know if
> > it's too tall an order for Oracle, or if it will require hadoop, or
> > similar technology. =A0I've done quite a bit of SQL Server work, but no=
t
> > nearly to this scale.
>
> Here are my not very accurate recommendations, for the reasons you pointe=
d out.
>
> Partition by day. =A0Make the day's tables separate, manipulate them for =
the daily
> reporting as needed. Add them to partitioned tables at end of day and run
> reports across days then. At end of 6 months, drop partitions off the sta=
rt of
> the table.
> Make sure no one comes up with some nonsense of storing LOBs at those vol=
umes!
>
> No matter what database technology is used, you'll need some serious disk
> capacity. =A0As well as memory and CPU. Look at compression at database l=
ayer.
> You're definitely looking at 11gr1 or even r2: some improvements in parti=
tioning
> in those releases that would definitely help.
>
> At these sort of volumes, the secret is always the same: storage architec=
ture
> must avoid having to traverse data not immediately needed. =A0Whatever th=
e db.

In a previous life we did the the same type of load. We were getting
about the same number of
rows in a 6 hour period.

We partitioned by day and subpartitioned by 'loader'.  We had 12
programs loading data and they each had a number associated with them
that made up the subpartioning key. Obviously that didn't help with
queries but the queries were generally over an entire day anyway.
These programs were not using any type of direct path loading they
were just batching up a couple of hundred rows and inserting them.



0
Reply scottgamble 11/25/2009 12:45:09 PM

>> If the budget is high enough, I'd consider a full rack Exadata Server.

>Or even better, move to Teradata ;-)

But...Exa is a million times better than Tera! ;-)

I wish I had such a beast of a database server under my control
though...heck even the minimal configuration of an Oracle Database Machine
is far more powerful than most customers' setups.
0
Reply Kai 11/25/2009 6:14:04 PM

Kai Rode schreef:
>>> If the budget is high enough, I'd consider a full rack Exadata Server.
> 
>> Or even better, move to Teradata ;-)
> 
> But...Exa is a million times better than Tera! ;-)
> 
> I wish I had such a beast of a database server under my control
> though...heck even the minimal configuration of an Oracle Database Machine
> is far more powerful than most customers' setups.

And by far more expensive too.... still would love to have a full 
rack... If I were half as rich as Larry, I'd buy one.

Shakespeare
(What's in a buck?)
0
Reply Shakespeare 11/25/2009 8:57:06 PM

"scottgamble@comcast.net" <zifnabiom@gmail.com> wrote in message 
news:83513236-dd64-4273-a89c-814ed56fbeb8@m16g2000yqc.googlegroups.com...
On Nov 25, 5:53 am, Noons <wizofo...@yahoo.com.au> wrote:
> Richard Last wrote,on my timestamp of 24/11/2009 1:48 PM:
>
>
>
> > I fully recognise i'm not providing sufficient detail for anyone to
> > provide very accurate recommendations. What I'm curious to know if
> > it's too tall an order for Oracle, or if it will require hadoop, or
> > similar technology. I've done quite a bit of SQL Server work, but not
> > nearly to this scale.
>
> Here are my not very accurate recommendations, for the reasons you pointed 
> out.
>
> Partition by day. Make the day's tables separate, manipulate them for the 
> daily
> reporting as needed. Add them to partitioned tables at end of day and run
> reports across days then. At end of 6 months, drop partitions off the 
> start of
> the table.
> Make sure no one comes up with some nonsense of storing LOBs at those 
> volumes!
>
> No matter what database technology is used, you'll need some serious disk
> capacity. As well as memory and CPU. Look at compression at database 
> layer.
> You're definitely looking at 11gr1 or even r2: some improvements in 
> partitioning
> in those releases that would definitely help.
>
> At these sort of volumes, the secret is always the same: storage 
> architecture
> must avoid having to traverse data not immediately needed. Whatever the 
> db.

In a previous life we did the the same type of load. We were getting
about the same number of
rows in a 6 hour period.

We partitioned by day and subpartitioned by 'loader'.  We had 12
programs loading data and they each had a number associated with them
that made up the subpartioning key. Obviously that didn't help with
queries but the queries were generally over an entire day anyway.
These programs were not using any type of direct path loading they
were just batching up a couple of hundred rows and inserting them.

-------------------
Nowadays it would be couple of (ten) thousands rows.

Advanced compression will save some TB of storage
http://www.oracle.com/database/advanced-compression.html
Also disk I/O will be faster




0
Reply Jack 11/26/2009 7:25:22 AM

Jack wrote,on my timestamp of 26/11/2009 6:25 PM:

> Advanced compression will save some TB of storage
> http://www.oracle.com/database/advanced-compression.html
> Also disk I/O will be faster

Not really: compression does not change the speed of disk I/O one atom.
It just makes less use of it.
0
Reply Noons 11/28/2009 12:57:49 PM

14 Replies
424 Views

(page loaded in 0.156 seconds)

Similiar Articles:

















7/28/2012 6:30:54 AM


Reply: