I've been asked to investigate how to best build an Oracle DB that
will be able to handle the insertion of 500 million rows every 8 hour
work day. The data will be batch inserted via sql loader every few
seconds, into two main tables, distributed about 8:1. Each table is
about 30 columns wide. Around 6 months of data will be retained.
At the end of the day, a few stored procedures will have to be run
that produce 4 or 5 reports within a few minutes. Each of the reports
will contain about a million rows. Afterward, about 50 other reports
will be run. All of the reports will run against the data inserted
during the day, with the exception of maybe .1% of the data which will
reference data from a previous day.
I fully recognise i'm not providing sufficient detail for anyone to
provide very accurate recommendations. What I'm curious to know if
it's too tall an order for Oracle, or if it will require hadoop, or
similar technology. I've done quite a bit of SQL Server work, but not
nearly to this scale.
thx
|
|
0
|
|
|
|
Reply
|
richard_ln (1)
|
11/24/2009 2:48:53 AM |
|
"Richard Last" <richard_ln@yahoo.com> wrote in message
news:884bd094-e613-47fe-8c4c-f6e9ea2d69dc@e7g2000vbi.googlegroups.com...
> I've been asked to investigate how to best build an Oracle DB that
> will be able to handle the insertion of 500 million rows every 8 hour
> work day. The data will be batch inserted via sql loader every few
> seconds, into two main tables, distributed about 8:1. Each table is
> about 30 columns wide. Around 6 months of data will be retained.
>
> At the end of the day, a few stored procedures will have to be run
> that produce 4 or 5 reports within a few minutes. Each of the reports
> will contain about a million rows. Afterward, about 50 other reports
> will be run. All of the reports will run against the data inserted
> during the day, with the exception of maybe .1% of the data which will
> reference data from a previous day.
>
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations. What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology. I've done quite a bit of SQL Server work, but not
> nearly to this scale.
>
>
> thx
>
>
>
You are in rigth path. Go ahead, just find rigth questions.
Maybe you would use partitioned table.
http://www.psoug.org/reference/partitions.html
|
|
0
|
|
|
|
Reply
|
Jack
|
11/24/2009 7:21:16 AM
|
|
Surely it will depend much on quick hard drives especially for writing and
enough main memory for the SGA to hold the indexes needed and much of the
data of one day.
Arne Ortlinghaus
ACS Data Systems
|
|
0
|
|
|
|
Reply
|
Arne
|
11/24/2009 8:27:47 AM
|
|
"Richard Last" <richard_ln@yahoo.com> wrote in message
news:884bd094-e613-47fe-8c4c-f6e9ea2d69dc@e7g2000vbi.googlegroups.com...
> I've been asked to investigate how to best build an Oracle DB that
> will be able to handle the insertion of 500 million rows every 8 hour
> work day. The data will be batch inserted via sql loader every few
> seconds, into two main tables, distributed about 8:1. Each table is
> about 30 columns wide. Around 6 months of data will be retained.
>
> At the end of the day, a few stored procedures will have to be run
> that produce 4 or 5 reports within a few minutes. Each of the reports
> will contain about a million rows. Afterward, about 50 other reports
> will be run. All of the reports will run against the data inserted
> during the day, with the exception of maybe .1% of the data which will
> reference data from a previous day.
>
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations. What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology. I've done quite a bit of SQL Server work, but not
> nearly to this scale.
>
>
> thx
>
>
>
Some relevant questions:
Data amount in TB?
Which hardware brand you wish to use?
Contact oracle and ask "Oracle Optimized Warehouse"
You will get that optimized system running, delivered and cost efective very
fast.
Have fun!
|
|
0
|
|
|
|
Reply
|
Jack
|
11/24/2009 8:33:10 AM
|
|
On Nov 24, 3:27=A0am, "Arne Ortlinghaus" <Arne.Ortlingh...@acs.it>
wrote:
> Surely it will depend much on quick hard drives especially for writing an=
d
> enough main memory for the SGA to hold the indexes needed and much of the
> data of one day.
>
> Arne Ortlinghaus
> ACS Data Systems
What has been mentioned by others... Oracle RAC, 96GB RAM, several
hundred TB of SAN storage. The hardware budget is bigger than the GDP
of some countries!!!!!
|
|
0
|
|
|
|
Reply
|
Richard
|
11/24/2009 1:55:01 PM
|
|
Richard Last schreef:
> I've been asked to investigate how to best build an Oracle DB that
> will be able to handle the insertion of 500 million rows every 8 hour
> work day. The data will be batch inserted via sql loader every few
> seconds, into two main tables, distributed about 8:1. Each table is
> about 30 columns wide. Around 6 months of data will be retained.
>
> At the end of the day, a few stored procedures will have to be run
> that produce 4 or 5 reports within a few minutes. Each of the reports
> will contain about a million rows. Afterward, about 50 other reports
> will be run. All of the reports will run against the data inserted
> during the day, with the exception of maybe .1% of the data which will
> reference data from a previous day.
>
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations. What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology. I've done quite a bit of SQL Server work, but not
> nearly to this scale.
>
>
> thx
>
>
>
If the budget is high enough, I'd consider a full rack Exadata Server.
Shakespeare
|
|
0
|
|
|
|
Reply
|
Shakespeare
|
11/24/2009 3:02:19 PM
|
|
On Nov 24, 8:55=A0am, Richard Last <richard...@yahoo.com> wrote:
> On Nov 24, 3:27=A0am, "Arne Ortlinghaus" <Arne.Ortlingh...@acs.it>
> wrote:
>
> > Surely it will depend much on quick hard drives especially for writing =
and
> > enough main memory for the SGA to hold the indexes needed and much of t=
he
> > data of one day.
>
> > Arne Ortlinghaus
> > ACS Data Systems
>
> What has been mentioned by others... Oracle RAC, 96GB RAM, several
> hundred TB of SAN storage. =A0The hardware budget is bigger than the GDP
> of some countries!!!!!
Richard, I think the answer depends on what and how the data will be
used for. That is how will the data be queried? Will it permanently
reside in the database or is the database just a holding point till
the data is filtered and transferred to its pemanent home? Is the
initial insert done to the data's final store within the database or
is it moved within the database?
I am not a big fan of using sqlldr direct path loads in a production
environment. If a direct path load job fails then the indexes become
unusable and require rebuilding. Having to rebuild large indexes on a
massive single point of insertion table pretty much brings the system
to a halt. If there will be multiple, concurrent sources of data
intput I would insist on using convention load or using programs
written to use bulk inserts or the older array insert feature of pro*
languages.
HTH -- Mark D Powell --
|
|
0
|
|
|
|
Reply
|
Mark
|
11/24/2009 3:25:07 PM
|
|
On Nov 24, 10:25=A0am, Mark D Powell <Mark.Powe...@hp.com> wrote:
snip
> Richard, I think the answer depends on what and how the data will be
> used for. =A0That is how will the data be queried? =A0Will it permanently
> reside in the database or is the database just a holding point till
> the data is filtered and transferred to its pemanent home? =A0Is the
> initial insert done to the data's final store within the database or
> is it moved within the database?
>
> I am not a big fan of using sqlldr direct path loads in a production
> environment. =A0If a direct path load job fails then the indexes become
> unusable and require rebuilding. =A0Having to rebuild large indexes on a
> massive single point of insertion table pretty much brings the system
> to a halt. =A0If there will be multiple, concurrent sources of data
> intput I would insist on using convention load or using programs
> written to use bulk inserts or the older array insert feature of pro*
> languages.
What the OP seemed to describe is a daily load cycle and if I
understood it correctly I would seriously think about using some kind
of daily staging table that accumulates the input. It possibly could
have no real useful indexes on it ... just have data dumped into it
( depending on what and where validation is needed ) ... then near the
end of the cycle use it to go elsewhere.
Partitioning might be a possibility. Inserting 500 million rows over
8 hours is pretty trivial on even most modest servers.
Designing properly the storage structures to handle 6 months of this
stuff then ditch it ( perhaps ) in some fashion or move it out
elsewhere to me sounds like where the business rules would really need
to be understood and clarified.
|
|
0
|
|
|
|
Reply
|
hpuxrac
|
11/25/2009 12:31:40 AM
|
|
On Nov 24, 4:02=A0pm, Shakespeare <what...@xs4all.nl> wrote:
>
> If the budget is high enough, I'd consider a full rack Exadata Server.
>
> Shakespeare
Or even better, move to Teradata ;-)
Cheers.
Carlos.
|
|
0
|
|
|
|
Reply
|
Carlos
|
11/25/2009 8:13:48 AM
|
|
Richard Last wrote,on my timestamp of 24/11/2009 1:48 PM:
>
> I fully recognise i'm not providing sufficient detail for anyone to
> provide very accurate recommendations. What I'm curious to know if
> it's too tall an order for Oracle, or if it will require hadoop, or
> similar technology. I've done quite a bit of SQL Server work, but not
> nearly to this scale.
Here are my not very accurate recommendations, for the reasons you pointed out.
Partition by day. Make the day's tables separate, manipulate them for the daily
reporting as needed. Add them to partitioned tables at end of day and run
reports across days then. At end of 6 months, drop partitions off the start of
the table.
Make sure no one comes up with some nonsense of storing LOBs at those volumes!
No matter what database technology is used, you'll need some serious disk
capacity. As well as memory and CPU. Look at compression at database layer.
You're definitely looking at 11gr1 or even r2: some improvements in partitioning
in those releases that would definitely help.
At these sort of volumes, the secret is always the same: storage architecture
must avoid having to traverse data not immediately needed. Whatever the db.
|
|
0
|
|
|
|
Reply
|
Noons
|
11/25/2009 11:53:14 AM
|
|
On Nov 25, 5:53=A0am, Noons <wizofo...@yahoo.com.au> wrote:
> Richard Last wrote,on my timestamp of 24/11/2009 1:48 PM:
>
>
>
> > I fully recognise i'm not providing sufficient detail for anyone to
> > provide very accurate recommendations. =A0What I'm curious to know if
> > it's too tall an order for Oracle, or if it will require hadoop, or
> > similar technology. =A0I've done quite a bit of SQL Server work, but no=
t
> > nearly to this scale.
>
> Here are my not very accurate recommendations, for the reasons you pointe=
d out.
>
> Partition by day. =A0Make the day's tables separate, manipulate them for =
the daily
> reporting as needed. Add them to partitioned tables at end of day and run
> reports across days then. At end of 6 months, drop partitions off the sta=
rt of
> the table.
> Make sure no one comes up with some nonsense of storing LOBs at those vol=
umes!
>
> No matter what database technology is used, you'll need some serious disk
> capacity. =A0As well as memory and CPU. Look at compression at database l=
ayer.
> You're definitely looking at 11gr1 or even r2: some improvements in parti=
tioning
> in those releases that would definitely help.
>
> At these sort of volumes, the secret is always the same: storage architec=
ture
> must avoid having to traverse data not immediately needed. =A0Whatever th=
e db.
In a previous life we did the the same type of load. We were getting
about the same number of
rows in a 6 hour period.
We partitioned by day and subpartitioned by 'loader'. We had 12
programs loading data and they each had a number associated with them
that made up the subpartioning key. Obviously that didn't help with
queries but the queries were generally over an entire day anyway.
These programs were not using any type of direct path loading they
were just batching up a couple of hundred rows and inserting them.
|
|
0
|
|
|
|
Reply
|
scottgamble
|
11/25/2009 12:45:09 PM
|
|
>> If the budget is high enough, I'd consider a full rack Exadata Server.
>Or even better, move to Teradata ;-)
But...Exa is a million times better than Tera! ;-)
I wish I had such a beast of a database server under my control
though...heck even the minimal configuration of an Oracle Database Machine
is far more powerful than most customers' setups.
|
|
0
|
|
|
|
Reply
|
Kai
|
11/25/2009 6:14:04 PM
|
|
Kai Rode schreef:
>>> If the budget is high enough, I'd consider a full rack Exadata Server.
>
>> Or even better, move to Teradata ;-)
>
> But...Exa is a million times better than Tera! ;-)
>
> I wish I had such a beast of a database server under my control
> though...heck even the minimal configuration of an Oracle Database Machine
> is far more powerful than most customers' setups.
And by far more expensive too.... still would love to have a full
rack... If I were half as rich as Larry, I'd buy one.
Shakespeare
(What's in a buck?)
|
|
0
|
|
|
|
Reply
|
Shakespeare
|
11/25/2009 8:57:06 PM
|
|
"scottgamble@comcast.net" <zifnabiom@gmail.com> wrote in message
news:83513236-dd64-4273-a89c-814ed56fbeb8@m16g2000yqc.googlegroups.com...
On Nov 25, 5:53 am, Noons <wizofo...@yahoo.com.au> wrote:
> Richard Last wrote,on my timestamp of 24/11/2009 1:48 PM:
>
>
>
> > I fully recognise i'm not providing sufficient detail for anyone to
> > provide very accurate recommendations. What I'm curious to know if
> > it's too tall an order for Oracle, or if it will require hadoop, or
> > similar technology. I've done quite a bit of SQL Server work, but not
> > nearly to this scale.
>
> Here are my not very accurate recommendations, for the reasons you pointed
> out.
>
> Partition by day. Make the day's tables separate, manipulate them for the
> daily
> reporting as needed. Add them to partitioned tables at end of day and run
> reports across days then. At end of 6 months, drop partitions off the
> start of
> the table.
> Make sure no one comes up with some nonsense of storing LOBs at those
> volumes!
>
> No matter what database technology is used, you'll need some serious disk
> capacity. As well as memory and CPU. Look at compression at database
> layer.
> You're definitely looking at 11gr1 or even r2: some improvements in
> partitioning
> in those releases that would definitely help.
>
> At these sort of volumes, the secret is always the same: storage
> architecture
> must avoid having to traverse data not immediately needed. Whatever the
> db.
In a previous life we did the the same type of load. We were getting
about the same number of
rows in a 6 hour period.
We partitioned by day and subpartitioned by 'loader'. We had 12
programs loading data and they each had a number associated with them
that made up the subpartioning key. Obviously that didn't help with
queries but the queries were generally over an entire day anyway.
These programs were not using any type of direct path loading they
were just batching up a couple of hundred rows and inserting them.
-------------------
Nowadays it would be couple of (ten) thousands rows.
Advanced compression will save some TB of storage
http://www.oracle.com/database/advanced-compression.html
Also disk I/O will be faster
|
|
0
|
|
|
|
Reply
|
Jack
|
11/26/2009 7:25:22 AM
|
|
Jack wrote,on my timestamp of 26/11/2009 6:25 PM:
> Advanced compression will save some TB of storage
> http://www.oracle.com/database/advanced-compression.html
> Also disk I/O will be faster
Not really: compression does not change the speed of disk I/O one atom.
It just makes less use of it.
|
|
0
|
|
|
|
Reply
|
Noons
|
11/28/2009 12:57:49 PM
|
|
|
14 Replies
424 Views
(page loaded in 0.156 seconds)
Similiar Articles: 500 million rows every (8 hour) day - comp.databases.oracle.server ...I've been asked to investigate how to best build an Oracle DB that will be able to handle the insertion of 500 million rows every 8 hour work day. ... Recover deleted rows from a table from archive logs? - comp ...500 million rows every (8 hour) day - comp.databases.oracle.server ... Recover deleted rows from a table from archive logs? - comp ... 500 million rows every (8 hour) day ... script to renice oracle jobs - comp.unix.admin500 million rows every (8 hour) day - comp.databases.oracle.server ... Oracle RAC, 96GB RAM, several hundred TB of SAN storage. ... If a direct path load job fails then ... socket in use ? - comp.protocols.time.ntpCrontab starts the ntp-clinet every hour to ... cannot keep time correctly; after 1 day the clock goes for approx. 1 hour ... of the clock is under 500 parts per million. 500 ... Efficient way to delete a large amount of records from a big table ...... million records out of 250 million daily). Initially, with once-a-day ... hour. With an index on the timestamp, InnoDB's row locking, and a 30-second deletion cycle every ... How best to detect duplicate values in a column? - comp.databases ...But because of the reporting on every script-run, if due to large number of rows ... I'd say about a million rows ....then you ... discovered it myself by a couple hours ... time reset, syncronisation lost and Large PPM values - comp ...... bothers ntpd so it looses sync about every half hour. ... service ntpd start 7) Wait for a day or 2 for ... does have a frequency error in excess of 500 parts per million. Ike phase 1 rekey & timeout - comp.dcom.sys.ciscoThe default is likely once per day (86,400 sec.). You might want a lifetime of an hour ... not receive "three" keepalives in a row ... change isakmp duration to around 1= 500 ... Sampling: What Nyquist Didn't Say, and What to Do About It - comp ...On a sunny day (Mon, 20 Dec 2010 01:40:32 -0800) it ... start as it then seeks the start of the next line (row). ... I think Frame was only about $500 when I bought my ... Bought Deadspace for ten bucks from Steam. Where's the manual ...... diverted all my calls over the 10 day break I took & got 2 days in a row without being hassled - woo hoo! Got about 500 ... calling me 7 days a week almost every 2 hours ... Quick sync between two computers not connected to the internet ...NTP can take hours to settle, and is better ... ntpd has a hard limit of 500 PPM for slewed ... probably get it to within a few parts per million (Ie, a second per day ... [comp.publish.cdrom] CD-Recordable FAQ, Part 1/4 - comp.publish ...Archive-name: cdrom/cd-recordable/part1 Posting-Frequency: monthly Last-modified: 2008/10/09 Version: 2.71 Send corrections and updates to And... How to get envelope from AM signal without phase shift - comp.dsp ...Your problem will probably take 3-4 hours of work. ... It looks like there is more and more of them every day. ... wm1 = 2 \[Pi] fm1; wm2 = 2 \[Pi] fm2; c = 3*10^8; fc = 500 ... 500 million rows every (8 hour) day - comp.databases.oracle.server ...I've been asked to investigate how to best build an Oracle DB that will be able to handle the insertion of 500 million rows every 8 hour work day. ... MS SQL Server :: 500 Million Rows Of Data?500 Million Rows Of Data? I'm new to ... that gets updated every morning, start of the day is copying 4 million rows ... handling about 1.2 million rows takes about 7-8 hours ... 7/28/2012 6:30:54 AM
|