Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs and
added Display Lists drawing for comparison. The new code is available here:
http://allanmb.redirectme.net/binaries/VBOs_vs_DL.zip
Can anybody who is interested, and those who believe that Display Lists are
slower than VBOs (or vice versa), please try this out so we can compare the
results? Let's see if we can finally determine which is faster in general or
on specific hardware.
Ok, for my results:
Machine Spec:
Athlon64 3000+
1GB DDR400 RAM
ATI Radeon 9700XT 256MB
(default window size)
VBO speed: ~295 fps
DL speed: ~450 fps
(1600x1200 window)
VBO speed: ~81 fps
DL speed: ~88 fps (dips down to 75 fps, but mostly >90 fps)
My conclusion is that Display Lists are faster on my system, but I would
like to see what other people's experiences are.
Allan
Posted by abruce (27), 2/13/2005 1:28:42 PM

great idea, allan!
i've tested your app on my notebook (1.9 GHz P4, 512 MB, ATI Radeon
Mobility 7500) and it looks like a draw:
default window size:
VBO: 109-130 fps
DL: 112-139 fps
and on a P4 3 GHz, 1 GB and ATI Radeon X800:
default window size:
VBO: 710-795 fps ( down to 690 for a moment)
DL: 1030-1075 fps (down to 940 for a moment)
seems that ATI prefers display lists.
cheers,
david
Posted by David, 2/13/2005 2:22:02 PM

Allan Bruce wrote:
> Machine Spec:
Pentium 4, 2.4 GHz, FSB800
1GB DDR400 RAM
nVidia Geforce 6800 128MB, 70.41 drivers
> (default window size)
VBO: 330 fps
DL: 170 fps
> My conclusion is that Display Lists are faster for my system but I would
> like to see what other peoples experiences are.
Looks like that's why nVidia recommends VBO to game developers...
Malte
Posted by Malte, 2/13/2005 4:03:45 PM

"Allan Bruce" <abruce@TAKEMEAWAY.csd.abdn.ac.uk> wrote in message
news:110ulgdltj9ue2f@corp.supernews.com...
> [snip]
Interesting results so far. Looks like the NV 6800 may be better at VBO's
(finally).
I'm always suspicious of benchmarks that rely on extremely high frame rates.
They tend to be dominated by glClear effectiveness & other overheads.
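As a rough illustration of why very high frame rates are tricky to compare, here is a minimal C++ sketch that turns frame rates into per-frame times. The helper names are made up for this example and are not part of Allan's program.

```cpp
#include <cassert>

// Hypothetical helper: milliseconds spent on one frame at a given frame rate.
double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Difference in per-frame cost between two measured frame rates, in ms.
double frame_time_gap_ms(double slower_fps, double faster_fps) {
    return frame_time_ms(slower_fps) - frame_time_ms(faster_fps);
}
```

With the default-window figures from the first post, frame_time_gap_ms(295.0, 450.0) is about 1.2 ms, and with the 1600x1200 figures, frame_time_gap_ms(81.0, 88.0) is about 1.0 ms. A gap of around a millisecond per frame in both cases would be consistent with a roughly fixed per-frame cost, although these numbers alone cannot prove that.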
In my world of massive Sci-viz data, we're lucky to get 15 FPS; we really
are dominated by vertex processing rate, lighting, clipping, etc., not fill
rate or buffer overhead.
And, while generally less germane to the mostly game/enthusiast-oriented
crowd here, I tend to use Linux a lot more than Window$, so we can handle
our 64-bit data needs; also, we need the capabilities of pro cards, FIREGL
and QUADROs (esp. two-sided lighting in hardware, stereo...).
Nevertheless, I appreciate your effort here. I'll have to provide the
standard caution that benchmarks only benchmark what you are benchmarking,
and may have little to do with how your application will actually run. I'd
really like to see the test run again with a considerable amount of
geometry, something more representative of what most folks are doing with
OpenGL.
-jbw
Posted by JB, 2/13/2005 4:50:42 PM

> Nevertheless, I appreciate your effort here, I'll have to provide the
> standard caution that
> benchmarks only benchmark what you are benchmarking, and may have little
> to do with how your application will actually run. I'd really like to see
> the test run again
> with a considerable amount of geometry, something more representative of
> what
> most folks are doing with OpenGL.
>
You can supply a much larger .bmp file and that will test higher amounts of
geometry...
Allan
Posted by Allan, 2/13/2005 5:08:34 PM

"Allan Bruce" <abruce@TAKEMEAWAY.csd.abdn.ac.uk> wrote in message
news:110ulgdltj9ue2f@corp.supernews.com...
> [snip]
Got one of my mates to test out the prog too, his machine is as follows:
P4 2.53GHz
512 MB DDR333
GeForce Ti4600 (66.93)
and his results are:
(default window)
VBO: 500
DL: 190
(1280x1024)
VBO: 230
DL: 190
This is very interesting. The Display List performance wasn't affected by
the size of the window, but VBOs took a big performance hit.
Let's see if we can get a few more results, especially on lower-end cards,
if someone has access?
Allan
Posted by Allan, 2/13/2005 5:18:40 PM

Allan Bruce wrote:
> Got one of my mates to test out the prog too, his machine is as follows:
> P4 2.53GHz
> 512 MB DDR333
> GeForce Ti4600 (66.93)
>
> and his results are:
>
> (default window)
> VBO: 500
> DL: 190
>
> (1280x1024)
> VBO: 230
> DL: 190
>
> This is very interesting. The Display List performance wasnt affected by
> the size of the window, but VBOs took a big performance hit.
Not that interesting if you consider that the size of the window affects
only the fragment processing stage, not the vertex processing. So it
does not matter at all where the vertex data comes from (vbo or display
list). In this case, the fragment stage simply limits the throughput to
230 fps in the full screen case. This is a bottleneck when using VBO,
but since the display list limit is even lower, it doesn't affect that
test case.
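The argument above can be put as a toy model, sketched here in C++ under the assumption that the frame rate is simply capped by the slowest pipeline stage. The function name is illustrative, and the 230 fps fragment limit is a reading of the Ti4600 numbers, not something measured separately.

```cpp
#include <algorithm>

// Toy pipeline model: the frame rate is capped by the slowest stage.
// vertex_fps:   rate the vertex path (VBO or display list) could sustain alone.
// fragment_fps: rate the fragment stage could sustain at this window size.
double pipeline_fps(double vertex_fps, double fragment_fps) {
    return std::min(vertex_fps, fragment_fps);
}
```

pipeline_fps(500.0, 230.0) gives 230 and pipeline_fps(190.0, 230.0) gives 190, which matches the reported 1280x1024 figures for VBO and DL respectively.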
Malte
Posted by Malte, 2/13/2005 6:55:14 PM

Allan Bruce wrote:
> Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs and
> added Display Lists drawing for comparison.
>
Ah, if only life were that simple...!
What happens if you use texture... or colors?
What happens if you have indexed arrays?
In that "benchmark" the DL was faster than the VBO
but I've measured VBO as 50% faster than DL on
a radiosity model with a million triangles in it
on the exact same machine.
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
Posted by fungus, 2/13/2005 6:56:44 PM

"fungus" <openglMY@SOCKSartlum.com> wrote in message
news:PpNPd.16603$dr.12558@news.ono.com...
> Allan Bruce wrote:
>> Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs
>> and added Display Lists drawing for comparison.
>
> Ah, if only life were that simple...!
>
> What happens if you use texture... or colors?
>
> What happens if you have indexed arrays?
>
> In that "benchmark" the DL was faster than the VBO
> but I've measured VBO as 50% faster than DL on
> a radiosity model with a million triangles in it
> on the exact same machine.
>
>
The code does use a texture, but not colouring I admit. Do you fancy
posting your results?
Allan
Posted by Allan, 2/13/2005 7:31:38 PM

Allan Bruce wrote:
> "fungus" <openglMY@SOCKSartlum.com> wrote in message
> news:PpNPd.16603$dr.12558@news.ono.com...
>> Allan Bruce wrote:
>>> Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs
>>> and added Display Lists drawing for comparison.
>>
>> Ah, if only life were that simple...!
>>
>> What happens if you use texture... or colors?
>>
>> What happens if you have indexed arrays?
My main objection to this benchmark is the lack of efficient geometry
(triangle strips). However, this would just widen any performance gap by
making per-vertex caching more coherent.
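For anyone who wants to try strips on the height-map mesh, here is a minimal C++ sketch of the index generation. It assumes the vertices are stored row by row in one array; the function names and that layout are assumptions for illustration, not taken from the NeHe code.

```cpp
#include <cstddef>
#include <vector>

// Indices for one triangle strip covering the band between grid rows
// `row` and `row + 1` of a height map that is `width` vertices wide.
// Assumes vertices are stored row by row (index = y * width + x).
std::vector<unsigned> strip_indices(std::size_t width, std::size_t row) {
    std::vector<unsigned> indices;
    indices.reserve(2 * width);
    for (std::size_t x = 0; x < width; ++x) {
        indices.push_back(static_cast<unsigned>(row * width + x));
        indices.push_back(static_cast<unsigned>((row + 1) * width + x));
    }
    return indices;
}

// Vertices submitted for the same band drawn as independent triangles:
// two triangles per grid cell, three vertices each.
std::size_t independent_vertex_count(std::size_t width) {
    return width > 0 ? 6 * (width - 1) : 0;
}
```

For a band 257 vertices wide this gives 514 strip indices for 512 triangles, against 1536 vertices when the same band is sent as independent triangles.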
> The code does use a texture, but not colouring I admit. Do you fancy
> posting your results?
> Allan
Regardless, we have objectively shown that display lists are much simpler
and, at least sometimes, as fast or faster than VBOs. This clearly makes
them useful and not just "... a very ancient technique used to send one
integer over the network instead of a lot of GL calls." as Gernot Frisch
said. Sadly, the lack of a display list equivalent in DirectX appears to be
due to similar misconceptions at MS... :-(
Also, all the memory handling involved in VBOs makes it much easier to screw
them up and crash your program. Display lists are, in contrast, simpler and
more elegant.
Having said that, if you need top-notch performance on your nVidia 6800 then
you'll have to put the effort in and use VBOs.
--
Dr Jon D Harrop, Flying Frog Consultancy
Posted by Jon, 2/13/2005 7:44:58 PM

"JB West" <jbwest@NOSPAM_acm.org> wrote in message
news:tIqdnX2JxMLMGJLfRVn-ig@comcast.com...
> [snip]
>
> Nevertheless, I appreciate your effort here, I'll have to provide the
> standard caution that benchmarks only benchmark what you are
> benchmarking, and may have little to do with how your application will
> actually run. I'd really like to see the test run again with a
> considerable amount of geometry, something more representative of what
> most folks are doing with OpenGL.
>
> -jbw
>
I just tested an image 1600x1200 for the height map and the results are as
follows:
(default window)
VBO: ~44fps
DL: ~85fps
(1600x1200)
VBO: 15fps
DL: 24fps
Allan
Posted by Allan, 2/13/2005 8:11:49 PM

Allan Bruce wrote:
<snippit>
> My conclusion is that Display Lists are faster for my system but I
> would like to see what other peoples experiences are.
This benchmark is _very_ crude. I agree with "fungus" that you really
need to write a more thorough test to say _anything_ about the use of
VBO vs. DL.
I find your lack of knowledge concerning what you're actually testing
somewhat disturbing. You can't simply draw a single, randomly chosen,
scene on various configurations and expect the results to give any kind
of meaningful results as to performance in general. And you certainly
can't expect your test of vertex-throughput to make sense, if you are
cpu or fillrate-limited (as is the case when you increase the window-area).
Display Lists and VBO also have very different characteristics. DL's are
optimized on creation, making them quite slow on initialization. VBO on
the other hand, are optimized per design, to enable fast transfer of
geometry to the graphics card. For this benchmark to make sense, this
should be a parameter also. Display-lists yet again, have the ability to
change opengl-state, and apply matrices - which is immensely useful if
you need to do this often.
(actually, I have no idea whether this is hardware accelerated or not...
which would also make for a nice test).
Finally, doing a general geometry-benchmark, testing triangle-strips vs.
individual triangles, different batch-sizes for DL and VBO, and of
course vertex cached versus not, would be quite useful :)
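The "vertex cached versus not" comparison can be estimated offline before touching the GPU. Here is a rough C++ sketch, assuming a simple FIFO post-transform cache. Real hardware differs in cache size and replacement policy, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Counts how many vertices would have to be transformed when an index
// stream is fed through a FIFO post-transform cache of `cache_size`
// entries. A hit costs nothing; a miss transforms the vertex and evicts
// the oldest entry once the cache is over capacity.
std::size_t count_cache_misses(const std::vector<unsigned>& indices,
                               std::size_t cache_size) {
    std::deque<unsigned> cache;
    std::size_t misses = 0;
    for (unsigned index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;                  // already transformed, cache hit
        ++misses;
        cache.push_back(index);
        if (cache.size() > cache_size)
            cache.pop_front();         // FIFO eviction
    }
    return misses;
}
```

For the two triangles {0,1,2, 2,1,3}, a 16-entry cache yields 4 misses and a zero-entry cache yields 6, so the same geometry costs 4 or 6 vertex transforms depending on caching.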
Regards,
\\Mikkel Gjoel
PS. a couple of nvidia-presentations on the subject:
http://download.nvidia.com/developer/presentations/2004/Eurographics/EG_04_IntroductionToGPU.pdf
http://download.nvidia.com/developer/presentations/2004/Eurographics/EG_04_OptimizingGPUPipeline.pdf
http://developer.nvidia.com/docs/IO/8230/BatchBatchBatch.pdf
Posted by Mikkel Gjøl, 2/13/2005 9:19:35 PM

Mikkel Gjøl wrote:
> This benchmark is _very_ crude. I agree with "fungus" that you really
> need to write a more thorough test to say _anything_ about the use of
> VBO vs. DL.
Well, I'd put it a bit more positively: After this "crude test" (including
all results on ATI and nVidia cards), we actually know that we can't say
anything, since different systems preferred different techniques. If both
ATI and nVidia showed an advantage of DL over VBO (or vice versa), it
could have led some of us to a false assumption, but this way, we can
say for sure that there's no simple rule at all :)
Btw, is it possible to use VBO in display lists?
Malte
Posted by Malte, 2/13/2005 10:18:24 PM

Mikkel Gjøl wrote:
> Allan Bruce wrote:
> <snippit>
>> My conclusion is that Display Lists are faster for my system but I
>> would like to see what other peoples experiences are.
>
> This benchmark is _very_ crude.
Yes.
> I agree with "fungus" that you really
> need to write a more thorough test to say _anything_ about the use of
> VBO vs. DL.
No, we can say _something_ about them already. This wasn't supposed to be
the be-all and end-all of VBO vs DL benchmarking.
> I find your lack of knowledge concerning what you're actually testing
> somewhat disturbing.
I think that is both rude and wrong. Allan was simply having a first stab at
answering our question, which he did. In contrast, you are whining and
haven't written a shred of code!
> You can't simply draw a single, randomly chosen,
> scene on various configurations and expect the results to give any kind
> of meaningful results as to performance in general.
This isn't about "in general", this discussion started when someone posted
(again) saying that DLs are archaic and useless. We have shown that this is
definitely not the case, DLs can still be useful. I can't see this
situation changing - DLs are a good idea.
> And you certainly
> can't expect your test of vertex-throughput to make sense, if you are
> cpu or fillrate-limited (as is the case when you increase the
> window-area).
Which is, of course, precisely why he gave both results.
> Display Lists and VBO also have very different characteristics. DL's are
> optimized on creation, making them quite slow on initialization. VBO on
> the other hand, are optimized per design, to enable fast transfer of
> geometry to the graphics card. For this benchmark to make sense, this
> should be a parameter also.
No, for this benchmark to be more thorough it should test more parameters.
It still makes sense without that. This was never about being thorough,
this was about knocking up a simple test to see just how much slower DLs
are at anything at all. Some quantitative information is better than an
infinite number of guesses.
> Display-lists yet again, have the ability to
> change opengl-state, and apply matrices - which is immensely useful if
> you need to do this often.
> (actually, I have no idea whether this is hardware accelerated or not...
> which would also make for a nice test).
I suspect that is infeasible in the general case. Texture maps are probably
handled efficiently by the DL compiler though, and this could be tested...
> Finally, doing a general geometry-benchmark, testing triangle-strips vs.
> individual triangles, different batch-sizes for DL and VBO, and of
> course vertex cached versus not, would be quite useful :)
Then stop complaining and get coding. :-)
> Regards,
> \\Mikkel Gjoel
> PS. a couple of nvidia-presentations on the subject:
> ...
I'm just flicking through these here and they appear to be non-GL specific.
Of course, this means that they are (primarily?) aimed at DirectX which
lacks DLs.
--
Dr Jon D Harrop, Flying Frog Consultancy
Posted by Jon, 2/14/2005 1:20:45 AM

Thanks for putting up an argument in my case, Jon. I thought he was being a
bit rude but didn't want to pipe up!
Anyway just to throw a spanner into the works here, I am at work and have a
GeForce5700LE 256MB here. With this card, Display Lists are faster for
large viewports. Results:
(default window)
VBO: 190fps
DL: 122fps
(1600x1200)
VBO: 65fps
DL: 75fps
I appreciate the fact that this is a crude test. I have an lwo2 viewer that
I have been working on. If I were to add an option to use VBOs or Display
Lists (via a dialog at startup) and supply models with a large number of
vertices, would anyone be interested in trying this out? It uses colouring,
texturing and lighting, so it would be a better 'real' test.
Allan
Posted by Allan, 2/14/2005 10:34:06 AM

Hey
Just to clarify this: Allan, I think creating this benchmark was a good
thing, and it would be interesting to have it refined to remove the
current limitations.
I'm not really sure I understand why DL/VBO is such a hot topic though.
For raw geometry rendering, I can't imagine DLs being faster than VBOs.
But even so, DLs allow you to do fast testing and easy, checked,
optimizations - also some not available using VBO.
On a side note, I would find it immensely logical to be able to make
VBO-calls from a Display List to utilize "geometry instancing"
(</buzzword>). I don't think there should be any server/client-issues,
but I'm aware that there are probably other design-based limitations
that will make this kind of usage impossible. It would still be cool
though ;)
- on with the show:
(this is my final I-said, you-said post)
Jon Harrop wrote:
> Mikkel Gjøl wrote:
>> ...expect the results to give any kind of meaningful results as to
>> performance in general.
>
> This isn't about "in general", this discussion started when someone
> posted (again) saying that DLs are archaic and useless. We have shown
> that this is definitely not the case, DLs can still be useful. I
> can't see this situation changing - DLs are a good idea.
I never argued they weren't. I argued that this test doesn't add
meaningful insight, because the results give a seemingly simple but
potentially confusing answer to a complicated question. In short: It
will only add to the confusion because the results need a great deal of
insight to be interpreted correctly (if possible).
>> And you certainly can't expect your test of vertex-throughput to
>> make sense, if you are cpu or fillrate-limited (as is the case when
>> you increase the window-area).
>
> Which is, of course, precisely why he gave both results.
No, it's probably because Allan doesn't fully understand what the limiting
factors in the benchmark are (see the reply about the vbo/dl "performance
hit" at different resolutions). I don't blame him, and I don't make any
claims about my own insight on the subject - it's a very complex area.
>> For this benchmark to make sense, this should be a parameter also.
>
> No, for this benchmark to be more thorough it should test more
> parameters.
hehe, so what did I just say? :)
> This was never about being thorough, this was about knocking up a
> simple test to see just how much slower DLs are at anything at all.
> Some quantitative information is better than an infinite number of
> guesses.
Well, at least it has started a discussion about the results, possibly
leading to more insight. I still don't think knocking up a random test
makes sense though.
>> PS. a couple of nvidia-presentations on the subject: ...
>
> I'm just flicking through these here and they appear to be non-GL
> specific.
That's true. My point was mainly to clarify some of the other potential
bottlenecks when doing this kind of rendering. DL/VBO shouldn't really
make a difference in this regard, as they are located the same place in
the pipeline.
I just had a quick look around, and I can't seem to find anything about
hardware-support / performance for Display Lists. I did come across a
very interesting discussion on gl-geometry performance:
http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=009111
Kind regards,
\\Mikkel Gjoel
Posted by Mikkel Gjøl, 2/14/2005 10:57:53 AM

Mikkel Gjøl wrote:
> - on with the show:
> (this is my final I-said, you-said post)
There seems to be little point in my replying so I'll be brief:
> I never argued they weren't. I argued that this test doesn't add
> meaningful insight, because the results give a seemingly simple but
> potentially confusing answer to a complicated question. In short: It
> will only add to the confusion because the results need a great deal of
> insight to be interpreted correctly (if possible).
We disagree and have drawn a conclusion.
>> Which is, of course, precisely why he gave both results.
>
> No, it's probably because Allan doesn't fully understand what is the
> limiting factors in the benchmark (see reply about vbo/dl "performance
> hit" at different resolutions). I don't blame him, and I don't make any
> claims about my own insight on the subject - it's a very complex area.
I think this statement is based upon a misunderstanding of what we were
trying to achieve.
>>> For this benchmark to make sense, this should be a parameter also.
>>
>> No, for this benchmark to be more thorough it should test more
>> parameters.
>
> hehe, so what did I just say? :)
"...make sense..." vs "...more thorough...".
>> This was never about being thorough, this was about knocking up a
>> simple test to see just how much slower DLs are at anything at all.
>> Some quantitative information is better than an infinite number of
>> guesses.
>
> Well, at least it has started a discussion about the results, possibly
> leading to more insight. I still don't think knocking up a random test
> makes sense though.
If you think it started with that then you missed the first half of the
conversation, which explains why you have misunderstood what we are trying
to achieve...
>>> PS. a couple of nvidia-presentations on the subject: ...
>>
>> I'm just flicking through these here and they appear to be non-GL
>> specific.
>
> That's true.
It's more than true. I spent an hour reading all three lectures in detail
last night and they are all completely irrelevant. The first is too
"beginner" to be relevant, the last two are entirely DirectX/game
programming.
> My point was mainly to clarify some of the other potential
> bottlenecks when doing this kind of rendering. DL/VBO shouldn't really
> make a difference in this regard, as they are located the same place in
> the pipeline.
I believe that is quite wrong. When geometry bound, DLs could exhibit
various levels of optimisation. In the worst case, they could just exhibit
the performance of vertex arrays, called by the driver. In the best case,
they could be split between VBOs and vertex array ranges, resulting in
better performance (tuned to a given graphics card) than an application
which tried to use VBOs alone would get. Also, DLs are likely to give the
best performance for pre-VBO hardware.
On top of this, as DLs are easier to code than VBOs, people will still
want to use them when doing small projects, provided the performance isn't
awful (say, within 2x).
Finally, Allan's program is not that different from some of the kinds of
things I'd expect to write. Sure, it doesn't use fragment shaders, but
neither do I!
> I just had a quick look around, and I can't seem to find anything about
> hardware-support / performance for Display Lists. I did come across a
> very interresting discussion on gl-geometry performance:
This is also completely irrelevant twaddle about game programming and
DirectX.
I think you should read the parent thread to get some idea of what we are
talking about here. This has nothing to do with game programming.
--
Dr Jon D Harrop, Flying Frog Consultancy
Posted by Jon, 2/14/2005 12:41:15 PM

Jon Harrop wrote:
> Mikkel Gjøl wrote:
>> (this is my final I-said, you-said post)
I will however reply to relevant, on-topic subjects.
(and I have now read the thread about "wirecube rendering"-gone mad)
>> My point was mainly to clarify some of the other potential
>> bottlenecks when doing this kind of rendering. DL/VBO shouldn't
>> really make a difference in this regard, as they are located the
>> same place in the pipeline.
>
> I believe that is quite wrong. When geometry bound, DLs could exhibit
> various levels of optimisation.
My above statement was probably unclear. This is what I was trying to
communicate: The papers are relevant no matter if they are concerned
with VBO or DL, as they simply give guidelines on how to test what your
program's bottlenecks are. The same goes for DirectX/OpenGL - it doesn't
really matter in this context, as the subject is graphics-card
performance, not API-performance.
Concerning the benchmark, I guess it would be relevant to test in both
CPU- and geometry-bound cases, as the workload might very well vary
switching between VBO/DL.
> In the best case, they could be split between VBOs and vertex array
> ranges, resulting in better performance (tuned to a given graphics
> card) than an application which tried to use VBOs alone would get.
Just a side note: Isn't the VAR extension dead as a doornail? I believe
any graphics card supporting VAR should be able to (and does, for
at least ati/nv) support VBO with about the same speedups.
> Finally, Allan's program is not that different from some of the kinds
> of things I'd expect to write.
This isn't really relevant to the accuracy of the results. I appreciate
the attempt to show that Display Lists are useful, but for you to be
able to use this knowledge, you need to know when these results apply.
Hence the need for more thorough testing. A general guideline of "if
your program does about the same as Allan's, then use Displaylists" was
probably not what you were trying to establish either.
Again, I don't disagree that Display Lists are helpful(*), only that the
benchmark isn't actually helpful until it does more accurate testing.
(*) (except as an example of "displaylists not being slower, and thus
useful")
Kind Regards,
\\Mikkel Gjoel
Posted by Mikkel Gjøl, 2/14/2005 1:58:42 PM

Allan Bruce wrote:
> Anyway just to throw a spanner into the works here, I am at work and have a
> GeForce5700LE 256MB here. With this card, Display Lists are faster for
> large viewports. Results:
>
> (default window)
> VBO: 190fps
> DL: 122fps
>
> (1600x1200)
> VBO: 65fps
> DL: 75fps
>
In the case when you're fill limited they should be
both the same speed...
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
Posted by fungus, 2/14/2005 2:00:03 PM

Mikkel Gjøl wrote:
> Jon Harrop wrote:
>
>> Mikkel Gjøl wrote:
>>
>>> (this is my final I-said, you-said post)
>
>
> I will however reply to relevant, on-topic subjects.
> (and I have now read the thread about "wirecube rendering"-gone mad)
>
[... zzz ...]
Please post some real good benchmark code. So finally we'll know
which one is faster under which circumstance - VBO or DL.
[more blabla "Like I said, I was trying to say, I meant..." removed]
*sigh*
Cheers,
Toni
--
for mail, mirror: ed.lausivksa@elielb
Posted by Antonio, 2/14/2005 2:23:34 PM

"Allan Bruce" <abruce@TAKEMEAWAY.csd.abdn.ac.uk> wrote in message news:<110ulgdltj9ue2f@corp.supernews.com>...
> Can anybody who is interested, and those who believe that Dispay Lists are
> slower than VBOs (or vice versa) please try this out and we can compare the
> results? See if we can finally determine which is faster in general or on
> specific hardware.
>
P4 2.2 512MB
Windows 2000
Quadro FX 500
VBO: ~ 220 FPS
DL: ~ 330 FPS
I use PBO as well, and when it comes to updating the buffer very often
(STREAM_DRAW) PBO owns. Perhaps a test using all possible combinations
would give us better results.
NVIDIA's SDK 8.5 has an example, but it only runs on GeForce FX or
above; if you have the time you could change it so that it would run
on anything above a GeForce 3.
Regards,
wpr
Posted by mlopes_filho, 2/14/2005 3:46:14 PM

[snip]
> >
>
> I just tested an image 1600x1200 for the height map and the results are as
> follows:
>
> (default window)
> VBO: ~44fps
> DL: ~85fps
>
> (1600x1200)
> VBO: 15fps
> DL: 24fps
>
> Allan
>
>
Very Interesting! Quite a reversal.
jbw
Posted by JB, 2/14/2005 4:07:20 PM

Hi all,
I was curious and made some tests as well:
P4 2.2 512MB
Windows 2000
Quadro FX 500
Drivers 56.72
VBO: 220 FPS
DL: 330 FPS
I was a bit disappointed, so I used my own program to load two .PLY meshes:
BIG: 327323 vertices 654666 Faces
SMALL: 52227 vertices 102280 Faces
VBO SMALL : 10
DL SMALL: 18
VBO BIG: 10
DL BIG: 5
Pretty slow... I decided to check in a better machine:
Dual Xeon 2.66 512MB
Windows XP SP1
Quadro FX 1100
Drivers 61.76
VBO: 380 FPS
DL: 600 FPS !!!
VBO SMALL: 40 FPS
DL SMALL: 68 FPS
VBO BIG: 38 FPS
DL BIG: 10 FPS
For the BIG mesh VBO does make a difference.
The question is: when does it start to make a difference?
wpr.
Posted by mlopes_filho, 2/14/2005 4:39:43 PM

Malte Clasen wrote:
> Allan Bruce wrote:
>> Machine Spec:
> Pentium 4, 2.4 GHz, FSB800
> 1GB DDR400 RAM
> nVidia Geforce 6800 128MB, 70.41 drivers
>
> VBO: 330 fps
> DL: 170 fps
>
>> My conclusion is that Display Lists are faster for my system but I
>> would like to see what other peoples experiences are.
Actually, it looks like a driver bug. If you change glNewList( list,
GL_COMPILE_AND_EXECUTE ); to glNewList( list, GL_COMPILE ); - the DL speed
hits the roof. It seems the driver compiles the list for each call to
glCallList... i.e. for each frame.
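For reference, the compile-once pattern looks roughly like the sketch below. To keep it runnable without a GL context, the GL entry points are replaced by logging stand-ins in a made-up fake_gl namespace; in a real program they would be glGenLists, glNewList, glEndList and glCallList. The names build_terrain_list and draw_frame are hypothetical as well.

```cpp
#include <string>
#include <vector>

// Logging stand-ins for the OpenGL display-list calls, so the sketch runs
// without a GL context. These are NOT the real API, only the call pattern.
namespace fake_gl {
enum ListMode : unsigned {
    COMPILE = 0x1300,             // same value as GL_COMPILE
    COMPILE_AND_EXECUTE = 0x1301  // same value as GL_COMPILE_AND_EXECUTE
};
std::vector<std::string> calls;   // records the order of calls
unsigned next_id = 1;

unsigned GenLists(unsigned range) {               // stands in for glGenLists
    unsigned first = next_id;
    next_id += range;
    calls.push_back("GenLists");
    return first;
}
void NewList(unsigned /*list*/, ListMode mode) {  // stands in for glNewList
    calls.push_back(mode == COMPILE ? "NewList(COMPILE)"
                                    : "NewList(COMPILE_AND_EXECUTE)");
}
void EndList() { calls.push_back("EndList"); }    // stands in for glEndList
void CallList(unsigned /*list*/) {                // stands in for glCallList
    calls.push_back("CallList");
}
}  // namespace fake_gl

// Build the list once at start-up; COMPILE records without executing.
unsigned build_terrain_list() {
    unsigned id = fake_gl::GenLists(1);
    fake_gl::NewList(id, fake_gl::COMPILE);
    // ... the glBegin / glVertex3f / glEnd calls for the mesh go here ...
    fake_gl::EndList();
    return id;
}

// Per frame: replay the list that was compiled earlier.
void draw_frame(unsigned id) {
    fake_gl::CallList(id);
}
```

Calling build_terrain_list() once and then draw_frame(id) every frame records GenLists, NewList(COMPILE), EndList, followed by one CallList per frame. Whether this avoids the slowdown depends on the driver, as described above.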
Regards,
\\Mikkel Gjoel
Posted by Mikkel Gjøl, 2/14/2005 6:54:30 PM

fungus wrote:
>>
>
> In the case when you're fill limited they should be
> both the same speed...
Maybe they should, but tests have shown that DLs are faster on my box too;
the difference was a few fps.
GeForce 4 MX 420, Athlon 1.7, 256 MB RAM
Posted by Sulsa, 2/14/2005 11:03:57 PM

Mikkel Gjøl wrote:
> Malte Clasen wrote:
>> Allan Bruce wrote:
>>> Machine Spec:
>> Pentium 4, 2.4 GHz, FSB800
>> 1GB DDR400 RAM
>> nVidia Geforce 6800 128MB, 70.41 drivers
>>
>> VBO: 330 fps
>> DL: 170 fps
>>
>>> My conclusion is that Display Lists are faster for my system but I
>>> would like to see what other peoples experiences are.
>
> Actually, it looks like a driver bug. If you change glNewList( list,
> GL_COMPILE_AND_EXECUTE ); to glNewList( list, GL_COMPILE ); the DL speed
> hits the roof. It seems the driver compiles the list for each call to
> glCallList, i.e. for each frame.
IIRC, I reported that bug to nVidia quite some time ago. Perhaps if you guys
all report it too...
For my own version of the NeHe lesson 45 demo (255 512-triangle strips but
index array for the VBOs in system memory):
1.2GHz Athlon T-bird
2.4.27-2-k7 Debian (Sarge) Linux
64MB GeForce 3
768MB RAM
nVidia 6629 drivers
gcc -pipe -Wall -O2 -Wall -march=athlon-tbird -mmmx -m3dnow -ffast-math
-fno-math-errno -funsafe-math-optimizations -fno-trapping-math
-malign-double -funroll-loops -pipe -fomit-frame-pointer main.cpp error.cpp
lesson45.cpp -o lesson45 -L/usr/X11R6/lib/ -lGL -lGLU `sdl-config --cflags
--libs`
8fps DL: COMPILE_AND_EXECUTE
36fps VBOs
58fps DL: COMPILE
My guess is that the DLs are a win here because they put the index array
onto the graphics card. I believe I need to use GL_ELEMENT_ARRAY_BUFFER but
my nVidia driver doesn't appear to provide this.
Here's my code:
#define NO_VBOS
#define EXECUTE
#include <iostream>
#include <cstdio>
#include <string.h>
#include <GL/gl.h>
#include <GL/glu.h>
#include "main.h"
#ifndef CDS_FULLSCREEN
#define CDS_FULLSCREEN 4
#endif
#ifndef GL_BGR
#define GL_BGR 0x80E0
#endif
#define GL_ARRAY_BUFFER_ARB 0x8892
#define GL_STATIC_DRAW_ARB 0x88E4
typedef void (APIENTRY * PFNGLBINDBUFFERARBPROC)
(GLenum target, GLuint buffer);
typedef void (APIENTRY * PFNGLDELETEBUFFERSARBPROC)
(GLsizei n, const GLuint *buffers);
typedef void (APIENTRY * PFNGLGENBUFFERSARBPROC)
(GLsizei n, GLuint *buffers);
typedef void (APIENTRY * PFNGLBUFFERDATAARBPROC)
(GLenum target, int size, const GLvoid *data, GLenum usage);
PFNGLGENBUFFERSARBPROC glGenBuffersARB = NULL;
PFNGLBINDBUFFERARBPROC glBindBufferARB = NULL;
PFNGLBUFFERDATAARBPROC glBufferDataARB = NULL;
PFNGLDELETEBUFFERSARBPROC glDeleteBuffersARB = NULL;
extern S_AppStatus AppStatus;
int width, height;
class CMesh {
public:
// Mesh Data
GLuint m_nVertexCount;
GLfloat *m_pVertices;
GLfloat *m_pTexCoords;
GLuint *m_pElements;
unsigned int m_nTextureId;
// Vertex Buffer Object Names
unsigned int m_nVBOVertices;
unsigned int m_nVBOTexCoords;
// Temporary Data
SDL_Surface* m_pTextureImage;
public:
CMesh();
~CMesh();
bool LoadHeightmap(char* szPath);
float PtHeight(int nX, int nY);
void BuildVBOs();
};
bool g_fVBOSupported = false;
CMesh* g_pMesh = NULL;
float g_flYRot = 0.0f;
int g_nFPS = 0, g_nFrames = 0;
int g_dwLastFPS = 0;
bool IsExtensionSupported(char* szTargetExtension) {
const unsigned char *pszExtensions = NULL;
const unsigned char *pszStart;
unsigned char *pszWhere, *pszTerminator;
pszWhere = (unsigned char *) strchr( szTargetExtension, ' ' );
if( pszWhere || *szTargetExtension == '\0' )
return false;
pszExtensions = glGetString( GL_EXTENSIONS );
pszStart = pszExtensions;
for(;;)
{
pszWhere = (unsigned char *) strstr((const char *) pszStart,
szTargetExtension);
if(!pszWhere)
break;
pszTerminator = pszWhere + strlen( szTargetExtension );
if(pszWhere == pszStart || *( pszWhere - 1 ) == ' ')
if(*pszTerminator == ' ' || *pszTerminator == '\0')
return true;
pszStart = pszTerminator;
}
return false;
}
bool Initialize (void) {
AppStatus.Visible = true;
AppStatus.MouseFocus = true;
AppStatus.KeyboardFocus = true;
g_pMesh = new CMesh();
if(!g_pMesh->LoadHeightmap("terrain.bmp")) {
Log( "Error Loading Heightmap");
return false;
}
// Check For VBOs Supported
Log("Checking for extensions.....");
#ifndef NO_VBOS
g_fVBOSupported = IsExtensionSupported( "GL_ARB_vertex_buffer_object" );
if( g_fVBOSupported )
{
glGenBuffersARB = (PFNGLGENBUFFERSARBPROC)
SDL_GL_GetProcAddress("glGenBuffersARB");
glBindBufferARB = (PFNGLBINDBUFFERARBPROC)
SDL_GL_GetProcAddress("glBindBufferARB");
glBufferDataARB = (PFNGLBUFFERDATAARBPROC)
SDL_GL_GetProcAddress("glBufferDataARB");
glDeleteBuffersARB = (PFNGLDELETEBUFFERSARBPROC)
SDL_GL_GetProcAddress("glDeleteBuffersARB");
// Load Vertex Data Into The Graphics Card Memory
g_pMesh->BuildVBOs();
}
#else /* NO_VBOS */
g_fVBOSupported = false;
#endif
return true;
}
bool InitGL(SDL_Surface *S) {
glEnable( GL_TEXTURE_2D );
glClearColor (0.0f, 0.0f, 0.0f, 0.5f);
glClearDepth (1.0f);
glDepthFunc (GL_LEQUAL);
glEnable (GL_DEPTH_TEST);
glShadeModel (GL_SMOOTH);
glHint (GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST);
SDL_GL_SetAttribute( SDL_GL_DOUBLEBUFFER, 1 );
return true;
}
void Deinitialize (void) {
if(g_pMesh) delete g_pMesh;
g_pMesh = NULL;
}
void Update (Uint32 milliseconds, Uint8 *Keys) {
g_flYRot += (float) ( milliseconds ) / 1000.0f * 25.0f;
if(Keys)
{
if (Keys [SDLK_ESCAPE] == true) TerminateApplication ();
if (Keys [SDLK_F1] == true) ToggleFullscreen ();
}
}
GLuint displaylist=0;
void coord(int x, int y) {
glTexCoord2fv(g_pMesh->m_pTexCoords + (y*width + x)*2);
glVertex3fv(g_pMesh->m_pVertices + (y*width + x)*3);
}
void Draw (void) {
glClear (GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glLoadIdentity ();
if( SDL_GetTicks() - g_dwLastFPS >= 10000 )
{
g_dwLastFPS = SDL_GetTicks();
g_nFPS = g_nFrames / 10.;
g_nFrames = 0;
char szTitle[256]={0};
sprintf( szTitle, "VBO Tut - %d Triangles, %d FPS",
(width-1)*(height-1)*2, g_nFPS );
if( g_fVBOSupported )
strcat( szTitle, ", Using VBOs" );
else
strcat( szTitle, ", Using a DL" );
SDL_WM_SetCaption(szTitle,NULL);
}
g_nFrames++;
// Move The Camera
glTranslatef( 0.0f, -220.0f, 0.0f );
glRotatef( 10.0f, 1.0f, 0.0f, 0.0f );
glRotatef( g_flYRot, 0.0f, 1.0f, 0.0f );
// Set Pointers To Our Data
if( g_fVBOSupported ) {
// Enable Pointers
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, g_pMesh->m_nVBOVertices );
glVertexPointer(3, GL_FLOAT, 0, (char *) NULL);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, g_pMesh->m_nVBOTexCoords );
glTexCoordPointer(2, GL_FLOAT, 0, (char *) NULL);
// Render
for (int i=0; i<height-1; i++)
glDrawElements(GL_TRIANGLE_STRIP, width*2, GL_UNSIGNED_INT,
g_pMesh->m_pElements + width*2*i);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
} else {
// Render
if (!displaylist) {
displaylist = glGenLists(1);
#ifdef EXECUTE
glNewList(displaylist, GL_COMPILE_AND_EXECUTE);
#else
glNewList(displaylist, GL_COMPILE);
#endif
for (int y=0; y<height-1; y++) {
glBegin(GL_TRIANGLE_STRIP);
for (int x=0; x<width; x++) {
coord(x, y);
coord(x, y+1);
}
glEnd();
}
glEndList();
#ifndef EXECUTE
glCallList(displaylist);
#endif
}
else
glCallList(displaylist);
}
glFlush();
}
CMesh :: CMesh() {
m_pTextureImage = NULL;
m_pVertices = NULL;
m_pTexCoords = NULL;
m_pElements = NULL;
m_nVertexCount = 0;
m_nVBOVertices = m_nVBOTexCoords = m_nTextureId = 0;
}
CMesh :: ~CMesh() {
// Delete both VBOs exactly once; they only exist when VBOs are in use
if(g_fVBOSupported) {
unsigned int nBuffers[2] = { m_nVBOVertices, m_nVBOTexCoords };
glDeleteBuffersARB(2, nBuffers);
m_nVBOVertices = m_nVBOTexCoords = 0;
}
// delete [] on a NULL pointer is a no-op, so no checks are needed
delete [] m_pVertices;
m_pVertices = NULL;
delete [] m_pTexCoords;
m_pTexCoords = NULL;
delete [] m_pElements;
m_pElements = NULL;
}
bool CMesh :: LoadHeightmap(char* szPath) {
SDL_Surface *surface;
Uint32 rmask, gmask, bmask, amask;
#if SDL_BYTEORDER == SDL_BIG_ENDIAN
rmask = 0xff000000;
gmask = 0x00ff0000;
bmask = 0x0000ff00;
amask = 0x00000000;
#else
rmask = 0x000000ff;
gmask = 0x0000ff00;
bmask = 0x00ff0000;
amask = 0x00000000;
#endif
// Load Texture Data
SDL_Surface *loaded = SDL_LoadBMP(szPath);
if(!loaded) return false; // SDL_LoadBMP returns NULL on failure
width = loaded->w;
height = loaded->h;
surface = SDL_CreateRGBSurface(SDL_SWSURFACE, width, height, 24,
rmask, gmask, bmask, amask);
// Convert to 24-bit RGB, then free the original so it is not leaked
m_pTextureImage = SDL_ConvertSurface(loaded, surface->format ,
SDL_SWSURFACE );
SDL_FreeSurface(loaded);
// Generate Vertex Field
m_nVertexCount = (int) (width * height);
m_pVertices = new GLfloat[m_nVertexCount*3];
m_pTexCoords = new GLfloat[m_nVertexCount*2];
for (int y=0; y<height; y++)
for (int x=0; x<width; x++) {
m_pVertices[(y*width + x)*3 + 0] = x - width/2;
m_pVertices[(y*width + x)*3 + 1] = PtHeight(x, y);
m_pVertices[(y*width + x)*3 + 2] = y - height/2;
m_pTexCoords[(y*width + x)*2 + 0] = GLfloat(x) / width;
m_pTexCoords[(y*width + x)*2 + 1] = GLfloat(y) / height;
}
m_pElements = new GLuint[width*(height - 1)*2];
{
GLuint *it = m_pElements;
for (int y=0; y<height-1; y++)
for (int x=0; x<width; x++) {
*(it++) = y*width + x;
*(it++) = (y + 1)*width + x;
}
}
// Load The Texture Into OpenGL
glGenTextures(1, &m_nTextureId);
glBindTexture(GL_TEXTURE_2D, m_nTextureId);
glTexImage2D(GL_TEXTURE_2D, 0, 3, width, height, 0,
GL_RGB, GL_UNSIGNED_BYTE, m_pTextureImage->pixels);
glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER,GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER,GL_LINEAR);
// Free The Texture Data
if(m_pTextureImage) SDL_FreeSurface(m_pTextureImage);
m_pTextureImage = NULL; // PtHeight() must not be called after this point
if(surface) SDL_FreeSurface(surface);
return true;
}
float CMesh :: PtHeight(int nX, int nY) {
SDL_Color color; // Used to store R,G,B components of pixel
Uint32 col=0; // Temporary pixel value storage
char* offset = (char *)(m_pTextureImage->pixels);
offset += m_pTextureImage->pitch * nY;
offset += m_pTextureImage->format->BytesPerPixel * nX;
memcpy(&col, offset, m_pTextureImage->format->BytesPerPixel);
SDL_GetRGB(col, m_pTextureImage->format, &color.r, &color.g, &color.b);
return 0.299f * color.r + 0.587f * color.g + 0.114f * color.b;
}
void CMesh :: BuildVBOs() {
glGenBuffersARB(1, &m_nVBOVertices);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, m_nVBOVertices);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, m_nVertexCount*3*sizeof(GLfloat),
m_pVertices, GL_STATIC_DRAW_ARB);
glGenBuffersARB(1, &m_nVBOTexCoords);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, m_nVBOTexCoords);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, m_nVertexCount*2*sizeof(GLfloat),
m_pTexCoords, GL_STATIC_DRAW_ARB);
delete [] m_pVertices; m_pVertices = NULL;
delete [] m_pTexCoords; m_pTexCoords = NULL;
}
--
Dr Jon D Harrop, Flying Frog Consultancy
|
|
0
|
|
|
|
Reply
|
Jon
|
2/14/2005 11:05:32 PM
|
|
Antonio Bleile wrote:
> Please post some real good benchmark code. So finally we'll know
> which one is faster under which circumstance - VBO or DL.
>
> [more blabla "Like I said, I was trying to say, I meant..." removed]
>
very wise answer
|
|
0
|
|
|
|
Reply
|
Sulsa
|
2/14/2005 11:11:28 PM
|
|
Sulsa wrote:
> fungus wrote:
>
>> In the case when you're fill limited they should be
>> both the same speed...
>
>
> maybe they should but tests shown that dl are faster also on my box,
> the difference was few fps.
>
> GeForce 4 mx 420 , athlon 1.7 256 ram
I've also noticed that the frame rate varies a *lot*
as the model rotates. I get between 550fps and 800fps
depending on the rotation...
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/14/2005 11:33:16 PM
|
|
I noticed that the "benchmark" used two separate
VBOs, one for vertex coordinates and one for
texture coordinates.
I just combined them into a single VBO and gained
about 100fps on my machine.
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/14/2005 11:47:16 PM
|
|
fungus wrote:
> I've also noticed that the frame rate varies a *lot*
> as the model rotates. I get between 550fps and 800fps
> depending on the rotation...
The variation for me is within 4%.
--
Dr Jon D Harrop, Flying Frog Consultancy
|
|
0
|
|
|
|
Reply
|
Jon
|
2/15/2005 1:15:38 AM
|
|
fungus wrote:
> I noticed that the "benchmark" used two separate
> VBOs, one for vertex coordinates and one for
> texture coordinates.
>
> I just combined them into a single VBO and gained
> about 100fps on my machine.
Which is sort of weird, as I could swear nvidia stated that interleaved
arrays did not result in performance gains (of course, you might not be
testing on an nvidia card). Using separate VBOs for each
vertex attribute is probably the regular case though.
Actually, you should disable depth-tests, depth-writes, texture,
lighting, and stop clearing the color and depth-buffers too.
Yeah I know... write a better one. I actually sort of did, but I get
strange performance on nv40 (and I use a crappy timer - probably
related). I'll finish it tomorrow, but if someone wants to take a look
at it:
http://www.userwebs.dk/gjoel/geom_bench.zip
http://www.userwebs.dk/gjoel/geom_bench_src.zip
- it writes output to a textfile. I added a couple of test results to a
textfile in the zip. What is really needed in this one is a test of
CPU usage. It assumes the card has a 10-entry vertex cache (matching
GeForce FX cards), which favors some cards over others but shouldn't
affect the VBO/DL comparison.
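To make the vertex-cache assumption concrete, here is a minimal sketch that counts how many vertices would miss a small cache for a given index stream. The FIFO replacement policy and the name countCacheMisses are assumptions made for illustration; this is not taken from geom_bench, and real hardware may behave differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Counts the indices that miss a FIFO vertex cache holding `cacheSize`
// entries. Hypothetical helper: the FIFO policy is an assumption, not a
// statement about any particular GPU.
std::size_t countCacheMisses(const std::vector<unsigned>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<unsigned> cache;   // front = oldest entry
    std::size_t misses = 0;
    for (unsigned index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;             // hit: a FIFO cache is not reordered
        ++misses;
        if (cache.size() == cacheSize)
            cache.pop_front();    // evict the oldest vertex
        cache.push_back(index);
    }
    return misses;
}
```

Dividing the miss count by the number of triangles gives a rough figure for how many vertices are transformed per triangle, which is one way to compare index orderings independently of the VBO/DL question.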
Regards,
\\Mikkel Gjoel
|
|
0
|
|
|
|
Reply
|
ISO
|
2/15/2005 1:17:44 AM
|
|
fungus wrote:
> I noticed that the "benchmark" used two separate
> VBOs, one for vertex coordinates and one for
> texture coordinates.
>
> I just combined them into a single VBO and gained
> about 100fps on my machine.
>
I just made the VBO use an index array
(ie. glDrawElements()) and gained another
100fps or so. Looks like indexed arrays are
much faster than non-indexed on my Radeon x800.
I also changed the primitive type to GL_POINTS
to avoid pixel fill overhead. Now there's not
really any variation as the terrain rotates.
I also increased the number of vertices quite
a bit because I was getting over 1200fps and
I don't think GetTickCount() is that accurate...
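A coarse timer matters less when the rate is averaged over many frames. The sketch below shows the arithmetic only; the helper names are hypothetical, and the 16 ms tick used in the note afterwards is an assumption (GetTickCount() typically advances in steps of roughly 10 to 16 ms).

```cpp
#include <cassert>
#include <chrono>

// Average frame rate over an interval. Averaging over many frames hides
// the coarse resolution of the timer. (Hypothetical helper.)
double averageFps(long frames, std::chrono::duration<double> elapsed) {
    assert(elapsed.count() > 0.0);
    return frames / elapsed.count();
}

// Worst-case relative error when `elapsed` is only known to within one
// timer tick of length `tick`.
double relativeTimerError(std::chrono::duration<double> tick,
                          std::chrono::duration<double> elapsed) {
    assert(elapsed.count() > 0.0);
    return tick.count() / elapsed.count();
}
```

With a 16 ms tick and a 10 second window the error is about 0.16%, whereas a single frame at 1200 fps lasts under 1 ms and cannot be timed individually with such a clock.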
After these simple tweaks it showed display
lists running about 8% faster than VBOs. If
you want the tweaked code it's here:
http://www.artlum.com/Lesson45.cpp
Still...I've got a "real" model with 800,000
triangles running here in my game engine. I
just got 175fps* when I render it with VBOs
and only about 75fps with display lists.
What's the difference between the little
benchmark and the "real" program? I don't
know, but I'm sticking with VBOs....
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
[*] Which is actually quite impressive! I only
got 95fps last time I loaded that model (a couple
of driver versions/program tweaks ago).
175fps is about 140 million real, antialiased
triangles/sec...
Eat your heart out all those people who paid
$$millions for a RealityMonster a few years
ago!!
.....and this is only a normally-clocked x800 "Pro"
- not an overclocked Platinum XT GTi Turbo-nutter
model or anything fancy.
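The triangles/sec figure follows directly from the numbers in the post: 800,000 triangles per frame times 175 frames per second. A tiny sketch of that arithmetic, with a hypothetical helper name:

```cpp
#include <cassert>

// Triangles drawn per second for a model that is redrawn every frame.
double trianglesPerSecond(double trianglesPerFrame, double framesPerSecond) {
    assert(trianglesPerFrame >= 0.0 && framesPerSecond >= 0.0);
    return trianglesPerFrame * framesPerSecond;
}
```

Called with 800000 and 175 this gives 140 million; with the 75 fps display-list figure it gives 60 million.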
|
|
0
|
|
|
|
Reply
|
fungus
|
2/15/2005 1:18:51 AM
|
|
Jon Harrop wrote:
> fungus wrote:
>
>>I've also noticed that the frame rate varies a *lot*
>>as the model rotates. I get between 550fps and 800fps
>>depending on the rotation...
>
>
> The variation for me is within 4%.
>
The difference must be in the pixel filling so it
will depend on your graphics card. I changed from
triangles to GL_POINTS and the variation vanished
- see my other post.
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/15/2005 1:20:31 AM
|
|
Mikkel Gjøl wrote:
>> I just combined them into a single VBO and gained
>> about 100fps on my machine.
>
>
> Which is sort of weird, as I could swear nvidia stated that interleaved
> arrays did not result in performance-gains (of course, you might not be
> testing on an nvidia card). Using separate VBO's for each
> vertex-attribute is probably the regular case though.
>
I didn't interleave them I just put them into the
same VBO.
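For readers unsure of the distinction, "same VBO but not interleaved" can mean storing all positions first and all texture coordinates after them in one buffer. The sketch below is a guess at that layout, not the code from the linked Lesson45.cpp; PackedLayout, packedLayout and packArrays are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte layout of one buffer holding all positions followed by all
// texture coordinates (not interleaved). Hypothetical helper type.
struct PackedLayout {
    std::size_t vertexOffset;    // byte offset of the first position
    std::size_t texCoordOffset;  // byte offset of the first texcoord
    std::size_t totalBytes;      // size to pass to glBufferDataARB
};

PackedLayout packedLayout(std::size_t vertexCount) {
    PackedLayout layout;
    layout.vertexOffset = 0;
    layout.texCoordOffset = vertexCount * 3 * sizeof(float);
    layout.totalBytes = layout.texCoordOffset + vertexCount * 2 * sizeof(float);
    return layout;
}

// Copies both arrays back to back into one block, ready for a single upload.
std::vector<float> packArrays(const std::vector<float>& positions,
                              const std::vector<float>& texCoords) {
    assert(positions.size() % 3 == 0 && texCoords.size() % 2 == 0);
    assert(positions.size() / 3 == texCoords.size() / 2);
    std::vector<float> packed(positions);
    packed.insert(packed.end(), texCoords.begin(), texCoords.end());
    return packed;
}
```

With the buffer bound, the two offsets would be passed as the last argument of glVertexPointer and glTexCoordPointer in place of NULL, for example (char *) NULL + layout.texCoordOffset, so only one glBindBufferARB call is needed per frame.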
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/15/2005 1:24:33 AM
|
|
fungus wrote:
> The difference must be in the pixel filling so it
> will depend on your graphics card. I changed from
> triangles to GL_POINTS and the variation vanished
> - see my other post.
I think it is because you have a sweet graphics card and I do not. Also, my
frame rates are about 0.1x everyone else's. )-;
--
Dr Jon D Harrop, Flying Frog Consultancy
|
|
0
|
|
|
|
Reply
|
Jon
|
2/15/2005 1:26:25 AM
|
|
fungus wrote:
> Mikkel Gjøl wrote:
>>> I just combined them into a single VBO and gained
>>> about 100fps on my machine.
>>>
>> Which is sort of weird, as I could swear nvidia stated that
>> interleaved arrays did not result in performance-gains
>>
> I didn't interleave them I just put them into the
> same VBO.
Hmm, ok. Seems to me the program is cpu-limited - did this difference
hold when you added more points?
Regards,
\\Mikkel Gjoel
|
|
0
|
|
|
|
Reply
|
ISO
|
2/15/2005 1:31:33 AM
|
|
fungus wrote:
> I noticed that the "benchmark" used two separate
> VBOs, one for vertex coordinates and one for
> texture coordinates.
>
> I just combined them into a single VBO and gained
> about 100fps on my machine.
I just did that (non-interleaved, of course ;-) and got 0fps more. :-)
--
Dr Jon D Harrop, Flying Frog Consultancy
|
|
0
|
|
|
|
Reply
|
Jon
|
2/15/2005 1:33:57 AM
|
|
fungus wrote:
> I just made the VBO use an index array
> (ie. glDrawElements()) and gained another
> 100fps or so. Looks like indexed arrays are
> much faster than non-indexed on my Radeon x800.
I've been doing that from the start, so I can't say (and I can't be bothered
to make it less efficient ;-).
Is your index array in the VBO?
> I also changed the primitive type to GL_POINTS
> to avoid pixel fill overhead. Now there's not
> really any variation as the terrain rotates.
I just used GL_POINTS and got _exactly_ the same framerate. I think my setup
is totally geo-bound.
> I also increased the number of vertices quite
> a bit because I was getting over 1200fps and
> I don't think GetTickCount() is that accurate...
I could plot vertices vs frame-rate but I can't be bothered ATM. Maybe
later.
> Still...I've got a "real" model with 800,000
> triangles running here in my game engine. I
> just got 175fps* when I render it with VBOs
> and only about 75fps with display lists.
It would be interesting to see. How often (if ever) are your display lists
recompiled?
> What's the difference between the little
> benchmark and the "real" program? I don't
> know, but I'm sticking with VBOs....
Yes, I'm sticking with DLs. :-)
Still, this is very interesting, IMHO.
--
Dr Jon D Harrop, Flying Frog Consultancy
|
|
0
|
|
|
|
Reply
|
Jon
|
2/15/2005 1:38:35 AM
|
|
Jon Harrop wrote:
> I just used GL_POINTS and got _exactly_ the same framerate. I think my
> setup is totally geo-bound.
Yeah, check this out:
1600x1200
25fps VBO
36fps DL
That's only 30% slower than at 640x480.
Put another way, VBOs are as fast in 640x480 as DLs are at 1600x1200. Man,
VBOs suck. ;-)
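The percentages can be checked with a one-line calculation. This sketch assumes the 36 fps (VBO) and 58 fps (DL, GL_COMPILE) figures reported earlier in the thread were measured at 640x480; the helper name is hypothetical.

```cpp
#include <cassert>

// Fractional slowdown going from `baseFps` to `newFps` (0.25 = 25% slower).
double slowdown(double baseFps, double newFps) {
    assert(baseFps > 0.0);
    return 1.0 - newFps / baseFps;
}
```

Under that assumption, slowdown(36, 25) is about 0.31 for VBOs and slowdown(58, 36) is about 0.38 for display lists.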
--
Dr Jon D Harrop, Flying Frog Consultancy
|
|
0
|
|
|
|
Reply
|
Jon
|
2/15/2005 1:48:44 AM
|
|
Jon Harrop wrote:
> 1600x1200
> 25fps VBO
> 36fps DL
>
> That's only 30% slower than at 640x480.
>
???
We're trying to measure geometry rate here, not the
speed of glClear()...!
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/15/2005 2:32:04 AM
|
|
Jon Harrop wrote:
>
> Is your index array in the VBO?
>
Nope. I made two buffers, one for vertices and
one for indices.
> I just used GL_POINTS and got _exactly_ the same framerate. I think my setup
> is totally geo-bound.
>
Could be. What graphics card do you have?
>>I get 175fps* when I render it with VBOs
>>and only about 75fps with display lists.
>
> How often (if ever) are your display lists
> recompiled?
>
Um, once...
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/15/2005 2:34:30 AM
|
|
Mikkel Gjøl wrote:
> fungus wrote:
>
>>>> I just combined them into a single VBO and gained
>>>> about 100fps on my machine.
>
>
> did this difference
> hold when you added more points?
>
Yes....there's about 20% difference between indexed
and non-indexed arrays (indexed is faster).
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/15/2005 2:36:20 AM
|
|
fungus wrote:
> Jon Harrop wrote:
>>
>> Is your index array in the VBO?
>>
>
> Nope. I made two buffers, one for vertices and
> one for indices.
Are they both VBOs or is the index array in system memory?
>> I just used GL_POINTS and got _exactly_ the same framerate. I think my
>> setup is totally geo-bound.
>>
>
> Could be. What graphics card do you have?
GeForce 3.
>>>I get 175fps* when I render it with VBOs
>>>and only about 75fps with display lists.
>>
>> How often (if ever) are your display lists
>> recompiled?
>
> Um, once...
Hmm, that's interesting...
--
Dr Jon D Harrop, Flying Frog Consultancy
|
|
0
|
|
|
|
Reply
|
Jon
|
2/15/2005 2:41:11 AM
|
|
fungus wrote:
> Jon Harrop wrote:
>> 1600x1200
>> 25fps VBO
>> 36fps DL
>>
>> That's only 30% slower than at 640x480.
>>
>
> ???
>
> We're trying to measure geometry rate here, not the
> speed of glClear()...!
I think this _is_ the geometry rate (certainly at 640x480) but it is still a
surprisingly big factor at 1600x1200.
--
Dr Jon D Harrop, Flying Frog Consultancy
|
|
0
|
|
|
|
Reply
|
Jon
|
2/15/2005 2:43:10 AM
|
|
Jon Harrop wrote:
> fungus wrote:
>
>>I made two buffers, one for vertices and
>>one for indices.
>
>
> Are they both VBOs or is the index array in system memory?
>
Both VBOs.
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/15/2005 2:49:07 AM
|
|
fungus wrote:
> [cut]
On my machine DLs are five times faster than VBOs (1 fps).
|
|
0
|
|
|
|
Reply
|
Sulsa
|
2/15/2005 3:23:18 AM
|
|
Sulsa wrote:
> fungus wrote:
> >[cuted]
>
>
> on my machine dl are five times faster than vbo's(1 fps)
What's your "machine"...?
For this to be useful it's nice to see what graphics
card you have.
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/15/2005 11:26:37 AM
|
|
>
> Ah, if only life were that simple...!
>
> What happens if you use texture... or colors?
>
> What happens if you have indexed arrays?
>
> In that "benchmark" the DL was faster than the VBO
> but I've measured VBO as 50% faster than DL on
> a radiosity model with a million triangles in it
> on the exact same machine.
Totally agree. If you set
#define MESH_RESOLUTION 1.0f // Pixels Per Vertex
then things start to look different. The speedup here is mainly
because the 6 calls to initialize the VBO buffers take longer than the
one call to the display list. If the triangle count goes up, you see which
method is usually faster internally.
So - what do we do now? Do some work before each game start and check
what's fastest on that PC?
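A startup check of that kind could be as small as the sketch below. It is only an outline under stated assumptions: RenderPath, timeFrames and chooseFaster are hypothetical names, and the two draw callables would have to be supplied by the application.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

enum class RenderPath { VertexBufferObjects, DisplayLists };

// Runs `drawFrame` `frames` times and returns the elapsed wall-clock seconds.
// With real OpenGL the callable should end with glFinish() so that queued
// work is included in the measurement.
double timeFrames(const std::function<void()>& drawFrame, int frames) {
    assert(frames > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i)
        drawFrame();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Picks the path with the smaller measured time; ties go to VBOs.
RenderPath chooseFaster(double vboSeconds, double displayListSeconds) {
    return displayListSeconds < vboSeconds ? RenderPath::DisplayLists
                                           : RenderPath::VertexBufferObjects;
}
```

The measurement is only meaningful if it uses the application's real meshes, since the results in this thread flip between small and large models on the same machine.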
My opinion: Don't use DL's at all, so driver developers will stop
working on them - They are a real PITA to program, I bet.
More: What if you include the VBO's _in_ the Display list - have you
checked that already? (Have no time to test...)
-Gernot
|
|
0
|
|
|
|
Reply
|
Gernot
|
2/15/2005 1:31:20 PM
|
|
Gernot Frisch wrote:
> So - what do we do now? Do some work before each game start and check
> what's fastest on that PC?
No, you do as the hardware-vendors tell you to :) - you use VBOs when
the data is dynamic, when upload-speed counts, and display-lists when
you feel like testing something new and fancy out.
> My opinion: Don't use DL's at all, so driver developers will stop
> working on them - They are a real PITA to program, I bet.
But they are a piece of cake to use, so I would be sad to see them leave.
> More: What if you include the VBO's _in_ the Display list - have you
> checked that already? (Have no time to test...)
Sadly not possible. This would be an intuitive interface to "instancing"
though, but I guess there are design-based limits that prohibit this usage.
Kind regards,
\\Mikkel Gjoel
|
|
0
|
|
|
|
Reply
|
ISO
|
2/15/2005 1:48:25 PM
|
|
Gernot Frisch wrote:
>
> Totally agree. If you set
> #define MESH_RESOLUTION 1.0f // Pixels Per Vertex
>
> then things start to look different. The speedup here is mainly
> because the 6 calls to initialize the VBO buffers take longer than the
> one call to the display list.
>
Yep. I combined the separate vertex/texcoord buffers
into a single buffer and gained 100fps (it went from
600 to 700).
> So - what do we do now? Do some work before each game start and check
> what's fastest on that PC?
>
That's the only real answer.
> My opinion: Don't use DL's at all, so driver developers will stop
> working on them - They are a real PITA to program, I bet.
>
I've had more driver bugs with display lists than
just about anything else. All my programs have an
option "Don't use display lists" and many users have
said that disabling them fixes their problems.
VBOs are a much more logical way of doing things
from all points of view (hardware/driver/program).
I think this is why the video card manufacturers
are pushing them so hard.
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
|
|
0
|
|
|
|
Reply
|
fungus
|
2/15/2005 2:16:03 PM
|
|
fungus wrote:
> Sulsa wrote:
>
>> fungus wrote:
>> >[cuted]
>>
>>
>> on my machine dl are five times faster than vbo's(1 fps)
>
>
> What's your "machine"...?
>
> For this to be useful it's nice to see what graphics
> card you have.
>
I wrote that I have an Athlon 1.7, 256 RAM, GF4 MX420.
|
|
0
|
|
|
|
Reply
|
Sulsa
|
2/15/2005 9:26:58 PM
|
|
Jon Harrop wrote:
> nVidia 6629 drivers
>
> My guess is that the DLs are a win here because they put the index array
> onto the graphics card. I believe I need to use GL_ELEMENT_ARRAY_BUFFER
> but my nVidia driver doesn't appear to provide this.
It does. I'm using it without a problem in my program with the same driver
version, but on a GeForce FX5200. Maybe your card doesn't support it?
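For reference, the index buffer is part of the same ARB_vertex_buffer_object extension and uses the same entry points with a different target. The sketch below shows the upload step; BufferApi and uploadIndices are hypothetical, and the entry points are injected as plain function pointers only so that the example compiles without GL headers (in the posted program they would be glGenBuffersARB, glBindBufferARB and glBufferDataARB).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Token values from the ARB_vertex_buffer_object extension.
const unsigned int kElementArrayBuffer = 0x8893;  // GL_ELEMENT_ARRAY_BUFFER_ARB
const unsigned int kStaticDraw         = 0x88E4;  // GL_STATIC_DRAW_ARB

// Entry points are injected so the sketch compiles without GL headers;
// in a real program these would be glGenBuffersARB and friends.
struct BufferApi {
    void (*genBuffers)(int n, unsigned int* buffers);
    void (*bindBuffer)(unsigned int target, unsigned int buffer);
    void (*bufferData)(unsigned int target, std::ptrdiff_t size,
                       const void* data, unsigned int usage);
};

// Uploads the index array into its own buffer object and leaves it bound.
// While it is bound, the last argument of glDrawElements is a byte offset
// into the buffer rather than a pointer to system memory.
unsigned int uploadIndices(const BufferApi& api,
                           const std::vector<unsigned int>& indices) {
    assert(!indices.empty());
    unsigned int name = 0;
    api.genBuffers(1, &name);
    api.bindBuffer(kElementArrayBuffer, name);
    api.bufferData(kElementArrayBuffer,
                   static_cast<std::ptrdiff_t>(indices.size() * sizeof(unsigned int)),
                   indices.data(), kStaticDraw);
    return name;
}
```

In Jon's program that would mean passing an offset such as (char *) NULL + width*2*i*sizeof(GLuint) to glDrawElements instead of g_pMesh->m_pElements + width*2*i.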
|
|
0
|
|
|
|
Reply
|
Rolf
|
2/16/2005 11:49:53 PM
|
|
Rolf Magnus wrote:
> Jon Harrop wrote:
>>nVidia 6629 drivers
>>My guess is that the DLs are a win here because they put the index array
>>onto the graphics card. I believe I need to use GL_ELEMENT_ARRAY_BUFFER
>>but my nVidia driver doesn't appear to provide this.
>
> It does. I'm using it without a problem in my program with the same driver
> version, but on a GeForce FX5200. Maybe your card doesn't support it?
I have it running on a geforce2mx in linux. It runs on everything and
the kitchen sink! It might even run on geforce440mx... though slowly of
course... :/
Regards,
\\Mikkel Gjoel
|
|
0
|
|
|
|
Reply
|
ISO
|
2/17/2005 2:16:28 AM
|
|
|
|