VBOs vs Display Lists - let's test it out!



Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs and 
added Display Lists drawing for comparison.  The new code is available here:
http://allanmb.redirectme.net/binaries/VBOs_vs_DL.zip

Can anybody who is interested, and those who believe that Display Lists are 
slower than VBOs (or vice versa), please try this out so we can compare the 
results?  Let's see if we can finally determine which is faster in general or on 
specific hardware.

Ok, for my results:

Machine Spec:
Athlon64 3000+
1GB DDR400 RAM
ATI Radeon 9700XT 256MB

(default window size)
VBO speed:  ~295 fps
DL speed:  ~450 fps

(1600x1200 window)
VBO speed:  ~81 fps
DL speed:  ~88 fps (dips down to 75 fps, but mostly >90 fps)

My conclusion is that Display Lists are faster for my system but I would 
like to see what other people's experiences are.

Allan 


Reply abruce (27) 2/13/2005 1:28:42 PM

great idea, allan!
i've tested your app on my notebook (1.9 GHz P4, 512 MB, ATI Radeon 
Mobility 7500) and it looks like a draw:

default window size:
VBO: 109-130 fps
DL: 112-139 fps

and on a P4 3 GHz, 1 GB and ATI Radeon X800:

default window size:
VBO: 710-795 fps ( down to 690 for a moment)
DL: 1030-1075 fps (down to 940 for a moment)

seems that ATI prefers display lists.

cheers,

david
Reply David 2/13/2005 2:22:02 PM


Allan Bruce wrote:
> Machine Spec:
Pentium 4, 2.4 GHz, FSB800
1GB DDR400 RAM
nVidia Geforce 6800 128MB, 70.41 drivers

> (default window size)
VBO: 330 fps
DL: 170 fps

> My conclusion is that Display Lists are faster for my system but I would 
> like to see what other peoples experiences are.

Looks like that's why nVidia recommends VBO to game developers...


Malte
Reply Malte 2/13/2005 4:03:45 PM

"Allan Bruce" <abruce@TAKEMEAWAY.csd.abdn.ac.uk> wrote in message
news:110ulgdltj9ue2f@corp.supernews.com...
> Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs
> and added Display Lists drawing for comparison.  The new code is available
> here:
> http://allanmb.redirectme.net/binaries/VBOs_vs_DL.zip
>
> Can anybody who is interested, and those who believe that Display Lists are
> slower than VBOs (or vice versa) please try this out and we can compare
> the results?  See if we can finally determine which is faster in general
> or on specific hardware.
>
> Ok, for my results:
>
> Machine Spec:
> Athlon64 3000+
> 1GB DDR400 RAM
> ATI Radeon 9700XT 256MB
>
> (default window size)
> VBO speed:  ~295 fps
> DL speed:  ~450 fps
>
> (1600x1200 window)
> VBO speed:  ~81 fps
> DL speed:  ~88 fps (dips down to 75ps, but mostly >90fps)
>
> My conclusion is that Display Lists are faster for my system but I would
> like to see what other peoples experiences are.
>
> Allan
>
>

Interesting results so far. Looks like the NV 6800 may be better at VBOs
(finally).

I'm always suspicious of benchmarks that rely on extremely high frame rates.
They tend to be dominated by glClear effectiveness and other overheads.

In my world of massive sci-viz data, we're lucky to get 15 FPS; we really
are dominated by vertex processing rate, lighting, clipping, etc., not fill
rate or buffer overhead. And, while generally less germane to the mostly
game/enthusiast-oriented crowd here, I tend to use Linux a lot more than
Window$, so we can handle our 64-bit data needs. We also need the
capabilities of pro cards, FireGL and Quadros (esp. two-sided lighting in
hardware, stereo...).

Nevertheless, I appreciate your effort here. I'll have to provide the
standard caution that benchmarks only benchmark what you are benchmarking,
and may have little to do with how your application will actually run. I'd
really like to see the test run again with a considerable amount of
geometry, something more representative of what most folks are doing with
OpenGL.
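
As a rough illustration of the caution above about very high frame rates, it can help to convert the posted figures from frames per second into milliseconds per frame. The following is only a sketch in C++; the helper names `frame_time_ms` and `saved_ms` are made up for this example, and the numbers in the comments are the Radeon 9700XT results quoted earlier in the thread.

```cpp
// Convert a frame rate (frames per second) into time per frame (ms).
constexpr double frame_time_ms(double fps) {
    return 1000.0 / fps;
}

// Absolute time saved per frame when going from slow_fps to fast_fps.
constexpr double saved_ms(double slow_fps, double fast_fps) {
    return frame_time_ms(slow_fps) - frame_time_ms(fast_fps);
}

// Default window: 295 fps (VBO) vs 450 fps (DL)
//   saved_ms(295.0, 450.0) is about 1.17 ms per frame.
// 1600x1200 window: 81 fps (VBO) vs 88 fps (DL)
//   saved_ms(81.0, 88.0) is about 0.98 ms per frame.
```

Seen this way, the gap between the two paths is roughly one millisecond per frame at both window sizes, which may say more about fixed per-frame costs than about geometry throughput.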

-jbw


jbw


Reply JB 2/13/2005 4:50:42 PM

> Nevertheless, I appreciate your effort here. I'll have to provide the
> standard caution that benchmarks only benchmark what you are benchmarking,
> and may have little to do with how your application will actually run. I'd
> really like to see the test run again with a considerable amount of
> geometry, something more representative of what most folks are doing with
> OpenGL.
>

You can supply a much larger .bmp file and that will test higher amounts of 
geometry...
Allan


Reply Allan 2/13/2005 5:08:34 PM

"Allan Bruce" <abruce@TAKEMEAWAY.csd.abdn.ac.uk> wrote in message 
news:110ulgdltj9ue2f@corp.supernews.com...
> Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs 
> and added Display Lists drawing for comparison.  The new code is available 
> here:
> http://allanmb.redirectme.net/binaries/VBOs_vs_DL.zip
>
> Can anybody who is interested, and those who believe that Dispay Lists are 
> slower than VBOs (or vice versa) please try this out and we can compare 
> the results?  See if we can finally determine which is faster in general 
> or on specific hardware.
>
> Ok, for my results:
>
> Machine Spec:
> Athlon64 3000+
> 1GB DDR400 RAM
> ATI Radeon 9700XT 256MB
>
> (default window size)
> VBO speed:  ~295 fps
> DL speed:  ~450 fps
>
> (1600x1200 window)
> VBO speed:  ~81 fps
> DL speed:  ~88 fps (dips down to 75ps, but mostly >90fps)
>
> My conclusion is that Display Lists are faster for my system but I would 
> like to see what other peoples experiences are.
>
> Allan
>

Got one of my mates to test out the prog too, his machine is as follows:
P4 2.53GHz
512 MB DDR333
GeForce Ti4600 (66.93)

and his results are:

(default window)
VBO: 500
DL:  190

(1280x1024)
VBO: 230
DL:  190

This is very interesting.  The Display List performance wasn't affected by 
the size of the window, but VBOs took a big performance hit.
Let's see if we can get a few more results, especially on lower-end cards, if 
someone has access?

Allan 


Reply Allan 2/13/2005 5:18:40 PM

Allan Bruce wrote:
> Got one of my mates to test out the prog too, his machine is as follows:
> P4 2.53GHz
> 512 MB DDR333
> GeForce Ti4600 (66.93)
> 
> and his results are:
> 
> (default window)
> VBO: 500
> DL:  190
> 
> (1280x1024)
> VBO: 230
> DL:  190
> 
> This is very interesting.  The Display List performance wasnt affected by 
> the size of the window, but VBOs took a big performance hit.

Not that interesting if you consider that the size of the window affects 
only the fragment processing stage, not the vertex processing. So it 
does not matter at all where the vertex data comes from (vbo or display 
list). In this case, the fragment stage simply limits the throughput to 
230 fps in the full screen case. This is a bottleneck when using VBO, 
but since the display list limit is even lower, it doesn't affect that 
test case.
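
The argument above can be put as a very simple model: the observed frame rate is capped by the slowest stage. The sketch below is a simplification (it ignores any overlap between stages), `StageLimits` and `observed_fps` are hypothetical names, and the per-stage limits are assumptions read off the Ti4600 figures quoted above.

```cpp
#include <algorithm>

// Frame rate each stage could sustain if it were the only limit.
struct StageLimits {
    double vertex_fps;    // geometry submission and vertex processing
    double fragment_fps;  // fragment processing, depends on window size
};

// In this simplified model the slowest stage decides the result.
constexpr double observed_fps(StageLimits s) {
    return std::min(s.vertex_fps, s.fragment_fps);
}

// Assumed limits at 1280x1024: fragment stage ~230 fps,
// VBO vertex path ~500 fps, display-list vertex path ~190 fps.
//   observed_fps(StageLimits{500.0, 230.0}) gives 230 (fragment bound)
//   observed_fps(StageLimits{190.0, 230.0}) gives 190 (vertex bound)
```

Under those assumptions the display-list path never reaches the fragment limit, which would explain why its frame rate did not change with the window size.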


Malte
Reply Malte 2/13/2005 6:55:14 PM

Allan Bruce wrote:
> Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs and 
> added Display Lists drawing for comparison.  
> 

Ah, if only life were that simple...!

What happens if you use texture... or colors?

What happens if you have indexed arrays?

In that "benchmark" the DL was faster than the VBO
but I've measured VBO as 50% faster than DL on
a radiosity model with a million triangles in it
on the exact same machine.


-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
Reply fungus 2/13/2005 6:56:44 PM

"fungus" <openglMY@SOCKSartlum.com> wrote in message 
news:PpNPd.16603$dr.12558@news.ono.com...
> Allan Bruce wrote:
>> Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs 
>> and added Display Lists drawing for comparison.
>
> Ah, if only life were that simple...!
>
> What happens if you use texture... or colors?
>
> What happens if you have indexed arrays?
>
> In that "benchmark" the DL was faster than the VBO
> but I've measured VBO as 50% faster than DL on
> a radiosity model with a million triangles in it
> on the exact same machine.
>
>

The code does use a texture, but not colouring I admit.  Do you fancy 
posting your results?
Allan


Reply Allan 2/13/2005 7:31:38 PM

Allan Bruce wrote:
> "fungus" <openglMY@SOCKSartlum.com> wrote in message
> news:PpNPd.16603$dr.12558@news.ono.com...
>> Allan Bruce wrote:
>>> Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs
>>> and added Display Lists drawing for comparison.
>>
>> Ah, if only life were that simple...!
>>
>> What happens if you use texture... or colors?
>>
>> What happens if you have indexed arrays?

My main objection to this benchmark is the lack of efficient geometry
(triangle strips). However, this would just widen any performance gap by
making per-vertex caching more coherent.

> The code does use a texture, but not colouring I admit.  Do you fancy
> posting your results?
> Allan

Regardless, we have objectively shown that display lists are much simpler
and, at least sometimes, as fast or faster than VBOs. This clearly makes
them useful and not just "... a very ancient technique used to send one
integer over the network instead of a lot of GL calls." as Gernot Frisch
said. Sadly, the lack of a display list equivalent in DirectX appears to be
due to similar misconceptions at MS... :-(

Also, all the memory handling involved in VBOs makes it much easier to screw
them up and crash your program. Display lists are, in contrast, simpler and
more elegant.

Having said that, if you need top-notch performance on your nVidia 6800 then
you'll have to put the effort in and use VBOs.

-- 
Dr Jon D Harrop, Flying Frog Consultancy

Reply Jon 2/13/2005 7:44:58 PM

"JB West" <jbwest@NOSPAM_acm.org> wrote in message 
news:tIqdnX2JxMLMGJLfRVn-ig@comcast.com...
>
> "Allan Bruce" <abruce@TAKEMEAWAY.csd.abdn.ac.uk> wrote in message
> news:110ulgdltj9ue2f@corp.supernews.com...
>> Ok, I have modified the tutorial code found at nehe.gamedev.net for VBOs
>> and added Display Lists drawing for comparison.  The new code is available
>> here:
>> http://allanmb.redirectme.net/binaries/VBOs_vs_DL.zip
>>
>> Can anybody who is interested, and those who believe that Display Lists
>> are slower than VBOs (or vice versa) please try this out and we can
>> compare the results?  See if we can finally determine which is faster in
>> general or on specific hardware.
>>
>> Ok, for my results:
>>
>> Machine Spec:
>> Athlon64 3000+
>> 1GB DDR400 RAM
>> ATI Radeon 9700XT 256MB
>>
>> (default window size)
>> VBO speed:  ~295 fps
>> DL speed:  ~450 fps
>>
>> (1600x1200 window)
>> VBO speed:  ~81 fps
>> DL speed:  ~88 fps (dips down to 75ps, but mostly >90fps)
>>
>> My conclusion is that Display Lists are faster for my system but I would
>> like to see what other peoples experiences are.
>>
>> Allan
>>
>>
>
> Interesting results so far. Looks like the NV 6800 may be better at VBOs
> (finally).
>
> I'm always suspicious of benchmarks that rely on extremely high frame
> rates. They tend to be dominated by glClear effectiveness and other
> overheads.
>
> In my world of massive sci-viz data, we're lucky to get 15 FPS; we really
> are dominated by vertex processing rate, lighting, clipping, etc., not fill
> rate or buffer overhead. And, while generally less germane to the mostly
> game/enthusiast-oriented crowd here, I tend to use Linux a lot more than
> Window$, so we can handle our 64-bit data needs. We also need the
> capabilities of pro cards, FireGL and Quadros (esp. two-sided lighting in
> hardware, stereo...).
>
> Nevertheless, I appreciate your effort here. I'll have to provide the
> standard caution that benchmarks only benchmark what you are benchmarking,
> and may have little to do with how your application will actually run. I'd
> really like to see the test run again with a considerable amount of
> geometry, something more representative of what most folks are doing with
> OpenGL.
>
> -jbw
>
>
> jbw
>
>

I just tested a 1600x1200 image as the height map, and the results are as 
follows:

(default window)
VBO: ~44fps
DL:  ~85fps

(1600x1200)
VBO: 15fps
DL:  24fps

Allan 


Reply Allan 2/13/2005 8:11:49 PM

Allan Bruce wrote:
<snippit>
> My conclusion is that Display Lists are faster for my system but I
> would like to see what other peoples experiences are.

This benchmark is _very_ crude. I agree with "fungus" that you really 
need to write a more thorough test to say _anything_ about the use of 
VBO vs. DL.

I find your lack of knowledge concerning what you're actually testing 
somewhat disturbing. You can't simply draw a single, randomly chosen, 
scene on various configurations and expect the results to give any kind 
of meaningful results as to performance in general. And you certainly 
can't expect your test of vertex-throughput to make sense, if you are 
cpu or fillrate-limited (as is the case when you increase the window-area).

Display Lists and VBOs also have very different characteristics. DLs are 
optimized on creation, making them quite slow to initialize. VBOs, on 
the other hand, are optimized by design to enable fast transfer of 
geometry to the graphics card. For this benchmark to make sense, 
initialization cost should also be a parameter. Display lists, in turn, 
have the ability to change OpenGL state and apply matrices, which is 
immensely useful if you need to do this often.
(Actually, I have no idea whether this is hardware accelerated or not, 
which would also make for a nice test.)
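
Since creation cost and steady-state drawing cost are different things, a benchmark would probably want to time them separately. Below is a minimal timing sketch in C++ using only the standard library; `elapsed_ms` and `average_fps` are hypothetical helpers, not part of the posted benchmark, and the work being timed is left to the caller.

```cpp
#include <cassert>
#include <chrono>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsed_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Average frame rate for `frames` frames that took `total_ms` in total.
inline double average_fps(int frames, double total_ms) {
    assert(frames > 0 && total_ms > 0.0);
    return 1000.0 * frames / total_ms;
}
```

One would wrap list compilation or buffer upload in one `elapsed_ms` call and a fixed number of frames in another. Because OpenGL calls can return before the GPU has finished, the timed work would likely need to end with `glFinish()` for the numbers to mean much.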

Finally, doing a general geometry-benchmark, testing triangle-strips vs. 
individual triangles, different batch-sizes for DL and VBO, and of 
course vertex cached versus not, would be quite useful :)
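
To give a feel for why strips versus individual triangles matters, here is a small sketch of how many vertices have to be submitted in each case. It assumes a single unbroken strip with no index buffer; the function names are made up for illustration.

```cpp
#include <cstddef>

// Independent triangles: every triangle carries its own three vertices.
constexpr std::size_t vertices_as_triangles(std::size_t triangles) {
    return 3 * triangles;
}

// One triangle strip: two vertices to start, then one per triangle.
constexpr std::size_t vertices_as_strip(std::size_t triangles) {
    return triangles == 0 ? 0 : triangles + 2;
}

// For 512 triangles: 1536 vertices as a triangle list,
// 514 vertices as a single strip.
```

For long strips the vertex count approaches one third of the triangle-list count, so the two layouts can stress the vertex stage quite differently.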


Regards,
\\Mikkel Gjoel
PS. a couple of nvidia-presentations on the subject:
http://download.nvidia.com/developer/presentations/2004/Eurographics/EG_04_IntroductionToGPU.pdf
http://download.nvidia.com/developer/presentations/2004/Eurographics/EG_04_OptimizingGPUPipeline.pdf
http://developer.nvidia.com/docs/IO/8230/BatchBatchBatch.pdf
Reply Mikkel 2/13/2005 9:19:35 PM

Mikkel Gjøl wrote:
> This benchmark is _very_ crude. I agree with "fungus" that you really 
> need to write a more thorough test to say _anything_ about the use of 
> VBO vs. DL.

Well, I'd put it a bit more positively: After this "crude test" (including 
all results on ATI and nVidia cards), we actually know that we can't say 
anything, since different systems preferred different techniques. If both 
ATI and nVidia had shown an advantage of DL over VBO (or vice versa), it 
could have led some of us to a false assumption, but this way, we can 
say for sure that there's no simple rule at all :)

Btw, is it possible to use VBO in display lists?


Malte
Reply Malte 2/13/2005 10:18:24 PM

Mikkel Gjøl wrote:
> Allan Bruce wrote:
> <snippit>
>> My conclusion is that Display Lists are faster for my system but I
>> would like to see what other peoples experiences are.
> 
> This benchmark is _very_ crude.

Yes.

> I agree with "fungus" that you really 
> need to write a more thorough test to say _anything_ about the use of
> VBO vs. DL.

No, we can say _something_ about them already. This wasn't supposed to be
the be-all and end-all of VBO vs DL benchmarking.

> I find your lack of knowledge concerning what you're actually testing
> somewhat disturbing.

I think that is both rude and wrong. Allan was simply having a first stab at
answering our question, which he did. In contrast, you are whining and
haven't written a shred of code!

> You can't simply draw a single, randomly chosen, 
> scene on various configurations and expect the results to give any kind
> of meaningful results as to performance in general.

This isn't about "in general", this discussion started when someone posted
(again) saying that DLs are archaic and useless. We have shown that this is
definitely not the case, DLs can still be useful. I can't see this
situation changing - DLs are a good idea.

> And you certainly 
> can't expect your test of vertex-throughput to make sense, if you are
> cpu or fillrate-limited (as is the case when you increase the
> window-area).

Which is, of course, precisely why he gave both results.

> Display Lists and VBO also have very different characteristics. DL's are
> optimized on creation, making them quite slow on initialization. VBO on
> the other hand, are optimized per design, to enable fast transfer of
> geometry to the graphics card. For this benchmark to make sense, this
> should be a parameter also.

No, for this benchmark to be more thorough it should test more parameters.
It still makes sense without that. This was never about being thorough,
this was about knocking up a simple test to see just how much slower DLs
are at anything at all. Some quantitative information is better than an
infinite number of guesses.

> Display-lists yet again, have the ability to 
> change opengl-state, and apply matrices - which is immensely useful if
> you need to do this often.
> (actually, I have no idea whether this is hardware accelerated or not...
> which would also make for a nice test).

I suspect that is infeasible in the general case. Texture maps are probably
handled efficiently by the DL compiler though, and this could be tested...

> Finally, doing a general geometry-benchmark, testing triangle-strips vs.
> individual triangles, different batch-sizes for DL and VBO, and of
> course vertex cached versus not, would be quite useful :)

Then stop complaining and get coding. :-)

> Regards,
> \\Mikkel Gjoel
> PS. a couple of nvidia-presentations on the subject:
> ...

I'm just flicking through these here and they appear to be non-GL specific.
Of course, this means that they are (primarily?) aimed at DirectX which
lacks DLs.

-- 
Dr Jon D Harrop, Flying Frog Consultancy

Reply Jon 2/14/2005 1:20:45 AM

Thanks for putting up an argument in my case, Jon. I thought he was being a 
bit rude but didn't want to pipe up!

Anyway just to throw a spanner into the works here, I am at work and have a 
GeForce5700LE 256MB here.  With this card, Display Lists are faster for 
large viewports.  Results:

(default window)
VBO: 190fps
DL: 122fps

(1600x1200)
VBO: 65fps
DL: 75fps


I appreciate the fact that this is a crude test.  I have an lwo2 viewer that 
I have been working on - if I was to add an option to use VBOs or Display 
Lists (by a dialog at startup) and supply models with a large number of 
vertices, would anyone be interested in trying this out?  It uses colouring, 
texturing and lighting so would be a better 'real' test.

Allan


Reply Allan 2/14/2005 10:34:06 AM

Hey

Just to clarify this: Allan, I think creating this benchmark was a good 
thing, and it would be interesting to have it refined to remove the 
current limitations.

I'm not really sure I understand why DL/VBO is such a hot topic though. 
For raw geometry rendering, I can't imagine DLs being faster than VBOs. 
But even so, DLs allow you to do fast testing and easy, checked, 
optimizations - also some not available using VBO.

On a side note, I would find it immensely logical to be able to make 
VBO calls from a Display List to utilize "geometry instancing" 
(</buzzword>). I don't think there should be any server/client issues, 
but I'm aware that there are probably other design-based limitations 
that will make this kind of usage impossible. It would still be cool 
though ;)

- on with the show:
(this is my final I-said, you-said post)

Jon Harrop wrote:
> Mikkel Gjøl wrote:
>> ...expect the results to give any kind of meaningful results as to 
>> performance in general.
> 
> This isn't about "in general", this discussion started when someone 
> posted (again) saying that DLs are archaic and useless. We have shown
> that this is definitely not the case, DLs can still be useful. I 
> can't see this situation changing - DLs are a good idea.

I never argued they weren't. I argued that this test doesn't add 
meaningful insight, because the results give a seemingly simple but 
potentially confusing answer to a complicated question. In short: It 
will only add to the confusion because the results need a great deal of 
insight to be interpreted correctly (if possible).


>> And you certainly can't expect your test of vertex-throughput to 
>> make sense, if you are cpu or fillrate-limited (as is the case when
>> you increase the window-area).
> 
> Which is, of course, precisely why he gave both results.

No, it's probably because Allan doesn't fully understand what the
limiting factors in the benchmark are (see reply about vbo/dl "performance
hit" at different resolutions). I don't blame him, and I don't make any
claims about my own insight on the subject - it's a very complex area.


>> For this benchmark to make sense, this should be a parameter also.
> 
> No, for this benchmark to be more thorough it should test more 
> parameters.

hehe, so what did I just say? :)


> This was never about being thorough, this was about knocking up a 
> simple test to see just how much slower DLs are at anything at all. 
> Some quantitative information is better than an infinite number of 
> guesses.

Well, at least it has started a discussion about the results, possibly
leading to more insight. I still don't think knocking up a random test 
makes sense though.


>> PS. a couple of nvidia-presentations on the subject: ...
> 
> I'm just flicking through these here and they appear to be non-GL 
> specific.

That's true. My point was mainly to clarify some of the other potential 
bottlenecks when doing this kind of rendering. DL/VBO shouldn't really 
make a difference in this regard, as they are located the same place in 
the pipeline.

I just had a quick look around, and I can't seem to find anything about 
hardware-support / performance for Display Lists. I did come across a 
very interesting discussion on gl-geometry performance:

http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=009111


Kind regards,
\\Mikkel Gjoel
Reply Mikkel 2/14/2005 10:57:53 AM

Mikkel Gjøl wrote:
> - on with the show:
> (this is my final I-said, you-said post)

There seems to be little point in my replying so I'll be brief:

> I never argued they weren't. I argued that this test doesn't add
> meaningful insight, because the results give a seemingly simple but
> potentially confusing answer to a complicated question. In short: It
> will only add to the confusion because the results need a great deal of
> insight to be interpreted correctly (if possible).

We disagree and have drawn a conclusion.

>> Which is, of course, precisely why he gave both results.
> 
> No, it's probably because Allan doesn't fully understand what is the
> limiting factors in the benchmark (see reply about vbo/dl "performance
> hit" at different resolutions). I don't blame him, and I don't make any
> claims about my own insight on the subject - it's a very complex area.

I think this statement is based upon a misunderstanding of what we were
trying to achieve.

>>> For this benchmark to make sense, this should be a parameter also.
>> 
>> No, for this benchmark to be more thorough it should test more
>> parameters.
> 
> hehe, so what did I just say? :)

"...make sense..." vs "...more thorough...".

>> This was never about being thorough, this was about knocking up a
>> simple test to see just how much slower DLs are at anything at all.
>> Some quantitative information is better than an infinite number of
>> guesses.
> 
> Well, at least it has started a discussion about the results, possibly
> leading to more insight. I still don't think knocking up a random test
> makes sense though.

If you think it started with that then you missed the first half of the
conversation, which explains why you have misunderstood what we are trying
to achieve...

>>> PS. a couple of nvidia-presentations on the subject: ...
>> 
>> I'm just flicking through these here and they appear to be non-GL
>> specific.
> 
> That's true.

It's more than true. I spent an hour reading all three lectures in detail
last night and they are all completely irrelevant. The first is too
"beginner" to be relevant, the last two are entirely DirectX/game
programming.

> My point was mainly to clarify some of the other potential 
> bottlenecks when doing this kind of rendering. DL/VBO shouldn't really
> make a difference in this regard, as they are located the same place in
> the pipeline.

I believe that is quite wrong. When geometry bound, DLs could exhibit
various levels of optimisation. In the worst case, they could just exhibit
the performance of vertex arrays, called by the driver. In the best case,
they could be split between VBOs and vertex array ranges, resulting in
better performance (tuned to a given graphics card) than an application
which tried to use VBOs alone would get. Also, DLs are likely to give the
best performance for pre-VBO hardware.

On top of this, as DLs are easier to code than VBOs, people will still
want to use them when doing small projects, provided the performance isn't
awful (say, within 2x).

Finally, Allan's program is not that different from some of the kinds of
things I'd expect to write. Sure, it doesn't use fragment shaders, but
neither do I!

> I just had a quick look around, and I can't seem to find anything about
> hardware-support / performance for Display Lists. I did come across a
> very interresting discussion on gl-geometry performance:

This is also completely irrelevant twaddle about game programming and
DirectX.

I think you should read the parent thread to get some idea of what we are
talking about here. This has nothing to do with game programming.

-- 
Dr Jon D Harrop, Flying Frog Consultancy

Reply Jon 2/14/2005 12:41:15 PM

Jon Harrop wrote:
> Mikkel Gjøl wrote:
>> (this is my final I-said, you-said post)

I will however reply to relevant, on-topic subjects.
(and I have now read the thread about "wirecube rendering"-gone mad)


>> My point was mainly to clarify some of the other potential 
>> bottlenecks when doing this kind of rendering. DL/VBO shouldn't 
>> really make a difference in this regard, as they are located the 
>> same place in the pipeline.
> 
> I believe that is quite wrong. When geometry bound, DLs could exhibit
> various levels of optimisation.

My above statement was probably unclear. This is what I was trying to 
communicate: The papers are relevant no matter if they are concerned 
with VBO or DL, as they simply give guidelines on how to test what your 
program's bottlenecks are. The same goes for DirectX/OpenGL: it doesn't 
really matter in this context, as the subject is graphics-card 
performance, not API performance.

Concerning the benchmark, I guess it would be relevant to test in both 
CPU- and geometry-bound cases, as the workload might very well vary 
switching between VBO/DL.


> In the best case, they could be split between VBOs and vertex array
> ranges, resulting in better performance (tuned to a given graphics
> card) than an application which tried to use VBOs alone would get.

Just a side note: Isn't the VAR extension dead as a doornail? I believe 
any graphics card supporting VAR should be able to (and does, for 
at least ATI/NV) support VBO with about the same speedups.


> Finally, Allan's program is not that different from some of the kinds
> of things I'd expect to write.

This isn't really relevant to the accuracy of the results. I appreciate 
the attempt to show that Display Lists are useful, but for you to be 
able to use this knowledge, you need to know when these results apply. 
Hence the need for more thorough testing. A general guideline of "if 
your program does about the same as Allan's, then use display lists" was 
probably not what you were trying to establish either.

Again, I don't disagree that Display Lists are helpful(*), only that the 
benchmark isn't actually helpful until it does more accurate testing.

(*) (except as an example of "displaylists not being slower, and thus 
useful")


Kind Regards,
\\Mikkel Gjoel
Reply Mikkel 2/14/2005 1:58:42 PM

Allan Bruce wrote:
> Anyway just to throw a spanner into the works here, I am at work and have a 
> GeForce5700LE 256MB here.  With this card, Display Lists are faster for 
> large viewports.  Results:
> 
> (default window)
> VBO: 190fps
> DL: 122fps
> 
> (1600x1200)
> VBO: 65fps
> DL: 75fps
> 

In the case when you're fill limited they should be
both the same speed...


-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
Reply fungus 2/14/2005 2:00:03 PM

Mikkel Gjøl wrote:
> Jon Harrop wrote:
> 
>> Mikkel Gjøl wrote:
>>
>>> (this is my final I-said, you-said post)
> 
> 
> I will however reply to relevant, on-topic subjects.
> (and I have now read the thread about "wirecube rendering"-gone mad)
> 

[... zzz ...]

Please post some real good benchmark code. So finally we'll know
which one is faster under which circumstance - VBO or DL.

[more blabla "Like I said, I was trying to say, I meant..." removed]

*sigh*

Cheers,

	Toni


-- 
for mail, mirror: ed.lausivksa@elielb
Reply Antonio 2/14/2005 2:23:34 PM

"Allan Bruce" <abruce@TAKEMEAWAY.csd.abdn.ac.uk> wrote in message news:<110ulgdltj9ue2f@corp.supernews.com>...

> Can anybody who is interested, and those who believe that Dispay Lists are 
> slower than VBOs (or vice versa) please try this out and we can compare the 
> results?  See if we can finally determine which is faster in general or on 
> specific hardware.
> 

P4 2.2 512MB
Windows 2000
Quadro FX 500

VBO: ~ 220 FPS
DL: ~ 330 FPS

I use PBO as well, and when it comes to updating the buffer very often
(STREAM_DRAW), PBO owns. Perhaps a test using all possible combinations
would give us better results.

NVIDIA's SDK 8.5 has an example, but it only runs on GeForce FX or
above. If you have the time, you could change it so that it would run
on any GeForce 3 or later.

Regards,

wpr
Reply mlopes_filho 2/14/2005 3:46:14 PM

[snip]
> >
>
> I just tested an image 1600x1200 for the height map and the results are as
> follows:
>
> (default window)
> VBO: ~44fps
> DL:  ~85fps
>
> (1600x1200)
> VBO: 15fps
> DL:  24fps
>
> Allan
>
>

Very Interesting! Quite a reversal.

jbw


Reply JB 2/14/2005 4:07:20 PM

Hi all,

I was curious and made some tests as well:

P4 2.2 512MB
Windows 2000
Quadro FX 500
Drivers 56.72

VBO: 220 FPS
DL: 330 FPS

I was a bit disappointed, so I used my own program to load two .PLY meshes:

BIG: 327323 vertices 654666 Faces
SMALL: 52227 vertices 102280 Faces

VBO SMALL : 10
DL SMALL: 18

VBO BIG: 10
DL BIG: 5

Pretty slow... I decided to check on a better machine:

Dual Xeon 2.66 512MB
Windows XP SP1
Quadro FX 1100
Drivers 61.76

VBO: 380 FPS
DL: 600 FPS !!!

VBO SMALL: 40 FPS
DL SMALL: 68 FPS

VBO BIG: 38 FPS
DL BIG: 10 FPS

For the BIG mesh, VBO does make a difference.
The question is when it starts to make a difference!?
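
One way to look at these numbers is as triangles per second rather than frames per second. The sketch below assumes that each face is one triangle and that the mesh is drawn exactly once per frame (neither is stated in the post); `triangles_per_second` is a hypothetical helper, and the figures in the comments are the Quadro FX 1100 results above.

```cpp
#include <cstdint>

// Triangles drawn per second = triangles per frame * frames per second.
constexpr std::int64_t triangles_per_second(std::int64_t faces,
                                            std::int64_t fps) {
    return faces * fps;
}

// SMALL mesh, 102280 faces:
//   VBO at 40 fps ->  4,091,200 triangles/s
//   DL  at 68 fps ->  6,955,040 triangles/s
// BIG mesh, 654666 faces:
//   VBO at 38 fps -> 24,877,308 triangles/s
//   DL  at 10 fps ->  6,546,660 triangles/s
```

On these figures the display-list path stays near 6.5 to 7 million triangles per second for both meshes, while the VBO path goes from about 4 million to about 25 million, so the crossover presumably lies somewhere between the two mesh sizes on that machine.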

wpr.
Reply mlopes_filho 2/14/2005 4:39:43 PM

Malte Clasen wrote:
> Allan Bruce wrote:
>> Machine Spec:
> Pentium 4, 2.4 GHz, FSB800
> 1GB DDR400 RAM
> nVidia Geforce 6800 128MB, 70.41 drivers
> 
> VBO: 330 fps
> DL: 170 fps
> 
>> My conclusion is that Display Lists are faster for my system but I 
>> would like to see what other peoples experiences are.

Actually, it looks like a driver bug. If you change the glNewList mode from 
GL_COMPILE_AND_EXECUTE to GL_COMPILE, the DL speed 
hits the roof. It seems the driver compiles the list for each call to 
glCallList, i.e. for each frame.


Regards,
\\Mikkel Gjoel
Reply Mikkel 2/14/2005 6:54:30 PM

fungus wrote:

>>
> 
> In the case when you're fill limited they should be
> both the same speed...

Maybe they should, but tests showed that DLs are faster on my box too; 
the difference was a few fps.

GeForce 4 mx 420 , athlon 1.7 256 ram
Reply Sulsa 2/14/2005 11:03:57 PM

Mikkel Gjøl wrote:
> Malte Clasen wrote:
>> Allan Bruce wrote:
>>> Machine Spec:
>> Pentium 4, 2.4 GHz, FSB800
>> 1GB DDR400 RAM
>> nVidia Geforce 6800 128MB, 70.41 drivers
>> 
>> VBO: 330 fps
>> DL: 170 fps
>> 
>>> My conclusion is that Display Lists are faster for my system but I
>>> would like to see what other peoples experiences are.
> 
> Actually, it looks like a driver-bug. If you change glNewList( list,
> GL_COMPILE_AND_EXECUTE ); to glNewList( list, GL_COMPILE ); - the DL-speed
> hits the roof. It seems the driver compiles the list for each call to
> glCallList... ie. for each frame.

IIRC, I reported that bug to nVidia quite some time ago. Perhaps if you guys
all report it too...

For my own version of the NeHe lesson 45 demo (255 512-triangle strips but
index array for the VBOs in system memory):

1.2GHz Athlon T-bird
2.4.27-2-k7 Debian (Sarge) Linux
64Mb GeForce 3
768Mb RAM
nVidia 6629 drivers
gcc -pipe -Wall -O2 -Wall -march=athlon-tbird -mmmx -m3dnow -ffast-math
-fno-math-errno -funsafe-math-optimizations -fno-trapping-math
-malign-double -funroll-loops -pipe -fomit-frame-pointer main.cpp error.cpp
lesson45.cpp -o lesson45 -L/usr/X11R6/lib/ -lGL -lGLU `sdl-config --cflags
--libs`

8fps DL: COMPILE_AND_EXECUTE
36fps VBOs
58fps DL: COMPILE

My guess is that the DLs are a win here because they put the index array
onto the graphics card. I believe I need to use GL_ELEMENT_ARRAY_BUFFER but
my nVidia driver doesn't appear to provide this.

Here's my code:

#define NO_VBOS
#define EXECUTE

#include <iostream>
#include <cstdio>
#include <string.h>
#include <GL/gl.h>
#include <GL/glu.h>

#include "main.h"

#ifndef CDS_FULLSCREEN
#define CDS_FULLSCREEN 4
#endif

#ifndef GL_BGR
#define GL_BGR  0x80E0
#endif

#define GL_ARRAY_BUFFER_ARB 0x8892
#define GL_STATIC_DRAW_ARB 0x88E4

typedef void (APIENTRY * PFNGLBINDBUFFERARBPROC)
  (GLenum target, GLuint buffer);
typedef void (APIENTRY * PFNGLDELETEBUFFERSARBPROC)
  (GLsizei n, const GLuint *buffers);
typedef void (APIENTRY * PFNGLGENBUFFERSARBPROC)
  (GLsizei n, GLuint *buffers);
typedef void (APIENTRY * PFNGLBUFFERDATAARBPROC)
  (GLenum target, int size, const GLvoid *data, GLenum usage);

PFNGLGENBUFFERSARBPROC glGenBuffersARB = NULL;
PFNGLBINDBUFFERARBPROC glBindBufferARB = NULL;
PFNGLBUFFERDATAARBPROC glBufferDataARB = NULL;
PFNGLDELETEBUFFERSARBPROC glDeleteBuffersARB = NULL;

extern S_AppStatus AppStatus;
int width, height;

class CMesh {
public:
  // Mesh Data
  GLuint m_nVertexCount;
  GLfloat *m_pVertices;
  GLfloat *m_pTexCoords;
  GLuint *m_pElements;
  unsigned int m_nTextureId;

  // Vertex Buffer Object Names
  unsigned int m_nVBOVertices;
  unsigned int m_nVBOTexCoords;

  // Temporary Data
  SDL_Surface* m_pTextureImage;

public:
  CMesh();
  ~CMesh();

  bool LoadHeightmap(char* szPath);
  float PtHeight(int nX, int nY);
  void BuildVBOs();
};

bool g_fVBOSupported = false;
CMesh* g_pMesh = NULL;
float g_flYRot = 0.0f;
int g_nFPS = 0, g_nFrames = 0;
int g_dwLastFPS = 0;

bool IsExtensionSupported(char* szTargetExtension) {
  const unsigned char *pszExtensions = NULL;
  const unsigned char *pszStart;
  unsigned char *pszWhere, *pszTerminator;

  pszWhere = (unsigned char *) strchr( szTargetExtension, ' ' );
  if( pszWhere || *szTargetExtension == '\0' )
    return false;
  
  pszExtensions = glGetString( GL_EXTENSIONS );

  pszStart = pszExtensions;
  for(;;)
    {
      pszWhere = (unsigned char *) strstr((const char *) pszStart,
                                          szTargetExtension);
      if(!pszWhere)
        break;
      pszTerminator = pszWhere + strlen( szTargetExtension );
      if(pszWhere == pszStart || *( pszWhere - 1 ) == ' ')
        if(*pszTerminator == ' ' || *pszTerminator == '\0')
          return true;
      pszStart = pszTerminator;
    }
  return false;
}

bool Initialize (void) {
  AppStatus.Visible = true;
  AppStatus.MouseFocus = true;
  AppStatus.KeyboardFocus = true;
  
  g_pMesh = new CMesh();
  if(!g_pMesh->LoadHeightmap("terrain.bmp")) {
    Log( "Error Loading Heightmap");
    return false;
  }
  
  // Check For VBOs Supported
  Log("Checking for extensions.....");
#ifndef NO_VBOS
  g_fVBOSupported = IsExtensionSupported( "GL_ARB_vertex_buffer_object" );
  if( g_fVBOSupported )
    {
      glGenBuffersARB = (PFNGLGENBUFFERSARBPROC)
        SDL_GL_GetProcAddress("glGenBuffersARB");
      glBindBufferARB = (PFNGLBINDBUFFERARBPROC)
        SDL_GL_GetProcAddress("glBindBufferARB");
      glBufferDataARB = (PFNGLBUFFERDATAARBPROC)
        SDL_GL_GetProcAddress("glBufferDataARB");
      glDeleteBuffersARB = (PFNGLDELETEBUFFERSARBPROC)
        SDL_GL_GetProcAddress("glDeleteBuffersARB");
      // Load Vertex Data Into The Graphics Card Memory
      g_pMesh->BuildVBOs();
    }
#else /* NO_VBOS */
  g_fVBOSupported = false;
#endif
  return true;
}

bool InitGL(SDL_Surface *S) {
  glEnable( GL_TEXTURE_2D );
  glClearColor (0.0f, 0.0f, 0.0f, 0.5f);
  glClearDepth (1.0f);
  glDepthFunc (GL_LEQUAL);
  glEnable (GL_DEPTH_TEST);
  glShadeModel (GL_SMOOTH);
  glHint (GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST);

  SDL_GL_SetAttribute( SDL_GL_DOUBLEBUFFER, 1 );

  return true;
}

void Deinitialize (void) {
  if(g_pMesh) delete g_pMesh;
  g_pMesh = NULL;
}

void Update (Uint32 milliseconds, Uint8 *Keys) {
  g_flYRot += (float) ( milliseconds ) / 1000.0f * 25.0f;
  
  if(Keys)
    {
      if (Keys [SDLK_ESCAPE] == true) TerminateApplication ();
      if (Keys [SDLK_F1] == true) ToggleFullscreen ();
    }
}

GLuint displaylist=0;

void coord(int x, int y) {
  glTexCoord2fv(g_pMesh->m_pTexCoords + (y*width + x)*2);
  glVertex3fv(g_pMesh->m_pVertices + (y*width + x)*3);
}

void Draw (void) {
  glClear (GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  glLoadIdentity ();
  
  if( SDL_GetTicks() - g_dwLastFPS >= 10000 )
    {
      g_dwLastFPS = SDL_GetTicks();
      g_nFPS = g_nFrames / 10.;
      g_nFrames = 0;
      
      char szTitle[256]={0};
      sprintf( szTitle, "VBO Tut - %d Triangles, %d FPS",
        (width-1)*(height-1)*2, g_nFPS );
      if( g_fVBOSupported )
        strcat( szTitle, ", Using VBOs" );
      else
        strcat( szTitle, ", Using a DL" );
      SDL_WM_SetCaption(szTitle,NULL);
    }

  g_nFrames++;

  // Move The Camera
  glTranslatef( 0.0f, -220.0f, 0.0f );
  glRotatef( 10.0f, 1.0f, 0.0f, 0.0f );
  glRotatef( g_flYRot, 0.0f, 1.0f, 0.0f );

  // Set Pointers To Our Data
  if( g_fVBOSupported ) {
    // Enable Pointers
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, g_pMesh->m_nVBOVertices );
    glVertexPointer(3, GL_FLOAT, 0, (char *) NULL);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, g_pMesh->m_nVBOTexCoords );
    glTexCoordPointer(2, GL_FLOAT, 0, (char *) NULL);

    // Render
    for (int i=0; i<height-1; i++)
      glDrawElements(GL_TRIANGLE_STRIP, width*2, GL_UNSIGNED_INT,
       g_pMesh->m_pElements + width*2*i);

    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
  } else {
    // Render
    if (!displaylist) {
      displaylist = glGenLists(1);
#ifdef EXECUTE
      glNewList(displaylist, GL_COMPILE_AND_EXECUTE);
#else
      glNewList(displaylist, GL_COMPILE);
#endif
      for (int y=0; y<height-1; y++) {
        glBegin(GL_TRIANGLE_STRIP);
        for (int x=0; x<width; x++) {
          coord(x, y);
          coord(x, y+1);
        }
        glEnd();
      }
      glEndList();
#ifndef EXECUTE
      glCallList(displaylist);
#endif
    }
    else
      glCallList(displaylist);
  }
  glFlush();
}

CMesh :: CMesh() {
  m_pTextureImage = NULL;
  m_pVertices = NULL;
  m_pTexCoords = NULL;
  m_pElements = NULL;
  m_nVertexCount = 0;
  m_nVBOVertices = m_nVBOTexCoords = m_nTextureId = 0;
}

CMesh :: ~CMesh() {
  // Delete each buffer object once. The names stay 0 unless BuildVBOs()
  // ran, so the entry point is never called when it was not loaded.
  if (m_nVBOTexCoords) glDeleteBuffersARB(1, &m_nVBOTexCoords);
  if (m_nVBOVertices) glDeleteBuffersARB(1, &m_nVBOVertices);

  // delete [] on a null pointer is a no-op, so no checks are needed.
  delete [] m_pVertices;
  m_pVertices = NULL;
  delete [] m_pTexCoords;
  m_pTexCoords = NULL;
  delete [] m_pElements;
  m_pElements = NULL;
}

bool CMesh :: LoadHeightmap(char* szPath) {
  SDL_Surface *surface;
  Uint32 rmask, gmask, bmask, amask;

#if SDL_BYTEORDER == SDL_BIG_ENDIAN
  rmask = 0xff000000;
  gmask = 0x00ff0000;
  bmask = 0x0000ff00;
  amask = 0x00000000;
#else
  rmask = 0x000000ff;
  gmask = 0x0000ff00;
  bmask = 0x00ff0000;
  amask = 0x00000000;
#endif

  // Load Texture Data
  SDL_Surface *loaded = SDL_LoadBMP(szPath);
  if (!loaded) return false;  // lets Initialize() report the failure
  width = loaded->w;
  height = loaded->h;
  surface = SDL_CreateRGBSurface(SDL_SWSURFACE, width, height, 24,
     rmask, gmask, bmask, amask);
  m_pTextureImage = SDL_ConvertSurface(loaded, surface->format,
     SDL_SWSURFACE);
  SDL_FreeSurface(loaded);  // the unconverted copy is no longer needed

  // Generate Vertex Field
  m_nVertexCount = (int) (width * height);
  m_pVertices = new GLfloat[m_nVertexCount*3];
  m_pTexCoords = new GLfloat[m_nVertexCount*2];
  for (int y=0; y<height; y++)
    for (int x=0; x<width; x++) {
      m_pVertices[(y*width + x)*3 + 0] = x - width/2;
      m_pVertices[(y*width + x)*3 + 1] = PtHeight(x, y);
      m_pVertices[(y*width + x)*3 + 2] = y - height/2;

      m_pTexCoords[(y*width + x)*2 + 0] = GLfloat(x) / width;
      m_pTexCoords[(y*width + x)*2 + 1] = GLfloat(y) / height;
    }
  m_pElements = new GLuint[width*(height - 1)*2];
  {
    GLuint *it = m_pElements;
    for (int y=0; y<height-1; y++)
      for (int x=0; x<width; x++) {
        *(it++) = y*width + x;
        *(it++) = (y + 1)*width + x;
      }
  }

  // Load The Texture Into OpenGL
  glGenTextures(1, &m_nTextureId);
  glBindTexture(GL_TEXTURE_2D, m_nTextureId);
  glTexImage2D(GL_TEXTURE_2D, 0, 3, width, height, 0,
        GL_RGB, GL_UNSIGNED_BYTE, m_pTextureImage->pixels);
  glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER,GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER,GL_LINEAR);
  // Free The Texture Data
  if(m_pTextureImage) SDL_FreeSurface(m_pTextureImage);
  if(surface) SDL_FreeSurface(surface);

  return true;
}

float CMesh :: PtHeight(int nX, int nY) {
  SDL_Color color;   // Used to store R,G,B components of pixel
  Uint32 col=0;    // Temporary pixel value storage
  
  char* offset = (char *)(m_pTextureImage->pixels);

  offset += m_pTextureImage->pitch * nY;  
  offset += m_pTextureImage->format->BytesPerPixel * nX;

  memcpy(&col, offset, m_pTextureImage->format->BytesPerPixel);
  
  SDL_GetRGB(col, m_pTextureImage->format, &color.r, &color.g, &color.b);

  return 0.299f * color.r + 0.587f * color.g + 0.114f * color.b;
}

void CMesh :: BuildVBOs() {
  glGenBuffersARB(1, &m_nVBOVertices);
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, m_nVBOVertices);
  glBufferDataARB(GL_ARRAY_BUFFER_ARB, m_nVertexCount*3*sizeof(GLfloat),
    m_pVertices, GL_STATIC_DRAW_ARB);

  glGenBuffersARB(1, &m_nVBOTexCoords);
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, m_nVBOTexCoords);
  glBufferDataARB(GL_ARRAY_BUFFER_ARB, m_nVertexCount*2*sizeof(GLfloat),
    m_pTexCoords, GL_STATIC_DRAW_ARB);

  delete [] m_pVertices; m_pVertices = NULL;
  delete [] m_pTexCoords; m_pTexCoords = NULL;
}

-- 
Dr Jon D Harrop, Flying Frog Consultancy

0
Reply Jon 2/14/2005 11:05:32 PM

Antonio Bleile wrote:

> Please post some real good benchmark code. So finally we'll know
> which one is faster under which circumstance - VBO or DL.
> 
> [more blabla "Like I said, I was trying to say, I meant..." removed]
> 

very wise answer
0
Reply Sulsa 2/14/2005 11:11:28 PM

Sulsa wrote:
> fungus wrote:
> 
>> In the case when you're fill limited they should be
>> both the same speed...
> 
> 
> Maybe they should, but tests showed that DLs are faster on my box too; 
> the difference was a few fps.
> 
> GeForce 4 mx 420 , athlon 1.7 256 ram

I've also noticed that the frame rate varies a *lot*
as the model rotates. I get between 550fps and 800fps
depending on the rotation...


-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/14/2005 11:33:16 PM

I noticed that the "benchmark" used two separate
VBOs, one for vertex coordinates and one for
texture coordinates.

I just combined them into a single VBO and gained
about 100fps on my machine.


-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/14/2005 11:47:16 PM

fungus wrote:
> I've also noticed that the frame rate varies a *lot*
> as the model rotates. I get between 550fps and 800fps
> depending on the rotation...

The variation for me is within 4%.

-- 
Dr Jon D Harrop, Flying Frog Consultancy

0
Reply Jon 2/15/2005 1:15:38 AM

fungus wrote:
> I noticed that the "benchmark" used two separate
> VBOs, one for vertex coordinates and one for
> texture coordinates.
> 
> I just combined them into a single VBO and gained
> about 100fps on my machine.

Which is sort of weird, as I could swear nvidia stated that interleaved 
arrays did not result in performance-gains (of course, you might not be 
testing on an nvidia card). Using separate VBO's for each 
vertex-attribute is probably the regular case though.

Actually, you should disable depth-tests, depth-writes, texture, 
lighting, and stop clearing the color and depth-buffers too.

Yeah I know... write a better one. I actually sort of did, but I get 
strange performance on nv40 (and I use a crappy timer - probably 
related). I'll finish it tomorrow, but if someone wants to take a look 
at it:

http://www.userwebs.dk/gjoel/geom_bench.zip
http://www.userwebs.dk/gjoel/geom_bench_src.zip

- it writes output to a textfile. I added a couple of testresults to a 
textfile in the zip. What is really needed in this one, is test of 
cpu-usage. It assumes the card has a 10-entry vertex-cache (matching 
geforceFX cards)... which favors some cards over others, but shouldn't 
affect the VBO/DL comparison.


Regards,
\\Mikkel Gjoel
0
Reply ISO 2/15/2005 1:17:44 AM

fungus wrote:
> I noticed that the "benchmark" used two separate
> VBOs, one for vertex coordinates and one for
> texture coordinates.
> 
> I just combined them into a single VBO and gained
> about 100fps on my machine.
> 

I just made the VBO use an index array
(ie. glDrawElements()) and gained another
100fps or so. Looks like indexed arrays are
much faster than non-indexed on my Radeon x800.
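
For a regular grid like the heightmap in this benchmark, the saving from indexing can be estimated by counting vertices. A rough sketch; the function names and the 256x256 grid size in the note are assumptions for illustration, and the index order mirrors the triangle-strip layout in the listing Jon posted earlier in the thread:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: indices for (height-1) triangle strips over a
// width x height grid. Each strip alternates between row y and row y+1.
std::vector<unsigned> buildStripIndices(unsigned width, unsigned height) {
    std::vector<unsigned> indices;
    if (width == 0 || height < 2) return indices;
    indices.reserve(static_cast<std::size_t>(width) * (height - 1) * 2);
    for (unsigned y = 0; y + 1 < height; ++y) {
        for (unsigned x = 0; x < width; ++x) {
            indices.push_back(y * width + x);        // vertex on row y
            indices.push_back((y + 1) * width + x);  // vertex on row y+1
        }
    }
    return indices;
}

// Full vertices submitted when the same strips are drawn without indices.
std::size_t stripEntryCount(unsigned width, unsigned height) {
    return height < 2 ? 0 : static_cast<std::size_t>(width) * (height - 1) * 2;
}

// Unique vertices stored when an index array is used.
std::size_t uniqueVertexCount(unsigned width, unsigned height) {
    return static_cast<std::size_t>(width) * height;
}
```

For a 256x256 grid this gives 130,560 strip entries against 65,536 unique vertices, so the indexed path stores about half as many full vertices and adds one integer per entry. Whether that turns into a speedup like the one reported here depends on the card and driver.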

I also changed the primitive type to GL_POINTS
to avoid pixel fill overhead. Now there's not
really any variation as the terrain rotates.

I also increased the number of vertices quite
a bit because I was getting over 1200fps and
I don't think GetTickCount() is that accurate...
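
Another way around a coarse timer is to count frames over a window much longer than the timer's resolution and divide once at the end; Jon's listing earlier in the thread averages over 10 seconds in the same spirit. A minimal sketch, assuming the caller passes in a millisecond tick count from whatever timer is available; the `FpsCounter` class is hypothetical and not taken from the benchmark:

```cpp
#include <cstdint>

// Hypothetical helper, not part of the posted benchmark code.
class FpsCounter {
public:
    explicit FpsCounter(std::uint32_t windowMs) : windowMs_(windowMs) {}

    // Call once per frame with the current tick count in milliseconds.
    // Returns true when a fresh average can be read from fps().
    bool frame(std::uint32_t nowMs) {
        if (!started_) {            // first call only marks the start time
            started_ = true;
            windowStart_ = nowMs;
            return false;
        }
        ++frames_;
        // Unsigned subtraction stays correct if the tick counter wraps.
        const std::uint32_t elapsed = nowMs - windowStart_;
        if (elapsed == 0 || elapsed < windowMs_) return false;
        fps_ = frames_ * 1000.0 / elapsed;
        frames_ = 0;
        windowStart_ = nowMs;
        return true;
    }

    double fps() const { return fps_; }

private:
    std::uint32_t windowMs_;
    std::uint32_t windowStart_ = 0;
    std::uint32_t frames_ = 0;
    double fps_ = 0.0;
    bool started_ = false;
};
```

With a 10,000 ms window, a timer that is off by 15 ms changes the reported rate by only about 0.15%, whatever the frame rate is.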

After these simple tweaks it showed display
lists running about 8% faster than VBOs. If
you want the tweaked code it's here:

http://www.artlum.com/Lesson45.cpp


Still...I've got a "real" model with 800,000
triangles running here in my game engine. I
just got 175fps* when I render it with VBOs
and only about 75fps with display lists.

What's the difference between the little
benchmark and the "real" program? I don't
know, but I'm sticking with VBOs....


-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.


[*] Which is actually quite impressive! I only
got 95fps last time I loaded that model (a couple
of driver versions/program tweaks ago).

175fps is about 140 million real, antialiased
triangles/sec...

Eat your heart out all those people who paid
$$millions for a RealityMonster a few years
ago!!

.....and this is only a normally-clocked x800 "Pro"
- not an overclocked Platinum XT GTi Turbo-nutter
model or anything fancy.
0
Reply fungus 2/15/2005 1:18:51 AM

Jon Harrop wrote:
> fungus wrote:
> 
>>I've also noticed that the frame rate varies a *lot*
>>as the model rotates. I get between 550fps and 800fps
>>depending on the rotation...
> 
> 
> The variation for me is within 4%.
> 

The difference must be in the pixel filling so it
will depend on your graphics card. I changed from
triangles to GL_POINTS and the variation vanished
- see my other post.


-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/15/2005 1:20:31 AM

Mikkel Gjøl wrote:
>> I just combined them into a single VBO and gained
>> about 100fps on my machine.
> 
> 
> Which is sort of weird, as I could swear nvidia stated that interleaved 
> arrays did not result in performance-gains (of course, you might not be 
> testing on an nvidia card). Using separate VBO's for each 
> vertex-attribute is probably the regular case though.
> 

I didn't interleave them I just put them into the
same VBO.
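
Putting both arrays into one buffer without interleaving means the texture coordinates simply start where the positions end. A sketch of the layout only, with no OpenGL calls; `PackedMesh` and `packSequential` are names invented here, and the exact layout used in the tweaked benchmark is not shown in the thread:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for one combined, non-interleaved buffer.
struct PackedMesh {
    std::vector<float> data;      // all positions, then all texcoords
    std::size_t positionOffset;   // byte offset of the first position
    std::size_t texCoordOffset;   // byte offset of the first texcoord
};

// Copy positions (3 floats per vertex) followed by texcoords
// (2 floats per vertex) into one contiguous array.
PackedMesh packSequential(const std::vector<float>& positions,
                          const std::vector<float>& texCoords) {
    PackedMesh mesh;
    mesh.data.reserve(positions.size() + texCoords.size());
    mesh.data.insert(mesh.data.end(), positions.begin(), positions.end());
    mesh.data.insert(mesh.data.end(), texCoords.begin(), texCoords.end());
    mesh.positionOffset = 0;
    mesh.texCoordOffset = positions.size() * sizeof(float);
    return mesh;
}
```

The `data` array would then be uploaded with a single glBufferDataARB call, and the two byte offsets would be passed as the pointer arguments of glVertexPointer and glTexCoordPointer while that buffer is bound.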


-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/15/2005 1:24:33 AM

fungus wrote:
> The difference must be in the pixel filling so it
> will depend on your graphics card. I changed from
> triangles to GL_POINTS and the variation vanished
> - see my other post.

I think it is because you have a sweet graphics card and I do not. Also, my
frame rates are about 0.1x everyone else's. )-;

-- 
Dr Jon D Harrop, Flying Frog Consultancy

0
Reply Jon 2/15/2005 1:26:25 AM

fungus wrote:
> Mikkel Gjøl wrote:
>>> I just combined them into a single VBO and gained
>>> about 100fps on my machine.
 >>>
>> Which is sort of weird, as I could swear nvidia stated that 
>> interleaved arrays did not result in performance-gains
 >>
> I didn't interleave them I just put them into the
> same VBO.

Hmm, ok. Seems to me the program is cpu-limited - did this difference 
hold when you added more points?


Regards,
\\Mikkel Gjoel
0
Reply ISO 2/15/2005 1:31:33 AM

fungus wrote:
> I noticed that the "benchmark" used two separate
> VBOs, one for vertex coordinates and one for
> texture coordinates.
> 
> I just combined them into a single VBO and gained
> about 100fps on my machine.

I just did that (non-interleaved, of course ;-) and got 0fps more. :-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy

0
Reply Jon 2/15/2005 1:33:57 AM

fungus wrote:
> I just made the VBO use an index array
> (ie. glDrawElements()) and gained another
> 100fps or so. Looks like indexed arrays are
> much faster than non-indexed on my Radeon x800.

I've been doing that from the start, so I can't say (and I can't be bothered
to make it less efficient ;-).

Is your index array in the VBO?

> I also changed the primitive type to GL_POINTS
> to avoid pixel fill overhead. Now there's not
> really any variation as the terrain rotates.

I just used GL_POINTS and got _exactly_ the same framerate. I think my setup
is totally geo-bound.

> I also increased the number of vertices quite
> a bit because I was getting over 1200fps and
> I don't think GetTickCount() is that accurate...

I could plot vertices vs frame-rate but I can't be bothered ATM. Maybe
later.

> Still...I've got a "real" model with 800,000
> triangles running here in my game engine. I
> just got 175fps* when I render it with VBOs
> and only about 75fps with display lists.

It would be interesting to see. How often (if ever) are your display lists
recompiled?

> What's the difference between the little
> benchmark and the "real" program? I don't
> know, but I'm sticking with VBOs....

Yes, I'm sticking with DLs. :-)

Still, this is very interesting, IMHO.

-- 
Dr Jon D Harrop, Flying Frog Consultancy

0
Reply Jon 2/15/2005 1:38:35 AM

Jon Harrop wrote:
> I just used GL_POINTS and got _exactly_ the same framerate. I think my
> setup is totally geo-bound.

Yeah, check this out:

1600x1200
25fps VBO
36fps DL

That's only 30% slower than at 640x480.

Put another way, VBOs are as fast in 640x480 as DLs are at 1600x1200. Man,
VBOs suck. ;-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy

0
Reply Jon 2/15/2005 1:48:44 AM

Jon Harrop wrote:
> 1600x1200
> 25fps VBO
> 36fps DL
> 
> That's only 30% slower than at 640x480.
> 

???

We're trying to measure geometry rate here, not the
speed of glClear()...!



-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/15/2005 2:32:04 AM

Jon Harrop wrote:
> 
> Is your index array in the VBO?
> 

Nope. I made two buffers, one for vertices and
one for indices.

> I just used GL_POINTS and got _exactly_ the same framerate. I think my setup
> is totally geo-bound.
> 

Could be. What graphics card do you have?

>>I get 175fps* when I render it with VBOs
>>and only about 75fps with display lists.
> 
> How often (if ever) are your display lists
> recompiled?
> 

Um, once...



-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/15/2005 2:34:30 AM

Mikkel Gjøl wrote:
> fungus wrote:
> 
>>>> I just combined them into a single VBO and gained
>>>> about 100fps on my machine.
> 
> 
> did this difference 
> hold when you added more points?
> 

Yes....there's about 20% difference between indexed
and non-indexed arrays (indexed is faster).


-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/15/2005 2:36:20 AM

fungus wrote:
> Jon Harrop wrote:
>> 
>> Is your index array in the VBO?
>> 
> 
> Nope. I made two buffers, one for vertices and
> one for indices.

Are they both VBOs or is the index array in system memory?

>> I just used GL_POINTS and got _exactly_ the same framerate. I think my
>> setup is totally geo-bound.
>> 
> 
> Could be. What graphics card do you have?

GeForce 3.

>>>I get 175fps* when I render it with VBOs
>>>and only about 75fps with display lists.
>> 
>> How often (if ever) are your display lists
>> recompiled?
> 
> Um, once...

Hmm, that's interesting...

-- 
Dr Jon D Harrop, Flying Frog Consultancy

0
Reply Jon 2/15/2005 2:41:11 AM

fungus wrote:
> Jon Harrop wrote:
>> 1600x1200
>> 25fps VBO
>> 36fps DL
>> 
>> That's only 30% slower than at 640x480.
>> 
> 
> ???
> 
> We're trying to measure geometry rate here, not the
> speed of glClear()...!

I think this _is_ the geometry rate (certainly at 640x480) but it is still a
surprisingly big factor at 1600x1200.

-- 
Dr Jon D Harrop, Flying Frog Consultancy

0
Reply Jon 2/15/2005 2:43:10 AM

Jon Harrop wrote:
> fungus wrote:
> 
>>I made two buffers, one for vertices and
>>one for indices.
> 
> 
> Are they both VBOs or is the index array in system memory?
> 

Both VBOs.




-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/15/2005 2:49:07 AM

fungus wrote:
 >[cuted]


on my machine dl are five times faster than vbo's(1 fps)
0
Reply Sulsa 2/15/2005 3:23:18 AM

Sulsa wrote:
> fungus wrote:
>  >[cuted]
> 
> 
> on my machine dl are five times faster than vbo's(1 fps)

What's your "machine"...?

For this to be useful it's nice to see what graphics
card you have.

-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/15/2005 11:26:37 AM

>
> Ah, if only life were that simple...!
>
> What happens if you use texture... or colors?
>
> What happens if you have indexed arrays?
>
> In that "benchmark" the DL was faster than the VBO
> but I've measured VBO as 50% faster than DL on
> a radiosity model with a million triangles in it
> on the exact same machine.

Totally agree. If you set the
#define MESH_RESOLUTION 1.0f         // Pixels Per Vertex

then things start to look different. The speedup here is mainly 
because the 6 calls to initialize the VBO buffers take longer than the 
one call to the display list. If the triangle count goes up, you see which 
method is usually faster internally.

So - what do we do now? Do some work before each game start and check 
what's fastest on that PC?

My opinion: Don't use DL's at all, so driver developers will stop 
working on them - They are a real PITA to program, I bet.

More: What if you include the VBO's _in_ the Display list - have you 
checked that already? (Have no time to test...)

-Gernot 


0
Reply Gernot 2/15/2005 1:31:20 PM

Gernot Frisch wrote:
> So - what do we do now? Do some work before each game start and check
>  what's fastest on that PC?

No, you do as the hardware-vendors tell you to :) - you use VBOs when 
the data is dynamic, when upload-speed counts, and display-lists when 
you feel like testing something new and fancy out.


> My opinion: Don't use DL's at all, so driver developers will stop 
> working on them - They are a real PITA to program, I bet.

But they are a piece of cake to use, so I would be sad to see them leave.


> More: What if you include the VBO's _in_ the Display list - have you
> checked that already? (Have no time to test...)

Sadly not possible. This would be an intuitive interface to "instancing" 
though, but I guess there are design-based limits that prohibit this usage.


Kind regards,
\\Mikkel Gjoel
0
Reply ISO 2/15/2005 1:48:25 PM

Gernot Frisch wrote:
>
> Totally agree. If you set the
> #define MESH_RESOLUTION 1.0f         // Pixels Per Vertex
> 
> then things start to look different. The speedup here is mainly 
> because the 6 calls to initialize the VBO buffers take longer than the 
> one call to the display list.
> 

Yep. I combined the separate vertex/texcoord buffers
into a single buffer and gained 100fps (it went from
600 to 700).

> So - what do we do now? Do some work before each game start and check 
> what's fastest on that PC?
> 

That's the only real answer.
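
A start-up check along those lines could time a fixed number of frames with each path and keep the faster one. A rough sketch, assuming the two draw routines are supplied as callables; `RenderPath`, `timeFrames` and `pickFasterPath` are hypothetical names, and a real test would also need each callable to wait for the card to finish (for example with glFinish) so that the timing covers the GPU work:

```cpp
#include <chrono>
#include <functional>

// Hypothetical names, for illustration only.
enum class RenderPath { VertexBufferObjects, DisplayLists };

// Run `drawFrame` `frames` times and return the elapsed wall time in seconds.
double timeFrames(const std::function<void()>& drawFrame, int frames) {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < frames; ++i) {
        drawFrame();
    }
    return std::chrono::duration<double>(Clock::now() - start).count();
}

// Pick the path with the lower measured time.
// Ties go to VBOs, which is an arbitrary choice made for this sketch.
RenderPath pickFasterPath(double vboSeconds, double dlSeconds) {
    return dlSeconds < vboSeconds ? RenderPath::DisplayLists
                                  : RenderPath::VertexBufferObjects;
}
```

The result could be cached per machine so the check only runs again after a driver or hardware change.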


> My opinion: Don't use DL's at all, so driver developers will stop 
> working on them - They are a real PITA to program, I bet.
> 

I've had more driver bugs with display lists than
just about anything else. All my programs have an
option "Don't use display lists" and many users have
said that disabling them fixes their problems.

VBOs are a much more logical way of doing things
from all points of view (hardware/driver/program).
I think this is why the video card manufacturers
are pushing them so hard.



-- 
<\___/>
/ O O \
\_____/  FTB.    For email, remove my socks.
0
Reply fungus 2/15/2005 2:16:03 PM

fungus wrote:
> Sulsa wrote:
> 
>> fungus wrote:
>>  >[cuted]
>>
>>
>> on my machine dl are five times faster than vbo's(1 fps)
> 
> 
> What's your "machine"...?
> 
> For this to be useful it's nice to see what graphics
> card you have.
> 

I wrote that I have an Athlon 1.7, 256 RAM, GF4 MX420.
0
Reply Sulsa 2/15/2005 9:26:58 PM

Jon Harrop wrote:

> nVidia 6629 drivers
> 
> My guess is that the DLs are a win here because they put the index array
> onto the graphics card. I believe I need to use GL_ELEMENT_ARRAY_BUFFER
> but my nVidia driver doesn't appear to provide this.

It does. I'm using it without a problem in my program with the same driver
version, but on a GeForce FX5200. Maybe your card doesn't support it?


0
Reply Rolf 2/16/2005 11:49:53 PM

Rolf Magnus wrote:
> Jon Harrop wrote:
>>nVidia 6629 drivers
>>My guess is that the DLs are a win here because they put the index array
>>onto the graphics card. I believe I need to use GL_ELEMENT_ARRAY_BUFFER
>>but my nVidia driver doesn't appear to provide this.
> 
> It does. I'm using it without a problem in my program with the same driver
> version, but on a GeForce FX5200. Maybe your card doesn't support it?

I have it running on a geforce2mx in linux. It runs on everything and 
the kitchen sink! It might even run on geforce440mx... though slowly of 
course... :/


Regards,
\\Mikkel Gjoel
0
Reply ISO 2/17/2005 2:16:28 AM

52 Replies
431 Views
