Dear all,
I am currently trying to improve a tutorial for
the CGAL polyhedron: http://www.cgal.org/Tutorials/Polyhedron/index.html
My goal is raw rendering speed for triangle meshes with smooth
shading.
I recently switched to vertex buffer objects, and the improvement
is really spectacular on my NVIDIA Quadro FX (laptop M70).
I am using interleaved arrays, with normals and vertex coordinates
packed into one array (example code below).
My question is the following: how can I improve the speed further?
Is there any chance that a cache-oblivious reordering would
improve the speed (such as http://gamma.cs.unc.edu/COL/OpenCCL/)?
I observe that the frame rate of my application varies widely
from one mesh to another.
Any advice would be greatly appreciated,
--
Pierre Alliez
INRIA Sophia-Antipolis
http://www-sop.inria.fr/geometrica/team/Pierre.Alliez/
unsigned int nb_triangles = pDoc->m_pMesh->size_of_facets();
if(!m_hardware_buffer_done || pDoc->m_pMesh->modified())
{
  // rebuild the interleaved client-side array (normal + position per vertex)
  if(m_pHardware_buffer != NULL)
    delete [] m_pHardware_buffer;
  m_pHardware_buffer =
    pDoc->m_pMesh->generate_vertex_buffer(nb_triangles);
  if(m_pHardware_buffer == NULL)
    return;
  // create the buffer object once (assumes m_bufferObject starts at 0),
  // then select it: a buffer must be bound before data is stored in it
  if(m_bufferObject == 0)
    pglGenBuffersARB(1, &m_bufferObject);
  pglBindBufferARB(GL_ARRAY_BUFFER_ARB, m_bufferObject);
  // store data into the buffer object, which is accelerated when available:
  // 3 vertices x (3 normal + 3 position) floats = 18 floats per triangle
  pglBufferDataARB(GL_ARRAY_BUFFER_ARB,
                   18 * nb_triangles * sizeof(float),
                   m_pHardware_buffer,
                   GL_STATIC_DRAW_ARB);
  m_hardware_buffer_done = true;
  pDoc->m_pMesh->modified() = false;
}
// display shaded triangles from the buffer object: while a buffer is bound,
// the last argument is a byte offset into it, not a client-memory pointer
pglBindBufferARB(GL_ARRAY_BUFFER_ARB, m_bufferObject);
// glInterleavedArrays also enables the normal and vertex arrays
glInterleavedArrays(GL_N3F_V3F, 0, NULL);
glDrawArrays(GL_TRIANGLES, 0, 3*nb_triangles);
Pierre, 2/3/2006 11:13:10 AM
I have tried this with great success:
a) Find the size of the pre-T&L cache of my card, by experiment.
b) Find the size of the post-T&L cache of my card, by experiment and
using triangle strips.
c) Convert your entire mesh into batches such that
   i) each batch contains fewer vertices than the pre-T&L cache holds;
   ii) each batch is made into triangle strips of length <= the post-T&L
   cache size (if the card supports the primitive restart extension,
   which I think the Quadro does, then make one long triangle strip of
   the entire batch with the restarts specified correctly).
d) Convert each batch to interleaved triangle strip format.
e) Store each batch in its own VBO.
f) Render the batches.
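The vertex-count constraint in step c) can be sketched roughly as below. This is only an illustration under stated assumptions: it assumes the mesh is already available as an indexed triangle list (the code in the question draws unindexed triangles), and the names `Batch`, `split_into_batches` and `max_vertices` are hypothetical, not part of CGAL or OpenGL. It does not build the triangle strips; it only cuts the triangle sequence greedily so that no batch references more than `max_vertices` distinct vertices.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// One batch: the global vertex ids it uses, plus triangles expressed as
// indices into that local vertex list (hypothetical layout).
struct Batch {
    std::vector<unsigned> vertices;   // global vertex ids used by this batch
    std::vector<unsigned> triangles;  // 3 local indices per triangle
};

// Greedily cuts an indexed triangle list into batches that each reference
// at most max_vertices distinct vertices. Triangles keep their input order,
// so a locality-preserving ordering should be applied beforehand.
std::vector<Batch> split_into_batches(const std::vector<unsigned>& indices,
                                      std::size_t max_vertices)
{
    std::vector<Batch> batches;
    if (max_vertices < 3) return batches;  // cannot hold a single triangle

    std::unordered_map<unsigned, unsigned> local;  // global id -> local id
    Batch current;
    for (std::size_t t = 0; t + 2 < indices.size(); t += 3) {
        // Count the vertices of this triangle that the batch does not hold yet.
        std::size_t fresh = 0;
        for (int k = 0; k < 3; ++k) {
            const unsigned v = indices[t + k];
            bool seen = local.find(v) != local.end();
            for (int j = 0; j < k && !seen; ++j)
                seen = (indices[t + j] == v);  // repeated within the triangle
            if (!seen) ++fresh;
        }
        // Close the batch if adding this triangle would exceed the limit.
        if (!current.triangles.empty() &&
            current.vertices.size() + fresh > max_vertices) {
            batches.push_back(current);
            current = Batch();
            local.clear();
        }
        // Append the triangle, assigning local ids to new vertices.
        for (int k = 0; k < 3; ++k) {
            const unsigned v = indices[t + k];
            auto it = local.find(v);
            if (it == local.end()) {
                it = local.emplace(
                    v, static_cast<unsigned>(current.vertices.size())).first;
                current.vertices.push_back(v);
            }
            current.triangles.push_back(it->second);
        }
    }
    if (!current.triangles.empty()) batches.push_back(current);
    return batches;
}
```

For example, the index list {0,1,2, 2,1,3, 4,5,6} with `max_vertices` set to 4 gives two batches: the first holds the two triangles that share an edge, the second holds the isolated triangle. In practice `max_vertices` would be chosen just below the cache size measured in step a).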
If your application is not CPU-bound, you can spend some time on
view-dependent algorithms that cull entire batches or build
smaller batches. If transparency is not an issue, you can build the
batches with these priorities:
a) locality, b) normal clusters ("Fast Backface Culling Using Normal
Masks", Zhang).
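As a rough illustration of culling a whole batch by its normal cluster, here is a sketch that uses a simple normal cone per batch. Note that this is a substitute technique, not the normal masks of the cited paper. It assumes unit-length normals and an orthographic view, where `view_dir` is a unit vector pointing from the eye into the scene; with a perspective projection the test would need the eye-to-batch directions and an extra margin. The names `Vec3`, `NormalCone`, `build_normal_cone` and `batch_is_backfacing` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Cone that bounds all normals of one batch.
struct NormalCone {
    Vec3 axis;       // unit average normal (zero if the normals cancel out)
    float cos_half;  // cosine of the largest angle between axis and a normal
};

// Builds the cone from the unit normals of the triangles in a batch.
NormalCone build_normal_cone(const std::vector<Vec3>& unit_normals)
{
    Vec3 sum{0.0f, 0.0f, 0.0f};
    for (const Vec3& n : unit_normals) {
        sum.x += n.x; sum.y += n.y; sum.z += n.z;
    }
    const float len = std::sqrt(dot(sum, sum));
    NormalCone cone{sum, 1.0f};
    if (len > 0.0f)
        cone.axis = Vec3{sum.x / len, sum.y / len, sum.z / len};
    for (const Vec3& n : unit_normals)
        cone.cos_half = std::min(cone.cos_half, dot(cone.axis, n));
    return cone;
}

// True only when every normal inside the cone faces away from the viewer,
// i.e. dot(n, view_dir) >= 0 for all of them, so the batch can be skipped.
bool batch_is_backfacing(const NormalCone& cone, const Vec3& view_dir)
{
    if (cone.cos_half <= 0.0f)
        return false;  // normals span a hemisphere or more: never cull
    const float sin_half =
        std::sqrt(std::max(0.0f, 1.0f - cone.cos_half * cone.cos_half));
    // angle(axis, view_dir) + half_angle <= 90 degrees
    return dot(cone.axis, view_dir) >= sin_half;
}
```

The cone would be built once per batch when the batches are created, so the per-frame cost is one dot product per batch. The test is conservative: a batch whose normals are widely spread is simply always drawn.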
Valins
psvalins, 2/3/2006 2:26:10 PM