as fast as possible...
With a single application doing glReadPixels() I can get ~500MiB/s, which
is great. But as soon as I start a second OpenGL application, the
transfer rate drops down to 30-40MiB/s. I guess that there's some locking
going on when two applications are using the GPU, but I don't believe
that a simple data transfer from VRAM to system memory can slow down the
GPU so much.
I thought maybe it's because both applications are using the frontbuffer,
and maybe it would be faster if I copied the frontbuffer to a second
buffer (FBO, texture etc.) and transferred the second buffer to system
memory. This would require much less locking because VRAM to VRAM is
much faster than VRAM to system memory. But... I don't know how to
transfer data from an FBO (renderbuffer) to system memory (glReadPixels
doesn't like FBO renderbuffers) or from a texture to system memory.
Can someone explain to me how to transfer data from FBOs to system memory?
Or from textures to system memory?
thanks
tom
tom | 10/1/2005 11:22:59 AM

I've made some tests. My application creates an FBO with a renderbuffer
or a texture as GL_COLOR_ATTACHMENT0_EXT and does glReadPixels() to a
local static buffer. In both cases I get 500MiB/s when I run the
application. But if I start two instances of the same application, the
transfer rate drops down to 30-40MiB/s.
That means: both application instances are reading from different memory
regions in VRAM, and yet their cumulative transfer rate is much less than
what one application alone is able to reach.
I don't believe there's so much overhead required for these simple
operations.
So someone please explain this to me...
tom | 10/1/2005 11:57:12 AM

tom wrote:
>
> I don't believe there's so much overhead required for these simple
> operations.
>
> So someone please explain this to me...
Only the people who write the drivers
know the truth...
--
<\___/>
/ O O \
\_____/ FTB. For email, remove my socks.
In science it often happens that scientists say, 'You know
that's a really good argument; my position is mistaken,'
and then they actually change their minds and you never
hear that old view from them again. They really do it.
It doesn't happen as often as it should, because scientists
are human and change is sometimes painful. But it happens
every day. I cannot recall the last time something like
that happened in politics or religion.
- Carl Sagan, 1987 CSICOP keynote address
fungus | 10/1/2005 1:39:37 PM

fungus wrote:
> tom wrote:
>> I don't believe there's so much overhead required for these simple
>> operations.
>>
>> So someone please explain this to me...
>
> Only the people who write the drivers
> know the truth...
>
Even those people won't tell me...
I've replaced glReadPixels() with glGetTexImage() and it's faster. I
can get 800MiB/s when transferring data from texture to system memory.
But when I try to copy the frontbuffer to the texture (using
glCopyTexImage2D()), it gets _really_ slow. glCopyTexImage2D() does only
~3MiB/s (less than 1 fps).
The good thing is that when I run two instances of this new application,
the transfer rate drops to 400MiB/s which is exactly half of the
transfer rate when only one application is running.
Now I only need to figure out how to copy the frontbuffer to the texture.
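To put the rates above in perspective, it helps to convert between transfer rate and frame rate. The sketch below assumes a 1280x1024 capture with 4 bytes per pixel (the size used in the code later in this thread; it is not stated for these measurements), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Size of one frame in bytes, assuming tightly packed pixels. */
static size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Frames per second that a given transfer rate in MiB/s can sustain. */
static double frames_per_second(double mib_per_second, size_t bytes_per_frame)
{
    assert(bytes_per_frame > 0);
    return mib_per_second * 1024.0 * 1024.0 / (double) bytes_per_frame;
}
```

Under these assumptions one frame is exactly 5 MiB, so 800MiB/s corresponds to 160 frames per second and 3MiB/s to 0.6 frames per second, which is consistent with the "less than 1 fps" observation.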
tom | 10/1/2005 1:50:17 PM

tom wrote:
> glCopyTexImage() does only ~3MiB/s (less than 1 fps).
>
Sounds like your data formats don't match and
are being converted. Look at the texture's internal
format with glGetTexLevelParameteriv()
(pname GL_TEXTURE_INTERNAL_FORMAT).
fungus | 10/1/2005 5:05:41 PM

fungus wrote:
> tom wrote:
>
>> glCopyTexImage() does only ~3MiB/s (less than 1 fps).
>>
>
> Sounds like your data formats don't match and
> are being converted. Look at the texture internal
> format with glGetTexLevelParameter()
>
all formats are GL_RGBA...
I have two FBOs each with a texture as GL_COLOR_ATTACHMENT0_EXT.
This is my function:
void __glCaptureGet(void)
{
    // unbind FBO, read from the frontbuffer
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    glReadBuffer(GL_FRONT);
    glDrawBuffer(GL_FRONT);

    // fill the frontbuffer with garbage
    glClearColor(1.0, 0.0, 0.0, 1.0);
    glClear(GL_COLOR_BUFFER_BIT);
    glFlush();
    glXSwapBuffers(dpy, win);

    // bind the texture from FBO '0'
    glBindTexture(GL_TEXTURE_2D, textureBuffers[0]);
    // copy from frontbuffer to texture '0'
    glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 0, 0,
                     __glCaptureWidth, __glCaptureHeight, 0);

    // bind FBO '0', read from texture '0'
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, frameBuffers[0]);
    glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
    glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);

    // bind texture '1'
    glBindTexture(GL_TEXTURE_2D, textureBuffers[1]);
    // read from FBO '0' (texture '0')
    glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 0, 0,
                     __glCaptureWidth, __glCaptureHeight, 0);

    // copy texture '1' to local memory
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA,
                  GL_UNSIGNED_BYTE, (char *) __frameBuffer);
}
If I comment out the last line (glGetTexImage) I can get 1000 iterations
per second (calls to this function per second).
If I comment out everything from "bind FBO '0', read from texture '0'"
down to the end I can get twice as much (2000 iterations per second),
which sounds reasonable because I do only one glCopyTexImage2D instead
of two.
But if I read to a texture and then transfer the pixels to main memory
(i.e. uncomment the last line), the performance drops down to <1
iteration per second.
If I read to one texture and then bind the other texture and try to copy
those pixels to main memory, I'm at 140 iterations per second.
I don't understand what's going on...
tom | 10/1/2005 6:07:35 PM

I did some more testing... and all basically leads to this conclusion:
void fastFunction()
{
    // bind a texture
    glBindTexture();
    // update the whole texture
    glTexSubImage2D();
    // transfer the texture data back
    glGetTexImage();
}

void slowFunction()
{
    // bind a texture
    glBindTexture();
    // copy data from the frontbuffer to the texture
    glCopyTexImage2D();
    // transfer data to system memory
    glGetTexImage();
}
I can call fastFunction() up to ~45 times per second, and this drops
down to ~20 when I start glxgears (which should be OK because glxgears
also uploads data to the VRAM).
OTOH, I can call slowFunction() only 1.4 times per second (one execution
takes something under one second).
So please someone explain to me... Why is fastFunction() faster? It does
more work, it transfers data from system memory to VRAM and then back.
slowFunction() only copies pixels from the frontbuffer to the texture, and
that is VRAM to VRAM, which should be really fast.
Should I file a bug report?
tom | 10/1/2005 10:51:17 PM

tom wrote:
>
> all formats are GL_RGBA...
>
BGRA is usually faster...
fungus | 10/2/2005 3:24:56 PM

tom wrote:
>
> So please someone explain to me... Why is fastFunction() faster? It does
> more work, it transfers data from system memory to VRAM and then back.
> slowFunction() only copies pixels from frontbuffer to the texture and
> that is VRAM to VRAM which should be really fast.
>
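It must be pixel formats. The framebuffer is
likely to be in BGRA format, while your card might
be able to store/use textures in RGBA format. In
that case uploading and reading back RGBA data needs
no conversion, but copying from the framebuffer
into the texture does.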
BGRA is always the best format for PC graphics
cards.
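For illustration, this is the kind of per-pixel work a format mismatch implies if the conversion ends up on the CPU. It is only a sketch: the thread does not establish whether a given driver converts on the CPU or on the GPU, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Swap the red and blue channels of tightly packed 8-bit RGBA pixels in
 * place, turning RGBA into BGRA (or back again). */
static void swap_red_blue(unsigned char *pixels, size_t pixel_count)
{
    assert(pixels != NULL || pixel_count == 0);
    for (size_t i = 0; i < pixel_count; ++i) {
        unsigned char *p = pixels + 4 * i;
        unsigned char tmp = p[0];
        p[0] = p[2];
        p[2] = tmp;
    }
}
```

For a 1280x1024 frame this touches about 1.3 million pixels per copy, work that disappears when source and destination already share the same channel order.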
fungus | 10/2/2005 3:29:50 PM

"fungus" <umailMY@SOCKSartlum.com> wrote in message
news:P1T%e.97373$dr.48304@news.ono.com...
> tom wrote:
> >
> > So please someone explain to me... Why is fastFunction() faster? It does
> > more work, it transfers data from system memory to VRAM and then back.
> > slowFunction() only copies pixels from frontbuffer to the texture and
> > that is VRAM to VRAM which should be really fast.
> >
>
> It must be pixel formats. The framebuffer is
> likely to be BGRA format, your card might be
> able to store/use textures in RGBA format so
> no conversion is needed.
>
> BGRA is always the best format for PC graphics
> cards.
I even wonder about the "A". Without a destination alpha buffer, the driver
might have to pad every 4th byte (OP -- you ARE using unsigned bytes,
right?). Maybe BGR could be faster than BGRA in one of the transfers.
jbw
jbwest | 10/3/2005 12:18:05 AM

jbwest wrote:
>
> I even wonder about the "A". Without a destination Alpha buffer, the driver
> might have to pad every 4th byte (OP -- you ARE using unsigned bytes, right
> ?). Maybe BGR could be faster than BGRA in one of the transfers.
>
The framebuffer stores pixels in 32 bits
with or without destination alpha.
fungus | 10/3/2005 7:24:35 AM

The problem boils down to this (simplified code):

#define GL_GLEXT_PROTOTYPES  /* declare the EXT_framebuffer_object entry points */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* handle and buffer declarations left out of the simplified code;
   their types are assumed */
static int __glCaptureWidth = 1280;
static int __glCaptureHeight = 1024;
static const size_t imageDataSize = 4 * 1280 * 1024;
static GLuint frameBuffer, textureBuffer;
static unsigned char textureData[4 * 1280 * 1024];
static unsigned char __frameBuffer[4 * 1280 * 1024];
static unsigned char __textureBuffer[4 * 1280 * 1024];

void __glCaptureGet(void);

int main(int argc, char *argv[])
{
    /* window and GL context creation omitted */
    glEnable(GL_TEXTURE_2D);
    glGenFramebuffersEXT(1, &frameBuffer);
    glGenTextures(1, &textureBuffer);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, frameBuffer);
    glBindTexture(GL_TEXTURE_2D, textureBuffer);

    /* allocate the texture, then fill it with 0x0f bytes */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, __glCaptureWidth,
                 __glCaptureHeight,
                 0, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    memset(textureData, 0x0f, 4 * 1280 * 1024);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, __glCaptureWidth,
                    __glCaptureHeight,
                    GL_RGBA, GL_UNSIGNED_BYTE, textureData);

    /* attach the texture as the color buffer of the FBO */
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, textureBuffer, 0);
    GLenum status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT);
    if ( status != GL_FRAMEBUFFER_COMPLETE_EXT ) {
        fprintf( stderr, "Framebuffer error: 0x%x\n", status );
        exit( 1 );
    }
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);

    __glCaptureGet();
    return 0;
}

void __glCaptureGet(void)
{
    /* clear the FBO (and with it the attached texture) to red */
    glBindTexture(GL_TEXTURE_2D, 0);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, frameBuffer);
    glClearColor(1.0, 0.0, 0.0, 1.0);
    glClear(GL_COLOR_BUFFER_BIT);
    glFinish();

    /* read the pixels back through the FBO */
    glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
    glReadPixels(0, 0, __glCaptureWidth, __glCaptureHeight, GL_BGRA,
                 GL_UNSIGNED_BYTE, (char *) __frameBuffer);

    /* read the same pixels back through the texture */
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    glBindTexture(GL_TEXTURE_2D, textureBuffer);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_BGRA, GL_UNSIGNED_BYTE,
                  (char *) __textureBuffer);

    int cmp = memcmp(__textureBuffer, __frameBuffer, imageDataSize);
    if (cmp) {
        fprintf( stderr, "data different\n" );
        fprintf( stderr, "textureBuffer %d:%d:%d:%d\n",
                 __textureBuffer[0], __textureBuffer[1],
                 __textureBuffer[2], __textureBuffer[3] );
        fprintf( stderr, "frameBuffer %d:%d:%d:%d\n",
                 __frameBuffer[0], __frameBuffer[1],
                 __frameBuffer[2], __frameBuffer[3] );
        exit( 1 );
    }
}
Why am I getting this output?
textureBuffer 15:15:15:15 <- wrong default texture pixel (0x0f == 15)
frameBuffer 0:0:255:255 <- correct BGRA pixel
glGetTexImage() returns different (wrong) data than glReadPixels() yet
both read from the same source...
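The two printed pixels can be reproduced by hand, which shows that glReadPixels() sees the cleared framebuffer while glGetTexImage() still returns the bytes uploaded with glTexSubImage2D(). A small sketch, assuming the usual round(c * 255) mapping from clear color to 8-bit channels; the helper name is hypothetical.

```c
#include <assert.h>

/* Convert a floating-point RGBA color in [0, 1] to four 8-bit channels in
 * BGRA order, the order requested from glReadPixels()/glGetTexImage(). */
static void color_to_bgra8(const float rgba[4], unsigned char out[4])
{
    /* index of the RGBA source channel for each BGRA output slot */
    static const int bgra_from_rgba[4] = {2, 1, 0, 3};
    for (int i = 0; i < 4; ++i) {
        float c = rgba[bgra_from_rgba[i]];
        assert(c >= 0.0f && c <= 1.0f);
        out[i] = (unsigned char) (c * 255.0f + 0.5f);
    }
}
```

The clear color (1.0, 0.0, 0.0, 1.0) maps to 0:0:255:255, the frameBuffer line, while 15:15:15:15 is simply the 0x0f fill from memset().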
tom | 10/3/2005 7:53:16 AM

>
> BGRA is always the best format for PC graphics
> cards.
>
Hi all,
fungus is right about the format. I've made the tests myself.
By the way, why are you using an FBO and not a PBO?
You could have two, one to write into and the other to read back, very
much like NVIDIA's PBO example.
http://download.developer.nvidia.com/developer/SDK/Individual_Samples/featured_samples.html#TexturePerformancePBO
Perhaps I misunderstood what you want to do, but if you need to use
an FBO, why aren't you rendering directly to it instead of rendering to
the front buffer and then copying it?
The other thing is about glFlush()... If you are swapping buffers you
don't need to call flush. I've made some tests with Intel VTune, and
glFlush() slows down your app.
wpr
wpr | 10/4/2005 10:02:35 AM

Hi,
Sorry about the glFlush... I meant glFinish. (I should stop drinking
coffee)
wpr.
wpr | 10/4/2005 11:26:02 AM

Hi,
Introduce some glGetError() calls and you will see that your code returns
an error.
In main(), for example, you are trying to create a rectangular texture
(1280 is not a power of two).
GLenum errCode = glGetError();
if (errCode != GL_NO_ERROR)
{
    const GLubyte *errorString = gluErrorString(errCode);
    printf("%s\n", (const char *) errorString);
}
As I said before, use PBO. You should check NVIDIA's example.
wpr.
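The "rectangular texture" remark presumably refers to the 1280x1024 size: in OpenGL versions before 2.0, and without the ARB_texture_non_power_of_two extension, GL_TEXTURE_2D dimensions must be powers of two. A small sketch of that check (the helper names are hypothetical):

```c
#include <assert.h>

/* Returns 1 if n is a positive power of two, 0 otherwise. */
static int is_power_of_two(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Returns 1 if both texture dimensions are powers of two, 0 otherwise. */
static int is_pot_texture_size(unsigned int width, unsigned int height)
{
    return is_power_of_two(width) && is_power_of_two(height);
}
```

Here 1024 passes and 1280 does not, so a 1280x1024 GL_TEXTURE_2D needs non-power-of-two support; where that is missing, glTexImage2D() is expected to fail with GL_INVALID_VALUE, which a glGetError() check would reveal.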
wpr | 10/4/2005 11:41:40 AM

tom wrote:
>
> The problem emerged down to this (simplified code):
>
> [bug description + source code]
>
>
This is a confirmed bug in the Linux NVIDIA drivers...
It was fixed within 2 hours in Mesa. I'm wondering how long it will take
NVIDIA to fix it...
tom | 10/4/2005 12:27:59 PM