Hi,
I have a game engine in which you can use
glRotate/glTranslate/glPushMatrix/glPopMatrix and so on.
However, you can also use glMultMatrix.
So, in order to get a point in camera space, I have to issue all
these gl... commands, then call glGetMatrix, and only then can I, e.g., test
for frustum culling.
Now, would it be wise to perform all the glRotate stuff myself and
thus have the matrix already at hand, or is it better to let OpenGL
do this and use glGetMatrix?
--
-Gernot
int main(int argc, char** argv) {printf
("%silto%c%cf%cgl%ssic%ccom%c", "ma", 58, 'g', 64, "ba", 46, 10);}
Gernot | 10/29/2007 3:56:23 PM

Gernot Frisch wrote:
> Now, would it be wise to perform all the glRotate stuff myself
> and thus, have the matrix already at hand, or is it better to
> let OpenGL do this and use glGetMatrix?
Do it yourself and just put the matrix into OpenGL with
glLoadMatrix. The matrix stack and everything else still works,
and, as you already figured out, it gives you the advantage of
having the matrix already at hand when you need it.
Also, none of OpenGL's matrix functions are hardware accelerated
anyway, so you can actually gain some performance if you
reimplement them yourself.
Wolfgang Draxinger
--
E-Mail address works, Jabber: hexarith@jabber.org, ICQ: 134682867
Wolfgang | 10/29/2007 4:58:09 PM

On Oct 29, 4:56 pm, "Gernot Frisch" <M...@Privacy.net> wrote:
>
> Now, would it be wise to perform all the glRotate stuff myself and
> thus, have the matrix already at hand,
yes.
--
<\___/>
/ O O \
\_____/ FTB. Remove my socks for email address.
fungus | 10/29/2007 5:54:45 PM

> Do it yourself and just put the matrix into OpenGL with
> glLoadMatrix. The matrix stack and everything else still works,
> but as you already figured out it gives you the advantage in
> having the matrix already at hand when you need it.
>
> Also all matrix functions of OpenGL are not HW accelerated
> anyway, so you can actually gain some performance if you do
> reimplement them yourself.
Works fine. One problem:
The order in which I had stored the glRotate calls was reversed.
Assume you have a pointer pointing along the +z axis.
Then:
glRotate(90, -1,0,0)
makes it point along the +y axis. Fine.
Now I want to rotate "this" thing around the z axis:
glRotate(90, 0,0,-1)
should rotate it so that it points along the x axis.
But OpenGL does it the other way round.
My brain is too small to understand "why" the reverse order?
Gernot | 10/30/2007 1:57:38 PM

Gernot Frisch wrote:
> My brain is too small to understand "why" the reverse order?
Left-sided vs. right-sided matrix multiplication.
The idea behind the reverse order of operations in OpenGL is that
one can create transformation hierarchies. E.g. first position
the model of the android in space, then rotate its waist, the
shoulder, elbow and so on. But if you really had to position an
android model, the math would run the other way round from how one
thinks of it. To accommodate this habit, OpenGL performs right-side
multiplication, i.e. the transformation you issue goes on the right
side of the matrix multiplication. The effect is that the order of
operations is exactly the other way round.
In the Red Book there's a chapter of its own about it (somewhere towards
the end).
Wolfgang Draxinger
Wolfgang | 10/30/2007 2:22:54 PM

Wolfgang Draxinger wrote:
> Also all matrix functions of OpenGL are not HW accelerated
> anyway, so you can actually gain some performance if you do
> reimplement them yourself.
>
Hello,
I'm not sure I understand this correctly.
Does that mean that a 32-bit CPU performs matrix multiplications
faster than a 256-bit GPU?
Regards
Krzysiek
ISO | 10/30/2007 6:34:02 PM

On Oct 30, 8:34 pm, Krzysiek Sołek
<ksolek191.USU...@TOTEZUSUN.poczta.onet.pl> wrote:
> Wolfgang Draxinger wrote:
>
> > Also all matrix functions of OpenGL are not HW accelerated
> > anyway, so you can actually gain some performance if you do
> > reimplement them yourself.
>
> Hello,
> I don't know if I understand this correctly.
> Does that mean that a 32-bit CPU performs matrix multiplications
> faster than a 256-bit GPU?
>
> Regards
> Krzysiek
It doesn't matter, as long as the GPU does not collapse the matrix
stack.
The strength of a GPU is parallelism: you have hundreds of
multiply-accumulate units running in parallel. That's great, but collapsing
the matrix stack isn't a task that benefits much from such an arrangement.
A GPU's parallelism shines when you have thousands or millions of
vertices and fragments that all execute the same program. Setup work
like collapsing the matrix stack is a one-time, per-draw-command overhead.
Transferring the WHOLE matrix stack, plus the commands saying what to do
with it, into the GPU's local memory only to do a few multiply-accumulate
ops isn't really worth the hassle.
Modern x86 CPUs have SSE instructions, which use 128-bit
(xmm) registers. The 32 bits you mention give me a mental image of
the x87 floating-point stack. The 256 bits, on the other hand, draw a
blank stare from me: most GPUs process either 32-bit scalars (scalar engine)
or 4 x 32-bit scalars (vector engine) at a time. My best guess is that
you mean some specific GPU has a 256-bit-wide internal bus?
Scalar engines are becoming more common. There were a few years
when SIMD was all the rage in GPU design, but it's wearing
off: the data applications send for processing is often vec3
(position) and vec2 (texcoords), and generating code that keeps
all scalar elements of a vec4 busy is not an easy task for a
compiler (by 'not easy' I mean impossible with a lot of the code
developers actually write).
It would've been more, let's say, inviting for developers to
"optimize" code for SIMD if GLSL had only had vec4. Well,
that wasn't the case: float, vec2 and vec3 were also exposed,
because those are basic data types all applications have used since year
rock and scissors.
So what the hardware guys did was build a scalar engine instead, where
a crossbar distributes the computations to scalar units.
A vec3+vec3 operation consumes precisely _3_ scalar units.
No waste. On a vector engine the same operation would consume one
vector unit, wasting 25% of it, unless some other scalar add operation
could be fused with the vec3+vec3; but then the data would have to be
swizzled so that it ends up in the same register before the operation,
and so on. This kind of thing can get really nasty and increase the number
of ALU instructions generated, again degrading performance.
So it's lose-lose no matter what approach you take.
A scalar engine, on the other hand, is always optimal. The downside is
that you need extra logic to implement the crossbar, but
the idea is that this extra logic puts the units to better
use, so you get a better return on the investment. This is
where "unified shaders" come into the picture: since all computation
goes into this scalar ALU array, fragment and vertex programs are
just interfaces... The problem with this is that a dedicated fragment ALU
instruction could be tighter in implementation; the generic
unified ALU design is slightly wasteful from that point of view, but when
you throw enough power at it, at least all of it can be utilized.
So from this angle the 256-bit-wide (whatever you meant by that)
argument falls apart: it's actually 32-bit scalar vs. 32-bit scalar /
128-bit 4 x 32-bit scalar competition at best. The GPU has a lot more
ALUs to do the computations; it wins every time in parallel
computation.
The problem with a general-purpose CPU is the issue rate: the processing
is serial, and each instruction depends on the state of the processor
before that specific instruction. You are limited by the rate at which
you can feed the CPU work. From a practical point of view, the GPU
doesn't have this bottleneck.
Back to the matrix stack collapse: it is not a problem you can
easily parallelize. It is serial, like the fragment program for one
specific fragment, but here there is only one instance of the
program being executed. So from that point of view the CPU suddenly
isn't at much of a disadvantage at all. Also, doing this on the CPU
keeps the GPU side of the driver code a bit simpler. Simpler without
any performance disadvantage is a good thing.
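To illustrate why there is little to parallelize: a CPU-side matrix stack in the spirit of glPushMatrix/glPopMatrix/glMultMatrixf keeps the already collapsed product on top at all times, so each call costs at most one 4x4 product and a draw call only has to read the top entry. A sketch with hypothetical names and an assumed fixed depth; it is not how any particular driver implements it:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed depth; OpenGL itself guarantees a modelview stack of at least 32. */
#define STACK_DEPTH 32

/* Column-major 4x4 matrix, element (row r, column c) at m[c * 4 + r]. */
typedef struct { float m[16]; } Mat4;

typedef struct {
    Mat4 entries[STACK_DEPTH];
    int top;  /* index of the current matrix */
} MatStack;

static void stack_init(MatStack *s) {
    memset(s, 0, sizeof *s);
    Mat4 *id = &s->entries[0];
    id->m[0] = id->m[5] = id->m[10] = id->m[15] = 1.0f;
    s->top = 0;
}

/* Like glPushMatrix: duplicate the current matrix. */
static void stack_push(MatStack *s) {
    assert(s->top + 1 < STACK_DEPTH && "matrix stack overflow");
    s->entries[s->top + 1] = s->entries[s->top];
    s->top++;
}

/* Like glPopMatrix: discard the current matrix. */
static void stack_pop(MatStack *s) {
    assert(s->top > 0 && "matrix stack underflow");
    s->top--;
}

/* Like glMultMatrixf: current = current * m. The top of the stack is
   therefore always the already "collapsed" product. */
static void stack_mult(MatStack *s, const Mat4 *m) {
    const Mat4 *cur = &s->entries[s->top];
    Mat4 out;
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += cur->m[k * 4 + r] * m->m[c * 4 + k];
            out.m[c * 4 + r] = sum;
        }
    s->entries[s->top] = out;
}

/* The matrix that would be handed to the GPU for the next draw call. */
static const Mat4 *stack_top(const MatStack *s) { return &s->entries[s->top]; }
```

Each multiplication depends on the result of the previous one, which is exactly the serial dependency chain described above.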
Hope this gives the right mental image of the situation. Good luck.
aku | 10/30/2007 11:03:34 PM

> I don't know if I understand this correctly.
> Does that mean that a 32-bit CPU performs matrix multiplications
> faster than a 256-bit GPU?
>
CPU: 3 GHz
GPU: 600 MHz
Also, you're mixing up bus width with register size.
A Pentium CPU has 32-bit registers but a
64-bit data bus.
Besides, sending the data to the GPU has a lot of
overhead: copy the data to a buffer, set up the DMA
controller, transfer the data, etc. The CPU will have
finished the multiply long before the data even gets
to the GPU.
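To put this in numbers: a 4x4 matrix product is only a few dozen arithmetic operations, and the clock ratio fungus quotes favors the CPU by a factor of five (clock ratio only; this ignores how many instructions each chip issues per clock):

```latex
(AB)_{ij} = \sum_{k=1}^{4} a_{ik}\, b_{kj}, \qquad i, j \in \{1, \dots, 4\}

\text{multiplications: } 16 \cdot 4 = 64, \qquad \text{additions: } 16 \cdot 3 = 48

\frac{3\ \text{GHz}}{600\ \text{MHz}} = 5
```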
fungus | 10/31/2007 4:11:02 AM

Krzysiek Sołek wrote:
> Hello,
> I don't know if I understand this correctly.
> Does that mean that a 32-bit CPU performs matrix multiplications
> faster than a 256-bit GPU?
No, it means that any CPU you'll find today will do a 4x4 matrix
multiplication in much less time than it takes to map the
data from application memory to DMA memory, initiate the copy
process, send the calculation commands to the GPU and do the
reverse steps to get the data back into application memory
on the CPU. The keyword is "overhead".
Of course any modern GPU can do matrix multiplications a lot
faster than any CPU, but that only makes sense if they operate on
GPU memory.
Since all OpenGL matrix manipulation operations are executed in
CPU context, all calculations are done by the CPU (i.e. you don't
have that overhead), and the resulting matrix is sent to the GPU
when a primitive drawing call is issued (i.e. glBegin, glDrawArrays,
glDrawElements), just like you'd specify auxiliary matrices for
a shader with uniforms (technically the very same functions are
used).
glPushMatrix will also copy a matrix to the GPU, but again not
immediately; it's deferred, since the overhead of sending data to
the GPU can be reduced by sending a lot of data in batches.
Wolfgang Draxinger
Wolfgang | 10/31/2007 11:43:07 AM

On Oct 31, 1:03 am, aku ankka <ju...@liimatta.org> wrote:
Summary of the above. ;-)
case 1: "how the cpu works"
work ----> CPU -----> result
case 2: "how the gpu works"
work ----> GPU -----> result
work ----> GPU -----> result
work ----> GPU -----> result
work ----> GPU -----> result
work ----> GPU -----> result
... etc., possibly hundreds of times :)
case 3: "what the gpu can do for the specific problem"
work ----> GPU -----> result
The CPU is *faster* for this specific case; let's say we put a 2.6 GHz
Core 2 against a 650 MHz GPU. If both can issue one instruction per clock,
the CPU's issue rate is higher by a nice factor: the GPU's
parallelism advantage is eliminated.
There are details like how many instructions per clock each can issue,
how many ALUs they have to realize that issue rate, and so on. The GPU
*could* compute each matrix simultaneously with its multiple scalar/vector
ALUs and then also concatenate them in parallel, but, so what? ;)
aku | 10/31/2007 12:17:19 PM

aku ankka wrote:
> Hope this gives the right mental image of the situation. Good luck.
>
Hi,
Thank you for your explanation. I really appreciate it. Now I have a
better picture of the issues that I was not aware of.
Also many thanks to Wolfgang Draxinger and fungus in the other threads!!
Regards
Krzysiek
ISO | 10/31/2007 5:27:48 PM

> Of course any modern GPU can do matrix multiplications a lot
> faster than any CPU, but that only makes sense if they operate on
> GPU memory.
It's funny: a few years ago everyone was saying to let OpenGL handle the matrix
maths, and now it's "do it yourself". Here's one question: how will combined
CPU/GPU chips change things? Will the pendulum swing back to letting OpenGL
do the work?
--
Charles E. Hardwidge
Charles | 11/3/2007 7:02:55 AM

Charles E Hardwidge wrote:
> It's funny, a few years ago everyone was saying let OpenGL
> handle the matrix maths
The main reason for that was that doing the math 'yourself'
caused non-negligible overhead at that time when copying the
matrix over with glLoadMatrix: e.g. for the lighting calculations
the matrix has to be inverted every time. If you use
glRotate/glTranslate/glScale instead, the general inversion can be
avoided, since for each of those operations there is a
computationally easier way to invert it than the general
solution (transpose the rotation matrix, flip the sign of x,y,z
in the translation matrix, use the reciprocal values for scale, and
then left-multiply onto the inverse matrix stack).
However, today's CPUs are so fast that manually done matrix
computation easily outperforms the call overhead of OpenGL's
matrix manipulation functions.
> now it's do it yourself. Here's one
> question: how will the combined CPU/GPU chips change things?
> Will the pendulum swing back to letting OpenGL do the work?
Unlikely. At the time of the old recommendation, CPUs did not have much
floating-point processing power, so it was critical to
spend as few CPU cycles as possible on matrix stuff. Compared
with FPU operations, function calls had little overhead.
Nowadays CPUs have multiple parallel pipelines, performing not just
one but multiple operations per cycle. However, any function
call or long jump will disrupt the pipeline, thus slowing the
CPU down. And this will not change with future CPUs: either
they have an out-of-order execution (OOE) engine built in (Intel
architecture) or they rely on the compiler to do the reordering (MIPS),
but what all modern architectures have, and probably will keep having,
in common is that they are highly parallel and execute things out of order.
Now here comes something that I think John Harrop will agree with
me on: I think that we will see a lot more use of declarative
programming in the future. Already today the actually executed
operations may differ a lot in order from what has been
imperatively coded. However, declarative languages
(e.g. functional ones: OCaml, Haskell, Clean, Lisp and
similar) allow compilers to do a lot more optimizing for OOE.
This doesn't mean that imperative programming is obsolete: you still
need it if you have to define the exact order of things to do, or if
you have to write the backend/runtime for a declarative
environment.
Wolfgang Draxinger
Wolfgang | 11/3/2007 10:53:59 AM

"Wolfgang Draxinger" <wdraxinger@darkstargames.de> wrote in message
news:7j0uv4-tac.ln1@darkstargames.dnsalias.net...
Thanks, Wolfgang. The reasoning and issues you're putting forward look sound
enough. It just took a while to sink in.
--
Charles E. Hardwidge
Charles | 11/8/2007 3:03:58 AM