The many faces of the perspective projection matrix

One of the first things I stumbled upon at the beginning of my adventure with graphics programming was the variety of matrices and view spaces. I remember it took me a while to wrap my head around the different naming conventions (is clip space the same as screen space or…?) and how each and every projection worked from a theoretical standpoint. With the Internet around it’s so much easier to figure things out, but there’s one thing that I remember baffling me: the relation between different forms of the perspective projection matrix.

The most popular (and dare I say: the only one?) representation of a projection matrix that you can find in decent graphics and math books today is the API-agnostic form (written here using OpenGL’s convention of looking down the negative z-axis):

[ 2n/(r - l),          0,  (r + l)/(r - l),            0 ]
[          0, 2n/(t - b),  (t + b)/(t - b),            0 ]
[          0,          0, -(f + n)/(f - n), -2fn/(f - n) ]
[          0,          0,               -1,            0 ]

where: 
r, l, t, b - respective planes of the view frustum (right, left, top, bottom)
f - far plane
n - near plane
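
Not part of the usual derivation, but to make the layout concrete, here’s a minimal sketch of how this form could be built in code – assuming a row-major float[16] array and OpenGL’s clip-space conventions:

// Builds the glFrustum-style projection matrix into m (row-major float[16]).
// Assumes the OpenGL convention: camera looks down -z, clip z in [-1, 1].
void FrustumMatrix(float l, float r, float b, float t, float n, float f, float* m)
{
    for (int i = 0; i < 16; ++i) m[i] = 0.0f;

    m[0]  =  2.0f * n / (r - l);     // x scale
    m[2]  =  (r + l) / (r - l);      // x shift for asymmetric frusta
    m[5]  =  2.0f * n / (t - b);     // y scale
    m[6]  =  (t + b) / (t - b);      // y shift for asymmetric frusta
    m[10] = -(f + n) / (f - n);      // z remap into [-1, 1]
    m[11] = -2.0f * f * n / (f - n);
    m[14] = -1.0f;                   // copies -z into w for the perspective divide
}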

This matrix is the result of transforming a truncated pyramid frustum into the canonical view volume (a unit cube) – a process slightly more complicated than a regular orthographic projection and one requiring a bit of math work (which I will skip here, as you can find plentiful reference material on the Internet). Let’s assume for a second that it all makes sense to you: you understand how each element of the matrix came to be and how the entire thing works (no, really – read the full math derivation and try to understand it; it’ll help!). If you’re a beginner in graphics programming, one of the first example implementations of a perspective projection matrix you encounter will probably use this form instead (assuming we’re talking OpenGL):

[ (1/r)cot(fov/2),          0,                0,            0 ]
[               0, cot(fov/2),                0,            0 ]
[               0,          0, -(f + n)/(f - n), -2fn/(f - n) ]
[               0,          0,               -1,            0 ]

where: 
fov - field of view
r   - aspect ratio (not to be confused with the right plane from the previous form)
f   - far plane
n   - near plane

(if you happen to see a tan() being used instead of cot(), remember that one function is the reciprocal of the other – cot(x) = 1/tan(x) – so the final forms may differ slightly in the trig part)

Wait… what?

There’s very little explanation available out there concerning the relation between these two forms. The math behind each is there – you can still find it and get a grip on how the matrix works – but how do two different representations produce the same output? And which one is the “better” one that I should use? The key is simply to understand that you arrive at each form using different input parameters:

– The first matrix is derived from the n and f planes plus the actual dimensions of the view frustum, defined by the r, l, t and b planes (here, both the aspect ratio and the field of view can be extracted from the matrix for the given frustum size).
– The second matrix takes the same n and f planes but, instead of the frustum dimensions, uses the desired view aspect ratio and the desired field of view – see the sketch below.
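
To make the connection concrete, here’s a small sketch (mine, not from any book) showing that for a symmetric frustum the two parameterizations produce the same matrix. It reuses the FrustumMatrix() helper from earlier and assumes the same row-major layout:

#include <cmath>
#include <cstdio>

// FOV-based form, equivalent to the classic gluPerspective().
// fovY is the vertical field of view in radians.
void PerspectiveMatrix(float fovY, float aspect, float n, float f, float* m)
{
    const float cotHalfFov = 1.0f / std::tan(fovY * 0.5f); // cot(fov/2)
    for (int i = 0; i < 16; ++i) m[i] = 0.0f;
    m[0]  = cotHalfFov / aspect;
    m[5]  = cotHalfFov;
    m[10] = -(f + n) / (f - n);
    m[11] = -2.0f * f * n / (f - n);
    m[14] = -1.0f;
}

int main()
{
    const float fovY = 1.0f, aspect = 16.0f / 9.0f, n = 0.1f, f = 100.0f;

    // derive the symmetric frustum planes from fov and aspect...
    const float t = n * std::tan(fovY * 0.5f); // top (so n/t = cot(fov/2))
    const float r = t * aspect;                // right (so n/r = cot(fov/2)/aspect)

    float a[16], b[16];
    FrustumMatrix(-r, r, -t, t, n, f, a);      // ...and both forms agree:
    PerspectiveMatrix(fovY, aspect, n, f, b);

    for (int i = 0; i < 16; ++i)
        if (std::fabs(a[i] - b[i]) > 1e-6f) { std::puts("mismatch!"); return 1; }
    std::puts("identical matrices");
    return 0;
}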

The second form is therefore easier to work with and requires less code, making it the most commonplace in real life. This is especially visible in FPS games, where we want smooth and fast control over the player’s FOV. Bottom line? Being able to express the same thing in different ways is a powerful tool, but also one that can easily confuse everyone using it 🙂


Oculus Rift DK2 (SDK 0.6.0.1) and OpenGL ES 2.0

Recently I’ve been working on a VR port of Rage of the Gladiator, a game that was originally released for mobile devices and used OpenGL ES 2.0 as its rendering backend. This seemingly simple task soon produced several fun problems stemming from the limitations of this API in relation to “full-fledged” OpenGL. My initial idea was to rewrite the entire renderer, but this approach quickly turned out to be a dead end (suffice to say, the original codebase was slightly convoluted), so I decided to stick with the original implementation. To run an OpenGL ES application on a PC I used the PowerVR SDK, which is an excellent emulation of a mobile rendering environment on a desktop computer.

Once I got the game up and running, I started figuring out how to plug in my existing Oculus code to get proper output both on the device and in the mirroring window. Rendering to the Rift worked pretty much out of the box – it only required changing the depth buffer internal format of each eye buffer to GL_DEPTH_COMPONENT16 (from the “default” GL_DEPTH_COMPONENT24). Creating a proper mirror output was a whole different story, and while not excessively complicated, it did require some workarounds to get working. Here’s a list of things I ran into – something to consider if you ever decide to use OpenGL ES in your VR application (but why would you, anyway? 🙂 ):

1. Replacement for glBlitFramebuffer()

Starting with Oculus SDK 0.6.0.0, rendering the mirror texture to the window is as easy as getting the system-handled swap texture and performing a blit to the window back buffer:

    // Blit mirror texture to back buffer
    glBindFramebuffer(GL_READ_FRAMEBUFFER, m_mirrorFBO);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
    GLint w = m_mirrorTexture->OGL.Header.TextureSize.w;
    GLint h = m_mirrorTexture->OGL.Header.TextureSize.h;

    // perform the blit with a vertical flip (note the swapped y coordinates of the source rect)
    glBlitFramebuffer(0, h, w, 0, 0, 0, w, h, GL_COLOR_BUFFER_BIT, GL_NEAREST);

    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);

With OpenGL ES 2.0 you will soon notice that glBlitFramebuffer() is not present. This causes more complications than it may seem at first, because now you have to manually render a textured quad – which, while not particularly difficult, is still a lot more code to write:

// create VBO for the mirror - call this once before BlitMirror()!
void CreateMirrorVBO()
{
    const float verts[] = { // quad vertices
                            -1.0f, 1.0f, 1.0f, 1.0f, -1.0f, -1.0f, 1.0f, -1.0f,

                            // quad tex coords
                            0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f,

                            // quad color
                            1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
                            1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f
    };

    glGenBuffers(1, &mirrorVBO);
    glBindBuffer(GL_ARRAY_BUFFER, mirrorVBO);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
}

void BlitMirror()
{
    // bind a simple shader rendering a textured (and optionally colored) quad
    ShaderManager::GetInstance()->UseShaderProgram(MainApp::ST_QUAD_BITMAP);

    // bind the stored window FBO - why stored? See 2.
    glBindFramebuffer(GL_FRAMEBUFFER, platform::Platform::GetFBO());
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, m_mirrorTexture->OGL.TexId);

    // we need vertex, texcoord and color - used by the shader
    glEnableVertexAttribArray(VERTEX_ARRAY);
    glEnableVertexAttribArray(TEXCOORD_ARRAY);
    glEnableVertexAttribArray(COLOR_ARRAY);

    // all three attributes live in one tightly packed VBO (see CreateMirrorVBO)
    glBindBuffer(GL_ARRAY_BUFFER, mirrorVBO);
    glVertexAttribPointer(VERTEX_ARRAY,   2, GL_FLOAT, GL_FALSE, 0, (const void*)0);
    glVertexAttribPointer(TEXCOORD_ARRAY, 2, GL_FLOAT, GL_FALSE, 0, (const void*)(8 * sizeof(float)));
    glVertexAttribPointer(COLOR_ARRAY,    3, GL_FLOAT, GL_FALSE, 0, (const void*)(16 * sizeof(float)));

    // set the viewport and render textured quad
    glViewport(0, 0, WINDOW_WIDTH, WINDOW_HEIGHT);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

    // safety disable
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glDisableVertexAttribArray(VERTEX_ARRAY);
    glDisableVertexAttribArray(TEXCOORD_ARRAY);
    glDisableVertexAttribArray(COLOR_ARRAY);
}

2. Keeping track of the window/screen FBO

Many complex games today make heavy use of rendering to texture for special effects and various other purposes. My experience shows that once programmers start using RTT, calls to glBindFramebuffer() start appearing at an alarming rate in various parts of the code, and switches between rendering to texture and rendering to the actual window often happen more frequently than they should. Performance impact aside, this behavior usually does not produce unwanted results – it *may* not matter whether we squeeze a render to the window in between various RTTs or not. Now, consider that the Oculus mirror render is essentially a blit of the separate eye buffers, which are then blitted again to the output window buffer, resulting in the popular distorted image you see in YouTube videos. If the rendering code performs a blit to the window in between RTTs, parts of the final image may be corrupted by weirdly overlapping images.


[Screenshot] Notice how a popup is rendered incorrectly behind the lenses due to a mid-RTT render to window.

For this reason it’s important to correctly track which FBO belongs to the window and avoid reverting to it *before* you render the entire scene – glGetIntegerv() is your friend here and can save you a lot of grief, especially in more complex drawing sections. While this is not a VR problem per se and may happen to you in regular application development, it’s definitely easier to run into in this particular case.
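
For reference, a minimal sketch of what that tracking might look like (the Platform::GetFBO() accessor used in BlitMirror() above presumably wraps something similar):

// Query the currently bound framebuffer once, right after context creation
// and before any render-to-texture work, and stash it for later.
GLint windowFBO = 0;
glGetIntegerv(GL_FRAMEBUFFER_BINDING, &windowFBO);

// ... RTT passes may rebind framebuffers freely here ...

// Only revert to the window FBO once the full scene (both eyes) is done:
glBindFramebuffer(GL_FRAMEBUFFER, (GLuint)windowFBO);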

3. Remember to disable OpenGL states after you’re done using them

Again, this is not a strictly VR-related issue, but it is one that can manifest itself right away. With VR rendering you have to remember that you essentially draw the entire scene twice – once per eye. This means that the OpenGL state left over after the first render persists during the second one, which can produce some rather baffling results. It took me quite a while to understand why the left eye rendered correctly, the right eye had messed-up textures and the mirror turned out completely black – it turned out the cause was not calling glDisable() for culling, blending and depth testing. A simple fix, but a very annoying one to track down 🙂
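
In other words, the per-eye loop should leave the pipeline in a known state. A simplified sketch (BindEyeRenderTarget() and RenderScene() are hypothetical helpers; the exact states to reset depend on what your scene render enables):

for (int eye = 0; eye < 2; ++eye)
{
    BindEyeRenderTarget(eye);   // hypothetical: binds this eye's FBO
    RenderScene(eye);           // may enable culling, blending, depth test...

    // ...so reset everything the scene render turned on, or the second eye
    // (and the mirror blit) will inherit the leftover state:
    glDisable(GL_CULL_FACE);
    glDisable(GL_BLEND);
    glDisable(GL_DEPTH_TEST);
}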

4. Don’t forget to disable V-Sync

As of today, the PowerVR SDK seems to create all render contexts with V-Sync enabled – and while this may sound surprisingly easy to detect, it did, in fact, cause me some trouble. What’s worse, the Oculus Rift didn’t seem to mind and showed a constant 75 fps in the stats, which only added to the confusion (why oh why does this one single triangle render stutter all the time?). Calling eglSwapInterval(display, 0) will solve that problem for you.
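
For completeness, a sketch of where that call might go, assuming you can reach the EGLDisplay that the PowerVR shell created:

#include <EGL/egl.h>

// After the context is created and made current, request a swap interval
// of 0 so eglSwapBuffers() no longer waits for the display's vertical sync.
EGLDisplay display = eglGetCurrentDisplay(); // assumes a current EGL context
eglSwapInterval(display, 0);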

Conclusion

In retrospect, the issues I ran into were a minor annoyance, but they clearly showed how forgetting simple things can cause a whole bunch of problems you would normally never see when performing a single render. The whole experience was also a nice indication that the current state of the Oculus SDK performs well even with limited OpenGL – even if it’s a bit gimmicky when developing for a PC.
