|
Author Topic:   Vertex arrays vs. glVertex3f
assen
Member
posted September 21, 1999 05:52 AM            
I tried moving my ROAM implementation from implicit, on-the-stack vertices,
fed into OpenGL via glVertex3f, to vertex arrays (glInterleavedArrays with
GL_T2F_V3F, to be more specific).

The results? Nada. No improvement in framerate. Still, I think this is the way
to do it, and the fact that my crappy Savage3D ICD doesn't take advantage of
vertex arrays doesn't mean a better ICD wouldn't.

The problem? Well, with implicit vertices I could store for each terrain tile
(65x65 vertices in my case - I tried 128x128, but then tile view-frustum culling
doesn't help much) only the coords of the corners of the tile's projection onto
the z=0 plane, plus an array of the z values; I calculated the x, y, u and v
coordinates while descending the triangle tree. This is probably what Seumas does.

Now with the vertex array, I need a 65x65 array of (x,y,z,u,v) tuples for
each tile. For a 1024x1024 terrain, that's 21 MB - obviously too much just
for the terrain. However, the subdivision code is simpler, since I pass only
three vertex indices instead of three vertices (or three pointers to vertices).
To calculate the index of the midpoint of a triangle's hypotenuse, I can just
average the indices of its two endpoints (think about it; it works).
So the choice is like this:
- either you burn up a ton of memory storing highly redundant data, like the
texcoords and the x,y coords, which are just a regular grid, but feed primitives
into OpenGL "the right way";
- or you feed OpenGL primitives the way everybody says is prohibitively
stupid/slow, but use only 20% of the memory.


assen
Member
posted September 21, 1999 11:07 AM            
I forgot the main question the previous post was supposed to pose: how much memory does the terrain subsystem in Tread Marks, in its present form (or rather, in the form we've seen in the demo), consume?


ostra
Member
posted September 21, 1999 03:17 PM            
Well, you could try building the vertex array only for the tile you are rendering, and render one tile at a time. And if you know the maximum number of vertices per tile, you wouldn't need to mess with dynamic allocation (as in the 21 MB case). It could be just a small array.
bye,
Marco


LDA Seumas
unregistered
posted September 23, 1999 06:59 AM           
As Marco mentioned, it should be possible to use vertex arrays without wasting all that memory. Realistically, you want to construct your arrays so that every vertex in the array is used. That's especially important when using Compiled Vertex Arrays, which are really the only way you'll see any significant speedup from vertex arrays. CVA refers to the glLockArraysEXT and glUnlockArraysEXT extension functions, which "lock" (you promise not to write any new data to the arrays you've specified) and "unlock" the arrays, allowing the driver to batch-transform the vertices and/or transform each vertex only once, even if it is referenced multiple times in the index array of a glDrawElements call.

Currently the dynamic part of Tread Marks' terrain engine uses 32 bytes per binary triangle object, and there are 2 binary triangle objects for every 1 triangle in the terrain displayed, so at the default of 3,000 triangles, that would be 192k.

------------------
-- Seumas McNally, Lead Programmer, Longbow Digital Arts


assen
Member
posted September 23, 1999 07:51 AM            
Hmmm... glLockArraysEXT? I thought I wasn't supposed to change anything in the arrays in any case...

Anyway, I thought about it again, and I realized that of the 5 values in the vertex array (x, y, z, s, t):
s and t, the texcoords, are the same for all tiles;
x and y are also the same, up to a constant offset, which can be handled with a glTranslate;
and as for the Zs, you only need to update those for the vertices active in the current tile - which you can learn from the fan accumulation phase.
So essentially I can keep just one vertex array.

BTW, glLockArraysEXT isn't mentioned in MSDN... hmm... I guess I'll have to do my homework on the Web...


LDA Seumas
unregistered
posted September 24, 1999 03:22 AM           
With normal Vertex Arrays, I believe you are allowed to muck with the contents at any time, since the values are pulled out of the array at the moment you make the glArrayElement call - which is why you saw no change in speed. Function call overhead is pretty darn minimal on modern systems, and that's pretty much all that basic Vertex Arrays reduce.
www.OpenGL.org should have some docs on Compiled Vertex Arrays. It's an extension that isn't supported on all cards, so you'll need a fallback rendering path regardless.

A single vertex array will theoretically work as you mentioned, but it still wastes a lot of memory, and will cause a lot of cache misses on any tile that isn't extremely highly tessellated. It could also be very slow when used with Compiled Vertex Arrays. CVAs are usually used where every element in the vertex array is a vertex that needs to be rendered at least once, so drivers are often optimized to transform the _entire_ array of vertices at lock or unlock time, and to then render your primitives using the transformed-and-cached vertex data. On a hardware transform card, the entire vertex array will probably be downloaded to the card and transformed there, with primitives rendered by passing the card indices. In those cases, actually using only a tiny fraction of the verts in the array would hurt a lot, as I don't imagine drivers would bother to check for that seemingly rarer case.

All that said, the best way to find out what's fast is to experiment on as many different cards as possible, or to ask the developer support folks at various card manufacturers directly.

------------------
-- Seumas McNally, Lead Programmer, Longbow Digital Arts


assen
Member
posted September 24, 1999 05:51 AM            
Here's what MSDN (I believe they copied it from some SGI docs) says about glVertexPointer:
"Your application can modify static elements, but once the elements are modified, the application must explicitly specify the array again before using the array for any rendering."
However, if it really worked that way, they wouldn't have needed to introduce CVAs anyway...

Good reading on CVAs turned out to be John Carmack's post on optimizing GL drivers for Q3A. Since that's what video card manufacturers will optimize for, it's a safe bet to do as he does.

The memory for a full vertex array, position & texcoords, for a tile of 65x65 vertices, would be around 84 KB, which is a bit too much, I admit, at least on Celerons. But I'm worried about recreating a shorter vertex array for each tile, every frame - statistics show that for said 65x65 vertices (with a theoretical maximum of 8K tris), numbers like 500-800 triangles are typical (with a total triangle budget of, say, 6000 tris). However, the point about the driver transforming all the vertices at the moment of the glLockArraysEXT call finally persuaded me. Maybe I'll keep the single-65x65-array approach for the fallback renderer.

The overhead of the CALL instruction might be negligible, but I don't think pushing/popping a few floats, fiddling with the stack frame, etc. is so cheap.

You said once that you're developing on TNTs, right? Is it worth buying a TNT, even a Vanta, for the benefit of better OpenGL drivers, with all the SGI expertise at NVIDIA and so on? More expensive cards are out of the question for me now.

Is NVIDIA developer support any friendlier when asked things like "is vertex path X faster than vertex path Y?" I tried once to contact S3 with a similar question and got back a questionnaire with things like "How many people work in your company? How many on 3D/multimedia? How many on your project? When is your project scheduled to finish? Who is your publisher?" etc...


LDA Seumas
unregistered
posted October 02, 1999 01:42 AM           
Even the pushing, popping, etc. is a pretty small hit compared to all the work the card and CPU have to do in response to those functions.

The TNT does have really good OpenGL drivers, and the Rage128's aren't bad either.

I've never really spoken with Nvidia developer relations actually... A good place to ask OpenGL questions is the OpenGL-GameDev-List (see my programming page for a link to the subscription info). Most 3D companies will make you sign an NDA before they'll give you any help, or especially any hardware.

------------------
-- Seumas McNally, Lead Programmer, Longbow Digital Arts
