Author Topic:   ROAM for dummies
posted January 10, 2001 04:16 PM         
Anybody want to write a ROAM for dummies paper? I would be the first
person to read it.


posted January 11, 2001 03:50 PM            
I think Terrain Rendering for dummies would be even better. I haven't even gotten started on quadtrees yet (almost; just starting now), let alone ROAM.......


New Member
posted January 13, 2001 09:33 PM            
I'm in. I've been researching like a madman, but I'm still not there yet. I'm at the point where I need to build or find a C++ shell that I can use to load terrain and begin testing the different techniques for improving terrain rendering performance. ROAM is on the top of my list.

Lithium, I would love to work together.


posted January 14, 2001 03:10 AM         
No, I wasn't volunteering to write it up myself; I meant somebody
else besides me.

I can get past the first couple of pages, but when it starts
getting into error metrics and throwing equations at you, I can't
make much of it.


New Member
posted January 14, 2001 04:31 AM         
These are only suggested error metrics to build the queues; you could use a metric like the one in Treadmarks: realHeight - averageHeight.

So you can actually ignore most of the equations in the error metric section and still have a functional ROAM algorithm.

I don't know why anybody would want to implement ROAM exactly as the paper presents it anyway. It's possible to have only a split queue and have it run relatively fast with lower memory usage.


posted January 15, 2001 12:11 PM         
I guess I'm having a hard time understanding the terminology in the
paper. Queues are another topic that doesn't quite make sense as it
is explained.

A couple days ago, I didn't even know what they meant by the phrase
'error metric', but I got a better understanding of it from this page:

This is how the ROAM paper should be explained, damn it.

If the Treadmarks error metric is simply real_height-aver_height,
then it doesn't even take the distance from the camera into
account? It is merely based on how much the slope of the terrain
changes? So there would be something that says: if the delta
height for this node is greater than, for example, 5 meters, subdivide
it, until the slope levels out or until we reach the finest
resolution of the terrain?

In other words, if I had two patches of terrain exactly the same,
and I placed one near the camera, and the other one far away, they
would both be rendered to the same depth and resolution?

btw, is there a standard range for error metrics? Are they bounded
by a min and a max value? The one on the Delphi page says:

"When traversing your quadtree, all you have to do is calculate the
error metric for the current node, and subdivide if the value is
less than one."

This seems to suggest that the error metric is a floating point
variable? Error metrics are not stored per node, I'm assuming; they
are calculated each frame for each node, so there is no storage?

[This message has been edited by Lithium (edited January 15, 2001).]


Klaus Hartmann
posted January 16, 2001 09:00 AM            

I'm not very familiar with ROAM, but I am familiar with other LOD algorithms for terrain. So the following may indeed differ from the ROAM paper, but it might still help you get a better picture.

First off, these LOD algorithms for terrain normally mix two types of LOD:
[1] a distance-based LOD
[2] a roughness-based LOD

The distance-based LOD has the effect that distant areas of the terrain are more coarsely tessellated than those areas that are close to the camera.
The roughness-based LOD takes the roughness of the terrain into account. That means the more features an area has, the more vertices you'll need to approximate it.

Now consider a full-resolution (no LOD) terrain mesh that is rendered perspectively onto the screen. In the distance, this terrain mesh has the same grid-spacing as near the camera (in world space). The perspective projection, however, causes the distant vertices to be closer together than the vertices near the camera. This means that rough terrain in the distance can be approximated with fewer vertices than rough terrain near the camera. If you didn't take this into account, then the result would be that multiple vertices in the distance are projected onto the same pixel on the screen (result is z-buffer artifacts). One solution to this problem is as follows:

[1] Determine the LOD of an area based on the roughness of the area (i.e. roughness-based LOD). This is the maximum LOD you'll need to properly represent the area near the camera.

[2] Make [1] depend on the distance of the area. That is, if the area is near the camera, then use the LOD from [1], because it represents maximum resolution for this area. If the area is further away from the camera, you can drop one or more levels of detail, because a rough and distant area doesn't require as many vertices as a rough and close area (remember the perspective projection thing from above).

The above two steps can be combined into one, usually represented by some formula. The result of this formula is a value that helps you to decide whether or not you may drop a level of detail. For example, you could create some formula like this:
F = R / D
where R is the roughness of the area in question, D is the distance to that area, and F is the value you use to decide whether you're going to drop a level or not. Of course, you wouldn't use this exact formula in an actual terrain engine, because I just made it up to be easy to understand. The formulas in all those terrain papers are usually more complex than the above.
Now think about it... F becomes smaller if D increases, and F becomes larger if D decreases. Hence, F already implies distance LOD. In addition, F becomes larger if R increases, and F becomes smaller if R decreases. In other words, this adds the roughness-LOD to the distance-LOD, and as a result:
If F is small, use a low-resolution level of detail, and if F is large, use a high-resolution level of detail.

So the above is the basis of almost all LOD terrain algorithms. In practice, however, things are more difficult, because:

[1] The roughness R can be based on a projected screen-space error, or it can be based on a geometric error in 3-space (like realheight-avgheight in 3-space).

[2] The above function can grow arbitrarily large, and therefore there's no way for you to tell which level of detail you're actually going to use.

[3] For adjacent areas, the above formula may produce very different values. This can have the effect that the adjacent areas differ by more than one level of detail, which makes it tough to avoid cracks, because different levels of detail mean different numbers of vertices (i.e. the areas don't fit together, and the result is cracks in the terrain). A level-of-detail difference of 1 does, of course, also introduce cracks, but those become a lot easier to eliminate. If I remember correctly, ROAM solves this problem by using split and merge queues.

Anyway, I don't know if the above helps or not, because it's not about ROAM but rather about terrain LOD in general. But I hope you got a better picture now.



posted January 16, 2001 06:08 PM         
Yes, thank you very much.

So when the ROAM paper says something like "geometric screen-space distortions",
they are referring to the distortion caused by a perspective camera view?
That seems to make sense.

It's amazing how much information this paper can pack into a single
sentence. It doesn't waste a single word.


Klaus Hartmann
posted January 17, 2001 02:39 AM            

The answer to your question about "geometric screen-space distortions" is "yes and no".

It is true that perspective projections cause distortion. This distortion is commonly referred to as "perspective distortion".

"Geometric screen-space distortion" are related to perspective distortion, but they are not quite the same thing. Personally, I would have called this "Geometric screen-space error", but it looks like Mark feels different. Anyway, this is a bit tough to explain, but I'll try.
Imagine you have two views of the same terrain (with identical camera settings). The difference between the two views is that view A shows a full-resolution mesh, whereas view B shows a low-resolution mesh (i.e. view B is an approximated version of view A). (Both of these images are, of course, affected by the perspective distortion, but that doesn't matter at the moment.)
Obviously, view A and view B are normally not identical, because view B has fewer vertices than view A. Therefore the image produced from view A looks different from the image produced from view B. So view B is equal to "view A plus some error". Anyway, in a LOD engine, view B tells you where a vertex actually is (on the screen), and view A tells you where the vertex *should* be (in order for the mesh to be as close to optimal as it gets). This error can be calculated in screen-space, and the ROAM paper says:
dist(v) = | |s(v) - sT(v)| |
where v is the vertex we want to calculate the error for. s(v) is the projection of the vertex v in view A, and sT(v) is the projection of the vertex v in view B. Thus, "s(v) - sT(v)" gives you the signed error of a vertex v introduced in view B. However, we don't care about the sign of the error, because we are only interested in the magnitude of the error, and therefore you use the absolute value "dist(v) = | |s(v) - sT(v)| |".

At this point you could ask yourself, "Why is this error computed in screen-space and not in world space?" In fact, some algorithms calculate the geometric error in world space, but that ignores the perspective distortion. Screen-space errors are normally more accurate because they represent the error in what is actually seen, not what the mesh looks like in world space (which doesn't include the perspective distortion).

I hope this made some sense to you. I had to download the ROAM paper, but I didn't have the time to read the full paper, so I started somewhere in section 6, which put things a bit out of context. I can only hope that I got it right. I'll try to read and understand the whole paper, but that'll take a bit, because I'll be away for a couple of days.



Klaus Hartmann
posted January 17, 2001 02:52 AM            
Hmmm... I think I made a mistake there, regarding "dist(v)=| |s(v) - sT(v)| |". s(v) and sT(v) are points in screen space, so "dist(v)=| |s(v) - sT(v)| |" computes the magnitude of the vector S = s(v) - sT(v). Obviously, this also represents an error.



Klaus Hartmann
posted January 17, 2001 06:31 AM            
Okay, I read the paper in the bathtub, and it was easier to understand than I expected. Of course, I don't understand every single detail yet, because I've only read it once. Anyway, I think I know enough to answer a couple of the questions you asked. However, I can't guarantee that my answers are correct, because I haven't implemented the algorithm yet.

"If the Treadmarks error metric is simply real_height-aver_height, then it doesn't even take the distance from the camera into account?"

real_height - aver_height is also used in the paper, and it does take the distance into account. It's buried in the following formula:
dist(v) = | |s(v) - sT(v)| |,
where s(v) is the real_height-vertex projected into screen-space, and sT(v) is the aver_height-vertex projected into screen-space. Their counterparts in world space are w(v) and wT(v), respectively.

dist(v) is, as already mentioned, the distance between the two screen-space points s(v) and sT(v). And now think about it... the farther w(v) and wT(v) are from the camera, the smaller the distance between s(v) and sT(v). Since dist(v) directly corresponds to the error, this means that the error for distant w(v) and wT(v) is smaller than the error for close ones. (A smaller error means a coarser LOD.)

In addition, the formula takes the roughness into account, because the larger the difference between w(v) and wT(v) the greater the roughness AND the longer the distance between s(v) and sT(v).
In one of my previous posts I mentioned the following formula:
F = R / D
Remember that one? Well in ROAM this becomes:
F(v) = dist(v)

"btw, is there a standard range for error metrics? Are they bounded by a min and a max value? The one on the Delphi page says: "When traversing your quadtree, all you have to do is calculate the error metric for the current node, and subdivide if the value is less than one.""

Hmmm... I may be wrong, but I think you are mixing up algorithms here. ROAM is based on a bintree, and the upper bound is given in equations (2) and (3) in section 6. Röttger's algorithm, on the other hand, is based on quadtrees, and it subdivides a node if the calculated error falls below 1.

If I understand this correctly, then ROAM works a bit differently. You basically compute priorities, and then perform merges and splits depending on the desired number of triangles. For example, if you want 3000 triangles, and the current triangulation has fewer than 3000 triangles, then you start to force-split the triangulation (starting with the highest priority) until you reach (or exceed) the desired number of triangles. Of course, the highest priority is directly related to the largest error (you want to minimize the error, and therefore you start by splitting the triangle with the largest error).

"This seems to suggest that the error metric is a floating point variable? Error metrics are not stored per node, I'm assuming; they are calculated each frame for each node, so there is no storage?"

Yes, the error metric is a real number. If I understand this correctly, then you don't store the errors in the tree, but rather in the priority queues.



posted January 17, 2001 06:14 PM         

Ok, I see what you mean by a worldspace height farther from the camera being
smaller in screenspace. So this means when Treadmarks uses real - aver height,
we are talking about both a worldspace error and a screenspace error.
This seems to be an important fact to understand.

In the paper,

Real height was given as:

z(vc), the height-field value at the midpoint vc

Average height was given as:

zT(vc) = ( z(v0) + z(v1) ) / 2, where

vc = midpoint of the bintree triangle's base edge
v0, v1 = endpoints of the bintree triangle's base edge

But this only calculates the worldspace error. So we have to convert it
to screenspace after this:

dist(v) = | |s(v) - sT(v)| |

How are you interpreting dist(v)? Does this represent 'dist'ance or 'dist'ortion?

btw, in the paper, I am interpreting the xyz coordinate system as:

w(v) = ( vx, vy, z(v) ), where

vx, vy = vertex v domain coords.
z(v) = v height

It sounds like z+ points up, and (x,y) are the terrain grid coordinates.
But that doesn't matter that much to me.


Klaus Hartmann
posted January 18, 2001 12:11 AM            
Cool! I have Internet access here, though I won't have much time to spend on the net
(Whoa! This keyboard is a pain...)

dist(v) = | |s(v) - sT(v)| | represents both distance and distortion.

It represents distance, because the length of the projected line segment between s(v) and sT(v) depends on the distance of the two vertices in world space; and it represents perspective distortion, because s(v) and sT(v) are the result of a perspective projection. My first assumption would be that dist(v) is equal to the priorities you have to calculate, but I'm not 100% sure. Anyway, in the paper "distortion" means error, and the length of the projected line segment is equal to the error.

As for the coordinate system... You are right. A lot of terrain-related papers use a coordinate system where the z-axis points up (for example ROAM and Peter Lindstrom's paper). I guess this has to do with the fact that (x, y) is more natural to use with height fields (2D arrays), whereas (x, z) feels a bit strange, IMO.



Klaus Hartmann
posted January 18, 2001 12:20 AM            
Ooops! Now I understand your question regarding dist(v). "dist" stands for distortion (i.e. error). This is a bit confusing. Also, the "dist" with that roof (equations 2 and 3) means the upper error bound (upper, because the roof is on top).

As for the Tread Marks 'question'... I really don't know much about the inner workings of Tread Marks, but I would assume that it does the following:
[1] Compute the real-height, w(v) in world-space.
[2] Compute the aver-height, wT(v) in world-space. This is indeed given by the formula you mentioned.
[3] Project w(v) and wT(v) into screen space, giving the new screen space points s(v) and sT(v), respectively.
[4] Compute the distance between s(v) and sT(v), which is equal to the distortion (i.e. error).