|
Author
|
Topic: Rottger/Quadtree Algorithm
|
MarkBatten Member
|
posted March 14, 2000 03:33 PM
One place to begin with this particular algorithm is http://pages.eidosnet.co.uk/~n.anderson/lod_program.html. I haven't studied it in detail, but it claims to be an implementation of the Rottger algorithm we've been discussing here. After a quick look at the code, it appears to be a very simplistic and incomplete version, but the code is not very long and it does run, so it might help you get started in understanding these issues.Now, if MarkW will report on the CGDC session, we can all move ahead! MarkW? IP: |
MarkW Member
|
posted March 15, 2000 11:47 AM
Just checked and they still have not posted the source in the GDC library. I'm kind of busy at the moment but I'll give you guys a brief summary. Westwood is using the algorithm in a massively multiplayer online game (so it does work in the real world). I spoke with their lead programmer and he ran into the same geomorphing problem that we found. I believe they did the same thing as Klaus and simply ignored the problem. The minor popping is not noticeable in real-world game terrain which tends to be mostly smooth rolling hills. They departed from the paper in some minor ways. If I remember correctly, they got a more compact memory structure than the huge matrix. Their version uses less memory and is faster to traverse. They really didn't go into much detail (it's only 1 hour lecture) so we'll all have to wait for the source release to get the details. I'll post a link here as soon as it's available.IP: |
Klaus Hartmann Member
|
posted March 16, 2000 12:21 AM
<hehe> So we are not the only people who have problems with the geomorphing. Here's what I'm going to try next. It may be considered a hack, but I have high hopes that it'll work. I'm going to use indexed triangle lists, because they are (a) faster than fans, and (b) make use of shared vertices. There are a couple of other advantages, but let's stay with the shared vertices. I intend to make a matrix of indices, and this matrix is initialized with 0xFFFF. In addition I have a global index counter, which is incremented for each new vertex that is added to a vertex buffer. The indices are also stored in an array of indices so that the terrain section can be rendered in one piece.Thus, the rendering algorithm changes to something like: idxCounter = 0 Index-Matrix = 0xFFFF If vertex needs to be render ..If index-matrix[row][col] != 0xFFFF ....Add index to index array ..Else ....Morph the vertex ....Index-matrix[row][col]=idxCounter++; ....Add vertex to vertex buffer ....Add index to index array ..End End RenderVertexBuffer(index-array) Of course, the above is incomplete, but it shows the basic idea. The beauty about this is the following: I can morph the vertices in whatever way I want to, and there will be no cracks. In the end, this allows me to try more complete geomorphing methods than before. IP: |
MarkW Member
|
posted March 16, 2000 12:08 PM
Klaus,You have to be careful with these sort of algorithms. When I wrote an indoor engine a few years ago I tried to do the same thing. I had all my polygons index into a global vertex pool. Each vertex was flagged to make sure it's only transformed once. In the end, the whole thing ran very slowly because of the "if" statements checking the flag and because you're missing the cache with the random vertex pool reads. It may still be worth it if the per vertex operations are very expensive or helpful to your algorithm (i.e geomorphing). I will try working on a similar optimization. At the minimum, I will stream all my vertices through a vertex buffer by using the (NO_OVERWRITE) flags in D3D 7. Indexing would be nice if I can find a way to do it without "ifs" and possibly sort the indices in strip order to reduce bus traffic. In case you're wondering, the Westwood demo didn't do any of this - just plain nonshared vertices with DrawPrimitve vertex_fan calls. IP: |
Freeside New Member
|
posted March 16, 2000 12:35 PM
Well, I've only been working on this alg. for 2 days, so keep that in mind.Regarding the comment "if f falls into the range [.5, 1), the quadtree is not further refined" : One thing I've noticed by setting 'c' to 0.0 (meaning ignoring roughness values complete) is that if f = [.5, 1) then you WILL have to generate a fan at this node.. So there is always 1 child that did not split further. Perhaps the idea is to use this value for the blending value of the 'leaf' vertices? Of couse I might be brain dead from staring at Figure 7 for too long.  I also wrote a simple console app that builds roughness values, triangulates etc, and then dumps out an ascii version of the different matrices. This seems to be pretty helpful in understanding just what is going on w/o dealing with GL / D3D.. -doug
IP: |
Klaus Hartmann Member
|
posted March 16, 2000 01:30 PM
MarkW, What exactly do you mean by testing flags? Which flags? I know what flags are, but I don't quite understand you. Here's a more detailed explanation of what I plan to do:First, I subdivide the world in 65x65 blocks (or smaller, if the user wants them to be smaller). There are three reasons for this: (a) limits in texture size (i.e. Voodoo) (b) to limit the size of a vertex buffer (c) to decrease the number of levels in the tree. For 65x65 blocks, the theoretical vertex buffer size if 4225 vertices. In practice, however, you'll only need the first 1000-1500 vertices. This is because of the simplification (i.e. NESW vertices are usually skipped). Well, it's probably *much* less than 1000 vertices, but I cannot tell for sure, without testing, first. Assuming that each vertex is 24 byte, then we have a vertex buffer of 1,000*24 = 24,000 bytes. In addition there's the index array. Assuming, that each vertex is used 4 times (average), we have an array size of 4*1,000*2 = 8,000 byte. So all in all we are talking about 32,000 bytes, which doesn't seem to be too much. As for if-statements... Those are only used to check the index-matrix. Of course, this can result in a cache miss from time to time, but I also significantly reduce the number of morphs. With fans, you morph each edge-vertex twice, whereas I'll only need to morph them once. I believe (but cannot proof, yet), that reducing the number of morphs is more important than having a cache miss every 100 vertices or so. I think we don't have to argue about triangle lists. They are better suited than fans, because you triangles can be batched up. This is necessary to achieve high polygon throughput (see "fanning problem" thread on this message board). Indexed triangle lists take this a step further, because not only do they avoid unnecessary transforms, but they should also reduce cache misses. That's simply because an indexed vertex buffer contains much less vertices than a non-indexed vertex buffer. Of course, a non-indexed vertex buffer is better sorted than an indexed one, and can thus be read sequentially. But an indexed vertex buffer also contains some order, because of the nature of the tree traversal. Hence, I believe, that indexed triangle lists are the way to go for this particular problem (I may be wrong though). As for vertex buffer handling, I intend to do the following: (a) create vb with D3DVBCAPS_WRITEONLY (b) lock vb with DDLOCK_NOOVERWRITE and DDLOCK_DISCARDCONTENTS (c) add vertices to vb (d) unlock vb (e) DrawIndexPrimitiveVB (f) jump to (b) and lock the same vertex buffer Anyway, I think I'll spend some time tonight, and test this. I hope that I can get far enough, because I still have to port the whole algorithm to Direct3D.
IP: |
MarkW Member
|
posted March 16, 2000 04:38 PM
Klaus,Indexed triangle lists are the best thing to use. But you can gain even more speed by having the triangle indices in strip order. One card in particular (can't say for NDA reasons) only has a 3 value vertex cache and no strip support. So using these sorted triangle lists will allow the card to auto-generate strips for you and cut the bus traffic. The speed gain is VERY noticeable from my experience. About the flag issue. From your first pseudo code example I thought you were using a frame number type flag. I used this in the past to keep track if a vertex has already been processed in the current frame. It ended up being slower to check the flag than to simply process all the vertices at the start of the frame. Wish I had more time to work on this terrain stuff - there are so many cool optimization ideas. IP: |
Klaus Hartmann Member
|
posted March 16, 2000 06:53 PM
MarkWI read you message, but still wanted to run a performance test between triangle fans and indexed triangle lists. The difference is gigantic. Machine 1 (TNT-1, Windows 98): fans: 101 fps list: 124 fps Machine 2 (GeForce, Windows 98): fans: 155 fps list: 271 fps Machine 2 (GeForce, Windows 2000): fans: 231 fps list: 217 fps Don't ask me what the problem with the Win2K machine is. But the Windows 98 numbers are reason enough to stay with the indexed lists. BTW... these tests were not performed with Roettger's algorithm. I wrote a small test program, which displays 1024 quads (traversed like a tree). Each quad has 4 triangles (NESW vertices skipped). Also, you were talking about strip. How do you generate those strips from the quadtree? IP: |
Freeside New Member
|
posted March 16, 2000 10:00 PM
Klaus, Any chance you could post a link to your test code. I'd like to play with it some with my hardware..-doug IP: |
Klaus Hartmann Member
|
posted March 17, 2000 07:12 AM
FreesideSure... I'll make the code a bit easier to read, and then I'll post a link. Just give me a couple of hours. IP: |
Klaus Hartmann Member
|
posted March 17, 2000 09:07 AM
Freeside,Here you go: http://www.thecore.de/TheCore/QuadRender.zip WARNING! This code isn't particularly clean and safe. It's simply a test program. IP: |
MikeC Member
|
posted March 18, 2000 01:01 PM
Guys... don't know if you got the source from the GDC talk yet, but I have it here: http://www.salug.org/~mcoker/LOD_Terrain.zip and the paper http://www.salug.org/~mcoker/LODtalk.doc IP: |
Klaus Hartmann Member
|
posted March 18, 2000 07:26 PM
MikeC,Thanks a lot Can't wait to examine the source. Until now I was only able to have a quick look at it. Well, the code sure looks different. To be honest... it's looks over-complicated and cumbersome. But that's, of course, only my first impression. Time to have a closer look... Thanks IP: |
Revolver Member
|
posted March 19, 2000 11:58 AM
There seems to be a strange artifact at the center of the terrain - a short "spike", with a cube at it's top. Any clues why this happens?IP: |
Klaus Hartmann Member
|
posted March 19, 2000 12:51 PM
Revolver, This isn't an artifact. They did it on purpose, probably to mark the center of the height field.The box comes from the draw_box call in Main.cpp, and the peak is there, because they set the byte at 0x80400 (in the file map_height.raw) to 0xFF. IP: |
Revolver Member
|
posted March 19, 2000 01:12 PM
i ran the binary before hand, and thought it might be a bug. but yea, i noticed what you mentioned when i looked through the code. =PIP: |
Freeside New Member
|
posted March 19, 2000 08:39 PM
I believe the spike is there to demonstrate the need for the 'detail map' (d2 values). Hmmm. It also appears that I am calculating the d2 values in the same manner as they are. The paper didn't clearly mention testing against the child nodes as well as newphews, but it makes sense. Also their K value is different from the papers. I realized that a different K value is needed for children vs nephews, but I didn't come up with the same values that they did.. Guess I'll try theirs and see if I notice a difference. IP: |
Klaus Hartmann Member
|
posted March 19, 2000 09:37 PM
Freeside,Their K value is the same as in the paper. The point is that they are scaling their errors by DETAIL_VALUE_SCALE_FACTOR = 0.5. For the L1-norm, their K is: K = C / 2 * (C - 0.5) Child errors are multiplied by K: error' = K * error * 0.5 (0.5 is the DETAIL_VALUE_SCALE_FACTOR). Now you do the math: K = (C / 2 * (C - 0.5)) * 0.5 ==> K = C / 4 * (C - 0.5) ==> K = C / (4C - 2) ==> K = C / 2 * (C - 1) The reason why they use nephews is, that they store their quadtree in a linear fashion (with Morton numbers). This way you cannot propagate any d2 values to the corners, and thus you need to take them from the childrens' neighbors (the nephews). I find it tough to visualize this, but I believe, they are doing this part wrong. IMO, the nephews should be multiplied with the same K as the children. IP: |
Klaus Hartmann Member
|
posted March 19, 2000 09:37 PM
Freeside,Their K value is the same as in the paper. The point is that they are scaling their errors by DETAIL_VALUE_SCALE_FACTOR = 0.5. For the L1-norm, their K is: K = C / 2 * (C - 0.5) Child errors are multiplied by K: error' = K * error * 0.5 (0.5 is the DETAIL_VALUE_SCALE_FACTOR). Now you do the math: K = (C / 2 * (C - 0.5)) * 0.5 ==> K = C / 4 * (C - 0.5) ==> K = C / (4C - 2) ==> K = C / 2 * (C - 1) The reason why they use nephews is, that they store their quadtree in a linear fashion (with Morton numbers). This way you cannot propagate any d2 values to the corners, and thus you need to take them from the childrens' neighbors (the nephews). I find it tough to visualize this, but I believe, they are doing this part wrong. IMO, the nephews should be multiplied with the same K as the children. IP: |
Klaus Hartmann Member
|
posted March 19, 2000 09:45 PM
Sorry for the two posts. And also sorry for the missing brackets, like: K = C / (2 * (C - 1))I did this on a sheet of paper, where I had real division lines  IP: |
Klaus Hartmann Member
|
posted March 19, 2000 10:10 PM
<hehehe> Their morphing seems to be equivalent to my version, because they have the same problems. If the quality is high, then you have absolutely no chance to see any popping. But if it is low, then you see popping. Exactly the same as in my implementation...IP: |
Freeside New Member
|
posted March 19, 2000 11:14 PM
Klaus, Uhm, I don't think you're looking at the same thing I am. DETAIL_VALUE_SCALE_FACTOR is just used so that they get approx. the same # of tris w/ and w/o detail mapping.I'm talking about the consts LOD_Terrain__Child_Detail_Scale and LOD_Terrain__Nephew_Detail_Scale which are found in LOD_Terrain.cpp. These are the K values that they use. Kchild = C / (2*(C - .5*sqrt(2))) Knephew = C / (2*(C - .5*sqrt(10))) Both are mentioned in the .DOC and at the top of the .cpp file though I'm still working on understanding it better. Those values are used in the constructor for Detail_Map() to build the detail/d2 values. My code for d2 calcs was identical to theirs except my 'K' values were apparently wrong because of the paper. Their should definitely be 2 different K values (one for children and one for nephews). I need to reread the paper and see if I can prove that Westwoods values are correct. Also, what were the problems with morphing? And what were you using for your bias calc? IP: |
MarkW Member
|
posted March 20, 2000 02:50 PM
Have not had a chance to look at the source yet... But just the answer the spike question- yes it's there to demonstrate the need for the detail map. They used it to show that you can tweak/adjust the detail map to force it to preserve local details like the spike. Also, they said the code would be very bloated in order to make it easier to read/understand. I'll look over the code soon and see if I can expand on anything they covered in the lecture.Thanks Mike for posting the code. The GDC site will not be updated till April. ;-( IP: |
Assassin New Member
|
posted March 21, 2000 10:14 AM
There seems to be a lot of replies / posts / converstaions on this topic.I think some of the more knowledgable people should get togethet and form a web page on the topic with tutorials, samples, optimizations... I've looked at the paper and other documents and fairly understand the algorithm - but I'm sure other people who may wish to use it may be baffled. Ye seem to have some good ideas bouncing around - it would be a shame not to have them all together on a web site. ------------------
/n Assassin IP: | |