Sunday, July 17, 2016

I've wanted to write a page about coordinate transforms, but I've started it and abandoned it several times.

The first version was about matrices. I had seen Nicky Case's interactive explanation and Max Goldstein's animated explanation but I wanted to write something more basic. I wrote out an outline showing how matrices could be used for translate, scale, rotate, and shear transformations, and how those could be combined or inverted to generate all the effects we want.

After struggling with this for a few weeks, I put the page on hold.

I returned to it after a friend of mine suggested that I take matrices out completely. I had to ponder that for a while. Surely, the whole point of the page was to show how matrices worked. How could I do that if I took out matrices?

I came up with a new outline that pushed matrices to the very end. I covered all the transforms first, with simple code to implement each. If you have a series of transforms A, B, C, and you want to transform point p, you can call C(B(A(p))). At the end I introduced matrices as a uniform representation of all the different types of transforms. They also offer a way to combine transforms together, so that you can call (C ∘ B ∘ A)(p).
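To make the two call styles concrete, here is a minimal sketch in Python. It is not the code from the draft page. The helper names (translate, scale, rotate, the *_m matrix builders, matmul, apply) are made up for this illustration, and it assumes 2D points with 3x3 homogeneous matrices.

```python
import math

# Function form: each transform is a function from a point (x, y) to a point.
def translate(dx, dy):
    return lambda p: (p[0] + dx, p[1] + dy)

def scale(sx, sy):
    return lambda p: (p[0] * sx, p[1] * sy)

def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])

# Matrix form: the same transforms as 3x3 homogeneous matrices (row-major lists).
def translate_m(dx, dy):
    return [[1, 0, dx], [0, 1, dy], [0, 0, 1]]

def scale_m(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

def rotate_m(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(m, n):
    # Product m * n: applying the result means "apply n first, then m".
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, p):
    # Treat p as the column vector (x, y, 1) and drop the last component.
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

p = (1, 0)

# Style 1: nest the calls, C(B(A(p))).
A, B, C = translate(2, 0), rotate(math.pi / 2), scale(3, 3)
nested = C(B(A(p)))

# Style 2: combine once into a single matrix, then apply it: (C ∘ B ∘ A)(p).
combined = matmul(scale_m(3, 3), matmul(rotate_m(math.pi / 2), translate_m(2, 0)))
via_matrix = apply(combined, p)

# Both results are approximately (0, 9), up to floating point rounding.
```

The combined matrix only has to be built once, so transforming many points costs one matrix application per point instead of three function calls.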

I had implemented lots of interactive diagrams for this page (see the draft version), but in the end I was unhappy with that version too.

There's a technique called 5 Whys (or 3 Whys) that I should've tried.

Why do I want to explain matrices? Because they are a nice way of implementing transformations.

Why do I want to explain transformations? Because they are a uniform way of thinking about operations we need in games: translate, scale, rotate.

Why do I want to explain translate, scale, rotate? Because they are a clean way to solve problems with game cameras: scrolling, zooming, rotating, and isometric views.

Aha! Maybe that's the real problem: game cameras. Instead of starting with matrices and then explaining how they represent transformations and then explaining how transformations can be combined, I could start with game cameras and then work my way up to transformations and then matrices.
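As a rough sketch of what "starting with game cameras" could look like in code, here is a scrolling and zooming camera in Python. The Camera class and its field names are hypothetical, invented for this example. It assumes the camera is centered on a world position, that zoom is a plain scale factor, and that there is no rotation or isometric projection.

```python
from dataclasses import dataclass

@dataclass
class Camera:
    center_x: float   # world position shown at the middle of the screen
    center_y: float
    zoom: float       # screen pixels per world unit
    screen_w: int
    screen_h: int

    def world_to_screen(self, wx, wy):
        # Translate so the camera center is the origin, scale by the zoom,
        # then translate so the origin lands in the middle of the screen.
        sx = (wx - self.center_x) * self.zoom + self.screen_w / 2
        sy = (wy - self.center_y) * self.zoom + self.screen_h / 2
        return (sx, sy)

    def screen_to_world(self, sx, sy):
        # The inverse: undo the same three steps in reverse order.
        wx = (sx - self.screen_w / 2) / self.zoom + self.center_x
        wy = (sy - self.screen_h / 2) / self.zoom + self.center_y
        return (wx, wy)

cam = Camera(center_x=100, center_y=50, zoom=2.0, screen_w=800, screen_h=600)

center_on_screen = cam.world_to_screen(100, 50)   # (400.0, 300.0)
right_of_center = cam.world_to_screen(110, 50)    # (420.0, 300.0)
back_in_world = cam.screen_to_world(420, 300)     # (110.0, 50.0)
```

Scrolling is a change to center_x and center_y, zooming is a change to zoom, and mouse picking is screen_to_world. Those are already translate, scale, and inverse in disguise.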

I'm going to make another attempt at an outline for this page starting with game cameras.

Update: [2016 Aug] Well, I failed. I lost motivation to work on this so I've put it on hold … again. I think I may take a long break from tutorials.

4 comments:

Anonymous wrote at August 01, 2016 3:03 AM

I agree. The problem is game cameras. I know matrices and have a basic idea of how they are used in graphic design, but since I started learning to make games, the main problem I've found is the cameras. I couldn't find the information I needed to understand them, how coordinates are transformed, etc. Knowing that a matrix is used for transformations is nice, but if you don't understand how that finally shows up in the real world... it's useless.
The problem is the game camera, and I've been focused on it for the last few days.
Just my opinion.
I appreciate your articles.

makc wrote at September 06, 2016 3:26 PM

I tried this before https://makc3d.wordpress.com/2012/01/10/wtf-transformation-matrix/ not really sure if it helped anyone

aref wrote at October 22, 2016 4:11 AM

Great perspective! In my experience, it's not just the camera that needs transformations. In (transformationally) nontrivial games, there is a scene graph where nodes need transformations relative to one another. For example, the limb of an animal relative to the body.

Also, making certain shapes, e.g. winding tubular structures or most anything with a Lindenmayer system, requires transformations, e.g. the branches of a tree relative to the trunk; the capillaries of a leaf relative to the leaf spine.

Even in data visualization, multiple transformations are present, and some of them are in value space, some are in screen space (e.g. annotations) and some in both.

So I believe it's best to think of transformations in general terms; nothing specific to the camera (sure you want to smoothly guide it etc. but math-wise, same thing as other transforms). In fact, the camera is as though the rest of the world moved around on some kind of manipulator arm, as when a child observes a maquette.

Also, however versatile matrix transforms are, they're in many ways limited:

1. They're limited to affine and perspective transformations; lines are preserved. Sometimes you need non-linear transformations, e.g. a magnifying lens or logarithmic scaling in a game or in dataviz, or simulating general relativity; discontinuities etc.
2. On rotation, they suffer from the gimbal lock problem, unlike quaternions
3. They use up considerable space (e.g. 4x4 number grid), more than quaternions
4. Simple matrix math can wreak havoc with numerical precision esp. on the GPU; sometimes the solution is to do scale and translate in separate steps (check out e.g. https://github.com/gl-vis/gl-scatter2d-fancy/pull/2/files#diff-134f4c4c8be9ad818d2ee31c034f109c)
5. Matrix multiplications aren't _necessarily_ faster than scale + translate (and perhaps rotate) because, often, many of the elements are zeros or ones, and a decent compiler can find a good approach anyway (and often, transforms aren't the bottleneck)
6. Matrix transforms are very regular but less intuitive to non-linalg folks than scale + translate

Amit wrote at October 24, 2016 10:01 AM

Thanks aref! I agree, transformations are the main idea. The reason I wanted to start with game cameras is that I usually structure my pages to answer an immediate question (like "how do I make a scrolling map") to draw the reader in, then introduce a more general, more useful idea (like transformations, including chaining them together and inverting them), then show how the specific question they had fits into the more general way of thinking.

I hadn't thought about the downsides of matrices; thanks for that list!