In my last post, I discussed the coordinate systems used in KStars, which all used two-dimensional spherical coordinates. However, the coordinate systems are perhaps better thought of not in terms of angles, but in terms of rotations from a fixed origin, plus an orientation. Trigonometry is in some ways easy to use with pen and paper, but it’s not a good choice for computation, so we’ll use quaternions instead. This post gives a brief outline of what quaternions are and how they relate to rotations.

Note: since there’s some math in the post, and I use MathJax to render it, the math may not render on Planet KDE or with RSS. So either follow the link, scroll past, or just read the TeX yourself.

Consider the plane \(\mathbb{R}^2\), and the complex numbers \(\mathbb{C}\). It turns out that there’s a very nice link between the complex numbers and the orthogonal transformations of the plane. Orthogonal transformations are those that preserve distance and angle; in general, the group \(O_n\) of orthogonal transformations of \(\mathbb{R}^n\) consists of all of the linear maps which are isometries, i.e., preserve distance. We want to study \(SO_2\), the group of orthogonal maps of the plane which have determinant \(1\), so that they preserve the handedness of shapes.

Recall that the complex numbers are all the numbers of the form \(a + bi\), where \(i^2 = -1\). If we ignore the multiplication operation for complex numbers, we can identify complex numbers with points in the plane. To put that another way, we can think of the complex numbers as points in the plane that we can multiply together to get other points in the plane.

Multiplication for complex numbers is defined as \[ (a + bi)(c + di) = (ac - bd) + (ad + bc)i, \] but if we are identifying numbers, which are algebraic, with points, which are geometric, then this multiplication should also have a geometric interpretation. We’ve written our points as \(a + bi\), but we can just as easily write them in polar coordinates as \((r,\theta)\), where \(r\) is the distance to the origin and \(\theta\) measures the angle counterclockwise from the real axis to the point. It turns out that the multiplication of two points is then just \[ (r_1,\theta_1)(r_2,\theta_2) = (r_1r_2,\theta_1 + \theta_2), \] i.e., we multiply their lengths and add their rotations.
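As a quick numerical check of the polar rule, here is a minimal Python sketch using the standard `cmath` module. The helper name `polar_product` and the sample points are my own choices for illustration.

```python
import cmath
import math

def polar_product(z1, z2):
    """Multiply via polar form: multiply the lengths, add the angles."""
    r1, t1 = cmath.polar(z1)
    r2, t2 = cmath.polar(z2)
    return r1 * r2, t1 + t2

z1 = complex(1, 1)    # length sqrt(2), angle pi/4
z2 = complex(0, 2)    # length 2, angle pi/2

r, theta = polar_product(z1, z2)
direct = z1 * z2                     # (ac - bd) + (ad + bc)i
from_polar = cmath.rect(r, theta)    # convert (r, theta) back to a + bi

print(direct, from_polar)
```

Both routes give the same point, up to floating-point rounding in the polar conversion.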

In particular, this means that if we write \(N(z)\) for the norm or length of the number \(z\), then \(N(z_1z_2) = N(z_1)N(z_2)\). Hence if \(z_0\) is a point on the unit circle, then the transformation \(z \mapsto z_0z\) preserves the norm, and since it is also linear, it is an isometry. If we let \(\theta\) be the angle of \(z_0\), then we can write this map in matrix form, acting on column vectors \((x, y)^T\), as \[ \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}. \] It’s not hard to see that if \(A \in SO_2\), then \(A\) is of this form for some angle \(\theta\).
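The isometry claim is easy to test numerically. The following is a small sketch, with arbitrarily chosen sample points, showing that multiplication by a unit complex number preserves distances and that composing two such maps adds their angles.

```python
import cmath
import math

theta = math.pi / 3
z0 = cmath.rect(1.0, theta)    # point on the unit circle with angle theta

def rotate(z):
    """The map z -> z0 * z."""
    return z0 * z

p, q = complex(3, 4), complex(-1, 2)

dist_before = abs(p - q)
dist_after = abs(rotate(p) - rotate(q))    # same distance, up to rounding

# Composing with a second unit complex number adds the angles.
w0 = cmath.rect(1.0, math.pi / 6)
composed_angle = cmath.phase(w0 * z0)      # pi/3 + pi/6

print(dist_before, dist_after, composed_angle)
```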

But this group is really just a circle, as we’ve seen. Importantly, the description as a circle isn’t just nice in the abstract: it tells us that we don’t need matrices to describe these rotations, we just need a single number, which is the angle.

Now, we’d like to have a similarly nice description of \(SO_3\), the group of all possible rotations of \(\mathbb{R}^3\).

In the two-dimensional case, we introduced a multiplication on points in \(\mathbb{R}^2\), so that the multiplication is compatible with the norm, and then looked at the points with norm \(1\). Historically, people tried for a long time to do the same thing with triples of real numbers, i.e., points in \(\mathbb{R}^3\), and they all failed, because it is (as was later proved) impossible.

Instead, we use the *quaternions*, which are numbers of the form \(a + bi + cj + dk\), where \(i,j,k\) satisfy \[
i^2 = j^2 = k^2 = ijk = -1.
\] This equation determines all of the pairwise products, which are as follows: \[
\begin{matrix}
ij = k & ji = -k \\
jk = i & kj = -i \\
ki = j & ik = -j
\end{matrix}.
\] Note that the anticommutativity of the generators \(i,j,k\) means that, unlike multiplication of real or complex numbers, quaternion multiplication is not commutative: the order of the factors matters.
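Expanding a product of two quaternions with the distributive law and the table above gives an explicit formula. Here is a minimal Python sketch of it, representing \(a + bi + cj + dk\) as the tuple `(a, b, c, d)`; the function names are my own.

```python
def qmul(p, q):
    """Product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def qneg(q):
    """Negate every component."""
    return tuple(-x for x in q)

one = (1, 0, 0, 0)
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

# The defining relations and the table of pairwise products.
assert qmul(i, i) == qmul(j, j) == qmul(k, k) == qneg(one)
assert qmul(qmul(i, j), k) == qneg(one)
assert qmul(i, j) == k and qmul(j, i) == qneg(k)
assert qmul(j, k) == i and qmul(k, j) == qneg(i)
assert qmul(k, i) == j and qmul(i, k) == qneg(j)
```

Swapping the arguments of `qmul` changes the sign of the result for distinct generators, which is the noncommutativity in action.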

We define our norm as \[ N(q) = \sqrt{a^2 + b^2 + c^2 + d^2}, \] and this norm is in fact compatible with the multiplication: if we have two quaternions \(q_1, q_2\), then \(N(q_1q_2) = N(q_1) N(q_2)\). And, as with the complex numbers, we can define conjugation on quaternions: we write \(q^* = a - bi - cj - dk\) for the conjugate of \(q\). Going through the full formula for multiplication of quaternions, which I’ve omitted, one can see that \(qq^* = q^*q = N(q)^2\), so that \(N(q) = \sqrt{qq^*}\). It follows that for \(q \neq 0\), \[ q \left( \frac{q^*}{N(q)^2} \right) = 1 = \left( \frac{q^*}{N(q)^2} \right) q, \] so that \[ q^{-1} = \frac{q^*}{N(q)^2}. \]
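These identities can be checked on concrete numbers. The sketch below again stores \(a + bi + cj + dk\) as a tuple `(a, b, c, d)`; the helper names and the two sample quaternions are arbitrary choices of mine.

```python
import math

def qmul(p, q):
    """Product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def conj(q):
    """Conjugate q* = a - bi - cj - dk."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def norm(q):
    """N(q) = sqrt(a^2 + b^2 + c^2 + d^2)."""
    return math.sqrt(sum(x * x for x in q))

def inverse(q):
    """q^{-1} = q* / N(q)^2, defined for q != 0."""
    n2 = sum(x * x for x in q)
    return tuple(x / n2 for x in conj(q))

q1 = (1, 2, 3, 4)
q2 = (2, -1, 0, 5)

q_times_conj = qmul(q1, conj(q1))      # real quaternion equal to N(q1)^2
lhs = norm(qmul(q1, q2))               # N(q1 q2)
rhs = norm(q1) * norm(q2)              # N(q1) N(q2)
identity = qmul(q1, inverse(q1))       # approximately (1, 0, 0, 0)

print(q_times_conj, lhs, rhs, identity)
```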

If we consider the quaternions (written as \(\mathbb{H}\)) as a real vector space, we can see that under conjugation, they break up as \(\DeclareMathOperator{\Span}{Span} \mathbb{H} = \mathbb{R} \oplus \Span\left\{ i,j,k \right\}\). The first subspace is fixed under conjugation, while the elements of the second are sent to their negative. We can identify elements of the second subspace, which are purely imaginary quaternions, with elements of \(\mathbb{R}^3\), and just call them vectors.

Now let \(q_1, q_2\) be two quaternions, and consider the map \(v \mapsto q_1vq_2\) on all of \(\mathbb{H}\). By the multiplicativity of the norm, we see that if \(N(q_1) = N(q_2) = 1\), then this map preserves the norm, and since it is linear, it is orthogonal. If we use the decomposition into real and imaginary subspaces defined above, we can see that if this map fixes \(v = 1\), then it fixes the real line, and being orthogonal it must preserve the orthogonal complement of that line, which is the purely imaginary subspace, i.e., it sends vectors to other vectors.

The map fixes \(1\) if and only if \(q_1q_2 = 1\), i.e., \(q_2 = q_1^{-1}\). In that case \(N(q_1)N(q_2) = 1\), so the map preserves the norm even when \(q_1\) is not a unit quaternion. So, for any nonzero quaternion \(q\), we can define the map \[ [q] : \mathbb{R}^3 \rightarrow \mathbb{R}^3 \\ [q] : v \mapsto qvq^{-1}, \] and this map is orthogonal. It’s clear that the product of quaternions corresponds to the composition of these maps, since applying \([q_2]\) after \([q_1]\) gives \(q_2 q_1 v q_1^{-1} q_2^{-1} = (q_2q_1) v (q_2q_1)^{-1}\), which is \([q_2q_1](v)\).
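To see the map \([q]\) in action, here is a minimal Python sketch. It stores a quaternion as a tuple `(a, b, c, d)` and a vector `(x, y, z)` as the purely imaginary quaternion \(xi + yj + zk\); the function names and sample values are illustrative assumptions of mine.

```python
import math

def qmul(p, q):
    """Product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def inverse(q):
    """q^{-1} = q* / N(q)^2, defined for q != 0."""
    a, b, c, d = q
    n2 = a*a + b*b + c*c + d*d
    return (a / n2, -b / n2, -c / n2, -d / n2)

def conjugate_by(q, v):
    """Return the full quaternion q v q^{-1} for a vector v = (x, y, z)."""
    return qmul(qmul(q, (0.0, v[0], v[1], v[2])), inverse(q))

def rotate(q, v):
    """Apply [q] to v, keeping only the i, j, k components."""
    return conjugate_by(q, v)[1:]

def length(v):
    return math.sqrt(sum(x * x for x in v))

q1 = (1.0, 2.0, 3.0, 4.0)      # deliberately not unit length
q2 = (0.5, -1.0, 2.0, 0.0)
v = (1.0, -2.0, 0.5)

real_part = conjugate_by(q1, v)[0]           # approximately 0: vectors map to vectors
one_step = rotate(qmul(q2, q1), v)           # [q2 q1](v)
two_steps = rotate(q2, rotate(q1, v))        # [q2]([q1](v))

print(real_part, length(v), length(rotate(q1, v)), one_step, two_steps)
```

The two ways of composing agree up to rounding, and the length of `v` is unchanged even though `q1` and `q2` are not unit quaternions.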

We want to interpret this geometrically; it turns out that if \(u\) is a unit vector and \(r > 0\), then for the quaternion \(q = r(\cos \theta + u\sin \theta)\), the map \([q]\) is a rotation of angle \(2\theta\) around the axis \(u\). Call a rotation *simple* if it fixes a line. We’ve just seen that all simple rotations can be expressed as a map \([q]\) for some quaternion \(q\); it’s also true that the group \(SO_3\) is generated by simple rotations, and since the product of two maps of the form \([q]\) is again of that form (as noted above), we find that every rotation of \(\mathbb{R}^3\) can be written as \([q]\) for some quaternion \(q\). Since every nonzero quaternion has the form \(r(\cos \theta + u\sin \theta)\), every rotation is in particular simple. (For references for these facts, see the end.)
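A concrete instance: to rotate by \(90^\circ\) about the \(z\)-axis we take \(\theta = 45^\circ\) and \(u = k\). The following Python sketch checks that this sends the \(x\)-axis to the \(y\)-axis. The helper `from_axis_angle` is a hypothetical name of mine, and it takes the full rotation angle \(2\theta\) as its argument.

```python
import math

def qmul(p, q):
    """Product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def inverse(q):
    """q^{-1} = q* / N(q)^2, defined for q != 0."""
    a, b, c, d = q
    n2 = a*a + b*b + c*c + d*d
    return (a / n2, -b / n2, -c / n2, -d / n2)

def rotate(q, v):
    """Apply [q]: v -> q v q^{-1}, with v = (x, y, z) read as xi + yj + zk."""
    return qmul(qmul(q, (0.0, v[0], v[1], v[2])), inverse(q))[1:]

def from_axis_angle(u, angle):
    """Unit quaternion cos(t) + u sin(t) with t = angle / 2, for a unit vector u."""
    half = angle / 2.0
    s = math.sin(half)
    return (math.cos(half), s * u[0], s * u[1], s * u[2])

z_axis = (0.0, 0.0, 1.0)
q = from_axis_angle(z_axis, math.pi / 2)     # quarter turn about z

image_of_x = rotate(q, (1.0, 0.0, 0.0))      # approximately (0, 1, 0)
image_of_z = rotate(q, z_axis)               # the axis itself is fixed

print(image_of_x, image_of_z)
```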

Note there are infinitely many such quaternions, unless we fix \(r = 1\), in which case there are two, since \(q\) and \(-q\) give the same rotation. This is an explicit version of the double cover of \(SO_3\) by the special unitary group \(SU(2)\), which is embedded here as the group of unit quaternions, but written up with no reference to the underlying Lie theory.
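The non-uniqueness is easy to observe numerically. In this sketch (helper names are mine), the unit quaternion \(\tfrac{1}{2}(1 + i + j + k)\), which rotates by \(120^\circ\) about the diagonal \((1,1,1)\), is compared with its negative and with a positive multiple.

```python
import math

def qmul(p, q):
    """Product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def inverse(q):
    """q^{-1} = q* / N(q)^2, defined for q != 0."""
    a, b, c, d = q
    n2 = a*a + b*b + c*c + d*d
    return (a / n2, -b / n2, -c / n2, -d / n2)

def rotate(q, v):
    """Apply [q]: v -> q v q^{-1}, with v = (x, y, z) read as xi + yj + zk."""
    return qmul(qmul(q, (0.0, v[0], v[1], v[2])), inverse(q))[1:]

q = (0.5, 0.5, 0.5, 0.5)               # unit quaternion
minus_q = tuple(-x for x in q)         # the other unit quaternion for this rotation
scaled_q = tuple(3.0 * x for x in q)   # r = 3, no longer unit length

x_axis = (1.0, 0.0, 0.0)
results = [rotate(p, x_axis) for p in (q, minus_q, scaled_q)]

print(results)    # three copies of (approximately) the y-axis
```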

This map from \(SU(2)\) to \(SO_3\) is a very nice map, because there is underlying geometry I haven’t discussed. In particular, the map is smooth everywhere, so we avoid the problem of gimbal lock. If we represent a rotation by successive rotations about the \(x\), \(y\), and \(z\) axes, then we can end up rotating one axis onto another and losing a degree of freedom. (For details, see the Wikipedia article.) This can’t happen with quaternions, because we’re working with a covering map. Also, a quaternion needs only four numbers, where a rotation matrix needs nine.
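Gimbal lock itself can be reproduced in a few lines. The sketch below assumes one particular three-angle convention (roll about \(x\), then pitch about \(y\), then yaw about \(z\)); that convention, the function names, and the sample angles are my own choices for illustration and are not taken from KStars. At a pitch of \(90^\circ\), shifting yaw and roll by the same amount leaves the rotation unchanged, so two of the three angles have collapsed into one.

```python
import math

def qmul(p, q):
    """Product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    )

def about(axis, angle):
    """Unit quaternion for a rotation by `angle` about a unit vector `axis`."""
    half = angle / 2.0
    s = math.sin(half)
    return (math.cos(half), s * axis[0], s * axis[1], s * axis[2])

X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

def from_angles(yaw, pitch, roll):
    """Assumed convention: roll about x first, then pitch about y, then yaw about z."""
    return qmul(qmul(about(Z, yaw), about(Y, pitch)), about(X, roll))

def close(p, q):
    return all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(p, q))

shift = 0.5

# At pitch = 90 degrees, shifting yaw and roll together changes nothing.
locked_a = from_angles(0.3, math.pi / 2, 0.1)
locked_b = from_angles(0.3 + shift, math.pi / 2, 0.1 + shift)

# At a generic pitch, the same shift gives a genuinely different rotation.
free_a = from_angles(0.3, 0.4, 0.1)
free_b = from_angles(0.3 + shift, 0.4, 0.1 + shift)

print(close(locked_a, locked_b), close(free_a, free_b))
```

The quaternion itself never degenerates here; it is the three-angle description of it that loses a degree of freedom.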

There’s a very nice book by John H. Conway and Derek Smith, “On Quaternions and Octonions”, that describes the geometry in detail. I mostly followed their description; in particular, the book has proofs of the facts I omitted.

There’s a blog post by Qiaochu Yuan that describes the situation from a Lie-theoretic point of view.

This Wikipedia page is also very helpful.