For the past two weeks or so, I’ve been working on a clean interface that lets KStars use OpenCL. The problem is that OpenCL support is still pretty flaky: at the moment, there are three complete implementations that support Linux, by Intel, AMD, and nVidia respectively, and they’re all proprietary. There’s some promising work for the future with OpenCL in Mesa and also with pocl (an LLVM-based CPU-only implementation), but neither is ready yet.

So although we want users to be able to use OpenCL if they have one of these implementations installed, we can’t rely on it being there. The solution I arrived at is to have two classes, KSBuffer and KSContext, which respectively hold a buffer of points to do computation on, and manage contextual state for the computation.

Both classes use d-pointers, and each of the *Private classes has two subclasses: one uses OpenCL, while the other just uses plain Eigen on the CPU. This way, the rest of the code that wants to do computation on buffers of points doesn’t need to know anything about the implementation details, and we can make OpenCL an optional dependency at both compile time and run time.
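To make the layout concrete, here is a minimal sketch of that d-pointer arrangement. All names in it (Point, BufferPrivate, CpuBufferPrivate, makeBackend, Buffer, scale) are made up for illustration and are not the actual KStars classes. Plain loops stand in for Eigen, and the OpenCL subclass is only indicated by a comment.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical point type; the real code works with Eigen types.
struct Point { double x, y, z; };

// Abstract private implementation: the public class only talks to this.
class BufferPrivate {
public:
    virtual ~BufferPrivate() = default;
    virtual std::string backendName() const = 0;
    virtual void scale(double factor) = 0;      // stand-in for a real computation
    virtual std::vector<Point> data() const = 0;
};

// CPU backend. Plain loops are used here in place of Eigen.
class CpuBufferPrivate : public BufferPrivate {
public:
    explicit CpuBufferPrivate(std::vector<Point> pts) : m_points(std::move(pts)) {}
    std::string backendName() const override { return "cpu"; }
    void scale(double f) override {
        for (Point &p : m_points) { p.x *= f; p.y *= f; p.z *= f; }
    }
    std::vector<Point> data() const override { return m_points; }
private:
    std::vector<Point> m_points;
};

// Picks a backend. An OpenCL-backed subclass would be tried first here,
// guarded by a compile-time flag and a run-time device check; this sketch
// leaves it out, so it always falls back to the CPU backend.
std::unique_ptr<BufferPrivate> makeBackend(std::vector<Point> pts) {
    return std::make_unique<CpuBufferPrivate>(std::move(pts));
}

// Public class: callers never see which backend is behind the d-pointer.
class Buffer {
public:
    explicit Buffer(std::vector<Point> pts) : d(makeBackend(std::move(pts))) {}
    std::string backendName() const { return d->backendName(); }
    void scale(double f) { d->scale(f); }
    std::vector<Point> data() const { return d->data(); }
private:
    std::unique_ptr<BufferPrivate> d;
};
```

Calling code only ever constructs a Buffer and calls methods like scale(2.0) on it. Which backend did the work is visible through backendName(), but nothing else depends on it.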

We can also run a short test to see how the performance has changed.

A first performance report

As a small test, we create a buffer of 1 million sky points, and then do the steps needed to compute the apparent position of these points at a given time:

Precession
Nutation
Aberration
Conversion to horizontal coordinates (i.e., coordinates for a given location and time).
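As a rough illustration of how such a pipeline can run over a whole buffer, here is a sketch that treats precession, nutation, and the horizontal conversion as 3x3 matrices and aberration as a first-order velocity correction. This is an assumption made for the example, not necessarily how the KStars code is structured. The helper names (rotZ, mul, apply, aberrate, apparentPositions) are hypothetical, rotZ is just a placeholder rotation rather than a real precession or nutation matrix, and plain std::array is used instead of Eigen.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Placeholder rotation about the z axis by `angle` radians.
Mat3 rotZ(double angle) {
    const double c = std::cos(angle), s = std::sin(angle);
    Mat3 m{};                       // all zeros
    m[0][0] = c;  m[0][1] = -s;
    m[1][0] = s;  m[1][1] = c;
    m[2][2] = 1.0;
    return m;
}

// Matrix product a * b.
Mat3 mul(const Mat3 &a, const Mat3 &b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Matrix-vector product m * v.
Vec3 apply(const Mat3 &m, const Vec3 &v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// First-order aberration: add the observer's velocity (in units of c)
// to the unit direction vector, then renormalize.
Vec3 aberrate(const Vec3 &u, const Vec3 &vOverC) {
    const Vec3 r{u[0] + vOverC[0], u[1] + vOverC[1], u[2] + vOverC[2]};
    const double n = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    return {r[0] / n, r[1] / n, r[2] / n};
}

// Runs the four steps in order over every point in the buffer.
void apparentPositions(std::vector<Vec3> &points,
                       const Mat3 &precession, const Mat3 &nutation,
                       const Vec3 &vOverC, const Mat3 &toHorizontal) {
    // Precession is applied first, then nutation; both are folded into one
    // matrix up front so the loop does a single multiply for the two steps.
    const Mat3 pn = mul(nutation, precession);
    for (Vec3 &p : points)
        p = apply(toHorizontal, aberrate(apply(pn, p), vOverC));
}
```

The point of the sketch is that the matrices are computed once per time step and then reused for every point, so the per-point work is only a couple of small matrix multiplies and a normalization.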

Running these steps, we get:

Old algorithms: 3947 ms (baseline)
New algorithms (with Eigen): 70 ms (about 56x faster than baseline)
New algorithms (with OpenCL): 30 ms (about 132x faster than baseline)

So, this is a pretty good result so far, with the following caveats:

None of the new code is optimized.
The benchmark is pretty synthetic, and we usually don’t process a million points at once.

I’m looking forward to seeing how much benefit we can get once we integrate the new algorithms into the sky-component hierarchy, and whether we can optimize this further.

Addendum: since this got posted to Phoronix, it’s worth pointing out that the dramatic improvement actually comes from better algorithms, not from using OpenCL. For more information, see some of my previous posts on the algorithmic changes.

KStars GSoC: OpenCL and a first performance report
