Better Living Through Clang-istry

Recently I came across this article, explaining how to use clang to dump the memory layout of a C++ object. Running

clang -cc1 -fdump-record-layouts ppfile.cpp

on a preprocessed C++ file, produced using, e.g.,

clang -E -I/probably/lots/of/include/paths file.cpp -o ppfile.cpp

gives output like:

*** Dumping AST Record Layout
   0 | class StarObject
   0 |   class SkyObject (primary base)
   0 |     class SkyPoint (primary base)
   0 |       (SkyPoint vtable pointer)
   0 |       (SkyPoint vftable pointer)
  16 |       long double lastPrecessJD
  32 |       class dms RA0
  32 |         double D
     |       [sizeof=8, dsize=8, align=8
     |        nvsize=8, nvalign=8]
 184 |   float B
 188 |   float V
     | [sizeof=192, dsize=192, align=16
     |  nvsize=192, nvalign=16]

Notice that the lastPrecessJD variable is stored as a long double. On x86 this is the 80-bit extended format, with a 64-bit significand instead of the 53 bits of precision given by a double. In practice (on x86-64 Linux, for example), long double has 16-byte storage and 16-byte alignment. Since the vtable pointer takes up only 8 bytes (on 64-bit), we waste 8 bytes on padding after it. Moreover, we then take up 16 bytes to store lastPrecessJD, but using a program like the following:

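#include <stdio.h>
#include <math.h>   /* nextafter; link with -lm */

int main(void)
{
    /* Julian date of the J2000.0 epoch, in days */
    double jd2000 = 2451545.0;
    /* Gap between jd2000 and the next representable double above it */
    double delta = nextafter(jd2000, jd2000 + 1) - jd2000;
    printf("delta: %.30f days\n", delta);
    /* Convert days to microseconds: 86400 s/day, 1e6 us/s */
    printf("delta: %.1f microseconds\n", delta * 86400.0 * 1e6);
    return 0;
}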

we can compute that, around the year 2000, the smallest time step representable in a (64-bit) double is approximately 40 microseconds, so it's not clear that we gain anything by using 80-bit long doubles instead of 64-bit doubles. Changing the long double to a double (and placing it last, though this isn't strictly necessary) results in the following memory layout for the SkyPoint class:

*** Dumping AST Record Layout
   0 | class SkyPoint
   0 |   (SkyPoint vtable pointer)
   0 |   (SkyPoint vftable pointer)
   8 |   class dms RA0
   8 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]
  48 |   class dms Az
  48 |     double D
     |   [sizeof=8, dsize=8, align=8
     |    nvsize=8, nvalign=8]

  56 |   double lastPrecessJD
     | [sizeof=64, dsize=64, align=8
     |  nvsize=64, nvalign=8]

This saves 16 bytes, cutting the size from 80 bytes to 64.¹ Since KStars suffers from abuse of complex inheritance hierarchies and everything-is-an-object, this is 16 bytes saved for every single object in the sky.

Some simple rearrangements of the data members in other classes save a further 8 bytes per StarObject and DeepSkyObject. Overall, these changes reduce memory usage by approximately 10%, just from removing padding.

  1. This also has the benefit that the 64-byte SkyPoint data can fit in a single cache line (if it starts on a line boundary). I don't think this really makes a difference, given the inefficiencies in the rest of the code and the fact that none of our data has been laid out with alignment in mind, but it's nice to have.

Posted December 22, 2013 under planetkde, code, kstars.