Acting on the Results

Reducing spacetime consumption is good, but there are two ways of reducing it and they are not equal. You can either reduce the amount of memory allocated, or reduce the amount of time it is allocated for. Consider for a moment a model system with only two processes running. Both processes use up almost all the physical RAM and if they overlap at all then the system will swap and everything will slow down. Obviously if we reduce the memory usage of each process by a factor of two then they can peacefully coexist without the need for swapping. If instead we reduce the time the memory is allocated by a factor of two then the two programs can coexist, but only as long as their periods of high memory use don't overlap. So it is better to reduce the amount of memory allocated.

Unfortunately, the choice of optimization is also constrained by the needs of the program. The size of the pixbuf data in Swell Foop is determined by the size of the game's graphics and cannot be easily reduced. However, the amount of time it spends loaded into memory can be drastically reduced. Figure 2-2 shows the Massif analysis of Swell Foop after being altered to dispose of the pixbufs once the images have been loaded into the X server.

Figure 2-2Massif output for the optimized Swell Foop program.

The spacetime use of gdk_pixbuf_new is now a thin band that only spikes briefly (it is now the sixteenth band down and shaded magenta). As a bonus, the peak memory use has dropped by 200 kB since the spike occurs before other memory is allocated. If two processes like this were run together the chances of the peak memory usage coinciding, and hence the risk of swapping, would be quite low.

Can we do better ? A quick examination of Massif's text output reveals: g_strdup to be the new major offender.

Command: ./swell-foop

== 0 ===========================
Heap allocation functions accounted for 87.6% of measured spacetime

Called from:
    7.7% : 0x5A32A5: g_strdup (in /usr/lib/libglib-2.0.so.0.400.6)

    7.6% : 0x43BC9F: (within /usr/lib/libgdk-x11-2.0.so.0.400.9)

    6.9% : 0x510B3C: (within /usr/lib/libfreetype.so.6.3.7)

    5.2% : 0x2A4A6B: __gconv_open (in /lib/tls/libc-2.3.3.so)
        

If we look closer though we see that it is called from many, many, places.

== 1 ===========================
Context accounted for  7.7% of measured spacetime
  0x5A32A5: g_strdup (in /usr/lib/libglib-2.0.so.0.400.6)

Called from:
    1.8% : 0x8BF606: gtk_icon_source_copy (in /usr/lib/libgtk-x11-2.0.so.0.400.9)

    1.1% : 0x67AF6B: g_param_spec_internal (in /usr/lib/libgobject-2.0.so.0.400.6)

    0.9% : 0x91FCFC: (within /usr/lib/libgtk-x11-2.0.so.0.400.9)

    0.8% : 0x57EEBF: g_quark_from_string (in /usr/lib/libglib-2.0.so.0.400.6)

  and 155 other insignificant places
        

We now face diminishing returns for our optimization efforts. The graph hints at another possible approach: Both the "other" and "heap admin" bands are quite large. This tells us that there are a lot of small allocations being made from a variety of places. Eliminating these will be difficult, but if they can be grouped then the individual allocations can be larger and the "heap admin" overhead can be reduced.