c++ : Modern C++... or Modern Art?

Date: 31-Mar-2009/0:41

Tags: c++

Characters: (none)

In the preface to his book Modern C++, Andrei Alexandrescu paints a vision of what programming should be like:

Imagine the following scenario. You come from a design meeting with a couple of printed diagrams, scribbled with your annotations. Okay, the event type passed between these objects is not char anymore; it's int. You change one line of code. The smart pointers to Widget are too slow; they should go unchecked. You change one line of code. The object factory needs to support the new Gadget class just added by another department. You change one line of code.

You have changed the design. Compile. Link. Done.

This is a very nice theory. But as C++ programming has remained relevant only among a small (yet important) "fringe" of developers, they have been flexing the standards toward an uncompromising pursuit of this vision. The results are somewhat extreme and not generally easy to work with.

In this article I will talk briefly about the what is happening and what I think of the aesthetics.

One change in decision, one change in code

Sometimes it seems that C++ libraries are pathologically complex for no good reason, when it is actually related to the pursuit of the goal Andrei describes. Simpler solutions simply wouldn't work for them. For instance, look at the old way of doing numeric limits:

// CHAR_BIT - Length of a char variable in bits.
// CHAR_MAX - Maximal value which can be stored in a char variable.
// CHAR_MIN - Minimal value which can be stored in a char variable.
// INT_MAX -Maximal value which can be stored in an int variable.
// INT_MIN - Minimal value which can be stored in an int variable.
// LONG_MAX - Maximal value which can be stored in a long int variable.
// LONG_MIN - Minimal value which can be stored in a long int variable.
// SCHAR_MAX - Maximal value which can be stored in a signed char variable.
// SCHAR_MIN - Minimal value which can be stored in a signed char variable.
// SHRT_MAX - Maximal value which can be stored in a short int variable.
// SHRT_MIN - Minimal value which can be stored in a short int variable.
// UCHAR_MAX - Maximal value which can be stored in an unsigned char variable.
// UINT_MAX - Maximal value which can be stored in an unsigned int variable.
// ULONG_MAX - Maximal value which can be stored in an unsigned long int variable.
// USHRT_MAX - Maximal value which can be stored in an unsigned short variable.

for (
    char index = CHAR_MIN;
    index <= CHAR_MAX;
    index++
) {
  std::cout << index << " is a valid value " << std::endl;
}

What's wrong with that? Well, in a scenario like Andrei was discussing, let's say you want to change from char to int. But you have to change THREE places: the type, the min, and the max:

for (
    int /*1*/  index = INT_MIN /*2*/;
    index <= INT_MAX /*3*/;
    index++
) {
   std::cout << index << " is a valid value " << std::endl;
}

Here it is easy to see the 3 changes you need to make. This is only because they are all on one line and the word "INT" hints us of the relationship--in a bigger codebase it is easier to slip up and not touch all the impacted sites. What would tighten this up would be a typedef that captured our decision to use a certain type... and functions type_max() and type_min():

typedef int index_type;
for (
    index_type index = type_min(index_type);
    index <= type_max(index_type);
    index++
) {
   std::cout << index << " is a valid value " << std::endl;
}

You now have the expressive power you need to change the code in ONE place. To see this, let's change to unsigned:

typedef unsigned /*1*/ index_type;
for (
    index_type index = type_min(index_type);
    index <= type_max(index_type);
    index++
) {
   std::cout << index << " is a valid value " << std::endl;
}

You can observe that the return type varies--if the type you are testing is "int" then the return type should be an int, and if it's a "char" it should be char, etc.

Yet there's no way to write the functions type_min() and type_max() in "old-school" C++! Both of these functions require parameters that are data types instead of values of a particular type. Many interpreted languages support "reflection" and can inspect and operate on types at run-time, but that's just not how C++ works.

Note Technically speaking, you can enable run-time-type-information (RTTI) and the typeid keyword. But this doesn't work with native types like int or char, and a lot of people--including Bjarne Stroustrup--don't like RTTI. They would say this is precisely the kind of situation where people might use it when they actually want something else entirely.

Yet in "modern" C++ we have templates at our service! They look a little weird because the arguments to the templated code (what I'll call "Tparams") are in their own separate list from the conventional function arguments ("Fparams"). You can offer template parameters on specific functions or on a whole class, such as:

// Template arguments ("Tparam"s) appear in angle brackets <>
// they *must* be known at compile time
// data types are allowed, but no variable values!
//
// Function arguments ("Fparam"s) appear in parentheses ()
// they *might* be known at compile time
// variable values are allowed, but no data types!
//
object<
    Tparam1, Tparam2, ..., TparamN
>.method(
    Fparam1, Fparam2, ..., FparamN
);

When the standard C++ library implemented something functionally equivalent to type_min() and type_max() they did it using a templated class called numeric_limits:

typedef char index_type;
for (
    index_type index = std::numeric_limits<index_type>::min();
    index <= std::numeric_limits<index_type>::max();
    index++
) {
   std::cout << index << " is a valid value " << std::endl;
}

The result is now a legal program, and one that brings us into harmony with what Alexandrescu was suggesting. Changing only the index_type will adjust the limits accordingly. Neat!

Beauty vs. Pathology

This "portrait of reform" is a very typical one in C++ library design. Whether you love or hate the look of the resulting code, it is undeniably more robust in the face of future changes!

What has happened is that implicit relationships have been made explicit to the compiler. In the original C-style program, the choice of a char type and the choice to use CHAR_MAX in a loop could have been seen as completely independent. Using std::numeric_limits<char>.max() binds the two inextricably together.

This should appeal to the same sensibility that leads programmers to declare symbolic constants in a single location, as opposed to repeating the values each time they use them. And although it's rather helpful in managing basic churn in one's own code, it is indispensible when composing together class libraries that were written by different authors. The interactions can be very subtle.

Yet at the bottom line, it is a lot more keystrokes to enter. It also requires a depth of understanding of the C++ language. Jeremy Friesner aptly captured the darker side of the tradeoff:

The drawback is as the declarations get more intricate, fewer and fewer people can understand what the code is for, or how it works. Often the symptom of that is that the code is overlooked or ignored... e.g. you spend a month designing your super-duper templates header, and then two weeks after you leave the project, your replacement added his own header which looks like this:
// CHAR_MAX - Maximal value which can be stored in a char variable.
#define CHAR_MAX std::numeric_limits<char>::max()
// CHAR_MIN - Minimal value which can be stored in a char variable.
#define CHAR_MIN std::numeric_limits<char>::min()
[...]
Doh!

Making matters worse is that once one thinks they understand some of the more abstruse mechanics of C++ libraries, they're still changing their minds. Best practices keep changing, and Alexandrescu himself gave a talk at BoostCon suggesting that "STL iterators must go...which [has some further discussed here on StackOverflow!

Is managed complexity an illusion?

These days I'm leaning even more heavily toward saying that we have to dispose of complexity in our systems whenever we can. Otherwise we spin out of control under the illusion of "managed complexity".

I think numeric_limits is a good example of where there's another avenue of exploration. Instead of building ever-increasing mechanics for dealing with sizes of different numeric types, you could invest in removing that concern entirely. Why not make the platform super-efficient at dealing with arbitrary precision arithmetic, and stop using native types?

Of course, C++ is not a candidate for that particular change. I'm just trying to make the point that it is possible to look at it from another perspective. It's like the saying goes:

The most reliable parts of a system--the ones that never break down and that you don't need to replace--are the ones that aren't there.

Thoughts on Joel Spolsky's "User Interface Design for Programmers"

Boost.Graph: Not as Crazy as you Think