My C++ Wishlist

Here's a list of features I think I'd like to have in C++. Some of them are pretty half-baked, but blue-sky ideas can be fun even if they aren't always technically useful.

Customized error messages

One of the really cool things about templates in C++ is that you can make them "break the build" when the program uses them the wrong way.

In libpqxx, for example, I have template functions for converting various types to and from a domain-specific string representation. Those templates are specialized for many built-in types, but not for char. That was a deliberate choice: a char may represent a character, but it could also be a small integer--and whether plain char is signed or unsigned even differs between platforms. So I'd rather have the build fail and force the programmer to make the intended meaning explicit.

But when the build fails with a link error about the missing specialization, the programmer probably thinks it's a bug in the library. The message that the library doesn't want this code to compile doesn't come across.

I work around this by providing a specialization that invokes a function error_ambiguous_string_conversion. I declare that function but leave out any definition. A program that tries to convert a char to a string representation will now fail to link, and the build error will at least contain this helpful function name. Hopefully the programmer will try to figure out why the build failed, and end up at the declaration for error_ambiguous_string_conversion where the problem is documented.
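The trick looks roughly like this. This is a simplified sketch, not the actual libpqxx code; my_to_string and the details around it are invented here for illustration:

```cpp
#include <sstream>
#include <string>

// Deliberately declared but never defined.  If this name turns up in a link
// error, the explanation lives right here: converting a char to a string is
// ambiguous--wrap it in a std::string if you mean a character, or convert to
// int if you mean a small integer.
void error_ambiguous_string_conversion(char);

template<typename T> std::string my_to_string(const T &t)
{
  std::ostringstream s;
  s << t;
  return s.str();
}

// Specialization for char: any program that actually uses this will fail to
// link, with error_ambiguous_string_conversion named in the error message.
template<> inline std::string my_to_string(const char &c)
{
  error_ambiguous_string_conversion(c);
  return std::string();
}
```

my_to_string(5) compiles and links fine; my_to_string('5') compiles, but linking fails with error_ambiguous_string_conversion in the message, pointing the reader at the documented declaration.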

But wouldn't it be nice if I could customize the error message completely? Imagine I could just say:

template<> std::string to_string(const char &)
[
  "String conversion for type char is ambiguous.  If you really mean a "
  "character, create a std::string directly.  If you mean a small integer, "
  "convert to int or another unambiguous integral type."
];

That would mean that this function is declared but has no definition. If the program tries to invoke it, the compiler will emit my custom error message to the user.

This would also be helpful for nocopy base classes: current practice for classes that shouldn't be copied is to declare the copy constructor and copy assignment operator but not define them. A custom error message would make that trick a bit more user-friendly.
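The current practice mentioned above looks like this (essentially what boost::noncopyable does; the class names here are my own):

```cpp
class noncopyable
{
protected:
  noncopyable() {}
  ~noncopyable() {}
private:
  // Declared but never defined: any attempt to copy a derived object fails
  // at compile time (or link time, from within the class)--with a rather
  // unhelpful error message.
  noncopyable(const noncopyable &);
  noncopyable &operator=(const noncopyable &);
};

// A class that should never be copied simply inherits privately:
class database_session : private noncopyable
{
};
```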

Optimization Goals

For ages now we've been optimizing CPU performance. Higher clock speeds, more aggressive optimizations, faster memory to keep the CPU fed. It all matters, because practically all of the CPU's time is spent either waiting for memory or cranking through tight loops. The benchmarks we use to guide all this work say so. That's one big reason why we still use so many statically compiled languages like C++: you get to spend a lot more time finding ways to make those loops run just a little bit faster. We need multimedia extensions so we can speed up those crucial loops in multimedia applications: it's low-hanging fruit for an important chunk of perceived performance in everyday computing tasks.

Then there are those who say that computers are "fast enough" now, or at least for most users. Most of the computer's time is spent waiting for something it can do for us. We don't need our word processors or web browsers to be any faster. What matters now is to produce lower-power processors with more functionality, so computers can do ever more and ever smarter work for us. Some say we've even run out of ideas to put the computing power we have to effective use. Processors have been getting faster at an astounding exponential rate, and although memory hasn't exactly been keeping up, it's gotten a lot faster too. This has kept system speeds skyrocketing faster than we know how to waste it on slower software.

Bollocks!

Both positions are full of good, technically correct points. But I think they misrepresent what's been happening: the CPU cranking through loops is no longer the major time waster in most computer use. The computer isn't fast enough, and never will be. But ever-faster processors and ever-more efficient loops have hit a point of diminishing returns. It's a simple application of Amdahl's Law: if n% of runtime is spent on a particular part of the job, then the best you can hope for from optimizing that part is an n% reduction in total runtime. And that's if you manage to get the task done infinitely fast; if you speed it up by a factor of x, you only shed n%-(n%/x) of overall runtime, while getting to a higher x typically requires a disproportionately large amount of effort.

The moral is obvious: don't waste large efforts getting a large x for a low n. Double the speed of a 10% part of your workload, and what have you gained? An overall 5% improvement. Which is great, but not as much as you could have gotten from a mere 20% speedup on a task that took 50% of your time.

So where are our systems wasting their time (and ours)? Don't believe the benchmarks. Those tend to measure the straightforward, classic computing tasks that used to take so long, way back when. In fact, the trend I'm talking about has already reached the point where it's getting hard to find good benchmarks for that kind of thing. The first big problem in composing "practical" benchmark tests today is finding realistic, straight-through processing tasks that take long enough to run! The second big problem is eliminating the huge, highly variable "warmup" cost of running a benchmark--a cost that would otherwise grossly distort the measured runtime of the program. We need consistent, reproducible results that are subject to all kinds of rational analysis.

Waaaaaiit a minute. Ho! Step back. What's happening here? We're doing our best to eliminate this large, highly variable cost from our benchmarks. But it's getting harder and harder to make the straight-through processing take up a significant part of the total. Now that is an institutional blind spot. We're focusing on a low n so we can fight over the best x!

Back to the high-n part of the job, therefore. There are some interesting facts about that part of the workload, quite widely known in various circles.

Below are some things we could conceivably do about the real performance problem...

Lazy vtables

For every class that has virtual member functions (including virtual destructors--which, strictly speaking, are special member functions), a typical C++ compiler will emit a vtable. This is a list of pointers to the various virtual functions in that class. Every object of the class will have a pointer to its vtable (the vptr), and each class in the hierarchy may have its vtable point to different implementations of the same virtual function.

All of these functions need to be looked up by the system linker/loader when the code is loaded in memory for execution, so the vtable can be filled with the addresses where the various virtual function implementations happen to have been loaded into memory. Theoretically speaking this is only needed if the class, nay even the individual function is actually used--but the conventional model of separate compilation and linkage makes that impossible for the compiler to predict. Perhaps we could save a lot of time if we didn't need to do all these lookups; lots of them are never used on the average run of a typical program. But inserting run-time checks would be too costly: "I'm going to call this virtual function. Wait a minute, have I looked up its address yet? If not, do it now." That would defeat the entire purpose of the exercise.

Something we could do, however, is to replace the function pointers in the vtable with stubs. The pointers would not be directed to the virtual function implementations themselves on loading; instead, they would all point to a single stub function that does the following:

  1. Identify the function that the caller really intended to call.
  2. See whether the function's address has been looked up yet. If not:
    1. Look it up.
    2. Replace the pointer in the vtable with the newly found address.
  3. Jump to the originally intended function.
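The steps above can be sketched in portable C++, with an ordinary table of function pointers standing in for the vtable. All the names here are invented for the sketch:

```cpp
#include <cassert>

typedef void (*fn_ptr)();

int calls_to_real_function = 0;
void real_function() { ++calls_to_real_function; }

// Our stand-in "vtable": one slot, initially pointing at the stub.
fn_ptr vtable[1];

// Stand-in for the loader's (expensive) symbol lookup.
fn_ptr look_up_symbol(int slot)
{
  assert(slot == 0);
  return real_function;
}

// The stub: identify the intended slot (step 1), resolve and patch it
// (step 2), then jump to the function the caller really wanted (step 3).
void stub_for_slot0()
{
  fn_ptr resolved = look_up_symbol(0);
  vtable[0] = resolved;  // later calls bypass the stub entirely
  resolved();
}

// What "loading" would do: fill the table with stubs, not real addresses.
void load_time_init() { vtable[0] = stub_for_slot0; }
```

Only the first call through the slot pays for the lookup; every call after that is an ordinary indirect call.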

After the program has been running for a while, all functions that are actually used would have been looked up and run at full speed, just as in the conventional implementation.

This may look unnecessarily complex. There is a technical reason for that: the program may be multi-threaded, and doing things this way avoids the need for a lock to synchronize multiple threads that may try to make that crucial first call at the same time. Writing the new address to the vtable should be atomic on most architectures, without the need for any synchronization; the compiler implementation would know about this anyway.

This trick could greatly reduce the number of symbols that need to be looked up at program load time, probably reducing the time needed to get some first useful responses from it, at a hopefully small extra cost in initial performance.

There has to be a catch, of course, and there is.

What if the program does not call the virtual function, but tries to take its address instead? You could just let it read the stub, and there are things you could do to make that "patch itself out of the code" to some extent when a call is made through the pointer. But who ever wants to take the address of a virtual function in order to call it? That's exactly what virtual functions are for in C++: to take the "pointer dancing" for function calls with runtime binding out of the programmer's hands--look through the Linux kernel source for some examples of this ugly stuff. No, a much more likely reason why someone could want to take the function's address is to obtain some runtime type information. Compare the function pointer with another. And since that does not involve a call, using stubs here becomes really difficult.

Assuming that such cases are rare, however, it may be well worth the cost to insert a runtime check for this usage. Just check that every virtual function whose address is taken has had its address looked up, then proceed as with a conventional vtable. If that case is rare enough, then as per Amdahl's Law (a low n and an x below 1 for taking function addresses, but a higher n and a good x for loading effort) this could be a big win.

This is not a new idea. I used it in the 1990s, and I'm sure others came up with it before me. A particularly nice property of the "atomic update" trick is that it is also idempotent: even on a CPU that keeps its instruction and data caches separate, there is no need to flush caches to make the change stick.

Daemonized Libraries

Given the cost of creating processes and loading libraries, wouldn't it be great if we could somehow reduce the number of libraries that need to be loaded in the first place? What follows is probably a very silly idea in practice, but let's take a look anyway.

If we had a proper, extremely lightweight inter-process communication mechanism, which most Unix-like systems unfortunately don't seem to have, we could perhaps divest a lot of "library" work to multithreaded daemon processes. A single daemon could serve lots of client processes, loading multiple libraries into its own address space as needed. The clients would pass numeric identifiers for the library functions they wanted to call, marshal the arguments and send them over to the daemon (just like conventional RPC, CORBA, SOAP etc. requests), and wait for the result.

This means slower library calls, of course, but in return for a potentially vast reduction in the amount of loader work to be done, as well as in memory usage. And where shared libraries call other shared libraries, well, those calls would be just as efficient as they were. More efficient perhaps, if run-time optimizations are performed. Modern developments in virtual machines and emulation have seen some interesting work in this field. Plus, some low-level parallelism would become very easy to attain: "call this function, but continue processing while the daemon is running. I'll check on the result later."

Running daemons is often difficult, of course, and involves lots of system interaction. Attempts like ActiveX have drawn a lot of fire in the past. What about bugs and memory pollution? What if lots of client processes end up waiting to be served by that daemon? The difference, I think, with what we already have is that existing implementations either pretend to be objects or manage singleton resources.

A "daemonized library" would be more like a stateless worker process. There can be any number of instances; a supervisor in the vein of the Apache webserver (which also dynamically manages a variable number of stateless daemons, if you come to think about it) can spawn daemon processes on demand; stop them from time to time just in case they should get polluted by internal bugs; adjust their numbers to suit the number of concurrent threads supported by the hardware, and so on. With careful design, even runtime library upgrades may be possible, without stopping any running applications to reload them with the new libraries.

So what about memory protection? It would definitely blur the boundaries of memory protection, which is now neatly confined (mostly) to individual processes. Well, every system user would still have separate daemon processes of course. You could also have other context boundaries. In the extreme case, one could spawn a new daemon process for every new child process created from, say, the same shell process, and reuse it in that same process tree just like a regular library.

Would that be costly? Probably. But at least one could pre-spawn them daemons, so they'd be ready for use the moment they're needed--rather than having to be loaded after the application that the user is expecting a response from. That is not something we can do with conventional libraries for many reasons, and it may help "perceived" performance. "Perceived" may sound like the user is a fool who doesn't understand what performance really is. The main point of this chapter is that maybe we techies need to put some effort into learning to see things his way!

Parallelism

Everyone "knows," and has "known" for some time now, that parallelism is going to be the next major step up in performance.

As far as I can see, it hasn't happened. There have been many nice academic ideas that weren't quite ready for practical use, or only worked in environments that weren't (such as overly restrictive languages). Then there are the various threading implementations for the major programming environments, which have more or less standardized but remain messy. And there's the fact that C and C++ don't exactly help make it easy. Java has good ideas on how to do conventional multithreading, but it's still just that--conventional multithreading.

In my humble opinion, we've been looking the wrong way. For an example of how to approach these things, look at const.

I love const. It cuts both ways: it expresses an important semantic property in the source code (good for the programmer), but it also provides a powerful piece of information to the optimizer.

A viable approach to parallelism should be like that: first and foremost a way of specifying correct software, so people have reason to adopt it in their daily programming; and then we can look at how to use it for optimizing the resulting code. That requires a leap of faith: start using it before we know it's going to be designed "just right" from the optimizer's point of view. The same happened with const: what optimizers really wanted was something more like restrict (a C99 feature), and practical experience with const was the only way to make clear just what was needed. The ensuing difficulty in designing a stronger successor indicates to me that const was a stroke of genius. I can only hope that future languages will allow the programmer to define similar attributes for their own classes.

So, I say, design useful correctness semantics for parallelism first, and trust Heaven that it will provide optimization opportunities. Even if it doesn't, a construct that provides this much will have gained us something.

Really easy to say, right? We need a construct that enhances correctness before all else. How can you take a problem that puzzles many, and introduces uncountable (and more often than not, un-debuggable) bugs into the world, and somehow make it enhance correctness?

I think there are things we can do; see below. But first and foremost, we need to try. It's hard, but it's worthwhile. The promised Era of Parallelism will not come all by itself. We can start by thinking about dependency relationships (and the lack of same) between tasks.

Parallelism for Correctness

In case you thought this was all theoretical, here's a list of real-world examples that could be vastly improved with a good model for dependencies and independent tasks. Some of them are largely about error handling, which is a particularly important (and underrated) area of programming.

What these examples have in common is the spilled-coffee effect: you set the computer to work on some long-running task and go off to get a cup of coffee in the meantime. By the time you get back, you expect it to be more or less finished.

Instead, you find that almost none of the work has been done. For most of the time you've been away, the program has been sitting there displaying some equivalent of "Press Any Key to Continue." This is of course the point where you may spill your coffee; even if you don't, the coffee break has, in a sense, been wasted.

Like I said, error handling is particularly important. Few programmers spend much time thinking about it. There are good reasons for this. It's against human nature to focus on what might not work; programmers know their own programs intimately enough to understand most of the errors they run into, and subconsciously avoid them in the first place; and frankly, it's usually not very clear what to do with errors anyway.

So programmers tend to ignore errors. Users, as a result, get bombarded with unclear, unspecific, dumb, or superfluous error messages and tend to ignore them also. A good model for expressing interdependence of tasks can help here: if we stop treating exceptions like hot potatoes that must be passed on to the user immediately, we can aggregate them a bit. Present them in more sensible ways. Recognize what's going on before passing the buck to whoever's sitting outside the computer. "The following error occurred for a bunch of different files." I've seen some programs do that, but not many, and never with multi-selectable lists to make it easy to manipulate the program's reactions to each. Let's find better ways of handling errors, outside of the regular sequential code path!

In the following sections I'll show a very, very simple construct that could help solve all these problems--given one very simple extension to the language.

Cloning Exceptions

We need a way to be able to create copies of exception objects, without full static type information. Why is this under "parallellism?" Because it allows us to write frameworks that perform multiple tasks independently of one another, and gather up their successes and failures for later processing. Just catch any exception coming out of each individual task, add it to a container of results, and proceed to the next independent task.

This can really help. See my pipeline class in libpqxx, the C++ client API to PostgreSQL. This is the kind of "execution management" class that we'll need in order to support more liberal execution models.

(In case you didn't know it, PostgreSQL or "Postgres" for short is a fully enterprise-ready, open-source database management system. I vastly prefer it over MySQL for standards compliance and even, to my own surprise, ease of use. Its licensing is also more liberal than MySQL's, although I must confess to being a GPL man myself.)

Primitive Example: pipeline

The pipeline class is conceptually simple: instead of executing queries in synchronous fashion ("execute this query and give me the result"), you create a pipeline object. You feed your queries into the "front" end, and retrieve results from the "back" end. They are executed in strict insertion order, but apart from that, sequencing is controlled by the pipeline object. This allows for several optimizations:

  1. Latency hiding: your program can go and do something useful while the query is executing. No need to waste the time, nor to risk a gruesome debugging effort to get your event handling right.
  2. Concatenation: PostgreSQL happens to allow queries to be concatenated and sent to the server collectively. If the server is far away on the network, and you have a lot of queries to perform that you can formulate completely before the first one completes, they can all be bundled and transmitted in bulk.
  3. Server-side Pipelining: some database management systems are implemented internally as pipelines. One stage may receive and parse a query, then pass it on to the next which optimizes it; the following stage may plan data access patterns for it, the next (or next few) execute it, and so on until the results are sent back to the client. If these pipeline stages are implemented as independent threads running on separate processors, for instance, then sending more consecutive queries to the server at once may allow these stages to work on the queries concurrently.

Of course, the pipeline's error handling gets a little complicated. If one of your queries fails, you want an exception when you retrieve its result--not before, when it is executed, and you've still got some perfectly good previous results waiting to be retrieved (literally "in the pipeline"). So the pipeline class retains any error information and doesn't convert it to exceptions until the right moment.
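The deferral can be modelled in a toy class. This is not the actual libpqxx interface; the names and the "failure" condition are invented here to show the principle of retaining errors until retrieval:

```cpp
#include <queue>
#include <stdexcept>
#include <string>

// Toy pipeline: "executes" queries immediately on insertion, but saves any
// failure and rethrows it only when the corresponding result is retrieved.
class toy_pipeline
{
  struct outcome { std::string result; std::string error; };
  std::queue<outcome> m_results;

public:
  void insert(const std::string &query)
  {
    outcome o;
    if (query.empty()) o.error = "empty query";   // our stand-in failure mode
    else o.result = "ok: " + query;
    m_results.push(o);
  }

  // Results come back in strict insertion order; a failed query throws here,
  // not at insert() time, so earlier good results are not disturbed.
  std::string retrieve()
  {
    outcome o = m_results.front();
    m_results.pop();
    if (!o.error.empty()) throw std::runtime_error(o.error);
    return o.result;
  }
};
```

A failing query in the middle of the batch doesn't prevent retrieving the results before it, or after it.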

Implementing Exception Cloning

The pipeline is lucky in this regard because it exists on the boundary between C (which doesn't have exceptions) and C++ (which does). In the general case, this is much harder to do. What if the query could just throw any C++ exception? About the best we could do, I guess, is to catch as many known exception types as possible, and for each of them, include code to create a copy on the free store (using new), and store that somewhere. We can't return a reference to the original exception, because its lifetime will end at the end of the catch block, and passing copies around directly pretty much restricts us to one basic exception type.

What we arrive at is not very scalable, nor does it lend itself to very effective reuse:

#include <new>
#include <stdexcept>
#include <string>

// Provided by the surrounding program:
void execute(const std::string &);
void post_query_work();

template<typename T> inline std::exception *clone(const T &t)
{
  return new T(t);
}

std::exception *perform_one(std::string query)
{
  std::exception *fail = 0;
  try
  {
    execute(query);
  }
  catch (const std::bad_alloc &e) { fail=clone(e); }
  catch (const std::length_error &e) { fail=clone(e); }
  catch (const std::domain_error &e) { fail=clone(e); }
  catch (const std::out_of_range &e) { fail=clone(e); }
  catch (const std::invalid_argument &e) { fail=clone(e); }
  // TODO: Add more exception types here!

  try { post_query_work(); } catch (...) { }

  return fail;
}

Obviously we need something better. Making run-time type information more accessible would be nice: then we could ask the compiler's run-time system to copy-construct the exception object for us, whatever its type. But that would require a fairly radical change in the language (which I hope to write more about in the future). Let's assume that we only have simple tools, and small changes in the language at our disposal.

As a "quick fix," why not give std::exception a new member function along the lines of:

namespace std
{
class exception
{
public:
  virtual auto_ptr<exception> clone() const
  {
    auto_ptr<exception> result(new exception(*this));
    return result;
  }

  // ...
};
}

The std::exception hierarchy already has a vtable, so this is no great loss in terms of performance. Now, every concrete class derived from std::exception would hopefully implement this function, always returning a copy of itself as a std::auto_ptr<std::exception> to avoid changing the return type--but dynamically it would always point to an exception of the right type. Programmers implementing exception classes of their own would be called upon to do the same.
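A derived class's override might look like this. It's a sketch with an invented class name, and it returns a plain pointer rather than the auto_ptr above, purely to keep the sketch self-contained; ownership passes to the caller either way:

```cpp
#include <stdexcept>
#include <string>

// Sketch: how a concrete exception class would implement the proposed
// clone().  The covariant return type still "is a" pointer to exception,
// so callers that only know about std::exception remain happy.
class broken_connection : public std::runtime_error
{
public:
  explicit broken_connection(const std::string &msg)
    : std::runtime_error(msg) {}

  virtual broken_connection *clone() const
  {
    return new broken_connection(*this);
  }
};
```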

So what about scalability--what if some exception class is not supported? The answer is not great, but better than what we had before. If some third-party exception class fails to implement this function, then some details for that exception class will be lost, and you'd get a copy-constructed object of some parent class of the real exception. This is known as slicing and happens a lot in current exception programming as well. At least you should still have the what() string. It's regrettable, but not a new problem.

Moreover, if this were part of the standard, more programmers would support it, and at some point one would simply expect it to be there. Remember how old C++ libraries used to grow into frameworks and define their own complete exception hierarchies, completely unrelated to anything in the Standard Library? That slowly went away as the custom exception classes were brought into line with the std::exception hierarchy--and exceptions without clone() should go the same way.

Using It

So let's say all relevant exception classes support clone(). What do we do with it? To answer that, here's a simple class that performs a set of actions that are not interdependent. At some future point, this could be optimized to distribute these tasks over multiple threads--but what it does for now is express that these tasks can be executed in any order, and we don't want to just cancel or hold everything whenever an exception comes out of one of them.

/** If you've got a job you'd like us to perform, wrap it in a class derived
 * from job, create an object of your class, and pass it to perform() (see
 * below).  A job is a functor.
 */
class job
{
  std::string m_name;
public:
  job(const std::string &jobname) : m_name(jobname) {}
  virtual ~job() {}
  const std::string &name() const throw () { return m_name; }

  /// Overridable: the action to be performed.  Feel free to throw exceptions.
  virtual void operator()() throw (std::exception) =0;
};

/** Performs jobs.  Pass it a sequence of pointers to jobs.  Any failed jobs are
 * logged to cerr, but only the first (if any) will throw an exception.
 */
template<typename ITER> void perform(ITER begin, ITER end)
{
  std::auto_ptr<std::exception> error;

  for ( ; begin != end; ++begin) try
  {
    (**begin)();
  }
  catch (const std::exception &e)
  {
    std::cerr << "Job " << (*begin)->name() << " failed: "
              << e.what() << std::endl;
    if (!error.get()) error = e.clone();
  }
  // Note: throwing *error copies through the static type std::exception,
  // which slices; rethrowing in the dynamic type would be the next wish.
  if (error.get()) throw *error;
}

Object Sync

In conventional multithreaded programming, you often obtain a lock, access an object related to the lock, release the lock, and continue. This exposes you to two risks that the language doesn't help with:

  1. When you think you're reading (parts of) the object from memory, you may actually be reading from processor registers because the compiler "cached" them for you during a previous access, or "prefetched" possibly incorrect values before you obtained the lock.
  2. After you've modified the object and released the lock, the compiler may still be keeping some of the values you've written in registers, either to write them back after you release the lock (creating a race condition) or not writing them back at all because it thinks it knows exactly how and when it will be read/written next.

Lots of C and C++ programs deal with this problem by ignoring it. This will usually work because the compiler isn't smart enough to optimize across the external lock/unlock function calls or inlined assembly instructions. But compilers get smarter (some already do whole-program analysis), and we normally want them to optimize as aggressively as possible. In fact, even with the lock/unlock functions, it can be good to have optimization "think across" the calls for all matters except the locked object.

We do have one construct to help us explain to a compiler that an object may be shared between different threads or execution contexts: volatile. But this vaguely defined modifier affects the entire lifetime of the object, so all code accessing the object is kept less efficient--just because we need to synchronize it at two points in the program!

What we need, I guess, is a targeted synchronization primitive. Imagine:

namespace std
{
template<typename T> void sync(volatile T &) throw ();
}

Now, doing a std::sync(foo) on an object foo would tell the compiler, "I don't want you to keep any part of foo in registers across this call, because another execution context may access it." It would also help express to anyone reading the code which is the, ahem, object of that lock. I don't think the volatile is strictly necessary (heck, it might even cause compilation problems) but I'm including it here mostly for artistic reasons.

Compilers could implement the template in various ways.
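For the GCC family, for instance, a minimal sketch is an empty asm statement with a "memory" clobber (namespace my invented here so as not to touch std). Note this is a compiler barrier only: it stops the compiler from caching the object in registers across the call, but it is not a hardware memory fence, which a real lock implementation would add:

```cpp
namespace my
{
// Sketch: the empty asm claims to clobber memory, forcing GCC to discard
// any register-cached copies of objects at this point and reload them on
// the next access.  A compiler barrier, not a CPU fence.
template<typename T> void sync(volatile T &)
{
  __asm__ __volatile__("" ::: "memory");
}
}
```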

Class Member Names

Some people like to prefix the names of class member variables with underscores (_foo), Hungarian-looking lower-case letters (pFoo), or other prefixes (i_Foo, m_foo). I'm one of them. I dislike Hungarian notation but like to know when code is referring to a member variable of its containing class. I find it really helps, both in maintaining my own code and in reading somebody else's code. Example code snippets can be shorter and clearer given such a convention, because they no longer need to state whether a particular variable is a class member.

Unfortunately, it's not going to work very well unless everyone can agree on a single convention--which they can't. None of them looks quite natural to me. It's often better to express things inside the language than using conventions on top of the language, so here's an idea on how to do that.

Dotted Names

The idea is to allow class member names to start with a dot:

struct bar
{
  int .foo;

  void func() { .foo++; }
};

That may not seem to make much sense, except there is one special rule: if the class member is explicitly qualified as such, the dot can (or must, really) be omitted. So it only shows up when referring to an unqualified name, but in that case it is also enforced by the compiler. This should work for member variables and member functions alike:

struct splat
{
  static int .foo;
  int .myfoo;

  static void .func()
  {
    .foo++;
    splat::foo++;
    foo++;	// ERROR: "foo" undeclared
  }

  void .anotherfunc()
  {
    .myfoo++;
    this->myfoo++;
    (*this).myfoo++;
    .func();
    splat::func();
    func();	// ERROR: "func()" undeclared
  }
};

There could be serious parser problems with this idea; but if it's possible then I think it would solve a lot of problems. It relies on standard language notation to indicate class membership, so no more convention conflicts. It's compiler-enforceable. And it's backwards-compatible: you can even rename a public class member using this notation without breaking code outside the class. Only code inside the class would have to be adapted!

In practice, I also run into naming conflicts elsewhere:

struct atype
{
  int property;

  // Change spelling of parameter to avoid name conflict
  explicit atype(int Property) : property(Property) { }
};

With the proposed "dotted names," that could become the slightly more elegant

struct atype
{
  int .property;

  // Struct member name starts with dot, parameter name doesn't and can't
  explicit atype(int property) : .property(property) { }
};

A problem that remains, of course, is that property names often come in threes:

struct record
{
  string .name;
  explicit record(string name) : .name(name) { }
  string .name() const { return .name; }	// Which .name do you mean!?
};

Ideas on solving that case are more than welcome. But even without it, I think "dotted names" could be a great maintainability enhancer.

Templates

New template bracket syntax

If templates were a bit easier to parse, we'd be rid of many little problems we have now. The trouble is that the <angled brackets> we use for templates already had various other meanings (in the <, <=, << etc. operators), and that because of this, they are not consistently matched in pairs the way the other bracket characters ([{}]) are. This is also a pain in the backside for other programs that try to parse the language, such as syntax-colouring plugins for editors.

If we had some easily recognizable syntax such as

template![int T, typename C]

then we would be rid of warts like set<set<int> > needing that piece of whitespace to avoid parsing >> as a shift operator, or the abominable typename A<T>::template B<int> b; (as found in the gcc 3.4 series changelog). As David Vandevoorde points out (thanks David!) the typename keyword would have to remain, but the template in the middle would become unnecessary.

Non-intrusive tuples

It bugs me that associative containers use std::pair, with its first and second, whereas sequential containers just contain their value types--meaning that algorithms written to deal with the two kinds of iterators may have to look very different:

template<typename IT> void dump_values(IT begin, IT end)
{
  for ( ; begin != end; ++begin) cout << *begin << endl;
}

// Special version for associative containers--sorry!
template<typename IT> void dump_values(IT begin, IT end)
{
  for ( ; begin != end; ++begin) cout << begin->second << endl;
}

Sure you could add a template argument specifying how to access the iterators, and either write your own functor that just returns an iterator's ->second or use the standard library's nifty lambda-calculus tricks (not all of which made it into the Standard) to achieve the same effect. But in practice it's rarely worth the trouble. Your code usually gets longer, not shorter, and harder, not easier to maintain.

What if we had some way of specializing what first and second mean in the context of a container? What if we accessed iterators using templated key() and value() functions defined by the container? A set would be nothing more than a map where the two iterator access functions return the same value, or where value() returns void (a set<T> would then be a map<T,void>). A type like vector could implement a key() that computed an iterator position's index, making it work from your algorithm's point of view like an associative container indexed by integral values. Which is what it is, really. The only remaining difference besides performance specifications would be that a vector must have a contiguous key range. That may not be of any interest to most algorithms!

Why do I propose making these functions members of the container rather than of the iterator? Several reasons:

  1. It's non-intrusive on the iterator, meaning that it would still work with pointers. One of the strong points of the standard library's iterator model is that it also covers pointers.
  2. Containers like vector may need access to private members to compute an iterator position's key() efficiently.
  3. It would be easy to make the key() and/or value() functions overridable through the container's template parameters.

The iterator's traits would also have to provide a way to access these functions so sequence algorithms (which don't know about containers) could access them.

There is one big chicken-and-egg problem with all this: if these functions are added as container template parameters, how do you specify the defaults? You can't write "I want a set<int> with, as its value() function, whatever is the builtin default for a set of int with the default implementation for value()." It's an infinitely recursive definition. Perhaps the problem can be solved with an extra layer of templatization on the contained type (so that set and map, for instance, would only differ in this detail) but of course we also want to keep containers simple.

Did I just say that? Make it "keep containers at or near their current daunting level of complexity." At least until compiler error diagnostics improve dramatically.

Namespaces

Access protection

Imagine being able to write:

namespace foo 
{
private:
	void bar();
public:
	int splat(); 
}

What I do to get a similar effect in libpqxx is to define a public namespace pqxx, and within that, a nested namespace called internal which is documented as something that users should not touch. With access protection I could actually enforce this, and know that I wouldn't be breaking any third-party code when changing classes etc. in there.

Namespace inheritance

It may not make all that much sense in practice, but allowing inheritance between namespaces would give protected a meaning in namespace context. Presumably it could be "tucked underneath" the definition of class inheritance in the same way that namespaces themselves have been turned into a fitting component of the existing class inheritance mechanism.

Static virtual functions

Once you start doing namespace inheritance, you might as well have virtual functions in namespaces (and, by extension, static virtual functions in classes as well). "Dynamic" dispatch (actually I suppose it could be optimized away at link time) would be based on the calling function's namespace, regardless of whether the implementation's declaration is known to the compiler at the call site.

Access protection

"prohibited" protection level

Imagine a protection level "beyond private," which doesn't allow any access at all--not even from within the declaring class.

"prohibited" would be useful for disallowing copy constructors, copy assignment operators or default constructors that the compiler would implicitly generate otherwise. Standard practice for accomplishing this is currently to declare them as private and deliberately fail to implement them, but it's sort of a dirty trick and errors may not be caught until link time. Make them prohibited and you'll get a compile-time error if you try to call them even from within the class itself. The new protection level would also be useful for implementing virtual member functions that should only be accessed through virtual calls. Virtual functions can be dangerous, believe it or not (ever called one from a constructor or destructor?) so it may be useful to restrict access to them even from within the defining class.

Update: David Vandevoorde tells me that work is being done on a special syntax to suppress automatic generation of constructors and assignment operators, making the "false declaration" trick unnecessary.

Ideally the new keyword would precede "private" in alphabetical order so it fits into the private - protected - public progression. But at least prohibited starts with a 'p'.

Update: Bart Samwel found a word that meets these requirements and has a fitting meaning: precluded.

Member function access levels

Imagine being able to specify the level of access a member function has to its class: private (default and maximum), protected (as if it were defined in a derived class), or public (as if it weren't a member at all).

You'd be able to define member functions that look like perfectly normal members from the syntactic point of view, but can be implemented purely in terms of the class' public interface. You could add functionality to a class in the form of member functions without losing track of what state transitions the objects can go through; the number of "privileged" member functions would remain limited.

This is currently worked around by not making the "unprivileged" functions members at all. But if the interface between the class and the function ever changes, for instance because an optimization opportunity is found that requires more access to the class' implementation details, you can either move the function back into the class (and break source compatibility) or make it a friend (and break the whole model).

Builtin Types

Sub-byte types

typedef bool bit_t:1;

You wouldn't be able to take its address or its size, which complicates the way arrays are defined, but it could be nice to have packed-bit arrays in the language.

likely and unlikely

It'd be nice if we had likely bool and unlikely bool types (somewhat similar to signed char and unsigned char), with optional compiler warnings for such probable mistakes as implicit conversions between the two, exceptions thrown on likely conditions, etc. (Note that operator!(likely) would return unlikely and vice versa.)

Good for static compiler optimization as well as code clarity. As a side effect, the familiar if (unlikely(x < 0)) syntax from the Linux kernel (a macro wrapped around gcc's __builtin_expect) would also work. And with unlikely being a builtin type instead of an operator or function, there is less temptation to read that statement as "if x is unlikely to be negative," which is not what it means.

I wonder if these two new types should be seen as different from regular bool for purposes of overloading and template specialization... On the one hand it would make things much more complex, but on the other, it might allow for some additional optimizations.

void Objects

Why can't we have void variables? Some compilers used to implement this as a C language extension. In C++ it could be even more fun. Consider:

template<typename T> inline T foo(T t)
{
	return bar(t);
}

What happens if, for some type T, bar(t) returns void? Should instantiation of this template fail over such a minor point? Hell, no! There will be an error if the program tries to receive the return value into a variable (unless that too is void, of course), and unless that happens, there is no problem whatsoever.

Promoting, Converting, or Returning void

Returning a non-void value from a void function would still yield a compiler warning, but this time as a type demotion rather than as a special case. Returning a void from a non-void function (say it returns type T) could either be an error or return a default-constructed T. T's default constructor doubles as a "promotion" from void to T.

This raises the question of whether we'd need to have explicit default constructors as well, but I suspect there wouldn't be much need.

Container Library

Once we allow void objects, some simple templates may become implementable as degenerate cases of more complex ones:

template<typename T> typedef map<T,void> set<T>;

The standard library may specialize this to contain no object storage at all. The set<void> would then hold no more than a boolean to indicate whether an "object" had been inserted (or a counter if we decide that all void values should differ, instead of considering them all equal). Personally I'd like to say all void objects were not only equal, but actually the same object. But that may not be feasible; see below.

Storage

If void is allowed to have zero size, arrays and other variables of void type could be optimized away entirely in all cases. That may reduce the need for manual specialization when we reduce simple templates to degenerate cases of generic ones; you'd be almost guaranteed maximum efficiency for the void case, as if it were a manual re-implementation.

On the other hand, if it better fits the Standard to give each void object its own identity and nonzero size, then this could be a convenient way to tell the compiler to generate a unique address (without necessarily allocating storage to it). Sometimes that's useful too. We could just define sizeof(void)==1 and specify that the compiler may avoid assigning real storage to it where it can. That would be consistent with the notion that incrementing a void pointer increases it by one, as e.g. some code in Linux assumes (that's a gcc extension, though, not standard C).

Arguments

A nasty consequence of all this is that a function's number of arguments suddenly becomes a relative thing. If a function declared as void foo(); may be called as either foo() or foo(void()), why not allow foo(void(),void()) as well? That will take some more thinking, but I'm sure there's a way of living with it.

It may even be put to use. Function templates that accept "up to 5" arguments for example could in some cases be implemented as a single template, some of whose arguments may be of type void. Present practice requires 6 different definitions for zero arguments, one argument etc.:

template<typename A, typename B, typename C, typename D, typename E>
inline void print_total(A a, B b, C c, D d, E e)
{
	// Add and print.  Addition operators taking void arguments are all
	// defined as no-ops.
	cout << a + b + c + d + e << endl;
}

To be fair though, this is probably an exceptional case. Most templates will probably fail to compile when compiled with void as an argument type. But that's all compile-time trouble, not run-time surprises, and if those cases can be fixed it may eventually make for more elegant programs.
