I've calmed down a little after some teething, but there are still some inconsistencies that seem a bit off. For example, it seems you have to include function implementation for templated classes in the header file?
Templates are a complicated beast. The thing to understand about templates is that they cause the compiler to generate code. A single template function could result in 100 different functions being generated. For example, if you write foo<int> and foo<double>, the compiler will generate two completely different functions.
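To make the "two completely different functions" point concrete, here's a minimal sketch. The names call_count and check_separate_counters are made up for illustration. Each instantiation gets its own static counter, which would be impossible if int and double shared one function.

```cpp
#include <cassert>

// Hypothetical helper: every distinct T produces a separate function,
// and each of those functions owns its own copy of the static counter.
template <class T>
int call_count()
{
    static int count = 0;  // one counter per instantiation, not one shared counter
    return ++count;
}

// Small self-check showing that the int and double versions do not share state.
void check_separate_counters()
{
    const int i1 = call_count<int>();
    const int d1 = call_count<double>();
    const int i2 = call_count<int>();
    assert(i2 == i1 + 1);                    // the int counter advanced
    assert(call_count<double>() == d1 + 1);  // the double counter only saw its own calls
}
```

Call check_separate_counters() from main to try it. If the two instantiations were one function, the last assert would fail.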
Why is this important? First, let's look at how a compiler works with a non-templated function. It sees something like this in a header:
Code:
struct Foo
{
void foo();
};
It mangles the function name and produces something like ?foo@Foo@@QAEXXZ (that's the MSVC scheme; other compilers mangle differently, but the idea is the same). So now it says "ok there's a function called ?foo@Foo@@QAEXXZ. Any time someone calls this I'll write the call into the object file as ?foo@Foo@@QAEXXZ, and hopefully later some cpp file will contain a definition of this function. So I'll just let the linker figure it out at the end after every file has been compiled".
So it happily goes along, and later on sure enough some cpp file defines this function. At that point the compiler says "I've just compiled this function called ?foo@Foo@@QAEXXZ, so I will write its name and definition into my object file". And later still, after all compiling is done, the linker says "ok I see there's a definition for a function called ?foo@Foo@@QAEXXZ. I will put its code at address 0x12345678. And over here is some code that calls that same function. I'll write the call instruction into the executable as calling the function at 0x12345678, since that's where I've placed the code for this function in the executable."
Everybody's happy.
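Here's roughly what that looks like squashed into a single file, as a sketch only. The comments mark which file each piece would normally live in. I've made foo return an int (my change, the example above returns void) so there's something to check, and call_foo and check_plain_function are made-up names.

```cpp
#include <cassert>

// --- header: declaration only. The compiler learns the name and signature. ---
struct Foo
{
    int foo();
};

// --- caller's cpp file: compiles with just the declaration in scope. ---
// The call is recorded against the mangled symbol and resolved by the linker.
int call_foo()
{
    Foo f;
    return f.foo();
}

// --- some other cpp file: the one and only definition. ---
int Foo::foo()
{
    return 42;
}

// Self-check: the call above ends up at the definition.
void check_plain_function()
{
    assert(call_foo() == 42);
}
```

The point is that call_foo compiles without ever seeing the body of Foo::foo. For an ordinary function the name and signature are enough.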
Now consider the case of a template function. Suppose you had this:
Code:
// File a.cpp
#include "b.h"
void f()
{
    Foo foo;
    foo.foo<int>();
}
// File b.h
struct Foo
{
    template<class T>
    void foo();
};
// File b.cpp
#include "b.h"
template<class T>
void Foo::foo()
{
}
When it compiles a.cpp, it does the same thing as in the previous example. It says "ok I've got this function named ??$foo@H@Foo@@QAEXXZ". (Note that this function has a different mangling. It has an @H near the beginning. If you change int to double you'll get an @N instead. Every type will produce a different mangling.) "I'll write the call into my object file as before, and let the linker figure it out".
Then it compiles b.cpp. Now it says "hmm, I've got this template function, but I don't know what type the template parameter is supposed to be. What am I supposed to do? How many functions do I write into the object file, and what should they be called?"
Why doesn't it know that someone else called it with foo<int>? Remember that every cpp file is compiled on its own, usually in parallel. One file, one process. The process that compiled a.cpp might have ended a long time ago. Hell, a.cpp might not have even been compiled yet. So b.cpp has *no idea*. Not only does it not know *how* to generate the code, but it doesn't even know what to call the function it generates. So the compiler is stuck here. It doesn't know what to do, so it just skips the function and generates nothing. At the end you get an unresolved symbol error from your linker, because no object file contains code for the int instantiation.
You can actually "help" the compiler here. For example, at the bottom of b.cpp (after the template definition) you can explicitly instantiate it with a line like: template void Foo::foo<int>();
This tells the compiler to generate the int instantiation while it is compiling b.cpp, so the symbol that a.cpp is asking for ends up in b.cpp's object file and everything links.
The problem with this approach, of course, is that b.cpp needs to know in advance every possible way that the template will ever be instantiated. If you ever use it a different way, you'll get a linker error at the end.
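For reference, here's a rough single-file sketch of the explicit instantiation route. The comments mark which file each piece would live in. I've put the instantiation lines in the b.cpp part, after the definition, since that's where compilers reliably accept them. foo returning a T, use_foo, and check_explicit_instantiation are my own additions so there's something to check.

```cpp
#include <cassert>

// --- b.h: declaration only, no body ---
struct Foo
{
    template <class T>
    T foo();
};

// --- b.cpp: the template definition ---
template <class T>
T Foo::foo()
{
    return static_cast<T>(1);
}

// Explicit instantiation definitions: force code to be generated for
// exactly these types in this translation unit.
template int Foo::foo<int>();
template double Foo::foo<double>();

// --- a.cpp: a caller that only ever sees b.h ---
int use_foo()
{
    Foo f;
    return f.foo<int>();
}

// Self-check for the sketch.
void check_explicit_instantiation()
{
    assert(use_foo() == 1);
}
```

In a real multi-file build, calling f.foo<float>() from a.cpp would still fail to link, because nobody asked b.cpp to generate that one.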
Now, if you're astute, you might be thinking "Well linkers already do link time code generation, and the linker knows every possible instantiation. Why can't the linker just generate the code for all these instantiations?" And the answer is, it can! Kinda. In fact, if I'm not mistaken this is the way in which the EDG compiler implemented the export keyword (which was basically something in the standard that allowed compilers to work exactly the way in which you were hoping it did work). But this path is fraught with peril.
One of the issues with the export keyword (and generating code for template instantiations at link time) is that compilation is highly parallel. Every compilation is run in a different process. You can compile as many files at once as you have cores on your machine. Linking, on the other hand, is highly serialized. The whole point is to take many inputs and produce one output, so by its very nature it has to serialize somehow. This may seem like a minor inconvenience, but if you consider how many template instantiations come from things like the standard library, and how many more you get once you start doing metaprogramming, it becomes prohibitively expensive. It's easy to imagine a program with millions or even billions of template instantiations.
LTCG is a relatively new thing anyway, and even if it were possible to do this efficiently at link time, you would lose a ton of potential for optimization. The compiler has so much more knowledge about your program: type information, semantics, data flow analysis, etc. And compiler optimization is really just an exercise in data flow analysis. So programs would not only build significantly slower as a result of doing this, but they would run much slower as well. Linking is already the slowest part of a large build anyway. By making each compilation do a little extra work (namely, instantiating all the templates every time in every process) it may be more overall work, but it gets done faster because it can be parallelized.
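That "every compilation instantiates what it needs" model is exactly what you get when the template body lives in the header. A minimal sketch, again with comments marking the would-be files. The return value and the names use_int, use_double, and check_header_definition are made up for illustration.

```cpp
#include <cassert>

// --- b.h: declaration AND definition together ---
struct Foo
{
    template <class T>
    T foo()  // defined inside the class, so it is implicitly inline
    {
        return static_cast<T>(2);
    }
};

// --- a.cpp: includes b.h, sees the body, instantiates foo<int> itself ---
int use_int()
{
    Foo f;
    return f.foo<int>();
}

// --- c.cpp: includes b.h, sees the body, instantiates foo<double> itself ---
double use_double()
{
    Foo f;
    return f.foo<double>();
}

// Self-check for the sketch.
void check_header_definition()
{
    assert(use_int() == 2);
    assert(use_double() == 2.0);
}
```

If two cpp files both instantiate foo<int>, each object file carries a copy and the linker keeps just one, so the extra work is duplicated compilation rather than duplicated code in the executable.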
Then, there is the issue of how much is even possible in the linker. Although the linker would be generating code, it would be generating it based on a template that you wrote. When it does LTCG for optimization purposes, it can be assured there are no errors because it's not generating it based on user input. It's like the adage of "make sure you sanitize user input". Code you write is user input to a compiler. If the linker were to generate the code for your templates, it may encounter errors in your code. Does the linker now get into the business of reporting compiler errors? It's not immediately obvious that there's a good solution here.
Finally, it's not even clear that it could be standards conformant. Since the linker doesn't have the same level of semantic information as the compiler, certain advanced features like SFINAE (beyond the scope of this post, feel free to google it though) might not be workable at all.
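If you just want a taste of SFINAE without googling, here's a minimal sketch. The function name describe and its return values are made up for illustration. Both overloads have the same name and parameter list. For any given T, forming the return type of one of them fails, and instead of reporting an error the compiler silently drops that overload. Doing this requires full knowledge of the types involved, which is the kind of information a linker doesn't have.

```cpp
#include <cassert>
#include <type_traits>

// Viable only when T is an integral type. For other types the return type
// cannot be formed, so this overload is quietly removed from consideration.
template <class T>
typename std::enable_if<std::is_integral<T>::value, int>::type
describe(T)
{
    return 1;
}

// Viable only when T is a floating-point type.
template <class T>
typename std::enable_if<std::is_floating_point<T>::value, int>::type
describe(T)
{
    return 2;
}

// Self-check: each call picks the one overload that survived substitution.
void check_sfinae()
{
    assert(describe(7) == 1);    // int: the integral overload
    assert(describe(7.0) == 2);  // double: the floating-point overload
}
```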
I know this is a long post, but hopefully this clears some things up!
Edit: By the way, you should have a look at C++ Modules. It's a C++ language feature that was originally planned for C++11 but was dropped from it for technical reasons. It's still on the slate for inclusion in a future version of the standard, the next of which is C++17. Clang has already started implementing it and has made significant progress.
Implementing modules essentially solves a lot of the same problems required for implementing the export keyword in the compiler. I'm not up to date on the latest details regarding modules and how they interact with templates, but my understanding is that modules will open the door to fixing the inline template definition problem.