C++11 added move semantics, along with rvalue references, to the language. This allows the user to indicate when a function can be destructive to its arguments. For example, a “move” constructor could avoid an expensive memory allocation and copy by just taking ownership of its argument’s pointer. Additionally, it allows for the creation of classes which can be moved but not copied. This trait is used by unique_ptr to help ensure only a single instance of the class remains responsible for the managed pointer.
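For reference, this is roughly what the native C++11 form looks like. It is only a minimal sketch: the class owned_buffer and its members are hypothetical names chosen for illustration, not taken from any library.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical example type: owns a heap-allocated array of ints.
class owned_buffer
{
public:
    explicit owned_buffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~owned_buffer() { delete[] data_; }

    // Move constructor: take ownership of the pointer, so there is
    // no allocation and no element-wise copy.
    owned_buffer(owned_buffer &&other) noexcept
        : data_(other.data_), size_(other.size_)
    {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    // Move assignment: release our array, then take over the other one.
    owned_buffer &operator=(owned_buffer &&other) noexcept
    {
        if (this != &other) {
            delete[] data_;
            data_ = other.data_;
            size_ = other.size_;
            other.data_ = nullptr;
            other.size_ = 0;
        }
        return *this;
    }

    // Deleting the copy operations makes the type move-only.
    owned_buffer(owned_buffer const &) = delete;
    owned_buffer &operator=(owned_buffer const &) = delete;

    int *data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    int *data_;
    std::size_t size_;
};
```

A caller requests the move explicitly with std::move(), for example owned_buffer b(std::move(a));, after which a no longer owns the array.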
While only standardised in C++11, Boost.Move provides mechanisms to achieve similar results in C++03:
Boost.Move emulates C++0x move semantics in C++03 compilers and allows writing portable code that works optimally in C++03 and C++0x compilers.
I took an interest in understanding how Boost achieves this. Boost’s documentation on how to implement a “movable but non-copyable class” seemed like a good place to start. It points us towards two macros:
To write a movable but not copyable type in portable syntax, you need to follow these simple steps:
Put the following macro in the private section:
BOOST_MOVABLE_BUT_NOT_COPYABLE(classname)
Write a move constructor and a move assignment taking the parameter as
BOOST_RV_REF(classname)
BOOST_RV_REF is essentially Boost’s take on the rvalue reference. It is a macro defined in move/core.hpp:
#define BOOST_RV_REF(TYPE)\
    ::boost::rv< TYPE >&
with boost::rv being the class shown below (with simplified templating):
template <class T>
class rv : public T
{
    rv();
    ~rv() throw();
    rv(rv const &);
    void operator=(rv const &);
};
An instance of rv<T> is little more than a derived type of T: there are no added bells or whistles.
Its purpose is to provide a type which has an “is a” relationship with our class. When used as a function argument, it indicates that it’s acceptable to “move” resources from it.
BOOST_MOVABLE_BUT_NOT_COPYABLE (also in move/core.hpp) can more or less be split into the following two sections:
First, Boost declares both the copy constructor and the assignment operator private, preventing any unintentional copies.
private:
    TYPE(TYPE &);
    TYPE& operator=(TYPE &);
Boost then proceeds to define public conversion functions to (rather forcibly) allow conversion of our class, T, to a reference to boost::rv<T>.
public:
    inline operator boost::rv<TYPE>&()
    {
        return *reinterpret_cast<
            boost::rv<TYPE>*>(this);
    }
    inline operator const boost::rv<TYPE>&() const
    {
        return *reinterpret_cast<
            const boost::rv<TYPE>*>(this);
    }
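Putting the two sections together, a stand-alone imitation of the macro might look like the sketch below. The names my_rv, MY_MOVABLE_BUT_NOT_COPYABLE, handle and address_seen are all hypothetical, and this is a simplification under that assumption rather than Boost's actual definition.

```cpp
// Hypothetical stand-in for boost::rv (simplified).
template <class T>
class my_rv : public T
{
    my_rv();
    ~my_rv();
    my_rv(my_rv const &);
    void operator=(my_rv const &);
};

// Hypothetical imitation of the macro: private copy operations,
// followed by public conversions to my_rv<TYPE>&.
#define MY_MOVABLE_BUT_NOT_COPYABLE(TYPE)                         \
private:                                                          \
    TYPE(TYPE &);                                                 \
    TYPE &operator=(TYPE &);                                      \
                                                                  \
public:                                                           \
    operator my_rv<TYPE> &()                                      \
    {                                                             \
        return *reinterpret_cast<my_rv<TYPE> *>(this);            \
    }                                                             \
    operator const my_rv<TYPE> &() const                          \
    {                                                             \
        return *reinterpret_cast<const my_rv<TYPE> *>(this);      \
    }                                                             \
                                                                  \
private:

// Hypothetical class using the macro.
class handle
{
    MY_MOVABLE_BUT_NOT_COPYABLE(handle)

public:
    explicit handle(int id) : id_(id) {}
    int id() const { return id_; }

private:
    int id_;
};

// Accepts only the "movable" view of a handle. It merely reports the
// address it was given and does not access any members.
const void *address_seen(my_rv<handle> &h)
{
    return &h;
}
```

Calling address_seen(h) with a plain handle h compiles because of the conversion function, and the address reported is that of h itself: no new object is created, the same storage is simply viewed through a different type.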
There’s not much point in making a movable class without providing some way to allow a user to explicitly request a move. That is where the boost::move() function (defined in utility_core.hpp) comes into play. A cut-down version of this is:
template <class T>
rv<T>& move(T &x)
{
    return *reinterpret_cast<boost::rv<T>*>(
        boost::move_detail::addressof(x));
}
Much like the conversion functions above, this function will, for a given reference of type T, return a reference to boost::rv<T>.
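The cut-down move() above calls boost::move_detail::addressof(x) rather than writing &x. A helper like that is commonly used so that the real address is still obtained when a class overloads unary operator&. That this is the motivation here is an assumption, not something the snippet itself states. The sketch below shows the usual idiom; address_of and sneaky are hypothetical names and this is not Boost's source.

```cpp
// Hypothetical helper showing the common "address of" idiom: go through
// a char reference so that an overloaded operator& is never called.
template <class T>
T *address_of(T &v)
{
    return reinterpret_cast<T *>(
        &const_cast<char &>(reinterpret_cast<const volatile char &>(v)));
}

// Hypothetical type whose unary operator& hides the real address.
struct sneaky
{
    int payload;
    sneaky *operator&() { return 0; }
};
```

With a sneaky object s, the expression &s yields a null pointer, whereas address_of(s) still points at s.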
Now that we’ve seen the core building blocks Boost uses, we can put together a simple example of a movable class which doesn’t rely on Boost (so hopefully is easier to follow). This contrived example has the class move_only_class using an integer (named value) as an example of movable data. value will be set to -1 to indicate that a move has occurred.
#include <iostream>

#define TRACE(INFO) \
    std::cout << __FILE__ << ":" << __LINE__ \
              << "\t" << INFO << "\t" << value \
              << "\t" << this << std::endl;

/* Cast "move_only_class" to this when we want to
 * indicate that we want a "move" to occur. */
template <typename T> class allow_move : public T
{
public:
    allow_move();
    ~allow_move();
    allow_move(allow_move const &);
    allow_move &operator=(allow_move const &);
};

/* A noisy class which cannot be copied, only
 * moved */
class move_only_class
{
public:
    move_only_class(int arg) : value(arg)
    {
        TRACE("Constructor(int)");
    }
    move_only_class() : value(99)
    {
        TRACE("Constructor(default)");
    }
    ~move_only_class()
    {
        TRACE("Destructor");
    }
    operator allow_move<move_only_class> &()
    {
        TRACE("Conversion(allow_move)");
        return *reinterpret_cast<
            allow_move<move_only_class> *>(this);
    }
    operator allow_move<move_only_class> const &()
        const
    {
        TRACE("Conversion(allow_move const)");
        return *reinterpret_cast<
            const allow_move<move_only_class> *>(this);
    }
    move_only_class(
        allow_move<move_only_class> &other)
        : value(other.value)
    {
        other.value = -1;
        TRACE("Construction(allow_move)");
    }
    move_only_class &
    operator=(allow_move<move_only_class> &other)
    {
        value = other.value;
        other.value = -1;
        TRACE("Assignment(allow_move)");
        return *this;
    }
    int value;

private:
    move_only_class(move_only_class &);
    move_only_class &operator=(move_only_class &);
};

/* Call this to request a move */
allow_move<move_only_class> &
move_it(move_only_class &x)
{
    return *reinterpret_cast<
        allow_move<move_only_class> *>(&x);
}
The following demonstrates a move assignment:
move_only_class a(10);
move_only_class b(20);
b = move_it(a);
This generates the logs below. We can see the call to move_it(a) results in a call to the assignment operator that takes the argument of allow_move<move_only_class>, resulting in a move.
Constructor(int) 10 0x7ffdf7f9fb08
Constructor(int) 20 0x7ffdf7f9fb00
Assignment(allow_move) 10 0x7ffdf7f9fb00
Destructor 10 0x7ffdf7f9fb00
Destructor -1 0x7ffdf7f9fb08
If we try to assign directly from an rvalue it “just” works without a call to move_it():
move_only_class b(1);
b = move_only_class(100);
Looking at the debug output produced, the compiler has performed an implicit conversion to allow_move<move_only_class> for us:
Constructor(int) 1 0x7ffdbcbec748
Constructor(int) 100 0x7ffdbcbec740
Conversion(allow_move) 100 0x7ffdbcbec740
Assignment(allow_move) 100 0x7ffdbcbec748
Destructor -1 0x7ffdbcbec740
Destructor 100 0x7ffdbcbec748
However, if we try to copy from an lvalue without move_it() (here by copy-initialising b from a):
move_only_class a;
move_only_class b = a;
the compiler errors out with the following:
‘move_only_class::move_only_class(move_only_class&)’ is private within this context
I’ll admit this behaviour had me confused. I wasn’t sure why the compiler wouldn’t just perform the same implicit conversion for the lvalue that it did for the rvalue.
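The answer is that we cannot bind rvalues to a non-const (lvalue) reference. As I have not provided a constructor or assignment operator that takes a const argument, the private move_only_class & overloads are not viable for an rvalue, so the compiler falls back to the implicit conversion to allow_move<move_only_class>. An lvalue, on the other hand, binds directly to those private overloads. They are the best match, and access is only checked after overload resolution has picked them, hence the error.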
If the class were modified to also declare private copy and assignment functions that take a const parameter, like so:
private:
    move_only_class(move_only_class &);
    move_only_class &operator=(move_only_class &);
    move_only_class(move_only_class const &);
    move_only_class &operator=(move_only_class const &);
Then the compilation of the rvalue assignment will fail with a similar error to the lvalue case. The compiler is now trying to call the const assignment operator:
‘move_only_class& move_only_class::operator=(const move_only_class&)’ is private within this context
Edit: Upon re-reading I felt there was further clarification needed here, so I wrote a follow-up post.
The above example didn’t sit quite right with me. I’ve seen enough blog posts and conference talks to be aware that attempts at type punning, or anything resembling it, could well be Undefined Behaviour (UB). So, I’ve attempted to understand what the N3242 C++11 Working Draft (no C++03 draft was available) has to say on the matter.
Firstly, considering the implementation of move_it():
allow_move<move_only_class> &
move_it(move_only_class &x)
{
    return *reinterpret_cast<
        allow_move<move_only_class> *>(&x);
}
I don’t think there is any UB here. The (draft) standard looks to more or less cover this case between N3242 [expr.reinterpret.cast].7:
A pointer to an object can be explicitly converted to a pointer to a different object type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1. Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified.
and N3242 [expr.reinterpret.cast].11:
An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. That is, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). The result refers to the same object as the source lvalue, but with a different type. The result is an lvalue for an lvalue reference type or an rvalue reference to function type and an xvalue for an rvalue reference to object type. No temporary is created, no copy is made, and constructors (12.1) or conversion functions (12.3) are not called.
However, I think there’s UB when it comes to accessing the contents of the allow_move<move_only_class> reference, such as in the move constructor:
move_only_class(
    allow_move<move_only_class> &other)
    : value(other.value)
{
    other.value = -1;
    TRACE("Construction(allow_move)");
}
We can use reinterpret_cast to bind an object originally defined with type move_only_class to allow_move<move_only_class> &. However, the C++ object model appears to stipulate that accesses through that reference remain governed by the original object type. Here the original type was move_only_class, which is the type of what the standard calls the “most derived object”. The class allow_move<move_only_class> inherits from it but does not appear to meet the criteria to access its memory: for example, it is derived from the object’s type rather than being a base class of it.
The evidence to support this theory requires a trip down the following rabbit hole, starting with N3242 [basic.lval].10:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type similar (as defined in 4.4) to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- a char or unsigned char type.
Following this to the definition of dynamic type (N3242 [defns.dynamic.type]):
type of the most derived object (1.8) to which the glvalue denoted by a glvalue expression refers [ Example: if a pointer (8.3.1) p whose static type is “pointer to class B” is pointing to an object of class D, derived from B (Clause 10), the dynamic type of the expression *p is “D.” References (8.3.2) are treated similarly. — end example ]
… and on to determine what “most derived object” means from [intro.object].1:
An object is a region of storage. […] An object has a type (3.9). The term object type refers to the type with which the object is created.
… and [intro.object].2:
Objects can contain other objects, called subobjects. A subobject can be a member subobject (9.2), a base class subobject (Clause 10), or an array element. An object that is not a subobject of any other object is called a complete object
… and [intro.object].4:
If a complete object, a data member (9.2), or an array element is of class type, its type is considered the most derived class, to distinguish it from the class type of any base class subobject; an object of a most derived class type or of a non-class type is called a most derived object.
Additionally, the “aggregate” clause in [basic.lval].10 sounded like it might potentially permit us to access the object, until we read its definition in [dcl.init.aggr].1:
An aggregate is an array or a class (Clause 9) with no user-provided constructors (12.1), no brace-or-equal-initializers for non-static data members (9.2), no private or protected non-static data members (Clause 11), no base classes (Clause 10), and no virtual functions (10.3).
I think that this occurrence of UB could be avoided if we simply cast back to the original type before accessing value:
move_only_class(
    allow_move<move_only_class> &other)
{
    move_only_class &access =
        reinterpret_cast<move_only_class &>(other);
    value = access.value;
    access.value = -1;
    TRACE("Construction(allow_move)");
}
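The same cast would be needed in the move assignment operator. A self-contained sketch of both is shown below, using token as a hypothetical, trimmed-down stand-in for move_only_class (no tracing). Whether the cast back really sidesteps the UB rests on the interpretation of the standard given above, so treat this as an illustration rather than a verified fix.

```cpp
// Same shape as the allow_move template used earlier.
template <typename T> class allow_move : public T
{
public:
    allow_move();
    ~allow_move();
    allow_move(allow_move const &);
    allow_move &operator=(allow_move const &);
};

// Hypothetical trimmed-down class: movable, not copyable.
class token
{
public:
    explicit token(int arg) : value(arg) {}

    // Move constructor: cast back to the original type before any access.
    token(allow_move<token> &other)
    {
        token &source = reinterpret_cast<token &>(other);
        value = source.value;
        source.value = -1;
    }

    // Move assignment: same cast, plus a guard against self-assignment.
    token &operator=(allow_move<token> &other)
    {
        token &source = reinterpret_cast<token &>(other);
        if (&source != this) {
            value = source.value;
            source.value = -1;
        }
        return *this;
    }

    int value;

private:
    token(token &);
    token &operator=(token &);
};

// Call this to request a move.
allow_move<token> &move_it(token &x)
{
    return reinterpret_cast<allow_move<token> &>(x);
}

// Usage: returns the value that ended up in b, or 0 if a was not
// marked as moved-from.
int moved_value_after_assignment()
{
    token a(10);
    token b(20);
    b = move_it(a);
    return a.value == -1 ? b.value : 0;
}
```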
It’s worth noting that the Boost implementation of unique_ptr directly accesses the members of the BOOST_RV_REF class, as does this example provided in the Boost documentation. Admittedly, this makes me doubt my interpretation of this being UB in the first place. Even if it is, I suspect that no compiler is actually performing any unexpected optimisation on this, since it likely would have been detected before now.
Compilers: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 and clang (version 9.0.0)
Flags: -std=c++03 -pedantic -Wall -Wextra -O0 -g3 -fno-elide-constructors
Boost: 1.71.0