C++ Vector

Introduction
This article looks into the internals of the libstdc++ implementation of std::vector. You can find it in your gcc installation, in a directory like this:

/usr/lib/gcc/i686-pc-linux-gnu/3.4.6/include/g++-v3/

Design
new_allocator ∆                       │                  allocator ∆                       │ _Vector_base ◊── _Vector_impl ∆    │  vector

Constructing a Vector
Let's look at an example:

vector V(2);

This vector ctor is called (from bits/stl_vector.h): explicit vector(size_type __n) : _Base(__n, allocator_type) { this->_M_impl._M_finish = std::uninitialized_fill_n(this->_M_impl._M_start, __n, value_type); } It calls the _Vector_base ctor - in the example the call would be: _Vector_base::_Vector_base(2, allocator ) Which is this ctor: _Vector_base(size_t __n, const allocator_type& __a) : _M_impl(__a) {       this->_M_impl._M_start = this->_M_allocate(__n); this->_M_impl._M_finish = this->_M_impl._M_start; this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n; } _M_impl is a struct _Vector_impl which is defined as     struct _Vector_impl : public _Alloc { _Tp*          _M_start; _Tp*          _M_finish; _Tp*          _M_end_of_storage; _Vector_impl (_Alloc const& __a) : _Alloc(__a), _M_start(0), _M_finish(0), _M_end_of_storage(0) { }     }; The _Vector_impl ctor calls the base copy ctor allocator ::allocator(allocator const &a) and sets the pointers to 0. So now the vector has an allocator and pointers defining data location, end of data, and end of allocated space.

Returning to      _Vector_base(size_t __n, const allocator_type& __a) : _M_impl(__a) {       this->_M_impl._M_start = this->_M_allocate(__n); this->_M_impl._M_finish = this->_M_impl._M_start; this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n; } The first statement allocates space for __n * sizeof(_Tp). As _M_impl inherits from allocator, _M_allocate may just call allocate on _M_impl: _Tp* _M_allocate(size_t __n) { return _M_impl.allocate(__n); } The standard allocator is defined in bits/allocator.h: template class allocator: public ___glibcxx_base_allocator<_Tp> Actually it just inherits from new_allocator, which is defined in ext/new_allocator.h: template class new_allocator {   public: ...     typedef _Tp*       pointer; ...     pointer allocate(size_type __n, const void* = 0) { return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp))); } This strange usage of the new operator is defined in the header file 'new': void* operator new(std::size_t) throw (std::bad_alloc); In our example, it just allocates sizeof(int) * 2 bytes, and casts the pointer to a int *.

Returning yet again to      _Vector_base(size_t __n, const allocator_type& __a) : _M_impl(__a) {       this->_M_impl._M_start = this->_M_allocate(__n); this->_M_impl._M_finish = this->_M_impl._M_start; this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n; } We initialize _M_finish = _M_start (signifying that the vector is empty) and indicates that we have allocated sizeof(int) * __n bytes by setting _M_end_of_storage = _M_start + __n (which is one-past-the-end of the allocated memory).

We are done with the _Vector_base ctor, and may return to the ctor of vector itself: explicit vector(size_type __n) : _Base(__n, allocator_type) { this->_M_impl._M_finish = std::uninitialized_fill_n(this->_M_impl._M_start, __n, value_type); } where we fill the newly allocated memory with a default value. Note that value_type actually is a call to the default ctor for our type, in the example it is a call to int which yields 0.

uninitialized_fill_n is from bits/stl_uninitialized.h: template inline _ForwardIterator uninitialized_fill_n(_ForwardIterator __first, _Size __n, const _Tp& __x) {     typedef typename iterator_traits<_ForwardIterator>::value_type _ValueType; typedef typename __type_traits<_ValueType>::is_POD_type _Is_POD; return std::__uninitialized_fill_n_aux(__first, __n, __x, _Is_POD); } Now, _Tp is Plain-Old-Data (POD), so this version of __uninitialized_fill_n_aux is called: template inline _ForwardIterator __uninitialized_fill_n_aux(_ForwardIterator __first, _Size __n,                              const _Tp& __x, __true_type) { return std::fill_n(__first, __n, __x); } From bits/stl_algobase.h: template _OutputIterator fill_n(_OutputIterator __first, _Size __n, const _Tp& __value) {     ...      for (__n > 0; --__n, ++__first) *__first = __value; return __first; } So we basically just set every value to the default value of the datatype, in the example 0. So in conclusion, the ctor: explicit vector(size_type __n) : _Base(__n, allocator_type) { this->_M_impl._M_finish = std::uninitialized_fill_n(this->_M_impl._M_start, __n, value_type); }


 * Allocates memory for __n * sizeof(type).
 * Set start pointer to a the beginning of the allocated memory.
 * Set end pointer = start pointer to signify that the vector is empty.
 * Set end_of_storage pointer = start + n (one past the end of the allocated memory).
 * Initializes the memory using the value of the default ctor of the type.

Vector Iterators
So, in our example Vector V(2); What would vector::iterator and vector::const_iterator be? Well, they are defined as: template > class vector : protected _Vector_base<_Tp, _Alloc> {     ...      typedef vector<_Tp, _Alloc>                       vector_type; public: typedef _Tp                                       value_type; typedef typename _Alloc::pointer                  pointer; typedef typename _Alloc::const_pointer            const_pointer; ...     typedef __gnu_cxx::__normal_iterator iterator; typedef __gnu_cxx::__normal_iterator const_iterator; ... allocator defines pointer and const_pointer: template class allocator: public ___glibcxx_base_allocator<_Tp> {     ...      typedef _Tp*       pointer; typedef const _Tp* const_pointer; ... So, in our example: iterator      is a __normal_iterator>, and const_iterator is a __normal_iterator> In bits/stl_iterator.h, we find __normal_iterator: template class __normal_iterator {   protected: _Iterator _M_current; public: ...     typedef typename iterator_traits<_Iterator>::value_type  value_type; typedef typename iterator_traits<_Iterator>::difference_type difference_type; typedef typename iterator_traits<_Iterator>::reference reference; typedef typename iterator_traits<_Iterator>::pointer  pointer; __normal_iterator : _M_current(_Iterator) { } ...     // Forward iterator requirements reference operator* const { return *_M_current; } pointer operator-> const { return _M_current; } __normal_iterator& operator++ { ++_M_current; return *this; } ... _M_current, which is the only variable of the iterator, is of type _Iterator, which in our example was a int*. So a vector ::iterator is just a (very) fancy int* underneath.

Destroying a Vector
From bits/stl_vector.h:     /** * The dtor only erases the elements, and note that if the * elements themselves are pointers, the pointed-to memory is       *  not touched in any way. Managing the pointer is the user's      *  responsibilty. */     ~vector { std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish); } bits/stl_construct defines _Destroy: template inline void _Destroy(_ForwardIterator __first, _ForwardIterator __last) {     typedef typename iterator_traits<_ForwardIterator>::value_type _Value_type; typedef typename __type_traits<_Value_type>::has_trivial_destructor _Has_trivial_destructor; std::__destroy_aux(__first, __last, _Has_trivial_destructor); } bits/stl_iterator_base_types.h defines iterator_traits<_ForwardIterator>::value_type template<typename _Tp> struct iterator_traits<_Tp*> {     typedef random_access_iterator_tag iterator_category; typedef _Tp                        value_type; typedef ptrdiff_t                  difference_type; typedef _Tp*                       pointer; typedef _Tp&                       reference; }; In our case, _ForwardIterator is an int*, and thus iterator_traits<int*>::value_type is an int To evaluate, whether _Value_type (int, that is) has a trivial destructor, we look in bits/type_traits.h: template<> struct __type_traits {   typedef __true_type    has_trivial_default_constructor; typedef __true_type   has_trivial_copy_constructor; typedef __true_type   has_trivial_assignment_operator; typedef __true_type   has_trivial_destructor; typedef __true_type   is_POD_type; }; It does.

So, to look at _Destroy from bits/stl_construct.h again: template<typename _ForwardIterator> inline void _Destroy(_ForwardIterator __first, _ForwardIterator __last) {     typedef typename iterator_traits<_ForwardIterator>::value_type _Value_type; typedef typename __type_traits<_Value_type>::has_trivial_destructor _Has_trivial_destructor; std::__destroy_aux(__first, __last, _Has_trivial_destructor); } Which then in our example calls: template<typename _ForwardIterator> inline void __destroy_aux(_ForwardIterator, _ForwardIterator, __true_type) { } Which does nothing. So this whole endeavour did nothing. Big Whoop. So when does the memory actually get deallocated? Well, our base class destructor also gets called: ~_Vector_base { _M_deallocate(this->_M_impl._M_start,                     this->_M_impl._M_end_of_storage - this->_M_impl._M_start); } Which deallocates (_M_end_of_storage - _M_start) bytes at _M_start using _M_deallocate: void _M_deallocate(_Tp* __p, size_t __n) { if (__p) _M_impl.deallocate(__p, __n); } Which calls the allocators deallocate method, as defined in ext/new_allocator.h:     // __p is not permitted to be a null pointer. void deallocate(pointer __p, size_type) { ::operator delete(__p); }

Simple Vector Operations
Now we can look at some simple vector operations: template<typename _Tp, typename _Alloc = allocator<_Tp> > class vector : protected _Vector_base<_Tp, _Alloc> {     ...      typedef typename _Alloc::reference                 reference; typedef __gnu_cxx::__normal_iterator<pointer, vector_type> iterator; typedef size_t                                    size_type; ...     iterator begin { return iterator (this->_M_impl._M_start); } iterator end  { return iterator (this->_M_impl._M_finish); } bool    empty const { return begin == end; } reference operator[](size_type __n) { return *(begin + __n); } size_type size const { return size_type(end - begin); } ... begin and end constructs a __normal_iterator, using this ctor: explicit __normal_iterator(const _Iterator& __i) : _M_current(__i) { } where the iterator value _M_current is just set to the _M_start and _M_finish pointers, respectively.

push_back
Let's look at a common operation, push_back. If _M_finish != _M_end_of_storage, meaning that there is available space in the vector, the new element is just placed at _M_finish, which is then incremented. void push_back(const value_type& __x) {       if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {           std::_Construct(this->_M_impl._M_finish, __x); ++this->_M_impl._M_finish; }       else _M_insert_aux(end, __x); } However, if there is no available space, _M_insert_aux is called, which is a little more complex. It is an insert function, which involves copying the range from the insert position to the end one index to the right, and inserting the new value afterwards. It is structured like this: if space is available at the end of the vector: insert new value else reallocate Where insert new value: construct new element at finish, copy value from finish-1 increment finish copy range [position;finish-2] one index to the right set position to new value As an example, we would like to insert the value D into the vector [A,B,C] at position 1 (where B is): sta        fin     eos ┌───┬───┬───┬───┬───┬───┐ │ A │ B │ C │  │   │░░░│ └───┴───┴───┴───┴───┴───┘      ∆       │     ┌───┐     │ D │ └───┘ construct new element at finish, copy value from finish-1, increment finish: sta            fin eos ┌───┬───┬───┬───┬───┬───┐ │ A │ B │ C │ C │  │░░░│ └───┴───┴───┴───┴───┴───┘ copy range [position;finish-2] one index to the right. This operation is obviously linear time in the length of the vector, so insertion is slow for vectors! But push_back never does this. sta    f-2 f-1 fin eos ┌───┬───┬───┬───┬───┬───┐ │ A │ B │ B │ C │  │░░░│ └───┴───┴───┴───┴───┴───┘ set position to new value (D): sta            fin eos ┌───┬───┬───┬───┬───┬───┐ │ A │ D │ B │ C │  │░░░│ └───┴───┴───┴───┴───┴───┘

Reallocation is structured like this reallocate: new size = 2 * old size new_start = allocate(new size) copy range [start;position] to new_start construct new element at new_start + position with new value increment new_finish copy range [position;finish] to new_finish destroy [start;finish] deallocate [start;end of storage] start = new_start finish = new_finish end_of_storage = new size Reallocation is also linear time in the number of elements, so remember to initialize your vectors to the right size from the beginning!

It looks like this: template<typename _Tp, typename _Alloc> void vector<_Tp,_Alloc>:: _M_insert_aux(iterator __position, const _Tp& __x) {     if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {       std::_Construct(this->_M_impl._M_finish, *(this->_M_impl._M_finish - 1)); ++this->_M_impl._M_finish; _Tp __x_copy = __x; std::copy_backward(__position,                          iterator(this->_M_impl._M_finish-2),                           iterator(this->_M_impl._M_finish-1)); *__position = __x_copy; }     else {       const size_type __old_size = size; const size_type __len = __old_size != 0 ? 2 * __old_size : 1; iterator __new_start(this->_M_allocate(__len)); iterator __new_finish(__new_start); try {           __new_finish = std::uninitialized_copy(iterator(this->_M_impl._M_start),                                                   __position,                                                   __new_start); std::_Construct(__new_finish.base, __x); ++__new_finish; __new_finish = std::uninitialized_copy(__position,                                                  iterator(this->_M_impl._M_finish),                                                   __new_finish); }       catch(...) {           std::_Destroy(__new_start,__new_finish); _M_deallocate(__new_start.base,__len); __throw_exception_again; }       std::_Destroy(begin, end); _M_deallocate(this->_M_impl._M_start,                     this->_M_impl._M_end_of_storage - this->_M_impl._M_start); this->_M_impl._M_start = __new_start.base; this->_M_impl._M_finish = __new_finish.base; this->_M_impl._M_end_of_storage = __new_start.base + __len; }   }