sizeof(integers)/sizeof(integers[0]) on vector<int> (x64)

sizeof(integers)/sizeof(integers[0])

On x86 this works just fine, but on x64 the vector seems to reserve a long long (8 bytes) for each element but only uses an int (4 bytes). I know we can just use ".size()" or divide by "sizeof(long long)", but just curious as to what you guys think about this?

Is there a sizeof() variant that reports reserved vs. actually used bytes?



#include <iostream>
#include <string>
#include<vector>
using namespace std;

int main()
{
	vector<int> integers;
	integers.push_back(1);
	integers.push_back(2);
	integers.push_back(3);
	integers.push_back(4);

	for (auto index : integers)
		cout << index << endl;
	cout << "*************" << endl;

	////On x86, vector uses int = 4 bytes
	//cout << integers.size() << endl;		//4
	//cout << integers.capacity() << endl;	//4
	//cout << sizeof(integers) << endl;		//16
	//cout << sizeof(int) << endl;			//4
	//cout << sizeof(integers[0]) << endl;	//4
	//cout << sizeof(integers)/sizeof(integers[0]) << endl;	//4


	//On x64, vector reserves long long = 8 bytes, but uses int = 4 bytes
	cout << integers.size() << endl;		//4
	cout << integers.capacity() << endl;	        //4
	cout << sizeof(integers) << endl;		//32
	cout << sizeof(int) << endl;			 //4
	cout << sizeof(long long) << endl;		//8
	cout << sizeof(integers[0]) << endl;	        //4
	cout << sizeof(integers) / sizeof(integers[0]) << endl;	//8 WRONG!!!
	cout << sizeof(integers) / sizeof(long long) << endl;	//4

	return 0;
}
vector<int> integers;
Your integers is not an array, it is a std::vector.

The vector object could be something like:
class V {
  int* data;
  size_t size;
  size_t capacity;
};

The size of a V object is not the same as the size of the dynamically allocated array that V.data points to.

Try again with vector<int> integers( 1000, 42 );
That integers has already 1000 ints, so what is sizeof(integers) now?
sizeof() with a class returns the size of the class. eg sizeof(integers) returns the size of the class vector. This has no relation to the number of items stored in the vector as the data is stored in dynamic memory outside of the class. The class only holds a pointer to this memory. Your sums just happen to provide the expected answer for x86. If you add some more items to the vector you'll find that sizeof() stays the same. sizeof() is evaluated at compile time so has to be known then.
Using Visual Studio the situation is even "worse" than you think. The output for x86:
4
4
12
4
8
4
3
1

x64:
4
4
24
4
8
4
6
3

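sizeof is an operator inherited from C, not a function, and it is evaluated at compile time. Using it in C++ code is not going to work as expected when you are trying to determine how many bytes a container's elements occupy.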

On x64, vector reserves long long

No, it doesn't. What you are seeing is the byte cost of the vector's internal use of pointers to contain the data along with the overhead for working with dynamic memory allocation:
#include <iostream>
#include <vector>

int main( )
{
   std::cout << "Size of int:  " << sizeof(int) << '\n';
   std::cout << "Size of int*: " << sizeof(int*) << "\n\n";

   std::vector<int> vec;
   std::cout << "Size of vec:  " << sizeof(vec) << '\n';
}
And just for sheets and giggles here is that above code snippet using C++20 (also works with C++23):
import <format>;    // std::format lives here
import <iostream>;
import <vector>;

int main( )
{
   std::cout << std::format("Size of int:  {}", sizeof(int)) << '\n';
   std::cout << std::format("Size of int*: {}\n\n", sizeof(int*));

   std::vector<int> vec;
   std::cout << std::format("Size of vec:  {}\n", sizeof(vec));
}

Going full on hog, C++23:
import std;

int main( )
{
   std::println("Size of int:  {}", sizeof(int));
   std::println("Size of int*: {}\n", sizeof(int*));

   std::vector<int> vec;
   std::println("Size of vec:  {}", sizeof(vec));
}

I gotta ask, what compiler are you using? It doesn't seem to be VS.
Glad I asked, thanks!

Yes, 32 bytes for 1000 elements as well because it is an object. So when you "integers[0]", this is also referring to the object, but probably is an overloaded operator on their end that returns a reference to the memory location and we can interrogate the size. But our "integers" obj has no return on the constructor, because there aren't any return on constructors and so represents the size of the obj.

In the "integers" obj they are probably using some pointer variable that points to the memory locations, so regardless if we increase the size of our vector it will never be reflected in the size of the obj. That pointer will probably be 8 bytes of the 32 total bytes, maybe even some VFT's in there too.

It makes sense now.

On this next one when you "*v2.begin()" this works fine and prints out 11, because this returns a pointer to the 1st element [0]. When I go to copy it over it is as if it points to the element before [0] since I have to add +5 instead of +4. Why, just how it works?

In addition, is there a way to get the address of "v2.begin()" rather than to "&v2[0]"?

	//vector<int> v2(v.begin(), v.begin() + 4);	//Copy 1st 5 elements, NOT WORK! 
	vector<int> v2(v.begin(), v.begin() + 5);	//Copy 1st 5 elements, WORKS!  


#include <iostream>
#include <vector>
using namespace std;

int main()
{
	vector<int> v{ 11,22,33,44,55,66,77,88,99 };
	//vector<int> v2(v.begin(), v.begin() + 4);	//Copy 1st 5 elements, NOT WORK! 
	vector<int> v2(v.begin(), v.begin() + 5);	//Copy 1st 5 elements, WORKS! 
	for (auto index : v2)
		cout << index << endl;

	//cout << v2.begin() << endl;				//NOT WORK!
	//cout << (unsigned int*)v2.begin() << endl;//NOT WORK!
	//cout << &v2.begin() << endl;				//NOT WORK!

	cout << &v2[0] << endl;						//000001E7ED8A0AC0
	cout << &v2[1] << endl;						//000001E7ED8A0AC4
	cout << *v2.begin() << endl;				//11
	cout << *(v2.begin() + 1) << endl;			//22
	cout << *(v2.begin() + 2) << endl;			//33

	return 0;
}
With initializer lists, range-based for loops (C++11) and deduction guides (C++17) creating a vector and accessing all the elements one after the other can be a lot less typing:
#include <iostream>
#include <vector>

int main( )
{
   // https://en.cppreference.com/w/cpp/utility/initializer_list
   // https://en.cppreference.com/w/cpp/container/vector/deduction_guides
   std::vector vec { 1, 2, 3, 4 };

   // https://en.cppreference.com/w/cpp/language/range-for
   for ( const auto& itr : vec )
   {
      std::cout << itr << ' ';
   }
   std::cout << '\n';
}

You can also use a vector's iterators to craft a for loop. Replace the for block with this:
for ( auto itr { vec.cbegin( ) }; itr != vec.cend( ); ++itr )
{
   std::cout << *itr << ' ';  // <----- notice the dereference operator!!!!!
}

auto tells the compiler to deduce the type of a vector's iterator so you don't have to remember what it is and potentially mistype it.

FYI, a vector's iterator is treated as if it were a pointer, that is why you need the dereference operator to access the vector's contents.

Now if you like to type everything explicitly here's the for loop with the explicit iterator type used:
for ( std::vector<int>::const_iterator itr { vec.cbegin( ) }; itr != vec.cend( ); ++itr )

Personally I use the range-based for loop as much as possible, with the auto deduced iterator type second. I rarely use the full iterator type, too easy to make a mistake. I also treat the vector's elements as const when not purposely modifying them. That way if by accident some code does try to change the contents the compiler errors out the offending code.

Yes, I still use the old school for loop from time to time:
   for ( size_t itr { }; itr < vec.size( ); ++itr )
   {
      std::cout << vec[itr] << ' ';
   }
vec[x] is the vector's operator[], to manually access a specified element. It returns a reference to the element, the contents stored at that location.

Fair warning, there is NO bounds checking with operator[]. If'n you have a vector of 5 elements and try to access say vec[100] that is out-of-bounds. Oooops!

The at() member function does bounds checking and throws std::out_of_range on a bad index. Some compilers in debug mode have operator[] do bounds checking too. Don't rely on that.

As I said above, a vector's iterator acts as if it is a pointer:
   // have to dereference a vector iterator as you do a pointer.
   std::cout << *vec.cbegin( ) << '\n';

There are some operations you can't do on an iterator that can be done on a pointer, such as taking the address of the iterator. Iterators model pointers, but they are not pointers.

Using "pointer math" on an iterator can still go out of bounds.

Regarding "using namespace std;"....just don't use it.

https://stackoverflow.com/questions/1452721/whats-the-problem-with-using-namespace-std#1452738

It was introduced to ease the transition of older legacy C++ code when C++ became ISO standardized. It was never meant to be used as a crutch for less typing.

https://isocpp.org/wiki/faq/coding-standards#using-namespace-std

"using namespace std;" IMO is at best lazy, and depending on where you use it can cause some major bad ju-ju. I make enough mistakes with my code that I don't need to introduce more that can be stopped by fully qualifying a namespace.

I personally always qualify what namespace I'm using when writing new code. At first it was "a pain" to remember to type std::, but after a while it became automatic. I want to use cout I type std::cout without thinking about it.
Thanks George. I actually know all the code you typed, I memorized them from the book and can code them now without looking.

Problem that I am sure I will have is when I stray away from the samples of the book and try things on my own, as I have been doing little by little.

I have heard about the namespace issues, just using it temporary so I can concentrate on learning tasks. When I write a real program I will transition to it and other requirements, such as error checking.

So you can't get the address of the pointer "vec.cbegin( )", OK but you can send it to an iterator.

Microsoft Visual Studio Community 2022 (64-bit) - Current
Version 17.3.5
Why do you need/want to know the address of the iterator? You should have zero need unless you are mashing up a new container.

Since a stdlib vector is doing all the memory management for you, including moving the elements around as needed, knowing the address of an iterator itself is kinda "there be dragons here" territory.

Now, obtaining the addresses of the contained elements, that is doable.

FYI, the current VS version is 17.8.6, you really should consider updating that if you can.

https://learn.microsoft.com/en-us/visualstudio/releases/2022/release-notes#17.8.6

Not that I am doing anything with it now. Since I am on the subject wondered if there was a way of getting it. Maybe someone has a legacy code that does something other worldly but it uses hex addressing as the interface. Good to know in case and I just put it in my notes for future reference.

Do you have an answer to this one, "when you "*v2.begin()" this works fine and prints out 11, because this returns a pointer to the 1st element [0]. When I go to copy it over it is as if it points to the element before [0] since I have to add +5 instead of +4. Why, just how it works?"

	//vector<int> v2(v.begin(), v.begin() + 4);	//Copy 1st 5 elements, NOT WORK! 
	vector<int> v2(v.begin(), v.begin() + 5);	//Copy 1st 5 elements, WORKS!  
You should explain the "NOT WORK".

#include <iostream>
#include <vector>

int main()
{
    using std::vector;
    vector<int> v{ 11,22,33,44,55,66,77,88,99 };
    vector<int> v2(v.begin(), v.begin() + 4);
    for (auto index : v2)
        std::cout << ' ' << index;
    std::cout << '\n';
    vector<int> v3(v.begin(), v.begin() + 5);
    for (auto index : v3)
        std::cout << ' ' << index;
    std::cout << '\n';
}

 11 22 33 44
 11 22 33 44 55

The v2 has 4 elements. The v3 has 5 elements. What in that is "NOT WORK"? (Ahh, it is all _fun_.)


See range constructor in https://cplusplus.com/reference/vector/vector/vector/

(3) range constructor
Constructs a container with as many elements as the range [first,last), with each element constructed from its corresponding element in that range, in the same order.

That is, starting from first, up to but not including last, just like in:
for ( int i=0; i<4; ++i )
  std::cout << i << '\n';

that prints 4 values (but not '4').
Sorry, I thought it was obvious from the comment "//Copy 1st 5 elements, NOT WORK!", which produced only 4 elements. I know there is an offset there somewhere. I thought the answer was going to be something similar to the deque, where you insert an element at [0] but the pointer is really pointing to the one before it in order to insert on top.

I was going to gripe about why it was not consistent with the other string and const char* copy constructors, but after reading "but not the element pointed by last" it makes more sense why they chose the offset. They are keeping it consistent with the containers so you can simply perform the beginning to end copy and not have the ".end()" copied over.

 
v2(v1.begin(), v1.end());





first, last
Input iterators to the initial and final positions in a range. The range used is [first,last), which includes all the elements between first and last, including the element pointed by first but not the element pointed by last.
They are keeping it consistent with the containers so you can simply perform the beginning to end copy and not have the ".end()" copied over.

Right, ranges include the beginning and exclude the end, for the same reason that for loops use < not <=. Dijkstra explains:
https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

That pointer will probably be 8 bytes of the 32 total bytes, maybe even some VFT's in there too.

Virtual functions are rare in the standard library - there's only a few dozen at most. The extra 8 bytes belong to the vector's allocator, but only in debug mode. The three mainstream implementations inherit their allocators, so they benefit from empty-base optimization to save some bytes if that optimization is enabled.
Thanks mbozzi and thanks to all for your aid.

That is funny, the entire time since I first learned sizeof(), I have been using it incorrectly on strings. That explains another mystery I had, where at times it would show 32 bytes and other times 40 bytes and I thought it was padding or something.


myString.size() and myString.length() both report the same numbers. Will there ever be a time when the two may have different values? The string object seems to hide the null terminator '\0' from the user and not include it in the size.

cout << sizeof(int) << endl;
cout << sizeof(string) << endl;
.begin() doesn't return a pointer to an address. It returns an iterator which is something different. if you really want a pointer to the first element then there is .data().

.end() is an iterator to 1 past the last element so that <, != etc works as expected. If there are no elements then .begin() equals .end(). Consider:

const std::vector v {1, 2, 3, 4, 5};

for (size_t i {}; i < v.size(); ++i)
     std::cout << v[i] << '\n';


const std::vector v {1, 2, 3, 4, 5};

for (auto itr {v.begin()}; itr < v.end(); ++itr)
    std::cout << *itr << '\n';


Note that < is only available for random-access iterators (e.g. vector's). For other iterator categories (e.g. those of std::list) != must be used (!= also works with random-access iterators).
myString.size() and myString.length() both report the same numbers


This is as per design. The final \0 is not considered part of the string, although .c_str() will always return a null-terminated string. A std::string can contain \0 characters within the string, and those are considered part of the string.

sizeof(string) - like sizeof(vector) - returns the number of bytes used for the class and NOT the number of stored characters (which, as with vector, are generally stored in memory external to the class). No matter the number of chars held, sizeof(string) will always return the same value. Note that this value can change between different compilers/versions/x86/x64 etc and its value (like sizeof() for other classes) shouldn't be assumed.
SubZeroWins wrote:
myString.size() and myString.length() both report the same numbers. Will there ever be a time when the two may have different values?

They will always return the same value.

SubZeroWins wrote:
The string object seems to hide the null terminator '\0' from the user and not include it in the size.

It's consistent with how strlen works.
https://en.cppreference.com/w/cpp/string/byte/strlen
The vector::begin() returns a random access iterator.
Random access iterators do have iterator_type op+ ( iterator_type, int )
See https://cplusplus.com/reference/iterator/RandomAccessIterator/

vector<int> v2(v.begin(), v.begin() + 5);
// does about same as more verbose
auto first = v.begin();
auto last = first + 5;
vector<int> v2(first, last);


Consequently, v.begin() and v.begin() + 0 point to same element in the container.


You could also read Iterator Validity from https://cplusplus.com/reference/vector/vector/resize/
Much obliged, my questions have been answered and it all makes sense.
There is still much to read for new info and to expand.