Python: incorrect size with getsizeof() and .nbytes with nested lists



























I apologise if this is a duplicate issue, but I've been having some trouble with .nbytes and sys.getsizeof().



In particular, I have a list which contains NumPy arrays; each array is a 3D representation of an image (row, column, RGB), and each of these images has different dimensions.
There are over 4000 images, and this number may increase in the future, as I plan to use them for machine learning.



When I use .nbytes on one image, I get the correct size, but when I try to evaluate the whole lot, I get an incorrect size:



# size of image 1 in bytes
print("size of first image: %d bytes" % images[0].nbytes)

# size of all images in bytes
print("total size of all images: %d bytes" % images.nbytes)


Result:



size of first image: 60066 bytes

total size of all images: 36600 bytes


Are the only ways around this to either loop through all the images or change to a monstrous 4D array instead of a list of 3D arrays? Is there another function which better evaluates size for this kind of nested setup?



I'm running Python 3.6.7.










python-3.x numpy size

asked Nov 25 '18 at 11:36 by Armand Bernard; edited Nov 26 '18 at 1:08

























  • Focus on shape and dtype. The other measures don't help you understand. And don't give us a massive display of the data.

    – hpaulj
    Nov 25 '18 at 15:05











    Many, if not all, machine learning tools assume the images have the same shape. They will raise errors if you try to use a list or an object-dtype array with diverse shapes. Both lists and object arrays contain pointers to arrays stored elsewhere in memory, so any size measure of the container just sees the pointers (e.g. 8-byte integers).

    – hpaulj
    Nov 25 '18 at 17:29
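
A minimal sketch of the pointer effect described in the comment above (the shapes here are made up for illustration). Note that 36600 bytes is exactly 4575 × 8, i.e. one 8-byte pointer per image on a 64-bit build, consistent with "over 4000 images":

import sys
import numpy as np

# two dummy "images" with different shapes, as in the question
images = [np.zeros((50, 40, 3), dtype=np.uint8),
          np.zeros((100, 80, 3), dtype=np.uint8)]

# getsizeof() sees only the list object: a header plus one 8-byte pointer per item
print(sys.getsizeof(images))          # small, and independent of the pixel data

# the pixel buffers live elsewhere and must be summed explicitly
print(sum(a.nbytes for a in images))  # 6000 + 24000 = 30000 bytes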













  • @hpaulj I would have avoided including that data but someone asked for it. I'll remove it now.

    – Armand Bernard
    Nov 26 '18 at 1:08











  • @hpaulj as it happens I guess that second comment of yours answers my question as to 'why' it is happening, so thanks for that. I understand that better now. I'm not going to close this yet as I'm looking for solutions too.

    – Armand Bernard
    Nov 26 '18 at 1:15
















1 Answer
Try running images.dtype. What does it return? If it's dtype('O'), that explains your problem: images is not a list, but a NumPy array of dtype object, which is generally a Bad Idea™️. Technically, it's a 1D array holding a bunch of 3D arrays.



NumPy arrays are best suited to numerical data. They're flexible enough to hold arbitrary Python objects, but doing so greatly impairs both their functionality and their efficiency. Unless you have a clear reason in mind, you should generally just use a plain Python list in these situations.



You may actually be best off converting images to a 4D array, as this is the only way that images.nbytes will work correctly. You can't do this if your images are all different sizes, but if they all share the same shape (x, y, z) it's pretty straightforward:



images = np.array([a for a in images])  # packs n same-shape arrays into one (n, x, y, z) array


Now images.shape will be (n, x, y, z), where n is the total number of images. You can access the 3D array that represents the ith image by just indexing images:



image_i = images[i]


Alternatively, you can convert images to a normal Python list:



images = images.tolist()  # note: the ndarray method is tolist(), not to_list()


If you don't want to bother with any of those conversions, you can always get the size of all the subarrays via iteration:



totalsize = sum(arr.nbytes for arr in images)
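
If you also want to count the container itself, a rough sketch: the list's own footprint is just its pointer table (8 bytes per entry on 64-bit CPython), which is negligible next to thousands of image buffers.

import sys

total = sys.getsizeof(images) + sum(arr.nbytes for arr in images)
print("total size of all images: %d bytes" % total)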





answered Nov 25 '18 at 13:18 by tel; edited Nov 25 '18 at 13:36


























  • images.dtype returns "object", which I'm guessing is what you said. Even though I created it as a list, I guess it changed type when I started importing arrays.

    – Armand Bernard
    Nov 26 '18 at 1:28











  • Thanks for your answer; what you propose with making it a 4D array does theoretically pose a problem, though. If I import an image larger than the previous ones into the 4D array, I'd probably get an error and have to pad the previous images so it all fits. I'd then have to reverse this process every time I access an image. That seems like a huge hassle just so I can gauge the 'weight' of my dataset.

    – Armand Bernard
    Nov 26 '18 at 1:39










