pandas reading data from column in as float or int and not str despite dtype setting












I have an issue with pandas (0.23.4) on Python 3.7: the data is being read in as scientific notation instead of as plain strings, despite the dtype setting. Here is an example of the data that is being read in:



-------------------
codes
-------------------
001234544
00023455
123456789
A1253532
780E9000
00678E10


The problem comes with lines 5 and 6 of the above (780E9000 and 00678E10). I think that because they contain 'E' characters, they are being turned into scientific notation.
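
A quick way to check that suspicion is sketched below in plain Python, with no pandas involved. The helper name looks_like_number is hypothetical and only for illustration. It shows which of the sample codes parse as valid floats; it does not show what pandas or Excel actually did with the file.

```python
def looks_like_number(code: str) -> bool:
    """Return True if the string parses as a float (hypothetical helper)."""
    try:
        float(code)
    except ValueError:
        return False
    return True


# The sample codes from the table above.
codes = ["001234544", "00023455", "123456789", "A1253532", "780E9000", "00678E10"]

for code in codes:
    print(code, looks_like_number(code))

# Codes that contain an 'E' and still parse as a float are the ones
# that can be mistaken for scientific notation.
at_risk = [c for c in codes if "E" in c and looks_like_number(c)]
print(at_risk)
```

Only A1253532 fails to parse as a number here, so every other code can lose its leading zeros or its 'E' if something upstream treats the column as numeric.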



My reader is set up as follows:



import pandas as pd

accounts = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1', dtype=str)


Despite the dtype=str setting, it appears that pandas uses something called a "sniffer" that detects the data type automatically, so the values are being changed back to what I assume is float or int and then shown in scientific notation. One suggestion in another thread is to use the converters argument of read_csv, like the following:



pd.read_csv('my.csv', converters = {i: str for i in range(0, 100)})


I am curious whether this is a possible solution to my problem, but I also have no idea how long that range should be, as it changes often. Is there any way to query the length of the column and feed that as a variable into the range call?



It looks like I can do something like len(accounts.index), but I can't do this until after the reader has read the file, so something like the code below doesn't work:



# Fails with a NameError: gym_length is used before it is assigned
accounts = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1', converters={i: str for i in range(0, gym_length)})
gym_length = len(accounts.index)


The length check comes after the (I guess you call it) data reader, so obviously it doesn't work.
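
One point that may help here: the keys of the converters dict refer to columns (by position or by name), not to rows, so the number of rows should not matter. Below is a minimal sketch, assuming the data can be read with read_csv. The in-memory CSV text and the extra amount column are made up for illustration. Whether the same approach helps with an .xlsx file depends on how the cells were stored in the workbook, which is not known here.

```python
import io

import pandas as pd

# Made-up sample standing in for the real file (assumption for illustration).
csv_text = "codes,amount\n001234544,10\n780E9000,20\n00678E10,30\n"

# First pass: read only the header row to learn the column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)

# One converter per column, keyed by column name rather than by row count.
converters = {name: str for name in header.columns}

# Second pass: every cell goes through str, so nothing is parsed as a number.
accounts = pd.read_csv(io.StringIO(csv_text), converters=converters)
print(accounts["codes"].tolist())
```

Reading the header first avoids guessing an upper bound such as range(0, 100), and the dict stays correct if columns are added or removed later.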










  • I am afraid this is not reproducible without an example file. Copying your example data into LibreOffice, saving it as gym_accounts.xlsx, and then reading it into pandas with your code does not reproduce the problem. The data is read as str in that case, as expected.

    – 0range
    Nov 29 '18 at 22:25
















python-3.x pandas csv






asked Nov 26 '18 at 0:44, edited Nov 26 '18 at 1:19

Oscalation












