pandas reading column data in as float or int and not str despite dtype setting

I have an issue with pandas (0.23.4) on Python 3.7 where data is being read in as scientific notation instead of as a plain string, despite the dtype setting. Here is an example of the data being read in:

-------------------
codes
-------------------
001234544
00023455
123456789
A1253532
780E9000
00678E10

The problem comes with rows 5 and 6 of the above (780E9000 and 00678E10). I think that because they contain an 'E' character, they are being turned into scientific notation.
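To illustrate why the 'E' codes are ambiguous, here is a minimal sketch in plain Python (no pandas involved). It only shows that these strings are valid float literals. It does not establish whether Excel or pandas is doing the conversion in my file. The helper name parses_as_float is made up for this example.

```python
def parses_as_float(code: str) -> bool:
    """Return True if Python's float() accepts the string as a number."""
    try:
        float(code)
    except ValueError:
        return False
    return True


codes = ["001234544", "00023455", "123456789", "A1253532", "780E9000", "00678E10"]

# Every code except the one starting with a letter is a valid float literal.
numeric_like = [c for c in codes if parses_as_float(c)]

# "00678E10" is read as 678 * 10**10; the leading zeros and the 'E' are lost.
as_number = float("00678E10")

# "780E9000" is too large for a float and overflows to infinity.
overflowed = float("780E9000")

print(numeric_like)
print(as_number, overflowed)
```

So any step that guesses types from the text alone has a reason to treat rows 5 and 6 as numbers, while A1253532 can only ever be a string.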

My reader is set up as follows:

accounts = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1', dtype=str)

Despite the dtype=str setting, it appears that pandas uses something called a "sniffer" that detects the data type automatically, so the column is changed back to what I assume is float or int and then shown in scientific notation. One suggestion in another thread is to pass a converters argument to read_csv, like the following:

pd.read_csv('my.csv', converters = {i: str for i in range(0, 100)})

I am curious whether this is a possible solution to my problem, but I have no idea how long that range should be, as it changes often. Is there any way to query the length of the column and feed that as a variable into the range call?
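For what it is worth, the keys of the converters dict refer to columns (by position or by label), not to rows, so the number of rows should not matter. Here is a minimal sketch using read_csv on an in-memory string so it runs without a file. The visits column and the sample text are made up for the example. read_excel is documented to accept a converters argument as well, but I have not tested this against gym_accounts.xlsx.

```python
import io

import pandas as pd

# Made-up sample data; "visits" is a hypothetical second column.
csv_text = "codes,visits\n001234544,3\n00023455,5\n780E9000,1\n00678E10,2\n"

# One converter for the one column that must stay text, keyed by its label.
accounts = pd.read_csv(io.StringIO(csv_text), converters={"codes": str})

codes = accounts["codes"].tolist()
print(codes)
print(accounts.dtypes)
```

The same dict works no matter how many rows the file has, because it names the column once.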

It looks like I can do something like len(accounts.index), but I can't do this until after the reader has read the file, so something like the code below doesn't work:

accounts = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1', converters = {i: str for i in range(0, gym_length)})

gym_length = len(accounts.index)

The length check comes after the (I guess you call it) data reader, so obviously it doesn't work.
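If the goal is a converter for every column without hard-coding how many there are, one possible approach is two passes: read only the header first, then build the dict from the column labels. This is a sketch under assumptions. It uses read_csv with nrows=0 on made-up in-memory data, and the branch column is hypothetical. I am assuming read_excel can be used the same way, but that is not verified here.

```python
import io

import pandas as pd

# Made-up sample data; "branch" is a hypothetical second column.
csv_text = "codes,branch\n00678E10,007\n780E9000,012\n"

# First pass: read zero data rows, just to learn the column labels.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)
column_labels = list(header.columns)

# Second pass: one str converter per column, keyed by label.
accounts = pd.read_csv(
    io.StringIO(csv_text),
    converters={label: str for label in column_labels},
)

rows = accounts.values.tolist()
print(column_labels)
print(rows)
```

This avoids guessing an upper bound such as range(0, 100), at the cost of opening the file twice.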

edited Nov 26 '18 at 1:19

asked Nov 26 '18 at 0:44

Oscalation

23318

I am afraid this is not reproducible without an example file. Copying your example data into LibreOffice, saving it as gym_accounts.xlsx, and then using your code to read it into pandas does not reproduce the problem: the data is read as str in that case, as expected.

– 0range
Nov 29 '18 at 22:25

python-3.x pandas csv
