pandas reads column data as float or int instead of str despite dtype setting
I have an issue with pandas (0.23.4) on Python 3.7: data is read in as scientific notation instead of as a plain string, even though I set the dtype. Here is an example of the data being read in:
-------------------
codes
-------------------
001234544
00023455
123456789
A1253532
780E9000
00678E10
The problem is with rows 5 and 6 above (780E9000 and 00678E10): they contain an 'E', which I think causes them to be turned into scientific notation.
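A quick way to see why those two rows are at risk: both strings are valid floating-point literals, with `E` read as an exponent marker, so any step that infers numeric types can convert them. This minimal sketch uses only Python's built-in `float()`, not pandas' own parser, and the helper name `as_float_or_none` is hypothetical:

```python
codes = ["001234544", "00023455", "123456789", "A1253532", "780E9000", "00678E10"]

def as_float_or_none(text):
    """Return float(text) if the string is a valid float literal, else None."""
    try:
        return float(text)
    except ValueError:
        return None

# Map each original code to what a numeric conversion would turn it into.
converted = {code: as_float_or_none(code) for code in codes}
for code, value in converted.items():
    print(f"{code!r:>12} -> {value!r}")
```

Only `A1253532` survives as non-numeric. `00678E10` becomes 6.78 × 10¹², `780E9000` overflows to `inf`, and the all-digit codes lose their leading zeros, which is another reason to keep this column as text.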
My reader is set up as follows:
import pandas as pd
accounts = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1', dtype=str)
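For comparison, here is a minimal sketch of the same `dtype=str` setting with `read_csv` on an in-memory copy of the sample data. The `io.StringIO` buffer stands in for a real file, so this does not reproduce the Excel case. It only shows that `dtype=str` keeps every value as the original string when the reader receives text:

```python
import io
import pandas as pd

# In-memory stand-in for the file; the column name "codes" comes from the sample above.
csv_text = "codes\n001234544\n00023455\n123456789\nA1253532\n780E9000\n00678E10\n"

accounts = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(accounts["codes"].tolist())
```

If the `.xlsx` file still yields numbers, it may be worth checking whether those cells are stored as numbers in the workbook itself. `dtype=str` can only turn into strings whatever values the Excel reader hands over.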
Despite the dtype=str setting, it appears that pandas uses something called a "sniffer" that detects the data type automatically. The values are then changed back to what I assume is float or int and shown in scientific notation. A suggestion in another thread is to pass a converters argument to read_csv, like the following:
pd.read_csv('my.csv', converters = {i: str for i in range(0, 100)})
I am curious whether this could solve my problem, but I also have no idea how long that range should be, since it changes often. Is there a way to query the length of the column and pass it as a variable into that range call?
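The keys of `converters` select columns, by integer position or by label, not rows. That may resolve the range question: the range only needs to cover the columns you want to keep as text, not the number of rows. Here is a minimal sketch on the same in-memory sample, keyed both ways:

```python
import io
import pandas as pd

csv_text = "codes\n001234544\n00023455\n123456789\nA1253532\n780E9000\n00678E10\n"

# Key 0 means "the first column", regardless of how many rows the file has.
by_position = pd.read_csv(io.StringIO(csv_text), converters={0: str})
# The column label works as a key too.
by_label = pd.read_csv(io.StringIO(csv_text), converters={"codes": str})

print(by_position["codes"].tolist())
print(by_label["codes"].tolist())
```

`read_excel` also accepts a `converters` argument with the same kind of keys. Only the `read_csv` form is exercised here.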
It looks like I can use len(accounts.index), but I can't do this until after the reader has read the file, so something like the following doesn't work:
accounts = pd.read_excel('gym_accounts.xlsx', sheet_name='Sheet1', converters={i: str for i in range(0, gym_length)})  # NameError: gym_length is not defined yet
gym_length = len(accounts.index)
The length check comes after the reader call, so obviously it doesn't work.
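If you still want one converter per column, one way around the ordering problem is to read only the header first (`nrows=0`), take the column count from it, and then read the full file. This minimal sketch again uses an in-memory CSV and adds a hypothetical second column `member` so that there is more than one column. Recent versions of `read_excel` also accept `nrows`, but that variant is not shown here:

```python
import io
import pandas as pd

# Hypothetical two-column sample; "member" is an illustrative extra column.
csv_text = (
    "codes,member\n"
    "001234544,Ann\n"
    "780E9000,Bob\n"
    "00678E10,Cy\n"
)

# Pass 1: read just the header row to learn how many columns there are.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)
n_columns = len(header.columns)

# Pass 2: build one str converter per column position and read everything.
accounts = pd.read_csv(
    io.StringIO(csv_text),
    converters={i: str for i in range(n_columns)},
)
print(n_columns, accounts["codes"].tolist())
```

For a single `codes` column, `dtype=str` or `converters={'codes': str}` avoids the two passes entirely.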
python-3.x pandas csv
I am afraid this is not reproducible without an example file. Copying your example data into LibreOffice, saving it as gym_accounts.xlsx and reading it with your code does not reproduce the problem: the data is read as str, as expected.
– 0range
Nov 29 '18 at 22:25
edited Nov 26 '18 at 1:19 by Oscalation
asked Nov 26 '18 at 0:44 by Oscalation