Pandas iloc wrong index causing problems with subtraction

I should start by saying that I am quite new to pandas and numpy (and machine learning in general).

I am trying to learn some basic machine learning algorithms and am doing linear regression. I have completed this problem using MATLAB, but wanted to try implementing it in Python, as that is a more practically used language. I am having a very difficult time doing basic matrix operations with these libraries, and I think it's down to a lack of understanding of how pandas is indexing the dataframe...

I have found several posts talking about the differences between iloc and ix and that ix is being deprecated so use iloc, but iloc is causing me loads of issues. I am simply trying to pull the first n-1 columns out of a dataframe into a new dataframe, then the final column into another dataframe to get my label values. Then I want to perform the cost function one time to see what my current cost is with theta = 0. Currently, my dataset has only one label - but I'd like to code as if I had more. Here is my code and my output:

import os

import numpy as np
import pandas as pd

path = os.path.join(os.getcwd(), 'ex1data1.txt')
data = pd.read_csv(path, header=None)

numRows = data.shape[0]
numCols = data.shape[1]

X = data.iloc[:, 0:numCols-1].copy()
theta = pd.DataFrame(np.zeros((X.shape[1], 1)))
y = data.iloc[:, -1:].copy()  # slice (-1:) so y stays a one-column DataFrame, as in the output below

# start computing cost: sum((X*theta - y).^2)
predictions = X.dot(theta)
print("predictions shape: {0}".format(predictions.shape))
print(predictions.head())
print("y shape: {0}".format(y.shape))
print(y.head())

errors = predictions.subtract(y)

print("errors shape: {0}".format(errors.shape))
print(errors.head())

output:

predictions shape: (97, 1)
     0
0  0.0
1  0.0
2  0.0
3  0.0
4  0.0
y shape: (97, 1)
         1
0  17.5920
1   9.1302
2  13.6620
3  11.8540
4   6.8233
errors shape: (97, 2)
    0   1
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN

I can see that y and predictions have the same shape, but when I display them it seems that y begins its column indexing at 1 (its original position in the first dataframe), while predictions has a column labelled 0. As a result, pandas aligns the two on column label during the subtraction and fills any missing values with NaN. As y has no column 0 values, they are all NaN, and as predictions has no column 1 values, they are all NaN, resulting in a 97x2 NaN matrix.
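A minimal sketch that reproduces this alignment behaviour on a toy frame. The three values are made up for illustration and stand in for ex1data1.txt; only the column labels (0 and 1) matter here:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the frames above (illustrative values, not the real dataset).
predictions = pd.DataFrame(np.zeros((3, 1)))        # single column labelled 0
y = pd.DataFrame({1: [17.592, 9.1302, 13.662]})     # single column labelled 1

# DataFrame arithmetic aligns on both row and column labels, so the result
# carries the union of the column labels (0 and 1). Neither label exists in
# both frames, so every cell ends up NaN.
errors = predictions.subtract(y)

print(errors.shape)
print(errors)
```

The shapes match, but the labels do not, and pandas arithmetic goes by labels rather than positions.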

If I use y = data.ix[:,-1:0] - the above code does the correct calculations. Output:

errors shape: (97, 1)
        0
0 -6.1101
1 -5.5277
2 -8.5186
3 -7.0032
4 -5.8598

But I am trying to stay away from ix, as it is said to be deprecated.

How do I tell pandas that the new matrix has a start column of 0, and why is this not the default behavior?

asked Nov 22 '18 at 20:23

Aserian

python pandas

1 Answer

It looks like the calculation you actually want is on the Series (the individual columns), so you should be able to do:

predictions[0].subtract(y[1])

to get the values you want. This looks confusing because your DataFrame columns are labelled with numbers: you are selecting the columns you want by label (0 from predictions and 1 from y) and subtracting one from the other.
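As a rough illustration of why this works, here is a small sketch on toy frames (the values are made up, and `pred_col` and `y_col` are names introduced only for this example). Selecting a single column returns a Series, and Series arithmetic aligns on the row index only, so the differing column labels no longer get in the way:

```python
import pandas as pd

# Toy frames with the same column labels as in the question (illustrative values).
predictions = pd.DataFrame({0: [0.0, 0.0, 0.0]})
y = pd.DataFrame({1: [17.592, 9.1302, 13.662]})

# Selecting by column label returns a Series; the label survives only as its name.
pred_col = predictions[0]
y_col = y[1]

# Series - Series aligns on the row index (0, 1, 2), which both share.
errors = pred_col.subtract(y_col)
print(errors)
```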

Alternatively, using iloc as you originally suggested, which gives you more matrix-style positional indexing, you could do this:

predictions.iloc[:, 0].subtract(y.iloc[:, 0])

because in each DataFrame you want all the rows and the first column.
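If you would rather keep y as a DataFrame (for example, to allow more than one label column later), two other options may be worth trying. This is only a sketch on toy data, not the one right way to do it: either relabel y's columns positionally so they line up with the predictions, or drop the labels entirely by converting to NumPy arrays. The three rows below are illustrative, and the cost follows the unnormalised sum of squares from the comment in the question:

```python
import numpy as np
import pandas as pd

# Toy data in the same layout as the question: features first, label last.
data = pd.DataFrame([[6.1101, 17.592], [5.5277, 9.1302], [8.5186, 13.662]])

X = data.iloc[:, :-1]            # every column except the last
y = data.iloc[:, -1:].copy()     # last column, kept as a DataFrame (label 1)
theta = pd.DataFrame(np.zeros((X.shape[1], 1)))

predictions = X.dot(theta)       # result columns come from theta: [0]

# Option 1: relabel y's columns 0..k-1 so they match the predictions.
y.columns = range(y.shape[1])
errors = predictions.subtract(y)

# Option 2: drop the labels and let NumPy subtract purely by position.
errors_np = predictions.to_numpy() - data.iloc[:, -1:].to_numpy()

# Sum of squared errors, as in the question's comment: sum((X*theta - y).^2)
cost = float((errors_np ** 2).sum())
print(errors.shape, errors_np.shape, cost)
```

Option 1 keeps the pandas row index around for later inspection; Option 2 behaves most like the MATLAB matrices you are used to.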

edited Nov 22 '18 at 20:43

answered Nov 22 '18 at 20:35

Sven Harris


Thank you very much for the help! I didn't realize that the columns, or column names rather, mattered. Is there a more succinct way to turn a matrix into two separate matrices? Or is the way that I am doing it acceptable?

– Aserian
Nov 22 '18 at 20:42

Yeah looks pretty acceptable overall

– Sven Harris
Nov 22 '18 at 20:52

Thank you for your help

– Aserian
Nov 22 '18 at 20:57

