Measuring covariance on several rows

I'm new to Python and I'm trying to find my way by performing some calculations (I can do them easily in Excel, but now I want to know how to do them in Python).

One calculation is the covariance.
I have a simple example with 3 items that are sold, and I have the demand per item over 24 months.

Here is a snapshot of the Excel file:

[image: Items and their demand over 24 months]

The goal is to measure the covariance between all three items, that is, the covariance between items 1 and 2, 1 and 3, and 2 and 3. I also want to know how to do it for more than 3 items, let's say for a thousand items.

The calculations are as follows:

First I have to calculate the average per item. I already found out how to do this with the following code.

After importing the following:

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

I imported the file:

df = pd.read_excel(r"Directory\Covariance.xlsx")

And calculated the average per row:

x = df.iloc[:, 1:].values

df['avg'] = x.mean(axis=1)

This gives the file with an extra column, the average (avg):

[image: Items, their demand and the average]

The next step is to calculate the covariance between, for example, item 1 and item 2. Mathematically, this is done as follows:

(column "1" of item 1 - column "avg" of item 1) * (column "1" of item 2 - column "avg" of item 2). This has to be done for columns "1" to "24", so 24 times, which would add 24 columns to df.

After this, we take the average of these 24 columns, which gives the covariance between item 1 and item 2. This has to be done N-1 times per item, so in this simple case each item should have 2 covariance numbers (item 1 with items 2 and 3, item 2 with items 1 and 3, and item 3 with items 1 and 2).

So the first question is: how can I achieve this for these 3 items, so that the file has 2 covariance columns per item (for item 1, one column with the covariance between items 1 and 2 and a second column with the covariance between items 1 and 3, and so on)?

The second question is of course: what if I have 1000 items? Then I have 999 covariance numbers per item and thus 999 extra columns, but also 999*25 extra columns if I calculate it via the above method. How do I perform this calculation for every item as efficiently as possible?

asked Nov 23 '18 at 8:01 by Steven Pauly

edited Nov 23 '18 at 17:22 by desertnaut

python pandas statistics covariance


1 Answer

Pandas has a built-in function to calculate the covariance matrix, but first you need to make sure your DataFrame is in the correct format. The first column in your data actually contains the row labels, so let's put those in the index:

df = pd.read_excel(r"Directory\Covariance.xlsx", index_col=0)

Then you can also calculate the mean more easily, but don't put it back in your DataFrame yet!

avg = df.mean(axis=1)

To calculate the covariance matrix, just call .cov(). However, this calculates pairwise covariances between columns, so transpose the DataFrame first:

cov = df.T.cov()
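
As a sanity check, here is a minimal sketch that compares df.T.cov() with the manual calculation described in the question. The demand numbers are randomly generated (the Excel file is not reproduced here) and the item labels are placeholders, so treat it as an illustration under those assumptions. One thing to be aware of: pandas divides by N-1 (sample covariance) by default, while averaging the 24 products divides by N, so the two results differ by a factor of (N-1)/N.

```python
import numpy as np
import pandas as pd

# Hypothetical demand data: 3 items (rows) x 24 months (columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(50, 150, size=(3, 24)),
    index=["item 1", "item 2", "item 3"],
    columns=range(1, 25),
)

# Built-in: transpose so items become columns, then take pairwise covariances.
cov = df.T.cov()  # sample covariance, divides by N - 1

# Manual calculation for item 1 and item 2, as described in the question:
# subtract each item's mean, multiply month by month, then average the products.
dev = df.sub(df.mean(axis=1), axis=0)
manual = (dev.loc["item 1"] * dev.loc["item 2"]).mean()  # divides by N

n = df.shape[1]
print(cov.loc["item 1", "item 2"])  # built-in value
print(manual * n / (n - 1))         # manual value rescaled to N - 1
```

If you want the divide-by-N version for all pairs at once, np.cov(df.to_numpy(), bias=True) should give it, since np.cov treats each row as one variable by default.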

If you want, you can put everything together in one DataFrame:

df['avg'] = avg

df = df.join(cov, rsuffix='_cov')

Note: the covariance matrix also includes each item's covariance with itself, which is the variance of that item.
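
For the 1000-item case, the same approach should scale without creating any intermediate columns. The sketch below is one possible way to do it, assuming randomly generated demand data and placeholder names (demand, pairs, item_a, item_b are not from the question). It computes the full matrix with np.cov and then keeps each unordered pair once, which drops the diagonal (the variances) and the mirrored duplicates.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1000 items (rows) x 24 months (columns).
rng = np.random.default_rng(1)
demand = pd.DataFrame(
    rng.normal(100, 20, size=(1000, 24)),
    index=[f"item {i}" for i in range(1, 1001)],
    columns=range(1, 25),
)

# np.cov treats each row as one variable by default, so no transpose is needed.
cov = np.cov(demand.to_numpy())  # shape (1000, 1000), divides by N - 1

# Keep each unordered pair once: indices strictly above the diagonal.
i, j = np.triu_indices(len(demand), k=1)
pairs = pd.DataFrame(
    {
        "item_a": demand.index[i],
        "item_b": demand.index[j],
        "cov": cov[i, j],
    }
)
print(pairs.shape)  # (499500, 3)
```

A long table like this is usually easier to filter or sort than a DataFrame with 999 extra columns, but the wide matrix from df.T.cov() works just as well if you prefer to look up pairs by label.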

answered Nov 23 '18 at 8:25 by Rob

thanks! This works perfect!

– Steven Pauly
Nov 23 '18 at 13:35
