Compute how many samples have been improved, according to a minimum threshold or confidence interval, in a dataframe
I have the following dataframe:
ID     VAL1    VAL2
Q2241  0.3333  0.3353
Q2242  0.5     0.5
Q2243  0.3333  0.3333
Q2244  0.2137  0.4792
Q2245  0.1429  0.2
Q2246  0.5     0.5
Q2247  0.4167  0.6667
Q2248  1       1
Q2249  0.125   0.0909
Q2250  0.2     0.2
Q2251  0.325   0.2667
Q2252  0.1667  0.2
Q2253  0.3333  0.25
Q2254  0.45    0.8333
Q2255  0.3333  0.5
Q2256  1       1
Q2257  0.5     0.51
Q2258  0.3929  0.3333
Q2259  0.3611  0.625
Is there a way to correctly compute the number of samples (ID) where VAL2 is significantly higher or lower than VAL1 in a given dataframe?

I'm looking for something like a t-test: a measure that gives results like the following example:
Win  Tie  Loss
 64   36   137

where:
Win: number of IDs where VAL2 is higher than VAL1 with some confidence interval
Tie: number of IDs where VAL2 ~ VAL1 (no significant difference, within 0.0001 for example)
Loss: number of IDs where VAL2 is lower than VAL1 with some confidence interval
Tags: python, dataframe, statistics, difference
asked Nov 24 '18 at 10:28 by Belkacem Thiziri
1 Answer
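The answer below assumes the question's table is already loaded as a dataframe named df. A minimal sketch to build it with pandas (the inline-string setup and the read_csv call are an assumption for reproducibility, not part of the original answer):

import io
import pandas as pd

# Sample data copied verbatim from the question's table (assumed setup).
data = """\
ID VAL1 VAL2
Q2241 0.3333 0.3353
Q2242 0.5 0.5
Q2243 0.3333 0.3333
Q2244 0.2137 0.4792
Q2245 0.1429 0.2
Q2246 0.5 0.5
Q2247 0.4167 0.6667
Q2248 1 1
Q2249 0.125 0.0909
Q2250 0.2 0.2
Q2251 0.325 0.2667
Q2252 0.1667 0.2
Q2253 0.3333 0.25
Q2254 0.45 0.8333
Q2255 0.3333 0.5
Q2256 1 1
Q2257 0.5 0.51
Q2258 0.3929 0.3333
Q2259 0.3611 0.625
"""
# Whitespace-separated columns, as in the question.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")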
import pandas as pd

tol = 0.0001  # differences within this tolerance count as a tie

# Element-wise comparisons yield boolean Series; summing counts the Trues.
win = (df.VAL2 > (df.VAL1 + tol)).sum()
loss = (df.VAL2 < (df.VAL1 - tol)).sum()
tie = ((df.VAL1 - df.VAL2).abs() <= tol).sum()

result = pd.DataFrame([{'Win': win, 'Tie': tie, 'Loss': loss}])
print(result)
#    Win  Tie  Loss
# 0    9    6     4

answered Nov 24 '18 at 11:20 by Ghilas BELHADJ
Thanks @Ghilas BELHADJ, I already tried something like that, but I was wondering if there is some specific method in data science that enables such a statistic, some official function like a t-test?
– Belkacem Thiziri, Nov 24 '18 at 12:31

There is a list of software implementations at the bottom of the Wikipedia page. So basically, you can use scipy.stats.ttest_ind in Python.
– Ghilas BELHADJ, Nov 24 '18 at 12:37

The t-test evaluates the significance of the differences between two distributions, but it does not give the number of samples that are significantly different. A better solution would combine the significance probability given by the t-test with something else to compute the number of such samples. Do you think computing the t-test value for each line in my dataframe would make sense?
– Belkacem Thiziri, Nov 24 '18 at 12:51

Probably not, but I'll let you know if I find something.
– Ghilas BELHADJ, Nov 24 '18 at 13:38

Okay, thanks. I'll discuss that with my advisors and then leave a comment here.
– Belkacem Thiziri, Nov 24 '18 at 13:41
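As the comments note, scipy.stats.ttest_ind tests whether the two columns differ overall; it does not count per-ID wins. A minimal sketch of that call, using the df built above; the paired variant scipy.stats.ttest_rel is an added alternative here, on the assumption that VAL1 and VAL2 are measured on the same IDs:

from scipy import stats

# Independent two-sample t-test, as suggested in the comments.
t_ind, p_ind = stats.ttest_ind(df.VAL1, df.VAL2)

# Paired t-test: arguably a closer fit, since each ID contributes
# one VAL1 and one VAL2 measurement.
t_rel, p_rel = stats.ttest_rel(df.VAL1, df.VAL2)

print(f"independent: t={t_ind:.3f}, p={p_ind:.3f}")
print(f"paired:      t={t_rel:.3f}, p={p_rel:.3f}")

Either way the result is a single statistic and p-value for the whole pair of columns, which is why the threshold-based Win/Tie/Loss count above remains the per-ID answer.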