Numpy performance differences depending on numerical values
I found a strange performance difference while evaluating an expression in Numpy.
I executed the following code:
import numpy as np
myarr = np.random.uniform(-1,1,[1100,1100])
and then
%timeit np.exp( - 0.5 * (myarr / 0.001)**2 )
>> 184 ms ± 301 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
and
%timeit np.exp( - 0.5 * (myarr / 0.1)**2 )
>> 12.3 ms ± 34.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
That's almost a 15x speedup in the second case! Note that the only difference is the divisor: 0.001 versus 0.1.
What's the reason for this behaviour? Can I change something to make the first calculation as fast as the second?
python performance numpy
asked Nov 21 '18 at 15:36 by Ethunxxx
1
OK, on Windows, NumPy 1.14.3, Python 3.6.0, I see 97.7 ms vs 47.7 ms.
– jpp
Nov 21 '18 at 15:53
3
On my system, exp of large (negative) numbers is slower: exp(-1) is faster than exp(-1000). So it probably comes down to some slower convergence of the exp algorithm with large arguments.
– Brenlla
Nov 21 '18 at 15:55
1
@MattMessersmith Reasonable explanation, but nope. exp(1) is still much faster than exp(1000).
– Brenlla
Nov 21 '18 at 16:23
2
My first guess (based on the title) was that there are some denormalized numbers involved – see stackoverflow.com/questions/36781881/… I didn't verify this in depth for the specific numpy/python setup, but they can be awfully slow...
– Marco13
Nov 21 '18 at 19:53
2
@Marco13, yes, in fact, exp(-708) is a normal float, and exp(-709) is denormal, and that's where I see (on Mac OS X) a big jump in execution time. Underflow to zero doesn't occur until about exp(-746).
– Warren Weckesser
Nov 21 '18 at 20:48
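A quick way to reproduce the boundary described in the last comment, as a minimal sketch (timings are machine-dependent, but on many x86 CPUs the denormal case is dramatically slower):

import timeit
import numpy as np

# exp(-708) is still a normal double; exp(-709) falls into the denormal range.
x_normal = np.full(1000000, -708.0)
x_denorm = np.full(1000000, -709.0)

print(timeit.timeit(lambda: np.exp(x_normal), number=10))  # fast
print(timeit.timeit(lambda: np.exp(x_denorm), number=10))  # typically much slower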
2 Answers
Use Intel SVML

I don't have a working numexpr with Intel SVML, but numexpr with working SVML should perform as well as Numba. The Numba benchmarks below show much the same behaviour without SVML, but much better performance with SVML.

Code

import numpy as np
import numba as nb

myarr = np.random.uniform(-1, 1, [1100, 1100])

@nb.njit(error_model="numpy", parallel=True)
def func(arr, div):
    # Use the function argument `arr`, not the global `myarr`,
    # so the compiled function works for any input array.
    return np.exp(-0.5 * (arr / div)**2)

Timings

#Core i7 4771
#Windows 7 x64
#Anaconda Python 3.5.5
#Numba 0.41 (compilation overhead excluded)
func(myarr, 0.1)   -> 3.6 ms
func(myarr, 0.001) -> 3.8 ms

#Numba (set NUMBA_DISABLE_INTEL_SVML=1), parallel=True
func(myarr, 0.1)   -> 5.19 ms
func(myarr, 0.001) -> 12.0 ms

#Numba (set NUMBA_DISABLE_INTEL_SVML=1), parallel=False
func(myarr, 0.1)   -> 16.7 ms
func(myarr, 0.001) -> 63.2 ms

#Numpy (1.13.3), set OMP_NUM_THREADS=4
np.exp(-0.5 * (myarr / 0.001)**2) -> 70.82 ms
np.exp(-0.5 * (myarr / 0.1)**2)   -> 12.58 ms

#Numpy (1.13.3), set OMP_NUM_THREADS=1
np.exp(-0.5 * (myarr / 0.001)**2) -> 189.4 ms
np.exp(-0.5 * (myarr / 0.1)**2)   -> 17.4 ms

#Numexpr (2.6.8), no SVML, parallel
ne.evaluate("exp( - 0.5 * (myarr / 0.001)**2 )") -> 17.2 ms
ne.evaluate("exp( - 0.5 * (myarr / 0.1)**2 )")   -> 4.38 ms

#Numexpr (2.6.8), no SVML, single threaded
ne.evaluate("exp( - 0.5 * (myarr / 0.001)**2 )") -> 50.85 ms
ne.evaluate("exp( - 0.5 * (myarr / 0.1)**2 )")   -> 13.9 ms
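For timing the Numba function outside IPython (where %timeit is unavailable), a small sketch using the standard-library timeit; the warm-up call is excluded because the first call triggers JIT compilation:

import timeit

func(myarr, 0.1)  # warm-up: triggers JIT compilation

for div in (0.1, 0.001):
    t = timeit.timeit(lambda: func(myarr, div), number=100) / 100
    print("div={}: {:.2f} ms".format(div, t * 1e3))

Whether Numba actually picked up SVML on a given machine can be checked with "numba -s", which prints an SVML information section.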
edited Nov 22 '18 at 13:35; answered Nov 22 '18 at 11:25 by max9111
Dividing by 0.001 makes the exponent very large in magnitude, so np.exp may produce denormalised numbers, which slow down computations.

You may like to disable denormalized numbers using the daz library:

import daz
daz.set_daz()

More info: x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ):

To avoid serialization and performance issues due to denormals and underflow numbers, use the SSE and SSE2 instructions to set Flush-to-Zero and Denormals-Are-Zero modes within the hardware to enable highest performance for floating-point applications.

Note that in 64-bit mode floating-point computations use SSE instructions, not x87.
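A minimal sketch of how this might be applied to the benchmark from the question (assuming the daz package from PyPI; daz.set_ftz() is assumed here as the companion call that flushes denormal results to zero):

import numpy as np
import daz

myarr = np.random.uniform(-1, 1, [1100, 1100])

daz.set_daz()  # Denormals-Are-Zero: treat denormal inputs as zero
daz.set_ftz()  # Flush-To-Zero: round denormal results to zero (assumed API)

# With DAZ/FTZ enabled, the denormal outputs of the 0.001 case are
# replaced by 0.0, so both expressions should run at comparable speed
# (at the cost of losing the tiny denormal values).
out = np.exp(-0.5 * (myarr / 0.001)**2)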
edited Nov 24 '18 at 19:09; answered Nov 24 '18 at 14:21 by Maxim Egorushkin