Numpy performance differences depending on numerical values


























I found a strange performance difference while evaluating an expression in Numpy.



I executed the following code:



import numpy as np
myarr = np.random.uniform(-1,1,[1100,1100])


and then



%timeit np.exp( - 0.5 * (myarr / 0.001)**2 )
>> 184 ms ± 301 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


and



%timeit np.exp( - 0.5 * (myarr / 0.1)**2 )
>> 12.3 ms ± 34.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


That's an almost 15x speedup in the second case! Note that the only difference is the divisor: 0.1 versus 0.001.



What's the reason for this behaviour? Can I change something to make the first calculation as fast as the second?
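The comments and answers below point to subnormal ("denormal") results as the likely culprit. A quick probe (a sketch; the seed is hypothetical, added only for reproducibility) counts how many outputs of the slow expression fall into the subnormal range or underflow to zero:

```python
import numpy as np

np.random.seed(0)  # hypothetical seed, for reproducibility only
myarr = np.random.uniform(-1, 1, [1100, 1100])

tiny = np.finfo(np.float64).tiny        # smallest normal double, ~2.2e-308
res = np.exp(-0.5 * (myarr / 0.001)**2)

# Results that are positive but below `tiny` are subnormal.
frac_subnormal = np.mean((res > 0) & (res < tiny))
frac_zero = np.mean(res == 0)
print(f"subnormal: {frac_subnormal:.4%}, underflowed to zero: {frac_zero:.1%}")
```

With divisor 0.1 the exponents stay in a harmless range, so no subnormal results are produced at all.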




























  • 1




    OK, on Windows, NumPy 1.14.3, Python 3.6.0, I see 97.7ms vs 47.7ms.
    – jpp
    Nov 21 '18 at 15:53






  • 3




    On my system, exp of large (negative) numbers is slower: exp(-1) is faster than exp(-1000). So it probably comes down to slower convergence of the exp algorithm for large arguments
    – Brenlla
    Nov 21 '18 at 15:55






  • 1




    @MattMessersmith Reasonable explanation, but nope. exp(1) is still much faster than exp(1000)
    – Brenlla
    Nov 21 '18 at 16:23






  • 2




    My first guess (based on the title) was that there are some denormalized numbers involved - see stackoverflow.com/questions/36781881/… I didn't verify this in all depth for the specific numpy/python setup, but they can be awfully slow...
    – Marco13
    Nov 21 '18 at 19:53






  • 2




    @Marco13, yes, in fact, exp(-708) is a normal float, and exp(-709) is denormal, and that's where I see (on Mac OS X) a big jump in execution time. Underflow to zero doesn't occur until about exp(-746).
    – Warren Weckesser
    Nov 21 '18 at 20:48
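The thresholds in the last comment can be checked directly (a small sketch; the cutoffs follow from the IEEE 754 double range, so the exact boundary may differ by one on other platforms):

```python
import numpy as np

tiny = np.finfo(np.float64).tiny   # smallest normal double, ~2.225e-308

for x in (-708.0, -709.0, -745.0, -746.0):
    v = float(np.exp(x))
    kind = "zero" if v == 0.0 else ("normal" if v >= tiny else "subnormal")
    print(f"exp({x:.0f}) is {kind}")
```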
















8














I found a strange performance difference while evaluating an expression in Numpy.



I executed the following code:



import numpy as np
myarr = np.random.uniform(-1,1,[1100,1100])


and then



%timeit np.exp( - 0.5 * (myarr / 0.001)**2 )
>> 184 ms ± 301 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


and



%timeit np.exp( - 0.5 * (myarr / 0.1)**2 )
>> 12.3 ms ± 34.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


That's an almost 15x faster computation in the second case! Note that the only difference is the factor being 0.1 or 0.001.



What's the reason for this behaviour? Can I change something to make the first calculation as fast as the second?










share|improve this question


















  • 1




    OK, on Windows, NumPy 1.14.3, Python 3.6.0, I see 97.7ms vs 47.7ms.
    – jpp
    Nov 21 '18 at 15:53






  • 3




    In my system, exp of large (negative) numbers are slower: exp(-1) is faster than exp(-1000). So it probably comes down to some slower covergence of the exp algorithm with large numbers
    – Brenlla
    Nov 21 '18 at 15:55






  • 1




    @MattMessersmith Reasonable explanation, but nope. exp(1) is still much faster than exp(1000)
    – Brenlla
    Nov 21 '18 at 16:23






  • 2




    My first guess (based on the title) was that there are some denormalized numbers involved - see stackoverflow.com/questions/36781881/… I didn't verify this in all depth for the specific numpy/python setup, but they can be awfully slow...
    – Marco13
    Nov 21 '18 at 19:53






  • 2




    @Marco13, yes, in fact, exp(-708) is a normal float, and exp(-709) is denormal, and that's where I see (on Mac OS X) a big jump in execution time. Underflow to zero doesn't occur until about exp(-746).
    – Warren Weckesser
    Nov 21 '18 at 20:48














8












8








8


3





I found a strange performance difference while evaluating an expression in Numpy.



I executed the following code:



import numpy as np
myarr = np.random.uniform(-1,1,[1100,1100])


and then



%timeit np.exp( - 0.5 * (myarr / 0.001)**2 )
>> 184 ms ± 301 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


and



%timeit np.exp( - 0.5 * (myarr / 0.1)**2 )
>> 12.3 ms ± 34.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


That's an almost 15x faster computation in the second case! Note that the only difference is the factor being 0.1 or 0.001.



What's the reason for this behaviour? Can I change something to make the first calculation as fast as the second?










share|improve this question













I found a strange performance difference while evaluating an expression in Numpy.



I executed the following code:



import numpy as np
myarr = np.random.uniform(-1,1,[1100,1100])


and then



%timeit np.exp( - 0.5 * (myarr / 0.001)**2 )
>> 184 ms ± 301 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


and



%timeit np.exp( - 0.5 * (myarr / 0.1)**2 )
>> 12.3 ms ± 34.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


That's an almost 15x faster computation in the second case! Note that the only difference is the factor being 0.1 or 0.001.



What's the reason for this behaviour? Can I change something to make the first calculation as fast as the second?







python performance numpy






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 21 '18 at 15:36









Ethunxxx

379515




379515








  • 1




    OK, on Windows, NumPy 1.14.3, Python 3.6.0, I see 97.7ms vs 47.7ms.
    – jpp
    Nov 21 '18 at 15:53






  • 3




    In my system, exp of large (negative) numbers are slower: exp(-1) is faster than exp(-1000). So it probably comes down to some slower covergence of the exp algorithm with large numbers
    – Brenlla
    Nov 21 '18 at 15:55






  • 1




    @MattMessersmith Reasonable explanation, but nope. exp(1) is still much faster than exp(1000)
    – Brenlla
    Nov 21 '18 at 16:23






  • 2




    My first guess (based on the title) was that there are some denormalized numbers involved - see stackoverflow.com/questions/36781881/… I didn't verify this in all depth for the specific numpy/python setup, but they can be awfully slow...
    – Marco13
    Nov 21 '18 at 19:53






  • 2




    @Marco13, yes, in fact, exp(-708) is a normal float, and exp(-709) is denormal, and that's where I see (on Mac OS X) a big jump in execution time. Underflow to zero doesn't occur until about exp(-746).
    – Warren Weckesser
    Nov 21 '18 at 20:48














  • 1




    OK, on Windows, NumPy 1.14.3, Python 3.6.0, I see 97.7ms vs 47.7ms.
    – jpp
    Nov 21 '18 at 15:53






  • 3




    In my system, exp of large (negative) numbers are slower: exp(-1) is faster than exp(-1000). So it probably comes down to some slower covergence of the exp algorithm with large numbers
    – Brenlla
    Nov 21 '18 at 15:55






  • 1




    @MattMessersmith Reasonable explanation, but nope. exp(1) is still much faster than exp(1000)
    – Brenlla
    Nov 21 '18 at 16:23






  • 2




    My first guess (based on the title) was that there are some denormalized numbers involved - see stackoverflow.com/questions/36781881/… I didn't verify this in all depth for the specific numpy/python setup, but they can be awfully slow...
    – Marco13
    Nov 21 '18 at 19:53






  • 2




    @Marco13, yes, in fact, exp(-708) is a normal float, and exp(-709) is denormal, and that's where I see (on Mac OS X) a big jump in execution time. Underflow to zero doesn't occur until about exp(-746).
    – Warren Weckesser
    Nov 21 '18 at 20:48








1




1




OK, on Windows, NumPy 1.14.3, Python 3.6.0, I see 97.7ms vs 47.7ms.
– jpp
Nov 21 '18 at 15:53




OK, on Windows, NumPy 1.14.3, Python 3.6.0, I see 97.7ms vs 47.7ms.
– jpp
Nov 21 '18 at 15:53




3




3




In my system, exp of large (negative) numbers are slower: exp(-1) is faster than exp(-1000). So it probably comes down to some slower covergence of the exp algorithm with large numbers
– Brenlla
Nov 21 '18 at 15:55




In my system, exp of large (negative) numbers are slower: exp(-1) is faster than exp(-1000). So it probably comes down to some slower covergence of the exp algorithm with large numbers
– Brenlla
Nov 21 '18 at 15:55




1




1




@MattMessersmith Reasonable explanation, but nope. exp(1) is still much faster than exp(1000)
– Brenlla
Nov 21 '18 at 16:23




@MattMessersmith Reasonable explanation, but nope. exp(1) is still much faster than exp(1000)
– Brenlla
Nov 21 '18 at 16:23




2




2




My first guess (based on the title) was that there are some denormalized numbers involved - see stackoverflow.com/questions/36781881/… I didn't verify this in all depth for the specific numpy/python setup, but they can be awfully slow...
– Marco13
Nov 21 '18 at 19:53




My first guess (based on the title) was that there are some denormalized numbers involved - see stackoverflow.com/questions/36781881/… I didn't verify this in all depth for the specific numpy/python setup, but they can be awfully slow...
– Marco13
Nov 21 '18 at 19:53




2




2




@Marco13, yes, in fact, exp(-708) is a normal float, and exp(-709) is denormal, and that's where I see (on Mac OS X) a big jump in execution time. Underflow to zero doesn't occur until about exp(-746).
– Warren Weckesser
Nov 21 '18 at 20:48




@Marco13, yes, in fact, exp(-708) is a normal float, and exp(-709) is denormal, and that's where I see (on Mac OS X) a big jump in execution time. Underflow to zero doesn't occur until about exp(-746).
– Warren Weckesser
Nov 21 '18 at 20:48












2 Answers
Use Intel SVML



I don't have a numexpr build with working Intel SVML, but a numexpr built against SVML should perform about as well as Numba. The Numba benchmarks below show much the same behaviour without SVML, but much better performance with SVML.



Code



import numpy as np
import numba as nb

myarr = np.random.uniform(-1, 1, [1100, 1100])

@nb.njit(error_model="numpy", parallel=True)
def func(arr, div):
    return np.exp(-0.5 * (arr / div)**2)


Timings



#Core i7 4771
#Windows 7 x64
#Anaconda Python 3.5.5
#Numba 0.41 (compilation overhead excluded)
func(myarr,0.1) -> 3.6ms
func(myarr,0.001) -> 3.8ms

#Numba (set NUMBA_DISABLE_INTEL_SVML=1), parallel=True
func(myarr,0.1) -> 5.19ms
func(myarr,0.001) -> 12.0ms

#Numba (set NUMBA_DISABLE_INTEL_SVML=1), parallel=False
func(myarr,0.1) -> 16.7ms
func(myarr,0.001) -> 63.2ms

#Numpy (1.13.3), set OMP_NUM_THREADS=4
np.exp( - 0.5 * (myarr / 0.001)**2 ) -> 70.82ms
np.exp( - 0.5 * (myarr / 0.1)**2 ) -> 12.58ms

#Numpy (1.13.3), set OMP_NUM_THREADS=1
np.exp( - 0.5 * (myarr / 0.001)**2 ) -> 189.4ms
np.exp( - 0.5 * (myarr / 0.1)**2 ) -> 17.4ms

#Numexpr (2.6.8), no SVML, parallel; import numexpr as ne
ne.evaluate("exp( - 0.5 * (myarr / 0.001)**2 )") -> 17.2ms
ne.evaluate("exp( - 0.5 * (myarr / 0.1)**2 )") -> 4.38ms

#Numexpr (2.6.8), no SVML, single threaded
ne.evaluate("exp( - 0.5 * (myarr / 0.001)**2 )") -> 50.85ms
ne.evaluate("exp( - 0.5 * (myarr / 0.1)**2 )") -> 13.9ms





answered Nov 22 '18 at 11:25 by max9111 (edited Nov 22 '18 at 13:35)














    The first expression may produce denormalized (subnormal) numbers, which slow down computations.



    You may want to disable denormalized numbers using the daz library:



    import daz
    daz.set_daz()


    More info: x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ):




    To avoid serialization and performance issues due to denormals and underflow numbers, use the SSE and SSE2 instructions to set Flush-to-Zero and Denormals-Are-Zero modes within the hardware to enable highest performance for floating-point applications.




    Note that in 64-bit mode floating point computations use SSE instructions, not x87.






    answered Nov 24 '18 at 14:21 by Maxim Egorushkin (edited Nov 24 '18 at 19:09)
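If installing daz or Numba isn't an option, a pure-NumPy workaround (a sketch, not taken from either answer) is to skip exp() wherever the result would underflow anyway:

```python
import numpy as np

np.random.seed(0)  # hypothetical seed, for reproducibility only
myarr = np.random.uniform(-1, 1, [1100, 1100])

z = -0.5 * (myarr / 0.001)**2

# exp(z) leaves the normal double range once z < ~-708, so write 0.0
# there directly and only call exp() on the remaining entries.
out = np.zeros_like(z)
mask = z > -708.0
out[mask] = np.exp(z[mask])
```

This trades exactness in the subnormal tail (those entries become 0.0 instead of values below ~2.2e-308) for avoiding the slow path entirely.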






















