What does the C++ compiler do to ensure that different but adjacent memory locations are safe to be used on...












Let's say I have a struct:

struct Foo {
    char a; // read and written by thread 1 only
    char b; // read and written by thread 2 only
};

From what I understand, the C++ standard guarantees that the above is safe when the two threads operate on the two different memory locations.

I would have thought, though, that since char a and char b fall within the same cache line, the compiler has to do extra synchronization.

What exactly happens here?










  • On a lot of platforms (for example, x86), the compiler doesn't have to do anything. It just works (meaning the hardware does the necessary extra work).
    – geza (8 upvotes), 13 hours ago

  • Would that imply additional latency is then incurred, even on x86 hardware?
    – Nathan Doromal, 13 hours ago

  • Yes, but the exact hit can vary across CPU generations and vendors. Do a search on "false sharing".
    – geza (5 upvotes), 12 hours ago

  • This is handled by the hardware, not the compiler, as far as I am aware. This is called false sharing.
    – NathanOliver (3 upvotes), 12 hours ago

  • I think the only CPUs that C++ has actually been implemented on where the compiler would have to do anything special to support the C++ memory model are early Alpha CPUs, which lacked instructions that could atomically set a single byte (or 16-bit) memory location. See Peter Cordes' answer to a related question for details: stackoverflow.com/a/46818162/3826372. As far as I know, no compiler implementations have been updated to support the C++11 memory model on these long-obsolete Alpha CPUs.
    – Ross Ridge, 8 hours ago

















Tags: c++ multithreading thread-safety






asked 13 hours ago by Nathan Doromal








3 Answers

















Answer by SergeyA (21 votes; answered 12 hours ago, edited 6 hours ago)













This is hardware-dependent. On the hardware I am familiar with, C++ doesn't have to do anything special, because from the hardware's perspective, accessing different bytes even on the same cache line is handled 'transparently'. To the hardware, this situation is not really different from

char a[2];
// or
char a, b;

In both cases we are talking about two adjacent objects, which are guaranteed to be independently accessible.

However, I've put 'transparently' in quotes for a reason. When you have a case like that, you can suffer (performance-wise) from 'false sharing', which happens when two (or more) threads access adjacent memory simultaneously and that memory ends up cached in several CPUs' caches. This leads to constant cache invalidation. In real life, care should be taken to prevent this from happening where possible.
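As a hedged illustration of that last point: one common way to prevent false sharing is to align each thread's field onto its own cache line. This sketch assumes a 64-byte line, which is typical for x86 but not universal:

```cpp
#include <cstddef>

// Sketch only: the 64-byte cache-line size is an assumption; on real
// targets you would query it (or use the C++17 interference sizes).
struct PaddedFoo {
    alignas(64) char a; // read and written by thread 1 only
    alignas(64) char b; // read and written by thread 2 only
};

// With each member aligned to 64 bytes, a and b land on distinct lines,
// so writes by one thread no longer invalidate the other thread's line.
static_assert(alignof(PaddedFoo) == 64, "struct is cache-line aligned");
static_assert(sizeof(PaddedFoo) == 128, "each field occupies its own line");
```

The cost is memory: the struct grows from 2 bytes to 128, which is why this padding is applied selectively, only to fields that profiling shows are contended.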






  • "care should be taken to prevent this from happening when possible." How would you suggest going about doing that?
    – ArtB, 8 hours ago

  • @ArtB There is no hard and fast rule. Designing the program correctly from scratch is always the best approach. You can also try profiling tools such as valgrind and analyze the number of cache misses.
    – SergeyA (3 upvotes), 8 hours ago

  • @ArtB This can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc. out there discussing how to measure whether you have a problem and then how to ameliorate it.
    – davidbak, 6 hours ago

  • @ArtB: C++17 provides interference sizes to help guide such design.
    – Davis Herring, 1 hour ago
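The interference sizes Davis Herring refers to can be used roughly as below; note that not every standard library ships these constants even in C++17 mode, so the fallback value here is an assumption:

```cpp
#include <cstddef>
#include <new> // std::hardware_destructive_interference_size (C++17)

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kLine = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kLine = 64; // assumed fallback cache-line size
#endif

// Aligning each hot, independently-written field to kLine keeps them
// on separate cache lines, avoiding false sharing between the threads.
struct Counters {
    alignas(kLine) long hits_thread1; // updated by thread 1 only
    alignas(kLine) long hits_thread2; // updated by thread 2 only
};

static_assert(sizeof(Counters) >= 2 * kLine,
              "the two counters occupy distinct cache lines");
```

Using the named constant rather than a hard-coded 64 keeps the layout correct if the implementation reports a different destructive-interference granularity.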


















Answer by Arne Vogel (10 votes; answered 11 hours ago)













As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:



std::array<std::uint8_t, 8u> c;

void f()
{
    c[0] ^= 0xfa;
    c[3] ^= 0x10;
    c[6] ^= 0x8b;
    c[7] ^= 0x92;
}


Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):



load r0, *(std::uint64_t *) &c[0]
xor r0, 0x928b0000100000fa
store r0, *(std::uint64_t *) &c[0]


This is likely to be faster on common hardware than xor'ing the individual bytes. However, it reads and writes the unaffected (and unmentioned) elements of c at indices 1, 2, 4 and 5. If other threads are writing to these memory locations concurrently, these changes could be overwritten.



For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6] and c[7] can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.



(That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)
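A hedged way to see the guarantee in action: the toy harness below (the names are mine, not from the answer) has two threads write adjacent char fields in tight loops. Under the C++11 memory model the final value of each field must be exactly the last value its owning thread stored; a compiler that performed the wide read-modify-write shown above could lose updates.

```cpp
#include <thread>

struct Foo {
    char a = 0; // written by thread 1 only
    char b = 0; // written by thread 2 only
};

// Each thread repeatedly stores into its own byte. Because a and b are
// distinct memory locations, neither thread may clobber the other's field.
Foo hammer() {
    Foo f;
    std::thread t1([&f] { for (int i = 0; i < 100000; ++i) f.a = static_cast<char>(i); });
    std::thread t2([&f] { for (int i = 0; i < 100000; ++i) f.b = static_cast<char>(i); });
    t1.join();
    t2.join();
    // Both fields must now hold char(99999); any other value would mean a
    // lost update, i.e. a data race introduced by the implementation.
    return f;
}
```

This cannot prove the absence of a bug, but on a conforming implementation it must never fail, no matter how the stores are scheduled.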






Answer (score -1)













The compiler does nothing special here.
Let's start by saying that it doesn't matter whether the memory locations are adjacent or at totally different addresses: the memory system must ensure that it delivers the correct data for a given address.
Cache lines are not how data is provided to the process; instead, the CPU provides the address and the number of bytes it intends to read, and the memory system supplies them (from the cache if we have a cache hit, otherwise from RAM).
A problem arises only when two threads try to access data at exactly the same address, which must be handled with caution (using semaphores, mutexes, etc.).






    share|improve this answer





















      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














       

      draft saved


      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53376806%2fwhat-does-the-c-compiler-do-to-ensure-that-different-but-adjacent-memory-locat%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      21
      down vote













      This is hardware-dependent. On hardware I am familiar with, C++ doesn't have to do anything special, because from hardware perspective accessing different bytes even on a cached line is handled 'transparently'. From the hardware, this situation is not really different from



      char a[2];
      // or
      char a, b;


      In the cases above, we are talking about two adjacent objects, which are guaranteed to be independently accessible.



      However, I've put 'transparently' in quotes for a reason. When you really have a case like that, you could be suffering (performance-wise) from a 'false sharing' - which happens when two (or more) threads access adjacent memory simultaneously and it ends up being cached in several CPU's caches. This leads to constant cache invalidation. In the real life, care should be taken to prevent this from happening when possible.






      share|improve this answer























      • care should be taken to prevent this from happening when possible. How would you suggest one going about doing that?
        – ArtB
        8 hours ago






      • 3




        @ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
        – SergeyA
        8 hours ago










      • @ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
        – davidbak
        6 hours ago










      • @ArtB: C++17 provides interference sizes to help guide such design.
        – Davis Herring
        1 hour ago















      up vote
      21
      down vote













      This is hardware-dependent. On hardware I am familiar with, C++ doesn't have to do anything special, because from hardware perspective accessing different bytes even on a cached line is handled 'transparently'. From the hardware, this situation is not really different from



      char a[2];
      // or
      char a, b;


      In the cases above, we are talking about two adjacent objects, which are guaranteed to be independently accessible.



      However, I've put 'transparently' in quotes for a reason. When you really have a case like that, you could be suffering (performance-wise) from a 'false sharing' - which happens when two (or more) threads access adjacent memory simultaneously and it ends up being cached in several CPU's caches. This leads to constant cache invalidation. In the real life, care should be taken to prevent this from happening when possible.






      share|improve this answer























      • care should be taken to prevent this from happening when possible. How would you suggest one going about doing that?
        – ArtB
        8 hours ago






      • 3




        @ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
        – SergeyA
        8 hours ago










      • @ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
        – davidbak
        6 hours ago










      • @ArtB: C++17 provides interference sizes to help guide such design.
        – Davis Herring
        1 hour ago













      up vote
      21
      down vote










      up vote
      21
      down vote









      This is hardware-dependent. On hardware I am familiar with, C++ doesn't have to do anything special, because from hardware perspective accessing different bytes even on a cached line is handled 'transparently'. From the hardware, this situation is not really different from



      char a[2];
      // or
      char a, b;


      In the cases above, we are talking about two adjacent objects, which are guaranteed to be independently accessible.



      However, I've put 'transparently' in quotes for a reason. When you really have a case like that, you could be suffering (performance-wise) from a 'false sharing' - which happens when two (or more) threads access adjacent memory simultaneously and it ends up being cached in several CPU's caches. This leads to constant cache invalidation. In the real life, care should be taken to prevent this from happening when possible.






      share|improve this answer














      This is hardware-dependent. On hardware I am familiar with, C++ doesn't have to do anything special, because from hardware perspective accessing different bytes even on a cached line is handled 'transparently'. From the hardware, this situation is not really different from



      char a[2];
      // or
      char a, b;


      In the cases above, we are talking about two adjacent objects, which are guaranteed to be independently accessible.



      However, I've put 'transparently' in quotes for a reason. When you really have a case like that, you could be suffering (performance-wise) from a 'false sharing' - which happens when two (or more) threads access adjacent memory simultaneously and it ends up being cached in several CPU's caches. This leads to constant cache invalidation. In the real life, care should be taken to prevent this from happening when possible.







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited 6 hours ago

























      answered 12 hours ago









      SergeyA

      39.9k53581




      39.9k53581












      • care should be taken to prevent this from happening when possible. How would you suggest one going about doing that?
        – ArtB
        8 hours ago






      • 3




        @ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
        – SergeyA
        8 hours ago










      • @ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
        – davidbak
        6 hours ago










      • @ArtB: C++17 provides interference sizes to help guide such design.
        – Davis Herring
        1 hour ago


















      • care should be taken to prevent this from happening when possible. How would you suggest one going about doing that?
        – ArtB
        8 hours ago






      • 3




        @ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
        – SergeyA
        8 hours ago










      • @ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
        – davidbak
        6 hours ago










      • @ArtB: C++17 provides interference sizes to help guide such design.
        – Davis Herring
        1 hour ago
















      care should be taken to prevent this from happening when possible. How would you suggest one going about doing that?
      – ArtB
      8 hours ago




      care should be taken to prevent this from happening when possible. How would you suggest one going about doing that?
      – ArtB
      8 hours ago




      3




      3




      @ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
      – SergeyA
      8 hours ago




      @ArtB there is no hard and fast rule. Designing program correctly from the scratch is always the best approach. You can also try profiling tools, such as valgrind and analyze the number of cache misses.
      – SergeyA
      8 hours ago












      @ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
      – davidbak
      6 hours ago




      @ArtB - can be done by adding padding members to structs to separate fields into different cache lines. There are plenty of papers/blog posts/etc out there discussing how to measure to see if you've got a problem and then how to ameliorate it.
      – davidbak
      6 hours ago












      @ArtB: C++17 provides interference sizes to help guide such design.
      – Davis Herring
      1 hour ago




      @ArtB: C++17 provides interference sizes to help guide such design.
      – Davis Herring
      1 hour ago












      up vote
      10
      down vote













      As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:



      std::array<std::uint8_t, 8u> c;

      void f()
      {
      c[0] ^= 0xfa;
      c[3] ^= 0x10;
      c[6] ^= 0x8b;
      c[7] ^= 0x92;
      }


      Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):



      load r0, *(std::uint64_t *) &c[0]
      xor r0, 0x928b0000100000fa
      store r0, *(std::uint64_t *) &c[0]


      This is likely to be faster on common hardware than xor'ing the individual bytes. However, it reads and writes the unaffected (and unmentioned) elements of c at indices 1, 2, 4 and 5. If other threads are writing to these memory locations concurrently, these changes could be overwritten.



      For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6] and c[7] can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.



      (That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)






      share|improve this answer

























        up vote
        10
        down vote













        As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:



        std::array<std::uint8_t, 8u> c;

        void f()
        {
        c[0] ^= 0xfa;
        c[3] ^= 0x10;
        c[6] ^= 0x8b;
        c[7] ^= 0x92;
        }


        Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):



        load r0, *(std::uint64_t *) &c[0]
        xor r0, 0x928b0000100000fa
        store r0, *(std::uint64_t *) &c[0]


        This is likely to be faster on common hardware than xor'ing the individual bytes. However, it reads and writes the unaffected (and unmentioned) elements of c at indices 1, 2, 4 and 5. If other threads are writing to these memory locations concurrently, these changes could be overwritten.



        For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6] and c[7] can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.



        (That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)






        share|improve this answer























          up vote
          10
          down vote










          up vote
          10
          down vote









          As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:



          std::array<std::uint8_t, 8u> c;

          void f()
          {
          c[0] ^= 0xfa;
          c[3] ^= 0x10;
          c[6] ^= 0x8b;
          c[7] ^= 0x92;
          }


          Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):



          load r0, *(std::uint64_t *) &c[0]
          xor r0, 0x928b0000100000fa
          store r0, *(std::uint64_t *) &c[0]


          This is likely to be faster on common hardware than xor'ing the individual bytes. However, it reads and writes the unaffected (and unmentioned) elements of c at indices 1, 2, 4 and 5. If other threads are writing to these memory locations concurrently, these changes could be overwritten.



          For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6] and c[7] can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.



          (That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)






          share|improve this answer












          As others have explained, nothing in particular on common hardware. However, there is a catch: The compiler must refrain from performing certain optimizations, unless it can prove that other threads don't access the memory locations in question, e.g.:



          std::array<std::uint8_t, 8u> c;

          void f()
          {
          c[0] ^= 0xfa;
          c[3] ^= 0x10;
          c[6] ^= 0x8b;
          c[7] ^= 0x92;
          }


          Here, in a single-threaded memory model, the compiler could emit code like the following (pseudo-assembly; assumes little-endian hardware):



          load r0, *(std::uint64_t *) &c[0]
          xor r0, 0x928b0000100000fa
          store r0, *(std::uint64_t *) &c[0]


This is likely to be faster on common hardware than xor'ing the individual bytes. However, it also reads and writes the unaffected (and unmentioned) elements of c at indices 1, 2, 4 and 5. If other threads write to those memory locations concurrently, their updates can be lost in the read-modify-write.



          For this reason, optimizations like these are often unusable in a multi-threaded memory model. As long as the compiler performs only loads and stores of matching length, or merges accesses only when there is no gap (e.g. the accesses to c[6] and c[7] can still be merged), the hardware commonly already provides the necessary guarantees for correct execution.



          (That said, there are/have been some architectures with weak and counterintuitive memory order guarantees, e.g. DEC Alpha does not track pointers as a data dependency in the way that other architectures do, so it is necessary to introduce an explicit memory barrier in some cases, in low level code. There is a somewhat well-known little rant by Linus Torvalds on this issue. However, a conforming C++ implementation is expected to shield you from such issues.)







          answered 11 hours ago









          Arne Vogel

              up vote
              -1
              down vote













The compiler does nothing about it.
Let's start by saying that it doesn't matter whether the memory locations are adjacent or at totally different addresses: the memory system must deliver the correct data for any given address.
Cache lines are not the granularity at which data is handed to the program; the CPU issues an address and the number of bytes it intends to read, and the memory system supplies them (from the cache on a hit, otherwise from RAM).
A problem arises only when two threads access data at exactly the same address, which must be handled with care (using semaphores etc.).






                  answered 10 hours ago









                  TheEngineer
