Efficiency measurements of Go's Once type

I have a piece of code that I want to run only once, for initialization.
So far I have been using a sync.Mutex combined with an if-clause to test whether it has already been run. Later I came across the Once type and its Do() function in the same sync package.



The implementation (https://golang.org/src/sync/once.go) is the following:



func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 1 {
        return
    }
    // Slow-path.
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}
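
For context, here is a minimal, self-contained sketch of how Once is normally used for one-time initialization; the initConfig task is just a placeholder, not something from the question:

package main

import (
    "fmt"
    "sync"
)

var once sync.Once

// initConfig stands in for whatever initialization must run exactly once.
func initConfig() {
    fmt.Println("initializing")
}

func main() {
    for i := 0; i < 3; i++ {
        // Only the first call actually runs initConfig; later calls return immediately.
        once.Do(initConfig)
    }
}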


Looking at the code, it is basically the same thing I've been using before: a mutex combined with an if-clause. However, the added function calls make this seem rather inefficient to me. I did some testing and tried various versions:



func test1() {
    o.Do(func() {
        // Do smth
    })
    wg.Done()
}

func test2() {
    m.Lock()
    if !b {
        func() {
            // Do smth
        }()
    }
    b = true
    m.Unlock()
    wg.Done()
}

func test3() {
    if !b {
        m.Lock()
        if !b {
            func() {
                // Do smth
            }()
            b = true
        }
        m.Unlock()
    }
    wg.Done()
}


I tested all versions by running the following code:



wg.Add(10000)
start = time.Now()
for i := 0; i < 10000; i++ {
    go testX()
}
wg.Wait()
end = time.Now()

fmt.Printf("elapsed: %v\n", end.Sub(start).Nanoseconds())


with the following results:



elapsed: 8002700 //test1
elapsed: 5961600 //test2
elapsed: 5646700 //test3
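
For completeness, the snippets above rely on package-level declarations for o, m, b, and wg that are not shown in the question. A self-contained sketch of the harness could look roughly like this; everything beyond what the question shows (the package layout, the variable block, using test1 as the example) is an assumption:

package main

import (
    "fmt"
    "sync"
    "time"
)

// Package-level state assumed by the question's snippets.
var (
    o  sync.Once
    m  sync.Mutex
    b  bool
    wg sync.WaitGroup
)

func test1() {
    o.Do(func() {
        // Do smth
    })
    wg.Done()
}

func main() {
    wg.Add(10000)
    start := time.Now()
    for i := 0; i < 10000; i++ {
        go test1() // swap in test2 or test3 to time the other variants
    }
    wg.Wait()
    end := time.Now()

    fmt.Printf("elapsed: %v\n", end.Sub(start).Nanoseconds())
}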


Is it even worth using the Once type? It is convenient, but its performance is even worse than that of test2, which always serializes all goroutines.



Also, why do they use an atomic operation for their if-clause? The store happens inside the lock anyway.



Edit: Go playground link: https://play.golang.org/p/qlMxPYop7kS NOTICE: this doesn't show the timing results because time is fixed in the playground.

Tags: performance go synchronization mutex

asked Nov 19 at 13:58 by Gilrich, edited Nov 19 at 14:18

  • Where are your Go testing package benchmark results? – peterSO, Nov 19 at 14:01

  • test3() has a data race (you cannot read b without synchronization). And if you move that check inside the block protected by the mutex, you're already "doing worse" than Once.Do(), which has an "optimized" short path using an atomic load. The "slow" path is most likely only encountered once. – icza, Nov 19 at 14:05

  • See test2: it does the check on b inside the locked section and is still much faster. – Gilrich, Nov 19 at 14:07

  • As peterSO wrote, we don't know how you got your test results. Show the testing and benchmarking code. Aim for a Minimal, Complete, and Verifiable example. – icza, Nov 19 at 14:08

  • First, use Go's built-in benchmarking system; it is very thorough and effective. Second, unless you've benchmarked a real-world application, found a performance issue, and used profiling to trace that issue to sync.Once, this is a fruitless exercise. It is extremely unlikely that sync.Once will have any meaningful performance impact in any real-world scenario. – Adrian, Nov 19 at 14:19

1 Answer (accepted)

That is not how you're supposed to test code performance. You should use Go's built-in testing framework (testing package and go test command). See Order of the code and performance for details.



Let's create the testable code:



func f() {
    // Code that must only be run once
}

var testOnce = &sync.Once{}

func DoWithOnce() {
    testOnce.Do(f)
}

var (
    mu = &sync.Mutex{}
    b  bool
)

func DoWithMutex() {
    mu.Lock()
    if !b {
        f()
        b = true
    }
    mu.Unlock()
}


Let's write proper testing / benchmarking code using the testing package:



func BenchmarkOnce(b *testing.B) {
    for i := 0; i < b.N; i++ {
        DoWithOnce()
    }
}

func BenchmarkMutex(b *testing.B) {
    for i := 0; i < b.N; i++ {
        DoWithMutex()
    }
}
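
For anyone reproducing this, the pieces above go into an ordinary test file. A self-contained sketch, assuming a hypothetical file name once_bench_test.go and an arbitrary package name, could look like this:

// once_bench_test.go (hypothetical name); any file ending in _test.go works.
package oncebench

import (
    "sync"
    "testing"
)

func f() {
    // Code that must only be run once.
}

var testOnce = &sync.Once{}

func DoWithOnce() {
    testOnce.Do(f)
}

var (
    mu = &sync.Mutex{}
    b  bool
)

func DoWithMutex() {
    mu.Lock()
    if !b {
        f()
        b = true
    }
    mu.Unlock()
}

func BenchmarkOnce(b *testing.B) {
    for i := 0; i < b.N; i++ {
        DoWithOnce()
    }
}

func BenchmarkMutex(b *testing.B) {
    for i := 0; i < b.N; i++ {
        DoWithMutex()
    }
}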


We can run the benchmark with the following command:



go test -bench .


And here are the benchmarking results:



BenchmarkOnce-4         200000000                6.30 ns/op
BenchmarkMutex-4        100000000               20.0 ns/op
PASS


As you can see, using sync.Once was almost 4 times faster than using a sync.Mutex. Why? Because sync.Once has an "optimized" short path that uses only an atomic load to check whether the task has been run before; if it has, no mutex is used at all. The "slow" path is likely taken only once, on the first call to Once.Do(). If many goroutines call DoWithOnce() concurrently at startup, the slow path may be reached a few times, but in the long run Once.Do() only ever needs an atomic load.
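
To connect this back to the test3() variant from the question (which, as noted in the comments, has a data race on b), here is a minimal sketch of the same double-checked pattern made race-free with sync/atomic. It mirrors the technique Once uses internally; it is an illustration under those assumptions, not the standard-library code itself:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

var (
    done uint32     // 0 = not yet initialized, 1 = initialized
    mu   sync.Mutex // guards the slow path
)

// doOnce runs f at most once. The atomic load is the fast path, so callers
// arriving after initialization never touch the mutex.
func doOnce(f func()) {
    if atomic.LoadUint32(&done) == 1 {
        return
    }
    mu.Lock()
    defer mu.Unlock()
    if done == 0 {
        defer atomic.StoreUint32(&done, 1)
        f()
    }
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            doOnce(func() { fmt.Println("initialized once") })
        }()
    }
    wg.Wait()
}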



Parallel testing (from multiple goroutines)



Yes, the above benchmarking code only uses a single goroutine. But using multiple concurrent goroutines will only make the mutex case worse, as it always has to acquire the mutex just to check whether the task needs to be run, while sync.Once only needs an atomic load.



Nevertheless, let's benchmark it.



Here is the benchmarking code using parallel testing:



func BenchmarkOnceParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            DoWithOnce()
        }
    })
}

func BenchmarkMutexParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            DoWithMutex()
        }
    })
}


I have 4 cores on my machine, so I'm going to use all 4 of them:



go test -bench Parallel -cpu=4


(You may omit the -cpu flag, in which case it defaults to GOMAXPROCS, the number of logical CPUs available.)



And here are the results:



BenchmarkOnceParallel-4         500000000                3.04 ns/op
BenchmarkMutexParallel-4         20000000               93.7 ns/op


When "concurrency increases", the results are starting to become uncomparable in favor of sync.Once (in the above test, it's 30 times faster).



We may further increase the number of goroutines created using testing.B.SetParallelism(), but I got similar results when I set it to 100 (which means 400 goroutines were used to call the benchmarked code).
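
For reference, SetParallelism must be called before RunParallel and multiplies the number of goroutines by GOMAXPROCS. A minimal sketch of that variant, meant to sit in the same test file as the benchmarks above (the benchmark name and the factor of 100 are just illustrative):

func BenchmarkOnceParallel100(b *testing.B) {
    // With GOMAXPROCS=4 this makes RunParallel use 4*100 = 400 goroutines.
    b.SetParallelism(100)
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            DoWithOnce()
        }
    })
}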






answered Nov 19 at 14:18 by icza (edited Nov 19 at 21:14)

  • Thanks, this completely solves my confusion. Also, I didn't know about the benchmarking capability of the testing package. Will use that from now on. – Gilrich, Nov 19 at 14:47

  • How do I know, however, that this runs multiple goroutines simultaneously? I'm not sure it does. Shouldn't it be go DoWithOnce() and go DoWithMutex()? – Gilrich, Nov 19 at 15:09

  • @Gilrich You shouldn't use go in benchmarking code; that would render the results almost useless. But there, you have parallel results. Check the edited answer. – icza, Nov 19 at 15:21