Efficiency measurements of Go's Once type
I have a piece of code that I want to run only once for initialization.
So far I have been using a sync.Mutex combined with an if-clause to test whether it has already run. Later I came across the Once type and its Do() function in the same sync package.
The implementation (https://golang.org/src/sync/once.go) is the following:
func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 1 {
        return
    }
    // Slow-path.
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}
Looking at the code, it is basically the same thing I have been using before: a mutex combined with an if-clause. However, the added function calls make this seem rather inefficient to me. I did some testing and tried various versions:
func test1() {
    o.Do(func() {
        // Do smth
    })
    wg.Done()
}

func test2() {
    m.Lock()
    if !b {
        func() {
            // Do smth
        }()
    }
    b = true
    m.Unlock()
    wg.Done()
}

func test3() {
    if !b {
        m.Lock()
        if !b {
            func() {
                // Do smth
            }()
            b = true
        }
        m.Unlock()
    }
    wg.Done()
}
I tested all versions by running the following code:
wg.Add(10000)
start = time.Now()
for i := 0; i < 10000; i++ {
    go testX()
}
wg.Wait()
end = time.Now()
fmt.Printf("elapsed: %v\n", end.Sub(start).Nanoseconds())
with the following results:
elapsed: 8002700 //test1
elapsed: 5961600 //test2
elapsed: 5646700 //test3
Is it even worth using the Once type? It is convenient, but its performance is even worse than test2, which always serializes all goroutines.
Also, why do they use an atomic load of the done flag for their if-clause? The store happens inside the lock anyway.
Edit: Go playground link: https://play.golang.org/p/qlMxPYop7kS NOTE: this doesn't show the results, as time is fixed in the playground.
performance go synchronization mutex
Where are your Go testing package benchmark results?
– peterSO
Nov 19 at 14:01
test3() has a data race (you cannot read b without synchronization). And if you move that check inside the block protected by the mutex, you're already "doing worse" than Once.Do(), which has an "optimized" short path using an atomic load. The "slow" path is most likely only encountered once.
– icza
Nov 19 at 14:05
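(A quick way to confirm the data race icza mentions: assuming the snippets from the question are assembled into a main.go, Go's built-in race detector should flag the unsynchronized read of b in test3:)

go run -race main.go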
see test2, it does the check on b inside the locked section and is still much faster.
– Gilrich
Nov 19 at 14:07
As peterSO wrote, we don't know how you got your test results. Show the testing and benchmarking code. Aim for a Minimal, Complete, and Verifiable example.
– icza
Nov 19 at 14:08
First, use Go's built-in benchmarking system; it is very thorough and effective. Second, unless you've benchmarked a real-world application, found a performance issue, and used profiling to trace that issue to sync.Once, this is a fruitless exercise. It is extremely unlikely that sync.Once will have any meaningful performance impact in any real-world scenario.
– Adrian
Nov 19 at 14:19
1 Answer (accepted)
That is not how you're supposed to test code performance. You should use Go's built-in testing framework (the testing package and the go test command). See Order of the code and performance for details.
Let's create the testable code:
func f() {
    // Code that must only be run once
}

var testOnce = &sync.Once{}

func DoWithOnce() {
    testOnce.Do(f)
}

var (
    mu = &sync.Mutex{}
    b  bool
)

func DoWithMutex() {
    mu.Lock()
    if !b {
        f()
        b = true
    }
    mu.Unlock()
}
Let's write proper testing / benchmarking code using the testing package:
func BenchmarkOnce(b *testing.B) {
    for i := 0; i < b.N; i++ {
        DoWithOnce()
    }
}

func BenchmarkMutex(b *testing.B) {
    for i := 0; i < b.N; i++ {
        DoWithMutex()
    }
}
We can run the benchmark with the following command:
go test -bench .
And here are the benchmarking results:
BenchmarkOnce-4 200000000 6.30 ns/op
BenchmarkMutex-4 100000000 20.0 ns/op
PASS
As you can see, using sync.Once was almost 4 times faster than using a sync.Mutex. Why? Because sync.Once has an "optimized" short path that uses only an atomic load to check whether the task has been called before, and if so, no mutex is used. The "slow" path is likely only used once, on the first call to Once.Do(). If many concurrent goroutines attempt to call DoWithOnce(), the slow path might be reached a few times, but in the long run Once.Do() will only need to perform an atomic load.
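(Not part of the original answer: to illustrate the point, here is a sketch of what the mutex-based version would need in order to gain a comparable fast path. It reuses mu and f from the code above, needs "sync/atomic" imported, and essentially re-implements the double-checked pattern of Once.Do.)

// done replaces the plain bool b: 0 = not yet run, 1 = already run.
var done uint32

func DoWithMutexFastPath() {
    // Fast path: once f has run, a single atomic load and no mutex.
    if atomic.LoadUint32(&done) == 1 {
        return
    }
    // Slow path: only reached around the very first call(s).
    mu.Lock()
    if done == 0 {
        f()
        atomic.StoreUint32(&done, 1)
    }
    mu.Unlock()
}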
Parallel testing (from multiple goroutines)
Yes, the above benchmarking code only uses a single goroutine to test. But using multiple concurrent goroutines will only make the mutex's case worse, as it always has to acquire the mutex just to check whether the task needs to be called, while sync.Once only uses an atomic load.
Nevertheless, let's benchmark it.
Here is the benchmarking code using parallel testing:
func BenchmarkOnceParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            DoWithOnce()
        }
    })
}

func BenchmarkMutexParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            DoWithMutex()
        }
    })
}
I have 4 cores on my machine, so I'm gonna use those 4 cores:
go test -bench Parallel -cpu=4
(You may omit the -cpu flag, in which case it defaults to GOMAXPROCS, the number of cores available.)
And here are the results:
BenchmarkOnceParallel-4 500000000 3.04 ns/op
BenchmarkMutexParallel-4 20000000 93.7 ns/op
When "concurrency increases", the results are starting to become uncomparable in favor of sync.Once
(in the above test, it's 30 times faster).
We may further increase the number of goroutines created using testing.B.SetParallelism(), but I got similar results when I set it to 100 (which means 400 goroutines were used to call the benchmarked code).
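(For completeness, a sketch of how testing.B.SetParallelism would be wired into the parallel benchmark; it must be called before RunParallel and multiplies the number of goroutines per GOMAXPROCS. The benchmark name is hypothetical.)

func BenchmarkOnceParallel100(b *testing.B) {
    // With GOMAXPROCS=4, a parallelism factor of 100 means RunParallel
    // drives the benchmark body from 4*100 = 400 goroutines.
    b.SetParallelism(100)
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            DoWithOnce()
        }
    })
}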
Thanks, this completely solves my confusion. Also, I didn't know about the benchmarking capability of the testing package. Will use that from now on.
– Gilrich
Nov 19 at 14:47
How do I know, however, that this runs multiple goroutines simultaneously? I'm not sure it does. Shouldn't it be: go DoWithOnce() and go DoWithMutex()?
– Gilrich
Nov 19 at 15:09
@Gilrich You shouldn't use go in benchmarking code; that would render the results almost useless. But there, you have parallel results now. Check the edited answer.
– icza
Nov 19 at 15:21