-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Optimize package initialization #197
Conversation
Have left some comments and generally looks really good... do wish tho the changes were done separately so it'd be possible to see what effect each change is having on package load time. |
phonenumbers.go
Outdated
} | ||
|
||
// load it into our map | ||
onceMap[language].Do(func() { | ||
prefixMap, err := loadPrefixMap(binMap[language]) | ||
if _, loaded := visited.LoadOrStore(language, struct{}{}); !loaded { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this lead to a race condition if two threads get here at the same time looking for a language that hasn't yet been loaded.. first does the load, second doesn't.. but second could get to line 3434 before the first thread completes the loading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the sync.Map should prevent this situation because it blocks https://pkg.go.dev/sync#Map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It blocks writes to the map within the call to LoadOrStore
so thread 1 and 2 arrive at that call and let's say thread 1 is first so it will write struct{}{}
into the map and thread 2 will read struct{}{}
from the map.
Then both threads can proceed.. loaded
is true for thread 1 and false for thread 2. Thread 1 tries to write into langMap[language]
but thread 2 might read from it first 💥 .
Unless I'm forgetting how sync.Map
works... I think you have a race condition. And I'm not sure the existing solution of using sync.Once
is so bad we need to replace it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not have the list to prefill the maps of onces in the init() func 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you put the sync.Once
stuff back in this PR and then in a separate PR you can try replacing it with a map of mutexes or something else.. and at least then we can measure the difference it makes. In this PR it's impossible to know what is contributing to the fast package load times because we also have the embed changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I did the sync.Once changes because of the embed changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think of adding a mutex ? 🙏 something like:
func fillLangMap(visited *sync.Map, langMap map[string]*intStringMap, file fs.File, language string) error {
m.Lock()
defer m.Unlock()
// load it into our map
if _, loaded := visited.LoadOrStore(language, struct{}{}); !loaded {
data, err := io.ReadAll(file)
if err != nil {
return err
}
prefixMap, err := loadPrefixMap(data)
if err == nil {
langMap[language] = prefixMap
}
}
return nil
}
I added this tests to reproduce the error
func TestGetCarrierWithPrefixForNumberWithConcurrency(t *testing.T) {
number, _ := Parse("+8613702032331", "ZZ")
wg := sync.WaitGroup{}
for i := 0; i < 10; i++ {
wg.Add(1)
go func() {
defer wg.Done()
_, _, err := GetCarrierWithPrefixForNumber(number, "en")
if err != nil {
t.Errorf("Failed to getCarrierWithPrefix for the number %s: %s", "+8613702032331", err)
}
}()
}
wg.Wait()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutexes seem like a good solution...
- You don't need
visited *sync.Map
.. you have synchronized access tolangMap
now - Using one mutex means you have now synchronized across all languages.. which is more synchronized than this code was previously.. but I think I'm ok with that for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the more I think about it a mutex for each map seems totally fine here and it's a lot simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, i removed the map and added the mutex 😄
thank you for your time, next I will try using a pool of gzips to see if it helps with the performance
phonenumbers.go
Outdated
@@ -607,10 +608,12 @@ var ( | |||
|
|||
// These are our onces and maps for our prefix to carrier maps | |||
carrierOnces = make(map[string]*sync.Once) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove it, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
67e4ed9
to
1b8fc0a
Compare
|
||
// Create our sync.Onces for each of our languages for carriers | ||
for lang := range gen.CarrierData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rowanseymour look, here i need the gen data to fill the onces map
77c1206
to
a4797d6
Compare
phonenumbers.go
Outdated
return "", 0, err | ||
} | ||
|
||
defer file.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the file existence check always repeated? Remember package initialization isn't the only concern here - lookup times are more important.
I think getValueForNumber
should work like...
- inside mutex...
- check if
langMap
has value forlang
- try to load from file if not, storing result in
langMap
- use value from
langMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i was wondering the same the old code checked whether the language key exists in the metadata/binMap first and then it parses the file...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rowanseymour sorry i am making the changes but, do you know why we do not return an error when the field do not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I applied the changes I am not sure it is according to your idea, let me know if you see how it can be improved
I added a benchmark these are the results:
prev change:
BenchmarkGetCarrierWithPrefixForNumber 30 34821182 ns/op 2948078 B/op 28948 allocs/op
main
BenchmarkGetCarrierWithPrefixForNumber 109204 11091 ns/op 401 B/op 7 allocs/op
new
BenchmarkGetCarrierWithPrefixForNumber 107184 11363 ns/op 448 B/op 8 allocs/op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That benchmark tho is just checking the same language over and over again which means only one data load.. but I think it's ok
This commit improves package startup performance by: * **Embedding static data:** Replaced go file data loading with `go:embed`, reducing I/O overhead. * **Optimized metadata build:** Changed `cmd/buildmetadata` to download `metadata` files with the new format. * **Logarithmic digit counting:** Replaced string conversion and length calculation with a faster logarithmic approach for digit counting. * **Improve initialization:** Utilized `sync.Map` instead of `map[string]sync.Once` for prevent gunzip the same metadata more than once. * **Clarify data directory:** Removed the `gen` directory, moving data into the `data` directory because the data is not longer loaded as generated generated code. These changes result in a substantial reduction in package startup time. old: init github.com/nyaruka/phonenumbers @4.2 ms, 7.2 ms clock, 2994832 bytes, 34852 allocs new: init github.com/nyaruka/phonenumbers @2.1 ms, 3.7 ms clock, 2917304 bytes, 34444 allocs
a4797d6
to
76f8699
Compare
This commit improves package startup performance by:
go:embed
, reducing I/O overhead.cmd/buildmetadata
to downloadmetadata
files with the new format.sync.Map
instead ofmap[string]sync.Once
for prevent gunzip the same metadata more than once.gen
directory, moving data into thedata
directory because the data is not longer loaded as generated generated code.These changes result in a substantial reduction in package startup time.
old:
init github.com/nyaruka/phonenumbers @4.2 ms, 7.2 ms clock, 2994832 bytes, 34852 allocs
new:
init github.com/nyaruka/phonenumbers @2.1 ms, 3.7 ms clock, 2917304 bytes, 34444 allocs
the binary size also went down 1M:
4.0K ./main.go
7.8M ./phoneparser-new
8.9M ./phoneparser-main
17M .