Description
Describe the bug
This bug report demonstrates that verdaccio with the S3 backend is stateful: the package list is cached in memory, which causes inconsistency and race conditions in a cluster environment. A workaround is attached; discussion is welcome.
I've set up a minimal clustered verdaccio deployment: two verdaccio instances, an S3 store, and nginx as a reverse proxy. The S3 plugin is slightly modified, but none of the changes touch the core logic.
Before the test, I already had one package (com.littlebigfun.addressable-importer) in verdaccio.
# space is an alias for aws-cli
$ space ls s3://openupm/verdaccio/
PRE com.littlebigfun.addressable-importer/
2019-11-27 00:26:39 126 verdaccio-s3-db.json
$ space cp s3://openupm/verdaccio/verdaccio-s3-db.json -
{"list":["com.littlebigfun.addressable-importer"],"secret":"..."}
Let's publish another package (com.bastianblokland.enumgenerator) for testing.
$ npm --registry=my-registry publish
...
+ [email protected]
Logs show that the return code is 201: the publish succeeded. The NotFoundError is harmless for a new package. Notice that the publish was handled by verdaccio instance 0 (the log prefix tells which instance).
0|verdaccio | info <-- 127.0.0.1 requested 'PUT /com.bastianblokland.enumgenerator'
0|verdaccio | error-=- s3: [S3PackageManager writeTarball headObject] { NotFoundError: no such package available
0|verdaccio | http <-- 201, user: openupm(156.236.113.121 via 127.0.0.1), req: 'PUT /com.bastianblokland.enumgenerator', bytes: 1683542/53
The added package is verified in S3.
$ space cp s3://openupm/verdaccio/verdaccio-s3-db.json -
{"list":["com.littlebigfun.addressable-importer","com.bastianblokland.enumgenerator"],"secret":"..."}
$ space ls s3://openupm/verdaccio/
PRE com.bastianblokland.enumgenerator/
PRE com.littlebigfun.addressable-importer/
2019-11-27 01:08:59 162 verdaccio-s3-db.json
Now the buggy part: let's curl the package list twice. Notice that only the second call returns the newly added package.
# first pass - wrong
$ curl https://my-registry/-/verdaccio/packages
[
{
"name": "com.littlebigfun.addressable-importer",
...
}
]
# second pass - correct
$ curl https://my-registry/-/verdaccio/packages
[
{
"name": "com.bastianblokland.enumgenerator",
...
},
{
"name": "com.littlebigfun.addressable-importer",
...
}
]
Logs show that the second (correct) curl response comes from verdaccio instance 0, the instance that just executed the publish. The incorrect response comes from verdaccio instance 1. Running the request multiple times gives the same result: instance 1 never returns the newly added package.
1|verdaccio | info <-- 127.0.0.1 requested 'GET /-/verdaccio/packages'
1|verdaccio | http <-- 200, user: null(156.236.113.121 via 127.0.0.1), req: 'GET /-/verdaccio/packages', bytes: 0/4070
0|verdaccio | info <-- 127.0.0.1 requested 'GET /-/verdaccio/packages'
0|verdaccio | http <-- 200, user: null(156.236.113.121 via 127.0.0.1), req: 'GET /-/verdaccio/packages', bytes: 0/7787
This behavior implies that verdaccio keeps some sort of in-memory cache of the package list (verdaccio-s3-db.json), so until I restart verdaccio instance 1 there is no way to tell it to refresh the cache. I haven't checked the source code yet, so this is just a guess. But if it is true, it means verdaccio is not scalable and can only run as a single instance. That wasn't my expectation when discussing with @juanpicado in verdaccio/verdaccio#1459 (comment), where I asked how a shared package list is handled in a cluster environment.
I need some time to read the getLocalDatabase method of https://github.com/verdaccio/verdaccio/blob/dbf20175dc68dd81e52363cc7e8013e24947d0fd/src/lib/storage.ts to figure it out, but please point me at anything obvious I've missed.
To Reproduce
You will need a similar deployment: two instances of verdaccio managed by pm2 with an S3 backend. Nginx isn't necessary.
Expected behavior
All verdaccio instances should return the latest package list right after a package is added (or removed).
Configuration File (cat ~/.config/verdaccio/config.yaml)
storage: ./storage
plugins: ./plugins
max_body_size: 200mb
listen: 0.0.0.0:4873
server:
  keepAliveTimeout: 60
middlewares:
  audit:
    enabled: true
web:
  enable: true
auth:
  htpasswd:
    file: ./htpasswd
    max_users: -1
packages:
  '@*/*':
    # scoped packages
    access: $all
    publish: $authenticated
    unpublish: $authenticated
  '**':
    access: $all
    publish: $authenticated
    unpublish: $authenticated
store:
  aws-s3-storage:
    bucket: openupm
    region: sfo2
    endpoint: ...
    accessKeyId: ...
    secretAccessKey: ...
    s3ForcePathStyle: true
    keyPrefix: 'verdaccio/'
    tarballACL: public-read
    tarballEdgeUrl: ...
    convert_to_local_tarball_url: false
Environment information
verdaccio: 4.3.4 (modified: verdaccio/verdaccio#1580)
s3-plugin: 8.4.2 (modified: #249)
My modifications are made for #250, which is not related to this bug.
Debugging output
$ NODE_DEBUG=request verdaccio
display request calls (verdaccio <--> uplinks)

$ DEBUG=express:* verdaccio
enable extreme verdaccio debug mode (verdaccio api)

$ npm -ddd
prints:

$ npm config get registry
prints:
Additional context