aws-s3: Inconsistent package list in cluster deployment #317

Open
@favoyang

Describe the bug

This bug demonstrates that verdaccio with the s3 backend is stateful: the package list is cached in memory, which causes inconsistency and race conditions in a cluster environment. The solution is attached, and discussion is welcome.

I've set up a minimal clustered verdaccio deployment with two verdaccio instances, an s3 store, and nginx as a reverse proxy. The s3 plugin is slightly modified, but the changes do not touch the core logic.

Before the test, I already have one package (com.littlebigfun.addressable-importer) in verdaccio.

# space is an alias for aws-cli

$ space ls s3://openupm/verdaccio/
                           PRE com.littlebigfun.addressable-importer/
2019-11-27 00:26:39        126 verdaccio-s3-db.json

$ space cp s3://openupm/verdaccio/verdaccio-s3-db.json -
{"list":["com.littlebigfun.addressable-importer"],"secret":"..."}

Let's publish another package (com.bastianblokland.enumgenerator) for testing.

$ npm --registry=my-registry publish
...
+ com.bastianblokland.enumgenerator@...

The logs show a 201 return code, so the publish succeeded. The NotFoundError is harmless for a new package. Note that the publish was handled by verdaccio instance 0, as the log prefix shows.

0|verdaccio  |  info <-- 127.0.0.1 requested 'PUT /com.bastianblokland.enumgenerator'
0|verdaccio  |  error-=- s3: [S3PackageManager writeTarball headObject] { NotFoundError: no such package available
0|verdaccio  |  http <-- 201, user: openupm(156.236.113.121 via 127.0.0.1), req: 'PUT /com.bastianblokland.enumgenerator', bytes: 1683542/53

The added package is verified in S3.

$ space cp s3://openupm/verdaccio/verdaccio-s3-db.json -
{"list":["com.littlebigfun.addressable-importer","com.bastianblokland.enumgenerator"],"secret":"..."}

$ space ls s3://openupm/verdaccio/
                           PRE com.bastianblokland.enumgenerator/
                           PRE com.littlebigfun.addressable-importer/
2019-11-27 01:08:59        162 verdaccio-s3-db.json

Now the buggy part: let's curl the package list twice. Notice that only the second call returns the newly added package.

# first pass - wrong
$ curl https://my-registry/-/verdaccio/packages
[
  {
    "name": "com.littlebigfun.addressable-importer",
    ...
  }
]
# second pass - correct
$ curl https://my-registry/-/verdaccio/packages
[
  {
    "name": "com.bastianblokland.enumgenerator",
    ...
  },
  {
    "name": "com.littlebigfun.addressable-importer",
    ...
  }
]

The logs show that the second, correct result came from verdaccio instance 0, the one that just handled the publish, while the incorrect result came from verdaccio instance 1. Repeating the requests gives the same outcome: instance 1 never returns the newly added package.

1|verdaccio  |  info <-- 127.0.0.1 requested 'GET /-/verdaccio/packages'
1|verdaccio  |  http <-- 200, user: null(156.236.113.121 via 127.0.0.1), req: 'GET /-/verdaccio/packages', bytes: 0/4070
0|verdaccio  |  info <-- 127.0.0.1 requested 'GET /-/verdaccio/packages'
0|verdaccio  |  http <-- 200, user: null(156.236.113.121 via 127.0.0.1), req: 'GET /-/verdaccio/packages', bytes: 0/7787
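
To make the "which instance answered" reading of these logs less manual, here is a minimal sketch that parses the response lines and records the response size per instance. The `ResponseLog` shape, the `parseResponseLine` helper, and the regular expression are hypothetical and written only against the log format shown above; reading `bytes: a/b` as bytes in / bytes out is also an assumption.

```typescript
// Hypothetical record for one "http <-- ..." response line from the pm2 log.
interface ResponseLog {
  instance: number; // pm2 prefix before "|verdaccio"
  status: number;   // HTTP status code
  request: string;  // e.g. "GET /-/verdaccio/packages"
  bytesIn: number;  // assumed: request body size
  bytesOut: number; // assumed: response body size
}

// Matches only response lines; the "info <-- ... requested" lines are skipped.
const RESPONSE_LINE =
  /^(\d+)\|verdaccio\s*\|\s*http <-- (\d+),.*req: '([^']+)', bytes: (\d+)\/(\d+)\s*$/;

function parseResponseLine(line: string): ResponseLog | null {
  const m = RESPONSE_LINE.exec(line);
  if (m === null) {
    return null;
  }
  return {
    instance: Number(m[1]),
    status: Number(m[2]),
    request: String(m[3]),
    bytesIn: Number(m[4]),
    bytesOut: Number(m[5]),
  };
}

// The four log lines from this report.
const logLines: string[] = [
  "1|verdaccio  |  info <-- 127.0.0.1 requested 'GET /-/verdaccio/packages'",
  "1|verdaccio  |  http <-- 200, user: null(156.236.113.121 via 127.0.0.1), req: 'GET /-/verdaccio/packages', bytes: 0/4070",
  "0|verdaccio  |  info <-- 127.0.0.1 requested 'GET /-/verdaccio/packages'",
  "0|verdaccio  |  http <-- 200, user: null(156.236.113.121 via 127.0.0.1), req: 'GET /-/verdaccio/packages', bytes: 0/7787",
];

// Response size of the package-list request, keyed by instance number.
const bytesOutByInstance = new Map<number, number>();
for (const line of logLines) {
  const entry = parseResponseLine(line);
  if (entry !== null && entry.request === "GET /-/verdaccio/packages") {
    bytesOutByInstance.set(entry.instance, entry.bytesOut);
  }
}

for (const [instance, bytesOut] of bytesOutByInstance) {
  console.log(`instance ${instance} answered with ${bytesOut} bytes`);
}
```

Under that reading, the same endpoint returning different response sizes from the two instances is what points at per-instance state.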

This behavior suggests that verdaccio keeps an in-memory cache of the package list (verdaccio-s3-db.json), and that short of restarting instance 1 there is no way to tell it to refresh that cache. I haven't checked the source code yet, so this is only a guess. If it is true, verdaccio cannot scale horizontally and can only run as a single instance. That is not what I expected after discussing this with @juanpicado in verdaccio/verdaccio#1459 (comment), where I asked how a shared package list is handled in a cluster environment.

I need some time to read the getLocalDatabase method in https://github.com/verdaccio/verdaccio/blob/dbf20175dc68dd81e52363cc7e8013e24947d0fd/src/lib/storage.ts to figure this out. Please point it out if you think I missed something obvious.
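
The suspected failure mode can be illustrated with a small simulation. This is only a sketch of the guess above, not verdaccio's or the s3 plugin's actual code: `SharedStore` stands in for the verdaccio-s3-db.json object in S3, and `CachingInstance` is a hypothetical instance that loads the list once and then serves it from memory.

```typescript
// Hypothetical stand-in for the shared verdaccio-s3-db.json object.
class SharedStore {
  private list: string[] = [];

  read(): string[] {
    return [...this.list];
  }

  write(list: string[]): void {
    this.list = [...list];
  }
}

// Hypothetical instance that reads the shared list once, then trusts its
// own in-memory copy for every later request.
class CachingInstance {
  private store: SharedStore;
  private cache: string[] | null = null;

  constructor(store: SharedStore) {
    this.store = store;
  }

  private load(): string[] {
    if (this.cache === null) {
      this.cache = this.store.read();
    }
    return this.cache;
  }

  // Publish: update the local copy and persist it to the shared store.
  add(name: string): void {
    const list = this.load();
    if (!list.includes(name)) {
      list.push(name);
    }
    this.store.write(list);
  }

  // Package list as this instance would report it.
  get(): string[] {
    return [...this.load()];
  }
}

const store = new SharedStore();
store.write(["com.littlebigfun.addressable-importer"]);

const instance0 = new CachingInstance(store);
const instance1 = new CachingInstance(store);

// Both instances have served at least one request, so both caches are warm.
instance0.get();
instance1.get();

// Instance 0 handles the publish.
instance0.add("com.bastianblokland.enumgenerator");

const seenBy0 = instance0.get();
const seenBy1 = instance1.get();
const inStore = store.read();

console.log("instance 0:", seenBy0);
console.log("instance 1:", seenBy1);
console.log("shared store:", inStore);
```

In this model the shared store and instance 0 both hold two packages while instance 1 keeps reporting one, which matches the curl results above.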

To Reproduce
You will need a similar deployment: two instances of verdaccio managed by pm2, with the s3 backend. Nginx isn't necessary.

Expected behavior
All verdaccio instances should return the latest package list right after a package is added (or removed).
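
One way to get this behavior, sketched here under the assumption that the package list really is cached per instance, is to re-read the shared list on every request instead of keeping a long-lived copy. `SharedDb` and `ReadThroughInstance` are hypothetical names for illustration; this is not necessarily the solution attached to this issue, nor the plugin's real API.

```typescript
// Hypothetical stand-in for the shared verdaccio-s3-db.json object.
class SharedDb {
  private list: string[] = [];

  read(): string[] {
    return [...this.list];
  }

  write(list: string[]): void {
    this.list = [...list];
  }
}

// Hypothetical instance that holds no package-list state of its own:
// every call goes back to the shared store.
class ReadThroughInstance {
  private db: SharedDb;

  constructor(db: SharedDb) {
    this.db = db;
  }

  // Publish: fetch the current list, add the name, write it back.
  add(name: string): void {
    const list = this.db.read();
    if (!list.includes(name)) {
      list.push(name);
      this.db.write(list);
    }
  }

  // Package list: always the current content of the shared store.
  get(): string[] {
    return this.db.read();
  }
}

const db = new SharedDb();
db.write(["com.littlebigfun.addressable-importer"]);

const nodeA = new ReadThroughInstance(db);
const nodeB = new ReadThroughInstance(db);

// nodeB serves a request before the publish, as instance 1 did above.
nodeB.get();

// nodeA handles the publish.
nodeA.add("com.bastianblokland.enumgenerator");

const listFromA = nodeA.get();
const listFromB = nodeB.get();

console.log("node A:", listFromA);
console.log("node B:", listFromB);
```

The trade-off is one extra read of the shared object per request. The read-modify-write in `add` is also still not atomic, so two instances publishing at the same moment could overwrite each other's update; that would need separate handling.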

Configuration File (cat ~/.config/verdaccio/config.yaml)

storage: ./storage
plugins: ./plugins
max_body_size: 200mb
listen: 0.0.0.0:4873

server:
  keepAliveTimeout: 60

middlewares:
  audit:
    enabled: true

web:
  enable: true

auth:
  htpasswd:
    file: ./htpasswd
    max_users: -1

packages:
  '@*/*':
    # scoped packages
    access: $all
    publish: $authenticated
    unpublish: $authenticated

  '**':
    access: $all
    publish: $authenticated
    unpublish: $authenticated

store:
  aws-s3-storage:
    bucket: openupm
    region: sfo2
    endpoint: ...
    accessKeyId: ...
    secretAccessKey: ...
    s3ForcePathStyle: true
    keyPrefix: 'verdaccio/'
    tarballACL: public-read
    tarballEdgeUrl: ...

convert_to_local_tarball_url: false

Environment information

verdaccio: 4.3.4 (modified: verdaccio/verdaccio#1580)
s3-plugin: 8.4.2 (modified: #249)
My modifications are made for #250, which is not related to this bug.

