Azure meeting Oct 15 2021

Kenneth Hoste edited this page Oct 16, 2021 · 1 revision

EESSI/Azure/SURF sync meeting 20211015

Agenda

  • Update on NeIC project proposal (S4)
  • EESSI Stratum-1 in Azure (Bob)
  • GitHub runners for EESSI hosted in Azure VM (Bob)
  • Use of Terraform (Bob)
  • Zen3 build node (Kenneth)
    • Some trouble due to SELinux (?)
  • Work on interconnect detection support in archspec (Hugo)

Attendees

  • Laura Redfern
  • Martin Brandt
  • Ivar Janmaat
  • Bob Dröge
  • Kenneth Hoste
  • Ahmad Hesam
  • Alan O'Cais
  • Hugo Meiland

Notes

  • Update on NeIC project proposal (S4)

    • Did not get funded
    • Pretty decent score but competition was tough
    • Were recommended to reapply in next funding round (Feb. 22)
      • Will use feedback to tune proposal
      • Need to make a concrete connection back to users
        • Need some additional Nordic partners (since that was noted in the feedback)
    • Other opportunities will arise soon (like EOSC calls which are currently being fine-tuned)
    • Laura: Can arrange a letter of support for future bids
  • EESSI Stratum-1 in Azure (Bob)

    • Now part of the (latest) configuration package
    • CVMFS uses the GeoAPI to pick a nearby Stratum-1, so this one may not see much use since it currently sits in the US
      • Hugo will test it out
      • Can check with cvmfs_config which S1 you're talking to
        # first make sure that CVMFS is mounted, e.g. by doing an ls:
        ls /cvmfs/pilot.eessi-hpc.org
        
        cvmfs_config stat -v pilot.eessi-hpc.org
        
        # That should show something like:
        # Connection: http://134.94.88.70/cvmfs/pilot.eessi-hpc.org through proxy DIRECT (online)
        
      • (Default) GitHub runners may also be using this
    • Should keep an eye on traffic, as this can be large
    • Azure blob as Stratum-1 is an option that might be interesting
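
To make the "which S1 are you talking to" check above easier to script, here is a minimal sketch that pulls the server host out of the `Connection:` line. The helper name `stratum1_host` is hypothetical, and the parsing assumes the output format matches the sample line shown in these notes; other CVMFS versions may print it differently.

```shell
#!/bin/sh
# Hypothetical helper: read `cvmfs_config stat -v <repo>` output on stdin and
# print the host of the first "Connection:" line (assumed format, see notes).
stratum1_host() {
    sed -n 's|^Connection: https\{0,1\}://\([^/:]*\).*|\1|p' | head -n 1
}

# Intended use on a client where the repository is mounted (not run here):
#   ls /cvmfs/pilot.eessi-hpc.org > /dev/null
#   cvmfs_config stat -v pilot.eessi-hpc.org | stratum1_host

# Self-contained demo using the sample line from the notes:
sample='Connection: http://134.94.88.70/cvmfs/pilot.eessi-hpc.org through proxy DIRECT (online)'
printf '%s\n' "$sample" | stratum1_host
```

Comparing the printed host against the known list of Stratum-1 servers would show whether the Azure one is actually being selected.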
  • GitHub runners for EESSI hosted in Azure VM (Bob)

    • Some of our actions exceed the 6h time limit for default runners
    • CVMFS does not provide containers for some architectures (ARM + POWER), so we need to build them from source
    • Created our own runners to build containers
      • Only need these intermittently when the containers need updating
    • Does anyone have experience with an auto-scaling Kubernetes cluster for GitHub Actions workflows?
  • Use of Terraform through API access to Azure (Bob)

    • separate 'terraform' account
    • Martin can probably help here, has done this
  • Zen3 build node (Kenneth)

    • available now for EESSI in West Europe
    • Some trouble due to SELinux (?)
      • Using our container inside the image triggers an error:
      Singularity> mkdir /cvmfs/pilot.eessi-hpc.org/2021.06/software/linux/x86_64/amd/zen3
      mkdir: cannot create directory '/cvmfs/pilot.eessi-hpc.org/2021.06/software/linux/x86_64/amd/zen3': Operation not supported
      
      • Can make it work if /tmp is used for the overlay
      • Looks like it could be related to mount options and SELinux
      • Once this is resolved, we should have a full stack within a day, given how powerful the node is
      • Kenneth & Hugo will look into this together in a short call
      • see also https://github.com/EESSI/software-layer/issues/138
  • Work on interconnect detection support in archspec (Hugo)

  • Azure usage is about 200 euro/month, so no alarms are triggered :P

    • only Stratum-1 + GitHub Actions runners
  • Are there any relevant upcoming events?

    • There will be another EasyBuild User meeting
    • Having an end-user focussed tutorial might be a good idea
      • For example, for someone building on top of the EESSI stack
      • Topics:
        • setting up EESSI from scratch
        • usage
        • building your own software on top
  • Hugo: Marketplace VM image

    • When will there be a production stack?
      • Really hard to say
      • Would need to have some monitoring in place...and someone to notify if there is something wrong
      • How do we roll something back? Who can do that?
      • If we have issues with a Stratum-1, who fixes it? And if we can't contact the responsible people, how do we kick it out?
        • Can we use DNS to kick out stratum 1s?
        • That is possible
