Wisconsin-Nov-2014.md

Experiment components (a single "script" from Condor's viewpoint; call it the placement script)

  • Local file system test
  • Point-to-point network test (iperf, nuttcp)
  • Data placement experiment (end-to-end)
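
As a rough illustration of how the three components could be wrapped into one placement script, here is a minimal sketch. The stage names, the `run_placement_script` helper, and the choice to stop at the first failing stage are all assumptions made for illustration; the notes do not specify them.

```python
from typing import Callable, List, Tuple

# A stage is a (name, callable) pair; the callable returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_placement_script(stages: List[Stage]) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Run stages in order and stop at the first failure (an assumed policy)."""
    results: List[Tuple[str, bool]] = []
    for name, stage in stages:
        ok = stage()
        results.append((name, ok))
        if not ok:
            break  # later stages are skipped once one stage fails
    overall = all(ok for _, ok in results)
    return overall, results


# Hypothetical stand-ins for the real tests (file system, iperf/nuttcp, placement).
example_stages: List[Stage] = [
    ("local_fs_test", lambda: True),
    ("network_test", lambda: False),
    ("data_placement", lambda: True),
]
```

With the stand-in stages above, the network test fails, so the data placement stage is never run and the script as a whole reports failure.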

Some failure cases:

  • Kill an instance of a recurring job after N minutes, and reschedule it at the next regular interval.
  • Place the job on hold if it fails K times in a row.
  • Failure or success of the placement script is reported by instance 0.
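
The kill and hold rules above can be sketched as a small piece of bookkeeping. This is only a model of the policy, not Condor configuration; the class name `RecurringJobPolicy` and its fields are hypothetical, and whether a killed instance counts as a failure is left to the caller because the notes do not say.

```python
from dataclasses import dataclass


@dataclass
class RecurringJobPolicy:
    max_runtime_minutes: float      # N in the notes
    max_consecutive_failures: int   # K in the notes
    consecutive_failures: int = 0
    on_hold: bool = False

    def should_kill(self, elapsed_minutes: float) -> bool:
        """True once a running instance has exceeded N minutes."""
        return elapsed_minutes > self.max_runtime_minutes

    def record_result(self, succeeded: bool) -> None:
        """Update the failure streak; hold the job after K failures in a row."""
        if succeeded:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_consecutive_failures:
                self.on_hold = True
```

A success between failures resets the count, so only K failures in a row trigger the hold.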

Need the ability to replace a recurring test (especially if the placement script is changed).

Need to work with the Condor team to understand how to use condor_chirp to place information in the job log.

The job log should be parsed (Condor has a library for this) to extract experiment results (success/failure, MB/s, total time, etc.).
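
One possible shape for the reporting and extraction steps is sketched below. It assumes the placement script emits a single line of `key=value` pairs and that the line reaches the job log through `condor_chirp ulog`; the `PLACEMENT_RESULT` tag, the field names, and all helper functions are hypothetical, and the exact chirp usage still needs to be confirmed with the Condor team. The sketch only builds and parses the message text; reading the job log itself would be done with Condor's own library.

```python
from typing import Dict, List, Optional

TAG = "PLACEMENT_RESULT"  # hypothetical marker to find our lines in the log


def format_result_message(success: bool, mb_per_s: float, total_s: float) -> str:
    """Build the one-line result message the placement script would report."""
    status = "success" if success else "failure"
    return f"{TAG} status={status} mb_per_s={mb_per_s} total_s={total_s}"


def chirp_command(message: str) -> List[str]:
    """Argument list for reporting the message (assumed usage; not executed here)."""
    return ["condor_chirp", "ulog", message]


def parse_result_message(message: str) -> Dict[str, Optional[object]]:
    """Extract success, MB/s, and total time from a result message."""
    fields: Dict[str, str] = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens without '=' such as the tag itself
            fields[key] = value
    return {
        "success": fields.get("status") == "success",
        "mb_per_s": float(fields["mb_per_s"]) if "mb_per_s" in fields else None,
        "total_s": float(fields["total_s"]) if "total_s" in fields else None,
    }
```

Keeping the message on one line with a fixed tag makes it easy to pick out of the log regardless of which event type carries it.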

Development configuration of the pool

  • Each site will configure 4 slots (Beihang, CNIC, UCSD, UWISC)
  • A schedd at each site will handle the scheduling of the named dedicated slots (this defers strong user authentication as a future problem).
  • This effectively creates 4 independently scheduled pools on the same physical pool, which can result in performance collisions.