meetingNotes/Wisconsin-Nov-2014.md (iDPL/meetingNotes)

Experiment components (a single "script" from Condor's viewpoint; call it the placement script)

Local file system test
Point-to-point network test (iperf, nuttcp)
Data placement experiment (end-to-end)
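
As a rough illustration of the first component, a local file system test could time a synced write and report MB/s. This is a minimal sketch, not the project's actual test: the function name `local_write_test`, the default size, and the returned keys are all assumptions made for the example.

```python
import os
import tempfile
import time


def local_write_test(size_mb=8, directory=None):
    """Write size_mb MiB to a temporary file, fsync it, and return timing results.

    Hypothetical helper: the name, defaults, and result keys are illustrative only.
    """
    chunk = b"\0" * (1024 * 1024)  # 1 MiB of zeros
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push data to disk so we do not only time the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the test file
    rate = size_mb / elapsed if elapsed > 0 else float("inf")
    return {"mb": size_mb, "seconds": elapsed, "mb_per_s": rate}


if __name__ == "__main__":
    print(local_write_test(size_mb=8))
```

The network and end-to-end stages would follow the same pattern (run, time, report), with iperf or nuttcp invoked as external commands.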

Some failure cases:

Kill an instance of a recurring job after N minutes; reschedule at the next regular interval.
Place the job on hold if it fails K times in a row.
Failure/success of the placement script is reported by instance 0.
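
The first two failure rules can be sketched in plain Python to make the intended behavior concrete. This is only an illustration of the policy; in the pool itself these rules would presumably be enforced through Condor, and the names `run_instance` and `should_hold` are hypothetical. N minutes corresponds to `timeout_s = N * 60`.

```python
import subprocess
import sys


def run_instance(argv, timeout_s):
    """Run one instance; return True on exit code 0, False on failure or timeout."""
    try:
        result = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process when the timeout expires
        return False
    return result.returncode == 0


def should_hold(outcomes, k):
    """True if the last k outcomes (most recent last) are all failures."""
    return k > 0 and len(outcomes) >= k and not any(outcomes[-k:])


if __name__ == "__main__":
    history = []
    # Three instances that always fail, standing in for the placement script.
    for _ in range(3):
        history.append(
            run_instance([sys.executable, "-c", "raise SystemExit(1)"], timeout_s=30)
        )
    print("hold" if should_hold(history, k=3) else "keep scheduling")
```

A single success resets the count, because `should_hold` only looks at the most recent K outcomes.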

Need the ability to replace a recurring test (especially if the placement script is changed).

Need to work with the Condor team to understand how to use condor_chirp to place information in the job log.
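
One possible starting point, pending that discussion: `condor_chirp` has a `ulog` subcommand that writes a message to the job's user log, and a placement script could call it as below. This is a sketch under assumptions. The `IDPL_RESULT` tag and the key=value layout are invented for the example, and the job would likely need chirp access enabled in its submit description (for example `+WantIOProxy = true`); both points should be confirmed with the Condor team.

```python
import shutil
import subprocess


def format_result(success, mb_per_s, seconds):
    """Encode one result as a single key=value line (hypothetical format)."""
    status = "success" if success else "failure"
    return f"IDPL_RESULT status={status} mb_per_s={mb_per_s:.2f} seconds={seconds:.2f}"


def chirp_ulog_argv(message, chirp="condor_chirp"):
    """Build the argument list for `condor_chirp ulog <message>`."""
    return [chirp, "ulog", message]


def report(message):
    """Send the message to the job log; return True only if condor_chirp ran and succeeded."""
    chirp = shutil.which("condor_chirp")
    if chirp is None:
        return False  # not running under Condor, or condor_chirp is not on PATH
    return subprocess.run(chirp_ulog_argv(message, chirp)).returncode == 0


if __name__ == "__main__":
    line = format_result(True, 12.5, 6.5)
    print(line, "sent" if report(line) else "not sent")
```

Keeping each result on one tagged line makes it easy to find again when the log is read back.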

The job log should be parsed (Condor has a library for this) to extract experiment results (success/failure, MB/s, total time, etc.).
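
To show the kind of extraction meant here, the sketch below scans log text for tagged result lines. It does not use Condor's log-reading library; it is a plain text scan standing in for it, and it assumes a hypothetical line format (`IDPL_RESULT status=... mb_per_s=... seconds=...`) that the notes do not define.

```python
import re

# Hypothetical result line: "IDPL_RESULT status=success mb_per_s=12.50 seconds=6.50"
RESULT_RE = re.compile(r"IDPL_RESULT\s+(.*)")


def parse_results(log_text):
    """Return one dict per tagged result line found in the job log text."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if not match:
            continue  # not a result line; ignore other log events
        pairs = (p.split("=", 1) for p in match.group(1).split() if "=" in p)
        fields = dict(pairs)
        results.append({
            "success": fields.get("status") == "success",
            "mb_per_s": float(fields["mb_per_s"]) if "mb_per_s" in fields else None,
            "seconds": float(fields["seconds"]) if "seconds" in fields else None,
        })
    return results


if __name__ == "__main__":
    # Made-up sample text, not real Condor log output.
    sample = (
        "unrelated event\n"
        "\tIDPL_RESULT status=success mb_per_s=12.50 seconds=6.50\n"
        "IDPL_RESULT status=failure seconds=300.00\n"
    )
    for row in parse_results(sample):
        print(row)
```

Missing fields come back as `None` rather than raising, so a failed run that never measured throughput still produces a record.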

Development configuration of the pool

Each site will configure 4 slots (Beihang, CNIC, UCSD, UWISC).
A schedd at each site will handle the scheduling of the named dedicated slots (this defers strong user authentication as a future problem).
This effectively creates 4 independently scheduled pools on the same physical pool, which can result in performance collisions.
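
To make "performance collisions" concrete: because each schedd plans its tests without knowing about the others, two tests can run at the same time against the same host. The sketch below flags such pairs from a list of planned tests. The record layout (`name`, `src`, `dst`, `start`, `duration`) is an assumption made for this example, not something the pool provides.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two tests overlap in time and share at least one host."""
    in_time = (a["start"] < b["start"] + b["duration"]
               and b["start"] < a["start"] + a["duration"])
    shared_hosts = {a["src"], a["dst"]} & {b["src"], b["dst"]}
    return in_time and bool(shared_hosts)


def find_collisions(tests):
    """Return the names of every pair of tests that would collide."""
    return [(a["name"], b["name"]) for a, b in combinations(tests, 2) if overlaps(a, b)]


if __name__ == "__main__":
    # Illustrative schedule; times are in minutes from an arbitrary origin.
    planned = [
        {"name": "ucsd-wisc", "src": "UCSD", "dst": "UWISC", "start": 0, "duration": 10},
        {"name": "cnic-wisc", "src": "CNIC", "dst": "UWISC", "start": 5, "duration": 10},
        {"name": "beihang-cnic", "src": "Beihang", "dst": "CNIC", "start": 20, "duration": 10},
    ]
    print(find_collisions(planned))
```

In this made-up schedule the first two tests both touch UWISC during minutes 5 to 10, so they are reported as a colliding pair.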