Replies: 3 comments 1 reply
-
Welcome, James. Happy to have you on board. :-) Any insight you can offer from your wealth of experience is very much appreciated.
This caught my eye; specifically, the reference to a driver version. I'd be interested to hear your thoughts on whether the host requirements part of OpenJD might serve this use case. The OpenJD solution that we're imagining for this sort of thing is to use customer-defined attribute requirements. The idea is that you'd define an attribute that represents the host configuration, abstracting the specific software and software versions available. We're basically assuming here that a studio with 1000 hosts doesn't have 1000 unique snowflake hosts, but rather a smaller set of host images/configurations that they deploy to those hosts. That would allow you to put something like this into your Step definition:
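For illustration, a fragment along those lines might look like the following. This is only a sketch: `attr.hostimage` and the image labels are hypothetical names a studio would define themselves, not part of the spec.

```yaml
# Sketch only: "attr.hostimage" is a hypothetical customer-defined
# attribute; the anyOf values stand in for a studio's own host
# image/configuration labels.
hostRequirements:
  attributes:
    - name: attr.hostimage
      anyOf:
        - "render-image-2023.09"
        - "render-image-2023.10"
```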
-
Thanks for taking a look, it's great to read your thoughts!
The problem that this part of the spec is solving for is how to express a pattern within a job that looks like:
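The original inline example was not captured here. As a rough reconstruction of the pattern being described, an environment that is entered once, a set of tasks that run inside it, and an environment exit, a sketch in OpenJD terms might be (names and commands are placeholders, not from the original):

```yaml
# Sketch of the enter/run/exit pattern; all names are placeholders.
jobEnvironments:
  - name: RenderEnv
    script:
      actions:
        onEnter:
          command: start-renderer-daemon   # placeholder command
        onExit:
          command: stop-renderer-daemon    # placeholder command
steps:
  - name: render
    parameterSpace:
      taskParameterDefinitions:
        - name: frame
          type: INT
          range: "1-100"
    script:
      actions:
        onRun:
          command: render-frame            # placeholder command
          args: ["{{Task.Param.frame}}"]
```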
For job templates that use this structure, a render farm implementation has a choice of how to map that job into its own structure. One option is to use chunking, so that each task runs as a Session, encapsulating the environment enters, task runs, and environment exits as a single schedulable entity. This doesn't impose implementation on the farm system, because all of that logic can live inside the task definition itself. The other option is to make the render farm scheduler aware of the Environment, Step, and Task entities, and to represent the Session directly in the scheduler. For a general job description schema to support both of these implementation choices, it needs to support specifying the job in a way that can be mapped to either, and that's what we've attempted to do.

By having clearly defined semantics, as documented by the runtime description, I believe that this kind of mapping is possible to create without requiring studio-specific changes that are too disruptive. This needs validation, of course, and I'm curious what examples everyone can come up with from their experience at different studios.

I did a pass through your example to try expressing it as OpenJD YAML, and here's what I came up with. It's definitely more verbose, and it looks like the GPU driver constraint isn't expressed well in what we have now, but I think the mapping worked well. Your syntax reads more like code to me than a data structure, and we avoided that in order to make job templates more easily consumed by GUI and pipeline tooling. Curious what you think of that.

specificationVersion: 'jobtemplate-2023-09'
name: "Tomy My First JAF job"
description: |
This is an attempt to translate jvanns' example job into OpenJD format, to help see
how well it holds up.
parameterDefinitions:
- name: "shot"
type: STRING
default: "ayz421"
- name: "phasers"
type: STRING
default: "stun"
- name: "number"
type: INT
default: 1234567
- name: "frames"
type: STRING
default: "1-999"
- name: "frames_zip"
type: STRING
default: "1000-2000:5"
steps:
- name: "baz"
parameterSpace:
taskParameterDefinitions:
- name: frame
type: INT
range: "{{Param.frames}}"
- name: pi
type: FLOAT
range: [3.14159]
script:
actions:
onRun:
command: '{{Task.File.echo}}'
embeddedFiles:
- name: echo
runnable: true
type: TEXT
data: |
#!/usr/bin/env bash
set -euo pipefail
echo "{{Task.Param.frame}}"
hostRequirements:
attributes:
- name: attr.worker.os.family
anyOf:
- linux
- name: "zip"
parameterSpace:
taskParameterDefinitions:
- name: frame
type: INT
range: "{{Param.frames_zip}}"
- name: pi
type: FLOAT
range: [3.14159]
script:
actions:
onRun:
command: '{{Task.File.echo}}'
embeddedFiles:
- name: echo
runnable: true
type: TEXT
data: |
#!/usr/bin/env bash
set -euo pipefail
echo "{{Task.Param.frame}}"
hostRequirements:
attributes:
- name: attr.worker.os.family
anyOf:
- linux
amounts:
- name: amount.worker.vcpu
min: 1
- name: amount.worker.memory
min: 1024
- name: amount.worker.gpu
min: 1
- name: "bogus"
script:
actions:
onRun:
command: 'bash'
args: ["-c", ""]
hostRequirements:
attributes:
- name: attr.gpu.driver
anyOf:
- "430.26"
amounts:
- name: amount.worker.vcpu
min: 1
- name: amount.worker.gpu
min: 1
- name: amount.worker.memory
min: 4096
-
Hi. Thanks for taking the time to read through and respond! Yes, I see the benefit (and indeed have written & used systems that demonstrate this) of the collaboration between running task (environment) and scheduler, where a comms channel can enable richer features such as real-time metrics or load-once-exec-many. In fact, this is also often required for DCC features (e.g. old netrenders and others similar to it, where there were a set of workers and one coordinator spread across the farm). I guess I just wasn't ready to approach standardising a way of supporting that between different systems that may or may not require that level of sophistication! But everyone needs to get their job into a system, so I'm down with a common description for that!

My goal was not so much 'code vs. data' (perhaps language) but rather something easily readable to humans, hence the redundant, declarative format, but overall far less verbose due to more expressive statements. The example I gave was definitely contrived though, more to demonstrate the idea of the language than a bona fide job!

The resource description is actually a separate parser project (extended from a simpler format I wrote for a Mesos-based farm scheduler) but happens to fit nicely into a string in 'JAF'. I don't have a name for it, but it's pretty expressive. Here are a few more examples (as you can see, they're lifted directly from C code comments where I'd written the example parser & type/value generator rather than rely upon ANTLR as I had for the job description). It's really just demonstrable food for thought, since it isn't used in any production system but rather written as an isolated example of a strict but flexible type+catalogue system that could be easily extended to support new farm features without a lot of code adjustments.
-
Hi there OJD community (a very new and young one at that!). Well, this is interesting to see :D A great idea surely pondered by many over the decades, myself included. I've read through these pages and have caught up on Pauline's ASWF presentation, all of which I was unaware of before (I mostly keep quiet over here in the UK). However, having experimented many times before in this domain (Framestore, ILM and at home!), you've piqued my interest again.

I love the idea of a standardised submission/description language that is the interface into what's likely to be a set of very different farm management systems at each studio. Although, and I could be wrong, it did sound like there was a desire within this spec to also impose implementation or interchange within the farm system post-submission? I think you're referring to this as the 'Session' in the runtime description page. I think that could be a tricky thing to agree on, since it's possibly too far into the weeds of studio specifics. Sure, a decent interchange format or API (remember DRMAA from Grid days?) that presents an abstraction for a job should be achievable, but beyond that, once it's in the system, I'm unsure how adoptable that would be, especially if it begins to imply an environment or protocol between [running] task and scheduler/dispatcher or whatever component a studio has for managing tasks.
I could certainly get on board with this. FWIW, since I'm never going to finish it(!), I thought I'd share an example of some humble beginnings I've drafted before. It's quite intentionally not YAML or JSON, since the idea was to first focus on something declarative and readable to a regular user, artist or TD. I relied upon ANTLR for the parsing and code generation, which I then used to get objects ready for submission etc. (with a view to having these objects be used by farm submission APIs). As you can see, it's very wordy, with lots of redundancy to make it feel more of a natural language (basically English!). It sits only in your 'How Jobs Are Constructed' remit rather than how they're run (let's see how markdown ruins this!). You could be forgiven for thinking it reminds you of Alfred ;)
That aside, I think the idea of trying to draft a submission format/API/interchange is a great one, and it'd be interesting to see how easy or difficult it would be for different studios with different workflows and systems to contribute, or even where they share commonality.