Replies: 5 comments 6 replies
-
Ah, yes, I think I alluded to this in my post too - sorry Justin, it would probably have been better posted here in reply to your comment.
-
Hi Justin,
I think the answer is a combination of yes and no. I'll elaborate...

Nothing in this spec is intended to dictate a push vs. pull model for getting work to a worker, but you are right that the Sessions concept implies either some additional logic in the scheduler/worker communications, or another solution in its place. The concept of a Session was generalized from a similar and useful runtime optimization in Thinkbox's Deadline product. The idea is that you can perform expensive setup operations just once, and then run more than one task against that setup to save compute/wall-clock time, and thus money. We've seen this be particularly useful with applications like Maya, where it can take a very long time just to open the application and load the scene; being able to leverage that load time for more than one task can speed things up significantly.

That all said, it is an optimization, and optimizations should be optional. If a scheduler and worker are stateful and designed for a Sessions-compatible state machine, then hopefully there wouldn't be an issue. If not, then imagine having available a CLI that could be given a job template, a step name, and a series of task numbers (if we linearize the step's parameter space). If that CLI were available on the worker host, as any application would be, then a submission of an OpenJD job to the render management system could be translated into the system's own native internal form by having the tasks in that system simply run the CLI.

Concretely, say you have a job template with one Step containing 200 Tasks, and the artist/TD submits it to the farm. As part of that submission, suppose the tool they use to translate OpenJD to the native form decides to pre-chunk the work into 5 Tasks per Session. Each command that the native render manager runs on the farm would then invoke the CLI with one chunk of task numbers, and when that command runs on the worker, the CLI executes its chunk of tasks within a single session.

How does that sound to you?
-
The answer I posted in #21 (comment) is related, and echoes ideas similar to Daniel's.
-
I can see how the initial decisions are based on the features of a concrete render farm implementation. Looking at the reply and the details of Sessions more, I can see a similarity to what our internal Job Description Framework (Kenobi) does for job state management. First I want to comment on the more general aspect, and then circle back to the state management...

It does appear that OpenJobDescription wants to be more of a framework and not just a specification language. That is clear from the suggestion to rely on CLI tooling like the one Daniel described.

So, all that being said, I can see how OpenJobDescription is really trying to become a Job Description Framework and not just a yaml spec. For the extra features to really work, they have to be enabled behind tooling that accompanies the framework. If this were Kenobi, one could see it implemented for other render farms as a backend for Deadline, with an appropriate state management module if an existing one were not suitable.
-
I don't want to hijack the previous comment from @justinfx too much (#20 (comment))
For this part in particular, I wanted to connect this line of thought to our recent 2/13 releases of the openjd-cli and openjd-sessions packages. I'm hoping these packages provide more context. I know @justinfx you had been trying to get more details of Kenobi up for discussion. Even if that's not possible, I'm hoping we can continue the discussion using the openjd packages and the way they work as examples. openjd-sessions in particular codifies how we run job templates.

Please give these packages a try, @justinfx, @jvanns, and let us know what you think! Also, I wanted to say thanks, because your enthusiasm helped us stay on track to get these in front of you sooner rather than later :)
-
When I was reading through the information on Sessions, I kept wondering whether the job spec implies implementation details for render farm software. The spec seems to take the perspective of decentralized render farms, where a worker node asks for work and says "ok, I am going to start servicing Job 123", then keeps asking the job for tasks until it knows there is nothing left. That makes sense where the worker node has clear boundaries for when to establish and tear down the working location for a job.
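In that decentralized picture, the session lifecycle falls out naturally from the worker's own pull loop. A minimal sketch of what I mean (all names here are illustrative, not from the spec):

```python
# Sketch of the pull model described above: the worker claims a job, does
# the expensive session setup once, pulls tasks until none remain, then
# tears down. The queue and callbacks are stand-ins for real scheduler RPCs.

def service_job(job_id, task_queue, run_task, log):
    """Run every queued task for job_id inside one session lifetime."""
    log.append(f"setup session for {job_id}")    # e.g. create temp dir, load scene
    while task_queue:                            # worker keeps asking for tasks
        task = task_queue.pop(0)
        run_task(task)
    log.append(f"teardown session for {job_id}") # clear boundary: queue is empty

log, done = [], []
service_job("Job 123", [0, 1, 2], done.append, log)
```

The worker itself knows when the job starts and ends, so setup/teardown need no extra coordination.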
Our render farm at Weta, Plow, is a centralized implementation. The worker nodes are told what to do via RPC and aren't really aware of the bounds of any given job. A worker just receives instructions like "run this task with these resources" and actively reports status. So in order to support this spec and satisfy the requirement for Sessions, does that mean we have to build logic into our server side and communicate job boundaries to participating worker nodes, to support the Session temp location on each worker?
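To make the question concrete, one way a push-model worker could approximate Session semantics is to lazily create per-job session state on the first pushed task and rely on the server to push an explicit end-of-job signal for teardown. This is only a sketch of the extra server/worker logic being asked about; every name in it is hypothetical:

```python
# Sketch: a push-model worker that doesn't know job boundaries up front.
# It lazily creates a session workspace per job, and the central server
# must explicitly signal end-of-job so the worker can tear it down.

class PushWorker:
    def __init__(self):
        self.sessions = {}  # job_id -> session workspace (e.g. a temp dir path)

    def run_task(self, job_id, task):
        # Lazy setup: the first task pushed for a job establishes the session.
        if job_id not in self.sessions:
            self.sessions[job_id] = f"/tmp/session-{job_id}"  # placeholder setup
        return (self.sessions[job_id], task)

    def end_of_job(self, job_id):
        # The server-side scheduler must tell the worker the job is done.
        self.sessions.pop(job_id, None)  # teardown the workspace

worker = PushWorker()
worker.run_task("123", 0)
worker.run_task("123", 1)   # reuses the existing session workspace
worker.end_of_job("123")    # explicit boundary pushed from the server
```

The lazy setup is cheap to add worker-side; it's the teardown signal that genuinely requires new server-side logic, which is the crux of the question above.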
I just wonder how much of the spec assumes a particular architecture for any given render farm. There are other similar aspects, like the embedded files feature. Does that imply that any supporting render farm needs to support the injection of arbitrary file data with the job submission?