@@ -348,6 +348,28 @@ There is no functional difference between the first server that was installed an

*See also*: xref:bucket-index[bucket index]

[[inference]]
==== image:images/yes.png[yes] inference (noun)
*Description*: The act a model generating outputs from input data. For example, "Inference speeds increased on the new models"
Member

Suggested change
*Description*: The act a model generating outputs from input data. For example, "Inference speeds increased on the new models"
*Description*: The process in which a trained model is loaded into memory and generates output based on input data.
For example, "The Llama-3.2-90B-Vision-Instruct-FP8-dynamic model performs inference to identify objects in an image."

Collaborator

@aireilly This is an improvement in the definition. Thanks. I'd like to rewrite this a bit. In general, we write definitions as complete sentences. How's this?

Suggested change
*Description*: The act a model generating outputs from input data. For example, "Inference speeds increased on the new models"
*Description*: AI inference is the process in which a trained model is loaded into memory and then makes predictions or performs tasks on new data. For example, "The Llama-3.2-90B-Vision-Instruct-FP8-dynamic model performs inference to identify objects in an image."

Member

Suggested change
*Description*: The act a model generating outputs from input data. For example, "Inference speeds increased on the new models"
*Description*: AI inference is the process in which a trained model is loaded into memory and then makes predictions based on input data. For example, "The Llama-3.2-90B-Vision-Instruct-FP8-dynamic model performs inference to identify objects in an image."

What about this?


*Use it*: yes

[.vale-ignore]
*Incorrect forms*:

*See also*:

[[inferencing]]
Member

Suggested change
[[inferencing]]
[[inference serving]]

Member

"Inference serving" seems to be the proper form.

Collaborator

They seem to both be correct terms, but they have different meanings.

Member

OK. So maybe we need "inferencing" and "inference serving"?

Inferencing - the process of running a model
Inference Serving - deploying the inferencing capability on a server

Something like this?

Collaborator

Yes, that looks about right.

==== image:images/yes.png[yes] inferencing (noun)
*Description*: A process by which a model processes input data, deduce information, and generates an output. For example, "The inferencing workload is distributed across multiple accelerators."
Member

Suggested change
*Description*: A process by which a model processes input data, deduce information, and generates an output. For example, "The inferencing workload is distributed across multiple accelerators."
*Description*: The act of deploying and running a trained model so that it can process input data and generate output.
For example, "Use vLLM to inference serve a trained model."

Collaborator

@aireilly Both definitions are correct. We just have to decide if we want to put both "inferencing" and "inference serving" in the SSG or choose one over the other.


*Use it*: yes

[.vale-ignore]
*Incorrect forms*:

*See also*:

[[inference-engine]]
==== image:images/yes.png[yes] inference engine (noun)
*Description*: In Red{nbsp}Hat Process Automation Manager and Red{nbsp}Hat Decision Manager, the _inference engine_ is a part of the Red{nbsp}Hat Decision Manager engine, which matches production facts and data to rules. It is often called the brain of a production rules system because it is able to scale to a large number of rules and facts. It makes inferences based on its existing knowledge and performs the actions based on what it infers from the information.
@@ -359,6 +381,29 @@ There is no functional difference between the first server that was installed an

*See also*:

[[inferenceservice]]
==== image:images/yes.png[yes] InferenceService (noun)
*Description*: In Red{nbsp}Hat OpenShift AI, `InferenceService` is the custom resource definition (CRD) used to create the `InferenceService` object. When referring to the CRD name, use `InferenceService` in monospace.


*Use it*: yes

[.vale-ignore]
*Incorrect forms*: InferenceService, inference serving

*See also*:

[[inference-serving]]
==== image:images/yes.png[yes] inference serving (verb)
Member

So is "inferencing" an accepted verb form for serving models? For example, "inferencing the quantized granite model".

Contributor Author

@kelbrown20 Jul 23, 2025

This is a great point. Early on when researching this, I rarely saw the verb form "inferencing", but I've been seeing it more and more, and it seems like it's a standard. I think part of the confusion was that PMs seemed to be using "inference serving" primarily as the single term for both inferencing and serving, which still works in certain contexts. So I'll keep this term, update the description to make that clearer, and make "inferencing" an option.

Contributor

Seems jargony to me, vs "performing/running inference" or "inference serving". (ChatGPT and Gemini support this analysis 😃 )

Member

@aireilly Jul 24, 2025

Member

Both appear widely used in academic papers.

Contributor Author

@kelbrown20 Jul 24, 2025

From what I see, I think it's the noun form vs. the verb form of inferencing. I saw inferencing used more as a noun when going through a few of those articles. I do think I need to update these, but @aireilly, what do you think of the new defs with examples?

*Description*: _Inference serving_ is the process of deploying a model onto a server for the model to inference. Use as separate words, for example, "The following charts display the minimum hardware requirements for inference serving a model".

*Use it*: yes

[.vale-ignore]
*Incorrect forms*:

*See also*:

[[infiniband]]
==== image:images/yes.png[yes] InfiniBand (noun)
*Description*: _InfiniBand_ is a switched fabric network topology used in high-performance computing. The term is both a service mark and a trademark of the InfiniBand Trade Association. Their rules for using the mark are standard ones: append the (TM) symbol the first time it is used, and respect the capitalization (including the inter-capped "B") from then on. In ASCII-only circumstances, the "\(TM)" string is the acceptable alternative.