Merged
44 changes: 13 additions & 31 deletions docs/source/user_guide/operators/recommender_operator/index.rst
@@ -2,45 +2,27 @@
Recommender
===========

The Recommender Operator utilizes advanced algorithms to provide personalized recommendations based on user behavior and preferences. This operator streamlines the data science workflow by automating the process of selecting the best recommendation algorithms, tuning hyperparameters, and extracting relevant features, ensuring that users receive the most relevant and effective suggestions for their needs.
The Recommender Operator is a low-code template built around the Surprise ``SVD`` algorithm for collaborative filtering use cases. It currently focuses on matrix factorization scenarios, and additional capabilities will be documented as they become available.

Overview
--------
Current Capabilities
--------------------

The Recommender Operator is a powerful tool designed to facilitate the creation and deployment of recommendation systems. This operator utilizes three essential input files: `items`, `users`, and `interaction`, along with specific configuration parameters to generate personalized recommendations.
- **Data inputs**: expects three tabular sources for users, items, and interactions, configured under ``spec.user_data``, ``spec.item_data``, and ``spec.interactions_data``. Each source can be loaded from local files, OCI Object Storage (``oci://`` URIs), or database queries using the standard ADS ``InputData`` configuration.
- **Model**: wraps Surprise ``SVD`` with sensible defaults. ``spec.model_name`` is reserved for future extensibility and is pinned to ``svd`` internally.
- **Outputs**: generates a recommendations CSV (``recommendations.csv`` by default) and, when enabled, an HTML summary report.
- **Configuration essentials**: ``top_k`` sets how many recommendations are generated per user, while ``user_column``, ``item_column``, and ``interaction_column`` map your dataset columns to the operator. All four are mandatory.
- **Deployment targets**: supports local execution and OCI Data Science Jobs; see :doc:`./quickstart` for the CLI flow and :doc:`./scalability` for production guidance.

**Input Files**
Future Updates
--------------

1. **Items File**: Contains information about the items that can be recommended. Each entry in this file represents an individual item and includes attributes that describe the item.

2. **Users File**: Contains information about the users for whom recommendations will be generated. Each entry in this file represents an individual user and includes attributes that describe the user.

3. **Interaction File**: Contains historical interaction data between users and items. Each entry in this file represents an interaction (e.g., a user viewing, purchasing, or rating an item) and includes relevant details about the interaction.

**Configuration Parameters**

The Recommender Operator requires the following parameters to trigger the recommendation job:

- **top_k**: Specifies the number of top recommendations to be generated for each user.
- **user_column**: Identifies the column in the users file that uniquely represents each user.
- **item_column**: Identifies the column in the items file that uniquely represents each item.
- **interaction_column**: Identifies the column in the interaction file that details the interactions between users and items.

**Functionality**

Upon execution, the Recommender Operator processes the provided input files and configuration parameters to generate a list of top-k recommended items for each user. It leverages sophisticated algorithms that analyze the historical interaction data to understand user preferences and predict the items they are most likely to engage with in the future.

**Use Cases**

This operator is ideal for a variety of applications, including:

- **E-commerce**: Recommending products to users based on their browsing and purchase history.
- **Streaming Services**: Suggesting movies, TV shows, or music based on user viewing or listening habits.
- **Content Platforms**: Proposing articles, blogs, or news stories tailored to user interests.
New capabilities—such as alternative algorithms, advanced tuning controls, or expanded deployment guidance—will be documented in this guide as they are released.

.. versionadded:: 2.11.14

.. toctree::
    :maxdepth: 1

    ./quickstart
    ./yaml_schema
    ./scalability
@@ -30,6 +30,33 @@ The Recommender Operator requires three essential input files:
2. **Items File**: Contains item information.
3. **Interactions File**: Interactions between users and items.

.. note::

    You can keep these sources local during prototyping and later point to Object Storage or Oracle databases by changing only the YAML configuration, not the operator code.

Remote Data Sources
-------------------

Point to Object Storage by swapping the local ``url`` with an ``oci://`` URI:

.. code-block:: yaml

    user_data:
      url: oci://my-bucket@my-namespace/users.csv
    item_data:
      url: oci://my-bucket@my-namespace/items.csv

Read directly from Autonomous Database (or another Oracle database) using ``sql`` and ``connect_args``:

.. code-block:: yaml

    interactions_data:
      sql: |
        SELECT user_id, movie_id, rating, event_ts
        FROM MOVIE_RECS.INTERACTIONS
      connect_args:
        wallet_dir: /home/datascience/oci_wallet

Sample Data
===========

@@ -84,7 +111,7 @@ Within the ``recommender`` folder created above there will be a ``recommender.ya
.. code-block:: yaml

    kind: operator
    type: recommendation
    type: recommender
    version: v1
    spec:
      user_data:
@@ -97,22 +124,36 @@ Within the ``recommender`` folder created above there will be a ``recommender.ya
      user_column: user_id
      item_column: movie_id
      interaction_column: rating

      output_directory:
        url: results
      recommendations_filename: recommendations.csv
      generate_report: true

Run the Recommender Operator
----------------------------

Now run the recommender job locally:
Validate the YAML and run locally:

.. code-block:: bash

    ads operator validate -f recommender.yaml
    ads operator run -f recommender.yaml

Run as an OCI Data Science Job
------------------------------

When you are ready to scale, submit the same YAML to a managed job backend:

.. code-block:: bash

    ads operator run -f recommender.yaml -b job

Use ``-b`` with a backend config (for example, ``backend_job_python_config.yaml``) to specify shape, subnet, or other runtime controls. See :doc:`../common/run` for backend details.

Results
-------

If not specified in the YAML, all results will be placed in a new folder called ``results``. Performance is summarized in the ``report.html`` file, and the recommendation results can be found in results/recommendations.csv.
If not specified in the YAML, all results will be placed in a new folder called ``results``. Performance is summarized in the ``report.html`` file, and the recommendation results can be found in ``results/recommendations.csv``.

.. code-block:: bash

@@ -0,0 +1,47 @@
===========
Scalability
===========

Cloud-Native Execution
----------------------

You can promote the same ``recommender.yaml`` from local development to OCI Data Science Jobs without rewriting your configuration.

.. code-block:: bash

    # run locally for quick validation
    ads operator run -f recommender.yaml

    # submit to OCI Data Science Jobs (serverless)
    ads operator run -f recommender.yaml -b job

The ``-b job`` flag uses your default job backend profile. To override shape, block storage, or networking, pass a backend config file to ``-b`` instead, for example:

.. code-block:: bash

    ads operator run -f recommender.yaml -b backend_job_python_config.yaml

For detailed backend options see :doc:`../common/run`.

Data Throughput and Storage
---------------------------

- Use Object Storage (``oci://`` URIs) for large interaction logs. The operator streams data through ADS I/O utilities, so you are limited primarily by network bandwidth.
- For database sources, push filtering and aggregation into the ``sql`` statement to minimise data transfer. Supply ``connect_args`` such as ``wallet_dir`` or ``dsn`` for Autonomous Database connectivity.
- When writing outputs back to Object Storage, point ``spec.output_directory.url`` to an ``oci://`` URI so downstream AI Skills or Jobs can consume the artifacts.
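
As a sketch of the last two points, the fragment below aggregates interactions inside the query and writes artifacts to Object Storage. The ``AVG`` aggregation, the 90-day window, and the ``recommender-output`` prefix are illustrative assumptions; the table, bucket, and wallet path reuse the placeholder values from :doc:`./quickstart`.

.. code-block:: yaml

    spec:
      interactions_data:
        # Aggregate in the database so only one row per user/item pair is transferred.
        # The AVG aggregation and the 90-day filter are assumptions, not operator requirements.
        sql: |
          SELECT user_id, movie_id, AVG(rating) AS rating
          FROM MOVIE_RECS.INTERACTIONS
          WHERE event_ts >= SYSDATE - 90
          GROUP BY user_id, movie_id
        connect_args:
          wallet_dir: /home/datascience/oci_wallet
      # Hypothetical output prefix; replace with a bucket location you can write to.
      output_directory:
        url: oci://my-bucket@my-namespace/recommender-output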

Batch Size and Latency
----------------------

Surprise ``SVD`` trains in-memory on the interaction matrix. To keep runs tractable:

- Start with filtered cohorts (for example, a single region or product line) to validate signal before scaling out.
- Increase compute shape (more OCPUs / memory) in the job backend when interaction counts grow beyond hundreds of thousands.
- Consider sharding your audience and running the operator multiple times if you need very large coverage; you can merge the resulting recommendation CSVs downstream.
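
One way to shard, sketched below, is to give each cohort its own copy of the YAML with a filtered query and a distinct output location. The ``region`` column, the ``'EMEA'`` value, and the folder and file names are hypothetical; substitute whatever attribute partitions your audience.

.. code-block:: yaml

    spec:
      interactions_data:
        # Hypothetical cohort filter: keep one YAML file per region.
        sql: |
          SELECT user_id, movie_id, rating
          FROM MOVIE_RECS.INTERACTIONS
          WHERE region = 'EMEA'
        connect_args:
          wallet_dir: /home/datascience/oci_wallet
      # Separate folder and file name per shard (names are illustrative).
      output_directory:
        url: results/emea
      recommendations_filename: recommendations_emea.csv

Giving each shard its own ``recommendations_filename`` makes the CSVs easy to tell apart when you merge them downstream.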

Operational Tips
----------------

- Set ``spec.generate_report`` to ``false`` for automated batch runs to reduce artifact size.
- Version control your YAML files and backend configs alongside infrastructure-as-code scripts so intake reviews can track exactly how the operator is used.
- Monitor job logs in OCI Data Science to confirm the operator runs within expected time windows and to capture Surprise training diagnostics.
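
For a headless batch run, the first tip amounts to a one-line change in the ``spec`` section. The fragment is a sketch and omits the data source and column fields, which stay as they are in your existing YAML.

.. code-block:: yaml

    spec:
      # Skip the HTML report when only the recommendations CSV is needed.
      generate_report: false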
@@ -0,0 +1,120 @@
===========
YAML Schema
===========

The ``recommender.yaml`` file orchestrates data access, configuration, and output options for the Recommender Operator. This section walks through every field under ``spec`` so you can adapt the template to your environment.

Example Configuration
---------------------

.. code-block:: yaml

    kind: operator
    type: recommender
    version: v1
    spec:
      user_data:
        url: oci://my-bucket@my-namespace/users.csv
      item_data:
        url: oci://my-bucket@my-namespace/items.csv
      interactions_data:
        sql: |
          SELECT user_id, movie_id, rating, event_ts
          FROM MOVIE_RECS.INTERACTIONS
        connect_args:
          wallet_dir: /home/datascience/oci_wallet
      top_k: 10
      user_column: user_id
      item_column: movie_id
      interaction_column: rating
      recommendations_filename: recommendations.csv
      generate_report: true

Configuration Reference
-----------------------

.. list-table:: Recommender Operator Specification
    :widths: 20 10 10 20 40
    :header-rows: 1

    * - Field
      - Type
      - Required
      - Default
      - Description

    * - user_data
      - dict
      - Yes
      - {"url": "user_data.csv"}
      - Source for user attributes. Accepts the standard ADS ``InputData`` options such as ``url``, ``sql``, ``table_name``, ``connect_args``, and column filters. Remote URIs (``oci://``) and database queries are both supported.

    * - item_data
      - dict
      - Yes
      - {"url": "item_data.csv"}
      - Source for item attributes. Shares the same structure and connectivity options as ``user_data``.

    * - interactions_data
      - dict
      - Yes
      - {"url": "interactions_data.csv"}
      - Historical interactions between users and items. Use this to supply implicit or explicit feedback (for example, ratings or click events). Supports the same loaders as ``user_data``.

    * - top_k
      - integer
      - Yes
      - 1
      - Number of recommendations returned per user. Increase this when downstream applications (such as AI Skills) need a wider candidate list.

    * - user_column
      - string
      - Yes
      - user_id
      - User identifier column present in both ``user_data`` and ``interactions_data``.

    * - item_column
      - string
      - Yes
      - item_id
      - Item identifier column present in both ``item_data`` and ``interactions_data``.

    * - interaction_column
      - string
      - Yes
      - rating
      - Interaction strength column used to train Surprise ``SVD``. For implicit feedback, convert events to a numeric score before loading.

    * - output_directory
      - dict
      - No
      - Auto-generated temp path
      - Controls where artifacts are written. Provide ``url`` (local path or ``oci://``) and optional ``name`` to organize outputs. Leave unset to let ADS create a timestamped local directory.

    * - recommendations_filename
      - string
      - No
      - recommendations.csv
      - Customise the recommendations artifact name inside ``output_directory``.

    * - generate_report
      - boolean
      - No
      - true
      - Toggles HTML report creation. Disable when running headless jobs where only CSV output is required.

    * - report_filename
      - string
      - No
      - report.html
      - Name of the HTML summary report file saved under ``output_directory``.

    * - model_name
      - string
      - No
      - svd
      - Reserved for future model expansion. The only supported value today is ``svd``; other values raise ``UnSupportedModelError``.

.. note::

    The operator validates the schema before execution. If you pass extra keys, they will be ignored or trigger a validation error. Use the ``ads operator validate -f recommender.yaml`` command to catch issues early.
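
As a starting point, a local configuration containing only the required fields might look like the sketch below. The file names and column names mirror the defaults listed in the table, and ``top_k: 5`` is an arbitrary choice; the optional fields are omitted so that their defaults apply.

.. code-block:: yaml

    kind: operator
    type: recommender
    version: v1
    spec:
      # Local CSV files; names follow the defaults in the reference table.
      user_data:
        url: user_data.csv
      item_data:
        url: item_data.csv
      interactions_data:
        url: interactions_data.csv
      # Arbitrary value for illustration.
      top_k: 5
      user_column: user_id
      item_column: item_id
      interaction_column: rating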