Adds fine tuning contrib example #667

skrawcz · 2024-01-28T07:40:24Z

This module shows how to perform fine-tuning using Hamilton. It is a basic example that uses the transformers library to pull a FLAN model from Hugging Face and fine-tune it.

One should be able to adapt this code to their needs.
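Hamilton builds a dataflow from plain Python functions. A function's name is the value it produces, and its parameter names declare what it depends on. The dependency-free sketch below writes a few hypothetical preprocessing steps in that style and runs them with a toy resolver. The node names, the toy rows, and `run` are illustrative assumptions. They are not code from this PR or Hamilton's driver API. In the actual module, the Hamilton driver does the wiring and the steps call into transformers.

```python
import inspect
from typing import Any, Callable, Dict, List

Row = Dict[str, str]


# Hypothetical Hamilton-style nodes: each function name is the value it produces,
# and each parameter name is either another node or an input passed at run time.
def raw_dataset(dataset_rows: List[Row]) -> List[Row]:
    """Stand-in for the step that loads the training data."""
    return list(dataset_rows)


def cleaned_dataset(raw_dataset: List[Row]) -> List[Row]:
    """Drops examples with an empty prompt or target."""
    return [row for row in raw_dataset if row["input"] and row["output"]]


def num_examples(cleaned_dataset: List[Row]) -> int:
    """Counts the examples that would be used for fine-tuning."""
    return len(cleaned_dataset)


def run(functions: List[Callable[..., Any]], outputs: List[str], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Toy resolver (NOT Hamilton's driver): computes each requested node by
    first computing the nodes named by its parameters, caching every result."""
    registry = {fn.__name__: fn for fn in functions}
    cache: Dict[str, Any] = dict(inputs)

    def compute(name: str) -> Any:
        if name not in cache:
            fn = registry[name]  # raises KeyError for names that are neither inputs nor nodes
            kwargs = {param: compute(param) for param in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    return {name: compute(name) for name in outputs}


rows = [
    {"input": "Translate to German: hello", "output": "hallo"},
    {"input": "", "output": "dropped because the prompt is empty"},
]
print(run([raw_dataset, cleaned_dataset, num_examples], ["num_examples"], {"dataset_rows": rows}))
# {'num_examples': 1}
```

Because each step is a separately named function, adapting the example to a different dataset or model means replacing individual nodes rather than editing one long script.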

For new dataflows:

Do you have the following?

How I tested this

Ran it locally in a Docker container.

Notes

We might want to invest in making it possible for a single contribution to span multiple modules, since that would help segment the code better.

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Dataflow documentation has been updated if adding/changing functionality.

elijahbenizzy

I think we can simplify a bit by:

Removing inference from this: that should be a snippet of code in the README or an ipynb or something
Processing features prior to splitting the dataset
Fixing up/removing some of the docstrings

contrib/hamilton/contrib/user/skrawcz/fine_tuning/README.md

contrib/hamilton/contrib/user/skrawcz/fine_tuning/__init__.py

elijahbenizzy

Can we tokenize the dataset before splitting? That would eliminate a lot of duplicate code and make it easier to read.

Otherwise this looks fine. I think you should break it out because it's doing too much and I find it hard to follow, but let's just get it out.

elijahbenizzy

Looks fine. Maybe _preprocess_function should do the dataset .map call.

Adds fine tuning contrib example

9ec145f

This module shows how one can perform fine tuning using Hamilton. It is a basic example using the transformers library that connects to huggingface to pull and fine tune a FLAN model. One should be able to adapt this code to their needs.

skrawcz force-pushed the contrib/fine-tuning branch from 2012c89 to 9ec145f January 28, 2024 07:43

skrawcz temporarily deployed to github-pages January 28, 2024 07:43 with GitHub Actions

elijahbenizzy reviewed Jan 28, 2024

Updates to PR from feedback.

15dc34c

skrawcz temporarily deployed to github-pages January 29, 2024 01:38 with GitHub Actions

elijahbenizzy reviewed Jan 29, 2024

Adds extra comments for first tokenization functions

d8d2b1c

skrawcz temporarily deployed to github-pages January 29, 2024 17:17 with GitHub Actions

elijahbenizzy reviewed Jan 30, 2024

elijahbenizzy self-requested a review January 30, 2024 01:06

elijahbenizzy approved these changes Jan 30, 2024

Updates docker file comment

5e9e26d

So that people know to change it to match their dataset.

skrawcz temporarily deployed to github-pages January 30, 2024 01:10 with GitHub Actions

Bumps version to get 0.0.7 out

4b92139

With latest contrib additions.

skrawcz temporarily deployed to github-pages January 30, 2024 01:12 with GitHub Actions

skrawcz merged commit 4fa02a1 into main Jan 30, 2024

skrawcz deleted the contrib/fine-tuning branch January 30, 2024 01:24