Michele Cosi edited this page Mar 20, 2025 · 21 revisions

Functional Open Science Skills for AI/ML Applications


Spring 2025 Workshop: Functional Open Science Skills for AI/ML Applications

This workshop helps graduate students at public universities develop the skills and learn the tools required in today's AI/ML-focused science.

Ranging from the basic moving parts of Open Science to AI's role within it, this workshop explains where to obtain compute, covers software environments and reproducibility, discusses the role of workflows, and culminates in building an end-to-end Machine Learning (ML) workflow.

Required Skills

| Skill | Description |
|---|---|
| Basic understanding of Linux | This workshop assumes a basic understanding of the Linux operating system. |
| Familiarity with Machine Learning (and related software) | While not required, a basic understanding of Machine Learning and its concepts is recommended. |
| Enthusiasm for learning new computational skills | A strong interest in learning new computational skills is essential for success in this workshop. |

Workshop Program

Time: Tuesdays at 2 PM

REGISTRATION: Link

(For the Zoom link and in-person information, please sign up on the U of A Data Science Institute DataLab website.)

All sessions are recorded and uploaded to the University of Arizona's DataLab YouTube channel, where you can also find the other DataLab series: Natural Language Processing (NLP), Generative AI, and NextGen Geospatial.

| Date | Title/Topic | Description | Instructors | Material Link/Recording |
|---|---|---|---|---|
| (01/28) | The moving parts of Functional Open Science | Explore the essential components of Open Science, including reproducibility, version control with Git, the importance of workflows, and tools and resources such as Hugging Face. This session introduces the ecosystem that enables modern science to be collaborative, transparent, and scalable. Participants will learn how to use containers to ensure reproducibility, leverage Git for version control, and apply platforms like Hugging Face to machine learning workflows. | Michele Cosi / Carlos Lizárraga | Material, Recording |
| (02/04) | AI's Role and Tools in Open Science | Learn how AI is changing the world of Open Science. This session covers the principles of Open Science and the transformative role of AI in driving research forward, discussing key AI/ML tools such as PyTorch alongside open datasets and community-driven resources. Attendees will explore how AI enhances reproducibility, promotes transparency, and accelerates discovery. | Michele Cosi / Carlos Lizárraga / Enrique Noriega | Material, Recording |
| (02/11) | Learning to Work in the Cloud: JetStream2 and Reproducibility | Researchers often have all the knowledge necessary for their work but lack one key component: compute. In this session, attendees will learn how JetStream2 addresses the need for GPUs and other compute resources, as well as reproducibility. Learn how to access and use JetStream2, a cloud computing platform, for scalable data processing, training ML models, and managing collaborative projects. This session covers the basics of cloud infrastructure, setting up accounts, and using JetStream2 effectively for scientific research. | Michele Cosi | Material, Recording |
| (02/18) | Handling Images & Videos pt. 1 | Discover techniques for processing and analyzing image and video data. This session introduces foundational tools and libraries, such as OpenCV and Gradio, for handling visual data in machine learning workflows. Participants will learn how to preprocess images, handle different file formats, and extract meaningful features for analysis. | Michele Cosi | Material, Recording |
| (02/25) | Handling Images & Videos pt. 2 | Continuing from the previous workshop, this session solidifies the concepts and techniques for handling images and videos in order to train and test AI/ML models. | Michele Cosi | Material, Recording |
| (03/04) | Training and Testing Models | This session covers critical concepts like data splitting (training, validation, and test sets), evaluating model performance, and hyperparameter tuning. Participants will explore common pitfalls and best practices for achieving reliable results, using concepts and code developed in previous sessions. | Michele Cosi, Mithün Paul | Material, Recording |
| (03/18) | End-to-end ML Workflow pt. 1 | The core of the workshop: attendees will apply the tools and techniques acquired so far. In this and the following session, attendees will learn how to build a complete AI/ML pipeline, covering data preparation, labeling, training, testing, and real-world applications. This session focuses on the first part of the pipeline: data preparation and labeling. | Michele Cosi, Carlos Lizárraga, Leonardo Soto Hernandez, Mithün Paul | Material, Recording |
| (03/25) | End-to-end ML Workflow pt. 2 | Continuing from the previous session, this workshop uses the previously labeled data to train, test, and run a model. | Michele Cosi, Carlos Lizárraga, Leonardo Soto Hernandez, Mithün Paul | Material, Recording |
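
The data-splitting idea from the Training and Testing Models session (03/04) can be illustrated with a short sketch in plain Python. This is not the workshop's own material: the helper name `split_dataset`, the 70/15/15 ratios, and the fixed seed are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T],
    train_frac: float = 0.70,
    val_frac: float = 0.15,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items and split them into train/validation/test lists.

    Hypothetical helper for illustration; the default ratios and seed are
    assumptions. Whatever remains after the train and validation slices
    becomes the test set.
    """
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must be positive and sum to less than 1")

    shuffled = list(items)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    # round() avoids off-by-one surprises from floating-point fractions.
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    samples = list(range(100))  # stand-in for image file paths or labeled records
    train, val, test = split_dataset(samples)
    print(len(train), len(val), len(test))  # 70 15 15
```

Fixing the seed keeps the split identical across runs, which ties back to the workshop's reproducibility theme. The validation set is typically used for hyperparameter tuning, while the test set is held back for a final, unbiased evaluation.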


Updated: 03/18/2025 (M. Cosi)

UArizona Data Lab, Data Science Institute, University of Arizona.

CC BY-NC-SA 4.0