Data Analysis Workshop: Titanic Dataset Exploration

Overview

This workshop demonstrates how to perform data analysis on the Titanic dataset using Python, focusing on leveraging AI-assisted techniques with LangChain and GPT models. Participants will learn how to load data, explore its contents, visualize relationships, and interpret results using a combination of traditional data analysis methods and AI-powered insights.

Prerequisites

  • Basic understanding of Python
  • Familiarity with data analysis concepts
  • Google Colab account (optional, but recommended for easy setup)
  • An OpenAI API key

Setup

  1. Open the notebook in Google Colab or your preferred Jupyter environment.

  2. Upload the Data_Science_Workshop Jupyter notebook file, or use the link shared during the workshop.

  3. Run the first cell to install required packages:

    • langgraph
    • langchain
    • langchain-openai
    • pandas
    • langchain_core
    • langchain_experimental
    • pydantic
  4. Set up your OpenAI API key when prompted.
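A typical setup cell covering steps 3 and 4 might look like the sketch below. The package list matches the one above; the prompt text is illustrative, and in Colab the install line is run with a leading `!` (shown here as a comment):

```python
# Notebook-style setup cell: install the packages, then configure the key.
# In Colab/Jupyter, uncomment and run the install line first:
# !pip install langgraph langchain langchain-openai pandas langchain_core langchain_experimental pydantic
import os
from getpass import getpass

# Prompt for the key only when the environment does not already provide one,
# so the notebook can be re-run without re-entering it.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
```

Storing the key in an environment variable keeps it out of the notebook's saved output, and `langchain-openai` picks it up automatically.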

Workshop Outline

  1. Data Acquisition

    • Download the Titanic dataset
    • Load the data into a pandas DataFrame
  2. Initial Data Exploration

    • Use AI to generate questions about the dataset
    • Examine basic statistics and structure of the data
  3. Data Visualization

    • Create scatter plots to visualize relationships between variables
    • Use AI to interpret and explain the visualizations
  4. Advanced Analysis

    • Perform deeper analysis on passenger demographics and survival rates
    • Use AI to generate insights and answer complex questions about the data
  5. AI-Assisted Exploration

    • Utilize LangChain and GPT models to create a conversational interface for data exploration
    • Demonstrate how AI can assist in formulating queries and interpreting results
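Steps 1 and 2 of the outline can be sketched as follows. A short inline CSV stands in for the downloaded titanic.csv (the real dataset has more columns and 891 rows); the column names match the actual Titanic schema:

```python
# Sketch of Data Acquisition + Initial Data Exploration on a tiny sample.
import io
import pandas as pd

csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.28
3,1,3,female,26,7.93
4,1,1,female,35,53.10
5,0,3,male,35,8.05
"""
# In the workshop this would be pd.read_csv("titanic.csv") on the full file.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                              # → (5, 6)
print(df["Survived"].mean())                 # overall survival rate → 0.6
print(df.groupby("Sex")["Survived"].mean())  # survival rate by sex
```

The same `describe()`, `groupby()`, and `mean()` calls scale unchanged to the full dataset once it is downloaded.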

Key Components

  • LangChain: Used for creating AI-powered workflows
  • OpenAI GPT: Provides natural language processing capabilities
  • Pandas: Used for data manipulation and analysis
  • Matplotlib: Used for data visualization

Workflow

  1. Set up the AI-assisted analysis pipeline using LangChain
  2. Load and preprocess the Titanic dataset
  3. Use AI to generate initial insights and questions about the data
  4. Create visualizations based on AI suggestions
  5. Interpret results with AI assistance
  6. Iterate through analysis steps, asking follow-up questions and generating new visualizations as needed
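The pipeline above can be sketched with LangChain's experimental pandas agent. This is a hedged illustration, not the workshop notebook's exact code: the model name, prompt, and three-row DataFrame are placeholders, and the agent step only runs when an OpenAI key is configured:

```python
# Minimal sketch of an AI-assisted analysis loop over a DataFrame.
import os
import pandas as pd

# Stand-in for the loaded Titanic data.
df = pd.DataFrame({"Age": [22, 38, 26], "Survived": [0, 1, 1]})

if os.environ.get("OPENAI_API_KEY"):
    from langchain_openai import ChatOpenAI
    from langchain_experimental.agents import create_pandas_dataframe_agent

    agent = create_pandas_dataframe_agent(
        ChatOpenAI(model="gpt-4o-mini", temperature=0),
        df,
        allow_dangerous_code=True,  # the agent executes generated Python locally
    )
    # Natural-language query; the agent writes and runs pandas code to answer.
    result = agent.invoke({"input": "What fraction of passengers survived?"})
    print(result["output"])
else:
    print("Set OPENAI_API_KEY to run the agent step.")
```

Follow-up questions are just further `invoke` calls, which is what makes the iterate step (6) a conversational loop rather than a fixed script.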

Conclusion

By the end of this workshop, participants will have gained hands-on experience in:

  • Using AI to assist in data analysis tasks
  • Exploring and visualizing dataset characteristics
  • Interpreting complex relationships in data
  • Leveraging natural language interfaces for data exploration

This workshop showcases how AI can enhance traditional data analysis techniques, providing a powerful toolset for deriving insights from complex datasets.

Credits and Acknowledgments

Contributors

  • Vipul Kumar - Demonstrating the use of LangGraph
  • oomti - Creating this workshop notebook

Third-Party Libraries

  • LangChain - Library for creating agentic workflows

Resources

  • OpenAI - OpenAI GPT-4 LLM API

Special Thanks

  • Alex Grazer - Hosting and organizing the event
  • Valerio Ficcadenti - Co-hosting and organizing the event

Sponsors

  • LSBU - For providing the venue for our workshop
