COLX 523 - Advanced Corpus Linguistics

Group Repository: David Kang, Daoming Liu, Jacob Nadal, Nicole Lopez

Project Overview

This is a fork of a repository dedicated to a group project for COLX 523 - Advanced Corpus Linguistics. The project involves building an annotated corpus with a web interface, drawing on internet text sources to collect a sizable corpus (~1 million words).

Repository Structure

Movie-Review-Corpus-Annotation/
├── data/                # Raw and processed data files
├── documentation/       # Project-related documentation, meeting notes, and reports
├── src/                 # Source code for data collection, processing, and annotation
├── web_app/             # Code for the interactive interface
│     ├── frontend/      # React application code for the user interface
│     └── backend/       # FastAPI application code for serving the API and static files
├── .gitignore           # Files and directories to be ignored by Git
├── .dockerignore        # Files and directories to be ignored by Docker
├── instructions_movie-reviews.md   # Instructions for building and running the Docker image
├── Dockerfile           # Dockerfile used to build a Docker image containing the entire project
└── README.md            # Repository overview and guidelines

To create and run the Docker image:

Please refer to instructions_movie-reviews.md in the repository root.

Important File Locations (updated for sprint 3)

Annotation Data

Human annotations: Stored in Excel files under src/analysis/modified_corpus_batches/xlsx/
GPT annotations: Stored in JSON files under src/analysis/modified_corpus_batches/json/
Code for annotation and evaluation: src/analysis/gpt and src/analysis/helper_methods
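
For a quick check that the annotation files are where this section says they are, the sketch below lists them by annotator type. It is a minimal illustration, not part of the project code: the function name list_annotation_files is hypothetical, and it assumes the files use the .xlsx and .json extensions and that the script is run from the repository root.

```python
from pathlib import Path

# Directory taken from this README; the rest of this script is illustrative.
BATCH_DIR = Path("src/analysis/modified_corpus_batches")


def list_annotation_files(repo_root="."):
    """Return sorted annotation file paths, grouped by annotator type."""
    base = Path(repo_root) / BATCH_DIR
    return {
        # Assumption: human annotations use the .xlsx extension.
        "human": sorted((base / "xlsx").glob("*.xlsx")),
        # Assumption: GPT annotations use the .json extension.
        "gpt": sorted((base / "json").glob("*.json")),
    }


if __name__ == "__main__":
    for source, paths in list_annotation_files().items():
        print(f"{source}: {len(paths)} file(s)")
        for path in paths:
            print(f"  {path}")
```

If either directory is missing, the corresponding list is simply empty, which makes the script safe to run on a partial checkout.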

Inter-annotator agreement study and plan for the interface: Both are kept up to date under documentation/
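
As background for readers new to agreement studies, the sketch below computes Cohen's kappa, one common measure of agreement between two annotators. This README does not say which metric the study in documentation/ uses, so treat the choice of kappa as an assumption; the function name cohens_kappa and the "pos"/"neg" labels are hypothetical and do not reflect the project's annotation scheme.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label independently,
    # estimated from each annotator's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    human = ["pos", "neg", "pos", "neg"]
    gpt = ["pos", "neg", "neg", "neg"]
    print(cohens_kappa(human, gpt))  # observed 0.75, expected 0.5, so kappa is 0.5
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is often reported alongside or instead of simple accuracy.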
