Indian-Address-Parser-and-Entity-Matching

Overview

This project implements a custom Named Entity Recognition (NER) system for address parsing by fine-tuning a DistilBERT model. The model is trained on a dataset of addresses and their associated Named Entity labels.

Dataset

The dataset used in this project cannot be shared for privacy and confidentiality reasons. It pairs each address text with one label per token, as in the example below.

address : 201, Main Street, Pleasantville, New York, 123456

labels : flat_apartment_number street street sub_locality city_town city_town pincode
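To make the pairing in the example concrete, here is a minimal sketch that splits the address into tokens and lines them up with the labels. The tokenization rule (split on commas and whitespace, drop punctuation) and the helper name tokenize_address are assumptions for illustration; the project's actual preprocessing may differ.

```python
import re


def tokenize_address(address):
    """Split an address on commas and whitespace (assumed tokenization rule)."""
    return [token for token in re.split(r"[,\s]+", address) if token]


address = "201, Main Street, Pleasantville, New York, 123456"
labels = (
    "flat_apartment_number street street sub_locality "
    "city_town city_town pincode"
).split()

tokens = tokenize_address(address)

# Each token needs exactly one label; fail early if they are out of sync.
if len(tokens) != len(labels):
    raise ValueError(f"{len(tokens)} tokens but {len(labels)} labels")

pairs = list(zip(tokens, labels))
for token, label in pairs:
    print(f"{token:15s} {label}")
```

Under this rule the example yields seven tokens, one for each of the seven labels.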

The dataset is split into three subsets:

Training set: Used to train the model.
Development set: Used for model validation during training.
Test set: Used to evaluate the final model performance.
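A three-way split like the one above can be sketched as follows. The 80/10/10 proportions, the fixed seed, and the function name split_dataset are assumptions; the README does not state the actual split sizes.

```python
import random


def split_dataset(examples, dev_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle examples and split them into train, dev, and test lists.

    The fractions and seed are illustrative defaults, not the project's values.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)

    n_dev = int(len(shuffled) * dev_fraction)
    n_test = int(len(shuffled) * test_fraction)

    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


train_set, dev_set, test_set = split_dataset(range(10))
print(len(train_set), len(dev_set), len(test_set))
```

Fixing the seed keeps the three subsets stable across runs, so test-set results stay comparable.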

Model Architecture

The NER system is based on the DistilBERT architecture from Hugging Face's Transformers library. The model is fine-tuned for token classification with a specified number of output labels corresponding to the unique Named Entity tags present in the dataset.
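As a rough sketch of how the number of output labels ties to the tag set, the snippet below builds label/id mappings and passes them to DistilBertForTokenClassification. The LABELS list contains only the tags visible in the example address and may be a subset of the real tag set; the checkpoint name "distilbert-base-uncased" and the helper build_model are assumptions.

```python
# Tags taken from the example address; the full tag set may be larger.
LABELS = [
    "flat_apartment_number",
    "street",
    "sub_locality",
    "city_town",
    "pincode",
]

label2id = {label: index for index, label in enumerate(LABELS)}
id2label = {index: label for label, index in label2id.items()}


def build_model(checkpoint="distilbert-base-uncased"):
    """Load DistilBERT with a token-classification head sized to LABELS.

    transformers is imported lazily so the mappings above can be used
    without the library installed. The checkpoint name is an assumption.
    """
    from transformers import DistilBertForTokenClassification

    return DistilBertForTokenClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=id2label,
        label2id=label2id,
    )
```

Calling build_model() downloads the pretrained weights and attaches a freshly initialized classification layer with one output per tag.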

Training

The training loop trains the model on the training set and evaluates it on the development set after each pass to monitor performance and detect overfitting. Optimization uses the Stochastic Gradient Descent (SGD) optimizer with a specified learning rate.
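The loop described above might look roughly like the sketch below. It assumes a Hugging Face style model whose forward pass returns an object with a .loss attribute when the batch includes labels, and data loaders that yield dictionaries of tensors. The early-stopping rule, the should_stop helper, and the default lr, epochs, and patience values are assumptions added for illustration, not details confirmed by the project.

```python
def should_stop(dev_losses, patience=2):
    """Return True when the last `patience` epochs failed to beat the
    best development loss seen before them (assumed stopping rule)."""
    if len(dev_losses) <= patience:
        return False
    best_before = min(dev_losses[:-patience])
    return all(loss >= best_before for loss in dev_losses[-patience:])


def train(model, train_loader, dev_loader, lr=0.01, epochs=10, patience=2):
    """Train with SGD and track the mean development loss per epoch."""
    import torch  # imported lazily so should_stop works without PyTorch

    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    dev_losses = []

    for _ in range(epochs):
        # One pass over the training set.
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()

        # Evaluate on the development set without tracking gradients.
        model.eval()
        total = 0.0
        with torch.no_grad():
            for batch in dev_loader:
                total += model(**batch).loss.item()
        dev_losses.append(total / max(len(dev_loader), 1))

        if should_stop(dev_losses, patience):
            break

    return dev_losses
```

The returned list of development losses can be plotted or logged to see where the model starts to overfit.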

Evaluation

After training, the model is evaluated on the test set using accuracy, F1 score, precision, and recall. These metrics indicate how well the model identifies Named Entities and parses addresses.
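For reference, the four metrics can be computed at the token level as in the sketch below. Whether the project scores tokens or whole entities, and how it averages across labels, is not stated, so token-level scoring with a macro-averaged F1 is an assumption here, as is the function name token_metrics.

```python
from collections import Counter


def token_metrics(gold, pred):
    """Token-level accuracy plus per-label precision, recall, and F1."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed

    per_label = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(gold) if gold else 0.0
    f1_values = [scores["f1"] for scores in per_label.values()]
    macro_f1 = sum(f1_values) / len(f1_values) if f1_values else 0.0
    return {"accuracy": accuracy, "macro_f1": macro_f1, "per_label": per_label}


gold = ["flat_apartment_number", "street", "street", "sub_locality",
        "city_town", "city_town", "pincode"]
# Hypothetical prediction with one mistake: sub_locality tagged as city_town.
pred = ["flat_apartment_number", "street", "street", "city_town",
        "city_town", "city_town", "pincode"]

result = token_metrics(gold, pred)
print(result["accuracy"], result["macro_f1"])
```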

Usage

To use the trained model for address parsing and Named Entity Recognition:

Ensure all required libraries are installed (listed in requirements.txt).
Load the model and metadata from the saved file (distilbert_ner_model_meta.pth).
Use the model to predict Named Entities and parse addresses by providing input text.
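Once the model has produced one label per token, the last step is turning those labels into structured fields. The sketch below shows one way to do that; the function name group_entities is hypothetical, and the labels are written out by hand to stand in for model predictions. Loading distilbert_ner_model_meta.pth is not shown because the layout of that file is not documented here.

```python
def group_entities(tokens, labels):
    """Collect tokens that share a label into one field per label.

    Tokens with the same label are joined in order of appearance, even if
    they are not adjacent; a stricter parser might split such cases.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    grouped = {}
    for token, label in zip(tokens, labels):
        grouped.setdefault(label, []).append(token)
    return {label: " ".join(parts) for label, parts in grouped.items()}


tokens = ["201", "Main", "Street", "Pleasantville", "New", "York", "123456"]
# Stand-in for model output; a real run would take these from the model.
predicted = ["flat_apartment_number", "street", "street", "sub_locality",
             "city_town", "city_town", "pincode"]

parsed = group_entities(tokens, predicted)
for field, value in parsed.items():
    print(f"{field}: {value}")
```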

Model File

You can download the trained model file from the following link:

distilbert_ner_model_meta.pth

Images

This image shows the output of the model for a sample address text. The model correctly identifies the Named Entities and parses the address into structured information.
