WM-811K_semiconductor_wafer_map_pattern_classified

Data Introduction

The WM-811K semiconductor dataset can be downloaded from Kaggle or here. In the semiconductor industry, engineers rely on wafer map patterns from CP Yield, WAT (Wafer Acceptance Test), and Particle data to identify process issues. However, classifying these wafer map patterns into groups without manual intervention is a major challenge. Many papers have investigated this problem; here I present the results of applying deep learning to it.

The dataset contains 811,457 images, but only 172,950 of them have manual labels. There are nine labels: 0, 1, 2, 3, 4, 5, 6, 7, and 8. Label 8 (no pattern) accounts for 85.2% of the labeled images.

The remaining eight patterns (14.8%) are:

Center : 4294(2.5%)
Donut : 555(0.3%)
Edge-Loc : 5189(3.0%)
Edge-Ring : 9680(5.6%)
Loc : 3593(2.1%)
Random : 866(0.5%)
Scratch : 1193(0.7%)
Near-full : 149(0.1%)

QINGYI, the owner of the dataset on Kaggle, has conducted an extensive data exploration and provided an image showing the other eight patterns (apologies for not regenerating it here). If we predict label 8 for every image, we achieve an 85.2% baseline accuracy on this dataset. However, this gives a false sense of confidence in the model's performance: a model can score well overall while classifying the remaining 14.8% of the data poorly.
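The baseline figure can be checked directly from the class counts listed above. The snippet below is a small arithmetic check, not part of the original project; the variable names are my own.

```python
# Class counts for the eight defect patterns, copied from the list above.
defect_counts = {
    "Center": 4294,
    "Donut": 555,
    "Edge-Loc": 5189,
    "Edge-Ring": 9680,
    "Loc": 3593,
    "Random": 866,
    "Scratch": 1193,
    "Near-full": 149,
}
total_labeled = 172950

defect_total = sum(defect_counts.values())   # images with a defect pattern
none_total = total_labeled - defect_total    # images labeled "none" (label 8)

# Accuracy of a classifier that always answers "none".
baseline_accuracy = none_total / total_labeled

print(f"defect images: {defect_total}")
print(f"none images: {none_total}")
print(f"always-none baseline accuracy: {baseline_accuracy:.1%}")
```

The defect counts sum to 25,519, which leaves 147,431 "none" images and gives the 85.2% baseline.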

Therefore, my goal is to separate the eight patterns among the 25,519 images that do not belong to the "none" category. It is worth noting that many researchers have used all 172,950 labeled images to claim high accuracy, which in my view may be misleading.

Data split

I split the 25,519 images into 15,316 (60%) for training, 3,823 (15%) for validation, and 6,380 (25%) for testing, and I use the test set to verify my model.

train : 15316(60%)

Center : 2598(17.0%)
Donut : 326(2.1%)
Edge-Loc : 3081(20.1%)
Edge-Ring : 5873(38.3%)
Loc : 2106(13.8%)
Random : 516(3.4%)
Scratch : 719(4.7%)
Near-full : 97(0.6%)

validation : 3823(15%)

Center : 640(16.7%)
Donut : 78(2.0%)
Edge-Loc : 779(20.4%)
Edge-Ring : 1426(37.3%)
Loc : 571(14.9%)
Random : 124(3.2%)
Scratch : 186(4.9%)
Near-full : 19(0.5%)

test : 6380(25%)

Center : 1056(16.6%)
Donut : 151(2.4%)
Edge-Loc : 1329(20.8%)
Edge-Ring : 2381(37.3%)
Loc : 916(14.4%)
Random : 226(3.5%)
Scratch : 288(4.5%)
Near-full : 33(0.5%)
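A 60/15/25 split like the one above can be sketched as a seeded random shuffle. This is only an illustration under my own assumptions: the function name `split_indices`, the seed, and the rounding rule are hypothetical, and the exact procedure used for this repository is not documented, so the resulting sizes can differ from the counts above by a few images.

```python
import random

def split_indices(n_items, train_frac=0.60, valid_frac=0.15, seed=0):
    """Shuffle item indices and cut them into train/validation/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)    # seeded so the split is repeatable
    n_train = round(n_items * train_frac)
    n_valid = round(n_items * valid_frac)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]      # whatever is left (about 25%)
    return train, valid, test

train_idx, valid_idx, test_idx = split_indices(25519)
print(len(train_idx), len(valid_idx), len(test_idx))
```

A plain random split keeps class proportions only approximately, which matches the small differences between the three tables above (for example, Donut at 2.1%, 2.0%, and 2.4%). A stratified split would be an alternative if exact proportions matter.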

Model training progress

I trained for 40 epochs and reached 96.4% accuracy on the training set and 93.3% on the validation set. Below is my training history.

Test data result

On the test set, I get 92.95% accuracy, which is close to the validation accuracy. Below is the confusion matrix of the classification results.
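For readers who want to build a similar summary from their own predictions, here is a minimal sketch of a confusion matrix, overall accuracy, and per-class recall in plain Python. The helper names and the toy labels are my own; they are not the repository's code or results. Per-class recall is included because, with classes as imbalanced as these, overall accuracy can hide weak performance on rare patterns such as Near-full.

```python
from collections import Counter

PATTERNS = ["Center", "Donut", "Edge-Loc", "Edge-Ring",
            "Loc", "Random", "Scratch", "Near-full"]

def confusion_matrix(true_labels, predicted_labels, classes):
    """Return matrix[i][j] = number of items of class i predicted as class j."""
    pair_counts = Counter(zip(true_labels, predicted_labels))
    return [[pair_counts[(t, p)] for p in classes] for t in classes]

def accuracy(matrix):
    """Share of items on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def per_class_recall(matrix, classes):
    """Recall per class; None when the class has no true items."""
    recalls = {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])
        recalls[name] = matrix[i][i] / row_total if row_total else None
    return recalls

# Toy labels for illustration only; these are not the repository's results.
y_true = ["Center", "Center", "Loc", "Loc", "Scratch", "Edge-Loc"]
y_pred = ["Center", "Loc", "Loc", "Loc", "Loc", "Edge-Loc"]

cm = confusion_matrix(y_true, y_pred, PATTERNS)
print(accuracy(cm))
print(per_class_recall(cm, PATTERNS)["Scratch"])
```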

You can download the classification results for the test data and the validation data to check them.

The downloaded file contains the following information:

The leading columns correspond to the fields in the WM-811K dataset.
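One way to check a downloaded result file is to recompute the accuracy from it. The sketch below assumes the file has one column with the manual label and one with the predicted label; the column names `label` and `predicted` are placeholders I made up, so substitute the actual header names from the CSV.

```python
import csv
import io

def accuracy_from_csv(file_obj, true_col, pred_col):
    """Fraction of rows whose predicted label equals the manual label."""
    rows = list(csv.DictReader(file_obj))
    matches = sum(1 for row in rows if row[true_col] == row[pred_col])
    return matches / len(rows)

# Stand-in data with made-up column names; the real files may use different ones.
sample = io.StringIO(
    "wafer_id,label,predicted\n"
    "1,Center,Center\n"
    "2,Loc,Edge-Loc\n"
    "3,Scratch,Scratch\n"
    "4,Donut,Donut\n"
)
print(accuracy_from_csv(sample, true_col="label", pred_col="predicted"))
```

With a real file, pass `open("test_classified.csv", newline="")` instead of the in-memory sample, along with the real column names.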

Conclusion and future work

Due to certain reasons, I am unable to share my model at the moment. However, I have demonstrated the potential of solving the wafer map pattern classification problem in the semiconductor industry, and I am confident that I can improve the model in the future.

Moving forward, there are several things that I plan to do:

Include the "none" pattern (label 8) in the classification, while carefully controlling its proportion in the dataset so that the reported accuracy is not misleading. This gives a total of nine patterns (0, 1, 2, 3, 4, 5, 6, 7, 8) to classify.
There are still many mislabeled instances in the dataset, which reduces accuracy. However, it is challenging to verify the correctness of all manual labels, so I need to explore alternative approaches to address this issue.
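For the first item, one possible way to control the proportion of the "none" pattern is to downsample it before training. This is only a sketch of that idea, not a decision the project has made; the function `cap_class`, the cap value, and the toy labels are all hypothetical.

```python
import random

def cap_class(labels, target_label, max_count, seed=0):
    """Return sorted indices to keep, with target_label limited to max_count items."""
    target = [i for i, lab in enumerate(labels) if lab == target_label]
    others = [i for i, lab in enumerate(labels) if lab != target_label]
    if len(target) > max_count:
        # Randomly keep only max_count items of the over-represented class.
        target = random.Random(seed).sample(target, max_count)
    return sorted(others + target)

# Toy labels: 8 is "none"; 0 and 3 stand in for two defect patterns.
labels = [8] * 20 + [0] * 3 + [3] * 4
kept = cap_class(labels, target_label=8, max_count=4)
kept_labels = [labels[i] for i in kept]
print(kept_labels.count(8), len(kept_labels))
```

Whatever cap is chosen, accuracy should then be reported alongside the resulting class proportions, since the majority-class baseline changes with them.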

Reference :

https://www.kaggle.com/qingyi/wm811k-wafer-map/discussion/57318#latest-338421
