datawhalechina
diff --git a/README.md b/README.md
Lines changed: 26 additions & 3 deletions
diff --git a/docs/README.md b/docs/README.md
Lines changed: 23 additions & 1 deletion
diff --git a/docs/ch01/ch01.md b/docs/ch01/ch01.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/ch02/ch02.md b/docs/ch02/ch02.md
Lines changed: 265 additions & 3 deletions
diff --git a/docs/ch02/images/mnist_dataset.png b/docs/ch02/images/mnist_dataset.png
17.3 KB
diff --git a/docs/ch02/images/predict.png b/docs/ch02/images/predict.png
33.7 KB
@@ -27,9 +27,6 @@
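 - Provide accessible theory that introduces model compression techniques;
 - Provide hands-on code grounded in real scenarios to help learners better understand the theory.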
 
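-## Practice environment setup
-
-The practice code for this project is based on Python 3.10. For setup details, see [INSTALL.md](./docs/notebook/INSTALL.md)
 
 ## Setting up a local environment for online reading
 ### Node.js version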
@@ -47,6 +44,32 @@ npm i docsify-cli -g
 docsify serve ./docs
 ```
 
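+## Environment setup
+
+The practice code for this project is based on Python 3.10; installing it in a conda virtual environment is recommended.
+
+The contents of requirements.txt are as follows: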
+```
+numpy==1.24.3
+matplotlib==3.9.2
+tqdm==4.66.5
+jupyter==1.1.1
+torch==2.1.0
+torchvision==0.16.0
+torchprofile==0.0.4
+torchsummary==1.5.1
+fast-pytorch-kmeans
+scipy
+datasets
+```
+Create the virtual environment and install the dependencies with the following commands:
+
+```
+conda create -n compression python=3.10
+conda activate compression
+pip install -r requirements.txt
+```
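After installation, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch using only the standard library; the helper name `check_environment` and the `PINNED` table are illustrative assumptions (the pins are copied from the requirements.txt listed above), not part of the project.

```python
from importlib import metadata

# Pins copied from the requirements.txt above; unpinned packages map to None.
PINNED = {
    "numpy": "1.24.3",
    "matplotlib": "3.9.2",
    "tqdm": "4.66.5",
    "jupyter": "1.1.1",
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchprofile": "0.0.4",
    "torchsummary": "1.5.1",
    "fast-pytorch-kmeans": None,
    "scipy": None,
    "datasets": None,
}


def check_environment(pinned=PINNED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu118" when comparing.
        base = installed.split("+")[0] if installed else None
        ok = installed is not None and (wanted is None or base == wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "CHECK"
        print(f"{name:<22} {installed or 'not installed':<16} {status}")
```

Run it inside the activated `compression` environment; any row marked `CHECK` is either missing or differs from the pin.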
+
 ### Table of contents
 
 - [Chapter 1 Introduction](https://datawhalechina.github.io/awesome-compression/#/ch01/ch01)
 
@@ -41,7 +41,29 @@
 
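 ## Environment setup
 
-The practice code for this project is based on Python 3.10. For setup details, see [INSTALL.md](./notebook/INSTALL.md)
+The practice code for this project is based on Python 3.10; installing it in a conda virtual environment is recommended.
+
+The contents of requirements.txt are as follows: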
+```
+numpy==1.24.3
+matplotlib==3.9.2
+tqdm==4.66.5
+jupyter==1.1.1
+torch==2.1.0
+torchvision==0.16.0
+torchprofile==0.0.4
+torchsummary==1.5.1
+fast-pytorch-kmeans
+scipy
+datasets
+```
+Create the virtual environment and install the dependencies with the following commands:
+
+```
+conda create -n compression python=3.10
+conda activate compression
+pip install -r requirements.txt
+```
 
 ## Contributing
 
 
@@ -2,7 +2,7 @@
 
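 &emsp;&emsp;As compute performance and storage capacity have grown, devices have become able to run ever larger deep learning models. Some models have hundreds of millions, billions, or even tens of billions of parameters; for example, the common "7b" model size denotes 7 billion parameters, and the largest open-source model at the time of writing has 480 billion parameters (Snowflake's [Arctic model](https://www.thepaper.cn/newsDetail_forward_27161326), 2024). The figure below shows the trend of model size versus GPU development in recent years. It shows that GPU hardware has advanced far more slowly than model size has grown, which makes training and inference for large models difficult. Model compression can bridge this gap, allowing large models to run on limited hardware resources.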
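To make the "7b" figure concrete, the sketch below estimates how much memory the weights alone would occupy at a few numeric precisions. It is a back-of-the-envelope calculation under stated assumptions: decimal gigabytes (1 GB = 10^9 bytes), weights only (no activations, optimizer state, or KV cache), and the helper name `weight_memory_gb` is hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate size of the weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"7B parameters @ {dtype}: {weight_memory_gb(7e9, dtype):.1f} GB")
```

Under these assumptions a 7-billion-parameter model needs about 28 GB in FP32 and about 14 GB in FP16, which is why lower-precision storage is one of the levers compression techniques pull.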
 
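-![Figure 1-1 Trend of model parameters versus GPU development in recent years](images/problem.png)
+![Trend of model parameters versus GPU development in recent years](images/problem.png)
 
 &emsp;&emsp;Recent theoretical studies of neural networks have found that two kinds of redundancy arise among neurons during optimization. Some neurons "collapse" into functionally similar neurons that jointly handle similar functions; others are neglected and end up responsible for no function once optimization finishes. The latter are also called "redundant neurons".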
 
@@ -92,6 +92,6 @@ $$OPS=\frac{OPs}{\text { second }}$$
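 &emsp;&emsp;Throughput is the amount of data a model can process per unit of time, and is commonly used to measure the efficiency of a compressed model. It is usually weighed together with other metrics (such as accuracy and latency) to balance model accuracy against inference speed. For pruning or quantization, if throughput improves markedly while the accuracy loss is small, the compression method is effective.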
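As a rough illustration of how throughput can be measured, the sketch below times a callable over a list of batches and reports samples per second. The names `measure_throughput` and `dummy_model` are hypothetical; `dummy_model` is only a stand-in for a real model's forward pass, and a real benchmark would also need device synchronization and repeated runs.

```python
import time


def measure_throughput(process_batch, batches, warmup=1):
    """Return samples processed per second for `process_batch` over `batches`."""
    for batch in batches[:warmup]:
        process_batch(batch)  # warm-up runs are not timed
    num_samples = 0
    start = time.perf_counter()
    for batch in batches:
        process_batch(batch)
        num_samples += len(batch)
    elapsed = time.perf_counter() - start
    return num_samples / max(elapsed, 1e-12)  # guard against a zero interval


def dummy_model(batch):
    """Hypothetical stand-in for inference: sums each flattened sample."""
    return [sum(sample) for sample in batch]


# 10 batches of 64 samples, each a flattened 28x28 "image".
batches = [[[float(i)] * 784 for i in range(64)] for _ in range(10)]
print(f"{measure_throughput(dummy_model, batches):.0f} samples/s")
```

Comparing this number before and after pruning or quantization, alongside accuracy, gives the trade-off described above.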
 
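 ## References
--  [World's largest open-source model breaks records again: 480-billion-parameter MoE beats Llama 3 and Mixtral](https://www.thepaper.cn/newsDetail_forward_27161326)
+- [World's largest open-source model breaks records again: 480-billion-parameter MoE beats Llama 3 and Mixtral](https://www.thepaper.cn/newsDetail_forward_27161326)
 - [Introduction to floating-point numbers](https://baike.baidu.com/item/%E6%B5%AE%E7%82%B9%E6%95%B0/6162520?fr=ge_ala)
 - [A survey of model compression](https://arxiv.org/pdf/2308.07633v4)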
@@ -29,7 +29,7 @@
 
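 &emsp;&emsp;The figure below is a visualization example; for details of how it works, see [CNN Explainer](https://poloclub.github.io/cnn-explainer):
 
-![Figure 2-1 CNN visualization example](images/convlayer_overview_demo.gif)
+![CNN visualization example](images/convlayer_overview_demo.gif)
 
 ## 2.2 Key terminology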
 
@@ -49,18 +49,280 @@
 
 The figure below shows a convolution applied to a 3-channel image:
 
-![Figure 2-2 3-channel convolution](images/multi_channel.gif)
+![3-channel convolution](images/multi_channel.gif)
 
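 &emsp;&emsp;Here the convolution kernel (also called a filter) has three channels, one per input channel, and its dimensions are `3 × 3 × 3`, giving the kernel's height, width, and depth. The operation first convolves each of the three input channels with the matching kernel channel, then sums the results to produce a single output feature map.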
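The per-channel-then-sum behaviour can be sketched in plain Python. This is an illustrative toy, not the chapter's implementation: it assumes stride 1, no padding, a channel-first layout (`C x H x W`), and, as deep learning frameworks do, computes cross-correlation without flipping the kernel. The function names are hypothetical.

```python
def conv2d_single(channel, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation of one channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(channel) - kh + 1
    out_w = len(channel[0]) - kw + 1
    return [
        [
            sum(channel[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv2d_multi(image, kernel):
    """image: C x H x W, kernel: C x kh x kw -> a single feature map."""
    # Step 1: convolve each input channel with its own kernel slice.
    per_channel = [conv2d_single(c, k) for c, k in zip(image, kernel)]
    # Step 2: sum the per-channel results element-wise.
    out_h, out_w = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(m[i][j] for m in per_channel) for j in range(out_w)]
        for i in range(out_h)
    ]


# Toy 3-channel 4x4 input (channel c is filled with c + 1) and an all-ones 3x3x3 kernel.
image = [[[c + 1] * 4 for _ in range(4)] for c in range(3)]
kernel = [[[1] * 3 for _ in range(3)] for _ in range(3)]
feature_map = conv2d_multi(image, kernel)
print(feature_map)  # [[54, 54], [54, 54]]
```

Each channel contributes 9, 18, and 27 respectively (nine kernel weights times the constant channel value), so every output position is 54.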
 
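 &emsp;&emsp;Let's look at an example. Because 3D data is hard to visualize, all volumes (the input volume in blue, the weight volume in red, and the output volume in green) are shown with their depth slices laid out in columns.
 
-![Figure 2-3 Convolution example](images/conv_demo.gif)
+![Convolution example](images/conv_demo.gif)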
 
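 &emsp;&emsp;A convolution is essentially a dot product between a filter and a local region of the input. A common way to implement a convolutional layer exploits this by turning the layer's forward pass into one large matrix multiplication.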
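One well-known way to realize this idea is the im2col trick: every input patch is flattened into a row, every filter into a column, and a single matrix product yields all outputs. The sketch below is a simplified single-channel illustration in plain Python (stride 1, no padding); the helper names are hypothetical, and real libraries use far more optimized variants.

```python
def im2col(image, kh, kw):
    """Flatten every kh x kw patch of a 2D image into one row."""
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [image[i + a][j + b] for a in range(kh) for b in range(kw)]
        for i in range(out_h) for j in range(out_w)
    ]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
filters = [
    [[1, 0], [0, 1]],  # sums the main diagonal of each patch
    [[0, 1], [1, 0]],  # sums the anti-diagonal of each patch
]

patches = im2col(image, 2, 2)                                # 4 patches x 4 values
weights = [[w for row in f for w in row] for f in filters]   # 2 filters x 4 values
weights_t = [list(col) for col in zip(*weights)]             # 4 values x 2 filters
out = matmul(patches, weights_t)                             # 4 patches x 2 filters
print(out)  # [[6, 6], [8, 8], [12, 12], [14, 14]]
```

Each row of `out` holds the responses of both filters at one spatial position; reshaping the rows back to a 2 x 2 grid gives the two output feature maps.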
 
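 Next, let's work through a simple CNN example, [MNIST handwritten digit recognition](https://github.com/datawhalechina/awesome-compression/blob/main/docs/notebook/ch02/1.mnist_classify.ipynb), to deepen our understanding of CNNs.
 
+## 2.3 Practice
+
+First, import the required packages and load the dataset: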
+
+```python
+import copy
+import math
+import time
+import random
+from collections import OrderedDict, defaultdict
+from typing import Union, List
+
+import numpy as np
+import torch
+from matplotlib import pyplot as plt
+from torch import nn
+from torch.optim import *
+from torch.optim.lr_scheduler import *
+from torch.utils.data import DataLoader
+from torchvision.transforms import *
+from tqdm.auto import tqdm
+import torch.nn.functional as F
+from torchvision import datasets, transforms
+
+random.seed(0)
+np.random.seed(0)
+torch.manual_seed(0)
+
+# Set up normalization (0.1307 and 0.3081 are the MNIST mean and standard deviation)
+transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
+
+# Download and load the datasets (train=True: training set, train=False: test set)
+train_dataset = datasets.MNIST(root='./data/mnist', train=True, download=True, transform=transform)
+test_dataset = datasets.MNIST(root='./data/mnist', train=False, download=True, transform=transform)
+
+# Set up the DataLoaders
+batch_size = 64
+train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
+test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
+```
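To see what `Normalize((0.1307,), (0.3081,))` does to individual pixels, the sketch below applies the same arithmetic, `(x - mean) / std`, by hand. It assumes, as `ToTensor` does for 8-bit images, that raw values in `[0, 255]` are first scaled to `[0, 1]`; the helper name `normalize` is hypothetical.

```python
MEAN, STD = 0.1307, 0.3081  # constants passed to Normalize in the snippet above


def normalize(pixel):
    """Apply the per-pixel normalization (x - mean) / std."""
    return (pixel - MEAN) / STD


for raw in (0, 128, 255):
    scaled = raw / 255  # ToTensor maps uint8 [0, 255] to float [0, 1]
    print(f"raw={raw:>3}  scaled={scaled:.4f}  normalized={normalize(scaled):+.4f}")
```

A black background pixel ends up near -0.42 and a fully white pixel near +2.82, so the network sees inputs that are roughly zero-centered with unit variance.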
+
+Display some samples from the dataset, as shown in the figure below:
+```python
+# Display the first 12 training samples
+fig = plt.figure()
+for i in range(12):
+    plt.subplot(3, 4, i+1)
+    plt.tight_layout()
+    plt.imshow(train_dataset.data[i], cmap='gray', interpolation='none')
+    plt.title("Label: {}".format(train_dataset.targets[i].item()))
+    plt.xticks([])
+    plt.yticks([])
+plt.show()
+```
+
+![Sample of the MNIST data](images/mnist_dataset.png)
+
+Define a LeNet network with the following code:
+
+```python
+# Define a LeNet network
+class LeNet(nn.Module):
+    def __init__(self, num_classes=10):
+        super(LeNet, self).__init__()
+        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
+        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
+        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
+        self.fc1 = nn.Linear(in_features=16 * 4 * 4, out_features=120)
+        self.fc2 = nn.Linear(in_features=120, out_features=84)
+        self.fc3 = nn.Linear(in_features=84, out_features=num_classes)
+
+    def forward(self, x):
+        x = self.maxpool(F.relu(self.conv1(x)))
+        x = self.maxpool(F.relu(self.conv2(x)))
+
+        x = x.view(x.size()[0], -1)
+        x = F.relu(self.fc1(x))
+        x = F.relu(self.fc2(x))
+        x = self.fc3(x)
+
+        return x
+device = torch.device("cpu")
+model = LeNet().to(device=device)
+```
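The `16 * 4 * 4` input size of `fc1` follows from how each layer shrinks the 28 x 28 MNIST image. The sketch below traces the spatial size with the standard output-size formula `(size + 2 * padding - kernel) // stride + 1` and also tallies the parameter count; the helper names are hypothetical, and the layer settings are the ones in the `LeNet` class above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


size = 28
size = conv_out(size, kernel=5)            # conv1:   28 -> 24
size = conv_out(size, kernel=2, stride=2)  # maxpool: 24 -> 12
size = conv_out(size, kernel=5)            # conv2:   12 -> 8
size = conv_out(size, kernel=2, stride=2)  # maxpool:  8 -> 4
flat_features = 16 * size * size           # 16 channels of 4 x 4

total_params = (
    conv_params(1, 6, 5) + conv_params(6, 16, 5)
    + linear_params(flat_features, 120) + linear_params(120, 84)
    + linear_params(84, 10)
)
print(size, flat_features, total_params)  # 4 256 44426
```

Most of the roughly 44k parameters sit in `fc1`, which is typical for small CNNs and relevant later when deciding where pruning pays off.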
+Define the training function:
+
+```python
+
+def train(
+  model: nn.Module,
+  dataloader: DataLoader,
+  criterion: nn.Module,
+  optimizer: Optimizer,
+  callbacks = None
+) -> None:
+  model.train()
+
+  for inputs, targets in tqdm(dataloader, desc='train', leave=False):
+    inputs = inputs.to(device)
+    targets = targets.to(device)
+    # Reset the gradients (from the last iteration)
+    optimizer.zero_grad()
+
+    # Forward pass (outputs stay on the same device as targets)
+    outputs = model(inputs)
+    loss = criterion(outputs, targets)
+
+    # Backward propagation
+    loss.backward()
+
+    # Update the model parameters
+    optimizer.step()
+
+    if callbacks is not None:
+        for callback in callbacks:
+            callback()
+```
+
+Define the evaluation function:
+
+```python
+@torch.inference_mode()
+def evaluate(
+  model: nn.Module,
+  dataloader: DataLoader,
+  verbose=True,
+) -> float:
+  model.eval()
+
+  num_samples = 0
+  num_correct = 0
+
+  for inputs, targets in tqdm(dataloader, desc="eval", leave=False,
+                              disable=not verbose):
+    inputs = inputs.to(device)
+    targets = targets.to(device)
+  
+    # Inference (outputs stay on the same device as targets)
+    outputs = model(inputs)
+
+    # Convert logits to class indices
+    outputs = outputs.argmax(dim=1)
+
+    # Update metrics
+    num_samples += targets.size(0)
+    num_correct += (outputs == targets).sum()
+
+  return (num_correct / num_samples * 100).item()
+```
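The accuracy computation inside `evaluate` boils down to taking the argmax of each row of logits and counting matches with the targets. The sketch below reproduces that logic on plain Python lists with made-up logits; the helper names `argmax` and `accuracy` are hypothetical and the numbers are illustrative only.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(logits, targets):
    """Percentage of rows whose argmax equals the target class."""
    preds = [argmax(row) for row in logits]
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets) * 100


# Four samples, three classes; values are made up for illustration.
logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [0.7, 0.6, 0.5]]
targets = [1, 0, 2, 1]
print(f"{accuracy(logits, targets):.2f}%")  # 75.00%
```

The last sample is predicted as class 0 but labelled 1, so three of four predictions are correct.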
+
+Train the model, save the best model together with its gradients, and print the prediction accuracy:
+
+```python
+lr = 0.01
+momentum = 0.5
+num_epoch = 5
+
+optimizer = torch.optim.SGD(model.parameters(),  lr=lr, momentum=momentum)  # lr: learning rate, momentum: momentum factor
+criterion = nn.CrossEntropyLoss()  # cross-entropy loss
+
+
+best_accuracy = 0
+best_checkpoint = dict()
+gradients = dict()
+for epoch in range(num_epoch):
+    train(model, train_loader, criterion, optimizer)
+    accuracy = evaluate(model, test_loader)
+    is_best = accuracy > best_accuracy
+    if is_best:
+        best_checkpoint['state_dict'] = copy.deepcopy(model.state_dict())
+        best_accuracy = accuracy
+        
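+        # Save each parameter's gradient (from the last training batch of this epoch) in the dictionary
+        for name, parameter in model.named_parameters():
+            if parameter.grad is not None:
+                # .clone() ensures we keep a copy of the gradient, not a reference
+                gradients[name] = parameter.grad.clone()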
+
+    print(f'Epoch{epoch+1:>2d} Accuracy {accuracy:.2f}% / Best Accuracy: {best_accuracy:.2f}%')
+
+
+torch.save(best_checkpoint['state_dict'], './model.pt')
+torch.save(gradients, './model_gradients.pt')
+
+print("=> loading best checkpoint")
+model.load_state_dict(best_checkpoint['state_dict'])
+model_accuracy = evaluate(model, test_loader)
+print(f"Model has accuracy={model_accuracy:.2f}%")
+```
+
+Finally, load the model and predict the label of a single image:
+
+```python
+from torchvision import transforms
+from PIL import Image
+# Load the saved model
+model = LeNet()  # Must be the same architecture that was trained and saved
+model.load_state_dict(torch.load('./model.pt', map_location='cpu'))
+model.eval()  # Set the model to evaluation mode
+
+# Preprocess the image (assuming input is grayscale 28x28 as in MNIST)
+def preprocess_image(image_path):
+    transform = transforms.Compose([
+        transforms.Grayscale(num_output_channels=1),  # Convert to grayscale if needed
+        transforms.Resize((28, 28)),  # Resize to match MNIST dimensions
+        transforms.ToTensor(),  # Convert image to tensor
+        transforms.Normalize((0.1307,), (0.3081,))  # Normalize as per model's training
+    ])
+    image = Image.open(image_path)
+    image = transform(image).unsqueeze(0)  # Add batch dimension
+    return image
+
+# Perform prediction on a single image
+def predict_image(image_path):
+    image = preprocess_image(image_path)
+    with torch.no_grad():
+        output = model(image)
+        prediction = output.argmax(dim=1, keepdim=True)  # Get the predicted class
+    return prediction.item()
+
+# Example usage
+image_path = 'test.png'  # Replace with the actual image path
+predicted_label = predict_image(image_path)
+print(f'Predicted label: {predicted_label}')
+```
+
+The code for predicting several images at once is as follows:
+
+```python
+def show_images(images, labels, preds, num_rows=4, num_cols=4):
+    fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 10))
+    axes = axes.flatten()  # Flatten axes array for easy iteration
+    for idx in range(num_rows * num_cols):
+        if idx >= len(images):
+            break
+        ax = axes[idx]
+        img = images[idx].cpu().numpy().squeeze()  # Convert tensor to numpy and remove unnecessary dimensions
+        ax.imshow(img, cmap='gray')
+        ax.set_title(f'True: {labels[idx].item()}\nPred: {preds[idx].item()}')
+        ax.axis('off')  # Turn off axis labels
+    plt.tight_layout()
+    plt.show()
+
+# Load the saved model
+model = LeNet()  # Must be the same architecture that was trained and saved
+model.load_state_dict(torch.load('./model.pt', map_location='cpu'))
+model.eval()  # Set the model to evaluation mode
+# Get a batch of test data
+test_iter = iter(test_loader)
+images, labels = next(test_iter)
+
+# Run the model to predict labels
+with torch.no_grad():
+    outputs = model(images)
+    _, preds = torch.max(outputs, 1)  # Get the predicted labels
+
+# Show images with true and predicted labels
+show_images(images.cpu(), labels.cpu(), preds.cpu())
+```
+![Example of predicting multiple images](images/predict.png)
+
 ## References
 
 - [The difference between a convolution kernel and a filter](https://blog.csdn.net/weixin_38481963/article/details/109906338)