Deep neural network (DNN) models have become increasingly crucial components in intelligent software systems. However, training a DNN model is typically expensive in terms of both time and money. To address this issue, researchers have recently focused on reusing existing DNN models, borrowing the idea of code reuse in software engineering. However, reusing an entire model can incur extra overhead or inherit weaknesses from its undesired functionalities. Hence, existing work proposes to decompose an already trained model into modules, i.e., modularizing-after-training, and enable module reuse. Since trained models are not built for modularization, modularizing-after-training incurs huge overhead and model accuracy loss. In this paper, we propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT). We train a model to be structurally modular through two loss functions that optimize intra-module cohesion and inter-module coupling. In this work, we have implemented the proposed approach for modularizing Convolutional Neural Network (CNN) models. The evaluation results on representative models demonstrate that MwT outperforms the state-of-the-art approach. Specifically, the accuracy loss caused by MwT is only 1.13%, which is 1.76% less than that of the latter. The kernel retention rate of the modules generated by MwT is only 14.58%, a reduction of 74.31% over the state-of-the-art approach. Furthermore, the total time cost required for training and modularizing is only 108 minutes, half that of the latter.
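As a rough illustration of what modular training optimizes, the sketch below combines the usual task loss with mask-based cohesion and coupling terms weighted by `alpha` and `beta` (the same hyperparameters passed to `modular_trainer.py` below). The cosine-similarity formulation here is an assumption made for illustration only; the exact loss definitions are those in the paper and `src/modular_trainer.py`.

```python
# Illustrative sketch of a modular training objective (not the exact MwT losses).
import torch
import torch.nn.functional as F

def modular_loss(logits, labels, masks, alpha=0.5, beta=1.5):
    """logits: (B, C); labels: (B,); masks: (B, K) per-sample kernel relevance in [0, 1]."""
    ce = F.cross_entropy(logits, labels)
    # Pairwise similarity between the kernel masks of all samples in the batch.
    sim = F.cosine_similarity(masks.unsqueeze(1), masks.unsqueeze(0), dim=-1)  # (B, B)
    same = labels.unsqueeze(1).eq(labels.unsqueeze(0))
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    cohesion = sim[same & ~eye].mean()  # same-class samples should share kernels
    coupling = sim[~same].mean()        # different classes should use disjoint kernels
    return ce + alpha * (1.0 - cohesion) + beta * coupling
```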
- fvcore 0.1.5.post20221221
- numpy 1.23.1
- python 3.9.12
- pytorch 1.12.0
- tensorboard 2.10.1
- torchvision 0.13.0
- tqdm 4.64.0
- GPU with CUDA support is also needed
|--- README.md : the user guide
|--- data/ : the experimental data
|--- src/ : the source code of our work
|--- configs.py : setting the path
|--- modular_trainer.py : training modular CNN models
|--- modularizer.py : modularizing trained modular CNN models and then reusing modules on sub-tasks
|--- standard_trainer.py : training CNN models using the standard training method
|--- ...
|--- models/
|--- utils_v2.py : the implementation of mask generator
|--- vgg.py : the standard vgg16 model
|--- vgg_masked.py : the modular vgg16 model, i.e., the standard vgg16 model with mask generators (see the sketch after the tree)
|--- ...
|--- modules_arch/
|--- vgg_module_v2.py : the vgg16 module, which retains only relevant kernels and removes the mask generators
|--- ...
|--- exp_cnnsplitter_reusing/
|--- reuse_modules.py : reusing modules published by CNNSplitter on sub-tasks
|--- calculate_cohesion.py : calculating the cohesion of modules
|--- ... : published by CNNSplitter
|--- ...
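As a rough sketch of how `vgg_masked.py` differs from `vgg.py`: each convolutional layer is paired with a mask generator that predicts a per-kernel relevance mask and gates the layer's output channels with it. The structure below is an assumption for illustration only; the actual implementation lives in `utils_v2.py` and `vgg_masked.py` under `models/`.

```python
import torch
import torch.nn as nn

class MaskedConvBlock(nn.Module):
    """Illustrative only: a conv layer whose output channels are gated by a
    learned, sample-dependent mask produced by a small mask generator."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.mask_generator = nn.Sequential(          # hypothetical structure
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(out_ch, out_ch), nn.ReLU(),
            nn.Linear(out_ch, out_ch), nn.Sigmoid(),  # per-kernel relevance in [0, 1]
        )

    def forward(self, x):
        feat = self.conv(x)
        mask = self.mask_generator(feat)              # (B, out_ch)
        # Gate the channels and keep the mask for the cohesion/coupling losses.
        return feat * mask[:, :, None, None], mask
```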
The following sections describe how to reproduce the experimental results in our paper.
- We provide the models trained with standard training as well as the modular models trained with modular training. One can download `data/` from here and then move it to `MwT/`. The datasets will be downloaded automatically by PyTorch when running our project.
- Modify `self.root_dir` in `src/configs.py`.
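For reference, the change in `src/configs.py` only needs to point `self.root_dir` at your local checkout. The snippet below is a hypothetical example (the class name and `data_dir` attribute are assumptions), not the file's actual contents.

```python
# Hypothetical excerpt of src/configs.py; only root_dir needs to match your setup.
class Configs:
    def __init__(self):
        self.root_dir = '/path/to/MwT'           # set this to your local MwT checkout
        self.data_dir = f'{self.root_dir}/data'  # assumed layout: data/ sits inside MwT/
```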
- Training a modular VGG16 model.
python modular_trainer.py --model vgg16 --dataset cifar10 --lr_model 0.05 --alpha 0.5 --beta 1.5 --batch_size 128
- Modularizing the modular VGG16 model and reusing the resulting modules on a sub-task containing "class 0" and "class 1".
python modularizer.py --model vgg16 --dataset cifar10 --lr_model 0.05 --alpha 0.5 --beta 1.5 --batch_size 128 --target_classes 0 1
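Conceptually, reusing modules on a two-class sub-task amounts to composing the modules of the target classes and comparing only their outputs. The sketch below assumes a hypothetical module API in which each single-class module returns one score per sample; see `modularizer.py` and `modules_arch/vgg_module_v2.py` for the real interfaces.

```python
import torch

@torch.no_grad()
def predict_with_modules(modules, x, target_classes=(0, 1)):
    # `modules` maps a class index to its module; each module is assumed to
    # return one relevance/confidence score per sample (hypothetical API).
    scores = torch.stack([modules[c](x) for c in target_classes], dim=1)  # (B, num_target_classes)
    # Predict the class whose module responds most strongly.
    return torch.tensor(target_classes)[scores.argmax(dim=1)]
```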
- Training a VGG16 model using the standard training method.
python standard_trainer.py --model vgg16 --dataset cifar10 --lr_model 0.05 --batch_size 128
- Downloading the published modules from CNNSplitter's project webpage.
- Modifying `root_dir` in `src/exp_cnnsplitter_reusing/global_configure.py`
- Modifying `dataset_dir` in `src/exp_cnnsplitter_reusing/reuse_modules.py`
- Reusing SimCNN-CIFAR10's modules on a sub-task containing "class 0" and "class 1"
python reuse_modules.py --model simcnn --dataset cifar10 --target_classes 0 1
- Calculating the cohesion of modules
python calculate_cohesion.py --model simcnn --dataset cifar10
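For intuition, module cohesion and coupling can be measured from sets of kernel indices: coupling from the overlap between the kernel sets retained by different modules, and cohesion from the overlap among the kernels activated by samples of a module's own class. The Jaccard-based formulation below is an assumption for illustration and is not necessarily the metric implemented in `calculate_cohesion.py`.

```python
def jaccard(a, b):
    """Jaccard similarity of two kernel-index sets."""
    return len(a & b) / max(len(a | b), 1)

def coupling(module_kernels):
    """module_kernels: list of kernel-index sets, one per module.
    Average pairwise overlap between different modules (lower is better)."""
    pairs = [(i, j) for i in range(len(module_kernels)) for j in range(i + 1, len(module_kernels))]
    return sum(jaccard(module_kernels[i], module_kernels[j]) for i, j in pairs) / max(len(pairs), 1)

def cohesion(sample_kernels):
    """sample_kernels: list of kernel-index sets activated by samples of one class.
    Average pairwise overlap among those samples (higher is better)."""
    pairs = [(i, j) for i in range(len(sample_kernels)) for j in range(i + 1, len(sample_kernels))]
    return sum(jaccard(sample_kernels[i], sample_kernels[j]) for i, j in pairs) / max(len(pairs), 1)
```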
The value of the threshold directly affects the results of modularization and module reuse. As shown in the figure below, as the threshold increases from 0.1 to 0.9, the kernel retention rate (KRR) of the modules gradually decreases from 37.36% to 24.74%. A larger threshold makes each module tend to retain only the convolutional kernels that are required by all samples of the corresponding category, leading to an increase in cohesion from 0.8572 to 0.9437 and a decrease in coupling from 0.3594 to 0.2412.
Regarding the effect on module reuse, the figure below presents the performance of the modules in terms of KRR and accuracy on the 3-class classification sub-task. As the threshold increases, the KRR of the module decreases from 72.57% to 50.51%. Nonetheless, this decrease in KRR has a negligible impact on the accuracy of the module, which only drops from 97.77% to 97.23%. The experimental results also demonstrate that our default settings are appropriate.
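To make the role of the threshold in the discussion above concrete, the sketch below keeps a kernel in a class's module only if the fraction of that class's samples requiring the kernel exceeds the threshold, and reports the kernel retention rate as the fraction of kernels kept. This selection rule is an illustrative assumption, not necessarily the exact rule implemented in `modularizer.py`.

```python
import torch

def extract_module_kernels(sample_masks, threshold):
    """sample_masks: (N, K) relevance masks collected from N samples of one class.
    A higher threshold keeps only kernels required by (almost) all samples.
    Returns the kept kernel indices and the kernel retention rate (KRR)."""
    usage = (sample_masks > 0).float().mean(dim=0)        # per-kernel usage frequency
    kept = (usage > threshold).nonzero(as_tuple=True)[0]  # kernels used often enough
    krr = kept.numel() / sample_masks.shape[1]
    return kept, krr
```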