Leveraging distillation token and weaker teacher model to improve DeiT transfer learning capability
International Journal of Informatics and Communication Technology
Abstract
Recently, distilling knowledge from convolutional neural networks (CNNs) has positively impacted the data-efficient image transformer (DeiT) model. Thanks to the distillation token, this method can boost DeiT performance and help DeiT learn faster. However, a token-based distillation procedure has not yet been applied when transferring DeiT to downstream datasets. This study proposes a distillation-token-based procedure for transfer learning, which boosts DeiT performance on downstream datasets; for example, it improves DeiT B 16 performance by 1.75% on the Oxford-IIIT Pets dataset. Furthermore, we propose using a weaker model as the DeiT teacher, which reduces the cost of fine-tuning the teacher without substantially degrading DeiT performance. For example, DeiT B 16 performance decreased by only 0.42% on Oxford 102 Flowers when EfficientNet V2S replaced RegNet Y 16GF as the teacher. In several cases, a weaker teacher even improved performance: DeiT B 16 gained 1.06% on Oxford-IIIT Pets with EfficientNet V2S instead of RegNet Y 16GF as the teacher model.
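For context, DeiT's hard-label distillation objective (Touvron et al., 2021) trains two heads jointly: the class-token head on the ground-truth labels and the distillation-token head on the teacher's hard predictions. The sketch below illustrates how this objective could be reused during fine-tuning on a downstream dataset with a weaker teacher; the function and variable names are illustrative, and the torchvision teacher setup is an assumption, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import efficientnet_v2_s

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, targets):
    """DeiT-style hard-label distillation: the class-token head learns from
    ground-truth labels, and the distillation-token head learns from the
    teacher's hard decisions. Names are illustrative, not the paper's code."""
    teacher_labels = teacher_logits.argmax(dim=-1)   # teacher's hard predictions
    loss_cls = F.cross_entropy(cls_logits, targets)
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)
    return 0.5 * loss_cls + 0.5 * loss_dist          # equal weighting, as in DeiT

num_classes = 37  # e.g., Oxford-IIIT Pets

# A weaker teacher such as EfficientNet V2S can stand in for RegNet Y 16GF.
# In the transfer-learning setting, the teacher's head is replaced and the
# teacher is fine-tuned on the downstream dataset first (omitted here).
teacher = efficientnet_v2_s(weights="IMAGENET1K_V1")
teacher.classifier[1] = nn.Linear(teacher.classifier[1].in_features, num_classes)
teacher.eval()

images = torch.randn(8, 3, 224, 224)            # dummy downstream batch
targets = torch.randint(0, num_classes, (8,))   # dummy ground-truth labels
with torch.no_grad():
    teacher_logits = teacher(images)

# In a real run, these come from the student's two DeiT output heads.
cls_logits = torch.randn(8, num_classes, requires_grad=True)
dist_logits = torch.randn(8, num_classes, requires_grad=True)
loss = hard_distillation_loss(cls_logits, dist_logits, teacher_logits, targets)
loss.backward()
```

Because the distillation head only needs the teacher's argmax, a cheaper teacher mainly trades off the quality of those hard labels against teacher training cost, which is consistent with the small performance differences reported above.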