QEKD: Query-Efficient and Data-Free Knowledge Distillation from Black-box Models
Chen Chen, Ruoxi Jia
Knowledge distillation (KD) is a typical method for training a lightweight student model with the help of a well-trained teacher model. However, most KD methods require access to either the teacher's training dataset or its model parameters, which is unrealistic. To tackle this problem, recent works study KD under data-free and black-box settings. Nevertheless, these works require a large number of queries to the teacher model, which incurs significant monetary and computational costs. To this end, we propose a novel method called Query-Efficient Knowledge Distillation (QEKD), which aims to learn query-efficiently from black-box model APIs and train a good student without any real data. In detail, QEKD trains the student model in two stages: data generation and model distillation. Note that QEKD requires no queries in the data generation stage and queries the teacher only once per sample in the distillation stage. Extensive experiments on various real-world datasets show the effectiveness of the proposed QEKD. For instance, QEKD improves on the performance of the best baseline method (DFME) by 5.83 on the CIFAR10 dataset with only 0.02x the query budget of DFME.
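The query accounting described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the `BlackBoxTeacher` class, the Gaussian sampler standing in for the data generation stage, and the logistic-regression student are all assumptions chosen to keep the example small. It only shows the two-stage structure, with zero teacher queries during data generation and exactly one query per sample during distillation.

```python
import math
import random


class BlackBoxTeacher:
    """Hypothetical stand-in for a black-box prediction API; counts every query."""

    def __init__(self):
        self.num_queries = 0

    def predict(self, x):
        # Only the output label is exposed, never parameters or training data.
        self.num_queries += 1
        return 1 if x[0] + 2.0 * x[1] > 0.0 else 0


def generate_synthetic_data(n, seed=0):
    """Stage 1: build a synthetic transfer set without querying the teacher.
    Random Gaussian points are an assumption, not the paper's generator."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def distill(samples, teacher, epochs=50, lr=0.1):
    """Stage 2: query the teacher once per sample, then train on cached labels."""
    labels = [teacher.predict(x) for x in samples]  # the only teacher queries
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):  # reusing the cached labels costs no further queries
        for x, y in zip(samples, labels):
            z = w[0] * x[0] + w[1] * x[1] + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the logistic loss with respect to z
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b


def student_predict(w, b, x):
    """Hard prediction of the trained student."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0.0 else 0


teacher = BlackBoxTeacher()
samples = generate_synthetic_data(200)
queries_after_generation = teacher.num_queries    # stage 1 used no queries
w, b = distill(samples, teacher)
queries_after_distillation = teacher.num_queries  # one query per sample
```

In this sketch the query budget equals the size of the synthetic set regardless of how many training epochs the student runs, which is the property the abstract highlights.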
- Date of publication: May 23, 2022
- Cornell University
- Publication note: Jie Zhang, Chen Chen, Jiahua Dong, Ruoxi Jia, Lingjuan Lyu: QEKD: Query-Efficient and Data-Free Knowledge Distillation from Black-box Models. CoRR abs/2205.11158 (2022)