Language to Network: Conditional Parameter Adaptation with Natural Language Descriptions

Abstract

Transfer learning using ImageNet pre-trained models has been the de facto approach in a wide range of computer vision tasks. However, fine-tuning still requires task-specific training data. In this paper, we propose a new paradigm of synthesizing task-specific neural networks from language and generic pre-trained models. Our proposed method, N^3 (Neural Networks from Natural Language), leverages language descriptions to generate parameter adaptations as well as new task-specific classification layers for pre-trained neural networks, effectively "fine-tuning" the network for a new task. To the best of our knowledge, N^3 is the first method to synthesize entire neural networks from natural language. Our experiments show that N^3 outperforms previous natural-language-based zero-shot learning methods across four different zero-shot image classification benchmarks. We also demonstrate a simple method that helps interpret which keywords in the language descriptions N^3 leverages when generating parameters.
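
A minimal sketch of the general idea described above, assuming a PyTorch setup: a small hypernetwork maps a pooled embedding of the task's class descriptions to (a) an additive adaptation for one frozen backbone layer and (b) the weights of a new classification head. All module names, dimensions, and the additive-offset formulation here are illustrative assumptions, not the paper's actual N^3 architecture.

    # Conceptual sketch only; not the published N^3 implementation.
    import torch
    import torch.nn as nn

    class LanguageToParams(nn.Module):
        def __init__(self, text_dim, feat_dim, num_classes):
            super().__init__()
            # Predicts an additive offset for one frozen backbone layer.
            self.adapt_head = nn.Linear(text_dim, feat_dim * feat_dim)
            # Predicts the weight matrix of a new task-specific classifier.
            self.cls_head = nn.Linear(text_dim, num_classes * feat_dim)
            self.feat_dim = feat_dim
            self.num_classes = num_classes

        def forward(self, text_emb):
            # text_emb: (text_dim,) pooled embedding of the class descriptions
            delta_w = self.adapt_head(text_emb).view(self.feat_dim, self.feat_dim)
            cls_w = self.cls_head(text_emb).view(self.num_classes, self.feat_dim)
            return delta_w, cls_w

    def classify(frozen_feats, frozen_w, delta_w, cls_w):
        # Apply the adapted layer (frozen weights + generated offset),
        # then the generated classification head.
        h = frozen_feats @ (frozen_w + delta_w).T
        return h @ cls_w.T  # logits over the new task's classes

In this view, no gradient step is taken on the target task: the pre-trained backbone stays fixed, and all task-specific parameters are produced in a single forward pass from the natural language descriptions.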

Publication
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics