A major challenge in Natural Language Processing is to teach machines to generate natural language and actions. However, existing deep learning approaches to language generation impose no structural constraints on the generation process and often produce low-quality results. In this context, I will introduce our work on imposing structural constraints for video captioning via hierarchical reinforcement learning. Moreover, we observe that most automated metrics for generation can be gamed; we therefore propose an adversarial reward learning method that learns the reward automatically via inverse reinforcement learning. Furthermore, I will discuss our recent efforts to connect language and vision to actions through a language grounding task for robot navigation, and introduce new algorithms for scheduled policy optimization and for combining model-free and model-based reinforcement learning. I will conclude with an overview of other exciting research projects at UCSB's NLP Group.