14th Workshop on Building and Using Comparable Corpora

 

 

INVITED SPEAKERS

Pushpak Bhattacharyya

Indian Institute of Technology Bombay
Machine Translation in Low Resource Setting

Abstract

AI, now and in the future, will have to grapple continuously with the problem of low resources. AI will increasingly be ML intensive, but ML needs data, often with annotation, and annotation is costly.
Over the years, through work on multiple problems, we have developed insight into how to do language processing in low-resource settings. The following six methods, individually and in combination, seem to be the way forward:
  1. Artificial resource augmentation (e.g., subwords; a small sketch follows this abstract)
  2. Cooperative NLP (e.g., pivot languages in MT)
  3. Linguistic embellishment (e.g., factor-based MT, source reordering)
  4. Joint modeling (e.g., coreference and NER, sentiment and emotion: each task helping the other to boost accuracy or reduce resource requirements)
  5. Multimodality (e.g., eye-tracking-based NLP and picture+text+speech-based sentiment analysis)
  6. Cross-lingual embeddings (e.g., embeddings from multiple languages helping MT; closely related to 2 above)
The present talk will focus on low-resource machine translation. We describe the use of techniques from the above list and bring home the seriousness and methodology of doing machine translation in low-resource settings.
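As a small illustration of the first item on the list, the Python sketch below learns a handful of byte-pair-encoding (BPE) merges from a toy word-frequency table and uses them to segment an unseen word into subwords. The toy corpus, merge count, and function names are hypothetical choices made for illustration only, not material from the talk.

```python
# Minimal BPE-style subword sketch (illustrative only; toy corpus and names are hypothetical).
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with the concatenated symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged


def learn_bpe(word_freqs, num_merges=10):
    """Learn up to `num_merges` merges from a {word: frequency} dict."""
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def segment(word, merges):
    """Apply the learned merges, in order, to segment a new word."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        (symbols,) = merge_pair(pair, {symbols: 1})  # single-entry vocab
    return list(symbols)


if __name__ == "__main__":
    toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # hypothetical counts
    merges = learn_bpe(toy_corpus, num_merges=10)
    print(segment("lowest", merges))  # e.g. ['low', 'est</w>'], depending on the merges learned
```

Segmenting rare words into frequent subword units in this spirit is one common way to stretch a small parallel corpus further in low-resource MT.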

Bio

Dr. Pushpak Bhattacharyya is a Professor in the Department of Computer Science and Engineering at IIT Bombay. His research areas are Natural Language Processing, Machine Learning, and AI (NLP-ML-AI). Prof. Bhattacharyya has published more than 350 research papers in various areas of NLP. His textbook ‘Machine Translation’ sheds light on all paradigms of machine translation, with abundant examples from Indian languages. Two recent monographs he co-authored, ‘Investigations in Computational Sarcasm’ and ‘Cognitively Inspired Natural Language Processing: An Investigation Based on Eye Tracking’, describe cutting-edge research in NLP and ML. Prof. Bhattacharyya is a Fellow of the Indian National Academy of Engineering (FNAE) and an Abdul Kalam National Fellow. For sustained contributions to technology, he has received the Manthan Award of the Ministry of IT, the P.K. Patwardhan Award of IIT Bombay, and the VNMM Award of IIT Roorkee. He is also a Distinguished Alumnus of IIT Kharagpur.

Tomáš Mikolov

Czech Institute of Informatics, Robotics and Cybernetics
Language Modeling and AI

Bio

Tomáš Mikolov is a researcher at CIIRC, Prague. He currently leads a research team focusing on the development of novel techniques in the areas of complex systems, artificial life, and evolution. Previously, he worked at Facebook AI and Google Brain, where he led the development of popular machine learning tools such as word2vec and fastText. He obtained his PhD from the Brno University of Technology in 2012 for his work on neural language models (the RNNLM project). His main research interest is understanding intelligence and creating artificial intelligence that can help people solve complex problems.

Sujith Ravi

SliceX AI
Large-scale Deep Learning for Low-Resource AI

Bio

Dr. Sujith Ravi is the Founder and CEO of SliceX AI. Previously, he was the Director of Amazon Alexa AI, where he led efforts to build the future of multimodal conversational AI experiences at scale. Prior to that, he led and managed multiple ML and NLP teams and efforts in Google AI. He founded and headed Google’s large-scale graph-based semi-supervised learning platform, its deep learning platform for structured and unstructured data, and its on-device machine learning efforts for products used by billions of people in Search, Ads, Assistant, Gmail, Photos, Android, Cloud, and YouTube. These technologies power conversational AI (e.g., Smart Reply), Web and Image Search, on-device predictions in Android and Assistant, and ML platforms such as Neural Structured Learning in TensorFlow, Learn2Compress as a Google Cloud service, and TensorFlow Lite for edge devices.
Dr. Ravi has authored over 100 scientific publications and patents in top-tier machine learning and natural language processing venues. His work has been featured in the press, including Wired, Forbes, Forrester, the New York Times, TechCrunch, VentureBeat, Engadget, and New Scientist, and has won the SIGDIAL Best Paper Award in 2019 and the ACM SIGKDD Best Research Paper Award in 2014. For multiple years, he was a mentor for Google Launchpad startups. Dr. Ravi was the Co-Chair (AI and deep learning) for the 2019 National Academy of Engineering (NAE) Frontiers of Engineering symposium. He was also a Co-Chair for ML workshops at ACL 2021, EMNLP 2020, ICML 2019, NAACL 2019, and NeurIPS 2018, and regularly serves as a Senior/Area Chair and PC member of top-tier machine learning and natural language processing conferences such as NeurIPS, ICML, ACL, NAACL, AAAI, EMNLP, COLING, KDD, and WSDM.