Gunjan Chhablani

Hi! I am Gunjan Chhablani, currently an ML Engineer at Oracle India Pvt. Ltd., where I build and deploy machine learning models for NLP. I graduated from BITS Pilani Goa in 2020 with a B.E. in Computer Science. In my free time, I work on CV/NLP research projects.

I am extremely interested in deep learning and its applications. My main research interest lies at the intersection of computer vision and natural language processing, and in explaining the systems built in these domains. I also work on adding models and features to 🤗datasets and 🤗transformers. Check out the models that I have added here: VisualBert, PLBart, FNet.

In the little personal time that I get, I love working out, reading self-help books, watching a bit of Netflix, going out for a stroll, and listening to music/podcasts.

Feel free to hit me up for a discussion about any of the above, or just to say hi!


News

Feb, 2022 Our paper on Superpixel-based Knowledge Infusion in Deep Neural Networks for Image Classification was accepted at ACMSE 2022.
Feb, 2022 I will be joining the AI Services - Language Team at Oracle India Pvt. Ltd.
Jan, 2022 Our paper on Multitask Prompted Training Enables Zero-Shot Task Generalization was accepted as a Spotlight at ICLR 2022.
Nov, 2021 The Datasets paper received the best paper award in the EMNLP 2021 System Demonstrations track.
Sep, 2021 Our paper on DRIFT was accepted at the EMNLP 2021 System Demonstrations track.
Sep, 2021 Our paper on HuggingFace Datasets was accepted at the EMNLP 2021 System Demonstrations track.
Mar, 2021 Participated in HuggingFace XLSR fine-tuning week. Check out my models here.
Mar, 2021 Our papers on Toxic Spans Detection and ReCAM were accepted at SemEval-2021.
Jan, 2021 Participated in SemEval-2021 Task-5 Toxic Spans Detection. See: Toxic Spans Detection
Jan, 2021 Participated in SemEval-2021 Task-4 Reading Comprehension of Abstract Meaning. See: ReCAM
Jan, 2021 Participated in ML Reproducibility Challenge 2020.
See: Report, Code
Sep, 2020 Received the bronze medal for ranking third in the Batch of 2020 at BITS Goa.
See: Video, Article 1, Article 2
Sep, 2020 I am joining Oracle India Pvt. Ltd. as an SWE in the HCM team.

Papers

Superpixel-based Knowledge Infusion in Deep Neural Networks for Image Classification

Domain(s): Graph Neural Networks, Computer Vision

We combine superpixel information with CNNs using different kinds of GNNs to enhance the representation learned by the CNN. As we show in the paper, this improves classification performance on several tasks.


Multitask Prompted Training Enables Zero-Shot Task Generalization

Domain(s): Datasets, NLP, Computer Vision

I contributed prompts for the T0pp model, which targets zero-shot generalization of transformer models to several tasks. The prompts were written as Jinja templates over HuggingFace datasets. The model can be found here. It is 16x smaller than GPT-3 but outperforms it on several tasks.


DRIFT: A Toolkit for Diachronic Analysis of Scientific Literature

Domain(s): NLP, Computational Linguistics, Diachronic Analysis

We developed an open-source toolkit called DRIFT for diachronic analysis of scientific literature, using the TWEC model for temporal word embeddings. We consolidated 10 different analysis methods with visualizations, including acceleration plots, semantic drift, word clouds, trend tracking, LDA analysis, etc.

My team and I (part of what we call the “Language Research Group”) worked on making this tool using Streamlit. Our paper was accepted at EMNLP Demo 2021.
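
To make the semantic drift analysis mentioned above a little more concrete, here is a minimal sketch of one common way to measure it: the cosine distance between a word's embedding in two time slices. This is not DRIFT's actual code. It assumes the embeddings of different years already live in a shared space (which is what temporal embedding models such as TWEC aim to provide), and the function names and toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_drift(embeddings_by_year, word, start, end):
    """Drift of `word` between two time slices.

    `embeddings_by_year` maps a year to a {word: vector} dict.
    Assumption: vectors from different years are comparable,
    i.e. the temporal embeddings are already aligned.
    """
    return cosine_distance(
        embeddings_by_year[start][word],
        embeddings_by_year[end][word],
    )


# Toy, hand-made vectors (hypothetical, for illustration only).
toy_embeddings = {
    2010: {"transformer": [1.0, 0.0], "network": [0.0, 1.0]},
    2020: {"transformer": [0.0, 1.0], "network": [0.0, 1.0]},
}

drift_transformer = semantic_drift(toy_embeddings, "transformer", 2010, 2020)
drift_network = semantic_drift(toy_embeddings, "network", 2010, 2020)
```

In this toy setup, "transformer" moves to an orthogonal direction between the two slices while "network" stays put, so the first word gets a large drift score and the second gets none.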


Datasets: A Community Library for Natural Language Processing

Domain(s): Datasets, NLP, Computer Vision

I contributed several datasets, fixes, and features to 🤗 Datasets. I was one of the top-15 contributors and one of the authors of the paper. Our paper received the best paper award in the EMNLP 2021 System Demonstrations track.


Toxic Spans Detection

Domain(s): NLP, Computational Linguistics

This was a competition hosted at the SemEval-2021 workshop. My team and I (part of what we call the “Language Research Group”), along with Prof. Shan Suthaharan, analyzed span prediction and token classification approaches on BERT for this task, as well as a few hybrid approaches, including multi-span detection, span+token detection, and BERT with LSTM-CRF.
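
As a rough illustration of the token classification framing, the sketch below converts toxic character offsets into per-token binary labels and back. It is not our system's code: it assumes the annotations are given as a list of toxic character offsets, uses plain whitespace tokenization instead of BERT's subword tokenizer, and all function names are hypothetical.

```python
import re


def char_offsets_to_token_labels(text, toxic_offsets):
    """Label each whitespace-separated token 1 if it overlaps any
    toxic character offset, else 0."""
    toxic = set(toxic_offsets)
    labeled = []
    for match in re.finditer(r"\S+", text):
        overlaps = any(i in toxic for i in range(match.start(), match.end()))
        labeled.append((match.group(), 1 if overlaps else 0))
    return labeled


def token_labels_to_char_offsets(text, labels):
    """Inverse step: expand per-token labels back into character offsets."""
    offsets = []
    for match, label in zip(re.finditer(r"\S+", text), labels):
        if label == 1:
            offsets.extend(range(match.start(), match.end()))
    return offsets


# Toy example (hypothetical sentence and annotation).
text = "you are a stupid person"
gold_offsets = list(range(10, 16))  # characters of "stupid"

token_labels = char_offsets_to_token_labels(text, gold_offsets)
recovered = token_labels_to_char_offsets(text, [label for _, label in token_labels])
```

A span prediction approach would instead predict the start and end positions directly, which is why handling several disjoint toxic spans in one text needs extra care in that framing.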


Reading Comprehension of Abstract Meaning

Domain(s): NLP, Computational Linguistics

This was a competition hosted at the SemEval-2021 workshop. My team and I (part of what we call the “Language Research Group”) worked on it along with Prof. Tirtharaj Dash. We devised several systems with augmentations and hypernym/hyponym information, carried out statistical analysis for imperceptibility, and proposed an idea for finding the regions in sentences that carry the most relevant context for the question being answered.