A Systematic Review of Hybrid Sarcasm Detection: Fusing Contextual Embeddings with Handcrafted Linguistic Features

Samyak Ingle; Saurabh Aghadate; Siddhi S. Pampattiwar; Aditi M. Kamble; Arati D. Paraskar; Prof. Abhishekh R. Ladole

Authors

Samyak Ingle, P.R. Pote Patil College of Engineering and Management, Amravati
Saurabh Aghadate
Siddhi S. Pampattiwar
Aditi M. Kamble
Arati D. Paraskar
Prof. Abhishekh R. Ladole

Keywords:

Natural Language Processing (NLP), Sentiment Analysis, Linguistic Features, Hybrid Sarcasm Detection, Feature Engineering, Ensemble Learning, Deep Learning, Contextual Embeddings

Abstract

Sarcasm detection (SD) is a significant challenge in Natural Language Processing (NLP) because sarcastic expressions convey the opposite of their literal meaning, often reversing sentiment polarity. This ambiguity is amplified in text by the absence of non-verbal cues such as tone and facial expression. Modern transformer models excel at capturing deep context but often fail to register the explicit rhetorical structures inherent in irony; conversely, traditional feature-based models capture linguistic structure but lack deep semantic understanding. To address these limitations, this paper proposes a novel two-branch Hybrid Contextual-Linguistic Sarcasm Detector (HCL-SD) framework. HCL-SD combines deep contextual embeddings, derived from fine-tuned RoBERTa/DistilBERT models, with a set of 13 handcrafted linguistic features (such as entropy, readability scores, and part-of-speech counts). This dual-branch design allows the model to learn implicit semantic incongruity and explicit rhetorical cues simultaneously.
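The fusion of the two branches can be illustrated with a minimal sketch. The abstract does not enumerate the 13 handcrafted features or state how the branches are joined, so everything below is an assumption made for illustration: the function names (`char_entropy`, `handcrafted_features`, `fuse`), the four stand-in features, and the choice of simple concatenation. The `embedding` argument stands in for a sentence vector that would come from a fine-tuned RoBERTa or DistilBERT encoder.

```python
import math
from collections import Counter
from typing import List


def char_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the character distribution of `text`."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def handcrafted_features(text: str) -> List[float]:
    """Hypothetical stand-ins for a few handcrafted linguistic features.

    These four values are illustrative only; they are not the paper's
    actual feature set.
    """
    tokens = text.split()
    return [
        char_entropy(text),                                   # entropy
        float(len(tokens)),                                   # token count
        float(sum(text.count(p) for p in "!?")),              # emphatic punctuation
        float(sum(1 for t in tokens if len(t) > 1 and t.isupper())),  # all-caps tokens
    ]


def fuse(embedding: List[float], text: str) -> List[float]:
    """Concatenate a contextual embedding with the handcrafted features.

    Concatenation is an assumed fusion strategy, chosen for simplicity.
    """
    return list(embedding) + handcrafted_features(text)


# Usage: a 2-dimensional placeholder replaces a real encoder output.
fused = fuse([0.1, 0.2], "Oh GREAT, another Monday!")
print(len(fused))
```

In practice the handcrafted values would usually be scaled before fusion so that large-magnitude counts do not dominate the embedding dimensions.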

The resulting fused feature space is classified using an optimized ensemble model that employs a majority voting scheme. Experiments on benchmark datasets, including the News Headlines and MUStARD datasets, demonstrate the framework's performance: the proposed approach achieved a state-of-the-art F1-score of 0.997 on the News Headlines dataset. The integration of contextual metadata proved essential for generalization, improving the cross-domain F1-score on the Reddit validation dataset from 0.70 to 0.92. These results indicate that combining deep contextual comprehension with explicit linguistic feature engineering is key to building robust, efficient, and highly accurate sarcasm detection systems.
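A majority voting scheme of the kind mentioned above can be sketched as follows. This is a generic hard-voting illustration, not the authors' implementation: the abstract does not state which base classifiers are used, how many there are, or how ties are resolved. The function name `majority_vote` and the tie-break rule (prefer the smallest label) are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard majority vote over per-classifier label predictions.

    predictions[i][j] is the label that classifier i assigns to sample j.
    Ties are broken by choosing the smallest label (an assumed rule).
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")

    voted = []
    for sample_votes in zip(*predictions):  # votes for one sample across classifiers
        counts = Counter(sample_votes)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


# Usage: three hypothetical classifiers, three samples (1 = sarcastic, 0 = not).
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]]))
```

Using an odd number of base classifiers avoids ties in the binary case, which makes the tie-break rule irrelevant in practice.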

Author Biographies

Siddhi S. Pampattiwar

Department of Artificial Intelligence and Data Science

Aditi M. Kamble

Department of Artificial Intelligence and Data Science

Arati D. Paraskar

Department of Artificial Intelligence and Data Science

Prof. Abhishekh R. Ladole

Assistant Professor and Co-Author, Department of Artificial Intelligence and Data Science
