Project LLMao
Sarcasm style transfer using fine-tuned language models. We train BART and LLaMA variants to rewrite sarcastic news headlines as neutral, factual equivalents while preserving meaning.
01 Data Pipeline
How 28,619 NHDSD headlines became 89,688 strategy-annotated training pairs through LLM generation and cross-validation.
02 Evaluation
What each of the 7 metrics measures, why we use it, where it breaks down. Read this before the dashboard.
03 Dashboard
Compare 14 models across 7 evaluation metrics with interactive charts and strategy breakdowns.
04 Sample Explorer
Browse 2,857 test samples with filtering, search, and side-by-side model comparison.
05 Playground
Type a sarcastic headline and watch our models rewrite it in real time via LMStudio.
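Under the hood, the Playground talks to a locally running LM Studio server. A minimal sketch of what such a call might look like, assuming LM Studio's default OpenAI-compatible endpoint on port 1234 (the model name and system prompt here are illustrative placeholders, not the project's actual prompt):

```python
import json
import urllib.request

# LM Studio's default OpenAI-compatible chat endpoint (assumption: default port)
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_request(headline: str, model: str = "local-model") -> dict:
    """Build the chat-completion payload for a sarcasm-to-neutral rewrite."""
    return {
        "model": model,  # LM Studio serves whichever model is currently loaded
        "messages": [
            {"role": "system",
             "content": "Rewrite the sarcastic headline as a neutral, "
                        "factual equivalent. Preserve the meaning."},
            {"role": "user", "content": headline},
        ],
        "temperature": 0.3,
    }

def rewrite(headline: str) -> str:
    """Send the headline to the local server and return the rewritten text."""
    req = urllib.request.Request(
        LMSTUDIO_URL,
        data=json.dumps(build_request(headline)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because LM Studio mimics the OpenAI API, the same payload works with any OpenAI-compatible client library.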
06 Human Evaluation
140 samples × 3 models × 2 annotators (κ > 0.8). Three sarcasm classifiers all disagree with humans (κ = −0.11 to +0.18) — receipts inside.
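The agreement figures above are Cohen's kappa, which corrects raw agreement for chance. A quick pure-Python sketch of how κ is computed from two annotators' label sequences (toy labels, not the project's data):

```python
from collections import Counter

def cohens_kappa(a: list, b: list) -> float:
    """Cohen's kappa between two annotators' labels on the same items."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items where the annotators match
    po = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: expected matches if each labeled independently
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)
```

Values near +1 mean strong agreement beyond chance, 0 means chance-level, and negative values (as with the classifiers above) mean systematic disagreement.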
07 Model Training
BART with context enhancement and REINFORCE + KL penalty. LLaMA 3.2 1B with LoRA fine-tuning. T5 baselines and ablation studies.
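The REINFORCE + KL objective can be sketched in simplified scalar form: a policy-gradient term weighted by the reward, plus a KL penalty keeping the fine-tuned model close to a frozen reference. This is an illustrative toy (β, the distributions, and the scalar shapes are assumptions; the actual training works on token-level logits):

```python
import math

def reinforce_kl_loss(logp_sample: float, reward: float,
                      probs: list, ref_probs: list,
                      beta: float = 0.1) -> float:
    """Toy REINFORCE loss with a KL penalty toward a frozen reference model.

    logp_sample: log-probability of the sampled rewrite under the policy
    reward:      scalar reward for that rewrite
    probs:       policy's output distribution over some vocabulary slice
    ref_probs:   reference model's distribution over the same slice
    beta:        KL penalty weight (illustrative value)
    """
    pg = -reward * logp_sample  # REINFORCE: push up log-prob of rewarded samples
    kl = sum(p * math.log(p / q) for p, q in zip(probs, ref_probs))  # KL(policy || ref)
    return pg + beta * kl
```

When the policy matches the reference, the KL term vanishes and the loss reduces to plain REINFORCE; as the policy drifts, the penalty grows, which is what keeps the rewrites fluent.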