I am a Computer Engineering student at Pulchowk Campus, Tribhuvan University,
working on the interpretability and safety of language models. My research looks at how
safety-aligned LLMs fail silently — through over-refusals, geometric misrepresentation, and
surface-level triggers — and how the internal structure of a model's representations can be
understood and steered to make it safer and more reliable.
A recurring theme in my work is geometry: representations trace structured trajectories
inside a model, and that structure turns out to predict both safety behavior and
generalization. I am equally invested in extending these tools to low-resource and
multilingual settings, where I built the first language model for Maithili (~50M speakers).
I currently lead AI-safety and agentic-systems research at Astha.ai.
I am ready to begin a PhD in Fall 2027 in mechanistic interpretability and AI alignment,
and I am actively looking for the right group to join.
Selected Publications
See my Google Scholar for the full list.
1.
SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering.
ACL 2026 (Main Conference).
An inference-time, task-aware trajectory-shifting method that cuts over-refusals by up to 73%
with minimal utility loss; no retraining required.
[paper]
2.
On the Relationship Between Representation Geometry and Generalization in Deep Neural Networks.
Preprint, 2026 (sole author).
Shows that effective dimension, an unsupervised geometric metric, predicts generalization
across vision and language models (partial r = 0.75 over 52 classifiers).
[paper]
3.
Geometric Phases of Mechanism Formation in Neural Networks.
Working paper, 2026.
Traces how internal mechanisms form across training using linear probes, CKA, and targeted
ablations (CIFAR-10/100).
4.
Can maiBERT Speak for Maithili?
LoResLM @ ACL 2026.
The first monolingual BERT for Maithili (~50M speakers); 87% accuracy on news classification,
outperforming MuRIL and NepBERTa.
[paper]
[model]
5.
Revolutionizing Currency Security: A YOLOv8-Based Approach for Detecting Counterfeit Nepali Banknotes.
J. Bus. Econ. Stud., 2024.
[paper]
6.
Machine Learning Analysis of Tirhuta Lipi.
2023.
Reaches 0.97 accuracy in Tirhuta script recognition, aimed at OCR and translation of low-resource scripts.
[paper]
7.
Support Vectors Are a Better Way of Text Classification for Imbalanced Data.
2023.
A robust SVC method for 100+ class text classification under severe imbalance.
[paper]
8.
Per-Block kNN Probing and a Lightweight Sequence Head over Frozen Perch v2 Features for BirdCLEF+ 2026.
Submitted, LifeCLEF 2026 working notes (CEUR-WS).
A cheap, training-free probe that ranks the 26 MBConv blocks of Google's Perch v2 by transferability,
plus a ~430k-parameter attention-pool head.
Earned a Kaggle Competition Bronze Medal (rank 354 / 4084, top 8.7%, public LB 0.950 / private LB 0.942;
+590-place public→final rerank).
[paper]
[code]
[blog]