Sumit Yadav
aka rockerritesh
AI Safety & Mechanistic Interpretability Researcher
Computer Engineering graduate · Pulchowk Campus, Tribhuvan University, Nepal
Hello! I'm an AI researcher working on the interpretability and safety of language models, and on bringing language technology to Maithili and other under-served languages. I work on AI safety and agentic systems at Astha.ai, write here, and post short explainers on YouTube.
Publications
See my Google Scholar for the full list.
SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering
ACL 2026 (Main Conference)
LLM representations follow distinct per-task "constellation" trajectories in embedding space; SafeConstellations is an inference-time method that selectively steers only over-refusal-prone tasks toward non-refusal pathways — cutting over-refusals by up to 73% with minimal utility loss and no retraining.
On the Relationship Between Representation Geometry and Generalization in Deep Neural Networks
Preprint, 2026
Effective dimension, an unsupervised, label-free geometric metric, predicts generalization across domains: partial r = 0.75 over 52 ImageNet classifiers (13 architecture families), replicating on CIFAR-10 and generalizing to NLP (8 encoders on SST-2/MNLI, 15 decoder-only LLMs on AG News). Establishes bidirectional causality: degrading geometry with noise drops accuracy (r = −0.94), while a PCA projection that improves geometry preserves accuracy.
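For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute an effective dimension from a feature matrix: the participation ratio of the covariance eigenvalues. This page does not state which definition the paper uses, so the participation ratio, the function name `effective_dimension`, and the synthetic data below are all assumptions made for illustration.

```python
import numpy as np

def effective_dimension(features: np.ndarray) -> float:
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    ASSUMPTION: this is one common label-free definition of effective
    dimension; the paper's exact metric may differ.
    `features` has shape (n_samples, n_features) and must not be constant.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(features) - 1)
    # eigvalsh is for symmetric matrices; clip tiny negative round-off to zero.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eigvals.sum() ** 2 / np.square(eigvals).sum())

# Hypothetical synthetic representations, not data from the paper.
rng = np.random.default_rng(0)
isotropic = rng.normal(size=(2000, 16))                            # variance spread over 16 axes
low_rank = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 16))   # variance confined to 2 axes

print(effective_dimension(isotropic))  # close to 16
print(effective_dimension(low_rank))   # at most about 2
```

The metric needs only the representations themselves, no labels, which is what makes it usable as an unsupervised predictor.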
Geometric Phases of Mechanism Formation in Neural Networks
GLOW 2026 — Workshop on Generalizing from Limited Resources in the Open World @ IJCAI 2026 (Poster)
Using linear probes and centered kernel alignment (CKA) across dense training checkpoints, finds that classification mechanisms form output-layer-first and within the first ~5% of training: output layers reach >70% probe accuracy by epoch 5 while input layers stay below 50% (Cohen's d = 3.68). The same deep-first pattern holds in the first ~200M tokens of from-scratch LLM pretraining (GPT-2 Small, SmolLM2-135M) and reproduces on public Pythia / OLMo-2 checkpoints — and isn't explained by gradient magnitude. arXiv & code coming soon.
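As a rough illustration of the CKA comparison mentioned above, the sketch below implements the standard linear form of centered kernel alignment between two activation matrices recorded on the same inputs. Whether the paper uses linear or kernel CKA is not stated here, so the linear variant, the function name `linear_cka`, and the random activations are assumptions.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between activations x (n, d1) and y (n, d2) on the same n inputs.

    ASSUMPTION: linear CKA is used for illustration; the paper may use
    a different kernel or a debiased estimator.
    """
    # Center each feature so the comparison ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Hypothetical activations, standing in for one layer at two checkpoints.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 8))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
rotated = 3.0 * acts @ rotation                      # same geometry, rotated and rescaled
unrelated = rng.normal(size=(500, 8))                # independent activations

print(linear_cka(acts, rotated))    # close to 1
print(linear_cka(acts, unrelated))  # close to 0
```

Linear CKA is invariant to rotations and uniform rescaling of the features, so it tracks when a layer's representation stops changing across checkpoints even if individual neurons keep moving.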
MaiBERT: A Pre-training Corpus and Language Model for Low-Resourced Maithili Language
LoResLM @ ACL 2026 (pp. 444–452), Rabat, Morocco
The first monolingual BERT for Maithili (~50M speakers), pre-trained with masked language modeling on a newly constructed Maithili corpus. Reaches 87.02% accuracy on news classification — outperforming regional models NepBERTa and HindiBERT with 5–7% gains across classes — and is open-sourced for downstream tasks like sentiment analysis and NER. (Earlier preprint: "Can maiBERT Speak for Maithili?")
Revolutionizing Currency Security: A YOLOv8-Based Approach for Detecting Counterfeit Nepali Banknotes
J. Bus. Econ. Stud., 2024
Classifies Nepalese banknotes as genuine or counterfeit to help curb financial fraud. On 180 samples of the 1000-rupee note, YOLOv8 reaches a recall of 0.82 on the front face and 0.9863 on the back, with strong mAP, and the approach is adaptable across hardware platforms.
Evaluating Auto-Encoding Transformer Language Models for Maithili Text Classification
B.E. Thesis (Electronics & Computer Engineering), Tribhuvan University, 2024
Builds a Maithili masked language model via transfer learning and fine-tunes it on a curated Maithili news classification dataset — the precursor work to maiBERT, addressing the absence of dedicated language models and task-specific data for Maithili.
Machine Learning Analysis of Tirhuta Lipi
Technical Report, Tribhuvan University, 2023
Character recognition for the Tirhuta script (Maithili) — MobileNet embeddings with logistic regression reach 0.97 accuracy, enabling OCR and translation for a low-resource script.
Support Vectors Are a Better Way of Text Classification for Imbalanced Data
2023
A TF-IDF + n-gram support-vector pipeline for text classification across 100+ classes on highly imbalanced data. It outperforms neural baselines on the test set and supports incremental retraining as users supply fresh data. Consolidates the team's 1st Runner-Up submissions at Docsumo DataRush (LOCUS 2021) and DataVerse (LOCUS 2023).
Writing
Notes and essays on AI, math, and a few things in between.
Experience
- AI Researcher — Safety & Agentic Systems, Astha.ai
  Zero-Trust agent oversight, MCP-Scanner vulnerability platform, SAFE-MCP framework.
- AI Engineer — RAG & Infrastructure, AMNIL Technologies
  Guardrails, LLM-as-a-Judge evaluation, self-hosted LLM serving with vLLM.
- Data Team Lead, GradeUp Educations
  Learning agents/chatbots, an automated grade-evaluation system, and semantic-similarity matching.
- GAN Specialization Mentor, DeepLearning.AI
Projects
- maiBERT — First BERT for Maithili (demo)
- Whisper-tiny Maithili (ASR) — Open Maithili speech-to-text — OpenAI Whisper-tiny fine-tuned on the IISc SYSPIN corpus (63.9% WER) (live demo)
- MMS-TTS Maithili (TTS) — Maithili text-to-speech — VITS / Meta MMS-TTS fine-tuned on a SYSPIN male voice (live demo)
- SAFE-MCP / SAF-MCP — Contributed detection techniques and mitigations to a community security framework for the Model Context Protocol (MCP) — an ATT&CK-style catalogue of agentic-system threats and defenses
- AgentGuard — Zero-Trust protocol for AI agents: identity, policy, mTLS, audit (Python SDK + Go server)
- spiffe-core · TraT — SPIFFE-based agent identity/attestation and Transaction Tokens for multi-agent workflows (TraT)
- sumit-mcp-server — Federated memory MCP server (live on HF Spaces)
- Vibe-Coder — An agent that builds Streamlit/FastAPI apps
- IRB Robotics Arm — Open-source image-recognition robotic arm (UN SDG3)
Honors & Awards
- Winner, GritFeat AI Hackathon
  SWIFT — wearable LSTM fall-detection for the elderly (79.86%).
- 1st Runner-Up, Docsumo DataVerse — LOCUS 2023
  Team Deep Learners — NLP classification of imbalanced research-paper abstracts.
- 1st Runner-Up, Docsumo DataRush — LOCUS 2021
  Team Deep Learners — abstract classification into 158 classes (SVC + TF-IDF).
- Best AI Project, DELTA 3.0
  Nepali Harvest — crop-disease prediction & harvest timing.
- Winner, IT-Meet Image Challenge
  Computer-vision classification of Nepali ballot-paper images.
- Winner, LogPoint Capture The Flag
  Binary exploitation & forensics.