Sumit Yadav

aka rockerritesh

AI Safety & Mechanistic Interpretability Researcher

Computer Engineering graduate · Pulchowk Campus, Tribhuvan University, Nepal

Hello! I'm an AI researcher working on the interpretability and safety of language models, and on bringing language technology to Maithili and other under-served languages. I work on AI safety and agentic systems at Astha.ai, write here, and post short explainers on YouTube.

2019 · Began B.E. Computer Engineering, Pulchowk Campus; started learning AI/ML
2020 · First project (IRB image-recognition robotic arm); Deep Learning Specialization (DeepLearning.AI)
2021 · GANs Specialization & DeepLearning.AI GAN mentor; LogPoint CTF win
2022 · DELTA 3.0 & IT-Meet vision wins
2023 · ML for Tirhuta Lipi; GritFeat AI Hackathon
2024 · AI Engineer at AMNIL; banknote-security paper
2025 · AI research at Astha.ai (safety & agents)
2026 · ACL Main, GLOW @ IJCAI, maiBERT; Computer Engineering graduate

Publications

See my Google Scholar for the full list.

SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering

Utsav Maskey, Sumit Yadav, Mark Dras, Usman Naseem

ACL 2026 (Main Conference)

LLM representations follow distinct per-task "constellation" trajectories in embedding space; SafeConstellations is an inference-time method that selectively steers only over-refusal-prone tasks toward non-refusal pathways — cutting over-refusals by up to 73% with minimal utility loss and no retraining.

On the Relationship Between Representation Geometry and Generalization in Deep Neural Networks

Sumit Yadav (sole author)

Preprint, 2026

Effective dimension, an unsupervised, label-free geometric metric, predicts generalization across domains: partial r = 0.75 over 52 ImageNet classifiers (13 architecture families), replicating on CIFAR-10 and extending to NLP (8 encoders on SST-2/MNLI, 15 decoder-only LLMs on AG News). Establishes bidirectional causality: degrading geometry with noise drops accuracy (r = −0.94), while PCA that improves geometry preserves accuracy.
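
One common way to make "effective dimension" concrete is the participation ratio of the feature covariance C, (tr C)² / tr(C²), which equals (Σλ)² / Σλ² over the eigenvalues of C. The sketch below assumes that definition, which may differ from the one used in the preprint. The function name and toy inputs are illustrative only.

```python
def effective_dimension(features):
    """Participation ratio (tr C)^2 / tr(C^2) of the feature covariance C.

    `features` is a list of n rows, each a list of d floats (one row per input).
    Assumption: this is one common definition, not necessarily the preprint's.
    """
    n, d = len(features), len(features[0])
    if n < 2:
        raise ValueError("need at least two samples")
    # Center each feature column.
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    # Unbiased d x d covariance matrix.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    trace = sum(cov[a][a] for a in range(d))
    # For a symmetric matrix, tr(C^2) is the sum of squared entries.
    trace_sq = sum(cov[a][b] ** 2 for a in range(d) for b in range(d))
    if trace_sq == 0:
        raise ValueError("features have zero variance")
    return trace ** 2 / trace_sq


# Variance spread evenly over two axes versus collapsed onto one direction.
isotropic = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
collapsed = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]
print(effective_dimension(isotropic), effective_dimension(collapsed))
```

The first toy input spreads variance evenly over both axes and scores close to 2, while the second lies on a single line and scores close to 1. The trace form avoids an eigendecomposition, so the sketch needs no numerical library.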

Geometric Phases of Mechanism Formation in Neural Networks

Sumit Yadav

GLOW 2026 — Workshop on Generalizing from Limited Resources in the Open World @ IJCAI 2026 (Poster)

Using linear probes and centered kernel alignment (CKA) across dense training checkpoints, finds that classification mechanisms form output-layer-first and within the first ~5% of training: output layers reach >70% probe accuracy by epoch 5 while input layers stay below 50% (Cohen's d = 3.68). The same deep-first pattern holds in the first ~200M tokens of from-scratch LLM pretraining (GPT-2 Small, SmolLM2-135M) and reproduces on public Pythia / OLMo-2 checkpoints — and isn't explained by gradient magnitude. arXiv & code coming soon.
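
For readers unfamiliar with centered kernel alignment, the following is a minimal sketch of linear CKA between two sets of activations recorded on the same inputs. It assumes the standard linear-kernel form, ‖YᵀX‖²_F / (‖XᵀX‖_F · ‖YᵀY‖_F) on column-centered matrices. The paper's exact variant and preprocessing are not stated here, and the helper names are illustrative.

```python
import math


def _center(mat):
    """Subtract the column mean from every column of a row-major matrix."""
    n, d = len(mat), len(mat[0])
    means = [sum(row[j] for row in mat) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in mat]


def _cross_fro_sq(a, b):
    """Squared Frobenius norm of a^T b for row-aligned matrices a and b."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total


def linear_cka(x, y):
    """Linear CKA between activations x (n x d1) and y (n x d2)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    xc, yc = _center(x), _center(y)
    denom = math.sqrt(_cross_fro_sq(xc, xc) * _cross_fro_sq(yc, yc))
    if denom == 0:
        raise ValueError("activations have zero variance")
    return _cross_fro_sq(xc, yc) / denom


# Hypothetical activations: four inputs, two units per layer.
layer_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
# The same activations rotated by 90 degrees.
layer_b = [[-2.0, 1.0], [-1.0, 3.0], [-4.0, 0.0], [-2.0, 2.0]]
print(linear_cka(layer_a, layer_b))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, so the rotated copy above scores close to 1. That invariance is what makes it usable for comparing layers across checkpoints whose units are not aligned.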

MaiBERT: A Pre-training Corpus and Language Model for Low-Resourced Maithili Language

Sumit Yadav, Raju Kumar Yadav, Utsav Maskey, Gautam Siddharth Kashyap, Ganesh Gautam, Usman Naseem

LoResLM @ ACL 2026 (pp. 444–452), Rabat, Morocco

The first monolingual BERT for Maithili (~50M speakers), pre-trained with masked language modeling on a newly constructed Maithili corpus. Reaches 87.02% accuracy on news classification — outperforming regional models NepBERTa and HindiBERT with 5–7% gains across classes — and is open-sourced for downstream tasks like sentiment analysis and NER. (Earlier preprint: "Can maiBERT Speak for Maithili?")

Revolutionizing Currency Security: A YOLOv8-Based Approach for Detecting Counterfeit Nepali Banknotes

Sumit Yadav et al.

J. Bus. Econ. Stud., 2024

Classifies Nepalese banknotes as genuine or counterfeit to help curb financial fraud. On 180 samples of the 1000-rupee note, YOLOv8 reaches a recall of 0.82 on the front face and 0.9863 on the back, with strong mAP; the approach is adaptable across hardware platforms.

Evaluating Auto-Encoding Transformer Language Models for Maithili Text Classification

Sumit Yadav, Raju Kumar Yadav

B.E. Thesis (Electronics & Computer Engineering), Tribhuvan University, 2024

Builds a Maithili masked language model via transfer learning and fine-tunes it on a curated Maithili news classification dataset — the precursor work to maiBERT, addressing the absence of dedicated language models and task-specific data for Maithili.

Machine Learning Analysis of Tirhuta Lipi

Sumit Yadav, Raju Kumar Yadav

Technical Report, Tribhuvan University, 2023

Character recognition for the Tirhuta script (Maithili) — MobileNet embeddings with logistic regression reach 0.97 accuracy, enabling OCR and translation for a low-resource script.

Support Vectors Are a Better Way of Text Classification for Imbalanced Data

Sumit Yadav et al.

2023

A TF-IDF + n-gram support-vector pipeline for text classification across 100+ classes on highly imbalanced data. It outperforms neural baselines on the test set and supports incremental retraining as users supply fresh data. Consolidates the team's 1st Runner-Up submissions at Docsumo DataRush (LOCUS 2021) and DataVerse (LOCUS 2023).

Writing

Notes and essays on AI, math, and a few things in between.

All 60 posts →

Experience

  • AI Researcher — Safety & Agentic Systems, Astha.ai · 2025–present
    Zero-Trust agent oversight, MCP-Scanner vulnerability platform, SAFE-MCP framework.
  • AI Engineer — RAG & Infrastructure, AMNIL Technologies · 2024–2025
    Guardrails, LLM-as-a-Judge evaluation, self-hosted LLM serving with vLLM.
  • Data Team Lead, GradeUp Educations · 2022–2024
    Learning agents/chatbots, an automated grade-evaluation system, and semantic-similarity matching.
  • GAN Specialization Mentor, DeepLearning.AI · 2021–present

Projects

  • maiBERT — First BERT for Maithili (demo)
  • Whisper-tiny Maithili (ASR) — Open Maithili speech-to-text — OpenAI Whisper-tiny fine-tuned on the IISc SYSPIN corpus (63.9% WER) (live demo)
  • MMS-TTS Maithili (TTS) — Maithili text-to-speech — VITS / Meta MMS-TTS fine-tuned on a SYSPIN male voice (live demo)
  • SAFE-MCP / SAF-MCP — Contributed detection techniques and mitigations to a community security framework for the Model Context Protocol (MCP) — an ATT&CK-style catalogue of agentic-system threats and defenses
  • AgentGuard — Zero-Trust protocol for AI agents: identity, policy, mTLS, audit (Python SDK + Go server)
  • spiffe-core · TraT — SPIFFE-based agent identity/attestation and Transaction Tokens for multi-agent workflows (TraT)
  • sumit-mcp-server — Federated memory MCP server (live on HF Spaces)
  • Vibe-Coder — An agent that builds Streamlit/FastAPI apps
  • IRB Robotics Arm — Open-source image-recognition robotic arm (UN SDG3)
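
The WER quoted for the Whisper-tiny Maithili model in the list above is a word error rate. A minimal sketch of the conventional definition (word-level edit distance divided by the number of reference words) is given below. It assumes whitespace tokenization and no text normalization, which may not match how the 63.9% figure was computed. The function name and example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.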

Honors & Awards

  • Winner, GritFeat AI Hackathon (2023)
    SWIFT — wearable LSTM fall-detection for the elderly (79.86%).
  • 1st Runner-Up, Docsumo DataVerse — LOCUS 2023 (2023)
    Team Deep Learners — NLP classification of imbalanced research-paper abstracts.
  • 1st Runner-Up, Docsumo DataRush — LOCUS 2021 (2021)
    Team Deep Learners — abstract classification into 158 classes (SVC + TF-IDF).
  • Best AI Project, DELTA 3.0 (2022)
    Nepali Harvest — crop-disease prediction & harvest timing.
  • Winner, IT-Meet Image Challenge (2022)
    Computer-vision classification of Nepali ballot-paper images.
  • Winner, LogPoint Capture The Flag (2021)
    Binary exploitation & forensics.

Documents & Links

Notes & Lab Reports