Hello, my name is
Joydeep Kumar Devnath
Senior Design Engineer, ML Compiler & Systems
- joydeepkrdevnath@gmail.com
- +91 8638452791
About me
Artist and AI/VLSI Engineer with a strong foundation in deep learning systems and 5+ years of industry experience in ML compilers, AI systems, and hardware–software co-design.
I design and implement AI algorithms with a focus on software–hardware integration.
My work focuses on backend compiler pipelines and IR-level optimizations for AI accelerators, including Qualcomm’s Hexagon Tensor Processor (Snapdragon SoCs).
I have hands-on experience deploying generative and deep learning models for both training and inference, optimizing them for target hardware, and contributing to production-grade compiler and kernel libraries. My background includes performance optimization, quantization workflows for neural processors, and energy-efficient fused operations, with several contributions protected by patents. This site highlights selected projects and research from my work. Explore my portfolio to learn more.
Skills
ML Compilers & Systems
- ML graph compilers: IR design, lowering, and optimization passes
- Backend compiler pipelines for AI accelerators
- Hardware-aware code generation and kernel optimization
- Quantization workflows for efficient inference (uint8/16 pipelines)
Programming & Low-Level Systems
- C / C++ for performance-critical systems
- Python for compiler tooling and infrastructure
- Verilog / RTL for hardware interaction
- SIMD optimization using intrinsics and low-level assembly
AI Frameworks & Model Interfaces
- PyTorch, TensorFlow
- ONNX model ingestion and conversion
- MLIR-style compiler architectures
AI Hardware & Accelerators
- Qualcomm Hexagon Tensor Processor (HTP)
- Custom neural processors and FPGA targets
- Memory hierarchy, DMA, and vector/matrix units
Optimization & Performance
- IR-level graph and operator optimizations
- Operator fusion, kernel tiling, and scheduling
- Latency, throughput, and memory optimization
- Model accuracy debugging and performance analysis
Tools & Infrastructure
- LLVM and MLIR (applied concepts)
- Git, Gitea, CI workflows
- Cadence Virtuoso, Catapult, Xilinx Vivado
Focus areas: ML compiler backends, performance optimization, and hardware–software co-design.
My Experience
Senior Engineer
Working on the backend of the Hexagon NPU compiler stack, focusing on IR-level optimizations, kernel development, and performance tuning for on-device AI workloads on Snapdragon SoCs.
- Designed multi-level IR lowering paths (analogous to MLIR) in the HTP graph compiler for ML operations.
- Implemented high-performance backend kernels using HVX/HMX SIMD intrinsics and assembly.
- Delivered optimized MaxPool3D and Conv-Transpose kernels, achieving measurable latency reductions.
- Optimized and deployed generative and deep learning models on the Hexagon Tensor Processor.
- Debugged accuracy issues and performance regressions across compiler and runtime layers.
AI Senior Design Engineer
Led compiler and systems development for in-house AI hardware across life sciences and automotive domains, focusing on backend architecture, performance, and hardware-aware algorithms.
- Architected a custom compiler backend targeting analog compute circuits.
- Led a team of two engineers to integrate and optimize deep learning models for life-sciences workloads, including molecular and graph-based learning.
- Designed system-level software architecture for automotive platforms, referencing AUTOSAR and MISRA standards.
- Contributed to lightweight RTOS design and hardware–software interface definition for embedded AI systems.
AI Design Engineer
Worked on neural processor compiler development, energy-efficient AI algorithms, and hardware-aware optimization across AI acceleration and security research.
- Invented a patented, power-efficient fused CNN layer that improved energy efficiency by 3–4×.
- Developed graph-level and backend compiler optimizations for an in-house neural processor.
- Implemented optimized operator kernels and explored posit number systems for neural computation.
- Designed quantization-aware training algorithms tailored to custom hardware.
- Researched cryptographic algorithms and microarchitectural side-channel attacks for AI security.
- Contributed to multiple patent filings through algorithmic and systems-level innovations.
Junior Research Fellow
Conducted research on neural network accelerators with a focus on memory architectures and hardware-aware design.
- Researched and co-designed memory architectures for binary neural networks.
- Co-authored a research publication on accelerator-efficient NN design.
Education
M.Tech Electrical Engineering (Microelectronics and VLSI)
My master’s thesis, “Energy-Efficient Architectures for Neural Networks,” advised by Prof. Joycee Mekie, focused on numerical representations for efficient and robust neural computation. I derived analytical bounds for the minimum exponent width in floating-point weight formats needed to preserve model accuracy. I also established a mathematical relationship between mantissa precision and network depth, and proposed techniques to improve resilience to bit-level errors. My coursework covered Pattern Recognition, Machine Learning, Artificial Intelligence, 3D Computer Vision, VLSI Design, Physics of Transistors, and CMOS Analog IC Design.
B.E. Electronics and Telecommunication Engineering [Gold Medal]
My final-year project, “Prediction of Water Usage Based on Weather Data Patterns Using Neural Networks,” supervised by Prof. Rashi Borgohain and Mr. Tanmoy Goswami, focused on data-driven resource optimization. The work involved building an embedded system on a Raspberry Pi, integrating sensors for soil characteristics and water usage, and implementing a neural network for prediction. I also developed a web-based interface with RESTful APIs for user interaction and monitoring. Additionally, during an internship at IIT Guwahati, I worked on a machine learning–based human presence detection system using an AmigoBot, enabling autonomous navigation and litter collection upon detection.
