Hello, my name is

Joydeep Kumar Devnath

Senior Design Engineer, ML Compiler & Systems

About me

Artist and AI/VLSI Engineer with a strong foundation in deep learning systems and 5+ years of industry experience in ML compilers, AI systems, and hardware–software co-design.

I design and implement AI algorithms with a focus on software–hardware integration.

My work focuses on backend compiler pipelines and IR-level optimizations for AI accelerators, including Qualcomm’s Hexagon Tensor Processor (HTP) on Snapdragon SoCs.

I have hands-on experience deploying generative and deep learning models for both training and inference, optimizing them for target hardware, and contributing to production-grade compiler and kernel libraries. My background includes performance optimization, quantization workflows for neural processors, and energy-efficient fused operations, with several contributions protected by patents. This site highlights selected projects and research from my work. Explore my portfolio to learn more.

What I do

From understanding the requirements and designing a blueprint to delivering the final product, I handle everything in between.

Skills

ML Compilers & Systems

  • ML graph compilers: IR design, lowering, and optimization passes
  • Backend compiler pipelines for AI accelerators
  • Hardware-aware code generation and kernel optimization
  • Quantization workflows for efficient inference (uint8/uint16 pipelines)

Programming & Low-Level Systems

  • C / C++ for performance-critical systems
  • Python for compiler tooling and infrastructure
  • Verilog / RTL for hardware interaction
  • SIMD optimization using intrinsics and low-level assembly

AI Frameworks & Model Interfaces

  • PyTorch, TensorFlow
  • ONNX model ingestion and conversion
  • MLIR-style compiler architectures

AI Hardware & Accelerators

  • Qualcomm Hexagon Tensor Processor (HTP)
  • Custom neural processors and FPGA targets
  • Memory hierarchy, DMA, and vector/matrix units

Optimization & Performance

  • IR-level graph and operator optimizations
  • Operator fusion, kernel tiling, and scheduling
  • Latency, throughput, and memory optimization
  • Model accuracy debugging and performance analysis

Tools & Infrastructure

  • LLVM and MLIR (applied concepts)
  • Git, Gitea, CI workflows
  • Cadence Virtuoso, Catapult, Xilinx Vivado

Focus areas: ML compiler backends, performance optimization, and hardware–software co-design.

My Experience

2024-Present

Senior Engineer

Working on the backend of the Hexagon NPU compiler stack, focusing on IR-level optimizations, kernel development, and performance tuning for on-device AI workloads on Snapdragon SoCs.

  • Designed multi-level IR lowering paths (analogous to MLIR) in the HTP graph compiler for ML operations.
  • Implemented high-performance backend kernels using HVX/HMX SIMD intrinsics and assembly.
  • Delivered optimized MaxPool3D and Conv-Transpose kernels, achieving measurable latency reductions.
  • Optimized and deployed generative and deep learning models on Hexagon Tensor Processor.
  • Debugged accuracy issues and performance regressions across compiler and runtime layers.

2023-2024

AI Senior Design Engineer

Led compiler and systems development for in-house AI hardware across life sciences and automotive domains, focusing on backend architecture, performance, and hardware-aware algorithms.

  • Architected a custom compiler backend targeting analog compute circuits.
  • Led a team of two engineers to integrate and optimize deep learning models for life-sciences workloads, including molecular and graph-based learning.
  • Designed system-level software architecture for automotive platforms, referencing AUTOSAR and MISRA standards.
  • Contributed to lightweight RTOS design and hardware–software interface definition for embedded AI systems.

2020-2023

AI Design Engineer

Worked on neural processor compiler development, energy-efficient AI algorithms, and hardware-aware optimization across AI acceleration and security research.

  • Invented a patented, power-efficient fused CNN layer achieving a 3–4× improvement in energy efficiency.
  • Developed graph-level and backend compiler optimizations for an in-house neural processor.
  • Implemented optimized operator kernels and explored posit number systems for neural computation.
  • Designed quantization-aware training algorithms tailored to custom hardware.
  • Researched cryptographic algorithms and microarchitectural side-channel attacks for AI security.
  • Contributed to multiple patent filings through algorithmic and systems-level innovations.

Junior Research Fellow

Conducted research on neural network accelerators with a focus on memory architectures and hardware-aware design.

  • Researched and co-designed memory architectures for binary neural networks.
  • Co-authored a research publication on accelerator-efficient NN design.

Education

M.Tech Electrical Engineering (Microelectronics and VLSI)

My master’s thesis, “Energy-Efficient Architectures for Neural Networks,” advised by Prof. Joycee Mekie, focused on numerical representations for efficient and robust neural computation. I derived analytical bounds for the minimum exponent width that floating-point weight formats need to preserve model accuracy, established a mathematical relationship between mantissa precision and network depth, and proposed techniques to improve resilience to bit-level errors. My coursework included Pattern Recognition, Machine Learning, Artificial Intelligence, 3D Computer Vision, VLSI Design, Physics of Transistors, and CMOS Analog IC Design.

B.E. Electronics and Telecommunication Engineering [Gold Medal]

My final year project, “Prediction of Water Usage Based on Weather Data Patterns Using Neural Networks,” supervised by Prof. Rashi Borgohain and Mr. Tanmoy Goswami, focused on data-driven resource optimization. The work involved building an embedded system using Raspberry Pi, integrating sensors for soil characteristics and water usage, and implementing a neural network for prediction. A web-based interface with RESTful APIs was developed for user interaction and monitoring. Additionally, during an internship at IIT Guwahati, I worked on a machine learning–based human presence detection system using an AmigoBot, enabling autonomous navigation and litter collection upon detection.

Projects

Portfolio