Program 2026

Special Track: Embedded AI

2 July 2026

(Stay tuned for our full FPGA Conference program dropping Monday, March 16th.)

from 8:00 a.m.

Check-In and Welcome Coffee

9:00 a.m.

9:00 a.m. – 9:40 a.m.

Embedded AI - #1

AI Basics: From image processing to perception and beyond

    Description:

    Deep learning has become ubiquitous in recent years. This talk reviews key technical advances in model development: from early, compute-heavy convolutions for image object detection and segmentation, to major gains in scale, resolution, and performance that enabled generative AI and impressive results even with video sequences.

    It covers convolution mathematics and processing fundamentals, highlighting how vectorized hardware accelerates CNNs and broader operations. These foundations extend to transformers, which power large language models (LLMs) and other foundation models accessible on desktops, often via data-center backends.

    Recent publications adapt these principles to smaller models suited for local deployment. The talk outlines challenges of running such architectures on resource-constrained edge devices and explores ways to design scalable, efficient models for on-device inference.


    Level: Beginner

Alexander Flick


PLC2 GmbH


9:00 a.m. – 9:40 a.m.

Embedded AI - #2

FPGA-Accelerated Multi-Camera AI Vision for High-Speed Industrial Inspection on Kria KR260

    Description:

    High-speed industrial inspection requires deterministic timing and efficient data processing at the edge. This session presents the ongoing development and technical framework of an AI vision system on the Zynq UltraScale+ MPSoC for defect detection in high-output production environments. The architecture utilizes hardware-level triggering within the Programmable Logic (PL) to synchronize a multi-camera array, ensuring precise frame acquisition. AI inference is offloaded to the AMD DPU via Vitis AI, aiming to enable real-time defect detection at 5+ units/sec within an Embedded Linux/C++ environment. We explore the methodology for synchronizing multi-camera GigE Vision streams and the optimization of the deep learning pipeline for low-latency results. This work outlines a compact and efficient embedded approach for meeting the performance requirements of automated inspection systems.


    Level: Advanced

Yunus Kök


Anadologic


9:00 a.m. – 9:40 a.m.

Embedded AI - #3

Efficient 360° Threat Detection for Parked Vehicles - A Distributed, Event-Driven Approach

    Description:

    Automotive theft and vandalism remain costly global problems, driving the demand for smarter, energy-efficient vehicle protection systems. Traditional “always-on” video surveillance approaches rely on centralized compute architectures that consume excessive power and are unsuitable for multi-day parked operation. A new distributed, event-driven exterior monitoring concept enables 360° situational awareness while operating at sub-watt power levels. By using existing camera infrastructure and activating only when motion or threat patterns are detected, it provides scalable, low-latency protection without adding compute or bill-of-materials overhead. Beyond security, this approach supports vehicle personalization and user-adaptive responses, offering a flexible framework for future mobility platforms.


    In this session, you will learn more about:

    • Event-driven exterior monitoring concepts and low-power architectures
    • Techniques for achieving multi-day parked-state operation
    • How adaptive monitoring enhances personalization and security
    • Integration pathways for next-generation automotive platforms

    Level: Intermediate

Christian Mueller


Lattice Semiconductor GmbH


9:00 a.m. – 9:40 a.m.

Embedded AI - #4

Reinventing Super Resolution at the Edge: Custom FPGA AI Engines That Outrun GPUs

    Description:

    Edge AI demands low-latency, energy-efficient hardware, but general-purpose GPUs and NPUs cannot meet strict power, performance, and area constraints. FPGAs offer fine-grained configurability, yet existing overlay-style solutions—such as Altera’s CoreDLA and AMD’s DPU—fail to exploit the full architectural potential needed for real-time workloads. Heronic addresses this gap with an automated tool flow that generates bespoke AI accelerator IP tuned to both the target FPGA and the specific model. This methodology achieves near-peak silicon utilisation and has been proven on demanding tasks like super-resolution upscaling, delivering deeply optimised pipelines with high data reuse and deterministic latency. Heronic demonstrates over 50% performance improvement versus NVIDIA Jetson Orin NX on Altera FPGAs, alongside a 28× increase in TOPS utilisation. This talk outlines how engineers can leverage Heronic’s approach to build highly efficient accelerators for Altera devices.


    Level: Intermediate

Alexander Montgomerie-Corcoran


Heronic Technologies


9:50 a.m.


9:50 a.m. – 10:30 a.m.

Embedded AI - #1

Inside Edge AI: Processing Paradigms and Architectural Hints

    Description:

    Deploying neural networks at the edge requires understanding diverse hardware architectures and their trade-offs. This talk covers key edge AI processing paradigms: general-purpose CPUs with coprocessors, programmable GPUs, reconfigurable FPGAs, and dedicated vector processor arrays.


    It examines model optimization techniques—quantization, pruning, and tools like TensorFlow Lite and PyTorch Mobile—and their interaction with hardware-specific toolchains, often via ONNX. The deployment pipeline is detailed: from training frameworks, through conversion, to efficient on-device inference.


    Different neural network topologies suit hardware differently. Convolutional neural networks, dominant in edge vision tasks, map well across platforms. Emerging transformers, however, struggle with high memory bandwidth and data movement demands.
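
    A back-of-envelope arithmetic-intensity comparison illustrates this contrast. The layer shapes below are invented for illustration, not taken from the talk: each convolution weight is reused once per output position, while a transformer-style matrix-vector step touches every weight exactly once per token, making it bandwidth-bound on edge devices.

```python
# Illustrative arithmetic-intensity estimate (made-up layer shapes):
# MACs performed per byte of weights moved from memory.

def conv_intensity(h, w, cin, cout, k):
    macs = h * w * cout * k * k * cin          # one MAC per tap per output pixel
    weight_bytes = k * k * cin * cout          # int8 weights: 1 byte each
    return macs / weight_bytes                 # MACs per weight byte

def matvec_intensity(din, dout):
    macs = din * dout                          # one MAC per weight
    weight_bytes = din * dout
    return macs / weight_bytes

conv = conv_intensity(56, 56, 64, 64, 3)       # every weight reused 56*56 times
mv = matvec_intensity(768, 768)                # every weight used exactly once
assert conv == 56 * 56 and mv == 1.0
```

    With reuse factors this far apart, the matrix-vector step saturates memory bandwidth long before it saturates the compute array.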


    These insights can aid platform and product decisions, while acknowledging that no single architecture yet dominates all edge AI use cases.


    Level: Intermediate

Alexander Flick


PLC2 GmbH


9:50 a.m. – 10:30 a.m.

Embedded AI - #2

Building State of the Art Computer Vision Models for the Far Edge

    Description:

    This session is a technical deep dive into the development and deployment of computer vision AI models on far edge devices. 

    The content covers hardware and efficiency requirements, collaborative engineering workflows, pipeline and model design principles, and practical strategies for training, optimizing, and deploying small models under strict resource constraints. 

    Emphasis is placed on the importance of data quality, loss function selection, and hardware-aware training to achieve robust, real-time AI performance in industrial and consumer edge applications.


    Level: Intermediate

Karl Wachswender


Lattice Semiconductor GmbH


9:50 a.m. – 10:30 a.m.

Embedded AI - #3

PipeGen: Agentic AI to Generate, Debug, and Deploy End-to-End Edge AI Pipelines

    Description:

    Edge AI deployment is mostly pipeline engineering: connecting real inputs, building pre/post-processing, integrating runtimes, debugging performance regressions, and packaging for devices. The model is only one component. This talk presents PipeGen, an agentic AI system that automates the end-to-end Build–Debug–Deploy workflow for Edge AI pipelines using a vendor-neutral pipeline manifest and a portable intermediate representation (IR). PipeGen compiles pipeline intent into target-specific implementations (e.g., GStreamer/ROS2/vendor runtimes), adds instrumentation for bottleneck attribution (I/O vs. preprocess vs. runtime vs. postprocess), and generates reproducible build artifacts plus benchmark reports. The session explains the architecture (manifest => IR => adapters), the agent loop for generation and debugging, and a validation/benchmark harness designed for portability. It concludes with standardization-relevant recommendations for pipeline schemas, capability negotiation, conformance checks, and consistent metrics in a vendor-agnostic manner.


    Level: Advanced

Yashwant Dagar


CraftifAI


9:50 a.m. – 10:30 a.m.

Embedded AI - #4

AI for Everyone with Altera Agilex3

    Description:

    Agilex3 is designed to make AI accessible, intuitive, and practical for individuals and organizations of all sizes.


    Level: Beginner

Tolga Sel


Arrow Central Europe GmbH

Helmut Plötz

ONE WARE GmbH i.G.


10:30 a.m. - 11:00 a.m.

Coffee Break and Time for Networking

11:00 a.m.


11:00 a.m. – 11:40 a.m.

Embedded AI - #1

Reality Over Peak Specs: Constraints Driven Platform Selection for Edge AI

    Description:

    Selecting a processor for edge AI is rarely about choosing the device with the highest advertised performance. Real systems are constrained by power, latency, memory, interfaces, development effort, long-term availability, and commercial considerations. This talk presents a constraints-driven approach to platform selection for edge AI applications. It discusses how different edge AI applications map to MCUs, MPUs, FPGAs, and dedicated accelerators, and why no single architecture is optimal across all scenarios. Using practical examples, the session highlights common pitfalls of datasheet-based decisions and outlines a systematic way to evaluate trade-offs between flexibility, efficiency, and system complexity in real deployments. The goal of this talk is to equip attendees with a practical framework for selecting the right compute platform for their next edge AI application.
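
    One hypothetical way to picture such a constraints-driven selection: hard constraints eliminate candidates outright, and only then do soft criteria rank the survivors. All platform figures and weights below are invented for illustration, not the speaker's data.

```python
# Hypothetical filter-then-score sketch of constraints-driven platform
# selection. Every number here is made up for illustration.

platforms = {
    "MCU":         {"power_w": 0.3, "latency_ms": 50, "flexibility": 2, "dev_effort": 1},
    "MPU":         {"power_w": 3.0, "latency_ms": 15, "flexibility": 3, "dev_effort": 2},
    "FPGA":        {"power_w": 5.0, "latency_ms": 2,  "flexibility": 5, "dev_effort": 4},
    "Accelerator": {"power_w": 8.0, "latency_ms": 1,  "flexibility": 1, "dev_effort": 2},
}

def select(platforms, max_power_w, max_latency_ms, weights):
    # Hard constraints eliminate candidates outright ...
    feasible = {name: p for name, p in platforms.items()
                if p["power_w"] <= max_power_w and p["latency_ms"] <= max_latency_ms}
    # ... then soft criteria rank what remains (higher score = better fit).
    def score(p):
        return weights["flexibility"] * p["flexibility"] - weights["dev_effort"] * p["dev_effort"]
    return sorted(feasible, key=lambda n: score(feasible[n]), reverse=True)

ranking = select(platforms, max_power_w=6.0, max_latency_ms=10,
                 weights={"flexibility": 2, "dev_effort": 1})
assert ranking == ["FPGA"]  # with these budgets, only one candidate survives
```

    The point of the structure is that a spec-sheet winner (the accelerator's 1 ms latency) never even reaches the scoring stage if it busts the power budget.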


    Level: Intermediate

Saad Qazi


EBV Elektronik GmbH & Co. KG


11:00 a.m. – 11:40 a.m.

Embedded AI - #2

Silicon Brains vs. Silicon Gates: Can LLMs Replace the FPGA Engineer?

    Description:

    Can an LLM truly understand timing constraints, or is it just "hallucinating" hardware?

    As Artificial Intelligence transforms the landscape of software development, the FPGA and ASIC industries are facing a pivotal question: are we witnessing the end of manual RTL coding? Modern AI models can generate hundreds of lines of VHDL or Verilog in seconds, promising to slash development cycles and eliminate the "blank page" problem for hardware engineers.

    However, hardware design is fundamentally different from software. In a world where a single logic bug can lead to expensive re-spins or bricked prototypes, the stakes for code quality are absolute. A model might generate syntactically correct code that completely ignores clock domain crossings (CDC), metastability, or FPGA resource optimization.

    This session provides a transparent, no-hype evaluation of AI's current capabilities in hardware description languages. We will put leading AI models to the test, benchmarking their output against the rigorous standards of professional FPGA engineering.


    Level: Beginner

Oren Hollander


HandsOn Training 


11:00 a.m. – 11:40 a.m.

Embedded AI - #3

Breaking the Data Bottleneck: Hardware-Accelerated Lossless Compression for Next-Generation AI Systems

  • more info ▾

    Description:

    As AI workloads scale in complexity and data volume, data movement has become a dominant bottleneck in FPGA-based acceleration platforms. Lossless data compression offers an effective mechanism to reduce storage, memory bandwidth, and interconnect pressure while preserving data accuracy, which is mandatory for training datasets, model checkpoints, distributed learning, and sensor telemetry. This presentation examines the role of high-performance lossless compression IP cores implemented on FPGAs, including LZ4, Zstd, Snappy, and GZIP. We discuss streaming architectures, throughput-latency tradeoffs, and resource utilization, and show how compression engines can be integrated into FPGA data paths such as DDR, Ethernet, and sensor interfaces. Practical AI use cases are presented, including high-speed data ingestion, compressed model loading, and distributed training support. Hardware-accelerated lossless compression is highlighted as a key enabler for scalable and energy-efficient FPGA-based AI systems.
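
    The lossless round-trip at the heart of this argument can be demonstrated with Python's standard zlib module (the DEFLATE algorithm behind GZIP) standing in for the hardware IP cores; LZ4, Zstd, and Snappy trade ratio against speed along the same lines.

```python
# Illustrative sketch of lossless compression: redundant data shrinks,
# and decompression restores it bit-exactly, which is what makes the
# technique safe for training data, checkpoints, and telemetry.
import zlib

# A redundant, telemetry-like payload compresses well.
payload = b"sensor=42,temp=21.5;" * 1000

compressed = zlib.compress(payload, level=6)
assert zlib.decompress(compressed) == payload  # lossless: bit-exact restore
assert len(compressed) < len(payload)

ratio = len(payload) / len(compressed)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.1f}x smaller)")
```

    In a hardware implementation the same trade-off recurs, only the axes change: compression ratio versus line-rate throughput versus FPGA resources rather than CPU time.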


    Level: Intermediate

Dr. Calliope-Louisa Sotiropoulou


CAST Inc.


11:00 a.m. – 12:30 p.m.   (90 min)

Embedded AI - #4

Accelerating Adoption of USB 10Gbps I/O in Edge AI and Embedded Systems

    Description:

    USB has long been the go-to interconnect for embedded applications due to its simplicity. However, despite the introduction of USB 3.1 Gen2 (now known as USB 3.2 Gen2) in 2013, which offers 10 Gbps of bandwidth, few applications have been able to harness the full potential of 10 Gbps USB connectivity. Now, exponential growth of AI in embedded systems has created a surge in demand for faster interconnects. In this session, we'll explore how Infineon's latest EZ-USB FX10 general-purpose USB 10Gbps peripheral controller is poised to accelerate the adoption of higher-bandwidth USB interconnects in embedded and edge AI applications, unlocking new possibilities for these rapidly evolving fields.


    Level: Intermediate

FliX Feng


Infineon Technologies

Jimmy Chou

Infineon Technologies


11:50 a.m.


11:50 a.m. – 12:30 p.m.

Embedded AI - #1

BYOM – Custom Model Edge Inference with Vitis AI

    Description:

    The new AMD Vitis AI 2025.1 toolchain introduces a streamlined workflow for optimizing and deploying deep learning models on AMD Versal AI Edge and Versal AI Edge Gen 2 devices. This talk outlines the updated processing flow centered on ONNX model input, detailing key improvements in quantization, compilation, and runtime integration.


    Using a pretrained model as a running example, the session walks through the "Bring Your Own Model" (BYOM) process: creating a snapshot of the model, which quantizes and compiles it for the NPU hardware target. We will review the controls available to scale and tune performance and discuss the tools for inspecting the results. The resulting snapshot can then be deployed on physical hardware, supported by the Arm CPU and the Vitis AI Runtime (VART).
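
    The quantize step inside such a snapshot can be pictured generically. This is a minimal, illustrative per-tensor int8 scheme with made-up weights, not the actual Vitis AI API:

```python
# Generic illustration of symmetric per-tensor int8 quantization (the kind
# of transformation a quantize-and-compile flow applies) - not vendor code.

def quantize_int8(weights):
    """Map float weights to int8 using a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.635, 0.9]        # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

    Real toolchains add per-channel scales, calibration data, and operator fusion on top, but the accuracy-versus-range trade-off is the same one this sketch exposes.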


    Attendees will gain a solid understanding of how Vitis AI 2025.1 supports flexible AI deployment on adaptive platforms and how to apply the updated tools efficiently in custom inference workflows.


    Level: Beginner

Alexander Flick


PLC2 GmbH


11:50 a.m. – 12:30 p.m.

Embedded AI - #2

Accelerating Edge AI with Efinix FPGAs: TinyML and eCNN for Real-World Applications

    Description:

    Edge AI demands low latency, minimal power use and flexible deployment. Traditional cloud-based AI falls short in these areas. Microcontrollers struggle with performance limits. High-end GPUs consume too much power. Efinix FPGAs bridge this gap with two optimized solutions. The TinyML framework brings lightweight neural networks to resource-constrained devices. The eCNN IP core delivers higher performance for complex models. Both run directly on Efinix FPGAs. This session will cover architecture details, performance data and deployment examples. Attendees will learn how to implement Efinix AI solutions in their projects. The focus is on practical steps for real-world edge AI systems. The goal is to enable faster, more efficient AI at the edge using FPGA technology.


    Level: Intermediate

Andreas Büttner

Efinix GmbH


11:50 a.m. – 12:30 p.m.

Embedded AI - #3

One Size Does Not Fit All: Power-Efficient Vision AI on FPGAs and Beyond

    Description:

    Embedded AI deployments span a wide range of performance, power, and integration constraints, yet they are often approached with a “one-size-fits-all” mindset. This talk argues that such an approach is as inefficient as using a 40-ton truck for weekly grocery shopping. Instead, we explore how tailoring AI architectures to the actual workload can unlock significant gains in efficiency and throughput. We focus on power-efficient AI processing for vision data on FPGAs and FPGA-SoCs, highlighting architectural trade-offs and design considerations. A key enabler is unstructured sparsity: using tools from Neuronix (acquired by Microchip), pruning is performed during training to produce a sparse neural network, which the Neuronix flow then transforms into a hardware-efficient implementation without sacrificing accuracy. Additionally, we position FPGA-based embedded AI within a broader heterogeneous compute landscape, showing how workloads can scale toward higher-end solutions using NVIDIA Holoscan IP when application complexity or data rates exceed embedded constraints. The result is a pragmatic, scalable view of AI acceleration, from efficient edge inference to high-performance systems, where the right tool is used for the right job.


    Level: Intermediate

Brian Colgan


Microchip Technology GmbH


Martin Kellermann

Microchip Technology GmbH


12:30 p.m. - 1:30 p.m.

Lunch Break and Time for Networking

1:30 p.m.


1:30 p.m. – 2:10 p.m.

Embedded AI - #1

An Experiment in AI-Assisted FSMs on FPGAs

    Description:

    Finite State Machines (FSMs) are a fundamental tool in FPGA design, especially for control logic and reactive systems. In practice, however, FSMs tend to grow steadily over time. Optional features, variations between implementations, and dependencies on previous inputs or events often lead to large and hard-to-maintain state machines. Once this complexity has accumulated, it becomes difficult to modify or extend the design without introducing errors.


    This talk presents an experiment that asks a simple question: can small, learned decision blocks help manage FSM complexity without sacrificing determinism? Instead of encoding all possible input sequences and corner cases as explicit states and transitions, recent input history is summarized and evaluated by a compact inference pipeline inspired by language models. The FSM remains responsible for timing, safety, and control, while the learned block assists in classification and decision making where rigid state transitions become impractical.
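
    A toy software sketch, entirely hypothetical and not the speaker's implementation, shows the division of labour described above: the FSM keeps deterministic control of transitions, while a compact "learned" scorer classifies recent input history instead of that history being encoded as explicit states.

```python
# Toy illustration of an FSM delegating classification to a learned-style
# block. The scorer below is a hand-written stand-in, not a trained model.

def classify(window):
    # Stand-in for a small learned decision block: label the window
    # "alert" when at least two recent readings are high.
    return "alert" if sum(1 for x in window if x > 0.8) >= 2 else "normal"

def step(state, window):
    # Deterministic transition table; the learned block only supplies a label.
    label = classify(window)
    if state == "IDLE" and label == "alert":
        return "CHECK"
    if state == "CHECK":
        return "ALARM" if label == "alert" else "IDLE"
    return state

state = "IDLE"
for window in ([0.1, 0.9, 0.95], [0.9, 0.85, 0.2], [0.1, 0.2, 0.3]):
    state = step(state, window)
assert state == "ALARM"  # two consecutive alert windows escalate
```

    Timing and safety stay in the transition table; only the label computation is learned, which is what keeps the design deterministic and inspectable.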


    The goal of this work is not to propose a production-ready solution, but to explore whether AI-assisted FSMs can reduce design complexity while preserving determinism and hardware transparency. By framing the approach as an experiment, the talk aims to encourage discussion about new design patterns for managing growing complexity in control-heavy FPGA applications.


    Level: Intermediate

Denis Vasilik


Eccelerators GmbH


1:30 p.m. – 2:10 p.m.

Embedded AI - #2

Design Techniques for High-Performance Low-Latency LLM Inferencing on FPGAs optimized for AI

    Description:

    Low-latency LLM inferencing has rapidly become a critical workload for embedded and edge AI applications. It is already valued at multiple billions of dollars and expected to show further accelerated growth. The high speed of innovation in this field, and the push to maximize power and cost efficiency, require a high degree of flexibility from the solution performing these workloads. FPGAs optimized for AI provide this flexibility by virtue of fine-grained post-silicon programmability, enabling heterogeneous use of number formats for quantization, from MXINT8 and 4-bit formats down to ternary and even binary formats. This paper presents techniques demonstrating how to map accelerators performing the main workloads of these applications to such FPGAs so that they outperform GPUs. The main competitive metric is TCO, while latency, performance, and accuracy constraints are satisfied. These workloads include massively parallel matrix multiplication, which may be distributed across multiple FPGAs; very high-bandwidth loading of model weights from external memory with near-theoretical utilization; and point-to-point low-latency networking for scale-out. The paper presents in detail how the clock-cycle-level granularity of FPGA control logic is used to completely hide weight-loading, compute, and communication latencies behind one another.


    Level: Intermediate

Georg Hanak


Achronix Semiconductor Corporation


1:30 p.m. – 2:10 p.m.

Embedded AI - #3

Beyond the "Sledgehammer": Implementing Physical AI at the Sensor to Offload Robotic SoCs

    Description:

    Today, 95% of AI workloads are centralized, often resulting in a "sledgehammer hitting a fly" scenario where simple robotic tasks are processed by high-power, high-latency near-edge boxes. 

    This talk introduces the concept of Physical AI, which embeds intelligence directly next to the sensor to enable real-time sensing and immediate reaction. 

    Drawing inspiration from the "Octopus" architecture, we will discuss how to distribute compute so that FPGAs act as the "suckers"—handling local inference and filtering—while the host CPU/SoC remains the "central brain" for high-level goal setting. 

    Participants will learn how this decentralized approach reduces data transfer expenses, enhances security by keeping sensitive data local, and eliminates latency for safety-critical robotic operations.


    Level: Intermediate

Karl Wachswender

Lattice Semiconductor GmbH


1:30 p.m. – 3:00 p.m.   (90 min)

Embedded AI - #4

Altera FPGA AI Suite: A Practical Deep Dive

    Description:

    This session provides a comprehensive, example-based exploration of the Altera FPGA AI Suite and its capabilities for deploying AI inference on Agilex™ FPGAs and SoCs. Attendees will learn the complete workflow - from model selection and optimization to generating and integrating inference IP - while uncovering best practices for achieving maximum throughput and power efficiency. We will examine multiple implementation strategies, focusing on their implications for performance and resource utilization. Whether you are new to FPGA-based AI or looking to refine your deployment approach, this deep dive equips you with practical insights and actionable steps to accelerate AI on Altera Agilex™ platforms. The presented examples will be shared in a GitHub repository.


    Level: Intermediate 

Tomasz Iwanski


Arrow Central Europe GmbH


2:20 p.m.


2:20 p.m. – 3:00 p.m.

Embedded AI - #1

Next Generation quasi-analog Neuron AI Chip and FPGA

    Description:

    The state of the art in AI is defined by so-called neural networks, which are representations of real neurons in the brain. The goal is to emulate these parts of the brain as closely as possible to obtain similar intelligence in a circuit implementation. Today's AI does not rely on physical neurons but on software or FPGAs emulating neurons - a mature field of application, but very inefficient in terms of electrical power and computing power.

    The most important challenges are the speed of the neurons' signal processing, their size, and the power needed to operate such an artificial brain.


    The new chip comprises neural AI networks and an FPGA. Due to the quasi-analog nature of the pulse-width-controlled neurons and the implementation in 2-5 nm chip technology, the chip will have an excellent performance/area ratio and also significantly reduce power consumption compared to known AI solutions. 


    Level: Intermediate

Dr. Michael Gude


Cologne Chip AG


2:20 p.m. – 3:00 p.m.

Embedded AI - #2

Low-Power Low-Latency Edge AI with FPGAs: Balancing Performance, Power, and Complexity

    Description:

    Machine learning models are increasingly embedded in industrial systems, enabling applications from sensor data processing to real-time decision-making. A major challenge in deploying these models is achieving low inference latency without facing excessive power consumption costs.


    FPGAs provide an alternative to traditional and cloud-based solutions by enabling low-latency, power-efficient inference directly on edge devices. By leveraging hardware-level parallelism, FPGAs can execute complex models deterministically while reducing energy usage. Furthermore, large models can often be reduced in size while still maintaining accuracy, bringing the best of both worlds.

    However, these benefits come at the cost of higher development effort and increased system complexity.


    This presentation explores the advantages and challenges of implementing FPGA-based Edge AI solutions and shows how to optimize power consumption by optimizing both the FPGA-Design and AI-model in use.


    Level: Intermediate

David Hintringer


TRS-STAR GmbH


2:20 p.m. – 3:00 p.m.

Embedded AI - #3

AMD Vitis™ AI Tools Workflow: Compilation, Hardware Deployment & Profiling

    Description:

    This presentation offers a practical overview of the AMD Vitis™ AI tool workflow for deploying deep learning models on AMD hardware platforms. Attendees will learn about model preparation and optimization, the Vitis AI compiler, and how to integrate and execute models using both ONNX Runtime and Vitis AI Runtime (VART). The session will provide guidance on building applications for inference, deploying models on hardware, and using profiling tools such as the AI Analyzer to evaluate performance and pinpoint bottlenecks. Methods for measuring power and improving model efficiency on hardware accelerators will also be discussed. This session is designed for engineers interested in optimizing AI model deployment and performance using the Vitis AI end-to-end toolchain.


    Level: Beginner

N.N.
AMD


3:00 p.m. - 3:30 p.m.

Coffee Break and Time for Networking

3:30 p.m.


3:30 p.m. – 4:10 p.m.

Embedded AI - #1

Beyond the Architecture - A Forensic, Data-Centric Approach to Image Detection

    Description:

    Why do neural networks showing near-perfect training accuracy often stumble when they hit the real world? The answer may not lie in the code - it may be in the pixels. This talk goes beyond traditional deep learning by moving from a model-centric to a data-centric paradigm.


    We will explore model "misses" — false negatives and false positives — not as failures, but as forensic evidence. Through post-hoc analysis of prediction errors, we will diagnose the "silent killers" of model performance: effects that stem from the datasets in use.


    Beyond diagnosis, we will define best practices for pre-training qualification — including geometric bias and label consistency checks — and move beyond standard accuracy to metrics that truly matter, such as Stratified mAP and Intersection over Union (IoU).


    This session will discuss the data-focused insights needed to bridge the gap between the laboratory setting and the complexity of reality. The takeaway: beyond tweaking layers, knowing the data is key.


    Level: Beginner

Alexander Flick


PLC2 GmbH


3:30 p.m. – 4:10 p.m.

Embedded AI - #2

Efficient Vision Pipelines on FPGAs: Design Patterns and Performance Tuning

    Description:

    Building high-performance vision pipelines on FPGAs requires balancing compute, memory bandwidth, and latency. 

    This talk presents proven design patterns for implementing object detection, segmentation, and defect inspection using FPGA accelerators. 

    We’ll cover Conv core optimizations, address generator tuning, and leveraging DSP blocks for ML inference. 

    Special focus will be given to the sensAI modular IP architecture, enabling developers to scale from simple classification to complex multi-stage pipelines. 

    Real-world benchmarks on CertusPro-NX will illustrate how to achieve sub-10 ms inference while maintaining ultra-low power consumption.


    Level: Intermediate

Karl Wachswender

Lattice Semiconductor GmbH


3:30 p.m. – 4:10 p.m.

Embedded AI - #3

Software-to-Hardware Synergy for Edge AI: From Model Compression to Low-Power FPGA Acceleration

    Description:

    This paper presents a holistic approach to Edge AI optimization by bridging neural network compression techniques with low-power FPGA acceleration. Using software-side algorithmic enhancements, we demonstrate how pruning, quantization, and sparsity-based optimizations reduce compute complexity and power consumption while maintaining accuracy. The workflow spans model conversion, compilation, and deployment on FPGA platforms, leveraging heterogeneous compute engines for efficient inference. Benchmarks on popular models like YOLOv8 and MobileNet show up to 5× acceleration, 2–4× throughput improvement, and 30% resource reduction, enabling practical, energy-efficient AI solutions for edge applications.
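
    The pruning step in such a flow can be illustrated generically (this is not Microchip's actual tooling): zero out the smallest-magnitude weights to obtain the unstructured sparsity that a hardware flow can later exploit.

```python
# Generic magnitude-pruning sketch: drop the `sparsity` fraction of weights
# with the smallest absolute value. Example weights are made up.

def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest |w|."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.1, -0.02, 0.3]
pruned = prune_by_magnitude(w, sparsity=0.5)
# Half the weights are now exactly zero; the large ones survive untouched.
assert pruned.count(0.0) == 4 and pruned[0] == 0.9
```

    In practice pruning is interleaved with retraining so accuracy recovers, and the surviving nonzeros are what the accelerator actually computes on.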


    Level: Intermediate

Dr. Aurang Zaib

Microchip Technology GmbH


3:30 p.m. – 5:00 p.m.   (90 min)

Embedded AI - #4

Agentic AI in the FPGA Design Loop

  • more info ▾

    Description:

    Agentic AI is redefining the adaptive SoC and FPGA design loops by enabling developers to progress from concept to verified implementation using natural-language intent. This hands-on workshop demonstrates how an AI agent guides a complete, single design workflow—interpreting requirements, creating AMD Vivado™ tool block design components, generating and integrating subsystems, and automating simulation, implementation, timing closure, and debug. Attendees will observe how iterative prompts drive the agent to refine architectures, analyze results, resolve design issues, and streamline verification. The session showcases a modern, software-centric methodology where Agentic AI accelerates iteration, reduces complexity, and provides an accessible end-to-end FPGA/adaptive SoC development experience. Prerequisites: familiarity with FPGA tools and curiosity to explore new workflows.


    Level: Beginner 

Luke Millar

AMD


4:20 p.m.


4:20 p.m. – 5:00 p.m.

Embedded AI - #1

Reimagining Edge GenAI – Generative AI with Hailo-10

  • more info ▾

    Description:

    This presentation explores how generative AI at the edge can be efficiently deployed using the Hailo-10 AI accelerator. We will walk through practical GenAI use cases such as intelligent assistants, sensor-data interpretation, and multimodal inference, highlighting system architecture, performance, and power efficiency. The session will feature live, hands-on demonstrations, showing real-time execution of GenAI workloads on Hailo-10 and sharing practical insights into integration, optimization, and deployment for industrial and embedded environments.


    Level: Intermediate

Stan Klinke


EBV Elektronik GmbH & Co. KG


4:20 p.m. – 5:00 p.m.

Embedded AI - #2

AI Acceleration on Microchip FPGAs – From Concept to Deployment

    Description:

    Artificial Intelligence workloads are increasingly moving to edge devices, where power efficiency and deterministic performance are critical. This session explores how FPGAs can accelerate AI inference using VectorBlox. We will cover design methodologies for implementing neural networks on FPGA fabric. Attendees will learn how modern FPGA architectures enable secure, low-power AI acceleration for industrial and embedded applications, ensuring flexibility and long-term reliability.


    Level: Intermediate

Saadeddine Ben Jemaa

Arrow Central Europe GmbH


4:20 p.m. – 5:00 p.m.

Embedded AI - #3

Dataflow driven Scalable AI Accelerator Architecture for FPGA and eFPGA Platforms

    Description:

    Current approaches to AI acceleration in FPGAs follow a full-model inference paradigm. The ability to partition and schedule the execution, predict performance, and co-locate other HPC tasks within the FPGA fabric is severely curtailed and subject to manipulating otherwise extensive, tool-dependent workflows. This is clearly unacceptable for many real-world applications, especially those under footprint and provisioning constraints.

    We discuss the design of an FPGA-resident compute unit with performance predictability. This compute unit can be scaled in multiple dimensions to suit the available or remaining FPGA floorplan, allowing the system designer much flexibility across the design space. We detail the integration into a software-based event queue that facilitates the scheduling of recurrent system-wide HPC tasks following a dataflow paradigm. We finish by presenting comprehensive and comparative performance assessments and measurements.


    Level: Intermediate

Prof. Hans Dermot Doran

Zurich University of Applied Sciences


* subject to change