Program 2026 - Day Three | FPGA Conference Europe

Program 2026 - Day Three

from 8:00 a.m.

Check-In and Welcome Coffee

9:00 a.m.

9:00 a.m. – 9:40 a.m.

Application

Enabling Low-Latency Applications at the Industrial Edge with FPGA-Based Acceleration

more info ▾
Description:
Industrial edge systems require deterministic, low-latency communication for real-time control. This paper explores FPGA-based hardware support, featuring a TSN Ethernet endpoint and compliance with IEEE 802.1 standards for time synchronization (802.1AS-2020 & IEEE 1588), traffic shaping (802.1Qbv), frame preemption (802.1Qbu), and stream filtering (802.1Qci). Integration with GMII PHY and AXI4-Stream ensures efficient packet handling, while the Time-Aware Scheduler and PTP synchronization guarantee precise timing. Benchmarks demonstrate sub-millisecond latency and superior flexibility compared to PLCs, making FPGA-based solutions ideal for low-latency industrial edge applications.

Level: Intermediate

Dr. Aurang Zaib
Microchip Technology GmbH

9:00 a.m. – 9:40 a.m.

Language / Debug / Verification

Re-use Human-Readable Test Cases for Different Test Levels (Unit/System) Using CocoTB and BDD

more info ▾
Description:
When developing a system, verification often takes as long as implementation, and system level tests on target make it hard to trace issues across software, FPGA, or hardware. Because these bugs appear late in the development cycle, they are time consuming to analyze and fix.

A solution to this is to run these tests against a simulation model using cocotb, which exposes simulator signals to Python. Combined with Python’s behave package, tests can be written in human readable form, improving communication across engineering teams. We implemented glue logic, so tests interact with a common DUT interface that can be backed by a Python model, HDL simulation, or on target execution, enabling incremental development and earlier bug discovery. Human readable test cases also allow domain experts to review test intentions without programming knowledge, improving review quality and ensuring requirements are fully covered.

In the presentation, we detail these challenges, explain the solution, show the framework integration, give usage examples, and discuss key development hurdles.

Level: Intermediate

Elijah Almeida Coimbra
Topic Embedded Systems

9:00 a.m. – 9:40 a.m.

Architecture

Understanding the FSBL

more info ▾
Description:
This session explains the role of the First-Stage Boot Loader (FSBL) in the boot process of AMD embedded platforms. It covers how the FSBL initializes hardware, loads software and bitstreams, and fits into common development flows such as Vitis, PetaLinux, and Yocto. Attendees will learn how the FSBL is structured, how it can be customized for specific devices and designs, and how it behaves at runtime. Advanced topics such as multi-partition flash layouts, fallback boot mechanisms, and debugging techniques are also discussed.

Level: Intermediate

Ernst Wehlage
PLC2 GmbH

9:00 a.m. – 9:40 a.m.

Tools & Methodologies

Open-Source Tools for Commercial FPGAs are There - and There is More to it

more info ▾
Description:
The achievements of the Yosys+nextpnr open-source FPGA CAD tools are quite remarkable. Traditionally, FPGA tools, in particular for place&route had been highly vendor specific and we now have a stable, scalable and vendor-independent bitstream compilation ecosystem that is proven for various FPGA vendors and families. The quality is so good that some smaller FPGA vendors are now using Yosys+nextpnr as the prime CAD tool suite (like CologneChip or the academic FABulous project).
The Talk will highlight some of the great achievements created with Yosys+nextpnr and what it means to build a vendor-independent flow that can scale to 500K+ LUTs. The talk will also show some interesting games we played with Yosys+nextpnr in the FABulous project, like producing FPGA clones or implementing vendor-independent Partial Reconfiguration.

Level: Intermediate

Prof. Dirk Koch
Universität Heidelberg

9:00 a.m. – 9:40 a.m.

Embedded / Vision

Implementation of a Computer Vision Project on Multiple Platforms

more info ▾
Description:
Settling on a suitable hardware platform typically happens early on in a project's lifecycle, and adaptations which go beyond changing the device size within the chosen product line are rare. The main reason for this approach is the anticipated high cost of altering IP and re-training staff on different vendor tools. This talk shares the lessons learned from implementing the very same Computer Vision project, a tabletop 3D laser scanner, on mid-range FPGA and FPGA-SoC devices of all major vendors across multiple product families.

The presentation details the specific portions and percentages of the RTL design which require porting versus those that do not, as well as the associated work effort. We examine how hardware-specific components, such as MIPI, DDR memory controllers, and CPU-FPGA interfaces, can be wrapped using the module template library of Whiznium - an Open Source framework for the model-based design of embedded and FPGA solutions. Furthermore, the discussion provides a comparative analysis of how standard-compliant or forgiving various vendor tools proved to be, while evaluating the perceived ease-of-use and speed toward achieving timing closure across different environments. Finally, the session presents a first experimental result in which an LLM was trained on this specific example and tasked with porting a different application from one platform to another by generating the corresponding Whiznium model description.

Level: Intermediate

Alexander Wirthmueller
MPSI Technologies GmbH

9:00 a.m. – 9:40 a.m.

Embedded AI - #1

AI Basics: From image processing to perception and beyond

more info ▾
Description:
Deep learning has become ubiquitous in recent years. This talk reviews key technical advances in model development: from early, compute-heavy convolutions for image object detection and segmentation, to major gains in scale, resolution, and performance that enabled generative AI and impressive results even with video sequences.
It covers convolution mathematics and processing fundamentals, highlighting how vectorized hardware accelerates CNNs and broader operations. These foundations extend to transformers, which power large language models (LLMs) and other foundation models accessible on desktops, often via data-center backends.
Recent publications adapt these principles to smaller models suited for local deployment. The talk outlines challenges of running such architectures on resource-constrained edge devices and explores ways to design scalable, efficient models for on-device inference.

Level: Beginner

Alexander Flick

PLC2 GmbH

9:00 a.m. – 9:40 a.m.

Embedded AI - #2

FPGA-Accelerated Multi-Camera AI Vision for High-Speed Industrial Inspection on Kria KR260

more info ▾
Description:
High-speed industrial inspection requires deterministic timing and efficient data processing at the edge. This session presents the ongoing development and technical framework of an AI vision system on the Zynq UltraScale+ MPSoC for defect detection in high-output production environments. The architecture utilizes hardware-level triggering within the Programmable Logic (PL) to synchronize a multi-camera array, ensuring precise frame acquisition. AI inference is offloaded to the AMD DPU via Vitis AI, aiming to enable real-time defect detection of 5+ units/sec within an Embedded Linux/C++ environment. We explore the methodology for synchronizing multi-camera GigE Vision streams and the optimization of the deep learning pipeline for low-latency results. This work outlines a compact and efficient embedded approach for meeting the performance requirements of automated inspection systems.

Level: Advanced

Yunus Kök

Anadologic

Burak Aykenar
Anadologic

9:00 a.m. – 9:40 a.m.

Embedded AI - #3

Efficient 360° Threat Detection for Parked Vehicles - A Distributed, Event-Driven Approach

more info ▾
Description:
Automotive theft and vandalism remain costly global problems, driving the demand for smarter, energy-efficient vehicle protection systems. Traditional “always-on” video surveillance approaches rely on centralized compute architectures that consume excessive power and are unsuitable for multi-day parked operation. A new distributed, event-driven exterior monitoring concept enables 360° situational awareness while operating at sub-watt power levels. By using existing camera infrastructure and activating only when motion or threat patterns are detected, it provides scalable, low-latency protection without adding compute or bill-of-materials overhead. Beyond security, this approach supports vehicle personalization and user-adaptive responses, offering a flexible framework for future mobility platforms.

In this session, you will learn more about:

Event-driven exterior monitoring concepts and low-power architectures
Techniques for achieving multi-day parked-state operation
How adaptive monitoring enhances personalization and security
Integration pathways for next-generation automotive platforms

Level: Intermediate

Christian Mueller

Lattice Semiconductor GmbH

9:00 a.m. – 9:40 a.m.

Embedded AI - #4

Reinventing Super Resolution at the Edge: Custom FPGA AI Engines That Outrun GPUs

more info ▾
Description:
Edge AI demands low latency, energy efficient hardware, but general purpose GPUs and NPUs cannot meet strict power, performance and area constraints. FPGAs offer fine grained configurability, yet existing overlay style solutions—such as Altera’s CoreDLA and AMD’s DPU—fail to exploit the full architectural potential needed for real time workloads. Heronic addresses this gap with an automated tool flow that generates bespoke AI accelerator IP tuned to both the target FPGA and the specific model. This methodology achieves near peak silicon utilisation and has been proven on demanding tasks like super resolution upscaling, delivering deeply optimised pipelines with high data reuse and deterministic latency. Heronic demonstrates over 50% performance improvement versus NVIDIA Jetson Orin NX on Altera FPGAs, alongside a 28× increase in TOPS utilisation. This talk outlines how engineers can leverage Heronic’s approach to build highly efficient accelerators for Altera devices.

Level: Intermediate

Alexander Montgomerie-Corcoran

Heronic Technologies

9:50 a.m.

9:50 a.m. – 10:30 a.m.

Application

Precision Warping for Advanced Imaging: When Optics Get Weird, FPGAs Step In

more info ▾
Description:
This talk presents a unified approach to advanced geometric imaging using Altera Warp FPGA IP, demonstrating its versatility across camera, projection, and AI driven systems. We show how Warp enables real time fisheye to perspective, spherical, and cylindrical projections, supporting high precision correction for wide FOV lenses. The same pipeline powers multi camera stitching by rectifying and aligning heterogeneous sensors, and acts as an efficient pre processing stage for 3D depth reconstruction through distortion correction, rectification, and reprojection. We extend these capabilities to projector domains, enabling keystone correction and seamless multi projector stitching. Running at up to 600 MHz on Agilex devices, Warp delivers deterministic, ultra low latency transformation at resolutions up to 8K, with significant power advantages over GPU based methods. Finally, we highlight how Warp offloads geometric normalization for AI pipelines, improving throughput in real time spatial perception tasks.

Level: Intermediate

Alex Lopich
Altera GmbH

9:50 a.m. – 10:30 a.m.

Language / Debug / Verification

Trapped by FPGA Complexity? Applying Software Methodologies to Regain Momentum

more info ▾
Description:
FPGA projects increasingly suffer from a form of self-inflicted complexity. As designs grow larger and more interconnected, developers often become buried in their own codebases — struggling with manual workflows, limited reuse, and long feedback cycles. Progress slows not because of hardware limitations, but because existing methodologies make it difficult to regain structure once complexity has accumulated.

This talk argues that FPGA developers need better tools to shovel their way out of this complexity. Drawing on proven methodologies from software engineering, we explore how automation, continuous integration and deployment, and package management can be applied effectively to FPGA projects. Rather than treating these practices as incompatible with hardware design, we show how they directly address common pain points such as fragile integration, late-stage bugs, and unmaintainable code. We also discuss how such approaches can coexist with traditional HDL workflows and vendor toolchains.

By adopting software-inspired methodologies, FPGA teams can stop digging themselves deeper and instead build sustainable, scalable designs that keep pace with growing project demands.

Level: Intermediate

Peter Fischer
Eccelerators GmbH

9:50 a.m. – 10:30 a.m.

Architecture

Overcoming Compute Memory Bottlenecks – It’s “On the Package”

more info ▾
Description:
Innovation keeps driving the need for more memory bandwidth, smaller form factors, and faster time to market. Semiconductor leaders like AMD are finding new ways to innovate at the package level to bring new capabilities to the customers in a wide range of applications including AI Inference, FinTech, Professional Cameras, and many more. In this session, we’ll look at the recent history of FPGAs with integrated HBM memories, learn about what new capabilities this technology can help enable, and get a preview of what’s coming next. AMD recently announced the AMD Versal™ Premium Gen 2 adaptive SoCs with integrated DRAM, which employ a new approach, stacking LPDDR5X memory on top of the adaptive SoC package. This eliminates complex, high-performance board-level interfaces and delivers more memory bandwidth for applications in the data center, through the network, and at every corner of the Edge.

Level: Intermediate

Timor Knudsen
AMD

9:50 a.m. – 10:30 a.m.

Tools & Methodologies

Intro to DFHDL, an Opensource Multi-Abstraction Hardware Description Framework

more info ▾
Description:
Hardware design does not have to be hard. In this talk, we will explore the foundations of hardware description languages and discuss the advantages/disadvantages of various approaches. You will learn about DFiant HDL (DFHDL), an open-source hardware description framework that leverages a unique multi-abstraction methodology for streamlined design and verification. A live demo will show how a single source of truth targets multiple FPGA platforms and a TinyTapeout ASIC, enabling reusable and portable hardware.

Level: Intermediate

Oron Port
DFiant Ltd

9:50 a.m. – 10:30 a.m.

Embedded / Vision

Image Sensor Integration in FPGA

more info ▾
Description:
A project to integrate a low cost image sensor in FPGA is presented. It is explained why this integration is beneficial in small and medium business, with a focus on the industrial companies. Then, a quick overview of a low cost image sensor is presented: electrical interfaces, timing diagrams from a first analysis and internal registers. A possible tested and working architecture in a SoC is presented.

Level: Intermediate

Alberto Venzo
Spiral Engineering

9:50 a.m. – 10:30 a.m.

Embedded AI - #1

Inside Edge AI: Processing Paradigms and Architectural Hints

more info ▾
Description:
Deploying neural networks at the edge requires understanding diverse hardware architectures and their trade-offs. This talk covers key edge AI processing paradigms: general-purpose CPUs with coprocessors, programmable GPUs, reconfigurable FPGAs, and dedicated vector processor arrays.

It examines model optimization techniques—quantization, pruning, and tools like TensorFlow Lite and PyTorch Mobile—and their interaction with hardware-specific toolchains, often via ONNX. The deployment pipeline is detailed: from training frameworks, through conversion, to efficient on-device inference.

Different neural network topologies suit hardware differently. Convolutional neural networks, dominant in edge vision tasks, map well across platforms. Emerging transformers, however, struggle with high memory bandwidth and data movement demands.

These insights can aid product choices and decisions, while noting that no single architecture yet dominates all edge AI use cases.

Level: Intermediate

Alexander Flick

PLC2 GmbH

9:50 a.m. – 10:30 a.m.

Embedded AI - #2

Building State of the Art Computer Vision Models for the Far Edge

more info ▾
Description:
This session is a technical deep dive into the development and deployment of computer vision AI models on far edge devices.
The content covers hardware and efficiency requirements, collaborative engineering workflows, pipeline and model design principles, and practical strategies for training, optimizing, and deploying small models under strict resource constraints.
Emphasis is placed on the importance of data quality, loss function selection, and hardware-aware training to achieve robust, real-time AI performance in industrial and consumer edge applications.

Level: Intermediate

Karl Wachswender

Lattice Semiconductor GmbH

9:50 a.m. – 10:30 a.m.

Embedded AI - #3

PipeGen: Agentic AI to Generate, Debug, and Deploy End-to-End Edge AI Pipelines

more info ▾
Description:
Edge AI deployment is mostly pipeline engineering: connecting real inputs, building
pre/post-processing, integrating runtimes, debugging performance regressions, and
packaging for devices. The model is only one component. This talk presents PipeGen , an
agentic AI system that automates the end-to-end Build–Debug–Deploy workflow for Edge AI
pipelines using a vendor-neutral pipeline manifest and a portable intermediate
representation (IR). PipeGen compiles pipeline intent into target-specific implementations
(e.g., GStreamer/ROS2/vendor runtimes), adds instrumentation for bottleneck attribution (I/O
vs preprocess vs runtime vs postprocess), and generates reproducible build artifacts plus
benchmark reports. The session explains the architecture (manifest => IR => adapters), the
agent loop for generation and debugging, and a validation/benchmark harness designed for
portability. It concludes with standardization-relevant recommendations for pipeline
schemas, capability negotiation, conformance checks, and consistent metrics in a vendor
agnostic manner.

Level: Advanced

Yashwant Dagar

CraftifAI

9:50 a.m. – 10:30 a.m.

Embedded AI - #4

AI for Everyone with Altera Agilex3

more info ▾
Description:
Agilex3 is designed to make AI accessible, intuitive, and practical for individuals and organizations of all sizes.

Level: Beginner

Tolga Sel

Arrow Central Europe GmbH

Helmut Plötz
ONE WARE GmbH i.G

10:30 a.m. - 11:00 a.m.

Coffee Break and Time for Networking

11:00 a.m.

11:00 a.m. – 11:40 a.m.

Application

Beyond the Lid: Maximizing Thermal Efficiency in Modern High-Performance Devices

more info ▾
Description:
This session highlights how lidless packaging improves thermal efficiency by removing the lid and internal TIM1 layer, allowing direct heatsink to silicon contact. The result is lower thermal resistance, cooler junction temperatures, and greater performance headroom—all while helping reduce cooling cost and complexity. We’ll briefly explain the physics, show real measurement data, and outline why lidless packaging with a stiffener ring delivers superior thermal performance and reliability for modern high-density compute systems.

Level: Intermediate

John Heslip

AMD

11:00 a.m. – 11:40 a.m.

Language / Debug / Verification

Resource Efficient DMA for FPGA Streaming Pipelines Implemented in SpinalHDL

more info ▾
Description:
This presentation reports on our experience using SpinalHDL as a hardware design framework in a real FPGA project. To evaluate the workflow, coding style, and integration aspects of SpinalHDL, we implemented a moderately complex DMA controller and used it as a practical test case. The focus of the session is not the DMA itself, but the process of building and verifying a non trivial design using SpinalHDL and its surrounding ecosystem.
Topics include structuring larger components in Scala, working with SpinalHDL’s type system and bus libraries, and defining clean boundaries to SystemVerilog modules. We also discuss simulation hooks, register map generation, and the interaction with typical FPGA toolflows. The case study provides insight into handling multi channel streaming logic, managing latency considerations, and organizing code so that it remains maintainable as the design grows.
The goal of the talk is to give engineers a realistic impression of what it looks like to adopt SpinalHDL in everyday FPGA development. Attendees will see where the framework helps, where extra care is needed, and how it compares to a traditional RTL only approach when designing mid size IP blocks.

Level: Advanced

Krzysztof Czyz

Embevity sp. Z o. o.

Mateusz Maciąg
Embevity sp. Z o. o.

11:00 a.m. – 11:40 a.m.

Architecture

Implementing Real-Time Applications on Modern ARM v8.2–Based FPGA SoCs

more info ▾
Description:
This presentation takes a deep dive into the Arm DynamIQ™ cluster within the Agilex™ 5 Hard Processor System (HPS), with a particular focus on cache subsystem behavior. It examines the operation and interaction of the cache subsystem, and how these interactions evolve under increasing system load.
The talk explores how rising workload intensity and complex memory access patterns impact core latency, stressing the cache hierarchy and the Arm DynamIQ Shared Unit (DSU). Key architectural and system-level factors that influence latency under high-stress conditions are highlighted.
Finally, the presentation discusses practical mitigation strategies to reduce latency and jitter in such scenarios. These approaches provide guidance for achieving predictable performance in hard real-time and complex mixed-criticality applications.

Level: Intermediate

Stefan Garcia

Altera GmbH

11:00 a.m. – 11:40 a.m.

Tools & Methodologies

Exploring the AMD Adaptive SoC Design Flow with the Vitis(TM) Unified IDE

more info ▾
Description:
The design flow for AMD Adaptive SoCs aggregates diverse artifacts for the different targets within these devices. While embedded software is one prominent project type that needs to be conquered, there are project flavours for specific IP targets like the programmable logic using HLS, or even more compute focused elements like the Versal(TM) families with AI Engines. The AMD Vitis Unified IDE provides all underlying compilers and adds productivity tools for code entry, interactive work with the devices and also offers tooling for deployment.

Along an Adaptive SoC design we will cover the Vitis Embedded Software Projects, which include the platform wrapping a hardware designs, an application component for Linux, or real-time or bare metal targets. We will describe an actual heterogenous design by adding kernel components, and show how to handle code versioning, simulation and interactive operation to easily achieve system integration and delivery.

Level: Beginner

Alexander Flick

PLC2 GmbH

11:00 a.m. – 11:40 a.m.

Embedded / Vision

MIPI CSI 2 to USB 3.2 Video Pipeline with CrossLinkU NX

more info ▾
Description:
This is an overview on how to connect an image sensor via MIPI CSI-2 to a PC by using CrossLinkU-NX USB 3.2 Hard-IP. On the FPGA the CSI-2 video payload will be converted to raw image data first and then converted into an RGB image by a debayer before it will be sent to the PC. To display the image data on the PC the USB Video Class (UVC) is used which makes development of additional drivers obsolete.

Level: Intermediate

Benjamin Mecke
Arrow Central Europe GmbH

11:00 a.m. – 11:40 a.m.

Embedded AI - #1

Reality Over Peak Specs: Constraints Driven Platform Selection for Edge AI

more info ▾
Description:
Selecting a processor for edge AI is rarely about choosing the device with the highest advertised performance. Real systems are constrained by power, latency, memory, interfaces, development effort, long term availability, and commercial considerations. This talk presents a constraints driven approach to platform selection for edge AI applications. It discusses how different edge AI applications map to MCUs, MPUs, FPGAs, and dedicated accelerators, and why no single architecture is optimal across all scenarios. Using practical examples, the session highlights common pitfalls of datasheet based decisions and outlines a systematic way to evaluate trade offs between flexibility, efficiency, and system complexity in real deployments. The goal of this talk is to equip attendees with a practical framework for selecting the right compute platform for their next edge AI application.

Level: Intermediate

Saad Qazi

EBV Elektronik GmbH & Co. KG

11:00 a.m. – 11:40 a.m.

Embedded AI - #2

Silicon Brains vs. Silicon Gates: Can LLMs Replace the FPGA Engineer?

more info ▾
Description:
Can an LLM truly understand timing constraints, or is it just
"hallucinating" hardware?
As Artificial Intelligence transforms the landscape of software development, the FPGA and ASIC industries are facing a pivotal question: Are we witnessing the end of manual RTL coding?
Modern AI models can generate hundreds of lines of VHDL or Verilog in seconds, promising to slash development cycles and eliminate the "blank page" problem for hardware engineers.
However, hardware design is fundamentally different from software. In a world where a single logic bug can lead to expensive re-spins or bricked prototypes, the stakes for "code quality" are absolute.
A model might generate syntactically correct code that completely ignores Clock Domain Crossings (CDC), metastability, or FPGA resource optimization.
This webinar provides a transparent, no-hype evaluation of AI’s current capabilities in hardware description languages. We will put leading AI models to the test, benchmarking their output against the rigorous standards of professional FPGA engineering.

Level: Beginner

Oren Hollander

HandsOn Training

11:00 a.m. – 11:40 a.m.

Embedded AI - #3

Breaking the Data Bottleneck: Hardware-Accelerated Lossless Compression for Next-Generation AI Systems

more info ▾
Description:
As AI workloads scale in complexity and data volume, data movement has become a dominant bottleneck in FPGA-based acceleration platforms. Lossless data compression offers an effective mechanism to reduce storage, memory bandwidth, and interconnect pressure while preserving data accuracy, which is mandatory for training datasets, model checkpoints, distributed learning, and sensor telemetry. This presentation examines the role of high-performance lossless compression IP cores implemented on FPGAs, including LZ4, Zstd, Snappy, and GZIP. We discuss streaming architectures, throughput-latency tradeoffs, and resource utilization, and show how compression engines can be integrated into FPGA data paths such as DDR, Ethernet, and sensor interfaces. Practical AI use cases are presented, including high-speed data ingestion, compressed model loading, and distributed training support. Hardware-accelerated lossless compression is highlighted as a key enabler for scalable and energy-efficient FPGA-based AI systems.

Level: Intermediate

Dr. Calliope-Louisa Sotiropoulou

CAST Inc.

11:00 a.m. – 12:30 p.m. (90 min)

Embedded AI - #4

Accelerating Adoption of USB 10Gbps I/O in Edge AI and Embedded Systems

more info ▾
Description:
USB has long been the go-to interconnect for embedded applications due to its simplicity. However, despite the introduction of USB 3.1 Gen2 (now known as USB 3.2 Gen2) in 2013, which offers 10 Gbps of bandwidth, few applications have been able to harness the full potential of 10 Gbps USB connectivity. Now, exponential growth of AI in embedded systems has created a surge in demand for faster interconnects. In this session, we'll explore how Infineon's latest EZ-USB FX10 general-purpose USB 10Gbps peripheral controller is poised to accelerate the adoption of higher-bandwidth USB interconnects in embedded and edge AI applications, unlocking new possibilities for these rapidly evolving fields.

Level: Intermediate

FliX Feng

Infineon Technologies

Jimmy Chou
Infineon Technologies

11:50 a.m.

11:50 a.m. – 12:30 p.m.

Application

Implementing high-speed FIR Filter in Achronix Speedster7t FPGAs

more info ▾
Description:
Next-generation data acquisition systems are pushing well beyond the limits of traditional FPGA DSP architectures. Sampling rates of 8–10 Gsps demand massive parallelism, deterministic data movement, and real-time processing flexibility—especially when ADCs lack built-in filtering or when application-specific filtering is required.
This presentation showcases how Achronix Speedster7t® FPGAs simplify extreme-bandwidth signal processing using a purpose-built architecture that combines Machine Learning Processors (MLPs) with a high-performance 2D Network-on-Chip (NoC). Attendees will see how multi-GHz, updateable FIR filtering can be efficiently implemented using MLP blocks, while the 2D-NoC enables scalable, low-latency data movement from JESD204B/C interfaces through processing, memory, and host connectivity.
We will provide a high-level overview of the Speedster7t platform—including 100G PAM4 transceivers, PCIe Gen5, 400G Ethernet, and GDDR6 memory—and demonstrate how these features come together to accelerate high-speed data acquisition and processing with unprecedented efficiency and flexibility.

Level: Beginner

Georg Hanak

Achronix Semiconductor Corporation

11:50 a.m. – 12:30 p.m.

Language / Debug / Verification

hdl-registers: The Smart Way to Build AXI-Lite IP Cores

more info ▾
Description:
Developing IP cores for FPGAs requires efficient and error-free implementation of register interfaces, especially when using standard protocols like AXI-Lite. The open-source project hdl-registers provides a powerful solution to simplify and automate this process.

This talk demonstrates how to quickly and consistently develop an AXI-Lite-compatible IP core using hdl-registers. Instead of manually implementing register definitions in HDL and maintaining a separate software interface, hdl-registers allows a single declarative register description. From this, HDL implementation, software headers, and documentation are automatically generated, reducing effort, avoiding inconsistencies, and improving code quality.

Level: Intermediate

Bernhard Wandl

P2L2 GmbH

11:50 a.m. – 12:30 p.m.

Architecture

Deterministic Execution of Real-Time Workloads on Agilex 5: a Multi-Domain Approach

more info ▾
Description:
As modern embedded systems integrate increasingly complex workloads, ensuring deterministic real-time execution while maintaining overall system flexibility has become a critical challenge. In this joint webinar, Altera and Accelerat present a combined hardware–software approach that enables hard real-time performance on the Agilex™ 5 SoC without sacrificing the ability to run rich, non-real-time applications on the same device.
Altera will introduce the architecture and capabilities of the Agilex 5 E-Series, highlighting its heterogeneous compute resources, advanced memory hierarchy, and power-efficient performance profile.

Building on this foundation, Accelerat will detail how its CLARE Software Stack enables strict spatial and temporal isolation for real-time tasks within the SoC. Through a multi-domain execution architecture, CLARE is able to allocate a dedicated domain for real-time workloads, ensuring they remain strongly-isolated from latency- and jitter-inducing interference originating from caches, memory buses, and other micro-architectural resources shared with non-real-time tasks.
This co-designed hardware/software methodology demonstrates how you can achieve deterministic, hard real-time behavior on Agilex 5 while simultaneously leveraging some cores for general-purpose or Linux-based applications. Attendees will gain insight into best practices for partitioning workloads, configuring isolated domains, and building mixed-criticality systems that combine reliability, performance, and flexibility on a single SoC.

Level: Intermediate

Angelo Lo Cicero

Altera GmbH

Giorgiomaria Cicero
Accelerat SRL

11:50 a.m. – 12:30 p.m.

Tools & Methodologies

From PetaLinux to Yocto EDF

more info ▾
Description:
This presentation introduces the new AMD EDF (Embedded Design Flow) for building embedded Linux systems,
which is intended to replace the long-established AMD PetaLinux flow.
While PetaLinux has been widely adopted for many years, growing system complexity and the need for more scalable and flexible build environments have revealed limitations in the existing approach.
The AMD EDF flow is designed to overcome these limitations by providing a more modular, extensible, and Yocto-aligned development methodology.
The session compares different Yocto-based build approaches and highlights the architectural and workflow differences between PetaLinux and the EDF flow.

Level: Intermediate

Ernst Wehlage
PLC2 GmbH

11:50 a.m. – 12:30 p.m.

Embedded / Vision

Why Not Just Use a GPU?
A Critical Case Study of High-Level Synthesis on FPGA vs. GPU

more info ▾
Description:
High-Level Synthesis (HLS) is a programming approach that allows developers to implement FPGA hardware using high-level languages such as C or C++, instead of writing low-level hardware description code.

It promises to significantly reduce FPGA development complexity and shorten implementation time. But in an era dominated by powerful and affordable GPUs, an important question remains: why not just use a GPU?

In this talk, an HLS-based FPGA implementation is compared with a GPU implementation using blocked GEMM as a practical case study. The comparison covers performance, power consumption, energy efficiency, development effort, and synthesis time. Based on the results, the talk discusses when FPGA acceleration is technically and economically justified and when GPU architectures or other alternatives are the better choice.

Level: Intermediate

Atakan Tosun
Heitec AG

11:50 a.m. – 12:30 p.m.

Embedded AI - #1

BYOM – Custom Model Edge Inference with Vitis AI

more info ▾
Description:
The new AMD Vitis AI 2025.1 toolchain introduces a streamlined workflow for optimizing and deploying deep learning models on AMD Versal AI Edge and Versal AI Edge Gen 2 devices. This talk outlines the updated processing flow centered on ONNX model input, detailing key improvements in quantization, compilation, and runtime integration.

Using a pretrained model as a running example, the session walks through the "Bring Your Own Model" (BYOM) process by creating a snapshot of the model, which quantizes and compiles the model for the NPU hardware target. We will visit the controls available to scale and tune performance and discuss the tools to review the results. The snapshot achieved can then be deployed on physical hardware supported by the ARM CPU and the Vitis AI Runtime (VART).

Attendees can gain a good understanding of how Vitis AI 2025.1 supports flexible AI deployment on adaptive platforms and how to apply the updated tools efficiently in custom inference workflows.

Level: Beginner

Alexander Flick

PLC2 GmbH

11:50 a.m. – 12:30 p.m.

Embedded AI - #2

Accelerating Edge AI with Efinix FPGAs: TinyML and eCNN for Real-World Applications

more info ▾
Description:
Edge AI demands low latency, minimal power use and flexible deployment. Traditional cloud-based AI falls short in these areas. Microcontrollers struggle with performance limits. High-end GPUs consume too much power. Efinix FPGAs bridge this gap with two optimized solutions. The TinyML framework brings lightweight neural networks to resource-constrained devices. The eCNN IP core delivers higher performance for complex models. Both run directly on Efinix FPGAs. This session will cover architecture details, performance data and deployment examples. Attendees will learn how to implement Efinix AI solutions in their projects. The focus is on practical steps for real-world edge AI systems. The goal is to enable faster, more efficient AI at the edge using FPGA technology.

Level: Intermediate

Andreas Büttner
Efinix GmbH

11:50 a.m. – 12:30 p.m.

Embedded AI - #3

One Size Does Not Fit All: Power-Efficient Vision AI on FPGAs and Beyond

more info ▾
Description:
Embedded AI deployments span a wide range of performance, power, and integration constraints, yet they are often approached with a “one-size-fits-all” mindset. This talk argues that such an approach is as inefficient as using a 40-ton truck for weekly grocery shopping. Instead, we explore how tailoring AI architectures to the actual workload can unlock significant gains in efficiency and throughput. We focus on power-efficient AI processing for vision data on FPGAs and FPGA-SoCs highlighting architectural trade-offs and design considerations. A key enabler is unstructured sparsity: using tools from Neuronix, who were acquired by Microchip, pruning is performed during training to produce a sparse neural network that is then processed by Neuronix tools for sparsity removal. This transforms it into a hardware-efficient implementation without sacrificing accuracy. Additionally, we position FPGA-based embedded AI within a broader heterogeneous compute landscape, showing how workloads can scale toward higher-end solutions using NVIDIA Holoscan IP when application complexity or data rates exceed embedded constraints. The result is a pragmatic, scalable view of AI acceleration, from an efficient edge inference to high-performance systems, where the right tool is used for the right job.

Level: Intermediate

Brian Colgan

Microchip Technology GmbH

Martin Kellermann
Microchip Technology GmbH

12:30 p.m. - 1:30 p.m.

Lunch Break and Time for Networking

1:30 p.m.

1:30 p.m. – 2:10 p.m.

Application

Space-Optimized PMIC Power Modules for FPGAs: Up to 80% Smaller Total Solution Area

more info ▾
Description:
Power trees are getting more sophisticated while PCB area keeps shrinking. This session presents space-optimized PMIC modules from Monolithic Power Systems that can reduce total solution area by up to 80%, simplify implementation, and maintain high efficiency with low EMI. Beyond area reduction, we cover digital interface–based configuration and programming, robust thermal performance, and tight output-voltage regulation—requirements for modern FPGA rails.

Level: Intermediate

Nicolay Garcia
Monolithic Power Systems GmbH

1:30 p.m. – 2:10 p.m.

Language / Debug / Verification

Bottom-up Radiation Hardness Assurance for FPGA Based Software Defined Radios

more info ▾
Description:
Radiation hardness assurance is an important step in qualifying systems for use in environments with ionizing radiation, like nuclear facilities, satellites, and high energy physics experiments. With the increased adoption of advanced wireless communication systems, radiation testing becomes ever more complex. In this presentation we want to explore a practical bottom-up methodology for testing FPGA based software defined radios under radiation.
The most critical radiation effects on FPGAs will be introduced, including single-event-effects (SEE) and total ionizing dose effects (TID). The radiation testing methodology, and one possible practical implementation will be presented, along with selected results for two testing campaigns, exploring the radiation hardness of three automatic gain control algorithms. We will conclude by mentioning how these results can be leveraged to ensure that radiation hardness, along with performance, is a primary system design goal.

Level: Beginner

Hannes Bachl
Ostbayerische Technische Hochschule

1:30 p.m. – 2:10 p.m.

Architecture

Pepper: The Open-Source FPGA-Based Rapid Development Board and
Environment for Secure Edge Innovation

more info ▾
Description:
Pepper is an open-source FPGA-based rapid development board designed to accelerate secure edge and networking innovation. Built around the Altera Agilex 5 FPGA, it combines reconfigurable hardware, heterogeneous computing, high-speed Ethernet switching, and extensive connectivity including 5G, Wi-Fi, and GPIO interfaces.
Built for performance and flexibility, it enables fast implementation of Layer 2 and Layer 3 networking solutions using pre-integrated Pantherun modules.
Unlike traditional embedded platforms, Pepper enables true hardware customization, ultra-low latency processing, and hardware-level security within an open development ecosystem.
With hardware-accelerated packet processing and full-speed, line-rate encryption, secure networking can be implemented without sacrificing performance.
This session demonstrates how Pepper significantly reduces development cycles, simplifies FPGA-based networking design, and enables rapid deployment of high-performance, secure communication systems.

Level: Intermediate

Michel Pedimina
Pantherun GmbH

1:30 p.m. – 2:10 p.m.

Tools & Methodologies

The New Spartan UltraScale+ Family

more info ▾
Description:
The newly available AMD Spartan UltraScale+ family is presented technically in this session.
The presentation is of interest both to FPGA beginners and to experienced ABC developers looking to port an existing design.

Level: Beginner

Ernst Wehlage
PLC2 GmbH

1:30 p.m. – 2:10 p.m.

Embedded / Vision

Technical Advantages of the HyperFlex Gen2 Architecture in Altera Agilex 3 and Agilex 5 FPGAs

more info ▾
Description:
As edge computing requirements evolve, the traditional boundary between "performance" and "cost-optimized" FPGA architectures is blurring. This presentation explores the technical implications of deploying Altera’s HyperFlex Gen2 architecture - previously exclusive to high-end Agilex 7 devices - into the mid-range Agilex 5 and cost-optimized Agilex 3 families.
We analyze how the HyperFlex Gen2 fabric, characterized by ubiquitous hyper-registers and fine-grained retiming capabilities, addresses the critical power-performance constraints of edge and embedded systems. Unlike traditional architectures that rely heavily on logic cell placement for timing closure, HyperFlex Gen2 leverages automated retiming to significantly boost Fmax without manual pipelining.
In terms of design productivity, the architecture provides three clear practical benefits. By decoupling logic functionality from routing delays, HyperFlex Gen2 enables high FPGA utilization, often exceeding 90% logic density without the catastrophic routing congestion typically seen in legacy architectures. This headroom, combined with naturally high Fmax, facilitates fast timing closure, drastically reducing iteration cycles and accelerating time-to-market for complex designs. Furthermore, the ability to run at higher frequencies allows for the use of narrower data paths and time-multiplexing, which optimizes power efficiency by reducing the overall silicon footprint.

Level: Beginner

Konstantin Dobrosolets

Altera

1:30 p.m. – 2:10 p.m.

Embedded AI - #1

An Experiment in AI-Assisted FSMs on FPGAs

more info ▾
Description:
Finite State Machines (FSMs) are a fundamental tool in FPGA design, especially for control logic and reactive systems. In practice, however, FSMs tend to grow steadily over time. Optional features, variations between implementations, and dependencies on previous inputs or events often lead to large and hard-to-maintain state machines. Once this complexity has accumulated, it becomes difficult to modify or extend the design without introducing errors.

This talk presents an experiment that asks a simple question: can small, learned decision blocks help manage FSM complexity without sacrificing determinism? Instead of encoding all possible input sequences and corner cases as explicit states and transitions, recent input history is summarized and evaluated by a compact inference pipeline inspired by language models. The FSM remains responsible for timing, safety, and control, while the learned block assists in classification and decision making where rigid state transitions become impractical.

The goal of this work is not to propose a production-ready solution, but to explore whether AI-assisted FSMs can reduce design complexity while preserving determinism and hardware transparency. By framing the approach as an experiment, the talk aims to encourage discussion about new design patterns for managing growing complexity in control-heavy FPGA applications.

Level: Intermediate

Denis Vasilik

Eccelerators GmbH

1:30 p.m. – 2:10 p.m.

Embedded AI - #2

Design Techniques for High-Performance Low-Latency LLM Inferencing on FPGAs optimized for AI

more info ▾
Description:
Low-latency LLM inferencing has rapidly become a critical workload for embedded and edge AI applications. It’s already valued at multiple billions of dollars and expected to show further accelerated growth. The high speed of innovation in this field, and the push to maximize power and cost efficiencies require high degree of flexibility from the solution performing these workloads. FPGAs optimized for AI provide this flexibility by virtue of fine-grain post-silicon programmability enabling heterogeneous use of number formats for quantizations using MXINT8 and 4-bit formats, down to ternary and even binary formats. This paper presents techniques demonstrating how to map accelerators performing the main workloads of these applications to such FPGAs so that they outperform GPUs. The main competitive metric is TCO, while latency, performance and accuracy constraints are satisfied. These workloads include massively parallel matrix multiplication, which may be distributed across multiple FPGAs, very high bandwidth loading of model weights from external memory ensuring near-theoretical utilization, and point-to-point low-latency networking solutions for scale-out networking. The paper presents in detail how clock cycle-level granularity of control logic implemented with FPGAs is used to completely hide weights loading vs compute vs communication latencies.

Level: Intermediate

Georg Hanak

Achronix Semiconductor Corporation

1:30 p.m. – 2:10 p.m.

Embedded AI - #3

Beyond the "Sledgehammer": Implementing Physical AI at the Sensor to Offload Robotic SoCs

more info ▾
Description:
Today, 95% of AI workloads are centralized, often resulting in a "sledgehammer hitting a fly" scenario where simple robotic tasks are processed by high-power, high-latency near-edge boxes.
This talk introduces the concept of Physical AI, which embeds intelligence directly next to the sensor to enable real-time sensing and immediate reaction.
Drawing inspiration from the "Octopus" architecture, we will discuss how to distribute compute so that FPGAs act as the "suckers"—handling local inference and filtering—while the host CPU/SoC remains the "central brain" for high-level goal setting.
Participants will learn how this decentralized approach reduces data transfer expenses, enhances security by keeping sensitive data local, and eliminates latency for critical safety-critical robotic operations.

Level: Intermediate

Karl Wachswender
Lattice Semiconductor GmbH

1:30 p.m. – 3:00 p.m. (90 min)

Embedded AI - #4

Altera FPGA AI Suite: A Practical Deep Dive

more info ▾
Description:
This session provides a comprehensive, example-based exploration of the Altera FPGA AI Suite and its capabilities for deploying AI inference on Agilex™ FPGAs and SoCs. Attendees will learn the complete workflow - from model selection and optimization to generating and integrating inference IP - while uncovering best practices for achieving maximum throughput and power efficiency. We will examine multiple implementation strategies, focusing on their implication on performance and resource utilization. Whether you are new to FPGA-based AI or looking to refine your deployment approach, this deep dive equips you with practical insights and actionable steps to accelerate AI on Altera Agilex™ platforms. Presented examples will be shared on Github repository.

Level: Intermediate

Tomasz Iwanski

Arrow Central Europe GmbH

2:20 p.m.

2:20 p.m. – 3:00 p.m.

Application

Reset Strategies

more info ▾
Description:
This session gives an overview of different types of reset implementations with advantages and disadvantages. Helps to understand the impact of reset coding on design reliability and how software analyzes and reports reset path timing. In the end best practices of reset coding to avoid HW related problems will be given.

Level: Beginner

Benjamin Mecke
Arrow Central Europe GmbH

2:20 p.m. – 3:00 p.m.

Language / Debug / Verification

EDA²: Post-processing EDA Tool outputs

more info ▾
Description:
EDA tools generate masses of log messages. We have seen synthesis and implementation logs exceeding 20 MiB. How can these logs be analyzed? How can these logs be checked for certain warnings, which are considered critical in a design, but aren't considered critical by the tool vendor?

This presentation will show how to (live) post-process synthesis and implementation logs from AMD Vivado with pyEDAA.OutputFilter. Each message will be classified, colored, counted, ..., it even parses tables. The tool provides a generic data format and API for user-defined data analysis and writing user-specific rules.

As a bonus, gathered statistics like resource utilization can be visualized over time using InfluxDB/Grafana. Thus, engineers and project leads can investigate how a design grows over time or when changing versions.

Level: Beginner

Patrick Lehmann
plc2 Design GmbH

2:20 p.m. – 3:00 p.m.

Architecture

Beyond the Bitstream: Streamlining Heterogeneous Computing with the MLE FPGA Full System Stack

more info ▾
Description:
As the demand for domain-specific architectures grows, FPGAs have become essential for offloading compute-intensive tasks in networking, storage, and automotive sectors. However, the "Integration Gap"—the months of engineering effort required to build reliable PCIe/DMA infrastructure, kernel drivers, and memory management—often acts as a barrier to entry. This presentation introduces the Missing Link Electronics (MLE) FPGA Full System Stack (FFSS), a pre-validated, cross-platform framework designed to eliminate architectural "plumbing" and accelerate application-specific development.

Level: Beginner

Andreas Schuler
Missing Link Electronics GmbH

2:20 p.m. – 3:00 p.m.

Tools & Methodologies

A Flexible and Scalable YOLO-Specific DPU for Real-Time FPGA Acceleration

more info ▾
Description:
This work presents the development of a flexible Deep Processing Unit (DPU) for real-time object detection using YOLO (e.g. YOLOv4) networks, specifically designed for FPGA-based acceleration. Open-source YOLO implementations such as YOLOv4 enable hardware-oriented adaptations without proprietary dependencies.
The approach starts from FPGA-friendly YOLO variants, applying quantisation and architectural simplifications such as replacing complex activations (e.g. SiLU/Swish) with ReLU, restricting the computation to operations efficiently mappable to FPGA hardware.
The proposed DPU does not implement the neural network as a fixed hardware structure, but acts as a layer orchestrator. It schedules YOLO layers and manages data movement through DMA, dynamically loading feature maps, weights, biases and layer configuration parameters. Computation units are reused across layers, and the architecture is parametrizable, allowing the number of parallel operations per cycle to be adapted to the available FPGA resources. This enables scalability across devices by trading resource usage for throughput and frames per second.
Compared to generic FPGA AI frameworks such as Vitis-AI, this work focuses on a YOLO-specific DPU, optimising performance per resource and per watt. The project is being actively implemented and validated on FPGA, demonstrating a practical and open-source solution.

Level: Intermediate

Pablo Mendoza Eguiguren
Indra Sistemas SA

2:20 p.m. – 3:00 p.m.

Embedded / Vision

Smarter Robotics with Lattice FPGAs: From Vision to Motion

more info ▾
Description:
Humanoids are shaping the future of robotics, and Lattice FPGAs are at the heart of this transformation.
Discover how low power, real-time motor control, ultra-low latency, vision acceleration, and FuSa compliance make Lattice the ideal choice for next-generation humanoid designs.
This session goes beyond theory with practical insights into motor control and industrial Ethernet.
Book now to learn how to deliver smarter, safer, and more efficient humanoid solutions that stand out in the market.

Level: Intermediate

Helmut Demel

Lattice Semiconductor GmbH

2:20 p.m. – 3:00 p.m.

Embedded AI - #1

Next Generation quasi-analog Neuron AI Chip and FPGA

more info ▾
Description:
The state of the art in AI is defined by so called neural networks. They are representations of real neurons in the brain. The target is to emulate these parts of the brain as close as possible to get similar intelligence in a circuit implementation. Nowadays AI does not rely on neurons but on software or FPGAs emulating neurons. This is a mature field of application, but very inefficient in terms of electrical power and computing power.
Most important challenges are the speed of the neurons signal processing, the size of them and the power which is needed for the operation of such an artificial brain.

The new chip comprises neural AI networks and an FPGA. Due to the quasi-analog nature of the pulse-width-controlled neurons and the implementation in 2-5 nm chip technology, the chip will have an excellent performance/area ratio and also significantly reduce power consumption compared to known AI solutions.

Level: Intermediate

Dr. Michael Gude

Cologne Chip AG

2:20 p.m. – 3:00 p.m.

Embedded AI - #2

Low-Power Low-Latency Edge AI with FPGAs: Balancing Performance, Power, and Complexity

more info ▾
Description:
Machine learning models are increasingly embedded in industrial systems, enabling applications from sensor data processing to real-time decision-making. A major challenge in deploying these models is achieving low inference latency without facing excessive power consumption costs.

FPGAs provide an alternative over traditional and cloud-based solutions by enabling low-latency, power-efficient inference directly on edge devices. By leveraging hardware-level parallelism, FPGAs have the ability to execute complex models deterministically while reducing energy usage. Furthermore, large models can often be reduced in size while still maintaining accuracy, bringing the best of both worlds.
However, these benefits come at the cost of higher development effort and increased system complexity.

This presentation explores the advantages and challenges of implementing FPGA-based Edge AI solutions and shows how to optimize power consumption by optimizing both the FPGA-Design and AI-model in use.

Level: Intermediate

David Hintringer

TRS-STAR GmbH

2:20 p.m. – 3:00 p.m.

Embedded AI - #3

AMD Vitis™ AI Tools Workflow: Compilation, Hardware Deployment & Profiling

more info ▾
Description:
This presentation offers a practical overview of the AMD Vitis™ AI tool workflow for deploying deep learning models on AMD hardware platforms. Attendees will learn about model preparation and optimization, the Vitis AI compiler, and how to integrate and execute models using both ONNX Runtime and Vitis AI Runtime (VART). The session will provide guidance on building applications for inference, deploying models on hardware, and using profiling tools such as the AI Analyzer to evaluate performance and pinpoint bottlenecks. Methods for measuring power and improving model efficiency on hardware accelerators will also be discussed. This session is designed for engineers interested in optimizing AI model deployment and performance using the Vitis AI end-to-end toolchain.

Level: Beginner

Gildas Ganest
AMD

3:00 p.m. - 3:30 p.m.

Coffee Break and Time for Networking

3:30 p.m.

3:30 p.m. – 4:10 p.m.

Application

PCIe in Embedded FPGA Companion Chips: Implementation, Performance, and Verification

more info ▾
Description:
PCIe has become the de facto high-speed interface for server-grade FPGA accelerator boards, such as Xilinx Alveo cards, enabling seamless integration with host systems for demanding computational tasks. However, its versatility, high bandwidth, and relatively straightforward FPGA implementation make it equally compelling for embedded applications, where FPGAs serve as companion chips to SoCs.
This talk explores the untapped potential of PCIe in embedded FPGA designs, bridging the gap between server accelerators and compact companion chips. In the first half, we will delve into key Intellectual Property (IP) cores for PCIe integration, including their interfaces and configuration options. We will compare performance metrics such as bandwidth and latency against integrated SoC FPGAs. Additionally, we will examine the implications for device drivers.
The second half shifts focus to advanced verification strategies tailored for PCIe-connected FPGA companion chips. We will cover practical techniques using open-source tools like Cocotb for hardware verification in Python, alongside co-simulation with QEMU to model full-system behaviour.

Level: Intermediate

Matteo Vit
Starware Design Ltd

3:30 p.m. – 4:10 p.m.

Language / Debug / Verification

The Power of High Level Co-Simulation for HDL Designs

more info ▾
Description:
Traditional HDL verification uses HDL to apply stimulus and validate responses from HDL designs. Since HDLs are not well suited for this, writing testbenches takes a lot of time.
A more efficient approach is to utilize open source Co-Simulation environments like Cocotb, enabling the HDL designer to write testbenches that cover an extensive amount of test combinations without a lot of effort and coding. Cocotb comes with a large variety of open source interface models, supports various simulators and uses Python to command those. This combination leverages the high level approach of Python with the simulation tools everyone is familiar with.
The real potential is shown, when used with standard Python libraries to validate HDL functionality. Writing complex testbenches is just a few lines of code away. Combined with tools for registers and testbench generators reduce the time to a fully functional HDL testbench to just minutes.
The presentation shows that the time spent on validating a HDL module, which usually takes ~50% of the design time, can be reduced to only a few minutes.

Level: Intermediate

Dr. Harald Simmler

Ing. Buero Harald Simmler

3:30 p.m. – 4:10 p.m.

Architecture

Implementation of Nios V with HyperRAM in Altera Agilex FPGA

more info ▾
Description:
When should you use a HyperRAM as external memory and how to implement it in combination with a NIOS V embedded soft processor into an Agilex 3 FPGA. HyperRAM is a high-speed, self-refresh Dynamic RAM (DRAM) designed for low-pin-count, low-power, and space-constrained embedded applications, such as IoT devices, wearables, automotive dashboards, HMI panels, factory automation, ...

Level: Beginner

Armin Faems
Arrow Central Europe GmbH

3:30 p.m. – 4:10 p.m.

Tools & Methodologies

Mistakes to Avoid in High-Rate RFSoC Designs

more info ▾
Description:
FPGAs with integrated ADCs and DACs enable multi-GSPS signal processing, but designing DSP systems at these data rates introduces pitfalls that are easy to underestimate. Many of these issues only become visible once parallelism, throughput, and system-level constraints are fully understood.

This presentation highlights common mistakes encountered in real high-rate RFSoC projects. On the firmware side, it addresses misconceptions around achievable converter sample rates, the practical challenges of parallel FFT implementations, and why Fmax optimization is a first-order design goal rather than an afterthought. It also covers the implications of ADC data rates that exceed DDR bandwidth, and what this means for bring-up, validation, and debugging.

On the software side, the presentation draws on practical experience using C#/.NET for SoC systems. It shows how higher-level abstractions, mature GUI and application frameworks, and built-in SIMD support help manage complexity and sustain high data rates. The talk also discusses why standard Linux interfaces such as generic-uio are often a better choice than custom kernel-space drivers in terms of maintainability and development effort.

Rather than a single deep dive, this talk provides a collection of practical lessons aimed at avoiding costly design mistakes and enabling RFSoC systems that are efficient, scalable, and debuggable in practice.

Level: Advanced

Oliver Bründler

Enclustra GmbH

3:30 p.m. – 4:10 p.m.

Embedded / Vision

Enabling Fault Tolerance in an FPGA-Based RISC-V Processor Through Lockstep Detection and Replay Recovery

more info ▾
Description:
This talk describes a practical method to add fault tolerance to an FPGA-based RV32IM RISC-V processor using dual-pipeline lockstep and replay recovery. Two identical pipelines execute in cycle-level synchronization, and stage-level comparison logic detects mismatches in critical execution state. A centralized control unit enforces bounded replay or halt to block incorrect architectural updates. FPGA implementation results quantify the incremental cost of detection and recovery logic relative to simple duplication.
FPGA RTL designers and hardware architects working with RISC-V or other soft processors who are interested in integrating architectural fault detection and recovery mechanisms for safety-critical or reliability-focused systems.
Architectural fault detection and recovery for FPGA-based soft processors using dual-pipeline lockstep execution and bounded replay control.

Level: Intermediate

Burak Gazel

Aselsan Inc.

3:30 p.m. – 4:10 p.m.

Embedded AI - #1

Beyond the Architecture - A Forensic, Data-Centric Approach to Image Detection

more info ▾
Description:
Why do neural networks, showing near-perfect training accuracy, often stumble when hitting the real world? The answer may not lie in the code, it may be in the pixels. This talk goes beyond traditional deep learning by moving from a model-centric to a data-centric paradigm.

We will explore model "misses"— False Negatives and Positives — not as failures, but as forensic evidence. By post-hoc analysis on prediction errors, we will try to diagnose the "silent killers" of model performance: effects that stem from the datasets in use.

Beyond diagnosis, we will define best practices for pre-training qualification — including geometric bias and label consistency checks — and move beyond standard accuracy to metrics that truly matter, such as Stratified mAP and Intersection over Union (IoU).

This session will discuss data-focused insights needed to bridge the gap between laboratory setting and complexity of reality. It should make aware that beyond tweaking layers knowing the data is key.

Level: Beginner

Alexander Flick

PLC2 GmbH

3:30 p.m. – 4:10 p.m.

Embedded AI - #2

Efficient Vision Pipelines on FPGAs: Design Patterns and Performance Tuning

more info ▾
Description:
Building high-performance vision pipelines on FPGAs requires balancing compute, memory bandwidth, and latency.
This talk presents proven design patterns for implementing object detection, segmentation, and defect inspection using FPGA accelerators.
We’ll cover Conv core optimizations, address generator tuning, and leveraging DSP blocks for ML inference.
Special focus will be given to sensAI modular IP architecture, enabling developers to scale from simple classification to complex multi-stage pipelines.
Real-world benchmarks on CertusPro-NX will illustrate how to achieve sub-10 ms inference while maintaining ultra-low power consumption.

Level: Intermediate

Karl Wachswender
Lattice Semiconductor GmbH

3:30 p.m. – 4:10 p.m.

Embedded AI - #3

Software-to-Hardware Synergy for Edge AI: From Model Compression to Low-Power FPGA Acceleration

more info ▾
Description:
This paper presents a holistic approach to Edge AI optimization by bridging neural network compression techniques with low-power FPGA acceleration. Using the Software algorithmic enhancements, we demonstrate how pruning, quantization, and sparsity-based optimizations reduce compute complexity and power consumption while maintaining accuracy. The workflow spans model conversion, compilation, and deployment on FPGA platforms, leveraging heterogeneous compute engines for efficient inference. Benchmarks on popular models like YOLOv8 and MobileNet show up to 5× acceleration, 2–4× throughput improvement, and 30% resource reduction, enabling practical, energy-efficient AI solutions for edge applications.

Level: Intermediate

Dr. Aurang Zaib
Microchip Technology GmbH

3:30 p.m. – 5:00 p.m. (90 min)

Embedded AI - #4

Agentic AI in the FPGA Design Loop

more info ▾
Description:
Agentic AI is redefining the adaptive SoC and FPGA design loops by enabling developers to progress from concept to verified implementation using natural language intent. This hands on workshop demonstrates how an AI agent guides a complete, single design workflow—interpreting requirements, creating AMD Vivado™ tool block design components, generating and integrating subsystems, and automating simulation, implementation, timing closure, and debug. Attendees will observe how iterative prompts drive the agent to refine architectures, analyze results, resolve design issues, and streamline verification. The session showcases a modern, software centric methodology where Agentic AI accelerates iteration, reduces complexity, and provides an accessible end to end FPGA/adaptive SoC development experience. Prerequisites: Familiarity with FPGA tools and curiosity to explore new workflows.

Level: Beginner

Luke Millar
AMD

4:20 p.m.

4:20 p.m. – 5:00 p.m.

Application

Crypto-Factories: Homomorphic Encryption Powers FPGA-Accelerated Confidential Computing for Industrial Edge AI

more info ▾
Description:
Industrial systems are adopting confidential computing (CC) to protect sensitive data and AI models and to meet IEC 62443.
Most deployments today use trusted execution environments (TEEs) and confidential VMs/containers to provide hardware isolation and remote attestation in OPC UA gateways, edge analytics, and OEM data-sharing. This is a major advance, but the current wave has real risks: side-channel leakage, vendor dependent attestation, weak key release, I/O gaps around accelerators, and patch-management friction in OT.
This session will:
Map today’s CC practices to IEC 62443 FR1–FR7.
Expose the top 9 vulnerabilities in TEE-centric deployments with concrete OT examples (gateways, vision, supplier KPIs).
Share a practical mitigation checklist: attestation-gated secrets, CPU isolation/SMT off, con9dential node pools, min-TCB policies, audited TrustLists, and default-deny networking.
Present a timeline where homomorphic encryption (HE), zero-knowledge proofs (ZKP), and FPGA acceleration extend CC to cross-company analytics and cryptographic compliance proofs.
Attendees will leave with clear guidance on where CC delivers value now, and how to prepare for the next wave—TEEs as the workhorse today, HE for partner analytics in the mid term, and ZKPs for veritable trust across OEMs, suppliers, and regulators.

Level: Intermediate

Christian Michel
Lattice Semiconductor GmbH

4:20 p.m. – 5:00 p.m.

Language / Debug / Verification

What Software Development Got Right - And FPGA Design Can Now Use

more info ▾
Description:
Livt challenges the assumption that every application needs a CPU, firmware, an OS, and a driver stack. Instead, it enables complete applications to be implemented directly on an FPGA—making a processor an optional integration choice rather than a prerequisite. This can simplify systems, reduce latency, and improve determinism while preserving application-level structure.

Software advanced through higher-level abstractions, shared libraries, frameworks, and reusable building blocks. FPGA development still tends to stay low-level: teams build isolated abstractions, engineers solve the same problems repeatedly, and complexity scales poorly.

Livt brings software-like structure to hardware by using hardware-native semantics—explicit state machines, multiplexing, and concurrency—that map directly to HDL. It supports encapsulation, interfaces, and composition, and encourages development with packages and frameworks instead of monolithic IP. These layers can be stacked and evolved, enabling incremental growth in functionality without exponential complexity.

Livt reframes FPGA design as true application development in silicon—with or without a processor.

Level: Intermediate

Denis Vasilik

Eccelerators GmbH

4:20 p.m. – 5:00 p.m.

Architecture

Robotics with Altera FPGA

more info ▾
Description:
Modern robotic systems require high-performance, low-latency processing to handle sensing, perception, and control in real time. Field-Programmable Gate Arrays (FPGAs) provide an efficient and flexible platform for accelerating critical robotic workloads through parallel and deterministic computation. This presentation highlights the use of FPGAs in robotics, focusing on sensor pre-processing, multi-sensor fusion, real-time video and vision processing, motion control, and acceleration of Robot Operating System (ROS) components. By offloading time-critical and compute-intensive tasks to FPGA hardware, robotic systems can achieve improved performance, reduced power consumption, and enhanced real-time responsiveness.

Level: Intermediate

Angelo Lo Cicero
Altera GmbH

4:20 p.m. – 5:00 p.m.

Tools & Methodologies

Developing with Lattice Propel

more info ▾
Description:
This session wil go through the development flow and learn about the various capabilities of the Propel Builder & Propel SDK, making use of the Radiant software and implemented on the MACHXO4 of Lattice.

Level: Beginner

Armin Faems

Arrow Central Europe GmbH

Philipp Henze
Arrow Central Europe GmbH

4:20 p.m. – 5:00 p.m.

Embedded / Vision

Emulation of Classic CPUs – a SoC-friendly Hybrid Approach

more info ▾
Description:
Emulation of classic 1980s CPUs, such as the Motorola MC680xx series, is currently achieved through:

(complete) software emulation on a modern processor

or
(complete) hardware emulation (RT-level) on an FPGA.

With the advent of SoC-FPGAs like the AMD Zynq devices, which combine CPU and FPGA logic in a single device and thus offer a platform for a hybrid approach to CPU emulation, another option has emerged.

This project reimplements a CPU from the 1980s in an SoC device. The following aspects are utilized:

The CPU uses sequential code to replace large logic blocks in the hardware emulation.
The FPGA uses hardware whose implementation would require Mach cycles.

Level: Intermediate

Volker Urban

Ingenieurbüro Dipl.-Ing. Volker Urban

4:20 p.m. – 5:00 p.m.

Embedded AI - #1

Reimagining Edge GenAI – Generative AI with Hailo-10

more info ▾
Description:
This presentation explores how generative AI at the edge can be efficiently deployed using the Hailo-10 AI accelerator. We will walk through practical GenAI use cases such as intelligent assistants, sensor-data interpretation, and multimodal inference, highlighting system architecture, performance, and power efficiency. The session will feature live, hands-on demonstrations, showing real-time execution of GenAI workloads on Hailo-10 and sharing practical insights into integration, optimization, and deployment for industrial and embedded environments.

Level: Intermediate

Ulrich Schmidt

EBV Elektonik GmbH & Co. KG

4:20 p.m. – 5:00 p.m.

Embedded AI - #2

Efficient Edge AI Inference on PolarFire SoC FPGAs Using VectorBlox 3.0 Compression Techniques

more info ▾
Description:
This presentation introduces VectorBlox 3.0, Microchip’s latest AI inference SDK and CNN acceleration engine for PolarFire® SoC FPGAs. The session focuses on how structured and unstructured model compression techniques enable higher AI throughput, reduced memory bandwidth, and improved AI-per-watt for edge vision applications. Real-world benchmarks using YOLOv8 and YOLOv9 networks demonstrate up to 1.6× acceleration with minimal accuracy degradation. The talk also covers the end-to-end deployment flow from PyTorch and ONNX models to FPGA implementation, along with reference designs and industrial edge AI use cases including smart cameras, surveillance, and autonomous systems.

Level: Intermediate

Saadeddine Ben Jemaa
Arrow Central Europe GmbH

4:20 p.m. – 5:00 p.m.

Embedded AI - #3

Dataflow driven Scalable AI Accelerator Architecture for FPGA and eFPGA Platforms

more info ▾
Description:
Current philosophies to AI acceleration in FPGAs follow a full-model inference approach. The ability to partition & schedule the execution, predict performance and co-locate other HPC tasks within the FPGA fabric are severely curtailed and subject to manipulating otherwise extensive tool-dependent workflows. This is clearly unacceptable for many real-world applications specially those under footprint and provisioning constraints.
We discuss the design of a FPGA-resident compute unit with performance predictability. This compute unit can be scaled in multiple dimensions to suit the available/remaining FPGA floor allowing the system designer much flexibility across the design space. We detail the integration into a software-based event queue that facilitates the scheduling of recurrent system-wide HPC tasks following a data-flow paradigm. We finish by presenting comprehensive & comparative performance assessments and measurements.

Level: Intermediate

Prof. Hans Dermot Doran
Zurich University of Applied Sciences

* subject to change