ZYNQ Architecture
The overall architecture of Zynq consists of two parts: the PS (Processing System) and the PL (Programmable Logic). The two parts have independent power domains, so each can be used on its own and an unused part can be powered down to reduce power consumption. However, Zynq is most valuable when the two parts are used together.
PS (Processing System)
At the core of the processing system is an ARM Cortex-A9 processor, which is dual-core in most Zynq-7000 devices (the cost-optimized Zynq-7000S devices have a single core). It is a hard processor: a dedicated, optimized block of silicon on the chip.
The alternative to a hard processor is a "soft" processor such as MicroBlaze, which is built from PL resources, i.e. it is an IP core instantiated in the PL. A hard processor achieves higher performance, whereas a soft processor is flexible: the number of instances and their exact configuration can be chosen for each design.
One or more MicroBlaze soft processors can be instantiated in the Zynq PL to work alongside the hard processor. For example, a soft core can handle specific low-level functions or coordinate interaction between subsystems, offloading less demanding tasks from the hard core and improving overall system performance.
The Zynq PS contains more than the ARM processor itself. The processor and its closely related processing resources form the Application Processing Unit (APU), which is surrounded by peripheral interfaces, cache memory, memory interfaces, interconnect, and clock generation circuitry.
PL (Programmable Logic)
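The programmable logic of Zynq is based on 7 series FPGA fabric: the smaller devices (for example the Z-7010 and Z-7020) use Artix-7 logic, and the larger ones (for example the Z-7030 and above) use Kintex-7 logic.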
Logic Part
- Configurable Logic Block (CLB): A CLB is a small, general-purpose group of logic elements. CLBs are arranged in a two-dimensional array across the PL and are connected to each other and to other resources through programmable interconnect. Each CLB contains two slices and sits next to a switch matrix.
- Slice: A subunit of a CLB containing the resources that implement combinational and sequential logic circuits.
- Lookup Table (LUT): A flexible resource that can implement any of the following:
  - A logic function of up to 6 inputs
  - A small Read-Only Memory (ROM)
  - A small Random Access Memory (RAM)
  - A shift register
  Only the LUTs in SLICEM slices can be used as RAM or shift registers; SLICEL LUTs implement logic and ROM. LUTs can be combined as needed to form larger logic functions, memories, or shift registers.
- Flip-flop (FF): A sequential element that implements a 1-bit register with reset. Some of the storage elements in each slice can alternatively be configured as latches.
- Switch Matrix: Each CLB is adjacent to a switch matrix, which provides flexible routing to connect elements within the CLB or to connect the CLB to other resources in the PL.
- Carry Logic: Arithmetic circuits must pass carry signals between adjacent slices; dedicated carry logic (carry chains) does this without going through general-purpose routing.
- Input/Output Block (IOB): IOBs interface the PL logic resources to the physical device "pads" that connect to external circuits. Each IOB handles one bit of input or output and is located at the periphery of the chip.
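A LUT is essentially a 64-entry, 1-bit truth table: the six inputs form an address and the stored bit at that address is the output, which is also why the same resource can act as a small ROM. The following is a minimal Python sketch of that idea, assuming input I0 is the least significant address bit; the helper names `lut6` and `init_from_function` are illustrative, not a Xilinx API.

```python
from typing import Callable, Sequence

# Illustrative helpers (hypothetical names), not a Xilinx API.


def lut6(init: int, inputs: Sequence[int]) -> int:
    """Return the output of a 6-input LUT holding the 64-bit truth table `init`.

    inputs[0] is treated as I0, the least significant bit of the table index,
    and inputs[5] as I5 (assumed ordering).
    """
    if len(inputs) != 6:
        raise ValueError("a LUT6 has exactly six inputs")
    index = 0
    for position, value in enumerate(inputs):
        index |= (value & 1) << position
    return (init >> index) & 1


def init_from_function(func: Callable[[list], bool]) -> int:
    """Tabulate func over all 64 input patterns to build a LUT6 truth table."""
    init = 0
    for index in range(64):
        bits = [(index >> i) & 1 for i in range(6)]
        if func(bits):
            init |= 1 << index
    return init


# A 6-input XOR (parity) function fits in a single LUT.
parity_init = init_from_function(lambda bits: sum(bits) % 2 == 1)
print(hex(parity_init))                       # 0x6996966996696996
print(lut6(parity_init, [1, 0, 1, 1, 0, 0]))  # three ones -> 1
```

Any 6-input function, however complex, costs exactly one LUT in this model; functions with more inputs need several LUTs combined, as described in the list above.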
Special Resources: DSP48E1 and BRAM
These two resources are embedded in columns within the logic array and placed close to each other, because intensive computation and storing data in memory are often closely related operations.
BRAM
The BRAM in Zynq-7000 is the same as the BRAM in other Xilinx 7 series FPGAs, capable of implementing RAM, ROM, and FIFO, while also supporting error correction coding.
Each BRAM stores up to 36 Kb (kilobits, not kilobytes) and can be configured as one 36 Kb RAM or as two independent 18 Kb RAMs. It can also be configured with different aspect ratios (from 32K × 1 up to 512 × 72 bits), and several BRAMs can be combined to form a larger memory.
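Because a BRAM can take several aspect ratios, the number of blocks a memory needs depends on how well its depth and width tile those shapes. The sketch below estimates the count by trying each 36 Kb shape and tiling the memory with identical blocks; it is a simplification (real synthesis tools may also use 18 Kb halves or mix shapes), and `ramb36_count` is a hypothetical helper name.

```python
import math

# Depth x width shapes of one 36 Kb block RAM; the widths 9/18/36/72 include
# parity bits, and the 72-bit-wide shape is simple-dual-port only.
RAMB36_SHAPES = [
    (32768, 1),
    (16384, 2),
    (8192, 4),
    (4096, 9),
    (2048, 18),
    (1024, 36),
    (512, 72),
]


def ramb36_count(depth: int, width: int) -> tuple[int, tuple[int, int]]:
    """Estimate how many 36 Kb BRAMs a depth x width memory needs (hypothetical helper).

    Tiles the memory with identical blocks of each shape and returns the
    cheapest (count, shape) found.
    """
    best = None
    for shape_depth, shape_width in RAMB36_SHAPES:
        count = math.ceil(depth / shape_depth) * math.ceil(width / shape_width)
        if best is None or count < best[0]:
            best = (count, (shape_depth, shape_width))
    return best


print(ramb36_count(1024, 32))  # (1, (1024, 36))
print(ramb36_count(4096, 32))  # (4, (4096, 9))
```

A 4096 × 32 memory holds 131072 bits, about 3.6 blocks' worth, so 4 blocks is also the lower bound here.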
Using BRAM, a large amount of data can be stored in dedicated, optimized on-chip memory that occupies very little physical area, and BRAM can generally run at the highest clock frequencies the device supports. The alternative is Distributed RAM (sometimes abbreviated DRAM, not to be confused with dynamic RAM), which is built from LUTs in the logic fabric. Building a memory as large as a BRAM this way takes a large number of LUTs, and the extra logic and routing delays limit timing performance. For small memories, however, distributed RAM is advantageous: it uses resources efficiently and can be placed flexibly, close to the logic that uses it.
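A quick calculation shows why distributed RAM only pays off for small memories. The sketch below assumes each SLICEM LUT6 stores 64 bits (one 64 × 1 single-port RAM) and ignores the extra LUTs and multiplexers needed to decode deep memories or add read ports, so it is a lower bound; `distributed_ram_luts` is a hypothetical helper name.

```python
import math

BITS_PER_LUT6 = 64  # assumption: one SLICEM LUT6 used as a 64 x 1 single-port RAM


def distributed_ram_luts(depth: int, width: int) -> int:
    """Lower-bound estimate of LUTs needed to store a depth x width memory (hypothetical helper).

    Ignores address-decoding muxes for deep memories and the duplication
    needed for extra read ports.
    """
    return math.ceil(depth / BITS_PER_LUT6) * width


print(distributed_ram_luts(64, 8))     # 8 LUTs for a small 64 x 8 buffer
print(distributed_ram_luts(1024, 36))  # 576 LUTs for the same bits as one 36 Kb BRAM
```

Eight LUTs for a small buffer is cheap, while matching a single BRAM costs hundreds of LUTs before any decoding logic is counted.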
DSP48E1
The LUTs in the logic fabric can implement arithmetic of arbitrary word length, but long-word arithmetic circuits occupy many slices, and the resulting placement and routing lowers the achievable clock frequency. LUT-based arithmetic is therefore best kept to short word lengths.
The DSP48E1 is designed specifically for high-speed arithmetic on long-word signals. It is a dedicated silicon block containing a 25-bit pre-adder/subtractor, a 25 × 18 two's-complement multiplier, and a 48-bit post-adder/subtractor (accumulator).
The post-adder can also be used as a logic unit, performing bitwise Boolean operations such as AND, OR, XOR, and their complements.
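The arithmetic path described above can be modelled behaviourally as P = (A + D) × B + C. This is a minimal sketch, not a cycle-accurate model: it ignores pipeline registers and opmode settings, and assumes the pre-adder result wraps at 25 bits and P wraps at 48 bits. The helper names are hypothetical.

```python
# Illustrative behavioural model (hypothetical helper names), not Xilinx code.


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp48e1_mac(a: int, d: int, b: int, c: int) -> int:
    """Sketch of P = (A + D) * B + C with DSP48E1-style word widths.

    A, D: 25-bit signed; B: 18-bit signed; C, P: 48-bit signed.
    Assumption: the pre-adder result wraps to 25 bits and P wraps to 48 bits.
    """
    pre_sum = to_signed(to_signed(a, 25) + to_signed(d, 25), 25)
    product = pre_sum * to_signed(b, 18)  # a 25 x 18 product needs at most 43 bits
    return to_signed(product + to_signed(c, 48), 48)


def post_adder_logic(x: int, z: int, op: str) -> int:
    """Bitwise logic on two 48-bit operands, as when the post-adder acts as a logic unit."""
    mask = (1 << 48) - 1
    results = {"and": x & z, "or": x | z, "xor": x ^ z, "xnor": ~(x ^ z)}
    return results[op] & mask


print(dsp48e1_mac(3, 4, 5, 6))  # (3 + 4) * 5 + 6 = 41
```

The pre-adder is useful, for instance, in symmetric FIR filters, where two samples sharing the same coefficient are added before a single multiplication.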
If larger word lengths are needed, multiple DSP48E1 slices can be cascaded through dedicated paths between adjacent slices.
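One common way to build a wider multiplier is to split an operand into 17-bit chunks: each chunk fits an 18-bit signed port as an unsigned value, and the partial products are shifted by 17 bits and summed. The DSP48E1 has a 17-bit shift on its cascade path for this purpose; the sketch below simply performs the equivalent shift-and-add in Python, assuming unsigned operands with `a` below 2**24. `wide_multiply` is a hypothetical helper name.

```python
CHUNK_BITS = 17  # unsigned chunk width that fits an 18-bit signed multiplier port
A_MAX_BITS = 24  # unsigned width that fits a 25-bit signed multiplier port


def wide_multiply(a: int, b: int) -> tuple[int, int]:
    """Multiply unsigned a (< 2**24) by an arbitrarily wide unsigned b (hypothetical helper).

    Each 17-bit chunk of b times a stands for one 25 x 18 multiplier;
    partial products are shifted and accumulated as a DSP cascade would.
    Returns (product, number_of_multipliers_used).
    """
    if a < 0 or b < 0 or a >= 1 << A_MAX_BITS:
        raise ValueError("sketch assumes unsigned a < 2**24 and unsigned b")
    total, shift, used = 0, 0, 0
    while True:
        chunk = b & ((1 << CHUNK_BITS) - 1)
        total += (a * chunk) << shift
        used += 1
        b >>= CHUNK_BITS
        shift += CHUNK_BITS
        if b == 0:
            break
    return total, used


product, multipliers = wide_multiply(0xABCDEF, (1 << 34) - 1)
print(product == 0xABCDEF * ((1 << 34) - 1), multipliers)  # True 2
```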
General Input/Output
The general-purpose I/O on Zynq is collectively called SelectIO resources. It is organized into banks of 50 IOBs, each IOB having one pad that connects to the outside world.
I/O banks are either High Performance (HP) or High Range (HR). HP banks support I/O voltages up to 1.8 V and are typically used for high-speed interfaces to memory and other chips; HR banks support voltages up to 3.3 V and a wider range of I/O standards. Both bank types support single-ended and differential signalling.
Each IOB also includes IOSERDES resources that perform programmable conversion between parallel and serial data, at ratios of 2 to 8 bits (wider ratios are possible by cascading two IOSERDES).
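Conceptually, a serializer turns each parallel word into a sequence of bits on one pin and the deserializer regroups them. The sketch below shows that round trip for the 2 to 8 bit range of a single IOSERDES; it assumes bit 0 is sent first (the real bit ordering depends on how the primitive's data ports are wired) and uses hypothetical helper names.

```python
def serialize(word: int, width: int) -> list[int]:
    """Turn a parallel word into a serial bit list, bit 0 first (assumed order)."""
    if not 2 <= width <= 8:
        raise ValueError("a single IOSERDES handles 2 to 8 bits")
    return [(word >> i) & 1 for i in range(width)]


def deserialize(bits: list[int], width: int) -> list[int]:
    """Group a serial bit list back into width-bit parallel words."""
    if not 2 <= width <= 8:
        raise ValueError("a single IOSERDES handles 2 to 8 bits")
    if len(bits) % width:
        raise ValueError("bit stream length must be a multiple of width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for i, bit in enumerate(bits[start:start + width]):
            word |= bit << i
        words.append(word)
    return words


stream = serialize(0xA5, 8) + serialize(0x3C, 8)
print([hex(w) for w in deserialize(stream, 8)])  # ['0xa5', '0x3c']
```

In hardware the serial side runs several times faster than the parallel side, so the fabric logic can work at a lower clock rate than the pin.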
Communication Interfaces
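Larger Zynq-7000 devices contain multi-gigabit serial transceivers (GTP transceivers in devices such as the Z-7015, GTX transceivers in the Z-7030 and larger) and integrated PCI Express blocks, embedded in the logic fabric. The smallest devices, such as the Z-7010 and Z-7020, have no transceivers.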
Other Programmable Logic Expansion Interfaces
- ADC: The XADC block, with two independent 12-bit ADCs, each sampling at up to 1 MSPS.
- Clock: The PL receives four independent clock signals from the PS, and can also generate and distribute its own clocks independently of the PS using its MMCM/PLL clock resources.
- JTAG Debug Interface
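Reading the XADC usually means converting its 12-bit result codes into physical units. The sketch below uses the transfer functions given in the XADC user guide (UG480) for the on-chip temperature sensor and the supply sensors, and assumes the 12-bit result sits in bits [15:4] of the 16-bit status register; the helper names are hypothetical and the raw register value is made up for illustration.

```python
# Hypothetical helpers; transfer functions as documented for the 7 series XADC (UG480).


def xadc_code(raw_register: int) -> int:
    """Extract the 12-bit conversion result from a 16-bit XADC status register (MSB-justified)."""
    return (raw_register >> 4) & 0xFFF


def xadc_temperature_c(code: int) -> float:
    """On-chip temperature in degrees Celsius from a 12-bit code."""
    return code * 503.975 / 4096 - 273.15


def xadc_supply_v(code: int) -> float:
    """Supply-sensor voltage (e.g. VCCINT) from a 12-bit code; full scale is 3 V."""
    return code * 3.0 / 4096


raw = 0x9A50  # made-up example register value
code = xadc_code(raw)
print(code, round(xadc_temperature_c(code), 2))
```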
Interface Between PS and PL
As mentioned earlier, the power of Zynq comes not only from the individual capabilities of the PS and PL, but from the ability to combine them into a complete, integrated system. A set of highly customized AXI interconnects and interfaces is key to bridging the two parts. There are also other types of connection between PS and PL, notably EMIO.
AXI
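AXI (Advanced eXtensible Interface) is part of ARM's AMBA bus specification. Zynq-7000 designs use AXI4, from the fourth generation of AMBA (AMBA 4), for PL IP; the PS-side AXI ports themselves implement AXI3.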
AXI4
Used for memory-mapped connections. It offers the highest performance by allowing a burst of up to 256 data transfers (beats) to follow a single address.
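In an incrementing (INCR) burst, only the start address is sent; the slave derives each beat's address from the burst length field (AxLEN, which holds beats minus 1) and size field (AxSIZE, bytes per beat = 2^AxSIZE). The sketch below computes those addresses and checks the AXI rule that a burst must not cross a 4 KB boundary. It handles only aligned start addresses, and `incr_burst_addresses` is a hypothetical helper name.

```python
def incr_burst_addresses(start: int, axlen: int, axsize: int) -> list[int]:
    """Beat addresses of an AXI4 INCR burst (hypothetical helper).

    axlen:  AxLEN field, number of beats minus 1 (0..255 in AXI4).
    axsize: AxSIZE field, bytes per beat = 2**axsize.
    Assumes the start address is aligned to the beat size.
    """
    if not 0 <= axlen <= 255:
        raise ValueError("AXI4 INCR bursts carry 1 to 256 beats")
    beat_bytes = 1 << axsize
    if start % beat_bytes:
        raise ValueError("this sketch only handles aligned start addresses")
    beats = axlen + 1
    last_byte = start + beats * beat_bytes - 1
    if start // 4096 != last_byte // 4096:
        raise ValueError("an AXI burst must not cross a 4 KB boundary")
    return [start + i * beat_bytes for i in range(beats)]


print([hex(a) for a in incr_burst_addresses(0x1000, 3, 2)])
# ['0x1000', '0x1004', '0x1008', '0x100c']
```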
AXI4-Lite
A simplified memory-mapped interface that supports only one data transfer per transaction: a single address followed by a single data word.
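AXI4-Lite is typically used for control and status registers. The toy model below captures the one-address, one-data-word behaviour, including write strobes (bit n of WSTRB enables byte lane n). It is a sketch under simplifying assumptions: 32-bit registers at 4-byte-aligned offsets, and out-of-range accesses raising Python exceptions instead of returning an AXI error response. The class name is hypothetical.

```python
class AxiLiteRegisterFile:
    """Toy AXI4-Lite slave (hypothetical class): one address and one 32-bit word per transfer."""

    def __init__(self, num_regs: int) -> None:
        self.regs = [0] * num_regs

    def _index(self, addr: int) -> int:
        if addr % 4:
            raise ValueError("registers are 32-bit aligned in this sketch")
        index = addr // 4
        if not 0 <= index < len(self.regs):
            raise IndexError("address outside the register map")
        return index

    def write(self, addr: int, data: int, wstrb: int = 0xF) -> None:
        """Single-beat write; bit n of wstrb enables byte lane n."""
        index = self._index(addr)
        value = self.regs[index]
        for lane in range(4):
            if (wstrb >> lane) & 1:
                mask = 0xFF << (8 * lane)
                value = (value & ~mask) | (data & mask)
        self.regs[index] = value & 0xFFFFFFFF

    def read(self, addr: int) -> int:
        """Single-beat read of one 32-bit register."""
        return self.regs[self._index(addr)]


rf = AxiLiteRegisterFile(4)
rf.write(0x4, 0x12345678)
rf.write(0x4, 0xAABBCCDD, wstrb=0b0001)  # update only the lowest byte
print(hex(rf.read(0x4)))  # 0x123456dd
```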
AXI4-Stream
Used for high-speed streaming data. It supports bursts of unlimited size and has no addressing mechanism, which makes it suitable for data flowing directly from a source to a destination.
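Without addresses, flow control in AXI4-Stream rests on the TVALID/TREADY handshake: a word moves only in a clock cycle where the source asserts TVALID and the sink asserts TREADY, and the optional TLAST signal marks the end of a packet. The cycle-by-cycle sketch below assumes a source that always has data ready until its packet is exhausted; `stream_transfer` is a hypothetical helper name.

```python
from collections import deque


def stream_transfer(words: list[int], tready_per_cycle: list[bool]) -> list[tuple[int, int, bool]]:
    """Simulate one AXI4-Stream packet crossing a valid/ready handshake (hypothetical helper).

    The source holds TVALID high while it still has words; a transfer happens
    only in cycles where the sink's TREADY is also high. TLAST is set on the
    final word. Returns (cycle, tdata, tlast) for each completed transfer.
    """
    pending = deque(words)
    transfers = []
    for cycle, tready in enumerate(tready_per_cycle):
        tvalid = bool(pending)
        if tvalid and tready:
            tdata = pending.popleft()
            tlast = not pending
            transfers.append((cycle, tdata, tlast))
    return transfers


print(stream_transfer([10, 20, 30], [True, False, True, True, True]))
# [(0, 10, False), (2, 20, False), (3, 30, True)]
```

The stall in cycle 1 shows back-pressure: when the sink drops TREADY, the source simply holds its data until the sink is ready again.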
EMIO Interface
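EMIO (Extended MIO) lets PS peripherals route their signals through the PL instead of the dedicated MIO pins, so they can reach PL logic or PL I/O pins. It is implemented as a set of simple wires between the two domains, without any bus protocol.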
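A frequent EMIO use is extending the PS GPIO controller into the PL. In Zynq-7000 the GPIO controller has 54 MIO lines (banks 0 and 1) followed by 64 EMIO lines (banks 2 and 3), and the Xilinx PS GPIO driver numbers pins 0 to 117 in that order. The sketch below maps an EMIO line index to its pin number, bank, and bit; the helper name is hypothetical, and the numbering should be checked against the driver version in use.

```python
MIO_GPIO_COUNT = 54   # GPIO pins 0..53 are on MIO (banks 0 and 1)
EMIO_GPIO_COUNT = 64  # GPIO pins 54..117 are on EMIO (banks 2 and 3)


def emio_gpio_pin(emio_index: int) -> tuple[int, int, int]:
    """Map EMIO GPIO line n to (pin number, bank, bit within bank) (hypothetical helper).

    Bank 2 holds EMIO lines 0..31 and bank 3 holds EMIO lines 32..63.
    """
    if not 0 <= emio_index < EMIO_GPIO_COUNT:
        raise ValueError("Zynq-7000 exposes 64 EMIO GPIO lines")
    pin = MIO_GPIO_COUNT + emio_index
    bank = 2 + emio_index // 32
    bit = emio_index % 32
    return pin, bank, bit


print(emio_gpio_pin(0))   # (54, 2, 0)
print(emio_gpio_pin(33))  # (87, 3, 1)
```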
EBAZ4205 Mining Board Information Summary
Expansion Board
ebaz4205 Expansion Board - Lichuang Open Source Hardware Platform (oshwhub.com)
Completing the Development Board
Learning ZYNQ from Scratch (Based on Mining Card EBAZ4205) (Part 1) - CSDN Blog
Schematics
Mining Board Schematic
Elrori/EBAZ4205: EBAZ4205 BOARD (github.com)
PCB