Realizing the Lucas Kanade motion estimation algorithm on Xilinx ZC702 board for Full HD real-time video analysis


Eduardo Serrano (eduardo.serrano@uclm.es) and Jesús Barba (jesus.barba@uclm.es)
School of Computer Science
University of Castilla-La Mancha, Spain
23/03/2018

Introduction

The goal of this tutorial is to develop a video application on a Xilinx’s ZynQ SoC (System-on-Chip) that performs real-time processing of a Full HD video stream. The application chosen for this tutorial is the Lucas Kanade motion estimation algorithm, a well-known optical flow analysis method in computer vision.

For this demo, Digilent’s FMC-HDMI board feeds the system with a Full HD input video stream at 60 fps. The ZynQ SoC of the Xilinx ZC702 board processes the input video stream, comparing two consecutive frames, and displays the result through the on-board HDMI output port.

This tutorial is the first of a series of publications aimed at providing the inexperienced FPGA designer with the background needed to develop computer vision applications using Xilinx FPGA technology.

Overview of the design

Figure 1 shows the architecture of the system, where the flow of the video stream, components and their relations are represented.

 

Figure 1. Video processing platform developed in this tutorial.

First, the video source (e.g. a camera) is connected to one of the HDMI inputs of the FMC-HDMI board, which is plugged into one of the FMC connectors of the ZC702 board. The FMC expansion board feeds the FPGA with a stream of data that has to be adapted before it can be processed; this is the role of the Video-In to AXI4-Stream IP. Then, a first Video DMA component (VDMA_0) takes the input video stream and stores the frames in DDR memory.

The IP (Intellectual Property) component that implements the Lucas Kanade (LK, from now on) algorithm accepts two input streams, corresponding to two consecutive frames of the video sequence (that is, T and T+1). Therefore, a second VDMA component (VDMA_1) is necessary. The first VDMA reads the frame captured at time T from DDR memory and the second VDMA reads frame T+1, thus providing the necessary data to the processing IP.

As a result of the execution of the LK algorithm, a new output stream is produced. This stream is a video frame that encodes the results of the motion estimation as pixels of different colors. The output video stream is adapted to the format and timing requirements of the on-board HDMI output interface by the AXI4-Stream to Video-Out component. Figure 2 shows what the output video looks like.

Figure 2. The LK video processing platform at work

The reference project of the design, as well as the source code for both the firmware and the HLS model of the LK component, are available by clicking THIS LINK. The demo has been developed using version 2016.4 of the Xilinx Vivado Design Suite and assumes the GNU/Linux flavour of the toolchain.

The rest of this tutorial guides the reader through all the steps needed to get the design up and running from scratch.

Step 1. Generate the Lucas Kanade component

Before starting the design of the SoC itself, it is necessary to generate the video processing IP which, as previously mentioned, implements the Lucas Kanade motion estimation algorithm.

It is not the aim of this tutorial to explain in detail how the algorithm works, nor how it has been modeled so that the Vivado HLS (High-Level Synthesis) tool generates an IP able to process one pixel of both input frames per clock cycle. The reader is invited to take a look at Xilinx application note XAPP1300 [1] to get the gist of the modeling process.
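Although the details are left to XAPP1300, the core computation of Lucas Kanade is compact enough to sketch. The C++ fragment below is a minimal floating-point illustration of the textbook formulation for a single window, not the fixed-point HLS code of the LK IP: it accumulates products of the spatial gradients (Ix, Iy) and of the temporal difference (It) between frames T and T+1, then solves the resulting 2×2 least-squares system. The names Flow and lk_window are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Motion vector estimated for one window (hypothetical helper type).
struct Flow {
    double u;    // horizontal displacement, in pixels per frame
    double v;    // vertical displacement, in pixels per frame
    bool valid;  // false when the 2x2 system is (nearly) singular
};

// Textbook Lucas-Kanade estimate for a window of n pixels.
//   Ix, Iy: spatial gradients of frame T
//   It:     temporal difference, frame T+1 minus frame T
// Solves the least-squares system
//   [Sxx Sxy] [u]     [Sxt]
//   [Sxy Syy] [v] = - [Syt]
Flow lk_window(const double* Ix, const double* Iy, const double* It, std::size_t n) {
    double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sxx += Ix[i] * Ix[i];
        sxy += Ix[i] * Iy[i];
        syy += Iy[i] * Iy[i];
        sxt += Ix[i] * It[i];
        syt += Iy[i] * It[i];
    }
    const double det = sxx * syy - sxy * sxy;
    if (std::fabs(det) < 1e-9) {
        return {0.0, 0.0, false};  // aperture problem: not enough texture in the window
    }
    // Closed-form inverse of the symmetric 2x2 matrix (Cramer's rule).
    return {(-syy * sxt + sxy * syt) / det,
            ( sxy * sxt - sxx * syt) / det,
            true};
}
```

Windows with no texture, or with a single straight edge, make the matrix singular (the aperture problem); the sketch flags them as invalid instead of returning an arbitrary vector.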

However, the application note does not cover the interfacing of the IP, which is needed to make it usable in an actual design. Therefore, we had to carry out some additional work to enhance the baseline implementation, providing connectivity and customizing the operation of the IP to fit the functional requirements of the application. We plan to publish another article discussing this parallel work in the coming weeks. Briefly, the major highlights of the improved version of the LK IP are:

  • Implementation of the AXI4-Stream interface in order to make the component usable in a real-life design.
  • Adaptation of some LK functions to allow the utilization of HLS Video functions and data structures.
  • Representation of the movement as a video frame instead of a set of vectors. Pixels are colored depending on the magnitude of the estimation performed by the core function.
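The third point can be illustrated with a small sketch. The four color classes and the thresholds below are hypothetical values chosen for illustration, not the ones used by the LK IP; the idea shown is that comparing the squared magnitude u² + v² against squared thresholds avoids computing a square root, which is convenient in hardware.

```cpp
#include <cassert>

// Hypothetical color classes used to paint each output pixel.
enum class MotionClass { None, Slow, Medium, Fast };

// Classify a motion vector (u, v), in pixels per frame, by its magnitude.
// The thresholds (0.5, 2 and 4 pixels per frame) are illustrative assumptions.
// Squared values are compared so that no square root is needed.
MotionClass classify_motion(double u, double v) {
    const double mag2 = u * u + v * v;
    if (mag2 < 0.25) return MotionClass::None;   // magnitude below 0.5
    if (mag2 < 4.0)  return MotionClass::Slow;   // magnitude below 2
    if (mag2 < 16.0) return MotionClass::Medium; // magnitude below 4
    return MotionClass::Fast;
}
```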

Generation of the IP package

The design, testbench and other HLS project files can be found in the hls folder. The generation of the IP is automatically handled by a TCL script, which is responsible for: (1) creating the Vivado HLS project; (2) driving the functional verification, synthesis and co-simulation validation stages; and (3) packaging the RTL into an IP ready to be included in our design.

You only have to set up the Vivado HLS environment by sourcing the settings script from a terminal window and then type the following command:

vivado_hls -f run_hls_script.tcl

Step 2. Create a new Vivado project

This step is standard; the only noteworthy action is adding the previously generated IP to the IP catalog of the project. These are the parameters to take into account when going through the Create New Project wizard:

  • Project type is RTL (do not specify sources)
  • Select the ZYNQ-7 ZC702 Evaluation Board as the target platform.

Once the wizard finishes, go to the menu ‘Tools → Project Settings…’ and select the IP section. Then click on the Repository Manager tab. Finally, click on the Add Repository button and select the path where the IP package was generated (LK_tutorial/hls/LK_hls/solution1/impl/ip/).

Step 3. Add the design components

In this step, the designer creates the block design for the application. The IPs needed were identified in the overview of the system; the required configuration of each component is detailed below.

It is important not to trigger the Run Block Automation wizard even if you are prompted with a message telling you to do so. Only run the wizard when you are explicitly asked to.

The reason is twofold: (1) some of the configurations inferred by default by the wizard might not be compatible with our design or might cause it to malfunction; and (2) doing it by hand (at least the first few times) will give you a good deal of knowledge about the design (especially everything related to timing) and about how the different IPs are interconnected.

However, if you prefer the quick and easy route, you can execute the design.tcl script located in the hls/ folder and go straight to Step 4.

Regardless of the mode you choose (automatic or manual), start off by creating a new block design by clicking on the ‘Create Block Design’ toolbar button (left panel, IP Integrator section).

Add the following building blocks to the block design:

ZynQ Processing System

The first thing to do is to add the ZynQ Processing System component to the design. Then, click on Run Block Automation.

The configuration of this component enables the necessary peripherals and ports, and sets up the generation of clock signals. Double-click on the IP and the customization window will appear. Navigate through the different sections (left panel) and set the values of the following parameters:

  • MIO Configuration. Enable the following I/O Peripherals:
    • I2C0 (MIO 50…51): it is used to configure the ADV7511 chip that controls the HDMI output interface.
    • I2C1 (EMIO): it is used to configure the ADV7611 chip that controls the HDMI input interface on the FMC board.
    • GPIO (as EMIO GPIO with 1 bit width): this port is used to send the reset signal to the ADV7611 chip.
  • Clock Configuration
    • PL fabric clocks: enable FCLK_CLK0 (IO PLL) at 150 MHz.
  • PS-PL Configuration
    • HP Slave AXI Interface: enable the S AXI HP0 interface.
  • Interrupts
    • Enable Fabric Interrupts and check IRQ_F2P[15:0]

Confirm the changes (click on OK button) and close the window.

NOTE: You might want to disable any other peripheral/PS features that you are not going to need in your design.

Processing IP

You only have to add an instance of the HLS core implementing the LK algorithm. If you followed the instructions in Step 1, just click the Add IP button and look for Hls_lk2.

Click on Run Connection Automation.

Now, it is necessary to make the GPIO_O[0:0] and IIC_1 ports, as well as the FCLK_CLK0 clock signal, external. This means that they will be assigned FPGA I/O pins (see Steps 4 and 5) in order to connect them to the physical interface of the ADV chips: IIC_1 and GPIO_O[0:0] go to the ADV7611 on the FMC-HDMI board, while FCLK_CLK0 drives the ADV7511, which generates the output video signal. Select the above-mentioned ports/signals, right-click on them and select the Make External option. This automatically creates the necessary connections and ports.

For the FCLK_CLK0 signal of the ZynQ PS, once it has been made external, change the name and frequency properties of the new port to hdmi_clk and 150000000, respectively (External Port Properties window, Figure 3).

Figure 3. Configuration of the HDMI output clock

In our design there are four interrupt sources that have to be connected to the IRQ_F2P port of the ZynQ PS. The LK core is one of them, while the other three come from the VDMA cores. Add a Concat IP to the design and configure it to have four ports, as shown in Figure 4.

Figure 4. Configuration of the Concat block

To connect the interrupt output signal of the LK core to the first input port of the xl_concat core, you can either draw a wire or select one of the two ports, right-click and select the Make Connection option from the contextual menu. A window with a tree list displaying the possible sources/targets of the connection appears, and you only have to select the right option (Figure 5).

Figure 5. Connect LK IP interrupt source

Do the same to connect the dout port of the xl_concat component to the IRQ_F2P port of the ZynQ PS.
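When writing the firmware, it helps to know which interrupt ID each Concat input ends up with. The Concat block places In0 in the least significant bit of dout, so In0 drives IRQ_F2P[0], In1 drives IRQ_F2P[1], and so on. On Zynq-7000 devices, IRQ_F2P[7:0] map to GIC shared peripheral interrupt IDs 61 to 68, and IRQ_F2P[15:8] map to IDs 84 to 91. The helper below (a hypothetical name, as is the ConcatInput enumeration) encodes that mapping; in practice, the xparameters.h file generated by the SDK should remain the reference.

```cpp
#include <cassert>

// Map a bit of the IRQ_F2P[15:0] port to its GIC interrupt ID on Zynq-7000.
// Returns -1 for bits outside the 16-bit port.
int irq_f2p_to_gic_id(int bit) {
    if (bit >= 0 && bit <= 7)  return 61 + bit;        // IRQ_F2P[7:0]  -> IDs 61..68
    if (bit >= 8 && bit <= 15) return 84 + (bit - 8);  // IRQ_F2P[15:8] -> IDs 84..91
    return -1;
}

// Interrupt sources wired to the Concat inputs in this design (In0..In3).
enum ConcatInput { kLkIp = 0, kVdma0Mm2s = 1, kVdma0S2mm = 2, kVdma1Mm2s = 3 };
```

With this wiring, the LK IP interrupt would be ID 61 and the VDMA_1 read-channel interrupt ID 64.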

AXI4-Stream to Video-Out

This IP performs the conversion from a stream of pixels to the application’s output video format. Add one instance and set the configuration as follows:

  • Pixels Per Clock: 1.
  • Video Format (Manual): YUV 4:2:2
  • AXI4S Video Input Component Width: Auto.
  • Native Video Output Component Width: 8.
  • FIFO Depth: 4096 (for now, leave enough buffering capacity. This value could be adjusted later)
  • Clock Mode: Independent.
  • Timing Mode: Master.
  • Hysteresis Level: 12.

Figure 6 shows what the configuration of the core should look like. For further information about the meaning of the configuration parameters, refer to Xilinx Product Guide PG044.

Figure 6. Configuration of the AXI4-S2VO block

The input stream of pixels is provided by the hls_LK2 IP, so you need to connect its YUV_img port to the video_in port of this core.

The AXI4-Stream to Video-Out IP (AXI4-S2VO, from now on) provides a video interface that is converted into a video signal by the ADV7511 chip and, finally, displayed on a monitor connected to the HDMI connector of the ZC702 board.

Since the ADV7511 chip is external to the FPGA fabric, it is necessary to route some of the AXI4-S2VO ports to the FPGA I/O pins assigned to the communication with the ADV circuit.

Step 5 shows how the actual connection is done but, first, it is necessary to create a set of external ports from the AXI4-S2VO video interface.

To create an external port, the designer only has to right-click one signal of the design block and select the option Make External (or Ctrl+T). The tool automatically generates the external port. This process must be done for the following output port signals of the AXI4-S2VO component: vid_active_video, vid_data, vid_hsync and vid_vsync. Optionally, you can change the name of the external port created.

Secondly, we have to set up the clock and reset signals. The FCLK_CLK0 signal from the ZynQ processing system must be connected to the aclk and vid_io_out_clk input ports of the AXI4-S2VO component, and its aresetn port to the peripheral_aresetn signal of the Processor System Reset component.

The easy way to do this is to select the above-mentioned ports and right-click. A contextual menu appears, where Make Connection is the option we are looking for. Instead of fighting against the tool and drawing impossible wires through the mesh of components, just choose the desired component and signal from the list.

Next, it is necessary to set a constant value on some of the enable ports of the AXI4-S2VO component: the clock is always enabled, and so is the video output. Therefore, the aclken and vid_io_out_ce ports are fed with a logic one. We proceed by adding a Constant component and making sure its val property is 1. It is up to you to change the name of the component (we used VCC in our design). Connect the output of the VCC component to the aforementioned inputs of the AXI4-S2VO core.

Finally, the vid_io_out_reset port of the AXI4-S2VO core has to be assigned a constant value of 0, since we don’t want the core to be reset (Figure 7). Add another Constant IP (we renamed it GND), but this time set the val property to 0, and connect its dout port.

Figure 7. Some input ports are fed with constant values

Video Timing Controller

This IP generates the synchronization signals for the AXI4-Stream to Video-Out core just added. The VTC core can both detect and generate sync signals, but only the generation side is used in our design, to govern the generation of the video signal.

Add the IP to the block design and configure it as follows (see Figure 8):

  • Detection/Generation:
    • Optional features: leave all of them unchecked.
    • Max clocks per line: 4096.
    • Max lines per frame: 4096.
    • Frame syncs: 1.
  • Check Enable Generation option and check the following sync signals:
    • Valid data (pixels): Active Video Generation.
    • Beginning of a new frame: Vertical Sync Generation and Vertical Blank Generation.
    • Beginning of a new line: Horizontal Sync Generation and Horizontal Blank Generation.
  • Uncheck Enable Detection.
  • Default/Constant:
    • Select Video Mode 1080p.
    • Leave the rest of the options with the default values.
  • Frame Sync Position:
    • Leave the default value.
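Selecting the 1080p video mode fills in the timing parameters of the generator. As a reference, the sketch below lists the standard 1080p60 (CEA-861) timing and derives where the sync pulses start and end within each line and frame. The TimingAxis type and its member names are hypothetical; the actual values are set by the VTC customization GUI, not by this code.

```cpp
#include <cassert>

// One dimension (horizontal or vertical) of a video timing.
struct TimingAxis {
    int active;       // visible pixels (or lines)
    int front_porch;  // blanking before the sync pulse
    int sync;         // sync pulse width
    int back_porch;   // blanking after the sync pulse

    int total() const { return active + front_porch + sync + back_porch; }
    int sync_start() const { return active + front_porch; }  // first cycle with sync asserted
    int sync_end() const { return sync_start() + sync; }      // first cycle after the pulse
};

// Standard 1080p60 timing (CEA-861); both sync pulses are active high in this mode.
constexpr TimingAxis kH1080p{1920, 88, 44, 148};
constexpr TimingAxis kV1080p{1080, 4, 5, 36};
```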

For further information about the role and meaning of the above mentioned options, we refer the reader to the reference document Xilinx Product Guide PG016.

Figure 8. Configuration of the Video Timing Controller block.

Then, the timing interface of the VTC core has to be connected to the AXI4-S2VO core. Remember that the VTC IP synchronizes the operation of the AXI4-S2VO core with the incoming stream of data from the LK IP. Connect the vtiming_out port (VTC) to the vtiming_in and vtg_ce ports (AXI4-S2VO). The VTC input clock is, once more, the FCLK_CLK0 signal, which rules the video timing interface between the VTC and AXI4-S2VO IPs.

Finally, connect the gen_clken, clken and resetn VTC ports to the output of the VCC component created in the previous step.

Video-In to AXI4-Stream

This IP performs the conversion from the source input video format to a stream of pixels. Add one instance and set the configuration as follows:

  • Pixels Per Clock: 1
  • Video Format: YUV 4:2:2
  • Native Video Input Component Width: 8.
  • AXI4S Video Output Component Width: 8.
  • FIFO Depth: 4096 (For now, leave enough buffering capacity. This value could be adjusted later)
  • Clock Mode: Independent.

Figure 9 shows what the configuration of the core should look like. For further information about the meaning of the configuration parameters, refer to Xilinx Product Guide PG043.

Figure 9. Configuration of the AXI4-VI2S block

The Digilent FMC-HDMI expansion board includes an ADV7611 chip, which generates a video stream after processing the video signal from the source (e.g. a camera). The format of the stream can be configured by setting different parameters through the I2C interface.

In order to feed the FPGA with such video stream, it is necessary to connect the pins of the FMC board to ports in our design. Concerning the input video stream, five external input ports must be created and assigned later to their counterparts in the AXI4-VI2S component.

Step 4 guides you through the final process of assigning physical FPGA I/O pins to these ports. Those pins are connected directly to the corresponding I/O pins of the FMC interface, closing the circle.

Figure 10. Creating an external port (example)

Creating a new port is easy. Right-click on any empty area of the design canvas and select the Create Port option (you can also use the shortcut Ctrl+K). A window like the one shown in Figure 10 pops up. Table 1 lists the parameters of the five required ports and their connections:

External port | Direction | Type | Properties | Connect to AXI4-VI2S port
active_vid_in | Input | Other | - | vid_active_video
hsync_vid_in | Input | Other | - | vid_hsync
vsync_vid_in | Input | Other | - | vid_vsync
data_vid_in | Input | Data | Vector [15:0] | vid_data
LLC | Input | Clock | 148.5 MHz | vid_io_in_clk

Table 1. Configuration of the external input ports needed in our design.

There is also another way to create external input ports out of the actual input ports of the AXI4-VI2S component. You only have to select the port, right click and select the Make External option from the contextual menu. Vivado will handle the details on your behalf. Just change the names (if you’d like to) and, very important, the frequency property for the vid_io_in_clk associated external port (148500000 Hz).

As the last step, connect the clock (FCLK_CLK0 → aclk), the VCC output to the vid_io_in_ce, aclken and axis_enable inputs, the GND output to the vid_io_in_reset input, and the aresetn input to the peripheral_aresetn signal of the Processor System Reset component.

VDMA_0

When it comes to video processing, there are two main strategies: using framebuffers or not. A framebuffer is a portion of memory containing a bitmap that drives a video display (definition taken from Wikipedia). In other words, the use of framebuffers implies the presence of circuitry that converts the content of a memory region into a video signal (and the other way around when dealing with the acquisition of video data).

In our design, the combination of AXI4-VI2S/AXI4-S2VO and VDMA is the infrastructure created for this purpose. VDMA stands for Video Direct Memory Access, an IP that provides high-bandwidth direct memory access between DDR memory and AXI4-Stream video type target peripherals, including peripherals which support the AXI4-Stream Video protocol. For further information about the AXI VDMA IP, please check Xilinx Product Guide PG020.

Framebuffers also require enough memory to store the incoming frames, as well as considerable bandwidth between memory and the processing chain. Whenever the processing chain is not able to keep pace with the video input, information is lost.
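A back-of-the-envelope calculation shows the order of magnitude involved. The sketch below assumes 1920×1080 frames at 60 fps with 2 bytes per pixel (YUV 4:2:2 with 16-bit pixels), a single four-frame buffer shared by both VDMAs, and three concurrent DDR streams (the VDMA_0 write channel plus the read channels of VDMA_0 and VDMA_1). It ignores blanking, AXI protocol overhead and the memory accesses of the ARM cores. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store one frame.
std::uint64_t frame_bytes(std::uint64_t width, std::uint64_t height,
                          std::uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Sustained traffic, in bytes per second, of `streams` concurrent video streams.
std::uint64_t ddr_bandwidth(std::uint64_t bytes_per_frame, std::uint64_t fps,
                            std::uint64_t streams) {
    return bytes_per_frame * fps * streams;
}
```

For 1080p, one frame takes 1920 × 1080 × 2 = 4,147,200 bytes, so four frames need about 16.6 MB, and the three streams add up to roughly 746 MB/s of DDR traffic.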

In this system, VDMA_0 writes the incoming video frames (from Video-In to AXI4-Stream) to DDR memory and reads (synchronized with VDMA_1) the same video frames from memory, presenting them (as a stream of pixels) at one of the inputs of the LK IP.

Figure 11 shows the configuration parameters for the first of the two VDMA IPs in our design.

Figure 11. Configuration of the VDMA_0 block

Now, click on Run Connection Automation twice. There are still some unconnected signals/ports that we need to take care of. Just follow the instructions provided in Table 2:

VDMA_0 signal/port | Source/target IP | Source/target signal/port | Description
S_AXIS_S2MM | AXI4-VI2S | video_out | Input pixel stream
M_AXIS_MM2S | HLS_LK2 | inp1_img | Input frame 1
m_axis_mm2s_aclk, s_axis_s2mm_aclk | ZynQ PS | FCLK_CLK0 | Clock
mm2s_introut | Concat | in1[0:0] | Interrupt signal: frame sent to IP
s2mm_introut | Concat | in2[0:0] | Interrupt signal

Table 2. I/O connections of the VDMA_0 block

VDMA_1

For the second instance of the VDMA component, only the read channel has to be enabled.  The IP will read the video frames from DDR memory and present them as the second input pixel stream of the LK IP.

Figure 12 shows the configuration parameters for the second of the two VDMA IPs in our design.

Figure 12. Configuration of the VDMA_1 block

Now, click on Run Connection Automation. There are still some unconnected signals/ports that we need to take care of. Just follow the instructions provided in Table 3:

VDMA_1 signal/port | Source/target IP | Source/target signal/port | Description
M_AXIS_MM2S | HLS_LK2 | inp2_img | Input frame 2
m_axis_mm2s_aclk | ZynQ PS | FCLK_CLK0 | Clock
mm2s_introut | Concat | in3[0:0] | Interrupt signal: frame sent to IP
mm2s_frame_ptr_in | VDMA_0 | s2mm_frame_ptr_out | VDMA synchronization

Table 3. I/O connections of the VDMA_1 block

There are some interesting details about how both VDMA components team up to provide the LK IP with the right sequence of frames. The LK IP processes two frames at a time, which are provided by VDMA_0 and VDMA_1. Both VDMAs work with a set of four frames (the frame buffer) and, since VDMA_0 is the one that writes video frames to DDR memory, there must be some synchronization mechanism so that VDMA_1 is notified when to read a new frame from DDR. On top of that, VDMA_1 uses this mechanism to know which frame to read, so that it does not step on frames that have not been processed yet.

The Xilinx VDMA core implements two different ways to achieve this goal (see the Genlock Synchronization section in PG020). In this design, a dynamic approach is used: one of the VDMA cores (VDMA_0) plays the role of master, while the other (VDMA_1) is the slave. Through its s2mm_frame_ptr_out port, the master signals the slave that a new frame has just been written to memory (this is a 5-bit signal that encodes the frame number within the framebuffer). The slave receives this information at its mm2s_frame_ptr_in port, so that it knows which frame to access (the one preceding the last frame signaled by the master). In brief, you have to connect these two ports as stated in the last row of Table 3 (see Figure 13 below).

Figure 13. Synchronization of Video DMA components
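The frame selection rule described above can be modeled in a few lines. This is a simplified software model of the behavior, not the VDMA logic: given the index of the frame the master has just reported, the slave reads the frame written immediately before it, wrapping around the four-frame buffer. The function and constant names are hypothetical.

```cpp
#include <cassert>

constexpr int kNumFrameBuffers = 4;  // frames in the circular buffer used by both VDMAs

// Index of the frame the slave (VDMA_1) reads, given the index of the frame
// the master (VDMA_0) has just reported through s2mm_frame_ptr_out.
int slave_read_index(int master_frame_ptr) {
    // Adding kNumFrameBuffers keeps the value non-negative before the modulo.
    return (master_frame_ptr - 1 + kNumFrameBuffers) % kNumFrameBuffers;
}
```

In this model, staying one frame behind the master means that the slave always reads a frame that has already been completely written, and that the wrap-around from buffer 0 to buffer 3 is handled naturally.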

Step 4. Configuring the FMC-HDMI input interface

As mentioned in the overview of the system, the video source is connected to the FMC-HDMI board, which handles the HDMI interface and adapts it to a simpler one, ready to be easily used in our design. This interface is defined by the physical pinout of the ADV7611 chip, which is the actual hardware that handles the HDMI input port on our video acquisition board. Thus, the signals to be considered when connecting the FMC board are the following:

  • Pixel clock, or Line-Locked Clock (LLC).
  • Video synchronization signals (vid_vsync and vid_hsync), which are used to determine the start and end of a frame or a line, respectively.
  • Video active signal (vid_active_video), which indicates whether the pixel data is valid or not.
  • Data (vid_data): a 16-bit signal that encodes one pixel. Its layout depends on the configuration of the ADV chip and the selected format [2]. In this project, the YUV 4:2:2 format (16-Bit SDR ITU-R BT.656 4:2:2 Mode 0) was used.

In the reference design, the FMC board was plugged into the FMC2 connector (see Figure 14). To perform the right pin assignment, the designer must look at the documentation of the Digilent FMC-HDMI board and identify the FMC pins that carry the signals of interest (in our case, the ones listed above) out of the board. Then, the designer must read the documentation of the ZC702 board [3] to determine the correspondence between each FMC pin and the FPGA pin.

The actual assignment of external I/O signals to specific pins of the ZynQ SoC is done through the definition of constraints. The designer can create a new constraints file in Vivado by right-clicking on the Hierarchy tab of the Sources window and selecting Add Sources… → Add or create constraints from the contextual menu. The file constraintsHDMI_IN.xdc in /LK_tutorial/hls/constraints/ contains the constraints for this step, ready to be imported into a Vivado project.

 

Figure 14. The FMC-HDMI board is connected to the FMC2 connector.

Pinout configuration for the ADV7611

I2C port

The configuration of the ADV chip is done through an I2C port, which needs two pins to implement the serial communication. First, it is necessary to enable the second I2C controller (IIC_1) of the ZynQ SoC (see Step 3). The I2C controller uses two signals (iic_1_scl_io and iic_1_sda_io) to implement the I2C master interface. The following lines of the constraints file show how these EMIO signals are driven outside the FPGA fabric through pins V4 and AA12. These pins are connected to the FMC2 pins LA21_N and LA25_P, the counterparts of the clock and data signals of the I2C port on the FMC board.

# I2C Chain on FMC
set_property PACKAGE_PIN V4 [get_ports iic_1_scl_io]
set_property IOSTANDARD LVCMOS25 [get_ports iic_1_scl_io]
set_property SLEW SLOW [get_ports iic_1_scl_io]
set_property DRIVE 8 [get_ports iic_1_scl_io]
set_property PACKAGE_PIN AA12 [get_ports iic_1_sda_io]
set_property IOSTANDARD LVCMOS25 [get_ports iic_1_sda_io]
set_property SLEW SLOW [get_ports iic_1_sda_io]
set_property DRIVE 8 [get_ports iic_1_sda_io]

Data and synchronization signals

The following constraint definitions connect our design ports to the FPGA I/O pins that are physically wired to the FMC interface, where the Digilent FMC-HDMI expansion board is plugged in.

# HDMI Input (ADV7611) on FMC. ZC702 board.
# Sync signals
set_property PACKAGE_PIN Y19 [get_ports LLC]
set_property IOSTANDARD LVCMOS25 [get_ports LLC]
set_property PACKAGE_PIN U10 [get_ports vsync_vid_in]
set_property IOSTANDARD LVCMOS25 [get_ports vsync_vid_in]
set_property PACKAGE_PIN T6 [get_ports hsync_vid_in]
set_property IOSTANDARD LVCMOS25 [get_ports hsync_vid_in]
set_property PACKAGE_PIN U9 [get_ports active_vid_in]
set_property IOSTANDARD LVCMOS25 [get_ports active_vid_in]

# Data signals
set_property PACKAGE_PIN AA14 [get_ports {data_vid_in[0]}]
set_property PACKAGE_PIN T22 [get_ports {data_vid_in[1]}]
set_property PACKAGE_PIN Y14 [get_ports {data_vid_in[2]}]
set_property PACKAGE_PIN Y15 [get_ports {data_vid_in[3]}]
set_property PACKAGE_PIN W15 [get_ports {data_vid_in[4]}]
set_property PACKAGE_PIN U21 [get_ports {data_vid_in[5]}]
set_property PACKAGE_PIN AB17 [get_ports {data_vid_in[6]}]
set_property PACKAGE_PIN T21 [get_ports {data_vid_in[7]}]
set_property PACKAGE_PIN R6 [get_ports {data_vid_in[8]}]
set_property PACKAGE_PIN U4 [get_ports {data_vid_in[9]}]
set_property PACKAGE_PIN T4 [get_ports {data_vid_in[10]}]
set_property PACKAGE_PIN AA13 [get_ports {data_vid_in[11]}]
set_property PACKAGE_PIN U22 [get_ports {data_vid_in[12]}]
set_property PACKAGE_PIN Y13 [get_ports {data_vid_in[13]}]
set_property PACKAGE_PIN AB15 [get_ports {data_vid_in[14]}]
set_property PACKAGE_PIN AB14 [get_ports {data_vid_in[15]}]
set_property IOSTANDARD LVCMOS25 [get_ports data_vid_in*]
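Writing sixteen nearly identical PACKAGE_PIN lines by hand is error-prone. As an alternative, the lines for a bus can be generated from a pin list. The sketch below is a hypothetical helper (not part of the tutorial sources) that reproduces the data_vid_in constraints shown above; the pin list itself must still be taken from the FMC-HDMI and ZC702 documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Emit one PACKAGE_PIN constraint per bit of `port` (pins[i] is assigned to bit i),
// followed by a single IOSTANDARD constraint covering the whole bus.
std::string xdc_for_bus(const std::string& port, const std::vector<std::string>& pins,
                        const std::string& iostandard) {
    std::ostringstream xdc;
    for (std::size_t i = 0; i < pins.size(); ++i) {
        xdc << "set_property PACKAGE_PIN " << pins[i]
            << " [get_ports {" << port << "[" << i << "]}]\n";
    }
    xdc << "set_property IOSTANDARD " << iostandard
        << " [get_ports " << port << "*]\n";
    return xdc.str();
}

// Pins of data_vid_in[0..15], copied from the constraints above.
const std::vector<std::string> kDataVidInPins = {
    "AA14", "T22", "Y14", "Y15", "W15", "U21", "AB17", "T21",
    "R6",   "U4",  "T4",  "AA13", "U22", "Y13", "AB15", "AB14"};
```

Calling xdc_for_bus("data_vid_in", kDataVidInPins, "LVCMOS25") yields the same seventeen lines as the hand-written block.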

Remember: the LLC, vsync_vid_in, hsync_vid_in, active_vid_in and data_vid_in signals referenced in the constraints file are the external ports created in Step 3. Now, these signals are connected to the clock, synchronization and data pins coming from the FMC-HDMI board.

These signals are needed by the Video-In to AXI4-Stream component to create a data stream compatible with the AXI4-Stream interface, which is present in most of the video and image processing IPs of the Vivado IP catalog, including our LK IP.
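The essence of this conversion is that AXI4-Stream video does not carry sync signals: the first pixel of a frame is flagged with TUSER (start of frame) and the last pixel of each line with TLAST (end of line), as described in Xilinx UG934. The sketch below is a heavily simplified behavioral model with hypothetical type and function names; it ignores the clock-domain crossing, FIFOs and back-pressure that the real IP handles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One clock cycle on the native (parallel) video interface.
struct NativeSample {
    bool vsync;          // vertical sync (asserted during vertical blanking)
    bool active;         // active video: data holds a valid pixel
    std::uint16_t data;  // 16-bit YUV 4:2:2 pixel
};

// One transfer (beat) on the AXI4-Stream video interface.
struct AxisBeat {
    std::uint16_t tdata;
    bool tuser;  // start of frame: first active pixel after a vertical sync
    bool tlast;  // end of line: last active pixel of the line
};

// Simplified model: drop blanking cycles and flag SOF/EOL on the remaining pixels.
std::vector<AxisBeat> native_to_axis(const std::vector<NativeSample>& in) {
    std::vector<AxisBeat> out;
    bool frame_start_pending = false;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i].vsync) frame_start_pending = true;  // next active pixel opens a frame
        if (!in[i].active) continue;                 // blanking is not streamed
        const bool next_active = (i + 1 < in.size()) && in[i + 1].active;
        out.push_back({in[i].data, frame_start_pending, !next_active});
        frame_start_pending = false;
    }
    return out;
}
```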

Reset

The initialization of the ADV7611 chip implies generating a reset signal to the FMC board via one of the GPIO ports of the ZynQ SoC. The reset operation is performed by the firmware (see Section 7).

These are the lines that need to be present in the constraints file to drive pin 0 of the GPIO_O port to the reset pin of the FMC-HDMI board (V12 → LA32_P in FMC2).

set_property PACKAGE_PIN V12 [get_ports {GPIO_O[0]}]
set_property IOSTANDARD LVCMOS25 [get_ports {GPIO_O[0]}]

Step 5. Configuring the on-board output HDMI interface

The ZC702 board includes an ADV7511 chip, which is the component responsible for generating the appropriate video signal to be driven out of the board through the standard HDMI interface. Figure 15 shows where the ADV chip and the connector are placed. The ADV chip is, thus, external to the ZynQ SoC and is connected to it through a set of pins.

Figure 15. HDMI connector and ADV7511 chip on the ZC702 board

Two main activities must be carried out in order to make the HDMI output available to the rest of the system:

  • Configure the pinout of the ZynQ SoC to drive the ADV7511 signals in and out of the FPGA.
  • Program the firmware to configure the ADV7511 chip with the parameters of our video application.

Pinout configuration for the ADV7511

The assignment of external I/O signals to specific pins of the ZynQ SoC is done through the definition of constraints. As in Step 4, a new constraints file can be created in Vivado by right-clicking on the Hierarchy tab of the Sources window and selecting Add Sources… → Add or create constraints from the contextual menu. The file constraintsHDMI_OUT.xdc in LK_tutorial/hls/constraints/ contains the constraints for this step, ready to be imported into a Vivado project.

As a prerequisite, the signals to be driven out of the FPGA were made external (Step 3) so that, during the synthesis process, Vivado knows how to route them and connect them to the I/O pins specified in the constraints file.

These can be summarized as follows:

ADV7511 I2C port

The configuration of the ADV chip is done through an I2C port, which needs two pins to implement the serial communication. First, it is necessary to enable a free I2C controller in the ZynQ SoC. In our design, IIC_0 is assigned to this task and uses pins 50 and 51 of the MIO interface. Since MIO pins are dedicated pins of the processing system, this routing is set in the ZynQ PS configuration (Step 3) rather than through the constraints file.

Output clock signal port

This port is necessary to synchronize the output video stream with the operation of the ADV chip. Basically, the hdmi_clk signal tells the ADV chip when a valid pixel is available. The frequency of this signal depends on the characteristics of the generated video stream, as can be seen in Table 4.

Resolution | Pixel clock (MHz) | Frame size | Frame rate (fps)
1080p | 148.5 | 1920×1080 | 60
SXGA | 110 | 1280×1024 | 60
720p | 74.25 | 1280×720 | 60
XGA | 65 | 1024×768 | 60
SVGA | 40 | 800×600 | 60
576p | 27 | 720×576 | 50
480p | 27 | 720×480 | 60
VGA | 25.175 | 640×480 | 60

Table 4. Pixel clock configuration for different video resolutions [4]

In this tutorial, the system handles 1080p video, which requires a 148.5 MHz clock. However, generating that exact frequency with the clocking resources of the ZynQ SoC would be truly extraordinary, so the closest frequency generated in our design is 150 MHz, which is quite convenient. It is not necessary to be 100% precise, since the difference between the required and the provided frequency is compensated by buffers implemented as FIFOs.
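The effect of using 150 MHz instead of 148.5 MHz can be quantified with a short calculation. A timing generator produces one frame every total_h × total_v clock cycles, where the totals include blanking (2200 × 1125 for 1080p). The sketch below, with hypothetical function names, shows that 148.5 MHz yields exactly 60 fps, whereas 150 MHz yields about 60.6 fps, a mismatch of roughly 1%.

```cpp
#include <cassert>
#include <cmath>

// Frame rate produced by a timing generator that spends total_h x total_v
// clock cycles per frame (active video plus blanking).
double frame_rate(double pixel_clock_hz, int total_h, int total_v) {
    return pixel_clock_hz / (static_cast<double>(total_h) * total_v);
}

// Relative difference between the clock actually used and the nominal one.
double clock_mismatch(double actual_hz, double nominal_hz) {
    return (actual_hz - nominal_hz) / nominal_hz;
}

// 1080p totals (CEA-861): 2200 clocks per line, 1125 lines per frame.
constexpr int kTotalH1080p = 2200;
constexpr int kTotalV1080p = 1125;
```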

The following lines of the constraints file drive the hdmi_clk signal out of the FPGA fabric to the clock port of the ADV7511 chip.

set_property PACKAGE_PIN L16 [get_ports hdmi_clk]
set_property IOSTANDARD LVCMOS25 [get_ports hdmi_clk]

Video data output ports and synchronization

Finally, the ADV chip must be fed with the actual data (pixels) and some synchronization information: horizontal (when a line of the frame ends) and vertical (when the frame ends). These signals (vid_data, vid_hsync and vid_vsync, together with vid_active_video, which flags the visible pixels) come from the output of the AXI4-Stream to Video Out component of the design.
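To see how the synchronization signals relate to the pixel and line position, here is a simplified software model. It assumes standard CEA-861 1080p60 timing (active, front porch, sync and back porch values are standard figures, not taken from the tutorial) and active-high pulses, and it asserts vsync for whole lines. In the actual design these signals are produced by the AXI4-Stream to Video Out component. The names sync_state_t and sync_at are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* CEA-861 1080p60 timing: horizontal values in pixel-clock cycles,
 * vertical values in lines (active, front porch, sync pulse, back porch). */
enum {
    H_ACTIVE = 1920, H_FP = 88, H_SYNC = 44, H_BP = 148,
    H_TOTAL  = H_ACTIVE + H_FP + H_SYNC + H_BP,   /* 2200 */
    V_ACTIVE = 1080, V_FP = 4,  V_SYNC = 5,  V_BP = 36,
    V_TOTAL  = V_ACTIVE + V_FP + V_SYNC + V_BP    /* 1125 */
};

typedef struct {
    bool active; /* pixel belongs to the visible area (vid_active_video) */
    bool hsync;  /* inside the horizontal sync pulse (vid_hsync) */
    bool vsync;  /* inside the vertical sync pulse (vid_vsync) */
} sync_state_t;

/* Control signals for column h of line v, both counted from the first
 * active pixel of the frame. */
sync_state_t sync_at(int h, int v)
{
    assert(h >= 0 && h < H_TOTAL && v >= 0 && v < V_TOTAL);
    sync_state_t s;
    s.active = (h < H_ACTIVE) && (v < V_ACTIVE);
    s.hsync  = (h >= H_ACTIVE + H_FP) && (h < H_ACTIVE + H_FP + H_SYNC);
    s.vsync  = (v >= V_ACTIVE + V_FP) && (v < V_ACTIVE + V_FP + V_SYNC);
    return s;
}
```

Each line ends with a blanking interval that contains the hsync pulse, and each frame ends with blank lines that contain the vsync pulse. This is why the totals (2200 × 1125) exceed the visible frame size (1920 × 1080).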

It is worth mentioning that the pins assigned to the data port depend on the pixel format. In this example, the YUV 4:2:2 format is used, which needs 16 pins in total: 8 (bits 8 to 15) to encode the luma component and 8 (bits 0 to 7) to encode the blue and red chroma components.
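In 4:2:2 sampling, two horizontally adjacent pixels share one pair of chroma samples. Each pixel therefore carries its own luma byte plus one chroma byte, which gives 16 bits per pixel. The sketch below packs a pixel pair into two 16-bit vid_data words using the bit layout described above. It assumes that the even pixel carries Cb and the odd pixel carries Cr. That ordering is an assumption, not something stated in this tutorial. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 16-bit vid_data word: luma (Y) in bits 15..8, chroma in bits 7..0. */
static uint16_t pack_422(uint8_t y, uint8_t c)
{
    return (uint16_t)(((uint16_t)y << 8) | c);
}

/* Packs two adjacent pixels that have their own luma (y0, y1) but share one
 * Cb and one Cr sample. Assumed order: even pixel -> Cb, odd pixel -> Cr. */
void pack_pixel_pair(uint8_t y0, uint8_t y1, uint8_t cb, uint8_t cr,
                     uint16_t out[2])
{
    assert(out != NULL);
    out[0] = pack_422(y0, cb);
    out[1] = pack_422(y1, cr);
}
```

Compared with 24-bit RGB, this layout saves 8 pins and a third of the bandwidth. The cost is halving the horizontal chroma resolution.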

Below are the lines from the .xdc file that connect the data and sync ports in our design.

# HDMI Output (ADV7511) for ZC702 board
#Synchronization
set_property PACKAGE_PIN H15 [get_ports vid_vsync]
set_property PACKAGE_PIN R18 [get_ports vid_hsync]
set_property PACKAGE_PIN T18 [get_ports vid_active_video]
set_property IOSTANDARD LVCMOS25 [get_ports vid_*]

# Data
set_property PACKAGE_PIN AB21 [get_ports {vid_data[0]}]
set_property PACKAGE_PIN AA21 [get_ports {vid_data[1]}]
set_property PACKAGE_PIN AB22 [get_ports {vid_data[2]}]
set_property PACKAGE_PIN AA22 [get_ports {vid_data[3]}]
set_property PACKAGE_PIN V19 [get_ports {vid_data[4]}]
set_property PACKAGE_PIN V18 [get_ports {vid_data[5]}]
set_property PACKAGE_PIN V20 [get_ports {vid_data[6]}]
set_property PACKAGE_PIN U20 [get_ports {vid_data[7]}]
set_property PACKAGE_PIN W21 [get_ports {vid_data[8]}]
set_property PACKAGE_PIN W20 [get_ports {vid_data[9]}]
set_property PACKAGE_PIN W18 [get_ports {vid_data[10]}]
set_property PACKAGE_PIN T19 [get_ports {vid_data[11]}]
set_property PACKAGE_PIN U19 [get_ports {vid_data[12]}]
set_property PACKAGE_PIN R19 [get_ports {vid_data[13]}]
set_property PACKAGE_PIN T17 [get_ports {vid_data[14]}]
set_property PACKAGE_PIN T16 [get_ports {vid_data[15]}]
set_property IOB TRUE [get_ports vid_d*]

Step 6. Generate the bitstream and set up the SDK project

This step is, again, pretty standard if you are familiar with the Vivado design flow. Just follow the yellow brick road:

  1. Validate the design
  2. Generate the Output Products
  3. Create the HDL wrapper
  4. Run synthesis
  5. Run implementation
  6. Generate bitstream

Now, export the hardware (File → Export → Export Hardware, with the Include bitstream option checked) to a location of your choice and launch SDK (File → Launch SDK). The SDK splash window should appear on your screen within a few seconds.

Finally, in the SDK environment, create a new application project (File → New → Application Project) with the following parameters:

  • Name of the project: as you wish.
  • Select standalone as the OS Platform.
  • Hardware platform: the design wrapper generated by Vivado, referenced as design_1_wrapper_hw_platform_0.
  • Language: C.
  • Create a new Board Support Package (BSP).
  • When you click Next, select an application template, for instance Hello World. Do not select Empty Application: the template automatically generates, from the BSP information, some auxiliary files that the software needs.

Step 7.  Embedded software

Folder arm_sw contains the source files of the embedded application to be executed by one of the ARM cores of the ZC702 board. Table 5 summarizes their role:

File(s)                              Description
lk.c                                 Main program. Initializes and configures all the components in the system.
xaxivdma_ext.c [5], xaxivdma_ext.h   Utility functions to configure the read and write channels of the VDMA components in the platform. They rely on the xaxivdma platform driver and provide a higher-level, easier-to-use API.
xiicps_ext.c, xiicps_ex.h            Multi-byte read/write functions implemented on top of the xiicps platform driver.
hdmi_io.c [6], hdmi_io.h             Utility functions, built on top of the xiicps platform driver, that configure both the Digilent FMC-HDMI board and the embedded HDMI output interface.

Table 5. Main source files in the embedded software project

The reader is invited to look into these files to get a better understanding of how the system works. All of them are easy to read and full of comments, which makes them good candidates for reuse in later projects.
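The configuration done by hdmi_io.c most likely boils down to sending a list of register writes to the ADV7611 and ADV7511 chips over I2C. The sketch below illustrates that table-driven pattern under stated assumptions. The names i2c_write_fn, reg_write_t, write_reg_table and the mock bus are hypothetical and do not come from the project sources. Each register write is modeled as a two-byte transaction: an 8-bit sub-address followed by the data byte. Any table contents are placeholders, not real ADV settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus callback: sends len bytes to the 7-bit address addr and
 * returns 0 on success. On the board it would wrap an xiicps-based helper. */
typedef int (*i2c_write_fn)(uint8_t addr, const uint8_t *buf, size_t len);

/* One register write: 8-bit sub-address plus 8-bit value. */
typedef struct {
    uint8_t reg;
    uint8_t val;
} reg_write_t;

/* Sends every entry of tbl as a two-byte transaction (sub-address first,
 * then data). Stops at the first failing write and returns its status;
 * returns 0 when all writes succeed. */
int write_reg_table(i2c_write_fn wr, uint8_t addr,
                    const reg_write_t *tbl, size_t n)
{
    assert(wr != NULL && (tbl != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        uint8_t buf[2] = { tbl[i].reg, tbl[i].val };
        int status = wr(addr, buf, sizeof buf);
        if (status != 0)
            return status;
    }
    return 0;
}

/* Mock bus for testing on a PC: records every byte it receives. */
static uint8_t mock_log[64];
static size_t mock_len;

int mock_write(uint8_t addr, const uint8_t *buf, size_t len)
{
    (void)addr;
    for (size_t i = 0; i < len && mock_len < sizeof mock_log; i++)
        mock_log[mock_len++] = buf[i];
    return 0;
}
```

Keeping the register values in a table separates the chip configuration data from the bus access code. The mock makes it possible to check the exact byte sequence without the board.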

Overall structure of the main program

The entry point of the embedded software is the main function in file lk.c. This function initializes and configures the SoC components, making use of the utility functions defined in the other source files. The main steps are:

  1. Initialization of the PS platform and VDMA configuration data structures.
  2. Initialization of the IIC controllers (iic_init): this operation implies setting up the IIC clock rates.
  3. Initialization of the input HDMI interface: this operation implies the configuration of the ADV7611 chip through one of the IIC controllers we put to work in the previous step (digilent_fmc_hdmii_init).
  4. Initialization of the output HDMI interface: this operation implies the configuration of the ADV7511 chip through one of the IIC controllers initialized in step 2 (zc702_hdmio_init).
  5. Reset the read/write channels of the VDMA components: this is done to set the system in an initial known state.
  6. Configure the read/write channels of the VDMA components (ReadSetup, WriteSetup): this step needs further explanation (see below).
  7. Start the operation of the VDMA components.
  8. Configure and start the Lucas Kanade IP: basically the size of the frames to be processed plus some standard parameters.
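The eight steps above form a strict sequence: each one relies on the previous ones. For example, the HDMI chips can only be configured once the IIC controllers are running. One minimal, table-driven way to express that ordering in C is sketched below. The names init_step_t, run_init_sequence and the stub_* functions are hypothetical. The real step functions in lk.c (iic_init, digilent_fmc_hdmii_init, zc702_hdmio_init, ReadSetup, WriteSetup) have signatures that are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Each step returns 0 on success, as XST_SUCCESS does in the Xilinx drivers. */
typedef int (*init_step_fn)(void);

typedef struct {
    const char  *name; /* label for error reporting */
    init_step_fn run;
} init_step_t;

/* Runs the steps in order and stops at the first failure. Returns the number
 * of steps completed successfully; if it is < n, steps[result] failed. */
size_t run_init_sequence(const init_step_t *steps, size_t n)
{
    assert(steps != NULL || n == 0);
    size_t i;
    for (i = 0; i < n; i++) {
        if (steps[i].run() != 0)
            break;
    }
    return i;
}

/* Hypothetical stand-ins for the real initialization functions. */
int stub_ok(void)   { return 0; }
int stub_fail(void) { return -1; }

const init_step_t lk_steps[] = {
    { "platform and VDMA config structures", stub_ok },
    { "iic_init",                            stub_ok },
    { "digilent_fmc_hdmii_init",             stub_ok },
    { "zc702_hdmio_init",                    stub_ok },
    { "reset VDMA channels",                 stub_ok },
    { "ReadSetup / WriteSetup",              stub_ok },
    { "start VDMAs",                         stub_ok },
    { "configure and start LK IP",           stub_ok },
};
```

Stopping at the first failure avoids, for instance, starting the VDMAs when the output chip could not be configured. The returned index also tells you which step to debug.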

Configuration of the VDMAs

As mentioned in the overview of the application, the implementation of the LK algorithm compares two consecutive frames (T and T+1). Each frame is fed to the hls_LK2 IP as a stream of pixels to be processed on the fly.

VDMA_0 writes the input frames into a region of DDR memory (that is, the frame buffer) sized to hold up to 4 frames. VDMA_0 (well, in fact, every VDMA component in the design) manages the frame buffer as a circular structure. This means that after writing a frame in the last position of the buffer, it wraps around to the first one. The following snippet of function WriteSetup (file xaxivdma_ext.c) shows the relevant lines of code.

/* Enable circular buffer property */
WriteCfg.EnableCircularBuf = EnableCircularBuf;
...
/* Initialize buffer physical addresses   */
for(Index = 0; Index < 4; Index++) {
   WriteCfg.FrameStoreStartAddr[Index] = BaseAddr + BlockOffset;
   BaseAddr += HoriStride * 2 * VertStride;
}

/* Set the buffer addresses for transfer in the DMA engine.
 * The buffer addresses are physical addresses.
 */
Status = XAxiVdma_DmaSetBufferAddr(InstancePtr, XAXIVDMA_WRITE,
                                   WriteCfg.FrameStoreStartAddr);
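The loop in WriteSetup places the four frame stores back to back, each hori_stride * 2 * vert_stride bytes long. The following sketch reproduces that address computation together with the wrap-around rule of the circular buffer. It assumes that HoriStride is given in pixels, with 2 bytes per pixel for YUV 4:2:2, and that BlockOffset is folded into the base address. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FRAMES 4u

/* Start addresses of NUM_FRAMES consecutive frame stores, mirroring the
 * WriteSetup loop: each frame takes hori_stride * 2 * vert_stride bytes. */
void frame_store_addrs(uintptr_t base, uint32_t hori_stride,
                       uint32_t vert_stride, uintptr_t addr[NUM_FRAMES])
{
    uintptr_t frame_bytes = (uintptr_t)hori_stride * 2u * vert_stride;
    for (unsigned i = 0; i < NUM_FRAMES; i++)
        addr[i] = base + i * frame_bytes;
}

/* Circular behaviour: after the last frame store, wrap around to the first. */
unsigned next_frame_index(unsigned current)
{
    assert(current < NUM_FRAMES);
    return (current + 1u) % NUM_FRAMES;
}
```

For 1080p each frame store takes 1920 × 2 × 1080 = 4,147,200 bytes, so the four frames need about 16.6 MB of DDR memory.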

Therefore, frames are stored in DDR memory following the pattern shown in Figure 16. Since the LK core expects the input frames in a specific order (T – T+1, T+1 – T+2, and so on), the read channels of VDMA_0 and VDMA_1 must be configured with this requirement in mind.


Figure 16. Framebuffer management (read the picture clockwise)

Although both VDMAs work on the same frame buffer, their frame store addresses are configured in a different order, so that there is a one-frame offset between VDMA_0 and VDMA_1.

/* VDMA_0. Initialize buffer physical addresses (same order as the writer) */
for(Index = 0; Index < 4; Index++) {
   ReadCfg.FrameStoreStartAddr[Index] = BaseAddr + BlockOffset;
   BaseAddr += HoriStride * 2 * VertStride;
}

/* VDMA_1. Initialize buffer physical addresses, rotated by one position.
 * BaseAddr must point again to the start of the frame buffer here. */
ReadCfg.FrameStoreStartAddr[1] = BaseAddr + BlockOffset;
BaseAddr += HoriStride * 2 * VertStride;
ReadCfg.FrameStoreStartAddr[2] = BaseAddr + BlockOffset;
BaseAddr += HoriStride * 2 * VertStride;
ReadCfg.FrameStoreStartAddr[3] = BaseAddr + BlockOffset;
BaseAddr += HoriStride * 2 * VertStride;
ReadCfg.FrameStoreStartAddr[0] = BaseAddr + BlockOffset;

Figure 16 also depicts the actual relation between the frame read pointers for VDMA_0 and VDMA_1.
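The effect of the rotated table can be checked with a few lines of C. The sketch assumes that the first loop configures VDMA_0 and that both read channels step through their frame store indices in lockstep. Under those assumptions, at index k VDMA_0 fetches frame k (in write order), while VDMA_1 fetches frame k − 1 (mod 4), i.e. the frame written just before. The function names are hypothetical.

```c
#include <assert.h>

#define NUM_FRAMES 4u

/* Frame (in write order) fetched by VDMA_0 from frame store index k:
 * same order as the writer. */
unsigned vdma0_frame(unsigned k)
{
    assert(k < NUM_FRAMES);
    return k;
}

/* Frame fetched by VDMA_1 from index k, following the rotated table:
 * index 1 -> frame 0, 2 -> 1, 3 -> 2, 0 -> 3. */
unsigned vdma1_frame(unsigned k)
{
    assert(k < NUM_FRAMES);
    return (k + NUM_FRAMES - 1u) % NUM_FRAMES;
}
```

Because the difference is always one frame modulo 4, the wrap-around from index 3 to index 0 preserves the offset. The LK core keeps receiving a pair of consecutive frames.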

Step 9. Deploy and test the application

We are approaching the end of this tutorial, and you are a few clicks away from seeing the result of all the hard work so far.

First, make sure that everything is in place and connected. One final effort: go through this checklist.

  • Is the camera, the video output of your computer or any other video source connected to HDMI input 1 of the FMC-HDMI card?
  • Is the JTAG programming cable connected to your computer? We assume that you have the interface operational.
  • Did you turn on the ZC702 board?
  • Is the monitor connected to the HDMI output of the ZC702 board?

If your answer is YES to all these questions, then go ahead and follow the instructions:

  1. Click the Program Bitstream button of the SDK window.
  2. Click the Run button and select Run as ?? option.

While the firmware is being uploaded to the board and the ARM processor is initialized, nothing happens. Don’t panic. Shortly, you will see a chameleon-like picture such as the one shown in Figure 17. Don’t forget to play some video or move your camera so that differences between frames are detected. Otherwise, you will only see a beautiful black screen.


Figure 17. Our video processing infrastructure in action

Footnotes

  1. Daniele Bagni, Pari Kannan, and Stephen Neuendorffe. Demystifying the Lucas-Kanade Optical Flow Algorithm with Vivado HLS.
  2. Analog Devices. ADV7611 Software Manual. Rev A.
  3. Xilinx. User Guide 850. ZC702 Evaluation Board for the Zynq-7000 XC7Z020 All Programmable SoC User Guide. V1.6 January 2018.
  4. Table from ‘FMC-IMAGEON – Building a Video Design from Scratch Tutorial’ (page 8).
  5. Source: AVNET HDL Reference Designs (github site).
  6. Source: part of the code was taken from Xilinx’s application note XAPP1205 “Designing High-Performance Video Systems with the Zynq-7000 All Programmable SoC Using IP Integrator” (files zc702_i2c_utils.c and zc702_i2c_utils.h)