## **AXI4-Stream Communication Protocol. AXI4-Stream Compliant Modules** ### **Structure of Computer Systems Laboratory** # 1 Objective This laboratory work introduces basic information about the behaviour of AXI4-Stream compliant hardware modules. The major objective is to describe the general mechanism based on which AXI4-Stream modules exchange data and to show how it can be implemented in order to develop such hardware components. # 2 AXI4-Stream Internal Behaviour As described in the previous laboratory work, AXI4-Stream protocol is used to exchange data between various hardware modules and it is extremly useful in processing data in a streaming fashion. The I/O interfaces of such modules must comply with certain rules defining a ready-valid handshake. In case of AXI4-Stream modules which perform simple operations in just one clock cycle, the internal behaviour can be synthetically represented as a two-state Finite State Machine (FSM). In one state, the module accepts data and performs the actual operation, while in the other state it provides the result at the output. Figure 1: Internal behaviour of AXI4-Stream modules represented as a FSM Figure 1 shows the state diagram which describs, in a simplified way, the operating mechanism of an AXI4-Stream module that is capable of performing the actual processing in a single clock cycle. | State | Description | Transition condition | Next state | |---------------|---------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|---------------| | READ OPERANDS | The module is ready to retrieve the values received at the input and performs the actual operation. | All inputs are valid. | WRITE RESULTS | | WRITE RESULTS | The module provides the result at the output and indicates that the output is valid on every output interface | All receivers are ready to accept the results provided by the module. | READ OPERANDS | Table 1: FSM states Table 1 shows the description and the transition condition for the two states of the FSM. The transition conditions can be described reffering the actual I/O signals as follows: - READ OPERANDS $\rightarrow$ WRITE RESULTS occurs when all TVALID signals of all the **input** interfaces are HIGH and the current state is READ OPERANDS - WRITE RESULTS $\rightarrow$ READ OPERANDS occurs when all TREADY signals of all the **output** interfaces are HIGH and the current state is WRITE RESULTS # 3 Example of AXI4-Stream Compliant Module This section provides an example of AXI4-Stream compliant adder/subtractor, implemented in VHDL using the approach described in the previous section. The meaning of the port names is the one explained in the previous laboratory work. ``` library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_SIGNED.ALL; entity int_adder_subtractor is Port ( aclk : IN STD_LOGIC; s_axis_a_tvalid : IN STD_LOGIC; s_axis_a_tready : OUT STD_LOGIC; s_axis_a_tdata : IN STD_LOGIC_VECTOR(31 DOWNTO 0); s_axis_b_tvalid : IN STD_LOGIC; s_axis_b_tready : OUT STD_LOGIC; s_axis_b_tdata : IN STD_LOGIC_VECTOR(31 DOWNTO 0); s_axis_operation_tvalid : IN STD_LOGIC; s_axis_operation_tready : OUT STD_LOGIC; s_axis_operation_tdata : IN STD_LOGIC_VECTOR(7 DOWNTO 0); m_axis_result_tvalid : OUT STD_LOGIC; m_axis_result_tready : IN STD_LOGIC; m_axis_result_tdata : OUT STD_LOGIC_VECTOR(31 DOWNTO 0) ); end int_adder_subtractor; architecture Behavioral of int_adder_subtractor is type state_type is (READ_OPERANDS, WRITE_RESULT); signal state : state_type := READ_OPERANDS; signal res_valid : STD_LOGIC := '0'; signal result : STD_LOGIC_VECTOR (31 downto 0) := (others => '0'); signal a_ready, b_ready, op_ready : STD_LOGIC := '0'; signal internal_ready, external_ready, inputs_valid : STD_LOGIC := '0'; begin s_axis_a_tready <= external_ready;</pre> s_axis_b_tready <= external_ready;</pre> s_axis_operation_tready <= external_ready;</pre> internal_ready <= '1' when state = READ_OPERANDS else '0';</pre> inputs_valid <= s_axis_a_tvalid and s_axis_b_tvalid and s_axis_operation_tvalid; external_ready <= internal_ready and inputs_valid;</pre> m_axis_result_tvalid <= '1' when state = WRITE_RESULT else '0';</pre> m_axis_result_tdata <= result;</pre> process(aclk) begin if rising_edge(aclk) then case state is when READ_OPERANDS => if external_ready = '1' and inputs_valid = '1' then if s_axis_operation_tdata = "00000000" then result <= s_axis_a_tdata + s_axis_b_tdata;</pre> else result <= s_axis_a_tdata - s_axis_b_tdata;</pre> end if; ``` The performed operation is selected based on the s\_axis\_operaion\_tdata input, as described in Table 2. | Table 2: Operation selection | | | | |------------------------------|-----------|--|--| | s_axis_operaion_tdata | Operation | | | | 0000000 | + | | | | ≠ 00000000 | - | | | When the value received on the input port s\_axis\_operaion\_tdata is equal to 00000000, the module performs the addition of the values received on the input ports s\_axis\_a\_tdata and s\_axis\_b\_tdata and when the value of the s\_axis\_operaion\_tdata input signal is not equal to 00000000, it performs the subtraction of the two values. The most important signals are described in Table 3. Table 3: Description of the main signals | Signal | Description | | |----------------|---------------------------------------------------------------------------------------|--| | state | Signal to represent the current state of the FSM | | | inputs_valid | Control signal which indicates that all inputs are valid | | | internal_ready | Control signal which indicates that the module is ready to accept data given as input | | | | (HIGH when the current state is READ_OPERANDS) | | | external_ready | Control signal which ensures that all inputs are consumed at once (HIGH when all | | | | inputs are valid and the module is ready to accept input data) | | The signals which depend only on the current state of the finite state machine are internal\_ready and m\_axis\_result\_tvalid. These signals are not influenced by the values of other signals. - internal\_ready is HIGH in the READ\_OPERANDS state, to indicate that the module is internally ready to accept the input data. The external\_ready is the control signal which, additionally, ensures that all the inputs are consumed at once. Therefore, external\_ready is HIGH only when both internal\_ready and inputs\_valid are HIGH. - m\_axis\_result\_tvalid is HIGH only during the WRITE\_RESULT state to indicate that the output is valid. These considerations are synthetically described in Table 4, Table 4: State-dependent signals | State | internal_ready | m_axis_result_tvalid | |---------------|----------------|----------------------| | READ_OPERANDS | HIGH | LOW | | WRITE_RESULT | LOW | HIGH | ## 4 Exercises 4.1 Implement an AXI4-Stream compliant module which adjusts the value given as input so that it fits in a given range. The range limits are also provided as inputs. Hints - The module should be implemented as a FSM, in a similar way to the adder shown in Section 3. - The actual operation performed by the module can be described as follows: ``` procedure Adjust-Range (a, min, max) if a > max then res \leftarrow max else if a < min then res \leftarrow min else res \leftarrow a end if end if return res end procedure ``` • The structure of the entity is shown below. ``` entity range_adjuster is Port ( aclk : IN STD_LOGIC; s_axis_a_tvalid : IN STD_LOGIC; s_axis_a_tready : OUT STD_LOGIC; s_axis_a_tdata : IN STD_LOGIC_VECTOR(31 DOWNTO 0); s_axis_max_tvalid : IN STD_LOGIC; s_axis_max_tready : OUT STD_LOGIC; s_axis_max_tdata : IN STD_LOGIC_VECTOR(31 DOWNTO 0); s_axis_min_tvalid : IN STD_LOGIC; s_axis_min_tready : OUT STD_LOGIC; s_axis_min_tdata : IN STD_LOGIC_VECTOR(31 DOWNTO 0); m_axis_result_tvalid : OUT STD_LOGIC; m_axis_result_tready : IN STD_LOGIC; m_axis_result_tdata : OUT STD_LOGIC_VECTOR(31 DOWNTO 0) ): end range_adjuster; ``` 4.2 Implement an AXI4-Stream compliant module which computes the sum of a window of values. The module gets a single value at once, then updates the internally stored window of values and provides at the output the sum of all the values in the window. #### Hints - The window is represented internally as an array of std\_logic\_vector. - The size of the window should be a generic parameter of the entity. - All the values in the window are initially set to 0. - The position where the new value is placed is indicated by an index, which is updated when a new value is read, as follows: ``` \begin{aligned} & \textbf{if} \ index < window.size - 1 \ \textbf{then} \\ & index \leftarrow index + 1 \\ & \textbf{else} \\ & index \leftarrow 0 \\ & \textbf{end if} \end{aligned} ``` Figure 2 shows how the index indicates the current position where the newest value is placed. Figure 2: Internal representation of the window - Remember the difference between signal and variable assignment in VHDL. Use a variable to compute the sum because the window signal is updated only when the process is suspended. To avoid using the old values, initialize the sum variable with the newest value and then compute the sum of all the values in the window except for the value placed at the position indicated by the index signal. - The steps performed by the module when a new value is provided at the input can be described as follows: ``` \begin{array}{c} \mathbf{procedure} \ \mathsf{SLIDING\text{-}WINDOW\text{-}SUM}(a) \\ sum \leftarrow a \\ \mathbf{for} \ i \leftarrow 0, window.size - 1 \ \mathbf{do} \\ \mathbf{if} \ i \neq index \ \mathbf{then} \\ sum \leftarrow sum + window[i] \\ \mathbf{end} \ \mathbf{if} \\ \mathbf{end} \ \mathbf{for} \\ \mathbf{return} \ sum \\ \mathbf{end} \ \mathbf{procedure} \end{array} ``` • The structure of the entity is given below: ``` entity sliding_window_adder is Generic ( WINDOW_SIZE : integer := 5 ); Port ( aclk : IN STD_LOGIC; s_axis_a_tvalid : IN STD_LOGIC; s_axis_a_tready : OUT STD_LOGIC; s_axis_a_tdata : IN STD_LOGIC_VECTOR(31 DOWNTO 0); m_axis_sum_tvalid : OUT STD_LOGIC; m_axis_sum_tready : IN STD_LOGIC; m_axis_sum_tdata : OUT STD_LOGIC_VECTOR(31 DOWNTO 0) ); end sliding_window_adder; ``` 4.3 Connect the previously implemented modules to get a module which computes the sum of a window of values which are pre-adjusted to fit in a given range. Create a testbench to simulate your design. #### Hints • The block design of the module is shown in Figure 3. Figure 3: The block design of the module to be implemented # References - [1] AMBA 4 AXI4-Stream Protocol Specification Arm, accessed 11/09/2023, https://documentation-service.arm.com/static/642583d7314e245d086bc8c9?token= - [2] How the axi-style ready/valid handshake works, accessed 12/09/2023, https://vhdlwhiz.com/how-the-axi-style-ready-valid-handshake-works/ - [3] Stimulus file read in testbench using TEXTIO, accessed 16/09/2023, https://vhdlwhiz.com/stimulus-file/