AMD Vitis™ AI Engine Tutorials

See Vitis™ Development Environment on amd.com
See Vitis™ AI Development Environment on amd.com

MUltiple SIgnal Classification (MUSIC) Algorithm on AI Engine

Version: Vitis 2025.2

Table of Contents

  1. Introduction
  2. System Model
  3. Subspace Algorithm
  4. MUSIC Spectrum Estimation
  5. MATLAB Model
  6. AI Engine Subgraph Designs
  8. Top-Level Design
  8. Building the Design
  9. Hardware-in-the-Loop Demo
  10. Conclusions

References

Appendix

Support

Introduction

This tutorial implements the Multiple Signal Classification (MUSIC) Algorithm [1] on the AI Engine. MUSIC is a popular algorithm for Direction of Arrival (DOA) estimation in antenna array systems. This tutorial assumes a system model with an 8-element uniform linear array (ULA). The algorithm chosen for AI Engine implementation operates directly on 128 x 8 data snapshots using a QR-Factorization followed by a Singular Value Decomposition (SVD) to obtain the basis vectors of the noise subspace. The QRD algorithm uses the well-known modified Gram-Schmidt approach. The SVD algorithm adopts a one-sided Jacobi rotation approach with a fixed set of four iterations.

This tutorial implements the MUSIC algorithm fully in the AI Engine and validates its performance in real time hardware running on the VC1902 device (-2M speed grade) on the VCK190 evaluation board. A Hardware-in-the-Loop (HIL) demonstrator connects a host computer running MATLAB® over TCP/IP to the VCK190 board that delivers buffered array snapshots to the board and receives DOA estimates back in real time to demonstrate a best effort DOA estimation throughput rate of 1 MHz. The following table summarizes the system parameters for this tutorial.

This MUSIC tutorial was co-developed by AMD and a third party partner, Fidus Systems.

| Parameter | Value | Units |
|---|---|---|
| Uniform Linear Array | $N=8$ | elements |
| Array element spacing | $d=\lambda/2$ | m |
| Data snapshot | $128 \times 8$ | samples |
| Target throughput | 1 | $\mu s$ |
| Subspace algorithm | QRD + SVD | n/a |
| One-sided Jacobi SVD | 4 | iterations |
| MUSIC spectrum resolution | 256 | bins |
| MUSIC spectrum sweep range | 9 to 248 | bins |
| Minimum DOA separation | 6 | deg |

System Model

The system model consists of:

  • A uniform linear array (ULA) with $N$ equally $d$-spaced antenna elements and,
  • A set of $S$ sources emitting or echoing narrow-band independent signals $\textbf{x}_1,\ldots,\textbf{x}_S$.
  • The direction of arrival (in azimuth) of these signals at the ULA are $\theta_1,\ldots,\theta_S$.

The signal $\textbf{a}(t)=[a_1(t),\ldots,a_N(t)]$ received by the ULA at time $t$ can be expressed in matrix form as $\textbf{a}=\textbf{D}\times\textbf{x}+\textbf{w}$, where $\textbf{D}=[\textbf{d}_{\theta_1},\ldots,\textbf{d}_{\theta_S}]^T$ and $\textbf{d}_{\theta_k}$ is the $k$-th steering vector for the ULA. The vector $\textbf{w}$ is uncorrelated white Gaussian noise. The data vectors $\textbf{a}(t)$ obtained over $128$ consecutive time instants may be collected into a $128 \times 8$ snapshot matrix $\textbf{A}$, with one sample from each of the $8$ array elements per row. The following diagram shows the ULA receiving scenario.

figure
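As a concrete illustration, the snapshot model above can be sketched in a few lines of NumPy. This is not the tutorial's MATLAB model; the function names, the SNR handling, and the row-vector convention (rows of $\textbf{A}$ are the array samples $\textbf{a}(t)$) are illustrative assumptions:

```python
import numpy as np

def steering_vector(theta_deg, n=8, d=0.5):
    """ULA steering vector for element spacing d (in wavelengths)."""
    k = np.arange(n)
    return np.exp(2j * np.pi * d * k * np.sin(np.deg2rad(theta_deg)))

def snapshot_matrix(thetas_deg, t=128, n=8, snr_db=20.0, seed=None):
    """Build a T x N snapshot matrix A = X D + W for S narrow-band sources."""
    rng = np.random.default_rng(seed)
    s = len(thetas_deg)
    D = np.stack([steering_vector(th, n) for th in thetas_deg])      # S x N
    X = (rng.standard_normal((t, s)) +
         1j * rng.standard_normal((t, s))) / np.sqrt(2)              # source signals
    W = (rng.standard_normal((t, n)) + 1j * rng.standard_normal((t, n)))
    W *= 10 ** (-snr_db / 20) / np.sqrt(2)                           # scale noise to SNR
    return X @ D + W

A = snapshot_matrix([-20.0, 35.0])
print(A.shape)  # (128, 8)
```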

Subspace Algorithm

MUSIC performs DOA estimation using a subspace approach involving a peak search across the noise subspace of the array. The critical first step requires identification of the basis vectors of this noise subspace. Several approaches are possible. Eigenspace methods are popular. This tutorial adopts an algorithm based on QR-Decomposition and SVD as these algorithms can be implemented efficiently on the AI Engine. Data flow is simplified because this approach operates directly on the snapshot matrix $\textbf{A}$.

The following figure demonstrates the overall concept. The snapshot matrix $\textbf{A}$ is "tall and skinny" with dimensions $128\times 8$. The basis vectors of the noise subspace $\textbf{V}_r$ may be computed from a two step procedure. First, a QR-Decomposition of the snapshot matrix $\textbf{A}=\textbf{Q}\textbf{R}$ produces the $\textbf{R}$ matrix with upper triangular portion $\textbf{R}_r$. The $\textbf{Q}$ matrix may be discarded. Second, the SVD of $\textbf{R}_r=\textbf{U}_r\textbf{S}_r\textbf{V}_r^\dagger$ provides a basis for the desired noise subspace by selecting the appropriate columns of $\textbf{V}_r^\dagger$ based on identifying the noise subspace singular values from $\textbf{S}_r$. This subspace identification may be performed simply by excluding the $S$ largest (signal) singular values if $S$ is known, or the number of active signals may be identified online using simple (that is, "thresholding") or more advanced (that is, "information theoretic") techniques. This tutorial assumes $S$ is known.

figure
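Using NumPy built-ins in place of the tutorial's Gram-Schmidt and Jacobi kernels, the two-step noise subspace extraction can be sketched as follows (the function and variable names are assumptions, not tutorial code):

```python
import numpy as np

def noise_subspace(A, n_sources):
    """Noise subspace basis from a tall snapshot matrix A (e.g., 128 x 8)."""
    # Step 1: QR-decomposition; only the 8 x 8 upper-triangular R is needed,
    # so Q is never formed (mode="r" discards it).
    R = np.linalg.qr(A, mode="r")
    # Step 2: SVD of R; numpy returns singular values sorted in descending order.
    _, s, Vh = np.linalg.svd(R)
    # The columns of V beyond the S signal directions span the noise subspace.
    return Vh.conj().T[:, n_sources:], s

rng = np.random.default_rng(0)
A = rng.standard_normal((128, 8)) + 1j * rng.standard_normal((128, 8))
Vn, s = noise_subspace(A, n_sources=2)
print(Vn.shape)  # (8, 6)
```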

MUSIC Spectrum Estimation

Once identified, MUSIC uses the columns of the noise subspace $\textbf{V}_r$ to identify incident signal directions orthogonal to that subspace. This involves sweeping the steering vector (as a function of the DOA) against these noise subspace basis vectors to compute the strength of the so-called MUSIC pseudo-spectrum given in the following equation. The index $P$ represents the index of the first noise singular value (assuming they have been sorted in descending order). Also $\textbf{V}_r(j)$ denotes the $j$-th column of $\textbf{V}_r$. Once again, $P$ may be assumed known or estimated using various techniques. The $\times$ operator below represents an inner product. The $\textbf{s}(k)$ represents the steering vector which is a function of the array manifold and the presumed direction of arrival.

figure

MUSIC may solve the preceding equation directly by looking for the peaks of the pseudo-spectrum, which occur when the steering vector becomes orthogonal to the noise subspace, that is, when the denominator of the preceding equation goes to zero. Computing these peaks requires a costly division operation. Alternatively, the denominator can be inspected directly for its nulls. This typically gives a similar result but requires no division. This tutorial uses the latter approach to reduce the compute workload.
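The null-search idea can be checked numerically. The Python snippet below is an illustrative sketch, not the AI Engine kernel code: it evaluates the pseudo-spectrum denominator on a $256$-bin angular grid, picks its local minima, and confirms that a single source at $+30$ degrees produces a null there:

```python
import numpy as np

def music_null_search(Vn, n_bins=256, d=0.5):
    """Evaluate the pseudo-spectrum denominator over n_bins steering angles
    and return the bin indices of its local minima (nulls)."""
    n = Vn.shape[0]
    thetas = np.linspace(-90.0, 90.0, n_bins)
    k = np.arange(n)
    S = np.exp(2j * np.pi * d * np.outer(np.sin(np.deg2rad(thetas)), k))
    # Denominator d(theta) = sum_j |s(theta)^H v_j|^2 ; no division required.
    den = np.sum(np.abs(S.conj() @ Vn) ** 2, axis=1)
    nulls = [i for i in range(1, n_bins - 1)
             if den[i] < den[i - 1] and den[i] < den[i + 1]]
    return thetas, den, nulls

# Quick check: noiseless single source at +30 degrees; the noise subspace is
# the orthogonal complement of the (conjugated) steering vector.
a = np.exp(2j * np.pi * 0.5 * np.arange(8) * np.sin(np.deg2rad(30.0)))
_, _, Vh = np.linalg.svd(np.conj(a)[None, :])
Vn = Vh.conj().T[:, 1:]                     # 8 x 7 noise basis
thetas, den, nulls = music_null_search(Vn)
best = min(nulls, key=lambda i: den[i])
print(int(round(thetas[best])))  # 30
```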

MATLAB Model

MATLAB models of MUSIC validate the algorithmic approach taken and provide a means to synthesize I/O data for the AI Engine implementation. The system model may be found in the <path-to-repo>/matlab/System folder. You can configure the system parameters shown in the following code in the Configuration/testCfg.m file. After editing this file (or going with its default settings), you can run the MUSIC model using the testMusic.m script from the <path-to-repo>/matlab/System folder.

figure

Running the MATLAB model produces the console and figure output as follows. It shows statistics for the accuracy of the Gram-Schmidt QRD used by the tutorial versus the built-in qr() MATLAB function. The console also provides the difference between the MATLAB svd() and the one-sided Jacobi algorithm with four iterations used by the tutorial. The figure plots the peaks of the MUSIC pseudo-spectrum along with the nulls of its corresponding denominator, which are identified instead by the approach adopted in the tutorial. In addition to the console and figure output, the MATLAB model generates a detailed dump of the system signals in the top-level data folder.

figure

figure

AI Engine Subgraph Designs

The full MUSIC algorithm on AI Engine is built from a data flow graph containing six different subgraphs: IO Adapter, QRD, SVD, DOA, Scanner, and Finder. This section provides an overview of each one of these subgraphs.

IO Adapter Subgraph

The IO Adapter subgraph delivers the $\textbf{A}$ matrix from the input PLIO to the QRD subgraph. All downstream MUSIC subgraphs use buffers for their I/O. All of these subgraphs may use a single I/O buffer read over a high bandwidth memory interface, moving from tile to tile in a linear fashion. This is clarified in more detail in the following section. No bandwidth limitations are encountered downstream due to this use of the 256-bit AI Engine memory interface. However, the design must be fed by two 32-bit PLIO streams @ 1250 MHz to achieve a 1 $\mu s$ throughput overall. The IO Adapter subgraph sinks two input PLIO streams and combines them into a single output buffer containing the input $\textbf{A}$ matrix to be processed by the first QRD subgraph. Two streams are required because the $128\times 8$ elements of $\textbf{A}$ cannot be transferred over a single PLIO in 1 $\mu s$. The following block diagram shows the AI Engine physical array view for the IO Adapter subgraph.

figure
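A quick back-of-the-envelope check shows why a single stream is not enough. Assuming cfloat samples of 8 bytes and the 64-bit @ 625 MHz PL-side PLIO rate quoted later in this tutorial (equivalent in bandwidth to 32-bit @ 1250 MHz on the array side):

```python
# PLIO bandwidth check for one 128 x 8 cfloat snapshot.
snapshot_bytes = 128 * 8 * 8               # 8192 B per A matrix (8 B per cfloat)
plio_bytes_per_s = (64 // 8) * 625e6       # one 64-bit PLIO @ 625 MHz = 5 GB/s
one_stream_us = snapshot_bytes / plio_bytes_per_s * 1e6
two_stream_us = one_stream_us / 2
print(f"{one_stream_us:.3f} us with one stream")   # 1.638 us, misses the 1 us target
print(f"{two_stream_us:.3f} us with two streams")  # 0.819 us, meets the 1 us target
```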

QRD Subgraph

The following MATLAB code shows the QRD algorithm adopted in this tutorial. It contains an inner loop and an outer loop. Its loop structure has been modified for more efficient AI Engine implementation. The outer loop updates of R(kk,kk) and Q(:,kk) have been moved from after to before the inner loop. Also, the inner loop indices now run from kk+1 to C instead of from 1 to kk-1. This admits an easier to implement form of software pipelining in which a single AI Engine tile may be assigned to compute a single iteration of each outer loop body. For example, the first tile computes the R(1,1) and Q(:,1) updates and then performs all inner loop computations corresponding to kk=1. This tile will then pass all values of R and Q to the next tile in the chain. The second tile will accept these inputs but only update R(2,2) and Q(:,2) and all inner loop computations corresponding to kk=2. This approach requires a total of $8$ tiles to process the $128\times 8$ snapshot matrix $\textbf{A}$ for MUSIC.

function [Q,R] = qrd_mgssr_hw_model(A)

   [R,C] = size(A);
   Q = single(A);
   R = single(zeros(C,C));
   
   % Here, 'kk' now represents the tile
   % --> Each tile performs its outer loop and then performs the inner loop iterations for all 
   %     columns that follow
   for kk = 1 : C
      R(kk,kk) = norm(Q(:,kk));
      Q(:,kk) = Q(:,kk) * (1.0/R(kk,kk));
      for ii = kk+1 : C
         R(kk,ii) = transpose(conj(Q(:,kk))) * Q(:,ii);
         Q(:,ii) = Q(:,ii) - R(kk,ii) * Q(:,kk);
      end
   end
end
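The MATLAB routine above ports directly to NumPy, which makes it easy to sanity-check the restructured loop order against the defining properties $\textbf{A}=\textbf{Q}\textbf{R}$ and $\textbf{Q}^H\textbf{Q}=\textbf{I}$. This double-precision sketch is for verification only; the AI Engine kernels work in single precision:

```python
import numpy as np

def qrd_mgs(A):
    """NumPy port of the modified Gram-Schmidt QRD shown above."""
    A = np.asarray(A, dtype=np.complex128)
    rows, cols = A.shape
    Q = A.copy()
    R = np.zeros((cols, cols), dtype=np.complex128)
    for kk in range(cols):
        # Outer loop body (one AI Engine "norm" tile per kk):
        R[kk, kk] = np.linalg.norm(Q[:, kk])
        Q[:, kk] *= 1.0 / R[kk, kk]
        # Inner loop bodies for columns kk+1 .. C-1 (one tile each):
        for ii in range(kk + 1, cols):
            R[kk, ii] = np.vdot(Q[:, kk], Q[:, ii])   # conj(Q[:,kk]) . Q[:,ii]
            Q[:, ii] -= R[kk, ii] * Q[:, kk]
    return Q, R

rng = np.random.default_rng(1)
A = rng.standard_normal((128, 8)) + 1j * rng.standard_normal((128, 8))
Q, R = qrd_mgs(A)
print(np.allclose(Q @ R, A))  # True
```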

It turns out this $8$-tile solution does not provide sufficient compute capacity to achieve the 1 $\mu s$ throughput objective of the tutorial. You can increase throughput by partitioning each inner loop body into its own tile, and similarly partitioning each outer loop body into its own tile. The algorithm exhibits $C$ outer loop iterations, and $C-k$ inner loop body iterations for each outer loop $k$. It follows that the total number of tiles required is $C + C(C-1)/2$. For the $8$ columns here, this equals $8+8\times7/2=36$ tiles.

The following diagram shows the AI Engine floorplan for this $36$-tile solution. Here, there is no attempt to floorplan the design; the tools elect by default to simply use the second row for buffers. These could be co-located in the first row for many of the tiles, as shown later in the final floorplan.

figure

The following diagram shows additional details of the AI Engine QRD $norm()$ kernel code. The code is organized into three separate workloads:

  • The "Initialization" code accepts the $R$ and $Q$ inputs from the previous tile and initializes $R$ to zero for the first tile.
  • The "QRD Norm" code computes the $norm()$ required by the QRD outer loop body and updates the appropriate $Q$ column.
  • The "Output" code delivers the updated $R$ and $Q$ values to the following tile. The $Q$ is not returned by the last tile.

Note that kernels with indices $0,8,15,21,26,30,33,35$ perform the outer loop $norm()$ operations whereas the remaining tiles compute the inner loop bodies. Only the upper triangular portion of the $R$ matrix updates as it passes through the AI Engine pipeline.

figure

The following diagram shows additional details of the AI Engine QRD $qr()$ kernel code. The code is organized into three separate workloads:

  • The "Initialization" code accepts the $R$ and $Q$ inputs from the previous tile.
  • The "QR" code computes the dot product between columns Q(i) and Q(m) and then updates Q(i) based on the result.
  • The "Output" code delivers the updated $R$ and $Q$ values to the following tile.

figure

SVD Subgraph

The following MATLAB code shows the SVD algorithm adopted in this tutorial. It contains three nested loops. The outermost loop performs identical "iterations." This tutorial performs a fixed set of $N_I=4$ iterations per SVD. The inner two loops admit a structure similar to the QRD analyzed above, except that all compute workloads reside in the innermost loop. There is no workload in the outer of these two loops. The inner loop workload involves computing a $2\times2$ Jacobi rotation matrix Rot and then applying that matrix to both V and W.

function [U,S,V] = svd_one_sided_jacobi( A, max_iter )
   if     (nargin == 1)  max_iter = 4;
   elseif (nargin ~= 2) error('Incorrect signature'); 
   end
   
   % Perform one-sided Jacobi for MUSIC application (no need to compute 'U' in principle):
   [m,n] = size(A);
   V = single(eye(n));
   W = single(A);

   for iter = 1 : max_iter
      for p = 1 : n-1
        for q = p+1 : n
           Rot = compute_rotation( W(:,p), W(:,q) );
           V(:,[p,q]) = V(:,[p,q]) * Rot;
           W(:,[p,q]) = W(:,[p,q]) * Rot;
        end
      end
   end
   
   % Compute singular values and 'U':
   U = single(zeros(m,n));
   S = single(zeros(n,n));
   for ii = 1 : n
     S(ii,ii) = sqrt(W(:,ii)'*W(:,ii));
     U(:,ii) = W(:,ii)/S(ii,ii);
   end
end

For completeness, the following MATLAB code defines the compute workload for the one-sided Jacobi rotation. This renders the two vectors Xv and Yv orthogonal. This workload contains some vectorizable dot product operations along with some $sqrt()$, $inv()$, and squaring operations. These can map to the AI Engine vector data path or can leverage the non-linear hardware accelerator on the scalar data path.

function [res] = calc_ei_2t(x,y)
   R = sqrt(x^2+y^2);
   res = complex(x/R,y/R);
end

function [res] = calc_ei_t(sin_2t,cos_2t)
   R = sqrt((1+cos_2t)^2 + sin_2t^2);
   res = complex((1+cos_2t)/R, sin_2t/R);
end

function [Rot] = calc_rot( eit, c, s )
   Rot = [      eit *complex(0,s),       eit*complex(c,0);
           conj(eit)*complex(-c,0), conj(eit)*complex(0,-s) ];
end

function [Rot] = compute_rotation( Xv, Yv )
   Hpp = real(transpose(Xv)*conj(Xv));
   Hqq = real(transpose(Yv)*conj(Yv));
   tmp = transpose(Xv)*conj(Yv);
   Hrr = real(tmp);
   Hjj = imag(tmp);
   
   ei_2t1 = calc_ei_2t(Hjj,Hrr);
   ei_t1  = calc_ei_t(imag(ei_2t1),real(ei_2t1));
   
   tx = 0.5*(Hqq-Hpp);
   ty = Hrr * imag(ei_2t1) + Hjj * real(ei_2t1);
   
   ei_2t2 = calc_ei_2t(tx,ty);
   ei_t2  = calc_ei_t(imag(ei_2t2), real(ei_2t2));
   
   Rot = calc_rot( ei_t1, real(ei_t2), imag(ei_t2) );
end
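The rotation workload and the surrounding sweep structure can likewise be transcribed to Python for verification. The sketch below is a direct double-precision port of the MATLAB above (the tutorial kernels use single precision and the hardware-accelerated $sqrt()$/$inv()$ operations); it checks that four sweeps reproduce the singular values computed by numpy:

```python
import numpy as np

def calc_ei_2t(x, y):
    r = np.hypot(x, y)            # assumes (x, y) != (0, 0), as in the MATLAB model
    return complex(x / r, y / r)

def calc_ei_t(sin_2t, cos_2t):
    r = np.hypot(1.0 + cos_2t, sin_2t)
    return complex((1.0 + cos_2t) / r, sin_2t / r)

def calc_rot(eit, c, s):
    return np.array([[eit * (1j * s),      eit * c],
                     [np.conj(eit) * (-c), np.conj(eit) * (-1j * s)]])

def compute_rotation(xv, yv):
    """2x2 rotation rendering the columns xv and yv orthogonal."""
    hpp = np.sum(np.abs(xv) ** 2)
    hqq = np.sum(np.abs(yv) ** 2)
    tmp = np.sum(xv * np.conj(yv))
    hrr, hjj = tmp.real, tmp.imag
    ei_2t1 = calc_ei_2t(hjj, hrr)
    ei_t1 = calc_ei_t(ei_2t1.imag, ei_2t1.real)
    tx = 0.5 * (hqq - hpp)
    ty = hrr * ei_2t1.imag + hjj * ei_2t1.real
    ei_2t2 = calc_ei_2t(tx, ty)
    ei_t2 = calc_ei_t(ei_2t2.imag, ei_2t2.real)
    return calc_rot(ei_t1, ei_t2.real, ei_t2.imag)

def svd_one_sided_jacobi(A, max_iter=4):
    """One-sided Jacobi sweeps; returns unsorted singular values and V."""
    m, n = A.shape
    V = np.eye(n, dtype=np.complex128)
    W = np.array(A, dtype=np.complex128)
    for _ in range(max_iter):
        for p in range(n - 1):
            for q in range(p + 1, n):
                rot = compute_rotation(W[:, p], W[:, q])
                V[:, [p, q]] = V[:, [p, q]] @ rot
                W[:, [p, q]] = W[:, [p, q]] @ rot
    return np.linalg.norm(W, axis=0), V

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
s, V = svd_one_sided_jacobi(A)
print(np.allclose(np.sort(s)[::-1], np.linalg.svd(A, compute_uv=False), rtol=1e-3))
```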

You can parallelize the SVD across multiple AI Engine tiles in a manner similar to the QRD. The most aggressive scheme assigns a single AI Engine tile to each inner-most loop body; in essence, the system of three nested loops is fully flattened. It turns out this scheme is overkill for the throughput target of 1 $\mu s$. Instead, it is possible to partition three inner loop body workloads to each AI Engine tile and still meet the requirement. This saves considerable resources; the SVD requires only $38$ tiles in total.

The following screenshot shows the kernel object creation in the adf::graph of the SVD graph implementation in svd_graph.h. The code comments indicate which indices $(p,q)$ are assigned to each tile. Each tile is assigned three inner loop workloads. The last tile is only assigned two inner loop workloads. However, the last tile also performs the final workload to compute the singular values required by the MUSIC algorithm for identifying the noise subspace basis vectors.

figure

The following diagram shows the AI Engine physical array view for the SVD subgraph. The data flow graph has a linear structure similar to the QRD graph, although it requires fewer memory resources. This is because the $\textbf{U}$, $\textbf{S}$, and $\textbf{V}$ matrices are all $8\times 8$ in this case.

figure

DOA Subgraph

The DOA subgraph estimates the MUSIC Spectrum $\hat{\textbf{P}}_m$ defined earlier at $256$ equally spaced bins. To achieve the target throughput of 1 $\mu s$, the workload is partitioned across a number of tiles where each tile computes the spectrum for $L$ consecutive points. The following diagram shows this. Meeting the throughput requires a value of $L=4$, equivalent to $64$ AI Engine tiles.

figure

The following diagram shows the AI Engine physical array view for the DOA subgraph. The data flow graph is linear with $64$ total tiles to support evaluation of the MUSIC spectrum over $256$ equally spaced points. The incidence signals used for the MUSIC workloads are pre-computed at compile time based on the array steering vector and stored in lookup tables. Data flow proceeds from tile to tile, each one passing the noise subspace basis vectors and number of sources to the next tile in the graph. Each tile computes $L=4$ spectrum bins and passes them down the pipeline. Bins are passed with cfloat data type with the imaginary part set to zero.

figure

Scanner Subgraph

The Scanner subgraph performs a coarse-grained search of the MUSIC spectrum computed by the DOA subgraph, looking for regions of the spectrum that fall under a "null threshold". The Scanner breaks the $256$ spectrum bins into $32$ contiguous groups of $8$ bins each. It produces $32$ output tags that are set to true if there exists a bin value in that group which falls under the given null threshold. Meeting the 1 $\mu s$ throughput target requires two AI Engine tiles, where each tile processes $128$ of the $256$ available bins. The following diagram outlines this algorithmic approach.

figure
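A behavioral sketch of the Scanner clarifies the tagging rule. The function name, the threshold handling, and operating on real-valued bin magnitudes are illustrative assumptions, not the kernel code:

```python
import numpy as np

def scan_tags(spectrum, threshold, group=8):
    """Coarse scan: one boolean tag per group of `group` consecutive bins,
    set when any bin in the group falls below the null threshold."""
    groups = spectrum.reshape(-1, group)      # 256 bins -> 32 groups of 8
    return (groups < threshold).any(axis=1)

spec = np.full(256, 10.0)
spec[100] = 0.01                              # a null in group 100 // 8 == 12
tags = scan_tags(spec, threshold=0.1)
print(tags.sum(), int(np.flatnonzero(tags)[0]))  # 1 12
```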

The following diagram shows the AI Engine physical array view for the Scanner subgraph. As noted previously, the design requires $2$ tiles and some additional storage for I/O buffers.

figure

Finder Subgraph

The Finder subgraph performs a fine-grained search of the MUSIC spectrum regions tagged by the Scanner subgraph to identify a negative-to-positive gradient change in the spectrum as highlighted in the preceding diagram. When the finder finds a gradient change, it sets the tag to the index of the bin corresponding to the local minimum. To meet the 1 $\mu s$ throughput target, this search is partitioned over a $16$ tile pipeline. In the pipeline, each tile performs the fine-grained search on two of the 8-bin regions. The following diagram shows the AI Engine physical array view for the Finder subgraph.

figure
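A behavioral sketch of the Finder, restricted to the tagged regions produced by a Scanner-style pass (the names and the edge handling are illustrative assumptions):

```python
import numpy as np

def find_nulls(spectrum, tags, group=8):
    """Fine search inside tagged 8-bin regions for a negative-to-positive
    gradient change; returns the bin index of each local minimum found."""
    hits = []
    for g in np.flatnonzero(tags):
        lo = max(g * group, 1)                       # keep i-1 in range
        hi = min((g + 1) * group, len(spectrum) - 1) # keep i+1 in range
        for i in range(lo, hi):
            if spectrum[i - 1] > spectrum[i] <= spectrum[i + 1]:
                hits.append(i)
    return hits

spec = np.abs(np.arange(256) - 100.0)    # V-shaped spectrum, minimum at bin 100
tags = np.zeros(32, dtype=bool)
tags[12] = True                          # region covering bins 96..103
print(find_nulls(spec, tags))            # [100]
```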

Top-Level Design

This section provides an overview of the top-level VC1902 design of the MUSIC algorithm. The following diagram shows screenshots of the top-level Versal block design (BD) in IP integrator. It includes the CIPS, NoC, DDR Interface, and AI Engine hard IPs on the left side. The Vitis Region is on the right side and includes three HLS kernels. Two mm2s() kernels provide data movers to pass snapshot matrices $\textbf{A}$ from DDR to the AI Engine over two PLIO streams. A single s2mm() kernel provides a data mover to pass the resultant MUSIC output tags back to DDR. The Vitis linker v++ has inserted clock domain crossing and data width converter IPs to match the rates from these three HLS blocks running with 128-bit I/O @ $312.5$ MHz to the 64-bit I/O @ $625$ MHz used by the PLIO interface to the AI Engine.

figure

The following diagram shows the final AI Engine physical array floorplan for the MUSIC design. Some minimal floorplanning steers the kernel locations chosen by the tools. You could do additional floorplanning to tighten up the local tile memory placements. The tiles associated with each MUSIC subgraph are color coded for ease of identification. The design consists of six different subgraphs: IO Adapter $(1)$, QRD $(36)$, SVD $(38)$, DOA $(64)$, Scanner $(2)$, Finder $(16)$. The full design requires a total of $157$ compute tiles. Arrows in the diagram identify the "snake-like" data flow. Alternative placements are possible.

figure

The following diagram shows the final PL floorplan of the VC1902 device. Most of these highlighted resources involve circuitry required to support the base platform of the VCK190 evaluation board and are not related to the MUSIC algorithm itself.

figure

The following diagram captures the device level resource utilization of the VC1902 device. The design is using a small portion of the available PL resources.

figure

You can achieve timing closure of the top-level device automatically with the standard Vitis v++ link and package flow. This is because you need only three data movers to support the MUSIC implementation in the AI Engine array, and this hardened portion of the design requires no timing closure.

figure

Building the Design

Setup and Initialization

IMPORTANT: Install Vitis™ 2025.2 software before beginning the tutorial. Download the Common Images for Embedded Vitis Platforms from this link.

Set the environment variable COMMON_IMAGE_VERSAL to the full path where you have downloaded the Common Images. Then set the environment variable PLATFORM_REPO_PATHS to the value $XILINX_VITIS/base_platforms. Additional information on this process may be found here.

The remaining environment variables are configured in the top-level Makefile <path-to-design>/18-MUSIC-Algorithm/Makefile file.

RELEASE=2025.2

PLATFORM_NAME              = xilinx_vck190_base_202520_1
PLATFORM_PATH              = ${PLATFORM_REPO_PATHS}

export PLATFORM            = ${PLATFORM_PATH}/${PLATFORM_NAME}/${PLATFORM_NAME}.xpfm
export SYSROOT             = ${COMMON_IMAGE_VERSAL}/sysroots/cortexa72-cortexa53-amd-linux
export KERNEL_IMAGE        = ${COMMON_IMAGE_VERSAL}/Image
export ROOTFS              = ${COMMON_IMAGE_VERSAL}/rootfs.ext4
export PREBUILT_LINUX_PATH = ${COMMON_IMAGE_VERSAL}

Hardware Emulation

This tutorial is not set up to run hardware emulation as it contains a full "Hardware-in-the-Loop" demonstrator outlined below. It is only necessary to build the top-level design for hardware to generate an SD card to run on the VCK190 evaluation board.

Hardware

You can build the design for the VCK190 evaluation board using the Makefile as follows:

[shell]% cd <path-to-repo>/
[shell]% make all TARGET=hw

The build process generates the SD card image in the <path-to-repo>/package/sd_card folder.

Hardware-in-the-Loop Demo

This section provides an overview of the HIL demo system, including how to use MATLAB to drive the system. Details on how to set up the VCK190 evaluation board and the Ethernet connection between the host computer and the VCK190 board are in the Appendix.

Architecture

The following diagram shows the architecture of the HIL system. It consists of a host computer connected to the VCK190 evaluation board. The HIL starts and terminates on the host computer.

figure

System Operation

The HIL system performs the following operational steps:

  1. A system MATLAB model that runs on the host computer generates a set of synthetic snapshots from a simulation model. $S$ targets are configured to move at constant velocities while emitting an EM signal. This results in incident signals towards the ULA with given angles with respect to the boresight direction.
  2. The host computer runs the MUSIC model on the generated snapshots and generates reference results for the detected DOAs. The results are compared to values produced by the AI Engine implementation. You can perform steps 1 and 2 again after Step 5.
  3. The VC1902 PS application initializes the hardware data path including memory buffers, PL data movers, and AI Engine configuration.
  4. The VC1902 PS application starts a TCP server to accept incoming TCP connection requests on a port specified via a command line parameter.
  5. The host computer starts a TCP client on the port number on which the TCP server accepts incoming TCP/IP packets.
  6. The TCP client on the host computer sends the configured number of snapshots in a single batch carried in TCP/IP packets. A frame encapsulates each snapshot in the batch. The frame includes a header that provides start of frame, a sequence number, snapshot type and several other system and MUSIC configuration parameters. See the Appendix.
  7. PS client data extracts the payload from incoming TCP/IP packets and collects data in the memory buffers in DDR4 before initiating the hardware pipeline.
  8. Upon the collection of the last (or a unique) snapshot from the TCP client, the fabric mm2s() DMA starts transferring the snapshots to the AI Engine as a single DMA operation.
  9. The AI Engine receives data from two 64-bit PLIO inputs running @ 625 MHz, performs MUSIC processing steps through one or more AI Engine graph iterations, depending on the number of snapshots.
  10. The AI Engine outputs the spectral bins and DOA vector transferred by the s2mm() PL DMA into the memory buffer located in DDR4 of the VCK190 board.
  11. The PS application waits for all hardware pipeline stages to complete their operations and transfers the expected amounts of data to/from memory.
  12. The TCP server sends the output produced by the AI Engine back to the TCP client. Each snapshot output includes its spectrum bins. A frame with a proper header encapsulates the DOA vector.
  13. The host computer receives the spectrum and DOA vector computed by the AI Engine and plots the results of both the expected and reference data.
  14. The TCP/IP server remains in listening mode for a fixed amount of time, while the client performs data visualization.
  15. Go to Step 12 if no new data is received from the TCP/IP client during the wait time window. Go to Step 6 if the TCP/IP client sends a new input batch.

Performance Estimation

Estimate AI Engine performance for each batch run by reading the AI Engine profiling counter. This counter value $C$ at the end of a batch run equals the number of cycles for MUSIC to process an entire batch of $K$ snapshots. The average sweep time is then given as $T=(C-B/8)/(K-1)$ where $B$ is the number of bytes produced by the AI Engine for each snapshot (that is, 256 cfloat bins + 32 cfloat tags = 2304B). Examples of some demo waveforms are given in the following table.

| Test Case | Average Sweep Time (ns) | Note |
|---|---|---|
| Demo 3 (64 snapshots) | 993 | 0 sources |
| Demo 4 (40 snapshots) | 972 | 1 source |
| Demo 5 (40 snapshots) | 969 | 2 sources |
| Demo 6 (40 snapshots) | 960 | 3 sources |
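The formula above can be captured in a small helper. The 1250 MHz AI Engine clock and the counter value used below are assumptions for illustration, not measured values:

```python
def average_sweep_time_ns(counter, n_snapshots, clk_mhz=1250.0,
                          bytes_per_snapshot=(256 + 32) * 8):
    """Average MUSIC sweep time from the AI Engine profiling counter,
    using T = (C - B/8) / (K - 1) cycles, converted to nanoseconds.
    B defaults to 256 cfloat bins + 32 cfloat tags = 2304 bytes."""
    cycles = (counter - bytes_per_snapshot / 8) / (n_snapshots - 1)
    return cycles * 1e3 / clk_mhz

# Hypothetical counter value for a 64-snapshot batch (not a measured number):
print(round(average_sweep_time_ns(counter=78_500, n_snapshots=64)))  # 993
```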

Software Version

The HIL system uses MATLAB version R2023a Update 4.

MATLAB Folder Structure

The MATLAB folder tree includes both the HIL folder and the SYSTEM folder. The SYSTEM folder contains the golden system and MUSIC models and is not used directly by the HIL system. A copy of the MUSIC models resides in the HIL folder and is used by the HIL system, so the HIL folder is self-contained.

figure

Steps to Generate and Run HIL Demo Data

Step #1: Start MATLAB and change directory to the MatlabClient folder and open the Configuration/systemConfig.m file.

Step #2: Configure or update the highlighted system parameters in Configuration/systemConfig.m according to your system preferences, and save the file.

figure

The following diagram helps in understanding how the system preferences configure the signal geometry. The system implements a ULA. Cartesian $(x,y)$ coordinates identify the source locations; the sources move with constant velocities in that plane with respect to the boresight direction of the ULA.

figure

Step #3: Configure or update the highlighted MUSIC parameters in Configuration/musicConfig.m according to your designed algorithm preferences and save the file.

figure

Step #4: Run the genSnapshots.m script. The script generates snapshots and stores them in the Snapshots folder. This script applies MUSIC on the batch snapshots and saves the data under the MusicResults folder.

figure

Step #5: It is possible that some generated snapshots fall outside the coverage zone. In these cases, tune the parameters to make sure all snapshots fall within the coverage zone.

figure

Step #6: Configure or update the highlighted parameters in Configuration/hilCfg.m and save the file.

figure

Step #7: Configure or update the IP address and port number in TcpIp/getIpAddr.m and save the file.

figure

Step #8: Run the sendSnapshots.m script. The generated snapshots will be sent to the remote server. The client enters listening mode and waits for a response from the server. Once the response is received, MATLAB models launch visualization as shown in the following figure.

figure

The TCP/IP server remains in listening mode for a fixed amount of time, while the client is performing data visualization. If the server receives no new data from the TCP/IP client during the wait time window, the server sends the previous responses again, and MATLAB performs data visualization. To send new data, send the clear client command as shown in the following figure and go to Step #2.

figure

Archiving Demo Data

To archive a batch of snapshots, use the following steps.

Step #1: Generate a batch of data as outlined in the preceding section.

Step #2: Run the script createDemoDir('dir-name') to archive the generated batch and reference data under the folder dir-name.

figure

You can run any archived snapshot data through the HIL system by continuing with the following steps.

Step #3: Configure or update the highlighted parameters in the Configuration/hilCfg.m and save the file. This time set cfg.demoData to 'archived' and then set cfg.demoDataSet to the name of the archived data set.

figure

Step #4: Configure or update the IP address and port number in TcpIp/getIpAddr.m and save the file. This might already be correctly set.

Step #5: Set the parameter cfg.nSource in the Configuration/systemCfg.m file to the number of sources to match the archived data. Run the sendSnapshots.m script. The script sends the archived snapshots to the remote server. The client enters listening mode and waits for a response from the server. When it receives the response, the MATLAB models launch visualization as outlined previously. Again, the TCP/IP server remains in listening mode for a fixed amount of time while the client performs data visualization. If the server receives no new data from the TCP/IP client during the wait time window, the server resends the previous responses, and MATLAB continues with data visualization. To send new data, send the clear client command again, as outlined earlier.

Playback Videos

Playback videos are available under the Video folder after the visualization has stepped through all responses of a batch.

figure

Client and Server on MATLAB

Use the following steps to run the HIL on two instances of MATLAB. Here you are not running MUSIC on the VCK190 evaluation board, but instead are using a second MATLAB instance to emulate the board.

Step #1: Start two instances of MATLAB. Set the root directory for the two instances to HIL/MatlabClient and HIL/MatlabServer respectively.

figure

Step #2: Use ipconfig on Windows (or equivalent if using Linux) to get the local IP address, and configure using the MATLAB script TcpIp/getIpAddr.m as outlined previously.

Step #3: Create a TCP server on the MATLAB server instance as shown in the following figure.

figure

Step #4: On the client MATLAB instance, send the generated or archived batch using the sendSnapshots.m script as outlined earlier. The client enters listening mode.

Step #5: On the server instance run the script emulateMUSIConVCK190.m to emulate the VCK190 workload. The server instance runs MUSIC on the received batch and sends back the results.

Step #6: The client instance plots the responses it receives.

Conclusions

This tutorial presented a high-performance AI Engine implementation of the popular MUSIC algorithm for estimating DOA using an antenna array. The MUSIC algorithm employed here adopts a QRD/SVD approach for subspace estimation that is well matched to the AI Engine compute capacity. Software pipelining techniques were employed to create a massively parallel data flow graph across $157$ AI Engine compute tiles, yielding an implementation capable of processing $128\times 8$ data snapshots at a sustained throughput rate of 1 $\mu s$ per snapshot. The full MUSIC algorithm, including noise subspace basis identification, MUSIC spectrum evaluation, and null detection, is implemented fully in the AI Engine array with no supporting logic from the PL required. A comprehensive Hardware-in-the-Loop demonstrator system is built using an external host running MATLAB to communicate with the VCK190 evaluation board over Ethernet.

References

[1]: MUSIC Algorithm

[2]: QR Decomposition

[3]: Singular Value Decomposition

[4]: One-sided Jacobi Algorithm

Appendix

Deploying the SD Card Image

Use the following steps to deploy the SD card image to the VCK190 board:

  1. Install the following tools (for Windows): the SD Card Formatter utility and the Win32DiskImager utility.

  2. Obtain the latest SD card image from the build process outlined previously.

    Refer to the following figure for the remaining steps:

  3. Connect the board power cord (connector 31).

  4. Power down the board (switch 30) and eject the microSD card from slot 10.

  5. Insert the microSD card into your computer.

  6. Dismiss any Windows Explorer pop-up prompts regarding formatting the card.

  7. Run the SD Card Formatter tool, select the card's drive letter from the drop-down list, and perform a quick format (if the card shows up as several logical disks, select the first drive letter).

  8. Run the Win32DiskImager tool and specify:

    • Image file: path to the SD card image
    • Device: the formatted card disk letter
    • Click the Write button and wait for the process to complete
  9. Eject the card from your computer and insert it into slot 10 on the VCK190 board.

figure

Booting the VCK190 Board

Connect the VCK190 board serial console to your computer (USB-C port 8):

  1. Install the FTDI VCP drivers, if prompted (https://ftdichip.com/drivers/vcp-drivers).
  2. In Windows Device Manager, expand the "Ports (COM & LPT)" section and observe three new USB serial ports that belong to the VCK190 board.
  3. Note the number of the first COM port, for example COM10.
  4. Run PuTTY and open the noted serial port with speed set to 115200.

figure

  5. Set the board DIP switch 6 as shown in the following figure.

figure

  6. Set the board DIP switch 49 as shown in the following figure.

figure

  7. Power up the board using switch 30 and observe the Linux boot prompt on the serial console.

figure

  8. Log in as petalinux.

  9. If logging in for the first time, the system prompts you to set a new password. Follow the prompts.

Simple Ethernet Configuration

Use the following steps when the VCK190 board and MATLAB host computer are on the same local network:

  1. Log in as root user on the VCK190 using sudo su.
  2. Connect the board to the network via Ethernet port 17 (the top one) and find the IP address it obtains over DHCP via ifconfig eth0 (for example, 192.168.1.10).
  3. If DHCP is not available on your network, assign the board IP address manually, making sure it belongs to the same subnet as the MATLAB host computer (for example, ifconfig eth0 192.168.1.10).
  4. Note the board IP address and use it when connecting from MATLAB.

Using a VPN

For remote testing through a corporate VPN connection, configure your network to expose the VCK190 board to the VPN and forward the inbound MATLAB TCP connections to its IP address. Refer to the following example:

  • Assume two remote laptops connected to the same VPN network: one running MATLAB (laptop 1) and another locally connected to the VCK190 board (laptop 2).
  • Laptop 1 requires no additional configuration because it can already establish outgoing connections to the IP addresses within the VPN network.
    • You can also use the onboard Ethernet port if it is not occupied and the original VPN connection is through WiFi.
  • Assuming a Windows 10 system, share the VPN connection with the spare Ethernet on laptop 2.
    • Open Control Panel, Network and Internet, Network and Sharing Center, and on the left side click Change Adapter Settings.
    • Locate the virtual network adapter representing your VPN connection, right-click it, select Properties, and open the Sharing tab.
    • Locate the spare Ethernet adapter and note its name.
    • Check Allow other network users to connect through this computer's Internet connection and select the spare Ethernet network adapter name from the drop-down menu.
    • Apply the changes.

figure

  • Connect VCK190 board Ethernet port 17 (the top one) directly to the laptop 2 spare Ethernet port.
  • The board will obtain an IP address automatically through DHCP. On the board console, find out its IP address.
    • ifconfig eth0
    • For example, the address is 192.168.137.79
    • NOTE: the 192.168.137.x subnet is assigned automatically by Windows Internet Connection Sharing
  • Again open the VPN network adapter Properties, Sharing tab.
  • Click the Settings… button and add a port forwarding rule so that incoming TCP connections to a given port (for example, 8888) are forwarded to the VCK190 board IP address. The port number must match the port the VCK190 host application listens on.

figure

  • Apply the changes. At this point, any TCP connection originating from the VPN network to the specified port is forwarded to the board.
  • Note the IP address you will need for establishing the TCP connection. In this case, it is not the board IP address, but the VPN adapter IP address.
    • Right-click the VPN network adapter, select Status, click Details…
    • Note the IPv4 address, which will belong to your VPN subnet.
    • Use this IP address when connecting from MATLAB.
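
Before connecting from MATLAB through the VPN, it can save time to confirm that the forwarded port is actually reachable. The following Python sketch is illustrative only; the address and port shown are placeholders standing in for the VPN adapter IPv4 address and the host_app listening port from the example above.

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Placeholders: substitute your VPN adapter IPv4 address and the
    # port number the host_app is listening on.
    print(port_reachable("192.168.137.1", 8888))
```

If the check fails, re-verify the sharing settings and the port forwarding rule before involving MATLAB.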

Running the PS Application

Use the following steps to test the Fidus MUSIC algorithm with MATLAB:

  • Run the host application on VCK190 board:
    • sudo su
    • cd /run/media/mmcblk0p1/
    • ./host_app -c binary_container_1.xclbin -p 8888
    • -c specifies the platform configuration binary
    • -p specifies the TCP port the application is going to listen to
  • The application will load the platform configuration and start a TCP server on the given port:

figure

Testing with MATLAB

  • Connect MATLAB to the board by specifying <target ip address>:<tcp port> as described in the network configuration section, and send the input data.
  • The application receives N input snapshots from MATLAB over TCP, extracts the payload data, initializes the input and output DMA memory buffers, and executes one or more iterations of the Fidus MUSIC algorithm implementation on the AI Engine array.
  • The output data from the N snapshots is collected in memory and then sent back to MATLAB over the TCP connection, one batch at a time, with a fixed delay between batches.
  • Observe the MATLAB visualization and the application console output.
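
The exact wire format used by the MATLAB scripts and host_app is not documented here; the scripts themselves are the reference. As a hedged illustration of the send-batch/receive-results pattern described above, the following Python sketch assumes complex snapshots serialized as interleaved little-endian float32 with a 4-byte length prefix. This framing is an assumption for illustration, not the actual host_app protocol.

```python
import socket
import struct

def send_batch(sock: socket.socket, snapshots) -> None:
    """Serialize complex samples as interleaved little-endian float32
    and send them with a 4-byte length prefix (assumed framing)."""
    payload = b"".join(struct.pack("<2f", z.real, z.imag) for z in snapshots)
    sock.sendall(struct.pack("<I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection")
        buf += chunk
    return buf

def recv_results(sock: socket.socket) -> bytes:
    """Receive one length-prefixed result block (assumed framing)."""
    (n,) = struct.unpack("<I", recv_exact(sock, 4))
    return recv_exact(sock, n)
```

A client built this way would connect with socket.create_connection((board_ip, 8888)), call send_batch, and then block in recv_results until the board returns the DOA estimates for the batch.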

figure

  • The application also reports the average AIE execution time measured over the series of input snapshots (997.257202 ns in the previous screenshot). This value is also sent back to MATLAB and shown in the visualization.
  • Without disconnecting the MATLAB TCP client, send another batch of input snapshots to execute the algorithm again.
  • Terminate the MATLAB TCP client, and the application on the board exits automatically.
    • NOTE: If the application terminates abruptly amid an ongoing TCP exchange, it may fail to bind its listening socket the next time it starts. In this case, terminate the MATLAB client connection and allow one to two minutes for the stale connection to time out before restarting the application.
  • Restart the application on the board and connect MATLAB client again to perform another algorithm run.
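
The bind failure noted above is standard TCP behavior: the abandoned connection lingers in the TIME_WAIT state, keeping the listening address reserved until it times out. The host_app source is not shown in this tutorial, but if you write your own test server, setting SO_REUSEADDR before bind lets the socket rebind immediately, as this Python sketch illustrates.

```python
import socket

def make_listener(port: int) -> socket.socket:
    """Create a TCP listener that can rebind immediately after a restart."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding to an address still associated with a TIME_WAIT socket.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("0.0.0.0", port))
    s.listen(1)
    return s
```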

Support

GitHub issues are used to track requests and bugs. For questions, go to support.xilinx.com.


Copyright © 2024–2025 Advanced Micro Devices, Inc.

Terms and Conditions