Data Acquisition Unit

Of all the projects I’ve worked on, the design and development of the high-speed, multi-channel data acquisition system has been the most rewarding. When I first entered the lab, all array data was acquired through a single 20 MSPS ADC which was multiplexed to the array through a matrix switch. This setup provided an excellent foundation for testing the imaging of static targets, but it did not allow for the implementation of a real-time system. As a result of my work on the data acquisition system, we can now digitize data at 50 MSPS on all elements in our array simultaneously. This has enabled high frame rate, single pulse imaging (frame rates exceeding 2,000 frames per second) as well as synthetic aperture imaging at frame rates above 30 frames per second.

The development of the data acquisition system can be divided into two systems. The first system was a modular design that relied on a USB 2.0 connection for data transfer. Each element on the array had a dedicated 12-bit, 40 MSPS ADC to digitize the data. Four ADCs were placed on a single board with a dedicated FPGA (Spartan 3E) to process and buffer the data before transfer through the USB link to the computer. The system was expanded to 64 channels by using 16 of these boards in parallel.

The second system was based on the AFE5801, a 60 MSPS, 12-bit, octal ADC which contains a software-controllable analog front end. Two AFE5801s were paired with a single Virtex-5 FPGA for processing before transfer to the computer. The Virtex-5 FPGA allowed a significant amount of processing to be done before the data reached the computer. The data passed through a software-selectable signal chain which could process the data in the manner most appropriate for the given experiment: the data could be left untouched and passed directly to the computer, downsampled to allow for a lower data rate transfer back to the computer, or passed through a re-loadable FIR filter of roughly 200 taps (unique to each channel).

The goal of both systems was to do as much signal processing and beamforming in software as possible, creating a software-defined ultrasound machine. Imaging with these coarsely sampled arrays is still an active area of research; by implementing the beamformers in software, we are free to test out new ideas and implementations quickly and easily. The following describes each system in more detail.

4 Channel Board

The image below shows a single, 4 channel board with different functional areas identified by the colored boxes. The board was designed to fit into a 19 inch rack to allow for compact storage of the overall data collection system.

The red box, A, represents the analog front end. Each signal is brought onto the board through an SMA connection which then routes the signal to a high impedance, unity gain buffer. The output of the buffer is connected directly to the input of an AD9042 ADC. The signals generated by the elements on the array are inherently bandlimited, allowing the system to forgo an anti-aliasing filter.

The output of the ADC feeds into the digital section of the board, part B, outlined in blue. This part of the system contains both the Spartan 3E FPGA and its supporting hardware, which allows the board to function as a standalone unit. An external PROM chip allows the board to reload its configuration at each power up. The JTAG connection provides a convenient programming and debugging port. The board can run from either an on-board 50 MHz crystal or an external system clock that is used to drive all boards synchronously.

The rightmost part of the board, outlined in green, contains the communications module. This module presents the FPGA with an 8-bit address bus and an 8-bit data bus. These buses, along with some control lines, abstract away the lower levels of the USB protocol, and the module instead exposes a C++ API that allows the user to read and write to a register map in the FPGA.

In practice, this system was able to image at 300 fps with single wave excitation, and around 5 fps in synthetic aperture mode. The main limitations were the bandwidth of the USB connection and the latency of USB commands (~250 µs for each read or write command).

16 Channel Board

The goal of the second data acquisition unit was to design a platform that could evolve over time to fit the needs of various research projects. This was accomplished by making each part of the system as modular as possible. Unlike the previous unit, which contained the ADC and FPGA soldered to a single PCB, this unit used high-speed interconnects to connect the FPGA evaluation board to the ADCs through a baseboard. The picture below shows the 16 channel unit with the baseboard in place and one of two possible AFE5801s attached.

Each AFE5801 was placed on a custom-designed breakout board with connections for the high-speed LVDS data, low-speed serial control data, differential analog signals, and both analog and digital power supplies. The individual breakout board for each ADC allows for quick reuse in various projects with minimal additional production costs.

The baseboard serves as an intermediate layer between the FPGA evaluation board and the breakout boards. The baseboard shown above can mate with two ADC breakout boards (16 individual channels) and two DAC breakout boards (8 individual channels, not shown above). The baseboard also provides regulated power to the breakout boards as well as a blue ribbon cable connection for the analog signals (the picture above shows the blue ribbon cable connected to the ADC; there is also a connection near the top of the board for the DAC channels).

The evaluation board sits on the very bottom and houses the FPGA along with the PHY chips for two GbE connections. The system could be arranged in two modes of operation: the first mode used a single GbE connection to transfer data from both ADC modules back to the CPU, while the second mode used a separate GbE connection for each module. The second mode of operation allowed for double the bandwidth but required twice as many GbE ports on the CPU motherboard.
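
As a rough illustration of why the number of links matters, the Python sketch below estimates the upper bound that link bandwidth alone places on frame rate. It assumes 16-bit samples (the width of the filter output in the signal path), an ideal 1 Gb/s payload rate per link with no protocol overhead, and a hypothetical acquisition of 2048 samples per channel. The function name and every default value are illustrative assumptions, not measurements from this system.

```python
def max_frame_rate(channels, samples_per_channel, bits_per_sample=16,
                   links=1, link_bps=1e9):
    """Upper bound on frames per second set by link bandwidth alone.

    Ignores Ethernet/protocol overhead and host-side processing, so real
    rates will be lower. All default values are illustrative assumptions.
    """
    bits_per_frame = channels * samples_per_channel * bits_per_sample
    return links * link_bps / bits_per_frame


if __name__ == "__main__":
    # Hypothetical acquisition: 16 channels, 2048 samples per channel.
    one_link = max_frame_rate(16, 2048, links=1)
    two_links = max_frame_rate(16, 2048, links=2)
    print(f"one GbE link:  {one_link:.0f} frames/s at most")
    print(f"two GbE links: {two_links:.0f} frames/s at most")
```

Under these assumptions the bound scales linearly with the number of links, which is the trade-off between the two modes: twice the bandwidth in exchange for twice the ports.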

Shown below is the signal path for a single ADC channel. Each of the 16 channels has an identical signal path; however, the filter used in the matched filter stage is unique to each channel.

Once the LVDS data signal is brought into the FPGA and deserialized, it becomes one of four selectable digital signals. The other signals are calibration signals meant to allow for quick debugging. The chosen signal can be left untouched or directed through a CIC decimator to reduce the sampling rate by 4 and add a bit of resolution. The output of the CIC filter can again be left untouched or it can be directed through a 208-tap reloadable FIR filter. The output of this filter is rounded to 16 bits. Which 16 bits are chosen is software selectable and depends on the characteristics of the chosen filter: a bandpass filter will have very different bit growth than a matched filter, and so requires a different subset of output bits. Each data stream is directed to a trigger delay unit, which delays the loading of the BRAM until a trigger from the driver is detected. Once a trigger is detected, the data is loaded into a BRAM buffer before transmission to the computer.
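
The decimation and bit-selection steps can be sketched in plain Python. This is only a behavioral model under stated assumptions: the number of CIC stages is not given above (one stage is assumed here), and the choice to round half up and saturate when picking the 16 output bits is mine rather than a description of the FPGA logic. The function names are hypothetical.

```python
def cic_decimate(samples, rate=4, stages=1):
    """Integer CIC decimator: integrators, downsample by `rate`, then combs.

    `stages` is an assumed filter order; the steady-state gain is
    rate ** stages, which is where the bit growth comes from.
    """
    data = list(samples)
    for _ in range(stages):              # integrators run at the input rate
        acc, out = 0, []
        for x in data:
            acc += x
            out.append(acc)
        data = out
    data = data[rate - 1::rate]          # keep every `rate`-th sample
    for _ in range(stages):              # combs run at the output rate
        prev, out = 0, []
        for x in data:
            out.append(x - prev)
            prev = x
        data = out
    return data


def select_bits(value, shift, width=16):
    """Drop `shift` low bits with rounding, then saturate to `width` bits.

    `shift` plays the role of the software-selectable bit window;
    saturation (rather than wrap-around) is an assumption.
    """
    if shift > 0:
        value = (value + (1 << (shift - 1))) >> shift
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, value))


if __name__ == "__main__":
    decimated = cic_decimate([1] * 16, rate=4, stages=1)
    print(decimated)                     # constant input scaled by the CIC gain
    print([select_bits(v, shift=2) for v in decimated])
```

With one stage and a decimation factor of 4, a constant input comes out four times larger, so the word grows by two bits. A longer FIR filter adds further growth that depends on its coefficients, which is why the output window has to be selectable.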

The manner in which the FPGA evaluation board broke out the FPGA IO pins required connecting a number of high-speed clock lines to non-optimal pins. These connections made it very difficult to achieve timing closure on all 16 of the 300 MHz DDR data streams coming from the ADCs. Luckily, the Virtex family provides IODELAY blocks which allow IO signals to be dynamically delayed in increments of 80 ps. In addition, the AFE5801 has a number of preprogrammed test waveforms which can be selected through the chip's SPI port. In order to correct any timing errors, an automatic test program was developed which would place each ADC into a debug mode to output a ramp signal, and then adjust the IODELAY timing on the FPGA until a measured BER of zero was achieved. The following shows a screenshot of that program after calibration.
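
A calibration sweep of this kind can be sketched as follows. This is not the original test program. It assumes the debug ramp advances by one code per sample and wraps at 12 bits, and it picks the center of the widest zero-error window rather than simply the first tap with no errors, which is a common way to leave timing margin. The function names and the example error counts are hypothetical.

```python
def count_ramp_errors(received, modulus=4096):
    """Count samples that do not equal the previous sample plus one.

    Assumes a 12-bit ramp that increments by one code per sample and
    wraps around at `modulus`.
    """
    return sum(1 for a, b in zip(received, received[1:])
               if (a + 1) % modulus != b)


def best_delay_tap(errors_per_tap):
    """Return the center tap of the longest run of zero-error taps.

    `errors_per_tap[i]` is the error count measured at delay tap i.
    Returns None if no tap is error free.
    """
    best_start, best_len, start = None, 0, None
    # A trailing non-zero sentinel closes a run that reaches the last tap.
    for tap, err in enumerate(list(errors_per_tap) + [1]):
        if err == 0:
            if start is None:
                start = tap
        elif start is not None:
            if tap - start > best_len:
                best_start, best_len = start, tap - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


if __name__ == "__main__":
    # Made-up sweep result for one channel: errors at each delay tap.
    sweep = [5, 3, 0, 0, 0, 0, 0, 7, 9, 0, 0, 4]
    print("chosen tap:", best_delay_tap(sweep))
```

In a real sweep, each entry of the error list would come from setting one delay tap on the channel, capturing the ramp, and passing the capture to `count_ramp_errors`.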

The top plot shows the number of bit errors on each ADC channel for various delay increments. There are certain delay increments which result in near zero bit errors, and some which result in nearly continuous bit errors. The bottom plot shows the received ADC debug signal with the optimal timing settings.

One key system component not shown in the above images is the clock distribution board. In order for multiple FPGAs to be used in parallel, allowing for the collection of data on more than 16 channels, the boards must be synchronized to the same clock. Without synchronization, the data collected from two FPGAs could be off by up to a sampling period. In addition, if the data collection system is run on a clock separate from the driver, the per acquisition jitter can be excessively high. In order to provide the needed synchronization, a clock distribution board was developed to take in a single 50 MHz clock and distribute it to eight different boards. This allows for up to 128 channels to be acquired simultaneously.

The graphics below show the importance of synchronization between the receiver and exciter. The first figure shows two identical transmit waveforms captured from a hydrophone with and without synchronization between the receiver and exciter. A 3.5 MHz, 20 µs pulse was transmitted 1000 times with all receptions overlaid on each other in the figures below. When the clocks are synchronized, each transmission is received with very little timing variation. However, when there is no synchronization and the clocks run independently, the timing variations are on the order of a sampling period.

A measurement of the exact timing variations from reception to reception (i.e. the jitter) is shown below. With clock synchronization, the jitter has a Gaussian shape with a standard deviation of 18 picoseconds. When the clocks are independent, the jitter is uniformly distributed between ±10 nanoseconds, a span of one sampling period.

A low jitter transmit/receive system is absolutely critical to the collection of data to be used in speckle tracking algorithms. In these algorithms, tiny shifts in the acoustic medium are tracked by looking at small time of flight (TOF) differences from acquisition to acquisition. The shifts which give rise to these TOF differences can be on the order of a micrometer, causing arrival differences of just a few hundred picoseconds. Without proper synchronization, these arrival differences would be lost in the jitter.
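
To put these numbers side by side, the short Python sketch below computes the standard deviation of jitter that is uniform over ±10 ns and the one-way arrival-time change caused by a 1 µm shift. The sound speed of 1540 m/s is a typical soft-tissue value that I am assuming here; it is not stated above. The function names are hypothetical.

```python
import math


def uniform_jitter_std(half_width_s):
    """Standard deviation of jitter uniformly distributed over +/- half_width_s."""
    return (2.0 * half_width_s) / math.sqrt(12.0)


def tof_shift(displacement_m, sound_speed_m_s=1540.0, round_trip=False):
    """Arrival-time change for a displacement along the beam.

    1540 m/s is an assumed sound speed; set round_trip=True for pulse-echo
    geometry, where the path length changes by twice the displacement.
    """
    paths = 2 if round_trip else 1
    return paths * displacement_m / sound_speed_m_s


if __name__ == "__main__":
    unsync = uniform_jitter_std(10e-9)   # independent clocks
    sync = 18e-12                        # synchronized clocks (measured value)
    shift = tof_shift(1e-6)              # 1 micrometer, one way
    print(f"unsynchronized jitter std: {unsync * 1e9:.2f} ns")
    print(f"synchronized jitter std:   {sync * 1e12:.0f} ps")
    print(f"TOF shift for 1 um:        {shift * 1e12:.0f} ps")
```

Under these assumptions the unsynchronized jitter is several times larger than the time shift being tracked, while the synchronized jitter is well below it.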



  1. Hello, I plan to make a similar data acquisition board for research, especially in the sonar field. Would a Spartan6 be powerful enough to deserialize the channels from the AFE5801 chip? I want to reduce the cost of the board. By the way, could you estimate the cost of yours? Thanks!

    • My experience with deserializing the bit-stream from the AFE5801 was that the dynamically adjustable IODELAY block proved critical to eliminating bit errors. I believe the Spartan6 has something similar to the Virtex-5 IODELAY so that shouldn’t be a problem. Also, if you are looking at sonar signals you may be able to run the AFE5801 at a lower sampling rate and reduce the timing demands. If you end up only using a single AFE5801 per board you may be able to get by without the variable IODELAY, but it’s a nice tool to have if you start experiencing problems.
      The other thing to consider is the number of DSP48 slices in the FPGA you decide to use. Again, the Virtex5 DSP48 slices are a little different than the Spartan6 DSP48 slices (i.e. 25×18 bit multiplier for the Virtex5 vs. an 18×18 bit multiplier for the Spartan6) but both are very useful. This will have a huge impact on the types and complexity of the signal processing algorithms you are able to run directly on the FPGA.
      The cost to replicate a system like mine is very difficult to estimate. I was lucky to find the FPGA evaluation boards I did. They were almost perfect for this application (lots of differential IO broken out to a high-speed Samtec connector, a gigabit Ethernet PHY on-board, etc…). If I had had to design my own FPGA board with a Virtex5, it would have added greatly to the overall expense. As it was, the major cost drivers were the fabrication and assembly of the ADC boards. Good luck!
