I’ve recently developed an interest in Software Defined Radio (SDR). The idea behind SDR is to replace as much radio hardware as possible with software-based algorithms. A simple example would be the envelope detection of an AM signal. Where a hardware-based demodulator may use a diode and a capacitor, an SDR would digitize the modulated signal and then use digital signal processing techniques to demodulate it (e.g. convert the signal to its analytic representation, take the magnitude, and remove any DC bias). I’m most interested in SDR designs that manage to get the ADC very close (“signal chain”-wise) to the antenna, relying on minimal front-end processing before digitizing the signal. This increases the data processing burden, but it allows for greater flexibility in how the signal is processed. Based on this interest, I decided to see if I could make a basic SDR that relied only on very limited external hardware but would still be capable of receiving interesting signals.

As part of my research at the University of Minnesota, I use an AFE5801 from Texas Instruments to digitize ultrasound signals. This is a single IC containing 8 separate 12 bit ADCs, each with a variable gain amplifier and software selectable anti-aliasing filter. The signal path of a single channel is shown below.

The signal is first run through the variable gain amplifier, where it can be amplified by up to 30 dB. It is then optionally passed through an anti-aliasing filter with a cutoff frequency of 7, 12, or 14 MHz before being digitized at a rate of up to 60 MSPS. In general this ADC would not be a good candidate for an SDR system: the analog front end limits its overall bandwidth to the HF band. For a high quality SDR system, I’d like at least a 14 bit ADC, a sampling frequency over 100 MSPS, and a large analog bandwidth (~700 MHz) so that I could undersample signals above Nyquist. But since I’m trying to make a very simple SDR, and I have this ADC available, I decided to use it.

The ADC is only part of any SDR; the other part is the FPGA. I decided on a Xilinx Virtex-5 FPGA on an evaluation board from Avnet. This FPGA has a hardware MAC which, along with the PHY on the evaluation board, allows for a Gigabit Ethernet (GbE) connection between the FPGA and the computer. I think GbE is an ideal connection for an SDR. It’s common, high-bandwidth (1 Gb/s, or 125 MB/s), and can support easy to use protocols (e.g. UDP). Even sampling at full speed with this 12 bit ADC (60 MSPS × 12 bits = 720 Mb/s), the GbE connection would have enough bandwidth to accommodate the full undecimated data stream from one channel. I did play around with capturing data straight from the ADC and streaming it back to my computer. This worked well, but even a few seconds of data collection led to very large files. It also felt like a waste of FPGA power. So, I decided to implement an IQ demodulator followed by a downsampler to select a frequency of interest and lower the overall data rate. I chose an overall decimation factor of 1024 to give me a bandwidth of about 48.8 kHz (I typically run the ADC at 50 MSPS instead of the full 60 MSPS). This decimation is performed in two steps. First, a Cascaded Integrator-Comb (CIC) filter decreases the sampling rate by 512. This is followed by a compensation filter and an additional 2X decimator. The reason for this cascade of operations will be explained below.
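The rate arithmetic in this paragraph is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and protocol overhead on the Ethernet link is ignored.

```python
# Back-of-the-envelope rates for the receiver described above.
ADC_BITS = 12
FULL_RATE_SPS = 60_000_000      # maximum AFE5801 sample rate
RUN_RATE_SPS = 50_000_000       # rate the ADC is typically run at
CIC_DECIMATION = 512
FIR_DECIMATION = 2

def raw_bit_rate(sample_rate_sps, bits_per_sample):
    """Payload bit rate of one undecimated ADC channel, in bits per second."""
    return sample_rate_sps * bits_per_sample

def output_rate(sample_rate_sps, *decimations):
    """Complex sample rate after a chain of decimators, in samples per second."""
    rate = sample_rate_sps
    for d in decimations:
        rate /= d
    return rate

raw = raw_bit_rate(FULL_RATE_SPS, ADC_BITS)
out = output_rate(RUN_RATE_SPS, CIC_DECIMATION, FIR_DECIMATION)
print(f"raw stream: {raw / 1e6:.0f} Mb/s of a 1000 Mb/s link")
print(f"decimated IQ rate: {out / 1e3:.1f} kS/s")
```

Because the output is complex (I and Q), a sample rate of about 48.8 kS/s covers about 48.8 kHz of spectrum centered on the tuned frequency.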

The DSP processing performed in the FPGA is shown in the figure below along with the data rates and bit widths at each stage.

The signal processing chain was developed with Xilinx’s ISE Design Suite and the core generator application. If you’re not familiar with Xilinx’s development tools, the core generator allows you to customize IP cores from Xilinx. The multipliers, numerically controlled oscillator, CIC decimator, and the polyphase FIR filter were all created with the core generator application and then stitched together in VHDL.

Overall, the design is relatively straightforward. The data is first taken in from the ADC as a serial LVDS data stream, with data presented to the FPGA on both the rising and falling edges of a 300 MHz clock. Although it’s not shown in the diagram above, the data is passed through an IODELAY block before deserialization. This block passes the data stream through an adjustable delay line, allowing the data stream to be dynamically delayed in increments of about 80 ps. When combined with preprogrammed output patterns from the AFE5801, this block can be a great help in achieving timing closure.

#### IQ Demodulation

The first step in the IQ demodulation is generating a sine and a cosine wave at the frequency of interest. Luckily, the DDS core from Xilinx makes this straightforward. The core takes in a fixed-point value that specifies the wave frequency as a fraction of the sampling frequency, and outputs both a sine and a cosine wave at the specified frequency.
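To make the "fraction of the sampling frequency" idea concrete, here is a small Python model of a phase-accumulator oscillator. It is a sketch of the general technique, not the Xilinx DDS core: the 32 bit accumulator, the 12 bit output width, and the names `phase_increment` and `nco` are all assumptions for illustration.

```python
import math

PHASE_BITS = 32          # assumed accumulator width, not taken from the post
AMPLITUDE_BITS = 12      # assumed output width, matching the 12 bit ADC

def phase_increment(f_out_hz, f_sample_hz, phase_bits=PHASE_BITS):
    """Tuning word: the output frequency as a fraction of the sample rate."""
    return round(f_out_hz / f_sample_hz * (1 << phase_bits)) % (1 << phase_bits)

def nco(f_out_hz, f_sample_hz, n_samples):
    """Yield (cos, sin) pairs as signed integers from a phase accumulator."""
    inc = phase_increment(f_out_hz, f_sample_hz)
    scale = (1 << (AMPLITUDE_BITS - 1)) - 1      # 2047 for 12 bits
    phase = 0
    for _ in range(n_samples):
        angle = 2.0 * math.pi * phase / (1 << PHASE_BITS)
        yield round(scale * math.cos(angle)), round(scale * math.sin(angle))
        # The accumulator wraps naturally, just like a hardware register.
        phase = (phase + inc) & ((1 << PHASE_BITS) - 1)

samples = list(nco(3.995e6, 50e6, 8))
```

A real DDS would use a lookup table rather than calling `cos` and `sin`, but the tuning word calculation is the same idea.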

The output of the DDS is fed into two multipliers. The multiplication of the incoming signal with the cosine wave forms the in-phase signal, while the multiplication with the sine wave forms the quadrature signal. Each multiplication results in bit growth that must be dealt with in some manner. One option is to just accommodate the bit growth and expand the width of the registers throughout the rest of the signal chain. The other option, the one used here, is to select a subset of bits from the full precision output. A general description of which bits and how many bits to select is probably beyond the scope of a blog post, but for this application I chose to keep the bit width constant (I wanted to investigate the effects of various rounding schemes, and keeping only 12 bits ensured the rounding played a crucial role in the overall system performance) and to use the 12 most significant product bits (the sine and cosine factors are guaranteed to remain between ±1). The nice thing about the FPGA is that you have a lot of freedom in determining the overall bit growth. If you’d like to keep more than 12 bits after the multiplier (and there are some good reasons why you may want to do this…) you can. The final issue, with regard to the multiplier output, is whether to truncate the result or round it down to 12 bits. This decision greatly impacts the overall performance of the system and I’ll talk about it further down.
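As a rough Python model of the mixer, the sketch below multiplies one sample by the oscillator pair and keeps the top 12 bits by truncation. It assumes both inputs are signed 12 bit integers and that "the 12 most significant product bits" means the upper half of a 24 bit product; the function name `mix_truncate` is hypothetical.

```python
def mix_truncate(sample, lo_cos, lo_sin, width=12):
    """Multiply one signed sample by the LO pair and keep the top `width` bits.

    Both inputs are assumed to be signed `width`-bit integers, so each full
    product fits in 2 * width bits. An arithmetic right shift by `width`
    keeps the most significant half and truncates (floors) the rest.
    """
    i_full = sample * lo_cos
    q_full = sample * lo_sin
    return i_full >> width, q_full >> width

i_pos, q_pos = mix_truncate(1000, 2047, 0)
i_neg, q_neg = mix_truncate(-1000, 2047, 0)
```

Note that the positive and negative cases are not mirror images: truncation always moves the result toward negative infinity, which is the source of the bias discussed in the rounding section.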

#### CIC Decimator

The next stage in the signal chain is the CIC decimator. The CIC decimator manages to filter and decimate the signal without the use of any multipliers, making it an attractive signal processing tool. It uses a recursive implementation of a moving average filter to serve as an anti-aliasing filter. By utilizing the Noble Identities, large rate changes can be implemented with very little resource usage. Besides the rate change, there are two adjustable parameters for the CIC filter: the number of stages and the differential delay. The number of stages controls how many moving average filters are used. The more filters used, the greater the stop band rejection. The differential delay controls the placement of the nulls in the filter’s response. For this design I used a 5 stage CIC filter with a differential delay of 2.
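The structure is compact enough to model in plain Python. The sketch below is a behavioral model of a generic CIC decimator (integrators at the input rate, combs at the output rate), not the Xilinx core; the function name and argument names are made up for illustration.

```python
def cic_decimate(samples, rate, stages=5, delay=2):
    """Integer CIC decimator: `stages` integrators, downsample, `stages` combs."""
    integrators = [0] * stages
    comb_history = [[0] * delay for _ in range(stages)]
    out = []
    for n, x in enumerate(samples):
        # Integrator cascade runs at the input rate.
        acc = x
        for s in range(stages):
            integrators[s] += acc
            acc = integrators[s]
        # Keep one sample in every `rate`.
        if (n + 1) % rate:
            continue
        # Comb cascade runs at the output rate: y[m] = v[m] - v[m - delay].
        for s in range(stages):
            delayed = comb_history[s].pop(0)
            comb_history[s].append(acc)
            acc -= delayed
        out.append(acc)
    return out

# A constant input settles to the DC gain, (rate * delay) ** stages.
settled = cic_decimate([1] * 160, rate=8, stages=3, delay=2)
```

Python integers never overflow, so this model hides the register growth. In hardware the DC gain of `(rate * delay) ** stages` means the registers need roughly `stages * log2(rate * delay)` bits beyond the input width, which is 50 extra bits for 5 stages and a rate-delay product of 1024.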

The one problem with the CIC filter is its passband characteristics: it suffers from severe droop as you move away from DC.
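The droop can be estimated from the standard CIC magnitude response, |sin(πRMf/fs) / (RM·sin(πf/fs))|^N, where R is the rate change, M the differential delay, and N the number of stages. The Python sketch below evaluates it for the 5 stage, 512x, delay-of-2 filter at a 50 MHz input rate; the function name is hypothetical.

```python
import math

def cic_gain_db(f_hz, f_sample_hz, rate, stages=5, delay=2):
    """CIC magnitude response in dB, normalized to 0 dB at DC."""
    if f_hz == 0:
        return 0.0
    x = math.pi * f_hz / f_sample_hz
    ratio = math.sin(rate * delay * x) / (rate * delay * math.sin(x))
    if ratio == 0:
        return float("-inf")     # exactly on a null of the response
    return 20.0 * stages * math.log10(abs(ratio))

for f_khz in (5, 10, 21):
    print(f_khz, "kHz:", round(cic_gain_db(f_khz * 1e3, 50e6, 512), 2), "dB")
```

Running it shows the attenuation growing quickly toward the edge of the band of interest, which is the droop the compensation filter has to undo.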

The fix for this passband droop is to apply a compensation filter to the CIC output such that the combined response produces a flat passband.

#### Rounding

Before talking about the compensation filter, I need to talk about the effects of rounding vs. truncating the output of these filters. Imagine if we were only interested in the most significant 16 bits of a 32 bit word. The simplest way to extract these 16 bits would be to just ignore the least significant 16 bits and take only the upper 16; this is truncation. The problem with truncating is that we make no use of the information contained in the least significant bits. A smarter approach is to round the upper 16 bits up or down by one LSB based on the lower 16 bits. Not only does this cause the selected 16 bits to be a more accurate representation of the original word, it also eliminates bias. Bias shows up as a DC component in our signal spectrum (a negative component for two’s complement representation), and depending upon your application, can have significant detrimental effects. In an IQ demodulator, the frequency of interest is brought down to baseband, so any DC offset will show up as a large spike right in the middle of the spectrum.
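The bias is easy to measure by brute force. The Python sketch below uses a 12 bit word reduced to 8 bits (rather than 32 to 16) so that every possible input can be enumerated, and it uses the simplest form of rounding, where ties always go up. The function names are illustrative only.

```python
from fractions import Fraction

DROP = 4  # number of least significant bits discarded (12 bit -> 8 bit)

def truncate(x):
    """Discard the low bits; for two's complement this always rounds down."""
    return x >> DROP

def round_half_up(x):
    """Add half an output LSB, then discard the low bits."""
    return (x + (1 << (DROP - 1))) >> DROP

def mean_error(quantizer):
    """Average of (quantized - exact) in output LSBs over all 12 bit inputs."""
    inputs = range(-2048, 2048)
    total = sum(Fraction(quantizer(x)) - Fraction(x, 1 << DROP) for x in inputs)
    return total / len(inputs)

print("truncate:", float(mean_error(truncate)))
print("round   :", float(mean_error(round_half_up)))
```

Truncation leaves a negative offset of nearly half an LSB. Rounding with ties always going up removes almost all of it, but a small positive residue remains from the tie case, which is why the tie-breaking rule matters.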

The figures below show how the output spectrum is affected by truncating vs. rounding at different stages. These plots were generated by listening to a 4.0 MHz signal with an IQ frequency of 3.995 MHz, placing the signal of interest 5 kHz away from the DC component. In addition, the compensation filter was bypassed and the CIC decimation was increased from 512 to 1024.

Truncating the result at the output of the multiplier leads to very serious degradation of the signal. Not only do we see a very large DC component, we also see an artifact at -5 kHz. It’s not surprising that truncation at this stage would cause such a noticeable effect: by truncating to 12 bits there is very little margin for error.

In comparison to the multiplier, the truncated CIC output exhibits slightly better behavior. The artifact is gone, but the DC bias remains.

It’s only by rounding at each stage that the DC bias is eliminated.

Rounding can be incorporated into some of the Coregen IP during the initial configuration. In doing this, you allow the Xilinx IP to choose which bits from the full precision output will be used. This can be a problem since Xilinx doesn’t know what types of inputs you are placing on your signals. For instance, the AFE5801 scales its 12 bit word to represent a 2 Volt peak-to-peak signal; however, I know that my signal level will never exceed 1 Volt peak-to-peak. I can use this information in choosing which bits to keep. There are also times when you may want to dynamically configure which bits you keep and which you discard. For this reason, I like to configure the DSP blocks to produce the full precision output, and then I round and select which bits to use myself. The downside to this, in addition to increased resource consumption, is that I must implement the rounding myself. Luckily, this isn’t too hard.

The only challenge in rounding is to figure out if a number like 12.XXX is closer to 12 or 13. The simplest way to do this is to add 0.5 to the number and then take the integer part. If XXX is less than 0.5, adding 0.5 and taking the integer will give 12. If XXX is greater than 0.5, adding 0.5 and taking the integer will give 13. The only tricky situation is if XXX is equal to 0.5 and thereby lies an equal distance between 12 and 13. There are a number of different schemes to deal with this situation: you can always round up, always round down, round towards the even number, round towards the odd number, or randomly choose which number to round to. For this system, I chose to randomly round up or down. The figure below shows how this is done for a 12 bit word rounded to an 8 bit word.

The light blue signifies the bits we’d like to keep, and the red represents the bits which will be eliminated. The gray bits signify the output of the rounder, and the brown bits are ignored. The green represents a single bit which has equal probability of being a “1” or a “0”. If the bits in red are equal to anything other than “1000”, the random bit will have no influence on the rounding. When the bits in red are equal to “1000”, a random bit equal to 1 will cause the rounder to round up, and a random bit equal to 0 will cause the rounder to round down. The random bit is supplied by a 24 bit linear feedback shift register (LFSR). This LFSR will generate a pseudo-random bit stream with a period of 2^24 - 1. At 50 MHz, this will cause the bit stream to repeat every 335 ms. By utilizing the carry in of a two port adder, the above procedure can be implemented with a single adder. One port is connected to the 12 bit signal, one port is connected to the constant “0111”, and the carry in bit is connected to the random bit stream.
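Here is a small Python model of that adder trick, together with a 24 bit LFSR to supply the tie-breaking bit. The post does not say which feedback taps were used, so taps 24, 23, 22, 17 (a commonly tabulated maximal-length choice) are an assumption, as are the function names and the seed.

```python
def lfsr24(seed=0xABCDEF):
    """Yield pseudo-random bits from a 24 bit Fibonacci LFSR.

    Taps 24, 23, 22, 17 are an assumption (a commonly tabulated
    maximal-length choice); the post does not say which taps were used.
    """
    state = seed & 0xFFFFFF
    if state == 0:
        raise ValueError("an all-zero seed locks up an XOR LFSR")
    while True:
        bit = ((state >> 23) ^ (state >> 22) ^ (state >> 21) ^ (state >> 16)) & 1
        state = ((state << 1) | bit) & 0xFFFFFF
        yield bit

def round_random_tie(word, tie_bit, drop=4):
    """Round by adding 0b0111 plus a carry-in, then dropping `drop` bits."""
    constant = (1 << (drop - 1)) - 1          # "0111" when drop == 4
    return (word + constant + tie_bit) >> drop

bits = lfsr24()
# 0b0000_0101_1000 is exactly 5.5 output LSBs, so only the random bit decides.
ties = [round_random_tie(0b0000_0101_1000, next(bits)) for _ in range(8)]
```

For any input that is not an exact tie, the carry-in cannot change the result: adding `0111` alone is already enough to carry when the dropped bits exceed `1000`, and adding `0111` plus one is still not enough when they are below it.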

This same structure can be used to implement a variety of rounding strategies. If the green bit is tied to the least significant blue bit, the rounder will always break a rounding tie by rounding towards the even number. Tie the green bit to an inversion of the least significant blue bit and the rounder will break ties by rounding towards the odd number.
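The even and odd variants only change where the carry-in comes from, which the following Python sketch models. The function name and the `mode` argument are hypothetical conveniences, not part of the original design.

```python
def round_tie(word, mode, drop=4):
    """Round `word` by `drop` bits, breaking ties toward "even" or "odd"."""
    kept_lsb = (word >> drop) & 1             # least significant kept bit
    if mode == "even":
        carry_in = kept_lsb                   # tie the carry-in to that bit
    elif mode == "odd":
        carry_in = kept_lsb ^ 1               # or to its inversion
    else:
        raise ValueError("mode must be 'even' or 'odd'")
    constant = (1 << (drop - 1)) - 1
    return (word + constant + carry_in) >> drop

# 88 / 16 = 5.5 and 104 / 16 = 6.5 are both exact ties.
even_results = (round_tie(88, "even"), round_tie(104, "even"))
odd_results = (round_tie(88, "odd"), round_tie(104, "odd"))
```

With ties going to even, both 5.5 and 6.5 land on 6; with ties going to odd they land on 5 and 7 respectively.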

#### Compensation Filter

The compensation filter is the final data processing step before the FPGA sends the data to the computer. The goal of the compensation filter is to even out the passband and create a sharp cutoff frequency. It accomplishes this by first filtering the data with an inverse sinc-like filter with an appropriate frequency cutoff, and then decimating by a factor of 2.
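One way to picture the target response is as the reciprocal of the CIC droop inside the passband and zero outside it. The Python sketch below builds such a target on a frequency grid, using the 21 kHz cutoff quoted later in the post. It is not the author's m-code: the hard cutoff, the grid size, and the function names are assumptions for illustration, and a practical design would add a transition band.

```python
import math

def cic_magnitude(f_hz, f_sample_hz, rate, stages=5, delay=2):
    """CIC magnitude response normalized to 1.0 at DC."""
    if f_hz == 0:
        return 1.0
    x = math.pi * f_hz / f_sample_hz
    return abs(math.sin(rate * delay * x) / (rate * delay * math.sin(x))) ** stages

def desired_compensation(f_hz, cutoff_hz=21e3):
    """Target gain: inverse of the CIC droop in the passband, zero above it."""
    if f_hz > cutoff_hz:
        return 0.0
    return 1.0 / cic_magnitude(f_hz, 50e6, 512)

# Sample the target on a grid up to the Nyquist frequency of the CIC output.
cic_output_rate = 50e6 / 512
grid = [k * cic_output_rate / 2 / 64 for k in range(65)]
target = [desired_compensation(f) for f in grid]
```

A frequency and magnitude pair like `grid` and `target` is the kind of input a frequency-sampling design routine takes.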

The figure below shows the desired and realized compensation filter.

The Matlab command **fir2**, with a Kaiser window, was used to generate a 128 tap filter which approximated the desired frequency response. This filter has an approximate cutoff frequency of 21 kHz, giving the overall system a usable bandwidth of just over 40 kHz. The m-code used to generate this filter can be found here.

The filter coefficients must be quantized before they can be used in the FPGA. I chose to use 16 bit coefficients with a fixed point representation using three integer bits and twelve fractional bits. I decided to sacrifice a little dynamic range to ensure that the DC response of the filter was a power of two. Doing this ensured that any scaling introduced by the compensation would be a power of two and therefore could be removed by simply shifting bits.
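The quantization step might look something like the Python sketch below. It assumes the DC gain is scaled down to the next lower power of two before quantizing, and that the 16 bit word is one sign bit plus the three integer and twelve fractional bits. The five example taps are invented purely for illustration and are not the real 128 tap filter.

```python
import math

FRACTION_BITS = 12
TOTAL_BITS = 16

def scale_dc_gain_to_power_of_two(coefficients):
    """Rescale so the coefficient sum (the DC gain) is the next lower power of two."""
    dc_gain = sum(coefficients)          # assumed positive for a lowpass filter
    target = 2.0 ** math.floor(math.log2(dc_gain))
    return [c * target / dc_gain for c in coefficients], target

def quantize(coefficients, fraction_bits=FRACTION_BITS, total_bits=TOTAL_BITS):
    """Round coefficients to signed fixed point and saturate to the word size."""
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(c * (1 << fraction_bits)))) for c in coefficients]

example_taps = [-0.1, 0.3, 0.9, 0.3, -0.1]   # hypothetical, for illustration only
scaled, dc_target = scale_dc_gain_to_power_of_two(example_taps)
fixed = quantize(scaled)
```

Rounding each tap individually can leave the integer sum a few counts away from the exact power of two, so it is worth checking `sum(fixed)` after quantizing.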

The figure below shows the results of a simple test I ran to judge the effectiveness of the compensation filter.

The solid blue line represents the calculated frequency response for a 5 stage, decimate-by-1024 CIC filter with a differential delay of 2. The dashed black line represents the measured response (taken at 1 kHz intervals) with the 1024x CIC filter and no compensation filter. The dashed red line represents the frequency response for the combination of the 512x CIC and the 2x compensation filter. Clearly the cascade of the CIC and the compensation filter offers a huge improvement in the overall frequency characteristics.

#### Data Offload

Two buffers are used to capture the data out of the compensation filter and offload it through the GbE. The double buffering allows one buffer to collect data while the other buffer streams its data to the computer. Every 1500 samples the buffers swap. The just-collected data is encapsulated in a UDP packet ~6000 bytes long. This packet is longer than the standard 1500 byte Ethernet MTU, but my GbE card supports jumbo frames, so I can get away with it.
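On the computer side, each packet has to be split back into samples. The Python sketch below assumes a layout that the post does not spell out: 1500 samples of 4 bytes each, with a 16 bit I value followed by a 16 bit Q value in little-endian order. Only the 1500 sample count and the ~6000 byte size come from the text; the byte order, the field order, and the function name are guesses for illustration.

```python
import struct

SAMPLES_PER_PACKET = 1500
BYTES_PER_SAMPLE = 4       # assumed: 16 bit I followed by 16 bit Q

def unpack_iq(payload):
    """Split a UDP payload into a list of complex IQ samples."""
    if len(payload) != SAMPLES_PER_PACKET * BYTES_PER_SAMPLE:
        raise ValueError("unexpected payload length")
    # "<" selects little-endian, "h" is a signed 16 bit integer (both assumed).
    values = struct.unpack(f"<{2 * SAMPLES_PER_PACKET}h", payload)
    return [complex(i, q) for i, q in zip(values[0::2], values[1::2])]

# Round-trip a synthetic payload to exercise the parser.
fake = struct.pack(f"<{2 * SAMPLES_PER_PACKET}h", *([100, -200] * SAMPLES_PER_PACKET))
iq = unpack_iq(fake)
```

In practice the payload would come from a UDP socket's receive call instead of `struct.pack`, with the receive buffer sized for jumbo frames.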

#### That’s all…

That pretty much sums up the system so far. I’ll try to collect and post some of the results in a few days. I’ve been using a simple dipole antenna connected directly to the ADC input with reasonable success. I’ve been able to receive local AM radio stations with no problems, and I even managed to receive the NIST atomic clock signal at 10 MHz.

Moving forward, I’d like to implement this receiver structure for each of the 8 channels on the AFE5801 to create a basic phased array. I should have plenty of bandwidth in the GbE; I’ll just have to play around with the resource consumption to ensure I can fit 8 signal chains on a single FPGA. But, that’s for another day…
