# FPGA Mandelbrot explorer – part 1

I have always been fascinated by fractals with their never-ending geometry and beautiful patterns. One of the most famous and most simple fractals is the Mandelbrot set, first discovered in 1978. It’s reminiscent of nature with its spirals and strange geometry.

The image above is actually plotted on the complex plane. To produce it, start with the equation:

z_{n+1} = z_n^2 + c

c represents a point on the complex plane. Then, the formula should be iterated, with the calculated value being used as the next value of z.

A point is in the Mandelbrot set if the function does not diverge (does not go off to infinity). It can be proved that if the absolute value of z ever is greater than 2, the function will shoot off to infinity. This means that it is easy to produce an image of the set. Points can be coloured differently depending on how many iterations it takes for the function to become greater than 2.

First, I wrote some C code to plot the set, to get an idea of what I’d be doing. This produced the image above. Then, I thought about some ways to speed up the calculation on an FPGA.

First, z and c can be split up into real and imaginary parts. A bit of algebra gives the following two equations:

Re(z_{n+1}) = Re(z_n)^2 - Im(z_n)^2 + Re(c)
Im(z_{n+1}) = 2Re(z_n)Im(z_n) + Im(c)

This is great as we don’t need to bother with any complex numbers. But, we can do better. As described in Bruce Dawson’s blog, the multiplication for the imaginary part can be replaced by squaring, as shown:

Im(z_{n+1}​) = (Re(z) + Im(z))^2 - Re(z)^2 - Im(z)^2 + Im(c)

This follows from the expansion of (Re(z) + Im(z))^2.

This might initially look far more complicated, but in fact can be computed much faster in hardware. To get an idea of why this is the case, let’s look at a method of multiplication – grid multiplication. Let’s multiply two numbers, 25 and 37. To do this, split the numbers up in to their tens and ones digits.

So, to find the answer, we have to add up all of the partial products:

(20\times30) + (20\times7) + (5\times30) + (5\times7) = 925

Now, let’s square 25:

The table is symmetrical along its leading diagonal. We only need to work out 20\times5 once. This saves us some processing power.

(20\times20) + 2\times(5\times20) + (5\times5) = 625

Here’s the C code with the squaring only modification:

// Arguments: C_real, C_imaginary
unsigned int mandelbrot(double cr, double ci) {
// Z_real, Z_imaginary
double zr = 0, zi = 0;
double zr_squared = 0, zi_squared = 0;

int n; // Number of iterations

// Loop until no longer bounded
for (n = 0; zr_squared + zi_squared <= 4 && n < LIMIT; ++ n) {
// Calculate new values of z
zi = pow(zr + zi, 2) - zr_squared - zi_squared + ci;
zr = zr_squared - zi_squared + cr;

zr_squared = pow(zr, 2);
zi_squared = pow(zi, 2);
}

return n - 1;
}

So, I now had an algorithm for plotting the Mandelbrot set. Now I had to think about how to translate it to my FPGA, which is a Pynq-Z2 development board. The FPGA has 220 DSP cores, which can very quickly perform a multiply-accumulate operation (P = B \times (A + D) + C). This made it perfect to use in my design.

I wanted my design to be fully pipelined. So, I started out my sketching what I wanted this pipeline to look like. This pipeline represents the equations above.

Each square represents a register stage, and greyed boxes represent squaring inside a DSP core. I wasn’t exactly sure how many pipeline stages the multipliers would need so I made sure to remember that more registers may needed to be added for the last two rows.

The 4 adders and 2 subtracters were just going to be implemented in the FPGA fabric. So, I implemented this pipeline in Vivado. To simplify things initially, I used the Xilinx ‘Multiplier’ IP core instead of custom squaring logic.

Because floating-point logic on an FPGA is a pain, I decided to use fixed point. The DSP slice contains a 18×25 signed multiplier and the Multiplier IP core can automatically generate bigger multipliers using multiple DSP slices. The largest squarer I could make with 4 DSP slices was 34×34, so I went with 34 bit fixed point, with 4 integer digits and 30 fractional digits. Of course, I planned to replace the Multiplier core as I was wasting DSP slices.

I wanted to raise the precision in the future, but figured I would start with this to make it simple. I therefore used generics for all of the word lengths throughout the pipeline module, so I could increase them easily in the future. Once I’d written the module I tested it in simulation.

After some testing and fixing errors, it was reassuring to see an output pop up some number of clock cycles after the input.

In the next post, I will cover the display unit and memory.