This post explains how to parallelize the computational task of the Mandelbrot set and generate a terminal tool to generate images.
warning
You should read the previous post to understand the context of this post.
We left it after improving drastically the performance of our code by running benchmarks and drawing flame graphs with pprof. The problem is we almost hit the limit of the optimization following this line. Now we are going to take advantage of the tools provided by Go to implement a parallel computation that will reduce the total computational time.
Introduction
The first thing to do is to identify the parts of the computation that can be calculated independently from the others. Luckily, in the Mandelbrot set, the calculation if each point belongs to the set, is completely independent from other points. This makes really simple to split the work over multiple cores.
Next think to ask ourself is, is it worth to spawn a goroutine for each point? Reading documentation about go internals, this can be done, but it is not a good practice. This is because the task that we are performing is CPU intensive and this will effectively limit the maximum number of goroutines running to the maximum number of available cores in the system. So, theoretically, the maximum number of routines should be the number of cores.
Knowing that this encourages a producer-consumer pattern. The next questions is, what is the optimal way to split the tasks? we know that the calculation for each point is the minium part that can be calculated independently. Should we group some points as a single task? if yes, how many? I think the best way to answer this question is to benchmark the code.
Info
The idea that I’ve in mind comes from how Blender renders complex images. here is a gif of a render divided in little squares.
The Mandelbrot Picture Struct
To hold the data together and to coordinate the calculation, I’ve created a new structure called Picture. Then, we have a mandelbot.Point
, as the minimal computational unit. mandelbot.Area
, as a group of Points that will be calculated together. And mandelbot.Picture
as a group of Areas.
![Picture Description](./Picture Description.svg#center)
The previous picture is a visual representation of what the variables of the Picture structure actually mean. Blue ones are in pixels and represent the final image that will be build. The red ones are in float point precision and represent the mandelbrot area. The axis are just there as an orientation
|
|
The strategy of storing 2D data in a single array is also being used here with exactly the same approach. To initialize the data, I also have the picture.Init
function that initializes the picture and all the recursive areas. This function performs the calculation to assign to each area the corresponding section of the complex area that must render. Based on the x,y position and the ChunkSize, it is easy to calculate the TopLeft and BottomRight of each area. The only tricky thing is that to move from TopToBottom, we need to subtract instead of adding. thats why you see the minus sign next to the imaginary parts. Also, after creating the Area with the calculated values, there is a call to area.Init
to guarantee that everything is initialised.
|
|
Calculation
Let’s start with a simple loop to calculate all the areas inside the Picture. It will look like this.
|
|
The firs way is to decide how are we going to split the areas slice for the workers. My first attempt was to slice the slice intro subparts with the same number of elements, but this approach has a few drawbacks. First, when the operation len(p.areas)/workers generates a remained, it becomes tricky which worker should do it. Also, not all the areas require the same number of iterations, so this approach generates unbalanced workers. This is how the code looks like
|
|
Task Synchronization With sync.WaitGroup
The firs issue with this code is that the Calculate function no longer blocks the code and it exist before the computation finishes. To address this issue, we have sync.WaitGroup
. A wait group is a simple structure that keeps track of the works that are being executed in parallel and allows easy synchronization to wait for all of them to finish. After creating the wg := &sync.WaitGroup{}
, you must call to wg.Add
with the number of expected workers and then, call wg.Done
when each of the workers finish. This allows to wait for all of them to exit with a simple call to wg.Wait
|
|
Work Queue
I’m not really happy with the approach of splitting the work for the various reasons I’ve described before. In the second attempt, I decided to generate queue with all the areas pending to be processed and the workers just pick new jobs from there until all are done. Let’s create the function to initialize the queue based on a channel.
|
|
Given a workCount, this function returns a channel that returns sequentially the numbers from 0 to workCount - 1. This fits perfectly to the indexes of a slice and this is how we are going to use it. First, we have to modify our workers to accept the work from a channel.
|
|
The range function used over a channel, returns elements until the channel is closed. This is really handy to implement the work queue. The code looks even easier than before. Let’s see all together.
|
|
Work Progress Report
Remember when we tried to implement progress reporting using channels and this drastically reduced the performance? well, It was clearly over engineered to report every iteration of the calculation. But with this new structure, it would be easy and cheap to report each time one of the workers finish the calculation of an area. This will enable potential consumers to start drawing it or just notify us of the global process. I will be using the same approach as the workQueue. The calculate function will return a channel that reports the updates. To keep things clear, I will also create the method picture.CalculateAsync
to create the channel for the consumer.
|
|
Calculation Execution Context
The doneIndex channel carries the index of the Area that has been finished. The consumer only needs to wait for a number to arrive, and then, get the corresponding Area from the Picture. We will see this code later. Now we have to add one last thing to this code. The calculation can take a long time or we may require to stop it before finishing. To implement this, we have the context.Context
. From the docs A Context carries a deadline, a cancellation signal, and other values across API boundaries. let’s see how the code looks like.
|
|
So the tricky thing here is that the workQueue now has two options. A. Publish a new work to be done to the next
channel. B. receive a ctx.Done
signal and close the next
channel. When B happens. this will stop the workers after they finish the current task. This is not the ideal situation because an external user will expect the function to return almost immediately after context cancellation. The problem is that achieving this, can be a little more trick because we cannot just cancel a running goroutine. Also, we cannot close the doneIndex
channel until all the workers have finished their work because, if the channel is closed, the workers will panic when publishing the index after calculation. Nevertheless, It is possible to handle such case with a similar select statement to the one used in the workerQueue
and adding another go routine in the calculate function to listen for the wg.Wait()
. But I think this is too complex for this case. so I will keep it like this for now.
Configuration And Benchmarks
We have everything ready run our code in parallel, but there are a lot of different setting configurations that can affect the performance of the calculation. The best way to find the appropriate configuration is to run benchmarks and see where the performance is better.
This is the list of parameters that we can modify to change how the work is going to be processed by the workers.
- Image dimensions
- HorizontalImageChunks
- VerticalImageChunks
- ChunkImageSize
- Workers
To keep a constant image size, the following operation must be constant.
$$HorizontalImageChunksVerticalImageChunksChunkImageSize^2$$
Now we need to choose an image size and try all the combinations of these values. I’ve started with an image of 1024x1024 in the Mandelbrot area topLeft:(-1.401854499759, -0.000743603637), bottomRight: (−1,180199459759,−0,222398643637)
. The most fragmented case is when I divide the image in 1024x1024 areas of 1 pixel and the less fragmented is just 1 area of 1024x1024 pixels. In the following graph you can see the benchmark results for all the power of 2 combinations from the most fragmented to the less fragmented. In the X axis, the numbers of the previous multiplication. in the Y axis, the s/op returned by the benchmark. The Y axis is in logarithmic scale to improve the visualization. You can hover the mouse over any point to see the exact value. The horizontal lines represent the optimal speed based on the 1 worker benchmark. is expected to get x2 x3 x4 etc… speed for each extra worker. You also need to know that my computer has 6 cores.
There a few interesting things in this graph.
- If the image only contains 1 area, the benchmark is independent of the number of cores. You can see this at the right point of the graph.
- If the image is divided in too many areas. The performance is deteriorated for all the setups. This is the left part of the graph.
- All the lines have a parabolic shape with a minimum in 32x32x32. This is the best setup for my computer.
- Selecting the minimum point, the performance is doubled almost perfectly for each worker added until reaching the system limit of 6 cores.
- The performance still increases after 6 workers, but it never reaches the x6 speed.
- The performance in 512x512x2 is the same after the 2nd worker. I think this is because not all the sections in the set are equally expensive to calculate. The hardest one is closer to the right top because is almost all black.
Here is the code that runs the benchmarks.
|
|
Click here to see all code
|
|
It is important that the caller must read the channel done
until the calculation ends because, if there is no receiver, the channel is blocked.
The saveImage
is a flag passed to the test command to keep the generated image for each benchmark. This allows to visually verify that the benchmark is actually rendering the right image.
This is the section selected for the benchmarks. I’m using a different method to give color to the set this time. I’m using a color palette with 7 colors and I assign it based on the number of iterations. color[iterations % len(color)]
This is awesome because always give the same color to a section independently of the zoom.
Mandelbrot CLI
Let’s create a CLI to generate awesome images of any given area.
To ease the final usage, I’ve simplified the input parameters to some more human-friendly. The default values, generate a global view of the Mandelbrot in a 1920x1920 image.
|
|
With this flags, I directly call a mandelbot.NewPicture
function that accepts this parameters and handles the conversion to the original parameters needed by the mandelbot.Picture
. The only thing to adjust is the chunkSize based on the number of divisions to ensure that we draw the same area. The drawback is that we need to carefully select the number of divisions or it will be impossible to split the area in a whole number of divisions.
|
|
After calling pic.Init()
, we can call pic.CalculateAsync()
to get the doneIndex
channel that will notify us about the progress. As we need a little bit of setup with a context and a picture. I will wrap the calculation in a function Calculate
that accepts the pic
and returns an image.RGBA
ready to be exported.
|
|
The mandelbot.Picture
provides utility method that return information about the final image dimensions. This way, I can prepare the final image. In the for loop we wait for updates from the calculation and call a function to paint the finished Area into the final image. The method picture.GetImageOffserFor
accepts an index and returns the x,y
positions in the resolution of the final image. If the context is cancelled. the Calculate
function returns a partial image and the error that canceled the context. This way, if the context is cancelled. The function returns almost immediately. This could be a serious bug because workers will never be able to finish their work. But it is not a problem here because the program will just save the image and exit. I will solve this in future posts.
Let’s take a look into the paintAreaInImage
function. It’s purpose is to iterate over all the pixels in the area, assign them a color, and add it to the final Image. The function getColor
accepts (point mandelbrot.Point, palette []color.RGBA, maxIterations int, maxIterationsColor color.RGBA)
The color is chosen by dividing the iterations over the len of the palette. This ensures that we the next color for every iterations and that we loop back to the first one when the length is reached. The maxIterations is needed to paint the “Unknown” area of a different color. In this case, Black color.
|
|
When the function Calculate
returns, we have a img object that must be saved to a file. If you pay attention to the parameters accepted by the program, it says that accepts png and jpeg formats. To do so, we can use the function filepath.Ext
to find the extension of the file name. As go already provides codecs for jpeg and png. The code is straight forward.
|
|
Click here to see all code
|
|
Conclusion
This has been really interesting. I was not expecting a x6 speed but looks like I was wrong. I think that the optimal number of divisions comes from a tread-off between the capacity to split the work over multiple cores and the extra time required to send data over channels. I would like to benchmark the code again and see where is the bottle neck for each configuration.
I think that the next project will be to create a mandelbrot-server that provides a webpage when the user will be able to zoom in the set dynamically. It would be awesome!
Checkout the whole code in the repository. Here is the link to the tag that hold the code at this moment. You can also download the binary for linux! just in case you want to run it without installing go.
Thank you for reading and feel free to leave a comment bellow, I will be really happy to hear from you.