Mandelbrot Parallel Computation

This post explains how to parallelize the computational task of the Mandelbrot set and generate a terminal tool to generate images.

warning

You should read the previous post to understand the context of this post.

We left it after improving drastically the performance of our code by running benchmarks and drawing flame graphs with pprof. The problem is we almost hit the limit of the optimization following this line. Now we are going to take advantage of the tools provided by Go to implement a parallel computation that will reduce the total computational time.

Introduction

The first thing to do is to identify the parts of the computation that can be calculated independently from the others. Luckily, in the Mandelbrot set, the calculation if each point belongs to the set, is completely independent from other points. This makes really simple to split the work over multiple cores.

Next think to ask ourself is, is it worth to spawn a goroutine for each point? Reading documentation about go internals, this can be done, but it is not a good practice. This is because the task that we are performing is CPU intensive and this will effectively limit the maximum number of goroutines running to the maximum number of available cores in the system. So, theoretically, the maximum number of routines should be the number of cores.

Knowing that this encourages a producer-consumer pattern. The next questions is, what is the optimal way to split the tasks? we know that the calculation for each point is the minium part that can be calculated independently. Should we group some points as a single task? if yes, how many? I think the best way to answer this question is to benchmark the code.

Info

The idea that I’ve in mind comes from how Blender renders complex images. here is a gif of a render divided in little squares.

The Mandelbrot Picture Struct

To hold the data together and to coordinate the calculation, I’ve created a new structure called Picture. Then, we have a mandelbot.Point, as the minimal computational unit. mandelbot.Area, as a group of Points that will be calculated together. And mandelbot.Picture as a group of Areas.

![Picture Description](./Picture Description.svg#center)

The previous picture is a visual representation of what the variables of the Picture structure actually mean. Blue ones are in pixels and represent the final image that will be build. The red ones are in float point precision and represent the mandelbrot area. The axis are just there as an orientation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


type Picture struct {
	TopLeft       complex128
	ChunkSize     float64
	MaxIterations int

	HorizontalImageChunks int
	VerticalImageChunks   int
	ChunkImageSize        int

	areas []Area
}

The strategy of storing 2D data in a single array is also being used here with exactly the same approach. To initialize the data, I also have the picture.Init function that initializes the picture and all the recursive areas. This function performs the calculation to assign to each area the corresponding section of the complex area that must render. Based on the x,y position and the ChunkSize, it is easy to calculate the TopLeft and BottomRight of each area. The only tricky thing is that to move from TopToBottom, we need to subtract instead of adding. thats why you see the minus sign next to the imaginary parts. Also, after creating the Area with the calculated values, there is a call to area.Init to guarantee that everything is initialised.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


func (p *Picture) Init() {
	p.areas = make([]Area, p.HorizontalImageChunks*p.VerticalImageChunks)
	for i := 0; i < len(p.areas); i++ {
		x, y := p.ForIndex(i)
		areaTopLeft := p.TopLeft + complex(p.ChunkSize*float64(x), -p.ChunkSize*float64(y))
		areaBottomRight := areaTopLeft + complex(p.ChunkSize, -p.ChunkSize)

		p.areas[i] = Area{
			TopLeft:              areaTopLeft,
			BottomRight:          areaBottomRight,
			HorizontalResolution: p.ChunkImageSize,
			VerticalResolution:   p.ChunkImageSize,
			MaxIterations:        p.MaxIterations,
		}
		p.areas[i].Init()
	}
}

Calculation

Let’s start with a simple loop to calculate all the areas inside the Picture. It will look like this.

1
2
3
4
5


func (p *Picture) Calculate() {
	for i := 0; i < len(p.areas); i++ {
		p.areas[i].Calculate()
	}
}

The firs way is to decide how are we going to split the areas slice for the workers. My first attempt was to slice the slice intro subparts with the same number of elements, but this approach has a few drawbacks. First, when the operation len(p.areas)/workers generates a remained, it becomes tricky which worker should do it. Also, not all the areas require the same number of iterations, so this approach generates unbalanced workers. This is how the code looks like

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


func (p *Picture) Calculate(workerCount int) {
	works := splitWorks(p.areas, workerCount)

	for w := 0; w < len(p.works); w++ {
		go doWork(works[w])
	}
}

func splitWorks(areas []Area, workerCount int) (works [][]Area) {
	// Split []Area in [][]Area, omitted for simplicity
}

func doWork(areas []Area){
	for i := 0; i < len(p.areas); i++ {
		areas[i].Calculate()
	}
}

Task Synchronization With sync.WaitGroup

The firs issue with this code is that the Calculate function no longer blocks the code and it exist before the computation finishes. To address this issue, we have sync.WaitGroup. A wait group is a simple structure that keeps track of the works that are being executed in parallel and allows easy synchronization to wait for all of them to finish. After creating the wg := &sync.WaitGroup{}, you must call to wg.Add with the number of expected workers and then, call wg.Done when each of the workers finish. This allows to wait for all of them to exit with a simple call to wg.Wait

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26


func (p *Picture) Calculate(workerCount int) {
	// Setup wg and WorkerCount
	wg := &sync.WaitGroup{}
	wg.Add(workerCount)

	works := splitWorks(p.areas, workers)

	for w := 0; w < len(p.works); w++ {
		// send the wg to the worker
		go doWork(wg, works[w])
	}
	// Wait for all the workers to finish
	wg.Wait()
}

func splitWorks(areas []Area, workers int) (works [][]Area) {
	// Split []Area in [][]Area, omitted for simplicity
}

func doWork(wg *sync.WaitGroup, areas []Area){
	// when the work is done, notify the wg
	defer wg.Done()
	for i := 0; i < len(p.areas); i++ {
		areas[i].Calculate()
	}
}

Work Queue

I’m not really happy with the approach of splitting the work for the various reasons I’ve described before. In the second attempt, I decided to generate queue with all the areas pending to be processed and the workers just pick new jobs from there until all are done. Let’s create the function to initialize the queue based on a channel.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


func workQueue(workCount int) <-chan int {
	next := make(chan int)
	go func() {
		for i := 0; i < workCount; i++ {
			next <- i
		}
		close(next)
	}()
	return next
}

Given a workCount, this function returns a channel that returns sequentially the numbers from 0 to workCount - 1. This fits perfectly to the indexes of a slice and this is how we are going to use it. First, we have to modify our workers to accept the work from a channel.

1
2
3
4
5
6
7


func doWork(wg *sync.WaitGroup, areas []Area, next <-chan int){
	defer wg.Done()

	for i := range next {
		areas[i].Calculate()
	}
}

The range function used over a channel, returns elements until the channel is closed. This is really handy to implement the work queue. The code looks even easier than before. Let’s see all together.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31


func (p *Picture) Calculate(workerCount int) {
	wg := &sync.WaitGroup{}
	wg.Add(workerCount)

	next := workQueue(len(p.areas))

	for worker := 0; worker < workerCount; worker++ {
		go doWork(wg, p.areas, next)
	}

	wg.Wait()

}

func workQueue(workCount int) <-chan int {
	next := make(chan int)
	go func() {
		for i := 0; i < workCount; i++ {
			next <- i
		}
		close(next)
	}()
	return next
}

func doWork(wg *sync.WaitGroup, areas []Area, next <-chan int) {
	defer wg.Done()
	for i := range next {
		areas[i].Calculate()
	}
}

Work Progress Report

Remember when we tried to implement progress reporting using channels and this drastically reduced the performance? well, It was clearly over engineered to report every iteration of the calculation. But with this new structure, it would be easy and cheap to report each time one of the workers finish the calculation of an area. This will enable potential consumers to start drawing it or just notify us of the global process. I will be using the same approach as the workQueue. The calculate function will return a channel that reports the updates. To keep things clear, I will also create the method picture.CalculateAsync to create the channel for the consumer.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39


func (p *Picture) Calculate(ctx context.Context, workerCount int, doneIndex chan<- int) {
	wg := &sync.WaitGroup{}
	wg.Add(workerCount)

	next := workQueue(ctx, len(p.areas))

	for worker := 0; worker < workerCount; worker++ {
		go doWork(wg, p.areas, next, doneIndex)
	}

	wg.Wait()

	close(doneIndex)
}

func (p *Picture) CalculateAsync(ctx context.Context, workerCount int) <-chan int {
	doneIndex := make(chan int)
	go p.Calculate(ctx, workerCount, doneIndex)
	return doneIndex
}

func workQueue(ctx context.Context, workCount int) <-chan int {
	next := make(chan int)
	go func() {
		for i := 0; i < workCount; i++ {
			next <- i
		}
		close(next)
	}()
	return next
}

func doWork(wg *sync.WaitGroup, areas []Area, next <-chan int, doneIndex chan<- int) {
	defer wg.Done()
	for i := range next {
		areas[i].Calculate()
		doneIndex <- i
	}
}

Calculation Execution Context

The doneIndex channel carries the index of the Area that has been finished. The consumer only needs to wait for a number to arrive, and then, get the corresponding Area from the Picture. We will see this code later. Now we have to add one last thing to this code. The calculation can take a long time or we may require to stop it before finishing. To implement this, we have the context.Context. From the docs A Context carries a deadline, a cancellation signal, and other values across API boundaries. let’s see how the code looks like.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43


func (p *Picture) Calculate(ctx context.Context, workerCount int, doneIndex chan<- int) {
	wg := &sync.WaitGroup{}
	wg.Add(workerCount)

	next := workQueue(ctx, len(p.areas))

	for worker := 0; worker < workerCount; worker++ {
		go doWork(wg, p.areas, next, doneIndex)
	}

	wg.Wait()

	close(doneIndex)
}

func (p *Picture) CalculateAsync(ctx context.Context, workerCount int) <-chan int {
	doneIndex := make(chan int)
	go p.Calculate(ctx, workerCount, doneIndex)
	return doneIndex
}

func workQueue(ctx context.Context, workCount int) <-chan int {
	next := make(chan int)
	go func() {
		for i := 0; i < workCount; i++ {
			select {
			case <-ctx.Done():
				return
			case next <- i:
			}
		}
		close(next)
	}()
	return next
}

func doWork(wg *sync.WaitGroup, areas []Area, next <-chan int, doneIndex chan<- int) {
	defer wg.Done()
	for i := range next {
		areas[i].Calculate()
		doneIndex <- i
	}
}

So the tricky thing here is that the workQueue now has two options. A. Publish a new work to be done to the next channel. B. receive a ctx.Done signal and close the next channel. When B happens. this will stop the workers after they finish the current task. This is not the ideal situation because an external user will expect the function to return almost immediately after context cancellation. The problem is that achieving this, can be a little more trick because we cannot just cancel a running goroutine. Also, we cannot close the doneIndex channel until all the workers have finished their work because, if the channel is closed, the workers will panic when publishing the index after calculation. Nevertheless, It is possible to handle such case with a similar select statement to the one used in the workerQueue and adding another go routine in the calculate function to listen for the wg.Wait(). But I think this is too complex for this case. so I will keep it like this for now.

Configuration And Benchmarks

We have everything ready run our code in parallel, but there are a lot of different setting configurations that can affect the performance of the calculation. The best way to find the appropriate configuration is to run benchmarks and see where the performance is better.

This is the list of parameters that we can modify to change how the work is going to be processed by the workers.

Image dimensions
- HorizontalImageChunks
- VerticalImageChunks
- ChunkImageSize
Workers

To keep a constant image size, the following operation must be constant.

$$HorizontalImageChunksVerticalImageChunksChunkImageSize^2$$

Now we need to choose an image size and try all the combinations of these values. I’ve started with an image of 1024x1024 in the Mandelbrot area topLeft:(-1.401854499759, -0.000743603637), bottomRight: (−1,180199459759,−0,222398643637). The most fragmented case is when I divide the image in 1024x1024 areas of 1 pixel and the less fragmented is just 1 area of 1024x1024 pixels. In the following graph you can see the benchmark results for all the power of 2 combinations from the most fragmented to the less fragmented. In the X axis, the numbers of the previous multiplication. in the Y axis, the s/op returned by the benchmark. The Y axis is in logarithmic scale to improve the visualization. You can hover the mouse over any point to see the exact value. The horizontal lines represent the optimal speed based on the 1 worker benchmark. is expected to get x2 x3 x4 etc… speed for each extra worker. You also need to know that my computer has 6 cores.

Code could not finish, this are some reasons why this happen. - Plot name not defined. The first parameter of the shortcode is the name. - There is a syntax error. check browser console.

There a few interesting things in this graph.

If the image only contains 1 area, the benchmark is independent of the number of cores. You can see this at the right point of the graph.
If the image is divided in too many areas. The performance is deteriorated for all the setups. This is the left part of the graph.
All the lines have a parabolic shape with a minimum in 32x32x32. This is the best setup for my computer.
Selecting the minimum point, the performance is doubled almost perfectly for each worker added until reaching the system limit of 6 cores.
The performance still increases after 6 workers, but it never reaches the x6 speed.
The performance in 512x512x2 is the same after the 2nd worker. I think this is because not all the sections in the set are equally expensive to calculate. The hardest one is closer to the right top because is almost all black.

Here is the code that runs the benchmarks.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25


func benchmarkComplexPictureChunks(b *testing.B, HorizontalImageChunks, VerticalImageChunks, ChunkImageSize, workers int) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		pic := &mandelbrot.Picture{
			TopLeft:               complex(-1.401854499759, -0.000743603637),
			MaxIterations:         3534,
			ChunkSize:             0.00021646 * float64(ChunkImageSize),
			HorizontalImageChunks: HorizontalImageChunks,
			VerticalImageChunks:   VerticalImageChunks,
			ChunkImageSize:        ChunkImageSize,
		}
		ctx := context.Background()
		pic.Init()
		b.StartTimer()

		done := pic.CalculateAsync(ctx, workers)
		for range done {

		}
		if *saveImage {
			b.StopTimer()
			saveImageFrom(pic, fmt.Sprintf("bench_%d_%d_%d_w_%d.jpeg", pic.HorizontalImageChunks, pic.VerticalImageChunks, pic.ChunkImageSize, workers))
		}
	}
}

Click here to see all code

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368


var saveImage = flag.Bool("saveImage", false, "save the result of running the benchmarks")

func benchmarkComplexPictureChunks(b *testing.B, HorizontalImageChunks, VerticalImageChunks, ChunkImageSize, workers int) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		pic := &mandelbrot.Picture{
			TopLeft:               complex(-1.401854499759, -0.000743603637),
			MaxIterations:         3534,
			ChunkSize:             0.00021646 * float64(ChunkImageSize),
			HorizontalImageChunks: HorizontalImageChunks,
			VerticalImageChunks:   VerticalImageChunks,
			ChunkImageSize:        ChunkImageSize,
		}
		ctx := context.Background()
		pic.Init()
		b.StartTimer()

		done := pic.CalculateAsync(ctx, workers)
		for range done {

		}
		if *saveImage {
			b.StopTimer()
			saveImageFrom(pic, fmt.Sprintf("bench_%d_%d_%d_w_%d.jpeg", pic.HorizontalImageChunks, pic.VerticalImageChunks, pic.ChunkImageSize, workers))
		}
	}
}

func saveImageFrom(pic *mandelbrot.Picture, name string) {
	img := image.NewRGBA(image.Rect(0, 0, pic.HorizontalResolution(), pic.VerticalResolution()))
	for i := 0; i < pic.HorizontalImageChunks*pic.VerticalImageChunks; i++ {
		offsetX, offsetY := pic.GetImageOffsetFor(i)
		paintAreaInImage(img, pic.GetArea(i), offsetX, offsetY)
	}
	outFile, err := os.Create(name)
	if err != nil {
		log.Fatalf("output file cannot be opened, cause: %s", err)
	}
	defer outFile.Close()

	encodingError := jpeg.Encode(outFile, img, &jpeg.Options{Quality: 90})
	if encodingError != nil {
		panic(err)
	}
}

func paintAreaInImage(img *image.RGBA, area mandelbrot.Area, offsetX int, offsetY int) {
	for x := 0; x < area.HorizontalResolution; x++ {
		for y := 0; y < area.VerticalResolution; y++ {
			point := area.GetPoint(x, y)
			color := getColor(point, []color.RGBA{
				color.RGBA{
					R: 255,
					A: 255,
				},
				color.RGBA{
					G: 255,
					A: 255,
				},
				color.RGBA{
					B: 255,
					A: 255,
				},
				color.RGBA{
					R: 255,
					G: 255,
					A: 255,
				},
				color.RGBA{
					G: 255,
					B: 255,
					A: 255,
				},
				color.RGBA{
					R: 255,
					B: 255,
					A: 255,
				},
				color.RGBA{
					R: 255,
					G: 255,
					B: 255,
					A: 255,
				},
			}, area.MaxIterations, color.RGBA{
				A: 255,
			})
			img.SetRGBA(offsetX+x, offsetY+y, color)
		}
	}
}
func getColor(point mandelbrot.Point, palette []color.RGBA, maxIterations int, maxIterationsColor color.RGBA) color.RGBA {
	if point.Iterations() == maxIterations {
		return maxIterationsColor
	}
	index := point.Iterations() % len(palette)
	return palette[index]
}
func BenchmarkComplexPictureChunks1024x1024x1w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1024, 1024, 1, 1)
}
func BenchmarkComplexPictureChunks512x512x2w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 512, 512, 2, 1)
}
func BenchmarkComplexPictureChunks256x256x4w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 256, 256, 4, 1)
}
func BenchmarkComplexPictureChunks128x128x8w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 128, 128, 8, 1)
}
func BenchmarkComplexPictureChunks64x64x16w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 64, 64, 16, 1)
}
func BenchmarkComplexPictureChunks32x32x32w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 32, 32, 32, 1)
}
func BenchmarkComplexPictureChunks16x16x64w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 16, 16, 64, 1)
}
func BenchmarkComplexPictureChunks8x8x128w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 8, 8, 128, 1)
}
func BenchmarkComplexPictureChunks4x4x256w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 4, 4, 256, 1)
}
func BenchmarkComplexPictureChunks2x2x512w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 2, 2, 512, 1)
}
func BenchmarkComplexPictureChunks1x1x1024w1(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1, 1, 1024, 1)
}

func BenchmarkComplexPictureChunks1024x1024x1w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1024, 1024, 1, 2)
}
func BenchmarkComplexPictureChunks512x512x2w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 512, 512, 2, 2)
}
func BenchmarkComplexPictureChunks256x256x4w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 256, 256, 4, 2)
}
func BenchmarkComplexPictureChunks128x128x8w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 128, 128, 8, 2)
}
func BenchmarkComplexPictureChunks64x64x16w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 64, 64, 16, 2)
}
func BenchmarkComplexPictureChunks32x32x32w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 32, 32, 32, 2)
}
func BenchmarkComplexPictureChunks16x16x64w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 16, 16, 64, 2)
}
func BenchmarkComplexPictureChunks8x8x128w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 8, 8, 128, 2)
}
func BenchmarkComplexPictureChunks4x4x256w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 4, 4, 256, 2)
}
func BenchmarkComplexPictureChunks2x2x512w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 2, 2, 512, 2)
}
func BenchmarkComplexPictureChunks1x1x1024w2(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1, 1, 1024, 2)
}

func BenchmarkComplexPictureChunks1024x1024x1w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1024, 1024, 1, 3)
}
func BenchmarkComplexPictureChunks512x512x2w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 512, 512, 2, 3)
}
func BenchmarkComplexPictureChunks256x256x4w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 256, 256, 4, 3)
}
func BenchmarkComplexPictureChunks128x128x8w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 128, 128, 8, 3)
}
func BenchmarkComplexPictureChunks64x64x16w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 64, 64, 16, 3)
}
func BenchmarkComplexPictureChunks32x32x32w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 32, 32, 32, 3)
}
func BenchmarkComplexPictureChunks16x16x64w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 16, 16, 64, 3)
}
func BenchmarkComplexPictureChunks8x8x128w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 8, 8, 128, 3)
}
func BenchmarkComplexPictureChunks4x4x256w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 4, 4, 256, 3)
}
func BenchmarkComplexPictureChunks2x2x512w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 2, 2, 512, 3)
}
func BenchmarkComplexPictureChunks1x1x1024w3(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1, 1, 1024, 3)
}

func BenchmarkComplexPictureChunks1024x1024x1w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1024, 1024, 1, 4)
}
func BenchmarkComplexPictureChunks512x512x2w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 512, 512, 2, 4)
}
func BenchmarkComplexPictureChunks256x256x4w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 256, 256, 4, 4)
}
func BenchmarkComplexPictureChunks128x128x8w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 128, 128, 8, 4)
}
func BenchmarkComplexPictureChunks64x64x16w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 64, 64, 16, 4)
}
func BenchmarkComplexPictureChunks32x32x32w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 32, 32, 32, 4)
}
func BenchmarkComplexPictureChunks16x16x64w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 16, 16, 64, 4)
}
func BenchmarkComplexPictureChunks8x8x128w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 8, 8, 128, 4)
}
func BenchmarkComplexPictureChunks4x4x256w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 4, 4, 256, 4)
}
func BenchmarkComplexPictureChunks2x2x512w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 2, 2, 512, 4)
}
func BenchmarkComplexPictureChunks1x1x1024w4(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1, 1, 1024, 4)
}

func BenchmarkComplexPictureChunks1024x1024x1w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1024, 1024, 1, 5)
}
func BenchmarkComplexPictureChunks512x512x2w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 512, 512, 2, 5)
}
func BenchmarkComplexPictureChunks256x256x4w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 256, 256, 4, 5)
}
func BenchmarkComplexPictureChunks128x128x8w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 128, 128, 8, 5)
}
func BenchmarkComplexPictureChunks64x64x16w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 64, 64, 16, 5)
}
func BenchmarkComplexPictureChunks32x32x32w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 32, 32, 32, 5)
}
func BenchmarkComplexPictureChunks16x16x64w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 16, 16, 64, 5)
}
func BenchmarkComplexPictureChunks8x8x128w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 8, 8, 128, 5)
}
func BenchmarkComplexPictureChunks4x4x256w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 4, 4, 256, 5)
}
func BenchmarkComplexPictureChunks2x2x512w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 2, 2, 512, 5)
}
func BenchmarkComplexPictureChunks1x1x1024w5(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1, 1, 1024, 5)
}

func BenchmarkComplexPictureChunks1024x1024x1w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1024, 1024, 1, 6)
}
func BenchmarkComplexPictureChunks512x512x2w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 512, 512, 2, 6)
}
func BenchmarkComplexPictureChunks256x256x4w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 256, 256, 4, 6)
}
func BenchmarkComplexPictureChunks128x128x8w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 128, 128, 8, 6)
}
func BenchmarkComplexPictureChunks64x64x16w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 64, 64, 16, 6)
}
func BenchmarkComplexPictureChunks32x32x32w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 32, 32, 32, 6)
}
func BenchmarkComplexPictureChunks16x16x64w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 16, 16, 64, 6)
}
func BenchmarkComplexPictureChunks8x8x128w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 8, 8, 128, 6)
}
func BenchmarkComplexPictureChunks4x4x256w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 4, 4, 256, 6)
}
func BenchmarkComplexPictureChunks2x2x512w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 2, 2, 512, 6)
}
func BenchmarkComplexPictureChunks1x1x1024w6(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1, 1, 1024, 6)
}

func BenchmarkComplexPictureChunks1024x1024x1w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1024, 1024, 1, 7)
}
func BenchmarkComplexPictureChunks512x512x2w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 512, 512, 2, 7)
}
func BenchmarkComplexPictureChunks256x256x4w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 256, 256, 4, 7)
}
func BenchmarkComplexPictureChunks128x128x8w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 128, 128, 8, 7)
}
func BenchmarkComplexPictureChunks64x64x16w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 64, 64, 16, 7)
}
func BenchmarkComplexPictureChunks32x32x32w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 32, 32, 32, 7)
}
func BenchmarkComplexPictureChunks16x16x64w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 16, 16, 64, 7)
}
func BenchmarkComplexPictureChunks8x8x128w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 8, 8, 128, 7)
}
func BenchmarkComplexPictureChunks4x4x256w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 4, 4, 256, 7)
}
func BenchmarkComplexPictureChunks2x2x512w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 2, 2, 512, 7)
}
func BenchmarkComplexPictureChunks1x1x1024w7(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1, 1, 1024, 7)
}
func BenchmarkComplexPictureChunks1024x1024x1w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1024, 1024, 1, 8)
}
func BenchmarkComplexPictureChunks512x512x2w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 512, 512, 2, 8)
}
func BenchmarkComplexPictureChunks256x256x4w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 256, 256, 4, 8)
}
func BenchmarkComplexPictureChunks128x128x8w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 128, 128, 8, 8)
}
func BenchmarkComplexPictureChunks64x64x16w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 64, 64, 16, 8)
}
func BenchmarkComplexPictureChunks32x32x32w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 32, 32, 32, 8)
}
func BenchmarkComplexPictureChunks16x16x64w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 16, 16, 64, 8)
}
func BenchmarkComplexPictureChunks8x8x128w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 8, 8, 128, 8)
}
func BenchmarkComplexPictureChunks4x4x256w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 4, 4, 256, 8)
}
func BenchmarkComplexPictureChunks2x2x512w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 2, 2, 512, 8)
}
func BenchmarkComplexPictureChunks1x1x1024w8(b *testing.B) {
	benchmarkComplexPictureChunks(b, 1, 1, 1024, 8)
}

It is important that the caller must read the channel done until the calculation ends because, if there is no receiver, the channel is blocked.

The saveImage is a flag passed to the test command to keep the generated image for each benchmark. This allows to visually verify that the benchmark is actually rendering the right image.

This is the section selected for the benchmarks. I’m using a different method to give color to the set this time. I’m using a color palette with 7 colors and I assign it based on the number of iterations. color[iterations % len(color)] This is awesome because always give the same color to a section independently of the zoom.

Mandelbrot CLI

Let’s create a CLI to generate awesome images of any given area.

To ease the final usage, I’ve simplified the input parameters to some more human-friendly. The default values, generate a global view of the Mandelbrot in a 1920x1920 image.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


top := flag.Float64("top", 1.5, "Top mandelbrot position")
left := flag.Float64("left", -2.1, "Left mandelbrot position")
areaSize := flag.Float64("areaSize", 3, "From the TopLeft, the size of the complex area")

imageSize := flag.Int("imageSize", 1920, "Size of the squared image generated in pixels")
divisions := flag.Int("divisions", 50, "Number of divisions to split the work over multiple routines")
maxIterations := flag.Int("maxIterations", 100, "Maximum number of iterations per point")

workers := flag.Int("workers", runtime.NumCPU(), "Maximum number of iterations per point")
out := flag.String("out", "mandelbrot.jpg", "output file, it can be png or jpg")
timeout := flag.Int64("timeout", 20, "Maximum number of seconds to compute, if reached. the program will exit")

With this flags, I directly call a mandelbot.NewPicture function that accepts this parameters and handles the conversion to the original parameters needed by the mandelbot.Picture. The only thing to adjust is the chunkSize based on the number of divisions to ensure that we draw the same area. The drawback is that we need to carefully select the number of divisions or it will be impossible to split the area in a whole number of divisions.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


pic := mandelbrot.NewPicture(complex(*left, *top), *areaSize, *imageSize, *divisions, *maxIterations)

func NewPicture(topLeft complex128, chunkSize float64, imageSize int, divisions int, maxIterations int) *Picture {
	if imageSize%divisions != 0 {
		log.Printf("WARNING: ImageSize %d can't be divided in %d divisions, The final image will be smaller", imageSize, divisions)
	}
	chunkImageSize := imageSize / divisions
	return &Picture{
		TopLeft:               topLeft,
		MaxIterations:         maxIterations,
		ChunkSize:             chunkSize / float64(divisions),
		HorizontalImageChunks: divisions,
		VerticalImageChunks:   divisions,
		ChunkImageSize:        chunkImageSize,
	}
}

After calling pic.Init(), we can call pic.CalculateAsync() to get the doneIndex channel that will notify us about the progress. As we need a little bit of setup with a context and a picture. I will wrap the calculation in a function Calculate that accepts the pic and returns an image.RGBA ready to be exported.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22


func Calculate(timeout int64, workers int, pic *mandelbrot.Picture) (*image.RGBA, error) {
	ctx, ctxCancel := context.WithTimeout(context.Background(), time.Second*time.Duration(timeout))
	defer ctxCancel()

	doneIndex := pic.CalculateAsync(ctx, workers)
	img := image.NewRGBA(image.Rect(0, 0, pic.HorizontalResolution(), pic.VerticalResolution()))

	for {
		select {
		case <-ctx.Done():
			return img, ctx.Err()
		case i, ok := <-doneIndex:
			if !ok {
				log.Print("Finished")
				return img, nil
			}
			log.Printf("Index %d done", i)
			offsetX, offsetY := pic.GetImageOffsetFor(i)
			paintAreaInImage(img, pic.GetArea(i), offsetX, offsetY)
		}
	}
}

The mandelbot.Picture provides utility method that return information about the final image dimensions. This way, I can prepare the final image. In the for loop we wait for updates from the calculation and call a function to paint the finished Area into the final image. The method picture.GetImageOffserFor accepts an index and returns the x,y positions in the resolution of the final image. If the context is cancelled. the Calculate function returns a partial image and the error that canceled the context. This way, if the context is cancelled. The function returns almost immediately. This could be a serious bug because workers will never be able to finish their work. But it is not a problem here because the program will just save the image and exit. I will solve this in future posts.

Let’s take a look into the paintAreaInImage function. It’s purpose is to iterate over all the pixels in the area, assign them a color, and add it to the final Image. The function getColor accepts (point mandelbrot.Point, palette []color.RGBA, maxIterations int, maxIterationsColor color.RGBA) The color is chosen by dividing the iterations over the len of the palette. This ensures that we the next color for every iterations and that we loop back to the first one when the length is reached. The maxIterations is needed to paint the “Unknown” area of a different color. In this case, Black color.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53


func paintAreaInImage(img *image.RGBA, area mandelbrot.Area, offsetX int, offsetY int) {
	for x := 0; x < area.HorizontalResolution; x++ {
		for y := 0; y < area.VerticalResolution; y++ {
			point := area.GetPoint(x, y)
			color := getColor(point, []color.RGBA{
				color.RGBA{
					R: 255,
					A: 255,
				},
				color.RGBA{
					G: 255,
					A: 255,
				},
				color.RGBA{
					B: 255,
					A: 255,
				},
				color.RGBA{
					R: 255,
					G: 255,
					A: 255,
				},
				color.RGBA{
					G: 255,
					B: 255,
					A: 255,
				},
				color.RGBA{
					R: 255,
					B: 255,
					A: 255,
				},
				color.RGBA{
					R: 255,
					G: 255,
					B: 255,
					A: 255,
				},
			}, area.MaxIterations, color.RGBA{
				A: 255,
			})
			img.SetRGBA(offsetX+x, offsetY+y, color)
		}
	}
}

func getColor(point mandelbrot.Point, palette []color.RGBA, maxIterations int, maxIterationsColor color.RGBA) color.RGBA {
	if point.Iterations() == maxIterations {
		return maxIterationsColor
	}
	index := point.Iterations() % len(palette)
	return palette[index]
}

When the function Calculate returns, we have a img object that must be saved to a file. If you pay attention to the parameters accepted by the program, it says that accepts png and jpeg formats. To do so, we can use the function filepath.Ext to find the extension of the file name. As go already provides codecs for jpeg and png. The code is straight forward.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


outFile, err := os.Create(*out)
if err != nil {
	log.Fatalf("output file cannot be opened, cause: %s", err)
}
defer outFile.Close()

switch filepath.Ext(*out) {
case ".jpg", ".jpeg":
	encodingError := jpeg.Encode(outFile, img, &jpeg.Options{Quality: 90})
	if encodingError != nil {
		panic(err)
	}
case ".png":
	encodingError := png.Encode(outFile, img)
	if encodingError != nil {
		panic(err)
	}
}

Click here to see all code

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140


package main

import (
	"context"
	"flag"
	"image"
	"image/color"
	"image/jpeg"
	"image/png"
	"log"
	"os"
	"path/filepath"
	"runtime"
	"time"

	"github.com/metalblueberry/mandelbrot/mandelbrot"
)

func main() {
	top := flag.Float64("top", 1.5, "Top mandelbrot position")
	left := flag.Float64("left", -2.1, "Left mandelbrot position")
	areaSize := flag.Float64("areaSize", 3, "From the TopLeft, the size of the complex area")

	imageSize := flag.Int("imageSize", 1920, "Size of the squared image generated in pixels")
	divisions := flag.Int("divisions", 50, "Number of divisions to split the work over multiple routines")
	maxIterations := flag.Int("maxIterations", 100, "Maximum number of iterations per point")

	workers := flag.Int("workers", runtime.NumCPU(), "Maximum number of iterations per point")
	out := flag.String("out", "mandelbrot.jpg", "output file, it can be png or jpg")
	timeout := flag.Int64("timeout", 20, "Maximum number of seconds to compute, if reached. the program will exit")

	flag.Parse()

	log.Printf("Start")

	pic := mandelbrot.NewPicture(complex(*left, *top), *areaSize, *imageSize, *divisions, *maxIterations)
	pic.Init()

	log.Printf("Calculation started")

	img, err := Calculate(*timeout, *workers, pic)
	if err != nil {
		log.Printf("Calculation failed, image is not complete. cause: %s", err)
	}

	outFile, err := os.Create(*out)
	if err != nil {
		log.Fatalf("output file cannot be opened, cause: %s", err)
	}
	defer outFile.Close()

	switch filepath.Ext(*out) {
	case ".jpg", ".jpeg":
		encodingError := jpeg.Encode(outFile, img, &jpeg.Options{Quality: 90})
		if encodingError != nil {
			panic(err)
		}
	case ".png":
		encodingError := png.Encode(outFile, img)
		if encodingError != nil {
			panic(err)
		}
	}
}

func Calculate(timeout int64, workers int, pic *mandelbrot.Picture) (*image.RGBA, error) {
	ctx, ctxCancel := context.WithTimeout(context.Background(), time.Second*time.Duration(timeout))
	defer ctxCancel()
	doneIndex := pic.CalculateAsync(ctx, workers)
	img := image.NewRGBA(image.Rect(0, 0, pic.HorizontalResolution(), pic.VerticalResolution()))

	for {
		select {
		case <-ctx.Done():
			return img, ctx.Err()
		case i, ok := <-doneIndex:
			if !ok {
				log.Print("Finished")
				return img, nil
			}
			log.Printf("Index %d done", i)
			offsetX, offsetY := pic.GetImageOffsetFor(i)
			paintAreaInImage(img, pic.GetArea(i), offsetX, offsetY)
		}
	}
}

func paintAreaInImage(img *image.RGBA, area mandelbrot.Area, offsetX int, offsetY int) {
	for x := 0; x < area.HorizontalResolution; x++ {
		for y := 0; y < area.VerticalResolution; y++ {
			point := area.GetPoint(x, y)
			color := getColor(point, []color.RGBA{
				color.RGBA{
					R: 255,
					A: 255,
				},
				color.RGBA{
					G: 255,
					A: 255,
				},
				color.RGBA{
					B: 255,
					A: 255,
				},
				color.RGBA{
					R: 255,
					G: 255,
					A: 255,
				},
				color.RGBA{
					G: 255,
					B: 255,
					A: 255,
				},
				color.RGBA{
					R: 255,
					B: 255,
					A: 255,
				},
				color.RGBA{
					R: 255,
					G: 255,
					B: 255,
					A: 255,
				},
			}, area.MaxIterations, color.RGBA{
				A: 255,
			})
			img.SetRGBA(offsetX+x, offsetY+y, color)
		}
	}
}

func getColor(point mandelbrot.Point, palette []color.RGBA, maxIterations int, maxIterationsColor color.RGBA) color.RGBA {
	if point.Iterations() == maxIterations {
		return maxIterationsColor
	}
	index := point.Iterations() % len(palette)
	return palette[index]
}

Conclusion

This has been really interesting. I was not expecting a x6 speed but looks like I was wrong. I think that the optimal number of divisions comes from a tread-off between the capacity to split the work over multiple cores and the extra time required to send data over channels. I would like to benchmark the code again and see where is the bottle neck for each configuration.

I think that the next project will be to create a mandelbrot-server that provides a webpage when the user will be able to zoom in the set dynamically. It would be awesome!

Checkout the whole code in the repository. Here is the link to the tag that hold the code at this moment. You can also download the binary for linux! just in case you want to run it without installing go.

Thank you for reading and feel free to leave a comment bellow, I will be really happy to hear from you.

Mandelbrot Parallel Computation

Introduction

The Mandelbrot Picture Struct

Calculation

Task Synchronization With sync.WaitGroup

Work Queue

Work Progress Report

Calculation Execution Context

Configuration And Benchmarks

Mandelbrot CLI

Conclusion

References