rsm λ spisemisu . com
and joe λ celo . io
Abstract
Let A be an array of unique numbers of size n, where the minimum and maximum numbers in the array are min and max respectively.
In this blog post we introduce a parallel algorithm, bfsort, based on Bloom Filters, that can sort the array A in O(log n) time and O(m) space where m is the size of: { min, min + 1, min + 2, … , max  1, max }, while the sequential version, can sort the array A in O(m) time and O(n) space. We argue that if merge operations of empty arrays that could be stored in CPUcache, given unlimited cores, then the memory overhead wouldn’t have that big of an impact on the parallel version of the algorithm.
As the algorithm is based on Bloom Filters, both the parallel and sequential versions sort the array A with a collision probability of 0.00001%. The algorithm is parameterized on the collision probability, which can be adjusted for optimal results based on the size of the array.
1. Introduction
In this section, we will describe a few data structures and concepts that will ease the understanding of the following sections:

Sorting a sequence of values could be defined with the following constraint: { a₀ ≤ a₁ ≤ … ≤ aₙ } where a number is less or equal to it’s successor in the sequence. When working with sorting algorithms, it’s important to understand the underlying data structure that stores the values. Arrays allow additions in linear time, as a new array +1 needs to be allocated on another spot in memory, while indexed access are in constant time. Lists allow constant time additions and linear time lookups.

Bloom Filter is a spaceefficient data structure as it saves the necessary information in bits, see Figure 0, which differs from HashTables which will need at least a boolean, normally implemented as a byte, where only one bit is set and the rest are unused, to store the information. With regard to test if an element is in the set, the Bloom Filter can only warranty that an element definitely is not in set while if an element is in the filter, it still has to make a lookup on the underlying data structure. There is a probability that an element is in the filter but it isn’t in the underlying data structure, false positives. The false positives are due to the set of hashes can set the same bit for different words. This is why it’s very important to choose the most spaceefficient size of bits, m, in relation to the probability of false positives, p, while choosing the optimal number of hash functions, k, that uniformly distribute the values across the filter [cao]:

Asymptotic performance measures:

DAG: A directed acyclic graph, G = (V,E), in this blog post represents the computations of an algorithm. The vertices, V, represent the instructions to be computed and the edges, E, represent the dependencies between the instructions. If there is a edge, (u,v) ∈ E, the instructions in the vertex v cannot be executed before the instructions in the vertex u.

Work: The total time of computing all instructions of the algorithm executed on a computer with a single processor, see Figure 1. Work can be mapped to the DAG as all the vertices, V, in the graph. The work law states that if k work can be done in Tₖ(n) time on a computer with k processors and the amount of work is only T(n), where k · Tₖ(n) ≥ T(n), then the execution time of the algorithm with k processors, is limited to the amount of work divided by the amount of processors.

Span: The execution time of the longest path in a DAG, also know as critical path, see Figure 1. The asymptotic time complexity of the algorithm is dominated by the critical path. The span law is defined as follows: no parallel computer with a limited amount of k processors can execute a parallel algorithm faster than a computer with an unlimited amount of cores. The computer with unlimited amount of processors can always just execute the parallel algorithm with k processors: Tₖ(n) ≥ T∞(n).

2. Sorting with Bloom Filters
As mentioned in the previous section when describing the Bloom Filters, in order to uniformly distribute the values across the filter, we must find the optimal relation between bits, probability of false positives and number of hash functions.
For this algorithm, the primary concern is the probability of false positives and not so much the space efficiency or the number of hash functions. For this reason, we choose a very low probability as default value, 0.0000001, even though it means that the amount of bits and hash functions will be slightly bigger in size than normal. Even though it’s a very low probability number, the chosen number ensures that the size of the underlying array in the Bloom Filters doesn’t becomes larger than the length of the array, which could be translated in term of asymptotic complexity as the algorithm would use n extra space, where n is the length of the array.
(31u, 1e07, 23,
[2632944486u; 176026093u; 2853052920u; 534382327u; 2173353107u; 840630065u;
4181145030u; 878918235u; 33641635u; 1504134878u; 136075251u; 3384303498u;
2194677699u; 768245312u; 3208224496u; 2410081556u; 219847018u; 2055198962u;
3112189164u; 4023218364u; 1356830802u; 3285656209u; 449204162u])
Figure 3: Info parameters { number of bits: 31 * 32, probability of collision:
0.00001%, number of hashes: 23 and list of hashes: ... }.
Once we have decided the parameters for the Bloom Filters, we can then instantiate it and be ready to populate it with the values of the array.
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
["□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"; "□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□"]
Figure 4: Empty instantiated Bloom Filter.
While we populate the Bloom Filter with the values in linear time. It’s worth noticing that even though we calculate several hashes and perform several updates on the Bloom Filter, those operations are still in constant time. We can also find the minimum and maximum values in the array, also in linear time, more specifically in the same traversal of the array as the one that performs the updates in the Bloom Filters.
["■□□■■□□□□□■□■■■□□□■■□■□□□□□□■□□□"; "■■□□□□□□□□□■■□■■□■■□□■□■□■□■■■□□"]
["■□□□□□□□□□■□■□■■□□□□■■■□□■■■□□■□"; "■□■■■■□□■■□■□■■■■□□■□■■□□■■□■□■□"]
["□□■■□□■■■■■■■□■□□■□■■■□■■□■□■■□■"; "□■■□■□□□■□■□□□□□■■□□■□□■■■□□□□□□"]
["■□■□■□■■□□■■■□□□■□■□■□■■■□□□■■□■"; "□■□□□□□■■□■□■■□■□□■■■□■□□□■■■■■■"]
["□□□■■□□□□□□■□□■□■□■□■□■■□□■■□■□□"; "□□□□□■□■□■□■□■□■□□■■■■■□■□■■■□■■"]
["■□■□□□■□□■□■■□■■□□□□■■■□■■■□■□■□"; "□■■■□■□■□■□□□□□■■□□□□■■□□■■□□□□■"]
["□□□□■□□□□□□□□□□□■□□□□■□■■■□□■■□■"; "■□□■□□■■■□□■■□□■□□□■■□□■□■■■■□■□"]
["□■□□■■□■■■■■■□■■■■□□□■□■□■□■■□■□"; "■■■□□□■□■■□■■□■■■■□□■■■□□■□□■□□■"]
["■□□■■■■□■■■□■□■■□■■■■□□■■□■□■■■□"; "■□■□■■□■□■■■□□■■□■□□■■□■■■■□□□□□"]
["■■■■□□□■□■■■□□■■■■□■■■■■■■□□■■□□"; "□■□■■■■■□□■□■■■□■■□□□■■□□□□■■□■■"]
["□■□■□□□□■□□□■■□■□■□□■■■□■□■□■■■□"; "■□□□□□■■□■■■□□□■□■□□■□■■□■■□■□□■"]
["□□■■■□□□□■□□■□■□■□■■□□■□□□■□■□■□"; "□□□□□□■□□□□■□■□■□□■■■□■□□■□□□■□□"]
["□■□■■□□□■■□■■■■□□□■□■□□□□■□□□■■□"; "■□□■□□□□■■■■□■□□■□■■□■□□□■■■□□□□"]
["□■□■■□□■■■■■■■□■■■■■■■□□■■□■□□□□"; "■■■■■□■□■□■□■■■□□■□■□■□■■□□■□□■□"]
["□□□■■■□■□■■□■□■■□□■□□□□□■■■■□■■■"; "□■□■■□□□■□□□□□□□■□■□■■□■□□■■■□□□"]
["■□■■■□■■■■□■□□□□■■■□□□■■□□□□■■□■"; "□■□■■■□□■□□□■■■□□■■□■■□■■□■□□■■■"]
Figure 5: Bloom Filter bits populated with the unique bytes from the array.
After the bits of the Bloom Filter are populated with the values of the array, we can now begin to execute the part of the algorithm that will produce a sorted array.
As we have found the minimum and the maximum values in the array, we can now recursively iterate through all the numbers in the range from the minimum number to the maximum number.
The technique we use is to exclude numbers which we definitely know are not in the set. And because we have chose a very little probability of false positives, we can easily argue that numbers that aren’t excluded, most likely are in the initial array.
[0; 1; 2; 6; 14; 25; 30; 68; 72; 90; 106; 118; 119; 124; 136; 146; 149; 162;
165; 177; 179; 182; 187; 190; 195; 199; 219; 222; 234; 237]
Figure 6: Array is sorted after algorithm execution.
Therefore we can state that the work is in Θ(m), where m is the length of the interval from the minimum value to the maximum value. As this is a sequential algorithm, the span will also match the work and therefore be in O(m).
It’s worth noticing that we only need n extra space, where n is the length of the array to be sorted.
For an example of the algorithm applied to a list of bytes, please see Figures 2  6.
3. Parallelizing the algorithm
Readers familiar with Bloom Filters will recognize that there are certain operations in the algorithm described in the previous section, that can be parallelized.
1
2
3
4
5
6
7
8
9
10
let rec aux f lower high = parallel {
if lower < high then
let mid = lower + high >>> 1
let! fork = spawn < aux f lower mid
let! main = aux f (mid+1) high
let! join = fork
return Util.merge join main
else
return if f lower then [lower] else []
}
Figure 7: Pseudocode of a fork/join implementation in F#, an MLfamily programming language.
In this section we will write them down and show how we can lower the span, critical path, if an a computer with an unlimited amount of cores is provided and a fork/join approach is used, see pseudocode from Figure 7, which will reduce the global asymptotic time complexity of the algorithm:
3.1 Initialization
In the sequential algorithm, this step is done in linear time by iterating over each of the elements of the array.

Allocate underlying Bloom Filter array: Allocating the array in memory can be done in constant time while populating it with initial zero values can be done in logarithmic time, O(h), where h = log₂ n, when using a similar parallel algorithm as described in the pseudocode from Figure 7 in this section.

Seeds and Hashes: As the size of the bits of the underlying Bloom Filter array much greaterthan the size values of the seeds and the hashes, therefore none of these operations will have an impact on the time complexity. It’s worth noticing that the random generation of seeds as well as the instantiation of the hashes based on the seeds can be done in parallel as well.
The new span for this step is reduced from O(n) to O(h) where h = log₂ n and n is the length of the array.
3.2 Populating
In the sequential algorithm, this step is done in linear time by iterating over each of the elements of the array.

Updating the Bloom Filter with the hash values: As described in the sequential algorithm, even though we calculate several hash functions for each value, these operations don’t change the asymptotic time complexity of this step as the operations are in constant time. Therefore, as we are able to parallelize the update of the Bloom Filter with an approach as described in pseudocode from Figure 7 in this section. The new span for this step is reduced from O(n) to O(h) where h = log₂ n and n is the length of the array. It’s worth noticing that Bloom Filter are built on top of the or operation, which is the most suitable operation to be parallelized. As we don’t work with bytes we need to ensure that in order to ensure no race conditions, an architecture with support for concurrent atomic CPU operations [atomic], must be chosen.

Minimum and maximum: As above, we can also easily calculate these values by storing locally the value of the comparison between two numbers and bobble them all the way to the root of the recursive tree. The new span for this step is reduced from O(n) to O(h) where h = log₂ n is the length of the array. If the architecture supports concurrent atomic CPU operations, then the extra space needed can be reduced to O(1).
As both steps are reduced from O(n) to O(h) where h = log₂ n is the length of the array, we can conclude that the new span for this step is also reduced from O(n) to O(h).
3.3 Sorting
In the sequential algorithm, the span, O(m), is bound to the work which is in Θ(m), where m is the length of the interval from the minimum value to the maximum value.

Querying the Bloom Filter with the hash values We can easily parallelize the queries of the values contained in interval from the minimum value to the maximum value with a fork/join approach, reducing the span from O(n) to O(h’) where h’ = log₂ m. In order to make this value more understandable, we will showcase the following example with 32bit unsigned integers. Lets assume that our list contains some 32bit uints where the minimum value equals 0 and the maximum value equal 4.294.967.295 (2³²  1). Then our span would be O(32) as log₂ 4.294.967.295 = 32.

Merging Once we know if a value is definitely not in set then we can return an empty array for that case while for the other case we will return an array of size one containing that element. The merge process will consist of combining two arrays into one and copying the elements, in parallel into the newly created array, with the same approach of fork/join as mentioned in this section. The initialization and merge of the top array will be bounded to n, where n is the size of the array to sort, therefore reducing the span of this process O(lg n) as we can create a new array in constant time and populate it in O(h) where h = log₂ n.
As the second step dominates the first, we can argue that we have reduced from O(m) to O(h) where h = log₂ n is the length of the array, we can conclude that the new span for this step is also reduced from O(n) to O(h).
Sadly, we will have an overhead in memory usage as we are storing empty arrays, for at most m values. But, since the merge of empty arrays, which are minimalistic in size, could be stored on the unlimited cores CPUcache, then the memory overhead wouldn’t have that big of an impact on the algorithm.
3.4 Summary
By combining the time complexities from the different steps:
Tₖ(n) = O(h) + O(h) + O(h) = 3 · O(h) = O(h)
where h = log₂ n and n is the length of the array. Therefore, we can conclude that the work is Θ(m) where m is the length of the interval from the minimum value to the maximum value in the array to be sorted and the span is in logarithmic time.
As mentioned in the previous section, there will be an overhead in memory usage as we are storing empty arrays for at most m values. If the merge operations of empty arrays, which are minimalistic in size, could be stored on each of the unlimited cores CPUcache, then the memory overhead wouldn’t have that big of an impact on the algorithm.
4. Implementation
The algorithms are implemented in F#, which could easily be described as the OCaml implementation for .NET, as we already had a working native version of the Bloom Filter data structure. Here are a few implementation details that are worth noticing:
The hash algorithm chosen is MurmurHash3 [smhasher] as it has support for a seed which can be used to generate different hashes for the same key by reusing the same algorithm. In order to get a better seeds and hereby ensuring a more uniform distribution of values across the filter, see Figure 8 for a graphical representation, the RNGCryptoServiceProvider is used for random number generation instead of System.Random. See Hash and Random modules for more information.
Figure 8: Graphical representation of values uniformly distributed across the filter.
Usage of Interlocked.CompareExchange [interlocked] which allows concurrent atomic CPU operations to perform updates on data values with minimal overhead, see Figure 9. In order to use these operations with the spaceefficient data structure, we will need to instantiate our underlying array with ref ‘a values. In case the reader is not convinced that the minimum and maximum numbers can be found in parallel with these operations, it can easily be refactored to a binary tree of height h = log₂ n by using a (fork/join) approach, where the value of the comparisons are stored locally. It’s easier to see that the span, critical path, is also O(lg n) time but with a bit more space overhead O(n) as described in the previous section. Please look into the Atomic and BloomFilter modules for more information.
Figure 9: Several processors reading from the same shared memory but only one can
write atomically in a clock cycle.
In the parallel versions of the algorithm, by using true parallelism Array.Parallel (system threads) compared to using concurrency with Async.StartChild (lightweight threads) in a fork/join approach, shows how there is an extra memory overhead in the first as the operations require to allocate all the values in the range from minimum to maximum in an array in order to filter out values that are definitely not in set, while in the second approach there is less memory overhead as we can use the same approach as described in the previous paragraph when finding in parallel the minimum and maximum numbers in an array to merge lists by using lazy list that allows concatenation in constant time O(1). See the BloomFilter and List.Lazy modules for more information.
A full implementation can be seen in Appendix A: Source Code.
5. Concluding remarks
We have presented both a sequential as well as a parallel algorithm that uses the Bloom Filter data structure as an underlying component in order to sort an array with a very small collision probability of 0.00001%, which can be incremented and decremented for optimal results based on the size of the array.
We argue that the sequential algorithms asymptotic complexity is in Θ(m) time and space as the work is in Θ(m), where m is the length of the interval from the minimum value to the maximum value. As this is a sequential algorithm, the span will also match the work. Also it’s worth noticing that we only need n extra space, where n is the length of the array to be sorted.
And we argue that the parallel algorithms asymptotic complexity is in logarithmic time time as the dominating operation is bounded to the span O(lg n) where h = log₂ n is the length of the array. We also argue that the work is is Θ(m) where m is the length of the interval from the minimum value to the maximum value in the array and therefore we will need m extra space.
5.1 Notes:

In order to ease the understanding of the span for the parallel algorithm, the following example was used: Lets assume that our list contains some 32bit uints where the minimum value equals 0 and the maximum value equal 4.294.967.295 (2³²  1). Then our span would be O(32) as log₂ 4.294.967.295 = 32.

In order to limit the amount of work, it would be ideal to use this sorting algorithm with data that is to not scattered, as the amount of work is defined by the interval from minimum and maximum values of the array.

As this is a probabilistic sorting algorithm, you will have a small chance that the given array is not sorted after executing the algorithm. Therefore this algorithm shouldn’t be used critical operations.

It can easily be shown if an array is sorted in O(lg n), where n is the size of the array, by using the fork/join parallel approach mentioned in section 3. Parallelizing the algorithm, where you just need to ensure that the head of the merged leaf array is lower or equal to the head of the right array and bubble that boolean value up to the root of the tree. In case the array is not sorted, the sorting algorithm could be executed again but by providing an even lower probability which would result in a bit more of memory overhead and work.
References
[bloom] 
Bloom, Burton H. (1970), “Space/Time Tradeoffs in Hash Coding with Allowable Errors”, Communications of the ACM, 13 (7): 422–426, doi:10.1145/362686.362692 
[eppstein] 
David_Eppstein/Gallery https://commons.wikimedia.org/wiki/User:David_Eppstein/Gallery Accessed: 20171203 
[cao] 
Bloom Filters  the math http://pages.cs.wisc.edu/~cao/papers/summarycache/node8.html Accessed: 20171203 
[smhasher] 
SMHasher is a test suite designed to test the distribution, collision, and performance properties of noncryptographic hash functions. https://github.com/aappleby/smhasher Accessed: 20171203 
[atomic] 
Atomic Operations. https://software.intel.com/enus/node/506090 Accessed: 20171203 
[interlocked] 
Interlocked Operations. https://docs.microsoft.com/enus/dotnet/standard/threading/interlockedoperations Accessed: 20171203 
Bibtex
If you want to cite this blog post, you can use the following BibTeX
information:
@online{bfsort,
author = {Ramón Soto Mathiesen and Johan Thillemann},
title = {Bloom Filter Sort (bfsort)},
year = 2017,
url = {http://blog.stermon.com/articles/2017/12/06/bloomfiltersortbfsort},
urldate = {20171206}
}
Appendix A: Source Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
#!/usr/bin/env fsharpi
(* This construct is for ML compatibility. The syntax '(typ,...,typ) ident'
is not used in F# code. Consider using 'ident<typ,...,typ>' instead. *)
#nowarn "62"
(*
bfsort: Bloom Filter Sort
*)
let inline (>>=) m f = Option.bind f m
[<RequireQualifiedAccess>]
module Assert =
let sorted : int array > bool =
fun xs >
let rec aux acc = function
 []  [ _ ] > acc
 x :: [ y ] > (x <= y && acc)
 x :: y :: xs > aux (x <= y && acc) (y :: xs)
xs > Array.toList > aux true
[<RequireQualifiedAccess>]
module Atomic =
open System.Threading
(*
Interlocked Class:
https://msdn.microsoft.com/enus/library/
system.threading.interlocked(v=vs.110).aspx
*)
let rec private aux : (int > int > int) > int ref > int > int =
fun f a b >
let x = !a
let y = f x b
match x = Interlocked.CompareExchange(a,y,x) with
 true > y
 false > aux f a b
let or' : int ref > int > int =
fun x y > aux (  ) x y
let min : int ref > int > int =
fun x y > aux min x y
let max : int ref > int > int =
fun x y > aux max x y
[<RequireQualifiedAccess>]
module Binary =
open System
let mask = function  true > '■'  false > '□'
let fromInt : int > int > (bool > char) option > string option =
fun bits n mask >
let con = function  '1' > true  _____ > false
let def = function  true > '1'  false > '0'
let bin =
Convert.ToString(n,2).PadLeft(bits,'0')
> Seq.map (con >> (defaultArg mask def) >> string)
> Seq.fold ( + ) ""
match bin > String.length = bits with
 true > Some bin
 false > None
[<RequireQualifiedAccess>]
module Bytes =
[<RequireQualifiedAccess>]
module BigEndian =
open System
let private littleEndian = BitConverter.IsLittleEndian
let toUint32 : byte array > uint32 =
fun bs >
let bs' = if littleEndian then bs > Array.rev else bs
(bs',0) > BitConverter.ToUInt32
[<RequireQualifiedAccess>]
module Random =
open System
let private r = Random ()
let byte ( ) = r.Next(0 , 1 <<< 8)
let next lb ub = r.Next(lb, ub )
[<RequireQualifiedAccess>]
module Crypto =
open System.Security.Cryptography
let next () =
use rng = new RNGCryptoServiceProvider ()
let bs = Array.Parallel.init 32 (fun _ > 0uy)
rng.GetNonZeroBytes bs
Bytes.BigEndian.toUint32 bs
[<RequireQualifiedAccess>]
module Hash =
let private string2bytes : string > byte array =
System.Text.Encoding.UTF8.GetBytes
(*
FVN Hash:
http://isthe.com/chongo/tech/comp/fnv/
*)
let fnv1aHash : string > uint32 =
fun key >
(* Example: ./fnv1a32 s foo => 0xa9f37ed7 *)
let fnvp = (1 <<< 24) + (1 <<< 8) + 0x93 > uint32
let fnvob = 2166136261u
let b = key > string2bytes
let rec aux h = function
 i when i < (b.Length) >
let h' =
h ^^^ (b.[i] > uint32)
> fun x > x * fnvp
aux h' (i+1)
 _ > h
aux fnvob 0
(*
Murmur Hash:
https://sites.google.com/site/murmurhash/
*)
let murmur3Hash : uint32 > string > uint32 =
fun seed key >
(* Example: ./murmur3 foo => x86_32: b12f489e (seed 42) *)
let rotl x r = (x <<< r)  (x >>> (32  r))
let fmix h =
h
> fun x > x ^^^ (x >>> 16)
> fun x > x * 0x85ebca6bu
> fun x > x ^^^ (x >>> 13)
> fun x > x * 0xc2b2ae35u
> fun x > x ^^^ (x >>> 16)
let getblock b i =
System.BitConverter.ToUInt32(value = b, startIndex = i)
let data = key > string2bytes
let len = data > Array.length
let nblocks = len >>> 2 (* equivalent to len / 4 but faster *)
let h1 = seed
let c1 = 0xcc9e2d51u
let c2 = 0x1b873593u
(* body *)
let rec body h = function
 i when i < (nblocks) >
let k1 =
getblock data (i * 4)
> fun x > x * c1
> fun x > rotl x 15
> fun x > x * c2
let h' =
h ^^^ k1
> fun x > rotl x 13
> fun x > x * 5u + 0xe6546b64u
body h' (i+1)
 _ > h
let h1' = body h1 0
(* tail *)
let tail = nblocks * 4
let rec tail' (k,h) = function
 0 > h
 1 >
let k' =
k ^^^ (uint32 data.[tail])
> fun x > x * c1
> fun x > rotl x 15
> fun x > x * c2
let h' = h ^^^ k'
tail' (k',h') (0)
 i >
let k' =
(uint32 data.[tail + (i  1)]) <<< (1 <<< (i + 1))
> fun x > k ^^^ x
tail' (k',h) (i1)
let h1'' = tail' (0u,h1') (len &&& 3)
(* finalization *)
h1'' ^^^ (uint32 len)
> fun x > x > fmix
[<RequireQualifiedAccess>]
module Logarithm =
(*
ISO 3111:
https://en.wikipedia.org/wiki/
ISO_3111#Exponential_and_logarithmic_functions
*)
let ln : double > double = log
let lb : double > double = fun x > System.Math.Log (x,2.)
let lg : double > double = fun x > System.Math.Log10 (x )
[<RequireQualifiedAccess>]
module List =
[<RequireQualifiedAccess>]
module Lazy =
type 'a lazylist = Nil  Cons of 'a * (unit > 'a lazylist)
let rec merge : 'a lazylist * 'a lazylist > 'a lazylist = function
 xs, Nil  Nil, xs > xs
 Cons (x,xs), Cons (y,ys) >
let x' = min x y
let y' = max x y
Cons (x', fun () > merge (xs (), Cons (y', fun () > ys())))
let rec toList : 'a lazylist > 'a list = function
 Nil > []
 Cons (x, xs) > x :: (toList (xs()))
let toArray : 'a lazylist > 'a array = fun xs > toList xs > List.toArray
[<RequireQualifiedAccess>]
module BloomFilter =
type bloomFilter = private Config of config
and private config =
{ parameters : parameters
bits : int ref array
seeds : uint32 array
hashes : (string > uint32) array
}
and private parameters =
{ n : uint32
p : double
k : int
m : uint32
}
let init : uint32 > double > uint32 array option > bloomFilter option =
fun n p seeds >
let m =
(* ceil up to nearest 2^n in order to use &&& (n1) instead of % n *)
(float n * Logarithm.ln p) / (Logarithm.ln 2. ** 2.)
(* round to nearest 32 bits *)
> uint32 >>> 5 <<< 5
let k =
(*
Optimal number of hashes:
http://pages.cs.wisc.edu/~cao/papers/summarycache/node8.html
*)
(float m / float n) * Logarithm.ln 2.
> (ceil >> int)
let bs = Array.Parallel.init (m >>> 5 > int) (fun _ > ref 0)
match seeds with
 Some ss >
if ss > Array.length = k then
Config
{ parameters =
{ n = n
p = p
k = k
m = m
}
bits = bs
seeds = ss
hashes = ss > Array.Parallel.map Hash.murmur3Hash
}
> Some
else
None
 None >
let ss = Array.Parallel.init k (fun _ > Random.Crypto.next ())
Config
{ parameters =
{ n = n
p = p
k = k
m = m
}
bits = bs
seeds = ss
hashes = ss > Array.Parallel.map Hash.murmur3Hash
}
> Some
let info : bloomFilter > (uint32 * double * int * uint32 array) option =
fun (Config { parameters = { n = n; p = p; k = k }; seeds = ss; })>
(n,p,k,ss) > Some
let private bytebit hash =
let byte = hash >>> 5
let bit = hash  (byte <<< 5)
byte,bit
let private sbit : int > int = fun n > 1 <<< (31n)
let private gbit : int > int > int =
fun i32 n >
System.Convert.ToString(i32,2).PadLeft(32,'0').[n] > string > int
let add : string > bloomFilter > unit option =
fun key (Config { parameters = { m = m; }; hashes = hs; bits = bs; }) >
hs
> Array.Parallel.map (fun f > f key % m > (int >> bytebit))
> Array.Parallel.iter(fun (x,y) > Atomic.or' bs.[x] (sbit y) > ignore)
> Some
let query : string > bloomFilter > bool option =
fun key (Config { parameters = { m = m; }; hashes = hs; bits = bs; }) >
hs
> Array.Parallel.map (fun f > f key % m > (int >> bytebit))
> Array.fold(fun a (x,y) > (gbit !bs.[x] y = 1) && a) true
> Some
let print : bloomFilter > string array option =
fun (Config { bits = bs; }) >
let masked = fun n > n > Binary.fromInt 32 < Some Binary.mask
bs
> Array.Parallel.map (fun n > masked !n)
> Array.Parallel.choose id
> Some
let length : bloomFilter > int option =
fun (Config { bits = bs; }) >
bs
> Array.length
> Some
let sort : int array > double option > int array =
fun xs probability >
let size = xs > Array.length
let minimum = System.Int32.MaxValue > ref
let maximum = System.Int32.MinValue > ref
(*
Probability of collisions:
0.1 > 10%
0.01 > 1%
0.001 > 0.1%
0.0001 > 0.01%
0.00001 > 0.001%
0.000001 > 0.0001%
0.0000001 > 0.00001% (still with 2*n space)
*)
let collision = defaultArg probability 1.e7
(*
Create & Update Bloom Filter, Min & Max values in parallel
*)
let bf = init (uint32 size) collision None
xs
> Array.Parallel.iter (
fun k >
Atomic.min minimum k > ignore
Atomic.max maximum k > ignore
bf >>= add (string k) > ignore
)
(*
Create a new array with sorted values in sequential.
Note: Extra space is bound to size of array
*)
let rec aux : int list > int > int array =
fun acc n >
let acc' =
if bf >>= query (string n) = Some true then
n :: acc
else
acc
match n with
 i when i = !minimum > List.toArray acc'
 i > aux acc' (i1)
aux [] !maximum
[<RequireQualifiedAccess>]
module Parallel =
let sort : int array > double option > int array =
fun xs probability >
let size = xs > Array.length
let minimum = System.Int32.MaxValue > ref
let maximum = System.Int32.MinValue > ref
(*
Probability of collisions:
0.1 > 10%
0.01 > 1%
0.001 > 0.1%
0.0001 > 0.01%
0.00001 > 0.001%
0.000001 > 0.0001%
0.0000001 > 0.00001% (still with 2*n space)
*)
let collision = defaultArg probability 1.e7
(*
Create & Update Bloom Filter, Min & Max values in parallel
*)
let bf = init (uint32 size) collision None
xs
> Array.Parallel.iter (
fun k >
Atomic.min minimum k > ignore
Atomic.max maximum k > ignore
bf >>= add (string k) > ignore
)
(*
Create a new array with sorted values in parallel.
Note: Parallel version would require more than O(N) of extra space
*)
Array.Parallel.init (!maximum  !minimum + 1) (
fun i >
let n = !minimum + i
if bf >>= query (string n) = Some true then
Some n
else
None
)
> Array.Parallel.choose id
[<RequireQualifiedAccess>]
module Async =
[<RequireQualifiedAccess>]
module Parallel =
let merge : 'a array > 'a array > 'a array =
fun xs ys >
let n,m = Array.length xs, Array.length ys
Array.Parallel.init (n+m) (
fun i > if i < n then xs.[i] else ys.[in]
)
let sort : int array > double option > int array =
fun xs probability >
let size = xs > Array.length
let minimum = System.Int32.MaxValue > ref
let maximum = System.Int32.MinValue > ref
(*
Probability of collisions:
0.1 > 10%
0.01 > 1%
0.001 > 0.1%
0.0001 > 0.01%
0.00001 > 0.001%
0.000001 > 0.0001%
0.0000001 > 0.00001% (still with 2*n space)
*)
let collision = defaultArg probability 1.e7
(*
Create & Update Bloom Filter, Min & Max values in parallel
*)
let bf = init (uint32 size) collision None
xs
> Array.Parallel.iter (
fun k >
Atomic.min minimum k > ignore
Atomic.max maximum k > ignore
bf >>= add (string k) > ignore
)
(* This implementation is slower than when using lazy lists
let rec aux low high =
async {
if low < high then
let mid = low + high >>> 1
let! fork = Async.StartChild (aux low mid)
let! main = aux (mid+1) high
let! join = fork
return merge join main
else
let condition =
if bf >>= query (string low) = Some true then
[ low ]
else
[ ]
return condition
}
aux !minimum !maximum
> Async.RunSynchronously
*)
let rec aux low high =
async {
if low < high then
let mid = low + high >>> 1
let! fork = Async.StartChild (aux low mid)
let! main = aux (mid+1) high
let! join = fork
return List.Lazy.merge (join, main)
else
let condition =
if bf >>= query (string low) = Some true then
List.Lazy.Cons (low, fun () > List.Lazy.Nil)
else
List.Lazy.Nil
return condition
}
aux !minimum !maximum
> Async.RunSynchronously
> List.Lazy.toArray
(* Explanation with the finite byte Set: {0} ∪ ℕ<0xFF *)
let explanation () =
let pprint : 'a array > unit =
fun xs >
xs > Array.chunkBySize 2 > Array.iter (printfn "%A")
(* Generate unique random values in a specific range *)
let upper = System.Byte.MaxValue > int
let lower = System.Byte.MinValue > int
let size = (upper  lower) >>> 3
let randoms =
Array.Parallel.init size (fun _ > Random.next lower upper)
> Array.distinct
printfn "Random array of unique bytes:"
randoms > printfn "%A"
(* Init with with constraints *)
let bf = BloomFilter.init (uint32 size) 0.0000001 < None
(* Info *)
bf >>= BloomFilter.info >>= (printfn "Info:\n%A" >> Some) > ignore
(* Before *)
printfn "Before:"
bf >>= BloomFilter.print >>= (pprint >> Some) > ignore
(* Populate BF and narrow the Set by setting minimum and maximum values *)
let min' = System.Byte.MaxValue > int > ref
let max' = System.Byte.MinValue > int > ref
let populate =
randoms
> Array.Parallel.iter(
fun b >
Atomic.min min' b > ignore
Atomic.max max' b > ignore
bf >>= BloomFilter.add (string b) > ignore
)
(* After *)
printfn "After:"
bf >>= BloomFilter.print >>= (pprint >> Some) > ignore
(* Sort with a collision probability of 0.0000001 *)
let xs' =
let rec aux lower high =
async {
if lower < high then
let mid = lower + high >>> 1
let! fork = Async.StartChild (aux lower mid)
let! main = aux (mid+1) high
let! join = fork
return List.Lazy.merge (join, main)
else
let condition =
if bf >>= BloomFilter.query (string lower) = Some true then
List.Lazy.Cons (lower, fun () > List.Lazy.Nil)
else
List.Lazy.Nil
return condition
}
aux !min' !max'
> Async.RunSynchronously
> List.Lazy.toArray
printfn "Array sorted: %b" < Assert.sorted xs'
xs' > printfn "%A"
(* Example of usage of the sorting algorithm *)
let examples () =
(* Generate unique random values in a specific range *)
let upper = 355_000
let lower = 115_050
let size = (upper  lower) >>> 1
let randoms =
Array.Parallel.init size (fun _ > Random.next lower upper)
> Array.distinct
let xs = randoms > BloomFilter .sort < None
let ys = randoms > BloomFilter .Parallel.sort < Some 1.e7
let zs = randoms > BloomFilter.Async.Parallel.sort < Some 1.e7
printfn "Assert arrays are sorted: %b"
(Assert.sorted xs && Assert.sorted ys && Assert.sorted zs)
explanation ()