High performance allocation applications with the Hungarian algorithm
At some point, we all have to deal with some kind of allocation application. For example: assigning students an exam version, assigning an hour for a certain appointment based on availability, assigning a task force to certain employees, ...
In many cases this can easily be taken care of, just think of a bidding site where the product is automatically assigned to the highest bid. However, there often are much more complex situations. For example: assigning an exam time slot to students where those students can indicate a preference.
For that situation we will create an example scenario, in which we can assign as many students as possible their favourite choice.
Our scenario
We have a variable number of students and an array of time slots that each have x-number of available places (in total there are 78 available places):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
$students = range(0,10);
$slotsAvailability = [
'10:00' => 2,
'10:15' => 6,
'10:30' => 3,
'10:45' => 2,
'11:00' => 8,
'11:15' => 5,
'11:30' => 10,
'11:45' => 2,
'12:00' => 10,
'12:15' => 20,
'12:30' => 2,
'12:45' => 8,
];
The script will generate a random array of preferences and then compare the combinatorial solution with the solution using the Hungarian algorithm.
Combinatorial solution
In order to find the most optimal combination, we should first calculate all combinations and then run through them to find out which ones are possible.
To calculate all combinations we have used the function below:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
public function combinations($arrays, $i = 0) {
if (!isset($arrays[$i])) {
return [];
}
if ($i == count($arrays) - 1) {
return $arrays[$i];
}
// get combinations from subsequent arrays
$tmp = $this->combinations($arrays, $i + 1);
$result = [];
// concat each array from tmp with each element from $arrays[$i]
foreach ($arrays[$i] as $k => $v) {
foreach ($tmp as $x => $t) {
$result[] = is_array($t) ?
array_merge([$v], $t) :
[$v, $t];
}
}
return $result;
}
Then, we will run through each combination to check whether it is a possible combination according to the available places for a certain time slot
The first combination we encounter, of which all chosen time slots are available, will be marked as $finalCombination.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
foreach ($combinations as $combination) {
// Check if available
$combinationOk = true;
foreach ($slotsAvailability as $slot => $amount) {
$arrayCountValues = array_count_values($combination);
if (isset($arrayCountValues[$slot]) && $arrayCountValues[$slot] > $amount) {
$combinationOk = false;
break;
}
}
if ($combinationOk) {
$finalCombination = $combination;
break;
}
}
This solution works perfectly for small allocation matrixes.
However, when we have a matrix with more students, we soon notice that this solution reaches its limits and already goes out of memory at 15 students.
We now have two possible solutions, to give the PHP process more memory or to look for another solution, such as the Hungarian algorithm.
Hungarian algorithm
The Hungarian algorithm is a so-called combinatorial optimization algorithm. This means that we will look for the most optimal combination within a very large collection of possible combinations, without having to calculate and go over all those combinations.
There are already several PHP classes with which you can get started right away.
We used munkres.php (Munkres algorithm = Hungarian algorithm) to get started and altered this class to HungarianAlgorithm.php so the class can also process non-square matrixes.
How do we use the algorithm?
The class expects a non-associative array as matrix. So it is important to keep track of a mapping with the values corresponding to the indexes in our matrix.
This matrix is structured as follows (template + example)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$matrix = [
'<student>' => [
'<slot_1>' => '<choice_number>',
'<slot_2>' => '<choice_number>',
'<slot_3>' => '<choice_number>',
],
1 => [
0 => 2,
1 => 1,
2 => 1,
3 => 3,
4 => 3,
5 => 999,
]
];
Here are a few remarks:
- If a time slot has more availabilities than one, it should occur several times in the matrix. (e.g. slot 1 and 2 are both 10h15, slot 3 and 4 are both 10h45)
- If there are several availabilities for a certain time slot, the same preference (choice number) should be given to it as well
- If there are more possible slots than a student is allowed to choose, for example: there are 12 slots, but students are only allowed to pass their top 3, we pass 999 for the remaining slots, so they will certainly not be chosen by the algorithm.
- Once we have set up our matrix, we can pass it on to the algorithm and let the algorithm run.
- The algorithm will return a matrix in the same structure as the matrix we gave as input, only the "choice_numbers" are now replaced by zeros and a one. The value with value "1" is the slot that was selected for that student.
Now all you have to do is to map this selection back to the actual value of the time slot.
In the example below, if we get a 1 back for time slot 4, we will map it back to 10:45 for Jane using the SlotMapping and StudentMapping arrays.
Input | Output | SlotMapping | StudentMapping |
---|---|---|---|
$matrix = [ 1 => [ 0 => 2, 1 => 1, 2 => 1, 3 => 3, 4 => 3, 5 => 999, ] ]; | $matrix = [ 1 (student) => [ 0 => 0, 1 => 0, 2 => 0, 3 => 0, 4 => 1 (chosen slot), 5 => 0, ] ]; | $slots = [ 0 => 10u00, 1 => 10u15, 2 => 10u15, 3 => 10u45, 4 => 10u45, 5 => 11u00 ] | $students = [ 0 => 'John', 1 => 'Jane', 3 => 'Chris' ] |
Performance
As mentioned before, the combinatorial solution already went OOM when calculating the combinations for 15 students.
If we run the Hungarian algorithm for the same number of students, this runs smoothly.
What are the exact advantages of the Hungarian algorithm in terms of performance? We make the equation.
The number of time slots is still 78 as indicated in the initial example. Each result is the average of 5 tests.
Aantal studenten |
Array combinations (s) |
Hungarian Algorithm (s) |
5 |
0.00143981 |
0.002079821 |
10 |
0.168286848 |
0.011980629 |
15 |
OOM |
0.028774214 |
20 |
OOM |
0.070318842 |
40 |
OOM |
0.342720747 |
75 |
OOM |
2.176020622 |
Conclusion
For small, simple allocations, the combinatorial solution will suffice, but if you need to process large allocation matrices, the Hungarian algorithm is certainly a recommended solution. It is very quick to implement and can be used for various purposes.
When to use the Hungarian algorithm? You can find an overview below:
- When we need to process large matrices.
- Allocation based on preference and availability. Here we assign as many people as possible their favourite choice.
- Allocation based on costs. Here we assign tasks in such a way that the total cost is as low as possible.
When to use the combinatorial solution or another solution:
- Small matrices of, for example, 5x3. For these, it also turns out that the combinatorial solution is slightly faster.
- Allocation based on logical equation. For example: highest bid, best reputation, first come first serve, ...
- Completely random allocation. For example "secret santa"
- When different parameters have to be taken into account in the allocation. We can also look at possible extensions of the Hungarian algorithm here, depending on which parameters we have to compare.