This is the last part of a three part series that starts here.

The `SetMap`

class discussed in the previous entry essentially serves as a wrapper around our hash table. We have also implemented a mechanism for specifying what constitutes a valid score for a given type of set. And we have provided our three target classes that inherit from `SetMap`

. The basics of our model are thus in place.

What remains to be implemented is all the interesting operations on sets! The remaining part of the interface will, however, not interact with the internally used hash table directly, but only through the interface developed so far.

For the purpose of implementing those additional operations, we open a new module `SetLike`

, which we include in `SetMap`

. Since our target classes `ClassicalSet`

, `FuzzySet`

and `MultiSet`

inherit from `SetMap`

, they will also be able to access the functionality provided by `SetLike`

.

### Division of Labor

The division of labor we adopt here takes a cue from the place the `Enumerable`

module occupies in Ruby’s design. `Enumerable`

provides a number of useful methods for working with collections to any class that chooses to include it. In doing so, `Enumerable`

assumes that the class implements an `each`

method, which forms the basis for all the methods `Enumerable`

defines. Beyond this, however, `Enumerable`

does not (need to) know anything about the class using it. Here, we do something very similar:

`SetLike`

provides most of the functionality commonly associated with sets.`SetLike`

assumes that any class that uses it implements the instance methods`retrieve`

,`insert`

,`delete`

,`each`

and`size`

.`SetLike`

does not make any further assumptions about the internals of the class using it.

In particular, none of the methods in `SetLike`

need to know that we are using a hash for internal storage. This draws a distinction between methods that do need to know that we have chosen to represent sets with hash tables (they go in the `SetMap`

class), and methods that don’t need to know this (they go in the `SetLike`

module), and thus encapsulates the internal state of a set in `SetMap`

. As long as we keep the public interface of `SetMap`

stable, we could just as well reimplement all its methods using a binary search tree instead of a hash for storing a set, say: `SetLike`

will not care.

### Operations on sets

We would like to implement the usual operations on sets, like `union`

, `intersection`

, and `difference`

. Since they all follow essentially the same pattern, we focus on just one of them, `union!`

(the destructive version of `union`

).

Following the example established by Ruby’s `Set`

class, we first implement a helper method `SetLike#do_with`

that yields to a block:

1
2
3
4
5
6
7
8
9

def do_with(other)
unless other.instance_of?(self.class)
raise SetError, "#{self.class} instance needed"
end
return other.each unless block_given?
other.each { |key, val| yield([key, val]) }
end
private :do_with

This methods takes care of the type-checking for us. We would not, e.g., want to subtract a fuzzy set from a classical set, as the result will not in general be a classical set. The above guard clause ensures that the operations we will define are only carried out for two objects that belong to the same target class. Beyond this, `do_with`

simply yields the key-value pairs of the set passed to it as an argument. Using `do_with`

, we implement `union!`

as follows:

1
2
3
4
5
6

def union!(other)
do_with(other) do |key, _|
self[key] = [self[key], other[key]].max
end
self
end

The union of a given set with another one is defined by maximizing scores for given keys across the two sets. This is how the union is usually defined, and it yields the expected results.

1
2
3
4
5
6
7
8

# SetMap::from_hash(hsh) creates a new set instance and populates it
# with the key-value pairs from the given hash `hsh`
multi_set1 = MultiSet.from_hash(2 => 1, 3 => 2)
multi_set2 = MultiSet.from_hash(2 => 2, 4 => 1)
multi_set1.union!(multi_set2)
multi_set1 == MultiSet.from_hash(2 => 2, 3 => 2, 4 => 1) # true

`union!`

implements a notion of “combining what is contained in two given sets”. There is a similar, yet slightly different notion which is interesting from the perspective of our polymorphic approach: the *sum* of two sets. So let’s briefly digress:

1
2
3
4

def sum!(other)
do_with(other) { |key, val| insert(key, val) }
self
end

Rather than maximizing scores, as `union!`

did, `sum!`

adds the scores given by the other set to the scores of the receiver. For classical sets, `sum!`

and `union!`

amount to the very same thing:

1
2
3
4
5
6
7
8
9
10

# SetMap::[](*list) creates a new set instance and turns the
# members of `list` into keys of the new instance.
set1 = ClassicalSet[1, 2, 3]
set2 = ClassicalSet[2, 3, 4]
set3 = ClassicalSet[1, 2, 3]
set4 = ClassicalSet[2, 3, 4]
set1.union!(set2) == set3.sum!(set4) # true

This is why Ruby’s own `Set`

class simply defines `+`

(the sum operator) to be an alias of `|`

(the union operator). However, for other types of sets, sum and union are not always the same. This is because taking the maximum of two scores is not generally the same as summing up those two scores! For example:

1
2
3
4
5
6
7
8
9
10
11

set1 = FuzzySet.from_hash(2 => 0.4)
set2 = FuzzySet.from_hash(2 => 0.3)
set3 = FuzzySet.from_hash(2 => 0.4)
set4 = FuzzySet.from_hash(2 => 0.3)
set1.sum!(set2)
set3.union!(set4)
set1 == FuzzySet.from_hash(2 => 0.7) # true
set3 == FuzzySet.from_hash(2 => 0.4) # true

Our model captures all of this correctly.

### Set predicates

A classical set `A`

is a subset of a classical set of `B`

if any element of `A`

is also an element of `B`

. This can be expressed in terms of keys and their associated scores by saying that the score for any key in `A`

is less than or equal to the score for that same key in `B`

. This definition also applies to multisets, and fuzzy sets, so that, again, a common implementation is possible. Here is a first stab at the `SetLike#subset?`

method:

1
2
3
4
5
6
7
8

def subset?(other)
return false unless other.instance_of?(self.class)
all? do |key, _|
self[key] <= other[key]
end
end
alias <= subset?

Following the pattern established by the `do_with`

method discussed above, however, it makes sense to extract the “key comparison” functionality to a separate `compare_with?`

method that takes a block (you may want to check out the code of the multiset gem—the gem author does exactly this).

1
2
3
4
5
6
7
8
9
10
11

def keys
each.map(&:first)
end
def compare_with?(other)
return false unless other.instance_of?(self.class)
(keys | other.keys).all? do |key|
yield(self[key], other[key])
end
end

According to line 2 above, the keys of a set object are given by the first component of each key-value pair. `SetLike#compare_with?`

then iterates over the keys of both `self`

and `other`

, and yields the corresponding values to the block. This allows us to implement `subset?`

as follows:

1
2
3
4
5
6

def subset?(other)
compare_with?(other) do |s, o|
s <= o
end
end
alias <= subset?

Definitions for the other common set predicates (`proper_subset?`

, `superset?`

and `proper_superset?`

) are similar, so we omit them here.

### Equivalence

When are two sets `A`

and `B`

the same? Again, there is an answer that works for all three target classes: the two sets should be in a mutual inclusion relation, i.e., `A`

should be a subset of `B`

, and `B`

a subset of `A`

. However, invoking `subset?`

twice seems slightly redundant, since in the worst case, this amounts to performing every comparison twice. Using the `compare_with?`

method defined above, we can more simply write the following code:

1
2
3
4
5
6
7

def equivalent?(other)
compare_with?(other) do |s, o|
s == o
end
end
alias == equivalent?
alias eql? equivalent?

Notice that we have aliassed the `equivalent?`

method both as `==`

and `eql?`

. We have already taken the `==`

method for granted in some earlier snippets. Now what about `eql?`

? This brings us to our final topic for today:

### Nested sets

Unless overridden, the `Object#eql?`

method considers two objects to be the same if they are identical (i.e., are stored at the same location in memory). In the current context, overriding `Object#eql?`

is critical, because `eql?`

is the method that Ruby uses when accessing hash keys. Let’s leave the context of our `SetLike`

module for a moment, and consider this line of Ruby code:

1

some_hash[some_obj]

When executing this line, Ruby will check if there is a key `key`

to be found in `some_hash`

with the property that `some_obj.eql?(key)`

. If so, the value for `key`

will be returned.

For our purposes, this process of looking up keys in a hash is crucial because (1) we are using hashes for storing set membership information, and (2) we would like to be able to model *nested* sets, which are sets that have sets among their keys. Consider:

1
2
3
4
5
6
7
8
9

# SetMap::[](*list) creates a new set instance and turns the
# members of `list` into keys of the new instance.
set1 = ClassicalSet[1, 2, 3]
set2 = ClassicalSet[1, 2, 3]
set3 = ClassicalSet[set1, 4]
set4 = ClassicalSet[set2, 4]
set3 == set4 #=> ?

Now the question is whether `set3`

and `set4`

are the same set. It seems that the answer should be yes, because, after all, they *contain the same elements*. Assume, however, for a moment that we had not overridden `eql?`

. Then `set3`

and `set4`

do *not* come out the same in the sense of `==`

because `set1`

and `set2`

do not reference the same object, which implies that `set3[set1] == set4[set1]`

will return false, for the simple reason that `set1`

is not considered a key in the set `other`

, since it is not the case that `set2.eql?(set1)`

. So overriding `eql?`

, like we did above, is indeed critical.

### The `hash`

method

As a final aside: For our set comparisons to work, we also need to override the `Object#hash`

method so as to ensure that two set objects that are `eql?`

have the same return value when `hash`

is called on them. This can be achieved, e.g., like this:

1
2
3

def hash
each.map(&:hash).sum
end

Here, we simply map each key-value pair (a two-element array) yielded by `each`

to its `hash`

value and sum up the result, trusting that `Array#hash`

is implemented in a meaningful way. And this indeed ensures that `eql?`

sets have the same `hash`

return value.

### Coda

What has been achieved? As mentioned at the beginning of part 01, the set functionality that we have discussed is either readily available as part of Ruby’s Standard Library (for classical sets), or via an easily accesible Ruby gem (for multisets). However, the code presented here presents a *uniform* perspective on classical sets and multisets. While I have written `SetMap`

and its child classes as an exercise for myself, I consider this uniformity an advantage over the pre-existing implementation. We have also seen how easily the approach generalizes to further use cases by considering fuzzy sets.

While coming up with a first working implementation of the types of sets discussed here was pretty straightforward, arriving at a way to structure and modularize my code that I found convincing myself required me to go back to the drawing board several times. If you have a chance to check out my code, your feedback would be greatly appreciated.