What makes a collection enumerable? Largely it is just the fact of being a collection. The module Enumerable has the requirement that the default iterator each should be defined. Sequence as such is not an issue since even an unordered collection such as a hash can have an iterator.
Additionally, if the methods min, max, and sort are to be used, the collection must have a comparison method (<=>). This is fairly obvious.
So an enumerable is just a collection that can be searched, traversed, and possibly sorted. As a rule of thumb, any user-defined collection that does not subclass an existing core class should probably mix in the Enumerable module.
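As a minimal sketch of this rule of thumb, the class below defines only each and mixes in Enumerable. The name NumberBag and its constructor are illustrative assumptions, not part of Ruby. Because the stored items are integers, which already respond to <=>, the methods min, max, and sort work as well.

```ruby
# A hypothetical collection class; only each is written by hand.
class NumberBag
  include Enumerable

  def initialize(*items)
    @items = items
  end

  # Enumerable builds its other methods on top of this iterator.
  def each
    @items.each {|item| yield item }
    self
  end
end

bag = NumberBag.new(4, 1, 3)
p bag.map {|x| x * 2 }     # [8, 2, 6]
p bag.find {|x| x > 2 }    # 4
p bag.include?(3)          # true
p bag.min                  # 1
p bag.sort                 # [1, 3, 4]
```

None of map, find, include?, min, or sort is defined in NumberBag itself; they all come from the mixed-in module.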
Bear in mind that what we say about one enumerable applies in effect to all of them. The actual data structure could be an array, a hash, or a tree, to name a few.
There are, of course, some nuances of behavior. An array is an ordered collection of individual items, whereas a hash is an unordered collection of paired key-value associations. Naturally there will be differences in their behavior.
Many of the methods we looked at for arrays and/or hashes (such as map and find) really originate here in the Enumerable module. In many cases it was difficult to determine how to cover this material. Any confusion or inaccuracy should be considered my fault.
The array is the most common and representative collection that mixes in this module. Therefore by default I will use it as an example.
The inject method comes to Ruby via Smalltalk (and was introduced in Ruby 1.8). Its behavior is interesting, if a little difficult to grasp at first sight.
This method relies on the fact that frequently we will iterate through a list and "accumulate" a result that changes as we iterate. The most common example, of course, would be finding the sum of a list of numbers. Whatever the operation, there is usually an "accumulator" of some kind (for which we supply an initial value) and a function or operation we apply (represented in Ruby as a block).
For a trivial example or two, suppose that we have this array of numbers and we want to find the sum of all of them:
    nums = [3,5,7,9,11,13]
    sum = nums.inject(0) {|x,n| x+n }
Note how we start with an accumulator of 0 (the "addition identity"). Then the block gets the current accumulated value and the current value from the list passed in. In each case, the block takes the previous sum and adds the current item to it.
Obviously, this is equivalent to the following piece of code:
    sum = 0
    nums.each {|n| sum += n }
So the abstraction level is only slightly higher. If inject never fits nicely in your brain, don't use it. But if you get over the initial confusion, you might find yourself inventing new and elegant ways to use it.
The accumulator value is optional. If it is omitted, the first item is used as the accumulator and is then omitted from iteration.
    sum = nums.inject {|x,n| x+n }

    # Means the same as:

    sum = nums[0]
    nums[1..-1].each {|n| sum += n }
A similar example is finding the product of the numbers. Note that the accumulator, if given, must be 1 since that is the "multiplication identity."
    prod = nums.inject(1) {|x,n| x*n }

    # or:

    prod = nums.inject {|x,n| x*n }
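When the block does nothing but apply a single operator, the method name can be passed as a symbol instead. This sketch assumes Ruby 1.8.7 or later, where inject accepts a symbol argument.

```ruby
nums = [3,5,7,9,11,13]

sum  = nums.inject(:+)       # same as nums.inject {|x,n| x+n }
prod = nums.inject(1, :*)    # initial value first, then the operator

p sum     # 48
p prod    # 135135
```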
The following slightly more complex example takes a list of words and finds the longest word in the list:
    words = %w[ alpha beta gamma delta epsilon eta theta ]
    longest_word = words.inject do |best,w|
      w.length > best.length ? w : best
    end
    # return value is "epsilon"
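The accumulator need not be a number or even the same kind of object as the items. As a sketch, the following tallies the words by length in a hash; the variable names are illustrative. Note that the block must return the accumulator, because the block's value is what inject passes to the next iteration.

```ruby
words = %w[ alpha beta gamma delta epsilon eta theta ]

counts = words.inject(Hash.new(0)) do |tally, w|
  tally[w.length] += 1
  tally              # the block's value becomes the next accumulator
end

p counts[5]    # 4  (alpha, gamma, delta, theta)
p counts[7]    # 1  (epsilon)
p counts[3]    # 1  (eta)
```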
The quantifiers any? and all? were added in Ruby 1.8 to make it easier to test the nature of a collection. Each of these takes a block (which of course tests true or false).
    nums = [1,3,5,8,9]

    # Are any of these numbers even?
    flag1 = nums.any? {|x| x % 2 == 0 }    # true

    # Are all of these numbers even?
    flag2 = nums.all? {|x| x % 2 == 0 }    # false
In the absence of a block, these simply test the truth value of each element. That is, a block {|x| x } is added implicitly.
    flag1 = list.all?   # list contains no falses or nils
    flag2 = list.any?   # list contains at least one true value
                        # (neither nil nor false)
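One corner case is worth checking by hand: the empty collection. Since no element can fail the test, all? is true; since no element can pass it, any? is false. The short sketch below also shows the blockless forms on a sample list of my own choosing.

```ruby
empty = []
p empty.all? {|x| x % 2 == 0 }    # true  (no element fails the test)
p empty.any? {|x| x % 2 == 0 }    # false (no element passes the test)

list = [1, nil, "three", false]
p list.all?                       # false (the list contains nil and false)
p list.any?                       # true  (1 is a true value)
```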
As the saying goes, "There are two kinds of people in the world: those who divide people into two kinds, and those who don't." The partition method doesn't deal with people (unless we can encode them as Ruby objects), but it does divide a collection into two parts.
When partition is called and passed a block, the block is evaluated for each element in the collection. The truth value of each result is then evaluated, and a pair of arrays (inside another array) is returned. All the elements resulting in true go in the first array; the others go in the second.
    nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

    odd_even = nums.partition {|x| x % 2 == 1 }
    # [[1,3,5,7,9],[2,4,6,8]]

    under5 = nums.partition {|x| x < 5 }
    # [[1,2,3,4],[5,6,7,8,9]]

    squares = nums.partition {|x| Math.sqrt(x).to_i**2 == x }
    # [[1,4,9],[2,3,5,6,7,8]]
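Because the result is always a two-element array, it is often convenient to unpack it with parallel assignment. The sketch below also partitions a hash, in which case each element is a key-value pair; the stock data is invented for illustration, and the ordering shown in the comments assumes a Ruby version (1.9 or later) in which hashes preserve insertion order.

```ruby
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]
odds, evens = nums.partition {|x| x % 2 == 1 }
p odds     # [1, 3, 5, 7, 9]
p evens    # [2, 4, 6, 8]

# Partitioning a hash yields arrays of [key, value] pairs.
stock = {"apples" => 0, "pears" => 12, "plums" => 3}
in_stock, sold_out = stock.partition {|name, count| count > 0 }
p in_stock    # [["pears", 12], ["plums", 3]]
p sold_out    # [["apples", 0]]
```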
If we wanted to partition into more than two groups, we'd have to write our own simple method for that. I will call this classify after the method in the Set class.
    module Enumerable
      def classify(&block)
        hash = {}
        self.each do |x|
          result = block.call(x)
          (hash[result] ||= []) << x
        end
        hash
      end
    end

    nums = [1,2,3,4,5,6,7,8,9]
    mod3 = nums.classify {|x| x % 3 }
    # { 0=>[3,6,9], 1=>[1,4,7], 2=>[2,5,8] }

    words = %w[ area arboreal brick estrous clear donor ether filial patina ]
    vowels = words.classify {|x| x.count("aeiou") }
    # {1=>["brick"], 2=>["clear", "donor", "ether"],
    #  3=>["area", "estrous", "filial", "patina"], 4=>["arboreal"]}

    initials = words.classify {|x| x[0..0] }
    # {"a"=>["area", "arboreal"], "b"=>["brick"], "c"=>["clear"],
    #  "d"=>["donor"], "p"=>["patina"], "e"=>["estrous", "ether"],
    #  "f"=>["filial"]}
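If you are running Ruby 1.8.7 or later (an assumption about your interpreter), the built-in method group_by returns the same kind of hash as the classify method above, so no extension of Enumerable is needed. A brief sketch:

```ruby
nums = [1,2,3,4,5,6,7,8,9]
mod3 = nums.group_by {|x| x % 3 }
p mod3[0]    # [3, 6, 9]
p mod3[1]    # [1, 4, 7]
p mod3[2]    # [2, 5, 8]

words = %w[ area arboreal brick estrous clear donor ether filial patina ]
initials = words.group_by {|x| x[0..0] }
p initials["a"]    # ["area", "arboreal"]
p initials["e"]    # ["estrous", "ether"]
```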
In every case we've seen so far, we iterate over a list a single item at a time. However, there might be times we want to grab these in pairs or triples or some other quantity.
The iterator each_slice takes a parameter n and iterates over that many elements at a time. (To use this, we need the enumerator library.) If there are not enough items left to form a slice, that slice will be smaller in size.
    require 'enumerator'

    arr = [1,2,3,4,5,6,7,8,9,10]
    arr.each_slice(3) do |triple|
      puts triple.join(",")
    end

    # Output:
    # 1,2,3
    # 4,5,6
    # 7,8,9
    # 10
There is also the possibility of iterating with a "sliding window" of the given size with the each_cons iterator. (If this name seems unintuitive, it is part of the heritage of Lisp.) In this case, the slices will always be the same size.
    require 'enumerator'

    arr = [1,2,3,4,5,6,7,8,9,10]
    arr.each_cons(3) do |triple|
      puts triple.join(",")
    end

    # Output:
    # 1,2,3
    # 2,3,4
    # 3,4,5
    # 4,5,6
    # 5,6,7
    # 6,7,8
    # 7,8,9
    # 8,9,10
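To make the difference between the two iterators concrete, here is a small sketch with invented data. It assumes Ruby 1.8.7 or later, where each_cons and each_slice return an enumerator when called without a block, so that map can be chained onto them.

```ruby
readings = [3, 4, 8, 15, 16]

# Differences between neighbors, using a sliding window of 2.
deltas = readings.each_cons(2).map {|a, b| b - a }
p deltas    # [1, 4, 7, 1]

# Sums of non-overlapping pairs; the last slice holds only one item.
sums = readings.each_slice(2).map {|pair| pair.inject {|x, n| x + n } }
p sums      # [7, 23, 16]
```

Five items give four overlapping windows of two but only three slices, the last of which is short.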
Every enumerable can in theory be converted trivially to an array (by using to_a). For example, a hash results in a nested array of pairs:
    hash = {1=>2, 3=>4, 5=>6}
    arr = hash.to_a      # [[5, 6], [1, 2], [3, 4]]
The method entries is an alias for the to_a method.
If the set library has been required, there will also be a to_set method that works as expected. See section 9.1, "Working with Sets," for a discussion of sets.
    require 'set'

    hash = {1=>2, 3=>4, 5=>6}
    set = hash.to_set    # #<Set: {[1, 2], [3, 4], [5, 6]}>
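These conversions apply to any enumerable, not only to hashes. As a brief sketch with sample data of my own, a range converts to an array of its members, and converting an array to a set discards duplicate items:

```ruby
require 'set'

range   = ("a".."e")
letters = range.to_a
p letters                     # ["a", "b", "c", "d", "e"]
p range.entries == letters    # true

rolls  = [3, 1, 3, 6, 1]
unique = rolls.to_set
p unique.size                 # 3
p unique.include?(6)          # true
```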
An Enumerator object is basically a wrapper that turns an iterator method into a full-fledged Enumerable. After being wrapped in this way, it naturally has all the usual methods and features available to it.
In this contrived example, class Foo has an iterator but nothing else. In fact, the iterator itself does nothing but four yield operations. To further clarify how this works, the iterator is named every rather than each:
    require 'enumerator'

    class Foo
      def every
        yield 3
        yield 2
        yield 1
        yield 4
      end
    end

    foo = Foo.new

    # Pass in the object and the iterator name...
    enum = Enumerable::Enumerator.new(foo,:every)

    enum.each {|x| p x }     # Print out the items
    array = enum.to_a        # [3,2,1,4]
    sorted = enum.sort       # [1,2,3,4]
If this conversion seems puzzling to you, it is essentially the same as this:
    enum = []
    foo.every {|x| enum << x }
In the previous example, enum is a real array, not just an Enumerator object. So although there are subtle differences, this is another way to convert an object to an Enumerable.
If enumerator is required, Object will have an enum_for method. So the object instantiation in the first example could also be written more compactly:
enum = foo.enum_for(:every)
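For readers on Ruby 1.9 or later, two details differ: the class is the top-level Enumerator rather than Enumerable::Enumerator, and enum_for is available without requiring anything. The following self-contained sketch repeats the Foo example under that assumption.

```ruby
class Foo
  def every
    yield 3
    yield 2
    yield 1
    yield 4
  end
end

foo  = Foo.new
enum = foo.enum_for(:every)     # to_enum is a synonym

p enum.class                    # Enumerator
p enum.to_a                     # [3, 2, 1, 4]
p enum.sort                     # [1, 2, 3, 4]
p enum.select {|x| x > 2 }      # [3, 4]
```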
We've already seen that we can iterate over groups with each_slice and each_cons. As it turns out, there are special methods enum_slice and enum_cons that will create enumerator objects using these iterators (in effect transforming the iterator name to each). Bear in mind that Enumerable::Enumerator.new and enum_for can both take an optional list of arguments at the end. Here we use that fact to pass in the "window size" to the iterator:
    array = [5,3,1,2]

    discrete = array.enum_slice(2)
    # Same as: Enumerable::Enumerator.new(array,:each_slice,2)

    overlap = array.enum_cons(2)
    # Same as: Enumerable::Enumerator.new(array,:each_cons,2)

    discrete.each {|x| puts x.join(",") }
    # Output:
    # 5,3
    # 1,2

    overlap.each {|x| puts x.join(",") }
    # Output:
    # 5,3
    # 3,1
    # 1,2
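The trailing-argument form of enum_for can express the same thing directly. This is the form to prefer on Ruby 1.9 and later, where enum_slice and enum_cons are no longer available; the sketch assumes such an interpreter.

```ruby
array = [5,3,1,2]

# Extra arguments after the iterator name are passed on to the iterator.
discrete = array.enum_for(:each_slice, 2)
overlap  = array.enum_for(:each_cons, 2)

p discrete.to_a    # [[5, 3], [1, 2]]
p overlap.to_a     # [[5, 3], [3, 1], [1, 2]]
```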
The idea of a generator is interesting. The normal Ruby iterator is an internal iterator; the iterator drives the logic by repeatedly calling the code block.
There is also an external iterator, where the code drives the logic, and the iterator provides data items "on demand" rather than on its own precise schedule.
By analogy, think of gets as providing an external iterator onto an IO object. You call it at will, and it provides you data. Contrast that with the internal iterator each_line, which simply passes each line in succession into the code block.
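The contrast can be sketched with StringIO standing in for a real file, so that the example is self-contained. Here the gets method plays the role of the external iterator; the sample text is invented.

```ruby
require 'stringio'

text = "alpha\nbeta\ngamma\n"

# External: the caller decides when to ask for the next line.
io = StringIO.new(text)
first  = io.gets          # "alpha\n"
second = io.gets          # "beta\n"
# ... other work could happen here ...
third  = io.gets          # "gamma\n"
done   = io.gets          # nil at end of input

# Internal: each_line drives the loop and calls the block.
lines = []
StringIO.new(text).each_line {|line| lines << line.chomp }
p lines                   # ["alpha", "beta", "gamma"]
```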
Sometimes internal iterators are not appropriate to the problem at hand. There is always a valid solution, but it may not always be convenient. Sometimes an external iterator is more convenient.
The generator library simply enables the conversion from an internal iterator to an external one. It provides an IO-like interface with methods such as next, rewind, and end?. Here's an example:
    require 'generator'

    array = [7,8,9,10,11,12]

    gen = Generator.new(array)

    what  = gen.current    # 7
    where = gen.index      # 0  (same as pos)

    # Advance until we reach 11 or run out of items
    while !gen.end? and gen.current < 11
      gen.next
    end

    puts gen.current       # 11
    puts gen.next          # 11
    puts gen.index         # 5      (index same as pos)
    puts gen.next?         # true   (next? is the opposite of end?)
    puts gen.next          # 12
    puts gen.next?         # false
Note how we can "read" through the collection an item at a time at will, using one loop or multiple loops. The end? method detects the end of the collection; the generator raises an EOFError if you ignore this and call next anyway. The next? method is the logical opposite of end?.
The index method (alias pos) tells us our index or position in the collection. Naturally it is indexed from zero just like an array or file offset.
The current and next methods may be a little unintuitive. Imagine an implicit "get" done at the beginning so that the current item is the same as the next item. Obviously, next advances the pointer, whereas current does not.
Because many collections can only move forward by their nature, the generator behaves the same way. There is no prev method; in theory there could be, but it would not always apply. The rewind method will reset to the beginning if needed.
The real drawback to the generator library is that it is implemented with continuations. In all current versions of Ruby, these are computationally expensive, so large numbers of repetitions might expose the slowness.
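In Ruby 1.9 and later, the generator library is no longer part of the standard library; external iteration is instead offered by Enumerator objects through the methods next, peek, and rewind. In the standard interpreter this is implemented with fibers rather than continuations. The following sketch assumes such an interpreter and mirrors the earlier example; note that running off the end raises StopIteration rather than EOFError.

```ruby
array = [7,8,9,10,11,12]
gen = array.each                # an Enumerator when no block is given

p gen.peek                      # 7  (look at the next item without advancing)
gen.next while gen.peek < 11    # skip ahead to 11
p gen.next                      # 11
p gen.next                      # 12

begin
  gen.next                      # nothing is left
rescue StopIteration
  puts "no more items"
end

gen.rewind                      # start over from the beginning
p gen.next                      # 7
```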