Chapter 4: Arrays and Hashes
Up to now, we've generally been using objects one at a time. In this chapter we'll
find out how to create a list of objects. We'll start by looking at the most common
type of list structure - an array.
ARRAYS
What is an Array?
An Array is a sequential collection of items in which each item can be
indexed. In Ruby (unlike many other languages), a single Array can
contain items of mixed data types such as strings, integers and floats
or even a method-call which returns some value:
a1 = [1,'two', 3.0, array_length( a0 ) ]
The first item in an array has the index 0, which means that the final
item has an index equal to the total number of items in the array minus 1.
Given the array, a1, shown above, this is how to obtain the
values of the first and last items:
a1[0] # returns 1st item (at index 0)
a1[3] # returns 4th item (at index 3)
We've already used arrays a few times - for example, in 2adventure.rb in chapter 2 we used an array to store a map of Rooms:
mymap = Map.new([room1,room2,room3])
CREATING ARRAYS
In common with many other programming languages, Ruby uses square brackets to delimit an array. You can easily create an array, fill it with some comma-delimited values and assign it to a variable:
arr = ['one','two','three','four']
As with most other things in Ruby, arrays are objects. They are defined, as you
might guess, by the Array class and, just like strings, they are indexed from 0.
You can reference an item in an array by placing its index between square
brackets. If the index is invalid, nil is returned:
arr = ['a', 'b', 'c']
puts(arr[0]) # shows "a"
puts(arr[1]) # shows "b"
puts(arr[2]) # shows "c"
puts(arr[3]) # nil
It is permissible to mix data types in an array and even to include expressions
which yield some value. Let's assume that you have already created this method:
def hello
return "hello world"
end
You can now declare this array:
x = [1+2, hello, `dir`]
Here, the first element is the integer, 3 and the second is the string “hello world”
(returned by the method hello). If you run this on Windows, the third array
element will be a string containing a directory listing. This is due to the fact that dir
is a back-quoted string which is executed by the operating system (see
Chapter 3). The final 'slot' in the array is, therefore, filled with the value returned
by the dir command which happens to be a string of file names. If you are
running on a different operating system, you may need to substitute an appropriate command at this point.
Creating an Array of File Names
A number of Ruby classes have methods which return arrays of values. For example, the Dir class, which is used to perform operations
on disk directories, has the entries method. Pass a directory name to
the method and it returns a list of files in an array:
Dir.entries( 'C:\\' ) # returns an array of files in C:\
If you want to create an array of single-quoted strings but can't be bothered
typing all the quotation marks, a shortcut is to put unquoted text separated by
spaces between round brackets preceded by %w like this (or use a capital %W for
double-quoted strings, as explained in Chapter 3):
y = %w( this is an array of strings )
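As a quick illustration of the difference between %w and %W, here is a minimal sketch. The variable names (name, plain, single, double) are invented for this example only:

```ruby
name = 'Ruby'

plain  = %w( this is an array of strings )
single = %w( hello #{name} )    # lowercase w: treated like single-quoted strings
double = %W( hello #{name} )    # capital W: #{name} is interpolated

p(plain.length)     #=> 6
puts(single[1])     # displays the literal text #{name}
puts(double[1])     # displays Ruby
```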
You can also create arrays using the usual object construction method, new.
Optionally, you can pass an integer to new to create an empty array of a specific
size (with each element set to nil), or you can pass two arguments - the first to set
the size of the array and the second to specify the element to place at each index
of the array, like this:
a = Array.new # an empty array
a = Array.new(2) # [nil,nil]
a = Array.new(2,"hello world") # ["hello world","hello world"]
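One point worth checking when you use the two-argument form: the second argument is a single object, and every slot refers to that same object. The sketch below (the names shared and separate are illustrative only) contrasts this with the block form of Array.new, which runs the block once per slot and so builds a separate object each time:

```ruby
# Two-argument form: both slots refer to the SAME string object
shared = Array.new(2, String.new('hello world'))
shared[0] << '!'          # change the string in place through slot 0
p(shared)                 #=> ["hello world!", "hello world!"]

# Block form: the block is called for each slot, giving separate strings
separate = Array.new(2) { String.new('hello world') }
separate[0] << '!'
p(separate)               #=> ["hello world!", "hello world"]
```

If you only ever replace whole elements (for example, shared[0] = 'bye'), the sharing does no harm. It matters when you modify an element in place.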
MULTI-DIMENSIONAL ARRAYS
To create a multi-dimensional array, you can create one array and then add other
arrays to each of its 'slots'. For example, this creates an array containing two
elements, each of which is itself an array of two elements:
a = Array.new(2)
a[0]= Array.new(2,'hello')
a[1]= Array.new(2,'world')
You can also create an Array object by passing an array as an argument to the new method. Be careful, though. It is a quirk of Ruby that,
while it is legitimate to pass an array argument either with or without
enclosing round brackets, you must leave a space between the new method
and the opening square bracket if you omit the round brackets. Without
the space, Ruby treats the square brackets as an index into a newly
created empty array rather than as an argument - another good reason for
making a firm habit of using round brackets when passing arguments!
It is also possible to nest arrays inside one another using square brackets. This
creates an array of four arrays, each of which contains four integers:
a = [ [1,2,3,4],
[5,6,7,8],
[9,10,11,12],
[13,14,15,16]
]
In the code shown above, I have placed the four 'sub-arrays' on separate lines.
This is not obligatory but it does help to clarify the structure of the multi-dimensional array by displaying each sub-array as though it were a row, similar
to the rows in a spreadsheet. When talking about arrays within arrays, it is
convenient to refer to each nested array as a 'row' of the 'outer' array.
For some more examples of using multi-dimensional arrays, load up the
multi_array.rb program. This starts by creating an array, multiarr, containing
two other arrays. The first of these arrays is at index 0 of multiarr and the second
is at index 1:
multiarr = [['one','two','three','four'],[1,2,3,4]]
ITERATING OVER ARRAYS
You can access the elements of an array by iterating over them using a for loop.
The loop will iterate over two elements here: namely, the two sub-arrays at index
0 and 1:
for i in multiarr
puts(i.inspect)
end
This displays:
["one", "two", "three", "four"]
[1, 2, 3, 4]
So, how do you iterate over the items (the strings and integers) in each of the two
sub-arrays? If there is a fixed number of items you could specify a different
iterator variable for each, in which case each variable will be assigned the value
from the matching array index.
Here each sub-array has four items, so you could use four variables like this:
for (a,b,c,d) in multiarr
print("a=#{a}, b=#{b}, c=#{c}, d=#{d}\n" )
end
Iterators and for loops
The code inside a for loop is executed for each element in some expression. The syntax can be summarized like this:
for <one or more variables> in <expression> do
<code to run>
end
When more than one variable is supplied, these are passed to the
code inside the for..end block just as you would pass arguments to a
method. Here, for example, you can think of (a,b,c,d) as four arguments
which are initialised, at each turn through the for loop, by the
four values from a row of multiarr:
for (a,b,c,d) in multiarr
print("a=#{a}, b=#{b}, c=#{c}, d=#{d}\n" )
end
We'll be looking at for loops and other iterators in more depth in the
next chapter.
You could also use a for loop to iterate over all the items in each sub-array
individually:
for s in multiarr[0]
puts(s)
end
for s in multiarr[1]
puts(s)
end
Both of the above techniques (multiple iterator variables or multiple for loops)
have two requirements: a) that you know how many items there are either in the
'rows' or 'columns' of the grid of arrays and b) that each sub-array contains the
same number of items as the others.
For a more flexible way of iterating over multidimensional arrays you could use
nested for loops. An outer loop iterates over each row (subarray) and an inner
loop iterates over each item in the current row. This technique works even when
subarrays have varying numbers of items:
for row in multiarr
for item in row
puts(item)
end
end
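To see that nested loops cope with rows of differing lengths, here is a small sketch. The array ragged and the collecting array collected are made up for this illustration:

```ruby
# A 'ragged' array: the rows have different lengths (one is even empty)
ragged = [['one', 'two', 'three'], [1, 2], []]

collected = []
for row in ragged
  for item in row
    collected << item     # visit each item, whatever the length of the row
  end
end

p(collected)              #=> ["one", "two", "three", 1, 2]
```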
INDEXING INTO ARRAYS
As with strings (see Chapter Three), you can index from the end of an array using
negative indexes, where -1 is the index of the last element; and you can also use
ranges:
arr = ['h','e','l','l','o',' ','w','o','r','l','d']
p( arr[0,5] ) #=> ["h", "e", "l", "l", "o"]
p( arr[-5,5 ] ) #=> ["w", "o", "r", "l", "d"]
p( arr[0..4] ) #=> ["h", "e", "l", "l", "o"]
p( arr[-5..-1] ) #=> ["w", "o", "r", "l", "d"]
Notice that, as with strings, when you provide two integers in order to return a
number of contiguous items from an array, the first integer is the start index
while the second is a count of the number of items (not an index):
arr[0,5] # returns 5 items - ["h", "e", "l", "l", "o"]
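The difference between a start-and-count pair and a range is easiest to see away from index 0. In this sketch the variables by_count and by_range are illustrative names only:

```ruby
arr = ['h','e','l','l','o',' ','w','o','r','l','d']

by_count = arr[6, 3]    # start at index 6, take 3 items
by_range = arr[6..8]    # from index 6 up to and including index 8
p(by_count)             #=> ["w", "o", "r"]
p(by_range)             #=> ["w", "o", "r"]
p(arr[6...8])           #=> ["w", "o"]  (three dots exclude the end index)
p(arr[20, 3])           #=> nil  (the start index lies beyond the array)
```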
You can also make assignments by indexing into an array. Here, for example, I
first create an empty array then put items into indexes 0, 1 and 3. The 'empty'
slot at number 2 will be filled with a nil value:
arr = []
arr[0] = [0]
arr[1] = ["one"]
arr[3] = ["a", "b", "c"]
# arr now contains:
# [[0], ["one"], nil, ["a", "b", "c"]]
Once again, you can use start-and-count pairs, ranges and negative index values:
arr2 = ['h','e','l','l','o',' ','w','o','r','l','d']
arr2[0] = 'H'
arr2[2,2] = 'L', 'L'
arr2[4..6] = 'O','-','W'
arr2[-4,4] = 'a','l','d','o'
# arr2 now contains:
# ["H", "e", "L", "L", "O", "-", "W", "a", "l", "d", "o"]
COPYING ARRAYS
Note that when you use the assignment operator, =, to assign one array variable
to another variable, you are actually assigning a reference to the array itself - you
are not making a copy. You can use the clone method to make a new copy of the
array:
arr1=['h','e','l','l','o',' ','w','o','r','l','d']
arr2=arr1
# arr2 is now the same as arr1. Change arr1 and arr2 changes too!
arr3=arr1.clone
# arr3 is a copy of arr1. Change arr1 and arr3 is unaffected
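The following sketch makes the difference visible. It also shows one thing to watch for: clone copies the array but not the objects inside it, so nested arrays are still shared. The names nested and copy are invented for the example:

```ruby
arr1 = ['h', 'e', 'l', 'l', 'o']
arr2 = arr1          # a second reference to the same array
arr3 = arr1.clone    # a new array holding the same items

arr1[0] = 'J'
p(arr2)                  #=> ["J", "e", "l", "l", "o"]  (follows arr1)
p(arr3)                  #=> ["h", "e", "l", "l", "o"]  (unaffected)
p(arr1.equal?(arr2))     #=> true   (one and the same object)
p(arr1.equal?(arr3))     #=> false  (two different objects)

# clone is a 'shallow' copy: the sub-arrays themselves are not copied
nested = [[1, 2], [3, 4]]
copy = nested.clone
copy[0] << 99
p(nested[0])             #=> [1, 2, 99]
```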
TESTING ARRAYS FOR EQUALITY
A few words need to be said about the comparison operator <=>. This compares
two arrays - let's call them arr1 and arr2; it returns -1 if arr1 is less than arr2; it
returns 0 if arr1 and arr2 are equal; it returns 1 if arr1 is greater than arr2. But
how does Ruby determine if one array is 'greater than' or 'less than' another? It
turns out that it compares each item in one array with the corresponding item in
the other. When two values are not equal, the result of their comparison is
returned. In other words if this comparison were made:
[0,10,20] <=> [0,20,20]
the value -1 would be returned (the first array is 'less than' the second), since
the integer at index 1 is of lower value (10) in the first array than the integer at
the same index in the second (20).
If you are comparing arrays of strings, then comparisons are made on ASCII
values. If one array is longer than another and the elements in both arrays are all
equal, then the longer array is deemed to be 'greater'. However, if two such
arrays are compared and one of the elements in the shorter array is greater than
the corresponding element in the longer array, then the shorter array is deemed to
be greater.
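These rules can be tried out directly. The arrays below are arbitrary values chosen to exercise each case described above:

```ruby
p([0, 10, 20] <=> [0, 20, 20])   #=> -1  (10 is less than 20 at index 1)
p([1, 2, 3]   <=> [1, 2, 3])     #=> 0   (every item is equal)
p([1, 2, 3]   <=> [1, 2])        #=> 1   (equal so far, but the first is longer)
p([2, 0]      <=> [1, 5, 9])     #=> 1   (2 > 1 decides it, despite the length)
p(['b']       <=> ['a', 'z'])    #=> 1   ('b' comes after 'a')
```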
SORTING ARRAYS
The sort method compares pairs of array elements using the comparison
operator <=>. This operator is defined for many Ruby classes, including Array,
String, Float, Date and Integer (Fixnum in older versions of Ruby). The <=>
operator does not, however, define an ordering for all classes (that is to say
that the Object class, from which all other classes are derived, provides no
general ordering). One of the unfortunate consequences of this is that sort
cannot be used on arrays that mix nil with other values. It is, however, possible to get
around this limitation by defining your own sorting routine. This is done by
sending a block to the sort method. We'll look at blocks in detail in Chapter 10.
For now, it is enough to know that the block here is a chunk of code which
determines the comparison used by the sort method.
This is my sort routine:
arr.sort{
|a,b|
a.to_s <=> b.to_s
}
Here arr is an array object and the variables a and b represent the two
array elements being compared. I've converted each variable to a string using the to_s method;
this converts nil to an empty string which will be sorted 'low'. Note that, while
my sorting block defines the sort order of the array items, it does not change the
array items themselves. So nil will remain as nil and integers will remain as
integers. The string conversion is only used to implement the comparison, not to
change the array items.
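Here is a small sketch of the problem and of the block-based workaround. The sample array and the variable sorted are illustrative only:

```ruby
arr = ['dog', nil, 'cat', 'ant']

begin
  arr.sort                      # no block: nil cannot be compared with a string
rescue ArgumentError => e
  puts("sort failed: #{e.message}")
end

sorted = arr.sort { |a, b| a.to_s <=> b.to_s }
p(sorted)                       #=> [nil, "ant", "cat", "dog"]
p(arr)                          #=> ["dog", nil, "cat", "ant"]  (original untouched)
```

Bear in mind that this block compares everything as strings, so integers would be ordered as text: 10 would be sorted before 9 because "10" comes before "9".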
COMPARING VALUES
The comparison 'operator' <=> (which is, in fact, a method) is defined in the
Ruby module named Comparable. For now, you can think of a module as a sort
of reusable 'code library'. We'll be looking more closely at modules in Chapter
12.
You can 'include' the Comparable module in your own classes. When this is
done, you can override the <=> method to enable you to define exactly how
comparisons will be made between specific types of object. For example, you
may want to subclass Array so that comparisons are made based purely on the
length of two Arrays rather than on the values of each item in the Array (which
is the default, as explained earlier). This is how you might do this:
class MyArray < Array
include Comparable
def <=> ( anotherArray )
self.length <=> anotherArray.length
end
end
Now, you can initialize two MyArray objects like this:
myarr1 = MyArray.new([0,1,2,3])
myarr2 = MyArray.new([1,2,3,4])
And you can use the <=> method defined in MyArray in order to make comparisons:
# Two MyArray objects
myarr1 <=> myarr2 # returns 0
This returns 0 which indicates that the two arrays are equal (since our <=> method
evaluates equality according to length alone). If, on the other hand, we were to
initialise two standard Arrays, arr1 and arr2, with exactly the same integer values, the Array
class's own <=> method would perform the comparison:
# Two Array objects
arr1 <=> arr2 # returns -1
Here -1 indicates that the first array evaluates to 'less than' the second array since
the Array class's <=> method compares the numerical values of each item in arr1
and these are less than the values of the items at the same indexes in arr2.
But what if you want to make 'less than', 'equal to' and 'greater than' comparisons using the traditional programming notation:
< # less than
== # equal to
> # greater than
In the MyArray class, we can make comparisons of this sort without writing any
additional code. This is due to the fact that the Comparable module, which has
been included in the MyArray class, automatically supplies these three comparison
methods; each method makes its comparison based on the definition of the
<=> method. Since our <=> makes its evaluation based on the number of items in
an array, the < method evaluates to true when the first array is shorter than the
second, == evaluates to true when both arrays are of equal length and > evaluates
to true when the first array is longer than the second:
p( myarr1 < myarr2 ) #=> false
p( myarr1 == myarr2 ) #=> true
The standard Array class, however, does not include the Comparable module so,
if you try to compare two ordinary arrays using < or >, Ruby will display an
error message telling you that the method is undefined. (Array does define its
own == method, which tests whether two arrays contain equal items.)
It turns out that it is easy to add these three methods to a subclass of Array. All
you have to do is include Comparable, like this:
class Array2 < Array
include Comparable
end
The Array2 class will now perform its comparisons based on the <=> method of
Array - that is, by testing the values of the items stored in the array rather than
merely testing the length of the array. Assuming the Array2 objects, arr1 and
arr2, to be initialized with the same arrays which we previously used for myarr1
and myarr2, we would now see these results:
p( arr1 < arr2 ) #=> true
p( arr1 > arr2 ) #=> false
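Pulling the pieces of this section together, the following sketch can be run as a single program. It uses the same two classes and the same values as the text; the parameter name another_array is a minor stylistic substitution:

```ruby
class MyArray < Array
  include Comparable
  def <=>(another_array)
    length <=> another_array.length   # compare by length only
  end
end

class Array2 < Array
  include Comparable                  # keeps Array's item-by-item <=>
end

myarr1 = MyArray.new([0, 1, 2, 3])
myarr2 = MyArray.new([1, 2, 3, 4])
p(myarr1 <=> myarr2)    #=> 0
p(myarr1 == myarr2)     #=> true
p(myarr1 < myarr2)      #=> false

arr1 = Array2.new([0, 1, 2, 3])
arr2 = Array2.new([1, 2, 3, 4])
p(arr1 <=> arr2)        #=> -1
p(arr1 < arr2)          #=> true
p(arr1 == arr2)         #=> false
```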
ARRAY METHODS
Several of the standard array methods modify the array itself rather than returning a modified copy of the array. These include not only those methods marked
with a terminating exclamation mark, such as flatten! and compact!, but also the
method <<, which modifies the array to its left by appending the object on its
right; clear, which removes all the elements from the array; and delete and
delete_at, which remove selected elements.
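A short sketch of these methods in action follows. The starting array is arbitrary; reverse is included only as an example of a method that does not modify its receiver:

```ruby
a = [1, nil, 2, [3, 4]]

a << 5                 # appends 5 in place
a.compact!             # removes nil values in place
a.flatten!             # un-nests sub-arrays in place
p(a)                   #=> [1, 2, 3, 4, 5]

a.delete(3)            # removes every item equal to 3
a.delete_at(0)         # removes the item at index 0
p(a)                   #=> [2, 4, 5]

b = a.reverse          # no '!': returns a new array and leaves a alone
p(a)                   #=> [2, 4, 5]
p(b)                   #=> [5, 4, 2]

a.clear
p(a)                   #=> []
```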
HASHES
While arrays provide a good way of indexing a collection of items by number,
there may be times when it would be more convenient to index them in some
other way. If, for example, you were creating a collection of recipes, it would be
more meaningful to have each recipe indexed by name such as 'Rich Chocolate
Cake' and 'Coq au Vin' rather than by numbers: 23, 87 and so on.
Ruby has a class that lets you do just that. It's called a Hash. This is the equivalent
of what some other languages call a Dictionary. Just like a real dictionary,
the entries are indexed by some unique key (in a dictionary, this would be a
word) which is associated with a value (in a dictionary, this would be the definition of the word).
CREATING HASHES
Just like an array, you can create a hash by creating a new instance of the Hash
class:
h1 = Hash.new
h2 = Hash.new("Some kind of ring")
Both the examples above create an empty Hash. A Hash object always has a
default value - that is, a value that is returned when no specific value is found at
a given index. In these examples, h2 is initialized with the default value, 'Some
kind of ring'; h1 is not initialized with a value so its default value will be nil.
Having created a Hash object, you can add items to it using an array-like syntax;
that is, by placing the index in square brackets and using = to assign a value.
The obvious difference here is that, with an array, the index (the 'key') must
be an integer; with a Hash, it can be any unique data item:
h2['treasure1'] = 'Silver ring'
h2['treasure2'] = 'Gold ring'
h2['treasure3'] = 'Ruby ring'
h2['treasure4'] = 'Sapphire ring'
Often, the key may be a number or, as in the code above, a string. In principle,
however, a key can be any type of object.
Unique Keys?
Take care when assigning keys to Hashes. If you use the same key
twice in a Hash, you will end up overwriting the original value. This
is just like assigning a value twice to the same index in an array. Consider this example:
h2['treasure1'] = 'Silver ring'
h2['treasure2'] = 'Gold ring'
h2['treasure3'] = 'Ruby ring'
h2['treasure1'] = 'Sapphire ring'
Here the key 'treasure1' has been used twice. As a consequence, the
original value, 'Silver ring', has been replaced by 'Sapphire ring', resulting in this Hash:
{"treasure1"=>"Sapphire ring", "treasure2"=>"Gold ring", "treasure3"=>"Ruby ring"}
Given some class, X, the following assignment is perfectly legal:
x1 = X.new('my Xobject')
h2[x1] = 'Diamond ring'
There is a shorthand way of creating Hashes and initializing them with key-value pairs. Just add a key followed by => and its associated value; each key-value pair should be separated by a comma and the whole lot placed inside a
pair of curly brackets:
h1 = { 'room1'=>'The Treasure Room',
'room2'=>'The Throne Room',
'loc1'=>'A Forest Glade',
'loc2'=>'A Mountain Stream' }
INDEXING INTO A HASH
To access a value, place its key between square brackets:
puts(h1['room2']) #=> "The Throne Room"
If you specify a key that does not exist, the default value is returned. Recall that
we have not specified a default value for h1 but we have for h2:
p(h1['unknown_room']) #=> nil
p(h2['unknown_treasure']) #=> 'Some kind of ring'
Use the default method to get the default value and the default= method to set
it (see Chapter 2 for more information on get and set 'accessor' methods):
p(h1.default)
h1.default = 'A mysterious place'
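The sketch below shows default values at work. Note that looking up a missing key returns the default without adding that key to the Hash. The key 'missing' is, of course, just an example:

```ruby
h1 = Hash.new
h2 = Hash.new('Some kind of ring')
h2['treasure1'] = 'Silver ring'

p(h1['missing'])            #=> nil
p(h2['missing'])            #=> "Some kind of ring"
p(h2.has_key?('missing'))   #=> false  (the lookup did not add the key)
p(h2.length)                #=> 1

h1.default = 'A mysterious place'
p(h1['missing'])            #=> "A mysterious place"
p(h1.default)               #=> "A mysterious place"
```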
COPYING A HASH
As with an array, you can assign one Hash variable to another, in which case
both variables will refer to the same Hash and a change made using either
variable will affect that Hash:
h4 = h1
h4['room1'] = 'A new Room'
puts(h1['room1']) #=> "A new Room"
If you want the two variables to refer to the same items in different Hash objects,
use the clone method to make a new copy:
h5 = h1.clone
h5['room1'] = 'An even newer Room'
puts(h1['room1']) #=> "A new Room" (i.e. its value is unchanged)
SORTING A HASH
As with the Array class, you may find a slight problem with the sort method of
Hash. It expects to be dealing with keys of the same data type so if, for example,
you merge two Hashes, one of which uses integer keys and another of which uses
strings, you won't be able to sort the merged Hash. The solution to this problem
is, as with Array, to write some code to perform a custom type of comparison
and pass this to the sort method. You might wrap it in a method, like this:
def sorted_hash( aHash )
return aHash.sort{
|a,b|
a.to_s <=> b.to_s
}
end
This performs the sort based on the string representation (to_s) of each
[key, value] pair in the Hash, so the key largely determines the order. In fact,
the Hash sort method converts the Hash to a nested array of
[key, value] arrays and sorts them using the Array sort method.
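If you would rather compare the keys alone, you can pick the key out of each [key, value] pair inside the block. The sketch below is a variation on the method above, not a replacement for it; the method name sorted_by_key and the sample Hash are invented for this illustration:

```ruby
# Hypothetical helper: sort the [key, value] pairs by the key's string form
def sorted_by_key(a_hash)
  a_hash.sort { |a, b| a[0].to_s <=> b[0].to_s }
end

mixed = { 'b' => 'bee', 2 => 'two', 'a' => 'ay', 1 => 'one' }

begin
  mixed.sort                      # integer and string keys cannot be compared
rescue ArgumentError => e
  puts("sort failed: #{e.message}")
end

pairs = sorted_by_key(mixed)
p(pairs)         #=> [[1, "one"], [2, "two"], ["a", "ay"], ["b", "bee"]]
p(pairs.class)   #=> Array  (the result is a nested array, not a Hash)
```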
HASH METHODS
The Hash class has numerous built-in methods. For example, to delete an item
using its key ( someKey ) from a hash, aHash, use aHash.delete( someKey ). To
test if a key or value exists use aHash.has_key?( someKey ) and
aHash.has_value?( someValue ). To return a new hash created using the original
hash's values as keys, and its keys as values use aHash.invert; to return an array
populated with the hash's keys or with its values use aHash.keys and
aHash.values, and so on.
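As a brief sketch of the methods just mentioned (the Hash rooms and the variables removed and inverted are illustrative names):

```ruby
rooms = { 'room1' => 'The Treasure Room',
          'room2' => 'The Throne Room',
          'loc1'  => 'A Forest Glade' }

p(rooms.has_key?('room1'))                 #=> true
p(rooms.has_value?('A Mountain Stream'))   #=> false

removed = rooms.delete('loc1')             # delete returns the deleted value
p(removed)                                 #=> "A Forest Glade"
p(rooms.keys)                              #=> ["room1", "room2"]
p(rooms.values)                            #=> ["The Treasure Room", "The Throne Room"]

inverted = rooms.invert                    # a new hash: values become keys
p(inverted['The Throne Room'])             #=> "room2"
```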
The hash_methods.rb program demonstrates a number of these methods.
Digging Deeper
TREATING HASHES AS ARRAYS
The keys and values methods of Hash each return an array so you can use
various Array methods to manipulate them. Here are a few simple examples:
h1 = {'key1'=>'val1', 'key2'=>'val2', 'key3'=>'val3', 'key4'=>'val4'}
h2 = {'key1'=>'val1', 'KEY_TWO'=>'val2', 'key3'=>'VALUE_3',
'key4'=>'val4'}
p( h1.keys & h2.keys ) # set intersection (keys)
#=> ["key1", "key3", "key4"]
p( h1.values & h2.values ) # set intersection (values)
#=> ["val1", "val2", "val4"]
# (keys are listed in insertion order, as in Ruby 1.9 and later)
p( h1.keys+h2.keys ) # concatenation
#=> [ "key1", "key2", "key3", "key4", "key1", "KEY_TWO", "key3", "key4"]
p( h1.values-h2.values ) # difference
#=> ["val3"]
p( (h1.keys << h2.keys) ) # append
#=> ["key1", "key2", "key3", "key4", ["key1", "KEY_TWO", "key3", "key4"]]
p( (h1.keys << h2.keys).flatten.reverse ) # "un-nest" arrays and reverse
#=> ["key4", "key3", "KEY_TWO", "key1", "key4", "key3", "key2", "key1"]
APPENDING AND CONCATENATING
Be careful to note the difference between concatenating using + to add the values
from the second array to the first and appending using << to add the second array
as the final element of the first:
a =[1,2,3]
b =[4,5,6]
c = a + b #=> c=[1, 2, 3, 4, 5, 6] a=[1, 2, 3]
a << b #=> a=[1, 2, 3, [4, 5, 6]]
In addition, << modifies the first (the 'receiver') array whereas + returns a new
array but leaves the receiver array unchanged.
Receivers, Messages and Methods
In Object Oriented terminology, the object to which a method belongs
is called the receiver. The idea is that instead of 'calling functions' as
in procedural languages, 'messages' are sent to objects. For example,
the message + 1 might be sent to an integer object while the message
reverse might be sent to a string object. The object which 'receives' a
message tries to find a way (that is a 'method') of responding to the
message. A string object, for example, has a reverse method so is
able to respond to the reverse message whereas an integer object has
no such method so cannot respond.
If, after appending an array with <<, you decide that you'd like to add the elements
from the appended array to the receiver array rather than have the appended
array itself 'nested' inside the receiver, you can do this using the flatten
method:
a=[1, 2, 3, [4, 5, 6]]
a.flatten #=> [1, 2, 3, 4, 5, 6]
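Note that flatten returns a new array, whereas flatten! changes the receiver. The sketch below shows both, along with concat, a standard Array method not covered above, which adds the items of one array to another in place without nesting. The variables flat and c are illustrative:

```ruby
a = [1, 2, 3]
b = [4, 5, 6]

a << b
flat = a.flatten       # returns a new, un-nested array...
p(flat)                #=> [1, 2, 3, 4, 5, 6]
p(a)                   #=> [1, 2, 3, [4, 5, 6]]  (...a itself is still nested)

a.flatten!             # the '!' version changes a in place
p(a)                   #=> [1, 2, 3, 4, 5, 6]

c = [1, 2, 3]
c.concat(b)            # adds b's items to c in place, with no nesting
p(c)                   #=> [1, 2, 3, 4, 5, 6]
```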
MATRICES AND VECTORS
Ruby provides the Matrix class which may contain rows and columns of values
each of which can be represented as a vector (Ruby also supplies a Vector class).
Matrices allow you to perform matrix arithmetic. For example, given two Matrix
objects, m1 and m2, you can add the values of each corresponding cell in the
matrices like this:
m3 = m1+m2
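Here is a minimal sketch of matrix addition. It assumes the matrix library is available: it has to be loaded with require and, in recent versions of Ruby, it is distributed as a bundled gem rather than as part of the core library. The values are arbitrary:

```ruby
require 'matrix'

m1 = Matrix[[1, 2], [3, 4]]
m2 = Matrix[[10, 20], [30, 40]]

m3 = m1 + m2                   # adds the corresponding cells
p(m3.to_a)                     #=> [[11, 22], [33, 44]]
p(m3.row(0).to_a)              #=> [11, 22]  (row returns a Vector)

v = Vector[1, 2] + Vector[3, 4]
p(v.to_a)                      #=> [4, 6]
```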
SETS
The Set class implements a collection of unordered values with no duplicates.
You can initialize a Set with an array of values, in which case duplicates are
ignored:
Examples:
s1 = Set.new( [1,2,3, 4,5,2] )
s2 = Set.new( [1,1,2,3,4,4,5,1] )
s3 = Set.new( [1,2,100] )
weekdays = Set.new( %w( Monday Tuesday Wednesday Thursday
Friday Saturday Sunday ) )
You can add new values using the add method:
s1.add( 1000 )
The merge method combines values of one Set with another:
s1.merge(s2)
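You can use == to test for equality. Two sets which contain the same values
(remembering that duplicates will be removed when a Set is created) are considered
to be equal. For example, comparing s1 and s2 as they were first created
above (before any values were added to s1):
p( s1 == s2 ) #=> true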
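The following sketch runs the Set operations in sequence so that you can see how each one affects equality and size. It loads the set library with require, which is harmless on versions of Ruby where Set is already available:

```ruby
require 'set'

s1 = Set.new([1, 2, 3, 4, 5, 2])
s2 = Set.new([1, 1, 2, 3, 4, 4, 5, 1])
p(s1.size)             #=> 5   (the duplicate 2 was ignored)
p(s1 == s2)            #=> true

s1.add(1000)
p(s1 == s2)            #=> false  (s1 now holds an extra value)
p(s1.include?(1000))   #=> true

s3 = Set.new([1, 2, 100])
s1.merge(s3)           # adds 100; 1 and 2 are already present
p(s1.size)             #=> 7
```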