LINUX GAZETTE
...making Linux just a little more fun!
Programming in Ruby, part 2
By Hiran Ramankutty

Review

A wide variety of applications from different domains need different levels of organization. We have seen the fundamentals of Ruby in Part 1, and now we jump on to the next level of organization.

Regular Expressions

In Ruby, a regular expression is delimited by slashes (`/'), as in Perl and awk, rather than by quotation marks. Regular expressions are a compact and expressive notation whenever you deal with patterns (as in pattern matching). Some methods also convert a string into a regular expression.

p "abcdef" =~ /de/
p "aaaaaa" =~ /d/
^D
3
nil

The operator `=~' is the matching operator for regular expressions. It returns the position in the string where a match was found, or nil if the pattern did not match. Regular expressions have their own vocabulary of special characters, shown below:

   
          [ ]     range specification (e.g., [a-z] means a letter in the range a to z)
          \w      letter, digit, or underscore; same as [0-9A-Za-z_]
          \W      any character other than a letter, digit, or underscore
          \s      blank character; same as [ \t\n\r\f]
          \S      non-blank character
          \d      digit character; same as [0-9]
          \D      non-digit character
          \b      word boundary (outside of range specification)
          \B      non-word boundary
          \b      backspace (0x08) (inside of range specification)
          *       zero or more repetitions of the preceding expression
          +       one or more repetitions of the preceding expression
          {m,n}   at least m, but not more than n, repetitions of the
                  preceding expression
          ?       zero or one repetition of the preceding expression
          |       either the preceding or the following expression may match
          ( )     grouping
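
To get a feel for a few entries in the table, here is a small sketch. The sample strings and the constant names are made up for illustration and are not part of the original article.

```ruby
# A few entries from the table above, tried against made-up sample strings.

# \d+ : one or more digits; =~ returns the index of the first match
DIGIT_POS = ("issue 83" =~ /\d+/)

# {m,n} : between m and n repetitions of the preceding expression
TWO_TO_THREE = ("aaaa"[/a{2,3}/])

# | and ( ) : alternation inside a group
ALT_POS = ("I like awk" =~ /(perl|awk)/)

# \b : word boundary, so the "cat" inside "concatenate" does not match
BOUNDARY = ("concatenate" =~ /\bcat\b/)

p DIGIT_POS      # 6
p TWO_TO_THREE   # "aaa"
p ALT_POS        # 7
p BOUNDARY       # nil
```

Using p rather than print makes the nil result visible on any Ruby version.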

For example, `^f[a-z]+' means "an `f' at the start of a line, followed by one or more letters in the range from `a' to `z'". Now suppose we want to check whether a string fits a description such as: "Starts with a lower case `f', which is immediately followed by exactly one upper case letter, and optionally more junk after that, as long as there are no more lower case characters." You would have to write a dozen lines in C, right? Admit it; you can hardly help yourself. In Ruby you just ask for the string to be matched against the regular expression /^f[A-Z][^a-z]*$/. This ability to match strings against regular expressions is widely used in the UNIX environment; a typical example is `grep'. Let us get acquainted with regular expressions. Consider the program given below:

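# Store this as regx.rb
st = "\033[7m"   # ANSI escape: start reverse video
en = "\033[m"    # ANSI escape: back to normal

while true
	print "str> "
	STDOUT.flush
	str = gets
	break if not str
	str.chomp!
	print "pat> "
	STDOUT.flush
	re = gets
	break if not re
	re.chomp!
	# Build a Regexp explicitly; newer Rubies treat a String pattern literally
	str.gsub! Regexp.new(re), "#{st}\\&#{en}"
	print str, "\n"
end
print "\n"
# Now run ruby regx.rb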

The program asks for input twice, once for a string and once for a regular expression. The string is tested against the regular expression, and the matched parts are highlighted in reverse video. Note that this requires an ANSI terminal, since it uses reverse video escape sequences. Do not mind the details of the program.

str>foobar
pat>^fo+
foobar
~~~

We see that foo is reversed. Note that ``~~~'' is just for text-based browsers. We shall experiment with different inputs.

str>asd987wonew06521
pat>\d
asd987wonew06521
   ~~~     ~~~~~
str>foozboozer
pat>f.*z
foozboozer
~~~~~~~~

Note that foozbooz is matched, and not just fooz. This is because a regular expression matches the longest possible substring. The next pattern is harder to interpret at first glance. Try this:

str> Wed Feb  7 08:58:04 JST 1996
pat> [0-9]+:[0-9]+(:[0-9]+)?
Wed Feb  7 08:58:04 JST 1996
           ~~~~~~~~
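
The same experiments can be run without a terminal session. The sketch below assumes a hypothetical helper, first_match, which is not part of regx.rb; it returns the matched text instead of highlighting it. The lazy quantifier `*?' is not in the table above and is included only to contrast with the longest-match behaviour.

```ruby
# Returns the matched substring, or nil when the pattern does not match.
# (first_match is a hypothetical helper name, not part of regx.rb.)
def first_match(str, pattern)
  str[Regexp.new(pattern)]
end

p first_match("foozboozer", "f.*z")    # longest match: "foozbooz"
p first_match("foozboozer", "f.*?z")   # lazy quantifier: "fooz"
p first_match("Wed Feb  7 08:58:04 JST 1996", "[0-9]+:[0-9]+(:[0-9]+)?")   # "08:58:04"
```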

Now try to represent a hexadecimal number using regular expressions. (For example, 0x123af00c and 0Xbc13590ae are both hexadecimal numbers.)

def chab(s)   # "contains hex in angle brackets"
	(s =~ /<0(x|X)(\d|[a-f]|[A-F])+>/) != nil
end
print chab "Not this one."
print "\n",chab "Maybe this? {0x35}" # use of wrong kind of brackets
print "\n",chab "Or this? <0x38z7e>" # Is this a HEX number
print "\n",chab "Okay, this: <0xfc0004>."
print "\n"
^D
false
false
false
true
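
As a possible variation, the three alternatives for a hex digit can be folded into a single range specification. The name chab_cc is hypothetical, chosen only to keep it apart from chab above; the behaviour is intended to be the same.

```ruby
# Variant of chab using one character class for the hex digits.
# chab_cc is a hypothetical name used only for this sketch.
def chab_cc(s)
  !(s =~ /<0[xX][0-9a-fA-F]+>/).nil?
end

p chab_cc("Not this one.")            # false
p chab_cc("Maybe this? {0x35}")       # false
p chab_cc("Or this? <0x38z7e>")       # false
p chab_cc("Okay, this: <0xfc0004>.")  # true
```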

Iterators

Iterator means "one which does the same thing many times". Consider the C code given below:

char *str;
for (str = "abcdefg"; *str != '\0'; str++) {
  /* process a character here */
}

C's for(...) syntax provides an abstraction for writing loops, but the programmer still has to know the internal structure of a string in order to test *str against the null character.

Flexible support for iteration is one of the features that mark a high-level language. Consider the following shell script (/bin/sh):

for i in *.[ch]; do
  # ...  something to do for each file
done

All the C source and header files in the current directory are processed, and the command shell handles the details of picking up and substituting file names one by one. Isn't this working at a higher level than C? What do you think?

It is fine for a programming language to provide iterators for its built-in data types, but it is a disappointment if we have to write low-level loops to iterate over our own data types. In OOP, this can be a serious problem, since users often define one data type after another.

To solve this problem, every OOP language has elaborate ways to make iteration easy; some languages, for example, provide a class that controls iteration. Ruby, on the other hand, allows us to define control structures directly. In Ruby terms, such user-defined control structures are called iterators.

Let us see a few examples:

"abc".each_byte{|c| printf "%c\n", c}
^D
a
b
c

Here, each_byte is an iterator over each character (byte) in the string. The local variable `c' receives each character in turn. This can be translated into something that looks a lot like C code ...

s="abc"
i=0
while i < s.length
	printf "%c\n",s[i]
	i+=1
end
^D
a
b
c

... however, the each_byte iterator is conceptually simpler and is more likely to keep working even if the string class happens to be radically modified in the future. One benefit of iterators is that they tend to be robust in the face of such changes, and I think that is a characteristic of good code.

Another string iterator is each_line.

"a\nb\nc\n".each_line{|l| print l}
^D
a
b
c

Irksome tasks such as finding line delimiters and generating substrings are all taken care of by the iterator.
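
To see how much work the iterator hides, here is a rough hand-written equivalent. Both helper names, manual_lines and iterator_lines, are hypothetical and exist only for this comparison.

```ruby
# Hand-written line splitting, roughly the work each_line hides.
# manual_lines is a hypothetical helper written only for comparison.
def manual_lines(text)
  lines = []
  start = 0
  while start < text.length
    nl = text.index("\n", start)        # find the next delimiter
    stop = nl ? nl + 1 : text.length    # keep the newline, as each_line does
    lines << text[start...stop]         # cut out the substring
    start = stop
  end
  lines
end

# Collects the same lines by way of the each_line iterator.
def iterator_lines(text)
  lines = []
  text.each_line { |l| lines << l }
  lines
end

p manual_lines("a\nb\nc\n")     # ["a\n", "b\n", "c\n"]
p iterator_lines("a\nb\nc\n")   # ["a\n", "b\n", "c\n"]
```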

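Now, let's rewrite this example with a for statement.

for l in "a\nb\nc\n".each_line
        print l
end
^D
a
b
c

The for statement iterates by way of an each iterator. When this article was written, String's each worked the same as each_line, so the string itself could be given directly after `in'. String's each was removed in Ruby 1.9, which is why the example above calls each_line explicitly.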

The control structure `retry', used inside an iterated loop, restarts the iteration from the top. (This applies to the Ruby versions of the time; from Ruby 1.9 onward, `retry' is allowed only inside a rescue clause.) See below:

c = 0
for i in 0..4
	print i
	if i==2 and c==0
		c = 1
		print "\n"
		retry
	end
end
^D
012
01234

The definition of an iterator may have an occurrence of `yield', which moves control to the block of code that is passed to the iterator (we will see more of this later). The example below defines the iterator repeat, which repeats a block of code the number of times specified in an argument.

def repeat(num)
	while num > 0
		yield
		num-=1
	end
end
repeat(4) {print "hello world\n"}
^D
hello world
hello world
hello world
hello world
If this is not clear, print the value of num before and after the occurrence of `yield'.
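
One way to do that is sketched below. The name repeat_traced and the log it returns are hypothetical additions; the loop counts num down to zero, recording its value before each yield and after each decrement.

```ruby
# A countdown loop like repeat, with num recorded around each yield.
# repeat_traced and its returned log are hypothetical additions.
def repeat_traced(num)
  log = []
  while num > 0
    log << "before yield: num=#{num}"
    yield                               # control moves to the caller's block
    num -= 1
    log << "after decrement: num=#{num}"
  end
  log
end

count = 0
puts repeat_traced(2) { count += 1 }
# before yield: num=2
# after decrement: num=1
# before yield: num=1
# after decrement: num=0
```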

With `retry' one can define an iterator that works the same as `while', though it is too slow to be practical.

def MYWHILE(cond)
	return if not cond
	yield
	retry
end
i = 0
MYWHILE(i<3) {print i,"\n" ;i+=1}
^D
0
1
2
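
From Ruby 1.9 onward, `retry' is accepted only inside a rescue clause, so MYWHILE as written no longer runs there. A rough equivalent, under that assumption, is sketched below. The name my_while is hypothetical, and the condition is passed as a lambda so that it can be evaluated again on every pass, which is what `retry' achieved by re-running the whole call.

```ruby
# A retry-free take on MYWHILE for Ruby 1.9 and later.
# my_while is a hypothetical name; the condition is a lambda so that
# it is re-evaluated on each pass.
def my_while(cond, &body)
  return unless cond.call
  body.call
  my_while(cond, &body)   # start over from the top, as retry did
end

i = 0
my_while(lambda { i < 3 }) { print i, "\n"; i += 1 }
# prints 0, 1 and 2 on separate lines
```

Like the original, this is a curiosity rather than something to use: each pass costs a method call, and a long loop would exhaust the stack.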

By now, I hope you have a fair idea of what iterators are. There are a few restrictions, but you can write your own iterators; in fact, whenever you define a new data type, it is often convenient to define suitable iterators to go with it. In this sense, the above examples `repeat()' and `MYWHILE()' are not very useful. We will talk about practical iterators after we have a better understanding of what classes are.

Object Oriented Thinking

`Object Oriented' is indeed a very catchy phrase. Ruby claims to be an object oriented scripting language; but what does 'object oriented' exactly mean?

There have been a variety of answers to that question, all of which probably boil down to about the same thing. Rather than arguing over definitions or summing them up too quickly, let's think for a moment about the traditional programming paradigm.

Traditionally, a programming problem is attacked by coming up with some kinds of data representations, and procedures that operate on that data. Under this model, data is inert, passive, and helpless; it sits completely at the mercy of a large procedural body, which is active, logical, and all-powerful.

The problem with this approach is that programs are written by programmers, who are only human and can only keep so much detail clear in their heads at any one time. As a project gets larger, its procedural core grows to the point where it is difficult to remember how the whole thing works. Minor lapses of thinking and typographical errors become more likely to result in well-concealed bugs. Complex and unintended interactions begin to emerge within the procedural core, and maintaining it becomes like trying to carry around an angry squid without letting any tentacles touch your face. There are guidelines for programming that can help to minimize and localize bugs within this traditional paradigm, but there is a better solution that involves fundamentally changing the way we work.

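What object-oriented programming does is let us delegate most of the mundane and repetitive logical work to the data itself; it changes our concept of data from passive to active. Put another way, we stop treating each piece of data as a box to be opened and filled by hand, and start treating it as a working machine with a closed lid and a few well-marked switches and dials.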

What is described above as a "machine" may be very simple or complex on the inside; we can't tell from the outside, and we don't allow ourselves to open up the machine (except when we are absolutely sure something is wrong with its design), so we are required to do things like flip the switches and read the dials to interact with the data. Once the machine is built, we don't want to have to think about how it operates.

You might think we are just making more work for ourselves, but this approach tends to do a nice job of preventing all kinds of things from going wrong.

Let's start with an example that is too simple to be of practical value, but should illustrate at least part of the concept. My 2-wheeler has a trip meter. Its job is to keep track of the distance it has travelled since the last time its reset button was pushed. How would we model this in a programming language? In C, the trip meter would just be a numeric variable, possibly of type float. The program would manipulate the variable by increasing its value in small increments, with occasional resets to zero when appropriate. What's wrong with that? A bug in the program could assign a bogus value to the variable, for any number of unexpected reasons. Anyone who has programmed in C knows what it is like to spend hours or days tracking down such a bug, whose cause seems absurdly simple once it has been found. (The moment of finding the bug is commonly indicated by the sound of a loud slap to the forehead.)

In an object-oriented context, the same problem can be attacked in a different manner. A programmer designing a trip meter is not supposed to ask "which of the familiar data types comes closest to resembling the thing?" but rather "how exactly is this thing supposed to act?" The difference winds up being a profound one. It is necessary to spend a little bit of time deciding exactly what a trip meter is for, and how the outside world expects to interact with it. We decide to build a little machine with controls that allow us to increment it, reset it, read its value, and nothing else.

We don't provide a way for a trip meter to be assigned arbitrary values. Why? Because we all know trip meters don't work that way. There are only a few things you should be able to do with a trip meter, and those are all we allow. Thus, if something else in the program mistakenly tries to place some other value (say, the target temperature of the vehicle's climate control) into the trip meter, there is an immediate indication of what went wrong. We are told when running the program (or possibly while compiling, depending on the nature of the language) that we are not allowed to assign arbitrary values to trip meter objects. The message might not be exactly that clear, but it will be reasonably close to that. It doesn't prevent the error, does it? But it quickly points us in the direction of the cause. This is only one of several ways in which OO programming can save a lot of wasted time.

We commonly take one step of abstraction above this, because it turns out to be as easy to build a factory that makes machines as it is to make an individual machine. We aren't likely to build a single trip meter directly; rather, we arrange for any number of trip meters to be built from a single pattern. The pattern (or if you like, the trip meter factory) corresponds to what we call a class, and an individual trip meter built from the pattern (or made by the factory) corresponds to an object. Most OO languages require a class to be defined before we can have a new kind of object, but ruby does not.
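
As a preview only, the trip meter described above might be sketched in Ruby as follows. The class name TripMeter and the method names increment, reset, and reading are assumptions made for illustration; classes and methods themselves are covered later in this guide.

```ruby
# A sketch of the trip meter as a class (the "factory").
# TripMeter and its method names are assumptions made for illustration.
class TripMeter
  def initialize
    @distance = 0.0
  end

  # Advance the meter by a small, non-negative step.
  def increment(step)
    raise ArgumentError, "step must not be negative" if step < 0
    @distance += step
    self
  end

  # Push the reset button.
  def reset
    @distance = 0.0
    self
  end

  # Read the dial.
  def reading
    @distance
  end
end

meter = TripMeter.new          # one object built from the pattern
meter.increment(1.5).increment(2.0)
p meter.reading                # 3.5

begin
  meter.distance = 22.0        # no such control exists on the machine
rescue NoMethodError => e
  puts "rejected: #{e.class}"  # rejected: NoMethodError
end
```

The attempt to assign a value is not prevented from being written, but Ruby reports it at run time, which points straight at the cause.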

I would like to emphasize that the use of an OO language will not enforce proper OO design. Indeed, it is possible in any language to write code that is unclear, sloppy, ill-conceived, buggy, and wobbly all over. What ruby does for you (as opposed, especially, to C++) is to make the practice of OO programming feel natural enough that even when you are working on a small scale you don't feel a necessity to resort to ugly code to save effort. We will be discussing the ways in which ruby accomplishes that admirable goal as this guide progresses; the next topic will be the "switches and dials" (object methods) and from there we'll move on to the "factories" (classes). Are you still with us?


Copyright © 2002, Hiran Ramankutty. Copying license http://www.linuxgazette.net/copying.html
Published in Issue 83 of Linux Gazette, October 2002
