Extending Ruby: The Sorted Array

2

I’ve been working on my personal projects and I came across a problem with the ruby Array class. I have an array containing approximately 500 elements. I need to find each element approximately 500 times ( O(n^2) as the array search is linear ). As you can guess, it takes about 0.5 seconds for all these searches. I figured, if the linear time is 0.5, the binary search taking log_2(n) should take significantly less time to perform.

Attempt #1: A ruby Sorted Array

I first attempted to make a sorted array class in pure ruby. While creation of the class was easy, it was not as efficient as hoped. As a matter of fact, according to my tests, the original ruby Array’s find_index was twice as fast as mine on average. In fact, in order for my set to be as efficient as the original Array, the value being sought had to be over 4/5 the way into the array. Only after that point did my SortedArray overtake the Array in speed. Clearly, this was not the way to go.

After several attempts at re-working and simplifying the loop, I decided that the only reason the Array was so fast, was because it was written natively in C, not ruby.

Attempt #2: A native C ruby Sorted Array

Most of the guiding code to the Sorted Array is from the Pragmatic Ruby Programmer’s Guide: Extending Ruby.

The C-file

First, we start with a new project tree, a-la SVN (this isn’t too terribly important, but I prefer it). Next, we create the first file: sorted_array.c.

In it I included the required ruby file: ruby.h. Next, I went to the subversion repository for the Array code. I’m using the 1.9.0.2 tag.

Take a good look at array.c. I’s packed full of goodness. Unfortunately, I’ll need to copy most of the code so that it supports all the functions we’re used to with the standard Array (though, some cannot be used with this version, such as insert).

There, we now have a fully functional SortedArray that doesn’t sort. It’s a good start.

Turning a normal Array into a SortedArray

The standard Array and the SortedArray are essentially the same. The exception is that all insertions should be made in such a fashion as to preserve the ordering of the elements, that a special constructor be provided to accept un-natural sorting functions, and that searches can be optimized to take advantage of the knowledge of the arrangement of data.

The c-version of the Array has 3 major components in the data structure: length (len), ptr (the start of the array in memory), and aux. Aux has lots of things that support the array, such as the capacity. Keep these in mind when mucking with the internals.

Remap the names

We can’t just use the code as is. Let’s rename all the functions from "rb_ary.

" to "rb_sorted_ary_.
". With vim, I do: :%s/rb_ary_/rb_sorted_ary_/g. Poof, all renamed.

Enter the Constructors

One of the constructors takes a variable argument list as the 2nd and above parameters. We need to correct this as it simply inserts values. We need to push each individual element. Yes, this will slow down the construction of these types of arrays, but again, we’re banking on searching being the primary operation, not construction.

My version of array.c (1.9.0.2) looks like this:

VALUE
rb_ary_new3(long n, ...)
{
    va_list ar;
    VALUE ary;
    long i;

    ary = rb_ary_new2(n);

    va_start(ar, n);
    for (i=0; i<n; i++) {
	RARRAY_PTR(ary)[i] = va_arg(ar, VALUE);
    }
    va_end(ar);

    RARRAY(ary)->len = n;
    return ary;
}

It needs a little bit of changing. Let’s change: RARRAY_PTR(ary) line to: “rb_sorted_ary_push(ary,va_arg(ar,VALUE));” That should fix it’s wagon. It’s a variable array, so we need to use the c-library variable argument accessors: va_arg and cast it to Ruby’s VALUE type. After that, we just push it to the array, one at a time.

As you may have guessed, I’ve decided to make the primary changes to push and delegating all insertion to that method. We’ll get to this later, however.

Next, we tackle the new4 function. This is the “copy-constructor” if you’re used to C++. Essentially, it is expecting an array passed in as the second parameter (“elts” in my version). The C-code makes a straight copy form a C-array (not a ruby Array!). However, this is not possible if the incoming array is not sorted. We need to insert them one at a time, inserting each. Originally, it looks like this:

VALUE
rb_sorted_ary_new4(long n, const VALUE *elts)
{
    VALUE ary;

    ary = rb_sorted_ary_new2(n);
    if (n > 0 && elts) {
	MEMCPY(RARRAY_PTR(ary), elts, VALUE, n);
	RARRAY(ary)->len = n;
    }

    return ary;
}

The if statement is assuring that the incoming array is not empty or null (nil). That’s a good check and we’ll keep it. But we need to augment it. We know how long the array is, so let’s just iterate through each and push the values onto the array. I threw away the MEMCPY and replaced it with:

for(long i = 0; i < n; ++i) {
  rb_sorted_ary_push( ary, elts[i] );
}

That does it for the constructors (for now, we still need to add a way to provide an unnatural sorting Proc call, but we’ll come back to this later after a bit of testing).

ary_make_* and ary_.* functions

There are two functions that do not adhere to the rb_ary_.* syntax: ary_make_shared and ary_make_hash. Stick a “sorted_” in front of those definitions and in every place they are used.

Do the same with ary_.* functions (such as ary_new and ary_alloc and ary_shared_first). You can do this by simply using vim to search for “\<ary_” and it will locate any ary_ that has a space or nothing before it (starts a “word”).

to_ary

Keep them as is. I see no need to convert to a sorted array, as we may need to add a parameter to take in a custom sort function (which we’re coming back to later).

Initialize

We now hit the initialize function. It has 4 forms:

  1. Array.new(size=0,obj=nil)
  2. Array.new(array)
  3. Array.new(size) {|index| block}

The first is easy: create an array of size, size, and then fill it with objects given in obj. This version requires no sorted, as all objects will be the same. The second version requires a bit of changing. We need to push each value of the incoming array one at a time to ensure consistency. Jump down to rb_sorted_ary_replace.

The Replacement Detour: before fixing initialize

VALUE
rb_sorted_ary_replace(VALUE copy, VALUE orig)
{
    VALUE shared;
    VALUE *ptr;

    orig = to_ary(orig);
    rb_sorted_ary_modify_check(copy);
    if (copy == orig) return copy;
    shared = sorted_ary_make_shared(orig);
    if (!ARY_SHARED_P(copy)) {
	ptr = RARRAY(copy)->ptr;
	xfree(ptr);
    }
    RARRAY(copy)->ptr = RARRAY(orig)->ptr;
    RARRAY(copy)->len = RARRAY(orig)->len;
    RARRAY(copy)->aux.shared = shared;
    FL_SET(copy, ELTS_SHARED);

    return copy;
}

They simply make a copy of the array’s pointer and declare the source array to be shared. Since we’re not sharing the array, there’s no need to do this anymore, yank out the make_shared line, the line at sets the aux.shared to shared, and the FL_SET. Next, we’ll yank out the pointer and length copying. Since we’re replacing our contents, we need to drop the old contents. Simply drop that if-statement around the xfree(ptr); line, and that will be taken care of. We’ll convert it to a for-loop, iterating through the original, don’t forget to use Ruby’s handy macros! To save some work, I’ll re-use the existing “constructor:”

VALUE
rb_sorted_ary_replace(VALUE copy, VALUE orig)
{
    VALUE shared;
    VALUE *ptr;

    orig = to_ary(orig);
    rb_sorted_ary_modify_check(copy);
    if (copy == orig) return copy;
		ptr = RARRAY(copy)->ptr;
		xfree(ptr);
		// create a new array
		long len = RARRAY_LEN(orig);
		if( len == 0 ) len++;
		RARRAY(copy)->ptr = ALLOC_N(VALUE, len);
		RARRAY(copy)->len = 0;
		RARRAY(copy)->aux.capa = len;
		for( int i = 0; i < RARRAY_LEN(orig); ++i ) {
			rb_sorted_ary_push(copy,RARRAY(orig)->ptr[i]);
		}

    return copy;
}

OK, this is probably the most confusing part so far. Why did we throw away the shared? Ruby cheats because instead of making a true, deep copy of the array it’s making a copy of, it declares the contents of the array as “shared” and simply copies pointers. We can’t do that as we need to re-arrange our data. So that’s why I yanked all the code related to sharing. So now, each object remains unlinked as far as the garbage collector is concerned. Yes, this does make things slower, and yes, does double the memory usage. But we can’t modify the incoming array as it may not be intended to be sorted. So we’re stuck. Because we’re doing a replacement, it’s best to just start from scratch. We do this by freeing up the c-array pointer, using xfree. Next, we get the length of the original array. We then use this to size-up our copy. Why do I add 1 in the case that the length is zero? Because, allocating an array of size 0 will cause C to throw a fit. So, we avoid that mess entirely by creating a starting array with 1 place to store things, but it is empty. We create the appropriately sized, internal c-array. We then set the size of the array to zero to spoof a blank array. We then set the capacity, to tell the Array that it need not resize itself when pushing the next set of items, one at a time.

When all items are added, the array will have the correct length (no longer zero, if the original wasn’t empty).

Back to Initialize to finish the last case

Back to the “initializer,” we now need to take care of the strange case that uses the block to initialize the array, based on the index. We cannot assume that the values created by this block will always be in order. So, after each generation (rb_yield), we need to push them onto the array as usual. The line looks like: “rb_sorted_array_store(ary, i, rb_yeild(LONG2NUM))”. Simply yank that line, and the line incrementing the length (as the length will be modified by the push function). Replace those lines with: “rb_ary_push(ary, rb_yield(LONG2NUM));” That should fix initialize.

Next, the rb_sorted_ary_s_create needs to be changed to, you guessed it, use push. Drop the length and the MEMCPY and insert the for loop, with the push call:

static VALUE
rb_sorted_ary_s_create(int argc, VALUE *argv, VALUE klass)
{
    VALUE ary = sorted_ary_alloc(klass);

    if (argc < 0) {
	rb_raise(rb_eArgError, "negative array size");
    }
    RARRAY(ary)->ptr = ALLOC_N(VALUE, argc);
    RARRAY(ary)->aux.capa = argc;
		for( int i = 0; i < argc; ++i ) {
			rb_sorted_ary_push(ary, argv[i] );
		}

    return ary;
}

Guessing an index

Next, we need to fix up push to stop simply appending. But before that, we need to create a special function to guess an index. Why guess? Because the value may not yet exist, but if it did, this is the index where it would be located in the array. This will be the basic function for both searching, deleting, and pushing.

I’ll call my new function: rb_sorted_ary_guess_index(VALUE ary, VALUE item). Not very creative, but very descriptive. Here’s the problem though: we want to be able to use both natural, and unnatural sorting algorithms. And, either is fine, so long as the ordering is monotonic (goes one-way and is predictable). Before we get into that, let’s just assume we’re using natural ordering, for simplicity. I wrote the following code:


Head up to sorted_array_alloc. Hmmm…. We need to add something to our arrays! We need to store the optional sort-by block!

The Sort-by block

Create a new file called “sorted_array.h”. Create the data structure:

#include "ruby.h"

struct RSArray {
	struct RBasic basic;
	long len;
	union {
		long capa;
		VALUE shared;
	} aux;
	VALUE *ptr;
	ID cmp;
};

This is an exact duplicate of the RArray in ruby.h. The only exception is the additional ID *cmp, our comparison function. Be sure you’ve included this structure file in the sorted_array.c file.

Back in the sorted_array.c file, make sure you cap the new cmp variable with a NULL in sorted_ary_alloc: ary→cmp = NULL; Just put it under the aux.capa assignment. Now, we need to “fix” the sort_2 function to fall-back to the default comparison function, but to prefer a specified function if given. Let’s create a class-specific sort function called: sorted_array_sort_3.


Comments

  1. paddor said 8 months later:

    And what’s the performance gain now? :-)

  2. Christopher Wojno said 8 months later:

    Hmmm… I guess I never finished this article. There is a DRAMATIC performance improvement. It’s not only faster than the Array’s find, but it’s MUCH faster. Sorry I don’t have any numbers. I did the testing so long ago I don’t really remember them specifically. I do know that I was very impressed with the results on arrays as small as 500 elements and pushed it well above that.

    I ultimately settled on a binary search mechanism. You can probably guess that insertions and deletions are REALLY slow compared to the other Array. But I wanted lookup speed.

    As a side note, I suppose the best option is to have a few classes: Array (unsorted) and SortedArray (sorted). To create a sorted array, you push everything onto an array, then call sort and that yields a SortedArray. Any functions called on it should take advantage of the ordering. Insertions will be slower, but finds faster. Naturally, SortedArray should have a function called .as_unsorted or something that converts the class back to an unsorted array (it doesn’t scramble the order, but future calls will not assume order is preserved). Any additional insertions or deletions can be done quickly. When you’re done, just re-convert it to a SortedArray for some fun. That’s just a thought though.

(leave url/email »)

   Comment Markup Help Preview comment