Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 807 Vote(s) - 3.51 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How does Ruby's sort_by {rand} work?

#1
I think this is a great Ruby one-liner:

someArray.sort_by {rand}

It's concise, it's readable, and it works - but I don't quite understand how. Here's what I know:

1. `rand` evaluates to a number between 0 and 1 (like 0.783468632804653)
2. `rand` is being repeatedly evaluated in the code above, because assigning it to `x` first breaks the random sort
3. `sort_by {0.783468632804653}`, or any other number I tried, has no effect on the array

ruby-doc.org wasn't much help to me [in this case][1].

Can someone explain this step-by-step?

[1]:

[To see links please register here]


## Update

I've been using Ruby longer now, and I see that I was missing a concept or two here. The key thing is that:

1. `rand` is a method (defined on Kernel); it generates a random number
2. `{rand}` is a block, which `sort_by` keeps, calling it **each time** it wants to compare two items in the collection. If the collection is a bunch of objects representing countries, it needs to be able to grab two of them and determine which one comes first. Do you put the one with the longest name first? The one with the largest land mass? The block should answer that question by returning a value that says "you asked about Spain vs Cameroon, and I say Cameroon comes first." (You could do that with `{|country| country.name.length}`

The rest of how `sort_by` works is explained in the documentation. I'm still not quite sure why returning a random number works at all - presumably `sort_by` rounds it to -1, 0, or 1, whichever is closest? But in any case, getting a **different random number** every time you call the block is quite different from getting the **same number** every time. When `sort_by` says "which of these two countries comes first?", `{rand}` puts on a blindfold, turns around 10 times, points and says "that one!" :)
Reply

#2
In Ruby 1.8/1.9 both `sort` and `sort_by` are implemented in C, this is a rough equivalent of how this works:

Say you start with `[1,2,3,4]` and call `sort_by{rand}`:

1. (I invented some random numbers):

An array of tuples is created: `[[0.12232, 1],[0.53434, 2],[0.333, 3],[0.99, 4]]`

In roughly equivalent Ruby code this is: `[1,2,3,4].map{|x| [rand, x]}`

2. Ruby's quick sort is performed on the array based off the first element: (note the internal implementation is far from trivial and contains a ton of optimisations for already ordered arrays and such)

[[0.12232, 1],[0.333, 3],[0.53434, 2],[0.99, 4]]

In rough Ruby this step is: `ary.sort{|x,y| x[0] <=> y[0]}`

3. Pointers are copied from the new sorted array, to the correct position in the original array.

[1,3,2,4]

In rough Ruby this step is: `ary.map{|x,y| y}`

This technique is sometimes referred to as a "[Schwartzian Transform][1]". Caching means that the expensive operation is not performed more than N times. Meaning, this is a very efficient way of randomizing an array.

**Note**: `array.shuffle!` will be the most efficient built-in way to shuffle an array (in-place) since it uses a modern version of [Fisher-Yates][2]:

static VALUE
rb_ary_shuffle_bang(VALUE ary)
{
long i = RARRAY_LEN(ary);

rb_ary_modify(ary);
while (i) {
long j = rb_genrand_real()*i;
VALUE tmp = RARRAY_PTR(ary)[--i];
RARRAY_PTR(ary)[i] = RARRAY_PTR(ary)[j];
RARRAY_PTR(ary)[j] = tmp;
}
return ary;
}



[1]:

[To see links please register here]

[2]:

[To see links please register here]

Reply

#3
What `sort_by` does can be split up into two simple steps:

1. It calls the `map`/`collect` method on the provided array and with the *provided block*. In your case the result of it would be just an array of random numbers - let's call this intermediate array A1. Note, that it has the length of the initial array.

2. A1 gets sorted normally, but what is returned is not the sorted A1, but rather the original array, where the items are moved around the same way as the corresponding ones from A1, while it's being sorted!

That's how the following example works:

["Paulo", "Sergito", "Nick"].sort_by {|word| word.length}

It sorts the words by their length, because first the array of words is mapped into an array of lengths, and those lengths are then sorted, while the words in the original array are moved around correspondingly.
Reply

#4
`sort_by` is a refinement of `sort`, which is used like so:

people.sort do |person1, person2|
person1 <=> person2
end

The `sort` function yields to the block when it needs to know the order of two things, in this case, people. The block returns -1 if the left thing is less than the right thing, 0 if they are equal, and 1 if the right thing is larger than the left thing. The spaceship operator `<=>` has the wonderful property that it returns -1, 0 or +1, exactly what sort needs.

I haven't looked, but odds are good that Ruby is using the [quicksort][1] algorithm.

Some smart person noticed that we kept doing the same thing on the left side of the spaceship operator as we do on the right side, and came up with `sort_by`, used like so:

people.sort_by do |person|
person.name
end

Instead of the sort algorithm giving two objects to the block and letting the block compare them, the algorithm gives a single object to the block. The block then returns whatever attribute or value should be used to do the sort. Ruby remembers the value the block returned for each element, and comparing those values, knows what order to put things in. It's neat that you don't have to repeat yourself anymore.

Your shuffle code works by just "making stuff up" when the sort algorithm yields to the block. Instead of returning something sensible, the block yields a random value. This causes the sort algorithm to sort things, well, randomly.

[1]:

[To see links please register here]

Reply

#5
The block `rand` yields a key that is used in sorting. It's different each time it's evaluated, so you get a random order.

When you put a single number in there, it's the same each time, so the order doesn't change. That means the sorting algorithm is 'stable' - it doesn't move in-order entries.

And here's some even shorter, even clearer code:

someArray.shuffle
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through