Programming

TIL: Ruby Classes that Look Callable

Posted in Programming on October 18th, 2011 by Aviv Ben-Yosef – 4 Comments

One of the concepts I had to get used to when moving from Python to Ruby was that regular objects aren’t callable: in Ruby, name() always invokes a method, and objects like procs and lambdas have to be invoked explicitly with .call. Where in Python any class can implement __call__ and so let us call its instances with obj(), Ruby doesn’t allow this. One nice consequence in Python is that classes themselves are callable, and calling a class is how you construct an instance. For example:

class MyClass:
    def __init__(self, value):
        self.value = value

my_class = MyClass(1) # We are calling the class to get
                      # an instance, instead of
                      # MyClass.new(1) in Ruby

This was a nice little trick I liked in Python, but I quickly got used to living without it. That was until I saw Ruby code that seemed to allow the exact same behavior:

Integer.class
#=> Class

Integer(1)
#=> 1

How’s this so? Can we really make classes callable? A quick glance at Integer’s source code in Rubinius reveals that there’s no magic going on, and that the class doesn’t even define the method I’m calling. Instead, what we’ll see is that alongside the class definition there’s also a method definition:

class Integer
  #...
end

def Integer(value)
  #...
end

So the whole trick is simply to define both. But how exactly does this work? How are names not clashing?

What actually happens is that whenever we define a new class or module, its name is added as a constant that points to the actual class. Similarly, when we define a method at the top level, it’s added as a private method of Object. So whenever we type a name that looks like a constant (starts with a capital letter) without parentheses, Ruby looks up that constant:

Object.const_defined? :Integer
#=> true

But when we add parentheses, Ruby understands that it should look for a method instead:

Object.private_method_defined? :Integer
#=> true

This nifty little trick is all it takes for Ruby to allow this nice syntax.
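To see the trick end to end, here is a minimal sketch applying it to our own class. Point and Point() are hypothetical names made up for illustration; the class and the method simply live side by side, just like Integer and Integer().

```ruby
# A hypothetical class, bound to the constant Point
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# A top-level method with the same name; Ruby makes it a private method of Object
def Point(x, y)
  Point.new(x, y) # `Point.` without parentheses resolves to the constant
end

origin = Point(0, 0)                         # parentheses: method call
puts origin.class                            # the class bound to the constant Point
puts Object.const_defined?(:Point)           # true
puts Object.private_method_defined?(:Point)  # true
```

The method is only a thin factory; all the real work stays in the class, which is the same division of labor as in the Rubinius source above.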

Hope you learned a new thing! In case you want to dig deeper, two great books that really helped me wrap my head around dark corners of Ruby are Eloquent Ruby and Metaprogramming Ruby.

You should subscribe to my feed and follow me on twitter!

Submitting your first patch to Rubinius

Posted in Programming on October 11th, 2011 by Aviv Ben-Yosef – 1 Comment

I always love helping interesting open source projects, and Rubinius is one of those great projects that are very cool to play with. In case you don’t know it, Rubinius is a Ruby implementation written (almost) entirely in Ruby. Just playing with such a code base is quite interesting, and whenever I peek around in the code I learn new stuff about Ruby.

At the moment, the people at Rubinius are working hard on making it compatible with Ruby 1.9, so there are a lot of easy changes just waiting for you to pick up and start contributing. I’d like to show you a quick walk-through of how to find such simple tasks and get started.

Setup

Clone the project from the GitHub repo. Once that’s done, to make sure that everything works properly do this:

./configure
rake spec

All the specs should pass on your machine. It will take a few minutes the first time, but afterwards, whenever you make small changes, it will be faster.

Finding interesting work

Of course you can submit whatever patch you find interesting, but in my opinion a quick way to get started is to find incompatibilities with 1.9. Fortunately for you, it’s pretty easy to find those.

Rubinius, along with the other Ruby implementations, is checked against RubySpec, a set of specs for the language written in Ruby and run with mspec. These specs look similar to RSpec. Among other options, some specs are marked as having to pass only on Ruby 1.9, and those among them that currently fail on Rubinius are what we’re hunting for.

I came up with this command in order to find and execute such 1.9 specs that were last reported by Rubinius developers to be failing:

bin/mspec tag --list fails -tx19 :ci_files

This command will list the RubySpecs that are tagged as failing on Rubinius in 1.9 mode.

You should see plenty of failing specs (over 500 at the time of this writing). Just pick something that seems easy enough to get started with.

Once you spot a spec that looks interesting, you can run it on its own and look at the code. For example, if you see an interesting spec for String#squeeze, you can run it with:

bin/mspec -tx19 spec/ruby/core/string/squeeze_spec.rb

Doing some work

For example, let’s look at one of the really simple specs I decided to get passing (you can see the commit here). I wanted to make a simple change to the String#ord method, but only for the 1.9 version. On Rubinius, many files, say string.rb, now also have a “string18.rb” and a “string19.rb” that contain the code that differs between the versions. In my case, I just edited the ord method in string19.rb, the version used on 1.9 (in case the 18 and 19 files don’t exist yet, you can simply create them as shown here).

After you’ve made your changes, be sure to run the specs again and see that everything works. Before submitting it, you should make sure to run all specs thoroughly using the command rake spec. If all is well, just do the regular GitHub pull-request dance and off you go!

Further than that, you can include in your pull request another commit that removes the failing tags from the specs you’ve just fixed. Find the appropriate file and just remove it, as you can see in this commit.

For some more in depth review of how to start contributing to Rubinius, see this excellent post on the official blog.


When being idiomatic wears you out

Posted in Programming on August 27th, 2011 by Aviv Ben-Yosef – 13 Comments

I believe that when learning a new programming language, it’s really important to learn its idioms and use them. I’ve written procedural C-like code in Java, and bloated Java-like code in Python, but only once you start using a language “like it was meant to” can you really say you’ve started mastering it. Had I not read Effective Java I don’t think I could have ever written a sensible line in this language.

I practically cringe whenever I see someone creating a new list in Java and then adding to it a single element when he just could have used Collections.singletonList(element). I’m that kind of a fanatic.

But lately I’m getting worn out by being verbose. Yes, you can use the trick above to save a line of code and a lot of typing, but damn it – I just want to say [element]!

Less than a month into BillGuard we realized we don’t want to do all of our coding in Java and started calling Python code from Java (not in the JVM though, since Jython just doesn’t seem solid enough). Running away from Java’s notoriously long idioms, we preferred adding the overhead of having multiple programming languages in one project (which I think justified itself plenty, but it is an overhead).

This solution helped us with the big stuff we didn’t want to do in Java, stuff that we’d represent in its own class. But the smaller stuff just kept nagging us. We kept finding ourselves writing 10-15 lines of code to do something we thought trivial and then putting 1-2 lines of comments before it saying what we actually meant in Python. These eventually led to a lot of extracted methods, which are generally good, but I would rarely extract such logic in Python/Ruby, where it would be a single concise line of code.

Lately, we started toying with just saying “screw the idioms” and doing what feels right. If that means having a JavaSucksUtils class with methods such as zip() and defaultdict_int(), so be it. I think that over time this will lead us to write most of our JVM code in a different language altogether, but in the meantime this seems to be a nice transition.

I mean, come on:

// Why write this..
public static <T, V> Map<T, List<V>> defaultdict_list() {
    return new MapMaker().makeComputingMap(new Function<T, List<V>>() {
        @Override public List<V> apply(T unusedCrap) {
            return Lists.newArrayList();
        }
    });
}

# When you just want this (Python)
defaultdict(list)

# Or this (Ruby)
Hash.new {|h,k| h[k] = []}
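The Ruby one-liner works because the block given to Hash.new runs only when a key is missing, and the block itself stores the fresh array under that key. A short sketch (the word list is made up) also shows the classic pitfall of passing a default object instead of a block:

```ruby
# The block runs for each missing key and stores a new array under it
groups = Hash.new { |h, k| h[k] = [] }
%w[apple avocado banana].each { |word| groups[word[0]] << word }
# groups now maps "a" to ["apple", "avocado"] and "b" to ["banana"]

# Pitfall: a default object is shared by every missing key and never stored
shared = Hash.new([])
shared["a"] << "apple"
shared["b"] << "banana"
puts shared.empty?    # true: no key was ever assigned
p shared["anything"]  # the one shared default array, holding both words
```

The block form is what gives you the behavior of Python’s defaultdict(list).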

Now we’ll have to wait and see where this gets us.


Guest Post: Lookup Tables with Ruby-on-Rails

Posted in Programming on August 9th, 2011 by Aviv Ben-Yosef – Be the first to comment

This is a guest post by Nimrod Priell (@nimrodpriell) about the kind of time-saving tricks that I’m amazed are so easy to pull off in Rails.

If you want to have an ActiveRecord macro to define memory-cached, dynamically growing, normalized lookup tables for entity ‘type’-like objects, read along. Or in plain English – if you want to have a table containing, say, ProductTypes, which can grow with new types simply when you refer to them, instead of having the Product table contain a thousand repeating type="book" entries – and gain some insight into Ruby metaprogramming techniques along the way, sit down and try to follow through.

A normalized DB means that you want to keep types as separate tables, with foreign keys pointing from your main entity to its type. For instance, instead of:

ID  car_name         car_type
1   Chevrolet Aveo   Compact
2   Ford Fiesta      Compact
3   BMW Z-5          Sports

You want to have two tables:

ID  car_name         car_type_id
1   Chevrolet Aveo   1
2   Ford Fiesta      1
3   BMW Z-5          2

And

car_type_id  car_type_name
1            Compact
2            Sports

The pros/cons of a normalized DB can be discussed elsewhere. I’d just point out a denormalized solution is most useful in settings like column oriented DBMSes. For the rest of us folks using standard databases, we usually want to use lookups.

The usual way to do this with Ruby on Rails is:

  • Generate a CarType model using rails generate model CarType name:string
  • Link between CarType and Car tables using belongs_to and has_many

Then to work with this you can transparently read the car type:

car = Car.first
car.car_type.name # returns "Compact"

Rails does an awesome job of caching the results for you, so you probably won’t hit the DB every time you get the same car type from different car objects.

You can even make this shorter by delegating name from Car to its CarType:

# car.rb
delegate :name, :to => :car_type, :prefix => true

And now you can access this as

car.car_type_name

However, it’s less pleasant to insert with this technique:

car.car_type.name = "Sports"
car.car_type.save!
#Now let's see what happened to the OTHER compact car
Car.all.second.car_type_name #Oops, returns "Sports"

Right, what are we doing? We should’ve used

car.update_attributes(car_type: CarType.find_or_create_by_name("Sports"))

Okay. Probably want to shove that into its own method rather than have this repeated in the code several times. But you also need a helper method for creating cars that way…

Furthermore, Rails is good about caching, but its query cache is keyed by the exact query used, and it expires when the controller action ends. You can configure more advanced caches, perhaps.

The thing is, all this gets tedious if you use a normalized structure where you have 15 entities and each has at least one ‘type-like’ field. That’s a whole lot of dangling Type objects. What you really want is an interface like this:

car = Car.first
car.car_type #returns "Compact"
car.car_type = "Sports" #No effect on Car.all.second; it just switches to the existing Sports type
car.car_type = "Sedan" #Magically create a new type

Oh, and it’ll be nice if all of this is cached and you can define car types as constants (or symbols). You obviously still want to be able to run:

CarType.where(:id > 3) #Just an example of supposed "arbitrary" SQL involving a real live CarType class

But you wanna minimize generating these numerous type classes. If you’re like me, you don’t even want to see them lying around in app/model. Who cares about them?
I’ve looked thoroughly for a nice rails solution to this, but after failing to find one, I created my own rails metaprogramming hook.
The result of this hook is that you get the exact syntax described above, with only two lines of code (no extra classes or anything):

In your ActiveRecord object simply add

# car.rb
require 'active_record/lookup'
class Car < ActiveRecord::Base
#...
include ActiveRecord::Lookup
lookup :car_type, :as => :type
#…
end

That’s it. The generated CarType class (which you won’t see as a car_type.rb file, obviously, as it is generated at runtime) contains some nice methods to look into the cache as well, so you can call

CarType.id_for "Sports" #Returns 2
CarType.name_for 1 #Returns "Compact"

and you can still hack at the underlying ID for an object, if you need to:

car = Car.first
car.car_type = "Sports"
car.car_type_id #Returns 2
car.car_type_id = 1
car.car_type #Returns "Compact"
Car.find_by_car_type_and_color("Compact", :blue) #Works, the underlying search is done by the ID

The full source code and gem can be found in https://github.com/Nimster/RailsLookup . The gem is named rails_lookup so you can just `gem install rails_lookup` to get the functionality required.

Note you do need to create tables for the new Type classes. The table format is very simple:

create_table :car_types do |t|
  t.string :name
end
add_column :cars, :type, :integer

In this post, however, I would like to elucidate how this is achieved, hopefully teaching some ruby meta-programming and rails considerations on the way.

So how do we achieve that? Well, we start with creating our own Lookup module which can be included into active record classes:

module Lookup

  module ClassMethods
    #Any new "macros" go here
    def lookup(lookup_name, opts = {})
    end
  end

  def self.included(host_class)
    host_class.extend(ClassMethods)
  end
end

This is the basic setup for inserting a new “macro” like belongs_to (which is actually just a class method). When the Lookup module is included in a class, the Ruby interpreter calls the hook method “self.included” with the class it was included into. We use that hook to extend the host class as well, thereby adding any class methods defined in ClassMethods to it.
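The hook can be tried without Rails at all. In this minimal sketch the lookups registry is a hypothetical stand-in (the real macro builds classes and accessors instead); it only shows that including the module turns lookup into a class-level macro:

```ruby
module Lookup
  module ClassMethods
    # A "macro": an ordinary class method, run while the class body is evaluated
    def lookup(lookup_name, opts = {})
      as_name = opts[:as] || lookup_name
      lookups[lookup_name] = as_name # hypothetical registry, just for the demo
    end

    def lookups
      @lookups ||= {} # instance variable of the host class object itself
    end
  end

  # Ruby calls this hook with the class that ran `include Lookup`
  def self.included(host_class)
    host_class.extend(ClassMethods)
  end
end

class Car
  include Lookup
  lookup :car_type, :as => :type
end

p Car.lookups[:car_type] # the :as name recorded by the macro
```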

We can now call “lookup :car_type, :as => :type” in our Car class, except that it doesn’t do anything yet. Let’s make it do something. We need to achieve the following things:

  1. Create the CarType ActiveRecord
  2. Link the CarType and Car ActiveRecords (with the standard has_many, belongs_to link)
  3. Make the Car#car_type=, Car#car_type methods behave in the way we described above.
  4. (Optional) pre-fill the caches from the data in the DB when the class loads

We will now present the code for each – when you read through, remember this all runs in the host class context (e.g. Car) so that self is the Car class, and any actions we take are equivalent to having explicitly written them in the Car class itself.

    def lookup(lookup_name, opts = {})
      as_name = opts[:as] || lookup_name #e.g. :type in lookup :car_type, :as => :type
      mycls = self #Class I'm defined in
 
      #We now define the CarType class, as if we were in a file car_type.rb
      cls = Class.new(ActiveRecord::Base) do #Define a new class, extending AR::Base
        #CarType should have the has_many :cars link
        has_many mycls.name.tableize.to_sym

        #These are optional. You can define any additional constraints you like.
        validates_uniqueness_of :name
        validates :name, :presence => true

        #Methods for using the cache. Providing a second argument saves data into the cache.
        def self.id_for(name, id = nil)
          #We cannot access the class variable for CarType as simply '@@rcaches' because it will
          #look for @@rcaches in the scope of the module we're in.
          class_variable_get(:@@rcaches)[name] ||= id
        end

        #This helper method is the "find_or_create" of the class that also
        #updates the cache and the DB.
        def self.gen_id_for(val)
          id = id_for val
          if id.nil?
            #Define this new possible value
            new_db_obj = find_or_create_by_name val
            id_for val, new_db_obj.id
            name_for new_db_obj.id, val
            id = new_db_obj.id
          end
          id
        end

        #Query the cache for the value that goes with a certain DB ID
        def self.name_for(id, name = nil)
          class_variable_get(:@@caches)[id] ||= name
        end
      end

      #Finally, Bind the created class to a name
      lookup_cls_name = lookup_name.to_s.camelize
      Object.const_set lookup_cls_name, cls #Define it as a global class

The important parts to note here are:

  • How we define a new class and then bind it to the constant “CarType”, so that once a class containing the lookup (like Car) has been referred to (just calling Car.to_s is enough), CarType is accessible as if it were defined in a car_type.rb file in our app/models directory.
  • How we use Rails’ built-in Inflections module which it mixes in to string, to move from so-called “table_notation” to CamelNotation and vice versa.
  • How we use class_variable_get and class_variable_set to access the class variables of the newly created CarType class – because confusingly enough @@var will refer to the class we’re in now and not the one being defined inside the block, when the code is executed. We discuss initialization of these two variables later on, during part (4).
Side note: This is not the complete class definition. I shortened it a bit to remove details which are handled in the gem version, like supporting Rails’ where() methods, supporting anonymous classes that have lookups and supporting multiple classes using the same lookup. If you’re interested in these, I urge you to check out the gem.
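Stripped of ActiveRecord, the class-generation part boils down to Class.new, class_variable_set/class_variable_get and Object.const_set. In this sketch build_lookup_class is a hypothetical helper, and the split/capitalize line is a plain-Ruby stand-in for ActiveSupport’s camelize:

```ruby
# Hypothetical helper that mimics the shape of the lookup class, minus Rails
def build_lookup_class(lookup_name)
  cls = Class.new do
    # A literal @@var here would belong to the surrounding lexical scope,
    # so the new class's variables are set by name instead
    class_variable_set(:@@rcaches, {}) # name -> id
    class_variable_set(:@@caches, {})  # id -> name

    def self.id_for(name, id = nil)
      class_variable_get(:@@rcaches)[name] ||= id
    end

    def self.name_for(id, name = nil)
      class_variable_get(:@@caches)[id] ||= name
    end
  end

  # "car_type" -> "CarType"; binding the constant also names the class
  const_name = lookup_name.to_s.split("_").map(&:capitalize).join
  Object.const_set(const_name, cls)
end

build_lookup_class(:car_type)
CarType.id_for("Sports", 2)
CarType.name_for(2, "Sports")
p CarType.name             # the anonymous class is now named after the constant
p CarType.id_for("Sports") # cached ID
p CarType.name_for(2)      # cached name
```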

Note also that we have already included the has_many link inside of CarType. In the same way, we will include the belongs_to in the other direction. We do this and also define the special accessors for getting and setting the CarType as a String:

    def lookup(lookup_name, opts = {})
     #...

     #Now, define the foreign key from Car to CarType.
     belongs_to lookup_name.to_s.to_sym, :foreign_key => "#{as_name}".to_sym
     validates "#{as_name.to_s}_id".to_sym, :presence => true

      #Now we define the "delegates" that will allow us to just call car.car_type = "Sports"
      #Define a setter for car_type
      define_method("#{as_name.to_s}_id=") do |id|
        #We would have used instance_variable_get. However rails maintains a hash of attributes
        #that we must use to play nicely along with rails. Here we write the ID of the value
        #instead of the value itself inside the field.
        write_attribute "#{as_name.to_s}".to_sym, id
      end

      # Setter via String
      define_method("#{as_name.to_s}=") do |val|
        id = cls.gen_id_for val
        write_attribute "#{as_name.to_s}".to_sym, id
      end

      # Getter for the ID
      define_method("#{as_name.to_s}_id") do
        read_attribute "#{as_name.to_s}".to_sym
      end

      #Define the getter
      define_method("#{as_name.to_s}") do
        id = read_attribute "#{as_name.to_s}".to_sym
        if not id.nil?
          value = cls.name_for id
          if value.nil?
            # This is reached in case many processes use the DB and some other process
            # inserted a new value that we were not aware of, but whose ID was written
            # into this object. Fetch it from the DB and cache it.
            lookup_obj = cls.find_by_id id
            if not lookup_obj.nil?
              value = lookup_obj.name
              cls.name_for id, value
              cls.id_for value, id
            end
          end
        end
        value
      end

    #...
  end


The important thing to note here is how we employ ActiveRecord’s read_attribute and write_attribute. The data in your ActiveRecord is maintained in a hash called attributes, where the names of fields (in the DB) are saved along with their values. A classic setter like `car.car_type = "Compact"` would set an entry in that hash with :car_type => "Compact", which would later cause SELECT or INSERT statements to try to access the nonexistent column car_type. Our approach is to intercept every write of the ‘type’ attribute (with a String) and replace that String with a numerical ID, creating the corresponding CarType entry if necessary.
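The same interception can be sketched in plain Ruby. FakeRecord below is a hypothetical stand-in for ActiveRecord::Base (its read_attribute and write_attribute just wrap a Hash), and TYPE_IDS stands in for the CarType table and its cache:

```ruby
# Hypothetical stand-in for ActiveRecord::Base's attribute storage
class FakeRecord
  attr_reader :attributes

  def initialize
    @attributes = {} # column name -> stored value
  end

  def read_attribute(name)
    @attributes[name]
  end

  def write_attribute(name, value)
    @attributes[name] = value
  end
end

# Stand-in for the CarType table: name -> id
TYPE_IDS = { "Compact" => 1, "Sports" => 2 }

class FakeCar < FakeRecord
  # String setter: store the numeric ID in the `type` column, never the String
  define_method(:type=) do |val|
    id = TYPE_IDS[val] ||= TYPE_IDS.size + 1 # unknown names get a new ID
    write_attribute(:type, id)
  end

  # Getter: translate the stored ID back into its name
  define_method(:type) do
    TYPE_IDS.key(read_attribute(:type))
  end
end

car = FakeCar.new
car.type = "Sedan"
p car.attributes # only the numeric ID is stored
p car.type       # the name, looked up from the ID
```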

Finally, we prefill the caches from the DB when this class loads. This is optional, but as the list of types is likely to be rather small, it’s cheaper to load it all up front than to grow the cache lazily while users wait on the extra queries.

    def lookup(lookup_name, opts = {})
      #...

      all_vals = cls.all
      cls.class_variable_set(:@@rcaches, all_vals.inject({}) do |r, obj|
          r[obj.name] = obj.id
          r
        end)
      cls.class_variable_set(:@@caches, all_vals.inject([]) do |r, obj|
          r[obj.id] = obj.name
          r
      end)
    end

That’s it. If you don’t like the caching this becomes even easier – remove all of the references to @@rcaches and @@caches and you simply saved yourself the trouble of manually maintaining CarType objects.
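With or without ActiveRecord, the pre-fill step just builds two inverse maps from the rows. In this minimal sketch Row and its data are hypothetical stand-ins for the records returned by cls.all; it uses each_with_object and a Hash for the id-to-name side, a stylistic swap for the inject and sparse Array in the code above:

```ruby
# Hypothetical rows standing in for the records returned by CarType.all
Row = Struct.new(:id, :name)
rows = [Row.new(1, "Compact"), Row.new(2, "Sports")]

# name -> id, the role of @@rcaches
rcaches = rows.each_with_object({}) { |row, h| h[row.name] = row.id }
# id -> name, the role of @@caches
caches = rows.each_with_object({}) { |row, h| h[row.id] = row.name }

p rcaches["Sports"]
p caches[1]
```

A Hash keeps lookups by ID cheap even when IDs have large gaps, where an Array would allocate a slot for every smaller index.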

The only remaining thing is to define your migrations for creating the actual database tables. After all, that’s something you only want to do once and not every time this class loads, so this isn’t the place for it. However, it’s easy enough to create your own scaffolds so that a command like

rails generate migration create_car_type_lookup_for_car

will automatically create the migration. This is the required migration:

class CreateCarTypeLookupForCar < ActiveRecord::Migration
  def self.up
    create_table :car_types do |t|
      t.string :name
      t.timestamps #Btw you can remove these, I don't much like them in type tables anyway
    end

    remove_column :cars, :type #Let's assume you have one of those now…
    add_column :cars, :type, :integer #Maybe put not_null constraints here.
  end

  def self.down
    drop_table :car_types
    remove_column :cars, :type
    add_column :cars, :type, :string
  end
end

I’ll let you work out the details of actually migrating the data yourself; this post has already run long enough. I urge you to read more in the gem’s source code here. There are some tricks I’ve omitted that let Rails support calls like Car.find_by_car_type_and_color "Compact", :blue (when the actual SQL query should be asking about car_type_id = 1), plus more options for setting up the lookup itself, handling Car.where(type: "Compact"), and multiple classes using a single lookup.

I hope this helped you and saved a lot of time and frustration. I’d like to thank Aviv for hosting me here. If you don’t already, read the rest of his blog, you’re sure to learn something useful! Follow me on twitter: @nimrodpriell

Today I Got Burnt by Isolated Tests

Posted in Programming, testing on August 7th, 2011 by Aviv Ben-Yosef – 10 Comments

Generally, I prefer the GOOS school of TDD which includes isolating my classes as much as possible, putting mocks and stubs everywhere. Even though one of its known disadvantages is that you risk testing your classes in a fake environment that won’t match the real production code using it, I’ve rarely come across a place where I got really bitten by it.

Today I set out with my pair to add some functionality to a certain class. That class had about 30-40 lines of code and about 10 test cases, which seemed quite decent. We added our changes TDD style and just couldn’t get the thing working. After digging into it for a few more minutes, we suddenly realized the class shouldn’t be working at all, and checking the DB showed that indeed the last time that specific feature had any effect was 3 months ago!

Fortunately for us, all the problems that caused this bug are solved problems, we just need to get better at implementing the solutions:

Isolated tests go much better hand in hand with a few integration tests (some might say the right term is acceptance tests) that execute the whole system and make sure the features are working. Had we had those, we would have caught the bug much sooner.

The bug was introduced in a huge commit that changed 35 files and 1500 lines of code. We usually try to go over every commit, even paired ones, because we believe in collective code ownership, but it’s impossible to go over such a huge diff and find these intricacies. Working in small baby steps makes it far less likely to break something and more likely that someone else will spot your mistakes. Huge refactorings give me the creeps.

After the change was committed, it was not followed through: this specific feature is one you usually notice over a few days, and we missed out on making sure it kept working. We moved on to other tasks and forgot all about it, thinking it was working all this time. Had we taken the time to make sure we were still seeing its effect, the bug would have been squashed by the next deployment.

Any of these would have helped us spot sooner that the isolated tests were actually testing the code against a scenario that never happens. These tiny changes of our workflow would have made several of our users happier over this timeframe.

Hopefully all is well now and the feature is back at 100%, but only time will tell whether we were able to learn from this mishap.


Shell Hackery: The Use of “cd .”

Posted in Programming, techie on August 4th, 2011 by Aviv Ben-Yosef – 1 Comment

I have a nasty habit of going over my bash history every once in a while. Usually I sort commands by frequency to find stuff I can automate/alias. Last time I came across “cd .” and thought I’d write up a little explanation of why I find this seemingly useless command useful.

So what does it do? “cd .” literally means “change directory to the current directory”, which sounds like a no-op. The point is that sometimes the current directory is no longer the current directory! Let’s start with an example.

Say I have a git repository in my_repo/, where the master branch has a my_repo/folder directory and the bugfix branch doesn’t. Now imagine I have a terminal window open after performing the following command:

cd my_repo/folder # now on branch master

And now, while that terminal is open I need to switch to the bugfix branch for a few minutes, do my thing and return to it. If I switch branches using a different terminal or some GUI tool, what becomes of my terminal’s shell? When I switched to the bugfix branch, git essentially removed that directory the shell was in, and when I returned to the master branch, the directory was put back into place.

So one might expect that after switching back and forth between branches and returning to my original terminal, simply executing “ls -l” will show that everything is ok. But it won’t. What I would actually see when running “ls -l” is that the current directory is empty!

Oh no! Are all our files lost? Nope. They’re right there in my_repo/folder, but our shell doesn’t know that. To understand why, we need to dig a bit deeper. When a Unix process opens a file or directory, it gets a file descriptor to it. The shell’s current directory works much the same way: throughout its lifetime, the process holds a reference to its current directory, which lsof lists as the cwd entry. You can see that by running lsof -p [your shell pid].

When process A holds an open fd to a file or directory and process B removes it, what should happen? Unix doesn’t have the file locking mechanism Windows does. Instead, it removes the name from the filesystem, while the underlying file keeps existing until process A is done with it. This means that if, for example, you’ve got a file open in some program and accidentally “rm”ed it, you can still recover it, because it’s still held by the open program. You can see an example of restoring files this way on Linux here.

Back to our problem! Our shell process is now sitting in a phantom directory that no longer exists. Even after we checked out the master branch again and the directory was back in place, nothing told our shell about it. It does still know its path is “my_repo/folder”, though.

So, to quickly get our terminal usable again (say, we want “ls” to actually show stuff), we can of course be all lame, close the shell and open a new one. Or we can “refresh” the reference to the current directory. How?

cd .
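The effect can be reproduced from Ruby on a Unix-like system. In this sketch a temporary directory stands in for my_repo/folder, and phantom_cwd_demo is a hypothetical helper. We compare inode numbers: while the process sits in the deleted directory, "." no longer matches the recreated folder, and changing into the same path by name again (roughly what the shell does for cd ., since it resolves . against the path it remembers in $PWD) fixes that:

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper reproducing the git-checkout scenario (Unix-like systems)
def phantom_cwd_demo
  original = Dir.pwd
  Dir.mktmpdir do |root|
    folder = File.join(root, "folder")
    Dir.mkdir(folder)
    Dir.chdir(folder)       # like `cd my_repo/folder`

    FileUtils.rm_rf(folder) # like switching to the bugfix branch
    Dir.mkdir(folder)       # like switching back to master

    # "." still refers to the old, deleted directory
    stale = File.stat(".").ino != File.stat(folder).ino

    Dir.chdir(folder)       # re-resolve the path by name, as `cd .` does
    fresh = File.stat(".").ino == File.stat(folder).ino

    Dir.chdir(original)     # step out before mktmpdir removes root
    [stale, fresh]
  end
end

p phantom_cwd_demo
```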

Hope you learned something new!


Why I Regret Choosing RightScale

Posted in Programming on July 27th, 2011 by Aviv Ben-Yosef – 3 Comments

A few months ago we had to decide on some framework/environment to use for our devops needs. I’ve blogged about my experiences with Puppet and Chef on EC2. Somehow, we eventually ended up using RightScale.

Quick disclaimer: this is not a rant and I don’t intend any bashing. It’s just a report of my impression from using it.

RightScale provides a system for configuring and managing your cloud infrastructure, from defining how servers are created to monitoring and changing them. RightScale has a few nice features. It has a pretty nice MySQL clustering setup for EC2. It also has decent monitoring and alerting capabilities.

My main problem with it, though, is that they basically took a few steps backwards from all other known solutions, making my life so much harder. I’ve pointed out most if not all of these issues to RightScale on twitter and in private emails, yet I can’t imagine seeing them solved any time soon.

Scripting (Dis)Abilities

If you’ve used Chef or Puppet, you probably got hooked on how easy it is to create and manage your own setup scripts. RightScale’s solution, RightScripts, is a weaker, 1990ish kind of solution:

  • No templates – remember the days you had files with placeholders like ‘@@REPLACE_HERE@@’ to sed out? Know how nice real templates are in Chef, for example, where you can use .erb files? Well, with RightScale it’s all gone again. Sed away.
  • No dependencies – RightScale do have a nice RightScript to install MySQL. Problem is, it depends on a bunch of other scripts and there just isn’t any link to them. Install it, hopefully find a reference to a dependency’s name in the README. Install the dependency. Look for its dependencies. Error prone and tiresome.
  • Made-up version control model – No longer can you use git to update and manage your scripts. RightScale has a dumbed-down version control system where you can “commit” changes you make to scripts. These aren’t accessible locally on your machine and lack all the nice features of a real VCS: you can’t grep, can’t search history. You can’t do a “git status” and see what has changed across your servers. Chaos.
  • Scripts are edited in text areas – that’s right. That means I’m constantly copying the script from the browser to vim, editing it, copying it back and saving.
  • No easy sharing of scripts – with Chef you could download cookbooks from all over the internet. With RightScale you’re limited to a closed and pretty empty market of rightscripts.
  • No composability – say you’ve got a generic script to attach an EBS volume to a server. Want to attach 2? Thought you could just call the script twice with different parameters? Wrong! You can’t. The only option is to copy and paste the script under a new name with new parameter names.
Some of these issues might be solved soon, since RightScale seems to be working on enabling the use of Chef for scripts. We tried to set up this beta on our installation but got a lot of exceptions, so we’ve left it alone for now.
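
For anyone who hasn’t worked with real templates, here is a minimal sketch of the difference using Ruby’s standard ERB library (the same .erb format Chef templates use). Instead of sed-replacing a placeholder like ‘@@REPLACE_HERE@@’, the template holds Ruby expressions. The config keys and the render_config helper are hypothetical, and result_with_hash needs Ruby 2.5 or later.

```ruby
require "erb"

# The old way: a file containing "bind-address = @@REPLACE_HERE@@" patched with sed.
# The ERB way: plain Ruby expressions inside <%= %> tags.
# These config keys are hypothetical, for illustration only.
TEMPLATE = <<~CONF
  [mysqld]
  bind-address = <%= bind_address %>
  max_connections = <%= max_connections %>
CONF

# Render the template; result_with_hash exposes each hash key as a local
# variable inside the template.
def render_config(bind_address:, max_connections:)
  ERB.new(TEMPLATE).result_with_hash(bind_address: bind_address,
                                     max_connections: max_connections)
end

# Rendering is just a method call, so the same template can be reused
# with different parameters, no copy-and-paste required.
puts render_config(bind_address: "10.0.0.5", max_connections: 200)
puts render_config(bind_address: "10.0.0.6", max_connections: 50)
```

Since rendering is an ordinary method call with parameters, calling it twice with different values is trivial, which is the same kind of composability the EBS example above is missing.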

Mouse Control

The UI is centered around way too much clicking. Their otherwise nice per-machine monitoring dashboard is not configurable. That means that for each and every server we have a routine of going over a few graphs, clicking and dragging things into the layout we like. Want to change the alert type of your servers? Click them all one by one. Need to run a script on all your servers? Click, click, click. It’s a painfully slow process that makes me feel undervalued every single time.

No Automatic Updates

The beauty of systems like Chef and Puppet is that you can make a change in the configuration and it will automatically reach all of your servers. That’s not the case here. You have to go over each server, figure out its state and then run the proper scripts.

Bottom Line

If you have decent coding ability and know your way around a server, chances are you’d be better off not using RightScale. There’s just so much you’ll be missing out on, and it’s a major waste of time. I truly hope to see these issues taken care of, but I think we’re far from it.
You should subscribe to my feed and follow me on twitter!

In the Mind of a Master Programmer

Posted in Programming on July 25th, 2011 by Aviv Ben-Yosef – Be the first to comment

He would probably object to me calling him that, but I realized long ago that Kent Beck is one of the precious few who deserve the title “a mastermind”. With Extreme Programming, Test Driven Development, Responsive Design, the Four Elements of Simple Design and more under his belt, who can claim otherwise?

After attending a workshop of his about a year ago and listening to him talk for a whole day, I was astonished. I tried to pick his brain to understand what makes him tick. Of course there are many factors here – reading over 10,000 books and being smarter than most would help anyone. But in my opinion something a bit less common plays a major part, something Kent told me he has: a “habit of desperately wanting things to make sense”, and the ability to take things apart until they do.

I recently picked up another of Kent’s books, Implementation Patterns. I love this book because it shows exactly that: his process of thinking and breaking things apart in order to understand them. The book provides a rare glimpse into his method of decomposition. Since I’ve been coding for years, a lot of the patterns made sense to me or seemed trivial. But the “magic” is that he was able to put into words things that for me were just hunches. Actually explaining what makes you sense that a method is too long, or what a proper name for a variable is, is something I’ve never seen done with such care for specifics.

Because it’s such a quick read, I think anyone will benefit from reading Implementation Patterns. Beyond helping you understand our craft better, it will give you a new outlook on decomposing and judging your designs, and pretty much everything else.

 


Input Validation means more than Javascript

Posted in Programming on June 20th, 2011 by Aviv Ben-Yosef – 2 Comments

So much has been written about security already that I never thought I’d end up writing something about it. Then again, I never thought one of the top U.S. banks would get hacked simply by someone twiddling digits in a URL.

Basically, the only thing you should take away from this post is that when it comes to external data – trust no one. And I mean absolutely no one.

I think and hope that by now most web developers know not to trust data that users enter in input fields. That trust is what gave birth to SQL injection. Nowadays just about no one should be exposed to such a lame problem, especially since pretty much every ORM framework out there protects you from it. But checking your input fields is just the beginning.

Every form of input you accept, even indirect input, is untrusted input. I want to go over a few examples, because you should all keep these in mind:

URLs – Just like I mentioned above, Citibank got hacked simply because someone noticed an integer in his browser’s address bar and started incrementing it. Any parameter you accept from a URL should be examined. Accessing an email by id? Make sure it belongs to the current user. Always.
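
A minimal sketch of that ownership check in plain Ruby, assuming a hypothetical in-memory EMAILS store and a current_user_id that comes from your session. In Rails the usual idiom is to scope the lookup through the user, e.g. current_user.emails.find(params[:id]), assuming a has_many :emails association and a current_user helper.

```ruby
# Hypothetical in-memory store standing in for your database/ORM.
Email = Struct.new(:id, :owner_id, :subject)

EMAILS = {
  1 => Email.new(1, 42, "Your statement"),
  2 => Email.new(2, 7, "Someone else's statement")
}.freeze

class NotFound < StandardError; end

# Look up an email by the id taken from the URL, but only return it if it
# belongs to the logged-in user. "Exists but isn't yours" is reported exactly
# like "doesn't exist", so incrementing ids reveals nothing.
def find_email_for(current_user_id, id_param)
  id = begin
    Integer(id_param, 10)          # reject anything that isn't a base-10 integer
  rescue ArgumentError, TypeError
    raise NotFound
  end

  email = EMAILS[id]
  raise NotFound if email.nil? || email.owner_id != current_user_id
  email
end

# Usage: user 42 may read email 1, but not email 2.
puts find_email_for(42, "1").subject
```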

Form arguments/JSON – These are just the same as input fields. Everyone should know by now that it’s wrong to trust any validation done on the client side, since any moderately capable person can craft their own POST/GET requests and bypass it. Validate everything on the server. And don’t use the client as a place to keep state unless it really belongs there. I can’t tell you how many e-commerce sites I’ve seen that pass product prices along with regular forms as hidden input fields. From there it’s just a few right clicks in Firebug and you’re gonna get that LCD TV for $1.
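
A sketch of keeping prices on the server, assuming a hypothetical PRICES_IN_CENTS catalog and a params hash shaped like a submitted form. Whatever price the client sends is simply never read; the 1 to 10 quantity limit is an arbitrary example rule.

```ruby
# Hypothetical catalog: the server is the only source of truth for prices.
PRICES_IN_CENTS = { "lcd-tv" => 49_999, "hdmi-cable" => 1_299 }.freeze

class InvalidOrder < StandardError; end

# params mimics a submitted form. Only the SKU and quantity are taken from
# it, and both are validated here; any "price" field is ignored entirely.
def order_total_cents(params)
  price = PRICES_IN_CENTS.fetch(params["sku"]) { raise InvalidOrder, "unknown product" }

  quantity = begin
    Integer(params["quantity"].to_s, 10)
  rescue ArgumentError
    raise InvalidOrder, "bad quantity"
  end
  raise InvalidOrder, "bad quantity" unless (1..10).cover?(quantity)

  price * quantity
end

# A tampered form with "price" => "1" still pays the real price.
puts order_total_cents({ "sku" => "lcd-tv", "quantity" => "1", "price" => "1" })
```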

Cookies – Again, these are inputs generated by your clients. Yes, you put the cookie there in the first place, but since then your users have had the chance to do whatever they want to it. So any integer you put in a cookie has to be validated again on the server side, just like a URL parameter. Any data you put there might have been mangled. The solution is either not to use cookies for anything like that, or to sign your cookies the way Rails does.
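
A minimal sketch of signing a cookie value with an HMAC from Ruby’s openssl library. It follows the general idea behind Rails’ signed cookies (the value plus a keyed digest) but is not Rails’ actual format; SECRET and the helper names are hypothetical. Keep in mind that signing only detects tampering: the user can still read the value.

```ruby
require "openssl"

# Hypothetical secret; in practice load a long random value from config.
SECRET = "change-me-to-a-long-random-secret"

# Append an HMAC-SHA256 of the value: "value--hexdigest".
def sign_cookie(value)
  "#{value}--#{OpenSSL::HMAC.hexdigest("SHA256", SECRET, value)}"
end

# Return the original value if the signature matches, or nil if the cookie
# was tampered with. rpartition splits on the last "--", and the hex digest
# never contains "--", so values that contain "--" still work.
def verify_cookie(cookie)
  value, separator, digest = cookie.to_s.rpartition("--")
  return nil if separator.empty?

  expected = OpenSSL::HMAC.hexdigest("SHA256", SECRET, value)
  secure_equal?(expected, digest) ? value : nil
end

# Constant-time comparison, so response timing doesn't reveal how many
# leading characters of a forged digest were correct.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

cookie = sign_cookie("42")
puts verify_cookie(cookie).inspect                  # the untouched cookie verifies
puts verify_cookie(cookie.sub(/\A42/, "43")).inspect # an edited one does not
```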

Really, anything – Have you ever used a service that lets you update certain things via email? That’s another form of input, for example. You wouldn’t want someone to change a URL or number in the email when replying and get access to a different user’s data, would you?

These are really just the tip of the iceberg, but I’m constantly surprised to see how many people around us are putting up web sites with no thought given to these problems. Just a tiny bit of thinking can keep you from topping reddit as a lame developer.


Statistics of 62K Passwords

Posted in Programming on June 18th, 2011 by Aviv Ben-Yosef – 23 Comments

A couple of days ago, LulzSec published a batch of 62K random logins (emails and passwords). At first, I grabbed it to make sure that neither I nor any of my contacts had had their passwords revealed. Later I decided to run a few stats on this rare dump of data. Following are a few interesting facts.

Password length

The dump’s average password length is 7.63 characters. I was surprised, because I thought most users would use something like 4 characters, but then I remembered that a lot of sites nowadays enforce a minimum length of 6-8 characters, so this makes sense. As you should know, and as you can read in Hacking: The Art of Exploitation, longer passwords are much harder to crack, so this is definitely a case where size does matter.
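
A sketch of how the average and the length distribution could be computed, assuming the passwords have already been extracted into an array of strings (the dump’s exact format isn’t described here). The sample list is made up and is not taken from the dump; Enumerable#tally needs Ruby 2.7 or later.

```ruby
# Return the average password length and a histogram of
# length => number of passwords, sorted by length.
def length_stats(passwords)
  lengths = passwords.map(&:length)
  average = lengths.sum.fdiv(lengths.size)
  histogram = lengths.tally.sort.to_h
  [average, histogram]
end

# Made-up sample, for illustration only.
sample = %w[123456 password shadow 123qwe letmein99 tr0ub4dor]
average, histogram = length_stats(sample)
puts "average length: #{average.round(2)}"
histogram.each { |len, count| puts "#{len.to_s.rjust(2)} #{'#' * count}" }
```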

Here’s a short graph depicting the distribution of password lengths (note that the edge groups have fewer than 10 passwords each, so they aren’t really visible here):

Passwords by length

Common Passwords

Not surprisingly, the most common password is 123456 with 569 occurrences, followed by its “more secure” cousin 123456789 with 184. The 3rd most common password is… “password” (132 occurrences)! The rest of the top 10 is interesting: some are plain words such as “romance”, “mystery”, “tigger” and “shadow”, and “102030” makes quite a few appearances.
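
Counting the most common passwords takes only a couple of lines with Enumerable#tally (Ruby 2.7+). A sketch, again assuming an array of password strings; the sample below is made up, and ties between equal counts come back in no particular order.

```ruby
# Count how often each password occurs and return the n most common
# as [password, count] pairs, most frequent first.
def most_common(passwords, n = 10)
  passwords.tally.max_by(n) { |_password, count| count }
end

# Made-up sample, for illustration only.
sample = %w[123456 password 123456 shadow 123456 password]
most_common(sample, 3).each { |password, count| puts "#{count} #{password}" }
```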

The 10th most used password is quite intriguing actually – “ajcuivd289”. Everyone on the internet seems baffled as to the source of this password. My guess would be that it’s some worm that resets the passwords of accounts it has hacked into to this value. Edit: As Marc comments below, the logins with this password seem “clustered”, which makes it more likely that they are actually the result of some bot creating accounts. Thanks Marc!

A couple hundred passwords are just not-so-random keyboard taps (“123qwe”, “asdf1234”, etc.). 789 passwords are exactly the username, and twice that many are part of the username followed by some digits (most look like birth years).
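
A sketch of one way to classify passwords against usernames. It assumes the “username” is the part of the email address before the “@”, compares case-insensitively, and treats “part of the username” as any substring of at least 3 characters. The post doesn’t say exactly which rules were used, so counts produced this way could differ from the numbers above.

```ruby
# Classify a password relative to the login's username:
#   :exact                - the password is the username itself
#   :username_plus_digits - a piece of the username followed only by digits
#   :other                - anything else
def username_relation(email, password)
  user = email.split("@").first.to_s.downcase  # assumption: username = local part
  pw = password.downcase

  return :exact if pw == user

  # Split into a non-digit prefix and a trailing run of digits, e.g. "john" + "1985".
  m = pw.match(/\A(\D+)(\d+)\z/)
  return :username_plus_digits if m && m[1].length >= 3 && user.include?(m[1])

  :other
end

p username_relation("john.smith@example.com", "john1985")
```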

Inside Passwords

12179 of the passwords are all numeric, and some are 14 digits long! That’s just crazy. While 34717 (more than half) of the passwords contain at least one digit, only 1262 contain capital letters and 533 contain special characters!
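
A sketch of the character-class counts above, assuming “special character” means anything other than an ASCII letter or digit (the post doesn’t define it) and that the passwords are available as an array of strings.

```ruby
# Count passwords by the character classes they use.
def character_class_counts(passwords)
  {
    all_numeric:   passwords.count { |pw| pw.match?(/\A\d+\z/) },
    any_digit:     passwords.count { |pw| pw.match?(/\d/) },
    any_uppercase: passwords.count { |pw| pw.match?(/[A-Z]/) },
    any_special:   passwords.count { |pw| pw.match?(/[^A-Za-z0-9]/) }
  }
end

# Made-up sample, for illustration only.
p character_class_counts(%w[123456 abc1 Abc! xyz])
```

The word counts that follow can be computed the same way, e.g. passwords.count { |pw| pw.downcase.include?("love") }, assuming case-insensitive matching.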

Some Common Words

418 passwords contain the word “love”. “sex” is in 125, “jesus” in 67. More people prefer cats (414) to dogs (291). And the language battle – 6 javas, 2 pythons and 17 “ruby”s (guess which one is also a name).

 

I’d like to sum this up by urging you to never use the same password twice, and to use a password manager to generate secure passwords! Using a password manager ensures that even if a certain site is breached, not all of your passwords are revealed, and secure passwords are simply harder to brute force.
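
For the curious, a minimal sketch of generating such a password with Ruby’s SecureRandom, roughly what a password manager’s generator does for you. The 72-character alphabet and the default length of 16 are arbitrary choices for this sketch; with 72 symbols each character adds about 6.2 bits of entropy, so 16 characters give roughly 99 bits.

```ruby
require "securerandom"

# Lowercase, uppercase, digits and a few symbols: 26 + 26 + 10 + 10 = 72 characters.
ALPHABET = [*"a".."z", *"A".."Z", *"0".."9", *%w[! @ # $ % ^ & * - _]].freeze

# Pick each character independently and uniformly using SecureRandom,
# a cryptographically secure random number generator.
def generate_password(length = 16)
  Array.new(length) { ALPHABET[SecureRandom.random_number(ALPHABET.size)] }.join
end

puts generate_password
```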
