Posts Tagged ‘Programming’

Today I Got Burnt by Isolated Tests

Posted in Programming, testing on August 7th, 2011 by Aviv Ben-Yosef – 10 Comments

Generally, I prefer the GOOS school of TDD which includes isolating my classes as much as possible, putting mocks and stubs everywhere. Even though one of its known disadvantages is that you risk testing your classes in a fake environment that won’t match the real production code using it, I’ve rarely come across a place where I got really bitten by it.

Today I set out with my pair to add some functionality to a certain class. That class had about 30-40 lines of code and about 10 test cases, which seemed quite decent. We added our changes TDD style and just couldn’t get the thing working. After digging into it for a few more minutes we suddenly realized the class shouldn’t be working at all and checking in the DB showed that indeed the last time that specific feature had any effect was 3 months ago!

Fortunately for us, all the problems that caused this bug are solved problems; we just need to get better at implementing the solutions:

Isolated tests go much better hand in hand with a few integration tests (some might say the right term is acceptance tests) that execute the whole system and make sure the features are working. Had we had those, we would have caught the bug much sooner.

The bug was introduced in a huge commit that changed 35 files and 1500 lines of code. We usually try to go over every commit made, even if it was paired, because we believe in collective code ownership, but it’s impossible to go over such a huge diff and find these intricacies. Working in small baby steps makes it far less likely to break something and more likely that someone else will spot your mistakes. Huge refactorings give me the creeps.

After the change was committed, nobody followed through: this is a feature whose effect you usually notice over a few days, and we never made sure it kept working. We moved on to other tasks and forgot all about it, thinking it was working all this time. Had we taken the time to check that we were still seeing its effect, the bug would have been squashed by the next deployment.

Any of these would have helped us spot sooner that the isolated tests were actually testing the code against a scenario that never happens. These tiny changes to our workflow would have made several of our users happier over this timeframe.
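
To make “a scenario that never happens” concrete, here is a minimal Python sketch. It is not our actual code: the reminder feature, the class names and the status values are all made up for illustration. The stub encodes what the test author believes the collaborator returns, so the isolated check passes while the very same code silently does nothing against a production-like collaborator.

```python
class StubUsers:
    """Hand-written stub (hypothetical): what the test assumes the collaborator returns."""

    def all(self):
        return [{"name": "dana", "status": "active"}]


class RealUsers:
    """Stand-in for the production collaborator (hypothetical), whose format has drifted."""

    def all(self):
        return [{"name": "dana", "status": "ACTIVE"}]


def users_to_remind(users):
    """The code under test: pick the users that should receive a reminder."""
    return [u["name"] for u in users.all() if u["status"] == "active"]


# Isolated check: passes, because the stub agrees with the code's assumption.
isolated_result = users_to_remind(StubUsers())

# Integration-style check: same code, real collaborator, and nobody gets reminded.
integrated_result = users_to_remind(RealUsers())
```

A single test that wires in the real collaborator is enough to expose the mismatch, which is the safety net the integration tests above are meant to provide.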

Hopefully all is well now and the feature is back at 100%, but only time will tell whether we were able to learn from this mishap.

You should subscribe to my feed and follow me on twitter!

Shell Hackery: The Use of “cd .”

Posted in Programming, techie on August 4th, 2011 by Aviv Ben-Yosef – 1 Comment

I have a nasty habit of going over my bash history every once in a while. Usually I sort commands by frequency to find stuff I can automate/alias. Last time I came across “cd .” and thought I’d write up a little explanation of why I find this seemingly useless command useful.

So what does it do? “cd .” literally means “change directory to the current directory”, which sounds like a no-op. The point is that sometimes the current directory is no longer the current directory! Let’s start with an example.

Say I have a git repository in my_repo/ and on its master branch there’s a my_repo/folder directory and on its bugfix branch that directory doesn’t exist. Now imagine I have a terminal window open after performing the following command:

cd my_repo/folder # now on branch master

And now, while that terminal is open I need to switch to the bugfix branch for a few minutes, do my thing and return to it. If I switch branches using a different terminal or some GUI tool, what becomes of my terminal’s shell? When I switched to the bugfix branch, git essentially removed that directory the shell was in, and when I returned to the master branch, the directory was put back into place.

So, one might expect that after switching back and forth between branches and returning to my original terminal, simply executing “ls -l” will show that everything is ok. But it won’t. What I would actually see when running “ls -l” is that the current directory is empty!

Oh no! Are all our files lost? Nope. They’re right there in my_repo/folder, but our shell doesn’t know that. To understand why, we need to dig a bit deeper. When a unix process opens a file or directory, it obtains a file descriptor to it, and the kernel keeps the underlying object alive for as long as the process holds on to it. The same goes for a shell’s current directory – all throughout its lifetime, it holds an open reference to the current dir. You can see that by running lsof -p [your shell pid] and looking for the cwd entry.

When process A holds an open fd to a file/directory and process B removes it, what should happen? Unix doesn’t have the kind of file locking Windows does. What it does is remove the name from the directory tree right away, while keeping the underlying file around until process A finishes working with it. What this means is that if, for example, you’ve got a file open in some program and accidentally “rm”ed the file, you can still recover the file because it’s kept alive by the open program. You can see an example for restoring files this way on linux here.

Back to our problem! Our shell process is now sitting with its current directory actually being some phantom directory that is no more. That means that even after we checked out the master branch again and a directory with the same name was back in place, no one updated our shell regarding that. It does know it’s in “my_repo/folder”, though.

That means that in order to quickly get our terminal back to being useable (say, we want “ls” to actually show stuff) we can, of course, be all lame, close the shell and open a new one. Or, we can “refresh” the file descriptor to the current directory. How?

cd .
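
To watch the phantom directory from inside a process, here is a small Python sketch (a Unix-like system is assumed, and the temp directory stands in for my_repo/folder). It removes and recreates the directory the process is sitting in, then checks whether “.” and the path still refer to the same directory. Note that the fix calls chdir on the full path: as far as I can tell that is what the shell’s “cd .” boils down to, since bash resolves “.” against the path it remembers in $PWD, whereas a raw chdir(".") would simply stay on the phantom directory.

```python
import os
import tempfile


def cwd_points_at(path):
    """True if the process's current directory is the directory now found at `path`."""
    try:
        return os.path.samefile(".", path)
    except OSError:
        # Some systems refuse to stat a removed current directory at all.
        return False


def demo():
    original = os.getcwd()
    root = tempfile.mkdtemp()
    folder = os.path.join(root, "folder")
    os.mkdir(folder)
    try:
        os.chdir(folder)   # like: cd my_repo/folder
        os.rmdir(folder)   # "checkout bugfix": the directory disappears
        os.mkdir(folder)   # "checkout master": a new directory with the same name
        stale = cwd_points_at(folder)   # still sitting in the removed directory
        os.chdir(folder)   # re-resolve the remembered path, like "cd ."
        fresh = cwd_points_at(folder)
        return stale, fresh
    finally:
        # Clean up and leave the process where it started.
        os.chdir(original)
        os.rmdir(folder)
        os.rmdir(root)
```

Calling demo() returns a pair of booleans: whether the current directory matched the path before the refresh, and whether it matched after.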

Hope you learned something new!

In the Mind of a Master Programmer

Posted in Programming on July 25th, 2011 by Aviv Ben-Yosef – Be the first to comment

He would probably object to me calling him that, but I realized long ago that Kent Beck is one of the precious few who deserve the title “a mastermind”. With Extreme Programming, Test Driven Development, Responsive Design, the Four Elements of Simple Design and more under his belt, who can claim otherwise?

After attending a workshop of his about a year ago and listening to him talk a whole day I was astonished. I tried to pick his brains to understand what makes him tick. Of course there are many factors here – reading over 10,000 books and being smarter than most would help anyone. But something a bit less common takes a major part in this in my opinion, what Kent told me he has: a “habit of desperately wanting things to make sense” and his ability to take things apart until they do.

I recently picked up another of Kent’s books, Implementation Patterns. I love this book because it shows exactly that: his process of thinking and breaking things apart in order to understand them. The book provides a rare glimpse into his method of decomposition. Since I’ve been coding for years, a lot of the patterns made sense to me or seemed trivial. But the “magic” is the fact he was able to put into words things that for me were just hunches. Actually explaining what makes you sense that a method is too long, or what makes a proper name for a variable, is something I’ve never seen done with such care for specifics.

Because it’s such a quick read, I think anyone will benefit by reading Implementation Patterns. More than helping you understand our craft better, it will provide a new outlook on decomposing and judging your designs and pretty much everything else.

 

Input Validation means more than Javascript

Posted in Programming on June 20th, 2011 by Aviv Ben-Yosef – 2 Comments

So much has been written about security before, that I never thought I’d end up writing something about it. Then again, I never thought one of the top U.S. banks would get hacked simply by twiddling digits in a URL.

Basically, the only thing you should take away from this post is that when it comes to external data – trust no one. And I mean absolutely no one.

I think and hope that by now most web developers know not to trust data that users entered in input fields. That trust is what gave birth to SQL injections. Nowadays, just about no one should be exposed to such a lame problem, especially since pretty much every ORM framework out there protects you from these. But checking your input fields is just the beginning.

Every form of input you accept, even indirect input, is still untrusted input. I just want to go over a few examples, because you all should have this in mind:

URLs – Just like I mentioned above, CitiBank got hacked simply because someone noticed an integer on his browser address bar and started incrementing it. Any parameter you accept from a URL should be examined. Accessing an email by id? Make sure it corresponds with the current user. Always.

Form arguments/JSON – These are just the same thing as validating input fields. Everyone should know by now that it’s wrong to trust any validation done on the client side, since every moderately capable person can craft his own POST/GET requests and bypass any validation. Validate everything on the server. And don’t use the client as a place to put some state in, unless it really belongs there. I can’t tell you how many ecommerce sites I’ve seen that pass the price of products along your regular forms as hidden input fields. From that point it’s just a few right clicks in firebug and you’re gonna get that LCD TV for $1.

Cookies – Again, these are inputs generated from your clients. Yeah, you put the cookie there in the first place, but since you put it there your users had the chance to do whatever they want to it. So, putting any kind of integer in a cookie means it has to be validated again on the server side, just like a URL parameter. Any data you put there might have been mangled. The solution is to either not use cookies for anything like that, or sign your cookies the way Rails does.

Really anything possible – Have you ever used a service that allowed you to update certain stuff via email? That’s, for example, another form of input. You wouldn’t want someone to change some URL/number in the email when he’s replying and get access to a different user’s data, would you?
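
Two of the checks above fit in a few lines, so here is a hedged Python sketch of them. Everything in it is illustrative: the secret, the in-memory EMAILS table and the function names are made up, and the “value--signature” layout is only loosely inspired by signed cookies, not a copy of what Rails does. The first function refuses to hand out a record that doesn’t belong to the current user; the other two sign a cookie value with an HMAC and reject it if it was changed on the client.

```python
import hashlib
import hmac

SECRET_KEY = b"server-side-secret"  # hypothetical; must never be sent to the client

# Hypothetical data store: email id -> record with its owner.
EMAILS = {
    1: {"owner": "alice", "body": "hi"},
    2: {"owner": "bob", "body": "secret"},
}


def load_email(current_user, email_id):
    """Return the email only if it belongs to the user making the request."""
    email = EMAILS.get(email_id)
    if email is None or email["owner"] != current_user:
        raise PermissionError("no such email for this user")
    return email


def _signature(value):
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(value):
    """Attach a signature so that tampering can be detected later."""
    return value + "--" + _signature(value)


def verify(cookie):
    """Return the original value, or None if the cookie was modified."""
    value, separator, signature = cookie.rpartition("--")
    if not separator:
        return None
    expected = _signature(value)
    # compare_digest avoids leaking how many characters matched.
    if hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return value
    return None
```

Incrementing the id in the URL now gets a PermissionError instead of someone else’s email, and a cookie edited in the browser comes back as None.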

These are really just the tip of the iceberg, but I’m constantly surprised to see how many around us are putting up web sites with no thought given to these problems. Just a tiny bit of thinking can prevent you from topping reddit for being a lame developer.

Statistics of 62K Passwords

Posted in Programming on June 18th, 2011 by Aviv Ben-Yosef – 23 Comments

A couple of days ago, LulzSec published a batch of 62K random logins (emails and passwords). At first, I grabbed it in order to make sure that neither I nor any of my contacts had passwords revealed. Later I decided to run a few stats on this rare dump of data. Following are a few interesting facts.

Password length

The dump’s average password length is 7.63. I was surprised, because I thought most users would use something like 4 characters, but then remembered a lot of sites nowadays enforce a 6-8 character minimum, so this makes sense. As you should know, and as you can find in Hacking: The Art of Exploitation, longer passwords are much harder to crack, so this is definitely a case where size does matter.

Here’s a short graph depicting the distribution of password length (Note that edge groups have less than 10 passwords and so aren’t really seen here):

[Figure: Passwords by length]

Common Passwords

Not surprisingly, the most common password is 123456 with 569 occurrences, followed by its “more secure” cousin 123456789 with 184. The 3rd most common password is… “password” (132 occurrences)! The other top-10 passwords are interesting – some are plain words such as “romance”, “mystery”, “tigger” and “shadow”, and “102030” makes quite a few appearances.

The 10th most used password is quite intriguing actually – “ajcuivd289”. Everyone on the internet seems baffled as to the source of this password. My guess would have to be it’s some worm that resets the passwords of the accounts it hacked into to this value. Edit: As Marc comments below, the logins with these passwords seem “clustered”, which makes it more likely that these are actually the result of some bot creating accounts. Thanks Marc!

A couple hundred passwords are just not-so-random keyboard taps (“123qwe”, “asdf1234”, etc.). 789 passwords are taken exactly from the username, and twice that many are part of the username followed by some digits (most seem like birth years).

Inside Passwords

12179 of the passwords are all numeric, some are 14 digits long! That’s just crazy. While 34717 (that’s more than half) of the passwords contain at least one digit, only 1262 contain capital letters and 533 contain special characters!

Some Common Words

418 passwords contain the word “love”. “sex” is in 125, “jesus” in 67. More people prefer cats (414) to dogs (291). And the language battle – 6 javas, 2 pythons and 17 “ruby”s (guess which one is also a name).
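
For anyone wanting to run similar numbers on their own data, here is a minimal Python sketch of the kind of counting involved. This is not the script used for the figures above, and the SAMPLE list is a tiny made-up stand-in for the real dump; “special character” is taken here to mean anything that is not a letter or a digit, which is an assumption.

```python
from collections import Counter


def password_stats(passwords):
    """Compute a few summary statistics over a list of password strings."""
    return {
        "average_length": sum(len(p) for p in passwords) / len(passwords),
        "most_common": Counter(passwords).most_common(3),
        "all_numeric": sum(p.isdigit() for p in passwords),
        "with_digit": sum(any(c.isdigit() for c in p) for p in passwords),
        "with_capital": sum(any(c.isupper() for c in p) for p in passwords),
        "with_special": sum(any(not c.isalnum() for c in p) for p in passwords),
        "containing_love": sum("love" in p.lower() for p in passwords),
    }


# Made-up sample, only to show the shape of the input.
SAMPLE = ["123456", "password", "123456", "Tigger1", "shadow!", "102030", "iloveyou"]

stats = password_stats(SAMPLE)
```

On the sample, stats["most_common"][0] is the pair ("123456", 2); swapping SAMPLE for the real list of passwords is all it takes to reproduce this sort of table.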

 

I’d like to sum this up by urging you to never use the same password twice and to use a password manager in order to generate secure passwords! Using a different password everywhere ensures that even if a certain site is breached, it doesn’t mean all of your passwords are revealed, and secure passwords are just harder to brute force.

Sometimes Tests Have to Fail

Posted in Programming, testing on April 3rd, 2011 by Aviv Ben-Yosef – Be the first to comment

A friend asked me about a common problem that pops up in real-world projects and testing: What do you do when you test code with random properties?

A simple example might be handing out jobs to a few workers. If your algorithm for doing that is random, you can usually assert that no one of 3 workers gets all 10 jobs, for example. But, being random, that assert should eventually fail. We’ll assume that with the frequency the team runs the tests, a failure is expected every few days.

Surely no one wants to see the tests fail a couple times a week (especially if you’re keeping score for who broke the build). On the other hand, you’d like to keep the tests. What is a pragmatic coder to do?

If you’re not that bothered by your suite failing every now and then, you might just leave it as it is, which, I think, sucks.

The mega-tester’s approach, which I’ve tried in the past, is usually to stub out the random number generator with values that make sure the failures won’t happen. This is usually cost-effective only for the simplest of cases, and the more complex ones result in brittle tests that are coupled to the implementation and that might need to be changed frequently.

What I prefer is to postpone the problem! Say we change our test’s parameters to 10 workers and 3000 jobs. The chance of one worker getting all the jobs becomes vanishingly small. This tweak of parameters in the test is usually simple to do and can guarantee quite a safety net.
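
To put numbers on that, here is a short Python sketch. It assumes the simplest possible model, which may not match your algorithm: every job is handed, independently, to a uniformly random worker. Under that assumption a particular worker gets all the jobs with probability (1/workers)^jobs, and since only one worker can get everything, the chance that any of them does is workers times that. The function name is made up.

```python
from fractions import Fraction


def chance_one_worker_gets_everything(workers, jobs):
    """Probability that all jobs land on a single worker (uniform, independent assignment)."""
    # workers * (1 / workers) ** jobs, kept exact with Fraction to avoid float underflow.
    return Fraction(workers, workers ** jobs)


small = chance_one_worker_gets_everything(3, 10)
large = chance_one_worker_gets_everything(10, 3000)
```

With 3 workers and 10 jobs this comes out to 1 in 19683 per run; with 10 workers and 3000 jobs it is 1 in 10^2999, which is why bumping the parameters is such a cheap safety net.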

And still, sometimes bad stuff happens. 64bit hash collisions are somewhere, out there in the world. If you’re one of those guys that are bugged by that chance, I give you a simple JUnit rule that will retry a specific test in case it fails, so a random failure has to strike twice in a row to break the build. If a single run fails with probability p, two independent runs both fail with probability p squared: those 64bit collisions are now more like 128bit! woohoo!

The rule allows you to simply annotate a test to make it retry in case it fails:

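import static org.junit.Assert.assertEquals;

import org.junit.Rule;
import org.junit.Test;

public class RetrierTest {
  // Counts how many times the test body has run.
  private static int count = 0;

  @Rule public RetryRule rule = new RetryRule();

  @Test
  @Retry
  public void failsFirst() throws Exception {
    count++;
    // Fails on the first run (count == 1) and passes on the retry (count == 2).
    assertEquals(2, count);
  }
}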

And the implementation is as simple as:

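// Retry.java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Marks a test that should be run a second time if its first run fails.
@Retention(RetentionPolicy.RUNTIME)
public @interface Retry {}

// RetryRule.java
import org.junit.rules.MethodRule;
import org.junit.runners.model.FrameworkMethod;
import org.junit.runners.model.Statement;

public class RetryRule implements MethodRule {
  @Override public Statement apply(final Statement base, final FrameworkMethod method, Object target) {
    return new Statement() {
      @Override public void evaluate() throws Throwable {
        try {
          base.evaluate();
        } catch (Throwable t) {
          // Only tests annotated with @Retry get a second chance.
          Retry retry = method.getAnnotation(Retry.class);
          if (retry != null) {
            base.evaluate();
          } else {
            throw t;
          }
        }
      }
    };
  }
}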

With the tests so unlikely to fail, I’d start a lottery at work for whoever breaks them.

Happy testing!

Testing Techniques: Managing External Resources

Posted in Programming, testing on April 1st, 2011 by Aviv Ben-Yosef – 1 Comment

A friend approached me with one of the known problems in the testing world – How do you keep external resources under a test harness? Having heard the question a few times before, I thought I’d share my thoughts, and mainly put together the common advice that drifts around the web.

The Dilemma

Nowadays, it’s hard to get more than 100 lines of code before adding an external resource to our code. It might be a web service to manage something, or some convoluted API to receive data from or just about anything. Usually, writing tests for code that directly talks with these resources using the resources themselves is very problematic, for numerous reasons:

  • It significantly slows the tests, because it requires network access and processing on the service’s side.
  • It might cost you money, send emails, tweet stuff and do things you’d rather not do 300 times a day as you run your tests.
  • Making your code handle error conditions with the service is hard or impossible, as you can’t control when those occur.

Basically, all of these factors usually add up to you having crappy tests that you rarely run. That sucks.

Decouple & Isolate

The best solution I’m aware of is simply isolating the thing. We usually strive to wrap whatever service we’re using with a single-point interface. The decoupling is great since I’ve yet to encounter a service with an API that matched my thinking of the domain problem. Wrapping it up allows us to keep using our own language and logic throughout the system.

A benefit of that is we now have a simple interface or facade we need to stub/mock out during tests. That’s usually relatively easy, and allows us to run our tests blazingly fast and test all those hard-to-reach corner cases.
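
Here is roughly what that looks like, as a hedged Python sketch. None of these names come from a real project: Signup is the code under test, the notifier is our own single-point facade over some external announcement service, and FakeNotifier is the stand-in used in tests. Because we own the facade, the fake can also be told to fail, which makes the error path trivial to exercise.

```python
class NotifierError(Exception):
    """Raised by our facade (hypothetical) when the remote service fails."""


class FakeNotifier:
    """Test double for the facade: records messages instead of sending them."""

    def __init__(self, fail=False):
        self.sent = []
        self.fail = fail

    def announce(self, message):
        if self.fail:
            # Simulate an outage on demand, something the real service won't do for us.
            raise NotifierError("service unavailable")
        self.sent.append(message)


class Signup:
    """Code under test: talks only to the facade, never to the service's own API."""

    def __init__(self, notifier):
        self.notifier = notifier
        self.pending = []

    def register(self, name):
        message = "Welcome, " + name + "!"
        try:
            self.notifier.announce(message)
            return True
        except NotifierError:
            # Keep the message so it can be retried later.
            self.pending.append(message)
            return False


healthy = FakeNotifier()
happy_path = Signup(healthy).register("dana")

broken_signup = Signup(FakeNotifier(fail=True))
outage_path = broken_signup.register("dana")
```

No network, no money spent, nothing tweeted, and the outage case runs as fast as the happy path.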

But what if the service changes?

That’s the finishing touch. You should still maintain a suite of tests that run against the real service. Those should be the plain tests that make sure you’re using the API right and that would break if anything you’re relying on changes. These tests won’t be part of your regular suite that gets run constantly. Instead have your CI server run them daily/weekly and let you know when something changes.

This puts us basically in a win-win situation, with us being able to run our tests quickly and yet have the assurance that we won’t miss API changes and the like.

Happy testing!

Design is Simpler Now: Embrace the Extract

Posted in Programming on March 30th, 2011 by Aviv Ben-Yosef – Be the first to comment

For the past 5 years or so I’ve been searching for ways to produce better designed code. I hate the fact I basically can’t put my finger on why certain designs aren’t as good as others.

That’s why I was really blown away when I first learned about the SOLID principles and started practicing TDD. At last I had found rules that gave me the capability to weigh designs, and a process that helped push me towards what feels like better code.

But even 5 rules were too much for me!

SOLID, no doubt, drives better design. My problem was incorporating it naturally into my everyday coding. Call me dumb, but I just can’t bring myself to contemplate 5 different aspects whenever I whip up a class. I still find it an excellent checklist to go through when I’m considering refactorings, but thinking about it constantly just drained a big part of my concentration.

For a few months now I’ve been getting the feeling that my OOD toolset has reduced quite a lot to the very essence. That feeling was also magnified by reading GOOS and pretty much everything written by J. B. Rainsberger here and here.

The first tool I use heavily (and I mean heavily, my mind has managed to get OCD about it) is removing duplication – or DRY. This tool alone makes any codebase an order of magnitude better. I’ve written plenty about DRY before.

But, just yesterday I realized that other than that, I mainly concentrate on one thing, as I contemplated on twitter:

I think I can sum up all my OOD skills with “wait, shouldn’t this be in a different class/method?” Wondering if that’s a good thing…

Yup, that’s the trick. I was quickly assured by two amazing guys that have been doing this longer than I’ve been breathing, agile manifesto authors:

Ron Jeffries: Yes it is a good thing. I would suspect you also note duplication?

James Grenning: Think of the alternative.. you are asking the right question

You see that? Noticing duplication and moving stuff somewhere else. That’s all there is to it. This simple question points you toward the Single Responsibility Principle and generally, along with DRY, covers most of the bases needed to adhere to the elements of simple design.

The main question I ask myself now every time I think of a problem, start changing a function, write a test, and at just about anytime I’m coding is “is this the right place for this?” And quite often the answer is “no.” Push this forward and beautiful designs show up, designs of short, cohesive classes. So, to sum it up: Embrace the Extract.
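
As a tiny illustration of asking “is this the right place for this?”, here is a made-up Python example (the report and its input format are invented for the purpose). The first function parses, filters, sums and formats in one place; the second version moves each concern to its own function and produces the same result.

```python
def report_before(lines):
    """Everything in one place: parsing, filtering, summing and formatting."""
    total = 0
    for line in lines:
        _name, amount = line.split(",")
        if amount.strip():
            total += int(amount)
    return "Total: " + str(total)


def parse_amounts(lines):
    """Extracted: knows only about the "name,amount" line format."""
    for line in lines:
        _name, amount = line.split(",")
        if amount.strip():
            yield int(amount)


def format_total(total):
    """Extracted: knows only about presentation."""
    return "Total: " + str(total)


def report_after(lines):
    """What is left reads like a summary of the steps."""
    return format_total(sum(parse_amounts(lines)))


ORDERS = ["tea,3", "milk, 4", "bread,"]  # hypothetical input; the last line has no amount
```

Both versions return the same string for ORDERS, but after the extraction a change to the line format or to the wording of the report touches exactly one small function.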

Crafting Up – Community is Key

Posted in Programming on March 27th, 2011 by Aviv Ben-Yosef – 2 Comments

It’s been almost a year now since the founding of our local Software Craftsmanship group. This, for me, is a huge dream-come-true.

For years I’ve been looking for a good community around here to join, went to several meetups and looked around to no avail. My frustration grew about a year ago when I noticed the Chicago community is so buzzing with activity, people there have a meetup every day almost. That’s why when Uri started organizing the first meeting I jumped in whole-heartedly.

In just a few months the meeting has influenced me quite a lot. First of all I got to meet a lot of new, smart and interesting people I never would have otherwise. It’s not easy to find people that are as passionate about our profession as I am, yet our group didn’t disappoint me.

The meetings also supply my need to pair with new people. Pair programming is a magical way of working and sharing knowledge, and I’ve yet to have a session with a new pair without picking up something new. I love the first minutes where we have to find a common language to get things started, and even more the high fives of getting a green bar.

Also, a good community is the best way to get feedback. I can say I’m trying to leech this to the max. I’ve already given talks/sessions at 2 meetings, and I frequently bug people on twitter and the mailing list. A varied community of like-minded people allows you to get different outlooks and insights into things you’ve been neck-deep in for a while.

And last but not least, a good community might make magic stuff happen. I don’t know how, but I’m sure our group had something to do with the fact some of us got to have dinner with Uncle Bob and Brett Schuchert, two awesome coders and Clean Code authors, on their last visit here.

Bottom line, be part of a community, and if there isn’t one around you, help start it! It’s a great source of kindred spirits, an invaluable and rare resource!

Using Chef to Automatically Configure New EC2 Instances

Posted in Programming, techie on March 7th, 2011 by Aviv Ben-Yosef – 4 Comments

This is a follow up post to my post about using Puppet to get the same result. In the comments to that post I was told by a few people that chef can make my life easier and I decided to give it a try. Here’s what I came up with.

In this post, as in the previous one, our goal is to be able to start a new EC2 instance with one command, which will in turn be created and started with Apache running.

First of all, instead of having to set up our own server to tell the newly created instances what to do, we are going to use a hosted chef server on Opscode’s server. The hosting is free for 5 nodes, and so you can try this out without having to pay them. Go to Opscode’s site and register a new user, then also add a new organization.

On our system, we need to start by installing chef. You will also want to install the dependencies needed to make chef talk with EC2 (these are not installed automatically when installing the gem because they’re optional):

gem install chef net-ssh net-ssh-multi fog highline

Now, we need to set up a chef repository. This repository will contain our cookbooks (libraries that contain recipes, which are scripts for doing stuff, like installing apache) and roles (which map recipes to nodes), among other stuff. To get it, run:

git clone git://github.com/opscode/chef-repo.git

In the repository create a .chef directory. Now back on Opscode’s site, you need to download 3 files: your organization’s validator key, your user’s key and a generated knife.rb. Once downloaded, copy them all to the .chef directory:

cp USERNAME.pem ORGANIZATION-validator.pem knife.rb .chef

These will be used by the new instances to connect to Opscode and identify themselves as truly being created by you (this saves us from having to hack an awkward solution for this to work on Puppet). Add to your knife.rb file your AWS credentials:

knife[:aws_access_key_id] = "Your AWS Access Key"
knife[:aws_secret_access_key] = "Your AWS Secret Access Key"

We will now fetch the apache2 cookbook, which will allow us to install apache on our instances by adding a single configuration line. To download an existing cookbook, do the following:

knife cookbook site vendor apache2

You can see what other cookbooks are made available by looking around here. Now, we’ll create a role for our instances. Create the file roles/appserver.rb with this data:

name "appserver"
description "An application server"
run_list(%w{
  recipe[apache2]
})

And to update our Opscode server with the new cookbook and role:

knife cookbook upload apache2
knife role from file roles/appserver.rb

We’re getting really close now! You should have a security group defined in AWS that has port 22 (SSH) open, for knife to be able to connect to the instance and configure it, and port 80 (HTTP) open for our Apache to be reachable. I called mine “chef”. You will also need to decide which AMI (image) to use; you can find a list of AMIs supplied by Opscode here. And now, to create an instance with one command line, as promised:

knife ec2 server create "role[appserver]" --image ami-f0e20899 \
   --groups chef --ssh-user ubuntu --ssh-key my-key

This will take a while, as knife will create the instance, connect to it, install ruby, chef itself, apache etc. Once it says it has finished, simply copy the public DNS of the newly created instance (it should be printed once knife finishes) and open it in your browser. My, what a sense of accomplishment one gets from seeing the string “It works!”

I find this a lot easier, cleaner, more streamlined and more fun. I’m still learning the ropes with chef, but it has already surprised me by being easy to change, being completely git-integrated and by Opscode’s fast support (even for non-paying customers). You can dig further in these links.
