Statistics of 62K Passwords

Posted in Programming on June 18th, 2011 by Aviv Ben-Yosef – 23 Comments

A couple of days ago, LulzSec published a batch of 62K random logins (emails and passwords). At first, I grabbed it to make sure that neither I nor any of my contacts had passwords exposed. Later I decided to run a few stats on this rare dump of data. Following are a few interesting facts.

Password length

The dump’s average password length is 7.63. I was surprised, because I thought most users would use something like 4 characters, but then remembered that a lot of sites nowadays enforce a minimum length of 6-8 characters, so this makes sense. As you should know, and as you can find in Hacking: The Art of Exploitation, longer passwords are much harder to crack, so this is definitely a case where size does matter.

Here’s a short graph depicting the distribution of password length (note that the edge groups have fewer than 10 passwords each and so aren’t really visible here):

[Figure: Passwords by length]

Common Passwords

Not surprisingly, the most common password is 123456 with 569 occurrences, followed by its “more secure” cousin 123456789 with 184. The 3rd most common password is… “password” (132 occurrences)! The other top-10 passwords are interesting: some are plain words such as “romance”, “mystery”, “tigger” and “shadow”, and “102030” makes quite a few appearances.

The 10th most used password is quite intriguing actually: “ajcuivd289”. Everyone on the internet seems baffled as to the source of this password. My guess would have to be that it’s some worm that resets the passwords of the accounts it hacks into to this value. Edit: As Marc comments below, the logins with these passwords seem “clustered”, which makes it more likely that these are actually the result of some bot creating accounts. Thanks Marc!

A couple hundred passwords are just not-so-random keyboard taps (“123qwe”, “asdf1234”, etc.). 789 passwords are taken exactly from the username, and twice that many are part of the username followed by some digits (most seem like birth years).

Inside Passwords

12179 of the passwords are all numeric, and some are 14 digits long! That’s just crazy. While 34717 (that’s more than half) of the passwords contain at least one digit, only 1262 contain capital letters and 533 contain special characters!
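For anyone who wants to run the same kind of counts on their own list, here is a minimal sketch. It is not the script used for this post: the class name, the method names and the sample data are all made up for illustration, and “special character” is assumed to mean anything that is neither a letter nor a digit.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; names are illustrative and not taken from the original analysis. */
final class PasswordStats {
    private PasswordStats() {}

    /** Average password length; 0.0 for an empty list. */
    static double averageLength(List<String> passwords) {
        return passwords.stream().mapToInt(String::length).average().orElse(0.0);
    }

    /** Passwords made up of digits only (empty strings are not counted). */
    static long allNumeric(List<String> passwords) {
        return passwords.stream()
                .filter(p -> !p.isEmpty() && p.chars().allMatch(Character::isDigit))
                .count();
    }

    /** Passwords containing at least one digit. */
    static long withDigit(List<String> passwords) {
        return passwords.stream().filter(p -> p.chars().anyMatch(Character::isDigit)).count();
    }

    /** Passwords containing at least one capital letter. */
    static long withUppercase(List<String> passwords) {
        return passwords.stream().filter(p -> p.chars().anyMatch(Character::isUpperCase)).count();
    }

    /** Assumption: "special" means any character that is neither a letter nor a digit. */
    static long withSpecial(List<String> passwords) {
        return passwords.stream()
                .filter(p -> p.chars().anyMatch(c -> !Character.isLetterOrDigit(c)))
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample; the real dump is not reproduced here.
        List<String> sample = Arrays.asList("123456", "password", "Tigger1", "p@ss", "102030");
        System.out.printf("average length: %.2f%n", averageLength(sample));
        System.out.println("all numeric:    " + allNumeric(sample));
        System.out.println("with digit:     " + withDigit(sample));
        System.out.println("with uppercase: " + withUppercase(sample));
        System.out.println("with special:   " + withSpecial(sample));
    }
}
```

Reading the dump into the list (one password per line, say) is left out, since the file format isn’t described here.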

Some Common Words

418 passwords contain the word “love”. “sex” is in 125, “jesus” in 67. More people prefer cats (414) to dogs (291). And the language battle – 6 javas, 2 pythons and 17 “ruby”s (guess which one is also a name).

 

I’d like to sum this up by urging you to never use the same password twice and to use a password manager to generate secure passwords! Using a password manager ensures that even if a certain site is breached, not all of your passwords are revealed, and secure passwords are simply harder to brute force.

You should subscribe to my feed and follow me on twitter!

Sometimes Tests Have to Fail

Posted in Programming, testing on April 3rd, 2011 by Aviv Ben-Yosef – Be the first to comment

A friend asked me about a common problem that pops up in real-world projects and testing: What do you do when you test code with random properties?

A simple example might be handing out jobs to a few workers. If your algorithm for doing that is random, you can usually assert that none of 3 workers gets all 10 jobs, for example. But, being random, that assert should eventually fail. We’ll assume that with the frequency the team runs the tests, a failure is expected every few days.

Surely no one wants to see the tests fail a couple times a week (especially if you’re keeping score for who broke the build). On the other hand, you’d like to keep the tests. What is a pragmatic coder to do?

If you’re not that bothered by your suite failing once in a while, you might just leave it as it is, which, I think, sucks.

The mega-tester’s approach, which I’ve tried in the past, is usually to stub out the random number generator with values that make sure the failures won’t happen. This is usually cost-effective only for the simplest of cases, and in the more complex ones it results in brittle tests that are coupled to the implementation and might need to be changed frequently.

What I prefer is to postpone the problem! Say we change our test’s parameters to 10 workers and 3000 jobs. The chance of one worker getting all the jobs becomes negligible. This tweak of parameters in the test is usually simple to do and provides quite a safety net.

And still, sometimes bad stuff happens. 64bit hash collisions are somewhere, out there in the world. If you’re one of those guys that are bugged by that chance, I give you a simple JUnit rule that will retry a specific test in case it fails. As long as the two runs are independent, that squares the chance of a failure. Those 64bit collisions are now more like 128bit! woohoo!
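To put rough numbers on this, here is a small back-of-the-envelope sketch. It assumes every job is handed to a worker uniformly and independently, which may not match your real algorithm, and the class and method names are made up. Under that assumption the chance that some single worker gets every job is workers × (1/workers)^jobs, which is 1 in 19,683 for 3 workers and 10 jobs. The code works in log10 because the 3000-job case is far too small for a double.

```java
/** Hypothetical helper for estimating how often the "one worker gets every job" assert fails. */
final class RetryOdds {
    private RetryOdds() {}

    /**
     * log10 of the probability that one worker receives all jobs, assuming each job is
     * assigned uniformly and independently: workers * (1 / workers)^jobs.
     */
    static double log10AllJobsToOneWorker(int workers, int jobs) {
        return (1 - jobs) * Math.log10(workers);
    }

    /** With one retry both independent attempts must fail, so the probability is squared. */
    static double log10WithOneRetry(double log10FailureProbability) {
        return 2 * log10FailureProbability;
    }

    public static void main(String[] args) {
        double small = log10AllJobsToOneWorker(3, 10);
        double large = log10AllJobsToOneWorker(10, 3000);
        System.out.printf("3 workers, 10 jobs:     10^%.2f%n", small);
        System.out.printf("10 workers, 3000 jobs:  10^%.2f%n", large);
        System.out.printf("...and with one retry:  10^%.2f%n", log10WithOneRetry(large));
    }
}
```

The squaring only holds if the second attempt really is independent of the first; a test that fails because of shared state will happily fail twice.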

The rule allows you to simply annotate a test to make it retry in case it fails:

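import static org.junit.Assert.assertEquals;

import org.junit.Rule;
import org.junit.Test;

public class RetrierTest {
  // Static, so the counter survives between the first attempt and the retry.
  private static int count = 0;

  @Rule public RetryRule rule = new RetryRule();

  @Test
  @Retry
  public void failsFirst() throws Exception {
    count++;
    assertEquals(2, count);  // fails on the first attempt, passes on the retry
  }
}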

And the implementation is as simple as:

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
@Retention(RetentionPolicy.RUNTIME)
public @interface Retry {}

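import org.junit.rules.MethodRule;
import org.junit.runners.model.FrameworkMethod;
import org.junit.runners.model.Statement;

public class RetryRule implements MethodRule {
  @Override public Statement apply(final Statement base, final FrameworkMethod method, Object target) {
    return new Statement() {
      @Override public void evaluate() throws Throwable {
        try {
          base.evaluate();
        } catch (Throwable t) {
          // Only tests annotated with @Retry get a second chance.
          Retry retry = method.getAnnotation(Retry.class);
          if (retry != null) {
            base.evaluate();  // a failure here propagates and fails the test
          } else {
            throw t;
          }
        }
      }
    };
  }
}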

With the tests so unlikely to fail, I’d start a lottery at work for whoever breaks them.

Happy testing!


Testing Techniques: Managing External Resources

Posted in Programming, testing on April 1st, 2011 by Aviv Ben-Yosef – 1 Comment

A friend approached me with one of the known problems in the testing world – How do you keep external resources under a test harness? Having heard the question a few times before, I thought I’d share my thoughts, and mainly put together the common advice that drifts around the web.

The Dilemma

Nowadays, it’s hard to write more than 100 lines of code before adding an external resource. It might be a web service to manage something, some convoluted API to receive data from, or just about anything. Usually, writing tests for code that talks directly to these resources, using the resources themselves, is very problematic, for numerous reasons:

  • It significantly slows the tests, because it requires network access and processing on the service’s side.
  • It might cost you money, send emails, tweet stuff and do things you’d rather not do 300 times a day as you run your tests.
  • Making your code handle error conditions with the service is hard or impossible, as you can’t control when those occur.

Basically, all of these factors usually add up to you having crappy tests that you rarely run. That sucks.

Decouple & Isolate

The best solution I’m aware of is simply isolating the thing. We usually strive to wrap whatever service we’re using with a single-point interface. The decoupling is great since I’ve yet to encounter a service with an API that matched my thinking of the domain problem. Wrapping it up allows us to keep using our own language and logic throughout the system.

A benefit of that is we now have a simple interface or facade we need to stub/mock out during tests. That’s usually relatively easy, and allows us to run our tests blazingly fast and test all those hard-to-reach corner cases.
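As a rough illustration of the idea, here is a minimal sketch. Everything in it is hypothetical: the Notifier interface stands in for whatever external service you wrap, and SignupService stands in for your own code. The point is only that the code under test depends on our own small interface, so a test can swap in a fake that never touches the network and can fail on demand.

```java
import java.util.ArrayList;
import java.util.List;

/** Tiny demo entry point; all types in this sketch are made up for illustration. */
final class FacadeDemo {
    public static void main(String[] args) {
        FakeNotifier fake = new FakeNotifier();
        SignupService service = new SignupService(fake);
        System.out.println("normal run:       " + service.register("a@example.com"));
        fake.failNext = true;  // simulate the service being down
        System.out.println("simulated outage: " + service.register("b@example.com"));
    }
}

/** Hypothetical single-point interface in front of an external messaging service. */
interface Notifier {
    void send(String recipient, String message) throws NotifierException;
}

/** Our own exception type, so callers never see the vendor's error classes. */
class NotifierException extends Exception {
    NotifierException(String message) { super(message); }
}

/** Test double: records calls instead of hitting the network, and can fail on demand. */
final class FakeNotifier implements Notifier {
    final List<String> sent = new ArrayList<>();
    boolean failNext = false;

    @Override public void send(String recipient, String message) throws NotifierException {
        if (failNext) {
            failNext = false;
            throw new NotifierException("simulated outage");
        }
        sent.add(recipient + ": " + message);
    }
}

/** Code under test: depends only on the interface, never on the real service. */
final class SignupService {
    private final Notifier notifier;

    SignupService(Notifier notifier) { this.notifier = notifier; }

    /** Returns true if the welcome message went out, false if the service was down. */
    boolean register(String email) {
        try {
            notifier.send(email, "Welcome!");
            return true;
        } catch (NotifierException e) {
            return false;
        }
    }
}
```

The real implementation of Notifier would be the only class that knows about the vendor’s API, which is also what keeps the error path testable.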

But what if the service changes?

That’s the finishing touch. You should still maintain a suite of tests that run against the real service. Those should be the plain tests that make sure you’re using the API right and that would break if anything you’re relying on changes. These tests won’t be part of your regular suite that gets run constantly. Instead have your CI server run them daily/weekly and let you know when something changes.

This puts us basically in a win-win situation, with us being able to run our tests quickly and yet have the assurance that we won’t miss API changes and the like.

Happy testing!


Design is Simpler Now: Embrace the Extract

Posted in Programming on March 30th, 2011 by Aviv Ben-Yosef – Be the first to comment

For the past 5 years or so I’ve been searching for ways to produce better designed code. I hate the fact I basically can’t put my finger on why certain designs aren’t as good as others.

That’s why I was really blown away when I first learned about the SOLID principles and started practicing TDD. At last I have found rules that gave me the capability to weigh designs, and a process that helped push me towards what feels like better code.

But even 5 rules were too much for me!

SOLID, no doubt, drives better design. My problem was incorporating it naturally into my everyday coding. Call me dumb, but I just can’t bring myself to contemplate 5 different aspects whenever I whip up a class. I still find it an excellent checklist to go through when I’m considering refactorings, but thinking about it constantly just drained a big part of my concentration.

For a few months now I’ve been getting the feeling that my OOD toolset has reduced quite a lot to the very essence. That feeling was also magnified by reading GOOS and pretty much everything written by J. B. Rainsberger here and here.

The first tool I use heavily (and I mean heavily, my mind has managed to get OCD about it) is removing duplication, or DRY. This tool alone makes any codebase an order of magnitude better. I’ve written plenty about DRY before.

But, just yesterday I realized that other than that, I mainly concentrate on one thing, as I contemplated on twitter:

I think I can sum up all my OOD skills with “wait, shouldn’t this be in a different class/method?” Wondering if that’s a good thing…

Yup, that’s the trick. I was quickly assured by two amazing guys that have been doing this longer than I’ve been breathing, agile manifesto authors:

Ron Jeffries: Yes it is a good thing. I would suspect you also note duplication?

James Grenning: Think of the alternative.. you are asking the right question

You see that? Noticing duplication and moving stuff somewhere else. That’s all there is to it. This simple question points you toward the Single Responsibility Principle and generally, along with DRY, covers most of the bases needed to adhere to the elements of simple design.

The main question I ask myself now every time I think of a problem, start changing a function, write a test, and at just about anytime I’m coding is “is this the right place for this?” And quite often the answer is “no.” Push this forward and beautiful designs show up, designs of short, cohesive classes. So, to sum it up: Embrace the Extract.


Crafting Up – Community is Key

Posted in Programming on March 27th, 2011 by Aviv Ben-Yosef – 2 Comments

It’s been almost a year now since the founding of our local Software Craftsmanship group. This, for me, is a huge dream-come-true.

For years I’ve been looking for a good community around here to join, went to several meetups and looked around to no avail. My frustration grew about a year ago when I noticed how the Chicago community is buzzing with activity; people there have a meetup almost every day. That’s why when Uri started organizing the first meeting I jumped in whole-heartedly.

In just a few months the meeting has influenced me quite a lot. First of all I got to meet a lot of new, smart and interesting people I never would have otherwise. It’s not easy to find people that are as passionate about our profession as I am, yet our group didn’t disappoint me.

The meetings also supply my need to pair with new people. Pair programming is a magical way of working and sharing knowledge, and I’ve yet to have a session with a new pair without picking up something new. I love the first minutes where we have to find a common language to get things started, and even more the high fives of getting a green bar.

Also, a good community is the best way to get feedback. I can say I’m trying to leech this to the max. I’ve already given talks/sessions at 2 meetings, and I frequently bug people on twitter and the mailing list. A varied community of like-minded people allows you to get different outlooks and insights into things you’ve been neck-deep in for a while.

And last but not least, a good community might make magic stuff happen. I don’t know how, but I’m sure our group had something to do with the fact some of us got to have dinner with Uncle Bob and Brett Schuchert, two awesome coders and Clean Code authors, on their last visit here.

Bottom line, be part of a community, and if there isn’t one around you, help start it! It’s a great source of kindred spirits, an invaluable and rare resource!


Making Embedded GitHub Gists Show Up on RSS Readers

Posted in Uncategorized on March 9th, 2011 by Aviv Ben-Yosef – 1 Comment

Just a quick let-you-know: I found out that the gists I use to embed code in my posts don’t show up on RSS readers (e.g. Google Reader).

I know how annoying it is not to be able to read a blog fully from my reader, and so I found a nice WordPress plugin called Embed GitHub Gist that handles embedding gists elegantly and also automatically makes sure the code will be displayed even in readers.

I’ve even updated my latest post (about Chef and EC2) to work with it, and new posts from now on will look good too :)

Using Chef to Automatically Configure New EC2 Instances

Posted in Programming, techie on March 7th, 2011 by Aviv Ben-Yosef – 4 Comments

This is a follow-up to my post about using Puppet to get the same result. In the comments to that post I was told by a few people that chef can make my life easier, and I decided to give it a try. Here’s what I came up with.

In this post, as in the previous one, our goal is to be able to start a new EC2 instance with one command, which will in turn be created and started with Apache running.

First of all, instead of having to set up our own server to tell the newly created instances what to do, we are going to use a hosted chef server on Opscode’s server. The hosting is free for 5 nodes, and so you can try this out without having to pay them. Go to Opscode’s site and register a new user, then also add a new organization.

On our system, we need to start by installing chef. You will also want to install the dependencies needed to make chef talk with EC2 (these are not installed automatically when installing the gem because they’re optional):

gem install chef net-ssh net-ssh-multi fog highline

Now, we need to set up a chef repository. This repository will contain our cookbooks (libraries that contain recipes, which are scripts for doing stuff, like installing apache) and roles (which map recipes to nodes), among other stuff. To get it, run:
git clone git://github.com/opscode/chef-repo.git

In the repository, create a .chef directory. Now back on Opscode’s site, you need to download 3 files: your organization’s validator key, your user’s key and a generated knife.rb. Once downloaded, copy them all to the .chef directory:
cp USERNAME.pem ORGANIZATION-validator.pem knife.rb .chef

These will be used by the new instances to connect to Opscode and identify themselves as truly being created by you (this saves us from having to hack an awkward solution for this to work on Puppet). Add to your knife.rb file your AWS credentials:
knife[:aws_access_key_id] = "Your AWS Access Key"
knife[:aws_secret_access_key] = "Your AWS Secret Access Key"

We will now fetch the apache2 cookbook, which will allow us to install apache on our instances by adding a single configuration line. To download an existing cookbook, do the following:
knife cookbook site vendor apache2

You can see what other cookbooks are made available by looking around here. Now, we’ll create a role for our instances. Create the file roles/appserver.rb with this data:
name "appserver"
description "An application server"
run_list(%w{
recipe[apache2]
})

And to update our Opscode server with the new cookbook and role:
knife cookbook upload apache2
knife role from file roles/appserver.rb

We’re getting really close now! You should have a security group defined in AWS that has port 22 (SSH) open, for knife to be able to connect to the instance and configure it, and port 80 (HTTP) open for our Apache to be reachable. I called mine “chef”. You will also need to decide which AMI (image) to use; you can find a list of AMIs supplied by Opscode here. And now, to create an instance with one command line, as promised:
knife ec2 server create "role[appserver]" --image ami-f0e20899 \
   --groups chef --ssh-user ubuntu --ssh-key my-key

This will take a while, as knife will create the instance, connect to it, install ruby, chef itself, apache etc. Once it says it has finished, simply copy the public DNS of the newly created instance (it should be printed once knife finishes) and open it in your browser. My, what a sense of accomplishment one gets from seeing the string “It works!”

I find this a lot easier, cleaner, more streamlined and more fun. I’m still learning the ropes with chef, but it has already surprised me by being easy to change, being completely git-integrated and by Opscode’s fast support (even for non-paying customers). You can dig further in these links.


Fake It Till You Make It – Team Edition

Posted in Programming on March 5th, 2011 by Aviv Ben-Yosef – 1 Comment

Fake it till you make it is a known pattern in Test Driven Development implementation, which means one writes code that acts like it knows what it’s doing in order to know what it’s doing. This is a powerful technique and I’ve already written how using the same trick on the individual scale can help you make your team better.

I just recently realized that I had already seen this principle applied to a whole team which then caused a whole department to follow suit.

Back in 2005, I had the luck to join a particularly interesting team. Hanging around the section the team was part of clearly showed that all the other teams regarded that specific team (let’s call it A Team) as highly skilled. People said they were the XP (Extreme Programming) team, and they were generally looked at as an example of how a good team should work.

After joining the team I got a look from the inside at what was really going on. All the developers were highly talented, but being “The XP Team”? Hah! 2 guys had read Kent Beck’s (amazingly awesome) Extreme Programming Explained and simply started pairing and writing automated unit tests before the code.

Simply starting with those 2 small parts of the XP way of doing things got them improved results, which then got the rest of the section interested. By simply saying they were going to try that XP thing and saying it made their lives better, the A Team got the ball rolling for the whole section without ever even trying to start an Agile Transition.

And this wasn’t a one-trick pony! About 2 years later, the same thing happened with Scrum. One teammate read a good intro to it (back when it was still a free PDF) and told the rest of the team, which then decided to give it a try. After a few sprints of seeing how organized standup meetings and the like actually helped our process we decided to keep it.

We didn’t try to “get everyone to realize this is the best way”. Some people happened to come inside our room during standups, or see the scrum board. Those alone got people interested and from then on again, A Team got the section to advance nicely.

This is a marvelous story, and only now do I realize how rare it is. Simply because the team looked to the rest of the section like they knew what they were doing, it got all of them to try agile without having to break down walls or bust open doors. Sometimes just doing what feels right is enough.

Fake it till you make it is just another way of saying “If you build it they will come”!


You Owe it to Yourself to be Old-School

Posted in Programming on February 22nd, 2011 by Aviv Ben-Yosef – 35 Comments

I love watching House. My favorite episodes are those where he manages to debug an illness not by knowing an obscure disease, but by having the holistic knowledge of how the body works and thus being able to deduce the real problem.

I find this correlates very much to a set of tools and knowledge a lot of coders are missing that has tremendous value. Joel Spolsky wrote years ago that developers should learn C in order to have a thorough understanding of their environment. I actually think this should be taken a few notches further.

Learn C and some systems programming and you have the ability to grasp basics of most tools you use. How can you spot and truly understand memory leaks without having to manage memory allocation by yourself?

What would you do if some code you wrote or an application you use suddenly blurts out that it has a connection error? Or the Apache server you’re installing is acting up on you? My #1 power tool for these situations is simply opening Wireshark and looking at what goes through my wire. Learn the basics of TCP/IP and you’ll be able to debug most network problems swiftly.

And don’t get me started on using the shell. No matter what you think, having shell-fu pays off daily. Any text manipulation you’re thinking of, most simple processing tasks: you can whip up a one-liner to do it in less time than most IDEs take to start up.

And the reasons just go on and on. Reading important functions from the Linux kernel will help you understand why Java suddenly won’t fork child processes. Knowing how known security issues work (injections, buffer overflows, etc.) is the only way for you to catch security mistakes at the drawing-on-the-board stage and not at the shit-the-DB-is-stolen stage.

I don’t care if you’re doing Rails and never need to see the outside of a pointer. There’s nothing like having the holistic grasp of things to help you solve problems quickly, a-la Dirk Gently. All these points I’ve made in this post? All real problems solved in the last couple of months with some old-school chops.

Do yourself good – read K&R for some C understanding. Read the first chapters of TCP/IP Illustrated. Read Linux Kernel Development (3rd Edition) for a nice walk-through of the interesting parts. This knowledge won’t get obsolete anytime soon. Can you say that about your favorite framework?


Stop Wasting My Code

Posted in Programming on February 5th, 2011 by Aviv Ben-Yosef – 4 Comments

During my service in the army I had the opportunity to move around some electronic equipment from place to place. A lot of it was pretty old (and by that I mean it predates me), but worked perfectly where it was. We had systems running for decades without a problem, but once we unplugged them and moved them to a different room they went dead.

Over time we’ve identified this phenomenon and simply noted that things that aren’t in use stop functioning. It used to puzzle me, but eventually I came to accept this. What still is hard for me to accept though, is the fact that this is exactly the same with software as it is with hardware, if not worse.

I thought I learned this lesson a few years ago, after reading the Pragmatic Programmer and having it hammer YAGNI and KISS into my head, but I keep getting surprised every time I find out that I’ve just done it again.

Actually, learning Git has made this problem rear its ugly head again. Git makes it easy to write up some code and then keep it somewhere. I’d either stash some changes or keep a side branch with some work I started. The really bad part is adding this code to production code, simply because it’s there. The problem is that code gets stale if it’s not really used, and fast.

I can’t think of a single case where we added code before it was actually needed and got something good out of it. Fact is, every line of code you write before there’s a real use case or actual need for it is just you guessing. And we’re mostly guessing anyway about stuff we actually need to get done, so why add more ambiguity in there?

As I read in Growing Object-Oriented Software, code isn’t sacred simply because it’s there, and it won’t take as long to write it again if you need to. Don’t be afraid to delete code that isn’t actually needed just because you put two hours into it. The time you’ll spend maintaining it will cost you much more.

This is exactly the Lean definition of Waste – everything not adding value to customers, and adding code just for you to feel better isn’t helping your customers. I now consider waste as one of my sworn enemies. At my work I’ve decided to take on myself the role of do-we-really-need-that dude. It means being a PITA sometimes, but it pays off tenfold.

Next time you feel tempted to commit that code you’re not sure you’ll need anymore, keep in mind the best code is no code.
