Wednesday, April 30, 2014

pro tip for developers: take some time to code up some well known data structures and algorithms yourself

remember all that stuff you learned in school?

It's probably stuff you take for granted when you use data structures built into higher level languages like Java, Scala and C# or libraries such as Apache Commons, Google Guava, and the like. Ultimately that stuff is the foundation of your knowledge as a developer: you don't think about it all the time but the fundamentals are ingrained in you.

That said, I bet if you cracked open a textbook today and decided to read about one of the classics, it may stump you for a bit. Do you remember the details of Dijkstra's algorithm? You may remember it's used for shortest paths in a graph, but you may not remember how it works. What about hash based data structures? Do you remember what makes a good hash function, or different strategies for handling hash collisions? How about string sorting or searching approaches?

If some of these aren't fresh in your head, you shouldn't feel bad. You may have a notion of how they work on a high level still, along with the computational and storage impacts, but you may not remember how to implement them offhand. Going back a few sentences, if you've ever reread these from a textbook, I'm willing to bet you didn't grasp all of it on the first pass. It's pretty dense material, and usually the best algorithms somehow cheat or have some type of trick that gets around some of the computational expense of the problem that you need to get your brain in sync with again.

Now you may be thinking "what's the point of all of this? I almost never have to be hands on with this in my day to day job, that's why it's not at the ready." This is probably the case, but it very well might not be if you're in an interview. High caliber companies will probably ask you about this kind of stuff, and may very well ask you indirectly as a word problem instead of saying "show me how a trie works."

why you should take the time to write some of this code yourself

Ultimately, we all learn a little bit differently, and we speak about problems in ways that only we understand. How much easier do you find your own code to read and comprehend versus someone else's?

For that reason, I'd recommend writing up some solutions yourself, with your own comments and flair, so that if you ever need to refer to it, you don't end up cracking open a big textbook and try to relearn from someone else's words. I had to do this in recent history for a task, we'll call it "homework", where having my own implementation helped me pull the solution out of the archived storage in my head and back into my brain's cache. Everything was familiar to me once I read my own code and my own comments; I even called out the things that tripped me up the first time I revisited the subject matter that tripped me up again when I looked at my textbook.

Big players will ask you this kind of stuff; Google is notorious for this and will typically suggest study guides to candidates, and there's also a famous blog post about how to prepare. I've known people who studied for months for Google interviews, and I'm willing to bet that they wrote out these algorithms themselves to prepare. So, given that, why not be ahead of the game?

stuff worth focusing on

Perhaps I've sold you on this idea. If I have, here's what I'd suggest for resources and problems to focus on to help you.

First, you should buy Algorithms by Robert Sedgewick and Kevin Wayne. It's one of the better algorithm books I've read (still dense, but reasonable), and has a lot of code samples and illustrations to help you understand.

Second, I'd recommend dabbling in the following as focus points:

  • Time-order complexity and Big-O notation (document the code you write with this, as fine grained as you can)
  • Sorting
  • Hashing and hash based data structures
  • Trees (binary/red-black)
  • Graphs and graph traversal
  • String sorting and searching
  • Data structures with interesting properties like min heaps and tries
  • Concurrency, for example map/reduce or a blocking work queue with a poison pill termination condition

All of these are pretty broad subjects, but they yield a lot in terms of the number of computer science problems related to them. Again, further emphasizing the point, writing code for the subjects above on your own gives you a persistent way to communicate to yourself in a very high fidelity capacity, and provides you some nice examples if you're tasked with having this knowledge at the ready.

I mean what I said about that textbook too; it's a very well written book that covers a wide variety of classic problems and algoritms, and you owe it to yourself to add it to your collection. :)

No matter what position you're in, I hope this advice helps you in your career in the long haul. If you have questions, or if you've done this and had it help you, please feel free to comment below!