All BLacks vs. Wales

I don't normally write about sport, but I thought I'd mention that I'm about to start the long drive to Wales to see the match this weekend. My money's on the All Blacks to take the match, but many people have suggested that it may be a close call.

Will post an update upon my return!

EDIT: Well, I'm back. The Rugby was awesome, possibly the best game I've ever seen. I wanted to post a link to a highlights video on YouTube, but it seems no one has posted one yet. Instead, I'll post a video of the initial stand-off between the teams after the Haka. The atmosphere was incredible - a full house (74 thousand strong) all screaming at the top of their lungs. Good fun. Anyway, you can see it here:

The data: URL scheme

Here's something you may not already know: You can include data directly in an (x)html page, instead of referencing it externally.

For example, most of the time when you want to display an image you would write code like this:

<img src="" alt="some random image" />

Web browsers downloading your HTML source will download the text first, then download any external references, including "some_image.png" (assuming the user has not turned off image downloading).However, there are a few cases where you want to distribute an HTML file with images, but don't want to distribute multiple files. In those cases, the 'data:' URL scheme is what you need.

The scheme is documented in the (very readable) RFC2339. Essentially, you can include the binary data straight into your HTML code. The example they give in the RFC looks like this:

<img src="%20%20%20AAAC8IyPqcvt3wCcDkiLc7C0qwyGHhSWpjQu5yqmCYsapyuvUUlvONmOZtfzgFz%20%20%20ByTB10QgxOR0TqBQejhRNzOfkVJ+5YiUqrXF5Y5lKh/DeuNcP5yLWGsEbtLiOSp%20%20%20a/TPg7JpJHxyendzWTBfX0cxOnKPjgBzi4diinWGdkF8kjdfnycQZXZeYGejmJl%20%20%20ZeGl9i2icVqaNVailT6F5iJ90m6mvuTS4OK05M0vDk0Q4XUtwvKOzrcd3iq9uis%20%20%20F81M1OIcR7lEewwcLp7tuNNkM3uNna3F2JQFo97Vriy/Xl4/f1cf5VWzXyym7PH%20%20%20hhx4dbgYKAAA7" alt="Larry" />

Which equates to this image:


There are many reasons why you wouldn't want to do this - it increases the size of your HTML file, forcing users to download more before they can see whether your content is what they want (especially if the embedded data is near the beginning of the file). There are also some limitations on the size of data and those limitations vary depending on where this technique is used. Still, it's a useful technique that can be used when you need to embed small amounts of binary data within an HTML file and you don't want to distribute multiple files.

Ten Things to Teach Programming Students

While talking to a friend recently, we began discussing the role of graduates in the industry. My belief is that employers employ graduates and expect them to have the same skill level as their existing, trained employees (I have certainly seen this first-hand). Having been on the "other side" of the problem I appreciate that graduates are rarely fit for the tasks set for them without further training.

This got me thinking: If there were 10 things graduates should know before graduating, what should they be? What short list of skills can graduates teach themselves to become better than their competition (and getting that first job is just that: a competition). That train of thought spawned the following list:

Ten things programming students should know before graduating:
  1. Inheritance & Composition. In the land of OO, you must know what inheritance does for you. In C++, this means that you must know what public, protected and (rarely used) private inheritance means. If class A is publically inherited form class B, that does that tell you about the relationship between A and B? What about if the inheritance was protected, rather than public? In a similar vein, what does virtual inheritnace do, and when would you want to use it? Sooner or later a graduate programmer will discover a complex case of multiple inheritance, and they need to be able to cope with it in a logical fashion. Knowing the answers to the above questions will help.
    Unfortunately, a lot of the time inheritance is over-used. Just because we have access to inheritance, doesn't mean we should use it all the time! Composition can be a useful tool to provide clean code where inheritance would muddy the waters. Composition is such a basic tool that many graduates don't even think of it as a tool. Experience will teach when to use composition and when to use inheritance. Graduates have to know that both can be solutions to the same problem.

  2. Memory Allocation. So many graduates do not understand the importance of cleaning up after yourself. Some do not fully appreciate the difference between creating objects on the stack and on the heap. Some know that but fail to understand how memory can be leaked (exceptions are a frequent cause of memory leaks in novice programmers). Every programmer should know the basic usage of new, new[], delete and delete[], and should know when and how to use them.

  3. Exceptions. Most programmers share a love / hate relationship with exceptions; You gotta know how to catch them, but at the same time you tend to avoid using them yourself. Why? Because exceptions should be .... exceptional! There's a reasonably large amount of overhead associated with throwing and catching exceptions. Using exception as return values or flow-control constructs are two examples of exception mis-use. Exceptions should be thrown only when the user (or programmer) does something so bad that there's no way to easily fix or recover from it. Running out of resources (whether it be memory, disk space, resource Ids or whatever) is a common cause for exceptions to be thrown.

  4. Const correctness. Const correctness is so simple, yet so many programmers just don't bother with it. The big advantage of const-correctness is that it allows the compiler to check your code for you. By designating some methods or objects const you're telling the compiler "I don't want to change this object here". If you do accidentally change the object the compiler will warn you.

  5. Threading. Threading is hard. There's no simple way around this fact. Unfortunately, the future of PC hardware seems to be CPUs with many cores. Programs that do not make use of multiple threads have no way to make use of future hardware improvements. Even though using libraries like Qt that make it ridiculously easy to create threads and pass data between threads, you still need to understand what a thread is, and what you can and cannot do. A very common thing I see in new programmers is a tendency to use inadequate synchronization objects in threads. Repeat after me: "A volatile bool is not a synchronization object!".

  6. Source control. Every programmer on the planet should know how to use at least one version control system. I don't care if it's distributed or not, whether it uses exclusive locks or not, or even if it makes your tea for you. The concepts are the same. Very few professional programmers work alone. Graduates must be able to work in a team - that includes managing their code in a sensible fashion.

  7. Compiler vs Linker. Programmers need to understand that compiling an application is a two step process. Compilation and Linking are two, discreet, and very different steps. Compiler errors and Linker errors mean very different things, and are resolved in very different ways. Programmers must know what each tool does for them, and how to resolve the most common errors.

  8. Know how to debug. When something goes wrong, you need to know how to fix it. Usually, finding the problem is 90% of the work, fixing it is 5% of the work, and testing it afterwards is another 10%. No, that's not a typo - it does add up to more than 100%, which is why there's a lot of untested code out there! Of course, if you were really good you wouldn't write any bugs in the first place!

  9. Binary Compatibility. This one is for all those programmers that write library code, or code that gets partially patched over time. As you probably already know, shared libraries contain a table of exported symbols. If you change that table so a symbol is no longer available (or it's signature changes), code that uses that symbol will no longer work. There's a list of things you can and cannot do while maintaining binary compatability, and it's very hard not to break those rules, even if you know what you're doing. I've blogged about this before, and linked to the KDE binary compatibility page on techbase - worth a read!
    The main method of maintaining binary compatibility is to program to an interface, rather than to an implementation. Once you start paying attention to binary compatibility, you'll quickly realise that it's a very bad idea to export your implementation from a shared library, for one simple reason: If you want to change your implementation you're stuck with the restrictions placed upon you by the need to maintain binary compatibility. If all you export is a pure interface and a means to create it (possibly via a factory method) then you can change the implementation to your heart's content without having to resort to pimpl pointers.

  10. Read the right books. There are a few movers and shakers in the programming industry that it pays to keep an eye on. There are many books worth reading, but I'm going to recommend just two. The first is "Design Patterns: Elements of Reusable Object-Oriented Software", and the second is the "Effective C++" series. Neither are considered to be great bedtime reading, both are considered to be packed from cover to cover with things that will help you out in every-day situations. Any programmer worth his or her salt will own a copy of at least one of these books, if not both. Of course, there are books on UI design and usability, threading, text searching, SQL and database maintenance, networking, hardware IO, optimisation, debugging... the list goes on.

  11. Networking. What's this? An 11th item? That's right: it's in here because it cannot be ignored in most programming tasks. It's getting harder and harder to avoid networking. Most graduates will have to write code that sends data over a network sooner or later, so they'll need to know the difference between UDP and TCP and IP, as well as what the basic network stack looks like (think "Please Do Not Touch Steve's Pet Alligator"), and what each layer does. Being familiar with tools like wireshark helps here.

What's not in the list:

You may notice that I haven't included any specific technologies in this list. That's because I firmly believe that it really doesn't matter. Sure, there are some libraries that are better than others (I'd bet my life on a small set of libraries), but the programmer next to me has a different set. I care not one grote whether a graduate knows how to program in .NET, Qt, wxWidgets or anything else - as long as they're willing to learn something new (whatever I'm using on my project).

Which brings me nicely to the conclusion: The single quality I see in all the programmers I admire is a sense of curiosity; a restlessness and a sense of adventure. Our industry is constantly shifting. The best programmers are able to ride the changes and come out better for it.

Is this post horribly self-indulgent and boring? Probably, but it had to be done. Have I forgotten anything? Things you feel should be on the list that are missing? Remember that the point of the exercise is to keep a small list - I could list every programming skill and technology required under the sun, but that would not be very useful would it?