
Why your python editor sucks

I'm doing a reasonable amount of python-coding work these days. It would help me to have an editor that doesn't suck. My requirements are:

  1. Small & Fast. I'm not after a massive clunky IDE, just an editor with enough smarts to make editing multiple python files easier.
  2. Sensible syntax highlighting.
  3. Understands python indentation, PEP8 style. Specifically, indents with 4 spaces, backspace key can be used to unindent.
  4. Can be integrated with one or more lint checkers. Right now I use a wonderful combination of pep8, pyflakes and pylint. I want the output of these to be integrated with the editor so I can jump to the file & line where the problem exists.
That's it. I don't think I'm asking too much. Here are the editors I've tried, and why they suck:

  1. KATE. I love kate, it's my default text editor for almost everything. However, there is no way to integrate lint checkers. I could write a plugin, but that's yet another distraction from actually doing my work.
  2. Vim. I'm already reasonably skilled with vim, and Alain Lafon's blog post contains some great tips to make vim even better. My problem with vim is simply that it's too cryptic. Sure, I could spend a few years polishing my vim skills, but I want it to just work. Vim goes in the "kind of cool, but too cryptic" basket.
  3. Eric. When you launch eric for the first time it opens the configuration dialog box. It looks like this:
    How many options do I really need for an editor? Over-stuffed options dialogs is the first sign of trouble. It gets worse however, once you dismiss the settings window, the editor looks like this:

    Need I say more?
  4. Geany. Looks promising, but no integration into lint checkers.
  5. Pida. Integrates with vim or emacs for the editor component. Looks promising, although the user interface is slightly clunky in places. Unfortunately Pida inherits exactly the same problems as vim, but I may end up using it.
There are a few options I have not tried, and probably won't:
  1. Eclipse & pydev. Eclipse is a huge, hulking beast. I want a small, fast, lean editor, not an IDE.
  2. Emacs. Can't be bothered learning another editor. Doesn't look that much different to vim, so what's the point in learning both?
  3. KDevelop. Same reason as Eclipse, above.

I suspect there's a market for a simple python editor that just works. Please! Someone build it!

Design and Implementation

One of the key tenets in good software design is to separate the design of your product from its implementation.


In some industries, this is much harder to do. When designing a physical product, the structural strength & capabilities of the material being used must be taken into account. There's a reason most bridges have large columns of concrete and steel going down into the water below. From a design perspective, it'd be much better to not have these pillars, thereby disturbing the natural environment less and allowing shipping to pass more easily.

Photo by NJScott. An example of design being (partially) dictated by implementation.

Once you start looking for places where the implementation has "bubbled up" to the design, you start seeing them all over the place. For example, my analogue wristwatch has a date ticker. Most date tickers have 31 days, which means manual adjustment is required after a month with fewer than 31 days. I'm prepared to live with this. However, the date ticker on my watch is made up of two independent wheels - and it climbs to 39 before rolling over, which means manual intervention is required every month! What comes after day 39? day 00 of course!




It's easy to understand why this would be the case - it's much simpler to create a simple counting mechanism that uses two rollers and wraps around at 39 than it is to create one that wraps at the appropriate dates. I have yet to see an analogue wristwatch that accounts for leap-years.

Software engineers have a much easier time; our materials are virtual - ideas, concepts and pixels are much easier to manipulate than concrete and steel. However, there are still limitations imposed on us - for example data can only be retrieved at a certain speed. Hardware often limits the possibilities open to us as programmers. However, these limitations can often be avoided or disguised. Naive implementations often lead to poor performance. A classic example of this is Microsoft's Notepad application. Notepad will load the entire contents of the file into memory at once, which can take a very long time if the file you are opening is large. What's worse is that it will prevent the user from using the application (notepad hangs, rendering it unusable) while this loading is happening. For example, opening a 30MB text file takes roughly 10 seconds on this machine. This seems particularly silly when you consider that you can only ever see a single page of the data at a time - why load the whole file when such a small percentage of it is required at any one time? I guess the programmers who wrote notepad did not intend for this use case, but the point remains valid: an overly-simple implementation led to poor performance.
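To make the point concrete, here's a minimal sketch (function and file names are made up for illustration) of reading only a window of a file on demand, the way a more careful editor might, instead of slurping the whole thing into memory like Notepad does:

```cpp
#include <fstream>
#include <string>

// Read only `length` bytes starting at `offset`, instead of loading the
// entire file. A real editor would page windows in and out as the user
// scrolls; this sketch keeps error handling minimal.
std::string read_window(const std::string &path,
                        std::streamoff offset, std::size_t length)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return std::string();
    in.seekg(offset);
    std::string buffer(length, '\0');
    in.read(&buffer[0], static_cast<std::streamsize>(length));
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

With this approach, opening a 30MB file costs no more than opening a 3KB one: only the visible page is ever read.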

The unfortunate state of affairs is that the general population have been conditioned to accept bad software as the norm. There really is no excuse for software that is slow, crashes, or is unnecessarily hard to use. It's not until you use a truly incredible piece of software that you realise what can be achieved. So what needs to change? Two things:
  1. Developers need to be given the tools we need to make incredible software. These tools are getting better all the time. My personal preference for the Qt frameworks just paid off with the beta release of Qt 4.7 and Qt Creator 2.0. I plan on writing about the new "Quick" framework in the future: I anticipate it making a substantial difference to the way UI designers and developers collaborate on UI design and construction.

  2. Users need to be more discerning and vocal. As an application developer it can be very hard to know what your users think. If you don't get any feedback, are your users happy, or just silent? We need a better way for users to send feedback to developers; it needs to be low-effort, fast and efficient.

Product Branding Critical to User Expectations

A recent post on kdevelopers.org caught my attention. In a post tagged "rant" (I love rants), "tstaerk" outlines his situation:

I am in a small team where we provide a Linux Terminal Server (LTS) for a company. It is based on NX. Every employee in this company can use the service, however, we provide it free of charge and out of enthusiasm. That means, we are not paid for setting it up nor for giving phone-support. We sometimes have 70 concurrent users on the server, that may mean we reach 500 users on the whole. The server is running KDE 3.5 as desktop environment. Recently, we evaluated - no, let me keep this understandable - we sat together and discussed the possibility of upgrading to KDE 4.

Everyone including me was against the upgrade. This is especially ashaming for me as I am spending every weekend to develop KDE. So what were the reasons?

KDE4 seems to have suffered a lot from people complaining that it's not an easy upgrade from KDE3, and to a certain extent the complaint is justified. When KDE4 first arrived on the scene, it was really little more than a tech demo, and certainly not usable by normal users (I use the term "normal" users with all due respect - "normal" in this context means users not in the KDE development scene, and perhaps not as technically literate as the developers). However, the hype surrounding its release meant that lots of normal users upgraded and were subsequently disappointed. With KDE 4.2, we're finally getting to a stage where the KDE 4 series is actually usable as a desktop environment.

However, that's still not answering the original concern: KDE4 is still not a replacement for KDE3. Tstaerk goes on to list some of the shortcomings he sees in KDE4:

If you install a KDE 4 desktop by default, you do not have the possibility to add icons to your desktop by right-clicking onto the desktop. That would mean to us: Take 500 phone calls, explain users why it is no longer possible, explain 500 times why we do a change if it is a change to the worse... You got it, 500 times an ENOTAMUSED.

If you install a KDE 4 desktop by default, you do not have the possibility to move the clock in the panel. For me, the clock is ticking constantly on the left where I do not want it. Our users will be upset seeing another change to the worse. Yes, there is a work-around but it is so complicated that I do not want to tell it 500 times on the phone Sad

If you install a KDE 4 desktop by default, you get a strange icon in the upper right corner. No one could explain to me what it is called, but everybody said it was something about Plasma. Users will click on it and eventually hit "Zoom out". Then, their screen is filled with strange gray squares. Just imagine you have to sit on a phone and answer 500 phone calls (for no money) from users who all tell you something about "squares" not knowing they should call it "activities".

Does he have a point? Perhaps.

In some ways, calling the new product KDE4 implies an easy upgrade path from KDE3, which is misleading, since many aspects of the product have been written from scratch and behave in a totally different manner. It would have been a better decision, I think, to brand KDE4 in such a way that it was obviously a new product that would not work in the same way. This in turn might have saved a considerable amount of grief when developers found that their snazzy new technologies were being ignored, since users could not use the product the way they were used to.

So, the moral of the story?

Be careful how you brand your product, especially when a newer version breaks compatibility with an older version. Is it an upgrade, or a new entity in its own right?

Ten Things to Teach Programming Students

While talking to a friend recently, we began discussing the role of graduates in the industry. My belief is that employers employ graduates and expect them to have the same skill level as their existing, trained employees (I have certainly seen this first-hand). Having been on the "other side" of the problem I appreciate that graduates are rarely fit for the tasks set for them without further training.

This got me thinking: if there were 10 things graduates should know before graduating, what should they be? What short list of skills can graduates teach themselves to become better than their competition (and getting that first job is just that: a competition)? That train of thought spawned the following list:

Ten things programming students should know before graduating:
  1. Inheritance & Composition. In the land of OO, you must know what inheritance does for you. In C++, this means that you must know what public, protected and (rarely used) private inheritance mean. If class A is publicly inherited from class B, what does that tell you about the relationship between A and B? What if the inheritance were protected, rather than public? In a similar vein, what does virtual inheritance do, and when would you want to use it? Sooner or later a graduate programmer will discover a complex case of multiple inheritance, and they need to be able to cope with it in a logical fashion. Knowing the answers to the above questions will help.
    Unfortunately, a lot of the time inheritance is over-used. Just because we have access to inheritance, doesn't mean we should use it all the time! Composition can be a useful tool to provide clean code where inheritance would muddy the waters. Composition is such a basic tool that many graduates don't even think of it as a tool. Experience will teach when to use composition and when to use inheritance. Graduates have to know that both can be solutions to the same problem.
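A tiny sketch of the two relationships, using made-up Widget/Button/Toolbar classes: public inheritance says "is-a" (a Button can stand in for a Widget), while composition says "has-a" (a Toolbar contains a Button but is not one):

```cpp
#include <string>

// Public inheritance models "is-a": every Button is-a Widget,
// so a Button can be used anywhere a Widget is expected.
class Widget {
public:
    virtual ~Widget() {}
    virtual std::string describe() const { return "widget"; }
};

class Button : public Widget {
public:
    std::string describe() const { return "button"; }
};

// Composition models "has-a": a Toolbar has-a Button. Deriving
// Toolbar from Button would muddy the waters, because a Toolbar
// cannot sensibly stand in for a Button.
class Toolbar {
public:
    std::string describe() const { return "toolbar holding a " + child.describe(); }
private:
    Button child;   // composition: a member, not a base class
};
```

When either would compile, ask which sentence is true: "Toolbar is a Button" or "Toolbar has a Button". The true sentence picks the tool.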

  2. Memory Allocation. So many graduates do not understand the importance of cleaning up after themselves. Some do not fully appreciate the difference between creating objects on the stack and on the heap. Some know that, but fail to understand how memory can be leaked (exceptions are a frequent cause of memory leaks in novice programmers' code). Every programmer should know the basic usage of new, new[], delete and delete[], and should know when and how to use them.
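Here's a sketch of the classic exception-shaped leak and one fix using a smart pointer (std::unique_ptr, C++11; the names are illustrative):

```cpp
#include <memory>

struct Resource { int value; Resource() : value(42) {} };

// Naive version (shown as a comment): if something throws between the
// new and the delete, the delete is never reached and Resource leaks.
//
//   Resource *r = new Resource;
//   mightThrow();
//   delete r;          // skipped during stack unwinding
//
// RAII version: the smart pointer's destructor runs during stack
// unwinding, so the Resource is freed even when an exception escapes.
int safe(bool shouldThrow)
{
    std::unique_ptr<Resource> r(new Resource);
    if (shouldThrow)
        throw 1;        // r is still cleaned up here
    return r->value;
}
```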

  3. Exceptions. Most programmers share a love / hate relationship with exceptions; you've got to know how to catch them, but at the same time you tend to avoid using them yourself. Why? Because exceptions should be... exceptional! There's a reasonably large amount of overhead associated with throwing and catching exceptions. Using exceptions as return values or as flow-control constructs are two examples of exception mis-use. Exceptions should be thrown only when the user (or programmer) does something so bad that there's no way to easily fix or recover from it. Running out of resources (whether it be memory, disk space, resource IDs or whatever) is a common cause for exceptions to be thrown.
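One way to draw the line, sketched with a hypothetical account lookup: "not found" is an everyday outcome and gets a return value, while a broken precondition is genuinely exceptional and gets a throw:

```cpp
#include <map>
#include <stdexcept>
#include <string>

std::map<std::string, int> accounts;

// Expected case: a missing name is ordinary flow control,
// reported via the return value, not an exception.
bool lookup(const std::string &name, int &idOut)
{
    std::map<std::string, int>::const_iterator it = accounts.find(name);
    if (it == accounts.end())
        return false;
    idOut = it->second;
    return true;
}

// Exceptional case: a negative id indicates a programming error
// the caller cannot sensibly continue from, so we throw.
void addAccount(const std::string &name, int id)
{
    if (id < 0)
        throw std::invalid_argument("account id must be non-negative");
    accounts[name] = id;
}
```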

  4. Const correctness. Const correctness is so simple, yet so many programmers just don't bother with it. The big advantage of const-correctness is that it allows the compiler to check your code for you. By designating some methods or objects const you're telling the compiler "I don't want to change this object here". If you do accidentally change the object the compiler will warn you.
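A minimal sketch of the promise const makes, using a made-up Counter class:

```cpp
// A const member function promises not to modify the object, and the
// compiler enforces the promise: through a const reference, only the
// const methods are callable.
class Counter {
public:
    int get() const { return count; }   // read-only: marked const
    void increment() { ++count; }       // mutates: deliberately not const
    // int broken() const { return ++count; }  // would not compile
private:
    int count = 0;
};
```

Uncommenting `broken()` turns an accidental mutation into a compile error, which is exactly the free checking the paragraph above describes.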

  5. Threading. Threading is hard. There's no simple way around this fact. Unfortunately, the future of PC hardware seems to be CPUs with many cores. Programs that do not make use of multiple threads have no way to make use of future hardware improvements. Even when using libraries like Qt that make it ridiculously easy to create threads and pass data between them, you still need to understand what a thread is, and what you can and cannot do. A very common thing I see in new programmers is a tendency to use inadequate synchronization objects in threads. Repeat after me: "A volatile bool is not a synchronization object!".
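To illustrate with a proper synchronization object, here's a sketch using C++11's std::atomic and std::thread (assumed available; a volatile int in its place would give neither atomicity nor memory ordering, and the final count could come out wrong):

```cpp
#include <atomic>
#include <thread>

// std::atomic gives an atomic read-modify-write; two threads can
// increment concurrently without losing updates. A volatile bool/int
// provides no such guarantee.
std::atomic<int> counter(0);

void work(int iterations)
{
    for (int i = 0; i < iterations; ++i)
        counter.fetch_add(1);   // atomic increment
}

int runThreads()
{
    std::thread a(work, 100000);
    std::thread b(work, 100000);
    a.join();
    b.join();
    return counter.load();
}
```

With a plain `int counter` the two threads race and updates are lost; with std::atomic the result is always exactly 200000.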

  6. Source control. Every programmer on the planet should know how to use at least one version control system. I don't care if it's distributed or not, whether it uses exclusive locks or not, or even if it makes your tea for you. The concepts are the same. Very few professional programmers work alone. Graduates must be able to work in a team - that includes managing their code in a sensible fashion.

  7. Compiler vs Linker. Programmers need to understand that building an application is a two-step process. Compilation and linking are two discrete, and very different, steps. Compiler errors and linker errors mean very different things, and are resolved in very different ways. Programmers must know what each tool does for them, and how to resolve the most common errors.

  8. Know how to debug. When something goes wrong, you need to know how to fix it. Usually, finding the problem is 90% of the work, fixing it is 5% of the work, and testing it afterwards is another 10%. No, that's not a typo - it does add up to more than 100%, which is why there's a lot of untested code out there! Of course, if you were really good you wouldn't write any bugs in the first place!

  9. Binary Compatibility. This one is for all those programmers that write library code, or code that gets partially patched over time. As you probably already know, shared libraries contain a table of exported symbols. If you change that table so a symbol is no longer available (or its signature changes), code that uses that symbol will no longer work. There's a list of things you can and cannot do while maintaining binary compatibility, and it's very hard not to break those rules, even if you know what you're doing. I've blogged about this before, and linked to the KDE binary compatibility page on techbase - worth a read!
    The main method of maintaining binary compatibility is to program to an interface, rather than to an implementation. Once you start paying attention to binary compatibility, you'll quickly realise that it's a very bad idea to export your implementation from a shared library, for one simple reason: If you want to change your implementation you're stuck with the restrictions placed upon you by the need to maintain binary compatibility. If all you export is a pure interface and a means to create it (possibly via a factory method) then you can change the implementation to your heart's content without having to resort to pimpl pointers.
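A sketch of the interface-plus-factory pattern described above (all names are made up; a real library would export only the interface and the factory function):

```cpp
#include <string>

// Exported from the library: a pure interface and a factory.
// Clients compile against this and nothing else.
class IUserAccount {
public:
    virtual ~IUserAccount() {}
    virtual std::string username() const = 0;
};

IUserAccount *createUserAccount(const std::string &name);

// Hidden in the library's .cpp file: free to grow or rearrange data
// members without breaking binary compatibility, because its layout
// is never visible to client code.
class UserAccountImpl : public IUserAccount {
public:
    explicit UserAccountImpl(const std::string &name) : name_(name) {}
    std::string username() const { return name_; }
private:
    std::string name_;   // add fields here; clients never notice
};

IUserAccount *createUserAccount(const std::string &name)
{
    return new UserAccountImpl(name);
}
```

Everything is in one file here only so the sketch is self-contained; the point is that clients hold an IUserAccount* and never see UserAccountImpl at all.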

  10. Read the right books. There are a few movers and shakers in the programming industry that it pays to keep an eye on. There are many books worth reading, but I'm going to recommend just two. The first is "Design Patterns: Elements of Reusable Object-Oriented Software", and the second is the "Effective C++" series. Neither is considered great bedtime reading, but both are packed from cover to cover with things that will help you out in every-day situations. Any programmer worth his or her salt will own a copy of at least one of these books, if not both. Of course, there are books on UI design and usability, threading, text searching, SQL and database maintenance, networking, hardware IO, optimisation, debugging... the list goes on.

  11. Networking. What's this? An 11th item? That's right: it's in here because it cannot be ignored in most programming tasks. It's getting harder and harder to avoid networking. Most graduates will have to write code that sends data over a network sooner or later, so they'll need to know the differences between UDP, TCP and IP, as well as what the basic network stack looks like (think "Please Do Not Touch Steve's Pet Alligator"), and what each layer does. Being familiar with tools like Wireshark helps here.

What's not in the list:

You may notice that I haven't included any specific technologies in this list. That's because I firmly believe that it really doesn't matter. Sure, there are some libraries that are better than others (I'd bet my life on a small set of libraries), but the programmer next to me has a different set. I care not one groat whether a graduate knows how to program in .NET, Qt, wxWidgets or anything else - as long as they're willing to learn something new (whatever I'm using on my project).

Which brings me nicely to the conclusion: The single quality I see in all the programmers I admire is a sense of curiosity; a restlessness and a sense of adventure. Our industry is constantly shifting. The best programmers are able to ride the changes and come out better for it.

Is this post horribly self-indulgent and boring? Probably, but it had to be done. Have I forgotten anything? Things you feel should be on the list that are missing? Remember that the point of the exercise is to keep a small list - I could list every programming skill and technology required under the sun, but that would not be very useful would it?


threads vs. processes

It's been a few weeks since my last post. My excuse is that I've been busy - my job is always busy around this time of year due to the IBC trade show. Thankfully most of the work is now done. In my own time I've been working on a number of projects (to be unveiled shortly, once they're usable).

In the last few weeks we've seen the launch of the Google Chrome browser. I won't discuss it here directly (plenty of other people have reviewed it separately). I will, however, mention the one feature that has converted me away from Firefox and towards Chrome:

Google Chrome has a multi-process architecture, meaning tabs can run in separate processes from each other, and from the main browser process.


What this means in theory is that a misbehaving website cannot bring down the entire browser session - you can just close / kill the offending tab / window.

This is pretty cool, but how does it relate to my own work?

I'm writing an application that makes extensive use of python scripts to customize the input and output of the application. I could have spent a while embedding the python scripts using the Python API, and then running those scripts in a thread, but instead I chose to run them as separate processes spawned by a central application. The way I see it, this approach has several advantages:
  • Crashing child processes (in this case python scripts) are unlikely to bring down the entire application - you still need to be careful, since there's likely to be communication between child and parent process, and the parent needs to be able to handle corrupted data in that communication. Other than that, if one of my scripts breaks, the parent can easily carry on; no error handling or cleanup needs to be done.
  • In my application, I need to repeatedly call the script files - probably thousands of times for a single application run. In a traditional threaded environment the possibility for memory leaks is huge. This way, since each sub-process is short lived, the operating system takes charge of cleaning up any memory not deallocated by the child process.
  • It's a damn sight easier to program, too. No thread synchronization required!
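A minimal sketch of the delegate-to-a-process approach using POSIX popen (illustrative only; in a Qt application QProcess would be the natural choice):

```cpp
#include <cstdio>
#include <string>

// Run a command as a child process and capture its stdout via a pipe.
// This mirrors the approach above: delegate work to a short-lived
// process; when it exits, the OS reclaims every byte it allocated,
// so leaks in the child cannot accumulate in the parent.
std::string runChild(const std::string &command)
{
    std::string output;
    FILE *pipe = popen(command.c_str(), "r");
    if (!pipe)
        return output;
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), pipe) != NULL)
        output += buffer;
    pclose(pipe);   // the exit status also tells us if the child crashed
    return output;
}
```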

It's not all roses however.


In my experience the main drawback is that it's much harder to pass data between parent and child processes. For simple communication it may be enough to use stdin and stdout, but for anything more complicated you'll want to use some form of proper IPC (sockets, named pipes, DBus etc). Even with proper IPC it's still harder to pass custom data structures between processes, since you'll need some sort of data serialization.
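A toy example of what that serialization step looks like, assuming a single-token payload (real code would reach for JSON, protobuf or QDataStream, which also cope with spaces, escaping and versioning):

```cpp
#include <sstream>
#include <string>

// A custom structure that cannot cross a process boundary as-is:
// it has to be flattened to bytes and rebuilt on the other side.
struct Message {
    int id;
    std::string payload;   // assumed to contain no whitespace here
};

std::string serialize(const Message &m)
{
    std::ostringstream out;
    out << m.id << ' ' << m.payload;
    return out.str();
}

Message deserialize(const std::string &text)
{
    std::istringstream in(text);
    Message m;
    in >> m.id >> m.payload;
    return m;
}
```

The serialized string is what actually travels over the pipe or socket; both sides must agree on the format, which is exactly the extra work the thread-based approach avoids.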

Think this all sounds obvious?

That's because it really is! I can't believe how much simpler this approach is, especially if your requirements are for simple delegation to scripts or sub-processes. If you require more complex interaction you might find this approach more trouble than it's worth.

Problem with pimpls

That's "pimpl", not "pimple" - I'm talking about opaque pointers; these beauties help protect your public interfaces from a changing implementation. This useful technique has a few drawbacks that aren't so well publicised. In order to prevent others making the same mistakes I have, I thought I'd outline the general use of the pimpl pointer, and some of its drawbacks:

Here's an example of a first-pass class to encapsulate a user account (I'm making this up on the fly, so bear with me):


class userAccount
{
public:
    // public methods go here
private:
    // private data members:
    unsigned int accountId;
    std::string username;
    std::string realName;
};

This will work just fine, but there's a problem. If you need to change the implementation (say you want to store the user's real name in two fields instead of one), unless you're very careful you will end up changing the size and / or the declaration of the class.

Changing the size of the class is a big problem if you're trying to maintain binary compatibility. Changing the declaration of the class is a problem because (some) compilers will now recompile every file that includes your changed header file, even if the changes make no difference to the binary output.

The solution comes in the form of an opaque pointer, or "pointer to implementation" (which is where we get the charming "pimpl" name from). The idea is that the implementation details are put in a separate class that is forward declared in the header file, and fully declared in the cpp file. Your external interface now only contains a single pointer - you can change the size of your implementation class to your heart's content, and you will never change the size or declaration of your external interface. The class above refactored to use a pimpl pointer looks like this:


Header File:

// forward declare implementation class:
class userAccountImpl;

class userAccount
{
public:
    userAccount();
    ~userAccount();
    // public methods go here
private:
    // private data members:
    userAccountImpl *pimpl;
};

Source File:

class userAccountImpl
{
public:
    unsigned int accountId;
    std::string username;
    std::string realName;
};

userAccount::userAccount()
    : pimpl(new userAccountImpl)
{
}

userAccount::~userAccount()
{
    delete pimpl;
}


Now any data members can be accessed via the private implementation class. There are several things you can do to extend the example above (using a shared_ptr is a start), but I want to keep things simple for the sake of the example.
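As a taste of that extension, here's the same idea sketched with the pimpl held in a std::unique_ptr (C++11), which makes the hand-written delete unnecessary; everything lives in one file here purely so the example is self-contained, whereas real code would still split the impl into the .cpp:

```cpp
#include <memory>
#include <string>

// Would normally be hidden in the .cpp file.
class accountImpl {
public:
    accountImpl() : accountId(0) {}
    unsigned int accountId;
    std::string username;
};

class account {
public:
    account() : pimpl(new accountImpl) {}
    void setUsername(const std::string &name) { pimpl->username = name; }
    std::string getUsername() const { return pimpl->username; }
private:
    std::unique_ptr<accountImpl> pimpl;   // deleted automatically
};
```

Note that with unique_ptr the class is no longer copyable by default, which is often exactly what you want for a handle class; if copying is needed, you write it explicitly and decide whether it's deep or shared.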

Until recently, I took this method for granted and used it as often as I could. As so often happens when I learn something new I rush to use it in every possible situation, including ones where it doesn't make sense. The pimpl idiom has a few problems associated with it, which I will outline here:

  1. Your object's memory footprint is now split into more than one place in memory. This may not be a huge problem for 90% of classes, but consider a small utility class that contains only standard data types:

    class colour
    {
    public:
        // public methods go here
    private:
        unsigned char red;
        unsigned char green;
        unsigned char blue;
    };

    In order to serialise this into a buffer (like a file), you can get away with using a memcpy or similar technique. Since your object's memory footprint is contiguous, copying the entire object into a file is simple (yes, yes, I know: there are many reasons why this isn't a good idea, but let's face it - this happens all the time). Once you start using pimpl pointers it gets a bit more difficult. Since your implementation class is private to the .cpp file, the code to do the copying needs to be in the same cpp file (otherwise it doesn't have access to the definition of the implementation class). This is relatively easy to work around, but the trouble doesn't stop there - consider what you need to do to un-serialise an object from a buffer. You can no longer be cheeky and use a reinterpret_cast like so:
    colour *pCol = (colour*) pBuffer;
    Again - I realise that this isn't the best idea in the world, but when you're programming with constraints sometimes this is the best way to do things.


  2. The default new and delete operators are expensive. I never realized just how expensive they can be. In a recent bout of performance testing on some real-time software I saw the default new operator take 55ms to allocate a block of 4 bytes of memory. That's way too slow for the real-time application I was working on, and may be too slow for other applications as well. What's more, the times get worse the more memory you allocate - so using the pimpl pointer may not be a good idea at all, since it adds the overhead of a new call to each constructor, and a delete to each destructor. Ouch!


If you're looking for more info on the pimpl idiom, Sutter wrote a good article on the pimpl pointer, and a more recent article that talks about some of the performance issues associated with pimpl pointers. This technique is worth using - it can save many attacks of "code cheese" in the future - just be careful where you use it, or you may end up with some nasty performance issues you didn't expect!

Pet Project: game

Yep, I'm writing a small game. I'm usually rather tight-arsed about having a proper code design before writing any code, but I've realized that a lot of the time this stops me writing any code at all.

For this project my general methodology is to write whatever comes to mind, and be prepared to throw away code that I think is too crap to last in the final build.

In fact, my design phase has been so minimalistic I don't even have a name for the game. For now, it's just called "game". It's going to be a land-based top-down 2D shooter, with lots of weapons and cool stuff. Right now it can load a very simple level format from XML; you can control the player using the keyboard "WASD" keys, and fire the player's rifle using the mouse. Bullets have collision detection.

Here's an early video that shows the game sans collision detection:



My next step will be to have predefined objects for a level (right now they're all just boxes). I might start with an immobile gun turret - that should let me get some AI going for the enemies, as well as some health stats for the player and enemies.

Eventually I want to make the game easily extensible using QtScript for enemy AI.

Working with Qt4 has made this project an absolute breeze - there's hardly any code in this project! When it gets a few more features I'll upload the source online.

Cheers!

Why this website sucks: A rant on poor web design

I have just had an epiphany of biblical proportions:

Fixed-width websites suck.

Okay, so it's not a huge revelation, but still, I was quite proud of myself. Why is it that website designers think they know how large I want their website to appear on my screen? I have two 20" monitors, and many websites show content in less than half of my browser window.

Any decent website would have a template that showed content at whatever resolution the viewer wanted.


That's why there'll be some changes around here. I'm going to try to design my own blogger template, or at least rip off someone else's good work and call it my own.

Edit:

Round one of changes has been completed. The new theme is based on the stretch-denim blogspot theme, with a few revisions of my own.