Showing posts with label python. Show all posts
Showing posts with label python. Show all posts

Visualising the Ubuntu Package Repository

Like most geeks, I like data. I also like making pretty pictures. This weekend, I found a way to make pretty pictures from some data I had lying around.

The Data: Ubuntu Packages

Ubuntu is made up of thousands of packages. Each package contains a control file that provides some meta-data about the package. For example, here is the control data for the autopilot tool:

Package: python-autopilot
Priority: optional
Section: universe/python
Installed-Size: 1827
Maintainer: Ubuntu Developers 
Original-Maintainer: Thomi Richards 
Architecture: all
Source: autopilot
Version: 1.2daily12.12.10-0ubuntu1
Depends: python (>= 2.7.1-0ubuntu2), python (<< 2.8), gir1.2-gconf-2.0, gir1.2-glib-2.0, gir1.2-gtk-2.0, gir1.2-ibus-1.0, python-compizconfig, python-dbus, python-junitxml, python-qt4, python-qt4-dbus, python-testscenarios, python-testtools, python-xdg, python-xlib, python-zeitgeist
Filename: pool/universe/a/autopilot/python-autopilot_1.2daily12.12.10-0ubuntu1_all.deb
Size: 578972
MD5sum: c36f6bbab8b5ee10053b63b41ad7189a
SHA1: 749cb0df1c94630f2b3f7a4a1cd50357e0bf0e4d
SHA256: 948eeee40ad025bfb84645f68012e6677bc4447784e4214a5512786aa023467c
Description-en: Utility to write and run integration tests easily
 The autopilot engine enables to ease the writing of python tests
 for your application manipulating your inputs like the mouse and
 keyboard. It also provides a lot of utilities linked to the X server
 and detecting applications.
Homepage: https://launchpad.net/autopilot
Description-md5: 1cea8e2d895c31846b8d3482f96a24d4
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu

As you can see, there's a lot of information here. The bit I'm interested in is the 'Depends' line. This lists all the packages that are required in order for this package to work correctly. When you install autopilot, your package manager will install all it's dependencies, and the dependencies of all those packages etc. This is (in my opinion), the best feature of a modern Linux distribution, compared with Windows.

Packages and their dependant packages form a directed graph. My goal is to make pretty pictures of this graph to see if I can learn anything useful.

Process: The Python

First, I wanted to extract the data from the apt package manager and create a graph data structure I could fiddle with. Using the excellent graph-tool library, I came up with this horrible horrible piece of python code:

#!/usr/bin/env python

from re import match, split
from subprocess import check_output
from debian.deb822 import Deb822
from graph_tool.all import *

graph = Graph()
package_nodes = {}
package_info = {}


def get_package_list():
    """Return a list of packages, both installed and uninstalled."""
    output = check_output(['dpkg','-l','*'])
    packages = []
    for line in output.split('\n'):
        parts = line.split()
        if not parts or not match('[uirph][nicurhWt]', parts[0]):
            continue
        packages.append(parts[1])
    return packages


def get_package_info(pkg_name):
    """Get a dict-like object containing information for a specified package."""
    global package_info
    if pkg_name in package_info:
        return package_info.get(pkg_name)
    else:
        try:
            yaml_stream = check_output(['apt-cache','show',pkg_name])
        except:
            print "Unable to find info for package: '%s'" % pkg_name
            package_info[pkg_name] = {}
            return {}
        d = Deb822(yaml_stream)
        package_info[pkg_name] = d
        return d


def get_graph_node_for_package(pkg_name):
    """Given a package name, return the graph node for that package. If the graph
    node does not exist, it is created, and it's meta-data filled.

    """
    global graph
    global package_nodes

    if pkg_name not in package_nodes:
        n = graph.add_vertex()
        package_nodes[pkg_name] = n
        # add node properties:
        pkg_info = get_package_info(pkg_name)
        graph.vertex_properties["package-name"][n] = pkg_name
        graph.vertex_properties["installed-size"][n] = int(pkg_info.get('Installed-Size', 0))
        return n
    else:
        return package_nodes.get(pkg_name)


def get_sanitised_depends_list(depends_string):
    """Given a Depends string, return a list of package names. Versions are
    stripped, and alternatives are all shown.

    """
    if depends_string == '':
        return []
    parts = split('[,\|]', depends_string)
    return [ match('(\S*)', p.strip()).groups()[0] for p in parts]

if __name__ == '__main__':
    # Create property lists we need:
    graph.vertex_properties["package-name"] = graph.new_vertex_property("string")
    graph.vertex_properties["installed-size"] = graph.new_vertex_property("int")

    # get list of packages:
    packages = get_package_list()

    # graph_nodes = graph.add_vertex(n=len(packages))
    n = 0
    for pkg_name in packages:
        node = get_graph_node_for_package(pkg_name)
        pkg_info = get_package_info(pkg_name)
        # purely virtual packages won't have a package info object:
        if pkg_info:
            depends = pkg_info.get('Depends', '')
            depends = get_sanitised_depends_list(depends)
            for dependancy in depends:
                graph.add_edge(node, get_graph_node_for_package(dependancy))
        n += 1
        if n % 10 == 0:
            print "%d / /%d" % (n, len(packages))

    graph.save('graph.gml')


Yes, I realise this is terrible code. However, I also wrote it in 10 minutes time, and I'm not planning on using it for anything serious - this is an experiment!

Running this script gives me a 2.6MB .gml file (it also takes about half an hour - did I mention that the code is terrible?). I can then import this file into gephi, run a layout algorithm over it for the best part of an hour (during which time my laptop starts sounding a lot like a vacuum cleaner), and start making pretty pictures!

The Pretties:

Without further ado - here's the first rendering. This is the entire graph. The node colouring indicates the node degree (the number of edges connected to the node) - blue is low, red is high. Edges are coloured according to their target node.

These images are all rendered small enough to fit on the web page. Click on them to get the full image.
A few things are fairly interesting about this graph. First, there's a definite central node, surrounded by a community of other packages. This isn't that surprising - most things (everything?) relies on the standard C library eventually.
The graph has several other distinct communities as well. I've produced a number of images below that show the various communities, along with a short comment.

C++

These two large nodes are libgcc1 (top), and libstdc++ (bottom). As we'll see soon, the bottom-right corder of the graph is dominated by C++ projects.

Qt and KDE

This entire island of nodes is made of up the Qt and KDE libraries. The higher nodes are the Qt libraries (QtCore, QtGui, QtXml etc), and the nodes lower down are KDE libraries (kcalcore4, akonadi, kmime4 etc).

Python

 The two large nodes here are 'python' and 'python2.7'. Interestingly, 'python3' is a much smaller community, just above the main python group.

System

Just below the python community there's a large, loosely-connected network of system tools. Notable members of this community include the Linux kernel packages, upstart, netbase, adduser, and many others.

Gnome


This is GNOME. At it's core is 'libglib', and it expands out to libgtk, libgdk-pixbuf (along with many other libraries), and from there to various applications that use these libraries (gnome-settings-daemon for example).

Mono

At the very top of the graph, off on an island by themselves are the mono packages.

Others

The wonderful thing about this graph is that the neighbourhoods are fractal. I've outlined several of the large ones, but looking closer reveals small clusters of related packages. For example: multimedia packages:

This is Just the Beginning...

This is an amazing dataset, and really this is just the beginning. There's a number of things I want to look into, including:
  • Adding 'Recommends' and 'Suggests' links between packages - with a lower edge weight than Depends.
  • Colour coding nodes according to which repository section the package can be found in.
  • Try to categorise libraries vs applications - do applications end up clustered like libraries do?
I'm open to suggestions however - what do you think I should into next?

So you missed PyCon US...

If you're anything like me you've watched another PyCon US come and go. Living in New Zealand makes attending overseas conferences an expensive proposition. So you've missed the conference. You've watched all the talks on pyvideo.org, but it's still not enough. You'd love to attend a PyCon in person, perhaps one in an exotic location (what a great opportunity for a family vacation). Of course, I have a solution: Come to Kiwi PyCon!



In contrast to the US PyCon, Kiwi PyCon is a smaller, more intimate affair, with a few hundred delegates, two streams, and plenty of chances to meet other python hackers from the Australia/New Zealand/Pacific region. Places are limited, and registrations are open, so here's what you need to do to beat the post-PyCon blues:
  1. Go to nz.pycon.org, and register for the conference. While you're there, check out the sponsorship options!
  2. If you're feeling brave, submit a talk proposal!
  3. Book accommodation and flights (we will soon have accommodation options listed on the website).
  4. Count down the days to the conference!
It's that simple. Do it now!

Kiwi PyCon Sponsorship Drive

 Kiwi PyCon is approaching! You probably think that's a good thing, but if you're one of the poor volunteer organisers, that's a scary thought. Why? We have bills to pay, and very little income. That means it's time to shill for some cash! Below is an excerpt from our public sponsorship announcement email. If your company is willing to sponsor a good cause, please get in touch with me.

Kiwi PyCon is organised by the New Zealand Python User group - a not-for-profit organisation. We don’t make any profit from the conference, and the organisers donate their free time to make the event a success. We rely entirely on companies’ sponsorship to pay the bills.

Sponsorship has several advantages for you:

  • It’s an opportunity to get brand exposure in front of the foremost Python experts from New Zealand and around the world.
  • Presents a fantastic networking opportunity if you are looking to employ engineers now, or in the future.
  • Align yourself with market leaders and past sponsors such as Github, Weta Digital, Catalyst IT, Mozilla. Become known as a Python promoter and industry leader. 
  • Gold sponsors receive five complimentary tickets to the conference and their logo on the conference shirt and all print materials.

If you’d like to sponsor the conference, a document describing sponsorship opportunities is available here.

To get in touch, email kiwipycon-sponsorship@nzpug.org.

Python GObject Introspection oddness

I recently ported indicator-jenkins to Gtk3 using the python GObject Introspection Repository (gir) bindings. Ted Gould did most of the work, I just cleaned some bits up and made sure everything worked. One issue that puzzled me for a while is that the GObject library changed the way it's "notify" signal works between GObject 2 and GObject 3. I've not seen any documentation of this change, so I'll describe it here.

For this example, let's make a very simple class that has a single property:

import gobject

class MyClass(gobject.GObject):
    prop = gobject.property(type=int)

...and a very simple callback function that we want to call whenever the value of 'prop' changes:

def cb(sender, prop):
    print "property '%s' changed on %r." % (prop.name, sender)

Finally, with GObject 2 we can create an instance of 'MyClass' and connect to the 'notify' signal like this:

inst = MyClass()
inst.connect("notify", cb)
inst.prop = 42

When we run this simple program we get the following output:
property 'prop' changed on .
... which is what we expected. However, if we port this code to GObject 3, it should look like this:

from gi.repository import GObject

class MyClass(GObject.GObject):
    prop = GObject.property(type=int)


def cb(sender, prop):
    print "property '%s' changed on %r." % (prop.name, sender)


inst = MyClass()
inst.connect("notify", cb)
inst.prop = 42

However, running this gives an error:

/usr/lib/python2.7/dist-packages/gi/_gobject/propertyhelper.py:171: Warning: g_value_get_object: assertion `G_VALUE_HOLDS_OBJECT (value)' failed
  instance.set_property(self.name, value)
Traceback (most recent call last):
  File "gobject3.py", line 8, in cb
    print "property '%s' changed on %r." % (prop.name, sender)
AttributeError: 'NoneType' object has no attribute 'name'

The 'prop' parameter in the callback is set to None.

There is a solution however - connecting the callback to a more specific notification signal works as expected:

from gi.repository import GObject

class MyClass(GObject.GObject):
    prop = GObject.property(type=int)


def cb(sender, prop):
    print "property '%s' changed on %r." % (prop.name, sender)


inst = MyClass()
inst.connect("notify::prop", cb)
inst.prop = 42

 It took me a while to figure this out - hopefully I've saved someone else that work.

KiwiPyCon Sloecode Talk

It's about time I posted my Kiwi PyCon talk from last August! I spent an enjoyable 30 minutes talking about what Sloecode is, how we built it, and the problems encountered along the way. As per my usual approach to public speaking, I tried to keep things light-hearted and entertaining. Anyway, the slides and audio are available here:

KiwiPyCon 2012 will be in Dunedin

It is a great honour and privilege to be involved in bringing the New Zealand Python developers conference (KiwiPyCon) to Dunedin in 2012. I intend to blog regularly with details of next year's conference as they emerge, so watch this space for more information over time!

Finally, a huge "Thank You" to Tim McNamara for organising the best KiwiPyCon to date. I have a huge task ahead of me, and he's raised the bar very high indeed.

Have any good ideas for next year? Please let us know!

Sloecode 1.1 - get your bugreports in now!

I'm at KiwiPyCon in Wellington, New Zealand. It turns out, being surrounded by geeks is great inspiration & motivation. Because of this, I've created a Sloecode 1.1 milestone in launchpad. If you have any bugreports or feature reuests, now's the time to get them in!

...I can't believe I'm asking people for bug-reports. This feels wrong...

Representin' the home-boys!

That's *exactly* the kind of blog title that's liable to get me in trouble down the road. Faux street swagger doesn't age well.

In other news, I'm presenting a talk at this years NZ Python conference KiwiPyCon. Their website does not have the schedule visible yet, but there is a list of talks available.

I'm looking forward to meeting other python developers, and maybe getting some help with Sloecode!

Sloecode: Now with Ubuntu Packages!


We're still working towards a sloecode 1.0 release. We're now dangerously close - we have some UI tweaks to complete, and a few other bits and pieces. However, we now have Ubuntu packages for both the client and server components, ready for you to test!

What is Sloecode?


Sloecode is an open source project, hosted on launchpad.net that aims to provide a comprehensive, installable code forge. A "code forge" is a set of tools that help groups of people write software (think sourceforge, launchpad etc). It typically includes a revision control system, and optionally project wikis, bug tracking, issue tracking, feature planning and any other features people might need. However, it's important to note that we're not trying to replicate Launchpad! Launchpad is a great service, and we don't want to compete with it. Instead, we're providing a tool for people who:

  • Don't want their code to be public - either because it's commercial software or because it's not ready yet. You can use sloecode in your business, college, university without having to release your software to the wider world. You can do this on launchpad, but you must pay for the privilege. The project was borne out of the need to have a code forge for an educational environment: putting students' work online is not an option.

  • Want their code forge system to be hosted locally. For large code bases, the communication times between a local machine and the central launchpad.net servers can become expensive - especially in locations where the Internet connection is poor. Running sloecode is easy (as we'll see soon) and gives you complete control over the system. You're are responsible for server maintenance, backup, and general administration.
Compared to Launchpad, we're aiming for a completely different user base.

How does it Work?
Sloecode is made up of two distinct parts:

The Bazaar/SSh smart server is a python-twisted application that understands the SSH protocol, and runs bzr-serve on demand for users. Every time you interact with the server using a Bazaar client application you're actually talking to the smart server. We need this component for several reasons: first, we don't want users to require a system account on the Linux server (which is the usual way of setting up a Bazaar server), but we do still want to use SSH public/private key authentication. This means the SSH server needs to know about our database and be able to retrieve keys from it on demand.

The Sloecode Web App is the front-end for the whole system. It's written in pylons, and makes use of a whole host of other libraries including jinja2 (page templates), formencode (input validation), YUI3 (javascript components), repoze.what and repoze.who (authorisation and authentication), and a whole lot more. The web-app provides control for managers and administrators to create projects and users and assign users to projects (with different levels of access). Regular users can see basic information about their bazaar branches (both personal repositories and project repositories), and manage their SSH keys.

The plan for the future is to add optional components - the key word being optional. One of our design goals is to make sloecode easy to install; this includes minimising install-time dependancies.

The key thing to understand about Sloecode is that we're not writing the VCS ourselves - we're simply making Bazaar easier to install and manage.

Enough already, give me the server!


Sloecode has not yet been released, but you are welcome to participate in testing. To set up the server, you'll first need the sloecode PPA in your sources.list. To achieve this, run the following command:

sudo add-apt-repository ppa:sloecode

Then update your package lists:

sudo apt-get update

You can now install the sloecode server package:

sudo apt-get install sloecode

The server package should install all the package dependancies. However, before you can run the server, you need to set up the database back-end. The default back-end us sqlite, which is fine for testing or for very low-use installations. Sloecode uses sqlalchemy, which allows us to use any number of databse backends. This, and many other configuration details can be edited by tweaking this file:

/etc/default/sloecode-production.ini

Once you're happy with the settings, you need to create the database tables. Do this by running:

sudo paster setup-app /etc/default/sloecode-production.ini

This command will read the values present in the configuration file, and create the default database tables for you. You only need to run this command once (to create the tables in your database)!

If you have edited any of the other values, you will want to restart the server:

sudo service sloecode restart

By default, log files are in /var/log/sloecode, Bazaar repositories are created in

/srv/sloecode/bzr-repos

and the default database is a simple SQLite file (which won't handle any kind of heavy workload or concurrency, so you may wish to change that). The RSA public/private key pair the smart server uses to communicate with clients are stored in

/var/sloecode/keys


And the clients?


Client setup is much simpler. Simply follow these steps:
  1. Make sure you have an account created for you in the sloecode web interface. Log in to the web interface with your username and password.
  2. Click on the "Manage SSH Keys" link on your home page. You need to paste your SSH public key in the form provided. This allows your Bazaar client to authenticate with the sloecode server. If you don't have an SSH key, you can generate one by running:

    ssh-keygen

    and view the public key by running:

    cat ~/.ssh/id_rsa.pub
    
    
  3. All client machines need the 'bzr' and 'bzr-sloecode' packages installed. Assuming you have the sloecode PPA installed (see instructions for doing this in the server section, above), you can run:

    sudo apt-get install bzr bzr-sloecode
    
    
  4. Finally, you need to tell the Bazaar sloecode plugin where the sloecode server is. The client plugin looks for an environment variable called "SLOECODE_SERVER", and will complain if it is not found. The easiest way to set this environment variable is to edit your "~/.bashrc" file and add this line to the end:

    export SLOECODE_SERVER=domain.of.your.server.com

    The value of this variable should be either a domain name, OR an IP address that points to the sloecode server you wish to use. For example, if the server is installed on the local network you might have the following:

    export SLOECODE_SERVER=192.168.1.10

    Note that you must not add any network protocol specification - the sloecode client plugin takes care of that for you. If you are running the sloecode ssh service on a port other than 22, you must add this port to the end of the string, like so:

    export SLOECODE_SERVER=192.168.1.10:4022

  5. Finally, if your username on the sloecode server is different from your local computer username, you need to tell the Bazaar sloecode plugin what username it should use. To do this, run the command:

    bzr sc-login sloecode_username

    Where "sloecode_username" is the username you use to log in to the sloecode web interface. For example, on my machine the command is:

    bzr sc-login thomir

    If you run the command without a username it will tell you what username is currently set.

That's it! Everything is all set up. Now you can start using the sloecode server.

Bazaar / Sloecode Basics


I won't cover the details of how to use Bazaar. If you need that instruction, look at the official Bazaar documentation. However, here are a few pointers:

  • Every user on the sloecode server has a personal repository. To push code to your personal repository, do this:

    bzr push sc:~sloecode_username/branch_name

    By default personal repositories are created with no branches, so whenever you push or pull from a personal repository you must always specify a branch name!

  • Personal repositories are private - you are the only one who can read or write to your personal repository. If you need to share your code with someone else, you need to store it in a project repository.

  • To pull code from a project repository, the command looks like this:

    bzr pull sc:project_name/branch_name

    Note that there is no tilde character before the project name. Also note that you need not specify the branch name if you want the special branch called "trunk". For example, if I want a copy of the trunk branch of project "elastic", I'd run the following command:

    bzr pull sc:elastic

    However if I want a specific branch called "fix-bug-1234", I'd run the following:

    bzr pull sc:elastic/fix-bug-1234
Final Thoughts:


We're still writing sloecode - it's an ongoing effort and won't be finished any time soon. We welcome your feedback - the best way to contact is us via the sloecode-developers mailing list, or in the #sloecode channel on freenode IRC, or leave a comment below. Have we missed anything? Spot any bugs? Have suggestions for improvements/future direction? We'd love to hear from you!

Why your python editor sucks

I'm doing a reasonable amount of python-coding work these days. It would help me to have an editor that doesn't suck. My requirements are:

  1. Small & Fast. I'm not after a massive clunky IDE, just an editor with enough smarts to make editing multiple python files easier.
  2. Sensible syntax highlighting.
  3. Understands python indentation, PEP8 style. Specifically, indents with 4 spaces, backspace key can be used to unindent.
  4. Can be integrated with one or more lint checkers. Right now I use a wonderful combination of pep8, pyflakes and pylint. I want the output of these to be integrated with the editor so I can jump to the file & line where the problem exists.
That's it. I don't think I'm asking too much. Here's the editors I've tried, and why they suck:

  1. KATE. I love kate, it's my default text editor for almost everything. However, there is no way to integrate lint checkers. I could write a plugin, but that's yet another distraction from actually doing my work.
  2. Vim. I'm already reasonably skilled with vim, and Alain Lafon's blog post contains some great tips to make vim even better. My problem with vim is simply that it's too cryptic. Sure, I could spend a few years polishing my vim skills, but I want it to just work. Vim goes in the "kind of cool, but too cryptic" basket.
  3. Eric. When you launch eric for the first time it opens the configuration dialog box. It looks like this:
    How many options do I really need for an editor? Over-stuffed options dialogs is the first sign of trouble. It gets worse however, once you dismiss the settings window, the editor looks like this:

    Need I say more?
  4. Geany. Looks promising, but no integration into lint checkers.
  5. pida. Integrates with vim or emacs for the editor component. Looks promising, although the user interface is slightly clunky in places. Pida suffers from exactly the same problems as vim does however, but I may end up using it.
There are a few options I have not tried, and probably won't:
  1. Eclipse & pydev. Eclipse is a huge, hulking beast. I want a small, fast, lean editor, not an IDE.
  2. Emacs. Can't be bothered learning another editor. Doesn't look that much different to vim, so what's the point in learning both?
  3. KDevelop. Same reason as Eclipse, above.

I suspect there's a market for a simple python editor that just works. Please! Someone build it!