V is for Version Hell

by Dave Dash 07Aug09

Versioning is quite difficult to deal with. Versions are nearly-numbers, but you can't quite sort them using standard numerical algorithms.

While the following is true:

1.1 < 1.2

The following is also true:

1.2 < 1.18 < 1.20

The "." is not a decimal point but a separator.
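
A minimal Python sketch of the difference (the helper name `parts` is made up for illustration): sorting the strings as floats gets the order wrong, while comparing the dot-separated parts as integers gets it right.

```python
def parts(version):
    """Split a purely numeric dotted version into a tuple of ints."""
    return tuple(int(p) for p in version.split("."))

versions = ["1.20", "1.2", "1.18"]

# Treating "." as a decimal point: 1.2 and 1.20 collide, and 1.18 sorts first.
as_floats = sorted(versions, key=float)

# Treating "." as a separator: (1, 2) < (1, 18) < (1, 20).
as_parts = sorted(versions, key=parts)
```

This only covers plain numeric versions; anything with letters, stars or plusses needs more work.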

Mozilla uses a modestly complicated versioning system that involves stars, plusses, and sometimes "x".

I found a very convoluted way to translate these versions into large integers. The versions for applications in the AMO database have at most four parts, and they are potentially alpha or beta and potentially a pre-release. In some cases multiple versions are represented with .*, .x or + at the end. The Toolkit docs define "+" to mean "pre-release of the next version", e.g. 1.0+ is 1.1pre0. Since my primary purpose here is sorting, .* and + may as well just be a very large "version part". Since all the version parts I deal with are at most two digits, I turned .* and + into .99.

For example:
3.5+ => '03'+'05'+'99' => 030599
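
The padding step could look roughly like this in Python. This is only a sketch of the simplified rule described above: the name `pad_parts` and the `WILDCARDS` set are hypothetical, and it assumes no numeric part is longer than two digits.

```python
WILDCARDS = {"*", "x"}  # treated as "a very large version part"

def pad_parts(version):
    """Zero-pad each dot-separated part to two digits and concatenate.

    A trailing "+" is treated as an extra ".99" part. That is the
    simplification from the text, not the Toolkit rule that "+" means
    "pre-release of the next version".
    """
    if version.endswith("+"):
        version = version[:-1] + ".99"
    out = []
    for part in version.split("."):
        out.append("99" if part in WILDCARDS else part.zfill(2))
    return "".join(out)
```

With this, `pad_parts("3.5+")` and `pad_parts("3.5.*")` both give `'030599'`.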

We also need to deal with versions that may be alpha, beta, or neither. If everything else is equal:

3.5a < 3.5a5 < 3.5b < 3.5b2 < 3.5 < 3.5+

We assign a single integer to represent a version's "non-alphaness":

a => 0
b => 1
non alpha/beta => 2

We assume that 3.5a = 3.5a1. Therefore:

3.5a => 3.5.0a1 => '03'+'05'+'00'+'0'+'01' => 030500001

Similarly, we assign a 0 (pre-release) or 1 (not a pre-release) to represent "non-pre-releaseness":

3.5a pre2 => 3.5.0a1pre2
=> '03'+'05'+'00'+'0'+'01'+'0'+'02'
=> 030500001002
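
Putting the pieces together, a rough Python sketch of this layout might look like the following. It is not the actual script: the regex, the `ALPHA_FLAG` table and the `encode` name are assumptions for illustration, it handles only three numeric parts, and it always emits the pre-release fields so every result has the same width.

```python
import re

# Hypothetical pattern: up to three numeric parts, an optional a/b tag with
# a number, and an optional "pre" tag with a number.
VERSION_RE = re.compile(
    r"^(\d+)\.(\d+)(?:\.(\d+))?"   # major.minor[.patch]
    r"(?:([ab])(\d*))?"            # optional alpha/beta and its number
    r"\s*(?:pre(\d*))?$"           # optional pre-release and its number
)

ALPHA_FLAG = {"a": "0", "b": "1", None: "2"}

def encode(version):
    """Return the zero-padded digit string for a version."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError("unsupported version: %r" % version)
    major, minor, patch, tag, tag_num, pre_num = m.groups()
    digits = "".join(p.zfill(2) for p in (major, minor, patch or "0"))
    digits += ALPHA_FLAG[tag]
    # "3.5a" is treated as "3.5a1"; versions without a/b get "00".
    digits += (tag_num or "1").zfill(2) if tag else "00"
    if pre_num is None:
        digits += "1" + "00"  # not a pre-release
    else:
        digits += "0" + (pre_num or "0").zfill(2)
    return digits
```

Here `encode("3.5a pre2")` yields `'030500001002'`, and comparing `int(encode(v))` orders 3.5a < 3.5a5 < 3.5b < 3.5b2 < 3.5.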

So what does this get us? Integers that we can use for comparison, sorting, etc. It's a one-time calculation for each version, and we can write some nice SQL statements in AMO like:

mysql> SELECT version,version_int FROM appversions WHERE application_id = 1 ORDER BY version_int LIMIT 15;
+---------+--------------+
| version | version_int  |
+---------+--------------+
| 0.3     |  30000200100 | 
| 0.6     |  60000200100 | 
| 0.7     |  70000200100 | 
| 0.7+    |  80000200000 | 
| 0.8     |  80000200100 | 
| 0.8+    |  90000200000 | 
| 0.9     |  90000200100 | 
| 0.9.0+  |  90100200000 | 
| 0.9.1+  |  90200200000 | 
| 0.9.2+  |  90300200000 | 
| 0.9.3   |  90300200100 | 
| 0.9.3+  |  90400200000 | 
| 0.9.x   |  99900200100 | 
| 0.9+    | 100000200000 | 
| 0.10    | 100000200100 | 
+---------+--------------+
15 rows in set (0.00 sec)
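
The integers in this output appear to use four two-digit parts and the Toolkit reading of "+" (0.7+ becomes 0.8pre0). A minimal Python sketch that reproduces these rows, with the layout inferred from the output rather than taken from the real script (`table_int` is a hypothetical name, and alpha/beta versions are not handled):

```python
def table_int(version):
    """Encode a plain numeric version the way the rows above appear to be built.

    Assumed layout: four 2-digit parts, then "2" + "00" (not alpha/beta),
    then a pre flag ("0" = pre-release, "1" = final) + "00".
    """
    is_pre = version.endswith("+")
    parts = version.rstrip("+").split(".")
    if is_pre:
        # Toolkit rule: "N+" means "pre-release of the next version".
        parts[-1] = str(int(parts[-1]) + 1)
    nums = ["99" if p in ("x", "*") else p.zfill(2) for p in parts]
    nums += ["00"] * (4 - len(nums))  # pad out to four parts
    return int("".join(nums) + "200" + ("0" if is_pre else "1") + "00")

shuffled = ["0.10", "0.9+", "0.7", "0.9.x", "0.7+", "0.8"]
ordered = sorted(shuffled, key=table_int)
```

Under these assumptions `table_int("0.7+")` is 80000200000 and `table_int("0.9.x")` is 99900200100, matching the rows above.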

I can now index these integers using Sphinx and do some very easy searches for addons based on version number.


"V is for Version Hell" is filed under spindrop. It was published in August 2009.

8 Responses to “V is for Version Hell”


  1. 1 Daniel Einspanjer Posted August 7th, 2009 - 2:40 pm

    Wow. That is a nifty little system you have there. For the Metrics data warehouse, I have a regex that splits the version string up into seven distinct parts:

    major int
    minor int
    minorsuffix varchar
    suba int
    subasuffix varchar
    subb int
    subb_suffix varchar

    These fields allow me to perform a numeric sort on the integers and a lexical sort on the string suffixes.

    I also have to deal with some issues like version strings that come in from other builds (e.g. Debian, etc) and they have sometimes put extra suffixes in like Firefox 3.5.1-g1.

    I’ll keep this serialization technique in mind for the future though, I can think of some areas where it might be handy.

  2. 2 Daniel Einspanjer Posted August 7th, 2009 - 2:49 pm

    How come the output of the query above doesn’t have 99s for the entries that contain a +?

    I would expect the following:

    | 0.7     |  70000200100 |
    | 0.7+    |  80000200000 |

    to instead be

    | 0.7     |  70000200100 |
    | 0.7+    |  79900200000 |
    
  3. 3 Dave Dash Posted August 7th, 2009 - 3:16 pm

    Hi Daniel, this is because I actually lied in my explanation – in order to be brief.

    In the toolkit versioning page they said that:

    0.7+ = 0.8pre0
    

    In my Python scripts I adhered to that. I think when I ported this back to PHP, I got lazy and just said 0.7+ =~ 0.7.99.

    Good eye!

  4. 4 tsb Posted August 7th, 2009 - 7:43 pm

    Would it still not cause trouble? I’m just noticing that version 0.3 has 30000200100 while 3.5 has 030500001002, which is “the same” sans the leading ‘0’.

  5. 5 Mook Posted August 7th, 2009 - 11:22 pm

    Yep, the toolkit versioning format is… occasionally crazy-looking. For example, 3.5+ > 3.6a (because the first is really “3.6pre”, and a < p). Also, 3.6a is actually 3.6a0, which means it’s smaller than 3.6a1… Hopefully there are release management (i.e. non-code, but people) rules in place to not break you though.

    I’m not sure how the C code would parse “3.5a pre2” (notably, because spaces are just another ASCII character). “3.5a.pre2” would work, though.

  6. 6 Dave Dash Posted August 8th, 2009 - 10:52 am

    I think you computed the numbers incorrectly:

    mysql> SELECT version, version_int FROM appversions
    WHERE application_id = 1 and version IN ('0.3', '3.5') ORDER BY version_int;
    +---------+---------------+
    | version | version_int   |
    +---------+---------------+
    | 0.3     |   30000200100 |
    | 3.5     | 3050000200100 |
    +---------+---------------+

    The resulting integers differ by two orders of magnitude.

  7. 7 Robert Kaiser Posted August 9th, 2009 - 7:55 am

    I don’t think we ever used pre0, pre1, etc. – we usually just have plain “pre” at the end of the version.

    Also, does it represent the following correctly? 1.9.0 <= 1.9.1a1pre < 1.9.1a1 < 1.9.1b5 < 1.9.1pre < 1.9.1rc1 < 1.9.1

  8. 8 Dave Dash Posted August 9th, 2009 - 2:51 pm

    Robert,

    Upon inspection of AMO’s data, I guess you are correct about pre never including a number. I could strip two digits out of this translation, which would be nice.

    As for your test, I used it as the basis of a doctest:

    
        >>> def c(x,y):
        ...    x = int(translate(x))
        ...    y = int(translate(y))
        ...    if (x-y > 0):
        ...        return 1
        ...    elif (x-y < 0):
        ...        return -1
        ...    return 0
        >>> v = ['1.9.0a1pre', '1.9.0a1', '1.9.1.b5', '1.9.1.b5', '1.9.1pre', '1.9.1', '1.9.0']
        >>> assert c(v[0],v[1]) == -1
        >>> assert c(v[1],v[2]) == -1
        >>> assert c(v[2],v[3]) == 0
        >>> assert c(v[3],v[4]) == -1
        >>> assert c(v[4],v[5]) == -1
        >>> assert c(v[5],v[6]) == 1
    

    Here is the script I am using.
