Versioning is quite difficult to deal with. Versions are nearly-numbers, but you can't quite sort them using standard numerical algorithms.
While the following is true:
1.1 < 1.2
The following is also true:
1.2 < 1.18 < 1.20
The "." is not a decimal point but a separator.
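A minimal Python sketch of that point, assuming purely numeric dotted versions (the `parts` helper is a hypothetical name, not something from the AMO code): comparing tuples of integers gives the ordering above, while treating the string as a decimal number does not.

```python
def parts(version):
    """Split a purely numeric dotted version into a tuple of ints."""
    return tuple(int(p) for p in version.split("."))


# Tuples compare element by element, so "." acts as a separator.
print(parts("1.2") < parts("1.18") < parts("1.20"))  # True

# Reading the same strings as decimal numbers gets the order wrong.
print(float("1.2") < float("1.18"))  # False
```

This only covers the easy case; suffixes such as "a", "b", "+" or ".*" need the extra handling described below.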
Mozilla uses a modestly complicated versioning system that involves stars, plusses, and sometimes "x".
I found a very convoluted way to translate these versions into large integers. The versions for applications in the AMO database have four parts at most; they are potentially alpha or beta, and potentially a pre-release. In some cases we have multiple versions represented with .*, .x or + at the end.
The Toolkit docs let us translate "+" to mean "pre-release of the next version". E.g. 1.0+ is 1.1pre0. Since my primary purpose in all this is sorting, .* and .+ may as well just be a very large "version part." Since all the version parts I deal with have at most two digits, I turned .* and .+ into .99.
For example:
3.5+ => '03'+'05'+'99' => 030599
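As a rough sketch of this shortcut (not the Toolkit rule), the padding step could look like the following in Python. The function name `pad_parts` is hypothetical, and the code assumes every numeric part fits in two digits, as stated above.

```python
def pad_parts(version):
    """Zero-pad each dotted part to two digits; wildcards become 99."""
    out = []
    for part in version.split("."):
        if part in ("*", "x", "+"):
            # A standalone wildcard part sorts after any real part.
            out.append("99")
        elif part.endswith("+"):
            # "5+" becomes the part itself followed by a 99 part.
            out.append(part[:-1].zfill(2))
            out.append("99")
        else:
            out.append(part.zfill(2))
    return "".join(out)


print(pad_parts("3.5+"))   # 030599
print(pad_parts("3.5.*"))  # 030599
```

Note that the result is a string of digits; the leading zero disappears once it is converted to an integer.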
We also need to deal with versions that may be alpha, beta or not. If everything else is equal:
3.5a < 3.5a5 < 3.5b < 3.5b2 < 3.5 < 3.5+
We assign a single integer to represent a version's "non-alphaness":
a => 0
b => 1
non alpha/beta => 2
We assume that 3.5a = 3.5a1. Therefore:
3.5a => 3.5.0a1 => '03'+'05'+'00'+'0'+'01' => 030500001
Similarly, we assign a 0 (pre-release) or 1 (not a pre-release) to represent "non-pre-releaseness":
3.5a pre2 => 3.5.0a1pre2
=> '03'+'05'+'00'+'0'+'01'+'0'+'02'
=> 030500001002
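Putting the walk-through together, here is a hedged Python sketch of the three-part layout used in these examples. The regex, `ALPHA_RANK` and `version_key` are illustrative names of my own, not the actual AMO script, and the sketch always emits the pre-release fields so every key has the same width. It does not handle "+", ".*" or ".x".

```python
import re

# Hypothetical pattern covering only the simplified forms in the walk-through:
# up to three dotted numbers, an optional a/b marker with a number,
# and an optional "pre" with a number.
PATTERN = re.compile(
    r"^(\d+)(?:\.(\d+))?(?:\.(\d+))?"  # major.minor.patch
    r"(?:([ab])(\d*))?"                # alpha/beta marker and its number
    r"\s*(?:pre(\d*))?$"               # pre-release marker and its number
)

ALPHA_RANK = {"a": 0, "b": 1, None: 2}  # "non-alphaness"


def version_key(version):
    """Return a fixed-width, zero-padded digit string for a version."""
    match = PATTERN.match(version)
    if match is None:
        raise ValueError("unsupported version: %r" % version)
    major, minor, patch, ab, ab_num, pre_num = match.groups()
    if ab is None:
        alpha_number = 0
    else:
        alpha_number = int(ab_num) if ab_num else 1  # 3.5a is treated as 3.5a1
    is_pre = pre_num is not None
    return "".join([
        "%02d" % int(major),
        "%02d" % int(minor or 0),
        "%02d" % int(patch or 0),
        str(ALPHA_RANK[ab]),
        "%02d" % alpha_number,
        "0" if is_pre else "1",  # "non-pre-releaseness"
        "%02d" % int(pre_num or 0),
    ])


print(version_key("3.5a pre2"))  # 030500001002
```

Because every key has the same number of digits, comparing the strings and comparing `int(key)` give the same order.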
So what does this get us? Integers which we can use for comparison, sorting, etc. It's a one-time calculation for each version, and we can do some nice SQL statements in AMO like:
mysql> SELECT version,version_int FROM appversions WHERE application_id = 1 ORDER BY version_int LIMIT 15;
+---------+--------------+
| version | version_int |
+---------+--------------+
| 0.3 | 30000200100 |
| 0.6 | 60000200100 |
| 0.7 | 70000200100 |
| 0.7+ | 80000200000 |
| 0.8 | 80000200100 |
| 0.8+ | 90000200000 |
| 0.9 | 90000200100 |
| 0.9.0+ | 90100200000 |
| 0.9.1+ | 90200200000 |
| 0.9.2+ | 90300200000 |
| 0.9.3 | 90300200100 |
| 0.9.3+ | 90400200000 |
| 0.9.x | 99900200100 |
| 0.9+ | 100000200000 |
| 0.10 | 100000200100 |
+---------+--------------+
15 rows in set (0.00 sec)
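To read these values back, here is a small Python sketch of a decoder. The 14-digit layout it assumes (four two-digit parts, one alpha digit, two alpha-number digits, one pre digit, two pre-number digits) is inferred from the rows above rather than taken from the AMO code, and `decode` is a hypothetical name.

```python
def decode(version_int):
    """Split a version_int into fields, assuming a 14-digit layout."""
    # The integer column drops leading zeros, so restore them first.
    digits = str(version_int).zfill(14)
    return {
        "parts": [int(digits[i:i + 2]) for i in range(0, 8, 2)],
        "alpha": int(digits[8]),         # 0 = alpha, 1 = beta, 2 = neither
        "alpha_num": int(digits[9:11]),
        "pre": int(digits[11]),          # 0 = pre-release, 1 = not
        "pre_num": int(digits[12:14]),
    }


print(decode(80000200000)["parts"])  # [0, 8, 0, 0]
print(decode(80000200000)["pre"])    # 0
```

Under that assumption, the row for 0.7+ decodes as a pre-release of 0.8, and 0.9.x decodes with 99 in the third part.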
I can now index these integers using Sphinx and do some very easy searches for addons based on version number.



Wow. That is a nifty little system you have there. For the Metrics data warehouse, I have a regex that splits the version string up into seven distinct parts: major int, minor int, minor_suffix varchar, sub_a int, sub_a_suffix varchar, sub_b int, sub_b_suffix varchar.
These fields allow me to perform a numeric sort on the integers and a lexical sort on the string suffixes.
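A rough Python sketch of that kind of split, for illustration only: the pattern and the `split_version` helper below are guesses at the idea, not the actual Metrics regex, and the group names are assumed from the field list above.

```python
import re

# Hypothetical pattern: each dotted part is a number plus an optional suffix.
VERSION_RE = re.compile(
    r"^(?P<major>\d+)"
    r"(?:\.(?P<minor>\d+)(?P<minor_suffix>[^.]*))?"
    r"(?:\.(?P<sub_a>\d+)(?P<sub_a_suffix>[^.]*))?"
    r"(?:\.(?P<sub_b>\d+)(?P<sub_b_suffix>[^.]*))?$"
)


def split_version(version):
    """Return the seven fields as a tuple usable as a sort key."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError("unsupported version: %r" % version)
    g = match.groupdict()
    return (
        int(g["major"]),
        int(g["minor"] or 0), g["minor_suffix"] or "",
        int(g["sub_a"] or 0), g["sub_a_suffix"] or "",
        int(g["sub_b"] or 0), g["sub_b_suffix"] or "",
    )


print(split_version("3.5.1-g1"))  # (3, 5, '', 1, '-g1', 0, '')
```

Sorting by this tuple compares the integers numerically and the suffixes lexically, which is the behaviour described above; it makes no attempt to follow the Toolkit ordering for alpha, beta or pre suffixes.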
I also have to deal with some issues like version strings that come in from other builds (e.g. Debian, etc) and they have sometimes put extra suffixes in like Firefox 3.5.1-g1.
I’ll keep this serialization technique in mind for the future though, I can think of some areas where it might be handy.
How come the output of the query above doesn’t have 99s for the entries that contain a +?
I would expect the following: | 0.7 | 70000200100 | | 0.7+ | 80000200000 |
to instead be
Hi Daniel, this is because I actually lied in my explanation – in order to be brief.
In the toolkit versioning page they said that:
In my Python scripts I adhered to that. I think when I ported this back to PHP, I got lazy and just said 0.7+ =~ 0.7.99.
Good eye!
Would it still not cause trouble? I’m just noticing that version 0.3 has 30000200100 while 3.5 has 030500001002 which is “the same” sans the ‘0’ first.
Yep, the toolkit versioning format is… occasionally crazy-looking. For example, 3.5+ > 3.6a (because the first is really “3.6pre” instead, and a < p). Also, actually, 3.6a is 3.6a0 which means it’s smaller than 3.6a1… Hopefully there are release management (i.e. non-code, but people) rules in place to not break you though.
I’m not sure how the C code would parse “3.5a pre2” (notably, because spaces are just another ASCII character). “3.5a.pre2” would work, though.
I think you computed the numbers incorrectly:
The resulting integers are an order of magnitude difference.
I don’t think we ever used pre0, pre1, etc. – we usually just have plain “pre” at the end of the version.
Also, does it represent the following correctly? 1.9.0 <= 1.9.1a1pre < 1.9.1a1 < 1.9.1b5 < 1.9.1pre < 1.9.1rc1 < 1.9.1
Robert,
Upon inspection of AMO’s data, I guess you are correct about pre never including a number. I could strip two digits out of this translation, which would be nice.
As for your test, I used it as the basis of a doctest:
Here is the script I am using.