Saturday, 4 January 2014

Reading OBO Files in PHP using PhpObo

OBO files are human-readable text files, written in a special format, that hold an ontology: the terms and descriptions that describe a domain. At work I had a requirement to look through and double-check the status of certain terms in the Mammalian Phenotype (MP) ontology. My first approach was to loop through the XML-based OWL version of the MP ontology, but I soon realised the OWL file was infrequently updated and I needed something much more current to work with. I also noticed that other ontologies were not available in OWL format at all, so what I really needed was a way to scan through the OBO files.

After a quick search I found solutions in Java and a Perl library, but nothing in PHP, which is what my main application is written in, so I decided to write my own OBO parser in PHP. Initially it was going to be a really simple script that just looped through the OBO file, but after reading the OBO format specification I realised I might as well write a proper library, so I set about writing PhpObo for myself and for anyone else who needs it.
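To give a flavour of what "looping through an OBO file" involves, here is a minimal PHP sketch. It is not PhpObo's API: `parseOboStanzas()` and `findObsoleteTermIds()` are hypothetical names used only for illustration. It assumes the OBO 1.2 layout (header tags first, then stanzas such as `[Term]`, each followed by `tag: value` lines) and checks the `is_obsolete` tag as one way of looking at a term's status. The `EX:` identifiers in the sample text are made up.

```php
<?php
// Hypothetical sketch, NOT the PhpObo API: read OBO text stanza by stanza.
// Assumes the OBO 1.2 layout: header tags, then [Term]/[Typedef] stanzas
// made of "tag: value" lines. Trailing "! comment" text, quoted strings
// and escape sequences are not handled here.

/**
 * Returns a list of stanzas, each array('type' => ..., 'tags' => array(tag => array(values))).
 * Tags that appear before the first stanza are returned with type 'header'.
 */
function parseOboStanzas($text)
{
    $stanzas = array();
    $current = array('type' => 'header', 'tags' => array());

    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '!') {
            continue; // skip blank lines and whole-line comments
        }
        if (preg_match('/^\[(\w+)\]$/', $line, $m)) {
            // A new stanza header such as [Term]: store the previous stanza
            $stanzas[] = $current;
            $current = array('type' => $m[1], 'tags' => array());
            continue;
        }
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // not a "tag: value" line; ignored in this sketch
        }
        $tag = trim(substr($line, 0, $pos));
        $value = trim(substr($line, $pos + 1));
        // Tags such as is_a or synonym may repeat, so keep every value
        $current['tags'][$tag][] = $value;
    }
    $stanzas[] = $current;
    return $stanzas;
}

// Collect the ids of [Term] stanzas marked "is_obsolete: true"
function findObsoleteTermIds(array $stanzas)
{
    $ids = array();
    foreach ($stanzas as $stanza) {
        $tags = $stanza['tags'];
        if ($stanza['type'] !== 'Term' || !isset($tags['id'])) {
            continue;
        }
        if (isset($tags['is_obsolete']) && $tags['is_obsolete'][0] === 'true') {
            $ids[] = $tags['id'][0];
        }
    }
    return $ids;
}

// Made-up sample document
$obo = "format-version: 1.2\n"
     . "\n"
     . "[Term]\n"
     . "id: EX:0000001\n"
     . "name: example term\n"
     . "\n"
     . "[Term]\n"
     . "id: EX:0000002\n"
     . "name: retired example term\n"
     . "is_obsolete: true\n";

$obsolete = findObsoleteTermIds(parseOboStanzas($obo));
print_r($obsolete); // lists the single id EX:0000002
```

A real parser also has to deal with trailing modifiers, escaped characters and quoted definitions, which is a large part of why a proper library made more sense than a one-off loop.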

I have published the PhpObo library on GitHub under the Apache 2.0 license, so feel free to use it, modify it and contribute if you wish. It is written in a flexible object-oriented manner, so you can swap out virtually any class in the library for your own version or extend its functionality. It is not a complete solution: it only works with one document at a time and doesn't resolve external ontology dependencies. It should, however, cover most common needs. It lets you loop through any OBO file, and it also lets you generate your own OBO document, using either an OOP or an array-based (ArrayAccess) approach, and serialize it out in the OBO file format.
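As a rough illustration of the array-based (ArrayAccess) idea, the sketch below builds a stanza with array syntax and writes it out as OBO text. It is not PhpObo's own class: `OboStanzaSketch` and `toObo()` are hypothetical names, and escaping of special characters is omitted. The `#[\ReturnTypeWillChange]` lines are read as comments by PHP 5.x/7.x and silence return-type deprecation notices on PHP 8.1+.

```php
<?php
// Hypothetical sketch, NOT PhpObo's classes: an OBO stanza that can be
// filled in through ArrayAccess and serialized back to OBO text.
class OboStanzaSketch implements ArrayAccess
{
    private $type;
    private $tags = array(); // tag name => list of values (tags may repeat)

    public function __construct($type)
    {
        $this->type = $type;
    }

    #[\ReturnTypeWillChange]
    public function offsetExists($tag)
    {
        return isset($this->tags[$tag]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($tag)
    {
        return isset($this->tags[$tag]) ? $this->tags[$tag] : array();
    }

    // A string sets a single value; an array sets several values for one tag
    #[\ReturnTypeWillChange]
    public function offsetSet($tag, $value)
    {
        if ($tag === null) {
            throw new InvalidArgumentException('A tag name is required');
        }
        $this->tags[$tag] = (array) $value;
    }

    #[\ReturnTypeWillChange]
    public function offsetUnset($tag)
    {
        unset($this->tags[$tag]);
    }

    // Write the stanza as "[Type]" followed by one "tag: value" line per value
    public function toObo()
    {
        $out = '[' . $this->type . "]\n";
        foreach ($this->tags as $tag => $values) {
            foreach ($values as $value) {
                $out .= $tag . ': ' . $value . "\n";
            }
        }
        return $out;
    }
}

$term = new OboStanzaSketch('Term');
$term['id'] = 'EX:0000002'; // made-up identifier
$term['name'] = 'retired example term';
$term['is_obsolete'] = 'true';
echo $term->toObo();
// [Term]
// id: EX:0000002
// name: retired example term
// is_obsolete: true
```

Storing every tag as a list keeps repeatable tags such as `is_a` simple to serialize, at the cost of single-valued tags also coming back as one-element arrays.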

If you wish to use PhpObo in your PHP 5.3+ project, I recommend using a dependency manager with a PSR-0 autoloader, such as Composer, to import the PhpObo package from Packagist.
