Wednesday, 17 December 2014

Tips for those starting C programming

Since starting my PhD I have been coding in Perl again (yay) but I have also started coding in C. I've found C more difficult than I thought to get to grips with for such a familiar syntax. Here are some things that caught me out and need to be remembered.

Remember to cast your results, arguments and expressions to exactly what is requested/expected
I was using the ceil() function from the <math.h> header. It has the signature: double ceil(double x); My initial thought was that the casting would happen automatically, since what is an int except a double without the decimal part, so dividing two numbers and passing the result to ceil() would just work. I was wrong. I've been coding PHP way too long. When both operands are integers, C performs integer division and discards the fraction before ceil() ever sees it. I tried:

x = ceil ( y / z ); //x was 0 no matter what numbers I passed!

The problem was a type issue. I had to cast z to a double and change the output of ceil to int:

x = (int) ceil ( y / (double) z ); //now it works!
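
The difference between the two divisions can be sketched in a few lines. The function names below are made up for illustration, and ceil_div is an alternative to the cast-and-ceil line above rather than what I actually used: it stays in integer arithmetic and assumes y and z are positive and small enough that y + z does not overflow.

```c
/* Integer division discards the fraction: 7 / 2 is 3, not 3.5. */
int truncated_div(int y, int z)
{
    return y / z;
}

/* Casting one operand to double makes the whole division floating point. */
double real_div(int y, int z)
{
    return y / (double) z;
}

/* Ceiling of y / z using integers only (assumes y > 0 and z > 0). */
int ceil_div(int y, int z)
{
    return (y + z - 1) / z;
}
```

With y = 7 and z = 2 the first function gives 3, the second 3.5 and the third 4.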

Always initialize variables with a starting value
Another thing I expected the language to do was to initialize variables for me like PHP does, so when I declared: unsigned int i; I expected i to hold a value of 0. It doesn't! C just grabs a block of memory for the i variable but it doesn't clear it. YOU have to do that! This is what I did initially:

unsigned int i; //but what is held in i?
while ( j > i ) {
    //do something
    i++;
}

Every time I ran this bit of code it did something a different number of times, and sometimes not at all. That's because the memory set aside for the variable i was never initialized, so whatever value happened to be left in that memory was used as the starting value. It was usually some arbitrary large number. The correct thing to do is to initialize a variable with a value at declaration or shortly before you need to use it: unsigned int i = 0;
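
As a minimal sketch, here is the same loop wrapped in a function with both counters given a starting value. The name count_iterations and the runs counter are only there for illustration.

```c
/* Counts how many times the loop body runs for a given upper bound j. */
unsigned int count_iterations(unsigned int j)
{
    unsigned int i = 0;    /* explicit starting value, never left to chance */
    unsigned int runs = 0; /* number of times the body has executed */

    while (j > i) {
        runs++;
        i++;
    }
    return runs;
}
```

Because i always starts at 0, the body now runs exactly j times on every call.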

Choose calloc over malloc

calloc allocates memory like malloc does, but it also initializes every byte to 0, as you would expect. It is always better to know what values you start with before you change them. A careful way to allocate memory from the heap is:

unsigned int * arr;
unsigned int length = 10;
if ( ( arr = ( unsigned int * ) calloc ( length , sizeof ( unsigned int ) ) ) == NULL )
{
    fprintf ( stderr, " Error: arr could not be allocated!\n");
    return ( EXIT_FAILURE );
}


Don't forget to free( arr ) before returning. Every time, check you free things before returning.
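
One way to make the "free before returning" habit easier is to keep the allocation, the use and the free inside one function. The sketch below is only an illustration of that idea: sum_fresh_block is a made-up name, and it assumes length is greater than 0.

```c
#include <stdlib.h>

/*
 * Allocates `length` zeroed unsigned ints, adds them up and frees the block.
 * Returns 0 on success with *sum set, or -1 if the allocation failed.
 */
int sum_fresh_block(size_t length, unsigned int *sum)
{
    unsigned int *arr = calloc(length, sizeof *arr);
    size_t k;

    if (arr == NULL) {
        return -1;      /* nothing was allocated, so nothing to free */
    }

    *sum = 0;
    for (k = 0; k < length; k++) {
        *sum += arr[k]; /* every element starts out as 0 */
    }

    free(arr);          /* release the block before returning */
    return 0;
}
```

The sum of a freshly calloc'ed block is always 0, which is exactly the guarantee malloc does not give you.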

Don't trouble yourself with extra compilation flags (preference)

Just because you like to code a certain way, it shouldn't mean you need the burden of more compilation flags. Only bother with more flags if you really need what they give you. For example, declaring and initializing a variable in a for loop is a C99 feature, so on compilers that default to an older standard the -std=c99 flag has to be added, otherwise the code gives this error:

//'for' loop initial declaration used outside C99 mode
for (int i = 0; i < x; i++){/*bla*/}

To get past this problem why not just declare the variable before and initialize it in the loop like so:

int i;
for ( i = 0; i < x ; i++){/*bla*/}

I prefer not to use more flags because you want your code to be portable and compilable everywhere, including by novices who don't know about flags.

Read up on string handling in C

Strings are virtually ubiquitous in coding. You just need them. In C a string is declared as a char array or using the pointer shortcut:

char * s = "Hello";

This automatically adds the string terminator (NUL or '\0'), and s is a pointer to the first character. Because it's a pointer, operators like == and = act on the address rather than the text, so you can't do very much with it unless you use the <string.h> header functions such as strcmp, strcpy and strcat. Two examples:

if ( s == "Hello" ); //wrong! compares addresses, not text
if ( strcmp (s, "Hello") == 0 ); //right!

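char * a = "Hello A";
char * b = a; //wrong! b points at the same string, nothing is copied
char c[8]; //room for 7 characters plus the terminator
strcpy ( c, a ); //right! c now holds its own copy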
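
The two examples can be wrapped into small helpers. This is only a sketch: same_text and copy_text are names I made up, and copy_text puts the copy on the heap so the caller has to free it.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 when both strings hold the same characters, otherwise 0. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/*
 * Returns a heap-allocated copy of src, or NULL if the allocation fails.
 * The caller owns the result and must free() it.
 */
char *copy_text(const char *src)
{
    /* +1 leaves room for the '\0' terminator */
    char *dst = calloc(strlen(src) + 1, sizeof *dst);

    if (dst != NULL) {
        strcpy(dst, src);
    }
    return dst;
}
```

A copy made this way compares equal with same_text even though it lives at a different address from the original.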

And another thing: long string literals, an alternative to HEREDOC syntax, can be written as adjacent pieces, which the compiler joins into one string (no newlines are added between them):

char * r = "ABC"
           "DEF"
           "GHI";
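
A quick way to convince yourself that the pieces really become one string is to measure the result. The names joined and joined_length are just for illustration.

```c
#include <string.h>

/* Adjacent literals are joined by the compiler into a single string. */
static const char *const joined =
    "ABC"
    "DEF"
    "GHI";

/* Length of the joined literal, not counting the terminator. */
size_t joined_length(void)
{
    return strlen(joined);
}
```

The length comes out as 9 and the contents are "ABCDEFGHI", with nothing inserted between the pieces.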

Pointers store the starting memory address of variables

It's really straightforward once you figure it out. Consider a variable i and its pointer. i holds the actual value, while the pointer holds the memory address of i. To modify the contents of i through its pointer you need to dereference it:

int i = 3; //initialise i to 3
int *iPtr = &i; //Ampersand returns a variable's address to be stored in the pointer
( *iPtr ) = 4; //dereference the pointer to access its associated variable
printf("%d\n", i); //4

Pointers are mainly used for arrays, as shown in the calloc example above. A pointer to an array holds the address of the first element in the array. You should know that if you pass an array to a function you are actually only passing a pointer to that first element, so the function works on the original array and not on a copy.
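
Here is a small sketch of both ideas: a function that changes the caller's array through the pointer it receives, and one that changes two plain ints through their addresses. fill_with and swap_ints are hypothetical names.

```c
#include <stddef.h>

/* Receives a pointer to the first element; writes reach the caller's array. */
void fill_with(int *arr, size_t n, int value)
{
    size_t k;

    for (k = 0; k < n; k++) {
        arr[k] = value; /* same as *(arr + k) = value */
    }
}

/* Swaps two ints by dereferencing their addresses. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;

    *a = *b;
    *b = tmp;
}
```

Note that the array length has to be passed separately, because the function only sees a pointer and has no way of knowing how many elements follow it.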

Saturday, 7 June 2014

Open-sourced my work projects

One of the awesome things about my last job is that I got to open source my work. More companies should do this! Among the benefits of open-sourcing work: for the employee it presents evidence of previous work and experience, and for the employer it means projects get exposure and contributions from outside.

My main project is a SOP/experiment definition management system. It was written in PHP using the CodeIgniter framework with a little Zend Framework and the data is stored in a MySQL database. I'm close to finishing the paper for the project. Check out the code in the meantime: https://github.com/mpi2/impress

For the main project I created an OWL Ontology using a combination of Protégé and the Java OWLAPI to generate ontology classes and individuals from the items in the database. For those of you who want to figure out how to use the OWLAPI to build ontologies I am sure the code will help you figure it out: https://github.com/mpi2/impress_owl

Also, I'd like to point out another project I wrote, which maps core XML datatypes to PHP classes (with validation checking): https://github.com/mpi2/xmldatatype.

And a reminder of a previously mentioned project, PhpObo, which is an OBO file format parser and builder written in PHP: https://github.com/mpi2/PhpObo

Friday, 18 April 2014

Permutations

The other day a friend was seeking advice about passwords and password length. I told him the strength of a password depends on the number of characters available to him (n) raised to the power of the length of the string (l), n^l, so he should use non-alphanumeric characters as well and make sure the password is long enough and memorable to him. Using the formula, a 4 character password using just digits has 10^4, or 10000, possible combinations.

That got me thinking about cracking and finding combinations of passwords if you know which characters were used and in what frequency. Basically you would need to jumble the characters up and try the different orderings, and because you know which characters were used the search space is far smaller. Strictly speaking, l distinct characters can be arranged in l! ways, so abcd has 4! = 24 orderings. The rotation-based approach I describe below generates l^2 - l of them, which for a 4 letter password using the characters abcd is 4^2 - 4, or 12 combinations.
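
The counts are easy to check with a few lines of C. This is only a sketch with made-up function names: keyspace is n^l, orderings is l! (every arrangement of l distinct characters) and rotation_count is the l^2 - l figure. None of them guard against overflow, so keep the inputs small.

```c
/* n^l: number of strings of length l over an alphabet of n characters. */
unsigned long long keyspace(unsigned int n, unsigned int l)
{
    unsigned long long total = 1;
    unsigned int k;

    for (k = 0; k < l; k++) {
        total *= n;
    }
    return total;
}

/* l!: number of ways to arrange l distinct characters. */
unsigned long long orderings(unsigned int l)
{
    unsigned long long total = 1;
    unsigned int k;

    for (k = 2; k <= l; k++) {
        total *= k;
    }
    return total;
}

/* l^2 - l: lock each character in front, rotate the remaining l - 1. */
unsigned long long rotation_count(unsigned int l)
{
    return (unsigned long long) l * l - l;
}
```

For four digits keyspace(10, 4) is 10000, while for abcd orderings(4) is 24 and rotation_count(4) is 12.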

I had to put a little thought into how I was going to code this. The first thing you do when you have to write an algorithm is figure out how you would do it in real life and I thought up the idea of swivelling characters round. So, if we have the starting characters abcd, you would lock the first character and swivel the other three round:

abcd
adbc
acdb

Then, you would shift the first character (a) off the original array and push it onto the end, lock the new first character (b), and do the swivelling of the other three again:

bcda
bacd
bdac

and so on. So when I first tried to solve this problem I did it the easy way by using the php array functions:

<?php

/**
 * Produce permutations of a string using php array functions
 * @author Ahmad Retha
 * @license public domain
 */

$s = "abcd"; //starting letters
$a = str_split($s); //converted to char array
$l = count($a); //length of array
$i = 0; //initialize counter for loop
$r = array(); //result array holds the combinations

while ($i < $l) {
   
    $b = array_slice($a, 1); //new array missing first element
    $j = 0; //initialize counter for inter-swivel
    while ($j < $l - 1) {
        array_push($b, array_shift($b)); //swivel smaller array

        //store result in $r
        $r[] = implode('', array_merge((array)$a[0], $b));
        $j++;
    }

   
    array_push($a, array_shift($a)); //swivel initial character
   
    $i++;
}
?>


But I was not satisfied with this and set about rewriting it using array index manipulation to move elements around. This code is a little longer but avoids the repeated array function calls and runs a tad faster:

<?php

/**
 * Produce permutations of a string through array manipulation
 * @author Ahmad Retha
 * @license public domain
 */

$s = "abcd"; //starting letters
$a = str_split($s); //converted to char array
$l = count($a); //length of array
$r = array(); //result array holds the combinations
$c = 0; //counter
$t = null; //temporarily holds first char

while ($c < $l) {

    $t = $a[0]; //store first char

    $n = 1; //initialize counter for inner loop

    for ($i = 1; $i < $l; $i++) {

        while ($n < $l) {

            //temporarily store first char of inner substring
            $u = $a[1];
           
            //shift chars along in substring
            for ($j = 2; $j < $l; $j++) {
                $a[$j - 1] = $a[$j];
            }
           
            $a[$l - 1] = $u; //push substring temp char to end
           
            $r[] = implode('', $a); //store the permutation
           
            $n++;
        }
        

        //shift chars along in whole string
        $a[$i - 1] = $a[$i];
    }

    $a[$l - 1] = $t; //push temp char to end

    $c++;
}


?>


This is sample output of the 12 combinations held in $r:

acdb
adbc
abcd
bdac
bacd
bcda
cabd
cbda
cdab
dbca
dcab
dabc
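
For comparison, here is a rough C port of the same swivelling idea. It is a sketch rather than a tuned implementation: MAX_LEN, rotate_left and swivel are names I made up, the input is assumed to be shorter than MAX_LEN, and the caller is assumed to provide room for l * (l - 1) results.

```c
#include <string.h>

#define MAX_LEN 16

/* Rotates s[from..len-1] one place to the left, in place. */
static void rotate_left(char *s, size_t from, size_t len)
{
    char first;
    size_t k;

    if (len - from < 2) {
        return; /* nothing to rotate */
    }
    first = s[from];
    for (k = from + 1; k < len; k++) {
        s[k - 1] = s[k];
    }
    s[len - 1] = first;
}

/*
 * Writes the rotation-based arrangements of `letters` into `out`
 * and returns how many were written (l * (l - 1) for length l).
 */
size_t swivel(const char *letters, char out[][MAX_LEN])
{
    char a[MAX_LEN];
    size_t len = strlen(letters);
    size_t count = 0;
    size_t c, n;

    if (len >= MAX_LEN) {
        return 0; /* input too long for the fixed-size buffers */
    }
    strcpy(a, letters);

    for (c = 0; c < len; c++) {
        for (n = 1; n < len; n++) {
            rotate_left(a, 1, len); /* swivel everything after the first char */
            strcpy(out[count++], a);
        }
        rotate_left(a, 0, len);     /* move the first char to the end */
    }
    return count;
}
```

Called with "abcd" and a char out[12][MAX_LEN] buffer, it fills in the same 12 strings in the same order as the list above.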

Tuesday, 14 January 2014

Installing PEAR packages through Composer

Did you know that you can obtain PEAR packages via Composer? And you don't need to install or set up PEAR or configure global settings!

A work colleague was talking about difficulties setting up CPAN for Perl on a server and that got me thinking about dependency managers and repositories and the options available in the PHP world. The modern way to manage dependencies and find libraries/packages is using Composer and Packagist but the old way was through using PEAR which was a global dependency manager with a repository of useful packages. I started to wonder if it was possible to load an old-style PEAR package through the new Composer dependency manager and after some fiddling about I got it working successfully.

Not long ago the PEAR guys decided to create a new version of PEAR, so there are now two PEAR repositories available: classic PEAR (http://pear.php.net) and PEAR2 (http://pear2.php.net), which comes with a new installer called Pyrus. Most of the packages are still stuck in the old PEAR repository. So let's pick a test package from each of them to install. For PEAR2 I'm going to try the package suggested in the Composer documentation, PEAR2_HTTP_Request, and for old PEAR I'm going to try Numbers_Roman.

Open up the composer.json file and edit it so it looks something like this:

{
    "repositories": [
        {
            "type": "pear",
            "url": "http://pear.php.net"
        },
        {
            "type": "pear",
            "url": "http://pear2.php.net"
        }
    ],
    "require": {
        "pear-pear2/PEAR2_HTTP_Request": "*",
        "pear-pear/Numbers_Roman": "*"
    }
}


The two different PEAR repositories are defined in the repositories section of the composer.json file. Note the prefixes (pear-pear and pear-pear2) at the start of the package names in the require section. They tell Composer which repository to look in. Details about choosing alternative channels are in the Composer documentation.

Then run php composer.phar update to install the new packages.

Now we can create a new PHP file to test the packages just installed. Here's my test code. Note that PEAR2 packages are namespaced, hence the use statement:

<?php

require 'vendor/autoload.php';

use pear2\HTTP\Request,
    pear2\HTTP\Request\Adapter\Curl;

$url = 'http://www.yahoo.com/';
$request = new Request($url, new Curl());
$response = $request->sendRequest();

$nr = new Numbers_Roman();

//http response code 200 or 'CC' in roman numerals means success
if ('CC' == $nr->toNumeral($response->code)) {
    echo "Successfully accessed $url";
} else {
    echo "An error (HTTP{$response->code}) occurred while trying to access $url";
}

?>

And that's it. Composer is just awesome!

Saturday, 4 January 2014

Reading OBO Files in PHP using PhpObo

OBO files are specially formatted, human-readable text files that contain ontologies: terms and descriptions that describe a domain. At work I had a requirement to look through and double check the status of certain terms in the Mammalian Phenotype (MP) ontology. My first approach was to loop through the XML-based OWL format of the MP ontology, but I soon realised the OWL file was infrequently updated and I needed something much more current to work with. The other thing I noted was that other ontologies were not available in OWL format either, so what was really needed was a way to scan through the OBO files.

I had a little search and saw that there were solutions in Java and a Perl library but nothing in PHP, which is what my main application is written in so I decided to write my own OBO parser in PHP. Initially, it was going to be a really simple script to just loop through the OBO file but after reading the OBO format specification I realised I might as well write a proper library and set about writing PhpObo for myself and anyone else who would need it.

I have published the PhpObo library on GitHub under the Apache 2.0 license so feel free to use it, modify it and contribute if you wish. It is written in a flexible object oriented manner so you can swap out virtually any class from the library with your own version or extend its functionality. It is not a complete solution since it only works with one document at a time and doesn't resolve external ontology dependencies. But it does serve most people's needs: it allows you to loop through any OBO file, and it also allows you to generate your own OBO document using either an OOP or an Array-based (ArrayAccess) approach and serialize it out in the OBO file format.

If you wish to use PhpObo in your PHP 5.3+ project, I recommend you use a PHP PSR-0 dependency manager and autoloader like Composer to import the PhpObo project via Packagist.