A common task in bioinformatics is to read a FASTA file to get a sequence from it. You might want to grab a whole chromosome from a genome (multi-)FASTA file, or you might want to grab a bit of DNA from a single chromosome by providing the start position and end position (range). In both cases, I have you covered:
To extract a chromosome from a genome file, such as the 900MB GRCh37.gz file provided by the 1000 Genomes Project, which contains all the human chromosomes in one file, I created a program that writes an individual chromosome out to its own FASTA file. You don't even need to decompress the original .gz file first; it handles it as is. Check it out here: https://github.com/webmasterar/extractChromosome
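To give a rough idea of what this kind of extraction involves, here is a minimal Python sketch. It is not the extractChromosome program itself, just an illustration under some assumptions: the function name extract_record() is made up for this example, and it assumes the record name is the first word after the '>' on each header line.

```python
import gzip


def extract_record(genome_gz_path, chrom_name, out_path):
    """Copy one record from a gzipped multi-FASTA file into its own FASTA file.

    Hypothetical helper for illustration. Returns True if a record whose
    name (first word after '>') equals chrom_name was found.
    """
    found = False
    copying = False
    # gzip.open in text mode decompresses on the fly, so the .gz file
    # never needs to be unpacked on disk.
    with gzip.open(genome_gz_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if found:
                    break  # the wanted record has ended, stop reading
                fields = line[1:].split()
                name = fields[0] if fields else ""
                copying = (name == chrom_name)
                found = copying
            if copying:
                dst.write(line)
    return found
```

Reading line by line keeps memory use small even for a whole genome, and stopping at the next header avoids decompressing the rest of the file once the chromosome has been copied.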
To extract a string of bases from a chromosome stored in a FASTA file just by providing the start and end positions, I created a little Python helper function called getRef(), which you can access here: https://gist.github.com/webmasterar/3a60155d4ddc8595b17fa2c62893dbb0
It is easy to use and takes three arguments: getRef(Chr_File, Start_Pos, End_Pos).
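For anyone curious how a range lookup like this can work, here is a minimal sketch of the idea. It is not the getRef() code from the gist: get_range() is a made-up name, and it assumes the file holds a single sequence and that positions are 1-based and inclusive, which may differ from what getRef() expects.

```python
def get_range(fasta_path, start, end):
    """Return bases start..end from a single-sequence FASTA file.

    Hypothetical helper for illustration. Positions are assumed to be
    1-based and inclusive.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    pieces = []
    offset = 0  # number of bases seen on earlier lines
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                continue  # skip the header line
            seq = line.strip()
            line_start = offset + 1         # position of this line's first base
            line_end = offset + len(seq)    # position of this line's last base
            offset = line_end
            if line_end < start:
                continue  # the range has not started yet
            if line_start > end:
                break     # already past the range
            lo = max(start, line_start) - line_start
            hi = min(end, line_end) - line_start + 1
            pieces.append(seq[lo:hi])
    return "".join(pieces)


# Example call; "chr21.fa" is a placeholder file name.
# bases = get_range("chr21.fa", 1000, 1050)
```

Because it only keeps the overlapping slices, it never holds the whole chromosome in memory. If the requested range runs past the end of the sequence, it simply returns whatever bases are available.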
Showing posts with label bioinformatics. Show all posts
Friday, 1 December 2017
Monday, 15 June 2015
An online Dynamic Programming matrix viewer
I decided to create a single page application that generates and displays the Dynamic Programming matrix of two given strings. This is for any bioinformatician or computer scientist who needs a quick and easy DP tool they can use online and be able to view the matrix that is output: DPMatrix.
It supports different algorithms: pattern matching, global alignment (e.g. Needleman-Wunsch) and local alignment (e.g. Smith-Waterman). It also supports two DP models, edit distance and Hamming distance, and lets you enter different penalty values.
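If you have not seen one of these matrices before, the sketch below shows how the global edit distance version can be filled in. It is not the code behind DPMatrix; the function name and the sub_cost and indel_cost parameters are made up for this example, with both penalties defaulting to 1.

```python
def edit_distance_matrix(a, b, sub_cost=1, indel_cost=1):
    """Build the global edit distance DP matrix for strings a and b.

    Hypothetical helper for illustration. Cell [i][j] holds the cheapest
    cost of turning a[:i] into b[:j].
    """
    rows, cols = len(a) + 1, len(b) + 1
    m = [[0] * cols for _ in range(rows)]
    # First column and first row: only insertions or deletions are possible.
    for i in range(1, rows):
        m[i][0] = i * indel_cost
    for j in range(1, cols):
        m[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            match = 0 if a[i - 1] == b[j - 1] else sub_cost
            m[i][j] = min(
                m[i - 1][j - 1] + match,     # match or substitution
                m[i - 1][j] + indel_cost,    # deletion from a
                m[i][j - 1] + indel_cost,    # insertion into a
            )
    return m


# Print the matrix one row per line; the bottom-right cell is the distance.
for row in edit_distance_matrix("GATTACA", "GCATGCU"):
    print(" ".join(f"{cell:2d}" for cell in row))
```

The pattern matching variant usually differs only in the initialisation: the first row is left at zero so that a match can start anywhere in the text.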
Saturday, 4 January 2014
Reading OBO Files in PHP using PhpObo
OBO files are specially formatted, human-readable text files that contain ontologies: the terms and descriptions that describe a domain. At work I had a requirement to look through and double check the status of certain terms in the Mammalian Phenotype (MP) ontology. My first approach was to loop through the XML-based OWL format of the MP ontology, but I soon realised the OWL file was infrequently updated and I needed something much more current to work with. I also noted that other ontologies were not available in OWL format at all, so what was really needed was a way to scan through the OBO files.
I had a little search and saw that there were solutions in Java and a Perl library but nothing in PHP, which is what my main application is written in, so I decided to write my own OBO parser in PHP. Initially, it was going to be a really simple script to just loop through the OBO file, but after reading the OBO format specification I realised I might as well write a proper library, and set about writing PhpObo for myself and anyone else who might need it.
I have published the PhpObo library on GitHub under the Apache 2.0 license, so feel free to use it, modify it and contribute if you wish. It is written in a flexible object-oriented manner, so you can swap out virtually any class from the library with your own version or extend its functionality. It is not a complete solution, since it only works with one document at a time and doesn't resolve external ontology dependencies. But it does serve most people's needs: it allows you to loop through any OBO file, and it also allows you to generate your own OBO document using either an OOP or an array-based (ArrayAccess) approach and serialize it out in the OBO file format.
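To show the kind of looping involved, here is a rough sketch of reading OBO stanzas and picking out obsolete terms. It is written in Python rather than PHP and has nothing to do with the PhpObo API; the function names are made up, the term IDs in the example are invented, and it deliberately ignores parts of the format such as escape characters and trailing comments.

```python
def read_obo_stanzas(lines):
    """Yield (stanza_type, tags) pairs from the lines of an OBO document.

    Hypothetical helper for illustration. tags maps each tag name to the
    list of values seen for it. Header lines before the first stanza are
    skipped.
    """
    stanza_type, tags = None, {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment line
        if line.startswith("[") and line.endswith("]"):
            if stanza_type is not None:
                yield stanza_type, tags
            stanza_type, tags = line[1:-1], {}
        elif stanza_type is not None and ":" in line:
            tag, value = line.split(":", 1)
            tags.setdefault(tag.strip(), []).append(value.strip())
    if stanza_type is not None:
        yield stanza_type, tags


def obsolete_term_ids(lines):
    """Return the ids of [Term] stanzas marked 'is_obsolete: true'."""
    return [
        tags["id"][0]
        for kind, tags in read_obo_stanzas(lines)
        if kind == "Term" and "id" in tags
        and tags.get("is_obsolete") == ["true"]
    ]


# Invented example document, not real ontology data.
example = """format-version: 1.2

[Term]
id: EX:0000001
name: current term

[Term]
id: EX:0000002
name: retired term
is_obsolete: true
""".splitlines()

print(obsolete_term_ids(example))
```

A real parser has to deal with a lot more than this, which is part of why a proper library made sense.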
If you wish to use PhpObo in your PHP 5.3+ project, I recommend you use a PHP PSR-0 dependency manager and autoloader like Composer to import the PhpObo project via Packagist.
Friday, 29 July 2011
String DB
RAS, RAF, MEK, MAPK... or something like that. That's what I remember as being the chain of gene activation that switches on an oncogene, leading to cancer (P53 getting its ass kicked). No, I can't remember the details, but these activations and interactions form a tree of which gene up- or downregulates the function of another, and it's quite a complicated part of genetics and bioinformatics. I found a website recently that I hadn't seen before and thought I'd share it because, apart from giving a nice little description of what a gene does in different species, it has nice spider diagrams showing gene relationships:
http://string-db.org/
A search example, PAX6: http://string-db.org/newstring_cgi/show_network_section.pl?caller_identity=expasy_api&identifier=pax6
If you click continue at the bottom of that page you see the diagram.