Friday 19 November 2010

Pretty Microscope Pictures: Olympus Bioscapes Digital Photocompetition

1st Place 2010 Olympus BioScapes Digital Imaging Competition: Eyes of daddy longlegs. A frontal section of Phalangium opilio eyes. The lenses, retinas and optic nerves are visible. The image is a depth color-coded projection of a confocal image stack. (Igor Siwanowicz, Max Planck Institute for Neurobiology, Munich, Germany)



Monday 18 October 2010

Unique value row count in SAS

You may have come across the need to number the unique values in your dataset and store that number as a column in your dataset/table - something like this:

id, name
1, Alan
2, Brad
2, Brad
3, David
3, David
3, David
4, George
5, Joe
6, Steven
7, Zed
7, Zed

The solution involves using the LAG() function. LAG is a look-back function: for each row that is processed it fetches the value from the previous row. So, if I were at observation (row) 2 and did LAG(name), it would return "Alan". The RETAIN statement gives a variable an initial value and keeps that value from one row to the next, so I can use it as a running counter.

Anyway, here's the code:

data pupils;
    input name $;
    datalines;
Alan
Brad
Brad
David
David
David
George
Joe
Steven
Zed
Zed
;
run;


data store;
    set pupils;

    prevname = lag(name);

    format id BEST12.;
    retain id 0;
    if name^=prevname then id = id + 1;

    drop prevname;
run;
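
For comparison, the same numbering can probably be had without LAG() by using BY-group processing. This is only a sketch: it assumes the data is already sorted by name (as it is above), and pupils_demo and store_demo are made-up names so that it doesn't overwrite the datasets from the code above.

```sas
/* Hypothetical demo data, already sorted by name */
data pupils_demo;
    input name $;
    datalines;
Alan
Brad
Brad
David
;
run;

data store_demo;
    set pupils_demo;
    by name;
    /* first.name is 1 on the first row of each run of equal names */
    if first.name then id + 1;   /* sum statement: id starts at 0 and is retained */
run;
```

The sum statement (id + 1) retains id automatically, so no separate RETAIN or helper variable is needed. The catch is that BY processing will complain if the data is not sorted, whereas the LAG() version only cares about neighbouring rows.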

Sunday 10 October 2010

Biggest Genome ever? It's plant vs amoeba

A few days ago there was an article in Science magazine that they had discovered a plant (Paris japonica) with the largest genome ever found with 149 billion base pairs...

Except, I remembered there's an amoeba (Polychaos dubium - what a bloody weird name by the way!) with an even bigger genome of 670 billion base pairs.

So it got me wondering - how could Science magazine get it so wrong?! Well, this comment on the science mag site gave the reason why they would reject dubium:


"While this and some other Amoeba have been reported to have such very large genomes, some caveats regarding their reliability are perhaps in order.
The measurement for Amoeba dubia and other protozoa which have been reported to have very large genomes were made in the 1960s using a rough biochemical approach which is now considered to be an unreliable method for accurate genome size determinations. The method uses whole cells rather than isolated nuclei and thus will include not only DNA from the mitochondria but also any DNA in engulfed food organisms. Also some of the species are multinucleate.
The accuracy of the genome size estimates are also called into question given that a related species, Amoeba proteus, which was reported to have a genome size of 300 pg was more recently shown to be an order of magnitude smaller (34 - 43 pg DNA per cell). Like the situation for dinoflagellates (see below), the genomes of these Amoeba are clearly large but to know just how big requires their genome sizes to be estimated using modern best practice techniques available today. Only then will it be possible to know just how their genomes compare in size to those of Paris japonica." {Quoted from Ilia}
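Most of the reasons given in the quote can be set aside, except for the clincher: Amoeba proteus turned out to be an order of magnitude smaller than first reported. If we assume the same experimental conditions and the same level of error, dubium would shrink by a factor of ten as well. Taking 1 pg as roughly 978 million base pairs, 670 billion base pairs is about 685 pg of DNA, so dubium would be found to have about 69 pg, against roughly 152 pg (149 billion base pairs) for Paris japonica.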

Game over, the plant wins.

Wednesday 29 September 2010

Difficult SAS issue - restructuring datasets with proc transpose

I've been trying to solve this problem for a few hours and have tried many different things. In the end, it was two sets of a process that solved the problem with proc transpose. It's worth documenting so I'm blogging it.

The Problem:

I have an example dataset that contains observations like so:


(dataset: dayobs)
days_range,    value,    count
0-9,    300,    6
60-69,    250,    4
300-309,    76,    1

I wanted it to show all the ranges with value and count set to zero if it wasn't already set:

days_range,    value,    count
0-9,    300,    6
10-19,     0,     0
20-29,     0,     0
30-39,     0,     0
40-49,     0,     0
50-59,     0,     0
60-69,    250,    4
... etc ... 
290-299,    0,    0
300-309,    76,    1

That was easy: simply make a dataset that lists every days_range from 0-9 up to 390-399.

data ranges;
    input days_range $;
    format days_range $7.;
    datalines;
0-9
10-19
... etc ...
    ;
run;

and then merge both the datasets - dayobs and ranges - to create a new dataset called daysrange:

data daysrange;
    merge ranges dayobs;
    by days_range;
run;


OK. Now, I wanted to transpose it into a single row so I can insert that into a big dataset that logs the changes every day. I want to make it look like so:

v1_9, c1_9, v10_19, c10_19, v20_29, c20_29, ...etc... v60_69, c60_69, ...etc
300, 6, 0, 0, 0, 0, ...etc... 250, 4, ...etc

It might seem obvious what to do but I tried to be clever and did this:

data rowifieddaysrange;
    set daysrange;


    if days_range='0-9' then do;
        c1_9=count;
        v1_9=value;
    end;
    else if days_range='10-19' then do;
        c10_19=count;
        v10_19=value;
    end;
 etc...


keep c1_9 v1_9 c10_19 v10_19 etc...;
run;

and this is what the data looked a bit like in the end:

0, 0, 0, 0, 0, 300, 0, 0, 0, 0, 0, 0, 6
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
250, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0
etc.

Basically, it had the correct columns, but the values were spread out across different rows. No matter what I tried - I looked at all the different SAS procs and their options - I was unable to collapse/compact it into one row... if anyone knows the solution to this problem I'd really like to see it please. Ta.

OK. So guess what the solution is... It turns out you need two sets of proc transpose and a couple of merges, dealing with each variable individually, and then finally merging the two separate datasets together at the end:


*transpose the daysrange dataset first by count;
proc transpose data=daysrange out=tdaysrange(drop=_:) prefix=c;
    id days_range;
    var count;
run;
*now transpose the table with all the categories;
proc transpose data=ranges out=tranges(drop=_:) prefix=c;

    id days_range;
run;
*now merge them;
data cdataset;
    merge tranges tdaysrange;
run;


*transpose the daysrange dataset next by value;
proc transpose data=daysrange out=vdaysrange(drop=_:) prefix=v;
    id days_range;
   var value;
run;
*now transpose the table with all the categories;
proc transpose data=ranges out=vranges(drop=_:) prefix=v;
    id days_range;
run;
*now merge them;
data vdataset;
    merge vranges vdaysrange;
run;


*now join the value and count datasets into one dataset;
data rowifieddaysrange;
    merge vdataset cdataset;
run;

And finally it's all in one row that looks something like this:


v1_9, c1_9, v10_19, c10_19, v20_29, c20_29, ...etc... v60_69, c60_69, ...etc
300, 6, 0, 0, 0, 0, ...etc... 250, 4, ...etc

That's a lot of work for something that ought to be straightforward!
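
A possibly shorter route, sketched here under assumptions rather than tested against the real data: stack value and count into one long dataset, build each column name from days_range, and transpose once. The names daysrange_demo, long_demo, onerow_demo, colname and val are all made up for the illustration, and the three demo rows are invented. Because the names are built from the range text, the first columns come out as v0_9 and c0_9.

```sas
/* Small stand-in for the merged daysrange dataset (values are made up) */
data daysrange_demo;
    infile datalines delimiter=',';
    input days_range :$7. value count;
    datalines;
0-9,300,6
10-19,.,.
60-69,250,4
;
run;

/* Two output rows per range: one for the value column, one for the count column */
data long_demo;
    set daysrange_demo;
    length colname $32;
    /* translate() swaps the dash for an underscore so the name is a valid SAS name */
    colname = cats('v', translate(days_range, '_', '-'));
    val = coalesce(value, 0);   /* missing becomes 0 */
    output;
    colname = cats('c', translate(days_range, '_', '-'));
    val = coalesce(count, 0);
    output;
    keep colname val;
run;

/* A single transpose turns the long table into one wide row */
proc transpose data=long_demo out=onerow_demo(drop=_name_);
    id colname;
    var val;
run;
```

The idea is that proc transpose only produces one output row per transposed variable, so putting everything into a single variable (val) is what keeps the result on one row.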

Monday 13 September 2010

Got a new book: Pro HTML5 Programming

I got this book a few days ago and so far so good. I've read through chapter 1 and made my first HTML5 compliant page. The book is not written for absolute beginners - it assumes you've done HTML4/XHTML1, CSS2-3 and have a good grasp of Javascript - but I have these so it's all good. I'll report back once I've read the whole book and tried everything.

Thursday 22 July 2010

Amusing Unix/Linux Commands

From the book 'The Unix-Haters Handbook'.

% rm meese-ethics
rm: meese-ethics nonexistent

% "How would you rate Dan Quayle's incompetence?
Unmatched ".

% ^How did the sex change^ operation go?
Modifier failed.

% If I had a ( for every $ the Congress spent, what would I have?
Too many ('s.

% make love
Make: Don't know how to make love. Stop.

% sleep with me
bad character

% got a light?
No match.

% man: why did you get a divorce?
man:: Too many arguments.

% ^What is saccharine?
Bad substitute.

% %blow
%blow: No such job.

These attempts at humor work with the Bourne shell:

$ PATH=pretending! /usr/ucb/which sense
no sense in pretending!


$ drink <bottle; opener
bottle: cannot open
opener: not found

$ mkdir matter; cat >matter
matter: cannot create

Friday 2 July 2010

Nil Points

I submitted a PHP class (Music Albums Year Analyzer) to PHPClasses.org about a month ago and they nominated it for the monthly innovation award contest. Guess how many people voted for my class...

Not a single person. lol.

I came joint last (12th). Seems no-one wants to analyse their music collection or they didn't like my coding style or something?

A few months ago I submitted another class, the Bounded Queue, and I came 9th. At least that's not last. I write classes that I haven't seen done in PHP before and that are somewhat useful. But most of the cool things have already been done, so what's left is usually not that interesting.

Friday 25 June 2010

Whales are great

I found this BBC resource called Great Whales a few days ago. It covers Blue, Fin, Right, Sei, Sperm, Bowhead, Bryde's, Humpback, Gray, and Minke whales. Check it:

10 years of the Human Genome

The completion of the draft human genome sequence was announced ten years ago. Nature's survey of life scientists reveals that biology will never be the same again. Declan Butler reports.

Declan Butler

"With this profound new knowledge, humankind is on the verge of gaining immense, new power to heal. It will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases." So declared then US President Bill Clinton in the East Room of the White House on 26 June 2000, at an event held to hail the completion of the first draft assemblies of the human genome sequence by two fierce rivals, the publicly funded international Human Genome Project and its private-sector competitor Celera Genomics of Rockville, Maryland (see Nature 405, 983–984; 2000).

Ten years on, the hoped-for revolution against human disease has not arrived — and Nature's poll of more than 1,000 life scientists shows that most don't anticipate that it will for decades to come (go.nature.com/3Ayuwn). What the sequence has brought about, however, is a revolution in biology. It has transformed the professional lives of scientists, inspiring them to tackle new biological problems and throwing up some acute new challenges along the way.

Almost all biologists surveyed have been influenced in some way by the availability of the human genome sequence. A whopping 69% of those who responded to Nature's poll say that the human genome projects inspired them either to become a scientist or to change the direction of their research. Some 90% say that their own research has benefited from the sequencing of human genomes — with 46% saying that it has done so "significantly". And almost one-third use the sequence "almost daily" in their research. "For young researchers like me it's hard to imagine how biologists managed without it," wrote one scientist.
The survey, which drew most participants through Nature's print edition and website and was intended as a rough measure of opinion, also revealed how researchers are confronting the increasing availability of information about their own genomes. Some 15% of respondents say that they have taken a genetic test in a medical setting, and almost one in ten has used a direct-to-consumer genetic testing service. When asked what they would sequence if they could sequence anything, many respondents listed their own genomes, their children's or those of other members of their family (the list also included a few pet dogs and cats).
Some are clearly impatient for this opportunity: about 13% say that they have already sequenced and analysed part of their own DNA. One in five said they would have their entire genome sequenced if it cost US$1,000, and about 60% would do it for $100 or if the service were offered free. Others are far more circumspect about sequencing their genome — about 17% ticked the box saying "I wouldn't do it even if someone paid me".

Nature's poll also gauged where the sequence has had the greatest effect on the science itself. Although nearly 60% of those polled said they thought that basic biological science had benefited significantly from human genome sequences, only about 20% felt the same was true for clinical medicine. And our respondents acknowledged that interpreting the sequence is proving to be a far greater challenge than deciphering it. About one-third of respondents listed the field's lack of basic understanding of genome biology as one of the main obstacles to making use of sequence data today.

Sequence is just the start

Studies over the past decade have revealed that the complexity of the genome, and indeed almost every aspect of human biology, is far greater than was previously thought (see Nature 464, 664–667; 2010). It has been relatively straightforward, for example, to identify the 20,000 or so protein-coding genes, which make up around 1.5% of the genome. But knowing this, researchers note, does not necessarily explain what those genes do, given that many genes code for multiple forms of a protein, each of which could have a different role in a variety of biological processes. "The total sequence was needed, I think, to allow us to see that our one gene–one protein model of genetics was much too simplistic," wrote one respondent.

A decade of post-genomic biology has also focused new attention on the regions outside protein-coding genes, many of which are likely to have key functions, through regulating the expression of protein-coding genes and by making a slew of non-coding RNA molecules. "Now we understand," wrote another survey respondent, "that, without looking at the dynamics of a genome, determining its sequence is of limited use." Some big projects are under way to fill in the gaps, including the Encyclopedia of DNA Elements (ENCODE) and the Human Epigenome Project, an effort to understand the chemical modifications of the genome that are now thought to be a major means of controlling gene expression.

The biggest effects of the genome sequence, according to the poll, have been advances in the tools of the trade: sequencing technologies and computational biology. Technological innovation has sent the cost of sequencing tumbling, and the daily output of sequence has soared (see Nature 464, 670–671; 2010). "Deep sequencing technology is now becoming a staple of scientific research. Would this have occurred if it wasn't for the technological push required to finish the human genome?" read one response.

Data dreams, analysis nightmares

Cheaper and faster sequencing has brought its own problems, however, and our survey revealed how ill-equipped many researchers feel to handle the exponentially increasing amounts of sequence data. The top concern — named by almost half of respondents — was the lack of adequate software or algorithms to analyse genomic data, followed closely by a shortage of qualified bioinformaticians and to a lesser extent raw computing power. Other concerns include data storage, the quality of sequencing data and the accuracy of genome assembly. Commenting on the survey results, David Lipman, director of the US National Center for Biotechnology Information in Bethesda, Maryland, says that the worries about data handling and analysis were an issue even in the earliest discussions of the genome project. Perhaps, he suggests, "there's a sort of disappointment that despite having so much data, there is still so much we don't understand".

Eric Green, director of the National Human Genome Research Institute (NHGRI) in Bethesda, says that the institute is well aware of the need for more bioinformatics experts, better software and a clearer understanding of how the differences between genomes influence human health. He says the institute is planning to publish in late 2010 its next strategic five-year plan for the genomics field. One possible solution to the computing challenge, which was discussed at an NHGRI workshop in late March, is cloud computing, in which laboratories buy computing power and storage in remote computing farms from companies such as Google, Amazon and Microsoft. The European Nucleotide Archive, launched on 10 May at the European Molecular Biology Laboratory's European Bioinformatics Institute in Cambridge, UK, will also offer labs free remote storage of their genome data and use of bioinformatics tools.
Given ten years of hindsight and the current set of obstacles, it's no surprise that researchers now state somewhat modest expectations for what human genomics can deliver and by when. The rationale for sequencing and exploring the human genome — to revolutionize the finding of new drugs, diagnostics and vaccines, and to tailor treatments to the genetic make-up of individuals — is the same today. But almost half of respondents now say that the benefits of the human genome were oversold in the lead up to 2000. "While I do feel that the gains made by the human genome project are extraordinary and affect my research significantly, I still feel that it was overhyped to the general population," read one typical response. More than one-third of respondents now predict that it will take 10–20 years for personalized medicine, based on genetic information, to become commonplace, and more than 25% even longer than that. Some 5% don't expect it will happen in their lifetime. "Our understanding of the genome will not come in a single flash of insight. It will be an organized hierarchy of billions of smaller insights," says David Haussler, head of the Genome Bioinformatics Group at the University of California, Santa Cruz.

Green says that when the Human Genome Project was envisioned, scientific leaders of the day predicted that it would take 15 years to generate the first sequence, and a century for biologists to understand it. "I think they got that about right," he says. "While we still don't have all the answers — being a mere 10% of the way into the century with a human genome sequence in hand — we have learned extraordinary things about how the human genome works and how alterations in it confer risk for disease."
Haussler agrees. "All that happened in the first ten years is still just early rumblings of much more dramatic changes to come when we begin to truly understand the genome," he says.

15 years of PHP

PHP was released by Rasmus Lerdorf on June 8, 1995. His original usenet post is still available online if you want to examine a computing artefact from the dawn of the web. Many of us owe our careers to the language, so here’s a brief history of PHP…

PHP originally stood for “Personal Home Page” and Rasmus started the project in 1994. PHP was written in C and was intended to replace several Perl scripts he was using on his homepage. Few people will be ancient enough to remember CGI programming in Perl, but it wasn’t much fun. You could not embed code within HTML and development was slow and clunky.

Rasmus added his own Form Interpreter and other C libraries including database connectivity engines. PHP 2.0 was born on this day 15 years ago. PHP had a modest following until the launch of version 3.0 in June 1998. The parser was completely re-written by Andi Gutmans and Zeev Suraski; they also changed the name to the recursive “PHP: Hypertext Preprocessor”.

Critics argue that PHP 3.0 was insecure, had a messy syntax, and didn’t offer standard coding conventions such as object-orientated programming. Some will quote the same arguments today. However, while PHP lacked elegance it made web development significantly easier. Programming novices could add snippets of code to their HTML pages and experts could develop full web applications using an open source technology which became widely installed by web hosts.

PHP 4.0 was released on May 22, 2000. It provided rudimentary object-orientation and addressed several security issues such as disabling register_globals. Scripts broke, but it was relatively easy to adapt applications for the new platform. PHP 4.0 was an instant success and you’ll still find it offered by web hosts today. Popular systems such as WordPress and Drupal still run on PHP 4.0 even though platform development has ceased.

Finally, we come to PHP 5.0 which was released on July 13, 2004. The language featured more robust object-orientated programming plus security and performance enhancements. The uptake has been more sedate owing to the success of PHP 4.0 and the introduction of competing frameworks such as ASP.NET, Ruby and Python.

PHP has its inconsistencies and syntactical messiness, but it’s rare you’ll encounter a language which can be installed on almost any OS, is provided by the majority of web hosts, and offers a similar level of productivity and community assistance. Whatever your opinion of the language, PHP has provided a solid foundation for server-side programming and web application development for the past 15 years. Long may it continue.

http://www.sitepoint.com/blogs/2010/06/09/php-15-birthday/

Friday 18 June 2010

SAS Macro Functions - how to return a value

I've replaced this article with a newer one: http://bioinfornetics.blogspot.com/2011/01/sas-macro-variables-type-casting.html

Please visit that link to find out how to return a value using the SAS macro programming language.

Btw, formats are hard to remember for SAS, but here's a very useful resource that lists them: http://www.sascommunity.org/wiki/TS_486_Functions,_Informats,_and_Formats

Friday 4 June 2010

Microsoft Visual C# 2010 takes forever to install

I decided to install Visual C# Express 2010 with the MS-SQL Express database and I couldn't believe how long it took to install. It downloaded about 220MB but it took about 55 minutes to install everything on my laptop and about 40 minutes on my relatively powerful desktop. It asks you to restart twice during the installation too! How irritating.

Then when you start up the program it takes ages to load up. Nice one Microsoft :\.

Tuesday 18 May 2010

SAS Macro Programming is a pain in the butt!

I've been playing around with SAS macros these past few days and I must say I am totally shattered and depressed. It took me ages to find out why the SAS macros were not working in the first place - apparently macros are not meant to be run in "Open Space". They are meant to be placed inside data or proc steps... except that's not how you make global variables. However, it IS possible to run macros in open space though you need to add a function or two to get it working. Then I had issues with formatting and dates not being interpreted correctly. Finally I couldn't run a system command line command except on the local machine. Gah!

First, let's set some variables:

%let var_name = value;

The value can be a number or a string, but you're not meant to quote the string, like you usually do with SAS strings. If you want to see the value of var_name in the log (helps a great deal with debugging!):

%put &var_name.;

OK. Let's say you want to run a function to assign a value to a variable. Not straightforward - if you're in open space you need to call a special macro function %sysfunc:

%let var_name = %sysfunc( mdy(05, 18, 2010) ); * var_name holds date int value;

Note the command mdy() is in American Month-Day-Year mode.
What if we want to format this for output?

%let var_name = %sysfunc( putn( %sysfunc( mdy(05, 18, 2010) ), DATE9.) ); *holds 18MAY2010;
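
As a side note, %sysfunc accepts an optional format as its second argument, so the nested putn() call can be avoided. A minimal sketch, reusing the same var_name:

```sas
/* The second argument to %sysfunc is a format applied to the function result */
%let var_name = %sysfunc( mdy(05, 18, 2010), date9. );
%put &var_name.;
```

With date9. the log should show 18MAY2010. A narrower width such as date7. only leaves room for a two-digit year.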

OK. Let's say I want to write a macro procedure/function or whatever they call it, and run it. Check out the silly if statements and loops.

%macro chk;
    * Check Sunday - yes, it counts the week days from Sunday - How annoying;
    * &yesterDate is already declared intnx(Day, "&sysdate"d, -1);
    %let weekday = %sysfunc( WEEKDAY( &yesterDate. ) );
    %if &weekday. = 1 %then %do;
        * intnx() is a great function - if its Sunday it sets yesterDate to Friday;
        %let yesterDate = %sysfunc( intnx(Day, &yesterDate., -2) );
    %end;
    /* Now I want to declare an array... except macros don't do arrays, apparently! */
    /* Recommended guidelines say to do this: */
    %let year = %sysfunc( YEAR(&yesterDate.) );

    %let Xmas1 = %sysfunc( mdy(12, 24, &year.) );
    %let Xmas2 = %sysfunc( mdy(12, 25, &year.) );
    %let Xmas3 = %sysfunc( mdy(12, 26, &year.) );
    %Do i = 1 %to 3;
        %if &yesterDate. = &&Xmas&i %then %do;
            %let yesterDate = %sysfunc( mdy(12, 23, &year.) );
            * Check its not Sunday again - Recursive Macro calling;
            %chk;
        %end;
    %end;
    /* I have another loop to check for more holidays but I'm omitting that */
%mend;
 
Now to run the macro I write:
 
%chk; 
 
but it would still work if I missed the ending semicolon - which is shocking because nothing else except multi-line comments work without an ending semicolon. Even single line comments and half finished statements need a semicolon on the end of a line.
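
The "array" trick in the macro above relies on numbered macro variables and the double ampersand. Here is a stripped-down sketch of just that part. The macro name list_xmas and its year parameter are invented for the illustration.

```sas
%macro list_xmas(year);
    /* Numbered variables stand in for array elements */
    %let Xmas1 = %sysfunc( mdy(12, 24, &year.), date9. );
    %let Xmas2 = %sysfunc( mdy(12, 25, &year.), date9. );
    %let Xmas3 = %sysfunc( mdy(12, 26, &year.), date9. );
    %do i = 1 %to 3;
        /* Two passes: the double ampersand collapses to one while i resolves, then Xmas1 etc. is looked up */
        %put Xmas&i. = &&Xmas&i;
    %end;
%mend list_xmas;

%list_xmas(2010);
```

Each pass through the loop writes one line to the log, so it is an easy way to check what the numbered variables hold before using them in an %if.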
 
OK. Now I had to run a command on the system. I had two options - the X command and the call system() command. X is a unixy command line thinga-ma-jig and can be called from anywhere - open space or not, while call system() can only be called inside a step. They are used like so:
 
X "mv /path/to/file1.txt /path/to/file2.txt"; *this is how you rename a file under unix/linux btw;
data _null_;
    call system("move /path/to/file1.txt /path/to/file2.txt"); *while windows does have the rename command, move works fine;
run;
 
An interesting thing is that you can send multiple commands through X separated by semicolons, while call system() does one command at a time.
 
Here's the problem... X didn't work on the remote server but call system() only worked on the local system. The rest of the script is meant to work on the remote server though. So this means that I'll need to run this part locally, comment it out temporarily, then run the whole script again remotely. Or I can skip that mess which took me ages to figure out why it wasn't working and manually change the file names.
 
SAS Sucks.

Saturday 8 May 2010

CSS3 is the bomb and HTML5 forms in Opera

I recently redesigned one of my personal websites and used some CSS3 coding. It works great in Firefox and Opera but not in Internet Explorer. IE7+, at least, looks acceptable, but with the dreaded IE6 it is a bit broken up. Now, if I were the type of person who had an inkling of respect for anyone who still uses IE6 I would have solved this problem for IE6 users but instead I've left it as it is and written a bit of JavaScript to tell (off) IE6 users to upgrade.

However, if I ever need to make CSS3 work in IE or need to remember the cool CSS3 effects, here's where I'd go: http://www.smashingmagazine.com/2010/04/28/css3-solutions-for-internet-explorer/

It covers Opacity / Transparency, Rounded Corners, Box Shadow, Text Shadow, Color Gradients, Transparent Background Colors, Multiple Backgrounds and Element Rotation, and the options available for IE. It's an awesome resource.

On another note, I've been looking at HTML5 from a distance for quite a while now but really haven't got my hands dirty yet. I found this great HTML5 slide show, which covers almost everything, but then I found something else, an example of HTML5 form widget, something that Firefox (3.6.3) doesn't show but Opera (10.51) gets working.

Basically, without JavaScript or anything, just plain HTML5, for date and time fields, Opera shows a widget where you can pick the date and time with your mouse or keyboard. It's just so surprising to find something like this without needing heavy JS libraries and frameworks. Hopefully, the rest of the browsers will follow suit and build better widgets like these, and in the future we'll be using far less JS (or there will be some wicked abstraction going on).

Monday 19 April 2010

Two SQL tips

Here's something that I've overlooked but naturally managed to get right (most of the time) when writing SQL commands - the order of the statements. This is how it should be done:

select ...
    from ...
        where ...
            group by ...
                having ...
                    order by ...;

It might seem obvious but it is important to remember it. Here are two sayings that make it easy to remember the order:

Some French Workers Glue Hardwood Often.
San Francisco, Where The Grateful Dead Heads Originated.
 
In case you're wondering, I didn't come up with those.
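
To see all six clauses in that order in one query, here is a small sketch written for SAS proc sql. The dataset fingers_demo and its rows are made up for the illustration.

```sas
/* Hypothetical demo data */
data fingers_demo;
    infile datalines delimiter=',';
    input usrname $ finger_count;
    datalines;
Adam,10
Bob,11
Chris,9
Dan,10
Mike,9
;
run;

proc sql;
    select finger_count, count(*) as people
        from fingers_demo
            where usrname > 'Adam'
                group by finger_count
                    having count(*) > 1
                        order by finger_count desc;
quit;
```

WHERE filters rows before grouping and HAVING filters the groups afterwards, which is one way to remember why they sit on either side of GROUP BY.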
 
Another SQL tip: when you use UNION, an ORDER BY clause must come after the last SELECT (and its WHERE clause), because it applies to the combined result. This is how to write it properly:
 
SELECT id, name, password, 'Student' AS position FROM students
WHERE username LIKE '%ar%'
UNION
SELECT id, name, password, 'Staff' AS position FROM staff
WHERE username LIKE '%ar%' 
ORDER BY position;
 
This makes the whole dataset ordered by position. If you try to order individual SELECTs it will throw an error.

Wednesday 14 April 2010

Some SAS Snippets

Don't show the current date on reports:

options nodate;

To load/import SAS datasets (sas7bdat files) you tell SAS about the library (folder) which they're in:

libname Morty "Z:\path\to\Dummy Data\";

This will create a library icon that will show up on the left in the library explorer. I want to copy the files to my work folder using more manageable names:

data Dec09;
    set Morty.cybdec09;
run;
data Jan10;
    set Morty.cybjan10;
run;
data Feb10;
    set Morty.cybfeb10;
run;

Next, I plan to merge the datasets into a single table by a mutual variable (column) name, Account_no, but I need to sort the datasets by that before merging.

proc sort data=Dec09;
    by Account_no;
run;
proc sort data=Jan10;
    by Account_no;
run;
proc sort data=Feb10;
    by Account_no;
run;


data allmort;
    merge Dec09 Jan10 Feb10;
    by Account_no;
run;

What if I want to insert Datalines as CSV (delimited by commas) by hand instead of the horrible default manual space-padded SAS format:

data trydel;
    infile datalines delimiter=",";
    input usrname $ finger_count;
    datalines;
Adam, 10
Bob, 11
Chris, 9
Dan, 10
Eddy, 12
Fred, 10
Gary, 7
Hugh, 5
Ian, 10
Joe, 10
Kev, 10
Lee, 10
Mike, 9
Norm, 10
Otis, 8
Pat, 9
;
run;
 
Sort that data by a variable (column):
 
proc sort data=trydel;
    title "Everybody by finger count";
    by DESCENDING finger_count DESCENDING usrname;
run;

SAS proc sort BY arguments are not separated with commas, and you should type DESCENDING in full and before the variable (column) name, unlike SQL where the abbreviations DESC and ASC are sufficient and come after the variable name. Also, the BY statement has no ASCENDING keyword! It sorts ascending by default.

Make a standard report without SQL:

proc print data=trydel noobs;
    title "More than 8 Fingers";
    where finger_count > 8;
run;

"Noobs" (NO OBServation number) makes sure that the SAS report doesn't show a row number next to the data.

Now use SQL to get the data you need:

proc sql;
    title "SQL Where Names gt Bob";
    SELECT usrname, finger_count format=z2.
    FROM Work.trydel WHERE usrname > "Bob"
    ORDER BY finger_count DESC, usrname DESC;
quit;

Note that the dataset (table) is referred to as Library.Table, here Work.trydel. For the Work library the prefix can normally be left off, but for a dataset in any other library it is required.
Also note the "format=z2." part. It formats the number with leading zeros, so instead of showing 7, 9, 11, 10, it will show 07, 09, 11, 10. Of course, you need to find out how many digits to set, so if you wanted to show more leading zeros, put a larger number after z.

Another thing that is different is the string length checking SQL function, often named STRLEN in other dialects. In SAS it's just "LENGTH". Usage: SELECT * FROM table WHERE LENGTH(usrname) > 2.

Other useful Proc SQL conditions include IS NULL / IS NOT NULL and IS MISSING / IS NOT MISSING.

This is a good resource: http://en.wikiversity.org/wiki/Data_Analysis_using_the_SAS_Language

Monday 5 April 2010

I have a job now!

I am employed now! After what feels like forever, about a year and a half of searching! I'm well pleased. It's at the Leeds Yorkshire Bank head offices. Thankfully, they had faith in this graduate and gave him a chance - more organizations ought to do that with graduates.

Sunday 21 March 2010

Opinion Piece: The Sun Microsystems website is ugly now that Oracle run it





Emblazoned across every page on the Sun Microsystems website you now see the Oracle logo. They have basically modified the style sheets to display grey beams here and there and the logo shows up on every page. If you visit http://www.sun.com now, it automatically redirects to the Oracle site. Some of the java.sun.com pages that used to have more content are now trimmed down and thus less useful. The good old calm blueness of the Java site is forever gone and the new industrial, corporate grey and red has replaced it. It is now an unbearably ugly website and frankly, it fills me with grief. I hate the Oracle logo and the redness. Oracle are branding Sun with a red hot iron and man it hurts. /bitch

Thankfully, the MySQL website has not been violated... yet.

Wednesday 10 March 2010

Dolphins have diabetes off switch



A study in dolphins has revealed genetic clues that could help medical researchers to treat type 2 diabetes.
Scientists from the US National Marine Mammal Foundation said that bottlenose dolphins are resistant to insulin - just like people with diabetes.
But in dolphins, they say, this resistance is switched on and off.
The researchers presented the findings at the annual meeting of the American Association for the Advancement of Science (AAAS) in San Diego.
They hope to collaborate with diabetes researchers to see if they can find and possibly even control an equivalent human "off switch".
The team, based in San Diego, took blood samples from trained dolphins that "snack" continuously during the day and fast overnight.
"The overnight changes in their blood chemistry match the changes in diabetic humans," explained Stephanie Venn-Watson, director of veterinary medicine at the foundation.
This means that insulin - the hormone that reduces the level of glucose in the blood - has no effect on the dolphins when they fast.
Big brains
In the morning, when they have their breakfast, they simply switch back into a non-fasting state, said Dr Venn-Watson. In diabetic people, chronic insulin resistance means having to carefully control blood glucose, usually with a diet low in sugar, to avoid a variety of medical complications.
But in dolphins, the resistance appears to be advantageous. Dr Venn-Watson explained that the mammals may have evolved this fasting-feeding switch to cope with a high-protein, low-carbohydrate diet of fish.
"Bottlenose dolphins have large brains that need sugar," Dr Venn-Watson explained. Since their diet is very low in sugar, "it works to their advantage to have a condition that keeps blood sugar in the body… to keep the brain well fed".
But other marine mammals, such as seals, do not have this switch, and Dr Venn-Watson thinks that the "big brain factor" could be what connects human and dolphin blood chemistry.
"We're really looking at two species that have big brains with high demands for blood glucose," she said.
"And we have found changes in dolphins that suggest that [this insulin resistance] could get pushed into a disease state. If we started feeding dolphins Twinkies, they would have diabetes."
Genetic link
Since both the human genome and the dolphin genome have been sequenced, Dr Venn-Watson hopes to work with medical researchers to turn the discovery in dolphins into an eventual treatment for humans.
"There is no desire to make a dolphin a lab animal," she said. "But the genome has been mapped - so we can compare those genes with human genes."
Scientists at the Salk Institute in San Diego have already discovered a "fasting gene" that is abnormally turned on in people with diabetes, "so maybe this is a smoking gun for a key point to control human diabetes", Dr Venn-Watson said.
If scientists can find out what switches the fasting gene on and off in dolphins, they may be able to do the same thing in people.
Lori Schwacke, a scientist from the National Oceanic and Atmospheric Administration (NOAA) in Charleston, South Carolina, said that the work demonstrated that there are interesting similarities between dolphins and humans.
Dr Schwacke, who is studying the effect of pollution on dolphins along the coast of the US state of Georgia, is also interested in the links between dolphin and human health.
"There are several interesting diseases that you only see in humans and dolphins," she told BBC News. In this case, Dr Venn-Watson said, "the fundamental difference is that dolphins can switch it off and humans can't".

http://news.bbc.co.uk/1/hi/sci/tech/8523412.stm

The future of medicine: Gene Testing

Gene test to see which diet is best for you: http://news.bbc.co.uk/1/hi/health/8550091.stm
Gene test to identify best chemotherapy drugs for cancer patients: http://news.bbc.co.uk/1/hi/health/8539502.stm
Studying cancer genomes: http://www.guardian.co.uk/science/2009/dec/16/cancer-genome-sequences-genetic-mutations

Human gut microbes hold 'second genome'



Clostridium difficile bacteria, a normal inhabitant of the human gut
The human gut holds microbes containing millions of genes, say scientists.
In fact, there are more genes in the flora in the intestinal system than the rest of our bodies. So many that they are being dubbed our "second genome".
A study published in the journal Nature details the analysis of the genes, carried out to better understand how the gut flora is affected by disease.
"Basically, we are a walking bacterial colony," said Professor Jeroen Raes, one of the researchers involved.
"There is a huge diversity. We have about 100 times more microbial genes than human genes in the body. We also have 10 times more bacterial cells in our body than human cells," he told BBC News. Most of the microbes present in our bodies live in the gut.
The study was led by Professor Jun Wang from the Beijing Genomics Institute-Shenzhen.
Scientists from Germany, Belgium, Denmark, Spain, France and the UK also took part in the international effort, named the European MetaHIT consortium, which has been co-ordinated by Dr Stanislav Dusko Ehrlich.
"Everyone was so motivated," said Dr Dusko Ehrlich. "To have such an exciting project to run - it's a piece of cake. The work went much faster than we expected."
Professor Raes, who works at Vrije Universiteit Brussel, explained why the microbes warranted such an intensive study: "Gut flora is crucial for our health. We're basically living in symbiosis with these microbes.
"The bacteria help digest food, provide vitamins, protect us from invading pathogens. If there's a disturbance, people get all sorts of diseases such as Crohn's disease, Ulcerative colitis, and a link has also been made to obesity."
Untangling a mess
The researchers have developed what is called a metagenome, a combined genome of all the bacteria sequenced at once.
"This creates a huge dataset that has to be disentangled," explained Professor Raes. "The untangling of this mess is what I do; it's my role in the study."
The team analysed faecal matter from 124 Europeans and found each person had about 160 bacterial species. The samples were more alike than they had expected and a significant fraction of the bacteria was shared between all the people who took part.
By mapping the genes, the scientists have found a way around the problem of having to culture bacteria in order to study them.
Many bacteria are very difficult to grow in cultures in the lab. From looking at the genes, the researchers hope to be able to investigate how the flora changes when a person has a disease.
"It will allow us to understand diseases better," said Professor Raes. "We know there is a microbial component but we don't know exactly how [it works]. We will use it for prognostic and diagnostic markers so we can predict disease severity or sensitivity to these diseases."
Dr Dusko Ehrlich said the work was showing promising results: "We have extremely interesting findings based on the results of this gene catalogue. We already have very exciting results in terms of differences between healthy and sick people."
Professor Elaine Holmes from Imperial College, London, who was not involved in the research, said it was a welcome advance on previous studies.
"The article is extremely timely given the escalating interest in the influence of the gut microbiota in many aspects of health ranging from Irritable Bowel Disease, sepsis and obesity to autism," she told BBC News.
"It uses a large number of participants and therefore one assumes it is more representative of the 'real' microbial composition than previous studies. Also, it is an amazing feat of data processing."

http://news.bbc.co.uk/1/hi/sci/tech/8547454.stm

Neural Networks in PHP

PHPclasses.org recently published an article about a PHP implementation of Neural Networks by neuralmesh.com. The great thing about this is you don't really need to understand the inner workings of NNs but you can just use their framework to get things done. This might come in very handy in the future: http://www.phpclasses.org/blog/post/119-Neural-Networks-in-PHP.html

Just one thing I noticed, their Connect 4 NN doesn't learn very well at all.

Ancient DNA

I can't believe this is something I missed all these years. I was always under the impression that biological molecules break down very quickly upon death but apparently they have been discovering fragments (<1kb) of ancient DNA for over 30 years. Basically, they find ultra small DNA samples in fossils and inside amber (like Jurassic Park), make a whole load of copies using PCR and then analyse it. Recently they managed to extract DNA from the eggshells of the "Elephant bird" (a bit like a huge emu).

http://scienceblogs.com/notrocketscience/2010/03/dna_from_the_largest_bird_ever_sequenced_from_fossil_eggshel.php
http://www.newscientist.com/article/mg14119104.600-fact-fiction-and-fossil-dna-analysis-of-ancient-dna-should-give-clues-about-the-origin-of-species-and-how-they-evolved-over-time-but-only-if-the-dna-really-is-ancient.html?full=true
http://findarticles.com/p/articles/mi_m1200/is_n21_v146/ai_15951710/
http://www.dailymail.co.uk/sciencetech/article-1026340/Jurassic-Park-comes-true-How-scientists-bringing-dinosaurs-life-help-humble-chicken.html

Thursday 4 March 2010

PHP anonymous function (closure) serialization

Serialization is the conversion of object states, arrays and simple variables into a String type to allow transmission and storage for later use or by another script. PHP provides the serialize() and unserialize() functions to deal with this but after experimenting with anonymous functions (also called closures) which I blogged about earlier, I found out that trying to serialize a closure threw an exception because anonymous functions are actually internal PHP Closure objects that have serialization disabled.

I thought - "there's a gap that needs to be filled", and set out to write my second class for phpclasses.org but after a quick google search I found this (Extending PHP 5.3 Closures with Serialization and Reflection), and I must say, I'm glad I'm not the one who wrote the code for this class. The code is complex and brilliant, check it out!
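
To see the problem for yourself, here's a minimal sketch (assuming PHP 5.3 or later; the variable names are just mine for illustration). As far as I know the failure surfaces as a plain Exception, so that's what I catch. Ordinary arrays still round-trip fine, which is what makes the closure case stand out:

```php
<?php
// A closure is an instance of PHP's internal Closure class
$greet = function ($name) { return "Hello, $name"; };

try {
    serialize($greet);            // not allowed for Closure objects
    $result = 'serialized';
} catch (Exception $e) {
    $result = 'failed: ' . $e->getMessage();
}
echo $result, PHP_EOL;            // prints a message starting with "failed:"

// Plain data survives a serialize/unserialize round trip unchanged
$data = array('name' => 'bob', 'scores' => array(1, 2, 3));
$copy = unserialize(serialize($data));
var_dump($copy === $data);        // bool(true)
```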

Sunday 28 February 2010

Dog Breeds - A cool Bing feature

Check this out (click the image). Basically it lists the dog breeds alphabetically by image:



I wish they'd do this for cats. Here's a few other things they do: http://www.bing.com/visualsearch

Monday 22 February 2010

Dolly & Royana, we hardly knew ye!


On this day (22 Feb) in 1997, Dolly the sheep, the first mammal to have been successfully cloned from an adult cell, was announced to the world in Scotland (she was born in July 1996). http://news.bbc.co.uk/onthisday/hi/dates/stories/february/22/newsid_4245000/4245877.stm


And on this day in 2010, Royana, Iran's (and the Middle East's) first cloned sheep, was put down. http://www.presstv.ir/detail.aspx?id=119220&sectionid=3510208

RIP sheep.

PHP now supports anonymous functions

Did you know PHP 5.3 now supports anonymous functions? Let's look at some code.

This is how you used to do it, the old way:

<?php
//the array we're working with
$arr = array(1,2,3,4,5);

//declare a function for use as callback
function callback($i){
    echo $i;
}

//use the callback
array_walk($arr, 'callback'); //outputs 12345
?>

Now, this is the new way:

<?php
$arr = array(1,2,3,4,5);

//use anonymous callback method
array_walk($arr, function($i){echo $i;}); //outputs 12345

?>

And here's another interesting thing you can now do with PHP:

<?php
$arr = array(5,4,3,2,1);

//create a variable which will be a reference to a new function later
$fn = null;

//$fn's function is declared within the arguments of another function
array_walk($arr, $fn = function($i){echo $i;}); //outputs 54321

//calls the new function in $fn
$fn(68); //outputs 68
?>
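
Since a closure held in a variable is just a value, you can hand it to any function that expects a callback. Here's a small sketch, assuming PHP 5.3 or later; $square, $squares and $words are names I made up for illustration:

```php
<?php
// Keep a closure in a variable and pass it to array_map
$square = function ($n) { return $n * $n; };
$squares = array_map($square, array(1, 2, 3, 4, 5));
echo implode(',', $squares), PHP_EOL; // 1,4,9,16,25

// Or define the comparison inline for usort: shortest word first
$words = array('pear', 'fig', 'banana');
usort($words, function ($a, $b) { return strlen($a) - strlen($b); });
echo implode(',', $words), PHP_EOL;   // fig,pear,banana
```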

Did you know you could use external variables inside of closures? You need the use keyword and pass the variables into it separated by commas like so:

<?php
$basePath = "/usr/home/";

//the closing semicolon is needed: this is an assignment statement
$getPathFor = function($name) use ($basePath) {return "$basePath$name";};
echo $getPathFor("bob"); //outputs /usr/home/bob
?>

Nice.
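
One thing worth knowing: as far as I understand it, use copies the variable's value at the moment the closure is defined. If you want the closure to see later changes, capture by reference with &. A small sketch (assuming PHP 5.3 or later; $count, $byValue, $byRef, $ext and $getFileFor are made-up names for illustration):

```php
<?php
$count = 0;

// Captured by value: the closure keeps the 0 it saw when it was defined
$byValue = function () use ($count) { return $count; };

// Captured by reference: the closure sees the current value of $count
$byRef = function () use (&$count) { return $count; };

$count = 5;
echo $byValue(), PHP_EOL; // 0
echo $byRef(), PHP_EOL;   // 5

// Several variables go into use separated by commas
$basePath = "/usr/home/";
$ext = ".txt";
$getFileFor = function ($name) use ($basePath, $ext) {
    return "$basePath$name$ext";
};
echo $getFileFor("bob"), PHP_EOL; // /usr/home/bob.txt
```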

Pachnoda sinuata - That's one weird looking beetle

 

That's not its face... no, it's its bum! The first time I saw this I thought it was painted on for a joke but no, it's real! Here's what he looks like from above:

Lovely Photos for all you nature lovers

Sloth: I R Chillin'

Sloths: http://news.bbc.co.uk/earth/hi/earth_news/newsid_8498000/8498058.stm
Birds: http://news.bbc.co.uk/1/hi/sci/tech/8487031.stm
Plankton: http://news.bbc.co.uk/1/hi/sci/tech/8498786.stm

Morbidly obese 'may have missing genes'

A small number of extremely overweight people may be missing the same chunk of genetic material, claim UK researchers.
The findings, published in the journal Nature, could offer clues to whether obesity can be "inherited" in some cases.
Imperial College London scientists found dozens of people - all severely obese - who lacked approximately the same 30 genes.
The gene "deletion" could not be found in people of normal weight.
While much of the "obesity epidemic" currently affecting most Western countries has been attributed to a move towards high-calorie foods and more sedentary lifestyles, scientists have found evidence that genes may play a significant role in influencing weight gain in some people.
The latest study focused on the "morbidly obese", who have a Body Mass Index (BMI) of more than 40, and who are at the highest risk of health problems.
There are an estimated 700,000 of these people in the UK.
'Learning difficulties'
The first clue came by looking at a group of teenagers and adults with learning difficulties, who are known to be at higher risk of obesity, although the reasons for this are not entirely clear.
The researchers found 31 people who had nearly identical "deletions" in their genetic code, all of whom had a BMI of over 30, meaning they were obese.
Then a wider scan of the genetic makeup of a mixture of more than 16,000 obese and normal weight people revealed 19 more examples of the missing genes.
All of the people involved were classed as "morbidly obese", with a BMI of over 40, and at the highest risk of health problems related to their weight.
Most of them had been normal weight as toddlers, but then became overweight during later childhood.
None of the people studied with normal weight had the missing code.
The precise function of the missing genes is unclear, as is the precise nature of the relationship between learning difficulties and obesity - none of the people with the deletions in the wider study had learning problems.
Weight-loss surgery
Professor Philippe Froguel, from Imperial College, said: "It is becoming increasingly clear that for some morbidly obese people, their weight gain has an underlying genetic cause.
"If we can identify these individuals through genetic testing, we can then offer them appropriate support and medical interventions, such as the option of weight loss surgery, to improve their long-term health."
Dr Robin Walters, also from Imperial, said that while this particular set of deletions was rare - affecting some seven in 1,000 morbidly obese people - there were likely to be other variations yet to be found.
"The combined effect of several variations of this type could explain much of the genetic risk for severe obesity, which is known to run in families."
Dr Sadaf Farooqi, from Cambridge University, collaborated on this research and was involved in similar research published in December, which pointed to another gene flaw that could be linked to obesity.
She said it was likely that a "patchwork" of different genetic variations would eventually emerge to explain more cases of obesity - perhaps by affecting appetite, or the rate at which the body burns fat.
She said: "There is still an important public health message about diet and exercise, but simply blaming people for their obesity is no longer appropriate."

http://news.bbc.co.uk/1/hi/health/8496938.stm

Xenophyophores - That's one weird looking single-celled organism

Single-celled organisms are generally required to maintain microscopic sizes. Xenophyophores, immobile shell-making mud-stickers, however, brazenly ignore all requirements of general microbial decency by attaining sizes not merely macroscopic, but positively enormous (at least by unicell standards). One of the largest species, Stannophyllum venosum Haeckel 1889, is a broad flat form up to 25 cm across, although only about a millimetre thick.  Tendal (1972).

Despite such impressive dimensions, mention of them is likely to garner blank looks from most of the general public, and even from many biologists who probably should know better. This is because xenophyophores are restricted to the deep sea, not usually regarded as a prime holiday destination.  Those that are occasionally pulled up from below are probably not recognised.  Like benthic Steptoes, xenophyophores surround themselves with all sorts of junk they find lying around, which they use to make their shells, stuck together with a cement of polysaccharides.  Id.  Foraminiferan and radiolarian shells, sponge spicules, mineral grains – all are potential building materials (though individual species are often quite picky with regard to exactly what they use, and some species eschew foreign particles altogether). The particles used are referred to as xenophyae.  When the fragile test is brought up, these particles tend to all fall apart, and are hence not recognised as having once been part of a larger whole.
Image: Syringammina from the web page of J. Alan Hughes.

http://www.newscientist.com/article/dn18468-zoologger-living-beach-ball-is-giant-single-cell.html?DCMP=NLC-nletter&nsref=dn18468
http://www.palaeos.com/Eukarya/Units/Rhizaria/Xenophyophorea.html

Close encounters with Japan's 'living fossil', the Giant Salamander

It soon becomes clear that the giant salamander has hit Claude Gascon's enthusiasm button smack on the nose.
"This is a dinosaur, this is amazing," he enthuses.
"We're talking about salamanders that usually fit in the palm of your hand. This one will chop your hand off."
As a leader of Conservation International's (CI) scientific programmes, and co-chair of the Amphibian Specialist Group with the International Union for the Conservation of Nature (IUCN), Dr Gascon has seen a fair few frogs and salamanders in his life; but little, he says, to compare with this.
Fortunately for all of our digits, this particular giant salamander is in no position to chop off anything, trapped in a tank in the visitors' centre in Maniwa City, about 800km west of Tokyo.
But impressive it certainly is: about 1.7m (5ft 6in) long, covered in a leathery skin that speaks of many decades passed, with a massive gnarled head covered in tubercles whose presumed sensitivity to motion probably helped it catch fish by the thousand over its lifetime.
If local legend is to be believed, though, this specimen is a mere tadpole compared with the biggest ever seen around Maniwa.
A 17th Century tale, related to us by cultural heritage officer Takashi Sakata, tells of a salamander (or hanzaki, in local parlance) 10m long that marauded its way across the countryside chomping cows and horses in its tracks.
A local hero was found, one Mitsui Hikoshiro, who allowed the hanzaki to swallow him whole along with his trusty sword - which implement he then used, in the best heroic tradition, to rend the beast from stem to stern.
It proved not to be such a good move, however.
Crops failed, people started dying in mysterious ways - including Mr Hikoshiro himself.
Pretty soon the villagers drew the obvious conclusion that the salamander's spirit was wreaking revenge from beyond the grave, and must be placated. That is why Maniwa City boasts a shrine to the hanzaki.
The story illustrates the cultural importance that this remarkable creature has in some parts of Japan.
Its scientific importance, meanwhile, lies in two main areas: its "living fossil" identity, and its apparently peaceful co-existence with the chytrid fungus that has devastated so many other amphibian species from Australia to the Andes.
Close family
"The skeleton of this species is almost identical to that of the fossil from 30 million years ago," recounts Takeyoshi Tochimoto, director of the Hanzaki Institute near Hyogo.
"Therefore it's called the 'living fossil'."
The hanzaki (Andrias japonicus) only has two close living relatives: the Chinese giant salamander (A. davidianus), which is close enough in size and shape and habits that the two can easily cross-breed, and the much smaller hellbender (Cryptobranchus alleganiensis) of the south-eastern US.
Creatures rather like these were certainly around when dinosaurs dominated life on land, and fossils of the family have been found much further afield than their current tight distribution - in northern Europe, certainly, where scientists presumed the lineages had gone extinct until tales of the strange Oriental forms made their way back to the scientific burghers of Vienna and Leiden a couple of centuries ago.
"They are thought to be extremely primitive species, partly due to the fact that they are the only salamanders that have external fertilisation," says Don Church, a salamander specialist with CI.
The fertilisation ritual must be quite some sight.
Into a riverbank den that is usually occupied by the dominant male (the "den-master") swim several females, and also a few other males.
The den-master and the females release everything they have got, turning incessantly to stir the eggs and spermatozoa round in a roiling mass.
Maybe the lesser males sneak in a package or two as well; their function in the ménage-a-many is not completely clear.
When the waters still, everyone but the den-master leaves; and he alone guards the nest and its juvenile brood.
It is not an ideal method of reproduction.
Research shows that genetic diversity among the hanzaki is smaller than it might be, partly as a result of the repeated polygamy, which in turn leaves them more prone to damage through environmental change.
But for the moment, it seems to work.
Outside the breeding season, the salamander's life appears to consist of remaining as inconspicuous as possible in the river (whether hiding in leaves, as the small ones do, or under the riverbanks like their larger fellows) and snapping whatever comes within reach, their usual meandering torpor transformed in an instant as the smell of a fish brushes by.
The adults' jaws are not to be treated lightly.
Among Dr Tochimoto's extensive collection of photos is one of bloodied human hands; and as he warns: "you may be attacked and injured; please be careful".

When the chytrid fungus was identified just over a decade ago, indications were that Japan would be an unlikely place to look for its origins.
With the discovery of chytrid on museum specimens of the African clawed frog (Xenopus laevis), an out-of-Africa migration spurred by human transportation of amphibians once seemed the simple likelihood.
But just last year, a team of researchers led by Koichi Goka from Japan's National Institute for Environmental Studies published research showing that certain strains of chytrid were present on Japanese giant salamanders, and only on Japanese giant salamanders, including museum specimens from a century or so back; and that the relationship seemed benign.
AMPHIBIANS: A QUICK GUIDE
First true amphibians evolved about 250m years ago
There are three orders: frogs (including toads), salamanders (including newts) and caecilians, which are limbless
Adapted to many different aquatic and terrestrial habitats
Present today on every continent except Antarctica
Many undergo metamorphosis, from larvae to adults
The hanzaki-loving strains of chytrid appear to differ from those that are proving so virulent to amphibians now.
Unravelling all that, says Don Church, might tell us something about the origins and spread of chytrid - and there is so much diversity among Japanese chytrid strains that the country is now being touted as a possible origin, as diversity often implies a long evolutionary timeframe.
More importantly, the discovery might also provide options for treating the infection.
"In the case of the North American salamanders, what was found was that they have bacteria living on their skin that produce peptides that are lethal to the amphibian chytrid fungus," says Dr Church.
"And those bacteria might be able to be transplanted to other species that can't fight off the fungus."
This is a line of research that is very much in play in laboratories around the world.
It appears likely now that studies of the Japanese giant salamander can expand the number of chytrid-fighting bacteria known to science, and so extend the options for developing treatments for an infection that currently cannot be controlled in the wild.
But that can only come to pass if the giant salamanders endure; something that is not guaranteed, with the challenges they face in modern Japan including, perhaps, new strains of chytrid itself.
There is as yet no modern hero able to still the pace of habitat loss or prevent invasion from rival species.