Monday, April 28, 2008

Help Wanted

Things are moving along very quickly here at work and we've hit our first big milestone. It kind of feels like we're just over the top of the first big climb on a roller coaster... things are about to start getting wild! There is a lot of software work to be done and it looks like we have a number of openings in software development that we're hoping to fill soon.

In general, we're looking for bright, self-motivated, and effective people who can come on board, happily pick up some new projects, and jump right into making some software. In particular, we're looking for Java and C++ programmers for a variety of projects including production database management, image processing, and algorithm development, to name a few.

The job postings are not yet up on the website, but they should be posted some time this week. If you know a person interested in a new opportunity with a cool and fun company, please direct them either to me or to the hiring manager, Erik.

Wednesday, April 16, 2008

Here comes the science, Part 3

Before I went on hiatus it was just about time to talk about the software that I've been working on and how it pertains to the process of working with optical genome maps and actually doing something interesting with them.

Building a database
As I mentioned in previous posts, the most interesting thing you can do with an optical map is to compare it to other maps of similar genomes to look at similarities and differences. But in order to do that you need a repository of maps and a way of categorizing and searching that repository to find what you're looking for. We didn't have anything like that when I started, so it was the first thing I worked on, and we now have a nicely categorized, searchable database of over 40,000 genome maps.

Making maps in software
Creating optical maps is currently a time-consuming process, and we'd need a lot of people to make 40,000 maps by hand. The vast majority of the maps in the database are what are called "in-silico" maps, which is a cutesy way of saying that they were made in software. When you think about what mapping is, you're taking little bits and pieces of DNA, cutting them up with an enzyme, and then measuring the fragments that get created. We don't necessarily know the actual DNA sequence of that genome, and it's actually irrelevant for the purposes of creating optical maps (that can be very helpful, as I'll describe later). However, there are plenty of people out there who are working hard at sequencing the genomes of all sorts of organisms. We can take those sequences (the literal nucleotide sequence, e.g. ATCGGACT) and simulate the process of applying a restriction enzyme to cut that sequence into fragments to create in-silico maps. Luckily, someone out there already wrote libraries for doing these sorts of things, so it was pretty easy to use that code to populate our database.
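To make the idea of an in-silico map concrete, here is a minimal sketch in plain Python. It is not the library code mentioned above: the function name, its parameters, and the choice of enzyme are assumptions made for illustration. It uses BamHI, which recognizes GGATCC and cuts after the first G, treats the sequence as linear, and only scans one strand (which is enough here because the site is palindromic).

```python
def in_silico_digest(sequence, site="GGATCC", cut_offset=1):
    """Return the ordered fragment lengths produced by cutting `sequence`
    at every occurrence of the recognition `site`.

    Hypothetical helper for illustration only. Defaults model BamHI
    (G^GATCC): the cut falls `cut_offset` bases into the site.
    The sequence is assumed to be linear, not circular.
    """
    sequence = sequence.upper()

    # Collect the position of every cut along the sequence.
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)

    # Fragment lengths are the distances between consecutive boundaries.
    boundaries = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


# A made-up 24-base sequence containing two BamHI sites.
toy_sequence = "AAAA" + "GGATCC" + "TTTTTT" + "GGATCC" + "CC"
toy_map = in_silico_digest(toy_sequence)
print(toy_map)
```

The resulting list of fragment sizes, in order, is essentially what a map is. A real genome would produce fragments thousands of bases long rather than a handful.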

Comparing maps
Really, the critical functionality of the software I'm working on is the ability to compare maps to each other. In a nutshell, we compare maps by looking at the series of fragments in each map and using some complicated math that I'll likely never understand to figure out whether or not they're "close enough" to each other to confidently say that they probably represent the same underlying DNA structure. While it's very likely that the actual DNA sequences are different in some respects, those differences are small enough that they don't show up at the map level, and we can reasonably assume that these are regions of similarity between the genomes. Using maps, these similarities and differences are very easy to visualize. Here's an example of a couple of similar strains of P. aeruginosa:


The purple parts are regions of similarity while the white parts represent regions that do not appear to be similar at all. It's immediately obvious where these particular strains differ and where they appear to have common structure.
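For a rough feel of what "close enough" could mean, here is a toy sketch in Python. It is emphatically not the real comparison math, which has to account for things like measurement error and missing or extra cuts. This version simply treats two fragments as matching when their sizes agree within a relative tolerance, then uses a longest-common-subsequence style pass to line up as many fragments as possible in order. The function names, the 10% tolerance, and the fragment sizes are all assumptions made up for the example.

```python
def similar(a, b, tolerance=0.1):
    """Two fragment sizes match if they differ by at most `tolerance`
    (as a fraction) of the larger one. The threshold is an assumption."""
    return abs(a - b) <= tolerance * max(a, b)


def match_fragments(map_a, map_b, tolerance=0.1):
    """Return (index_in_a, index_in_b) pairs of fragments that line up,
    in order, maximizing the number of matched fragments."""
    n, m = len(map_a), len(map_b)

    # best[i][j] = most matches achievable using map_a[i:] and map_b[j:].
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if similar(map_a[i], map_b[j], tolerance):
                best[i][j] = best[i + 1][j + 1] + 1
            else:
                best[i][j] = max(best[i + 1][j], best[i][j + 1])

    # Walk the table from the start to recover one best set of pairs.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if similar(map_a[i], map_b[j], tolerance):
            pairs.append((i, j))
            i += 1
            j += 1
        elif best[i + 1][j] >= best[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Made-up fragment sizes for two hypothetical strains.
strain_a = [50, 120, 30, 200, 75]
strain_b = [52, 118, 400, 198, 77]
matches = match_fragments(strain_a, strain_b)
print(matches)
```

In terms of the picture, the matched pairs play the role of the purple regions, and any fragment index that never appears in a pair falls in a white region.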

Extracting meaning
All of this leads up to my final point, which is how you can use this map-comparison software to extract meaning. As we know, the DNA structure of organisms dictates what they look like and what they are capable of in the physical world. In our particular realm we're mainly looking at bacteria, specifically bacteria that make people sick. There are a lot of species of bacteria that make people sick and, within those species, there are several sub-species or strains that act differently. Some are particularly nasty, some are resistant to certain antibiotic medications, and some are just run-of-the-mill. Since these strains are all of the same species, they (frequently, but not always) end up sharing a lot of similar DNA. So by comparing maps of the different strains you can fairly easily see places where the DNA structure differs, and that can really help you isolate the region of the genome that is causing a particular strain to be especially nasty.

I guess that's about it for now... this is getting pretty long. I may revisit this topic a little later as more code gets written and I start in on more new things. Right now I'm kind of in the middle of a round of bug-fixing and polish, and that's just not that interesting!! Bye for now.

Brushing the dust off

It's embarrassing to say it, but it's been almost 3 months since my last blog post. I've had a few people comment on this, and my wife actually asked if she should remove my blog from her RSS aggregator... ouch!! Well, my only comment is that I took a while off when our daughter, Sylvia, was born in early February, and I got out of the habit. But I'm back in the saddle now and looking forward to posting regularly again.