My PhD research was in the field of synthetic biology, with Timothy Lu at MIT.
I invented a method to generate and screen combinatorial genetic perturbations in O(1) time. Here’s what that means and why it’s significant.
First, the basics. Genes are the foundation of our biology. They code for proteins that have functions in our cells and bodies, and these proteins interact with each other in dense networks. We understand only a small fraction of all the functions these genes likely have, and especially little about how they relate to disease. For example, the mechanisms of diabetes and heart disease are still frustratingly poorly understood.
You can study the function of genes by artificially turning gene expression way up or way down (like a volume knob) in cells. For example, if you have a mysterious gene that you turn up, and suddenly the cells show signs of cancer, then there’s a decent chance the gene has something to do with cancer.
A powerful way to study biology is to run these studies across every single gene that you have. Humans have about 20,000 genes. It’d be really cool to crank each of these up, individually, and see what the effects are. For example, you might find that 300 of your 20,000 genes lead to a diabetes-like state – this is now a great starting point to dig deeper. This is called genome-wide screening and has already been heavily explored.
The more difficult and interesting question is – how do combinations of genes cause effects? Learning about single genes is useful, but remember that genes belong to networks, and networks are what drive biology. If you can study how combinations of genes cause disease, then you get a much clearer picture. For example, out of your 300 genes that led to diabetes above, if you found that 10 pairs, when cranked up together, led to a super-severe diabetic state, then this would be an even better place to start. You’d also find surprising combinations of genes that interact with each other that you couldn’t have predicted a priori.
Now here lies the problem – creating and studying these combinations is incredibly labor intensive. Using traditional techniques, you would need to create and study every single one of those combinations separately. For example, using the most brute-force method, if you want to create pairwise combinations of every gene with each other, that would require 20,000 x 20,000 = 400 million separate experiment setups. This would require equipment the size of a football field, not to mention the prohibitive costs of reagents and labor. (In computer science terms, this is O(n^2) time.)
And higher-order combinations above pairwise – forget about it. A 3-wise combination would require 8 trillion experiments, and a 4-wise combination would require 160 quadrillion experiments. Thus, if you wanted to run an m-wise combination experiment, this would require O(n^m) time.
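To make the scaling concrete, here is a minimal Python sketch of the arithmetic above. The function name `brute_force_experiments` is made up for illustration, and it assumes, as the text does, that every ordered m-wise combination of n genes gets its own experiment.

```python
def brute_force_experiments(n_genes: int, order: int) -> int:
    """Number of separate experiments if every ordered m-wise
    combination of n genes is built and tested on its own."""
    return n_genes ** order


if __name__ == "__main__":
    n = 20_000  # approximate number of human genes, as in the text
    for m in (1, 2, 3, 4):
        print(f"{m}-wise: {brute_force_experiments(n, m):,} experiments")
```

The exponent is the whole story here: each extra gene in the combination multiplies the work by another factor of n.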
And so, finally, my research (as with much of science, it takes a long time to explain why the research actually matters).
In short, I invented a method that allows you to generate large combinations of genes and study their effects in an incredibly short period of time. Specifically, creating the initial library of genes takes O(n) time, and scaling up to each higher-order combination afterward takes O(1) time. This is a big improvement over O(n^m) time.
Functionally, this means:
- the most laborious step is the first one: creating the gene expression units. For every gene you want to study (and crank up or down) you need to create it manually. Say we want to study 2,000 genes – this will require creation of 2,000 separate gene units. (O(n) step)
- to create a pairwise combination, we need to do only ONE reaction in ONE tube. This creates all 4,000,000 pairwise combinations in one combined population.
- to experiment with this population, we just need to
- to create a 3-wise combination, we again need to do only ONE reaction in ONE tube. This creates all
Meaning, you could create 1 million 3-wise combination variants in a single afternoon in one reaction, instead of requiring a million separate reactions. Afterward, you can study the entire population in a single tube, instead of needing a million separate reaction wells.
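The comparison above can be sketched as a simple counting model. This is not a lab protocol, just bookkeeping under stated assumptions: one manual step per gene unit, then one pooled reaction for each additional order of combination. The function names are hypothetical.

```python
def library_size(n_genes: int, order: int) -> int:
    """Distinct ordered combinations present in a pooled m-wise library."""
    return n_genes ** order


def pooled_hands_on_steps(n_genes: int, order: int) -> int:
    """Assumed hands-on steps for the pooled approach: build each gene
    unit once (the O(n) step), then one one-pot reaction per extra order."""
    return n_genes + (order - 1)


if __name__ == "__main__":
    n = 2_000  # number of gene units, as in the example above
    for m in (2, 3):
        print(
            f"{m}-wise: {library_size(n, m):,} combinations from "
            f"{pooled_hands_on_steps(n, m):,} hands-on steps"
        )
```

In this model the library grows as n^m while the hands-on work grows only by one step per order, which is the point of the one-tube reaction.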
The technical details of this are:
- each genetic construct (eg an inducible overexpression or knockdown unit) is created in a plasmid and labeled with a genetic barcode
- the barcode and the construct have restriction sites in specific places to allow splicing in of other barcoded constructs
- after a digestion and ligation reaction, the barcodes are now in tandem, as are the genetic constructs. This graphic will clarify things:
- this reaction can be iterated to create 3-wise, 4-wise, 5-wise, etc. combination libraries
- the barcodes in sequence can be read via Illumina HiSeq, allowing one-pot reactions and assays
- these plasmid libraries are transformed into cellular populations
- by comparing an experimental induced population with a control population, you detect changes in the frequency of barcode combinations across the population. This can suggest synergies between genes specific to your phenotype
- for example, if you were interested in proliferation as a phenotype, genetic combinations that became over-represented would suggest a boost to proliferation, whereas combinations that disappeared may represent toxic combinations
- this is a generalized technique applicable to cells of your choice, from bacterial to human
Using this method, you can find unexpected synergies between genes, as well as study interactions at the network level.
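The barcode comparison step can be illustrated with a short Python sketch. This is a toy version under my own assumptions, not the analysis pipeline from the paper: reads are represented as tuples of barcode IDs, a pseudocount is added so that combinations missing from one population still get a finite score, and the function name `barcode_log2_fold_change` is hypothetical.

```python
import math
from collections import Counter


def barcode_log2_fold_change(control_reads, induced_reads, pseudocount=1.0):
    """Compare barcode-combination frequencies between two populations.

    Each argument is an iterable of barcode combinations (e.g. tuples of
    barcode IDs), one entry per sequencing read. Returns a dict mapping
    each combination to log2(induced frequency / control frequency).
    """
    control = Counter(control_reads)
    induced = Counter(induced_reads)
    combos = set(control) | set(induced)

    # Pseudocounts keep the ratio finite when a combination is absent
    # from one population (e.g. a toxic combination that dropped out).
    control_total = sum(control.values()) + pseudocount * len(combos)
    induced_total = sum(induced.values()) + pseudocount * len(combos)

    scores = {}
    for combo in combos:
        control_freq = (control[combo] + pseudocount) / control_total
        induced_freq = (induced[combo] + pseudocount) / induced_total
        scores[combo] = math.log2(induced_freq / control_freq)
    return scores


if __name__ == "__main__":
    # Toy reads with made-up barcode pairs, not real data.
    control = [("A", "B")] * 10 + [("A", "C")] * 10 + [("B", "C")] * 10
    induced = [("A", "B")] * 25 + [("A", "C")] * 10 + [("B", "C")] * 1
    scores = barcode_log2_fold_change(control, induced)
    for combo, score in sorted(scores.items()):
        print(combo, round(score, 2))
```

With proliferation as the phenotype, a positive score marks a combination that became over-represented in the induced population, and a strongly negative score marks one that dropped out. A real analysis would also need replicates and a statistical test before calling anything a synergy.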
Limitations of the Technique
The above description is a simplification, and as with all techniques there are limitations that cap how many unique combinations you can really study:
- the transformation/transfection efficiency limits how many unique library members you can get into a population. In bacteria, you might get 10^6 transformants per reaction, which means even if you have 10^8 unique library members in plasmids, most won’t make it into the population. You can of course run more transformations, then pool the population to form your experimental population
- the Illumina HiSeq currently caps at around 5 billion reads, and to detect differences in your population, you’ll want to be able to sample each unique combination multiple times
So, for now, it’s still difficult to assay trillions of combinations, but you can get into the hundreds of thousands and millions reasonably.
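A rough back-of-the-envelope version of these two limits, in Python. The function names are hypothetical, and the 100 reads per combination used in the example is my own illustrative choice, not a number from the research. The estimates are simple averages that ignore uneven sampling.

```python
def max_combinations_by_reads(total_reads: int, reads_per_combo: int) -> int:
    """Largest library that still averages `reads_per_combo` reads per member."""
    return total_reads // reads_per_combo


def transformations_needed(
    library_size: int, transformants_per_reaction: int, fold_coverage: int = 1
) -> int:
    """Pooled transformations needed so total transformants reach at least
    `fold_coverage` times the library size (integer ceiling division)."""
    needed = library_size * fold_coverage
    return (needed + transformants_per_reaction - 1) // transformants_per_reaction


if __name__ == "__main__":
    # 5 billion reads (from the text) at an assumed 100 reads per combination.
    print(f"{max_combinations_by_reads(5_000_000_000, 100):,} combinations")
    # 10^8 library members at 10^6 transformants per reaction (from the text).
    print(f"{transformations_needed(10**8, 10**6):,} transformations")
```

Because uptake and sequencing are both random sampling processes, real experiments need a comfortable margin above these averages to see every library member.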
Where’s the Research Now?
I published my thesis work in PNAS in 2013, where I looked for combinations of genes that would overcome antibiotic resistance. We also patented the technique, now called “massively parallel combinatorial genetics” or “CombiGEM – Combinatorial Genetics en Masse.”
CombiGEM has since been applied to chemotherapy resistance and to CRISPR for discovery of effective drug combinations in the Lu lab.
There are more applications of CombiGEM that would be cool to pursue:
- combine targets of FDA-approved drugs, to discover unexpected drug synergies to treat disease
- discover combinations of transcription factors that lead to stem cell differentiation to your endpoint of choice. This is usually done manually, by eliminating genes one by one in a labor-intensive process
- combine genes implicated in GWAS (genome-wide association studies) to find interactions that lead to disease. This helps address the issue that GWAS genes don’t explain much of the heritability of disease
Healthcare’s a tough industry to innovate in, requiring the combined expertise of technologists, health practitioners, and business operators. But these people rarely have chances to intersect. This means builders don’t know what health problems to solve, while practitioners know what problems exist but don’t know how to create a sustainable solution.
So in 2011, I co-founded Hacking Medicine with Elliot Cohen and Zen Chu, creating one of the first health hackathons. The idea was simple – bring engineers, doctors, and businesspeople together over one weekend to hack on health problems. Few projects actually became real companies, but the important result was creating a community around healthcare innovation and building working relationships that last for years.
Since then, Hacking Medicine’s grown to become an incredible community of healthcare innovators worldwide. The organization partners to put on dozens of health hackathons worldwide and runs the flagship annual Grand Hack at MIT.