Human Genome Structural Variation, Disease, and Evolution – Evan Eichler

[applause] Dr. Evan Eichler:
So it really is, indeed, a pleasure to be here. I was here I think five years ago last
time and my message has stayed — maybe not hopefully — but has stayed very much consistent.
As Jim alluded to, our lab has been focused on what some people have called the darkest
matter of the Human Genome. We’re specifically interested in regions that changed very, very
rapidly, specifically within the human species; areas of the Genome that have been proven
to be dynamic both in terms of structure, their organization, and in terms of their
evolution. As he mentioned, specifically we’re focused,
as one aspect of that study, on regions of the Genome that are highly duplicated. And
so these come in two different flavors. These are duplicated sequences that are either duplicated
within a chromosome known as intrachromosomal duplications or duplicated between non-homologous
chromosomes known as interchromosomal duplications. The reasons we’re interested in this are
really two-fold. One, these are dynamic by the fact that they have sequence identity
at very high levels, leading to non-allelic homologous recombination promoting additional rearrangement
events at the specific sites. The second reason is that if you believe the work of Susumu
Ohno and others duplication is the primary force by which new genes and gene families
evolve. So we’re interested in these regions really from the perspective of dynamic mutation,
de novo mutation associated with disease, and second from the perspective of the evolution
potentially of new genes and gene families within human. And both of those topics I want
to discuss today. So just to summarize the work that came from
really analyzing the whole Genome, or really the finished Human Genome, this is the pattern
of the largest and most identical duplications within our Genome; the blue lines representing
these large blocks, greater than 95 percent, greater than 20 kb in size, of intrachromosomal
duplications. And you’ll notice from this that a lot of our duplications are essentially
interspersed. So if you look at chromosome seven, you find a lot of the pairwise alignments,
the large ones, are separated by megabases of sequence. If you add the interchromosomal
pattern you get something like this. So this is the pattern of the Human Genome that has
been relatively constant to the new assemblies. And most of this data, I should point out,
came from BAC-based sequencing. This is the sequencing of large insert clones. If you
go back and look at some of the first published whole Genome shotgun assembly versions of
the Human Genome, these areas of the Genome are completely missing. The important point
is about 60 percent of the large duplications within our Genome are interspersed. That is
to say, they are separated by at least a megabase from their nearest neighbor, or they map to
another chromosome. If you contrast this with some recent data that we’ve done with Deanna
Church and looking at kind of the comparable finished version of the mouse Genome, this
is the pattern that you see for the most identical and the largest duplications within mouse. So in this, the total amount of duplicated
sequence now turns out to be very similar to what we saw initially with human; roughly
five percent of the Genome. But you’ll notice two things. The actual locations of these
are fewer in number, so there are about half the number of sites in the mouse Genome that
are highly duplicated. And the second thing you’ll notice is that essentially most of
the lines, the blue lines here, which indicate intrachromosomal duplications, are right on
top of one another, suggesting that most of the duplications in mouse, about 82 percent
of them, are tandem, that is to say, clustered, in orientation. So this difference between
man and mouse in terms of finished BAC-based sequence assembly has important ramifications,
both in terms of evolution, the fact that you can juxtapose different pieces of DNA
creating complex configurations that you don’t see in closely related species, and also in
terms of disease. So its importance in terms of disease comes
from really some of this seminal work from Jim Lupski and others
in the early 1990s, and the idea is very straightforward. If you have duplicated sequences within a
Genome you can trick the recombination machinery during meiosis to recombine where it shouldn’t.
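
As a toy illustration of that misalignment (not something shown in the talk), the sketch below models a chromosome as a list of labeled blocks and exchanges arms between two homologs at a duplicated block. The block labels, the function name `unequal_crossover`, and the list representation are all assumptions made for illustration.

```python
# Toy model of non-allelic homologous recombination (NAHR).
# "DUP" stands for a copy of the duplicated sequence; "A", "B", "C" are
# placeholder labels for unique genes lying between the two copies.

def unequal_crossover(chrom_1, chrom_2, index_1, index_2):
    """Recombine two homologs at the blocks chrom_1[index_1] and chrom_2[index_2].

    The two blocks must be copies of the same duplicated sequence. Each
    product keeps the left arm of one homolog and the right arm of the other.
    """
    if chrom_1[index_1] != chrom_2[index_2]:
        raise ValueError("crossover must occur between matching blocks")
    product_1 = chrom_1[:index_1 + 1] + chrom_2[index_2 + 1:]
    product_2 = chrom_2[:index_2 + 1] + chrom_1[index_1 + 1:]
    return product_1, product_2


homolog = ["DUP", "A", "B", "C", "DUP"]

# Correct pairing: the first copy aligns with the first copy, nothing changes.
normal_1, normal_2 = unequal_crossover(homolog, homolog, 0, 0)

# Mispairing: the last copy on one homolog aligns with the first copy on the other.
duplication, deletion = unequal_crossover(homolog, homolog, 4, 0)

print(normal_1)
print(duplication)
print(deletion)
```

With correct pairing both products match the parental arrangement. With mispairing, one product carries the intervening blocks twice and the reciprocal product retains only a single copy of the duplicated block.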
So here’s showing two of the four chromosomes aligning during meiosis, the duplicated sequence
shown in green, and non-allelic homologous recombination, also
known as an unequal crossing-over event, occurring, leading to gametes that have accumulated an
additional copy of that duplicated sequence or have lost a copy of that duplicated sequence. So the really important part is that if these
were essentially interspersed, imagine intrachromosomal duplications now with unique sequences encoding
genes A, B, and C, genes A, B, and C get taken along for the ride. So in addition to producing
gametes that have additional copies of that duplicated sequence, we now have gametes that
have additional copies of genes A, B, and C, and we have gametes that have lost copies
of A, B, and C. And so if those genes are triplosensitive, haploinsufficient, or imprinted,
the result is disease. And so there are at this point about 30 different syndromes in
the human population that are caused precisely by this mechanism. It is not really a genetic
disease, because it doesn’t have to be transmitted. It’s something that goes on in all of
us as we sit in this room and we produce gametes, either egg or sperm. And an architecture that
has a lot of these interspersed configurations is obviously going to be predisposed to these
types of events at a much higher frequency. So these are some of the diseases. I’m sure
many of you have heard of some of them: Velo-Cardio-Facial DiGeorge, Williams Syndrome, Prader-Willi
and so on. There are two interesting aspects about these diseases, if you look at them.
So shown here is the actual size of the duplication which is mediating the rearrangement. And
the important point here is that most of the events are large. The duplicated sequences
have to be greater than 10 kb, often greater than 100 kb in size, to mediate a high
frequency of de novo events. The second point is that the degree of sequence identity is
also very high. So typically most of the diseases are caused by duplications that are greater
than 95 percent, and the vast majority are greater than 98 percent. And the third component
of these diseases, which you can kind of see here, is that the vast majority of diseases
that have been described thus far involve some type of neurologic component: either
peripheral nervous system, or central nervous system and cognitive deficits in these kids. So the hypothesis was very straightforward.
If we had a beautiful duplication map of the Genome, which was borne on the sweat of a lot
of wonderful people working on this project over the last 10 years, could we use that
as essentially a morbidity map to predict the sites of disease associated with these
specific regions? And, specifically, could we focus on children with mental retardation
to find new diseases, previously unknown? So this is this duplication map I showed you
again. So it was not just a quality control exercise for the Human Genome project, but
we actually viewed it as a disease map. And so, here is our roadmap. All of the gold bars
represent blocks of sequence where the architecture is such that you would expect there to be a
high frequency of de novo mutation based on very large, very identical sequences at these
positions. So there were roughly 130 regions of the Genome at that time. Twenty-three
at that time, which are the gold bars with letters behind them, were ones already associated
with disease, and we were betting that at least a subset of those remaining regions would
be associated with De novo disease in the human population. So the way we did this — this is kind of
old technology now, but we began this work about two and a half years ago — we targeted
all of our regions that had at least 50 kb, and less than five megabases, of unique sequence
that were flanked by duplications greater than 95 percent identity and greater than 10 kb in
size. We took BACs from these regions and we built a specialized microarray which contained
about 2,000 BACs from these roughly 130 regions of the Human Genome. We spotted them on a
microarray and we simply would test a given normal DNA sample labeled with one fluorochrome
against a diseased individual labeled with another fluorochrome, and looked for signal
intensity differences based on hybridization to this chip as evidence of gain or loss of
that specific region. So, in terms of the study populations, we
used a normal control group, which people have argued maybe isn’t the best normal
control group, but it was what we had available at that time, which included all of the HapMap
samples, as well as an additional diversity panel of roughly 45 individuals. So we used
these normal individuals to establish the normal pattern of variation within individuals
without disease, or at least without disease associated with mental retardation. So I’m
not going to go over those details, other than to say that we found lots of copy number
variation. So, harkening back to something that Claire mentioned, the Human Genome is
riddled with copy number differences and gains and losses of sequences in different individuals.
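
The region-selection rule described above for the array (at least 50 kb and under five megabases of unique sequence, flanked by duplications over 95 percent identical and over 10 kb long) can be written as a simple filter. This is a minimal sketch: the `FlankedRegion` record, its field names, and the example regions are hypothetical, and only the four thresholds come from the talk.

```python
from dataclasses import dataclass


@dataclass
class FlankedRegion:
    """Unique sequence flanked by a pair of duplications (hypothetical record)."""
    name: str
    unique_bp: int          # length of the unique sequence between the duplications
    flank_identity: float   # percent identity between the flanking duplications
    flank_length_bp: int    # length of the flanking duplications


def is_candidate_hotspot(region: FlankedRegion) -> bool:
    """Apply the thresholds quoted in the talk to one region."""
    return (
        50_000 <= region.unique_bp < 5_000_000
        and region.flank_identity > 95.0
        and region.flank_length_bp > 10_000
    )


# Made-up regions, for illustration only.
regions = [
    FlankedRegion("region_1", 450_000, 99.0, 100_000),
    FlankedRegion("region_2", 30_000, 98.5, 40_000),     # too little unique sequence
    FlankedRegion("region_3", 1_200_000, 93.0, 25_000),  # flanks too divergent
]

candidates = [region.name for region in regions if is_candidate_hotspot(region)]
print(candidates)
```

Only the first made-up region passes all four thresholds; the other two each fail one criterion.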
We then focused on a collection of kids that essentially the clinical community, or at
least diagnostic community, had given up on. There’s roughly 500 children: children who
had been tested for Fragile X and had come back negative, children tested for [unintelligible]
rearrangements, and children whose karyotype was normal, for testing using this platform. So, some of the results. So after screening
the normal collection, then following up with studies of these roughly 300, initially
the first 291 children from Oxford, we found regions of the Genome that look like this.
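
Plots of this kind show a log2 ratio of patient signal to reference signal for each probe. As a hedged sketch of how such a ratio relates to copy number, the code below computes the idealized value for a given number of copies. The function names and the 0.3 calling threshold are assumptions for illustration; real array data are noisier and are not called this simply.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2 ratio of test to reference signal for one probe."""
    return math.log2(test_copies / reference_copies)


def call_copy_number(log2_ratio: float, threshold: float = 0.3) -> str:
    """Label a probe; the 0.3 cutoff is an arbitrary illustrative choice."""
    if log2_ratio <= -threshold:
        return "loss"
    if log2_ratio >= threshold:
        return "gain"
    return "no change"


unique_deletion = expected_log2_ratio(1)          # one copy lost at a diploid locus
unique_duplication = expected_log2_ratio(3)       # one copy gained at a diploid locus
duplicated_deletion = expected_log2_ratio(3, 4)   # one copy lost where the reference has four

for label, value in [
    ("unique, deletion", unique_deletion),
    ("unique, duplication", unique_duplication),
    ("duplicated, deletion", duplicated_deletion),
]:
    print(f"{label}: {value:.3f} -> {call_copy_number(value)}")
```

The last case illustrates why probes placed inside duplicated sequence are harder to read: losing one copy out of four shifts the ratio much less than losing one copy out of two.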
So what you’re looking at here is a log two relative hybridization intensity plot
for four different individuals. These are all children with mental retardation. And
we’re looking for things that deviate from the log two ratio of zero, which would be
no difference. And you’ll probably notice that there’s
a lot of noise over these regions, which is because about a third of our probes were actually
selected right from the duplicated regions, where the denominator really isn’t 2n, but
is actually more than that, so this actually creates some background. But clearly there’s
something different about these four kids. They have essentially about five BACs that
are apparently showing evidence of microdeletion in a region that we never saw once in a normal
control group of study. These were validated by FISH. I think probably the most interesting
aspect is that we could actually go back now and do a more high-density oligonucleotide
customized microarray. Instead of using five BACs in the region, we designed
now 11,000 [unintelligible] over that specific region and really confirmed to see whether
the breakpoints were identical. So it’s showing here those four children
once again. This is the log two relative hybridization intensity depression shown here in terms of
Log Two, indicated by significance in terms of when you see the red signal. And what you
can see here is a couple of things. First off, if we compare the affected child with
that of the parents, so this is one of the children compared to mom and dad, you see
mom and dad are normal over in that area, but the child has a deletion of roughly 450
kb, precisely at that site. You’ll also notice here, these are the segmental duplications.
These are very large, highly identical duplications, which share about 99 percent identity over
100 kb in size. So the duplications are demarcating the boundaries, or the breakpoints, roughly.
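
One way to picture that observation is as an interval check: do both ends of a deletion fall inside two different duplication blocks? The sketch below is an illustration under assumptions, not the analysis used in the study. The coordinates are invented and the function names are hypothetical.

```python
def containing_interval(position, intervals):
    """Return the first (start, end) interval that contains position, or None."""
    for start, end in intervals:
        if start <= position <= end:
            return (start, end)
    return None


def flanked_by_duplications(deletion, duplications):
    """True if the two deletion breakpoints lie in two different duplication blocks."""
    left = containing_interval(deletion[0], duplications)
    right = containing_interval(deletion[1], duplications)
    return left is not None and right is not None and left != right


# Hypothetical coordinates in base pairs.
duplications = [(1_000_000, 1_120_000), (1_560_000, 1_680_000)]
recurrent_deletion = (1_110_000, 1_565_000)   # about 455 kb, ends inside both blocks
unrelated_deletion = (1_300_000, 1_400_000)   # lies entirely in unique sequence

print(flanked_by_duplications(recurrent_deletion, duplications))
print(flanked_by_duplications(unrelated_deletion, duplications))
```

A deletion whose ends both land in the flanking blocks is consistent with the recombination mechanism described earlier, while one lying wholly in unique sequence is not explained by it.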
But you’ll also notice when you look at the regions contained underneath these duplications,
you see a lot of variation in the normal population, as well. The important point here was that we had essentially
an identical, critical region in four children identified from this study of mental retardation.
All of them had haploinsufficiency, at least that’s our model,
and in fact all of them that we’ve been able to test so far were de novo events. In
other words, the parents did not have this lesion; this was seen specifically in the kids. On top here are some of the genes and then
there’s five genes mapping into that region. We don’t know which causes the disease but
obviously there are some great candidates. One of the most interesting is MAPT, also
known as tau. It’s a gene in which point mutations have been associated with Parkinson’s,
Alzheimer’s, and frontotemporal dementia. So we’re now screening patients who have
essentially phenocopy in terms of disease and looking for point mutations. I’d like to just emphasize and make this
note, that even though we screened only 300 kids, this accounted for roughly one and a half
percent of the idiopathic collection that we looked at. These are what the kids look like, in collaboration
with our former competitors, Bert de Vries in Holland. We’ve had the opportunity to
look at roughly 21 children now, all of whom have the microdeletion. In nineteen of
them, we’ve been able to look at the parents and show that these are de novo events. And if you look
at these kids you can see there’s some similarities in phenotype. One of the most pronounced,
believe it or not, is this very bulbous nose that you see in almost all of the kids. You
see a pronounced philtrum, sometimes protruding tongue, as well as a fairly happy disposition,
which has actually been noted in many of the clinical records. So the children have a better
outlook than most of us in terms of life. And, in fact, by being able to show clinicians
the data, we’ve now been able to go back and identify additional kids from new collections
using this approach. So one of the interesting parts of this particular,
what we think is a new deletion syndrome, is that the exact same region that we identified
as being deleted in the human population was a region that was described a year and a half
earlier by Kári Stefánsson from deCODE as being a site of a common inversion
polymorphism in humans. And shown here is the region once again blowing up. This is
actually looking at the CEPH Diversity Panel, and the black indicating the frequency of
that inversion. So that inversion is essentially restricted largely to Caucasian populations.
Both European and Mediterranean populations have this inversion, most common. You’ll
see, once you get into Africa and Asia and India, you see very low frequencies of this
inversion. Their data suggest that this inversion, for completely unknown reasons, was associated with
increased fecundity and associated with increased recombination in these populations. That was
based on genealogical data from I believe the Icelandic population. So we went back and we looked at our kids
to see if they came from haplotypes that essentially carried the inversion, and to date 19 out
of 19 cases all come from this inversion haplotype. So I want to make it clear that we don’t
know necessarily whether it’s the inversion that’s predisposing to this microdeletion
event or it’s something else on that haplotypic background which may be predisposing. But
the data are overwhelming that this inversion polymorphism, which is ethnically stratified,
is essentially predisposing, or the inversion haplotype is predisposing to disease. So this
obviously has some ramifications. One of the ramifications would be that this is largely
a Caucasian-specific idiopathic mental retardation syndrome. And our screening so far of African-Americans
has shown no cases in a screening of 500 kids of this particular deletion. That wasn’t the only one we found. So here’s
another region on 15q24.1 to 24.2, four megabases in size. These are the children and these
are their actual genotypes based on a [unintelligible] GH over [unintelligible]. Breakpoints in three
of the four cases occur precisely at regions of high sequence identity. In these three
cases, we know that each of these events is de novo. And these kids are fairly high functioning.
They have IQs of around 65 to 70. They have been described as Autistic Spectrum Disorder,
but they have extra features such as a growth deficiency. Here’s yet a third example. This is distal
to the Prader-Willi region on 15q13.3. In our initial screening we skipped over this region
and that was because of our criteria. This index patient here had a deletion between
breakpoint three and breakpoint five. Actually, it was not a de novo event. So when we looked
at the parents, the mother actually had this very large deletion over this region, however,
it turned out that the mother also had mild mental retardation as well as epilepsy. So
we screened this one and we got two additional cases that came in. Both of these cases were
smaller. They were between breakpoint four and breakpoint five. These particular cases
were both de novo and in both of these cases there’s also mild mental retardation or
a developmental delay and epilepsy. We don’t know for sure if this is a genetic disorder but
we’re betting it is. What’s particularly interesting is that there is one gene located
here, [unintelligible] seven, which is a gated ion channel gene, which has been associated
or at least has been implicated and I don’t think ever proven to be associated with myoclonic
epilepsy. So we believe that haploinsufficiency of this region also causes disease and once
again the breakpoints are mapping to these very large, highly identical duplications. And the last example that I’ll show you
is an example of recurrent deletion not associated with mental retardation. So we’ve now moved
outside of kids with mental retardation and started screening kids with other types of
pediatric disease. And so this is analysis of some of those children. This is a collection
of roughly 80 pediatric patients with renal disease that have been screened. What we found
in this particular case was once again a de novo deletion. We should point out that all
of these cases are de novo, with breakpoints embedded right within the segmental
duplications. What’s particularly remarkable about this disease is that at least in terms
of the studies that we’ve looked at and the samples that we’ve looked at, and this
is largely with Christine Bellanné-Chantelot in Paris, it accounts
for about 20 percent of pediatric patients with renal disease that they have in their
collection. So it’s a very common, what we think is a common, microdeletion. Interestingly
enough, it’s never been observed once in a control group of 927 individuals. And
interestingly, on top of that, about 36 percent of children with maturity-onset
diabetes of the young, type 5 (MODY5), also have the same microdeletion. There is a gene
in this region, TCF2, the transcription factor, in which point mutations have been shown to
be associated with both renal disease and MODY Five diabetes. So, in summary, we’ve actually looked at
now a large number of kids, particularly from the IMR Study. These numbers are based largely
on the initial 300 set from Oxford. And in these patients we identified what we think
are roughly 16 sites of novel structural variation. I wouldn’t claim that the majority of those
are causative, but I do feel comfortable saying that we do have at least three novel genomic
disorders in which we have de novo event recurrence and we have phenotypic similarities that actually
allow us to assign this as a new disorder. We have one example of a microdeletion event
associated with diabetes and renal disease. And I’d be willing to hazard a bet that
if we screen more children with more forms of pediatric disease, we’ll actually find
additional genomic disorders associated with a wide range of phenotypes. I’ll just leave this one slide here as an
example of why I think this is so important. We just finished screening, using the Illumina
platform with Debbie Nickerson and Greg Cooper in my group, a large number of normal individuals.
These are individuals that came in essentially for lipid testing as part of a study known
as the “Park Study” [spelled phonetically]. And shown here is essentially hot spot regions
that we find deleted or duplicated within this normal control group. So shown here are
the duplications in pink, and the deletions are shown here in blue. These are the number
of chromosomes from this collection of roughly 1,920 chromosomes that were shown at various
frequencies. So here is the absolute number, here’s the one percent frequency cut off,
and here are a bunch of events that are roughly .1 and .2 percent frequency. So, coming back to a point that Richard made.
Two issues that I want you to think about. Roughly in this group we have six to 12 percent
of normal individuals having big deletions, precisely over regions that are predisposed to
non-allelic homologous recombination. We have an excess of deletions versus duplications,
which we don’t understand, but we have an absence of things that are
around the one to two to three percent frequency. I would bet that these are being fed by de
novo mutations at a high frequency, in the more normal pool. And the question remains
open: what is the impact of these in terms of disease or susceptibility? So one of the things you might ask yourself
is, why? If you think about the mouse Genome architecture and human, why do we have all
of these large blocks of interchromosomal and intrachromosomal duplications, if they
predispose 10 percent of our Genome to microdelete and microduplicate at a high frequency? Well,
so we tried to address this question over the years, and maybe I’ll kind of go over
these fairly quickly. But the idea and one of the important things to realize about the
duplication architecture in these regions is it’s not just one piece of sequence.
It’s essentially heterogeneous, made up of many different parts that have had different
evolutionary histories and trajectories. So this is one of roughly the 400 regions in
our Human Genome, and each of the colored in grey represent regions that are duplicated.
So basically this full 790 kb stretch of DNA is entirely duplicated. When you actually
reconstruct the evolutionary history of this region, what you find is that everything in
color we’ve been able to show comes from a different area of the Genome. So we have,
essentially, this hodgepodge mosaic over these specific regions made up of all of these pairwise
alignments from all over the Genome. To complicate matters, these
regions then duplicate between these large duplication blocks and can share large blocks
of homology in common with one another. And these are the types of events, the secondary events,
that are actually predisposed to microdeletions and microduplications associated with disease. So we have this architecture; can we systematically
reconstruct the evolutionary history of these regions? And so, working with Pavel Pevzner,
we came up with an approach to look at all of the individual pairwise alignments within
the Human Genome that make up these duplication blocks, decompose them into minimal evolutionary
shared segments so we could break all of these pairwise alignments into individual subunits,
or duplication subunits. And then, using data largely from work from UCSC, basically compare
these regions of the Human Genome, all of the duplicated positions, to see if we can
identify the ancestral segment from where the duplication began. Therefore, provide
directionality in terms of the duplications. And here the logic is pretty straightforward.
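
As a rough sketch of that logic, assume each human copy of a duplicated segment comes with a set of flanking markers, and the out-group genome has a single matching locus with its own flanking markers. The copy that shares the most flanking markers with the out-group locus is taken as the ancestral one. The marker names, the dictionary layout, and the function `infer_ancestral_copy` are hypothetical, and this is a simplification rather than the method actually used.

```python
def infer_ancestral_copy(human_copies, outgroup_flanks):
    """Pick the human copy sharing the most flanking markers with the out-group.

    human_copies: dict mapping a copy name to the set of markers flanking it.
    outgroup_flanks: set of markers flanking the single out-group locus.
    Returns (best_copy_or_None, scores); None means no unique best copy.
    """
    scores = {
        name: len(flanks & outgroup_flanks)
        for name, flanks in human_copies.items()
    }
    best = max(scores.values(), default=0)
    winners = [name for name, score in scores.items() if score == best]
    if best == 0 or len(winners) != 1:
        return None, scores
    return winners[0], scores


# Invented example: copy_a still sits in its original neighborhood.
human_copies = {
    "copy_a": {"m1", "m2", "m3", "m4"},
    "copy_b": {"m3", "x1"},
    "copy_c": {"x2", "x3"},
}
outgroup_flanks = {"m1", "m2", "m3", "m4", "m5"}

ancestral, scores = infer_ancestral_copy(human_copies, outgroup_flanks)
print(ancestral, scores)
```

Returning no answer when the best score is tied or zero mirrors the fact that an ancestral origin could be assigned for only part of the duplications.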
Most of the duplications that we’re studying are primate-specific. So if we look at out-group
species, such as rat, dog, and mouse that should not have these regions duplicated,
we should see a single hit. Moreover, because the human copy that’s ancestral moves by
this multiple step procedure in terms of duplication, we should see more orthologous anchors between
the human and mammalian out-group sequence. So using this approach we defined the ancestral
origin for 67 percent of the duplications within the Human Genome. We confirmed or validated
by FISH to see if we really could identify these ancestral origins. So we take an out-group
species, we take a probe that comes from the derivative locus, and we hybridize to see
if it goes back to the right spot that we predicted. That was confirmed, in this case in a
relatively small number of experiments, a matter of 12 times. We then also compared
our experimental maps, which we had generated over the years before, with our in silico prediction
with Pavel, and you can see that there’s pretty good correspondence between the duplicons
that we identified. So what did we learn from this analysis? So
here’s the part that we learned. If we start looking at these intrachromosomal duplication
blocks that cause disease what we find in almost all cases, maybe with one or two exceptions,
is that shown here is a map of the duplication blocks. So these are all of the duplication
blocks that have emerged in the last 25 million years on chromosome 15. About a third of these
cause disease. One of the things that we find is that located almost precisely in the center
of these blocks is a common sequence in about 90 percent of the cases, at least for this
specific chromosome. This is what we call a core duplicon. It has a number of interesting
properties. It’s the most abundant and most ancient, as you might expect in terms of duplications.
Even though these have all heterogeneous histories, it is common to the vast majority, seems to
be the focal point for intrachromosomal duplication formation. Cores are frequently duplicated
as solo elements in the Genome, but rarely are the flanking duplicons. So the flanking duplicons almost always exist
in association with a core. And when you look at the cores they are enriched four to five
fold for both genes, at least annotated genes per base pair, as well as ESTs. So these seem
to be the most transcriptionally active, most dynamic areas of the Genome. When we compare
those cores, and we find them on about a half a dozen human chromosomes that have experienced
this burst of intrachromosomal duplication, what we find is that these cores are often
associated with Great Ape and human specific gene families that have been described in
the literature over the last five or six years. We described one of the first, called the nuclear
pore interacting protein, which evolves about 50 times faster than most normal genes, at
least based on dN/dS ratios, and there’s a number of other genes that have been described.
The common features of these genes are that they do not have orthologs in mouse. They have
multiple copies in human and chimp. They show dramatic changes in their
expression profile when compared to these out-group species, such as baboon or macaque.
And at least three of the four examples here show signatures of positive selection, and
in two cases very dramatic examples of positive selection. So, I’ll just finish off by actually sharing
with you some of the work we’ve been able to do with Eric and NISC, particularly in
this regard. Because these regions are so complicated, we really can’t get a handle
on their architecture from looking at whole Genome shotgun sequence assemblies of chimpanzee,
gorilla, macaque, and so on. So working with Eric we’ve been able to actually target
these regions and re-sequence them systematically in a number of primate species. So shown here
is another core region, just to give you an idea. These are all of the locations of these
cores, and this is — or I should say these duplication blocks. So this is about 250 kb
in size and there’s this core of roughly 20 kb, which is in 14 out of 16 of the blocks
that are shown on this chromosome. This is a core, which is particularly interesting,
as it has a very rapidly evolving gene family embedded within it, which is the nuclear pore
interacting protein. So that if you look at the actual degree of sequence identity and
sliding windows across this region of the Genome, this is actually comparing any two
copies, you will find troughs and peaks in terms of the sequence identity. And what’s
most remarkable is that these troughs correspond precisely to the position of exons. So this
is this eight exon gene, with no known function. And the other thing I’ll just point out
is that 98 percent of these changes have resulted in amino acid changes between the copies.
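
The sliding-window comparison described here can be sketched as follows, assuming two copies that are already aligned to the same length. The sequences, window size, and function name are made up for illustration; real analyses use windows of hundreds or thousands of bases.

```python
def sliding_window_identity(seq_a, seq_b, window, step):
    """Percent identity between two aligned sequences in successive windows.

    Columns where either sequence has a gap ("-") are skipped.
    Returns a list of (window_start, percent_identity_or_None).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        compared = 0
        matches = 0
        for base_a, base_b in zip(seq_a[start:start + window],
                                  seq_b[start:start + window]):
            if base_a == "-" or base_b == "-":
                continue
            compared += 1
            if base_a == base_b:
                matches += 1
        identity = 100.0 * matches / compared if compared else None
        results.append((start, identity))
    return results


# Invented 20-base alignment: positions 8 to 11 play the role of a divergent exon.
copy_1 = "ACGTACGT" + "ACGT" + "ACGTACGT"
copy_2 = "ACGTACGT" + "TGCA" + "ACGTACGT"

profile = sliding_window_identity(copy_1, copy_2, window=4, step=4)
for start, identity in profile:
    print(start, identity)
```

In this toy profile the third window is the trough, which is the pattern the talk reports at exon positions.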
This is an extreme example of positive selection. So working with Eric we drilled down and looked
at a lot of other copies in other primates, particularly focusing on gorilla, chimpanzee,
orangutan, and baboon. We sequence annotated all of the sequences that we got back, both
experimentally and computationally, and then we reconstructed the phylogeny of these segments.
So I hope you don’t go blind, but this is the actual phylogeny, shown here. This is
based on a neighbor joining analysis of two kb of non-coding sequence for the core. And
shown here is the structure that you see with HSA representing human, PTR representing chimp,
GGO representing gorilla, and so on. So what we get from this bewildering complexity over
these parts of the Genome are really a couple of things. Number one, all of this architecture
that we now see, which we now know causes disease, is about 10 million years young.
So all of the events have occurred in the common ancestor of chimp, human, and gorilla,
or immediately after the separation of those species. The second thing, which I think is
really, really interesting, is that when we look at orangutan, we find none of the core
— we see the core once again present, but we see completely different flanking duplicons,
which are unique in human and all of the other Great Ape species. So orangutan has done the
exact same thing that our Genome did seven million years ago, using the same core, but
has actually picked up completely different flanking sequences which are unique in chimpanzee,
unique in gorilla, and unique in human. So this tells us that this core is actively
transducing, we think, segments of the Genome around. And just to give you a perspective
back now 25 million years ago, these are all of the pieces that in human look like this
— oh, sorry. So this is the architecture that we see in human. Each one of these blocks
of sequence are essentially unique in baboon, they’re unique in macaque. So we think these
all began as unique copy sequences, with the core beginning to jump probably about 20 million
years ago, pick up flanks, and continue to grow, such that it now occupies LCR16A and
its associated duplicons, about 10 percent of the euchromatin of human chromosome 16.
At least, in this case, 16P. With large insert sequences, we can also map the locations in
orangutan. And so this is the orangutan picture. This is human 16. Very limited activity on
human 16. But here you see on chromosome 13, the core has essentially jumped, jumped to
a new chromosome, and begun to do its dance again on these particular chromosomes, creating
a very complex architecture once again on chromosome 13, with interspersed duplications
distributed across, in this case, chromosome 13. So here I think the important point is
the cores are mobile. They can jump to new chromosomes, and they can actually transduce
flanking sequences as part of their trajectory. I don’t have time to go into all of this
data; I’ll just summarize what we know about this particular core. We know that it began as
a single copy sequence about 25 million years ago, and data that I don’t have time to
show you indicates that it was actually testis-specific. It was expressed only in the testes
and it showed no evidence of selection by any of the normal tests looking at Ka/Ks. So
this is a little bit heretical, I think, because most people would teach you that all genes
are born from other genes. Our data suggests that this thing was born from a transcript
that was probably neutral in terms of evolution. Then, between about 25 million and 12 million
years ago, in the common ancestor of orangutan, gorilla, chimpanzee, and human, it began
to move, it began to duplicate. Some copies on this lineage duplicating specifically heavily
in chromosome 16 and here duplicating on chromosome 13. At this point, when it began to duplicate,
based on expression analysis in orangutan, chimpanzee, and human, we see a ubiquitous pattern
of expression. It’s expressed in every tissue in orangutan and in human that we’ve ever
analyzed to date. So we’ve looked at about a dozen in orangutan and about 32 different
tissues in human. Between seven and 12 million years ago, not
on this lineage but on the African Great Ape human lineage, we see extreme positive selection.
Ka/Ks values on the order of, like, 10 compared to the Old World monkey sequence, suggesting
that at that point some mutation must have occurred to lead to an open reading frame
that was essentially selected and then became fixed at a very high frequency in the population.
So the 98 percent amino acid changes that I mentioned are occurring right here at this
branch. So we believe that the movement of this core
led to the emergence of a novel gene family about seven million years ago; probably one
of the youngest and most rapidly evolving genes in the human species. So in summary, I’ve talked about the architecture
of the Human Genome, with respect to these large blocks of duplication, I talked about
how complex they were, and specifically showed you some examples of how these complex, all
of this complex architecture can predispose to large de novo deletions, probably of huge
significant effect or selective effect within the population. Our targeted approach has
uncovered four new microdeletion syndromes, and we’ve shown that they’re recurrent
de novo. And I think the question remains unanswered: what is the importance of this
mechanism toward complex disease? Because if you think about it, none of these events
can be tagged using a tag SNP. Because they’re de novo, they’re occurring in some cases
on different haplotypes. Then I talked about the evolutionary significance
of these regions; particularly the core architecture that we think has emerged to account for the
expansion of intrachromosomals within the human Great Ape lineage. And particularly,
I’ll leave you with this kind of final thought, maybe the negative selection of these microdeletion
and microduplication events that exist in our species may be partially offset by the
positive effect of having newly minted genes, many copies of them, at new locations. And
if you think about it, the huge challenge ahead, even though they’re few in number,
is to work out the functions of these types of genes that don’t exist in out-group
species and that are embedded in these very complex regions of the Genome where STS and
SNP mappers fear to tread. [laughter] And we hope to continue hopefully — I think
my students say it’s going to be on my epitaph, that he found these genes but never actually
showed the function of a single one. Maybe five years from now I can come back and share
with you some evidence that they’re actually functional. So, to acknowledge these folks: Andy
Sharp [spelled phonetically], Heather Mefford [spelled phonetically]. They are the postdocs
who did most of the work on the human disease angle. Matt Johnson [spelled phonetically]
and Zo Xi Jang [spelled phonetically], they are both students who did all of the work
with respect to the evolution of these core regions. Good colleagues in sequencing centers,
Baylor, WashU [spelled phonetically], and specifically at NISC, who rose to the challenge of sequencing
some of these very difficult clones. I think I have a reputation, probably rightly deserved,
that these are some of the nastiest clones for sequencing centers to sequence. Thank
the patience of people like Bashali Mascari [spelled phonetically], Bob Blakesly [spelled
phonetically], and really Jerry Bouchard [spelled phonetically], who really took these on and
took them to completion, at least within the primates. And a lot of great colleagues, clinical
colleagues, particularly overseas, that have been very forthcoming in providing samples
and working on collaborations. Thank you.

[applause]

Dr. Eric Green:
We have time for quick questions. Any questions from the floor?

Dr. Evan Eichler:
Richard’s got one.

Dr. Eric Green:
Richard, you’ve got one? Come over here.

Dr. Richard Gibbs:
Yeah. Evan, what about wild mice? Have you got any duplication data there? Have they
got the same low level of these events?

Dr. Evan Eichler:
Nothing on... you mean wild outbred mice? Yeah, so no information on wild yet. We have
a lot of information from the inbreds over the duplicated regions, and they show as much
variation as humans do. The only difference is that variation is restricted to the duplications
which are tandem, and doesn’t influence these unique stretches between the interspersed
duplications.

Dr. Eric Green:
So I want to ask a question and we’re trying to get a PowerPoint loaded, so I’ll also
stall a little. Evan, so the screen that you did of the pediatric patients, you made some
comment that if you screened for more pediatric diseases you’d likely find more of these
copy number changes, but there’s no reason to think if you screened unusual adult-onset
diseases you might find similar.

Dr. Evan Eichler:
Right. No, we’ve toyed with the idea and we’re thinking about doing a more adult-oriented
disease, I guess. I guess I kind of have this fundamental belief that if we can show it
at a pediatric level, there’ll be a stronger genetic component, and so I’m more interested
in actually screening more kids with disease in which we don’t have a good explanation,
than actually looking at diseases where environment will play probably a bigger role and genetics
might play less.

5 thoughts on “Human Genome Structural Variation, Disease, and Evolution – Evan Eichler”

  1. wondering if you can please tell us which disease genes are mapped in the regions of our duplications… we are in australia and our geneticist, neurologist and metabolic specialist explained our condition is rare and they do not know of any other cases. our neurologist has corresponded with american geneticists regarding their research studies. two of our 3 small children, myself and my brother have a ring duplication of chromosome 16p11.2-16q12.1

  2. last post continued: (16p11.2-16q12.1 duplication)
    myself (12.4MB 33582714-45952128) and my brother (11.9MB 34071832-45952128) we have no noticeable clinical symptoms. my 5 year old boy – (11.8MB 34109660-45928657) with very mild clinical presentation. my 3 year old boy – (14MB 31943419-45952128) with significant clinical presentations. any thoughts are appreciated greatly. thanks in advance

  3. @adamleah Your best try is searching "16p11.2 16q12.1 duplication" without quotes or similar on the PubMed website. Some publications are free, others are not. I've seen many new genetic, medical and psychological studies about duplications lately, so you might see something new coming up if you search again after a few months. I think that there is much interest in duplications because of higher frequency than previously thought, and possible linking to autism.

  4. It is very difficult to know without more information about symptoms. Based on a cursory search, the genes in that region I find most interesting are CLN3 (Batten Syndrome), APOBR (Apo B48 receptor, possibly resulting in some sort of subacute combined degeneration), or ATXN2L (spinocerebellar ataxia variants)
