Tracking HIV Evolution
Theoretical biologist Bette Korber’s career has been devoted to classifying one of the most variable viruses ever identified
By Regina McEnery
Los Alamos, New Mexico, consists of a series of rust-colored mesas that form a picture-postcard setting: snow-capped Jemez Mountains in the distance and vast swathes of undisturbed wilderness that belie its history-making role in US defense.
Los Alamos is, of course, the place where 30 scientists gathered in 1943 to build the world’s first atomic bomb. Physicists recruited to work at the Los Alamos National Laboratory (LANL) with nuclear physicist J. Robert Oppenheimer, scientific director of the Manhattan Project, worked feverishly in the race to develop a nuclear weapon before the Germans. Their top-secret crusade transformed this bucolic southwest community and its acres of pine trees into a nerve center for US military weapons research.
Seven decades later, the LANL campus still looks a bit like a frontier outpost. Single-story modular units—like those found at construction sites—are laid out like a maze and are surrounded by acres of federally-owned forest that remain largely off-limits to the public.
It is here that theoretical biologist Bette Korber and her team of 13 multi-disciplinary scientists track, in exquisite detail, the evolution of one of the most diverse and peripatetic viruses ever identified. They do this with a vast network of supercomputer systems, some occupying a space equivalent to half a football field, that can crunch data at speeds of up to 10 to the 15th power, or a quadrillion, calculations per second.
The work of tracking HIV began in earnest in 1986 with the creation of the HIV Database and Analysis Project. The US National Institute of Allergy and Infectious Diseases (NIAID), which funds the project through an agreement with the Department of Energy, hoped the formation of the database would accelerate the development of better drugs, as well as a vaccine to protect against HIV/AIDS.
Since its inception, the main goal of the HIV database project has been to collect, curate, and annotate HIV genetic material, and to provide the data to scientists in an open-access environment to encourage collaboration within the field. Relying on data from an earlier LANL effort called GenBank, a public database set up in 1982 to store genetic sequences from previously sequenced organisms, the HIV database took the effort to a new level.
The red phylogenetic tree represents the genetic variability of HIV-1 V2-C5 in the Democratic Republic of the Congo in 1996. The black phylogenetic tree represents the genetic variability of global influenza A virus in the same year. The size depicts the extent of variation.
To date, the HIV database contains published genetic sequences of DNA from 250,000 different viruses obtained from HIV-infected individuals around the world. Although it is not the only database of its kind (Stanford University also has a database of about 100,000 viral sequences that it uses to identify drug-resistant HIV mutations), it is by far the largest and the most utilized by HIV researchers. And the project’s scope and mission haven’t ended there. LANL scientists have also cooked up dozens of software tools to assist scientists in their research, including programs that help identify the specific subtype or clade of an HIV sequence. More recently, researchers used the HIV database at LANL to engineer vaccine candidates designed to provide greater coverage against the diverse strains of HIV in circulation, which Korber hopes will finally provide sufficient breadth to cope with the problem of viral diversity in HIV vaccine development.
LANL researchers have also established three additional databases. A molecular immunology database provides a comprehensive listing of defined HIV epitopes, including epitope alignments, epitope maps, and reference information for cytotoxic T-cell (CTL) and helper T-cell epitopes, as well as antibody-binding sites. Another database tracks drug-resistant HIV mutations, while another tracks HIV vaccine trials being conducted in nonhuman primates (NHPs).
The HIV database, overseen by Korber, is an incredibly important tool for HIV scientists, judging by the number of citations and acknowledgements to LANL, as well as the numerous plaudits from leading AIDS researchers. “They have supported everyone’s research over the past 15-20 years and are the central repository not only for sequences but meticulous annotations of those sequences,” says Barton Haynes, director of the Duke Human Vaccine Institute and the Center for HIV/AIDS Vaccine Immunology (CHAVI) at Duke University in North Carolina, who has collaborated with LANL researchers on a number of vaccine-related projects.
The spectrum of research studies that have benefited from the HIV databases has been huge, and ironic, given that its founder Gerald Myers, an LANL scientist, initially thought the HIV sequencing project would only last about a year. But from its inception, the project was flooded with database entries that reflected the incredible genetic variation of HIV strains circulating globally. Myers soon realized the tremendous challenges viral diversity posed in the development of an effective AIDS vaccine and pushed NIAID to escalate funding and expand its contract.
Much of the credit for the project’s credibility today goes to Korber, whose aversion to doing experiments—or as she phrases it, the tedious cycle of pipetting, pipetting, pipetting—drove her toward mathematics, and ultimately theoretical biology, when she was working toward her doctoral degree in immunology at the California Institute of Technology (Caltech) in the 1980s. “I like thinking and puzzling better. That also takes great care and hard work, but it’s just the nature of the work that I like better,” says Korber, when interviewed at her office on the eve of the annual Keystone Symposia on HIV Biology and Pathogenesis in Santa Fe, New Mexico.
Because the HIV database project is the largest repository of information about the mind-boggling diversity of HIV, it’s no surprise that over the years both the project and Korber have been drawn into thorny, sometimes contentious, debates about who discovered HIV, the origin of the virus, and even the widely publicized case of a Florida woman who claimed she had been infected with HIV by her dentist.
Using math to tackle HIV
Theoretical biologists use a variety of analytic tools, from mathematical and computational models to systems biology and bioinformatics, to better understand biological systems and predict how they will evolve. This partly explains why Korber accepted a position at LANL—with its nascent database project and access to some of the best computer hardware in the world—after completing a post-doctoral fellowship in molecular epidemiology of human retroviruses at Harvard University in 1990.
But there were also very deep, personal reasons why Korber decided to focus her attention on HIV. In the early 1980s, when Korber and her fiancé James Theiler were studying at Caltech, they became close friends and housemates with a physicist from the UK. The bond was so close that when Korber and Theiler decided to get married in 1988, their friend received training as a lay minister so he could marry them at a ceremony by a stream in the mountains above Pasadena, California. “He was just a wonderful, brilliant man,” Korber says of her housemate.
Their friend was also, unfortunately, one of the earliest reported individuals to be infected with HIV in Pasadena. His struggle with the virus had a profound effect on Korber’s life. It was still early days in the escalating epidemic, long before highly active antiretroviral therapy (HAART) began rescuing HIV-infected individuals from the brink of death. “We learned a lot about HIV while he was sick,” says Korber. “But there was no treatment for him and he died in 1991. I decided when I graduated from my PhD program that I wanted to work on HIV.”
Specifically, her friend’s battle with HIV propelled Korber to commit her life to finding an AIDS vaccine. “I hate HIV,” she says, her voice rising with emotion. “I lost a couple friends to it. HIV kills in horrible ways. I think of what the epidemic has done to Africa and it motivates me.”
Korber spent her first few months at LANL getting used to “playing” on the computer, a transition made easier, she says, because her mentor, Myers, was patient and gave her space. Eventually, Korber suggested to Myers that LANL add the Molecular Immunology Database, which, like the HIV Sequence Database, was the first database of its kind dedicated to a single pathogen.
The goal of the immunology database was to provide a comprehensive listing of defined HIV and SIV epitopes associated with sequences previously published in scientific literature and submitted to the HIV database, and then make the searchable collection available to the general scientific community. Launched in 1995, it now contains more than 1,200 HIV epitopes, with at least 275 of them considered “A-list” because they have been characterized with a high degree of detail, according to HIV Molecular Immunology, which provides annual updates and reviews of the database. Over time, the development of large cohorts of individuals known as long-term nonprogressors, who have demonstrated an unusual ability to control HIV infection without treatment, and an evolving war chest of gene sequencing and data analysis tools have enabled researchers to assess different HIV epitopes for their potential role in controlling or preventing HIV infection, the authors of the compendium noted in its 2009 review.
“The HIV Database project took on the issue of the interface of the virus with the host, compiling not only viral sequences but immunological epitopes recognized by B cells, CD4+ and CD8+ T cells, and antibodies, then laid out the foundation for a relational database that they made available to the field. They emphasized the need for collaboration early on in the AIDS epidemic,” says Haynes.
In most cases, the information about each epitope includes the protein fragment’s published name, the specific protein that it is associated with, its location on the protein within a region of 21 amino acids or fewer, the viral subtype, and the host species.
A more in-depth search of each epitope will show the country where the circulating virus was identified, assays used to test the immune response, the major histocompatibility complex/human leukocyte antigen (MHC/HLA) of the infected donor, and how many different epitopes are linked to the particular HIV sequence in question. Each epitope entry in the HIV Molecular Immunology Database also includes annotated footnotes that summarize information about the immune responses measured, such as cross-reactivity patterns, escape mutations, and antibody sequences that overlap with an epitope, as well as a link to studies measuring the epitope response in human and animal studies.
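For readers who think in data structures, the fields listed above could be sketched as a single record type. This is only an illustration: the class name, field names, and example values below are hypothetical and are not taken from the LANL database schema; only the list of fields and the 21-amino-acid limit come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Region length limit mentioned in the article (21 amino acids or fewer).
MAX_REGION_LENGTH = 21


@dataclass
class EpitopeRecord:
    """Hypothetical record mirroring the fields the article lists."""
    name: str                 # published name of the protein fragment
    protein: str              # protein the epitope is associated with
    start: int                # 1-based start position on the protein
    end: int                  # 1-based end position (inclusive)
    subtype: str              # viral subtype (clade)
    host_species: str
    # Fields the article says appear in a more in-depth search:
    country: Optional[str] = None
    assays: List[str] = field(default_factory=list)
    mhc_hla: Optional[str] = None
    notes: List[str] = field(default_factory=list)  # annotated footnotes

    def __post_init__(self) -> None:
        if self.start < 1 or self.end < self.start:
            raise ValueError("start/end must be positive and ordered")
        if self.region_length > MAX_REGION_LENGTH:
            raise ValueError("region exceeds 21 amino acids")

    @property
    def region_length(self) -> int:
        """Number of amino acids spanned by the epitope region."""
        return self.end - self.start + 1


# Placeholder values for illustration only, not a real database entry.
example = EpitopeRecord(
    name="example-epitope", protein="Gag", start=10, end=18,
    subtype="B", host_species="human",
)
print(example.region_length)
```

Validating the region length at construction time is one way to keep malformed entries out of such a collection, in the spirit of the careful curation described in the article.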
By documenting all the known epitopes of every DNA sequence published in the HIV literature, the HIV Immunology Database offers researchers an unprecedented way of studying HIV’s diversity. “What we did was really unique,” says Korber.
Bruce Walker, director of the Ragon Institute, first met Korber when she was doing her post-doc at Harvard and the two are now collaborators on various projects. Like many scientists in the field, he has found the LANL HIV database a uniquely valuable resource, and gives Korber high marks for her oversight of the project.
“I think she’s extremely careful, meticulous, and passionate about digging through to the truth behind the phenomenon we are observing,” says Walker. “She’s been a fantastic steward for this repository because she puts so much effort into making sure that what is in there is accurate. I can’t express that enough. A database is only as good as the data put into it. This is a resource you can completely count on and it has been an enormous benefit for the field.”
Korber, along with Myers, also helped shepherd in an initially controversial policy for journals in the early 1990s that ended the practice of allowing researchers to publish papers about viral sequences without submitting the sequences to the public repository. Sometimes researchers would not take the time to make the sequences public, closing the door on other researchers trying to replicate the findings and missing the opportunity to build on the collective body of sequencing information. “We had to fight for this,” says Korber, reflecting on the new policy, which was eventually adopted by major scientific/medical journals. “It will be interesting to see how curation and data sharing unfold in the years ahead with the advent of new sequencing technologies.”
A Computer Powerhouse
The HIV Database and Analysis Project at the Los Alamos National Laboratory (LANL) catalogs and analyzes a dizzying array of HIV fragments and isolates. The ability to track one of the most variable viruses in history comes from an evolving stable of supercomputers and state-of-the-art genotyping tools that researchers at LANL can access. Here are three key examples of how computational technology has informed AIDS research.
• About a decade ago, scientists at LANL turned to what was then the fastest unclassified supercomputer in the world, a system known as Nirvana, to construct phylogenetic trees that ultimately helped them trace HIV back to its most recent common ancestor (1). The scientific analysis conducted by theoretical biologist Bette Korber and other members of the LANL team showed that the HIV pandemic likely began between 1915 and 1930.
This was not just an interesting development for the biological history books; it directly challenged a controversial hypothesis that the virus had sprung up in humans in the late 1950s because batches of oral polio vaccine cultured in primate cells were contaminated with simian immunodeficiency virus (SIV), the monkey equivalent of HIV. Developers of the vaccine denied that chimp tissue had been used to make the polio vaccine, but the theory persisted, in large part because of circumstantial evidence laid out in the book “The River,” by British journalist Edward Hooper. The LANL research, with the help of Nirvana, provided the strongest evidence to counter that theory.
The Nirvana system, capable of making one trillion calculations per second, enabled scientists to analyze very large sets of HIV Envelope sequences derived from blood samples of about 160 individuals infected with HIV-1, and then apply these sequences to sophisticated evolutionary models. This type of work would have been impossible using previous computer systems.
Nearly a decade later, Michael Worobey, an evolutionary biologist at the University of Arizona, built on the LANL findings. Using more advanced technological tools, he estimated that the HIV pandemic likely began between 1884 and 1924, based on the amplification and sequencing of a wax-embedded lymph-node specimen obtained in 1960 from an adult female from what is now the Democratic Republic of the Congo (2).
• Last year, through a unique arrangement that allowed a handful of scientists access to LANL’s latest supercomputer, the Roadrunner (pictured below), before it was moved to a classified computing network, Korber, computer scientist Marcus Daniels, and physicist Tanmoy Bhattacharya compared the evolutionary history of more than 10,000 genetic sequences from more than 400 HIV-infected individuals to try to identify common features of the virus that is transmitted and establishes infection. This work was done in collaboration with the Center for HIV/AIDS Vaccine Immunology (CHAVI), of which Korber is an investigator. CHAVI collected the samples from both acutely and chronically HIV-infected individuals from around the world. The samples were used to construct the world’s largest phylogenetic tree, with the end goal of identifying similarities in HIV sequences from samples taken during acute and chronic infection. A single HIV-infected person can have 100,000 different variants of the virus circulating throughout their body, so understanding how these variants branch off from the initially transmitted virus is important for the development of vaccine candidates. To build such a tree, LANL researchers needed Roadrunner’s processing capability. Roadrunner runs at 1.042 petaflops, or roughly a quadrillion calculations per second, using 122,400 processors. To gauge the power of Roadrunner, consider this: It took a single week to run a calculation on Roadrunner that the fastest supercomputer a decade ago would have needed 20 years to complete.
• LANL researchers and their collaborators at Duke University and the University of Alabama at Birmingham have also applied a next-generation genotyping tool to track the evolution of HIV immune escape during acute infection, allowing researchers to identify rare viral variants that would not have been detectable using conventional sequencing technologies. The 454 sequencing technology, developed by Roche subsidiary 454 Life Sciences, is being used increasingly by AIDS researchers to study viral diversity because it requires fewer cloning steps and produces unprecedented quantities of sequencing data. This sequencing tool can obtain more than one million DNA base pairs per run.
Korber and her collaborators recently used 454 sequencing to look at early cytotoxic T-cell escape in four epitopes from three HIV-infected individuals during acute infection. The first sample from each individual was taken during acute infection, prior to an observed immune response, with two additional samples over the course of several weeks. The number of sequences obtained ranged from a few thousand to more than 100,000 per sample, and reflected a much higher level of diversity generated by immune escape than was expected. Korber, who directs the HIV database project, says the level of detail and clarity provided by the genotyping tool is enlightening. “It reminded me of when I was 14 and I got my first pair of glasses,” says Korber. “Before that, I saw trees as great green blobs. When I got my glasses, I could for the first time see the leaves.” —RM
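The Roadrunner comparison in the sidebar above (one week versus 20 years) can be checked with a few lines of arithmetic. The sketch below assumes that the machine from a decade earlier ran at about one trillion calculations per second, the figure the sidebar gives for Nirvana, and that running time scales inversely with peak speed. Both are simplifying assumptions, and the function and constant names are illustrative.

```python
# Peak rates quoted in the sidebar, in calculations per second.
NIRVANA_RATE = 1.0e12        # "one trillion calculations per second"
ROADRUNNER_RATE = 1.042e15   # "1.042 petaflops"

WEEKS_PER_YEAR = 365.25 / 7


def speedup(fast_rate: float, slow_rate: float) -> float:
    """How many times faster the fast machine is at peak."""
    return fast_rate / slow_rate


def years_on_slower_machine(weeks_on_fast: float,
                            fast_rate: float,
                            slow_rate: float) -> float:
    """Estimated run time on the slower machine, in years.

    Assumes run time is inversely proportional to peak rate, which
    ignores memory, interconnect, and software differences.
    """
    weeks_on_slow = weeks_on_fast * speedup(fast_rate, slow_rate)
    return weeks_on_slow / WEEKS_PER_YEAR


ratio = speedup(ROADRUNNER_RATE, NIRVANA_RATE)
years = years_on_slower_machine(1.0, ROADRUNNER_RATE, NIRVANA_RATE)
print(f"speedup: about {ratio:.0f}x; one week becomes about {years:.1f} years")
```

Under these assumptions the ratio of peak speeds is a little over a thousand, which turns one week into roughly two decades, consistent with the sidebar’s comparison.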
Her many passions
Korber’s work schedule is grueling. She usually rises at 4 a.m. and is often firing off emails to colleagues as the clock approaches midnight in her Los Alamos-area home. “I can vouch for that,” says Mark Muldoon, a long-time friend and colleague from the UK, who was visiting Korber’s lab while he was in Santa Fe for the January Keystone Conference.
But HIV research is not her sole passion. Korber and her husband, a physicist at LANL’s Space and Remote Sensing Sciences Division, both love to hike, and Korber holds a black belt in Tae Kwon Do. Korber also jams regularly with a Celtic band called Roaring Jelly, named for the blasting gelatin used more than a century ago for mining operations. Korber plays the bodhran, an Irish hand-held drum about twice the size of a tambourine, and the Irish whistle. Her 17-year-old son, Sky Korber, plays a “hot fiddle” in the band, says Korber, referring to her son’s musical prowess. Korber’s 21-year-old son, Max Theiler, attends the University of California in Santa Cruz.
Korber has also taken a keen interest in helping people and regions disproportionately impacted by the HIV pandemic. Four years ago, Korber used US$50,000 in prize winnings from the prestigious E.O. Lawrence Award—the Department of Energy’s highest honor for scientific achievement—to help establish, along with family and friends, an orphanage in South Africa for 500 AIDS orphans. The orphanage was created under the auspices of Nurturing Orphans of AIDS for Humanity (NOAH). Korber is also trying to help initiate use of portable, maintenance-free gardening systems known as Earth Boxes, which have been placed at various orphanages, clinics, and schools in Africa.
In addition to leading an eclectic group of molecular biologists, sequence analysts, and computer technicians at LANL, Korber is also on the faculty of the Santa Fe Institute, a 26-year-old research and education non-profit organization that encourages collaboration among scientists across different disciplines to solve complex problems of the day. Her research portfolio also includes hepatitis, and she recently received a US$1.5 million grant to study the interactions between tuberculosis and HIV.
But Korber’s main research endeavor, from the start, has been driving toward the development of the elusive AIDS vaccine, and specifically, how a vaccine could overcome HIV’s diversity, potentially one of the most vexing obstacles to the development of a vaccine. Although recent findings from complete genome sequence analyses of transmitted founder viruses suggest a single viral variant usually initiates infection in heterosexual transmission cases, the infecting strains are still unique and distinctive.
Designing a vaccine capable of overcoming such genetic variation has been daunting. One approach being explored by Korber, along with collaborators at Beth Israel Deaconess Medical Center in Boston, the University of Manchester, NIAID’s Vaccine Research Center, the University of Alabama, and Duke University, is to use various computational methods to determine the most common amino acid at each position in the Envelopes of multiple variants of HIV from different clades, and then develop antigens based on the resulting Env sequences, which are referred to as consensus sequences.
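The basic idea of a consensus sequence can be illustrated in a few lines. The sketch below is not the researchers’ method, which the article describes only as “various computational methods.” It assumes the input sequences are already aligned to the same length, takes the most common residue in each column, and breaks ties alphabetically. The function name and the toy sequences are made up for illustration.

```python
from collections import Counter
from typing import Sequence


def consensus_sequence(aligned: Sequence[str], gap: str = "-") -> str:
    """Return the most common residue at each alignment column.

    Assumes every sequence has the same (aligned) length. Ties are
    broken alphabetically so the result is deterministic. Columns
    whose most common symbol is a gap are dropped.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    residues = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Highest count first; alphabetical order breaks ties.
        best = min(counts, key=lambda aa: (-counts[aa], aa))
        if best != gap:
            residues.append(best)
    return "".join(residues)


# Toy aligned fragments (invented, not real Env sequences).
toy_alignment = ["MRVKG", "MRVRG", "MKVKG"]
print(consensus_sequence(toy_alignment))  # prints MRVKG
```

A real global consensus would also have to decide how to weight clades that are unevenly sampled, for example by first building a consensus within each clade. This sketch ignores that step.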
When a vaccine candidate containing a computationally derived, global consensus Envelope sequence was evaluated in rhesus macaques, it generated cellular immune responses to three- to four-fold more HIV epitopes of Env proteins across clades A, C, and G than did a clade B immunogen based on a naturally occurring Envelope sequence from a single individual. Moreover, the T-cell responses stimulated by the consensus immunogen within clade B were comparable with those stimulated by the naturally occurring clade B immunogen (3).
More recently, Korber and her collaborators also created what are referred to as mosaic vaccine antigens, which are assembled from natural sequences and optimized to achieve coverage of the many different versions of HIV proteins that are circulating. These mosaic vaccine candidates triggered strong cross-reactive immune responses in rhesus macaques in two separate studies (4,5).
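One simple way to put a number on “coverage” is to ask what fraction of short peptide windows in circulating sequences also appear somewhere in a set of vaccine antigens. The sketch below takes that approach. It assumes a window of nine amino acids by default, a length often used for potential T-cell epitopes, and exact matching. It measures the coverage of a given antigen set and does not design or optimize mosaics. All names and sequences are illustrative.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k windows in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coverage(antigens: Iterable[str],
             circulating: Iterable[str],
             k: int = 9) -> float:
    """Fraction of k-mer windows in circulating sequences that are
    matched exactly by at least one antigen in the set."""
    pool: Set[str] = set()
    for antigen in antigens:
        pool |= kmers(antigen, k)

    total = 0
    matched = 0
    for seq in circulating:
        for i in range(len(seq) - k + 1):
            total += 1
            if seq[i:i + k] in pool:
                matched += 1
    return matched / total if total else 0.0


# Toy strings with k=3 (invented letters, not real protein data).
viruses = ["ABCDE", "ABCXX"]
print(coverage(["ABCDE"], viruses, k=3))           # one antigen
print(coverage(["ABCDE", "ABCXX"], viruses, k=3))  # two antigens
```

In the toy example a single antigen matches four of the six windows, while adding a second, complementary antigen matches all six. That is the intuition behind using a small set of complementary sequences rather than a single one.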
One study led by Norman Letvin, a professor of medicine at Beth Israel Deaconess Medical Center, showed that the CD8+ T-cell responses in rhesus macaques vaccinated with a prime-boost regimen of a DNA plasmid followed by a recombinant vaccinia virus vector were stronger if the vaccine constructs expressed mosaic immunogens compared to those expressing consensus immunogens (5). “This increased breadth and depth of epitope recognition could contribute to protection against infection by genetically diverse viruses and, in some instances, may block the emergence of common variant viruses,” the study’s authors noted.
A second animal study led by Dan Barouch, also of Beth Israel Deaconess Medical Center, evaluated mosaic Gag, Pol, and Env antigens expressed by recombinant, replication-incompetent adenovirus serotype 26 (rAd26) vectors. The team immunized 27 rhesus macaques with a single injection of the rAd26 vectors expressing mosaic antigens, consensus antigens, combined clade B and clade C antigens, or naturally occurring clade C Gag, Pol, and Env antigens. The Ad26 vector expressing mosaic antigens induced CD8+ T cells that recognized more epitopes, as well as more variants within an epitope, than Ad26 vectors expressing consensus or natural sequence antigens (4). Overall, mosaic antigens provided a four-fold improvement in the breadth of the immune response.
Taken together, these NHP studies suggest that mosaic antigens could both broaden the range of recognized epitopes and increase responses to high-frequency HIV variants, although it remains to be seen if this approach will work as well in humans. A Phase I trial to compare the safety and immunogenicity of mosaic Envelope antigens with antigens that express either a global consensus Env sequence or a natural env gene is scheduled to begin later this year and will involve about 100 volunteers. The HIV Vaccine Trials Network is conducting the trial in collaboration with CHAVI, the European Vaccine Effort Against HIV/AIDS, and the Bill & Melinda Gates Foundation.
“I am really hopeful,” says Korber, who confesses she “thinks about sequences and HIV diversity all the time.” “We have to deal with the diversity issue,” she adds. “If we don’t, we will never have a vaccine that works.”