The largest family tree to date — which includes 13 million people going back 11 generations and 500 years — provides new insights about marriage and death, and it all comes from public data.
The tree was created by a team led by Yaniv Erlich, a Columbia University computer scientist who is also chief science officer at the genealogy company MyHeritage. Erlich’s team downloaded 86 million public profiles from the ancestry site Geni.com (which is owned by MyHeritage). Many small family trees emerged, along with one huge one with 13 million people; about 85 percent are from the Western world. The tree, which is available online, includes (anonymized) data on when and where everyone died. When Erlich’s team analyzed the data to find trends related to marriage and death, they found that genetics may play a smaller role in longevity than we thought, and the advent of mass transportation wasn’t the only reason why we started marrying people outside the family. The results were published today in the journal Science.
The project was possible in part because the Geni platform lets users merge trees. “So if you put your tree and I put mine, and we share an Uncle Albert, the website would offer to merge the trees together to create a much larger tree,” says Erlich. This way, his team didn’t have to start from scratch but could build on the work of people using the site. After downloading the raw data, the challenge was to clean it and make sure it didn’t include results that were biologically impossible, like people with three parents. If the data wasn’t clean, they wouldn’t be able to run algorithms to analyze the information.
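The cleaning step can be pictured as a simple validity check on parent-child links. The sketch below is only an illustration, not the team's actual pipeline: the `find_impossible_profiles` function and the list of (child, parent) pairs are hypothetical names and formats chosen for the example. It flags any profile with more recorded parents than biology allows.

```python
from collections import defaultdict

def find_impossible_profiles(parent_links, max_parents=2):
    """Return the IDs of profiles with more than `max_parents` distinct parents.

    parent_links: iterable of (child_id, parent_id) pairs (hypothetical format).
    """
    parents = defaultdict(set)
    for child_id, parent_id in parent_links:
        parents[child_id].add(parent_id)  # a set ignores duplicated links
    return sorted(child for child, found in parents.items() if len(found) > max_parents)

# Made-up example data: "dan" has three recorded parents.
links = [
    ("ann", "bob"), ("ann", "cat"),
    ("dan", "eve"), ("dan", "fay"), ("dan", "gus"),
]
print(find_impossible_profiles(links))
```

Profiles flagged this way would need to be corrected or dropped before any analysis; how the researchers actually resolved such cases is not described here.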
For the analysis, the researchers focused on two topics: how long we live and who we choose to marry. By measuring the distance between the birthplaces of husbands and wives and tracking it over time, they found that, unsurprisingly, before the Industrial Revolution most Americans married someone born within six miles of their own birthplace. This person was also likely to be a relative, a fourth cousin on average, says Erlich. After the Industrial Revolution, when transportation became more widely available, people started to marry those who were born farther away and were more distantly related. (By 1950, people were finding their spouses within 60 miles of where they were born.)
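The distance between two birthplaces can be computed from latitude and longitude with the haversine formula. This is a minimal sketch that assumes each profile carries birthplace coordinates in degrees; the function name and the sample coordinates are illustrative, and the article does not say which distance method the researchers used.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def birthplace_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Hypothetical couple: one born in Boston, the other in Cambridge, Massachusetts.
distance = birthplace_distance_miles(42.3601, -71.0589, 42.3736, -71.1097)
print(f"{distance:.1f} miles apart")
```

Repeating this for every couple and grouping the results by birth year would give the kind of trend over time described above.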
But the pattern shows it’s not all about transportation. Between 1800 and 1850, people were traveling more and moving to cities en masse, but the genealogical distance between spouses remained the same: in other words, people were still marrying their relatives. “This suggests that the advent of mass transportation and the train system was not the only reason that people stopped marrying their cousins,” says Erlich. “There’s a lag between the two, so it’s likely that cultural factors also made people start marrying outside their group.”
Next, the researchers looked at death. They analyzed the lifespans of 3 million relatives who were born between 1600 and 1910 and lived past age 30. (The data didn’t include twins or people who died in wars.) Genes obviously play a role in longevity (someone with a gene that makes them more likely to have cancer will likely have a shorter lifespan), but environmental factors matter a lot, too. By comparing each person’s lifespan to that of their relatives, the researchers found that genes are responsible for about 16 percent of the variation in how long people lived. Peter Visscher, a quantitative geneticist at the University of Queensland who was not involved in the study, noted that he would have guessed genes were responsible for 10 to 20 percent of longevity, which is in line with the authors’ report, though some estimates have given numbers as high as 30 percent.
The results also suggest that genes that influence longevity act independently instead of interacting with each other, a question that has been hotly debated in genomics. If gene variants worked together, the correlation in lifespan would fall off much faster as relatives become more distantly related. For example, it should drop very sharply from identical twins to first cousins. But that pattern didn’t appear in the data.
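One way to see why the rate of fall-off matters is to compare what two textbook models predict. The sketch below is an illustration under stated assumptions, not the study's analysis. It assumes a purely additive model, in which the expected correlation between relatives scales with their coefficient of relatedness, and a purely pairwise-interaction model, in which it scales with the square of that coefficient. The 16 percent figure is the estimate reported above; the function name and the choice of relative pairs are hypothetical.

```python
# Standard coefficients of relatedness for a few kinds of relatives.
RELATEDNESS = {
    "identical twins": 1.0,
    "parent and child": 0.5,
    "first cousins": 0.125,
}

def expected_correlation(relatedness, heritability, model="additive"):
    """Expected lifespan correlation between relatives under a simplified model.

    additive:    variants act independently, correlation = h2 * r
    interacting: pairs of variants act together, correlation = h2 * r**2
    """
    if model not in ("additive", "interacting"):
        raise ValueError(f"unknown model: {model}")
    exponent = 1 if model == "additive" else 2
    return heritability * relatedness ** exponent

H2 = 0.16  # share of lifespan variation attributed to genes in the study
for pair, r in RELATEDNESS.items():
    additive = expected_correlation(r, H2, "additive")
    interacting = expected_correlation(r, H2, "interacting")
    print(f"{pair:17s} additive={additive:.4f} interacting={interacting:.4f}")
```

Under these assumptions, the interacting model predicts a correlation for first cousins that is eight times smaller than the additive model does, which is the kind of steep drop the data did not show.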
Because the data is available for free online, there are a lot of different questions it could help answer in the future, says Erlich, such as how migration affects fertility. Additionally, MyHeritage now offers DNA tests. So if Geni.com users uploaded genetic data that could be matched to their genealogical records, scientists could answer even more questions of nature and nurture, Visscher wrote in an email to The Verge.
In the meantime, it’s really quite beautiful to see so many lives that are spread all over the world visualized in an interlocking map.