Ancestry is having a 4th of July Flash Sale in the US for 2 days only!

Working on a mystery in your family tree?  This sale is a great opportunity to get access to both family history records and DNA matches for great prices.

  • Ancestry® Family History Memberships will be on sale for 50% off. (Terms apply.)
  • AncestryDNA® will also be on sale for $59.

The sale starts at 9 pm Pacific time on July 2 and ends at 9 pm Pacific on July 4 (midnight to midnight Eastern time).


The past year has seen a chilling in the genetic genealogy industry:  DNA kit sales are down drastically since April 2018.

I don’t work for any of the testing companies, nor do I have any special insight into their official sales figures.  However, I have been tracking their database sizes for a couple of years now (with data retroactive to 2013), and the decline in growth rate is obvious.  Yes, the databases are still growing, but they’re growing more slowly than before.

Consider the graph below.  The slope (steepness) of each line indicates how fast the database is growing at that point in time.  Notice that the slopes for AncestryDNA (green) and 23andMe (purple) got steeper and steeper until April 2018, after which growth of both databases slowed (the region in the grey box).

Growth at the other databases (with the possible exception of MyHeritage; see below) has also slowed since April 2018, although it’s harder to see from the graph because the scale on the y-axis is set to the larger companies.

How much has growth slowed?

Curve Fitting

We can predict how large the databases would be had they continued to grow at the rates prior to April 2018 using curve fitting.  Curve fitting is a mathematical process in which an equation is found that “fits” the real-life data as best as possible.  Once a good equation is found, it can be used to extrapolate the expected values beyond the range of the existing data.

We can gauge how well the equation fits the data using a metric called R² (pronounced R-squared).  For fits like these, the R² value falls between zero and one.  The closer it is to one, the better the equation fits the real data.

I used an online curve fitting tool called MyCurveFit to fit exponential equations to each company’s growth trajectory through April 2018.  In exponential growth, the rate of change increases over time, which is what we see in the graph above prior to April 2018.
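For readers who want to see the mechanics, here is a minimal sketch of fitting an exponential curve and scoring it with R².  It is not how MyCurveFit works internally (I don't know its method); it uses a simpler stand-in technique, ordinary least squares on the logarithm of the values.  The function names and the sample numbers are made up for illustration and are not any company's real data.

```python
import math

def fit_exponential(ts, ys):
    """Fit y = a * exp(b * t) by least squares on log(y).

    This log-linear shortcut is an assumption of this sketch,
    not necessarily the method used by any online fitting tool.
    """
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean = sum(ts) / n
    log_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (l - log_mean) for t, l in zip(ts, logs))
    b = sxy / sxx
    a = math.exp(log_mean - b * t_mean)
    return a, b

def r_squared(ys, preds):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical database sizes (millions) at yearly intervals.
years = [0, 1, 2, 3, 4]
sizes = [1.0, 2.1, 3.9, 8.2, 15.8]

a, b = fit_exponential(years, sizes)
fitted = [a * math.exp(b * t) for t in years]
print(f"a = {a:.3f}, b = {b:.3f}, R² = {r_squared(sizes, fitted):.4f}")

# Extrapolate one year past the data, as the projections in this post do.
print(f"projected size at year 5: {a * math.exp(b * 5):.1f} million")
```

The last line is the extrapolation step: once the equation is known, plugging in a later date gives the size the database would have reached had the earlier trend continued.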

For each database, I plotted the database sizes as currently known on the same graph as the values calculated from the fitted equation.

AncestryDNA

AncestryDNA has the largest database of the genealogical testing companies, larger than the others combined.  In May 2019, they announced that their database contained more than 15 million people.  Previously, they’ve announced growth milestones three or four times per year, giving me 19 data points prior to April 2018 and two after that date.

The graph below compares the actual reported values from Ancestry with the values projected by the equation.

The two lines overlap nearly perfectly prior to April 2018.  In fact, the R² value is 0.9970, or almost one.  However, the lines diverge sharply after that date.  Had AncestryDNA’s database continued to grow at the previous rate, the equation projects it would have had more than 21 million people in May 2019 rather than the reported 15 million.

Put another way, from April 2018 to May 2019, the database added 6 million people, when it was predicted to add more than 12 million. That’s a decline in growth of 51%.

23andMe

23andMe is the second largest genealogical database, with more than 10 million people as of April 2019.  The company reports their database size once or twice a year, giving 14 data points prior to April 2018 and one after.

The graph below compares the actual reported values from 23andMe with the values calculated from the equation.

The two lines overlap well prior to April 2018, with an R² value of 0.9612.  As with AncestryDNA, the lines then follow different trajectories.  Had 23andMe’s database continued to grow according to the equation, it would have had nearly 14 million people rather than the 10 million reported in April 2019.

From February 2018 to April 2019, the database added 5 million people.  It was projected to add nearly 9 million, a decline in growth of 43%.

FamilyTreeDNA

FamilyTreeDNA is the smallest of the databases discussed here, and they have never officially announced how many autosomal DNA testers they have.  The values used here were estimated by Tim Janzen, a long-time customer of the company, and published on the ISOGG wiki. There were 20 data points prior to April 2018 and three after.

The graph below compares Tim Janzen’s estimates for FamilyTreeDNA’s autosomal database with the values projected by the equation.

The R² value prior to April 2018 is 0.9901 and, again, we see a decline in growth after that point.  Had FamilyTreeDNA’s autosomal database continued to grow according to the equation, it would have had about 1.5 million people rather than the 1 million estimated in February 2019.

Assuming the estimated numbers are correct, FamilyTreeDNA added 200,000 people from March 2018 to February 2019, when it was projected to add 700,000.  In other words, growth declined 71%.

GEDmatch

The owners of GEDmatch have kindly reported their database size to me personally at intervals since January 2016, giving seven data points prior to April 2018.  Either directly from GEDmatch or from media reports, I had five data points after April 2018.

The graph below compares the actual values with those projected by the best-fit equation for GEDmatch.

The two lines are almost identical before April 2018, with an R² value of 0.9977.  After that point, the GEDmatch database initially grows faster than expected, then declines below the values predicted by the equation.  Had GEDmatch continued to grow as projected, it would have had more than 1.4 million people in May 2019 rather than 1.2 million.

GEDmatch added 387,000 people from February 2018 to May 2019, when it was projected to add nearly 650,000.  Growth declined 40%.

MyHeritage

MyHeritage is the most recent entry to the DNA testing market that is discussed here.  Thus, there were only four data points prior to April 2018, not enough to fit a reliable curve. (The projected database size based on those four points was 79 million, which is simply not credible.)  Thus, for MyHeritage—and only for MyHeritage—I included one data point from May 2018.

The graph below compares the actual growth trajectory for MyHeritage with that projected based on those five points.

The R² value prior to May 2018 is 0.9891.  Like the other databases, growth was slower than expected after that point.  Had MyHeritage’s database continued to grow according to the equation, it would have had nearly 3.8 million people rather than the 3 million reported in May 2019.

Between May 2018 and May 2019, MyHeritage added 1.6 million people. If the projections are correct, it was expected to add nearly 2.4 million, a decline in growth of 32%.
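The decline figures quoted in the sections above all come from the same small calculation: one minus the ratio of people actually added to people projected to be added.  The sketch below shows that arithmetic using the rounded numbers from the text (in millions).  Because phrases like "more than 12 million" and "nearly 9 million" are rounded, the output can differ by a percentage point or so from the figures I quote, which were computed from the unrounded projections.  The function name is my own.

```python
def growth_decline(actual_added, projected_added):
    """Fraction by which actual growth fell short of projected growth."""
    return 1 - actual_added / projected_added

# (actual added, projected added) in millions, rounded as in the text.
figures = {
    "AncestryDNA": (6.0, 12.0),
    "23andMe": (5.0, 9.0),
    "FamilyTreeDNA": (0.2, 0.7),
    "GEDmatch": (0.387, 0.65),
    "MyHeritage": (1.6, 2.4),
}

for name, (actual, projected) in figures.items():
    print(f"{name}: {growth_decline(actual, projected):.0%} decline")
```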

The Obvious Question

The pattern is clear:  something happened early in 2018 to cause database growth to slow across the board, from 32% at MyHeritage to as much as 71% at FamilyTreeDNA.  The question is:  What?  What caused the decline?

One possibility is market saturation.  Perhaps genetic genealogy is approaching its natural consumption level, where those who are inclined to purchase a test already have.  The counter to that argument is that 23andMe is not a genealogy company; it’s a biomedical one.  Theirs is a different market, yet the company’s growth declined along with those of the genealogy companies.

What’s more, one might reasonably argue that the market in the United States is approaching saturation, but relatively few people in Europe have tested, meaning there’s still ample room for growth there.  Yet MyHeritage, which is based in Israel and sells most of their DNA kits in Europe, also experienced a decline in growth.

It’s also possible that my numbers for past growth are wrong because the testing companies don’t report the exact database size on a specific date.  Rather, they usually report that their database is “larger than X”, and the precise date it hit that threshold is not publicly known.  However, the numbers I have for GEDmatch are largely date-specific, and GEDmatch’s growth slowed, as well.

The elephant in the room, of course, is the use of some genealogy databases, specifically GEDmatch and FamilyTreeDNA, by law enforcement.  That fact first became public knowledge on April 25, 2018, when the Golden State Killer story broke.  And April 2018 is precisely when we see a decline in growth across the board.

Public concern over law enforcement using genetic databases seems the most likely explanation for the cooling of the market.  In fact, Anne Wojcicki, the CEO of 23andMe, publicly speculated that law enforcement and privacy concerns were indeed behind their decline in growth.  And she actually does have inside information on their sales figures!

This explanation fits a few observations well.  First, the company showing the smallest decline, MyHeritage (32%), is also the company whose market is furthest removed from the American judicial system and thus either unaware of or unthreatened by US law enforcement using genealogy databases.

Second, the company most welcoming of..


Emily’s is a true story, but some given names and surnames have been changed to protect privacy.

When Emily’s DNA results first came back, the suspicions began to churn.  Her birth in 1950, 10 years into her parents’ marriage.  Her mother Sara’s explanation, in Emily’s teen years, that they struggled with fertility; that they were in the process of adopting; that their first thought was cancer—not pregnancy—when they finally did conceive; that Emily was a miracle in more ways than one, born with white-blonde hair that never darkened to two raven-haired parents.  An aunt who announced, at a family gathering when Emily was in her 40s, that ‘Everyone knows Emily was adopted.’  Sara denied it, but also stopped talking to her own sister.

You see, Emily’s parents were both Jewish, as were all four of her grandparents, but Emily’s ethnicity estimates from four different DNA testing companies agreed:  genetically, Emily was only half Ashkenazi.  At least one of her parents, it seemed, was primarily British.

Emily’s ethnicity estimates from (clockwise) AncestryDNA, 23andMe, Family Tree DNA, and MyHeritage.

What Was Going On?

Both of Emily’s parents have passed away, so she couldn’t ask them.  Emily had several close DNA matches who are descended from her mother Sara’s siblings, so that part of her tree was not in question.  Clearly, she was Sara’s daughter.

Family tree showing Emily (gold) and her maternal DNA matches at AncestryDNA (green), 23andMe (purple), and Family Tree DNA (rose). Numbers are centimorgans / # segments.

However, her father’s brother tested at Family Tree DNA and shared only 69 cM with her.  What’s more, once the small, unreliable segments were filtered out, Emily and her uncle shared only 16 cM over two segments, an amount typical for unrelated Ashkenazim.  Biologically, he was not her uncle.

Had her father, Leo, been adopted into a Jewish family?  Was he not her biological father?  Had her mother been unfaithful?  Emily found the latter almost impossible to believe.

She thought back to her aunt’s comment decades before that she was adopted.  Could she be the product of an early sperm donation?  In 1950, no state in the USA legitimized donor-conceived children, even if the husband had consented to the procedure.  Sperm donation was considered by many “semi-adoption.”

If Emily were donor conceived, who was her biological father?

A Blessing and a Curse

Being genetically half Jewish and half English is both a blessing and a curse for genealogy research.  On one hand, the Jewish and non-Jewish sides (maternal and paternal, respectively, for Emily) are very easy to distinguish.  On the other, the vast majority of Emily’s matches—75% or more—were on her Jewish side.

That didn’t leave us much to work with on her unknown paternal side.  In fact, there were only two matches that we could link to one another initially, two first cousins, John and Susan, who were both grandchildren of an Ernest Hayes and Hattie Gibson from Reading, Massachusetts.

The challenge, though, was that John and Susan are first cousins to one another, but the amount of shared DNA suggests that they are second cousins to Emily.  Emily could be related to them through their Hayes line or through their Gibson line.  We had to research both.

The Reading Connection

Emily was born and raised in a western suburb of Boston, but the Hayes–Gibson family was from Reading, Massachusetts.  That got Emily thinking.  Her mother had had a dear friend from Reading, a man named Fred Watson, with whom she shared a passion for classical music.  They spent a lot of time together listening to new or favorite pieces. Fred had even given Emily a book about musical instruments, inscribed “to Emily, already a person of note.”

Leo never objected to the friendship.  Fred lived with his mother Ellen until she died, and Sara always assumed he was gay.

Could Fred have helped Sara and Leo conceive?  Was it crazy to think that based solely on a tenuous tie to Reading?

Emily and I didn’t have to dig far to get our answer:  Fred’s mother Ellen was a Gibson!  Hattie’s sister, in fact.  If Fred had donated sperm to conceive Emily, then Emily is a 2nd cousin to John and Susan, which fits the DNA well.

The best way to prove it would be to test Fred or one of his direct descendants, but he had died in the late 1970s with no known children.  He did, however, have one brother, John, who had two living grandsons, Mark and Michael.  If we were right that Fred was Emily’s biological father, then Mark and Michael would be Emily’s first cousins once removed and would be expected to share about 425 cM with her.

We explained the situation to Michael.  He was initially suspicious of a prank, but when Emily sent him a picture of the handwritten inscription in the book, he agreed to take a DNA test.

It only took about 2 weeks for the results to come in.  Michael shared 495 cM with Emily!  Fred was, indeed, Emily’s biological father.

In the months since Michael’s DNA test, he has shared memories and photographs of Fred, and Emily has built out Fred’s tree many generations.  As an only child, she finds it comforting to connect with a past she never knew she had.

Everyone who knew the people involved is confident that Emily’s is not a story of infidelity.  It’s the story of a couple who desperately wanted a child and a caring friend who was willing to help.  Fred remained a part of the family’s life until he died.

During the course of the investigation, a family story came out:  Emily’s cousin Devorah recalled overhearing Sara tell her sister Irena (Devorah’s mother) how messy artificial insemination was, knowledge almost certainly gained the hard way.


The genetic genealogy community is in turmoil.  People are angry, accusations are flying, and friendships are sundering.  All over something as seemingly straightforward as whether we have the right to decide whether our DNA data can be used by law enforcement.

The Backstory

Last year, the world learned that genetic genealogy had been used to identify a serial rapist and murderer who had terrorized the state of California in the 1970s and 1980s.  The so-called “Golden State Killer” had been dormant for decades and was finally captured thanks to the inspired work of Barbara Rae-Venter.  She knew from her years helping adoptees that he could be identified using DNA left at a crime scene.  She was right:  Joseph James DeAngelo was arrested in April 2018, thanks to her collaboration with the FBI.

Mug shot of Joseph DeAngelo, accused of being the Golden State Killer

While the world rejoiced that a vicious killer was finally off the streets, the surreptitious use of GEDmatch—a privately-owned third-party database created by and for genealogy enthusiasts—raised ethical and legal concerns.  After all, the people whose data helped identify the suspect had not consented to that use, and the FBI had not obtained a court order to access the database.  They hadn’t even asked GEDmatch’s owners for permission.

Since then, more than 50 criminal cases and 10 Doe cases (unidentified bodies) have been solved using these methods.  The benefit to society and the solace to the victims’ families cannot be overstated.

The privacy concerns, however, must still be addressed.  After all, residents of the US are protected by the Fourth Amendment to the Constitution, which states:

The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

Privacy advocates argue that law enforcement violates the Fourth Amendment by searching a privately-owned database containing personal and biomedical genetic information without a warrant or individual consent of the DNA testers.  Notably, those advocates, including myself, don’t want to stop such searches; we want them to be conducted with either judicial oversight or explicit permission.  And since approval by a judge seems unlikely given that a warrant would not be able to describe the “persons or things to be seized” beforehand, explicit permission is key.

Uncharted Territory

The genetic genealogy world would have been better off had these questions been resolved before law enforcement entered our databases.  Instead, we’re all playing catch-up on scientific and ethical issues that would be challenging to discuss under the best of circumstances and are all but impossible now that so many are emotionally or even financially invested in one side or another. No aspersions cast here.  This is cutting edge stuff, and the guardrails are being built as the cars careen past, so to speak.

The fact remains that most people uploaded to GEDmatch with no idea at all of what was possible in terms of privacy.  Not just regarding law enforcement, but also family secrets, biomedical information, and who knows what else.

The entire community should take a breather and objectively assess where we want to be in 5 years, else I’m afraid that we will be watching this play out on the floor of the US Supreme Court.  It may be too late to avoid judicial and legislative oversight, but implementing an informed consent model now for law enforcement searches seems reasonable.

Informed Consent

The term “informed consent” is borrowed from the medical community, which is ethically bound to ensure that patients and research subjects understand the risks and benefits of treatments (i.e., are informed) and that they or their powers-of-attorney agree before treatment is rendered (i.e., give consent).

23andMe offers an excellent example of informed consent in the context of genetic genealogy testing.  Their business model centers on crowd-sourcing genetic and trait information to advance medical knowledge and drug discovery.  To that end, their customers can opt in to research studies that use their DNA.   Their Research Consent Document clearly outlines the risks and rewards:

How will I benefit from this research?

  • By taking surveys you may learn about 23andMe’s research findings, including how your answers compare with those of others and new discoveries made by 23andMe’s research program.
  • Sometime in the future you or your family may benefit indirectly from research discoveries made by 23andMe or its research partners.

What are the risks of taking part in this research?

  • There is a very small chance that someone with access to the research data or results could expose personal information about you. 23andMe has policies and practices in place to minimize the chance of such an event.

Importantly, at no time does 23andMe make the decision for you, even though it’s in their interests for you to opt in.  And nowhere in the Research Consent Document does 23andMe try to influence your decision one way or the other.

That’s a far cry from what’s been happening with regard to law enforcement.  For example, Family Tree DNA unilaterally opted in the majority of their customers (and all of the Americans) to law enforcement matching.  That is not consent, much less informed consent.

And although GEDmatch recently implemented a system that requires an explicit opt-in (as opposed to opting people in without their knowledge), nowhere do they outline the risks.  (Judy Russell also observes that opting out does not make the kit truly invisible to law enforcement.)  What’s more, the owners are openly advocating one choice over another.

That’s consent, but not informed consent, and it’s extracted under pressure.  Should anything go awry with a criminal investigation in their database, GEDmatch may have exposed itself to liability with that statement.

The Risks and Benefits

We, as a community, need to think about the benefits and the risks of using the same databases as law enforcement.  Because no one else is, to my knowledge, addressing these points explicitly, I put forth this preliminary list.  I hope that others will chime in with both pros and cons that can be incorporated (with credit) in updates.  And I hope that both GEDmatch and Family Tree DNA will consider posting this list, or one of their own devising, on their own websites to ensure that their customers are fully informed about the choice they’re being asked to make.

How will I benefit from opting in to law enforcement matching?

  • By participating in law enforcement matching, your DNA data and family tree may be used to identify and capture a violent criminal or to give closure to the family of an unidentified deceased person (a John or Jane Doe).
  • In some cases, the researchers working with law enforcement may be willing to share their genealogy research with you once the investigation is complete.

What are the risks of taking part in law enforcement matching?

  • You may learn that you are related to a violent criminal.
  • Government agents, or their contractors, may discover family secrets about you or your relatives.
  • Your data and family tree may misidentify an innocent person as a criminal, putting them through unexpected stress and legal expenses.
  • You may be named in a search warrant or arrest warrant that becomes public, putting you at risk of public harassment or retaliation.  (See page 5, first two paragraphs, of this search warrant for an example.)
  • The Terms of Service at GEDmatch and/or Family Tree DNA may change or be waived to expand investigations beyond violent crimes.
  • And finally, while the vast majority of forensic genealogists are ethical and upstanding citizens, some may turn out to be otherwise.  After all, the Golden State Killer was a cop.

A family friend has leukemia.  He’s 11.  Without chemotherapy, he would have died.  Thanks to modern biomedical research, though, he’s through the worst of the treatment, back in school, and doing well.  He’s even participating in a drug study to improve the odds for patients who come after him.

There is no reason to think the experimental treatment will harm him; the doctors wouldn’t try it if they thought it might.  Even so, there would be outrage if he had been enrolled in the research study without the knowledge and consent of his parents.  Even though there is a greater good at stake.  Even though it could improve treatments.  Even though it could save lives.

That’s because informed consent is key.

Why should police investigations be any different?  They shouldn’t, and yet, somehow, they are.  Perhaps it’s because biomedical researchers ask for permission while the cops just barge on in.  And for reasons beyond my ken, some of the very people we’ve entrusted to guard our most personal genetic data are happy to comply.

And now we have the first evidence that GEDmatch broke their own Terms of Service—a contract between themselves and their users—simply because the cops asked nicely.  Deseret News reported yesterday on an assault (not murder or rape) case that was solved using GEDmatch, with GEDmatch’s permission.

Here’s their damage control:

The thing is, GEDmatch doesn’t have the right to give permission on behalf of their users.  They don’t have the right to unilaterally ignore their own Terms of Service because one guy thinks it’s a good idea at the time.  They have an obligation to put their users first, and they have utterly and completely betrayed that trust.

Parabon is at fault here, too.  Their lead genealogist is keenly aware of the debates surrounding informed consent, and Parabon knew that GEDmatch could not ethically give permission on behalf of a million people who were completely unaware of what was happening. Yet, they went ahead anyway.

What’s next?  Will GEDmatch and FTDNA open their databases for other violent crimes?  Crimes that are classified as violent even if no one gets hurt?  Why stop there?  It’s in the interests of society to stop petty crime, too, is it not?

Coming Full Circle

Let’s go back to medical research.  According to the Centers for Disease Control and Prevention, 647,457 people died of heart disease and 599,108 of cancer in 2017.  And that’s just the top two causes of death that could be cured with new treatments.  By contrast, the FBI reports that 17,284 people were murdered or died from non-negligent homicide that year, while 135,755 were raped.

Couldn’t we make a “compelling argument” that GEDmatch and FamilyTreeDNA should open their databases to Big Pharma—consent be damned—for the greater good?  After all, curing heart disease alone would save nearly 40 times more lives per year than preventing all murders.

Of course not!

We wouldn’t, and shouldn’t, argue that our data be handed over for biomedical research without our consent, no matter how much good could come of it.  Any database that did that would be pilloried.  Heck, some of the companies have been pilloried even when they don’t betray our trust that way.

Why, then, does GEDmatch, and FTDNA before them, think it’s okay to give law enforcement access to our data?  Not just law enforcement, but private, for-profit companies hired by law enforcement.  It’s not okay.  It’s wrong.  And GEDmatch has shown us that their contract with their users is meaningless.  They should change their Terms of Service to read “Anything goes.  You’re on your own.”

Additional Reading

Aldous, Peter.  “The Arrest Of A Teen On An Assault Charge Has Sparked New Privacy Fears About DNA Sleuthing.” Buzzfeed, 14 May 2019.

Parabon NanoLabs “About” page

Reavy, Pat. “Plastic milk container, genealogy helped Utah police crack church assault case.”  Deseret News, 13 May 2019.

Russell, Judy G. “Withdrawing a recommendation.” The Legal Genealogist, 15 May 2019.


At age 16, Jerry’s mother was working in a canning factory, where she met a young man.  As things happen, things happened.  When the season ended, she returned to her home state a little bit pregnant.  She eventually married and never saw her beau again.

Fast forward 45 years, and Jerry learns the truth:  the man who raised him was not his biological father.  After much denial, his mother eventually confessed and gave Jerry a name:  Paul Gillen (not his real surname).  After searching on his own for years, Jerry took a DNA test and contacted Janet Rinardo Travis and Lynette Bryan.  They are volunteer search angels through the Facebook group Search Squad and also offer professional search services through their company DNA Now.

Within a week, Janet and Lynette had linked Jerry to the Gilland family (note the different spelling).  The Gillands had 10 sons, and one was named Paul.  But the timing of the discovery couldn’t have been worse:  Paul had passed away 3 days prior.  Normally, we could test a child or grandchild of Paul to confirm that he was Jerry’s father, but he had no known children.  The alternative would be to test descendants of each of the other nine brothers to show that they were not Jerry’s father.  The quest seemed hopeless.

Enter providence.  Paul had one living brother left, Uncle Sedric.  For reasons unknown, he had delayed Paul’s cremation, and he gave his blessing for the funeral home to take DNA samples.  Janet had the foresight to ask for both cheek swabs and hair follicle samples (not just cuttings) and delivered them personally to FamilyTreeDNA’s lab in Houston to have DNA extracted.  Paul had been dead for 6 days by then, and the cheek swabs failed, but the lab was able to get useable DNA from the hair follicles. They then ran a standard autosomal analysis.

Things Get Technical

The no-call rate for the DNA analysis was 23%, meaning that only 77% of the possible data could be obtained.  That’s not surprising given that the samples were taken nearly a week after Paul passed away.   But was it enough to confirm whether Paul was Jerry’s biological father?

The data quality wasn’t high enough for Paul’s kit to be included in FamilyTreeDNA’s matching database, so Janet copied the raw data file to GEDmatch.  There, she was able to compare Paul and Jerry directly using the “Autosomal One-to-One Comparison” tool.

The results were not what Janet expected:

A parent and child should share about 3579 cM, whereas Jerry and Paul shared 2588 cM.  That’s the expected amount for full siblings, which is simply impossible given that Paul was older than Jerry’s mother.  What’s more, the largest segment shared by a parent and child is about 281.4 cM (the full chromosome 1), whereas Paul and Jerry’s longest segment was only 80.4 cM.  Finally, a parent and child should share 22 segments (for the 22 chromosomes), and full siblings usually share 40–50 segments.  Paul and Jerry shared 125!  What was going on?

Visualizing Matches

Here’s where it helps to understand a bit about DNA matching and what typical parent–child and full sibling matches look like.  GEDmatch gives the option to visualize the segments that two people share, as opposed to just listing the start and stop points of each match.

Recall that we each have two copies of each autosomal chromosome (autosome).  Because a parent passes on one copy of each, and because the child inherits a second copy from the other parent, a comparison between the two should show that they match on one of their two chromosomes, but not both.  The image below shows the first three autosomes (of 22 total) in a normal parent–child match.

Each chromosome diagram is made up of tens of thousands of color-coded, vertical lines, each line representing one bit of DNA (base pair) that was analyzed.  Yellow bars mean that the mother and child match on one of their two bases at that spot.  We call the yellow regions “half identical”.  Green means that they match on both copies at that spot, which will happen by random chance sometimes in a parent–child match.  And red means that they don’t match on either chromosome.  For a parent–child comparison, the occasional red flecks are errors and can be ignored.

Full siblings, on the other hand, get a mix of DNA from their two parents.  The pattern is quite distinct.

In some spots, they are half identical to one another (yellow) because they inherited the same chunk of DNA from either mom or dad.  In other places, they have large stretches of so-called “fully identical regions” (FIRs) where they match on both chromosomes, because they inherited the same chunk from both mom and dad.  Those are mostly solid green, as opposed to the scattered green flecks we saw above.  And in yet other spots, they won’t match at all, because one sibling inherited that region from, say, mom’s mom and dad’s dad, while the other sibling inherited it from mom’s dad and dad’s mom.  Non-matching segments appear as red-and-yellow patches, where the red lines are mismatches and the yellow flecks are coincidental matches.

Artifacts from Artifacts

Jerry and Paul’s match looked like this.

That’s remarkably like the parent–child example above, with one key difference:  there are a lot more red flecks.  Because Paul’s DNA had started to degrade by the time it was sampled and extracted, and possibly because there isn’t as much DNA in a hair follicle sample as in a living cheek swab sample, there are a fair number of errors in the data.  That’s to be expected.

Those errors trick GEDmatch’s algorithm into thinking Paul and Jerry don’t match in small regions where they really do.  That is, the mismatches are artifacts of the post-mortem testing process, not real mismatches.  In short, Paul was Jerry’s father.

The mismatch regions caused the anomalies we saw in the GEDmatch summary of the match:  the lower-than-expected total centimorgans, the higher-than-expected number of segments, and the short longest segment.  Each mismatch region reduces the total and breaks up contiguous chromosomes into artificial subsections.
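A toy simulation shows the effect.  This is a deliberately simplified sketch, not GEDmatch's algorithm (real tools tolerate some mismatches, and this one tolerates none).  The function name split_into_segments, the 0.25 cM per tested position, and the 7 cM minimum segment size are all made-up values for illustration.

```python
def split_into_segments(matches, cm_per_snp, min_cm):
    """Split per-position match flags into segments, breaking at every mismatch.

    Returns the length in cM of each segment that meets the minimum size.
    """
    segments = []
    run = 0
    for ok in list(matches) + [False]:  # trailing False closes the final run
        if ok:
            run += 1
        else:
            if run * cm_per_snp >= min_cm:
                segments.append(run * cm_per_snp)
            run = 0
    return segments


# One "chromosome" of 1000 tested positions that truly match end to end.
clean = [True] * 1000

# The same chromosome with three read errors from a degraded sample.
noisy = clean.copy()
for position in (200, 210, 600):
    noisy[position] = False

clean_segments = split_into_segments(clean, cm_per_snp=0.25, min_cm=7.0)
noisy_segments = split_into_segments(noisy, cm_per_snp=0.25, min_cm=7.0)

for label, segs in (("clean", clean_segments), ("noisy", noisy_segments)):
    print(label, "segments:", len(segs), "total cM:", sum(segs), "longest:", max(segs))
```

With just three errors, the single 250 cM segment becomes three shorter ones, the total drops, and the longest segment shrinks.  That is the same pattern seen in Paul and Jerry's summary.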

Despite the tragedy of missing Paul by a few days, there is joy to this tale.  The Gillands have invited Jerry to their upcoming family reunion, and they’re thrilled to welcome their newest member, 52 years in the making.

As post-mortem and artifact testing from objects like mailing envelopes and hairbrushes become more common, we will see more instances where the initial interpretation of match data doesn’t tell the full story.  Professionals, search angels, and community leaders will need to educate those they help to avoid misinterpretation of the match data and false disappointment.  In Jerry’s case, an overview of the match suggested that Paul was not his father, while a closer look confirmed that he indeed was.  As of this writing, such SNP-by-SNP comparisons can only be done at GEDmatch.

Expenses and Timeline

For those interested in similar analyses, here is a summary of the costs and timeline associated with Jerry’s search:

  • Day 1:  Jerry asks Janet for help (Cost: none; Janet and Lynette worked as volunteers)
  • Day 5:  Paul dies
  • Day 8:  Janet contacts a close Gilland family member
  • Day 11:  DNA samples are taken and delivered to the lab
  • Day 42:  DNA extraction is complete (Cost:  $250)
  • Day 98:  Autosomal DNA results complete (Cost:  $79 or less)

In summary, for less than $350, Jerry was able to confirm with certainty that Paul was his father, even though Paul had passed away shortly before being located.

Costs for DNA extraction from artifacts, like envelopes, or from older biological samples, like bone, will be higher.

Mother’s Day sales on DNA tests have been announced!  Show Mom you love her by linking her to her ancestors.  All announced sales end 13 May, with the exception of Living DNA; they haven’t posted an end date for their sale.

Testing Company | Sale Price | Best Features | Database Size | Average Days to Results
AncestryDNA | $59 (US) | Largest database, Genetic Communities, Tree-DNA integration, ThruLines | ≈ 15 million | 24
23andMe Health + Ancestry test | $169 | Haplogroups, FDA-approved health information, Chromosome browser | > 10 million | 17
MyHeritage test | $69 | Chromosome browser, Tree-DNA integration, Theories of Family Relativity | > 2.5 million | 79
Living DNA test | $79 | British Isles ethnicity estimates | unknown | 21

Every year on April 25, America marks National DNA Day to honor both the discovery of DNA’s double-helix structure (published April 25, 1953) and the completion of the Human Genome Project exactly 50 years later.

DNA Sales

DNA Day is a good time to be a genealogist, because DNA tests usually go on sale.  Here are the announced prices so far:

  • AncestryDNA in the US: $69 (normally $99)
  • Living DNA is on sale worldwide, with regional prices as follows: USA: $59 (normally $99); UK: £59 (normally £99); EUROPE: €69 (normally €109); CANADA: $99 (normally $149); AUSTRALIA: $119 (normally $169); NEW ZEALAND: $119 (normally $169); REST OF WORLD: $99 (normally $149)
  • MyHeritage in the US: $69
  • 23andMe has not yet announced DNA Day sales

Fun Facts About DNA

  • Every form of life on the planet uses DNA for its genetic code.  Some viruses use a similar chemical called RNA, but viruses are not considered to be fully alive.
  • Humans have more than 3 billion base pairs (units) of DNA.  Estimates vary, but about 90% of it has no known function.
  • DNA is made of two strands of chemical building blocks called nucleotides that are twisted together like two strings of beads.  While each strand is tightly connected along its own length, the two strands are held to one another with weak, reversible chemical linkages called hydrogen bonds. This diagram shows the two backbones of the helix with the paired nucleotides forming rungs in the middle. The gap along each rung is a hydrogen bond.

    By Jerome Walker,Dennis Myts – Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=1694871

  • Each strand is “complementary” to the other, meaning that the nucleotide A on one strand will pair with the nucleotide T on the other.  Similarly, the nucleotides C and G pair up.  Those pairings are called base pairs.
  • Base pairing allows DNA to be copied and repaired.  If one strand is damaged, the cell can fix it using the pairing rules from the remaining strand.  For example, if one strand reads ACCTG, the complement will read TGGAC.
  • The order in which the nucleotides are strung together on a strand is a code that tells the cell how to behave.  The weak hydrogen bonds can unzip, allowing access to the code inside to turn a gene on, and re-zip to turn it off.
  • The “zippability” (not a real word) of DNA also allows molecular biologists to study it.  For example, we can use heat to separate the two strands so that we can sequence the DNA itself.  Separating the strands this way is called melting (really).
  • The famous DNA structure published by Watson and Crick in 1953 is only one of three known conformations (shape variants) of DNA to occur in nature.  All three are double helixes, but they differ in how tightly twisted the strands are (A-DNA is more compact than B-DNA) and which direction the helix twists (Z-DNA turns the opposite way from A-DNA and B-DNA).  Watson and Crick described B-DNA.  This is what A-DNA, B-DNA, and Z-DNA look like viewed from the side and straight down the shaft:

    By Mauroesguerroto – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=35919357

  • Human DNA is packaged into enormous sections called chromosomes.  One of the shortest human chromosomes, the Y chromosome, is “only” about 57.2 million base pairs long, while the longest (chromosome 1) is nearly 249 million base pairs long.  There is also a tiny circular bit of DNA called mitochondrial DNA that is normally 16,569 base pairs long.
  • We inherit one set of chromosomes 1–22 from each parent, meaning that we have two copies of each.  Those copies should not be confused with the double helix, because each copy is made up of a double helix.
  • The final set of chromosomes is called the sex chromosomes, because they determine biological sex.  Women have two copies of the X chromosome (one inherited from each parent) while men have one X chromosome (from their mothers) and one Y chromosome (from their fathers).
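
The base-pairing rule from the list above (A with T, C with G) is simple enough to write as a lookup table.  Here is a minimal Python sketch; the names PAIRS and complement are just labels chosen for this example.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the complementary strand, base by base, in the same order."""
    return "".join(PAIRS[base] for base in strand)


print(complement("ACCTG"))  # TGGAC
```

Applying the function twice returns the original strand, which is the property that lets a cell rebuild a damaged strand from the intact one.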

I could go on and on about DNA all day!

Some time around April 1, 23andMe updated their “About Us” corporate page to say that they have more than 10 million customers.  No, it wasn’t an April Fool’s joke!

How Fast Are the Databases Growing?

23andMe just released new information.  We have recent, official updates for the database sizes of AncestryDNA and MyHeritage from RootsTech in early March.   And while Family Tree DNA has never directly acknowledged the size of their autosomal database, several reports put it at about 1 million.  That puts us in a great position to compare recent growth rates among the testing companies.

Company | Time Period | Growth | Days in Period | Growth per Day | Relative Rate
AncestryDNA | Sep18–Feb19 | 5,000,000 | 179 | 27,933 | 32
23andMe | Feb18–Mar19 | 5,000,000 | 400 | 12,500 | 14
MyHeritage | Dec18–Feb19 | 100,000 | 60 | 1,667 | 2
FTDNA | Sep18–Feb19 | 110,000 | 127 | 866 | 1
Total | | | | 42,966 |

Across the four databases, nearly 43,000 new people are testing every single day.  Wowzers!

We can also compare the relative growth rates of the companies.  AncestryDNA continues to lead the pack with a database that’s growing more than twice as fast as 23andMe’s.  23andMe, in turn, is growing 7 times faster than MyHeritage, which is growing twice as fast as Family Tree DNA.
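
For anyone who wants to check the arithmetic, the per-day and relative rates follow directly from the growth and period lengths reported above.  A short Python sketch, where the variable names are my own and the relative rate is simply each company's daily growth divided by the slowest company's:

```python
# (new customers, days in period), taken from the figures reported above
growth = {
    "AncestryDNA": (5_000_000, 179),
    "23andMe": (5_000_000, 400),
    "MyHeritage": (100_000, 60),
    "FTDNA": (110_000, 127),
}

# Average number of new people in each database per day
per_day = {name: new / days for name, (new, days) in growth.items()}

# Express every rate relative to the slowest-growing database
slowest = min(per_day.values())
relative = {name: rate / slowest for name, rate in per_day.items()}

for name in growth:
    print(f"{name}: {per_day[name]:,.0f} per day, relative rate {relative[name]:.0f}")
print(f"Total: {sum(per_day.values()):,.0f} per day")
```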

I didn’t include GEDmatch in this comparison because they have been purging duplicate kits recently.  As a result, their database appeared to grow much more slowly between January and March 2019 than it did in reality.  In other words, the previous estimates were inflated by duplicates, which will be less of a problem moving forward.  In the period prior to the purge (May 2018 to January 2019), they were averaging 805 new kits per day.

AncestryDNA recently added color tags to their New and Improved DNA Matches beta feature (available here if you don’t already have it).  Not only can you color-code your DNA matches for a quick visual overview, you can filter using those tags.

There are 24 colors to choose from.

But which colors to assign to whom?  That’s entirely up to you, of course!  This post shares a system that’s worked for me.  It follows this color scheme:

My first sort for someone’s DNA match list is into the maternal (pink) and paternal (blue) sides.  Then, I can filter to show only the maternal matches, for example, and in a second cut assign the matches to her father’s side (fuchsia) and her mother’s side (burnt orange … Hook ‘Em!).  When I can, I carry on by assigning more precise branches to matches.

Using cool colors (blues, greens, violet) for the paternal side and warm colors (reds, yellows, orange) for the maternal side allows you to quickly scan through your match list to see who’s who.  The deeper the color, the more refined the sort; that is, a single light blue dot beside a match means you’ve determined only that it’s on your father’s side, while a match with forest green has been localized to your patrilineal great-grandfather’s side.

An individual match can have multiple color dots assigned.  This one has light blue for father’s side, teal for father’s father’s side, and forest green for father’s father’s father’s side:

This system uses 14 of the 24 available colors.  You can use the others for any category you like, such as a geographic region, historical event, research status, etc.

If you’d like to add these icons to your tree, as in the screenshot above, you can find them in the public tree here:  https://www.ancestry.com/family-tree/tree/82989698/family.  Each ancestor position has an icon as the profile picture, and all 24 color icons are in the picture gallery for the home person in the tree.

Please share how you’re organizing your matches with the color-tags in the comments.
