
atDNA Case Study: Four Siblings

Charts: updated - 02 Apr 2017
Text data: updated - 02 Apr 2017


Caveat

GI's AncestryDNA test was taken using their former v1 chipset. GP, SI and SM's tests were taken using the current v2 chipset. This could be a partial explanation for GI's significantly larger number of matches. His three sisters' match counts are clumped more tightly together.

Key Questions

  1. How much more do we learn by testing multiple siblings rather than one? Four siblings versus three?
  2. How do we prioritize which matches are important for further study?
  3. How useful are distant matches (<10cM)?

Comparing the autosomal DNA (atDNA) results of multiple siblings gives good insight into the strengths and weaknesses of the test, because full siblings have an identical genetic relationship to each of their relatives, regardless of what the atDNA estimates show. Each sibling inherits half of their atDNA from their father and half from their mother, but it's a random 50%. On average, about half of the atDNA a sibling inherited from a given parent is the same DNA that another sibling inherited from that parent; the other half is DNA the other sibling did not receive.
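As a rough illustration of the "random 50%" point, the sketch below simulates one parent passing DNA to two children. It relies on a deliberately simplified model, which is an assumption and not how recombination actually works: the parent's genome is split into many independent blocks, and each child receives either the grandfather's or the grandmother's copy of each block with equal probability. Real inheritance passes on long contiguous segments with only a few crossovers per chromosome, so real sibling overlap varies more than this model shows, but the average is the same: about half. The function name `shared_fraction` is illustrative.

```python
import random


def shared_fraction(num_blocks: int, rng: random.Random) -> float:
    """Fraction of one parent's blocks that two children inherited from the same grandparent.

    Simplified model (an assumption, not real recombination): each block is passed
    on independently, and each child gets the grandfather's copy ("GF") or the
    grandmother's copy ("GM") with probability 1/2.
    """
    child_a = [rng.choice(("GF", "GM")) for _ in range(num_blocks)]
    child_b = [rng.choice(("GF", "GM")) for _ in range(num_blocks)]
    same = sum(1 for a, b in zip(child_a, child_b) if a == b)
    return same / num_blocks


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so the run is repeatable
    for trial in range(4):
        print(f"trial {trial}: siblings share {shared_fraction(1000, rng):.1%} of this parent's blocks")
```

Each trial prints a value near 50%; the remainder is DNA that one child received from this parent and the other did not.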

In this case study, I present a detailed analysis of the AncestryDNA test results (an atDNA test) of my father and his three sisters: GI, GP, SI and SM. We'll examine how the inheritance of different DNA segments (or partial segments) results in the appearance of different relationships between family members - even though we know the genetic relationships are actually identical. For example, the third cousin of one full sibling is always the third cousin of every other full sibling, regardless of what the DNA estimates show.

A match on larger DNA segments is almost certainly genuine. Small DNA segments sometimes match for reasons other than a shared ancestor, such as chance. One of AncestryDNA's strengths is that it does a good job of eliminating many false, small-segment matches. This analysis pays special attention to differences among the siblings' matches that are close (shared atDNA of 20 centimorgans (cM) or more), moderate (10-20 cM), or distant (6-10 cM).
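The three bands can be written as a small classification function. This is a minimal sketch, assuming the input is the total shared cM reported for one sibling and one match, and that a missing value (or anything under 6 cM) means no reported match. The function name `match_band` and the label "none" are illustrative, not part of any AncestryDNA or DNAGedcom API.

```python
from typing import Optional

# Thresholds as used in this case study (total shared cM with a match).
CLOSE_MIN_CM = 20.0     # close: 20 cM or more
MODERATE_MIN_CM = 10.0  # moderate: 10 cM up to (but not including) 20 cM
DISTANT_MIN_CM = 6.0    # distant: 6 cM up to (but not including) 10 cM


def match_band(shared_cm: Optional[float]) -> str:
    """Classify one sibling's shared cM with a match into the study's bands."""
    if shared_cm is None or shared_cm < DISTANT_MIN_CM:
        return "none"  # no reported match for this sibling
    if shared_cm >= CLOSE_MIN_CM:
        return "close"
    if shared_cm >= MODERATE_MIN_CM:
        return "moderate"
    return "distant"


print(match_band(15.2))  # moderate
```

Treating 20 cM itself as close and 10 cM itself as moderate follows the ">=20cM" and ">=10cM" boundaries used later in this study.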

Key Findings

  1. When atDNA test results aren't available for both parents, test as many siblings as possible. Results from a fourth sibling still add a significant amount of key information compared to testing three.
  2. By testing four siblings, we were able to identify many more close matches than by testing just one sibling - or even two or three. Each sibling added several hundred close matches that weren't identified as close by any of the other three.
  3. Out of 2,128 close matches, only 112 were identified as close for all four siblings.
  4. Exactly 1,100 of the close matches were close to only one of the four siblings - and each sibling has from 193 to 462 of those.
  5. Although most researchers rightly ignore distant matches, identifying distant matches that appear among multiple siblings can be useful - especially if the matches are moderate or close for another sibling.

Close Matches

Many researchers prefer to focus only on their "close" matches. Researchers generally consider two people who share 20cM or more of atDNA, across one or more chromosome segments, to be a "close" match. These matches are the most certain, and the most recent common ancestor can often be identified. 20cM is typically a lower bound of the atDNA shared by two fourth cousins, but many other relationships can produce a similar match (such as double 6th cousins or a third cousin once removed). Many fourth cousins share less than 20cM, and some don't match at all. Still, many researchers treat 20cM as the lowest level of match they take the time to track.

Because full siblings have an identical genetic relationship to any given match, a match who is found to be close to one sibling should be evaluated as a close match to all of them.

Close match questions 1-3 are addressed in the first chart below.

Q1 - Close Match: How many unique individuals matched as "close" to one or more of the four siblings?

The four siblings had close matches to their cousins 3,728 times (this counts each sibling's matches separately, so an individual who is close to several siblings is counted several times). After removing duplicates, one or more siblings matched as close to 2,128 unique individuals who tested with AncestryDNA.
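The gap between the 3,728 sibling-level matches and the 2,128 unique individuals is the gap between adding up each sibling's list and taking the union of the lists. A minimal sketch, using made-up match IDs (`m1`, `m2`, ...) in place of real AncestryDNA identifiers; `count_matches` is a hypothetical helper name.

```python
from typing import Dict, Set, Tuple


def count_matches(close_by_sibling: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (sibling-level match count, number of unique individuals)."""
    # Sibling-level count: an individual close to three siblings is counted three times.
    sibling_level = sum(len(ids) for ids in close_by_sibling.values())
    # Unique count: the union of all four lists keeps each individual once.
    unique = set().union(*close_by_sibling.values())
    return sibling_level, len(unique)


# Hypothetical example data: made-up match IDs, not real results.
close = {
    "GI": {"m1", "m2", "m3"},
    "GP": {"m2", "m4"},
    "SI": {"m2", "m3"},
    "SM": {"m5"},
}
print(count_matches(close))  # (8, 5)
```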

Q2 - Close Match: How many close matches would have been identified if only one sibling had been tested?

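For three of the four siblings, about 840. Only GI has more than 1,200 close matches (possibly due in part to the chipset change noted in the Caveat). Each sibling's count is about 22% to 32% of the 3,728 sibling-level close matches. Measured against the 2,128 unique close matches, testing only one sister would have revealed about 40% of them, and testing only GI about 56%.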
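The two sets of percentages differ only in the denominator. A quick check, using GI's count of 1,201 (from Q5 below) and the approximate figure of 840 for a sister, since exact per-sister counts aren't given here; the helper name `share` is illustrative.

```python
# Figures reported in this case study; the per-sister value is approximate.
UNIQUE_CLOSE = 2128    # unique individuals close to at least one sibling
SIBLING_LEVEL = 3728   # close matches counted once per sibling


def share(count: int, total: int) -> float:
    """Fraction of `total` represented by `count`."""
    return count / total


for label, count in [("GI", 1201), ("one sister (approx.)", 840)]:
    print(f"{label}: {share(count, SIBLING_LEVEL):.0%} of sibling-level close matches, "
          f"{share(count, UNIQUE_CLOSE):.0%} of unique close matches")
```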

Genetics theory tells us that, on average, each sibling inherits about an equal amount of atDNA from each of their four grandparents. However, the DNA mix is always different, and sometimes that results in significant differences. That's very clear among my father and his three sisters. We expected all four siblings to have about the same number of close matches, and the three sisters did. GI, though, had a much larger number of close matches than any of his sisters. He apparently inherited a few key DNA segments, shared with many people, that his sisters did not - or the difference may result from the change in AncestryDNA's chipset. However, each sister also inherited segments that neither GI nor her other two sisters did.

Q3 - Close Match: When one sibling did not match as "close" while another one did, how did that sibling match?

For each sibling, hundreds of individuals who were "close" matches to another sibling did not match that sibling at all. A smaller percentage of a sibling's close matches showed up as a "moderate" match (10cM to under 20cM) for another sibling.

Relatively few matches showed up as "distant" for one sibling when another sibling was close (slightly over 100 matches each). It was far more common not to match at all than to match as distant.


Q4 - Close Match: How many individuals were matched as "close" to all 4 siblings? Close to 3? Close to 2? Close to only 1?

Remembering that GI, GP, SI and SM are the initials of the four siblings:

  • Only 112 individuals, or 5%, matched as close to all four siblings.
  • About 16% of the close matches are matched as close to three of the four siblings.
  • About 27% of the close matches are matched as close to exactly two of the four siblings.
  • Most importantly, 52% of the close matches are close to just one sibling.
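Tallying how many siblings each individual is close to takes two counting passes: count siblings per individual, then count individuals per sibling count. A minimal sketch with made-up IDs; `siblings_per_match` is a hypothetical name. The last two lines check the identity that ties this breakdown back to Q1: summing k × (individuals close to exactly k siblings) gives the sibling-level total, and summing the counts alone gives the unique total.

```python
from collections import Counter
from typing import Dict, Set


def siblings_per_match(close_by_sibling: Dict[str, Set[str]]) -> Counter:
    """Return a Counter mapping k -> individuals who are close to exactly k siblings."""
    per_individual: Counter = Counter()
    for ids in close_by_sibling.values():
        per_individual.update(ids)  # +1 for each sibling this individual is close to
    return Counter(per_individual.values())


# Hypothetical example data: made-up match IDs, not real results.
close = {
    "GI": {"m1", "m2", "m3"},
    "GP": {"m2", "m4"},
    "SI": {"m2", "m3"},
    "SM": {"m5"},
}
dist = siblings_per_match(close)
print(sorted(dist.items()))  # [(1, 3), (2, 1), (3, 1)]

# Consistency checks: these reproduce the sibling-level and unique totals.
print(sum(k * n for k, n in dist.items()))  # 8
print(sum(dist.values()))                   # 5
```

With the rounded percentages above, 4 × 112 + 3 × (about 340) + 2 × (about 575) + 1,100 comes to roughly 3,720, close to the 3,728 sibling-level close matches reported in Q1.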

Q5 - Close Match: For each close match by sibling, what is the match distance for the other three?

This diagram displays four separate charts. Each chart examines the close matches of a sibling independently of the others. The first section, for example, focuses on the 1,201 individuals matched closely to GI and details how closely they match to his sisters in every range.

Note how each category (close, moderate and distant) has been broken down one extra level by cM range.

  • Regardless of sibling, most of the close matches are between 20 and 40cM, not >=40cM.
  • Regardless of sibling, each close match is more likely to not match another sibling at all than to match them distantly.
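Each of the four charts is, in effect, a cross-tabulation: take one sibling's close matches and count how each other sibling matched the same people. A minimal sketch, assuming a table keyed by (made-up) match ID that holds each sibling's shared cM, with a missing entry meaning no match; `band` and `crosstab` are hypothetical names. The finer 20-40cM and >=40cM split used in the charts could be added as extra bands.

```python
from collections import Counter
from typing import Dict, Optional

SIBLINGS = ("GI", "GP", "SI", "SM")


def band(cm: Optional[float]) -> str:
    """Study bands: close >= 20 cM, moderate 10 to <20, distant 6 to <10, else none."""
    if cm is None or cm < 6:
        return "none"
    if cm >= 20:
        return "close"
    if cm >= 10:
        return "moderate"
    return "distant"


def crosstab(cm_by_match: Dict[str, Dict[str, float]], focus: str) -> Dict[str, Counter]:
    """For each of `focus`'s close matches, tally how each other sibling matched."""
    table = {other: Counter() for other in SIBLINGS if other != focus}
    for row in cm_by_match.values():
        if band(row.get(focus)) != "close":
            continue  # only the focus sibling's close matches are examined
        for other, tally in table.items():
            tally[band(row.get(other))] += 1
    return table


# Hypothetical example data: shared cM per sibling; a missing sibling means no match.
cm_by_match = {
    "m1": {"GI": 31.0, "SM": 8.0},
    "m2": {"GI": 22.5, "GP": 14.0, "SI": 25.0},
    "m3": {"GP": 45.0},
}
for other, tally in crosstab(cm_by_match, "GI").items():
    print(other, dict(tally))
```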

Findings: Since GI, SI, SM and GP are full siblings, they all have the same genetic relationship to all of each other's matches. Because they inherit different portions of each of their parents' atDNA, however, the estimates of their relationships to their matches can vary significantly. Hundreds of cousins who show as being closely related to one or more siblings may show as being distantly related to another - or not match at all.

Conclusion: Even when examining only "close" matches, it's still very important to get results from as many additional siblings as possible.


Total Matches

The four siblings have a total of 89,535 matches (counting each sibling's matches separately). After removing duplicates, we can identify 48,701 unique individuals who match one or more of the siblings.

Q1 - Total Matches: How do match counts break down between close, moderate and distant matches?

Close matches (dark blue) represent only a fraction of each sibling's total matches - although, as we learned above, the 2,128 known close matches are roughly 1.8 (GI) to 2.5 (each sister) times the number found for any single sibling.

Most matches are distant matches. Researchers will usually capture them, but not track them unless they match to other siblings or close cousins.

Q2 - Total Matches: Do most individuals who match to one sibling also match to another?

Not generally - although it varies by the distance of the match. For each sibling, there are about 21,000-30,000 individuals who match one or more of the other siblings but not that sibling.
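The 21,000-30,000 figure is a set difference: everyone matched by the other three siblings, minus everyone matched by this one. A minimal sketch with made-up IDs; `missed_by_each` is a hypothetical name. Because every individual in the combined list either matches a given sibling or doesn't, each sibling's count also equals the number of unique individuals overall (48,701 here) minus the size of that sibling's own match list, which the final assertion checks.

```python
from typing import Dict, Set


def missed_by_each(matches_by_sibling: Dict[str, Set[str]]) -> Dict[str, int]:
    """For each sibling, count individuals who match another sibling but not them."""
    result = {}
    for sib, own in matches_by_sibling.items():
        others = set().union(*(ids for s, ids in matches_by_sibling.items() if s != sib))
        result[sib] = len(others - own)
    return result


# Hypothetical example data: made-up match IDs, not real results.
matches = {
    "GI": {"a", "b", "c"},
    "GP": {"b", "d"},
    "SI": {"c"},
    "SM": {"e"},
}
missed = missed_by_each(matches)
print(missed)  # {'GI': 2, 'GP': 3, 'SI': 4, 'SM': 4}

# Identity: missed count = (unique individuals overall) - (sibling's own matches).
unique_total = len(set().union(*matches.values()))
assert all(missed[s] == unique_total - len(ids) for s, ids in matches.items())
```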


Q3 - Total Matches: For each match by sibling, what is the match distance for the other three?

As in the close match analysis above, this diagram contains four charts. For each sibling, it breaks down all of that sibling's matches by distance, then summarizes how those matches relate to each of the other three siblings.

  • Close matches are a very small part of the overall set of matches.
  • Each sibling doesn't match thousands of individuals that one of their siblings does.

Q4 - Total Matches: How often do all four siblings match to one individual?

The number of times all four siblings match to one individual is quite low: 2,179 out of 48,701 - just 4.5%. I've broken that number down into three groups:

  1. At least one of the four siblings was a "close" match (>=20cM)
  2. None of the four were close, but at least one was a "moderate" match (10 to 20cM)
  3. All four matches were distant.

I haven't drawn any particular conclusions from the data, but thought it was interesting.
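Because the three groups are nested, an individual matched by all four siblings can be assigned to a group by looking only at the strongest of the four matches. A minimal sketch, assuming each row maps sibling initials to shared cM and omits siblings with no match; `all_four_group` is a hypothetical name, and its return values 1-3 follow the numbered list above.

```python
from typing import Dict, Optional

SIBLINGS = ("GI", "GP", "SI", "SM")


def all_four_group(row: Dict[str, float]) -> Optional[int]:
    """Return group 1, 2 or 3 for an individual matched by all four siblings, else None.

    `row` maps sibling initials to shared cM and omits siblings with no match.
    """
    if any(s not in row for s in SIBLINGS):
        return None  # not matched by all four siblings
    strongest = max(row[s] for s in SIBLINGS)
    if strongest >= 20:
        return 1  # at least one close match
    if strongest >= 10:
        return 2  # none close, at least one moderate
    return 3      # all four distant


print(all_four_group({"GI": 12.0, "GP": 7.0, "SI": 6.5, "SM": 9.0}))  # 2
```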


Q5 - Total Matches: How often are individuals matched to only one of the four siblings?

About 45% of the unique individuals (21,748 of 48,701) matched to only one of the four siblings. Unsurprisingly, 17,815 of those 21,748 singleton matches (82%) were "distant" (<10cM). Other observations:

  • Only one singleton match was over 40cM. In every other case where one sibling matched at >=40cM, at least one other sibling also matched at some level (although perhaps not as a close match).
  • SM has a surprisingly high number of 20-40cM close singleton matches - more than GI, who has many more matches overall than SM.
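Singleton matches are the rows in which only one sibling appears. A minimal sketch, assuming a made-up table keyed by match ID that lists only the siblings who actually matched, and using the 20-40cM and >=40cM split from this section; `singleton_bands` is a hypothetical name. Applied to the full dataset, the "distant" tally would correspond to the 17,815 of 21,748 figure above.

```python
from collections import Counter
from typing import Dict


def singleton_bands(cm_by_match: Dict[str, Dict[str, float]]) -> Counter:
    """Tally individuals matched to exactly one sibling, by that sibling's cM band."""
    tally: Counter = Counter()
    for row in cm_by_match.values():
        if len(row) != 1:
            continue  # matched by several siblings: not a singleton
        (cm,) = row.values()
        if cm >= 40:
            tally["close, >=40cM"] += 1
        elif cm >= 20:
            tally["close, 20-40cM"] += 1
        elif cm >= 10:
            tally["moderate"] += 1
        else:
            tally["distant"] += 1
    return tally


# Hypothetical example data: only siblings who actually matched are listed.
cm_by_match = {
    "w": {"SI": 50.0},
    "x": {"SM": 25.0},
    "y": {"GI": 7.0, "GP": 9.0},
    "z": {"GP": 8.0},
}
print(singleton_bands(cm_by_match))
```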

The following table presents the data that appears in the chart below it.



Ethnicity

Although all four full siblings have the same genealogical background and inherit 50% of their atDNA from each parent, it's a randomly different 50% for each sibling. As a result, they each inherit a somewhat different set of ethnicity information, which becomes apparent when reviewing their ethnicity estimates.

GP's results were the most different. She had significantly less "Europe West" but more "Ireland" and "Great Britain." She even appears to have a stray bit of "Polynesia" thrown in, although such a trace amount is probably a testing blip.



Research Tools

This analysis would not have been possible without Rob Warthen's excellent data analysis tool, DNAGedcom. Access to the Windows or Mac client app (which I use to download detailed AncestryDNA data) costs $5/mo or $50/year. For more information, visit the DNAGedcom website.

I use Microsoft Access to import DNAGedcom data files for the four siblings, identify unique matches among the four siblings, and compare match distances among them. After downloading the DNAGedcom files, it takes only minutes to update and refresh all of the Access queries. I then use Microsoft Excel to point to the Access datasets and update the charts and graphics.

The initial effort to create the Access queries and Excel charts wasn't small, but now I can update all of the data and charts in about 15 minutes.
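For readers without Access, the core step (one row per unique individual, with each sibling's shared cM side by side) can be sketched in Python with the standard `csv` module. This is an illustration, not the Access queries described above: the column names `match_id` and `shared_cm` are hypothetical placeholders (a real DNAGedcom export may use different headers), `build_wide_table` is an illustrative name, and the in-memory files stand in for real downloads.

```python
import csv
import io
from typing import Dict, Iterable, TextIO, Tuple

# Hypothetical column names; a real DNAGedcom export may use different headers.
ID_COL = "match_id"
CM_COL = "shared_cm"


def build_wide_table(files: Iterable[Tuple[str, TextIO]]) -> Dict[str, Dict[str, float]]:
    """Combine per-sibling match files into {match_id: {sibling: shared cM}}."""
    wide: Dict[str, Dict[str, float]] = {}
    for sibling, handle in files:
        for record in csv.DictReader(handle):
            # setdefault creates the row the first time an individual is seen,
            # so each unique match appears once however many siblings share it.
            wide.setdefault(record[ID_COL], {})[sibling] = float(record[CM_COL])
    return wide


# In-memory stand-ins for two downloaded files (made-up data).
gi_file = io.StringIO("match_id,shared_cm\nm1,31.0\nm2,7.5\n")
sm_file = io.StringIO("match_id,shared_cm\nm2,22.0\nm3,9.1\n")
table = build_wide_table([("GI", gi_file), ("SM", sm_file)])
print(len(table))    # 3
print(table["m2"])   # {'GI': 7.5, 'SM': 22.0}
```

Once a table like this exists, each question in this study reduces to a pass over its rows: counting unique rows, counting siblings per row, or banding each sibling's cM.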
