Saturday, April 21, 2018

Does the Big Y500 Provide Value?

Along with most of the genetic genealogical world, I discovered Friday that Family Tree DNA had added 450 new Short Tandem Repeat markers (STRs) to our project's Big Y accounts. It was fascinating to look at these markers within my surname project and to try to understand whether these results provided any real value. Currently, we have 14 Big Y accounts spread across numerous lines and branches of the Owston/Ouston family, which originated in the Ryedale District of what is now North Yorkshire. You can access the project's charts at

One of the issues with the additional 450 markers is the prevalence of no-calls in the data. In our project, the number of no-calls varied by individual, ranging from 3 to 35 per person with an average of 11.07. If I remove the outliers of 20 and 35, the average drops to 8.3 no-calls per person. A total of 155 no-calls were reported from 7,854 cumulative markers, or 1.97% of the whole. This percentage is lower than the no-call rate reported in YFull’s analysis of the Big Y’s STR data.
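The arithmetic behind these figures can be checked with a short Python sketch. It uses only the totals quoted above; the variable names are illustrative, and reading 7,854 as 14 accounts times 561 markers each is an assumption based on the numbers, not something stated outright.

```python
# Totals quoted in the text; variable names are illustrative only.
participants = 14
total_no_calls = 155
outliers = [20, 35]          # the two highest per-person no-call counts
cumulative_markers = 7854    # assumed to be 14 accounts x 561 markers each

mean_all = total_no_calls / participants
mean_trimmed = (total_no_calls - sum(outliers)) / (participants - len(outliers))
no_call_rate = total_no_calls / cumulative_markers

print(f"mean no-calls per person: {mean_all:.2f}")
print(f"mean without outliers:    {mean_trimmed:.1f}")
print(f"no-call rate:             {no_call_rate:.2%}")
```

Run as written, this reproduces the 11.07, 8.3, and 1.97% figures given above.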

A second issue revolves around the naming of these markers, as it appears that FTDNA has placed proprietary names on many of the markers in Panel 6 (112-561). This makes it difficult to compare them with YFull results. Even where FTDNA uses the same marker name as YFull, the repeat counts do not always agree. This probably stems from how FTDNA counts the repeats. Those of us who transferred Y-33 and Y-46 results from Ancestry, GeneTree, and other companies to FTDNA experienced something similar in the past, and the same situation may apply here.

Third, several folks on various Facebook groups have noticed that numerous low values appear in the results. This may indicate a lack of variability across participants. The lowest marker value in our project was “4,” which was the modal result 116 times. Five repeats also occurred 116 times as the modal value, and “6” appeared 65 times. Among the 450 new markers, double-digit values appeared only 69 times as the modal result.

Fourth, although we have not tested many individuals from the same line or branches, we have observed only three markers that are genealogically relevant. It is helpful to note that the Cobourg line and the Ganton branch are overrepresented in the study, as it was necessary to check research conducted 20 years ago to ascertain whether the conclusions reached at the time were valid. More on that later. The relevant markers are DYS631, DYS489, and FTY510.

DYS631 at 11 repeats rather than the modal of 10 is a signature of the Cobourg line of the Sherburn family. All three participants who descend from William Owston (1778-1857) carry this result. This joins the signature value of 11 instead of 12 repeats on DYS643 in Panel Five. Five of the six members of the Cobourg line share this value; the sixth appears to be a back mutation. In addition, the three members of the Cobourg line share the A10921 SNP.

DYS489 at 13 repeats as opposed to the modal of 12 is shared by all four members of the Ganton branch of the Sherburn family who have tested with the Big Y. This is the only STR marker that is indicative of this branch, which descends from Thomas Owston (1755-1823) of Ganton, North Yorkshire. These participants also share the A10208 SNP.

FTY510 with 10 repeats as opposed to nine is shared by two seventh cousins, once removed, who descend from Richard Owston (c. 1670-1739) of Thornholme in the parish of Burton Agnes in the East Riding of Yorkshire. This family also shares another signature STR: Panel Four’s DYS481 at 25 repeats as opposed to the modal of 26. There is also a unique SNP for the Thornholme family: A15739.

Further Big Y testing may reveal other STR markers with genealogical significance; however, even without Panel 6, we already had family-specific SNPs in all three cases and other signature STRs in two of the three groups.
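The idea of a signature marker, a value shared by every tested member of a branch that differs from the project modal and appears nowhere else, can be sketched in Python. This is only an illustration under stated assumptions: the participant IDs and haplotypes are invented toy data, and only the three marker names and their repeat values come from the text above.

```python
from collections import Counter

# Toy data: participant IDs and haplotypes are hypothetical.
haplotypes = {
    "C1": {"DYS631": 11, "DYS489": 12, "FTY510": 9},
    "C2": {"DYS631": 11, "DYS489": 12, "FTY510": 9},
    "G1": {"DYS631": 10, "DYS489": 13, "FTY510": 9},
    "G2": {"DYS631": 10, "DYS489": 13, "FTY510": 9},
    "T1": {"DYS631": 10, "DYS489": 12, "FTY510": 10},
    "T2": {"DYS631": 10, "DYS489": 12, "FTY510": 10},
}
branches = {"C1": "Cobourg", "C2": "Cobourg", "G1": "Ganton",
            "G2": "Ganton", "T1": "Thornholme", "T2": "Thornholme"}


def modal_values(haplotypes):
    """Most common repeat count per marker across all participants."""
    markers = next(iter(haplotypes.values())).keys()
    return {m: Counter(h[m] for h in haplotypes.values()).most_common(1)[0][0]
            for m in markers}


def signature_markers(haplotypes, branches):
    """Markers where a whole branch shares one off-modal value nobody else has."""
    modal = modal_values(haplotypes)
    found = {}
    for branch in sorted(set(branches.values())):
        members = [p for p, b in branches.items() if b == branch]
        others = [p for p in haplotypes if p not in members]
        for marker, modal_value in modal.items():
            values = {haplotypes[p][marker] for p in members}
            if len(values) != 1:
                continue  # branch members disagree on this marker
            value = next(iter(values))
            if value != modal_value and all(haplotypes[p][marker] != value for p in others):
                found.setdefault(branch, []).append((marker, value))
    return found


result = signature_markers(haplotypes, branches)
print(result)
```

A strict rule like this would miss a case such as DYS643, where one of six Cobourg members carries a different value, so a real analysis would need some tolerance for back mutations.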


Having determined in our study that the additional 450 markers provide little genealogical value, I next asked whether they offer the ability to predict relationships based on genetic distance (GD). I analyzed the 91 relationships among our 14 participants.

Two distinct families who share a common ancestor born in the late 1400s comprise our study: the Sherburn family represents 91% of Owston and Ouston males, while the smaller Thornholme family rounds out the remaining 9%. Two Thornholme participants have taken the Big Y test.

The exact relationship between the Sherburn and Thornholme families is not presently known; however, by analyzing naming patterns, the closest possible relationships are represented here. There are other possibilities that would place the relationships one to two generations further back in time, but not closer. The two families were familiar with each other and it is believed that both are descended from John Owston who died in 1520. These conjectured relationships are identified with an asterisk.

The Big Y results represent the following relationships:

Relationship                Count
2nd Cousins, Once Removed       1
4th Cousins                     4
5th Cousins                     1
5th Cousins, Once Removed       2
6th Cousins                     2
7th Cousins                     1
7th Cousins, Once Removed       6
8th Cousins                     6
8th Cousins, Once Removed      14
8th Cousins, Twice Removed      3
9th Cousins                    10
9th Cousins, Once Removed       6
9th Cousins, Thrice Removed     1
10th Cousins, Twice Removed     7
11th Cousins, Once Removed      3
12th Cousins*                   1
12th Cousins, Once Removed*     8
12th Cousins, Twice Removed*    3
12th Cousins, Thrice Removed*   1
13th Cousins*                   7
13th Cousins, Once Removed*     3
13th Cousins, Twice Removed*    1
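As a quick consistency check, the counts above should add up to the 91 pairings possible among 14 participants, and the asterisked rows should add up to the 24 pairings between the two Thornholme participants and the remaining twelve (a figure inferred here from 14 minus 2). A short Python sketch, with the rows transcribed from the list and the data layout being an illustrative choice:

```python
from math import comb

# (relationship, count, conjectured) transcribed from the list above.
rows = [
    ("2nd Cousins, Once Removed", 1, False),
    ("4th Cousins", 4, False),
    ("5th Cousins", 1, False),
    ("5th Cousins, Once Removed", 2, False),
    ("6th Cousins", 2, False),
    ("7th Cousins", 1, False),
    ("7th Cousins, Once Removed", 6, False),
    ("8th Cousins", 6, False),
    ("8th Cousins, Once Removed", 14, False),
    ("8th Cousins, Twice Removed", 3, False),
    ("9th Cousins", 10, False),
    ("9th Cousins, Once Removed", 6, False),
    ("9th Cousins, Thrice Removed", 1, False),
    ("10th Cousins, Twice Removed", 7, False),
    ("11th Cousins, Once Removed", 3, False),
    ("12th Cousins", 1, True),
    ("12th Cousins, Once Removed", 8, True),
    ("12th Cousins, Twice Removed", 3, True),
    ("12th Cousins, Thrice Removed", 1, True),
    ("13th Cousins", 7, True),
    ("13th Cousins, Once Removed", 3, True),
    ("13th Cousins, Twice Removed", 1, True),
]

total = sum(count for _, count, _ in rows)
conjectured = sum(count for _, count, starred in rows if starred)

print(total, comb(14, 2))   # pairings listed vs. pairings possible
print(conjectured, 2 * 12)  # asterisked pairings vs. Thornholme x Sherburn
```

Both comparisons agree, so every pairing among the 14 participants is accounted for in the list.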

The following chart plots genetic distance against time to the most recent common ancestor (TMRCA). There were 14 instances where one party had a mismatch from the modal result and the comparison individual had a no-call. In these cases, the no-call was treated as having the modal result. This was an arbitrary decision that should not greatly affect the overall results.
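The no-call rule described above can be sketched in Python. This is a minimal illustration rather than the method used for the chart: it assumes GD is counted stepwise (the sum of absolute repeat differences per marker), which is not specified here, and the two haplotypes are invented for the example. Only the three marker names and their modal values come from this post.

```python
# Modal values for three markers mentioned in this post.
modal = {"DYS631": 10, "DYS489": 12, "FTY510": 9}

# Hypothetical haplotypes; None stands for a no-call.
person_a = {"DYS631": 11, "DYS489": 12, "FTY510": 9}
person_b = {"DYS631": None, "DYS489": 13, "FTY510": 9}


def genetic_distance(a, b, modal):
    """Stepwise GD (an assumption), with no-calls replaced by the modal value."""
    gd = 0
    for marker, modal_value in modal.items():
        va = a.get(marker)
        vb = b.get(marker)
        # Treat a missing or no-call result as the project modal.
        va = modal_value if va is None else va
        vb = modal_value if vb is None else vb
        gd += abs(va - vb)
    return gd


print(genetic_distance(person_a, person_b, modal))  # prints 2
```

Here the no-call at DYS631 is read as the modal 10, so it counts as one step away from the other person's 11, and the DYS489 difference adds a second step.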

The lowest GD was 1 for two fourth cousins, while the greatest GD of 20 was found between pairs of seventh cousins; seventh cousins, once removed; and 13th cousins. One of the parties was an outlier, as he had a value of 5 at DYS602, while the modal value was 12. The dots in the chart below often represent more than one relationship at that TMRCA and GD combination.


As one notices from the above chart, the results vary greatly. GD generally increases as relationship distance increases; however, there is enough variability among seventh cousins and beyond that the predictive power of the Big Y500 based on genetic distance is tenuous at best. Even for the closer relationships, it is impossible to accurately predict a relationship based solely on genetic distance: fourth cousins alone have a GD range of 1 to 10.

Because of this and the very few genealogically relevant markers identified for our family, I am hesitant to believe that the addition of these 450 markers provides much value for our purposes. The SNP results in the Big Y and the first five panels in the 111 test should be adequate for determining lineage signature markers. The SNP results are the real value of the Big Y test.

As Kelly Wheaton astutely added on her All Genetic Genealogy Facebook group, “The point that needs underscoring is that STRS change back and forth so their predictive value is less than SNPS that you either have them (and so do all your Y line descendants) or you don't. They are definite and predictive and now with Next Gen sequencing they are useful in a genealogical time frame. Sometimes STRS are helpful but they can also be misleading and have you barking up the wrong tree.”

This is only one family and one set of results; other projects may reach different conclusions. Furthermore, different haplogroups may provide a better return. More data will need to be gathered to better ascertain the overall value of the Big Y500.