Big data


It took 13 years to crunch through the 3 billion base pairs that make up the human genome. These data have been violating our assumptions ever since. My introductory biology textbook, published in 1996, speculates that there might be up to 100,000 genes in the genome. It turns out there are far fewer: about 20,000-30,000 by more recent estimates. The Human Genome Project sequenced only a few individuals and combined them all into one genome. However, many of the big questions we have about genetics concern the differences between individuals.

We are starting to get answers to these questions. In today’s issue of Nature, the 1000 Genomes Project published a paper. The project is a massive collaborative effort spanning three continents, designed to describe and explain the variation between individuals’ genomes.

In this work, several types of variation were investigated in independent pilot studies. First, inheritance patterns within several mother-father-child trios were examined. Second, a group of 179 people had their whole genomes sequenced. Last, sparser sequencing was done on ~700 people from very diverse genetic backgrounds. While this paper serves mostly as a progress report and proof of concept, one very interesting finding is that, on average, any given person carries 50-100 gene variants that have been associated with a higher risk of illness. This is very reminiscent of last week’s PNAS article showing that possessing such “risky” alleles does not decrease your lifespan to a statistically significant degree.



Durbin, R., Altshuler, D., Durbin, R., Abecasis, G., Bentley, D., Chakravarti, A., Clark, A., Collins, F., De La Vega, F., Donnelly, P., Egholm, M., Flicek, P., Gabriel, S., Gibbs, R., Knoppers, B., Lander, E., Lehrach, H., Mardis, E., McVean, G., Nickerson, D., Peltonen, L., Schafer, A., Sherry, S., Wang, J., Wilson, R., Gibbs, R., Deiros, D., Metzker, M., Muzny, D., Reid, J., Wheeler, D., Wang, J., Li, J., Jian, M., Li, G., Li, R., Liang, H., Tian, G., Wang, B., Wang, J., Wang, W., Yang, H., Zhang, X., Zheng, H., Lander, E., Altshuler, D., Ambrogio, L., Bloom, T., Cibulskis, K., Fennell, T., Gabriel, S., Jaffe, D., Shefler, E., Sougnez, C., Bentley, D., Gormley, N., Humphray, S., Kingsbury, Z., Koko-Gonzales, P., Stone, J., McKernan, K., Costa, G., Ichikawa, J., Lee, C., Sudbrak, R., Lehrach, H., Borodina, T., Dahl, A., Davydov, A., Marquardt, P., Mertes, F., Nietfeld, W., Rosenstiel, P., Schreiber, S., Soldatov, A., Timmermann, B., Tolzmann, M., Egholm, M., Affourtit, J., Ashworth, D., Attiya, S., Bachorski, M., Buglione, E., Burke, A., Caprio, A., Celone, C., Clark, S., Conners, D., Desany, B., Gu, L., Guccione, L., Kao, K., Kebbel, A., Knowlton, J., Labrecque, M., McDade, L., Mealmaker, C., Minderman, M., Nawrocki, A., Niazi, F., Pareja, K., Ramenani, R., Riches, D., Song, W., Turcotte, C., Wang, S., Mardis, E., Wilson, R., Dooling, D., Fulton, L., Fulton, R., Weinstock, G., Durbin, R., Burton, J., Carter, D., Churcher, C., Coffey, A., Cox, A., Palotie, A., Quail, M., Skelly, T., Stalker, J., Swerdlow, H., Turner, D., De Witte, A., Giles, S., Gibbs, R., Wheeler, D., Bainbridge, M., Challis, D., Sabo, A., Yu, F., Yu, J., Wang, J., Fang, X., Guo, X., Li, R., Li, Y., Luo, R., Tai, S., Wu, H., Zheng, H., Zheng, X., Zhou, Y., Li, G., Wang, J., Yang, H., Marth, G., Garrison, E., Huang, W., Indap, A., Kural, D., Lee, W., Fung Leong, W., Quinlan, A., Stewart, C., Stromberg, M., Ward, A., Wu, J., Lee, C., Mills, R., Shi, X., Daly, M., DePristo, M., Altshuler, D., Ball, 
A., Banks, E., Bloom, T., Browning, B., Cibulskis, K., Fennell, T., Garimella, K., Grossman, S., Handsaker, R., Hanna, M., Hartl, C., Jaffe, D., Kernytsky, A., Korn, J., Li, H., Maguire, J., McCarroll, S., McKenna, A., Nemesh, J., Philippakis, A., Poplin, R., Price, A., Rivas, M., Sabeti, P., Schaffner, S., Shefler, E., Shlyakhter, I., Cooper, D., Ball, E., Mort, M., Phillips, A., Stenson, P., Sebat, J., Makarov, V., Ye, K., Yoon, S., Bustamante, C., Clark, A., Boyko, A., Degenhardt, J., Gravel, S., Gutenkunst, R., Kaganovich, M., Keinan, A., Lacroute, P., Ma, X., Reynolds, A., Clarke, L., Flicek, P., Cunningham, F., Herrero, J., Keenen, S., Kulesha, E., Leinonen, R., McLaren, W., Radhakrishnan, R., Smith, R., Zalunin, V., Zheng-Bradley, X., Korbel, J., Stütz, A., Humphray, S., Bauer, M., Keira Cheetham, R., Cox, T., Eberle, M., James, T., Kahn, S., Murray, L., Chakravarti, A., Ye, K., De La Vega, F., Fu, Y., Hyland, F., Manning, J., McLaughlin, S., Peckham, H., Sakarya, O., Sun, Y., Tsung, E., Batzer, M., Konkel, M., Walker, J., Sudbrak, R., Albrecht, M., Amstislavskiy, V., Herwig, R., Parkhomchuk, D., Sherry, S., Agarwala, R., Khouri, H., Morgulis, A., Paschall, J., Phan, L., Rotmistrovsky, K., Sanders, R., Shumway, M., Xiao, C., McVean, G., Auton, A., Iqbal, Z., Lunter, G., Marchini, J., Moutsianas, L., Myers, S., Tumian, A., Desany, B., Knight, J., Winer, R., Craig, D., Beckstrom-Sternberg, S., Christoforides, A., Kurdoglu, A., Pearson, J., Sinari, S., Tembe, W., Haussler, D., Hinrichs, A., Katzman, S., Kern, A., Kuhn, R., Przeworski, M., Hernandez, R., Howie, B., Kelley, J., Cord Melton, S., Abecasis, G., Li, Y., Anderson, P., Blackwell, T., Chen, W., Cookson, W., Ding, J., Min Kang, H., Lathrop, M., Liang, L., Moffatt, M., Scheet, P., Sidore, C., Snyder, M., Zhan, X., Zöllner, S., Awadalla, P., Casals, F., Idaghdour, Y., Keebler, J., Stone, E., Zilversmit, M., Jorde, L., Xing, J., Eichler, E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, 
J., Cenk Sahinalp, S., Sudmant, P., Mardis, E., Chen, K., Chinwalla, A., Ding, L., Koboldt, D., McLellan, M., Dooling, D., Weinstock, G., Wallis, J., Wendl, M., Zhang, Q., Durbin, R., Albers, C., Ayub, Q., Balasubramaniam, S., Barrett, J., Carter, D., Chen, Y., Conrad, D., Danecek, P., Dermitzakis, E., Hu, M., Huang, N., Hurles, M., Jin, H., Jostins, L., Keane, T., Quang Le, S., Lindsay, S., Long, Q., MacArthur, D., Montgomery, S., Parts, L., Stalker, J., Tyler-Smith, C., Walter, K., Zhang, Y., Gerstein, M., Snyder, M., Abyzov, A., Balasubramanian, S., Bjornson, R., Du, J., Grubert, F., Habegger, L., Haraksingh, R., Jee, J., Khurana, E., Lam, H., Leng, J., Jasmine Mu, X., Urban, A., Zhang, Z., Li, Y., Luo, R., Marth, G., Garrison, E., Kural, D., Quinlan, A., Stewart, C., Stromberg, M., Ward, A., Wu, J., Lee, C., Mills, R., Shi, X., McCarroll, S., Banks, E., DePristo, M., Handsaker, R., Hartl, C., Korn, J., Li, H., Nemesh, J., Sebat, J., Makarov, V., Ye, K., Yoon, S., Degenhardt, J., Kaganovich, M., Clarke, L., Smith, R., Zheng-Bradley, X., Korbel, J., Humphray, S., Keira Cheetham, R., Eberle, M., Kahn, S., Murray, L., Ye, K., De La Vega, F., Fu, Y., Peckham, H., Sun, Y., Batzer, M., Konkel, M., Walker, J., Xiao, C., Iqbal, Z., Desany, B., Blackwell, T., Snyder, M., Xing, J., Eichler, E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J., Chen, K., Chinwalla, A., Ding, L., McLellan, M., Wallis, J., Hurles, M., Conrad, D., Walter, K., Zhang, Y., Gerstein, M., Snyder, M., Abyzov, A., Du, J., Grubert, F., Haraksingh, R., Jee, J., Khurana, E., Lam, H., Leng, J., Jasmine Mu, X., Urban, A., Zhang, Z., Gibbs, R., Bainbridge, M., Challis, D., Coafra, C., Dinh, H., Kovar, C., Lee, S., Muzny, D., Nazareth, L., Reid, J., Sabo, A., Yu, F., Yu, J., Marth, G., Garrison, E., Indap, A., Fung Leong, W., Quinlan, A., Stewart, C., Ward, A., Wu, J., Cibulskis, K., Fennell, T., Gabriel, S., Garimella, K., Hartl, C., Shefler, E., Sougnez, C., Wilkinson, J., Clark, A., 
Gravel, S., Grubert, F., Clarke, L., Flicek, P., Smith, R., Zheng-Bradley, X., Sherry, S., Khouri, H., Paschall, J., Shumway, M., Xiao, C., McVean, G., Katzman, S., Abecasis, G., Blackwell, T., Mardis, E., Dooling, D., Fulton, L., Fulton, R., Koboldt, D., Durbin, R., Balasubramaniam, S., Coffey, A., Keane, T., MacArthur, D., Palotie, A., Scott, C., Stalker, J., Tyler-Smith, C., Gerstein, M., Balasubramanian, S., Chakravarti, A., Knoppers, B., Abecasis, G., Bustamante, C., Gharani, N., Gibbs, R., Jorde, L., Kaye, J., Kent, A., Li, T., McGuire, A., McVean, G., Ossorio, P., Rotimi, C., Su, Y., Toji, L., Tyler-Smith, C., Brooks, L., Felsenfeld, A., McEwen, J., Abdallah, A., Juenger, C., Clemm, N., Collins, F., Duncanson, A., Green, E., Guyer, M., Peterson, J., Schafer, A., Abecasis, G., Altshuler, D., Auton, A., Brooks, L., Durbin, R., Gibbs, R., Hurles, M., & McVean, G. (2010). A map of human genome variation from population-scale sequencing Nature, 467 (7319), 1061-1073 DOI: 10.1038/nature09534