
The story behind the paper “Cross-Disciplinary Evolution of the Genomics Revolution” in Science Advances

August 15, 2018
Biology-Computing College in the United States

Evolution of the faculty collaboration network in 155 biology and computing departments in the United States. In 1990, the network is fragmented, with small groups of biologists (green) or computer scientists (magenta) working in isolation. There are very few genomics scholars (black). By 2015, there is a remarkable transformation: almost all faculty are interconnected in a giant component, which manifests a team science environment. The most prominent nodes in the network are the cross-disciplinary genomics nodes (large black discs).

 

I do not remember exactly when the concept of the genomics study formed in my mind – it must have been sometime in 2015, after our science policy publication in Nature Physics. I am a cross-disciplinary person by training and have been pursuing cross-disciplinary research all along. I have thoroughly enjoyed the ride, but I have also experienced first hand how challenging such a career path can be. The challenge is not only intellectual, as one has to master more than one discipline; it is mainly social and professional. A cross-disciplinary researcher has to deal with more than one academic culture, is sooner or later viewed with suspicion by all the ‘tribes’ s/he interacts with, and even in success does not get the credit s/he deserves. A case in point for the latter was my transition from corporate to academic research back in 2002. By that time, I had publications in Nature, Lancet, and the New England Journal of Medicine. Yet this meant nothing to the search committees of computer science departments, and I ended up in a rather low-ranking department.

 

In this difficult landscape, I observed with some surprise the relatively harmonious relationship that some of my bioinformatics colleagues had with biologists, and the success they were enjoying in the 2000s and 2010s. I was also in awe of the rapid advancements in genomics following the Human Genome Project (HGP). I could think of no other government-funded research program that had such a catalyzing impact.

 

By the early 2010s, it was evident that funding agencies were consciously gearing up towards more cross-disciplinary programs to address grand challenges. The most apt example was the BRAIN Initiative, but it was not the only one. ‘Wherever there is honey, there are bees.’ Hence, scholars from different disciplines were banding together to have a chance when bidding on these solicitations. My experience and observation, however, were that these cross-disciplinary collaborations were shallow and short-lived. Slowly, a question was growing in my mind: what makes some cross-disciplinary teams tick persistently, while most others have the permanency of a sand castle, if that?

 

I talked to my friend Alex Petersen, who is much better versed than I am in science of science matters, and he kindly agreed to get involved in an ambitious investigation into the mystical cultural forces of genomics. We were betting that such a study would shed light on the broader cross-disciplinarity question. I was also fortunate enough to have two talented Ph.D. students, Dinesh and Karl, who worked well together. So, we embarked on an adventurous intellectual journey – did I mention without any funding? I did not have any concrete ideas about how to analyze the data, but thankfully Alex did. I had, however, strong opinions about where to find the data. I suggested we look into the profiles of all faculty in biology and computing departments in the United States. Not all the people engaged in genomics research were in these two types of departments, but many were, forming a reasonable sample to test our hypotheses (or so we thought). Importantly, nobody could argue about the original disciplinary orientation of these faculty, as academic departments tend to hire their own kind – a truth that I had painfully experienced. I also insisted on collecting bibliographic data via Google Scholar, rather than Scopus or Web of Science. I was afraid of the backlash from computing scholars, who consider Scopus and Web of Science to under-represent their conference paper portfolios and associated citations.

 

The basic hypothesis was that collaboration between biologists and computing scholars on genomics grounds was strong and fruitful, producing impactful science and career benefits for all parties involved. If this were true, then genomics was certainly a bright exception, and the underlying reasons for its success could serve as a beacon for science policy, which was treading water on other cross-disciplinary efforts.

 

Very soon, however, the enormity of the undertaking hit home. The only way to collect faculty members from each relevant department was to go to departmental home pages and record every faculty member under the ‘People’ tab. Furthermore, Google Scholar did not have an Application Programming Interface (API), which meant that the bibliographic data had to be scraped through sophisticated scripting. Regarding funding data, only NIH and NSF had decent public grant tables that could be harvested for data analytics. Linking scholar names across all these data sources required a fair amount of name disambiguation. My Ph.D. students, Dinesh and Karl, had superb algorithmic and software engineering skills and took care of everything, but it still took several months to have a fully curated dataset for about 80 biology and computing departments in the United States.
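To give a feel for what linking names across sources involves, here is a minimal sketch. It is not the pipeline Dinesh and Karl built, which is not described here; the record layout (`name`, `pi`, `department`, `agency`) and the matching rule (surname plus first initial) are assumptions made purely for illustration.

```python
import unicodedata
from collections import defaultdict


def name_key(full_name):
    """Reduce a name to (surname, first initial), ignoring case, accents, punctuation."""
    # Decompose accented characters and drop the accent marks.
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    if "," in ascii_name:
        # "Surname, Given" form, common in grant tables.
        surname, given = [part.strip() for part in ascii_name.split(",", 1)]
    else:
        # "Given Surname" form, common on departmental pages.
        parts = ascii_name.split()
        if not parts:
            return ("", "")
        surname, given = parts[-1], " ".join(parts[:-1])
    surname = "".join(ch for ch in surname.lower() if ch.isalpha())
    return (surname, given[:1].lower())


def link_records(faculty, grants):
    """Link each grant to a faculty member when the name key is unambiguous.

    Grants with zero or several candidate faculty members are returned
    separately so they can be resolved by hand or by extra evidence.
    """
    index = defaultdict(list)
    for person in faculty:
        index[name_key(person["name"])].append(person)

    linked, unresolved = [], []
    for grant in grants:
        candidates = index.get(name_key(grant["pi"]), [])
        if len(candidates) == 1:
            linked.append((candidates[0]["name"], grant))
        else:
            unresolved.append(grant)
    return linked, unresolved
```

A key this coarse deliberately pushes collisions (two faculty members named J. Smith, say) into the unresolved pile instead of guessing, which is where affiliation, co-authors, or manual checks would have to come in.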

 

Alex developed an analysis method based on network theory, producing a stunning graph with clear evidence of widespread and highly successful cross-disciplinarity in the biology-computing college. We wrote the manuscript and sent it to Nature Biotechnology. To my surprise, it was not rejected editorially, which meant that the concept was indeed appealing. It took, however, about six months to clear the review process, a delay that wreaked havoc on our thinly resourced effort. The reviews that came back were split, but the editor chose to side with the negative ones. The main criticism was that the dataset was small and there was not enough statistical analysis. Both points were true.

 

We were really traumatized by the data collection effort, and for this reason we focused on the statistical analysis to improve the paper. In addition to his network analysis method, Alex developed a cross-sectional model showing that cross-disciplinarity pays career dividends, significantly boosting reputation. The manuscript was totally rewritten to incorporate the new analytic results. This time we sent the revised paper to PNAS. In a positive sign, the manuscript passed editorial screening and was sent for peer review. The reviews came back in three months, featuring a litany of complaints. The only point everybody agreed on was that this was well-motivated research. Reviewers again pointed to the limited size of the dataset. To this, they added that the dataset was confined to the United States, and they wanted to see what happened on the international stage, too. They also thought that the cross-sectional analysis was not enough. Additional time series analysis was required, and at a much more detailed level.

 

It was late 2016; we were over a year into this research effort, with no funding and no easy way out. I started applying to the NSF Science Policy program, using the pilot results we had at hand as our ‘credentials’. In parallel, we agreed to undertake a radical revision of our research effort. By that time, Karl had graduated, so it was only Alex, Dinesh, and me on the team. We decided to double the size of the US biology-computing dataset, reaching a number close to 160 departments and their faculty. Dinesh took care of this. In addition to the network and cross-sectional models, Alex developed a panel model for time series analysis at the publication level. He also brought in additional datasets from the international literature, which he acquired from the Web of Science. By construction, these datasets were not as detailed as the one we built from scratch for the US biology-computing college, but they were sufficient for validation checks.

 

You know, many times you are in a research endeavor, you love the concept, but you have a lingering doubt about the results. You wonder: Will the results scale up? Are they true, or are we looking at a ‘phantom’? I stopped having these doubts after the initial results stood their ground despite the doubling of the biology-computing dataset. Not to mention that the validity checks from the international literature were pointing in the same direction, and the panel analysis was in agreement, too. It was true, then: cross-disciplinarity was a constitutional element of genomics, and we had documented one key reason – genomics was a career maker for both the biologists and the computer scientists involved.

 

A new manuscript was drawn up, which we sent back to PNAS with a point-by-point response to the original reviews. We felt upbeat, and we were hoping that the reviewers would appreciate our improvements in response to their initial critical comments. A few months later, the new reviews came back. The reviewers recognized the progress that had been made, but they thought we had not crossed the threshold. They pointed out that the network analysis was static, looking only at the current state of the network and not at its evolution. They also thought that it was not clear where genomic cross-disciplinarity ended and some other type of cross-disciplinarity started in the biology-computing college. These were fair points. They also had some unfair points – they apparently misread some issues, due to the non-optimal organization and terminology of the manuscript at that point.

 

Fair or not, the result was devastating, as we were crossing the two-year mark in this unfunded research effort with nothing in our hands despite mighty efforts. There was more bad news, as our NSF proposal almost made it, but not quite. The program manager told me that it was rated Highly Competitive, but that she could not fund it in that cycle due to a lack of funds. She encouraged me to revise a couple of minor points and resubmit. Surprisingly, the same proposal that was lauded in the first round got mixed reviews the second time.

 

Sometimes, when your morale hits rock bottom, you hang in the balance, ready to abandon everything. You hear colleagues bragging about this paper or that paper that got accepted at one conference or another after a superficial two- or three-month effort. You know that several of these papers are BS, because out of curiosity you start reading them, but you stop halfway through, as by this time you are so accustomed to perfection and high standards that you have become BS-resistant. This experience simply magnifies your depression. Then the annual evaluation time comes in your department, and the only thing you have to show is half a dozen rejections in the last two years and an almost suicidal effort to keep improving a piece of research that keeps being rejected, because the bar keeps being placed higher and higher. You have to resort to the meditations of Marcus Aurelius to make it through these dark times.

 

After some hesitation, we decided to double down. In 2017, a new Ph.D. student, Emtiaz, with a graph theory background, joined my lab. I asked him to transform the network analysis from static to dynamic, covering the entire period of observation from the start of the Human Genome Project (HGP) up to now. Alex took a second look at the entire methodological part of the paper, tightening up everything. Dinesh updated all our datasets, as by now we were well into the third year of this research, and some of the original publication/citation/funding data were outdated. I rewrote the entire manuscript from scratch, incorporating all the new data and results. I was parsing every word, sometimes for days. I came up with a very clean and consistent terminology. It was a total face lift. The new paper looked pretty good, but by this time we were so badly battered that we did not hope for anything. We were doing what we were doing simply for ourselves and to satisfy our internal sense of perfection.
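The difference between a static and a dynamic network view can be illustrated with a small sketch that tracks, year by year, the share of faculty sitting in the largest connected component (the ‘giant component’ of the figure at the top). This is only an illustration under stated assumptions, not the method used in the paper: the input is assumed to be a list of faculty names plus `(scholar_a, scholar_b, year)` co-authorship triples, every scholar in an edge is assumed to appear in the node list, and links are accumulated over time rather than taken in a sliding window.

```python
from collections import defaultdict


def giant_component_fraction(nodes, edges):
    """Fraction of nodes in the largest connected component (union-find)."""
    if not nodes:
        return 0.0
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    sizes = defaultdict(int)
    for n in nodes:
        sizes[find(n)] += 1
    return max(sizes.values()) / len(nodes)


def yearly_fractions(nodes, dated_edges, years):
    """Giant-component fraction per year, using all links formed up to that year."""
    return {
        year: giant_component_fraction(
            nodes, [(a, b) for a, b, t in dated_edges if t <= year]
        )
        for year in years
    }
```

On a toy network of four scholars with links formed in 1995, 2000, and 2010, the fraction climbs from 0.25 in 1990 (everyone isolated) to 1.0 in 2015 (everyone connected), which is the kind of trajectory a static snapshot cannot show.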

 

We sent the manuscript to Science Advances. It was sent for review and the answer came a couple of months later – it was a major revision. At long last, there was light at the end of the tunnel. The reviewers and the editors liked the manuscript but had several difficult questions. They wanted even harder proof of the cross-disciplinarity advantages and, importantly, they wanted us to connect the abstract science of science numbers with the real people and the breakthroughs they managed to pull off together. The latter was an excellent point that came into focus for the first time. Apparently, the reviewers had roots in sociology and were viewing things through a different lens – a better lens, for that matter.

 

Alex came up with a brilliant plan to enhance the panel modeling with counterfactual matching. Hence, we now had three levels of inference: (a) cross-sectional inference across careers; (b) panel inference within careers; and (c) matched-pair inference comparing cross-disciplinary with mono-disciplinary papers of the same genomics author from approximately the same ‘harvest period’. The impact advantage of cross-disciplinarity was manifest at all three levels of analysis.
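The matched-pair idea in (c) can be sketched in a few lines. This is a deliberately simplified illustration, not Alex's model: the record fields (`author`, `year`, `citations`, `cross`), the one-year matching window, and the use of raw citation differences are all assumptions chosen for the example.

```python
def matched_pair_differences(papers, max_gap=1):
    """Citation difference between each cross-disciplinary paper and the
    same author's nearest-in-time mono-disciplinary paper.

    papers  -- list of dicts with keys: author, year, citations, cross (bool)
    max_gap -- largest allowed difference in publication year for a match
    """
    by_author = {}
    for paper in papers:
        by_author.setdefault(paper["author"], []).append(paper)

    differences = []
    for author_papers in by_author.values():
        cross = [p for p in author_papers if p["cross"]]
        mono = [p for p in author_papers if not p["cross"]]
        for c in cross:
            # Candidate controls: same author, roughly the same period.
            candidates = [m for m in mono if abs(m["year"] - c["year"]) <= max_gap]
            if not candidates:
                continue  # no comparable paper, so this one is left unmatched
            match = min(candidates, key=lambda m: abs(m["year"] - c["year"]))
            differences.append(c["citations"] - match["citations"])
    return differences
```

Comparing papers within the same author and period holds the author's reputation and career stage roughly fixed, so a consistently positive difference points at the cross-disciplinary nature of the paper rather than at who wrote it. Note that this sketch matches with replacement (one mono-disciplinary paper can serve as the control for several cross-disciplinary ones); a real analysis would also need to handle citation skew and field differences.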

 

On my end, I looked into the data, as organized by our models. I began the tracing from the HGP papers in 2000, following the original scholars and how they collaborated in subsequent years with authors from the biology-computing college. An intriguing picture was coming into view – the consortium model that gave us the human genome did not go away. New consortia kept forming, seeded with some of the original HGP authors and populated with new scholars, unlocking the genomes of important fauna and flora: the chimpanzee genome consortium, the chicken genome consortium, the rice genome consortium, and on, and on. Our algorithms, trained to paint green the scholar nodes with biology pedigree and magenta the scholar nodes with computing pedigree, were producing a beautiful multi-color network picture for these highly published and highly cited papers. Genomics was blooming in the 2000s under a new science organizational model with staying power, and with disciplinary boundaries replaced by disciplinary bridges. I have heard some people over the years leveling criticism against genomics. No matter what somebody’s misgivings are, nobody can take away from genomics the surprisingly unnoticed revolution that it brought to cross-disciplinary collaborations. And whoever does not appreciate this has not been burnt in the cross-disciplinary ‘killing fields’ of times past, and so cannot realize how great an accomplishment this is and the promise it holds for the future of science.

 

In the meantime, I had talked to the NSF program manager about the inconsistency in the reviews between the two panels. She was sympathetic and willing to give us a chance. We got a small EAGER award, for which we promised to finish up our genomics investigation and follow up with an investigation of the cross-disciplinary forces in brain science, using similar methods. It appears that all bad things happen together, but so do the good things. I promised a party and everybody said they are coming; even Alex from California … by car … to Houston, Texas!

 

The paper can be accessed at: Petersen et al., Sci. Adv. 2018;4: eaat4211

The outreach site can be accessed at:  Outreach for Sci. Adv. 2018;4: eaat4211

 

 
