I don't necessarily disagree with this guy's take, but what's with his writing style and attitude? That said, he's hardly the first person to point out that we're generating a lot more data than we know what to do with, and that we're also generating a lot of useless data sets from which meaningless conclusions are often drawn. But, personally, I'm sick of the style of writing that's so prevalent on the internet. Instead of "I disagree and here's why", it's always "fuck off and fuck you and everyone who agrees with you". It's a tired act that isn't entertaining, funny, or valuable. Is this guy a smart person with something to say? Who knows. Why not prove it by offering an alternative to business as usual? My gut response to this article (and bear in mind that I agree with his main point) is "fuck off, douche bag." It solves nothing to talk that way. In science we need debate and reason, not mudslinging.
I'm so conflicted here. This is rant-worthy stuff, and I also agree. Hell, we've paid for this kind of analysis. I would love to hear some specifics, however. Biology is dirty, confusing, and complicated. I think textbooks do an incredible amount of harm by drawing diagrams of cellular systems as if things actually worked the way they suggest. Genes don't do things, and proteins don't have specific functions. Journals love to say that they do, however. Everything is contextual and part of multiple feedback loops. Bioinformatics can be valuable when applied in that light. However, using bioinformatics to find parts of an intricate machine is dumb. I want to know what Fred Ross's expectations are. That said, most bioinformatics is based on unjustified expectations.
I'm curious: do bioinformaticists just not ask the right questions? Are people just sequencing, building (potentially faulty) genomes and tools to depict and search them, and then abandoning them? Sadly, the author of the linked piece gave few specifics about where scientists in the field are failing.
IMO that's not just a problem with bioinformatics, but with much of biology in general. TBH, I think some of the language we use in publications is absurd, and I have been guilty of it myself. We speak with too much certainty and don't use contextual terms enough. If you find that a protein does something in one context, you should report that the protein does it in that context. Don't say that is what the protein does, and please, don't name it for the context in which you found it. I work with miRNA a lot, and one thing I love about miRNAs is that each is just given a number. It's so liberating and much more honest, IMO. Thanks for the post, BTW.
I disagree. Sometimes gene names can be a little over the top, but protein names tend to be pretty accurate (kinase target, structural isomerase, etc.), and in general a lot of the language makes sense in the context of the history of the research. Given the precedent already set by past research, I'd say that returning to a flat numbering scheme would be counterproductive and would just complicate discussion. It's at least easy to add a note, when talking about a protein, if its function differs from its name. Oftentimes researchers come up with pretty new methods / patterns that require new, descriptive names. Yeah, it means you have to spend time learning the language if you want to be able to understand, but fuzzy language hurts communication, and domain-specific vocabulary is pretty common in most research fields (try reading a math publication with only a bio background).

There is something to be said about language duplicated from more general fields (re-inventing math terms in biology / chem / etc.), but that's easy to criticize and hard to fix, given that scientists will often hide their newest projects from others, preventing them from getting comments from more knowledgeable peers on what to call things. Then you just have peer reviewers as the safety net, and things fall past them all the time.

How common is it to have defined functions (catalysis, binding certain chemistries, etc.) when it comes to miRNA? Oh, no, thank you all for the discussion, I love getting to talk with others about hardcore science :D
Molecular biology is the real problem, as he correctly states. Too much data with no way to make sense of it. Most studies are devoid of any meaning deeper than the very specific technique being reported. More data collected on better machines is what passes for good science right now. Lots of time and money wasted.
I am admittedly in the minority of thinkers on this. Here is how the typical molecular biology study goes these days: 1) we have a disease model; 2) in our model, we have found that gene/protein X is altered; 3) we restored expression/function/activity of gene/protein X to a more physiological level; 4) we were able to mitigate the disease state to some extent. Most disease models are non-physiological, for starters. Second, and most important, only specific genetic diseases are caused purely by a gene being out of whack. Most diseases are caused by a combination of genetic and environmental factors, and their etiologies are very complex. All we do with most studies is try to put fingers in the dike. Unfortunately, the grant system and the academic literature system are best equipped to score these types of studies. NIH wants logic, technical ability, and sensitive measurement, without much regard for utility. Fancy machines make great slides to show Congress. There is nothing more torturous than sitting through a talk where a researcher describes their "exciting new microarray results." If you want an actual specific example, go to PubMed and just start taking a peek around.
I don't follow... sure, a lot of research is medically inspired, but that threads off into plenty of research on the entire workings of the cell. I'm right now reading about kinetic studies of protein folding as a factor in the formation of protein aggregates, which may or may not be linked to a number of diseases, including ALS, Alzheimer's, and drug resistance. Then you have entire pathways of metabolism being studied to aid in nutrition and obesity studies (and vice versa). Sure, but you still have larger categories that can be slightly more distantly linked to general diseases (e.g., tumor-suppressor genes and cancer). This is where you've lost me: does medical application or biosynthesis not count toward utility? Citing one would be preferable; I'd say that the majority of the microbiology papers I've read thus far made sense in some context or another. Some are shite, but higher-tier journals are generally a little better at filtering for that, and who am I to judge if someone's research is worthless? :P
I've reached out to Frederick and he said that if anyone has any questions for him regarding this piece they can email him. He seems a nice fella. I'll pm you his email address. If anyone else is interested in asking him a question, please let me know.
I agree. The author certainly sounds like he's had some negative experiences in that field. It's true that a lot of the time the 'bioinformatics component' of a research project consists entirely of generating a phylogenetic tree and a multiple sequence alignment. It's also true that at some point some key people and research groups stopped mining their data for informative results and simply started accruing massive datasets, hoping that someone else (bioinformaticists, I guess) would have the mindset and/or develop the tools necessary to do meaningful analyses of those datasets. I have my own personal tale of working in the same institute as someone who made some very pretty pictures by analyzing massive sets of protein sequences, using grid computing environments to handle the number-crunching. It was hailed as a big success and a perfect marriage of high-performance computing, bioinformatics, and basic bench science. But it was ultimately worthless, because we learned nothing new.
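Just to make concrete how thin that 'bioinformatics component' can be, here's a minimal sketch using Biopython (the input file name and format are hypothetical placeholders). In the cases I'm describing, this is roughly the whole deliverable:

    # The "bioinformatics component": read an alignment, build a tree, draw it.
    # Input file name and format are hypothetical placeholders.
    from Bio import AlignIO, Phylo
    from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

    # Load a pre-computed multiple sequence alignment (e.g., Clustal output).
    alignment = AlignIO.read("proteins.aln", "clustal")

    # Build a neighbor-joining tree from pairwise identity distances.
    calculator = DistanceCalculator("identity")
    constructor = DistanceTreeConstructor(calculator, method="nj")
    tree = constructor.build_tree(alignment)

    # Render the tree; one figure for the paper, and often not much more.
    Phylo.draw_ascii(tree)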
Well, you've gotta have tool and method development as at least a fraction of research. He might not have answered any scientific questions, but I also contribute to the development of software for other scientists, and I can certainly point to plenty of examples of its worth, both in improving performance and in feature set. If anything, not enough effort is put into that category of research. Molecular dynamics simulations can take days to run, but most computational biologists in that field care more about proving the correctness of their results than about optimizing performance.
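To give a flavor of the kind of optimization that goes unclaimed there, here's a toy sketch (array sizes and data are hypothetical): the naive pairwise-distance loop you often find in analysis scripts, next to a vectorized NumPy version that produces the same numbers orders of magnitude faster.

    import numpy as np

    def pairwise_distances_naive(coords):
        """All pairwise atom distances via explicit Python loops (slow)."""
        n = len(coords)
        dists = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                dists[i, j] = np.sqrt(np.sum((coords[i] - coords[j]) ** 2))
        return dists

    def pairwise_distances_vectorized(coords):
        """Same result via NumPy broadcasting; no Python-level loops."""
        diffs = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
        return np.sqrt((diffs ** 2).sum(axis=-1))

    # A hypothetical frame of 200 atoms in 3D: identical output, very
    # different runtime once the atom or frame count grows.
    frame = np.random.rand(200, 3)
    assert np.allclose(pairwise_distances_naive(frame),
                       pairwise_distances_vectorized(frame))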
I am currently studying molecular biology and plan to take computer science when I finish my B.Sc. in the former. I was hoping to combine the two and work in bioinformatics, but after reading the post and the comments here I'm feeling a bit discouraged, since it kind of sounds like this is a dying / useless field (although I know this might be a pretty isolated and somewhat biased sample). Is there cause for my worries? All comments welcome!
I am a scientist working in molecular biology. Personally, I love my job; I don't know if I could find another career I would enjoy more. I spend about a third of my time tinkering in the lab, a third reading and writing, and a third bullshitting with colleagues about what experiments would be cool to perform. Perfection. I think there is a place for bioinformatics, and I think it will only get more important as time goes on. But, IMO, its great usefulness will come in meta-analyses, which are often severely lacking. Analyzing the genome (or proteome, or transcriptome, or any other "-ome") of a tumor is a waste of time in some sense, but it might get you paid quicker. Meta-analyses of the volumes of data already out there, though, are what I think will tell us what it is we're all looking at. There is too much data for anybody to make heads or tails of otherwise. Bioinformatics will be very important to disease treatment in the future, but it hasn't really found its sea legs yet. If I were starting from scratch, I would probably do my graduate work in mathematical biology, because I regret that I don't get to use my math skills in my work as often as I would like. But all in all, the population is getting sicker, so if you're in biomedical research and you have something to contribute, you'll have a job somewhere.
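To sketch what I mean by meta-analysis in the simplest possible terms (the numbers here are made up), even just pooling p-values across independent studies can surface a signal that no single paper establishes on its own:

    from scipy import stats

    # Hypothetical p-values for the same gene-disease association,
    # reported by five independent studies.
    study_pvalues = [0.04, 0.21, 0.009, 0.11, 0.06]

    # Fisher's method combines the evidence across studies.
    statistic, combined_p = stats.combine_pvalues(study_pvalues, method="fisher")
    print(f"Combined p-value across studies: {combined_p:.4f}")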
The field I work in is targeted delivery of antisense oligonucleotides (microRNAs, mostly) for treatment of disease. I think this field will get a lot bigger, but that might be wishful thinking. I think you'll see "omics" becoming a dying beast. Unless, of course, the NIH leadership gets populated with a lot of those people, in which case we'll probably sink even more money into it.
At any rate, there's a lot of crappy code written in all of the sciences; it's not just bioinformatics. My personal feeling is that there is hubris across many of the sciences, a belief that writing good code is "easy." I've offended many physicists by suggesting that they might want a computer scientist or software engineer to help write their code.
In my last project, I was advised that, while writing good code is beneficial in the long run, the cutting edge is all about prototyping fast, finding out what works, and not getting invested in ideas that don't. Now, I am of the opinion that poor software engineering practices exact a price in the long run. Also, code that was never meant for production frequently finds its way onto many other scientists' computers. Personally, I blame it on a lack of experience and of general computational education. It's not that hard to add docstrings to Python code. Hell, it's easier to use standard modules than to roll your own format parser. I'm regularly able to cut code bases in half just by adhering to the best practices of the language being used. But that doesn't win grants.
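To make that concrete, here's a small sketch of what I mean (the file layout is a hypothetical example): the standard csv module plus a docstring, instead of a hand-rolled parser.

    import csv

    def read_expression_table(path):
        """Read a tab-separated expression table into a dict of gene -> values.

        Assumes a hypothetical layout: gene name in the first column,
        numeric expression values in the remaining columns.
        """
        table = {}
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                table[row[0]] = [float(v) for v in row[1:]]
        return table

    # The hand-rolled alternative (split on tabs, strip newlines, guess at
    # quoting) is longer, buggier on edge cases, and usually undocumented.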
There are several examples I know of where one-off prototype code written by physicists turned into production code. It was used for years but would take hours to run. We look at it and fix it for them, which usually takes less than a week, and the code then runs in minutes instead of hours. That's a lot of wasted scientist time recovered. Still, I get that it's hard to get software engineers or computer scientists onto grants, because no one wants to pay for them, even when there is a large computational portion to the research.
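For flavor, here's a hypothetical example of the sort of fix involved: a quadratic membership test that prototype code often ships with, next to the linear version we'd swap in.

    def dedupe_prototype(readings):
        """O(n^2): 'in' on a list rescans every prior element each iteration."""
        seen, unique = [], []
        for r in readings:
            if r not in seen:
                seen.append(r)
                unique.append(r)
        return unique

    def dedupe_fixed(readings):
        """O(n): a set makes each membership test effectively constant time."""
        seen, unique = set(), []
        for r in readings:
            if r not in seen:
                seen.add(r)
                unique.append(r)
        return unique

    # On millions of instrument readings (hypothetical data), the first
    # version runs for hours; the second finishes in seconds. Same output.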
An easy argument against it is that funds are already spread paper thin, and when you have to choose between a grant to optimize an already-built program and one to write a completely novel program, well, the choice is clear. Now, if everyone wasn't spending every waking minute stressing about grants... but I don't think anyone will disagree with me that that's already a known and well-documented problem. ;) I personally think it would be worthwhile in the long run, economically, to add a short session on good programming practices to conferences / workshops, but I'm not the one scheduling them.