Not really a side project, since it's still in the lab, but I've been trying to convert my lab to use as much open software as possible. I've been rewriting a lot of old MATLAB code in Python, setting up IPython notebooks, and converting our disorganized set of protocols into a git-version-controlled repository of Markdown documents. I came across a talk on a similar workflow being set up in neuroscience imaging, which got me excited, and in the process learned about one of the most futuristic research campuses I've seen. I'm hoping that when I finally go to publish a paper, the analysis will be completely NumPy / IPython / Jupyter / Docker-powered. I'm currently stuck on how to distribute an eventual 50+ GB of data without having to pay extraordinary costs for hosting...
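To give a sense of the kind of conversion I mean, here's a minimal sketch of the MATLAB-to-Python pattern. The .mat file and variable names are made-up placeholders, not anything from our actual pipeline:

```python
# Minimal sketch: pull a legacy MATLAB .mat file into NumPy and redo a
# simple analysis step in Python. "results.mat" and "intensities" are
# hypothetical placeholders for illustration only.
import numpy as np
from scipy.io import loadmat

mat = loadmat("results.mat")  # dict of variable name -> ndarray
intensities = np.asarray(mat["intensities"], dtype=float)

# The same sort of thing the old MATLAB script did: log-transform and
# median-center each column (one column per sample).
log_int = np.log2(intensities + 1)
normalized = log_int - np.median(log_int, axis=0)

print(normalized.shape)
```

From there it's easy to drop the same code into an IPython notebook so the figures live next to the code that made them.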
Torrents are a possibility, but I'd prefer a central host with guaranteed uptime. Something akin to the PDB, but for a different kind of data. The code is for validation and analysis of mass spectrometry data: basically, measuring the levels of many proteins and protein modifications in a sample to get a snapshot of a cell or tissue's signaling network.
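For anyone curious what that looks like once summarized, it's roughly a proteins-by-samples table of abundances. Everything below is made up purely for illustration:

```python
# Hypothetical example of the summarized data: one row per protein (or
# modification site), one column per sample. Names and values are invented.
import pandas as pd

abundances = pd.DataFrame(
    {
        "control_1": [12.1, 8.4, 15.0],
        "control_2": [11.9, 8.7, 14.6],
        "treated_1": [12.0, 10.9, 14.8],
    },
    index=["EGFR", "AKT1_pS473", "GAPDH"],
)

# The "snapshot of the signaling network" is then just comparisons across
# columns, e.g. treated vs. the mean of the controls for each protein/site.
fold_change = abundances["treated_1"] - abundances[["control_1", "control_2"]].mean(axis=1)
print(fold_change)
```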
There is PRIDE, which may turn out to be more useful than I initially thought. But it's nowhere near the PDB in quality, in terms of regular analysis, automated quality metrics, a decent search engine, etc. After writing the above comment, I did go back and write a little wrapper around it, so if Binder and the like don't mind downloading a few hundred megabytes of summarized data, I should be okay.
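The wrapper is nothing fancy, roughly the shape below. I'm writing the endpoint path and JSON field names from memory, so treat them as assumptions rather than the actual PRIDE Archive API:

```python
# Rough shape of the PRIDE wrapper: list a project's files over PRIDE
# Archive's REST interface, then stream down the ones worth summarizing.
# NOTE: the endpoint path and JSON field names here are assumptions /
# placeholders and may not match the live API.
import requests

BASE = "https://www.ebi.ac.uk/pride/ws/archive"  # assumed base URL

def list_project_files(accession):
    """Return the file records PRIDE reports for a project accession."""
    resp = requests.get("{}/file/list/project/{}".format(BASE, accession))
    resp.raise_for_status()
    return resp.json().get("list", [])

def download_file(record, dest):
    """Stream one reported file to disk."""
    url = record["downloadLink"]  # assumed field name
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in r.iter_content(chunk_size=1 << 20):
                fh.write(chunk)

# e.g., for a hypothetical accession:
# for rec in list_project_files("PXD000001"):
#     print(rec.get("fileName"))
```

The idea is that Binder only ever sees the summarized tables the wrapper produces, not the raw 50+ GB.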
I'll probably do a pubski / drunkski post when the project is finally published, but that won't be for at least 1-2 years (we're still applying for funding right now). I wouldn't mind writing a few more general "state of the field" type posts, but I'm not sure how much hard science people want to read through.