Fascinating stuff. I'd love to mess around with Hubski text in this way. It would be interesting to relate users based upon the words they write. kleinbl00, this might be a compelling way to relate and suggest tags, not only based upon tags, but upon the text associated with a tag. Theoretically, if we had enough RAM, something like this could replace tags altogether. Posts would be grouped by word vectors alone.
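For anyone curious what "grouping posts by word vectors" could look like, here's a rough sketch. It is not how Hubski works: the tiny two-number word vectors are made up by hand (real ones would be learned from a lot of text), and `post_vector` and `most_similar` are hypothetical helper names. Each post is represented as the average of its known word vectors, and posts are compared by cosine similarity.

```python
import math

# Hypothetical hand-made word vectors: [tech-ness, music-ness].
# Real vectors would be learned from a large corpus and have many more dimensions.
WORD_VECTORS = {
    "neural": [0.9, 0.1],
    "network": [0.8, 0.1],
    "code": [0.9, 0.0],
    "guitar": [0.1, 0.9],
    "album": [0.0, 0.8],
    "song": [0.1, 0.9],
}


def post_vector(text):
    """Average the vectors of the words we know; None if we know none of them."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(query, posts):
    """Return the post whose averaged vector is closest to the query's.

    Assumes the query contains at least one known word.
    """
    qv = post_vector(query)
    scored = [
        (cosine(qv, post_vector(p)), p)
        for p in posts
        if post_vector(p) is not None
    ]
    return max(scored)[1]


posts = ["new guitar album", "neural network code"]
print(most_similar("my favorite song", posts))
```

Averaging is the crudest way to turn words into a post vector, and it throws away word order, but it is enough to show the idea of relating posts (or users) without any tags.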
All I know is I read it, shared it, and subscribed to #machinelearning. My big programming trick of the day was copy-pasting javascript into a Google spreadsheet and then I broke that. When I see something that says "it will take a multi-core workstation several days to parse the equivalent of 1000 books or 5 million twitter comments" I go "I'll bet this is useful to somebody else."
Well, they do warn at the end that it requires a lot of words in the case of specialized topics. It might not perform too well on Hubski-specific tags (e.g. linking thehumancondition to writing and philosophy, or vaguequestionsbynowaypablo to askhubski). How does the current related-tags method work? Association of tags being used together?
Why bother with tags at all? Just throw the entire posts in, see what comes out. Actually, that'd probably result in unusably large vectors and more data than Google could deal with. It'd be cool though...
It was part of Monday's meeting. Truth be told, we aren't going to be able to quit our day jobs with the level of donations/subscriptions that we can currently muster, and that is part of the reason I have held off. That said, it would be nice not having to dig into our own pockets for servers and stickers. I cannot imagine a VC scenario that would work out well for the site. If Fred Wilson really wanted to give us $10M, it would be difficult not to consider trying to make it work, but I don't think that cash money is what Hubski ought to be after. In my mind, Hubski is about expanding a dimension of human interaction that could benefit from the effort. That goal can easily extend beyond what exists here and now, but I am not keen on selling myself or you all on a new strategy for Hubski that fits a funding model. Ideally, I would like to get the site into a state where it sustains itself, and can continue its mission with minimal polluting influences. I do think it is possible, and I think that it will be even more possible as time passes. The team seems to think that there won't be much harm in an early revenue model; we are going to experiment with that this year.
NPR has evergreen partners (a set amount comes out of my checking account every month). Hubski could do the same. Think givemarkadollar.com, but ongoing. No - you still couldn't quit your day job, but that could provide a steady, reliable stream of server/sticker opex. People feel good about contributing to causes they love.
It's a bit technical, but it may be of minor interest to lil given her recent post on algorithmic text.
Yes - a bit of an inside joke from my last job. I rather like biological trees :)
I'm not sure. If you're asking if it's a marijuana reference (only other common meaning for trees that I'm aware of), then no. It's due to my (alleged) aversion to trees. EDIT: It occurs to me that you're probably asking about my 'liking biological trees' comment. I'm referring to the large plants made out of wood, there. Basically, I wrote some really slow, difficult to maintain code that operated on very, very large trees. I still work with many of the same people at a new company, and I still haven't heard the end of it.
It's just completely mind-blowing to me that it works at all. It's like "Hey, here's a bunch of words and a rule for turning them into lists of numbers! If you do math on the numbers and convert them back to words, the results make sense!" How bizarre is that? The whole "king - man + woman = queen" example is just insane.
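To make the "king - man + woman = queen" trick concrete, here's a toy sketch. The vectors below are invented by hand so the arithmetic works out (three numbers standing for royalty, maleness, and fruitiness); a real model learns hundreds of dimensions from text and nobody assigns them meanings. `analogy` is a hypothetical helper name, and it skips the input words when searching, as analogy demos usually do.

```python
import math

# Hand-made toy vectors: [royalty, maleness, fruitiness]. Not learned from data.
VECTORS = {
    "king": [0.9, 0.8, 0.0],
    "queen": [0.9, -0.8, 0.0],
    "man": [0.1, 0.8, 0.0],
    "woman": [0.1, -0.8, 0.0],
    "apple": [0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = (w for w in VECTORS if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))


print(analogy("king", "man", "woman"))
```

Subtracting "man" removes the maleness component, adding "woman" puts the opposite one in, and the royalty component is left alone, so the nearest remaining word is "queen". The surprising part is that vectors learned from raw text end up with directions that behave roughly like this.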