- But it appears that for both Facebook and Yahoo, those same clusters are unnecessary for many of the tasks which they’re handed. In the case of Facebook, most of the jobs engineers ask their clusters to perform are in the “megabyte to gigabyte” range (pdf), which means they could easily be handled on a single computer—even a laptop.
The story is similar at Yahoo, where the median task handed to its cluster appears to be 12.5 gigabytes (pdf). That's bigger than what the average desktop PC could handle, but it's no problem for a single powerful server.
Unless you deal with large datasets, I think it's hard to appreciate just how difficult designing meaningful analysis becomes. In the end, when it comes to market analysis, you'd probably do just as well going with the 'gut feeling' of someone who has demonstrated a keen understanding of the market. Otherwise, mining data like user behavior is probably only going to let you tweak things around the edges. The problem is, you might learn two important things, and in reacting to them, work counter to a third and possibly more important thing that you didn't even know to look for. That's why I think A/B testing is most often bunk.

As an aside, I just started changing the way tags are stored, and have been trying to anticipate what information might be best to store with them so that we can use them more effectively. I can't anticipate using a separate machine to crunch these data any time soon.
I've been wondering: how is data persistently stored on hubski?
Magnetically. ;) Hubski has no database. Instead, data is stored in flat files as s-expressions (lists; it's Lisp!). All the data you see here is actually in memory: loaded on startup, plus newly added data.

Tags have existed as elements within post and user data, but I am currently creating an independent directory for them, so they can have their own associated elements that can be updated, sorted more quickly, etc. There's definitely some work to be done with all this as things progress. It should be fun.
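To make the general idea concrete, here is a minimal sketch of that kind of flat-file persistence, written in Common Lisp purely for illustration (Hubski's own Lisp dialect and code differ): a tag record kept as an s-expression in its own file under a hypothetical tags/ directory, written out whenever it changes and read back into memory at startup. The directory layout, file names, and record fields here are all assumptions, not Hubski's actual format.

    ;; Write one tag record to its own flat file as a readable s-expression.
    ;; Directory, file name, and fields are invented for this example.
    (defun save-tag (tag)
      (with-open-file (out (format nil "tags/~a.sexp" (getf tag :name))
                           :direction :output
                           :if-exists :supersede
                           :if-does-not-exist :create)
        (print tag out)))

    ;; Read every tag file back into a list, e.g. at startup,
    ;; so all tag data lives in memory afterwards.
    (defun load-tags ()
      (loop for path in (directory "tags/*.sexp")
            collect (with-open-file (in path)
                      (read in))))

    ;; Example record: a plist holding whatever associated elements
    ;; we decide to track per tag.
    (save-tag '(:name "lisp" :count 42))

Because each tag lives in its own file, an individual tag can be updated or re-sorted without rewriting the post and user data it used to be embedded in, which seems to be the point of giving tags their own directory.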
Ugh, filled with so many buzzwords and pointless conjecture from an author who has obviously never actually worked with the technologies he is writing about. It's so painful to read articles about "big data" or "the cloud". The devil is in the details when it comes to any company's needs, so trying to put some stupid buzzword on every company is just pointless. But I get a certain kick out of reading articles from bloggers and "technology" enthusiasts who haven't actually worked with the things they write about, though it's not the same entertainment value they were hoping to provide. :)

And then at the bottom of the article: "Read more about our obsession with The Cloud." Yeah... kind of knew that was coming. Also, you know, the author of that article. Hah.
But buying into something as faddish as the supposed importance of the size of one’s data is the kind of thing only pointy-haired Dilbert bosses would do.