Anyone who knows data mining and projects regarding BI know...
Then I will have to admit to not knowing much about data mining and BI. But I will confess that I am quite a data whore, and I will share with you just a little bit about how I go about whoring. Since I have the need (curiosity) for many petabytes of data, until recently I used a nifty Hyperion BSO-ASO setup. Because what I'm looking for, for the most part, is patterns within large data sets uploaded 2 minutes ago, I have the need for speed. And since the data set is refreshed every day, and because it is so large, the system (the ASO side of things) completely rebuilds itself every night (transparent partitions are wonderful things - but how I do the drop to XML and then rebuild/error check in the flash of an M80 is my own secret, and I don't use MaxL). The data goes into BSO first because we have some massive calcs and BSO can do the heavy lifting. The data then goes to ASO about as fast as you can keystroke (jokes please), and here ASO is much, much faster, but the calcs need to be on the lighter side of things. ASO can do a few billion/second. This pretty much makes analyzing every single phone call made in the US a task as simple as pushing a Tonka toy around a small sandbox. Especially if they collect only metadata and not data, but who knows the truth of that matter.
My data sets are far larger than every call made in the US for an entire year, and I have lots of people who want to latch onto this data, and we all want it faster. So I guess I'm not the only data whore around. Now we have the problem of user concurrency, big time. Along comes in-memory processing for me to consider. An initial objection to using an Exalytics setup is that we wind up with at least 2 copies of the data, one on disk and one in memory. But, given the budget we have, I say fuck that concern, because if we have multiple Exalytics boxes for a given set of users then we can do away with the concurrency problem. On top of that, we have no downtime during nightly refreshes/rebuilds, and the sun never sets on this operation anyways. At this point we can go Exalytics or something else like SAP's HANA (claiming 400 billion records/second), which means throwing away what we already had. What we came up with is a sort of witches' brew (cluster) of technology. What I am willing to wager is that the NSA has LOTS of byte-guzzling data whores. That is neither condemnation nor admiration for them, but I sure as fuck would love to have a peek at their data center.
But all this comes from a guy who has spent far more time with the classics than with SQL or MDX, so make of it what you will.