Verbosity and algorithmic defences

The flood of information coming through our personal data buses is increasing all the time. I came across a couple of comparative statistics the other day that blew me away, and I wonder if we are all foolishly ignoring that data deluge and betting on algorithms to save the day.

Firstly, consider the sheer overwhelming torrent of information that is hitting us harder every cycle. There are all the news items, reports, etc. available in text or, worse, in non-machine-parseable multimedia. [Eric Schmidt of Google](http://www.i-cio.com/blog/august-2010/eric-schmidt-exabytes-of-data) is credited with saying, “Between the birth of the world and 2003, there were five exabytes of information created. We [now] create five exabytes every two days. See why it’s so painful to operate in information markets?” Much of this growth comes from an increasing number of sources, and much of it is [long-tail noise](http://en.wikipedia.org/wiki/Pareto_distribution), but it is also driven by a worrisome increase in the size of each transmission. Amid all this increased transmission, how do we survive the deluge and still catch the important information?

According to [Facebook](http://www.facebook.com/press/info.php?statistics), more than 30 billion (no, that’s not a typo) pieces of content are shared on Facebook each month. Facebook users also install more than 20 million applications and link through to more than 250,000 external websites every day. The average user has 130 friends. With close to 600 million active users, 50% of whom log on daily, Facebook alone could swamp an individual (and I believe it already does, so we are on a low-density, high-volume information diet, i.e., high-calorie, low-nutrient!).

Consider a couple of comparisons.

## Number of words

* Gettysburg Address – 0.3K
* US Constitution – 5K
* FB privacy circa 2010 – 6K
* Macbeth – 18K
* Apple iTunes agreement – 20K

This is out of control. How do we know that we are not missing vital information in the last segment of the Facebook or Apple agreements? Is that intentional on their part?

And then I am told that salvation will come from statistical methods that filter and promote *meaningful*, *relevant* information. How? How does any filter know that I need to weight something highly before I have seen it? The number of correlations one can make about a term in the 19K-20K segment of the Apple agreement is surely going to be swamped by the hits in the first 18K. So I will be forced to parse all of it, possibly still missing something simply due to volume, or to skip it for pragmatic effectiveness.
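To make the worry concrete, here is a minimal sketch in Python of a naive frequency-based relevance filter applied to a toy, made-up “agreement” (the passages and numbers are hypothetical, not anything Facebook or Apple actually publishes). It shows how a clause that appears once, late in the text, is drowned out by repeated boilerplate.

```python
# Toy illustration (hypothetical data): rank passages by how often their
# terms occur across the whole document. A critical clause that appears
# once, near the end, scores poorly and is easy to miss.
from collections import Counter

document = (
    ["the service may collect usage data to improve features"] * 300   # repeated boilerplate
    + ["by continuing you waive the right to a class action lawsuit"]  # the one clause that matters
)

# Global term frequencies across the whole "agreement".
term_freq = Counter(word for passage in document for word in passage.split())

def passage_score(passage: str) -> float:
    """Average corpus frequency of the passage's terms: higher looks 'more relevant' to a naive filter."""
    words = passage.split()
    return sum(term_freq[w] for w in words) / len(words)

# Rank the distinct passages the way a volume-based filter would.
for p in sorted(set(document), key=passage_score, reverse=True):
    print(f"{passage_score(p):8.1f}  {p}")

# The repeated boilerplate far outscores the waiver clause, so a filter
# driven by sheer volume surfaces exactly the wrong passage.
```

Any filter built on this kind of volume signal has to see the rare clause before it can learn that it matters, which is exactly the circular problem above.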

Stop drowning us. If the US founding fathers could pack the generative ideas that have run the country since its birth into fewer words than Facebook needs to talk to you about privacy, then we need to slim down our transmissions. More information density, less total volume.
