Aristillus

A large crater to dump my thoughts

Help! We’re drowning in data

The age we live in is the first with an overabundance of data. That overabundance has had a tremendous influence on our daily lives.

Cheap electronics and digital processing have made it possible to gather and organize all data imaginable. All online stores, and most regular ones, have easy access to their transactions in digital form. Video cameras keep an eye on dangerous corners and other interesting or boring views. The integrated GPS, GSM and Wi-Fi in phones are used to pinpoint the location of their users. (Radio)telescopes capture vast amounts of images. And, of course, web browsers and web servers are aware of the web pages we visit.

In addition, the extremely low cost of digital storage, compared to the technology available to previous generations, has made it possible to record this information and keep most of it around for a long time, or at least for as long as it could possibly be useful.

Gathering digital information is usually neither hard nor expensive. There is often barely any effort involved at all, because the data is already generated or digitized as part of some other process – it just needs to be stored. Storing large amounts of data is not trivial, but also not unreasonably hard. This is also one of the reasons why most internet giants are reluctant to throw away data: it can come in handy later, but once it’s gone it’s completely gone, or at least hard to regain.

The real challenge is in organizing and accessing the information in a useful and simple manner. This is a hard problem. Not only does the information need to be available, it needs to be available in a structured manner. Having millions of petabytes of raw data at your disposal isn’t very useful without knowing where to look or how to interpret what you find. Likewise, it isn’t possible to iterate over all available data for every single query. If you used the web in its early days, you will remember that search engines only provided a very simple substring match over all pages everywhere and returned results in no particularly useful order. They were some help in navigating the web, but it still took skill (and a lot of patience) to actually find what you were looking for. This challenge – making all information everywhere available in a usable manner, and fast – seems to be what Google is all about.
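To make the difference between iterating over everything and having the data available in a structured manner a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how any real search engine works: the pages, their text and all function names are made up for the example. The first function scans every page for every query, roughly like the early substring search; the second approach builds an inverted index once and answers queries by looking words up in it.

```python
from collections import defaultdict

# Hypothetical toy corpus; page names and text are invented for illustration.
PAGES = {
    "page1": "cheap digital storage keeps data around",
    "page2": "search engines index the web",
    "page3": "storage of web data is cheap",
}


def scan_search(pages, term):
    """Naive approach: examine every page for every single query."""
    return sorted(name for name, text in pages.items() if term in text)


def build_index(pages):
    """Build an inverted index once: word -> set of pages containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def index_search(index, *words):
    """Answer a query by intersecting the page sets of the query words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*sets)) if sets else []


INDEX = build_index(PAGES)

print(scan_search(PAGES, "storage"))       # touches all pages
print(index_search(INDEX, "web", "data"))  # only touches two index entries
```

The scan has to touch all of the data on every query, so its cost grows with the size of the corpus. The index costs time and space up front, but a lookup afterwards only touches the entries for the words asked about. Ordering the results by usefulness is a separate problem that this sketch does not attempt.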

Having all of this data available, and the ability to access it in a meaningful way, is absolutely awesome. It’s an information engineer’s wet dream. It helps eliminate information asymmetry. It’s what allows my phone to know where I am without having GPS enabled (it uses the location of the nearby Wi-Fi access points instead). It’s what makes it possible for me to find the full details and contents of any book in mere seconds. It’s what makes it possible for last.fm to predict which concerts in my direct vicinity I’m interested in.

It’s also pretty fucking scary, considering the effect omni-accessible information has on our privacy.

Perhaps the current use of the centrally available data by the state is reasonable, but there is little or no monitoring of how this data is used. The mere availability enables easy widespread abuse by any (future) totalitarian state.

Written by aristillus

July 24, 2012 at 22:53

Posted in rants
