23 January 2011

Real-Time Streams and the @Cloud

Heraclitus says, doesn't he, that all things move on and nothing stands still, and comparing things to the stream of a river he said that you cannot step twice into the same stream (Plato, in Cratylus 402A)

The Internet, and the services that it offers, have traditionally been a rather static affair. However, there is evidence that we are beginning to see a shift in the way in which we use the web, and also how the web uses us. This is known as the growth of the so-called ‘real-time web’ and represents the introduction of software systems that operate in real-time, with multiple sources of data fed through millions of data streams into computers, mobiles, and technical devices more generally.[1] Utilising Web 2.0 technologies, and with the mobility of new devices and their locative functionality, these services can provide useful data to the user on the move. Additionally, these devices are not mere ‘consumers’ of the data provided; they also generate data themselves, about their location, their status and their usage. Further, they provide data on data, sending this back to servers on private data stream channels to be aggregated and analysed (such as clickstreams). As the web space begins to fill with these devices and services that have the facility to feed back information and exchange data in real-time, we see the experience of the web begin to change, that is,
1. The web is transitioning from mere interactivity to a more dynamic, real-time web where read-write functions are heading towards balanced synchronicity. The real-time web... is the next logical step in the Internet’s evolution.
2. The complete disaggregation of the web in parallel with the slow decline of the destination web.
3. More and more people are publishing more and more “social objects” and sharing them online. That data deluge is creating a new kind of search opportunity (Malik 2009).

The way we have traditionally thought about the Internet has been in terms of pages, but we are about to see this changing to the concept of ‘streams’ (see Berry 2011). In essence, the change represents a move from a notion of information retrieval, where a user would attend to a particular machine to extract data as and when it was required, to an ecology of data streams that forms an intensive information environment. This notion of living within streams of data is predicated on the use of technical devices that allow us to manage and rely on the streaming feeds. Thus,
Once again, the Internet is shifting before our eyes. Information is increasingly being distributed and presented in real-time streams instead of dedicated Web pages. The shift is palpable, even if it is only in its early stages... The stream is winding its way throughout the Web and organizing it by nowness (Schonfeld 2009).
Importantly, the real-time stream is not just an empirical object; it also serves as a technological imaginary, and as such points the direction of travel for new computational devices and experiences (indeed, it encourages the consumption of devices and media). In the world of the real-time stream, it is argued that the user will be constantly bombarded with data from a thousand (million) different places, all in real-time, and that without the complementary technology to manage and comprehend the data she would drown in information overload (see Datasift for an example of a real-time social media filtering engine). But importantly, the user will also increasingly desire the real-time stream, both to be in it, to follow it, and to participate in it, and where the user wishes to opt out, the technical devices are being developed to manage this too. For example:


To avoid the speed of a multiply authored follow stream, especially where they might number in the hundreds or thousands of people you follow, instead you might choose to watch the @mention stream instead. This only shows Tweets that directly mention your username, substantially cutting down the amount of information moving past and relying on the social graph, i.e. other people in your network of friends, to filter the data for you. That is, the @mention stream becomes a collectively authored stream of information presented for you to read (Berry 2011).  
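
The filtering described here can be sketched in a few lines of Python. This is only an illustration, assuming tweets arrive as simple dictionaries with `author` and `text` keys; the function name `mention_stream` and the sample data are hypothetical and do not correspond to any Twitter API.

```python
import re
from typing import Dict, Iterable, Iterator


def mention_stream(tweets: Iterable[Dict[str, str]], username: str) -> Iterator[Dict[str, str]]:
    """Yield only the tweets whose text mentions @username (hypothetical helper)."""
    # Match the handle as a whole word, ignoring case, so that "@alice"
    # does not also match "@alice_2" or an email address like "bob@alice.com".
    pattern = re.compile(r"(?<!\w)@" + re.escape(username) + r"(?!\w)", re.IGNORECASE)
    for tweet in tweets:
        if pattern.search(tweet["text"]):
            yield tweet


# Invented sample of a fast-moving follow stream.
follow_stream = [
    {"author": "bob", "text": "Reading about real-time streams"},
    {"author": "carol", "text": "@alice you should see this piece on the firehose"},
    {"author": "dave", "text": "lunch"},
    {"author": "erin", "text": "Thanks @Alice and @alice_2 for the links"},
]

mentions = list(mention_stream(follow_stream, "alice"))
print([t["author"] for t in mentions])  # ['carol', 'erin']
```

The code itself does very little: the selection has already been made by the people who chose to mention you, which is the sense in which the stream is collectively authored.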


Gillmor (2011) calls this the @mention Cloud, and I think that the idea of a space or 'Cloud' which is a holding location for real-time streams is really interesting. Clouds, as in cloud-computing, are normally understood as location-independent data-centres that are controlled and owned by data warehousing companies and which provide data, software and even processing power to client computer systems.[2] But clouds can also refer to statistical clusters, where elements are grouped around an anchor, in this case a particular username, or @mention. With his notion of the @mention cloud, Gillmor gestures towards an important part of the problem with following and understanding real-time streams, and that is the relevance and quality of the information they contain. And they do hold important information; it's just difficult sometimes to find, extract and order it (for example, see the curation of real-time data streams in crises or brand management with SwiftRiver). Indeed, one of the problems is that they transcend organisational boundaries and move quickly between different topics and knowledges.

The @mention stream, found on services like Twitter, allows your social graph (that is, the group of people you follow) to act as a kind of social filter, only drawing your attention to the things that they think are important, often called the interest-graph. To attempt to follow the raw data stream from Twitter (which they call the firehose) would be impossible as the dataflow is just too fast. Indeed, according to ComScore, there were over 25 billion tweets in 2010 alone (Jeavons 2011). Interestingly, there are now so-called Data Resellers like Gnip that offer subsets of the firehose, called halfhose (50% of data stream), decahose (10% of data stream) and Spritzer (1-2% of data stream). Information management therefore becomes an increasingly important concern in order to keep some form of relationship with the flow of data that doesn’t halt the flow, but rather allows the user or organisation to step into and out of a number of different streams in an intuitive and useful way. This is because the web becomes,
A stream. A real time, flowing, dynamic stream of information — that we as users and participants can dip in and out of and whether we participate in them or simply observe we are [...] a part of this flow. Stowe Boyd talks about this as the web as flow: “the first glimmers of a web that isn’t about pages and browsers” (Borthwick 2009).
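
To make the idea of reselling fractions of the firehose concrete, here is a minimal sketch of how a stream could be thinned to 50%, 10% or 1%. How the resellers actually select their subsets is not stated here; the hash-based sampling below, the function name `sample_stream` and the numeric identifiers are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, Iterator


def sample_stream(tweet_ids: Iterable[str], fraction: float) -> Iterator[str]:
    """Yield roughly `fraction` of the ids, chosen deterministically by hash."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Each id falls into one of 10,000 buckets; keep the buckets below the threshold.
    threshold = round(fraction * 10_000)
    for tweet_id in tweet_ids:
        digest = hashlib.sha256(tweet_id.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 10_000
        if bucket < threshold:
            yield tweet_id


# A stand-in "firehose" of invented identifiers.
firehose = [str(n) for n in range(100_000)]
halfhose = list(sample_stream(firehose, 0.50))
decahose = list(sample_stream(firehose, 0.10))
spritzer = list(sample_stream(firehose, 0.01))
print(len(halfhose), len(decahose), len(spritzer))
```

Because the choice depends only on the identifier, the smaller samples in this sketch are always subsets of the larger ones, so a subscriber to the decahose would see nothing that the halfhose lacks.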

Of course, real-time streams and clouds could also enable the emergence of what is being called "cloud jacking" and "cloud hijacking" (Cohen 2009), and we might even envision dark-streams and dark-clouds. Indeed, we could think of Wikileaks as a dark-cloud itself. We could imagine that these dark-clouds absorb data, rather as a black hole absorbs light, and that we are unable to perform search or discovery within them; that is, they remain opaque to us. Within certain industries this kind of dark cloud system could be useful for anonymising streams, or creating aggregations or search results without revealing the dark algorithms that drive them (Google's PageRank could be thought of as a dark algorithm). Unsurprisingly, in the finance sector there is the emergence of a similar concept called dark pools.[3]

However, the key question remains: how might we transform an @mention stream from its diachronic state, as a fast moving stream, into a frozen place of immanence, that is, a synchronic state? This can be understood as the ability to use cloud-computing to freeze statistical @mention clouds, which I want to call the @Cloud.[4] The reason is that, as the real-time streams currently stand, they become increasingly difficult to manipulate, refer to, or even connect and compare. The @Cloud would therefore need to implement the function that Kittler argues is intrinsic to all digital media, that is, Time Axis Manipulation,[5]

[which] shift[s] the chronological order of time to the parallel order of space – and spaces are things that can principally be restructured – [thus] written media become elementary forms that not only allow temporal order to be stored but also to be manipulated and reversed (Krämer 2006). 

I also want to suggest that the @Cloud would preferably combine the features of computational search (exemplified by Google) and the social graph (exemplified by Facebook or Twitter). The key is to be able to translate multiple fast moving streams of information, that is, a time-based medium, into a space-based medium. Providing an interface to temporality through storage is the essence of the @Cloud. But the @Cloud is not merely a storage Cloud itself, as it allows multiple stream-like access points back into the information that it has collected, that you have forwarded to it, or that friends in your social graph have suggested (we could call these @streams). The @Cloud would, therefore, allow the replaying of the streams, the rewinding or fast-forwarding of the data, and even the move to a different dimension to view the information from above, below, or even comparatively against other data (anyone who has read Flatland will understand what I am suggesting here).
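
As a rough illustration of what freezing and re-streaming might involve, the sketch below stores timestamped items in time order and lets them be replayed forwards, rewound, or cut by stream. It is a toy model under stated assumptions rather than a design for the @Cloud: the class `AtCloud`, its methods `pour` and `replay`, and the sample items are all hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(order=True)
class Item:
    timestamp: float                       # the only field used for ordering
    stream: str = field(compare=False)     # e.g. "mentions", "photos"
    text: str = field(compare=False)


class AtCloud:
    """Hypothetical store that freezes streams into a time-ordered archive."""

    def __init__(self) -> None:
        self._items: List[Item] = []

    def pour(self, item: Item) -> None:
        # Keep the archive sorted by timestamp, whatever order items arrive in.
        bisect.insort(self._items, item)

    def replay(self, start: float = float("-inf"), end: float = float("inf"),
               stream: Optional[str] = None, reverse: bool = False) -> Iterator[Item]:
        """Re-stream archived items within [start, end], optionally rewound."""
        selected = [i for i in self._items
                    if start <= i.timestamp <= end
                    and (stream is None or i.stream == stream)]
        return iter(reversed(selected) if reverse else selected)


cloud = AtCloud()
cloud.pour(Item(3.0, "mentions", "c"))
cloud.pour(Item(1.0, "photos", "a"))
cloud.pour(Item(2.0, "mentions", "b"))

print([i.text for i in cloud.replay()])                              # ['a', 'b', 'c']
print([i.text for i in cloud.replay(reverse=True)])                  # ['c', 'b', 'a']
print([i.text for i in cloud.replay(start=2.0, stream="mentions")])  # ['b', 'c']
```

Once the items sit in a list, temporal order has become spatial order: rewinding is just reading the list backwards, and comparison across streams is a matter of choosing a different slice.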


We can think of the @Cloud as a sink, into which we can pour various kinds of information, both diachronic (i.e. moving data streams that continue to flow into it) and synchronic (e.g. email, books, PDFs, photos, websites, URLs).[6] But it is more than just a cloud-based storage service or data-locker.[7] The @Cloud can then act as a meta-interface with multiple dimensions into a datascape that is rapidly changing, including real-time streaming of itself (see Rao 2009, Gillmor 2011). This is, of course, not just RSS, which is information syndication, as it brings to bear the advantages of the social graph and even what we might call the thing-graph (i.e. the collection of devices, and things, that you have connected together through this @Cloud itself). Thus, one could watch one's own @streams from @clouds, including media-streams, photo-streams, @mention streams, and @reading streams. Each stream could potentially be connected to the others, and relations, ideas and concepts from each stream could interact and provoke combinations, questions and narratives that might not be apparent in isolation.


Indeed, thinking of the @Cloud as an interface might be the best way of understanding it: a highly visual experience for viewing complex time-based media, in a number of computationally and social-media assisted ways. Treating all information in the @Cloud as a potential stream (frozen/dried streams), rather than a collection of discrete objects, which can then be re-streamed using a number of different search/tagged criteria, would also open up new narrative modes of interpretation (certainly Qwiki demonstrates one way of reconceptualising search as a streamed media experience, and Apple iPhoto 9 with its 'Faces' and 'Places' function shows another). We could also imagine viewing one's @Cloud through filters such as heat-maps, wordle-type visualisations, location, people, places or even through versioning systems which highlight change within data streams.[8] Importantly, we could also share portions of our @clouds, creating new @tropospheres that others could explore.[9]


Notes

[1] Programming these new real-time services will pose particular problems as they will require computer code to remediate static web services to distributed computational devices. They also require the kind of distributed computing power that is able to respond, process and communicate through networks.


[2] The Ecologist argues that "[c]alling this vague collection of ‘other’ computers a ‘cloud’ evokes a vaporous world of weightless websites, but that would be misleading. In truth, The Cloud consists of dataprocessing warehouses the size of football fields, strung together by fat cables and inside which air-conditioning fans cool rows of computing servers 24 hours a day. Far from being weightless, the expanding digital cloud is really an enormous necklace of steel, silicon and concrete." (Ecologist 2008)


[3] Whilst within data circles there has been a move to the language of streams and clouds, within the finance sector there has been a corresponding rise in the use of the language of so-called dark pools. These are "trading venues that match buyers and sellers anonymously. [Which by] concealing their identity, as well as the number of shares bought or sold, dark pools help institutional investors avoid price movements as the wider market reacts to their trades." (Economist 2009)


[4] We might think of the @cloud as a platform for streaming services, completely customisable to user requirements in terms of search criteria and relevance.


[5] The "means of time axis manipulation are only possible when the things that occupy a place in time and space are not only seen as singular events but as reproducible data. Such production sites of data are ‘discourse networks’. Discourse networks are media in the broader sense: they form networks of technological and institutional elements." (Krämer 2006) 

[6] The idea of collating email into an @cloud that can then be streamed back out, perhaps in a short format, translates the static nature of email into a dynamic streaming format. I can imagine that an @stream for email would be extremely useful. 

[7] Streaming media from an @cloud into custom @streams, such as photo-streams, may be part of the investment Apple is making into huge data centres.


[8] Services that help to filter the real-time streams include peer-scoring services like Klout and PeerIndex that calculate your 'authority' in relation to other users of real-time services. Datasift, for example, allows you to combine this reputational data with geo-location, 'sentiment', and lots of other filters to perform search and discovery on the Twitter real-time data stream. This could be used for crisis-tracking, brand tracking/management, or other forms of rapid data discovery. Datasift even has rules such as 'no swearing', which enables the automatic bowdlerisation of Twitter, and rules that match patterns of text like ISBN codes.


[9] Salman Rushdie in Haroun and the Sea Of Stories has a wonderful passage that describes something similar to the @cloud, that is a living stream of narratives and temporalities: "Haroun looked into the water and saw that it was made up of a thousand thousand thousand and one different currents, each one a different color, weaving in and out of one another like a liquid tapestry of breathtaking complexity; and [the Water Genie] explained that these were the Streams of Story, that each colored strand represented and contained a single tale. Different parts of the Ocean contained different sorts of stories, and as all the stories that had ever been told and many that were still in the process of being invented could be found here, the Ocean of the Streams of Story was in fact the biggest library in the universe. And because the stories were held here in fluid form, they retained the ability to change, to become new versions of themselves, to join up with other stories and so become yet other stories; so that unlike a library of books, the Ocean of the Streams of Story was much more than a storeroom of yarns. It was not dead but alive." (Rushdie, quoted in Rumsey 2009).


