winter

odaesa

Mashing up on the web

Since the late 1990s the internet has developed from a simple means of transporting text into a tool for widespread social interaction, leading to the growth of complementary technologies and standards. Its inherent nature as a method of communication (data transfer) between disparate communities provides good ground for developing highly portable, interoperable applications. It is from this foundation that the mashup has emerged.

There is no single ‘right’ process for building a mashup, but there is a methodology and there are some standards to follow. When a content creator, perhaps a business or webmaster, is willing to provide some or all of their content or data for use and incorporation elsewhere, that data can be combined with another data source or software service to create a new dataset, or a different data format, which users find useful.

It may seem that this method of ‘programming’ must be impractical, perhaps unstable. However, this is not the case, thanks to the consistent use of defined standards. Just as HTTP enabled computers of different types to share HTML documents, standards such as XML [1] provide an ideal base for mashups to operate on.

Data may be collated in any way, using any tools or software desired, as long as the resulting data is distributed in a standard format. This tends to be an XML document adhering to a published DTD or Schema that defines how to interpret the data. As long as the provider continues to output data in the implicitly agreed manner, it can be put to good use elsewhere. A common method of data distribution for mashups is the RSS feed, a format that is itself an application of XML. HTML has likewise been reformulated as XHTML, which is also an application of XML.
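As a minimal sketch of what consuming such a feed involves, the Python below parses a small RSS 2.0 document using only the standard library. The feed contents, the example.com URLs and the read_items helper are invented for illustration; a real consumer would fetch the XML from the provider rather than embed it.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document standing in for a provider's feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <link>http://example.com/stories/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/stories/2</link>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return a list of (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    # Every <item> element is one entry in the feed.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in read_items(SAMPLE_RSS):
        print(title, link)
```

Because the structure of the feed is fixed by the format, the consumer needs no knowledge of how the provider collated the data.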

Once a reliable source of data is available, it can be combined with another data source or software service to provide a new data set that is more useful and interactive than the originals. Google Maps [2] is a well-known tool for use in mashups, with examples such as Clearmap [3] demonstrating this data enhancement very well.
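The combining step can be sketched in a few lines of Python. The two dictionaries below are hypothetical stand-ins for two independent sources (counts from one provider, coordinates from another), and build_markers is an illustrative name; this is not how Clearmap or any particular mapping API actually works.

```python
# Hypothetical source 1: incident counts keyed by district name.
incident_counts = {"North": 12, "Central": 30, "South": 7}

# Hypothetical source 2: (latitude, longitude) per district.
# "South" is deliberately missing to show how gaps are handled.
district_coords = {"North": (41.95, -87.65), "Central": (41.88, -87.63)}


def build_markers(counts, coords):
    """Join the two sources on district name.

    Districts without coordinates are skipped, since they cannot be plotted.
    """
    markers = []
    for district, count in sorted(counts.items()):
        if district in coords:
            lat, lon = coords[district]
            markers.append(
                {"label": f"{district}: {count}", "lat": lat, "lon": lon}
            )
    return markers


if __name__ == "__main__":
    for marker in build_markers(incident_counts, district_coords):
        print(marker)
```

The resulting list of markers is the "new data set": neither source alone could place a count on a map.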

In many cases the next step uses JavaScript to perform the actual mashing. JavaScript code is highly portable, since almost all browsers now support it, so it can be used on any platform. Server processing load and bandwidth requirements can both be reduced, because scripts run on the client machine rather than the server and, using AJAX techniques (essentially the XMLHttpRequest object), only the changing parts of a web page need be transmitted.

JavaScript has some object-oriented capabilities, which are very useful for maintaining standard interfaces whilst allowing underlying changes to be made to systems.

There are limitations to mashups when compared to traditional desktop applications. They may be slower to execute and, while their implementation may be robust, providing well-formed output and input methods, the distributed nature of the network imparts inherent weaknesses. There is no global agreement on time, service states cannot always be reported accurately, and time lags or even complete connection failures are possible, all of which may be beyond the control of users or administrators.
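One common way of coping with lags and dropped connections is to retry a failed request a limited number of times. The Python below is a minimal sketch of that idea, not a complete resilience strategy; fetch_with_retry and its parameters are illustrative names, and the fetch argument stands for whatever function actually contacts the remote service.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay_seconds=0.0):
    """Call fetch() until it succeeds or the attempts run out.

    `fetch` is any zero-argument callable that raises OSError on a network
    failure (urllib's URLError and socket timeouts are OSError subclasses).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as error:
            last_error = error
            # Pause before trying again, but not after the final attempt.
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_service():
        # Simulated remote service that fails twice, then answers.
        state["calls"] += 1
        if state["calls"] < 3:
            raise OSError("connection failed")
        return "data"

    print(fetch_with_retry(flaky_service), "after", state["calls"], "calls")
```

Retrying hides brief outages from the user, but it cannot help when a service is down for good, which is why the final error is passed on rather than swallowed.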

Security too can be an issue. Regardless of how well a mashup tool is constructed, and that in itself may be suspect, the network over which communications take place is vulnerable to breach at various points and by various methods. Unfortunately JavaScript is a source of many of the security issues surrounding mashups, as running code on client machines can have detrimental side effects. There are methods in place to counteract these, with the Same Origin Policy [4] playing a strong part in the strategy. It is currently being reviewed by the W3C [5] in the hope of maintaining security whilst allowing greater interactivity between machines.
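The core test behind the Same Origin Policy is that two URLs share an origin only when their scheme, host and port all match. The Python below is a simplified sketch of that comparison; real browsers apply further rules and exceptions, and the origin and same_origin helpers are illustrative names rather than part of any browser API.

```python
from urllib.parse import urlsplit

# Ports implied when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True only when scheme, host and port all match."""
    return origin(url_a) == origin(url_b)


if __name__ == "__main__":
    page = "http://example.com/mashup.html"
    for other in (
        "http://example.com/data.xml",
        "https://example.com/data.xml",
        "http://api.example.com/data.xml",
    ):
        print(other, same_origin(page, other))
```

Under this rule a script loaded from one site cannot freely read data from another, which is exactly the restriction that makes client-side mashups awkward and that motivates the review mentioned above.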

There are already some very useful implementations of mashups, and data is made available by various sources for such purposes. Google [6], of course, provide various mashup APIs. Other big internet names such as YouTube [7] and eBay [8] provide APIs to their data too. mashupawards.com [9] lists more interesting examples.

Whilst a mashup may be considered to be something that provides online access to a data set in a particular way, there are also other ways in which similar processes are used. Digg [10] and Del.icio.us [11] are examples of websites that exist only to provide links to other data sources on the internet. In addition to the functionality they provide, there are applications such as addthis.com [12] and sharethis.com [13], which provide APIs to link websites, or particular content, into sites like Digg and Del.icio.us. Pipes [14], now a part of Yahoo!, is a useful online mashup development tool. There is also now a move to develop an online ID service, named OpenID [15], to enable easier authentication of mashup-like services. These services have developed independently yet are all inter-related and rely on each other to function.

AllTheAnalysts.com [16] is an example of mashup concepts in development. It combines search data results with staff-defined quantifiers, then tracks user activity on the search results. These two datasets are combined and analysed to produce reports for sale to the community. Thus, it is currently a data mashup. Sharethis.com is used to offer tagging on sites such as Del.icio.us. Originally GET requests and parsing (written in Perl) were used to create the search dataset. Similar methods, collectively known as screen-scraping, were and still are available to mine specific data out of human-readable web pages. However, Google have since implemented a service that provides the required search data. Should the resultant data from ATA be made available via RSS in future, it could then be utilised in a consumer mashup.
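To give a flavour of screen-scraping, the sketch below pulls the links and their text out of a small HTML page using Python's standard library. It is not the Perl code the site originally used; the sample page, the example.com URLs and the LinkScraper and scrape_links names are all invented for illustration.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect (href, link text) pairs from anchor tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# A made-up results page standing in for a human-readable web page.
SAMPLE_PAGE = """<html><body>
<h1>Results</h1>
<p><a href="http://example.com/report-1">Report one</a></p>
<p><a href="http://example.com/report-2">Report <b>two</b></a></p>
</body></html>"""


def scrape_links(html_text):
    """Return every (href, text) pair found in the page."""
    parser = LinkScraper()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, text in scrape_links(SAMPLE_PAGE):
        print(href, text)
```

The weakness of the approach is visible even here: the scraper depends on the page's markup, so any redesign by the publisher can silently break it, whereas a published API or feed carries an agreed structure.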

In summary, mashups give an idea of what is achievable via collaborations based on stable standards and object-oriented programming. Increasing access to data in general, and making it easier to use and interact with (for example through visualisations), can only improve learning and development overall. Limitations are ever-present, but certainly not insurmountable: security and network reliability have improved greatly and should continue to do so. Though the current conception of mashups may not survive, the utility of programming over distributed systems, and the methods required to do so, can only improve in line with network and machine advances.

REFERENCES

  1. XML definition at www.w3.org/XML
  2. code.google.com/apis/maps/
  3. gis.chicagopolice.org/CLEARMap_crime_sums/startPage.htm
  4. Same Origin Policy at https://developer.mozilla.org/en/Same_origin_policy_for_JavaScript
  5. www.w3.org
  6. www.google.co.uk
  7. www.youtube.com
  8. www.ebay.co.uk
  9. mashupawards.com
  10. digg.com
  11. del.icio.us
  12. addthis.com
  13. sharethis.com
  14. pipes.yahoo.com
  15. openid.net
  16. alltheanalysts.com

October 23rd, 2008 at 2:40 pm
