winter

odaesa

Submitting jobs to the GRID and associated security implications

Segments of computational infrastructure across the world have interconnected to form a network of machines that can provide significant resources for data and computation tasks. The various internet, email and web 2.0 technologies are now common and well known, and are well utilised in social, business and academic circles. But there is a layer of GRID technologies which offer computational power – rather than the more traditional data services – for application by both business and research users.

High Performance Computing nodes have been available for some time, and have developed an infrastructure through which they can be utilised. However, GRID architectures, where many interconnected machines provide lower individual performance but higher overall throughput, have also proved very useful – in number crunching vast amounts of data, for example. There are many such jobs that are not ideally suited to an HPC machine, but could be processed over a large number of machines instead. This is the foundational principle on which the GRID has developed. However, there is a significant logistical requirement associated with operating many machines across large, geographically disparate networks. Additionally, the fact that such machines are under the control of various different organisations, yet need to be accessible to a wide range of users, adds complication. In the HPC world, an HPC machine is usually handled via one administrative gateway, which can state requirements for access, control throughput, and implement some concept of unit costs involved in utilising the service. However, this does not apply so easily to the GRID, as jobs can have a wider range of requirements. Where units of use (and thereby cost) can be calculated for HPC based on cycles used, a GRID system must also consider overheads such as data storage and network bandwidth as a more fundamental part of the cost and benefit of using the service. If the GRID is to continue growing and to operate in a greater capacity, particularly in the business world, it is essential that these requirements are handled effectively.

Of course, there have been operational GRID infrastructures in place for quite some time now, so clearly some of these issues have been tackled already. The problem lies in the fact that, as these services become interconnected, the underlying differences in how the issues are tackled become more apparent, and act as obstacles to integration. Hence, there is now a strong requirement for standardisation, combined with a need to maintain service usability for existing users and make it attractive to potential new customers.

The functions of the GRID can be viewed as three distinct areas – Job Submission, Resource Management, and Matching – with a fourth and overarching area being the issue of Security. For a user wishing to apply the GRID to some particular task, the important area is that of Job Submission. Any submitted job must clearly define its requirements, so that it can be handled effectively once on the GRID. This is similar to how a job must be defined when passing it to an HPC administrative gateway. The important aspects of a job would be information such as:

•    estimated runtime
•    compilation requirements
•    chip architecture compatibility
•    memory requirements
•    data requirements
•    how many processors it can run on

The different GRID administrative entities handle these in their own way: for example, under Condor, job descriptions are defined in Class Ads, and under gLite there is the Job Description Language (JDL). However, these are essentially the same process, differing only in syntax – of the form attribute x = value y for some set z of attributes. The logical progression to meet the requirements of the larger interconnected GRID, then, is a standardised method for describing jobs – one such standard being the Job Submission Description Language, or JSDL.
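To make the "attribute x = value y" point concrete, here is a minimal Python sketch that renders one set of job attributes in two different surface syntaxes. The attribute names and the two output styles are illustrative assumptions only; they are loosely modelled on the flat and bracketed styles and are not the exact vocabulary or grammar of Condor Class Ads or gLite JDL.

```python
# Hypothetical job attributes. The names are made up for illustration and
# are not the real attribute vocabulary of Condor or gLite.
job = {
    "Executable": "analyse.sh",
    "EstimatedRuntimeMinutes": 90,
    "MemoryMB": 2048,
    "CpuCount": 4,
}


def quote(value):
    """Quote strings and leave numbers bare."""
    return f'"{value}"' if isinstance(value, str) else str(value)


def flat_style(attrs):
    """Render one 'Name = value' pair per line."""
    return "\n".join(f"{name} = {quote(value)}" for name, value in attrs.items())


def bracketed_style(attrs):
    """Render a bracketed record with ';'-terminated pairs."""
    body = "\n".join(f"  {name} = {quote(value)};" for name, value in attrs.items())
    return "[\n" + body + "\n]"


print(flat_style(job))
print(bracketed_style(job))
```

The same dictionary drives both renderings, which is the sense in which the descriptions are the same process with different syntax.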

JSDL is used by many of the GRID administrative systems, such as Platform, GridWay, UNICORE, and GridSAM. There is, again, similarity to HPC situations in that a JSDL document describes a job in a way that is acceptable to the administrative system. JSDL is an extensible XML schema specification for job description. It defines a main job definition element which can contain various sub-elements that convey information about a particular job. In this way, it can act as a standard to be adopted in order to allow easy specification of jobs for use on the GRID, under various administrative systems.
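As a rough illustration of the "main job definition element with sub-elements" structure, the Python sketch below builds a small JSDL-shaped document using only the standard library. The element names and namespace URIs are written from memory of the JSDL 1.0 specification and should be checked against the specification itself; the function name `build_jsdl` and the sample job are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as recalled from the JSDL 1.0 specification (an assumption;
# verify against the specification before relying on them).
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(name, executable, arguments):
    """Build a minimal job definition tree: identification plus application."""
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("jsdl-posix", POSIX)

    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    description = ET.SubElement(job, f"{{{JSDL}}}JobDescription")

    identification = ET.SubElement(description, f"{{{JSDL}}}JobIdentification")
    ET.SubElement(identification, f"{{{JSDL}}}JobName").text = name

    application = ET.SubElement(description, f"{{{JSDL}}}Application")
    posix = ET.SubElement(application, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for argument in arguments:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = argument
    return job


root = build_jsdl("hello-grid", "/bin/echo", ["hello", "grid"])
print(ET.tostring(root, encoding="unicode"))
```

A real submission would normally also carry resource and data staging sections, which are omitted here to keep the sketch short.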

It is important to note, however, that JSDL is not a process monitor. It does not in itself handle jobs or job queues, nor does it administer machines on which jobs may run. This, of course, is handled by the various GRID administration systems, such as Condor and gLite. This is where the other functional areas of the GRID are performed – handling the machines that jobs run on and matching jobs to those (clusters of) machines. Machines are the resources available for a job to be matched to and run on, and must have a description themselves – for example under Condor Class Ads, a resource has its own Class Ad, similar to a Job Description Class Ad, which specifies attributes of that resource such as its queues, OS, processor details, and to whom it grants access.
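The matching step can be sketched in a few lines of Python: both the job and the resource carry a description, and a job matches a resource when the resource satisfies the job's requirements and grants access to the job's owner. This is a toy under stated assumptions, not the matchmaking algorithm of Condor or gLite; the `Job` and `Resource` fields and the sample machine names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource description."""
    name: str
    arch: str
    memory_mb: int
    cpus: int
    allowed_vos: frozenset  # VOs to which this resource grants access


@dataclass(frozen=True)
class Job:
    """Hypothetical job description."""
    owner_vo: str
    arch: str
    memory_mb: int
    cpus: int


def is_match(job, resource):
    """A resource matches if it admits the owner and meets every requirement."""
    return (
        job.owner_vo in resource.allowed_vos
        and job.arch == resource.arch
        and job.memory_mb <= resource.memory_mb
        and job.cpus <= resource.cpus
    )


def candidates(job, resources):
    """Return the names of all resources the job could run on."""
    return [r.name for r in resources if is_match(job, r)]


pool = [
    Resource("cluster-a", "x86_64", 4096, 8, frozenset({"physics"})),
    Resource("cluster-b", "x86_64", 1024, 64, frozenset({"physics", "biomed"})),
    Resource("cluster-c", "x86_64", 8192, 16, frozenset({"biomed"})),
]
job = Job(owner_vo="physics", arch="x86_64", memory_mb=2048, cpus=4)
print(candidates(job, pool))
```

Here only cluster-a is both large enough and open to the job's VO; cluster-b fails on memory and cluster-c on access.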

There is then an overall requirement to be able to describe jobs in a standard and extensible way, and to match those requirements and options to available resources. JSDL offers a means to do that, in an attempt to incorporate the functionalities of JDL, Class Ads and other job descriptors. To read how to utilise these methods to submit a job, check the reference list.

Apart from providing the actual methods for job submission, resource management, and matching, however, there is the further requirement, previously mentioned, of security. As alluded to by the fact that a resource descriptor may specify to whom it grants access, there must be a reliable mechanism for authorisation and authentication, and also a secure method for transporting jobs and their descriptors around the GRID network.

There are multiple reasons why security is vital to the use and development of the GRID. For one, the data a user may wish to work on could be highly sensitive – for example, it could contain medical records about real people, or it may contain commercially sensitive research information that must be kept confidential. Whilst the GRID is essentially a network of machines owned and operated by different real-world organisations, a lot of GRID development and utilisation is carried out by what are known as Virtual Organisations (VOs), which draw their members from several of those organisations. Members of one VO may therefore work for competitor companies, and it is possible that sensitive data could pass through the domain of a competitor. Furthermore, the GRID itself is a massively powerful resource, which could be applied to subversive ends such as DDoS attacks on internet machines, or sharing and copying vast amounts of copyrighted data. The environment itself is also vulnerable to attack as it is a highly interconnected distributed network, meaning that the effects of rogue software can spread very quickly through the system. If the GRID cannot generally be trusted as a secure medium, it would be of limited use for certain data-sensitive business and research practices, which would stifle the usability and growth of the system.

The technology utilised to provide security on the GRID follows fairly standard procedures for secure communications, although initial authentication is perhaps tighter than in other technology domains. The authentication certificates used are X.509 certificates – these include typical key pair encryption and a CA digital signature to allow for third party confirmations. The Certificate Authority mechanism is useful in allowing users from two separate domains (perhaps real world competitive companies) who are operating jointly in a VO to have certain processes authenticate to each other from within their own domain areas via an external trusted third party – the CA. In this way, secure networking between and across otherwise untrusted networks can be achieved. Key pairs and the use of digital signatures ensure that access to certain data and services can be tracked, and that data in transit cannot be intercepted and modified without detection – any modification would result in a digital signature match failure, thus evidencing the tampering. Once these security mechanisms are ready for use (generally, to get an X.509 authentication certificate, one must apply to the NGS and then have their identity confirmed in person by their VO representative), one can use GridFTP, for example, to send and receive secure communications. GridFTP allows access to users with valid certificates, who can then transfer data and jobs to various GRID sites, and those sites can then use authorisation settings and CA confirmation to control which users can gain access to particular services. It is also possible to delegate authorisation to a particular process for a minimal amount of time, to enable that process to carry out any authentication-dependent tasks. The provided reference list contains links to technical information for explaining and executing such security mechanisms.
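The way a signature mismatch exposes tampering can be shown with a deliberately small toy in Python. This is textbook RSA over a SHA-256 digest with a key built from two well-known Mersenne primes; it is an assumption-laden sketch for illustration only, far too weak and unpadded for real use, and it is not how X.509 tooling is actually invoked. The function names and the sample message are hypothetical.

```python
import hashlib

# Toy key pair from two known Mersenne primes. Real deployments use much
# larger keys, padding schemes, and vetted cryptographic libraries.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                                  # public modulus
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sign with the private exponent."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Check the signature using only the public values N and E."""
    return pow(signature, E, N) == digest(message)


original = b"run job 42 on site A"
signature = sign(original)

print(verify(original, signature))                  # untouched message
print(verify(b"run job 42 on site B", signature))   # modified in transit
```

Only the holder of the private exponent can produce a signature that verifies, while anyone holding the public values can check it, which is what lets a receiving site detect a modified job or data file.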

As with any system that requires some level of security in its operation, it is in the end the users who have the most profound effect on such security. For instance, the requirement for a particular level of security is inherently specified by the current and potential users – their locations, methods of communication, sensitivity of the information which they wish to transmit and interact with, and so on. However, it also tends to be the users themselves who cause failures to occur within a secure system. This is of particular concern in the GRID domain because, again, users can be part of different real-world organisations, and such organisations can have different policies and cultural attitudes toward levels of security. Getting such a disparate and flexible group of users to adhere to a particular methodology can be difficult. This can be remedied to a certain degree by the requirement to have initial authentication performed in person, and also by enforcing an authorisation renewal policy, for example yearly refreshment of credentials. Also, the concept of non-repudiation of authentication can instil a sense of responsibility in users. If one's credentials are used to perform unauthorised acts, one cannot prove one's innocence – hence, if there is any possibility of credentials having been compromised, a user should cancel them immediately and obtain new ones. Of course, security is only as good as the weakest link, so it is also necessary to place requirements upon passwords used, to ensure they are not easily guessable or crackable – at least within a certain critical time frame.
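The idea of a password being uncrackable "within a certain critical time frame" can be made concrete with a back-of-the-envelope estimate: compare the time to exhaust the brute-force search space with the lifetime of the credential. The Python sketch below does this under loud assumptions: the guess rate of one billion per second is invented for illustration, and the estimate is an upper bound only, since dictionary and pattern attacks are far faster than exhaustive search. All names here are hypothetical.

```python
import string

SECONDS_PER_DAY = 86_400


def alphabet_size(password):
    """Estimate the character pool from the character classes present."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += len(string.ascii_lowercase)
    if any(c in string.ascii_uppercase for c in password):
        size += len(string.ascii_uppercase)
    if any(c in string.digits for c in password):
        size += len(string.digits)
    if any(c in string.punctuation for c in password):
        size += len(string.punctuation)
    return size


def seconds_to_exhaust(password, guesses_per_second):
    """Time to try every password of this length over this pool."""
    return alphabet_size(password) ** len(password) / guesses_per_second


def outlasts_credential(password, lifetime_days, guesses_per_second=1e9):
    """True if exhaustive search would take longer than the credential lives.

    The default guess rate is an assumed figure, not a measurement.
    """
    return seconds_to_exhaust(password, guesses_per_second) > lifetime_days * SECONDS_PER_DAY


print(outlasts_credential("abc123", lifetime_days=365))
print(outlasts_credential("Tr0ub4dor&3xample!", lifetime_days=365))
```

With a yearly renewal policy, the short lowercase-and-digits password falls within seconds under this assumed rate, while the long mixed-class one does not fall to exhaustive search within the credential's lifetime.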

It is of vital importance that the GRID system competently provides these services, to ensure that trust in the system is developed and maintained. Users must be able to trust the system, and should also realise their own responsibilities in maintaining that security and level of trust. If trust is lost, it is hard to regain. Furthermore, the system must remain manageable, usable and scalable.

Development is ongoing to integrate GRID authentication with other security mechanisms. For example, as web services grow in number and many organisations implement Single Sign On credentials for networks, there is an opportunity to incorporate Shibboleth or other HTTP protocol access to GRID networks. The Open Grid Services Architecture (OGSA) defines a possible method for implementing this type of integration, utilising Web Services Description Language (WSDL), Simple Object Access Protocol (SOAP) and HTTP technologies. If this can be reconciled with the current dependence on X.509 certificates, a further level of integration between web and GRID services and their authentication requirements could be achieved.

In conclusion, the current methodologies for implementing job submission on the GRID are in a fluctuating state, although there is a tendency towards standardisation already in effect which could soon lead to adoption of an overarching standard. Security has been catered for with the application of tried and tested methods. Even so, rather than waiting for a security failure to prompt action, it is important to ensure that users keep up with the application of such security methods, and that their implementations remain current. Complacency must not take hold, particularly if the GRID is to be marketed as an attractive option to new customers. Finally, there are hints towards new levels of usability and scalability with the possible future integration of web services type applications and access controls.

REFERENCES

  1. NeSC
  2. National Grid Service
  3. Open Grid Forum
  4. Condor Class Ads
  5. gLite
  6. Globus and JDL
  7. Job Submission Description Language
  8. Platform
  9. GridWay
  10. UNICORE
  11. GridSAM
  12. X.509 certificates
  13. GridFTP
  14. Shibboleth
  15. Web Services Description Language
  16. Simple Object Access Protocol
  17. Open Grid Services Architecture
  18. The Anatomy of the Grid
  19. The Physiology of the Grid

November 28th, 2008 at 11:38 pm
