
The Sharing Problem

How can two separately written computer programs share data?  This has been both a question and a problem since computers entered the business arena over 50 years ago.  If two banks wanted to allow a transfer of funds from one to the other, both banks’ systems needed to understand each other.  A supplier’s computer would need to understand the request from a company needing supplies.  A researcher’s program might need to access data from another analysis program in real time.  Time and time again, for business to work, one set of computer systems must be able to interact and share data with other systems.

The basic problem of sharing data between computer systems or programs can be divided into two cases.  The first case involves two programs running on the same system that wish to share data.  This could be as simple as a user wanting to import data from a spreadsheet into a word processing document, or as complex as two concurrent programs wishing to share data in real time.  The second case involves processes running on two different computer systems, located physically apart from one another and connected through a network.

While these two cases have significant differences, they also share some similarities.  For a system to support the sharing of data, it must have some basic properties.  First, the creator of the data must be able to allow or disallow access to it.  For example, a bank would not want to allow just any computer system to access its financial records.  Such access would have to be limited to other financial institutions that have good reason to access such private data (such as checking to ensure a customer has sufficient funds to make a purchase).  In addition, the owner should be able to specify not only who has access to the data, but whether that access is read-only or read/write.  Taking the previous example a step further, a bank may not want to give the visiting bank write access to its systems.  If the visiting bank is merely checking for sufficient funds, there is no reason to allow it write access to the data.

When two programs need to share data on the same machine, shared memory can be used to facilitate access.  Both programs would need to be written using the same data structures, but sharing would be easy, because the two programs never hold separate copies that could drift out of step.  Since both programs point to the same place in memory, the data each one sees is always up to date.  It would also be easy to ensure that only one program writes to the memory at a time by locking the memory to one program while that program is writing to it.
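A minimal sketch of this idea in Python follows.  It assumes both "programs" are simulated inside one process, each holding its own attachment to the same named block; the names block_a, block_b, and add are hypothetical.  Two real processes would need a lock that both can see (for example, one from the multiprocessing module) rather than the in-process lock used here.

```python
from multiprocessing import shared_memory
import struct
import threading

# One lock guards all writes; only its holder may modify the block.
# (Assumption: an in-process lock stands in for a cross-process one.)
write_lock = threading.Lock()

# "Program A" creates a named block holding one signed 64-bit integer.
block_a = shared_memory.SharedMemory(create=True, size=8)
struct.pack_into("q", block_a.buf, 0, 0)

# "Program B" attaches to the same block by name; no data is copied.
block_b = shared_memory.SharedMemory(name=block_a.name)


def add(block, amount):
    """Add amount to the shared integer while holding the write lock."""
    with write_lock:
        (value,) = struct.unpack_from("q", block.buf, 0)
        struct.pack_into("q", block.buf, 0, value + amount)


add(block_a, 5)  # written through A's attachment
add(block_b, 7)  # written through B's attachment

# Both attachments read the same, up-to-date value.
(seen_by_a,) = struct.unpack_from("q", block_a.buf, 0)
(seen_by_b,) = struct.unpack_from("q", block_b.buf, 0)
print(seen_by_a, seen_by_b)  # prints "12 12"

# Detach both sides, then release the block itself.
block_b.close()
block_a.close()
block_a.unlink()
```

Note that both sides must agree in advance on the layout (here, a single 64-bit integer at offset 0), which is the "same data structures" requirement described above.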

To facilitate this process, a two-level mapping scheme called capability addressing was developed.  It assigns each piece of data used in a program a “handle” or “capability”.  Each process has its own object table, in which each entry has an id and holds the handle of one object.  Each system then has one system-level mapping table that maps each handle to an actual memory address.  This means that if the memory segment must be moved or resized, the system must update only one table.  The system-level mapping table can also contain information about the object.  For example, which process owns (or created) the data object?  Also, which processes have read/write access to the object?  This two-level mapping scheme can be visualized as below.

 

 

The scheme effectively handles sharing on a single machine.  However, it does have some issues.  For example, the process that wants to share the data must know in advance the data types and the object’s handle (unless the handle is received through a call to the creating process).  This means that there must be a large amount of communication between the authors of the creating and sharing processes.  However, such communication is likely to be possible if both processes are running on the same system, since the authors can probably contact one another directly to get such details about the system.
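The two-level mapping described earlier can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not a reconstruction of any particular operating system: the class names, the method names, and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SystemEntry:
    """One row of the system-level table: location, owner, and access rights."""
    address: int
    owner: str
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)


class System:
    def __init__(self):
        self.table = {}       # handle -> SystemEntry (one table per system)
        self.next_handle = 1

    def create(self, owner, address):
        handle = self.next_handle
        self.next_handle += 1
        self.table[handle] = SystemEntry(address, owner, {owner}, {owner})
        return handle

    def grant(self, handle, process, write=False):
        entry = self.table[handle]
        entry.readers.add(process)
        if write:
            entry.writers.add(process)

    def resolve(self, handle, process, write=False):
        entry = self.table[handle]
        allowed = entry.writers if write else entry.readers
        if process not in allowed:
            raise PermissionError(f"{process} lacks access to handle {handle}")
        return entry.address

    def move(self, handle, new_address):
        # Relocating an object touches only this one table.
        self.table[handle].address = new_address


class Process:
    def __init__(self, name, system):
        self.name = name
        self.system = system
        self.objects = {}     # local object id -> handle (one table per process)

    def attach(self, local_id, handle):
        self.objects[local_id] = handle

    def address_of(self, local_id, write=False):
        return self.system.resolve(self.objects[local_id], self.name, write)


system = System()
creator = Process("creator", system)
visitor = Process("visitor", system)

handle = system.create("creator", address=0x1000)
creator.attach(0, handle)
visitor.attach(0, handle)         # the visitor had to learn the handle somehow
system.grant(handle, "visitor")   # read access only

system.move(handle, 0x2000)       # one update; both processes follow it
```

After the move, both processes resolve their local id 0 to the new address without any change to their own tables, and a write request from the visitor raises PermissionError.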

This scheme was so successful that most operating systems’ file systems use a handle-based system to manage files and to determine the read/write privileges assigned to each file for each user.

However, what if you were developing software on one computer and wanted to share data with a process running on a different machine?  Also, what if you did not want to have to contact the author directly for permission to share the data?  When computing was in its infancy, this was not a concern.  However, with the development of the computer network and, eventually, the Internet, such situations became common.  The above solution will obviously not work in such distributed systems, since there is no physically shared memory between two disparate computer systems connected by a cable.

Distributed systems changed the landscape of sharing data forever.  No longer was sharing confined to programs running on the same machine.  Now machines thousands of miles apart had to be able to share data.  The Internet has exacerbated the problem as well.  Now, everyday users are accessing data from a multitude of sources simultaneously.  Users want portals that provide “one stop” access to everything they might need, from bank records to online shopping to current news and weather.  And they want to be able to choose the data that interests them.

Over time, different solutions have emerged to the problem of how different programs in a distributed system should connect to each other to share data.  The first solution required every system to have specific connections to the other systems it needed to interact with, and special software to allow communication between the two systems.  As an example, consider an e-business site (www.buystuff.com) that was just starting out and did not want to write its own routines to handle credit card transactions.  So, it licensed this part of its purchasing system from wetakecredit.com.  Buystuff now has the right to utilize wetakecredit's system for credit authorization.  However, the question remains: how does buystuff.com utilize the system it just licensed?  The answer is that buystuff would have to examine its own system and convert its purchase format so that the necessary information is transmitted in the format that wetakecredit's system expects.  In addition, it may have to build a proprietary hardware link to wetakecredit's systems.  It would have to replicate this type of arrangement for every other service it wanted to outsource (for example, if it decided to outsource its payroll, it might be required to build a custom hardware and software link to wepaypeople.com).  This type of system is shown below.

 

This type of architecture was, of course, expensive and complicated to create and operate.  Other techniques have since been created to make the process a little easier.  For example, a common object interface called CORBA (Common Object Request Broker Architecture) was devised to allow data to be shared.  It described the methods used to share data in an IDL (Interface Definition Language) file, which universally described the interfaces to the shared objects.  A program could look at the IDL for the source program and know what methods were available for it to execute.  This meant that controlling which clients could access specific information was easy.  If you wanted to give a client read access but not write access, you could give it access to read-only methods only.  The methods and object invocations were then called remotely over IIOP (the Internet Inter-ORB Protocol).  This process is shown below.
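As a rough analogy in code (plain Python, not CORBA IDL and not a CORBA library), the read-only idea amounts to publishing an interface that simply lacks the write methods.  The class and function names below are hypothetical.

```python
class Account:
    """Server-side object with both read and write methods."""

    def __init__(self, balance):
        self._balance = balance

    def get_balance(self):
        return self._balance

    def deposit(self, amount):
        self._balance += amount


class ReadOnlyAccount:
    """The only interface handed to a visiting client: no write methods."""

    def __init__(self, account):
        self._account = account

    def get_balance(self):
        return self._account.get_balance()


def has_sufficient_funds(view, price):
    """A client-side check that needs nothing beyond read access."""
    return view.get_balance() >= price


account = Account(100)
view = ReadOnlyAccount(account)
enough = has_sufficient_funds(view, 80)
```

The visiting client can answer the question it came to ask, but the interface it was given offers no deposit method to call.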

CORBA has been used successfully by many projects.  For example, The Weather Channel utilized CORBA in creating the system that transmits local weather forecasts to each cable network that offered the channel.

CORBA did have its share of problems, however.  The first problem was that the way an IDL file was defined differed for every programming language, as did the way it was used.  A programmer would have to learn how to access IDL files in every language in which he or she needed to utilize the remote objects.  Another problem was that CORBA utilized its own protocol (IIOP).  This was not a standard Internet protocol, so special connections had to be installed to allow communication.  In addition, you still had to know explicitly about the service being provided, because there was no general repository listing public CORBA applications.  All of this made CORBA well suited for specific application integration tasks over the Internet, but not well suited for publicly shared services.

The current solution to the problem involves XML web services.  They replace the specialized connections with the HTTP protocol, which any system that connects to the Internet can utilize without any additional hardware or software.  Data types do not have to be known in advance, because creators of web services can publish the ways to share data through WSDL.  In fact, users of a web service do not even have to learn of its existence from its creator, because UDDI can be used to discover web services.

Let’s look at how XML web services work in more detail.  The one development that makes XML web services a workable idea is XML (eXtensible Markup Language).  XML is a markup language for defining other languages.  For example, you could define XHTML using XML.  Once you have an XML document, you utilize an XML parser to parse the document.  The parser takes the XML document and the definition for the particular document structure, and creates a tree of data.  This is shown on the next page.  The power of XML lies in the fact that, unlike CORBA’s IDL files, XML files are truly universal.  Once you know a document type as defined in XML, you can create, use, and modify documents based on that definition in any programming language with an XML parser.  So, how does this come into play with web services?  There are three major items involved in web services.  These are:

·        SOAP (Simple Object Access Protocol)

·        WSDL (Web Services Description Language)

·        UDDI (Universal Description, Discovery, and Integration)

All three of these items are nothing more than XML language specifications.  Their documents are nothing more than plain text written to conform to a markup language specification.  And, because they are plain text, they can be transmitted using the standard HTTP protocol that any web-enabled server or host has access to.  No special software is necessary to link computers using web services.
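Because everything involved is plain-text XML, any language with an XML parser can turn such a document into a tree.  The sketch below uses Python's standard xml.etree.ElementTree module on a made-up order document (the element and attribute names are hypothetical).  Note that this parser does not validate the document against a definition; it only checks that the XML is well formed.

```python
import xml.etree.ElementTree as ET

# A hypothetical plain-text XML document.
document = """<order id="17">
  <item sku="A100" quantity="2">Widget</item>
  <item sku="B200" quantity="1">Gadget</item>
</order>"""

# Parsing turns the text into a tree of elements.
root = ET.fromstring(document)


def describe(element, depth=0):
    """Return one indented line per node in the parsed tree."""
    lines = ["  " * depth + element.tag + " " + str(dict(element.attrib))]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(root)
print("\n".join(tree_lines))

# Once parsed, the data can be used like any other structure.
total_quantity = sum(int(item.get("quantity")) for item in root.findall("item"))
```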

 

WSDL is the web service version of CORBA’s IDL files.  A WSDL file describes which methods of the web service a client may invoke.  In other words, it controls what data can be accessed by the client.  Again, if the owner of the service does not wish to allow write access to certain data, the owner simply does not describe those methods in the WSDL file, thus preventing the client from changing the data.
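To make this concrete, the sketch below reads the operation names out of a heavily stripped-down WSDL 1.1 fragment.  The service and operation names are hypothetical, and a real WSDL file would also contain message, binding, and service sections that are omitted here.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, abbreviated description: only read-style operations are listed.
wsdl_text = """<definitions name="AccountService"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <portType name="AccountPortType">
    <operation name="GetBalance"/>
    <operation name="HasSufficientFunds"/>
  </portType>
</definitions>"""


def published_operations(wsdl):
    """Return the names of the operations a client is told about."""
    root = ET.fromstring(wsdl)
    path = f"{{{WSDL_NS}}}portType/{{{WSDL_NS}}}operation"
    return [operation.get("name") for operation in root.findall(path)]


operations = published_operations(wsdl_text)
print(operations)
```

A client that builds its calls from this description never learns of any operation, such as a hypothetical Deposit, that the owner chose to leave out.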

SOAP is how the actual data is transferred.  Once you know the methods available to you as a client (from the WSDL file), you have to call the remote object’s methods.  You do this by creating a SOAP document containing the information needed for the method to be executed.  You then transmit the SOAP document (over HTTP) to the server.  The server parses the SOAP document, uses the contained information to run the method, and gets a result.  The server then packages the result in SOAP and transmits it back to the client using HTTP.  Finally, the client parses the SOAP document containing the result and utilizes the result.  This process is shown on the next page.
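The round trip just described can be sketched without a network by letting a function call stand in for the HTTP request and reply.  The envelope namespace is the SOAP 1.1 one; the service namespace, the method name GetBalance, the account number, and the balance are all hypothetical, and a real server would answer an unknown method with a SOAP fault rather than a Python exception.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:accounts"      # hypothetical service namespace
BALANCES = {"12345": 250.0}          # hypothetical server-side data


def make_envelope(element_name, fields):
    """Wrap a method call (or its result) in a SOAP envelope, as text."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{element_name}")
    for name, value in fields.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_envelope(text):
    """Parse an envelope and return (element name, {field: text})."""
    body = ET.fromstring(text).find(f"{{{SOAP_NS}}}Body")
    call = body[0]
    name = call.tag.split("}", 1)[1]          # strip the namespace part
    fields = {child.tag.split("}", 1)[1]: child.text for child in call}
    return name, fields


def server_handle(request_text):
    """Server side: parse the request, run the method, package the result."""
    method, args = read_envelope(request_text)
    if method != "GetBalance":
        raise ValueError("method not published")
    balance = BALANCES[args["account"]]
    return make_envelope("GetBalanceResponse", {"balance": balance})


# Client side: build the request, "send" it, and parse the reply.
request = make_envelope("GetBalance", {"account": "12345"})
response = server_handle(request)    # stands in for an HTTP POST and reply
name, result = read_envelope(response)
```

Both request and response are ordinary strings, which is what allows them to travel over HTTP like any other text.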

The final piece to the web service puzzle is UDDI.  UDDI allows a user to discover a web service through a UDDI repository.  This allows public web services to be found by anybody wishing to use that web service.  In addition to a description of the web service, the location of the WSDL file is also stored.  This allows you to utilize the service without having to directly contact the author of the service.
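The discovery step can be pictured as a lookup in a shared directory.  The sketch below is only an in-memory stand-in for that idea and does not use the actual UDDI API; the Registry class, the service descriptions, and the WSDL locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    """What a repository stores: a description plus where the WSDL lives."""
    name: str
    description: str
    wsdl_url: str


class Registry:
    def __init__(self):
        self._entries = []

    def publish(self, entry):
        self._entries.append(entry)

    def find(self, keyword):
        """Return every entry whose description mentions the keyword."""
        keyword = keyword.lower()
        return [e for e in self._entries if keyword in e.description.lower()]


registry = Registry()
registry.publish(ServiceEntry(
    "wetakecredit", "Credit card authorization",
    "http://wetakecredit.example/authorize.wsdl"))
registry.publish(ServiceEntry(
    "wepaypeople", "Payroll processing",
    "http://wepaypeople.example/payroll.wsdl"))

# A prospective client searches by need, not by knowing the author.
matches = registry.find("credit")
```

The entry returned by the search carries the location of the WSDL file, which is all the client needs to begin using the service.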

So, as you can see, web services help solve the problem.  There is now a universal hardware pipe and a universal software description language.  The diagram below shows a general architecture for web services.

Web services have been used successfully by many projects.  For example, Google® allows any developer access to its search engine through its web service.

The above describes what web services are in a nutshell, and why they provide a workable solution to the sharing problem.  However, even with this history, it does not show why the sharing problem is a great principle of information technology.

Sharing data is practiced by virtually every computer user, no matter their skill level, so it is broadly practiced.  Whenever an Internet user is using a web page, he or she may be sharing data with other web sites and services.  When a person purchases something with a credit card, the POS system undoubtedly shares data with a bank to ensure there are sufficient funds in the buyer’s account.  This layer is unseen by most users, but it is still there.  If they execute a search, they might be using a web service that Google.com provides.  If they are on a web site that tells them the current weather, they might be viewing information provided by a web service at weather.com.  When they purchase something on the Internet, their transaction might be handled by a web service or a CORBA-based system.  Microsoft Office users are used to relying on Object Linking and Embedding to automatically update an MS Word document when the user updates an Excel spreadsheet.  The point is that sharing data between systems is largely invisible to the user.  But users are sharing data across systems, and doing so will more than likely never cease.

The ability to share data between two systems has broad social impact because it changes how users use the Internet, even though most do not know or care about the change.  However, they do see the increased functionality of web sites as it becomes easier to link together web sites and applications.  They notice better and faster results as, instead of writing their own procedures, companies can just use the procedures of other companies and stop reinventing the wheel.  They notice the convenience of debit cards.  Because it affects the Internet as a whole, the ability of two systems to share data affects every one of the hundreds of millions of Internet users.

The answer to the question of how two systems share data is a timeless one because it solves an age-old problem with enterprise applications: how to connect them.  This has been a problem since the beginning of computing, and several technologies have been invented to assist: CORBA, RMI, etc.  Web services are the newest solution to this problem.  And, even if something newer and better comes along to solve this problem, that solution will take advantage of the knowledge gleaned from web services.  For example, CORBA developed the idea of separating the interface of an object from its implementation.  Web services then took this idea and modified it so that the “envelope” describing the object’s methods was an easy-to-transmit plain-text (XML) file.  The shared memory architecture first used for sharing data between different processes on the same machine developed the idea of giving multiple users different reading and writing authority over shared objects.  The same encryption techniques that have been used to encrypt web page data traveling over the network can also be used to encrypt web services.  In the end, the history of sharing data has been one of creating a technology, improving it and adding to it, and repeating the process.

Sharing data crosses all Information Technology disciplines because, when creating a web service, for example, all IT disciplines must be involved.  Web designers must help design the look and feel for the presentation logic.  Network engineers and system architects must develop the hardware backend that will implement the service.  Software developers must code the services.  Computer scientists must continue to look for better and more efficient ways of taking a text file, parsing it to see which methods must be called on an object, and sending the results back through a similar text file.  Web services would not exist if it were not for every IT discipline.  Even when sharing data involved but a single machine, it still took programmers, engineers, and computer architects to develop the hardware/software mix that made the sharing possible.

The problem of sharing data meets every qualification for being considered a great principle of information technology.  Thus, sharing data is a great principle.


 

References

http://www.corba.org/success.htm, viewed November 2002.

http://www.uddi.org/, viewed November 2002.

http://www.omg.org/gettingstarted/corbafaq.htm, viewed 2002.

Denning, P. J.  1996.  “Before Memory was Virtual, draft.”

Stal, Michael.  October 2002.  “Web Services: Beyond Component-Based Computing.”  Communications of the ACM, pp. 71-76.

Fabry, R. S.  July 1974.  “Capability-Based Addressing.”  Communications of the ACM, pp. 403-412.

Deitel, H. M., et al.  2003.  “Web Services: A Technical Introduction.”  Prentice Hall, NJ.