Monday 25 February 2008

Web Service Development Practicalities

With a current client we've attempted to capture larger issues and practical development ideas beyond just the technical ABCs of implementing web services. We've published them here in the hope that they're useful to the casual reader.

The following contains:

1. A brain dump of issues experienced in the past on web service projects that can blow out project delivery, affect the quality of the technical solution, and just plain frustrate users, developers and project managers alike. As web service projects bring in a 3rd party to supply part of the service, there appears to be an "error multiplier" (much like the military's concept of a "force multiplier") meaning that there is a greater chance of problems occurring.

2. A discussion of small ideas to help reduce the pain, which you can build into your project plan and development beyond just building programs to consume/publish web services.

As this is a brain dump, the following is certainly not a definitive guide, nor an overly well structured or articulated article. In other words your mileage may vary.

Note that the discussion bends towards development from the Oracle database with mention of transactions, triggers, PL/SQL routines and the like. The discussion is also more considerate of the consumer, not publisher of web services - though both parties can take value in reading the points below to consider the other party's issues in using web services.

If you have anything to add, or agree or disagree with any of the following points, we'd appreciate hearing your experience and ideas.

Development Issues

Documentation - 3rd parties publishing web services will often supply documentation on the web services to the new consumer, including descriptions of the WSDL files, their URL locations, the SOAP XML payloads and the business processes that result. New development teams should take care to confirm that the documentation supplied matches the actual services published; if they differ, it is an early indicator that the web service and external organisation are off the rails.

Network Connectivity - if you've ever sat at an organisation where you're frequently yelling at the network administrator that "I can't get to Google" or the XYZ sub-domain isn't accessible, chances are you're sitting on an unreliable network. Such network issues will become exasperating in a web services development project as you try and work out what's gone wrong this time, or are waiting once again for the network to come up.

Server Connectivity - as an extension of network connectivity, server connectivity and stability of the Application Server that publishes the web services is essential. If the web services reside on an App Server, Operating System or hardware box that goes up and down on a daily basis, it's pretty much guaranteed to go down during development.

Test Environment Verification - a good test web service needs to provide facilities beyond that of the production web service for development purposes, such that you can verify your transactions. Without this you may need to manually verify the transactions, or god-forbid actually phone somebody and check the results.

Test Environment Stability - it's common to provide a test web service environment in addition to the production web service servers. However there is a difference between providing a test environment for your own testing and development, and providing a test environment for other parties to test with; these should not be the same server. If you're currently using a published test web service from an organisation that is using that test service to do their own testing and development on, chances are you'll see that server go up and down, slow to a crawl, time out, change its functionality, report garbage data, report no data, or change its web service APIs.

Firewalls - typically a firewall will exist at each end between the consumer and publisher, and the firewalls will need to be configured to let web service traffic through. This may involve both IP and port blocking. As soon as you find out what needs to be configured in the development and test environments, request that the changes be made for the production environment so this isn't a roadblock for your production install. Because of red tape and security constraints at your organisation and the web service publisher's, this may be a long fight.

Solution Considerations and Practical Development

Error Handling - web service transmissions can fail at a number of different points, and how to handle each error condition needs to be considered up front. This may require flagging the unsent data and attempting to "resend" at a later time.

After a number of failed attempts for 1 or more messages the system should log the error in such a fashion that a human will be notified (either via email, in Oracle OEM logs etc). Once failed the system should stop attempting to resend and will not automatically restart but instead must be manually restarted. This requires the appropriate procedure for operations staff.
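The resend-then-halt behaviour described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `OutboundMessage`, `Resender`, the `send` and `notify` callables and the `max_attempts` value are all hypothetical names chosen for the example, and a real system would keep this state in the database rather than in memory.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OutboundMessage:
    payload: str
    attempts: int = 0
    status: str = "PENDING"  # PENDING, SENT or FAILED


@dataclass
class Resender:
    send: Callable[[str], None]    # hypothetical transport; raises on failure
    notify: Callable[[str], None]  # hypothetical alert hook (email, log, ...)
    max_attempts: int = 3          # assumed limit, tune per project
    halted: bool = False

    def run_once(self, queue: List[OutboundMessage]) -> None:
        """Make one pass over the queue, stopping at the first failure."""
        if self.halted:
            return  # stays halted until an operator calls restart()
        for msg in queue:
            if msg.status != "PENDING":
                continue
            try:
                self.send(msg.payload)
                msg.status = "SENT"
            except Exception as exc:
                msg.attempts += 1
                if msg.attempts >= self.max_attempts:
                    # Give up, tell a human, and refuse to run again.
                    msg.status = "FAILED"
                    self.halted = True
                    self.notify(f"Giving up after {msg.attempts} attempts: {exc}")
                return  # service is probably down; try again next pass

    def restart(self, queue: List[OutboundMessage]) -> None:
        """Manual operator action: re-queue failed messages and resume."""
        for msg in queue:
            if msg.status == "FAILED":
                msg.status = "PENDING"
                msg.attempts = 0
        self.halted = False
```

A scheduler would call `run_once` periodically; only the operations procedure would call `restart`.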

If the publishing server has been down for a considerable time such that a number of waiting messages have accumulated, then once the external service becomes available consider sending only a batch of the waiting messages at a time, so as not to flood the external server and network (and bring it down again with what amounts to a self-inflicted DoS attack).
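One way to picture the batching idea is the sketch below. The function names, the batch size of 50 and the 5 second pause are assumptions made for illustration only; suitable values depend on what the publisher's server can tolerate.

```python
import time
from typing import Callable, Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def drain_backlog(
    backlog: List[str],
    send: Callable[[str], None],             # hypothetical transport function
    batch_size: int = 50,                    # assumed value
    pause_seconds: float = 5.0,              # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send the backlog in batches, pausing between batches."""
    sent = 0
    for index, batch in enumerate(batches(backlog, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # give the recovering server room to breathe
        for payload in batch:
            send(payload)
            sent += 1
    return sent
```

The `sleep` parameter is injectable only so the pause can be replaced in tests.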

If your system fails after transmitting its payload, the transaction needs to be written in such a manner that it doesn't roll back to a point where it thinks the message was not sent and then erroneously send it again. Considered use of PL/SQL autonomous transactions (PRAGMA AUTONOMOUS_TRANSACTION) will take care of this.

SOAP Failures - the SOAP web service protocol has an error reporting "fault" mechanism. Any custom code you deliver needs to be able to detect what is an expected response and what is an unexpected response, how to handle known faults or unknown faults, and log them appropriately with as much detail as possible for debugging purposes. Though the SOAP protocol defines a number of different SOAP fault responses and mechanisms to deal with these, as web services may be hand-crafted solutions you may see totally arbitrary error handling capabilities.
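As a rough sketch of separating expected responses, faults and everything else, the following assumes a SOAP 1.1 envelope (namespace `http://schemas.xmlsoap.org/soap/envelope/`) with the standard `faultcode` and `faultstring` children. The function name `classify_response` and the `expected_tag` parameter are hypothetical, and a hand-crafted service may not follow the standard at all, which is why an "unexpected" category exists.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def classify_response(xml_text: str, expected_tag: str) -> Tuple[str, Optional[str]]:
    """Return (kind, detail) where kind is 'ok', 'fault' or 'unexpected'.

    expected_tag is the fully qualified tag of the first Body child you
    expect on success, in ElementTree's '{namespace}local' form.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return "unexpected", f"not well-formed XML: {exc}"

    body = root.find(f"{{{SOAP11_NS}}}Body")
    if root.tag != f"{{{SOAP11_NS}}}Envelope" or body is None:
        return "unexpected", "no SOAP 1.1 envelope/body found"

    fault = body.find(f"{{{SOAP11_NS}}}Fault")
    if fault is not None:
        code = fault.findtext("faultcode", default="?")
        text = fault.findtext("faultstring", default="?")
        return "fault", f"{code}: {text}"

    if len(body) and body[0].tag == expected_tag:
        return "ok", None

    found = body[0].tag if len(body) else "empty body"
    return "unexpected", f"expected {expected_tag}, got {found}"
```

Whatever the category, logging the raw response text alongside the classification keeps the detail needed for later debugging.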

Network and Server Latency - the publishing server's response time can vary depending on network latency and server load. As such any solution should avoid communicating with the external server within the same transaction that a human is part of; the wait time can become infuriating for the user. Instead write the message to be sent to a separate table/data structure then commit, with a separate independent process periodically searching for new records to be sent.

In particular be mindful of having database table triggers that call the web service routines to send messages. If a user undertakes DML on the table the operation may hang until the web service call is complete.
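The "write to a table, commit, and let a separate process send" approach might look something like the sketch below. It uses an in-memory sqlite3 database purely as a stand-in for Oracle so that the example is self-contained; the `outbox` table, its columns and the function names are assumptions for illustration, and in practice the polling job might be a scheduled database job rather than Python.

```python
import sqlite3
from typing import Callable


def create_outbox(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'PENDING')"
    )
    conn.commit()


def queue_message(conn: sqlite3.Connection, payload: str) -> None:
    """User-facing path: a fast local insert and commit, no network call."""
    conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
    conn.commit()


def process_outbox(conn: sqlite3.Connection, send: Callable[[str], None]) -> int:
    """Background path: send each pending message, then mark it as sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE status = 'PENDING' ORDER BY id"
    ).fetchall()
    sent = 0
    for msg_id, payload in rows:
        send(payload)  # may be slow; only the background job waits here
        conn.execute("UPDATE outbox SET status = 'SENT' WHERE id = ?", (msg_id,))
        conn.commit()  # commit per message so a later failure cannot undo it
        sent += 1
    return sent
```

The same shape also avoids the trigger problem: a trigger that only inserts into the outbox table returns immediately, whatever state the external server is in.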

Test Utilities - as web services can fail at many different areas (connection, timeout, payload errors etc), it's prudent to write simple test programs that help diagnose these issues rather than depending on your final production code under development. Such utilities will help you debug and diagnose the issues without the bloat of your own code. If written properly such tools can be included in production solutions to detect issues as they occur and log the issue or notify the appropriate operational person.
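A very small example of such a utility is a connection probe that tells you whether the problem is name resolution, a refused connection or a timeout before any SOAP is involved. This is only a sketch; the function name `probe_endpoint` and the result labels are made up for the example, and it checks TCP reachability only, not whether the web service behind the port is healthy.

```python
import socket


def probe_endpoint(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a plain TCP connection and report what went wrong, if anything."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        return "dns_failure"   # host name could not be resolved
    except socket.timeout:
        return "timeout"       # no answer, often a firewall silently dropping
    except ConnectionRefusedError:
        return "refused"       # host reachable but nothing listening
    except OSError as exc:
        return f"error: {exc}"  # anything else, e.g. network unreachable
```

Run from the same machine and account as the production code, a probe like this quickly separates network and firewall issues from payload issues.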

Own Test Web Services - if the external provider is having issues with providing a consistent test service, consider creating your own test web service based on the WSDL and XML payloads that they have published to keep your development going.
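A stand-in test service does not need to be elaborate. The sketch below uses Python's standard library HTTP server to answer every POST with one canned SOAP response; the `PingResponse` payload, the `urn:example` namespace and the names `StubSoapHandler` and `start_stub` are invented for illustration, and a real stub would return payloads copied from the publisher's documentation.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned payload; replace with one taken from the publisher's docs.
CANNED_RESPONSE = (
    b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    b'<soap:Body><r:PingResponse xmlns:r="urn:example"/></soap:Body>'
    b"</soap:Envelope>"
)


class StubSoapHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and discard the request body, then reply with the canned XML.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(CANNED_RESPONSE)))
        self.end_headers()
        self.wfile.write(CANNED_RESPONSE)

    def log_message(self, format, *args):
        pass  # keep the console quiet during test runs


def start_stub(port: int = 0) -> HTTPServer:
    """Start the stub on localhost in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), StubSoapHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pointing your development configuration at the stub's address keeps work moving while the provider's test server is unavailable; call `shutdown()` on the returned server when finished.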

Service Level Agreements - care needs to be taken to get the web service publisher to provide SLAs on both the test and production environments. In particular uptime, not changing the specifications, not changing the business process, and notifications of system disruptions and potential future changes are essential. If you detect a casual response to this, be wary of what it implies: a casual attitude, or a lack of concern, towards providing you the services.

Verification Utilities - the publisher of the web service could change their custom web service API at any time regardless of SLAs. Programs to detect changing WSDLs as well as fault handling in your programs to detect changing SOAP XML payload structures, along with reporting to a higher authority can initiate discussions about "what have you guys changed now?" rather than wasting time on "why isn't our program working now?"
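Detecting a changed WSDL can be as simple as comparing a fingerprint of the current file against one recorded when you last verified the service. The sketch below hashes a canonical form of the XML so that whitespace-only edits do not raise false alarms. The function names are hypothetical, fetching the WSDL text from the publisher's URL is left to the caller, and `ET.canonicalize` requires Python 3.8 or later.

```python
import hashlib
import xml.etree.ElementTree as ET


def wsdl_fingerprint(wsdl_text: str) -> str:
    """Hash a canonical form of the WSDL (comments and indentation ignored)."""
    canonical = ET.canonicalize(wsdl_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def wsdl_changed(current_wsdl: str, known_fingerprint: str) -> bool:
    """True when the published WSDL no longer matches the recorded fingerprint."""
    return wsdl_fingerprint(current_wsdl) != known_fingerprint
```

Run on a schedule, a check like this turns a silent API change into a notification before your production calls start failing.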

2 comments:

Hemanth said...

As a web service developer I'd like to add some more things on this:
1. As the connection configurations are totally different from the development environment to that of production environment we need to make sure that all the naming conventions should match the prod environment before actually deploying the service to production middleware.

2. The consumer of the service should have a retry queue with a specific retry count so that it does not swamp the server continuously during server downtime, and should send from the queue in batches when the server is back up.

Chris Muir said...

Thanks for the additions Hemanth.

CM.