Friday 30 April 2010

Sprint 6 and progress-to-date review

Sprint 6 has been generally fruitful:

  • Silk Group file sharing is ready for use;
  • Silk Group file sharing online user documentation has been prepared;
  • Shuffl WebDAV storage module has been implemented;
  • Shuffl is now able to run from and save workspace data to an ADMIRAL system;
  • HTTP authentication works well with AJAX calls;
  • JISC 6-month report has been submitted.

The full review is available at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_6.

This review follows a different pattern to the usual end-of-sprint review.  Having a data sharing platform ready for use by researchers was originally estimated to take about 1.5 months, but the work has actually taken about 3 months. The question driving this review is: "Why so long?".

In summary, we don't feel the execution has been greatly lacking (though other experts may disagree and we welcome discussion).  The mismatch between plan and execution seems to be due to oversights in the planning. Some lessons we offer are:

  • We should not assume that using tried-and-tested software means that there is no development risk.
  • We need to create a system that provides the user with a smooth, secure interface. Focusing on the user experience can easily push software components out of their design "comfort zone".
  • Plan for an extended period of "scoping experiments" with any external technology, to be conducted alongside user requirements gathering.
  • Plan to script as much as possible of a system build and configuration: this takes longer to begin with, but saves time in the long term as it facilitates repeating a build with minor variations.
  • Plan to create automated tests for as much as possible. It may seem strange to be creating "unit tests" for accessing a file system, but they have been invaluable for our work in testing access control mechanisms.
  • Plan to build separate systems for testing and live deployment. Automated tests should run with either.
  • When working with a number of users, plan a separate system build and deployment phase for each.
  • We estimate that ongoing user interactions require about half a day per week.
  • Estimated project management overheads (for a small project on the scale of ADMIRAL): 25% for the project manager, and 10% for other team members.
  • We estimate that about 20% of our development time is spent on creating technical documentation through wikis and blogs, but this is generally absorbed in the development effort estimates.

Projects aimed at changing user behaviour, such as ADMIRAL, are critically dependent on a solid technology foundation. Despite not being focused on technology development, we cannot dismiss the engineering effort that is needed to create a smooth, dependable experience for the users.

We estimate that access to in-depth expertise about the software we were using may have saved 25%-50% in some of the areas of time overrun.

Thursday 29 April 2010

ADMIRAL and DropBox (WIN)

From the early days of ADMIRAL's conception, DropBox's user-friendly multi-platform interface was a model that we wished to emulate. Unfortunately the non-local storage of files in DropBox is a showstopper for some of our users.

For users who are not concerned about off-site storage of their data, we have experimented with using DropBox alongside ADMIRAL, with a good degree of success. Our findings are described in the ADMIRAL wiki.

In summary, DropBox can be used to synchronize a remote computer with a user's ADMIRAL filestore area, providing easy-to-use remote access, file recovery and file versioning, together with easy updating from any web browser. This works as an add-on to, and does not affect, the core ADMIRAL capabilities.

Web resource authentication and AJAX (WIN)

We've just run a series of experiments to test how web resource access controls work for AJAX requests. The good news is that (for us, at least, using a Firefox browser and Apache server) they work just the way we want them to.

The test environment used was:

  • Ubuntu 9.10 with Apache 2.2 server with mod_dav, etc., configured with ADMIRAL user accounts
  • Ubuntu 9.10 Firefox 3.5.9
  • Shuffl development code

We copied the Shuffl source code into a WebDAV-enabled area of the file server, modified one of the Shuffl demo applications to register a WebDAV storage handler, and loaded the demo application workspace in a browser.

We were able to:

  • save a modified workspace into a non-access-controlled WebDAV directory without entering any user credentials,
  • save a modified workspace into an access controlled directory on entering appropriate user credentials,
  • access but not modify files in a user's directory when authenticated as the research group leader.

Further, we observed that, when logged in as one user, attempts to access another user's area were refused until correct credentials were provided.

We had anticipated that the AJAX calls might require authentication by the Javascript code if the required authentication cookies were not already established when loading the web page. The plan was to ensure that the application web page would be protected so that user credentials would be required when loading that page. As it turns out, even this simple step is not needed.
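Although the browser handled authentication for us, it may help to see what it is doing on our behalf. A minimal sketch of how an HTTP Basic Authorization header is constructed (the user name and password here are made-up placeholders, not real ADMIRAL credentials):

```python
# Sketch: the HTTP Basic Authorization header the browser sends once
# credentials are established - just the base64 of "user:password".
import base64

def basic_auth_header(user, password):
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}

basic_auth_header("admiral", "secret")
# {'Authorization': 'Basic YWRtaXJhbDpzZWNyZXQ='}
```

Because Firefox attaches this header (or the session cookie) to subsequent AJAX requests automatically, no extra authentication logic was needed in the Javascript code.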

Wednesday 28 April 2010

Shuffl WebDAV test cases all running (WIN)

We have successfully implemented a WebDAV storage module for Shuffl that supports all storage interface functions, including listing the contents of a collection.

This paves the way for a more user-friendly Shuffl interface for loading and saving workspaces.

Tuesday 27 April 2010

WebDAV and Javascript same-origin violations

We've noticed some strange problems using WebDAV to access a server running on the local development machine (i.e. "localhost"). We're using Ajax code running in Firefox to issue the WebDAV HTTP requests, and an Apache 2.2 server running mod_dav, etc., to service them. We're using a combination of FireBug and Wireshark to monitor HTTP traffic.

The immediate symptom we see is that HTTP requests using methods other than GET or POST (specifically DELETE and MKCOL) are being rejected with access-denied errors without being logged by FireBug. But looking at a Wireshark trace, what we see is an HTTP OPTIONS request and response, with no further HTTP exchange for the operation requested.

What appears to be happening is that the HTTP OPTIONS request is being used to perform "pre-flight" checking of a cross-origin resource sharing request, per http://www.w3.org/TR/access-control/, and is being refused by the server's response.

This was puzzling for us in two ways:

  1. That a request to localhost was being rejected in this way, and
  2. The use of the cross-origin sharing protocol, which is performed "under the hood" by recent versions of Firefox.

The rejection of localhost requests is not consistent: on a MacBook, the request is allowed (still using Firefox and Apache), but on some Ubuntu systems it is refused. When the request is refused, the workaround is to send it explicitly to the fully qualified domain name rather than just "localhost". (This is a bit of a pain as it means our test cases are not fully portable, and I'm hoping we can later find an Apache configuration option to allow this level of resource sharing.)

UPDATE

It turns out that the above "observation" is a complete red herring. We had failed to notice that the web page was being loaded using the full domain name of the host, rather than http://localhost/.... When a URI of the form http://localhost/... is used to load the HTML page that invokes the Javascript code, then WebDAV access to localhost works just as expected.
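The underlying same-origin check can be sketched in a few lines of Python: an origin is effectively the (scheme, host, port) triple, so http://localhost/... and the same server reached via its fully qualified domain name count as different origins, even though they are the same machine. (The FQDN below is a made-up placeholder, not a real ADMIRAL host.)

```python
# Sketch of the browser's same-origin check. An "origin" is the
# (scheme, host, port) triple; a page loaded from the FQDN making an
# AJAX request to "localhost" is cross-origin, triggering the
# pre-flight OPTIONS request we saw in the Wireshark trace.
from urllib.parse import urlsplit

def origin(url):
    parts = urlsplit(url)
    default = {"http": 80, "https": 443}[parts.scheme]
    return (parts.scheme, parts.hostname, parts.port or default)

# Same origin: request proceeds directly, no pre-flight.
assert origin("http://localhost/shuffl/demo.html") == origin("http://localhost/webdav/")

# Different host names mean different origins, even on the same machine.
# ("admiral.example.org" is a placeholder FQDN.)
assert origin("http://localhost/webdav/") != origin("http://admiral.example.org/webdav/")
```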

Wireshark

We've been using Wireshark to help debug and understand protocol flows. I've used Wireshark before, and its predecessor Ethereal, but I've been very impressed at how easy recent versions are to install and use (on Linux and MacOS, at least) for high-level software debugging.

The HTTP protocol decode is really useful, and it handles messy details like re-assembling TCP packets so that protocol units are clearly displayed.

Also, it works very well with a local loopback interface, so it's not necessary to figure out arcane filters to exclude background network traffic when debugging a local client/server interaction.

Under Linux, remember that the pcap capture library is also needed. Under recent versions of Ubuntu, it is also necessary to set appropriate capture privileges, using the setcap tool from the libcap2-bin package: see http://wiki.wireshark.org/CaptureSetup/CapturePrivileges.

Listing a directory using WebDAV

I found it surprisingly hard to find this simple recipe on the web, so thought I'd document it here. The difficulty of achieving this with AtomPub has been one factor holding back wider use and further development of Shuffl, so hopefully that will be changing.

To use WebDAV to list the contents of a directory, issue an HTTP request like this:

PROPFIND /webdav/ HTTP/1.1
Host: localhost
Depth: 1

<?xml version="1.0"?>
<a:propfind xmlns:a="DAV:">
  <a:prop><a:resourcetype/></a:prop>
</a:propfind>

Or, using curl, a Linux shell command like this:

curl -i -X PROPFIND http://localhost/webdav/ --upload-file - -H "Depth: 1" <<end
<?xml version="1.0"?>
<a:propfind xmlns:a="DAV:">
<a:prop><a:resourcetype/></a:prop>
</a:propfind>
end
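For completeness, the same request can be built with Python's standard library. Actually sending it (with urlopen) requires a live WebDAV server, so this sketch only constructs the request object:

```python
# Build the PROPFIND request shown above using only the standard library.
# The URL is the same local example as in the curl command.
from urllib.request import Request

body = b"""<?xml version="1.0"?>
<a:propfind xmlns:a="DAV:">
  <a:prop><a:resourcetype/></a:prop>
</a:propfind>
"""

req = Request("http://localhost/webdav/", data=body, method="PROPFIND",
              headers={"Depth": "1", "Content-Type": "text/xml"})
# urlopen(req) would then return a 207 Multi-Status response.
```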

The response is an XML file like this:

HTTP/1.1 207 Multi-Status
Date: Tue, 27 Apr 2010 09:38:30 GMT
Server: Apache/2.2.14 (Unix) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l PHP/5.3.1 mod_perl/2.0.4 Perl/v5.10.1
Content-Length: 706
Content-Type: text/xml; charset="utf-8"

<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:" xmlns:ns0="DAV:">

<D:response xmlns:lp1="DAV:">
  <D:href>/webdav/</D:href>
  <D:propstat>
    <D:prop>
      <lp1:resourcetype><D:collection/></lp1:resourcetype>
    </D:prop>
    <D:status>HTTP/1.1 200 OK</D:status>
  </D:propstat>
</D:response>

<D:response xmlns:lp1="DAV:">
  <D:href>/webdav/README</D:href>
  <D:propstat>
    <D:prop>
      <lp1:resourcetype/>
    </D:prop>
    <D:status>HTTP/1.1 200 OK</D:status>
  </D:propstat>
</D:response>

<D:response xmlns:lp1="DAV:">
  <D:href>/webdav/shuffltest/</D:href>
  <D:propstat>
    <D:prop>
      <lp1:resourcetype><D:collection/></lp1:resourcetype>
    </D:prop>
    <D:status>HTTP/1.1 200 OK</D:status>
  </D:propstat>
</D:response>

</D:multistatus>

There, that was easy!
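To round this off, here is a minimal sketch (Python standard library only) of extracting the directory listing from a 207 Multi-Status response; the sample XML below is a trimmed-down copy of the response above. Each D:response element carries an href, and entries whose resourcetype contains D:collection are sub-directories:

```python
# Parse a WebDAV 207 Multi-Status response into (href, is_directory) pairs.
import xml.etree.ElementTree as ET

multistatus = """<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/webdav/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/webdav/README</D:href>
    <D:propstat>
      <D:prop><D:resourcetype/></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

DAV = "{DAV:}"   # WebDAV's XML namespace really is just "DAV:"

def list_collection(xml_text):
    entries = []
    for resp in ET.fromstring(xml_text).findall(DAV + "response"):
        href = resp.find(DAV + "href").text
        is_dir = resp.find(".//" + DAV + "collection") is not None
        entries.append((href, is_dir))
    return entries

print(list_collection(multistatus))
# [('/webdav/', True), ('/webdav/README', False)]
```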

Tuesday 13 April 2010

Sprint 6 plan

The ADMIRAL sprint 6 plan has been posted at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintPlan_6.

Notes from the planning meeting are at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_PlanMeeting_6.

(See previous post for Sprint 5 review)

Thursday 1 April 2010

Sprint 5 review

We've just reviewed a particularly long sprint.  Notes of the review are at http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_SprintReview_5.

(We chose to run a long sprint with an intermediate review because we feel that having sprints shorter than 2 weeks creates too much planning overhead in our particular circumstances - reviewing and adjusting the plan halfway through the sprint seems to be a workable compromise.)

The primary goal of the sprint was to complete deployment of a live ADMIRAL file sharing service for the Silk Group, based on a new virtual hosting environment. We fell just short of that goal, mainly due to having to introduce new, unanticipated access control mechanisms to handle the group's specific requirements. But in the process of doing this, we seem to have developed a solution to a problem for which the conventional wisdom seems to be "you can't do that": effective access control for files that can be created via HTTP/WebDAV, locally, or via CIFS. The new ingredient that makes this possible is Linux ACLs.

An initial draft security model for ADMIRAL has been created, though will surely benefit from review and revision as the project proceeds.

Maybe the biggest disappointment of this sprint was our failure to convene an intended monthly meeting, due primarily to unavailability of key partners, and possibly also because of not allowing enough lead time for planning.