In a recent gridblade.net blog post (Virtualizing TestNG unit tests storage), I suggested virtualizing the storage of Voyeur Tools as a prerequisite to virtualizing the storage used by its unit tests. In this blog post, I'll share a few more thoughts on that topic.
Voyeur Tools is at its core a suite of text analytics tools that can be applied to a corpus (a bunch of documents). These tools can be classified into four categories:
- corpus indexing tools
- corpus analytics tools
- corpus reporting tools
- corpus update tools
Corpus update tools notwithstanding, Voyeur Tools therefore does two things: (1) store and index incoming documents and (2) provide derived data about the indexed documents through a visually-rich web interface.
The visually-rich web interface is independent from the analytics back-end, save for a basic API consisting of a list of key-value parameters as input data and a JSON data structure as output data. The testing of the web interface is done by Stéfan Sinclair, Voyeur Tools project lead.
Testing Voyeur Tools, from my perspective, thus consists of testing a large piece of software that relies on a set of clearly identified I/O channels with its environment:
- it receives input data as a list of key-value parameters and outputs text serialized into JSON
- it downloads HTTP-addressable data from the web and also reads files from local or distributed filesystems
- it writes and reads index data through a Java API (about a dozen interfaces with read/write methods for specific data types), which makes it straightforward to virtualize its storage
- it also writes and reads files to and from /tmp (which will also soon be virtualized and put behind an interface so that an in-memory temporary storage space can be easily configured)
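To make the last point concrete, here is a minimal sketch of what putting temporary storage behind an interface could look like. The names TempStore, FileTempStore and InMemoryTempStore are hypothetical and are not taken from the Voyeur Tools code base; the actual interface may well look different.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: not part of the actual Voyeur Tools API.
interface TempStore {
    void write(String name, byte[] data) throws IOException;
    byte[] read(String name) throws IOException;
}

// Default implementation backed by a directory such as /tmp.
class FileTempStore implements TempStore {
    private final Path dir;

    FileTempStore(Path dir) {
        this.dir = dir;
    }

    public void write(String name, byte[] data) throws IOException {
        Files.write(dir.resolve(name), data);
    }

    public byte[] read(String name) throws IOException {
        return Files.readAllBytes(dir.resolve(name));
    }
}

// Virtual implementation for unit tests: nothing touches the disk.
class InMemoryTempStore implements TempStore {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

    public void write(String name, byte[] data) {
        // Copy the array so later changes by the caller don't leak in.
        entries.put(name, data.clone());
    }

    public byte[] read(String name) throws IOException {
        byte[] data = entries.get(name);
        if (data == null) {
            throw new IOException("No such entry: " + name);
        }
        return data.clone();
    }
}
```

Code that only sees TempStore doesn't need to know which implementation it was given, so a test configuration can swap in the in-memory one.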
In order to allow their virtualization, the unit tests of Voyeur Tools involving outside I/O channels fetch files from JAR resources and download web data from a test web server running on localhost. Both of these concerns (reading local data files, downloading web data) could also be put behind an interface and thus virtualized. The default implementation could rely on the classic java.io and java.net APIs and the virtual implementation could rely on JAR resources and a locally-hosted test web server.
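As a rough illustration of that idea, the sketch below hides "open this location" behind a single interface. DocumentSource, UrlDocumentSource, ClasspathDocumentSource and DocumentReader are hypothetical names chosen for this example, not existing Voyeur Tools classes.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical interface; the names here are illustrative only.
interface DocumentSource {
    InputStream open(String location) throws IOException;
}

// Default implementation: resolves absolute file: and http: URIs via java.net.
class UrlDocumentSource implements DocumentSource {
    public InputStream open(String location) throws IOException {
        return URI.create(location).toURL().openStream();
    }
}

// Virtual implementation: serves fixtures bundled as JAR/classpath resources.
class ClasspathDocumentSource implements DocumentSource {
    public InputStream open(String location) throws IOException {
        InputStream in = ClasspathDocumentSource.class.getResourceAsStream(location);
        if (in == null) {
            throw new FileNotFoundException(location);
        }
        return in;
    }
}

// Code under test depends only on the interface.
class DocumentReader {
    static String readText(DocumentSource source, String location) throws IOException {
        try (InputStream in = source.open(location)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Because DocumentSource has a single method, a test can also pass a lambda that returns canned bytes, which avoids both the filesystem and the network.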
Coming back to the Google Testing Blog post that started my reflections and led to this blog post, the next step is to investigate how Voyeur Tools could be fully tested in the cloud.
There are many benefits to running unit tests in the cloud, among them scaling out (and thus speeding up) unit tests even more than by relying on multithreading, as well as facilitating the use of a continuous integration server.
I'll restrict this discussion to Java-based PaaS clouds (e.g. Google App Engine or AppScale) in order to control the whole testing process from within the JVM. IaaS clouds (e.g. Amazon Web Services) are, here, too broad in scope, as they would require extra work to configure the testing process from outside of the JVM.
App Engine (Java flavor) offers a servlet container, but with a more restricted API than what is available from a typical JVM, along with a completely different storage model and a completely different multithreading model. Assuming that unit tests are run using in-memory storage, and assuming that the outside I/O channels of Voyeur Tools are also virtualized in the near future, the only feature that I foresee requiring extra work is multithreading. Indeed, App Engine servlets can't launch new threads but instead must rely on the Task Queue API.
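One way to prepare for this, sketched below under my own assumptions, is to have task-submitting code depend on the standard java.util.concurrent.Executor interface rather than on threads directly. CorpusJobs and countTokens are hypothetical names invented for this example. The sketch only shows the in-process side; it waits on a latch, which assumes tasks complete within the same JVM, so a Task Queue backed implementation would need a different completion mechanism and is deliberately not shown.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not part of Voyeur Tools.
class CorpusJobs {
    // Counts whitespace-separated tokens, submitting one task per document.
    // How the tasks actually run is decided by the Executor passed in.
    static int countTokens(List<String> documents, Executor executor)
            throws InterruptedException {
        AtomicInteger total = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(documents.size());
        for (String doc : documents) {
            executor.execute(() -> {
                try {
                    String trimmed = doc.trim();
                    if (!trimmed.isEmpty()) {
                        total.addAndGet(trimmed.split("\\s+").length);
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        // Assumes in-process execution: blocks until every task has finished.
        done.await();
        return total.get();
    }
}
```

On a regular JVM the caller can pass a thread pool from Executors.newFixedThreadPool; where threads can't be created, passing Runnable::run executes each task on the calling thread with no change to countTokens.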
To wrap up, I'll just add that I'm very enthusiastic about investigating in the next few months how Voyeur Tools could be fully tested, from the bottom up, upside down, in the cloud. This opens the way to architectural improvements and day-to-day productivity benefits, and gives me a chance to learn a lot more about software testing. :)
UPDATE (WEDNESDAY, MAY 11, 2011)
With the release of App Engine 1.5.0 yesterday, coincidentally one day after the publication of this blog post, the Pull Queues API brings the multithreaded task processing model of App Engine one step closer to the standard JVM API (java.util.concurrent). This is great news, as it definitely facilitates the virtualization of multithreaded task processing in Voyeur Tools.
