
ArchivedITReview


AlessandroNotes

A great experience. IMHO it was about time to have such a cross-domain/religion/philosophy meeting on the IT developments within the seismological community, supported by an extremely up-to-date group of experts with different backgrounds and experience. So, first of all, thanks to the GEM group for making it happen!

The Requirements

One of the topics on which the reviewers reached a general consensus is the need to achieve an important target: the definition of use cases and functional requirements. This point is crucial.

Based on my personal experience as a software developer on big projects whose customer is a scientific community, we are quite often asked to play the role of the user, the coder, the designer, the architect, the tester and, last but not least, the sales manager.
We have to figure out whether there's a need or a problem, solve it, and then sell either the problem or the solution :-).

There's an interesting article about scientific portals, or VREs (Virtual Research Environments), which says:


“The development and presentation of a VRE must be embedded and owned by the communities served and cannot realistically be developed for the research communities by others in isolation. Since the intention is to improve the research process and not simply to pilot technologies for their own sake, the research must drive the requirements”

“A VRE which stands isolated from existing infrastructure and the research way of life will not be a research environment but probably only another underused Web portal”

Michael Fraser Co-ordinator, Research Technologies Service, University of Oxford
http://www.ariadne.ac.uk/issue44/fraser/

GEM's approach should involve the user community as much as possible in the process of defining the requirements. This will help the developers focus on the right solution, adopting the techniques that best fit their needs and their skills. Moreover, in light of the extremely useful tips and guidelines proposed by the reviewers, the skills that are currently lacking can be improved quickly if an accurate analysis of the problem and an understanding of the available technologies motivate a change of direction.

GEM1

The design of the GEM1 project shows extreme modularization and independence of each piece of the architecture, from the database design to the service and presentation layers. A database expert might not be as good at front-end or UI development, so “sometimes” the decoupling is a good thing. This obviously required gaining deep knowledge of a wide and sometimes complex stack of technologies, standards and formats. The question is: does this lead to over-engineering of the project? Is over-engineering always bad? IMHO, the lack of clear requirements leads to over-engineered software.

In general, for all the Java developments I'd suggest considering the adoption of the Spring framework, which helps keep the code clean, modular, testable and lightweight thanks to its heavy use of the IoC and singleton patterns. It provides the glue to integrate several small components into a bigger architecture, keeping the small things small and testable. Unit testing with Spring and the available IDEs works just great.
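As a minimal sketch of what that IoC style buys (the class names here are hypothetical, not taken from the GEM1 codebase): a collaborator injected through the constructor, and a plain JUnit test that swaps in a stub instead of a real database.

```java
// Hypothetical sketch, not GEM1 code: constructor injection in the IoC style
// Spring encourages, plus a plain JUnit 4 test using a hand-written stub.
import org.junit.Assert;
import org.junit.Test;

interface HazardCurveRepository {
    double[] findCurve(String siteId);
}

class HazardReportService {
    private final HazardCurveRepository repository;

    // The collaborator is injected by the container (e.g. a Spring
    // <constructor-arg> or @Autowired constructor), so the class holds
    // no lookup code and can be tested in isolation.
    HazardReportService(HazardCurveRepository repository) {
        this.repository = repository;
    }

    double maxValue(String siteId) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : repository.findCurve(siteId)) {
            max = Math.max(max, v);
        }
        return max;
    }
}

public class HazardReportServiceTest {
    @Test
    public void returnsTheLargestOrdinate() {
        // Stub the repository instead of hitting a real database.
        HazardCurveRepository stub = new HazardCurveRepository() {
            public double[] findCurve(String siteId) {
                return new double[] {0.1, 0.5, 0.3};
            }
        };
        Assert.assertEquals(0.5, new HazardReportService(stub).maxValue("site-1"), 1e-9);
    }
}
```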

The webservice approach

Given the use cases, there's a need to understand which of them have to be implemented as web services (in the general sense of a programming API available on the web) and which, on the other hand, do not need to be web services.
For instance, in a browser-based product most of the user interaction could be implemented through a normal MVC pattern where the front end is “directly” connected to the database, or rather to the objects stored in it. Would it be possible and worthwhile to model every use case as an aggregation of REST web services? Probably, if you start a project from scratch, that's the way to go.
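A minimal, hypothetical sketch of one use case exposed as a REST resource with Spring MVC annotations (the path and payload are invented for illustration, not a proposed GEM API):

```java
// Hypothetical sketch of a REST-style resource using Spring MVC annotations.
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class HazardMapController {

    // GET /hazardmaps/{id} returns a representation of one hazard map; the
    // controller talks to the persistence layer the same way an MVC page
    // controller would, so the same use case serves both browser and API clients.
    @RequestMapping(value = "/hazardmaps/{id}", method = RequestMethod.GET)
    @ResponseBody
    public String getHazardMap(@PathVariable("id") String id) {
        // In a real implementation this would be looked up from the
        // database and serialized (e.g. to JSON or shaML).
        return "{\"id\": \"" + id + "\", \"status\": \"available\"}";
    }
}
```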

Different considerations apply to asynchronous calls to processing facilities. It's advisable to use a queue-based system, where the fire-and-forget approach comes by definition and the queue can retry a number of times if delivery fails, submitting a certain number of jobs in parallel or in a chain. Web service communication over HTTP implies that the requester needs a response back. Do you want to model such a protocol from scratch? Is there anything already available?
A combination of the two might require a better understanding of the OGC WPS specs. This link provides some interesting thoughts:
http://www.cadmaps.com/gisblog/?p=28
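To make the queue-based, fire-and-forget idea above concrete, here is a minimal sketch in plain java.util.concurrent; a real deployment would rely on an existing message queue or job system (possibly fronted by WPS), not hand-rolled code like this.

```java
// Minimal, hypothetical sketch: callers enqueue jobs and return immediately;
// a worker drains the queue and retries a fixed number of times on failure.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ProcessingQueue {

    interface Job {
        void run() throws Exception; // e.g. invoke the hazard engine
    }

    private final BlockingQueue<Job> queue = new LinkedBlockingQueue<Job>();
    private final int maxAttempts = 3;

    // Fire and forget: submission does not wait for the job to complete.
    public void submit(Job job) {
        queue.add(job);
    }

    // A worker thread drains the queue and retries failed deliveries.
    public void workerLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            Job job = queue.take();
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    job.run();
                    break;
                } catch (Exception failed) {
                    if (attempt == maxAttempts) {
                        // Give up; a real system would move the job to a
                        // dead-letter queue or notify an operator.
                        System.err.println("Job failed after " + maxAttempts + " attempts");
                    }
                }
            }
        }
    }
}
```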

To Portal or Not To Portal

JSR portlets are complex stuff if you don't rely on the proper framework, but the general concepts behind a JSR portal, such as personalization, component-based development and the reuse of tools, are spreading among several web frameworks implemented with other technologies as well, and this is worth investigating further.
On the other hand, several successful projects and teams in e-science adopt the JSR-168 solution, suggesting that this approach might enable a better integration of external expertise on scientific portal development.

Portals provide the ability to aggregate access to applications. In many development environments or collaborations, these applications are owned and maintained by disparate groups, where the coordination of release schedules can be difficult or impossible. In such a scenario, WSRP has the advantage of allowing a portlet or group of portlets to be released independently of the main portal application. This, together with the opportunity for cross-domain collaborations, was a fundamental feature within the NERIES project.
Open source projects like Jetspeed provide a very lightweight stack, while Liferay is moving towards a cross-platform solution, which is extremely interesting, besides the many new social features provided off the shelf.

If the motivations mentioned above are not an issue for GEM, it could be helpful to consider a framework such as GeoNode, where many interactive features related to collaborative manipulation of GIS products are implemented out of the box, also in terms of future collaboration and contributions to the risk assessment web development community.
I'd suggest, though, waiting for a stable release rather than relying on the existing beta.

Interoperable Metadata

To what extent should data products be made publicly available on the web, and in which format?
Investigating proper metadata and publication philosophies such as Linked Data (http://linkeddata.org/) might lead to cross-domain interoperability and discovery of public data products.
This is also related to some concerns expressed by Fabian during the QuakeML talk.

Being Social within and beyond GEM

Datasets, data products, discussions, GEM portal activities: it might be useful to push all this meaningful information outside the fences of the GEM infrastructure, in order to make interesting results easy to access and gain wider visibility within the domain community and beyond.

For this purpose, consider, besides the implementation of a professional network within the GEM portal, the adoption of well-established Web 2.0 platforms and tools widely used by millions of users, such as iGoogle and Twitter.

AndreasNotes

Data Storage

Well established file formats for spatial data:

  • vector: Shapefiles
  • raster (generic): GeoTIFF
  • raster (hazard related): AME

Developer Community

After talking with some of the team members, it seems that most of them are not expecting to attract developers.

One important thing to consider here is: setting up a FOSS style software development process is a win no matter what. And once this process is in place, accepting external contributions is not a burden, but a help.

FrankNotes

Gridded Data Management

  • Keep gridded data products outside the database (ie. hazard maps, hazard curves).
  • Keep gridded data products as managed raster files, with references by filename in the database (see the sketch after this list).
  • Support accessing gridded data products by WCS, including implementation of shaML support in one or both of GeoServer and MapServer, and clients like GDAL.
  • WCS “references” could be the primary means by which gridded data products are used from remote locations when copying is discouraged.
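A minimal sketch of the filename-reference idea above: the raster itself stays on disk as a managed file, while the database row carries the path plus enough metadata to build a WCS reference. The table, columns and connection string are hypothetical.

```java
// Hypothetical sketch: registering a hazard map raster by reference.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class HazardMapCatalog {

    public void register(String mapId, String geotiffPath, String wcsCoverageId)
            throws SQLException {
        Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/gem");
        try {
            PreparedStatement stmt = conn.prepareStatement(
                "INSERT INTO hazard_map (map_id, raster_path, wcs_coverage) VALUES (?, ?, ?)");
            stmt.setString(1, mapId);
            stmt.setString(2, geotiffPath);   // e.g. a managed GeoTIFF on disk
            stmt.setString(3, wcsCoverageId); // clients fetch pixels through WCS, not the DB
            stmt.executeUpdate();
            stmt.close();
        } finally {
            conn.close();
        }
    }
}
```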

Processing Engine

The existing risk processing engine does not seem to address some concerns I have:

  • Approaches to distributing the work over a cluster. Try to avoid complexity, or deep ties into specific clustering technology.
  • Look into an engine capability to split up large product calculations into chunks (ie. break a global calculation into smaller tiles).
  • How to integrate existing processing algorithms (possibly like OpenSHA) that do not work at the same fine grained level as the processing engine. For instance, some algorithms may not be easily broken down into stackable filters, and may not support the virtualized access to input data.
  • I think they need to have a distinct configuration file input format to drive the processing engine(s), so that the processing engine is quite distinct from the web services. The configuration file (referencing other input files) becomes the definition of a processing run (a minimal sketch follows this list).
  • For local processing what they should distribute is the engine + modest tools to prepare the “run” configuration file.
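A minimal sketch of such a run configuration, read by the engine with no web service involved; the property keys are invented for illustration only.

```java
// Hypothetical sketch: a "run" configuration file, entirely separate from the
// web service layer, references the other inputs and so defines a processing run.
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class RunConfiguration {

    private final Properties props = new Properties();

    public RunConfiguration(String path) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            props.load(in);
        } finally {
            in.close();
        }
    }

    // Example keys (invented): source.model = /data/inputs/source_model.xml
    //                          region.bbox  = -20,30,45,60
    //                          tiles        = 16
    public String sourceModelPath()   { return props.getProperty("source.model"); }
    public String regionBoundingBox() { return props.getProperty("region.bbox"); }
    public int tileCount()            { return Integer.parseInt(props.getProperty("tiles", "1")); }
}
```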

Portal, Web Services

Not my area of specialty, but I am doubtful about the use of SOAP/WSDL for the web services. It is a heavy approach which is clumsy for clients. I would contemplate a lighter weight ReST approach for the web services instead of the SOAP/WSDL approach.

Further, I would consider a portal development approach that is more organized around JavaScript client technology (as was done in the Pavia prototyping) built against the ReST API for web services rather than the Java Portlet approach.

I do think that an effort should be made to avoid passing large objects (like whole hazard maps, complex logic trees, etc.) between web services. Instead, identified large objects should be referenced, possibly from the database or from files on disk.
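A sketch of what passing a reference instead of the object itself could look like; the class and field names are hypothetical.

```java
// Hypothetical sketch: a small descriptor that points at the stored object
// (database id plus a WCS/HTTP or file URI) instead of carrying the data.
public class HazardMapReference {
    private final String mapId;   // primary key in the GEM DB
    private final String dataUri; // e.g. a WCS GetCoverage or file URI

    public HazardMapReference(String mapId, String dataUri) {
        this.mapId = mapId;
        this.dataUri = dataUri;
    }

    public String getMapId() { return mapId; }

    // Consumers dereference the URI only if and when they need the pixels.
    public String getDataUri() { return dataUri; }
}
```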

Data Formats

  • Spend time defining and documenting the file formats to be used for import/export of data with the GEM system, including the formats used as input to the processing engine(s).
  • Utilize pixel-interleaved GeoTIFF with specific metadata extensions (possibly parts of shaml in a tag) as a working, interchange and archive format for hazard curves and hazard maps. Such a format is compact (binary), efficiently accessible, and already supported by many existing software packages (see the sketch after this list). I can assist.
  • Use of shaml as-is for processing inputs seems ok.
  • Try to avoid creating specific formats where a simple/specific profile of an existing format (like GML) would do (for faults, etc)
  • Put some effort into identifying existing adhoc tools for preparing system inputs, and visualizing system output products.
  • Put some effort into developing additional adhoc tools for working with the data formats, possibly including development of GDAL/OGR drivers (for the C/C++ stack), and GeoTools format handlers (for the Java stack).
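As an illustration of how well supported GeoTIFF already is on the Java side, a small hedged sketch using the GeoTools GeoTiffReader (exact API details may vary between GeoTools versions; the file name is invented):

```java
// Hedged sketch: reading a hazard-map GeoTIFF as a coverage with GeoTools.
import java.io.File;
import org.geotools.coverage.grid.GridCoverage2D;
import org.geotools.gce.geotiff.GeoTiffReader;

public class ReadHazardMap {
    public static void main(String[] args) throws Exception {
        // Path is illustrative; any pixel-interleaved GeoTIFF would do.
        GeoTiffReader reader = new GeoTiffReader(new File("global_pga_10pc50yr.tif"));
        GridCoverage2D coverage = reader.read(null);
        System.out.println("Coverage envelope: " + coverage.getEnvelope());
    }
}
```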

Building Modelling

Future work around capturing information on building characteristics globally was not discussed in any depth during the IT review, but I believe there needs to be some thought applied to how the information is stored, accessed, and managed.

  • Consider developing a simple GML profile for building models (see the sketch after this list).
  • Consider storing them in a distinct building model database, lest the volume of building models eventually collected overwhelms the general-purpose GEM database.
  • Consider offering WFS access for read and update to the building model database.
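Purely as an illustration of the kind of attributes a simple building-model profile might carry (this uses JAXB and invented element names as a stand-in, not GML and not a proposed schema):

```java
// Hypothetical sketch: a minimal building record serialized to XML with JAXB,
// only to show the sort of attributes a building-model profile might hold.
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "building")
public class Building {

    @XmlElement public String id;
    @XmlElement public double longitude;
    @XmlElement public double latitude;
    @XmlElement public String constructionType; // e.g. reinforced concrete frame
    @XmlElement public int storeys;

    public static void main(String[] args) throws Exception {
        Building b = new Building();
        b.id = "bld-0001";
        b.longitude = 9.16;
        b.latitude = 45.47;
        b.constructionType = "RC frame";
        b.storeys = 5;
        // Marshal to XML on stdout; a real profile would use GML geometry.
        JAXBContext.newInstance(Building.class).createMarshaller().marshal(b, System.out);
    }
}
```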

Strong Points

  • For the most part the normalized database structure seems sensible (bulky data notwithstanding).
  • The LDAP auth architecture with users/groups seems good.
  • The development of shaML as a core working format for interchange, and data input/output from the calculation engine seems good.
  • It looks like excellent work was done building on OpenSHA.

Auxiliary Points

  • LGPL is an ok license for the developed code, though a non-reciprocal license (like BSD, MIT, etc) would allow folks like insurance companies to make proprietary improvements to the modelling code without an obligation to release them.
  • SVN is adequate for source control, though a distributed VCS like Git has some minor advantages.
  • I don’t see any compelling reason to move development from Java to Python with the possible exception of adopting a technology like Django. In any event, having the processing engine in Java is fine.

MapServer vs. GeoServer

  • Both are ok, so it would likely be best to let the team pick whichever seems like the best fit.
  • Be prepared to invest some effort back in support for GEM oriented file formats for whichever server technology is selected.
  • Note that GeoServer is well suited to web-based feature update via WFS-T and a client like OpenLayers. MapServer does not support WFS-T (update via WFS).
  • It is possible it will make sense to deploy both GeoServer and MapServer for particular purposes.
  • Ensure that service deployment is based on the product definitions within the GEM DB, rather than dumping all the data to disk in duplicated forms. For instance, via some appropriate wrappers it should be possible to access any hazard map in the GEM DB without having to constantly dump the GEM DB map list out to some particular file format (MapServer .map, or GeoServer configuration file). In MapServer this would normally be accomplished with dynamic map technology, using MapServer to look up details in the GEM DB. Some similar mechanism no doubt exists for GeoServer.

Open Source Contribution

  • It is unlikely that there will be a great deal of outside contribution to the core DB and webservices of OpenGEM.
  • There might be some contribution of portlets for the web site.
  • There will almost certainly be some contribution of adhoc tools for preparing, visualizing, translating and managing the various inputs and outputs of GEM. Some should be “captured” by GEM, while others will exist in other homes (ie. GDAL/OGR) and should just be referenced as available resources.
  • Likely there will be scientists wishing to develop experimental/local variations on the modelling code and configurations. These cannot be upstreamed without great care (to avoid invalidating the global modelling code), but it would be nice to be able to share them effectively. Use of a distributed source control system like Mercurial or Git might be helpful in this regard.
  • Some contributions will come as improvements to packages like GeoServer, and GDAL used by GEM.