As promised, here is the blog post about the technologies we are going to be playing with to build our data repository. When we decided to pursue developing an institutional data repository, we evaluated different pieces of software, weighing variables like the maturity of the system, the presence and type of community behind it, its flexibility for handling different object types, and general future-proofedness. There isn’t much of a dramatic pause for me to insert here, as we’ve already written in previous posts that the outcome of this analysis was going with Hydra.
But what is Hydra? Hydra isn’t a single thing - an out of the box solution (though the community around it has set this as a future goal) - rather it’s a framework of different pieces of software that come together to create an institutional repository. A Hydra installation can be used as a single interface to many different repositories, if we wanted to expand beyond the current scope of research data. Hydra is based on Fedora, the repository platform from DuraSpace, a nonprofit that supports a number of open source technologies related to digital assets (like DSpace and VIVO). Fedora is shorthand for Flexible Extensible Digital Object Repository Architecture and, as its long-form name implies, Fedora is a digital asset management system capable of handling content regardless of type (GIS, A/V, images, text, data, etc.). Of note, DuraSpace recently released Fedora 4, which has some significant changes from Fedora 3, including being happier about ingesting larger files and providing an RDF representation of content and relationships by default. The Hydra community is energetically working away at getting all of the pieces of the Hydra environment to play nicely with Fedora 4, and has advised new adopters of Hydra to plan on using Fedora 4 from the get-go, rather than create a situation that requires migration at a later date. So, we’ve had a bit of good luck here on our timing for jumping in!
So, Fedora is in charge of managing the objects; the other core components of a Hydra build are Solr and Blacklight. Solr is an open source search platform from Apache that indexes the repository content. Blacklight is the discovery interface that plugs into Solr and provides features like (customizable) faceted browsing, exporting results and saving search history. Now, those are just the core technologies; there are many other packages of code (referred to as gems in the world of Ruby - the programming language behind Hydra) necessary to get an instance of Hydra up and running. The community has developed several different flavors of Hydra that package this framework of technologies into deployable web applications (technically, Rails engines). The one we’ve elected to go with is Sufia.
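To make the Solr and Blacklight relationship a little more concrete, here is a minimal sketch, in Ruby, of the kind of request a discovery layer might send to Solr when it wants search results plus facet counts. This is not Blacklight’s actual code, just an illustration: the host, the core name and the facet field names are made up, and only the query parameters (`q`, `rows`, `wt`, `facet`, `facet.field`) are standard Solr ones.

```ruby
require "uri"

# Build the URL for a Solr "select" request that asks for matching
# documents along with facet counts for the given fields.
# base_url and the field names are hypothetical; a real deployment
# would use whatever core and fields its own index defines.
def solr_facet_url(base_url, query, facet_fields, rows: 10)
  params = [
    ["q", query],          # the user's search terms
    ["rows", rows.to_s],   # how many documents to return
    ["wt", "json"],        # ask Solr for a JSON response
    ["facet", "true"]      # turn faceting on
  ]
  # Solr accepts one facet.field parameter per field to facet on.
  facet_fields.each { |field| params << ["facet.field", field] }

  "#{base_url}/select?#{URI.encode_www_form(params)}"
end

# Hypothetical core and facet fields, for illustration only.
puts solr_facet_url(
  "http://localhost:8983/solr/hypothetical_core",
  "climate data",
  ["resource_type_facet", "creator_facet"]
)
```

The facet counts that come back from a request like this are what a discovery interface turns into the clickable "narrow your results" lists that faceted browsing refers to.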
We’ve been working on use cases for our repository, and our next steps are to define project phases with realistic timelines and to set milestones for each of these phases.