Straw Man Interoperability Charter Discussion Document
Straw Man Interoperability Charter
Introduction
Under the RDN/LTSN Interoperability Project, subject centres and RDN Hubs have agreed to exchange
metadata records via the OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting).
Technically the interchange of records is fairly straightforward given the simplicity of the
OAI-PMH and the availability of software tools to do this. However, record interchange has a number
of ramifications in the areas of rights management, collection management and the social aspects
of how to reach agreement about acceptable use of harvested data and when and with whom that data
might be shared further. Common policies also need to be agreed amongst cluster partners in the areas of
record amendments, display, indexing and searching, and usage tracking.
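To illustrate how little is needed on the technical side, the following is a minimal Python sketch
that builds an OAI-PMH ListRecords request URL. The function name list_records_url and the base URL
used in the usage note are assumptions made for illustration; verb, metadataPrefix, from and set are
standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    if set_spec:
        # Optional set, if the data provider organises records into sets.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)
```

A consumer might call list_records_url("http://provider.example.org/oai", from_date="2004-01-01")
on each harvesting run, passing the date of its previous harvest.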
The following is intended as a discussion document outlining a set of possible common policies
that could be gathered together as an Interoperability Charter to be agreed upon and signed up to
by all the members of a cluster.
1. Amendment of harvested data
- No record is to be amended except by the original data provider
- In order to achieve a commonality of cataloguing standards a forum for change requests might
be implemented to allow those requests and other queries to be sent between data providers and
consumers in a mediated way. This could be achieved using an email request ticketing system such as
RequestTracker
2. Crosswalking and changing records
- If a record is to be crosswalked to an element set with a lower specificity than that used in
the original record (e.g. from LOM to DC Simple), the original XML record will be archived and
only that archived copy will be exposed in any future outgoing secondary feeds
- Crosswalking to a lower specificity constitutes an amendment to the record, and if anyone
other than the originator changes a record then it becomes a new record
- You might wish to crosswalk LOM records to DC Simple for loading into a DC logical data structure
or for indexing purposes. You should not, however, re-expose these records in subsequent harvests
with the same oai-identifier
- Due to the problems associated with re-exposing records for secondary harvesting you may wish
to explicitly ban this within your charter
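The policy in this section could be sketched in Python as follows. This is an illustration under
stated assumptions, not a proposed implementation: the flattened LOM field names, the in-memory
archive dictionary, the consumer identifier prefix and both function names are hypothetical.

```python
# Illustrative only: field names, identifier scheme and archive are assumptions.

def crosswalk_lom_to_dc(oai_identifier, lom_fields, original_xml, archive,
                        local_prefix="oai:consumer.example.org:"):
    """Archive the original record, then return a NEW DC Simple record."""
    # Keep the untouched XML so secondary feeds can expose the original.
    archive[oai_identifier] = original_xml
    dc = {
        "title": lom_fields.get("general.title"),
        "description": lom_fields.get("general.description"),
        "subject": list(lom_fields.get("general.keyword", [])),
    }
    # A crosswalked record counts as a new record, so it is given a new
    # local identifier and never reuses the originator's oai-identifier.
    local_id = local_prefix + oai_identifier.rsplit(":", 1)[-1]
    return {"identifier": local_id, "source": oai_identifier, "metadata": dc}


def record_for_secondary_feed(oai_identifier, archive):
    """Only the archived original is exposed under the original identifier."""
    return archive[oai_identifier]
```

The crosswalked record keeps a pointer to its source so that attribution is not lost, while the
archived XML remains the only version served under the originator's identifier.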
3. Meta-Metadata Information
- metaMetadata should be used to provide a per record declaration of the attribution of the record
- metaMetadata information should include record creation and last modification time stamps
(and possibly details of the creator, editor or a list email address to send change requests and
other queries to)
- It should be investigated whether parsing this information out of the header and about
sections of OAI-PMH records would suffice
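As a starting point for that investigation, the following Python sketch shows what can be read from
a harvested OAI-PMH record using only the standard library. It assumes the data provider supplies
the optional provenance container described in the OAI-PMH implementation guidelines; the function
name record_attribution and the returned dictionary keys are hypothetical. Note that the header
datestamp only gives the last change to the record, not its creation time or its creator.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
PROV = "{http://www.openarchives.org/OAI/2.0/provenance}"


def record_attribution(record_xml):
    """Extract identifier, datestamp and (if present) provenance details."""
    record = ET.fromstring(record_xml)
    header = record.find(OAI + "header")
    info = {
        "identifier": header.findtext(OAI + "identifier"),
        "datestamp": header.findtext(OAI + "datestamp"),
        "origin_base_url": None,
        "harvest_date": None,
    }
    # The about container is optional, so every step may be missing.
    about = record.find(OAI + "about")
    provenance = about.find(PROV + "provenance") if about is not None else None
    origin = provenance.find(PROV + "originDescription") if provenance is not None else None
    if origin is not None:
        info["origin_base_url"] = origin.findtext(PROV + "baseURL")
        info["harvest_date"] = origin.get("harvestDate")
    return info
```

If the fields returned here turn out to be insufficient (for example, no creator or contact
address), that would argue for carrying explicit metaMetadata in the record itself.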
4. Displaying search results
- Search results for harvested records will be presented in a common summary format to be agreed as
part of this charter
- A link will be provided for each record as well as a branding logo of common size and format,
the link taking the user off to the source site for the full view of the record
- A warning that the user is leaving one site for another should be included, with the option
to disable this feature
5. Usage tracking
- The link will send the user via a logging Web Service / proxy to track usage of each record, in
order to ensure data providers can produce all the usage statistics required by their funders
- What information is gathered by this facility is to be agreed as part of this charter
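The logging step could look something like the Python sketch below. It is a minimal illustration
only: the function name log_and_redirect, the in-memory usage_log list and the fields recorded are
all assumptions, since what is gathered remains to be agreed as part of the charter.

```python
from datetime import datetime, timezone


def log_and_redirect(record_id, target_url, usage_log, now=None):
    """Record one click-through, then hand back an HTTP redirect."""
    when = now or datetime.now(timezone.utc)
    usage_log.append({
        "record": record_id,        # which harvested record was followed
        "target": target_url,       # the full view on the source site
        "time": when.isoformat(),   # when the click-through happened
    })
    # 302 sends the user on to the data provider's own site.
    return 302, {"Location": target_url}
```

A real service would write to persistent storage and hand each data provider the log entries for
its own records.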
6. Controlled vocabularies and indexing
- In order to ensure consistency of indexing, full use should be made
of common controlled vocabularies to be developed over time as part of
the interoperability effort
- A common thesaurus of equivalent or broader preferred terms could aid normalisation of subject
descriptors and names for indexing and summary display searching where centres are using
different vocabularies. Crosswalk mappings are to be agreed by the whole
cluster of data providers.
- A Web Service could be built on top of this to allow normalisation of subject terms within
harvested records prior to indexing, to improve consistency and searching accuracy (as this
means that records are changed, see the requirement above about archiving the original record
for secondary harvesting)
- The last two points give data providers the option to either use the preferred terms from the
common thesaurus, or to continue using their own subject headings, whilst ensuring indexes
are normalised for searching purposes.
- Following this model further, we could also allow other normalisation processing prior to
indexing, to fix name citations to a common syntax, title article ordering etc.
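The normalisation steps described in this section might be sketched in Python as follows. The
thesaurus entries, the three function names and the "Family, Given" and "Title, The" conventions
are assumptions chosen for illustration; the real mappings and syntax would be those agreed by the
cluster.

```python
# Hypothetical thesaurus: local term (lower case) -> agreed preferred term.
THESAURUS = {
    "maths": "Mathematics",
    "mathematics": "Mathematics",
    "calculus": "Mathematics",  # mapped to a broader preferred term
}


def normalise_subject(term, thesaurus):
    """Return the preferred term, or the cleaned original if unmapped."""
    cleaned = term.strip()
    return thesaurus.get(cleaned.lower(), cleaned)


def normalise_name(name):
    """Rewrite 'Given Family' as 'Family, Given'; leave other forms alone."""
    name = " ".join(name.split())
    if "," in name or " " not in name:
        return name
    given, family = name.rsplit(" ", 1)
    return f"{family}, {given}"


def title_sort_key(title):
    """Move a leading English article to the end for index ordering."""
    words = title.strip().split(None, 1)
    if len(words) == 2 and words[0].lower() in ("a", "an", "the"):
        return f"{words[1]}, {words[0]}"
    return title.strip()
```

Because these functions only produce index keys, the harvested record itself can be left
untouched, which avoids the amendment issues discussed in the earlier sections.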
7. Display of second and third party reviews
- If second and third party reviews or other value added information is associated with harvested
records at consumer sites then agreement should be reached about how the original resource and
the extra data can be viewed synoptically while retaining clear attribution and providing the
opportunity to see the full record in its original context on the source site
- Base level agreement should be the summary view as defined above with suitably attributed extra
data on a single page (including links, logos etc. as above for search results).
8. Rights management
- Creative Commons rights metadata for each record should be included in each feed and stored in the
backend database on a per record basis, following guidelines emerging from the OAI-rights effort
9. Collection level descriptions and collection management policies
- Collection level descriptions should be produced for each data provider's repository and stored
and indexed centrally (at the Hub?). These can be linked to from the summary view of each record to
further augment attribution statements. Collection management policy statements for each collection
can also be associated in the same way
- The RSLP CLD schema should be considered for the above, but in light of the terms of reference
outlined in the CHEMS report.
- Online forms for creation of both the above will be made available
10. Mediating record updates
- As records can only be amended by the originator, an email ticketing system should be used to
provide a mechanism for mediation of amendment requests. Requests for changes to a record will be sent
to the system via email, and the system will log and track all incoming requests. Each request
ticket can then be claimed by the originator (i.e. a cataloguer at the centre or Hub where the
record was originally created). This can be done over the Web. The cataloguer can then make the
necessary amendments and annotate the ticket accordingly. This means a detailed change history can
be created and that both the cataloguer and the person requesting the change are kept informed of
progress.
- In order for the above to work effectively, metaMetadata needs to be assigned to each record
- This mediation allows every member of the cluster to participate in ensuring overall data
quality and integrity
- The RequestTracker ticketing system is recommended
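The request lifecycle described in this section can be modelled roughly as below. This Python
sketch is not the RequestTracker API; the ChangeRequest class, its fields and its status names are
hypothetical and only illustrate the claim, amend and annotate steps and the resulting change
history.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    record_id: str           # oai-identifier of the record to amend
    requester: str           # email address of the person asking
    text: str                # the requested change
    status: str = "new"
    owner: str = ""
    history: list = field(default_factory=list)

    def claim(self, cataloguer):
        """The originating cataloguer takes ownership of the request."""
        if self.status != "new":
            raise ValueError("only new requests can be claimed")
        self.owner, self.status = cataloguer, "open"
        self.history.append(f"claimed by {cataloguer}")

    def resolve(self, note):
        """Annotate the ticket once the record has been amended."""
        if self.status != "open":
            raise ValueError("only claimed requests can be resolved")
        self.status = "resolved"
        self.history.append(f"resolved: {note}")
```

The history list is what gives both the cataloguer and the requester a view of how the request
has progressed.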