OBO Foundry Identifier Policy 

Background

OBO format and OWL are both valid syntaxes for OBO ontologies, and looking forward, it is the intention that these two languages be entirely interconvertible. One key requirement for interconversion is that both formats use the same system for handling unique identifiers, and currently these are handled differently. OBO format uses a string of the form [prefix]:nnnnnnn, while OWL uses URIs[1]. The purpose of this document is to establish a policy for OBO identifiers, a correspondence between identifiers in the two encodings, and to explain the rationale behind the choice that has been made.

This policy pertains to ontologies that have been submitted to the OBO (the Open Biomedical Ontologies, also called the OBO Library) and to ontologies that are part of the OBO Foundry [2].


The policies recommended here are intended to be normative for OBO Foundry ontologies and suggested for OBO Library ontologies. They do not speak to the use of OBO format for other purposes. Feedback on these policies should be sent to one of two mailing lists: obo-discuss or obo-format.


This document addresses identifiers only for ontology terms, and not for Dbxrefs. It is the intention that the mapping of identifiers used in Dbxrefs will be based on the URIs established by the Shared Name Initiative.


Design goals 

There must be a predictable, bidirectional mapping between OBO ids, Foundry-compliant URIs, and OBO legacy URIs.


Format of OBO format ids

The syntax of OBO format ids is currently underspecified in the OBO format specification. The more precise specification below is the one used for the purposes of this document. Some productions are adapted from the SPARQL specification. Although OBO format is intended to draw its characters from Unicode, ids are further restricted to the Unicode code points corresponding to ASCII characters (codes less than 128).

PN_CHARS_BASE_OBO ::= [A-Z] | [a-z] 
IDSPACE ::= PN_CHARS_BASE_OBO+ ("_" PN_CHARS_BASE_OBO+ )?
LOCALID ::= [0-9]+ 
OBO_IDENTIFIER ::= IDSPACE ":" LOCALID
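
The productions above can be checked mechanically. The following is a minimal sketch in Python, not part of the policy itself. It assumes that the parenthesised group in IDSPACE is optional (so that an IDSPACE such as GO is valid), and the names OBO_IDENTIFIER_RE and parse_obo_id are hypothetical, chosen only for this illustration.

```python
import re

# Assumption: IDSPACE is a run of ASCII letters, optionally followed by one
# underscore and a second run of ASCII letters; LOCALID is ASCII digits only.
# [0-9] is used rather than \d because \d also matches non-ASCII digits.
OBO_IDENTIFIER_RE = re.compile(r"([A-Za-z]+(?:_[A-Za-z]+)?):([0-9]+)")


def parse_obo_id(text):
    """Return (idspace, localid) for a well-formed OBO id, otherwise None."""
    match = OBO_IDENTIFIER_RE.fullmatch(text)
    return match.groups() if match else None
```

For instance, parse_obo_id("GO:0050918") yields the pair ("GO", "0050918"), while a string with a non-numeric local part is rejected.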

Format of Foundry-compliant URIs

FOUNDRY_OBO_URI ::= "http://purl.obolibrary.org/obo/" IDSPACE "_" LOCALID

Format of OBO legacy URIs

Those are found in documents that were natively authored using the OBO format and which were converted using the NCBOoboInOwl script before this policy was put in place.

LEGACY_OBO_URI ::= "http://purl.org/obo/owl/" IDSPACE "#" IDSPACE "_" LOCALID

Mapping of OWL ids to OBO format ids

The mappings between OBO format, Foundry-compliant URIs, and OBO legacy URIs are shown in Figure 1. 


Figure 1. Mappings between OBO format ids and URIs.

The mappings between OBO format, Foundry-compliant URIs, and OBO legacy URIs are defined in terms of regular expression substitutions using Perl syntax.

1. From OBO format to Foundry-compliant URI: 

    s/([A-Za-z_]*):(\d+)/http:\/\/purl.obolibrary.org\/obo\/$1_$2/

2. From OBO format to OBO legacy URI: 

    s/([A-Za-z_]*):(\d+)/http:\/\/purl.org\/obo\/owl\/$1#$1_$2/

3. From Foundry-compliant URI to OBO format: 

    s/http:\/\/purl.obolibrary.org\/obo\/([A-Za-z_]*)_(\d+)/$1:$2/

4. From OBO legacy URI to Foundry-compliant URI: 

    s/http:\/\/purl.org\/obo\/owl\/([^#]+)#\1_(\d+)/http:\/\/purl.obolibrary.org\/obo\/$1_$2/

5. From Foundry-compliant URI to OBO legacy URI: 

    s/http:\/\/purl.obolibrary.org\/obo\/([A-Za-z_]*)_(\d+)/http:\/\/purl.org\/obo\/owl\/$1#$1_$2/

6. From OBO legacy URI to OBO format: 

    s/http:\/\/purl.org\/obo\/owl\/([^#]+)#\1_(\d+)/$1:$2/

Using these rules the OBO id GO:0050918 is mapped to
    a) the Foundry-compliant URI
http://purl.obolibrary.org/obo/GO_0050918 and
    b) the OBO legacy URI
http://purl.org/obo/owl/GO#GO_0050918
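
The six substitutions can also be expressed as three small conversion functions, one per target form. This is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, the IDSPACE pattern assumes at most one underscore between runs of letters, and whole-string matching is used instead of in-place substitution.

```python
import re

FOUNDRY_BASE = "http://purl.obolibrary.org/obo/"
LEGACY_BASE = "http://purl.org/obo/owl/"

# Assumed IDSPACE shape: letters, optionally one underscore and more letters.
_IDSPACE = r"[A-Za-z]+(?:_[A-Za-z]+)?"
_OBO_ID = re.compile(rf"({_IDSPACE}):([0-9]+)")
_FOUNDRY = re.compile(re.escape(FOUNDRY_BASE) + rf"({_IDSPACE})_([0-9]+)")
# \1 requires the IDSPACE after '#' to repeat the one in the path.
_LEGACY = re.compile(re.escape(LEGACY_BASE) + rf"({_IDSPACE})#\1_([0-9]+)")


def _split(identifier):
    """Return (idspace, localid) from any of the three identifier forms."""
    for pattern in (_OBO_ID, _FOUNDRY, _LEGACY):
        match = pattern.fullmatch(identifier)
        if match:
            return match.group(1), match.group(2)
    raise ValueError(f"not a recognised OBO identifier or URI: {identifier!r}")


def to_obo_id(identifier):
    idspace, localid = _split(identifier)
    return f"{idspace}:{localid}"


def to_foundry_uri(identifier):
    idspace, localid = _split(identifier)
    return f"{FOUNDRY_BASE}{idspace}_{localid}"


def to_legacy_uri(identifier):
    idspace, localid = _split(identifier)
    return f"{LEGACY_BASE}{idspace}#{idspace}_{localid}"
```

Because every function accepts any of the three forms, each of the six numbered mappings is a single call, and converting in one direction and back returns the original string.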

idspace directive in OBO Format file

The idspace directive [6,7,8] in OBO format associates a prefix with a URI. idspace directives can be omitted in OBO format files, for those IDSPACEs that are allocated to OBO ontologies. If present they MUST agree with the mappings above.

ID expressions and xref header directives in OBO Format files

Note: This section references OBO specification version 1.3, which is deprecated. Readers should consider this section as background material.

OBO 1.3 [7,8] introduces identifier expressions that can be taken as a kind of macro that expands into a logical definition. Although the surface syntax of id expressions conforms to the OBO format for identifiers, they are not considered identifiers but rather shorthand for un-named class expressions, which should be expanded (see [8] sec 4.1) before applying the id mapping policy. Therefore the mappings above do not apply to id expressions. Note: Although the draft OBO 1.3 specification allows id expressions to function as normal ids, including being associated with annotations such as label and definition, this will likely be revised to disallow such use.

Example of ID Definitional Expressions: GO:0005737^part_of(CL:0000023) can be used wherever one wants to say "cytoplasm of oocyte". This is treated as if it has the following definition:

[Term]
id: GO:0005737^part_of(CL:0000023)
intersection_of: GO:0005737 ! cytoplasm
intersection_of: part_of CL:0000023 ! oocyte
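
As an illustration only, the expansion shown above can be sketched in Python for the single pattern used in the example, namely one id, a caret, a relation name, and one id in parentheses. The OBO 1.3 draft may permit richer expressions that this sketch does not handle; the function name and the regular expression are assumptions made for this example, and the trailing "!" label comments are omitted because they cannot be derived from the expression.

```python
import re

# Assumed shape of a plain OBO id (see the OBO_IDENTIFIER production).
_ID = r"[A-Za-z]+(?:_[A-Za-z]+)?:[0-9]+"
# Assumed shape of a simple id expression: GENUS^relation(FILLER).
_ID_EXPRESSION = re.compile(rf"({_ID})\^([A-Za-z_]+)\(({_ID})\)")


def expand_id_expression(expression):
    """Return the lines of a [Term] stanza equivalent to a simple id expression."""
    match = _ID_EXPRESSION.fullmatch(expression)
    if match is None:
        raise ValueError(f"unsupported id expression: {expression!r}")
    genus, relation, filler = match.groups()
    return [
        "[Term]",
        f"id: {expression}",
        f"intersection_of: {genus}",
        f"intersection_of: {relation} {filler}",
    ]
```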



OBO 1.3 defines a further transformation, driven by xref header directives, that can affect the parsing of strings that on the surface look like identifiers subject to the mapping rules described above. The transformations specified by these directives, described in [8] sec 4.11, should be applied before interpreting any ids in an OBO format file. After transformation, ids should comply with the ID policy.

The process of "expanding" an OBO format file, by applying the header directives for xrefs and interpreting identifier expressions, should be considered a source transform: given an OBO Format file with such elements, it creates a new OBO Format file without any xref header directives or identifier expressions. Identifier mappings apply to the OBO Format file that results from this expansion.

Policy for OBO Foundry ontologies

OBO Foundry ontologies MUST use OBO format identifiers that match the production OBO_IDENTIFIER if they are formatted in OBO format, and the production FOUNDRY_OBO_URI if they are formatted as OWL. Where an ontology is distributed in both formats, identifiers are mapped according to the substitutions defined in the section "Mapping of OWL ids to OBO format ids".


Response to Web requests for OBO URIs

Foundry-compliant URIs are expected to behave usefully on the web. It will be the role of the OBO Foundry to supply generic software for responding to requests at URIs that identify OBO terms.


We borrow the criteria from the Shared Name Initiative (http://sharedname.org/) as a baseline. The OBO Foundry may issue further recommendations if experience shows them to be generally useful.


  1. It must be clearly stated what the intended referent of each URI is supposed to be, i.e. that the URI denotes some particular record from some particular database.

  2. Information about the URI and its referent, including such a statement, must be made available, and in order to leverage existing protocol stacks, it must be obtainable via HTTP. (We will call such information "URI documentation".)

  3. URI documentation must be provided in RDF.

  4. Provision of URI documentation must be an ongoing concern. The ability to provide it may have to outlive the original ontology developer's group or creator.

  5. The provider of the URI documentation must be responsive to community needs, such as the need to have mistakes fixed in a timely manner.

  6. URI documentation must be open so that it can be replicated and reused.


Individual ontology projects may, at their discretion, choose to manage these responses, with the understanding that if service lapses the Foundry may substitute the generic software for handling them in order to maintain service.

Policy for OBO Library ontologies

OBO Library ontologies are not constrained by this policy. However, we recommend that they follow it nonetheless, for three reasons. First, it provides a uniform experience and sets expectations for ontology clients. Second, by doing so Library ontologies will be able to take advantage of shared infrastructure. Third, an ontology that later joins the Foundry would otherwise have to disrupt its ids by changing them to follow this policy.


Allocating IDSPACEs


IDSPACEs within the OBO Library are unique for a given project and are chosen so as not to conflict with prefixes used for xrefs. Although IDSPACEs are case-sensitive, no two allocated IDSPACEs will be identical when compared in a case-insensitive manner. Therefore, although "GO", "go", "Go" and "gO" are different IDSPACEs, "go", "Go" and "gO" will not be used, as "GO" has already been allocated.


A registry of allocated IDSPACEs will be maintained. Requests for an IDSPACE should be made by sending mail to obo-discuss@lists.sourceforge.net, cc obo-admin@fruitfly.org. A request should include information about the ontology, such as scope and maintainer and a confirmation that the ontology is open access.


Resources at Known locations


Registry of IDSPACEs: http://purl.obolibrary.org/obo/idspaces.txt


A tab-delimited text file with five columns:


1) the idspace, 

2) a string indicating the status of the idspace. The possible values for status are 

    "OBOFOUNDRY" - The IDSPACE is of an OBO Foundry ontology

    "OBOLIBRARY" - The IDSPACE is of an OBO ontology, not currently a Foundry ontology

    "RESERVED" - The IDSPACE is used for dbxrefs or is otherwise unsuitable as an idspace for an ontology 

3) The name of the point of contact

4) The email address for the point of contact

5) A short description of the scope
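
A reader for this registry could look roughly like the following Python sketch. It assumes one record per line, no header row, and that blank lines may be skipped; none of these details is stated by the policy. The names IdspaceEntry and parse_registry are hypothetical.

```python
from typing import List, NamedTuple

# The three status values listed in the column description above.
VALID_STATUSES = {"OBOFOUNDRY", "OBOLIBRARY", "RESERVED"}


class IdspaceEntry(NamedTuple):
    idspace: str
    status: str
    contact_name: str
    contact_email: str
    scope: str


def parse_registry(text: str) -> List[IdspaceEntry]:
    """Parse the registry text into entries, validating column count and status."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines carry no information
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(
                f"line {line_number}: expected 5 columns, found {len(fields)}"
            )
        entry = IdspaceEntry(*fields)
        if entry.status not in VALID_STATUSES:
            raise ValueError(f"line {line_number}: unknown status {entry.status!r}")
        entries.append(entry)
    return entries
```

The text passed to parse_registry would be the content retrieved from the registry URL; fetching it is left out so that the sketch stays self-contained.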


Current ontology document:


The most current version of an ontology will be at the following URL, where "IDSPACE" is replaced with the IDSPACE of the given ontology in lower case.


  Current OWL: http://purl.obolibrary.org/obo/IDSPACE.owl

  Current OBO: http://purl.obolibrary.org/obo/IDSPACE.obo

For example, the Ontology for Biomedical Investigations has the IDSPACE "OBI", so the current version of the OWL document would be at http://purl.obolibrary.org/obo/obi.owl

Ontology subsets and variants:

Aside from the standard version, an ontology may provide differently packaged elements of the ontology, or ontology versions with additional contents. For example:
    
    - Subsets of the ontology tailored to particular purposes or user communities, sometimes called subsets, slims, views, or modules.
    - A version with or without inferred relations as part of it
    - A version that includes more or less metadata, such as change tracking
    - Pre-release work or experimental extensions

The current version of such additional artifacts should be accessible at URIs of the form

    http://purl.obolibrary.org/obo/<idspace>/<name>.owl
    http://purl.obolibrary.org/obo/<idspace>/<name>.obo

Where <idspace> is replaced by the IDSPACE in lower case, and <name> is the name for the artifact.

For example, IAO distributes its ontology metadata set as a distinct document, with <name> "ontology-metadata"


Versions of ontologies: 

Versions are named by a date in the following format: YYYY-MM-DD. For a given version of an ontology, the ontology should be accessible at the following URL, where <idspace> is replaced by the IDSPACE in lower case

  OWL: http://purl.obolibrary.org/obo/<idspace>/YYYY-MM-DD/<idspace>.owl
  OBO: http://purl.obolibrary.org/obo/<idspace>/YYYY-MM-DD/<idspace>.obo

For example, for the version of OBI released 2009-11-06, the OWL document is accessible at http://purl.obolibrary.org/obo/obi/2009-11-06/obi.owl

Ontology variants are versioned in the same manner. URIs for a given version have the version date between <idspace> and <name>. For example, the version of

  http://purl.obolibrary.org/obo/iao/ontology-metadata.owl

dated 2009-11-02 would have the URI

  http://purl.obolibrary.org/obo/iao/2009-11-02/ontology-metadata.owl
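
The URL patterns for current documents, variants, and dated versions can be gathered into one helper. The Python sketch below is illustrative only; the function name artifact_url and its parameters are hypothetical, and it simply assembles strings according to the patterns given in this section.

```python
from datetime import date

BASE = "http://purl.obolibrary.org/obo/"


def artifact_url(idspace, name=None, version=None, extension="owl"):
    """Build the URL of an ontology document.

    idspace   -- the ontology IDSPACE (lower-cased in the URL)
    name      -- variant name, or None for the standard ontology
    version   -- a datetime.date for a dated version, or None for the current one
    extension -- "owl" or "obo"
    """
    if extension not in ("owl", "obo"):
        raise ValueError(f"unsupported extension: {extension!r}")
    prefix = idspace.lower()
    if name is None and version is None:
        # Current version of the standard ontology: <base>/<idspace>.<ext>
        return f"{BASE}{prefix}.{extension}"
    segments = [prefix]
    if version is not None:
        segments.append(version.isoformat())  # YYYY-MM-DD
    # A dated standard ontology repeats the idspace as the file name.
    segments.append(name if name is not None else prefix)
    return BASE + "/".join(segments) + "." + extension
```

Calling artifact_url("OBI", version=date(2009, 11, 6)) reproduces the OBI example above, and artifact_url("IAO", name="ontology-metadata", version=date(2009, 11, 2)) reproduces the IAO one.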
  

Home page:

If the ontology has a home page on the Web, it is accessible at http://purl.obolibrary.org/obo/IDSPACE. For example the OBI home page is accessible at: http://purl.obolibrary.org/obo/obi

Tracker:

If the ontology has a tracker for submitting and monitoring term and other requests, it is accessible at http://purl.obolibrary.org/obo/IDSPACE/tracker. For example the OBI tracker is accessible at: http://purl.obolibrary.org/obo/obi/tracker

Browse:

If the ontology developers provide or suggest a way of browsing the ontology, it is accessible at http://purl.obolibrary.org/obo/IDSPACE/browse. For example the OBI project suggests people browse OBI using the NCBO Bioportal, and so http://purl.obolibrary.org/obo/obi/browse redirects to the Bioportal view on OBI.

Wiki:

If the ontology developers provide a development wiki, it is accessible at http://purl.obolibrary.org/obo/IDSPACE/wiki. For example the OBI project wiki is accessible at http://purl.obolibrary.org/obo/obi/wiki.

History and Rationale


This policy was initially discussed, drafted and implemented as part of the development of the Ontology for Biomedical Investigations (OBI, http://purl.obolibrary.org/obo/obi) project.

The project's goal was to provide stable and homogeneous identifiers for OBI entities, and to be able to respond with a bounded amount of useful information for each URI denoting a term in OBI. The project chose [3] to use purl [4] based URIs because of the ability to redirect to a different URL should we want to change hosts, etc. To protect ourselves against a future time when purl.org might not be as reliable, or in case we wish to substitute a different technology for resolving term requests, we use our own domain name, and have the DNS [5] point to purl.org.

The hostname chosen for this by vote of the OBO Foundry editors is purl.obolibrary.org. This dedicated hostname allows redirection at the DNS level, so that we don't require extra time for the resolution or dedicated servers to actually handle lookups.
 

While the initial preference was towards maintaining IDs as currently used by the community (e.g. GO:nnnnnnn), RDF/XML and N3 (the other RDF syntax) require the character after the colon to be alphabetic. (See the QName production in the W3C specification Namespaces in XML and productions [7][8][11][4][6]; the last, [6] NCNameStartChar ::= Letter | '_', is responsible for the prohibition against a leading digit.)

All entities that we define - classes, relations, and instances - are assigned IDs. URIs are opaque, and we use labels for the human readable version. Editing tools can then be configured to display the labels instead of the identifiers.


Note that the OBO legacy URIs will be supported for dereferencing ontologies for some transition period. However, applications that depend on referencing OBO ontology terms using the legacy URI will need to migrate to using Foundry-compliant URIs for ontologies that choose to use them.


The OBO legacy URIs are of the form http://purl.org/obo/owl/OBI#OBI_0100051.


Undesirable aspects of the OBO legacy URIs are: 



The adopted format, http://purl.obolibrary.org/obo/OBI_0100051, is as short as sensible while avoiding the above issues.


References


[1] http://en.wikipedia.org/wiki/Uniform_Resource_Identifier, a Uniform Resource Identifier (URI) consists of a string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network (typically the World Wide Web) using specific protocols.


[2] The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration.


[3] Changing OBI's URIs to be purl based - discussion on obi-developers list


[4] http://purl.org/, A PURL is a Persistent Uniform Resource Locator. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client. The client can then complete the URL transaction in the normal fashion. In Web parlance, this is a standard HTTP redirect.


[5] http://en.wikipedia.org/wiki/Domain_name_system, the Domain Name System (DNS) associates various sorts of information with so-called domain names; most importantly, it serves as the "phone book" for the Internet by translating human-readable computer hostnames, e.g. www.example.com, into the IP addresses, e.g. 208.77.188.166, that networking equipment needs to deliver information.


[6] The OBO Flat File Format Specification, version 1.2 http://www.geneontology.org/GO.format.obo-1_2.shtml


[7] The OBO Flat File Format Specification, version 1.3 draft: http://www.geneontology.org/GO.format.obo-1_3.shtml


[8] OBO-Format and Obolog Specification (1.3) DRAFT http://oboedit.org/obolog/spec/obolog-spec.pdf


Acknowledgments

This policy was initially discussed, drafted and implemented as part of the development of the OBI project.

Authors: Alan Ruttenberg, Melanie Courtot and Chris Mungall.

Thanks to Jonathan Rees, Bill Bug, Colin Batchelor, David Osumi-Sutherland, Duncan Hull, Peter Robinson, Michel Dumontier, the OBO coordinators and the OBI Consortium for their help.