Naming
Naming Conventions for OBO Foundry Ontology engineering
The application of unified labeling or naming conventions for ontology engineering will help to harmonize the appearance and increase the robustness of terms within ontologies. This presentation provides a short introduction on this topic.
On this page you can find:
- A review that discusses and raises some issues about present naming conventions
- An initial set of naming conventions that we believe could provide robust labels for ontology classes
- The preliminary results of a survey carried out to review current naming convention practices applied by ontology groups listed under the OBO Foundry
Review of existing naming convention documents
Before starting our investigation, we analysed some of the more prominent standards and recommendations that address the naming of representational units in various types of representational artefacts. This informal review can be found here: NC_Reviews
Terminological considerations and types of names
We refer to the representational idioms of which an ontology consists (i.e. its classes and properties) as its representational units. A term is a single word or a combination of words (compound term). A class or type is the representational unit that stands for a universal entity. The representational units that exemplify such classes within a knowledge base are called individuals or instances. Individuals are representations of particulars in reality. Any representational unit that serves to relate representational units to each other is called a relation and is a kind of property of the representational unit of which it is asserted. Any term used to designate some entity is called a name. Within an ontology it makes sense to distinguish and capture the following name categories:
- user-preferred name: An informal name that is chosen to reflect the expectations of the end user and that is found in the literature of the domain with the highest frequency.
- editor-preferred name: A semi-formal name that adheres to the guidelines and naming conventions outlined within these pages, and that is not necessarily aligned with end-user expectations. The editor-preferred name should aid the humans and tools that build and manipulate the ontology, and it should be the name displayed in the hierarchy when editing the ontology.
- formal name: A name that is fully controlled through explicit and rigid syntactic rules.
- short name: A very short name or display name that is sometimes needed to be able to display classes in large dense graphs.
- acronym: A name built from pieces of the words in a term, e.g. their initial letters.
- abbreviation: A lexically truncated short form of a name.
Further name categories can be distinguished, such as 'foreign language translation', 'phonetic name' and 'lexical variant' of a name. All names in these categories should be exact synonyms of each other.
We recommend capturing at least an ‘editor-preferred name’ and a ‘user-preferred name’.
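One way to keep these name categories apart is to give each its own field on a per-class record. The following is only a minimal sketch: the `ClassNames` type, its field names and the sample values are hypothetical and are not part of any OBO Foundry tool or ontology language.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassNames:
    """Names attached to one ontology class, one field per name category."""

    editor_preferred_name: str  # follows the naming conventions
    user_preferred_name: str    # most frequent form in the domain literature
    formal_name: Optional[str] = None
    short_name: Optional[str] = None
    acronyms: List[str] = field(default_factory=list)
    abbreviations: List[str] = field(default_factory=list)

    def all_names(self) -> List[str]:
        """Return every recorded name once, in a stable order."""
        candidates = [
            self.editor_preferred_name,
            self.user_preferred_name,
            self.formal_name,
            self.short_name,
            *self.acronyms,
            *self.abbreviations,
        ]
        seen: List[str] = []
        for name in candidates:
            if name is not None and name not in seen:
                seen.append(name)
        return seen


# Illustrative values only; which name users actually prefer is an assumption.
probe = ClassNames(
    editor_preferred_name="high resolution probe",
    user_preferred_name="HRP",
    acronyms=["HRP"],
    abbreviations=["high res. probe"],
)
```

Because both recommended names are required fields, a record cannot be created without an editor-preferred and a user-preferred name, while the remaining categories stay optional.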
Summary of Naming Conventions
The editor-preferred name should comply with the following list of naming conventions:
| Naming Convention | Description | Example | Effect |
|---|---|---|---|
| 1. Be clear and unambiguous | | | |
| Use explicit and concise names | Keep names short and memorable, but precise enough to capture the intended meaning. Keep names linguistically correct and intuitively meaningful to human readers. Articles should be omitted. | ‘wall of esophagus’, ‘physical_part’ instead of ‘the wall of the esophagus’, ‘distinct_identifiable_physical_part’ | Faster term recognition |
| Use context independent names | Apply names that are self-explanatory and understandable even when viewed outside of the immediate context of the ontology. Avoid truncated names and colloquialisms. In names, capture inherent and intrinsic characteristics rather than asserted and extrinsic characteristics. Avoid using names for non-role entities that refer to roles the entity referred to may potentially play in a particular context at a particular time. Capture product names as they are, but render them intelligible by adding contextual information: [company name]+[product name]+[product type] (usually the superclass name). Additional information like the legal status of a company (e.g. Corp. or Inc.) should be omitted. | ‘NMR magnet’, ‘chemotherapy’ and ‘1ml pipette tip’ instead of ‘magnet’, ‘chemo’ and ‘blue pipette tip’. Use ‘Bruker US 2 NMR magnet’ instead of ‘US 2’ | Increases precision in the interpreted meaning; Helps string matching; Faster term recognition |
| Avoid taboo words | Affixes reflecting epistemological claims, e.g. words that indicate types of representational units, should be avoided in names. | ‘protocol’ instead of ‘protocol class’ or ‘protocol type’ | Faster term recognition; Redundancy reduction |
| Avoid encoding administrative metadata in names | Administrative metadata, e.g. a class's status and version, should be factored out of the name and into suitable separate representational units. | ‘protocol’ instead of ‘protocol (definition incomplete)’ | Increases precision in the interpreted meaning |
| 2. Be univocal | | | |
| Use univocal names and avoid homonyms | Names should have the same meaning on every occasion of use and refer to the same types of entities in reality. Homonyms, ambiguous terms that share the same spelling but have many different meanings, are to be avoided as part of editor-preferred names. When building names, use terms with the fewest possible homonyms. | ‘protocol_collection’ instead of ‘protocol_set’ for a plurality of protocols (store the latter as synonym), ‘parameter_adjustment’ instead of ‘protocol_set’ for the act of setting parameters | Increases precision in the interpreted meaning; Faster term recognition |
| Avoid conjunctions | Words that are used to join other words, such as the logical connectives ‘and’ and ‘or’, should be avoided in names as they can introduce ambiguity and may hamper inference by causing excessive branching. The same applies to qualifiers such as ‘in some cases’. | In ‘anatomic_structure, system or substance’ it is not clear whether the adjective “anatomic” is restricted to “structure” or extends also to “system and substance”. In the first case the substances “drug” and “chemical” would be classified under this class, otherwise not. | Increases precision in the interpreted meaning |
| Prefer singular nominal form | Use singular names throughout. Where plurals need to be captured, e.g. when one instance of the plural class represents a plurality itself, consistently use explicit plural indicating postfixes as part of the class names, e.g. use ‘aggregate’, ‘collective’ or ‘population’ consistently, but only as applicable. | ‘pair of lungs’, ‘population’ instead of ‘lungs’, ‘people_collection’ | Increases precision in the interpreted meaning; Helps string matching |
| Use positive names | Avoid use of negations in formulating names. Avoid complements and negative names like ‘non-separation device’ because logically this will include everything in the universe that is not a separation device. The absence of a characteristic is not a concise differentiating criterion. Do not represent the absence of a characteristic (e.g. wing) as the presence of the non-existence of a characteristic, e.g.: 'wing' has_status "absent". | ‘data recording device’ instead of ‘non-separation device’ | Increases precision in the interpreted meaning |
| Avoid catch-all terms | Avoid ‘rag-bag’ words that do not designate natural kinds. The existence of classes is not dependent on our biological knowledge. | Avoid ‘unlocalised’, ‘unknown’, ‘unclassified’ | Increases precision in the interpreted meaning |
| 3. Reduce string variance | | | |
| Recycle strings | Word compositions should be constructed in a consistent manner, rather than using para-synonymous strings interchangeably. When creating compound names, re-use strings as they occur in names of entities already defined elsewhere in this or in other ontologies. | ‘x part of process’, ‘y part of process’ instead of ‘x component of process’, ‘y portion of process’ | Helps string matching; Eases cross product generation |
| Use genus-differentia style names | Class names should reflect the differentia that distinguish the class from its parent class (modifiers to the head word). These should be the same that are modelled explicitly, so that the name compounds can be mapped to representational units that are connected to that class. | ‘DNA-microarray’ is_a ‘microarray’, ‘protein-microarray’ is_a ‘microarray’, where ‘DNA’ and ‘protein’ are defined elsewhere | Eases cross product generation; Helps string matching |
| Use spaces as word separators | Use the space (‘ ’) character as word separator, just as it would normally appear in the language of choice. Where use of the space is not allowed by the type of representational unit in use to store a name, the underscore ('_') should be used instead. Camel case should not be used as a means of word separation. | ‘DNA microarray’, ‘pH value’ instead of ‘DNA_microarray’, ‘pHValue’ | Faster term recognition; Helps string matching |
| Expand abbreviations and acronyms | Spell out abbreviations and acronyms and capture truncated versions as synonyms. Acronyms that result in expressions that have other meanings should be avoided. Widely known acronyms (anacronyms) such as DNA and LASER can be used. | ‘high resolution probe’ instead of ‘HRP’ or ‘high res. probe.’ | Faster term recognition; Increases precision in the interpreted meaning; Helps string matching |
| Expand special symbols to words | Special symbols and foreign language letter characters should be spelled out. | ‘degree Celsius’, ‘alpha helicase’, ‘carbon-14’ instead of ‘°C’, ‘α helicase’, ‘C14’ | Helps string matching |
| 4. Align typography | | | |
| Use lower case beginnings | Don’t enforce dogmatically, but prefer lower case beginnings for class and property names. Capture names just as they would appear in normal English written text, i.e. where acronyms and proper nouns cannot be avoided in names they should be capitalized. | Use ‘microarray’, ‘DNA microarray’, ‘pH value’, ‘Golgi apparatus’ | Faster term recognition |
| Avoid character formatting | Use plain ASCII format to keep names as computationally pliant as possible. Subscripts, superscripts and accents should be avoided. | ‘SIGMA-ALDRICH’ instead of ‘Σ-ALDRICH™’ | Helps string matching |
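Some of the more mechanical conventions in the table above can be checked automatically. The sketch below is a rough heuristic, not an official OBO Foundry validator: the function name `check_editor_preferred_name` and the word lists are assumptions taken from the examples in the table, and conventions that need domain knowledge (homonyms, context independence, genus-differentia style) are out of its reach.

```python
import re

# Hypothetical word lists, drawn from the examples in the table above.
ARTICLES = {"a", "an", "the"}
TABOO_WORDS = {"class", "type"}
CONJUNCTIONS = {"and", "or"}
CATCH_ALL_WORDS = {"unlocalised", "unknown", "unclassified"}


def check_editor_preferred_name(name):
    """Return warnings for conventions the name appears to break."""
    warnings = []
    # Split on spaces and underscores so both separators yield words.
    words = re.split(r"[ _]+", name.strip())
    lowered = [w.lower() for w in words]

    if any(w in ARTICLES for w in lowered):
        warnings.append("contains an article")
    if lowered[-1] in TABOO_WORDS:
        warnings.append("ends with a taboo word")
    if any(w in CONJUNCTIONS for w in lowered):
        warnings.append("contains a conjunction")
    if any(w in CATCH_ALL_WORDS for w in lowered):
        warnings.append("contains a catch-all term")
    if any(w.startswith("non-") for w in lowered):
        warnings.append("uses a negation")
    if "_" in name:
        warnings.append("uses underscore instead of space")
    # An inner capital followed by a lower case letter, as in 'pHValue';
    # 'pH', 'DNA' and a leading capital as in 'Golgi' do not match.
    if re.search(r"[A-Za-z][A-Z][a-z]", name):
        warnings.append("may use camel case")
    if not name.isascii():
        warnings.append("contains non-ASCII characters")
    return warnings


for candidate in ["pH value", "pHValue", "protocol class", "°C"]:
    print(candidate, "->", check_editor_preferred_name(candidate))
```

The checks only raise warnings, in line with the advice not to enforce the conventions dogmatically: an underscore, for instance, is acceptable where the representational unit storing the name does not allow spaces.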
Naming Conventions Survey
To initiate a discussion among the ontology groups listed under the OBO portal, we carried out a survey to understand what naming conventions these groups currently apply and what their special requirements regarding the naming of entities are. The survey was conducted by contacting the custodians of 68 ontologies by email or telephone. They received a questionnaire along with the above described initial set of naming conventions. The questionnaire was divided into four parts, covering:
- The ontology and its engineering process
- The current practice in naming entities
- Perceived benefits of common naming conventions
- Questions on particular naming conventions
The questionnaire can be downloaded here: Media:NC_survey.doc
A table illustrating the coverage and containing the list of participating groups is available here: http://msi-ontology.sourceforge.net/SurveyCoverageTable.doc
Currently, we are in the process of consolidating the results we have received to date.
Survey Responses
The questionnaire responses were received as Word documents or by email. The full documents can be found here:
http://msi-ontology.sourceforge.net/SurveyResponseDocs
Due to the open nature of the questions, the answers had to be normalized and were put into an Excel table for easy comparison. This table can be found here:
http://msi-ontology.sourceforge.net/SurveyAnswers.xls
A preliminary evaluation of the results, based on these normalized survey answers, is available from: http://msi-ontology.sourceforge.net/Survey_results_selected_questions.doc
Preliminary Survey Results concerning the above conventions
Of the 33 respondents, nearly one third stated that they have developed their own naming conventions. However, a closer look revealed that most of these (localized) conventions are limited in coverage and are documented only as part of papers or general style guides (for details see survey question 2.1).
The survey revealed that where naming conventions were documented, they all tackled the names of classes (question 2.2). A smaller fraction of these contained conventions on the name of the ontology itself or on the names for properties (relations). Fewer still had conventions for namespaces, instance names, class identifiers or ontology versions. Of the respondents that had not documented their naming conventions, the vast majority were using the ‘GO style guide’ for general guidance and as a source of conventions (see question 2.3). Both OBI and PSI looked at the MSI naming conventions document for inspiration (http://msi-ontology.sourceforge.net/namingconventions/Naming_Conv_v14.htm). Ontologies dealing with chemicals (e.g. ChEBI) often use the IUPAC conventions for naming of small molecules.
The survey revealed that approximately eighty-five percent of the respondents captured exact synonyms (including abbreviations and acronyms) in addition to class names and felt that conventions were needed here too (question 3.1). Nearly two thirds of the groups captured a ‘user-preferred name’ (as used within the literature) and half of the groups also captured a more formal ‘editor-preferred name’ adhering to defined naming principles. Seven of the thirty-three respondents captured a ‘short name’ for use in large graphical representations. Only four groups captured foreign language translations, broader / narrower / related terms or database cross-references.
These results indicate a need for higher resolution representations of different types of names. This need was also stated explicitly in the survey (question 3.1), which revealed that a third of the respondents wished for a more detailed treatment of this issue. These groups were in favour of having separate representational units to represent these name categories as part of the ontology languages (question 3.3). Additionally, four survey respondents stated that synonym types should also be expanded and handled in more detail in the representation languages.
This problem has also been recognized by the Ontology Task Force of the W3C Semantic Web Health Care and Life Sciences Interest Group (HCLSIG). The Ontology Task Force asserts that heterogeneous naming and a lack of harmonization in the way names are represented in ontology languages force users to construct more complex queries and hinder the development of generic user interfaces.
The need for higher resolution in the representations of types of names is also illustrated by the fact that many groups have developed their own metadata schemes to capture the required name types (e.g. OBI and MSI metadata annotation properties). This was partly due to the fact that naming is not appropriately dealt with in the Protégé ontology editor, where the default, when creating a name for a class, is the ‘Class ID’ field, which tempts users to store class names in the inappropriate rdf:ID field.
As stated above, we recommend differentiating at least a ‘user-preferred name’, the one used most often in the literature of the domain being represented, and an ‘editor-preferred name’, a more formal, usually longer name that is not necessarily aligned with end-user expectations, but helps the editors while editing the class hierarchy.
Engage in discussions
If you have comments, questions or suggestions about naming conventions, please feel free to email schober@ebi.ac.uk
Alternatively, fill out the questionnaire Media:NC_survey.doc and send it back to the above email address.