The micro*scope web site uses an array of innovative software to help manage information. We refer to our software as being taxonomically intelligent. The approach was developed in collaboration with our sister project, uBio.
We developed the taxonomically intelligent software in response to the increasing amount of information about organisms that is becoming available through the internet. Scientists use the internet to make their work visible, and this increases the number and diversity of web sites that carry information about organisms. Various initiatives, such as the Biodiversity Heritage Library, are digitizing the literature, especially older journals, books, and other 'legacy' data. Search engines dig deeper into databases that were previously opaque to them. Yet we lack good tools to discover information about organisms and to organize it in a way that makes biological sense.
All pieces of information that are out there are labelled with the names of organisms. Using names to index information on the web is as obvious as using names in the index to a book. Name-based services try to achieve just that. Unfortunately, a number of factors complicate matters. Firstly, organisms may have more than one name - possibly because names are mistyped, or because species have colloquial as well as scientific names, or because names of organisms change as a result of taxonomic and phylogenetic research. When faced with an array of synonyms, which name should one use to find information? A second problem is that of homonyms: the same name being used for different organisms.
Traditionally, taxonomists have developed the knowledge and skills to navigate these obstacles and to bring together information about the same organism. TAXONOMIC INTELLIGENCE refers to software that tries to incorporate taxonomic thinking and taxonomic expertise to overcome the various names problems. Taxonomic intelligence has many dimensions and these continue to expand. They include:
- A list of all names of all organisms. This provides the foundation for taxonomically intelligent name-based services. Such a compilation can be used like a spell checker to discover names of organisms in 'data objects'. This allows those objects to be indexed taxonomically. There are about 1,700,000 species, probably with about 10,000,000 scientific names. About 1% of names change each year and a similar number of new names appear each year. If we include all possible typographic errors and variants, there may be as many as 100,000,000 'name-strings' out there. The number of names known to uBio is listed on their web page; at the time of writing, it is by far the largest collection of names.
These and many other elements of Taxonomic Intelligence are embedded within this web site. The names of organisms are drawn from the uBio NameBank and organized within a hierarchical classification called CU*STAR. CU*STAR is the organizational core of micro*scope. Taxonomic intelligence underpins our linkout systems that allow us to find complementary data on the same organisms but at different web sites.
- Reconciliation. This strategy deals with the problem that one organism may have more than one name. Reconciliation links or maps alternative names against each other. This allows a query or action started with one name to reach out and gather information under all names. Reconciliation includes reconciliation of lexical variants (different spellings), nomenclatural variants (names of synonyms that relate to the same type specimens), vernacular variants (allowing inclusion of vernacular names which have been unambiguously linked to scientific names), and - in a different way - subjective synonyms.
- Disambiguation. This helps to resolve the problems that arise when one name is used for more than one organism. Associating a name with its authority (the person or people who first introduced the name into the literature) will distinguish spelled-alike names - Peranema Don 1825 does not refer to the same organisms as Peranema Dujardin 1841. Algorithms can also use associated data to find out which taxonomic area a name relates to, and so can distinguish between Peranema (Pteridophyta) and Peranema (Protista).
- Hierarchical classifications - these are very valuable in bringing the quantity of information about organisms under control. Hierarchies link information on the basis of perceived evolutionary history. As an informatics device, hierarchies can be exploited in navigating around information, to browse towards more specific information, and to enhance searches by expanding them or focussing them.
- We also use other informed knowledge - such as understanding how names might change when species are moved from one genus to another - to help draw together complementary information.