Biocentric information and the role of taxonomy and taxonomists
Philosophy
Software
Flexibility
Content organization and finding content
Working with distributed and local content, linkouts
Quality control
Options for using the star approach (collections, star*sites, star*nets)
Sustainability
Names and classification schemes
CU*STAR
Biopedia
XID and matrix keys
Biocentric organization and the role of taxonomy and taxonomists
Most of biology is made up of biocentric information - information about named
entities. Since the time of Linnaeus, this information has been very effectively
organized by biological taxonomy - suites of rules and guidelines about
how to develop and apply names so that they are stable and unambiguous,
and the placement of those names within concept hierarchies called classifications. Taxonomists
are the custodians of taxonomic knowledge and so are the key players in
organizing and indexing information about organisms. In an informatics
context, they oversee the most important array of biological meta-data.
One of their tasks must be to transfer their organization and indexing
skills, as well as their knowledge of organisms, into a world where generic
skills can be put to use by the internet, and their knowledge accessed
through the internet.
Star*sites aim to help this process. We are embedding the organizational and indexing
power of taxonomy within the internet. In so doing, we seek to add an
array of internet services for biologists.
Philosophy
Our philosophy is to help develop contemporary knowledge environments for
biology that are inclusive and flexible. Star*sites are intended to be
very scalable, and managers have a range of options
for the scope of a site - from assembling a small collection within a
pre-existing site to building an extensive network of sites. We also adopt
a principle of inclusiveness rather than exclusiveness. The software
has been designed to work with a diversity of data types, although
it has begun with images and descriptions. Sites can adopt different classification
schemes as part of their organization. Inclusiveness also extends to sharing
of content, such that any star*site can 'synch' with other sites or collections
and expand its own indices to include content at remote sites. Inclusiveness
embraces the ability to combine local and remote or distributed data.
The software itself is built to comply with this 'inclusive' philosophy.
It is modular and we welcome developers to add to the range
of items in the software library. The appearance of the site and its functionality
are controlled by variables, and site managers will be able to use the
variables to control the appearance, functionality and content of the sites.
We believe that the result will make the system very flexible and very
scalable, and imbue it with the capacity to grow to a very large size
and to evolve to meet changing needs and demands.
There are some aspects which are not flexible at this time. Because of the sharing
philosophy, we presume that builders of star*assets will be placing their
content in the public domain. Star*central also needs to be made aware
of certain dimensions of individual sites in order to be able to facilitate
content sharing.
Software
Star software is modular. The software is written in PHP, and so can work within
an open-source PHP - MySQL - Apache environment. A module library is
available through the toolbox at star*central.
Users can pick up modules, or create new ones. Just talk
to us about how to get going with this.
Flexibility
As the star*software develops, site managers will have the ability to control
the front page, the 'chrome' or 'livery' so that the appearance can be
moulded to meet the needs of the site. There will be choice over which
functions (modules) will be included within a site, and interested parties
will be able to expand the range of functions that are available. Each
site will be able to define its default classification and what part of
the classification the site opens at. The array of outlinks will be flexible
and users will also have the choice of using bio*pedia.
In addition, although installing a star*site will create a default directory
structure, site managers will be able to reorganize this to meet their
local needs. As far as possible we seek to define as many aspects of the
system as variables and to place the options as a buffet-style menu for
both site developers and site visitors. The data are presented in 'dynamic
fact sheets', and the composition and layout of those sheets will also
be under the control of the site manager. Some elements will not be optional,
such as the dialogue with star*central which is required to maintain oversight
of all of the activities among individual sites...
Content organization, and finding content
The contents of star*sites are referred to as assets. Assets are placed within
collections. Collections can be placed inside other collections. Currently,
star*sites are set up to deal with images, but will be expanded to deal
with other assets. Tools to create collections or to add and edit assets
are available through our toolbox.
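The nesting of assets and collections described above can be pictured as a simple tree. The sketch below is illustrative only: the class names `Asset` and `Collection`, their fields, and the sample names are assumptions made for this example and are not taken from the star*software itself (which is written in PHP).

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Asset:
    """A single item of content, e.g. an image (hypothetical structure)."""
    title: str
    kind: str = "image"


@dataclass
class Collection:
    """A named container holding assets and, optionally, other collections."""
    name: str
    assets: List[Asset] = field(default_factory=list)
    children: List["Collection"] = field(default_factory=list)

    def walk(self) -> Iterator[Asset]:
        """Yield every asset in this collection, then those in nested collections."""
        yield from self.assets
        for child in self.children:
            yield from child.walk()


# Sample data, invented for illustration.
diatoms = Collection("Diatoms", assets=[Asset("Diatom image 1")])
plankton = Collection(
    "Plankton",
    assets=[Asset("Plankton overview")],
    children=[diatoms],
)

titles = [asset.title for asset in plankton.walk()]
print(titles)
```

Walking the outer collection reaches assets held in any nested collection, which is what lets a collection inside another collection still be found from the top.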
Visitors to star*sites can find assets using a number of pathways. Firstly, there are search
('Look-for') functions at various levels within the system and these are
appropriate for visitors who know precisely what they are looking for.
Search functions are quick and direct. Secondly, visitors can browse and
search within classification schemes. Site managers and users can choose
the classification scheme to use. This allows visitors to navigate towards
assets within an informed phylogenetic context. It allows for the search
process to be broadened or narrowed. In addition, visitors can use the
hierarchical collections to navigate from a broad or imprecise search
objective to an increasingly precise and targeted approach. This places
assets within a contextual environment. Site managers will be able to edit and
expand the hierarchical collection structure to provide expert guidance
to assets.
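Broadening and narrowing a search within a classification amounts to moving up or down a tree of taxa. The following is a minimal sketch under assumptions: the `PARENT` table, the function names, and the handful of taxa are chosen for illustration and do not reproduce CU*STAR or any other classification used by star*sites.

```python
from typing import Dict, List

# Child taxon -> parent taxon (a tiny illustrative fragment, not a real scheme).
PARENT: Dict[str, str] = {
    "Animalia": "Eukaryota",
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Aves": "Chordata",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root of the classification down to `taxon`."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def broaden(taxon: str) -> str:
    """Move one level up; the root is returned unchanged."""
    return PARENT.get(taxon, taxon)


def narrow(taxon: str) -> List[str]:
    """List the taxa one level down, in alphabetical order."""
    return sorted(child for child, parent in PARENT.items() if parent == taxon)


print(lineage("Mammalia"))
print(broaden("Mammalia"))
print(narrow("Chordata"))
```

A visitor who starts at Mammalia can broaden to Chordata and then narrow again to a sibling group, so the classification itself acts as the navigation aid.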
Working with local and distributed content, linkouts
The inclusiveness philosophy dictates that star*sites work with local and
distributed data. Access to distributed data is enabled by two means.
Firstly, we provide tools to allow the easy creation of other star*sites,
and tools can be used to synch the indices of different sites so that any
one site becomes a portal to some or all of the assets located elsewhere
within the star family. Secondly, we use a system of linkouts inspired
by NCBI. The linkouts will be assembled within a database, and star site
managers, and perhaps even users, will be able to decide which linkouts
are used in a given site at a given time. The linkouts are contextualised
in two ways. Linkouts are mapped against the classification scheme, so
that they only appear within specified clades or specified subsets of
taxa (e.g. only when visiting pages that deal with mammals or possibly
mammals of Australia). Linkouts are also tagged with metadata from an
open-ended controlled vocabulary. This limits the linkouts to ones appropriate
to the audience or the needs of users (the linkouts might, for example,
be limited to those appropriate to K-12 students or to those relating to
molecular data).
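The two kinds of context described above (a position in the classification and a set of vocabulary tags) can be combined into a single filter. This is a rough sketch, not the star*site implementation: the `Linkout` record, the `visible_linkouts` function, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Linkout:
    """A link to an external resource (hypothetical record layout)."""
    label: str
    clade: str             # the linkout applies to this taxon and its descendants
    tags: FrozenSet[str]   # terms drawn from a controlled vocabulary


def visible_linkouts(
    linkouts: Sequence[Linkout],
    lineage: Sequence[str],
    wanted_tags: FrozenSet[str],
) -> List[Linkout]:
    """Keep linkouts whose clade contains the current page and whose tags match.

    `lineage` lists the taxa from the root down to the page being viewed.
    """
    return [
        link
        for link in linkouts
        if link.clade in lineage and link.tags & wanted_tags
    ]


# Sample data, invented for illustration.
LINKOUTS = [
    Linkout("Mammal facts for schools", "Mammalia", frozenset({"K-12"})),
    Linkout("Mammal sequence data", "Mammalia", frozenset({"molecular"})),
    Linkout("Bird guide for schools", "Aves", frozenset({"K-12"})),
]

page_lineage = ["Eukaryota", "Animalia", "Chordata", "Mammalia"]
shown = visible_linkouts(LINKOUTS, page_lineage, frozenset({"K-12"}))
print([link.label for link in shown])
```

Only the schools-oriented mammal linkout survives both tests here: the bird guide fails the clade test and the sequence data fails the audience test.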
Quality control
Star*sites are intended to be expert biological knowledge environments, providing
resources for - among others - research biologists, educators and students,
decision makers, and those who seek to inform or even disinform us. It
is important that the quality (accuracy and completeness) of the information
within a star*site be explicit. We adopt two strategies to help protect
data quality. Firstly, star*sites will have the option of adding comments
boxes to every element of the databases, so that visitors can contribute
comments and these can be subsequently viewed by visitors. We advocate
that visibility of comments be controlled by the site manager. Secondly,
we are building a system that will permit three layers of quality 'judgement'
to be applied to content. The lowest level allows anyone to add anything
and no quality control is imposed. At the second level, content is vetted
through a secretariat for each site, and according to agreed guidelines,
the secretariat adds a seal of approval to the content. Finally, we will
provide tools which will permit panels of experts to evaluate, comment,
and edit content to the highest standards. Each web site will have software
switches which will hide content at lower levels of quality and/or show
the different levels of quality in different ways.
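The three quality layers and the 'switch' that hides lower layers can be sketched as an ordered enumeration plus a threshold. The names used here (`Quality`, `visible_items`, and the level labels) are assumptions made for illustration; the text does not say how the star*software names or stores these levels.

```python
from enum import IntEnum
from typing import List, Tuple


class Quality(IntEnum):
    """Three layers of quality judgement, lowest first (labels are hypothetical)."""
    OPEN = 1     # anyone may add anything; no quality control
    VETTED = 2   # approved by the site's secretariat
    EXPERT = 3   # evaluated by a panel of experts


def visible_items(items: List[Tuple[str, Quality]], minimum: Quality) -> List[str]:
    """Hide content below `minimum` and mark what remains with its level."""
    return [
        f"{text} [{level.name.lower()}]"
        for text, level in items
        if level >= minimum
    ]


# Sample content, invented for illustration.
items = [
    ("Unreviewed note", Quality.OPEN),
    ("Secretariat-approved description", Quality.VETTED),
    ("Expert-reviewed description", Quality.EXPERT),
]

shown = visible_items(items, Quality.VETTED)
print(shown)
```

Setting the threshold to the lowest level shows everything, while the bracketed label is one simple way of presenting the different levels differently.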
Options for using the star approach (collections, star*sites, star*nets)
Someone wishing to use the star*approach will have an array of choices. At the
minimalist level, they can identify the owner of a collection within a star*site
and ask to add assets to that collection. This is made possible through
tools (internet services) accessed at our toolbox
through a web browser. At the next level, an owner may wish to create a
collection within an existing star*site, and again does this using internet
services tools that can be accessed at our toolbox
through a web browser. At the next level, an owner may wish to create
a new star*site, and does this by downloading software that is accessed
at our toolbox through a web browser. The
new star*site manager will need to set up a server running Apache, MySQL
and PHP but our toolbox will provide help there too. Finally, a group
of experts may choose to create a network of co-operating star*sites. These
are referred to as star*nets, and the first of these is plankton*net.
Members of star*nets have a dedicated synching function which allows each
site to carry the contents of the indices of all other members. Each site
therefore becomes a portal to the content assembled by the larger community.
Coupled with our robust sustainability model, this should enable large
teams to build large-scale knowledge environments on the internet.
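The effect of synching within a star*net can be illustrated by merging indices. The sketch below assumes that an index maps a taxon name to the locations of assets about that taxon; this layout, the `synch` function, and the site names are hypothetical and are not a description of the actual synching function.

```python
from typing import Dict, Set

# An index maps a taxon name to the locations of assets about that taxon.
Index = Dict[str, Set[str]]


def synch(local: Index, *remote: Index) -> Index:
    """Return a new index combining the local index with those of other members.

    Only the index entries are copied; the assets themselves stay where they are.
    """
    merged: Index = {name: set(locations) for name, locations in local.items()}
    for index in remote:
        for name, locations in index.items():
            merged.setdefault(name, set()).update(locations)
    return merged


# Sample indices for two hypothetical member sites.
site_a: Index = {"Ceratium": {"site_a:asset/1"}}
site_b: Index = {
    "Ceratium": {"site_b:asset/7"},
    "Noctiluca": {"site_b:asset/9"},
}

portal = synch(site_a, site_b)
print(sorted(portal))
print(sorted(portal["Ceratium"]))
```

After the merge, site A lists assets held at site B alongside its own, which is the sense in which each member becomes a portal to the whole network.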
Sustainability
Web knowledge resources face considerable problems with sustainability. Many
sites have appeared, inspired by enthusiasm but often with a short-term
supply of support. When the enthusiasm diminishes or the support runs
out, the site stalls, and begins a process of degradation. Star*central
offers a starchiving service - in which not only the indices of star*sites
but also the content can be archived. This ensures that the content remains
accessible even after individual sites stop growing. Compliant with our
philosophy, any star*site will be empowered with the starchiving function
- thereby protecting the sustainability model against the collapse of
a single star*central.

Names and classifications
For the star*sites to realize their vision, the taxonomic systems that we
call upon must be populated with names. Star*sites work closely with the
uBio project. This project has been populating a structure called NameBank
with objective taxonomic information (names of organisms) and also maintains
a second database in which these are placed within alternative classifications.
At the time of writing, NameBank has access to about 1,700,000 names,
and a further 300,000 are waiting to be added. NameBank includes fairly
comprehensive generic coverage of all living organisms (several groups
of insects, some Cnidaria and a few other groups have not yet been covered).
This ensures that the compilation of names can be used to index assets
relating to any kind of organism. NameBank through ClassificationBank
places names within vying classification schemes. NameBank adds considerable
nomenclatural information, maps occurrences and uses of names, identifies
synonyms and alternative names such as colloquial names or misspellings
that may have found their way into the literature, internet or databases,
and finally helps navigate through the problems created by homonyms (names
spelled in the same way but referring to different entities). Star*sites
begin with the CU*STAR classification (being a comprehensive and unified
classification of all life), but managers of star*sites can switch to
other classifications. The CU*STAR classification is based on Patterson,
D. J. 1999. The diversity of eukaryotes. American Naturalist 154: S96-124.
(pdf). There will also be editing tools that
will allow users to build their own classifications. As star*sites grow,
new names that are needed to index assets are to be fed into the
uBio NameBank structure. Please do let us know if you would like to add
names. We will provide you with access to name-adding tools.
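The reconciliation of alternative names and the flagging of homonyms described above can be sketched with two small lookup tables. This is an illustration under assumptions, not the NameBank data model or API: the table names, the `resolve` function, and the few sample entries were chosen for this example.

```python
from typing import Dict, List

# Alternative name (lower-cased) -> accepted name. Sample entries only.
ALTERNATIVES: Dict[str, str] = {
    "horseshoe crab": "Limulus polyphemus",      # colloquial name
    "limulus polyphemis": "Limulus polyphemus",  # misspelling
}

# Names spelled the same way but referring to different entities.
HOMONYMS: Dict[str, List[str]] = {
    "Morus": ["Morus (plants: mulberries)", "Morus (birds: gannets)"],
}


def resolve(name: str) -> List[str]:
    """Return the candidate entities for `name`.

    One result means the name was reconciled to a single accepted name;
    several results mean the name is a homonym and the visitor must choose.
    """
    key = name.strip()
    if key in HOMONYMS:
        return list(HOMONYMS[key])
    return [ALTERNATIVES.get(key.lower(), key)]


print(resolve("Horseshoe crab"))
print(resolve("Morus"))
```

Returning a list in every case keeps the homonym situation explicit, so a site can ask the visitor which entity was meant rather than silently navigating to the wrong one.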
The classification system is an important part of the star*sites for many
reasons:
it is of intrinsic value because it reveals perceptions of relationships
it provides access to nomenclatural and other ancillary data at related sites
it provides a pre-formed framework around which assets, whether local or distributed, can be organized
it provides services for taxa for which we have no assets - such as outlinks
it allows for the reconciliation of synonyms, misspellings and other alternative names
it allows homonyms to be identified and flagged to avoid errors in navigation to information
it provides a framework around which other services can be built - such as an 'expert*ease' service which identifies expert taxonomists who know about the taxa in question
it allows clades or other collections of names to be annotated with generic notes
it provides the basis of a names registry system
etc.
CU*STAR
The CU*STAR classification offers a unified classification to ease navigation
from one taxonomic region to another. The classification has been placed
in a flexible and editable environment so that users can build their own
classifications. We seek to keep CU*STAR up to date with names from NameBank.
Different regions of CU*STAR are monitored by taxonomic experts - we refer
to them as custodians.

Biopedia
As most star*sites can be expected to include descriptions of taxa, we are
creating a species star*site called biopedia. Biopedia is intended to
be a communal resource where users can place basic information about organisms
(initially descriptions, but later descriptions together with images,
key literature, information about experts, and so on). Biopedia will be
connected to NameBank to pick up associated nomenclatural data and information
on synonyms. It is likely that biopedia will include links to other data
sources, but one model for growth involves atomizing the data to create
a super*matrix that will facilitate the discovery of identities of taxa
using matrix identification tools.
X:ID and matrix keys
Matrix keys are a very effective means of identifying organisms. Information
about the organism (characters and character states) can be entered into
a matrix. Organisms can then be found by a process of elimination (which
organisms live in Massachusetts, in marine environments, are one foot
long, have a hard shell, a spiky tail, and legs underneath). Examples
of software through which matrix keys can be built are Delta
(more for experts), Lucid, or X:ID. The latter will
be developed so that the entries on characters and taxa can be linked
to appropriate data in both NameBank and star*sites. This increases the
integration of data resources with tools for users. In addition, the development
of keys will feed 'atomised' information into a resource referred to as
the Super*matrix. Subsequent keys will have the option of using the matrix
information available within super*matrix.
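The process of elimination behind a matrix key can be shown in a few lines. This is a minimal sketch, not the way Delta, Lucid or X:ID work internally: the `MATRIX` layout, the `identify` function, and the small set of taxa and character states are assumptions made for illustration.

```python
from typing import Dict, List

# Taxon -> {character: state}. A tiny matrix, invented for illustration.
MATRIX: Dict[str, Dict[str, str]] = {
    "horseshoe crab": {"habitat": "marine", "shell": "hard", "tail": "spiky"},
    "American lobster": {"habitat": "marine", "shell": "hard", "tail": "fan-shaped"},
    "painted turtle": {"habitat": "freshwater", "shell": "hard"},
}


def identify(matrix: Dict[str, Dict[str, str]], observed: Dict[str, str]) -> List[str]:
    """Return the taxa that are not contradicted by any observed character state.

    A taxon with no recorded state for a character is kept, because missing
    data cannot rule it out.
    """
    return [
        taxon
        for taxon, states in matrix.items()
        if all(states.get(char, value) == value for char, value in observed.items())
    ]


print(identify(MATRIX, {"shell": "hard"}))
print(identify(MATRIX, {"habitat": "marine", "tail": "spiky"}))
```

Each added observation can only shrink the list of candidates, so the user may enter characters in whatever order is convenient until one taxon remains.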