This page contains technical information about the image retrieval
system used by CalPhotos. For more general information,
please click on one of the links on the left-hand sidebar of
this page.
A Quick Tour of the Image Database & Retrieval System
- About the Image Collection
- This page briefly describes each of the
photo collections, shows a sample photo for each collection, gives a count
of the total photos in each collection (updated nightly),
and has links to sample queries.
- Query & Browse CalPhotos
- This is the query form for composing database queries to
CalPhotos images. There are links to other query forms for
specific collections, such as
plants only, animals only, etc. These also contain browse lists
such as the list of mammals by common name.
These browse lists are generated automatically and updated nightly.
- Photo upload system
- This is the online form that registered photographers use
to upload new images directly into the CalPhotos database. We
have over 400 registered photographers, and we receive hundreds of new photos every
month through this system.
Also see information about Contributing Photos to CalPhotos
and the Photographers' database.
- Photo annotation system
- This page describes the system that allows registered CalPhotos
reviewers to comment on, or change the taxon of, a photo they
believe was incorrectly identified by the photographer. This page
links to a list of currently registered reviewers as well as examples
of photos that have been annotated.
Overview of the System
The Images
For the first few years of the project, most
of the images in this collection were originally
slides that were processed by a photo lab and
provided to the Digital Library Project on Kodak PhotoCD. This
technology provides up to six resolutions of each image in Kodak's
PCD format. The images come about 100
to a CD and we typically process them in batches of several hundred
at a time. We copy the PCD images onto a storage
device and we convert selected resolutions to JPEG format for
web browsing. Currently we use the 128x192-pixel size for our
browsing pages and the 512x768 size for enlargements. Some
photographers have requested that we display enlargements no
bigger than 256x384, so this size is also stored for most of
the images.
The JPEGs are stored on disk in a UNIX filesystem by
the unique 16-digit PhotoCD number. This number is also
stored in the database so that images can be found and displayed
at query time.
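As a rough illustration of this lookup, the following Python sketch maps a 16-digit photo number to a file path for one of the stored sizes. The root directory, the one-directory-per-size layout, and the function names are assumptions made for the example; the actual on-disk layout is not described here.

```python
from pathlib import PurePosixPath

# Assumed root directory and layout: <root>/<size>/<16-digit number>.jpeg
IMAGE_ROOT = PurePosixPath("/data/photos")

THUMBNAIL = "128x192"   # size used on browsing pages
MEDIUM = "256x384"      # cap requested by some photographers
LARGE = "512x768"       # default enlargement size


def normalize_photo_id(photo_id):
    """Remove spaces and check that exactly 16 ASCII digits remain."""
    digits = photo_id.replace(" ", "")
    if len(digits) != 16 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a 16-digit photo number: {photo_id!r}")
    return digits


def enlargement_size(restricted):
    """Pick the enlargement size, honoring a photographer's size cap."""
    return MEDIUM if restricted else LARGE


def image_path(photo_id, size=THUMBNAIL):
    """Build the (hypothetical) path of the JPEG for a photo number and size."""
    return IMAGE_ROOT / size / f"{normalize_photo_id(photo_id)}.jpeg"
```

For example, `image_path("0000 0000 0101 0001")` yields a thumbnail path, and passing `enlargement_size(restricted=True)` as the size selects the 256x384 copy instead of the 512x768 one.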
For more information about usage, contributors, and photographers, please
follow the About the Photos link on the left sidebar of this page.
The Data Model
In addition to the images themselves, we also store
descriptive information for each image, which is provided
by the photographer or institution that
contributed the photo. For example, the
minimum we look for on our flora and fauna pictures is a taxonomic name
for each photo, and the date and location where it was taken.
The data model for CalPhotos is here (PDF file). This model was first developed in 1993 for a research project
that used Postgres to store environmental information. You can read more about the
history here.
Click here to see the database schema that we use for
the main Photos table.
Data Ingest
Originally, for large photo collections such as Brousseau and Cal. Academy, we uploaded
500-1500 new images at a time. Images came to us on PhotoCD and we wrote
programs to convert them to JPEG and store them on disk. We were given
a delimited text dump of the collection's database for the new photos, usually from a PC-based
database such as FileMaker Pro or Microsoft Access. We'd process it, verify the
data, add new data, and reformat it into a loading file for Informix.
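The batch conversion described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the column names, the tab-delimited input, and the pipe-delimited output lines are all hypothetical, and the original programs are not shown on this page.

```python
import csv
import io

# Hypothetical column names; the minimum we look for is a taxonomic name,
# a date, and a location, plus the photo number that ties text to image.
REQUIRED = ("photo_id", "taxon", "date", "location")


def convert_dump(dump_text, delimiter="\t"):
    """Turn a delimited text dump (with a header row) into load-file lines.

    Returns (load_lines, errors). Rows missing a required value are not
    loaded; they are reported as (line_number, message) for manual review.
    """
    reader = csv.DictReader(io.StringIO(dump_text), delimiter=delimiter)
    load_lines, errors = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [c for c in REQUIRED if not (row.get(c) or "").strip()]
        if missing:
            errors.append((line_no, "missing " + ", ".join(missing)))
            continue
        # Assumed output format: pipe-delimited fields with a trailing pipe.
        # Pipes inside values are replaced so they cannot split a field.
        fields = [row[c].strip().replace("|", " ") for c in REQUIRED]
        load_lines.append("|".join(fields) + "|")
    return load_lines, errors
```

Keeping rejected rows in a separate error list, rather than dropping them silently, leaves room for the manual verification step mentioned above.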
Since 2001, all photos have been uploaded
using our web-based upload system.
This system allows photographers who have
registered with CalPhotos
to submit new photos directly into the database. Web forms are used to upload one
JPEG photo along with descriptive information. If the photographer
is uploading more than one photo, the text information can be carried over
from photo to photo so that re-typing isn't necessary, and data that can
be retrieved from the database is filled in automatically, such as the
photographer's copyright information, and the state, country, and continent
if a California county is provided. Currently, around 2,000 new photos
are added each month by photographers using the photo
upload system. Additional information
about this system is available
here.
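The carry-over and automatic fill-in behavior can be illustrated with a short Python sketch. The field names, the small lookup tables standing in for the database, and the sample photographer are hypothetical and chosen only for the example.

```python
# Hypothetical fields carried over from one uploaded photo to the next.
CARRY_OVER = ("photographer", "date", "location", "county")

# Stand-ins for database lookups (assumed sample data).
CALIFORNIA_COUNTIES = {"Alameda", "Marin", "Napa"}
CALIFORNIA_DEFAULTS = {
    "state": "California",
    "country": "USA",
    "continent": "North America",
}
COPYRIGHTS = {"A. Example": "(c) 2001 A. Example"}


def prefill(previous, entered):
    """Build the record for the next photo in an upload session.

    previous: the record of the photo just uploaded, or None for the first.
    entered:  the values typed into the form for this photo.
    """
    record = {}
    if previous:
        record.update({k: previous[k] for k in CARRY_OVER if k in previous})
    # Values typed for this photo override anything carried over.
    record.update({k: v.strip() for k, v in entered.items() if v and v.strip()})
    # Fill in what can be looked up, without overwriting typed values.
    if record.get("photographer") in COPYRIGHTS:
        record.setdefault("copyright", COPYRIGHTS[record["photographer"]])
    if record.get("county") in CALIFORNIA_COUNTIES:
        for key, value in CALIFORNIA_DEFAULTS.items():
            record.setdefault(key, value)
    return record
```

In this sketch the taxon is deliberately not carried over, on the assumption that it usually differs from photo to photo, while typed values always take precedence over carried-over or looked-up ones.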
For both ingest methods, our scripts verify data
to the extent possible; for example,
the taxon provided by the photographer is looked up at ITIS to
check for spelling and usage. See CalPhotos References for a list of online sources we use.
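A spelling check of this kind can be sketched in Python as below. The sketch does not contact ITIS; it compares a submitted name against a small local list, which stands in for the online lookup. The list contents, the function name, and the similarity cutoff are assumptions made for the example.

```python
import difflib

# Stand-in for the online lookup: a small local list of accepted names.
KNOWN_TAXA = ["Quercus agrifolia", "Sequoia sempervirens", "Ursus americanus"]


def check_taxon(name, known=KNOWN_TAXA):
    """Return (is_known, suggestions) for a submitted scientific name.

    The name is normalized first: extra whitespace is collapsed, the genus
    is capitalized, and the remaining words are lowercased.
    """
    parts = name.split()
    if not parts:
        return False, []
    cleaned = " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])
    if cleaned in known:
        return True, [cleaned]
    # Offer near matches so a likely misspelling can be corrected by hand.
    return False, difflib.get_close_matches(cleaned, known, n=3, cutoff=0.8)
```

A name that is not found is flagged with suggestions rather than rejected outright, since a new or unusual name may still be valid.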
The Retrieval System
The system we use was developed by the Digital Library Project
and currently uses the Perl module
DBI,
the MySQL database, and our own
in-house scripts. We do not store the images in the database.
Each database record contains a 16-digit photo ID
that maps to the actual photo stored on disk.
An HTML query form calls a program that
builds an SQL (Structured Query Language) query for the MySQL database,
runs the query, and processes the results, creating a new web page
to display the pictures that matched.
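The flow from form to query to results page can be sketched in Python as follows. This is not the in-house Perl code: the sketch uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, column names, and image URL prefix are assumptions made for the example.

```python
import html
import sqlite3

# Hypothetical searchable columns. Only these form fields reach the SQL text.
SEARCHABLE = {"taxon", "county", "photographer"}


def build_query(form):
    """Translate submitted form fields into SQL text plus bound parameters."""
    clauses, params = [], []
    for field in sorted(SEARCHABLE):
        value = (form.get(field) or "").strip()
        if value:
            clauses.append(f"{field} LIKE ?")   # value is bound, not pasted in
            params.append(f"%{value}%")
    sql = "SELECT photo_id, taxon FROM photos"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY photo_id", params


def results_page(conn, form):
    """Run the query and return an HTML list of matching thumbnails."""
    sql, params = build_query(form)
    items = []
    for photo_id, taxon in conn.execute(sql, params):
        # The database row holds only the photo ID; the image is a file.
        src = html.escape(f"/imgs/128x192/{photo_id}.jpeg")
        label = html.escape(taxon)
        items.append(f'<li><img src="{src}" alt="{label}"> {label}</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Column names come from a fixed set and user-supplied values are passed as bound parameters, so form input never becomes part of the SQL text itself.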
See also:
How to replicate this image database
We have designed our retrieval system for ease of interoperation
with related projects. Customized queries can be created and encoded
in a URL for use on others' web pages and programs. For instructions
on how to do this, please see How to Link to the Photos.
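To illustrate the general idea of encoding a query in a URL, here is a minimal Python sketch. The base URL and the parameter names are placeholders, not the actual CalPhotos query interface; the real parameters are documented in How to Link to the Photos.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the real query script's address.
BASE_URL = "https://example.org/cgi/img_query"


def query_url(**criteria):
    """Encode non-empty search criteria as a query string on BASE_URL.

    Parameter names are hypothetical. Keys are sorted so that the same
    criteria always produce the same URL.
    """
    params = {key: value for key, value in sorted(criteria.items()) if value}
    return BASE_URL + "?" + urlencode(params)
```

A link built this way, for example `query_url(taxon="Quercus agrifolia", county="Marin")`, can be placed on another site's page so that following it runs the query and shows the matching photos.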
How to Replicate this System
In order to set up your own image retrieval system that works the way ours does, you'll need:
Software & Hardware
- disk space for storing the images (UNIX file system)
- a UNIX-based relational database (MySQL, Informix, Sybase, Oracle, etc.)
- a UNIX-based web server (we use Apache)
- Perl 5.0 or higher
- DBI and the DBD for your database
- CalPhotos scripts for image retrieval
Data
- JPEG images: one 128x192 thumbnail plus an enlargement
- applicable text for the database record