About the Image Retrieval System

CalPhotos: About the Image Retrieval System

This page contains technical information about the image retrieval system used by CalPhotos. For more general information, please click on one of the links on the left-hand sidebar of this page.

A Quick Tour of the Image Database & Retrieval System
Overview of the System
How to Replicate this System
History of CalPhotos

A Quick Tour of the Image Database & Retrieval System

About the Image Collection: This page briefly describes each of the photo collections, shows a sample photo for each collection, gives the total number of photos in each collection (updated nightly), and has links to sample queries.
Query & Browse CalPhotos: This is the query form for composing database queries to CalPhotos images. There are links to other query forms for specific collections, such as plants only, animals only, etc. These also contain browse lists such as the list of mammals by common name. These browse lists are generated automatically and updated nightly.
Photo upload system: This is the online form that registered photographers use to upload new images directly into the CalPhotos database. We have over 400 photographers registered and we receive hundreds of new photos every month using this system. Also see information about Contributing Photos to CalPhotos and the Photographers' database.
Photo annotation system: This page describes the system that allows registered CalPhotos reviewers to comment on photos, or to change the taxon of a photo they believe was incorrectly identified by the photographer. This page links to a list of currently registered reviewers as well as examples of photos that have been annotated.

Overview of the System

The Images

For the first few years of the project, most of the images in this collection were originally slides that were processed by a photo lab and provided to the Digital Library Project on Kodak PhotoCD. This technology provides up to six resolutions of each image in Kodak's PCD format. The images come about 100 to a CD, and we typically process them in batches of several hundred at a time. We copy the PCD images onto a storage device and convert selected resolutions to JPEG format for web browsing. Currently we use the 128x192-pixel size for our browsing pages and the 512x768 size for enlargements. Some photographers have requested that we display enlargements no bigger than 256x384, so this size is also stored for most of the images. The JPEGs are stored on disk in a UNIX filesystem, named by the unique 16-digit PhotoCD number. This number is also stored in the database so that images can be found and displayed at query time.
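
The file layout described above can be sketched roughly as follows. This is an illustration in Python, not the project's actual code: the storage root, the size labels, the directory-per-size layout, and the function name are all assumptions. Only the 16-digit ID and the three pixel sizes come from the text.

```python
from pathlib import PurePosixPath

# Pixel sizes named in the text; the labels used as keys are hypothetical.
SIZES = {
    "thumbnail": "128x192",
    "medium": "256x384",
    "enlargement": "512x768",
}

IMAGE_ROOT = PurePosixPath("/data/images")  # hypothetical storage root


def image_path(photo_id: str, size: str = "thumbnail") -> PurePosixPath:
    """Return the assumed on-disk JPEG path for a 16-digit photo ID."""
    digits = photo_id.replace(" ", "")  # tolerate IDs written in groups
    if len(digits) != 16 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected a 16-digit photo ID, got {photo_id!r}")
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}")
    return IMAGE_ROOT / SIZES[size] / f"{digits}.jpg"
```

Because the path is derived entirely from the ID, the database only needs to store the ID itself, not a file location.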

For more information about usage, contributors, and photographers, please follow the About the Photos link on the left sidebar of this page.

The Data Model

In addition to the images themselves, we also store descriptive information for each image, which is provided by the photographer or institution that contributed the photo. For example, the minimum we look for on our flora and fauna pictures is a taxonomic name for each photo, and the date and location where it was taken.

The data model for CalPhotos is here (PDF file). This model was first developed in 1993 for a research project that used Postgres to store environmental information. You can read more about the history here. Click here to see the database schema that we use for the main Photos table.

Data Ingest

Originally, for large photo collections such as Brousseau and Cal. Academy, we uploaded 500-1500 new images at a time. Images came to us on PhotoCD and we wrote programs to convert them to JPEG and store them on disk. We were given a delimited text dump of the collection's database for the new images, usually from a PC-based database such as FileMaker Pro or Microsoft Access. We'd process it, verify the data, add new data, and reformat it into a loading file for Informix.

Since 2001, all photos have been uploaded using our web-based upload system. This system allows photographers who have registered with CalPhotos to submit new photos directly into the database. Web forms are used to upload one JPEG photo along with descriptive information. If the photographer is uploading more than one photo, the text information can be carried over from photo to photo so that re-typing isn't necessary. Data that can be retrieved from the database is filled in automatically, such as the photographer's copyright information, and the state, country, and continent if a California county is provided. Currently, around 2,000 new photos are added each month by photographers using the photo upload system. Additional information about this system is available here.
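
The carry-over and automatic fill-in behavior described above might look something like the sketch below. It is written in Python for illustration only; the field names, the county list, and both function names are assumptions, and a real system would read the county table from the database rather than hard-code it.

```python
# Hypothetical lookup table with a few sample counties.
CALIFORNIA_COUNTIES = {"Alameda", "Marin", "Mono"}


def fill_location(record: dict) -> dict:
    """Return a copy of record with any missing state, country, and
    continent filled in when the county is a known California county."""
    filled = dict(record)
    if filled.get("county") in CALIFORNIA_COUNTIES:
        filled.setdefault("state", "California")
        filled.setdefault("country", "USA")
        filled.setdefault("continent", "North America")
    return filled


def carry_over(previous: dict, new_fields: dict) -> dict:
    """Start the next photo's record from the previous one, overriding
    only the fields the photographer re-entered."""
    return {**previous, **new_fields}


first = fill_location({"taxon": "Quercus agrifolia", "county": "Marin"})
second = carry_over(first, {"taxon": "Quercus lobata"})
```

Here `second` keeps the county, state, country, and continent from `first`, and only the taxon changes.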

For both ingest methods, our scripts verify data to the extent possible; for example, the taxon provided by the photographer is looked up at ITIS to check for spelling and usage. See CalPhotos References for a list of online sources we use.
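
As a rough illustration of this kind of check, the Python sketch below compares a submitted taxon against a reference list and suggests close matches for likely misspellings. The in-memory list is a hypothetical stand-in for an authority such as ITIS; it does not show how ITIS is actually queried, and the function name and similarity cutoff are assumptions.

```python
import difflib

# Hypothetical stand-in for an authority list; not real ITIS output.
KNOWN_TAXA = ["Quercus agrifolia", "Quercus lobata", "Sequoia sempervirens"]


def check_taxon(name: str, known=KNOWN_TAXA):
    """Return (is_valid, suggestions) for a submitted taxon name."""
    normalized = " ".join(name.split())  # collapse stray whitespace
    if normalized in known:
        return True, []
    # Offer near matches so the submitter can correct a spelling error.
    suggestions = difflib.get_close_matches(normalized, known, n=3, cutoff=0.8)
    return False, suggestions
```

A name that is neither valid nor close to any known name comes back with an empty suggestion list, which a script could flag for manual review.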

The Retrieval System

The system we use was developed by the Digital Library Project and currently uses the Perl module DBI, the MySQL database, and our own in-house scripts. We do not store the images in the database. Each database record contains a 16-digit photo ID that maps to the actual photo stored on disk. An HTML query form calls a program that builds an SQL (Structured Query Language) query for the MySQL database, runs the query, and processes the results, creating a new web page to display the pictures that matched. See also: How to replicate this image database
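
The flow from query form to results page can be sketched as follows. This is not the CalPhotos code: it uses Python with the standard sqlite3 module as a stand-in for Perl DBI and MySQL, and the table layout, column names, image URL path, and function names are all assumptions made for the example.

```python
import html
import sqlite3

SEARCHABLE = ("county", "photographer", "taxon")  # hypothetical columns


def build_query(form: dict):
    """Turn submitted form fields into a parameterized SQL query."""
    clauses, params = [], []
    for field in SEARCHABLE:  # ignore any field not on the allow-list
        value = form.get(field)
        if value:
            clauses.append(f"{field} = ?")
            params.append(value)
    sql = "SELECT photo_id, taxon FROM photos"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY photo_id"
    return sql, params


def render_results(rows) -> str:
    """Build an HTML list that points each record at its thumbnail on disk."""
    items = [
        f'<li><img src="/images/128x192/{html.escape(photo_id)}.jpg" '
        f'alt="{html.escape(taxon)}"></li>'
        for photo_id, taxon in rows
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def search(conn, form: dict) -> str:
    sql, params = build_query(form)
    return render_results(conn.execute(sql, params).fetchall())


# Demo database holding metadata only; the images themselves stay on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos (photo_id TEXT PRIMARY KEY, taxon TEXT, "
    "county TEXT, photographer TEXT)"
)
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?, ?)",
    [
        ("0000000001010001", "Quercus agrifolia", "Marin", "A. Example"),
        ("0000000001010002", "Sequoia sempervirens", "Marin", "A. Example"),
    ],
)
page = search(conn, {"taxon": "Quercus agrifolia"})
```

Passing the form values as query parameters, rather than pasting them into the SQL string, keeps user input from being interpreted as SQL.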

We have designed our retrieval system for ease of interoperation with related projects. Customized queries can be created and encoded in a URL for use on others' web pages and programs. For instructions on how to do this, please see How to Link to the Photos.
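
As a general illustration of encoding a query in a URL, the Python sketch below builds a link from search criteria. The endpoint and parameter names are placeholders, not the real CalPhotos ones; the How to Link to the Photos page is the authority on the actual format.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org/cgi/img_query"  # hypothetical endpoint


def query_url(**criteria) -> str:
    """Encode non-empty search criteria as a URL for use on another page."""
    # Sort the pairs so the same criteria always yield the same URL.
    pairs = sorted((key, value) for key, value in criteria.items() if value)
    return f"{BASE_URL}?{urlencode(pairs)}"


link = query_url(taxon="Quercus agrifolia", county="Marin")
```

`urlencode` takes care of escaping spaces and other reserved characters, so the resulting link can be pasted directly into an `href` attribute.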

How to Replicate this System

In order to set up your own image retrieval system that works the way ours does, you'll need:

Software & Hardware

disk space for storing the images (UNIX file system)
a UNIX-based relational database (MySQL, Informix, Sybase, Oracle, etc.)
a UNIX-based web server (we use Apache)
Perl 5.0 or higher
DBI and the DBD for your database
CalPhotos scripts for image retrieval

Data

JPEG images: one 128x192 thumbnail plus an enlargement
applicable text for the database record

CalPhotos is a project of BNHM, University of California, Berkeley

Questions and Comments
this page last updated: Sep 18, 2007