Wednesday, July 29, 2009

Final steps

I'm finally wrapping up this project! I have now accessioned more than 500 of the images, and during this process, I have made quite a few recommendations for change. I requested that tags be added to various fields, primarily related to type of landscape and location, to assist users when searching for certain types of information. I have also discussed with Melissa the importance of matching the common names used in the Wasowskis' books with those used in the Wildflower Center's database. Currently, many of the common names the Wasowskis use are not in the Center's database. I thought that, for consistency, I should use the common names that the Wildflower Center already acknowledges, but Melissa wanted me to go with the names that the Wasowskis use. She indicated that they would be adding these common names into their database so that, ultimately, the two will match.

One other problem that has come up during the course of accessioning the first 500 images is where to draw a distinction between the landscape and the image. The accession form asks me to identify the habit and type of landscape of each image, and for some, the metadata for a particular image varies from that for the landscape it represents (e.g., if all the plants in the image are native, but not all the plants in the landscape are, should this image be cataloged as "native landscape"?). My instinct was to group all of the images in a particular setting together and to catalog these with the same landscape metadata. After a discussion with Melissa, however, I decided to tag images based on the content of the image itself rather than the landscape as a whole. This was a difficult decision to make, and I am not sure that there is one right answer to this problem. As the database itself has not yet been expanded to accommodate landscape images (the Wildflower Center is still working on developing a web presence/portal for them), we do not yet know exactly how people will be using this collection. However, we speculated that people would not be searching for all of the images of a particular landscape (and if they are, they will be able to search for this landscape by name and pull up all images of it). Thus, ultimately, we thought it would be most helpful to catalog the items independently.

Lastly, we discussed how the landscape images would appear on the website. Melissa stated that when someone searches for a plant in the database, a link for landscape images will appear below the botanical images. I suggested that it would also be a good idea to create a link from the homepage to the landscape collection, as there is for the botanical images. In addition, I stated that I thought it would be best if - eventually - the collection is searchable by both keywords and tags.

Friday, July 10, 2009

Accessioning

Having completed all of the scanning and image processing, I am now ready to move on to the last phase of my project - accessioning the images and uploading them to the database. This involves not just researching the images to uncover important metadata that will be critical to helping users find them but also helping the Wildflower Center develop a structure for the accession form. The Wasowski slides are the first landscape images that the Center has acquired, and because I am the person most familiar with the content of those images, I have insight into what kind of data needs to be collected and what the most reasonable structure for collecting that data is.

The form captures some basic information for every image (e.g., title, photographer, location), but there are also a number of sections whose relevance varies depending on the content of the slide (e.g., botanical or wildlife information). Some of the items, such as title, location, and shot details, are text-entry fields, but others, such as habitat and wildlife type, were set up as drop-down boxes. Melissa asked me to spend a few hours doing some trial accessions, to see what, if any, problems I could find with the data-collection process.

One of the biggest issues I saw was developing a unique title for each image. The title needs to be as descriptive as possible while staying under approximately 10 words, in order to increase the possibility that users will find the collection while conducting a web search. The problem is that many of the images are similar enough that developing a distinctive title for each one could be challenging. I discussed this issue with Melissa, and she agreed that we needed to develop some guidelines for titling similar items.
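The two constraints on titles described above (a rough 10-word ceiling and uniqueness across the collection) could be checked automatically before upload. What follows is only a minimal sketch: the function name, the case-insensitive comparison, and treating 10 as a hard limit are all assumptions, not part of the Wildflower Center's actual process.

```python
from collections import Counter

MAX_WORDS = 10  # assumption: treat "approximately 10 words" as a hard ceiling


def title_problems(titles, max_words=MAX_WORDS):
    """Return (too_long, repeated) for a list of proposed image titles.

    too_long: titles with more than max_words whitespace-separated words.
    repeated: lowercased titles that occur more than once (case-insensitive).
    """
    too_long = [t for t in titles if len(t.split()) > max_words]
    counts = Counter(t.strip().lower() for t in titles)
    repeated = sorted(t for t, n in counts.items() if n > 1)
    return too_long, repeated
```

A check like this only flags candidates; deciding how to distinguish two similar images would still depend on the titling guidelines.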

Another problem I encountered was specifically related to how the landscape type was defined. For this category of data, there were a number of options to choose from: native, non-native, mixed, invasive, natural, planned, and cultivated. I wasn't entirely sure how some of these options were different from one another, and I wasn't sure that users would be clear on that either. I did some research and found that the Wildflower Center had defined some of these terms differently than other organizations or individuals, including the Wasowskis. I suggested to Melissa that we might want to provide more descriptive information about these terms, so that both users and Wildflower Center staff members would be clear on how they are being used.

One final problem that I found was that there were a number of individuals who were in the images but whose names were not in the Center's list of standardized names. Most of these individuals were only in one picture, so this likely won't be a problem for this project. Nonetheless, if in the future additional images featuring these individuals are accessioned, the Center will need to have a standardized way of referring to them.

Friday, June 26, 2009

Image Processing

I have nearly completed the scanning process and have begun processing the images. The Wildflower Center wants to publish the images in four different sizes: 640 x 480, 320 x 240, 160 x 120, and 80 x 60. This means I have to convert each tiff to a jpeg and then size the jpegs four different times, saving each resized image in a different folder. I also reorient the pictures as necessary and do a minor amount of photo retouching, which usually includes readjusting the image to Photoshop's auto levels and marginally altering the brightness, contrast, and color saturation levels. This is a time-consuming process, as each photo must be retouched individually. The conversion and resizing of the images, however, can be completed in batches. I used the Action tool in Photoshop to record each of these processes, and I have been running the photos through each step 100 at a time.
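The batch part of this workflow (one JPEG per target size, each size in its own folder, 100 images per run) can be sketched in code. The sketch below only plans the output paths and batches; it does not do the resizing itself, which I handled with Photoshop actions. The folder names and the helper names are hypothetical, and it assumes every image is landscape-oriented.

```python
from pathlib import PurePosixPath

# Target sizes named in the post, as (width, height) in pixels.
SIZES = [(640, 480), (320, 240), (160, 120), (80, 60)]
BATCH_SIZE = 100  # images processed per run


def batches(items, size=BATCH_SIZE):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def output_paths(tiff_name, root="jpeg"):
    """Map one TIFF file name to one JPEG path per target size.

    The layout root/<width>x<height>/<name>.jpg is a hypothetical convention.
    """
    stem = PurePosixPath(tiff_name).stem
    return [PurePosixPath(root, f"{w}x{h}", f"{stem}.jpg") for w, h in SIZES]
```

For example, `output_paths("0001.tif")` yields four paths, one under each size folder, and `batches(names)` yields the groups of up to 100 files to run through each step.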

As far as scanning goes, I have just one binder left, and as this binder appears to be primarily duplicates, I plan to save this until the end, when I am even more familiar with the images in this collection.

Wednesday, June 10, 2009

Scanning

Last Thursday I started the scanning process for the first binder of slides. This involves using the Wildflower Center's slide scanner, which can hold about 50 slides at once, but scans at a rate of 1-2 minutes per slide. To prepare for scanning, I marked each slide with a yellow dot in the top right corner, so that I could easily put each slide back in its place without having to hold the slide up to the light to determine its orientation. I also wrote the slide label number on each slide pocket so that, when returning the slides to the binder, there would be no confusion about which slide belonged in which pocket.

The scanning process was extremely slow, partly because of the speed of the scanner and partly because the scanner jammed frequently. On the first day of scanning, I was only able to complete 150 slides. While the slides were being scanned, I worked on transcribing the metadata from the slide sheets and the slides themselves (many were labeled with location, date, or other information) into the spreadsheet that I created last Tuesday. In addition to the required fields that the spreadsheet already contained, I added fields for location (e.g., garden or property name) and location notes (i.e., any further information related to location that was written on the slide sheet or on the slide itself), as well as date.
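For illustration, the tracking spreadsheet with the added fields could be written out as a CSV file. This is a minimal sketch: the column identifiers are my own renderings of the field names mentioned in these entries, and the function name is hypothetical.

```python
import csv
import io

# Original tracking fields plus the three added during scanning.
# The exact column identifiers are assumptions.
FIELDS = [
    "slide_number", "file_name", "orientation", "collection",
    "photographers", "location", "location_notes", "date",
]


def write_tracking_sheet(rows):
    """Return CSV text for a list of dicts; missing fields are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Leaving missing values blank matches the fact that only some slides carry a location or date label.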

So far, I have finished scanning the first binder and organizing the first two binders and part of the third. At this point, there are a little more than 400 distinct slides that need to be added to the database.

Wednesday, June 3, 2009

First Day

Yesterday marked the first day of work for me at the Lady Bird Johnson Wildflower Center. I will be spending two days per week there until my capstone project is completed. At the end of this project, I will have uploaded all of the Wasowski Native Texas Landscape slides to the Wildflower Center's image database, and each image will be accompanied by appropriate metadata.

The Wildflower Center already has an extensive image collection on their website, and for the most part, I will be following the process that they have already tested and agreed upon to complete this project. However, as the Wasowski slides are the first landscape images that the Wildflower Center has acquired, I will be expected to tweak this process to meet my needs and to offer suggestions for improvements that could be made to the process and, in particular, the types of metadata collected.

Melissa, the Wildflower Center's librarian and my field supervisor, having been through this process before, suggested that it would be easier for me to determine an appropriate organization scheme for the images before attempting to scan them. To this end, I spent my first day familiarizing myself with the contents of the four large binders that currently house the slides; developing an organization scheme; and numbering, labeling, and physically rearranging the slides. Melissa left all decisions about how to organize the slides up to me.

There are an estimated 1,700 slides in this collection; however, as I discovered yesterday, nearly two-thirds of them appear to be duplicates or near duplicates. In order to make the scanning process as efficient as possible, I elected to remove all duplicates from the original binders and organize them into new binders. Each duplicate was labeled with the number of the slide that it is a copy of. Near duplicates were labeled with the same number along with an additional identifying character (i.e., a, b, c, etc.) to indicate that, although they were too similar to another slide to be included in the collection, they were in fact distinct images.
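The labeling scheme above can be expressed as a small sketch. It assumes slide numbers are plain integers and that no slide has more than 26 near duplicates; the function names are hypothetical.

```python
import string


def duplicate_label(original_number):
    """An exact duplicate carries the same number as the slide it copies."""
    return str(original_number)


def near_duplicate_labels(original_number, count):
    """Near duplicates carry the original number plus a, b, c, and so on."""
    letters = string.ascii_lowercase
    if count > len(letters):
        raise ValueError("more near duplicates than available letters")
    return [f"{original_number}{letter}" for letter in letters[:count]]
```

Under these assumptions, three near duplicates of slide 17 would be labeled 17a, 17b, and 17c, while an exact copy would simply be labeled 17.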

Because of the large number of slides in the collection, as well as the number of duplicates that needed to be verified and organized independently of the originals, I was only able to organize one of the binders yesterday. At the end of the day, I created a spreadsheet to track some of the information about the slides I had already organized (i.e., slide number, file name, orientation, collection, and photographers). My next step will be to scan the slides I have now organized.