Digital Libraries and Open Source

In reading from our text, Modern Information Retrieval, this week, we learned of multiple types of software systems used in the creation of digital libraries, specifically:

  • Greenstone
  • Fedora
  • Eprints
  • DSpace
  • Open Digital Libraries (ODL)
  • 5S Suite

Several of these, namely DSpace and Fedora, are referenced as open source software.  Open source software is typically available for free and provides users with the source code, which can be changed based on the user's needs (Cherukodan, Kumar, & Kabir, 2013).  The use of open source software provides many advantages to libraries.  These advantages include:

  • Monetary savings, as the software is typically free
  • Flexibility to solicit technical support from a variety of sources, instead of only one vendor if the software was proprietary
  • Sharing of responsibility in solving information system accessibility issues among the various communities that use the open source software

In 2003, the Cochin University of Science & Technology (CUSAT) in India established the CUSAT Digital Library (CDL) using DSpace open source software.  A case study was conducted to “understand digital library design and development using DSpace open source software in a university environment with a focus on the analysis of distribution of items and measuring the value by usage statistics” (Cherukodan, Kumar, & Kabir, 2013).  The study used Google Analytics to measure usage of the digital library.

The writers of the study spent some time discussing DSpace, which they describe as the most popular open source software used for digital libraries.  They mentioned several of the other types of software referenced in our text, Eprints, Fedora, and Greenstone, but went on to explain why CUSAT chose DSpace over the others. 

MIT Libraries and HP Labs developed DSpace in 2002.  It uses the Java programming language and supports interoperability through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).  DSpace was chosen by CUSAT because it offered “the best search and browsing support as well as good support for metadata” and provided “more power to administrators to put restrictions at collection level.”  Active online community support through the DSpace mailing list and DSpace wiki was also a factor in the choice of DSpace.
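To get a feel for what OAI-PMH interoperability looks like in practice, here is a minimal Python sketch of harvesting Dublin Core metadata from a repository's OAI-PMH endpoint.  The repository URL is hypothetical; the Identify and ListRecords verbs and the oai_dc prefix come from the protocol itself.

```python
import requests
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint (DSpace sites commonly expose one under /oai/request)
BASE_URL = "https://dspace.example.edu/oai/request"

# Ask the repository to describe itself (OAI-PMH "Identify" verb)
identify = requests.get(BASE_URL, params={"verb": "Identify"})
print(identify.status_code)

# Harvest Dublin Core records (OAI-PMH "ListRecords" verb with the "oai_dc" metadata prefix)
resp = requests.get(BASE_URL, params={"verb": "ListRecords", "metadataPrefix": "oai_dc"})
root = ET.fromstring(resp.content)

# Titles live under the standard OAI-PMH and Dublin Core namespaces
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}
for record in root.findall(".//oai:record", NS):
    title = record.find(".//dc:title", NS)
    if title is not None:
        print(title.text)
```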

Google Analytics was employed to determine the usage of the digital library after its creation.  The study period was from January 2009 through September 2009.  There were 10,346 individuals who visited the site from 78 countries.  There were 23,722 page visits, which may not sound like many until you consider that the CDL contained only 2,312 items.  This study is touted as being the first of its type.

I think the advantages referenced above provide a compelling argument for the use of open source software in the creation of digital libraries.  I found an interesting website, Free/Open Source Software for Libraries (FOSS4LIB), that helps library staff explore such questions as “Is open source software the right choice for my library?” and if so, what open source software package can meet the specific needs of the library.  There are free webinars available to learn more about FOSS4LIB and how to best take advantage of open source software solutions. 

 

Cherukodan, S., Kumar, G.S., & Kabir, S.H. (2013). Using open source software for digital libraries: A case study of CUSAT. The Electronic Library, 31(2), 217-225.

Multimedia and XML in Today’s Libraries

Multimedia is digital data found in a variety of formats, including text, pictures, graphs, images, videos, animations, sound, music, and speech (Baeza-Yates & Ribeiro-Neto, 2011). XML is a mark-up language used by libraries to allow retrieval of multimedia data. XML allows the creation and formatting of document markups. It allows structured data to be put into a text file, thus allowing easier information retrieval. In its simplest form, it allows the use of tags or other modifiers to provide information about the data presented. These tags or modifiers “tell a computer how to display, recognize, or otherwise work with the marked material.” Electronic publishing is one area that may find great benefits in the use of XML (Exner & Turner, 1998).
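To make the idea of tags concrete, here is a tiny, made-up XML record parsed with Python's standard library; the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical catalog record: the tags tell software what each value means
record_xml = """
<record>
    <title>Modern Information Retrieval</title>
    <creator>Baeza-Yates, R.</creator>
    <creator>Ribeiro-Neto, B.</creator>
    <format>text</format>
    <date>2011</date>
</record>
"""

record = ET.fromstring(record_xml)
print("Title:", record.find("title").text)
print("Creators:", [c.text for c in record.findall("creator")])
```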

XML offers tremendous flexibility because the tags or modifiers “can be read and processed correctly by any web browser no matter what computer system or software was used to create them” as long as certain rules are followed (Miller & Clark, 2004). This means authors and publishers, using XML specifications, can create their own tags or modifiers to describe and organize their own content. XML applications can be classified according to what they can accomplish. These accomplishments include:
• Editing XML documents
• Transforming XML documents
• Displaying mark-up information in a user-friendly manner
• Storing and indexing XML
• A combination of the above!

Seven applications of XML in libraries include library catalog records, interlibrary loan, cataloging and indexing, collection building, databases, data migration, and systems interoperability (Holbrooks, 2004).

In the article, How Does XML Help Libraries? (http://www.infotoday.com/cilmag/sep02/Banerjee.htm), we were provided more specific information about the various ways libraries are using XML. Some of the ways libraries are using XML include:
• To simplify interlibrary loan processing
• To enhance digital collections
• To develop a method for encoding archival materials, thus improving access to those materials
• To provide access to subscription databases, digital collections, materials requested via interlibrary loan, and library catalogs that run in a combination of commercial, open source, and locally developed platforms

A list of free XML tools and software can be found at http://www.garshol.priv.no/download/xmltools/.

Two research projects that used XML as their document tag language were the Digital Libraries Initiative (DLI) Project at the University of Illinois and the Libraries Special Collections Project at North Carolina State University (Kim & Choi, 2000).

References:

Baeza-Yates, R., & Ribeiro-Neto, B. (2011). Modern information retrieval: The concepts and technology behind search (2nd ed.). Harlow, England: Pearson Education.
Exner, N., & Turner, L.F. (1998). Examining XML: New concepts and possibilities in web authoring. Computers in Libraries, 18(10), 50-53.
Holbrooks, Z. (2004). Book reviews. Journal of the American Society for Information Science and Technology, 55(14), 1304-1305.
Kim, H., & Choi, C. (2000). XML: how it will be applied to digital library systems. Electronic Library, 18(3), 183-189.
Miller, D.R., & Clark, K.S. (2004). Putting XML to work in the library: Tools for improving access and management. Chicago, IL: American Library Association.

Impact of Web Retrieval on Libraries and Technology

Perhaps the most significant impact of web retrieval on libraries and technology is the development of Online Public Access Catalogs (OPACs).  People no longer have to visit their local library to have a wealth of information available through the library at their fingertips.  Although the Web may have a plethora of information available to all, as we have learned, finding relevant information is still not an easy task, and resources available through the library can help information seekers find what they are looking for more effectively and efficiently.

First generation OPACs in the 1960s and 1970s primarily provided “known-item searching” (Antelman, Lynema, & Pace, 2006).  They provided access points just as the older card catalogs did, but it was very hard to search by subject in these first generation catalogs.  Second generation OPACs integrated keyword and Boolean (AND, OR, and NOT) searching.  Next generation catalogs in the 1980s began using partial-match techniques.

According to Bailey (2011), library users look for the following criteria in a library’s OPAC:

  • Ease of navigation
  • Accessibility to library materials
  • Integration with social media
  • Usability
  • Ease of access to actual journal articles

Research in the web retrieval field in such areas as displaying search results, ranking, and query formulation helps to make meeting these criteria possible.

There are still many challenges to overcome. The look and feel of OPAC user interfaces don’t reflect the conventions found elsewhere on the Web, and their functionality often falls short of user expectations.  In addition, online catalogs often do not index the totality of a library’s collections, and users have to use other tools to access materials such as magazine or journal articles.  Weare, Toms, and Breeding (2011) discuss how OPACs should include search features like the ones we learned about in our reading this week (Modern Information Retrieval, Chapter 11), such as:

  • Dynamic query suggestions (auto-complete)
  • Spell check function
  • Search term recommendations
  • Ranking by relevancy

They also recommend virtual rich displays such as virtual shelf browsing.  I am confident continuing research in the field of information retrieval will in turn lead to substantial improvements in OPACs.

Antelman, K., Lynema, E., & Pace, A.K. (2006). Toward a twenty-first century library catalog. Information Technology & Libraries, 25(3), 128-139.

Bailey, K. (2011). Online Public Access Catalog: The Google maps of the library world. Computers in Libraries, 31(6), 30-34.

Weare Jr., W.H., Toms, S., & Breeding, M. (2011). Moving forward: The next-gen catalog and the new discovery tools. Library Media Connection, 30(3), 54-57.

Indexing and Searching

I must admit I found the chapter on Indexing and Searching in our text Modern Information Retrieval to be very vexing.  I knew I was in trouble when I read the following just a little over three pages into this 60-page chapter: “Throughout this chapter we assume that the reader is familiar with basic data structures, such as sorted arrays, binary search trees, B-trees, hash tables, and tries.”  I went into the chapter knowing nothing of these data structures, and I can’t say that I learned much from the chapter.  Fortunately, during my search for articles on indexing and searching, I feel I learned a lot.  Below is a summary of two articles I selected on the topics of indexing and searching.

 

Article One, Inverted indexes: Types and techniques, summary:

Inverted indexes are the primary tool used by search engines to execute queries.  We learned in Chapter 9 of our text (Modern Information Retrieval) that there are two elements in the inverted index structure:  the vocabulary and the occurrences.  An inverted index stores, for each word, a list of the documents in which that word appears, instead of the conventional method of a document being stored as a list of words (thus the “inverted” comes into play!).
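Here is a minimal Python sketch of that inversion over a made-up three-document collection: instead of storing each document as a list of words, we store, for each word, the documents (and positions) where it occurs.

```python
from collections import defaultdict

# Toy document collection (document id -> text)
docs = {
    1: "the magic of houdini",
    2: "houdini performed magic tricks",
    3: "the history of magic",
}

# Build the inverted index: vocabulary term -> list of (doc id, position) occurrences
inverted_index = defaultdict(list)
for doc_id, text in docs.items():
    for position, term in enumerate(text.split()):
        inverted_index[term].append((doc_id, position))

print(inverted_index["magic"])    # [(1, 1), (2, 2), (3, 3)]
print(inverted_index["houdini"])  # [(1, 3), (2, 0)]
```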

Although inverted indexes are easy to build, they typically use a very large amount of memory, storage space, and/or CPU time.  Documents are typically “preprocessed” prior to being indexed using methods such as “lexing”, “stemming”, and eliminating “stop words”.  I found what I thought to be a very good presentation on text processing in information retrieval that discussed these terms at:  http://www.csee.umbc.edu/~ian/irF02/lectures/03Text-Processing.pdf.

“Lexing” is the process of converting a byte stream into tokens, each of which is a single alphanumeric word.  These are converted into lower case.  Decisions must be made regarding how to handle hyphens and other types of punctuation.  Numbers are typically not included in the list of tokens because they take up a lot of space in the index and don’t add significant value.

“Stemming” involves removing any prefixes and suffixes from a word.  For example, “connected”, “connecting”, and “connection” might all be indexed as “connect”.

“Stop words” are often eliminated because they appear too frequently among almost all documents.  Examples of stop words include “a”, “the”, “of”, and “to”.
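Putting the three preprocessing steps together, here is a rough Python sketch of lexing, stop word removal, and a deliberately crude suffix-stripping “stemmer”; a real system would use something like the Porter stemmer, and my stop list and suffix rules are only illustrative.

```python
import re

STOP_WORDS = {"a", "the", "of", "to"}   # illustrative stop list
SUFFIXES = ("ing", "ed", "ion", "s")     # crude, illustrative suffix rules

def lex(text):
    """Convert a raw string into lower-case alphanumeric tokens (punctuation dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    """Strip one known suffix, if any; real systems use Porter or similar."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in lex(text) if t not in STOP_WORDS]

print(preprocess("The connection was connected to a connecting hub."))
# ['connect', 'was', 'connect', 'connect', 'hub']
```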

Stemming and stop word removal are used less frequently by modern search engines.  They prefer having a somewhat larger index, and thus the capability to search for various combinations of words, over the possibility of being put at a competitive disadvantage.

There are many different ways to search for information, including:

  • Normal queries – typically one word (ex. = magic)
  • Boolean queries – uses Boolean logic such as AND, OR, and NOT with multiple words (ex. = magic AND Houdini)
  • Phrase queries – a series of words contained within quotation marks (ex. = “to be or not to be”)
  • Proximity queries – uses logic term NEAR
  • Wildcard queries – a form of inexact matching where a “*” is used to designate letters, words, or a combination of words (ex. = “Paris is the * capital of the world” would return documents that contain phrases such as “Paris is the fashion capital of the world” and “Paris is the culinary capital of the world”)

Results of queries are typically ranked in some order of relevance.  A common method is the weighting scheme TF-IDF (term frequency – inverse document frequency).  This combines how often a term is found within a document (term frequency) with how rarely it is found across all the documents being searched (inverse document frequency).
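To make the TF-IDF idea concrete, here is a small Python sketch using one common variant (raw term frequency times the log of inverse document frequency) over a made-up collection.

```python
import math
from collections import Counter

docs = {
    1: "magic magic houdini",
    2: "houdini biography",
    3: "history of magic",
}

def tf_idf(term, doc_id, docs):
    """Raw term frequency in the document times log(N / document frequency)."""
    tokens = docs[doc_id].split()
    tf = Counter(tokens)[term]
    df = sum(1 for text in docs.values() if term in text.split())
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df)
    return tf * idf

for doc_id in docs:
    print(doc_id, round(tf_idf("magic", doc_id, docs), 3))
# Document 1 scores highest: "magic" appears twice there, and in only 2 of the 3 documents overall
```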

The authors of this article lament the lack of published research on the topic of query evaluation optimization.  They feel this is in large part due to the fact that large companies don’t wish to share any findings they discover because it would potentially aid their competitors.

Mahapatra, A.K., & Biswas, S. (2011). Inverted indexes: Types and techniques. International Journal of Computer Science Issues, 8(4), 384-392.

 

Article Two, A Two-Tier Distributed Full-Text Indexing System, summary:

Index structure is of paramount importance to search engines.  The better the index structure, the quicker and more accurate the search.  This can be a challenge in the Internet environment, due to its dynamic nature.

This article focuses on the Internet connection among clusters.  It discusses two types of distributed inverted file partitioning schemes: document partitioning and term partitioning.

Attributes of document partitioning include:

  • Easier to implement
  • Higher parallelism
  • Good expansibility
  • Better load balancing
  • Index built by partitioning documents according to subjects
  • Queries related to one subject forwarded directly to a specific index server
  • High network burden because of use of different index servers (the more index servers, the slower the response time)

Attributes of term partitioning include:

  • Index built on one server then allocated to each server index
  • Whole index must be rebuilt when a web document is updated
  • Network overhead and resource consumption are low since the index is built on only one server
  • Better search efficiency
  • Bad load balance

The authors propose a two-tiered distributed full-text indexing system that combines attributes of both partitioning schemes and provides a balance between search efficiency, resource consumption, and load balance.  The scheme uses document partitioning among the clusters and term partitioning inside each cluster.
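To help myself picture the two schemes, here is a rough Python sketch of partitioning a toy inverted index across two servers; the partitioning rules (documents split by id, terms split by hash) are my own simplifications, not the authors' algorithm.

```python
# Toy inverted index: term -> set of document ids
index = {
    "magic":   {1, 2, 5},
    "houdini": {1, 3},
    "history": {4, 5},
}

NUM_SERVERS = 2

# Document partitioning: each server indexes a subset of the documents,
# so every server may hold a (partial) posting list for every term.
doc_partition = [{} for _ in range(NUM_SERVERS)]
for term, postings in index.items():
    for doc_id in postings:
        server = doc_id % NUM_SERVERS
        doc_partition[server].setdefault(term, set()).add(doc_id)

# Term partitioning: each server holds the complete posting list
# for a subset of the terms.
term_partition = [{} for _ in range(NUM_SERVERS)]
for term, postings in index.items():
    server = hash(term) % NUM_SERVERS
    term_partition[server][term] = postings

print("Document-partitioned:", doc_partition)
print("Term-partitioned:", term_partition)
```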

Zhang, W., Chen, H., He, H. & Chen, G. (2013). A two-tiered distributed full-text indexing system. Applied Mathematics & Information Sciences, 8(1), 321-326. DOI: 10.12785/amis/080139

Aboutness and Relevance in Information Retrieval

According to Chu (Information Representation and Retrieval in the Digital Age) “aboutness” denotes a keyword or subject representing an object/document.  Searching by that “aboutness” of a document in conjunction with a field attribute such as author, publication year, or document type is a common way to refine information retrieval results.

“Relevance” is a bit trickier to define.  I found it interesting that Baeza-Yates & Ribeiro-Neto (Modern Information Retrieval) indicated “relevance” is “in the eye of the beholder”, and thus is a very subjective term influenced by many factors.  Those factors include time, location, and even the device being used to access the information.  Although the definition of relevance may vary, it is a property that reflects the relationship between a user’s query and a document.

Determining the relevance of documents affects how average recall is computed, because it is difficult to determine the total number of relevant documents in a system.  Recall is defined as the ratio between the number of relevant documents retrieved and the total number of relevant documents in a system.  Looking at some of the algebraic equations in our texts related to the various methods of calculating precision and recall, the primary metrics evaluated to determine the quality of ranking, made my head spin!
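Since those equations made my head spin, here is the basic arithmetic written out as a tiny Python example with made-up document ids.

```python
# Documents the system retrieved for a query, and the documents actually relevant to it
retrieved = {1, 2, 3, 4, 5}
relevant  = {2, 4, 6, 8}

relevant_retrieved = retrieved & relevant

precision = len(relevant_retrieved) / len(retrieved)  # 2/5 = 0.4
# Recall's denominator (ALL relevant documents) is exactly the hard part to know in practice
recall    = len(relevant_retrieved) / len(relevant)   # 2/4 = 0.5

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```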

Various studies have identified as many as 38-80 factors that affect relevance.  One study (Xu & Chen, 2006) indicated topicality and novelty are two significant underlying dimensions of relevance.  Topicality is defined as “the extent to which a retrieved document is perceived by the user to be related to her current topic of interest.”  Novelty is defined as “the extent to which the content of a retrieved document is new to the user or different from what the user has known before.”

I understand the preferred methodologies to define recall, precision, and relevance will be automatic in nature, that is, they can be calculated without the necessity of human interaction/intervention.  More human involvement means more time, higher cost, and more difficult logistics for any experimentation.  However, can the results be truly relevant (there’s that word again!) without that interaction?

Xu, Y., & Chen, Z. (2006). Relevance judgment: What do information users consider beyond topicality? Journal of the American Society for Information Science and Technology, 57(7), 961-973.

Language Queries

We learned in Chapter 7 that information retrieval languages allow ranking, whereas data retrieval languages do not.  Basic information retrieval models use keyword based retrieval.  A single word is the most basic query.  The oldest and still one of the most used methods to combine keyword queries is through the use of Boolean operators.  “Boolean operators are simple words (AND, OR, NOT or AND NOT) used as conjunctions to combine or exclude keywords in a search, resulting in more focused and productive results.”  http://library.alliant.edu/screens/boolean.pdf

The use of AND returns all documents that contain both keywords separated by the Boolean operator, thus narrowing the search.

The use of OR returns all documents that contain either of the keywords, thus expanding the search.

The use of NOT returns all documents that contain the first keyword, but not the second, thus excluding unwanted results.

The natural order for processing Boolean operators is:

  • 1st = NOT
  • 2nd = AND
  • 3rd = OR

Parentheses can be used to change the natural order.
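Here is a rough Python sketch (with my own toy index) of how a system can evaluate these operators as set operations over an inverted index, with the natural precedence made explicit by parentheses.

```python
# Toy inverted index: term -> set of ids of documents containing the term
index = {
    "filtering":   {1, 2, 5},
    "controversy": {1, 3},
    "censorship":  {2, 4},
}

def docs_for(term):
    return index.get(term, set())

# AND -> set intersection (narrows the search)
print(docs_for("filtering") & docs_for("controversy"))   # {1}

# OR -> set union (expands the search)
print(docs_for("filtering") | docs_for("controversy"))   # {1, 2, 3, 5}

# NOT -> set difference (documents with the first term but not the second)
print(docs_for("filtering") - docs_for("controversy"))   # {2, 5}

# Natural order (NOT, then AND, then OR) means the query
#   filtering AND controversy OR censorship
# is evaluated as (filtering AND controversy) OR censorship
print((docs_for("filtering") & docs_for("controversy")) | docs_for("censorship"))  # {1, 2, 4}
```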

Patterns can also be used in information retrieval queries.  Some types of patterns that may be queried are listed below (see the sketch after this list):

  • Prefixes – strings that must appear at the beginning of a word in the text
  • Suffixes – strings that must appear at the end of a word in the text
  • Substrings – strings that must appear somewhere within the words in the text
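A quick Python sketch of matching these three pattern types against a small vocabulary (the word list is my own), using plain string methods:

```python
vocabulary = ["computer", "computing", "recompute", "library", "librarian", "interlibrary"]

prefix = "comput"
suffix = "ing"
substring = "librar"

print([w for w in vocabulary if w.startswith(prefix)])  # ['computer', 'computing']
print([w for w in vocabulary if w.endswith(suffix)])    # ['computing']
print([w for w in vocabulary if substring in w])        # ['library', 'librarian', 'interlibrary']
```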

Queries are typically employed to either:

  • Locate facts
  • Collect information on a topic
  • Browse collections

Studies have specified different classes of queries, such as:

  • Informational
  • Navigational
  • Transactional

Other studies classify queries by the topic of the query.

I decided to use several examples given by Chu in our text, Information Representation and Retrieval in the Digital Age, to see what the query results looked like.

I used Chrome as my browser and Google as my search engine.  I ran the query “filtering AND controversy”.  The results made sense.  Then I ran “filtering OR controversy”.  Again, the results looked like what I expected.  Lastly I ran “filtering NOT controversy”.  I was stumped by the results I got on this search.  So I decided to use Google Advanced search and re-ran the queries.

Here is a screen shot of the returned results for “filtering AND controversy”:

[Screen shot: filtering AND controversy]

For “filtering OR controversy”:

[Screen shot: filtering OR controversy]

This search brought back results that contained either “filtering” or “controversy”, but not both.  In order to see the first result with “filtering”, I had to scroll down below what you see in the screen shot above.

When I used Google Advanced search for “filtering NOT controversy”, I got the results I expected:

[Screen shot: filtering NOT controversy]

After I ran the queries using Google Advanced search, I decided to run the query “using Boolean queries in Google” to see if I could figure out why, when I first ran the queries without using Advanced search, the results using NOT were not what I anticipated.  Here is the results page of my query:

[Screen shot: using Boolean queries in Google]

I found the 4th item returned to be very helpful.  It is a blog post titled “Using Boolean Search Operators with Google” from the blog “Amy’s Scrap Bag: A Blog about Libraries, Archives, and History.”  http://amysscrapbag.wordpress.com/2012/11/05/boolean-operators-and-google/

From Amy I learned that the Boolean operators “AND” and “OR” worked the way I thought they would (fortunately I had used both in UPPER CASE – if I had not, I would probably have been confused by the results I got with “or”).  But instead of NOT, Google only recognizes the minus sign (-) to exclude words or phrases, and it must be placed immediately in front of the word or phrase without a space.  I noticed when I ran the Advanced Google search to exclude a word, “filtering -controversy” is what was shown in the search box (see screen shot above from when I ran a Google Advanced search to exclude a word).

I really enjoyed this Module, because not only did I learn how to properly use Boolean operators in Google, I found a new blog to add to my RSS feed to follow!

Language in Information Representation & Retrieval

This week we learned about natural language and controlled vocabularies.  Natural language is the language that we use to speak and write.  It is typically used in three ways for information retrieval and representation:

  • Use terms taken from titles, topic sentences, and other important components of a document for information representation
  • Use terms derived from any part of a document for information representation
  • Use words or phrases extracted directly from people’s questions for query representation

There are two kinds of words in natural language: significant words and function words (articles, prepositions, and conjunctions).  These function words are typically considered part of a stop list, or terms that are too general to be suitable for representations.

Controlled vocabulary is an artificial language in which syntax, semantics, and pragmatics are limited and defined.  Terms in controlled vocabularies are selected by using the principles of literary or user warrant.  Literary warrant involves terms chosen from existing literature.  User warrant involves terms that must have been used in the past in order to be included.

There are three major types of controlled vocabularies (CV):

  • Thesauri – most widely used in information representation and retrieval
  • Subject heading lists
  • Classification schemes – the 1st type of CV developed; built on an artificial framework of knowledge

The strengths of controlled vocabularies are the weaknesses of natural language:

  • Synonyms
  • Homographs
  • Syntax

The weaknesses of controlled vocabularies are the strengths of natural language:

  • Accuracy
  • Updating
  • Cost
  • Compatibility

Metadata is “data about data”.  Common forms of metadata include:

  • Author
  • Date of publication
  • Source of publication
  • Document length
  • Document genre

Descriptive metadata is external to the meaning of a document and is related to how it was created (see common forms above); whereas semantic metadata characterizes the subject matter within a document’s contents.

There are two main techniques to organize data:

  • Taxonomies – classes organized hierarchically
  • Folksonomies – users freely choose keywords, called tags

We learned about various types of markup languages:

  • SGML
  • HTML
  • XML
  • RDF
  • HyTime

Text compression is becoming more relevant in the digital age, and various methods of compression were discussed in our text, including:

  • Statistical methods
      o Modeling
          • Adaptive models
          • Static models
          • Semi-static models
      o Coding
          • Huffman codes
          • Byte-Huffman codes
          • Dense codes
  • Dictionary-based methods
Our text indicates the best choice to introduce compression into modern information retrieval systems is through the use of word-based semi-static methods.
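Out of curiosity, here is a minimal Python sketch of the word-based semi-static idea: a first pass counts word frequencies (the model), and a Huffman code built from those counts assigns shorter bit strings to more frequent words.  It is only an illustration and ignores the byte-oriented refinements the text discusses.

```python
import heapq
from collections import Counter

text = "the cat sat on the mat and the cat slept"

# First pass (the semi-static model): count word frequencies over the whole text
freqs = Counter(text.split())

# Build the Huffman tree using a min-heap of (frequency, tie-breaker, node) entries
heap = [(f, i, (w, None, None)) for i, (w, f) in enumerate(freqs.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    f1, _, left = heapq.heappop(heap)
    f2, _, right = heapq.heappop(heap)
    counter += 1
    heapq.heappush(heap, (f1 + f2, counter, (None, left, right)))

# Walk the tree to assign a bit string to each word (frequent words get shorter codes)
def assign_codes(node, prefix="", codes=None):
    codes = {} if codes is None else codes
    word, left, right = node
    if word is not None:
        codes[word] = prefix or "0"  # handles the single-word edge case
        return codes
    assign_codes(left, prefix + "0", codes)
    assign_codes(right, prefix + "1", codes)
    return codes

codes = assign_codes(heap[0][2])
print(codes)
print("".join(codes[w] for w in text.split()))  # second pass: encode the text
```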

I think a combination of natural language and controlled vocabulary works best for information retrieval, with the emphasis being on natural language since that is what users are typically familiar with.  Controlled vocabularies are more specialized and tend to require some formal training for them to be truly effective and efficient.  Keyword searches (using natural language) are usually easier for a user to formulate than identifying subject headings (controlled vocabulary).  That being said, controlled vocabularies can be much more powerful and return more relevant information.

Metadata formatting is extremely important in the information retrieval field, particularly as it relates to library records.  Machine-Readable Cataloging (MARC) is the metadata format used for these records.  MARC formats are the international standard for the dissemination of bibliographic data.

With regards to markup languages, I am most familiar with HTML.  I created a blog several years ago in my very first class at USF.  Although blogging software allows you to create posts using WYSIWYG methodology, you can view your entries in HTML format (see screen shots below).  I would view the posts in HTML to learn more about the coding aspects of HTML.

[Screen shots: blog post in WYSIWYG view and in HTML view]

 

 

Boolean Information Retrieval Model

The Boolean model is one of many information retrieval models.  It is identified in our text (Modern Information Retrieval) as one of the three classic unstructured text models.

It is a very simple model and easy to implement.  This model is based on whether an index term is present or not.  The simplicity of this model leads to several of its disadvantages.  It only retrieves exact matches and there is no ranking of documents.  Historically it is the most common model, used by library online public access catalogs (OPACs) and many web search engines.   http://www.csee.umbc.edu/~ian/irF02/lectures/06Models-Boolean.pdf

Boolean search operators are “AND”, “OR”, and “NOT”.  The “AND” operator is used to narrow a search by combining terms.  Only documents containing both search terms a user specifies are returned.  The “OR” operator is used to broaden a search so that documents containing either of the terms a user provides are returned.  The “NOT” operator narrows a search by excluding certain search terms, returning documents containing the term before the operator, but not the term after.  Other Boolean search symbols include quotation marks, which specify that the phrase between them is to be searched for exactly as it is written, and the “*” symbol, which is used for wildcard searching.   http://websearch.about.com/od/internetresearch/a/boolean.htm      http://mason.gmu.edu/~montecin/engines.htm

Many search engines now incorporate some type of page ranking mechanism.  For example, one algorithm used by Google is PageRank.  I located the following information about PageRank by using the term “pagerank” in the Yahoo search engine; the Wikipedia article on PageRank was the second item in the list of results returned.  PageRank counts the number and quality of links to a web page to determine a rough estimate of that site’s importance, using the assumption that more important web pages will receive more links from other web pages.  http://en.wikipedia.org/wiki/PageRank
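Here is a small Python sketch of the idea as I understand it, computed by repeatedly redistributing scores along links (power iteration) with the usual damping factor; the toy link graph is invented, and real implementations are far more involved.

```python
# Toy link graph: page -> pages it links to
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

DAMPING = 0.85
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}

# Power iteration: a page's rank is built up from the ranks of the pages linking to it
for _ in range(50):
    new_rank = {}
    for p in pages:
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new_rank[p] = (1 - DAMPING) / len(pages) + DAMPING * incoming
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))  # "C" ends up most "important": it has the most inbound links
```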

One example of a company that uses a traditional Boolean search model is Westlaw.  I located this YouTube video by using the search term “examples of Boolean information retrieval models” on Chrome using Yahoo as the search engine.  A link to the video was the 8th option returned.  Westlaw provides an online legal research service to more than half a million subscribers.  Many professional searchers prefer Boolean queries because it gives them more control over the results received.  However, most users find it awkward and difficult to use Boolean expressions.  https://www.youtube.com/watch?v=diBEv4mhtO0

Google Advanced Search offers an easy way to use Boolean operators.  Instructions on how to enter terms in the boxes are included on the right side of the screen.

[Screen shot: Google Advanced Search]

Yahoo also offers an advanced web search along these same lines.

I typically use Google as my search engine; however, I do not use the Google Advanced Search feature.  I usually type the search term I want to find information on into the search box, as I described above (although my browser currently defaults to the Yahoo search engine – must change that since I prefer Google!).  Now that I have learned more about Boolean operators, I definitely plan to use them more often when I search the internet!

User Interfaces for Searching the Internet

Users search for information on the internet in different ways.  User interfaces strive to be intuitive in nature, wanting to make the search process easy and efficient for users.  If a particular interface doesn’t work for a user, they are typically going to turn to a different interface they find more “user-friendly”.

There are several theoretical models of how users search.  The classic model consists of four main activities:

1 – identify a problem

2 – articulate what information is needed

3 – formulate a query

4 – evaluate the results

This model is based on the assumption that a user’s information needs are static.  The dynamic model assumes a user’s information needs change as they learn during their search experience.

The searches I used in my job for the first 17 years were definitely dynamic in nature.  I work for the U.S. Postal Inspection Service and was initially an investigator.  The Inspection Service is the federal law enforcement arm of the U.S. Postal Service.  Postal Inspectors investigate any criminal activity with a nexus to the U.S. mail, from mail theft and mail fraud, to illegal items being sent through the mail such as drugs, child pornography, or bombs, to assaults of postal employees, and robberies and burglaries of postal facilities and vehicles.  When conducting investigations, many pieces of information you discover lead to new searches.  Some pieces of information turn out to be irrelevant, but until the pieces of the entire puzzle are put together, it can be hard to know what is relevant.  Interpretation of information can change based on additional information found and may lead to new paths of inquiry.

One of the intelligence tools we use is an interface that allows us to search public and, in some cases, proprietary records.  Until March of this year, my agency used LexisNexis Accurint as our primary platform to access this type of information.  In March, we switched to Thomson Reuter’s CLEAR service.

Several sections of our assigned textbook reading this week reminded me of our switch from Accurint to CLEAR.  Orienteering is the process of starting broad, then narrowing a search.  Some users are having a difficult time with the switch because with CLEAR, less information is more.  Users of Accurint had become accustomed to entering as much data as they had on a particular individual or address and receiving results to their query.  CLEAR uses different search interfaces, and the more information you enter, the more you narrow your field of search.  Accurint would return results that contained any individual portion of the data you entered; whereas CLEAR returns only results that contain ALL the data you entered.
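The difference, as I understand it, is essentially conjunctive versus disjunctive matching of the search fields.  Here is a hedged Python sketch of that distinction only; it is not how either product actually works, and the records and field names are invented.

```python
records = [
    {"name": "J. Smith", "city": "Tampa",   "phone": "555-0101"},
    {"name": "J. Smith", "city": "Orlando", "phone": "555-0202"},
    {"name": "A. Jones", "city": "Tampa",   "phone": "555-0303"},
]

query = {"name": "J. Smith", "city": "Tampa"}

# "Any field matches" style: entering more data tends to return more results
any_match = [r for r in records if any(r[k] == v for k, v in query.items())]

# "All fields must match" style: entering more data narrows the field of search
all_match = [r for r in records if all(r[k] == v for k, v in query.items())]

print(len(any_match))  # 3 records share at least one field with the query
print(len(all_match))  # 1 record matches every field in the query
```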

Our text also discussed various user interfaces and the importance of subjective evaluations by users.  Many factors influence user evaluations, including familiarity, speed, and the types of features offered.  Some Postal Inspectors have embraced CLEAR and find it a more effective platform. Others are resistant to the change and have voiced problems they are having.  However, I often learn, when I query those who indicate they prefer Accurint, that they haven’t taken advantage of the training (both in person and via the internet) being offered for CLEAR.  I think some users will always be resistant to any type of change, even if in the long run it makes a task easier and more efficient.

I also thought of my agency’s switch to CLEAR when reading about information visualization.  One of the reasons we switched to CLEAR was because data in CLEAR can be exported directly to i2, link-analysis software we use to provide just such visualization.  It can paint a picture of connections between people, addresses, and telephone numbers.  Another type of information visualization software we use is ArcGIS, mapping software.  Using data visualization to paint a picture can be very powerful, and the result is much easier to interpret than a spreadsheet of the exact same data in text form.

What to do in Dublin, Ireland?!?

I’m writing this post from Dublin, Ireland! Slainte! I thought I’d write a post about how I decided what to see and do in Dublin.

I spent quite a bit of time with books I checked out from several of my local libraries, including:

Karen Brown’s Ireland: Charming Inns and Itineraries (2006)
Frommer’s 24 Great Walks in Dublin (2009)
Ireland – DK Witness Travel Guide (2011)
Rick Steves’ Ireland 2013
Fodor’s 2014 Ireland
Ireland’s best trips: 24 amazing road trips (2013)

However, I really didn’t find them that useful. I ended up ordering Frommer’s Dublin day by day (2011) from Amazon.com since I couldn’t find a copy at any of my local libraries. I’ve used the day by day guides in Paris, Seattle, San Francisco, and New York, and found them very helpful. This time was no different! I like the fold out map that is included in the guides, although I actually found the map provided by the Brooks Hotel in Dublin where we stayed even more useful.

Using the day by day guide and the above referenced map from the hotel, we ultimately decided to visit the following:

St. Stephen’s Green Park
Grafton Street
George St. Arcade
St. Patrick’s Cathedral
Guinness Storehouse
Temple Bar
Dublin Castle
Christ Church Cathedral
Trinity College Library (Book of Kells!)

We also explored down many side streets and alleys. It’s been an awesome trip! Tomorrow we head southwest to Kilkenny and new adventures!

Here is one of my favorite pictures from the trip so far……

[Photo]