We learned in Chapter 7 that information retrieval languages allow ranking, whereas data retrieval languages do not. Basic information retrieval models use keyword-based retrieval. A single word is the most basic query. The oldest and still one of the most used methods of combining keyword queries is the use of Boolean operators. “Boolean operators are simple words (AND, OR, NOT or AND NOT) used as conjunctions to combine or exclude keywords in a search, resulting in more focused and productive results.” http://library.alliant.edu/screens/boolean.pdf
The use of AND returns all documents that contain both keywords separated by the Boolean operator, thus narrowing the search.
The use of OR returns all documents that contain either of the keywords, thus expanding the search.
The use of NOT returns all documents that contain the first keyword, but not the second, thus excluding unwanted results.
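To make the three operators concrete, here is a minimal sketch in Python that treats each keyword as the set of documents containing it. The three-document collection and the helper name `containing` are made up for illustration; a real retrieval system would use an inverted index rather than scanning the text.

```python
# Toy document collection (invented for illustration only).
docs = {
    1: "internet filtering in public libraries",
    2: "the controversy over filtering software",
    3: "a controversy about library funding",
}


def containing(term):
    """Return the set of document ids whose text contains the term as a whole word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


filtering = containing("filtering")      # documents 1 and 2
controversy = containing("controversy")  # documents 2 and 3

and_result = filtering & controversy  # AND: both keywords, narrows the search
or_result = filtering | controversy   # OR: either keyword, expands the search
not_result = filtering - controversy  # NOT: first keyword but not the second

print(and_result, or_result, not_result)
```

Set intersection, union, and difference line up with AND, OR, and NOT as described above.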
The natural order for processing Boolean operators is:
- 1st = NOT
- 2nd = AND
- 3rd = OR
Parentheses can be used to change the natural order.
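Python's own `not`, `and`, and `or` operators happen to follow the same order, so they can be used to sketch how parentheses change the outcome. The sample sentence and the helper `has` are assumptions for illustration, and an actual search system may apply its own precedence rules.

```python
# A single sample document (invented for illustration).
words = set("internet filtering in public libraries".split())


def has(term):
    """True if the sample document contains the term."""
    return term in words


# Natural order: AND is evaluated before OR,
# so this reads as: filtering OR (controversy AND censorship).
natural = has("filtering") or has("controversy") and has("censorship")

# Parentheses force the OR to be evaluated first.
grouped = (has("filtering") or has("controversy")) and has("censorship")

# NOT is evaluated before AND,
# so this reads as: (NOT filtering) AND controversy.
not_first = not has("filtering") and has("controversy")

# Parentheses make NOT apply to the whole AND expression instead.
not_grouped = not (has("filtering") and has("controversy"))

print(natural, grouped, not_first, not_grouped)
```

The same document matches the query in its natural order but not the parenthesized version, which is why grouping matters when AND and OR are mixed.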
Patterns can also be used in information retrieval queries. Some types of patterns that may be queried are:
- Prefixes – strings that must appear at the beginning of a word in the text
- Suffixes – strings that must appear at the end of a word in the text
- Substrings – strings that must appear somewhere within the words in the text
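The three pattern types above can be sketched with Python's built-in string methods. The sample sentence is made up for illustration, and the matching here is deliberately simple (case-sensitive, split on whitespace).

```python
# Sample text (invented for illustration).
text = "filters and filtering were unfiltered by the refiltering tool"
words = text.split()

# Prefix: the pattern must appear at the beginning of a word.
prefix_matches = [w for w in words if w.startswith("filter")]

# Suffix: the pattern must appear at the end of a word.
suffix_matches = [w for w in words if w.endswith("ing")]

# Substring: the pattern may appear anywhere within a word.
substring_matches = [w for w in words if "filter" in w]

print(prefix_matches)
print(suffix_matches)
print(substring_matches)
```

Every prefix match is also a substring match, so the substring query returns the broadest set of words.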
Queries are typically employed to either:
- Locate facts
- Collect information on a topic
- Browse collections
Studies have specified different classes of queries, such as:
Other studies classify queries by the topic of the query.
I decided to use several examples given by Chu in our text, Information Representation and Retrieval in the Digital Age, to see what the query results looked like.
I used Chrome as my browser and Google as my search engine. I ran the query “filtering AND controversy”. The results made sense. Then I ran “filtering OR controversy”. Again, the results looked like what I expected. Lastly I ran “filtering NOT controversy”. I was stumped by the results I got on this search. So I decided to use Google Advanced search and re-ran the queries.
Here is a screen shot of the returned results for “filtering AND controversy”:
For “filtering OR controversy”:
This search brought back results that contained either “filtering” or “controversy”, but not both. In order to see the first result with “filtering”, I had to scroll down below what you see in the screen shot above.
When I used Google Advanced search for “filtering NOT controversy”, I got the results I expected:
After I ran the queries using Google Advanced search, I ran the query “using Boolean queries in Google” to see if I could figure out why my first NOT query, run without Advanced search, had not returned the results I anticipated. Here is the results page for that query:
I found the 4th item returned to be very helpful. It is a blog post titled “Using Boolean Search Operators with Google” from the blog “Amy’s Scrap Bag: A Blog about Libraries, Archives, and History.” http://amysscrapbag.wordpress.com/2012/11/05/boolean-operators-and-google/
From Amy I learned that the Boolean operators “AND” and “OR” worked the way I thought they would (fortunately I had used both in UPPER CASE; if I had not, I would probably have been confused by the results I got with “or”). But instead of NOT, Google only recognizes the minus sign (-) to exclude words or phrases, and it must be placed immediately in front of the word or phrase without a space. I noticed that when I ran the Advanced Google search to exclude a word, “filtering -controversy” is what was shown in the search box (see the screen shot above from when I ran a Google Advanced search to exclude a word).
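The rewrite from NOT to the minus sign can be sketched as a small Python helper. The function name `to_minus_syntax` is hypothetical, and the rule it applies is based only on the description above (an upper-case NOT, or AND NOT, becomes a minus sign attached directly to the next word or quoted phrase); it is not an official Google tool and does not cover the rest of Google's query syntax.

```python
import re


def to_minus_syntax(query):
    """Replace an upper-case NOT (or AND NOT) with a minus sign attached to the next term.

    Hypothetical helper for illustration; lower-case "not" is left untouched.
    """
    return re.sub(r"\b(?:AND\s+)?NOT\s+", "-", query)


print(to_minus_syntax("filtering NOT controversy"))
print(to_minus_syntax("filtering AND NOT controversy"))
print(to_minus_syntax('filtering NOT "net neutrality"'))
```

Because the pattern also consumes the spaces after NOT, the minus sign ends up immediately in front of the excluded word or phrase with no space in between.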
I really enjoyed this Module because not only did I learn how to properly use Boolean operators in Google, but I also found a new blog to add to my RSS feed!