This blog is dedicated to the in-depth review, analysis and discussion of technologies related to the search and discovery of information. This blog represents my views only and does not reflect those of my employer, IBM.

Friday, March 23, 2007

Search vs. Discovery

I have often been asked to explain the difference between search and discovery. Many feel they mean the same thing, or are at least interrelated in some way, as in "I have discovered the answer to my query". But actually they are quite different. One way to contrast the two is to classify them by what you know and don't know. That is, you search for what you know and discover what you don't know.

When I search, I already have a target in mind, be it a document, product, or piece of information. My task is to formulate a query that improves the chances of an exact or partial match to some portion of the target document. Keywords in the query tend to be more descriptive so as to qualify exactly what I am looking for. For example, the query "replacement filter for a Masterblade lawn mower" leaves little room for ambiguity.

Discovery, on the other hand, is exploratory in nature and driven by a general goal. A search engine becomes a discovery engine when the query is used as a starting point from which to learn more about a particular topic. Just as hyperlinks within web documents let you navigate quickly through related topics, a discovery engine presents various facets of the result set in the form of navigational links. These links represent different dimensions of the result set and allow you to drill down or sideways depending on the facet. The illustration to the right presents different facets of the results for the query "digital cameras".
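
To make the idea of facets concrete, here is a minimal Python sketch of how facet links could be derived from a result set. The records, field names, and the helper functions `facet_counts` and `drill_down` are all invented for illustration; this is not how OmniFind or any particular engine implements facets.

```python
from collections import Counter

# Hypothetical result set for the query "digital cameras".
# Every field and value below is made up for illustration.
results = [
    {"title": "Camera A", "brand": "Acme", "price": "under $200"},
    {"title": "Camera B", "brand": "Acme", "price": "$200-$400"},
    {"title": "Camera C", "brand": "Zenith", "price": "under $200"},
]

def facet_counts(results, fields):
    """Count how many results fall under each value of each facet field."""
    return {
        field: Counter(r[field] for r in results if field in r)
        for field in fields
    }

def drill_down(results, field, value):
    """Narrow the result set to the results that match one facet value."""
    return [r for r in results if r.get(field) == value]

facets = facet_counts(results, ["brand", "price"])
acme_only = drill_down(results, "brand", "Acme")
```

Each entry in `facets` corresponds to a group of navigational links (for example, the brands with their result counts), and clicking one of them amounts to a call to `drill_down`.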

With a discovery engine I can more easily "surf" my results, quickly jumping from one facet to another depending on what strikes my interest. It is not much different from surfing the web. With a conventional search engine the burden is on you to click on a result, read the document, and then potentially click on any embedded hyperlinks that might appear interesting. To learn more, see IBM's OmniFind: Discovery Edition.


Tuesday, February 13, 2007

Type-ahead to Read My Mind

I’m sure you’ve experienced this before: that surprised, kind of gee-whiz feeling you get when you start typing just a few characters and the program you are using attempts to read your mind and guess what you are trying to type. You’ve seen it in your browser when previously entered URLs with the same starting characters appear in a drop-down list as you type. Or you’ve experienced it in your favorite word processor when it automatically rearranges or inserts letters into what you are typing to arrive at the correct spelling of a word. Inevitably these time-saving techniques have also made their way into the search world.

Google, for example, presents a drop-down list of queries previously submitted by others that start with the same characters you have typed (shown right). What’s amazing is the speed at which this list is updated as you type additional characters, with the list becoming more refined as you complete your query. Click on an entry and the results for that query are automatically retrieved.

Your browser (and Google, for that matter) uses input logs to recall previously entered values that match your current input. IBM’s enterprise search engine OmniFind was extended in just this way to provide query suggest capability for IFPMA’s clinical trial search.
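
The log-based approach can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the query log is an in-memory list, suggestions are ranked by how often each query was submitted, and the function name `suggest` is hypothetical. It does not describe how any browser, Google, or OmniFind actually implements suggestions.

```python
from collections import Counter

def suggest(query_log, prefix, limit=5):
    """Return past queries that start with `prefix`, most frequent first."""
    prefix = prefix.lower()
    # Fold case so "Digital Cameras" and "digital cameras" count as one query.
    counts = Counter(q.lower() for q in query_log)
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]

# Hypothetical log of previously submitted queries.
log = [
    "digital cameras",
    "digital cameras",
    "digital photo frame",
    "dishwasher",
    "Digital Cameras",
]

print(suggest(log, "dig"))
```

Calling `suggest` again after each keystroke is what makes the list narrow as the query is completed.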

A lexicon, on the other hand, is an alternative to query logs and is demonstrated in SurfWax’s LookAhead auto-completion service. A lexicon is a controlled vocabulary that provides end users with a more intelligent, site-centric association of concepts, terms, and products. All lexicon terms are automatically "rotated" as they are imported into LookAhead. For example, L.L. Bean users could find "True Comfort Footwear" by starting to type either "com...", "fo...", or "tru...". The instant display of rotated terms makes browsing a site fast and encourages discovery.
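
One plausible reading of "rotation" is that every word of a term becomes a possible starting point for matching. The Python sketch below works under that assumption; the functions `rotate`, `build_index`, and `lookahead` are hypothetical names, and the second lexicon entry is invented. It is not a description of SurfWax's actual implementation.

```python
def rotate(term):
    """Return every rotation of a multi-word term, one per starting word."""
    words = term.split()
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]

def build_index(lexicon):
    """Map each lowercased rotation back to its original lexicon term."""
    index = []
    for term in lexicon:
        for rotation in rotate(term):
            index.append((rotation.lower(), term))
    index.sort()
    return index

def lookahead(index, prefix):
    """Return the lexicon terms that have a rotation starting with `prefix`."""
    prefix = prefix.lower()
    found = []
    for rotation, term in index:
        if rotation.startswith(prefix) and term not in found:
            found.append(term)
    return found

# "Fleece Jacket" is an invented second entry, added only for contrast.
index = build_index(["True Comfort Footwear", "Fleece Jacket"])
print(lookahead(index, "com"))
```

With this index, typing "com", "fo", or "tru" each leads back to "True Comfort Footwear", which matches the behavior described above.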

Collarity is a company that has taken type-ahead to the next level with their Compass product. The screenshot to the right shows their Compass widget, which offers more functionality than the conventional drop-down list.
  • First, the user can choose, via a slider control, whether the recalled values come from personal, community, or global entry logs.
  • The terms themselves are rolled up so that no one term dominates the list. For example, “computer system” and “computer technology” appear on the same line.
  • A small set of search results (URLs) is listed for the terms that have been typed in so far.
  • And lastly, the user can switch the scope of the search between the entire web and the current site by clicking on the tabs on the right-hand side.

Some users find these rapidly changing drop-down type-aheads annoying. I myself find them intriguing. It's always interesting to see what others have been searching for, and if it saves me a few keystrokes then I’m all for it.


Thursday, January 25, 2007

Lotusphere 2007 in Review

Lotusphere in Orlando, Florida is Lotus' annual event to thank their customers and to show off their new features and products for the coming year. I've been presenting at Lotusphere for nine years now and must say that this year was exceptional. With over 7,000 customers in attendance and a keynote speech by Neil Armstrong, you could feel the excitement in the air (besides the warm Florida breezes).

Lotus is a collaboration software company most noted for its Domino and Lotus Notes brands. Those of us who are grey beards may even remember when Lotus 1-2-3 was the slickest spreadsheet around. The engineers at Lotus have always been admired for their first-to-market innovations, and this year they stayed true to their reputation.

First, you didn't even have to be in Orlando to attend Lotusphere. You could log into Second Life and navigate your avatar through a virtual Lotusphere (pretty cool). They have also taken instant messaging to the next level with Sametime 7.5, which features the integration of audio, video, web conferencing, program sharing, and a multi-protocol gateway that allows you to communicate with other vendors' messaging systems, such as those from AOL, Google, and Yahoo! - something I've always wanted to do.

In addition, Notes users will be able to do activity-based computing with "Activities," a Lotus technology that shares and organizes e-mail, instant messages, documents and other items related to a particular activity or project into one logical unit. I like this feature because it frees you from using email as the primary work tool for inbox-driven task management.

Two new products were announced, namely Lotus Connections and Quickr. Connections will feature five Web 2.0 technologies designed to allow users to collaborate on activities, communities, bookmarks, profiles and blogs. Quickr is a collaborative content-sharing program which can be seen as the next generation of QuickPlace.

My main goal at Lotusphere was to present OmniFind and its enterprise search capabilities, but also to learn what's new from a search perspective within Lotus. What I learned, and what confirmed something I blogged about earlier, is the importance of the search for people. Collaboration software is all about people and their interaction with each other through technology. And search is a key component of that technology.

I witnessed a very cool people search interface in the innovation lab that you had to see to appreciate. Based on people's collaboration in Activities, it presented the weighted interactions and contributions of people in an easy-to-understand graph. People's faces were used as nodes in the graph, and their connections were drawn as lines emphasized according to how much they interacted. The people (nodes) were also placed inside boxes representing the organization to which they belonged, with the granularity (department, division, country...) adjustable by a slider. What was really cool was that you could grab a timeline thumbnail and, as you moved it, see the evolution of the interactions between the different members of the activity. Again, you had to see it to believe it.

Conference Grade: A+


Tuesday, January 16, 2007

Nice! But how do I add search to my web site?

Wow! What a response to IBM and Yahoo's search announcement last month. There have been over 10,000 downloads of the free enterprise search offering, all within two weeks - and that was over the holidays to boot. You may have already downloaded the IBM OmniFind Yahoo Edition and discovered how easy it is to set up an index and start searching. But perhaps you are beyond that point and are now investigating how to integrate OmniFind's search functionality into your web site.

I have published an article titled "Add IBM OmniFind Yahoo Edition to your Web Site" that describes several approaches to accomplishing this. Check it out; you might find the information useful.

On a separate note, my next post will be a review of the Lotusphere Conference to be held in Orlando, Florida (1/21-25). In addition to being held in sunny Florida during the winter, this is an exciting conference that exhibits the latest in collaboration software by Lotus. If you are one of the 10,000 attendees, do look me up. I will be there presenting a session on enterprise search.


Wednesday, December 13, 2006

IBM Unveils Free Enterprise Search Engine!

I’ve been itching to talk about this for a while now and the day has finally come. Today, IBM® and Yahoo! announced their partnership to offer a free downloadable enterprise search engine, one that is extremely easy to set up and use. IBM’s OmniFind™ Yahoo! Edition can crawl and index up to half a million Web pages and/or file system documents and make them available for search through a simple-to-use, Yahoo!-like Web interface. You should try it out yourself if you haven’t already. You’ll be amazed at how quick and easy it is to set up an index and start searching.

What is unique about this free offering is that it is built upon open source technologies (read my previous post on the benefits of open source). The core indexer is Apache’s Lucene, which will continually gain from the rapid, high-quality improvements made by the open source community. IBM is a member of the Lucene Project Management Committee (PMC) and is actively contributing to Lucene’s development.

But an effective enterprise search solution is more than just the indexer. IBM has taken Lucene’s proven indexing technology and added its own enhancements: web and file system crawlers; installation, administrative and search GUIs; monitoring; HTML and office document support; linguistic processing; language support; and much more. IBM has managed to assemble these parts, each with its own inherent complexities, into a coherent and extremely easy-to-use package (see “Search for Dummies”).

And therein lies the key: its ease of use. At just three clicks, installing IBM’s OmniFind™ Yahoo! Edition is a snap. Setting up an index is just as easy. Just point the crawlers to where the enterprise content is and the index is automatically built for you. Within seconds you can start searching what has been discovered to date, even while the crawlers are busy fetching the remaining content. The search application has the familiar and intuitive Yahoo! Web interface, which can be customized using a WYSIWYG layout editor. No programming skills required! And for you programmer types, you can learn about other customization techniques by reading my article “Adding IBM’s OmniFind Yahoo Edition to Your Web Site”.

Today’s announcement is not about competing with Google. IBM and Yahoo! are focused on information access for the enterprise market, which has different requirements than Google’s consumer market, and where customers are looking for information solutions that go well beyond Google-style keyword search. Nor is this free offering an affront to Microsoft, whose information access portfolio does not provide the growth path to high-value business insight solutions such as those delivering value today to IBM’s clients. IBM’s OmniFind™ Yahoo! Edition revolutionizes the information access market with a no-cost on-ramp to enterprise search.


Thursday, November 30, 2006

Search for Dummies

No, I don't mean searching for people, a topic I addressed in my last post. I'm referring to the wonderfully intuitive approach John Wiley & Sons, Inc. takes to presenting complex ideas and topics. My question is: Why can't search be made easy in a similar way?

I also don't mean search as experienced by the end user. Even my mom has now mastered search, knowing what to expect when she types a few words into that little white search box. I'm referring to the behind-the-scenes processing that makes all of that magic happen. It is the administrators of search who bear the entire burden.

It is estimated that Google, for example, maintains 450,000 servers, arranged in racks located in cities around the world. For public search engines like Google this complexity is a part of doing business. Whether it is complex or made easy is irrelevant since their customers never experience the pain. It's like buying a preassembled bicycle. You don't care how difficult it was or how long it took to put together, as long as it was done right. But you do care when you need to assemble it yourself.

The same is true for enterprises that want to deploy their own search engine. The ease with which the company can install and maintain the engine becomes extremely important. The parts involved are many. There are the crawlers that extract information from the various document repositories. There are the parsers that decipher and transform the information into an indexable form. There is the indexer that builds the search catalog. And lastly there is the search runtime that ultimately enables users to search the catalog. Each of these parts has its own inherent complexities, which makes it difficult to assemble them into a coherent and easy-to-use package.
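
The four parts named above can be sketched as a toy pipeline in Python. Everything here is a simplifying assumption: the "repository" is an in-memory dictionary, parsing is a crude tag strip plus tokenization, and the function names are hypothetical. A real enterprise engine is far more involved; the point is only to show how the stages hand off to one another.

```python
import re
from collections import defaultdict

def crawl(repository):
    """Crawler: yield (doc_id, raw_content) pairs from a document repository."""
    for doc_id, raw in repository.items():
        yield doc_id, raw

def parse(raw):
    """Parser: strip markup and split the text into lowercase tokens."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(repository):
    """Indexer: build an inverted index mapping each token to document ids."""
    index = defaultdict(set)
    for doc_id, raw in crawl(repository):
        for token in parse(raw):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Search runtime: return ids of documents containing every query term."""
    terms = parse(query)
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)

# Hypothetical repository with one HTML page and one text file.
repository = {
    "a.html": "<p>Replacement filter</p>",
    "b.txt": "Lawn mower filter",
}
index = build_index(repository)
print(search(index, "lawn filter"))
```

Even in this toy form, each stage has its own failure modes (unreachable sources, unparseable formats, stale indexes), which hints at why packaging them for easy administration is hard.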

But that is just what IBM is trying to do. IBM recognizes that search is more than developing the best algorithms for producing relevant search results. An effective search engine must take into account the total cost of ownership, and that includes ease of use. The easier a search product is to use, the less time a search administrator has to spend maintaining it, and that means time gained for doing something else. IBM has been aggressively pursuing this ease-of-use doctrine, which you can expect to experience in its next release of search products.


Wednesday, November 15, 2006

Searching for People

Have you ever "Googled" yourself or someone you know? It's a fun exercise and can sometimes produce surprising results, like finding an old yearbook picture of yourself posted on the web that you weren't aware of. Social networks have aided in the search for people and go further by providing a way for people to connect once they're found. But I find it astounding that enterprises with large numbers of employees don't have similar tools at their disposal.

Employees are four times as likely to turn to a colleague for answers to work questions as to corporate knowledge management systems. And nearly two-thirds think the information they get from colleagues is generally superior - according to a survey conducted by SelectMinds, which creates corporate employee and alumni networks.

Most companies do have user directories that contain basic information about each employee such as name, phone number, email and work address, job title, etc. These directories tend to standardize on LDAP, a networking protocol for querying and modifying directory services running over TCP/IP. But LDAP provides a woefully inadequate search capability. You can look up a person by typing in their name IF you know how to spell it.

It gets more interesting when you don't know who you are looking for. In this case I'm looking for experts in the company who might be able to answer a particular question. In addition to a list of documents in response to my search, why can't I also get a list of people who might know? And wouldn't it be neat if the list of people had a brief summary of their expertise (similar to a document summary) so I could determine whether to contact them or not? I could also see multifaceted search playing a role. Some of the facets returned might be the different groups of people who might know the answer, such as news groups, wikis, or even whole departments.
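
As a rough illustration of what such a result might look like, the Python sketch below returns people alongside documents. It rests on a big assumption that is not part of the discussion above: that expertise can be approximated from the documents a person has authored. The sample data and the functions `search_with_people` and `expertise_summary` are invented for the example.

```python
from collections import Counter

# Hypothetical corpus; authors, titles, and text are made up.
documents = [
    {"title": "Tuning the crawler", "author": "alice",
     "text": "crawler scheduling and crawler throttling"},
    {"title": "Securing the crawler", "author": "alice",
     "text": "crawler security tokens"},
    {"title": "Index basics", "author": "bob",
     "text": "index compression and crawler basics"},
]

def search_with_people(documents, query):
    """Return matching documents plus authors ranked by number of matches."""
    terms = query.lower().split()
    hits = [d for d in documents
            if all(t in d["text"].lower().split() for t in terms)]
    people = Counter(d["author"] for d in hits)
    return hits, [name for name, _ in people.most_common()]

def expertise_summary(documents, author, size=3):
    """Summarize a person by the most frequent longer words they wrote."""
    words = Counter()
    for d in documents:
        if d["author"] == author:
            words.update(w for w in d["text"].lower().split() if len(w) > 3)
    return [w for w, _ in words.most_common(size)]

hits, people = search_with_people(documents, "crawler")
```

Here `people` plays the role of the list of people who might know, and `expertise_summary` stands in for the brief summary shown next to each name.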

This type of "expertise" search requires much more information than the basic employee attributes typically captured in LDAP. It requires the composition of an employee profile that synthesizes what the employee does and knows. The common approach to capturing this information is to just require all employees to fully describe themselves. But this is where the fallacy lies. Most people are reluctant to fully describe themselves and are willing to provide only minimal data (name, address, phone...). When was the last time you updated your resume or wrote a status report of your activities? The task becomes a burdensome chore and as such suffers in quality. This may be the reason we don't see an effective "people" search in business today.
