This blog is dedicated to the in-depth review, analysis and discussion of technologies related to the search and discovery of information. This blog represents my views only and does not reflect those of my employer, IBM.


Thursday, November 30, 2006

Search for Dummies

No, I don't mean searching for people, a topic I addressed in my last post. I'm referring to the wonderfully intuitive approach John Wiley & Sons, Inc. takes to presenting complex ideas and topics. My question is: why can't search be made just as easy?

I also don't mean search as experienced by the end user. Even my mom has now mastered search, knowing what to expect when she types a few words into that little white search box. I'm referring to the behind-the-scenes processing that makes all of that magic happen. It is the administrators of search who bear the entire burden.

It is estimated that Google, for example, maintains 450,000 servers, arranged in racks located in cities around the world. For public search engines like Google this complexity is a part of doing business. Whether it is complex or made easy is irrelevant since their customers never experience the pain. It's like buying a preassembled bicycle. You don't care how difficult it was or how long it took to put together, as long as it was done right. But you do care when you need to assemble it yourself.

The same is true for enterprises that want to deploy their own search engine. The ease with which the company can install and maintain the engine becomes extremely important. The parts involved are many. There are the crawlers that extract information from the various document repositories. There are the parsers that decipher and transform the information into an indexable form. There is the indexer that builds the search catalog. And lastly there is the search runtime that ultimately enables users to search the catalog. Each of these parts has its own inherent complexities, which makes it difficult to assemble them into a coherent and easy-to-use package.
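To make the four parts concrete, here is a minimal sketch in Python of how a crawler, parser, indexer, and search runtime fit together. It is a toy under stated assumptions, not how any IBM product works: the "repository" is just an in-memory dictionary, and all function names are made up for illustration.

```python
import re
from collections import defaultdict


def crawl(repository):
    """Crawler: pull (doc_id, raw_content) pairs out of a repository.

    The repository is assumed to be a plain dict; a real crawler would
    talk to file systems, databases, or content management systems.
    """
    for doc_id, raw in repository.items():
        yield doc_id, raw


def parse(raw):
    """Parser: strip tag-like markup and split the text into lowercase tokens."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(repository):
    """Indexer: build the search catalog as an inverted index (token -> doc ids)."""
    index = defaultdict(set)
    for doc_id, raw in crawl(repository):
        for token in parse(raw):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Search runtime: return the ids of documents containing every query term."""
    terms = parse(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Even in this toy version each stage has its own failure modes (unreachable repositories, unparseable formats, stale catalogs), which hints at why packaging the real thing for an administrator is hard.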

But that is just what IBM is trying to do. IBM recognizes that search is more than developing the best algorithms for producing relevant search results. An effective search engine must take into account the total cost of ownership, and that includes ease of use. The easier a search product is to use, the less time a search administrator has to spend maintaining it, and that means more time for doing something else. IBM has been aggressively pursuing this ease-of-use doctrine, which you can expect to experience in its next release of search products.



Wednesday, November 15, 2006

Searching for People

Have you ever "Googled" yourself or someone you know? It's a fun exercise and can sometimes produce surprising results, like finding an old yearbook picture of yourself posted on the web that you weren't aware of. Social networks such as myspace.com, facebook.com, and linkedin.com have aided in the search for people and go further by providing a way for people to connect once they're found. But I find it astounding that enterprises with large numbers of employees don't have similar tools at their disposal.

Employees are four times as likely to turn to a colleague for answers to work questions as to corporate knowledge management systems, and nearly two-thirds think the information they get from colleagues is generally superior, according to a survey conducted by SelectMinds, which creates corporate employee and alumni networks.

Most companies do have user directories that contain basic information about each employee such as name, phone number, email and work address, job title, etc. These directories tend to standardize on LDAP, a networking protocol for querying and modifying directory services running over TCP/IP. But LDAP provides a woefully inadequate search capability. You can look up a person by typing in their name IF you know how to spell it.
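To illustrate the spelling problem, here is a small Python sketch that builds LDAP search filter strings. It only constructs the filter text (following the escaping rules of RFC 4515) and does not contact a directory. The helper names are made up for this example, and it assumes the directory stores a person's name in the standard cn (common name) attribute.

```python
def escape_filter_value(value):
    """Escape the characters that are special inside an LDAP search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def exact_name_filter(name):
    """Filter that matches only entries whose cn equals the typed name."""
    return f"(cn={escape_filter_value(name)})"


def substring_name_filter(fragment):
    """Filter that matches entries whose cn contains the typed fragment."""
    return f"(cn=*{escape_filter_value(fragment)}*)"
```

A substring filter is a little more forgiving than an exact match, but both still depend on typing the right letters: a search for "Smyth" will not find "Smith".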

It gets more interesting when you don't know who you are looking for. In this case I'm looking for experts in the company who might be able to answer a particular question. In addition to a list of documents in response to my search, why can't I also get a list of people who might know? And wouldn't it be neat if the list of people had a brief summary of their expertise (similar to a document summary) so I could determine whether to contact them or not? I could also see multifaceted search playing a role. Some of the facets returned might be the different groups of people who might know the answer, such as news groups, wikis, or even whole departments.
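Here is a rough Python sketch of what such a result might look like: a document search that also returns a ranked list of people with a short expertise summary. It is only one possible approach and rests on assumptions of mine: that each document records its authors and a few topic keywords, and that authorship is a fair proxy for knowledge. The sample data and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: each document lists its authors and topic keywords.
DOCUMENTS = [
    {"title": "Tuning crawler schedules", "authors": ["Ana"],
     "topics": ["crawler", "scheduling"]},
    {"title": "Crawler security for Domino", "authors": ["Ana", "Raj"],
     "topics": ["crawler", "security"]},
    {"title": "LDAP directory basics", "authors": ["Raj"],
     "topics": ["ldap", "directory"]},
]


def search_documents(query, documents):
    """Return documents whose title or topic list mentions the query term."""
    term = query.lower()
    return [d for d in documents
            if term in d["title"].lower() or term in d["topics"]]


def people_for(query, documents):
    """Return (matching documents, people ranked by number of matching documents)."""
    hits = search_documents(query, documents)
    match_counts = Counter()
    topics_by_person = defaultdict(Counter)
    for doc in hits:
        for author in doc["authors"]:
            match_counts[author] += 1
            topics_by_person[author].update(doc["topics"])
    people = []
    for person, count in match_counts.most_common():
        # The "expertise summary" is simply the person's most frequent topics.
        top_topics = [t for t, _ in topics_by_person[person].most_common(3)]
        people.append({"name": person, "matching_docs": count,
                       "summary": ", ".join(top_topics)})
    return hits, people
```

With the sample data, a search for "crawler" returns two documents plus two people, with Ana ranked first because she authored both matches.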

This type of "expertise" search requires much more information than the basic employee attributes typically captured in LDAP. It requires the composition of an employee profile that synthesizes what the employee does and knows. The common approach to capturing this information is to just require all employees to fully describe themselves. But this is where the fallacy lies. Most people are reluctant to fully describe themselves and will provide only minimal data (name, address, phone...). When was the last time you updated your resume or wrote a status report of your activities? The task becomes a burdensome chore and as such suffers in quality. This may be the reason we don't see an effective "people" search in business today.



Tuesday, November 07, 2006

IBM Information On Demand 2006 Conference Review


Three weeks ago I spoke at IBM's first annual Information On Demand (IOD) conference held in Anaheim, California, and I'm just now getting a chance to comment on the trip. By most standards it was a very large conference with over 5000 attendees. IBM combined six previous conferences into this one mega-conference and upped the fun factor with top-notch entertainment including Gladys Knight (without the Pips), Wayne Brady (from "Whose Line Is It Anyway?"), and a keynote address by Michael Eisner (former Disney CEO).

The conference is sponsored by IBM's Information Management Division, the makers of DB2 and the division of which my search product is a part. I think the most exciting announcement at IOD was the introduction of a new product, the IBM Information Server, which integrates many of IBM's data products into a consistent first-of-a-kind software platform. Like the Web Server, and then the Application Server, the Information Server goes to the next level, enabling clients to deliver trusted, consistent and reusable information to applications and business processes.

For me the conference was a busy one. For the first three days (Monday through Wednesday) you would most likely have found me on the exhibit floor demonstrating how IBM's enterprise search product OmniFind could be used for business intelligence. I showed how OmniFind could tease facts and other related information out of full-text documents so that they could then be fed into conventional BI tools (e.g., Cognos) for analysis.

On Thursday I gave a talk on OmniFind search security in Portal and Domino environments, followed by a four-hour lab which gave up to forty students the opportunity to work with the OmniFind product directly. At first my colleague and I were not sure we would be able to do the lab. We accidentally chose an IP address for our VMware image that matched the IP address of the wireless LAN for the conference (what are the chances of doing that?). Once we figured out what the problem was we changed the IP address and were back on track (whew!).

In my spare time I was able to meander around the conference and learn what's new. What caught my interest the most were two ad-tech exhibits that demonstrated new search technologies from IBM Research. One exhibit demonstrated a search for audio clips based on the content of the audio itself, not just the closed-caption text associated with the clip. The technology translated speech into text (not a trivial task) and then fed the text into a search engine for conventional search.

Another exhibit demonstrated multi-faceted search with computed expressions. Most multi-faceted search systems only show you counts representing the number of documents contained within each facet. The system from IBM showed you not only the counts but also your choice of computed expression (e.g., average, sum, etc.) for the facet. For example, I could have a facet for lightweight laptop computers and associate an average cost expression with the facet so that I would see not only how many laptops are in that category but also their average price. It starts to sound a lot like a relational database application, but we need to remind ourselves that this information is being extracted and computed from textual information, not structured information contained in a relational database.
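The idea of a facet that carries a computed expression can be sketched in a few lines of Python. This is my own toy illustration, not the IBM Research system: it assumes the facet value and the price have already been extracted from the text of each document, and the field names, sample records, and function name are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, assumed to have been extracted from document text.
LAPTOPS = [
    {"name": "A", "weight_class": "lightweight", "price": 1200.0},
    {"name": "B", "weight_class": "lightweight", "price": 1000.0},
    {"name": "C", "weight_class": "standard", "price": 800.0},
]

# The computed expressions a user may attach to a facet.
EXPRESSIONS = {"average": mean, "sum": sum, "min": min, "max": max}


def facet_with_expression(docs, facet_field, value_field, expression="average"):
    """Group documents by facet and return the count plus one computed value per facet."""
    compute = EXPRESSIONS[expression]
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[facet_field]].append(doc[value_field])
    return {
        facet: {"count": len(values), expression: compute(values)}
        for facet, values in groups.items()
    }
```

Calling facet_with_expression(LAPTOPS, "weight_class", "price") reports two lightweight laptops with an average price of 1100.0 and one standard laptop at 800.0. The grouping itself is easy; the hard part in a real system is reliably extracting the weight class and price from unstructured text in the first place.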

Overall, I felt that the conference was excellent and something information architects and enthusiasts should not miss next October in Las Vegas.
