BBC: Search Is a Backwater

September 27, 2008

I just read a quite remarkable essay by a gentleman named Richard Titus, Controller, User Experience & Design for BBC Future Media & Technology. (I like the word controller.) I am still confused by the time zone zipping I have experienced in the past seven days. At this moment, I don’t recall whether I have met Mr. Titus or read other writings by him. What struck me is that he was a keynote speaker at a BBC Future Media & Technology Conference. My first reaction was that, to learn about the future, a prestigious organization like the BBC might have turned to the non-BBC world. The Beeb disagreed and looked to its own staff to illuminate the gloomy passages of Christmas Yet to Come. You can read his essay “Search and Content Discovery” here. In fact, you must read it.

With enthusiasm I read the essay. Several points flew from the page directly into the dead letter office of my addled goose brain. There these hot little nuggets sat until I could approach them in safety. Here are the points that cooked my thinking:

  1. Keyword search is brute force search.
  2. Yahoo BOSS is a way to embrace and extend search.
  3. The Xoogler Cuil.com system looked promising but possibly disappoints.
  4. Viewdle facial recognition software is prescient. (This is an outfit hooked up with Thomson Reuters, known for innovation by chasing markets before the revenue base crumbles away. I don’t associate professional publishers with innovation, however.)
  5. Naver from Korea is a super electronic game portal.
  6. Mahalo is a human-mediated system and also interesting, and the BBC has a topics page which also looks okay.
  7. SearchMe, also built by Xooglers, uses a Flash-based interface.

image

Xooglers are inspired by Apple’s Cover Flow. Now, how many hits did my query “beyond search” get? Can your father figure out how to view the next hit or make this one large enough to read? A brute force way to get information, of course.

These points were followed by this statement:

When you marry solid data and indexing (everyone forgets that Google’s code base is almost ten years old), useful new data points (facial recognition, behavioral targeting, historical precedent, trust, etc) with a compelling and useful user experience, we may see some changes in the market leadership of search.

I would like to comment on each of these points:


TeezIR BV: Coquette or Quitter

September 26, 2008

For my first visit to Utrecht, once a bastion of Catholicism and now a Rabobank stronghold, I wanted to speak with interesting companies engaged in search and content processing. After a little sleuthing, I spotted TeezIR, a company founded in November 2007. When I tried to track down one of the principals–Victor Van Tol, Arthus Van Bunningen, and Thijs Westerveld–I was stonewalled. I snagged a taxi and visited the firm’s address (according to trusty Google Maps) at Kanaalweg 17L-E, Building A6. I made my way to the second floor but was unable to rouse the TeezIR team. I am hesitant to say, “No one was there”. My ability to peer through walls after a nine-hour flight is limited.

I asked myself, “Is TeezIR playing the role of a coquette or has the aforementioned team quit the search and content processing business?” I still don’t know. At the Hartmann conference, no one had heard of the company. One person asked me, “How did you find out about the company?” I just smiled my crafty goose grin and quacked in an evasive manner.

The trick was that one of my two or three readers of this Web log sent me a snippet of text and asked me if I knew of the company:

Proprietary, state-of-the-art technology is information retrieval and search technology. Technology is built up in “standardized building blocks” around search technology.

So, let’s assume TeezIR is still in business. I hope this is true because search, content processing, and the enterprise systems dependent on these functions are in a sorry state. Cloud computing is racing toward traditional on-premises installations the way hurricanes line up to smash the American southeast. There’s a reason cloud computing is gaining steam–on-premises installations are too expensive, too complicated, and too much of a drag on a struggling business. I wanted to know if TeezIR was the next big thing.

My research revealed that TeezIR had some ties to the University of Twente. One person at the Hartmann conference told me that he thought he heard that a company in Ede had been looking for graduate students to do some work in information retrieval. Beyond that tantalizing comment, I was able to find some references to Antal van den Bosch, who has expertise in entity extraction. I found a single mention of Luuk Kornelius, who may have been an interim officer at TeezIR and at one time a laborer in the venture capital field with Arengo (no valid link found on September 16, 2009). Other interesting connections emerged from TeezIR to Arjen P. de Vries (University of Twente), Thomas Roelleke (once hooked up with Fredhopper), and Guido van’t Noordende (security specialist). Adding these names to the management team here, TeezIR looked like a promising start up.

Since I was drawing a blank on getting people affiliated with TeezIR to speak with me, I turned to my own list of international search engines here, and I began the thrilling task of hunting for needles in haystacks. I tell people that research for me is a matter of running smart software. But for TeezIR, the work was the old-fashioned variety.

Overview

Here’s what I learned:

First, the company seemed to focus on the problem of locating experts. I grudgingly must call this a knowledge problem. In a large organization, it can be hard to find a colleague who, in theory, knows an answer to another employee’s question. Here’s a depiction of the areas in which TeezIR is (was?) working:

image

Second, TeezIR’s approach is (was?) to make search an implicit function. Like me, the TeezIR team realized that by itself search is a commodity, maybe a nonstarter in the revenue department. Here’s how TeezIR relates content processing to the problem of finding experts:

image


Google Yahoo: A Contrarian’s View

September 21, 2008

In high school, I would get into trouble by asking, “What if we look at this idea from a different point of view?” My high school teachers were kindly but not too eager to listen to a question and then a suggestion that their world view was out of kilter. I am not sure why I developed this habit of mind, but when I got to my first real job at Halliburton (Nuclear Utility Services), I discovered that the nuclear physicists and mathematicians who made up 80 percent of the unit liked my approach. Instead of ignoring me or putting my desk in the hall as my high school teacher Miss Sperling did, these guys and gals would light up like white LEDs and dig in, intellectually speaking.

After reading Randall Stross’s analysis here, I felt he was on the right track for 180 degree thinking, but he was hitting the snow-covered peaks, ignoring the basalt layers on which his big idea rests. Then I read with enjoyment Michael Arrington’s “Why the Google Yahoo Ad Deal Is Something to Fear.” You can read that essay here. Not only did I enjoy his writing, several of his points resonated with me. Nevertheless, my contrarian approach levered both of these astute gentlemen’s comments into several ideas that are rotated a few degrees from each man’s position.

First, the barn is on fire. It’s burning fast. The horses are gone. The hay is burning fiercely, and the Harrod’s Creek fire engine, aided by fire engines from elsewhere, can’t douse the flames. So the fire fighting professionals hose some bushes, squirt water on the roof of an adjoining building, and watch the barn burn. My view is that Google was ignored in the period from 1995 to 1998 when Messrs. Brin and Page were fooling around with BackRub. Then in the period from 1998 to 2004, some smart money urged the Googlers and their small cohort of former DEC / AltaVista.com, Bell Labs, and Sun Microsystems’ colleagues forward. When the IPO loomed, Google settled with Yahoo for about a billion dollars. Yahoo realized that Google had learned from early GoTo.com, Overture.com, and other ad efforts. Instead of reinventing the stone wheel, Google had vulcanized a Michelin radial. Clumsy metaphor, but if you have a stone wheel and your friendly competitor has Michelins, you have limited choices. Yahoo sued and elected to keep using stone wheels. The result is that Yahoo has the kind of choice that BF Goodrich gives NASCAR teams. Use our tires or don’t race. It works in auto racing, and it is working in online advertising.

image

Who wants to stand in front of this and slow down this bullet train?

Second, Google really doesn’t “sell” advertising. Like the local utility monopoly or the local water company, you can sign up for power and water or spend your money drilling a geothermal hole, erecting solar panels, and buying Evian by the truckload. Google is a service, and if the users and the advertisers did not want to make whoopie, there’s not much Google can do about it. In fact, telling a company dependent on Google traffic that it can no longer advertise is probably one of those remarkable opportunities to explore the law of unintended consequences in detail. I don’t know about you, but I have yet to meet a government panel or regulatory committee with a solid grasp that Google is a giant digital computer. Ad matching and users searching are just applications. If Google removes these functions, developers can use Google’s APIs to build their own systems, and Google can charge a fee and take a piece of the action. The result? No changes and maybe even more money for Google because there is a great deal of interest in tapping into Google traffic.


Search: Moving Up the Buzzword Chain of Being

September 20, 2008

In one of my university required courses, the professor revealed the secrets of “the great chain of being”. After 45 years, my recollections of Dr. Pearce’s lecture are fuzzy, but I recall that at the top of the chain was God, then angels, and then a pecking order of creatures. Down at the bottom were paramecia like me.

Search terminology works like this, I concluded after giving my talk at Erik Hartmann’s conference in Utrecht. I prepared for my remarks by talking with a dozen vendors exhibiting at the conference. I also listened to various presenters for five to 15 minutes each. I had to limit my listening in order to get a representative sampling of the topics and interests of the conference attendees.

What I concluded was:

  1. People perceive Google as a Web search company that sells ads. In this biased sample, I noted a discomfort about Google’s growing dominance of digital information. I did not hear anyone criticize Google, but I sensed a growing concern about privacy, scope, traffic, etc. I remain excited about Google and probably come across as a Google cheerleader, which annoyed some of the people with whom I spoke.
  2. Vendors and consultants who once hawked content management, records management, and enterprise search have changed their tune. Instead of talking about CMS, EDM, and other smart sounding acronyms, the vendors are pulling terminology out of MBA lexicons. (More about this in a moment.)
  3. The people listening to these talks, including mine, hunger–even plead–for solutions to challenges arising from their inability to find needed information, manage terabytes of digital “stuff” in their offices, and create a solution that does not require constant spoon feeding.

The result is that “old” solutions and half-baked solutions are wrapped in new terminology taken from a higher level in the “great chain of buzzwords”. Here’s an example: instead of saying “enterprise search” or “behind the firewall search”, some vendors talked about “information access” and “findability”, whatever that means. The lesser word is search, which most people seemed to agree was uninteresting, which is a code word for “does not work”. The words “information access” come from a loftier position on the buzzword “great chain of being”. The vendors are sounding more like McKinsey and Booz Allen know-nothings than subject matter experts.

image

A representation of the Great Chain of Being. Image source: http://www.kheper.net/topics/greatchainofbeing/Steps.gif

Consider this example: “business process management”. This is definitely a buzzword from a loftier position on the buzzword “great chain of being”. “BPM” is in the Heaven category, not the Stone or Flame category. But I don’t know what BPM means. I think the folks using this word want to avoid precise definitions because precision limits their freedom. Implying that “BPM” will solve a problem is easier than actually diagnosing the problem and solving it. “BPM” was the acronym of the conference. Presenters from publishers, consultancies, and vendors inserted this three-letter token for what seemed like a pretty basic notion; that is, the steps needed to complete a task. Since search and content management are losers in the revenue generating department, folks engaged in these activities now talk about BPM. Old wine, new bottles, but the labels have buzzwords from higher in the “great chain of being”.


How Smart Is Google’s Software?

September 17, 2008

When you read this, I will have completed my “Meet the Guru” session in Utrecht for Eric Hartmann. More information is here. My “guru” talk is not worthy of its name. What I want to discuss is the relationship between two components of Google’s online infrastructure. This venue will mark the first public reference to a topic I have been tracking and researching for several years–computational intelligence. Some background information appears in the Ignorance Is Futile Web log here.

I am going to reference my analysis of Google’s innovation method. I described this in my 2007 study The Google Legacy, and I want to mention one Google patent document; specifically, US20070198481, which is about fact extraction. I chose this particular document because it references research that began a couple of years before the filing and the 2007 publication of the document. It’s important in my opinion because it reveals some information about Google’s intelligent agents, which Google references as “janitors” in the patent application. Another reason I want to highlight it is that it includes a representation of a Google results list as a report or dossier.

Each time I show a screen shot of the dossier, any Googlers in the audience tell me that I have Photoshopped the Google image, revealing their ignorance of Google’s public patent documents and the lousy graphical representations that Google routinely places in its patent filings. The quality of the images and cute language like “janitors” are intended to make it difficult to figure out what Google engineers are doing in the Google cubicles. Any Googlers curious about this image (reproduced below) should look at Google’s own public documents before accusing me of spoofing Googzilla. This now happens frequently enough to annoy me, so, Googlers, prove you are the world’s smartest people by reading your own patent documents. That’s what I do to find revealing glimpses such as this one, displayed for a search of the bound phrase “Michael Jackson”:

image

The highlight boxes and call outs are mine. What this diagram shows is a field (structured) report or dossier about Michael Jackson. The red vertical box identifies the field names of the data and the blue rectangle points your attention to the various names by which Michael Jackson is known; for example, Wacko Jacko.

Now this is a result that most people have never seen. Googlers react to this in shock and disbelief because only a handful of Google’s more than 19,000 employees have substantive data about what the firm’s top scientists are doing at their jobs. I’ve learned that 18,500 Googlers “run the game plan”, a Google phrase that means “Do what MOMA tells you”. Google patent documents are important because Google has hundreds of US patent applications and patents, not thousands like IBM and Microsoft. Consequently, there is intent behind funding research, paying attorneys, and dealing with the chaotic baloney that is the specialty of the USPTO.


Yahoo Open: Why the Odds Don’t Favor Yahoo

September 16, 2008

When we started The Point (Top 5% of the Internet) in 1993, our challenge was Yahoo. I recall my partner Chris Kitze telling me that the Yahoo vision was to provide a directory for the Internet. Yahoo did that. We sold The Point to Lycos and moved on. So did Yahoo. Yahoo became the first ad-supported version of America Online. The company also embarked on a series of acquisitions that permitted each unit to exist as a tiny fiefdom within the larger “directory” and emerging “ad-supported” AOL-style business. In the rush to portals and advertising, Yahoo ignored search and thus began its method of buying (Inktomi), licensing (InQuira), or acquiring via a buy out (Flickr) different search engines. Google was inspired by the Overture ad engine. Yahoo surveyed its heterogeneous collection of services, technologies, and systems and ended up the company it is today–an organization looking to throw a Hail Mary pass for the game winning touchdown. That strategy won’t work. Yahoo has to move beyond its Yahooligan approach to management, technology, and development.

image

The ArnoldIT.com and Beyond Search teams have had many conversations about Yahoo in the last year. Let me summarize the points that keep a lid on our enthusiasm for Yahoo and its present trajectory:

  1. Code fiddling. Yahoo squandered an opportunity to make the Delicious bookmarking service the dominant player in this segment because Yahoo’s engineers insisted on rewriting Delicious. Why fiddle? Our analysis suggests that Yahoo’s engineers don’t know how to take a hot property, scale it, and go for the jugular in the market. The approach is akin to recopying an accounting worksheet by hand because it is just better when the worksheet is perfect. Wrong.
  2. Street peddler pushcart. Yahoo never set up a method to integrate each acquisition tightly. I recall a comment from a person involved in GeoCities years ago. The comment was, “Yahoo just let us do our own thing.” Again this is not a recipe for cost efficiency. Here’s why: The Overture system, when acquired, ran on Solaris with some home grown Linux security. Yahoo bought other properties that were anchored in MySQL. Then Yahoo engineers cooked up their own methods for tests like Mindset. When a problem arose, experts were in their own submarines and could not really help with other issues. Without a homogeneous engineering vision, staff were not interchangeable, and costs remained tough to control. The situation is the same as when my mother bought a gizmo from the street peddler in Campinas, Brazil. She got a deal, but the peddler did not have a clue about what the gizmo did, how it worked, or how to fix it. That’s Yahoo’s challenge today.
  3. Cube warfare. Here’s the situation that, according to my research, forced Terry Semel to set up a sandwich management system. One piece of bread was the set of technical professionals at Yahoo. The other piece of bread was Yahoo top management. Top management did not understand what the technical professionals said, and when technical professionals groused about other silos at Yahoo, Mr. Semel put a layer of MBAs between engineers and top management to sort out the messages. It did not work, and Yahoo continues to suffer from spats across, within, and among the technical units of the company. It took Yahoo years to resolve owning both Flickr and Yahoo Photos. I still can’t figure out which email system is which. I can’t find some Yahoo services. Shopping search is broken for me. An engineer here bought a Yahoo Music subscription service for his MP3 player. Didn’t work from day one, and not a single person from Yahoo lifted a finger, not even the one tracked down via IRC. I created some bookmarks and now have zero idea what the service was or where the marks are located. It took me a year to cancel the billing for a Yahoo music service a client paid me to test. (I think it was Yahoo Launch. Or Yahoo Radio. Or Yahoo Broadcast. Hard to keep ’em straight.) Why? No one cooperates. Google and Microsoft aren’t perfect. But compared to Yahoo, both outfits get passing grades. Yahoo gets to repeat a semester.

When I read the cheerleading for Yahoo in CNet here or on the LA Times’s Web log here, I ask, “What’s the problem with nailing Yahoo on its deeper challenges?” I think it’s time for Yahoo to skip the cosmetics and grandstanding. With the stock depressed, Yahoo could face a Waterloo if its Google deal goes south. Microsoft seems at this time to be indifferent to the plight of the Yahooligans. Google is cruising along with no significant challenge except a roadblock built of attorneys stacked like cordwood.

Yahoo is a consumer service. The quicker it thinks in terms of consumerizing its technology to get its costs in line with a consumer operation, the better. I’m not sure 300 developers can do much about the corrosive effects of bad management and a questionable technical strategy. Maybe I’m wrong? Maybe not? We sold The Point in 1995 and moved on with our lives. Yahoo, in my opinion, still smacks of the Internet circa 1995, not 2008 and beyond.

Stephen Arnold, September 16, 2008

Extending SharePoint Search

September 15, 2008

Microsoft SharePoint is a widely used content management and collaboration system that ships with a workable search system, which I’ll refer to as ESS, for Enterprise Search System. But for greater scale and customization, you’ll want to look to third-party systems for help.

SharePoint has reduced the time and complexity of customizing result pages, handling content on Microsoft Exchange servers, and accessing most standard file types. In our tests of SharePoint, ESS does a good job and offers some bells and whistles, like identifying individuals whose content suggests they are knowledgeable about a specific topic. Managing crawls or standard index cycles is point and click, SharePoint is security aware, and customization is easy. But licensees will hit a “glass ceiling” when indexing upwards of 30 million documents. To provide a solution, Microsoft purchased Fast Search & Transfer. Microsoft has released a Fast Search Web part to make integration of the FAST Enterprise Search Platform, or ESP, easier. The SharePoint FAST ESP Web part is located on Microsoft’s CodePlex Web site, and the documentation can be obtained here.

But licensing Fast ESP can easily soar above $250,000, excluding customization and integration service fees, making it a major investment to deliver acceptable search-and-retrieval functionality for large, disparate document collections. So what can a SharePoint licensee do for less money?

The good news is that there are numerous solutions available. These range from open source options such as Lucene and FLAX to the industrial-strength Autonomy IDOL (intelligent data operating layer), which can cost $300,000 or more before support and maintenance fees are tacked on.

Third-party systems can reduce the time required to index new and changed documents. One of the major reasons for shifting from the ESS to a third-party system is a need to provide certain features for your users. Among the most-requested functions are deduplication of result sets, parametric searching/browsing, entity extraction and on-the-fly classification, and options for merging different types of content in the SharePoint environment. The good news is that there are more than 300 vendors with enterprise search systems that to a greater or lesser degree support SharePoint. The bad news is that you have to select a system.
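Deduplication of result sets, the first of those most-requested functions, is easy to sketch. Most engines reduce each hit to a normalized fingerprint and keep the first hit per fingerprint; the sketch below illustrates the idea in Python and is a generic illustration, not any particular vendor’s implementation:

```python
import hashlib

def dedupe_results(results: list[dict]) -> list[dict]:
    """Drop near-duplicate hits by fingerprinting normalized snippet text.
    Real engines use fancier shingling; this keeps the first hit per digest."""
    seen = set()
    unique = []
    for hit in results:
        # Lowercase and collapse whitespace so trivially different copies match.
        normalized = " ".join(hit["snippet"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(hit)
    return unique

hits = [
    {"url": "a", "snippet": "SharePoint search  tips"},
    {"url": "b", "snippet": "sharepoint search tips"},  # duplicate after normalizing
    {"url": "c", "snippet": "Fast ESP pricing"},
]
# dedupe_results(hits) keeps the hits at "a" and "c"
```

Parametric browsing and entity extraction follow the same pattern at larger scale: structured attributes computed at index time, filtered at query time.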

Switching Methodology

Each IT professional with Microsoft certification knows how to set up, configure, and maintain SharePoint and other “core” Microsoft server systems. Let’s look at a methodology for replacing SharePoint’s built-in search with ISYS Search Software’s ISYS:web. ISYS is one of a half-dozen vendors offering so-called “SharePoint search” capabilities.

Here’s a rundown of a procedure that minimizes pitfalls:

  1. Set up a development server with SharePoint running. You don’t need to activate the search services. This can be on a computer running Windows Server 2003 or 2008. Microsoft recommends at a minimum a server with dual CPUs, each running at least 3 GHz, and 2 GB of memory. Also necessary for installation are Internet Information Services (IIS, along with its WWW, SMTP, and Common Files components), version 3.0 or greater of the .NET Framework, and ASP.NET 2.0. A more detailed look at these requirements can be found here.
  2. Create a single machine with several folders containing documents and content representative of what you will be indexing.
  3. Install ISYS:web 8 on the machine running SharePoint.
  4. Work through the configuration screens, noting the information required to add additional content repositories to index. An intuitive ISYS Utilities program will let you configure SharePoint indexes.
  5. Launch the ISYS indexing component. Note the time indexing begins and ends. You will need these data in order to determine the index build time when you bring the system up for production.
  6. Run test queries on the indexed content. If the results are not what you expect, make a return visit to the ISYS set up screens, verify your choices, delete the index, and reindex the content collection. Be sure to check that entities are appearing in the ISYS display.
  7. Open the ISYS results template so you can familiarize yourself with the style sheet and the behind-display controls.
  8. Once you are satisfied that the basics are working, verify that ISYS is using security flags from Active Directory.
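For step 5, the start and end times can be captured automatically rather than with a wristwatch. This small harness wraps any external indexing command and reports elapsed seconds; the command name in the comment is a placeholder, not a documented ISYS binary:

```python
import subprocess
import time

def timed_index_build(command: list[str]) -> float:
    """Run an indexing command to completion and return elapsed seconds,
    useful for projecting the production index build time from the
    development-server test collection."""
    start = time.monotonic()
    subprocess.run(command, check=True)  # raises if the indexer fails
    return time.monotonic() - start

# Example (placeholder command name, substitute the indexer your vendor ships):
# elapsed = timed_index_build(["isysindex.exe", "/config", "dev.cfg"])
```

Scale the measured time by the ratio of production documents to test documents for a rough first estimate.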

At this point, you can install ISYS on the production server and begin the process of generating the master index. Image files for the ISYS installation are available from ISYS. These include screen shots illustrating how to set up the ISYS index.

Some Gotchas to Avoid

First, when documents change, the search system must recognize the change, copy or crawl the document, and make the changed document available to the indexing subsystem. The new index entries must be added to the main index. When a slowdown occurs, check the resources available.

Second, keep in mind that new documents must be indexed and changed documents have to be reindexed. Setting the index update at too aggressive a level can slow down query processing. Clustering can speed up search systems, but you will need to allocate additional time to configure and optimize the systems.

Third, additional text processing features such as deduplication, entity extraction, clustering, and generating suggestions or See Also hints for users suck computing resources. Fancy extras can contribute to sluggish performance. Finally, trim the graphical bells and whistles. Eye candy can get in the way of a user’s getting the information required quickly.
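The first two gotchas both hinge on detecting changed documents cheaply. One common approach, sketched here under the assumption of a simple file-share repository, is to compare content hashes between crawl passes so only new or modified documents reach the indexing subsystem:

```python
import hashlib
from pathlib import Path

def changed_documents(root: Path, seen: dict[str, str]) -> list[Path]:
    """Return files whose content hash differs from the previous crawl.
    `seen` maps path -> digest and is updated in place, so the next pass
    skips anything that has not changed."""
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen.get(str(path)) != digest:
            changed.append(path)
            seen[str(path)] = digest
    return changed
```

Running this on a schedule, rather than on every file-system event, is one way to keep the update cycle from competing with query processing.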

To sum up, SharePoint ships with a usable search-and-retrieval system. When you want to break through the current document barrier or add features quickly, you will want to consider a third-party solution. Regardless of the system you select, set up a development server and run shakedowns to make sure the system will deliver the results the users need.

Stephen Arnold, September 15, 2008

Search: A Failure to Communicate

September 12, 2008

At lunch today, the ArnoldIT.com team embraced a law librarian. For Mongolian beef, this information professional agreed to talk about indexing. The conversation turned to the grousing that lawyers do when looking for information. I remembered seeing a cartoon that captured the problem we shelled, boiled, and deviled during our Chinese meal.

image

Source: http://www.i-heart-god.com/images/failure%20to%20communicate.jpg

Our lunch analysis identified three constituencies in a professional services organization. We agreed that narrowing our focus to consultants, lawyers, financial mavens, and accountants was an easy way to put egg rolls in one basket.

First, we have the people who understand information. Think indexing, consistent tagging for XML documents, consistent bibliographic data, the credibility of the source, and other nuances that escape my 86-year-old father when he searches for “Chicago Cubs”.

Second, we have the information technology people. The “information” in their title is a bit of misdirection that leads to a stir fry of trouble. IT pros understand databases and file types. Once data are structured and normalized, the job is complete. Algorithms can handle the indexing and the metadata. When a system needs to go faster, the fix is to buy hardware. If it breaks, the IT pros tinker a bit and then call in an authorized service provider.

Third, we have the professionals. These are the ladies and gentlemen who have trained to master a specific professional skill; for example, the legal eagle or the bean counter. These folks are trapped within their training. Their notions of information are shaped by their deadlines, crazed clients, and crushing billability.

Here’s where the search system or content processing system begins its rapid slide to the greasy bottom of the organization’s wok.

  1. No one listens to or understands the other players’ definition of “information”.
  2. The three players, unable to get their points across, clam up and work to implement their own vision of information.
  3. The vendors, hungry for the licensing deal, steer clear of this internal collision of ignorant, often supremely confident souls.
  4. The system is a clunker, doing nothing particularly well.

Enter the senior manager or the CFO. Users are unhappy. Maybe the system is broken and a big deal is lost or a legal matter goes against the organization. The senior manager wants a fix. The problem is that unless the three constituents go back to the definition of information and carry that common understanding through requirements, to procurement, to deployment, not much will change.

Like the old joke says, “Get me some new numbers or I will get a new numbers guy.” So, heads may roll. The problem remains the same. The search and content processing system annoys a majority of its users. Now, a question for you two or three readers: “How do we fix this problem in professional services organizations?”

Stephen Arnold, September 12, 2008

Redshift: With Google It Depends From Where You Observe

September 10, 2008

My research suggests that opportunities, money, and customers are rushing toward Google. Competitors–like publishers–are trying to rush away, but the “gravitational pull” is too great. Traditional publishers don’t have the escape velocity to break away. Is this a redshift or a blueshift?

Dr. Greg Papadopoulos, Sun Microsystems wizard, gave a talk at the 2007 Analyst Summit (summit is an overused word in the conference universe, in my opinion) called “Redshift: The Explosion of Massive Scale Systems.” I think much of the analysis is right on, but the notion of a “redshift” (not a misspelling) applies to rushing away from something, not rushing toward something. You can download a copy of this interesting presentation here. (Verified on September 9, 2008.)

Dr. Papadopoulos referenced Google in this lecture in 2007. For the purposes of this post, I will think of his remarks as concerning Google. I’m a captive of my own narrow research. I think that’s why this presentation nagged at my mind for a year. Today, reading about hadron colliders and string theory, I realized that it depends on where one stands when observing Doppler effects. From my vantage point, I don’t think Google was a redshift. You can brush up on this notion by scanning the Wikipedia entry, which seems okay to me, but I am no theoretical physicist. I did work at a nuclear engineering firm, but I worked on goose feathers, not gluons and colors. From what I recall, when the object speeds away from the observer, you get the “red shift”. When the object rushes towards the observer, you get the blue shift. Redshift means the universe is expanding when one observes certain phenomena from earth. Blueshift means something is coming at you. Google is pretty darn blue to my eyes.
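The sign convention is easy to check with the non-relativistic Doppler formula, where positive velocity means the source recedes from the observer. A minimal sketch:

```python
def observed_wavelength(emitted: float, velocity: float, c: float = 3.0e8) -> float:
    """Non-relativistic Doppler shift. velocity > 0: the source recedes,
    so the wavelength stretches (redshift). velocity < 0: the source
    approaches, so the wavelength compresses (blueshift)."""
    return emitted * (1.0 + velocity / c)

# A source receding at 1% of light speed shifts 500 nm light toward the red;
# an approaching one shifts it toward the blue.
red = observed_wavelength(500.0, 3.0e6)
blue = observed_wavelength(500.0, -3.0e6)
```

By this convention, customers rushing toward Google make Google look blue, not red, from where Google stands.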

The Papadopoulos presentation contains a wealth of interesting and useful data. I am fighting the urge to cut, paste, borrow, and recycle. But there are three points that warrant a comment.


Attivio: New Release, Support for 50+ Languages

September 7, 2008

I’m not sure if it’s because Attivio is located less than five miles from Fenway Park, where everyone is, by default, a rabid Sox fan, but I got a preview of a slick new baseball demo they’ve put together to showcase the capabilities of their Active Intelligence Engine (AIE), which is trademarked.

For the upcoming Enterprise Search Summit West in late September, Attivio created a single index composed of more than 700,000 news articles about baseball, dating from 2001 to 2007. Attivio told me that these were fed into the AIE in XML format. Attivio also processed a dozen comma-delimited files that contain baseball statistics such as batting, pitching, player salaries, team information, and players’ post-season performances. Here are the results from my search for steroids.

image

© Attivio, 2008

Several aspects of this interface struck me as noteworthy. I liked:

  1. The ability to enter a word or phrase, a SQL query, or a combination “free text” item and a SQL query. Combining the ambiguity of natural language with the precision of a structured query language instruction gives me the type of control I want in my analytic work. Laundry lists don’t help me much. Fully programmatic systems like those from SAS and SPSS are too unwieldy for the fast-cycle work that I have to do.
  2. The point-and-click access to entities, alternative views, and other “facet” functions. Without having to remember how to perform a pivot operation, I can easily view information from structured and unstructured sources with a mouse click. For my work, I often pop between data and information associated with a single individual. The Attivio approach is a time saver, which is important for my work on tight deadlines.
  3. Administrative controls. The Attivio 1.2 release makes it easy for me to turn on certain features when I need them; for example, I can disable the syntax view with a mouse click. When I need to fiddle with my search statement, a click turns the function back on. I can jump to an alerts page to specify what I want to receive automatically and configure other parameters.
  4. Hit highlighting. I want to be able to spot the key fact or passage without tedious scanning.
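Point 1 is worth making concrete. Attivio’s actual query syntax isn’t shown here, so the sketch below uses invented fields and plain Python to illustrate what combining a free-text match with a SQL-style predicate over one index buys you:

```python
from dataclasses import dataclass

@dataclass
class Article:
    text: str
    season: int
    team: str

def combined_query(docs: list[Article], phrase: str, predicate) -> list[Article]:
    """AND a free-text containment test with a structured predicate,
    mimicking a combined free-text + SQL-style query over a single index."""
    return [d for d in docs if phrase.lower() in d.text.lower() and predicate(d)]

corpus = [
    Article("Slugger named in steroids inquiry", 2004, "BOS"),
    Article("Pitcher dominates in opener", 2004, "NYY"),
    Article("League tightens steroids policy", 2006, "BOS"),
]
# Free text "steroids" plus the structured condition season < 2005:
hits = combined_query(corpus, "steroids", lambda d: d.season < 2005)
```

The point is the intersection: the ambiguity of the phrase is constrained by the precision of the structured filter, which is what makes the combination useful for analytic work.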

