Real Time Search Systems, Part 2

June 22, 2010

Editor’s note: This post tiptoes through the tulips. In this instance, tulips is a synonym for industrial strength content processing systems that can be licensed by commercial entities, governmental organizations, or individuals who want to become a baby Fuld or Kroll. Achieving this type of azure chip transcendence means that you will be a hit at the local bingo parlor when you share your insights with your table mates.

Industrial Strength Tools

The free services don’t provide the user with much in the way of post processing horsepower. Another weakness of free services is that the average user deals with what each system spits out in response to a click or a query. The industrial strength systems provide such functions as:

A system or method for “plugging” in different streams of content. Examples range from electronic mail in the wonderful Microsoft Exchange Server to proprietary content stuffed into a clunky content management system. These connectors are a big deal because without different inputs of content, a real time search engine does not have the wood to burn in the fire box.

Each system provides or supports some type of software circuit board. The idea is that the content moves from the connectors over the circuits on the circuit board to its destination. Acquired content must be processed, so its first destination is a system or systems which extract data, generate metadata, and, in the case of Google, figure out the context of the message. The result is an index that contains index terms, metadata, and often such extras as a representation of the source message, precalculated values, and new information constructs.

Applications or “hooks” that make it possible for another software program to tap into the generated values and processed content to create an output. Now the outputs can vary widely. Another software system may just look up an item. Another software application might glue together different items from the index and content representation. The user sees a report, a display on a mobile phone, or maybe a mashup which allows the human to “recognize” or “spot” what’s needed. No searching required.
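For readers who think in code, here is a minimal sketch of the three functions above: connectors, the circuit board, and a hook. It is a toy, not how any vendor's system works. Every name in it (Document, mail_connector, cms_connector, add_metadata, Index, run_pipeline) is hypothetical, and the "metadata" is nothing fancier than a word count.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Document:
    source: str                                   # which connector produced the item
    text: str
    metadata: dict = field(default_factory=dict)  # filled in by processing stages


def mail_connector() -> Iterable[Document]:
    # Hypothetical connector: stands in for an electronic mail feed.
    yield Document("mail", "Quarterly report attached for review")


def cms_connector() -> Iterable[Document]:
    # Hypothetical connector: stands in for a content management system.
    yield Document("cms", "Report on the new product launch")


def add_metadata(doc: Document) -> Document:
    # Toy processing stage: generate one piece of metadata, a word count.
    doc.metadata["word_count"] = len(doc.text.split())
    return doc


class Index:
    def __init__(self) -> None:
        self.postings = defaultdict(set)  # index term -> set of document ids
        self.docs: List[Document] = []    # representation of the source items

    def add(self, doc: Document) -> None:
        doc_id = len(self.docs)
        self.docs.append(doc)
        for term in doc.text.lower().split():
            self.postings[term].add(doc_id)

    def lookup(self, term: str) -> List[Document]:
        # The "hook": another program calls this to tap the processed content.
        ids = self.postings.get(term.lower(), set())
        return [self.docs[i] for i in sorted(ids)]


def run_pipeline(connectors: List[Callable[[], Iterable[Document]]],
                 stages: List[Callable[[Document], Document]],
                 index: Index) -> None:
    # The "circuit board": route each item from connector, through stages, to the index.
    for connector in connectors:
        for doc in connector():
            for stage in stages:
                doc = stage(doc)
            index.add(doc)


index = Index()
run_pipeline([mail_connector, cms_connector], [add_metadata], index)
hits = index.lookup("report")
```

A reporting tool or a mobile display would sit on top of lookup and glue the returned items together. The point of the sketch is only the shape: inputs are pluggable, processing is a chain, and consumers never touch the raw sources.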

The Vendors

In my lectures I mentioned some different outfits in each of my two talks. I have rolled up the vendors in the list below. My suggestion is to do some research about each of these companies. I provide “additional color” on the technologies each vendor licenses, but that information is not going to find its way into a free blog posting. Problem? Read the About information available from the tab at the top of this page.

  • Exalead http://www.exalead.com Robust system which handles structured and unstructured data. Outputs may be piped to other enterprise software, a report, or a peripatetic worker with a mobile phone in Starbucks.
  • Fetch Technologies http://fetch.com Developed initially for certain interesting government information needs, you can customize Fetch using its graphical programming method and perform some quite useful analyses.
  • JackBe http://www.jackbe.com Developed initially for certain interesting government information needs, you can license JackBe and process a wide range of content.
  • Silobreaker http://www.silobreaker.com Developed initially for certain interesting government information needs, you can output reports that are as good as the roll ups crafted by a trained intelligence professional.

What do these systems do in “real time?” Each of them, when properly resourced, can ingest flows of data and unstructured content, assign metadata, and output alerts, reports, or Google-style search results within minutes of the content becoming known to the system.

If you have millions of dollars or euros, you can slice latency to a second or two. If you are a “normal” operation, you will be in the 12 to 15 minute range except when spikes hit. In the “interesting government need” category, spikes occur when some unusual events occur. No more from me on this topic in a free blog, gentle reader.
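To make “latency” concrete, here is a small sketch of how one might measure it: the gap between the moment an item becomes known to the system and the moment an alert or report goes out. The function names, the sample timestamps, and the 15 minute budget are my assumptions for illustration, not figures from any vendor.

```python
from datetime import datetime, timedelta
from typing import List


def latency_seconds(known_at: datetime, output_at: datetime) -> float:
    # Time from the content becoming known to the system to the output appearing.
    return (output_at - known_at).total_seconds()


def over_budget(latencies: List[float], budget: timedelta) -> List[float]:
    # Return the latencies that exceed the budget, i.e. the spikes.
    limit = budget.total_seconds()
    return [s for s in latencies if s > limit]


# Hypothetical sample data: three items that all arrived at the same time.
t0 = datetime(2010, 6, 22, 9, 0, 0)
samples = [
    latency_seconds(t0, t0 + timedelta(minutes=12)),
    latency_seconds(t0, t0 + timedelta(minutes=14)),
    latency_seconds(t0, t0 + timedelta(minutes=40)),  # a made-up spike
]

spikes = over_budget(samples, timedelta(minutes=15))
```

With these made-up numbers, two items land inside the budget and one does not. A real deployment would log both timestamps per item and watch the tail of the distribution, not the average.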

My Take

If you want to do low latency data and information analyses, you will need one of these systems or a similar system from another vendor. If you can code your own, you should not be reading this blog. You should be running your own company.

What have I left out? Probably my views of each of these technologies, the Google example I used, and the screen shots of outputs from each of these systems. To demonstrate my good goose heart, herewith is an architectural diagram of how the industrial strength systems are generally set up. The diagram is three years old and comes from an outfit called Momentum SI. A happy quack to this outfit.

[Image: architectural diagram from Momentum SI]

Stephen E Arnold, June 22, 2010

Freebie
