Autonomy Upgrades Investigative System

November 15, 2008

Autonomy, based in Cambridge, England, continues to be one of the most agile information access and services companies. The firm has updated its Intelligent Investigator & Early Case Assessment software. You can read about the story here or visit the Autonomy Web site for more details. Autonomy asserts that its software can understand the meaning of large volumes of data collected in an investigation or similar procedure. Once the structured and unstructured data are processed, an investigator can use the Autonomy system:

to reconstruct what occurred, develop informed case strategies and sweep aside non-responsive data. A seamless link with Autonomy Legal Hold software automatically provides a legally defensible preservation and collection process.

Features of the investigative system include:

  • A case-centric view of the data. The idea is that an investigator can get a bird’s eye view of information, events, persons of interest, and time in a matter
  • A new feature to analyze data where it resides and provide answers to queries without building a collection and performing some of the manual tasks other systems require
  • A risk component
  • Enhanced entity extraction and alias identification

Other companies offer case management and investigative tools. Autonomy’s broad sweep of software and systems allows the company to provide a solution that can mesh with almost any organizational or legal requirement. Will Autonomy sweep the field in this market? I know the company will try. The challenge will be to convince investigative units and lawyers to try new methods. Investigators and lawyers can be like my grandmother, set in her ways. A number of search and content processing companies are looking closely at these specialized markets. When the economy goes south, legal activity goes north. Autonomy has demonstrated it knows which way the compass is spinning.

Stephen Arnold, November 15, 2008

Yasni: People Search

November 14, 2008

yasni, a people search engine, just launched in the U.S. If you’re on the web, yasni supposedly will find you. But the search is on first and last names, and there are lots of “Jessica Bratcher”s out there. My yasni search returned 30 results, including hits on amazon.com, Facebook, MySpace, Google News and Blogs, Technorati, even criminal searches. But for more listings, they’ll send me an e-mail list within 24 hours.

People search has been and remains very important. Zoom Info, LinkedIn, and other sites provide useful information. I have found Cluuz.com useful as well. Cluuz.com displays relationship charts. I did some ego surfing to test yasni and I ran the same queries on Cluuz.com. On Cluuz.com, I found an interview I did in 2005. Cluuz.com also surfaced several articles about newspaper awards I’ve received. On my test queries, I did not find yasni as useful. But it is early in the game for yasni. I will check back in a month or so to see how the service develops. I do recommend that you give it a whirl.

Jessica Bratcher, November 14, 2008

Webinar: Open Standards and Semantic Technology

November 14, 2008

The worldwide economic downturn bodes poorly for budgets to add more search technology to the enterprise, but the umbrella in this thunderstorm may be found in a movement quietly readying for a download launch. When will a standardized, semantic IT infrastructure be the basis of the enterprise’s entire IT framework for operations across all divisions?

There is a growing discussion in Europe, now spilling over into the US, regarding the SMILA project, the SeMantic Information Logistics Architecture. For more detail, click here or navigate to http://eccenca.broxblogs.de. This open source solution comes from a partnership of brox IT-Solutions and empolis in Germany, hosted through Eclipse.org.

Semantic Technologies

Semantic technologies continue to gain ground in discussions among researchers and companies investing in their own search frameworks across the organization, because it is unstructured data that remains the elephant in the room. Proponents in several large IT companies believe an answer is available in SMILA. Consider this white paper (in German; use translate.google.com): http://www.heise.de/open/Union-Investment-Integrationsplattform-auf-Basis-offener-Standards–/artikel/118395. The paper contends that:

“Open standards make applications more quickly realized and flawless.”

Eccenca is the commercial version for the enterprise, deployed with professional services and support. At brox, the company is building commercial-grade architecture and applications for the enterprise under the Eccenca Foundation, based on the SMILA codebase. Eccenca products will reflect expertise gained from existing customer engagements, including Theseus startups, Volkswagen, and others. See more information in the response to this blog’s recent discussion (November 4) at http://h3lge.de/weblog/. Eccenca.com and the first download of SMILA are anticipated in short order. At Eccenca.com, brox will set up and manage a marketplace for standards-based plug-ins, solutions, and expertise.

Webinar

A webinar in English to discuss this approach further is coming up on December 17, 2008. The seminar will run about one hour and take place at 8:00 am PST / 11:00 am EST / 4:00 pm GMT. The seminar will be given by Georg Schmidt (brox IT-Solutions) and Igor Novakovic (empolis). The title of the webinar is “SMILA – SeMantic Information Logistics Architecture.” This webinar will present the SMILA project (emphasizing the integration possibilities), provide a status report on the latest project developments, and give a short demonstration of currently implemented features.

The webinar will discuss the challenge posed by the exponentially growing amount and diversity of information, mainly unstructured data such as emails, text files, blogs, and images. Poor data accessibility, user rights integration, and the lack of semantic metadata are constraining factors for building next generation enterprise search and other document centric applications. Missing standards result in proprietary solutions with huge short- and long-term costs. SMILA is an extensible framework for building search solutions to access unstructured information in the enterprise. Besides providing essential infrastructure components and services, SMILA also delivers ready-to-use add-on components, such as connectors to the most relevant data sources. Using the framework as their basis will enable developers to concentrate on creating higher value solutions, such as semantically driven applications.
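The framework description suggests a familiar pattern: connectors pull records from data sources and hand each one to a chain of pluggable processing components. The sketch below illustrates that pattern in the abstract; the function names and record shapes are my own invention and do not reflect SMILA’s actual interfaces:

```python
def run_pipeline(connector, components):
    """Pull records from a data-source connector and pass each one
    through an ordered chain of processing components."""
    processed = []
    for record in connector():
        for component in components:
            record = component(record)
        processed.append(record)
    return processed

# Hypothetical connector and components for illustration only.
def file_connector():
    yield {"id": 1, "text": "An Email About Budgets"}

def lowercase(record):
    record["text"] = record["text"].lower()
    return record

def tokenize(record):
    record["tokens"] = record["text"].split()
    return record

docs = run_pipeline(file_connector, [lowercase, tokenize])
```

The appeal of this shape is that a new data source or a new enrichment step (say, an entity extractor) plugs in without touching the rest of the pipeline, which is the standardization argument in a nutshell.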

An article authored by Dawn Marie Yankeelov, president of ASPectx.

Google and Novel Content

November 13, 2008

On November 11, 2008, Google received a patent for the invention “Detecting Novel Content,” US7451120. In my opinion, this is an important Google invention. The system and method make it possible for Google to identify a segment of a document that contains interesting information. “Novel” is a code word for distinctive information. The abstract for the invention is:

A system determines an ordered sequence of documents and determines an amount of novel content contained in each document of the ordered sequence of documents. The system assigns a novelty score to each document based on the determined amount of novel content.

Let’s assume that Google uses this invention. What can the method deliver? My thought was a compilation of novel content on a user-specified subject. Traditional publishers cut and paste to create anthologies. In the 15th and 16th centuries, books that were collections of snippets were used to teach students Latin and Greek. Another possible use of the method would be to snip content from one document and place that snippet and its metadata into a dataspace.
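The abstract describes scoring each document in an ordered sequence by how much new material it contributes. As a thought experiment, here is a minimal sketch of that idea using word shingles; this is my own illustration of the general technique, not the method disclosed in the patent:

```python
def shingles(text, n=3):
    """Break text into overlapping word n-grams ("shingles")."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def novelty_scores(ordered_docs, n=3):
    """Score each document by the fraction of its shingles not seen
    in any earlier document in the sequence."""
    seen = set()
    scores = []
    for doc in ordered_docs:
        s = shingles(doc, n)
        if not s:
            scores.append(0.0)
            continue
        scores.append(len(s - seen) / len(s))
        seen |= s
    return scores
```

A document whose shingles have all appeared earlier in the sequence scores 0.0; a wholly new document scores 1.0, which is roughly the behavior an anthology-builder would need.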

Stephen Arnold, November 13, 2008

ISYS:web 9 Now Available

November 11, 2008

A happy quack to the reader in Colorado who alerted me to the new release of ISYS Search Software Version 9.0. I had a pre-release version, and I found that its speed and date features were particularly useful. According to ISYS Search Software:

ISYS:web 9 offers customers several major enhancements, all designed to deliver the speed, efficiency and accuracy required to find information fast. More importantly, ISYS has expanded its content mining capabilities using predictive and reliable methods that help customers better understand their content. Through its Intelligent Content Analysis, ISYS notes key characteristics about a content collection, such as metadata patterns and entities, and leverages these facets in the interface to provide a more fluid search and discovery process.

Among the new features are:

  • Intelligent Query Expansion. Designed to give users greater context and avenues to pursue, Intelligent Query Expansion offers suggestions based on your query and the document. For example, a search for “SharePoint” might suggest “SharePoint search web part.”
  • ContextCogs are snippets of relevant and contextual information pulled from third-party sources and displayed alongside standard ISYS results. When a search is executed, the query is also passed to each registered Cog, which could include enterprise-level applications, Internet search engines or Active Directory Contacts.
  • Intelligence Clouds enable rapid navigation of key information. The tag cloud appears as a collection of search terms and phrases, with the various terms shown in larger or smaller fonts depending on their density within the index.
  • Improved Performance and Scalability. ISYS:web handles most search requests concurrently with a higher throughput. Additionally, we’ve increased index capacity from 24 gigabytes to 384 gigabytes per index. With indexed data representing, on average, 10 to 20 percent of the total data size, ISYS can now index two to four terabytes of information per index.
  • Search Form Customization. ISYS now offers both automatic and custom designed search forms. For automatic search forms, users point the wizard at their indexes and ISYS creates a search form automatically by analyzing the content and structure of the information. ISYS also offers a point-and-click method for creating forms for searching structured information.
  • Index Biasing. ISYS told me that the company wanted to enhance ISYS:web’s tuning capabilities. “Tuning” in this context means giving administrators the ability to adjust the weighting on entire collections of documents. This option enables an organization to further tune relevance to suit specific situations; for example, boosting specific content across result sets.
  • De-Duplication. ISYS automatically identifies identical documents and either removes them from the results or visually marks them. This capability is of particular importance to legal professionals conducting discovery work, or any user attempting to conduct analysis of a given content collection.
  • ISYS:web Federator allows customers to federate their searches across both ISYS and non-ISYS content sources. ISYS:web displays results from each source separately, allowing users to navigate between the sets of results without compromising relevance.
  • Exchange Indexing. Particularly important for responding in a timely manner to discovery requests, ISYS:web enables administrators to centrally create and manage individual indexes for each user’s email account. Administrators can also opt to make these indexes available to end users, relying on Active Directory permissions to ensure users can only search the email indexes for which they are authorized.
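The de-duplication feature is easy to picture in the simplest possible terms: collapse identical documents by content hash, keep the first occurrence, and mark later copies. The sketch below is my own toy example of that general approach, not ISYS’s implementation:

```python
import hashlib

def dedupe(results):
    """Collapse identical documents in a result list, keeping the first
    occurrence and labeling later duplicates instead of dropping them."""
    seen = {}  # content digest -> id of first document with that content
    out = []
    for doc_id, text in results:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            out.append((doc_id, text, "duplicate of " + seen[digest]))
        else:
            seen[digest] = doc_id
            out.append((doc_id, text, "unique"))
    return out
```

For discovery work, the "mark, don't drop" behavior matters: a reviewer can still see that the same memo appeared in three custodians' mailboxes.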

I ran several queries on the new system. You can read about my tests and examine a sample screen shot here. In my April 2008 study for the Gilbane Group, I identified ISYS Search Software as a “company to watch.” In fact, I highlighted the company in a lecture about enterprise search in 2009 here. For more information about the company, navigate to the ISYS Search Software Web site here. You can download a trial version of the software here. If you want to get a flavor for the company’s commitment to search, you may find the interview I conducted with Ian Davies, founder of ISYS Search Software, a way to understand the firm’s approach to information access. I conducted the interview in March 2008, but it is quite relevant today (November 11, 2008).

Stephen Arnold, November 11, 2008

Disturbing Data, Possible Parallel for Search

October 30, 2008

After wrapping up another section of my forthcoming monograph Google Publishing Technology for Infonortics Ltd. in Tetbury, England, I scanned the content sucked in by my crawlers. Another odd duck greeted me with the off-point headline “Outlook: Don’t Panic It’s Not 2001” here. (This is a wacky URL, so you may have to navigate to the parent site www.commsdesign.com and hunt for the author Bolaji Ojo.)

For me, one telling paragraph was:

In 2001, for instance, the wireline communications equipment market sank 18 percent to $69.6 billion, from $85.3 billion in the previous year. Semiconductor sales to the segment tumbled 37 percent on a combination of sagging demand and severe pricing declines. Seven years later, wired communications equipment sales have yet to recover to the 2000 level, and estimates indicate the market won’t bounce back fully until sometime in the next decade. ISuppli expects 2009 wired communications sales to be approximately $76.6 billion, improving from an estimated $72.5 billion in 2008, but still below the record 2000 figure of $85 billion.


Source: http://thesaleswars.wordpress.com/2008/02/

Another interesting point was:

The entire semiconductor market wasn’t as fortunate. Chip sales plunged 43 percent in 2001, to $101.8 billion from $178.9 billion in 2000, according to the Semiconductor Industry Association. The industry resumed growth in 2002, but it wasn’t until 2004 before global sales finally crawled past the previous record. By then, dozens of semiconductor, passives, interconnect and electromechanical companies and electronic manufacturing services providers had disappeared, some merging with stronger rivals. A few others went under, unable to finance operations as customers froze purchases or exited the embattled networking equipment market.

What these data suggested to me was that the search, content processing, and search enabled application sectors may face significant revenue declines and could take years to recover. The loss of companies that have no revenue is understandable. Funding sources may dry up or cut off the flow of money. Large firms may shed staff, but these vendors will, for the most part, remain in business. The real pressure falls on what I call “tweeners”. Tweeners are organizations that are in growth mode but the broader downturn can reduce their sales and squeeze the companies’ available cash. Slow payment from customers adds to the problem.

Read more

Amazon’s iTunes Like Interface

October 28, 2008

Amazon has developed a new interface. You can read the news story on TechCrunch here. The graphical presentation is intended to make it easier and more fun to browse Amazon’s products. Jason Kincaid’s article does a very good job of explaining the features of this interface. For me, the most important comment in the write-up was:

The site seems geared towards shoppers who are just looking for ideas, as there isn’t a search feature. Users can scroll through the site using their arrow keys, zooming in on individual products by hitting the spacebar. Each product includes a demo video (in the case of movies, songs, and video games) or an excerpt (from books).

I have often asserted that search is dead. I did not say that search was not useful. Amazon believes it has cracked the code on information retrieval without asking the user to type in the title of a book or an author’s name. Amazon wants to be a combination of Apple and Google. Amazon may have to keep trying to manage this transition.

Stephen Arnold, October 28, 2008

Exalead: Making Headway in the US

October 25, 2008

Exalead, based in Paris, has been increasing its footprint in the US. The company has expanded its US operation and now it is making headlines in information technology publications. The company has updated its enterprise search system CloudView. Peter Sayer’s “Exalead Updates Enterprise Search to Explore Data Cloud” here provides a good summary of the system’s new features. For me, the most important passage in the Network World article was this comment:

“Our approach is very different from Google’s in that we’re interested in conversational search,” he [the president of Exalead] said. That “conversation” takes the form of a series of interactions in which Exalead invites searchers to refine their request by clicking on related terms or links that will restrict the search to certain kinds of site (such as blogs or forums), document format (PDF, Word) or language.

Exalead’s engineering, however, is the company’s “secret sauce.” My research revealed that Exalead uses many of the techniques first pioneered by AltaVista.com, Google, and Amazon. As a result, Exalead delivers performance on content and query processing comparable to Google’s. The difference is that the Exalead platform has been engineered to mesh with existing enterprise applications. Google’s approach, on the other hand, requires a dedicated “appliance.” Microsoft takes another approach, requiring customers to adopt dozens of Microsoft servers to build a search enabled application.
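The “conversational” refinement Exalead describes can be pictured as successive filtering of a result set by facet values: each click narrows the candidates. The sketch below is a toy illustration under that assumption; the field names are invented and this is not Exalead’s API:

```python
def refine(results, **facets):
    """Narrow a result set by facet values (e.g. site type, document
    format, language), applying each facet filter in turn."""
    out = results
    for key, value in facets.items():
        out = [r for r in out if r.get(key) == value]
    return out

# Hypothetical result records for illustration.
results = [
    {"title": "A", "format": "PDF", "site": "blog"},
    {"title": "B", "format": "Word", "site": "forum"},
    {"title": "C", "format": "PDF", "site": "forum"},
]
pdf_forum_hits = refine(results, format="PDF", site="forum")
```

Each refinement click corresponds to another keyword argument, which is what gives the interaction its back-and-forth, conversational feel.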

On a recent trip to Europe, I learned that Exalead is working to make it easy for a licensee to process content from an organization’s servers as well as certain Internet content. Exalead is an interesting company, and I want to dig into its technical innovations. If I unearth some useful information, I will post the highlights. In the meantime, you can get a feel for the company’s engineering from its Web search and retrieval system. The company has indexed eight to nine billion Web pages. You can find the service here.

Stephen Arnold, October 25, 2008

Twine’s Semantic Spin on Bookmarks

October 25, 2008

Twine is a company committed to semantic technology. Semantics can be difficult to define. I keep it simple and suggest that semantic technology allows software to understand the meaning of a document. Semantic technology finds a home inside of many commercial search and content processing systems. Users, however, don’t tinker with the semantic plumbing. Users take advantage of assisted navigation, search suggestions, or a system’s ability to take a single word query and automatically hook the term to a concept or make a human-type connection without a human having to do the brain work.
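The “hook a term to a concept” behavior can be pictured as a lookup that expands a one-word query with related concepts before retrieval runs. The following toy sketch uses an invented concept map; it is not Twine’s or any vendor’s actual method:

```python
# A hypothetical concept map; the entries are invented for illustration.
CONCEPTS = {
    "jaguar": ["big cat", "Jaguar Cars"],
    "python": ["snake", "Python programming language"],
}

def expand_query(term):
    """Return the query term plus any concepts linked to it, so the
    search engine can match documents that never use the literal word."""
    return [term] + CONCEPTS.get(term.lower(), [])
```

The user still types one word; the semantic plumbing quietly widens the net, which is exactly why users never need to see it.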

Twine, according to the prestigious MIT publication Technology Review, is breaking new ground. Erica Naone’s article “Untangling Web Information: The Semantic Web Organizer Twine Offers Bookmarking with Built In AI” stops just short of a brass-band-enhanced endorsement but makes Twine’s new service look quite good. You must read the two-part article here. For me, the most significant comment was:

But Jim Hendler, a professor of computer science at Rensselaer Polytechnic Institute and a member of Twine’s advisory board, says that Semantic Web technologies can set Twine apart from other social-networking sites. This could be true, so long as users learn to take advantage of those technologies by paying attention to recommendations and following the threads that Twine offers them. Users could easily miss this, however, by simply throwing bookmarks into Twine without getting involved in public twines or connecting to other users.

Radar Networks developed Twine. The metaphor of twine reminds me of the trouble I precipitated when I tangled my father’s ball of hairy, fibrous string. My hunch is that others will think of twine as tying things together.

You will want to look at the Twine service here. Be sure to compare it to the new Microsoft service U Rank. The functions of Twine and U Rank are different, yet both struck me as sharing a strong commitment to sharing and saving Web information that is important to a user. Take a look at IBM’s Dogear. This service has been around for almost a year, yet it is almost unknown. Dogear’s purpose is to give social bookmarking more oomph for the enterprise. You can try this service here.

As I explored the Twine service and refreshed my memory of U Rank and Dogear, several thoughts occurred to me:

  1. Exposing semantic technology in new services is a positive development. The more automatic functions can be a significant time saver. A careless user, however, could lose sight of what’s happening and shift into cruise control mode, losing sight of the need to think critically about who recommends what and from where information comes.
  2. Semantic technology may be more useful in the plumbing. As search enabled applications supplant key word search, putting too much semantic functionality in front of a user could baffle some people. Google has stuck with its 1950s, white refrigerator interface because it works. The Google semantic technology hums along out of sight.
  3. The new semantic services, regardless of the vendor developing them, have not convinced me that they can generate enough cash to stay alive. The Radar Networks and Microsofts of the world will have to do more than provide services that are almost impossible to monetize. IBM’s approach is to think about the enterprise, which may be a better revenue bet.

I am enthusiastic about semantic technology. User facing applications are in their early days. More innovation will be coming.

Stephen Arnold, October 25, 2008

SurfRay Round Up

October 24, 2008

SurfRay and its products have triggered a large number of comments on this Web log. On my recent six-day trip to Europe, I was fortunate to be in a position to talk with people who knew about the company’s products. I also toted my Danish language financial statements along, and I was able to find some people to walk me through the financials. Finally, I sat down and read the dozens of postings that have accumulated about this company.

I visited the company on a trip to Copenhagen five or six years ago. I wrote some profiles about the market for SharePoint centric search, sent bills, got paid, and then drifted away from the company. I liked the Mondosoft folks, but I live in rural Kentucky. One of my friends owned a company which ended up in the SurfRay portfolio. I lost track of that product. I recall learning that SurfRay gobbled up an outfit called Ontolica. My recollection was that, like Interse and other SharePoint centric content processing companies’ technology, Ontolica put SharePoint on life support. What this means is that some of SharePoint’s functions work but not too well. Third party vendors pay Microsoft to certify one or more engineers in the SharePoint magic. Then those “certified” companies can sell products to SharePoint customers. If Microsoft likes the technology, a Microsoft engineer may facilitate a deal for a “certified” vendor. I am hazy on the ways in which the Microsoft certification program works, but I have ample data from interviews I have conducted that “certification” yields sales.


An Ontolica results list.

Why is this important? It’s background for the points I want to set forth as “believed to be accurate” so the SurfRay folks can comment, correct, clarify, and inform me on what the heck is going on at SurfRay. Here are the points about which comments are in bounds.

Read more
