Autonomy-Repsol Articles at E-Business

July 21, 2011

We’ve found an interesting roundup of Autonomy-related information on the Repsol deal at E-Business Library. What is intriguing is that the page looks as if it were assembled automatically. Does Panda have a way to discern auto-generated pages?

But automated or not, there’s a lot of information, and Autonomy should be quite happy with whoever created the Repsol page. Here’s an example from one of the documents snippetized by the service. The source is this press release, which sums up the Autonomy Repsol agreement this way:

“Autonomy Corporation plc (LSE: AU. or AU.L), a global leader in infrastructure software for the enterprise, today announced that Repsol, Spain’s largest oil and gas company, has selected Autonomy’s cornerstone technology, IDOL (Intelligent Data Operating Layer) and Autonomy Virage for knowledge management across the enterprise.”

Repsol is a huge company with a LOT of infrastructure to manage. Autonomy provides expert tools for managing and analyzing information, including unstructured data, with their IDOL suite of products. In addition, Autonomy Virage is one of the leaders in video and audio search. Repsol employees will now be able to harness this power to manage their wealth of information and to share across their global operation. Sounds like a good choice.

Check out the roundup of articles at E-Business for more information. If you want to know what Autonomy is doing, you can navigate to Autonomy.com. The firm does a good job of posting information in a timely manner about its deals.

Programmers at Web indexing engines have their work cut out for them. Novices in search may have difficulty discerning the gems published by the addled goose from the pages generated from unknown methods.

Cynthia Murrell, July 21, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Exclusive Interview with Margie Hlava, Access Innovations

July 19, 2011

Access Innovations has been a leader in the indexing, thesaurus, and value-added content processing space for more than 30 years. Margie Hlava’s company has worked for most of the major commercial database publishers, the US government, and a number of professional societies.


See www.accessinn.com for more information about MAI and the firm’s other products and services.

When I worked at the database unit of the Courier-Journal & Louisville Times, we relied on Access Innovations for a number of services, including thesaurus guidance. Her firm’s MAI system and its supporting products deliver what most of the newly-minted “discovery” systems need: indexing that is accurate, consistent, and makes it easy for a user to find the information needed to answer a research or consumer-level question. What few realize is the value of standards built into the systems and methods developed by the taxonomy experts at Access Innovations. Specifically, the Access Innovations approach generates an ANSI-standard term list. Without getting bogged down in details, an ANSI-compliant controlled term list embodies logical consistency and adherence to strict technical requirements. See the Z39.19 ANSI/NISO standard. Most of the 20-somethings hacking away at indexing fall far short of the quality of the Access Innovations implementations. Quality? Not in my book. Give me the Access Innovations (Data Harmony) approach.
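
To make the standards point concrete, here is a minimal sketch, in Python, of the sort of logical consistency Z39.19 expects of a controlled term list: broader/narrower term (BT/NT) links must be reciprocal, and related term (RT) links must be symmetric. The terms are invented for illustration; this is not Access Innovations’ code.

# Minimal sketch of a Z39.19-style consistency check on a tiny,
# invented controlled vocabulary. BT/NT/RT are the standard's
# broader/narrower/related term relationship labels.
thesaurus = {
    "Petroleum":   {"BT": [], "NT": ["Crude oil", "Natural gas"], "RT": ["Energy"]},
    "Crude oil":   {"BT": ["Petroleum"], "NT": [], "RT": []},
    "Natural gas": {"BT": ["Petroleum"], "NT": [], "RT": []},
    "Energy":      {"BT": [], "NT": [], "RT": ["Petroleum"]},
}

def check_reciprocity(thesaurus):
    """Report BT/NT pairs and RT pairs that lack their reciprocal link."""
    problems = []
    for term, rels in thesaurus.items():
        for broader in rels["BT"]:
            if term not in thesaurus.get(broader, {}).get("NT", []):
                problems.append(f"{broader} missing NT -> {term}")
        for related in rels["RT"]:
            if term not in thesaurus.get(related, {}).get("RT", []):
                problems.append(f"{related} missing RT -> {term}")
    return problems

print(check_reciprocity(thesaurus) or "Consistent")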

Care to argue? I think you need to read the full interview with Margie Hlava in the ArnoldIT.com Search Wizards Speak series. Then we can interact enthusiastically.

On a rare visit to Louisville, Kentucky, on July 15, 2011, I was able to talk with Ms. Hlava about the explosion of interest in high quality content tagging, the New Age word for indexing. Our conversation covered everything from the roots of indexing to the future of systems which will be available from Access Innovations in the next few months.

Let me highlight three points from our conversation, interview, and enthusiastic discussion. (How often do I, in rural Kentucky, get to interact with one of the, if not the, leading figures in taxonomy development and smart, automated indexing? Answer: Not often enough.)

First, I asked how her firm fits into the landscape of search and retrieval.

She said:

I have always been fascinated with logic, and applying it to search algorithms was a perfect match for my intellectual interests. When people have an information need, I believe there are three levels to the resources which will satisfy them. First, the person may just need a fact checked. For this they can use an encyclopedia, a dictionary, etc. Second, the person needs what I call “discovery.” There is no simple factual answer, and one needs to be created or inferred. This often leads to a research project, and it is certainly the beginning point for research. Third, the person needs updating: what has happened since I last gathered all the information available? Ninety-five percent of search is either number one or number two. These three levels are critical to answering the user’s questions properly and determining what kind of search will support their needs. Our focus is to change search to found.
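
Her three levels suggest a simple triage step before a query ever hits an engine. A hypothetical sketch in Python: the three categories are hers, but the routing targets are mine, purely for illustration.

from enum import Enum

class Need(Enum):
    FACT_CHECK = 1   # a simple factual answer exists somewhere
    DISCOVERY = 2    # an answer must be assembled or inferred
    UPDATING = 3     # what has happened since I last looked?

def route(need):
    # Hypothetical destinations; the three-level scheme is Ms. Hlava's,
    # the routing targets are illustrative only.
    if need is Need.FACT_CHECK:
        return "reference lookup (encyclopedia, dictionary)"
    if need is Need.DISCOVERY:
        return "controlled-vocabulary search across the full collection"
    return "alert query limited to items added since the last visit"

print(route(Need.DISCOVERY))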

Second, I probed why indexing is such a hot topic.

She said:

Indexing, which I define as the tagging of records with controlled vocabularies, is not new. Indexing has been around since before Cutter and Dewey. My hunch is that librarians in Ephesus put tags on scrolls thousands of years ago. What is different is that it is now widely recognized that search is better with the addition of controlled vocabularies. The use of classification systems, subject headings, thesauri and authority files certainly has been around for a long time. When we were just searching the abstract or a summary, the need was not as great because those content objects are often tightly written. The hard sciences went online first and STM [scientific, technical, medical] content is more likely to use the same terms worldwide for the same things. The coming online of social sciences, business information, popular literature and especially full text has made search overwhelming, inaccurate, and frustrating. I know that you have reported that more than half the users of an enterprise search system are dissatisfied with that system. I hear complaints about people struggling with Bing and Google.

Third, I queried her about her firm’s approach, which I know to be anchored in personal service and obsessive attention to detail to ensure the client’s system delivers exactly what the client wants and needs.

She said:

The data processed by our systems are flexible and free to move. The data are portable. The format is flexible. The interfaces are tailored to the content via the DTD for the client’s data. We do not need to do special programming. Our clients can use our system and perform virtually all of the metadata tasks themselves through our systems’ administrative module. The user interface is intuitive. Of course, we would do the work for a client as well. We developed the software for our own needs, and that includes needing to be up and running and in production on a new project very quickly. Access Innovations does not get paid for down time. So our staff are trained. The application can be set up, fine-tuned, and deployed in production mode in two weeks or less. Some installations can take a bit longer. But as soon as we have a DTD, we can have the XML application up in two hours. We can create a taxonomy really quickly as well. So the benefits are: fast, flexible, accurate, high quality, and fun!
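
Her point that the DTD drives the application can be illustrated generically. This Python sketch uses the lxml library to gate ingest on DTD validity; the file names are hypothetical, and this is not how Data Harmony actually works under the hood.

from lxml import etree

# Hypothetical file names: the client's DTD and an incoming record.
with open("client_records.dtd") as f:
    dtd = etree.DTD(f)

record = etree.parse("incoming_record.xml")

# Gate ingest on DTD validity, so malformed records are rejected
# before any tagging or indexing happens.
if dtd.validate(record):
    print("record accepted for tagging")
else:
    print(dtd.error_log.filter_from_errors())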

You will want to read the complete interview with Ms. Hlava. Skip the pretend experts in indexing and taxonomy. The interview answers the question, “Where’s the beef in the taxonomy burger?”

Answer: http://www.arnoldit.com/search-wizards-speak/access-innovations.html

Stephen E Arnold, July 19, 2011

It pains me to say it, but this is a freebie.

Exalead Embraces SWYM or “See What You Mean”

May 3, 2011

In late April 2011, I spoke with Francois Bourdoncle, one of the founders of Exalead, which was acquired by Dassault Systèmes in 2010. Dassault is one of the world’s premier engineering and technology products and services firms. I wanted to get more information about the acquisition and probe the next wave of product releases from Exalead, a leader in search and content processing. Exalead introduced its search-based applications approach, and since that shift the firm has experienced a surge in sales. Organizations such as the World Bank and PricewaterhouseCoopers (IBM) have licensed the Exalead Cloudview platform.

I wanted to know more about Exalead’s semantic methods. In our conversation, Mr. Bourdoncle told me:

We have a number of customers that use Exalead for semantic processing. Cloudview has a number of text processing modules that we classify as providing semantic processing. These are: entity matching, ontology matching, fuzzy matching, related terms extraction, categorization/clustering and event detection among others. Used in combination, these processors can extract arbitrary sentiment, meaning not just positive or negative, but also along other dimensions as well. For example, if we were analyzing sentiment about restaurants, perhaps we’d want to know if the ambiance was casual or upscale or the cuisine was homey or refined.
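
To make the restaurant example concrete, here is a toy Python sketch of sentiment scored along custom axes rather than a single positive/negative scale. The lexicons are invented, and this is in no way Exalead’s implementation; real systems use trained models and far richer linguistics.

import re

# Toy multi-dimensional "sentiment": score text along custom axes
# (ambiance: casual vs. upscale, cuisine: homey vs. refined),
# echoing Mr. Bourdoncle's restaurant example.
AXES = {
    "ambiance": ({"cozy", "relaxed", "casual"}, {"elegant", "formal", "upscale"}),
    "cuisine":  ({"hearty", "homey", "comfort"}, {"refined", "delicate", "artful"}),
}

def score(text):
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Negative values lean to the casual/homey end of the axis;
    # positive values lean to the upscale/refined end.
    return {axis: len(words & high) - len(words & low)
            for axis, (low, high) in AXES.items()}

print(score("A relaxed room, but the tasting menu was delicate and artful."))
# {'ambiance': -1, 'cuisine': 2}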

When I probed about future products and services, Mr. Bourdoncle stated:

I cannot pre-announce future product plans, but I will say that Dassault Systèmes has a deep technology portfolio. For example, it is creating a prototype simulation of the human body. This is a non-trivial computer science challenge. One way Dassault describes its technology vision is “See-What-You-Mean,” or SWYM.

For the full text of the April 2011 interview with Mr. Bourdoncle, navigate to the ArnoldIT.com Search Wizards Speak subsite. For more information about Exalead, visit www.exalead.com.

Stephen E Arnold, May 3, 2011

No money but I was promised a KYFry the next time I was in Paris.

Access Innovations Merges Data Harmony and Microsoft SharePoint 2010

April 29, 2011

According to the EContentmag.com article “Access Innovation Integrates Data Harmony with Microsoft SharePoint 2010,” Access Innovations hopes its Data Harmony and Microsoft SharePoint 2010 integration will provide clients with even more valuable options. The Data Harmony suite provides users with a content-rich thesaurus and management tools to help them organize their information resources. “Data Harmony can be used to provide semantic capabilities to SharePoint to help users take full advantage of their metadata through auto classification, enterprise taxonomy management, entity extraction and enhanced search.”

The new MAIstro program offers users a whole new level of automation. The software will automatically index any SharePoint content using a combination of taxonomy and thesaurus database tools, and the indexing results obtained “can be more than 90 percent accurate.” Individuals can search a specific subject and even find additional information using related terms. Sounds like the Data Harmony Microsoft SharePoint integration could be the beginning of a beautiful relationship.
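
For readers who wonder what taxonomy-driven auto-classification involves at its simplest, here is a generic term-matching sketch in Python. The mini-taxonomy is invented, and real systems such as MAIstro use far richer rule bases than this.

import re

# Invented mini-taxonomy: preferred term -> trigger phrases, including
# non-preferred synonyms a thesaurus would map to the preferred term.
TAXONOMY = {
    "Enterprise search": ["enterprise search", "findability"],
    "Metadata": ["metadata", "tagging", "indexing"],
    "SharePoint": ["sharepoint"],
}

def classify(text, min_hits=1):
    """Return preferred terms whose trigger phrases appear in the text."""
    lowered = text.lower()
    tags = []
    for term, triggers in TAXONOMY.items():
        hits = sum(len(re.findall(re.escape(t), lowered)) for t in triggers)
        if hits >= min_hits:
            tags.append(term)
    return tags

doc = "Tagging SharePoint libraries improves findability across the intranet."
print(classify(doc))  # ['Enterprise search', 'Metadata', 'SharePoint']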

April Holmes, April 29, 2011

Freebie but I have been promised a Mexican burrito

Vertical Blog: A New Angle for Online

April 27, 2011

Our Overflight intelligence system tracks certain types of information. There are some basic Overflight services available from the ArnoldIT.com Web log. We have other systems running as well. One of these identified a new blog called Backnotch. Published by Jean Glaceau, it appears to cover one narrow segment of online information; namely, transactions related to Angola. What’s interesting about the publication is that the content appears to be summaries of publicly-accessible information. The Backnotch service is similar to a traditional abstracting service. The principal difference is that the contributors are offering some broad editorial comments. These comments, plus the collection of articles, comprise a useful resource for anyone looking at what types of open source information cover certain activities associated with Angola and related topics.

According to the About page of the blog:

In my first week of work, I decided to narrow my focus to a handful of issues which are covered in the open source literature. The information I located struck me as similar in some ways to a fictional story or a Hollywood film. Going forward, I want to continue to explore how the open source information follows a particular story and what entities surface in those stories.

When we ran a couple of queries for Jean Glaceau, the publisher, we found a number of individuals in the hit list. We were not able to determine which Glaceau was running the research project behind the information service. We wrote to the email address for the blog, but we had not received an answer as we queued this story for publication.

We checked out the search engine for the service, and it appears to have a backfile of about 60 articles. If Mr. Glaceau keeps up his current pace of content production, the service will generate about 50 to 60 stories each month. Our view is that online has moved from vertical search to vertical “finding” services.

We will check back with Backnotch in a couple of months. Worth a look.

Stephen E Arnold, April 27, 2011

Freebie

Longtop Pumps Up Metadata

April 26, 2011

“Longtop Announces Launch of Upgraded Metadata Management Platform,” reports CNBC. China’s highly successful financial services developer/solutions provider Longtop Financial Technologies Limited is jumping on the metadata bandwagon with its BI.MetaManager V2.0.

Actually, this is an upgrade and expansion, not a brand new product. The company did some custom work in this realm in ’07 and ’08, and deployed version one of BI.MetaManager in 2009 to many of its customers. The article describes the new version:

BI.MetaManager V2.0 offers extended scalability and flexibility for development, improved reliability and user interface, as well as new features such as visualized enterprise data map and cross-platform support of Structured Query Language (SQL) script parsing.
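
The SQL script parsing feature presumably pulls out which tables feed which, the raw material for that “visualized enterprise data map.” A crude Python sketch of the idea; this is not Longtop’s parser, and production lineage tools use a real SQL grammar rather than regular expressions.

import re

# Crude lineage extraction: find INSERT INTO ... SELECT ... FROM pairs
# in a SQL script and record source -> target table edges.
SQL = """
INSERT INTO mart.daily_positions
SELECT * FROM staging.trades;

INSERT INTO mart.risk_summary
SELECT t.book, SUM(t.notional) FROM mart.daily_positions t GROUP BY t.book;
"""

edges = []
for stmt in SQL.split(";"):
    target = re.search(r"INSERT\s+INTO\s+([\w.]+)", stmt, re.I)
    sources = re.findall(r"FROM\s+([\w.]+)", stmt, re.I)
    if target:
        edges += [(src, target.group(1)) for src in sources]

for src, dst in edges:
    print(f"{src} -> {dst}")
# staging.trades -> mart.daily_positions
# mart.daily_positions -> mart.risk_summary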

Sounds good. The use of metadata, information about data that is embedded in said data, can be extremely useful when properly managed. Lately, though, many players have been working to capitalize on it; suddenly metadata indexing is the new black. And metadata continues to roil the legal eagles. Is indexing discoverable? Is indexing not discoverable? Who owns metadata? Lawyers will figure this out. In the meantime, indexing helps users, not sure about attorneys.

Cynthia Murrell, April 26, 2011

Freebie

Google, Traffic, English 101, and an Annoying Panda

April 21, 2011

I read a snippet on my iPad and then the full story in the hard copy of the Wall Street Journal: “Sites Retool for Google Effect.” You can find this story on page B4 of the hard copy version that gets tossed in the wet grass in Harrod’s Creek, Kentucky. Online? Not too sure anymore. This link may work. But, then again, maybe not.

The point of the story is that Google has changed its method of determining relevance. A number of site owners, mostly unfamiliar to me, made the point that Google’s rankings are important to businesses. One example was One Way Furniture, an outfit that operates in Melville, New York. Another was M2commerce LLC, an office supply retailer in Atlanta, Georgia. My takeaway from the story is that these sites’ owners are going to find a way to deliver content that Google perceives as being relevant.


A panda attack. Some Web site owners suffer serious wounds. Who are these Web site owners trying to please? Google or their customers? Image source: http://tomdoerr.wordpress.com/2011/03/25/whos-in-the-house-panda-in-da-house/

I don’t want to be too much like my auto mechanic here in Harrod’s Creek, but what about the customer? My thought is that if one posts information, these outfits should ask, “What does our customer need to make an informed decision?” The Wall Street Journal story left me with the impression, which is probably incorrect, that the question should be, “What do I need to create so Google will reward me with a high Google rank?”

For many years I have been avoiding search engine optimization. When I explained how some of Google’s indexing “worked” on lecture tours for my 2004-2005 Google monograph, The Google Legacy, pesky SEO kept popping up. Google has done a reasonable job of explaining how its basic voting mechanism works. For those of you who are fans of Jon Kleinberg, you know that Google was influenced to some extent by Clever. There are other touch points in the Backrub/Google PageRank methods disclosed in the now famous PageRank patent. Not familiar with that document? You can find a reasonable summary on Wikipedia or in my The Google Legacy.
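
For those who have never looked under the hood, the voting mechanism at the heart of PageRank is a short computation: a page’s score is a damped sum of the scores of the pages linking to it, iterated to a fixed point. A minimal Python sketch on a toy four-page web, using the 0.85 damping factor from the original formulation:

# Minimal PageRank power iteration on a toy link graph.
# links[p] lists the pages that p links out to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = list(links)
d, n = 0.85, len(pages)
rank = {p: 1.0 / n for p in pages}

for _ in range(50):  # iterate toward the fixed point
    new = {}
    for p in pages:
        # Each page q "votes" a share of its rank to every page it links to.
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new[p] = (1 - d) / n + d * incoming
    rank = new

print({p: round(rank[p], 3) for p in sorted(rank)})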

If we flash forward from 1996, 1997, and 1998 to the present, quite a bit has happened to relevance ranking in the intervening 13 to 15 years. First, note that we are talking about more than a decade. The guts of PageRank remain, but the method has been handled the way my mother handled a cold day. She used to put on a sweater. Then she put on a light jacket. After adding a scarf, she donned her heavy wool coat. Underneath, it was my mom, but she added layers of “stuff” to keep her warm.


All wrapped up, just slow moving with reduced vision. Layers have an operational downside.

That’s what has happened, in part, to Google. The problem with technology is that if you build a giant facility, it becomes difficult, time consuming, and expensive to tear big chunks of that facility apart and rebuild it. The method of change in MBA class is to draw a couple of boxes, babble a few buzzwords, get a quick touch of Excel fever, and then head to the squash court. The engineering reality is that the MBA diagrams get implemented incrementally. Eventually the desired rebuild is accomplished, but at any point, there is a lot of the original facility still around. If you took an archaeology class for something other than the field trips, you know that humans leave foundations, walls, and even gutters in place. The discarded material is then recycled in the “new” building.

How does this apply to Google? It works the same way.

How significant are the changes that Google has made in the last few months? The answer is, “It depends.”

Google has to serve a number of different constituencies. Each constituency has to be kept happy and the “gravity” of each constituency carefully balanced. Algorithms, even Google algorithms, are still software. Software, even smart software that scurries to a lookup table to get a red-hot value or weight, is chock full of bugs, unknown dependencies, and weird actions that trigger volleyball games or some other mind-clearing activity.


Google has to make progress and keep its different information “packages” in balance and hooked up.

The first constituency is the advertiser. I know you think that giant companies care about “you” and “your Web site,” but that is just not true. I don’t care about individuals who have trouble using the comments section of this blog. If a user can’t figure something out, what am I supposed to do? Call WordPress and tell them to fix the comments function because one user does not know how to fill in a Web form? I won’t do that. WordPress won’t do that. I am not confident you, gentle reader, would do that. Google has to fiddle with its relevance method because there are some very BIG reasons to take such a risky, uncharted step as slapping another layer of functionality on top of the ageing PageRank method. My view is that Google is concerned enough to fool with the plumbing because of its awareness that the golden goose of AdWords and AdSense is honking in a manner that signals distress. No advertisers, no Google. Pretty simple equation, but that’s one benefit of living in rural Kentucky. I can only discern the obvious.



Autonomy Boosts the Discipline of Indexing

April 14, 2011

We found the story “Indexer Flourishes as Search Fails” quite interesting. A few days ago Autonomy, a global leader in enterprise software and “meaning based computing,” released its new service pack for WorkSite Indexer 8.5 as well as for its new Universal Search Server. While the indexer has done well and received many good reviews, the notion of a “universal server” is a difficult concept. The pre-Microsoft Fast Search & Transfer promised a number of “universal” functions. When “universal” became mired in time-consuming and expensive local fixes, some vendors did a global search and replace.

The service pack touts a new Autonomy control center which simplifies the management structure of a multi-server environment, improved query returns, additional control over Autonomy’s IDOL components, and an automatic restart feature in case service is snarled during a crawl due to a problem outside of Autonomy’s span of control. Network latency continues to be an issue despite the marketing hoo-hah about gigabit this and gigabit that. Based on the information we have at ArnoldIT.com, thus far the service pack has been deployed with little or no trouble.
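
An automatic restart feature of this kind is, at its simplest, retry-with-backoff wrapped around the crawl. A generic Python sketch of the pattern; this is not Autonomy’s code, and the crawl function is a placeholder that simulates a network fault.

import time

def crawl_source(name):
    """Placeholder for the real crawl; raises on network trouble."""
    raise ConnectionError("simulated network latency timeout")

def crawl_with_restart(name, max_restarts=3, backoff=1.0):
    # Restart the crawl automatically when it dies for reasons
    # outside the indexer's span of control (e.g., network faults).
    for attempt in range(1, max_restarts + 1):
        try:
            crawl_source(name)
            return
        except ConnectionError as err:
            print(f"crawl of {name} failed ({err}); restart {attempt}")
            time.sleep(backoff * attempt)  # progressive backoff
    print(f"giving up on {name} after {max_restarts} restarts")

crawl_with_restart("worksite-repository")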

We have heard some reports that the Universal Search Server can create some extra perspiration when one tries to deploy multiple WorkSite engines. From the article cited above, we learned:

Autonomy has identified this as a high priority issue and expects to have a resolution out in the very near future.

Autonomy has been among the more responsive vendors of enterprise solutions. We expect a fix will be available as you read this or within a day or two. If you are an Autonomy licensee, contact your reseller or Autonomy.

Stephen E Arnold, April 14, 2011

Freebie but maybe some day?


