OpenText Joins Semantic Web Race

March 25, 2011

Nstein, the Quebec-based content management vendor recently acquired by Open Text, announced the release of a new version of the popular Semantic Navigation software. The notice appeared on the company’s blog under the title “Open Text Semantic Navigation Now Available.” The write up presented a lengthy laundry list of features and functions.

Boiling the article down to a sentence or two proved difficult. We believe that OpenText now offers a crawling and indexing system that supports faceted navigation. But there is an important twist. The semantic tool has a search engine optimization and sentiment analysis component as well. The article asserts:

[A licensee can] enrich content–including huge volumes of uncategorized content–by automatically analyzing and tagging it with metadata to help discern relevant and insightful keywords, topics, summaries, and sentiments.

The list of features and functions is lengthy. There is additional information available. Public information is available at this link, but you will need an OpenText user name and password to access the content at this link.

If the product performs according to the descriptions in the source article, a number of OpenText’s competitors will be faced with significant competition.

Stephen E Arnold, March 25, 2011

Freebie

More Semantic Action: Topicmarks

March 21, 2011

“Mint Founder Invests in Semantic Text Startup” asserts some interesting semantic activity. Mashable.com reports that the company in question, Topicmarks, is raising a healthy bankroll from qualified investors with the help of the man behind Mint.com, Aaron Patzer.

Mr. Patzer certainly knows what a successful startup looks like, having sold his own creation for $170 million last year.  Patzer on Topicmarks:

These guys have tech developed over the last three years, a team of strong engineers, deals in place with Evernote and ShareVault already. Also, I’ve done a dive into the algorithms behind the system, which are impressive and being patented.

What Topicmarks has managed to put together is a semantic tool that can read, analyze and summarize uploaded documents. It can quickly process hundreds of pages and distill the content into a manageable read of ten minutes or less. The company itself likens the technology to a personalized version of CliffsNotes for the text of one’s choosing. Not surprisingly, this is already becoming a favorite among students, though less so among teachers.

The product is also being laced together with popular cloud services such as DropBox for added functionality.  Besides the numerous uses already in play, it’s easy to see the future applications of Topicmarks considering its ability to read varied formats.  Need the key points from an RSS feed?  Burdened with an overflowing email inbox?  These are soon to be problems of the past.

If you want to see the technology firsthand, visit the Topicmarks Web site link above.  Currently, the beta version is free to run and it is ridiculously easy.  I uploaded a 111-page .pdf as a test, receiving pretty good results in a matter of minutes.  My only question is: am I qualified to be an investor?

Sarah Rogers, March 21, 2011

Freebie

Microsoft and Its Struggles with the Language of Search

March 19, 2011

I struggle with the language used to describe the language of search. So if my interpretation of “Cognition with Semantic Technology to Microsoft’s Bing” is correct, it seems Microsoft has decided to supplement or even step away from Powerset. Microsoft paid about $100 million for this natural language processing and semantic system in 2008.

After a bit of sleuthing I learned of the cessation of the Powerset toolbar’s Wikipedia function.

My attempts to access the engine’s homepage pointed to Bing.com, which is not all that unusual after a big company gobbles a small one. From the somewhat erratic Search Engine Optimization GB write up, I learned:

The non-exclusive license agreement allows Microsoft to embed elements of the semantic technologies cognition is in any Microsoft application that would benefit from an “understanding” of the English language. First, it is used to the user experience in Google, to improve Microsoft’s online decision engine.

Is this old news or new news? We noted one deal between Cognition and Microsoft in our May 2010 story “Cognition and Bing.”

We are not sure if this is the same deal or the old deal recycled as a new deal. No wonder Microsoft struggles with the language of search. We struggle to understand the semantic technologies Microsoft employs in Bing.com and Fast Search.

Micheal Cory, March 19, 2011

Freebie

Facebook, Semantic Search, and Bad News for the Key Word Crowd

March 16, 2011

You can wade through the baloney from the pundits, satraps, and poobahs. I will cut to the chase. Facebook can deliver a useful search service without too many cartwheels. There are three reasons. (If you want to complain, that’s what the comments section of the blog permits. Spare me personal email and LinkedIn comments.)

First, there are upwards of 500 million users who spend more time in Facebook doing Facebook things than I would have ever believed. I don’t do “social” but 500 million or more people see me as a dinosaur watching the snow flakes. Fine.

Second, the Facebook users stuff links in their posts, pages, wall crannies, and everywhere else in the Facebook universe they can. This bunch of URLs is a selection filter that is of enormous value to Facebook users. Facebook gets real people stuffing in links without begging, paying, or advertising. The member-screened and identified links just arrive.

Third, indexing the content on the pages to which the links refer produces an index that is different from and for some types of content more useful to Facebook members than laundry lists, decision engine outputs, or faceted results from any other system. Yep, “any other”. That situation has not existed since the GOOG took the learnings of the key word crowd, bought Oingo, and racked up the world’s biggest online advertising and search engine optimization operation in the history of digital mankind.

Navigate to “New Facebook Patent: the Huge Implications of Curated Search” and learn Bnet’s view of a patent document. I am not as excited about the patent as the Bnet outfit, but it is interesting. If one assumes that the patent contributes to the three points I identified above, Facebook gets a boost.

But my view is that Facebook does not need much in the way of a boost from semantics or any other hot trend technology. Facebook is sitting on a search gold mine. When Facebook does release its index of member-provided sources, four things will take place over a period of “Internet” time.

  1. The Google faces a competitor able to index at lower cost. Google, remember, is a brute force operation. Facebook is letting the members do the heavy lifting. A lower cost index of Facebook-member-vetted content is going to be a threat. The threat may fizzle, but a threat it will be to the Google.
  2. Users within Facebook can do “search” where Facebook members prefer to be. This means that Facebook advertising offers some interesting opportunities not lost on the Xooglers who now work at Facebook and want a gigantic payday for themselves. Money can inspire certain types of innovation.
  3. Facebook is closed. The “member” thing is important to keep in mind. The benefits of stateful actions are many, and you don’t need me to explain why knowing who a customer is, who the customer’s friends are, and what the customer does is important. But make the customer a member and you get some real juice.
  4. Facebook competitors will have to find a way to deal with the 500 million members and fast. Facebook may not be focused on search, but whatever the company does will leverage the membership, not the whizzy technology.

Bottom line: Facebook has an opportunity in search whether it does laundry lists, facets, semantics, or any combination of methods. My question: “When will Facebook drop its other social shoe?”

Stephen E Arnold, March 16, 2011

Freebie unlike the ads big companies will want to slap into Facebook outputs for its members

Digital Reasoning Garners Patent for Groundbreaking Invention

March 16, 2011

There are outfits in the patent fence business. Google, Hitachi, and IBM come to my mind. The patent applications are interesting because they provide a window through which one can gaze at some of the thinking of the firm’s legal, engineering and management professionals.

Then there are outfits who come up with useful and novel systems and methods. The Digital Reasoning patent US7882055, “Knowledge Discovery Agent System and Method”, granted on February 1, 2011, falls into this category. The patent application was filed in July 2007, so it took the ever efficient USPTO about 43 months to figure out what struck me when I first read the application. But the USPTO makes its living with dogged thoroughness. I supplement my retirement income by tracking and following really smart people like Tim Estes. I make my judgments about search and content processing based on my experience, my knowledge of what other outfits have claimed as a unique system and method, and my conversations with the inventor. You can read two of my conversations with Tim Estes in the ArnoldIT.com Search Wizards Speak series. The links to my 2010 interview and my 2011 interview are at www.arnoldit.com/search-wizards-speak. (I did an interview with a remarkable engineer, Abe Music, at Digital Reasoning here.) Keep in mind that I was able to convert my dogging of this company to a small project this year. Hooray!

The guts of the invention are:

A system and method for processing information in unstructured or structured form, comprising a computer running in a distributed network with one or more data agents. Associations of natural language artifacts may be learned from natural language artifacts in unstructured data sources and semantic and syntactic relationship may be learned in structured data sources, using grouping based on a criteria of shared features that are dynamically determined without the use of a priori classifications, by employing conditional probability constraints.

I learned from my contacts at Digital Reasoning:

The pioneering invention entails intelligent software agents that extract meaning from text as humans do – by analyzing concepts and entities in context. The software learns as it runs, continually comparing new text to existing knowledge. Associated entities and synonym relationships are automatically discovered and relevant documents are identified from across extremely large corpora.

The patent specifically covers the mechanism of measurement and the applications of algorithms to develop machine-understandable structures from patterns of symbol usage. In addition, it covers the semantic alignment of those learned structures from unstructured data with pre-existing structured data – a necessary step in creating enterprise-class entity-oriented systems. The technology as implemented in Synthesys (TM) provides a unique and now protected means of bringing automated understanding to end users in the enterprise and beyond.
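
To make the phrase “conditional probability constraints” a little more concrete, here is a minimal Python sketch. It is not the patented Digital Reasoning method and has nothing to do with the Synthesys code; it only illustrates the general idea of finding associated terms from raw text, without predefined categories, by keeping pairs whose conditional probability of appearing in the same sentence clears a threshold. The function name, the sentence-level window, and the 0.6 cutoff are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_associations(sentences, min_prob=0.6):
    """Return {(a, b): P(b | a)} for term pairs that clear min_prob.

    P(b | a) is estimated as the share of sentences containing `a`
    that also contain `b`. No category labels are supplied up front.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        # Hypothetical tokenizer: lowercase and split on whitespace.
        terms = set(sentence.lower().split())
        term_counts.update(terms)
        for a, b in combinations(sorted(terms), 2):
            # Count both directions so P(b | a) and P(a | b) are available.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        (a, b): count / term_counts[a]
        for (a, b), count in pair_counts.items()
        if count / term_counts[a] >= min_prob
    }


if __name__ == "__main__":
    sample = [
        "acme hired smith",
        "smith leads acme",
        "acme shipped widgets",
    ]
    for pair, prob in sorted(cooccurrence_associations(sample).items()):
        print(pair, round(prob, 2))
```

In the toy sample, “smith” always appears alongside “acme”, so that pair is kept, while “acme” appears with “widgets” in only one of its three sentences and is dropped. A production system would work with entities and far richer context than whitespace tokens.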

So what’s this mean?

[Two images: “The Traditional Method” and “The Digital Reasoning Method”]

In financial analysis, health information, and intelligence applications, which do you want you and your colleagues to use? I go for the Veyron. The 1998 Mustang is great as a back up or knock about. The Veyron means business in my opinion.

Three points:

  1. This is a true “beyond text” system and method. Key word search and 1998-type methods cannot deliver Synthesys 3.0 (TM) functionality.
  2. Users don’t want laundry lists. The invention delivers actionable information. The value of the method is proven each day in certain very important applications which involve the top concerns of Maslow’s hierarchy.
  3. The system can make use of human inputs but can operate in automatic mode. Many systems include automatic functions, but the method invented by Mr. Estes is a new one. Think of the difference in performance between a 1998 Mustang and the new Bugatti Veyron. Both are automobiles, but there is a difference in state of the art a long time ago and state of the art now.

If you want more information about Digital Reasoning, the company’s Web site is www.digitalreasoning.com.

Stephen E Arnold, March 15, 2011

Freebie but I want a T shirt from Music Row in Nashville

Yahoo and Semantic Search

March 14, 2011

I thought Yahoo was into Bing.com search. Bing.com, of course, has semantic functions galore. But Yahoo?

You can learn about Yahoo and its view of semantic search at “Be a Part of the Next Wave of Web Search.”

Unlike companies rolling out a new product or service, Yahoo is running a competition. The requirements? Here’s a snippet:

… the competition calls for participants to answer queries varying in complexity, based on a set of structured data collected from the Web. The results will be presented at the 4th International Workshop on Semantic Search, co-located with the World Wide Web Conference 2011 in Hyderabad, India.

Sounds good. Will Microsoft engineers enter? Will there be some Googlers?

Yahoo seems to think that semantics are going to help users cope with Web content and improve relevance. Semantic methods will help filter, cull, and hone information. Yahoo’s goal is to make search more useful via semantics.

But what about that Bing.com tie up? What about Microsoft’s semantics from Powerset to Cognition Technologies?

Micheal Cory, March 14, 2011

Freebie

Rosette Linguistics Platform Releases Latest Version

March 10, 2011

Basis Technology has announced the most recent release of its Rosette Linguistics Platform. Rosette is the firm’s multilingual text analytics software. Among the features of the new release is the addition of Finnish, Hebrew, Thai, and Turkish to the system’s 24-language capability. One point that we noted is that this release of Rosette sports an interesting mix of compatible search engines. According to the Basis Tech announcement:

“Bundled connectors enable applications built with Apache Lucene, Apache Solr, dtSearch Text Retrieval Engine, and LucidWorks Enterprise to incorporate advanced linguistic capabilities, including document language identification, multilingual search, entity extraction, and entity resolution.”

Several observations seem warranted. First, Basis Tech is moving beyond providing linguistic functionality. The company is pushing into text analytics and search. Second, Basis Tech is supporting commercial and open source search systems; namely, the SharePoint-centric dtSearch and Lucid Imagination’s open source solution.

The question becomes, “What is the business trajectory of Basis Tech? Will it become a competitor to the vendors with which the company has worked for many years? Will it morph into a new type of linguistic-centric analytics firm?” Stay tuned.

Cynthia Murrell, March 10, 2011

Freebie

Patterns in Web Content

March 10, 2011

Data mining refers to a form of application which seeks common themes or patterns in specific pools of information. The core of its popularity rests within the scientific communities, though the technology is increasingly being applied in the various arteries of the commercial sector.

The exponential growth of the Web has made it necessary to trace and scrutinize the relationships within these collections of information.

The Computational Linguistics & Psycholinguistics Research Center (CLiPS) in Belgium has just released Pattern, a Web mining module for the Python programming language. The Pattern Web site says:

“It [Pattern] bundles tools for data retrieval (Google + Twitter + Wikipedia API, Web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).”
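
For readers who have not met two of the metrics named in that list, tf-idf and cosine similarity, the following is a minimal sketch in plain Python. It deliberately does not use the Pattern API; the function names, the crude tokenizer, and the sample documents are assumptions made for illustration, and Pattern’s own implementation may weight terms differently.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in stripped if word]


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "semantic search engine",
        "semantic search tool",
        "football match result",
    ]
    vecs = tf_idf_vectors(docs)
    print(cosine(vecs[0], vecs[1]))  # shares two terms with the first document
    print(cosine(vecs[0], vecs[2]))  # shares no terms with the first document
```

The first two sample documents share terms and score above zero; the third shares nothing with the first and scores zero. Terms that appear in every document get a weight of zero, which is the point of the idf factor.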

When you follow the link above, you can access the release directly. Check out the specifications for compatibility.

I thought it interesting to discover the designers, in a trial of their creation, used the software to track the progress of local politicians in the 2010 elections in their home country. Pattern scanned thousands of Tweets, split between two languages, updating the data pool on a daily basis. The results were fascinating. You can read a detailed description of the experiment here.

Micheal Cory, March 10, 2011

Meaning in Casual Wording

March 3, 2011

I love science.  Paired with my increasing passion for language and grammar, a sweeter cocktail could hardly be imagined.  “Do Casual Words Betray Warlike Intent?” was a fascinating read.

At the recent American Association for the Advancement of Science (AAAS) meeting, James Pennebaker, a University of Texas at Austin psychologist, spoke about the study that he and assorted colleagues, along with the Department of Homeland Security, have been engaged in recently. The focus of the research has been on four similar Islamic groups and the relationship between the speech they employ and the actions that follow. The collective hope is that the study’s findings can be used to forecast aggressive activity.

Isolating pronouns, determiners, adjectives and prepositions, the group mines them for what Pennebaker calls “linguistic shifts”. To date they have determined that, of the four, the two groups that have committed acts of violence telegraphed said destructiveness with the use of “more personal pronouns, and words with social meaning or which convey positive or negative emotions.” Aside from differentiating between various stylistic elements of expression, Pennebaker has also scrutinized statements made by warmongers from our past, including George W. Bush, with interesting results.
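
To show what counting one of these word classes looks like in practice, here is a minimal Python sketch. It is not Pennebaker’s method or software; the pronoun list is a small hypothetical sample, and the idea of a “shift” as a simple difference between two rates is an assumption made for illustration.

```python
import re

# Hypothetical, deliberately short list of English personal pronouns.
PERSONAL_PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours",
    "you", "your", "yours", "he", "him", "his",
    "she", "her", "hers", "they", "them", "their", "theirs",
}


def pronoun_rate(text):
    """Share of words in `text` that are personal pronouns (0.0 to 1.0)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in PERSONAL_PRONOUNS)
    return hits / len(words)


def pronoun_shift(earlier_text, later_text):
    """Change in pronoun rate from an earlier sample to a later one."""
    return pronoun_rate(later_text) - pronoun_rate(earlier_text)


if __name__ == "__main__":
    earlier = "The report lists the schedule"
    later = "We will defend our people"
    print(pronoun_rate(earlier), pronoun_rate(later))
    print(pronoun_shift(earlier, later))
```

A rate computed this way says nothing on its own; as the post notes, even the researchers describe their results as only modest probabilistic predictions.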

Skepticism has always fueled scientific endeavors, and we must continue to ask questions, especially those that breed discomfort.  This science deals with a very grey area and Pennebaker himself labels the results as only “modest probabilistic predictions”.  There is no question that this information must be used responsibly, but my aforementioned appreciation for the field keeps me from seeing this as a negative.

If one can discern an opponent’s intent in a fight or a game of cards by careful observation, why is it so strange to think the same could be done from listening to what they say?

Sarah Rogers, March 3, 2011

Freebie

Sophia Search Co-Founder Speaks

March 1, 2011

Sophia Search offers an alternative to key word retrieval. What’s the secret behind this new system? The Search Wizards Speak series provides some insight into Sophia Search with its most recent interview with Dr. David Patterson.

You can read an exclusive interview with the co-founder of the Belfast, Northern Ireland-based enterprise search vendor Sophia Search on ArnoldIT.com. Dr. Patterson explains his search system’s use of semiotics to discern the meaning of textual information. The result is that a user finds the information required more quickly, thus reducing the need to run multiple queries or plow through a long laundry list of query results.

In the interview, Dr. Patterson said:

I prefer to call Sophia a “contextual discovery engine.” Sophia can automatically disambiguate the different meanings of words based on their context within a document. In short, Sophia searches by the meaning of what the user is looking for as opposed to just the key words they use in their query. Sophia enables users to discover contextually relevant information they were previously unaware of, and it increases the users’ understanding of their content. One of the benefits of our technical approach is that Sophia operates without human guidance or training, and it does not require taxonomies, ontologies or thesauri.

He added:

Conventional search tools and systems don’t address the discovery component of search. How can the user query for information they don’t know exists? Finally, we were fascinated by solving what we call “the context problem”. Most systems simply do not understand the context of information. Therefore, most search and retrieval systems provide a lot of irrelevant hits to the user. Sophia is all about context and providing users with relevant information in the right context. It is about understanding the meaning of what the user is looking for, not simply returning lists of documents just because they contain the user’s query terms.

You can examine a screen shot of the Sophia Search output along with Dr. Patterson’s comments about the system and method used in this enterprise search system.

You can get the full text of the interview at this link.

Stephen E Arnold, March 1, 2011

Freebie
