Cambridge Semantics, Simplifying Information Exchange for Business
July 14, 2010
Semantic technology has often been viewed as something best left to IT professionals, who have the ability and know-how to track business information and sort out what matters to a company’s everyday operations.
Now a Boston company, Cambridge Semantics, is looking to change that with semantic middleware designed to let end users tap semantic technology without technical expertise. The hope is that the software will help make sense of information stored in Excel spreadsheets.
At first glance, this looks like an interesting prospect, but the company’s Anzo software could be a stretch, especially where numbers are concerned. Still, this attempt by Cambridge Semantics to simplify the exchange of information for business is well conceived.
Rob Starr, July 14, 2010
Concept Searching Offers Taxonomy Management
July 13, 2010
SharePoint just got better thanks to Concept Searching. The company has just announced a Distributed Taxonomy Management feature that will work within its conceptClassifier for SharePoint.
The experts agree this is a good move, but it is one that should have been built into SharePoint’s foundation. Nevertheless, it’s here now and will be a boon to companies with large document libraries and taxonomy needs.
One notable feature is that the process is transparent to the end user; a central server coordinates all the locking and unlocking of taxonomy nodes.
That Concept Searching now offers taxonomy management is no surprise to an industry familiar with its work. When it comes to statistical metadata generation, Concept Searching describes itself as the only classification software company in the world using concept extraction and compound term processing to provide access to information. The company was founded in 2002.
Rob Starr, July 13, 2010
Freebie
Semantic Valley, Italy
July 12, 2010
In the hills of Italy, there is a semantic think tank waiting for you to throw it a curveball. A recent Semantic Web article, “Semantic Valley Consortium Wants to Help Business Get Going With the Semantic Web,” detailed the creation of a conceptual linguistics organization with ties to IBM, Expert System and Oracle. Situated in the northern Trentino Valley of Italy, Semantic Valley aims to move the conversation away from standards-focused and academic discussions and toward ways to improve ROI on things previously thought impossible. “The consortium says it hopes to potentially produce new products and contribute to the content technology market by promoting synergy between research centers and companies,” the article claims.
This is an exciting prospect because the more thought put into semantic technology, the better machines will understand the World Wide Web.
Are these three leaders following in the footsteps of the First Triumvirate? Of that trio, Julius Caesar ended up on top, only to get immortality, a great deathbed quote, and the rumor that he plotted the whole show for the benefit of his grandnephew.
Will history repeat itself? Is the Semantic Valley on Google Maps?
Pat Roland, July 10, 2010
Freebie
Cloud, Semantics, and Cancer Fighting
July 10, 2010
There has been some progress in using semantic Web and cloud technology to detect lung cancer. The National Cancer Institute’s Early Detection Research Network (EDRN) and the nonprofit Canary Foundation are working together on the effort, as reported by Semanticweb.com.
The preliminary tests have been so encouraging, in fact, that the partners want to move forward and have NASA’s Jet Propulsion Laboratory (JPL) analyze the results. The project hopes to use computers to analyze at least some of its results and to let humans collaborate on the collected data, to see whether cloud computing can someday help combat lung cancer.
The effort is obviously in its infancy, but the idea that a computer-driven approach could make even a dent in the 157,300 lung cancer deaths forecast for 2010 is encouraging.
Rob Starr, July 10, 2010
Sophia Search Lands Venture Funding
July 9, 2010
Wisdom is a good name for a search and content processing system. For those of us in rural Kentucky, a translation: the Greek word for wisdom is “Sophia.” (Gentle reader, “wisdom” is not highly prized in Harrod’s Creek.)
The news that Sophia Search (founded in 2007) landed $1.2 million in seed money reached me via Marketwire. The investors include Volcano, based in Belfast, and Javelin Ventures in London. The story’s title caught my attention: “Sophia Search Secures Largest Angel Investment in Northern Ireland to Address Global Demand for Next-Generation Enterprise Search and Discovery.” The news item said:
Sophia’s technology is purpose built on the company’s unique, patented, Contextual Discovery Engine (CDE) based on the linguistical model of Semiotics, the science behind how humans understand the meaning of information in context. The CDE platform automatically detects relationships and themes in unstructured content to enable organizations to seamlessly search, extract, deduplicate and eliminate redundancy of content to minimize risk and reduce the cost of retrieving, storing and managing enterprise information.
The news story revealed that Sophia is built on a patented, next-generation search engine platform. The system can “automatically discover relationships and themes in unstructured content.”
The company, according to my notes, is a spinout from the University of Ulster and Saint Petersburg State University. Sophia Search was one of the companies recognized by the PricewaterhouseCoopers entrepreneur competition. (Keep in mind that I do work for the outfit that helps PricewaterhouseCoopers conduct these entrepreneur competitions.)
A quick trip to our Overflight system yielded some useful nuggets about this company. The Sophia Search white paper, dated January 2009, pointed out that the method is “fundamentally different to [sic] any other search tool.” The white paper continued:
These tools are based on ideas & principles drawn from disciplines such as Signal Processing or Mathematics. These ideas are ‘borrowed’ from these disciplines and applied to text retrieval to provide search. In Sophia we believe that in order to retrieve useful information for users we must first understand its meaning and as such we build Sophia upon the recognised linguistical model of Semiotics.
The system “understands” the context in which a word or phrase is used. The white paper said: “In order to understand the meaning of a word it must be taken within the context of other words around it.” We agree. Keyword indexing is one reason why most search systems drive users to distraction.
The white paper introduces the idea of “intertextuality”. Here’s what the Sophia white paper says:
All texts are rehashes of previously existing ones and in order to understand them properly they must be read within the context of all information available that is related to them.
Many search engines remain ignorant of what has been previously processed. Google’s programmable search engine includes a context server, based on Ramanathan Guha’s method, which addresses this problem. But, as far as I know, Google does not offer its context server technology to third parties. Sophia’s engineers are heading down an interesting path in my opinion.
The system processes content, picks out key themes, and then clusters the pointers into “themes”. The idea is that a search returns content which is “topically similar”. According to the write-up in the University of Ulster’s U2B newsletter (Winter 2007), Dr. David Patterson, one of the founders of the company, revealed:
Sophia just doesn’t find relevant information for customers, it also empowers them with an understanding of the meaning of the information returned. Using conventional search is akin to using a torch in a dark room (the torch represents the search engine and the room, an organisation’s information). Only the parts of the room that have the beam of light focussed on them can be seen at any one time, with limited understanding of the information in view. Using SOPHIA is like flicking the switch for a bright ceiling light. The whole room can be seen and all information understood at once.
If you are into technical papers, you can get a feel for the system’s method in “Sophia: An Interactive Cluster-Based Retrieval System for the OHSUMED Collection,” published in 2005.
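Sophia’s own method rests on its patented, semiotics-based Contextual Discovery Engine, which we have not seen. For readers who want a feel for the generic idea of cluster-based retrieval (group content into themes, then return the whole theme a query lands in), here is a minimal Python sketch. It is a stand-in under stated assumptions, not Sophia’s algorithm: it uses plain word overlap (Jaccard similarity), a single pass over the documents, and a hypothetical 0.2 threshold. The function names and sample documents are made up for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption, not from any product).
STOP = {"the", "a", "of", "and", "to", "in", "is", "for", "on"}

def terms(text):
    """Lowercased set of words with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}

def jaccard(a, b):
    """Share of terms two sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(docs, threshold=0.2):
    """Hypothetical single-pass clustering: add each document to the first
    theme whose vocabulary it overlaps enough, otherwise start a new theme."""
    themes = []  # each theme: {"terms": set of words, "members": [doc ids]}
    for doc_id, text in docs.items():
        t = terms(text)
        for theme in themes:
            if jaccard(t, theme["terms"]) >= threshold:
                theme["members"].append(doc_id)
                theme["terms"] |= t  # the theme's vocabulary grows
                break
        else:
            themes.append({"terms": set(t), "members": [doc_id]})
    return themes

def search(query, themes):
    """Return every member of the best-matching theme, i.e. the
    'topically similar' set, rather than only exact keyword hits."""
    q = terms(query)
    best = max(themes, key=lambda th: len(q & th["terms"]))
    return best["members"] if q & best["terms"] else []

docs = {
    "d1": "lung cancer screening with CT scans",
    "d2": "early detection of lung cancer",
    "d3": "oil spill cleanup in the gulf",
    "d4": "gulf oil spill and wildlife",
}
themes = cluster(docs)
print([th["members"] for th in themes])   # [['d1', 'd2'], ['d3', 'd4']]
print(search("cancer detection", themes))  # ['d1', 'd2']
```

Single-pass clustering is crude and depends on document order; it is used here only because it keeps the “themes, then topically similar results” idea visible in a few lines.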
With some search systems fading, new entrants often find eager audiences. Will Sophia become a breakout solution? We wish the Sophia team the best.
Stephen E Arnold, July 9, 2010
Freebie
Expert System Honored
July 5, 2010
There were no teary-eyed speeches or red carpet interviews, but the business world recently awarded its version of the Oscar to a company doing exciting things with search. The Stevie Awards annually honor the best and brightest companies around the globe, and this year’s Best New Product or Service winner was search and semantic technology innovator Expert System USA for its COGITO Focus program. The platform improves search and interactive analysis across all types of data. “This allows users to have insight into both structured and unstructured content, both internally and externally, including RSS feeds, Web pages and social networks,” the company says. This honor is another sign that search companies are gaining significant traction and respect in the business world. A happy quack from the goose pond. In September 2010, ArnoldIT.com will feature the Expert System technology in its demonstration series. Watch this blog and Expert System’s Web site for details.
Stephen E Arnold, July 5, 2010
Semantic Search Retooled at Yahoo
June 25, 2010
Yahoo!’s Developer Network blog published a brief interview with Peter Mika, a wizard from Yahoo’s Research Division. The topic is semantic search, also the subject of a speech he delivered on the 19th called “The future face of Search is Semantic for Facebook, Google, and Yahoo!”. But I wonder about the “growing interest” mentioned and how the short article frames semantic search as something that’s just now happening (“Semantic Search will also bring entirely new functionality.”). The future is already here; semantic search and the development of metadata for use in search are fairly old hat. The article specifically mentions Sindice, a semantic search engine we noted in 2008. While I’m sure there’s good basic info on semantic search to be had, I’m more inclined to see this release as a pitch for Yahoo!’s SearchMonkey. In fact, how do these different semantic efforts fit together? Where’s the intersection with Bing?
Jessica West Bratcher, June 25, 2010
Freebie
Elsevier Buys Collexis
June 22, 2010
Elsevier continues to add to its search and content processing arsenal. With the cost of human indexing gushing like the BP oil spill, Elsevier is looking for magic to use in publishing scientific, technical, and medical information products and services. Elsevier is the giant company behind journals like The Lancet and the Mosby line of reference books. In terms of indexing, sci-tech content is easier to machine index than chatty Twitter tweets. To bolster the firm’s multiple methods, Elsevier acquired Collexis Holdings, a semantic technology and software developer. The plan is that the Collexis technology will give Elsevier the ability to help researchers and institutions take advantage of more avenues for finding data and publishing results, creating a better ROI. Is it a good plan? Yahoo has been a practitioner of this approach for years. Perhaps Elsevier can craft a success from this Yahoo-style approach. Now those Collexis assets have to be fine-tuned and installed before the company or its clients will start seeing benefits. But kudos to Elsevier for taking a positive step.
Jessica West Bratcher, June 22, 2010
Freebie
IBM Back in the NLP Game?
June 22, 2010
IBM has some interesting technology. As at Xerox PARC, good ideas do not necessarily become market-dominant products or services. Remember Stairs III? If you answer, “No,” there you go. I am never sure whether IBM has come up with a great innovation or if it has fired up its public relations machine. With $100 billion in revenue and a motto like “Think”, how can you go wrong betting on IBM?
From my lonely perch in Madrid, I read a Slashdot item that pointed me to an IBM Web site, which timed out, and to the fee-starved New York Times story “The Watson Trivia Challenge.” The NYT link may be dead when you read my blog post. Be prepared to go hunting in a Vanderbilt-inspired “go-ahead” way. If you are lucky, you will be able to read “Designing a computer that can process and understand natural language.” Here’s a snippet from the pokey IBM Web site:
Known as a Question Answering (QA) system among computer scientists, Watson has been under development for more than three years. According to Dr. David Ferrucci, leader of the project team, “The confidence processing ability is key to winning at Jeopardy! and is critical to implementing useful business applications of Question Answering.” Watson will also incorporate massively parallel analytical capabilities and, just like human competitors, Watson will not be connected to the Internet, or have any other outside assistance.
The idea is that IBM’s technology can play a popular game show better than I can. No contest. I don’t know what the show is, nor do I excel at answering questions. For example, I am baffled by such questions as:
- Why does the IBM.com Web site time out?
- Why can’t I locate information via the search box on IBM.com?
- Why is IBM technology focused on search engine optimization, consulting, and beating game show contestants chosen because each can jump up and down, make good television, and give the host an easy target for sly humor?
- Isn’t this “older” news recycled in what seems to be a World Cup week?
Call me a silly goose, but tracking IBM innovations that seem to have no significant impact on my information-seeking life is confusing. A final question, Watson, “Why is this the case?”
Bring up the theme music. Buzzz. Time’s up. Next week’s contestant? Ask.com. See you then.
Stephen E Arnold, June 22, 2010
Freebie
Semantic Search Explained
June 19, 2010
I get asked about semantic search once a day, often more frequently. I usually say, “Semantic search means software can figure out what something is about.” If that does not do the trick, I trot out the more detailed explanation Martin White and I put in our 2009 study “Successful Enterprise Search Management.”
I neglected to write about “10 Things that Make Search a Semantic Search.” The information in that write-up by the founder of Hakia, Dr. Riza C. Berkan, is useful. If you have not reviewed the write-up, you will want to put this reading on your To Do list.
I don’t want to reproduce the full list. Navigate to the original article and work through it. I do want to highlight three points with which I agree.
First, a semantic search can handle synonyms. Languages are like roads in Kentucky, full of potholes. Disambiguation and figuring out synonyms are two important tasks. Their presence signals a semantic component in the content processing system.
Second, a search system that can present a snippet or a highlight of the key sentence or paragraph is quite useful. I find that some snippeting technology is designed to meet the needs of folks selling ads. The snippeting function I want works with the honesty and zeal of a prisoner who is due to be released from prison in two days.
Finally, a user can enter a query without having to formulate it with Boolean operators or special instructions such as CC=. Systems have to be smart but not biased or tilted for the benefit of advertisers. Objectivity is important in delivering this type of query support. Alas, I think this is a difficult goal to achieve. Humans are humans and often prefer to click the ad for a vacation rental rather than run a query, peruse the results, and then make an informed decision.
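To make the synonym, snippet, and plain-language query points concrete, here is a toy Python sketch. It is not Hakia’s method or any vendor’s; the synonym table, function names, and sample text are hypothetical. A plain-language query is expanded with synonyms (no Boolean operators needed from the user), and the snippet is simply the sentence that shares the most expanded terms with the query. Disambiguation, the harder half of the problem, is not attempted.

```python
import re

# Hypothetical synonym table; a real system would draw on a thesaurus
# or a learned model rather than a hand-written dictionary.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "automobile": {"car", "auto"},
    "auto": {"car", "automobile"},
    "buy": {"purchase"},
    "purchase": {"buy"},
}

def words(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def expand(query):
    """Turn a plain-language query into its words plus their synonyms.
    Iterate over a copy so the set is not modified while looping."""
    terms = set(words(query))
    for t in list(terms):
        terms |= SYNONYMS.get(t, set())
    return terms

def snippet(document, query):
    """Pick the sentence that shares the most expanded query terms."""
    q = expand(query)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return max(sentences, key=lambda s: len(q & set(words(s))))

doc = ("Dealers reported strong sales last month. "
       "Many shoppers chose to purchase an automobile with cash. "
       "Interest rates were steady.")
print(snippet(doc, "buy a car"))
# Many shoppers chose to purchase an automobile with cash.
```

Asking for “buy a car” surfaces the sentence about purchasing an automobile even though neither “buy” nor “car” appears in it; a keyword-only system would find nothing.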
A happy quack to Hakia for the post.
Stephen E Arnold, June 19, 2010
Freebie