November 21, 2014
SemanticWeb.com posted an article called “Retrieving And Using Taxonomy Data From DBpedia” with an interesting introduction. It explains that DBpedia is a crowd-sourced Internet community whose entire goal is to extract structured information from Wikipedia and share it. The introduction continues that DBpedia already has over three billion facts W3C standard RDF data model ready for application use.
The W3C standards are already written using the SKOS vocabulary, primarily used by the New York Times, the Library of Congress, and other organizations for their own taxonomies and subject headers. Users can extrapolate the data and implement it in their own RDF applications with the goal of giving your data more value.
DBpedia is doing a wonderful service for users so they do not have to rely on proprietary software to deliver them rich taxonomies. The taxonomies can be retrieved under the open source community bylaws and gain instant improvement for content. There is one caveat:
“Remember that, for better or worse, the data is based on Wikipedia data. If you extend the structure of the query above to retrieve lower, more specific levels of horror film categories, you’d probably find the work of film scholars who’ve done serious research as well as the work of nutty people who are a little too into their favorite subgenres.”
Remember Wikipedia is a good reference tool to gain an understanding of a topic, but you still need to check more verifiable resources for hard facts.
November 20, 2014
The article titled The Power of Semantics on Research Information investigates the advancements in semantic enrichment tools. Scholarly publishers are increasingly interested in enabling their users to browse the vast quantity of data online and find the most relevant information. Semantic enrichment is the proposed solution to guiding knowledge-seekers to the significant material while weeding out the unnecessary and unrelated. Phil Hastings of Linguamatics, Daniel Mayer of Temis and Jake Zarnegar of Silverchair were all quoted at length in the article on their views on the current usages of semantic enrichment and its future. The article states,
“Daniel Mayer, VP product and marketing at TEMIS, gave some examples of the ways this approach is being used: ‘Semantic enrichment is helping publishers make their content more compelling, drive audience engagement and content usage by providing metadata-based discoverability features such as search-engine optimisation, improved search, taxonomy/faceted navigation, links to structured information about topics mentioned in content, “related content”, and personalisation.”
Clearly, Temis is emphasizing semantics. Mayer and the others also gave their opinions on how publishers in the market for semantic enrichment might go about picking their partners. Some suggestions included choosing a partner with expertise within the field, an established customer base and the ability to share best practices.
Chelsea Kerwin, November 20, 2014
October 28, 2014
Partnerships offer companies ways to improve their product quality and create new ones. Semantic Web reports that “Expert System And WAND Partner For A More Effective Management Of Enterprise Information.” Expert System is a leading semantic technology company and WAND is known for its enterprise taxonomies. Their new partnership will allow businesses to have a better and more accurate way to organize data.
Each company brings unique features to the partnership:
“The combination of the strengths of each company, on one side WAND’s unique expertise in the development of enterprise taxonomies and Expert System’s Cogito on the other side with its unique capability to analyze written text based on the comprehension of the meaning of each word, not only ensures the highest quality possible, but also opens up the opportunity to tackle the complexity of enterprise information management. With this new joint offer, companies will finally have full support for a faster and flexible information management process and immediate access to strategic information.”
Enterprise management teams are going to get excited about how Expert System and WAND will improve taxonomy selection and have more native integration with in-place data systems. One of the ways the two will combine their strengths is with the new automatic classification: when a WAND taxonomy is selecting, Expert System brings in its semantic based categorization rules and an engine for automatic categorization.
October 26, 2014
You can find an interesting discussion of the Semantic Web on Hacker News. Semantic Web search engines have had a difficult time capturing the imagination of the public. The write up and the comments advance the notion that the Semantic Web is alive and well, just invisible.
I found the statement from super Googler Peter Norvig a window into how Google views the Semantic Web. Here’s the snippet:
Peter Norvig put it best: “The semantic web is the future of the web, and always will be.” (For what it’s worth, the startup school video that quote comes from is worth watching: http://youtu.be/LNjJTgXujno?t=20m57s)
There are references to “semantic search” companies that have failed; for example, Ontoprise. There are links to clever cartoons.
The statement I highlighted was:
The underlying data just doesn’t necessarily map very well into the seem-web representations, so duplicates occur and possible values explode in their number of valid permutations even though they all mean the same handful of things. And it’s the read-only semantic-web, so you can’t just clean it, you have to map it.. Which is why I’m always amazed that http://www.wolframalpha.com/ works at all. And hopefully one day https://www.freebase.com/ will be a thing. I remember being excited about http://openrefine.org/ for “liberating” messy data into clean linked data… but it turns out that you really don’t want to curate your information “in the graph”; it seems obvious, but traditional relational datasets are infinitely more manageable than arbitrarily connected nodes in a graph. So, most CMS platforms are doing somewhat useful things in marking up their content in machine-readable ways (RDFa, schema.org [as evil as that debacle was], HTTP content-type negotiation and so on) either out-of-the-box or with trivially installed plugins.
Ah, content management systems. Now that’s the model for successful information access as long as one does not want engineering drawings, videos, audio, binaries, and a host of proprietary data types like i2 Analyst Notebook files.
Worth checking out the thread in my view.
Stephen E Arnold, October 26, 2014
October 22, 2014
In April 2014, I cited a report that suggested Hakia was moving forward. It now appears that the Hakia Web site has gone dark. Information about Hakia’s semantic system is available in this interview with Riza C. Berkan.
Stephen E Arnold, October 22, 2014
September 18, 2014
We ran a check on the search and content processing vendors in our file. The Hakia.com site appears to be down.
Hakia was a developer of semantic search and offered several demonstrations of its technology. To learn about the company, the interview with Riza C. Berkan, navigate to this Search Wizards Speak issue.
Stephen E Arnold, September 18, 2014
July 29, 2014
I pointed out in http://bit.ly/X9d219 that IDC, a mid tier consulting firm that has marketed my information without permission on Amazon of all places, has rolled out a new report about content processing. The academic sounding title is “The Knowledge Quotient: Unlocking the Hidden Value of Information.” Conflating knowledge and information is not logically satisfying to me. But you may find the two words dusted with “value” just the ticket to career success.
I have not read the report, but I did see a list of the “sponsors” of the study. The list, as I pointed out, was an eclectic group, including huge firms struggling for credibility (HP and IBM) down to consulting firms offering push ups for indexers.
One company on my list caused me to go back through my archive of search information. The firm that sparked my interest is Information Handling Services or IHS or Information Handling Service. The company is publicly traded and turning a decent profit. The revenue of IHS has moved toward $2 billion. If the global economy perks up and the defense sector is funded at pre-drawdown levels, IHS could become a $2 billion company.
IHS is a company with an interesting history and extensive experience with structured and unstructured search. Few of those with whom I interacted when I was working full time considered IHS a competitor to the likes of Autonomy, Endeca, and Funnelback.
In the 2013 10-K on page 20, IHS presents its “cumulative total return” in this way:
The green line looks like money. Another slant on the company’s performance can be seen in a chart available from Google Finance.
The Google chart shows that revenue is moving upwards, but operating margins are drifting downward and operating income is suppressed. Like Amazon, the costs for operating and information centric company are difficult to control. Amazon seems to have thrown in the towel. IHS is managing like the Dickens to maintain a profit for its stakeholders. For stakeholders, is the hope is that hefty profits will be forthcoming?
Source: Google Finance
My initial reaction was, “Is IHS trying to find new ways to generate higher margin revenue?”
Like Thomson Reuters and Reed Elsevier, IHS required different types of content processing plumbing to deliver its commercial databases. Technical librarians and the competitive intelligence professionals monitoring the defense sector are likely to know about IHS different products. The company provides access to standards documents, regulatory information, and Jane’s military hardware information services. (Yep, Jane’s still has access to retired naval officers with mutton chop whiskers and interesting tweed outfits. I observed these experts when I visited the company in England prior to IHS’s purchase of the outfit.)
The standard descriptions of IHS peg the company’s roots with a trade magazine outfit called Rogers Publishing. My former boss at Booz, Allen & Hamilton loved some of the IHS technical services. He was, prior to joining Booz, Allen the head of research at Martin Marietta, an IHS customer in the 1970s. Few remember that IHS was once tied in with Thyssen Bornemisza. (For those with an interest in history, there are some reports about the Baron that are difficult to believe. See http://bit.ly/1qIylne.)
Large professional publishing companies were early, if somewhat reluctant, supporters of SGML and XML. Running a query against a large collection of structured textual information could be painfully slow when one relied on traditional relational database management systems in the late 1980s. Without SGML/XML, repurposing content required humans. With scripts hammering on SGML/XML, creating new information products like directories and reports eliminated the expensive humans for the most part. Fewer expensive humans in the professional publishing business reduces costs…for a while at least.
IHS climbed on the SGML/XML diesel engine and began working to deliver snappy online search results. As profit margins for professional publishers were pressured by increasing marketing and technology costs, IHS followed the path of other information centric companies. IHS began buying content and services companies that, in theory, would give the professional publishing company a way to roll out new, higher margin products. Even secondary players in the professional publishing sector like Ebsco Electronic Publishing wanted to become billion dollar operations and then get even bigger. Rah, rah.
These growth dreams electrify many information company’s executives. The thought that every professional publishing company and every search vendor are chasing finite or constrained markets does not get much attention. Moving from dreams to dollars is getting more difficult, particularly in professional publishing and content processing businesses.
My view is that packaging up IHS content and content processing technology got a boost when IHS purchased the Invention Machine in mid 2012.
Years ago I attended a briefing by the founders of the Invention Machine. The company demonstrated that an engineer looking for a way to solve a problem could use the Invention Machine search system to identify candidate systems and methods from the processed content. I recall that the original demonstration data set was US patents and patent applications. My thought was that an engineer looking for a way to implement a particular function for a system could — if the Invention Machine system worked as presented — could present a patent result set. That result set could be scanned to eliminate any patents still in force. The resulting set of patents might yield a procedure that the person looking for a method could implement without having to worry about an infringement allegation. The original demonstration was okay, but like most “new” search technologies, Invention Machine faced funding, marketing, and performance challenges. IHS acquired Invention Machine, its technologies, its Eastern European developers, and embraced the tagging, searching, and reporting capabilities of the Invention Machine.
The Goldfire idea is that an IHS client can license certain IHS databases (called “knowledge collections”) and then use Goldfire / Invention Machine search and analytic tools to get the knowledge “nuggets” needed to procure a missile guidance component.
The jargon for this finding function is “semantic concept lenses.” If the licensee has content in a form supported by Goldfire, the licensee can search and analyze IHS information along with information the client has from its own sources. A bit more color is available at http://bit.ly/WLA2Dp.
The IHS search system is described in terms familiar to a librarian and a technical analyst; for example, here’s the attributes for Goldfire “cloud” from an IHS 2013 news release:
- “Patented semantic search technology providing precise access to answers in documents. [Note: IHS has numerous patents but it is not clear what specific inventions or assigned inventions apply directly to the search and retrieval solution(s)]
- Access to more than 90 million scientific and technical “must have” documents curated by IHS. This aggregated, pre-indexed collection spans patents, premium IHS content sources, trusted third-party content providers, and the Deep Web.
- The ability to semantically index and research across any desired web-accessible information such as competitive or supplier websites, social media platforms and RSS feeds – turning these into strategic knowledge assets.
- More than 70 concept lenses that promote rapid research, browsing and filtering of related results sets thus enabling engineers to explore a concept’s definitions, applications, advantages, disadvantages and more.
- Insights into consumer sentiment giving strategy, product management and marketing teams the ability to recognize customer opinions, perceptions, attitudes, habits and expectations – relative to their own brands and to those of their partners’ and competitors’ – as expressed in social media and on the Web.”
Most of these will resonate with those familiar with the assertions of enterprise search and content processing vendors. The spin, which I find notable, is that IHS delivers both content and information retrieval. Most enterprise search vendors provide technology for finding and analyzing data. The licensee has to provide the content unless the enterprise search vendor crawls the Web or other sources, creates an archive or a basic index, and then provides an interface that is usually positioned as indexing “all content” for the user.
According to Virtual Strategy Magazine (which presumably does not cover “real” strategy), I learned that US 8666730:
covers the semantic concept “lenses” that IHS Goldfire uses to accelerate research. The lenses correlate with the human knowledge system, organizing and presenting answers to engineers’ or scientists’ questions – even questions they did not think to ask. These lenses surface concepts in documents’ text, enabling users to rapidly explore a concept’s definitions, applications, advantages, disadvantages and more.
The key differentiator is claimed to move IHS Goldfire up a notch. The write up states:
Unlike today’s textual, question-answering technologies, which work as meta-search engines to search for text fragments by keyword and then try to extract answers similar to the text fragment, the IHS Goldfire approach is entirely unique – providing relevant answers, not lists of largely irrelevant documents. With IHS Goldfire, hundreds of different document types can be parsed by a semantic processor to extract semantic relationships like subject-action-object, cause-and-effect and dozens more. Answer-extraction patterns are then applied on top of the semantic data extracted from documents and answers are saved to a searchable database.
According to Igor Sovpel, IHS Goldfire:
“Today’s engineers and technical professionals are underserved by traditional Internet and enterprise search applications, which help them find only the documents they already know exist,” said Igor Sovpel, chief scientist for IHS Goldfire. “With this patent, only IHS Goldfire gives users the ability to quickly synthesize optimal answers to a variety of complex challenges.”
Is IHS’ new marketing push in “knowledge” and related fields likely to have an immediate and direct impact on the enterprise search market? Perhaps.
There are several observations that occurred to me as I flipped through my archive of IHS, Thyssen, and Invention Machine information.
First, IHS has strong brand recognition in what I would call the librarian and technical analyst for engineering demographic. Outside of lucrative but quite niche markets for petrochemical information or silhouettes and specifications for the SU 35, IHS suffers the same problem of Thomson Reuters and Wolters Kluwer. Most senior managers are not familiar with the company or its many brands. Positioning Goldfire as an enterprise search or enterprise technical documentation/data analysis tool will require a heck of a lot of effective marketing. Will positioning IHS cheek by jowl with IBM and a consulting firm that teaches indexing address this visibility problem? The odds could be long.
Second, search engine optimization folks can seize on the name Goldfire and create some dissonance for IHS in the public Web search indexes. I know that companies like Attivio and Microsoft use the phrase “beyond search” to attract traffic to their Web sites. I can see the same thing happening. IHS competes with other professional publishing companies looking for a way to address their own marketing problems. A good SEO name like “Goldfire” could come under attack and quickly. I can envision lesser competitors usurping IHS’ value claims which may delay some sales or further confuse an already uncertain prospect.
Third, enterprise search and enterprise content analytics is proving to be a difficult market from which to wring profitable, sustainable revenue. If IHS is successful, the third party licensees of IHS data who resell that information to their online customers might take steps to renegotiate contracts for revenue sharing. IHS will then have to ramp up its enterprise search revenues to keep or outpace revenues from third party licensees. Addressing this problem can be interesting for those managers responsible for the negotiations.
Finally, enterprise search has a lot of companies planning on generating millions or billions from search. There can be only one prom queen and a small number of “close but no cigar” runner ups. Which company will snatch the crown?
This IHS search initiative will be interesting to watch.
Stephen E Arnold, July 29, 2014
July 21, 2014
The article titled Text Analytics Company Linguamatics Boosts Enterprise Search with Semantic Enrichment on MarketWatch discusses the launch of 12E Semantic Enrichment from Linguamatics. The new release allows for the mining of a variety of texts, from scientific literature to patents to social media. It promises faster, more relevant search for users. The article states,
“Enterprise search engines consume this enriched metadata to provide a faster, more effective search for users. I2E uses natural language processing (NLP) technology to find concepts in the right context, combined with a range of other strategies including application of ontologies, taxonomies, thesauri, rule-based pattern matching and disambiguation based on context. This allows enterprise search engines to gain a better understanding of documents in order to provide a richer search experience and increase findability, which enables users to spend less time on search.”
Whether they are spinning semantics for search, or if it is search spun for semantics, Linguamatics has made their technology available to tens of thousands of users of enterprise search. Representative John M. Brimacombe was straightforward in his comments about the disappointment surrounding enterprise search, but optimistic about 12E. It is currently being used by many top organizations, as well as the Food and Drug Administration.
Chelsea Kerwin, July 21, 2014
June 18, 2014
Another semantic system turns out the lights. SemanticWeb hosts a guest post from the founders of Sindice titled, “End of Support for the Sindice.com Search Engine: History, Lessons Learned, and Legacy.” The article delves into a wealth of technical details. It opens, however, with this modest introduction:
“Since 2007, Sindice.com has served as a specialized search engine that would do a crazy thing: throw away the text and just concentrate on the ‘markup’ of the web pages. Sindice would provide an advanced API to query RDF, RDFa, Microformats and Microdata found on web sites, together with a number of other services. Sindice turned useful, we guess, as approximately 1100 scientific works in the last few years refer to it in a way or another.”
The team decided to end support for the specialized search engine in order to focus on serving enterprise users. Besides, they say, their vision has been realized. They write:
“With the launch in 2012 of Schema.org, Google and others have effectively embraced the vision of the ‘Semantic Web.’ With the RDFa standard, and now even more with JSON-LD, richer markup is becoming more and more popular on websites. While there might not be public web data ‘search APIs,’ large collections of crawled data (pages and RDF) exist today which are made available on cloud computing platforms for easy analysis with your favorite big data paradigm.”
The account begins at the beginning, with the team’s first goal of developing a simpler API, and ends with their transition to the startup SindiceTech. In between are interesting details, like a description of their 60-machine “Webstar” operations cluster and details on how they leveraged Hadoop for their RDF analytics. We may be sad to see support for Sindice.com go, but at least the team has shared some of their wisdom on the way out.
Cynthia Murrell, June 18, 2014
May 8, 2014
RSuite content management users can now can tap into TEMIS, we learn from “RSuite CMS Leverages TEMIS’s Content Enrichment Capabilities to Deliver a Powerful Semantic Solution.” The partnership makes TEMIS’s semantic enrichment capabilities available to RSuite’s customers in the publishing, government, and corporate arenas. The deal was announced at this year’s MarkLogic World conference, held April seventh in San Francisco; both companies are MarkLogic partners.
The press release elaborates:
“RSuite CMS provides an intuitive user interface that minimizes actions required to execute complex searches across an entire set of content. The solution can globally apply metadata, dynamically organize massive amounts of documents into collections, package and distribute content to licensing partners, and enables customers to meet their multi-channel publishing goals.
“By leveraging TEMIS’s Luxid® Content Enrichment Platform, RSuite CMS can enable customers to automatically enrich their content with domain-specific metadata directly within their publishing workflows. This enables faster and more scalable content indexing, improved metadata consistency and governance, more efficient authoring, and more powerful search and discovery features within customer applications and portals.”
With its focus on publishing and media, RSuite strives to meet today’s ever-evolving publication challenges. The company serves such big names as HarperCollins, Audible, and Oxford University Press. RSuite was launched in 2000 and is located in Audubon, Pennsylvania.
With its collaborative platform, TEMIS adds domain-specific metadata to clients’ data, allowing publishers to supply more relevant information to their own audiences. TEMIS maintains several offices across Europe and North America.
Cynthia Murrell, May 08, 2014