November 25, 2016
A brief write-up at the ontotext blog, “The Knowledge Discovery Quest,” presents a noble vision of the search field. Philologist and blogger Teodora Petkova observed that semantic search is the key to bringing together data from different sources and exploring connections. She elaborates:
On a more practical note, semantic search is about efficient enterprise content usage. As one of the biggest losses of knowledge happens due to inefficient management and retrieval of information. The ability to search for meaning not for keywords brings us a step closer to efficient information management.
If semantic search had a separate icon from the one traditional search has it would have been a microscope. Why? Because semantic search is looking at content as if through the magnifying lens of a microscope. The technology helps us explore large amounts of systems and the connections between them. Sharpening our ability to join the dots, semantic search enhances the way we look for clues and compare correlations on our knowledge discovery quest.
At the bottom of the post is a slideshow on this “knowledge discovery quest.” Sure, it also serves to illustrate how ontotext could help, but we can’t blame them for drumming up business through their own blog. We actually appreciate the company’s approach to semantic search, and we’d be curious to see how they manage the intricacies of content conversion and normalization. Founded in 2000, ontotext is based in Bulgaria.
November 10, 2016
The article on O’Reilly titled Capturing Semantic Meanings Using Deep Learning explores word embedding in natural language processing. NLP systems typically encode word strings, but word embedding offers a more complex approach that emphasizes relationships and similarities between words by treating them as vectors. The article posits,
For example, let’s take the words woman, man, queen, and king. We can get their vector representations and use basic algebraic operations to find semantic similarities. Measuring similarity between vectors is possible using measures such as cosine similarity. So, when we subtract the vector of the word man from the vector of the word woman, then its cosine distance would be close to the distance between the word queen minus the word king (see Figure 1).
The article surveys neural network models that avoid the expense of working with large data sets. Word2Vec, with its CBOW and continuous skip-gram architectures, is touted, and the article goes into great technical detail about the entire process. The upshot is that the vectors capture the semantic relationships between the words in the example. Why does this approach to NLP matter? A few applications include predicting future business applications, sentiment analysis, and semantic image searches.
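The "woman : man :: queen : king" arithmetic the article describes can be sketched with toy vectors. These three-dimensional vectors are invented for illustration; real word2vec embeddings have hundreds of dimensions learned from a corpus.

```python
import numpy as np

# Toy 3-dimensional "embeddings" (illustrative only; real word2vec
# vectors are learned from a corpus, not hand-assigned)
vectors = {
    "man":   np.array([0.9, 0.1, 0.1]),
    "woman": np.array([0.9, 0.1, 0.8]),
    "king":  np.array([0.1, 0.9, 0.1]),
    "queen": np.array([0.1, 0.9, 0.8]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The offset woman - man should roughly match queen - king
offset_gender = vectors["woman"] - vectors["man"]
offset_royal  = vectors["queen"] - vectors["king"]
print(cosine_similarity(offset_gender, offset_royal))  # close to 1.0: the offsets align
```

A similarity near 1.0 between the two offsets is what the article means by the man/woman distance mirroring the king/queen distance.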
November 9, 2016
Relationships among metadata, words, and other “information” are important. Google’s Dr. Alon Halevy, founder of Transformic, which Google acquired in 2006, has been beavering away in this field for a number of years. His work on “dataspaces” is important for Google and germane to the “intelligence-oriented” systems which knit together disparate factoids about a person, event, or organization. I recall one of his presentations, specifically the PODS 2006 keynote, in which he reproduced a “colleague’s” diagram: a flow chart which made it easy to see who received a document, who edited it and what changes were made, and to whom recipients forwarded it.
Here’s the diagram from Dr. Halevy’s lecture:
Principles of Dataspace Systems, Slide 4, delivered by Dr. Alon Halevy on June 26, 2006, at PODS. Note that PODS is an annual ACM database-centric conference.
I found the Halevy discussion interesting.
November 4, 2016
Navigate to “Semantic Web Speculations.” After working through the write up, I believe it contains some useful insights.
I highlighted this passage:
Reaching to information has been changed quite dramatically from printed manuscripts to Google age. Being knowledgeable less involves memorizing but more the ability to find an information and ability to connect information in a map-like pattern. However, with semantic tools become more prevalent and a primary mode of reaching information changes, this is open to transform.
I understand that the Google has changed how people locate needed information. Perhaps the information is accurate? Perhaps the information is filtered to present a view shaped by a higher actor’s preferences? I agree that the way in which people “reach” information is going to change.
I also noted this statement:
New way of being knowledgeable in the era of semantic web does not necessarily include having the ability to reach an information.
Does this mean that one can find information but not access the source? Does the statement suggest that one does not have to know a fact because awareness that it is there delivers the knowledge payload?
I also circled this endorsement of link analysis, which has been around for decades:
It will be more common than any time that relations between data points will have more visibility and access. When something is more accessible, it brings meta-abilities to play with them.
That the conversion of unstructured information into structured data is useful is a truism. However, the ability to make sense of the available information remains a work in progress, as does the thinking about semantics.
Stephen E Arnold, November 4, 2016
November 1, 2016
Google no longer will have one search “engine.” Google will offer mobile search and desktop search. The decision is important because it says to me, in effect, mobile is where it is at. But for how long will the Googlers support desktop search when advertisers have no choice but to embrace mobile and the elegance of marketing to specific pairs of eyeballs?
Against the background of the mobile search and end of privacy shift at the GOOG, I read “The Future of Search Engines – Semantic Search.” To point out that the future of search engines is probably somewhat fluid at the moment is a bit of an understatement.
The write up profiles several less well known information retrieval systems. Those mentioned include:
- BizNar, developed by one of the wizards behind Verity, provides search for a number of US government clients. The system has some interesting features, but I recall that I had to wait as “fast” responses were updated with slower responses.
- DuckDuckGo, a Web search system which periodically mounts a PR campaign about how fast its user base is growing or how many queries it processes keeps going up.
- Omnity, allegedly a next generation search system, “gives companies and institutions of all sizes the ability to instantly [sic] discover hidden patterns of interconnection within and between fields of knowledge as diverse as science, finance, law, engineering, and medicine.” No word about the corpuses in the index, the response time, or how the system compares to good old Dialog.
- Siri, arguably, the least effective of the voice search systems available for Apple iPhone users.
- Wolfram Alpha, the perennial underdog, in search and question answering.
- Yippy, which strikes me as a system similar to that offered by Vivisimo before its sale to IBM for about $20 million in 2012. Vivisimo’s clustering was interesting, but I liked the company’s method for sending a well formed query to multiple Web indexes.
The write up is less about semantic search than doing a quick online search for “semantic search” and then picking a handful of systems to describe. I know the idea of “semantic search” excites some folks, but the reality is that semantic methods have been a part of search plumbing for many years. The semantic search revolution arrived not long after the Saturday Night Fever album hit number one.
Download open source solutions like Lucene/Solr and move on, gentle reader.
Stephen E Arnold, November 1, 2016
October 20, 2016
Quick update from the Australian content processing vendor SSAP or Semantic Software Asia Pacific Limited. The company’s Semantiro platform now supports the new Ontocuro tool.
Semantiro is a platform which “promises the ability to enrich the semantics of data collected from disparate data sources, and enables a computer to understand its context and meaning,” according to “Semantic Software Announces Artificial Intelligence Offering.”
Ontocuro is the first suite of core components to be released under the Semantiro platform. These bespoke components will allow users to safely prune unwanted concepts and axioms; validate existing, new or refined ontologies; and import, store and share these ontologies via the Library.
The company’s approach is to leapfrog the complex interfaces other indexing and data tagging tools impose on the user. The company’s Web site for Ontocuro is at this link.
Stephen E Arnold, October 20, 2016
September 27, 2016
I read “Exploiting Semantic Annotation of Content with Linked Data to Improve Searching Performance in Web Repositories.” The nub of the paper is, “Better together.” The idea is that key words work if one knows the subject and the terminology required to snag the desired information.
If not, then semantic indexing provides another path. If the conclusion seems obvious, consider that two paths are better for users. The researchers used Elasticsearch. However, the real world issue is the cost of expertise and the computational cost and time required to add another path. You can download the journal paper at this link.
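The paper’s “better together” idea, as I read it, is two retrieval paths in one index: a keyword match over the document text or a match over semantic annotations drawn from linked data. A minimal sketch of such a query, assuming hypothetical Elasticsearch field names (`content`, `annotations.concept_uri`), might look like this:

```python
import json

def dual_path_query(user_query, concept_uris):
    """Sketch of a two-path query: keyword match on full text OR a
    match on linked-data annotations. Field names are hypothetical."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Path 1: classic keyword search over document text
                    {"match": {"content": user_query}},
                    # Path 2: semantic path over linked-data annotations
                    {"terms": {"annotations.concept_uri": concept_uris}},
                ],
                # A document matches if either path matches
                "minimum_should_match": 1,
            }
        }
    }

q = dual_path_query(
    "heart attack",
    ["http://dbpedia.org/resource/Myocardial_infarction"],
)
print(json.dumps(q, indent=2))
```

A user who does not know the clinical terminology can still reach the document through the annotation path, which is the paper’s point. The cost, as noted above, is the expertise and compute needed to produce those annotations in the first place.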
Stephen E Arnold, September 27, 2016
September 22, 2016
The article titled Semantic Tagging Can Improve Digital Content Publishing on the Aptara Corp. blog reveals the importance of indexing. The article waves the flag of semantic tagging at the publishing industry, which has been pushed into digital content kicking and screaming. The difficulties involved in compatibility across networks, operating systems, and devices are quite a headache. Semantic tagging could help, if only anyone understood what it is. The article enlightens us,
Put simply, semantic markups are used in the behind-the-scene operations. However, their importance cannot be understated; proprietary software is required to create the metadata and assign the appropriate tags, which influence the level of quality experienced when delivering, finding and interacting with the content… There have been many articles that have agreed the concept of intelligent content is best summarized by Ann Rockley’s definition, which is “content that’s structurally rich and semantically categorized and therefore automatically discoverable, reusable, reconfigurable and adaptable.
The application to the publishing industry is obvious when put in terms of increasing searchability. Any student who has used JSTOR knows the frustrations of searching digital content. It is a complicated process that indexing, if administered correctly, will make much easier. The article points out that authors are competing not only with each other, but also with the endless stream of content being created on social media platforms like Facebook and Twitter. Publishers need to take advantage of semantic markups and every other resource at their disposal to level the playing field.
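What the behind-the-scenes markup might look like in practice: one common approach (my example, not Aptara’s) is to attach machine-readable metadata to a publication as JSON-LD using schema.org terms, so that a search system can discover content by what it is about rather than by keywords alone. The chapter and book names below are hypothetical.

```python
import json

# Hypothetical example: tagging a book chapter with schema.org terms.
# The "about" concepts are what a semantic search engine can match on.
chapter_metadata = {
    "@context": "https://schema.org",
    "@type": "Chapter",
    "name": "Introduction to Indexing",
    "isPartOf": {"@type": "Book", "name": "Digital Publishing Basics"},
    "about": ["indexing", "semantic tagging", "metadata"],
    "inLanguage": "en",
}
print(json.dumps(chapter_metadata, indent=2))
```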
Chelsea Kerwin, September 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/
September 8, 2016
I noted this catchphrase: “An enterprise without a semantic layer is like a country without a map.” I immediately thought of this statement made by Polish-American scientist and philosopher Alfred Korzybski:
The map is not the territory.
When I think about enterprise search, I am thrilled to have an opportunity to do the type of thinking demanded in my college class in philosophy and logic. Great fun. I am confident that any procurement team will be invigorated by an animated discussion about representations of reality.
I did a bit of digging and located “Introducing a Graph-based Semantic Layer in Enterprises” as the source of the “country without a map” statement.
What is interesting about the article is that the payload appears at the end of the write up. The magic of information representation as a way to make enterprise search finally work is technology from a company called Pool Party.
Pool Party describes itself this way:
Pool Party is a semantic technology platform developed, owned and licensed by the Semantic Web Company. The company is also involved in international R&D projects, which continuously impact the product development. The EU-based company has been a pioneer in the Semantic Web for over a decade.
From my reading of the article and the company’s marketing collateral, it strikes me that this is a 12-year-old semantic software and consulting company.
The idea is that there is a pool of structured and unstructured information. The company performs content processing and offers such features as:
- Taxonomy editor and maintenance
- A controlled vocabulary management component
- An audit trail to see who changed what and when
- Link analysis
- User role management
The write up with the catchphrase provides an informational foundation for the company’s semantic approach to enterprise search and retrieval; for example, the company’s four layered architecture:
The base is the content layer. There is a metadata layer which in Harrod’s Creek is called “indexing”. There is the “semantic layer”. At the top is the interface layer. The “semantic” layer seems to be the secret sauce in the recipe for information access. The phrase used to describe the value added content processing is “semantic knowledge graphs.” These, according to the article:
let you find out unknown linkages or even non-obvious patterns to give you new insights into your data.
The system performs entity extraction, supports custom ontologies (a concept designed to make subject matter experts quiver), text analysis, and “graph search.”
Graph search is, according to the company’s Web site:
Semantic search at the highest level: Pool Party Graph Search Server combines the power of graph databases and SPARQL engines with features of ‘traditional’ search engines. Document search and visual analytics: Benefit from additional insights through interactive visualizations of reports and search results derived from your data lake by executing sophisticated SPARQL queries.
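The underlying idea of graph search, stripped of the marketing, is facts stored as subject-predicate-object triples and queried by pattern, with variables where SPARQL would put `?x`. A toy analogue in plain Python (Pool Party, of course, uses real graph databases and SPARQL engines; the facts below mix named customers from this write up with invented predicates):

```python
# Toy triple store: facts as (subject, predicate, object) triples,
# queried by pattern matching. None plays the role of a SPARQL variable.
triples = {
    ("Credit Suisse", "isA", "Bank"),
    ("Credit Suisse", "customerOf", "Pool Party"),
    ("World Bank", "isA", "Bank"),
    ("World Bank", "customerOf", "Pool Party"),
}

def match(pattern):
    """Return all triples matching a (s, p, o) pattern."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which entities are banks?"
# Roughly: SELECT ?x WHERE { ?x :isA :Bank }
banks = sorted(t[0] for t in match((None, "isA", "Bank")))
print(banks)  # ['Credit Suisse', 'World Bank']
```

Chaining such patterns across predicates is how a graph query surfaces the “non-obvious” linkages the article promises.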
To make this more clear, the company offers a number of videos via YouTube.
The idea reminded us of the approach taken in BAE NetReveal and Palantir Gotham products.
Pool Party emphasizes, as does Palantir, that humans play an important role in the system. Instead of “augmented intelligence,” the article describes the approach methods which “combine machine learning and human intelligence.”
The company’s annual growth rate is more than 20 percent. The firm has customers in more than 20 countries. Customers include Pearson, Credit Suisse, the European Commission, Springer Nature, Wolters Kluwer, the World Bank, and “many other customers.” The firm’s projected “Euro R&D project volume” is 17 million (although I am not sure what this 17,000,000 number means). The company’s partners include Accenture, Complexible, Digirati, and EPAM, among others.
I noted that the company uses the catchphrase: “Semantic Web Company” and the catchphrase “Linking data to knowledge.”
The catchphrases, I assume, make it easier for some to understand the firm’s graph based semantic approach. I am still mired in figuring out that the map is not the territory.
Stephen E Arnold, September 8, 2016
August 16, 2016
In an exclusive interview, Yippy’s head of enterprise search reveals that Yippy launched an enterprise search technology that Google Search Appliance users are converting to now that Google is sunsetting its GSA products.
Yippy also has its sights targeting the rest of the high-growth market for cloud-based enterprise search. Not familiar with Yippy, its IBM tie up, and its implementation of the Velocity search and clustering technology? Yippy’s Michael Cizmar gives some insight into this company’s search-and-retrieval vision.
Yippy (OTC PINK: YIPI) is a publicly traded company providing search, content processing, and engineering services. The company’s catchphrase is, “Welcome to your data.”
The core technology is the Velocity system, developed by Carnegie Mellon computer scientists. Yippy had obtained rights to the Velocity technology before IBM acquired Vivisimo. I learned from my interview with Mr. Cizmar that IBM is one of the largest shareholders in Yippy. Other facets of the deal included some IBM Watson technology.
This year (2016) Yippy purchased one of the most recognized firms supporting the now-discontinued Google Search Appliance. Yippy has been tallying important accounts and expanding its service array.
Michael Cizmar, Yippy’s senior manager for enterprise search
Beyond Search interviewed Michael Cizmar, the head of Yippy’s enterprise search division. Cizmar founded MC+A and built a thriving business around the Google Search Appliance. Google stepped away from on premises hardware, and Yippy seized the opportunity to bolster its expanding business.
I spoke with Cizmar on August 15, 2016. The interview revealed a number of little known facts about a company which is gaining success in the enterprise information market.
Cizmar told me that when the Google Search Appliance was discontinued, he realized that the Yippy technology could fill the void and offer more effective enterprise findability. He said, “When Yippy and I began to talk about Google’s abandoning the GSA, I realized that by teaming up with Yippy, we could fill the void left by Google, and in fact, we could surpass Google’s capabilities.”
Cizmar described the advantages of the Yippy approach to enterprise search this way:
We have an enterprise-proven search core. The Vivisimo engineers leapfrogged the technology dating from the 1990s which forms much of Autonomy IDOL, Endeca, and even Google’s search. We have the connector libraries that we acquired from Muse Global. We have used the security experience gained via the Google Search Appliance deployments and integration projects to give Yippy what we call “field level security.” Users see only the part of content they are authorized to view. Also, we have methodologies and processes to allow quick, hassle-free deployments in commercial enterprises to permit public access, private access, and hybrid or mixed system access situations.
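The “field level security” notion Cizmar describes, showing users only the parts of a record they are cleared to see, can be sketched as a filter applied to search results before they leave the engine. The access-control list, roles, and document below are my invention, not Yippy’s implementation.

```python
# Hypothetical sketch of field-level security: strip fields from a
# search result that the requesting user's roles do not authorize.
ACL = {
    "name":   {"hr", "admin", "staff"},
    "title":  {"hr", "admin", "staff"},
    "salary": {"hr", "admin"},   # only HR and admins may view salary
}

def redact(document, user_roles):
    """Keep a field only if the user holds at least one authorized role."""
    return {field: value for field, value in document.items()
            if ACL.get(field, set()) & user_roles}

doc = {"name": "J. Smith", "title": "Engineer", "salary": 120000}
print(redact(doc, {"staff"}))  # name and title only; salary is stripped
print(redact(doc, {"hr"}))     # the full record
```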
With the buzz about open source, I wanted to know where Yippy fit into the world of Lucene, Solr, and the other enterprise software solutions. Cizmar said:
I think the customers are looking for vendors who can meet their needs, particularly with security and smooth deployment. In a couple of years, most search vendors will be using an approach similar to ours. Right now, however, I think we have an advantage because we can perform the work directly….Open source search systems do not have Yippy-like content intake or content ingestion frameworks. Importing text or an Oracle table is easy. Acquiring large volumes of diverse content continues to be an issue for many search and content processing systems…. Most competitors are beginning to offer cloud solutions. We have cloud options for our services. A customer picks an approach, and we have the mechanism in place to deploy in a matter of a day or two.
Connecting to different types of content is a priority at Yippy. Even though the company has a wide array of import filters and content processing components, Cizmar revealed that Yippy has “enhanced the company’s connector framework.”
I remarked that most search vendors do not have a framework, relying instead on expensive components licensed from vendors such as Oracle and Salesforce. He smiled and said, “Yes, a framework, not a widget.”
Cizmar emphasized that the Yippy, IBM, and Google connections were important to many of the company’s customers, and that Yippy has also acquired the Muse Global connectors and the ability to build connectors on the fly. He observed:
Nobody else has Watson Explorer powering the search, and nobody else has the Google Innovation Partner of the Year deploying the search. Everybody tries to do it. We are actually doing it.
Cizmar made an interesting side observation. He suggested that Internet search needed to be better. Is indexing the entire Internet in Yippy’s future? Cizmar smiled. He told me:
Yippy has a clear blueprint for becoming a leader in cloud computing technology.
For the full text of the interview with Yippy’s head of enterprise search, Michael Cizmar, navigate to the complete Search Wizards Speak interview. Information about Yippy is available at http://yippyinc.com/.
Stephen E Arnold, August 16, 2016