
Gleaning Insights and Advantages from Semantic Tagging for Digital Content

September 22, 2016

The article titled “Semantic Tagging Can Improve Digital Content Publishing” on the Aptara Corp. blog reveals the importance of indexing. The article waves the flag of semantic tagging at the publishing industry, which has been pushed into digital content kicking and screaming. The difficulties involved in compatibility across networks, operating systems, and devices are quite a headache. Semantic tagging could help, if only anyone understood what it is. The article enlightens us:

Put simply, semantic markups are used in the behind-the-scene operations. However, their importance cannot be understated; proprietary software is required to create the metadata and assign the appropriate tags, which influence the level of quality experienced when delivering, finding and interacting with the content… There have been many articles that have agreed the concept of intelligent content is best summarized by Ann Rockley’s definition, which is “content that’s structurally rich and semantically categorized and therefore automatically discoverable, reusable, reconfigurable and adaptable.”

The application to the publishing industry is obvious when put in terms of increasing searchability. Any student who has used JSTOR knows the frustrations of searching digital content. It is a complicated process that indexing, if administered correctly, will make much easier. The article points out that authors are competing not only with each other, but also with the endless stream of content being created on social media platforms like Facebook and Twitter. Publishers need to take advantage of semantic markups and every other resource at their disposal to level the playing field.
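
What does a semantic tag actually look like? Here is a minimal sketch in Python using the open source rdflib library; the document URI and subject terms are invented for illustration and are not drawn from the Aptara write up:

    # Sketch: attach machine readable metadata (Dublin Core terms) to a
    # piece of digital content so it becomes "automatically discoverable."
    # The document URI and subject terms below are hypothetical.
    from rdflib import Graph, URIRef, Literal
    from rdflib.namespace import DCTERMS

    g = Graph()
    doc = URIRef("http://example.com/articles/enterprise-search-101")

    g.add((doc, DCTERMS.title, Literal("Enterprise Search 101")))
    g.add((doc, DCTERMS.creator, Literal("A. Author")))
    g.add((doc, DCTERMS.subject, Literal("enterprise search")))
    g.add((doc, DCTERMS.subject, Literal("semantic tagging")))

    # Serialize the tags as Turtle so other systems can find and reuse them.
    print(g.serialize(format="turtle"))

Once content carries tags like these, a search system can filter on subject terms instead of matching raw strings, which is the searchability payoff the article describes.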

Chelsea Kerwin, September 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/

Enterprise Search: Pool Party and Philosophy 101

September 8, 2016

I noted this catchphrase: “An enterprise without a semantic layer is like a country without a map.” I immediately thought of this statement made by Polish-American scientist and philosopher Alfred Korzybski:

The map is not the territory.

When I think about enterprise search, I am thrilled to have an opportunity to do the type of thinking demanded in my college class in philosophy and logic. Great fun. I am confident that any procurement team will be invigorated by an animated discussion about representations of reality.

I did a bit of digging and located “Introducing a Graph-based Semantic Layer in Enterprises” as the source of the “country without a map” statement.

What is interesting about the article is that the payload appears at the end of the write up. The magic ingredient that will, at last, make enterprise search work is information representation technology from a company called Pool Party.

Pool Party describes itself this way:

Pool Party is a semantic technology platform developed, owned and licensed by the Semantic Web Company. The company is also involved in international R&D projects, which continuously impact the product development. The EU-based company has been a pioneer in the Semantic Web for over a decade.

From my reading of the article and the company’s marketing collateral, it strikes me that this is a 12-year-old semantic software and consulting company.

The idea is that there is a pool of structured and unstructured information. The company performs content processing and offers such features as:

  • Taxonomy editor and maintenance
  • A controlled vocabulary management component (see the sketch after this list)
  • An audit trail to see who changed what and when
  • Link analysis
  • User role management
  • Workflows.
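
To make the “controlled vocabulary” item concrete, here is a minimal sketch of the general technique in Python; the terms are invented, and the code illustrates the idea, not Pool Party’s implementation:

    # Sketch of a controlled vocabulary: map variant terms found in
    # documents or queries onto one preferred indexing term.
    # The vocabulary below is hypothetical.
    VOCABULARY = {
        "automobile": "car",
        "auto": "car",
        "motorcar": "car",
        "notebook": "laptop",
        "portable computer": "laptop",
    }

    def preferred_term(term: str) -> str:
        """Return the preferred indexing term for a variant, or the term itself."""
        return VOCABULARY.get(term.lower(), term.lower())

    print(preferred_term("Automobile"))  # -> car
    print(preferred_term("tablet"))      # -> tablet (not in the vocabulary)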

The write up with the catchphrase provides an informational foundation for the company’s semantic approach to enterprise search and retrieval; for example, the company’s four layered architecture:

[Image: the company’s four layer architecture diagram]

The base is the content layer. There is a metadata layer, which in Harrod’s Creek is called “indexing.” There is the “semantic layer.” At the top is the interface layer. The “semantic” layer seems to be the secret sauce in the recipe for information access. The phrase used to describe the value added content processing is “semantic knowledge graphs.” These, according to the article:

let you find out unknown linkages or even non-obvious patterns to give you new insights into your data.

The system performs entity extraction, supports custom ontologies (a concept designed to make subject matter experts quiver), text analysis, and “graph search.”

Graph search is, according to the company’s Web site:

Semantic search at the highest level: Pool Party Graph Search Server combines the power of graph databases and SPARQL engines with features of ‘traditional’ search engines. Document search and visual analytics: Benefit from additional insights through interactive visualizations of reports and search results derived from your data lake by executing sophisticated SPARQL queries.

To make this clearer, the company offers a number of videos via YouTube.
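
For those who prefer code to videos, here is a minimal sketch of a graph search run from Python with the SPARQLWrapper library; the endpoint URL and the predicate names are hypothetical, not Pool Party’s actual API:

    # Sketch: query a semantic knowledge graph over SPARQL.
    # The endpoint and the ex: predicates are invented for illustration.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://example.com/sparql")  # hypothetical endpoint
    sparql.setQuery("""
        PREFIX ex: <http://example.com/schema/>
        SELECT ?document ?topic
        WHERE {
            ?document ex:mentions  ?entity .
            ?entity   ex:relatedTo ?topic .
        }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["document"]["value"], "->", row["topic"]["value"])

The second triple pattern is where the “graph” earns its keep: the query hops from a document to an entity to a related topic, the sort of non-obvious linkage the article touts.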

The idea reminded us of the approach taken in BAE NetReveal and Palantir Gotham products.

Pool Party emphasizes, as does Palantir, that humans play an important role in the system. Instead of “augmented intelligence,” the article describes methods which “combine machine learning and human intelligence.”

The company’s annual growth rate is more than 20 percent. The firm has customers in more than 20 countries. Customers include Pearson, Credit Suisse, the European Commission, Springer Nature, Wolters Kluwer, the World Bank, and “many other customers.” The firm’s projected “Euro R&D project volume” is 17 million (although I am not sure what this 17,000,000 number means). The company’s partners include Accenture, Complexible, Digirati, and EPAM, among others.

I noted that the company uses the catchphrases “Semantic Web Company” and “Linking data to knowledge.”

The catchphrases, I assume, make it easier for some to understand the firm’s graph based semantic approach. I am still mired in figuring out that the map is not the territory.

Stephen E Arnold, September 8, 2016

Yippy Revealed: An Interview with Michael Cizmar, Head of Enterprise Search Division

August 16, 2016

In an exclusive interview, Yippy’s head of enterprise search reveals that Yippy launched an enterprise search technology that Google Search Appliance users are converting to now that Google is sunsetting its GSA products.

Yippy also has its sights set on the rest of the high-growth market for cloud-based enterprise search. Not familiar with Yippy, its IBM tie up, and its implementation of the Velocity search and clustering technology? Yippy’s Michael Cizmar gives some insight into this company’s search-and-retrieval vision.

Yippy (OTC PINK:YIPI) is a publicly traded company providing search, content processing, and engineering services. The company’s catchphrase is, “Welcome to your data.”

The core technology is the Velocity system, developed by Carnegie Mellon computer scientists. Yippy had obtained rights to the Velocity technology prior to IBM’s acquisition of Vivisimo. I learned from my interview with Mr. Cizmar that IBM is one of the largest shareholders in Yippy. Other facets of the deal included some IBM Watson technology.

This year (2016) Yippy purchased one of the most recognized firms supporting the now-discontinued Google Search Appliance. Yippy has been tallying important accounts and expanding its service array.


Michael Cizmar, the head of Yippy’s enterprise search division

Beyond Search interviewed Michael Cizmar, the head of Yippy’s enterprise search division. Cizmar founded MC+A and built a thriving business around the Google Search Appliance. Google stepped away from on premises hardware, and Yippy seized the opportunity to bolster its expanding business.

I spoke with Cizmar on August 15, 2016. The interview revealed a number of little known facts about a company which is gaining success in the enterprise information market.

Cizmar told me that when the Google Search Appliance was discontinued, he realized that the Yippy technology could fill the void and offer more effective enterprise findability.  He said, “When Yippy and I began to talk about Google’s abandoning the GSA, I realized that by teaming up with Yippy, we could fill the void left by Google, and in fact, we could surpass Google’s capabilities.”

Cizmar described the advantages of the Yippy approach to enterprise search this way:

We have an enterprise-proven search core. The Vivisimo engineers leapfrogged the technology dating from the 1990s which forms much of Autonomy IDOL, Endeca, and even Google’s search. We have the connector libraries that we acquired from Muse Global. We have used the security experience gained via the Google Search Appliance deployments and integration projects to give Yippy what we call “field level security.” Users see only the part of content they are authorized to view. Also, we have methodologies and processes to allow quick, hassle-free deployments in commercial enterprises to permit public access, private access, and hybrid or mixed system access situations.
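
“Field level security” deserves a concrete illustration. Here is a minimal sketch of the general idea, mine and not Yippy’s code: strip fields from a result unless the user’s role is cleared for them. The roles and fields are invented.

    # Sketch of field level security: users see only the parts of a
    # document their role is authorized to view. Roles and fields are
    # hypothetical.
    FIELD_ACL = {
        "title":   {"staff", "manager"},
        "summary": {"staff", "manager"},
        "salary":  {"manager"},
    }

    def trim_for_user(document: dict, role: str) -> dict:
        """Return only the fields the given role may see."""
        return {
            field: value
            for field, value in document.items()
            if role in FIELD_ACL.get(field, set())
        }

    record = {"title": "Quarterly Review", "summary": "On plan.", "salary": "120,000"}
    print(trim_for_user(record, "staff"))    # no salary field
    print(trim_for_user(record, "manager"))  # all fields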

With the buzz about open source, I wanted to know where Yippy fit into the world of Lucene, Solr, and the other enterprise software solutions. Cizmar said:

I think the customers are looking for vendors who can meet their needs, particularly with security and smooth deployment. In a couple of years, most search vendors will be using an approach similar to ours. Right now, however, I think we have an advantage because we can perform the work directly….Open source search systems do not have Yippy-like content intake or content ingestion frameworks. Importing text or an Oracle table is easy. Acquiring large volumes of diverse content continues to be an issue for many search and content processing systems…. Most competitors are beginning to offer cloud solutions. We have cloud options for our services. A customer picks an approach, and we have the mechanism in place to deploy in a matter of a day or two.

Connecting to different types of content is a priority at Yippy. Even though the company has a wide array of import filters and content processing components, Cizmar revealed that Yippy has “enhanced the company’s connector framework.”

I remarked that most search vendors do not have a framework, relying instead on expensive components licensed from vendors such as Oracle and Salesforce. He smiled and said, “Yes, a framework, not a widget.”
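
The distinction matters. A widget handles one source; a framework defines a single intake contract that every content source implements, so a new repository means a new connector class, not a new system. A minimal sketch, with invented classes, of what such a contract can look like:

    # Sketch of a connector framework: one abstract contract, many
    # content sources. The example sources are hypothetical.
    from abc import ABC, abstractmethod
    from typing import Iterator

    class Connector(ABC):
        """The contract every content source must satisfy."""

        @abstractmethod
        def fetch(self) -> Iterator[dict]:
            """Yield documents as dictionaries ready for indexing."""

    class FileShareConnector(Connector):
        def fetch(self) -> Iterator[dict]:
            yield {"id": "fs-1", "body": "text from a shared drive"}

    class DatabaseConnector(Connector):
        def fetch(self) -> Iterator[dict]:
            yield {"id": "db-1", "body": "a table row rendered as text"}

    def ingest(connectors: list) -> None:
        for connector in connectors:
            for doc in connector.fetch():
                print("indexing", doc["id"])

    ingest([FileShareConnector(), DatabaseConnector()])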

Cizmar emphasized that the Yippy IBM Google connections were important to many of the company’s customers, as were the newly acquired Muse Global connectors and the ability to build connectors on the fly. He observed:

Nobody else has Watson Explorer powering the search, and nobody else has the Google Innovation Partner of the Year deploying the search. Everybody tries to do it. We are actually doing it.

Cizmar made an interesting side observation. He suggested that Internet search needed to be better. Is indexing the entire Internet in Yippy’s future? Cizmar smiled. He told me:

Yippy has a clear blueprint for becoming a leader in cloud computing technology.

For the full text of the interview with Yippy’s head of enterprise search, Michael Cizmar, navigate to the complete Search Wizards Speak interview. Information about Yippy is available at http://yippyinc.com/.

Stephen E Arnold, August 16, 2016

Semantify Secures Second Funding Round

August 4, 2016

Data-management firm Semantify has secured more funding, we learn from “KGC Capital Invests in Semantify, Leaders in Cognitive Discovery and Analytics” at Benzinga. The write-up tells us primary investor KGC Capital was joined by KDWC Venture Fund and Bridge Investments, as well as by existing investors (including founder Vishy Dasari). The funds from this Series A round will be used to address increased delivery, distribution, and packaging needs.

The press release describes Semantify’s platform:

“Semantify automates connecting information in real time from multiple silos, and empowers non-technical users to independently gain relevant, contextual, and actionable insights using a free form and friction-free query interface, across both structured and unstructured content. With Semantify, there would be no need to depend on data experts to code queries and blend, curate, index and prepare data or to replicate data in a new database. A new generation self-service enterprise Ad-hoc discovery and analytics platform, it combines natural language processing (NLP), machine learning and advanced semantic modeling capabilities, in a single seamless proprietary platform. This makes it a pioneer in democratization of independent, on demand information access to potentially hundreds of millions of users in the enterprise and e-commerce world.”

Semantify cites its “fundamentally unique” approach to developing data-management technology as the force behind its rapid deployment cycles, low maintenance needs, and lowered costs. Formerly based in Delaware, the company is moving its headquarters to Chicago (where its investors are based). Semantify was founded in 2008. The company is also hiring; its About page declares, toward the bottom: “Growing fast. We need people.” As of this writing, it is seeking database/BI experts, QA specialists, data scientists and knowledge modelers, business analysts, program and project managers, and team leads.

Cynthia Murrell, August 4, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Search Experts Rejoice. Work Abounds in Semantics

July 29, 2016

Are you a struggling search engine optimization “expert”? Do you know how to use Google to look up information? Can you say “semantics” five times without slurring your words?

If you answered “yes” to two of these questions, you can apply for the flurry of job openings for “semantic experts.” Incredible, I know. Just think. Unemployed SEO mavens, failed middle school teachers, and clueless webmasters can join the many folks with PhDs in the booming semantic technology sector.

Just think. No more Uber driving on Friday and Saturday nights. No more shame at a conference when someone asks, “What is it you do exactly?”

Navigate to “Semantic Technology Experts In Demand.” Get the truth. Don’t worry too much about:

  • A definition of semantics
  • A knowledge of semantic methods which actually work
  • How semantic methods are implemented
  • Which numerical recipes are most likely to generate accurate outputs.

Cash in now. Embrace this truth:

If you’re not heavily involved in the data world, you may not have heard of semantic technology, but it might be time to give the category some attention. It’s one of those areas of tech that’s becoming more important as organizations of all kinds contend with streams of information that contain multiple data structures (or no structures) and move at speeds that approach the threshold of mind-boggling. If you follow the news, you can watch the technology’s spread through a variety of industries and products. Ford, for example, recently acquired California startup Civil Maps, which develops and maintains live semantic maps of all the roads in the United States. And health IT experts say the day is coming when “data silos and lack of semantic interoperability will not be tolerated.”

If you spent months or years learning about Big Data, the cloud, and natural language processing, you can repurpose your expertise. Just say, “I am an expert in semantics.” Easy, right?

Stephen E Arnold, July 29, 2016

Semantics Made Easier

May 9, 2016

For fans of semantic technology, Ontotext has a late spring delight for you. The semantic platform vendor has released GraphDB 7. I read “Ontotext Releases New Version of Semantic Graph Database.” According to the announcement, setup and data access are easier. I learned:

The new release offers new tools to access and explore data, eliminating the need to know everything about the dataset before start working with it. GraphDB 7 enables users to navigate their way through third-party and any other dataset regardless of data volumes, which makes it a powerful Big Data analytics tool. Ver.7 offers visual exploration of the loaded data schema – ontology, interactive query builder for better entity retrieval, and full support for RDF 1.1 allowing smooth import of a huge number of public Open Data as well as proprietary Linked Datasets.
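
GraphDB itself runs as a server, but the flavor of importing RDF and querying it can be shown with the open source Python rdflib library; the toy dataset below is invented and stands in for what GraphDB would handle at scale:

    # Sketch: load a scrap of RDF (Turtle syntax) and query it with SPARQL.
    # The data is invented; a real deployment would point at GraphDB.
    from rdflib import Graph

    TURTLE = """
    @prefix ex: <http://example.com/> .
    ex:alice ex:worksFor  ex:acme .
    ex:acme  ex:locatedIn ex:springfield .
    """

    g = Graph()
    g.parse(data=TURTLE, format="turtle")

    rows = g.query("""
        PREFIX ex: <http://example.com/>
        SELECT ?person ?city
        WHERE { ?person ex:worksFor ?org . ?org ex:locatedIn ?city . }
    """)
    for person, city in rows:
        print(person, "is based in", city)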

If you want a Palantir-type system, check out Ontotext. The company is confident that semantic technology will yield benefits, a claim made by other semantic technology vendors. But the complexity challenges associated with conversion and normalization of content are likely to be a pebble in the semantic sneaker.

Stephen E Arnold, May 9, 2016

Semantic Search: Clustering and Heat Maps Explain Creativity

May 8, 2016

I know zero about semantics as practiced at big time universities. I know about the same when it comes to semantic search. With my background as a tabula rasa, I read “A Semantic Map for Evaluating Creativity.” According to the write up:

We present a semantic map of words related with creativity. The aim is to empirically derive terms which can be used to rate processes or products of computational creativity. The words in the map are based on association studies performed by human subjects and augmented with words derived from the literature (based on human raters).

After considerable text processing and a dose of analytics, the paper states:

… There is an overlap in the set of words formed by the two methods, but there are also some differences. Further investigations could reveal how these methods are related and if they are both needed (as complements) to arrive at more objective procedures for the evaluation of computational (and human) creativity.
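
The mechanics are easier to grasp with a toy example. Here is a minimal sketch, with invented association counts, of how word vectors yield the similarity matrix behind a clustering or heat map; it illustrates the general technique, not the paper’s exact procedure:

    # Sketch: turn word association counts into a similarity matrix, the
    # raw material for clustering or a heat map. The counts are invented.
    import numpy as np

    words = ["novel", "original", "useful", "surprising"]
    # Rows: association counts against four hypothetical cue terms.
    counts = np.array([
        [9, 1, 0, 7],
        [8, 2, 1, 6],
        [1, 9, 8, 0],
        [7, 0, 1, 9],
    ], dtype=float)

    # Cosine similarity between every pair of word vectors.
    unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
    similarity = unit @ unit.T

    for i, word in enumerate(words):
        closest = max((j for j in range(len(words)) if j != i),
                      key=lambda j: similarity[i, j])
        print(word, "is closest to", words[closest])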

I await a mid tier consulting firm’s for fee study about the applications of this technology in determining which companies are creative. And what about government use cases; for example, which entry level professional is most creative? Then there are academic applications; for instance, which professors are the most creative? Creative folks can create creative ways to understand creativity. Stay tuned.

Stephen E Arnold, May 8, 2016

Mouse Movements Are the New Fingerprints

May 6, 2016

A martial artist once told me that an individual’s fighting style, if defined enough, was like a set of fingerprints.  The same can be said for painting style, book preferences, and even Netflix selections, but what about something as anonymous as a computer mouse’s movement?  Here is a new scary thought from PC & Tech Authority: “Researcher Can Identify Tor Users by Their Mouse Movements.”

Juan Carlos Norte, a researcher in Barcelona, Spain, claims to have developed a series of fingerprinting methods using JavaScript that measure timings, mouse wheel movements, mouse movement speed, CPU benchmarks, and getClientRects.  Combining all of this data allowed Norte to identify Tor users based on how they used a computer mouse.

It seems far-fetched, especially when one considers how random this data is, but

“‘Every user moves the mouse in a unique way,’ Norte told Vice’s Motherboard in an online chat. ‘If you can observe those movements in enough pages the user visits outside of Tor, you can create a unique fingerprint for that user,’ he said. Norte recommended users disable JavaScript to avoid being fingerprinted. Security researcher Lukasz Olejnik told Motherboard he doubted Norte’s findings and said a threat actor would need much more information, such as acceleration, angle of curvature, curvature distance, and other data, to uniquely fingerprint a user.”
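
Stripped of the drama, the technique is feature extraction. A minimal sketch, with fabricated sample events, of turning mouse movements into a comparable profile; this is a toy illustration, not Norte’s code:

    # Sketch: reduce a stream of (time_ms, x, y) mouse events to a small
    # feature vector that can be compared across sessions. Events fabricated.
    import math

    events = [(0, 10, 10), (40, 14, 12), (85, 25, 20), (120, 30, 31)]

    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(events, events[1:]):
        distance = math.hypot(x1 - x0, y1 - y0)
        speeds.append(distance / (t1 - t0))  # pixels per millisecond

    profile = (min(speeds), max(speeds), sum(speeds) / len(speeds))
    print("speed profile:", profile)

Collect enough such profiles on pages inside and outside Tor, the argument goes, and the two sets can be matched.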

This is the age of big data, but looking at Norte’s claim from a logical standpoint, one needs to consider that not all computer mice are made the same: some use lasers, others prefer trackballs, and what about a laptop’s track pad?  As diverse as computer users are, there are similarities within the population, and random mouse movement is not individualistic enough to ID a person.  Fear not, Tor users: move and click away in peace.

Whitney Grace, May 6, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

An Open Source Search Engine to Experiment With

May 1, 2016

Apache Lucene receives the most headlines when it comes to discussion about open source search software.  My RSS feed pulled up another open source search engine that shows promise as a decent piece of software.  Open Semantic Search is free software that can be used for text mining, analytics, search, data exploration, and other research tasks.  It is based on the Elasticsearch/Apache Solr open source enterprise search engines.  It was designed with open standards and robust semantic search in mind.

As with any open source search system, it can be extended with numerous features based on the user’s preference.  These include tagging, annotation, support for varied file formats, support for multiple data sources, data visualization, newsfeeds, automatic text recognition, faceted search, interactive filters, and more.  It can also be set up for mobile platforms, metadata management, and file system monitoring.
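
Because the system sits on Solr, a standard Solr query shows what “faceted search” means in practice. A minimal sketch, assuming a stock Solr instance running locally; the core name (“docs”) and facet field (“author”) are invented:

    # Sketch: a faceted query against a local Solr core. The core name
    # and the facet field are hypothetical.
    import requests

    response = requests.get(
        "http://localhost:8983/solr/docs/select",
        params={
            "q": "semantic search",
            "facet": "true",
            "facet.field": "author",
            "rows": 5,
            "wt": "json",
        },
    )
    data = response.json()
    print("hits:", data["response"]["numFound"])
    print("author facets:", data["facet_counts"]["facet_fields"]["author"])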

Open Semantic Search is described as

“Research tools for easier searching, analytics, data enrichment & text mining of heterogeneous and large document sets with free software on your own computer or server.”

While its base code is derived from Apache Lucene, it takes the original product and builds something better.  Proprietary software is an expense dubbed a necessary evil if you work in a large company.  If, however, you are a programmer and have the time to develop your own search engine and analytics software, do it.  It could even turn out better than the proprietary stuff.

Whitney Grace, May 1, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Expert System: Inspired by Endeca

April 23, 2016

Years ago I listened to Endeca (now owned by Oracle) extol the virtues of its various tools. The idea was that the tools made it somewhat easier to get Endeca up and running. The original patents for Endeca reveal the computational blender which the Endeca method required. Endeca shifted from licensing software to bundling consulting with a software license. Setting up Endeca required MBAs, patience, and money. Endeca rose to generate more than $120 million in revenues before its sale to Oracle. Today Endeca is still available, and the Endeca patents—particularly 7035864—reveal how Endeca pulled off its facets. Today Endeca has lost a bit of its spit and polish, a process that began when Autonomy blasted past the firm in the early 2000s.
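
Facets sound exotic, but the core computation is counting attribute values across the current result set, as this minimal sketch with invented records shows; the Endeca patent covers far more machinery than this:

    # Sketch: the heart of faceted navigation is counting attribute
    # values over the result set. The records are invented.
    from collections import Counter

    results = [
        {"title": "Red shoes",  "brand": "Acme",   "color": "red"},
        {"title": "Blue shoes", "brand": "Acme",   "color": "blue"},
        {"title": "Red boots",  "brand": "Globex", "color": "red"},
    ]

    for facet in ("brand", "color"):
        counts = Counter(record[facet] for record in results)
        print(facet, dict(counts))
    # brand {'Acme': 2, 'Globex': 1}
    # color {'red': 2, 'blue': 1}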

Endeca rolled out its “studio” a decade ago. I recall that Business Objects had a “studio.” The idea behind a studio was to make the complex task of creating something an end user could use without much training. But the studio was not aimed at an end user. The studio was a product for a developer, who found the tortuous, proprietary methods complex and difficult to learn. A studio would unleash the developers and, of course, propel the vendors with studios to new revenue heights.

Studio is back, this time at Expert System, if the information in “Expert System Releases Cogito Studio for Combining the Advantages of Semantic Technology with Deep Learning” is accurate. The spin is that semantic technology and deep learning, two buzzwords near and dear to the hearts of those in search of the next big thing, will be a boon. Who is the intended user? Well, developers. These folks are learning that marketing talk is a heck of a lot easier than the quite difficult work of designing, coding, debugging, stabilizing, and then generating useful outputs.

According to the Expert System announcement:

“The new release of Cogito Studio is the result of the hard work and dedication of our labs, which are focused on developing products that are both powerful and easy to use,” said Marco Varone, President and CTO, Expert System. “We believe that we can make significant contributions to the field of artificial intelligence. In our vision of AI, typical deep learning algorithms for automatic learning and knowledge extraction can be made more effective when combined with algorithms based on a comprehension of text and on knowledge structured in a manner similar to that of humans.”

Does this strike you as vague?

Expert System is an Italian high tech outfit, which was founded in 1989. That’s almost a decade before the Endeca system poked its moist nose into the world of search. Fellow travelers from this era include Fulcrum Technologies and ISYS Search Software. Both of these companies’ technologies are still available today.

Thus, it makes sense that the idea of a “studio” becomes a way to chop away at the complexity of Expert System-type systems.

According to Google Finance, Expert System’s stock is trending upwards.

[Image: Expert System share price chart]

That’s a good sign. My hunch is that announcements about “studios” wrapped in lingo like semantics and Big Data are a good thing.

Stephen E Arnold, April 23, 2016
