Gleaning Insights and Advantages from Semantic Tagging for Digital Content

September 22, 2016

The article titled “Semantic Tagging Can Improve Digital Content Publishing” on the Aptara Corp. blog reveals the importance of indexing. The article waves the flag of semantic tagging at the publishing industry, which has been pushed into digital content kicking and screaming. The difficulties involved in compatibility across networks, operating systems, and devices are quite a headache. Semantic tagging could help, if only anyone understood what it is. The article enlightens us:

Put simply, semantic markups are used in the behind-the-scene operations. However, their importance cannot be understated; proprietary software is required to create the metadata and assign the appropriate tags, which influence the level of quality experienced when delivering, finding and interacting with the content… There have been many articles that have agreed the concept of intelligent content is best summarized by Ann Rockley’s definition, which is “content that’s structurally rich and semantically categorized and therefore automatically discoverable, reusable, reconfigurable and adaptable.”

The application to the publishing industry is obvious when put in terms of increasing searchability. Any student who has used JSTOR knows the frustrations of searching digital content. It is a complicated process that indexing, if administered correctly, will make much easier. The article points out that authors are competing not only with each other, but also with the endless stream of content being created on social media platforms like Facebook and Twitter. Publishers need to take advantage of semantic markups and every other resource at their disposal to level the playing field.
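Why does tagging make content easier to find? A minimal Python sketch, assuming each article already carries semantic tags (the article IDs and tags below are hypothetical, not Aptara's): build an inverted index from each tag to the articles that carry it, then answer a query by intersecting the sets. This is only an illustration of the indexing idea, not the proprietary software the article mentions.

```python
from collections import defaultdict

# Hypothetical tagged articles; in a real workflow tags would come from a tagging tool.
DOCS = {
    "art-101": {"semantic-web", "publishing", "metadata"},
    "art-102": {"publishing", "social-media"},
    "art-103": {"metadata", "search", "publishing"},
}

def build_tag_index(docs):
    """Map each tag to the set of document IDs that carry it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag in tags:
            index[tag].add(doc_id)
    return index

def find(index, *tags):
    """Return sorted IDs of documents carrying every requested tag."""
    if not tags:
        return []
    hits = set.intersection(*(index.get(tag, set()) for tag in tags))
    return sorted(hits)

index = build_tag_index(DOCS)
print(find(index, "publishing", "metadata"))  # ['art-101', 'art-103']
```

The lookup cost depends on the size of the tag sets, not on re-reading every article, which is the practical payoff of indexing done correctly.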

Chelsea Kerwin, September 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meetup on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/

Alphabet Google Faces a Secret Foe

September 21, 2016

I thought indexing the world’s information made it possible to put together disparate items of information. Once assembled, these clues from the world of real time online content would allow a person with access to answer a business question.

Apparently, Alphabet Google faces a secret foe. I learned this by reading “Secretive Foe Attacks Google over Government Influence.” I learned:

Google has come under attack by a mysterious group that keeps mum about its sponsors while issuing scathing reports about the Mountain View search giant’s influence on government.

The blockbuster write up reported:

So far, only Redwood Shores-based Oracle has admitted to funding the Transparency Project, telling Fortune it wanted the public to know about its support for the initiative.

Yikes, a neighbor based on the site of the now long-gone Marine World.

The outfit going after the lovable Alphabet Google thing is called the Transparency Project. The excited syntax of the write up told me:

The Transparency Project commenced hostilities against Google in April, gaining national media attention with a report tracking the number of Googlers taking jobs in the White House and federal agencies, and the number of federal officials traveling in the other direction, into Google. Project researchers reported 113 “revolving door” moves between Google — plus its associated companies, law firms and lobbyists — and the White House and federal agencies.

Okay, but back to my original point. With the world’s information at one’s metaphorical fingertips, is it not possible to process email, Google Plus, user search histories, and similar data-laden troves for clues about the Transparency Project?

Perhaps the Alphabet Google entity lacks the staff and software to perform this type of analysis? May I suggest a quick telephone call to Palantir Technologies. From what I understand by reading open source information about the Gotham product, Palantir can knit together disparate and fragmented data and put the members of the Transparency Project on the map, in a manner of speaking.

I understand the concept of finding fault with a near perfect company. But the inability of a search giant to find out who, what, when, where, how, and why baffles me.

It does not, as an old school engineer with a pocket protector might say, compute.

Stephen E Arnold, September 14, 2016

Why European Start Ups Are Non Starters at Scale

September 17, 2016

I read an interesting and probably irritating article “Why European Startups Fail to Scale.” I was sufficiently intrigued with the premise of the essay to send it to some executives at European start ups which have failed to scale. Nota bene: None of these managers wrote me back, which suggests that the content of the article was not germane to their firms’ commercial success.

I learned from the article:

European startups fail to recognize that when they expand to a new market they have to adjust themselves to the rules, standards and requirements of that specific market.

Interesting idea. I have noticed in my own experience that companies from some countries struggle when they try to sell their search systems to the US government. The procurement process and some of the regulations make no sense to them. What’s interesting is that, in some European countries, the requirement to show a utility receipt before renting an apartment makes perfect sense. Yet the notion that a software vendor’s code must be verified to be backdoor free makes zero sense to European vendors who want to take money from the US government.

The write up points out:

No matter if the startup was located in Western, Central, or Eastern Europe somehow most people did not understand that there could be fundamental differences between themselves and consumers inside this new market they were planning to enter.

How does one address this issue? The write up offers some suggestions; for example:

you need to optimize your product for your new markets.

Seems obvious. Another tip is that the company trying to cash in on the exciting US market should have a value proposition and pricing scheme suitable for the savvy American buyer.

The US, unlike some countries, is big. It is, therefore, expensive to advertise “on social media or search engines.”

Whereas a lot of B2C companies in Eastern Europe are talking about Euro cents, in the US a click might cost several Dollars.

The idea I highlighted in grammar gray was:

text is far more important. Whereas Europeans are lenient to typo’s or faulty grammar, Americans are not and expect to be addressed in the catchiest way possible.

How have search engines from Europe managed in the US market? Let me highlight several examples from my historical archives:

  • Antidot. Announced a footprint in San Francisco a couple of years ago. The traces of the company are faint.
  • Autonomy. Sold to HP for $11 billion after more than a decade in business. Since the sale, Autonomy has been a legal and M&A football engaged in continuous knockabouts.
  • Fast Search & Transfer. The founder ended up in legal hot water because of some tiny math errors resulting in allegedly misstating revenue. Microsoft ignored these gaffes and paid $1.2 billion for the system.
  • Exalead. Made a splash and ended up selling to Dassault. Largely invisible in the US market after a run at the US government market and the usual commercial targets.
  • Pertimm. Dabbled in the US market and ended up forging a deal with a European company for a Euro centric search system.
  • Sinequa. Announced a push into the US a year or two ago. No one seemed to notice.

At this time, the major success seems to be Elastic, the open source search vendor. One assumes that the European search vendors who have failed to gain traction in the US market would emulate this firm. But if a European search vendor does not acknowledge that Elastic is doing something that works, why change?

Some European search vendors and “experts” are pitching governance and indexing. These are two market segments which strike me as either difficult to sell or very narrow. Change and sustainable growth may be difficult to achieve regardless of the lipstick applied for the theater of marketing.

Stephen E Arnold, September 17, 2016

Enterprise Search: Pool Party and Philosophy 101

September 8, 2016

I noted this catchphrase: “An enterprise without a semantic layer is like a country without a map.” I immediately thought of this statement made by Polish-American scientist and philosopher Alfred Korzybski:

The map is not the territory.

When I think about enterprise search, I am thrilled to have an opportunity to do the type of thinking demanded in my college class in philosophy and logic. Great fun. I am confident that any procurement team will be invigorated by an animated discussion about representations of reality.

I did a bit of digging and located “Introducing a Graph-based Semantic Layer in Enterprises” as the source of the “country without a map” statement.

What is interesting about the article is that the payload appears at the end of the write up. The magic of information representation as a way to make enterprise search finally work is technology from a company called Pool Party.

Pool Party describes itself this way:

Pool Party is a semantic technology platform developed, owned and licensed by the Semantic Web Company. The company is also involved in international R&D projects, which continuously impact the product development. The EU-based company has been a pioneer in the Semantic Web for over a decade.

From my reading of the article and the company’s marketing collateral, it strikes me that this is a 12-year-old semantic software and consulting company.

The idea is that there is a pool of structured and unstructured information. The company performs content processing and offers such features as:

  • Taxonomy editor and maintenance
  • A controlled vocabulary management component
  • An audit trail to see who changed what and when
  • Link analysis
  • User role management
  • Workflows.

The write up with the catchphrase provides an informational foundation for the company’s semantic approach to enterprise search and retrieval; for example, the company’s four-layer architecture:

[Image: diagram of the company’s four-layer architecture]

The base is the content layer. There is a metadata layer which in Harrod’s Creek is called “indexing”. There is the “semantic layer”. At the top is the interface layer. The “semantic” layer seems to be the secret sauce in the recipe for information access. The phrase used to describe the value added content processing is “semantic knowledge graphs.” These, according to the article:

let you find out unknown linkages or even non-obvious patterns to give you new insights into your data.

The system performs entity extraction, supports custom ontologies (a concept designed to make subject matter experts quiver), text analysis, and “graph search.”

Graph search is, according to the company’s Web site:

Semantic search at the highest level: Pool Party Graph Search Server combines the power of graph databases and SPARQL engines with features of ‘traditional’ search engines. Document search and visual analytics: Benefit from additional insights through interactive visualizations of reports and search results derived from your data lake by executing sophisticated SPARQL queries.

To make this clearer, the company offers a number of videos via YouTube.
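What does “graph search” for “unknown linkages” look like under the hood? Here is a minimal pure-Python sketch, assuming entity extraction has already produced subject-predicate-object triples (the entities below are hypothetical). It is not Pool Party’s API and not SPARQL; the `match` function only loosely imitates a SPARQL triple pattern, and `paths` does a bounded breadth-first walk to surface chains of relations linking two entities.

```python
# Hypothetical triples (subject, predicate, object), as entity extraction might produce.
TRIPLES = [
    ("Acme Corp", "acquired", "Widget Labs"),
    ("Widget Labs", "employs", "J. Smith"),
    ("J. Smith", "advises", "Regulator X"),
    ("Acme Corp", "headquartered_in", "Vienna"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples fitting a pattern; None is a wildcard, loosely like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def paths(triples, start, end, max_hops=3):
    """Breadth-first search for chains of relations that link start to end."""
    frontier = [(start, [])]
    found = []
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for s, p, o in match(triples, s=node):
                hop_path = path + [(s, p, o)]
                if o == end:
                    found.append(hop_path)
                else:
                    next_frontier.append((o, hop_path))
        frontier = next_frontier
    return found

for chain in paths(TRIPLES, "Acme Corp", "Regulator X"):
    print(" -> ".join(f"{s} {p} {o}" for s, p, o in chain))
```

The “non-obvious pattern” here is a three-hop chain from Acme Corp to Regulator X. The `max_hops` cap keeps the walk finite; a production graph store would add indexes and cycle handling that this sketch omits.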

The idea reminded us of the approach taken in BAE NetReveal and Palantir Gotham products.

Pool Party emphasizes, as does Palantir, that humans play an important role in the system. Instead of “augmented intelligence,” the article describes the approach as methods which “combine machine learning and human intelligence.”

The company’s annual growth rate is more than 20 percent. The firm has customers in more than 20 countries. Customers include Pearson, Credit Suisse, the European Commission, Springer Nature, Wolters Kluwer, the World Bank, and “many other customers.” The firm’s projected “Euro R&D project volume” is 17 million (although I am not sure what this 17,000,000 number means). The company’s partners include Accenture, Complexible, Digirati, and EPAM, among others.

I noted that the company uses the catchphrases “Semantic Web Company” and “Linking data to knowledge.”

The catchphrases, I assume, make it easier for some to understand the firm’s graph-based semantic approach. I am still mired in figuring out that the map is not the territory.

Stephen E Arnold, September 8, 2016

Machine Learning Search Algorithms Reflect Female Stereotypes

August 26, 2016

The article on MediaPost titled “Are Machine Learning Search Algorithms To Blame for Stereotypes?” poses a somewhat misleading question about the role of search algorithms such as those of Google and Bing in prejudice and bias. Ultimately they are not the root, but rather a reflection of their creators. Looking at the images that are returned when searching for “beautiful” and “ugly” women, researchers found the following.

“In the United States, searches for “beautiful” women return pictures that are 80% white, mostly between the ages of 19 and 28. Searches for “ugly” women return images of those about 60% white and 20% black between the ages of 30 to 50. Researchers admit they are not sure of the reason for the bias, but conclude that they may stem from a combination of available stock photos and characteristics of the indexing and ranking algorithms of the search engines.”

While it might be appealing to think that machine learning search algorithms have somehow magically fallen in line with the stereotypes of the human race, obviously they are simply regurgitating the bias of the data. Alternatively, perhaps they learn prejudice from the humans selecting and tuning the algorithms. At any rate, it is an unfortunate record of the harmful attitudes and racial bias of our time.

Chelsea Kerwin, August 26, 2016

Yippy Revealed: An Interview with Michael Cizmar, Head of Enterprise Search Division

August 16, 2016

In an exclusive interview, Yippy’s head of enterprise search reveals that Yippy launched an enterprise search technology that Google Search Appliance users are converting to now that Google is sunsetting its GSA products.

Yippy also has its sights set on the rest of the high-growth market for cloud-based enterprise search. Not familiar with Yippy, its IBM tie up, and its implementation of the Velocity search and clustering technology? Yippy’s Michael Cizmar gives some insight into this company’s search-and-retrieval vision.

Yippy (OTC PINK: YIPI) is a publicly traded company providing search, content processing, and engineering services. The company’s catchphrase is, “Welcome to your data.”

The core technology is the Velocity system, developed by Carnegie Mellon computer scientists. Yippy had obtained rights to the Velocity technology before IBM acquired Vivisimo. I learned from my interview with Mr. Cizmar that IBM is one of the largest shareholders in Yippy. Other facets of the deal included some IBM Watson technology.

This year (2016) Yippy purchased one of the most recognized firms supporting the now-discontinued Google Search Appliance. Yippy has been tallying important accounts and expanding its service array.

[Photo: Michael Cizmar, Yippy’s senior manager for enterprise search]

Beyond Search interviewed Michael Cizmar, the head of Yippy’s enterprise search division. Cizmar founded MC+A and built a thriving business around the Google Search Appliance. Google stepped away from on-premises hardware, and Yippy seized the opportunity to bolster its expanding business.

I spoke with Cizmar on August 15, 2016. The interview revealed a number of little known facts about a company which is gaining success in the enterprise information market.

Cizmar told me that when the Google Search Appliance was discontinued, he realized that the Yippy technology could fill the void and offer more effective enterprise findability. He said, “When Yippy and I began to talk about Google’s abandoning the GSA, I realized that by teaming up with Yippy, we could fill the void left by Google, and in fact, we could surpass Google’s capabilities.”

Cizmar described the advantages of the Yippy approach to enterprise search this way:

We have an enterprise-proven search core. The Vivisimo engineers leapfrogged the technology dating from the 1990s which forms much of Autonomy IDOL, Endeca, and even Google’s search. We have the connector libraries that we acquired from Muse Global. We have used the security experience gained via the Google Search Appliance deployments and integration projects to give Yippy what we call “field level security.” Users see only the part of content they are authorized to view. Also, we have methodologies and processes to allow quick, hassle-free deployments in commercial enterprises to permit public access, private access, and hybrid or mixed system access situations.

With the buzz about open source, I wanted to know where Yippy fits into the world of Lucene, Solr, and the other enterprise software solutions. Cizmar said:

I think the customers are looking for vendors who can meet their needs, particularly with security and smooth deployment. In a couple of years, most search vendors will be using an approach similar to ours. Right now, however, I think we have an advantage because we can perform the work directly….Open source search systems do not have Yippy-like content intake or content ingestion frameworks. Importing text or an Oracle table is easy. Acquiring large volumes of diverse content continues to be an issue for many search and content processing systems…. Most competitors are beginning to offer cloud solutions. We have cloud options for our services. A customer picks an approach, and we have the mechanism in place to deploy in a matter of a day or two.

Connecting to different types of content is a priority at Yippy. Even though the company has a wide array of import filters and content processing components, Cizmar revealed that Yippy has “enhanced the company’s connector framework.”

I remarked that most search vendors do not have a framework, relying instead on expensive components licensed from vendors such as Oracle and Salesforce. He smiled and said, “Yes, a framework, not a widget.”

Cizmar emphasized that the Yippy, IBM, and Google connections were important to many of the company’s customers. He added that Yippy has also acquired the Muse Global connectors and the ability to build connectors on the fly. He observed:

Nobody else has Watson Explorer powering the search, and nobody else has the Google Innovation Partner of the Year deploying the search. Everybody tries to do it. We are actually doing it.

Cizmar made an interesting side observation. He suggested that Internet search needed to be better. Is indexing the entire Internet in Yippy’s future? Cizmar smiled. He told me:

Yippy has a clear blueprint for becoming a leader in cloud computing technology.

For the full text of the interview with Yippy’s head of enterprise search, Michael Cizmar, navigate to the complete Search Wizards Speak interview. Information about Yippy is available at http://yippyinc.com/.

Stephen E Arnold, August 16, 2016

The Less Scary Applications of Artificial Intelligence: Computer Vision

August 3, 2016

The article on The Christian Science Monitor titled “Shutterstock’s Reverse Image Search Promises a Gentler Side of AI” provides a glimpse into computer vision, the way a computer breaks an image down into its parts and categorizes them. Shutterstock finds that using machine learning to find other images similar to the first is a vast improvement because, rather than analyzing keywords, the AI analyzes the image directly based on exact colors and shapes. The article states:

“That keyword data, while useful for indexing images into categories on our site, wasn’t nearly as effective for surfacing the best and most relevant content,” says Kevin Lester, vice president of engineering at the company, in a blog post. “So our computer vision team worked to apply machine learning techniques to reimagine and rebuild that process.”

The neural network has now examined 70 million images and 4 million video clips in its collection.

In addition, the company plans to expand the search feature to videos as well as images. Jon Oringer, CEO and founder of Shutterstock, has a vision of endless possibilities for this technology. The article points out that this is one of the clearly positive effects of AI, which gets a bad rap, perhaps not unfairly, given the potential for autonomous weapons and commercial abuse. So by all means, let’s use AI to recognize a cat, like Google, or to analyze images.
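To illustrate the difference between keyword matching and looking at the pixels themselves, here is a minimal Python sketch. Shutterstock’s system is a neural network; this sketch swaps in a far simpler technique, color-histogram similarity, so it captures only the “exact colors” half of the idea and ignores shapes. The image names and pixel values are hypothetical.

```python
import math

def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` buckets and count pixels per color cell."""
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        # Clamp so values near 255 stay in the last bucket when bins does not divide 256.
        ri, gi, bi = (min(c // step, bins - 1) for c in (r, g, b))
        hist[ri * bins * bins + gi * bins + bi] += 1
    return hist

def cosine(a, b):
    """Cosine similarity between two histograms (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_similar(query_pixels, library, bins=4):
    """Return library image names ordered from most to least similar to the query."""
    q = color_histogram(query_pixels, bins)
    scores = {name: cosine(q, color_histogram(px, bins)) for name, px in library.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical "images" as flat lists of RGB pixels.
LIBRARY = {
    "sunset": [(250, 120, 30)] * 8 + [(200, 60, 20)] * 2,
    "ocean": [(20, 60, 200)] * 10,
    "forest": [(30, 160, 40)] * 10,
}
QUERY = [(240, 110, 40)] * 10

print(rank_similar(QUERY, LIBRARY)[0])  # sunset
```

No keyword is consulted: the orange query image lands next to the sunset purely because their color distributions overlap. A learned model replaces the hand-built histogram with features that also capture shapes and objects.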

 

Chelsea Kerwin, August 3, 2016


Weakly Watson: The Possibilities Are Limitless

July 31, 2016

Hyperbole? Nah, just another fascinating chunk of content marketing by IBM, the proud owner of Watson. You know Watson. The “system” consisting of goodies from open source, acquisitions, and home brew IBM code.

Navigate to “It’s Elementary, Says (IBM) Watson!” The write up shouts:

Given such abilities, the possibilities of what IBM Watson can do in every industry, are limitless!

 

The possibilities, enumerated below, contain hashtags to make certain that the word diffuses through hashtaggy social media channels. I bet those Pokémon Go players are thrilled to get these items in their “news” stream too. The possibilities are:

  • Send Watson to school. This is a nice way of saying that one must create valid training sets. Then the training sets are provided to the content processing system, the results verified, and then the intake process tuned. Does this sound like Autonomy IDOL’s method? It sure does. Plus, it is an expensive and time consuming process when done with rigor. Take a short cut and the system goes off the rails.
  • Oversee Watson’s study. Yep, this is fine tuning, and it involves humans, who want money, time off, benefits, and managerial love. Is this expensive? Yep.
  • Getting a grip on things. Now this is a possibility which makes the others in this list appear to be semi-coherent. Watson uses “artificial intelligence” to “understand” what’s being said in text entering the system. Okay, I think this means Watson is now indexing content in a useful manner. Isn’t that what IBM iPhrase purported to do a decade ago?
  • Solve complex problems in a real world. Okay, now we are getting something. What does Watson suggest to IBM, a company which has reported more than four years of declining revenue? What? I did not hear the answer.
  • Learning from experience. I think this means that as Watson solves real world problems like IBM’s declining revenues, Watson gets “better.” How long will stakeholders wait? Yahoo’s stakeholders became unsettled, and look what happened: a fire sale at a fraction of what Microsoft offered a few years ago.

I am not convinced about the logic of the write up nor about the “endless possibilities” Watson creates. I am more inclined to think about Amazon, Facebook, and Google as big companies likely to deliver results from smart software. What’s not to like about Amazon drones in the UK, Facebook filtering Wikipedia content, and Google solving death? Smart stuff is everywhere. One doesn’t need Sherlock Holmes to figure this out.

Stephen E Arnold, July 31, 2016


Alphabet Google Is Busy Reinventing

July 22, 2016

From Forbes in India (“Sundar Pichai to Reinvent Google with a Heavy Dose of Artificial Intelligence,” which may require a proxy maneuver due to the digitally with it Forbes) to Switzerland (“Google’s New Research Lab in Zurich Is Inventing the Future of Search”), the message is the same: the Alphabet Google thing is trying to reinvent search.

There you go: stark evidence that Google’s information retrieval system is deeply flawed. The electric car does not reinvent the car. But search has to reinvent search.

This is a big and probably futile job. My view is that search is an evolutionary beastie. Incremental innovations from research labs, one man band coders, and start ups with one good idea and a couple of crazed investors do the job.

Google itself was a roll up of ideas from IBM Almaden (hello, Jon Kleinberg), AltaVista (hello, Jeff Dean, Simon Tong, and Sanjay Ghemawat), and the fumble bumbles of folks at precursors (hello, AskJeeves and Lycos).

The India angle states:

Think of it as Search 3.0—a new, interactive way to communicate with Google itself. With it you’ll be able to order a ticket, book a flight, play music, schedule a task, reply to a message; the Google assistant might even write it for you. It might prompt you to order flowers ahead of Mother’s Day or to pack for your upcoming trip, and it might be able to pick up an earlier conversation from where you left off. In other words, it will be there, ready to help, in your phone, your speakers, your television, your car, your watch and eventually everywhere. “You are trying to go about your day, and in an ambient way, things are there to help you,” Pichai says. Making sure this assistant lives up to its full potential will take years, and building it will be harder than it was for Page and co-founder Sergey Brin to create search itself. Adds Pichai: “In every dimension, it is more ambitious.”

Yep, ambitious.

From the Swiss side:

The new team has a distinct goal: to invent the future of Search, a voice-activated, human-like entity that can answer any query intelligently. “We are building the ultimate assistant. In two years, you can expect Google to become a personal life assistant across multiple surfaces, including your phone, Google Home, even cars,” Mogenet [Google wizard] said. Some of Google’s best-known products are already shaped by machine learning, the ability of computers to spot patterns in large datasets and learn by example. For instance, Google Photos uses it to understand the content of an image. This means you could search for “cardigan corgi” or “passport” or “birthday celebrations 2014” and the app will bring up the relevant photos.

There you go. Reinvent.

The challenge is to find a way to avoid the stagnation which seems to befall certain types of high technology outfits. Do you use your DEC Rainbow today?

I love the Google. It is just super. The problem is that as it has concentrated traffic, it has left itself unable to respond to opportunities such as those identified by Facebook and Amazon. By the way, both of these outfits face some challenges as well.

The investment in search will benefit some folks. But how likely is it that Google will come up with an “innovation” that matters? I think that when octopus companies do something, whether it is good or bad, it is easy to define whatever happens as success.

The problem is that information returned from Google is often off point. When I run queries for documents I have in my hand, I cannot find them without jumping through hoops. I documented this with a Dark Web paper from Denmark in this blog. Homonyms give the Google fits. Even though my search history is available to Mother Google, the system is tone deaf for my queries. When I look for certain information, the data have often disappeared. I noticed that indexing of paste sites, PDF files, and PowerPoint presentations has become laughable.

Innovation is more than a public relations campaign. How do I know? Google’s marketing is starting to remind me of IBM Watson. You know Watson, the revolutionary information access system from Big Blue. Yep, innovation.

Stephen E Arnold, July 22, 2016

Google and Song Lyrics

July 13, 2016

I love the results I get for pop stars, TV shows, and binge watching. To feed the curious minds of online researchers, Google has upped the ante. “Google Licenses LyricFind for Search Results” reports that Google has addressed its miserable search systems for the words in tunes. Consider this lyric:

“My wrist deserve a shout out, I’m like ‘what up, wrist?’
My stove deserve a shout out, I’m like ‘what up, stove?’”

According to the write up:

A query for the lyrics to a specific song will pull up the words to much of that song, freeing users from having to click through to another website. Google rolled out the lyrics feature in the U.S. today (June 27), though it has licenses to display the lyrics internationally as well.

I am definitely thrilled. Why worry about the indexing of PowerPoints, PDFs, and other content when I have access to the source of:

I’m that red bull, now let’s fly away.

What’s really flown away? Rag mop.

Stephen E Arnold, July 13, 2016
