June 10, 2015
I read “Understanding Semantic Search.” I had high hopes. The notion of Semantic Search as set forth by Tim Bray, Ramanathan Guha, and some other wizards years ago continues to intrigue me. The challenge has been to deliver high value outputs that generate sufficient revenue to pay for the plumbing, storage, and development good ideas can require.
I spent considerable time exploring one of the better known semantic search systems before the company turned off the lights and locked its doors. Siderean Software offered its Seamark system which could munch on triples and output some quite remarkable results. I am not sure why the company was not able to generate more revenue.
The company emphasized “discovery searching.” Vivisimo later imitated Siderean’s user input feature. The idea is that if a document required an additional key word, the system accepted the user input and added the term to the index. Siderean was one of the first search vendors to suggest that “graph search” or relationships would allow users to pinpoint content processed by the system. In the 2006-2007 period, Siderean indexed Oracle text content as a demonstration. (At the time, Oracle had the original Artificial Linguistics’ technology, the Oracle Text function, Triple Hop, and PL/SQL queries. Not surprisingly, Oracle did not show the search acquisition appetite the company demonstrated a few years later when Oracle bought Endeca’s ageing technology, the RightNow Netherlands-originated technology, or the shotgun marriage search vendor InQuira.)
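For readers who have not worked with triples, a minimal sketch of the idea follows. It is not Siderean's Seamark code and makes no claim about how that product worked. The `TRIPLES` list and the `match` helper are hypothetical names invented for illustration, and the facts are taken from the paragraphs above. Each fact is stored as a (subject, predicate, object) tuple, and a query is a pattern in which `None` acts as a wildcard.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Hypothetical facts taken from the text above, stored as
# (subject, predicate, object) tuples.
TRIPLES: list[Triple] = [
    ("Siderean Software", "offered", "Seamark"),
    ("Seamark", "processes", "triples"),
    ("Oracle", "acquired", "Endeca"),
    ("Oracle", "acquired", "RightNow"),
    ("Oracle", "acquired", "InQuira"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the non-None fields of the pattern."""
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


# "What did Oracle acquire?" expressed as a pattern with a wildcard object.
acquired = sorted(obj for _, _, obj in match(s="Oracle", p="acquired"))
print(acquired)
```

A relationship query of this kind returns an answer only because someone stated the relationship explicitly. A production system adds indexing, inference, and storage at scale, which is where the plumbing costs arrive.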
I also invested some time on behalf of a client in the semantic inventions of Dr. Ramanathan Guha. This work was summarized in Google Version 2.0, now out of print. Love those print publishers, folks.
Dr. Guha applied the features of the Semantic Web to plumbing which, if fully implemented, would have allowed Google to build a universal database of knowledge, serve up snippets from a special semantic server, and perform a number of useful functions. This work was done by Dr. Guha when he was at IBM Almaden and at Google. My analysis of Dr. Guha’s work suggests that Google has more semantic plumbing than most observers of the search giant notice. The reason, I concluded, was that semantic technology works behind the scenes. Dragging the user into OWL, RDF, and other semantic nuances does not pay off as well as embedding certain semantic functions behind the scenes.
In the “Understanding Semantic Search” write up, I learned that my understanding of semantic search is pretty much a wild and crazy collection of half truths. Let me illustrate what the article presents as the “understanding” function for addled geese like me.
- Searches have a context
- Results can be local or national
- Entities are important; for example, the White House is different from a white house
So far, none of this helps me understand semantic search as embodied in the W3C standards or in the implementations of companies like Siderean or the Google-Guha patent documents from 2007 forward.
The write up makes a leap from context to the question, “Are key words still important?”
From that question, the article informs me that I need to utilize schema mark up. These are additional code behinds which provide information to crawlers and other software about the content which the user sees on a rendering device.
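To make "schema mark up" concrete, here is a minimal sketch of one common form: a schema.org description serialized as JSON-LD and wrapped in a script tag that crawlers read and browsers do not display. The function name `article_jsonld` and the sample headline are assumptions made up for illustration; the article under discussion does not prescribe this code.

```python
import json


def article_jsonld(headline: str, author: str, date_published: str) -> str:
    """Return a <script> element carrying a schema.org Article description."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,  # ISO 8601 date string
    }
    body = json.dumps(data, indent=2)
    # The script type marks the payload as data for machines, not page content.
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical headline; author and date come from this post.
snippet = article_jsonld(
    "A post about semantic search", "Stephen E Arnold", "2015-06-10"
)
print(snippet)
```

Note what the block does not do: it adds no information for the human reader. It restates, in a form convenient for the crawler, what the page already says.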
And that’s it.
So let’s recap. I learned that context is important via illustrations which show Google using different methods to localize or personalize content. The write up does not enumerate the different methods which use browser histories, geolocation, and other signals. The write up then urges me to use additional mark up.
I think I will stick with my understanding of semantics. My work with Siderean and my research for an investment bank provided a richer base of knowledge about the real world applications of semantic technology. Technology, I wish to point out, which can be computationally demanding unless one has sufficient resources to perform the work.
What is happening in this “Understanding Semantic Search” article is an attempt to generate business for search engine optimization experts. Key word stuffing and doorway pages no longer work very well. In fact, SEO itself is a problem because it undermines precision and recall. Spoofing relevance is not my idea of a useful activity.
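Precision and recall have standard definitions, and a small sketch shows how shaped content drags precision down. The document identifiers below are invented for illustration. Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one query from two sets of document IDs."""
    hits = len(retrieved & relevant)  # relevant documents the system returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list: two useful documents and two SEO-shaped pages.
retrieved = {"doc1", "doc2", "seo_page1", "seo_page2"}
# Hypothetical ground truth: three documents actually answer the query.
relevant = {"doc1", "doc2", "doc3"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Every shaped page that lands in the result list enlarges the denominator of precision without adding a hit, and any relevant document it displaces lowers recall as well.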
For those looking to semantics to deliver Google traffic, you might want to invest the time and effort in creating content which pulls users to you.
Stephen E Arnold, June 9, 2015
June 7, 2015
Okay, schema markup and microformatting. These are, according to the headline, one thing.
I am probably off base here in Harrod’s Creek, but I thought:
- Schema markup. Google’s explanation is designed to help out the GOOG, not the user. The methods of Guha and Halevy have proven difficult to implement. The result is a Googley move: Have the developers insert data into Web pages. Easy. Big benefit for Google too.
- Microformatting. A decade old effort to add additional information to a Web page. You can find examples galore at http://microformats.org/.
I am not very good at math, but it sure seems to me that these are two different processes.
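A short sketch may make the difference visible. The two HTML fragments below are invented for illustration. The first uses the newer microformats2 class names (`h-card`, `p-name`); the decade old classic microformats used class names such as `vcard` and `fn`, but the mechanism, the `class` attribute, is the same. The second fragment uses schema.org vocabulary expressed as microdata attributes (`itemscope`, `itemtype`, `itemprop`). The `MarkupSniffer` class and the `sniff` helper are hypothetical names, built on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Invented fragments: the same person marked up two different ways.
MICROFORMAT_HTML = '<p class="h-card"><span class="p-name">Example Person</span></p>'
MICRODATA_HTML = (
    '<p itemscope itemtype="https://schema.org/Person">'
    '<span itemprop="name">Example Person</span></p>'
)


class MarkupSniffer(HTMLParser):
    """Collect microformats2 class names and schema.org itemprop values."""

    def __init__(self) -> None:
        super().__init__()
        self.microformat_classes: list[str] = []
        self.schema_properties: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # microformats2 carries meaning in prefixed class names.
                self.microformat_classes += [
                    c for c in value.split() if c.startswith(("h-", "p-"))
                ]
            elif name == "itemprop" and value:
                # schema.org microdata carries meaning in dedicated attributes.
                self.schema_properties.append(value)


def sniff(html: str) -> tuple[list[str], list[str]]:
    """Return (microformat class names, itemprop values) found in the HTML."""
    sniffer = MarkupSniffer()
    sniffer.feed(html)
    sniffer.close()
    return sniffer.microformat_classes, sniffer.schema_properties


print(sniff(MICROFORMAT_HTML))
print(sniff(MICRODATA_HTML))
```

The same parser finds the data in different places for each fragment, which is the point: two vocabularies, two mechanisms, two processes.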
But the burr under my blanket is that one cannot apply anything unless there is something written or displayed on a Web page. Therefore, these two additions to a Web page’s code cannot be the first thing. Tagging can occur after something has been written or at the same time when the writing is done with a smart input system.
The notion that these rather squishy logical mistakes occur in the headline did not rev my engine when I worked through the 1,800 words in the article. The assumption in the write up is that a reader wants to create an ecommerce site which garners a top Google result. The idea is that one types in a key word like “cyberosint” and the first hit in the result list points to the ecommerce page.
The hitch in the git along is that more queries are arriving from mobile devices. The consequence of this is that the mobile system will be filtering content and displaying information which the system calculates as important to the user.
I don’t want to rain on the semanticists’ parade, nor do I want to point out that search engine optimization is pretty much an undrinkable concoction of buzz words, jargon, and desperation.
Here’s one of the passages in the write up that I marked and inked a blue exclamation point in the margin of my print out:
Within Search Engine Optimization, many businesses focus on keywords, phrases, and search density as a way of sending clues to search engines that they should be known for those things. But let’s look at it from the human side: how can we make sure that our End User makes those associations? How can we build that Brand Association and Topical Relevance to a human being? By focusing our content strategy and providing quality content curation.
Well, SEO folks, I am not too keen on brand associations and I am not sure I want to be relevant to another human being. Whether I agree or not, the fix is not to perform these functions:
- Bridges of association
- Social listening
- Quality reputation (a method used I might add on the Dark Web)
This smoothie is a mess.
There are steps a person with a Web page can take to communicate with spiders and human readers. I am not sure the effort, cost, and additional page fluff are going to work.
Perhaps the semanticists should produce something other than froth? To help Google, write and present information which is clear, concise, and consistent just like in junior high school English class.
Stephen E Arnold, June 7, 2015
May 26, 2015
I am just too old and cranky to get with the search engine optimization program. If a person cannot find your content, too bad. SEO has caused some of the erosion of relevance across public Web search engines.
The reason is that pages with lousy content are marketed as having other, more valuable content. The result is queries like this:
I want information about methods of digital reasoning. What I get is a company profile.
How do I get information for my specific requirement? I have to know how to work around the problems SEO puts in my face every day, over and over again.
This query works on Bing, Google, and Yandex: artificial intelligence decision procedures.
The results do not point to a small company in Tennessee, but to substantive documents from which other, pointed queries can be launched for individuals, industry associations, and methods.
When I read “Semantic Search Strategies That Work,” I became agitated. The notions of “forgetting about content” and “focusing on quality” miss the mark. Add the advice to “spend time on engagement,” and the result is a collection of unrelated assertions.
The goal of semantics for SEO is to generate traffic. The search systems suck in shaped content and persist in directing people to topics that may have little or nothing to do with the information a person needs to solve his or her problem.
In short, the bastardization of semantics in the name of SEO is ensuring that some users will define the world from the point of view of marketing, not objective information.
What’s the fix?
Here’s the shocker: There is no fix. As individuals abrogate their responsibility to demand high value, on point results, schlock becomes the order of the day.
So much for clear thinking. Semantic strategies that erode relevance do not “work” from my point of view. This type of semantics thickens the cloud of unknowing.
Stephen E Arnold, May 26, 2015
May 12, 2015
If you want to keep Mother Google happy, you want to feed her content which is mobile friendly, meets her rules for her children, and delivers bang up information to her many minions.
Stephen E Arnold, May 12, 2015
May 7, 2015
Lightcrest seems to want to be a major player in the enterprise search market. Recently the company’s senior management has posted links to LinkedIn enterprise search discussion groups. The president is Zach Fierstadt, and I wanted to read some of his other contributions to the search and content processing discussions I follow.
The Metaphors Used to Sell Search in the Cloud
I read “Cloud Nine Is a Private Cloud.” To me, Cloud Nine evokes a somewhat imprecise connotation; specifically, “heaven” and “a utopia of pleasure.” The notion of a utopia of pleasure makes me uncomfortable because promising wondrous outcomes from jargonized technology often comes to no good end.
The Urban Dictionary’s word cloud for Cloud Nine exacerbates my discomfort:
How do pleasure and technology link in hosted search services? Here’s a definition of pleasure from Google.
I noted that the word is “used or intended for entertainment rather than business,” as in “pleasure boats.” I immediately think of Caligula’s Lake Nemi ships, the Gary Hart vessel Monkey Business, and the Xoogler’s death by heroin yacht Escape. Let me say that I am not calmed by how my mind relates to metaphors of pleasure and information access.
Now let’s look at the article “Cloud Nine Is a Private Cloud,” which is at this link, http://www.lightcrest.com/blog/2015/04/cloud-nine-is-a-private-cloud/. The author is Zach Fierstadt, who asserts:
Most public cloud providers are not tuned to provide you with full-stack support, including things like DevOps services and caching best-practices. This cost haunts CTOs in the form of sprawling staff requirements, whereby operational staff required to support a 24x7x365 operation grows as the infrastructure on the public cloud grows.
None of these references evoke any pleasure. I noodled over the reference to “DevOps,” which is a neologism. Like much jargon, the word “DevOps” blurs the distinction between two perfectly useful terms: Developers and Operations.
Hosting companies in general and Lightcrest in particular can, as I understand it, make a DevOp’s life into a digital utopia. Mr. Fierstadt writes:
The growth of private and hybrid cloud solutions is indicative of CIOs and CTOs realizing the economic benefits and performance optimizations associated with sophisticated cloud orchestration layered on top of single-tenant hardware. As your workloads and storage requirements grow, make sure your costs don’t blow your budget – and be sure to consider long-term alternatives that allow you to focus on your core business initiatives, and not on cloud operations or cloud economics.
Now this sounds pretty darned good. I like the parental tone and parental rhetoric of “make sure” and attendant sentence structure as well. When I was in college, I knew one student who thought any polysyllabic stream of nonsense was the stuff of his Technicolor dreams. For me, references to sophisticated, optimizations, workloads, costs, core business initiatives, etc. are a substitute for facts, thought provoking commentary, and useful information. Lightcrest offers my hungry mind thin gruel.
Lightcrest’s Alleged Expertise
I did some poking around on the Lightcrest Web site and learned that when the verbiage is parsed, the company does a couple of things. These are:
Before I could see the sun through the psychedelic cloud of marketing silliness, I learned that Lightcrest has expertise in the following search and content processing systems. You can find the list at this link. Lightcrest, the Cloud Nine technology operation, can provide “expertise” for:
- Document management search
- eCommerce search
- Intranet search
- Web indexing.
When it comes to expertise, which means skill or knowledge in a particular field, Lightcrest makes other search centric outfits look a bit like also rans. Please, check out this collection of systems which the Cloud Nine organization can make bark, sit, roll over, and fetch the newspaper:
- Attivio enterprise search
- Autonomy and Verity. (I thought that Hewlett Packard had moved Autonomy to the cloud and repositioned it as something other than enterprise search. I am confused.)
- Custom indexers and support. (What is a custom indexer? Does Lightcrest have proprietary crawling, parsing, and querying technology? Isn’t that important? Doesn’t an outfit with gargantuan expertise have a fact sheet about these functions?)
- Endeca search and business intelligence. (Isn’t Oracle the owner of Endeca? Why is Endeca separate from Oracle? What happened to Endeca as an eCommerce search system? I must be senile.)
- LucidWorks (Really?)
- Microsoft Fast ESP (Enterprise Search Platform) and FDS 4.x. (which I thought was shorthand for Fire Dynamics Simulator. Shows how little search expertise I have.)
- Oracle Enterprise Search (Is this Secure Enterprise Search, Oracle Text, or functionality from InQuira, TripleHop, or RightNow? No matter. Expertise is easy to say, but I think it might be slightly more difficult to deliver.)
- Solr, Lucene, Nutch, Mahout, and Hadoop. (Are Mahout and Hadoop software delivering functions other than enterprise information retrieval?)
- Sphinx and MySQL full text searching.
Frankly I have grave doubts about this organization’s expertise in these areas. I have several reasons:
First, the odd ball mix of search systems mixes apples and quite old oranges. The square pegs are not in the square spaces. Round pegs sit precariously in the gaps designed for squares.
Also, the logic of the listing of these search engines defies me. I thought Mahout was software for machine learning and data mining, not information retrieval. How does one support and host software which is difficult to obtain from the owners of its intellectual property, like Fast ESP or Verity?
The reference to “custom indexers” is interesting. Is Lightcrest able to index the Deep Web like BrightPlanet or like Recorded Future and its monitoring of Tor exit nodes? I wonder if Lightcrest has comparable technical horsepower for this type of work? Based on my experience with BrightPlanet and Recorded Future, I would suggest that Lightcrest is nosing into quite rarified territory without setting forth credentials which give me confidence in the company’s ability to deliver. What exactly are “custom indexers”? Am I able to apply these to a list of Tor sites and cross tabulate retrieved data with targeted clear Web crawls?
In my opinion and without evidence, facts, and concrete examples, the Lightcrest assertions are search engine optimization outputs.
The CEO as a Thought Leader
At least in the LinkedIn enterprise search “space,” Zach Fierstadt has attracted modest attention with his one sentence link only posts. Mr. Fierstadt wrote a non search related article in 2003 labeled “10G Matures” for Computerworld. He has a brief profile or “entry” in Google Plus, Zoom Info, and Stocktwits and a number of other social media sites. He made this statement in a 2010
“Look, there a lot of search solutions out there; but few cut the mustard when it comes to delivering sub-second performance at a reasonable price point. Lucene/Solr is the only platform that gives us the economy of scale needed to provide enterprise-grade search within our hosting model. By leveraging our expertise in deploying search within the enterprise, Lightcrest will be able to provide search solutions to smaller and mid-sized businesses that currently find proprietary platforms to be cost prohibitive.”
What’s up with Lightcrest? Lightcrest walks gently, almost as if the company were weightless and massless. Maybe content marketing or just social media shot gunning? The company’s blog archives reveal marketing activities in September 2013 and then gaps in the content flow until January 2014, September 2014, December 2014, and the recent efflorescence of marketing oriented posts.
Bottom Line: Mass or Massless
Net net: Lightcrest may answer the question, “Is light a particle or a wave?” From what I understand about this company, there is mostly hand waving.
Stephen E Arnold, May 7, 2015
April 20, 2015
I find the advice of experts interesting. When I worked at Halliburton Nuclear, there was an engineer who knew about “everything.” The person was supposed to be an expert in biology, water, nuclear physics, and, of course, math. I recall the person was bright, but his confidence exceeded his mental baggage compartment.
When I encounter experts without the background this pontificator of yore had, I wonder if the big luggage and tiny cart idiosyncrasy is operating. You be the judge. Navigate to “8 Awesome SEO Secrets from the Experts.” A word about whether the advice is good or not: If these experts had secrets which worked, wouldn’t these folks be household names?
Just a question. When it comes to getting a Web page to light up the Google search results, the folks in the European Commission have a suspicion that Google puts its hand on the rudder of results ranking. The notion that eight experts can fiddle the results which Google may steer to some degree if the allegations are correct raises the question, “Okay, who controls results?” I will leave the answer to you as you read the write up.
Herewith are the secrets from the experts, or, I should say, “so called experts.”
Numero uno is semantic search. Okay, there’s a secret for you. I am not able to define to my satisfaction semantic search, but you have the truth, gentle reader. Go forth.
Here are several other secrets:
- Write factual, logical, coherent articles
- Use Google Plus
- Connect with influencers
- Write for mobile devices
Here’s the paragraph I marked as one which puzzled me:
The rise of the Chief Statistical Officer or Chief Conversion Officer is not far away as businesses realize that dominating a niche is going to take more than a few hastily thrown together Adwords campaigns being added to their marketing mix.
I assume only search experts qualify for the job of statistical officer. Differentiate this from other baloney, and perhaps you can be a butcher. Experts, like the fellow at Halliburton, can do just about anything or so they think.
Stephen E Arnold, April 20, 2015
April 9, 2015
Wow. As an outsider to the world of marketing, I find these figures rather astounding. MarketingProfs shares an infographic titled, “The 20 Most Expensive Bing Ads Keywords.” The data comes from a recent analysis by WordStream of 10 million English keywords, grouped into categories. Writer Vahe Habeshian tells us:
“WordStream analyzed some 10 million English keywords and grouped the them into categories to determine the most expensive types of keywords (see infographic, below).
“The most expensive keyword on Bing Ads is ‘lawyer,’ which would cost advertisers seeking the top ad spot a whopping $109.21 per click. Not surprisingly, the top 5 keywords are related to the legal world, indicating how lucrative clients can be.”
Yes, almost $110 per click whether legitimate, a human error, or a robot script. That’s a lot of fruitless clicks. It seems irrational, but it must be working if companies keep spending the dough. Right?
The word in second place, “attorney,” comes to $101.77 per click, and “DUI” is a comparative bargain at $68.56. After the top five law-related words, there are such valuable terms as “annuity,” “rehab,” and “exterminator.” See the infographic for more examples.
Cynthia Murrell, April 09, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
April 6, 2015
We have heard a lot about the semantic Web and search engine optimization (SEO), and both have the common thread of making information more accessible and increasing its use. One would think this would be the same kettle of fish, but sometimes it is hard to make SEO and the semantic Web work together for a platonic web experience. On Slideshare.net, Eric Franzon’s “SEO Meets Semantic Web-Saint Patrick’s Day 2015-Meetup” tries to consolidate the two into one happy fish taco. The presentation tries to explain how the two work together, but here is the official description:
“Schema.org didn’t just appear out of thin air in 2011. It was built upon a foundation of web standards and technologies that have been in development for decades. In this presentation, Eric Franzon, Managing Partner of SemanticFuse provides an introduction to Semantic Web standards such as RDF and SPARQL. He explores who’s using them today and why (hint: it involves money), and takes a look at how Semantic Web, Linked Data, and schema.org are related.”
The problem with the presentation is that we do not have the audio to accompany it, but by flipping through the slides we can understand the general idea. The semantic Web is full of relationships that are connected by ideas and require coding and other fancy stuff to make it one big kettle. In fact, this appears to have too much of the semantic Web flavor and not enough of the SEO spice. One is a catfish for a fine meal and the other is a fish fry without the oil.
Whitney Grace, April 6, 2015
April 4, 2015
January 21, 2015
Curious to learn where Google is driving the search-engine optimization field these days? Search Engine Watch tells us, “6 Major Changes Reveal the Future of SEO.” Writer Eric Enge declares, “Google is doing a brilliant job of pushing people away from tactical SEO behavior and toward a more strategic approach.” Um, okay. As long as that means more relevant information for users.
The article lists Enge’s six observations and what each means for SEO approaches. For example, Google has stopped handing users’ keyword data to websites, requiring them to use other methods to monitor keyword performance. Then there’s the Hummingbird algorithm, which Enge says is really a major platform change. The write-up also considers the current influence of Google+ and Google’s Authorship program. Finally, Enge cites the In-Depth Article feature Google introduced last August, which points users to more comprehensive sources of information. See the article for more on each of these points. Enge concludes:
“All of these new pieces play a role in getting people to focus on their authority, semantic relevance, and the user experience. Again, this is what Google wants.
“For clarity, I’m not saying that Google designed these initiatives specifically to stop people from being tactical and make them strategic. I don’t really know that. It may simply be the case that Google operates from a frame of reference that they want to find and reward outstanding sites, pages, and authors that offer outstanding answers to user’s search queries. But the practical impact is the same.
“The focus now is on understanding your target users, producing great content, establishing your authority and visibility, and providing a great experience for the users of your site.”
Well, this does sound like a good shift for users. Will SEO workers used to focusing on PageRank data and keywords learn to adapt?
Cynthia Murrell, January 21, 2015