
Jargon Overload: MoSCoW

August 28, 2015

Vladimir Putin is probably confused. My hunch is that when he hears “Moscow” uttered, he thinks about a lovely city, its courteous drivers, its delightful social groupings with idiosyncratic tattoos, and outstanding Moskva style borshch.

Gentle reader, Mr. Putin would be off base.

MoSCoW, according to “Bats, Dolphins, and Semantic Search,” means Must, Should, Could, and Would. The application of these parental verb structures is to search engine optimization.

image

Please, take out the garbage and straighten your room, kiddies. MoSCoW now.

No, I don’t understand this, but you may want to check out the presentation. You may need to register for LinkedIn/Slideshare. I am never sure what I do to access the knowledge jewelry on this site. Here’s the link to try.

I am not into the parental thing. Click if you want. If not, no biggie.

Stephen E Arnold, August 28, 2015

Wikipedia: The PR Revolution

August 17, 2015

I read “The Covert World of People Trying to Edit Wikipedia—for Pay.” I am an old fashioned backwoodsperson. I look up stuff. I try to figure out which source is semi reliable. I read and do some (not much, of course) thinking.

Other folks just whack 2.7 words into the Alphabet—oops, I mean, the Google—click on the first link which is often a pointer to Wikipedia and take the “information” displayed. Easy. Quick. Just right for those who have no time, like social media, and use handheld devices.

The write up points out what seems to me to be an obvious “evolutionary” leap:

How can a site run by volunteers inoculate itself against well-funded PR efforts? And how can those volunteers distinguish between information that’s trustworthy and information that’s suspect?

The write up explores one example of public relations folks cranking out objective articles for Wikipedia.

Why worry? Getting accurate information involves more than relying on Alphabet – oh, there I go again, I mean the Google – and its all time fave number one Wikipedia.

Dialog Information Services pioneered this default top hit. When I logged on, the default database was Education Index or something like that. The clueless would run their query for diamond deposition in that database, thus having an upside for Dialog. Too bad about the system user.

The burden, gentle reader, falls not on Wikipedia, which is fighting a losing battle against the forces of Lucifer – I am sorry, I mean public relations.

The burden falls on the person doing the search to figure out what information is correct. Bummer. That’s real work. Who has time for that anyway?

Stephen E Arnold, August 17, 2015

Semantic Search Word Play: Nail and Hospital Edition

August 15, 2015

I find the semantic search hoo-hah fascinating. Not long ago, I reported that Yebol, which some semantic wizard was promoting, bit the dust in 2010. No matter. The semantic search boomlet continues to echo. I don’t hear it in my neck of the woods, but apparently some folks are tuned to this semantic razzle dazzle.

The write up which caught my attention this morning is “Semantic Search: Is It Time for a Think Building Campaign?”

My answer is, “No.”

Let’s look at the argument because I am often wrong, off base, and addled. What do you expect from a goose living in rural Kentucky where 300 baud is a speedy network connection?

The write up points out a traffic hungry person could sign up for directories. Anyone remember those? Yahoo was one, but the Xoogler has anchored Yahoo’s revisionist history in “search.” I don’t expect much in the history department. Sorry.

The article leaps to this point:

The biggest reason to re-evaluate the power of moderated local directories, and related resources, has to do with Google’s shift towards semantic search. If you aren’t aware of this transition, the premise is simple: instead of simply matching keywords to pages that exist in the search engine’s databases, Google’s engineers are trying to get better about understanding the context of the search, and the intent of the searcher.

Sounds great, right? The problem is that Google engineers (not the Alphabet crowd) are trying to find ways to pump up the advertising revenue. I am not sure “semantics” is going to help as much as other types of content processing activities.

Nevertheless, the write up then makes this interesting statement:

This sounds technical but it’s conceptually straightforward. Imagine for a moment that I pick up my iPhone and tell Google’s app that I’ve “driven a nail through my leg”. Matching that exact search phrase isn’t important to me in interpreting results – what matters is that I want “hospital” instead of a “hardware store.” That’s the essence of semantic search.

Now, hold one’s mules, please. The person who pounds a nail through one’s leg may not need a hospital. If the nail misses the femur and threads around (not through) the popliteal, posterior tibial, anterior tibial, peroneal, plantar, and dorsalis pedis arteries, one might pull out the nail.

Here in Kentucky, the person who performs this act of self mutilation or willful or unintentional abuse might want a link to this health care facility:

image

Disagree? That’s what makes horse races.

The write up points out that one can purchase “reputation.” The article points to WhiteSpark and MOZ Local.

The conclusion to the write up certainly is upbeat:

Taking advantage of citations and directories can still help you improve your findability – on search engines and elsewhere in the real world – but only if you’re focused on providing valuable information for potential customers, instead of trying to beat those ever-changing algorithms. In many ways semantic search takes us back to the golden days of the Web, when in terms of working online anything was possible as long as you had passion, belief in yourself, and energy to work at it.

Yep, the golden days. The issue I have with the write up is that semantic search as a way to distort Google’s already flakey relevance algorithms is an example of SEO adaptation. The carnival has arrived. The SEO snake oil sales person will cure your site’s pancreatic cancer and maybe help a customer avoid pounding nails into one’s body parts.

Stephen E Arnold, August 15, 2015

Alleged Semantic Tips Spot on for Freshman Comp Students

August 5, 2015

I find the amount of attention given to semantic search as it applies to search engine optimization a fascinating development. “Semantic”, like Big Data, is fast becoming meaningless. The root of semantic is the Greek word for significant.

The application of the word semantic to information search and retrieval is a bit less straightforward. Toss in the concept of “search” and “content processing” and the output is an information smoothie with big chunks of tough to identify systems and methods; for example:

  • Methods to discern user intent
  • Methods to figure out the context of an ambiguous element
  • Programmatic data inserted into a content object which makes sense to a content processing system set up to recognize these instructions
  • Systems which use pre-compiled look up tables or programmatic methods to figure out which words go together (White House or white house) or which alias goes with which person of interest
  • Systems which attempt to “make sense” of content objects which signify some other information such as “Harrod’s teddy bear” as a token for an illegal substance
  • Systems which deal with multi lingual corpuses
  • Malformed Web accessible content which is supposed to comply with the W3C standards for semantic “stuff”.

You get the idea. Semantic drags in a number of interesting systems and methods. Many of these are complex and evolving as innovators try to deal with lousy precision and recall which is the norm for many “semantic” methods.
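For readers who have not bumped into precision and recall lately, here is a minimal sketch of how the two measures are usually computed for a single query. The document IDs and the `precision_recall` helper are made up for illustration; they do not come from any system mentioned in this post.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs: what the system returned vs. what a human
# judged to be on point for the query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Half of what came back was useful, and more than half of the useful material never came back at all. That is the sort of result I mean by lousy.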

Now navigate to “Semantic Search Strategies That Work.” I would suggest that the tips in this write up apply to a person in an introductory college writing class. Here they are:

  1. “Forget about content as a daily grind.” Now that is music to a freshman’s ears. The silly notion that many professional writers have is that writing is something one must do every day and pursue with discipline. Nah, for real semantic search, take it easy. Chillax.
  2. “Concentrate on quality.” Now this is an interesting point. Google calculates quality based on a number of factors. The idea that a person who writes a high quality post will benefit from that effort is intriguing. In my experience, many excellent write ups get absolutely zero attention. These are usually write ups that address topics far from the pop music, Netflix, and Donald Trump scene. Here’s an example: Alon Halevy, et al, “Biperpedia: An Ontology for Search Applications.” This is a high quality paper, and I doubt that SEO mavens can match the effort which went into this 12 page write up. The write up deals with semantic issues, by the way.
  3. “When you write show who you are.” Not so fast. With the data lapses at various government agencies, health insurers, and corporate entities, content generated for the Web may require some thought, grooming, and vetting. How many SEO wizards want me to know about their behaviors and thoughts beyond their asserted expertise in fooling Google to rank an irrelevant site high in a query results list? How many SEO experts want the world to know that Google dropped a site in its rankings due to SEO missteps? What SEO expert wants a system to know what the person did prior to becoming an SEO expert? What about those secret actions like hunting lions in Africa or a dust up at a local watering hole? Think about this “who you are” stuff. Think carefully.
  4. “Focus on your prospects.” Ah, the bias is explicit. The motivating factor is that one writes to sell consulting work. Wrong. My hunch is that Dr. Halevy writes because he is curious and has colleagues with whom to collaborate in order to advance a particular area of inquiry. Halevy already sold a company to Google and, I assume, could sit at home and do volunteer SEO work. So far, he has resisted the siren song of easy money via baloney expertise.
  5. “Spend time on engagement.” I think this means attend conferences, post to social media, and hang out at watering holes without being captured in an on looker’s mobile phone picture.

Snake oil is available, gentle reader. Use with caution because it can damage certain cognitive functions while emptying one’s bank account.

Stephen E Arnold, August 5, 2015

Semantic Promotions and a Nutrition Free Exercise

July 26, 2015

I saw a link to an item called “5 Basic Steps to Make Sure You Hit Page 1 on Google.” I followed it to this message:

image

One link pointed to this page:

image

But this image was the message.

Intrigued I chased the other urls in the post and located this write up:

image

What happened? I am not sure how the link bait promising number one on Google leads to “Semantic Search: The Future of Marketing.” There must be an informing hand somewhere.

I looked at the write up. Like many odd duck semantic search honks, the article explains that semantic search allows software to understand the context of a search. I am not sure how software would have navigated the original message but broken html is not the focus of the future of marketing. Or is it?

The write up zips through schemas, provides an example of structured data testing courtesy of the Google, dips into the knowledge graph thing, mentions Google’s direct answers, and more. Just briefly and in an earthworm manner. The write up strings out screen shots in order to explain the future of marketing I assume.

The conclusion is an interesting collection of “opportunities.” One of them is to be “mobile friendly.” Got it.

Now if we go back to the starting point we see that this is a collection of links and digital pabulum, an insufficient comestible for the addled goose. My hunch is that others may want some more substantial victuals. On the other hand, SEO experts may find the article’s information a feast at the Golden Arches.

Stephen E Arnold, July 26, 2015

Semantic Search: How Far Will This Baloney Tube Stretch?

July 12, 2015

I have seen a number of tweets, messages, and comments about “Semantic Search: the Future of Search Marketing?”

For those looking for traffic, it seems that using the phrase “semantic search” in conjunction with “search marketing” is Grade A click bait. Go for it.

My view is a bit different. I think that the baloney manufactured from semantic search (more correctly the various methods that can be grouped under the word semantic) is low grade baloney.

Search marketing is on a par with the institutional pizza pumped out for freshmen in a dorm in DeKalb, Illinois. Yum, tasty. What is it? Oh, I know it is something that is supposed to be nutritious and tasty. The reality is that the pizza isn’t. That’s search marketing. The relevant result may not be. Relevance is jiggling results so that a message is displayed whether the user wants that message or not. Not pizza.

Here’s a passage in the write up I highlighted in pale yellow, the color in my marker set closest to the dorm pizza:

Semantic search is the technology the search engines employ to better understand the context of a search.

Contrast this definition with this one from “Breakthrough Analysis: Two + Nine Types of Semantic Search” published in 2010, five years before the crazy SEO adoption of the buzzword, if not the understanding of what “semantic” embraces:

Semantics (in an IT setting) is meaningful computing: the application of natural language processing (NLP) to support information retrieval, analytics, and data-integration that compass both numerical and “unstructured” information.

The article then trots out these semantic search options:

  1. Related searches and queries
  2. Reference results (dictionary look up)
  3. Annotated results
  4. Similarity search
  5. Syntactic annotations
  6. Concept search
  7. Ontology based search
  8. Semantic Web search
  9. Faceted search
  10. Clustered search
  11. Natural language search

Now there are many, many issues with this list. How about differentiating faceted, concept, and clustered search? Give up yet?

The point is that semantic search is not one thing. If one accepts this list as the touchstone, the functions referenced are going to contain other content processing operations.

The problem is that these functions on their own or used in some magical, affordable combination are not likely to deliver what the user wants.

The user wants relevant results which pertain directly to her specific information need.

The search engine optimization and marketing crowd want the results to be what they want to present to a user.

The objectives are different and may not be congruent or even similar.

In short, the notion of taking crazy, generalized concepts and slapping them on marketing is likely to produce howlers like this write up and the equally wonky list from 2010.

The point is that semantic baloney has been in the supermarket for a long time.

Obviously this baloney has a long shelf life.

In the meantime, how is ad supported Web search working for you? Oh, how is that in house information access system working for you?

If you want traffic, buy Adwords. Please, do not deliver to me the six pack of baloney.

Stephen E Arnold, July 12, 2015

More Semantic Search and Search Engine Optimization Chatter

June 10, 2015

I read “Understanding Semantic Search.” I had high hopes. The notion of Semantic Search as set forth by Tim Bray, Ramanathan Guha, and some other wizards years ago continues to intrigue me. The challenge has been to deliver high value outputs that generate sufficient revenue to pay for the plumbing, storage, and development good ideas can require.

I spent considerable time exploring one of the better known semantic search systems before the company turned off the lights and locked its doors. Siderean Software offered its Seamark system which could munch on triples and output some quite remarkable results. I am not sure why the company was not able to generate more revenue.
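If “munch on triples” sounds mysterious, the idea is simple: facts are stored as subject, predicate, object statements, and a query is a pattern with blanks. The sketch below is a toy illustration only. It is not how Seamark worked internally, and the predicate names and the `match` helper are invented for the example.

```python
# A toy in-memory triple store. Each fact is (subject, predicate, object).
# The predicate names here are hypothetical, not a real vocabulary.
TRIPLES = [
    ("Seamark", "madeBy", "Siderean Software"),
    ("Seamark", "type", "SearchSystem"),
    ("Siderean Software", "type", "SearchVendor"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# "Who made what?" is a pattern with the subject and object left blank.
for s, _, o in match(TRIPLES, predicate="madeBy"):
    print(f"{s} was made by {o}")
```

A production system adds inference, indexing, and scale, which is where the plumbing bills come from.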

The company emphasized “discovery searching.” Vivisimo later imitated Siderean’s user input feature. The idea is that if a document required an additional key word, the system accepted the user input and added the term to the index. Siderean was one of the first search vendors to suggest that “graph search” or relationships would allow users to pinpoint content processed by the system. In the 2006-2007 period, Siderean indexed Oracle text content as a demonstration. (At the time, Oracle had the original Artificial Linguistics’ technology, the Oracle Text function, Triple Hop, and PL/SQL queries. Not surprisingly, Oracle did not show the search acquisition appetite the company demonstrated a few years later when Oracle bought Endeca’s ageing technology, the RightNow Netherlands-originated technology, or the shotgun marriage search vendor InQuira.)

I also invested some time on behalf of the client in the semantic inventions of Dr. Ramanathan Guha. This work was summarized in Google Version 2.0, now out of print. Love those print publishers, folks.

Dr. Guha applied the features of the Semantic Web to plumbing which, if fully implemented, would have allowed Google to build a universal database of knowledge, serve up snippets from a special semantic server, and perform a number of useful functions. This work was done by Dr. Guha when he was at IBM Almaden and at Google. My analysis of Dr. Guha’s work suggests that Google has more semantic plumbing than most observers of the search giant notice. The reason, I concluded, was that semantic technology works behind the scenes. Dragging the user into OWL, RDF, and other semantic nuances does not pay off as well as embedding certain semantic functions behind the scenes.

In the “Understanding Semantic Search” write up, I learned that my understanding of semantic search is pretty much a wild and crazy collection of half truths. Let me illustrate what the article presents as the “understanding” function for addled geese like me.

  • Searches have a context
  • Results can be local or national
  • Entities are important; for example, the White House is different from a white house

So far, none of this helps me understand semantic search as embodied in the W3C standard nor in the implementation of companies like Siderean or the Google-Guha patent documents from 2007 forward.

The write up makes a leap from context to the question, “Are key words still important?”

From that question, the article informs me that I need to utilize schema mark up. These are additional code behinds which provide information to crawlers and other software about the content which the user sees on a rendering device.
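The article does not show what this mark up looks like, so here is a minimal sketch of one common form: a schema.org description serialized as JSON-LD and wrapped in a script tag. The helper names `article_markup` and `script_tag` are mine, and the choice of the `Article` type and these three properties is an assumption for illustration, not a recommendation from the write up.

```python
import json


def article_markup(headline, author, date_published):
    """Build a schema.org Article description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,  # ISO 8601 date string
    }


def script_tag(markup):
    """Wrap the description in the script element crawlers look for.

    A real page should also guard against "</" inside the values; that
    step is left out to keep the sketch short.
    """
    body = json.dumps(markup, indent=2)
    return '<script type="application/ld+json">' + body + "</script>"


print(
    script_tag(
        article_markup(
            "More Semantic Search and Search Engine Optimization Chatter",
            "Stephen E Arnold",
            "2015-06-10",
        )
    )
)
```

The reader of the page never sees any of this. The block exists for the crawler, which is the point I keep making about who benefits.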

And that’s it.

So let’s recap. I learned that context is important via illustrations which show Google using different methods to localize or personalize content. The write up does not enumerate the different methods which use browser histories, geolocation, and other signals. The write up then urges me to use additional mark up.

I think I will stick with my understanding of semantics. My work with Siderean and my research for an investment bank provided a richer base of knowledge about the real world applications of semantic technology. Technology, I wish to point out, which can be computationally demanding unless one has sufficient resources to perform the work.

What is happening in this “Understanding Semantic Search” article is an attempt to generate business for search engine optimization experts. Key word stuffing and doorway pages no longer work very well. In fact, SEO itself is a problem because it undermines precision and recall. Spoofing relevance is not my idea of a useful activity.

For those looking to semantics to deliver Google traffic, you might want to invest the time and effort in creating content which pulls users to you.

Stephen E Arnold, June 9, 2015

The Semantic Blenders: Not Consumable by Most

June 7, 2015

I read “Schema Markup and Microformatting Is Only the First Step in your Semantic Search Strategy.”

Okay, schema markup and microformatting. These are, according to the headline, one thing.

I am probably off base here in Harrod’s Creek, but I thought:

  1. Schema markup. Google’s explanation is designed to help out the GOOG, not the user. The methods of Guha and Halevy have proven difficult to implement. The result is a Googley move: Have the developers insert data into Web pages. Easy. Big benefit for Google too.
  2. Microformatting. A decade old effort to add additional information to a Web page. You can find examples galore at http://microformats.org/.

I am not very good at math, but it sure seems to me that these are two different processes.
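To make the difference concrete, here is a small sketch that marks up the same person two ways: once with microformats class names (the h-card pattern) and once with schema.org microdata attributes. The function names and the example URL are placeholders of my own; treat the output as an illustration of the two conventions, not as a complete or validated page.

```python
from html import escape


def hcard(name, url):
    """Microformats style: meaning rides on class names such as h-card."""
    return (
        '<div class="h-card">'
        f'<a class="p-name u-url" href="{escape(url)}">{escape(name)}</a>'
        "</div>"
    )


def microdata_person(name, url):
    """schema.org microdata style: meaning rides on item* attributes."""
    return (
        '<div itemscope itemtype="https://schema.org/Person">'
        f'<a itemprop="url" href="{escape(url)}">'
        f'<span itemprop="name">{escape(name)}</span></a>'
        "</div>"
    )


# The URL is a placeholder, not a real profile page.
print(hcard("Stephen E Arnold", "https://example.com/"))
print(microdata_person("Stephen E Arnold", "https://example.com/"))
```

Same fact, two vocabularies, two sets of attributes for a parser to chase.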

But the burr under my blanket is that one cannot apply anything unless there is something written or displayed on a Web page. Therefore, these two additions to a Web page’s code cannot be the first thing. Tagging can occur after something has been written or at the same time when the writing is done with a smart input system.

The notion that these rather squishy logical mistakes occur in the headline did not rev my engine when I worked through the 1,800 words in the article. The assumption in the write up is that a reader wants to create an ecommerce site which garners a top Google result. The idea is that one types in a key word like “cyberosint” and the first hit in the result list points to the ecommerce page.

The hitch in the git along is that more queries are arriving from mobile devices. The consequence of this is that the mobile system will be filtering content and displaying information which the system calculates as important to the user.

I don’t want to rain on the semanticists’ parade, nor do I want to point out that search engine optimization is pretty much an undrinkable concoction of buzz words, jargon, and desperation.

Here’s one of the passages in the write up that I marked with a blue exclamation point in the margin of my print out:

Within Search Engine Optimization, many businesses focus on keywords, phrases, and search density as a way of sending clues to search engines that they should be known for those things. But let’s look at it from the human side: how can we make sure that our End User makes those associations? How can we build that Brand Association and Topical Relevance to a human being? By focusing our content strategy and providing quality content curation.

Well, SEO folks, I am not too keen on brand associations and I am not sure I want to be relevant to another human being. Whether I agree or not, the fix is not to perform these functions:

  • Bridges of association
  • Social listening
  • Quality reputation (a method used I might add on the Dark Web)

This smoothie is a mess.

There are steps a person with a Web page can take to communicate with spiders and human readers. I am not sure the effort, cost, and additional page fluff are going to work.

Perhaps the semanticists should produce something other than froth? To help Google, write and present information which is clear, concise, and consistent just like in junior high school English class.

Stephen E Arnold, June 7, 2015

Hijacking Semantics for Search Engine Optimization

May 26, 2015

I am just too old and cranky to get with the search engine optimization program. If a person cannot find your content, too bad. SEO has caused some of the erosion of relevance across public Web search engines.

The reason is that pages with lousy content are marketed as having other, more valuable content. The result is queries like this:

image

I want information about methods of digital reasoning. What I get is a company profile.

How do I get information for my specific requirement? I have to know how to work around the problems SEO puts in my face every day, over and over again.

This query works on Bing, Google, and Yandex: artificial intelligence decision procedures.

image

The results do not point to a small company in Tennessee, but to substantive documents from which other, pointed queries can be launched for individuals, industry associations, and methods.

When I read “Semantic Search Strategies That Work,” I became agitated. The notions of “forgetting about content” and “focusing on quality” miss the mark. Add the advice to “spend time on engagement,” and the result is a collection of unrelated assertions.

The goal of semantics for SEO is to generate traffic. The search systems suck in shaped content and persist in directing people to topics that may have little or nothing to do with the information a person needs to solve his or her problem.

In short, the bastardization of semantics in the name of SEO is ensuring that some users will define the world from the point of view of marketing, not objective information.

What’s the fix?

Here’s the shocker: There is no fix. As individuals abrogate their responsibility to demand high value, on point results, schlock becomes the order of the day.

So much for clear thinking. Semantic strategies that erode relevance do not “work” from my point of view. This type of semantics thickens the cloud of unknowing.

Stephen E Arnold, May 26, 2015

Google and Your JavaScript: Good News and Reality

May 12, 2015

If you want to keep Mother Google happy, you want to feed her content which is mobile friendly, meets her rules for her children, and delivers bang up information to her many minions.

I read “We Tested How Googlebot Crawls Javascript And Here’s What We Learned.” You may want to check out the SEO oriented write up if you wonder why no one visits your Web site. (Tip: Most Web sites do not get much traffic. Traffic is often helped out by buying Adwords. Remember. You heard this from me. Oh, prepare to invest substantial sums for maximum payback. SEO is usually less efficacious.)

The article explains a series of tests to reveal how Mother Google interprets and makes use of JavaScript. I found this passage highlight worthy:

Javascript links work in a similar manner to plain HTML links (at face value, we do not know what’s happening behind the scenes in the algorithms).

JavaScript away. Just remember that Adwords deliver traffic. SEO is usually a somewhat less reliable method. But those SEO experts do charge money. So make your own decision. Adwords which work. SEO methods which are at best uneven.

Stephen E Arnold, May 12, 2015
