The Four Vs Arrive for Semantic Search

September 1, 2015

When I first encountered the four Vs, I thought someone was recycling the mnemonic trick I was taught when I was a wee pre-retirement person.

I associate the four Vs with IBM and Vivisimo. The hook up is probably a consequence of my flawed thought processes. I have a slide in my files showing an illustration of Volume, Velocity, Variety, and Veracity artfully presented as a cartoon.

One of the goslings showed me this image, but I am not sure it is the diagram of which I speak. Here it is, and you can figure out how the four balls, the plugs, and the tough to read cyan type explain Big Data.

These have migrated to semantic search. Now that’s as good a home for these buzzwords as any of the suburban developments in Jargonville.

image

As applied to semantic search, the four Vs appear to guide the would be enemies of relevance to write a lot, make lots of changes, post content in many forms, and provide “accurate” information.

Good advice.

I assume that somewhat revenue-thirsty search engine optimization experts will be raking in the dollars and euros explaining these concepts to their clients.

I am still baffled about the connection between IBM, Vivisimo, and Big Data. I will leave semantic search to the SEO mavens and the mid tier consultants whom I associate with hard to read azure colors. I much prefer the hard edge tones of the blue chip folks.

Stephen E Arnold, September 1, 2015

Forbes Bitten by Sci-Fi Bug

September 1, 2015

The article titled Semantic Technology: Building the HAL 9000 Computer on Forbes runs with the gossip from the Smart Data Conference this year. Namely, that semantic technology has finally landed. The article examines several leaders of the field including Maana, Loop AI Labs and Blazegraph. The article mentions,

“Computers still can’t truly understand human language, but they can make sense out of certain aspects of textual content. For example, Lexalytics (www.lexalytics.com) is able to perform sentiment analysis, entity extraction, and ambiguity resolution. Sentiment analysis can determine whether some text – a tweet, say, expresses a positive or negative opinion, and how strong that opinion is. Entity extraction identifies what a paragraph is actually talking about, while ambiguity resolution solves problems like the Paris Hilton one above.”

(The “Paris Hilton problem” referred to is distinguishing between the hotel and the person in semantic search.) In spite of the excitable tone of the article’s title, its conclusion is much more measured. HAL, the sentient computer from 2001: A Space Odyssey, remains in our imaginations. In spite of the exciting work being done, the article reminds us that even Watson, IBM’s supercomputer, is still without the “curiosity or reasoning skills of any two-year-old human.” For the more paranoid among us, this might be good news.
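
For the curious, here is a minimal sketch of how a system might take a crude run at the Paris Hilton problem by counting context words. This is not how Lexalytics or any vendor named above does it; the cue lists, the sense labels, and the `disambiguate` function are assumptions invented for illustration.

```python
import re

# Hypothetical cue words for each sense of "Paris Hilton".
# A production system would learn these from data rather than hard-code them.
CONTEXT_CUES = {
    "hotel": {"room", "booking", "reservation", "stay", "suite"},
    "person": {"she", "her", "celebrity", "interview", "said"},
}


def disambiguate(text):
    """Pick the sense whose cue words overlap most with the text.

    Returns "unknown" when no cue word appears. On a tie, the sense
    listed first in CONTEXT_CUES wins, which is an arbitrary choice.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(tokens & cues) for sense, cues in CONTEXT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    print(disambiguate("Book a room at the Paris Hilton for our stay"))
    print(disambiguate("Paris Hilton said she was late"))
```

Word counting of this sort breaks down quickly on real text, which is part of why the article's measured conclusion seems about right.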

Chelsea Kerwin, September 1, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Jargon Overload: MoSCoW

August 28, 2015

Vladimir Putin is probably confused. My hunch is that when he hears “Moscow” uttered, he thinks about a lovely city, its courteous drivers, its delightful social groupings with idiosyncratic tattoos, and outstanding Moskva style borshch.

Gentle reader, Mr. Putin would be off base.

MoSCoW, according to “Bats, Dolphins, and Semantic Search,” means Must, Should, Could, and Would. The application of these parental verb structures is to search engine optimization.

image

Please, take out the garbage and straighten your room, kiddies. MoSCoW now.

No, I don’t understand this, but you may want to check out the presentation. You may need to register for LinkedIn/Slideshare. I am never sure what I do to access the knowledge jewelry on this site. Here’s the link to try.

I am not into the parental thing. Click if you want. If not, no biggie.

Stephen E Arnold, August 28, 2015

Yahoo: Semantic Search Is the Future

August 16, 2015

I love it when Yahoo explains the future of search. The Xoogler has done the revisionism thing and shifted from Yahoo as a directory built by silly humanoids to a leader in search. Please, do not remember that Yahoo bought Inktomi in 2002 and then rolled out a wild and crazy search system in cahoots with IBM in 2006. (By the way, that search solution brought my IBM multi cpu, DASD equipped, RAM stuffed server to its knees. At least, the “free” software installed.)

image

Now to business: I read “The Future of Search Relies on Semantic Technologies.” For me, semantic technologies have been part of search for many years. But never mind reality. Let’s get to the Reddi-wip in the Yahoo confection.

Yahoo asserts:

Search companies are thus investing in information extraction and data fusion, as well as more and more advanced question-answering capabilities on top of the collected information. The need for these technologies is only increasing with mobile search, where providing results as ten blue links leads to a very poor user experience.

I would point out that as lousy as blue links are, these links produce about $60 billion a year for the Alphabet Google thing and enough zeros for the Microsoft wizards to hang on to its online advertising business even as it loses enthusiasm for other aspects of the Bing thing.

Yahoo adds:

We are a consumer internet company, so for us there is little difference between our internal and external representations.

My comment is a simple question, “What the heck is Yahoo saying?”

I also highlighted this semantic gem:

At Yahoo Labs, we work in advancing the sciences that underlie these approaches, i.e. Natural Language Processing, Information Retrieval and the Semantic Web.

I like the notion of Yahoo advancing science. I wonder if these advances will lead to advances in top line revenue, stabilizing management, and producing search results that are sort of related to the query.

Semantic Search Word Play: Nail and Hospital Edition

August 15, 2015

I find the semantic search hoo-hah fascinating. Not long ago, I reported that Yebol, which some semantic wizard was promoting, bit the dust in 2010. No matter. The semantic search boomlet continues to echo. I don’t hear it in my neck of the woods, but apparently some folks are tuned to this semantic razzle dazzle.

The write up which caught my attention this morning is “Semantic Search: Is It Time for a Think Building Campaign?”

My answer is, “No.”

Let’s look at the argument because I am often wrong, off base, and addled. What do you expect from a goose living in rural Kentucky where 300 baud is a speedy network connection?

The write up points out a traffic hungry person could sign up for directories. Anyone remember those? Yahoo was one, but the Xoogler has anchored Yahoo’s revisionist history in “search.” I don’t expect much in the history department. Sorry.

The article leaps to this point:

The biggest reason to re-evaluate the power of moderated local directories, and related resources, has to do with Google’s shift towards semantic search. If you aren’t aware of this transition, the premise is simple: instead of simply matching keywords to pages that exist in the search engine’s databases, Google’s engineers are trying to get better about understanding the context of the search, and the intent of the searcher.

Sounds great, right? The problem is that Google engineers (not the Alphabet crowd) are trying to find ways to pump up the advertising revenue. I am not sure “semantics” is going to help as much as other types of content processing activities.

Nevertheless, the write up then makes this interesting statement:

This sounds technical but it’s conceptually straightforward. Imagine for a moment that I pick up my iPhone and tell Google’s app that I’ve “driven a nail through my leg”. Matching that exact search phrase isn’t important to me in interpreting results – what matters is that I want “hospital” instead of a “hardware store.” That’s the essence of semantic search.

Now, hold one’s mules, please. The person who pounds a nail through one’s leg may not need a hospital. If the nail misses the femur and threads around (not through) the popliteal, posterior tibial, anterior tibial, peroneal, plantar, and dorsalis pedis arteries, one might pull out the nail.

Here in Kentucky, the person who performs this act of self mutilation or willful or unintentional abuse might want a link to this health care facility:

image

Disagree? That’s what makes horse races.

The write up points out that one can purchase “reputation.” The article points to WhiteSpark and MOZ Local.

The conclusion to the write up certainly is upbeat:

Taking advantage of citations and directories can still help you improve your findability – on search engines and elsewhere in the real world – but only if you’re focused on providing valuable information for potential customers, instead of trying to beat those ever-changing algorithms. In many ways semantic search takes us back to the golden days of the Web, when in terms of working online anything was possible as long as you had passion, belief in yourself, and energy to work at it.

Yep, the golden days. The issue I have with the write up is that semantic search as a way to distort Google’s already flakey relevance algorithms is an example of SEO adaptation. The carnival has arrived. The SEO snake oil sales person will cure your site’s pancreatic cancer and maybe help a customer avoid pounding nails into one’s body parts.

Stephen E Arnold, August 15, 2015

Oracle: The Ostrich Syndrome

August 14, 2015

I read “Oracle’s Chief Security Officer Mary Ann Davidson Just Made a Rookie Mistake.” No, it has nothing to do with trying to breathe life into Oracle Secure Enterprise Search or increasing the content processing speed of Endeca. Those might be really difficult tasks.

According to the write up:

Oracle Chief Security Officer Mary Ann Davidson was forced to remove a blog post after she made a mistake that made her sound out of touch with the security space. In her online post, she claimed that security researchers who point out flaws in Oracle software may be in violation of the company’s license agreement. She said reverse engineering is not allowed under the company’s own TOS.

Quite a good idea if one is struggling with the Java thing, open source database annoyances, and push back about certain licensing policies and fees.

I read this and thought of the creature which buries its head in the sand.

To make the issue more interesting, Oracle removed the post which allegedly said:

“If we determine as part of our analysis that scan results could only have come from reverse engineering, we send a letter to the sinning customer, and a different letter to the sinning consultant-acting-on-customer’s behalf – reminding them of the terms of the Oracle license agreement that preclude reverse engineering, So Please Stop It Already”

I love the “already.” There is a robust market sector which identifies and provides information about vulnerabilities to those who are not into the ostrich approach to information.

Isn’t this disappearing, revisionistic information trend fascinating? What you do not know cannot possibly harm you. Ignorance is bliss. Be happy.

Stephen E Arnold, August 14, 2015

IT Architecture Needs to Be More Seamless

August 14, 2015

IT architecture might appear to be the same across the board, but depending on the industry the standards change. Rupert Brown wrote “From BCBS to TOGAF: The Need For a Semantically Rigorous Business Architecture” for Bob’s Guide, and he discusses how TOGAF is the de facto standard for global enterprise architecture. He explains that while TOGAF does have its strengths, its many weaknesses include its reliance on diagrams and the use of PowerPoint to make them.

Brown spends a large portion of the article stressing that information content and model are more important and that a diagram should only be rendered later. He goes on to say that as industries have advanced, the tools have become more complex, and it is very important to have a more universal approach to IT architecture.

What is Brown’s supposed solution? Semantics!

“The mechanism used to join the dots is Semantics: all the documents that are the key artifacts that capture how a business operates and evolves are nowadays stored by default in Microsoft or Open Office equivalents as XML and can have semantic linkages embedded within them. The result is that no business document can be considered an island any more – everything must have a reason to exist.”

The reason that TOGAF has not been standardized using semantics is the lack of something to connect various architecture models together.  A standardized XBRL language for financial and regulatory reporting would help get the process started, but the biggest problem will be people who make a decent living using PowerPoint (so he claims).

Brown calls for a global reporting standard for all industries, but that is a pie in the sky hope unless the government imposes regulations or all industries have a meeting of the minds. Why? Different industries do not always mesh (think engineering firms vs. a publishing house), and each has its own list of needs and concerns. Why not focus on getting industry standards for one industry rather than across the board?

Whitney Grace, August 14, 2015

American Sign Language Emojis: Will Search Vendors Adapt?

August 7, 2015

Short honk: Forget words in English. “There’s Finally a Good Way to Text in Sign Language” explains that a new mobile keyboard app allows American Sign Language speakers to send text messages to a hearing impaired individual. The write up states:

Signily also includes animated signs for many popular ASL phrases that don’t have exact English translations. This makes texting a more natural experience for signers.

How does one parse and search these messages? Think look up table maybe? Will semantic vendors be able to make sense of animated signs?

Sure, semantic search is just super. And the meeting to discuss this?

Stephen E Arnold, August 7, 2015

Alleged Semantic Tips Spot on for Freshman Comp Students

August 5, 2015

I find the amount of attention given to semantic search as it applies to search engine optimization a fascinating development. “Semantic”, like Big Data, is fast becoming meaningless. The root of semantic is the Greek word for significant.

The application of the word semantic to information search and retrieval is a bit less straightforward. Toss in the concept of “search” and “content processing” and the output is an information smoothie with big chunks of tough to identify systems and methods; for example:

  • Methods to discern user intent
  • Methods to figure out the context of an ambiguous element
  • Programmatic data inserted into a content object which makes sense to a content processing system set up to recognize these instructions
  • Systems which use pre-compiled look up tables or programmatic methods to figure out which words go together (White House or white house) or which alias goes with which person of interest
  • Systems which attempt to “make sense” of content objects which signify some other information such as “Harrod’s teddy bear” as a token for an illegal substance
  • Systems which deal with multi lingual corpuses
  • Malformed Web accessible content which is supposed to comply with the W3C standards for semantic “stuff”.
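
To make the look up table item in the list above concrete, here is a minimal sketch, assuming a tiny hand-built table. The table contents, the labels, and the function names `tag_phrases` and `resolve_alias` are hypothetical and invented for illustration; real systems compile far larger tables and still struggle with precision and recall.

```python
# Hypothetical pre-compiled tables. Every entry is made up for this example.
PHRASE_TABLE = {
    "White House": "organization",  # capitalized: the institution
    "white house": "building",      # lowercase: a house that happens to be white
}

ALIAS_TABLE = {
    "j. doe": "person-001",
    "johnny d": "person-001",  # two aliases mapped to one person of interest
}


def tag_phrases(text):
    """Return (phrase, label) pairs for each table phrase found in the text.

    Matching is case-sensitive on purpose, so "White House" and
    "white house" receive different labels.
    """
    return [(phrase, label) for phrase, label in PHRASE_TABLE.items() if phrase in text]


def resolve_alias(name):
    """Map an alias to a canonical identifier, or None when it is not in the table."""
    return ALIAS_TABLE.get(name.strip().lower())


if __name__ == "__main__":
    print(tag_phrases("She visited the White House."))
    print(tag_phrases("They painted the white house."))
    print(resolve_alias(" Johnny D "))
```

The weakness is plain: anything missing from the table is invisible, and a sentence that begins “White house prices fell” is tagged as nothing at all.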

You get the idea. Semantic drags in a number of interesting systems and methods. Many of these are complex and evolving as innovators try to deal with lousy precision and recall which is the norm for many “semantic” methods.

Now navigate to “Semantic Search Strategies That Work.” I would suggest that the tips in this write up apply to a person in an introductory college writing class. Here they are:

  1. “Forget about content as a daily grind.” Now that is music to a freshman’s ears. The silly notion that many professional writers have is that writing is something one must do every day and pursue with discipline. Nah, for real semantic search, take it easy. Chillax.
  2. “Concentrate on quality.” Now this is an interesting point. Google calculates quality based on a number of factors. The idea that a person who writes a high quality post will benefit from that effort is intriguing. In my experience, many excellent write ups get absolutely zero attention. These are usually write ups that address topics far from the pop music, Netflix, and Donald Trump scene. Here’s an example: Alon Halevy, et al, “Biperpedia: An Ontology for Search Applications.” This is a high quality paper, and I doubt that SEO mavens can match the effort which went into this 12 page write up. The write up deals with semantic issues, by the way.
  3. “When you write show who you are.” Not so fast. With the data lapses at various government agencies, health insurers, and corporate entities, content generated for the Web may require some thought, grooming, and vetting. How many SEO wizards want me to know about their behaviors and thoughts beyond their asserted expertise in fooling Google to rank an irrelevant site high in a query results list? How many SEO experts want the world to know that Google dropped a site in its rankings due to SEO missteps? What SEO expert wants a system to know what the person did prior to becoming an SEO expert? What about those secret actions like hunting lions in Africa or a dust up at a local watering hole? Think about this “who you are” stuff. Think carefully.
  4. “Focus on your prospects.” Ah, the bias is explicit. The motivating factor is that one writes to sell consulting work. Wrong. My hunch is that Dr. Halevy writes because he is curious and has colleagues with whom to collaborate in order to advance a particular area of inquiry. Halevy already sold a company to Google and, I assume, could sit at home and do volunteer SEO work. So far, he has resisted the siren song of easy money via baloney expertise.
  5. “Spend time on engagement.” I think this means attend conferences, post to social media, and hang out at watering holes without being captured in an on looker’s mobile phone picture.

Snake oil is available, gentle reader. Use with caution because it can damage certain cognitive functions while emptying one’s bank account.

Stephen E Arnold, August 5, 2015
