Semantic Search Becomes Search Engine Optimization: That Is Going to Improve Relevance

March 27, 2015

I read “The Rapid Evolution of Semantic Search.” It must be my age or the fact that it is cold in Harrod’s Creek, Kentucky, this morning. The write up purports to deliver “an overview of the history of semantic search and what this means for marketers moving forward.” I like that moving forward stuff. It reminds me of Project Runway’s “fashion forward.”

The write up includes a wonky graphic in which an arrow equates Big Data with metadata, volume, smart content, petabytes, data analysis, vast, structured, and framework. Big Data is a cloud with five little arrows pointing down. Does this mean Big Data is pouring from the sky like yesterday’s chilling rain?

The history of the Semantic Web begins in 1998. Let’s see: that is 17 years ago. The milestone, in the context of the article, is the report “Semantic Web Road Map.” I learned that Google was less than a month old at the time. I thought that Google was Backrub and that the work on what became Google began a couple, maybe three, years earlier. Who cares?

The Big Idea is that the Web is an information space. That sounds good.

Well, in 2012, something Big happened. According to the write up, Google figured out that 20 percent of its searches were “new.” Aren’t those pesky humans annoying? The article reports:

long tail keywords made up approximately 70 percent of all searches. What this told Google was that users were becoming interested in using their search engine as a tool for answering questions and solving problems, not just looking up facts and finding individual websites. Instead of typing “Los Angeles weather,” people started searching “Los Angeles hourly weather for March 1.” While that’s an extremely simplified explanation, the fact is that Google, Bing, Facebook, and other internet leaders have been working on what Colin Jeavons calls “the silent semantic revolution” for years now. Bing launched Satori, a knowledge storehouse that’s capable of understanding complex relationships between people, things, and entities. Facebook built Knowledge Graph, which reveals additional information about things you search, based on Google’s complex semantic algorithm called Hummingbird.

Yep, a new age dawned. The message in the article is that marketers have a great new opportunity to push their message in front of users. In my book, this is one reason why running a query on any of the ad supported Web search engines returns so much irrelevant information. In my just submitted Information Today column, I report how a query for the phrase “concept searching” returned results littered with a vendor’s marketing hoo-hah.

I did not want information about a vendor. I wanted information about a concept. But, alas, Google knows what I want. I don’t know what I want in the brave new world of search. The article ignores the lack of relevance in results, the dust binning of precision and recall, and the bogus information many search queries generate. Try to find current information about Dark Web onion sites and let me know how helpful the search systems are. In fact, name the top Tor search engines. See how far you get with Bing, Google, and Yandex. (DuckDuckGo and Ixquick seem to be aware of Tor content, by the way.)

So semantic in the context of this article boils down to four points:

  1. Think like an end user. I suppose one should not try to locate an explanation of “concept searching.” I guess Google knows I care about a company with a quite narrow set of technology focused on SharePoint.
  2. Invest in semantic markup. Okay, that will make sense to the content marketers. What if the system used to generate the content does not support the nifty features of the Semantic Web? OWL, who? RDF what? (A sketch of what such markup looks like appears after this list.)
  3. Do social. Okay, that’s useful. Facebook and Twitter are the go-to systems for marketing products, I assume. Who on Facebook cares about cyber OSINT or GE’s cratering petrochemical business?
  4. And the keeper, “Don’t forget about standard techniques.” This means search engine optimization. That SEO stuff is designed to make relevance irrelevant. Great idea.
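For readers who wonder what the semantic markup in point two actually looks like, here is a minimal sketch using the schema.org vocabulary serialized as JSON-LD, one of the lighter-weight cousins of RDF and OWL. The article title is the one under review; the author and date are hypothetical placeholders.

```python
import json

# Minimal sketch of schema.org markup expressed as JSON-LD.
# The author and date below are hypothetical placeholders.
article_markup = {
    "@context": "http://schema.org",
    "@type": "Article",
    "headline": "The Rapid Evolution of Semantic Search",
    "author": {"@type": "Person", "name": "Jane Marketer"},
    "datePublished": "2015-03-27",
    "about": "semantic search",
}

# Embedded in a page inside <script type="application/ld+json">,
# this is the structure crawlers parse to identify entities.
print(json.dumps(article_markup, indent=2))
```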

Net net: The write up underscores some of the issues associated with generating buzz for a small business like the ones INC Magazine tries to serve. With write ups like this one about Semantic Search, INC may be confusing their core constituency. Can confused executives close deals and make sense of INC articles? I assume so. I know I cannot.

Stephen E Arnold, March 27, 2015

Organizing Content is a Manual or Automated Pain

January 16, 2015

Organizing uploaded content is a pain in the rear. In order to catalog the content, users either have to add tags manually or use an automated system that requires several tedious fields to be filled out. CMS Wire explains the difficulties of document organization in “Stop Pulling Teeth: A Better Way To Classify Documents.” Manual tagging is the longer of the two processes, and if no one has created a set of tagging standards, tags will be raining down from the cloud in a content mess. Automated fields are not that bad to work with if you have one or two documents to upload, but if you have many files to process, you are more prone to enter the wrong information just to finish the job.

Apparently there is a happy medium:

“Encourage users to work with documents the way they normally do and use a third party tool such as an auto classification tool to extract text based content, products, subjects and terms out of the document. This will create good, standardized metadata to use for search refinement. It can even be used to flag sensitive information or report content detected with code names, personally identifiable information such as credit card numbers, social security numbers or phone numbers.”

While the suggestion is sound, we thought that auto-classification tools were normally built into collaborative content platforms like SharePoint. Apparently not. Third party software to improve enterprise platforms once more saves the day for the digital paper pusher.
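The flagging described in the quoted passage is, at bottom, pattern matching over extracted text. Here is a rough Python sketch, assuming simple regular expressions rather than any particular vendor’s method; production classifiers add validation such as Luhn checks for card numbers.

```python
import re

# Illustrative patterns only; real auto-classification tools use
# validation and far more robust rules than these regular expressions.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def flag_sensitive(text):
    """Return labels for any sensitive patterns found in a document."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]

print(flag_sensitive("Call 502-555-0147 about SSN 078-05-1120."))
# -> ['ssn', 'phone']
```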

Whitney Grace, January 16, 2015
Sponsored by ArnoldIT.com, developer of Augmentext

Enterprise Search: Confusing Going to Weeds with Being Weeds

November 30, 2014

I seem to run into references to this write up by an “expert.” I know the person is an expert because the author says:

As an Enterprise Search expert, I get a lot of questions about Search and Information Architecture (IA).

The source of this remarkable personal characterization is “Prevent Enterprise Search from going to the Weeds.” Spoiler alert: I am on record as documenting that enterprise search is at a dead end, unpainted, unloved, and stuck on the margins of big time enterprise information applications. For details, read the free vendor profiles at www.xenky.com/vendor-profiles or, if you can find them, read one of my books such as The New Landscape of Search.

Okay. Let’s assume the person writing the Weeds’ article is an “expert”. The write up is about misconcepts [sic]; specifically, crazy ideas about what a 50-plus-year-old technology can do. The solution to misconceptions is “information architecture.” Now I am not sure what “search” means. But I have no solid hooks on which to hang the notion of “information architecture” in this era of cloud based services. Well, the explanation of information architecture is presented via a metaphor:

The key is to understand: IA and search are business processes, rather than one-time IT projects. They’re like gardening: It’s up to you if you want a nice and tidy garden — or an overgrown jungle.

Gentle reader, the fact that enterprise search has been confused with search engine optimization is one thing. The fact that there are a number of companies happily leapfrogging the purveyors of utilities to make SharePoint better or improve automatic indexing is another.

Let’s look at each of the “misconceptions” and ask, “Is search going to the weeds or is search itself weeds?”

The starting line for the write up is that no one needs to worry about information architecture because search “will do everything for us.” How are thoughts about plumbing and a utility function equivalent? The issue is not whether a system runs on premises, from the cloud, or in some hybrid set up. The question is, “What has to be provided to allow a person to do his or her job?” In most cases, delivering something that addresses the employee’s need is overlooked. The reason is that the problem is one that requires the attention of individuals who know budgets, know goals, and know technology options. The confluence of these three characteristics is quite rare in my experience. Many of the “experts” working in enterprise search are either frustrated and somewhat insecure academics or individuals who bounced into a niche where the barriers to entry are a millimeter or two high.

Next there is a perception, asserts the “expert”, that search and information architecture are one time jobs. If one wants to win the confidence of a potential customer, explaining that the bills will just keep on coming is a tactic I have not used. I suppose it works, but the incredible turnover in organizations makes it easy for an unscrupulous person to just keep on billing. The high levels of dissatisfaction result from a number of problems. Pumping money into a failure is what prompted one French engineering company to buy a new search system and sideline the incumbent. Endless meetings about how to set up enterprise systems are ones to which search “experts” are not invited. The information technology professionals have learned that search is not exactly a career building discipline. Furthermore, search “experts” are left out of meetings because information technology professionals have learned that a search system will consume every available resource and produce a steady flow of calls to the help desk. Figuring out what to build still occupies Google and Amazon. Few organizations are able to do much more than embrace the status quo and wait until a mid tier consultant, a cost consultant, or a competitor provides the stimulus to move. Search “experts” are, in my experience, on the outside of serious engineering work at many information access challenged organizations. That’s a good thing in my view.

The middle example is what the expert calls “one size fits all.” Yep, that was the pitch of some of the early search vendors. These folks packaged keyword search and promised that it would slice, dice, and chop. The reality is that even the next generation information access companies with which I work focus on making customization as painless as possible. In fact, these outfits provide some ready-to-roll components, but where the rubber meets the road is providing information tailored to each team or individual user. At Target last night, my wife and I bought Christmas gifts for needy people. One of the gifts was a 3X sweater. We had a heck of a time figuring out if the store offered such a product. Customization is necessary for more and more everyday situations. In organizations, customization is the name of the game. The companies pitching enterprise search today lag behind next generation information access providers in this very important functionality. The reason is that the companies lack the resources and insight needed to deliver. But what about information architecture? How does one cloud based search service differ from another? Can you explain the technical and cost and performance differences between SearchBlox and Datastax?

The penultimate point is just plain humorous: Search is easy. I agree that search is a difficult task. The point is that no one cares how hard it is. What users want are systems that facilitate their decision making or work. In this blog I reproduced a diagram showing one firm’s vision for indexing. Suffice it to say that few organizations know why that complexity is important. The vendor has to deliver a solution that fits the technical profile, the budget, and the needs of an organization. Here is the diagram. Draw your own conclusion:

[Diagram: InfoLibrarian metadata and data governance building blocks]

The final point is poignant. Search, the “expert” says, can be a security leak. No, people are the security leak. There are systems that process open source intelligence and take predictive, automatic action to secure networks. If an individual wants to leak information, even today’s most robust predictive systems struggle to prevent that action. The most advanced systems from Centripetal Networks and Zerofox offer robust protections, but a determined individual can allow information to escape. What is wrong with search has to do with the way in which provided security components are implemented. Again we are back to people. Information architecture can play a role, but it is unlikely that an organization will treat search differently from legal information or employee pay data. There are classes of information to which individuals have access. The notion that a search system provides access to “all information” is laughable.
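The “classes of information” idea is usually implemented as security trimming: hits are filtered against a user’s entitlements before display. A minimal sketch, with hypothetical documents and group names:

```python
# Minimal sketch of security trimming. Documents, groups, and the
# entitlement model are hypothetical illustrations.
DOCUMENTS = [
    {"id": 1, "title": "Cafeteria menu", "allowed_groups": {"all-staff"}},
    {"id": 2, "title": "Employee pay data", "allowed_groups": {"hr", "payroll"}},
    {"id": 3, "title": "Litigation memo", "allowed_groups": {"legal"}},
]

def trim_results(hits, user_groups):
    """Drop any hit the user has no group-level right to see."""
    return [doc for doc in hits if doc["allowed_groups"] & user_groups]

# A user in no special group sees only the cafeteria menu.
print([d["title"] for d in trim_results(DOCUMENTS, {"all-staff"})])
```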

I want to step back from this “expert’s” analysis. Search has a long history. If we go back and look at what Fulcrum Technologies or Verity set out to do, the journeys of the two companies are quite instructive. Both moved quickly to wrap keyword search with a wide range of other functions. The reason for this was that customers needed more than search. Fulcrum is now part of OpenText, and you can buy nubbins of Fulcrum’s 30 year old technology today, but it is wrapped in huge wads of wool that comprise OpenText’s products and services. Verity offered some nifty security features and what happened? The company chewed through CEOs, became hugely bloated, struggled for revenues, and ended up as part of Autonomy. And what about Autonomy? HP is trying to answer that question.

Net net: This weeds write up seems to have a life of its own. For me, search is just weeds, clogging the garden of 21st century information access. The challenges are beyond search. Experts who conflate odd bits of jargon are the folks who contribute to confusion about why Lucene is just good enough so those in an organization concerned with results can focus on next generation information access providers.

Stephen E Arnold, November 30, 2014

Enterprise Search: Fee Versus Free

November 25, 2014

I read a pretty darned amazing article “Is Free Enterprise Search a Game Changer?” My initial reaction was, “Didn’t the game change with the failures of flagship enterprise search systems?” And “Didn’t the cost and complexity of many enterprise search deployments fuel the emergence of the free and open source information retrieval systems?”

Many proprietary vendors are struggling to generate sustainable revenues and pay back increasingly impatient stakeholders. The reality is that the proprietary enterprise search “survivors” fear meeting the fate of  Convera, Delphes, Entopia, Perfect Search, Siderean Software, TREX, and other proprietary vendors. These outfits went away.


Many vendors of proprietary enterprise search systems have left behind an environment in which revenues are simply not sustainable. Customers learned some painful lessons after licensing brand name enterprise search systems and discovering the reality of their costs and functionality. A happy quack to http://bit.ly/1AMHBL6 for this image of desolation.

Other vendors, faced with mounting costs and zero growth in revenues, sold their enterprise search companies. The spate of sell outs that began in the mid 2000s was stark evidence that the business of delivering information retrieval systems to commercial and governmental organizations was difficult to make work.

Consider these milestones:

Autonomy sold to Hewlett Packard. HP promptly wrote off billions of dollars and launched a fascinating lawsuit that blamed Autonomy for the deal. HP quickly discovered that Autonomy, like other complex content processing companies, was difficult to sell, difficult to support, and difficult to turn into a billion dollar baby.

Convera, the product of Excalibur’s scanning legacy and ConQuest Software, captured some big deals in the US government and with outfits like the NBA. When the system did not perform like a circus dog, the company wound down. One upside for Convera alums was that they were able to set up a consulting firm to keep other companies from making the Convera-type mistakes. The losses were measured in the tens of millions.


Choosing Office 365 or Azure

November 25, 2014

There is not just a single cloud, or Cloud with a capital C. Rather, there are multiple cloud-based services for SharePoint deployments. CMS Wire helps break down some of the choices that users face when determining which cloud to choose. They even have a handy survey at the end to make selection even simpler. Read more in their article, “SharePoint in the Clouds: Choosing Between Office 365 or Azure.”

The author begins:

“There are dozens of cloud hosting options for SharePoint, beyond Office 365. Amazon, Rackspace and Fpweb offer compelling alternatives to Microsoft’s public cloud for SharePoint online with a mix of capabilities. These capabilities fall on the spectrum between two options: 1) IaaS (Infrastructure as a service) — cloud hosted VMs on which YOU install Windows, SQL, SharePoint … 2) SaaS (Software as a service) — fully managed solution delivering SharePoint services with full subscribed provider managed availability, backup, performance, installation, etc.”

There are definitely pros and cons on both sides. If you need any help sorting through the various angles, turn to Stephen E. Arnold of ArnoldIT.com. He has spent his career following enterprise search and has assembled an impressive collection of tips, tricks, and news articles on his SharePoint feed.

Emily Rae Aldridge, November 25, 2014

Channel 9 Offers Office 365 REST Endpoint Training

November 20, 2014

With all the intricacies of SharePoint, continued training and education are important. Short training videos are getting easier to find, so users don’t have to subscribe to large training programs or hire someone to come in. It is worth giving these short tutorials a shot. We found an interesting one on Channel 9 called “Azure, Office 365, and SharePoint Online has REST endpoints with Mat Velloso.”

The summary says:

“Mat Velloso explains how to create applications and services in Azure that get permission to access OTHER applications like SharePoint! We’ll dig into the URL Structure of these services, see how to get events when things are updated, and figure out how ODATA and REST fit into these cloud building blocks.”
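For the curious, the REST and OData plumbing the summary mentions boils down to HTTP calls against SharePoint’s _api root. A minimal Python sketch; the site URL and token are hypothetical placeholders, and real calls require the Azure AD authentication the video walks through.

```python
import requests

SITE = "https://contoso.sharepoint.com/sites/team"  # hypothetical tenant
TOKEN = "eyJ..."  # placeholder OAuth access token obtained via Azure AD

# _api is SharePoint's OData endpoint root; this call lists the site's lists.
resp = requests.get(
    f"{SITE}/_api/web/lists",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/json;odata=verbose",
    },
)
for sp_list in resp.json()["d"]["results"]:
    print(sp_list["Title"])
```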

Stephen E. Arnold of ArnoldIT.com pays a good amount of attention to training and continuing education regarding SharePoint. His web service, ArnoldIT.com, is devoted to all things search, including a large SharePoint feed that helps users and managers stay on top of the latest tips, tricks, and news that may affect their implementation.  Keep an eye out for further learning opportunities.

 

Emily Rae Aldridge, November 20, 2014

Microsoft Delve A Useful Spy

October 16, 2014

Microsoft is adding a new big data piece to its Office 365 lineup. And in a bit of a change of direction for the company, Microsoft has sought to make this element aesthetically pleasing as it points out patterns of likes and dislikes. Read more about Microsoft Delve in the InfoWorld article, “Microsoft’s Delve: The Office 365 Spy You Just Might Love.”

The article says:

“Microsoft’s Delve is an intriguing new offering for Office 365 business customers. Previously known as Oslo, Delve brings a concierge, Instragram-like pulse to business environments, as curated by Office Graph, sophisticated machine-learning technology that maps relationships between people, content, and activity across Office 365 accounts. Delve pulls content from within your organization’s OneDrive, SharePoint, and Yammer accounts, serving it up to users in a card-based interface reminiscent of Pinterest.”

The jury is still out on how helpful the product will really be in the business environment. It does behave within existing permissions, only showing users content they are granted permission to see. Stephen E. Arnold is a longtime leader in search and reports on the latest news in his SharePoint feed. Since Delve may have helpful implications for SharePoint, keep an eye on ArnoldIT.com for all the latest tips and tricks.

Emily Rae Aldridge, October 16, 2014

Microsoft Azure Price Cuts? Maybe More Bad News for Search Vendors

September 26, 2014

The race for commodity pricing in cloud computing is underway. I read an article, which I assume is semi-accurate, called “Microsoft Azure Sees Big Price Reductions: Competition Is Good.” “Good” is often a relative term.

For those looking for low cost cloud computing that delivers Azure functions, lower prices mean that Amazon- and Google-type prices may be too high.

For a vendor trying to pitch an information retrieval system to a Microsoft centric outfit, the falling prices may mean that Azure Search is not just good enough. It is a deal. The only systems that can be less expensive are those one downloads from an open source repository or one that a hard worker codes herself.

The write up states:

Microsoft has announced, in a blog post, that it will be slashing the cost of some of its Azure cloud services from October 1st….customers buying through Enterprise agreements will enjoy even lower prices. The rate card currently shows 63 services being reduced by up to about 40%.

For enterprise search vendors chasing SharePoint licensees with promises of better, faster, and cheaper—the move by Microsoft is likely to be of interest.

I anticipate that search vendors will scramble even harder than ever. Furthermore, I look forward to even more outrageous assertions about the value of content processing. As an example, check out this set of assertions about an open source based system that has been scrambling for purchase on the sales mountain for six or seven years.

Stephen E Arnold, September 26, 2014

Launching and Scaling Elasticsearch

August 21, 2014

Elasticsearch is widely hailed as an alternative to SharePoint and to many other proprietary and open source search options, but it is not without its problems. Ben Hundley of StackSearch offers his input on the software in his QBox article, “Thoughts on Launching and Scaling Elasticsearch.”

Hundley begins:

“Qbox is a dedicated hosting service for Elasticsearch.  The project began internally to find a more economical solution to Amazon’s Cloudsearch, but it evolved as we became enamored by the flexibility and power of Elasticsearch.  Nearly a year later, we’ve adopted the product as our main priority.  Admittedly, our initial attempt took the wrong approach to scale.  Our assumption was that scaling clusters for all customers could be handled in a generalized manner, and behind the scenes.”

Hundley walks the reader through several considerations that shape an implementation: knowing your application’s needs, deciding on hardware, monitoring, tuning, and knowing when to scale. These are all decisions that must be made up front, allowing for more effective customization. The upside of an open source solution like Elasticsearch is greater customization, more control, and less rigidity. Of course, for a small organization, that could also be the downside, as time and staffing are more limited and an out-of-the-box solution like SharePoint is more likely to be chosen.
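To make the up-front decision point concrete, here is a minimal sketch using the Python Elasticsearch client (older, pre-8.x call signatures): shard count is fixed at index creation, so capacity planning happens before the first document is indexed. The host and index name are hypothetical.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical host

es.indices.create(
    index="documents",
    body={
        "settings": {
            "number_of_shards": 5,    # fixed at creation; changing means reindexing
            "number_of_replicas": 1,  # can be adjusted live as load grows
        }
    },
)

# refresh=True makes the document immediately searchable for this demo
es.index(index="documents", body={"title": "Scaling notes", "team": "ops"}, refresh=True)
print(es.search(index="documents", body={"query": {"match_all": {}}}))
```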

Emily Rae Aldridge, August 21, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Does Anything Matter Other Than the Interface?

August 7, 2014

I read what I thought was a remarkable public relations story. You will want to check the write up out for two reasons. First, it demonstrates how content marketing converts an assertion into what a company believes will generate business. And, second, it exemplifies how a fix can address complex issues in information access. You may, like Archimedes, exclaim, “I have found it.”

The title and subtitle of the “news” are:

NewLane’s Eureka! Search Discovery Platform Provides Self-Servicing Configurable User Interface with No Software Development. Eureka! Delivers Outstanding Results in the Cloud, Hybrid Environments, and On Premises Applications.

My reaction was, “What?”

The guts of the NewLane “search discovery platform” is explained this way:

Eureka! was developed from the ground up as a platform to capture all the commonalities of what a search app is and allows for the easy customization of what a company’s search app specifically needs.

I am confused. I navigated to the company’s Web site and learned:

Eureka! empowers key users to configure and automatically generate business applications for fast answers to new question that they face every day. http://bit.ly/V0E8pI

The Web site explains:

Need a solution that provides a unified view of available information housed in multiple locations and formats? Finding it hard to sort among documents, intranet and wiki pages, and available reporting data? Create a tailored view of available information that can be grouped by source, information type or other factors. Now in a unified, organized view you can search for a project name and see results for related documents from multiple libraries, wiki pages from collaboration sites, and the profiles of project team members from your company’s people directory or social platform.

“Unified information access” is a buzzword used by Attivio and PolySpot, among other search vendors. The Eureka! approach seems to be an interface tool for “key users.”

Here’s the Eureka technology block diagram:

[Eureka! technology block diagram]

Notice that Eureka! has connectors to access the indexes in Solr, the Google Search Appliance, Google Site Search, and a relational database. The content that these indexing and search systems can access include Documentum, Microsoft SharePoint, OpenText LiveLink, IBM FileNet, files shares, databases (presumably NoSQL and XML data management systems as well), and content in “the cloud.”

For me the diagram makes clear that NewLane’s Eureka! is an interface tool. A “key user” can create an interface to access content of interest to him or her. I think there are quite a few people who do not care where data come from or what academic nit picking went on to present information. The focus is on getting a harried professional, like an MBA who has to make a decision “now,” the information he or she needs.
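As a rough illustration of what such an interface layer does, here is a sketch that fans one query out to two back-end Solr cores and merges the hits into a single view. The hosts, core names, and fields are hypothetical; only Solr’s standard /select query interface is assumed, and a real Eureka!-style tool adds connectors, security, and configuration on top.

```python
import requests

# Hypothetical back-end indexes; only Solr's /select?q=...&wt=json
# interface is standard. A real product layers connectors on top.
SOURCES = {
    "wiki": "http://solr.internal:8983/solr/wiki/select",
    "docs": "http://solr.internal:8983/solr/docs/select",
}

def unified_search(query):
    """Fan the query out to each source and merge the hits."""
    merged = []
    for source, url in SOURCES.items():
        resp = requests.get(url, params={"q": query, "wt": "json", "rows": 5})
        for hit in resp.json()["response"]["docs"]:
            hit["_source"] = source  # tag provenance for the unified view
            merged.append(hit)
    return merged

for hit in unified_search("project kickoff"):
    print(hit["_source"], hit.get("title"))
```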


Archimedes allegedly jumped from his bath, ran into the street, and shouted “Eureka.” He was reacting, I learned from a lousy math teacher, to a mathematical insight about displacement. The teacher did not tell me that Archimedes was killed because he was working on a math problem and ignored a Roman soldier’s command to quit calculating. Image source: http://blocs.xtec.cat/sucdecocu/category/va-de-cientifics/

I find interfaces a bit like my wife’s questions about the color of paint to use for walls. She shows me antique ivory and then parchment. For me, both are white. But for her, the distinctions are really important. She knows nothing about paint chemistry, paint cost, and application time. She is into the superficial impact the color has for her. To me, the colors are indistinguishable. I want to know about durability, how many preparation steps the painter must go through between brands, and the cost of getting the room painted off white.

Interfaces for “key users” work like this in my experience. The integrity of the underlying data, the freshness of the indexes, the numerical recipes used to prioritize the information in a report are niggling details of zero interest to many system users. An answer—any answer—may be good enough.

Eureka! makes it easier to create interfaces. My view is that a layer on top of connectors, on top of indexing and content processing systems, on top of wildly diverse content is interesting. However, I see the interfaces as a type of paint. The walls look good but the underlying structure may be deeply flawed. The interface my wife uses for her walls does not address the fact that the wallboard has to be replaced BEFORE she paints again. When I explain this to her when she wants to repaint the garage walls, she says, “Why can’t we just paint it again?” I don’t know about you, but I usually roll over, particularly if it is a rental property.

Now what does the content marketing-like “news” story tell me about Eureka!

I found this statement yellow highlight worthy:

Seth Earley, CEO of Earley and Associates, describes the current global search environment this way, “What many executives don’t realize is that search tools and technologies have advanced but need to be adapted to the specific information needed by the enterprise and by different types of employees accomplishing their tasks. The key is context. Doing this across the enterprise quickly and efficiently is the Holy Grail. Developing new classes of cloud-based search applications are an essential component for achieving outstanding results.”

Yep, context is important. My hunch is that the context of the underlying information is more important. Mr. Earley, who sponsored an IDC study by an “expert” named Dave Schubmehl on what I call information saucisson, is an expert on the quasi academic “knowledge quotient” jargon. He, in this quote, seems to be talking about a person in shipping or a business development professional being able to use Eureka! to get the interface that puts needed information front and center. I think that shipping departments use dedicated systems whose data typically do not find their way into enterprise information access systems. I also think that business development people use Google, whatever is close at hand, and enterprise tools if there is time. When time is short, concise reports can be helpful. But what if the data on which the reports are based are incorrect, stale, incomplete, or just wrong? Well, that is not a question germane to a person focused on the “Holy Grail.”

I also noted this statement from Paul Carney, president and founder of NewLane:

The full functionality of Eureka! enables understaffed and overworked IT departments to address the immediate search requirements as their companies navigate the choppy waters of lessening their dependence on enterprise and proprietary software installations while moving critical business applications to the Cloud. Our ability to work within all their existing systems and transparently find content that is being migrated to the Cloud is saving time, reducing costs and delivering immediate business value.

The point is similar to what Google has used to sell licenses for its Google Search Appliance. Traditional information technology departments can be disintermediated.

If you want to know more about NewLane, navigate to the company’s Web site. Keep a bathrobe handy if you review the site while relaxing in a pool or hot tub. Like Archimedes, you may have an insight and jump from the water and run through the streets to tell others about it.

Stephen E Arnold, August 7, 2014
