Google and Removed Links for Pirated Content

January 5, 2015

I read “Google Received 345 Million Pirate link Removal Requests in 2014.” In 2008, Google received 62 requests. In 2014, Google received requests to remove 345,169,134 links. As the article points out, that’s around a million links a day.

The notion of a vendor indexing “all” information is a specious one. Trickier still, one cannot find information that is blocked from the public index. How will copyright owners find violators? Is there an index for Dark Net content?

My thought is that finding information today is more difficult than it was when I was in college. Sixty years of progress.

Stephen E Arnold, January 5, 2015

Faceted Search: From the 1990s to Forever and Ever

January 4, 2015

Keyword retrieval is useful. But it is not good for some tasks. In the late 1990s, Endeca’s founders “invented” a better way. The name that will get you a letter from a lawyer is “guided navigation.” The patents make clear the computational procedure required to make facets work.

The more general name of the feature is “faceted navigation.” For those struggling with indexing, faceted navigation “exposes” the users to content options. This works well if the domain is reasonably stable, the corpus small, and the user knows generally what he or she needs.

To get a useful review of this approach to findability, check out “Faceted Navigation.” Now five years old, the write up will add logs to the fires of taxonomy. However, faceted search is not next generation information access. Faceted navigation is like a flintlock rifle used by Lewis and Clark. Just don’t try to kill any Big Data bears with the method. And Twitter outputs? Look elsewhere.

Stephen E Arnold, January 4, 2015

SAP Hana Search 2014

December 25, 2014

Years ago I wrote an analysis of TREX. At the time, SAP search asserted a wide range of functionality. I found the system interesting, but primarily of use to die hard SAP licensees. SAP was and still is focused on structured data. The wild and crazy heterogeneous information generated by social media, intercept systems, geo-centric gizmos, and humans blasting terabytes of digital images cheek by jowl with satellite imagery is not the playground of the SAP technology.

If you want to get a sense of what SAP is delivering, check out “SAP Hana’s Built-In Search Engine.” My take on the explanation is that it is quite similar to what Fast Search & Transfer proposed for the pre-sale changes to ESP. The built-in system is not one thing. The SAP explainer points out:

A standalone “engine” is not enough, however. That’s why SAP HANA also includes the Info Access “InA” toolkit for HTML5. The InA toolkit is a set of HTML5 templates and UI controls which you can use to configure a modern, highly interactive UI running in a browser. No code – just configuration.
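For readers who want a concrete sense of what the built-in engine looks like beneath the InA toolkit, HANA exposes full text search through SQL. The following is a minimal sketch under stated assumptions, not SAP reference code: it presumes a HANA table with a full text index on a TITLE column and uses SAP’s hdbcli Python driver; the host, credentials, and table and column names are hypothetical.

```python
# Minimal sketch: querying SAP HANA's built-in full text search from Python.
# Assumes a table DOCS with a full text index on TITLE; the connection
# details, table, and column names are hypothetical.
from hdbcli import dbapi  # SAP's Python driver for HANA

conn = dbapi.connect(
    address="hana.example.com",  # hypothetical host
    port=30015,
    user="SEARCH_DEMO",
    password="********",
)

try:
    cursor = conn.cursor()
    # CONTAINS with FUZZY is the SQL doorway to the search engine; 0.8 is a
    # similarity threshold, so a slightly misspelled query can still match.
    cursor.execute(
        "SELECT SCORE() AS RELEVANCE, DOC_ID, TITLE FROM DOCS "
        "WHERE CONTAINS(TITLE, ?, FUZZY(0.8)) "
        "ORDER BY RELEVANCE DESC",
        ("enterprise search",),
    )
    for relevance, doc_id, title in cursor.fetchall():
        print(round(relevance, 3), doc_id, title)
finally:
    conn.close()
```

The InA toolkit described in the quotation sits on top of this kind of query and handles the HTML5 user interface through configuration rather than code.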

To make matters slightly more confusing, I read “Google Like Enterprise Search Powered by SAP Hana.” I am not sure what “Google like” means. Google provides its aging and expensive Google Search Appliance. But like Google Earth, I am not sure how long the GSA will remain on the Google product punch list. Furthermore, the GSA is a bit of a time capsule. Its features and functions have not kept pace with next generation information access technologies. Google invested in Recorded Future a couple of years ago and as far as I know, none of the high value Recorded Future functions are part of the GSA. Google also delivers its Web indexing service. Does “Google like” refer to the GSA, Google’s cloud indexing of Web sites, or the forward looking Recorded Future technology?

The Google angle seems to relate to Fiori search. Based on the screenshots, it appears that Fiori presents SAP’s structured data in a report format. Years ago we used a product called Monarch to deliver this type of information to a client.

My hypothesis is that SAP wants to generate more buzz about its search technology. The company has moved on from TREX, positioned Hana search as a Fast Search emulation, and created Fiori to generate reports from SAP’s structured data management system.

For now, I will keep SAP in my “maybe next year” folder. I am not sure what SAP information access systems deliver beyond basic keyword search, some clustering, and report outputs. SAP at some point may have to embrace open source search solutions. If SAP has maintained its commitment to open source, perhaps these search technologies already rest on open source components. I would find that reassuring.

Regardless of what SAP is providing licensees, it is clear that the basic features and functions of next generation information access systems are not part of the present line up of products. Like other IBM-inspired companies, SAP is watching the future rush forward while its search technology recedes in tomorrow’s rear view mirror. Calling a system “Google like” is not helpful, nor does it suggest that SAP is aware of NGIA systems. Some of SAP’s customers will be licensing these systems in order to move beyond what is a variation of query, scan results, open documents, read documents, and hunt for useful information. Organizations require more sophisticated information access services. The models crafted in the 1990s are, in my opinion, commoditized. Higher value NGIA operations are the future.

Stephen E Arnold, December 25, 2014

Blast toward the Moon With Rocket Software

December 8, 2014

YouTube informational videos are great. They are short, snappy, and often help people retain more information about a product than reading the “about” page on a Web site. Rocket Software has its own channel and the video “Rocket Enterprise Search And Text Analytics” packs a lot of detail into 2:49. The video is described as:

“We provide an integrated search platform for gathering, indexing, and searching both structured and unstructured data, making the information that you depend on more accessible, useful, and intelligent.”

How does Rocket Software defend that statement? The video opens with a prediction that by 2020 data usage will have increased to forty trillion gigabytes. It explains that data is the new enterprise currency and that it needs to be kept organized, then it drops into a plug for the company’s software. They compare themselves to other companies by saying Rocket Software makes enterprise search and text analytics as simple as a download: install it and it is up and running. Other enterprise search systems require custom coding, but Rocket Software says it offers these options out of the box. Plus, it is a cheaper product that does not sacrifice quality.

Software adoption these days hinges on functionality and ease of use, even for powerful products. Rocket Software states it offers both. Try putting it to the test.

Whitney Grace, December 08, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Finding Books: Not Much Has Changed

December 1, 2014

Three or four years ago I described what I called “the book findability” problem. The audience was a group of confident executives trying to squeeze money from an old school commercial database model. Here’s how the commercial databases worked in 1979.

  1. Take content from published sources
  2. Create a bibliographic record, write or edit the abstract included with the source document
  3. Index it with no more than three to six index terms
  4. Digitize the result
  5. Charge a commercial information utility to make it available
  6. Get a share of the revenues.

That worked well until the first Web browser showed up and individuals and institutions began making information available online. A number of companies still use variations of this old school business model, from newspapers that charge Web users for access to content to outfits like LexisNexis, Ebsco, and Cambridge Scientific Abstracts.


As libraries and individuals resist online fees, many of the old school outfits are going to have to come up with new business models. But adaptation will not be easy. Amazon is in the content business. Why buy a Cliff’s Notes-type summary when there are Amazon reviews? Why pay for news when a bit of sleuthing will turn up useful content from outfits like the United Nations or off the radar sources like World News at www.wn.com? Tech information is going through a bit of an author revolt. While not on the level of protests in Hong Kong, a lot of information that used to be available in research libraries or from old school database providers is available online. At some point, peer reviewed journals and their charge-the-author business models will have to reinvent themselves. Even recruitment services like LinkedIn offer useful business information via Slideshare.com.

One black hole concerns finding out what books are available online. A former intelligence officer with darned good research skills was not able to locate a copy of my The New Landscape of Search. You can find it here for free.

I read “Location, Location: GPS in the Medieval Library.” The use of coordinates to locate a book on a shelf or hanging from a wall anchored by a chain is not new to those who have fooled around with medieval manuscripts. Remember that I used to index medieval sermons in Latin as early as 1963.

What the write up triggered was the complete and utter failure of indexing services to make an attempt to locate, index, and provide a pointer to books regardless of form. The baloney about indexing “all” information is shown to be a toothless dragon. The failure of the Google method and the flaws of the Amazon, Library of Congress, and commercial database providers are evident.

Now back to the group of somewhat plump, red faced, confident wizards of commercial database expertise. The group found my suggestion laughable. No big deal. I try to avoid Titanic-type operations. I collected my check and hit the road.

There are domains of content that warrant better indexing. Books, unfortunately, are one content domain that makes me long for the approach that put knowledge in one place with a system that at least worked and could be supplemented by walking around and looking.

No such luck today.

Stephen E Arnold, December 1, 2014

Enterprise Search: Confusing Going to Weeds with Being Weeds

November 30, 2014

I keep running into references to a write up by an “expert.” I know the person is an expert because the author says:

As an Enterprise Search expert, I get a lot of questions about Search and Information Architecture (IA).

The source of this remarkable personal characterization is “Prevent Enterprise Search from going to the Weeds.” Spoiler alert: I am on record as documenting that enterprise search is at a dead end, unpainted, unloved, and stuck on the margins of big time enterprise information applications. For details, read the free vendor profiles at www.xenky.com/vendor-profiles or, if you can find them, read one of my books such as The New Landscape of Search.

Okay. Let’s assume the person writing the Weeds’ article is an “expert”. The write up is about misconcepts [sic]; specifically, crazy ideas about what a 50 year plus old technology can do. The solution to misconceptions is “information architecture.” Now I am not sure what “search” means. But I have no solid hooks on which to hang the notion of “information architecture” in this era of cloud based services. Well, the explanation of information architecture is presented via a metaphor:

The key is to understand: IA and search are business processes, rather than one-time IT projects. They’re like gardening: It’s up to you if you want a nice and tidy garden — or an overgrown jungle.

Gentle reader, the fact that enterprise search has been confused with search engine optimization is one thing. The fact that there are a number of companies happily leapfrogging the purveyors of utilities to make SharePoint better or improve automatic indexing is another.

Let’s look at each of the “misconceptions” and ask, “Is search going to the weeds or is search itself weeds?”

The starting line for the write up is that no one needs to worry about information architecture because search “will do everything for us.” How are thoughts about plumbing and a utility function equivalent? The issue is not whether a system runs on premises, from the cloud, or in some hybrid set up. The question is, “What has to be provided to allow a person to do his or her job?” In most cases, delivering something that addresses the employee’s need is overlooked. The reason is that the problem is one that requires the attention of individuals who know budgets, know goals, and know technology options. The confluence of these three characteristics is quite rare in my experience. Many of the “experts” working in enterprise search are either frustrated and somewhat insecure academics or individuals who bounced into a niche where the barriers to entry are a millimeter or two high.

Next there is a perception, asserts the “expert”, that search and information architecture are one time jobs. If one wants to win the confidence of a potential customer, explaining that the bills will just keep on coming is a tactic I have not used. I suppose it works, but the incredible turnover in organizations makes it easy for an unscrupulous person to just keep on billing. The high levels of dissatisfaction result from a number of problems. Pumping money into a failure is what prompted one French engineering company to buy a search system and sideline the incumbent. Endless meetings about how to set up enterprise systems are ones to which search “experts” are not invited. The information technology professionals have learned that search is not exactly a career building discipline. Furthermore, search “experts” are left out of meetings because information technology professionals have learned that a search system will consume every available resource and produce a steady flow of calls to the help desk. Figuring out what to build still occupies Google and Amazon. Few organizations are able to do much more than embrace the status quo and wait until a mid tier consultant, a cost consultant, or a competitor provides the stimulus to move. Search “experts” are, in my experience, on the outside of serious engineering work at many information access challenged organizations. That’s a good thing in my view.

The middle example is what the expert calls “one size fits all.” Yep, that was the pitch of some of the early search vendors. These folks packaged keyword search and promised that it would slice, dice, and chop. The reality is that even the next generation information access companies with which I work focus on making customization as painless as possible. In fact, these outfits provide some ready-to-roll components, but where the rubber meets the road is providing information tailored to each team or individual user. At Target last night, my wife and I bought Christmas gifts for needy people. One of the gifts was a 3X sweater. We had a heck of a time figuring out if the store offered such a product. Customization is necessary for more and more everyday situations. In organizations, customization is the name of the game. The companies pitching enterprise search today lag behind next generation information access providers in this very important functionality. The reason is that the companies lack the resources and insight needed to deliver. But what about information architecture? How does one cloud based search service differ from another? Can you explain the technical and cost and performance differences between SearchBlox and Datastax?

The penultimate point is just plain humorous: Search is easy. I agree that search is a difficult task. The point is that no one cares how hard it is. What users want are systems that facilitate their decision making or work. In this blog I reproduced a diagram showing one firm’s vision for indexing. Suffice it to say that few organizations know why that complexity is important. The vendor has to deliver a solution that fits the technical profile, the budget, and the needs of an organization. Here is the diagram. Draw your own conclusion:

[Diagram: InfoLibrarian metadata and data governance building blocks]
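To make the complexity concrete, here is a deliberately toy sketch of just a few of the building blocks a diagram like that implies: metadata capture, controlled vocabulary mapping, and access tagging before anything reaches an index. The field names, taxonomy, and rules are invented for illustration; they are not taken from the vendor’s diagram.

```python
# Toy sketch of a few indexing "building blocks" implied by diagrams like the
# one above: capture metadata, map text to a controlled vocabulary, and attach
# access controls before indexing. All names and rules here are invented.
from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    doc_id: str
    title: str
    body: str
    source: str                                  # where the content came from
    taxonomy_terms: list = field(default_factory=list)
    access_groups: list = field(default_factory=list)


# A hypothetical controlled vocabulary mapping raw phrases to preferred terms.
TAXONOMY = {
    "faceted navigation": "navigation/faceted",
    "keyword search": "retrieval/keyword",
}


def build_record(doc_id, title, body, source, access_groups):
    record = IndexRecord(doc_id, title, body, source,
                         access_groups=list(access_groups))
    text = f"{title} {body}".lower()
    # Vocabulary mapping: one of many governance steps a real pipeline needs.
    record.taxonomy_terms = [preferred for raw, preferred in TAXONOMY.items()
                             if raw in text]
    return record


record = build_record("doc-1", "Faceted Navigation Basics",
                      "Keyword search alone is not enough.",
                      source="intranet", access_groups=["marketing"])
print(record.taxonomy_terms)  # ['navigation/faceted', 'retrieval/keyword']
```

Even this stripped down version needs a schema, a vocabulary, and an owner for each; multiply by every content source and the diagram’s tangle of boxes starts to look inevitable.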

The final point is poignant. Search, the “expert” says, can be a security leak. No, people are the security leak. There are systems that process open source intelligence and take predictive, automatic action to secure networks. If an individual wants to leak information, even today’s most robust predictive systems struggle to prevent that action. The most advanced offerings from Centripetal Networks and Zerofox are robust, but a determined individual can allow information to escape. What is wrong with search has to do with the way in which the provided security components are implemented. Again we are back to people. Information architecture can play a role, but it is unlikely that an organization will treat search differently from legal information or employee pay data. There are classes of information to which individuals have access. The notion that a search system provides access to “all information” is laughable.
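The “classes of information” point is easy to illustrate. Below is a minimal sketch of query-time access trimming: hits are filtered against the groups a user belongs to before results are returned. The document store, group names, and matching logic are hypothetical and are not drawn from any vendor mentioned here.

```python
# Minimal sketch of query-time access trimming: the search function returns
# only documents whose access classes intersect the user's groups. Documents,
# groups, and the matching rule are hypothetical and simplified.
DOCUMENTS = [
    {"id": "pay-2014", "text": "employee pay data", "groups": {"hr"}},
    {"id": "memo-17", "text": "legal hold memo", "groups": {"legal"}},
    {"id": "faq-3", "text": "travel policy faq", "groups": {"everyone"}},
]


def search(query, user_groups):
    """Naive keyword match followed by an access-class filter."""
    terms = query.lower().split()
    hits = [doc for doc in DOCUMENTS
            if any(term in doc["text"] for term in terms)]
    # Security trimming: drop anything the user is not entitled to see.
    return [doc for doc in hits if doc["groups"] & set(user_groups)]


print([d["id"] for d in search("pay policy", ["everyone"])])        # ['faq-3']
print([d["id"] for d in search("pay policy", ["hr", "everyone"])])  # ['pay-2014', 'faq-3']
```

The trimming itself is trivial; keeping the group assignments accurate as people change roles is the part that comes back to people, which is the point above.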

I want to step back from this “expert’s” analysis. Search has a long history. If we go back and look at what Fulcrum Technologies or Verity set out to do, the journeys of the two companies are quite instructive. Both moved quickly to wrap keyword search with a wide range of other functions. The reason for this was that customers needed more than search. Fulcrum is now part of OpenText, and you can buy nubbins of Fulcrum’s 30 year old technology today, but it is wrapped in huge wads of wool that comprise OpenText’s products and services. Verity offered some nifty security features and what happened? The company chewed through CEOs, became hugely bloated, struggled for revenues, and ended up as part of Autonomy. And what about Autonomy? HP is trying to answer that question.

Net net: This weeds write up seems to have a life of its own. For me, search is just weeds, clogging the garden of 21st century information access. The challenges are beyond search. Experts who conflate odd bits of jargon contribute to the confusion. Lucene is just good enough for basic retrieval, which frees those in an organization concerned with results to focus on next generation information access providers.

Stephen E Arnold, November 30, 2014

Enterprise Search: Hidden and Intentional Limitations

November 29, 2014

Several years ago, an ArnoldIT team tackled a content management search system that had three characteristics: [1] The system could not locate content a writer saved to the system. [2] The search results lacked relevance that made sense to the user; that is, time sequences for versions of the article were in neither ascending nor descending order. [3] Response time was sluggish. Let’s look at each of these obvious problems.

The user wanted to access a document that required final touch ups. The article did not appear in the results even when the writer entered its full title. Examination of the system revealed that it lacked keyword search, relying on a training set of documents that did not contain some of the words in the title. The fix was to teach the user to search for articles using words and concepts that appeared in the body of the article. We also identified an indexing latency problem. The system lacked sufficient resources, so recent material was not in the index until the indexing system caught up. The organization did not have the money to add resources, so the writers were told, “Live with it.”
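The missing piece in that system was ordinary keyword lookup. A minimal inverted index, sketched below with invented documents and titles, shows why a full-title query succeeds when title words are indexed directly rather than filtered through a fixed training vocabulary.

```python
# Minimal sketch of the keyword lookup the CMS lacked: an inverted index maps
# every word in a document (title included) to the documents containing it,
# so a full-title query finds the draft. Documents and titles are invented.
from collections import defaultdict

DOCS = {
    "draft-42": "Quarterly Widget Outlook: final touch ups needed",
    "draft-43": "Holiday schedule reminder for the writing team",
}


def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().replace(":", " ").split():
            index[word].add(doc_id)
    return index


def keyword_search(index, query):
    """Return documents containing every query term (a simple AND search)."""
    term_sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()


index = build_index(DOCS)
print(keyword_search(index, "quarterly widget outlook"))  # {'draft-42'}
```

The latency problem is a separate matter: even a correct index cannot return a document that has not yet been added to it.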


What is that vendor doing amidst the prospects? A happy quack to http://bit.ly/1CrUc81

The chaos in the relevance function was a result of a configuration error during an upgrade and subsequent reindexing. The fix was to reconfigure the system and reindex the content. Keep in mind that indexing required more than two weeks. The attitude of the client was, “It is what it is.”

The third problem was the response time. This particular system used a product from a Canadian vendor of search and content management systems. The firm acquires companies and then “milks” the system. The idea is that updating and bug fixing are expensive. The problem was exacerbated because the search system was a stub provided by another vendor. The result was that in order to get more robust performance, the client had to upgrade the OEM search system AND add computers, memory, and network infrastructure. The client lacked the money to take these actions.

What were the hidden limitations of this enterprise search system?

On the client side, there was a lack of understanding of the interdependencies of a complex system. The client lacked expertise and, more importantly, money to address the problems of an “as is” infrastructure. Moving the operation to the cloud was not possible due to security concerns, a lack of knowledge about what to do and how to do it, and a lack of initiative on the management team. The conclusion was, “The system is good enough.”

On the vendor side, the marketing team closing the deal did not disclose that the OEM search system was a stub designed to upsell the licensee the seven figure “fix”. The vendor also did not reveal that funds were not being invested in the system due to the vendor’s “milk the cow” philosophy.

I point out this situation because it applies to many vendors of enterprise search systems. The object of the game on the client side is to get a good enough system at the best possible price. Senior managers are rarely experts in search and often search Google anyway.

The vendor has zero incentive to change its business practices. Even with low cost options available, once a prospect becomes a customer, lock in usually keeps the account alive. When a switch is in the wind, the search vendor plays the “legacy” card, pointing out that there are some people who need the older system. As a result, the licensing organization ends up with multiple search systems. The demand for money just goes up and findability remains a bit of a challenge for employees.


I do not see a fix under the present enterprise search business model. Education does not work. Search conferences dodge the tough issues, preferring to pander to vendors who buy exhibit stands and sponsor lunch.

Something different is needed: Different vendors, different approaches, and different assemblies of technology.

That’s why next generation information access is going to pose a threat to companies that pitch enterprise search disguised as customer support, business intelligence, analysis systems, and eDiscovery tools.

At some point, the NGIA vendors will emerge as the go-to vendors. Like once promising but now quiet outfits such as Hakia and Autonomy, search as it has been practiced for decades is rapidly becoming a digital Antikythera mechanism.

Stephen E Arnold, November 29, 2014

Mid Tier Consultants Try the Turkey Tactic

November 27, 2014

Entering 2015, the economy is not ripping along as some of the MBAs suggest. Life is gloomy for many keyword search, content management, and indexing system vendors. And for good reason. These technologies have run their course. Instead of being must have enterprise solutions, the functions are now utilities. The vendors of proprietary systems have to realize that free and open source systems provide “good enough” utility type functions.

Perhaps this brutal fact is the reason that search “expert” Dave Schubmehl recycled our research on open source software, tried to flog it on Amazon without my permission, and then quietly removed the reports based on ArnoldIT research. When a mid tier consulting firm cannot sell eight pages of recycled research, taken without permission and garbling our work, via Amazon for the quite incredible price of $3,500, you know that times are tough for the mid tier crowd.


Are the turkeys the mid-tier consultants or their customers? Predictions about the future wrapped in the tin foil of jargon may not work like touts who pick horses. The difference between a mid tier consulting firm and a predictive analytics firm is more than the distance between an art history major and a PhD in mathematics with a Master’s in engineering and an undergraduate degree in history in my opinion.

Now the focus at the mid tier consulting firms is turning to the issue of raising turkeys. Turkeys are stellar creatures. Is it true that the turkey is the only fowl that will drown itself during a rain storm? My grandmother told me the birds will open their beaks and capture the rain. According to the Arnold lore, some lightning quick turkeys will drown themselves.

For 2015, the mid tier consultants want to get the Big Data bird moving. Also, look for the elegant IoT or Internet of Things to get the blue ribbon treatment. You can get a taste of this dish in this news release: “IDC Hosts Worldwide Internet of Things 2015 Predictions Web Conference.”

Yep, a Web conference. I call this a webinar, and I have webinar fatigue. The conference is intended to get the turkeys in the barn. Presumably some of the well heeled turkeys will purchase the IDC Future Scape report. When I mentioned this to a person with whom I spoke yesterday, I think that individual said, “A predictions conference. You are kidding me.” And, no, I wasn’t. Here’s the passage I noted:

“The Internet of Things will give IT managers a lot to think about,” said Vernon Turner, Senior Vice President of Research at IDC. “Enterprises will have to address every IT discipline to effectively balance the deluge of data from devices that are connected to the corporate network. In addition, IoT will drive tough organizational structure changes in companies to allow innovation to be transparent to everyone, while creating new competitive business models and products.”

I think I understand. “Every”, “tough organizational changes,” and “new competitive business models.” Yes. And the first example is a report with predictions.

When I think of predictions, I don’t think of mid tier consultants. I think of outfits like Recorded Future, Red Owl, and Haystax, among others. The predictions these companies output are based on data. Predictions from mid tier consultants are based on a wide range of methods. I have a hunch that some of these techniques include folks sitting around and asking, “Well, what do you think this Internet of Things stuff will mean?”

Give me the Recorded Future approach. Oh, I don’t like turkey. I am okay with macaroni and cheese. Basic but it lacks the artificial fizz with which some farmers charge their fowl.

Stephen E Arnold, November 27, 2014

Elasticsearch Partners With Cisco

November 27, 2014

Cisco is a popular provider of enterprise communication software, and now Elasticsearch has joined its team. Unlike other partnerships, which involve either company buyouts or some form of give and take, both companies remain independent. The news came to us from Market Wired in “Elasticsearch Joins the Cisco Solution Partner Program.” Being a member of the Cisco Solution Partner Program allows Elasticsearch to access Cisco’s Internet of Everything network. The program also gives Elasticsearch the opportunity to quickly create and deploy solutions to the Internet of Everything network.

Another boon of Elasticsearch teaming with Cisco is that it brings Mozilla into the circle. Mozilla already uses Elasticsearch and Kibana in MozDef, its open source security information and event management platform. Put the three together with Cisco’s UCS infrastructure and you get real-time indexing, search, real-time analytics, and security protection.

“Elasticsearch, Kibana and Cisco UCS allowed us to quickly stand up an infrastructure we could use to build MozDef, and support our needs for rapid expansion, querying, indexing and replication of data traffic,” said Jeff Bryner, Intrusion Detection Engineer at Mozilla. “Elasticsearch provides us the flexibility and speed to handle our increasing stream of event data, which we can search and visualize in Kibana and then use MozDef to perform incident response, alerting and advanced visualizations to protect Mozilla’s data, systems and customers.”
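For readers who have not used the stack Mozilla describes, the underlying pattern is straightforward: push event documents into Elasticsearch as they arrive and query them moments later, with Kibana visualizing the same indices. The sketch below uses the elasticsearch-py client; the index name, document fields, and connection URL are hypothetical, and argument names vary somewhat across client versions.

```python
# Minimal sketch of the index-then-query pattern described above, using the
# elasticsearch-py client. Index name, fields, and host are hypothetical, and
# argument names differ a little between major client versions.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local node

event = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "source": "auth-server-01",
    "summary": "failed login for user jdoe",
    "severity": "warning",
}
es.index(index="security-events", body=event)

# Force a refresh so the document is searchable immediately; Elasticsearch
# normally refreshes on its own about once per second.
es.indices.refresh(index="security-events")

results = es.search(
    index="security-events",
    body={"query": {"match": {"summary": "failed login"}}},
)
for hit in results["hits"]["hits"]:
    print(hit["_source"]["timestamp"], hit["_source"]["summary"])
```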

The Cisco Solution Partner Program is a win-win situation for all participants. The partners can draw on each other’s strengths and offer a wider array of services.

Whitney Grace, November 27, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Enterprise Search: Fee Versus Free

November 25, 2014

I read a pretty darned amazing article “Is Free Enterprise Search a Game Changer?” My initial reaction was, “Didn’t the game change with the failures of flagship enterprise search systems?” And “Didn’t the cost and complexity of many enterprise search deployments fuel the emergence of the free and open source information retrieval systems?”

Many proprietary vendors are struggling to generate sustainable revenues and pay back increasingly impatient stakeholders. The reality is that the proprietary enterprise search “survivors” fear meeting the fate of Convera, Delphes, Entopia, Perfect Search, Siderean Software, TREX, and other proprietary vendors. These outfits went away.


Many vendors of proprietary enterprise search systems have left behind an environment in which revenues are simply not sustainable. Customers learned some painful lessons after licensing brand name enterprise search systems and discovering the reality of their costs and functionality. A happy quack to http://bit.ly/1AMHBL6 for this image of desolation.

Other vendors, faced with mounting costs and zero growth in revenues, sold their enterprise search companies. The spate of sell outs that began in the mid 2000s was stark evidence that delivering information retrieval systems to commercial and governmental organizations was difficult to make work.

Consider these milestones:

Autonomy sold to Hewlett Packard. HP promptly wrote off billions of dollars and launched a fascinating lawsuit that blamed Autonomy for the deal. HP quickly discovered that Autonomy, like other complex content processing companies, was difficult to sell, difficult to support, and difficult to turn into a billion dollar baby.

Convera, the product of Excalibur’s scanning legacy and ConQuest Software, captured some big deals in the US government and with outfits like the NBA. When the system did not perform like a circus dog, the company wound down. One upside for Convera alums was that they were able to set up a consulting firm to keep other companies from making the Convera-type mistakes. The losses were measured in the tens of millions.

