An Egyptian eBooks Search Engine

March 24, 2014

Most people think of the Amazon Kindle, iBooks, and other popular mobile reading platforms when they hear “eBooks.” In the Middle East, there is fierce competition to dominate regional eBook sales. Wamda posted the article “Egyptian eBooks Search Engine Al Kutub Ready To Face The Competition,” which profiles a new player.

Al Kutub is a new book search engine that signed up more than 10,000 subscribers within twelve days. Its creator, Mohammed Nemat Allah, designed Al Kutub to be the largest regional database of digital and audio books. Nemat Allah does not host any of the content; instead, Al Kutub searches online sources.

Nemat Allah hosts only the books’ bibliographic citations and directs users toward legitimate booksellers, so he does not expect to face legal action:

“The thirty something Nemat Allah seems to believe in spreading knowledge and is confident of his legal stance, according to statements from his counselor. Whoever objects to the presence of any content, the statements say, should remove it from the source where it was originally posted.”

Al Kutub offers four subscription tiers with different services and incentives. There is also an internal social network. The eBook application market is booming! The common belief is that people do not read in this digital age; in fact, they just do not read paper.

Whitney Grace, March 24, 2014
Sponsored by, developer of Augmentext

Another Week, Another Enterprise Search System

March 21, 2014

Cloud? Check.

Azure chip consultant reference? Check.

Social angle? Check.

Support for distributed information? Check.

Consumerized interface? Check.

Reference to value? Check.

Automatic alerts? Check.

Customer reference? Check.

Big company pedigree? Check.

Open sourciness? Check.

Exotic technology? Check.

There you have the recipe for a new enterprise search system, at least according to eWeek’s “Highspot Brings Machine Learning to Enterprise Search.” Highspot’s Web site describes the company this way:

Built for the cloud era, Highspot uses advanced machine learning to help organizations capture, share, and cultivate their most valuable working knowledge.

The pricing, omitted from the eWeek story just as azure chip consultants omit enterprise search fees, starts with a free tier and then jumps to $20 per user per month, or $240 per user per year. For an organization with 400 users, the annual fee works out to $96,000 for an open source, machine-learning system. That is a bargain compared to the Google Search Appliance but more expensive than downloading Solr, Searchdaimon, or Elasticsearch and having one staff member get search up and running. A less expensive option that works reasonably well is dtSearch, but you need to love the color blue for this search system. If you want an appliance, check out Maxxcat’s systems. These are far less expensive than other appliances, and the new systems are easy to set up and deploy. For cloud action, take a look at Blossom Software’s solution. Chances are your state, county, or municipal government is using the Blossom system built by a former Bell Labs whiz kid.
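The per-seat arithmetic is easy to check. This is a minimal sketch that assumes a flat $20 per user per month with no volume discounts or free-tier offsets; the helper name `annual_seat_cost` is hypothetical.

```python
def annual_seat_cost(users: int, per_user_per_month: float) -> float:
    """Return the yearly license cost for a flat per-user, per-month price.

    Hypothetical helper: assumes a flat rate with no volume discounts.
    """
    months_per_year = 12
    return users * per_user_per_month * months_per_year


# Highspot's paid tier as reported above: $20 per user per month.
print(annual_seat_cost(1, 20))    # one seat for a year
print(annual_seat_cost(400, 20))  # a 400-user organization
```

The same function makes it simple to see how quickly a per-seat model grows as head count rises, which is the crux of the comparison with a one-time download of an open source engine.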

Net net: The enterprise search market is flooded with options. With big, waddling outfits like HP and IBM increasingly desperate to make their billion-dollar bets pay off, you have high-end options as well as free, downloadable systems from organizations in Denmark, Norway, Russia, and elsewhere.

Will the pricing hold if a business licensee points the system at 50 million documents? My hunch is that there will be some fine print. Google charges about $900,000 for its appliance capable of processing tens of millions of documents with three years of support. You can check the latest US government discount prices at Just search for “Google Search Appliance” and peruse the government’s price. Commercial prices may vary.

The key is that the engines of many systems are open source. The “solution” is software wrappers and checklists that hit the marketing hot buttons. Keep up with Highspot via the company’s blog at

Stephen E Arnold, March 21, 2014

Appen Uses Humans to Improve Non-English Search Relevance

March 21, 2014

The Appen explanation titled “Query Relevance” delves into the work that the language, search, and social technology company has done recently to improve natural language search. Linguist Julie Vonwiller, PhD, founded the company in 1996 with her husband, engineer Chris Vonwiller. In 2010, Appen merged with Butler Hill Group and began making strides in language resources, search, and text. The article explores the issues involved in natural language search:

“Even a query as seemingly simple as the word “blue” could be looking for any of the following: a description or picture of the color, a television show, a credit card, a misspelling of an electronic cigarette brand, or a rap artist. By analyzing what the most likely user intent is and returning valid and appropriate results in the correct order of relevance, we encourage a relationship whereby the user will return again and again to our client’s search engine.”

Appen has established a “global network” of locals who are trained experts in their language and local culture. This team makes possible the most accurate interpretations of queries from regional users. The company continually works to improve its processes, both through collaboration with users and through advances in its program, to provide the best possible results.
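One way to picture how human judgments feed relevance for an ambiguous query like “blue” is to tally raters’ picks for the most likely intent and order results by that estimate. This is a hypothetical sketch: the rater labels are invented for illustration, the function name is made up, and nothing here describes Appen’s actual process.

```python
from collections import Counter


def rank_intents(rater_labels: list[str]) -> list[tuple[str, float]]:
    """Order candidate intents by the share of raters who chose each one.

    Hypothetical aggregation: each label is one rater's judgment of the
    most likely meaning of a query.
    """
    counts = Counter(rater_labels)
    total = sum(counts.values())
    # most_common() sorts by count, highest first; ties keep first-seen order
    return [(intent, count / total) for intent, count in counts.most_common()]


# Invented judgments for the query "blue" (illustration only).
labels = ["color"] * 5 + ["tv show"] * 2 + ["credit card"] * 2 + ["rap artist"]
for intent, share in rank_intents(labels):
    print(f"{intent}: {share:.0%}")
```

A production system would blend such judgments with click data and regional signals, but the core idea is the same: estimate intent first, then order results by it.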

Chelsea Kerwin, March 21, 2014


Lextek Onix Profile Now Available… Free

March 20, 2014

You may not know that vendor profiles from IDC-type operations can cost $3,500 or more. Even more impressive is the azure chip consulting firms’ penchant for using information from folks who provide reports for free. Hey, there are many former middle school teachers, failed Web masters, and even poetry majors who need a job. Have at it, I say.

If you are interested in search and content processing, you may know that I have been posting 15- to 30-page profiles of information retrieval vendors’ systems. Today you can snag a PDF report about Lextek International and its Onix search toolkit.

You have not heard of Lextek?

I would wager a cup of tea made from water drawn from Harrods Creek that you have used the search function in Acrobat. If you have, you have experienced the thrills of the Onix toolkit used by Adobe to make it a delight to search a PDF file.

Lextek keeps a low profile. The company operates from a suburban home in Utah. As part of the founder’s diversification effort, the driving force behind Onix opened a gourmet chocolate shop. Autonomy bought Verity and Interwoven. Lextek moved into chocolate and did not implement a search system for the new venture’s Web site. Interesting to me.

You can find the report, which is current through late 2008, on my site. The report is at There are 12 reports in the series. IDC has taken down the profiles of open source search systems that appeared between 2012 and March 2014. I will be posting the unfiltered versions of these reports in coming months.

My goal is to make the complete collection of more than 50 vendor profiles available without charge. The index to the free reports in the Xenky series is at

If you want to correct or complain about a particular report, please use the Comments section of Beyond Search for the article announcing the availability of that profile.

Before writing baloney about a vendor’s origins and core technology, I suggest you check out my reports. The misinformation about which company first used the phrase “content intelligence” or “linguistic search” is amazing. My profiles point out which company used a phrase and when. For example, have you heard about “information black holes”? Autonomy used the phrase in a remarkable marketing brochure in 1997. I know that some later users of the phrase assumed it was a product of their own fertile minds. Nope.

Enjoy the Lextek write up. You can try the system if you have Acrobat Reader 6 or higher. Did Adobe make optimal use of Onix? In my opinion, not by a long shot.

Stephen E Arnold, March 20, 2014

The HP View of Watson

March 19, 2014

I suppose IBM will respond with more than recipes at South by Southwest. If you enjoy big companies’ analyses of one another, you will want to gobble up “15 Reasons HP Autonomy IDOL OnDemand Beats IBM Watson.” This is not the recipe for making pals with a $100 billion outfit.

What are IBM Watson’s weaknesses? What are the strengths of the reinvented (sort of) Autonomy technology? I cannot reproduce all 15 items, but I can highlight six of the weaknesses and urge you to crack open the slideshow that chops up the IBM Watson PR stunt.

Here are the six weaknesses I found interesting:

  1. Reason 3. IBM Watson is a data-scientist-heavy platform. IDOL is not. My view is that HP paid $11 billion for Autonomy and now has to deal with the write down, legal actions related to the deal, and tossing out Mike Lynch’s revenue producing formula. Set aside the data scientists and the flip side, “too few data scientists,” and consider the financial mountain HP has to climb. A data scientist or two might help.
  2. Reason 4. HP has “an ultimate partner story.” I find this fascinating. Autonomy grew via acquisitions and an indirect sales model. Now HP wants to make the partner model generate enough revenue to pay off the Autonomy purchase price, grow HP’s top line faster than traditional lines of business collapse, and make partners really happy. This may be a big job. See IBM weaknesses 9, 11, 12, and 14. There is some overlap, which suggests HP is having difficulty cooking up 15 credible weaknesses of Watson. (I can name some, by the way.)
  3. Reason 6. HP offers a “proven power platform for analytics.” I am not sure about the alliteration, nor am I confident in my understanding of analytics and search. IBM Watson doesn’t have much to offer in either of these departments. IDOL, at least the pre-HP incarnation, had reasonably robust security capabilities. I wonder how these will be migrated to the HP multi-cloud environment. IBM Watson is doing recipes, so it too has its hands full.
  4. Reason 10. HP asserts that it offers a “potential app store.” I understand app store. Apple offers one that works well. Google is in the app store business. Amazon has poked its nose into the marketplace as well. I don’t think either HP or IBM has a credible app store for variants of its search technology. Oh, well, it sounds good. “Potential” is a deal breaker for me.
  5. Reason 13. HP “is focused on ramping up the innovation lifecycle.” I think this means coming up with good ideas faster. I am not sure if a service can spark a client’s innovation. Doesn’t lifecycle include death? Since IBM Watson seems a work in progress, I am not sure HP’s just released reinvention of Autonomy has a significant advantage because it too is “ramping up.”
  6. Reason 15. HP has “fired up” engineers. Okay, maybe. IBM has engineers, but I am not sure if they are fired up. My question is, “Is being fired up a good thing?” I want engineers to deliver solutions that work, are not “ramping up,” and are not marketing driven.

My take on this slide deck is that it is nothing more than a marketing vehicle. I had to click multiple ads for HP products and services to view the 15 reasons. Imagine my disappointment that five of the IBM weaknesses related to partnering programs. Wow, that must be really helpful to a licensee of cloud Autonomy trying to deal with performance issues on an HP data center. HP is definitely countering IBM Watson’s recipe play with old fashioned cheerleading. Rah, rah.

Stephen E Arnold, March 19, 2014

Improving SharePoint Search Efficiency

March 17, 2014

For many users, search is pretty much the main point of SharePoint, yet many complain of the inefficiency and inaccuracy of the search function. Search Windows Server addresses the issue in a great article that highlights search features from SharePoint 2007 to SharePoint 2013. Read the details in “Five Ways to Make SharePoint Search More Efficient.”

The article begins:

“Admins and end users alike find that using the search feature in SharePoint is helpful, but it can be frustrating . . . We compiled the five best tips to help SharePoint users work through common questions and situations with SharePoint search. Covering multiple versions of SharePoint, these tips highlight how to make searching in SharePoint more efficient, how to improve search functionality and more.”

Stephen E. Arnold has an interest in search; in fact he has made a career of it. His Web site,, highlights the latest in search – the good and the bad. SharePoint gets a lot of coverage.

Emily Rae Aldridge, March 17, 2014

Bing: A Quote to Note and a Search

March 16, 2014

First, navigate to Bing and run the query “Bing Market Share.” The first hit is to “The Bing Dilemma: What To Do With The Little Search Engine That Can.” The write up contains a chart showing Bing market share. Bing is the orange line. The line way at the top is Google.


[Chart from “The Bing Dilemma: What To Do With The Little Search Engine That Can”: search engine market share]

I found a quote to note in “Bing’s Harry Shum Bags The 2014 Outstanding Technical Leadership Award At Microsoft”:

“I am proud that we have built a very high-quality search engine comparable to Google and with differentiating features. We have provided to society, even to humanity, a different voice than Google.”

On a philosophical note: If a search engine retrieves in the forest, are its results relevant? Your essay response is 20 percent of your grade.

Stephen E Arnold, March 16, 2014

Layers: A Search Engine for Social Media

March 14, 2014

The article on titled “Frankfort Teen Creates Idea for New Search Engine” discusses the work of fifteen-year-old Spencer Jordan. His idea was a search engine that focuses on one’s own social media networks. He got the idea while switching from one social media app to another and noticing that the process could be streamlined. Layers, Spencer’s search engine, is still in the “dream” phase:

“For now, “Layers” is just an dream, but to make it reality, Jordan has to pay a programmer to create the site. In order to raise the $10,000 needed, he began fundraising through an online donation website. “I’ve been trying to get my friends, and family and the public to support me, and to back me and to help me accomplish this,” said Jordan. As of Sunday, Jordan hasn’t raised any of his $10,000 goal, but he said failure is not an option.”

In spite of the lack of funding, Spencer is not ready to quit. (As of Thursday, February 27, he had raised $80.) Should Google be nervous or just open its checkbook to buy this idea? The ability to search through YouTube, Facebook, Twitter, and Instagram is appealing; in Spencer’s words, it “declutters” social media.

Chelsea Kerwin, March 14, 2014


The Facts About Endeca Guided Search

March 13, 2014

The article (and videos) titled “Oracle Endeca Guided Search: Superior Search System For Your Website” on the Four Cornerstone Blog will most likely convince any doubters that Endeca is right for their search needs. Endeca’s “guided navigation context” allows users to refine their searches. Search results are narrowed by category and organized within those categories (for example, by price range or rating). This prevents users from getting too many results or too few. Other perks described in the article:


“You can also use Oracle Endeca Guided Search with Oracle Endeca Experience Manager.  This way, you can get control over the littlest details that are related to customer experience.  This also allows you to get better content targeting and search personalization when using dynamic pages. Lastly, you can use Oracle Endeca for Mobile and Oracle Endeca for Social so that your customers have the same search experience no matter where they are.”
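Guided navigation of the kind described above is generally known as faceted search: each attribute (category, price range, rating) becomes a facet, selecting a facet value filters the result set, and the remaining facets show counts for what is left. The sketch below is a generic illustration with made-up product records and hypothetical function names; it is not Oracle Endeca’s API.

```python
from collections import Counter

# Made-up catalog records for illustration only.
PRODUCTS = [
    {"name": "Trail Shoe", "category": "shoes", "price_band": "$50-$100", "rating": 4},
    {"name": "Road Shoe", "category": "shoes", "price_band": "$100+", "rating": 5},
    {"name": "Rain Jacket", "category": "jackets", "price_band": "$100+", "rating": 4},
    {"name": "Wind Jacket", "category": "jackets", "price_band": "$50-$100", "rating": 3},
]

FACETS = ("category", "price_band", "rating")


def refine(items, selections):
    """Keep only items that match every selected facet value."""
    return [it for it in items if all(it[f] == v for f, v in selections.items())]


def facet_counts(items):
    """Count items per value for each facet, so a UI can show how many
    results each further refinement would leave."""
    return {facet: Counter(it[facet] for it in items) for facet in FACETS}


# A shopper picks "shoes"; the remaining facets summarize what is left.
results = refine(PRODUCTS, {"category": "shoes"})
print([r["name"] for r in results])
print(facet_counts(results)["price_band"])
```

Because the counts are recomputed after every selection, the user never clicks into an empty result set, which is the “not too many, not too few” benefit the article describes.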

If this does not convince you, watch the Endeca Extensions for E-Business Suite video “The moving parts,” which showcases the “simplicity of the integration,” and “Leveraging Your Existing OBI Investment with OEID v3.1,” which explains how “IT organizations can quickly tap into their existing OBI repositories to jump start the provisioning of their own Endeca discovery applications.” Once you have read the article and seen these YouTube videos, you will most likely lose interest in open source options.

Chelsea Kerwin, March 13, 2014


HP: Deconstructing IDOL

March 12, 2014

Michael Lynch did what no other founder of a search-and-retrieval company was able to achieve. He operated a company that grew from a couple of government contracts into an $800-million-plus giant in 15 years.

My analyses of the pre-Hewlett Packard Autonomy emphasize several facets of Mr. Lynch’s achievement. Competitors were not able to match Autonomy’s marketing. Whether it was the “Portal in a Box” or the augmented reality system Aurasma, competitors had to catch up with Mr. Lynch’s products, features, and benefits. As other search vendors played musical CEOs, Autonomy built a stable senior management team. With each change in leadership, competitors lost time with reorganizations and relearning. Autonomy’s management capabilities have been ignored. Mr. Lynch figured out that growth from search required acquisitions. Once the financing was in place, Autonomy gobbled up companies and its revenues soared.

Companies like Fast Search & Transfer and Endeca labored to close the revenue and marketing gap with Autonomy. Both failed. Fast Search resorted to accounting tricks, and Microsoft has been “investing” in Fast Search technology to make it fit with today’s enterprise. Endeca hit a glass ceiling at about $140 million in annual revenue despite evangelists, fancy MBAs, and a clever partnering method. Oracle is marketing Endeca as a business intelligence system and eCommerce system, not a search system. Other companies with promise just failed. These include Convera, Delphes, and Entopia. TeraText retreated to the government sector. IBM abandoned its in-house search technology and adopted Lucene, an open source toolkit. Other vendors, like Albert, dtSearch, Lextek, and EPI Thunderstone, among others, remained essentially invisible. Exalead disappeared into an engineering firm that is struggling with its core business.

Autonomy, like it or not, emerged after 15 years as the major brand in search, content processing, and a number of closely related fields.

Despite the changes in the search sector and in Autonomy’s technology line up, Autonomy delivered one product: IDOL, the Intelligent Data Operating Layer, and its DRE, the Dynamic Reasoning Engine. One product name persisted for 15 years. One technology, the DRE, powered the famous “black box” at the heart of every Autonomy product or service, whether developed in house or acquired. Once Autonomy bought a company, it IDOLized the product or service.

I read “HP Breaks Autonomy IDOL into Discrete Services.” The write up smacks of the “real journalism” from the azure chip outfit IDC. The story reported in cheerleader fashion:

The service will expose most of the IDOL features as discrete services, accessible through APIs (application programming interfaces). HP is hoping that enterprise developers use the service to embed IDOL functionality into their own applications.

At first glance, this is no big deal. Exalead was moving in this direction before it was purchased by Dassault. Elasticsearch offers a compelling open source and lower cost alternative as well.

In my view, HP has a big job ahead of it. The company has to generate enough revenue from Autonomy licenses to pay back its purchase price, now deeply discounted to several billion dollars. Considering that it took Autonomy 15 years to nose toward $900 million, the HP sales professionals have to get in gear. After all, HP needs to turn Autonomy into a net producer of revenue and profit.

In addition, HP has to make certain that its deconstruction of IDOL does not lose the famous Autonomy magic. Without magic, I am not confident that 1996 technology can cope with the challenges of today’s information processing needs. (Google is also a late 1990s company faced with similar problems of ageing technology and concepts.) Good enough search is available from open source repositories. Lower cost options are available from upstarts like Elasticsearch and Searchdaimon. Once the magic is gone, it is tough to recapture.

HP has to find a way to make Autonomy’s services usable to those customers who want to download an app and have it work. Autonomy reaches back to the 1990s. Today’s information technology professionals are into a different type of computing experience. Of course, there are organizations that have the money, time, and appetite to tackle Bayesian methods infused with Monte Carlo and Markov Chain methods, seasoned with Laplacian techniques. My hunch is that complexity has the potential to add friction to the chopped up mini-IDOLs and DREs.

Net net: HP has to find a way to make big money flow in a market which is coveted by IBM Watson, Microsoft, and numerous other vendors.

Would Michael Lynch have chopped up IDOL? I don’t think he will be available to answer this question. The squabble about HP’s purchase price generates considerable noise at a time when HP needs focus, clarity, and numerous sales.

Worth watching.

Stephen E Arnold, March 12, 2014
