SearchBlox Uses Elasticsearch to Power Multilingual Search

December 3, 2014

The article titled Multilingual Search—Easy to Setup and Manage For Your Website on SearchBlox discusses the difficulty of multilingual search. If you think search in English only is complicated enough, consider global corporations that must make search possible in any number of languages, all with their own sets of synonyms and double meanings. Cross-language search is particularly difficult, given the existence of terms that have different meanings in different languages (occasionally with hilarious results). The article explains,

“SearchBlox provides a simple solution that takes care of setting up search for non-english languages and supports 25+ languages out-of-the-box. Each collection is tied to a specific language which enables you to tune the stop words, synonyms and meta data handling without complicated configuration. SearchBlox lets you search across multiple languages at the same time and display them together taking out the complexities of handling encoding. SearchBlox lets you index multilingual documents like word, pdf, excel and ppt files…”

SearchBlox uses Elasticsearch as an engine. The article lists all of the languages supported by SearchBlox, from Arabic and Bengali to Kannada, Slovak, Romanian and Telugu all the way down to Thai and Turkish. Encoding and displaying search results in multiple languages has always been a challenge, but SearchBlox guarantees full search capabilities in the whole list of languages.
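To make the per-collection idea concrete, here is a minimal Python sketch. It is not the SearchBlox or Elasticsearch API: the `Collection` class, its fields, and the tiny stop word and synonym lists are all hypothetical, chosen only to illustrate how tying a collection to one language lets stop words and synonyms be tuned separately for each language.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical stand-in for a search collection bound to one language."""
    language: str
    stop_words: set = field(default_factory=set)
    synonyms: dict = field(default_factory=dict)  # term -> canonical term

    def analyze(self, text: str) -> list:
        """Lowercase, split on whitespace, drop stop words, map synonyms."""
        tokens = text.lower().split()
        kept = [t for t in tokens if t not in self.stop_words]
        return [self.synonyms.get(t, t) for t in kept]


# Illustrative settings only (assumed, not taken from the article);
# a real deployment would use far larger, curated lists.
english = Collection(
    "en",
    stop_words={"the", "a", "an", "of"},
    synonyms={"automobile": "car"},
)
german = Collection(
    "de",
    stop_words={"der", "die", "das"},
    synonyms={"pkw": "auto"},
)

print(english.analyze("The price of an automobile"))
print(german.analyze("Der PKW"))
```

Because each collection carries its own lists, changing the German stop words has no effect on how English text is analyzed, which is the kind of isolation the quoted passage describes.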

Chelsea Kerwin, December 03, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Recommind Placed on the Deloitte 2014 Technology Fast List Sixth Year in a Row

December 2, 2014

The article titled Recommind Ranked Among Fastest Growing Companies in North America on Deloitte’s 2014 Technology Fast 500(TM) on Consumer Electronics Net announced that Recommind, a software application provider, has garnered a spot on the Deloitte Technology Fast 500. This is the sixth consecutive year that Recommind has earned a place on the list of the fastest growing tech companies in North America. The article states,

“The companies ranked on the 2014 Deloitte Technology Fast 500 continue to set the bar for their industry higher each year,” said Eric Openshaw, vice chairman, Deloitte LLP and U.S. technology, media and telecommunications leader. “There are so many exciting products and smart thought leaders driving this list. We congratulate the Fast 500 companies and look forward to seeing them continue their momentum into 2015.” Recommind attributes success to its innovations in enterprise data management solutions…”

At this rate, Recommind seems poised to become the next Fast Search or Autonomy IDOL. Customers include AstraZeneca, the US Department of Energy, Cisco, and Marathon Oil, among others. To be considered for the Deloitte listing, companies must own “proprietary intellectual property,” the sale of which contributes the majority of the company’s revenue. Recommind is a leader in unstructured data management solutions as well as Discovery technology.

Chelsea Kerwin, December 02, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Useful Tip for Elasticsearch Admins

December 1, 2014

Short honk: Elasticsearch continues to outpace the other open source search vendors. I know that some of the companies with venture funding folks breathing down their necks say otherwise. Keep in mind that there is a difference between performing and saying one is able to perform. Elasticsearch delivers functionality that we find valuable. Also, from the information flowing through my Overflight system, Elasticsearch works. Really!

A useful security configuration article offers helpful tips. Navigate to “Elasticsearch: Dealing with Complex Permissions.” The short article provides some code snippets that you will find instructive.

Stephen E Arnold, December 1, 2014

Finding Books: Not Much Has Changed

December 1, 2014

Three or four years ago I described what I called “the book findability” problem. The audience was a group of confident executives trying to squeeze money from an old school commercial database model. Here’s how the commercial databases worked in 1979.

  1. Take content from published sources
  2. Create a bibliographic record, write or edit the abstract included with the source document
  3. Index it with no more than three to six index terms
  4. Digitize the result
  5. Charge a commercial information utility to make it available
  6. Get a share of the revenues.

That worked well until the first Web browser showed up and individuals and institutions began making information available online. There are a number of companies that still use variations of this old school business model. Examples range from newspapers that charge a Web browser user for access to content to outfits like LexisNexis, Ebsco, and Cambridge Scientific Abstracts.

As libraries and individuals resist online fees, many of the old school outfits are going to have to come up with new business models. But adaptation will not be easy. Amazon is in the content business. Why buy a Cliff’s Notes-type summary when there are Amazon reviews? Why pay for news when a bit of sleuthing will turn up useful content from outfits like the United Nations or off the radar outfits like World News at www.wn.com? Tech information is going through a bit of an author revolt. While not on the level of protests in Hong Kong, a lot of information that used to be available in research libraries or from old school database providers is available online. At some point, peer reviewed journals and their charge-the-author business models will have to reinvent themselves. Even recruitment services like LinkedIn offer useful business information via Slideshare.com.

One black hole concerns finding out what books are available online. A former intelligence officer with darned good research skills was not able to locate a copy of my The New Landscape of Search. You can find it here for free.

I read “Location, Location: GPS in the Medieval Library.” The use of coordinates to locate a book on a shelf or hanging from a wall anchored by a chain is not new to those who have fooled around with medieval manuscripts. Remember that I used to index medieval sermons in Latin as early as 1963.

What the write up brought to mind was the complete and utter failure of indexing services to make an attempt to locate, index, and provide a pointer to books regardless of form. The baloney about indexing “all” information is shown to be a toothless dragon. The failure of the Google method and the flaws of the Amazon, Library of Congress, and commercial database providers are evident.

Now back to the group of somewhat plump, red-faced, confident wizards of commercial database expertise. The group found my suggestion laughable. No big deal. I try to avoid the Titanic type operations. I collected my check and hit the road.

There are domains of content that warrant better indexing. Books, unfortunately, are one set of content that makes me long for the approach that put knowledge in one place with a system that at least worked and could be supplemented by walking around and looking.

No such luck today.

Stephen E Arnold, December 1, 2014

Poor Search Equals Poor E-Sales

November 27, 2014

Logically this statement makes sense and if you have been paying attention to facts you already knew it:

“A recent study by the Baymard Institute, an independent web research institute with a focus on e-commerce usability and optimization, found that many of the top 50 U.S. e-commerce sites are lacking essential e-commerce search capabilities which is hindering current online sales.”

Please feel free to insert your favorite exasperation for pointing out the obvious. This is something that even an experienced online retail shopper could tell you. Digital Journal covers the story in “Baymard Institute Study Finds Major Problems With Search On Leading E-Commerce Sites.”

Baymard found that most users don’t like browsing through categories. It also found that the search functions on these big e-retailers lack a spell check feature, do not support thematic or product searches, and require specific jargon.

EasyAsk responded to the Baymard study with a white paper detailing how e-commerce Web sites can improve their search features to boost sales. One way is supporting natural language search. The white paper is available for free download.

Whitney Grace, November 27, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Disappearing Content: You Cannot Search for It if It Is Not There

November 25, 2014

The issue of shaped and filtered content is becoming more and more of a mainstream topic. I read “Uber Removed Blog Post from Data Science Team That Examined Link between Prostitution and Rides.” The world’s oldest profession meets the world’s newest ride service. I noted this passage in the write up:

“The company examined its rider data, sorting it for anyone who took an Uber between 10 p.m. and 4 a.m. on a Friday or Saturday night. Then it looked at how many of those same people took another ride about four to six hours later – from at or near the previous nights’ drop-off point. Yes, Uber can and does track one-night stands. Consider it the Uber equivalent of the walk of shame.”

How will this corporate approach to content play out? My hunch is that content has been getting removed for a long time. I recall looking for information about the Spyglass browser decades ago and finding a 404 error.

At least today there are copies of the Web, caches on public systems, and people who store content on their drives. Nevertheless, most people cannot search for content that is not “there.” Has anyone looked for CMS information about the original MIC, RAC, and ZPIC contractors? My hunch is that more attention should be paid to content that goes missing, not because of its prurient nature, but because the disappearance of content provides very useful information about the behavior of people, systems, and organizations.

Stephen E Arnold, November 25, 2014

Elasticsearch Ups the Pressure on LucidWorks (Really?)

November 23, 2014

I am not too keen on videos. I prefer reading hard copies. I did find the video referenced in “Elasticsearch Uses Power of Community for Open Source Analytics” useful. My team and I are putting the finishing touches on a report that points out how enterprise search vendors have been leapfrogged by vendors rarely considered by mid tier consultants and the self appointed experts in search. The video drives home a simple point: Combining open source technologies delivers information access functions that are more useful to users than laundry lists, odd ball point and click suggested content, and confusing mash ups of information presented without context.

Why the reference to Lucid? One of the firm’s presidents had been involved with Jaspersoft, an open source analytics outfit. Despite this “inside track”, Elasticsearch has powered past Lucid, leaving that open source vendor struggling to reach parity with Elasticsearch. Elasticsearch itself faces challenges, but that’s the name of the game when keyword search is the keystone of a service. For now, Elasticsearch leaves competitors rushing to close the gap. By the way, this subject was the focal point of one of Dave Schubmehl’s IDC reports that surfed on my name. The juicy part about the “gap” was edited from my original write up. Nevertheless, the facts remain valid. Kudos to Elasticsearch.

Stephen E Arnold, November 23, 2014

Want to Lose Weight? Google Results Could Make You Fat

November 18, 2014

I never know what to make of articles that report on research results. Mistakes that would not fly in a Stats 101 class are the norm. I did work through “Poor-Quality Weight Loss Advice Often Appears First in an Online Search.”

Here’s the passage that I highlighted with my new bright pink marker:

The study reveals that the first page of results, using a search engine like Google, is likely to display less reliable sites instead of more comprehensive, high-quality sites, and includes sponsored content that makes unrealistic weight loss promises.

I would not be surprised if there were a Federal grant boosting this ground breaking, never before thought of, issue.

I find the results presented by advertising supported search engines incredibly useful, relevant, and on point. The notion that one might have to use a system other than Bing or Google to get accurate information is a new thought.

I liked this bit about the timeliness and rigor of the research too:

In 2012, the researchers accessed 103 websites for queries specific to weight loss and scored the content on its adherence to available evidence-based guidelines for weight loss. Medical, government and university sites ranked highest, along with blogs.

Yes, blogs and governmental entities are fonts of accurate information. With data from a mind boggling 103 Web sites to evaluate, I am amazed with the speed with which the information found its way to an online publication.

Stephen E Arnold, November 18, 2014

Sinequa Launches U.S. Subsidiary

November 13, 2014

Another French company is setting up shop in America. Business Wire reports, “Sinequa to Open New York Subsidiary Following Continued Success in U.S. Market.” The search and analytics company has found so much prosperity here they’ve decided to make it official. The write-up tells us:

“The creation of the U.S. Corporation and the New York office is a necessary next step as Sinequa has recently announced major client wins in the U.S including AstraZeneca, Biogen Idec, Mercer, and several ongoing RFPs. Sinequa is projecting that the largest part of the company’s future growth will come from the U.S., and they are also expecting to extend their presence there as well.

“The first employee of Sinequa Corp. is Nicolas Brel, who brings his deep experience with Sinequa to New York as the company’s Solutions Architect. The hiring process for a new sales manager is well advanced, and Xavier Pornain, VP Sales & Alliances, will soon relocate to the U.S. and assume the role of country manager and head of Sinequa Corp.”

Will Sinequa outperform Dassault (Exalead), Antidot, or PolySpot? The French market appears to be problematic. The U.S. market, on the other hand, seems to be a slam dunk compared to demand for proprietary search in the healthy European countries. Interesting.

Launched in 2002, Sinequa serves hundreds of organizations around the world. Naturally, the company boasts strong business analytics across multiple data sources and types. Sinequa also emphasizes an intuitive interface and strong visualization tools. Based in Paris, the firm maintains offices in Frankfurt and London as well as their shiny new location in New York City.

Cynthia Murrell, November 13, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Will Two Xooglers Burnish Yahoo?

November 12, 2014

I read an exclusive story. Know how I know the story is “exclusive”? Here’s the title:

Exclusive: Some Unhappy Yahoo Investors Asking AOL for Rescue

Obviously you have to read the foundation’s exclusive. I want to focus on a different question: Can two former Google executives repair Yahoo’s revenues? I am less than optimistic. I used an illustration in one of the briefings I did during the era of Terry Semel. The picture featured a sinking ship with Mr. Semel’s face Photoshopped into a captain’s uniform.

As I pointed out years ago, once an Internet portal service loses its momentum, flat-lining is the upside. The downside is a slow, gentle drift into irrelevance. So the answer to the question, in my opinion, is, “Long shot.”

I like to recall Yahoo’s former chief technology officer railing at me on a conference call about Yahoo’s super-advanced search technology. How is that working out?

Stephen E Arnold, November 12, 2014
