Ravn Amps Up Its Search Prowess

May 9, 2014

I read “RAVN Systems Revolutionises COWI’s SharePoint 2013 Search.” I learned several things. First, COWI is “a leading international consulting group with 50 remote locations.”

Next, RAVN delivers some performance assertions; for example:

In representative tests across their estate COWI have achieved a 57% reduction in indexing time of remote content, over 90% reduction in bandwidth usage during indexing and 70% reduction in time to preview compared with opening content. They have also estimated a saving of 12 physical servers.

Unfortunately, there were no data about life before RAVN, the system’s throughput, or other baselines. Still, the assertion is interesting.

Finally, the article states:

“RAVN Connect revolutionises SharePoint Search in distributed environments”.

I heard this from Fulcrum Technologies decades ago. I assume this time the nail in SharePoint’s findability coffin is hammered tight. No word from the legions of other SharePoint indexing systems, however.

Stephen E Arnold, May 9, 2014

RSuite Incorporates Temis into Content Management Platform

May 8, 2014

RSuite content management users can now tap into TEMIS, we learn from “RSuite CMS Leverages TEMIS’s Content Enrichment Capabilities to Deliver a Powerful Semantic Solution.” The partnership makes TEMIS’s semantic enrichment capabilities available to RSuite’s customers in the publishing, government, and corporate arenas. The deal was announced at this year’s MarkLogic World conference, held April seventh in San Francisco; both companies are MarkLogic partners.

The press release elaborates:

“RSuite CMS provides an intuitive user interface that minimizes actions required to execute complex searches across an entire set of content. The solution can globally apply metadata, dynamically organize massive amounts of documents into collections, package and distribute content to licensing partners, and enables customers to meet their multi-channel publishing goals.

“By leveraging TEMIS’s Luxid® Content Enrichment Platform, RSuite CMS can enable customers to automatically enrich their content with domain-specific metadata directly within their publishing workflows. This enables faster and more scalable content indexing, improved metadata consistency and governance, more efficient authoring, and more powerful search and discovery features within customer applications and portals.”

With its focus on publishing and media, RSuite strives to meet today’s ever-evolving publication challenges. The company serves such big names as HarperCollins, Audible, and Oxford University Press. RSuite was launched in 2000 and is located in Audubon, Pennsylvania.

With its collaborative platform, TEMIS adds domain-specific metadata to clients’ data, allowing publishers to supply more relevant information to their own audiences. TEMIS maintains several offices across Europe and North America.

Cynthia Murrell, May 8, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

OpenText: Poetry Is Better than Its Search Systems

May 1, 2014

OpenText has a special place in the Overflight archive. The company once sort of supported the Autonomy IDOL engine in something called RedDot. OpenText also sells mainframey search systems like Information Dimensions’ now really old BASIS system and the BRS/Search system. Love those green screens! Somewhere inside the company is Dr. Tim Bray’s SGML search and data management system. And for the history buffs, can you name the 1983 technology that continues to influence Hummingbird, another OpenText information system? Now I am sure I have notes on the Nstein technology, a once much hyped search, indexing, and management system. I grow weary.

I just read “OpenText Launches Discovery Suite to Capture and Create Value in Big Content.” The write up announces something that OpenText has been selling for years. The buzzwordage is notable, and you can find my view of content processing jargon in this six minute video.

What I noted was the probably unintentional inclusion of some Latinate sentence structures and a near miss on a type of poetry not practiced since William Carlos Williams riffed on red wheelbarrows. Here’s the melodious sequence I noted:

OpenText can integrate the unintegrated, structure the unstructured, and manage the unmanaged.

I am sorely tempted to add some lines like “support the unsupported,” but I will not.

Stephen E Arnold, May 1, 2014

Yandex Profit Goes Up

April 24, 2014

Bloomberg’s real journalists reported some Web search news I found interesting. Navigate to “Yandex Profit Rises 19% on Russia Internet Advertising Demand.” Google gets the spotlight, but Yandex warrants more attention. The English language search service at www.yandex.com is okay. The gem is the Yandex Russian service at www.yandex.ru. Content in this index is not easily available via US Web indexing services without the searcher’s performing some acrobatics. Yandex, however, is doing the me too thing. My hunch is that its usefulness will erode as advertising revenue gains traction. Precision and recall are just a distant memory for Bing and Google, and Yandex’s utility may decline as the money rolls in. By the way, what happened to the Yandex search appliance?

Stephen E Arnold, April 24, 2014

Litigation Software dtSearch Demo

April 16, 2014

The dtSearch Desktop Demonstration Video on nlsblog.org shows how to set up and search with dtSearch for Windows. The 12-minute video begins with an introduction to dtSearch, which is able to “recognize text in over 200 common file types.” By indexing the locations of words in different files, dtSearch is able to build an almost limitless index of documents. The demo walks through the setup of dtSearch. After the index is named, the video explains:

“It is important to keep in mind that when we add items here, dtSearch is not creating copies… but links to those files. A good practice is to put the files and folder that we want to run searches on into a single centralized location, before we create the index… all we need to do is add this discovery folder, and the subfolders and files will be automatically included…dtSearch reads the text in the linked files and creates a searchable words list.”

Next, you choose which index to search and can limit the search to one case or to all cases. Each word in the index appears with a number showing how often it occurs. You can then add the keyword to the search request to find the documents in which the word appears. You are able to preview a document, copy a file, and create a search report. The demo goes into great detail about all of the search options and should be viewed in full to learn the best methods. However, it does not provide metrics for the time required to build the initial index or to update it. These metrics would be useful.
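The word list with hit counts described above is the classic inverted index. Here is a minimal Python sketch under stated assumptions: documents arrive as an in-memory dict of id to text (dtSearch itself reads linked files on disk), tokenizing is a simple regular expression, and every function name is hypothetical. It is not dtSearch’s implementation or API. Recording the positions of each word, not just its presence, is what makes phrase queries possible.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs):
    """Build a positional inverted index: word -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def word_list(index):
    """Return (word, total hits) pairs, like a searchable word list with counts."""
    return sorted(
        (word, sum(len(p) for p in postings.values()))
        for word, postings in index.items()
    )


def search(index, word):
    """Return the sorted ids of documents that contain a single word."""
    return sorted(index.get(word.lower(), {}))


def phrase_search(index, phrase):
    """Return ids of documents where the phrase's words appear consecutively."""
    words = tokenize(phrase)
    if not words:
        return []
    # Only documents containing every word can contain the phrase.
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    hits = []
    for doc in sorted(candidates):
        positions = [set(index[w][doc]) for w in words]
        # The phrase matches if word i occurs at start + i for some start.
        if any(all(start + i in positions[i] for i in range(len(words)))
               for start in positions[0]):
            hits.append(doc)
    return hits


if __name__ == "__main__":
    docs = {
        "memo.txt": "The contract was signed. The contract expired.",
        "email.txt": "Please review the contract draft.",
    }
    index = build_index(docs)
    print(search(index, "contract"))               # ['email.txt', 'memo.txt']
    print(dict(word_list(index))["contract"])      # 3
    print(phrase_search(index, "contract draft"))  # ['email.txt']
```

With the two sample documents, the word list reports three hits for “contract,” a single-word search returns both files, and the positional data lets the phrase query narrow the result to the email alone.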

Chelsea Kerwin, April 16, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Content Management: A $12 Billion Market in 2019!

April 8, 2014

Now I enjoy crazy numbers. I recall that someone at Yahoo allegedly said to a New York Times reporter:

Yahoo estimates that it would cost $300 million to build a search service from scratch. (See New York Times, July 10, 2008, page C5.) [My story about this estimate is at http://wp.me/pf6p2-e9.]

Crazy number. Three hundred million would not buy a Web search system in 2008. Today it may cover the cost of jet fuel for Google’s fleet of airplanes.

But crazy numbers get traction and create “real news.”

I read “Enterprise Content Management Market worth $12.32 Billion by 2019.” Now that is an interesting estimate. The calculation surprised me for three reasons:

  1. The outfit promulgating the good “news” is selling a report, presumably to those in the content management sector who need reassurance.
  2. There was no mention of WordPress- and SquareSpace-type outfits, which seem to be moving ahead of the pack of name brand vendors.
  3. The assumption that I actually know what content management or CMS means.

Like search vendors, CMS vendors have been looking for a way to become more relevant. Implementations of Broadvision, Documentum, Interwoven, Vignette, and other well known CMS systems have had some successes and failures.

The “real” news about this report mentions some aspects of CMS that are similar to the scope creep visible in enterprise search. Here are some examples of what CMS embraces:

enterprise document management, enterprise document imaging and capture, enterprise web content management, enterprise records management, enterprise document collaboration, enterprise digital rights management, content analytics, rich media management, advanced case management, enterprise document output management, enterprise workflow management, and other solutions; by type of emerging applications: social content management, mobile content management, big data management, and cloud content management; by type of deployments: hosted and on-premises; by verticals: academia and education, banking, financial services and insurance (BFSI), consumer goods and retail, energy and power, government and defense, life science and healthcare, manufacturing, media and entertainment, telecom and IT, transportation, tourism, and hospitality, and other verticals; and by regions: North America (NA), Asia Pacific including Japan (APAC), Europe (EU), Middle East and Africa (MEA), and Latin America (LA).

This list is not helpful to me. I think the collection of jargon, buzzwords, and impressive sounding concepts is designed for Web indexing systems and to give a marginalized type of software some strap on muscles.

If information about the magnitude of the CMS market requires this type of verbal legerdemain, how credible is the report, the estimate, and maybe content management itself?

My personal view is that the buzzword content management, like knowledge management, is tough to define and may ultimately lack relevance in today’s business environment. The notion that a specious estimate adds value to those laboring in the CMS sector is amusing. The puffery, apologias, and jargon generated by those trying to sell systems that “manage” content cause me to chortle. Estimates of the volume of Big Data seem to fly in the face of “content management.” Even Google’s robots are struggling to keep pace with content proliferation based on my test queries.

At a time when organizations struggle to figure out what information is in their possession, CMS seems to have failed in its “mission”: Managing content.

CMS’ weakness is the notion of management itself. Since “management” is tough to define, content management sounds like a discipline cooked up by MBA hopefuls in an innovation study group.

Stephen E Arnold, April 7, 2014

Elasticsearch: 70:30 Odds as the Next Big Thing in Search

March 28, 2014

We received information on March 26, 2014, suggesting that the German search vendor Intrafind has been looking for the next big thing. The company may have found it, and we expect that this low profile vendor will be plugging into the Elasticsearch power cable. Wikipedia already has, joining hundreds of other organizations looking for a solution to doggy indexing in some other open source centric systems.

Elasticsearch repackager SearchBlox has rolled out Version 8 of its hosted Elasticsearch system, according to Timo Selvaraj, Co-Founder/VP Product Management of SearchBlox.

As if these two recent developments were not enough, GovWizely, a Washington, DC engineering services firm, has added Elasticsearch to its arsenal. GovWizely, operated by Erik S. Arnold (yep, that’s my boy), has moved adroitly to capitalize on the surging interest in Elasticsearch’s high performance system.

Contrast Elasticsearch’s rise as the go to open source enterprise search system with the struggles of other open source search vendors and some commercial outfits. LucidWorks has ingested $2 million in venture funding, according to Crunchbase. Elasticsearch has received $34 million in funding. Parity, right?

Not so “fast”. (A gentle nod to the fascinating proprietary system shoe horned by Microsoft into SharePoint.) Elasticsearch seems to be catching up to LucidWorks or winning the critical struggle for developers. Here’s the Elasticsearch pitch:

[Image: the Elasticsearch pitch]

Understated and quiet, according to my engineering team. If the developments at Intrafind, SearchBlox, and Adhere Solutions, among others, are an early warning system, Elasticsearch could certainly be the “next big thing” in search, enterprise and otherwise.

What does this mean for the proprietary and non open sourcey vendors like Coveo, Funnelback, Lexmark ISYS, and Hewlett Packard? I would suggest that these firms’ managers have to adapt to what appears to be an emergent and disruptive force in information processing. If Elasticsearch does emulate the growth of pre-HP Autonomy, the millions of venture dollars pumped into funding and acquiring search companies may never be repaid. Chilling thought for some stakeholders who may have jumped on the wrong horse and seem compelled to continue to feed the nag fresh, expensive, non recoverable “clover.” (Think millions in hard cash funding with little to show that a payback is imminent or even possible.)

US Government Content Processing: A Case Study

March 24, 2014

I know that the article “Sinkhole of Bureaucracy” describes a single case. Nevertheless, the write up tickled my funny bone. With fancy technology, USA.gov, and the hyper modern content processing systems used in many Federal agencies, reality is stranger than science fiction.

This passage snagged my attention:

inside the caverns of an old Pennsylvania limestone mine, there are 600 employees of the Office of Personnel Management. Their task is nothing top-secret. It is to process the retirement papers of the government’s own workers. But that system has a spectacular flaw. It still must be done entirely by hand, and almost entirely on paper.

One of President Obama’s advisors is quoted as describing the manual operation as “that crazy cave.”

And the fix? The article asserts:

That failure imposes costs on federal retirees, who have to wait months for their full benefit checks. And it has imposed costs on the taxpayer: The Obama administration has now made the mine run faster, but mainly by paying for more fingers and feet. The staff working in the mine has increased by at least 200 people in the past five years. And the cost of processing each claim has increased from $82 to $108, as total spending on the retirement system reached $55.8 million.

One of the contractors operating the system is Iron Mountain. You may recall that this outfit has a search system and caught my attention when Iron Mountain sold the quite old Stratify automatic indexing system (formerly Purple Yogi) to Autonomy.

My observations:

  1. Many systems have a human component that managers ignore, do not know about, or lack the management horsepower to address. When search systems or content processing systems generate floods of red ink, human processes are often the culprit.
  2. The notion that modern technology has permeated organizations is false. The cost friction in many companies is directly related to small decisions that grow like a snowball rolling down a hill. When these processes reach the bottom, the mess is no longer amusing.
  3. Moving significant information from paper to a digital form and then using those data in a meaningful way to answer questions is quite difficult.

Do managers want to tackle these problems? In my experience, keeping up appearances and cost cutting are more important than old fashioned problem solving. In a recent LinkedIn post I pointed out that automatic indexing systems often require human input. Forgetting about those costs produces problems that are expensive to fix. Simple indexing won’t bail out the folks in the cave.

Stephen E Arnold, March 24, 2014

IBM Watson: Now a Foodie

March 17, 2014

One of my two or three readers sent me a link to “IBM’s New Food Truck Uses a Supercomputer to Dream Up All Their ‘Surprising’ Recipes.” For a bundle of code wrappers and Lucene, Watson is a versatile information processing system. Instead of an online demo of Web indexing, I learned about “surprising recipes.”

The initiative to boost Watson toward its $10 billion revenue goal involves the Institute of Culinary Education (ICE). The idea is that IBM and ICE deliver “computational creativity” to create new recipes. Julia Child would probably resist computerizing her food activities. Her other, less well known activities would have eagerly accepted Watson’s inputs.

The article quotes IBM as saying:

“Creating a recipe for a novel and flavorful meal is the result of a system that generates millions of ideas out of the quintillions of possibilities,” IBM writes. “And then predicts which ones are the most surprising and pleasant, applying big data in new ways.”
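IBM’s description amounts to a generate-and-rank loop: enumerate candidate combinations, then score each one for novelty and appeal. The toy Python sketch below illustrates only that general pattern. The ingredient list, the pair scores, the “known recipes,” and the surprise-times-pleasantness scoring rule are all invented for illustration; nothing here reflects how Watson or ICE actually score recipes.

```python
from itertools import combinations

# Hypothetical toy data for illustration only; not IBM's or ICE's data.
INGREDIENTS = ["tomato", "basil", "chocolate", "chili", "cinnamon", "beef", "lime"]

# Hypothetical "pleasantness" of ingredient pairs; unlisted pairs score 0.
PAIR_SCORES = {
    frozenset({"tomato", "basil"}): 0.9,
    frozenset({"chocolate", "chili"}): 0.7,
    frozenset({"chili", "lime"}): 0.8,
    frozenset({"beef", "chili"}): 0.8,
    frozenset({"chocolate", "cinnamon"}): 0.8,
    frozenset({"beef", "cinnamon"}): 0.5,
}

# Hypothetical existing recipes, used to judge how familiar a pairing is.
KNOWN_RECIPES = [
    {"tomato", "basil"},
    {"beef", "chili", "lime"},
    {"tomato", "basil", "beef"},
]


def pairs(combo):
    """All unordered ingredient pairs in a candidate recipe."""
    return list(combinations(sorted(combo), 2))


def pleasantness(combo):
    """Average pair score: a stand-in for 'how good will this taste'."""
    ps = pairs(combo)
    return sum(PAIR_SCORES.get(frozenset(p), 0.0) for p in ps) / len(ps)


def surprise(combo):
    """Share of pairs never seen together in any known recipe."""
    ps = pairs(combo)
    unseen = sum(1 for a, b in ps if not any({a, b} <= r for r in KNOWN_RECIPES))
    return unseen / len(ps)


def best_recipes(k=3, top=3):
    """Generate every k-ingredient candidate (k >= 2), then rank by surprise * pleasantness."""
    scored = [(surprise(c) * pleasantness(c), c) for c in combinations(INGREDIENTS, k)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]


if __name__ == "__main__":
    for score, combo in best_recipes():
        print(f"{score:.3f}", combo)
```

Even seven ingredients taken three at a time yield 35 candidates. The “quintillions” in IBM’s claim come from the same combinatorial explosion at a much larger scale, so a production system would presumably have to prune candidates rather than score every one.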

The article even includes a video. Apparently the truck made an appearance at South by Southwest. From my cursory research, the Watson truck was smart enough to be elsewhere when the allegedly inebriated driver struck attendees near the pivot point of Austin’s night life.

The IBM marketing professionals are definitely clear headed and destined for fame as the food truck gnaws its way into the $10 billion revenue objective. Did IBM researchers ask Watson if this was an optimal use of its computational capabilities? Did Watson contribute to the new Taco Bell loaded beefy nacho grillers? Ay, Caramba!

Stephen E Arnold,

Search and Management Appliances from InfoLibrarian

February 18, 2014

Folks looking for affordable data-management and search solutions should check out InfoLibrarian. You can get their Metadata Management Appliance and pair it with their Search Appliance, both for about $3,500. Just to be clear, these are not software applications; they are hardware units you would plug into your network like a hard drive. The description for the Management Appliance tells us:

“The InfoLibrarian Metadata Appliance takes enterprise search and metadata management to a whole new level. Manage and synchronize metadata, documents, files, source code, and virtually any digital asset. You name it… InfoLibrarian catalogs it. Hundreds of Adapters and document crawlers are available to automatically index, categorize and keep history of changes over time. Business friendly search engine/portal to navigate categories; perform search, impact analysis and data lineage analysis across disparate systems.”
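Keeping a “history of changes over time” across repeated crawls is commonly done by fingerprinting each document and recording a new version only when the fingerprint changes. The Python sketch below assumes SHA-256 content hashes and an in-memory store; the `ChangeHistory` class and its methods are hypothetical and do not describe InfoLibrarian’s adapters or crawlers.

```python
import hashlib
from datetime import datetime, timezone


class ChangeHistory:
    """Hypothetical crawler-side store that keeps a version history per document."""

    def __init__(self):
        # doc_id -> list of (timestamp, SHA-256 hex digest), oldest first
        self.versions = {}

    def observe(self, doc_id, content, when=None):
        """Record a new version if the content hash differs from the last one seen.

        Returns True when a new version was recorded, False if unchanged.
        """
        digest = hashlib.sha256(content).hexdigest()
        history = self.versions.setdefault(doc_id, [])
        if history and history[-1][1] == digest:
            return False
        history.append((when or datetime.now(timezone.utc), digest))
        return True

    def crawl(self, snapshot):
        """Observe every document in a {doc_id: bytes} snapshot; return changed ids."""
        return sorted(doc_id for doc_id, content in snapshot.items()
                      if self.observe(doc_id, content))


if __name__ == "__main__":
    history = ChangeHistory()
    print(history.crawl({"spec.doc": b"v1", "notes.txt": b"draft"}))  # ['notes.txt', 'spec.doc']
    print(history.crawl({"spec.doc": b"v2", "notes.txt": b"draft"}))  # ['spec.doc']
    print(len(history.versions["spec.doc"]))                          # 2
```

Hashing keeps each stored version small and makes “did anything change?” a cheap comparison; a real product would also need to handle deleted documents and keep the changed content itself, which this sketch omits.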

The page goes on to emphasize certain features, like centralized, role-based security controls; automation options; simplified collaboration; and classification tools that go beyond those normally found in enterprise indexing products. To search your impeccably managed data, you could choose the corresponding Search Appliance. That description reads:

“The InfoLibrarian Search Appliance is ready to go, just plug it into your network and setup indexing of files, databases and web sites. Almost instantly, you can begin searching. It’s Fast … Powerful and Easy!

Hundreds of document crawlers are available to automatically index, categorize and keep history of changes over time. Bundled with all the features you expect including a simple search interface with integrated spell checking, advanced searching and configurable results.”
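One common way to provide integrated spell checking in a search box is to suggest the closest term from the index’s own vocabulary. The sketch below uses Python’s standard `difflib.get_close_matches`; the `suggest` helper, the sample term list, and the 0.75 similarity cutoff are assumptions for illustration, not InfoLibrarian’s method.

```python
import difflib


def suggest(query, vocabulary, cutoff=0.75):
    """Replace each unknown query word with its closest indexed term, if any.

    `vocabulary` stands in for the search index's term list (an assumption).
    """
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Best single match above the similarity cutoff, or none.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


if __name__ == "__main__":
    terms = ["metadata", "lineage", "repository", "catalog", "search"]
    print(suggest("metdata lineage", terms))  # metadata lineage
    print(suggest("serch", terms))            # search
```

Drawing suggestions from the index vocabulary, rather than a general dictionary, means the appliance only proposes terms that will actually return results; words with no close match are passed through unchanged.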

The page notes that you can customize this appliance with either templates or API. The highlight for me is InfoLibrarian’s vow that this device provides the “most secure search available.” That’s reason enough to look into it. See each product’s page for the full lists of their features.

Headquartered in Rochester, New York, InfoLibrarian has been helping organizations in a range of industries to manage and analyze data since 1998. The privately held company strives to provide their clients with the best metadata-integrated solutions on the market.

Cynthia Murrell, February 18, 2014

Sponsored by ArnoldIT.com, developer of Augmentext
