Enterprise Search and the Mythical Five Year Replacement Cycle

July 9, 2015

I have been around enterprise search for a number of years. In the research we did in 2002 and 2003 for the Enterprise Search Report, my subsequent analyses of enterprise search both proprietary and open source, and the ad hoc work we have done related to enterprise search, we obviously missed something.

Ah, the addled goose and my hapless goslings. The degrees, the experience, the books, and the knowledge had a giant lacuna, a goose egg, a zero, a void. You get the idea.

We did not know that an enterprise licensing an open source or proprietary enterprise search system replaced that system every 60 months. We did document the following enterprise search behaviors:

  • Users express dissatisfaction about any installed enterprise search system. Regardless of vendor, anywhere from 50 to 75 percent of users find the system a source of dissatisfaction. That suggests that enterprise search is not pulling the hay wagon for quite a few users.
  • Organizations, particularly the Fortune 500 firms we polled in 2003, had more than five enterprise search systems installed and in use. The reason for the grandfathering is that each system had its ardent supporters. Companies just grandfathered the existing system and looked for another one in the hopes of finding something that improved information access. Our conclusion: no one replaced anything.
  • Enterprise search systems did not change much from year to year. In fact, the fancy buzzwords used today to describe open source and proprietary systems have been in use since the early 1980s. Dig out some of Fulcrum’s marketing collateral or the explanation of ISYS Search Software from 1986 and look for words like clustering, automatic indexing, semantics, etc. A shortcut is to read some of the free profiles of enterprise search vendors on my Xenky.com Web site.

I learned about a white paper, which is 21st century jargon for a marketing essay, titled “Best Practices for Enterprise Search: Breaking the Five-Year Replacement Cycle.” The write up comes from a company called Knowledgent. The company describes itself this way on its Who We Are Web page:

Knowledgent [is] a precision-focused data and analytics firm with consistent, field-proven results across industries.

The essay begins with a reference to Lexis, which Don Wilson (may he rest in peace) and a couple of colleagues founded. The problem with the reference is that the Lexis search engine was not an enterprise search and retrieval system. The Lexis OBAR system (built for the Ohio State Bar Association) was tailored to the needs of legal researchers, not general employees. Note that Lexis’ marketing in 1973 suggested that anyone could use the command line interface. The OBAR system, however, required content in quite specific formats before it could be indexed. The mainframe roots of OBAR influenced the subsequent iterations of the LexisNexis text retrieval system: Think mainframes, folks. The point is that OBAR was not a system that was replaced in five years. The dog was in the kennel for many years. (For more about the history of Lexis search, see Bourne and Hahn, A History of Online Information Services, 1963-1976.) By 2010, LexisNexis had migrated to XML and moved from mainframes to lower cost architectures. But the OBAR system’s methods can still be seen in today’s system. Five years. What are the supporting data?

The white paper leaps from the five year “assertion” to an explanation of the “cycle.” In my experience, what organizations do is react to an information access problem and then begin a procurement cycle. Increasingly, as the research for our CyberOSINT study shows, savvy organizations are looking for systems that deliver more than keyword and taxonomy-centric access. Words just won’t work for many organizations today. More content is available in videos, images, and real time, almost ephemeral “documents” which can be difficult to capture, parse, and make findable. Organizations need systems which provide usable information, not more work for already overextended employees.

The white paper addresses the subject of the value of search. In our research, search is a commodity. The high value information access systems go “beyond search.” One can get okay search in an open source solution or whatever is baked in to a must have enterprise application. Search vendors have a problem because after decades of selling search as a high value system, the licensees know that search is a cost sinkhole and not what is needed to deal with real world information challenges.

What “wisdom” does the white paper impart about the “value” of search? Here’s a representative passage:

There are also important qualitative measures you can use to determine the value and ROI of search in your organization. Surveys can quickly help identify fundamental gaps in content or capability. (Be sure to collect enterprise demographics, too. It is important to understand the needs of specific teams.) An even better approach is to ask users to rate the results produced by the search engine. Simply capturing a basic “thumbs up” or “thumbs down” rating can quickly identify weak spots. Ultimately, some combination of qualitative and quantitative methods will yield an estimate of search, and the value it has to the company.

I have zero clue how this set of comments can be used to justify the direct and indirect costs of implementing a keyword enterprise search system. The advice is essentially irrelevant to the acquisition of a more advanced system from a leading edge next generation information access vendor like BAE Systems (NetReveal), IBM (not the Watson stuff, however), or Palantir. The fact underscored by our research over the last decade is tough to dispute: Connecting an enterprise search system to demonstrable value is a darned difficult thing to accomplish.

It is far easier to focus on a niche like legal search and eDiscovery or the retrieval of scientific and research data for the firm’s engineering units than to boil the ocean. The idea of “boil the ocean” is that a vendor presents a text centric system (essentially a one trick pony) as an animal with the best of stallions, dogs, tigers, and grubs. The spam about enterprise search value is less satisfying than the steak of showing that an eDiscovery system helped the legal eagles win a case. That, gentle reader, is value. No court judgment. No fine. No PR hit. A grumpy marketer who cannot find a Web article is not value no matter how one spins the story.

Read more

Does America Want to Forget Some Items in the Google Index?

July 8, 2015

The idea that the Google sucks in data without much editorial control is just now grabbing brain cells in some folks. The Web indexing approach has traditionally allowed the crawlers to index what was available without too much latency. If there were servers which dropped a connection or returned an error, some Web crawlers would try again. Our Point crawler just kept on truckin’. I like the mantra, “Never go back.”

Google developed a more nuanced approach to Web indexing. The link thing, the popularity thing, and the hundred plus “factors” allowed the Google to figure out what to index, how often, and how deeply (no, grasshopper, not every page on a Web site is indexed with every crawl).

The notion of “right to be forgotten” amounts to a third party asking the GOOG to delete an index pointer in an index. This is sort of a hassle and can create some exciting moments for the programmers who have to manage the “forget me” function across distributed indexes and keep the eager beaver crawler from reindexing a content object.

The Google has to provide this type of third party editing for most of the requests from individuals who want one or more documents to be “forgotten”; that is, no longer in the Google index which the public users’ queries “hit” for results.
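To make the mechanics concrete, here is a minimal sketch, in Python, of how a “forget me” suppression list might be applied at crawl time and at results time. The URLs, function names, and data structures are invented for illustration; this is not Google’s implementation.

# Hypothetical sketch: a suppression list consulted before re-indexing a
# content object and before returning results. Illustrative only.

forgotten_urls = {
    "https://example.com/old-news/offending-article",
}

def should_index(url):
    # Keep the crawler from re-indexing a suppressed content object.
    return url not in forgotten_urls

def filter_results(results):
    # Drop suppressed index pointers before the results are displayed.
    return [hit for hit in results if hit["url"] not in forgotten_urls]

for url in ("https://example.com/old-news/offending-article",
            "https://example.com/new-story"):
    print("index" if should_index(url) else "skip (forgotten)", url)

results = [
    {"url": "https://example.com/old-news/offending-article", "title": "Old story"},
    {"url": "https://example.com/new-story", "title": "New story"},
]
print([hit["title"] for hit in filter_results(results)])  # ['New story']

Keeping a list like this consistent across distributed indexes, while the crawler keeps rediscovering the same pages, is exactly the operational hassle described above.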

The details appear in “Google Is Facing a Fight over Americans’ Right to Be Forgotten.” The write up states:

Consumer Watchdog’s privacy project director John Simpson wrote to the FTC yesterday, complaining that though Google claims to be dedicated to user privacy, its reluctance to allow Americans to remove ‘irrelevant’ search results is “unfair and deceptive.”

I am not sure how quickly the various political bodies will move to make being forgotten a real thing. My hunch is that it will become an issue with legs. Down the road, the third party editing is likely to be required. The First Amendment is a hurdle, but when it comes time to fund a campaign or deal with winning an election, there may be some flexibility in third party editing’s appeal.

From my point of view, an index is an index. I have seen some frisky analyses of my blog articles and my for fee essays. I am not sure I want criticism of my work to be forgotten. Without an editorial policy, third party, ad hoc deletion of index pointers distorts the results as much as, if not more than, results skewed by advertisers’ personal charm.

How about an editorial policy and then the application of that policy so that results are within applicable guidelines and representative of the information available on the public Internet?

Wow, that sounds old fashioned. The notion of an editorial policy is often confused with information governance. Nope. Editorial policies inform the database user of the rules of the game and what is included and excluded from an online service.

I like dinosaurs too. Like a cloned brontosaurus, is it time to clone the notion of editorial policies for corpus indices?

Stephen E Arnold, July 8, 2015

Semantic Search and Challenging Patent Document Content Domains

July 7, 2015

Over the years, I have bumped into some challenging content domains. One of the most difficult was the collection of mathematical papers organized with the Dienst architecture. Another was a collection of blog posts from African bulletin board systems in a number of different languages, peppered with insider jargon. I also recall my jousts with patent documents for some pretty savvy outfits.

The processing of each of these corpuses and making them searchable by a regular human being remains an unsolved problem. Progress has been slow, and the focus of many innovators has been on workarounds. The challenge of each corpus remains a high hurdle, and in my opinion, no search sprinter is able to make it over the race course without catching a toe and plunging head first into the Multi-layer SB Resin covered surface.

I read “Why Is Semantic Search So Important for Patent Searching?” My answer was and remains, “Because vendors will grab at any buzzy concept in the hopes of capturing a share of the patent research market?”

The write up takes a different approach, an approach which I find interesting and somewhat misleading.

The write up states that there are two ways to search for information: navigational search (sort of like Endeca, I assume) and research search, which is the old fashioned Boolean logic I really like.

The article points out that keyword search sucks if the person looking for information does not know the exact term. That’s why I used the reference to Dienst. I wanted to provide an example which requires precise knowledge of terminology. That’s a challenge, and it requires a searcher who recognizes that he or she may not know the exact terminology required to locate the needed information. Try the Dienst query. Navigate to a whizzy new search engine like www.unbubble.eu and plug away. How is that working out for you? Don’t cheat. You can’t use the term Dienst.

If you run the query on a point and click Web search system like Qwant.com, you cannot locate the term without running a keyword search.

The problems in patents, whether indexed with value added metadata, by humans laboring in a warehouse, or with semantic methods, are:

  1. Patent documents exist in versions and each document drags along assorted forms which may or may not be findable. Trips to the USPTO with hat in hand and a note from a senator often do not work. Fancy Dan patent attorneys fall back on the good old method of hunting using intermediaries. Not pretty, not easy, not cheap, and not foolproof. The versions and assorted attachments are often unfindable. (There are sometimes interesting reasons for this kettle of fish and the fish within it.) I don’t have a solution to the chains of documents and the versions of patent documents. Sigh.
  2. Patents include art. Usually the novice reacts negatively to lousy screenshots, clunky drawings, and equations which make it tough to figure out what a superscript character is. Keywords and pointing and clicking, metaphors, razzle dazzle search systems, and buzzword charged solutions from outfits like Thomson Reuters and Lexis are just tools, stone tools chiseled by some folks who want to get paid. I don’t have a good solution to the arts and crafts aspect of patent documents. Sigh sigh.
  3. Patent documents are written at a level of generalization, with jargon, Latinate constructs, and assertions that usually give me a headache. Who signed up to read lots of really bad poetry? Working through the Old Norse version of Heimskringla is a walk in the park compared to figuring out what some patents “mean.” I spent a number of years indexing 15th century Latin sermons. At least in that corpus, the common knowledge base was social and political events and assorted religious material. Patents can be all over the known knowledge universe. I don’t know of a patent processing system which can make this weird prose-poetry understandable if there is litigation or findable if there is a need to figure out if someone cooked up the same system and method before the document in question was crafted. Sigh sigh sigh.
  4. None of the systems I have used over the past 40 years does a bang up job of identifying prior art in scientific, technical or medical journal articles, blog posts, trade publications, or Facebook posts by a socially aware astrophysicist working for a social media company. Finding antecedents is a great deal of work. Has been and will be in my opinion. Sigh sigh sigh sigh. But the patent attorneys cry, “Hooray. We get to bill time.”

The write up presents some of those top brass magnets: snappy visualizations. The idea is that a nifty diagram will address the four problems I identified in the preceding paragraphs. Visualizations may be able to provide some useful way to conceptualize where a particular patent document falls in a cluster of correctly processed patent documents. But an image does not deliver the mental equivalent of a NOW Foods Whey Protein Isolate.
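For what a clustering-based visualization rests on, here is a sketch using scikit-learn: TF-IDF vectors plus k-means over a handful of invented abstracts. It illustrates the general technique only, not any vendor’s patent analytics pipeline, and real patent corpora need far heavier preprocessing.

# Sketch: cluster toy patent abstracts so a diagram could show where a
# document falls relative to its neighbors. Texts are invented.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "A method for indexing electronic documents using term weighting.",
    "System and method for ranking search results by link analysis.",
    "Apparatus for applying a multi-layer resin coating to a surface.",
    "A resin composition and process for coating sports surfaces.",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, text in zip(labels, abstracts):
    print(label, text[:60])

A plot of those clusters may help a reviewer orient, but it does nothing for the version chains, the drawings, or the prose-poetry problems listed above.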

Net net: Pitching semantic search as a solution to the challenges of patent information access is a ball. Strikes in patent searching are not easily obtained unless you pay expert patent attorneys and their human assets to do the job. Just bring your checkbook.

Stephen E Arnold, July 7, 2015

Compound Search Processing Repositioned at ConceptSearching

July 2, 2015

The article titled “Metadata Matters; What’s The One Piece of Technology Microsoft Doesn’t Provide On-Premises Or in the Cloud?” from ConceptSearching re-introduces Compound Search Processing, ConceptSearching’s main offering. Compound Search Processing, a technology dating from 2003, can identify multi-word concepts and the relationships between words. Compound Search Processing is being repositioned, with ConceptSearching apparently chasing SharePoint sales. The article states,

“The missing piece of technology that Microsoft and every other vendor doesn’t provide is compound term processing, auto-classification, and taxonomy that can be natively integrated with the Term Store. Take advantage of our technologies and gain business advantages and a quantifiable ROI…

Microsoft is offering free content migration for customers moving to Office 365…If your content is mismanaged, unorganized, has no value now, contains security information, or is an undeclared record, it all gets moved to your brand new shiny Office 365.”

The angle for ConceptSearching is metadata and indexing, and the company is quick to remind potential customers that “search is driven by metadata.” The offerings of ConceptSearching come with the promise that it is the only platform that will work with all versions of SharePoint while delivering its enterprise metadata repository. For more information on the technology, see the new white paper on Compound Term Processing.
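As a rough illustration of what “compound term processing” involves, here is a toy Python sketch that surfaces candidate multi-word concepts by counting adjacent word pairs. It is not ConceptSearching’s method, which is proprietary and far more sophisticated; it only shows why multi-word concepts need treatment beyond single keywords.

# Illustrative only: crude candidate compound terms from bigram counts.

from collections import Counter
import re

docs = [
    "the enterprise metadata repository drives search",
    "search is driven by metadata in the enterprise metadata repository",
]

def candidate_compounds(texts, min_count=2):
    pairs = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        pairs.update(zip(words, words[1:]))
    return [" ".join(pair) for pair, n in pairs.items() if n >= min_count]

print(candidate_compounds(docs))
# ['the enterprise', 'enterprise metadata', 'metadata repository']

Even this tiny example shows the catch: a stop word list and real term weighting are needed to keep noise like “the enterprise” out of the candidate list.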
Chelsea Kerwin, July 2, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Google, Search, and Swizzled Results

July 1, 2015

I am tired of answering questions about the alleged blockbuster revelations from a sponsored study and an academic Internet legal eagle wizard. To catch up on the swizzled search results “news”, I direct your attention, gentle reader, to these articles:

I don’t have a dog in this fight. I prefer the biases of Yandex.ru, the wonkiness of Qwant, the mish mash of iSeek, and the mixed outputs of Unbubble.eu.

I don’t look for information using my mobile devices. I use my trusty MacBook and various software tools. I don’t pay much, if any, attention to the first page of results. I prefer to labor through the deeper results. I am retired, out of the game, and ready to charge up my electric wheel chair one final time.

Let me provide you with three basic truths about search. I will illustrate each with a story drawn from my 40 year career in online, information access, and various types of software.

Every Search Engine Provides Tuning Controls

Yep, every search system with which I have worked offers tuning controls. Here’s the real life story. My colleagues and I got a call in our tiny cubicle in an office near the White House. The caller told us to make sure that the then vice president’s Web site came up for specific queries. We created for the Fast Search & Transfer system a series of queries which we hard wired into the results display subsystem. Bingo. When the magic words and phrases were searched, the vice president’s Web page with content on that subject came up. Why did we do this? Well, we knew the reputation of the vice president, and I had the experience of sitting in a meeting he chaired. I strongly suggested we just do the hit boosting and stop wasting time. That VP was a firecracker. That’s how life goes in the big world of search.

Key takeaway: Every search engine provides easy or hard ways to shape the presentation of results. These controls are used for a range of purposes. Sometimes the index just does not surface the must see benefits information when an employee runs an HR query, or someone decides that certain content is not providing a “good user experience.”
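The hard wiring in that story boils down to something like the following Python sketch: a table of magic queries mapped to pages that must appear first. The query, URL, and function names are invented; the actual Fast Search & Transfer subsystem worked differently under the hood.

# Hypothetical sketch of hard-wired hit boosting: certain queries force a
# chosen page into the top slot of the results display.

PINNED = {
    "environmental policy": "https://example.gov/vp/environment",
}

def apply_boost(query, results):
    pinned = PINNED.get(query.lower().strip())
    if pinned is None:
        return results
    # Drop the pinned page if the engine already returned it, then put it first.
    return [pinned] + [url for url in results if url != pinned]

print(apply_boost("Environmental Policy",
                  ["https://example.gov/report", "https://example.org/blog"]))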

Engineers Tailor Results Frequently

The engineers who have to deal with the weirdness of content indexing, the stuff that ends up in the exception file, a broken relevance function when an external synonym list is created, whatever—these issues have to be fixed one by one. No one talking about the search system knows or cares about this type of grunt work. The right fix is the one that works with the least hassle. If one tries to explain why certain content is not in the index, a broken conversion filter is not germane to the complainer’s conversation. When the exclusions are finally processed, these may be boosted in some way. Hey, people were complaining, so weight these content objects so they show up. This works with grumpy advertisers, cranky Board members, and clueless new hires. Here’s the story. We were trying to figure out why a search system at a major trade association did not display more than half of the available content. The reason was that the hardware and memory were inadequate for the job. We fiddled. We got the content in the index. We flagged it so that it would appear at the top of a results list. The complaining stopped. No one asked how we did this. I got paid and hit the road.

Key takeaway: In real world search, there are decisions made to deal with problems that Ivory Tower types and disaffected online ecommerce sites cannot and will not understand. The folks working on the system put in a fix and move on. There are dozens and dozens of problems with every search system we have encountered since my first exposure to STAIRS III and BRS. Search sucked in the late 1960s and early 1970s, and it sucks today. To get relevant information, one has to be a very, very skilled researcher, just like it was in the 16th century.

New Hires Just Do Stuff

Okay, here’s a fact of life that will grate on the nerves of the Ivy League MBAs. Search engineering is grueling, difficult, and thankless work. Managers want precision and recall. MBAs often don’t understand that which they demand. So why not hard wire every darned query from this ivy bedecked whiz kid? Ask Jeeves took this route, and it worked until the money for humans ran out. Today new hires come in to replace the experienced people like my ArnoldIT team who say, “Been there, done that. Time for CyberOSINT.” The new lads and lasses grab a problem and solve it. Maybe a really friendly marketer wants Aunt Sally’s home made jam to be top ranked. The new person just sets the controls and makes an offer of “Let’s do lunch.”  Maybe the newcomer gets tired of manual hit boosting and writes a script to automate boosting via a form which any marketer can complete. Maybe the script kiddie posts the script on the in-house system. Bingo. Hit boosting is the new black because it works around perceived relevance issues. Real story: At a giant drug company, researchers could not find their content. The fix was to create a separate search system, indexed and scored to meet the needs of the researchers, and then redirect every person from the research department to the swizzled search system. Magic.

Key takeaway: Over time functions, procedures, and fixes get made and managers, like prison guards, no longer perform serious monitoring. Managers are too busy dealing with automated meeting calendars or working on their own start up. When companies in the search business have been around for seven, ten, or fifteen years, I am not sure anyone “in charge” knows what is going on with the newcomers’ fixes and workarounds. Continuity is not high on the priority list in my experience.

What’s My View of the Wu-velations?

I have three observations:

  1. Search results boosting is a core system function; it is not something special. If a search system does not include a boosting function, programmers will find a way to deliver boosting even if it means running two queries and posting results to a form with the boosted content smack in the top spot. (A rough sketch of this workaround appears after this list.)
  2. Google’s wildly complex and essentially unmanageable relevance ranking algorithms do things that are perplexing because they are tied into inputs from “semantic servers” and heaven knows what else. I can see a company’s Web site disappearing or appearing because no one understands the interactions among the inputs in Google’s wild and crazy system. Couple that with hit boosting and you have a massive demonstration of irrelevant results.
  3. Humans at a search company can reach for a search engineer, make a case for a hit boosting function, and move on. The person doing the asking could be a charming marketer or an errant input system. No one has much, if any, knowledge of actions of a single person or a small team as long as the overall system does not crash and burn.
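Here is the promised rough sketch of the two-query workaround from point one: a boost query’s hits are merged ahead of the base query’s hits. The run_query() function is a stand-in for whatever search API is actually in place; the queries and document identifiers are invented.

# Sketch of the two-query workaround: boosted hits land in the top spots.

def run_query(q):
    # Placeholder returning canned results; a real system would call the engine.
    canned = {
        "jam AND brand:aunt_sally": ["doc-jam-1"],
        "jam": ["doc-42", "doc-jam-1", "doc-7"],
    }
    return canned.get(q, [])

def boosted_results(base_query, boost_query):
    boosted = run_query(boost_query)
    base = run_query(base_query)
    # Boosted hits first; duplicates are dropped from the base list.
    return boosted + [d for d in base if d not in boosted]

print(boosted_results("jam", "jam AND brand:aunt_sally"))
# ['doc-jam-1', 'doc-42', 'doc-7']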

I am far more concerned about the predictive personalization methods in use for the display of content on mobile devices. That’s why I use Unbubble.eu.

It is the responsibility of the person looking for information to understand bias in results and then exert actual human effort, time, and brain power to figure out what’s relevant and what’s not.

Fine, beat up on the Google. But there are other folks who deserve a whack or two. Why not ask yourself, “Why are results from Bing and Google so darned similar?” There’s a reason for that too, gentle reader. But that’s another topic for another time.

Stephen E Arnold, July 1, 2015

Content Grooming: An Opportunity for Tamr

June 20, 2015

Think back. Vivisimo asserted that it deduplicated and presented federated search results. There are folks at Oracle who have pointed to Outside In and other file conversion products available from the database company as a way to deal with different types of data. There are specialist vendors, which I will not name, who are today touting their software’s ability to turn a basket of data types into well-behaved rows and columns complete with metatags.

Well, not so fast.

Unifying structured and unstructured information is a time consuming, expensive process. That is the reason for the obese exception files where objects which cannot be processed go to live out their short, brutish lives.

I read “Tamr Snaps Up $25.2 Million to Unify Enterprise Data.” The stakeholders know, as do I, that unifying disparate types of data is an elephant in any indexing or content analytics conference room. Only the naive believe that software whips heterogeneous data into Napoleonic War parade formations. Today’s software processing tools cannot get undercover police officers to look ship shape for the mayor.

Ergo, an outfit with an aversion to the vowel “e” plans to capture the flag on top of the money pile available for data normalization and information polishing. The write up states:

Tamr can create a central catalogue of all these data sources (and spreadsheets and logs) spread out across the company and give greater visibility into what exactly a company has. This has value on so many levels, but especially on a security level in light of all the recent high-profile breaches. If you do lose something, at least you have a sense of what you lost (unlike with so many breaches).

Tamr is correct. Organizations don’t know what data they have. I could mention a US government agency which does not know what data reside on the server next to another server managed by the same system administrator. But I shall not. The problem is common and it is not confined to bureaucratic blenders in government entities.

Tamr, despite the odd ball spelling, has Michael Stonebraker, a true wizard, on the task. The write up mentions as a customer an outfit which might be politely described as a “database challenge.” If Thomson Reuters cannot figure out data after decades of effort and millions upon millions in investment, believe me when I point out that Tamr may be on to something.

Stephen E Arnold, June 20, 2015

Cloud Search: Are Data Secure?

June 19, 2015

I have seen a flurry of news announcements about Coveo’s cloud based enterprise search. You can review a representative example by reading “Coveo Lassos the Cloud for Enterprise Search.” Coveo is also aware of the questions about security. See “How Does Coveo Secure Your Data and Services.”

With Coveo’s me-too cloud service, I thought about other vendors which offer cloud-based solutions. The most robust based on our tests is Blossom Search. The company was founded by Dr. Alan Feuer, a former Bell Labs wizard. When my team was active in government work, we used the Blossom system to index a Federal law enforcement agency’s content shortly after Blossom opened for business in 1999. As government procurements unfolded, Blossom was nosed out by an established government contractor, but the experience made three things clear:

  1. Blossom’s indexing method delivered near real time updates
  2. Creating and building an initial index was four times faster than the reference systems against which we tested Dr. Feuer’s solution. (The two reference systems were Fast Search & Transfer and Verity.)
  3. The Blossom security method conformed to the US government guidelines in effect at the time we did the work.

I read “Billions of Records at Risk from Mobile App Data Flow.” With search shifting from the desktop to other types of computing devices, I formulated several questions:

  1. Are vendors deploying search on clouds similar to Amazon’s system and method ensuring the security of their customers’ data? Open source vendors like resellers of Elastic and proprietary vendors like MarkLogic are likely to be giving some additional thought to the security of their customers’ data.
  2. Are licensees of cloud based search systems performing security reviews as we did when we implemented the Blossom search system? I am not sure if the responsibility for this security review rests with the vendor, the licensee, or a third party contracted to perform the work.
  3. How secure are hybrid systems; that is, an enterprise search or content processing system which pulls, processes, and stores customer data across disparate systems? Google, based on my experience, does a good job of handling search security for the Google Search Appliance and for Site Search. Other vendors may be taking similar steps, but the information is not presented with basic marketing information.

My view is that certain types of enterprise search may benefit from a cloud based solution. There will be other situations in which the licensee has a contractual or regulatory obligation to maintain indexes and content in systems which minimize the likelihood of alarmist headlines like “Billions of Records at Risk from Mobile App Data Flow.”

Security is the type of topic which is moving up the search industry’s chart to number one with a “bullet.”

Stephen E Arnold, June 19, 2015

Latest Version of DataStax Enterprise Now Available

June 19, 2015

A post over at the SD Times informs us, “DataStax Enterprise 4.7 Released.” Enterprise is DataStax’s platform that helps organizations manage Apache Cassandra databases. Writer Rob Marvin tells us:

“DataStax Enterprise (DSE) 4.7 includes a production-certified version of Cassandra 2.1, and it adds enhanced enterprise search, analytics, security, in-memory, and database monitoring capabilities. These include a new certified version of Apache Solr and Live Indexing, a new DSE feature that makes data immediately available for search by leveraging Cassandra’s native ability to run across multiple data centers. …

“DSE 4.7 also adds enhancements to security and encryption through integration with the DataStax OpsCenter 5.2 visual-management and monitoring console. Using OpsCenter, developers can store encryption keys on servers outside the DSE cluster and use the Lightweight Directory Access Protocol to manage admin security.”

Four main features/updates are listed in the write-up: extended search analytics, intelligent query routing, fault-tolerant search operations, and upgraded analytics functionality. See the article for details on each of these improvements.

Founded in 2010, DataStax is headquartered in San Mateo, California. Clients for their Cassandra-management software (and related training and professional services) range from young startups to Fortune 100 companies.

Cynthia Murrell, June 19, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Enterprise Search: The Last Half of 2015

June 16, 2015

I saw a link this morning to an 11 month old report from an azure chip consulting firm. You know, azure chip. Not a Bain, BCG, Booz Allen, or McKinsey, which are blue chip firms. A mid tier outfit. The word from O’Hare Airport is that business at the Boozer is booming, but who knows if airport gossip is valid.


Which enterprise search vendor will come up a winner in December 2015?

What is possibly semi valid are analyses of enterprise search vendors. The “Magic Quadrant for Enterprise Search” triggered some fond memories of the good old days in 2003 when the leaders in enterprise search were brands or almost brands. You probably recall the thrilling days of these information retrieval leaders:

  • Autonomy, the math oriented outfit with component names like neuro linguistic programming and integrated data operating layer and some really big name customers like BAE
  • Convera, formerly Excalibur with juice from ConQuest (developed by a former Booz, Allen person no less)
  • Endeca, the all time champ for computationally intensive indexing
  • Fast Search & Transfer, the outfit that dumped Web search in order to take over the enterprise search sector
  • Verity, ah, truth be told, this puppy’s architecture ensured plenty of time to dash off and grab a can of Mountain Dew.

In 2014, if the azure chip firm’s analysis is on the money, the landscape was very different. If I understand the non analytic version of Boston Consulting Group’s matrix from 1970, the big players are:

  • Attivio, another business intelligence solution using open source technology and polymorphic positioning for the folks who have pumped more than $35 million into the company. One executive told me via LinkedIn that the SEC investigation of an Attivio board member had zero impact on the company. I like the attitude. Bold.
  • BA Insight, a business software vendor focused on making SharePoint somewhat useful and some investors with deepening worry lines
  • Coveo, a start up which is nudging close to a decade in age with more than $30 million in venture backing. I wonder if those stakeholders are getting nervous.
  • Dassault Systèmes, the owner of Exalead, who said in the most recent quarterly report that the company was happy, happy, happy with Exalead but provided no numbers and no detail about the once promising technology
  • Expert System, an interesting company with a name that makes online research pretty darned challenging
  • Google, ah, yes, the proud marketer of the ever thrilling Google Search Appliance, a product with customer support to make American Airlines jealous
  • Hewlett Packard Autonomy, now a leader in the acrimonious litigation field
  • IBM, ah, yes, the cognitive computing bunch from Armonk. IBM search is definitely a product that is on everyone’s lips because the major output of the Watson group is a book of recipes
  • IHS, an outfit which is banking on its patent analysis technology to generate big bucks in the Goldmine cellophane
  • LucidWorks (Really?), a repackager of open source search and a distant second to Elastic (formerly Elasticsearch, which did not make the list. Darned amazing to me.)
  • MarkLogic, a data management system trying to grow with a proprietary XML technology that is presented as search, business intelligence, and a tool for running a restaurant menu generation system. Will MarkLogic buy Smartlogic? Do two logics make a rational decision?
  • Mindbreeze, a side project at Fabasoft which is the darling of the Austrian government and frustrated European SharePoint managers
  • Perceptive Software, which is Lexmark’s packaging of ISYS Search Software. ISYS incorporates technology from – what did the founder tell me in 2009? – oh, right, code from the 1980s. Might it not be tough to make big bucks on this code base? I have 70 or 80 million ideas about the business challenge such a deal poses
  • PolySpot, like Sinequa, a French company which does infrastructure, information access, and, of course, customer support
  • Recommind, a legal search system which has delivered a down market variation of the Autonomy-type approach to indexing. The company is spreading its wings and tackling enterprise search.
  • Sinequa, another one of those quirky French companies which are more flexible than a leotard for an out of work acrobat

But this line up from the azure chip consulting firm omits some companies which may be important to those looking for search solutions but not so much for azure chip consultants angling for retainer engagements. Let me highlight some vendors the azure chip crowd elected to ignore:

Read more

Solcara Is The Best!  Ra Ra Ra!

June 15, 2015

Thomson-Reuters is a world renowned news syndicator, but the company also has its own line of search software called Solcara Federated Search, also known as Solcara SolSearch. In a cheerleading press release, Q-resolve highlights Solcara’s features and benefits: “Solcara Legal Search, Federated Search And Know How.” Solcara allows users to search multiple information resources, including intranets, databases, Knowledge Management systems, and library and document management systems. It returns accurate results according to the inputted search terms or keywords. In other words, it acts like an RSS feed combined with Google.

Solcara also has a search product specially designed for those in the legal profession and the press release uses a smooth reading product description to sell it:

“Solcara legal Search is as easy to use as your favorite search engine. With just one search you can reference internal documents and approved legal information resources simultaneously without the need for large scale content indexing, downloading or restructuring. What’s more, you can rely on up-to-date content because all searches are carried out in real time.”

The press release also mentions some other tools, case studies, and references the semantic Web. While Solcara does sound like a good product and comes from a reliable news aggregator like Thomson-Reuters, the description and organization of the press release make it hard to understand all the features and who the target consumer group is. Do they want to sell to the legal profession and only that group, or do they want to demonstrate how Solcara can be adapted to all industries that digest huge amounts of information? The point of advertising is focusing the potential buyer’s attention. This one jumps all over the place.

Whitney Grace, June 15, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

