Cyber Wizards Speak Publishes Exclusive BrightPlanet Interview with William Bushee

April 7, 2015

Cyber OSINT continues to reshape information access. Traditional keyword search has been supplanted by higher value functions. One of the keystones for systems that push “beyond search” is technology patented and commercialized by BrightPlanet.

A search on Google often returns irrelevant or stale results. How can an organization obtain access to current, in-depth information from Web sites and services not comprehensively indexed by Bing, Google, ISeek, or Yandex?

The answer to the question is to turn to the leader in content harvesting, BrightPlanet. The company was one of the first, if not the first, to develop systems and methods for indexing information ignored by Web indexes which follow links. Founded in 2001, BrightPlanet has emerged as a content processing firm able to make accessible structured and unstructured data ignored, skipped, or not indexed by Bing, Google, and Yandex.

In the BrightPlanet seminar open to law enforcement, intelligence, and security professionals, BrightPlanet said the phrase “Deep Web” is catchy but it does not explain what type of information is available to a person with a Web browser. A familiar example is querying a dynamic database, like an airline for its flight schedule. Other types of “Deep Web” content may require the user to register. Once logged into the system, users can query the content available to a registered user. A service like Bitpipe requires registration and a user name and password each time I want to pull a white paper from the Bitpipe system. BrightPlanet can handle both types of indexing tasks and many more. BrightPlanet’s technology is used by governmental agencies, businesses, and service firms to gather information pertinent to people, places, events, and other topics.

In an exclusive interview, William Bushee, the chief executive officer at BrightPlanet, reveals the origins of the BrightPlanet approach. He told Cyber Wizards Speak:

I developed our initial harvest engine. At the time, little work was being done around harvesting. We filed for a number of US Patents applications for our unique systems and methods. We were awarded eight, primarily around the ability to conduct Deep Web harvesting, a term BrightPlanet coined.

The BrightPlanet system is available as a cloud service. Bushee noted:

We have migrated from an on-site license model to a SaaS [software as a service] model. However, the biggest change came after realizing we could not put our customers in charge of conducting their own harvests. We thought we could build the tools and train the customers, but it just didn’t work well at all. We now harvest content on our customers’ behalf for virtually all projects and it has made a huge difference in data quality. And, as I mentioned, we provide supporting engineering and technical services to our clients as required. Underneath, however, we are the same sharply focused, customer centric, technology operation.

The company also offers data as a service. Bushee explained:

We’ve seen many of our customers use our Data-as-a-Service model to increase revenue and customer share by adding new datasets to their current products and service offerings. These additional datasets develop new revenue streams for our customers and allow them to stay competitive maintaining existing customers and gaining new ones altogether. Our Data-as-a-Service offering saves time and money because our customers no longer have to invest development hours into maintaining data harvesting and collection projects internally. Instead, they can access our harvesting technology completely as a service.

The company has accelerated its growth through a partnering program. Bushee stated:

We have partnered with K2 Intelligence to offer a full end-to-end service to financial institutions, combining our harvest and enrichment services with additional analytic engines and K2’s existing team of analysts. Our product offering will be a service monitoring various Deep Web and Dark Web content enriched with other internal data to provide a complete early warning system for institutions.

BrightPlanet has emerged as an excellent resource for specialized content services. In addition to providing a client-defined collection of information, the firm can provide custom-tailored solutions to special content needs involving the Deep Web and specialized content services. The company has an excellent reputation among law enforcement, intelligence, and security professionals. The BrightPlanet technologies can generate a stream of real-time content to individuals, work groups, or other automated systems.

BrightPlanet has offices in Washington, DC, and can be contacted via the BrightPlanet Web site at www.brightplanet.com.

The complete interview is available at the Cyber Wizards Speak web site at www.xenky.com/brightplanet.

Stephen E Arnold, April 7, 2015

Blog: www.arnoldit.com/wordpress Frozen site: www.arnoldit.com Current site: www.xenky.com

 

Enterprise Search: Mixed Messages from a Perpetual Confusion Machine

April 5, 2015

I read “Enterprise Search: The Answer to All Our Problems or Technology That Most Users Neither Need Nor Want?” The write up comes from Australia, a country with a long and quite interesting history of information retrieval. I have written about the contributions of Dr. Ron Sacks Davis, an individual whom most North American search vendors ignore. Some of these vendors reinvented Dr. Sacks Davis’ wheels, but that is the norm in the “new” and “revolutionary” world of search and content processing. Today you can tap Funnelback, a product losing a bit of marketing steam in the last six months, to scratch your information access itch. And there are other Australian milestones to consider; for example, YourAmigo, which is now applying its technology to the search engine optimization problem.

The article, which has New South Wales government spin, mentions several of the enterprise search marketers’ favorite truisms; for example, find information wherever it resides and boost productivity (yep, that works in a government entity).

What I found interesting about the article is that it states, quite clearly, that “most employees don’t need or want to search for information enterprise wide.” Okay, that jibes with my team’s research. The write up states:

Most employees within these organizations work within a few discrete areas of the business and know exactly where the information they need to do their job is kept. They locate records by navigating structured network drives, document stores etc. One member of the group commented that it is interesting that employees will happily search for information online but prefer to browse for information at work. There are some ‘power users’ within these organizations who either already use or would benefit from the implementation of enterprise search technologies.

The issue, as I think about this statement, is cost. Why spend massive sums to benefit a small percentage of a workforce? I think this question strikes at the heart of value, knowledge, and access assumptions.

The article points out that incoming information is classified by enterprise search systems. My take is that this is a useful function. Enterprise search, according to the article, “could be used to facilitate retention and disposal.” After decades of effort, the idea that one can eliminate digital information in order to perform a records management function strikes me as surprising. Does the statement imply that New South Wales does not have a records management system despite massive investments in content management technology?

Notice that the write up has blended enterprise search, which means the user looks for content, with indexing new information and disposing of old information. I find the mixture a compound with potent confusion power.

Net net: The article makes it clear that enterprise search is not exactly what some people want. Nevertheless, enterprise search performs various information functions which could—note the conditional—have some upsides.

Little wonder that marketers pitching enterprise search benefits talk in circles. The customers themselves are chasing information kangaroos. My question: “Are government entities worldwide behaving in a similar fashion?” Fascinating.

Stephen E Arnold, April 5, 2015

Intranet Connections and Super Search Version 13

April 2, 2015

I read “Maximize Productivity with Super Search from Intranet Connections (Version 13.0 Release).”

For decades I have been gathering information about enterprise search and content processing. The name of the company was not familiar to me. The assertions in the news article were, however.

Puzzled, I went through my archive of search vendor information and did not find content about Intranet Connections. I noted the date on this article and wondered if the company were an April Fool’s spoof. I know I am getting on in years, but when I wake up and plop in front of my primitive, coal-fired computer, my memory works reasonably well. I know my name and the day of the week.

The write up touts an enterprise search system that includes:

  • New Search Engine
  • Preview Display Cards
  • Intuitive Search Filters
  • Advanced Search Options
  • Controlled Search Security.

I know from the years of experience I have logged examining, testing, and creating search and content processing systems like the one we sold to the long ago Lycos, that “new” is a slippery concept. For some folks, learning about Google’s site operator is a new thing. For others it is a reminder of the many useful search functions that Google no longer exposes to the ad consumers looking for objective information via Google.com.

I am not sure how often a search system innovates across 13 versions. Intranet Connections seems to have been founded in 1999, which makes the company 16 years young. Most of the long lived search engines don’t change too much from the original core; for example, Autonomy IDOL. On the other hand, other companies just discard a search system and graft in open source Lucene and slap on the “new” label. Others take inspiration from Fast Search and call it new.

The company states on its LinkedIn page here:

Intranet Connections is a business intranet software solution that enables organizations to connect, collaborate and create more efficiently yielding significant time-cost savings and stronger employee engagement. We combine key business tools to automate workflows and processes, while delivering improved communication and collaboration among employees to engage and promote culture within the digital workplace.

How new is Version 13 of a search system? I learned from the write up:

“Enterprise intranet search functionality is proving to be more critical than ever as today’s mature intranets face thousands of data entries, pages, forms, and uploaded corporate documents and policies. Employees’ expectations are higher than ever to deliver on an intranet search utility that is fast, focused, intelligent and super simple. We wanted to introduce intranet search that is not just functional but an entire experience.” Douglas also reports that Super Search was a result of close collaboration with Intranet Connections’ customers who were active in feedback for the design and feature set capabilities, ensuring the product release would enhance their needs for enterprise intranet search.

And adds that it can deliver software capable of “triggering emotions on the Intranet using Intranet design such as theme, videos, and photos.” Furthermore, the system seems to be able to marry an “Intranet and an enterprise social network.”

These are significant assertions.

Okay, that sounds great but we are in Version 13, not Version 1.1. The new version was announced in August 2014. In “Intranet Search Designed for Maximum Productivity” I learned:

The first thing customers will notice is the completely redesigned user interface. It is really geared towards making search simple and fast as possible for the average user. We also introduced “one-click filtering”. If a user knows they are looking for a document, a form, or a person, they have the option to filter their results with a single click. This automatically removes search results that aren’t in the specified category. More advanced search options are available for power users, but are hidden by default. These users can choose to filter their results by specific sites, application, modified date, author, or tags. We also introduced the content of feature cards. Most of the time, users can determine if a search result is what they are looking for by the title, category, or short description. However, if there are multiple documents that are similar, a little bit more information may be necessary. Instead of requiring the user to click into the item to view more details, and navigate away from search, we introduced the concept of a feature card. Additional summary information for a search result can be displayed within the search screen, preventing the need to jump back and forth from search and content.
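The filtering behavior described in the quote is easy to picture in code. What follows is a minimal sketch, not Intranet Connections’ implementation: the `SearchResult` record, the field names, and both function names are my own assumptions, invented only to illustrate a single-category “one-click” filter alongside the optional power-user filters (site, author, modified date, tag).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SearchResult:
    """Hypothetical search hit; the fields mirror the filters named in the quote."""
    title: str
    category: str          # e.g. "document", "form", "person"
    site: str
    author: str
    modified: date
    tags: List[str] = field(default_factory=list)


def one_click_filter(results: List[SearchResult], category: str) -> List[SearchResult]:
    """Keep only results in the chosen category (the 'one-click' case)."""
    return [r for r in results if r.category == category]


def advanced_filter(
    results: List[SearchResult],
    site: Optional[str] = None,
    author: Optional[str] = None,
    modified_since: Optional[date] = None,
    tag: Optional[str] = None,
) -> List[SearchResult]:
    """Apply only the power-user filters that were actually supplied."""
    kept = []
    for r in results:
        if site is not None and r.site != site:
            continue
        if author is not None and r.author != author:
            continue
        if modified_since is not None and r.modified < modified_since:
            continue
        if tag is not None and tag not in r.tags:
            continue
        kept.append(r)
    return kept
```

Leaving every advanced parameter at `None` returns the full result list, which matches the quote’s point that the advanced options are hidden by default.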

 

Intrigued by my Overflight system’s lack of information about Super Search, I visited the company’s Web site. I learned that the system begins at $15,000, which strikes me as a bargain. Low cost search systems often face significant financial demands as the company struggles to keep pace with the needs of customers, support demands, and the inevitable tweaks that are needed to deal with the wild and crazy nature of behind-the-firewall content.

At www.IntranetConnections.com I learned that the company makes “Intranet software made for you.” I assume that means me. I do have a 2.5 million test corpus which has been known to take days of indexing. One German company promised speedy performance, and I had to leave the system on for five days before I could run a test query. The initial crawl failed because this particular German, Lucene-based system choked on Microsoft’s file locks. Yep, every Microsoft system has these types of files. I wondered, “Yo, why not provide some tools to deal with this, like a !readme.txt file?”
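The kind of tool I had in mind does not need to be elaborate. Here is a minimal sketch, assuming the troublesome files are the owner files Microsoft Office drops next to an open document (names beginning with `~$`); the function names are hypothetical, and a real crawler would need its own rules for other locked content.

```python
from typing import Iterable, List, Optional


def is_office_lock_file(path: str) -> bool:
    """True for Office owner/lock files such as '~$report.docx'."""
    # Accept both Windows and POSIX separators when extracting the file name.
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    return name.startswith("~$")


def crawlable(paths: Iterable[str]) -> List[str]:
    """Drop lock files before handing the list to the indexer."""
    return [p for p in paths if not is_office_lock_file(p)]


def read_for_indexing(path: str) -> Optional[bytes]:
    """Read a file, returning None instead of aborting the crawl when
    the file is locked, missing, or otherwise unreadable."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except OSError:
        # PermissionError and FileNotFoundError are both OSError subclasses.
        return None
```

The point of the sketch is the design choice: skip what is known to be noise, and treat an unreadable file as a logged miss instead of a fatal error for the whole crawl.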

Back to Intranet Connections.

The company delivers what I think of as a one-stop, 7-11 solution. The Web site highlights a people directory, forms, document management, Intranet Web sites, but not search. I scrolled through information about the corporate Intranet, the finance Intranet, and the healthcare Intranet, but found no direct link to Super Search.

I used my tools to examine the site and located a blog post about Super Search in an article labeled “Super Search Launch [sic] Scavenger Hunt.” There was a phrase about Super Search, “And much more”. But there was no link. I did locate a link to a story with the title “Super Search (V13.0)” dated January 27, 2015. That page did provide links to a feature guide, a support page, a webinar recording (Does anyone have webinar fatigue as I do?) and a recursive link to the blog. There is also a link to the installation guide. The guide is 300 words long and helpfully provides me with a user name and password. The guide also makes clear that I need to be deep into the Microsoft world. Mac and Linux users do not seem to be encouraged. Unlike the German outfit, Intranet Connections provides a link to information necessary to get the search engine working.

It appears that the company offers an alternative to Microsoft SharePoint. The firm, based in Vancouver, has 1,600 customers. Some have a high profile like NASA and the Mayo Clinic. My hunch is that the company has assembled / developed a suite of software. Search is included.

Other observations:

  • I struggled with the “new” concept. I mean after 13 versions, how “new” is “new”?
  • I had to do some poking around to get access to the fact sheet and basic information.
  • The pricing of $15,000 seems to apply to the full collection of software available from the company
  • I have yet to figure out how I managed to know zero about a company with a search system named “super.”

I need to improve my enterprise search information collection. Some help from vendors with more comprehensive and easy-to-find information would be helpful.

Stephen E Arnold, April 2, 2015

Is Google Net Neutral?

March 31, 2015

When the FCC passed rules that protect net neutrality, the Internet rejoiced that its crazy antics would be safeguarded and content would not be as regulated when it comes to search retrieval and indexing. Big technology companies that make the bulk of the revenue from Internet related services and products are beginning to voice their opinions on the matter, including Google. Drew Crawford wrote on his blog Sealed Abstract a very heated post about Google’s stance in the entire net neutrality argument: “Google, Our Patron Saint Of The Closed Web.” The blog points out that Google is net neutral with the Droid open market and its employees’ blogs, but apparently Google is also out to destroy the free Web too.

Google plans to take control of all .dev domain addresses and possibly others in an effort to have these extensions solely related to Google products and services. In short, if you want to use any domains with this ending, like a blog, you will be forced to use a Google service. It is reminiscent of when Google forced people to sign up for Google Plus if users wanted to continue using YouTube.

“My point is that if you think Google is some kind of Patron Saint of the Open Web, shit son. Tim Cook on his best day could not conceive of a dastardly plan like this. This is a methodical, coordinated, long-running and well-planned attack on the open web that comes from the highest levels of Google leadership.”

The news is not surprising when you assemble the pieces, but it is disheartening that there do not seem to be any big companies on the little guy’s side. And I thought Google was committed to not being evil.

Whitney Grace, March 31, 2015

Get your copy of CyberOSINT: Next Generation Information Access at http://www.xenky.com/cyberosint

Enterprise Search Is Important: But Vendor Survey Fails to Make Its Case

March 20, 2015

I read “Concept Searching Survey Shows Enterprise Search Rises in the Ranks of Strategic Applications.” Over the years, I have watched enterprise search vendors impale themselves on their swords. In a few instances, licensees of search technology loosed legal eagles to beat the vendors to the ground. Let me highlight a few of the milestones in enterprise search before commenting on this “survey says, it must be true” news release.

A Simple Question?

What do these companies have in common?

  • Autonomy
  • Convera
  • Fast Search & Transfer?

I know from my decades of work in the information retrieval sector that financial doubts plagued these firms. Autonomy, as you know, is the focal point of on-going litigation over accounting methods, revenue, and its purchase price. Like many high-tech companies, Autonomy achieved significant revenues and caused some financial firms to wonder how Autonomy achieved its hundreds of millions in revenue. There was a report from Cazenove Capital I saw years ago, and it contained analyses that suggested search was not the money machine for the company.

And Convera? Excalibur, a document scanning outfit with some brute force searching technology, acquired the manual-indexing ConQuest Technologies and morphed into Convera. Convera suggested that it could perform indexing magic on text and video. Intel dived in and so did the NBA. These two deals did not work out and the company fell on hard times. With an investment from Allen & Company, Convera tried its hand at Web indexing. Finally, stakeholders lost faith and Convera sold off its government sales and folded its tent. (Some of the principals cooked up another search company. This time the former Convera wizards got into the consulting engineering business.) Convera lives on in a sense as part of the Ntent system. Convera lost some money along the way. Lots of money as I recall.

And Fast Search? Microsoft paid $1.2 billion for Fast Search. Now the 1998 technology lives on within Microsoft SharePoint. But Fast Search has the unique distinction of facing both a financial investigation for fancy dancing with its profit and loss statement and the distinction of having its founder facing a jail term. Fast Search ran into trouble when its marketers promised magic from the ESP system. When the pixie dust caused licensees to develop an allergic reaction, Fast ran into trouble. The scrambling caused some managers to flee the floundering Norwegian search ship and found another search company. For those who struggle with Fast Search in its present guise, you understand the issues created by Fast Search’s “sell it today and program it tomorrow” approach.

Is There a Lesson in These Vendors’ Trajectories?

What do these three examples tell us? High flying enterprise search vendors seem to have run into some difficulties. Not surprisingly, the customers of these companies are often wary of enterprise search. Perhaps that is the reason so many enterprise search vendors do not use the words “enterprise search”, preferring euphemisms like customer support, business intelligence, and knowledge management?

The Rush to Sell Out before Drowning in Red Ink

Now a sidelight. Before open source search effectively became the go to keyword search system, there were vendors who had products that for the most part worked when installed to do basic information retrieval. These companies’ executives worked overtime to find buyers. The founders cashed out and left the new owners to figure out how to make sales, pay for research, and generate sufficient revenue to get the purchase price back. Which companies are these? Here’s a short and incomplete list to help jog your memory:

  • Artificial Linguistics (sold to Oracle)
  • BRS Search (sold to OpenText)
  • EasyAsk (first to Progress Software and then to an individual investor)
  • Endeca to Oracle
  • Enginium (sold to Kroll and now out of business)
  • Exalead to Dassault
  • Fulcrum Technology to IBM (quite a story. See the Fulcrum profile at www.xenky.com/vendor-profiles)
  • InQuira to Oracle
  • Information Dimensions (sold to OpenText)
  • Innerprise (Microsoft centric, sold to GoDaddy)
  • iPhrase to IBM (iPhrase was a variant of Teratext’s approach)
  • ISYS Search Software to Lexmark (yes, a printer company)
  • RightNow to Oracle (RightNow acquired Dutch technology for its search function)
  • Schemalogic to Smartlogic
  • Stratify/Purple Yogi (sold to Iron Mountain and then to Autonomy)
  • Teratext to SAIC, now Leidos
  • TripleHop to Oracle
  • Verity to Autonomy and then HP bought Autonomy
  • Vivisimo to IBM (how clustering and metasearch magically became a Big Data system from the company that “invented” Watson).

The brand impact of these acquired search vendors is dwindling. The only “name” on the list which seems to have some market traction is Endeca.

Some outfits just did not make it or are in a very quiet, almost dormant, mode. Consider these search vendors:

  • Delphes (academic thinkers with linguistic leanings)
  • Edgee
  • Dieselpoint (structured data search)
  • DR LINK (Syracuse University and an investment bank)
  • Executive Search (not a headhunting outfit, an enterprise search outfit)
  • Grokker
  • Intrafind
  • Kartoo
  • Lextek International
  • Maxxcat
  • Mondosoft
  • Pertimm (reincarnated with Axel Springer (Macmillan) money as Qwant, which according to Eric Schmidt, is a threat to Google. Yeah, right.)
  • Siderean Software (semantic search)
  • Speed of Mind
  • Suggest (Weitkämper Technology)?
  • Thunderstone

This is not a comprehensive list. I just wanted to lay out some facts about vendors who tilted at the enterprise search windmill. I think that a reasonable person might conclude that enterprise search has been a tough sell. Of the companies that developed a brand, none was able to achieve sustainable revenues. The information highway is littered with the remains of vendors who pitched enterprise search as the killer app for anything to do with information.

Now the survey purports to reveal insights to which I have been insensitive in my decades of work in digital information access.

Here’s what the company sponsoring the survey offers:

Concept Searching [the survey promulgator], the global leader in semantic metadata generation, auto-classification, and taxonomy management software, and developer of the Smart Content Framework™, is compiling the statistics from its 2015 SharePoint and Office 365 Metadata survey, currently unpublished. One of the findings, gathered from over 360 responses, indicates a renewed focus on improving enterprise search.

The focus seems to be on SharePoint. I thought SharePoint was a mishmash of content management, collaboration, and contacts along with documents created by the fortunate SharePoint users. Question: Is enterprise search conflated with SharePoint?

I would not make this connection.

If I understand this, the survey makes clear that some of the companies in the “sample” (method of selection not revealed) want better search. I want better information access, not search per se.

Each day I have dozens of software applications which require information access activity. I also have a number of “enterprise” search systems available to me. Nevertheless, the finding suggests to me that enterprise search is not and has not been particularly good. If I put on my SharePoint sunglasses, I see a glint of the notion that SharePoint search is not very good. The dying sparks of Fast Search technology are smoldering in the fire at Camp DontWorkGud.

Images, videos, and audio content present me with a challenge. Enterprise search and metatagging systems struggle to deal with these content types. I also get odd ball file formats; for example, Framemaker, Quark, and AS/400 DB2 UDB files.

The survey points out that the problem with enterprise search is that indexing is not very good. That may be an understatement. But the remedy is not just indexing, is it?

After reading the news release, I formed the opinion that the fix is to use the type of system available from the survey sponsor Concept Searching. Is that a coincidence?

Frankly, I think the problems with search are more severe than bad indexing, whether performed by humans or traditional “smart” software.

According to the news release, my view is not congruent with the survey or the implications of the survey data:

A new focus on enterprise search can be viewed as a step forward in the management and use of unstructured content. Organizations are realizing that the issue isn’t going to go away and is now impacting applications such as records management, security, and litigation support. This translates into real business currency and increases the risk of non-compliance and security breaches. You can’t find, protect, or use what you don’t know exists. For those organizations that are using, or intend to deploy, a hybrid environment, the challenges of leveraging metadata across the entire enterprise can be daunting, without the appropriate technology to automate tagging.

Real business currency. Is that money?

Are system administrators still indexing human resource personnel records, in process legal documents related to litigation, data from research tests and trials in an enterprise search system? I thought a more fine-grained approach to indexing was appropriate. If an organization has a certain type of government work, knowledge of that work can only be made available to those with a need to know. Is indiscriminate and uncontrolled indexing in line with a “need to know” approach?

Information access has a bright future. Open source technology such as Lucene/Solr/Searchdaimon/SphinxSearch, et al., is a reasonable approach to keyword functionality.

Value-added content processing is also important but not as an add on. I think that the type of functionality available from BAE, Haystax, Leidos, and Raytheon is more along the lines of the type of indexing, metatagging, and coding I need. The metatagging is integrated into a more modern system and architecture.

For instance, I want to map geo-coordinates in the manner of Geofeedia to each item of data. I also want context. I need an entity (Barrerra) mapped to an image integrated with social media. And, for me, predictive analytics are essential. If I have the name of an individual, I want that name and its variants. I want the content to be multi-language.

I want what next generation information access systems deliver. I don’t want indexing and basic metatagging. There is a reason for Google’s investing in Recorded Future, isn’t there?

The future of buggy whip enterprise search is probably less of a “strategic application” and more of a utility. Microsoft may make money from SharePoint. But for certain types of work, SharePoint is a bit like Windows 3.11. I want a system that solves problems, not one that spawns new challenges on a daily basis.

Enterprise search vendors have been delivering so-so, flawed, and problematic functionality for 40 years. After decades of vendor effort to make information findable in an organization, has significant progress been made? DARPA doesn’t think search is very good. The agency is seeking better methods of information access.

What I see when I review the landscape of enterprise search is that today’s “leaders” (Attivio, BA Insight, Coveo, dtSearch, Exorbyte, among others) remind me of the buggy whip makers driving a Model T to lecture farmers that their future depends on the horse as the motive power for their tractor.

Enterprise search is a digital horse, and one that is approaching breakdown.

Enterprise search is a utility within more feature rich, mission critical systems. For a list of 20 companies delivering NGIA with integrated content processing, check out www.xenky.com/cyberosint.

Stephen E Arnold, March 20, 2015

Enterprise Search: Messages Confuse, Confound

March 19, 2015

I review a couple of times a week a free digital “newspaper” called Paper.li. I learned about this Paper.li “newspaper” when Vivisimo sent me its version of “search news.” The enterprise search newspaper I receive is assembled under the firm hand of Edwin Stauthamer. The stories are automatically assembled into “The Enterprise Search Daily.”

The publication includes a wide range of information. The referrer’s name appears with each article. The title page for the March 18, 2015, issue looks like this.

[Image: title page of the March 18, 2015, issue of The Enterprise Search Daily]

In the last week or so, I have noticed a stridency in the articles about search and the disciplines the umbrella term protects from would-be encroachers. Search is customer support, but from the enterprise search vendors’ viewpoint, enterprise search is the secret sauce for a great customer support soufflé. Enterprise search also does Big Data, business intelligence, and dozens of other activities.

The reason for the primacy of search, as I understand the assertions of the search companies and the self-appointed search “experts,” is that information retrieval makes the business work. Improve search. It follows, according to the logic, that revenues will increase, profits will rise, and employee and customer satisfaction will skyrocket.

Unfortunately enterprise search is difficult to position as the alpha and omega of enterprise software. Consider this article from the March 18 edition of The Enterprise Search Daily.

Why Enterprise Search is a Must Have for Any Enterprise Content Management Strategy

The article begins:

Enterprise search has notoriously been a problem in the content management equation. Various content and document management systems have made it possible to store files. But the ability to categorize that information intuitively and in a user-friendly way, and make that information easy to retrieve later, has been one of several missing pieces in the ECM market. When will enterprise search be as easy to use and insightful as Google’s external search engine? If enterprise search worked anywhere near as effectively as Google, it might be the versatile new item in our content management wardrobes, piecing content together with a clean sophistication that would appeal to users by making everything findable, accessible and easy to organize.

I am not sure how beginning with the general perception that enterprise search has been, is, and may well be a failure flips to a “must have” product. My view is that keyword search is a utility. For organizations with cash to invest, automated indexing and tagging systems can add some additional findability hooks. The caveat is that the licensee of these systems must be prepared to spend money on a professional who can ride herd on the automated system. The indexing strays have to be rounded up and meshed with the herd. But the title’s assertion is a dream, a wish. I don’t think enterprise content management is particularly buttoned up in most organizations. Even primitive search systems struggle to figure out what version is the one the user needs to find. Indexing by machine or human often leads to manual inspection of documents in order to locate the one the user requires. Google wanders into the scene because most employees give Google.com a whirl before undertaking a manual inspection job. If the needed document is on the Web somewhere, Google may surface it if the user is lucky enough to enter the secret combination of keywords. Google is deeply flawed, but for many employees, it is better than whatever their employer provides.

Lookeen Desktop Search: Exclusive Interview Reveals Lucene as a Personal Search Solution

March 17, 2015

Axonic’s enterprise-centric search products eliminate most, if not all, of the problems a Windows user encounters when trying to locate related information produced by different applications on a desktop computer. Email and other types of information are findable with a few keystrokes.

When I was in Germany in June 2014, I learned about Lookeen, a desktop search product that was built on Lucene. The idea was to tap the power of Lucene to put content on a user’s computer at one’s fingertips. Imagine working in Outlook, reading a message, and seeing a reference to a PowerPoint on the user’s external storage device. Lookeen allows access to the content from within Outlook. Now the company is releasing a commercial version of its desktop search product that promises to be a game changer on the desktop and in the enterprise. The company offers robust functionality at a very attractive price point.

The role of Lucene and other technical innovations in the high-performance software appears in an exclusive interview with Lookeen’s chief operating officer. You can find the interview at http://bit.ly/1LizbkQ.

Lookeen Search Results

The Lookeen interface is intuitive. No training is required to install the Lucene-based system or to use it for simple or complex information retrieval tasks. Image used with the permission of Axonic GmbH.

Lookeen is a product developed by Axonic, a software and services firm located in Karlsruhe, Germany, in the Rhine Valley, a short distance from Stuttgart. Axonic is one of the leading software development and services firms for Outlook and Exchange Server search technologies in Europe. The company specializes in enterprise applications and has a core competency in Microsoft technologies.

I wanted more detail about Lookeen’s approach to desktop search. In an exclusive interview, Peter Oehler, COO, revealed the company’s breakthrough approach to desktop search. The company’s Lookeen software gives Windows users industry-leading search technology tuned for the Microsoft environment. Outlook email, PowerPoint decks, Word documents, and other common file types are instantly findable.

Peter Oehler said:

We’ve utilized Lucene’s extensive query syntax to enable users to use familiar Google-like Boolean search, as well as wildcard, proximity, and keyword matching.  The introduction of more search strings and filter features enable users to narrow down searches in an easy and intuitive way, and more proficient searchers can access the best of Lucene’s query syntax.
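
To make the query types Oehler mentions concrete, here is a minimal sketch that builds query strings in the syntax of Lucene’s classic query parser: Boolean operators, a trailing wildcard, an exact phrase, and a proximity search. The field names (`subject`, `body`) and the helper functions are my own inventions for illustration. The interview does not describe Lookeen’s index schema or the exact syntax exposed to its users.

```python
# Sketch only: builds query strings in Lucene's classic query parser syntax.
# Field names and helper names are hypothetical, not Lookeen's.

def phrase(text: str) -> str:
    """Exact phrase match: wrap the words in double quotes."""
    return f'"{text}"'

def proximity(text: str, distance: int) -> str:
    """Proximity search: the quoted words within `distance` words of each other."""
    return f'"{text}"~{distance}'

def wildcard(prefix: str) -> str:
    """Trailing wildcard: any term beginning with `prefix`."""
    return f"{prefix}*"

def fielded(field: str, clause: str) -> str:
    """Restrict a clause to one indexed field."""
    return f"{field}:{clause}"

def boolean(op: str, *clauses: str) -> str:
    """Join clauses with AND or OR and group them in parentheses."""
    return "(" + f" {op} ".join(clauses) + ")"

query = boolean(
    "AND",
    fielded("subject", wildcard("budget")),
    boolean(
        "OR",
        fielded("body", phrase("quarterly results")),
        fielded("body", proximity("forecast revenue", 5)),
    ),
)
print(query)
```

A string like this would be handed to a Lucene query parser; a friendlier search box or filter panel, as Oehler describes, presumably generates comparable clauses behind the scenes.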

He added:

Lucene is a very good, widely used open source search system. Many of the innovations we’ve developed on top of the Lucene engine stem directly from our extensive experience with Outlook. For example, the Lookeen context menu allows a user to open, reply to, forward, move and summarize emails and topics, all from within Lookeen.

What sets Lookeen apart from proprietary, freeware, and shareware alternatives is that Axonic has engineered its system to provide real-time access to information on the user’s computer. The system can handle terabytes of user content, returning results almost instantaneously.

Axonic has deep experience with Microsoft technology. Oehler told me:

Lucene is a beast within the Microsoft environment. Microsoft doesn’t make it easy to work with Outlook without causing problems or affecting performance. Outlook is the lifeblood of most professionals – the most important tool. If it stops working, you stop working. The art of our product is how we tackle the complex code hiding under the surface of Outlook and combine it with Lucene to create a deceptively smooth and simple search solution.

Beyond Search ran tests on Lookeen and compared the results with outputs from a number of test systems. Lookeen’s response times were among the fastest. When indexing and searching email, including archived collections of emails, Lookeen was the top performer. Our test systems include Copernic, dtSearch, Effective File Search, Gaviri, ISYS Desktop Search, and X1.

Lookeen requires no special training or complex set up. Lookeen allows a user to search external shared content directly from the Lookeen app. The interface is clear and logical. A busy professional can access needed documents, view and interact with them without launching an external application.

A 14-day free trial is available. The license fee is $58 for a single-user version. The company offers a business edition (at $83), which adds group policy functions, and an enterprise edition, which begins at about $116 per user; volume discounts are available.

To read the complete exclusive interview with Peter Oehler, navigate to the Search Wizards Speak service at this link on ArnoldIT. More information about the company is available at http://www.lookeen.com.

Stephen E Arnold, March 17, 2015

Google: Similarity Function Drifts from Relevance

March 12, 2015

I ran a test query for “Concept Searching,” an indexing outfit. I noticed that Google generated a list of companies with the label “People Also Search For.”

image

What I find interesting is that the list of companies is a bit of a grab bag. Here is the list presented to me:

image

  • X1 Technologies, now in the eDiscovery business
  • GenieKnows, an SEO outfit
  • Funnelback, Squiz’s search solution which has gone quiet since David Hawking shifted roles
  • dtSearch, a Microsoft centric desktop and CD-ROM search system for Windows
  • Northern Light Group, now a research firm
  • Coveo, the ageing startup once focused exclusively on Microsoft centric solutions
  • ZyLAB Technologies, a legal document management and search solution
  • Metalogix, a SharePoint migration specialist
  • Convera, one of the spectacular business implosions which I documented in the Xenky profile available at www.xenky.com/vendor-profiles
  • Dieselpoint, a search outfit that went quiet a couple of years ago
  • Axceler, now a unit of Metalogix
  • Fast Search & Transfer, the search company that has the distinction of a financial misstep and a founder with a painful brush with Norwegian law enforcement
  • Exalead, now a unit of Dassault Systèmes, a company which has largely faded from the North American market
  • Expert System, a quite good semantic vendor based in Modena, Italy
  • Vivisimo, a metasearch outfit acquired by IBM and now part of the IBM Big Data machine.

Quite an assortment. I assume that these suggestions are helpful to the LinkedIn experts, the failed webmasters now rebranded as search wizards, and wanna-be academics looking for consulting revenue.

For me, the list is an illustration of what Google wants to do: provide on point suggestions. However, the list makes vivid the limitations of the Google methods. Hey, the company is focusing attention on balloons.

Stephen E Arnold, March 12, 2015

Enterprise Search: Is Keyword Search a Lycra-Spandex Technology?

March 3, 2015

I read a series of LinkedIn posts about why search may be an enterprise application flop. To access the meanderings of those who believe search is a young Bruce Jenner, you will have to sign up for LinkedIn and then wrangle an invitation to this discussion. Hey, good luck with this access to LinkedIn thing.

Over the years, enterprise search has bulked up. The keyword indexing has been wrapped in layers of helper code. For example, search now classifies, performs workflow operations, identifies entities, supports business intelligence dashboards, delivers self service Web help, handles Big Data, and provides dozens of other services.

image

Image Source: www.sochealth.co.uk.

I have several theories about this chubbification of keyword search. Let me highlight the thoughts that I jotted down as I worked through the “flop” postings on LinkedIn.

First, keyword search is not particularly useful to some people looking for information in an organization. The employee has to know what he or she needs and the terminology to use to unlock the secrets of the index. Add some time pressure and keyword search becomes infuriating. The fix, which began when Fulcrum Technologies pitched a platform approach to search, was to make search a smaller part of a more robust information management solution. You can still buy pieces of the original 1980s Fulcrum technology from OpenText today.

Second, system users continue to perceive a results list as a type of homework. The employee has to browse the results list, click on documents that may contain the needed information, scan the document, identify the factoid or paragraph needed, copy it to another document, and then repeat the process. Employees want answers. What better way to deliver those answers than a “point and click” interface? Just pick what one needs and be done with the drudgery of the keyword search.

Third, professionals working in organizations want to find information from external sources like Web pages and blogs and from internal sources such as the server containing the proposals or president’s PowerPoint presentations. Enterprise search is presented as a solution to information access needs. The licensee quickly learns that most enterprise search systems require money, engineers, and time to set up so that content from disparate sources can be presented from a single interface. Again employees grouse when videos from YouTube and from the training department are not in the search results. Some documents containing needed information are not in the search system’s index but a draft version of the document is available via a Bing or Google search.

Fourth, the enterprise search system built on keywords lacks intelligence. For many vendors the solution is to add semantic intelligence, dynamic personalization which figures out what an employee needs by observing his information behaviors, and predictive analytics which just predicts what is needed for the company, a department and an individual.

Fifth, vendors have emphasized that a smart organization must have a taxonomy, a list of words and concepts tailored to the specific organization. These terms enrich the indexing of content. To make taxonomy management easy as pie, search vendors have tossed in editorial controls for indexing, classification, and hit boosting so that certain information appears whether the employee asked for the data or not.

In short order, the enterprise search system looks quite a bit like the “Obesity Is No Laughing Matter” poster.

This state of affairs is good for consulting engineers (SharePoint search, anyone?), mid tier consulting firm pundits, failed webmasters recast as search experts, and various hangers on. The obese enterprise search system is not particularly good for the licensing organization, the employees who are asked to use the system, or for the system administrators who have to shoehorn search into their already stuffed schedule for maintaining databases, accounting systems, enterprise resource planning, and network services.

Search is morbidly obese. No diet is going to work. The fix, based on the research conducted for my new monograph CyberOSINT, is that a different approach is needed. Automated collection, analysis, and outputs are the future of information access.

Keyword search is a utility and available in next generation information access (NGIA) systems. Unlike the obese keyword search systems, NGIA has been engineered to deliver more integrated services to users relying on mobile devices as well as traditional desktop computers.

Obese search is no laughing matter. One cannot make a utility into an NGIA system. However, an NGIA system can incorporate search as a utility function. Keep this in mind if you are embracing Microsoft SharePoint-type systems. Net net: traditional enterprise search is splitting its seams, and it is unsightly.

Stephen E Arnold, March 3, 2015

Taxonomy Turmoil: Good Enough May Be Too Much

February 28, 2015

For years, I have posted a public indexing Overflight. You can examine the selected outputs at this Overflight link. (My non public system is more robust, but the public service is a useful temperature gauge for a slice of the content processing sector.)

When it comes to indexing, most vendors provide keyword, concept tagging, and entity extraction. But are these tags spot on? No, most are good enough.

image

A happy quack to Jackson Taylor for this “good enough” cartoon. The salesman makes it clear that good enough is indeed good enough in today’s marketing enabled world.

I chose about 50 companies that asserted their systems performed some type of indexing or taxonomy function. I learned that the taxonomy business is “about to explode.” I find that to be either an interesting investment tip or a statement that is characteristic of content processing optimists.

Like search and retrieval, plugging in “concepts” or other index terms is a utility function. If one indexes each word in an article appearing in this blog, the words alone may point to a subject other than the article’s real topic. For example, in this post, I am talking about Overflight, but the real topic is the broader use of metadata in information retrieval systems. I could assign the term “faceted navigation” to this article as a way to mark this article as germane to point and click navigation systems.
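
The difference between indexing each word and assigning a concept term can be sketched with a toy inverted index. This is an illustration under stated assumptions, not how Overflight or any vendor’s product works: the document ids, the sample text, and the `build_index` helper are all invented for the example.

```python
import re
from collections import defaultdict

def build_index(docs, tags=None):
    """Map each lowercase term to the set of document ids containing it.

    `tags` optionally maps a document id to assigned concept terms,
    which are indexed as whole phrases alongside the extracted words.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    for doc_id, concepts in (tags or {}).items():
        for concept in concepts:
            index[concept.lower()].add(doc_id)
    return index

# Hypothetical sample content standing in for two blog posts.
docs = {
    "overflight-post": "I am talking about Overflight and indexing vendors.",
    "other-post": "Google lists companies people also search for.",
}
# Concept terms a person or a tagging system might assign.
assigned = {"overflight-post": ["faceted navigation", "metadata"]}

words_only = build_index(docs)
with_tags = build_index(docs, assigned)

# The words-only index has no entry for the concept; the tagged index does.
print(sorted(words_only.get("faceted navigation", set())))
print(sorted(with_tags.get("faceted navigation", set())))
```

The word-level index can only return what the author happened to type. The assigned term is the hook that makes the post findable under a concept the text never mentions, and it is also the part that costs money to create and maintain.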

If you examine the “reports” Overflight outputs for each of the companies, you will discover several interesting things as I did on February 28, 2015 when I assembled this short article.

  1. Mergers or purchases of failed vendors at fire sale prices are taking place. Examples include Lucidea’s purchase of Cuadra and InMagic. Both of these firms are anchored in traditional indexing methods and seemed to be within a revenue envelope until their sell out. Business Objects acquired Inxight and then SAP acquired Business Objects. Bouvet acquired Ontopia. Teradata acquired Revelytix.
  2. Moving indexing into open source. Thomson Reuters acquired ClearForest and made most of the technology available as OpenCalais. OpenText, a rollup outfit, acquired Nstein. SAS acquired Teragram. Smartlogic acquired Schemalogic. (A free report about Schemalogic is available at www.xenky.com/vendor-profiles.)
  3. A number of companies just failed, shut down, or went quiet. These include Active Classification, Arikus, Arity, Forth ICA, MaxThink, Millennium Engineering, Navigo, Progris, Protege, punkt.net, Questans, Quiver, Reuse Company, Sandpiper,
  4. The indexing sector includes a number of companies my non public system monitors; for example, the little known Data Harmony with six figure revenues after decades of selling really hard to traditional publishers. Conclusion: Indexing is a tough business to keep afloat.

There are numerous vendors who assert their systems perform indexing, entity, and metadata extraction. More than 18 of these companies are profiled in CyberOSINT, my new monograph. Oracle owns Triple Hop, RightNow, and Endeca. Each of these acquired companies performs indexing and metadata operations. Even the mashed potatoes search solution from Microsoft includes indexing tools. The proprietary XML data management vendor MarkLogic asserts that it performs indexing operations on content stored in its repository. Conclusion: More cyber oriented firms are likely to capture the juicy deals.

So what’s going on in the world of taxonomies? Several observations strike me as warranted:

First, none of the taxonomy vendors are huge outfits. I suppose one could argue that IBM’s Lucene based system is a billion dollar baby, but that’s marketing peyote, not reality. Perhaps MarkLogic which is struggling toward $100 million in revenue is the largest of this group. But the majority of the companies in the indexing business are small. Think in terms of a few hundred thousand in annual revenue to $10 million with generous accounting assumptions.

What’s clear to me is that indexing, like search, is a utility function. If a good enough search system delivers good enough indexing, then why spend for humans to slog through the content and make human judgments? Why not let Google funded Recorded Future identify entities, assign geo codes, and extract meaningful signals? Why not rely on Haystax or RedOwl or any one of the more agile firms to deliver higher value operations?

I would assert that taxonomies and indexing are important to those who desire the accuracy of a human indexed system. This assumes that the humans are subject matter specialists, the humans are not fatigued, and the humans can keep pace with the flow of changed and new content.

The reality is that companies focused on delivering old school solutions to today’s problems are likely to lose contracts to companies that deliver what the customer perceives as a higher value content processing solution.

What can a taxonomy company do to ignite its engines of growth? Based on the research we performed for CyberOSINT, the future belongs to those who embrace automated collection, analysis, and output methods. Users may, if they so choose, provide guidance to the system. But the days of yore, when monks with varying degrees of accuracy created catalog sheets for the scriptoria, have been washed to the margin of the data stream by today’s content flows.

What’s this mean for the folks who continue to pump money into taxonomy centric companies? Unless the cyber OSINT drum beat is heeded, the failure rate of the Overflight sample is a wake up call.

Buying Apple bonds might be a more prudent financial choice. On the other hand, there is an opportunity for taxonomy executives to become “experts” in content processing.

Stephen E Arnold, February 28, 2015
