Falcon Searches Through Browser History

October 21, 2016

Have you ever visited a Web site and then lost the address or could not find a particular section on it? You know that the page exists, but no matter how often you use an advanced search feature or scour your browser history, it cannot be found. If you use Google Chrome as your main browser, then there is a solution, says GHacks in the article, “Falcon: Full-Text history Search For Chrome.”

Falcon is a Google Chrome extension that adds full-text history search to the browser. Chrome usually remembers the addresses of Web sites and their pages when you type them into the address bar. The Falcon extension augments this default behavior so that it also matches text found on previously visited Web sites.

Falcon is a search option within a search feature:

The main advantage of Falcon over Chrome’s default way of returning results is that it may provide you with better results.  If the title or URL of a page don’t contain the keyword you entered in the address bar, it won’t be displayed by Chrome as a suggestion even if the page is full of that keyword. With Falcon, that page may be returned as well in the suggestions.
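A minimal Python sketch of the difference GHacks describes. It assumes, hypothetically, that the text of each visited page has been captured. The `Visit` record, the crude tokenizer, and the sample URLs are invented for illustration and are not how Falcon itself is built. The title/URL check stands in for Chrome’s default suggestions. The inverted index stands in for full-text history search.

```python
import re
from dataclasses import dataclass


@dataclass
class Visit:
    url: str
    title: str
    body: str  # hypothetical: page text captured when the page was visited


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def title_url_match(history: list[Visit], keyword: str) -> list[str]:
    """Stand-in for default suggestions: look only at the title and the URL."""
    kw = keyword.lower()
    return [v.url for v in history if kw in v.title.lower() or kw in v.url.lower()]


def build_index(history: list[Visit]) -> dict[str, set[str]]:
    """Inverted index: token -> URLs whose title, URL, or body contains it."""
    index: dict[str, set[str]] = {}
    for v in history:
        for tok in tokens(v.title) | tokens(v.url) | tokens(v.body):
            index.setdefault(tok, set()).add(v.url)
    return index


def full_text_match(index: dict[str, set[str]], query: str) -> list[str]:
    """URLs containing every query token (AND semantics), sorted for stable output."""
    toks = tokens(query)
    if not toks:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in toks))
    return sorted(hits)


# Invented sample history for illustration.
history = [
    Visit("https://example.com/post/42", "Weekend notes",
          "A long essay about crimping and product sabotage."),
    Visit("https://example.com/crimping", "Crimping explained", "Short summary."),
]
index = build_index(history)
print(title_url_match(history, "sabotage"))  # []
print(full_text_match(index, "sabotage"))    # ['https://example.com/post/42']
```

In the sample history, a search for “sabotage” finds nothing by title or URL. The full-text index still returns the first page because the word appears in its body. The index is built once and then reused for every query, which is the usual reason to invert the data rather than rescan each page.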

The new Chrome extension acts as a filter on recorded Web history and improves a user’s search experience, so users do not have to sift through results one by one.

Whitney Grace, October 21, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Structured Search: New York Style

October 10, 2016

An interesting and brief search-related content marketing white paper, “InnovationQ Plus Search Engine Technology,” attracted my attention. What’s interesting is that the IEEE is apparently in the search engine content marketing game. The example I have in front of me is from a company doing business as IP.com.

What does InnovationQ Plus do to deliver on point results? The write up says:

This engine is powered by IP.com’s patented neural network machine learning technology that improves searcher productivity and alleviates the difficult task of identifying and selecting countless keywords/synonyms to combine into Boolean syntax. Simply cut and paste abstracts, summaries, claims, etc. and this state-of-the art system matches queries to documents based on meaning rather than keywords. The result is a search that delivers a complete result set with less noise and fewer false positives. Ensure you don’t miss critical documents in your search and analysis by using a semantic engine that finds documents that other tools do not.

The use of snippets of text as the raw material for a behind-the-scenes query generator reminds me of the original DR-LINK method, among others. Perhaps there is some Syracuse University “old school” search DNA in the InnovationQ Plus approach? Perhaps the TextWise system has manifested itself as a “new” approach to patent and STEM (scientific, technology, engineering, and medical) online searching? Perhaps Manning & Napier’s interest in information access has inspired a new generation of search capabilities?

My hunch is, “Yep.”

If you don’t have a handy snippet encapsulating your search topic, just fill in the query form. Google offers a similar “fill in the blanks” approach, even though only a tiny percentage of those looking for information on Google use advanced search. You can locate the Google advanced search form at this link.

Part of the “innovation” is the use of fielded search. Fielded search is useful. It was the go-to method for locating information in the late 1960s. The method fell out of favor with the Sillycon Valley crowd when the idea of talking to one’s mobile phone became the synonym for good enough search.
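For readers who have not used a fielded search form, here is a Python sketch under stated assumptions. Each record has named fields, and a query says which field each term must appear in, with an optional year range. The `Record` type, its fields, and the sample data are hypothetical. The matching is plain case-insensitive substring matching, not IP.com’s method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    title: str
    abstract: str
    assignee: str
    year: int


def fielded_search(records: list[Record], year_from: Optional[int] = None,
                   year_to: Optional[int] = None, **field_terms: str) -> list[Record]:
    """Keep records where each named text field contains its term
    (case-insensitive substring) and the year falls in the optional range.
    An unknown field name raises AttributeError."""
    results = []
    for r in records:
        if year_from is not None and r.year < year_from:
            continue
        if year_to is not None and r.year > year_to:
            continue
        if all(term.lower() in getattr(r, field).lower()
               for field, term in field_terms.items()):
            results.append(r)
    return results


# Invented sample records for illustration.
records = [
    Record("Neural ranking of claims", "Matches queries by meaning.", "Acme Corp", 2015),
    Record("Boolean retrieval engine", "Keyword and field operators.", "Beta Inc", 1998),
]
print([r.title for r in fielded_search(records, title="boolean")])
# ['Boolean retrieval engine']
print([r.title for r in fielded_search(records, assignee="acme", year_from=2010)])
# ['Neural ranking of claims']
```

The searcher has to know which field matters and what to type in it. That is the burden structured search places on the person doing the looking.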

To access the white paper, navigate to the IEEE registration page and fill out the form at this link.

From my vantage point, structured search with “more like this” functions is a good way to search for information. There is a caveat. The person doing the looking has to know what he or she needs to know.

Good enough search takes a different approach. The systems try to figure out what the searcher needs to know and then deliver it. The person looking for information is not required to do much thinking.

The InnovationQ Plus approach shifts the burden from smart software to smart searchers.

Good enough search is winning the battle. In fact, some Sillycon Valley folks, far from upstate New York, have embraced good enough search with both hands. Why use words at all? There are emojis, smart software systems predicting what the user wants to know, and Snapchat-infused image-based methods.

The challenge will be to find a way to bridge the gap between the Sillycon Valley good enough methods and the more traditional structured search methods.

IEEE seems to agree as long as the vendor “participates” in a suitable IEEE publishing program.

Stephen E Arnold, October 10, 2016

Crimping: Is the Method Used for Text Processing?

October 4, 2016

I read an article I found quite thought-provoking. “Why Companies Make Their Products Worse” explains that reducing costs allows a manufacturer to expand the market for a product. The idea is that more people will buy a product if it is less expensive than a more sophisticated version of the product. The example which I highlighted in eyeshade green explained that IBM introduced an expensive printer in the 1980s. Then IBM manufactured a different version of the printer using cheaper labor. The folks from Big Blue added electronic components to make the cheaper printer slower. The result was a lower cost printer that was “worse” than the original.


Perhaps enterprise search and content processing is a hybrid of two or more creatures?

The write up explained that this approach to degrading a product to make more money has a name: crimping. The concept is also described as “product sabotage”; that is, intentionally degrading a product for business reasons.

The comments to the article offer additional examples and one helpful person with the handle Dadpolice stated:

The examples you give are accurate, but these aren’t relics of the past. They are incredibly common strategies that chip makers still use today.

I understand the hardware or tangible product application of this idea. I began to think about the use of the tactic by text processing vendors.

The Google Search Appliance may have been a product subject to crimping. As I recall, the most economical GSA was less than $2000, a price which was relatively easy to justify in many organizations. Over the years, the low cost option disappeared and the prices for the Google Search Appliances soared to Autonomy- and Fast Search-levels.

Other vendors introduced search and content processing systems, but the prices remained lofty. Search and content processing in an organization never seemed to get less expensive when one considered the resources required, the license fees, the “customer” support, the upgrades, and the engineering for customization and optimization.

My hypothesis is that enterprise content processing does not yield compelling examples like the IBM printer example.

Perhaps the adoption rate for open source content processing reflects a pent-up demand for “crimping”? Perhaps some clever graduate student will take the initiative to examine content processing product prices? Licensees pay for sophisticated systems like those available from outfits like IBM and Palantir Technologies. The money comes from the engineering and what I call “soft” charges; that is, training, customer support, and engineering and consulting services.

At the other end of the content processing spectrum are open source solutions. The middle ground between free or low cost systems and high end solutions does not have many examples. I am confident there are some, but I could identify only Funnelback, dtSearch, and a handful of other outfits.

Perhaps “crimping” is not a universal principle? On the other hand, perhaps content processing is a type of technical software with its own idiosyncrasies.

Content processing products, I believe, become worse over time. The reason is not “crimping.” The trajectory of lousiness comes from:

  • Layering features on keyword retrieval in hopes of generating keen buyer interest
  • Adding features to help justify price increases
  • Increasing complexity, which makes it less likely that the licensee will be able to fiddle with the system
  • Refusing to admit that content processing is a core component of many other types of software; “finding information” has become a standard component of other applications.

If content processing is idiosyncratic, that might explain why investors pour money into content processing companies which have little chance to generate sufficient revenue to pay off investors, generate a profit, and build a sustainable business. Enterprise search and content processing vendors seem to be in a state of reinventing or reimagining themselves. Guitar makers just pursue cost cutting and expand their market. It is not so easy for content processing companies.

Stephen E Arnold, October 4, 2016

Pharmaceutical Research Made Simple

October 3, 2016

Pharmaceutical companies are a major power in the United States. Their power comes from the medicine they produce and the wealth they generate. In order to maintain both wealth and power, pharmaceutical companies conduct a lot of market research. Market research is a field based on people’s opinions and reactions; in other words, it produces information that is hard to process into black and white data. Lexalytics is a big data platform built with sentiment analysis to turn market research into usable data.

Inside Big Data explains how “Lexalytics Radically Simplifies Market Research And Voice Of Customer Programs For The Pharmaceutical Industry” with a new package called the Pharmaceutical Industry Pack. Lexalytics uses a combination of machine learning and natural language processing to understand the meaning and sentiment in text documents. The new pack can help pharmaceutical companies interpret how their customers react to medications, what their symptoms are, and possible side effects of medication.

‘Our customers in the pharmaceutical industry have told us that they’re inundated with unstructured data from social conversations, news media, surveys and other text, and are looking for a way to make sense of it all and act on it,’ said Jeff Catlin, CEO of Lexalytics. ‘With the Pharmaceutical Industry Pack, the latest in our series of industry-specific text analytics packages, we’re excited to dramatically simplify the jobs of CEM and VOC pros, market researchers and social marketers in this field.’

Along with basic natural language processing features, the Lexalytics Pharmaceutical Industry Pack contains 7,000 sentiment terms drawn from healthcare content and other medical references to help interpret market research data. Lexalytics makes market research easy and offers invaluable insights that would otherwise go unnoticed.
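For readers wondering how a list of sentiment terms turns text into numbers, here is a lexicon-based scorer in Python. It is a deliberately simple stand-in, not Lexalytics’ approach, since the write up says the company combines machine learning and natural language processing. The mini-lexicon, its weights, and the negation rule are invented for illustration.

```python
import re

# Hypothetical mini-lexicon: term -> sentiment weight. Invented for illustration.
LEXICON = {
    "relief": 1.0, "effective": 1.0, "improved": 0.8,
    "nausea": -0.7, "dizziness": -0.6, "ineffective": -1.0, "worse": -0.8,
}
NEGATORS = {"not", "no", "never"}


def score(text: str) -> float:
    """Sum lexicon weights over the tokens. A negator flips the sign of the
    next lexicon term that follows it, then the flag resets."""
    total = 0.0
    negate = False
    for tok in re.findall(r"[a-z']+", text.lower()):
        if tok in NEGATORS:
            negate = True
            continue
        weight = LEXICON.get(tok)
        if weight is not None:
            total += -weight if negate else weight
            negate = False
    return total


print(score("Not effective, and my dizziness got worse."))
print(score("No nausea at all"))
```

A phrase such as “I wish it were effective” still scores as positive with this scorer. That hints at why term lists are usually paired with other methods.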

Whitney Grace, October 3, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Attensity: A Big 404 in Text Analytics

October 1, 2016

Search vendors can save their business by embracing text analytics. Sounds like a wise statement, right? I would point out that our routine check of search and content processing companies turned up this inspiring Web page for Attensity, the Xerox PARC love child and once-hot big dog in text analysis:


Attensity joins a long list of search-related companies which have had to reinvent themselves.

The company pulled in $90 million from a “mystery investor” in 2014. A pundit tweeted in 2015:


In February 2016, Attensity morphed into Sematell GmbH, a company with interaction solutions.

I mention this arabesque because it underscores:

  1. No single add on to enterprise search will “save” an information access company
  2. Enterprise search has become a utility function. Witness the shift to cloud based services like SearchBlox, appliances like Maxxcat, and open source options. Who will go out on a limb for a proprietary utility when open source variants are available and improving?
  3. Pundits who champion a company often have skin in the game. Self-appointed experts for cognitive computing, predictive analytics, or semantic link analysis are tooting a horn without other instruments.

Attensity is a candidate to join the enterprise search Hall of Fame. In the shrine are Delphes, Entopia, et al. I anticipate more members, and I have a short list of “who is next” taped on my watch wall.

Stephen E Arnold, October 1, 2016

Bam! Pow! Zap! Palantir Steps Up Fight with US Army

September 25, 2016

Many moons ago I worked at that fun loving outfit Booz, Allen & Hamilton. I recall one Master of the Universe telling me, “Keep the client happy.” Today an alternative approach has emerged. I term it “Fight with the client.” I assume the tactic works really well.


I read “Palantir Claims Army Misled to Keep It Out of DCGS-A Program.” As I understand the Mixed Martial Arts cage match, the US Army wants to build its own software system. Like many ideas emerging from Washington, DC, the system strikes me as complex and expensive. The program’s funding stretches back a decade. My hunch is that the software system will eventually knit together the digital information required by the US Army to complete its missions. Like many other US government programs, there are numerous vendors involved. Many of these are essentially focused on meeting the needs of the US government.

Palantir Technologies is a Sillycon Valley construct. The company poked its beak through a silicon shell in 2003 and opened for “real” business in 2004. That makes the company 12 years old. Like many disruptive unicorns, Palantir appears to be convinced that its Gotham system can do what the US Army wants done. The Shire and its Hobbits are girding for battle. What are the odds that a high technology company can mount its unicorns and charge into battle and win?


The Palantirians’ reasoning is, by Sillycon Valley standards, logical. Google, by way of comparison, believes that it can solve death and compete with AT&T in high speed fiber. Google may demonstrate that the Sillycon Valley way is more than selling ads, but for now, Google is not gaining traction in some of its endeavors. Palantir wants to activate its four wheel drive and power the US Army to digital nirvana.

The Defense News write up is a 1,200-word explanation of Palantir’s locker room planning. I noted this passage:

The Palo Alto-based company has argued the way the Army wrote its requirements in a request for proposals to industry would shut out Silicon Valley companies that provide commercially available products. The company contended that the Army’s plan to award just one contract to a lead systems integrator means commercially available solutions would have to be excluded.
Palantir is seeking to show the court that its data-management product — Palantir Gotham Platform — does exactly what DCGS-A is trying to do and comes at a much lower cost.

I like the idea of demonstrating the capabilities of Gotham to legal eagles. I know that lawyers are among the most technologically sophisticated professionals in the world. In addition, most lawyers are really skilled at technical problem solving and can work math puzzles while waiting for a Teavana Shaken Iced Tea.


The article also references “a chain of emails.” Yep, emails can be an interesting component of a cage match. With some Palantir proprietary information apparently surfacing in Buzzfeed, perhaps more emails will be forthcoming.

I have formulated three hypotheses about this tussle with the US Army:

  1. Palantir Technologies is not making progress with Gotham because of the downstream consequences of the i2 Analyst’s Notebook legal matter. The i2 product is owned by IBM, and IBM is a potentially important vendor to the US Army. IBM also has some chums in other big outfits working on the DCGS project. Palantir wants to live in the big dogs’ kennel, but no go.
  2. Palantir’s revenue may need the DCGS contracts to make up for sales challenges in other market sectors. Warfighting and related security jobs can be more predictable than selling a one-off to a hospital chain in Tennessee.
  3. Palantir’s perception of Washington may be somewhat negative. Sillycon Valley companies “know” that their “solutions” are the “logical” ones. When Sillycon Valley logic confronts the reality of government contracting, sparks may become visible.

For me, I think the Booz, Allen & Hamilton truism may be on target. Does one keep a customer happy by fighting a public battle designed to prove the “logic” of the Sillycon Valley way?

I don’t think most of the DCGS contractors are lining up to mud wrestle the US Army. I would enjoy watching how legal eagles react to the Gotham wheel menu and learning how long it takes for a savvy lawyer to move discovery content into the Gotham system.

My seeing stone shows a messy five-round battle and a lot of cleanup and medical treatment after the fight.

Stephen E Arnold, September 25, 2016

A Congressman Seems to Support Palantir Gotham for US Army Personnel

September 23, 2016

I read “Commentary: The US Army Should Rethink Its Approach to DCGS.” The write up is interesting because it helped me understand the relationships which exist between an elected official (Congressman Duncan Hunter, Republican from California) and a commercial enterprise (Palantir Technologies). Briefly: The Congressman believes the US Army should become more welcoming to Palantir Technologies’ Gotham system.


A representation of the Department of Defense’s integrated defense acquisition, technology, and life cycle management system.

The write up points out that the US Army is pretty good with tangible stuff: Trucks, weapons, and tanks. The US Army, however, is not as adept with the bits and the bytes. As a result, the US Army’s home brew Distributed Common Ground System is not sufficiently agile to keep pace with the real world. DCGS has consumed about $4 billion and is the product of what I call the “traditional government procurement.”

The Congressman (a former Marine) wants the US Army to embrace Palantir Gotham in order to provide a better, faster, and cheaper system for integrating different types of information and getting actionable intelligence.


US Marine Captain Duncan Hunter before becoming a Congressman. Captain Hunter served in Iraq and Afghanistan. Captain Hunter was promoted to major in 2012.

The write up informed me:

Congress, soldiers and the public were consistently misinformed and the high degree of dysfunction within the Army was allowed to continue for too long. At least now there is verification—through Army admittance—of the true dysfunction within the program.

Palantir filed a complaint which was promptly sealed. The Silicon Valley company appears to be on a path to sue the US Army because Palantir is not the preferred way to integrate information and provide actionable intelligence to US Army personnel.

The Congressman criticizes a series of procedures I learned to love when I worked in some of the large government entities. He wrote:

The Army and the rest of government should take note of the fact that the military acquisition system is incapable of conforming to the lightening pace and development targets that are necessary for software. This should be an important lesson learned and cause the Army, especially in light of repeated misleading statements and falsehoods, to rethink its entire approach on DCGS and how it incorporates software for the Army of the future.

The call to action in the write up surprised me:

The Army has quality leaders in Milley and Fanning, who finally understand the problem. Now the Army needs a software acquisition system and strategy to match.

My hunch is that some champions of Palantir Gotham were surprised too. I expected the Congressman to make more direct statements about Palantir Gotham and the problems the Gotham system might solve.

After reading the write up, I jotted down these observations:

  • The DCGS system has a number of large defense contractors performing the work. One of them is IBM. IBM bought i2 Group. Before the deal with IBM, i2 sued Palantir Technologies, alleging that Palantir sought to obtain some closely held information about Analyst’s Notebook. The case was settled out of court. My hunch is that some folks at IBM have tucked this Palantir-i2 dust up away and reference it when questions about seamless integration of Gotham and Analyst’s Notebook arise.
  • Palantir, like other search and content processing vendors, needs large engagements. The millions, if not billions, associated with DCGS would provide Palantir with cash and a high profile engagement. A DCGS deal would possibly facilitate sales of Gotham to other countries’ law enforcement and intelligence units.
  • The complaint may evolve into actual litigation. Because the functions of Gotham are often used for classified activities, the buzz might allow high-value information to leak into the popular press. Companies like Centrifuge Systems, Ikanow, Zoomdata, and others would benefit from a more open discussion of the issues related to the functioning of DCGS and Gotham. From Palantir’s point of view, this type of information in a trade publication would not be a positive. For competitors, the information could be a gold mine filled with high value nuggets.

Net net: The Congressman makes excellent points about the flaws in the US Army procurement system. I was disappointed that a reference to the F-35 was not included. From my vantage point in Harrod’s Creek, the F-35 program is a more spectacular display of procurement goofs.

More to come. That’s not a good thing. A fully functioning system would deliver hardware and software on time and on budget. If you believe in unicorns, you will, like me, have faith in the government bureaucracy.

Stephen E Arnold, September 23, 2016

Dr. Mike Lynch: After Dark Trace, Luminance

September 19, 2016

I read “Time for Robo-lawyer? Mike Lynch backs Cambridge Law-Tech Start-Up Luminance.” The founder of Autonomy worked his magic on Dark Trace. I wrote a short description of Dark Trace as part of the Commercial Tools section of Dark Web Notebook. With that firm up and growing, Dr. Lynch is now backing smart software to replace human lawyers. With Dr. Lynch’s experience in the rarefied atmosphere of the legal eagles, his new venture makes sense. Use software to trim the wings and perhaps the legal fees of the savvy litigators, tort specialists, and interpreters of wild and crazy laws.

According to the write up:

Founded by a combination of lawyers, experts in M&A and mathematicians Luminance’s technology is based on R&D from Cambridge University, and is anchored in Recursive Bayesian Estimation theory. Obviously. It harnesses the power of artificial intelligence to automatically read and understand hundreds of pages of detailed and complex legal documentation every minute. This offers companies the ability to carry out essential due diligence work with much greater speed.

Yep, Bayesian, Markovian, and Laplacian methods are about to fatten Dr. Lynch’s bank account again.

I highlighted this passage:

“Luminance has been trained to think like a lawyer,” said CEO Emily Foges. “With Slaughter and May’s help, we are designing the system to understand how lawyers think, and to draw out key findings without the need to be told what to look for. This will transform document analysis and enhance the entire transaction process for law firms and their clients. Highly-trained lawyers who would otherwise be scanning through thousands of pages of repetitive documents can spend more of their time analyzing the findings and negotiating the terms of the deal.”


One wonders how Hewlett Packard would have turned out if HP had kept Dr. Lynch and let him fix the old time Sillycon Valley icon. Well, I wonder. I don’t think Meg Whitman will spend much time thinking about Dr. Lynch until the court date in 2017. Perhaps Dr. Lynch will license Luminance technology to HPE so Meg Whitman can understand the value of Dr. Lynch’s approach to business. On the other hand, HPE may embrace OpenText Recommind. That new Luminance stuff may not make Meg Whitman comfortable.

Stephen E Arnold, September 19, 2016

OpenText: Documentum Enters the Canadian Wilderness

September 14, 2016

Documentum is an outfit that some big companies have to use. Other big outfits have hired integrators like IBM to make Documentum the go-to system for creating laws and regulations. Other companies looking for a way to keep track of digital information believed the hyperbole about Documentum. Sure, one can get Documentum to “work.” But as with other large scale, multipurpose content processing and management systems, considerable expertise, money, and time are often necessary. Documentum is now more than a quarter century young. As with other giant companies buying late 1980s technology, the job of generating sufficient cash flow is a big one. How is that acquisition of Autonomy going, Hewlett Packard? Oh, right. HP sold Autonomy and has a date in court related to that deal. What about Lexmark and ISYS Search Software? Are those empty offices an indication of rough water? What about IBM and Vivisimo? Oracle and Endeca? Dassault and Exalead? You get the idea. Buy a search vendor and discover that the demand for cash to make the systems hop, skip, and jump is significant. Then there is the pesky problem of open source software. Magento, anyone?

Now OpenText has purchased one of the US Food and Drug Administration’s all time favorite software systems. No doubt that visions of big bucks, juicy renewals, and opportunities to sell hot OpenText properties like BASIS, Fulcrum, and BRS Search are dancing in the heads of the Canadian business wizards.

I learned that OpenText is the proud new owner of Documentum. You can read the details, such as they are, in “OpenText Signs Deal for Dell EMC Division.” I learned that Documentum carried a price tag of $1.62 billion, a little more than what Oracle paid for Endeca and what Microsoft paid for the fascinating and legally confused Fast Search & Transfer content processing systems. OpenText, to its credit, paid one tenth the amount Hewlett Packard paid for Autonomy.

I learned:

“This acquisition further strengthens OpenText as a leader in enterprise information management, enabling customers to capture their digital future and transform into information-based businesses,” OpenText CEO Mark Barrenechea said in a statement Monday. “We are very excited about the opportunities which ECD and Documentum bring, and I look forward to welcoming our new customers, employees, and partners to OpenText.”

I also noted “Moody’s Places Open Text (OTEX) Ratings on Review for Downgrade.” That write up informed me:

Open Text plans to finance the acquisition with a combination of cash on hand, debt and equity. If the company raises equity to finance a significant portion of the purchase price, the Ba1 CFR will likely be confirmed. In a scenario where the company funds the acquisition with just cash on hand and new debt, the Ba1 CFR could face downward pressure. However, in such case Moody’s would evaluate the company’s ongoing commitment and capacity to de-lever, which could mitigate downward rating pressure. Negative ratings movement related to the CFR, if any, would be limited to one notch.

This is financial double talk for we are just not that confident that OpenText can make this deal spew revenue growth and hefty, sustainable profits. But my interpretation is fueled by Kentucky creek water. Your perception may differ. May I suggest you put your life savings into OpenText stock if you see rainbows, unicorns, and tooth fairies in this deal.

I noted this passage:

Open Text has made over $3 billion of acquisitions since 2005 and although the company does not break out results of acquired companies, EBITDA margins have increased to 35% from 17% over this period.

Get out your checkbook. Let the good times roll.

My view from rural Kentucky is less optimistic. Here are the points I noted on my Dollar General notepad as I worked through the articles about this deal:

  1. Michael Dell was quick to dump Documentum, underscoring the silliness of EMC’s rationale for buying the company in 2003 for about $1.7 billion
  2. The cost of maintaining the Documentum server and the eight acquired companies’ technology is likely to be tough to control
  3. The money needed to keep a 25-year-old platform in tip top shape to compete with more youthful alternatives makes me wonder how OpenText will finance innovation
  4. The open source alternatives, whether for nifty NoSQL methods or clones of more traditional content management systems constructed by programmers with time on their hands, are likely to be a challenge.

To sum up, OpenText is a roll up of overlapping and often competing products and services. I hope the OpenText marketing department is able to sort out when to use which OpenText product. If customers are not confused, that’s good. If the customers are confused, the time to close a deal for a giant, rest home qualified software is likely to be lengthy.

OpenText is much loved by those in Canada. I recall the affection felt for Blackberry. Stakeholders will be watching OpenText to make sure that it does not mix up raspberries and blackberries. Blackberries, by the way, have “drupelets.” That sounds like Drupal to me.

Stephen E Arnold, September 14, 2016


Bot Landscape Includes Search

August 23, 2016

Search and retrieval technology finds a place in a “bot landscape.” The collection of icons appears in “Introducing the Bots Landscape: 170+ Companies, $4 Billion in Funding, Thousands of Bots.” The diagram of the bots landscape in the write up is, for me, impossible to read. I admit it does convey the impression of a lot of bots. The high resolution version was also difficult for me to read. You can download a copy and take a gander yourself at this link. But there is a super high resolution version available for which one must provide a name and an email. Then one goes through a verification step. Clever marketing? Well, annoying to me. The download process required three additional clicks. Here it is. A sight for young eyes.


I was able to discern a reference to search and retrieval technology in the category labeled “AI Tools: Natural Language Processing, Machine Learning, Speech & Voice Recognition.” I was able to identify the logo of Fair Isaac and the mark of Zorro, but the other logos were unreadable by my 72-year-old eyes.

The graphic includes these bot-agories too:

  1. Bots with traction
  2. Connectors and shared services
  3. Bot discovery
  4. Bot developer frameworks and tools
  5. Analytics
  6. Messaging.

The bot landscape is rich and varied. MBAs and mavens are resourceful and gifted specialists in classification. The fact that the categories are, well, a little muddled is less important than finding a way to round up so many companies worth so much money.

Stephen E Arnold, August 23, 2016
