A Potentially Useful List of Enterprise Search Engine Servers

July 20, 2017

We found a remarkable list at Predictive Analytics Today—“Top 23 Enterprise Search Engine Servers.” The write-up introduces its roster of resources:

Enterprise Search is the search information within an enterprise, searching of content from multiple enterprise-type sources, such as databases and intranets. These search systems index data and documents from a variety of sources including file systems, intranets, document management systems, e-mail, and databases. Enterprise search systems also integrate structured and unstructured data in their collections and also use access controls to enforce a security policy on their users.
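For readers who want to see the moving parts, here is a minimal sketch of the idea in that passage: index text from several sources, then filter results by access rights at query time. The class name, the document IDs, and the group-based security model are our own illustrative assumptions, not how any product on the list actually works.

```python
from collections import defaultdict


class TinyEnterpriseIndex:
    """Toy inverted index with group-based access control (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs
        self.docs = {}                    # document ID -> (source, allowed groups)

    def add(self, doc_id, source, text, allowed_groups):
        # Record where the document came from and who may see it.
        self.docs[doc_id] = (source, frozenset(allowed_groups))
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query, user_groups):
        terms = query.lower().split()
        if not terms:
            return []
        # A document must contain every query term.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        groups = set(user_groups)
        # Enforce the security policy: keep only documents the user may read.
        return sorted(d for d in matches if self.docs[d][1] & groups)


index = TinyEnterpriseIndex()
index.add("fs-1", "file system", "Quarterly budget forecast", {"finance"})
index.add("mail-7", "e-mail", "Budget meeting moved to Friday", {"finance", "staff"})
index.add("db-3", "database", "Customer budget records", {"sales"})

print(index.search("budget", {"staff"}))
print(index.search("budget forecast", {"finance"}))
```

The same query returns different results for different users, which is the point of pairing the index with access controls.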

Entries are logically presented under two categories, proprietary solutions and open source software. From Algolia to Xapian, the article summarizes pros and cons of each. See the post for details.

However, we have a few notes to add about some particular platforms. For example, the Google Search Appliance has been discontinued, though Constellio is still going… in Canada. SearchBlox is now built on Elasticsearch, and SRCH2 was originally designed for mobile searches. Also, isn’t Sphinx Search specifically for SQL data? Hmm. We suggest this list could make a good springboard, but server shoppers should take its specifics with a grain of salt and be sure to do their own follow-up research.

Cynthia Murrell, July 20, 2017

IBM Watson: Two Views of the Same Pile of Tinker Toys

July 19, 2017

I find IBM an interesting outfit to watch. But more entertaining is watching how the Watson product and service is perceived by smart people. On the side of the doubters is a Wharton grad, James Kisner, who analyzes for a living at Jefferies & Co. His report “Creating Shareholder Value with AI? Not So Elementary, My Dear Watson?” suggests that IBM is struggling to make its big bet pay off. If not a Google moon shot, Mr. Kisner thinks the low orbit satellite launch is in an orbit which will result in Watson burning up upon re-entry to reality.


The Big Dog of artificial intelligence and smart software may be a Chihuahua dressed up like a turkey, not a very big dog, not much of a bark, and certainly not equipped to take a big bite out a Wharton trained analyst’s foot.

On the rah rah side is Vijay, a blogger who does not put his name on his blog or on his About page. (One of my intrepid researchers thinks this Vijay’s last name is “Vijayasankar.” Maybe?) I assume he is famous, just not here in Harrod’s Creek. His most recent write up about Watson is “IBM Watson Is Just Fine, Thank You!” His motivation for the write up is the attention given to the Jefferies report. He is a straight shooter; for example:

I am a big fan of criticism of technology – and as folks who have known me over time can vouch, I seldom hold back what is in my mind on any topic. I strongly believe that criticism is healthy for all of us – including businesses, and without it we cannot grow. If you go through my previous blogs, you can see first hand how I throw cold water on hype.

I like the cold water on hype from a person who is an IBM executive, and one who has been involved in the IBM Watson health initiatives. (I think this includes the star crossed Anderson project in Houston. I hear, “Houston, we have a problem,” but you may not.) I highlighted these points in this blog post:

  1. Hey, world, IBM is an enterprise product, not a consumer product. This seems obvious, but apparently IBM’s ability to communicate what it is selling and to whom is not working at peak efficiency or maybe not working because everyone is confused about Watson?
  2. IBM does not do the data federation things with its customer data. That’s good. I know that IBM sells a mainframe that encrypts everything. Interesting but I am not sure how this addresses flat revenue growth, massive layoffs, and the baffling Watson marketing which recently had a white cube floating in a tax preparer’s office. A white cube?
  3. IBM Watson has lots of successes. That’s a great assertion. The problem is that Watson started out as the next big thing. There was a promise of billions in revenue. There was a big office commitment in Manhattan. Then there was the implosion at the Houston health center. “Watson, do you read me?” I once tracked some of the Watson craziness in a series called the “Weakly Watson.” I gave up. The actual examples struck me as a painful type of fake news. What’s interesting is that the “weakly” stories were “real.” Scary to me and to stakeholders.
  4. Watson is not a product. Watson is an API to the IBM ecosystem. Vendor lock in beckons. And, of course, lots of APIs. These digital tinker toys can be snapped together. The problems include the cost and time required for system training, the consulting and engineering services price tag, and the massaging required to explain that Watson is something that requires a lot of work. For the Instagram crowd that’s a problem. “Houston. Houston. Do you copy? Tinker toys. Lego blocks. Do you copy?”
  5. Watson “some times needs consulting.” Talk about an understatement. Watson needs lots and lots of consulting, engineering services, training, configuring, tuning, and training. Because Watson is a confection of open source, acquired technologies, and home brew code, a lot of work is needed. That’s because Watson was designed to generate high margin services, not the trivial revenue from online ads or from people ordering laundry detergent by pressing a button on their washing machine.
  6. Watson has two things in its bag of tricks: “Great marketing” and “AI talent.” Okay, marketing and smart people. The basic problem IBM has to solve before investors get frisky is generating significant, sustainable revenues and healthy margins. Spending money buys marketing and people. Effective management orchestrates what can be bought into stuff that can be sold at a profit.

The Vijay write up ends with a question. Here you go: “So why is IBM not publishing Watson revenue specifically?” This Vijay fellow who assumes that I know his last name does not answer the question. In the deafening silence, we need an answer.

That brings me to the Jefferies & Co. report by James Kisner, who is certified to do financial analysis. The answer to Vijay’s question consumes 53 pages of verbiage, charts, and tables of numbers. The entire document was available on July 18, 2017, at this link, but it may disappear. Many analyst documents disappear for the average guy. (If the link is dead, head over to Investext or give Jefferies & Co. a quick call to see if that will get you the meaty document.)


A Jefferies & Co. analyst with teeth bites into the IBM financial data and seems to be unsatisfied.

In a nutshell, the Jefferies report says that IBM Watson is a limp noodle. Among the Watson characteristics are unhappy customers, wild and crazy marketing, misfires on deep learning, and the incredibly difficult, time consuming, and expensive data preparation required to make the system say, “Woof, woof” or maybe “Wolf, wolf” when there is something important for a human to notice.

Net net: IBM’s explanations of Watson have not produced the revenues and profits stakeholders expect. Jefferies & Co. goes MBA crazy providing a wide range of data to support the argument that Watson is struggling.

That “woof, woof” is the sound of a Chihuahua barking with the help of IBM spokespeople and lots of PR and marketing minions. The Wharton guy is a larger dog, barks ferociously, and has a bite backed up by data. IBM has to prove that it can solve problems for clients, generate sustainable revenue, and keep the competition from chowing down on a Watson weighed down with digital tinker toys.

Stephen E Arnold, July 19, 2017

ArnoldIT Publishes Technical Analysis of the Bitext Deep Linguistic Analysis Platform

July 19, 2017

ArnoldIT has published “Bitext: Breakthrough Technology for Multi-Language Content Analysis.” The analysis provides the first comprehensive review of the Madrid-based company’s Deep Linguistic Analysis Platform or DLAP. Unlike most next-generation multi-language text processing methods, Bitext has crafted a platform. The document can be downloaded from the Bitext Web site via this link.

Based on information gathered by the study team, the Bitext DLAP system outputs metadata with an accuracy in the 90 percent to 95 percent range. Most content processing systems today typically deliver metadata and rich indexing with accuracy in the 70 to 85 percent range.

According to Stephen E Arnold, publisher of Beyond Search and Managing Director of Arnold Information Technology:

“Bitext’s output accuracy establishes a new benchmark for companies offering multi-language content processing systems.”

The system performs more than 15 discrete analytic processes in near real time. The system can output enhanced metadata for more than 50 languages. The structured stream provides machine learning systems with a low cost, highly accurate way to learn. Bitext’s DLAP platform integrates more than 30 separate syntactic functions. These include segmentation, tokenization (word segmentation, frequency, and disambiguation), among others. The DLAP platform analyzes more than 15 linguistic features of content in any of the more than 50 supported languages. The system extracts entities and generates high-value data about documents, emails, social media posts, Web pages, and structured and semi-structured data.

DLAP applications range from fraud detection to identifying nuances in streams of data; for example, the sentiment or emotion expressed in a document. Bitext’s system can output metadata and other information about processed content as a feed stream to specialized systems such as Palantir Technologies’ Gotham or IBM’s Analyst’s Notebook. Machine learning systems such as those operated by Amazon, Apple, Google, and Microsoft can “snap in” the Bitext DLAP platform.

Copies of the report are available directly from Bitext at https://info.bitext.com/multi-language-content-analysis. Information about Bitext is available at www.bitext.com.

Kenny Toth, July 19, 2017

Darktrace Delivers Two Summer Sizzlers

July 17, 2017

Darktrace offers an enterprise immune system called Antigena. Based on the information gathered in the writing of the “Dark Web Notebook,” the system has a number of quite useful functions. The company’s remarkable technology can perform real time, in depth analyses of an insider’s online activities. Despite the summer downturn which sucks in many organizations, Darktrace has been active. First, the company secured an additional round of investment. This one is in the $75 million range. This brings the funding of the company to the neighborhood of $170 million, according to Crunchbase.

Details about the deal appear in this Outlook Series write up. I noted this statement:

The cyber security firm has raised a $75 million Series D financing round led by Insight Venture Partners, with participation from existing investors Summit Partners, KKR and TenEleven Ventures.

On another front, Darktrace has entered into a partnership with CITIC. This outfit plans to bring “next-generation cyber defense to businesses across Asia Pacific.” Not familiar with CITIC? You might want to refresh your memory bank. Beyond Search believes that this tie up may open the China market for Darktrace. If it does, Darktrace is likely to emerge as one of the top two or three cyber security firms in the world before the autumn leaves begin to fall.

Here in Harrod’s Creek we think about the promise of Darktrace against a background of erratic financial performance from Hewlett Packard. As you may recall, one of the spark plugs for Darktrace is Dr. Michael Lynch, the founder of Autonomy. HP bought Autonomy and found that its management culture was an antigen to its $11 billion investment. It is possible to search far and wide for an HP initiative which has delivered the type of financial lift that Darktrace has experienced.

Information about Darktrace is at www.darktrace.com. A profile about this company appears in the Dark Web Notebook in the company of IBM Analyst’s Notebook, Google/In-Q-Tel Recorded Future, and Palantir Technologies Gotham. You can get these profiles at this link: https://gum.co/darkweb.

Stephen E Arnold, July 17, 2017

The Big Problems of Big Data

June 30, 2017

Companies are producing volumes of data. However, no fully functional system is able to provide actionable insights to decision makers in real time. Bayesian methods might pave the way for solution seekers.

In an article published by Phys.org and titled “Advances in Bayesian Methods for Big Data,” the author says:

Bayesian methods provide a principled theory for combining prior knowledge and uncertain evidence to make sophisticated inference of hidden factors and predictions.

Though the methods of data collection have improved, analyzing and presenting actionable insights in real time is still a big problem for Big Data adopters. Human intervention is required at almost every step, which defeats the entire purpose of an intelligent system. Hopefully, Bayesian methods can resolve these issues. Experts have been reluctant to adopt Bayesian methods because they are slow and do not scale. However, with recent advancements in machine learning, the method might work.
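To make “combining prior knowledge and uncertain evidence” concrete, here is a minimal sketch of a textbook Beta-Binomial update, a standard conjugate example rather than anything taken from the cited article. The prior values and batch counts are made-up numbers for illustration only.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing new binomial evidence."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Prior belief (hypothetical): roughly a 20 percent rate, held weakly.
prior = (2.0, 8.0)

# Evidence arrives in two batches, as it would in a data stream.
after_first = beta_binomial_update(*prior, successes=10, failures=40)
after_second = beta_binomial_update(*after_first, successes=20, failures=30)

# Processing all the evidence at once gives the same posterior.
all_at_once = beta_binomial_update(*prior, successes=30, failures=70)

print(after_second, all_at_once)
print(round(beta_mean(*after_second), 4))
```

Because the update is just addition, the belief can be revised batch by batch without reprocessing old data. Realistic models rarely have such a tidy closed form, which is where the speed and scalability concerns come in.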

Vishal Ingole, June 30, 2017

Google Translate Is Constantly Working

June 28, 2017

It seems we are always just around the corner from creating the universal translation device, but with online translation services the statement is quite accurate.  Google Translate is one of the most powerful and accurate free translation services on the Internet.  Lackuna shares some “Facts About Google Translate You May Not Know” to help you understand the power behind Google Translate.

Google Translate is built on statistical machine translation (SMT). Basically, this means that computers analyze translated documents from the Web to learn languages and find the patterns within them.  From there, the service picks the most probable translation for each query.  Google Translate used to work differently:

However, Google Translate didn’t always work this way. Initially, it used a rule-based system, where rules of grammar and syntax, along with vocabulary for each language, were manually coded into a computer. Google switched to SMT because it enables the service to improve itself as it combs the web adding to its database of text — as opposed to linguists having to identify and code new rules as a language evolves. This provides a much more accurate translation, as well as saving thousands of programming/linguist man-hours.
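A toy sketch shows the statistical idea of counting how often a phrase is translated each way and picking the most frequent option. The tiny aligned corpus and the function names are our own illustrative assumptions; a production SMT system also uses alignment models, language models, and far more data.

```python
from collections import Counter, defaultdict


def build_phrase_table(aligned_pairs):
    """Count how often each source phrase is paired with each target phrase."""
    table = defaultdict(Counter)
    for source, target in aligned_pairs:
        table[source][target] += 1
    return table


def best_translation(table, source):
    """Return the most frequent target phrase and its relative frequency."""
    candidates = table.get(source)
    if not candidates:
        return None, 0.0
    total = sum(candidates.values())
    target, count = candidates.most_common(1)[0]
    return target, count / total


# Hypothetical aligned phrase pairs harvested from translated documents.
corpus = [
    ("bonjour", "hello"),
    ("bonjour", "hello"),
    ("bonjour", "good morning"),
    ("merci", "thank you"),
]

table = build_phrase_table(corpus)
print(best_translation(table, "bonjour"))
print(best_translation(table, "inconnu"))
```

Adding more aligned text changes the counts and therefore the choice, which is how such a system improves as it combs the Web, with no hand-coded grammar rules involved.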

While Google might be saving time by relying fully on SMT, a linguist’s human touch is necessary to capture sentiment and gain full comprehension of a language.  Companies like Bitext that built analytics engines on linguistic knowledge combined with machine learning have a distinct advantage over others.

Meanwhile, Google Translate remains a valuable service.  It currently translates sixty-four languages. A chatbot translates in real time so people can communicate in their native tongues, a conversation mode for Android offers speech-to-speech translation, and the service also uses photos to translate written language in real time.

Whitney Grace, June 28, 2017


Amazon Answers Artificial Intelligence Questions

May 24, 2017

One big question about Amazon is how the company is building its artificial intelligence and machine learning programs.  It was the topic of conversation at the recent Internet Association’s annual gala, where Jeff Bezos, Amazon CEO, discussed it.  GeekWire wrote about Bezos’s appearance at the gala in the article, “Jeff Bezos Explained Amazon’s Artificial Intelligence And Machine Learning.”

The discussion Bezos participated in covered a wide range of topics, including the online economy, Amazon’s media coverage, its business principles, and, of course, artificial intelligence.  Bezos compared the time we are living in to the realms of science fiction and said Amazon is at the forefront of it.  Through Amazon Web Services, the company has clients ranging from software developers to corporations.  Amazon’s goal is to make the technology available to everyone, but deployment is a problem, as is finding the right personnel with the right expertise.

Amazon realizes that the power of its technology comes from behind the curtain:

I would say, a lot of the value that we’re getting from machine learning is actually happening beneath the surface. It is things like improved search results. Improved product recommendations for customers. Improved forecasting for inventory management. Literally hundreds of other things beneath the surface.

This reminds me of Bitext, an analytics software company based in Madrid, Spain.  Bitext’s technology is used to power machine learning beneath many big companies’ software.  Bitext is the real power behind many analytics projects.

Whitney Grace, May 24, 2017

Bitvore: The AI, Real Time, Custom Report Search Engine

May 16, 2017

Just when I thought information access had slumped quietly through another week, I read in the capitalist tool which you know as Forbes, the content marketing machine, this article:

This AI Search Engine Delivers Tailored Data to Companies in Real Time.

This write up struck me as more interesting than the most recent IBM Watson prime time commercial about smart software for zealous professional basketball fans or Lucidworks’ (really?) acquisition of the interface builder Twigkit. Forbes Magazine’s write up did not point out that the company seems to be channeling Palantir Technologies; for example, Jeff Curie, the president, refers to employees as Bitvorians. Take that, you Hobbits and Palanterians.


A Bitvore 3D data structure.

The AI, real time, custom report search engine is called Bitvore. Here in Harrod’s Creek, we recognized the combination of the computer term “bit” with a syllable from one of our favorite morphemes “vore” as in carnivore or omnivore or the vegan-sensitive herbivore.


Palantir Settles Discrimination Case

May 15, 2017

Does this count as irony? Palantir, which has built its data-analysis business largely on its relationships with government organizations, has a Department of Labor analysis to thank for recent charges of discrimination. No word on whether that Department used Palantir software to “sift through” the reports. Now, Business Insider tells us, “Palantir Will Shell Out $1.7 Million to Settle Claims that It Discriminated Against Asian Engineers.” Writer Julie Bort tells us that, in addition to that payout, Palantir will make job offers to eight unspecified Asians. She also explains:

The issue arose because, as a government contractor, Palantir must report its diversity statistics to the government. The Labor Department sifted through these reports and concluded that even though Palantir received a huge number of qualified Asian applicants for certain roles, it was hiring only small numbers of them. Palantir, being the big data company that it is, did its own sifting and produced a data-filled response that it said refuted the allegations and showed that in some tech titles 25%-38% of its employees were Asians. Apparently, Palantir’s protestations weren’t enough to satisfy government regulators, so the company agreed to settle.

For its part, Palantir insists on its innocence but says it settled in order to put the matter behind it. Bort notes the unusual nature of this case: according to the Equal Employment Opportunity Commission, African-Americans, Latin-Americans, and women are more underrepresented in tech fields than Asians. Is the Department of Labor making it a rule to analyze the hiring patterns of companies required to report diversity statistics? If they are consistent, there should soon be a number of such lawsuits regarding discrimination against other groups. We shall see.

Cynthia Murrell, May 15, 2017

AI Not to Replace Lawyers, Not Yet

May 9, 2017

Robot or AI lawyers may be effective in locating relevant cases for reference, but they are a long way from replacing lawyers, who still need to go to court and represent a client.

ReadWrite, in a recently published analytical article titled “Look at All the Amazing Things AI Can (and Can’t yet) Do for Lawyers,” says:

Even if AI can scan documents and predict which ones will be relevant to a legal case, other tasks such as actually advising a client or appearing in court cannot currently be performed by computers.

The author further explains what the present generation of AI tools or robots does. They merely find relevant cases based on indexing and keywords, which used to be a time-consuming and cumbersome process. Thus, what robots do is eliminate the tedious work that was performed by interns or lower level employees. Lawyers still need to collect evidence, prepare the case, and argue in court to win it. The robots are coming, but only to take over lower level tasks, not to snatch lawyers’ jobs.

Vishal Ingole, May 9, 2017
