Progress: From Selling NLP to Providing NLP Services

December 11, 2017

Years ago, Progress Software owned an NLP system. I recall conversations with the natural language processing wizards from EasyAsk. Larry Harris developed a natural language system in 1999 or 2000. Progress purchased EasyAsk in 2005, if memory serves. I interviewed Craig Bassin in 2010 as part of my Search Wizards Speak series.

My recollection is that Progress divested itself of EasyAsk in order to focus on enterprise applications other than NLP. No big deal. Software companies are bought and sold every day.

However, what makes this recollection interesting to me is the information in “Beyond NLP: 8 Challenges to Building a Chatbot.” Progress went from a software company that owned an NLP system to a company that advises people like me on how challenging a chatbot system can be to build and make work. (I noted that the Wikipedia entry for Progress does not mention the EasyAsk acquisition and subsequent de-acquisition. Either small potatoes or a milestone best jumped over, I assume.)

Presumably it is easier to advise and get paid to implement than to fund and refine an NLP system like EasyAsk. If you are not familiar with EasyAsk, the company positions itself in eCommerce site search with its “cognitive eCommerce” technology. EasyAsk’s capabilities include voice-enabled natural language mobile search. This strikes me as a capability similar to that of a chatbot as I understand the concept.

History is history, one of my high school teachers once observed. Let’s move on.

What are the eight challenges to standing up a chatbot which sort of works? Here they are:

  1. The chat interface
  2. NLP
  3. The “context” of the bot
  4. Loops, splits, and recursions
  5. Integration with legacy systems
  6. Analytics
  7. Handoffs
  8. Character, tone, and persona

As I review this list, I note that I have to decide whether to talk to a chatbot or type into a box so a “customer care representative” can assist me. The “representative” is, one assumes, a smart software robot.

I also notice that the bot has to have context. Think of a car dealer and the potential customer. The bot has to know that I want to buy a car. Seems obvious. But okay.

“Loops, splits, and recursions.” Frankly I have no idea what this means. I know that chatbot-centric companies use jargon. I assume that this means “programming” so the NLP system returns a semi-on-point answer.
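
If that assumption is right, the “programming” amounts to dialog-flow control: branch on the detected intent (a split) and re-prompt when the classifier is unsure (a loop). Here is a toy Python sketch; the classify() stub, the intent names, and the confidence threshold are all made up for illustration, not taken from any vendor’s product.

```python
from typing import Tuple

def classify(utterance: str) -> Tuple[str, float]:
    """Stub intent classifier; a real bot would call an NLP service here."""
    if "buy" in utterance.lower():
        return "purchase_vehicle", 0.92
    return "unknown", 0.30

def handle(utterance: str, max_retries: int = 2) -> str:
    for _ in range(max_retries + 1):            # the "loop"
        intent, confidence = classify(utterance)
        if confidence < 0.6:                    # low confidence: ask again
            utterance = input("Sorry, could you rephrase that? ")
            continue
        if intent == "purchase_vehicle":        # the "split"
            return "Great. New or used?"
        return "Let me hand you to a human."    # the "handoff"
    return "Let me hand you to a human."

if __name__ == "__main__":
    print(handle("I want to buy a car"))        # Great. New or used?
```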

Integration with legacy systems and handoffs seem similar to me. I would just call these two steps “integration” and be done with it.

“Character, tone, and persona” seems to apply to how the chatbot sounds; for example, the nasty, imperious tone of a Kroger automated checkout system.

Net net: Progress is in the business of selling advisory and engineering services. The reason, in my opinion, is that Progress could not crack the code to make search and retrieval generate the expected payoffs. Like some Convera executives, Progress found selling search-related services a more attractive path.

Stephen E Arnold, December 11, 2017

Google Search and Hot News: Sensitivity and Relevance

November 10, 2017

I read “Google Is Surfacing Texas Shooter Misinformation in Search Results — Thanks Also to Twitter.” What struck me about the article was the headline; specifically, the implication for me was that Google was not simply responding to user queries. Google is actively “surfacing,” or fetching and displaying, information about the event. Twitter is also involved. I don’t think of Twitter as much more than a party line. One can look up keywords or see a stream of content containing a keyword or, to use Twitter speak, a “hashtag.”

The write up explains:

Users of Google’s search engine who conduct internet searches for queries such as “who is Devin Patrick Kelley?” — or just do a simple search for his name — can be exposed to tweets claiming the shooter was a Muslim convert; or a member of Antifa; or a Democrat supporter…

I think I understand. A user inputs a term and Google’s system matches the user’s query to the content in the Google index. Google maintains many indexes, despite its assertion that it is a “universal search engine.” One has to search across different Google services and their indexes to build up a mosaic of what Google has indexed about a topic; for example, blogs, news, the general index, maps, finance, etc.

Developing a composite view of what Google has indexed takes time and patience. The results may vary depending on whether the user is logged in, searching from a particular geographic location, or has enabled or disabled certain behind the scenes functions for the Google system.

The write up contains this statement:

Safe to say, the algorithmic architecture that underpins so much of the content internet users are exposed to via tech giants’ mega platforms continues to enable lies to run far faster than truth online by favoring flaming nonsense (and/or flagrant calumny) over more robustly sourced information.

From my point of view, the ability to figure out what influences Google’s search results requires significant effort, numerous test queries, and recognition that Google search now balances on two pogo sticks. One “pogo stick” is blunt-force keyword search. When content is indexed, terms are plucked from source documents. The system may or may not assign additional index terms to the document; for example, geographic or time stamps.

The other “pogo stick” is discovery and assignment of metadata. I have explained some of the optional tags which Google may or may not include when processing a content object; for example, see the work of Dr. Alon Halevy and Dr. Ramanathan Guha.
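
To make the two “pogo sticks” concrete, here is a toy Python sketch of blunt-force keyword indexing plus assigned metadata. The documents, tags, and field names are invented for illustration and have nothing to do with Google’s actual systems.

```python
from collections import defaultdict

# Pogo stick one: blunt-force keyword indexing. Terms are plucked from the
# source documents and posted to an inverted index.
documents = {
    "doc-1": "church shooting in sutherland springs texas",
    "doc-2": "texas officials hold press conference",
}

inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.split():
        inverted_index[term].add(doc_id)

print(sorted(inverted_index["texas"]))    # ['doc-1', 'doc-2']

# Pogo stick two: discovered or assigned metadata. The tags below (geo,
# timestamp, topic) are illustrative placeholders, not Google's schema.
metadata = {
    "doc-1": {"geo": "US-TX", "timestamp": "2017-11-05", "topic": "breaking_news"},
    "doc-2": {"geo": "US-TX", "timestamp": "2017-11-06", "topic": "government"},
}

# Queries can now be filtered or boosted on assigned tags as well as terms.
hits = [d for d in inverted_index["texas"] if metadata[d]["topic"] == "breaking_news"]
print(hits)                               # ['doc-1']
```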

But Google, like other smart content processing systems today, has a certain sensitivity. This means the system reacts to certain keywords in the streams of content it processes.

When “news” takes place, the flood of content allows smart indexing systems to identify a “hot topic.” The test queries we ran for my monographs “The Google Legacy” and “Google Version 2.0” suggest that Google is sensitive to certain “triggers” in content. Feedback can be useful; it can also cause smart software to wobble a bit.

T-shirts are easy; search is hard.

I believe that the challenge Google faces is similar to the problem Bing and Yandex are exploring as well; that is, certain numerical recipes can overreact to certain inputs. These overreactions may increase the difficulty of determining which content object is “correct,” “factual,” or “verifiable.”

Expecting a free search system, regardless of its owner, to know what’s true and what’s false is understandable. In my opinion, making this type of determination with today’s technology, system limitations, and content analysis methods is impossible.

In short, the burden of figuring out what’s right and what’s not correct falls on the user, not exclusively on the search engine. Users, on the other hand, may not want the “objective” reality. Search vendors want traffic and want to generate revenue. Algorithms want nothing.

Mix these three elements and one takes a step closer to understanding that search and retrieval is not the slam dunk some folks would have me believe. In fact, the sensitivity of content processing systems to comparatively small inputs requires more discussion. Perhaps that type of information will come out of discussions about how best to deal with fake news and related topics in the context of today’s information retrieval environment.

Free search? Think about that too.

Stephen E Arnold, November 10, 2017

Understanding Intention: Fluffy and Frothy with a Few Factoids Folded In

October 16, 2017

Introduction

One of my colleagues forwarded me a document called “Understanding Intention: Using Content, Context, and the Crowd to Build Better Search Applications.” To get a copy of the collateral, one has to register at this link. My colleague wanted to know what I thought about this “book” by Lucidworks. That’s what Lucidworks calls the 25-page marketing brochure. I read the PDF file and was surprised at what I perceived as fluff, not facts or a cohesive argument.

The topic was of interest to my colleague because we had completed a five-month review and analysis of “intent” technology. In addition to writing two white papers about using smart software to figure out and tag (index) content, we had to immerse ourselves in computational linguistics, multi-language content processing technology, and semantic methods for “making sense” of text.

The Lucidworks’ document purported to explain intent in terms of content, context, and the crowd. The company explains:

With the challenges of scaling and storage ticked off the to-do list, what’s next for search in the enterprise? This ebook looks at the holy trinity of content, context, and crowd and how these three ingredients can drive a personalized, highly-relevant search experience for every user.

The presentation of “intent” was quite different from what I expected. The details of figuring out what content “means” were sparse. The focus was not on methodology but on selling integration services. I found this interesting because I have Lucidworks in my list of open source search vendors. These are companies which repackage open source technology, create some proprietary software, and assist organizations with engineering and integrating services.

The book was an explanation anchored in buzzwords, not the type of detail we expected. After reading the text, I was not sure how Lucidworks would go about figuring out what an utterance might mean. The intent-centric systems we reviewed over the course of five months followed several different paths.

Some companies relied upon statistical procedures. Others used dictionaries and pattern matching. A few combined multiple approaches in a content pipeline. Our client, a firm based in Madrid, focused on computational linguistics plus a series of procedures which combined proprietary methods with “modules” to perform specific functions. The idea for this approach was to raise accuracy in intent identification from the 65 to 80 percent range to a level approaching, and often exceeding, 90 percent. For text processing in multi-language corpora, the Spanish company’s approach was a breakthrough.

I was disappointed but not surprised that Lucidworks’ approach was breezy. One of my colleagues used the word “frothy” to describe the information in the “Understanding Intention” document.

As I read the document, which struck me as a shotgun marriage of generalizations and examples of use cases in which “intent” was important, I made some notes.

Let me highlight five of the observations I made. I urge you to read the original Lucidworks’ document so you can judge the Lucidworks’ arguments for yourself.

Imitation without Attribution

My first reaction was that Lucidworks had borrowed conceptually from ideas articulated by Dr. Gregory Grefenstette in his book Search Based Applications: At the Confluence of Search and Database Technologies. You can purchase this 2011 book on Amazon at this link. Lucidworks’ approach, unlike Dr. Grefenstette’s, borrowed some of the analysis but did not include the detail which supports the increasing importance of using search as a utility within larger information access solutions. Without detail, the Lucidworks’ document struck me as a description of the type of solutions that a company like Tibco is now offering its customers.


AI Predictions for 2018

October 11, 2017

AI just keeps gaining steam, and is positioned to be extremely influential in the year to come. KnowStartup describes “10 Artificial Intelligence (AI) Technologies that Will Rule 2018.” Writer Biplab Ghosh introduces the list:

Artificial Intelligence is changing the way we think of technology. It is radically changing the various aspects of our daily life. Companies are now significantly making investments in AI to boost their future businesses. According to a Narrative Science report, just 38 percent of the companies surveyed used artificial intelligence in 2016—but by 2018, this percentage will increase to 62%. Another study performed by Forrester Research predicted an increase of 300% in investment in AI this year (2017), compared to last year. IDC estimated that the AI market will grow from $8 billion in 2016 to more than $47 billion in 2020. ‘Artificial Intelligence’ today includes a variety of technologies and tools, some time-tested, others relatively new.

We are not surprised that the top three entries are natural language generation, speech recognition, and machine learning platforms, in that order. Next are virtual agents (aka “chatbots” or “bots”), then decision management systems, AI-optimized hardware, deep learning platforms, robotic process automation, text analytics & natural language processing, and biometrics. See the write-up for details on each of these topics, including some top vendors in each space.

Cynthia Murrell, October 11, 2017

New Beyond Search Overflight Report: The Bitext Conversational Chatbot Service

September 25, 2017

Stephen E Arnold and the team at Arnold Information Technology analyzed Bitext’s Conversational Chatbot Service. The BCBS taps Bitext’s proprietary Deep Linguistic Analysis Platform to provide greater accuracy for chatbots regardless of platform.

Arnold said:

The BCBS augments chatbot platforms from Amazon, Facebook, Google, Microsoft, and IBM, among others. The system uses specific DLAP operations to understand conversational queries. Syntactic functions, semantic roles, and knowledge graph tags increase the accuracy of chatbot intent and slotting operations.

One unique engineering feature of the BCBS is that specific Bitext content processing functions can be activated to meet specific chatbot applications and use cases. DLAP supports more than 50 languages. A BCBS licensee can activate additional language support as needed. A chatbot may be designed to handle English language queries, but Spanish, Italian, and other languages can be activated via an instruction.

Dr. Antonio Valderrabanos said:

People want devices that understand what they say and intend. BCBS (Bitext Chatbot Service) allows smart software to take the intended action. BCBS allows a chatbot to understand context and leverage deep learning, machine intelligence, and other technologies to turbo-charge chatbot platforms.

Based on ArnoldIT’s test of the BCBS, tagging accuracy jumped by as much as 70 percent. Another surprising finding was that the time required to perform content tagging decreased.

Paul Korzeniowski, a member of the ArnoldIT study team, observed:

The Bitext system handles a number of difficult content processing issues easily. Specifically, the BCBS can identify negation regardless of the structure of the user’s query. The system can understand double intent; that is, a statement which contains two or more intents. BCBS is one of the most effective content processing systems to deal correctly with variability in human statements, instructions, and queries.
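
To make “double intent” and negation concrete, here is a toy illustration of the kind of structured output such a parse might yield. The field names and structure below are hypothetical; they are not Bitext’s actual BCBS output format.

```python
# One utterance, two intents, one of them negated. The structure is a
# hypothetical illustration, not Bitext's BCBS API.
utterance = "Cancel my hotel booking, but don't touch the flight reservation."

parsed = {
    "utterance": utterance,
    "intents": [
        {   # first intent: an action the user wants performed
            "name": "cancel_booking",
            "slots": {"item": "hotel booking"},
            "negated": False,
        },
        {   # second intent: an action explicitly negated by "don't"
            "name": "modify_reservation",
            "slots": {"item": "flight reservation"},
            "negated": True,
        },
    ],
}

# A downstream chatbot would act only on the intents that are not negated.
actions = [i["name"] for i in parsed["intents"] if not i["negated"]]
print(actions)    # ['cancel_booking']
```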

Bitext’s BCBS and DLAP solutions deliver higher accuracy, enable more reliable sentiment analyses, and even output critical actor-action-outcome data. Such data are invaluable for disambiguation in Web and enterprise search applications, for discovery solutions used in fraud detection and law enforcement, and for consumer-facing mobile applications.

Because Bitext was one of the first platform solution providers, the firm was able to identify market trends and create its unique BCBS service for major chatbot platforms. The company focuses solely on solving problems common to companies relying on machine learning and, as a result, has done a better job delivering such functionality than other firms have.

A copy of the 22-page Beyond Search Overflight analysis is available directly from Bitext at this link on the Bitext site.

Once again, Bitext has broken through the barriers that block multi-language text analysis. The company’s Deep Linguistic Analysis Platform supports more than 50 languages at a lexical level and more than 20 at a syntactic level, and makes the company’s technology available for a wide range of applications in Big Data, Artificial Intelligence, social media analysis, text analytics, and the new wave of products designed for voice interfaces supporting multiple languages, such as chatbots. Bitext’s breakthrough technology solves many complex language problems and integrates machine learning engines with linguistic features. Bitext’s Deep Linguistic Analysis Platform allows seamless integration with commercial, off-the-shelf content processing and text analytics systems. Bitext’s innovative system reduces the cost of processing multilingual text for government agencies and commercial enterprises worldwide. The company has offices in Madrid, Spain, and San Francisco, California. For more information, visit www.bitext.com.

Kenny Toth, September 25, 2017

Lucidworks: The Future of Search Which Has Already Arrived

August 24, 2017

I am pushing 74, but I am interested in the future of search. The reason is that with each passing day I find it more and more difficult to locate the information I need as I do routine research for my books and other work. I was anticipating a juicy read when I requested a copy of “Enterprise Search in 2025.” The “book” is a nine-page PDF. After two years of effort and much research, my team and I were able to squeeze the basics of Dark Web investigative techniques into about 200 pages. I assumed that a nine-page book would deliver a high-impact payload comparable to one of the chapters in one of my books like CyberOSINT or Dark Web Notebook.

I was surprised that a nine-page document was described as a “book.” I was quite surprised by the Lucidworks’ description of the future. For me, Lucidworks is describing information access already available to me and most companies from established vendors.

The book’s main idea in my opinion is as understandable as this unlabeled, data-free graphic which introduces the text content assembled by Lucidworks.

[Unlabeled, data-free graphic from the Lucidworks document]

However, the pamphlet’s text does not make this diagram understandable to me. I noted these points as I worked through the basic argument that client-server search is on the downturn. Okay. I think I understand, but the assertion “Solr killed the client-server stars” was interesting. I read this statement and highlighted it:

Other solutions developed, but the Solr ecosystem became the unmatched winner of the search market. Search 1.0 was over and Solr won.

In the world of open source search, Lucene and Solr have gained adherents. Based on the information my team gathered when we were working on an IDC open source search project, the dominant open source search system was Lucene. If our data were accurate when we did the research, Elastic’s Elasticsearch had emerged as the go-to open source search system. Alternatives like Solr and Flaxsearch have their users and supporters, but Elastic, founded by Shay Banon, was a definite step up from his earlier search service called Compass.

In the span of two and a half years, Elastic had garnered more than $100 million in funding by 2014 and expanded into a number of adjacent information access market sectors. Reports I have received from those attending Elastic meetings were that Elastic was putting considerable pressure on proprietary search systems and a bit of a squeeze on Lucidworks. Google’s withdrawing its odd duck Google Search Appliance may have been, in small part, due to the rise of Elasticsearch and the changes made by organizations trying to figure out how to make sense of the digital information to which their staff had access.

But enough about the Lucene-Solr and open source versus proprietary search yin and yang tension.


IBM Watson Deep Learning: A Great Leap Forward

August 16, 2017

I read in the IBM marketing publication Fortune Magazine (oh, sorry, I meant the independent, real business news outfit Fortune) the following article: “IBM Claims Big Breakthrough in Deep Learning.” (I know the write up is objective because the headline includes the word “claims.”)

The main point, that the IBM Watson super game-winning thing can now do certain computational tasks more quickly, is mildly interesting. I noticed that one of our local tire discounters has a sale on a brand called Primewell. That struck me as more interesting than this IBM claim.

First, what’s the great leap forward the article touts? I highlighted this passage:

IBM says it has come up with software that can divvy those tasks among 64 servers running up to 256 processors total, and still reap huge benefits in speed. The company is making that technology available to customers using IBM Power System servers and to other techies who want to test it.

How many IBM Power 8 servers does it take to speed up Watson’s indexing? I learned:

IBM used 64 of its own Power 8 servers—each of which links both general-purpose Intel microprocessors with Nvidia graphical processors with a fast NVLink interconnection to facilitate fast data flow between the two types of chips
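
The technique the article describes, spreading one training job across many servers and GPUs so each works on a shard of the data, is data parallelism. A minimal, generic sketch using PyTorch’s distributed wrapper appears below; it illustrates the general idea only and is not IBM’s proprietary PowerAI DDL software.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Generic data-parallel sketch: each process owns one GPU, computes gradients
# on its shard of the batch, and gradients are averaged across all processes.
# This is NOT IBM's DDL library; it is a plain PyTorch illustration.

def main() -> None:
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by the launcher
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # wraps the gradient all-reduce
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(100):                         # toy training loop
        x = torch.randn(32, 1024, device=local_rank)
        y = torch.randint(0, 10, (32,), device=local_rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()                          # gradients synchronized here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    # Launched with, for example:
    #   torchrun --nnodes=64 --nproc_per_node=4 train.py
    main()
```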

A couple of questions:

  1. How much does it cost to outfit 64 IBM Power 8 servers to perform this magic?
  2. How many Nvidia GPUs are needed?
  3. How many Intel CPUs are needed?
  4. How much RAM is required in each server?
  5. How much time does it require to configure, tune, and deploy the setup referenced in the article?

My hunch is that this setup is slightly more costly than buying a Chromebook or signing on for some Amazon cloud computing cycles. These questions, not surprisingly, are not of interest to the “real” business magazine Fortune. That’s okay. I understand that one can get only so much information from a news release, a PowerPoint deck, or a lunch. No problem.

The other thought that crossed my mind as I read the story was, “Does Fortune think that IBM is the only outfit using GPUs to speed up certain types of content processing?” Ah, well, IBM is probably so sophisticated that it is working on engineering problems that other companies cannot conceive, let alone tackle.

Now the second point: Content processing to generate a Watson index is a bottleneck. However, the processing is what I call a downstream bottleneck. The really big hurdle for IBM Watson is the manual work required to set up the rules which the Watson system has to follow. Compared to the data crunching, training and rule making are the giant black holes of time and complexity. Fancy Dan servers don’t get to strut their stuff until the days, weeks, months, and years of setting up, tuning, and updating the rules are complete.

Fortune Magazine obviously considers this bottleneck of zero interest. My hunch is that IBM did not explain this characteristic of IBM Watson or the Achilles’ heel of figuring out the rules. Who wants to sit in a room with subject matter experts and three or four IBM engineers talking about what’s important, what questions are asked, and what data are required?

AskJeeves demonstrated decades ago that human-crafted rules are black diamond ski runs. IBM Watson’s approach is interesting. But what’s fascinating is the uncritical acceptance of IBM’s assertions and the lack of interest in tackling substantive questions. Maybe lunch was cut short?

Stephen E Arnold, August 16, 2017

Tidy Text the Best Way to Utilize Analytics

August 10, 2017

Even though text mining is nothing new, natural language processing seems to be the hot new analytics craze. In an effort to understand the value of each, along with the differences, and (most importantly) how to use either efficiently, O’Reilly interviewed text miners Julia Silge and David Robinson to learn about their approach.

When asked what advice they would give those drowning in data, they replied,

…our advice is that adopting tidy data principles is an effective strategy to approach text mining problems. The tidy text format keeps one token (typically a word) in each row, and keeps each variable (such as a document or chapter) in a column. When your data is tidy, you can use a common set of tools for exploring and visualizing them. This frees you from struggling to get your data into the right format for each task and instead lets you focus on the questions you want to ask.
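
The authors’ tidytext package lives in R. A rough Python/pandas rendering of the same one-token-per-row idea, with made-up sample text, looks like this:

```python
import pandas as pd

# Tidy text sketch: one token per row, with the document kept as a column.
# The sample text is invented for illustration.
docs = pd.DataFrame({
    "document": ["chapter_1", "chapter_2"],
    "text": [
        "text mining is nothing new",
        "natural language processing is the new analytics craze",
    ],
})

# Tokenize: lower-case, split into words, then explode to one row per token.
tidy = (
    docs.assign(word=docs["text"].str.lower().str.split())
        .explode("word")[["document", "word"]]
)

# With tidy data, ordinary grouping tools answer text questions directly.
word_counts = (
    tidy.groupby(["document", "word"]).size().reset_index(name="n")
        .sort_values("n", ascending=False)
)
print(word_counts.head())
```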

The duo admits text mining and natural language processing overlap in many areas, but both are useful tools for different issues. They relegate text mining to statistical analysis and natural language processing to the relationship between computers and language. The difference may seem minute, but with data mines exploding and companies drowning in data, such advice is crucial.

Catherine Lamsfuss, August 10, 2017

Palantir Technologies: Recycling Day Old Hash

July 31, 2017

I read “Palantir: The Special Ops Tech Giant That Wields As Much Real World Power as Google.” I noticed these hot buttons here:

“Special ops” for the Seal Team 6 vibe. Check.

“Wields” for the notion of great power. Check.

“Real world.” A reminder of the here and now, not an airy fairy digital wonkiness. Check.

“Google.” Yes. Palantir as potent as the ad giant Google. Check.

That’s quite a headline.

The write up itself is another journalistic exposé of software which ingests digital information and outputs maps, reports, and visualizations. Humans index too. As with the i2 Analyst’s Notebook, the “magic” is mostly external. Making these Fancy Dan software systems work requires computers, of course. Humans are needed too. Trained humans are quite important; essential, in fact.

The Guardian story seems to be a book review presented as a Gladwell-like revisionist anecdote. See, for example, Done: The Secret Deals That Are Changing Our World by Jacques Peretti (Hodder & Stoughton, £20). You can buy a copy from bookshop.theguardian.com. (Online ad? Maybe?)

Read the Palantir story, which stuffed my Talkwalker alert with references to the article. Quite a few bloggers are recycling the Guardian newspaper story. Buzzfeed’s coverage of the Palo Alto company evoked the same reaction. I will come back to the gaps in these analyses in a moment.

The main point of the Guardian’s July 30, 2017, story strikes me as:

Palantir tracks everyone from potential terrorist suspects to corporate fraudsters…child traffickers, and what they refer to as subversives. But it is all done using prediction.

Right. Everyone! Potential terrorist suspects! And my favorite “all”. Using “prediction” no less.

Sounds scary. I am not sure the platforms work with the type of reliability that the word “all” suggests. But this is about selling books, not Palantir and similar companies’ functionality, statistical methods, or magical content processing. Confusing Hollywood with reality is easy today, at least for some folks.

Palantir licenses software to organizations. Palantir is an “it,” not a “they.” The company uses the lingo of its customers. “Subversives” is one term, but it is more suggestive in my opinion than “bad actor,” “criminal,” “suspect,” or “terrorist.” I think the word “tracks” is pivotal. Palantir’s professionals, like Pathfinder, look at deer tracks and nail the beastie. I want to point out that “prediction”—partly the Bayesian, Monte Carlo, and Markovian methods pioneered by Autonomy in the mid-1990s—is indeed used for certain processes. What’s omitted is that Palantir is just one company in the content processing and search and retrieval game. I am not convinced that its systems and methods are the best ones available today. (Check out Recorded Future, a Google and In-Q-Tel funded company, for some big league methods. And there are others. In my CyberOSINT book and my Dark Web Notebook I identify about two dozen companies providing similar services.) Palantir is one, admittedly high-profile, example of next-generation information access providers.

The write up does reveal at the end of the article that the Guardian is selling Jacques Peretti’s book. That’s okay. What’s operating under the radar is a write up that seems to be journalism but is, in the real world, a nifty book promotion.

In closing, the information presented in the write up struck me as a trifle stale. I am okay with collections of information that have been assembled to make it easy for a reader to get the gist of a system quickly. My Dark Web Notebook is a Cliff’s Notes about what one Tor executive suggests does not exist.

When I read about Palantir, I look for information about:

  • Technical innovations within Gotham and Palantir’s other “products”
  • Details about the legal dust-up between i2 and Palantir regarding file formats, an issue which has some here-and-now relevance given the New York Police Department’s Palantir experience
  • Interface methods which are designed to make it easier to perform certain data analysis functions
  • Specifics about the data loading, file conversion, and pre-processing index tasks and how these impact timeliness of the information in the systems
  • Issues regarding data reconciliation when local installs lose contact with cloud resources within a unit and across units
  • Financial performance of the company as it relates to stock held by stakeholders and those who want the company to pursue an initial public offering
  • Specific differences among systems on offer from BAE, Textron, and others with regard to Palantir Gotham

Each time I read about Palantir, these particular items seem to be ignored. Perhaps these are not sufficiently sexy, or maybe getting the information is a great deal of work? The words “hash” and “rehash” come to my mind as one way to describe something that seems filling but may be empty calories. Perhaps a “real journalist” will tackle some of the dot points. That would be more interesting than a stale reference to special effects in a star vehicle.

NB. I was an adviser to i2 Group Ltd., the outfit that created the Analyst’s Notebook.

Stephen E Arnold, July 31, 2017

ArnoldIT Publishes Technical Analysis of the Bitext Deep Linguistic Analysis Platform

July 19, 2017

ArnoldIT has published “Bitext: Breakthrough Technology for Multi-Language Content Analysis.” The analysis provides the first comprehensive review of the Madrid-based company’s Deep Linguistic Analysis Platform or DLAP. Unlike most next-generation multi-language text processing methods, Bitext has crafted a platform. The document can be downloaded from the Bitext Web site via this link.

Based on information gathered by the study team, the Bitext DLAP system outputs metadata with an accuracy in the 90 percent to 95 percent range. Most content processing systems today typically deliver metadata and rich indexing with accuracy in the 70 to 85 percent range.

According to Stephen E Arnold, publisher of Beyond Search and Managing Director of Arnold Information Technology:

“Bitext’s output accuracy establishes a new benchmark for companies offering multi-language content processing systems.”

The system performs, in near real time, more than 15 discrete analytic processes. The system can output enhanced metadata for more than 50 languages. The structured stream provides machine learning systems with a low cost, highly accurate way to learn. Bitext’s DLAP platform integrates more than 30 separate syntactic functions. These include segmentation, tokenization (word segmentation), frequency, and disambiguation, among others. The DLAP platform analyzes more than 15 linguistic features of content in any of the more than 50 supported languages. The system extracts entities and generates high-value data about documents, emails, social media posts, Web pages, and structured and semi-structured data.
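
As a generic illustration of the kind of per-token and entity metadata a linguistic pipeline emits, here is a short example using spaCy. It stands in for the general idea only; it is not Bitext’s DLAP API, and the accuracy figures above do not apply to it.

```python
# Generic illustration of tokenization, lemmas, part-of-speech tags, and
# entity extraction. spaCy is used as a stand-in; this is not Bitext's DLAP.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Bitext, based in Madrid, processes emails and social media posts.")

# Per-token metadata: one record per token.
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)

# Entity extraction: structured output a downstream system could ingest.
for ent in doc.ents:
    print(ent.text, ent.label_)
```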

DLAP applications range from fraud detection to identifying nuances in streams of data; for example, the sentiment or emotion expressed in a document. Bitext’s system can output metadata and other information about processed content as a feed stream to specialized systems such as Palantir Technologies’ Gotham or IBM’s Analyst’s Notebook. Machine learning systems such as those operated by Amazon, Apple, Google, and Microsoft can “snap in” the Bitext DLAP platform.

Copies of the report are available directly from Bitext at https://info.bitext.com/multi-language-content-analysis. Information about Bitext is available at www.bitext.com.

Kenny Toth, July 19, 2017
