Can Factmata Do What Other Text Analytics Firms Cannot?

April 2, 2018

Consider it a sign of the times: Information Management reveals, “Twitter, Craigslist Co-Founders Back Fact-Check Startup Factmata.” Writer Jeremy Kahn reports:

“Twitter Inc. co-founder Biz Stone and Craigslist Inc. co-founder Craig Newmark are investing in London-based fact-checking startup Factmata, the company said Thursday. … Factmata aims to use artificial intelligence to help social media companies, publishers and advertising networks weed out fake news, propaganda and clickbait. The company says its technology can also help detect online bullying and hate speech.”

Particularly amid concerns about the influence of Russian-backed propaganda in the U.S. and the U.K., several tech firms and other organizations have taken aim at false information online. What about Factmata has piqued the interest of leading investors? We’re informed:

“Dhruv Ghulati, Factmata’s chief executive officer, said the startup’s approach to fact-checking differs from other companies. While some companies are looking at a wide range of content, Factmata is initially focused exclusively on news. Many automated fact-checking approaches rely primarily on metadata – the information behind the scenes that describe online news items and other posts. But Factmata is using natural language processing to assess the actual words, including the logic being used, whether assertions are backed up by facts and whether those facts are attributed to reputable sources.”

Ghulati goes on to predict Facebook will be supplanted as users’ number one news source within the next decade. Apparently, we can look forward to the launch of Factmata’s own news service sometime “later this year.”

We will wait. We do want to point out that, based on the information available to the Beyond Search and DarkCyber research teams, no vendor has been able to identify weaponized text with a high level of accuracy without the assistance of expensive, human, and vacation-hungry subject matter experts.

Maybe Factmata will “mata”?

Cynthia Murrell, April 2, 2018

What Happens When Intelligence Centric Companies Serve the Commercial and Political Sectors?

March 18, 2018

Here’s a partial answer:

[Three images, not preserved]

Years ago, certain types of companies with specific law enforcement (LE) and intelligence capabilities maintained low profiles and, in general, focused on sales to government entities.

How times have changed!

In the DarkCyber video news program for March 27, 2018, I report on these companies’ Madison Avenue-style marketing campaigns. These will create more opportunities for a Cambridge Analytica “activity.”

Net net: Sometimes discretion is useful.

Stephen E Arnold, March 18, 2018

Searching Video and Audio Files is Now Easier Than Ever

February 7, 2018

While text-based search has been honed to near perfection in recent years, video and audio search still lags. However, a few companies are really beginning to chip away at this problem. One that recently caught our attention was vidDistill, a company that distills YouTube videos into an indexed list.

According to their website:

vidDistill first gets the video and captions from YouTube based off of the URL the user enters. The caption text is annotated with the time in the video the text corresponds to. If manually provided captions are available, vidDistill uses those captions. If manually provided captions are not available, vidDistill tries to fall back on automatically generated captions. If no captioning of any sort is available, then vidDistill will not work.


Once vidDistill has the punctuated text, it uses a text summarization algorithm to identify the most important sentences of the entire transcript of the video. The text summarization algorithm compresses the text as much as the user specifies.
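The pipeline described above (pick the best available captions, keep their timestamps, then keep only the highest-scoring sentences) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: captions arrive as hypothetical `(start_seconds, text)` pairs, each caption is treated as one sentence, and a simple word-frequency score stands in for vidDistill’s summarization algorithm, which the site does not name.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical caption record: (start time in seconds, caption text).
Caption = Tuple[float, str]


def choose_captions(manual: Optional[List[Caption]],
                    automatic: Optional[List[Caption]]) -> List[Caption]:
    """Prefer manual captions, fall back to automatic ones, else give up."""
    if manual:
        return manual
    if automatic:
        return automatic
    raise ValueError("no captions available; this video cannot be distilled")


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def summarize(captions: List[Caption], ratio: float) -> List[Caption]:
    """Keep about `ratio` of the captions, scored by average word frequency.

    A simple frequency-based extractive summarizer, standing in for
    whatever algorithm vidDistill actually uses.
    """
    freq = Counter(w for _, text in captions for w in _words(text))

    def score(text: str) -> float:
        toks = _words(text)
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    keep = max(1, round(len(captions) * ratio))
    best = sorted(captions, key=lambda c: score(c[1]), reverse=True)[:keep]
    return sorted(best)  # back to timeline order, timestamps intact


captions = choose_captions(None, [(0.0, "the cat sat"),
                                  (5.0, "the cat ran"),
                                  (9.0, "dogs bark loudly")])
print(summarize(captions, ratio=0.67))
# [(0.0, 'the cat sat'), (5.0, 'the cat ran')]
```

Keeping the timestamp attached to each sentence is what lets the summary double as an index into the video.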

vidDistill was interesting and did what it claimed. However, we wish users could search for a word and have it surface in the index, so they could skip directly to specific parts of a video. This has been done quite well for audio. A service called Happy Scribe, which is aimed at journalists transcribing audio notes, takes an audio file and (for a small fee) transcribes it to text, which can then be searched. It’s pretty elegant and fairly accurate, depending on the audio quality. We could see vidDistill applying this approach to great success.
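The word search we wish vidDistill offered amounts to an inverted index from words to caption timestamps. The sketch below is hypothetical (it is not a vidDistill or Happy Scribe feature or API) and assumes captions arrive as `(start_seconds, text)` pairs; a video player could then seek to any returned time.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Caption = Tuple[float, str]  # hypothetical: (start time in seconds, caption text)


def build_index(captions: List[Caption]) -> Dict[str, List[float]]:
    """Map each lowercase word to the start times of captions containing it."""
    index: Dict[str, List[float]] = defaultdict(list)
    for start, text in captions:
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            index[word].append(start)
    for times in index.values():
        times.sort()  # earliest occurrence first
    return dict(index)


def jump_points(index: Dict[str, List[float]], query: str) -> List[float]:
    """Return the timestamps where a single-word query occurs."""
    return index.get(query.lower(), [])


captions = [(0.0, "Welcome to the demo"),
            (12.5, "Now we index the captions"),
            (30.0, "Search the index to jump ahead")]
idx = build_index(captions)
print(jump_points(idx, "index"))  # [12.5, 30.0]
```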

Patrick Roland, February 7, 2018

AI Predictions for 2018

October 11, 2017

AI just keeps gaining steam, and is positioned to be extremely influential in the year to come. KnowStartup describes “10 Artificial Intelligence (AI) Technologies that Will Rule 2018.” Writer Biplab Ghosh introduces the list:

Artificial Intelligence is changing the way we think of technology. It is radically changing the various aspects of our daily life. Companies are now significantly making investments in AI to boost their future businesses. According to a Narrative Science report, just 38 percent of the companies surveyed used artificial intelligence in 2016—but by 2018, this percentage will increase to 62%. Another study performed by Forrester Research predicted an increase of 300% in investment in AI this year (2017), compared to last year. IDC estimated that the AI market will grow from $8 billion in 2016 to more than $47 billion in 2020. ‘Artificial Intelligence’ today includes a variety of technologies and tools, some time-tested, others relatively new.
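The IDC estimate implies a steep growth rate. As a rough check, treating 2016 to 2020 as four years of compounding and using $47 billion as the end value (the quote says “more than,” so the result is a floor), the compound annual growth rate works out to roughly 56 percent per year:

```latex
\text{CAGR} = \left(\frac{V_{2020}}{V_{2016}}\right)^{1/4} - 1
            = \left(\frac{47}{8}\right)^{1/4} - 1
            = 5.875^{1/4} - 1 \approx 0.557
```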

We are not surprised that the top three entries are natural language generation, speech recognition, and machine learning platforms, in that order. Next are virtual agents (aka “chatbots” or “bots”), then decision management systems, AI-optimized hardware, deep learning platforms, robotic process automation, text analytics & natural language processing, and biometrics. See the write-up for details on each of these topics, including some top vendors in each space.

Cynthia Murrell, October 11, 2017

New Beyond Search Overflight Report: The Bitext Conversational Chatbot Service

September 25, 2017

Stephen E Arnold and the team at Arnold Information Technology analyzed Bitext’s Conversational Chatbot Service (BCBS). The BCBS taps Bitext’s proprietary Deep Linguistic Analysis Platform (DLAP) to provide greater accuracy for chatbots regardless of platform.

Arnold said:

The BCBS augments chatbot platforms from Amazon, Facebook, Google, Microsoft, and IBM, among others. The system uses specific DLAP operations to understand conversational queries. Syntactic functions, semantic roles, and knowledge graph tags increase the accuracy of chatbot intent and slotting operations.

One unique engineering feature of the BCBS is that specific Bitext content processing functions can be activated to meet specific chatbot applications and use cases. DLAP supports more than 50 languages. A BCBS licensee can activate additional language support as needed. A chatbot may be designed to handle English language queries, but Spanish, Italian, and other languages can be activated via an instruction.

Dr. Antonio Valderrabanos said:

People want devices that understand what they say and intend. BCBS (Bitext Chatbot Service) allows smart software to take the intended action. BCBS allows a chatbot to understand context and leverage deep learning, machine intelligence, and other technologies to turbo-charge chatbot platforms.

Based on ArnoldIT’s test of the BCBS, tagging accuracy jumped by as much as 70 percent. Another surprising finding was that the time required to perform content tagging decreased.

Paul Korzeniowski, a member of the ArnoldIT study team, observed:

The Bitext system handles a number of difficult content processing issues easily. Specifically, the BCBS can identify negation regardless of the structure of the user’s query. The system can understand double intent; that is, a statement which contains two or more intents. BCBS is one of the most effective content processing systems to deal correctly with variability in human statements, instructions, and queries.
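To make “negation” and “double intent” concrete, here is a deliberately naive Python sketch. It is a toy under stated assumptions, not Bitext’s method: the intent names and keyword lists are hypothetical, clauses are split on a few conjunctions and punctuation marks, and a clause counts as negated if it contains a negation word. Rules this crude break as soon as users rephrase a request.

```python
import re
from typing import List, Tuple

# Hypothetical intent lexicon for a banking chatbot (illustration only).
INTENT_KEYWORDS = {
    "check_balance": {"balance"},
    "transfer_money": {"transfer", "send"},
    "cancel_card": {"cancel", "block"},
}
NEGATION_CUES = {"not", "don't", "dont", "never", "no"}


def detect_intents(utterance: str) -> List[Tuple[str, bool]]:
    """Return (intent, negated) pairs, one per matching clause.

    Splitting on "and"/"but"/"then" and keyword lookup are crude stand-ins
    for real syntactic analysis.
    """
    results = []
    clauses = re.split(r"\b(?:and|but|then)\b|[,;]", utterance.lower())
    for clause in clauses:
        tokens = re.findall(r"[a-z']+", clause)
        negated = any(t in NEGATION_CUES for t in tokens)
        for intent, keywords in INTENT_KEYWORDS.items():
            if keywords & set(tokens):
                results.append((intent, negated))
    return results


print(detect_intents("Check my balance and don't cancel my card"))
# [('check_balance', False), ('cancel_card', True)]
```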

Bitext’s BCBS and DLAP solutions deliver higher accuracy, enable more reliable sentiment analysis, and can even output critical actor-action-outcome data. Such data are invaluable for disambiguation in Web and enterprise search applications, for content processing in discovery solutions used in fraud detection and law enforcement, and for consumer-facing mobile applications.

Because Bitext was one of the first platform solution providers, the firm was able to identify market trends and create its unique BCBS service for major chatbot platforms. The company focuses solely on solving problems common to companies relying on machine learning and, as a result, has done a better job delivering such functionality than other firms have.

A copy of the 22-page Beyond Search Overflight analysis is available directly from Bitext at this link on the Bitext site.

Once again, Bitext has broken through the barriers that block multi-language text analysis. The company’s Deep Linguistics Analysis Platform supports more than 50 languages at a lexical level and more than 20 at a syntactic level, making the company’s technology available for a wide range of applications in Big Data, Artificial Intelligence, social media analysis, text analytics, and the new wave of products designed for voice interfaces supporting multiple languages, such as chatbots. Bitext’s breakthrough technology solves many complex language problems and integrates machine learning engines with linguistic features. Bitext’s Deep Linguistics Analysis Platform allows seamless integration with commercial, off-the-shelf content processing and text analytics systems. Bitext’s innovative system reduces costs for processing multilingual text for government agencies and commercial enterprises worldwide. The company has offices in Madrid, Spain, and San Francisco, California. For more information, visit www.bitext.com.

Kenny Toth, September 25, 2017

IBM Watson Deep Learning: A Great Leap Forward

August 16, 2017

I read the following article in the IBM marketing publication Fortune Magazine. Oh, sorry, I meant the independent, real business news outfit Fortune: “IBM Claims Big Breakthrough in Deep Learning.” (I know the write-up is objective because the headline includes the word “claims.”)

The main point, that the IBM Watson super game-winning thing can now do certain computational tasks more quickly, is mildly interesting. I noticed that one of our local tire discounters has a sale on a brand called Primewell. That struck me as more interesting than this IBM claim.

First, what’s the great leap forward the article touts? I highlighted this passage:

IBM says it has come up with software that can divvy those tasks among 64 servers running up to 256 processors total, and still reap huge benefits in speed. The company is making that technology available to customers using IBM Power System servers and to other techies who want to test it.

How many IBM Power 8 servers does it take to speed up Watson’s indexing? I learned:

IBM used 64 of its own Power 8 servers—each of which links both general-purpose Intel microprocessors with Nvidia graphical processors with a fast NVLink interconnection to facilitate fast data flow between the two types of chips
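If all 256 processors are GPUs, the article’s figures work out to 256 / 64 = 4 per server. The general technique of dividing training among servers is usually called synchronous data parallelism: each worker computes a gradient on its own slice of the data, the gradients are averaged (the job an allreduce performs across the network), and every copy of the model applies the same update. The sketch below simulates that on one machine with a one-parameter linear model; it illustrates the idea only and is not IBM’s software.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) for a one-parameter model y = w * x


def shard(data: List[Sample], n_workers: int) -> List[List[Sample]]:
    """Deal the training set round-robin across workers, dropping empty shards."""
    shards = [data[i::n_workers] for i in range(n_workers)]
    return [s for s in shards if s]


def local_gradient(w: float, samples: List[Sample]) -> float:
    """Gradient of mean squared error for y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


def train(data: List[Sample], n_workers: int,
          lr: float = 0.005, steps: int = 200) -> float:
    w = 0.0
    shards = shard(data, n_workers)
    for _ in range(steps):
        # On real hardware each worker computes its gradient in parallel.
        grads = [local_gradient(w, s) for s in shards]
        # Averaging the gradients is what an allreduce does across servers;
        # every replica then applies the same update.
        w -= lr * sum(grads) / len(grads)
    return w


data = [(float(x), 3.0 * x) for x in range(1, 17)]
print(round(train(data, n_workers=4), 3))  # 3.0
```

The speedup comes from splitting the gradient work; the cost is the communication needed to average gradients every step, which is why fast interconnects matter in setups like the one described.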

A few questions:

  1. How much does it cost to outfit 64 IBM Power 8 servers to perform this magic?
  2. How many Nvidia GPUs are needed?
  3. How many Intel CPUs are needed?
  4. How much RAM is required in each server?
  5. How much time does it require to configure, tune, and deploy the setup referenced in the article?

My hunch is that this setup is slightly more costly than buying a Chromebook or signing on for some Amazon cloud computing cycles. These questions, not surprisingly, are not of interest to the “real” business magazine Fortune. That’s okay. I understand that one can get only so much information from a news release, a PowerPoint deck, or a lunch. No problem.

The other thought that crossed my mind as I read the story: “Does Fortune think that IBM is the only outfit using GPUs to speed up certain types of content processing?” Ah, well, IBM is probably so sophisticated that it is working on engineering problems that other companies cannot conceive of, let alone tackle.

Now the second point: Content processing to generate a Watson index is a bottleneck. However, the processing is what I call a downstream bottleneck. The really big hurdle for IBM Watson is the manual work required to set up the rules which the Watson system has to follow. Compared to the data crunching, training and rule making are the giant black holes of time and complexity. Fancy Dan servers don’t get to strut their stuff until the days, weeks, months, and years of setting up the rules are completed, tuned, and updated.

Fortune Magazine obviously considers this bottleneck of zero interest. My hunch is that IBM did not explain this characteristic of IBM Watson or the Achilles’ heel of figuring out the rules. Who wants to sit in a room with subject matter experts and three or four IBM engineers talking about what’s important, what questions are asked, and what data are required?

Ask Jeeves demonstrated decades ago that human crafted rules are Black Diamond ski runs. IBM Watson’s approach is interesting. But what’s fascinating is the uncritical acceptance of IBM’s assertions and the lack of interest in tackling substantive questions. Maybe lunch was cut short?

Stephen E Arnold, August 16, 2017

Tidy Text the Best Way to Utilize Analytics

August 10, 2017

Even though text mining is nothing new, natural language processing seems to be the hot new analytics craze. In an effort to understand the value of each, the difference between them, and (most importantly) how to use either efficiently, O’Reilly interviewed text miners Julia Silge and David Robinson to learn about their approach.

When asked what advice they would give those drowning in data, they replied:

…our advice is that adopting tidy data principles is an effective strategy to approach text mining problems. The tidy text format keeps one token (typically a word) in each row, and keeps each variable (such as a document or chapter) in a column. When your data is tidy, you can use a common set of tools for exploring and visualizing them. This frees you from struggling to get your data into the right format for each task and instead lets you focus on the questions you want to ask.
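Silge and Robinson work in R, where their tidytext package provides this format. The same idea can be sketched in Python with plain tuples: each row is one `(document, token)` pair, so generic tools such as filtering and counting apply without reshaping. The function name echoes tidytext’s `unnest_tokens()`, but this is an illustrative sketch, not a port of that package.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

TidyRow = Tuple[str, str]  # (document, token): one token per row


def unnest_tokens(docs: Dict[str, str]) -> List[TidyRow]:
    """Turn {document: text} into tidy rows, one lowercase word per row."""
    return [(doc, tok)
            for doc, text in docs.items()
            for tok in re.findall(r"[a-z']+", text.lower())]


def count_by(rows: List[TidyRow]) -> Counter:
    """Count (document, token) pairs; works the same for any tidy table."""
    return Counter(rows)


docs = {"ch1": "It was the best of times", "ch2": "It was the worst of times"}
rows = unnest_tokens(docs)
print(rows[:3])                          # [('ch1', 'it'), ('ch1', 'was'), ('ch1', 'the')]
print(count_by(rows)[("ch2", "worst")])  # 1
```

Because every row has the same shape, removing stop words is just a list comprehension over `rows`, and the same counting function serves per-document and per-corpus questions.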

The duo admits text mining and natural language processing overlap in many areas, but both are useful tools for different problems. They associate text mining with statistical analysis and natural language processing with the relationship between computers and language. The difference may seem minute, but with data volumes exploding and companies drowning in data, such advice is crucial.

Catherine Lamsfuss, August 10, 2017

ArnoldIT Publishes Technical Analysis of the Bitext Deep Linguistic Analysis Platform

July 19, 2017

ArnoldIT has published “Bitext: Breakthrough Technology for Multi-Language Content Analysis.” The analysis provides the first comprehensive review of the Madrid-based company’s Deep Linguistic Analysis Platform or DLAP. Unlike most next-generation multi-language text processing methods, Bitext has crafted a platform. The document can be downloaded from the Bitext Web site via this link.

Based on information gathered by the study team, the Bitext DLAP system outputs metadata with an accuracy in the 90 percent to 95 percent range.
Most content processing systems today typically deliver metadata and rich indexing with accuracy in the 70 to 85 percent range.
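The passage does not specify how accuracy was measured. One common definition for tagging output is the share of tags that match a human-annotated gold standard, sketched below with hypothetical entity tags. Plain accuracy can look flattering when most tokens carry a default “no entity” tag (`O` here), which is why per-tag precision and recall are often reported alongside it.

```python
from typing import Sequence


def tag_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of positions where the system's tag matches the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold tag sequences must align")
    if not gold:
        raise ValueError("no tags to score")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Hypothetical tags: the system misses one location (LOC tagged as O).
gold = ["PERSON", "O", "ORG", "O", "LOC", "O", "O", "ORG", "O", "O"]
pred = ["PERSON", "O", "ORG", "O", "O", "O", "O", "ORG", "O", "O"]
print(tag_accuracy(pred, gold))  # 0.9
```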

According to Stephen E Arnold, publisher of Beyond Search and Managing Director of Arnold Information Technology:

“Bitext’s output accuracy establishes a new benchmark for companies offering multi-language content processing systems.”

The system performs more than 15 discrete analytic processes in near real time and can output enhanced metadata for more than 50 languages. The structured stream provides machine learning systems with a low-cost, highly accurate way to learn. Bitext’s DLAP platform integrates more than 30 separate syntactic functions. These include segmentation, tokenization (word segmentation), frequency, and disambiguation, among others. The DLAP platform analyzes more than 15 linguistic features of content in any of the more than 50 supported languages. The system extracts entities and generates high-value data about documents, emails, social media posts, Web pages, and structured and semi-structured data.
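One common way to organize discrete analytic processes is a chain of stages, each adding fields to a shared record. The Python sketch below chains three naive stages (sentence segmentation, tokenization, and frequency counting) purely as an illustration; the stage functions are hypothetical and say nothing about how DLAP is built. Its regular expressions assume space-delimited text, so languages such as Chinese or Japanese would need a real word segmenter.

```python
import re
from collections import Counter
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]


def segment(rec: Record) -> Record:
    """Naive sentence segmentation: split after ., !, or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", rec["text"].strip())
    rec["sentences"] = [s for s in parts if s]
    return rec


def tokenize(rec: Record) -> Record:
    """Split each sentence into word tokens."""
    rec["tokens"] = [re.findall(r"\w+", s) for s in rec["sentences"]]
    return rec


def frequency(rec: Record) -> Record:
    """Count lowercase token frequencies across the whole document."""
    rec["freq"] = Counter(t.lower() for sent in rec["tokens"] for t in sent)
    return rec


def run_pipeline(text: str, stages: List[Stage]) -> Record:
    """Pass one record through each stage in order."""
    rec: Record = {"text": text}
    for stage in stages:
        rec = stage(rec)
    return rec


out = run_pipeline("Bitext parses text. Text has structure!",
                   [segment, tokenize, frequency])
print(out["sentences"])     # ['Bitext parses text.', 'Text has structure!']
print(out["freq"]["text"])  # 2
```

Keeping each process as a separate stage makes it easy to switch individual functions on or off per application, at the cost of passing a growing record between steps.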

DLAP applications range from fraud detection to identifying nuances in streams of data, such as the sentiment or emotion expressed in a document. Bitext’s system can output metadata and other information about processed content as a feed stream to specialized systems such as Palantir Technologies’ Gotham or IBM’s Analyst’s Notebook. Machine learning systems such as those operated by such companies as Amazon, Apple, Google, and Microsoft can “snap in” the Bitext DLAP platform.

Copies of the report are available directly from Bitext at https://info.bitext.com/multi-language-content-analysis. Information about Bitext is available at www.bitext.com.

Kenny Toth, July 19, 2017

Bitext and MarkLogic Join in a Strategic Partnership

June 13, 2017

Strategic partnerships are one of the best ways for companies to grow, and diamond-in-the-rough company Bitext has formed a brilliant one. According to a recent press release, “Bitext Announces Technology Partnership With MarkLogic, Bringing Leading-Edge Text Analysis To The Database Industry.” Bitext has enjoyed a number of key license deals. The company’s ability to process multi-lingual content with its deep linguistics analysis platform reduces costs and increases the speed with which machine learning systems can deliver more accurate results.


Both Bitext and MarkLogic are helping enterprise companies drive better outcomes and create better customer experiences. By combining their respective technologies, the pair hopes to reduce ambiguity in text data and produce high-quality data assets for semantic search, chatbots, and machine learning systems. Bitext’s CEO and founder, Dr. Antonio Valderrabanos, said:

“With Bitext’s breakthrough technology built-in, MarkLogic 9 can index and search massive volumes of multi-language data accurately and efficiently while maintaining the highest level of data availability and security. Our leading-edge text analysis technology helps MarkLogic 9 customers to reveal business-critical relationships between data.”

Bitext is capable of conquering the most difficult language problems and creating solutions for consumer engagement, training, and sentiment analysis. Bitext’s flagship product is its Deep Linguistics Analysis Platform, which Kantar, GfK, Intel, and Accenture favor. MarkLogic used to be one of Bitext’s clients, but now the two are partners and are bound to invent even more breakthrough technology. Bitext takes another step to cement its role as the operating system for machine intelligence.

Whitney Grace, June 13, 2017

Bitvore: The AI, Real Time, Custom Report Search Engine

May 16, 2017

Just when I thought information access had slumped quietly through another week, I read in the capitalist tool which you know as Forbes, the content marketing machine, this article:

This AI Search Engine Delivers Tailored Data to Companies in Real Time.

This write-up struck me as more interesting than the most recent IBM Watson prime time commercial about smart software for zealous professional basketball fans or Lucidworks’ (really?) acquisition of the interface builder Twigkit. Forbes Magazine’s write-up did not point out that the company seems to be channeling Palantir Technologies; for example, Jeff Curie, the president, refers to employees as Bitvorians. Take that, you Hobbits and Palantirians.

[Image: A Bitvore 3D data structure.]

The AI, real time, custom report search engine is called Bitvore. Here in Harrod’s Creek, we recognized the combination of the computer term “bit” with a syllable from one of our favorite morphemes “vore” as in carnivore or omnivore or the vegan-sensitive herbivore.

