ArnoldIT Publishes Technical Analysis of the Bitext Deep Linguistic Analysis Platform

July 19, 2017

ArnoldIT has published “Bitext: Breakthrough Technology for Multi-Language Content Analysis.” The analysis provides the first comprehensive review of the Madrid-based company’s Deep Linguistic Analysis Platform, or DLAP. Unlike most vendors of next-generation multi-language text processing methods, Bitext has crafted a platform. The document can be downloaded from the Bitext Web site.

Based on information gathered by the study team, the Bitext DLAP system outputs metadata with an accuracy in the 90 percent to 95 percent range.
Most content processing systems today deliver metadata and rich indexing with accuracy in the 70 to 85 percent range.

According to Stephen E Arnold, publisher of Beyond Search and Managing Director of Arnold Information Technology:

“Bitext’s output accuracy establishes a new benchmark for companies offering multi-language content processing systems.”

The system performs more than 15 discrete analytic processes in near real time. The system can output enhanced metadata for more than 50 languages. The structured stream provides machine learning systems with a low-cost, highly accurate way to learn. Bitext’s DLAP platform integrates more than 30 separate syntactic functions. These include segmentation, tokenization (word segmentation), frequency, and disambiguation, among others. The DLAP platform analyzes more than 15 linguistic features of content in any of the more than 50 supported languages. The system extracts entities and generates high-value data about documents, emails, social media posts, Web pages, and structured and semi-structured data.
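For readers unfamiliar with the terms, here is a minimal sketch of what segmentation, tokenization, and frequency counting mean in practice. It is an illustration only, not Bitext's code or API: the function names and the sample text are hypothetical, and the regular expressions are far cruder than what a production linguistic platform would use.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence segmentation: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive word tokenization: lowercase runs of word characters."""
    return re.findall(r"\w+", sentence.lower())

def term_frequencies(text):
    """Count how often each token appears across all sentences."""
    counts = Counter()
    for sentence in split_sentences(text):
        counts.update(tokenize(sentence))
    return counts

# Hypothetical sample mixing English and Spanish.
sample = "Bitext analyzes text. The text can be in Spanish. El texto es breve."
freqs = term_frequencies(sample)
```

Note that this sketch treats “text” and “texto” as unrelated tokens. Handling that kind of cross-language and morphological variation is the sort of problem a deep linguistic analysis system exists to solve.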

DLAP applications range from fraud detection to identifying nuances in streams of data; for example, the sentiment or emotion expressed in a document. Bitext’s system can output metadata and other information about processed content as a feed stream to specialized systems such as Palantir Technologies’ Gotham or IBM’s Analyst’s Notebook. Machine learning systems such as those operated by Amazon, Apple, Google, and Microsoft can “snap in” the Bitext DLAP platform.

Copies of the report are available directly from Bitext. Information about Bitext is available on the company’s Web site.

Kenny Toth, July 19, 2017

The New York Times Pairs up with Spotify for Subscription Gains

July 18, 2017

The article on Quartz Media titled “The New York Times Thinks People Will Still Pay for News—If Given Free Music” examines the package deal with Spotify currently being offered by the Times. While subscriptions to the news publication have been on the rise thanks in large part to Donald Trump, the paper is still hurting. The article points out that if the news and music industries have one thing in common, it is trying to get people to pay for their services.

The two companies announced an offer… giving a free year of Spotify Premium to anyone in the US who signs up for an all-access subscription to the news publication. Premium normally costs $120 a year, and the offer slashes the price of an all-access Times subscription too—from $6.25 a week to $5 a week… While it may seem like both companies will take a hit from these discounts, the boost in new subscribers/readers will likely more than make up for it.

It is a match made on Tinder, a coupling for the new world order. Will this couple get along? As millennials seek new outlets for activism, purchasing a subscription to the Times is a few steps above posting a rant on Facebook. Throw a year of Spotify into the mix and this deal is really appealing to anyone who doesn’t consider the Times a “liberal rag.” So maybe the Donald won’t be interested, but the rest of us sure might consider paying $5 a week for legitimate news and music.

Chelsea Kerwin, July 18, 2017

Hope for Improvement in Predictive Modeling

July 18, 2017

A fresh approach to predictive modeling may just improve the process exponentially. The insight is described in the article “Molecular Dynamics, Machine Learning Create ‘Hyper-Predictive’ Computer Models.” It arose, and is being tested, at North Carolina State University.

The article begins by describing the incredibly complex and costly process of drug development, including computer models that predict the effects of certain chemical compounds. Such models traditionally rely on QSAR modeling and molecular docking. We learn:

Denis Fourches, assistant professor of computational chemistry, wanted to improve upon the accuracy of these QSAR models. … Fourches and Jeremy Ash, a graduate student in bioinformatics, decided to incorporate the results of molecular dynamics calculations – all-atom simulations of how a particular compound moves in the binding pocket of a protein – into prediction models based on machine learning. ‘Most models only use the two-dimensional structures of molecules,’ Fourches says. ‘But in reality, chemicals are complex three-dimensional objects that move, vibrate and have dynamic intermolecular interactions with the protein once docked in its binding site. You cannot see that if you just look at the 2-D or 3-D structure of a given molecule.’

See the article for some details about the team’s proof-of-concept study. Fourches asserts the breakthrough delivers in a mere three hours a simulation that previously would have taken six months to build. That is quite an improvement! If this technique pans out, we could soon see more rapid prediction not only in pharmaceuticals but in many other areas as well. Stay tuned.

Cynthia Murrell, July 18, 2017

Women in Tech Want Your Opinion on Feminism and Other Falsehoods Programmers Believe

July 14, 2017

The collection of articles on GitHub titled Awesome Falsehood dives into some of the strange myths and errors believed by tech gnomes and the issues that they can create. For starters, there are falsehoods about names. Perhaps you have encountered the tragic story of Mr. Null, who faces a dilemma whenever he enters his last name in a web form because the form will often reject it or even crash the system.

The article explains,

This has all gotten to the point where I’ve developed a number of workarounds for times when this happens. Turning my last name into a combination of my middle name and last name, or middle initial and last name, sometimes works, but only if the website doesn’t choke on multi-word last names. My usual trick is to simply add a period to my name: “Null.” This not only gets around many “null” error blocks, it also adds a sense of finality to my birthright.

Another list expands on the falsehoods about names that programmers seem to buy into. These include cultural cluelessness about people having first names and last names that never change and are all different. Along those lines, one awesome female programmer wrote a list of falsehoods about women in tech, such as their existence revolving around a desire for a boyfriend or to complete web design tasks. (Also, mansplaining is their absolute favorite, did you know?) Another article explores falsehoods about geography, such as the mistaken notion that all places only have one official name, or even one official name per language, or one official address. While the lists may reinforce some negative stereotypes we have about programmers, they also expose the core issues that programmers must resolve to be successful and effective in their jobs.
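To make Mr. Null's predicament concrete, here is a minimal sketch of the kind of form validation that might trip over his name, next to a safer alternative. Both functions are hypothetical illustrations written for this post; they are not taken from the Awesome Falsehood lists or from any real web framework.

```python
def naive_is_missing(last_name):
    """Buggy check: treats the text 'null' as if no value had been supplied."""
    return last_name is None or last_name.strip().lower() in ("", "null", "none")

def is_missing(last_name):
    """Safer check: only a real None or an empty/whitespace-only string is missing."""
    return last_name is None or last_name.strip() == ""

# The naive check rejects a legitimate surname...
rejected = naive_is_missing("Null")
# ...which is why the trailing-period workaround from the quote gets through.
workaround_passes = not naive_is_missing("Null.")
# The safer check accepts the real name as entered.
accepted = not is_missing("Null")
```

The design point is simply that a missing value and a value that happens to spell “null” are different things, and code that converts one into the other somewhere between the form and the database creates exactly the problem described above.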

Chelsea Kerwin, July 14, 2017

Google and Indian Government Spar over Authenticity of Google Maps

July 12, 2017

The Indian government has rejected the authenticity of maps used by the popular navigation app Google Maps, terming them technically inaccurate.

Neowin in an article titled Indian Government Calls Google Maps “Inauthentic”; Asks Citizens to Use Their Solution says:

In an attack against the service, Surveyor General of India, Swarna Subba Rao said that the maps used by Google weren’t “authentic” and were “unreliable” with limited accuracy. She also stressed on how Survey of India’s own mapping data was qualitatively more accurate.

The bone of contention seems to be Google’s inaccurate mapping of Kashmir, the northern territory disputed by Pakistan. The government, citing security concerns, also denied Google permission to map the country at street level for Street View.

Considering the fact that Google has the largest user base in India, this seems to be a setback for the company. An official of the Indian government is recommending the use of the government’s own maps for better topographical accuracy. However, the government-approved maps are buggy and do not have a great interface like Google Maps.

Vishal Ingole, July 12, 2017


Wield Buzzwords with Precision

July 10, 2017

It is difficult to communicate clearly when folks don’t agree on what certain words mean. Nature attempts to clear up confusion around certain popular terms in, “Big Science Has a Buzzword Problem.” We here at Beyond Search like to call jargon words “cacaphones,” but the more traditional “buzzwords” works, too. Writer Megan Scudellari explains:

‘Moonshot’, ‘road map’, ‘initiative’ and other science-planning buzzwords have meaning, yet even some of the people who choose these terms have trouble defining them precisely. The terms might seem interchangeable, but close examination reveals a subtle hierarchy in their intentions and goals. Moonshots, for example, focus on achievable, but lofty, engineering problems. Road maps and decadal surveys (see ‘Alternate aliases’) lay out milestones and timelines or set priorities for a field. That said, many planning projects masquerade as one title while acting as another.

Strategic plans that bear these lofty names often tout big price tags and encourage collaborative undertakings…. The value of such projects is continually debated. On one hand, many argue that the coalescence of resources, organization and long-term goals that comes with large programmes is crucial to science advancement in an era of increasing data and complexity. … Big thinking and big actions have often led to success. But critics argue that buzzword projects add unnecessary layers of bureaucracy and overhead costs to doing science, reduce creativity and funding stability and often lack the basic science necessary to succeed.

In order to help planners use such terms accurately, Scudellari supplies definitions, backgrounds, and usage guidance for several common buzzwords: “moonshot,” “roadmap,” “initiative,” and “framework.” There’s even a tool to help one decide which term best applies to any given project. See the article to explore these distinctions.

Cynthia Murrell, July 10, 2017

Deleting Yourself from the Internet Too Good to Be True

July 4, 2017

Most people find themselves saddled with online accounts going back decades and would gladly delete them if they could. Some people even wish they could delete all their accounts and cease to exist online. A new service, Deseat, promises just that. According to The Next Web,

Every account it finds gets paired with an easy delete link pointing to the unsubscribe page for that service. Within in a few clicks you’re freed from it, and depending on how long you need to work through the entire list, you can be unwanted-account-free within the hour.

Theoretically, one could completely erase all trace of themselves from the all-knowing cyber web in the sky. But can it really be this easy?

Yes, eliminating outdated and unused accounts is a much-needed step in cleaning up one’s cyber identity, but we must question the validity of total elimination of one’s cyber identity in just a few clicks. Despite the website’s claim to “wipe your entire existence off the internet in a few clicks,” ridding the internet of one’s cyber footprints is probably not that easy.

Catherine Lamsfuss, July 4, 2017

The Big Problems of Big Data

June 30, 2017

Companies are producing volumes of data. However, no fully functional system is able to provide actionable insights to decision makers in real time. Bayesian methods might pave the way for those seeking a solution.

In an article published by PHYS and titled Advances in Bayesian Methods for Big Data, the author says:

Bayesian methods provide a principled theory for combining prior knowledge and uncertain evidence to make sophisticated inference of hidden factors and predictions.

Though the methods of data collection have improved, analyzing and presenting actionable insights in real time is still a big problem for Big Data adopters. Human intervention is required at almost every step, which defeats the entire purpose of an intelligent system. Hopefully, Bayesian methods can resolve these issues. Experts have been reluctant to adopt Bayesian methods because they are slow and do not scale. However, with recent advancements in machine learning, the method might work.
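For the curious, “combining prior knowledge and uncertain evidence” can be shown with the simplest textbook case, a Beta prior updated with yes/no observations. This is a minimal sketch under our own assumptions, not the method from the PHYS article: the function names, the prior, and the counts are all made up for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Beta-Binomial update: add observed counts to the prior's parameters."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Expected value of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior belief: the rate is probably near 50 percent.
prior = (2.0, 2.0)

# All the evidence at once: 8 successes and 2 failures.
posterior = update_beta(*prior, successes=8, failures=2)

# The same evidence arriving as two smaller batches.
step_one = update_beta(*prior, successes=3, failures=1)
streamed = update_beta(*step_one, successes=5, failures=1)

estimate = beta_mean(*posterior)
```

In this toy case the batch result and the streamed result are identical, so the estimate can be refreshed as data arrives without reprocessing the history. Real Big Data models rarely have such a tidy closed form, which is where the speed and scalability complaints come from.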

Vishal Ingole, June 30, 2017

Maybe Trump Speak Pretty One Day

June 15, 2017

US President Donald Trump is not the most popular person in the world. He is a cherished scapegoat for media outlets, US citizens, and other world leaders. One favorite point of ridicule for people is his odd use of the English language. Trump’s take on the English tongue is so confusing that translators are left scratching their heads, says The Guardian in “Trump In Translation: President’s Mangled Language Stumps Translators.” For probably the first time in his presidency, Trump followed proper sentence structure and grammar when he withdrew the US from the Paris Accord. While the world was in an uproar about the climate change deniers, translators were happy that they could translate his words more easily.

Asian translators are especially worried about what comes out of Trump’s mouth. Asian languages descend from different roots than European ones, so direct translations of the colloquial expressions Trump favors are nearly impossible.

India has problems translating Trump into Hindi:

‘Donald Trump is difficult to make sense of, even in English,’ said Anshuman Tiwari, editor of IndiaToday, a Hindi magazine. ‘His speech is unclear, and sometimes he contradicts himself or rambles or goes off on a tangent. Capturing all that confusion in writing, in Hindi, is not easy,’ he added. ‘To get around it, usually we avoid quoting Trump directly. We paraphrase what he has said because conveying those jumps in his speech, the way he talks, is very difficult. Instead, we summarise his ideas and convey his words in simple Hindi that will make sense to our readers.’

Indian translators also do Trump a favor by translating his words at the same rhetorical level as Indian politicians. It makes him sound smarter than he appears to English speakers. Trump needs to learn to trust his speechwriters, but translators should learn they can rely on Bitext’s DLAP to supplement their work and improve local colloquialisms.

Whitney Grace, June 15, 2017


Quote to Note: Hate That Semantic Web Stuff

June 8, 2017

I read “JSON-LD and Why I Hate the Semantic Web.”

Here’s the quote I noted:

I hate the narrative of the Semantic Web because the focus has been on the wrong set of things for a long time. That community, who I have been consciously distancing myself from for a few years now, is schizophrenic in its direction. Precious time is spent in groups discussing how we can query all this Big Data that is sure to be published via RDF instead of figuring out a way of making it easy to publish that data on the Web by leveraging common practices in use today. Too much time is spent assuming a future that’s not going to unfold in the way that we expect it to. That’s not to say that TURTLE, SPARQL, and Quad stores don’t have their place, but I always struggle to point to a typical startup that has decided to base their product line on that technology (versus ones that choose MongoDB and JSON on a regular basis).

There you go.

Stephen E Arnold, June 8, 2017
