Machine Intelligence on One Big Poster

December 12, 2014

I need this in my office. I will dump my early 1940s French posters and go for logos.

Navigate to this link: You will be able to download a copy of an infographic (poster) that summarizes “The Current State of Machine Intelligence.” There are some interesting editorial decisions; for example, the cheery Google logo turns up in deep learning, predictive APIs, automotive, and personal assistant. I quite liked the inclusion of IBM Watson in artificial intelligence: recipes with tamarind and post-video editing game show champion. I noted the listing of Palantir as one of the “intelligence tools” outfits. Three observations:

  1. I am not sure if the landscape captures what machine intelligence is
  2. The categories, while brightly colored, do not make clear how a core technology can be speech recognition but not part of the “rethinking industries” category
  3. Shouldn’t Google be in every category?

I am confident that mid tier consultants and reputation surfers like Dave Schubmehl will find the chart a source of inspiration. Does Digital Reasoning actually have a product? The company did not make the cut for the top 60 companies in NGIA systems. Hmmm. Live and learn.

Stephen E Arnold, December 12, 2014

Lexalytics Positions Semantria in Europe

December 12, 2014

Analytics outfit Lexalytics is going all-in on their European expansion. The write-up, “Lexalytics Expands International Presence: Launches Pain-Free Text Mining Customization” at Virtual-Strategy Magazine tells us that the company has boosted the language capacity of their recently acquired Semantria platform. The text-analytics and sentiment-analysis platform now includes Japanese, Arabic, Malay, and Russian in its supported-language list, which already included English, French, German, Chinese, Spanish, Portuguese, Italian, and Korean.

Lexalytics is also setting up servers in Europe. Because of upcoming changes to EU privacy law, we’re told companies will soon be prohibited from passing data into the U.S. Thanks to these new servers, European clients will be able to use Semantria’s cloud services without running afoul of the law.

Last summer, the company courted Europeans’ attention by becoming a sponsor of the 2014 Enterprise Hackathon in Prague. The press release tells us:

“All participants of the Hackathon were granted unlimited access and support to the Semantria API during the event. Nearly every team tried Semantria during the 36 hours they had to build a program that could crunch enough data to be used at the enterprise level. Redmore says, ‘We love innovative, quick development events, and are always looking for good events to support. Please contact us if you have a hackathon where you can use the power of our text mining solutions, and we’ll talk about hooking you up!’”

Lexalytics is proud to have been the first to offer sentiment analysis, auto theme detection, and Wikipedia integration. Designed to integrate with third-party applications, their text analysis software chugs along in the background at many data-related organizations. Founded in 2003, Lexalytics is headquartered in Amherst, Massachusetts.

Cynthia Murrell, December 12, 2014

Sponsored by, developer of Augmentext

Blast toward the Moon With Rocket Software

December 8, 2014

YouTube informational videos are great. They are short, snappy, and often help people retain more information about a product than reading the “about” page on a Web site. Rocket Software has its own channel and the video “Rocket Enterprise Search And Text Analytics” packs a lot of details into 2.49 minutes. The video is described as:

“We provide an integrated search platform for gathering, indexing, and searching both structured and unstructured data, making the information that you depend on more accessible, useful, and intelligent.”

How does Rocket Software defend that statement? The video opens with a prediction that by 2020 data usage will have increased to forty trillion gigabytes. It explains that data is the new enterprise currency and that it needs to be kept organized, then it drops into a plug for the company’s software. The company compares itself to competitors by saying that Rocket Software makes enterprise search and text analytics as simple as a download, after which the system is up and running. Other enterprise search systems require custom coding, but Rocket Software says it offers these options out of the box. Plus, it claims to be a cheaper product that does not sacrifice quality.

Software usage these days is about functionality and ease of use for powerful software. Rocket Software states it offers this. Try putting it to the test.

Whitney Grace, December 08, 2014

Fake Content: SEO or Tenure Desperation

November 23, 2014

This morning I thought briefly about “Profanity Laced Academic Paper Exposes Scam Journal.” The Slashdot item comments about a journal write up filled with nonsense. The paper was accepted by the International Journal of Advanced Computer Technology. I have received requests for papers from similar outfits. I am not interested in getting on a tenure track. The notion of my paying someone to publish my writings does not resonate. I either sell my work or give it away in this blog or one of the others I have available to me.

The question in my mind ping ponged between two different ways to approach this “pay to say” situation.

First, the authors who are involved in academic pursuits: “Are these folks trying to get the prestige that comes from publishing in an academic journal?” My hunch is that the motivation is similar to the force that drives the fake data people.

Second, has the search engine optimization crowd infected otherwise semi-coherent individuals with the idea that a link, any link, is worth money?

Indexing systems have a spotty record of identifying weaponized, shaped, or distorted information. The fallback position for many vendors is that by processing large volumes of information, the outliers can be easily tagged and either ignored or disproved.

Sounds good. Does it work? Nope. The idea that open source content is “accurate” may be a false assumption. You can run queries on Bing, iSeek, Google, and Yandex for yourself. Check out information related to the Ebola epidemic or modern fighter aircraft. What’s correct? What’s hoo hah? What’s downright craziness? What’s filtered? Figuring out what to accept as close to the truth is expensive and time consuming. Not part of today’s business model in most organizations I fear.

Stephen E Arnold, November 23, 2014

SAS Releases a New Component of Enterprise Miner: SAS Text Miner

November 20, 2014

The product article for SAS Text Miner on SAS Products offers some insight into the new element of SAS Enterprise Miner. SAS acquired Teragram and that “brand” has disappeared. Some of the graphics on the Text Miner page are reminiscent of SAP Business Objects’ Inxight look. The overview explains,

“SAS Text Miner provides tools that enable you to extract information from a collection of text documents and uncover the themes and concepts that are concealed in them. In addition, you can combine quantitative variables with unstructured text and thereby incorporate text mining with other traditional data mining techniques. SAS Text Miner is a component of SAS Enterprise Miner. SAS Enterprise Miner must be installed on the same machine.”

New features and enhancements for the Text Miner include support for English and German parsing and new functionality. For more information about the Text Miner, visit the Support Community available for users to ask questions and discover the best approaches for the analysis of unstructured data. SAS was founded in 1976 after the software was created at North Carolina State University for agricultural research. As the software developed, various applications became possible, and the company gained customers in pharmaceuticals, banks and government agencies.

Chelsea Kerwin, November 20, 2014

Ah, History and the 20 Somethings

November 16, 2014

I had a conversation last week with a quite assured expert in content processing. I mentioned that I was 70 years old and would not be attending a hippy dippy conference in New York. I elicited a chuckle.

I thought of this gentle dismissal of old stuff when I read “Old Scientific Papers Never Die, They Just Fade Away. Or They Used to.” The main idea of the article seems to be that “old” work can provide some useful factoids for the 20 somethings and 35 year old whiz kids who wear shirts with unclothed females on them. Couple a festive shirt with a tattoo, and you have a microcosm of the specialists inventing the future.

Here’s a passage I noted:

“Our [Googlers] analysis indicates that, in 2013, 36% of citations were to articles that are at least 10 years old and that this fraction has grown 28% since 1990,” say Verstak and co. What’s more, the increase in the last ten years is twice as big as in the previous ten years, so the trend appears to be accelerating.

Quite an insight considering that much of the math used to deliver whizzy content processing is a couple of centuries old. I looked for a reference to Dr. Gene Garfield and did not notice one. Well, maybe he’s too old to be remembered. Should I send a link to the 20 something with whom I spoke? Nah, waste of time.

Stephen E Arnold, November 16, 2014

eDigital Research and Lexalytics Team Up on Real Time Text Analytics

November 11, 2014

Through the News section of their website, eDigitalResearch announces a new partnership in, “eDigitalResearch Partner with Lexalytics on Real-Time Text Analytics Solution.” The two companies are integrating Lexalytics’ Salience analysis engine into eDigital’s HUB analysis and reporting interface. The write-up tells us:

“By utilising and integrating Lexalytics Salience text analysis engine into eDigitalResearch’s own HUB system, the partnership will provide clients with a real-time, secure solution for understanding what customers are saying across the globe. Able to analyse comments from survey responses to social media – in fact any form of free text – eDigitalResearch’s HUB Text Analytics will provide the power and platform to really delve deep into customer comments, monitor what is being said and alert brands and businesses of any emerging trends to help stay ahead of the competition.”

Based in Hampshire, U.K., eDigitalResearch likes to work closely with their clients to produce the best solution for each. The company began in 1999 with the launch of the eMysteryShopper, a novel concept at the time. As of this writing, eDigitalResearch is looking to hire a developer and senior developer (in case anyone here is interested).

Founded in 2003, Lexalytics is proud to have brought the first sentiment analysis engine to market. Designed to integrate with third-party applications, their text analysis software is chugging along in the background at many data-related companies. Lexalytics is headquartered in Amherst, Massachusetts.

Cynthia Murrell, November 11, 2014

Textio is a Promising Text Analysis Startup

November 6, 2014

Here’s an interesting development from the world of text-processing technology. GeekWire reports, “Microsoft and Amazon Vets Form Textio, a New Startup Looking to Discover Patterns in Documents.” The new company expects to release its first product next spring. Writer John Cook tells us:

“Kieran Snyder, a linguistics expert who previously worked at Amazon and Microsoft’s Bing unit, and Jensen Harris, who spent 16 years at Microsoft, including stints running the user experience team for Windows 8, have formed a new data visualization startup by the name of Textio.

“The Seattle company’s tagline: ‘Turn business text into insights.’ The emergence of the startup was first reported by Re/code, which noted that the Textio tool could be used by companies to scour job descriptions, performance reviews and other corporate HR documents to uncover unintended discrimination. In fact, Textio was formed after Snyder conducted research on gender bias in performance reviews in the tech industry.”

That is an interesting origin, especially amid the discussions about gender that currently suffuse the tech community. Textio sees much room for improvement in text analytics, and hopes to help clients reach insights beyond those competing platforms can divine. CEO Snyder’s doctorate and experience in linguistics and cognitive science should give the young company an edge in the competitive field.

Cynthia Murrell, November 06, 2014

Altegrity Kroll: Under Financial Pressure

October 30, 2014

Most of the name surfing search experts—like the fellow who sold my content on Amazon without my permission and used my name to boot—will not recall much about Engenium. That’s no big surprise. Altegrity Kroll owns the pioneering company in the value-added indexing business. Altegrity, as you may know, is the owner of the outfit that cleared Edward Snowden for US government work.

I read “Snowden Vetter Altegrity’s Loans Plunge: Distressed Debt”. In that article I learned:

Altegrity Inc., the security firm that vetted former intelligence contractor Edward Snowden, has about six months until it runs out of money as the loss of background-check contracts negate most of a July deal with lenders to extend maturities for five years.

The article reports that “selective default” looms for the company. With the lights flickering at a number of search and content processing firms, I hope that the Engenium technology survives. The system remains a leader in a segment which has a number of parvenus.

Stephen E Arnold, October 30, 2014

Amazon Learns from XML Adventurers

October 10, 2014

I recall learning a couple of years ago that Amazon was a great place to store big files. Some of the XML data management systems embraced the low prices and pushed forward with cloud versions of their services.

When I read “Amazon’s DynamoDB Gets Hugely Expanded Free Tier And Native JSON Support,” I formed some preliminary thoughts. The trigger was this passage in the write up:

many new NoSQL and relational databases (including Microsoft’s DocumentDB service) now use JSON-style document models. DynamoDB also allowed you to store these documents, but developers couldn’t directly work with the information stored in them. That’s changing today. With this update, developers can now use the AWS SDKs for Java, .NET, Ruby and JavaScript to easily map their JSON data to DynamoDB’s own data types. That turns DynamoDB into a fully-featured document store and is going to make life easier for many developers on the platform.
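
For readers who wonder what “mapping JSON data to DynamoDB’s own data types” might look like, here is a minimal sketch in plain Python. It is not the AWS SDK’s code and it calls no AWS service. The function names `to_attribute_value` and `json_to_item` are hypothetical, invented for illustration. The sketch assumes DynamoDB’s low-level typed representation, in which each value carries a type tag such as `S` (string), `N` (number, sent as a string), `BOOL`, `NULL`, `L` (list), or `M` (map).

```python
import json
from decimal import Decimal


def to_attribute_value(value):
    """Wrap one parsed JSON value in a DynamoDB-style type tag (hypothetical helper)."""
    if value is None:
        return {"NULL": True}
    # Check bool before numbers: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        # Numbers are carried as strings in the typed representation.
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"Unsupported JSON value: {value!r}")


def json_to_item(document):
    """Parse a JSON object string and tag each top-level attribute (hypothetical helper)."""
    # parse_float=Decimal avoids binary floating point rounding in number strings.
    parsed = json.loads(document, parse_float=Decimal)
    if not isinstance(parsed, dict):
        raise ValueError("An item must be a JSON object at the top level")
    return {key: to_attribute_value(val) for key, val in parsed.items()}


if __name__ == "__main__":
    doc = '{"title": "Poster", "price": 9.99, "tags": ["ai", "logos"], "meta": {"pages": 1, "note": null}}'
    print(json_to_item(doc))
```

The point of the sketch is that a nested JSON document survives the trip intact: objects become maps, arrays become lists, and the developer no longer has to flatten the document into a string before storing it.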

Is JSON better than XML? Is JSON easier to use than XML? Is JSON development faster than XML? Ask an XML rock star and the answer is probably, “You crazy.” I can hear the guitar riff from Joe Walsh now.

Ask a 20 year old in a university programming class, and the answer may be different. I asked the 20 something sitting in my office about XML and he snorted: “Old school, dude.” I hire only people with respect for their elders, of course.

Here are the thoughts that flashed through my 70 year old brain:

  1. Is Amazon getting ready to make a push for the customers of Oracle, MarkLogic, and other “real” database systems capable of handling XML?
  2. Will Amazon just slash prices, take the business, and make the 20 year old in my office a customer for life just because Amazon is “new school”?
  3. Will Amazon’s developer love provide the JSON fan with development tools, dashboards, features, and functions that push clunky methods like proprietary XQuery messages into a reliquary?

No answers… yet.

Stephen E Arnold, October 10, 2014

