Honkin' News banner

Semantify Secures Second Funding Round

August 4, 2016

Data-management firm Semantify has secured more funding, we learn from “KGC Capital Invests in Semantify, Leaders in Cognitive Discovery and Analytics” at Benzinga. The write-up tells us primary investor KGC Capital was joined by KDWC Venture Fund and Bridge Investments in making the investment, as well as by existing investors (including its founder, Vishy Dasari.) The funds from this Series A funding round will be used to address increased delivery, distribution, and packaging needs.

The press release describes Semantify’s platform:

“Semantify automates connecting information in real time from multiple silos, and empowers non-technical users to independently gain relevant, contextual, and actionable insights using a free form and friction-free query interface, across both structured and unstructured content. With Semantify, there would be no need to depend on data experts to code queries and blend, curate, index and prepare data or to replicate data in a new database. A new generation self-service enterprise Ad-hoc discovery and analytics platform, it combines natural language processing (NLP), machine learning and advanced semantic modeling capabilities, in a single seamless proprietary platform. This makes it a pioneer in democratization of independent, on demand information access to potentially hundreds of millions of users in the enterprise and e-commerce world.”

Semantify cites their “fundamentally unique” approach to developing data-management technology as the force behind their rapid deployment cycles, low maintenance needs, and lowered costs. Formerly based in Delaware, the company is moving their headquarters to Chicago (where their investors are based). Semantify was founded in 2008. The company is also hiring; their About page declares, toward the bottom: “Growing fast. We need people;” as of this writing, they are seeking database/ BI experts, QA specialists, data scientists & knowledge modelers, business analysts, program & project managers, and team leads.



Cynthia Murrell, August 4, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph



Honkin News: Beyond Search Video News Program Available Now

August 2, 2016

Honkin’ News is now online via YouTube at https://youtu.be/hf93zTSixgo. The weekly program tries to separate the giblets from the goose feathers in online search and content processing. Each program draws upon articles and opinion appearing in the Beyond Search blog.

The Beyond Search program is presented by Stephen E Arnold, who resides in rural Kentucky. The five minute programs highlights stories appearing in the daily Beyond Search blog and includes observations not appearing in the printed version of the stories. No registration is required to view the free video.

Arnold told Beyond Search:

Online search and content processing generate modest excitement. Honkin’ News comments on some of the more interesting and unusual aspects of information retrieval, natural language processing, and the activities of those working to make software understand digital content. The inaugural program highlights Verizon’s Yahoo AOL integration strategy, explores why search fails, and how manufacturing binders and fishing lures might boost an open source information access strategy.

The video is created using high tech found in the hollows of rural Kentucky; for example, eight mm black-and-white film and two coal-fired computing devices. One surprising aspect of the video is the window showing the vista outside the window of the Beyond Search facility. The pond filled with mine drainage is not visible, however.

Kenny Toth, August 2, 2016

Is IBM Vulnerable to OpenText?

July 21, 2016

I read “Hey, IBM, OpenText Is Coming for You.” The write up reports that the poobah of OpenText said that its new Magellan system is “a next generation analytics platform.” Getting from Yet another OpenText system (YOTS) to the nemesis of IBM is quite a leap.

But here’s the statement, once again from the OpenText poobah, that caught my attention:

But even more interesting than the product itself, is the bullish way in which OpenText is calling out IBM Watson. “We are going to position it directly against Watson. We’re not going to shy away from that at all,” Mark said. “We think there’s a whole class of problems that enterprises want to solve themselves and what they need is an affordable platform, one that’s open and programmable to them and accessible to them and that’s going to be Magellan. So we’re going to position ourselves and stay focused directly against Watson.”

The write up explains that OpenText Magellan is better, faster, and cheaper. I have heard that before I think. But the details are interesting.

Magellan’s software is open., Its hardware is open. Its IP is owned by the licensee. Its deployment options are “run anywhere.” It is extensible by the licensee. Its ecosystem is open. Its cost is a mere one dollar sign.

And what do you think about IBM Watson? Well, its software is closed. Its hardware is closed. Its IP ownership is not the licensee’s. Watson is extensible only by IBM Global Services. IBM’s ecosystem is closed. Best of the points, IBM’s cost is six dollar signs.

OpenText is a $2 billion a year outfit. The hapless IBM is, despite its being lost in revenue space, is about $90 billion a year.

My view is that OpenText is swinging for the marketing and conceptual fences. IBM is trying to find the secret notebook that unlocks revenues.

I would point out that Fancy Dan software designed to help executives make better decisions is plentiful. Many vendors covet this niche. There is excitement ahead. Both OpenText and IBM may find that talk about smart software flows more rapidly than sustainable revenue and healthy profits. Keep in mind the high cost of technological debt. That’s one dot point which IBM and OpenText share a common point of weakness.

Stephen E Arnold, July 21, 2106

The Watson Update

July 15, 2016

IBM invested a lot of resources, time, and finances into developing the powerful artificial intelligence computer Watson.  The company has been trying for years to justify the expense as well as make money off their invention, mostly by having Watson try every conceivable industry that could benefit from big data-from cooking to medicine.  We finally have an update on Watson says ZDNet in the article, “IBM Talks About Progress On Watson, OpenPower.”

Watson is a cognitive computer system that learns, supports natural user interfaces, values user expertise, and evolves with new information.  Evolving is the most important step, because that will allow Watson to keep gaining experience and learn.  When Watson was first developed, IBM fed it general domain knowledge, then made the Watson Discovery to find answers to specific questions.  This has been used in the medical field to digest all the information created and applying it to practice.

IBM also did this:

“Most recently IBM has been focused on making Watson available as a set of services for customers that want to build their own applications with natural question-and-answer capabilities. Today it has 32 services available on the Watson Developer Cloud hosted on its Bluemix platform-as-a-service… Now IBM is working on making Watson more human. This includes a Tone Analyzer (think of this as a sort spellchecker for tone before you send that e-mail to the boss), Emotion Analysis of text, and Personality Insights, which uses things you’ve written to assess your personality traits.”

Cognitive computing has come very far since Watson won Jeopardy.  Pretty soon the technology will be more integrated into our lives.  The bigger question is how will change society and how we live?


Whitney Grace,  July 15, 2016

There is a Louisville, Kentucky Hidden Web/Dark

Web meet up on July 26, 2016. Information is at this link: http://bit.ly/29tVKpx.

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Books about Data Mining: Some Free, Some for Fee

July 14, 2016

If you want to load up on fun beach reading, I have a suggestion for you, gentle reader. KDNuggets posted “60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and More.” The does contain books about data mining and a number of other subjects. You will have to read the list and figure out which titles are germane to your interests. A number of the books include a helpful Amazon link. If you click on the hyperlink you may get a registration form, a PDF of the book, or this message:


Stephen E Arnold, July 14, 2016

What Could Possibly Go Wrong?

July 13, 2016

After reading The Atlantic’s article, “Technology, The Faux Equalizer” about how technology is limited to the very wealthy and does not level the playing field.  It some ways new technology can be a nuisance to the average person trying to scratch out a living in an unfriendly economy.  Self-driving cars are one fear, but did you ever think bankers and financial advisors would have to compete with algorithms?  The International Business Times shares, “Will Financial Analysts Lose Their Jobs To Intelligent Trading Machines?”

Machine learning software can crunch numbers faster and can extrapolate more patterns than a human.  Hedge fund companies have hired data scientists, physicists, and astronomers to remove noise from data and help program the artificial intelligence software.  The article used UK-based Bridgewater Associates as an example of a financial institute making strides in automizing banking:

“Using Bridgewater as an example, Sutton told IBTimes UK: ‘If you look at their historic trading strategies, it’s been very much long-term bets around what’s happening at a macro level. They have built their entire business on having some of the best research and analytics in the industry and some of the smartest minds thinking on that.  When you combine those two things, I would definitely expect artificial intelligence to be applied to identify large-scale trades that might not be evident to an individual researcher.’”

Developing artificial intelligence for the financial sector has already drawn the attention of private companies and could lead to a 30% lose of jobs due to digitization.  It would allow financial companies a greater range of information to advise their clients on wise financial choices, but it could also mean these institutes lose talent as the analysts role was to groom more talent.

These will probably be more potential clients for IBM’s Watson.  We should all just give up now and hail our robot overlords.


Whitney Grace,  July 13, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Defending Against Java Deserialization Ransomware

July 13, 2016

What is different about the recent rash of ransomware attacks against hospitals (besides the level of callousness it takes to hold the well-being of hospital patients for ransom)? CyberWatch brings us up to date in,  “My Layman’’s Terms: The Java Deserialization Vulnerability in Current Ransomware.” Writer Cheryl Biswas begins by assuring us it is practicality, not sheer cruelty, that has hackers aiming at hospitals. Other entities, like law enforcement agencies, which rely on uninterrupted access to their systems to keep people safe are also being attacked. Oh, goody.

The problem begins with a vulnerability at the very heart of any Java-based system, the server. And here we thought open source was more secure than proprietary software. Biswas informs us:

“This [ransomware] goes after servers, so it can bring down entire networks, and doesn’t rely on the social engineering tactics to gain access.  It’s so bad US-CERT has issued this recent advisory. I’ve laid out what’s been made available on just how this new strain of ransomware works. And I’ve done it in terms to help anybody take a closer look at the middleware running in their systems currently. Because a little knowledge could be dangerous thing used to our advantage this time.”

The article goes on to cover what this strain of ransomware can do, who could be affected, and how. One key point—anything that accepts serialized Java objects could be a target, and many Java-based middleware products do not validate untrusted objects before deserialization.  See the article for more technical details, and for Biswas’ list of sources. She concludes with these recommendations:

“Needs to Happen:

“Enterprises must find all the places they use deserialized or untrusted data. Searching code alone will not be enough. Frameworks and libraries can also be exposed.

“Need to harden it against the threat.

“Removing commons collections from app servers will not be enough. Other libraries can be affected.

“Contrast Sec has a free tool for addressing issue.  Runtime Application Self-Protection RASP.  Adds code to deserialization engine to prevent exploitation.”

Organizations the world over must not put off addressing these vulnerabilities, especially ones in charge of health and safety.


Cynthia Murrell, July 13, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Watson Weekly: IBM Watson Service for Use in the IBM Cloud: Bluemix Paas, IBM SPSS, Watson Analytics

July 5, 2016

The article on ComputerWorld titled Review: IBM Watson Strikes Again relates the recent expansions of Watson’s cloud service portfolio, who is still most famous for winning on Jeopardy. The article beings by evoking that event from 2011, which actually only reveals a small corner of Watson’s functions. The article mentions that to win Jeopardy, Watson basically only needed to absorb Wikipedia, since 95% of the answers are article titles. New services for use in the IBM Cloud include the Bluemix Paas, IBM SPSS, and Predictive Analytics. Among the Bluemix services is this gem,

“Personality Insights derives insights from transactional and social media data…to identify psychological traits, which it returns as a tree of characteristics in JSON format. Relationship Extraction parses sentences into their components and detects relationships between the components (parts of speech and functions) through contextual analysis. The Personality Insights API is documented for Curl, Node, and Java; the demo for the API analyzes the tweets of Oprah, Lady Gaga, and King James as well as several textual passages.”

Bluemix also consists of AlchemyAPI for ftext and image content reading, Concept Expansion and Concept Insights, which offers text analysis and linking of concepts to Wikipedia topics. The article is less kind to Watson Analytics, a Web app for data analysis with ML, which the article claims “tries too hard” and is too distracting for data scientists.


Chelsea Kerwin,  July 5, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

What Is in a Name? Procedures Remain Indifferent

July 2, 2016

I read “Brexit: “Bayesian” Statistics Renamed “Laplacian” Statistics.” Years ago I worked briefly with a person who was I later learned a failed webmaster and a would-be wizard. One of that individual’s distinguishing characteristics was an outright rejection of Bayesian methods. The person thought la place was a non American English way or speaking French. I wonder where that self appointed wizard is now. Oh, well. I hope the thought leader is scrutinizing the end of Bayes.

According to the write up:

With the U.K. leaving the E.U., it’s time for “Bayesian” to exit its titular role and be replaced by “Laplacian”.

I support this completely. I assume that the ever efficient EU bureaucracy in the nifty building in Strasbourg will hop on this intellectual bandwagon.

Stephen E Arnold, July 2, 2016

Bad News for Instant Analytics Sharpies

June 28, 2016

I read “Leading Statisticians Establish Steps to Convey Statistics a Science Not Toolbox.” I think “steps” are helpful. The challenge will be to corral the escaped ponies who are making fancy analytics a point and click, drop down punch list. Who needs to understand anything. Hit the button and generate visualizations until somethings looks really super. Does anyone know a general who engages in analytic one-upmanship? Content and clarity sit in the backseat of the JLTV.

The write up is similar to teens who convince their less well liked “pals” to go on a snipe hunt. I noted this passage:

To this point, Meng [real statistics person] notes “sound statistical practices require a bit of science, engineering, and arts, and hence some general guidelines for helping practitioners to develop statistical insights and acumen are in order. No rules, simple or not, can be 100% applicable or foolproof, but that’s the very essence that I find this is a useful exercise. It reminds practitioners that good statistical practices require far more than running software or an algorithm.”

Many vendors emphasize how easy smart analytics systems are to use. The outputs are presentation ready. Checks and balances are mostly pushed to the margins of the interface.

Here are the 10 rules.

  1. Statistical Methods Should Enable Data to Answer Scientific Questions
  2. Signals Always Come with Noise
  3. Plan Ahead, Really Ahead
  4. Worry about Data Quality
  5. Statistical Analysis Is More Than a Set of Computations
  6. Keep it Simple
  7. Provide Assessments of Variability
  8. Check Your Assumptions
  9. When Possible, Replicate!
  10. Make Your Analysis Reproducible

I think I can hear the guffaws from the analytics vendors now. I have tears in my eyes when I think about “statistical methods should enable data to answer scientific questions.” I could have sold that line to Jack Benny if he were still alive and doing comedy. Scientific questions from data which no human has checked for validity. Oh, my goodness. Then reproducibility. That’s a good one too.

Stephen E Arnold, June 28, 2016

« Previous PageNext Page »