Sinequa and Systran Partner on Cyber Defense

May 20, 2015

Enterprise search firm Sinequa and translation tech outfit Systran are teaming up on security software. “Systran and Sinequa Combine in the Field of Cyber Defense,” announces ITRmanager.com. (The article is in French, but Google Translate is our friend.) The write-up explains:

“Sinequa and Systran have indeed decided to cooperate to develop a solution for detecting and processing of critical information in multiple languages and able to provide investigators with a panoramic view of a given subject. On one side Systran provides safe instant translation in over 45 languages, and the other Sinequa provides big data processing platform to analyze, categorize and retrieve relevant information in real time. The integration of the two solutions should thus facilitate the timely processing of structured and unstructured data from heterogeneous sources, internal and external (websites, audio transcripts, social media, etc.) and provide a clear and comprehensive view of a subject for investigators.”

Launched in 2002, Sinequa is a leader in the Enterprise Search field; the company boasts strong business analytics, but also emphasizes user-friendliness. Based in Paris, the firm maintains offices in Frankfurt, London, and New York City. Systran has a long history of providing innovative translation services to defense and security organizations around the world. The company’s headquarters are in Seoul, with other offices located in Daejeon, South Korea; Paris; and San Diego.

Cynthia Murrell, May 20, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Data Mining Algorithms Explained

May 18, 2015

In plain English too. Navigate to “Top 10 Data Mining Algorithms in Plain English.” When you fire up an enterprise content processing system, the algorithms beneath the user experience layer are chestnuts. Universities do a good job of teaching students about some reliable methods to perform data operations. In fact, the universities do such a good job that most content processing systems include almost the same old chestnuts in their solutions. The decision to use some or all of the top 10 data mining algorithms has some interesting consequences, but you will have to attend one of my lectures about the weaknesses of these numerical recipes to get some details.

The write up is worth a read. The article includes a link to information which underscores the ubiquitous nature of these methods. This is the Xindong Wu et al. write up “Top 10 Algorithms in Data Mining.” Our research reveals that dependence on these methods is more widespread now than it was seven years ago when the paper first appeared.

The implication then and now is that content processing systems are more alike than different. The use of similar methods means that the differences among some systems are essentially cosmetic. There is a flub in the paper. I am confident that you, gentle reader, will spot it easily.

Now to the “made simple” write up. The article explains quite clearly the what and why of 10 widely used methods. The article also identifies some of the weaknesses of each method. If there is a weakness, do you think it can be exploited? This is a question worth considering, I suggest.

Example: What is a weakness of k-means?

Two key weaknesses of k-means are its sensitivity to outliers, and its sensitivity to the initial choice of centroids. One final thing to keep in mind is k-means is designed to operate on continuous data — you’ll need to do some tricks to get it to work on discrete data.

Note the key word “tricks.” When one deals with math, the way to solve problems is to be clever. It follows that some of the differences among content processing systems boil down to the cleverness of the folks working on a particular implementation. Think back to your high school math class. Was there a student who just spit out an answer and then said, “It’s obvious.” Well, that’s the type of cleverness I am referencing.
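The sensitivity to initial centroids quoted above is easy to reproduce. What follows is a minimal sketch, assuming one-dimensional data and explicitly chosen starting centroids; the six points and both starting positions are hypothetical, and this is plain Lloyd's algorithm, not the implementation in any particular content processing system. The same data, started from two different positions, settles on two different answers.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(points: Sequence[float], init: Sequence[float],
              max_iter: int = 100) -> Tuple[List[float], float]:
    """Lloyd's k-means on 1-D data, starting from explicit centroids.

    Returns (final centroids, sum of squared errors)."""
    centroids = list(init)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (ties go to the lower-indexed centroid).
        clusters: List[List[float]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # nothing moved, so the algorithm has converged
            break
        centroids = new
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse


# Hypothetical data: three tight pairs of points.
points = [0, 1, 20, 21, 30, 31]
print(kmeans_1d(points, [0, 20, 30]))  # ([0.5, 20.5, 30.5], 1.5)
print(kmeans_1d(points, [0, 1, 20]))   # ([0.0, 1.0, 25.5], 101.0)
```

In the second run two centroids split the pair at 0 and 1, and the third is stuck averaging two distant pairs. No reassignment improves things, so the algorithm stops at a much worse answer. Restarting from several initializations, or seeding centroids more carefully, is one of the “tricks” implementers reach for.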

The author does not dig too deeply into PageRank, but it too has some flaws. An easy way to identify one is to attend a search engine optimization conference. One flaw turbocharges these events.
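The post leaves the flaw unnamed, and this sketch does not claim to identify it. It shows one widely discussed weakness of PageRank: a page's score rises when other pages link to it, even if those pages exist only to supply links. This is a minimal power-iteration PageRank under stated assumptions: damping of 0.85, and rank from pages with no outlinks spread evenly (one common convention). The toy web and the “farm” pages are hypothetical.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        # Each page passes the damped share of its rank equally to the pages it links to.
        for src, targets in links.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical toy web: three pages in a ring plus a "spam" page that links into it.
web = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": ["a"]}
# The same web after the spammer adds five throwaway pages that all link to "spam".
farmed = dict(web, **{f"farm{i}": ["spam"] for i in range(5)})

print(round(pagerank(web)["spam"], 4))     # about 0.0375
print(round(pagerank(farmed)["spam"], 4))  # about 0.0875
```

The five throwaway pages carry no content, yet they more than double the spam page's score. Real search engines layer many defenses on top of the basic formula. This sketch shows only the unprotected core.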

My relative Vladimir Arnold, whom some of the Arnolds called Vlad the Annoyer, would have liked the paper. So do I. The write up is a keeper. Plus there is a video, perfect for the folks whose attention span is better than a goldfish’s.

Stephen E Arnold, May 18, 2015

Don’t Fear the AI

May 14, 2015

Will intelligent machines bring about the downfall of the human race? Unlikely, says The Technium, in “Why I Don’t Worry About a Super AI.” The blogger details four specific reasons he or she is unafraid: First, AI does not seem to adhere to Moore’s law, so no Terminators anytime soon. Also, we do have the power to reprogram any uppity AI that does crop up and (reason three) it is unlikely that an AI would develop the initiative to reprogram itself, anyway. Finally, we should see managing this technology as an opportunity to clarify our own principles, instead of a path to dystopia. The blog opines:

“AI gives us the opportunity to elevate and sharpen our own ethics and morality and ambition. We smugly believe humans – all humans – have superior behavior to machines, but human ethics are sloppy, slippery, inconsistent, and often suspect. […] The clear ethical programing AIs need to follow will force us to bear down and be much clearer about why we believe what we think we believe. Under what conditions do we want to be relativistic? What specific contexts do we want the law to be contextual? Human morality is a mess of conundrums that could benefit from scrutiny, less superstition, and more evidence-based thinking. We’ll quickly find that trying to train AIs to be more humanistic will challenge us to be more humanistic. In the way that children can better their parents, the challenge of rearing AIs is an opportunity – not a horror. We should welcome it.”

Machine learning as a catalyst for philosophical progress—interesting perspective. See the post for more details behind this writer’s reasoning. Is he or she being realistic, or naïve?

Cynthia Murrell, May 14, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The Philosophy of Semantic Search

May 13, 2015

The LunaMetrics article “Taking Advantage of Semantic Search NOW: Understanding Semiotics, Signs, & Schema” delves into semantics on a philosophical and linguistic level as well as from a business perspective. The author traces the emergence of semantic search, beginning with Ray Kurzweil’s interest in machines that learn meaning rather than relying on simpler keyword search. To help readers fully grasp this concept, the author provides a brief refresher on Saussure’s semiotics.

“a Sign is comprised of a signifier, or the name of a thing, and the signified, what that thing represents… Say you sell iPad accessories. “iPad case” is your signifier, or keyword in search marketing speak. We’ve abused the signifier to the utmost over the years, stuffing it onto pages, calculating its density with text tools, jamming it into title tags, in part because we were speaking to robot who read at a 3-year-old level.”

To create meaning, the article argues, we must go beyond simply adding a price tag and a picture if we want to create a sign. It suggests the need for schema: some indication of whom and what the thing is for. The author, Michael Bartholow, has a background in linguistics, marketing, and search engine optimization. His article ends by asking when linguists, philosophers, and humanists will be invited into the conversation with businesses, a question that may make him a true visionary in a field populated by data engineers with tunnel vision.
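The article's own markup is not reproduced here. As a minimal sketch, assuming the schema.org vocabulary serialized as JSON-LD (one common way to attach schema to a page), the bare “iPad case” signifier can be wrapped with what the thing is for and whom it is for. Every value below is hypothetical, including the product details, the example.com image URL, and the audience.

```python
import json

# Hypothetical product data for the iPad-case example quoted above.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "iPad case",  # the bare signifier, i.e. the keyword
    "image": "https://example.com/img/ipad-case.jpg",
    "description": "Protective folio case for iPad.",
    # What the thing is for: the product it accessorizes.
    "isAccessoryOrSparePartFor": {"@type": "Product", "name": "iPad"},
    # Whom the thing is for: an intended audience.
    "audience": {"@type": "Audience", "audienceType": "students"},
    "offers": {"@type": "Offer", "price": "29.99", "priceCurrency": "USD"},
}

# Wrap the JSON-LD in the script tag that pages conventionally use to embed it.
snippet = '<script type="application/ld+json">\n' + json.dumps(product, indent=2) + "\n</script>"
print(snippet)
```

The keyword is still there in `name`, but the surrounding properties tell a machine reader what the string signifies. That is the step the article argues keyword stuffing never took.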

Chelsea Kerwin, May 13, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Cloud Adoption Is Like a Lead Balloon

May 8, 2015

According to Datamation’s article “Deflating The Cloud BI Hype Balloon,” the mad, widespread rush to adopt enterprise cloud computing is deflating like a leaking helium balloon. While the metaphor fits any flash-in-the-pan fad, it should also be remembered that Facebook and email were once considered passing trends. Skeptics predicted that once their “newness” wore off they would sink faster than a lead balloon, to continue the balloon metaphor. If you are a fan of MythBusters, however, you know that a lead balloon can, in fact, float.

The point, for the article and for us, is that like the MythBusters’ lead balloon, cloud adoption can be troublesome, but it will float in the end. Datamation points out that the urgency for immediate adoption has faded as security risks and the difficulty of integrating with proprietary systems become apparent.

Howard Dresner wrote a report called “Cloud Computing And Business Intelligence” that explains his observations on enterprise cloud demand. Dresner says that making legacy systems adaptable to the cloud will be a continuous challenge, but he stresses that some data does not belong in the cloud, while other data needs to be floating about. The challenge is building the perfect hybrid system.

He makes an observation that fits the lead balloon metaphor:

“Dresner, who was a Gartner fellow and has 34 years in the IT industry, takes a longer-term perspective about the integration challenges. ‘We have to solve the same problems we solved on premise,’ he explains, and then adds that these problems ‘won’t persist forever in the enterprise, but they will take a while to solve.’”

In other words, the balloon takes time to assemble, but it will keep floating until the next big thing replaces the cloud. Maybe it will be direct data downloads into the head.

Whitney Grace, May 8, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

How do You use Your Email?

April 28, 2015

Email is still a relatively new concept in the grander scheme of technology, having entered wide use only in the 1990s. As with any human activity, people want to learn more about the trends and habits surrounding it. Popular Science has an article called “Here’s What Scientists Learned In The Largest Systematic Study Of Email Habits” with a self-explanatory title. Even though email has been in wide use for over twenty years, no one has been quite sure how people use it.

So someone decided to study email usage:

“…researchers from Yahoo Labs looked at emails of two million participants who sent more than 16 billion messages over the course of several months–by far the largest email study ever conducted. They tracked the identities of the senders and the recipients, the subject lines, when the emails were sent, the lengths of the emails, and the number of attachments. They also looked at the ages of the participants and the devices from which the emails were sent or checked.”

The results were reportedly so predictable that an algorithm could have forecast them. Usage correlates strongly with age and gender: younger users write short, quick responses, and men are also brief in their emails. People also respond more quickly during work hours, and the more emails they receive, the less likely they are to reply. People might already be familiar with these trends, but the data is brand new to data scientists. The article predicts that developers will take the data and design better email platforms.

How about creating an email platform that merges a to-do list with email, so people no longer have to piece together their schedules and tasks from the inbox?

Whitney Grace, April 28, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

EnterpriseJungle Launches SAP-Based Enterprise Search System

April 27, 2015

A new enterprise search startup is leveraging the SAP HANA Cloud Platform, we learn from “EnterpriseJungle Tames Enterprise Search” at SAP’s News Center. The company states that its goal is to make collaboration easier and more effective with a feature it calls “deep people search.” Writer Susan Galer cites EnterpriseJungle Principal James Sinclair when she tells us:

“Using advanced algorithms to analyze data from internal and external sources, including SAP Jam, SuccessFactors, wikis, and LinkedIn, the applications help companies understand the make-up of its workforce and connect people quickly….

Who Can Help Me is a pre-populated search tool allowing employees to find internal experts by skills, location, project requirements and other criteria which companies can also configure, if needed. The Enterprise Q&A tool lets employees enter any text into the search bar, and find experts internally or outside company walls. Most companies use the prepackaged EnterpriseJungle solutions as is for Human Resources (HR), recruitment, sales and other departments. However, Sinclair said companies can easily modify search queries to meet any organization’s unique needs.”

EnterpriseJungle users manage their company’s data through SAP’s Lumira dashboard. Galer shares Sinclair’s example of one company in Germany, which used EnterpriseJungle to match employees to appropriate new positions when it made a whopping 3,000 jobs obsolete. Though the software is now designed primarily for HR and data-management departments, Sinclair hopes the collaboration tool will permeate the entire enterprise.

Cynthia Murrell, April 27, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Contextual Search Recommended for Sales Pros

April 14, 2015

Sales-productivity pro Doug Winter penned “Traditional Search is Dying as Sales Organizations Make Way for ‘Context’” for Entrepreneur. He explains how companies like Google, Apple, and Yahoo have long been developing “contextual” search, which simply means using data gathered about the user to deliver more relevant answers to queries instead of relying on keywords alone. Consumers have been benefiting from this approach online for years now, and Winter says it’s time for salespeople to apply contextual search to their internal content. He writes:

“The key to how contextual search delivers on its magic is the fact that the most advanced ECM systems are, like Google’s search algorithms, much more knowledgeable about the person searching than we care to admit. What you as a sales rep see is tailored to you because when you sign in, the system knows what types of products you sell and in what geographic areas.”

“Tie in customer data from your customer relationship management (CRM) system and now the ECM knows what buying stage and industry your prospect is in. Leveraging that data, you as a rep shouldn’t then see a universe of content you have to manually sort through. Instead, according to Ring DNA, you should see just a handful of useful pieces you otherwise would have spent 30 hours a month searching for on your own.”

As long as the chosen algorithm succeeds in catching what a salesperson needs in its net, this shift could be a terrific time saver. Sales departments should do their research, however, before investing in any contextual-search tools.

Cynthia Murrell, April 14, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Predicting Plot Holes Isn’t So Easy

April 10, 2015

According to The Paris Review’s blog post “Man In Hole II: Man In Deeper Hole,” Matthew Jockers created an analysis tool to predict archetypal book plots:

“A rough primer: Jockers uses a tool called ‘sentiment analysis’ to gauge ‘the relationship between sentiment and plot shape in fiction’; algorithms assign every word in a novel a positive or negative emotional value, and in compiling these values he’s able to graph the shifts in a story’s narrative. A lot of negative words mean something bad is happening, a lot of positive words mean something good is happening. Ultimately, he derived six archetypal plot shapes.”

Academics, however, found some problems with Jockers’s tool, raising questions such as whether every word can be assigned an emotional value and whether all plots really take a few basic forms. The problem is that words are as nuanced as human emotion, perspectives change in an instant, and sentiments are subjective. How would the tool rate sarcasm?
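The quoted primer describes the basic mechanism: score each word, then track the running balance across the text. Here is a minimal sketch of that idea, assuming a tiny hypothetical lexicon and simple averaging over equal-sized chunks. It is not Jockers’s actual tool, which relies on much larger lexicons and more elaborate smoothing.

```python
import re
from typing import Dict, List

# Toy lexicon (hypothetical); real tools use lexicons with thousands of scored words.
LEXICON: Dict[str, int] = {
    "happy": 1, "joy": 1, "rescue": 1, "love": 1, "win": 1,
    "fell": -1, "hole": -1, "lost": -1, "fear": -1, "dead": -1,
}


def sentiment_arc(text: str, chunks: int = 4) -> List[float]:
    """Split the text's words into `chunks` roughly equal runs and average word sentiment in each."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON.get(w, 0) for w in words]  # unknown words count as neutral
    n = len(scores)
    arc = []
    for i in range(chunks):
        window = scores[i * n // chunks:(i + 1) * n // chunks]
        arc.append(sum(window) / len(window) if window else 0.0)
    return arc


# A hypothetical "man in hole" story: good, then bad, then good again.
story = ("He was happy. Then he fell into a hole and was lost in fear. "
         "A rescue came, and joy returned.")
print(sentiment_arc(story))  # [0.2, -0.4, -0.4, 0.4]
```

The sketch also makes the academics’ objection concrete. A sarcastic “Oh, what joy” scores as positive, because a word-by-word lexicon cannot see tone.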

All stories have been broken down into seven basic plots, so why should software not be able to do the same for book plots? Jockers has already identified six basic book plots, and some are curiously optimistic about his analysis tool. It does raise the question: will it stifle authors’ creativity, or will it lead English professors to derive even more subjective meaning from Ulysses?

Whitney Grace, April 10, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Useful Probability Lesson in Monte Carlo Simulations

April 6, 2015

It is no surprise that probability blogger Count Bayesie, also known as data scientist Will Kurt, likes to play with random data samples like those generated in Monte Carlo simulations. He lets us in on the fun in this useful summary, “6 Neat Tricks with Monte Carlo Simulations.” He begins:

“If there is one trick you should know about probability, it’s how to write a Monte Carlo simulation. If you can program, even just a little, you can write a Monte Carlo simulation. Most of my work is in either R or Python, these examples will all be in R since out-of-the-box R has more tools to run simulations. The basics of a Monte Carlo simulation are simply to model your problem, and then randomly simulate it until you get an answer. The best way to explain is to just run through a bunch of examples, so let’s go!”

And run through his six examples he does, starting with the ever-popular basic integration. Other tricks include approximating the binomial distribution, approximating Pi, finding p-values, creating games of chance, and, of course, predicting the stock market. The examples include code snippets and graphs. Kurt encourages readers to go further:

“By now it should be clear that a few lines of R can create extremely good estimates to a whole host of problems in probability and statistics. There comes a point in problems involving probability where we are often left no other choice than to use a Monte Carlo simulation. This is just the beginning of the incredible things that can be done with some extraordinarily simple tools. It also turns out that Monte Carlo simulations are at the heart of many forms of Bayesian inference.”

See the write-up for the juicy details of the six examples. This fun and informative lesson is worth checking out.
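Kurt’s examples are written in R. As a rough Python translation of three of the tricks he lists (not his code), this sketch follows his recipe: model the problem, then simulate it randomly until an answer emerges. The function names, sample sizes, and seed are illustrative assumptions.

```python
import math
import random


def mc_integrate(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] as (b - a) times the mean of f at uniform random points."""
    return (b - a) * sum(f(rng.uniform(a, b)) for _ in range(n)) / n


def mc_pi(n, rng):
    """Estimate Pi: the share of random points in the unit square inside the quarter circle approaches Pi/4."""
    inside = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n


def mc_binomial_tail(trials, p, k, n, rng):
    """Estimate P(X >= k) for X ~ Binomial(trials, p) by simulating n runs of coin flips."""
    hits = 0
    for _ in range(n):
        successes = sum(1 for _ in range(trials) if rng.random() < p)
        hits += successes >= k
    return hits / n


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print(mc_integrate(lambda x: x * x, 0.0, 1.0, 100_000, rng), 1 / 3)
    print(mc_pi(100_000, rng), math.pi)
    # Exact tail for comparison: P(at least 8 heads in 10 fair flips).
    exact = sum(math.comb(10, j) for j in range(8, 11)) / 2 ** 10
    print(mc_binomial_tail(10, 0.5, 8, 50_000, rng), exact)
```

Each estimate prints next to its exact value. The error shrinks roughly with the square root of the number of samples, so quadrupling the samples only about halves the error.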

Cynthia Murrell, April 6, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com
