
Big Data: Slow Down, Think

July 25, 2015

I read “Contradictions of Big Data.” Few articles I see take a common sense approach to Big Data baloney. (Azure chip consultants bristle at my use of baloney. Too bad.) I liked this article.

The article appeared in my Overflight a day ago even though the write up was posted in March 2015. Big Data does not mean rapid data.

I highlighted this passage:

have been waging an uphill battle against the nonsensical and unsubstantiated idea that more data is better data, but now this view is getting some additional support, and from some surprising corners.

I do not agree. The yap about Big Data has almost overpowered the craziness of search engine optimization’s shouting about semantic search.

The write up points out:

Take it from me [Martyn Jones] , most businesses will not be basing their business strategies on the analysis of a glut of selfies, home videos of cute kittens, or the complete works of William Shakespeare or Dan Brown. Almost all business analysis will continue to be carried out on structured data obtained primarily from internal operational systems and external structured data providers.

The write up points out the silliness of velocity and several other slices of marketing baloney. (Make a sandwich, please.)

I found this paragraph insightful:

I have seen data scientists at work, and the word science doesn’t actually jump out and grab you. It’s difficult to make the connection, just as it is to accurately connect some popular science magazines with fundamental scientific research. If a professional and qualified statistician wants to label themselves a data scientist then I have no issue with that, it’s their problem, but I am not willing to lend credibility to the term ‘data scientist’ when it is merely an interesting job title, with at most a tenuous connection to the actual role, and one that is liberally applied, with the almost customary largesse of IT, to creative code hackers and business-averse dabblers in data.

Harsh words for those who combine an undergraduate degree minor in math with Twitter and come up with data scientist.

Hopefully others will pick up this practical approach to the sliced and processed meat wrapped in plastic and branded Big Data.

Stephen E Arnold, July 25, 2015

Plethora of Image Information

July 24, 2015

Humans are visual creatures and they learn and absorb information better when pictures accompany it. In recent years, the graphic novel medium has gained popularity amongst all demographics. The amount of information a picture can communicate is astounding, but unless it is looked for it can be hard to find. It also cannot be searched by a search engine…or can it? Synaptica is in the process of developing the “OASIS Deep Image Indexing Using Linked Data.”

OASIS is an acronym for Open Annotation Semantic Imaging System, an application that unlocks image content by giving users the ability to examine an image more closely than before and by highlighting data points. OASIS is a linked data application that enables parts of the image to be identified as linked data URIs, which can then be semantically indexed to controlled vocabulary lists. It builds an interactive map of an image with its features and conceptual ideas.

“With OASIS you will be able to pan-and-zoom effortlessly through high definition images and see points of interest highlight dynamically in response to your interaction. Points of interest will be presented along with contextual links to associated images, concepts, documents and external Linked Data resources. Faceted discovery tools allow users to search and browse annotations and concepts and click through to view related images or specific features within an image. OASIS enhances the ability to communicate information with impactful visual + audio + textual complements.”

OASIS is advertised as a discovery and interactive tool that gives users the chance to fully engage with an image.  It can be applied to any field or industry, which might mean the difference between success and failure.  People want to fully immerse themselves in their data or images these days.  Being able to do so on a much richer scale is the future.

Whitney Grace, July 24, 2015

Sponsored by, publisher of the CyberOSINT monograph

What We Know About SharePoint 2016

July 23, 2015

Everyone is vying for a first look at the upcoming SharePoint 2016 release. In reality those details are just now starting to roll in, so little has been known until recently. The first true reveal came from Bill Baer at this spring’s Microsoft Ignite event. CIO distills Baer’s findings down into their article, “SharePoint 2016: What Do We Know?”

The article says:

“The session on SharePoint 2016 was presented by Bill Baer, the head of SharePoint at Microsoft. This was the public’s first opportunity to learn what exactly would be in this version of the product, what sorts of changes and improvements have been made, and other things to expect as we look toward the product’s release and general availability in the first quarter of next year. Here’s what we know after streaming Baer’s full presentation.”

The article goes on to discuss cloud integration, migration, upgrades, and what all of this may point to for the future of SharePoint. In order to stay up to date on the latest news, stay tuned to, in particular the dedicated SharePoint feed. Stephen E. Arnold has made a career out of all things search, and his work on SharePoint gives interested parties a lot of information at a glance.

Emily Rae Aldridge, July 23, 2015



Disable Annoying Windows Web Search

July 23, 2015

In another attempt to imitate Apple, Microsoft allows users to search not only their computer’s hard drive, but also the Web at the same time. This is a direct copy of Apple OS’s Spotlight Search, but unlike Apple, Windows’s increased search parameters are annoying. Windows users can disable this supposedly “helpful” feature and GHacks has the directions to do it: “How To Disable Web Search In Windows 10’s Start Menu.”

Apple’s Spotlight Search does pretty much the same thing, but it sorts results into categories and does not search the entire Web, only Wikipedia, iTunes, and preselected search engines. Microsoft has the tendency to go overboard and that usually equals slow response time. The article’s author explains the decision to turn the feature off:

“I will never use the search for a couple of reasons. First, I don’t need it there as I want local files and settings to be returned exclusively when I run a search on Windows 10. Second, the suggestions are too generic most of the time and third, since a browser is open all the time on my system, I can run a search using it as well without having to add another step to the process.”

The good news is that the Web search feature can be disabled, but the option is not available to all users. Does that surprise you? Microsoft has the tendency to release operating systems without fully fixing all the bugs. Windows 10 appears to be better than prior releases, but little bugs like this make it annoying.

Whitney Grace, July 23, 2015



A Technical Shift in Banking Security

July 23, 2015

Banks may soon transition from asking for your mother’s maiden name to tracking your physical behavior in the name of keeping you (and their assets) safe. IT ProPortal examines “Fraud Prevention: Knowledge-Based Analytics in Steep Decline.” Writer Lara Lackie cites a recent report from the Aite Group that indicates a shift from knowledge-based analytics to behavioral analytics for virtual security checkpoints. Apparently, “behavioral analytics” is basically biometrics without the legal implications. Lackie writes:

“Examples of behavioural analytics/biometrics can include the way someone types, holds their device or otherwise interacts with it. When combined, continuous behavioural analysis, and compiled behavioural biometric data, deliver far more intelligence than traditionally available without interrupting the user’s experience….

Julie Conroy, research director, Aite Group, said in the report, ‘When the biometric is paired with strong device authentication, it is even more difficult to defeat. Many biometric solutions also include liveliness checks, to ensure it’s a human being on the other end.’

“NuData Security’s NuDetect online fraud engine, which uses continuous behavioural analysis and compiled behavioral biometric data, is able to predict fraud as early as 15 days before a fraud attempt is made. The early detection offered by NuDetect provides organisations the time to monitor, understand and prevent fraudulent transactions from taking place.”

The Aite report shows over half the banks surveyed plan to move away from traditional security questions over the next year, and six of the 19 institutions plan to enable mobile-banking biometrics by the end of this year. Proponents of the approach laud behavioral analytics as the height of fraud detection. Are Swype patterns and indicators of “liveliness” covered by privacy rights? That seems like a philosophical question to me.

Cynthia Murrell, July 23, 2015


IBM SAP Versus SAS: A Faux Dust Up

July 22, 2015

Ah, the freebie statistics are like gnats. One or two make no difference when one is eating a chicken leg. Toss in 20,000 or more and the leg eating becomes a chore.

I read an oblique write up called “SAS UK Chief: Envious Rivals, Skills Gap and Analytics in the Cloud.” The topics are interesting because they are mixed together, a fruit salad to go with that picnic chicken.

The write up begins with a statement attributed to an IBM SAP executive along the lines: “SAS could be entirely replaced.” That seems a bit of fortune telling which might not be entirely in line with some SAS users’ plans. IBM, as you may know, is fresh from 13 straight quarters of revenue decline. I interpreted the feisty comment as a signal to IBM management that the much loved SAP division is replete with machismo and doing its bit to increase revenues. There’s nothing like a statistics squabble to pump up the sales spice.

As I understand the write up, that allegedly “put ‘em up, chump” statement caused an SAS executive to flounder. SAS’s problem is that it is still a little chunk of graduate school. SAS faces competition from upstarts like Talend. SAP, on the other hand, is chasing consulting and giant IBM cloud-type things. But the two outfits are old school operations. For proof just ask a graduate student in statistics.

The reality is that both SAP and SAS may be victims of the same market shifts. In order to get either company’s products to deliver a perfect grilled chicken, one has to know about statistics and have resources (money, gentle reader).

Big companies are okay with these requirements. But the buzz in the analytics world is for open source, point and click, ready to run solutions. The outputs of these next generation systems may not meet the standards of the SAPs and the SASs of the world, but the customers don’t care.

These two firms are facing many gnats. Neither is going to have a pleasant meal. The good old days of sunshine, blue skies, and a bug free experience are gone.

Stephen E Arnold, July 22, 2015

Neural Networks and Thought Commands

July 22, 2015

If you’ve been waiting for the day you can operate a computer by thinking at it, check out “When Machine Learning Meets the Mind: BBC and Google Get Brainy” at the Inquirer. Reporter Chris Merriman brings our attention to two projects, one about hardware and one about AI, that stand at the intersection of human thought and machine. Neither venture is anywhere near fruition, but a peek at their progress gives us clues about the future.

The internet-streaming platform iPlayer is a service the BBC provides to U.K. residents who wish to catch up on their favorite programmes. In pursuit of improved accessibility, the organization’s researchers are working on a device that allows users to operate the service with their thoughts. The article tells us:

“The electroencephalography wearable that powers the technology requires lucidity of thought, but is surprisingly light. It has a sensor on the forehead, and another in the ear. You can set the headset to respond to intense concentration or meditation as the ‘fire’ button when the cursor is over the option you want.”

Apparently this operation is easier for some subjects than for others, but all users were able to work the device to some degree. Creepy or cool? Perhaps it’s both, but there’s no escaping this technology now.

As for Google’s undertaking, we’ve examined this approach before: the development of artificial neural networks. This is some exciting work for those interested in AI. Merriman writes:

“Meanwhile, a team of Google researchers has been looking more closely at artificial neural networks. In other words, false brains. The team has been training systems to classify images and better recognise speech by bombarding them with input and then adjusting the parameters to get the result they want.

But once equipped with the information, the networks can be flipped the other way and create an impressive interpretation of objects based on learned parameters, such as ‘a screw has twisty bits’ or ‘a fly has six legs’.”
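The “bombard with input, then adjust the parameters” loop the quote describes can be shown in miniature. What follows is a toy sketch, not Google’s method: one parameter instead of millions, plain gradient descent on squared error, and the function name `train` and the sample data are invented for illustration.

```python
# Toy illustration only: a single-parameter "network" that learns y = 2 * x.
# The samples, learning rate, and step count are assumptions chosen for clarity.

def train(samples, steps=200, learning_rate=0.01):
    """Repeatedly show the model each sample and nudge the weight to cut the error."""
    weight = 0.0  # start knowing nothing
    for _ in range(steps):
        for x, target in samples:
            error = weight * x - target          # how far off the guess is
            weight -= learning_rate * error * x  # adjust the parameter
    return weight

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_weight = train(samples)
print(f"learned weight: {learned_weight:.4f}")
```

Real image and speech networks repeat the same idea across enormous numbers of parameters, which is where the chuckle-worthy conclusions come from.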

This brain-in-progress still draws some chuckle-worthy and/or disturbing conclusions from images, but it is learning. No one knows what the end result of Google’s neural network research will be, but it’s sure to be significant. In a related note, the article points out that IBM is donating its machine learning platform to Apache Spark. Who knows where the open-source community will take it from here?

Cynthia Murrell, July 22, 2015



Big Data Basics: Garbage In, Garbage Out Still a Problem

July 20, 2015

The person writing “Data Integrity: A Sequence of Words Lost in the World of Big Data” appears to be older than 18. I don’t hear too many young wizards nattering about data integrity. The operative concept is that with enough data, the data work out the bumps in the Big Data tapestry. The cloth may have leaves and twigs in it. But when you make the woven object big enough and hang it on a wall in a poorly illuminated chateau, who can tell? Few visitors demand a ladder and a lanthorn to inspect the handiwork.

According to the write up:

The purpose of this post is to highlight the necessity to keep data clean and orderly so that the results of the analysis are reliable and trustworthy – if data integrity is intact, information derived from this data will be trustworthy resulting in actionable information.

Why tackle this topic in a blog for Big Data professionals?

Answer: No one pays much attention. The author saddles up and does the Don Quixote gallop at the Big Data hyperbole windmill.

The article includes a partial list of questions to ask and, keep this in mind, gentle reader, to answer. One example: “Are values outside of acceptable domain values?”
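For the gentle reader who wants to actually answer that question, here is a minimal sketch of a domain check. It is not from the article: the field names, the acceptable values, and the helper `out_of_domain` are all hypothetical, made up to show the idea.

```python
# Hypothetical records and domain rules, invented for illustration only.
ACCEPTABLE_DOMAINS = {
    "country": {"US", "UK", "DE"},  # assumed list of valid codes
    "age": range(0, 121),           # assumed plausible age range
}

def out_of_domain(records, domains):
    """Return (row_index, field, value) for every value outside its domain."""
    problems = []
    for i, record in enumerate(records):
        for field, allowed in domains.items():
            value = record.get(field)  # a missing field counts as a violation
            if value not in allowed:
                problems.append((i, field, value))
    return problems

records = [
    {"country": "US", "age": 34},
    {"country": "XX", "age": 29},   # unknown country code
    {"country": "DE", "age": 212},  # implausible age
]

violations = out_of_domain(records, ACCEPTABLE_DOMAINS)
for row, field, value in violations:
    print(f"row {row}: {field}={value!r} is outside the acceptable domain")
```

The check is trivial. The hard part, as the article suggests, is getting anyone to run it before the tapestry goes on the wall.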

I found this article refreshing. Take a gander.

Stephen E Arnold, July 20, 2015

Hadoop Rounds Up Open Source Goodies

July 17, 2015

Summertime is here and what better way to celebrate the warm weather and fun in the sun than with some fantastic open source tools? Okay, so you probably will not take your computer to the beach, but if you have a vacation planned one of these tools might help you complete your work faster so you can get closer to that umbrella and cocktail. Datamation has a great listicle focused on “Hadoop And Big Data: 60 Top Open Source Tools.”

Hadoop is one of the most widely adopted open source tools for big data solutions. The Hadoop market is expected to be worth $1 billion by 2020 and IBM has dedicated 3,500 employees to develop Apache Spark, part of the Hadoop ecosystem.

As open source is a huge part of the Hadoop landscape, Datamation’s list provides invaluable information on tools that could mean the difference between a successful project and a failed one. They could also save some extra cash on the IT budget.

“This area has seen a lot of activity recently, with the launch of many new projects. Many of the most noteworthy projects are managed by the Apache Foundation and are closely related to Hadoop.”

Datamation has maintained this list for a while and they update it from time to time as the industry changes. The list isn’t ranked from best to worst; rather, the tools are grouped into categories and a short description explains what each tool does. The categories include: Hadoop-related tools, big data analysis platforms and tools, databases and data warehouses, business intelligence, data mining, big data search, programming languages, query engines, and in-memory technology. There is a tool for nearly every sort of problem that could come up in a Hadoop environment, so the listicle is definitely worth a glance.

Whitney Grace, July 17, 2015


How Not to Drive Users Away from a Website

July 15, 2015

Writer and web psychologist Liraz Margalit at the Next Web has some important advice for websites in “The Psychology Behind Web Browsing.” Apparently, paying attention to human behavioral tendencies can help webmasters avoid certain pitfalls that could damage their brands. Imagine that!

The article cites a problem an unspecified news site encountered when it tried to build interest in its videos by making them play automatically when a user navigated to their homepage. I suspect I know who they’re talking about, and I recall thinking at the time, “how rude!” I thought it was just because I didn’t want to be chastised by people near me for suddenly blaring a news video. According to Margalit, though, my problem goes much deeper: It’s an issue of control rooted in pre-history. She writes:

“The first humans had to be constantly on alert for changes in their environment, because unexpected sounds or sights meant only one thing: danger. When we click on a website hoping to read an article and instead are confronted with a loud, bright video, the automatic response is not so different from that of our prehistoric ancestors, walking in the forest and stumbling upon a bear or a saber-toothed hyena.”

This need for safety has morphed into a need for control; we do not like to be startled or lost. When browsing the Web, we want to encounter what we expect to encounter (perhaps not in terms of content, but certainly in terms of format). The name for this is the “expectation factor,” and an abrupt assault on the senses is not the only pitfall to be avoided. Getting lost in an endless scroll can also be disturbing; that’s why those floating menus that follow you as you move down the page were invented. Margalit notes:

“Visitors like to think they are in charge of their actions. When a video plays without visitors initiating any interaction, they feel the opposite. If a visitor feels that a website is trying to ‘sell’ them something, or push them into viewing certain content without permission, they will resist by trying to take back the interaction and intentionally avoid that content.”

And that, of course, is the opposite of what websites want, so giving users the control they expect is a smart business move. Besides, it’s only polite to ask before engaging a visitor’s Adobe Flash or, especially, speakers.

Cynthia Murrell, July 15, 2015

