
Neural Networks and Thought Commands

July 22, 2015

If you’ve been waiting for the day you can operate a computer by thinking at it, check out “When Machine Learning Meets the Mind: BBC and Google Get Brainy” at the Inquirer. Reporter Chris Merriman brings our attention to two projects, one about hardware and one about AI, that stand at the intersection of human thought and machine. Neither venture is anywhere near fruition, but a peek at their progress gives us clues about the future.

The internet-streaming platform iPlayer is a service the BBC provides to U.K. residents who wish to catch up on their favorite programmes. In pursuit of improved accessibility, the organization’s researchers are working on a device that allows users to operate the service with their thoughts. The article tells us:

“The electroencephalography wearable that powers the technology requires lucidity of thought, but is surprisingly light. It has a sensor on the forehead, and another in the ear. You can set the headset to respond to intense concentration or meditation as the ‘fire’ button when the cursor is over the option you want.”

Apparently this operation is easier for some subjects than for others, but all users were able to work the device to some degree. Creepy or cool? Perhaps it’s both, but there’s no escaping this technology now.

As for Google’s undertaking, we’ve examined this approach before: the development of artificial neural networks. This is some exciting work for those interested in AI. Merriman writes:

“Meanwhile, a team of Google researchers has been looking more closely at artificial neural networks. In other words, false brains. The team has been training systems to classify images and better recognise speech by bombarding them with input and then adjusting the parameters to get the result they want.

But once equipped with the information, the networks can be flipped the other way and create an impressive interpretation of objects based on learned parameters, such as ‘a screw has twisty bits’ or ‘a fly has six legs’.”

This brain-in-progress still draws some chuckle-worthy and/or disturbing conclusions from images, but it is learning. No one knows what the end result of Google’s neural network research will be, but it’s sure to be significant. On a related note, the article points out that IBM is donating its machine learning platform to Apache Spark. Who knows where the open-source community will take it from here?

Cynthia Murrell, July 22, 2015

Sponsored by, publisher of the CyberOSINT monograph


Short Honk: Open Semantic Search Appliance

July 17, 2015

Several people have asked me about Open Semantic Search. I sent a couple of emails to the professional identified on the DNS record as the contact point. No response yet from our inquiry emails, but this is not unusual. People are so darned busy today.

The Open Semantic Search organization is offering an open semantic search appliance. The appliance is not a box like the much loved Google Search Appliance or the Maxxcat solutions. The appliance is virtual.

The explanation of the data enriching system is located at this link. The resources required are modest and, based on the information I scanned, the open semantic search appliance is a solution to many information access woes.

I will be able to search, explore, and analyze. Give the system a whirl. We will add it to our list of tasks. We assume it will present the same exciting challenges as other Lucene/Solr solutions. The addition of semantics will add a new wrinkle or two.

If you are into semantics and open source, the system may be for you.

Stephen E Arnold, July 17, 2015

Hadoop Rounds Up Open Source Goodies

July 17, 2015

Summer time is here and what better way to celebrate the warm weather and fun in the sun than with some fantastic open source tools.  Okay, so you probably will not take your computer to the beach, but if you have a vacation planned one of these tools might help you complete your work faster so you can get closer to that umbrella and cocktail.  Datamation has a great listicle focused on “Hadoop And Big Data: 60 Top Open Source Tools.”

Hadoop is one of the most widely adopted open source tools for big data solutions.  The Hadoop market is expected to be worth $1 billion by 2020, and IBM has dedicated 3,500 employees to developing Apache Spark, part of the Hadoop ecosystem.

As open source is a huge part of the Hadoop landscape, Datamation’s list provides invaluable information on tools that could mean the difference between a successful project and a failed one.  They could also save some extra cash on the IT budget.

“This area has seen a lot of activity recently, with the launch of many new projects. Many of the most noteworthy projects are managed by the Apache Foundation and are closely related to Hadoop.”

Datamation has maintained this list for a while and updates it from time to time as the industry changes.  The list isn’t ranked from best to worst; rather, the tools are grouped into categories, and a short description explains what each tool does. The categories include: Hadoop-related tools, big data analysis platforms and tools, databases and data warehouses, business intelligence, data mining, big data search, programming languages, query engines, and in-memory technology.  There is a tool for nearly every sort of problem that could come up in a Hadoop environment, so the listicle is definitely worth a glance.

Whitney Grace, July 17, 2015


Microsoft Takes SharePoint Criticism Seriously

July 16, 2015

Organizations are reaching the point where a shift toward mobile productivity and adoption must take place; therefore, their enterprise solution must follow suit. While Office 365 adoption has soared in light of this realization, Microsoft still has work to do in order to give users the experience that they demand from a mobile- and social-heavy platform. ComputerWorld goes into more detail in its article, “Onus on Microsoft as SharePoint and OneDrive Roadmaps Reach Crossroads.”

The article states Microsoft’s current progress and future goals:

“With the advent of SharePoint Server 2016 (public beta expected 4Q 2015, with general availability 2Q 2016), Edwards believes Microsoft is placing renewed focus on file management, content management, sites, and portals. Going forward, Redmond claims it will also continue to develop the hybrid capabilities of SharePoint, recognizing that hybrid deployments are a steady state for many large organizations, and not just a temporary position to enable migration to the cloud.”

Few users chose to adopt the opportunities offered by Office 365 and SharePoint 2013, so Microsoft has to make SharePoint Server 2016 look like a new, enticing offering worthy of being taken seriously. So far, they have done a good job of building up some hype and attention. Stephen E. Arnold is a longtime leader in search, and he has been covering the news surrounding the release. Additionally, his dedicated SharePoint feed makes it easy to catch the latest news, tips, and tricks at a glance.

Emily Rae Aldridge, July 16, 2015


Need Semantic Search: Lucidworks Asserts It Is the Answer by Golly

July 3, 2015

If you read this blog, you know that I comment on semantic technology every month or so. In June I pointed to an article which had been tweeted as “new stuff.” Wrong. Navigate to “Semantic Search Hoohah: Hakia”; you will learn that Hakia is a quiet outfit. Quiet as in no longer on the Web. Maybe gone?

There are other write ups in my free and for fee columns about semantic search. The theme has been consistent. My view is that semantic technology is one component in a modern cybernized system. (To learn about my use of the term cyber, navigate to

I find the promotion of search engine optimization as “semantic” amusing. I find the search service firms’ promotion of their semantic expertise amusing. I find the notion of open source outfits deep in hock to venture capitalists asserting their semantic wizardry amusing.

I don’t know if you are quite as amused as I am. Here’s an easy way to determine your semantic humor score. Navigate to this slideshare link and cruise through the 34-slide deck presented by one of Lucidworks’ search mavens. Lucidworks is a company I have followed since it fired up its jets with Marc Krellenstein on board. Dr. Krellenstein ejected in short order, and the company has consumed many venture dollars with management shifts, repositionings, and the Big Data thing.

We now have Lucidworks in the semantic search sector.

Here’s what I learned from the deck:

  1. The company has a new logo. I think this is the third or fourth.
  2. Search is about technology and language. Without Google’s predictive and personalized routines, words are indeed necessary.
  3. Buzzwords and jargon do not make semantic methods simple. Consider this statement from the deck, “Tokenization plus vector mathematics (TF/IDF) or one of its cousins—“bag of words” – Algorithmic tweaks – enhanced bag of words.” Got that, gentle reader. If not, check out “sausagization.”
  4. Lucidworks offers a “field cache.” Okay, I am not unfamiliar with caching in order to goose performance, which can be an issue with some open source search systems. But Searchdaimon, an open source search system developed in Norway, runs circles around Lucidworks. My team did the benchmark test of major open source systems. Searchdaimon was the speed champ and had other sector leading characteristics as well.
  5. Lucidworks does the ontology thing as well. The tie up of “category nodes” and “evidence nodes” may be one reason the performance goblin noses into the story.
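For readers who want the jargon in item 3 unpacked, here is a minimal sketch of "tokenization plus TF/IDF" over a bag of words. It is not taken from the Lucidworks deck and is not how Lucene/Solr computes scores; the tokenizer, the function names, and the three sample strings are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, split on whitespace."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Term frequency is count / document length; inverse document
    frequency is log(N / number of documents containing the term).
    """
    bags = [Counter(tokenize(d)) for d in docs]  # one bag of words per document
    n_docs = len(bags)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for bag in bags:
        df.update(bag.keys())

    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical sample documents, not drawn from any real index.
docs = ["semantic search is search", "open source search", "semantic jargon"]
weights = tf_idf(docs)
```

In this toy corpus, "search" appears in two of the three documents, so its weight is discounted relative to a term such as "is" that appears in only one. The "algorithmic tweaks" the deck mentions would be variations on this weighting.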

The problem I encountered is that the write up for the slide deck emphasized Fusion as a key component. I have been poking around the “fusion” notion as we put our new study of the Dark Web together. Fusion is a tricky problem and the US government has made fusion a priority. Keep in mind that content is more than text. There are images, videos, geocodes, cryptic tweets in Farsi, and quite a few challenging issues with making content available to a researcher or analyst.

It seems that Lucidworks has cracked a problem which continues to trouble some reasonably sophisticated folks in the content analysis business. Here’s the “evidence” that Lucidworks can do what others cannot:


This diagram shows that after a connector is available, then “pipelines proliferate.” Well, okay.

I thought the goal was to process content objects with low latency, easily, and with semantic value adds. “Lots of stages” and “index pipelines: one way query pipelines: round trip” does not compute for this addled goose.

If the Lucidworks approach makes sense to you go for it. My team and I will stick to here and now tools and open source technology which works without the semantic jargon which is pretty much incidental to the matter. We need to process more than text. CyberOSINT vendors deliver and most use open source search as a utility function. Yep, utility. Not the main event. The failure of semantic search vendors suggests that the buzzword is not the solution to marketing woes. Pop. (That’s a pre fourth of July celebratory ladyfinger.)

Stephen E Arnold, July 3, 2015

Forget Oracle. Think about Vendors of Proprietary Enterprise Search Systems.

June 14, 2015

Database revenue doom looms for Oracle. Who did not know that, Mr. BigTable and Ms. Spark? Navigate to “Oracle Sales Erode as Startups Embrace Souped-Up Free Software.” The write up makes this point:

The impact [use of proprietary software] shows up in Oracle’s sales of new software licenses, which have declined for seven straight quarters compared with the period a year earlier. New licenses made up 25 percent of total revenue in fiscal 2014, down from 28 percent a year earlier — a sign the company is becoming increasingly dependent on revenue from supporting and maintaining products at existing customers and having a harder time finding new business. Oracle reports fiscal fourth-quarter earnings next week. To blunt this, the Redwood City, California-based company is expanding efforts in cloud computing, which will let it sell packaged high-margin services to customers. That may help balance the slowdown in the basic business. It also operates an open-source database called MySQL.

The unarticulated issue is the word “startup.” Research we conducted and which was verified by various third party sources revealed in 2012 that open source software was getting more attention from Fortune 1000 companies. The reason was that these outfits had the resources to deal with the excitement open source software provides in a Blue Apron type package.

If this Bloomberg write up is correct, the startup crowd is stepping away from Microsoft software and other well known brands toward open source. One can raise prices in the Fortune 1000 arena for a short time. Then, as Thomson Reuters- and Reed Elsevier-type companies have learned, the big boys just go a different direction. Thus, the start up and mid sized markets become more and more important to proprietary software vendors.

When the small folks head for the hills, where’s the growth? Price increases? Me too plays? Marketing two steps?

I don’t think so.

Ergo. Trouble ahead for Oracle, but the challenges facing the down market and up market proprietary enterprise search vendors are going to become more severe if Bloomie is on the beam.

Stephen E Arnold, June 14, 2015

Amazon and Elasticsearch

May 29, 2015

If you are curious about the utility of Elastic’s technology, you will find “Indexing Common Crawl Metadata on Amazon EMR Using Cascading and Elasticsearch” a useful article to review. The main idea is that Amazon made Elasticsearch do some circus tricks. The write up explains the approach, provides code snippets, and includes a couple of nifty graphics which help those zany Zonies figure out the implications of the data crunched. Put another way, Elasticsearch did something useful with content in everyone’s favorite magic wand, Hadoop. Why didn’t Amazon use LucidWorks (Really?)? Hmm. Good question.

Stephen E Arnold, May 29, 2015

Peruse Until You Are Really Happy

May 22, 2015

Have you ever needed to quickly locate a file that you just know you made, but were unable to find it on your computer, cloud storage, tablet, smartphone, or company pool drive?  What is even worse is if your search query does not pick up on any of your keywords!  What are you supposed to do then?  VentureBeat might have the answer to your problems as explained in the article, “Peruse Is A New Natural Language Search Tool For Your Dropbox And Box Files.”  Peruse is a search tool that allows users to use their natural flow of talking to find their files and information.

Natural language querying is already a big market for business intelligence software, but it is not as common in file sharing services.  Peruse is a startup with the ability to search Dropbox and Box accounts using a regular question.  If you ask, “Where is the marketing data from last week?” The software will be able to pull the file for you without even opening the file. Right now, Peruse can only find information in spreadsheets, but the company is working on expanding the supported file types.

“The way we index these files is we actually look at them visually — it understands them in a way a person would understand them,” said [co-founder and CEO Luke Gotszling], who is showing off Peruse…”

Peruse’s goal is to change the way people use document search.  Document search has remained pretty consistent since 1995; twenty years later, Gotszling believes it is time for a big change.  Gotszling is right: document search remains the same, while Web search changes every day.

Whitney Grace, May 22, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Make Mine Mobile Search

May 21, 2015

It was only a matter of time, but Google searches on mobile phones and tablets have finally pulled ahead of desktop searches, says The Register in “Peak PC: ‘Most’ Google Web Searches ‘Come From Mobiles’ In US.”  Google AdWords product management representative Jerry Dischler said that more Google searches took place on mobile devices than on desktops in ten countries, including the US and Japan.  Google owns 92.22 percent of the mobile search market and 65.73 percent of desktop searches.  What do you think Google wants to do next?  They want to sell more mobile apps!

The article says that Google has not named any of the ten countries other than the US and Japan, nor has it shared the search differential between platforms.  Google, however, is trying to get more people to buy more ads, and the search engine giant is making the technology and tools available:

“Google has also introduced new tools for marketers to track their advertising performance to see where advertising clicks are coming from, and to try out new ways to draw people in. The end result, Google hopes, is to bring up the value of its mobile advertising business that’s now in the majority, allegedly.”

Mobile ads are apparently cheaper than desktop ads, so Google will get lower revenues.  What will probably happen is that as more users transition to making purchases via phones and tablets, ad revenue will increase via mobile platforms.

Whitney Grace, May 21, 2015

Eric Schmidt On Search Ambition and Attitude at the GOOG

May 20, 2015

The article on Business Insider titled “Google’s Former CEO Reveals The Complicated Search Question He Wants Google To Be Able To Answer” reports on Eric Schmidt’s speech in Berlin, where he mentioned the hurdles Google has yet to overcome. Obviously, Google is an incredibly ambitious company, and should never be satisfied. He spelled out one particular question he would like the search engine to be able to answer:

“Try a query like ‘show me flights under €300 for places where it’s hot in December and I can snorkel,'” Schmidt says. “That’s kind of complicated: Google needs to know about flights under €300; hot destinations in winter; and what places are near the water, with cool fish to see. That’s basically three separate searches that have to be cross-referenced to get to the right answer. Sadly, we can’t solve that for you today. But we’re working on it.”
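The "three separate searches that have to be cross-referenced" idea can be sketched as a set intersection. This is only an illustration of the concept, not a description of how Google does or plans to do it; the function name and the placeholder destination labels are hypothetical.

```python
def cross_reference(*result_sets):
    """Return the items present in every result set, sorted for stable output."""
    if not result_sets:
        return []
    return sorted(set.intersection(*(set(r) for r in result_sets)))


# Placeholder results for each of the three sub-queries (made-up labels).
flights_under_300 = {"destination_a", "destination_b", "destination_c"}
hot_in_december = {"destination_a", "destination_c", "destination_d"}
good_snorkeling = {"destination_a", "destination_d", "destination_e"}

matches = cross_reference(flights_under_300, hot_in_december, good_snorkeling)
```

The intersection itself is the easy part. The hard part Schmidt alludes to is producing three reliable result sets from one natural language query in the first place.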

Schmidt also argued on behalf of Google with regard to the EU investigation into whether Google favors its own results over a fair spread of companies. Schmidt claimed that Google is most interested in simplifying search for users, rather than obliging users to click around. Since Google search is admittedly ad-oriented, Schmidt’s position seems to be at least semi-accurate.

Chelsea Kerwin, May 20, 2015


