
Make Mine Mobile Search

May 21, 2015

It was only a matter of time, but Google searches on mobile phones and tablets have finally pulled ahead of desktop searches, says The Register in “Peak PC: ‘Most’ Google Web Searches ‘Come From Mobiles’ In US.” Google AdWords product management representative Jerry Dischler said that more Google searches took place on mobile devices than on desktops in ten countries, including the US and Japan. Google owns 92.22 percent of the mobile search market and 65.73 percent of desktop searches. What do you think Google wants to do next? They want to sell more mobile apps!

The article says that Google has not shared any of the data about the ten countries except for the US and Japan and the search differential between platforms. Google, however, is trying to get more people to buy more ads, and the search engine giant is making the technology and tools available:

“Google has also introduced new tools for marketers to track their advertising performance to see where advertising clicks are coming from, and to try out new ways to draw people in. The end result, Google hopes, is to bring up the value of its mobile advertising business that’s now in the majority, allegedly.”

Mobile ads are apparently cheaper than desktop ads, so Google will get lower revenues. What will probably happen is that as more users transition to making purchases via phones and tablets, ad revenue will increase via mobile platforms.

Whitney Grace, May 21, 2015
Stephen E Arnold, Publisher of CyberOSINT at

Sinequa and Systran Partner on Cyber Defense

May 20, 2015

Enterprise search firm Sinequa and translation tech outfit Systran are teaming up on security software. “Systran and Sinequa Combine in the Field of Cyber Defense,” announces (The article is in French, but Google Translate is our friend.) The write-up explains:

“Sinequa and Systran have indeed decided to cooperate to develop a solution for detecting and processing of critical information in multiple languages and able to provide investigators with a panoramic view of a given subject. On one side Systran provides safe instant translation in over 45 languages, and the other Sinequa provides big data processing platform to analyze, categorize and retrieve relevant information in real time. The integration of the two solutions should thus facilitate the timely processing of structured and unstructured data from heterogeneous sources, internal and external (websites, audio transcripts, social media, etc.) and provide a clear and comprehensive view of a subject for investigators.”

Launched in 2002, Sinequa is a leader in the Enterprise Search field; the company boasts strong business analytics, but also emphasizes user-friendliness. Based in Paris, the firm maintains offices in Frankfurt, London, and New York City. Systran has a long history of providing innovative translation services to defense and security organizations around the world. The company’s headquarters are in Seoul, with other offices located in Daejeon, South Korea; Paris; and San Diego.

Cynthia Murrell, May 20, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Searching Bureaucracy

May 19, 2015

The rise of automatic document conversion could render vast amounts of data collected by government agencies useful. In their article, “Solving the Search Problem for Large-Scale Repositories,” GCN explains why this technology is a game-changer, and offers tips for a smooth conversion. Writer Mike Gross tells us:

“Traditional conversion methods require significant manual effort and are economically unfeasible, especially when agencies are often precluded from using offshore labor. Additionally, government conversion efforts can be restricted by document security and the number of people that require access. However, there have been recent advances in the technology that allow for fully automated, secure and scalable document conversion processes that make economically feasible what was considered impractical just a few years ago. In one particular case the cost of the automated process was less than one-tenth of the traditional process. Making content searchable, allowing for content to be reformatted and reorganized as needed, gives agencies tremendous opportunities to automate and improve processes, while at the same time improving workflow and providing previously unavailable metrics.”

The write-up describes several factors that could foil an attempt to implement such a system, and I suggest interested parties check out the whole article. Some examples include security and scalability, of course, as well as specialized format and delivery requirements, and non-textual elements. Gross also lists criteria to look for in a vendor; for instance, assess how well their products play with related software, like scanning and optical character recognition tools, and whether they will be able to keep up with the volumes of data at hand. If government agencies approach these automation advances with care and wisdom, instead of reflexively choosing the lowest bidder, our bureaucracies’ data systems may actually become efficient. (Hey, one can dream.)

Cynthia Murrell, May 19, 2015

Stephen E Arnold, Publisher of CyberOSINT at


Hybrid Is Essential to SharePoint 2016

May 19, 2015

It looks like Microsoft is planning to bring the cloud to its SharePoint Server 2016 users at critical points, rather than forcing them to go “all cloud.” This technique allows Microsoft to continue with the cloud-based services that it has invested in, while improving the on-premises experience that users are demanding. ZDNet covers the whole story in its article, “Microsoft’s SharePoint 2016: What’s Hybrid Got to do With It?”

The article sums up the much talked about hybrid approach:

“Though it will run on top of Windows Server 2012 R2 and/or Windows Server 2016, SharePoint 2016 will include support for what Microsoft calls ‘cloud-accelerated experiences,’ meaning new hybrid scenarios . . . Instead of trying to push all SharePoint users and all SharePoint workloads to the cloud, Microsoft is acknowledging there are some reasons (compliance among them) that not all data can or should be in SharePoint Online. That said, Microsoft wants to enable its SharePoint users to get at their data wherever it’s stored.”

Stephen E. Arnold is a lifelong leader in search and a long-time expert in SharePoint. He keeps managers and users updated on the latest SharePoint news through his Web service. All eyes should stay peeled for continuing developments, as users get closer to seeing a public release of SharePoint Server 2016.

Emily Rae Aldridge, May 19, 2015

Sponsored by, publisher of the CyberOSINT monograph

Open Source Conquers Proprietary Software, Really?

May 19, 2015

Open source is an attractive option for organizations that want to design their own software and save money on proprietary licenses. ZDNet reports that “It’s An Open Source World-78 Percent of Companies Run Open Source Software”, but the adopters do not manage their open source systems very well. Every year Black Duck Software, an open source software logistics and legal solutions provider, and North Bridge, a seed to growth venture capital firm, run the Future of Open Source Survey. Organizations love open source, but:

“Lou Shipley, Black Duck’s CEO, said in a statement, ‘In the results this year, it has become more evident that companies need their management and governance of open source to catch up to their usage. This is critical to reducing potential security, legal, and operational risks while allowing companies to reap the full benefits OSS provides.’”

The widespread adoption is due to people thinking that open source software is easier to scale, has fewer security problems, and is much faster to deploy. Organizations, however, lack a plan to manage open source, an automated code approval process, and an inventory of open source components. Even worse, they are unaware of the security vulnerabilities.

It is great that open source is being recognized as a more viable enterprise solution, but nobody knows how to use it.

Whitney Grace, May 19, 2015
Sponsored by, publisher of the CyberOSINT monograph

Preserves Online Information

May 18, 2015

Today’s information seekers use the Internet the way some of us used reference books growing up. Unlike the paper tomes on our dusty bookshelves, however, websites can change their content without so much as a by-your-leave. Suggestions for preserving online information can be found in “Create Publicly Available Web Page Archives with” at

Writer Martin Brinkmann begins by listing several local options familiar to many of us. There’s Ctrl-s, of course, and assorted screenshot-saving methods. Website archivers like Httrack perform their own crawls and save the results to the user’s local machine. Remotely, automatically creates snapshots of prominent sites, but users cannot control the results. Enter Brinkmann writes:

“is a free service that helps you out. To use it, paste a web address into the form on the services main page and hit submit url afterwards. The service takes two snapshots of that page at that point in time and makes it available publicly. The first takes a static snapshot of the site. You find images, text and other static contents included while dynamic contents and scripts are not. The second snapshot takes a screenshot of the page instead. An option to download the data is provided. Note that this downloads the textual copy of the site only and not the screenshot. A Firefox add-on has been created for the service which may be useful to some of its users. It creates automatic snapshots of every web page that you bookmark in the web browser after installation of the add-on.”

Wow, don’t set and forget that Firefox option! In fact, the article cautions, be mindful of the public availability of every snapshot; Brinkmann reasonably suggests the tool could benefit from a password feature. Still, this could be an option to preserve important (but, for the prudent, impersonal) information found online.

Cynthia Murrell, May 18, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Developing an NLP Semantic Search

May 15, 2015

Can you imagine a natural language processing semantic search engine? It would be a lovely tool to use in your daily routines and would make research a bit easier. If you are working on such a project and are making progress, keep at that startup because this is a lucrative field at the moment. Over at Stack Overflow, an entrepreneurial spirit is trying to develop a “Semantic Search With NLP And Elasticsearch”:

“I am experimenting with Elasticsearch as a search server and my task is to build a “semantic” search functionality. From a short text phrase like “I have a burst pipe” the system should infer that the user is searching for a plumber and return all plumbers indexed in Elasticsearch.

Can that be done directly in a search server like Elasticsearch or do I have to use a natural language processing (NLP) tool like e.g. Maui Indexer. What is the exact terminology for my task at hand, text classification? Though the given text is very short as it is a search phrase.”

Given that this question was asked about three years ago, a lot has been done not only with Elasticsearch, but also NLP.  Search is moving towards a more organic experience, but accuracy is often muddled by different factors.  These include the quality of the technology, classification, taxonomies, ads in results, and even keywords (still!).
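To make the Stack Overflow question concrete, here is a minimal sketch of one way to approach it: classify the phrase first, then hand the search server a filter on the inferred category. It is only an illustration, not the asker's solution or a recommended design. The keyword lists, the `profession` and `description` field names, and the helper functions are all hypothetical; the `term` and `match` query shapes follow the Elasticsearch query DSL, but nothing here talks to a real server.

```python
import re

# Hypothetical keyword lists per category; a real system would likely
# learn these from data instead of hard-coding them.
CATEGORY_KEYWORDS = {
    "plumber": {"pipe", "leak", "drain", "faucet", "toilet"},
    "electrician": {"outlet", "wiring", "fuse", "breaker", "socket"},
}


def classify(phrase):
    """Return the category whose keywords overlap the phrase most, or None."""
    tokens = set(re.findall(r"[a-z]+", phrase.lower()))
    best, best_score = None, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        score = len(tokens & keywords)
        if score > best_score:
            best, best_score = category, score
    return best


def build_query(phrase):
    """Build a query body that filters on the inferred category.

    The field names `profession` and `description` are assumptions
    about how the documents might be indexed.
    """
    category = classify(phrase)
    if category is None:
        # No category inferred: fall back to a plain full-text match.
        return {"query": {"match": {"description": phrase}}}
    return {"query": {"term": {"profession": category}}}


query = build_query("I have a burst pipe")
```

The point of the sketch is the split in responsibilities: the inference step lives outside the search server, and the server only sees an ordinary structured query. Swapping the keyword overlap for a trained text classifier would not change the query side at all.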

NLP semantic search is closer now than it was three years ago, but technology companies would invest a lot of money in a startup that can bridge the gap between natural language and machine learning.

Whitney Grace, May 15, 2015

Sponsored by, publisher of the CyberOSINT monograph

The Latest SharePoint News from Ignite

May 14, 2015

The Ignite conference in Chicago has answered many of the questions that SharePoint users have been curious about for months now. Among them was the question of release timing and features for the latest iteration of SharePoint. CMS Wire gives a rundown in their article, “What’s Up With SharePoint? #MSIgnite.”

The article sums up the biggest news:

“Microsoft will continue to enhance the core offerings in the on-premises edition. It will also continue to develop SharePoint Online and update it as quickly as the updates are available. A preview version of SharePoint 2016 will be made available later this summer, with a beta version expected by the end of the year . . . In an afternoon session entitled Evolution of SharePoint Overview and Roadmap, the duo gave a rough outline of Microsoft’s plans, albeit without precise delivery dates.”

Having had to push back delivery dates once already, Microsoft is likely hesitant to announce anything solid until development is final. As far as qualities for the new version, Microsoft is focusing on: user experience, extensibility, and SharePoint management. The inclusion of user experience should be a welcome change for many. To stay in touch with developments as they become available, keep an eye on, and particularly his feed devoted to SharePoint. Stephen E. Arnold has made a lifelong career out of all things search, and he has a knack for distilling down the “need to know” facts to keep an organization on track.

Emily Rae Aldridge, May 14, 2015

Sponsored by, publisher of the CyberOSINT monograph

Elasticsearch Transparent about Failed Jepsen Tests

May 11, 2015

The article on Aphyr titled “Call Me Maybe: Elasticsearch 1.5.0” demonstrates the ongoing tendency for Elasticsearch to lose data during network partitions. The author goes through several scenarios and finds that users can lose documents if nodes crash, a primary pauses, or a network partitions into two intersecting components or into two discrete components. The article explains:

“My recommendations for Elasticsearch users are unchanged: store your data in a database with better safety guarantees, and continuously upsert every document from that database into Elasticsearch. If your search engine is missing a few documents for a day, it’s not a big deal; they’ll be reinserted on the next run and appear in subsequent searches. Not using Elasticsearch as a system of record also insulates you from having to worry about ES downtime during elections.”
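The recommendation quoted above can be sketched in a few lines. This is a hedged illustration only: an in-memory SQLite database plays the role of the safer system of record, and a plain Python dict stands in for the search index, so the table name, the `resync` function, and the document shape are all assumptions and no Elasticsearch client is involved.

```python
import sqlite3


def resync(conn, index):
    """Upsert every row from the system of record into the search index.

    `index` is a plain dict standing in for a search index (hypothetical).
    Writing by document id makes each run idempotent: existing entries are
    overwritten and missing ones are reinserted.
    """
    count = 0
    for doc_id, body in conn.execute("SELECT id, body FROM documents"):
        index[doc_id] = {"body": body}  # insert or overwrite by id
        count += 1
    return count


# System of record: an in-memory SQLite database (an assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO documents (id, body) VALUES (?, ?)",
    [(1, "first"), (2, "second"), (3, "third")],
)
conn.commit()

search_index = {}
resync(conn, search_index)

# Simulate the search engine losing a document during a partition.
del search_index[2]

# The next scheduled run restores the lost document from the database.
resync(conn, search_index)
```

Because every run rewrites every document by id, a document lost from the index is missing only until the next pass, which is the trade-off the quoted article describes.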

The article praises Elasticsearch for their internal approach to documenting the problems, and especially the page they opened in September going into detail on resiliency. The page clarifies the question among users as to what it meant that the ticket closed. The page states pretty clearly that ES failed their Jepsen tests. The article exhorts other vendors to follow a similar regimen of supplying such information to users.

Chelsea Kerwin, May 11, 2015

Sponsored by, publisher of the CyberOSINT monograph

Cloud Adoption Is Like a Lead Balloon

May 8, 2015

According to Datamation’s article, “Deflating The Cloud BI Hype Balloon,” the mad, widespread adoption of enterprise cloud computing is deflating like a balloon losing its helium. While the metaphor is apt for any flash-in-the-pan fad, it also should be remembered that Facebook and email were once considered passing trends. It could be said that when their “newness” wore off they would sink faster than a lead balloon, if we want to continue with the balloon metaphor. If you are a fan of Mythbusters, however, you know that lead balloons, in fact, do float.

What the article and we are getting at here is that, like the Mythbusters’ lead balloon, cloud adoption can be troublesome, but it will work, or float, in the end. Datamation points out that the urgency for immediate adoption has faded as security risks and difficulties integrating with proprietary systems become apparent.

Howard Dresner wrote a report called “Cloud Computing And Business Intelligence” that explains his observations on enterprise cloud demand. Dresner says that making legacy systems adaptable to the cloud will be a continuous challenge, but he stresses that some data does not belong in the cloud, while some data needs to be floating about. The challenge is making the perfect hybrid system.

He makes the same apt observation about the lead balloon:

“Dresner, who was a Gartner fellow and has 34 years in the IT industry, takes a longer-term perspective about the integration challenges. ‘We have to solve the same problems we solved on premise,’ he explains, and then adds that these problems ‘won’t persist forever in the enterprise, but they will take a while to solve.’”

In other words, it takes time to assemble, but the lead balloon will keep floating around until the next big thing replaces the cloud. Maybe it will be direct data downloads into the head.

Whitney Grace, May 8, 2015
Sponsored by, publisher of the CyberOSINT monograph
