Cyber Wizards Speak Publishes Exclusive BrightPlanet Interview with William Bushee

April 7, 2015

Cyber OSINT continues to reshape information access. Traditional keyword search has been supplanted by higher value functions. One of the keystones for systems that push “beyond search” is technology patented and commercialized by BrightPlanet.

A search on Google often returns irrelevant or stale results. How can an organization obtain access to current, in-depth information from Web sites and services not comprehensively indexed by Bing, Google, ISeek, or Yandex?

The answer to the question is to turn to the leader in content harvesting, BrightPlanet. The company was one of the first, if not the first, to develop systems and methods for indexing information ignored by Web indexes which follow links. Founded in 2001, BrightPlanet has emerged as a content processing firm able to make accessible structured and unstructured data ignored, skipped, or not indexed by Bing, Google, and Yandex.

In the BrightPlanet seminar open to law enforcement, intelligence, and security professionals, BrightPlanet said the phrase “Deep Web” is catchy but it does not explain what type of information is available to a person with a Web browser. A familiar example is querying a dynamic database, like an airline for its flight schedule. Other types of “Deep Web” content may require the user to register. Once logged into the system, users can query the content available to a registered user. A service like Bitpipe requires registration and a user name and password each time I want to pull a white paper from the Bitpipe system. BrightPlanet can handle both types of indexing tasks and many more. BrightPlanet’s technology is used by governmental agencies, businesses, and service firms to gather information pertinent to people, places, events, and other topics.

In an exclusive interview, William Bushee, the chief executive officer at BrightPlanet, reveals the origins of the BrightPlanet approach. He told Cyber Wizards Speak:

I developed our initial harvest engine. At the time, little work was being done around harvesting. We filed for a number of US Patents applications for our unique systems and methods. We were awarded eight, primarily around the ability to conduct Deep Web harvesting, a term BrightPlanet coined.

The BrightPlanet system is available as a cloud service. Bushee noted:

We have migrated from an on-site license model to a SaaS [software as a service] model. However, the biggest change came after realizing we could not put our customers in charge of conducting their own harvests. We thought we could build the tools and train the customers, but it just didn’t work well at all. We now harvest content on our customers’ behalf for virtually all projects and it has made a huge difference in data quality. And, as I mentioned, we provide supporting engineering and technical services to our clients as required. Underneath, however, we are the same sharply focused, customer centric, technology operation.

The company also offers data as a service. Bushee explained:

We’ve seen many of our customers use our Data-as-a-Service model to increase revenue and customer share by adding new datasets to their current products and service offerings. These additional datasets develop new revenue streams for our customers and allow them to stay competitive maintaining existing customers and gaining new ones altogether. Our Data-as-a-Service offering saves time and money because our customers no longer have to invest development hours into maintaining data harvesting and collection projects internally. Instead, they can access our harvesting technology completely as a service.

The company has accelerated its growth through a partnering program. Bushee stated:

We have partnered with K2 Intelligence to offer a full end-to-end service to financial institutions, combining our harvest and enrichment services with additional analytic engines and K2’s existing team of analysts. Our product offering will be a service monitoring various Deep Web and Dark Web content enriched with other internal data to provide a complete early warning system for institutions.

BrightPlanet has emerged as an excellent resource to specialized content services. In addition to providing a client-defined collection of information, the firm can provide custom-tailored solutions to special content needs involving the Deep Web and specialized content services. The company has an excellent reputation among law enforcement, intelligence, and security professionals. The BrightPlanet technologies can generate a stream of real-time content to individuals, work groups, or other automated systems.

BrightPlanet has offices in Washington, DC, and can be contacted via the BrightPlanet Web site.

The complete interview is available at the Cyber Wizards Speak web site.

Stephen E Arnold, April 7, 2015

Useful Probability Lesson in Monte Carlo Simulations

April 6, 2015

It is no surprise that probability blogger Count Bayesie, also known as data scientist Will Kurt, likes to play with random data samples like those generated in Monte Carlo simulations. He lets us in on the fun in this useful summary, “6 Neat Tricks with Monte Carlo Simulations.” He begins:

“If there is one trick you should know about probability, it’s how to write a Monte Carlo simulation. If you can program, even just a little, you can write a Monte Carlo simulation. Most of my work is in either R or Python, these examples will all be in R since out-of-the-box R has more tools to run simulations. The basics of a Monte Carlo simulation are simply to model your problem, and then randomly simulate it until you get an answer. The best way to explain is to just run through a bunch of examples, so let’s go!”

And run through his six examples he does, starting with the ever-popular basic integration. Other tricks include approximating the binomial distribution, approximating Pi, finding p-values, creating games of chance, and, of course, predicting the stock market. The examples include code snippets and graphs. Kurt encourages readers to go further:

“By now it should be clear that a few lines of R can create extremely good estimates to a whole host of problems in probability and statistics. There comes a point in problems involving probability where we are often left no other choice than to use a Monte Carlo simulation. This is just the beginning of the incredible things that can be done with some extraordinarily simple tools. It also turns out that Monte Carlo simulations are at the heart of many forms of Bayesian inference.”

See the write-up for the juicy details of the six examples. This fun and informative lesson is worth checking out.
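
To give a flavor of the "model your problem, then randomly simulate it" idea, here is a minimal sketch of one of the tricks mentioned above, approximating Pi. It is not Kurt's code: his examples are in R, while this one uses Python, and the names `estimate_pi`, `num_samples`, and `seed` are made up for illustration. The model is a quarter circle of radius 1 inside the unit square; the fraction of random points that land inside it should approach Pi/4.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate Pi by sampling random points in the unit square.

    Hypothetical helper for illustration; a fixed seed makes runs repeatable.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # The point falls inside the quarter circle when x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area is Pi/4, so scale the hit fraction by 4.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(n, estimate_pi(n))
```

More samples generally tighten the estimate, but slowly: the error shrinks roughly with the square root of the sample count.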

Cynthia Murrell, April 6, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Apache Sparking Big Data

April 3, 2015

Apache Spark is an open source cluster computing framework that rivals MapReduce. VentureBeat says that people did not pay much attention to Apache Spark when it was first invented at the University of California’s AMPLab in 2011. The article, “How An Early Bet On Apache Spark Paid Off Big,” reports that big data open source supporters are adopting Apache Spark because of its superior capabilities.

People with big data plans want systems that process real-time information at a fast pace, and they want a whole lot of it done at once. MapReduce can do this, but it was not designed for it. It is all right for batch processing, but it is slow and much too complex to be a viable solution.

“When we saw Spark in action at the AMPLab, it was architecturally everything we hoped it would be: distributed, in-memory data processing speed at scale. We recognized we’d have to fill in holes and make it commercially viable for mainstream analytics use cases that demand fast time-to-insight on hordes of data. By partnering with AMPLab, we dug in, prototyped the solution, and added the second pillar needed for next-generation data analytics, a simple to use front-end application.”

ClearStory Data was built using Apache Spark to access data quickly, deliver key insights, and make the UI user friendly. People who use Apache Spark want information from a variety of sources immediately so they can put it to profitable use. Apache Spark might ignite the fire for the next wave of data analytics for big data.

Whitney Grace, April 3, 2015

AI Technology Poised to Spread Far and Wide

April 3, 2015

Artificial intelligence is having a moment; the second half of last year saw about half a billion dollars invested in the AI industry. Wired asks and answers, “The AI Resurgence: Why Now?” Writer Babak Hodjat observes that advances in hardware and cloud services have allowed more contenders to afford to enter the arena. Open source tools like Hadoop also help. Then there’s public perception; with the proliferation of Siri and her ilk, people are more comfortable with the whole concept of AI (Steve Wozniak aside, apparently). It seems to help that these natural-language personal assistants have a sense of humor.  Hodjat continues:

“But there’s more substance to this resurgence than the impression of intelligence that Siri’s jocularity gives its users. The recent advances in Machine Learning are truly groundbreaking. Artificial Neural Networks (deep learning computer systems that mimic the human brain) are now scaled to several tens of hidden layer nodes, increasing their abstraction power. They can be trained on tens of thousands of cores, speeding up the process of developing generalizing learning models. Other mainstream classification approaches, such as Random Forest classification, have been scaled to run on very large numbers of compute nodes, enabling the tackling of ever more ambitious problems on larger and larger data-sets (e.g.,”

The investment boom has produced a surge of start-ups offering AI solutions to companies in a wide range of industries. Organizations in fields as diverse as medicine and oil production seem eager to incorporate these tools; it remains to be seen whether the tech is a good investment for every type of enterprise. For his part, Hodjat has high hopes for its use in fraud detection, medical diagnostics, and online commerce. And for ever-improving personal assistants, of course.

Cynthia Murrell, April 3, 2015

EBay Develops Open Source Pulsar for Real Time Data Analysis

April 2, 2015

A new large-scale, real-time analytics platform has been launched in response to one huge company’s huge data needs. VentureBeat reports, “EBay Launches Pulsar, an Open-Source Tool for Quickly Taming Big Data.” EBay has made the code available under an open-source license. It seems traditional batch processing systems, like the one found in the widely used open-source Hadoop, just won’t cut it for eBay. That puts them in good company; Google, Microsoft, Twitter, and LinkedIn have each also created their own stream-processing systems.

Shortly before the launch, eBay released a whitepaper on the project, “Pulsar—Real-time Analytics at Scale.” It describes the what and why behind Pulsar’s design; check it out for the technical details. The whitepaper summarizes itself:

“In this paper we have described the data and processing model for a class of problems related to user behavior analytics in real time. We describe some of the design considerations for Pulsar. Pulsar has been in production in the eBay cloud for over a year. We process hundreds of thousands of events/sec with a steady state loss of less than 0.01%. Our pipeline end to end latency is less than a hundred milliseconds measured at the 95th percentile. We have successfully operated the pipeline over this time at 99.99% availability. Several teams within eBay have successfully built solutions leveraging our platform, solving problems like in-session personalization, advertising, internet marketing, billing, business monitoring and many more.”

For updated information on Pulsar, monitor the project’s official website.

Cynthia Murrell, April 2, 2015

A Little Lucene History

March 26, 2015

Instead of venturing to Wikipedia to learn about Lucene’s history, visit the blog and read the post, “Lucene: The Good Parts.”  After detailing how Doug Cutting created Lucene in 1999, the post describes how searching through SQL in the early 2000s was a huge task.   SQL databases are not the best when it comes to unstructured search, so developers installed Lucene to make SQL document search more reliable.  What is interesting is how much it has been adopted:

“At the time, Solr and Elasticsearch didn’t yet exist. Solr would be released in one year by the team at CNET. With that release would come a very important application of Lucene: faceted search. Elasticsearch would take another 5 years to be released. With its recent releases, it has brought another important application of Lucene to the world: aggregations. Over the last decade, the Solr and Elasticsearch packages have brought Lucene to a much wider community. Solr and Elasticsearch are now being considered alongside data stores like MongoDB and Cassandra, and people are genuinely confused by the differences.”

The post is worth reading if you need a refresher or a brief overview of how Lucene works, related jargon, tips for using it in big data projects, and a few more tricks. Lucene might just be a Java library, but it makes using databases much easier. We have said for a while that information is only useful if you can find it easily. Lucene made information search and retrieval much simpler and more accurate. It laid the groundwork for the current big data boom.

Whitney Grace, March 26, 2015

SAS Text Miner Provides Valuable Predictive Analytics

March 25, 2015

If you are searching for predictive analytics software that provides in-depth text analysis with advanced linguistic capabilities, you may want to check out “SAS Text Miner.” Predictive Analytics Today runs down the features of SAS Text Miner and details how it works.

It is user-friendly software with data visualization, flexible entity options, document theme discovery, and more.

“The text analytics software provides supervised, unsupervised, and semi-supervised methods to discover previously unknown patterns in document collections.  It structures data in a numeric representation so that it can be included in advanced analytics, such as predictive analysis, data mining, and forecasting.  This version also includes insightful reports describing the results from the rule generator node, providing clarity to model training and validation results.”

SAS Text Miner includes other features that draw on automatic Boolean rule generation to categorize documents, and other rules can be exported as Boolean rules. Data sets can be built from a directory or crawled from the Web. The visual analysis feature highlights the relationships between discovered patterns and displays them using a concept link diagram. SAS Text Miner has received high praise as predictive analytics software, and it might be the solution your company is looking for.

Whitney Grace, March 25, 2015

Modus Operandi Gets a Big Data Storage Contract

March 24, 2015

The US Missile Defense Agency awarded Modus Operandi a huge government contract to develop an advanced data storage and retrieval system for the Ballistic Missile Defense System.  Modus Operandi specializes in big data analytic solutions for national security and commercial organizations.  Modus Operandi posted a press release on their Web site to share the news, “Modus Operandi Awarded Contract To Develop Advanced Data Storage And Retrieval System For The US Missile Defense Agency.”

The contract is a Phase I Small Business Innovation Research (SBIR) award, under which Modus Operandi will work on the BMDS Analytic Semantic System (BASS). The BASS will replace the legacy system and update it to be compliant with social media communities, the Internet, and intelligence.

“ ‘There has been a lot of work in the areas of big data and analytics across many domains, and we can now apply some of those newer technologies and techniques to traditional legacy systems such as what the MDA is using,’ said Dr. Eric Little, vice president and chief scientist, Modus Operandi. ‘This approach will provide an unprecedented set of capabilities for the MDA’s data analysts to explore enormous simulation datasets and gain a dramatically better understanding of what the data actually means.’ ”

It is worrisome that the missile defense system is relying on an old legacy system, but at least it is being upgraded now. Modus Operandi also sells Cyber OSINT, and they are applying this technology in an interesting way for the government.

Whitney Grace, March 24, 2015

SharePoint’s Evolution of Ease

March 24, 2015

At SharePoint’s beginning, users and managers viewed it as a framework. It is often still referred to as an installation, and many third party vendors do quite well offering add-on options to flesh out the solution. However, due to users’ expectations, SharePoint is shifting its focus to accommodate quick and full implementation without a lengthy build-out. Read more in the CMS Wire article, “From Build It and Go, to Ready to Go with SharePoint.”

The article sums up the transformation:

“We hunger for solutions that can be quickly acquired and implemented, not ones that require building out complex and robust solutions.  The world around us is changing fast and it’s exciting to see how productivity tools are beginning to encompass almost every area of our lives. The evolution not only impacts new tools and products, but also the tools we have been using all long. In SharePoint, we can see this in the addition of Experiences and NextGen Portals.”

SharePoint 2016 is on its way, and additional information will leak out throughout the coming months. Keep an eye out for breaking news and the latest releases. Stephen E. Arnold has made a career out of all things search, including enterprise and SharePoint, and his dedicated SharePoint feed is a great resource for professionals who need to keep up without a huge investment in research time.

Emily Rae Aldridge, March 24, 2015

Data and Marketing Come Together for a Story

March 23, 2015

An article on the Marketing Experiments Blog titled “Digital Analytics: How To Use Data To Tell Your Marketing Story” explains the primacy of the story in the world of data. The conveyance of the story, the article claims, should be a collaboration between the marketer and the analyst, with both players working together to create an engaging and data-supported story. The article suggests breaking this story into several parts, similar to the plot points you might study in a creative writing class: exposition, rising action, climax, denouement, and resolution. The article states:

“Nate [Silver] maintained throughout his speech that marketers need to be able to tell a story with data or it is useless. In order to use your data properly, you must know what the narrative should be…I see data reporting and interpretation as an art, very similar to storytelling. However, data analysts are too often siloed. We have to understand that no one writes in a bubble, and marketing teams should understand the value and perspective data can bring to a story.”

Silver, founder and editor in chief of FiveThirtyEight, is also quoted in the article from his talk at the Adobe Summit Digital Marketing Conference. He said, “Just because you can’t measure it, doesn’t mean it’s not important.” This is the back-to-basics approach that companies need to consider.

Chelsea Kerwin, March 23, 2015
