
Shades of CrossZ: Compress Data to Speed Search

September 3, 2015

I have mentioned in my lectures a start-up called CrossZ. Before whipping out your smartphone and running a predictive query on the Alphabet GOOG thing, sit tight.

CrossZ hit my radar in 1997. The concept behind the company was to compress extracted chunks of data. The method, as I recall, made use of fractal compression, which was the rage at that time. Queries were converted to fractal tokens. The system then quickly pulled out the needed data and displayed it in human-readable form. The approach was, as I recall, called “QueryObject.” By 2002, the outfit dropped off my radar. The downside of the CrossZ approach was that the compression was asymmetric; that is, preparing the fractal chunk was slow, but running a query and extracting the needed data was really fast.

Flash forward to Terbium Labs, which has a patent on a method of converting data to tokens, or what the firm calls “digital fingerprints.” The system matches patterns and displays high-probability matches. Terbium is a high-potential outfit. The firm’s methods may be a shortcut for some of the Big Data matching tasks some folks in the biology lab have.

For me, the concept of reducing the size of a content chunk and then querying it to achieve faster response time is a good idea.
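That compress-then-query notion is easy to sketch. Below is a toy Python version in which hashed character shingles stand in for CrossZ’s fractal tokens and Terbium’s digital fingerprints (both methods are proprietary, so the shingle-and-hash scheme here is my own illustration). The point is that the query is converted into the same token space and matched against the compact fingerprints, never against the raw text.

```python
import hashlib

def fingerprint(text, k=8):
    """Reduce a chunk of text to a compact set of hashed k-grams, a
    stand-in for fractal tokens or digital fingerprints."""
    text = text.lower()
    grams = {text[i:i + k] for i in range(max(1, len(text) - k + 1))}
    return {hashlib.md5(g.encode()).hexdigest()[:8] for g in grams}

def match_score(query, chunk_fp, k=8):
    """Convert the query into the same token space and measure overlap,
    so matching runs against the compact form, not the original text."""
    qfp = fingerprint(query, k)
    return len(qfp & chunk_fp) / len(qfp)

# Store only the fingerprints; the raw text is not consulted at query time.
chunks = {
    "doc1": fingerprint("quarterly revenue projections for retail banking"),
    "doc2": fingerprint("fractal compression of time series data"),
}
query = "revenue projections"
best = max(chunks, key=lambda d: match_score(query, chunks[d]))
```

As with CrossZ, the work is asymmetric: fingerprinting every chunk up front is the slow part; the query-time overlap check is fast.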

What do you think I thought when I read “Searching Big Data Faster”? Three notions flitted through my aged mind:

First, the idea is neither new nor revolutionary. Perhaps the MIT implementation is novel? Maybe not?

Second, the main point that “evolution is stingy with good designs” strikes me as a wild and crazy generalization. What about the genome of the octopus, gentle reader?

Third, MIT is darned eager to polish the MIT apple. This is okay as long as the whiz kids take a look at companies which used this method a couple of decades ago.

That is probably not important to anyone but me and to those who came up with the original idea, maybe before CrossZ popped out of Eastern Europe and closed a deal with a large financial services firm years ago.

Stephen E Arnold, September 3, 2015

Watson Speaks Naturally

September 3, 2015

While there are many companies that offer accurate natural language comprehension software, completely understanding the complexities of human language still eludes computers.  IBM reports that it is close to overcoming these natural language barriers with IBM Watson Content Analytics, as described in “Discover And Use Real-World Terminology With IBM Watson Content Analytics.”

The tutorial points out that any analytics program that relies only on structured data loses about four-fifths of available information, which is a big disadvantage in the big data era, especially when insights are supposed to be hidden in the unstructured data.  Watson Content Analytics is a search and analytics platform that uses rich-text analysis to extract actionable insights from new sources, such as email, social media, Web content, and databases.

Watson Content Analytics can be used in two ways:

  • “Immediately use WCA analytics views to derive quick insights from sizeable collections of contents. These views often operate on facets. Facets are significant aspects of the documents that are derived from either metadata that is already structured (for example, date, author, tags) or from concepts that are extracted from textual content.
  • Extracting entities or concepts, for use by WCA analytics view or other downstream solutions. Typical examples include mining physician or lab analysis reports to populate patient records, extracting named entities and relationships to feed investigation software, or defining a typology of sentiments that are expressed on social networks to improve statistical analysis of consumer behavior.”
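The facet idea in the first bullet can be pictured in a few lines. This is a toy sketch, not IBM’s API: the concept lexicon, document fields, and helper names are invented, and WCA’s annotators are far richer than a dictionary lookup.

```python
from collections import Counter

# Assumed mini-vocabulary standing in for a real medical dictionary.
CONCEPT_LEXICON = {"hypertension", "diabetes", "aspirin"}

def extract_concepts(text):
    """Toy concept extraction: match words against the assumed lexicon."""
    return {w.strip(".,").lower() for w in text.split()} & CONCEPT_LEXICON

def build_facets(docs):
    """Facets come from structured metadata (author) and from concepts
    extracted out of the unstructured body text."""
    facets = {"author": Counter(), "concept": Counter()}
    for doc in docs:
        facets["author"][doc["author"]] += 1
        for c in extract_concepts(doc["body"]):
            facets["concept"][c] += 1
    return facets

docs = [
    {"author": "Dr. Lee", "body": "Patient reports hypertension, prescribed aspirin."},
    {"author": "Dr. Lee", "body": "Follow-up on diabetes management."},
]
facets = build_facets(docs)
```

Counting facet values across a collection is what makes the quick “analytics views” over sizeable collections possible.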

The tutorial runs through a domain-specific terminology application for Watson Content Analytics.  The application is quite involved, but it shows how Watson Content Analytics may go beyond the typical big data application.

Whitney Grace, September 3, 2015
Sponsored by, publisher of the CyberOSINT monograph

Forbes Bitten by Sci-Fi Bug

September 1, 2015

The article titled “Semantic Technology: Building the HAL 9000 Computer” on Forbes runs with the gossip from this year’s Smart Data Conference: namely, that semantic technology has finally landed. The article examines several leaders in the field, including Maana, Loop AI Labs, and Blazegraph. The article mentions,

“Computers still can’t truly understand human language, but they can make sense out of certain aspects of textual content. For example, Lexalytics is able to perform sentiment analysis, entity extraction, and ambiguity resolution. Sentiment analysis can determine whether some text – a tweet, say – expresses a positive or negative opinion, and how strong that opinion is. Entity extraction identifies what a paragraph is actually talking about, while ambiguity resolution solves problems like the Paris Hilton one above.”

(The “Paris Hilton problem” referred to is distinguishing between the hotel and the person in semantic search.) In spite of the excitable tone of the article’s title, its conclusion is much more measured. HAL, the sentient computer from 2001: A Space Odyssey, remains in our imaginations. In spite of the exciting work being done, the article reminds us that even Watson, IBM’s supercomputer, is still without the “curiosity or reasoning skills of any two-year-old human.” For the more paranoid among us, this might be good news.
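Lexicon-based sentiment scoring of the sort described is simple to sketch. The word lists below are invented for illustration; Lexalytics’ actual models are far more nuanced than counting words.

```python
# Invented mini-lexicons; a real system would use large, weighted ones.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"terrible", "hate", "awful", "bad"}

def sentiment(text):
    """Label text positive/negative/neutral; the score's magnitude is a
    crude proxy for how strong the opinion is."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", 0

label, strength = sentiment("I love this excellent phone")  # positive, 2
```

Mixed text like “Great service, but the wait was terrible!” nets out to neutral under this scheme, which hints at why real sentiment engines need more than word counting.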

Chelsea Kerwin, September 1, 2015


When Do You Snack? ADL Nails You, You Sneak

August 29, 2015

Know much about ADL or Activities of Daily Living? You can get up to speed on the sneak factor of home surveillance by flipping through “Distributional Semantics and Unsupervised Clustering for Sensor Relevancy Prediction.” Sounds pretty slick, right?

The idea is that the embedding of computing devices in your possessions provides useful data to someone. Next, one can apply various analyses to make sense of the data. For example, you watch TruTV’s World’s Dumbest program. Then you hit the fridge. Grab a beer. Open the cupboard and snag a bag of Cheetos Crunchy Flamin’ Hot Cheese Flavored Snacks. Pick up your laptop and navigate to your go-to site. Lean back in your Barcalounger. Live the life.

The fun part is that predictive methods can figure out what you will do next. Good for advertisers. Good for you. Good in general.
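One hedged way to picture “figure out what you will do next” is a first-order Markov model over sensed activities. The paper’s distributional-semantics method is more sophisticated than this; the activity names and event log below are invented.

```python
from collections import Counter, defaultdict

def train(log):
    """Count observed activity-to-activity transitions in an ADL log."""
    transitions = defaultdict(Counter)
    for cur, nxt in zip(log, log[1:]):
        transitions[cur][nxt] += 1
    return transitions

def predict_next(transitions, activity):
    """Predict the most frequently observed follow-on activity."""
    nxt = transitions.get(activity)
    return nxt.most_common(1)[0][0] if nxt else None

# Invented sensor log: TV, then fridge, cupboard, laptop, and repeat.
log = ["watch_tv", "open_fridge", "open_cupboard", "use_laptop",
       "watch_tv", "open_fridge", "grab_snack"]
model = train(log)
```

After training, the model predicts that the TV session is followed by a trip to the fridge, which is exactly the sort of output an advertiser would pay for.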


Sound creepy? Invasive?

Hey, get with the program. The major benefit is that with these data and the outputs, the metadata, and the bits and bobs like GPS, many magical things can be crafted from passive observational data capture and analysis.

Home delivery of a Backpage solution? Entirely possible.

Just connect the data points. Predict interests. And the Backpage offers come calling.


Why worry?

Stephen E Arnold, August 29, 2015

Beyond Google, How to Work Your Search Engine

August 28, 2015

The article on Funnelback titled “Five Ways to Improve Your Website Search” offers tips that may seem obvious but could always stand to be reinforced. Sometimes the Google site:<url> operator is not enough. The first tip, for example, is simply to be helpful. That means recognizing synonyms and perhaps adding an autocomplete function in case your site users think in different terms than you do. The worst-case scenario in search is typing in a term and getting no results, especially when the problem is just language and the thing being searched for is actually present, just not found. The article goes into the importance of the personal touch as well,

“You can use more than just the user’s search term to inform the results your search engine delivers… For example, if you search for ‘open day’ on a university website, it might be more appropriate to promote and display an ‘International Open Day’ event result to prospective international students instead of your ‘Domestic Student Open Day’ counterpart event. This change in search behavior could be determined by the user’s location – even if it wasn’t part of their original search query.”
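The “be helpful” tip, synonyms plus autocomplete, is straightforward to prototype. A minimal sketch follows; the synonym table and vocabulary are invented, and a production engine like Funnelback does far more.

```python
import bisect

# Invented synonym table: a query for "course" should also match
# content tagged "class" or "subject".
SYNONYMS = {"course": {"class", "subject"}, "fee": {"tuition", "cost"}}

def expand(term):
    """Return the term plus its synonyms, so wording differences
    between the user and the site do not produce zero results."""
    return {term} | SYNONYMS.get(term, set())

def autocomplete(prefix, vocabulary, limit=5):
    """Suggest indexed terms that start with the typed prefix."""
    vocab = sorted(vocabulary)
    i = bisect.bisect_left(vocab, prefix)
    out = []
    while i < len(vocab) and vocab[i].startswith(prefix) and len(out) < limit:
        out.append(vocab[i])
        i += 1
    return out

vocab = ["open day", "opening hours", "courses", "tuition fees"]
suggestions = autocomplete("open", vocab)
```

Typing “open” surfaces both “open day” and “opening hours” before the user finishes the query, which is the helpfulness the article is after.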

The article also suggests learning from the search engine. Obviously, analyzing what customers are most likely to search for on your website will tell you a lot about what sort of marketing is working, and what sort of customers you are attracting. Don’t underestimate search.

Chelsea Kerwin, August 28, 2015


Big Data Vendors Don’t Understand Big Data

August 27, 2015

Sit back and absorb this article’s title for a moment: big data vendors don’t understand big data.  How can IT vendors not understand one of the IT industry’s largest selling products?  Computing makes the bold claim in “SAP, Oracle and HP ‘Don’t Get’ Big Data, Claims Massive Analytic Chairman.”

Executive chairman and founder of the Oscar AP platform George Frangou claims that companies like Oracle, HP, and SAP do not know how to help their customers take advantage of their big data and are more interested in getting customers hooked into their ecosystems than in providing true analytical insight.

One of the reasons Frangou says this is because his Oscar AP is more “advanced” and allows users to foretell the future with various outcomes.  The Oscar AP platform is part of the next round of big data called massive analytics.  HP, Oracle, and SAP cannot wrap their heads around massive analytics yet, because they are more concerned with selling their product.

“Because of this, Frangou said Massive Analytic is ‘quite unashamedly following a displacement strategy to displace the incumbents because they’re not getting it.’  He added that SAP HANA, Oracle Exalytics and HP Haven are essentially the same product because they’re built on the same base code.”

Frangou went on to say that big data customers are spending more money than they need to and are getting sucked into purchasing more products in order to make their big data plans work.  It appears to be a vicious cycle.  Frangou said that cloud analytics is the best option for customers, although barriers remain to getting a decent cloud analytics platform off the ground.

It does not come as a surprise that big data products are falling short of their promised results.  A similar comparison would be the Windows OS falling well below performance expectations and users spending more time troubleshooting than getting their projects done.

Whitney Grace, August 27, 2015

Elasticsearch is the Jack of All Trades at Goldman Sachs

August 25, 2015

The article titled “Goldman Sachs Puts Elasticsearch to Work” on InformationWeek discusses how programmers at Goldman Sachs are using Elasticsearch. Programmers there are working on applications that exploit both its data retrieval capabilities and its facility with unstructured data. The article explains,

“Elasticsearch and its co-products — Logstash, Elastic’s server log data retrieval system, and Kibana, a dashboard reporting system — are written in Java and behave as core Java systems. This gives them an edge with enterprise developers who quickly recognize how to integrate them into applications. Logstash has plug-ins that draw data from the log files of 165 different information systems. It works natively with Elasticsearch and Kibana to feed them data for downstream analytics, said Elastic’s Jeff Yoshimura, global marketing leader.”

The article provides detailed examples of how Elastic is being used in legal, finance, and engineering departments within Goldman Sachs. For example, rather than hiring a “platoon of lawyers” to comb through Goldman’s legal contracts, a single software engineer was able to build a system that digitized everything and flagged contract documents that needed revision. With over 9,000 employees, Goldman currently has several thousand using Elasticsearch. The role of search has expanded, and it is important that companies recognize the many functions it can provide.
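Goldman’s contract flagger was built on Elasticsearch; as a stand-in that runs anywhere, here is a toy inverted index showing the underlying idea of indexing digitized contracts and flagging those that contain revision-trigger terms. The documents and trigger terms are invented.

```python
from collections import defaultdict

def build_index(docs):
    """Map each normalized word to the set of documents containing it,
    the core structure behind full-text engines like Elasticsearch."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,")].add(doc_id)
    return index

def flag(index, trigger_terms):
    """Return every document matching at least one trigger term."""
    flagged = set()
    for term in trigger_terms:
        flagged |= index.get(term, set())
    return flagged

# Invented contract snippets standing in for Goldman's digitized corpus.
contracts = {
    "c-101": "Agreement governed by LIBOR benchmark rates.",
    "c-102": "Standard services agreement, fixed fee.",
}
index = build_index(contracts)
needs_revision = flag(index, {"libor"})
```

One engineer plus an index like this (at vastly larger scale, with analyzers, scoring, and distribution) is how a search engine replaces a platoon of lawyers doing keyword review by hand.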

Chelsea Kerwin, August 25, 2015



The Integration of Elasticsearch and SharePoint Adds Capabilities

August 24, 2015

The article on the IDM Blog titled “BA Insight Brings Together Elasticsearch and SharePoint” describes yet another vendor embracing Elasticsearch and falling in love again with SharePoint. The integration of Elasticsearch and SharePoint enables customers to use Elasticsearch through SharePoint portals. The integration also makes BA Insight’s portfolio accessible through open source Elasticsearch as well as Logstash and Kibana, Elastic’s data retrieval and reporting systems, respectively. The article quotes the Director of Product Management at Elastic,

“BA Insight makes it possible for Elasticsearch and SharePoint to work seamlessly together… By enabling Elastic’s powerful real-time search and analytics capabilities in SharePoint, enterprises will be able to optimize how they use data within their applications and portals.” “Combining Elasticsearch and SharePoint opens up a world of exciting applications for our customers, ranging from geosearch and pattern search through search on machine data, data visualization, and low-latency search,” said Jeff Fried, CTO of BA Insight.

Specific capabilities that the integration will enable include connectors to over fifty systems, auto-classification, federation to improve the presentation of results within the SharePoint framework, and applications like Smart Previews and Matter Comparison. Users can also decide for themselves whether they want to use the SharePoint search engine or Elastic’s, or combine them and merge the results into a single set. Empowering users to make the best choice for their data is at the heart of the integration.

Chelsea Kerwin, August 24, 2015



Geofeedia’s Political Action

August 20, 2015

The presidential election is a little over a year away, and potential presidential candidates are starting on their campaign trails.  The Republican and Democratic parties are heating up with the GOP debates, and voters are engaging with the candidates and each other via social media.  The information posted on social media is a gold mine for political candidates to learn about voters’ opinions and track their approval ratings.  While Twitter and Facebook data is easy to come by with Google Analytics and other software, visual mapping of social media data is a little harder to find.

To demonstrate its product capabilities, Geofeedia took social media data from Instagram, fed it into its data platform, and shared the visual results in the blog post “Instagram Map: Republican Presidential Debate.”  Geofeedia noted that while business mogul Donald Trump did not fare well during the debate, he dominated the social media feeds:

“Of all social content coming out of the Quicken Loans Center, 93% of posts were positive in sentiment. The top keywords were GOP, debate, and first, which was to be expected. Although there was no decided winner, Donald Trump scored the most headlines for a few of his memorable comments. He was, however, the winner of the social sphere. His name was mentioned in social content more than any other candidate.”

One amazing thing is that social media allows political candidates to gauge voters’ attitudes in real time!  They can alter their answers to debate questions instantaneously to sway approval in their favor.  Another interesting thing Geofeedia’s visual data models showed is a heat map of where the most social media activity took place, which happened to be centered in the major US metropolises.  The 2016 election might be the one that harnesses social media to help elect the next president.  Geofeedia also has excellent visual mapping tools.
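The two outputs described, a sentiment share and an activity heat map, reduce to simple aggregations. Here is a sketch with invented posts and coordinates; Geofeedia’s platform does this at scale from live feeds.

```python
from collections import Counter

def positive_share(posts):
    """Fraction of posts labeled positive, e.g. the 93% figure above."""
    pos = sum(1 for p in posts if p["sentiment"] == "positive")
    return pos / len(posts)

def heat_map(posts, cell=1.0):
    """Bin posts into lat/lon grid cells; denser cells read as hotter."""
    grid = Counter()
    for p in posts:
        key = (int(p["lat"] // cell), int(p["lon"] // cell))
        grid[key] += 1
    return grid

# Invented posts: two near Cleveland (the debate venue), one near NYC.
posts = [
    {"sentiment": "positive", "lat": 41.5, "lon": -81.7},
    {"sentiment": "positive", "lat": 41.4, "lon": -81.6},
    {"sentiment": "negative", "lat": 40.7, "lon": -74.0},
]
share = positive_share(posts)
hot = heat_map(posts)
```

Because the aggregation is cheap, it can be recomputed continuously, which is what makes the real-time gauging of voter attitudes possible.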

Whitney Grace, August 20, 2015


Stroz Friedberg Snaps Up Elysium Digital

August 20, 2015

Cybersecurity, investigation, and risk-management firm Stroz Friedberg has made a new acquisition, we learn from their announcement, “Stroz Friedberg Acquires Technology Litigation Consulting Firm Elysium Digital” (PDF). Though details of the deal are not revealed, the write-up tells us why Elysium Digital is such a welcome addition to the company:

“Founded in 1997, Elysium Digital has worked with law firms, in-house counsel, and government agencies nationally. The firm has provided a broad range of services, including expert testimony, IP litigation consulting, eDiscovery, digital forensics investigations, and security and privacy investigations. Elysium played a role in the key technology/legal issues of its time and established itself as a premier firm providing advice and quality technical analysis in high-stakes legal matters. The firm specialized in deciphering complex technology and effectively communicating findings to clients, witnesses, judges, and juries.

“‘The people of Elysium Digital possess highly sought after technical skills that have allowed them to tackle some of the most complex IP matters in recent history. Bringing this expertise into Stroz Friedberg will allow us to more fully address the needs of our clients around the world, not just in IP litigation and digital forensics, but across our cyber practices as well,’ said Michael Patsalos-Fox, CEO of Stroz Friedberg.”

The workers of Elysium Digital will be moving into Stroz Friedberg’s Boston office, and its co-founders will continue to play an important role, we’re told. Stroz Friedberg expects the acquisition to bolster their capabilities in the areas of digital forensics, intellectual-property litigation consulting, eDiscovery, and data security.

Founded in 2000, Stroz Friedberg says their guiding principle is to “seek truth” for their clients. Headquartered in New York City, the company maintains offices throughout the U.S. as well as in London, Hong Kong, and Zurich.

Cynthia Murrell, August 20, 2015

