Sinequa and Systran Partner on Cyber Defense

May 20, 2015

Enterprise search firm Sinequa and translation tech outfit Systran are teaming up on security software. “Systran and Sinequa Combine in the Field of Cyber Defense,” announces (The article is in French, but Google Translate is our friend.) The write-up explains:

“Sinequa and Systran have indeed decided to cooperate to develop a solution for detecting and processing of critical information in multiple languages and able to provide investigators with a panoramic view of a given subject. On one side Systran provides safe instant translation in over 45 languages, and the other Sinequa provides big data processing platform to analyze, categorize and retrieve relevant information in real time. The integration of the two solutions should thus facilitate the timely processing of structured and unstructured data from heterogeneous sources, internal and external (websites, audio transcripts, social media, etc.) and provide a clear and comprehensive view of a subject for investigators.”

Launched in 2002, Sinequa is a leader in the Enterprise Search field; the company boasts strong business analytics, but also emphasizes user-friendliness. Based in Paris, the firm maintains offices in Frankfurt, London, and New York City. Systran has a long history of providing innovative translation services to defense and security organizations around the world. The company’s headquarters are in Seoul, with other offices located in Daejeon, South Korea; Paris; and San Diego.

Cynthia Murrell, May 20, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Searching Bureaucracy

May 19, 2015

The rise of automatic document conversion could render vast amounts of data collected by government agencies useful. In their article, “Solving the Search Problem for Large-Scale Repositories,” GCN explains why this technology is a game-changer, and offers tips for a smooth conversion. Writer Mike Gross tells us:

“Traditional conversion methods require significant manual effort and are economically unfeasible, especially when agencies are often precluded from using offshore labor. Additionally, government conversion efforts can be restricted by document security and the number of people that require access. However, there have been recent advances in the technology that allow for fully automated, secure and scalable document conversion processes that make economically feasible what was considered impractical just a few years ago. In one particular case the cost of the automated process was less than one-tenth of the traditional process. Making content searchable, allowing for content to be reformatted and reorganized as needed, gives agencies tremendous opportunities to automate and improve processes, while at the same time improving workflow and providing previously unavailable metrics.”

The write-up describes several factors that could foil an attempt to implement such a system, and I suggest interested parties check out the whole article. Some examples include security and scalability, of course, as well as specialized format and delivery requirements, and non-textual elements. Gross also lists criteria to look for in a vendor; for instance, assess how well their products play with related software, like scanning and optical character recognition tools, and whether they will be able to keep up with the volumes of data at hand. If government agencies approach these automation advances with care and wisdom, instead of reflexively choosing the lowest bidder, our bureaucracies’ data systems may actually become efficient. (Hey, one can dream.)

Cynthia Murrell, May 19, 2015


Hadoop: Its Inventor Speaks

May 18, 2015

I must have my wires crossed about Hadoop. I thought other folks were the creators of what became Hadoop. I read “Where Next for Hadoop? An Interview with Co-Creator Doug Cutting” to get my memory refreshed. (Note: you may have to register or pay to view the full text of this interview.)

According to the article, Doug Cutting and Mike Cafarella cooked up Hadoop in 2005. Cutting now works at Cloudera, which, according to Crunchbase, is

an enterprise software company that provides Apache Hadoop-based software and training to data-driven enterprises.

You can find some objective analyses of the company and its technology at I use the term “objective” to mean written by mid tier consultants.

I highlighted this statement:

Hadoop is already much more versatile and user-friendly than it was in the early days and innovations such as Yarn, Impala and Spark as well as a hardening of the platform’s security have all made it more “enterprise ready” too…

To underscore the user friendliness of Hadoop I circled in high intensity pink:

Asked whether some IT people are so bowled over by the number and choice of big data tools that they neglect to think how they will use them, Cutting agrees that this can be the case, but says that as use cases grow this issue will diminish. “It’s in an early stage of maturity so that’s not unexpected, but I think over time people are going to think about the functionality you’ve got in the distribution. You could have a SQL engine for analytics queries. You’ve got a NoSQL engine for reporting queries,” he says. So are companies like Cloudera, which, thanks to support from the likes of Intel (see below) and its vast marketing budget, distracting the market from the bigger picture? “There is confusion but I think it’s mostly because people are new to it and do not have much experience,” Cutting says.

And a final snippet:

Mostly I think this mantle of open and standard is deceptive. It is neither open in that everybody’s really invited on equal terms to play, nor is it a standard. It’s a minority of people out there.”

There are other comments about Hadoop. I will leave them to you. Easy to use, not confusing, and no problems with open and standard. There are many consulting firms thrilled with Hadoop. Snap it in and dig into data. Versatile too.

Stephen E Arnold, May 18, 2015

Popular and Problematic Hadoop

May 15, 2015

We love open source on principle, and Hadoop is indeed an open-source powerhouse. However, any organization considering a Hadoop system must understand how tricky implementation can be, despite the hype. A pair of writers at GCN asks and answers the question, “What’s Holding Back Hadoop?” The brief article reports on a recent survey of data management pros by data-researcher TDWI. Reporters Troy K. Schneider and Jonathan Lutton explain:

“Hadoop — the open-source, distributed programming framework that relies on parallel processing to store and analyze both structured and unstructured data — has been the talk of big data for several years now. And while a recent survey of IT, business intelligence and data warehousing leaders found that 60 percent will have Hadoop in production by 2016, deployment remains a daunting task. TDWI — which, like GCN, is owned by 1105 Media — polled data management professionals in both the public and private sector, who reported that staff expertise and the lack of a clear business case topped their list of barriers to implementation.”

The write-up supplies a couple of bar graphs of survey results, including the top obstacles to implementation and the primary benefits of going to the trouble. Strikingly, only six percent of respondents say there’s no Hadoop in their organizations’ foreseeable future. Though not covered in the GCN write-up, the full, 43-page report includes word on best practices and implementation trends; it can be downloaded here (registration required).

Cynthia Murrell, May 15, 2015

Sponsored by, publisher of the CyberOSINT monograph

The Latest SharePoint News from Ignite

May 14, 2015

The Ignite conference in Chicago has answered many of the questions that SharePoint users have been curious about for months now. Among them was the question of release timing and features for the latest iteration of SharePoint. CMS Wire gives a rundown in their article, “What’s Up With SharePoint? #MSIgnite.”

The article sums up the biggest news:

“Microsoft will continue to enhance the core offerings in the on-premises edition. It will also continue to develop SharePoint Online and update it as quickly as the updates are available. A preview version of SharePoint 2016 will be made available later this summer, with a beta version expected by the end of the year . . . In an afternoon session entitled Evolution of SharePoint Overview and Roadmap, the duo gave a rough outline of Microsoft’s plans, albeit without precise delivery dates.”

Having had to push back delivery dates once already, Microsoft is likely hesitant to announce anything solid until development is final. For the new version, Microsoft is focusing on user experience, extensibility, and SharePoint management. The inclusion of user experience should be a welcome change for many. To stay in touch with developments as they become available, keep an eye on, and particularly his feed devoted to SharePoint. Stephen E. Arnold has made a lifelong career out of all things search, and he has a knack for distilling down the “need to know” facts to keep an organization on track.

Emily Rae Aldridge, May 14, 2015

MarkLogic: Now a Unicorn in Database Land

May 12, 2015

I read “Database Vendor MarkLogic Joins Billion Dollar Club with New Funding.” The main point for me is that MarkLogic is described as a “database vendor.” MarkLogic has been working hard to explain that it is an enterprise search vendor, a business intelligence vendor, and an XML publishing system appropriate for finance, health care, and publishing. There is MarkLogic DNA in Autonomy.

The headline brushes these assertions away, clearing the path for the unicorn to charge directly in the face of Oracle and maybe IBM.

According to the write up:

MarkLogic in the last few years has gained several new database rivals–including Cloudera Inc., last valued at $4.1 billion; MongoDB Inc., last valued at $1.6 billion; MapR Technologies Inc.; and Datastax Inc.–in addition to traditional competitors Oracle, Microsoft Corp. and International Business Machines Corp. MarkLogic customers include Dow Jones & Co., which publishes VentureWire and The Wall Street Journal. The company said the new money would be used to expand globally across Europe, Japan and Asia and invested in MarkLogic partners and in research and development.

Is this what MarkLogic will do with the money? I thought some of it would be allocated to purchase other firms; for example, companies which allegedly shore up MarkLogic’s content processing gaps. Concept Searching, Content Analyst, Smartlogic? Also, there may be some long suffering investors who want a payback for the millions pumped into the company. I noticed that the lead investor was Wellington Management with some help from Arrowpoint Partners.

Before the current president, I was working for some of the nifty outfits in Sillycon Valley. I learned that MarkLogic had missed some important financial targets. A spin of the revolving door put some new faces in familiar positions.

If one looks for MarkLogic today, the company is findable, but it maintains a comparatively low profile. I dropped the blog from my useful source list. I can’t recall the last time I saw a substantive link to the company in Twitter. I don’t see the company at some of the conferences I attend, but, hey, I attend some very specialized information centric hoe downs.

Several observations:

Oracle may expand on its “we’re a better XML database” white paper, which you can find here. An earlier paper called “Mark Logic XML Server 4.1” points out some issues which Oracle perceived in the MarkLogic approach. In a shoot out with Oracle, the bullets will fly. Does MarkLogic have the arsenal to deal with Oracle’s cache of armaments?

Will proprietary NoSQL data management systems be able to generate a billion in revenue in the next six or eight quarters? Outfits like Lucid Imagination (Really?) have been running into headwinds, and I think a similar weather system may turn MarkLogic’s sunny skies into a cloudy day. I understand that the Wall Street Journal is a MarkLogic believer? How many more can MarkLogic bring to its picnic? The assumption, I assume, is a lot.

MarkLogic’s core technology dates from 2001. Like many companies from this time period, MarkLogic has to find a way to get that old time start up excitement back. Companies which are 14 years old often continue along the same trajectory in my experience.

This will be interesting and maybe a big payday for the increasingly strapped owners of companies with technology which can caulk some leaks in the MarkLogic lake raft.

Stephen E Arnold, May 12, 2015

SharePoint Server 2016 Details Released

May 12, 2015

Some details about the rollout of SharePoint Server 2016 were revealed at the much-anticipated Ignite event in Chicago last week. Microsoft now says the project is on track, with a public beta due in the fourth quarter of this year and “release candidate” and “general availability” versions to follow. Read more in the Redmond Magazine article, “SharePoint Server 2016 Roadmap Highlighted at Ignite Event.”

The article addresses the tension between cloud and on-premises versions:

“While Microsoft has been developing the product based on its cloud learnings, namely SharePoint Online as part of its Office 365 services, those cloud-inspired features eventually will make their way back into the server product. The capabilities that don’t make it into the server will be offered as Office 365 services that can be leveraged by premises-based systems.”

It appears that the delayed timeline may be a “worst case scenario” measure, and that the release could happen earlier. After all, it is better for customers to be prepared for the worst and be pleasantly surprised. To stay in touch with the latest news regarding features and timeline, keep an eye on, specifically the SharePoint feed. Stephen E. Arnold is a longtime leader in search and serves as a great resource for individuals who need access to the latest SharePoint news at a glance.

Emily Rae Aldridge, May 12, 2015

Making Queries of PostgreSQL Data Easy

May 9, 2015

If you query PostgreSQL tables, you may find yourself making nice with a script herder. Tired of that intermediated approach? Navigate to Slinky. You will want to watch the demo in Internet Explorer because I encountered flakiness in Firefox and Mozilla. You enter what you want in a search box, pick the table, and the system spits out the SQL query. Punch a button and you get a data table. Looked good and worked for us.
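The search-box-to-SQL idea can be illustrated with a small sketch. This is not how Slinky works internally, which the demo does not reveal; the function name `build_query`, the `customers` table, and its columns are all hypothetical, and Python's built-in SQLite stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3


def build_query(table, columns, term):
    """Build a parameterized SELECT that matches the term in any listed column."""
    # Table and column names cannot be bound as parameters,
    # so accept only plain identifiers to avoid SQL injection.
    for name in (table, *columns):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    where = " OR ".join(f"{col} LIKE ?" for col in columns)
    sql = f"SELECT * FROM {table} WHERE {where}"
    params = [f"%{term}%"] * len(columns)
    return sql, params


# Hypothetical sample data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ada Paris", "London"), ("Bob Stone", "Paris"), ("Cy Young", "Boston")],
)

# The user types a term and picks a table; the code produces the SQL and runs it.
sql, params = build_query("customers", ["name", "city"], "paris")
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Against a real PostgreSQL database the same approach should carry over with two adjustments: `LIKE` is case-sensitive there (`ILIKE` is the case-insensitive form), and a driver such as psycopg uses `%s` placeholders instead of `?`.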

Stephen E Arnold, May 8, 2015

Google Cloud Bigtable: The Real Hadoop de Doop?

May 6, 2015

Navigate to “Announcing Google Cloud Bigtable: The same database that powers Google Search, Gmail and Analytics is now available on Google Cloud Platform.” I learned:

we are excited to introduce Google Cloud Bigtable – a fully managed, high-performance, extremely scalable NoSQL database service accessible through the industry-standard, open-source Apache HBase API. Under the hood, this new service is powered by Bigtable, the same database that drives nearly all of Google’s largest applications.

In the list of benefits Google offers, one caught my attention:

Over the past 10+ years, Bigtable has driven Google’s most critical applications. In addition, the HBase API is a industry-standard interface for combined operational and analytical workloads.

The question becomes, “Is this the real Hadoop?” Another question: “Is Google using decade-old technology for its ‘most critical applications’?” I answer, “Nope. I think there is newer, whizzier software in use.”

Stephen E Arnold, May 6, 2015

Indexing Rah Rah Rah!

May 4, 2015

Enterprise search is one of the most important features for enterprise content management systems, and there is a huge industry for designing and selling taxonomies. The key selling features for taxonomies are their diversity, accuracy, and quality. The categories within taxonomies make it easier for people to find their content, but Tech Target’s Search Content Management blog says there is room for improvement in the post “Search-Based Applications Need The Engine Of Taxonomy.”

Taxonomies are used for faceted search, allowing users to expand and limit their search results. Faceted search gives users options for refining their results, including file type, keywords, and the ever-popular content categories. Users usually don’t access the categories directly; they are used primarily behind the scenes, and the aggregated results appear on the dashboard.
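For readers who have not seen faceted search from the inside, here is a minimal sketch of the mechanics. The documents, facet names, and functions are hypothetical and do not come from the Tech Target post; a production search engine would compute these counts inside its index instead of looping over a list.

```python
from collections import Counter

# Hypothetical documents, each tagged with a file type and a taxonomy category.
DOCUMENTS = [
    {"title": "Q1 sales report", "file_type": "pdf", "category": "Finance"},
    {"title": "Sales onboarding guide", "file_type": "docx", "category": "HR"},
    {"title": "Sales forecast model", "file_type": "xlsx", "category": "Finance"},
    {"title": "Holiday policy", "file_type": "pdf", "category": "HR"},
]


def search(query, filters=None):
    """Return documents whose title contains the query and that match every facet filter."""
    filters = filters or {}
    hits = []
    for doc in DOCUMENTS:
        if query.lower() not in doc["title"].lower():
            continue
        if all(doc.get(facet) == value for facet, value in filters.items()):
            hits.append(doc)
    return hits


def facet_counts(hits, facets=("file_type", "category")):
    """Count how many hits fall under each value of each facet."""
    return {facet: Counter(doc[facet] for doc in hits) for facet in facets}


# A broad search, the facet counts a dashboard would show, then a narrowed search.
results = search("sales")
counts = facet_counts(results)
narrowed = search("sales", {"category": "Finance"})
print(counts)
print([doc["title"] for doc in narrowed])
```

The counts are what a user sees beside each category or file type on the results page; picking one reruns the search with that value as a filter.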

Taxonomies, however, take their information from more than what the user provides:

“We are now able to assemble a holistic view of the customer based on information stored across a number of disparate solutions. Search-based applications can also include information about the customer that was inferred from public content sources that the enterprise does not own, such as news feeds, social media and stock prices.”

Whether you know it or not, taxonomies are vital to enterprise search. Companies that have difficulty finding their content need to consider creating a taxonomy plan or purchasing category lists from a proven company.

Whitney Grace, May 4, 2015
