
Quote to Note: Halevy after 10 Years Before the Ads

September 23, 2015

If you track innovations at the Alphabet Google thing, you will know that a number of wizards make the outfit hum. One of the big wizards is Dr. Alon Halevy. He is a database guru, holds patents, and is now an essayist.

Navigate to “A Decade at Google.” The write up does not reference the ad model which makes research possible. Legal dust ups are sidestepped. The management approach and the reorganization are not part of the write up.

I did note an interesting passage, which I flagged as a quote to note:

It is common wisdom that you should not choose a project that a product team is likely to be embarking on in the short term (e.g., up to a year). By the time you’ll get any results, they will have done it already. They might not do it as well as or as elegantly as you can, but that won’t matter at that point.

I interpreted this to underscore Alphabet Google thing’s “good enough” approach to its technology. If you have time, think about the confluence of Dr. Halevy’s research and Dr. Guha’s. The semantic search engine optimization crowd may have a field day.

Stephen E Arnold, September 23, 2015

13 Big Data Trends: Fodder for Mid Tier Consultants

September 20, 2015

Let’s assume that a colleague has lost his or her job (xe, in Tennessee, I heard). The question becomes, “What can I do with my current skills to make big money in a hot new sector?”

The answer appears in “13 New Trends in Big Data and Data Science.” The write up is intended to be a round up of jazzy hot topics in a couple of even hotter quasi-new facets of the database world. Like enterprise search, databases are in need of juice. Nothing helps established technology more than new spins in old orbits.

My suggestion is to read through the list of 13 “new trends.” Pick one, and tell your prospect hunting pal to get hired. Nothing to it.

Allow me to illustrate the method in action.

I have selected trend 8 “The rise of mobile data exploitation.” There are some companies active in this field; for example, S2T. The S2T name means simulation software and technology. The outfit processes a range of digital information and analyzes it with the company’s own tools. Anyone can work in this sector. The demand for talent is high. The work is not too difficult. The desire to hire “experts” in various aspects of data is keen. No problem. Sure, there may be some trivial requirements like checking with a person’s mom and his or her best friends to make sure the applicant can be trusted. Hot trend. No problemo.

Let’s look at another field.

Trend 11. High performance computing (HPC). What could be faster than Apple’s new mobile chip? What could be higher performance than the Facebook or Google infrastructure? If the job seeker is familiar with these technologies, the world of Big Data excitement awaits. The experience is the important thing, not knowledge of optimized parallelization pipelines.


Each of the 13 trends makes it clear that there are numerous opportunities. These range from digital health (IBM Watson is a PR player) to the trivial world of analytic apps and APIs.

After reading the article, I was delighted to see how many important trends are getting buzz.

Big Data is definitely the go to discipline. I anticipate that anyone interested in search and content processing will be able to pursue a career in Big Data.

Now some skeptics believe that Big Data is a nebulous concept. Do not be dissuaded. The 13 trends are evidence that databases and the analysis of their contents are the future. Just as these activities have been since the days of Edgar Codd.

The mid tier consultants can ride with the hounds.

Stephen E Arnold, September 20, 2015

What Is Your Database Worth?

September 11, 2015

I don’t have a single answer to this question. There is an interesting database valuation item in “CrunchBase Is Spinning Out, Backed by Emergence Capital.”

CrunchBase is an aggregator of technology company information. I think the service does a good job with companies in the Sillycon Valley area. The coverage tails a bit for Rust Belt start ups, but that’s no surprise.

The database attracts two million “visitors” each month. I remain uncertain about the meaning of a “visitor,” but when most Web sites get a few hundred or fewer hits, two million seems like a lot. It is almost identical in hype thought to Facebook’s one billion users in a 24 hour period. I was pretty good at math in grade school too.

The write up’s gem was this statement:

Eight-year-old, San Francisco-based CrunchBase looks to become a standalone company in the very near future. According to several sources, the unit, which calls itself the “definitive database of the startup ecosystem,” is finalizing a term sheet with the venture firm Emergence Capital Partners for an investment of between $5 million and $7 million.

Assume that the $7 million number is on the money. That works out to $0.30 per visitor. That is almost a million in my book.
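For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the two figures quoted above (two million visitors a month and an investment of between $5 million and $7 million). Treating the investment as a stand-in for the database's worth, and multiplying monthly visitors by 12 to get a yearly count, are my assumptions and not numbers from the article. The function name `dollars_per_visitor` is hypothetical.

```python
# Back-of-the-envelope check of the per-visitor figure.
# Assumptions (not from the article): the investment stands in for the
# database's worth, and monthly visitors are simply multiplied by 12.

MONTHLY_VISITORS = 2_000_000   # "two million visitors each month"
INVESTMENT_LOW = 5_000_000     # low end of the reported range, USD
INVESTMENT_HIGH = 7_000_000    # high end of the reported range, USD


def dollars_per_visitor(investment: float, visitors: float) -> float:
    """Return the investment divided by the visitor count."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return investment / visitors


# High-end investment against one month of traffic.
per_monthly_visitor = dollars_per_visitor(INVESTMENT_HIGH, MONTHLY_VISITORS)

# High-end investment against twelve months of traffic.
per_annual_visitor = dollars_per_visitor(INVESTMENT_HIGH, MONTHLY_VISITORS * 12)

print(f"Per monthly visitor: ${per_monthly_visitor:.2f}")
print(f"Per annualized visitor: ${per_annual_visitor:.2f}")
```

Against a single month of traffic, the high-end number comes to $3.50 per visitor. Annualized, it rounds to about $0.30, which is consistent with the figure in the text.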

Stephen E Arnold, September 11, 2015

Datameer Declares a Celebration

September 8, 2015

The big data analytics and visualization company Datameer, Inc. has cause to celebrate, because they have received a huge investment.  How happy is Datameer?  Datameer’s CEO Stefan Groschupf explains on the company blog in the post, “Time To Celebrate The Next Stage Of Our Journey.”

Datameer received $40 million in a round of financing from ST Telemedia, Top Tier Capital Partners, Next World Capital, Redpoint, Kleiner Perkins Caufield & Byers, Software AG and Citi Ventures.  Groschupf details how Datameer entered the market in 2009 with the vision to democratize analytics.  Since 2009, Datameer has helped solve problems across the globe and is even helping make it a better place.  He adds that he is humbled by the trust investors and clients place in Datameer, which feeds into the importance of analytics for not only companies, but also anyone who wants supportable truth.

Datameer has big plans for the funding:

“We’ll be focusing on expanding globally, with an eye toward APAC and Latin America as well as additional investment in our existing teams. I’m looking forward to continuing our growth and building a long-term, sustainable company that consistently provides value to our customers. Our vision has been the same since day one – to make big data analytics easy for everyone. Today, I’m happy to say we’re still where we want to be.”

Datameer was one of the early contenders in big data that always managed to outshine and outperform its bigger name competitors.  Despite its record growth, Datameer continues to remain true to its open source roots.  The company wants to make analytics available to every industry and everyone.  What is incredibly impressive is that Datameer has numerous applications for its products from gaming to healthcare, which is usually unheard of.  Congratulations to Datameer!

Whitney Grace, September 8, 2015
Sponsored by, publisher of the CyberOSINT monograph

Google Admits to Being a Copycat

August 28, 2015

In elementary school one of the biggest insults a child could throw at a fellow classmate was the slur “copycat.”  All children want to create original work, but when they feel their skills are subpar, they copy the work of another student they feel is superior.  Tossing in the old adage that “copying is the sincerest form of flattery” gives way to arguments about patents, theft, and even time outs for those involved.  The Techdirt podcast discussed copying in a recent episode and how big tech companies simply copy the ideas of their rivals and put their own name on it.  The biggest copycat they could find was Google: “The Failure of Google Plus Should Be A Reminder That Big Companies Very Rarely Successfully ‘Copy’ Startups.”

Techdirt points out the fallacy with big companies trying to steal the little startup’s idea:

“As we’ve discussed, in the rare cases when “copying” succeeds, it’s because the second company doesn’t really copy, but actually comes up with a better product, which is something we should celebrate. When they just copy, they tend to only be able to copy the superficial aspects of what they see, rather than all the underlying tacit thinking that makes a product good.”

The article discusses how Google finally admitted that Google Plus was a copy of Facebook, because the search mogul was fearful of losing profit, users, and Web traffic.  The biggest problem that Google Plus had was that it was “forced” on people, like the Star Trek Borg assimilating unsuspecting planets.  Okay, maybe that is a bit of a drastic comparison, but startups are still fearful of their ideas being assimilated by the bigger companies.  This is when the patent topic comes in and whether or not to register for one.

There is good news for startups: “if a startup is doing something really amazing and innovative that people actually want, you can almost always guarantee that (1) the big companies will totally miss the boat for way too long and (2) once they finally wake up, be clumsy and ridiculous in their attempts to copy.”

Also Techdirt sums everything up in an eloquent paragraph that explains the logic in this argument:

“People think it’s easy to copy because copying seems like it should be easy. But it’s not. You can only copy the parts you can see, which leaves out an awful lot of understanding and tacit knowledge hidden beneath the surface. It also leaves out all the knowledge of what doesn’t work that the originator has. And, finally, it ignores the competing interests within a larger business that make it much harder for those companies to innovate.”

In other words, do not worry about Borg assimilation if your startup has a good idea, but do be on the defensive and arm yourself with good weapons.

Whitney Grace, August 28, 2015

The Data Lake Is a Hub: For Wheel I Tell You

August 25, 2015

When I read “Why Do I Need a Data Lake,” I thought about Mel Blanc. Mr. Blanc was a voice actor who enlivened the Jack Benny Show and Warner Bros. cartoons. For Mr. Benny, Mr. Blanc was the “sound” of the Maxwell automobile and the participant in the famous “Sí…Sy…sew…Sue” routine.

So what? I imagined Mr. Blanc reading aloud the write up to me as Daffy Duck.

Here’s a passage I highlighted and enjoyed:

The data lake has the potential to transform the business by providing a singular repository of all the organization’s data (structured AND unstructured data; internal AND external data) that enables your business analysts and data science team to mine all of organizational data that today is scattered across a multitude of operational systems, data warehouses, data marts and “spreadmarts”. [Emphasis in the original]

Note that the lake has “potential to transform”. I also like the categorical imperative of “all the organization’s data.” I find the “all” notion quite humorous because there are digital data which are not likely to be pooled and processed. One example is data governed by government contracts for which rules of secrecy apply. Another is digital information germane to a legal matter and in the control of the firm’s legal eagles. There are other examples as well. So the “all” is a bobbing buoy. But what the heck is a spreadmart?

But the chortle inducing passage is the conversion of a data lake into a “hub and spoke service architecture.” That is quite a metaphorical shift.

Here’s another passage I highlighted:

the head of EMC Global Services Big Data Delivery team, termed this a “Hub and Spoke” analytics environment where the data lake is the “hub” that enables the data science teams to self-provision their own analytic sandboxes and facilitates the sharing of data, analytic tools and analytic best practices across the different parts of the organization.

I worked through the requisite list of dot points and then came upon a list of confusions for which I was prepared by the lake wheel juxtaposition. One confusion warrants some of my attention: “Create multiple data lakes.”

The idea is that an organization needs just “ONE [emphasis in original] data lake;

a singular repository where all of the organizations data – whether the organization knows what to do with that data or not – can be made available.  Organizations such as EMC are leveraging technologies such as virtualization to ensure that a single data lake repository can scale out and meet the growing analytic needs of the different business units – all from a single data lake.

I can hear Daffy as vivified by Mr. Blanc saying, “Do me a big data favor and scold anyone who starts talking about data lakes (plural) instead of a data lake.”

Okay, scold.

EMC, as I understand the firm’s strategy, is contemplating this action: The company has considered selling itself to one of its subsidiaries.

There you go. An example of a hub and spoke, data lake type analysis applied to storage. Why do I need a data lake?

That’s all folks.

Stephen E Arnold, August 25, 2015

Enterprise Search: MarkLogic Cheerleader Is Surprised

August 25, 2015

Navigate to this link. You will need a LinkedIn account. Lucky you. Here’s the “comment” about a mid tier consulting firm’s magic whozit. The remark amused me:

It’s crazy to me that MarkLogic is not even on the list. All I can say is Gartner is making a mistake by forgetting it. I’m no expert on targeted marketing or how big the enterprise search market is vs the operational db market. But I know MarkLogic as a company is going after the operational db market instead. Yet almost all our customers deploy search applications. And I work for MarkLogic because after hundreds of ES [enterprise search] projects, MarkLogic was my favorite engine by far to install/use.

Well, crazy is as crazy does. My reaction to this comment is a question, “Isn’t MarkLogic an SGML database?” Even Oracle’s aged alternative can be searched, but the internals are, I hate to say it, a database. Bummer.

However, MarkLogic has some aspects which appear to lure mid tier wizards:

  1. MarkLogic is proprietary NoSQL. I think there are some open source NoSQL alternatives. Gartner’s experts seem to prefer proprietary solutions, not the community goodies.
  2. MarkLogic is getting long in the tooth. The company was founded in 2001, which, based on my lousy math, is 14 years ago. Ah, technology does march on with the JSON thing, the Elastic gizmos, and an appetite for continued cash infusions. According to Crunchbase, MarkLogic has sucked in $176.6 million in funding with the most recent infusion coming in May 2015. I heard that a couple of years ago, MarkLogic was in the $6 million range. If that number was close to reality, the company has to get its dancing shoes on and win the international tango competition.
  3. MarkLogic “helped power the US government site.” I remember reading something about that Web site. Any publicity is good publicity as the saying goes.

Is MarkLogic a unicorn or just another endangered species? Sorry. No answers in Harrod’s Creek. We just use open source software. Works okay. Can’t beat the price either.

Stephen E Arnold, August 25, 2015

Software AG Revenue Drifts Downward Even with JackBe Technology

August 18, 2015

JackBe was an interesting intelligence system. In 2013, Software AG purchased JackBe, and the cyber OSINT brand dropped off my radar. In the 2013 news release, the company explained its positioning in this way:

Software AG (FRA: SOW) helps organizations achieve their business objectives faster. The company’s big data, integration and business process technologies enable customers to drive operational efficiency, modernize their systems and optimize processes for smarter decisions and better service. Building on over 40 years of customer-centric innovation, the company is ranked as a leader in 15 market categories, fueled by core product families Adabas and Natural, ARIS, Terracotta, webMethods and also Alfabet and Apama. Software AG has ca. 5,300 employees in 70 countries and had revenues of €1.05 billion in 2012

With a flurry of management changes, Software AG describes itself this way 24 months after the JackBe deal:

Software AG (Frankfurt TecDAX: SOW) helps organizations achieve their business objectives faster. The company’s big data, integration and business process technologies enable customers to drive operational efficiency, modernize their systems and optimize processes for smarter decisions and better service. Building on over 40 years of customer-centric innovation, the company is ranked as a leader in 14 market categories, fueled by core product families Adabas-Natural, ARIS, Alfabet, Apama, Terracotta and webMethods. Software AG has more than 4,400 employees in 70 countries and had revenues of €858 million in 2014.

Notice that the company is smaller in revenues and staff. There was also a stock market shift. The JackBe technology does not appear to have provided the type of lift I anticipated.
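The two boilerplate paragraphs make the shrinkage easy to quantify. Here is a minimal sketch in Python that uses only the figures quoted above. Note that “ca. 5,300” and “more than 4,400” are approximate head counts, so the staff percentage is a rough estimate at best. The helper name `percent_decline` is hypothetical.

```python
# Rough size of the decline between the two company descriptions.
# Head counts are approximate in the source ("ca. 5,300", "more than 4,400"),
# so treat the staff percentage as an estimate.

REVENUE_2012_EUR_M = 1050   # EUR 1.05 billion, from the 2013 release
REVENUE_2014_EUR_M = 858    # EUR 858 million, from the later description
STAFF_EARLIER = 5300        # "ca. 5,300 employees"
STAFF_LATER = 4400          # "more than 4,400 employees"


def percent_decline(before: float, after: float) -> float:
    """Return the drop from before to after as a percentage of before."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


revenue_drop = percent_decline(REVENUE_2012_EUR_M, REVENUE_2014_EUR_M)
staff_drop = percent_decline(STAFF_EARLIER, STAFF_LATER)

print(f"Revenue decline: {revenue_drop:.1f}%")
print(f"Staff decline (approximate): {staff_drop:.1f}%")
```

On these numbers, revenue is off roughly 18 percent and head count roughly 17 percent.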

Stephen E Arnold, August 18, 2015



Advice for Smart SEO Choices

August 11, 2015

We’ve come across a well-penned article about the intersection of language and search engine optimization by The SEO Guy. Self-proclaimed word-aficionado Ben Kemp helps website writers use their words wisely in, “Language, Linguistics, Semantics, & Search.” He begins by discrediting the practice of keyword stuffing, noting that search-ranking algorithms are more sophisticated than some give them credit for. He writes:

“Search engine algorithms assess all the words within the site. These algorithms may be bereft of direct human interpretation but are based on mathematics, knowledge, experience and intelligence. They deliver very accurate relevance analysis. In the context of using related words or variations within your website, it is one good way of reinforcing the primary keyword phrase you wish to rank for, without over-use of exact-match keywords and phrases. By using synonyms, and a range of relevant nouns, verbs and adjectives, you may eliminate excessive repetition and more accurately describe your topic or theme and at the same time, increase the range of word associations your website will rank for.”

Kemp goes on to lament the dumbing down of English-language education around the world, blaming the trend for a dearth of deft wordsmiths online. Besides recommending that his readers open a thesaurus now and then, he also advises them to make sure they spell words correctly, not because algorithms can’t figure out what they meant to say (they can), but because misspelled words look unprofessional. He even supplies a handy list of the most often misspelled words.

The development of more and more refined search algorithms, it seems, presents the opportunity for websites to craft better copy. See the article for more of Kemp’s language, and SEO, guidance.

Cynthia Murrell, August 11, 2015



Data Companies Poised to Leverage Open Data

July 27, 2015

Support for open data, government datasets freely available to the public, has taken off in recent years; the federal government’s launch of in 2009 is a prominent example. Naturally, some companies have sprung up to monetize this valuable resource. The New York Times reports, “Data Mining Start-Up Enigma to Expand Commercial Business.”

The article leads with a pro bono example of Enigma’s work: a project in New Orleans that uses that city’s open data to identify households most at risk for fire, so the city can give those folks free smoke detectors. The project illustrates the potential for good lurking in sets of open data. But make no mistake, the potential for profits is big, too.  Reporter Steve Lohr explains:

“This new breed of open data companies represents the next step, pushing the applications into the commercial mainstream. Already, Enigma is working on projects with a handful of large corporations for analyzing business risks and fine-tuning supply chains — business that Enigma says generates millions of dollars in revenue.

“The four-year-old company has built up gradually, gathering and preparing thousands of government data sets to be searched, sifted and deployed in software applications. But Enigma is embarking on a sizable expansion, planning to nearly double its staff to 60 people by the end of the year. The growth will be fueled by a $28.2 million round of venture funding….

“The expansion will be mainly to pursue corporate business. Drew Conway, co-founder of DataKind, an organization that puts together volunteer teams of data scientists for humanitarian purposes, called Enigma ‘a first version of the potential commercialization of public data.’”

Other companies are getting into the game, too, leveraging open data in different ways. There’s Reonomy, which supplies research to the commercial real estate market. Seattle-based Socrata makes data-driven applications for government agencies. Information discovery company Dataminr uses open data in addition to Twitter’s stream to inform its clients’ decisions. Not surprisingly, Google is a contender with its Sidewalk Labs, which plumbs open data to improve city living through technology. Lohr insists, though, that Enigma is unique in the comprehensiveness of its data services. See the article for more on this innovative company.


Cynthia Murrell, July 27, 2015

