
Google Admits to Being a Copycat

August 28, 2015

In elementary school, one of the biggest insults a child could throw at a classmate was the slur “copycat.”  All children want to create original work, but some feel their skills are subpar compared to the work of a student they believe is superior.  Toss in the old adage that “copying is the sincerest form of flattery,” and you get arguments about patents, theft, and even time outs for those involved.  The Techdirt podcast discussed copying in a recent episode and how big tech companies simply copy the ideas of their rivals and put their own name on it.  The biggest copycat they could find was Google: “The Failure of Google Plus Should Be A Reminder That Big Companies Very Rarely Successfully ‘Copy’ Startups.”

Techdirt points out the fallacy in big companies trying to steal a little startup’s idea:

“As we’ve discussed, in the rare cases when “copying” succeeds, it’s because the second company doesn’t really copy, but actually comes up with a better product, which is something we should celebrate. When they just copy, they tend to only be able to copy the superficial aspects of what they see, rather than all the underlying tacit thinking that makes a product good.”

The article discusses how Google finally admitted that Google Plus was a copy of Facebook, because the search mogul was fearful of losing profit, users, and Web traffic.  The biggest problem Google Plus had was that it was “forced” on people, like the Star Trek Borg assimilating unsuspecting planets.  Okay, maybe that is a bit of a drastic comparison, but startups are still fearful of their ideas being assimilated by bigger companies.  This is where the patent topic comes in, along with the question of whether or not to register for one.

There is good news for startups: “if a startup is doing something really amazing and innovative that people actually want, you can almost always guarantee that (1) the big companies will totally miss the boat for way too long and (2) once they finally wake up, be clumsy and ridiculous in their attempts to copy.”

Techdirt also sums everything up in an eloquent paragraph that explains the logic of this argument:

“People think it’s easy to copy because copying seems like it should be easy. But it’s not. You can only copy the parts you can see, which leaves out an awful lot of understanding and tacit knowledge hidden beneath the surface. It also leaves out all the knowledge of what doesn’t work that the originator has. And, finally, it ignores the competing interests within a larger business that make it much harder for those companies to innovate.”

In other words, do not worry about Borg assimilation if your startup has a good idea, but do be on the defensive and arm yourself with good weapons.

Whitney Grace, August 28, 2015
Sponsored by, publisher of the CyberOSINT monograph

The Data Lake Is a Hub: For Wheel I Tell You

August 25, 2015

When I read “Why Do I Need a Data Lake,” I thought about Mel Blanc. Mr. Blanc was a voice actor who enlivened the Jack Benny Show and Warner Bros. cartoons. For Mr. Benny, Mr. Blanc was the “sound” of the Maxwell automobile and the participant in the famous “Sí…Sy…sew…Sue” routine.

So what? I imagined Mr. Blanc reading aloud the write up to me as Daffy Duck.

Here’s a passage I highlighted and enjoyed:

The data lake has the potential to transform the business by providing a singular repository of all the organization’s data (structured AND unstructured data; internal AND external data) that enables your business analysts and data science team to mine all of organizational data that today is scattered across a multitude of operational systems, data warehouses, data marts and “spreadmarts”. [Emphasis in the original]

Note that the lake has “potential to transform”. I also like the categorical imperative of “all the organization’s data.” I find the “all” notion quite humorous because there are digital data which are not likely to be pooled and processed. One example is data governed by government contracts for which rules of secrecy apply. Another is digital information germane to a legal matter and in the control of the firm’s legal eagles. There are other examples as well. So the “all” is a bobbing buoy. But what the heck is a spreadmart?

But the chortle inducing passage is the conversion of a data lake into a “hub and spoke service architecture.” That is quite a metaphorical shift.

Here’s another passage I highlighted:

the head of EMC Global Services Big Data Delivery team, termed this a “Hub and Spoke” analytics environment where the data lake is the “hub” that enables the data science teams to self-provision their own analytic sandboxes and facilitates the sharing of data, analytic tools and analytic best practices across the different parts of the organization.

I worked through the requisite list of dot points and then came upon a list of confusions for which I was prepared by the lake wheel juxtaposition. One confusion warrants some of my attention: “Create multiple data lakes.”

The idea is that an organization needs just “ONE [emphasis in original]” data lake:

a singular repository where all of the organizations data – whether the organization knows what to do with that data or not – can be made available.  Organizations such as EMC are leveraging technologies such as virtualization to ensure that a single data lake repository can scale out and meet the growing analytic needs of the different business units – all from a single data lake.

I can hear Daffy as vivified by Mr. Blanc saying, “Do me a big data favor and scold anyone who starts talking about data lakes (plural) instead of a data lake.”

Okay, scold.

EMC, as I understand the firm’s strategy, is contemplating this action: The company has considered selling itself to one of its subsidiaries.

There you go. An example of a hub and spoke, data lake type analysis applied to storage. Why do I need a data lake?

That’s all folks.

Stephen E Arnold, August 25, 2015

Enterprise Search: MarkLogic Cheerleader Is Surprised

August 25, 2015

Navigate to this link. You will need a LinkedIn account. Lucky you. Here’s the “comment” about a mid tier consulting firm’s magic whozit. The remark amused me:

It’s crazy to me that MarkLogic is not even on the list. All I can say is Gartner is making a mistake by forgetting it. I’m no expert on targeted marketing or how big the enterprise search market is vs the operational db market. But I know MarkLogic as a company is going after the operational db market instead. Yet almost all our customers deploy search applications. And I work for MarkLogic because after hundreds of ES [enterprise search] projects, MarkLogic was my favorite engine by far to install/use.

Well, crazy is as crazy does. My reaction to this comment is a question, “Isn’t MarkLogic an SGML database?” Even Oracle’s aged alternative can be searched, but the internals are, I hate to say it, a database. Bummer.

However, MarkLogic has some aspects which appear to lure mid tier wizards:

  1. MarkLogic is proprietary NoSQL. I think there are some open source NoSQL alternatives. Gartner’s experts seem to prefer proprietary solutions, not the community goodies.
  2. MarkLogic is getting long in the tooth. The company was founded in 2001, which, based on my lousy math, is 14 years ago. Ah, technology does march on with the JSON thing, the Elastic gizmos, and an appetite for continued cash infusions. According to Crunchbase, MarkLogic has sucked in $176.6 million in funding with the most recent infusion coming in May 2015. I heard that a couple of years ago, MarkLogic was in the $6 million range. If that number was close to reality, the company has to get its dancing shoes on and win the international tango competition.
  3. MarkLogic “helped power the US government site.” I remember reading something about that Web site. Any publicity is good publicity as the saying goes.

Is MarkLogic a unicorn or just another endangered species? Sorry. No answers in Harrod’s Creek. We just use open source software. Works okay. Can’t beat the price either.

Stephen E Arnold, August 25, 2015

Software AG Revenue Drifts Downward Even with JackBe Technology

August 18, 2015

JackBe was an interesting intelligence system. In 2013, Software AG purchased JackBe, and the cyber OSINT brand dropped off my radar. In the 2013 news release, the company explained its positioning in this way:

Software AG (FRA: SOW) helps organizations achieve their business objectives faster. The company’s big data, integration and business process technologies enable customers to drive operational efficiency, modernize their systems and optimize processes for smarter decisions and better service. Building on over 40 years of customer-centric innovation, the company is ranked as a leader in 15 market categories, fueled by core product families Adabas and Natural, ARIS, Terracotta, webMethods and also Alfabet and Apama. Software AG has ca. 5,300 employees in 70 countries and had revenues of €1.05 billion in 2012

With a flurry of management changes, Software AG describes itself this way 24 months after the JackBe deal:

Software AG (Frankfurt TecDAX: SOW) helps organizations achieve their business objectives faster. The company’s big data, integration and business process technologies enable customers to drive operational efficiency, modernize their systems and optimize processes for smarter decisions and better service. Building on over 40 years of customer-centric innovation, the company is ranked as a leader in 14 market categories, fueled by core product families Adabas-Natural, ARIS, Alfabet, Apama, Terracotta and webMethods. Software AG has more than 4,400 employees in 70 countries and had revenues of €858 million in 2014.

Notice that the company is smaller in revenues and staff. There was also a stock market shift. The JackBe technology does not appear to have provided the type of lift I anticipated.

Stephen E Arnold, August 18, 2015



Advice for Smart SEO Choices

August 11, 2015

We’ve come across a well-penned article about the intersection of language and search engine optimization by The SEO Guy. Self-proclaimed word-aficionado Ben Kemp helps website writers use their words wisely in, “Language, Linguistics, Semantics, & Search.” He begins by discrediting the practice of keyword stuffing, noting that search-ranking algorithms are more sophisticated than some give them credit for. He writes:

“Search engine algorithms assess all the words within the site. These algorithms may be bereft of direct human interpretation but are based on mathematics, knowledge, experience and intelligence. They deliver very accurate relevance analysis. In the context of using related words or variations within your website, it is one good way of reinforcing the primary keyword phrase you wish to rank for, without over-use of exact-match keywords and phrases. By using synonyms, and a range of relevant nouns, verbs and adjectives, you may eliminate excessive repetition and more accurately describe your topic or theme and at the same time, increase the range of word associations your website will rank for.”
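The synonym-and-variation idea in the quote can be sketched in a few lines: instead of repeating one exact-match phrase, a page can be checked for how often it covers a keyword “family.” This is a hypothetical illustration, not anything from Kemp’s article; the synonym map and sample sentence are invented.

```python
import re

# Invented, hand-made synonym map for illustration only.
SYNONYMS = {
    "car": {"car", "automobile", "vehicle", "sedan"},
}

def family_coverage(text: str, keyword: str) -> dict:
    """Count exact-match uses of a keyword versus uses of its related words."""
    words = re.findall(r"[a-z]+", text.lower())
    family = SYNONYMS.get(keyword, {keyword})
    exact = words.count(keyword)
    related = sum(1 for w in words if w in family and w != keyword)
    return {"exact": exact, "related": related}

text = "Our car reviews cover every vehicle class, from sedan to SUV."
print(family_coverage(text, "car"))  # {'exact': 1, 'related': 2}
```

One exact match plus two related words covers the topic without the repetition Kemp warns against.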

Kemp goes on to lament the dumbing down of English-language education around the world, blaming the trend for a dearth of deft wordsmiths online. Besides recommending that his readers open a thesaurus now and then, he also advises them to make sure they spell words correctly, not because algorithms can’t figure out what they meant to say (they can), but because misspelled words look unprofessional. He even supplies a handy list of the most often misspelled words.

The development of more and more refined search algorithms, it seems, presents the opportunity for websites to craft better copy. See the article for more of Kemp’s language, and SEO, guidance.

Cynthia Murrell, August 11, 2015

Sponsored by, publisher of the CyberOSINT monograph


Data Companies Poised to Leverage Open Data

July 27, 2015

Support for open data, government datasets freely available to the public, has taken off in recent years; the federal government’s launch of in 2009 is a prominent example. Naturally, some companies have sprung up to monetize this valuable resource. The New York Times reports, “Data Mining Start-Up Enigma to Expand Commercial Business.”

The article leads with a pro bono example of Enigma’s work: a project in New Orleans that uses that city’s open data to identify households most at risk for fire, so the city can give those folks free smoke detectors. The project illustrates the potential for good lurking in sets of open data. But make no mistake, the potential for profits is big, too.  Reporter Steve Lohr explains:

“This new breed of open data companies represents the next step, pushing the applications into the commercial mainstream. Already, Enigma is working on projects with a handful of large corporations for analyzing business risks and fine-tuning supply chains — business that Enigma says generates millions of dollars in revenue.

“The four-year-old company has built up gradually, gathering and preparing thousands of government data sets to be searched, sifted and deployed in software applications. But Enigma is embarking on a sizable expansion, planning to nearly double its staff to 60 people by the end of the year. The growth will be fueled by a $28.2 million round of venture funding….

“The expansion will be mainly to pursue corporate business. Drew Conway, co-founder of DataKind, an organization that puts together volunteer teams of data scientists for humanitarian purposes, called Enigma ‘a first version of the potential commercialization of public data.’”

Other companies are getting into the game, too, leveraging open data in different ways. There’s Reonomy, which supplies research to the commercial real estate market. Seattle-based Socrata makes data-driven applications for government agencies. Information discovery company Dataminr uses open data in addition to Twitter’s stream to inform its clients’ decisions. Not surprisingly, Google is a contender with its Sidewalk Labs, which plumbs open data to improve city living through technology. Lohr insists, though, that Enigma is unique in the comprehensiveness of its data services. See the article for more on this innovative company.


Cynthia Murrell, July 27, 2015

Sponsored by, publisher of the CyberOSINT monograph

Scribd Obtains Social Reading

July 22, 2015

Access to books and other literary material has reached an unprecedented high.  People can download and read millions of books with a few simple clicks.  Handheld ebook readers are curtailing sales of printed books, but they are also increasing sales of digital books.  One of the good things about ebooks is that bibliophiles do not have to drive to a bookstore or get waitlisted at the library.  Writers can also sell their material directly to readers and potentially bypass paying agents and publishers.

It occurred to someone that bibliophiles would love instant access to a huge library of books, similar to how Netflix offers its customers an unending video library.  There is one, and it is called Scribd.  Scribd is described as the Netflix of books, because for a simple $8.99 bibliophiles can read and download as many books as they wish.

The digital landscape is still being tested by book platforms, and Scribd has increased its offerings.  VentureBeat reports Scribd’s newest business move in: “Scribd Buys Social Reading App Librify.” Librify is a social reading app, offering users the opportunity to connect with friends and share their reading experiences.  It is advertised as a great app for book clubs.

“In a sparse press release, Scribd argues Librify’s “focus on the social reading experience” made the deal worthwhile. The news arrives at a heated time for the publishing industry, as Amazon, Oyster, and others all fight to be the definitive Netflix for books — all while hawking remarkably similar products.”

Netflix has its own rivals: Hulu, Amazon Prime, Vimeo, and YouTube, but it offers something different by creating new and original shows.  Scribd might be following a similar business move, by offering an original service its rivals do not have.  Will it also offer Scribd only books?

Whitney Grace, July 22, 2015
Sponsored by, publisher of the CyberOSINT monograph

On Embedding Valuable Outside Links

July 21, 2015

If media websites take this suggestion from an article at Monday Note, titled “How Linking to Knowledge Could Boost News Media,” there will be no need to search; we’ll just follow the yellow brick links. Writer Frederic Filloux laments the current state of affairs, wherein websites mostly link to internal content, and describes how embedded links could be much, much more valuable. He describes:

“Now picture this: A hypothetical big-issue story about GE’s strategic climate change thinking, published in the Wall Street Journal, the FT, or in The Atlantic, suddenly opens to a vast web of knowledge. The text (along with graphics, videos, etc.) provided by the news media staff, is amplified by access to three books on global warming, two Ted Talks, several databases containing references to places and people mentioned in the story, an academic paper from Knowledge@Wharton, a MOOC from Coursera, a survey from a Scandinavian research institute, a National Geographic documentary, etc. Since (supposedly), all of the above is semanticized and speaks the same lingua franca as the original journalistic content, the process is largely automatized.”

Filloux posits that such a trend would be valuable not only for today’s Web surfers, but also for future historians and researchers. He cites recent work by a couple of French scholars, Fabian Suchanek and Nicoleta Preda, who have been looking into what they call “Semantic Culturonomics,” defined as “a paradigm that uses semantic knowledge bases in order to give meaning to textual corpora such as news and social media.” Web media that keeps this paradigm in mind will wildly surpass newspapers in the role of contemporary historical documentation, because good outside links will greatly enrich the content.

Before this vision becomes reality, though, media websites must be convinced that linking to valuable content outside their site is worth the risk that users will wander away. The write-up insists that a reputation for providing valuable outside links will more than make up for any amount of such drifting visitors. We’ll see whether media sites agree.

Cynthia Murrell, July 21, 2015

Sponsored by, publisher of the CyberOSINT monograph

US Government and Proprietary Databases: Will Procurement Roadblocks Get Set Up before October 1, 2015?

July 20, 2015

I don’t do the government work stuff anymore. Too old. But some outfits depend on the US government for revenue. I should write “Depend a lot.”

I read “Why Government Needs Open Source Databases.” The article is one of those which is easily overlooked. With the excitement changing like a heartbeat, “database” and “government” are not likely to capture the attention of the iPhone and Android crowd.

I found the article interesting. I learned:

Open source solutions offer greater flexibility in pricing models as well. In some cases, vendors offering open source  databases price on a subscription-based model that eliminates the licensing fees common to large proprietary systems. An important element to a subscription is that it qualifies as an operating expense versus a more complex capital expenditure. Thus, deploying open source and open source-based databases become a simpler process and can cost 80 to 90 percent less than traditional solutions. This allows agencies to refocus these resources on innovation and key organizational drivers.

Wow, cheaper. Maybe better? Maybe faster?
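The “80 to 90 percent less” claim is really a total-cost-of-ownership comparison: a capital expense (license fee plus annual maintenance) versus a flat operating expense (subscription). A back-of-the-envelope sketch, with every dollar figure invented for illustration (none come from the article):

```python
# Hypothetical 5-year TCO comparison. All numbers are made up to
# illustrate the shape of the arithmetic, not real vendor pricing.
def proprietary_tco(license_fee: float, maintenance_rate: float, years: int) -> float:
    """Up-front license (capex) plus annual maintenance as a fraction of it."""
    return license_fee + license_fee * maintenance_rate * years

def subscription_tco(annual_fee: float, years: int) -> float:
    """Flat annual subscription (opex)."""
    return annual_fee * years

years = 5
prop = proprietary_tco(500_000, 0.22, years)   # $500k license + 22%/yr maintenance
subs = subscription_tco(30_000, years)         # flat $30k/yr subscription

savings = 1 - subs / prop
print(f"proprietary: ${prop:,.0f}  subscription: ${subs:,.0f}")
print(f"savings: {savings:.0%}")
```

With these invented inputs the savings land around 86 percent, which shows how the article’s 80 to 90 percent range can fall out of plausible assumptions.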

The article raises an interesting topic—security. I assumed that the US government was “into” security. Each time I read disinformation about the loss of personnel data or a misplaced laptop with secret information on its storage device, I am a doubter.

But the article informs me:

Data security has always been and will continue to remain a major priority for government agencies,  given the sensitive and business-critical nature of the information they collect. Some IT departments may be skeptical of the security capabilities of open source solutions. Gartner’s 2014 Magic Quadrant for Operational Database Management Systems showed that open source database solutions are being used successfully in mission-critical applications in a large number of organizations. In addition, mature open source solutions today implement the same, if not better, security capabilities of traditional infrastructures. This includes SQL injection prevention, tools for replication and failover, server-side code protections, row-level security and enhanced auditing features, to name a few. Furthermore, as open source technology, in general, becomes more widely accepted across the public sector – intelligence, civilian and defense agencies across the federal government have adopted open source – database solutions are also growing with specific government mandates, regulations and requirements.
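The quote’s “SQL injection prevention” boils down to one discipline: pass user input as data via placeholders, never splice it into the SQL string. A minimal sketch using Python’s built-in sqlite3 module; the table, rows, and payload are invented for illustration:

```python
import sqlite3

# In-memory toy database; names and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, clearance TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

hostile = "x' OR '1'='1"  # classic injection payload

# Unsafe: string formatting splices the payload into the SQL itself,
# so the OR clause matches every row in the table.
unsafe = conn.execute(
    "SELECT name FROM users WHERE name = '%s'" % hostile
).fetchall()

# Safe: the ? placeholder passes the payload as data, never as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(unsafe)  # [('alice',)] -- the injection leaked the row
print(safe)    # [] -- no user is literally named "x' OR '1'='1"
```

The same placeholder pattern is what the row-level security and auditing features mentioned in the quote build on top of.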

I knew it. Security is job one, well, maybe job two after cost controls. No, no, cost controls and government activities do not compute in my experience.

Open source database technology may be the horse the government knights can ride to the senior executive service. If open source data management systems get procurement love, what does that mean for IBM and Oracle database license fees?

Not much. The revenue comes from services, particularly when things go south. The license fees are malleable, often negotiable. The fees for service continue to honk like golden geese.

Net net: Money will remain the same, just be taken from a different category of expense. In short, the write up is a good effort, but offers little in the way of bad news for the big database vendors. On October 1, 2015, not much change in the flowing river of government expenditures which just keep rising like the pond filled with mine drainage near my hovel in Kentucky.

Stephen E Arnold, July 20, 2015

Publishers Out Of Sorts…Again

July 20, 2015

Here we go again, the same old panic song that has been sung around the digital landscape since the advent of portable devices: the publishing industry is losing money. The Guardian reports on how mobile devices are now hurting news outlets: “News Outlets Face Losing Control To Apple, Facebook, And Google.”

The news outlets are losing money as users move to mobile devices to access the news via Apple, Facebook, and Google. The article shares a bunch of statistics supporting this claim, which only backs up facts people already knew.

It does make a sound suggestion: traditional news outlets could change their business model by teaming up with the new channels through which people consume their news.

Here is a good rebuttal, however:

“ ‘Fragmentation of news provision, which weakens the bargaining power of journalism organisations, has coincided with a concentration of power in platforms,’ said Emily Bell, director of the Tow Center at Columbia university, in a lead commentary for the report.”

Seventy percent of mobile device users have a news app on their phone, but only a third of them use it at least once a week. Only diehard loyalists are returning to the traditional outlets and paying a subscription fee for the services. The rest of the time they turn to social media for their news.

This is not anything new. These outlets will adapt, because despite social media’s popularity there is still something to be said for a viable and trusted news outlet, that is, if you can trust the outlet.

Whitney Grace, July 20, 2015

Sponsored by, publisher of the CyberOSINT monograph
