Data Companies Poised to Leverage Open Data

July 27, 2015

Support for open data, government datasets freely available to the public, has taken off in recent years; the federal government’s launch of in 2009 is a prominent example. Naturally, some companies have sprung up to monetize this valuable resource. The New York Times reports, “Data Mining Start-Up Enigma to Expand Commercial Business.”

The article leads with a pro bono example of Enigma’s work: a project in New Orleans that uses that city’s open data to identify households most at risk for fire, so the city can give those folks free smoke detectors. The project illustrates the potential for good lurking in sets of open data. But make no mistake, the potential for profits is big, too.  Reporter Steve Lohr explains:

“This new breed of open data companies represents the next step, pushing the applications into the commercial mainstream. Already, Enigma is working on projects with a handful of large corporations for analyzing business risks and fine-tuning supply chains — business that Enigma says generates millions of dollars in revenue.

“The four-year-old company has built up gradually, gathering and preparing thousands of government data sets to be searched, sifted and deployed in software applications. But Enigma is embarking on a sizable expansion, planning to nearly double its staff to 60 people by the end of the year. The growth will be fueled by a $28.2 million round of venture funding….

“The expansion will be mainly to pursue corporate business. Drew Conway, co-founder of DataKind, an organization that puts together volunteer teams of data scientists for humanitarian purposes, called Enigma ‘a first version of the potential commercialization of public data.’”

Other companies are getting into the game, too, leveraging open data in different ways. There’s Reonomy, which supplies research to the commercial real estate market. Seattle-based Socrata makes data-driven applications for government agencies. Information discovery company Dataminr uses open data in addition to Twitter’s stream to inform its clients’ decisions. Not surprisingly, Google is a contender with its Sidewalk Labs, which plumbs open data to improve city living through technology. Lohr insists, though, that Enigma is unique in the comprehensiveness of its data services. See the article for more on this innovative company.


Cynthia Murrell, July 27, 2015

Sponsored by, publisher of the CyberOSINT monograph

Scribd Obtains Social Reading

July 22, 2015

Access to books and other literary material has reached an unprecedented high.  People can download and read millions of books with a few simple clicks.  Handheld ebook readers are curtailing the sales of printed books, but they also are increasing sales of digital books.  One of the good things about ebooks is that bibliophiles do not have to drive to a bookstore or get waitlisted at the library.  Writers can also sell their material directly to readers and potentially bypass paying agents and publishers.

It occurred to someone that bibliophiles would love to have instant access to a huge library of books, similar to how Netflix offers its customers an unending video library.  There is one and it is called Scribd.  Scribd is described as the Netflix of books, because for a simple $8.99 bibliophiles can read and download as many books as they wish.

The digital landscape is still being tested by book platforms and Scribd has increased its offerings.  VentureBeat reports Scribd’s newest business move in: “Scribd Buys Social Reading App Librify.” Librify is a social media reading app, offering users the opportunity to connect with friends and share their reading experiences.  It is advertised as a great app for book clubs.

“In a sparse press release, Scribd argues Librify’s “focus on the social reading experience” made the deal worthwhile. The news arrives at a heated time for the publishing industry, as Amazon, Oyster, and others all fight to be the definitive Netflix for books — all while hawking remarkably similar products.”

Netflix has its own rivals: Hulu, Amazon Prime, Vimeo, and YouTube, but it offers something different by creating new and original shows.  Scribd might be making a similar business move by offering an original service its rivals do not have.  Will it also offer Scribd-only books?

Whitney Grace, July 22, 2015

On Embedding Valuable Outside Links

July 21, 2015

If media websites take this suggestion from an article at Monday Note, titled “How Linking to Knowledge Could Boost News Media,” there will be no need to search; we’ll just follow the yellow brick links. Writer Frederic Filloux laments the current state of affairs, wherein websites mostly link to internal content, and describes how embedded links could be much, much more valuable. He describes:

“Now picture this: A hypothetical big-issue story about GE’s strategic climate change thinking, published in the Wall Street Journal, the FT, or in The Atlantic, suddenly opens to a vast web of knowledge. The text (along with graphics, videos, etc.) provided by the news media staff, is amplified by access to three books on global warming, two Ted Talks, several databases containing references to places and people mentioned in the story, an academic paper from Knowledge@Wharton, a MOOC from Coursera, a survey from a Scandinavian research institute, a National Geographic documentary, etc. Since (supposedly), all of the above is semanticized and speaks the same lingua franca as the original journalistic content, the process is largely automatized.”

Filloux posits that such a trend would be valuable not only for today’s Web surfers, but also for future historians and researchers. He cites recent work by a couple of French scholars, Fabian Suchanek and Nicoleta Preda, who have been looking into what they call “Semantic Culturonomics,” defined as “a paradigm that uses semantic knowledge bases in order to give meaning to textual corpora such as news and social media.” Web media that keeps this paradigm in mind will wildly surpass newspapers in the role of contemporary historical documentation, because good outside links will greatly enrich the content.

Before this vision becomes reality, though, media websites must be convinced that linking to valuable content outside their site is worth the risk that users will wander away. The write-up insists that a reputation for providing valuable outside links will more than make up for any amount of such drifting visitors. We’ll see whether media sites agree.

Cynthia Murrell, July 21, 2015


US Government and Proprietary Databases: Will Procurement Roadblocks Get Set Up before October 1, 2015?

July 20, 2015

I don’t do the government work stuff anymore. Too old. But some outfits depend on the US government for revenue. I should write “Depend a lot.”

I read “Why Government Needs Open Source Databases.” The article is one of those which is easily overlooked. With the excitement changing like a heartbeat, “database” and “government” are not likely to capture the attention of the iPhone and Android crowd.

I found the article interesting. I learned:

Open source solutions offer greater flexibility in pricing models as well. In some cases, vendors offering open source  databases price on a subscription-based model that eliminates the licensing fees common to large proprietary systems. An important element to a subscription is that it qualifies as an operating expense versus a more complex capital expenditure. Thus, deploying open source and open source-based databases become a simpler process and can cost 80 to 90 percent less than traditional solutions. This allows agencies to refocus these resources on innovation and key organizational drivers.

Wow, cheaper. Maybe better? Maybe faster?

The article raises an interesting topic—security. I assumed that the US government was “into” security. Each time I read disinformation about the loss of personnel data or a misplaced laptop with secret information on its storage device, I am a doubter.

But the article informs me:

Data security has always been and will continue to remain a major priority for government agencies,  given the sensitive and business-critical nature of the information they collect. Some IT departments may be skeptical of the security capabilities of open source solutions. Gartner’s 2014 Magic Quadrant for Operational Database Management Systems showed that open source database solutions are being used successfully in mission-critical applications in a large number of organizations. In addition, mature open source solutions today implement the same, if not better, security capabilities of traditional infrastructures. This includes SQL injection prevention, tools for replication and failover, server-side code protections, row-level security and enhanced auditing features, to name a few. Furthermore, as open source technology, in general, becomes more widely accepted across the public sector – intelligence, civilian and defense agencies across the federal government have adopted open source – database solutions are also growing with specific government mandates, regulations and requirements.

I knew it. Security is job one, well, maybe job two after cost controls. No, no, cost controls and government activities do not compute in my experience.

Open source database technology may be the horse the government knights can ride to the senior executive service. If open source data management systems get procurement love, what does that mean for IBM and Oracle database license fees?

Not much. The revenue comes from services, particularly when things go south. The license fees are malleable, often negotiable. The fees for service continue to honk like golden geese.

Net net: Money will remain the same, just be taken from a different category of expense. In short, the write up is a good effort, but offers little in the way of bad news for the big database vendors. On October 1, 2015, not much change in the flowing river of government expenditures which just keep rising like the pond filled with mine drainage near my hovel in Kentucky.

Stephen E Arnold, July 20, 2015

Publishers Out Of Sorts…Again

July 20, 2015

Here we go again, the same old panic song that has been sung around the digital landscape since the advent of portable devices: the publishing industry is losing money. The Guardian reports on how mobile devices are now hurting news outlets: “News Outlets Face Losing Control To Apple, Facebook, And Google.”

The news outlets are losing money as users move to mobile devices to access the news via Apple, Facebook, and Google. The article shares a bunch of statistics supporting this claim, which only backs up facts people already knew.

It does make a sound suggestion: traditional news outlets could change their business model, possibly by teaming with the platforms through which people now consume their news.

Here is a good rebuttal, however:

“ ‘Fragmentation of news provision, which weakens the bargaining power of journalism organisations, has coincided with a concentration of power in platforms,’ said Emily Bell, director of the Tow Center at Columbia university, in a lead commentary for the report.”

Seventy percent of mobile device users have a news app on their phone, but only a third of them use it at least once a week. Only diehard loyalists are returning to the traditional outlets and paying a subscription fee for the services. The rest of the time they turn to social media for their news.

This is not anything new. These outlets will adapt, because despite social media’s popularity there is still something to be said for a viable and trusted news outlet, that is, if you can trust the outlet.

Whitney Grace, July 20, 2015


Hadoop Rounds Up Open Source Goodies

July 17, 2015

Summer time is here and what better way to celebrate the warm weather and fun in the sun than with some fantastic open source tools.  Okay, so you probably will not take your computer to the beach, but if you have a vacation planned one of these tools might help you complete your work faster so you can get closer to that umbrella and cocktail.  Datamation has a great listicle focused on “Hadoop And Big Data: 60 Top Open Source Tools.”

Hadoop is one of the most widely adopted open source tools for big data solutions.  The Hadoop market is expected to be worth $1 billion by 2020 and IBM has dedicated 3,500 employees to develop Apache Spark, part of the Hadoop ecosystem.

As open source is a huge part of the Hadoop landscape, Datamation’s list provides invaluable information on tools that could mean the difference between a successful project and failed one.  Also they could save some extra cash on the IT budget.

“This area has a seen a lot of activity recently, with the launch of many new projects. Many of the most noteworthy projects are managed by the Apache Foundation and are closely related to Hadoop.”

Datamation has maintained this list for a while and updates it from time to time as the industry changes.  The list isn’t ranked on a comparison scale, with one being the best; rather, the tools are grouped into categories and a short description explains what each tool does. The categories include: Hadoop-related tools, big data analysis platforms and tools, databases and data warehouses, business intelligence, data mining, big data search, programming languages, query engines, and in-memory technology.  There is a tool for nearly every sort of problem that could come up in a Hadoop environment, so the listicle is definitely worth a glance.

Whitney Grace, July 17, 2015


The Skin Search

July 15, 2015

We reported on how billboards in Russia were getting smarter by using facial recognition software to hide ads advertising illegal products when they recognized police walking by.  Now the US government might be working on technology that can identify patterns on tattoos, reports Quartz in, “The US Government Wants Software That Can Detect And Interpret Your Tattoos.”

The Department of Justice, Department of Defense, and the FBI sponsored a competition that the National Institute of Standards and Technology (NIST) recently held on June 8 to research ways to identify ink:

“The six teams that entered the competition—from universities, government entities, and consulting firms—had to develop an algorithm that would be able to detect whether an image had a tattoo in it, compare similarities in multiple tattoos, and compare sketches with photographs of tattoos. Some of the things the National Institute of Standards and Technology (NIST), the competition’s organizers, were looking to interpret in images of tattoos include swastikas, snakes, dragons, guns, unicorns, knights, and witches.”

The idea is to use visual technology to track tattoos among crime suspects and relational patterns. Vision technology, however, is still being perfected.  Companies like Google and major universities are researching ways to make headway in the technology.

While the visual technology can be used to track suspected criminals, it can also be used for other purposes.  One implication is responding to accidents as they happen instead of recording them.  Tattoo recognition is the perfect place to start given the inked variety available and correlation to gangs and crime.  The question remains, what will they call the new technology, skin search?

Whitney Grace, July 15, 2015


Does America Want to Forget Some Items in the Google Index?

July 8, 2015

The idea that the Google sucks in data without much editorial control is just now grabbing brain cells in some folks. The Web indexing approach has traditionally allowed the crawlers to index what was available without too much latency. If there were servers which dropped a connection or returned an error, some Web crawlers would try again. Our Point crawler just kept on truckin’. I like the mantra, “Never go back.”

Google developed a more nuanced approach to Web indexing. The link thing, the popularity thing, and the hundred plus “factors” allowed the Google to figure out what to index, how often, and how deeply (no, grasshopper, not every page on a Web site is indexed with every crawl).

The notion of “right to be forgotten” amounts to a third party asking the GOOG to delete an index pointer in an index. This is sort of a hassle and can create some exciting moments for the programmers who have to manage the “forget me” function across distributed indexes and keep the eager beaver crawler from reindexing a content object.
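The hassle is easier to see in a toy model. The sketch below is an illustration only, not a description of how Google works: the class names, the three-shard layout, and the example URLs are all assumptions made up for this example. It shows the two halves of a “forget me” function: delete the pointer from every index partition, and keep a suppression list so the crawler cannot put the document back.

```python
class Shard:
    """One partition of a hypothetical inverted index: term -> set of URLs."""

    def __init__(self):
        self.postings = {}

    def add(self, url, terms):
        for term in terms:
            self.postings.setdefault(term, set()).add(url)

    def remove(self, url):
        # Drop the URL from every posting list and discard empty lists.
        for term in list(self.postings):
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]


class ForgettingIndex:
    """Toy distributed index with a suppression list (all names hypothetical)."""

    def __init__(self, shard_count=3):
        self.shards = [Shard() for _ in range(shard_count)]
        self.suppressed = set()

    def _shard_for(self, url):
        # Deterministic placement: sum of the URL's bytes modulo shard count.
        return self.shards[sum(url.encode("utf-8")) % len(self.shards)]

    def index(self, url, text):
        # The crawler calls this; suppressed URLs are refused, not reindexed.
        if url in self.suppressed:
            return False
        self._shard_for(url).add(url, text.lower().split())
        return True

    def forget(self, url):
        # Record the request first, then purge the pointer from every shard.
        self.suppressed.add(url)
        for shard in self.shards:
            shard.remove(url)

    def search(self, term):
        hits = set()
        for shard in self.shards:
            hits |= shard.postings.get(term.lower(), set())
        return sorted(hits)
```

After a call to `forget(url)`, a later `index(url, ...)` from the crawler returns `False` and queries no longer return that URL. A production system would also have to deal with replicas, caches, and near-duplicate copies of the document, which this sketch ignores.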

The Google has to provide this type of third party editing for most of the requests from individuals who want one or more documents to be “forgotten”; that is, no longer in the Google index which the public users’ queries “hit” for results.

I read “Google Is Facing a Fight over Americans’ Right to Be Forgotten.” The write up states:

Consumer Watchdog’s privacy project director John Simpson wrote to the FTC yesterday, complaining that though Google claims to be dedicated to user privacy, its reluctance to allow Americans to remove ‘irrelevant’ search results is “unfair and deceptive.”

I am not sure how quickly the various political bodies will move to make being forgotten a real thing. My hunch is that it will become an issue with legs. Down the road, the third party editing is likely to be required. The First Amendment is a hurdle, but when it comes time to fund a campaign or deal with winning an election, there may be some flexibility in third party editing’s appeal.

From my point of view, an index is an index. I have seen some frisky analyses of my blog articles and my for fee essays. I am not sure I want criticism of my work to be forgotten. Without an editorial policy, third party, ad hoc deletion of index pointers distorts the results as much, if not more, than results skewed by advertisers’ personal charm.

How about an editorial policy and then the application of that policy so that results are within applicable guidelines and representative of the information available on the public Internet?

Wow, that sounds old fashioned. The notion of an editorial policy is often confused with information governance. Nope. Editorial policies inform the database user of the rules of the game and what is included and excluded from an online service.

I like dinosaurs too. Like a cloned brontosaurus, is it time to clone the notion of editorial policies for corpus indices?

Stephen E Arnold, July 8, 2015

Compound Search Processing Repositioned at ConceptSearching

July 2, 2015

The article titled Metadata Matters; What’s The One Piece of Technology Microsoft Doesn’t Provide On-Premises Or in the Cloud? on ConceptSearching re-introduces Compound Search Processing, ConceptSearching’s main offering. Compound Search Processing is a technology, developed in 2003, that can identify multi-word concepts and the relationships between words. Compound Search Processing is being repositioned, with Concept Searching apparently chasing SharePoint sales. The article states,

“The missing piece of technology that Microsoft and every other vendor doesn’t provide is compound term processing, auto-classification, and taxonomy that can be natively integrated with the Term Store. Take advantage of our technologies and gain business advantages and a quantifiable ROI…

Microsoft is offering free content migration for customers moving to Office 365…If your content is mismanaged, unorganized, has no value now, contains security information, or is an undeclared record, it all gets moved to your brand new shiny Office 365.”

The angle for Concept Searching is metadata and indexing, and they are quick to remind potential customers that “search is driven by metadata.” The offerings of ConceptSearching come with the promise that it is the only platform that will work with all versions of SharePoint while delivering their enterprise metadata repository. For more information on the technology, see the new white paper on Compound Term Processing.
Chelsea Kerwin, July 2, 2015



CSC Attracts Buyer And Fraud Penalties

July 1, 2015

According to the Reuters article “Exclusive: CACI, Booz Allen, Leidos Eyes CSC’s Government Unit-Sources,” CACI International, Leidos Holdings, and Booz Allen Hamilton Holdings have expressed interest in Computer Sciences Corp’s public sector division.  There are not a lot of details about the possible transaction as it is still in the early stages, so everything is still hush-hush.

The possible acquisition came after the news that CSC will split into two divisions: one that serves US public sector clients and the other dedicated to global commercial and non-government clients.  CSC has an estimated $4.1 billion in revenues and is worth $9.6 billion, but CACI International, Leidos Holdings, and Booz Allen Hamilton might reconsider the purchase or push for a lower price after hearing this news: “Computer Sciences (CSC) To Pay $190M Penalty; SEC Charges Company And Former Executives With Accounting Fraud” from Street Insider.  The Securities and Exchange Commission is charging CSC and former executives with accounting fraud, with a $190 million penalty, for hiding financial information and problems resulting from the contract they had with their biggest client.  CSC and the executives, of course, are contesting the charges.

“The SEC alleges that CSC’s accounting and disclosure fraud began after the company learned it would lose money on the NHS contract because it was unable to meet certain deadlines. To avoid the large hit to its earnings that CSC was required to record, Sutcliffe allegedly added items to CSC’s accounting models that artificially increased its profits but had no basis in reality. CSC, with Laphen’s approval, then continued to avoid the financial impact of its delays by basing its models on contract amendments it was proposing to the NHS rather than the actual contract. In reality, NHS officials repeatedly rejected CSC’s requests that the NHS pay the company higher prices for less work. By basing its models on the flailing proposals, CSC artificially avoided recording significant reductions in its earnings in 2010 and 2011.”

Oh boy!  Is it a wise decision to buy a company that has a history of stealing money and hiding information?  If the company’s root products and services are decent, the buyers might get it for a cheap price and recondition the company.  Or it could lead to another disaster like HP and Autonomy.

Whitney Grace, July 1, 2015

