The Myth of Data Federation: Not a New Problem, Not One Easily Solved

July 8, 2020

I read “A Plan to Make Police Data Open Source Started on Reddit.” The main point of this particular article is:

The Police Data Accessibility Project aims to request, download, clean, and standardize public records that right now are overly difficult to find.

Interesting, but I interpreted the Silicon Valley centric write up differently. If you are a marketer of systems which purport to normalize disparate types of data, aggregate them, federate indexes, and make the data accessible, analyzable, retrievable, and bang on dead simple — stop reading now. I don’t want to deal with squeals from vendors about their superior systems.

For the individual reading this sentence, a word of advice. Fasten your seat belt.

Some points to consider when reading the article cited above, listening to a Vimeo “insider” sales pitch, or just doing techno babble with your Spin class pals:

  1. Dealing with disparate data requires time and money as well as NOT ONE but multiple software tools.
  2. Even with a well resourced and technologically adept staff, exceptions require attention. A failure to deal with the stuff in the Exceptions folder can skew the outputs of some Fancy Dan analytic systems. Example: How about that Detroit facial recognition system? Nifty, eh?
  3. The flows of real time data are a big problem and (are you ready for this?) a challenge even to the Facebooks, Googles, and Microsofts of the world. The reason is that handling the volume of data and CHANGES TO THOSE ALREADY PROCESSED ITEMS OF INFORMATION is a very, very tough problem. No, faster processors, bigger pipes, and zippy SSDs won’t do the job. The trouble lies within: the intradevice and intra software module flow. The fix is to sample, and sampling increases the risk of inaccuracies. Example: Remember Detroit’s facial recognition accuracy? The arrested individual may share some impressions with you.
  4. The baloney about “all” data or “any” type is crazy talk. When one deals with more than 18,000 police forces in the US, outputs from surveillance devices from different vendors, and the geodumps of individuals and their ad tracking beacons — this is going to be mashed up and made usable. Noble idea. There are many noble ideas.
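
To make points 1 and 2 concrete, here is a minimal Python sketch of what normalizing records from different sources involves, and why the Exceptions folder matters. Everything in it is hypothetical: the field names (`incident_date`, `date`, `agency`), the two date formats, and the `normalize` helper are illustrations only, not anything taken from the Police Data Accessibility Project or from a vendor's system.

```python
from datetime import datetime

# Hypothetical date formats used by two imaginary agencies.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(raw):
    """Try each known format; return an ISO date string, or None if none fit."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize(records):
    """Split records into a clean list and an exceptions list."""
    clean, exceptions = [], []
    for rec in records:
        # Different sources name the same field differently.
        raw = rec.get("incident_date") or rec.get("date") or ""
        iso = parse_date(raw)
        if iso is None:
            exceptions.append(rec)  # a human has to look at this one
        else:
            clean.append({"agency": rec.get("agency", "unknown"), "date": iso})
    return clean, exceptions


records = [
    {"agency": "A", "incident_date": "2020-07-01"},
    {"agency": "B", "date": "07/02/2020"},
    {"agency": "C", "date": "July 3rd, 2020"},  # matches no known format
]
clean, exceptions = normalize(records)
print(len(clean), "clean,", len(exceptions), "in the exceptions folder")
```

Three records, two formats handled, and one agency already sits in the exceptions pile. Ignore that pile and every downstream count is quietly missing agency C. Now scale the idea to 18,000 police forces.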

Why am I taking the time to repeat what anyone with experience in large scale data normalization and analysis knows?

Baloney can be thinly sliced, smeared with gochujang, and served on Delft plates. Know what? Still baloney.

Gobble this:

Still, data is an important piece of understanding what law enforcement looks like in the US now, and what it could look like in the future. And making that information more accessible, and the stories people tell about policing more transparent, is a first step.

But the killer assumption is that the humans involved don’t make errors, systems remain online, and file formats are forever.

That baloney. It really is incredible. Just not what you think.

Stephen E Arnold, July 8, 2020

Looking for News Like the Hawaii Volcano Eruption?

June 28, 2018

With the problem of fake news online, the news itself has often made headlines of late. We’ve noticed a couple different news-related moves from big players: TechCrunch reports on Google’s recent project in, “Google Experiments in Local News with an App Called Bulletin.” We learn that Apple, meanwhile, plans to integrate its recent acquisition, magazine aggregator Texture, into Apple News and an upcoming subscription service in, “Apple Said to Plan a ‘Netflix for News’ in Latest Push” at the Daily Herald.

Google’s Bulletin is a place for members of a community to post local news and event notices. TechCrunch’s Sarah Perez suspects it’s also another attempt by Google to squeeze into the social media space. She observes:

“The move to delve into local news would have Google competing with other services where people already share news about what’s happening locally. Specifically, people tend to tweet or live stream when news is breaking …. Meanwhile, if they’re trying to promote a local event …  it’s likely that they’ll post that to the business’s Facebook Page, where it can then be discovered through the Page’s fans and surfaced in Facebook’s Local app. And if Google aims to more directly compete with local news resources like small-town print or online publishers or Patch, it could have a tougher road. Hyperlocal news has been difficult to monetize, and those who have made it work aren’t likely interested in shifting their limited time and energy elsewhere.”

Over at the Daily Herald, reporters Mark Gurman and Gerry Smith of Bloomberg note that Apple cut 20 Texture workers shortly after acquiring the company, but we’re cautioned against reading too much into that. The article notes:

“An upgraded Apple News app with the subscription offering is expected to launch within the next year, and a slice of the subscription revenue will go to magazine publishers that are part of the program, [sources] said. … A new, simplified subscription service covering multiple publications could spur Apple News usage and generate new revenue in a similar manner to the $9.99 per month Apple Music offering.”

Will enough folks pay per month for news, like they do for (other) online entertainment? Perhaps now, when it is prudent to be skeptical, people are willing to pay up to 10 bucks a month for a trusted name. We shall see.

What’s clear is that when one looks for “news” about the Hawaii volcano, few of the online news services are useful. In order to keep up to date, old fashioned search, review, and read processes are the norm. Want current videos about the eruption on YouTube? Good luck with that too. Comprehensiveness may be impossible with free or low cost services, but chronological tags and spam content filtering could be helpful.

For slow moving lava, what’s the rush?

Cynthia Murrell, June 28, 2018

Filtered Content: Tactical Differences between Dow Jones and Thomson Reuters

December 5, 2017

You may know that Dow Jones has an online search company. The firm is called Factiva, and it takes an old-school approach to finding information. The company recently announced a deal with an outfit called Curation. Founded by a former newspaper professional, Curation uses mostly humans to assemble reports on hot topics. Factiva is reselling these services and advertising for customers in the Wall Street Journal. Key point: This is mostly a manual method. The approach is more in line with the types of “reports” available from blue chip consulting firms.

You may also know that Thomson Reuters has been rolling out machine curated reports. These have many different product names. Thomson Reuters has a large number of companies and brands. Not surprisingly, Thomson’s approach has to apply to many companies managed by executives who compete with regular competitors like Dow Jones but also among themselves. Darwin would have loved Thomson Reuters. The point is that Thomson Reuters’ approach relies on “smart” software.

You can read about Dow Jones’ play here.

You can read about Thomson Reuters’ play here.

My take is that these two different approaches reflect the painful fact that there is no clear path forward for professional publishing companies. In order to make money from electronic information, two of the major players are still experimenting. The digital revolution began, what?, about 40 years ago.

One would have thought that leading companies like Dow Jones and Thomson Reuters would have moved beyond the experimental stage and into cash cow land.

Not yet, it seems. The reason for my pointing out these two different approaches is that there are more innovative methods available. For snapshots of companies which move beyond the Factiva and Thomson methods, watch Dark Cyber. A new program is available every Tuesday via YouTube at this link.

Stephen E Arnold, December 5, 2017

Newsreaders through Time

December 30, 2015

Those chart mavens at CBInsights have produced another timeline for wild and crazy Internet services. “The Rise and Fall of Venture Backed News Readers” makes clear the long odds traditional news producers face when trying to find a business model. The chart is a shopping list of case studies for MBA programs. The idea of providing “news” to the hungry minds with mobile devices and sci fi laptops seems to be a bit of a challenge. For investors, these services trigger opportunities to explain why their investments did not perform particularly well. The chart, intentionally or unintentionally, causes Flipboard to stand out from the crowd. It may be the red logo and bold faced type. Alternatively, Flipboard has managed to attract money over the last five years. The chart makes clear why an average millennial may want to take a vacation instead of investing in a newsreader start up.

Stephen E Arnold, December 30, 2015

Link Mischief at Feedly

January 8, 2014

Here is some content excitement of interest to journalists and bloggers everywhere. MakeUseOf informs us that “Feedly Was Stealing Your Content—Here’s the Story, and Their Code.” Apparently, the aggregation site was directing shared links to copies on their own site instead of to original articles, essentially stealing traffic. Writer James Bruce, eager to delve deeper into the code, makes it clear that he is following up on a discovery originally revealed by The Digital Reader.

For example, the article notes that Feedly is now sending links to the proper sites, but by way of JavaScript code instead of in the usual, server-level way. Bruce also noticed that, in its attempt to improve functionality, Feedly was stripping embedded items from content. Advertising, tracking, share buttons, even “donate” buttons—gone.
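
For readers wondering what the difference between a server-level redirect and a JavaScript redirect amounts to, here is a rough Python sketch that builds the two kinds of HTTP response as plain strings. It is not Feedly's code, which is not reproduced here; the function names and the example.com URL are made up for illustration.

```python
import json


def server_redirect(target):
    """Server-level redirect: the status line and Location header do the work."""
    return f"HTTP/1.1 301 Moved Permanently\r\nLocation: {target}\r\n\r\n"


def javascript_redirect(target):
    """Client-side redirect: an ordinary 200 page whose script moves the browser on."""
    # json.dumps quotes the URL so it is a valid JavaScript string literal.
    body = f"<script>window.location.replace({json.dumps(target)});</script>"
    return (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode('utf-8'))}\r\n\r\n"
        f"{body}"
    )


original = "https://example.com/original-post"  # hypothetical article URL
print(server_redirect(original))
print(javascript_redirect(original))
```

The practical difference: any client that reads HTTP headers can follow the 301 to the original article, while the second response only forwards visitors whose software actually runs the script. Anything that does not execute JavaScript simply sees a page living at the aggregator's address.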

Bruce writes:

“Not only were Feedly scraping the content from your site, they were then stripping any original social buttons and rewriting the meta-data. This means that when someone subsequently shared the item, they would in fact be sharing the Feedly link and not the original post. Anyone clicking on that link would go straight to Feedly.

So what, you might ask? When a post goes viral, it can be of huge benefit to the site in question — raising page views and ad revenues, and expanding their audience. Feedly was outright stealing that specific benefit away from the site to expand its own user base. The Feedly code included checks for mobile devices that would direct the users to the relevant appstore page.

It wasn’t ‘just making the article easier to view’ — it was stealing traffic, plain and simple. That’s really not cool.”

The write-up goes on to detail the ways Feedly has responded to discoveries, where the issue stands now, and “what we have learnt”: Feedly made some bad choices in the pursuit of a streamlined reading experience. As a parting shot, Bruce cites another example of a bad call by the company—it briefly required a Google+ account to log in. He has a point there.

Cynthia Murrell, January 08, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Give A Hand For Old Fashioned Journalistic Bribery

March 26, 2013

A top news story used to either make or break a reporter. It can still do so today, but the old channels are mostly closed and monitored by the Internet beat. Reporters used to have to bribe sources, and the best information continues to come from the source. In a throwback to the old days, The Guardian says, “Wall Street Journal Blames Beijing Troublemaking For US Bribery Probe.” The accusation is that the Wall Street Journal’s China office bribed government officials with expensive gifts in exchange for information. The US Justice Department was already investigating the Journal’s parent company, News Corporation, under the Foreign Corrupt Practices Act.

News Corporation, upset over the allegations, believes that someone simply wants to make trouble for the Journal. The company also believes a Chinese government agent tipped off authorities. Its internal investigation did not find anything wrong.

How did this happen?

“The newspaper believes the bribery allegation came in relation to the Journal’s reporting of events in Chongqing, the province in which disgraced Chinese official Bo Xilai once had a power base.”

and:

“The report also comes in the wake of claims that China has hacked into the systems of US newspapers – allegations that are denied by Beijing.”

The proper authorities are conducting further investigation, while the US, England, and China argue back and forth, name-calling and the like. The new Chinese premier, Li Keqiang, even made a statement that everyone should forget this event and concentrate on preventing further cyber attacks. That will happen only in a perfect world, or if something bigger comes along, like North Korea gaining an atom bomb.

Whitney Grace, March 26, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Agile Solutions for Big Data Make Exabytes Less Overwhelming

November 14, 2012

Many narratives follow the phrase “big data,” and ZDNet discusses the story that EMC tells about it. Katharin Winkler, vice president of corporate sustainability for EMC, explained the elusive concept in layman’s terms at the Verge @ Greenbuild summit. The main reason this concept needs to be brought down to a lower level is that big data is affecting everyone’s lives. It has effectively “escaped the data center.”

Back in 2000, two exabytes of new information were created in the world. In 2011, Winkler said, the world was creating more than two exabytes of new information every day.

In the article, “EMC Explains Making Big Data More Concrete to General Public” we learned about EMC’s strategy:

Winkler briefly outlined EMC’s overall strategy, dubbed “The Human Face of Big Data,” which is designed to make big data more comprehensible for everyday Internet users. That strategy includes a book of the same name being published later this month, which features images from more than 150 photojournalists worldwide, demonstrating that basically every moment of our lives can now be chronicled in the cloud.

The possibilities with big data may seem overwhelming at times. Inherently, the opportunities are endless. However, these insights and information can only be delivered to decision-makers with the proper infrastructure technologies in place. We have had our eyes on PolySpot for their agile solutions in this department.

Megan Feil, November 14, 2012


Health Information Exchanges Making Progress

September 15, 2012

It looks like the healthcare field may finally be entering the twenty-first century. Agilex informs us that “Maine’s HealthInfoNet Supports CDC Program to Demonstrate the Preventive Care Value of Health Information Exchanges.” We believe Health Information Exchanges (HIEs) are a promising sector for analytics, and we look forward to following their growth.

The CDC program referred to in the title is long-windedly called “Demonstrating the Preventive Care Value of Health Information Exchanges”, and is being led by Agilex. In 2009, Maine was one of the first states to launch an HIE, a system that is maintained by HealthInfoNet. Since they have had time to work out any kinks, and because almost 80 percent of Maine residents have at least one record in the system, that state is the first to participate in the program.

The press release states:

“HealthInfoNet is using an open-source application called popHealth to de-identify, aggregate and securely transmit clinical quality measures to the Maine Center for Disease Control and Prevention (Maine CDC). Sponsored by the Office of the National Coordinator for Health IT (ONC), popHealth was developed to automate reporting of meaningful use measures from a provider’s electronic health record system while ensuring de-identification of the transmitted data. The application was selected for this program due to its ability to create population-level data that has been de-identified at both the patient and provider level. This population-level data can be used to inform statewide public health and heart disease prevention strategies.”
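
What “de-identify, aggregate” means in practice can be sketched in a few lines of Python. This is not popHealth's code or its data model; the rows, the field names (`patient_id`, `provider_id`, `county`, `bp_controlled`), and the helper functions are invented purely to illustrate the idea of turning patient-level records into population-level counts.

```python
from collections import Counter

# Hypothetical patient-level rows; field names and values are illustrative only.
rows = [
    {"patient_id": "p1", "provider_id": "d1", "county": "Cumberland", "bp_controlled": True},
    {"patient_id": "p2", "provider_id": "d1", "county": "Cumberland", "bp_controlled": False},
    {"patient_id": "p3", "provider_id": "d2", "county": "Penobscot", "bp_controlled": True},
]

IDENTIFIERS = {"patient_id", "provider_id"}


def deidentify(row):
    """Drop the direct patient and provider identifiers from one row."""
    return {key: value for key, value in row.items() if key not in IDENTIFIERS}


def aggregate(rows):
    """Count patients, and patients with controlled blood pressure, per county."""
    totals, controlled = Counter(), Counter()
    for row in map(deidentify, rows):
        totals[row["county"]] += 1
        if row["bp_controlled"]:
            controlled[row["county"]] += 1
    return {c: {"patients": totals[c], "controlled": controlled[c]} for c in totals}


summary = aggregate(rows)
print(summary)
```

Only the county-level counts leave the function. A real system has to do considerably more than drop two columns (small counts, for instance, can still point to an individual), so treat this as a picture of the concept and nothing more.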

It sounds like popHealth is a valuable resource. Another important piece of the puzzle is the open source CONNECT platform, which allows HIEs to share data externally, yet securely, via the Nationwide Health Information Network. See the article for more details.

Headquartered close to DC in Chantilly, Virginia, Agilex serves clients in federal, state, and local governments as well as corporations. They supply mission and technology consulting, software and solution development, and system integration services. In a nod to the company’s commitment to quality, their name combines “agility” with “expertise”. Agilex was founded in 2007.

Cynthia Murrell, September 15, 2012


KiteDesk Aggregates Cloud Services with Actionable Data

September 8, 2012

KiteDesk, a company focused on integrating multiple cloud services in one location, got a major redesign this week for the company’s official launch. According to the TechCrunch article about the service, titled “KiteDesk Goes Where Greplin Failed: Aggregates Cloud Services for Search, Discovery & Interoperability,” the platform lets users connect email, contacts, calendar events, documents from social networking, and more in a KiteDesk account. From there, users can search all of these services at once and organize the data. KiteDesk is not the first company to try to aggregate the cloud, but most other startups have not fared well.

The article gives this insight:

“[…]KiteDesk co-founder and CEO Jack Kennedy says that he thinks companies that have attempted to compete in this space have been too narrowly focused to achieve the goals that are emerging for this class of software. ‘We see Personalized Information as a “Macro Trend” that’s buttressed by other trends like BYOD, consumerization of I.T., and a gradually diminishing line between personal and professional systems,’ he explains.”

KiteDesk may succeed where others have failed by focusing more on letting users move files between services and creating streams to customize data instead of simply searching and sharing. The company is currently taking sign-ups for the free service and we look forward to seeing more from this niche.

Andrea Hayden, September 08, 2012


Informative Paper on Patents

September 3, 2012

Many folks are alarmed and confused about the current state of technology patents, and rightly so. We have found an interesting paper that explains in great detail what has been happening, why and how, and what the trajectory means for the future. To be sure, “The Giants Among Us” (PDF) from Stanford Technology Law Review is not a coffee-break-length piece. It is, however, full of important facts, insights, and observations. A must-read for anyone concerned about today’s tech patent landscape.

The paper, written by Tom Ewing and Robin Feldman, begins with this observation:

“The patent world is quietly undergoing a change of seismic proportions. In a few short years, a handful of entities have amassed vast treasuries of patents on an unprecedented scale. To give some sense of the magnitude of this change, our research shows that in a little more than five years, the most massive of these has accumulated 30,000-60,000 patents worldwide, which would make it the 5th largest patent portfolio of any domestic US company and the 15th largest of any company in the world.

“These entities, which we call mass aggregators, do not engage in the manufacturing of products nor do they conduct much research. Rather, they pursue other goals of interest to their founders and investors.”

Indeed. The rest of the paper supplies facts about such mass aggregators (particularly Intellectual Ventures); gives a nod to potential positive effects; delineates the potential damages from the trend; and wraps up with ideas on what can and should be done. Ewing and Feldman prescribe regulatory oversight, transparency, and undermining trolls’ profit motive.

Excellent research, analysis, and conclusions. But will the FTC and DOJ listen?

Cynthia Murrell, September 03, 2012

