
Search the Snowden Documents

July 16, 2015

This cat has long since forgotten what the inside of the bag looked like. Have you perused the documents that were released by Edward Snowden, beginning in 2013? A website simply titled “Snowden Doc Search” will let you do just that through a user-friendly search system. The project’s Description page states:

“The search is based upon the most complete archive of Snowden documents to date. It is meant to encourage users to explore the documents through its extensive filtering capabilities. While users are able to search specifically by title, description, document, document date, and release date, categories also allow filtering by agency, codeword, document topic, countries mentioned, SIGADS, classification, and countries shared with. Results contain not only full document text, pdf, and description, but also links to relevant articles and basic document data, such as codewords used and countries mentioned within the document.”

The result of teamwork between the Courage Foundation and Transparency Toolkit, the searchable site is built upon the document and news story archive maintained by the Edward Snowden Defense Fund. The site’s Description page also supplies links to the raw dataset and to Transparency Toolkit’s GitHub page, for anyone who would care to take a look. Just remember, “going incognito doesn’t hide your browsing from your employer, your internet service provider, or the websites you visit.” (Chrome)

Cynthia Murrell, July 16, 2015

Sponsored by, publisher of the CyberOSINT monograph

Pew, Pew, Phew: Bad News for Real Publishers

July 15, 2015

I am not a real publisher. I am mostly retired. I live in Harrod’s Creek, Kentucky. Google thinks I am in Greenspring, Kentucky. The mail person thinks I live in Louisville. The newspapers to which I subscribe think I am Tyson Arnold. Tyson, as you may recall, was one of my prized boxers.

Publishers, in short, don’t know that my dog reads their dead tree outputs. Ah, the life away from the hustle, bustle, tweets, and Facebook posts of the major metropolitan areas.

But apparently, even here, where the AR-15s lay waste to the squirrels, news comes via means other than printed publications. Bummer.

Navigate to “New Pew Data: More Americans Are Getting News on Facebook and Twitter.” I like the sonance of the “new pew” juxtaposition. But, to business. The write up reports:

Facebook and Twitter users across all demographics are increasingly using the social networks as news sources, though they are seeking out different types of news content on each platform…

The article points to a Pew research report, which I don’t think I will scrutinize. (I have a juicy new document from Recorded Future and a couple of European Community reports about the Dark Web.)

You, gentle reader, should plan to scrutinize the data in the study. For me, the report is old news.

For publishers, the Pew data in the study are a knife to the heart. I saw knives plunged into these outfits’ torsos years ago.

Everyone seems to recognize that “real publishers” may be facing some challenges when they try to pump up those revenues. The only outfits who seem to be unaware of their plight are—wait for it—the publishers themselves.

Okay, back to more substantive stuff, not “the real world impact of journalism.”

Stephen E Arnold, July 15, 2015

Page Load Speed: Let Us Blame Those in Suits

July 14, 2015

I read “News Sites Are Fatter and Slower Than Ever.” Well, I am not sure about “ever.” I recall when sites simply did not work. Those sites never worked. You can check out the turtles if you can grab a peek at a crawler’s log file. Look for nifty codes like 2000, 4, or 12. Your mileage may vary, but the log file tells the tale.

The write up aims at news sites. My hunch is that the definition of a news site is one of those top one percent things: The user is looking for information from a big name and generally clueless outfit like The Daily Whatever or a mash up of content from hither and yon.

Enter latency, lousy code, crazy ads from half baked ad servers, and other assorted craziness.

The write up acknowledges that different sites deliver different response times. Okay.

If you are interested in data, the article presents an interesting chart. You can see home page load times with and without ads. There’s a chart which shows page load times via different mobile connections.

The main point, in my opinion, is a good one:

Since its initial release 22 years ago, the Hyper Text Markup Language (HTML) has gone through many iterations that make web sites richer and smarter than ever. But this evolution also came with loads of complexity and a surfeit of questionable features. It’s time to swing the pendulum back toward efficiency and simplicity. Users are asking for it and will punish those who don’t listen.

My hunch is that speed is a harsh taskmaster. In our work, we have found that at many points in a process, resources are often constrained or poorly engineered. As a result, each new layer of digital plaster contributes to the sluggishness of a system.

Unless one has sufficient resources (money, expertise, and time), lousy performance is the new norm. The Google rails and cajoles because slowdowns end up costing my favorite search engine big bucks.

Most news sites do not get the message and probably never will. The focus is on another annoying overlay, pop up, or inline video.

Click away, gentle reader, click away. Many folks see the browser as the new Windows 3.11. Maybe browsers are the new Windows 3.11?

Stephen E Arnold, July 14, 2015

Elsevier and Its Business Model May Be Ageing Fast

July 13, 2015

If you need to conduct research and are not attached to a university or academic library, you are going to get hit with huge subscription fees for access to quality material. This is especially true for the scientific community, but on the Internet, where there is a will, there most certainly is a way. Material locked behind a subscription service can often be found if you dig around the Internet long enough, mostly from foreign countries, but the material is often pirated. Gizmodo reports in the article “Academic Publishing Giant Fights To Keep Science Paywalled” that Elsevier, one of the largest academic publishers, is angry about its content being stolen and shared on third party sites. Elsevier recently filed a complaint with the New York District Court against Library Genesis and SciHub:

“The sites, which are both popular in developing countries like India and Indonesia, are a treasure trove of free pdf copies of research papers that typically cost an arm and a leg without a university library subscription. Most of the content on Libgen and SciHub was probably uploaded using borrowed or stolen student or faculty university credentials. Elsevier is hoping to shut both sites down and receive compensation for its losses, which could run in the millions.”

Gizmodo acknowledges that Elsevier has a right to complain, but it also flips the argument by pointing out that access to quality scientific research material is expensive. The article brings up Netflix’s entertainment offerings: Netflix users pay a flat fee every month and have access to thousands of titles. Netflix remains popular because it remains cheap, and the company openly acknowledges that it sets its prices to be competitive against piracy sites.

Publishers and authors should be compensated for their work, and it is well known that academics do not rake in millions, but access to academic works should be less expensive. Following Netflix’s model or offering a subscription service like Amazon Prime might be a better approach.

Whitney Grace, July 13, 2015

Misinformation and Truth: An Issue in Play

July 6, 2015

Navigate to “Italian Newspaper Creates Fake Restaurant to Prove TripAdvisor Sucks.” The article tells the story of a real journalistic operation which created a nonexistent restaurant. Then the real journalists contributed reviews of the vaporous eatery. TripAdvisor’s algorithms sucked in the content and, according to the write up,

declared La Scaletta the best restaurant in the town, beating out another highly-regarded restaurant with over 300 reviews (most of them positive).

Ah, real journalism, truth, and the manipulation of socially-anchored systems.

Now direct your attention to “Fact Verification As Easy as Spellcheck?” The point of this article is that figuring out what’s accurate and what’s inaccurate is nontrivial. The write up reports:

Researchers at Indiana University decided to try a different approach to the problem.  Instead of trying to build complex logic into a program, researchers proposed something simpler.  Why not try measure the likelihood of a statement being true by analyzing the proximity of its terms and the specificity of its connectors?

The procedure involves a knowledge graph. Is this the same, much loved graph approach built with the most frequently used mathematical methods? No information to answer that question is in my files, gentle reader.
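The write up offers only a one-line description of the Indiana approach: score a statement by the proximity of its terms and the specificity of its connectors. Below is a minimal Python sketch of one way to read that, assuming a knowledge graph built from subject, predicate, object triples. It finds the path between the two terms whose intermediate nodes are most specific, charging each hop through a node the logarithm of that node’s degree, so generic, heavily connected hubs count for less. The function names, the toy triples, and the exact scoring rule are illustrative assumptions, not the researchers’ code.

```python
import heapq
import math
from collections import defaultdict


def build_graph(triples):
    """Build an undirected adjacency map from (subject, predicate, object) triples.

    Assumption: the predicate is ignored; only which terms are linked matters.
    """
    adj = defaultdict(set)
    for subject, _predicate, obj in triples:
        adj[subject].add(obj)
        adj[obj].add(subject)
    return adj


def truth_score(adj, source, target):
    """Hypothetical score in [0, 1]: 1 / (1 + cheapest path cost), or 0.0 if no path.

    Path cost is the sum of log(degree) over intermediate nodes, so a direct
    edge costs 0 (score 1.0) and detours through generic hubs cost more.
    Computed with Dijkstra's algorithm, which is valid because log(degree) >= 0.
    """
    if source == target:
        return 1.0
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return 1.0 / (1.0 + cost)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry; a cheaper route was already found
        for nxt in adj[node]:
            # Reaching the target is free; passing through any other node
            # costs the log of its degree (its lack of specificity).
            step = 0.0 if nxt == target else math.log(len(adj[nxt]))
            new_cost = cost + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return 0.0


if __name__ == "__main__":
    toy_triples = [  # hypothetical toy facts
        ("Rome", "capital_of", "Italy"),
        ("Italy", "member_of", "EU"),
        ("France", "member_of", "EU"),
        ("Germany", "member_of", "EU"),
        ("Paris", "capital_of", "France"),
    ]
    graph = build_graph(toy_triples)
    print(truth_score(graph, "Rome", "Italy"))            # 1.0 (direct edge)
    print(round(truth_score(graph, "Rome", "Paris"), 3))  # 0.287 (long path via EU hub)
```

A direct link scores 1.0, a long chain through a hub scores lower, and unconnected terms score 0.0. The sketch also shows why such scores could be gamed: whoever can add triples can shorten paths.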

My radar is directed at Bloomington, Indiana. Perhaps more information will become available on software’s ability to figure out whether the Italian restaurant is real or the confection of real journalists. Note: The GOOG seems to be laboring in this vineyard as well. See this Bezos story.

What if (just hypothetically, of course) the “truth” methods can be spoofed by procedures more sophisticated than cooking up some half cooked tortellini? Those common numerical methods are pliable, based on my team’s research. Really flexible when it comes to what’s “truth.”

Stephen E Arnold, July 6, 2015

Google: Is This an X Lab for Real Journalists?

June 23, 2015

I have a colleague who retired. The newspaper for which he worked continued to make life interesting for those over the age of 55. I assume that other real journalists have discovered that the appetite for those born after 1950 is changing. Bring on the younger journalism grads. YouTube savvy? Great. A high traffic blog about veganism? Come on down. A Web site which is a magnet for Python programmers? Hey, want to work for us?

When I read “Introducing the News Lab,” I had four different thoughts:

  1. What a great idea
  2. Quite a pool of unemployed, underemployed, and would-be professionals to tap
  3. How many publishers are like hungry bass in a big lake at a fishing tournament?
  4. How many journalists know how to make Google’s system sing and dance like a headliner at a vaudeville show?

According to the write up:

It’s hard to think of a more important source of information in the world than quality journalism. At its best, news communicates truth to power, keeps societies free and open, and leads to more informed decision-making by people and leaders. In the past decade, better technology and an open Internet have led to a revolution in how news is created, distributed, and consumed. And given Google’s mission to ensure quality information is accessible and useful everywhere, we want to help ensure that innovation in news leads to a more informed, more democratic world.

There you go. What about the right to be forgotten, filtering, predictive search results, and ads? Once again I am mashing up the math club’s manifesto with reality.

The idea is that the journalists embracing the GOOG will use the GOOG to produce content. I learned:

There’s a revolution in data journalism happening in newsrooms today, as more data sets and more tools for analysis are allowing journalists to create insights that were never before possible. To help journalists use our data to offer a unique window to the world, last week we announced an update to our Google Trends platform. The new Google Trends provides journalists with deeper, broader, and real-time data, and incorporates feedback we collected from newsrooms and data journalists around the world. We’re also helping newsrooms around the world tell stories using data, with a daily feed of curated Google Trends based on the headlines of the day, and through partnerships with newsrooms on specific data experiments.

The attentive reader will notice that I have removed the numerous links in the article. Clicking around in the middle of an important article is not something I do or encourage.

Will the News Lab deliver the benefits journalists expect and the benefit some folks need? Will Google “put wood behind” this initiative or will it suffer the same fate as Web Accelerator? Will the service generate more magnetism than the many news efforts nosing into the datasphere? Will publishers jump with glee because Google empowers new content?

No answers yet.

Stephen E Arnold, June 23, 2015

Publishers Want to Dejuice Apple, Squash It

June 22, 2015

I read “Publishers Slam Apple over Presumptuous News App Conditions.” Publishers presumptuous? I know of one publisher who used my research and marketed it on Amazon without my permission. Was that presumptuous of IDC and its wizard Dave Schubmehl?

According to the write up:

Publishers are up in arms following an email from Apple about inclusion in the firm’s upcoming News application and the kind of conditions that will be imposed. The email said that participants are presumed to have accepted Apple’s terms unless they explicitly opt out. It’s the old opt-out over opt-in thing.

Yes, up in arms. I can see the publishers at the New York Athletic Club wielding their squash rackets with malice. My goodness, what a chilling thought. What if those white clad clubsters were to descend on the Apple store in Manhattan and threaten the geniuses?

My fears subsided when I read:

The service will draw content from publicly available RSS feeds, and it is possible that Apple will be challenged, according to one expert, but not in any really meaningful way.

My concern for a Squash Assault receded. Publishers may have to retire to the Yacht Club to find another option.

Stephen E Arnold, June 22, 2015

Big Data and Old, Incomplete Listicles

June 19, 2015

I enjoy lists of the most important companies, the top 25 vendors of a specialized service, and a list of companies I should monitor. Wonderful stuff because I encounter firms about which I have zero information in my files and about which I have heard nary a word.

An interesting list appears in “50 Big Data Companies to Follow.” The idea is that I should set up a Google Alert for each company and direct my Overflight system to filter content mentioning these firms. The problem with this post is that the information does not originate with Datamation or Data Science Center. The list was formulated by Sand Hill in a story called “Sand Hill 50 ‘Swift and Strong’ in Big Data.” The list was compiled before its publication in January 2014, which makes it 18 months old. Given the speed of change in Big Data, the list is, in my opinion, stale.

A similar list appears in “CRN 50 Big Data Business Analytics Companies,” which is posted on the CRN Web site. This list appears to date from the middle of 2014, which makes it about a year old. Better, but not fresh.

I did locate an update called “2015 Big Data 100: Business Analytics.” Locating a current list of Big Data companies was not easy. Presumably my search skills are subpar. Nevertheless, the list is interesting.

Here are some firms in Big Data which were new to me:

  • Guavus
  • Knime
  • Zoomdata

But the problem was that the CRN Web site presented only 46 vendors, not 100.

Three observations:

  • Datamation is using its feed to push out links to old content originating on other publishers’ Web sites
  • The obscurity of the names is the defining characteristic of these lists
  • Getting a comprehensive, current list of Big Data vendors is difficult. Data Science just listed 15 companies and back linked to Sand Hill. CRN displayed 46 companies but forced me to click on each listing. I could not view the entire list.

Not too useful, folks.

Stephen E Arnold, June 19, 2015

The Digital Gutenbergs Spur Their Chargers. Giddyap.

June 18, 2015

Forget the mom and pop app. A couple of big outfits are going to select and present information you will consume. Choice? Well, for those who are [a] busy, [b] unable to read, and [c] saddled with short attention spans, your life is going to be just peachy.

The first rumble comes from lovable Apple. Navigate to “Apple Inc. To Hire Journalists For Curated Content On News App.” I highlighted this passage:

Apple’s decision to hire journalists is the latest example of fusion between news media and tech companies. In the last few years, many social networks such as Facebook, Twitter, and LinkedIn, have hired editors and reporters from high profile news media, such as NBC and News Corp. Recently, Snapchat also hired reporters from CNN, and The Verge, a tech site.

The article reminded me that Facebook is ambling down a content path as well.

The next item is that the recruiting tool LinkedIn is going to use humans to “tailor news.” The details, which I assume are spot on, appear in “LinkedIn Brings Back Human Editors to Tailor News to You.” I circled this statement:

But to compete with these other products, Kothari knows that Pulse must offer something different. It’s the “world’s first personalized business news digest,” he says. More importantly, perhaps, LinkedIn’s Pulse is bringing back human editors, not just algorithms, to tailor the news you see to what it already knows about you. And yet it may not be alone—Apple is reportedly planning to curate news with the help of humans, too.

Also, the GOOG, already armed with APIs and the warm and fuzzy news service, is taking another baby step into content. The story I printed out is called “A New Window into Our World with Real Time Trends.” Yep, just family, because it is “our world.” Google says:

On the new, you’ll find a ranked, real-time list of trending stories that are gaining traction across Google. In addition to Search, we now look at trends from YouTube and Google News and combine them to better understand what topics and stories are trending across the web right now. The redesigned homepage is now available in 28 countries around the world, and we’ll continue to add more locations in the coming months.

What’s the impact of these digital Gutenberg twirls?

My initial reaction is that and similar services will be doing some talking with their investors. Whatever money these news recyclers have is probably not going to be enough to deal with the Apples, Facebooks, Googles, and LinkedIns of the world. Heck, LinkedIn may need more dough too.

Second, are there enough readers to allow each of these services to meet the expectations of the spreadsheet jockeys who project revenues? My hunch is that the answer is, “Nope.” More concentration ahead, I opine.

And, third, what about the old line publishing companies which continue to pretend that their products and services are exactly what the market wants? More pain and not much gain, I assume.

Exciting times for the digital Gutenbergs? Too bad my study, Google: The Digital Gutenberg, is out of print. If you are curious about this trend, let me know and I will spin up a PDF of that original study. Write

Stephen E Arnold, June 18, 2015

What Twitter Should Do: The New York Times Opines with Woulda, Coulda, Shoulda Ideas

June 14, 2015

Well, advice from the gray lady about what a digital company should do is fascinating. Frankly, I would be more inclined to go with Snoop Dogg than a newspaper which seems to have made floundering and gesticulating its principal business strategy since Jeff Pemberton walked out the door 40 years ago.


Navigate to “For Twitter, Future Means Here and Now.” Keep in mind that this link may require you to pay money or go on an Easter Egg Hunt to locate a hard copy of the newspaper. Not my problemo, gentle reader. It is the dead tree New York Times’ approach to information.

Here’s one of the passages I circled in yellow and then marked with a black Sharpie exclamation point:

Twitter, as a service, is many things to many people at different times. It is one of the world’s best sources for news and for jokes about news, a playground for professional networking, and a haven for that most human of pastimes, idle gossip. But because the service offers so many uses, Twitter, as a company, has had trouble focusing on one purpose for which it should aim to excel. The lack of concentration has damaged its prospects with users, investors and advertisers. Choosing a single intent for Twitter — and working to make that a reality — ought to be the next chief’s main task. Among the many uses that Twitter fulfills as a social network, there is one it is uniquely suited for: as a global gathering space for live events. When something goes down in the real world — when a plane crashes, an earthquake strikes, a basketball game gets crazy, or Kanye West hijacks an awards show — Twitter should aim to become the first and only app that people load up to comment on the news.

There you go. Make Twitter into a human intermediated version of the New York Times, lite edition. More data, less filling, and you trim your IQ as well.

I find it fascinating that journalistic enterprises mired in revenue, profit, and innovation swamps have advice to give to digital companies. I wonder if the gray lady assumes that the stakeholders, Twitter management, and the advisers to the firm have failed to craft options, ideas, tactics, and strategies.

My hunch is that Internet centric communication services like Twitter ride a curve up due to novelty and apparent utility. Then a new thing comes along like WhatsApp or Jott, and the potential users of the older service just surf newness. Once the cachet fades, a phenomenon with which the New York Times may be familiar, the options just don’t deliver.

Amusing to me, however.

Stephen E Arnold, June 14, 2015
