Honkin' News banner

No More Data Mining for Intelligence

August 23, 2016

The U.S. intelligence community will no longer receive information from Dataminr, which serves as a Twitter “fire hose” (Twitter owns five percent of Dataminr). The ThreatPost article “Twitter Turns Off Fire Hose for Intelligence Community” offers the story. A Twitter spokesperson stated the company has a longstanding policy against selling data for surveillance. The Wall Street Journal, however, reported the arrangement was terminated after a CIA test program concluded. The article continues,

“Dataminr is the only company allowed to sell data culled from the Twitter fire hose. It mines Tweets and correlates that data with location data and other sources, and fires off alerts to subscribers of breaking news. Reportedly, Dataminr subscribers knew about the recent terror attacks in Brussels and Paris before mainstream media had reported the news. The Journal said its sources inside the intelligence community said the government isn’t pleased with the decision and hopes to convince Twitter to reconsider.”

User data shared on social media has myriad potential applications for business, law enforcement, education, journalism, and countless other sectors. This story highlights how applications for journalism may be better received than applications for government intelligence. This is something worth noticing.

Megan Feil, August 23, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden/Dark Web meet up on August 23, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233019199/

Content Cannot Be Searched If It Is Not There

August 16, 2016

Google Europe is already dealing with a slew of “right to be forgotten” requests, but Twitter recently had its own fight with a deletion-related issue.  TechCrunch shares the story in “Deleted Tweet Archive PostGhost Shut Down After Twitter Cease And Desist.”  PostGhost was a Web site that archived tweets from famous public figures.  PostGhost gained its own fame for recording deleted tweets.

The idea behind PostGhost was to allow a transparent and accurate record.  The Library of Congress already does something similar, as it archives every Tweet.  Twitter, however, did not like PostGhost and sent the site a cease-and-desist letter threatening to remove its API access.  Apparently, it is illegal to post deleted tweets, a notion that evolved from the European “right to be forgotten” laws.

So is PostGhost or Twitter wrong?

“There are two schools of thought when something like this happens. The first is that it’s Twitter’s prerogative to censor anything and all the things. It’s their sandbox and we just play in it.  The second school of thought says that Twitter is free-riding on our time and attention and in exchange for that they should work with their readers and users in a sane way.”

Twitter is a platform where a small percentage of users, the famous and public figures, instantly reach millions of people when they voice their thoughts.  When these figures put their thoughts on the Internet, their words carry more weight than the average tweet.  Other Web sites archive such content, but it looks like public figures are exempt from this rule.  Why?  I am guessing money is changing hands.


Whitney Grace, August 16, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Snowden Makes Rare Comment on Putin’s Politics

August 15, 2016

I heard an offhand comment from someone living in Russia that President Vladimir Putin was returning the country to a time resembling the Soviet days.  To my western ears, that does not sound good.  Things are about to get worse for Russian citizens due to new measures the government has signed into law.  Yahoo Tech reports in the article “Putin Signs Controversial Anti-Terror Measures Into Law” that these new laws are meant to be anti-terror laws, but are better described as “Big Brother” laws.

The new laws give the government greater surveillance powers over its citizens.  This means that, under the guise of providing extra security, communications-based companies will be forced to store people’s calls, messages, photos, videos, and metadata for three years.  The companies must also allow security services full access to all the data and any encryption tools necessary.  It gets even worse:

“They also criminalise several offences, lower the age of criminal responsibility to 14 for some crimes and extend prison sentences for online crimes like abetting terrorism.  The passage of the bills through Russia’s lower and upper houses of parliament sent shockwaves through the internet and telecoms industries.”

Communications-based companies are worried that the new laws will cut into their profit margins.  It is predicted that the new infrastructure necessary to store the massive amount of data will cost four times the industry’s annual profit.  Some have suggested that a better option would be to tax the entire industry and use that money to build the infrastructure.

The US whistleblower Edward Snowden, currently living in Russia under asylum, made a rare comment on Russia’s politics via Twitter about the new laws:

“ ‘Signing the #BigBrother law must be condemned,’ he said, adding that he would criticise the law despite fearing retaliation from Russian authorities.”

Snowden wrote what is already written on the wall when it comes to Russia: Putin is changing the country for the worse and it is scary to imagine where it will go next.


Whitney Grace, August 15, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


You Do Not Tay?

July 25, 2016

The article titled “Microsoft CaptionBot: AI Image Guessing App Really Isn’t Sure Who Barack Obama Is” on International Business Times assesses Microsoft’s latest attempt at AI, following the catastrophic Twitter robot Tay, which quickly “learned” and repeated some pretty darn offensive ideas about Hitler and Obama. The newly released version, named CaptionBot, is focused on image descriptions. The article states,

“Users are asked to upload any photo to the site, then Microsoft’s AI system attempts to describe what is in the image. The system can recognise celebrities and understands the basics of image composition but…, it isn’t yet perfect… You know when you recognise someone, but can’t quite put your finger on who it is? Caption Bot doesn’t do that, it just fails to even describe what a photo of Barack Obama is, never mind who he might be.”

From the examples, it is clear that while CaptionBot is much better at understanding and defining objects than people, objects often create difficulty as well. An image of a yellow vehicle from Cars was described (without confidence) as a white toilet next to a yellow building. To be sure, if you stare at the image long enough, the toilet shape emerges.


Chelsea Kerwin, July 25, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

There is a Louisville, Kentucky Hidden/Dark Web meet up on July 26, 2016. Information is at this link: http://bit.ly/29tVKpx.


Savanna 4.7 for External Content Links

June 22, 2016

The latest version of Savanna, the collaborative data-visualization platform from Thetus Corporation, has an important new feature—it can now link to external content. The press release at PR Newswire, “Savanna 4.7 Introduces Plugins, Opening ‘A World of New Content’ to Visual Analysis Software,” tells us:

“With Savanna, users can visualize data to document insights mined from complexity and analyze relationships. New in this release are Savanna Plugins. Plugins do more than allow users to import data. The game changer is in the ability to link to external content, leaving the data in its original source. Data lives in many places. Analyzing data from many sources often means full data transformation and migration into a new program.  This process is daunting and exactly what Savanna 4.7 Plugins address. Whether on databases or on the web, users can search all of their sources from one application to enrich a living knowledge base. Plugins also enable Savanna to receive streams of information from sources like RSS, Twitter, geolocators, and others.”

Thetus’ CTO is excited about this release, calling the new feature “truly transformative.” The write-up notes that Plugins opens new opportunities for Thetus to partner with other organizations. For example, the company is working with the natural language processing firm Basis Technology to boost translation and text mining capacities. Founded in 2003, Thetus is based in Portland, Oregon.


Cynthia Murrell, June 22, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Twitter Influential but a Poor Driver of News Traffic

June 20, 2016

A recent report from social analytics firm Parse.ly examined the relationship between Twitter and digital publishers. Nieman Lab shares a few details in, “Twitter Has Outsized Influence, but It Doesn’t Drive Much Traffic for Most News Orgs, a New Report Says.” Parse.ly tapped into data from a couple hundred of its clients, a group that includes digital publishers like Business Insider, the Daily Beast, Slate, and Upworthy.

Naturally, news sites that make the most of Twitter do so by knowing what their audience wants and supplying it. The study found there are two main types of Twitter news posts, conversational and breaking, and each drives traffic in its own way. While conversations can engage thousands of users over a period of time, breaking news produces traffic spikes.

Neither of those findings is unexpected, but some may be surprised that Twitter feeds are not inspiring more visits to publishers’ sites. Writer Joseph Lichterman reports:

“Despite its conversational and breaking news value, Twitter remains a relatively small source of traffic for most publishers. According to Parse.ly, less than 5 percent of referrals in its network came from Twitter during January and February 2016. Twitter trails Facebook, Google, and even Yahoo as sources of traffic, the report said (though it does edge out Bing!)”

Still, publishers are unlikely to jettison their Twitter accounts anytime soon, because that platform offers a different sort of value. One that is, perhaps, more important for consumers. Lichterman quotes the report:

“Though Twitter may not be a huge overall source of traffic to news websites relative to Facebook and Google, it serves a unique place in the link economy. News really does ‘start’ on Twitter.”

And the earlier a news organization knows about a situation, the better. That is an advantage few publishers will want to relinquish.



Cynthia Murrell, June 20, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The Missing Twitter Manual Located

April 7, 2016

Once more we turn to the Fuzzy Notepad and its Pokémon mascot, Eevee.  This time we visited the fuzzy notepad for tips on Twitter.  The 140-character social media platform has a slew of hidden features that do not have a button on the user interface.  Check out “Twitter’s Missing Manual” to read more about these tricks.

It is inconceivable for every feature to have a shortcut on the user interface.  Twitter relies on its users to understand basic features, while experienced users pick up tricks that only come with practice or from reading tips on the Internet.  The problem is:

“The hard part is striking a balance. On one end of the spectrum you have tools like Notepad, where the only easter egg is that pressing F5 inserts the current time. On the other end you have tools like vim, which consist exclusively of easter eggs.

One of Twitter’s problems is that it’s tilted a little too far towards the vim end of the scale. It looks like a dead-simple service, but those humble 140 characters have been crammed full of features over the years, and the ways they interact aren’t always obvious. There are rules, and the rules generally make sense once you know them, but it’s also really easy to overlook them.”

Twitter is a great social media platform, but a headache to use because it never came with an owner’s manual.  Fuzzy Notepad has lined up hints for every conceivable problem, including the elusive advanced search page.


Whitney Grace, April 7, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Retraining the Librarian for the Future

March 28, 2016

The Internet is often described as the world’s biggest library, containing all the world’s knowledge that someone dumped on the floor.  The Internet is the world’s biggest information database as well as the world’s biggest data mess.  Librarians were once the gateway to knowledge management, but they need to ramp up their skills beyond the Dewey Decimal System and database searching.  Librarians need to do more, and Christian Lauersen’s personal blog explains how in “Data Scientist Training for Librarians: Re-Skilling Libraries for the Future.”

DST4L is a boot camp where librarians and other information professionals learn new skills to maintain relevancy.  The most recent session is described as follows:

“DST4L has been held three times in The States and was to be set for the first time in Europe at Library of Technical University of Denmark just outside of Copenhagen. 40 participants from all across Europe were ready to get there hands dirty over three days marathon of relevant tools within data archiving, handling, sharing and analyzing. See the full program here and check the #DST4L hashtag at Twitter.”

Over the course of three days, the participants learned about OpenRefine, a spreadsheet-like application that can be used for data cleanup and transformation.  They also learned about the benefits of GitHub and how to program using Python.  These skills are well beyond the classes taught in library graduate programs, but it is a good sign that the profession is evolving even if the academic side lags behind.
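The kind of cleanup OpenRefine handles (trimming whitespace, normalizing case, collapsing duplicates) can also be sketched in a few lines of plain Python, the other tool the participants learned. The records below are invented for illustration, not from the workshop:

```python
# Hypothetical messy name records, the sort of data that
# OpenRefine-style trim/case/deduplicate operations address.
records = ["  Smith, John ", "smith, john", "DOE, JANE", "Doe, Jane  "]

def clean(name):
    # Trim surrounding whitespace, collapse internal runs of spaces,
    # and normalize capitalization to title case.
    return " ".join(name.strip().title().split())

deduped = sorted(set(clean(r) for r in records))
print(deduped)  # ['Doe, Jane', 'Smith, John']
```

Four raw rows collapse to two canonical entries, which is essentially what OpenRefine’s clustering features do interactively at larger scale.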

Whitney Grace, March 28, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Open Source Academic Research Hub Resurfaces on the Dark Web

March 11, 2016

Academics are no strangers to the shadowy corners of the Dark Web. In fact, as “The Research Pirates of the Dark Web,” published by The Atlantic, reports, one university student in Kazakhstan populated the Dark Web with free access to academic research after her website, Sci-Hub, was shut down in accordance with a legal case brought to court by the publisher Elsevier. Sci-Hub has existed under a few different domain names on the web since then, continuing its service of opening the floodgates to release paywalled papers for free. The article tells us,

“Soon, the service popped up again under a different domain. But even if the new domain gets shut down, too, Sci-Hub will still be accessible on the dark web, a part of the Internet often associated with drugs, weapons, and child porn. Like its seedy dark-web neighbors, the Sci-Hub site is accessible only through Tor, a network of computers that passes web requests through a randomized series of servers in order to preserve visitors’ anonymity.”

The open source philosophy continues to emerge in various sectors: technology, academia, and beyond. And while the Dark Web appears to be primed for open source proponents to prosper, it will be interesting to see what takes shape. As the article points out, other avenues exist; scholars may make public requests for paywalled research via Twitter using the hashtag #icanhazpdf.


Megan Feil, March 11, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


How-To Overview of Building a Data Platform to Handle Real-Time Datasets

March 11, 2016

The article on Insight Data Engineering titled “Building a Streaming Search Platform” offers a glimpse into the Fellows Program, wherein grad students and software engineers alike build data platforms and learn cutting-edge open source technologies. The article delves into the components of the platform, which enables close to real-time search of a streaming text data source, with Twitter as an example. It also explores the usefulness of such a platform:

“On average, Twitter users worldwide generate about 6,000 tweets per second. Obviously, there is much interest in extracting real-time signal from this rich but noisy stream of data. More generally, there are many open and interesting problems in using high-velocity streaming text sources to track real-time events. … Such a platform can have many applications far beyond monitoring Twitter…All code for the platform I describe here can be found on my github repository Straw.”

Ryan Walker, a Casetext data engineer, describes how such a platform might deliver major results in the hands of a skilled developer. He uses the example of a speech-to-text monitor that transcribes radio or TV feeds and sends the transcriptions to the platform. The platform would then watch for key phrases and could even be set up to respond with real-time event management. Many industries that depend on real-time information processing, including finance and marketing, will find this capability very intriguing.
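The core idea the article describes, streaming search, inverts ordinary search: queries are registered first, then each arriving document is matched against them. The actual platform lives in the author’s Straw repository; the toy Python sketch below only illustrates the concept, and every class, query, and tweet here is invented, not taken from that codebase:

```python
class StreamingMatcher:
    """Toy streaming-search sketch: standing keyword queries are stored,
    and every incoming document is checked against all of them."""

    def __init__(self):
        self.subscriptions = {}  # query_id -> set of required lowercase terms

    def subscribe(self, query_id, terms):
        # Register a standing query as a set of required terms.
        self.subscriptions[query_id] = set(t.lower() for t in terms)

    def match(self, text):
        # A query fires when every one of its terms appears in the document.
        tokens = set(text.lower().split())
        return [qid for qid, terms in self.subscriptions.items()
                if terms <= tokens]

matcher = StreamingMatcher()
matcher.subscribe("breaking", ["earthquake", "tokyo"])
matcher.subscribe("finance", ["fed", "rates"])

# Simulate a stream: in the real platform this loop would be fed by a
# message queue carrying tweets or transcribed audio.
for tweet in ["Earthquake reported near Tokyo minutes ago",
              "Nothing to see here"]:
    hits = matcher.match(tweet)
    if hits:
        print(hits, "->", tweet)  # prints ['breaking'] -> Earthquake ...
```

A production system replaces the linear scan over subscriptions with an inverted index over query terms so that thousands of standing queries can be evaluated per document; the matching semantics stay the same.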


Chelsea Kerwin, March 11, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

