
Peripatetic Big Data: Hit the Road, Jack

February 16, 2015

I read “Patterns in Large Data Show How Information Travels.” Yep, it seems obvious that info moves around. Communication involves passing information from A to B. Isn’t that “moving around”? How naive.

The write up explains:

The results show that people care about local and regional information related to sports, media, celebrities or local places. Moreover, people from countries with similar language or historic backgrounds care about similar information.

Be still my heart. A quick flip through CyberOSINT makes clear that examining information in graph form has been around for at least 15 years in commercial software that performs these analyses. Yes, it is a good idea to be able to know when a person of interest communicates, what, to whom, and where. Ah, PhDs. Love ‘em.

Stephen E Arnold, February 16, 2015

Behind Search Improvements at Pinterest

February 13, 2015

As a Pinterest user myself, I know how important the site’s search function is. Now, as Gigaom informs us, “Pinterest Explains How It’s Making Its Search Work Better.” It sounds like an approach to semantic machine learning inspired by the crowdsourcing phenomenon. Writer Jonathan Vanian tells us:

“Dong Wang, the Pinterest software engineer who wrote the post, explained that even though a user may search for the word ‘turkey,’ it’s unclear what exactly that person may be looking for. Does he want to find turkey recipes, is he planning a trip to Turkey or is he just interested in poultry — it’s hard to say without some context.

“If that person decides to search for ‘turkey recipes’ as part of his next query, Pinterest takes that into account and can assume that the next person who may be searching for ‘turkey’ might also be craving some turkey recipes as well; maybe it’s holiday season and everyone’s hungry. Pinterest learned that ‘the information extracted from previous query log has shown to be effective in understanding the user’s search intent’ and this can be applied to other Pinterest users as well.”

Pinterest’s data-collection workflow is called QueryJoin, and engineers use it to draw conclusions like the one about turkey recipes, above. Factors analyzed also include pins’ image signatures and “engagement stats” like the number of clicks and re-pins each pin has received. For more information, see Dong Wang’s original post.
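To make the turkey example concrete, here is a minimal sketch of mining follow-up queries from a search log. It is not Pinterest's QueryJoin code, which the article does not show; the function names, the session format (one ordered list of queries per user session), and the sample data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_refinement_counts(sessions):
    """Count how often a query is followed by a query that extends it.

    `sessions` is a hypothetical log format: one ordered list of
    query strings per user session.
    """
    counts = defaultdict(Counter)
    for queries in sessions:
        for current, following in zip(queries, queries[1:]):
            # A follow-up counts as a refinement only if it keeps every
            # word of the current query and adds at least one more.
            if set(current.split()) < set(following.split()):
                counts[current][following] += 1
    return counts

def suggest_refinements(counts, query, top_n=2):
    """Return the most frequent refinements seen for `query`."""
    return [q for q, _ in counts.get(query, Counter()).most_common(top_n)]

# Made-up sample sessions, not real Pinterest data.
sessions = [
    ["turkey", "turkey recipes"],
    ["turkey", "turkey recipes", "stuffing"],
    ["turkey", "turkey travel"],
    ["pumpkin", "pumpkin pie"],
]
counts = build_refinement_counts(sessions)
print(suggest_refinements(counts, "turkey"))
```

In this toy log, "turkey recipes" follows "turkey" more often than "turkey travel" does, so it ranks first for the next person who types "turkey." A production system would presumably also weigh the image signatures and engagement stats mentioned above.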

Cynthia Murrell, February 13, 2015

Sponsored by, developer of Augmentext

Facebook Gains Natural Language Capacity with Via AI Acquisition

February 11, 2015

Facebook is making inroads into the natural language space, we learn from “Facebook Buys, Adds Natural Language Knowhow” at ZDNet. Reporter Larry Dignan tells us the social-media giant gained more than 6,000 developers in the deal with the startup, which has created an open-source natural language platform with an eye to the “Internet of Things.” He writes:

“ is an early stage startup that in October raised $3 million in seed financing with Andreessen Horowitz as the lead investor. aims to create a natural language platform that’s open sourced and distributed. Terms of the deal weren’t disclosed, but indicates what Facebook is thinking. As the social network is increasingly mobile, it will need natural language algorithms and knowhow to add key features. Rival Google has built in a bevy of natural language tools into Android and Apple has its Siri personal assistant.”

Though the platform is free for open data projects, it earns its keep through commercial instances and queries-per-day charges. launched in October 2013, and is headquartered in Palo Alto, California.

Cynthia Murrell, February 11, 2015


German Spies Eye Metadata

January 13, 2015

Germany’s foreign intelligence arm (BND) refuses to be outdone by our NSA. The World Socialist Web Site reports, “German Foreign Intelligence Service Plans Real-Time Surveillance of Social Networks.” The agency plans to invest €300 million by 2020 to catch up to the (Snowden-revealed) capabilities of U.S. and U.K. agencies. The stated goal is to thwart terrorism, of course, but reporter Sven Heymann is certain the initiative has more to do with tracking political dissidents who oppose the austerity policies of recent years.

Whatever the motivation, the BND has turned its attention to the wealth of information to be found in metadata. Smart spies. Heymann writes:

“While previously, there was mass surveillance of emails, telephone calls and faxes, now the intelligence agency intends to focus on the analysis of so-called metadata. This means the recording of details on the sender, receiver, subject line, and date and time of millions of messages, without reading their content.

“As the Süddeutsche Zeitung reported, BND representatives are apparently cynically attempting to present this to parliamentary deputies as the strengthening of citizens’ rights and freedoms in order to sell the proposal to the public.

“In fact, the analysis of metadata makes it possible to identify details about a target person’s contacts. The BND is to be put in a position to know who is communicating with whom, when, and by what means. As is already known, the US sometimes conducts its lethal and illegal drone attacks purely on the basis of metadata.”
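A short sketch shows how much the header fields alone reveal. The record layout (sender, receiver, ISO timestamp), the function name, and the addresses below are assumptions for illustration, not anything described in the article; no message content is read.

```python
from collections import Counter, defaultdict
from datetime import datetime

def contact_summary(records):
    """Summarize who contacts whom, how often, and when, from headers only.

    `records` is a hypothetical iterable of (sender, receiver, sent_at)
    tuples, with sent_at as an ISO 8601 string.
    """
    contacts = defaultdict(Counter)   # sender -> receiver -> message count
    last_seen = {}                    # (sender, receiver) -> latest timestamp
    for sender, receiver, sent_at in records:
        contacts[sender][receiver] += 1
        pair = (sender, receiver)
        when = datetime.fromisoformat(sent_at)
        if pair not in last_seen or when > last_seen[pair]:
            last_seen[pair] = when
    return contacts, last_seen

# Made-up records for illustration.
records = [
    ("a@example.org", "b@example.org", "2015-01-12T08:00:00"),
    ("a@example.org", "b@example.org", "2015-01-13T09:30:00"),
    ("a@example.org", "c@example.org", "2015-01-13T10:00:00"),
]
contacts, last_seen = contact_summary(records)
print(contacts["a@example.org"].most_common(1))
print(last_seen[("a@example.org", "b@example.org")])
```

Even this toy version answers the questions in the quote: who is communicating with whom, how often, and when.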

The article tells us the BND is also looking into the exploitation of newly revealed security weaknesses in common software, as well as tools to falsify biometric-security images (like fingerprints or iris scans). Though Germany’s intelligence agents are prohibited by law from spying on their own people, Heymann has little confidence that rule will be upheld. After all, so is the NSA.

Cynthia Murrell, January 13, 2015


Delve, Social, and Other SharePoint Highlights of 2014

December 16, 2014

It is that time of year again – time for year-in-review articles regarding the tech that we know and love. And so it is for SharePoint. Lots of changes have been made and there are plenty of assumptions about the future. So CMS Wire tackles the overview in their article, “The SharePoint Landscape from 30,000 Feet.”

The author begins:

“With the end of the year around the corner, it’s a good time to take a 30,000-foot view of the lay of the SharePoint land and see what’s in store for 2015. While SharePoint may not be perfect, the technology is something many enterprises count on. We’ve seen great growth and energy in SharePoint over the past year and there are some events and developments that will be driving the technology next year.”

The author then goes on to discuss Delve and social projects, including apps. But experts caution that privacy will experience a resurgence in coming months, and the pendulum will swing back the other way, with enterprises concerned about keeping a tight rein on information. To stay on top of all of the latest developments in the new year, stay tuned in to Stephen E. Arnold at He has made a career out of parsing all things search, and his SharePoint feed is extremely helpful for all levels of users.

Emily Rae Aldridge, December 16, 2014

Xoogler Provides Google Plus Analysis

December 1, 2014

I don’t use Google Plus. I think an account was created when we set up Google Mail, but I am not sure. Furthermore, I am not sufficiently motivated to find out more.

But someone cares a lot about Google Plus. You can get a fairly interesting look at some of Google Plus’s “issues” by reading “Thoughts on Google+: I F**ked Up. So Has Google.”

Google’s efforts, meanwhile, seem disjointed and confused, despite significant improvements to their settings and security features. If Google+ was intended to serve as Google’s “social backbone”, it should be the locus of control and access over the kind of information I’ve described above. And yet… it’s not. Far from it, in fact.

One of the factoids in the write up was that 3,000 people work on Google Plus. How many work on the Google Search Appliance? Two, six, seven?

Keep in mind the author of the analysis likes Google’s Loon balloons.

Stephen E Arnold, December 1, 2014


Remember That Twitter Search System?

November 26, 2014

I read “New Twitter Search API Won’t Be Available to Third-Party Clients.” The write up says:

Twitter doesn’t have the guts to just end them outright, so they’re just gradually inflicting passive-aggressive wounds over time to quietly shove them into the sunset.

The notion of unlimited, free access to the Twitter content resource is one to which I cannot relate. There are useful items tucked into Twitter, and the company is likely to become increasingly restrictive about access to and use of the Twitter content objects and attendant metadata.

Stephen E Arnold, November 26, 2014

Watson Does Mail and Analytics to Complement Inventing Recipes

November 19, 2014

IBM is beating the drum for Watson. “IBM Brings Watson Tinged Analytics to New Mail and Social Platform” reports about “an enterprise social collaboration platform with built in analytics.”

When I read the article, I thought of Semandex. My recollection is that this New Jersey-based company has a similar system. Perhaps the IBM collaboration function will be different from what Semandex offers.

My reaction to the flow of Watson “news” is that IBM is going to have to shift into high gear in order to generate $1 billion in revenue from scripts and open source software. With the $10 billion target looming 60 months out, I would suggest that IBM needs to make big sales to high profile clients quickly and in a serial fashion.

Right now Watson is enriching public relations and marketing types. IBM needs big, high margin sales. We have identified 36 companies providing more advanced functions than Watson. Time may be running out, particularly if an IBM competitor snaps up two or three of the outfits on our watch list.

Stephen E Arnold, November 19, 2014

Microsoft Points SharePoint Users to Yammer

November 11, 2014

SharePoint is a longstanding leader in enterprise search, but it continues to morph and shift in response to the latest technology and emerging needs. As the move toward social becomes more important, Microsoft is dropping outdated features and shifting its focus toward social components. Read more in the GCN article, “Microsoft Pushes Yammer as it Trims SharePoint Features.”

The article begins:

“Microsoft quietly retired some features from SharePoint Online while it enhanced mobile apps, email integration and collaboration tools of Yammer, the company’s cloud-based enterprise social networking platform. Microsoft MVP and SharePoint expert Vlad Catrinescu posted that the company was removing the Tasks menu option, and the Sync to Outlook button will also be removed. Additionally, SharePoint Online Notes and Tags were deprecated last month.”

Stephen E. Arnold is a longtime leader in search. He keeps a close eye on SharePoint, reporting his findings on The article hints that Microsoft is leaning toward moving to Yammer all the way, meaning that additional features are likely to be retired and collapsed into the new infrastructure. To keep up with all the changes, including the latest tips and tricks, stay tuned to Arnold’s specific SharePoint feed.

Emily Rae Aldridge, November 11, 2014

Twitter Bots Abound

September 23, 2014

Quartz grabs our attention with its headline, “Twitter Admits That as Many as 23 Million of Its Active Users Are Automated.” These accounts, which automatically request updates and may or may not also auto-post, include “users” like third-party data-display apps. Reporter Zachary M. Seward writes:

“The new disclosure was an attempt to clarify an earlier statement (pdf) that 14% of MAUs access the service outside of the official website and mobile apps, by using Twitter’s API. Twitter’s update today specifies that the 14% figure ‘included certain users who accessed Twitter through owned and operated applications.’ Those are likely TweetDeck and Twitter for Mac, which are favored by power tweeters but, for technical reasons, aren’t counted in many of the company’s official statistics. The company said only 11% of MAUs accessed Twitter from applications that the company doesn’t own, like Tweetbot or Flipboard.

“To be clear, automated accounts aren’t necessarily spam accounts, which according to Twitter make up less than 5% of MAUs. Bots can be useful, even essential, accounts for many Twitter users. But once they’re set up, they don’t usually have any humans behind them, which matters greatly to advertisers who are interested in reaching potential customers.”

Seward maintains that Twitter should be concerned for its advertisers (itself included), who may feel they are pouring ad dollars down a black hole. I’m sure they can work out some equitable fee structure(s). We wonder, though, what the implications are for high-value content that attracts interested readers.

Cynthia Murrell, September 23, 2014

