Emphasize Data Suitability over Data Quantity

November 30, 2016

It seems obvious to us, but apparently, some folks need a reminder. Harvard Business Review proclaims, “You Don’t Need Big Data, You Need the Right Data.” Perhaps that distinction has gotten lost in the Big Data hype. Writer Maxwell Wessel points to Uber as an example. Though the company does collect a lot of data, the key is in which data it collects, and which it does not. Wessel explains:

In an era before we could summon a vehicle with the push of a button on our smartphones, humans required a thing called taxis. Taxis, while largely unconnected to the internet or any form of formal computer infrastructure, were actually the big data players in rider identification. Why? The taxi system required a network of eyeballs moving around the city scanning for human-shaped figures with their arms outstretched. While it wasn’t Intel and Hewlett-Packard infrastructure crunching the data, the amount of information processed to get the job done was massive. The fact that the computation happened inside of human brains doesn’t change the quantity of data captured and analyzed. Uber’s elegant solution was to stop running a biological anomaly detection algorithm on visual data — and just ask for the right data to get the job done. Who in the city needs a ride and where are they? That critical piece of information let the likes of Uber, Lyft, and Didi Chuxing revolutionize an industry.

In order for businesses to decide which data is worth their attention, the article suggests three guiding questions: “What decisions drive waste in your business?” “Which decisions could you automate to reduce waste?” (Example—Amazon’s pricing algorithms) and “What data would you need to do so?” (Example—Uber requires data on potential riders’ locations to efficiently send out drivers.) See the article for more notes on each of these guidelines.

Cynthia Murrell, November 30, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Black-Hat SEO Tactics Google Hates

November 16, 2016

The article on Search Engine Watch titled Guide to Black Hat SEO: Which Practices Will Earn You a Manual Penalty? follows up on a prior article that listed some of the sob stories of companies caught by Google using black-hat practices. Google does not take kindly to such activities, strangely enough. This article goes through some of those practices, which are meant to “falsely manipulate a website’s search position.”

Any kind of scheme where links are bought and sold is frowned upon, however money doesn’t necessarily have to change hands… Be aware of anyone asking to swap links, particularly if both sites operate in completely different niches. Also stay away from any automated software that creates links to your site. If you have guest bloggers on your site, it’s good idea to automatically Nofollow any links in their blog signature, as this can be seen as a ‘link trade’.

Other practices that earned a place on the list include automatically generated content, cloaking and irrelevant redirects, and hidden text and links. Doorway pages are multiple pages for a key phrase that lead visitors to the same end destination. If you think these activities don’t sound so terrible, you are in great company. Mozilla, BMW, and the BBC have all been caught and punished by Google for such tactics. Good or bad? You decide.

Chelsea Kerwin, November 16, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Google Gives Third Day Keynote at Pubcon

November 1, 2016

Technology conferences are the thing to do when you want to launch a product, advertise a new business, network, or get a general consensus about the tech industry.  There are multiple conferences revolving around different aspects in the tech industry held each month.  In October 2016, Pubcon took place in Las Vegas, Nevada and they had a very good turn out.  The thing that makes a convention, though, is the guests.  Pubcon did not disappoint as on the third day, Google’s search expert Gary Illyes delivered the morning keynote.  (Apparently, Illyes also hold the title Chief of Sunshine and Happiness at Google).  Outbrain summed up the highlights of Pubcon 2016’s third day in “Pubcon 2016 Las Vegas: Day 3.”

Illyes spoke about search infrastructure, suggesting that people switch to HTTPS.  His biggest push for HTTPS was that it protected users from “annoying scenarios” and it is good for UX.  Google is also pushing for more mobile friendly Web sites.  It will remove “mobile friendly” from search results and AMP can be used to make a user-friendly site.  There is even bigger news about page ranking in the Google algorithm:

Our systems weren’t designed to get two versions of the same content, so Google determines your ranking by the Desktop version only. Google is now switching to a mobile version first index. Gary explained that there are still a lot of issues with this change as they are losing a lot of signals (good ones) from desktop pages that are don’t exist on mobile. Google created a separate mobile index, which will be its primary index. Desktop will be a secondary index that is less up to date.

As for ranking and spam, Illyes explained that Google is using human evaluators to understand modified search better, Rankbrain was not mentioned much, he wants to release the Panda algorithm, and Penguin will demote bad links in search results.  Google will also release “Google O for voice search.

It looks like Google is trying to clean up search results and adapt to the growing mobile market, old news and new at the same time.

Whitney Grace, November 1, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Facebook Still Having Trouble with Trending Topics

October 28, 2016

Despite taking action to fix its problems with Trending Topics,  Facebook is still receiving criticism on the issue. A post at Slashdot tells us, “The Washington Post Tracked Facebook’s Trending Topics for 3 Weeks, Found 5 Fake Stories and 3 Inaccurate Articles.” The Slashdot post by msmash cites a Washington Post article. (There’s a paywall if, like me, you’ve read your five free WP articles for this month.) The Post monitored Facebook’s Trending Topics for three weeks and found that issue far from resolved. Msmash quotes the report:

The Megyn Kelly incident was supposed to be an anomaly. An unfortunate one-off. A bit of (very public, embarrassing) bad luck. But in the six weeks since Facebook revamped its Trending system — and a hoax about the Fox News Channel star subsequently trended — the site has repeatedly promoted ‘news’ stories that are actually works of fiction. As part of a larger audit of Facebook’s Trending topics, the Intersect logged every news story that trended across four accounts during the workdays from Aug. 31 to Sept. 22. During that time, we uncovered five trending stories that were indisputably fake and three that were profoundly inaccurate. On top of that, we found that news releases, blog posts from sites such as Medium and links to online stores such as iTunes regularly trended. Facebook declined to comment about Trending on the record.

It is worth noting that the team may not have caught every fake story, since it only checked in with Trending Topics once every hour. Quite the quandary. We wonder—would a tool like Google’s new fact-checking feature help? And, if so, will Facebook admit its rival is on to something?

Cynthia Murrell, October 28, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

UltraSearch Releases Version 2.1

September 16, 2016

Now, after more than a year, we have a new version of a popular alternative to Windows’ built-in Desktop Search, UltraSearch. We learn the details from the write-up at gHacks.net, “UltraSearch 2.1 with File Content Search.” The application works by accessing a system’s master file table, so results appear almost instantly. Writer Martin Brinkmann informs us:

The list of changes on the official UltraSearch project website is long. While some of them may affect only some users, others are useful or at least nice to have for all. Jam Software, the company responsible for the search program, have removed the advertising banner from the program. There is, however, a new ‘advanced search’ menu option which links to the company’s TreeSize program in various ways. TreeSize is available as a free and commercial program.

As far as functional changes are concerned, these are noteworthy:

  1. File results are displayed faster than before.
  2. New File Type selection menu to pick file groups or types quickly (video files, Office files).
  3. Command line parameters are supported by the program now.
  4. The drive list was moved from the bottom to the top.
  5. The export dialog displays a progress dialog now.
  6. You may deactivate the automatic updating of the MFT index under Options > Include file system changes.

Brinkmann emphasizes that these are but a few of the changes in this extensive update, and suggests Windows users who have rejected it before give it another chance. We remind you, though, that UltraSearch is not your only Windows Desktop Search alternative. Some others include FileSearchEX, Gaviri Pocket SearchLaunchy. Locate32, Search EverythingSnowbird, Sow Soft’s Effective File Search, and Super Finder XT.

Launched back in 1997, Jam Software is based in Trier, Germany.  The company specializes in software tools to address common problems faced by users, developers, and organizations., like TreeSize, SpaceObserver, and, of course, UltraSearch. Though free versions of each are available, the company makes its money by enticing users to invest in the enhanced, professional versions.

Cynthia Murrell, September 16, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/

Toshiba Amps up Vector Indexing and Overall Data Matching Technology

September 13, 2016

The article on MyNewsDesk titled Toshiba’s Ultra-Fast Data Matching Technology is 50 Times Faster than its Predecessors relates the bold claims swirling around Toshiba and their Vector Indexing Technology. By skipping the step involving computation of the distance between vectors, Toshiba has slashed the time it takes to identify vectors (they claim). The article states,

Toshiba initially intends to apply the technology in three areas: pattern mining, media recognition and big data analysis. For example, pattern mining would allow a particular person to be identified almost instantly among a large set of images taken by surveillance cameras, while media recognition could be used to protect soft targets, such as airports and railway stations*4by automatically identifying persons wanted by the authorities.

In sum, Toshiba technology is able to quickly and accurately recognize faces in the crowd. But the specifics are much more interesting. Current technology takes around 20 seconds to identify an individual out of 10 million, and Toshiba can do it in under a second. The precision rates that Toshiba reports are also outstanding at 98%. The world of Minority Report, where ads recognize and direct themselves to random individuals seems to be increasingly within reach. Perhaps more importantly, this technology should be of dire importance to the criminal and perceived criminal populations of the world

Chelsea Kerwin, September 13, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monographThere is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/

The Decline of Free Software As a Failure of Leadership and Relevance

August 18, 2016

The article on Datamation titled 7 Reasons Why Free Software Is Losing Influence investigates some of the causes for the major slowdown in FOSS (free and open software software). The article lays much of the blame at the feet of the leader of the Free Software Foundation (FSF), Richard Stallman. In spite of his major contributions to the free software movement, he is prickly and occasionally drops Joe Biden-esque gaffes detrimental to his cause. He also has an issue when it comes to sticking to his message and making his cause relevant. The article explains,

“Over the last few years, Richard Stallman has denounced cloud computinge-bookscell phones in general, and Android in particular. In each case, Stallman has raised issues of privacy and consumer rights that others all too often fail to mention. The trouble is, going on to ignore these new technologies solves nothing, and makes the free software movement more irrelevant in people’s lives. Many people are attracted to new technologies, and others are forced to use them because others are.”

In addition to Stallman’s difficult personality, which only accounts for a small part of the decline in the FSF’s influence, the article also has other suggestions. Perhaps most importantly, the FSF is a tiny company without the resources to achieve its numerous goals like sponsoring the GNU Project, promoting social activism, and running campaigns against DRM and Windows.
 

Chelsea Kerwin, August 18, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

There is a Louisville, Kentucky Hidden /Dark Web meet up on August 23, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233019199/

 

More Data to Fuel Debate About Malice on Tor

June 9, 2016

The debate about malicious content on Tor continues. Ars Technica published an article continuing the conversation about Tor and the claims made by a web security company that says 94 percent of the requests coming through the network are at least loosely malicious. The article CloudFlare: 94 percent of the Tor traffic we see is “per se malicious” reveals how CloudFlare is currently handling Tor traffic. The article states,

“Starting last month, CloudFlare began treating Tor users as their own “country” and now gives its customers four options of how to handle traffic coming from Tor. They can whitelist them, test Tor users using CAPTCHA or a JavaScript challenge, or blacklist Tor traffic. The blacklist option is only available for enterprise customers. As more websites react to the massive amount of harmful Web traffic coming through Tor, the challenge of balancing security with the needs of legitimate anonymous users will grow. The same network being used so effectively by those seeking to avoid censorship or repression has become a favorite of fraudsters and spammers.”

Even though the jury may still be out in regards to the statistics reported about the volume of malicious traffic, several companies appear to want action sooner rather than later. Amazon Web Services, Best Buy and Macy’s are among several sites blocking a majority of Tor exit nodes. While a lot seems unclear, we can’t expect organizations to delay action.

 

Megan Feil, June 9, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Extensive Cultural Resources Available at Europeana Collections

May 17, 2016

Check out this valuable cultural archive, highlighted by Open Culture in the piece, “Discover Europeana Collections, a Portal of 48 Million Free Artworks, Books, Videos, Artifacts & Sounds from across Europe.” Writer Josh Jones is clearly excited about the Internet’s ability to place information and artifacts at our fingertips, and he cites the Europeana Collections as the most extensive archive he’s discovered yet. He tells us the works are:

“… sourced from well over 100 institutions such as The European Library, Europhoto, the National Library of Finland, University College Dublin, Museo Galileo, and many, many more, including contributions from the public at large. Where does one begin?

“In such an enormous warehouse of cultural history, one could begin anywhere and in an instant come across something of interest, such as the the stunning collection of Art Nouveau posters like that fine example at the top, ‘Cercle Artstique de Schaerbeek,’ by Henri Privat-Livemont (from the Plandiura Collection, courtesy of Museu Nacional d’Art de Catalynya, Barcelona). One might enter any one of the available interactive lessons and courses on the history of World War I or visit some of the many exhibits on the period, with letters, diaries, photographs, films, official documents, and war propaganda. One might stop by the virtual exhibit, ‘Photography on a Silver Plate,’ a fascinating history of the medium from 1839-1860, or ‘Recording and Playing Machines,’ a history of exactly what it sounds like, or a gallery of the work of Swiss painter Jean Antoine Linck. All of the artifacts have source and licensing information clearly indicated.”

Jones mentions the archive might be considered “endless,” since content is being added faster than anyone could hope to keep up with.  While such a wealth of information and images could easily overwhelm a visitor, he advises us to look at it as an opportunity for discovery. We concur.

 

Cynthia Murrell, May 17, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Google Relies on Freebase Machine ID Numbers to Label Images in Knowledge Graph

May 3, 2016

The article on Seo by the Sea titled Image Search and Trends in Google Search Using FreeBase Entity Numbers explains the transformation occurring at Google around Freebase Machine ID numbers. Image searching is a complicated business when it comes to differentiating labels. Instead of text strings, Google’s Knowledge Graph is based in Freebase entities, which are able to uniquely evaluate images- without language. The article explains with a quote from Chuck Rosenberg,

An entity is a way to uniquely identify something in a language-independent way. In English when we encounter the word “jaguar”, it is hard to determine if it represents the animal or the car manufacturer. Entities assign a unique ID to each, removing that ambiguity, in this case “/m/0449p” for the former and “/m/012×34” for the latter.”

Metadata is wonderful stuff, isn’t it? The article concludes by crediting Barbara Starr, a co-administrator of the Lotico San Diego Semantic Web Meetup, with noticing that the Machine ID numbers assigned to Freebase entities now appear in Google Trend’s URLs. Google Trends is a public web facility that enables an exploration of the hive mind by showing what people are currently searching. The Wednesday that President Obama nominated a new Supreme Court Justice, for example, had the top search as Merrick Garland.

 

Chelsea Kerwin, May 3, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Next Page »