Black-Hat SEO Tactics Google Hates

November 16, 2016

The article on Search Engine Watch titled Guide to Black Hat SEO: Which Practices Will Earn You a Manual Penalty? follows up on a prior article that listed some of the sob stories of companies caught by Google for using black-hat practices. Google does not take kindly to such activities, strangely enough. This article goes through some of those practices, which are meant to “falsely manipulate a website’s search position.”

Any kind of scheme where links are bought and sold is frowned upon; however, money doesn’t necessarily have to change hands… Be aware of anyone asking to swap links, particularly if both sites operate in completely different niches. Also stay away from any automated software that creates links to your site. If you have guest bloggers on your site, it’s a good idea to automatically Nofollow any links in their blog signature, as this can be seen as a ‘link trade’.
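
For the curious, the Nofollow housekeeping mentioned above is easy to automate. Here is a minimal sketch, assuming the guest post arrives as an HTML string and using a hypothetical guest-signature class name; it is an illustration, not a prescription:

```python
# Minimal sketch: add rel="nofollow" to links inside guest-post signatures.
# The "guest-signature" class name is hypothetical and depends on your templates.
from bs4 import BeautifulSoup

def nofollow_guest_links(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    for anchor in soup.select(".guest-signature a[href]"):
        rel = set(anchor.get("rel", []))   # rel may already carry values
        rel.add("nofollow")
        anchor["rel"] = sorted(rel)
    return str(soup)

print(nofollow_guest_links(
    '<p class="guest-signature">By Jane, <a href="https://example.com">my site</a></p>'
))
```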

Other practices that earned a place on the list include automatically generated content, cloaking and irrelevant redirects, hidden text and links, and doorway pages (multiple pages targeting a key phrase that all lead visitors to the same end destination). If you think these activities don’t sound so terrible, you are in great company. Mozilla, BMW, and the BBC have all been caught and punished by Google for such tactics. Good or bad? You decide.

Chelsea Kerwin, November 16, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

AI to Profile Gang Members on Twitter

November 16, 2016

Researchers from the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) claim that an algorithm they developed can identify gang members on Twitter.

Vice.com recently published an article titled Researchers Claim AI Can Identify Gang Members on Twitter, which claims that:

A deep learning AI algorithm that can identify street gang members based solely on their Twitter posts, and with 77 percent accuracy.

The article then points out the shortcomings of the algorithm:

According to one expert contacted by Motherboard, this technology has serious shortcomings that might end up doing more harm than good, especially if a computer pegs someone as a gang member just because they use certain words, enjoy rap, or frequently use certain emojis—all criteria employed by this experimental AI.
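
The Kno.e.sis system itself is a deep learning model and is not reproduced here. The toy scikit-learn sketch below only illustrates how a bag-of-words classifier can latch onto surface cues such as slang and emoji, which is precisely the worry the expert raises; the tweets and labels are invented for illustration:

```python
# Toy illustration only: a bag-of-words classifier over tweet text.
# This is NOT the Kno.e.sis deep learning model; it simply shows how easily
# a model can key on surface cues (slang, emoji, music references).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical, hand-made examples; real training data would be labeled tweets.
tweets = [
    "new mixtape out now \U0001F525\U0001F525",
    "family dinner then church on sunday",
    "rep the block all day \U0001F52B",
    "studying for finals, send coffee",
]
labels = [1, 0, 1, 0]  # 1 = "gang-affiliated" per the toy labels, 0 = not

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(tweets, labels)

# A harmless rap-fan tweet may trip the same surface features.
print(model.predict(["love this rap album \U0001F525"]))
```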

The shortcomings do not end here. The data on Twitter is being analyzed in a silo. For example, let us assume that a few gang members are identified using the algorithm (remember, the AI takes no location information into consideration). What next?

Is it not necessary then to also identify other social media profiles of the supposed gang members, look at the Big Data they generate, analyze their communication patterns, and then form some conclusion? Unfortunately, the AI does none of this. It would, in fact, be a mammoth task to extrapolate data from multiple sources just to identify people with certain traits.

And most important, what if the AI is put in place and someone, just for the sake of fun, portrays an innocent person as a gang member? As the article rightly points out, machines trained on prejudiced data tend to reproduce those same, very human, prejudices.

Vishal Ingole, November 16, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

HonkinNews for 15 November 2016 Now Available

November 15, 2016

The weekly Beyond Search news video is available at this link. Stories include Mr. Thiel goes to Washington, the “best” entity extraction software and the not-so-best systems. You will learn the latest about the Yahoot security consequences, and more. The video also includes information about the US government’s open source code Web site. Stephen E Arnold points out that the Darpa Dark Web open source code is not included in the Code.gov offerings. Never fear. The Darpa listing does appear in the forthcoming Dark Web Notebook. If you want a copy of this new Beyond Search study, write benkent2020 at yahoo dot com and reserve your password protected PDF today.

Over the New Year’s break, HonkinNews will run three free special seven minute programs, airing on December 20, December 27, and January 3, 2017. Each video presents the principal takeaways from Stephen E Arnold’s Google Trilogy: The Google Legacy (2004), Google Version 2 (2007), and Google: The Digital Gutenberg (2009). The information remains timely even though Alphabet Google is in a somewhat excited state, shifting in order to generate revenue as the volume of searches from the desktop declines, squishing Google’s online ad methods for old fashioned Internet access.

Kenny Toth, November 15, 2016

French Smart Software Companies: Some Surprises

November 15, 2016

I read “French AI Ecosystem.” Most of the companies have zero or a low profile in the United States. The history of French high technology outfits remains a project for an enterprising graduate student with one foot in La Belle France and one in the USA. This write up is a bit of a sales pitch for venture capital, in my opinion. The reason that VC inputs are needed is that raising money in France is (how shall I put this?) not easy. There is no Silicon Valley. There is Paris and a handful of other acceptable places to be intelligent. In the Paris high tech setting, there are a handful of big outfits and lots and lots of institutions which keep the French one percent in truffles and the best the right side of the Rhone has to offer. The situation is dire unless the start up is connected by birth, by education at one of the acceptable institutions, or hooked up with a government entity. I want to mention that there is a bit of French ethnocentrism at work in the French high tech scene. I won’t go into detail, but you can check it out yourself if you attend a French high tech conference in one of the okay cities. Ars-en-Ré and Gémenos do not qualify. Worth a visit, however.

Now to the listings. You will have to work through the almost unreadable graphic or contact the outfit creating the listing, which is why the graphic is unreadable, I surmise. From the version of the graphic I saw, I did discern a couple of interesting points. Here we go:

Three outfits were identified as having natural language capabilities. These are Proxem, syJLabs (no, I don’t know how to pronounce this “syjl” string. I can do “abs”, though.), and Yseop (maybe Aesop, from the fable?). Proxem offers its Advanced Natural Language Object Oriented Processing Environment (Antelope). The company was founded in 2007. syJLabs does not appear in my file of French outfits, and we drew a blank when looking for the company’s Web site. Sigh. Yseop has been identified as a “top IT innovator” by an objective, unimpeachable, high value, super credible, wonderful, and stellar outfit (Ventana Research). Yseop, also founded in 2007, offers a system which “turns data into narrative in English, French, German, and Spanish, all at the speed of thousands of pages per second.”

As I worked through a graphic containing lots of companies, I spotted two interesting inclusions. The first is Sinequa, a vendor of search founded in 2002, now positioned as an important outfit in Big Data and machine learning. Fascinating. The reinvention of Sinequa is a logical reaction to the implosion of the market for search and retrieval for the enterprise. The other company I noted was Antidot, which mounted a push to the US market several years ago. Antidot, like Sinequa, focused on information access. It too is “into” Big Data and machine learning.

I noted some omissions; for example, Hear&Know, among others. Too bad the listing is almost unreadable and does not include a category for law enforcement, surveillance, and intelligence innovators.

Stephen E Arnold, November 15, 2016

Azure Search Overview

November 15, 2016

I know that Microsoft is a world leader in search and retrieval. Look at the company’s purchase of Fast Search & Transfer in 2008. Look at the search in Windows 7, 8, and 10. Look at the Microsoft research postings listed in Bing. I am convinced.

I did learn a bit more about Azure Search in “Microsoft Azure Search and Azure Backup Arrive in Canada.” I learned that search is now a service; for example:

Azure Search is Microsoft search-as-a-service solution for cloud. It allows customers to add search to their applications using REST API or .NET SDK. Microsoft handles the server and infrastructure management, meaning developers don’t need to worry about understanding search.

Here are the features I noted from the write up:

  • Query syntax including Boolean and Lucene conventions
  • Support for 56 different languages
  • Search suggestions for auto complete
  • Hit highlighting
  • Geo spatial support
  • Faceted navigation just like Endeca in 1998
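
For developers who do want to understand search, a query against the Azure Search REST interface might look roughly like the sketch below. The service name, index name, key, and api-version are placeholders and assumptions on my part, not values taken from the write up:

```python
# Minimal sketch of an Azure Search query over the REST API.
# Service name, index name, and api-key are placeholders; the api-version
# shown is an assumption and should be checked against current documentation.
import requests

SERVICE = "my-search-service"      # hypothetical service name
INDEX = "products"                 # hypothetical index name
API_KEY = "YOUR-QUERY-KEY"         # query key from the Azure portal

url = f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs"
params = {
    "api-version": "2016-09-01",   # assumed version for late 2016
    "search": "laptop AND 15-inch",  # simple Boolean-style query
    "facet": "brand",              # faceted navigation, Endeca style
    "highlight": "description",    # hit highlighting
}

response = requests.get(url, params=params, headers={"api-key": API_KEY})
response.raise_for_status()
for doc in response.json().get("value", []):
    print(doc)
```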

The most interesting statement in the write up was, in my opinion:

Microsoft handles the server and infrastructure management, meaning developers don’t need to worry about understanding search.

I love that one does not need to understand search. That’s what makes search so darned fascinating today. Systems which require no understanding. I also believe everything that a search system presents in a list of relevance ranked results. I really do. I, for example, believed that Fast Search & Transfer was the most wonderful search system in the world until, well, the investigators arrived. Azure is even more wonderful as a cloud appliance thing that developers do not need to understand. Great and wonderful.

Stephen E Arnold, November 15, 2016

Oh No! The Ads Are Becoming Smarter

November 15, 2016

I love Christmas and the subsequent holiday season, although I am tired of it starting in October. Thankfully the holiday music does not start playing until Thanksgiving week, as do the ads, although they have been sneaking into the year earlier and earlier. I like the fact that commercials and Internet ads are inanimate objects, so I can turn them off. IT Pro Portal tells me, however, that I might be in for a Christmas nightmare: “IBM’s Watson Now Used In Native Advertising.” In other words, the ads are becoming smarter!

While credit card expenditures, browsing history, and other factors are already used for individualized, targeted ads, they still remain a static tool dependent on external factors. Watson is going to be tried in the advertising game to improve targeting in native advertising. Watson will add an aesthetic quality too:

The difference is – it’s not just looking at keywords as the practice was so far – it’s actually looking at the ad, determining what it’s about and then places it where it believes is a good fit. According to the press release, Watson “looks at where, why and how the existing editorial content on each site is ‘talking about’ subjects”, and then makes sure best ads are placed to deliver content in proper context.
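
IBM has not published the details of Watson’s advertising pipeline, so the following is only a stand-in sketch of the general idea described above: score each editorial page against the ad copy and place the ad on the closest match. The pages and ad text are invented:

```python
# Toy sketch of "semantic" ad placement: score each page against an ad's copy
# and pick the closest match. This is a stand-in illustration, not IBM Watson's
# actual pipeline, which is not publicly documented in detail.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ad_copy = "Lightweight hiking boots built for winter trails and cold weather."

pages = {  # hypothetical editorial pages
    "gift-guide": "Our holiday gift guide covers gadgets, books, and toys.",
    "trail-review": "A review of snowy winter trails and the gear to hike them.",
    "recipes": "Five quick weeknight dinner recipes for busy families.",
}

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([ad_copy] + list(pages.values()))
scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

best = max(zip(pages.keys(), scores), key=lambda pair: pair[1])
print(f"Place the ad on: {best[0]} (similarity {best[1]:.2f})")
```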

Another way to describe Watson’s implementation in advertising is “semantic targeting AI for native advertising.” It will work in real time and deliver more individualized, targeted ads based on your recent Amazon, eBay, and other Web site shopping. It is interesting to consider how Watson can assimilate all this information about one person, but if you imagine that the same technology is being used in the medical and legal fields, it does inspire hope.

Whitney Grace, November 15, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Most Dark Web Content Is Legal and Boring

November 15, 2016

Data crunching done by an information security firm reveals that around 55% of Dark Web content is legal and mundane, much like the clear or Open Web.

Digital Journal, which published the article Despite its Nefarious Reputation, New Report Finds Majority of Activity on the Dark Web is Totally Legal and Mundane, says that:

“What we’ve found is that the dark web isn’t quite as dark as you may have thought,” said Emily Wilson, Director of Analysis at Terbium Labs. “The vast majority of dark web research to date has focused on illegal activity while overlooking the existence of legal content. We wanted to take a complete view of the dark web to determine its true nature and to offer readers of this report a holistic view of dark web activity — both good and bad.

The findings have been curated in a report, The Truth About the Dark Web: Separating Fact from Fiction, that puts the Dark Web in a new light. According to this report, around 55% of the content on the Dark Web is legal; porn makes up 7% of Dark Web content, and most of it is legal. Though drugs are a favorite topic, only 45% of the content related to them can be termed illegal. Fraud, extremism, and illegal weapons trading, on the other hand, make up just 5-7% of the Dark Web.

The research was done using a mix of machine intelligence and human intelligence, as the article points out:

“Conducting research on the dark web is a difficult task because the boundaries between categories are unclear,” said Clare Gollnick, Chief Data Scientist at Terbium Labs. “We put significant effort into making sure this study was based on a representative, random sample of the dark web. We believe the end result is a fair and comprehensive assessment of dark web activity, with clear acknowledgment of the limitations involved in both dark web data specifically and broader limitations of data generally.

The Dark Web is slowly gaining traction as users of the Open Web find uses for this hidden portion of the Internet. Though the study is indeed illuminating, it fails to address how much of the illegal activity or content on the Dark Web affects the real world. For instance, what quantity of the drug trade takes place over the Dark Web? Any answers?

Vishal Ingole, November 15, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The House Cleaning of Halevy Dataspace: A Web Curiosity

November 14, 2016

I am preparing three seven minute videos. That effort will be one video each week starting on 20 December 2016. The subject is my Google Trilogy, published by an antique outfit which has drowned in the River Avon. The first video is about the 2004 monograph, The Google Legacy. I coined the term “Googzilla” in that 230 page discussion of how Google became baby Google. The second video summarizes several of the takeaways from Google: The Calculating Predator, published in 2007. The key to the monograph is the bound phrase “calculating predator.” Yep, not the happy little search outfit most know and love. The third video hits the main points of Google: The Digital Gutenberg, published in 2009. The idea is that Google spits out more digital content than almost anyone. Few think of the GOOG as the content generator the company has become. Yep, a map is a digital artifact.

Now to the curiosity. I wanted to reference the work of Dr. Alon Halevy, a former University of Washington professor and founder of Nimble and Transformic. I had a stack of links I used when I was doing the research for my predator book. Just out of curiosity I started following the links. I do have PDF versions of most of the open source Halevy-centric content I located.

But guess what?

Dr. Alon Halevy has disappeared. I could not locate the open source version of his talk about dataspaces. I could not locate the Wayback Machine’s archived version of the Transformic.com Web site. The links returned these weird 404 errors. My assumption was that Wayback’s Web pages resided happily on the outfit’s servers. I was incorrect. Here’s what I saw:

[Screenshot: a Wayback Machine page returning a 404 error]
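
As an aside, re-checking an old link file is a small scripting job. The sketch below shows the sort of audit involved; the URLs are placeholders, not the actual Halevy-related links from my research file:

```python
# Small link-audit sketch: re-check a stack of saved research links and report
# which now return 404 or fail outright. The URLs here are placeholders.
import requests

saved_links = [
    "https://web.archive.org/web/2005/http://www.transformic.com/",
    "https://example.edu/halevy/dataspaces-talk.pdf",
]

for url in saved_links:
    try:
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException as error:
        print(f"ERROR  {url} ({error})")
        continue
    marker = "DEAD" if status == 404 else "OK" if status < 400 else f"HTTP {status}"
    print(f"{marker:6} {url}")
```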

I explored the bound phrase “Alon Halevy” with various other terms only to learn that the bulk of the information has disappeared. No PowerPoints, not much substantive information. There were a few “information objects” which have not yet disappeared; for example:

  • An ACM blog post which references “the structured data team” and Nimble and Transformic
  • A Google research paper which will not make those who buy into David Gelernter’s The Tides of Mind thesis happy
  • A YouTube video of a lecture given at Technion.

I found the gap between the research I gathered from 2005 to 2007 and what remains online today interesting. I asked myself, “How did I end up with so many dead links about a technology I have described as one of the most important in database, data management, data analysis, and information retrieval?”

Here are the answers I formulated:

  1. The Web is a lousy source of information. Stuff just disappears like the Darpa listing of open source Dark Web software, blogs, and Web sites
  2. I did really terrible research and even worse librarian type behavior. Yep, mea culpa.
  3. Some filtering procedures became a bit too aggressive and the information has been swept from assorted indexes
  4. The Wayback Machine ran off the rails and pointed to an actual 2005 Web site which its system failed to copy when the original spidering was completed.
  5. Gremlins. Hey, they really do exist. Just ask Grace Hopper. Yikes, she’s not available.

I wanted to mention this apparent or erroneous scrubbing. The story in this week’s HonkinNews video points out that 89 percent of journalists do their research via Google. Now if information is not in Google, what does that imply for a “real” journalist trying to do an objective, comprehensive story? I leave it up to you, gentle reader, to penetrate this curiosity.

Watch for the Google Trilogy seven minute videos on December 20, 2016, December 27, 2016, and January 3, 2017. Free. No pay wall. No Patreon.com pleading. No registration form. Just honkin’ news seven days a week and some video shot on an old Bell+Howell camera in a log cabin in rural Kentucky.

Stephen E Arnold, November 14, 2016

Lawyers Might Be Automated Too

November 14, 2016

The worry with artificial intelligence is that it will automate jobs and leave people without a way to earn income. The general belief is that AI will automate manufacturing, retail, food service, and other industries, but what about law? One would think that lawyers would never lose their jobs, because a human is required to navigate litigation and represent a person in court, right? According to The Inquirer article, “UCL Creates AI ‘Lawbot’ That Rules on Cases With Surprising Accuracy,” lawyers might be automated too.

On a level akin to Watson, researchers at University College London, led by Dr. Nikolaos Aletras, created an algorithm that peruses case information and can predict accurate verdicts. The UCL team fed the algorithm litigation information from cases about torture, degrading treatment, privacy, and fair trials. They hope the algorithm will be used to identify patterns in human rights abuses.
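
The UCL team’s published method is more involved than anything shown here. The toy sketch below merely illustrates the general idea of predicting a case outcome from case text; the snippets and labels are invented:

```python
# Toy sketch only: predicting a case outcome ("violation" vs "no violation")
# from case text with word n-grams and a linear SVM. The UCL study's actual
# features and evaluation are more involved; this just illustrates the idea.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Hypothetical, hand-written snippets standing in for real case documents.
cases = [
    "applicant held in degrading conditions without access to counsel",
    "search of the home was authorised by a valid judicial warrant",
    "detainee denied medical care and subjected to ill treatment",
    "trial conducted publicly with full opportunity to examine witnesses",
]
outcomes = ["violation", "no violation", "violation", "no violation"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(cases, outcomes)

print(model.predict(["prisoner kept in solitary confinement without review"]))
```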

Dr. Aletras does not think AI will replace judges and lawyers, but it could be used as a tool to identify patterns in cases with specific outcomes. The algorithm has a 79% accuracy rate, which is not bad considering the amount of documentation involved. There is also a downside:

At a wider level, although 79 percent is a bit more ED-209 than we’d like for now, it does suggest that we’re a long way towards being able to install an ethical and moral code that would allow AI to … you know, not kill us and that.  With so many doomsayers warning us that the closer that we get to the so-called ‘singularity’ between humans and machines, the more likely we are to be toast as a race, it’s something of a good news story to see what’s being done to ensure AI stays on the straight and narrow.

Automation in the legal arena is a strong possibility for when “…implementation and interpretation of the law that is required, less so than the fact themselves.” The human element is still needed to decide cases, but perhaps automation would cut down on the number of light verdicts for pedophiles, sex traffickers, rapists, and other bad guys. It does make one wonder what alternative fields lawyers would consider.

Whitney Grace, November 14, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Project Tor Releases the Browser Manual

November 14, 2016

Tor Browser, the gateway to the Dark Web, now has a user manual that gives users a step-by-step procedure to download, install, use, and uninstall the browser in the most efficient manner.

The official Tor blog post titled Announcing the Tor Browser User Manual says:

The community team is excited to announce the new Tor Browser User Manual! The manual is currently only available in English. We will be adding more languages in the near future, as well as adding the manual to Transifex.

Web users are increasingly adopting secure browsers like Tor that shield them from online tracking. With this manual, users who are not well-versed in the Dark Web and want to access it, or who simply want to surf the Web anonymously, will get detailed instructions for doing so.

Some of the critical areas (apart from basic instructions like downloading and installing) covered in the manual include circumventing network restrictions, managing identities, securely connecting to Tor, managing plugins, and troubleshooting the most common problems.
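
The manual covers configuration inside the browser itself. For programmatic use, routing traffic through a locally running Tor client generally means pointing at its local SOCKS proxy. The sketch below assumes the common default port for Tor Browser and requires the requests[socks] extra; adjust it to your own setup:

```python
# Quick sketch of routing a Python request through a locally running Tor client.
# Tor Browser commonly listens on 127.0.0.1:9150 (the standalone tor daemon on
# 9050); adjust the port to your setup. Requires the requests[socks] extra.
import requests

TOR_SOCKS = "socks5h://127.0.0.1:9150"   # socks5h resolves hostnames via Tor
proxies = {"http": TOR_SOCKS, "https": TOR_SOCKS}

# check.torproject.org reports whether the request actually exited through Tor.
response = requests.get("https://check.torproject.org/api/ip",
                        proxies=proxies, timeout=30)
print(response.json())   # expect "IsTor": true when routed through Tor
```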

The manual was created after taking feedback from various mailing lists and IRC forums, as the blog points out:

During the creation of this manual, community feedback was requested over various mailing lists / IRC channels. We understand that many people who read this blog are not part of these lists / channels, so we would like to request that if you find errors in the manual or have feedback about how it could be improved, please open a ticket on our bug tracker and set the component to “community”.

The manual will soon be released in other major languages, which will benefit non-English speaking users. The aim is to foster growth and adoption of Tor; however, will only privacy-conscious users be using the browser?

Vishal Ingole, November 14, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
