Hacking the Internet of Things

November 17, 2016

Readers may recall October’s DDoS attack against internet-performance-management firm Dyn, which disrupted web traffic at popular sites like Twitter, Netflix, Reddit, and Etsy. As it turns out, the growing “Internet of Things (IoT)” facilitated that attack; specifically, thousands of cameras and DVRs were hacked and used to bombard Dyn with requests. CNet examines the issue of hacking through the IoT in “Search Engine Shodan Knows Where Your Toaster Lives.”

Reporter Laura Hautala informs us that it is quite easy for those who know what they’re doing to access any and all internet-connected devices. Skilled hackers can do so using search engines like Google or Bing, she tells us, but tools created for white-hat researchers, like Shodan, make the task even easier. Hautala writes:

While it’s possible hackers used Shodan, Google or Bing to locate the cameras and DVRs they compromised for the attack, they also could have done it with tools available in shady hacker circles. But without these legit, legal search tools, white hat researchers would have a harder time finding vulnerable systems connected to the internet. That could keep cybersecurity workers in a company’s IT department from checking which of its devices are leaking sensitive data onto the internet, for example, or have a known vulnerability that could let hackers in.

Even though sites like Shodan might leave you feeling exposed, security experts say the good guys need to be able to see as much as the bad guys can in order to be effective.

Indeed. Like every tool ever invented, Shodan’s impact depends on the intentions of the people using it.
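For readers curious about the mechanics, here is a minimal sketch of querying Shodan programmatically with its official Python client. The API key and the query string are placeholders, and the fields returned can vary by account tier.

```python
# Minimal sketch: query Shodan with its official Python client (pip install shodan).
# The API key and query string below are placeholders, not working values.
import shodan

API_KEY = "YOUR_SHODAN_API_KEY"  # hypothetical; obtain a real key from shodan.io
api = shodan.Shodan(API_KEY)

try:
    # Look for internet-facing devices whose banners match a product name.
    results = api.search("product:webcam")
    print("Total results:", results["total"])
    for match in results["matches"][:5]:
        print(match["ip_str"], match.get("port"), match.get("org"))
except shodan.APIError as exc:
    print("Shodan API error:", exc)
```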

Cynthia Murrell, November 17, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Black-Hat SEO Tactics Google Hates

November 16, 2016

The article on Search Engine Watch titled “Guide to Black Hat SEO: Which Practices Will Earn You a Manual Penalty?” follows up on a prior article that listed some of the sob stories of companies Google caught using black-hat practices. Google does not take kindly to such activities, strangely enough. This article goes through some of those practices, which are meant to “falsely manipulate a website’s search position.”

Any kind of scheme where links are bought and sold is frowned upon, however money doesn’t necessarily have to change hands… Be aware of anyone asking to swap links, particularly if both sites operate in completely different niches. Also stay away from any automated software that creates links to your site. If you have guest bloggers on your site, it’s good idea to automatically Nofollow any links in their blog signature, as this can be seen as a ‘link trade’.
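To make the Nofollow advice above concrete, here is a small sketch, assuming BeautifulSoup is available, that rewrites every link in a guest post so it carries rel="nofollow". The sample HTML is invented for illustration.

```python
# Sketch: add rel="nofollow" to every link in a guest post's HTML.
# Requires beautifulsoup4; the sample HTML below is invented for illustration.
from bs4 import BeautifulSoup

guest_post_html = '<p>Thanks for reading! Visit <a href="https://example.com">my site</a>.</p>'

soup = BeautifulSoup(guest_post_html, "html.parser")
for link in soup.find_all("a"):
    link["rel"] = "nofollow"  # signals crawlers not to pass link equity

print(soup)  # the same HTML, now with rel="nofollow" on each anchor
```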

Other practices that earned a place on the list include automatically generated content, cloaking and irrelevant redirects, and hidden text and links. Doorway pages, for example, are multiple pages targeting the same key phrase that funnel visitors to the same end destination. If you think these activities don’t sound so terrible, you are in great company: Mozilla, BMW, and the BBC have all been caught and punished by Google for such tactics. Good or bad? You decide.

Chelsea Kerwin, November 16, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Azure Search Overview

November 15, 2016

I know that Microsoft is a world leader in search and retrieval. Look at the company’s purchase of Fast Search & Transfer in 2008. Look at the search in Windows 7, 8, and 10. Look at the Microsoft research postings listed in Bing. I am convinced.

I did learn a bit more about Azure Search in “Microsoft Azure Search and Azure Backup Arrive in Canada.” I learned that search is now a service; for example:

Azure Search is Microsoft search-as-a-service solution for cloud. It allows customers to add search to their applications using REST API or .NET SDK. Microsoft handles the server and infrastructure management, meaning developers don’t need to worry about understanding search.

Here are the features I noted from the write up (a hedged query sketch follows the list):

  • Query syntax including Boolean and Lucene conventions
  • Support for 56 different languages
  • Search suggestions for auto complete
  • Hit highlighting
  • Geospatial support
  • Faceted navigation just like Endeca in 1998
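To make the REST API mention concrete, here is a hedged sketch of a query against an Azure Search index using Python’s requests library. The service name, index name, API key, and api-version are placeholders; consult Microsoft’s documentation for the exact parameters.

```python
# Sketch: query an Azure Search index over its REST API with the requests library.
# Service name, index name, API version, and key are placeholders, not working values.
import requests

SERVICE = "my-search-service"   # hypothetical service name
INDEX = "products"              # hypothetical index name
API_KEY = "YOUR_QUERY_KEY"      # hypothetical query key

url = "https://{0}.search.windows.net/indexes/{1}/docs".format(SERVICE, INDEX)
params = {
    "api-version": "2016-09-01",      # version current when this was written
    "search": "toaster AND smart",    # Boolean query
    "queryType": "full",              # enables Lucene query syntax
    "facet": "category",              # faceted navigation
    "highlight": "description",       # hit highlighting
}
response = requests.get(url, params=params, headers={"api-key": API_KEY})
response.raise_for_status()
for doc in response.json().get("value", []):
    print(doc)
```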

The most interesting statement in the write up was, in my opinion, this one:

Microsoft handles the server and infrastructure management, meaning developers don’t need to worry about understanding search.

I love that one does not need to understand search. That’s what makes search so darned fascinating today. Systems which require no understanding. I also believe everything that a search system presents in a list of relevance ranked results. I really do. I, for example, believed that Fast Search & Transfer was the most wonderful search system in the world until, well, the investigators arrived. Azure is even more wonderful as a cloud appliance thing that developers do not need to understand. Great and wonderful.

Stephen E Arnold, November 15, 2016

The House Cleaning of Halevy Dataspace: A Web Curiosity

November 14, 2016

I am preparing three seven-minute videos, one each week starting on December 20, 2016. The subject is my Google Trilogy, published by an antique outfit which has drowned in the River Avon. The first video is about the 2004 monograph, The Google Legacy. I coined the term “Googzilla” in that 230 page discussion of how the baby Google grew into Googzilla. The second video summarizes several of the takeaways from Google: The Calculating Predator, published in 2007. The key to the monograph is the bound phrase “calculating predator.” Yep, not the happy little search outfit most know and love. The third video hits the main points of Google: The Digital Gutenberg, published in 2009. The idea is that Google spits out more digital content than almost anyone, yet few think of the GOOG as the content generator the company has become. Yep, a map is a digital artifact.

Now to the curiosity. I wanted to reference the work of Dr. Alon Halevy, a former University of Washington professor and founder of Nimble and Transformic. I had a stack of links I used when I was doing the research for my predator book. Just out of curiosity I started following the links. I do have PDF versions of most of the open source Halevy-centric content I located.

But guess what?

Dr. Alon Halevy has disappeared. I could not locate the open source version of his talk about dataspaces. I could not locate the Wayback Machine’s archived version of the Transformic.com Web site. The links returned these weird 404 errors. My assumption was that Wayback’s Web pages resided happily on the outfit’s servers. I was incorrect. Here’s what I saw:

[Image: the 404 error returned for the archived Transformic.com pages]
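As a side note, the sort of link checking described here can be scripted. Below is a minimal sketch using the Wayback Machine’s public availability endpoint; the URL in the list is illustrative.

```python
# Sketch: check whether a link is live and whether the Wayback Machine holds a snapshot.
# The link list is illustrative; archive.org's availability endpoint is public.
import requests

links = ["http://www.transformic.com/"]  # illustrative link from the research notes

for link in links:
    try:
        status = requests.head(link, allow_redirects=True, timeout=10).status_code
    except requests.RequestException:
        status = None  # dead host, DNS failure, timeout, etc.

    wayback = requests.get(
        "https://archive.org/wayback/available", params={"url": link}, timeout=10
    ).json()
    snapshot = wayback.get("archived_snapshots", {}).get("closest", {})
    print(link, "| live status:", status, "| wayback snapshot:", snapshot.get("url", "none"))
```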

I explored the bound phrase “Alon Halevy” with various other terms only to learn that the bulk of the information has disappeared. No PowerPoints, not much substantive information. There were a few “information objects” which have not yet disappeared; for example:

  • An ACM blog post which references “the structured data team” and Nimble and Transformic
  • A Google research paper which will not please those who buy into David Gelernter’s The Tides of Mind thesis
  • A YouTube video of a lecture given at Technion.

I found the gap between the research I gathered from 2005 to 2007 and what remains online today interesting. I asked myself, “How did I end up with so many dead links about a technology I have described as one of the most important in database, data management, data analysis, and information retrieval?”

Here are the answers I formulated:

  1. The Web is a lousy source of information. Stuff just disappears like the Darpa listing of open source Dark Web software, blogs, and Web sites
  2. I did really terrible research and even worse librarian type behavior. Yep, mea culpa.
  3. Some filtering procedures became a bit too aggressive and the information has been swept from assorted indexes
  4. The Wayback Machine ran off the rails and pointed to an actual 2005 Web site which its system failed to copy when the original spidering was completed.
  5. Gremlins. Hey, they really do exist. Just ask Grace Hopper. Yikes, she’s not available.

I wanted to mention this scrubbing, whether deliberate or erroneous. The story in this week’s HonkinNews video points out that 89 percent of journalists do their research via Google. Now if information is not in Google, what does that imply for a “real” journalist trying to do an objective, comprehensive story? I leave it up to you, gentle reader, to penetrate this curiosity.

Watch for the Google Trilogy seven minute videos on December 20, 2016, December 27, 2016, and January 3, 2017. Free. No pay wall. No Patreon.com pleading. No registration form. Just honkin’ news seven days a week and some video shot on an old Bell+Howell camera in a log cabin in rural Kentucky.

Stephen E Arnold, November 14, 2016

Project Tor Releases the Browser Manual

November 14, 2016

Tor Browser, the gateway to the Dark Web, now has a user manual that walks users step by step through downloading, installing, using, and uninstalling the browser in the most efficient manner.

The official Tor blog post titled “Announcing the Tor Browser User Manual” says:

The community team is excited to announce the new Tor Browser User Manual! The manual is currently only available in English. We will be adding more languages in the near future, as well as adding the manual to Transifex.

Web users are increasingly adopting secure browsers like Tor that shield them from online tracking. With this manual, users who are not well versed in the Dark Web but want to access it, or who simply want to surf the web anonymously, will get detailed instructions on doing so.

Some of the critical areas covered in the manual, apart from basic instructions like downloading and installing, include circumventing network restrictions, managing identities, securely connecting to Tor, managing plugins, and troubleshooting the most common problems.
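The manual covers these tasks from inside the browser itself. For readers who script against Tor directly, an adjacent task, requesting a new identity over the control port, looks roughly like the sketch below; it assumes the stem library and a Tor instance with its control port enabled on 9051.

```python
# Sketch: request a new Tor identity over the control port using the stem library.
# Assumes Tor is running with ControlPort 9051 and cookie or password authentication set up.
from stem import Signal
from stem.control import Controller

with Controller.from_port(port=9051) as controller:
    controller.authenticate()          # cookie auth, or pass password="..." if configured
    controller.signal(Signal.NEWNYM)   # ask Tor to build fresh circuits for new connections
    print("New identity requested")
```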

The manual was created after taking feedback from various mailing lists and IRC channels, as the blog points out:

During the creation of this manual, community feedback was requested over various mailing lists / IRC channels. We understand that many people who read this blog are not part of these lists / channels, so we would like to request that if you find errors in the manual or have feedback about how it could be improved, please open a ticket on our bug tracker and set the component to “community”.

The manual will soon be released in other major languages, which will benefit non-English speaking users. The aim is to foster growth and adoption of Tor; the question, however, is whether anyone beyond privacy-conscious users will actually use the browser.

Vishal Ingole, November 14, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Why Search When You Can Discover

November 11, 2016

What’s next in search? My answer is, “No search at all. The system thinks for you.” Sounds like Utopia for the intellectual couch potato to me.

I read “The Latest in Search: New Services in the Content Discovery Marketplace.” The main point of the write up is to highlight three “discovery” services. A discovery service is one which offers “information users new avenues to the research literature.”

See, no search needed.

The three services highlighted are:

  • Yewno, which is powered by an inference engine. (Does anyone remember the Inference search engine from days gone by?). The Yewno system uses “computational analysis and a concept map.” The problem is that it “supplements institutional discovery.” I don’t know what “institutional discovery” means, and my hunch is that folks living outside of rural Kentucky know what “institutional discovery” means. Sorry to be so ignorant.
  • ScienceOpen, which delivers a service which “complements open Web discovery.” Okay. I assume that this means I run an old fashioned query and ScienceOpen helps me out.
  • TrendMD, which “serves as a classic ‘onward journey tool’ that aims to generate relevant recommendations serendipitously.”

I am okay with the notion of having tools to make it easier to locate information germane to a specific query. I am definitely happy with tools which can illustrate connections via concept maps, link analysis, and similar outputs. I understand that lawyers want to type in a phrase like “Panama deal” and get a set of documents related to this term so the mass of data can be chopped down by sender, recipient, time, etc.
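As a toy illustration of the concept-map and link-analysis outputs mentioned above, here is a sketch using networkx. The entities and relations are invented; real discovery systems build such graphs from extracted documents.

```python
# Toy sketch: a tiny concept map built with networkx, queried for one term's connections.
# The entities and relations below are invented purely for illustration.
import networkx as nx

graph = nx.Graph()
graph.add_edge("Panama deal", "Shell company A", relation="mentioned together")
graph.add_edge("Panama deal", "Law firm B", relation="mentioned together")
graph.add_edge("Law firm B", "Correspondent C", relation="email traffic")

query = "Panama deal"
for neighbor in graph.neighbors(query):
    print(query, "--", graph.edges[query, neighbor]["relation"], "-->", neighbor)
```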

But setting up discovery as a separate operation from keyword or entity based search seems a bit forced to me. The write up spins its lawn mower blades over the TrendMD service. That’s fine, but there are a number of ways to explore scientific, technical, and medical literature. Some are or were delightful like Grateful Med; others are less well known; for example, Mednar and Quertle.

Discovery means one thing to lawyers. It means another thing to me: A search add on.

Stephen E Arnold, November 11, 2016

Three Deadlines in October and November Mark Three Strikes on Google

November 11, 2016

The article titled Google Is Getting Another Extension to Counter EU Antitrust Charges on Fortune raises the question: how many more times will the teacher accept the “I need more time” argument? With the potential for a penalty of over a billion dollars if Google is found guilty, the company is vying for all the time it can get before answering accusations of unfair treatment of rival shopping services through its search results. The article tells us:

The U.S. technology giant was due to respond to the accusations on Thursday but requested more time to prepare its defense. The company now has until Nov. 7, a European Commission spokesman said. “Google asked for additional time to review the documents in the case file. In line with normal practice, the commission analysed the reasons for the request and granted an extension allowing Google to fully exercise its rights of defense,” he said.

If anyone is counting at this point, the case is now six years old, meaning it has probably graduated kindergarten and moved into the first grade. The article does not comment on how many extensions have been requested altogether, but it does mention that another pair of deadlines looms in Google’s near future. October 26 and October 31 are the dates by which Google must respond to the charges of blocking competitor advertisements and of using the Android operating system to suppress rivals.

Chelsea Kerwin, November 11, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Google Biases: Real, Hoped For, or Imagined?

November 10, 2016

I don’t have a dog in this fight. Here at Beyond Search we point to open source documents and offer comments designed to separate the giblets from the goose feathers. Yep, that’s humor, gentle reader. Like it or not.

The write up “Opinion: Google Is Biased Toward Reputation-Damaging Content” pokes into an interesting subject. When I read the article, I thought, “Is this person a user of Proton Mail?”

The main point of the write up is that the Google relevance ranking method responds in an active manner to content which the “smart” software determines is negative. But people wrote the software, right? What’s up, people writing relevance ranking modules?

The write up states:

Google has worked very hard to interpret user intent when searches are conducted. It’s not easy to fathom what people may be seeking when they submit a keyword or a keyword phrase.

Yep, Google did take this approach prior to its initial public offering in 2004. Since then, I ask, “What changes did Google implement in relevance in the post IPO era?” I ask, “Did Google include some of the common procedures which have known weaknesses with regard to what lights the fires of the algorithms’ interests?”

The write up tells me:

Since Google cannot always divine a specific intention when a user submits a search query, it’s evolved to using something of a scattergun approach — it tries to provide a variety of the most likely sorts of things that people are generally seeking when submitting those keywords. When this is the name of a business or a person, Google commonly returns things like the official website of the subject, resumes, directory pages, profiles, business reviews and social media profiles. Part of the search results variety Google tries to present includes fresh content — newly published things like news articles, videos, images, blog posts and so on. [Emphasis added.]

Perhaps “fresh” content triggers other relevance components. For example, fresh content signals change, and change may mean that the “owner” of the Web page is interested in buying AdWords. Does a boost for “new stuff” mean that when a search result drifts lower over a span of a week or two, the willingness to buy AdWords goes up? I think about this question because it suggests that tuning certain methods provides a signal to the AdWords subsystems of people and code. I have described how such internal “janitors” within Google modules perform certain chores. Is this a “new” chore designed to create a pool of AdWords prospects? Alas, the write up does not explore this matter.
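To show what a freshness component could look like in principle, here is a toy sketch of a relevance score with a decaying freshness boost. The weights, half-life, and structure are hypothetical; this is not Google’s method.

```python
# Toy sketch: fold a decaying freshness boost into a base relevance score.
# Weights and half-life are hypothetical; this is not Google's ranking method.
import math
from datetime import datetime, timezone

def score(text_relevance, published, half_life_days=14.0):
    """Combine a base text-relevance score with an exponentially decaying freshness boost."""
    age_days = (datetime.now(timezone.utc) - published).total_seconds() / 86400
    freshness = math.exp(-age_days / half_life_days)  # 1.0 for brand-new content, then decays
    return text_relevance * (1.0 + 0.5 * freshness)   # hypothetical 50 percent maximum boost

print(score(1.0, datetime(2016, 11, 1, tzinfo=timezone.utc)))
```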

The write up points to a Googler’s public explanation of some of the relevance ranking methods in use today. That’s good information. But judging from the public presentations of Google systems and methods with which I am familiar, what’s revealed is like touching an elephant while blind. There is quite a bit more of the animal to explore and understand. In fact, “understand” is pretty tough unless one is a Googler with access to other Googlers, the company’s internal database system, and the semi-clear guidelines from whoever seems to be in charge at a particular time.

I highlighted this passage from the original write up as interesting:

I’ve worked on a number of cases in which all my research indicates my clients’ names have extremely low volumes of searches.  The negative materials are likely to receive no more clicks than the positive materials, according to my information, and, in many cases, they have fewer links.

Okay, so there’s no problem? If so, why is the write up headed down the Google distorts results path? My hunch is that the assurance is a way to keep Googzilla at bay. The author may want to work at the GOOG someday. Why be too feisty and remind the reader of the European Commission’s view of Google’s control of search results?

The write up concludes with a hope that Google says more about how it handles relevance. Yep, that’s a common request from the search engine optimization crowd.

My view from rural Kentucky is that there are a number of ways to have an impact on what Google presents in search results. Some of these methods exploit weaknesses in the most common algorithms used for basic functions within the Google construct. Other methods are available as well, but these are identified by trial and error by SEO wizards who flail for a way to make their clients’ content appear in the optimum place for one of the clients’ favorite keywords.

Three observations:

  • The current crop of search mavens at Google are in the business of working with what is already there. Think in terms of using a large, frequently modified, and increasingly inefficient system for determining relevance. That’s what the new hires confront. Fun stuff.
  • The present climate for relevance at Google is focused on dealing with the need to win in mobile search. The dominant market share in desktop search is not a given in the mobile world. Google is fragmenting its index for a reason. The old desktop model looks a bit like a 1990s Corvette. Interesting. Powerful. Old.
  • The need for revenue is putting more and more pressure on Google to make up for the gap between mobile user behavior and desktop user behavior in terms of search. Google is powerful, but different methods are needed to get closer to that $100 billion in revenue Eric Schmidt referenced in 2006. Relevance may be an opportunity.

My view is that Google is more than 15 years down the search road. Relevance is no longer defined by precision and recall. What’s important is reducing costs, increasing revenue, and dealing with the problems posed by Amazon, Facebook, Snapchat, et al.

Relevance is not high on the list of to-dos in some search-centric companies. Poking Google about relevance may produce some reactions. But not from me. I love the Google. Proton Mail is back in the index because Google allegedly made a “fix.” See. Smart algorithms need some human attention. If you buy a lot of AdWords, I would wager that some human Googlers will pay attention to you. Smart software isn’t everything once it alerts a Googler to activate the sensitivity function in the wetware.

Stephen E Arnold, November 10, 2016

Google Search Tips That Make You Say DUH

November 10, 2016

Unless you are the one setting trends in the search field, there is room for you to learn new search-related skills. Search is a basic function in the developed world, and it is more powerful than typing a word or phrase into Google’s search box. Google also has more tricks in its toolbox than you might be aware of. Single Grain published “Google Like A Pro: 42 Of The Most Useful Google Search Tricks,” which runs down useful ways to use the search engine. Some of them, however, are cheap tricks we have discussed before.

Single Grain runs down the usual Google stats about how many people use the search engine, its multiple services, and the hard-to-find Advanced Search. Here is the article’s basic description:

Here’s a list of 42 of the most useful Google search tricks that’ve probably never thought of—some practical, some just plain fun. Everyone knows how to Google, but by learning how to Google like a pro, you can harness the full power of the search giant and impress your boss and friends alike. Or at least find stuff.

These tips include: calculator, package tracker, stock watcher, tip calculator, conversions, weather, flight tracker, coin flipping, voice search, fact checking, and other tips you probably know. What I love is that the piece treats Boolean operators as if they are a brand new thing; it does not even use the word “Boolean.” Call me old school, but give credit where credit is due.
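For the record, here is a small sketch that composes a query using the Boolean-style operators the article skips. The search terms are illustrative only.

```python
# Sketch: compose a Google query string using the Boolean-style operators the article ignores.
# The search terms are illustrative only.
phrase = '"enterprise search"'        # exact phrase match
either = "(Lucene OR Solr)"           # Boolean OR between alternatives
exclude = "-appliance"                # exclude a term
scope = "site:arnoldit.com"           # restrict results to one site

query = " ".join([phrase, either, exclude, scope])
print(query)  # "enterprise search" (Lucene OR Solr) -appliance site:arnoldit.com
```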

Whitney Grace, November 10, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Shining a Flashlight in Space

November 9, 2016

A tired yet thorough metaphor for the dark web is shining a flashlight in space.  If you shine a flashlight in space, your puny battery-powered beacon will not shed any light on the trillions of celestial objects that exist in the vacuum.  While you wave the flashlight around trying to see something in the cosmos, you are too blinded by the beam to see the grand galactic show behind it.  The University of Michigan shared the article “Shadow Of The Dark Web” about Computer Science and Engineering Professor Mike Cafarella and his work with DARPA.

Cafarella is working on Memex, a project that goes beyond the regular text-based search engine.  Using more powerful search tools, Memex concentrates on discovering information related to human trafficking.  Older dark web search tools skimmed over information and were imprecise.  Cafarella’s work improved dark web search tools, supplying data sets with more accurate information on traffickers, their contact information, and their location.
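Memex itself is far more sophisticated, but a minimal sketch of the kind of contact-information extraction involved might look like the following. The sample text and regular expressions are illustrative only.

```python
# Minimal sketch: pull contact details out of scraped ad text with regular expressions.
# This is not Memex; the sample text and patterns are illustrative only.
import re

ad_text = "Call 555-0142 anytime, or email contact@example.com. Located downtown."

phones = re.findall(r"\b\d{3}[-.\s]?\d{4}\b", ad_text)
emails = re.findall(r"[\w.+-]+@[\w.-]+\.\w+", ad_text)

print("Phones:", phones)
print("Emails:", emails)
```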

Humans are still needed to interpret the data, as the algorithms do not know how to interpret the black-market economic worth of trafficked people.  Cafarella’s dark web search tools can be applied to more than sex-trafficking cases:

His work can help identify systems of terrorist recruitment; bust money-laundering operations; build fossil databases from a century’s worth of paleontology publications; identify the genetic basis of diseases by drawing from thousands of biomedical studies; and generally find hidden connections among people, places, and things.

‘I would never have thought a few years ago that database and data-mining research could have such an impact, and it’s really exciting,’ says Cafarella. ‘Our data has been shipped to law enforcement, and we hear that it’s been used to make real arrests. That feels great.’

In order to see the dark web, you need more than a flashlight.  To continue the space metaphor, you need a powerful telescope that scans the heavens and can search the darkness where no light ever passes.

Whitney Grace, November 9, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
