True or False: Google Fakes Results for Social Engineering

September 13, 2016

Here in Harrod’s Creek, we love the Alphabet Google thing. When we read anti Google articles, we are baffled. Why don’t these articles love and respect the GOOG as we do? A case in point is “How Google’s Search Engines Use Faked Results for Social Engineering.” The loaded words “faked results” and “social engineering” put us on our guard.

What is the angle the write up pursues? Let’s look.

I highlighted this passage as a way to get my intellectual toe in the murky water:

Google published an “overview” of how SEO works, but in a nutshell, Google searches for the freshest, most authoritative, easiest-to-display (desktop/laptop and mobile) content to serve its search engine users. It crawls, caches (grabs) content, calculates the speed of download, looks at textual content, counts words to find relevance, and compares how it looks on different sized devices. It not only analyzes what other sites link to it, but counts the number of these links and then determines their quality, meaning the degree to which the links in those sites are considered authoritative. Further, there are algorithms in place that block the listing of “spammy” sites, although, spam would not be relevant here. And recently, they have claimed to boost sites using HTTPS to promote security and privacy (fox henhouse?).

I am not sure about the “fox hen house” reference because fox is a popular burgoo addition. As a result the critters are few and far between. Too bad. They are tasty and their tails make nifty additions to cold weather parkas.
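The link counting and link quality idea in the quoted passage can be sketched in a few lines. What follows is a PageRank-style iteration, a published link analysis technique, and not a description of what Google runs in production. The function name `link_scores`, the damping value, and the four page web are assumptions made up for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by inbound links, weighting each link by its source's score.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outbound links spreads its score evenly.
                for t in pages:
                    new[t] += damping * scores[page] / n
        scores = new
    return scores


# Hypothetical four page web: three pages link to "c", and "c" links to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(link_scores(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy web, "c" comes out on top because it has the most inbound links, and "a" beats "b" and "d" because its single link comes from the highly scored "c". That is the "quality, not just count" point in miniature.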

The author of the write up is not happy with how Google responds to a query for “Jihad.” I learned:

Google’s search results give pride of place to IslamicSupremeCouncil.org. The problem, according to the write up, is that this site is not a big hitter in the Jihad content space.

The article points out that Google does not return the search results the person running the test queries expected. The article points out:

When someone in the US, perhaps wanting to educate themselves on the subject, searches for “Jihad” and sees the Islamic Supreme Council as the top-ranked site, the perception is that this is the global, unbiased and authoritative view. If they click on that first, seemingly most popular link, their perception of Jihad will be skewed by the beliefs and doctrine of this peaceful group of people. These people who merely dabble on the edge of Islamic doctrine. These people who are themselves repeatedly targeted for their beliefs that are contrary to those of the majority of Muslims. These people who do not even come close to being any sort of credible or realistic representation of the larger and more prevalent subscribers (nay soldiers) of the “Lesser Jihad” (again, the violent kind).

My thought is that the results I get from any ad supported, publicly accessible search system are rarely what I expect. The more I know about a particular subject (say, how legacy search system marketing distorts what the systems can actually do), the more disappointed I am with the search results.

I don’t think Google is intentionally distorting search results. Certain topics just don’t match up to the Google algorithms. Google is pretty good at sports, pizza, and the Housewives of Beverly Hills. Google is not particularly good with fine grained distinctions in certain topic spaces.

If the information presented by, for instance, the Railway Retirement Board is not searched, the Google system does its best to find a way to sell an ad against a topic or word. In short, Google does better with certain popular subjects which generate ad revenue.

Legacy enterprise search systems like STAIRS III are not going to be easy to search. Nailing down the names of the programmers in Germany who worked on the system and how the STAIRS III system influenced BRS Search is a tough slog with the really keen Google system.

If I attribute Google’s indifference to information about STAIRS III to a master scheme put in place by Messrs. Brin and Page, I would be giving them a heck of a lot of credit for micro managing how content is indexed.

The social engineering angle is more difficult for me to understand. I don’t think Google is biased against mainframe search systems which are 50 years old. The content, the traffic, and the ad focus pretty much guarantee that STAIRS III is presented in a good enough way.

The problem, therefore, is that Google’s whiz kid technology is increasingly good enough. That means average or maybe a D plus. The yardstick is neither precision nor recall. At Google, revenue counts.
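For readers who have not had to compute the two yardsticks, here is a minimal sketch of precision and recall for a single query. The function name and the document identifiers are hypothetical; the definitions are the standard ones from information retrieval.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the search system returned
    relevant:  document ids a subject matter expert judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    # Precision: what fraction of the returned documents were useful?
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what fraction of the useful documents were returned?
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: ten results returned, three of them useful,
# out of six useful documents known to exist.
returned = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
useful = {"d2", "d5", "d9", "d11", "d12", "d13"}
p, r = precision_recall(returned, useful)
print(f"precision={p:.2f} recall={r:.2f}")
```

Neither number has a column for ad revenue, which is the point.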

Baidu, Bing, Silobreaker, Qwant, and Yandex, among other search systems, have similar challenges. But each system is tending to the “good enough” norm. Presenting any subject in a way which makes a subject matter expert happy is not what these systems are tuned to do.

Here in Harrod’s Creek, we recognize that multiple queries across multiple systems are a good first step in research. Then there is the task of identifying individuals with particular expertise and trying to speak with them or at least read what they have written. Finally, there is the slog through the dead tree world.

Expecting Google or any free search engine to perform sophisticated knowledge centric research is okay. We prefer the old fashioned approach to research. That’s why Beyond Search documents some of the more interesting approaches revealed in the world of online analysis.

I like the notion of social engineering, particularly the Augmentext approach. But Google is more interested in money and itself than in many search topics which are not represented in a way which I would like. Does Google hate me? Nah, Google doesn’t know I exist. Does Google discriminate against STAIRS III? Nah, of Google’s 65,000 employees probably fewer than 50 know what STAIRS III is. Do Googlers understand revenue? Yep, pretty much.

Stephen E Arnold, September 13, 2016

Toshiba Amps up Vector Indexing and Overall Data Matching Technology

September 13, 2016

The article on MyNewsDesk titled Toshiba’s Ultra-Fast Data Matching Technology is 50 Times Faster than its Predecessors relates the bold claims swirling around Toshiba and their Vector Indexing Technology. By skipping the step involving computation of the distance between vectors, Toshiba has slashed the time it takes to identify vectors (they claim). The article states,

Toshiba initially intends to apply the technology in three areas: pattern mining, media recognition and big data analysis. For example, pattern mining would allow a particular person to be identified almost instantly among a large set of images taken by surveillance cameras, while media recognition could be used to protect soft targets, such as airports and railway stations, by automatically identifying persons wanted by the authorities.

In sum, Toshiba technology is able to quickly and accurately recognize faces in the crowd. But the specifics are much more interesting. Current technology takes around 20 seconds to identify an individual out of 10 million, and Toshiba can do it in under a second. The precision rates that Toshiba reports are also outstanding at 98%. The world of Minority Report, where ads recognize and direct themselves to random individuals, seems to be increasingly within reach. Perhaps more importantly, this technology should be of dire importance to the criminal and perceived criminal populations of the world.
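The write up does not explain how Toshiba's index avoids computing distances, so the sketch below shows only the baseline step the company says it skips: a brute-force scan that computes one distance per stored vector. Euclidean distance, the function name, and the random eight dimension vectors are all assumptions for illustration; real face feature vectors would look different.

```python
import math
import random


def nearest_by_brute_force(query, vectors):
    """Return (index, distance) of the stored vector closest to query.

    Computes one Euclidean distance per stored vector, so the work grows
    linearly with the size of the collection.
    """
    best_index, best_distance = -1, math.inf
    for index, candidate in enumerate(vectors):
        distance = math.dist(query, candidate)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


random.seed(0)
# Hypothetical stand-in for a gallery of face feature vectors.
gallery = [[random.random() for _ in range(8)] for _ in range(1000)]
probe = list(gallery[417])  # an exact copy of one stored vector
print(nearest_by_brute_force(probe, gallery))
```

If the 20 second figure comes from a scan like this one, it works out to roughly 500,000 comparisons per second across 10 million vectors. Any index that avoids most of those comparisons has a lot of room to be faster.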

Chelsea Kerwin, September 13, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/

Elastic Links Search and Social Through Graph Capabilities

September 13, 2016

The article titled Confused About Relationships? Elasticsearch Gets Graphic on The Register communicates the latest offering from Elasticsearch, the open-source search server based on Apache’s Lucene. Graph capabilities are an exciting new twist on search that enables users to map out relationships through the search engine and the Kibana data visualization plug-in. The article explains,

By fusing graph with search, Elastic hopes to combine the power of social with that earlier great online revolution, the revolution that gave us Google: search. Graph in Elasticsearch establishes relevance by establishing the significance of each relationship versus the global average to return important results. That’s different to what Elastic called “traditional” relationship mapping, which is based on a count of the frequency of a given relationship.
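The difference between counting a relationship and weighing it against the global average is easier to see with numbers. The sketch below ranks co-occurring terms both ways. The "lift" ratio is a simple stand-in chosen for illustration and is not Elastic's scoring; the function name and the shopping basket data are hypothetical.

```python
from collections import Counter


def related_terms(docs, seed):
    """Rank terms that co-occur with seed two ways.

    docs: list of sets of terms, one set per document.
    Returns (by_count, by_lift):
      by_count: terms ordered by raw co-occurrence frequency
      by_lift:  terms ordered by how much more often they appear alongside
                seed than they do across the whole collection
    """
    foreground = [d for d in docs if seed in d]
    fg_counts = Counter(t for d in foreground for t in d if t != seed)
    bg_counts = Counter(t for d in docs for t in d)

    def lift(term):
        fg_rate = fg_counts[term] / len(foreground)
        bg_rate = bg_counts[term] / len(docs)
        return fg_rate / bg_rate

    by_count = sorted(fg_counts, key=lambda t: (-fg_counts[t], t))
    by_lift = sorted(fg_counts, key=lambda t: (-lift(t), t))
    return by_count, by_lift


# Hypothetical shopping baskets. Batteries turn up nearly everywhere.
baskets = [
    {"guitar", "strings", "batteries"},
    {"guitar", "strings", "batteries"},
    {"guitar", "strings", "batteries"},
    {"guitar", "batteries"},
    {"phone", "batteries"},
    {"torch", "batteries"},
    {"radio", "batteries"},
    {"phone", "case"},
]
by_count, by_lift = related_terms(baskets, "guitar")
print("by frequency:", by_count)
print("by significance:", by_lift)
```

By raw count, batteries look like the strongest companion to guitars. Measured against how common batteries are everywhere else, strings move to the top, which is the behavior the quoted passage attributes to the Graph approach.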

Elasticsearch sees potential for their Graph capabilities in behavioral analysis, particularly in areas such as drug discovery, fraud detection, and customized medicine and recommendations. When it comes to identifying business opportunities, Graph databases have already proven their value. Discovering connections and trimming degrees of separation are all of vital importance in social media. Social networks like Twitter have been using them since the beginning of NoSQL. Indeed, Facebook is a customer of Elastic, the business version of Elasticsearch that was founded in 2012. Other users of Elasticsearch include Netflix, StumbleUpon, and Mozilla.

Chelsea Kerwin, September 13, 2016

Autonomy Back Home in Merrie Olde England

September 12, 2016

I read “Hewlett Packard Offloads Last Autonomy Assets in Software Deal.” I think that Autonomy is now going back home. Blood pudding, the derbies, and Indian take aways—yes, the verdant isle.

The union of Hewlett Packard (once an ink outfit) and the love child of Bayesian and Laplacian methods is burst asunder. HPE (the kissin’ cousin of the ink outfit) fabricated a deal only lawyers, MBAs, and accountants can conjure.

There is an $8 billion deal, cash to HPE, and a fresh swath of lush pasture for Micro Focus to cultivate.

I learned:

“Autonomy doesn’t really exist as an entity, just the products,” said Kevin Loosemore, executive chairman of Micro Focus. Loosemore said the Newbury-based business conducted due diligence across all of the products included in the deal, with no different approach taken for the Autonomy assets. No legal liabilities from Autonomy will be transferred to Micro Focus.

Integration is what Micro Focus does. Autonomy embodied in products was once a goal for some senior Autonomy executives. The golden sun is rising over the mid 1990s technology.

We wish Micro Focus well. We wish HPE well as it moves toward the resolution of its claims against Autonomy for assorted misdeeds.

Without search, HPE ceases to interest me. While HPE was involved in search, there was some excitement generated, but that is winding down and, for some I imagine, has long since vaporized.

I will have fond memories of HP blaming Autonomy for HP’s decision to buy Autonomy. Amazing. One of the great comedic moments in search and fading technology management.

Autonomy is dead. Long live Autonomy. Bayes lasted 60 years; Autonomy may have some legs even if embodied in other products. IDOL hands are the devil’s playthings I think. PS. I will miss the chipper emails from BM.com. Substantive stuff.

Stephen E Arnold, September 12, 2016

Ads Appear Here, There, and Everywhere Across Google Landscape

September 12, 2016

The article on CNN Money titled Google Is Going to Start Showing You More Ads discusses the surge in ads that users can expect to barely notice over the coming weeks and months. In efforts to ramp up mobile ad revenue to match the increasing emphasis on mobile search, Google is making mobile ads bigger, more numerous, and just more. The article explains,

Google will be simplifying the work flow for businesses to create display ads with images. The company says advertisers need to “simply provide headlines, a description, an image, and a URL,” and Google will automatically design ads for the business. Location-based ads will start showing up on Google too. If you search for “shoe store” or “car repair near me,” ads for local businesses will populate the search results… The changes come as Google is trying to stay ahead of customers’ changing demands.

Google claims in the article that the increase is already showing strong results for advertisers, with click-through rates (CTR) up 20%. But it is hard to believe. As ads flood the space between articles, search results, and even Google Map directions, they seem to be no more significant than an increase in white noise. If Google really wants to revolutionize marketing, they are going to need to dig deeper than just squeezing more ads in between the lines.

Chelsea Kerwin, September 12, 2016

 

Google Springboard: Diving into Familiar Water

September 10, 2016

In June 2016, we learned that Google, the creator of the late Google Search Appliance, had the replacement for the champion bouncing up and down on the enterprise search diving board. Springboard, GOOG’s latest “new” search product, was, like the GSA, designed to put the right information at one’s fingertips. After the announcement in the Google for Work Official Blog, the product has done shallow dives in kiddie pools. Three laps later, Google is checking out more competitive indoor swimming facilities.

We learned this in “Box Teams Up with Google for Docs and Springboard Integration.” The announcement reveals a different approach to enterprise search for the GOOG. In the good old days, one could pony up hefty sums to license the Google Search Appliance. Google had determined more than a decade ago that on premises enterprise search systems like Autonomy IDOL (RIP) or Fast Search & Transfer ESP were too difficult for mere mortals to deploy in a cost effective manner. Google figured a search appliance, a finding toaster if I may craft a metaphor, was the solution. It really wasn’t. Google backed away from the expensive servers. From the get go, Google’s use of on premises, old fashioned hardware seemed to run counter to the Google cloud ad search business.

We noted this statement in the “Box Teams Up” write up:

It may seem a little odd for Google to be collaborating with Box on cloud storage when Google has its own offering there, which is also a revenue driver for the search giant. But the partnership is actually only really likely to benefit customers of both groups, without really biting into the customer base of either, given the distinctions between what Box and Google Drive can provide.

The major features of Springboard from what we can see from our cabin in Harrod’s Creek are:

  • Connectors to federate content
  • Quick and easy searching across the content
  • Assistance with “useful and actionable information throughout the day.”

For more than six years the savvy Alphabet Google thing watched Amazon, Elastic, SearchBlox, Yippy and other vendors roll out cloud search solutions. As surprising as it is to some people, Google’s slow response to cloud based enterprise search underscores the malaise which seems to be emerging around the volleyball court. Will Googlers execute perfectly an arm stand back double somersault tuck into the pool from its Springboard?

Google’s marketing reminded me that I was spending 19 percent of my time looking for information. If I owned a GSA (which I no longer possess), that device did not really help me out if Google’s data are correct. Will Springboard?

We will have to wait for an enterprise search competition before we know if Google wins a medal. One hopes Springboard will have that Elastic bounce.

Stephen E Arnold, September 10, 2016

Enterprise Search: Pool Party and Philosophy 101

September 8, 2016

I noted this catchphrase: “An enterprise without a semantic layer is like a country without a map.” I immediately thought of this statement made by Polish-American scientist and philosopher Alfred Korzybski:

The map is not the territory.

When I think about enterprise search, I am thrilled to have an opportunity to do the type of thinking demanded in my college class in philosophy and logic. Great fun. I am confident that any procurement team will be invigorated by an animated discussion about representations of reality.

I did a bit of digging and located “Introducing a Graph-based Semantic Layer in Enterprises” as the source of the “country without a map” statement.

What is interesting about the article is that the payload appears at the end of the write up. The magic of information representation as a way to make enterprise search finally work is technology from a company called Pool Party.

Pool Party describes itself this way:

Pool Party is a semantic technology platform developed, owned and licensed by the Semantic Web Company. The company is also involved in international R&D projects, which continuously impact the product development. The EU-based company has been a pioneer in the Semantic Web for over a decade.

From my reading of the article and the company’s marketing collateral it strikes me that this is a 12 year old semantic software and consulting company.

The idea is that there is a pool of structured and unstructured information. The company performs content processing and offers such features as:

  • Taxonomy editor and maintenance
  • A controlled vocabulary management component
  • An audit trail to see who changed what and when
  • Link analysis
  • User role management
  • Workflows.

The write up with the catchphrase provides an informational foundation for the company’s semantic approach to enterprise search and retrieval; for example, the company’s four layered architecture:

[Image: Pool Party’s four layered architecture]

The base is the content layer. There is a metadata layer which in Harrod’s Creek is called “indexing”. There is the “semantic layer”. At the top is the interface layer. The “semantic” layer seems to be the secret sauce in the recipe for information access. The phrase used to describe the value added content processing is “semantic knowledge graphs.” These, according to the article:

let you find out unknown linkages or even non-obvious patterns to give you new insights into your data.

The system performs entity extraction, supports custom ontologies (a concept designed to make subject matter experts quiver), text analysis, and “graph search.”

Graph search is, according to the company’s Web site:

Semantic search at the highest level: Pool Party Graph Search Server combines the power of graph databases and SPARQL engines with features of ‘traditional’ search engines. Document search and visual  analytics: Benefit from additional  insights through interactive visualizations of reports and search results derived from your data lake by executing sophisticated SPARQL queries.

To make this more clear, the company offers a number of videos via YouTube.

The idea reminded us of the approach taken in BAE NetReveal and Palantir Gotham products.

Pool Party emphasizes, as does Palantir, that humans play an important role in the system. Instead of “augmented intelligence,” the article describes methods which “combine machine learning and human intelligence.”

The company’s annual growth rate is more than 20 percent. The firm has customers in more than 20 countries. Customers include Pearson, Credit Suisse, the European Commission, Springer Nature, Wolters Kluwer, and the World Bank and “many other customers.” The firm’s projected “Euro R&D project volume” is 17 million (although I am not sure what this 17,000,000 number means). The company’s partners include Accenture, Complexible, Digirati, and EPAM, among others.

I noted that the company uses the catchphrase: “Semantic Web Company” and the catchphrase “Linking data to knowledge.”

The catchphrases, I assume, make it easier for some to understand the firm’s graph based semantic approach. I am still mired in figuring out that the map is not the territory.

Stephen E Arnold, September 8, 2016

HonkinNews for September 6, 2016, Now Available

September 6, 2016

If you visit Zimbabwe, what risks do you face when you use Facebook? Is the CIA’s investment arm too secretive? Whom do you consult to get the inside scoop about legacy code running on the mainframe in the basement? For the answers to these questions, invest six minutes in the September 6, 2016, edition of HonkinNews, a round up of stories from Beyond Search. You can view this week’s program at this link or click on the embedded viewer on the Beyond Search blog.

Kenny Toth, September 6, 2016

Watson Ads for Branded Answers to the Little Questions of Life

September 6, 2016

Here is a potent new way for brands to worm their way into every aspect of consumers’ lives. “IBM Watson Is Now Offering AI-Powered Digital Ads That Answer Consumers’ Questions,” we learn from AdWeek. Watson Ads will hook users up with answers to their everyday questions—answers supplied by advertisers. Apparently, IBM’s Weather-Company acquisition supplied the tools behind this product. Writer Christopher Heine explains:

IBM’s relatively new ownership of The Weather Company’s digital properties is coming into play in a serious fashion: Watson Ads will first appear on Weather.com, the Weather mobile app and the company’s data-driven WeatherFX platform. Later, IBM plans to allow them to appear on third-party properties.

Campbell Soup Company, Unilever and GSK Consumer Healthcare are some of the brands that will run the ads in the coming days. Watson Ads’ pricing details were not disclosed.

Jeremy Steinberg, global head of sales, The Weather Company, described how they work, stating that ‘machine learning and natural-language capabilities will allow it to provide accurate responses. What we’re doing is moving away from keyword searches and towards more natural language and well-reasoned answers.’

Heine outlines Campbell’s plan as an example—their Watson Ads will present “Chef Watson,” the helpful AI which suggests recipes based on criteria like available ingredients, the time of day, and what the weather is like. Those recipes will be pulled from Campbell’s existing site Campbell’s Kitchen. Not surprisingly, their ingredient lists rely heavily on Campbell’s product line (which goes well beyond soup these days).

Another Watson Ads client is GSK Consumer Healthcare, which plans to use the tech to help users make better real-time health decisions—a worthy project, I’ll admit. I am curious to see how Unilever, and other companies down the line, will leverage their digital voices of authority. See the article for more details on the project.

Cynthia Murrell, September 6, 2016

Verizon Strategizes to Get Paid for Installing Big Brand Apps That You Will Probably Never Open

September 5, 2016

The article titled Verizon Offered to Install Marketers’ Apps Directly on Subscribers’ Phones on AdAge discusses the next phase in Verizon’s marketing strategy, a seeming inheritance of product placement: automatic installations for big brands onto your phone. Next time you notice an app that you didn’t download on your phone, look no further. Verizon has been in talks with both retail and finance brands about charging between $1 and $2 per device, which sounds small until you multiply it by 75 million Verizon smartphone subscribers. The article discusses some of the potential drawbacks.

Verizon has stoked some user frustration in the past with “bloatware,” as have many carriers and phone manufacturers. Bloatware comprises the often irrelevant apps that arrive pre-installed on phones, though they’re less often major brands’ apps and more often small, proprietary services from the carriers and manufacturers…There is no guarantee, however, that Verizon subscribers open the apps they find pre-installed on their phones. “If a user is not interested, they just delete it without activating.”

Sara Choi, COO of AirFox, is quoted in the article making a great point about how important it is for carriers to find new strategies for profit growth. Ultimately, the best use for this marketing technique is a huge number of immediate downloads. How to engage users once you have gotten into their phones is the next question. If this goes through, there will be no need to search to get an ad, which could mean bad news for online ad search.
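To put a number on "sounds small until you multiply it," here is the arithmetic using the figures cited from the article. The one assumption, and it is a big one, is that a brand's app lands on every subscriber's phone.

```python
subscribers = 75_000_000        # Verizon smartphone subscribers, per the article
fee_low, fee_high = 1.00, 2.00  # dollars per device, per the article

# Assumes one install on every subscriber's device, which is an upper bound.
low = subscribers * fee_low
high = subscribers * fee_high
print(f"${low:,.0f} to ${high:,.0f} per brand")
```

That is between $75 million and $150 million per participating brand before anyone opens, or deletes, the app.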

Chelsea Kerwin, September 5, 2016
