Online Search: Ad Fraud and Relevance
November 13, 2013
Do you click on ads? Do you recognize ads? The dominance of Google translates to big money for capturing Google users’ attention.
How big of a problem is online fraud? There is a partial answer in the write up “Inside Ad Tech Fraud: Confessions of a Fake Web Traffic Buyer.” In addition to the revelations about online click fraud, the write up contains some fascinating quotations; for example:
Quality didn’t really matter to us, though.
I learned other things from the article as well. I know that the information in the story is accurate. Why would distorted or fraud-related information appear in a story about online advertising? I underlined this statement:
I believe publishers are willing to do anything to make their economics work.
The word “anything” is an interesting one.
A few years ago, a colleague in New York City wanted my team to prepare a seminar about online ad fraud. I refused. Among the reasons was the simple fact that I wanted to avoid the pushback from “experts.”
As a result, I have avoided direct involvement in the methods that allegedly manipulate people like you, gentle reader. More recently, I have adopted the ostrich posture. I ignore what is now the norm. I prefer to live in a make believe world in which information is straight and true.
Life is simpler for me now. Online advertising is just so special. Online content benefits from the influence of advertising-supported information. Why pay for a commercial online service when the world is a click away? Ads make the content possible. In fact, ads are the point of content, right?
Stephen E Arnold, November 13, 2013
A Look at Bing vs Google
November 13, 2013
Over at Search Engine Watch, Mark Jackson reminds us that Google was not always top dog in the search field. His article, “Could Bing Ever Overtake Google in Search?” emphasizes that competition is a good thing. While this is true, could the SEO CEO have any other reasons to hope for the search giant’s wane? Vicious pandas and penguins, perhaps? After all, Jackson opens with an admission that he is angry at Google for ceasing to push keyword data into the public realm (where search engine gamers, er, optimizers can get to it) while continuing to supply that data to paid advertisers. The nerve!
Jackson does make some interesting points. He cites a recent Pubcon keynote address given by Google’s own Matt Cutts, which discusses some major developments for the leading search engine. Knowledge Graph, of course, will continue to play a role, as will voice search and “conversational” search. Jackson picks up on Cutts’s last item, “deep learning.”
He writes:
“Google is focused on deep learning and understanding what users want so searchers don’t have to use simple keyword phrases to search Google. Bing, on the other hand, has partnerships with every major social site and receives data directly from those sources. So, rather than trying to understand what users mean and predicting it, Bing actually knows what users want based on actual data from social sites.”
The piece goes on to emphasize the importance of mobile devices to the future of search, an area where he suspects Bing may really take over. Apparently, Bing’s more “personalized,” social-media-informed approach will especially make the difference in mobile, somehow. He also speculates that users may not take kindly to Google’s changes, particularly Knowledge Graph. He ventures:
“In my opinion, this is a make it or break it type of move by Google. Google users will either continue to like their search or they will end up using search less and less to find what they’re looking for. Bing users may be more likely to actually like their search results because the results are biased towards their own social media activities and friends’ activities online.”
I’d like to think more people are looking for objective information than for material that confirms their existing biases, but I suppose that is naïve. See the article for more on Jackson’s reasoning and hopes for a Bingy future. Is he right, or will Google maintain its search dominance for years to come?
Cynthia Murrell, November 13, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Riak 2.0 Now Available for Technical Preview
November 13, 2013
Basho has released a technical preview of Riak 2.0, the company announced at the Ricon West developers’ conference last month in San Francisco. Several key improvements have been made to the open source distributed database: additional Riak data types; the option for strong consistency; full-text search integration with Apache Solr; more flexibility in security administration; simplified configuration management; and the option of storing fewer replicas across multiple data centers. See the article for details on each of these changes.
The press release emphasizes that this is not the final release of Riak 2.0, and that Basho would like users’ feedback:
“Please note that this is only a Technical Preview of Riak 2.0. This means that it has been tested extensively, as we do with all of our release candidates, but there is still work to be completed to ensure it’s production hardened. Between now and the final release, we will be continuing manual and automated testing, creating detailed use cases, gathering performance statistics, and updating the documentation for both usage and deployment. As we are finalizing Riak 2.0, we welcome your feedback for our Technical Preview. We are always available to discuss via the Riak Users mailing list, IRC (#riak on freenode), or contact us.”
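The “additional Riak data types” in the preview are distributed counters, sets, and maps built on conflict-free replicated data types (CRDTs), which converge without coordination. As a rough illustration of the idea, and not Basho’s implementation, here is a toy state-based grow-only counter in Python:

```python
# Toy G-Counter CRDT, the concept behind Riak 2.0's counter data type.
# Each replica tracks its own increments; merge takes the per-replica
# maximum, so concurrent updates converge without conflict. This is an
# illustration of the technique, not Basho's code.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> increments seen from that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max makes merge commutative, associative, idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

# Two replicas take writes independently, then reconcile in either order.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

The merge rule is why Riak can accept writes on any node and still arrive at one answer, which is exactly the tradeoff the “strong consistency option” lets customers opt out of.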
Riak is developed by Basho Technologies, which naturally offers a commercial edition of the NoSQL database. The company also offers Riak CS, a cloud-based object storage system deployable on top of Riak. Basho positions its enterprise version as the solution for companies whose needs go beyond the traditional database or who have wrestled with scalability constraints in relational databases. Founded in 2008, Basho is headquartered in Cambridge, Massachusetts, and maintains offices in London, San Francisco, Tokyo, and Washington, D.C.
Cynthia Murrell, November 13, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
MarkLogic Recognized for Database Management
November 13, 2013
We already knew that MarkLogic is good at search. Now the company is being recognized for its database management chops, we learn from “MarkLogic Featured in the Gartner Magic Quadrant for Operational Database Management Systems” at BWW Geeks World.
The press release tells us:
“MarkLogic has been positioned for its ability to execute and is the only Enterprise NoSQL database vendor featured in the report that integrates search and application services. . . .
MarkLogic is the only schema-agnostic Enterprise NoSQL database that integrates semantics, search and application services with the enterprise features customers require for production applications. This combination helps enterprises make better-informed decisions and create robust, scalable applications to drive revenue, streamline operations, manage risk and make the world safer. MarkLogic features ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, and government-grade security.”
CEO Gary Bloom does not let us forget his company’s search success. He points out that they also captured a place on Gartner’s 2013 Magic Quadrant for Enterprise Search roster, and that they are the only company to be included in both reports. He understandably takes this achievement as evidence that MarkLogic is on the right track with its integrated approach. The company focuses on scalability, enterprise-readiness, and leveraging the latest technology. Founded in 2001, MarkLogic is headquartered in Silicon Valley and maintains offices around the world.
Cynthia Murrell, November 13, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Xenky Vendor Profiles: Siderean Software Now Available
November 12, 2013
If you are a fan of semantic methods, you may find the Siderean Software profile a useful case study. You can find the write up, among others, at this location. The chatter at conferences about semantic methods is finally burning out. Nevertheless, semantic methods bubble beneath the surface of many modern search systems. The Siderean case is an example of what types of content processing operations are required to perform “deep indexing” or “rich metadata extraction.” The first step, as you will learn, is to have content tagged. That means SGML or XML.
The question becomes, “How do I get my content into these formats?” The answer, for many budgets, is a deal breaker. Once the content is processable, a number of manipulations are possible. Think of Siderean’s system as delivering the type of flip and flop of data that Excel provides in its pivot table. Now ask yourself, “How often do I use a pivot table?” Exactly.
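The tag-then-pivot workflow described above can be sketched in a few lines. The XML fields below are hypothetical stand-ins; Siderean’s actual schema and pipeline are not described in the profile. Once content carries explicit tags, facet counts, the “flip and flop” a pivot table provides, become trivial:

```python
# Sketch of the tagging-then-pivoting step, standard library only.
# The document fields (title, topic, year) are invented for illustration.
import xml.etree.ElementTree as ET
from collections import Counter

RAW = """
<docs>
  <doc><title>Q3 report</title><topic>finance</topic><year>2013</year></doc>
  <doc><title>Server audit</title><topic>security</topic><year>2013</year></doc>
  <doc><title>Q4 outlook</title><topic>finance</topic><year>2014</year></doc>
</docs>
"""

root = ET.fromstring(RAW)

# With tagged content, a facet count per field is a one-liner.
by_topic = Counter(d.findtext("topic") for d in root.iter("doc"))
assert by_topic["finance"] == 2
assert by_topic["security"] == 1
```

The expensive part, as the profile notes, is not this step; it is getting untagged content into a form like `RAW` in the first place.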
Remember. I am posting pre-publication drafts of analyses that may have been used, recycled, or just ripped off by various “real” publishers over the years. If there are errors in these drafts, you can “correct” them by adding a comment to this post in Beyond Search. The archive of case studies or profiles will not be updated.
I am providing these for personal use. If a frisky soul wants to use them for commercial purposes, I will take some type of action. If you were in my lecture at the enterprise search conference in New York last week, you will know that I called attention to one of the most slippery of the azure chip consulting firms. I showed a slide that listed the same “expert” twice on a $3,500 report. Not bad, since the outfit’s expert did not create the information in the report.
Stephen E Arnold, November 12, 2013
Neuroscientists Advance Predictive Analytics
November 12, 2013
Where do the fields of neuroscience and predictive analytics intersect? Apparently at the University of Sussex. Phys.org reveals, “Scientists Identify a Mathematical ‘Crystal Ball’ that May Predict Calamities.” It makes sense when you consider that both disciplines deal with complex systems.
In systems ranging in scale from the planet’s climate to an epileptic’s brain, the transition from a healthy to an unhealthy state is marked by a peak in information flow between elements. Until now, it has been difficult, if not impossible, to predict these peaks in advance. Working together, scientists from the University of Sussex’s Sackler Centre for Consciousness Science and the Centre for Research in Complex Systems at Australia’s Charles Sturt University have made a breakthrough regarding such predictions. The article explains:
“Essentially this means finding a way to characterize, mathematically, the extent to which the parts of a complex system are simultaneously segregated (they all behave differently) and integrated (they all depend on each other). In the present study the research team managed to do just this, and to show for the first time that their measure reliably predicts phase transitions in standard systems studied by physicists now for many decades (the so-called ‘Ising’ model).
“Professor Anil Seth, Co-Director of the Sackler Centre, says: ‘The implications of the work are far-reaching. If the results generalise to other real-world systems, we might have ways of predicting calamitous events before they happen, which would open the possibility for intervention to prevent the transition from occurring.'”
Such interventions would obviously be beneficial in many circumstances. As this science progresses, we may be surprised at how widely the method could be applied. The possibilities seem endless. Can an application to horse racing be far behind?
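For readers who want to poke at the physics, the “Ising” model mentioned in the quote is easy to simulate. The sketch below is a textbook Metropolis simulation in Python showing the order-disorder transition whose approach the researchers’ measure is designed to detect; it illustrates the model itself, not the Sussex/Charles Sturt predictive measure.

```python
# Toy 2D Ising model (Metropolis algorithm). The phase transition sits
# near Tc ≈ 2.27; fluctuation-based quantities peak there, which is the
# behavior the article's "crystal ball" measure exploits.
import math
import random

def simulate(L, T, sweeps, seed=0):
    """Run Metropolis updates; return |magnetization| per spin."""
    rng = random.Random(seed)
    s = [[1] * L for _ in range(L)]  # start fully ordered
    for _ in range(sweeps):
        for _ in range(L * L):
            i, j = rng.randrange(L), rng.randrange(L)
            nb = (s[(i + 1) % L][j] + s[(i - 1) % L][j]
                  + s[i][(j + 1) % L] + s[i][(j - 1) % L])
            dE = 2 * s[i][j] * nb  # energy cost of flipping spin (i, j)
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                s[i][j] = -s[i][j]
    return abs(sum(map(sum, s))) / (L * L)

# Well below Tc the ordered start persists; well above Tc, order melts.
low, high = simulate(16, 1.5, 400), simulate(16, 3.5, 400)
assert low > 0.8 > 0.4 > high
```

Running the same loop across a range of temperatures and plotting a fluctuation measure against T would show the characteristic peak near the transition.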
Cynthia Murrell, November 12, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Google EU Concessions to Be Kept Under Wraps
November 12, 2013
The Register’s indignation at being kept out of the loop is evident in their headline, “Google Rivals GAGGED from Exposing Ad Giant’s EU Search Peace Offering.” Apparently, Google’s European compromises are to be considered privileged information, at least while the European Commission’s three-years-and-counting antitrust investigation is still in progress.
Writer Kelly Fiveash tells us:
“How exactly Google will overhaul its web-search engine in Europe is unlikely to be aired in public: the European Commission’s competition officials have sent out copies of the advertising giant’s package of concessions to its rivals for comment – along with a requirement for them to agree to confidentiality clauses.
“Despite repeated questioning from The Register, antitrust commissioner Joaquin Almunia’s spokesman declined to tell us if details of the offer would be published. The package of promises details changes to Google’s web search operation in Europe following claims it unfairly favours its own services over competitors’ websites in search results.”
The representative did say that the commission is asking for feedback from complainants on Google’s proposals. As of the date of the statement, they also sought more “concrete technical elements” from Google. Fiveash concludes by noting that Google has captured about 90 percent of Europe’s search market. It will be interesting to see how this plays out.
Cynthia Murrell, November 12, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
IT Dangers Revealed
November 12, 2013
Those of us with experience in IT may not be surprised by the revelations InfoWorld shares in “6 Dirty Secrets of the IT Industry.” This magazine of IT gospel asked its readers to share their observations of shady IT matters, then fact-checked the results. See the article for the whole roster, but I’ll share a few bits here.
Secret number one is the broadest. Writer Dan Tynan colorfully titles it, “Sys admins have your company by the short hairs.” He quotes Pierluigi Stella, CTO of security firm Network Box USA, who gives each of us good reason to send our IT departments the occasional gift basket:
“There are no secrets for IT. I can run a sniffer on my firewall and see every single packet that comes in and out of a specific computer. I can see what people write in their messages, where they go to on the Internet, what they post on Facebook. In fact, only ethics keep IT people from misusing and abusing this power. Think of it as having a mini-NSA in your office.”
Speaking of the NSA, Tynan calls those government snoopers “punks compared to consumer marketing companies and data brokers.” He cites the practices in casinos as the epitome of this very individualized marketing tactic, and provides examples. He goes on to quote former casino executive and Louisiana State University professor Michael Simon, who emphasizes that the practice is far from limited to casinos:
“I teach an MBA class on database analysis and mining, and all the companies we study collect customer information and target offers specific to customer habits. It’s routine business practice today, and it’s no secret. For example, I bring my dog to PetSmart for specific services and products, and the offers they send me are specific to my spending habits. . . instead of wasting time sending me stuff I won’t use like discounts on cat food or tropical fish.”
Whether you, like Simon, appreciate targeted marketing or you find it creepy, it is worth remembering how much data these entities are collecting on each of us.
It is also good to keep in mind some pitfalls of another practice that has become commonplace—storing data in the cloud. In fact, this could be the most disconcerting item on this list. Though we tend to think of the cloud in nebulous terms, that data is actually stored on real servers somewhere. When our data shares rack space with that of other entities, we run the risk of intrusion and confiscation through no fault of our own. The article emphasizes:
“Your cloud data could be swept up in an investigation of an entirely unrelated matter — simply because it was unlucky enough to be kept on the same servers as the persons being investigated. . . . Users who want to protect themselves against this worst-case scenario need to know where their data is actually being kept and which laws may pertain to it, says David Campbell, CEO of cloud security firm JumpCloud. ‘Our recommendation is to find cloud providers that guarantee physical location of servers and data, such as Amazon, so that you can limit your risk proactively,’ he says.”
Another suggestion is to encrypt your data, of course. Keeping a local backup is another good idea, since law enforcement seems to be under no obligation to grant access to your own confiscated data. For some of us, this is just more evidence that sensitive information does not belong in the cloud. Caveat Emptor.
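The local-backup advice pairs naturally with an integrity check: record a checksum for each file when the backup is made, so a restored copy can be verified later. A minimal sketch with Python’s standard hashlib follows; the file names and contents are stand-ins for real data.

```python
# Minimal sketch of the "keep a local backup" advice: store a SHA-256
# checksum per file so a restored copy can be verified against the
# original. File names and contents here are hypothetical.
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Build a manifest at backup time...
backup = {"contracts.db": b"... database bytes ...",
          "mail.pst": b"... mailbox bytes ..."}
manifest = {name: sha256_hex(data) for name, data in backup.items()}

# ...and verify it at restore time. Any corruption changes the digest.
restored = dict(backup)
assert all(sha256_hex(restored[n]) == manifest[n] for n in manifest)

tampered = dict(backup, **{"mail.pst": b"... corrupted bytes ..."})
assert sha256_hex(tampered["mail.pst"]) != manifest["mail.pst"]
```

A checksum proves integrity, not confidentiality; encrypting before upload, as suggested above, addresses the latter.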
Cynthia Murrell, November 12, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Sponsored Content: Reinventing the Wheel
November 11, 2013
I want to ask you, gentle reader, “Do you recall the messages that whipped Rome’s citizens into a fury when the death of Germanicus became known?” If you do, you are aware of the value of sponsored content. If you do not, you will find something incredibly new, totally exciting, and probably revolutionary when you read “Marrying Companies and Content.” If the link is dead, you will have to find a content repository like the public library to read the article in the November 11, 2013, New York Times.
The main point of the write up is that since 1947 companies have been sponsoring content. Imagine that! 1947. The article explains that sponsored content is a darned good way to market. I liked this statement in the write up:
“This is not a fad,” he [PR maven at Weber Shandwick] said, pointing out that both corporate money (advertising) and venture money (backing) were pouring into brand publishing. “These guys stand out because they bring a depth of understanding to the economic proposition and know that for it to work, it has to be done right.”
For a more informative view of manipulated information, a spin through Jacques Ellul’s Propaganda is a useful first step.
To see the consequences of sponsored content, may I suggest:
- Running a query and identifying which hits are accurate, which are disinformation, misinformation, or reformation
- Standing in front of Cuba Libre in Washington, DC, and running a Google query for restaurants on your iPhone
- Considering the “value” of outputs from Jike.com, the Chinese centric search system
- Listening to either Harry Shearer or No Agenda and comparing the information with that in a mass media outlet.
By the way, do today’s college graduates have the tools to identify and remediate malformed information in search results? Is this discussion of Germanicus accurate? Can your colleagues handle ancient history or more timely outputs from a Big Data system?
Stephen E Arnold, November 11, 2013
Kroll Ontrack Presents Products to Control the Rising Costs of EDiscovery
November 11, 2013
The Minneapolis/St. Paul Business Journal reports on the cost of litigation in the article titled “How Much Will the Next Lawsuit Cost? Kroll Ontrack Tech Can Tell You.” The product line recently released by Kroll Ontrack aims to keep legal defense budgets under control as Big Data drives them upward. The products emphasize a repeatable approach to ediscovery, as opposed to the standard treatment of every project as new.
The article explains:
“For instance, an analysis may reveal that a business’ chief product officer kept every memo received since 1985, driving up the cost of data mining. The company could then encourage the officer to shred paper more often, reducing legal costs going forward. Analyzing the cost of past cases makes it easier to predict future legal costs, Hager said. ‘You can budget for litigation the same way you budget to close financial books or the same way you budget for payroll.’”
Some of the products include Ediscovery for Portfolio Management, which treats ediscovery matters as a portfolio, a collaboration rather than individual projects; Ediscovery.com Review, which allows for wide-ranging control over data and costs; and Ediscovery.com Collect, a product designed to help manage the time and cost of collecting ediscovery data.
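The budgeting idea in Hager’s quote, predicting the next matter’s cost from past cases, reduces in its simplest form to a historical cost-per-gigabyte estimate. The figures below are invented for illustration; the article does not describe Kroll Ontrack’s actual models.

```python
# Toy version of "budget for litigation the way you budget for payroll":
# estimate the next matter's ediscovery cost from past cases.
# All numbers are hypothetical.

past_cases = [  # (gigabytes collected, total ediscovery cost in USD)
    (120, 95_000),
    (300, 210_000),
    (80,  70_000),
]

# Average historical cost per GB, then scale to the expected new matter.
cost_per_gb = sum(c for _, c in past_cases) / sum(g for g, _ in past_cases)
estimate = cost_per_gb * 200  # a hypothetical 200 GB matter

assert 140_000 < estimate < 160_000
```

A real model would segment by matter type and custodian count, but even this crude average shows why trimming stored data (the memo-hoarding chief product officer) lowers predicted legal spend.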
Chelsea Kerwin, November 11, 2013
Sponsored by ArnoldIT.com, developer of Augmentext