Google Ad Revenue: Squeezing Ahead but Who Will Be the Squeezee?

March 31, 2016

I read “Google makes One Third of Its Global Revenue from Advertisements.” I have been off base because I assumed that Google derived 90 percent of its revenue from online advertising. I stand corrected even if I am not 100 percent confident in the report in Propakistani.

Set the numbers aside for the nonce. If one considers the relative relationship in ad revenue among Facebook, Google, and Yahoo (poor old Yahoo), the write up hits on an important point:

Google’s share in the global ad market is also diminishing. Its percentage in the net share of the total global online ad revenue has actually decreased to 33.3 percent. The figure was 34.6 percent in 2014. Analysts from Statista have predicted an even greater decline in market share in 2016, down to 30.9 percent.

Okay, Statista may be the source of the insight.

From my point of view, Google will have to figure out what to do about Zuck and his band of Xooglers. If Facebook continues to enjoy robust growth, life might become more interesting at the Alphabet Google thing.

One other thought: It might become more expensive to run ads on the Google platform unless the sale of Loon balloons soars. Revenue issues may ground the fleet in the future as part of the new fiscal order at the search giant.

Stephen E Arnold, March 31, 2016

GoPubMed Sorts Searching

March 31, 2016

Do you search the US government’s PubMed.gov content? If so, you may want to come at the information in different or more useful ways.

The German company Transinsight makes available its search system, GoPubMed.org.


You can use the semantic search system to explore the knowledge base.

Features of the system include:

  • A sidebar which allows one-click access to concepts, authors, journals, etc.
  • A search box which accepts keyword queries and offers suggestions for the query
  • A results list.

The layout is clear. A bit of hunting around is necessary, but that is a common experience when trying to figure out if there is a way to narrow a broad search based on a lousy query.

There have been many search systems built to make the PubMed information findable. My favorite, though long gone, was Grateful Med. Like patent searching, queries of medical information are tricky. Some day I will write about the Information Health Reference Center, circa 1989. That was exciting.

Stephen E Arnold, March 31, 2016

Patents and Semantic Search: No Good, No Good

March 31, 2016

I have been working on a profile of Palantir (open source information only, however) for my forthcoming Dark Web Notebook. I bumbled into a video from an outfit called ClearstoneIP. I noted that ClearstoneIP’s video showed how one could select from a classification system. With every click, the result set changed. For some types of searching, a user may find the point-and-click approach helpful. However, there are other ways to root through what appear to be patent applications. There are the very expensive methods happily provided by Reed Elsevier and Thomson Reuters, two fine outfits. And then there are less expensive methods like Alphabet Google’s oddball patent search system or the quite functional FreePatentsOnline service. In between, you and I have many options.

None of them is a slam dunk. When I was working through the publicly accessible Palantir Technologies’ patents, I had to fall back on my very old-fashioned method. I tracked down a PDF, printed it out, and read it. Believe me, gentle reader, this is not the most fun I have ever had. In contrast to the early Google patents, Palantir’s documents lack the detailed “background of the invention” information which the salad days’ Googlers cheerfully presented. Palantir’s write ups are slogs. Perhaps the firm’s attorneys were born with dour brain circuitry.

I did a side jaunt and came across a white paper from ClearstoneIP called “Why Semantic Searching Fails for Freedom-to-Operate (FTO).” The 12 page write up is about patent searching. ClearstoneIP, a patent analysis company, bills itself on its Web site as a “paradigm shifter.” The company describes itself this way:

ClearstoneIP is a California-based company built to provide industry leaders and innovators with a truly revolutionary platform for conducting product clearance, freedom to operate, and patent infringement-based analyses. ClearstoneIP was founded by a team of forward-thinking patent attorneys and software developers who believe that barriers to innovation can be overcome with innovation itself.

The “freedom to operate” phrase is a bit of legal jargon which I don’t understand. I am, thank goodness, not an attorney.

The firm’s search method makes much of the ontology, taxonomy, classification approach to information access. Hence, the reason my exploration of Palantir’s dynamic ontology with objects tossed ClearstoneIP into one of my search result sets.

The white paper is interesting if one works around the legal mumbo jumbo. The company’s approach is remarkable and invokes some of my caution light words; for example:

  • “Not all patent searches are the same.” (page two)
  • “This all leads to the question…” (page seven)
  • “…there is never a single ‘right’ way to do so.” (page eight)
  • “And if an analyst were to try to capture all of the ways…” (page eight)
  • “…to capture all potentially relevant patents…” (page nine)

The absolutist approach to argument is fascinating.

Okay, what’s the ClearstoneIP search system doing? Well, it seems to me that it is taking a path to consider some of the subtleties in patent claims’ statements. The approach is very different from that taken by Brainware and its tri-gram technology. Now that Lexmark owns Brainware, the application of the Brainware system to patent searching has fallen off my radar. Brainware relied on patterns; ClearstoneIP uses the ontology-classification approach.

Both are useful in identifying patents related to a particular subject.
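The pattern side of that contrast can be made concrete. Here is a minimal sketch of the trigram idea, character n-grams compared by set overlap, which is robust to word-form variation but blind to meaning. The function names and padding scheme are illustrative, not Brainware’s actual implementation:

```python
def trigrams(text: str) -> set[str]:
    """Character-level trigrams: the kind of surface pattern a trigram engine matches on."""
    t = f"  {text.lower().strip()} "  # pad so word boundaries produce trigrams too
    return {t[i:i + 3] for i in range(len(t) - 2)}

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of trigram sets: 1.0 for identical strings, 0.0 for no shared patterns."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
```

Note what the score rewards: shared character sequences, not shared concepts. “Semantic search” and “semantic searching” score high; a synonym expressed in different words scores near zero, which is exactly the gap an ontology-classification approach tries to close.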

What is interesting in the write up is its approach to “semantics.” I highlighted in billable hour green:

Anticipating all the ways in which a product can be described is serious guesswork.

Yep, but isn’t that where a human with relevant training and expertise becomes important? The white paper takes the approach that semantic search fails for the ClearstoneIP method dubbed FTO or freedom to operate information access.

The white paper asserted:

Semantic searching is the primary focus of this discussion, as it is the most evolved.

ClearstoneIP defines semantic search in this way:

Semantic patent searching generally refers to automatically enhancing a text-based query to better represent its underlying meaning, thereby better identifying conceptually related references.

I think the definition of semantic is designed to strike directly at the heart of the methods offered to lawyers with paying customers by Lexis-type and Westlaw-type systems. Lawyers-to-be usually have access to the commercial-type services when in law school. In the legal market, there are quite a few outfits trying to provide better, faster, and sometimes less expensive ways to make sense of the Miltonesque prose popular among the patent crowd.

The white paper describes, in a lawyerly way, the approach of semantic search systems. Note that the “narrowing” to the concerns of attorneys engaged in patent work is in the background even though the description seems to be painted in broad strokes:

This process generally includes: (1) supplementing terms of a text-based query with their synonyms; and (2) assessing the proximity of resulting patents to the determined underlying meaning of the text-based query. Semantic platforms are often touted as critical add-ons to natural language searching. They are said to account for discrepancies in word form and lexicography between the text of queries and patent disclosure.
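The two-step process quoted above can be sketched in a few lines. The synonym table below is a toy stand-in for whatever thesaurus or statistical model a real semantic platform would use; the terms and scoring are purely illustrative:

```python
# Toy synonym table; a real system would draw on a curated thesaurus or learned model.
SYNONYMS = {
    "fastener": {"clip", "clasp", "connector"},
    "enclosure": {"housing", "casing", "shell"},
}

def expand_query(terms: list[str]) -> set[str]:
    """Step (1): supplement each query term with its synonyms."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

def score(document: str, terms: list[str]) -> float:
    """Step (2), crudely: fraction of the expanded term set found in the document."""
    words = set(document.lower().split())
    return len(expand_query(terms) & words) / len(expand_query(terms))
```

The white paper’s complaint maps directly onto the `SYNONYMS` dictionary: for FTO work, anticipating every way a product might be described means the table is never complete, so recall is never guaranteed.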

The white paper offers this conclusion about semantic search:

it [semantic search] is surprisingly ineffective for FTO.

Seems reasonable, right? Semantic search assumes a “paradigm.” In my experience, taxonomies, classification schema, and ontologies perform the same intellectual trick. The idea is to put something into a cubby. Organizing information makes manifest what something is and where it fits in a mental construct.

But these semantic systems do a lousy job figuring out what’s in the Claims section of a patent. That’s a flaw which is a direct consequence of the lingo lawyers use to frame the claims themselves.

Search systems use many different methods to pigeonhole a statement. The “aboutness” of a statement or a claim is a sticky wicket. As I have written in many articles, books, and blog posts, finding on point information is very difficult. Progress has been made when one wants a pizza. Less progress has been made in finding the colleagues of the bad actors in Brussels.

Palantir requires that those adding content to the Gotham data management system add tags from a “dynamic ontology.” In addition to what the human has to do, the Gotham system generates additional metadata automatically. Other systems use mostly automatic methods which are dependent on a traditional controlled term list. Others just use algorithms to do the trick. The systems which are making friends with users strike a balance; that is, using human input directly or indirectly and some administrator-only knowledge bases, dictionaries, synonym lists, etc.
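The balance between human tagging against a controlled term list and system-generated metadata might be sketched like this. This is a generic illustration of the pattern, not Palantir’s Gotham API; the term list and field names are invented:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical controlled term list; real systems maintain these per collection.
CONTROLLED_TERMS = {"person", "organization", "location", "event", "document"}

@dataclass
class TaggedItem:
    text: str
    human_tags: set = field(default_factory=set)
    auto_metadata: dict = field(default_factory=dict)

    def add_tag(self, tag: str) -> None:
        """Human input, validated against the controlled vocabulary."""
        if tag not in CONTROLLED_TERMS:
            raise ValueError(f"{tag!r} is not in the controlled term list")
        self.human_tags.add(tag)

    def enrich(self) -> None:
        """System-generated metadata added automatically alongside the human tags."""
        self.auto_metadata["ingested_at"] = datetime.now(timezone.utc).isoformat()
        self.auto_metadata["token_count"] = len(self.text.split())
```

A “dynamic” ontology, in this sketch, would simply mean that authorized users can grow `CONTROLLED_TERMS` at run time instead of waiting for an administrator to republish the list.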

ClearstoneIP keeps its eye on its FTO ball, which is understandable. The white paper asserts:

The point here is that semantic platforms can deliver effective results for patentability searches at a reasonable cost but, when it comes to FTO searching, the effectiveness of the platforms is limited even at great cost.

Okay, I understand. ClearstoneIP includes a diagram which drives home how its FTO approach soars over the competitors’ systems:

[Diagram: ClearstoneIP’s comparison of its FTO approach with competing search systems]

ClearstoneIP, © 2016

My reaction to the white paper is that for decades I have evaluated and used information access systems. None of the systems is without serious flaws. That includes the clever n-gram-based systems, the smart systems from dozens of outfits, the constantly reinvented keyword-centric systems from the Lexis-type and Westlaw-type vendors, even the simplistic methods offered by free online patent search systems like Pat2PDF.org.

What seems to be the reality of the legal landscape is:

  1. Patent experts use a range of systems. With lots of budget, many free and for-fee systems will be used. The name of the game is meeting the client needs and obviously billing the client for time.
  2. No patent search system to which I have been exposed does an effective job of thinking like a very good patent attorney. I know that the notion of artificial intelligence is the hot trend, but the reality is that seemingly smart software usually cheats by formulating queries based on analysis of user behavior, facts like geographic location, and who pays to get their pizza joint “found.”
  3. A patent search system, in order to be useful for the type of work I do, has to index germane content generated in the course of the patent process. Comprehensiveness is simply not part of the patent search systems’ modus operandi. If there’s a B, where’s the A? If there is a germane letter about a patent, where the heck is it?

I am not on the “side” of the taxonomy-centric approach. I am not on the side of the crazy semantic methods. I am not on the side of the keyword approach when inventors use different names on different patents, Babak Parviz aliases included. I am not in favor of any one system.

How do I think patent search is evolving? ClearstoneIP has it sort of right. Attorneys have to tag what is needed. The hitch in the git along has been partially resolved by Palantir-type systems; that is, the ontology has to be dynamic and available to anyone authorized to use a collection in real time.

But for lawyers there is one added necessity which will not leave us any time soon. Lawyers bill; hence, whatever is output from an information access system has to be read, annotated, and considered by a semi-capable human.

What’s the future of patent search? My view is that there will be new systems. The one constant is that, by definition, a lawyer cannot trust the outputs. The way to deal with this is to pay a patent attorney to read patent documents.

In short, like the person looking for information in the scriptoria at the Alexandria Library, the task ends up as a manual one. Perhaps there will be a friendly Boston Dynamics librarian available to do the work some day. For now, search systems won’t do the job because attorneys cannot trust an algorithm when the likelihood of missing something exists.

Oh, I almost forgot. Attorneys have to get paid via that billable time thing.

Stephen E Arnold, March 30, 2016

Third Party Company Profiteering

March 31, 2016

We might think that we keep our personal information from the NSA, but there are third party companies that legally tap ISPs and phone companies and share the information with government agencies. ZDNet shares the inside story about this legal loophole, “Meet The Shadowy Tech Brokers That Deliver Your Data To The NSA.” These third party companies hide behind their neutral flag and then reap a profit. You might have heard of some of them: Yaana, Subsentio, and Neustar.

“On a typical day, these trusted third-parties can handle anything from subpoenas to search warrants and court orders, demanding the transfer of a person’s data to law enforcement. They are also cleared to work with classified and highly secretive FISA warrants. A single FISA order can be wide enough to force a company to turn over its entire store of customer data.”

Once the information passes through these third party companies, it is nearly impossible to figure out how it is used. The third party companies do conduct audits, but auditing does little to protect the average consumer. Personal information is another commodity to buy, sell, and trade, a practice which shows little respect for the individual consumer. Who is going to stand up for the little guy? Other than Edward Snowden?


Whitney Grace, March 31, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

RAVN ACE Can Help Financial Institutions with Regulatory Compliance

March 31, 2016

Increased regulations in the financial field call for tools that can gather certain information faster and more thoroughly. Bobsguide points to a solution in, “RAVN Systems Releases RAVN ACE for Automated Data Extraction of ISDA Documents Using Artificial Intelligence.” For those who are unaware, ISDA stands for International Swaps and Derivatives Association, and a CSA is a Credit Support Annex. The press release informs us:

“RAVN’s ground-breaking technology, RAVN ACE, joins elements of Artificial Intelligence and information processing to deliver a platform that can read, interpret, extract and summarise content held within ISDA CSAs and other legal documents. It converts unstructured data into structured output, in a fraction of the time it takes a human – and with a higher degree of accuracy. RAVN ACE can extract the structure of the agreement, the clauses and sub-clauses, which can be very useful for subsequent re-negotiation purposes. It then further extracts the key definitions from the contract, including collateral data from tabular formats within the credit support annexes. All this data is made available for input to contract or collateral management and margining systems or can simply be provided as an Excel or XML output for analysis. RAVN ACE also provides an in-context review and preview of the extracted terms to allow reviewing teams to further validate the data in the context of the original agreement.”

The write-up tells us the platform can identify high-credit-risk relationships and detail the work required to repaper those accounts (that is, to re-draft, re-sign, and re-process paperwork). It also notes that even organizations that have a handle on their contracts can benefit, because the platform can compare terms in actual documents with those that have been manually abstracted.

Based in London, enterprise search firm RAVN tailors its solutions to the needs of each industry it serves. The company was founded in 2011.


Cynthia Murrell, March 31, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Attensity Europe Has a New Name

March 30, 2016

Short honk: The adventure of Attensity continues. Attensity Europe has renamed itself Sematell Interactive Solutions. You can read about the change here. The news release reminds the reader that Sematell is “the leading provider of interaction solutions.” I am not able to define interaction solutions, but I assume the company named by combining semantic and intelligence will make the “interaction solutions” thing crystal clear. The URL is www.sematell.de.

Stephen E Arnold, March 30, 2016

Content Analyst Sold to kCura

March 30, 2016

kCura, an e-discovery company, purchased Content Analyst. Content Analyst was a spin out from a Washington, DC consulting and services firm. According to “kCura Acquires Content Analyst Company, Developers of High-Performance Advanced Text Analytics Technologies”:

Content Analyst’s analytics engine has been fully integrated into Relativity Analytics for eight years, supporting a wide range of features that are flexible enough to handle the needs of any type or size of case — everything from organizing unstructured data to email threading to categorization that powers flexible technology-assisted review workflows….By joining teams, kCura will bring Content Analyst’s specialized engineering talent closer to Relativity users, in order to continue building a highly scalable analytics solution even faster.

Content Analyst performs a number of text processing functions, including entity extraction and concept identification for metatagging text. When the initial technology was developed by the DC firm specializing in intelligence and related work for the US government, the system captured the attention of the intelligence community. The systems and methods used by Content Analyst remain useful.

Unlike some text processing companies, Content Analyst focused on legal e-discovery. kCura is the new Content Analyst. What company will acquire Recommind?

Stephen E Arnold, March 30, 2016

Microsoft and the Open Source Trojan Horse

March 30, 2016

Quite a few outfits embrace open source. There are a number of reasons:

  1. It is cheaper than writing original code
  2. It is less expensive than writing original code
  3. It is more economical than writing original code.

The article “Microsoft is Pretending to be a FOSS Company in Order to Secure Government Contracts With Proprietary Software in ‘Open’ Clothing” reminded me that there is another reason.

No kidding.

I know that IBM has snagged Lucene and waved its once magical wand over the information access system and pronounced, “Watson.” I know that deep inside the kind, gentle heart of Palantir Technologies, there are open source bits. And there are others.

The write up asserted:

For those who missed it, Microsoft is trying to EEE GNU/Linux servers amid Microsoft layoffs; selfish interests of profit, as noted by some writers [1,2] this morning, nothing whatsoever to do with FOSS (there’s no FOSS aspect to it at all!) are driving these moves. It’s about proprietary software lock-in that won’t be available for another year anyway. It’s a good way to distract the public and suppress criticism with some corny images of red hearts.

The other interesting point I highlighted was:

reject the idea that Microsoft is somehow “open” now. The European Union, the Indian government and even the White House now warm up to FOSS, so Microsoft is pretending to be FOSS. This is protectionism by deception from Microsoft and those who play along with the PR campaign (or lobbying) are hurting genuine/legitimate FOSS.

With some government statements of work requiring “open” technologies, Microsoft may be doing what other firms have been doing for a while. See points one to three above. Microsoft is just late to the accountants’ party.

Why not replace the SharePoint search thing with an open source solution? What of the $1.2 billion MSFT paid for the fascinating Fast Search & Transfer technology in 2008? It works just really well, right?

Stephen E Arnold, March 30, 2016

Google Reveals Personal Data in Search Results

March 30, 2016

Our lives are already all over the Internet, but Google recently unleashed a new feature that takes it to a new level. Search Engine Watch tells us about the new way to see your Internet purchases in “Google Shows Personal Data Within Search Results, Tests ‘Recent Purchases’ Feature.”

Google pulls the purchase information most likely from Gmail or Chrome. The official explanation is that Google search is now more personalized because it pulls information from Google apps:

“You can search for information from other Google products you use, like Gmail, Google Calendar, and Google+. For example, you can search for information about your upcoming flights, restaurant reservations, or appointments.”

Personalized Google search can display results not only from purchases but also bills, flights, reservations, packages, events, and Google Photos. It is part of Google’s mission to not only organize the world, but also be a personal assistant, part of the new Google Now.

While it is a useful tool to understand your personal habits, organize information, and interact with data like in a science-fiction show, at the same time it is creepy being able to search your life with Google. Some will relish the idea of having their lives organized at their fingertips, but others will feel like the NSA or even Dark Web predators will hack into their lives.


Whitney Grace, March 30, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Predictive Analytics on a Budget

March 30, 2016

Here is a helpful list from Street Fight that could help small and mid-sized businesses find a data analysis platform that is right for them—“5 Self-Service Predictive Analytics Platforms.”  Writer Stephanie Miles notes that, with nearly a quarter of small and mid-sized organizations reporting plans to adopt predictive analytics, vendors are rolling out platforms for companies with smaller pockets than those of multinational corporations. She writes:

“A 2015 survey by Dresner Advisory Services found that predictive analytics is still in the early stages of deployment, with just 27% of organizations currently using these techniques. In a separate survey by IDG Enterprise, 24% of small and mid-size organizations said they planned to invest in predictive analytics to gain more value from their data in the next 12 months. In an effort to encourage this growth and expand their base of users, vendors with business intelligence software are introducing more self-service platforms. Many of these platforms include predictive analytics capabilities that business owners can utilize to make smarter marketing and operations decisions. Here are five of the options available right now.”

Here are the five platforms listed in the write-up: Versium’s Datafinder; IBM’s Watson Analytics; Predixion, which can run within Excel; Canopy Labs; and Spotfire from TIBCO. See the article for Miles’ description of each of these options.


Cynthia Murrell, March 30, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

