The Story of Google and How It Remains Reliable

April 13, 2016

I noted that Google Books offers a preview of “Site Reliability Engineering: How Google Runs Production Systems” by a gaggle of Googlers. The book will soon be available from O’Reilly, which has given its permission to Google to provide a preview of a book about Google written by Google. You can also find a “summary” of the book at this link. I am not sure who Dan Luu is, but the individual “likes this book a lot.” I would, therefore, conclude that he is either a Googler, a Xoogler, or a Googler in waiting.

From the introduction available on Google Books, it seems that the authors are Googlers. The information appears to be an explanation of some of the innovations produced by Google in the last 15 years, a lot of the philosophy of speed and efficiency, and a bit of Google cheerleading.

What does the book cover? Here’s a sampling of the subjects:

  • A run down of Google’s philosophy of site reliability engineering
  • The principles of SRE (eliminating boring manual work, simplicity, etc.)
  • Practices (handling problems like cascading failure, data integrity). I would point out that Palantir moved beyond Google’s methods in its rework of Percolator to achieve greater reliability.
  • Management (more of engineering practices than orchestrating humans)
  • Conclusions (Google learns which suggests other organizations do not learn).

Each of these sections is chopped into smaller segments. In general, the writing is less academic than the approach in the technical papers which Googlers deliver at conferences.

You can order the book on Amazon too.

Stephen E Arnold, April 13, 2016

Inbenta on Track for Millions in Funding

April 12, 2016

Make sure you note the spelling of the company’s name. It is i-n-b-e-n-t-a. The company is involved in artificial intelligence and semantics. There is a Wikipedia entry here. I read “Barcelona AI Startup Inbenta To Close €10 Million Series B.” The write up points out that Inbenta opened its doors in 2005, which makes the company more than 10 years old. Nevertheless, the article describes Inbenta as a start up. The company generates more than US$8 million in annual revenue and is growing at a reported 60 percent each year. That is remarkable. Few companies engaged in search and content processing have been able to generate robust growth in the current economic thrill ride.

The write up reported:

Inbenta offers support features to businesses that want to make search better and smarter on their website. They do this through semantic search and artificial intelligence, and according to themselves decrease support costs and increase conversion rates.

The company’s Web site states:

Intelligent search for your customers. Provide search results that actually make sense. Let your customers find what they are searching for on your website. Reduce support cost and increase conversion.

These statements suggest that Inbenta is in the ecommerce search space which embraces companies like Endeca, EasyAsk, and SLI, among others.

I noted these technologies which are part of the Inbenta solution:

  • Artificial intelligence
  • Natural language technology
  • Semantic clustering.

The company’s most recent news reports a solution called “hybrid chat.” The idea is that an “AI powered virtual assistance combined with human live chat [sic].” I think of this approach as “augmented intelligence.” Palantir has used this human-software method with some measure of success. Will Inbenta work a similar magic and hit a multi-billion dollar valuation as Palantir did?

What interested me was that Inbenta, like Coveo, has positioned search and content processing as a customer support solution. Inbenta seems to be nosing into the self service niche in ecommerce.

Those investing in Inbenta will be eager to watch the company’s growth rate because today’s investment in a 10 year old start up could grow into the next big thing. Inbenta’s apparent success might, however, spark some interest from Palantir-type companies to compete in this augmented service sector.

Stephen E Arnold, April 12, 2016

AnalyzeThe.US, the 2016 Version?

April 12, 2016

I read “With Government Data Unlocked, MIT Tries to Make It Easier to Sort Through.” I came away from the write up a bit confused. I recall that Palantir Technologies offered for a short period of time a site called AnalyzeThe.US. It disappeared. I also recalled seeing a job posting for a person with a top secret clearance who knew Tableau (Excel on steroids) and Palantir Gotham (augmented intelligence). I am getting old, but I thought that Michael Kim, once a Deloitte wizard, gave a lecture about how one can use Palantir for analytics.

Why is this important?

The write up points out that MIT worked with Deloitte which, I learned:

provided funding and expertise on how people use government data sets in business and for research.

The Gray Lady’s article does not see any DNA linking AnalyzeThe.US, Deloitte, and the “new” Data USA site. Palantir’s Stephanie Yu gave a talk at MIT. I wonder if those in that session perceive any connection between Palantir and MIT. Who knows? I wonder if the MIT site makes use of AngularJS.

With regard to US government information, www.data.gov is still online. The information can be a challenge for a person without Tableau and Palantir expertise to wrangle in my experience. For those who don’t think Palantir is into sales, my view is that Palantir sells via intermediaries. The deal, in this type of MIT case, is to try to get some MIT students to get bitten by the Gotham and Metropolitan fever. Thank goodness I am not a real journalist trying to figure out who provides what to whom and for what reason. Okay, back to contemplating the pond filled with Kentucky mine run off water.

Stephen E Arnold, April 12, 2016

MarkLogic: Not Much Information about DI2E on the MarkLogic Web Site

April 11, 2016

Short honk: I have been thinking about MarkLogic in the context of Palantir Technologies. The two companies are sort of pals. Both companies are playing the high stakes game for next generation augmented intelligence systems for the Department of Defense. Palantir’s approach has been to generate revenues from sales to the intelligence community. MarkLogic’s approach has been to ride on the Distributed Common Ground System, which is now referenced in some non-Hunter circles as DI2E.

You can get a sense of what MarkLogic makes available by navigating to www.marklogic.com and running a query for DI2E or DCGS.

The Plugfest documents provide a snapshot of the vendors involved as of December 2015 in this project. Here’s a snippet from the unclassified set of slides “Plugfest Industry Day: Plugfest/Mashup 2016.”

[Image: Palantir and MarkLogic shown together in the Plugfest industry partner diagram]

What caught my attention is that Palantir, which has its roots in CIA-type thought processes, is in the same “industry partner” illustration as MarkLogic. I noticed that IBM (the DB2 folks) and Oracle (the one-time champion in database technology) are also “partners.”

The only hitch in this “plugfest” partnering deal is Palantir’s quite interesting AtlasDB innovation and the disclosure of data management systems and methods in US 2016/0085817, “System and Method for Investigating Large Amounts of Data,” an invention of the now not-so-secret Hobbits Geoffrey Stowe, Chris Fischer, Paul George, Eli Bingham, and Rosco Hill.

Palantir’s one-two punch is AtlasDB and its data management method. The reason I find this interesting is that MarkLogic is the NoSQL, XML, slice-and-dice advanced technology which some individuals find difficult to use. IBM and Oracle are decidedly old school.

MarkLogic may not publicize its involvement in DCGS/DI2E, but the revenue is important for MarkLogic and the other vendors in the “partnering” diagram. Palantir, however, has been diversifying with, from what I hear, considerable success.

MarkLogic is a Silicon Valley innovator which opened its doors in 2001. Yep, that’s 15 years ago. Palantir Technologies is the newer kid on the block. The company was set up in 2003; that is 13 years ago. What I find interesting is that MarkLogic’s approach is looking a bit long in the tooth. Palantir’s approach is a bit more current, and its user experience is more friendly than wrestling with XQuery and its extensions.

What happens if Palantir becomes the plumbing for the DCGS/DI2E system? Perhaps IBM or Oracle will have to think about acquiring Palantir. With technology IPOs somewhat rare, Palantir stakeholders may find that thinking the unthinkable is attractive.

What happens if Palantir takes its commercial business into a separate company and then formulates a deal to sell only the high-vitamin augmented intelligence business? MarkLogic may be faced with some difficult choices. Simplifying its data management and query systems may be child’s play compared to figuring out what its future will be if either IBM or Oracle snap up the quite interesting Palantir technologies, particularly the database and data management systems.

Watch for my for-fee report about Palantir Technologies. There will be a discounted price for law enforcement and intelligence professionals and another price for those not engaged in these two disciplines. Expect the report in early summer 2016. A small segment of the Palantir special report will appear in the forthcoming “Dark Web Notebook”, which I referenced in the Singularity 1 on 1 interview in mid-March 2016. To reserve copies of either of these two new monographs, write benkent2020 at Yahoo dot com.

Stephen E Arnold, April 11, 2016

Glueware: A Sticky and Expensive Mess

April 5, 2016

I have been gathering open source information about DCGS, a US government information access and analysis system. I learned that the DCGS project is running a bit behind its original schedule formulated about 13 years ago. I also learned that the project is a little over budget.

I noted “NASA Launch System Software Upgrade Now 77% over Budget.” What interested me was the reference to “glueware.” The idea appears to be that it is better, faster, and maybe cheaper to use many different products. The “glueware” idea allows these technologies to be stuck or glued together. This is an interesting idea.

According to the write up:

To develop its new launch software, NASA has essentially kluged together a bunch of different software packages, Martin noted in his report. “The root of these issues largely results from NASA’s implementation of its June 2006 decision to integrate multiple products or, in some cases, parts of products rather than developing software in-house or buying an off-the-shelf product,” the report states. “Writing computer code to ‘glue’ together disparate products has turned out to be more complex and expensive than anticipated. As of January 2016, Agency personnel had developed 2.5 million lines of ‘glue-ware,’ with almost two more years of development activity planned.”

The arguments for the approach boil down to the US government’s belief that many flowers blooming in one greenhouse is better than buying flowers from a farm in Encinitas.

The parallels with DCGS and its well known government contractors and Palantir with its home brew Gotham system are interesting to me. What happens if NASA embraces a commercial provider? Good news for that commercial provider and maybe some push back from the firms chopped out of the pork loin. What happens if Palantir gets rebuffed? Unicorn burgers, anyone?

Stephen E Arnold, April 5, 2016

Patents and Semantic Search: No Good, No Good

March 31, 2016

I have been working on a profile of Palantir (open source information only, however) for my forthcoming Dark Web Notebook. I bumbled into a video from an outfit called ClearstoneIP. I noted that ClearstoneIP’s video showed how one could select from a classification system. With every click, the result set changed. For some types of searching, a user may find the point-and-click approach helpful. However, there are other ways to root through what appears to be patent applications. There are the very expensive methods happily provided by Reed Elsevier and Thomson Reuters, two fine outfits. And then there are less expensive methods like Alphabet Google’s odd ball patent search system or the quite functional FreePatentsOnline service. In between, you and I have many options.

None of them is a slam dunk. When I was working through the publicly accessible Palantir Technologies’ patents, I had to fall back on my very old-fashioned method. I tracked down a PDF, printed it out, and read it. Believe me, gentle reader, this is not the most fun I have ever had. In contrast to the early Google patents, Palantir’s documents lack the detailed “background of the invention” information which the salad days’ Googlers cheerfully presented. Palantir’s write ups are slogs. Perhaps the firm’s attorneys were born with dour brain circuitry.

I did a side jaunt and came across a white paper from ClearstoneIP called “Why Semantic Searching Fails for Freedom-to-Operate (FTO).” The 12 page write up is from ClearstoneIP, a patent analysis company, and it is about patent searching. The company, according to its Web site, is a “paradigm shifter.” The company describes itself this way:

ClearstoneIP is a California-based company built to provide industry leaders and innovators with a truly revolutionary platform for conducting product clearance, freedom to operate, and patent infringement-based analyses. ClearstoneIP was founded by a team of forward-thinking patent attorneys and software developers who believe that barriers to innovation can be overcome with innovation itself.

The “freedom to operate” phrase is a bit of legal jargon which I don’t understand. I am, thank goodness, not an attorney.

The firm’s search method makes much of the ontology, taxonomy, classification approach to information access. Hence, the reason my exploration of Palantir’s dynamic ontology with objects tossed ClearstoneIP into one of my search result sets.

The white paper is interesting if one works around the legal mumbo jumbo. The company’s approach is remarkable and invokes some of my caution light words; for example:

  • “Not all patent searches are the same.”, page two
  • “This all leads to the question…”, page seven
  • “…there is never a single “right” way to do so.”, page eight
  • “And if an analyst were to try to capture all of the ways…”, page eight
  • “to capture all potentially relevant patents…”, page nine.

The absolutist approach to argument is fascinating.

Okay, what’s the ClearstoneIP search system doing? Well, it seems to me that it is taking a path to consider some of the subtleties in patent claims’ statements. The approach is very different from that taken by Brainware and its tri-gram technology. Now that Lexmark owns Brainware, the application of the Brainware system to patent searching has fallen off my radar. Brainware relied on patterns; ClearstoneIP uses the ontology-classification approach.

Both are useful in identifying patents related to a particular subject.
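As a rough illustration of the pattern side of that contrast, character tri-gram overlap can be computed in a few lines. This is a generic sketch of tri-gram matching, not Brainware’s actual algorithm:

```python
def trigrams(text):
    """Return the set of character tri-grams in a whitespace-normalized, lowercased string."""
    t = " ".join(text.lower().split())
    return {t[i:i + 3] for i in range(len(t) - 2)}

def trigram_similarity(a, b):
    """Jaccard similarity of two strings' tri-gram sets, from 0.0 (disjoint) to 1.0 (identical)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# Tri-gram matching tolerates word-form variants that defeat exact keyword lookup.
print(trigram_similarity("semantic search", "semantic searching"))
```

The appeal for patent text is that variant word forms still share most of their tri-grams, so no thesaurus is needed; the trade-off is that the method knows nothing about meaning.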

What is interesting in the write up is its approach to “semantics.” I highlighted in billable hour green:

Anticipating all the ways in which a product can be described is serious guesswork.

Yep, but isn’t that where the role of a human with relevant training and expertise becomes important? The white paper takes the approach that semantic search fails for the ClearstoneIP method dubbed FTO or freedom to operate information access.

The white paper asserted:

Semantic searching is the primary focus of this discussion, as it is the most evolved.

ClearstoneIP defines semantic search in this way:

Semantic patent searching generally refers to automatically enhancing a text-based query to better represent its underlying meaning, thereby better identifying conceptually related references.

I think the definition of semantic is designed to strike directly at the heart of the methods offered to lawyers with paying customers by Lexis-type and Westlaw-type systems. Lawyers-to-be usually have access to the commercial-type services when in law school. In the legal market, there are quite a few outfits trying to provide better, faster, and sometimes less expensive ways to make sense of the Miltonesque prose popular among the patent crowd.

The white paper describes, in a lawyerly way, the approach of semantic search systems. Note that the “narrowing” to the concerns of attorneys engaged in patent work is in the background even though the description seems to be painted in broad strokes:

This process generally includes: (1) supplementing terms of a text-based query with their synonyms; and (2) assessing the proximity of resulting patents to the determined underlying meaning of the text-based query. Semantic platforms are often touted as critical add-ons to natural language searching. They are said to account for discrepancies in word form and lexicography between the text of queries and patent disclosure.
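The two-step process quoted above can be sketched in a few lines. This is a toy illustration under my own assumptions (a hand-made synonym table and a crude presence-based score), not any vendor’s method:

```python
# Toy synonym table; a real system would draw on a curated thesaurus.
SYNONYMS = {
    "fastener": ["screw", "bolt", "rivet"],
    "enclosure": ["housing", "casing", "shell"],
}

def expand_query(terms):
    """Step 1: supplement the query terms with their synonyms."""
    expanded = set(terms)
    for term in terms:
        expanded.update(SYNONYMS.get(term, []))
    return expanded

def score(patent_text, expanded_terms):
    """Step 2: crude proximity proxy -- fraction of expanded terms found in the text."""
    words = set(patent_text.lower().split())
    return len(expanded_terms & words) / len(expanded_terms)

query = expand_query(["fastener", "enclosure"])
doc = "a bolt secures the housing to the frame"
print(score(doc, query))
```

Even this toy shows the FTO objection: the scoring rewards documents that use any of the anticipated synonyms, but a claim drafted with a term nobody put in the table scores zero.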

The white paper offers this conclusion about semantic search:

it [semantic search] is surprisingly ineffective for FTO.

Seems reasonable, right? Semantic search assumes a “paradigm.” In my experience, taxonomies, classification schema, and ontologies perform the same intellectual trick. The idea is to put something into a cubby. Organizing information makes manifest what something is and where it fits in a mental construct.

But these semantic systems do a lousy job figuring out what’s in the Claims section of a patent. That’s a flaw which is a direct consequence of the lingo lawyers use to frame the claims themselves.

Search systems use many different methods to pigeonhole a statement. The “aboutness” of a statement or a claim is a sticky wicket. As I have written in many articles, books, and blog posts, finding on point information is very difficult. Progress has been made when one wants a pizza. Less progress has been made in finding the colleagues of the bad actors in Brussels.

Palantir requires that those adding content to the Gotham data management system add tags from a “dynamic ontology.” In addition to what the human has to do, the Gotham system generates additional metadata automatically. Other systems use mostly automatic systems which are dependent on a traditional controlled term list. Others just use algorithms to do the trick. The systems which are making friends with users strike a balance; that is, using human input directly or indirectly and some administrator only knowledgebases, dictionaries, synonym lists, etc.
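The division of labor described above — a human picks tags from a controlled list while the system generates additional metadata on its own — can be sketched generically. This is my own illustration of the pattern, not Palantir’s actual Gotham API; the vocabulary and field names are invented:

```python
from datetime import datetime, timezone

# Hypothetical controlled vocabulary standing in for a "dynamic ontology".
ONTOLOGY = {"Person", "Organization", "Location", "Event"}

def ingest(text, human_tags, author):
    """Combine analyst-supplied tags with automatically generated metadata."""
    bad = set(human_tags) - ONTOLOGY
    if bad:
        # Reject tags outside the controlled vocabulary.
        raise ValueError(f"tags not in ontology: {bad}")
    return {
        "text": text,
        "tags": sorted(human_tags),                              # human input
        "author": author,                                        # captured automatically
        "ingested_at": datetime.now(timezone.utc).isoformat(),   # captured automatically
        "word_count": len(text.split()),                         # derived automatically
    }

record = ingest("Meeting observed at the port.", {"Event", "Location"}, "analyst7")
print(record["tags"], record["word_count"])
```

The balance the paragraph describes sits in the two halves of that return value: the human contributes judgment through the tags, and the system contributes consistency through the generated fields.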

ClearstoneIP keeps its eye on its FTO ball, which is understandable. The white paper asserts:

The point here is that semantic platforms can deliver effective results for patentability searches at a reasonable cost but, when it comes to FTO searching, the effectiveness of the platforms is limited even at great cost.

Okay, I understand. ClearstoneIP includes a diagram which drives home how its FTO approach soars over the competitors’ systems:

[Image: ClearstoneIP diagram comparing its FTO approach with competing systems. ClearstoneIP, © 2016]

My reaction to the white paper is that for decades I have evaluated and used information access systems. None of the systems is without serious flaws. That includes the clever n-gram-based systems, the smart systems from dozens of outfits, the constantly reinvented keyword-centric systems from the Lexis-type and Westlaw-type vendors, even the simplistic methods offered by free online patent search systems like Pat2PDF.org.

What seems to be reality of the legal landscape is:

  1. Patent experts use a range of systems. With lots of budget, many free and for-fee systems will be used. The name of the game is meeting the client needs and obviously billing the client for time.
  2. No patent search system to which I have been exposed does an effective job of thinking like a very good patent attorney. I know that the notion of artificial intelligence is the hot trend, but the reality is that seemingly smart software usually cheats by formulating queries based on analysis of user behavior, facts like geographic location, and who pays to get their pizza joint “found.”
  3. A patent search system, in order to be useful for the type of work I do, has to index germane content generated in the course of the patent process. Comprehensiveness is simply not part of the patent search systems’ modus operandi. If there’s a B, where’s the A? If there is a germane letter about a patent, where the heck is it?

I am not on the “side” of the taxonomy-centric approach. I am not on the side of the crazy semantic methods. I am not on the side of the keyword approach when inventors use different names on different patents, Babak Parviz aliases included. I am not in favor of any one system.

How do I think patent search is evolving? ClearstoneIP has it sort of right. Attorneys have to tag what is needed. The hitch in the git along has been partially resolved by Palantir-type systems; that is, the ontology has to be dynamic and available to anyone authorized to use a collection in real time.

But for lawyers there is one added necessity which will not leave us any time soon. Lawyers bill; hence, whatever is output from an information access system has to be read, annotated, and considered by a semi-capable human.

What’s the future of patent search? My view is that there will be new systems. The one constant is that, by definition, a lawyer cannot trust the outputs. The way to deal with this is to pay a patent attorney to read patent documents.

In short, like the person looking for information in the scriptoria at the Alexandria Library, the task ends up as a manual one. Perhaps there will be a friendly Boston Dynamics librarian available to do the work some day. For now, search systems won’t do the job because attorneys cannot trust an algorithm when the likelihood of missing something exists.

Oh, I almost forgot. Attorneys have to get paid via that billable time thing.

Stephen E Arnold, March 30, 2016

Microsoft and the Open Source Trojan Horse

March 30, 2016

Quite a few outfits embrace open source. There are a number of reasons:

  1. It is cheaper than writing original code
  2. It is less expensive than writing original code
  3. It is more economical than writing original code.

The article “Microsoft is Pretending to be a FOSS Company in Order to Secure Government Contracts With Proprietary Software in ‘Open’ Clothing” reminded me that there is another reason.

No kidding.

I know that IBM has snagged Lucene and waved its once magical wand over the information access system and pronounced, “Watson.” I know that deep inside the kind, gentle heart of Palantir Technologies, there are open source bits. And there are others.

The write up asserted:

For those who missed it, Microsoft is trying to EEE GNU/Linux servers amid Microsoft layoffs; selfish interests of profit, as noted by some writers [1,2] this morning, nothing whatsoever to do with FOSS (there’s no FOSS aspect to it at all!) are driving these moves. It’s about proprietary software lock-in that won’t be available for another year anyway. It’s a good way to distract the public and suppress criticism with some corny images of red hearts.

The other interesting point I highlighted was:

reject the idea that Microsoft is somehow “open” now. The European Union, the Indian government and even the White House now warm up to FOSS, so Microsoft is pretending to be FOSS. This is protectionism by deception from Microsoft and those who play along with the PR campaign (or lobbying) are hurting genuine/legitimate FOSS.

With some government statements of work requiring “open” technologies, Microsoft may be doing what other firms have been doing for a while. See points one to three above. Microsoft is just late to the accountants’ party.

Why not replace the SharePoint search thing with an open source solution? What of the $1.2 billion MSFT paid for the fascinating Fast Search & Transfer technology in 2008? It works just really well, right?

Stephen E Arnold, March 30, 2016

Expert System Does a Me Too Innovation

March 29, 2016

Years ago I was a rental to an outfit called i2 Group in the UK. Please, don’t confuse the UK i2 with the ecommerce i2 which chugged along in the US of A.

The UK i2 had a product called Analysts Notebook. At one time it was basking in a 95 percent share of the law enforcement and intelligence market for augmented investigatory software. Analysts Notebook is still alive and kicking in the loving arms of IBM.

I thought of the vagaries of product naming when I read “Expert System USA Launches Analysts’ Workspace.”

According to the write up:

Analysts’ Workspace features comprehensive enterprise search and case management software integrated with a customizable semantic engine. It incorporates a sophisticated and efficient workflow process that enables team-wide collaboration and rapid information sharing. The product includes an intuitive dashboard allowing analysts to monitor, navigate, and access information using different taxonomies, maps, and worldviews, as well as intelligent workflow features specifically designed to proactively support analysts and investigators in the different phases of their activities.

The lingo reminds me of the early i2 Group marketing collateral. The terminology has surfaced in some of Palantir’s marketing statements and, quite recently, in the explanation of the venture funded Digital Shadows’ service.

I love me-too products. Where would one be if Mozart had not heard and remembered the note sequences of other composers?

Now the trick will be to make some money. Mozart, though a very good me too innovator, struggled in that department. Expert System, according to Google Finance, is going to have to find a way to keep that share price climbing. Today’s (March 22, 2016) share price is in penny stock territory:

[Image: Expert System share price chart from Google Finance]

Stephen E Arnold, March 29, 2016

Advertising and Search Confidence: Google As Government

March 26, 2016

I read “US State Department Emails: Google Wanted in 2012 to Help Syria’s Rebels Overthrow Assad.” The story might be a load of horse feathers. I stopped and read the article and noted this passage:

Messages between former secretary of state Hillary Clinton’s team and one of the company’s executives detailed the plan for Google to get involved in the region. “Please keep close hold, but my team is planning to launch a tool … that will publicly track and map the defections in Syria and which parts of the government they are coming from,” Jared Cohen, the head of what was then the company’s “Google Ideas” division, wrote in a July 2012 email to several top Clinton officials.

Perhaps this is Palantir envy? Clever folks are confident of their abilities. And here is a See Also reference.

Stephen E Arnold, March 26, 2016

Not So Weak. Right, Watson?

March 25, 2016

I read an article which proved to be difficult to find. None of my normal newsreaders snagged the write up called “The Pentagon’s Procurement System Is So Broken They Are Calling on Watson.” Maybe it is the singular Pentagon hooked with the plural pronoun “they”? Hey, dude, colloquial writing is chill.

Perhaps my automated systems’ missing the boat was the omission of the three impressive letters “IBM”? If you follow the activities of US government procurement, you may want to note the article. If you are tracking the tension between IBM i2 and Palantir Technologies, the article adds another flagstone to the pavement that IBM is building to support its augmented intelligence activities in the Department of Defense and other US government agencies.

Let me highlight a couple of comments in the write up and leave you to explore the article at whatever level you choose. I noted these “reports”:

The Air Force is currently working with two vendors, both of which have chosen Watson, IBM’s cognitive learning computer, to develop programs that would harness artificial intelligence to help businesses and government acquisitions officials work through the mind-numbing system.

The write up identifies one of the vendors working on IBM Watson for the US Air Force. The company is Applied Research.

I circled this quote: “The Pentagon’s procurement system is the ‘perfect application for Watson.’”

The goslings and I love “perfect” applications.

How does Watson learn about procurement? The approach is essentially the method used in the mid 1990s by Autonomy IDOL. Here’s a passage I highlighted:

But first Watson must be trained. The first step is to feed it all the relevant documents. Then its digital intellect will be molded by humans, asking question after question, about 5,000 in all, to help understand context and the particular nuance that comes with federal procurement law.

How does this IBM deal fit into the Palantir versus IBM interaction? That’s a good question. What is clear is that the US Air Force has embraced a solution which includes systems and methods first deployed two decades ago.

What’s that about the pace of technology?

Stephen E Arnold, March 25, 2016
