
Career Advice from Successful Googlers

November 18, 2015

A few words of wisdom from a Google veteran went from Quora query to Huffington Post article in “What It Takes to Rise the Ranks at Google: Advice from a Senior Staff Engineer.” The original question was, “How hard is it to make Senior Engineer at Google?” HuffPo senior editor Nico Pitney reproduces the most popular response, that of senior staff engineer Carlos Pizano. Pizano lists some of his education and pre-Google experience, and gives some credit to plain luck, but here’s the part that makes this good guidance for approaching many jobs:

“I happen to be a believer of specialization, so becoming ‘the person’ on a given subject helped me a lot. Huge swaths of core technology key to Google’s success I know nothing about, of some things I know all there is to know … or at least my answers on the particular subject were the best to be found at Google. Finally, I never focused on my career. I tried to help everybody that needed advice, even fixing their code when they let me and was always ready to spread the knowledge. Coming up with projects but giving them to eager, younger people. Shine the light on other’s accomplishments. All that comes back to you when performance review season comes.”

Knowing your stuff and helping others—yes, that will go a long way indeed. For more engineers’ advice, some of which is more Google-specific, navigate to the list of responses here.

Cynthia Murrell, November 18, 2015

Sponsored by, publisher of the CyberOSINT monograph


Dassault: Lowered to Hold and Doing the Foundation Thing

November 13, 2015

Dassault Systèmes owns Exalead, one of the search companies forged in the white hot crucibles of the late 1990s. I did a quick check on the fortunes of Exalead, which was purchased by Dassault in 2010. I don’t hear much about Exalead, which had at the time of its acquisition some interesting technology.

My quick check turned up two things. Both struck me as interesting.

First, in “Dassault Systemes Receives Consensus Rating of “Hold” from Brokerages,” I noted the “hold.” That’s one way of saying, “Yikes, we need to watch this outfit.” Some might argue that this is a vote of confidence. I, on the other hand, believe that this is one more signal that companies which have bet big on search are going to face some lean times in the months ahead. I noted this passage in the write up:

Berenberg Bank reissued a “sell” rating on shares of Dassault Systèmes in a report on Friday, September 25th. Credit Suisse restated an “outperform” rating on shares of Dassault Systèmes in a research report on Monday, September 21st. Finally, Zacks cut shares of Dassault Systèmes from a “buy” rating to a “hold” rating in a research report on Tuesday.

Second, Dassault is doing what Thomson Reuters did; that is, morph into foundationville. I am not sure what the tax advantages of this are and I am not too curious. I read in “La Fondation” that:

La Fondation Dassault Systèmes will provide grants, digital content and skill sets in virtual technologies to education and research initiatives at forward-thinking academic institutions, research institutes, museums, associations, cultural centers and other general interest organizations throughout the European Union. This support will provide greater access to 3D content, technology and simulation applications that have long been used by industry for the design, engineering and manufacturing of most of the products society relies on today. Such access can help create new learning experiences and encourage greater interest in science, math, engineering and technology disciplines among students.

From my crumbling office in rural Kentucky, this looks like a reprise of the “old” Lexis effort of providing “free access” to the Lexis online system in the hopes that future attorneys will continue to use Lexis. The free stuff goes away when the aspiring lawyer or future Uber driver passes the bar. How is that free stuff working out?

My thought is that neither of these news items does much to boost my confidence that Exalead is becoming a big revenue player at the upscaliest of the upscale French corporations.

The Exalead folks did know how to provide a great box lunch before the acquisition.

Stephen E Arnold, November 13, 2015

SLI H116 and Related Info Swizzles

November 13, 2015

I read an item produced by a research outfit called Edison. What’s interesting is that the “news” refers to SLI Systems, a New Zealand-based outfit which sells eCommerce search software. The company has been going through some choppy water and has two new executives. One is a president, Chris Brennan. The more recent appointment is Martin Onofrio, who has taken the job of chief revenue officer. Prior to joining SLI, Mr. Onofrio was, according to the Edison news item, the chief revenue officer at Attensity, one of the sentiment-oriented content processing outfits. (Attensity has kept a low profile for a while.)

In that “report” from Edison which you can read at this link, I noted a reference to H116 revenue. The report did not explain what this type of revenue is. I did a quick search and learned that H116 does not seem to be a major revenue type. H116 is a type of aluminum, a motorized stepper, and a string of characters used by a number of different manufacturers.

After some thinking whilst listening to the Jive Five, I realized that Edison and SLI Systems are using H116 as a token for “revenues for the first half of fiscal 2016.” There you go.
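For readers who bump into the same shorthand, the convention can be decoded mechanically. Here is a minimal Python sketch, assuming the label always follows the pattern H, then the half (1 or 2), then a two-digit fiscal year in the 2000s. The helper `decode_half_year` is hypothetical, not something Edison or SLI publishes, and a company’s fiscal year need not match the calendar year.

```python
import re


def decode_half_year(token: str) -> str:
    """Expand a half-year label such as 'H116' into plain English.

    Assumes the pattern H<half><two-digit fiscal year>; the '20' century
    prefix is an assumption, not part of the label itself.
    """
    match = re.fullmatch(r"H([12])(\d{2})", token.strip().upper())
    if match is None:
        raise ValueError(f"not a half-year token: {token!r}")
    half = "first" if match.group(1) == "1" else "second"
    return f"{half} half of fiscal 20{match.group(2)}"


print(decode_half_year("H116"))  # first half of fiscal 2016
print(decode_half_year("H215"))  # second half of fiscal 2015
```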

Another write up adds this color, which I think the Edison experts could have recycled to make clear what H116 means:

Revenue is forecast to rise to $17.3 million in the six months ending December 31 from $13.6 million a year earlier when sales accelerated at a 27% pace, the Christchurch-based company said in a statement.
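The two figures in that statement imply a growth rate that is easy to check. A quick Python sketch, using only the numbers quoted (in millions, currency as stated in the quote):

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Figures from the quoted statement, in millions.
prior_half = 13.6        # six months ending December 31 a year earlier
forecast_half = 17.3     # forecast for the six months ending December 31
growth = percent_change(prior_half, forecast_half)
print(f"Implied growth: {growth:.1f}%")  # Implied growth: 27.2%
```

The result, roughly 27.2%, is in line with the 27% figure in the quote, though the quote’s wording leaves it unclear which period that figure describes.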

Here’s the important part in my view:

The software developer missed its sales forecast for the second half of the 2015 year, and has hired Martin Onofrio as its new chief revenue officer to drive revenue growth.

A couple of quick thoughts before I go watch the mist rise from the mine drainage pond:

  1. SLI might want to make sure that its experts output “news” which is easy to understand.
  2. Including revenue challenges is probably as important as, if not more important than, opining about the future. The future is not yet here, so, as with picking the winner of the Kentucky Derby, the touts are one thing and the nag that crosses the finish line first is another.
  3. Attensity, in my opinion, has faced its own revenue headwinds. I wonder if a chief revenue officer can generate revenue in a world in which there are open source and low-cost eCommerce search systems.

A word to Edison: Please do not write to complain about my nagging about the H116 thing. You offer a two-page report which is one page. What’s up with that? Friday the 13th bad luck or a standard work product?

Stephen E Arnold, November 13, 2015

Product Hunt Adds Collections to Its Search Results

November 13, 2015

Product Hunt is a website for the cutting-edge consumer, where users share information about the latest and greatest in the tech market. The Next Web tells us, “Product Hunt Now Lets You Follow and Search for Collections.” Any user can create a “collection” to curate and share a group of products, such as a selection of website-building tools or the best accessories for charging electronic devices. The very brief write-up reveals:

Product Hunt, the Web’s favorite destination to discover new apps, gadgets and connected services, has updated its Collections feature, allowing users to follow and search for curated lists. You can now follow any collection you find interesting to receive notifications when new products are added to them. Collections will also show up in search results alongside products. In addition, curators can add comments to products in their collections to describe them or note why they’ve included them in their list.

So now finding the best of the latest is even easier, which makes the site an important tool for anyone with the need, and the means, to stay ahead of the technology curve. Launched in 2013, Product Hunt is based in San Francisco. Its Collections feature was launched last December, and this year the site also added sections specifically for books and for games.

Cynthia Murrell, November 13, 2015



Semantics and the Web: The Bacon Has Been Delivered

November 12, 2015

I read and viewed “What Happened to the Semantic Web.” For one thing, the search engine optimization crowd has snagged the idea in order to build interest in search result rankings. The other thing I know is that most people are blissfully unaware of what semantics are supposed to be and how semantics affect their lives. Many folks are thrilled when their mobile phone points them to a pizza joint or out of an unfamiliar part of town.

The write up explains that for the last 15 years there has been quite a bit of the old rah rah for semantics on the Web. Well, the semantics are there. The big boys like Google and Microsoft are making this happen. If you are interested in triples, POST, and RDF, you can work through the acronyms and get to the main points of the article.
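For those who have not met a triple, the core idea is small: every fact is a subject, a predicate, and an object, and queries are patterns over those facts. Here is a toy Python sketch of that idea, assuming made-up `ex:` identifiers alongside terms named after schema.org’s Restaurant type and servesCuisine property. It is not RDF syntax and not a real triple store; a library such as rdflib or a SPARQL endpoint would be the practical route.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical facts; the "ex:" identifiers are invented for illustration.
graph = [
    Triple("ex:tonys-pizza", "rdf:type", "schema:Restaurant"),
    Triple("ex:tonys-pizza", "schema:servesCuisine", "Pizza"),
    Triple("ex:corner-cafe", "rdf:type", "schema:Restaurant"),
    Triple("ex:corner-cafe", "schema:servesCuisine", "Coffee"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


pizza_places = [t.subject for t in match(graph, p="schema:servesCuisine", o="Pizza")]
print(pizza_places)  # ['ex:tonys-pizza']
```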

The bulk of the write up is a series of comparative screen shots. I looked at these and tried to replicate a couple of them. I was not able to derive the same level of thrillness which the article expresses. Your mileage may vary.

Here’s the passage I highlighted in a definitely pale shade of green:

As you can see, there is no question that the Web already has a population of HTML documents that include semantically-enriched islands of structured data. This new generation of documents creates a new Web dimension in which links are no longer seen solely as document addresses, but can function as unambiguous names for anything, while also enabling the construction of controlled natural language sentences for encoding and decoding information [data in context] — comprehensible by both humans and machines (bots). The fundamental goal of the Semantic Web Project has already been achieved. Like the initial introduction of the Web, there wasn’t an official release date — it just happened!

I surmise this is the semantic heaven described by Ramanathan Guha and his series of inventions, now almost a decade old. What’s left out is a small point: The semantic technology allows Google and some other folks to create a very interesting suite of databases. Good or bad? I will leave it to you to revel in this semantic fait accompli.

Stephen E Arnold, November 12, 2015

Google Books Is Not Violating Copyright

November 12, 2015

Google Books was controversial the moment it was conceived. The concept is simple and effective, though: books in academic libraries are scanned and snippets are made available online. People can search Google Books for specific words or phrases and are then shown where those words appear within a book. The Atlantic wrote “After Ten Years, Google Books Is Legal” about how a panel of Second Circuit judges ruled in favor of Google Books against the Authors Guild.

The panel ruled that Google Books fell under “fair use,” which, as most YouTubers know, permits the use of copyrighted content under a strict set of rules. Fair use covers works of parody, academic works, quotations, criticism, and summarization.

The Authors Guild argued that Google Books was infringing upon its members’ copyrights and stealing potential profits, but too much copyright protection is a bad thing. It places too many limitations on how a work can be used, harming the dissemination of creative and intellectual thought.

“‘It gives us a better sense of where fair use lies,’ says Dan Cohen, the executive director of the Digital Public Library of America. They ‘give a firmer foundation and certainty for non-profits…’ Of all the parts of Judge Leval’s decision, many people I talked to were happiest to see that it stressed that fair use’s importance went beyond any tool, company, or institution. ‘To me, I think a muscular fair use is an overall benefit to society, and I think it helps both authors and readers,’ said Cohen.”

Authors do have the right to copyright their work and profit from it, which should be encouraged; a person’s work should not be given away for free. There is a wealth of information out there, however, that is kept under lock and key and would otherwise be inaccessible without a digital form. Speaking as one who has relied on it for research, I can say that Google Books only extends a book’s reach.

Whitney Grace, November 12, 2015

Google Takes Aim at Internet Crime

November 12, 2015

Google has a plan to thwart Internet crime: make it too expensive to be worth it. The company’s Online Security Blog examines the issue in “New Research: The Underground Market Fueling For-Profit Abuse.” The research was presented last June at the Workshop on the Economics of Information Security 2015; I recommend those interested check out the full report here.

The post describes the global online black market that has grown over the last ten years or so, where criminals trade in such items as stolen records, exploit kits, scam hosting, and access to compromised computers. The profit centers which transfer the shady funds rest upon an infrastructure, the pieces of which cost money. Google plans to do what it can to increase those costs. The write-up explains:

“Client and server-side security has dominated industry’s response to digital abuse over the last decade. The spectrum of solutions—automated software updates, personal anti-virus, network packet scanners, firewalls, spam filters, password managers, and two-factor authentication to name a few—all attempt to reduce the attack surface that criminals can penetrate. While these safeguards have significantly improved user security, they create an arms race: criminals adapt or find the subset of systems that remain vulnerable and resume operation.

“To overcome this reactive defense cycle, we are improving our approach to abuse fighting to also strike at the support infrastructure, financial centers, and actors that incentivize abuse. By exploring the value chain required to bulk register accounts, we were able to make Google accounts 30–40% more expensive on the black market. Success stories from our academic partners include disrupting payment processing for illegal pharmacies and counterfeit software outlets advertised by spam, cutting off access to fake accounts that pollute online services, and disabling the command and control infrastructure of botnets.”

Each of the links in the above quote goes to an in-depth paper, so there’s plenty of material to check out there. Society has been trying for centuries to put black markets out of business. Will the effort be more successful in the virtual realm?

Cynthia Murrell, November 12, 2015


On the Prevalence of Open Source

November 11, 2015

Who would have thought, two decades ago, that open source code was going to dominate the software field? Vallified’s Philip O’Toole meditates on “The Strange Economics of Open-Source Software.” Though the industry gives so much away for free, it’s doing quite well for itself.

O’Toole notes that closed-source software is still in wide use, largely in banks, embedded devices, and underpinning services. Also, many organizations are still attached to their Microsoft and Oracle products. But the tide has been turning; he writes:

“The increasing dominance of open-source software seems particularly true with respect to infrastructure software.  While security software has often been open-source through necessity — no-one would trust it otherwise — infrastructure is becoming the dominant category of open-source. Look at databases — MySQL, MongoDB, RethinkDB, CouchDB, InfluxDB (of which I am part of the development team), or cockroachdb. Is there anyone today that would even consider developing a new closed-source database? Or take search technology — elasticsearch, Solr, and bleve — all open-source. And Linux is so obvious, it is almost pointless to mention it. If you want to create a closed-source infrastructure solution, you better have an enormously compelling story, or be delivering it as part of a bigger package such as a software appliance.”

It has gotten to the point where developers may hesitate to work on a closed-source project because it will do nothing for their reputation. Where do the profits come from, you may ask? Why, in the sale of services, of course. It’s all part of today’s cloud-based reality.

Cynthia Murrell, November 11, 2015


Another Semantic Search Play

November 6, 2015

The University of Washington has been search central for a number of years. Some interesting methods have emerged. From Jeff Dean to Alon Halevy, the UW crowd has been having an impact.

Now another outfit with ties to UW wants to make waves with a semantic search engine. Navigate to “Artificial-Intelligence Institute Launches Free Science Search Engine.” The wizard behind the system is Dr. Oren Etzioni. The money comes from Paul Allen, a co-founder of Microsoft.

Dr. Etzioni has been tending vines in the search vineyard for many years. His semantic approach is described this way:

But a search engine unveiled on 2 November by the non-profit Allen Institute for Artificial Intelligence (AI2) in Seattle, Washington, is working towards providing something different for its users: an understanding of a paper’s content. “We’re trying to get deep into the papers and be fast and clean and usable,” says Oren Etzioni, chief executive officer of AI2.

Sound familiar? Understanding what a sci-tech paper means?

According to the write up:

Semantic Scholar offers a few innovative features, including picking out the most important keywords and phrases from the text without relying on an author or publisher to key them in. “It’s surprisingly difficult for a system to do this,” says Etzioni. The search engine uses similar ‘machine reading’ techniques to determine which papers are overviews of a topic. The system can also identify which of a paper’s cited references were truly influential, rather than being included incidentally for background or as a comparison.

Does anyone remember Gene Garfield? I did not think so. There is a nod to Expert System, an outfit which has been slogging semantic technology in an often baffling suite of software since 1989. (Yep, that works out to more than a quarter of a century.) Hey, few doubt that semantic hoohah has been a go-to buzzword for decades.

There are references to the Microsoft specialist search and some general hand waving. The fact that different search systems must be used for different types of content should raise some questions about the “tuning” required to deliver what the vendor can describe as relevant results. Does anyone remember what Gene Garfield said when he accepted the lifetime achievement award in online? Right, did not think so. The gist was that citation analysis worked. Additional bells and whistles could be helpful. But humans citing substantive sci-tech antecedents provided a very useful indicator of the importance of a paper.

I interpreted Dr. Garfield’s comment as suggesting that semantics could add value if the computational time and costs could be constrained. But in an era of proliferating sci-tech publications, bells and whistles were like chrome trim on a ’59 Oldsmobile 98. Lots of flash. Little substance.
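The Garfield point is cheap to compute, which is part of its appeal. Here is a minimal Python sketch of the plainest form of citation analysis, ranking papers by how many times other papers cite them. The citation pairs are hypothetical, and real citation indexes layer on much more (self-citation handling, field normalization, and so on); this only shows the raw signal.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical citation pairs: (citing paper, cited paper).
citations = [
    ("P2", "P1"),
    ("P3", "P1"),
    ("P4", "P1"),
    ("P4", "P2"),
    ("P5", "P3"),
]


def rank_by_citations(edges: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count incoming citations per paper, most cited first.

    Ties keep the order in which papers were first encountered; papers
    that are never cited do not appear in the result.
    """
    counts = Counter(cited for _citing, cited in edges)
    return counts.most_common()


print(rank_by_citations(citations))  # [('P1', 3), ('P2', 1), ('P3', 1)]
```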

My view is that Paul Allen dabbled in semantics with Evri. How did that work out? Ask someone from the Washington Post who was involved with the system.

The system is worth testing in comparative searches against Compendex, ChemAbs, and similar high-value commercial databases.

Stephen E Arnold, November 5, 2015

Google Continues to Improve Voice Search

November 5, 2015

Google’s research arm continues to make progress on voice search. The Google Research Blog updates us in “Google Voice Search: Faster and More Accurate.” The Google Speech Team begins by referring back to 2012, when they announced their Deep Neural Network approach. They have since built on that concept; the team now uses acoustic models based on recurrent neural networks, trained with connectionist temporal classification (CTC) and sequence discriminative training techniques, which they note are fast and accurate. The write-up goes into detail about how speech recognizers work and what makes their latest iteration the best yet. I found the technical explanation fascinating, but it is too lengthy to describe here; please see the post for those details.
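One piece of the CTC idea fits in a few lines. A CTC-trained model emits a label (or a special “blank”) for every audio frame, and a common way to turn those frames into a sequence is greedy, or best-path, decoding: merge repeated labels, then drop the blanks. The Python sketch below shows only that textbook collapse step, assuming a blank symbol of `_` and made-up phoneme-like frame labels; it is not Google’s recognizer or its decoder.

```python
from itertools import groupby
from typing import List

BLANK = "_"  # placeholder symbol for the CTC "blank" label


def ctc_greedy_collapse(frame_labels: List[str]) -> List[str]:
    """Best-path CTC decoding step: merge repeated labels, then drop blanks."""
    merged = [label for label, _run in groupby(frame_labels)]
    return [label for label in merged if label != BLANK]


# Hypothetical per-frame best labels for the word "cat".
frames = ["_", "k", "k", "_", "ae", "ae", "ae", "_", "t", "_"]
print(ctc_greedy_collapse(frames))  # ['k', 'ae', 't']
```

The blank is what lets a genuine double survive: `["l", "_", "l"]` collapses to two `l` labels, while `["l", "l"]` collapses to one.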

I am still struck when I see any article mention that an algorithm has taken the initiative. This time, researchers had to rein in their model’s insightful decision:

“We now had a faster and more accurate acoustic model and were excited to launch it on real voice traffic. However, we had to solve another problem – the model was delaying its phoneme predictions by about 300 milliseconds: it had just learned it could make better predictions by listening further ahead in the speech signal! This was smart, but it would mean extra latency for our users, which was not acceptable. We solved this problem by training the model to output phoneme predictions much closer to the ground-truth timing of the speech.”

At least the AI will take direction. The post concludes:

“We are happy to announce that our new acoustic models are now used for voice searches and commands in the Google app (on Android and iOS), and for dictation on Android devices. In addition to requiring much lower computational resources, the new models are more accurate, robust to noise, and faster to respond to voice search queries – so give it a try, and happy (voice) searching!”

We always knew natural-language communication with machines would present huge challenges, ones many said could never be overcome. It seems such naysayers were mistaken.

Cynthia Murrell, November 5, 2015


