
DuckDuckGo: Filtering

July 22, 2016

I read “Is DuckDuckGo.com Partially Enforcing the ‘Celebrity Threesome Injunction’?” The point of the write up is that information is filtered from search systems, including the privacy-centric system DuckDuckGo.com. I assume the queries summarized in the write up are spot on. If accurate, one cannot search that which is not in an index. That’s helpful for those who want to be thorough. It is also helpful for those who find themselves the subject of write ups already published and want to keep the links out of a search system’s results page. With folks loving the mobile research experience, who would know? The more interesting question is, “Does anyone care?” A good example is the “artist” whose work disappeared from the Alphabet Google thing’s Blogger system. See “Google Deletes Artist’s Blog and a Decade of His Work along with It.” Back ups are good if not filtered by a helpful cloud service. Where did my music go anyway?

Stephen E Arnold, July 22, 2016

Alphabet Google Is Busy Reinventing

July 22, 2016

From Forbes in India (“Sundar Pichai to Reinvent Google with a Heavy Dose of Artificial Intelligence” which may require a proxy maneuver due to the digitally with it Forbes) or Switzerland (“Google’s New Research Lab in Zurich Is Inventing the Future of Search”) — the Alphabet Google thing is trying to reinvent search.

There you go: Stark evidence that Google’s information retrieval system is deeply flawed. The electric car does not reinvent the car. But search has to reinvent search.

This is a big and probably futile job. My view is that search is an evolutionary beastie. Incremental innovations from research labs, one man band coders, and start ups with one good idea and a couple of crazed investors do the job.

Google itself was a roll up of ideas from IBM Almaden (hello, Jon Kleinberg), AltaVista (hello, Jeff Dean, Simon Tong, and Sanjay Ghemawat), and the fumble bumbles of folks at precursors (hello, AskJeeves and Lycos).

The India angle states:

Think of it as Search 3.0—a new, interactive way to communicate with Google itself. With it you’ll be able to order a ticket, book a flight, play music, schedule a task, reply to a message; the Google assistant might even write it for you. It might prompt you to order flowers ahead of Mother’s Day or to pack for your upcoming trip, and it might be able to pick up an earlier conversation from where you left off. In other words, it will be there, ready to help, in your phone, your speakers, your television, your car, your watch and eventually everywhere. “You are trying to go about your day, and in an ambient way, things are there to help you,” Pichai says. Making sure this assistant lives up to its full potential will take years, and building it will be harder than it was for Page and co-founder Sergey Brin to create search itself. Adds Pichai: “In every dimension, it is more ambitious.”

Yep, ambitious.

From the Swiss side:

The new team has a distinct goal: to invent the future of Search, a voice-activated, human-like entity that can answer any query intelligently. “We are building the ultimate assistant. In two years, you can expect Google to become a personal life assistant across multiple surfaces, including your phone, Google Home, even cars,” Mogenet [Google wizard] said. Some of Google’s best-known products are already shaped by machine learning, the ability of computers to spot patterns in large datasets and learn by example. For instance, Google Photos uses it to understand the content of an image. This means you could search for “cardigan corgi” or “passport” or “birthday celebrations 2014” and the app will bring up the relevant photos.

There you go. Reinvent.

The challenge is to find a way to avoid the stagnation which seems to befall certain types of high technology outfits. Do you use your DEC Rainbow today?

I love the Google. It is just super. The problem is that as it has concentrated traffic, it has left itself unable to respond to opportunities such as those identified by Facebook and Amazon. By the way, both of these outfits face some challenges as well.

The investment in search will benefit some folks. But how likely is it that Google will come up with an “innovation” that matters? I think that when octopus companies do something, whether it is good or bad, it is easy to define whatever happens as success.

The problem is that information returned from Google is often off point. When I run queries for documents I have in my hand, I cannot find them without jumping through hoops. I documented this with a Dark Web paper from Denmark in this blog. Homonyms give the Google fits. Even though my search history is available to Mother Google, the system is tone deaf for my queries. When I look for certain information, the data are often disappeared. I noticed that indexing of pastesites, PDF files, and PowerPoint presentations has become laughable.

Innovation is more than a public relations campaign. How do I know? Google’s marketing is starting to remind me of IBM Watson. You know Watson, the revolutionary information access system from Big Blue. Yep, innovation.

Stephen E Arnold, July 22, 2016

Amazon: Not the Corner Store? Big Insight

July 21, 2016

I love Amazon almost as much as I love Google. I would have a tough time deciding which of these services warrants more of my affection, trust, and respect. I said to myself “Bummer” when I read “Amazon’s Dominance Is Bad for Your Business.” I recently ordered a paperback from Amazon and noticed that the 150 page monograph was $1,000, not $10. Anyone could have clicked the incorrect link between the correctly priced volume and the used discounted books. Amazon respected my klutziness, and I think I got my money back after I sent the $1,000 paperback back to the outstanding merchant. This firm obviously valued its paperback more highly than the half dozen vendors selling the same paperback for $10. What more could one want? (One of my goslings asked me, “Why does Amazon list certain products at vastly inflated prices?” I don’t know. I love Amazon. Love is blind.)

The write up includes a quote allegedly generated by the world’s smartest person, Jeff Bezos; to wit:

“…Amazon should approach these small publishers the way a cheetah would pursue a sickly gazelle.”

I like that. Google’s meat eating dinosaur is, after all, dead unless the team solving death brings T Rex back to life. A cheetah is a here and now creature able to snag small, sickly, or inept prey with a batting average a major league player would covet.

The write up also states:

“Amazon has done a very good job with search and discovery on mobile,” BloomReach marketing chief Joelle Kaufman said. “They are capturing the lion’s share of mobile revenue. Consumers said they start on a cellphone and they use it as a research tool. But 81 percent want to buy on that laptop/desktop.”

Google, it seems, is an also ran in the shopping search sector. But what about Amazon’s competitors and merchants who do not want to sell their products via Amazon?

The answer is, according to the write up:

There are still a plethora of avenues to make sales through, and portals to gain consumer attention. Despite Amazon’s utter dominance in the U.S. e-retail market, you can still grow your business, and become highly successful along the way. Just remember the importance of content, social media, and a great attitude. If David had submitted to Goliath’s size before the battle had begun, he never would have realized his own strength and capabilities.

This sounds like Google AdWords, Snapchat, and YouTube videos to me. Those work really well for mom and pop merchants (at least for the small number remaining in the good, old USA), small businesses, and unfunded start ups.

Is what’s good for Amazon good for us or was it “What’s good for General Motors is good for the USA”? When will Amazon address the shortcomings I find in Amazon search? Maybe never. If it is not broken, why try to fix it? That’s why suggested prices are irrelevant in the Amazon jungle.

Stephen E Arnold, July 21, 2016

Coveo Wins a Stevie. Congrats Coveo. What Is a Stevie?

July 21, 2016

The article titled “Coveo Sweeps Early 2016 Awards Programs” on Coveo’s site promotes some of the many honors and recognitions that the company and its apps have earned. Among these is the Gold Stevie Award for Sales and Customer Service, earned through Coveo Reveal. The article details the competition for this prestigious yet unknown award:

“More than 2,100 nominations from organizations of all sizes and in virtually every industry were evaluated in this year’s competition, an increase of 11% over 2015. Finalists were determined by the average scores of 115 professionals worldwide, acting as preliminary judges. More than 60 members of several specialized judging committees determined the Gold, Silver and Bronze Stevie Award placements from among the Finalists during final judging.”

Coveo Reveal is the first cloud-based, machine learning search platform for the enterprise. Its main users are customer service professionals, who are able to gain a stronger understanding of areas that can be improved in the overall search process. No surprise that it is winning awards, but we are unfamiliar with this Stevie recognition. According to the American Stevie Awards website, the award has been around since 2002 and is named Stevie, as in Stephen, after the Greek derivation: “crowned.”

 

Chelsea Kerwin, July 21, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

There is a Louisville, Kentucky Hidden Web/Dark Web meet up on July 26, 2016. Information is at this link: http://bit.ly/29tVKpx.

 

Scholarship Evolving with the Web

July 21, 2016

Is big data good only for the hard sciences, or does it have something to offer the humanities? Writer Marcus A Banks thinks it does, as he states in “Challenging the Print Paradigm: Web-Powered Scholarship is Set to Advance the Creation and Distribution of Research” at the Impact Blog (a project of the London School of Economics and Political Science). Banks suggests that data analysis can lead to a better understanding of, for example, how the perception of certain historical events has evolved over time. He goes on to explain what the literary community has to gain by moving forward:

“Despite my confidence in data mining I worry that our containers for scholarly works — ‘papers,’ ‘monographs’ — are anachronistic. When scholarship could only be expressed in print, on paper, these vessels made perfect sense. Today we have PDFs, which are surely a more efficient distribution mechanism than mailing print volumes to be placed onto library shelves. Nonetheless, PDFs reinforce the idea that scholarship must be portioned into discrete units, when the truth is that the best scholarship is sprawling, unbounded and mutable. The Web is flexible enough to facilitate this, in a way that print could never do. A print piece is necessarily reductive, while Web-oriented scholarship can be as capacious as required.

“To date, though, we still think in terms of print antecedents. This is not surprising, given that the Web is the merest of infants in historical terms. So we find that most advocacy surrounding open access publishing has been about increasing access to the PDFs of research articles. I am in complete support of this cause, especially when these articles report upon publicly or philanthropically funded research. Nonetheless, this feels narrow, quite modest. Text mining across a large swath of PDFs would yield useful insights, for sure. But this is not ‘data mining’ in the maximal sense of analyzing every aspect of a scholarly endeavor, even those that cannot easily be captured in print.”

Banks does note that a cautious approach to such fundamental change is warranted, citing the development of the data paper in 2011 as an example. He also mentions Scholarly HTML, a project that hopes to evolve into a formal W3C standard, and the Content Mine, a project aiming to glean 100 million facts from published research papers. The sky is the limit, Banks indicates, when it comes to Web-powered scholarship.

 

Cynthia Murrell, July 21, 2016


Coveo Changes Its Positioning

July 20, 2016

Short honk: Coveo, the Canadian enterprise search outfit, has changed its positioning. I should probably say “added to” its positioning as an information retrieval vendor. “Montreal Opening for Big Data Search Firm Coveo” reports that the company has a new office in Montréal. What I noticed was the description of Coveo as a “big data search firm.” The company has been describing itself as a customer support solution and a vendor of unified search. But Big Data is a thing, so it makes sense that an information processing outfit would embrace the moniker. The write up reports that a Coveo wizard said:

We have an amazing pipeline of cloud solutions, and the integration of machine learning, artificial intelligence and data-driven personalization to our technology creates huge market opportunities. We believe Montreal is the best place for us to build on this momentum and assert our position as market leader.

The write up does not mention if any provincial or national subsidies were provided to Coveo. I am no expert on Canada, but I have heard that incentives, including salary support, have been made available to firms meeting certain criteria.

Stephen E Arnold, July 20, 2016

Recommind Follows BRS, IDI Basis, Fulcrum, and Nstein

July 19, 2016

OpenText is, by golly, one of the outfits which “owns” more search and retrieval technology than any other firm I can name. I read “OpenText Lives Up to Promise, Acquires Recommind.” The write up points out:

Just a week after it announced it was selling off $600 million worth of senior debt notes to fund future acquisitions, OpenText dropped $163 million to acquire Recommind, an e-discovery and information analytics provider.

The write up explains that Recommind “could generate between $70 and $80 million of annualized revenues.” This is a hefty sum for a system which has in my mind been dumped into the Autonomy-type search system pigeon hole. (If anyone is interested, I have a profile of Recommind technology. Write benkent2020 at yahoo dot com for details.) Frankly I was surprised at the modest size of the deal. What would Recommind have been worth if it had added Big Data, advanced analytics, and artificial intelligence to its system? On the other hand, maybe Recommind did exactly that.

Several observations:

  • Search and content processing systems incur significant technological debt. This means that the software system has to be fed regular injections of real cash to work, keep customers happy, and keep pace with the competition.
  • A vendor with multiple systems has to figure out exactly what system to pitch to a potential customer. This is often difficult if the prospect asks such questions as, “What is Nstein’s capability in terms of Recommind’s functions?” Or, “What search system is included with RedDot and what other options are available to install today and use tomorrow?”
  • Portfolio search and content processing vendors are rare birds in today’s corporate jungle. IBM is similar, and its financial performance suggests that having numerous search and content processing arrows in its quiver does not seem to hit the financial bull’s eye.

OpenText, in my view, is a company which may have to make very hard decisions about what technology debt to retire. The interest on that debt, if left unmanaged, could lead to financial headaches.

Stephen E Arnold, July 19, 2016

Elasticsearch API Calls

July 17, 2016

Short honk: Are you a fan of Elasticsearch, the Lucene based open source system giving proprietary vendors of search systems a migraine? If you are, you will want to point your browser at “Elasticsearch-API Info.” The information is presented in a table which lists and annotates Elasticsearch’s APIs from bulk to update. Useful stuff.
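For those who have not poked at the two ends of that list, here is a minimal sketch of what the request bodies for bulk and update look like. It builds the payloads only and sends nothing over the wire. The index name `honks`, the sample documents, and the helper names `build_bulk_body` and `build_update_body` are my own placeholders, not items from the table. Older Elasticsearch releases also expected a document type alongside the index name; the sketch leaves that out, so treat it as an illustration and check the documentation for the version you run.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited JSON body for a _bulk request.

    docs maps a document id to its source dict. Both the index name
    and the documents are hypothetical examples.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: says what to do with the line that follows it.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # A bulk body is terminated by a final newline.
    return "\n".join(lines) + "\n"


def build_update_body(partial_doc):
    """Build the JSON body for a partial-document _update request."""
    return json.dumps({"doc": partial_doc})


bulk_body = build_bulk_body(
    "honks",
    {"1": {"title": "Elassandra"}, "2": {"title": "Elasticsearch API Calls"}},
)
update_body = build_update_body({"title": "Short Honk: Elassandra"})
print(bulk_body, end="")
print(update_body)
```

The bulk body alternates an action line with a source line, which is why it is built as separate JSON strings joined by newlines instead of one JSON array.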

Stephen E Arnold, July 17, 2016

Short Honk: Elassandra

July 16, 2016

Just a factoid. There is now a version of Elasticsearch which is integrated with Cassandra. You can get the code for version 2.1.1-14 via GitHub. Just another example of the diffusion of the Elastic search system.

Stephen E Arnold, July 16, 2016

Google and Song Lyrics

July 13, 2016

I love the results I get for pop stars, TV shows, and binge watching. To feed the curious minds of online researchers, Google has upped the ante. “Google Licenses LyricFind for Search Results” reports that Google has addressed its miserable search systems for the words in tunes. Consider this lyric:

“My wrist deserve a shout out, I’m like ‘what up, wrist?’
My stove deserve a shout out, I’m like ‘what up, stove?’”

According to the write up:

A query for the lyrics to a specific song will pull up the words to much of that song, freeing users from having to click through to another website. Google rolled out the lyrics feature in the U.S. today (June 27), though it has licenses to display the lyrics internationally as well.

I am definitely thrilled. Why worry about the indexing of PowerPoints, PDFs, and other content when I have access to the source of:

I’m that red bull, now let’s fly away.

What’s really flown away? Rag mop.

Stephen E Arnold, July 13, 2016
