Smartlogic: A Buzzword Blizzard

August 2, 2017

I read “Semantic Enhancement Server.” Interesting stuff. The technology struck me as a cross between indexing, good old enterprise search, and assorted other technologies. Individuals who are shopping for an automatic indexing system (either one built on expensive, time-consuming, hand-coded rules or a more Autonomy-like automatic approach) will want to kick the tires of the Smartlogic system. In addition to the echoes of the SchemaLogic approach, I noted a Thompson submachine gun firing buzzwords; for example:

best bets (I’m feeling lucky?)
dynamic summaries (like Island Software’s approach in the 1990s)
faceted search (hello, Endeca?)
model
navigator (like the Siderean “navigator”?)
real time
related topics (clustering like Vivisimo’s)
semantic (of course)
taxonomy
topic maps
topic pages (a Google report as described in US29970198481)
topic path browser (aka breadcrumbs?)
visualization

What struck me after I compiled this list about a system that “drives exceptional user search experiences” was that Smartlogic is repeating the marketing approach of traditional vendors of enterprise search. The marketing lingo and “one size fits all” triggered thoughts of Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others.

I asked myself:

Is it possible for one company’s software to perform such a remarkable array of functions in a way that is easy to implement, affordable, and scalable? There are industrial-strength systems that perform many of these functions. Examples range from BAE’s intelligence system to the Palantir Gotham platform.

My hypothesis is that Smartlogic might struggle to process a real time flow of WhatsApp messages, YouTube content, and mobile phone intercept voice calls. Toss in the multi-language content that is becoming increasingly important to enterprises, and the notional balloon I am floating says, “Generating buzzwords and associated overinflated expectations is really easy. Delivering high accuracy, affordable, and scalable content processing is a bit more difficult.”

Perhaps Smartlogic has cracked the content processing equivalent of the Voynich manuscript.


Will buzzwords crack the Voynich manuscript’s inscrutable text? What if Voynich is a fake? How will modern content processing systems deal with this type of content? Running some content processing tests might provide some insight into systems which possess Watson-esque capabilities.

What happened to those vendors like Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others? (Free profiles of these companies are available at www.xenky.com/vendor-profiles.) Oh, that’s right. The reality of the marketplace did not match the companies’ assertions about their technology. Investors in and licensees of some of these systems survived the buzzword blizzard. Some became the digital equivalent of Ötzi, the 5,300-year-old iceman.

Stephen E Arnold, August 2, 2017

Palantir Technologies: Recycling Day Old Hash

July 31, 2017

I read “Palantir: The Special Ops Tech Giant That Wields As Much Real World Power as Google.” I noticed these hot buttons in the headline:

“Special ops” for the SEAL Team 6 vibe. Check.

“Wields” for the notion of great power. Check.

“Real world.” A reminder of the here and now, not an airy fairy digital wonkiness. Check.

“Google.” Yes. Palantir as potent as the ad giant Google. Check.

That’s quite a headline.

The write up itself is another journalistic exposé of software that ingests digital information and outputs maps, reports, and visualizations. Humans index too. As with the i2 Analyst’s Notebook, the “magic” is mostly external. Making these Fancy Dan software systems work requires computers, of course. Humans are needed too. Trained humans are quite important; essential, in fact.

The Guardian story seems to be a book review presented as a Gladwell-like revisionist anecdote. See, for example, Done: The Secret Deals That Are Changing Our World by Jacques Peretti (Hodder & Stoughton, £20). You can buy a copy from bookshop.theguardian.com. (Online ad? Maybe?)

I read the Palantir story, which stuffed my Talkwalker alert with references to the article. Quite a few bloggers are recycling the Guardian newspaper story. BuzzFeed’s coverage of the Palo Alto company evoked the same reaction. I will come back to the gaps in these analyses in a moment.

The main point of the Guardian’s July 30, 2017, story strikes me as:

Palantir tracks everyone from potential terrorist suspects to corporate fraudsters…child traffickers, and what they refer to as subversives. But it’s all done using prediction.

Right. Everyone! Potential terrorist suspects! And my favorite, “all.” Using “prediction,” no less.

Sounds scary. I am not sure the platforms work with the type of reliability that the word “all” suggests. But this is about selling books, not about the functionality, statistical methods, or magical content processing of Palantir and similar companies. Confusing Hollywood with reality is easy today, at least for some folks.

Palantir licenses software to organizations. Palantir is an “it,” not a “they.” The company uses the lingo of its customers. “Subversives” is one term, but in my opinion it is more suggestive than “bad actor,” “criminal,” “suspect,” or “terrorist.” I think the word “tracks” is pivotal. Palantir’s professionals, like Pathfinder, look at deer tracks and nail the beastie. I want to point out that “prediction” (partly the Bayesian, Monte Carlo, and Markovian methods pioneered by Autonomy in the mid-1990s) is indeed used for certain processes. What’s omitted is that Palantir is just one company in the content processing and search and retrieval game. I am not convinced that its systems and methods are the best ones available today. (Check out Recorded Future, a Google and In-Q-Tel funded company, for some big league methods.) And there are others. In my CyberOSINT book and my Dark Web Notebook, I identify about two dozen companies providing similar services. Palantir is one admittedly high-profile example of the next generation information access providers.

The write up does reveal at the end of the article that the Guardian is selling Jacques Peretti’s book. That’s okay. What’s operating under the radar is an article that seems to be one thing but is, in the real world, a nifty book promotion.

In closing, the information presented in the write up struck me as a trifle stale. I am okay with collections of information that have been assembled to make it easy for a reader to get the gist of a system quickly. My Dark Web Notebook is a CliffsNotes-style guide to what one Tor executive suggests does not exist.

When I read about Palantir, I look for information about:

  • Technical innovations within Gotham and Palantir’s other “products”
  • Details about the legal dust up between i2 and Palantir regarding file formats, an issue that has some here and now relevance to the New York Police Department’s Palantir experience
  • Interface methods which are designed to make it easier to perform certain data analysis functions
  • Specifics about the data loading, file conversion, and pre-processing index tasks and how these impact timeliness of the information in the systems
  • Issues regarding data reconciliation when local installs lose contact with cloud resources within a unit and across units
  • Financial performance of the company as it relates to stock held by stakeholders and those who want the company to pursue an initial public offering
  • Specific differences between Palantir Gotham and the systems on offer from BAE, Textron, and others

Each time I read about Palantir, these particular items seem to be ignored. Perhaps these are not sufficiently sexy, or maybe getting the information is a great deal of work. The words “hash” and “rehash” come to mind as one way to create something that seems filling but may be empty calories. Perhaps a “real journalist” will tackle some of the dot points. That would be more interesting than a stale reference to special effects in a star vehicle.

NB. I was an adviser to i2 Group Ltd., the outfit that created the Analyst’s Notebook.

Stephen E Arnold, July 31, 2017

Helpful Search Operators for Google Users

July 31, 2017

We have found a resource that can help readers google like never before: GoogleGuide’s article is titled simply, “Search Operators.” Unsatisfied with the information she found at Google’s website, mathematician and search enthusiast Nancy Blachman started GoogleGuide to enlighten us all on advanced Google Search methods. In “Search Operators,” she and colleague Jerry Peek educate us on one exacting approach. They write:

The following is an alphabetical list of the search operators. This list includes operators that are not officially supported by Google and not listed in Google’s online help. Note: Google may change how undocumented operators work or may eliminate them completely. Each entry typically includes the syntax, the capabilities, and an example.

The article leads with a table listing the search operators next to the relevant Google service: Web search, image search, groups, etc., which can be cross-referenced with the alphabetical list. Operator functions include useful tasks like searching for specific pages by title, discovering who has linked to a certain website and restricting searches by file type. The team even concludes with a set of exercises for practice with the operators. Check it out to make your internet searches even more efficient.
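For readers who prefer to script their queries, here is a minimal sketch of how a few of the operators might be combined into one query string. It assumes the `intitle:`, `site:`, and `filetype:` syntax GoogleGuide describes still behaves as documented. As the guide warns, Google may change or drop operators. The helper names `build_google_query` and `google_search_url` are hypothetical, and the code only builds a URL; it does not call any Google API.

```python
from urllib.parse import urlencode


def build_google_query(terms, intitle=None, site=None, filetype=None):
    """Compose a query string from plain terms plus optional operators (hypothetical helper)."""
    parts = list(terms)
    if intitle:
        parts.append(f"intitle:{intitle}")  # pages with this word in the title
    if site:
        parts.append(f"site:{site}")  # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict by file type, e.g. pdf
    return " ".join(parts)


def google_search_url(query):
    """Return a Google web search URL with the query URL-encoded in the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_google_query(["enterprise", "search"], intitle="taxonomy", filetype="pdf")
print(query)
# enterprise search intitle:taxonomy filetype:pdf
print(google_search_url(query))
# https://www.google.com/search?q=enterprise+search+intitle%3Ataxonomy+filetype%3Apdf
```

Pasting the printed query into the search box should give the same result as following the URL. The script just saves typing.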

Cynthia Murrell, July 31, 2017

The EU Takes on Google in Landmark Case

July 31, 2017

It seems like Google is everyone’s favorite punching bag, especially when it comes to antitrust and monopoly concerns. Recently, the EU has decided to take on the behemoth with a series of crushing fines.

One allegation is that Google favors its own shopping service when users search for products. The EU claims this violates antitrust law, although no such case has ever been tried.

Fortune.com explains why this is such a momentous case:

The legal battles will also provide helpful markers for the fast-moving tech industry and regulators struggling to impose old rules on new markets and dominant social platforms, said economist, Georgios Petropoulos. ‘We need some decisions on what is good and what is bad. All these will provide more clarity on how this market works,’ said Collyer Bristow lawyer, Stephen Critchley.

Surely, the US and others are keeping a close eye on how these cases unfold. Could Google, as it is known today, be forced to drastically change how it operates? What might this mean for other search engines?

Catherine Lamsfuss, July 31, 2017

A Wonky Analysis of Search Today: The SEO Wizard View

July 24, 2017

I read what one of my goslings described as a “wonky” discussion of search. You will have to judge for yourself, gentle reader. In an era of fake news, I am not sure what to make of a semi-factual, incomplete write up titled “How Search Reveals the World.” Search does not reveal “the world”; search provides some (note the word “some”) useful information about the behaviors of individuals who run queries or make use of systems like the oh, so friendly Amazon Alexa.

I learned that there are three types of search, and I have to tell you that these points were not particularly original. Here they are:

  • Navigational search queries. Don’t think about Endeca’s “guided navigation.” Think about Google Maps, which is going to morph into a publishing platform. That fact is not included in the write up that ruffled my gosling’s feathers.
  • Information search queries. Ah, now we’re talking. A human types 2.4 words in a search box and feels lucky or just looks at the first few hits on the first search page. Could these hits be ads unrelated or loosely related to the user’s query? Sure, absolutely.
  • Transactional search queries. I am not sure what the phrase “transactional search queries” means, but that’s not too surprising. The confusion rests with me when I think of looking for a product like a USB-C plug on Amazon versus navigating to my bank’s fine, fine Web site and using a fine, fine interface to move money from Point A to Point B. Close enough for horseshoes.
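To make the three buckets concrete, here is a toy sketch that sorts queries into them using keyword cues. The cue lists and the `classify_query` function are hypothetical illustrations, not anything taken from the write up. Real systems would rely on far richer signals, such as click and session behavior.

```python
# Hypothetical cue lists; a real system would learn intent from behavioral data.
TRANSACTIONAL_CUES = {"buy", "order", "price", "download", "transfer", "pay"}
NAVIGATIONAL_CUES = {"login", "homepage", "website", "directions", "near"}


def classify_query(query: str) -> str:
    """Assign a query to one of three coarse intent buckets using keyword cues."""
    words = set(query.lower().split())
    if words & TRANSACTIONAL_CUES:
        return "transactional"  # the user wants to do something
    if words & NAVIGATIONAL_CUES or any(w.endswith((".com", ".org")) for w in words):
        return "navigational"  # the user wants to get somewhere
    return "informational"  # default: the user wants to learn something


for q in ["buy usb-c plug", "mybank.com login", "how search reveals the world"]:
    print(q, "->", classify_query(q))
# buy usb-c plug -> transactional
# mybank.com login -> navigational
# how search reveals the world -> informational
```

The sketch also shows why the categories blur. A query like "transfer money mybank.com" matches both transactional and navigational cues, and the order of the `if` tests silently decides which label wins.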


Skimming the surface is good for seaplanes but not a plus for an analysis of search and retrieval.

But the most egregious argument in the write up is that search becomes little more than a rather clumsy manipulative tool for “marketers, advertisers, and business owners.” Why clumsy? The write up is happily silent about Facebook’s alleged gaming of its system for various purposes. Filtering hate speech, for example, seems admirable until someone has to define “hate speech.” Filtering live streaming of a suicide or crime in progress is a bit more problematic. But search is a sissy compared with the alleged Facebook methods. With marketers looking to make a buck, Facebook seems to slip the papier-mâché noose of the write up’s argument.

But there is a far larger omission. One of the most important types of search is “pervasive, predictive search.” The idea is a nifty one. Using various “signals” a system presents information automatically to a user who is online and looking at an output. No specific action on the part of the user is required. The user sees what he or she presumably wants. Search without search! The marketer’s Holy Grail.
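The core mechanic can be sketched in a few lines: score candidate items against whatever signals the system holds about the user, then show the top few without waiting for a query. The signal names, weights, and items below are hypothetical, and a simple weighted sum stands in for whatever ranking a real provider uses.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    signals: dict  # signal name -> strength in [0, 1]


# Hypothetical signal weights; a production system would tune or learn these.
WEIGHTS = {"location_match": 0.5, "recent_interest": 0.3, "time_of_day": 0.2}


def score(item: Item) -> float:
    """Weighted sum of an item's signals; missing signals count as zero."""
    return sum(w * item.signals.get(name, 0.0) for name, w in WEIGHTS.items())


def push_without_query(candidates, k=2):
    """Return the k highest-scoring items: results shown before the user types anything."""
    return sorted(candidates, key=score, reverse=True)[:k]


candidates = [
    Item("USB-C plug deals", {"recent_interest": 0.9}),
    Item("Coffee shop nearby", {"location_match": 1.0, "time_of_day": 0.8}),
    Item("Train delay alert", {"location_match": 0.6, "time_of_day": 1.0}),
]
print([item.title for item in push_without_query(candidates)])
# ['Coffee shop nearby', 'Train delay alert']
```

The user never typed a word. Whoever sets the weights decides what the user "wants," and that is exactly why marketers covet this approach.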

There are some important components of this type of search.

Perhaps an SEO expert will explain them instead of recycling old information and failing to define 33 percent of the bedrock statements. But that may be a bridge too far for those who would try to manipulate the systems and methods of some of the providers of free, ad-supported search systems. The longest journey begins with a single step. Didn’t an SEO expert say that too?

Stephen E Arnold, July 24, 2017

Ambercite: A Patent Similarity Service

July 20, 2017

We learned about an Australian start up called Ambercite. The company’s service helps those who want an answer to a question like this:

What patents are similar to US7593939?

Most of the online patent search systems do not deliver quick, comprehensive similarity results. When I have to think about patent similarity, I have found that several services have to be consulted and then some old-fashioned, billable time must be generously applied. Ambercite wants to change this approach to one powered by a more practical system. The company says:

Ambercite can help you quickly find patents and commercial opportunities, in many cases, missed by others, with its tools and services.
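Ambercite does not explain here how it computes similarity. One simple way to rank patents, assumed purely for illustration, is to compare the sets of documents each patent cites, using Jaccard overlap. The identifiers and citation sets below are made up, and the helpers `jaccard` and `most_similar` are hypothetical. This is not Ambercite’s method.

```python
def jaccard(a: set, b: set) -> float:
    """Share of citations two patents have in common (0 = none, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target: str, citations: dict, top_n: int = 3):
    """Rank the other patents by citation overlap with the target patent."""
    scores = [
        (other, jaccard(citations[target], cited))
        for other, cited in citations.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy citation sets with made-up identifiers (not real patent data).
citations = {
    "A": {"X1", "X2", "X3"},
    "B": {"X2", "X3", "X4"},
    "C": {"X5"},
    "D": {"X1", "X2", "X3", "X6"},
}
print(most_similar("A", citations))
# [('D', 0.75), ('B', 0.5), ('C', 0.0)]
```

Even a crude overlap score surfaces candidates quickly. The billable human time then goes to judging whether the top hits are truly similar.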

For more information about the firm, point your browser to the Ambercite Web site. Worth watching.

Stephen E Arnold, July 20, 2017

Information about Dark Web Notebook

July 11, 2017

An email arrived yesterday saying, “We can’t find the Dark Web Notebook on Bing, Google, or any other online search system.” If you want to locate information about this new book, just navigate to Google and search for

Arnold Dark Web Notebook

Alternatively, you can use these links:

Buy the book: https://gum.co/darkweb

Table of contents: http://www.xenky.com/darkwebnotebook

The Association of Former Intelligence Officers has a profile of the book on its members-only Web site. Log in to obtain access to the book synopsis.

Kenny Toth, July 11, 2017

 

Video Search

July 11, 2017

Why do we not have better video search yet? Searching for a video online still requires old-school hunting around. Take your quest beyond the familiar YouTube with the MakeUseOf piece, “10 Video Sites that Are Better than YouTube.” Writer Kayla Matthews recommends Vimeo, Metacafe, Veoh, the Internet Archive, Crackle, Screen Junkies, MySpace (it still exists!), The Open Video Project, GAG, and TED (yes, as in TED Talks). Some of these are more specialized than others; see the article for details. I’m happy to see the valuable Internet Archive on this list, about which Matthews writes:

As its name suggests, Internet Archive is a web-based library of all sorts of free content, including books, music, software, and, of course, movies. Just as you might associate a physical library with doing research, one of the strengths of the Internet Archive’s video content is its vast collection of historical content. While it does also have some newer content, some of its best videos are older and obscure news reports, TV series, and movies that are typically harder to find on other sites. Like many other sites, users can also upload videos to the Internet Archive.

Meanwhile, TechCrunch looks at the recently introduced search functionality from Snapchat in, “Trying Out Snapchat’s New Universal Search Capabilities.” Reporter Anthony Ha supplies a demonstrative video, but it seems the tool is pretty straightforward. Is it an effort to address a noted weakness ahead of Snap Inc.’s much-anticipated IPO? Perhaps, but whatever the reason, it is a bit of progress in the realm of video search.

Cynthia Murrell, July 11, 2017

Bing Introduces an Image Feed

June 30, 2017

Here’s a short write-up about a notable addition to Bing. OnMSFT reports, “Bing Image Search Updated with Image Feed, Taking on Pinterest.” After noting that the Tools menu has been renamed “Filter” and moved to the right of the screen, writer Jack Wilkinson explains:

A new feature has also appeared, known as Image Feed, which replaces where Tools originally used to be placed. Image Feed allows you to choose a feed of images…. When selecting an image feed to look at, it allows you to follow it as an ‘interest’, so that you can see new images in a feed. Your personalised image feed can be accessed here. By the looks of it, it appears as though Bing’s new image feed is taking a hit at Pinterest – bringing all the images you could want into one place via a feed, in similar fashion to Pinterest.

Yes, this could certainly replace Pinterest for many users, especially ones who already frequent Bing. I have noticed that the refine-by-keyword list at the top of Google’s image results page is formatted much like the one on my Pinterest account. Will Google, still the number one search platform by far, also implement a Pinterest-like image feed? Stay tuned.

Cynthia Murrell, June 30, 2017

Legal Media Search Site Baits Pirates with Keywords

June 26, 2017

How do you attract a (media) pirate? Apparently, with targeted keywords. TorrentFreak reports, “Film Industry’s Latest Search Engine Draws Traffic with ‘Pirate’ Keywords.” The search engine Film.nl, which appears to be a Dutch answer to Hollywood’s legal-content-finder WhereToWatch, returns legal content. However, it has peppered its descriptions with keywords associated with pirated content. For example: “Don’t Wrestle With Nasty Torrents. Ignore the Rogue One: A Star Wars Story torrent.” Intriguing tactic. Reporter Ernesto writes:

Those who scroll down long enough will notice that each page has a targeted message for pirates as well. The descriptions come in a few variations but all mention prominent keywords such as ‘torrents’ and reference ‘illegal downloading’ and unauthorized streaming. …

 

While the piracy related messaging is unusual, it’s actually quite clever. Since a lot of people are searching for ‘torrent,’ ‘streaming’ and ‘download’ related terms combined with movie and TV-show titles, it helps to keep search traffic away from pirate sites. In other words, it’s a smart search engine optimization trick, helping it to directly compete with pirate sites on this front. The big question is whether people who search for ‘Movie X torrent’ will be satisfied with the results Film.nl offers. That said, from a movie industry perspective, it definitely beats doing nothing at all.

Does it? When prospective viewers learn their desired content is not yet legally available, we suspect most will simply navigate away to more shady destinations. Will a significant number be persuaded to wait for the legal version by Film.nl’s combination of keyword bait and moralizing? I doubt it. But it is an interesting play to note.

Cynthia Murrell, June 26, 2017
