The Challenge of Synonyms

April 12, 2015

I am okay with automated text processing systems. The challenge is for software to keep pace with the words and phrases that questionable or bad actors use to communicate. The marketing baloney cranked out by vendors suggests that synonyms are not a problem. I don’t agree. I think that words used to reference a subject can fool smart software and some humans as well. For an example of the challenge, navigate to “The Euphemisms People Use to Pay Their Drug Dealer in Public on Venmo.” The write up presents some of the synonyms for controlled substances; for example:

  • Kale salad thanks
  • Columbia in the 1980s
  • Road trip groceries
  • Sanity 2.0
  • 10 lbs of sugar

The synonym I found interesting was an emoji, which most search and content processing systems cannot “understand.”

image

and

image

Attensity asserts that it can “understand” emojis. Sure, if there is a look up list hard wired to a meaning. What happens if the actor changes the emoji? Like other text processing systems, the smart software may become less adept than the marketers state.
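To make the look up list problem concrete, here is a minimal sketch in Python. It is my own illustration, not Attensity's method or any vendor's code. The table entries, the `flag` function, and the choice of emojis are all hypothetical.

```python
# Hypothetical hard-wired look up list: surface token -> flagged meaning.
# The entries are illustrative assumptions, not taken from any real product.
LOOKUP = {
    "kale salad": "controlled substance",
    "10 lbs of sugar": "controlled substance",
    "\U0001F341": "controlled substance",  # maple leaf emoji
}


def flag(memo: str) -> list[str]:
    """Return the look up keys found in a payment memo (case-insensitive)."""
    text = memo.lower()
    return [key for key in LOOKUP if key in text]


print(flag("Kale salad thanks"))   # phrase is in the list, so it is caught
print(flag("Thanks \U0001F341"))   # emoji is in the list, so it is caught
print(flag("Thanks \U0001F33F"))   # herb emoji is not in the list, so it is missed
```

The third call returns an empty list. The actor swapped one emoji for another, and the hard-wired table has nothing to say until a human updates it.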

But why rain on the hype parade and remind you that search is difficult? Moving on.

Stephen E Arnold, April 12, 2015

Search and Identify a YouTube or Vimeo Tune

April 12, 2015

Need to identify a song used in a YouTube video? “Name That tune on Any YouTube Video with MooMa.sh” explains that now you can perform this search and retrieval task. Navigate to http://www.mooma.sh/. Paste a YouTube, Vimeo, or Dailymotion link into the search box and Moo! That’s the service’s name for search, not mine. There is a video explaining how the service works and a Freshman Comp 101 write up that explains how. I use Samba Pump, for which I paid a fee. MooMa.sh reported:

image

Stephen E Arnold, April 12, 2015

Elastic What: Stretching Understanding to the Snapping Point

April 10, 2015

I love Amazon. I love Elastic as a name for search. I hate confusion. Elasticsearch is now “Elastic.” I get it. But after I read “Amazon Launches New File Storage Service For EC2”, there may be some confusion between Amazon’s use of Elastic, various Amazon “elastic” services, and search. Is Amazon going to embrace the word “elastic” to describe its information retrieval system? Will this cause some confusion with the open source search vendor Elastic? I find it interesting that name confusion is an ever present issue in search. I have mentioned what happens when a company loses control of its name. Examples include Thunderstone (a maker of search and search appliances) and the consumer software with the same name. Smartlogic (indexing software) is now facing encroachment from Smartlogic.io (consulting services). Brainware, now owned by Lexmark, lost control of its brand when distasteful videos appeared with the label Brainware. The brand was blasted with nasty bits. Where is the search oriented Brainware now? Retired, I believe, just as I am.

Little wonder some people have difficulty figuring out which vendor offers what software. Stretch your mind around the challenge of explaining that you want the Amazon elastic and the Elastic elastic. Vendors seem to operate without regard to the need to reduce signal mixing.

Stephen E Arnold, April 10, 2015

A Former Googler Reflects

April 10, 2015

After a year away from Google, blogger and former Googler Tim Bray (now at Amazon) reflects on what he does and does not miss about the company in his post, “Google + 1yr.” Anyone who follows his blog, ongoing, knows Bray has been outspoken about some of his problems with his former employer: First, he really dislikes “highly-overprivileged” Silicon Valley and its surrounds, where Google is based. Second, he found it unsettling to never communicate with the “actual customers paying the bills,” the advertisers.

What does Bray miss about Google? Their advanced bug tracking system tops the list, followed closely by the slick and efficient, highly collaborative internal apps deployment. He was also pretty keen on being paid partially in Google stock between 2010 and 2014. The food on campus is everything it’s cracked up to be, he admits, but as a remote worker, he rarely got to sample it.

It was a passage in Bray’s “neutral” section that most caught my eye, though. He writes:

“The number one popular gripe against Google is that they’re watching everything we do online and using it to monetize us. That one doesn’t bother me in the slightest. The services are free so someone’s gotta pay the rent, and that’s the advertisers.

“Are you worried about Google (or Facebook or Twitter or your telephone company or Microsoft or Amazon) misusing the data they collect? That’s perfectly reasonable. And it’s also a policy problem, nothing to do with technology; the solutions lie in the domains of politics and law.

“I’m actually pretty optimistic that existing legislation and common law might suffice to whack anyone who really went off the rails in this domain.

“Also, I have trouble getting exercised about it when we’re facing a wave of horrible, toxic, pervasive privacy attacks from abusive governments and actual criminals.”

Everything is relative, I suppose. Still, I think it understandable for non-insiders to remain leery about these companies’ data habits. After all, the distinction between “abusive government” and businesses is not always so clear these days.

Cynthia Murrell, April 10, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

 

Enterprise Search and Marketers: Think Endpoint Computing

April 9, 2015

I have to hand it to the mid tier consultants. Just when I thought the baloney about enterprise search had begun to recede, I learned I was wrong. That puts me in my place.

Search is now “endpoint computing.” I know this because I received an email from the incubator-spawned X1 search company. I have tested X1 over the years, and I have come to think neutral thoughts about the company’s administrative options and its interface.

The method of communicating with me was a somewhat dry email that began with the salutation, “Hello.”

image

The email offered me a report by the ever fascinating Gartner Group. The point of the email is that X1 is a cool vendor. That’s nice. Curious I clicked on the link and was redirected to this page:

image

Okay, a lead generating system. I filled out the information and then I received another email. This one was a bit more serious.

The author, an earnest person named “Janice,” wanted to speak with me to discuss my search requirement. Furthermore, the person looked forward to speaking with me about “unified search and discovery for virtual, cloud, and hybrid environments.” X1 was founded in 2003 and has experienced several management changes, which is common in the “unified search and discovery for virtual, cloud, and hybrid environments” market.

What makes X1 cool? To answer the question I had to read the Gartner Report, a task which I know is a chore.

image

The idea is that search is now endpoint computing. Okay. I guess. The report reassures me that the information in the report is not an “exhaustive list of vendors.” That’s good because in the report there are five companies mentioned:

  • Login Consultants, a workspace consultant, but I don’t know what this term means
  • Tanium, a company offering endpoint security and systems management, which strikes me as a consulting outfit
  • X1, a search and retrieval vendor offering desktop search, eDiscovery, and enterprise search
  • Kaviza (a where are they now company which puzzles me), a virtualized desktop outfit now owned by Citrix
  • Framehawk (another where are they outfit), a company in the high definition user experience business (I have no idea what this means). Apparently Citrix does because Citrix also acquired Framehawk.

Quite an eclectic list. I remember when I worked at Ziff Communications in Manhattan. I listened to a group of editors working up a list of top trends over lunch. So much for methodology. The approach produced a somewhat eclectic list which was, in my opinion, of little value. The list was silly. But these were professionals. Who was I?

So the Gartner list is neither exhaustive nor coherent from my point of view.

What’s cool about X1 search as endpoint computing?

According to the mid tier consulting firm’s authors, X1 is cool because:

“Implementing VDI that provides a user experience that’s equal or superior to a distributed PC environment has been a huge challenge for organizations. While much of the innovation in the VDI space over the past few years has been focused on reducing cost and complexity, some vendors, like X1, have concentrated on removing barriers or exceptions that make VDI a compromise rather than a business enabler.” (page 3)

In the context of the firms profiled by Gartner’s “expert,” the explanation of the X1 cool factor baffles me. I am not confused. I just don’t know what Gartner is trying to communicate.

I have several thoughts running through my head:

First, Gartner obviously has a financial model in place that makes it possible for the mid tier consulting firm to crank out analyses that seem to be authoritative. On closer inspection, the terminology and the information provided are not particularly useful. Does Gartner write these for free and allow the “cool” vendors to distribute these analyses for free? Why do I get a copy for free? Hmmm.

Second, there are obviously companies which value the Gartner endorsement even if it is not exactly clear what the message is. These companies—specifically X1—have seized upon the Gartner report as a way to generate leads and sales. I have no problem with that, but sending information that makes sense would appeal to me more than what I perceive as “information free” commentary.

Third, I continue to worry about the chance for meaningful discourse about the relative merits of information retrieval systems. The presentation of vendors in the context of buzzwords does little to convince me of the merits of X1 or the credibility of Gartner Group. I suppose that is why there are blue chip consulting firms and mid tier (azure chip) consulting firms. One good point: Unlike the work of IDC’s Dave Schubmehl, this report was not available on Amazon for $3,500 with my name slapped on as the “author.”

Score one for Gartner’s merrie band.

Stephen E Arnold, April 9, 2015

Twitter Search: Well, Sort Of

April 9, 2015

I read “Updating Trends on Mobile.” I am more interested in more detailed information about Twitter content, users, and tags. General purpose or massified outputs are of little utility in my little world.

I noted this passage:

We’ve been working to make content easier to find over the last several months in places like your home timeline – with recaps and Tweets from within your network – and through efforts like MagicRecs. We’ll continue to make improvements like these in the future.

If you navigate to the Twitter search page and enter a string like “enterprise search”, you will see variants of the term or phrase expressed as Twitter hash tags. The trends displayed reflect what Twitter’s logs suggest is hot. Here’s an example:

image

How many of these trends do you recognize? I knew about iOS 8.3, Apple Watch, and not much else.

Queries for tweets remain a bit problematic for me.

Stephen E Arnold, April 9, 2015

The Cost of a Click Through Bing Ads

April 9, 2015

Wow. As an outsider to the world of marketing, I find these figures rather astounding. MarketingProfs shares an infographic titled, “The 20 Most Expensive Bing Ads Keywords.” The data comes from a recent analysis by WordStream of 10 million English keywords, grouped into categories. Writer Vahe Habeshian tells us:

“WordStream analyzed some 10 million English keywords and grouped the them into categories to determine the most expensive types of keywords (see infographic, below).

“(Also see a similar analysis of the most expensive keywords in Google AdWords advertising from 2011.)

“The most expensive keyword on Bing Ads is ‘lawyer,’ which would cost advertisers seeking the top ad spot a whopping $109.21 per click. Not surprisingly, the top 5 keywords are related to the legal world, indicating how lucrative clients can be.”

Yes, almost $110 per click whether legitimate, a human error, or a robot script. That’s a lot of fruitless clicks. It seems irrational, but it must be working if companies keep spending the dough. Right?

The word in second place, “attorney,” comes to $101.77 per click, and “DUI” is a comparative bargain at $68.56. After the top five law-related words, there are such valuable terms as “annuity,” “rehab,” and “exterminator.” See the infographic for more examples.
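For a sense of scale, here is a back-of-the-envelope sketch in Python. The per-click prices are the ones quoted above. The click count and the share of fruitless clicks are made-up assumptions for illustration, not figures from WordStream or MarketingProfs, and the `spend` function is a hypothetical helper.

```python
# Per-click prices from the article, stored in cents to avoid rounding drift.
CPC_CENTS = {"lawyer": 10921, "attorney": 10177, "DUI": 6856}


def spend(keyword: str, clicks: int, wasted_percent: int) -> tuple[float, float]:
    """Return (total dollars spent, dollars spent on fruitless clicks).

    clicks and wasted_percent are assumed inputs, not reported data.
    """
    total_cents = CPC_CENTS[keyword] * clicks
    wasted_cents = total_cents * wasted_percent // 100
    return total_cents / 100, wasted_cents / 100


# Assumption: 100 clicks on "lawyer", a quarter of them errors or robot scripts.
total, wasted = spend("lawyer", clicks=100, wasted_percent=25)
print(f"Total: ${total:,.2f}  Fruitless: ${wasted:,.2f}")
```

Under those assumed numbers, 100 clicks cost $10,921.00, and $2,730.25 of that buys nothing.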

Cynthia Murrell, April 09, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Progress in Image Search Tech

April 8, 2015

Anyone interested in the mechanics behind image search should check out the description of PicSeer: Search Into Images from YangSky. The product write-up goes into surprising detail about what sets their “cognitive & semantic image search engine” apart, complete with comparative illustrations. The page’s translation seems to have been done either quickly or by machine, but don’t let the awkward wording in places put you off; there’s good information here. The text describes the competition’s approach:

“Today, the image searching experiences of all major commercial image search engines are embarrassing. This is because these image search engines are

  1. Using non-image correlations such as the image file names and the texts in the vicinity of the images to guess what are the images all about;
  2. Using low-level features, such as colors, textures and primary shapes, of image to make content-based indexing/retrievals.”

With the first approach, they note, trying to narrow the search terms is inefficient because the software is looking at metadata instead of inspecting the actual image; any narrowed search excludes many relevant entries. The second approach simply does not consider enough information about images to return the most relevant, and only the most relevant, results. The write-up goes on to explain what makes their product different, using for their example an endearing image of a smiling young boy:

“How can PicSeer have this kind of understanding towards images? The Physical Linguistic Vision Technologies have can represent cognitive features into nouns and verbs called computational nouns and computational verbs, respectively. In this case, the image of the boy is represented as a computational noun ‘boy’ and the facial expression of the boy is represented by a computational verb ‘smile’. All these steps are done by the computer itself automatically.”

See the write-up for many more details, including examples of how Google handles the “boy smiles” query. (Be warned: there’s a very brief section about porn filtering that includes a couple of censored screenshots and adult keyword examples.) It looks like image search technology is progressing apace.

Cynthia Murrell, April 08, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Google Altered Search Results?!  

April 8, 2015

If you know anything about search results, search engine optimization, and search algorithms, you have probably wondered whether Google ever changed its search results to favor one result over another. Google already alters results with Google AdWords, the Right to Be Forgotten, and the removal of results that break its rules.

The FTC revealed via The Wall Street Journal that Google has been altering its search results for profit: “Inside The US Antitrust Probe Of Google.” The FTC found that Google was using its monopoly on search to harm Internet users and its rivals. FTC staff recommended that a lawsuit be brought against Google over three of its practices. The FTC voted to end the investigation in 2013, which is strange, but it did so because it had competing recommendations.

Google continues to maintain its innocence, noting that the case closed two years ago and that people continue to use its services. The Wall Street Journal points out one big thing:

“On one issue—whether Google used anticompetitive tactics for its search engine—the competition staff recommended against a lawsuit, although it said Google’s actions resulted in “significant harm” to rivals. In three other areas, the report found evidence the company used its monopoly behavior to help its own business and hurt its rivals.”

Can this be considered part of their “don’t be evil” bylaw?

Whitney Grace, April 8, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Cyber Wizards Speak Publishes Exclusive BrightPlanet Interview with William Bushee

April 7, 2015

Cyber OSINT continues to reshape information access. Traditional keyword search has been supplanted by higher value functions. One of the keystones for systems that push “beyond search” is technology patented and commercialized by BrightPlanet.

A search on Google often returns irrelevant or stale results. How can an organization obtain access to current, in-depth information from Web sites and services not comprehensively indexed by Bing, Google, ISeek, or Yandex?

The answer to the question is to turn to the leader in content harvesting, BrightPlanet. The company was one of the first, if not the first, to develop systems and methods for indexing information ignored by Web indexes which follow links. Founded in 2001, BrightPlanet has emerged as a content processing firm able to make accessible structured and unstructured data ignored, skipped, or not indexed by Bing, Google, and Yandex.

In the BrightPlanet seminar open to law enforcement, intelligence, and security professionals, BrightPlanet said the phrase “Deep Web” is catchy but it does not explain what type of information is available to a person with a Web browser. A familiar example is querying a dynamic database, like an airline for its flight schedule. Other types of “Deep Web” content may require the user to register. Once logged into the system, users can query the content available to a registered user. A service like Bitpipe requires registration and a user name and password each time I want to pull a white paper from the Bitpipe system. BrightPlanet can handle both types of indexing tasks and many more. BrightPlanet’s technology is used by governmental agencies, businesses, and service firms to gather information pertinent to people, places, events, and other topics.

In an exclusive interview, William Bushee, the chief executive officer at BrightPlanet, reveals the origins of the BrightPlanet approach. He told Cyber Wizards Speak:

I developed our initial harvest engine. At the time, little work was being done around harvesting. We filed for a number of US Patents applications for our unique systems and methods. We were awarded eight, primarily around the ability to conduct Deep Web harvesting, a term BrightPlanet coined.

The BrightPlanet system is available as a cloud service. Bushee noted:

We have migrated from an on-site license model to a SaaS [software as a service] model. However, the biggest change came after realizing we could not put our customers in charge of conducting their own harvests. We thought we could build the tools and train the customers, but it just didn’t work well at all. We now harvest content on our customers’ behalf for virtually all projects and it has made a huge difference in data quality. And, as I mentioned, we provide supporting engineering and technical services to our clients as required. Underneath, however, we are the same sharply focused, customer centric, technology operation.

The company also offers data as a service. Bushee explained:

We’ve seen many of our customers use our Data-as-a-Service model to increase revenue and customer share by adding new datasets to their current products and service offerings. These additional datasets develop new revenue streams for our customers and allow them to stay competitive maintaining existing customers and gaining new ones altogether. Our Data-as-a-Service offering saves time and money because our customers no longer have to invest development hours into maintaining data harvesting and collection projects internally. Instead, they can access our harvesting technology completely as a service.

The company has accelerated its growth through a partnering program. Bushee stated:

We have partnered with K2 Intelligence to offer a full end-to-end service to financial institutions, combining our harvest and enrichment services with additional analytic engines and K2’s existing team of analysts. Our product offering will be a service monitoring various Deep Web and Dark Web content enriched with other internal data to provide a complete early warning system for institutions.

BrightPlanet has emerged as an excellent resource to specialized content services. In addition to providing a client-defined collection of information, the firm can provide custom-tailored solutions to special content needs involving the Deep Web and specialized content services. The company has an excellent reputation among law enforcement, intelligence, and security professionals. The BrightPlanet technologies can generate a stream of real-time content to individuals, work groups, or other automated systems.

BrightPlanet has offices in Washington, DC, and can be contacted via the BrightPlanet Web site at www.brightplanet.com.

The complete interview is available at the Cyber Wizards Speak web site at www.xenky.com/brightplanet.

Stephen E Arnold, April 7, 2015

Blog: www.arnoldit.com/wordpress Frozen site: www.arnoldit.com Current site: www.xenky.com

 
