FTC and Google: Never Complain, Never Explain Usually

March 26, 2015

I read “FTC Addresses Its Choice Not to Sue Google.” The write up reports that the FTC is explaining its decision not to chase Google around the conference table. Heck, would that tire out the Googlers, making it tough to stay awake in a White House meeting?

According to the write up:

“All five Commissioners (three Democrats and two Republicans) agreed that there was no legal basis for action with respect to the main focus of the investigation — search,” the statement released on Wednesday read. “The Commission’s decision on the search allegations was in accord with the recommendations of the F.T.C.’s Bureau of Competition, Bureau of Economics, and Office of General Counsel.”

I think this means, “No problemo.”

I also found this statement about the FTC’s expertise in information governance interesting:

In the final paragraph of the commissioners’ statement, the agency once more expressed regret at the inadvertent release of its internal document. “We are taking additional steps to ensure that such a disclosure does not occur in the future,” it said.

That’s good. The future. Many search vendors point out that the functions their marketers say are available today are really available in the “future.” Is this a characteristic of our digital era?

Stephen E Arnold, March 26, 2015

Big Data and Their Interesting Processes

March 25, 2015

I love it when mid tier consultants wax enthusiastic about Big Data. Search your data lake, enjoins one clueless marketer. Big Data is the future, sings a self-appointed expert. Yikes.

To get a glimpse of exactly what has to be done to process certain types of Big Data in an economical yet timely manner, I suggest you read “Analytics on the Cheap.” The author is 0X74696D. Get it?

The write up explains the procedures required to crunch data and manage the budget. The workflow I found interesting is:

  • Incoming message passes through our CDN to pick up geolocation headers
  • Message has its session authenticated (this happens at our routing layer in Nginx/OpenResty)
  • Message is routed to an ingest server
  • Ingest server transforms message and headers into a single character-delimited querystring value
  • Ingest server makes an HTTP GET to a 0-byte file on S3 with that querystring
  • The bucket on S3 has S3 logging turned on.
  • We ingest the S3 logs directly into Redshift on a daily basis.
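The transform and beacon steps in the list above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author’s code: the pipe delimiter, the query parameter name `e`, the header name `X-Geo-Country`, the field names, the object key `t.gif`, and the bucket host are all hypothetical. The trick it illustrates is that a GET against a 0-byte object costs almost nothing, while S3’s access log records the full request URI, so the querystring itself becomes the event record that is later loaded into Redshift.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical delimiter; the write up says only "character-delimited".
DELIM = "|"

def encode_event(message: dict, headers: dict,
                 fields: list[str], header_names: list[str]) -> str:
    """Flatten selected message fields and headers into one delimited value."""
    values = [str(message.get(f, "")) for f in fields]
    values += [str(headers.get(h, "")) for h in header_names]
    # Drop any embedded delimiters so a value cannot shift the columns.
    return DELIM.join(v.replace(DELIM, "") for v in values)

def beacon_url(bucket_host: str, key: str, payload: str) -> str:
    """URL for the GET against a 0-byte object; the access log keeps the query.

    The parameter name "e" is a hypothetical choice for this sketch.
    """
    return f"https://{bucket_host}/{key}?" + urlencode({"e": payload})

def decode_request_uri(request_uri: str) -> list[str]:
    """Recover the columns from the Request-URI field of an access log line."""
    query = urlsplit(request_uri).query
    payload = parse_qs(query).get("e", [""])[0]
    return payload.split(DELIM)

if __name__ == "__main__":
    # "X-Geo-Country" stands in for whatever geolocation header the CDN adds.
    payload = encode_event({"event": "click", "user": 42},
                           {"X-Geo-Country": "US"},
                           ["event", "user"], ["X-Geo-Country"])
    print(beacon_url("example-bucket.s3.amazonaws.com", "t.gif", payload))
    # https://example-bucket.s3.amazonaws.com/t.gif?e=click%7C42%7CUS
    print(decode_request_uri("/t.gif?e=click%7C42%7CUS"))  # ['click', '42', 'US']
```

The delimiter is percent-encoded on the way out and decoded on the way back, so the daily load job only has to split one string per log line.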

The write up then provides code snippets and some business commentary. The author also identifies the upside of the approach used.

Why is this important? It is easy to talk about Big Data. Looking at what is required to make use of Big Data reveals the complexity of the task.

Keep this hype versus real world split in mind the next time you listen to a search vendor yak about Big Data.

Stephen E Arnold, March 25, 2015

Relaxing a Query: PostgreSQL Style

March 22, 2015

If you are a user of PostgreSQL and want to implement fuzzy, relaxed, or “show ‘em something sort of close to the user’s query,” you will want to read “Super Fuzzy Searching on PostgreSQL.” Fuzzy search makes it possible to show useful results to a user who is not quite sure how terms appear in an index. Fuzzy is not exactly like “close” in horseshoes. More algorithmic magic is at play in information retrieval systems.

The article explains PostgreSQL fuzzy capabilities and launches into the notion of trigrams. Keep in mind that Manning & Napier (creators of DR LINK) possess some n-gram patents. The old Brainware (which may have once been SER) also possesses some n-gram type patents. I recall hearing years ago that Brainware developed a trigram search system which worked reasonably well when looking for similar patent claims. Brainware is now part of a printer company, and I have lost track of the search technology. I suppose I could investigate the Brainware/Lexmark status, but I have other tasks beckoning my attention.

The write up explains how to implement trigrams for PostgreSQL. The code examples are useful and the tips for dealing with large datasets are quite helpful. The author does not mention the n-gram related patents. I assume that the author assumes that the patent holders assume no one is infringing. That is a triple assumption set. int ere sti ngt rig ram coi nci den ce_
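For readers who want to see what a trigram comparison actually computes, here is a minimal Python sketch modeled on the scoring that PostgreSQL’s pg_trgm extension documents for its similarity() function. Each word is lowercased and padded with two spaces in front and one behind, its three-character substrings go into a set, and similarity is the count of shared trigrams divided by the count of distinct trigrams across both strings. It is an approximation, not pg_trgm’s code; the regular expression that decides what counts as a word is an assumption, and pg_trgm’s real rules depend on the database locale.

```python
import re

def trigrams(text: str) -> set[str]:
    """Trigram set in the pg_trgm style: per word, lowercased, padded."""
    grams: set[str] = set()
    # Assumption: a word is a run of ASCII letters and digits.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (a Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

if __name__ == "__main__":
    print(sorted(trigrams("cat")))                    # ['  c', ' ca', 'at ', 'cat']
    print(round(similarity("word", "two words"), 6))  # 0.363636
```

Inside PostgreSQL the same comparison is available through pg_trgm’s similarity() function and its % operator, and a GIN or GiST index built with the gin_trgm_ops or gist_trgm_ops operator class is what keeps such queries usable on large tables.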

Stephen E Arnold, March 22, 2015

Adobe: A Document Cloud Looms

March 19, 2015

Adobe is moving from PDF creation to document management. I avoid Adobe Acrobat because it bedeviled me years ago with a PDF dongle. The dongle had a counter. After we created the number of documents authorized by the dongle, the opportunity to purchase another dongle arose. Exciting. That warned me off the outfit.

I brushed against Adobe when I researched the original Enterprise Search Report in 2003. That was a mere 12 years ago, yet the memory is still fresh. I was trying to figure out what vendor provided the search system for Adobe products. After reading publicly accessible information and making fruitless attempts to speak to a person who knew about search at Adobe, I learned by accident the name of the provider.

Do you recognize the name Lextek? I sure did not. I offer a no-cost summary of this company and its search system at this link. I was fascinated with Lextek because I had difficulty locating information using the Adobe products which incorporated this system. I have a short list of other search systems Adobe has used over the years, with the same result. I invite you to fire up an Adobe product and try to locate the information needed to solve a problem or learn a procedure or figure out what state an Adobe software product is in. Let me know how that works out for you.

I read “Adobe Unveils Cloud Electronic Document Service.” I learned that “Adobe Systems will launch a cloud-based document management service within the month.” That’s soon. The article continued:

The company said the core of the new service is Adobe Acrobat, the world’s most sought-after document management software. The upgraded Adobe Acrobat Document Cloud enables document managers to produce, check and confirm official documents on both personal computers and mobile devices. They also can put an electronic signature to the Portable Document Format (PDF) file to give it a legal force, the company said.

Yikes, another silo of data for an organization to “federate.”

Several questions crossed my mind:

  • What is the search system for the system? (Lextek’s owners operate a confectionary store if I understood the research my team assembled.)
  • What is the programmatic access Adobe will provide to an organization placing its PDF documents in the Adobe Document Cloud?
  • What is the security provided for these customers?

Adobe’s play is an interesting one. I wonder if the company will allow its customers to mark documents “public” and then provide an online access service? Worth watching.

Stephen E Arnold, March 19, 2015

SharePoint Gets Serious with Information Governance

March 19, 2015

SharePoint has enjoyed continued success over the last 15 years, but it has not been without some bumps along the way. Information governance is one of the noted areas in which SharePoint has fallen flat. Read more in the CMS Wire article, “Keeping SharePoint In Check with Information Governance.”

The article begins:

“Historically, SharePoint was thought to cause as many information governance problems as it solved. The 2001 to 2003 versions did not show Microsoft putting much effort into helping customers with information governance. But after the massive take up of SharePoint Portal Server 2007 licenses, and the often negative conversations coming out of the sizable SharePoint user community, Microsoft started to take governance issues seriously.”

In addition to keeping an eye on your news feed for the latest SharePoint buzz, staying tuned to experts in the field is a great way to save time and get pointed information about improving a SharePoint installation. Stephen E. Arnold has one such SharePoint feed on his Web site, ArnoldIT.com. Focusing on tips, tricks, and news, Arnold brings together much of the content that users and managers alike will find helpful for navigating day-to-day SharePoint operations.

Emily Rae Aldridge, March 19, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Duck Duck Jumbawumba?

March 18, 2015

Usually, if you want a private search free of targeted ads, you head over to DuckDuckGo.com. DuckDuckGo holds its own against bigger search engines, and because it is the nice guy of search, no one has really come out to challenge the waterfowl. The Pittsburgh Post-Gazette has a story about another privacy-focused search engine, “Hampton Entrepreneur Seeks To Launch Privacy-Friendly Search Engine,” but the newcomer is less a DuckDuckGo rival than another option.

Michael DeKort launched a $125,000 Kickstarter campaign to fund Jumbawumba, a search engine that uses Google’s prowess while retaining a user’s privacy. It also would create cohesive search results using video, images, news, and Web sites on one page, instead of four.

How does it work?

“Jumbawumba taps Google’s vast reach. To Google’s eyes, though, the queries come from Jumbawumba, not from the originating computer, Mr. DeKort said. And while Google, Bing and Yahoo! keep records of each computer’s searches, and use them to tailor advertising, Jumbawumba pledges not to store any data on one-time searches. (It would keep records of ongoing search queries, but wouldn’t sell them to marketing firms, Mr. DeKort said.) Jumbawumba’s computer server will ultimately be overseas, limiting government access, though the company would respect law enforcement subpoenas.”

While private search engines like Jumbawumba will probably never be able to compete with Google, it is good to know that people like Michael DeKort are fighting to protect online privacy. The more the merrier for private search!

Whitney Grace, March 18, 2015

Qwant Develops Qwant Junior, the Search Engine for Children

March 17, 2015

The Telecompaper article “Qwant Tests Child-Friendly Search Engine” discusses the French company’s work. Qwant is targeting 3- to 13-year-olds with Qwant Junior, in partnership with the Education Ministry. Twenty percent of the company is owned by digital publishing powerhouse Axel Springer. The child-friendly search engine will attempt to limit access to inappropriate content while encouraging children to use the search engine to learn. The article explains:

“The new version blocks or lists very far down in search results websites that show violence and pornography, as well as e-commerce sites. The version features an education tab separately from the general web search that offers simplified access to educational programme, said co-founder Eric Leandri. Qwant Junior’s video tab offers child-appropriate videos from YouTube, Dailymotion and Vimeo. After tests with the ministry, the search engine will be tested by several hundred schools.”
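The quoted description (block some sites, push others far down the list) amounts to a filter followed by a re-sort. The Python sketch below illustrates only that idea; it is not Qwant’s method. The category labels, the split between blocked and demoted categories, and the assumption that an upstream classifier has already labeled every result are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    category: str  # hypothetical label assigned by an upstream classifier
    score: float   # relevance score; higher is better

# Hypothetical policy: which categories are dropped and which are demoted.
BLOCKED = {"pornography"}
DEMOTED = {"violence", "ecommerce"}

def child_safe_rank(results: list[Result]) -> list[Result]:
    """Drop blocked results, then list demoted ones after everything else."""
    kept = [r for r in results if r.category not in BLOCKED]
    # False sorts before True, so non-demoted results come first;
    # within each group, higher scores come first.
    return sorted(kept, key=lambda r: (r.category in DEMOTED, -r.score))

if __name__ == "__main__":
    ranked = child_safe_rank([
        Result("shop.example", "ecommerce", 0.9),
        Result("museum.example", "education", 0.5),
        Result("adult.example", "pornography", 0.99),
    ])
    print([r.url for r in ranked])  # ['museum.example', 'shop.example']
```

Demoted results stay in the list, just at the bottom; only the blocked set disappears.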

Teaching youngsters the ways of the search engine is important in our present age. The concept of listing pornography “very far down” on the list of results might unsettle some parents of young teens smart enough to just keep scrolling, but it is France! Perhaps the expectation of blocking all unsavory material is simply untenable. Qwant is planning on a major launch by September, and is in talks with Brazil for a similar program.

Chelsea Kerwin, March 17, 2015

A New French Business Search Engine

March 16, 2015

France is not the first country you think about when it comes to developing new search techniques, much less search engines. However, Web Time Media’s “France Datafari Labs Launches New Business Search Engine” reports on a product meant to rival Polyspot, Exalead, and Sinequa.

France Labs designed Datafari 1.0 specifically for the cloud and Big Data, and it offers a complete open source enterprise search solution. It was built to be the top-performing open source search application, which would make it stiff competition for Apache Solr and ElasticSearch as well.

The description of its offerings is pretty exciting [via Google Translate]:

“The promise to companies is to allow them to retrieve the data wherever they are, whatever they are, safe. Datafari for that innovates on several axes. At the technical level, it manages corpus big data, integrating Apache SolrCloud. Level analysis, it offers analytical queries and dashboards corpus. At development, it is Apache license, non-viral for business (they do not have the obligation to provide the community the developments they do). Finally, the interoperability level Datafari provides a set of REST APIs to expose its connectors as well as its search engine.”

Datafari 1.0 is already being downloaded and tested by developers to see if it offers a new, viable, and flexible solution for enterprise and singular networks. The open source search market is already swollen in the English-speaking world, so Datafari needs to explain more about what makes it different from other search applications.

Whitney Grace, March 16, 2015
Sponsored by ArnoldIT.com, developer of Augmentext

Microsoft Makes Bing Faster

March 16, 2015

Bing is classified as a generic search engine living in Google’s as well as DuckDuckGo’s shadows. ExtremeTech reports in “Microsoft To Accelerate Bing Search With Neural Network” that Microsoft is trying to make Bing a more viable product. When Bing scours the Internet, it pulls results from a Web index that is half the size of Google’s. Microsoft wants to increase Bing’s efficiency and speed, so it is building on Field-Programmable Gate Array (FPGA) technology.

Microsoft breaks Bing’s search into three parts: machine learning scoring, feature extraction, and free-form expressions. Bing still uses Xeon processors for its document selection service, and it needs to switch over to the new FPGA technology to increase its search speed. The effort to develop the new FPGA technology is called Project Catapult. Project Catapult builds on similar technology designed in 2011, but it needs half as many servers as it did in the past.

Microsoft is relying on convolutional neural network (CNN) accelerators for the project:

“Convolutional neural networks (CNNS) are composed of small assemblies of artificial neurons, where each focuses on a just small part of an image — their receptive field. CNNs have already bested humans in classifying objects in challenges like the ImageNet 1000. Classifying documents for ranking is a similar problem, which is now one among many Microsoft hopes to address with CNNs.”
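The “receptive field” idea in the quote is easy to see in one dimension, which is also the natural shape of text. The Python sketch below is a plain valid convolution (strictly a cross-correlation, which is what most CNN layers compute) over a sequence of per-token values, and each output position depends only on a window as wide as the kernel. It is a toy illustration of the concept, not Microsoft’s ranking model; the input values and the kernel are made up.

```python
def conv1d(signal: list[float], kernel: list[float]) -> list[float]:
    """Valid 1-D cross-correlation, as computed by typical CNN layers.

    Output i sees only signal[i : i + len(kernel)]: its receptive field.
    """
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]

if __name__ == "__main__":
    print(conv1d([1, 0, 2, 0, 1], [1, 1, 1]))  # [3, 2, 3]
    # Changing the last input moves only the last output, the one
    # whose receptive field covers that position.
    print(conv1d([1, 0, 2, 0, 5], [1, 1, 1]))  # [3, 2, 7]
```

Stacking such layers widens what each unit sees, which is how a CNN works up from local patterns toward a judgment about a whole image or document.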

Armed with the new FPGAs, Microsoft hopes to improve Bing’s search and ranking enough to compete more effectively with Google. While that may improve Bing’s chances of returning better results, remember that Microsoft still creates operating systems that fail on their initial public releases.

Whitney Grace, March 16, 2015

Swiftype Raises More Money for Web Site Search

March 16, 2015

TechCrunch tells us that search startup “Swiftype Raises $13M More For Its Starter Site And App Search.” Swiftype’s mission is pretty straightforward: it wants to create customizable search tools that do not suck (TechCrunch’s own language). You have to admit that it is a bold move, considering many out-of-the-box solutions do stink worse than dial-up from 1995 and open source (while it is free and awesome) requires a bit of developer experience. Swiftype takes out the guesswork and delivers a tailored solution without the hassle or the need for developer experience.

Swiftype started out with Web site search, but it has moved into other areas:

“On the other hand, online publishers might not be the most lucrative customer base, so while co-founders Matt Riley and Quin Hoxie told me they still support publishers (and we still use Swiftype at TechCrunch), they’ve also expanded into other areas, particularly knowledge bases (basically, FAQs and customer support sites) and e-commerce.”

The search company will probably invest the $13 million to expand its already popular search tools. New Enterprise Associates led the Series B round and also backed the original Series A round, so the two have formed a long-term partnership.

Whitney Grace, March 16, 2015
