The IHS Invention Machine: US 8,666,730

July 31, 2014

I am not an attorney. I consider this a positive. I am not a PhD with credentials as impressive as those of Vladimir Igorevich Arnold, my distant relative. He worked with Andrey Kolmogorov, who was able to hike in some bare essentials AND do math at the same time. Kolmogorov and Arnold: both interesting, if idiosyncratic, guys. Hiking in the wilderness with some students, anyone?

Now to the matter at hand. Last night I sat down with a copy of US 8,666,730 B2 (hereinafter 730), filed in an early form in 2009, long before Information Handling Services wrote a check to the owners of The Invention Machine.

The title of the system and method is “Question Answering System and Method Based on Semantic Labeling of Text Documents and User Questions.” You can get your very own copy at www.uspto.gov. (Be sure to check out the search tips; otherwise, you might get a migraine dealing with the search system. I heard that technology was provided by a Canadian vendor, which seems oddly appropriate if true.) The US government moves in elegant, sophisticated ways.

Well, 730 contains some interesting information. If you want to ferret out more details, I suggest you track down a friendly patent attorney and work through the 23-page document word by word.

My analysis is that of a curious old person residing in rural Kentucky. My advisors are the old fellows who hang out at the local bistro, Chez Mine Drainage. You will want to keep this in mind as I comment on this invention by James Todhunter (Framingham, Mass.), Igor Sovpel (Minsk, Belarus), and Dzianis Pastanohau (Minsk, Belarus). Mr. Todhunter is described as “a seasoned innovator and inventor.” He was the Executive Vice President and Chief Technology Officer for Invention Machine. See http://bit.ly/1o8fmiJ, LinkedIn at (if you are lucky) http://linkd.in/1ACEhR0, and this YouTube video at http://bit.ly/1k94RMy. Igor Sovpel, co-inventor of 730, has racked up some interesting inventions. See http://bit.ly/1qrTvkL. Mr. Pastanohau was on the 730 team, and he also helped invent US 8,583,422 B2, “System and Method for Automatic Semantic Labeling of Natural Language Texts.”

The question answering invention is explained this way:

A question-answering system for searching exact answers in text documents provided in the electronic or digital form to questions formulated by user in the natural language is based on automatic semantic labeling of text documents and user questions. The system performs semantic labeling with the help of markers in terms of basic knowledge types, their components and attributes, in terms of question types from the predefined classifier for target words, and in terms of components of possible answers. A matching procedure makes use of mentioned types of semantic labels to determine exact answers to questions and present them to the user in the form of fragments of sentences or a newly synthesized phrase in the natural language. Users can independently add new types of questions to the system classifier and develop required linguistic patterns for the system linguistic knowledge base.

The idea, as I understand it, is that I can craft a question without worrying about special operators like AND or field labels like CC=. Presumably I can submit this type of question to a search system based on 730 and its related inventions like the automatic indexing in 422.
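To make the idea a little more concrete, here is a minimal Python sketch of sorting a plain-language question into a question type before any matching happens. It assumes a hand-built table of regular expressions; the patterns, the type names (CAUSE, METHOD, AGENT, TIME), and the function classify_question are hypothetical. 730 describes a predefined classifier for target words but this is not it.

```python
import re

# Hypothetical question-type classifier: each entry pairs a pattern over the
# question's opening words with a type label. Not the patent's classifier.
QUESTION_TYPES = [
    (re.compile(r"^(what|which)\s+(causes|leads to|produces)\b", re.I), "CAUSE"),
    (re.compile(r"^how\s+(to|can|do|does)\b", re.I), "METHOD"),
    (re.compile(r"^who\b", re.I), "AGENT"),
    (re.compile(r"^(when|what year)\b", re.I), "TIME"),
]


def classify_question(question: str) -> str:
    """Return the first matching question type, or 'UNKNOWN'."""
    q = question.strip()
    for pattern, qtype in QUESTION_TYPES:
        if pattern.search(q):
            return qtype
    return "UNKNOWN"


if __name__ == "__main__":
    for q in ["What causes corrosion in pipelines?",
              "How to reduce friction in a bearing?",
              "Who invented the transistor?"]:
        print(q, "->", classify_question(q))
```

Because the classifier is just a list, a user can add a question type by appending a pattern, which loosely mirrors the abstract's statement that users can add new question types themselves.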

The references cited for this 2009 or earlier invention are impressive. I recognized Mr. Todhunter’s name, that of a person from Carnegie Mellon, and one of the wizards behind the tagging system in use at SAS, the statistics outfit loved by graduate students everywhere. There were also a number of references to Dr. Liz Liddy, Syracuse University. I associated her with the mid-to-late 1990s system marketed then as DR LINK (Document Retrieval Linguistic Knowledge). I have never been comfortable with the notion of “knowledge” because it seems to require that subject matter experts and other specialists update, edit, and perform various processes to keep the “knowledge” from degrading into a ball of statistical fuzz. When someone complains that a search system using Bayesian methods returns off-point results, I look for the humans who are supposed to perform “training,” updates, remapping, and other synonyms for “fixing up the dictionaries.” You may have other experiences, which I assume are positive and have garnered you rapid promotion for your search system competence. For me, maintaining knowledge bases usually leads to lots of hard work, unanticipated expenses, and the customary termination of a scapegoat responsible for the search system.

I am never sure how to interpret extensive listings of prior art. Since I am not qualified to figure out whether a citation is germane, I will leave it to you to wade through the full page of US patent documents, foreign patent documents, and other publications. Who wants to question the work of the primary examiner and the Faegre Baker Daniels “attorney, agent, or firm” tackling 730?

On to the claims. The patent lists 28 claims. Many of them refer to operations within the world of what the inventors call expanded Subject-Action-Object, or eSAO. The idea is that the system figures out parts of speech, looks up stuff in various knowledge bases and automatically generated indexes, and presents the answer to the user’s question. The lingo of the patent is sufficiently broad to allow the system to accommodate an automated query in a way that reminded me of Ramanathan Guha’s massive semantic system. I cover some of Dr. Guha’s work in my now out-of-print monograph, Google Version 2.0, published by one of the specialist publishers that perform Schubmehl-like maneuvers.
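Here is a rough Python sketch of the subject-action-object idea: documents are reduced to labeled triples, and a question with one unknown slot is answered by matching the known slots. The SAO record, the sample triples, and the answer function are my assumptions for illustration; 730’s eSAO carries more components and attributes than this toy does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SAO:
    """Simplified subject-action-object record (hypothetical fields);
    730's eSAO adds further components and attributes."""
    subject: str
    action: str
    obj: str
    indirect_object: Optional[str] = None


# Hypothetical triples, as if already extracted from indexed documents.
KNOWLEDGE: List[SAO] = [
    SAO("filter", "remove", "particles", "from water"),
    SAO("ozone", "remove", "odor"),
    SAO("heat", "accelerate", "corrosion"),
]


def answer(subject: Optional[str], action: str, obj: Optional[str]) -> List[str]:
    """Fill the single unknown slot (None) by matching the known slots.

    answer("filter", "remove", None) asks "What does a filter remove?"
    answer(None, "remove", "odor") asks "What removes odor?"
    """
    if (subject is None) == (obj is None):
        raise ValueError("leave exactly one of subject or obj as None")
    if obj is None:
        return [t.obj for t in KNOWLEDGE
                if t.subject == subject and t.action == action]
    return [t.subject for t in KNOWLEDGE
            if t.obj == obj and t.action == action]


print(answer("filter", "remove", None))  # ['particles']
print(answer(None, "remove", "odor"))    # ['ozone']
```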

My first pass through 730’s claims left me with a sense of déjà vu, which is obviously not correct. The invention has been awarded the status of a “patent”; therefore, the invention is novel. Nevertheless, these concepts pecked away at me with the repetitiveness of the woodpecker outside my window this morning:

Automatic semantic labeling, which I interpreted as automatic indexing.
Natural language processing, which I understand suggests the user takes the time to write a question that is neither too broad nor too narrow. As in the children’s story, the query is “just right.”
Assembly of bits and chunks of indexed documents into an answer. For me the idea is that the system does not generate a list of hits that are probably germane to the query. The Holy Grail of search is delivering an answer to the often lazy, busy, or clueless user. Google does this for mobile users by looking at a particular user’s behavior and the clusters to which the user belongs in the eyes of Google math, and just displaying the location of the pizza joint or the fact that a parking garage at the airport has an empty space.
The system figures out parts of speech, various relationships, and who-does-what-to-whom. Part-of-speech tagging has been around for a while, and it works as long as the text processed is not in the argot of a specialist group plotting some activity in a favela in Rio.
The system performs the “e” function. I interpreted the “e” to mean a variant of synonym expansion. DR LINK, for example, was able in 1998 to process the phrase white house and display content relevant to presidential activities. I don’t recall how this expansion from bound phrase to presidential to Clinton worked. I do recall that DR LINK had what might be characterized as a healthy appetite for computing resources to perform its expansions during indexing and during query processing. This stuff is symmetrical: what happens to source content has to happen during query processing in some way.
Relevance ranking takes place. Various methods are in use by search and content processing vendors. Some are based on statistical methods. Others are based on numerical recipes that the developer knows can be computed within the limits of the computer systems available today. No P=NP, please. This is search.
There are linguistic patterns. When I read about linguistic patterns, I recall the wild and crazy linguistic methods of Delphes, for example. Linguistics is in demand today, and so are specialist vendors like Bitext in Madrid, Spain. English, Chinese, and Russian are widely used languages, but darned useful information is available in other languages. Many of these are kept fresh via neologisms and slang. I often asked my intelligence community audiences, “What does teddy bear mean?” The answer is NOT a child’s toy. The clue is the price tag suggested on sites like eBay auctions.
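The symmetry point in the list above can be shown with a short Python sketch: one normalization function, with a hypothetical phrase-expansion table, runs both when documents are indexed and when queries arrive. The EXPANSIONS table, the tokenizer, and the OR-style search are assumptions, not DR LINK’s or 730’s actual methods.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical expansion table: a bound phrase and the extra terms it implies.
EXPANSIONS: Dict[str, Set[str]] = {
    "white house": {"presidential", "president"},
}


def normalize(text: str) -> Set[str]:
    """Turn text into index terms. The SAME function runs at index time
    and at query time, which is what keeps the two symmetrical."""
    lowered = text.lower()
    terms: Set[str] = set()
    for phrase, extra in EXPANSIONS.items():
        if phrase in lowered:
            terms.add(phrase)                       # bound phrase as one term
            terms |= extra                          # plus its expansion terms
            lowered = lowered.replace(phrase, " ")  # drop the loose words
    terms |= set(re.findall(r"[a-z]+", lowered))
    return terms


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: term -> ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in normalize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """OR semantics: any shared term is a hit."""
    hits: Set[str] = set()
    for term in normalize(query):
        hits |= index.get(term, set())
    return hits


DOCS = {
    "d1": "The White House issued a statement.",
    "d2": "Presidential travel schedule released.",
    "d3": "A white picket fence.",
}
INDEX = build_index(DOCS)
print(sorted(search(INDEX, "white house")))  # ['d1', 'd2']
```

The query “white house” retrieves the presidential travel item even though that item never mentions the phrase. Changing the expansion table means reindexing, because documents indexed under the old table no longer line up with the new query terms, which is one reason knowledge bases are costly to maintain.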

The interesting angle in 730 is the causal relationship. When applied to processes in the knowledge bases, I can see how a group of patents can be searched for a process. The result list could display ways to accomplish a task. NOTting out patents for which a royalty is required leaves the searcher with systems and methods that can be used, ideally without any hassles from attorneys or licensing agents.
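A toy Python sketch of that causal search: each record links a process (the cause) to what it achieves (the effect), and records flagged as requiring a royalty are NOTted out. The PatentRecord fields, the sample records, and the royalty flag are hypothetical; 730 does not say it stores licensing status.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PatentRecord:
    number: str              # placeholder ids, not real patents
    cause: str               # the process or action
    effect: str              # what the process accomplishes
    royalty_required: bool   # hypothetical licensing flag


def ways_to_achieve(records: List[PatentRecord], goal: str) -> List[str]:
    """Processes whose effect matches the goal, NOTting out royalty-bearing ones."""
    return [r.cause for r in records
            if r.effect == goal and not r.royalty_required]


CORPUS = [
    PatentRecord("P-1", "anodize the surface", "reduce corrosion", True),
    PatentRecord("P-2", "apply a zinc coating", "reduce corrosion", False),
    PatentRecord("P-3", "add cooling fins", "reduce heat", False),
]

print(ways_to_achieve(CORPUS, "reduce corrosion"))  # ['apply a zinc coating']
```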

Several questions popped into my mind as I reviewed the claims. Let me highlight three of these:

First, what is the computational load when large numbers of new documents and changed content have to be processed? The indexes have to be updated. For small domains of content, like 50,000 technical reports created by an engineering company, I think the system will zip along like a 2014 Volkswagen Golf.

Source: US8666730, Figure 1

When terabytes of content arrive every minute, the functions set forth in the block diagram for 730 have to be appropriately resourced. (For me, “appropriately resourced” means lots of bandwidth, storage, and computational horsepower.)

Second, the knowledge base, as I noted when I first read the patent, has to be kept in tip-top shape. For scientific, technical, and medical content, this is a more manageable task. However, when processing intercepts in slang-filled Pashto, there is a bit more work required. In general, high volumes of non-technical lingo become a bottleneck. The bottleneck can be resolved, but none of the solutions are likely to make a budget-conscious senior manager enjoy his lunch. In fact, the problem of processing large flows of textual content is acute. Shortcuts are put in place, and few of those in the know understand the impact of trimming on the results of a query. Don’t ask. Don’t tell. Good advice when digging into certain types of content processing systems.

Third, the reference to databases raises this question: “What is the amount of storage required to reduce index latency to less than 10 seconds for new and changed content?” Another question: “For a user asking a mission-critical question, what is the gap between new and changed content and the indexes against which the query is passed?” This is not system response time, which, as I recall, was measured in minutes for DR LINK era systems. The user sends a query to the system. The new or changed information is not yet in the index. The user makes a decision (big or small, significant or insignificant) based on incomplete, incorrect, or stale information. No big problem if one is researching a competitor’s new product. Big problem when trying to figure out what missile capability exists now in a region of conflict.
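The freshness gap is easy to state in code. This Python sketch, with made-up timestamps and a hypothetical Doc record, separates what a query issued at time t can see from what has already arrived but is not yet indexed, and flags documents that miss a 10-second latency target.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Doc:
    doc_id: str
    arrived: float  # seconds: when the content reached the system
    indexed: float  # seconds: when it became searchable


def visible_at(docs: List[Doc], t: float) -> List[str]:
    """Documents a query issued at time t can actually see."""
    return [d.doc_id for d in docs if d.indexed <= t]


def blind_spot(docs: List[Doc], t: float) -> List[str]:
    """Documents that exist at time t but are not yet in the index."""
    return [d.doc_id for d in docs if d.arrived <= t < d.indexed]


def over_target(docs: List[Doc], target_seconds: float = 10.0) -> List[str]:
    """Documents whose index latency missed the target."""
    return [d.doc_id for d in docs if d.indexed - d.arrived > target_seconds]


# Made-up timeline: "b" is the late-breaking item that matters.
DOCS = [Doc("a", 0, 4), Doc("b", 5, 30), Doc("c", 20, 25)]
print(visible_at(DOCS, 22), blind_spot(DOCS, 22), over_target(DOCS))
```

A user querying at t = 22 sees only “a” and makes a decision without “b” or “c,” both of which are already in the building.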

My interest is enterprise search. IHS, a professional publishing company that is in the business of licensing access to its for-fee data, seems to be moving into the enterprise search market. (See http://bit.ly/1o4FyL3.) My researchers (an unreliable bunch of goslings) and I will be monitoring the success of IHS. Questions of interest to me include:

What is the fully loaded first-year cost of the IHS enterprise search solution? For on-premises installations? For cloud-based deployment? For content acquisition? For optimization? For training?
How will the IHS system handle flows of real-time content into its content processing system? What is the load time for 100 terabytes of text content with an average document size of 50 KB? What happens to attachments, images, engineering drawings, and videos embedded in the stream as native files or as links to external servers?
What is the response time for a user’s query? How does the user modify a query in a manner so that result sets are brought more in line with what the user thought he was requesting?
How do answers make use of visual outputs which are becoming increasingly popular in search systems from Palantir, Recorded Future, and similar providers?
How easy is it to scale content processing and index refreshing to keep pace with the doubling of content every six to eight weeks that is becoming increasingly commonplace for industrial strength enterprise search systems? How much reengineering is required for log scale jumps in content flows and user queries?
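Some rough arithmetic behind two of these questions, sketched in Python. It assumes decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes) and a made-up ingest rate of 1,000 documents per second; swap in your own figures.

```python
TB = 10**12  # decimal terabyte (assumption)
KB = 10**3   # decimal kilobyte (assumption)
SECONDS_PER_DAY = 86_400


def document_count(corpus_bytes: float, avg_doc_bytes: float) -> float:
    """How many documents a corpus of the given size contains on average."""
    return corpus_bytes / avg_doc_bytes


def load_days(n_docs: float, docs_per_second: float) -> float:
    """Days needed to load n_docs at a steady ingest rate."""
    return n_docs / docs_per_second / SECONDS_PER_DAY


def annual_growth(doubling_weeks: float, weeks_per_year: float = 52.0) -> float:
    """Growth factor over a year if volume doubles every doubling_weeks."""
    return 2 ** (weeks_per_year / doubling_weeks)


n = document_count(100 * TB, 50 * KB)
print(f"{n:,.0f} documents")  # 2,000,000,000 documents
print(f"{load_days(n, 1_000):.1f} days at 1,000 docs/s")
print(f"{annual_growth(8):.0f}x to {annual_growth(6):.0f}x per year")
```

Under those assumptions, 100 terabytes at 50 KB per document is two billion documents, roughly 23 days of loading at 1,000 documents per second, and doubling every six to eight weeks compounds to growth of roughly 90 to 400 times in a year.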

Take a look at 730 and others in the Invention Machine (IHS) patent family. My hunch is that if IHS is looking for a big-bucks return from enterprise search sales, IHS may find that its narrow margins will be subjected to increased stress. Enterprise search has never been, nor is it now, a license to print money. When a search system does pump out hundreds of millions in revenue, some folks seem to become skeptical. Autonomy and Fast Search & Transfer are companies with some useful lessons for those who want a digital Klondike.

Written by Stephen E. Arnold · Filed Under algorithms, News, Search

Markov Chains for the English Major Turned Search Expert

July 31, 2014

I know there are quite a few experts in enterprise search, content processing, and the near-mystical Big Data thing. If you want to know more about Markov chains so you can explain how stuff works in most content-centric systems with fancy math, this is for you. Navigate to Setosa Blog and “Markov Chains: A Visual Explanation.” This one is pretty good. You can poke around for an IBM presentation on the same subject. IBM includes some examples of the way the numerical recipe can assign a probability to an event that is likely to take place.
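For readers who want the gist before the visual tour, here is a minimal Python sketch of a Markov chain: the next state depends only on the current state, so the probability of a path is the product of its one-step transition probabilities, and a distribution over states can be pushed forward step by step. The states and numbers are invented for illustration.

```python
from typing import Dict, List

# Invented transition probabilities: P(next word | current word).
# Each row sums to 1.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "search": {"engine": 0.6, "results": 0.4},
    "engine": {"results": 1.0},
    "results": {"search": 0.5, "engine": 0.5},
}


def path_probability(states: List[str]) -> float:
    """Probability of following this exact path, given its first state."""
    p = 1.0
    for current, nxt in zip(states, states[1:]):
        p *= TRANSITIONS[current].get(nxt, 0.0)
    return p


def step(dist: Dict[str, float]) -> Dict[str, float]:
    """Advance a probability distribution over states by one step."""
    out = {s: 0.0 for s in TRANSITIONS}
    for state, p in dist.items():
        for nxt, q in TRANSITIONS[state].items():
            out[nxt] += p * q
    return out


print(path_probability(["search", "engine", "results"]))  # 0.6
two_steps = step(step({"search": 1.0}))
print(two_steps)
```

Starting from “search,” two steps later the chain sits on “results” with probability 0.6; that is the sense in which the recipe assigns a probability to an event that is likely to take place.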

Stephen E Arnold, July 31, 2014

Written by Stephen E. Arnold · Filed Under Analytics, News

Gartner and Enterprise Search 2014

July 31, 2014

At lunch yesterday, several search-aware people discussed a July 2014 Gartner study. One of the folks had a crumpled image of the July 2014 “magic quadrant.” This is, I believe, report number G00260831. Like other mid-tier consulting firms, Gartner works hard to find something that will hook customers’ and prospects’ attention. The Gartner approach is focused on companies that purport to have enterprise search systems. From my vantage point, the Gartner approach is miles ahead of the wild and illogical IDC report about knowledge, a “quotient,” and “unlocking” hidden value. See http://bit.ly/1rpQymz. Now, I have not fallen in love with Gartner. The situation is more like my finding my content and my name for sale on Amazon. You can see what my attorney complained about via this link, http://bit.ly/1k7HT8k. I think I was “schubmehled,” not outwitted.

I am the really good looking person. Image source: http://bit.ly/1rPWjN3

What the IDC report lacks in comprehensiveness with regard to vendors, Gartner supplies: it mentions quite a few companies allegedly offering enterprise search solutions. You must chase down your local Gartner salesperson for more details. I want to summarize the points that surfaced in our lunchtime pizza fest.

First, the Gartner “study” includes 18 or 19 vendors. Recommind is on the Gartner list even though a supremely confident public relations “professional” named Laurent Ionta insisted that Recommind was not in the July 2014 Gartner report. I called her attention to report number G00260831 and urged her to use her “bulldog” motivation to contact her client and Gartner’s experts to get the information from the horse’s mouth, as it were. (Her firm is www.lewispr.com, and it is purported to be the Digital Agency of the Year and on the Inc. 5000 list of the fastest-growing companies in America.) I am impressed with the accolades she included in her emails to me. The fact that this person, who may work on the Recommind account, was unaware that Gartner pegged Recommind as a niche player seemed like a flub of the first rank. When it comes to search, not even those in the search sector may know who’s on first or among the chosen 19.

To continue with my first take away from lunch, there were several companies that those at lunch thought should be included in the Gartner “analysis.” As I recall, the companies to which my motley lunch group wanted Gartner to apply their considerable objective and subjective talents were:

ElasticSearch. This, in my view, is the Big Dog in enterprise search at the moment. The sole reason is that ElasticSearch has received an injection of another $70 million to complement the $30-odd million it had previously gathered. Oh, ElasticSearch is also a developer magnet. Other search vendors should be so popular with the community crowd.
Oracle. This company owns and seems to offer Endeca solutions along with RightNow/InQuira natural language processing for enterprise customer support, the fading Secure Enterprise Search system, and still popping and snapping Oracle Text. I did not mention to the lunch crowd that Oracle also owns Artificial Linguistics and Triple Hop technology. This information was, in my view, irrelevant to my lunch mates.
SphinxSearch. This system is still getting love from the MySQL contingent. Imagine no complex structured query language syntax to find information tucked in a cell.

There are some other information retrieval outfits that I thought of mentioning, but again, my free lunch group does not know what it does not know. Like many folks who discuss search with me, learning details about search systems is not even on the menu. Even when the information is free, few want to confuse fantasy with reality.

The second takeaway is that the rationale for putting most vendors in the niche category puzzled me. If a company really has an enterprise search solution, how is that solution a niche? The companies identified as those who can see where search is going are, as I heard, labeled “visionaries.” The problem is that I am not sure what a search visionary is; for example, how does a French aerospace and engineering firm qualify as a visionary? Was HP a visionary when it bought Autonomy, wrote off $8 billion, and initiated litigation against former colleagues? How does this Google-supplied definition apply to enterprise search:

able to see visions in a dream or trance, or as a supernatural apparition?

The final takeaway for me was the failure to include any search system from China, Germany, or Russia. Interesting. Even my down on their heels lunch group was aware of Yandex and its effort in enterprise search via a Yandex appliance. Well, internationalization only goes so far I suppose.

I recall hearing one of my luncheon guests say that IBM was, according to the “experts” at Gartner, a niche player. Gentle reader, I can describe IBM many ways, but I am not sure it is a niche player like Exorbyte (eCommerce mostly) and MarkLogic (XML data management). Nope, IBM’s search embraces winning Jeopardy, creating recipes with tamarind, and curing assorted diseases. And IBM offers plain old search as part of DB2 and its content management products, plus some products obtained via acquisition. Cybertap search, anyone? When someone installed what used to be OmniFind, I thought IBM was providing an enterprise-class information retrieval solution. Guess I am wrong again.

Net net: Gartner has prepared the ground for a raft of follow-on analyses. I would suggest that you purchase a copy of the July 2014 Gartner search report. You may be able to get your bearings so you can answer these questions:

What are the functional differences among the enterprise search systems?
How does the HP Autonomy “solution” compare to the pre-HP Autonomy solution?
What is the cost of a Google Search Appliance compared to a competing product from Maxxcat or Thunderstone? (Yep, two more vendors not in the Gartner sample.)
What causes a company to move from being a challenger in search to a niche player?
What makes both a printer company and a Microsoft-centric solution qualified to match up with Google and HP Autonomy in enterprise search?
What are the licensing costs, customizing costs, optimizing costs, and scaling costs of each company’s enterprise search solution? (You can find the going rate for the Google Search Appliance at www.gsaadvantage.gov. The other 18? Good luck.)
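For readers puzzling over how a vendor slides from challenger to niche player, here is a toy Python sketch of the two-axis Magic Quadrant layout: completeness of vision on one axis, ability to execute on the other. The numeric scores and the 0.5 midpoint are hypothetical; Gartner does not publish its placements as a formula like this.

```python
def quadrant(vision: float, execution: float, midpoint: float = 0.5) -> str:
    """Place a vendor by two hypothetical scores in [0, 1]:
    completeness of vision (x-axis) and ability to execute (y-axis)."""
    high_vision = vision >= midpoint
    high_execution = execution >= midpoint
    if high_vision and high_execution:
        return "Leader"
    if high_execution:
        return "Challenger"
    if high_vision:
        return "Visionary"
    return "Niche Player"


# A vendor whose execution score slips below the midpoint drops from
# Challenger to Niche Player even though its vision score never changed.
print(quadrant(0.3, 0.7), "->", quadrant(0.3, 0.4))  # Challenger -> Niche Player
```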

I will leave you to your enterprise search missions. Remember. Gartner, unlike some other mid-tier consulting firms, makes an effort to try to talk about what its consultants perceive as concrete aspects of information retrieval. Other outfits not so much. That’s why I remain confused about the IDC KQ (knowledge quotient) thing, the meaning of hidden value, and unlocking. Is information like a bike padlock?

Stephen E Arnold, July 31, 2014

Written by Stephen E. Arnold · Filed Under Consulting, Enterprise search, Feature

Software-Cluster Promises Better Mobile Services

July 31, 2014

A German software company popped into the aggregator with a press release entitled “Software-Cluster Designing Platforms For Innovative Internet Services” from ConWeaver. The release details how Software-Cluster is working on a platform to provide its customers with better connectivity to mobile services. The information is very vague, but Software-Cluster’s website yields a bit more on what the company actually does.

The page “Software-Cluster Enables Platforms For Innovative Internet Services” (translated via Google Translate) explains that Software-Cluster is currently focused on ending the communication issues users face when they are using multiple apps.

“For a variety of individual services such complex applications may arise. The platform allows the management of Internet-based services, makes these comparable and combines retrievable and usable. The user receives an appropriate solution to his problem of these services tailored that can be billed according to usage. The platform offers a range of standard-based services for the metering, monitoring and billing of the services offered on the platform can use the provider of services on the platform. To this end, the platform uses a service repository in which the services are stored and about the to which may be measured, as well as tools for service discovery and service selection and billing for the services used.”

The platform will be offered as SaaS, PaaS, and IaaS and is designed with three specific markets in mind: handling logistics processes, trading energy quotas, and resolving mobility challenges.
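The quoted description (a service repository, service discovery, metering, and usage-based billing) maps onto a simple structure. This Python sketch is hypothetical throughout: the Platform class, the service names, and the per-call prices in cents are my assumptions, not Software-Cluster’s design.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Service:
    name: str
    cents_per_call: int  # hypothetical usage-based tariff


class Platform:
    """Toy service repository with discovery, metering, and billing."""

    def __init__(self) -> None:
        self.repository: Dict[str, Service] = {}
        self.usage: Dict[Tuple[str, str], int] = defaultdict(int)

    def register(self, service: Service) -> None:
        self.repository[service.name] = service

    def discover(self, keyword: str) -> List[str]:
        """Service discovery: names containing the keyword."""
        return sorted(name for name in self.repository if keyword in name)

    def call(self, customer: str, name: str) -> None:
        """Metering: count each use of a registered service."""
        if name not in self.repository:
            raise KeyError(f"unknown service: {name}")
        self.usage[(customer, name)] += 1

    def invoice_cents(self, customer: str) -> int:
        """Billing: usage times tariff, summed over the customer's services."""
        return sum(count * self.repository[name].cents_per_call
                   for (who, name), count in self.usage.items()
                   if who == customer)


platform = Platform()
platform.register(Service("logistics-tracking", 2))
platform.register(Service("energy-quota-trade", 10))
for _ in range(3):
    platform.call("acme", "logistics-tracking")
platform.call("acme", "energy-quota-trade")
print(platform.discover("trade"), platform.invoice_cents("acme"))  # ['energy-quota-trade'] 16
```

Prices are kept in integer cents so the bill adds up exactly rather than drifting through floating-point rounding.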

Seamless integration between cloud and mobile apps? Yeah, there is a market for that.

Whitney Grace, July 31, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Written by Stephen E. Arnold · Filed Under Mobile, News

Google Not Winning On Removing Information

July 31, 2014

Google has faced numerous lawsuits about having content removed from search results. International Business Times explains a current battle in the UK in “Google’s Right To Be Forgotten: 70,000 Politicians, Criminals, And Individuals Want Offending Content Erased.” The European Union Court of Justice ruled in May that European citizens have the “right to be forgotten” and thus the right to have their information removed from search results. Google has received over 70,000 takedown requests. Google argues that the right helps people keep their reputations intact and recover from past mistakes.

Google has removed many links related to UK media organizations. As one can imagine, users are not too happy about this, because the common belief is that once something is on the Internet, it should be free to all.

Google says the opposite.

“ ‘The issues at stake here are important and difficult, but we’re committed to complying with the court’s decision,’ writes [David Drummond, Google’s chief legal officer]. ‘Indeed, it’s hard not to empathise with some of the requests that we’ve seen – from the man who asked that we do not show a news article saying that he had been questioned in connection with a crime (he’s able to demonstrate that he was never charged) to the mother who requested that we remove news articles for her daughter’s name as she had been the victim of abuse.’ “

Google has created an advisory council to handle all requests. It even posts the question “How should one person’s right to be forgotten be balanced with the public’s right to know?” on the page. It is a philosophical question, but it appears to be handled on a case-by-case basis. How long will Google be willing to do that?

Whitney Grace, July 31, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Written by Stephen E. Arnold · Filed Under Google, Legal matters, News

Twitter: Short Text Outfit Gets Excited about Computer Vision

July 30, 2014

Robots. Magic stuff like semantic concept lenses. Logical wackiness like software that delivers knowledge.

I read “Twitter Acquires Deep Learning Startup Madbits.” The write up points out to the drooling venture capitalists that Twitter’s purchase is “the latest in a spate of deep learning and computer vision acquisitions that also includes Google, Yahoo, Dropbox, and Pinterest.” What this means is that these oh-so-hot outfits are purchasing companies that purport to have software that can figure stuff out.

I recall a demonstration in Japan in the late 1990s. I was giving some talks in Osaka and Tokyo. One of the groups I met showed me a software system that could process a picture and spit out what was in the picture. I remember that the system was able to analyze a photo of a black and white cow standing in a green pasture.

The software nailed it. The system displayed, in Japanese, 牛. My hosts explained that the ideograph meant “cow.” High fives ensued. On other pictures, the system did not perform particularly well.

Flash forward more than 15 years. In a demonstration of image recognition at an intelligence conference, the system worked like a champ on clear images that allowed the algorithm to identify key points, compute distances, and then scurry off to match the numeric values of one face with those in the demo system’s index. The system, after decades of effort and massive computational horsepower increases, was batting about .500.

The problem is that different pictures have different looks. When the subject is wearing a baseball cap, has grown a beard, or is simply laughing, the system does not do particularly well.
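The matching step described above (measure key points, then find the nearest stored face) can be sketched in Python. The gallery vectors, the distance threshold, and the identify function are hypothetical; production systems use learned features rather than a handful of hand-measured distances.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical enrolled faces: each is a small vector of key-point distances
# (eye to eye, eye to nose, and so on), normalized by face width.
GALLERY: Dict[str, Sequence[float]] = {
    "person_a": (0.42, 0.31, 0.55, 0.27),
    "person_b": (0.39, 0.35, 0.48, 0.30),
}


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def identify(probe: Sequence[float], threshold: float = 0.05) -> Optional[str]:
    """Return the closest enrolled identity, or None if even the best
    match is farther away than the threshold."""
    best_name, best_dist = None, math.inf
    for name, vec in GALLERY.items():
        d = euclidean(probe, vec)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


print(identify((0.43, 0.31, 0.54, 0.27)))  # person_a
print(identify((0.50, 0.25, 0.62, 0.20)))  # None
```

A probe that shifts every measurement a little, as a beard or a laugh might, can land outside the threshold for every enrolled face, and the sketch returns no match.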

You can see how Google performs. Navigate to Google Images, select a picture of a monument, and let Google find matches. Some are spot on. I use a perfect match example in my lectures about open source intelligence tools. I have some snaps in my presentation that do not work particularly well. Here’s an example of a Google baffler:

This is a stuffed pony wearing a hat. Photo was taken in Santiago, Chile at an outdoor flea market.

This is the match that Google returned:

Notice that there were no stuffed horses in the retrieved data set. The “noise” in the original image does not fool a human. Google algorithms are another kettle of fish or booth filled with stuffed ponies.

The Twitter purchase of Madbits (the name suggests swarming or ant methods) delivers some smart folks who have, according to the write up, developed software that:

automatically understands, organizes and extracts relevant information from raw media. Understanding the content of an image, whether or not there are tags associated with that image, is a complex challenge. We developed our technology based on deep learning, an approach to statistical machine learning that involves stacking simple projections to form powerful hierarchical models of a signal.
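The phrase “stacking simple projections” can be illustrated with a tiny Python sketch: each layer is a linear map followed by a ReLU, and each layer’s output feeds the next. The weights are hand-set for illustration; this is not Madbits’ code, and a real network learns its weights from data.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def project(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """One simple projection: a linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def forward(layers: Sequence[Tuple[Matrix, Vector]], x: Vector) -> Vector:
    """Stack projections: each layer's output is the next layer's input."""
    for weights, bias in layers:
        x = project(weights, bias, x)
    return x


# Hypothetical hand-set weights; a trained model would learn these.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # 2 inputs -> 2 features
    ([[1.0, 1.0]], [-0.5]),                   # 2 features -> 1 score
]

print(forward(LAYERS, [2.0, 1.0]))  # [2.0]
```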

Once some demonstrations of Twitter’s scaling of this interesting technology are available, I can run the radiation poisoning test. Math is wonderful except when it is not able to do what users expect, hope, or really want to get.

Marketing is good. Perhaps Twitter will allow me to track down this vendor of stuffed ponies. (Yep, it looked real to me.) I know, I know. This stuff works like a champ in the novels penned by Alastair Reynolds. Someday.

Stephen E Arnold, July 30, 2014

Written by Stephen E. Arnold · Filed Under Image search, News

Quote to Note: Thomson Reuters Sustainable Growth

July 30, 2014

Good news for Thomson Reuters, one of the bellwether outfits for professional publishing and “real” news. The company continues to struggle with flat line revenue. But profits are up. You can read the good news in the Thomson Reuters’ story about Thomson Reuters in “Thomson Reuters Reports Rise in Revenue, Profit.” Nestled comfortably in the story is a quote to note:

“The actions we are taking are building a platform for sustainable growth,” Smith said, “and we will continue to simplify our organization and position resources behind the most promising growth opportunities.”

I will not ask the question, “Is sustainable growth based on flat top line revenue?” I will not ask, “What cost cutting steps are in store for employees in the next six to nine months?”

Thomson Reuters’ frequently rotated executives have been trimming, squeezing, and cutting back for four, maybe five or more years. Other professional publishing companies have been treading the same path, now becoming well worn. The remaining easy reductions may be difficult to identify. Whatever is next may be like being forced to dine at a weight loss clinic in Arizona.

One attaboy for Thomson Reuters: Years ago, when I did some low-level work for them, then top dog Michael Brown and Gene Gartlan paid the bill and were quite professional. I wish to point out that their behavior stands in sharp contrast to that of IDC, a mid-tier consulting firm, which took a different approach toward my work. See my Schubmehl surfing write up at http://bit.ly/1o8XCiF. I am cheering for Thomson Reuters. IDC? Eh, not so much.

Stephen E Arnold, July 30, 2014

Written by Stephen E. Arnold · Filed Under Financial, News

No Search Or Publishing For Science

July 30, 2014

The scientific method is used to approach a problem logically and come to a reasonable conclusion based on the presented evidence. Allow me to present the following question: if only a small percentage of scientists publish their work, does that not distort scientific information? Let us approach this problem in the same manner that Erik Stokstad did in his Science Magazine article “The 1% Of Scientific Publishing.”

Stokstad already knew it was tough to get published in a scientific journal, but he found that only one percent of scientists see their work published on a continuous basis, which equals 150,608 people. The number comes from a study by John Ioannidis of Stanford University and colleagues, who searched Elsevier’s Scopus database of papers published in 1996-2011. Most of these scientists head laboratories, which adds their names to every research project, or they have garnered enough of a reputation to do whatever they want in the scientific community.

What is sad is that new minds are often overlooked:

“But there’s also a lot of grunt work behind these papers that appear like clockwork from highly productive labs. ‘In many disciplines, doctoral students may be enrolled in high numbers, offering a cheap workforce,’ Ioannidis and his co-authors write in their paper. These students may spend years on research that yields, then, only one or a few papers. ‘[I]n these cases, the research system may be exploiting the work of millions of young scientists.’ ”

The findings lead to the conclusion that only a small percentage of scientific research is available. The results are distorted and favor one side of the scale. It is an aggravating thought, especially in the age of digital publishing. You would think that, with virtually unlimited digital space, publishers would no longer be constrained by the economics of paper copies.

Whitney Grace, July 30, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Written by Stephen E. Arnold · Filed Under News, Publishing

Google Manipulating Content And Yelp Is Perturbed

July 30, 2014

Google is manipulating search results so that they favor Google content over Yelp’s. What a big shocker! Not really, so why are some people surprised? TechCrunch covers the story in “Leaked Documents Show How Yelp Thinks It Is Getting Screwed By Google.” To retaliate, Yelp has joined a formal complaint about the leniency of an EU antitrust settlement with Google. In 2013, Google breezed through a similar settlement in the US, which deemed bumping up its own products legal. Yelp and others do not want the same thing to happen in the EU.

The leaked documents show how Google search results differ in the US and in Europe. Check out the TechCrunch article, and you will be amazed at how the US results look like a Google shopping catalog while the EU results are more toned down.

“My source says Yelp believes this is because Google wants to downplay in front of the EU regulators how it manipulates results until a lenient settlement passes. The source says that in some cases, even searching on Google.com from a European IP address will illustrate less aggressive marketing of Google’s own services like Google+, suggesting Google is actively trying to hide these result formats from people in Europe.”

That is just the tip of the iceberg; read the other leaked documents, including a user-behavior study Yelp headed. Google’s tactics are a classic way to steer customers away from competitors’ products. Rather than physically taking a customer away from Yelp, Google alters the information so that people digest Google content and only Google content. The question, then, is whether Google’s tactics interfere with a free market.

Whitney Grace, July 30, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Written by Stephen E. Arnold · Filed Under Google, News

SharePoint is Valuable but Underutilized Legal Tool

July 30, 2014

Document discovery is a big deal in the legal world: it is not only important but also time-consuming. Lots of specialty software exists to help law firms with document management, but one writer believes that SharePoint can do a pretty good job on its own. Jeff Bennion makes his case on AboveTheLaw in his article, “Why SharePoint Is The Most Underutilized Legal Tool That Microsoft Has To Offer.”

Bennion writes:

“SharePoint is an online project management system. It does a lot of things. If you work in a law office, you are going to probably use it for its document management capabilities and maybe its intranet capabilities. If any of you have had the misfortune of working in a large firm, you know what document management systems are. It’s somewhat clunky software that lets you categorize documents firm-wide using preset categories . . . SharePoint solves this problem: upload your documents into SharePoint and create columns and tags for them.”

Not every law firm will have the time or capability to customize SharePoint and train its employees to use it, but many will find this low-cost solution helpful. Stephen E. Arnold highlights ideas like this on the SharePoint feed of his Web site, ArnoldIT.com. He has devoted his career to all things search, and SharePoint plays a large role in the current enterprise search discussion.

Emily Rae Aldridge, July 30, 2014

Written by Stephen E. Arnold · Filed Under News, SharePoint

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
