Automated Understanding: Digital Reasoning Cracks the Information Maze
March 4, 2011
I learned from one reader that the presentation by Tim Estes, the founder of Digital Reasoning, caused some positive buzz at a recent conference on the west coast. According to my source, this was a US government sponsored event focused on where content processing was going. The surprise was that as other presenters talked about the future, a company called Digital Reasoning displayed a next generation system. Keep in mind that i2 Ltd. is a solid analyst’s tool with technology roots that stretch back 15 years. (I did some work for the founder of i2 a few years ago and have a great appreciation for the value of the system in law enforcement case work.) Palantir has some useful visualization tools, but the company continues to attract attention because of litigation and brushes with outfits that have some interesting sales practices. Beyond Search covered this story here and here.
ArnoldIT.com sees Digital Reasoning’s Synthesys as solving difficult information puzzles quickly and efficiently because it eliminates most of the false paths and trial-and-error of traditional systems. In our view, solving the information maze of real world flows is now possible.
The shift was from semi-useful predictive numerical recipes and overlays or augmented outputs to something quite new and different. The Digital Reasoning presentation focused on real data and what the company called “automated understanding.”
For a few bucks last year, one of my colleagues and I got a look at the automated understanding approach of the Synthesys 3 platform. Tim Estes explained that real data poses major challenges to systems that lack an ability to process large flows, discern nuances, and apply what Mr. Estes described as “entity oriented analytics.”
Our take at ArnoldIT.com is that Digital Reasoning moves “beyond search” in a meaningful way. The key points we recall from our briefing were that a modular approach eliminates the need for a massive infrastructure build and that the analytics reflect what is happening in a real time flow of unstructured information. My personal view is that historical research is best served by key word systems. The more advanced methods deliver actionable information and better decisions by focusing on the vast amounts of “now” data. A single Twitter message can be important. A meaningful analysis of a flow of Twitter messages moves insight to the next level.
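The “entity oriented analytics” phrase is easier to grasp with a toy example. The sketch below tallies which entities co-occur across a flow of short messages. It is a generic illustration under my own assumptions, not a description of Digital Reasoning’s Synthesys internals; the crude capitalized-phrase extractor and the sample messages are stand-ins for whatever a real system would use.

```python
# Illustration only: a generic entity co-occurrence tally over a message flow.
# NOT Digital Reasoning's method; the naive capitalized-phrase "extractor"
# and the sample messages are assumptions made for this sketch.
import re
from collections import Counter
from itertools import combinations

ENTITY_PATTERN = re.compile(r"\b[A-Z][a-zA-Z]+(?:\s[A-Z][a-zA-Z]+)*")

def extract_entities(text):
    """Return the set of candidate entities (capitalized phrases) in one message."""
    return set(ENTITY_PATTERN.findall(text))

def entity_cooccurrence(messages):
    """Count how often pairs of entities appear in the same message across the flow."""
    pairs = Counter()
    for message in messages:
        for a, b in combinations(sorted(extract_entities(message)), 2):
            pairs[(a, b)] += 1
    return pairs

if __name__ == "__main__":
    flow = [
        "Digital Reasoning showed Synthesys at a government conference",
        "Palantir and i2 remain the analyst tools of record",
        "Synthesys pitched by Digital Reasoning as automated understanding",
    ]
    for pair, count in entity_cooccurrence(flow).most_common(3):
        print(pair, count)
```

The point of the toy: a single message contributes almost nothing, but the tally across the whole flow is where relationships between entities begin to show.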
Google and Search Tweaks
February 25, 2011
Chatter blizzard! There is a flurry of commentary about Google’s change to cope with outfits that generate content to attract traffic, get a high Google ranking, and deliver information to users! You can read the Google explanation in “Finding More High-Quality Sites in Search” and learn about the tweaks. I found this passage interesting:
We can’t make a major improvement without affecting rankings for many sites. It has to be that some sites will go up and some will go down. Google depends on the high-quality content created by wonderful websites around the world, and we do have a responsibility to encourage a healthy web ecosystem. Therefore, it is important for high-quality sites to be rewarded, and that’s exactly what this change does.
Google faces increasing scrutiny for its display of content from some European Web sites. In fact, one of the companies affected has filed an antitrust complaint against Google. You can read about the 1PlusV matter and the legal information site EJustice at this link (at least for a while; news has a tendency to disappear these days).
Source: http://www.mentalamusement.com/our%20store/poker/casino_accessories.htm
Why did I find this passage interesting?
Well, it seems that when Google makes a fix, some sites go up or down in the results list. Interesting, because as I understand the 1PlusV issue, the site arbitrarily disappeared and then reappeared. On one hand, Google suggests the algorithm does the sorting out. On the other hand, if 1PlusV is correct, human intervention works pretty well.
Which is it? An algorithm that adapts, or a human or two doing their thing independently or as the fingers of a committee?
I don’t know. My interest in how Google indexed Web sites diminished when I realized that Google results were deteriorating over the last few years. Now my queries are fairly specialized, and most of the information I need appears in third party sources. Google’s index, for me, is useful, but it is now just another click on a series of services I must use to locate information.
A good example is trying to locate information about a specific US government program. The lineup of services I had to use to locate the specific item of information I sought included:
- Bing.com
- Blekko.com
- DuckDuckGo.com
- EBSCO Electronic Publishing
- Exalead Search
- Google.com
- Google’s little known Uncle Sam
- Lexis.com
- ProQuest
- USA.gov (Keep in mind that my son is involved with this outfit.)
I also enlisted the help of two specialists, one in Israel and one here in the United States. As you can see, Google’s two services made up only a small fraction of my bibliographic research.
Why?
First, Google’s Web index appears larger to me, but it seems to return hits distorted by search engine optimization tricks such as auto-generated pages. These are mostly useless to me, as are links to sites that contain incorrect information and Web pages for which the link is dead and the content is no longer in the Google cache.
In my experience, this happens frequently when running queries for certain government agencies such as Health and Human Services or the documents for a US Congressional hearing. Your mileage may differ because the topics for which I want information are far from popular.
Second, I need coverage that does not arbitrarily stop after following links a couple of levels deep. Some services like Exalead do a better job of digging into the guts of large sites, particularly for certain European sources.
Third, the Blekko folks are doing a pretty good job of keeping the older information easily identifiable. This date tagging is important to me, and I appreciate either seeing an explicit date or having a link to a page that displays a creation date.
Who Defriended Google?
February 24, 2011
Did Facebook defriend Google? Did Google defriend Facebook? With Xooglers making up about 20 percent of the Facebook staff, the questions are not innocuous. The fate of Google’s new social play may hang in the balance. What are friends for?
Meow.
There’s something catty about how Google has snubbed Facebook in the latest iteration of Google Social. The official blog post to announce the new improvements says not one word about Facebook, the elephant in the room. In “Analysis: Google Social Search Is All About Blocking Facebook/Twitter Search” Tom Foremski’s take is that this
“Google move is better understood as a blocking measure to stop people from asking their social network directly. “
Will it work? Let’s think about it.
Google Social has been around since 2009, but these latest improvements take results that were at the bottom of the screen and place them high up in the search results, add notes for links your connections have shared, and expand the ways you can connect your accounts. Google, of course, always tries to act like it’s taking the high road when it comes to Facebook, stressing that Facebook is a closed system while Google is as open and free as the air we breathe. Personally, I think public data is overrated, and I think many other people do too. Why else is there a huge backlash every time Facebook tries to sneak more openness into its users’ profiles?
What happens when the big dogs set up a pack without a little dog? Answer: Bowling alone.
When I look at Google Social, I have to ask myself if people would choose this over Facebook. Facebook, of course, has momentum on its side since nearly everyone and his grandmother is on Facebook already and accessing it frequently. Another question is how can Google know whose opinion I actually care about when giving me search results?
When Intelligence Methods Go Out of Bounds
February 21, 2011
For many years, the ArnoldIT.com team has supported different next-generation technology firms. It is important to go “beyond search”, but the question is, “How far should an organization go to get work to keep revenues flowing?” We have worked for some interesting US government agencies. We built plumbing for a couple of information “push” systems that bridged the gap between search and actionable intelligence. In the course of that work, we have been successful in separating commercial work from intelligence work.
Fact is, most of the companies of which we have some knowledge operate in a similar manner. Keeping the commercial application of technology distinct from the non-commercial application of technology has been a standard practice. No one told me to keep the work distinct. The learning was imparted by culture, first at the nuclear unit of Halliburton and later at the technology unit of the original, pre-break up Booz, Allen & Hamilton.
In search and content processing technology, life has become more complicated for three reasons. First, there is intense pressure on firms with next generation technology to generate revenue. Information processing software is among the most costly to develop, enhance, and enrich. With that pressure for funding comes some different expectations about what to do to pay the bills.
Second, there is more awareness of what can be done with flows of data processed by next generation systems. Even the least sophisticated Web search user recognizes that the ads are either related to the subject of the search or reasonably pertinent to the particular user.
Source: My home town newspaper. The Peoria Journal Star.
Going after the ball when out of bounds.
Third, the cultural boundaries of distinct information communities are becoming more porous. Information technology osmosis is now a fact of life.
When one combines these three factors, one consequence has been the disquieting disclosure that a number of firms appear to be using certain types of information technology in ways that run counter to expectations. The example fresh in my mind is the disclosure of emails, PowerPoints, and chit chat about the use of next generation information technology to “bring down” Wikileaks and individuals associated with that Web site.
What Has Google Learned from Salesforce.com?
February 15, 2011
Several years ago, a Googler made a comment about Salesforce.com. My recollection is that the person, no longer laboring in the Elysian Fields, said, “We really like those guys.” The “those guys” referred to Salesforce.com. In one of my Google briefings, I mentioned that Google was not yet ready to get married to Salesforce.com. Google did some flirting but nothing serious.
I read the oddly named article “Could Google Risk Taking on the CRM Market?” Despite the headline, the main idea of the article is sound. The question becomes, “Will Google compete with the big dogs in customer relationship management?” Now CRM is an ambiguous phrase. For some poobahs, CRM is nothing more than a reason to sell a search and retrieval system so the “customer” can look up his or her own answers. The idea is for the licensee to fire staff and push as many of the annoying queries from paying customers as possible onto the customers themselves and low wage workers. CRM is a refuge for search and retrieval vendors who find the competition too stiff for the big jobs, which go to companies with modern, scalable systems that work as information platforms. CRM is a smaller fish pond, and the fish are not the predators found in the Fortune 1000 market.
I see the future… Source: http://reason.com/blog/2011/01/10/the-cbos-crystal-ball
Others, like Salesforce.com, see CRM as a way to keep track of prospects, proposals, and the detritus essential to closing a deal. Sales professionals, as you may know, live or die by their contacts. Putting those contacts in a system that the boss can tap for reports is a hot idea for some senior managers. Mercenary sales professionals use systems like Salesforce.com to manage their professional life. No matter what happens on the job or when a laptop is ripped off, the sales person’s contacts are safe in the Salesforce.com cloud.
The Wages of SEO Sin
February 13, 2011
So Google can be fooled. It’s not nice to fool Mother Google. The inverse, however, is not accurate. Mother Google can take some liberties. Any indexing system can. Objectivity is in the eye of the beholder or the person who pays for results.
Judging from the torrent of posts from “experts”, the big guns of search are saying, “We told you so.” The trigger for this outburst of criticism is the New York Times’s write up about JC Penney. You can try this link, but I expect that it and its SEO crunchy headline will go dark shortly. (Yep, the NYT is in the SEO game too.)
Everyone from AOL news to blog-o-rama wizards is reviling Google for not figuring out how to stop folks from gaming the system. Sigh.
I am not sure how many years ago I wrote the “search sucks” article for Searcher Magazine. My position was clear long before the JC Penney affair and the slowly growing awareness that search is anything BUT objective.
Source: http://www.brianjamesnyc.com/blog/?p=157
In the good old days, database bias was set forth in the editorial policies for online files. You could disagree with what we selected for ABI/INFORM, but we made an effort to explain what we selected, why we selected certain items for the file, and how the decision affected assignment of index terms and classification codes. The point was that we were explaining the mechanism for making a database which we hoped would be useful. We were successful, and we tried to avoid the silliness of claiming comprehensive coverage. We had an editorial policy, and we shaped our work to that policy. Most people in 1980 did not know much about online. I am willing to risk this statement: I don’t think too many people in 2011 know about online and Web indexing. In the absence of knowledge, some remarkable actions occur.
You don’t know what you don’t know or the unknown unknowns. Source: http://dealbreaker.com/donald-rumsfeld/
Flash forward to the Web. Most users assume incorrectly that a search engine is objective. Baloney. Just as we set an editorial policy for ABI/INFORM, each crawler and content processing system has similar decisions beneath it.
The difference is that at ABI/INFORM we explained our bias. The modern Web and enterprise search engines don’t. If a system tries to explain what it does, most of the failed Web masters, English majors working as consultants, and unemployed lawyers turned search experts just don’t care.
Search and content processing are complicated businesses, and the gory details about certain issues are of zero interest to most professionals. Here’s a quick list of “decisions” that must be made for a basic search engine (a minimal sketch of such a policy appears after the list):
- How deep will we crawl? Most engines set a limit. No one, not even Google, has the time or money to follow every link.
- How frequently will we update? Most search engines have to allocate resources in order to get a reasonable index refresh. Sites that get zero traffic don’t get updated too often. Sites that are sprawling and deep may get three or four levels of indexing. The rest? Forget it.
- What will we index? Most people perceive the various Web search systems as indexing the entire Web. Baloney. Bing.com makes decisions about what to index and when, and I find that it favors certain verticals and trendy topics. Google does a bit better, but there are bluebirds, canaries, and sparrows. Bluebirds get indexed thoroughly and frequently. See Google News for an example. For Google’s Uncle Sam, a different schedule applies. In between, there are lots of sites and lots of factors at play, not the least of which is money.
- What is on the stop list? Yep, a list can kill index pointers, making the site invisible.
- When will we revisit a site with slow response time?
- What actions do we take when a site is owned by a key stakeholder?
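To make the point that these are ordinary engineering decisions rather than dark arts, here is a minimal sketch of a hypothetical crawl policy reduced to a few fields and one “should we fetch this page” check. Every field name and threshold is invented for illustration; no real engine publishes its numbers, and stakeholder-owned sites would need policy outside the code entirely.

```python
# Hypothetical crawl policy sketch. Field names and thresholds are invented
# for illustration; they are not taken from any real search engine.
from dataclasses import dataclass, field
from urllib.parse import urlparse

@dataclass
class CrawlPolicy:
    max_depth: int = 3                                  # how deep will we crawl?
    refresh_days: int = 14                              # how frequently will we update?
    allowed_hosts: set = field(default_factory=set)     # what will we index (empty = everything)?
    stop_list: set = field(default_factory=set)         # hosts we refuse to index
    slow_site_backoff_days: int = 60                    # when do we revisit a slow site?

    def should_fetch(self, url, depth, days_since_last_fetch, last_response_seconds):
        host = urlparse(url).netloc
        if host in self.stop_list:                      # the stop list kills index pointers
            return False
        if self.allowed_hosts and host not in self.allowed_hosts:
            return False
        if depth > self.max_depth:                      # coverage stops a few levels deep
            return False
        if last_response_seconds > 10:                  # slow sites wait for the backoff window
            return days_since_last_fetch >= self.slow_site_backoff_days
        return days_since_last_fetch >= self.refresh_days

policy = CrawlPolicy(max_depth=4, stop_list={"example-banned-site.com"})
print(policy.should_fetch("http://example.gov/report.html", depth=2,
                          days_since_last_fetch=30, last_response_seconds=1.2))
```

Trivial as the sketch is, every one of those fields shapes what a user ultimately sees in a results list, which is the whole point about objectivity.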
Synthesys Platform Beta Available
February 7, 2011
Digital Reasoning alerted us last week that a new beta program for the Synthesys Platform is available. Digital Reasoning has emerged as one of “the leaders in complex, large scale unstructured data analytics.” We have interviewed the founder of Digital Reasoning in our Search Wizards Speak series; the interviews are available on ArnoldIT.com here and here. Digital Reasoning is one of the leaders in making next-generation analytics available via the cloud, on premises, and hybrid methods.
© Digital Reasoning, 2011
This platform version of Digital Reasoning’s software will provide beta users immediate API-level access to the firm’s analytics software and access to tools that will be added through the beta program.
Matthew Russell, vice president of engineering at Digital Reasoning, said:
We are excited to introduce Synthesys Platform to the market. By allowing users to upload their data into the cloud for analysis, many more users will get the opportunity to experience next generation data analytics while exploring their own data.
Digital Reasoning Systems (www.digitalreasoning.com) solves the problem of information overload by providing the tools people need to understand relationships between entities in vast amounts of unstructured and structured data.
Digital Reasoning builds data analytic solutions based on a distinctive mathematical approach to understanding natural language. The value of Digital Reasoning is not only the ability to leverage an organization’s existing knowledge base, but also to reveal critical hidden information and relationships that may not have been apparent during manual or other automated analytic efforts. Synthesys is a registered trademark of Digital Reasoning Systems, Inc.
Digital Reasoning will be exhibiting at the upcoming Strata Conference on February 28 and March 1, 2011. For more information about Digital Reasoning, navigate to the company’s Web site at www.digitalreasoning.com.
Stephen E Arnold, February 7, 2011
Arnold Columns for February 2011
January 31, 2011
The ArnoldIT.com team has completed Stephen E Arnold’s for-fee columns for February 2011. These articles will run any time between mid-February 2011 and the end of April 2011. Print publications have longer production processes. Online versions of the columns may appear at different intervals.
This month’s topics by journal, tabloid, or online magazine are:
- For Enterprise Technology Management, the column talks about Google and its compound documents. Quite a search and retrieval challenge brewing, we think. We don’t have an answer to searching compound documents when legal discovery kicks in, but we raise some questions for US readers and non-US companies with offices in the USA.
- For Information Today, this month’s column takes a look at discovery services that have moved from the Department of Defense to a library near you. Our focus is on EBSCO, a giant in the commercial database and information services world. Librarians will like this write up.
- For KMWorld, the column talks about the semantic challenges of the new content types. We highlight Expert System, an Italian outfit with some nifty semantic technology.
- For Searcher Magazine, we took our 1999 essay about Internet video, critiqued it, and identified our errors. Then we looked at what seems to be the trajectory of today’s Internet video options. The question we answer is, “Is Internet video viable yet?” Yes, we discuss Google TV. Wow, what a product.
- For Smart Business Network, “Groupon: The Social Coupon Revolution.” The write up describes Groupon.com, mentions Living Social, and references Google’s forthcoming social coupon service, Google Offers. The column explains what businesses are more likely to succeed with social coupons and which are more likely to achieve unsatisfactory results.
Social Media, Niches, and Search
January 14, 2011
MySpace seems to be struggling. “Struggling” may be the wrong word, particularly if you were one of the hundreds of employees nuked in the riffing a few days ago. You can get the “real” news from Silicon Republic’s “MySpace Confirms Layoffs of 500 Staff Members.” I never was a MySpacer.
Image Source: http://www.webguild.org/20110104/myspace-to-layoff-50-of-employees. Good write up too.
The last time I watched a demo, the page on display flickered and brayed noise. What I do associate with MySpace are:
- iPad publisher and financial expert Rupert Murdoch paid $580 million for the ur-Facebook.
- Mr. Murdoch’s comment: “The world is changing very fast. Big will not beat small anymore. It will be the fast beating the slow.” Source: Woopidoo.com here. I think of this quote when I read about Facebook’s Goldman Sachs’s deal and the MySpace “challenge”.
But what is interesting is that social media content is moving into a walled garden. Facebook’s content has value partly because of its walled garden. Even Google’s shift in support for content on YouTube reminds me of an exclusionary move. The chatter about curation, filtering, and controls translates in my addled goose brain to a shift from open to closed.
This has several implications for search:
First, the idea of going one place to access content is getting more and more difficult. I think the hurdles posed by registration processes and other methods of capturing “value” are building blocks of a new type of digital real estate: the private park, the walled garden, and a snooty country club. Who will be able to access which service for information?
Second, search is going to require a user to run the same query in different systems and then aggregate the meaningful results. Federated search is going to be increasingly important. Few users will tolerate manual hunting for a content collection, registering and maybe paying for access, and then figuring out what to do with results from different collections.
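The aggregation step itself is not mysterious. Here is a bare-bones sketch of fanning one query out to several engines and deduping the merged hits. The two “engines” are placeholder functions of my own; each real service (Bing, Blekko, Exalead, and the rest) has its own API, keys, and terms of use.

```python
# Bare-bones federated search sketch. The engine functions are placeholders;
# real services each require their own API, credentials, and terms of service.
from concurrent.futures import ThreadPoolExecutor

def search_engine_a(query):
    # Placeholder: call engine A's API and return a list of (title, url) hits.
    return [("Hearing transcript", "http://example.gov/hearing"),
            ("Stale page", "http://example.com/a")]

def search_engine_b(query):
    # Placeholder: call engine B's API and return a list of (title, url) hits.
    return [("Hearing transcript", "http://example.gov/hearing"),
            ("Agency report", "http://example.gov/report")]

def federated_search(query, engines):
    """Run the same query against every engine, then merge and dedupe by URL."""
    seen, merged = set(), []
    with ThreadPoolExecutor() as pool:
        for hits in pool.map(lambda engine: engine(query), engines):
            for title, url in hits:
                if url not in seen:          # keep the first copy of each URL
                    seen.add(url)
                    merged.append((title, url))
    return merged

if __name__ == "__main__":
    for title, url in federated_search("US government program",
                                       [search_engine_a, search_engine_b]):
        print(title, url)
```

The hard parts in practice are exactly the ones the sketch skips: authentication, paid collections, rate limits, and deciding which duplicate to keep.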
A bastion of the old way in the online walled garden business.
Third, because of the difficulty in accessing content, users—particularly in North America—will create an interesting new market for snippets, digests, and nuggets. Research for some people will become variations on “Farmville for Dummies” and “How to Lose 10 Pounds in 10 Days.”
What I find really interesting is that “the Internet” seems to be shaping into a variant of the original online industry. Content islands have to be visited by a researcher on a digital cruise ship. The pricing, access methods, and restrictions will vary. I thought research the pre-Internet way was gone for good.
Nope.
It’s 1980 all over again.
Stephen E Arnold, January 14, 2011
Freebie
A Trend in 2011: Criticizing Google?
January 5, 2011
The Atlantic Monthly is jumping into the digital world. What better way to rack up clicks than to tackle a subject that will work like the best a search engine optimization expert can craft. Navigate to “Is Google Too Big?” The write up explains that some folks think Google is, well, too big. On the other hand, some folks think that Google is just fine. After 153 years of operation, the Atlantic Monthly—er, Atlantic Wire—let me know that Google is either too big or not too big. I whipped out one of my candy colored 4X6 note cards and jotted down: “Google, either too big or just right.”
Does anyone remember those ads about a strong person picking on a weaker person? Some companies are now picking on Google, which is far from a weakling in our opinion.
When I was a college debater, I treasured factoids that I could use to crush my opponents’ arguments. One never knows when a “too big, just right” factoid will come in handy. For my part, the Google has been chugging along for 11, 12 years. Google has not changed all that much in the last five years. What’s changed is that folks are now understanding the importance of infrastructure, the third party payer model, the importance of integrated services, and usage tracking.
Light bulbs have been operating on a time delay. Too bad the room illuminated has been lived in for a long time by the Math Club members. Room occupied. Look elsewhere. But picking on Google is au courant.
Here’s a run down of the “challenges” Google faces in 2011: