A Fatter Big Brother? Search, Surveillance, and More
September 19, 2010
Big Brother. The Man. Spies. Each of these buzzwords conjures up many things in people’s minds. Who are they? What exactly do they do? Are they watching me and recording every move I make? To most, this is silly paranoia. But the points Ms. Julienne Eden Bušić makes in her eye-opening article “Big Brothers of Multiculturalism” got me thinking.
Consider this excerpt from the article:
The time I got into an argument with a waiter named Tony at a restaurant behind the Votiv Church and was escorted roughly out, never to return. They must have been snickering at my indignation, these omnipresent agents. Who does she think she is? Creating a ruckus, disturbing the other guests? Another time at the Prater amusement park, had they been there, too, when I had….oh the indignity of it all! It was bad enough that I remembered, but to think that others remembered, too, that they had written down all the gory details in a secret report so that others could visualize it as well….that they had then talked about it with still more people, perhaps their wives or colleagues, chuckled again about the “American girl”, her scandalous behavior, her embarrassment, excessiveness…this was unbearable. Who did she think she was, anyway? On the other hand, many of the other dossier allegations, observations, statements, conclusions were total fabrications, less believable than if they had written that I’d suddenly grown a long, hairy tail and sprouted horns, and intended quite obviously to gain praise from one’s boss, or perhaps a raise in position or salary. So how effective, after all, was the notorious spy agency, if its actions were predicated upon some agent’s literary flights of fancy?
The exact time frame of the events described is unknown, as is what other information was gathered during the same period. The real question, then, is: what is the quality of the information gathered? Is it from a trusted source? How reliable are the “facts”? Someone gathered the information from somewhere, but was it handled correctly? It is also a safe bet that some of what was recorded was not accurate at all but an outright fabrication of someone’s mind.
Fast forward to 9/11. After the attacks on America, the Federal Government shifted into ultra-high gear; overdrive is an understatement. The scale of the effort and investment is captured by Defense Secretary Robert Gates: “We did as we so often do in this country…the attitude was, if it’s worth doing, it’s probably worth overdoing.”
The article is a thought starter, if largely unverified. It is interesting to consider search, surveillance, and content processing in the context of Eden Bušić’s remarks.
Glenn Black, September 19, 2010
Freebie
Twitter Morphs into an Application
September 16, 2010
The Web pundits are in full stampede mode. Twitter, beloved of those who live and breathe real-time connectivity, has changed from a Fail Whale into an application. You can get a useful summary of the new features in “How Twitter.com Gives Your Favorite App a Run for Its Money.” The idea is that one does not need a service like Collecta.com or one of the dozens of other Twitter-attuned services to make sense of the tweet stream. Nope. You can do it all from Twitter. I find this development interesting for three reasons:
- The new layout makes monetization options blossom like dogwoods the week before the Kentucky Derby.
- The Twitter-centric services will have to put on their innovation sneakers and get moving. Twitter, long content to deal with stability issues and explaining what tweets are, is on the move.
- The shift takes another chunk out of the hide of traditional keyword search. The narrowing by hash tags, the social component, the following—each of these makes a Boolean query look like a Babylonian clay tablet.
With complexity overwhelming many computer users, a service that becomes an application runs the risk of feature-itis. I find the new service quite interesting, but it tells me more about how companies like Twitter are reacting to the laundry list approach to finding information. That’s what makes the goose paddle faster.
And it is “real time.” That’s a fuzzy concept, but it mashes up info in an app. Sort of new, methinks.
Stephen E Arnold, September 16, 2010
Freebie
More Reassurances about Google Instant
September 14, 2010
Me thinks some doth protest too much. Apologies to Billy Shakespeare but the stories running in the “real” media’s Web sites and blog posts are catching my attention. From the goose pond, I see Google Instant as a marketing play, a service designed to pump up revenues, and a reminder that Googlers can have a potentially fatal disease called “feature-itis”.
You make up your own mind. Navigate to “Google: Concerns over Instant Unwarranted.” For professional journalists, the article is a long one. It has two parts. The story is an interview with a Googler wrapped in well-crafted rhetorical bookends. No problem. I could, if I were motivated, identify a quote to note in the verbiage.
Instead, I noted this passage:
As tends to happen whenever Google introduces a potentially disruptive technology, a debate has sprouted, in this case focused on how Instant potentially changes three things: the way publishers optimize their pages to rank in Google results; the way marketers pick and bid on keywords for search ad campaigns; and the way end users articulate queries and review results.
I look at this from the perspective of an addled goose and ask, “Why bother?” I recall the motto of one of the rich guys I used to work for before he keeled over from a stroke in one of his more interesting business facilities: “Never complain. Never explain.” I read the article and noted both complaining (well, maybe just whining) and quite a bit of explaining.
Instant is for me a feature that strikes at the heart of search engine optimization’s base camp, gives the Google a reason to captivate the world’s media with crazy statements about saving billions of hours when searching, and triggers a “debate”. The reaction is interesting because it really means little to me.
What it tells me is this:
First, Google wants to capture headlines and attention after the holiday weekend. Mission accomplished. Good job, marketing department.
Second, Google has not really innovated, because Instant strikes me as rewarding big companies and deep pockets. With Instant running, one has to focus in order to get a complete, original query into the search box and launched. The suggestions method will appeal to a certain type of Google user. Other types of Google users may shift to advanced search or just use a different service, of which there are quite a few, gentle reader.
Third, the Instant function does not address the increasing problems I have experienced in getting fresh, precise, and relevant results. For example, I ran a series of queries on Google, on a competitor called DuckDuckGo.com, and on the Xoogler site Cuil.com. Guess what I found. On my test topic, related to what are called “RACs” (recovery audit contractors) in the health care business, I was able to obtain more relevant results on those services than on Google.
In short, as with Buzz and Wave, the benefits to me are not great. Therefore, the volubility of Google about Instant suggests that what looked so good over Odwallas may be having some unexpected consequences. Words won’t address these. Cats out of the bag are tough to recapture.
Stephen E Arnold, September 14, 2010
Freebie
Jetwick Twitter Search
September 6, 2010
Peter Karich’s DZone article “Twitter Search Jetwick – Powered by Wicket and Solr,” in which he narrates his experience creating Jetwick, provides some insight into the steps required to build a Twitter search application. He noted, “From a quick-start project to production many if not all things can change.”
The article recounts the task of creating Jetwick, a service that finds similar Twitter users based on their tweeted content; the prototype came together in only a week. Some extra tweaking, including a switch to Solr for facets, was done in a couple of hours. User trials, however, highlighted the need for layout changes, which were achieved by switching to another Web UI and took a couple of days. Then rectifying a problem with db4o took about a week, ultimately prompting a switch to Hibernate that took another couple of weeks. Finally, when Twitter released a similar service, the base concept of Jetwick had to be changed from user search to regular tweet search. The final production system had changed considerably from the prototype. We found this a useful case example.
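For readers who want to picture the Solr step, here is a minimal sketch of the kind of facet query a tweet search service such as Jetwick might issue. This is our illustration, not Karich’s code; the core name and field names are hypothetical, while the facet parameters themselves are standard Solr.

```python
# Minimal sketch: query a Solr core of tweets and facet on hashtag.
# Core name ("tweets") and field names ("text", "tag", "screen_name")
# are hypothetical; the facet.* parameters are standard Solr syntax.
import requests

SOLR_URL = "http://localhost:8983/solr/tweets/select"

params = {
    "q": "text:wicket",                      # full-text match on tweet body
    "rows": 10,                              # top ten matching tweets
    "facet": "true",                         # turn faceting on
    "facet.field": ["tag", "screen_name"],   # facet on hashtag and author
    "facet.mincount": 1,                     # suppress empty buckets
    "wt": "json",                            # ask for a JSON response
}

data = requests.get(SOLR_URL, params=params).json()

for doc in data["response"]["docs"]:
    print(doc.get("screen_name"), "-", doc.get("text"))

# Solr returns facet counts as a flat [value, count, value, count, ...] list.
tags = data["facet_counts"]["facet_fields"]["tag"]
for tag, count in zip(tags[::2], tags[1::2]):
    print(tag, count)
```

Facets are what give a tweet stream its navigational handles (top hashtags, most active users) without a second processing pass, which is presumably why the switch to Solr was worth those couple of hours.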
Leena Singh, September 6, 2010
ZL Systems and TREC
August 13, 2010
I rarely write about TREC, the text retrieval conference “managed” by NIST (the US Department of Commerce’s National Institute of Standards and Technology). The participants in the “tracks,” as I understand the rules, may not use the data for Madison Avenue-style cartwheels and reality distortion exercises.
The TREC work is focused on what I characterize as “interesting academic exercises.” Over the years, the commercial marketplace has moved in directions different from the activities of the TREC “tracks.” A TREC exercise is time consuming and expensive, and the results are difficult for tire kickers to figure out. In the last three years especially, the commercial market has moved in a manner different from academic analyses. You may recall my mentioning that Autonomy had 20,000 customers and that Microsoft SharePoint has tens of millions of licensees. Each license contains search technology, and SharePoint cultivates a fiercely competitive ecosystem of vendors working to “improve” its findability. Google is chugging along without much worry about what’s happening outside the Googleplex unless it involves Apple, money, and lawyers. In short, research is one thing. Commercial success is quite another.
I was, therefore, interested to see “Study Finds that E-Discovery Using Enterprise-Wide Search Improves Results and Reduces Costs.” The information about this study appeared in ZL Technologies’ blog The Modern Archivist in June 2010. You can read the story “New Scientific Paper for TREC Conference,” which was online this morning (August 10, 2010). In general, information about TREC is hard to find. Folks who post links to TREC presentations often find that the referenced document is a very short item or no longer available. However, you can download the full “scientific paper” from the TREC Web site.
The point of the ZL write up is summarized in this passage:
Using two fully-independent teams, ZL tested the increased responsiveness of the enterprise-wide approach and the results were striking: The enterprise-wide search yielded 77 custodians and 302 responsive email messages, while the custodian approach failed to identify 84% of the responsive documents.
The goose translates this to mean that there’s no shortcut when hunting for information. No big surprise to the goose, but probably a downer to those who like attention deficit disorder search systems.
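As a back-of-the-envelope check, the reported numbers can be turned into an approximate recall figure. A small sketch follows, assuming the enterprise-wide result (302 responsive messages) stands in for the full responsive set; the paper may define the denominator differently.

```python
# Rough arithmetic on the figures in the ZL passage.
# Assumption: the 302 messages found by enterprise-wide search
# approximate the complete responsive set.
responsive_total = 302          # found by the enterprise-wide search
missed_fraction = 0.84          # share the custodian approach missed

missed = round(responsive_total * missed_fraction)    # about 254 messages
found = responsive_total - missed                     # about 48 messages
recall = found / responsive_total                     # about 0.16

print(f"Custodian approach: ~{found} of {responsive_total} "
      f"responsive messages (recall about {recall:.0%})")
```

In other words, if the figures hold, the custodian-by-custodian route surfaced roughly one responsive message in six.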
So what’s a ZL Technologies? The company says:
[It] provides cutting-edge enterprise software solutions for e-mail and file archiving for regulatory compliance, litigation support, corporate governance, and storage management. ZL’s Unified Archive offers a single unified platform to provide all the above capabilities, while maintaining a single copy and a unified policy across the enterprise. With a proven track record and enterprise clients which include top global institutions in finance and industry, ZL has emerged as the specialized provider of large-scale email archiving for eDiscovery and compliance.
Some information about TREC 2010 appears in “TREC 2010 Web Track Guidelines”. The intent is to describe one “track”, but the information provides some broader information about what’s going on for 2010. The “official” home page for TREC may be useful to some Beyond Search readers.
For more TREC information, you will have to attend the conference or contact TREC directly. The goose is now about to get his feathers ruffled about the availability of presentations that point out that search and retrieval has a long journey ahead.
Reality is often different from what the marketers present in my opinion.
Stephen E Arnold, August 12, 2010
Freebie
Minority Report and Reality: The Google and In-Q-Tel Play
August 9, 2010
Unlike the film “Minority Report,” predictive analytics are here and now. More surprising to me is that most people don’t realize that the methods are in the category of “been there, done that.”
I don’t want to provide too much detail about predictive methods applied to military and law enforcement. Let me remind you, gentle reader, that using numerical recipes to figure out what is likely to happen is an old, old discipline. Keep in mind that the links in this post may go dead at any time, particularly the link to the Chinese write up.
There are companies that have been grinding away in this field for a long time. I worked at an outfit that had a “pretzel factory.” We did not make snacks; we made predictions along with some other goodies.
In this blog I have mentioned over time companies that operate in this sector; for example, Kroll (recently acquired by Altegrity) and Fetch Technologies. Now that’s a household name in Sioux City and Seattle. I have even mentioned a project on which I worked, which you can ping at www.tosig.com. Other hints and clues are scattered like wacky Johnny Appleseed trees. I don’t plan on pulling these threads together in a free blog post.
© RecordedFuture, 2010. Source: http://www.analysisintelligence.com/
I can direct your attention to the public announcement that RecordedFuture has received some financial Tiger Milk from In-Q-Tel, the investment arm of the US Central Intelligence Agency. Good old Google, via its ventures arm, has added some cinnamon to the predictive analytics smoothie. You can get an acceptable rundown in Wired’s “Exclusive: Google, CIA Invest in ‘Future’ of Web Monitoring.” I think you want to have your “real journalist” baloney detector on, because In-Q-Tel invested in RecordedFuture in January 2010, a fact disclosed on the In-Q-Tel Web site many moons ago. RecordedFuture also has a Web site at www.recordedfuture.com, rich with marketing mumbo jumbo, a video, and some semi-useful examples of what the company does. I will leave the public Web site to readers with some time to burn. If you want an attention deficit disorder injection, here you go:
The Web contains a vast amount of unstructured information. Web users access specific content of interest with a variety of Websites supporting unstructured search. The unstructured search approaches clearly provide tremendous value but are unable to address a variety of classes of search. RecordedFuture is aggregating a variety of Web-based news and information sources and developing semantic context enabling more structured classes of search. In this presentation, we present initial methods for accessing and analyzing this structured content. The RJSONIO package is used to form queries and manage response data. Analytic approaches for the extracted content include normalization and regression approaches. R-based visualization approaches are complemented with data presentation capabilities of Spotfire.
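The abstract mentions forming queries and managing responses as JSON, there via R’s RJSONIO package. As a rough Python illustration of that query-as-JSON pattern, here is a hedged sketch; the endpoint, token, and field names are placeholders of my own, not RecordedFuture’s actual API.

```python
# Hedged sketch of the query-as-JSON pattern described in the abstract.
# Endpoint, token, and field names are hypothetical placeholders.
import json
import urllib.request

API_URL = "https://api.example.com/query"        # placeholder endpoint

query = {
    "token": "YOUR_API_TOKEN",                   # placeholder credential
    "instance": {"type": "Event", "limit": 20},  # placeholder query body
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    payload = json.load(resp)

# Normalize the nested JSON into flat rows before any regression or
# visualization step (the abstract mentions Spotfire for the latter).
rows = [
    {"time": item.get("time"), "entity": item.get("entity")}
    for item in payload.get("instances", [])
]
print(rows[:5])
```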
Storm Warnings for OneRiot?
August 2, 2010
Short honk: Search and content processing vendors face a tough market. Some outfits have figured out how to make money; two examples are Autonomy and Exalead. Google is an ad agency and is not knocking the socks off the enterprise search crowd. Microsoft is stuffing search into everything in hopes of selling CALs, so its efforts don’t line up with what other outfits do. IBM is in the open source search horde. Real-time search outfits like OneRiot.com had an angle, but if “Layoffs, Reshuffle at OneRiot” is on the money, OneRiot.com may be struggling. I like OneRiot.com. I don’t want to put it on the list of former search engines alongside Convera, Delphes, and Entopia. Here’s hoping, but there are four outfits in Europe hanging by a thread. Tough times, the 2010s.
Stephen E Arnold, August 2, 2010
Freebie
Exclusive Interview: Mike Horowitz, Fetch Technologies
July 20, 2010
Savvy content processing vendors have found business opportunities where others did not. One example is Fetch Technologies, based in El Segundo, California. The company was founded by professors at the University of Southern California’s Information Sciences Institute. Since the firm’s doors opened in the late 1990s, Fetch has developed a solid clientele and a reputation for cracking some of the most challenging problems in information processing. You can read an in-depth explanation of the Fetch system in the Search Wizards Speak interview with Mike Horowitz.
The Fetch solution uses artificial intelligence and machine learning to navigate user-specified Web sites and extract specific data from them. Users create “Web agents” that accurately and precisely extract specific data from Web pages. Fetch agents are unique in that they can navigate through form fields on Web sites, allowing access to data in the Deep Web, which search engines generally miss.
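To make the form-navigation idea concrete, here is an illustrative sketch of the general technique, not Fetch’s actual agent engine. The URL, form fields, and CSS selectors are hypothetical.

```python
# Illustrative sketch of a "Web agent" of the general sort described:
# submit a search form, then extract specific fields from the results.
# Target URL, form fields, and selectors are hypothetical.
import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Step 1: post the form that a conventional crawler would stop at.
result_page = session.post(
    "https://example.com/search",               # placeholder form target
    data={"query": "widgets", "region": "US"},  # placeholder form fields
)

# Step 2: pull out only the fields of interest, not the whole page.
soup = BeautifulSoup(result_page.text, "html.parser")
records = [
    {
        "name": row.select_one(".name").get_text(strip=True),
        "price": row.select_one(".price").get_text(strip=True),
    }
    for row in soup.select("div.result")        # one div per result row
]
print(records)
```

The hard part a commercial system presumably must handle is keeping such agents accurate when page layouts change; a hand-rolled sketch like this one would break on the first redesign.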
You can learn more about the company and its capabilities in an exclusive interview with Mike Horowitz, Fetch’s chief product officer. Mr. Horowitz joined Fetch after a stint at Google.
In the lengthy discussion with Mr. Horowitz, he told me about the firm’s product line up:
Fetch currently offers Fetch Live Access as an enterprise software solution or as a fully hosted SaaS option. All of our clients have one thing in common, and that is their awareness of data opportunities on the Web. The Internet is a growing source of business-critical information, with data embedded in millions of different Web sites – product information and prices, people data, news, blogs, events, and more – being published each minute. Fetch technology allows organizations to access this dynamic data source by connecting directly to Web sites and extracting the precise data they need, turning Web sites into data sources.
The company’s systems and methods make use of proprietary numerical recipes. Licensees, however, can program the Fetch system using the firm’s innovative drag-and-drop programming tools. One of the interesting insights Mr. Horowitz gave me is that Fetch’s technology can be configured and deployed quickly. This agility is one reason why the firm has such a strong following in the business and military intelligence markets.
He said:
Fetch allows users to access the data they need for reports, mashups, competitive insight, whatever. The exponential growth of the Internet has produced a near-limitless set of raw and constantly changing data, on almost any subject, but the lack of consistent markup and data access has limited its availability and effectiveness. The rise of data APIs and the success of Google Maps have shown that there is an insatiable appetite for the recombination and usage of this data, but we are only at the early stages of this trend.
The interview provides useful insights into Fetch and includes Mr. Horowitz’s views about the major trends in information retrieval for the last half of 2010 and early 2011.
Now, go Fetch.
Stephen E Arnold, July 20, 2010
Freebie. I wanted money, but Mr. Horowitz provided exclusive screen shots for my lecture at the Special Libraries Association conference in June and then for my briefings in Madrid for the Department of State. Sigh. No dough, but I learned a lot.
The Flux in Free Search
July 16, 2010
I liked the good old days when the azure chip crowd and the data satraps would point out that Google was number one in Web search. For most people, the idea that Google.com was the number one place to go when looking for information was okay. Life was simple and the PageRank method generated useful results for most queries. Something was working if two thirds of the search traffic went to the Mountain View outfit, right?
Now something is changing, and I am not sure I like the shift.
First, I read Fast Company’s article “Twitter Now the World’s Fastest Growing Search Engine.” The key factoid comes from Biz Stone (great name for sure). He suggested that Twitter fields 800 million search queries per day, or 24 billion queries per month. Google, according to my addled estimates, is in the billions per day. The key point is that Twitter continues to gain search traction. Twitter is an information utility. Each time the addled goose writes a goose-based post like this one, we fire it out to Twitter. Believe it or not, people tweet about our articles. Yesterday our Yahoo story was fired around. I am not sure whether that helps or hurts Beyond Search, but it is interesting to me.
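For what it is worth, Mr. Stone’s two figures are internally consistent, as a quick check shows; the 30-day month is my assumption.

```python
# Sanity check: 800 million queries a day is 24 billion a month,
# assuming a 30-day month.
per_day = 800_000_000
per_month = per_day * 30
print(f"{per_month:,}")   # 24,000,000,000
```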
Second, I read the New York Times’s “Friending the World” article in my hard copy paper on pages B1 and B8. You may be able to snag a peek at this URL under the article title “Facebook Makes Headway Around the World.” Don’t honk at me if you have to pay. The point of the write up is that Facebook is getting big, fast. In India, where Google’s Orkut was the big dog, Facebook is sniffing at Google’s chicken korma. What happens if Facebook’s search starts gaining traction?
My view is that Google may find itself having to work as hard as it did in the 1998 to 2003 period. With free search appearing to be in flux, Google may have to take prompt action to deal with the upstarts Facebook and Twitter. My hunch is that these two services continue to grow because people like the addled goose figured neither had much of a chance in a Googley world. As I say on my About page, I am often wrong. Perhaps this is an instance of the addled goose failing to see the 20 somethings accurately?
Stephen E Arnold, July 16, 2010
Freebie
Exalead and Mobile Search
July 5, 2010
Podcast Interview with Paul Doscher, Part 4
Exalead’s Paul Doscher talks about Exalead and mobile search on the July 5, 2010, ArnoldIT Beyond Search podcast. Exalead, now part of Dassault, the large French software and services engineering firm, continues to ramp up its search, content processing, and search-enabled applications. Dassault acquired Exalead earlier this year; you can read about the acquisition in “Exalead Acquired by Dassault” and “Exalead and Dassault Tie Up, Users Benefit.”
In the July 2010 podcast, Mr. Doscher talks about Exalead and mobile search, one of the hottest sectors in information retrieval. Exalead has assisted one of its clients, Urbanizer.com, in developing an innovative method of locating information.
The Exalead user experience approach makes it possible to deliver search via a range of mobile devices for both consumer and special-purpose applications.
You can listen to the podcast on the ArnoldIT.com Web site. More information about Exalead is available from www.exalead.com.
The ArnoldIT podcast series extends the Search Wizards Speak series of interviews beyond text into rich media. Watch this blog for announcements about other rich media programs from the professionals who move information retrieval beyond search.
Stephen E Arnold, July 5, 2010
This one is a freebie