Social Search: Don Quixote Is Alive and Well
January 18, 2013
Here I float in Harrod’s Creek, Kentucky, an addled goose. I am interested in other geese in rural Kentucky. I log into Facebook using a faux human alias (easier than one would imagine) and run a natural language query (in human language, of course). I peck with my beak on my iPad using an app: “Geese hook up 40027.” What do I get? Nothing. Zip, zilch, nada.
Intrigued, I query “modern American drama.” What do I get? Nothing. Zip, zilch, nada.
I give up. Social search just does not work under my quite “normal” conditions.
First, I am a goose spoofing the world as a human. Not too many folks like this on Facebook, so my interests and my social graph are useless.
Second, the key words in my natural language query do not match the Facebook patterns, crafted by former Googlers and 20-somethings to deliver hook-up heaven and links to the semi-infamous Actor’s Theater or the Kentucky Center.
Social search is not search. Social search is group centric. It is an outstanding system for monitoring and surveillance; as a way to find information, it is a subset of information retrieval. How do semantic methods improve the validity of the information retrieved? I am not exactly sure. Perhaps the vendors will explain and provide documented examples?
Third, without context, my natural language queries shoot through the holes in the Swiss Cheese of the Facebook database.
After I read “The Future of Social Search,” I assumed that information was available at the peck of my beak. How misguided was I? Well, one more “next big thing” in search demonstrated that baloney production is surging in an ailing economy. Optimism is good. Crazy predictions about search are not so good. Look at the sad state of enterprise search, Web search, and email search. Nothing works exactly as I hope. The dust-up between Hewlett Packard and Autonomy suggests that “meaning based computing” is a point of contention.
If social search does not work for an addled goose, for whom does it work? According to the wild and crazy write up:
Are social networks (or information networks) the new search engine? Or, as Steve Jobs would argue, is the mobile app the new search engine? Or, is the question-and-answer formula of Quora the real search 2.0? The answer is most likely all of the above, because search is being redefined by all of these factors. Because search is changing, so too is the still maturing notion of social search, and we should certainly think about it as something much grander than socially-enhanced search results.
Yep, Search 2.0.
But the bit of plastic floating in my pond is semantic search. Here’s what the Search 2.0 social crowd asserts:
Let’s embrace the notion that social search should be effortless on the part of the user and exist within a familiar experience — mobile, social or search. What this foretells is a future in which semantic analysis, machine learning, natural language processing and artificial intelligence will digest our every web action and organically spit out a social search experience. This social search future is already unfolding before our very eyes. Foursquare now taps its massive check in database to churn out recommendations personalized by relationships and activities. My6sense prioritizes tweets, RSS feeds and Facebook updates, and it’s working to personalize the web through semantic analysis. Even Flipboard offers a fresh form of social search and helps the user find content through their social relationships. Of course, there’s the obvious implementations of Facebook Instant Personalization: Rotten Tomatoes, Clicker and Yelp offer Facebook-personalized experiences, essentially using your social graph to return better “search” results.
Semantics. Better search results. How does that work on Facebook images and Twitter messages?
My view is that when one looks for information, there are some old fashioned yardsticks; for example, precision, recall, editorial policy, and corpus provenance.
When a clueless person asks about pop culture, I am not sure that traditional reference sources will provide an answer. But as information access is trivialized, the need for knowledge about the accuracy and comprehensiveness of content, the metrics of precision and recall, and the editorial policy or degree of manipulation baked into the system decreases.
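The two yardsticks named above are easy to state precisely. A minimal sketch, using hypothetical relevance judgments rather than any real Facebook output:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a single query.

    retrieved: set of document ids the search system returned
    relevant:  set of document ids a human judged relevant
    """
    hits = retrieved & relevant  # true positives
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments for a query like "geese hook up 40027"
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc4", "doc7"}

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 precision, roughly 0.67 recall
```

The point of the exercise: a system can look helpful (high precision on what it shows) while silently missing most of what a human judge would call relevant (low recall), and social systems rarely expose either number.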
See Advantech.com for details of a surveillance system.
Search has not become better. Search has become subject to self-referential mechanisms. That’s why my goose queries disappoint. If I were looking for pizza or Lady Gaga information, I would have hit pay dirt with a social search system. When I look for information based on an idiosyncratic social fingerprint, or when I look for hard information to answer difficult questions related to client work, social search is not going to deliver the input which keeps this goose happy.
What is interesting is that so many are embracing a surveillance based system as the next big thing in search. I am glad I am old. I am delighted my old fashioned approach to obtaining information is working just fine without the special advantages a social graph delivers.
Will today’s social search users understand the old fashioned methods of obtaining information? In my opinion, nope. Does it matter? Not to me. I hope some of these social searchers do more than run a Facebook query to study for their electrical engineering certification or to pass board certification for brain surgery.
Stephen E Arnold, January 18, 2013
Dr. Jerry Lucas: Exclusive Interview with TeleStrategies ISS Founder
January 14, 2013
Dr. Jerry Lucas, an expert in digital information, is the founder of TeleStrategies and of its ISS World series of conferences. “ISS” is shorthand for “intelligence support systems.” The scope of Dr. Lucas’s interests ranges from the technical innards of modern communications systems to the exploding sectors for real time content processing. Analytics, fancy math, and online underpin Dr. Lucas’s expertise and form the backbone of the company’s training and conference activities.
What makes Dr. Lucas’ viewpoint of particular value is his deep experience in “lawful interception, criminal investigations, and intelligence gathering.” The perspective of an individual with Dr. Lucas’ professional career offers an important and refreshing alternative to the baloney promulgated by many of the consulting firms explaining online systems.
Dr. Lucas offered a more “internationalized” view of the Big Data trend now exercising many US marketing and sales professionals. He said:
“Big Data” is an eye catching buzzword that works in the US. But as you go east across the globe, “Big Data” as a buzzword doesn’t get traction in the Middle East, Africa and Asia Pacific Regions if you remove Russia and China. One interesting note is that Russian and Chinese government agencies only buy from vendors based in their countries. The US Intelligence Community (IC) has big data problems because of the obvious massive amount of data gathered that’s now being measured in zettabytes. The data gathered and stored by the US Intelligence Community is growing beyond what typical database software products can handle as well as the tools to capture, store, manage and analyze the data. For the US, Western Europe, Russia and China, “Big Data” is a real problem and not a hyped up buzzword.
Western vendors have been caught in the boundaries between different countries’ requirements. Dr. Lucas observed:
A number of western vendors made a decision because of the negative press attention to abandon the global intelligence gathering market. In the US Congress Representative Chris Smith (R, NJ) sponsored a bill that went nowhere to ban the export of intelligence gathering products period. In France a Bull Group subsidiary, Amesys legally sold intelligence gathering systems to Libya but received a lot of bad press during Arab Spring. Since Amesys represented only a few percent of Bull Group’s annual revenues, they just sold the division. Amesys is now a UAE company, Advanced Middle East Systems (Ames). My take away here is governments particularly in the Middle East, Africa and Asia have concerns about the long term regional presence of western intelligence gathering vendors who desire to keep a low public profile. For example, choosing not to exhibit at ISS World Programs. The next step by these vendors could be abandoning the regional marketplace and product support.
The desire for federated information access is, based on the vendors’ marketing efforts, high. Dr. Lucas made this comment about the existence of information silos:
Consider the US where you have 16 federal organizations collecting intelligence data plus the oversight of the Office of Director of National Intelligence (ODNI). In addition there are nearly 30,000 local and state police organizations collecting intelligence data as well. Data sharing has been a well identified problem since 9/11. Congress established the ODNI in 2004 and funded the Department of Homeland Security to set up State and Local Data Fusion Centers. To date Congress has not been impressed. DNI James Clapper has come under intelligence gathering fire over Benghazi and the DHS has been criticized in an October Senate report that the $1 Billion spent by DHS on 70 state and local data fusion centers has been an alleged waste of money. The information silo or the information stovepipe problem will not go away quickly in the US for many reasons. Data cannot be shared because one agency doesn’t have the proper security clearances, job security which means “as long as I control access the data I have a job,” and privacy issues, among others.
Stephen E Arnold interviewed Dr. Lucas on January 10, 2013. The full text of that exclusive interview is available on the ArnoldIT.com subsite “Search Wizards Speak” at http://www.arnoldit.com/search-wizards-speak/telestrategies-2.html. The full text of the 2011 interview with Dr. Lucas is at this link.
Worth reading.
Donald Anderson, January 14, 2013
Semantria Goes Pentalingual
January 1, 2013
Semantria is a text analytics and sentiment analysis solutions company. In order to reach a new clientele as well as work with companies with an international base, “Semantria Announces Content Classification and Categorization Functionality in 5 Languages.” Semantria now speaks English, Spanish, French, German, and Portuguese.
To power its categorization functionality, Semantria uses the Concept Matrix, a large thesaurus built by processing Wikipedia during its beta phase. After digesting Wikipedia, the Concept Matrix created lexical connections among the concepts it contains. Semantria developed the technology with Lexalytics, and the Lexalytics Salience 5 engine powers the Concept Matrix. The Concept Matrix is a one-of-a-kind tool that organizes and classifies information:
“Seth Redmore, VP Product Management and Marketing at Lexalytics, explains: ‘Text categorization requires an understanding of how things are alike. Before the Concept Matrix, you’d have to use a massive amount of training data to “teach” your engine, i.e. ‘documents about food’.’ And, he continues, ‘With the Concept Matrix, the training’s already done, and by providing Semantria a few keywords, it drops your content into the correct categories.’ ”
A piece of software that does all the organizing for you: how amazing is that? If it “ate” Wikipedia and made lexical connections, what could it do with Google, Bing, or the entire Internet?
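Lexalytics does not publish the internals of the Concept Matrix, but the general idea Redmore describes, dropping content into categories from a few seed keywords, can be sketched. The categories, seed words, and sample text below are hypothetical:

```python
def categorize(text, categories):
    """Assign text to the category whose seed keywords overlap it most.

    categories: dict mapping category name -> set of seed keywords.
    Returns the best-matching category name, or None if nothing overlaps.
    """
    words = set(text.lower().split())
    best, best_score = None, 0
    for name, seeds in categories.items():
        score = len(words & seeds)  # crude similarity: keyword overlap
        if score > best_score:
            best, best_score = name, score
    return best

# Hypothetical seed keywords, in the spirit of "documents about food"
categories = {
    "food": {"recipe", "restaurant", "flavor", "menu", "dish"},
    "sports": {"score", "team", "season", "coach", "league"},
}

print(categorize("a new restaurant menu with a signature dish", categories))
```

A real system replaces the naive word overlap with the lexical associations mined from Wikipedia, so that “sommelier” lands in “food” even though it appears in no seed list; the workflow, seed a category with a few terms and let the engine generalize, is the same.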
Whitney Grace, January 01, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Gannett Wants Better Content Management
December 31, 2012
Gannett is a media and marketing company whose properties include USA Today, Shop Local, and Deal Chicken. One can imagine that such a prolific company has a lot of data that needs to be organized and made workable. Marketing and media companies live in the public eye, and if they do not get their clients’ names out in the open, it means fewer dollars in the bank. One way this could happen is if they do not centralize a plan for information governance. The good news is “Gannett Chooses ITM for Centralized Management of Reference Vocabularies,” as reported on the Mondeca news Web site. Mondeca specializes in knowledge management, with a variety of products that structure knowledge in many possible ways. Its ITM system was built to handle knowledge structures from conception through usage and maintenance. ITM helps organize knowledge, access data across multiple platforms, improve search and navigation, and align or merge taxonomies and ontologies.
Gannett selected Mondeca for these very purposes:
“Gannett needed software to centrally manage, synchronize, and distribute its reference vocabularies across a variety of systems, such as text analytics, search engines, and CMS. They also wanted to create vocabularies and enrich them using external sources, with the help of MEI. Gannett selected ITM as the best match for the job. At the end of the project, Gannett intends to achieve stronger semantic integration across its content delivery workflow.”
Gannett is sure to discover that Mondeca’s ITM software will provide it with better control over its data, not to mention new insights into its knowledge base. Data organization and proper search techniques are a master key to any organization’s success.
Whitney Grace, December 31, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
New Offering from Attensity Poised to Blow Up ROI
December 21, 2012
Analytics tools from social-minded vendors are now using text analytics technology to report on market perception and consumer preferences before the product launch. BtoB reported on this new offering in the article, “Attensity Releases Analytics Tools for Product Introductions.”
Now, businesses will be able to monitor product introductions with this new tool from Attensity. It is only a matter of time before we start seeing specific technology solutions to evaluate and analyze every specific phase of the product development cycle.
New Product Introduction promises both new insights for further development and opportunities to avoid risk.
The article states:
“The tool uses text analytics technology to report on market perception and preferences before roll out, uncovering areas of risk and opportunity, according to the company. It then tracks customer reception upon and after the launch to determine the impact of initial marketing efforts. Attensity said the New Product Introduction tool is one in a series of planned social text-analytics applications devoted to customer care, branding, and campaign and competitive analytics.”
Many organizations will be chomping at the bit to utilize this technology since it offers an easy way to improve ROI.
Megan Feil, December 21, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
Predictive Coding: Who Is on First? What Is the Betting Game?
December 20, 2012
I am confused, but what’s new? The whole “predictive analytics” rah-rah causes me to reach for my NRR 33 dB bell-shaped foam earplugs.
Look. If predictive methods worked, there would be headlines in the Daily Racing Form, in the Wall Street Journal, and in the Las Vegas sports books. The cheerleaders for predictive wizardry are pitching breakthrough technology in places where accountability is a little fuzzier than a horse race, stock picking, and betting on football games.
The godfather of cost cutting for legal document analysis: Reverend Thomas Bayes, 1701 to 1761. I heard he said, “Praise be, the math doth work when I flip the numbers and perform the old inverse probability trick. Perhaps I shall apply this to legal disputes when lawyers believe technology will transform their profession.” Yep, partial belief. Just the ticket for attorneys. See http://goo.gl/S5VSR.
I understand that there is PREDICTION, which generates tons of money for the person who has an algorithm that divines which nag wins the Derby, which stock is going to soar, and which football team will win a particular game. Skip the fuzzifiers like a 51 percent chance of rain. It either rains or it does not. In the harsh world of Harrod’s Creek, capital letter PREDICTION is not too reliable.
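The Reverend’s “inverse probability trick” is just Bayes’ rule, and it shows exactly why the fuzzifiers annoy me. A minimal sketch with hypothetical numbers (the priors and likelihoods are invented for illustration):

```python
def posterior(prior, likelihood, false_alarm):
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E).

    prior:       P(H), e.g. the base rate of rainy days
    likelihood:  P(E | H), evidence given the hypothesis is true
    false_alarm: P(E | not H), evidence given the hypothesis is false
    """
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 30% of days are rainy; the forecaster says
# "rain" on 80% of rainy days and on 20% of dry days.
p = posterior(prior=0.30, likelihood=0.80, false_alarm=0.20)
print(round(p, 3))  # about 0.632
```

Even a decent forecaster, fed through the math, yields partial belief, roughly 63 percent here, not a wager-grade certainty. That gap between a posterior probability and a bankable PREDICTION is the whole point.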
Lower case prediction is far safer. The assumptions, the unexamined data, the thresholds hardwired into off-the-shelf algorithms, and the fiddling with Bayesian relaxation factors are aimed at those looking to cut corners, trim costs, or figure out which way to point the hit-and-miss medical research team.
Which is it? PREDICTION or prediction?
I submit that it is lower case prediction wrapped in upper case MARKETING wordsmithing.
Here’s why:
I read “The Amazing Forensic Tech behind the Next Apple, Samsung Legal Dust Up (and How to Hack It).” Now that is a headline. Skip the “amazing,” “Apple,” “Samsung,” and “hack.” I think the message is that Fast Company has discovered predictive text analysis. I could be wrong here, but I think Fast Company might have been helped along by some friendly public relations type.
Let’s look at the write up.
First, the high profile Apple Samsung trial becomes the hook for “amazing” technology. The idea is that smart software can grind through the text spit out by a discovery process. In the era of ballooning digital data, it is really expensive to pay humans (even those working at a discount in India or the Philippines) to read the emails, reports, and transcripts.
Let a smart machine do the work. It is cheaper, faster, and better. (Shouldn’t one have to pick two of these attributes?)
Fast Company asserts:
“A couple good things are happening now,” Looby says. “Courts are beginning to endorse predictive coding, and training a machine to do the information retrieval is a lot quicker than doing it manually.” The process of “Information retrieval” (or IR) is the first part of the “discovery” phase of a lawsuit, dubbed “e-discovery” when computers are involved. Normally, a small team of lawyers would have to comb through documents and manually search for pertinent patterns. With predictive coding, they can manually review a small portion, and use the sample to teach the computer to analyze the rest. (A variety of machine learning technologies were used in the Madoff investigation, says Looby, but he can’t specify which.)
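The workflow Looby describes, hand-review a small sample and let the machine label the rest, can be sketched with a toy naive Bayes classifier. The documents and labels below are hypothetical stand-ins for attorney-reviewed material, not the actual e-discovery tooling:

```python
import math
from collections import Counter

def train(labeled):
    """labeled: list of (text, label) pairs from the hand-reviewed sample.
    Returns per-label word counts."""
    counts = {}
    for text, label in labeled:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(text, counts):
    """Pick the label maximizing a Laplace-smoothed log-likelihood."""
    vocab = {w for c in counts.values() for w in c}
    best, best_lp = None, float("-inf")
    for label, c in counts.items():
        total = sum(c.values())
        lp = sum(math.log((c[w] + 1) / (total + len(vocab)))
                 for w in text.lower().split())
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# A tiny hypothetical sample stands in for the manually reviewed portion.
sample = [
    ("patent claim infringement design", "responsive"),
    ("deposition exhibit patent filing", "responsive"),
    ("lunch order friday pizza", "not_responsive"),
    ("holiday party schedule", "not_responsive"),
]
model = train(sample)
print(classify("patent design exhibit", model))  # -> responsive
```

The real systems are far more elaborate, but the economics are visible even here: the lawyers read four documents; the model reads the other four million. Every hardwired choice, the smoothing constant, the tokenization, the sample itself, is exactly the sort of threshold I grumbled about above.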
Visualization Woes: Smart Software Creates Human Problems
December 10, 2012
I am not dependent on visualization to figure out what data imply or “mean.” I have been a critic of systems which insulate the professional from the source information and data. I read “Visualization Problem”. The article focuses on the system user’s inability to come up with a mental picture or a concept. I learned:
I know I am supposed to get better with time, but it feels that the whole visualization part shouldn’t be this hard, especially since I can picture my wonderland so easily. I tried picturing my tulpa in my wonderland, in black/white voids, without any background, even what FAQ_man guide says about your surroundings, but none has worked. And I really have been working on her form for a long time.
A “tulpa” is a construct. But the key point is that the software cannot do the work of an inspired human.
The somewhat plaintive lament triggers three thoughts about the mad rush to “smart software” which converts data into high impact visuals.
First, a user may not be able to conceptualize what the visualization system is supposed to deliver in the first place. If a person becomes dependent on what the software provides, the user is flying blind. In the case of the “tulpa” problem, the result may be a lousy output. In the case of a smart business intelligence system such as Palantir’s or Centrifuge Systems’, the result may be data which are not understood.
Second, the weak link in this shift from “getting one’s hands dirty” by reviewing data, looking at exceptions, and making decisions about the processes to be used to generate a chart or graph puts the vendor in control. My view is that users of smart software have to do more than get the McDonald’s or KFC’s version of a good meal.
Third, with declining numerical literacy and a preference for “I’m feeling lucky” interfaces, the likelihood of content and data manipulation increases dramatically.
I am not able to judge a good “tulpa” from a bad one. I do know that as smart software diffuses, the problem software will try to solve is the human factor. I think that is not such a good thing. From the author’s pain, learning will result. For a vendor, that same pain is motivation to focus research and development on predictive outputs and more training-wheel functions.
I prefer a system with balance like Digital Reasoning’s: Advanced technology, appropriate user controls, and an interface which permits closer looks at data.
Stephen E Arnold, December 10, 2012
Exclusive Interview with the CTO of Cybertap
December 4, 2012
Cybertap is a company which pushes beyond keyword search. The firm’s technology permits a different type of information retrieval.
In an exclusive interview with ArnoldIT, Cybertap revealed that hidden within the network traffic are malicious attacks, personal and medical information leaks, and insider theft of intellectual property and financial information. Cybertap’s clients use Recon to keep tabs on the good and the bad being done on their networks and who’s doing it, so that they can take the proper actions to mitigate any damage and bring the individuals to account.
Dr. Russ Couturier, Chief Technology Officer of Cybertap, recently granted an exclusive interview to the Arnold Information Technology Search Wizards Speak series to discuss Cybertap Recon, a product that applies big data analytics to captured network traffic to give organizations unparalleled visibility into what is transpiring both on and to their networks.
Until recently, the firm’s technology was available to niche markets. However, due to the growing demand to identify potentially improper actions, Cybertap has introduced its technology to organizations engaged in fraud detection and related disciplines. The Cybertap system facilitates information analysis in financial services, health care, and competitive intelligence.
Dr. Couturier said:
Recon is able to decrease risk and improve your situational awareness by decreasing the time to resolution of a cyber event and by improving your knowledge of what happened during a cyber event. We are incorporating big data analysis techniques to reduce the meaningless data and quantify the meaningful information using categorization, semantic, and sentiment tools. Recon presents the information as it was originally seen so analysts can follow conversations and threads in context.
The firm’s system processes content, embedded files, attachments, attributes, network protocol data, metadata, and entities. Developers incorporated semantic analysis tools to “roll-up” large volumes of data into what they call “themes” and “topics.” This aggregation enables researchers to more quickly decide whether information is relevant.
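Cybertap does not publish how its “themes” and “topics” are built, but the roll-up idea, collapsing a large volume of text into its dominant content terms so an analyst can triage quickly, can be sketched. The messages and stopword list below are hypothetical:

```python
from collections import Counter

# A tiny hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "to", "of", "and", "in", "for", "on"}

def roll_up(documents, top_n=3):
    """Aggregate many documents into their most frequent content words,
    a crude stand-in for a 'theme'."""
    freq = Counter()
    for doc in documents:
        freq.update(w for w in doc.lower().split() if w not in STOPWORDS)
    return [word for word, _ in freq.most_common(top_n)]

# Hypothetical captured messages
messages = [
    "wire transfer to the offshore account",
    "confirm the wire transfer amount",
    "offshore account opened for transfer",
]
print(roll_up(messages))  # 'transfer' dominates the theme
```

An analyst glancing at the top terms can decide in seconds whether the bundle deserves a closer look, which is the relevance-triage benefit the paragraph above describes.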
He added:
Mash ups and data fusion are crucial when dealing with big data. You can search, visualize, link, and reconstruct exactly what happened from the primary source and reduce investigation times by hours or days.
Cybertap is one of a handful of content processing firms taking findability to a new level of utility. The firm’s system combines next-generation methods with a search box and visualization to provide unique insights into information processed by the Cybertap system. The full text of the interview is available at www.arnoldit.com/search-wizards-speak/cybertap.html.
Cybertap LLC’s vision is to integrate best-of-breed cyber forensics, analysis, and security technologies. Cybertap serves all markets requiring next generation data analysis tools, including: federal government markets, both civilian and Department of Defense agencies; commercial markets; and state and local governments. The privately held company has offices in Vienna, Virginia; Englewood, Colorado; and Palmer, Massachusetts.
The system is important because it underscores the opportunities for innovators in information retrieval and analysis. Cybertap combines search with a range of functions which allow a combination of alerting, discovering, and finding. In my experience, few products offer this type of pragmatic insight without the costs and complexities of traditional systems built by cobbling together different vendors’ products.
Search Wizards Speak is the largest collection of interviews with innovators and developers working in search and content processing. An index to the more than 60 interviews is available at http://www.arnoldit.com/search-wizards-speak/.
Additional information about Cybertap LLC is available at http://www.cybertapllc.com.
Stephen E Arnold, December 4, 2012
IntelTrax Summary November 16 to November 22
November 26, 2012
This week the IntelTrax advanced intelligence blog published some excellent pieces regarding the state of big data and analytics technologies.
“Diversity is the New Key for Analytic Success” looks at how Burberry is using analytics technology to analyze customer buying behavior.
The article states:
“SAP is pushing further in this vein and has this week announced its SAP Customer 360 transactional system which the firm says is being used by fashion retailer Burberry to analyse customer buying behaviour and provide on the floor sales staff with access to big data analytics on mobile devices. This “immediate information” is then (in theory) available to help these same staff personalise fashion advice to customers.
Do we really want this amount of technology in our lives?
SAP’s other Co-CEO Bill McDermott has predicted that by 2030 there will be an additional two billion consumers on the planet and … “They want to purchase in the digital world,” he said.”
Another interesting story, “Big Data Moves Continue” announced some impressive news in the big data community.
The article states:
“Cray announced it was awarded a contract to deliver a uRiKA graph-analytics appliance to the Oak Ridge National Laboratory (ORNL). Analysts at ORNL will use the uRiKA system as they conduct research in healthcare fraud and analytics for a leading healthcare payer. The uRiKA system is a Big Data appliance for graph analytics that enables discovery of unknown relationships in Big Data. It is a highly scalable, real-time platform for ad hoc queries, pattern-based searches, inferencing and deduction.
“Identifying healthcare fraud and abuse is challenging due to the volume of data, the various types of data, as well as the velocity at which new data is created. YarcData’s uRiKA appliance is uniquely suited to take on these challenges, and we are excited to see the results that will come from the strategic analysis of some very large and complex data sets,” said Jeff Nichols, Associate Laboratory Director for Computing and Computational Sciences at Oak Ridge National Laboratory.”
“Big Data Expert Overlooks the Obvious” shares some interesting thoughts on the future of big data. However, it leaves out some pretty important things.
The article states:
“The goal of all the discussion around big data and data analysis is, as I’ve argued, not to make the wrong decision faster, but to develop the best decision at the right time and deliver the information to the people that most need the information. In an Information Week column Wednesday, Tony Byrne argued small data beat big data in the presidential election.
Call it business intelligence, data analysis or predictive analytics, IT’s role here is to provide a foundation for your company to make the right decisions. Those decisions might be what to charge passengers for seats on a flight, how much to charge for a season ticket or how many widgets to create to strike the right balance among manufacturing costs, inventory and availability. These decisions are fundamental to business success.”
When it comes to finding big data intelligence solutions that work for your organization, it is important that businesses find a trusted provider. Digital Reasoning’s Synthesys helps streamline expenses for intelligence, healthcare and finance clients.
Jasmine Ashton, November 26, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
IntelTrax Summary: November 9 to November 15
November 19, 2012
This week, the IntelTrax advanced intelligence blog published some important information regarding the state of big data and its impact on some of the world’s most up-and-coming industries.
“The Ethics of Big Data” examines the possible ethical quandaries that develop from big data analysis. However, despite the potential ethical challenges facing the industry, in the end the pros outweigh the cons.
The article states:
“Yet it cuts both ways: Consumers also can take advantage of the democratizing effects of big data. In fact, there’s an app for that: RateDriver enables users to quickly determine the appropriate rate they should expect to pay for attorney’s fees in 51 U.S. markets.
“Big data holds promise to improve the legal profession and the quality of service that we deliver to clients,” says Carolyn Elefant, a Washington, D.C., attorney and technology evangelist. “Significantly, big data would inject a strong dose of transparency into lawyer marketing and assist consumers in hiring lawyers. How so? Because big data can be used to show the likelihood of winning a case and the true cost.”
“Big Data is the New Anti-Virus” sounds like another story about big data transforming the healthcare industry. However, it approaches health from a different angle: computer health and how to better detect viruses.
The article states:
“With Seculert Sense, customers can now upload log files using a Secure FTPS tunnel, or upstream logs through Syslog directly from a secure web gateway or web proxy devices, or log aggregation solution for real-time detection and forensics investigation. Built on Amazon Elastic MapReduce, Seculert Sense launches a “big data analysis cloud” that rapidly analyzes an organization’s vast amount of log data, going back months or even years and comparing it against the thousands of unique malware samples collected by Seculert. Over time, Seculert Sense continues to digest huge amounts of data in order to identify persistent attacks that are going undetected by next generation IPs, Anti-Bot and Secure Web Gateways.”
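Seculert’s internals are proprietary, but the core of the workflow described above, comparing a large volume of log data against a collection of known malware indicators, is at heart a matching problem. A minimal sketch with hypothetical log lines and a hypothetical indicator list:

```python
def scan_logs(log_lines, indicators):
    """Return (line_number, indicator) pairs wherever a known-bad
    indicator (a domain, hash, URL fragment, etc.) appears in a log line."""
    hits = []
    for i, line in enumerate(log_lines, 1):
        for bad in indicators:
            if bad in line:
                hits.append((i, bad))
    return hits

# Hypothetical proxy log excerpt and indicator set
logs = [
    "GET http://example.com/index.html 200",
    "GET http://bad-c2-server.example/beacon 200",
    "POST http://example.org/login 302",
]
indicators = {"bad-c2-server.example"}
print(scan_logs(logs, indicators))  # [(2, 'bad-c2-server.example')]
```

The “big data” part of the real product is scale, not concept: running this comparison over months or years of logs against thousands of malware samples is what requires a MapReduce-style cloud rather than a single loop.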
Big data analytics is not only taking off in America, it is becoming a world-wide phenomenon. “Asian Analytics on the Verge of a Boom” describes the potential for big data analytics success in Asia.
According to the article,
“Two different consumer analytics platforms from Singapore Management University (SMU) and StarHub respectively aim to provide insights into consumer behavior, so companies can develop and tailor initiatives that will be more relevant to and better received by customers.
Rajesh Balan, director of LiveLabs Urban Lifestyle Innovation Platform at SMU, said the platform will enable organizations to utilize real-time insights, helping their campaigns go to market and assess the outcome faster. On the consumer end, it will turn what most users perceive as intrusive spam messages on their phones into something useful.”
It does not matter what country you live in or what industry you work in. Big Data analytics technology is becoming too important to overlook. Digital Reasoning has been using automated understanding of big data for nearly a decade.
Jasmine Ashton, November 19, 2012
Sponsored by ArnoldIT.com, developer of Augmentext