Search and Content Processing: Excess or Exception?
September 2, 2013
I read “From Example to Excess in Silicon Valley.” You can locate the write up in the Sunday, September 1, New York Times, in the business section on page 3. If the higher powers are on your side, this link may display the write up, but no promises, not even for Pandas and Penguins.
As with many “real” journalism articles, I am not sure of the intellectual context or of the scythe the author has in his or her hands. I did note this snippet:
Popular social media sites like Twitter, Facebook and Tumblr did not exist in the ‘90s, after all.
Okay. Got it.
The fact followed references to “wonderment.” Examples which triggered a Jonathan Edwards moment included the iPhone and GOOG-411. The thrill is apparently gone. I read:
It feels as if the promise of the tech world — its utopian ideals and democratic aspirations — has dissolved into much more selfish pursuits of power and wealth.
The knowledge base for the author’s perception is Evgeny Morozov’s To Save Everything, Click Here: The Folly of Technological Solutionism. I wonder if the author of the “Facebook and Tumblr did not exist” statement has read Jacques Ellul’s work; for example, The Technological Bluff.
As with many of the “insights” and “innovations” which flow through my information stream, I have to ask: what is “new”?
Those who strike it rich spend their money, indulge themselves, and behave the way moguls have behaved for many years. Disagree with Commodore Vanderbilt and you might have been punched in the head and then ostracized from the New York elite.
The most recent imbroglio for Google concerns the behavior of a Google founder. On a recent trip to San Francisco, I watched a big, weird sailboat rip past other vessels, a physical reminder that the founder of Oracle has a “real” sailboat. The frenzied behavior I describe in my forthcoming Citizentekk article about “desperate housewives” captures how some of the high-tech executives to whom I am exposed conduct themselves.
What about search, analytics, and content processing?
Check out the claims from the vendors listed in our “what’s happening” service, Overflight, at www.arnoldit.com/overflight, www.arnoldit.com/trax, or www.arnoldit.com/taxonomy. Scanning the news, I see the companies essentially marching in lockstep. I am not sure these companies are innovating.
I find the notion that technology is progress intriguing. On a recent trip to Portugal, I was with a group in a hill town in the Douro Valley. The town once had 2,400 inhabitants. Today, the town has 60 residents. But I took the picture below:
These Portuguese children were playing outside. No iPads. No smartphones. No rubber mats under plastic jungle gyms behind cyclone fences. The kids were not technological.
Technology has delivered what in countries like Portugal? Unemployment because aluminum feet now stomp on grapes. Massive debt as the country tries to cope with the flow of people from rural areas into cities strapped for cash.
For every fanciful indulgence of folks with oodles of cash, technology becomes an expensive way to solve problems which are decidedly slippery.
Leave the youthful millionaires and billionaires to their own devices. At some point the problem will go away. I am confident that Yahooligans and Googlers can mend rifts in Egypt and the suburbs of New Orleans. A new logo and augmented reality headgear are “progress.”
What about search? This slide from one of my 2011 reports continues to generate some email.
The blue line shows that the technology for processing content is not improving as quickly as marketers and pressurized executives fervently hope. Marketers assure the prospect about the wonders of search, content processing, and next-generation analytics.
The reality is that the flow of data is creating a very big gap. Today’s modern systems are not up to the job of doing even rudimentary processing of available digital content.
What’s worse than wretched excess? For me, organizations which continue to license systems which disappoint and often deliver misleading outputs are a more telling indicator of today’s technology problems.
Technology cannot solve problems when those who create it deliver me-too products, when those who procure it assume the “new” system is better, and when allegedly educated adults punch and swipe their way through work.
Excess is part of the furniture of living for many with Silicon Valley dreams. The exceptions are far too rare in my opinion.
Stephen E Arnold, September 2, 2013
Sponsored by Xenky
A New Approach to Enterprise Search – Information Governance
September 2, 2013
The article titled “Solving the Inadequacies and Failures in Enterprise Search” on AIIM addresses the ongoing problems with metadata infrastructure. This problem translates into a major hurdle for any organization hoping to make use of enterprise search. According to the author, with an appropriately stringent framework in place, progress is possible. This includes the ability to use search to implement genuinely helpful capabilities like records management, migration, and data privacy. The article states,
“An information governance approach that creates the metadata infrastructure framework to encompass automated intelligent metadata generation, auto-classification, and the use of goal and mission aligned taxonomies is required. From this framework, intelligent metadata enabled solutions can be rapidly developed and implemented. Only then can organizations leverage their knowledge assets to support search, litigation, eDiscovery, text mining, sentiment analysis, and business intelligence. The need for organizations to access and fully exploit the use of their unstructured content won’t happen overnight.”
With these improvements in place, the article suggests, enterprise search will be fixed. Decades of floundering will come to an end if the assertion proves correct. Features and “bells and whistles” will no longer stand in for actual information and wisdom gleaned from content, which is what search is supposed to provide.
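To make the framework concrete, here is a minimal sketch of the kind of taxonomy-driven auto-classification the article describes: documents are tagged with the taxonomy nodes whose terms they contain, and those tags become searchable metadata. The taxonomy nodes, terms, and sample document are invented for illustration; production systems use far richer linguistic and statistical methods.

```python
import re

# Invented, mission-aligned taxonomy: node -> indicative terms.
taxonomy = {
    "records-management": {"retention", "disposition", "records"},
    "data-privacy":       {"pii", "consent", "privacy"},
    "ediscovery":         {"litigation", "hold", "custodian"},
}

def auto_classify(text):
    """Return every taxonomy node whose terms appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [node for node, terms in taxonomy.items() if words & terms]

doc = "Place a litigation hold and review the retention schedule."
print(auto_classify(doc))  # ['records-management', 'ediscovery']
```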
Chelsea Kerwin, September 02, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Xerox Devices Play Fast and Loose with Numbers
September 2, 2013
Recently, it has been discovered that some Xerox scanners and copiers, specifically those in the WorkCentre line, have begun to make changes to documents. The compression technology meant to keep file sizes small is making users question their own recall by swapping out similar-looking numerals. The ensuing havoc can be anything from mildly annoying to downright dangerous, depending on the document in question. German blogger D. Kriesel goes into the technical details in “Xerox Scanners/Photocopiers Randomly Alter Numbers in Scanned Documents.”
From the post, it looks like Kriesel was part of the group that originally discovered the problem while reviewing some construction plans they were working on. (Hooray, due diligence!) He has worked hard to reproduce the error, and documents his observations with plenty of screenshots. He writes (and amends):
“There seems to be a correlation between font size, scan dpi used. I was able to reliably reproduce the error for 200 DPI PDF scans w/o OCR, of sheets with Arial 7pt and 8pt numbers. Overall it looks like some sort of compression algorithm using patches more than once (I think I could even identify some equally-pixeled eights).
“Edit: It seems that the above thought was not that wrong at all. Several mails I got suggest that the Xerox machines use JBIG2 for compression. Even though the specification only cover the JBIG2 decompression, in reality, there is often created a dictionary of image patches found to be ‘similar’. Those patches then get reused instead of the original image data, as long as the error generated by them is not ‘too high.’ Makes sense.”
Kriesel goes on to note that the JBIG2 compression standard gives “no guarantee that parts of the scanned image actually come from the corresponding place on the paper.” So, as long as the bit of the image looks right, the software goes with it. That may be just fine for purely aesthetic applications, or even number-free text (if there is such a thing), but numbers kinda need to adhere to the original.
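For the curious, a toy sketch of how symbol-dictionary compression can misfire appears below. This is not Xerox’s code or a real JBIG2 encoder; it only illustrates the “reuse a stored patch when the error is not too high” logic Kriesel describes, with two invented low-resolution glyphs standing in for a 6 and an 8.

```python
import numpy as np

def compress_patches(patches, max_error=3):
    """Toy JBIG2-style symbol matching: reuse the first stored patch
    that differs from the incoming glyph by at most max_error pixels;
    otherwise add the glyph to the dictionary as a new entry."""
    dictionary = []   # reference patches kept by the encoder
    indices = []      # which dictionary entry stands in for each input
    for patch in patches:
        match = next((i for i, ref in enumerate(dictionary)
                      if np.count_nonzero(patch != ref) <= max_error), None)
        if match is None:
            dictionary.append(patch)
            match = len(dictionary) - 1
        indices.append(match)
    return dictionary, indices

# Two invented 5x3 binary glyphs that differ by a single pixel --
# think of a low-resolution 6 versus 8. With a loose error threshold,
# the 8 is silently rendered from the 6's patch: the substitution bug.
six   = np.array([[1,1,1],[1,0,0],[1,1,1],[1,0,1],[1,1,1]])
eight = np.array([[1,1,1],[1,0,1],[1,1,1],[1,0,1],[1,1,1]])
dictionary, indices = compress_patches([six, eight])
print(len(dictionary), indices)  # 1 [0, 0] -- both digits come out as "6"
```

The flaw lives in the threshold: any two glyphs whose pixel difference falls under it are treated as the same symbol, regardless of which characters they actually represent.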
As Kriesel points out, the snafu raises serious questions about Xerox’s quality-control process. He continues to add information to this post as the issue develops. Will the tech company adequately address the error?
Cynthia Murrell, September 02, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
True Facts about Open Source
September 2, 2013
Open source is lauded as the end-all solution to all software needs and the end of proprietary software. PBS’s Idea Lab speaks to the contrary in “6 Things To Know About Successful (And Failed) Open Source Software.” Rich Gordon researched how open source software is adopted and discovered that University of Massachusetts faculty members Charles Schweik and Robert English had already done most of the legwork. The pair analyzed open source projects hosted on SourceForge and classified them into two categories: initiation stage and growth stage. With the addition of a SourceForge survey of 1,400 developers, they gathered their results. Gordon summarized those results.
Gordon found from the research that most open source projects are not successful, but the successful ones share common characteristics: a defined vision, clear goals, a defined set of users, and a modular architecture for others to work on. The biggest common factor is effective leadership. Open source software really takes off when the developers are the actual end users, which comes into play when trying to find collaborators. You can be sure that if someone has a specific need, another person in the world has it too, so the Internet makes finding teammates easier. As a project progresses, early features often lose their importance, and success cannot be measured by large-scale adoption alone.
One of the main factors of success for either open source stage is this:
“…[W]hether the people leading the project have demonstrated leadership by articulating a clear vision, having a professional web presence and maintaining an active bug-tracking system or other communication platform for interacting with the user community.”
Dedication, a strong team, and equally strong leaders are the key to a successful open source project. Roots only spread as far as the sun provides nourishment.
Whitney Grace, September 02, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Big Data Joins The Justice League
September 1, 2013
The Justice League’s headquarters, whether the Hall of Justice or the Watchtower, has state-of-the-art equipment to track bad guys and their criminal activities. We puny mortals might actually have a tool to put Batman’s own deductive skills to shame with big data, says NewsFactor in the article “Watch Out, Terrorists: Big Data Is On The Case.” Big data is nothing new; we just finally have the technology to aggregate the data and follow patterns using data mining and data visualization.
The Institute for the Study of Violent Groups is searching through ten years of data about suspected groups and individuals involved in terrorism and other crimes. The Institute is discovering patterns and information that it could never have surfaced before. Microsoft’s security researchers are up to their eyeballs in data every day, analyzing it for signs of cyber attacks. Microsoft recently allocated more resources to develop better network analytical tools.
The article says that while these organizations’ efforts are praiseworthy, the only way to truly slow cyber crime is to place a filter over the entire Internet. Here comes the company plug:
“That’s where new data-visualization technology, from vendors such as Tableau and Tibco Software, hold potential for making a big difference over time. These tools enable rank-and-file employees to creatively correlate information and assist in spotting, and stopping, cybercriminals.”
Big data’s superpowers are limited to the isolated areas where it has been deployed. Its major weakness is the scale of the entire Internet. Again, it is not the end-all answer.
Whitney Grace, September 01, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Social Media Can Prevent Death
September 1, 2013
In addition to exercise and therapy, there might be another way to lower the suicide rate among veterans. Benton Pena takes a look in “Monitoring Amicable Media To Cut A Troops Self-Murder Rate.” Big data specialists believe that by watching veterans’ social media for signs of despondency, they will be able to intervene at the proper moment. Using analytics, the specialists would inspect thousands of posts for key terms and other red flags. Dubbed the Durkheim Project, the effort aims to build algorithms that track the phrases and words predictive of suicide.
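As a rough illustration only (the Durkheim Project’s actual models are more sophisticated and are not public), such a red-flag scan might look like the sketch below; the phrase list and sample posts are invented.

```python
# Illustrative red-flag scan: score each post against a phrase list
# and watch how the score trends across a person's posts over time.
RED_FLAGS = {"hopeless", "burden", "goodbye", "no way out"}  # invented terms

def flag_score(post):
    """Count how many red-flag phrases appear in one post."""
    text = post.lower()
    return sum(1 for phrase in RED_FLAGS if phrase in text)

# Invented sample posts from one account, oldest first.
posts = [
    "Great day at the gym, feeling strong.",
    "Haven't slept in days. Everything feels hopeless, no way out.",
]
print([flag_score(p) for p in posts])  # [0, 2] -- the rising trend is the signal
```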
The way veterans use social media is a direct reflection of their attitude. It corresponds with doctors’ notes about how veterans behave: a healthy patient focuses on matters like hygiene, while an unhealthy one will segue the conversation onto restlessness and fears. Monitoring social media over time paints a picture of the patient’s mood.
“This kind of varied language, as well as the shorthand used on social media, can be extremely difficult to analyze, said Sid Probstein, the chief technology officer for Attivio, which is responsible for that analysis. How those phrases change over time can also be a warning sign, Probstein said, so a huge amount of data has to be collected from text messages, Twitter, Facebook, and other social media outlets and analyzed.”
Social media has become a catch-all for confessions and random thoughts, sort of like the journals of days gone by. Unlike private journals, which were usually kept hidden, social media can be monitored and analyzed instantly. Preventing veteran suicides is important to post-war recovery, and the work of the Durkheim Project may indeed contribute to the saving of lives.
Whitney Grace, September 01, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search