Google Docs Gets PDF Search
March 8, 2012
Short honk: We learned this morning that Google has added PDF search to Google Docs. “PDF Search and Comments Features Added to Google Docs” said:
Google added a feature to Docs that lets users search for text inside PDFs in their documents list, thanks to optical character recognition technology. That has been extended so that users can now search for and copy highlighted text when they open a scanned PDF, such as a fax or a receipt.
You can get more information from Google’s blog post, but such posts are too upbeat and cheerful for a dour, addled, old goose.
Stephen E Arnold, March 8, 2012
Sponsored by Pandia.com
More Allegations about Fast Search Impropriety
March 8, 2012
With legions of Microsoft Certified Resellers singing the praises of the FS4SP (formerly the Fast Search & Transfer search and retrieval system), sour notes are not easily heard. I don’t think many users of FS4SP know or care about the history of the company, its university-infused technology, or the machinations of the company’s senior management and Board of Directors. Ancient history.
I learned quite a bit in my close encounters with the Fast ESP technology. No, ESP does not mean extra sensory perception. ESP allegedly meant the enterprise search platform. Fast Search, before its purchase by Microsoft, was a platform, not a search engine. The idea was that the collection of components would be used to build applications in which search was an enabler. The idea was a good one, but search based applications required more than a PowerPoint to become a reality. The 64 bit Exalead system, developed long before Dassault acquired Exalead, was one of the first next generation, post Google systems to have a shot at delivering a viable search based application. (The race for SBAs, in my opinion, is not yet over, and there are some search vendors like PolySpot which are pushing in interesting new directions.) Fast Search was using marketing to pump up license deals. In fact, the marketing arm was more athletic than the firm’s engineering units. That, in my view, was the “issue” with Fast Search. Talk and demos were good. Implementation was a different platter of herring five ways.
Fast Search block diagram circa 2005. The system shows semantic and ontological components, asserts information on demand, and content publishing functions—all in addition to search and retrieval. Similar systems are marketed today, but hybrid content manipulation systems are often a work in progress in 2012. © Fast Search & Transfer
I once ended up with an interesting challenge resulting from a relatively large-scale, high-profile search implementation. Now you may have larger jobs than I typically get, but I was struggling with the shift from Inktomi to the AT&T Fast search system in order to index the public facing content of the US federal government.
Inktomi worked reasonably well, but the US government decided in its infinite wisdom to run a “free and open competition.” The usual suspects responded to the request for proposal and statement of work. I recall that “smarter than everyone else” Google ignored the US government’s requirements.
This image is from a presentation by Dr. Lervik about Digital Libraries, no date. The slide highlights the six key functions of the Fast Search search engine. These are extremely sophisticated functions. In 2012, only a few vendors can implement a single system with these operations running in the core platform. In fact, the wording could be used by search vendor marketers today. Fast Search knew where search was heading, but the future still has not arrived because writing about a function is different from delivering that function in a time and resource window which licensees can accommodate. © Fast Search & Transfer
Fast Search, with the guidance of savvy AT&T capture professionals, snagged the contract. That was a fateful procurement. Fast Search yielded to a team from Vivisimo and Microsoft. Then Microsoft bought Fast Search, and the US government began its shift to open source search. Another consequence is that Google, as you may know, never caught on in the US Federal government in the manner that I and others assumed the company would. I often wonder what would have happened if Google’s capture team had responded to the statement of work instead of pointing out that the requirements were not interesting.
Protected: Share Info and Party Tips at ShareFEST
March 8, 2012
MessageSolution for Managing SharePoint Data
March 8, 2012
We have here another attempt to tame SharePoint’s content wild ponies: “MessageSolution Showcasing SharePoint Governance and eDiscovery Platform at Microsoft SharePoint Technology Conference 2012”, reports SeattlePi. The write up declares:
By integrating award-winning enterprise archiving policy with SharePoint’s record center functions, MessageSolution has created a framework to automate eDiscovery and manage risk in SharePoint distributed farms. Now SharePoint administrators can regulate compliance, remotely offload Blobs to optimize SharePoint storage space and server performance, as well as search and restore objects instantly without the need for additional IT assistance.
For those who may not know, a Blob (also written BLOB) is a Binary Large Object. By offloading these objects from a SharePoint server using Microsoft approved EBS and RBS protocols, MessageSolution can speed up tasks in the SharePoint environment. This comes in handy when searching and restoring data for legal discovery proceedings. Furthermore, the article asserts, the tool reduces storage requirements with a high compression rate and single-instance storage.
Designed for both mid- and large-scale organizations, the product also sports a unified user interface and index; retention management; legal holds with hold notifications; and federated search. The product’s focus on back-end design, according to the write up, means fewer hassles during installation and maintenance as well as a reduced backup time. See the article for more details.
MessageSolution’s SharePoint Management Solutions and its Enterprise eDiscovery Platform will be showcased at the 2012 SharePoint Technology Conference February 28-29 at the Union Square Hilton in San Francisco (booth #808).
Founded in 2002, MessageSolution has assembled a team of veterans from a number of other Silicon Valley enterprises. The company prides itself on providing solutions that simplify the complex processes of archiving and eDiscovery, including managing language differences, for organizations around the world.
Search Technologies stands ready to assist clients with search and content processing services for Microsoft SharePoint environments.
Iain Fletcher, Search Technologies, March 8, 2012
Sponsored by Pandia.com
There’s Something Brewing in the PLM Market
March 8, 2012
The product lifecycle management (PLM) market is on the rise. The top three PLM vendors, Dassault, Siemens and PTC, are faced with new challenges and new challengers like Autodesk. Oleg Shilovitsky’s article “PLM Perfect Storm 2012” takes a look at how all of this is brewing and what the aftermath may look like.
It seems that one of the drawbacks of all three top vendors is their reliance upon the larger companies. They “failed to deliver scalable PLM solution for mid-range manufacturing companies” and they “didn’t deliver any PLM product to the market of small manufacturing companies” either. Even though Autodesk has entered the PLM market and taken on cloud-based solutions, it is not clear what path the company will take.
Shilovitsky believes:
“[l]arge PLM companies have a lot of money to play the future PLM game. They have a lot to win as well as to lose, in case something will go wrong.”
So where do smaller, more niche companies fit into this perfect storm? It seems to us that they could be the shining light in this storm. Companies like Inforbix will continue to bring a new look to PLM. They provide cutting-edge solutions that help users access product data located in multiple systems. While the Autodesks and the Dassaults of the world are busy slugging it out for PLM supremacy, Inforbix will be revolutionizing PLM solutions.
Jennifer Wensink, March 8, 2012
Customizing SharePoint to Build a Powerful Intranet Portal for your Users
March 8, 2012
An organization’s intranet is the company home for news and collaboration. A successful intranet also needs to be powerful, yet user-friendly. Can SharePoint accomplish all of this out of the box? The topic is addressed in HR Communication’s publication, “Just How Much Customization Does SharePoint Need?”
The answer is not black and white. Amy Schade, Director of User Experience for the Nielsen Norman Group, says one big factor is getting the software to work for your company’s culture. The experts weigh in:
Does that mean you have to load SharePoint down with third-party applications? It depends. IBF CEO and founder Paul Miller observed that Duke Energy was a winner last year, basically using SharePoint “out of the box.” Bert Sandie, director of technical excellence at Electronic Arts’ EA University, said a great intranet “should look like one thing,” even with third-party apps. It ought to look like tools people use at home, such as Facebook and Flickr. IBF Live co-host Paul Levy observed, “SharePoint out of the box doesn’t lend itself to that experience.”
An out of the box design that keeps users in mind may not be SharePoint’s strongest characteristic. But it doesn’t take much to customize your farm to work for the masses. Instead of bogging down your SharePoint system with third party applications, save time and money on implementations with one lean solution. We like the feedback from Fabasoft Mindbreeze customers. The Chamber of Commerce, Upper Austria had this to say:
Fabasoft Mindbreeze Enterprise provides our staff quickly and efficiently with all the information they need. The service center staff is able to respond to requests without delay, as all relevant information is found with only one query. This even further improves the quality of our customer services whilst simultaneously minimizing effort of our staff.
Look for quick results from an out of the box solution at Fabasoft Mindbreeze.
Philip West, March 8, 2012
Sponsored by Pandia.com
Big Data Excitement at the 2012 Strata Conference
March 8, 2012
Don’t get hit by a stray bullet at the big data corral. IT World examines “The Wild West of Big Data.” Fresh from this year’s Strata Conference in Santa Clara, journalist Brian Proffitt describes how the current hubbub around big data mirrors the open source environment of a decade ago (the sense of urgency around a rising technology) and how it doesn’t (the lack of a grass-roots community feel).
Excitement is understandable in this burgeoning field, and Proffitt felt the anticipation of profit as a “zing” in the air. However, he seems to long for the atmosphere of yore, when excited hackers fueled the advance of innovation for innovation’s sake, rather than the current domination of cloud advances by corporate types looking to make a buck. While he admits companies acknowledge the open source contributions to their products, they usually do so by way of pointing out their own efforts to give back.
The article observes:
“Big data’s community is purely commercial and without the threat of a big competitor to stand in its way. In this sense, it is more of a gold rush than Linux ever was, because without the checks of the internal and external pressures that the early Linux community endured, there seems to be nothing that can get in big data’s way.
“Which may be why we are seeing, even now, signs from experts that are warning potential customers and the vendors willing to take their money to slow down and start really thinking about what they want to do.”
Excellent advice for any gold rush, we’d say. Proffitt feels the same, but observes that such voices of caution were in the minority among the Conference’s speakers. No surprise there; who has time for the voice of reason during a stampede?
Cynthia Murrell, March 8, 2012
Sponsored by Pandia.com
Social Media Analytics: What Are Social Media Data?
March 8, 2012
We have been following Text Analytics News, along with Useful Social Media, in its recent series of interviews with experts in the field of Social Media Analytics. The third installment focuses on what exactly social media data is and where it comes from.
“Social Media Analytics Expert Interview Series: Part 3” is conducted by the Chief Editor of Text Analytics News, Ezra Steinberg. The interviews are published as a lead-up to the Social Media Analytics Summit. The interview panel for this installment includes: Tom H. C. Anderson, CEO, OdinText – Anderson Analytics; Nathan Gilliatt, Principal, Social Target; Chris Moody, COO, Gnip; and Kami Watson Huyse, CEO, Zoetica Media. The interview covers experts’ definitions and interpretations of social media data and attempts to resolve confusion about how to use these data. Some insights from the interview follow:
“USM: When you think of “Social Media Data,” what do you think of first? Second?
Kami (Zoetica Media): Social media data is at the heart of understanding your community. Far from being cold and impersonal, data can tell a story that intuition alone cannot deliver. As much as we like to believe that we fully understand our community, what people say and what people do are often very different. Data can help to guide intuition.
For that reason, the second thing I think of when I consider social media data is its importance as a tool to diagnose, prioritize and evaluate what you are doing as an organization and use it to make course corrections.
USM: Do you think there is currently a common understanding as to what constitutes social media data?
Chris (Gnip): Definitely not. For example, some think of social media data as Twitter data because Twitter has done a better job than some other companies of making their data available in a full coverage, reliable, scalable format. The reality is that social media data comes in lots of different forms from lots of different sources. We’re working hard to help companies understand how different types of social data can be useful for different types of analysis.”
The interview focuses on understanding social media data and getting the most out of the analytics that it provides. Attention is also given to social media monitoring vendors and analytical tools, with opinions from the experts on which ones are valuable and how they work. Businesses are learning that weighing these opinions and using social media is valuable when attempting to learn about and understand customers and potential customers. The full interview can be found here and can give insight on this marketing tool and how it works.
Andrea Hayden, March 8, 2012
Sponsored by Pandia.com
Protected: Activate SharePoint Features Before You Start a Project
March 7, 2012
Exogenous Complexity 5: Fees for Online Content
March 7, 2012
I wanted to capture some thoughts sparked by some recent articles about traditional publishing. If you believe that the good old days are coming back for newspapers and magazines, stop reading. If you want to know my thoughts about the challenges many, many traditional publishers face, soldier on. Want to set me straight? Please use the comments section of this Web log. —Stephen E Arnold
Introduction
I would have commented on the Wall Street Journal’s “Papers Put Faith in Pay Walls” on Monday, March 5, 2012. Unfortunately, the dead tree version of the newspaper did not arrive until this morning (March 6, 2012). I was waiting to find out how long it would take the estimable Wall Street Journal to get my print subscription to me in rural Kentucky. The answer? A day as in “a day late and a dollar short.”
Here’s what I learned on page B5:
As more newspapers close the door on free access to their websites [sic], some publishers are still waiting for paying customers to pour in.
No mention of the alleged calisthenics that News Corp.’s staff have undertaken in order to get a story. But the message for me was clear. Newspapers, like most of those dependent on resource rich, non digital methods of generating revenue, have to do something. In the case of the alleged actions of News Corp., I am hypothesizing that almost anything seems to be worth considering.
Is online to blame? Are dark forces of 12 year olds who download content the root of the challenges? Is technology going to solve another problem or just add to the existing challenges?
Making money online is a tough, thankless task. A happy quack to http://www.calwatchdog.com/tag/sisyphus/
My view is that pay walls are just one manifestation of the wrenching dislocations that demographic preferences, technology, financial larking, and plain old stubbornness unleash. The Wall Street Journal explains several pay wall plans; for example, the Wall Street Journal is $207 a year versus the New York Times’s fee of $195.
The answer for me is that I did not miss the hard copy Wall Street Journal too much. I dropped the New York Times print subscription and I seem to be doing okay without that environmentally-hostile bundle of cellulose and chemically-infused ink. Furthermore, I don’t use either the Wall Street Journal or the New York Times online. The reason is that edutainment, soft features, recycled news releases, and sensationalism do not add value to my day in Harrod’s Creek, Kentucky. When I look at an aggregation of stories, sometimes I click and see a full text article from one of these two newspapers. Sometimes I get the story. Sometimes I get asked to sign up. If the story displays, fine. If not, I click to another tab.
When I worked at Halliburton NUS and then at Booz, Allen & Hamilton, reading the Wall Street Journal and the New York Times was part of the “package”. Now I am no longer a blue chip “package.” I am okay with that repositioning. The upside is that I don’t fool with leather briefcases, ties, and white shirts unless I have to attend a funeral. At my own, I have no doubt I will be decked out in my “real” job attire. The downside is that I am getting old, and at age 67 less and less interested in MBA wackiness. For now, I am okay with tan pants, a cheap nylon shirt, a worn Reebok warm up jacket, and whatever information I can view on my computing device.
Newspaper publishing has not adjusted to age as I have. Here’s a factoid from my notes about online revenue:
When printed content shifts to digital form, the online version shifts from “must have” to “nice to have”. As a result, the revenues from online cost more to generate and despite the higher costs, the margins suck. Publishers don’t like to accept the fact that the shift to online alters the value of the content. Publishers have high fixed costs, and online thrives when costs are driven as low as possible.