The Governance Air Craft Carrier: Too Big to Sail?

August 31, 2011

In a few days, I disappear into the wilds of a far off land. In theory, a government will pay me, but I am increasingly doubtful of promises made 3,000 miles from Harrod’s Creek. As part of the run up to my departure, we held a mini webinar/consultation on Tuesday, August 30, 2011, with a particularly energetic company engaged in “governance.” (SharePoint Semantics has dozens of articles about governance. One example is “A Useful Guide to SharePoint Success from Symon Garfield”.) The format of the call was basic. The people on the call asked me questions, and I provided only the perspective that three score years and as many online failures can provide. (I will mention SharePoint, but my observations apply to other systems as well; for instance, Documentum, Interwoven, FileNet, etc.)

What I want to do in this short write up is identify a subject that we did not tackle directly in that call, which concerned a government project. However, after the call, I realized that what I call an “air craft carrier” problem was germane to the discussion of automated indexing and entity extraction. An air craft carrier today is a modular construction. The idea is that the flight deck is made by one or more vendors, moved to the assembly point, and bolted down. The same approach is taken with cabins, electronics, and weapon systems.

The basic naval engineering best practice is to figure out how to get the design nailed down. Who wants to have propeller assemblies arrive that do not match the hull clearance specification?

What’s an air craft carrier problem? An air craft carrier is a big ship. It is, according to my colleague Rick Fiust, a former naval officer, a “really big ship.” Unlike a rich person’s yacht or a cruise ship, an air craft carrier does more than surprise with its size. Air craft carriers pack a wallop. In grade school I remember learning the phrase “gun boat diplomacy.” The idea was that a couple of gun boats sends a powerful message.

What every content centric system aspires to be. Some information technology professionals will tell their bosses or clients, “You have a state of the art search and content processing system. Everything works.” Unlikely in my experience.

Governance, or what I like to think of as “editorial policy,” is an air craft carrier. The connotation of governance is broad, involves many different functions, and sends a powerful message. The problem is that when content in an organization becomes unmanageable, the air craft carrier runs aground and the crew is not exactly sure what to do about the problem.

Consider this real life example. A well meaning information technology manager installs SharePoint to allow the professionals in marketing to share their documents, price lists, and snippets from a Web site. Then the company acquires another firm, which runs SharePoint as well as a handful of enterprise applications. On the surface, the situation looks straight forward. However, the task of getting the two organizations’ systems to work smoothly is a bit tricky. There are the standard challenges of permissions and access as well as somewhat more exotic ones of coping with intra-unit indexing and index refreshes. Then a third company is acquired, and it runs SharePoint. Unlike the first two installations which were “by the book”, the third company’s information technology unit used SharePoint as a blank canvas and created specialized features and services, plugged in third party components, and some home grown code.

Now the content issue arises. What content is available, when, to whom, and under what circumstances? Because the SharePoint installation was built in separate modules over time, will these fit together? Nope. There was no equivalent of the naval engineering best practice.

Governance, in my opinion, is the buzz word slapped on content centric systems of which SharePoint is but one example. The same governance problem surfaces when multiple content centric systems are joined.

Will after the fact governance solve the content problems in a SharePoint or other content centric environment? In my experience, the answer is, “Unlikely.” There are four reasons:

Cost. Reworking three systems built on the same platform should be trivial. It is not. The work is difficult, and in some situations, scrapping the original three systems and starting over may be a more cost effective solution. Who knows what interdependencies lurk within the three systems which are supposed to work as one? Open ended engineering projects are likely to encounter funding problems, and the systems must be used “as is” or fixed a problem at a time.

Written by Stephen E. Arnold · Filed Under Feature, Indexing, Search, Technology, Text processing | 1 Comment

EasyAsk Sweetens Sugar CRM

August 31, 2011

The world of customer relationship management (CRM) got a lot sweeter a few months ago with the announcement that SugarCRM is partnering with EasyAsk. SugarCRM is a leader in the world of customer relationship management. SugarCRM said:

SugarCRM helps your business communicate with prospects, share sales information, close deals and keep customers happy. SugarCRM is an affordable web-based CRM solution for small- and medium-sized businesses. Offered in the Cloud or on-site, it is easy to customize and adapt to the way you do business.

“EasyAsk and SugarCRM to Provide Natural Language Search and Analysis,” covers the news of this exciting joint venture. We learned:

EasyAsk and SugarCRM announced . . . at SugarCon 2011 that they will team up to offer EasyAsk for SugarCRM, a new version of EasyAsk with natural search and analysis software integrated with SugarCRM. The integrated product will deliver SugarCRM information through EasyAsk’s language interface and tools. With EasyAsk for SugarCRM, users can ask questions in English and get immediate answers from their SugarCRM system.

The natural language processing (NLP) offered by EasyAsk allows users to query and communicate in English, smoothing the language barrier between man and machine. EasyAsk’s NLP technology and engineering really move the SugarCRM cloud offerings to the next level.

We are glad to see such a natural partnership taking place between two innovators. Other businesses, especially those in eCommerce and mobile apps, would do well to incorporate the EasyAsk language interface and tools into their offerings. Doing so would most certainly increase user satisfaction and reduce their own engineering and design stress.

Other independent software vendors have embedded the EasyAsk natural language interface into their offerings with considerable success. Among the companies using EasyAsk’s NLP technology are Siemens, Personnel Data Systems, Ceridian, and Gensource.

Although IBM has been intent on wowing the consumer with the Jeopardy game show demonstration, EasyAsk has been building a market for real-world natural language solutions. In addition, EasyAsk has delivered a version of its NLP tools on NetSuite, one of the leading providers of software as a service enterprise resource planning solutions.

The EasyAsk-SugarCRM partnership came together smoothly, with both firms able to sweeten their product offerings while solving problems for licensees. We will continue to cover EasyAsk. We think the search firm will continue to prove itself a market and technology leader.

Stephen E Arnold, August 31, 2011

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Business process, Natural language processing, News, Technology, Text processing | 3 Comments

Do Search Engines Have a Future?

August 31, 2011

In a recent guest post at OStatic.com, Grant Ingersoll, founder of Lucid Imagination (a provider of open-source, Apache-based search tools), discusses the increasing sophistication of search engines. Ingersoll attributes this spike in sophistication to the increased amount of data available, as well as the increase in resources available to interpret that data. In addition, there is more of a focus on the ways in which the data is used, rather than just the sheer amount of data available. He asserts:

Previously, search engines only seemed to care about the text ingested. Now, we care not only about the content, but also how the users interacted with the results both personally and in the aggregate and across time. This social metadata helps bring the focus of search back on the user and their information need instead of solely on the raw data and the parsing of the language.

The ability to process large amounts of data has also become more available over the years, allowing search engine innovation to focus on analyzing data rather than merely processing it. Libraries like Apache Lucene, which allow for much quicker, easier, and more successful searching, are the next step in the evolution of Internet search engines. The combination of results-focused searches and collaborative communities marks the next phase in the development of search.

And the question, “Do search engines have a future?” We recognize that social content is important. We also know that when clicks and preferences replace curation, the notions of precision and recall require a new context. What happens if the “soft” side of information—the preferences of a possibly ill informed group of users—defines the high value content? We think that there are some significant issues related to provenance, accuracy, and bias that are now assumed to be no big deal. We think precision, recall, accuracy, objectivity, and provenance are important. Search engines are now either part of the problem or part of the solution. These issues are not whether a system is free, low cost or high cost. The issue with which we are concerned is the sort of thing that gave Alexis de Tocqueville pause:

In the United States, the majority undertakes to supply a multitude of ready-made opinions for the use of individuals, who are thus relieved from the necessity of forming opinions of their own.

Now search?

Jody Barnes, August 31, 2011

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Business strategy, News, Search | 1 Comment

StoredIQ Releases New eDiscovery App

August 31, 2011

StoredIQ, one of the leading companies in the field of internet-based legal discovery, has just announced the release of StoredIQ Integrated Legal Hold, a product designed to streamline the legal hold process. Integrating many different aspects of the discovery process, including notification, acknowledgement, collection, and preservation, this new package is intended to simplify the process for users. We learned:

“We want to change the customer mindset to define Legal Hold as not just the simple act of notifying custodians to preserve relevant data, but see it as a holistic process that includes notification tightly coupled with the collection and preservation of responsive data,” said Amir Jaibaji, vice president of product management for StoredIQ.

The product makes it easier for companies to comply with all aspects of case law, a process which, in the past, was a daunting task for those involved. By integrating Legal Hold with DiscoveryIQ, the company’s eDiscovery application, all aspects of the discovery process are contained in one simple, easy-to-use program. This is all part of StoredIQ’s plan to not only ease the process, but to redefine the Legal Hold concept itself – to consider it a “holistic process” in which every step is part of the same overall plan.

Jody Barnes, August 31, 2011

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Business strategy, EDiscovery, News, Text processing | Comments Off on StoredIQ Releases New eDiscovery App

CourseSmart: College eBook Leader

August 31, 2011

We monitor the eBook market. The search functions for eBooks are an area ripe for innovation. As we were looking for more effective search solutions for eBooks, we came across an item which we wanted to document.

An example of the way that technology has pervaded every aspect of our lives is the increased use of digital textbooks by college students. Given the high cost of college tuition, many students find eBooks more affordable. eBooks also offer direct access to media resources like online quizzes and extra course material not taught in lectures.

To capitalize on this new market, electronic textbook companies like CourseSmart, Barnes & Noble, and Amazon are all fighting for the claim of largest eBook library. Unfortunately, there was no way to objectively compare each company’s offerings, until now.

According to “What Electronic Textbook Provider Has The Biggest Library?”, the textbook price comparison site Campusbooks recently did a study to find a winner. The write up states:

The site worked with partner booksellers to come up with a list of the 1,000 most popular textbooks for fall 2011 to use as its metric. It takes into account data that professors share with bookstores in order to help them determine demand, including which books they have selected for their upcoming classes and how many students are signed up for them.

After reviewing their data, Campusbooks declared CourseSmart to be the leader in eBooks.

Jasmine Ashton, August 31, 2011

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Business strategy, News, Publishing | 1 Comment

Google Study Finds Web Banners Ineffective

August 31, 2011

On Saturday, one reader sent us a link to this story: “Is Google’s Search for Quality Content a Ruse for a Massive Diversion of Cash to Its Own Sites?” We are not sure if the points in the write up are spot on, but the theme of the article connected to another story we noticed.

According to a 2010 survey by Google, the average click through rate for banner ads this past year was 0.09 percent, which is down from 0.1 percent in 2009. This decrease leads me to believe that attempts to make banner ads more inviting to potential customers are failing miserably. However, the article “Google: Click-Through Rates Fell in 2010 [Study]” states:

[The study] found that the format of a display ad can make a difference. A 250×250 pixel ad using Flash got the highest CTR of any format — 0.26%. The worst performers were vertical 120×240 banners with Flash and a full (468×60) banner with Flash, which both got rates of 0.05%.
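To put the quoted percentages in perspective, here is a small arithmetic sketch using only the figures cited above. The function name and the 10,000 impression baseline are illustrative choices, not part of the Google study.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old


# Click through rates quoted in the post, expressed in percent.
ctr_2009 = 0.10
ctr_2010 = 0.09
best_format = 0.26   # 250x250 Flash ad
worst_format = 0.05  # 120x240 and 468x60 Flash banners

# Relative year-over-year movement of the average rate.
year_over_year = relative_change(ctr_2009, ctr_2010)

# How many times better the best format performed than the worst.
format_ratio = best_format / worst_format

# Expected clicks for a hypothetical 10,000 impressions at the 2010 rate.
clicks_per_10k = ctr_2010 / 100 * 10_000

print(f"Year-over-year change: {year_over_year:.0%}")
print(f"Best vs. worst format: {format_ratio:.1f}x")
print(f"Clicks per 10,000 impressions: {clicks_per_10k:.0f}")
```

The drop from 0.1 to 0.09 percent looks tiny in absolute terms, but it is a relative decline of about ten percent, and the spread between ad formats is roughly fivefold.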

As with television ads, it’s difficult to determine the effectiveness of digital advertising by looking only at click-through. It is important to recognize that banner ads are not created in a vacuum, but are one small part of a larger, complex advertising strategy. Needless to say, if studies continue to show that any aspect of this strategy is failing, the implications for Google could be major.

At lunch on Sunday, I discussed these two items with two people immersed in Web advertising. Three observations stuck in my mind:

First, if there is a softening in click through or online ad revenue, Google will have little choice but to find ways to pump up its revenue.

Second, the notion of social media fatigue seems germane. People may be tired of online ads. The result is to shift to a more low profile “pay to play” model. Overt ads may be on the down side after a long run up.

Third, the urgency for organizations like Google and Flipbook to find a way to inject rich media is an indication that the ad revenues flowing to television advertisers are the next Klondike.

I am not sure what to think, but this notion that online ad revenue may need some exoskeletal supports is fascinating. There are significant implications for objective search results as well.

Jasmine Ashton, August 31, 2011

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Business strategy, Marketing, News, Online (general), Rich media, SEO | Comments Off on Google Study Finds Web Banners Ineffective

Protected: Complex SharePoint Performance Management

August 31, 2011

Written by Stephen E. Arnold · Filed Under Enterprise, Enterprise search, News, Search, SharePoint | Comments Off on Protected: Complex SharePoint Performance Management

Facebook NLP Group Postings

August 30, 2011

ZDNet reports on the technical side of Facebook’s newest adjustment to the ubiquitous news feed in “Facebook using natural language processing to group posts, link to Pages.”

Facebook has added a new type of story to its News Feed today: if more than one of your friends post about the same topic, and it has a Page on the social network, the posts will be grouped under a Posted About story, even if your friends don’t explicitly tag the Page. The story is posted in the following format: “[Friend] and [x] other friends posted about [Page]” where the last part is a link to the Page in question. It turns out Facebook is using natural language processing on status updates as well as the headlines of posted links to figure out if a topic mentioned has a corresponding Page, and then searches to see if your other friends have done so as well.
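The grouping behavior described in the quote can be sketched in a few lines. This is a toy illustration only: it swaps in simple case-insensitive substring matching for whatever natural language processing Facebook actually uses, and the Page names, friends, and posts are invented for the example.

```python
from collections import defaultdict

# Hypothetical Page names and status updates, made up for illustration.
PAGES = ["Harry Potter", "Apache Lucene"]

posts = [
    ("Alice", "Just saw the new Harry Potter movie!"),
    ("Bob", "harry potter marathon tonight"),
    ("Carol", "Reading about Apache Lucene 4.0"),
    ("Dave", "Can't wait for Harry Potter this weekend"),
]


def group_posts_by_page(posts, pages):
    """Map each Page name to the friends whose posts mention it."""
    groups = defaultdict(list)
    for friend, text in posts:
        lowered = text.lower()
        for page in pages:
            # Naive stand-in for NLP: a case-insensitive substring test.
            if page.lower() in lowered and friend not in groups[page]:
                groups[page].append(friend)
    return groups


def posted_about_stories(groups):
    """Build a headline for every Page mentioned by more than one friend."""
    stories = []
    for page, friends in groups.items():
        if len(friends) > 1:
            others = len(friends) - 1
            noun = "friend" if others == 1 else "friends"
            stories.append(
                f"{friends[0]} and {others} other {noun} posted about {page}"
            )
    return stories


stories = posted_about_stories(group_posts_by_page(posts, PAGES))
for story in stories:
    print(story)
```

Only topics mentioned by more than one friend produce a story, which mirrors the condition stated in the ZDNet excerpt. A real system would need to handle misspellings, ambiguity, and untagged references, which is where the NLP comes in.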

This is Facebook’s first attempt to match the “trending” ability of Twitter. I’m just not sure that it’s effective yet. As a librarian and a Facebook user, I see both a personal and a professional side to this story. The privacy aspect of the social network prevents a metasearch that would allow a user to find relevant posts by subject across the platform. So in its place, it seems Facebook is attempting to show you common themes among those within your friend group, those to whom you have access.

A sure complaint will be the mile-long list one has to scroll through when a friend’s birthday is being universally acknowledged. Sometimes the groupings don’t even make sense, as is the case when two people on different days post to the same person’s wall, but the news feed is grouped together. The only common denominator in that equation is the individual whose wall was being written on. The topics can be completely separate and yet somehow wind up grouped together.

The real power that Facebook should leverage with this feature is the ad revenue potential. The ZDNet article shows an example of several friends discussing the new Harry Potter movie, with a news grouping headline of “So and so and two other friends also posted about Harry Potter.” “Harry Potter” then acts as a link to the product page, in this case, the movie promotion page. So here there is potential to take casual conversation, use NLP to pick up on products or services that also use Facebook, and then direct those users back to the product pages for that particular product or service. If Facebook could find a way to do this effectively, and get companies to buy in on this type of targeted marketing, it could be much more effective than sidebar ads. Expect to see more experimentation from Facebook with natural language processing.

Emily Rae Aldridge, August 30, 2011

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Business strategy, Facebook, News, Technology, Text processing | Comments Off on Facebook NLP Group Postings

Apache Lucene 4.0 Changes Revealed

August 30, 2011

We prepared a report for a search vendor last week and reported that in our sample of organizations, more than 12 percent reported using open source software. Compared to three years ago, that’s a significant jump. Open source, despite the machinations of some large outfits, continues to make inroads in certain organizations. We learned that when there are strong advocates of open source working at an organization, there is a correlation between access to expertise and an internal cheerleader and the appetite for open source solutions.

Curious about the upcoming Apache Lucene 4.0? OStatic gives us this “Guest Post: Under the Hood in Apache Lucene 4.0,” in which Lucene insider Simon Willnauer details a few big changes.

The decision to let go of backward compatibility allows for significant advances. For one, in the search engine library, the text strings used in indexing are replaced with UTF-8 bytes. This revision increases efficiency in term dictionary loading, memory usage, and search speeds. The change also allows for the much anticipated “flexible indexing.” Willnauer explains:

Optimized codecs can be loaded to suit the indexing of individual datasets or even individual fields. . . . New indexing codecs can be developed and existing ones updated without the need for hard-coding within Lucene. There is no longer any need for project-level compromise on the best general-purpose index formats and data structures.

Next, indexing will now use multiple threads. This shift makes better use of multi-core processing and input/output resources. Then there’s “concurrent flushing,” where each thread buffer can flush its memory separately without interfering with other threads. Finally, a painstakingly revised Levenshtein automaton algorithm greatly improves fuzzy matching.
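Fuzzy matching rests on the idea of edit distance: how many single character insertions, deletions, or substitutions separate two terms. The sketch below uses the textbook dynamic programming calculation, not the automaton-based approach the guest post attributes to Lucene 4.0. The term list and the `fuzzy_match` helper are hypothetical and exist only to show the concept.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(query: str, terms: list, max_edits: int = 2) -> list:
    """Return the terms within max_edits of the query, in original order."""
    return [term for term in terms if levenshtein(query, term) <= max_edits]


# A made-up term dictionary for illustration.
terms = ["lucene", "lucent", "license", "solr", "lucerne"]
matches = fuzzy_match("lucene", terms, max_edits=1)
print(matches)
```

Scanning every term this way grows costly as the dictionary grows, which is the kind of problem an automaton built from the query is meant to address: it can reject non-matching terms without computing a full distance for each one.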

According to Willnauer, these tidbits are just the beginning. We agree, but the involvement of legal eagles could destabilize the open source band wagon.

Cynthia Murrell, August 30, 2011

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under News, Open source, Search | Comments Off on Apache Lucene 4.0 Changes Revealed

Calibre Aces Ebook Conversion and Management

August 30, 2011

Anyone who uses an eBook knows how challenging managing all the books can be. To solve this annoying problem a new program has entered the market: Calibre, an eBook management tool. With so many different types of files and equally different types of eReaders available, it’s nice to finally have a central command to sort through it all.

The concept was born of an avid eBook enthusiast and reader who was unhappy with the software available for eBook management and file conversion. Calibre, as it is today, is a work-in-progress that aims to meet the demands of busy eReading folk. As the website explains,

Today Calibre is a vibrant open-source community with half a dozen developers and many, many testers and bug reporters. It is used in over 200 countries and has been translated into a dozen different languages by volunteers. Calibre has become a comprehensive tool for the management of digital texts, allowing you to do whatever you could possibly imagine with your e-book library.

Perhaps the best feature of Calibre is its ability to convert all types of files, making it possible to download an eBook of any type and then send it to the eReader of choice. Voila! As one Calibre fan wrote in the article “Best Ebook Library Manager: Calibre” on Book Sprung, “Calibre’s secret weapon is that it’s got crazy ninja formatting skills, and can convert all sorts of files into all sorts of other files. For Kindle owners, this means you can convert unusable file formats into the .mobi format that Kindle likes.”
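For readers who prefer scripts to menus, Calibre ships a command line tool, `ebook-convert`, that takes an input file and an output file. The sketch below wraps it in Python under the assumption that Calibre is installed and on the PATH. The file name `book.epub` and both helper functions are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source: str, target_format: str = "mobi") -> list:
    """Build the ebook-convert command that converts source to target_format."""
    src = Path(source)
    # The output file sits next to the input, with the new extension.
    target = src.with_suffix("." + target_format)
    return ["ebook-convert", str(src), str(target)]


def convert(source: str, target_format: str = "mobi") -> str:
    """Run the conversion and return the output path; needs Calibre installed."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    command = build_convert_command(source, target_format)
    subprocess.run(command, check=True)
    return command[2]


# Show the command without running it, so the sketch works without Calibre.
command = build_convert_command("book.epub")
print(" ".join(command))
```

Keeping command construction separate from execution makes it easy to inspect what would run before touching any files.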

We look forward to seeing what else Calibre can pull out of its hat, and more importantly, if the eBook providers of the world will play nice with the newest teacher’s pet.

Catherine Lamsfuss, August 30, 2011

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Business strategy, News, Online (general), Publishing | 2 Comments

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
