Open Source Impacts Information Infrastructure

January 2, 2013

Open source continues not just to meet information needs but also to drive the future direction of information, including new technologies and architectural structures. Adrian Bridgwater at Open Source Insider looks ahead at open source and information infrastructure in his article, “The Future Impact of Open Source on our Information Infrastructure.”

Bridgwater quotes some numbers from Gartner showing that by 2015, 25% of new database management systems will support alternative data types and non-traditional data structures. He continues:

“Gartner’s Merv Adrian says that the products needed to perform this work will be “purpose-built alternatives” but that they are, as yet, immature . . . ‘This was before massive scale-out architectures were commonplace and the variety of data types now being deployed existed. New product categories have sprung up, designed to fit new needs at lower cost and with better alignment to new development tools and practices. Products and vendors are continually emerging to offer more purpose-built alternatives; these are immature, but often worth assessing.’”

In the quote above, the continually emerging products and vendors point to open source solutions. Open source is a cost-effective and efficient way to meet the needs of non-traditional data structures and types. Proprietary solutions are often unable to react quickly and affordably to emerging trends. For instance, Big Data solutions are now almost entirely dominated by open source. LucidWorks is one such vendor offering a strong open source Big Data solution, and its LucidWorks Search is also a leading enterprise search option.

Emily Rae Aldridge, January 02, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Big Data and Search

January 1, 2013

A new year has arrived. Flipping a digit on the calendar prompts many gurus, wizards, failed Web masters, former real journalists, and unemployed English majors to identify trends. How can I resist a chrome-plated, Gangnam Style bandwagon? Big Data is no trend. It is, according to the smart set:

…that Big Data would be “the next big chapter of our business history.”

My approach is more modest. And I want to avoid silver-numbered politics and the monitoring business. I want to think about a subject of interest to a small group of techno-watchers: Big Data and search.

My view is that there has been Big Data for a long time. Marketers and venture hawks circle an issue. If enough birds block the sun, others notice. Big Data is now one of the official Big Trends for 2013. Search, as readers of this blog may know, experiences the best of times and the worst of times regardless of the year or the hot trends.

As the volume of unstructured information increases, search plays a part. What’s different for 2013 is that those trying to make better decisions need a helping hand, crutches, training wheels, and tools. Vendors of analytics systems like SAS and IBM SPSS should be in the driver’s seat. But these firms are not. An outfit like Palantir claims to be the leader of the parade. The company has snazzy graphics and $150 million in venture funding. Good enough for me, I suppose. The Palantirs suggest that the old dudes at SAS and SPSS still require individuals who understand math and can program for the “end user.” Not surprisingly, there are more end users than there are SAS and SPSS wizards. One way around the shortage is to make Big Data a point-and-click affair. Satisfying? The marketers say, “For sure.”

A new opportunity arises for those who want the benefits of fancy math without the cost, hassle, and delay of dealing with intermediaries who may not have an MBA or aspire to be independently wealthy before the age of 30. Toss in the health care data the US Federal government mandates, the avalanche of fuzzy thinking baloney from blogs like this one, and the tireless efforts of PR wizards to promote everything from antique abacuses to zebra striped fabrics. One must not overlook e-mail, PowerPoint presentations, and the rivers of video which have to be processed and “understood.” In these streams of real time and semi-fresh data, there must be gems which can generate diamond bright insights. Even a sociology major may have a shot at a permanent job.

The biggest of the Big Berthas are firing away at Big Data. Navigate to “Sure, Big Data Is Great. But So Is Intuition.” Harvard, MIT, and juicy details explain that the trend is now anchored in the halls of academe. There is even a cautionary quote from an academic who was able to identify just one example of Big Data going somewhat astray. Here’s the quote:

At the M.I.T. conference, a panel was asked to cite examples of big failures in Big Data. No one could really think of any. Soon after, though, Roberto Rigobon could barely contain himself as he took to the stage. Mr. Rigobon, a professor at M.I.T.’s Sloan School of Management, said that the financial crisis certainly humbled the data hounds. “Hedge funds failed all over the world,” he said. The problem is that a math model, like a metaphor, is a simplification. This type of modeling came out of the sciences, where the behavior of particles in a fluid, for example, is predictable according to the laws of physics.

Sure, Big Data has downsides. MBAs love to lift downsides via their trusty, almost infallible intellectual hydraulics.

My focus is search. The trends I wish to share with my two or three readers require some preliminary observations:

  1. Search vendors will just say they can handle Big Data. Proof not required. It is cheaper to assert a capability than to actually develop one.
  2. Search vendors will point out that sooner or later a user will know enough to enter a query. Fancy math notwithstanding, nothing works quite like a well-crafted query. Search may be a commodity, but it will not go away.
  3. Big Data systems are great at generating hot graphics. In order to answer a question, a Big Data system must be able to display the source document. Even the slickest analytics person has to find a source. Well, maybe not all of the time, but sometimes it is useful prior to a deposition.
  4. Big Data systems cannot process certain types of data. Search systems cannot process certain types of data. It makes sense to process whatever fits into each system’s intake system and use both systems. The charm of two systems which do not quite align is sweet music to a marketer’s ears. If a company has a search system, that outfit will buy a Big Data system. If a company has a Big Data system, the outfit will be shopping for a search system. Nice symmetry!
  5. Search systems and Big Data systems can scale. Now this particular assertion is true when one criterion is met: an unending supply of money. The Big Data thing has a huge appetite for resources. Chomp. Chomp. That’s the sound of a budget being consumed in a sprightly way.

Now the trends:

Trend 1. Before the end of 2013, Big Data will find itself explaining why the actual data processed were Small Data. The assertion that existing systems can handle whatever the client wants to process will be exposed for what it is: selective content processing. Big Data are big, and systems have finite capacity. Some clients may not be thrilled to learn that their ore did not include the tonnage that contained the gems. In short, say hello to aggressive sampling and indexes which are not refreshed in anything close to real time.

Trend 2. Big Data and search vendors will be tripping over themselves in an effort to explain which system does what under what circumstances. The assertion that a system can do both structured and unstructured while uncovering the meaning of the data is one I want to believe. Too bad the assertion is mushy in the accuracy department’s basement.

Trend 3. The talent pool for Big Data and search is less plentiful than the pool of art history majors. More bad news: the pool is not filling rapidly. As a result, quite a few data swimmers drown. Example: the financial crisis, perhaps? The talent shortage suggests some interesting cost overruns and project failures.

Trend 4. A new Big Thing will nose into the Big Data and search content processing space. Will the new Big Thing work? Nah. The reason is that extracting high value knowledge from raw data is a tough problem. Writing new marketing copy is a great deal easier. I am not sure what the buzzword will be. I am pretty sure vendors will need a new one before the end of 2013. Even PSY called it quits with Gangnam style. No such luck in Big Data and search at this time.

Trend 5. The same glassy-eyed confusion which analytics and search presentations engender will lead to greater buyer confusion and slow down procurements. Not even the magic of the “cloud” will be able to close certain deals. In a quest for revenue, the vendors will wrap basic ideas in a cloud of unknowing.

I suppose that is a good thing. Thank goodness I am unemployed, clueless, and living in a rural Kentucky goose pond.

Stephen E Arnold, January 1, 2013

Another Beyond Search analysis for free

IBM Forgets About Vivisimo

January 1, 2013

IBM wants to take a bite out of Big Data by educating people on the topic and then encouraging them to use its products. One of the ways IBM does this is through its Redbooks publications. A recent publication, IBM InfoSphere Streams V3.0: Addressing Volume, Velocity, and Variety, discusses how a Big Data platform will allow people to structure and use their data:

“There are multiple uses for big data in every industry—from analyzing larger volumes of data than was previously possible to driving more precise answers, to analyzing data at rest and data in motion to capture opportunities that were previously lost. A big data platform will enable your organization to tackle complex problems that previously could not be solved using traditional infrastructure. As the amount of data available to enterprises and other organizations dramatically increases, more and more companies are looking to turn this data into actionable information and intelligence in real time. Addressing these requirements requires applications that are able to analyze potentially enormous volumes and varieties of continuous data streams to provide decision makers with critical information almost instantaneously.”
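To make the “data in motion” idea concrete, here is a minimal, generic sketch of a sliding-window aggregation over a continuous stream, the kind of near-real-time computation the quote describes. This is plain Python for illustration only, not IBM’s Streams Processing Language, and the sensor feed is simulated.

```python
# Hypothetical illustration only: a generic sliding-window aggregation over a
# continuous event stream ("data in motion"). Not InfoSphere Streams or SPL.
import random
import time
from collections import deque

WINDOW_SECONDS = 10  # keep only the last 10 seconds of readings


def sensor_stream():
    """Simulate an unbounded stream of (timestamp, value) readings."""
    while True:
        yield time.time(), random.gauss(100.0, 15.0)


def rolling_average(stream, window_seconds=WINDOW_SECONDS):
    """Emit the average of all readings that fall inside the time window."""
    window = deque()  # (timestamp, value) pairs still inside the window
    for ts, value in stream:
        window.append((ts, value))
        # Drop readings that have aged out of the window.
        while window and window[0][0] < ts - window_seconds:
            window.popleft()
        yield sum(v for _, v in window) / len(window)


if __name__ == "__main__":
    for i, avg in enumerate(rolling_average(sensor_stream())):
        print(f"rolling average: {avg:.2f}")
        if i >= 20:  # stop the demo; a real stream never ends
            break
        time.sleep(0.1)
```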

The publication suggests using IBM InfoSphere as the enterprise platform for Big Data developments. The platform can be used as a testing ground for analyzing data and deciding the best ways to govern it. Did IBM forget about its “other” Big Data platform, though? Vivisimo was acquired to be the spotlight of Big Data for IBM. Why is it not discussed here?

Whitney Grace, January 01, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Information Management and Accessibility Promote a Fully Connected Enterprise

December 31, 2012

The end of the year always signals a time for reflection on past trends and projections for the future. Speculation on anything and everything big data is typical throughout the year, but even this already popular subject is seeing a spike. From Computer Weekly comes a big data countdown called “Top 10 Information Management Stories of 2012.”

The article points to specific themes that popped up after sifting through articles dealing with big data and information management.

According to this article, some big data solutions have been tough to get off the ground and to get up and running:

Open source big data technologies seem stuck in the sand pit of experimentation in UK corporate organizations. Experts say: experiment, but keep business value in mind. Speaking at a Computer Weekly roundtable on the topic, Bob Harris, chief technology officer at Channel 4, said big data initiatives will likely require organizations to adopt new technologies.

Of course, open source big data technologies have the capability for continued experimentation. What else would any client want in this era of constant technological evolution? The point is that flexibility should not be interpreted as an inability to operate in the current landscape. We see solutions such as PolySpot work for countless organizations in making information and insights accessible to the entire enterprise.

Megan Feil, December 31, 2012

Sponsored by ArnoldIT.com, developer of Beyond Search.

Unstructured Information Has Many Possibilities and Dangers

December 31, 2012

Unstructured information has been piling up for years, and it was not until the Big Data boom of early 2012 that people really began to see its hidden potential. Like a huge boulder rolling downhill, uses for unstructured information have taken off, and businesses are doing their best to take advantage of the new information streams. In many cases, however, businesses do not know where to begin. OpenText has caught on to this need and started a new series: “Introducing the OpenText CEO White Paper Series.” The first paper was only recently published (talk about fresh), and OpenText’s CEO Mark J. Barrenechea wrote it. Here is a brief explanation of what to expect from the new series:

“’Each corporate information asset represents both risk and value to today’s organization. Every email is a potential smoking gun and every contract the potential solution to a costly litigation. At the same time, unstructured information is today’s oil, and being able to capture, preserve, manage, and capitalize on it is the next frontier of competitive business. EIM acts as a force multiplier in helping organizations unlock the untapped value of unstructured information, while complying with regulatory requirements and ensuring that corporate data is safe.’”

OpenText, we are waiting for more to come out. People need this information for their Big Data projects. Think of it as structured and unstructured information for dummies.

Whitney Grace, December 31, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Effective Search Technology Is Business Critical in 2013

December 28, 2012

This has been the year of “big data” and search acquisitions, and we have tracked the changes closely. CMSWire has also noticed the trend, making note in the recent article “Search in 2013 Will Become a Business Critical Application.” The article comments on the emerging trend of search implementation and argues that information and actionable insights from big data are critical to business. The need has never been greater for effective search technology and support.

The article continues:

“The Findwise survey […] indicates that less than 20 percent of organizations have a strategy for search even though many of them will be supporting multiple search applications. I expect this figure to improve markedly by the time the 2013 Findwise survey is presented [in May].

The 2011/2012 search vendor acquisition frenzy took out most of the mid-range vendors. In 2013 we will find out whether smaller commercial vendors can attract the investment they need to bring their technologies to a wider market or whether the space will be taken by open-source applications.”

We believe search will be critical to successful business operations in 2013. Secure search and the use of metatagging will lead to improved business processes and enterprise decisions driven by content. The article mentions Intrafind as a potential option for a blend of open-source and proprietary modules and we believe this software has the necessary offerings to help integrate business-critical search.

Andrea Hayden, December 28, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

PolySpot Enables Efficient Information Dissemination and Analysis in the Enterprise

December 28, 2012

Bruno Aziza has two titles for his position at data analytics company SiSense. One is Vice President of Worldwide Marketing…and the other: Data Geek. He comes across as such in a recent MIT Sloan Management Review article, “The Big Deal About a Big Data Culture (and Innovation).”

In a conversation with the magazine’s contributing editor, Aziza talks about the developing role of data analytics and offers insight into how to use data and analytics effectively.

Aziza describes the context surrounding the surge of interest in big data:

Secondly, I think the term analytics has raised the awareness of the problem. Before we used to call this business intelligence, and it’s funny how just the change of a term to business analytics made other people want to be interested in it. Also the financial crisis has helped people realize that you can be doing business in the old fashioned way, or you can be trying to be smarter than the other guys.

Whether it is called big data analytics or business intelligence, the important part of the evolution is that businesses now understand how important efficient access to big data is to gaining a competitive advantage over other companies. One solution we have seen translate into ROI for organizations extracting value from big data is PolySpot. Its technology allows information dissemination and analysis to happen quickly and effectively.

Megan Feil, December 28, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Bust Big Data Down to Size with PolySpot and Enable Unified Information Access

December 27, 2012

While a big data strategy is more or less automatically affordable for a larger corporation, much has been said about affordability and feasibility for small businesses. Small Business Labs reports some interesting data in its recent article, “Survey – How Small Businesses View Big Data.”

The findings stem from the Big Data for the Little Guy project by Intuit, which surveyed 500 small business owners. Small business owners see the potential value and opportunities within big data, but they have concerns about whether the big data train is something they can hop aboard. Reportedly, four in ten believe that it would be useful.

The article states:

71% of the respondents noted potential barriers, with the top being:

  • 15% felt big data might be too costly
  • 14% said they didn’t have the time for implementation
  • 10% said they didn’t understand it
  • 9% said they don’t have data
  • 8% said they lack the expertise required
  • 6% said they don’t have the tools
  • 6% said it’s too hard

The survey asked for their top reason, but I’m sure that if they had been asked to list all the potential barriers, most of these would have been cited by most of the respondents.

There are many open source solutions that are affordably priced, customizable, and scalable enough to fit small businesses. A vital technology for companies that need to extract value from big data is one that handles everything related to information access and delivery; solutions from PolySpot, for example, are designed for exactly this big-data-busting purpose.

Megan Feil, December 27, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Rackspace Unveils Cloud Database as a Service

December 27, 2012

Rackspace has made a name for itself providing Cloud infrastructure. The future is definitely in the Cloud, as security becomes less of an issue and prices continue to drop. Rackspace’s newest offering is a Cloud database as a service provided by Cloudant. Read the full details in the CRN.com story, “Rackspace Unveils Cloud Database as a Service with Cloudant.”

The article begins:

“Cloud infrastructure provider Rackspace is offering a database as a service for developers of Web and mobile applications in its Cloud Tools program. The NoSQL database as a service is provided by Cloudant through its Data Layer, a collection of database clusters hosted in Rackspace’s worldwide data centers. Cloudant’s Data Layer offers a CouchDB-compatible, RESTful JSON API; a MapReduce engine; and built-in full-text search, based on Apache Lucene, which is a Java-based, open-source information retrieval library.”

Apache Lucene is a powerful base on which to build. LucidWorks also draws on the power of Lucene, though it offers a different type of product: primarily search and Big Data solutions for the enterprise. The emergence of such popular and effective solutions built on open source infrastructure is further evidence that the future is in open source, and that organizations need to stay in tune with the latest technology in order to stay relevant and effective.
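For readers curious what a “CouchDB-compatible, RESTful JSON API” with Lucene-backed search looks like from the client side, here is a rough sketch. The account URL, credentials, database, design document, and index names are all hypothetical placeholders; the endpoint shapes follow the CouchDB convention and Cloudant’s documented search path, but this is an illustration, not production code.

```python
# Hypothetical client-side sketch of a CouchDB-style JSON API plus a
# Lucene-backed search index. Account URL, credentials, and index names
# are placeholders for illustration only.
import requests

ACCOUNT = "https://example-account.cloudant.com"  # hypothetical account URL
AUTH = ("apikey", "apipassword")                  # placeholder credentials
DB = "articles"

# 1. Create (or update) a document with an ordinary HTTP PUT, CouchDB style.
doc = {
    "title": "Rackspace Unveils Cloud Database as a Service",
    "body": "Cloudant's Data Layer offers a CouchDB-compatible JSON API...",
}
resp = requests.put(f"{ACCOUNT}/{DB}/crn-story-2012", auth=AUTH, json=doc)
print(resp.status_code, resp.json())  # expect 201 and {"ok": true, ...}

# 2. Query a full-text index (backed by Apache Lucene) using a Lucene query
#    string. "search-ddoc" and "by_text" are assumed names for a design
#    document and search index that would have to be defined beforehand.
params = {"q": "title:rackspace AND body:cloudant", "limit": 5}
resp = requests.get(
    f"{ACCOUNT}/{DB}/_design/search-ddoc/_search/by_text",
    auth=AUTH,
    params=params,
)
for row in resp.json().get("rows", []):
    print(row["id"], row.get("fields", {}))
```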

Emily Rae Aldridge, December 27, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Google Spanner Achieves the Impossible

December 27, 2012

Wired has posted a thorough article about a recent Google tech breakthrough in its “Exclusive: Inside Google Spanner, the Largest Single Database on Earth.” This database, like many other Googley projects, grew out of the company’s solution to an internal problem: keeping data synchronized across its scattered data centers without being slowed down by the delay that usually plagues global communications and data sharing.

“Spanner is a creation so large, some have trouble wrapping their heads around it. But the end result is easily explained: With Spanner, Google can offer a web service to a worldwide audience, but still ensure that something happening on the service in one part of the world doesn’t contradict what’s happening in another. . . .

“Before Spanner was revealed, many didn’t even think it was possible. Yes, we had ‘NoSQL’ databases capable of storing information across multiple data centers, but they couldn’t do so while keeping that information ‘consistent’ — meaning that someone looking at the data on one side of the world sees the same thing as someone on the other side. The assumption was that consistency was barred by the inherent delays that come when sending information between data centers.”

Google’s engineers have found a way, though, one that involved creating the company’s own time-keeping mechanism and that ended up reducing costs in the bargain. The article is well worth reading for the details.
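The published Spanner design exposes that time-keeping mechanism as a “TrueTime” API, which reports a bounded interval of uncertainty around the current time; a transaction’s commit is held back until its timestamp is definitely in the past everywhere. Here is a toy sketch of that commit-wait idea. The TrueTime class and its uncertainty bound are simulated for illustration; Google’s real implementation relies on GPS and atomic clocks and is not reproduced here.

```python
# Toy illustration of Spanner-style "commit wait" with a simulated TrueTime.
# The 7 ms uncertainty bound is an assumed value, not Google's.
import time
from dataclasses import dataclass


@dataclass
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Pretend clock that admits it might be off by up to `epsilon` seconds."""

    def __init__(self, epsilon: float = 0.007):
        self.epsilon = epsilon

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, timestamp: float) -> bool:
        """True only when `timestamp` is definitely in the past."""
        return self.now().earliest > timestamp


def commit(true_time: SimulatedTrueTime) -> float:
    """Assign a commit timestamp, then wait out the clock uncertainty.

    Once commit() returns, every replica's clock agrees the timestamp is in
    the past, so readers anywhere can be shown a consistent ordering.
    """
    commit_ts = true_time.now().latest  # pessimistic choice of timestamp
    while not true_time.after(commit_ts):
        time.sleep(0.001)               # the "commit wait"
    return commit_ts


if __name__ == "__main__":
    tt = SimulatedTrueTime()
    ts = commit(tt)
    print(f"transaction committed at {ts:.6f}; uncertainty has elapsed")
```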

What caught our eye most, though, is the hostility toward Google in some of the comments. We won’t reproduce them here, but we wonder: why the chip on the community’s shoulder? We officially love Google.

Cynthia Murrell, December 27, 2012

Sponsored by ArnoldIT.com, developer of Augmentext
