Computational Constraints: Big Data Are Big

July 8, 2015

Navigate to “Genome Researchers Raise Alarm over Big Data.” The point of the write up is that “genome data will exceed the computing challenges of YouTube and Twitter.” This may be a surprise to the faux Big Data experts. The write up points out:

… they [computer wizards] agree that the computing needs of genomics will be enormous as sequencing costs drop and ever more genomes are analyzed. By 2025, between 100 million and 2 billion human genomes could have been sequenced, according to the report, which is published in the journal PLoS Biology. The data-storage demands for this alone could run to as much as 2–40 exabytes (1 exabyte is 10^18 bytes), because the number of data that must be stored for a single genome are 30 times larger than the size of the genome itself, to make up for errors incurred during sequencing and preliminary analysis.

Until computing resources are sufficiently robust and affordable, the write up states:

Nevertheless, Desai [an expert] says, genomics will have to address the fundamental question of how much data it should generate. “The world has a limited capacity for data collection and analysis, and it should be used well. Because of the accessibility of sequencing, the explosive growth of the community has occurred in a largely decentralized fashion, which can’t easily address questions like this,” he says. Other resource-intensive disciplines, such as high-energy physics, are more centralized; they “require coordination and consensus for instrument design, data collection and sampling strategies”, he adds. But genomics data sets are more balkanized, despite the recent interest of cloud-computing companies in centrally storing large amounts of genomics data.

Will the reality of Big Data increase awareness of the need for Little Data; that is, trimmed sets? Nah, probably not.

Stephen E Arnold, July 8, 2015

The Bing Listicle: Bing Search Strategy

July 8, 2015

I noted a slide show designed to pump up page views for eWeek. Navigate to “What the Bing Search Engine Brings to Microsoft’s Web Strategy.” Prepare to be patient because the code used to display the content makes life interesting.

Strategy means the big picture. Tactics means changing the color of an item in the picture. Bing has been an interesting search engine. The team has had a bit of a revolving door. The spin of the door has sucked in Australian and Chinese search wizards. The Bing thing sold its map “business.” The Bing thing cut a deal with AOL to provide search and ads, a sure fire combination for improved relevance in search results.

The listicle hits a number of strategic points. I want to comment on three. Visit the original listicle for the remaining strategic gems.

Strategic Move 1: Apple and Microsoft have a search partnership. Now Apple is rumored to be poking around in the Web search space. The listicle asserts that “Apple, Microsoft Form Search Partnership.” I find this interesting. It may be tactical for Apple and strategic for Microsoft. If Apple creates a semi workable search system, will Apple continue to embrace the besieged Microsoft? My money is on Apple for a deal that helps out Apple until the deal no longer helps out Apple.

Strategic Move 2: Bing offers a rewards program. This is pay to play. If lots of people use rewards, will Microsoft find the offer untenable? My hunch is that this Rewards thing is like the annoying and now-dead Scroogle: A desperate tactic, not a strategic move.

Strategic Move 3: Bing is “handy on Microsoft hardware.” Okay, but I use Apple computers. The notion that Bing is baked into Windows 10 and Windows hardware seems to make sense. But I turn off the crazy Microsoft search functions and rely on third party tools. The strategic move is great for Microsoft internal pitches. The tactic is one that may annoy some folks who use Windows hardware and is essentially another tactic to make Bing zing. If Bing is so wonderful, what’s Microsoft doing with Fast Search technology and the Delve search? I would conclude there is no search strategy at Microsoft.

Stephen E Arnold, July 8, 2015

Sprinklr Aims to Conquer Consolidation Market

July 8, 2015

Sprinklr is in a race with the likes of Salesforce as well as fellow social-consolidation startups. Forbes declares, “Sprinklr Acquires NewBrand, the $1 Billion Social Startup’s Seventh Buy in 18 Months.” Back when social media was new, companies scrambled to leverage its potential with a hodgepodge of tools. Now, Sprinklr founder Ragy Thomas sees a wave of consolidation approaching, as companies tire of struggling to unite disparate solutions. Writer Alex Konrad writes:

“Sprinklr is one of a number of companies facing pressure to provide a more complete stack to brands looking to integrate their social marketing and customer support, Thomas says. An obvious example is the Salesforce Marketing Cloud, built off a nucleus of its own acquisitions like ExactTarget, Buddy Media and Radian6. Demand for a more end-to-end solution has intensified in the last year, Thomas argues. That’s why Sprinklr has acquired so much and so quickly, the CEO argues, typically taking the absorbed startup and absorbing its code directly into Sprinklr’s main code. …

“Sprinklr will face competition from also well-financed startups like Percolate as well as from larger suite offerings like Salesforce. ‘We are in a race against time to provide the capability to brands,’ Thomas says. ‘It’s becoming a three or four horse race with a clear set of companies that big brands can bank on moving forward.’”

At the moment, it looks like Sprinklr may be ahead in that race; predictive-analytics/business-intelligence firm NewBrand is its seventh acquisition since the beginning of 2014. NewBrand launched in 2010, and is based in Washington, DC.

 Ragy Thomas founded Sprinklr in 2009. The company is headquartered in New York City, with offices around the world. The other six companies it has snapped up include Scup, Get Satisfaction, Pluck, Branderati, TBG Digital, and Dachis Group.

Cynthia Murrell, July 8, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Want To Know What A Semantic Ecosystem Is

July 8, 2015

Do you want to know what a semantic ecosystem is? The answer is available from TopQuadrant in its article, “Semantic Ecosystem-What’s That About?” According to the article, a semantic ecosystem enables patterns to be discovered, shows the relationships between and within data sources, adds meaning to raw data artifacts, and dynamically brings information together.

In short, it shows how data and its sources connect with each other and extracts relationships from it.

What follows the brief explanation of what a semantic ecosystem can do is a paragraph about the importance of data, how it takes many forms, etc., etc. Trust me, you have heard it before. It then makes a comparison with natural ecosystems, i.e., the ones found in nature.

The article continues with this piece:

“As in natural ecosystems, we believe that success in business is based on capability – and the ability to adapt and evolve new capabilities. Semantic ecosystems transform existing diverse information into valuable semantic assets. Key characteristics of a semantic ecosystem are that it is adaptable and evolvable. You can start small – with one or more key business solutions and a few data sources – and the semantic foundation can grow and evolve with you.”

It turns out a semantic ecosystem is just another name for information management.  TopQuadrant coined the term to associate with their products and services.  Talk about fancy business jargon, but TopQuadrant makes a point about having an information system work so well that it seems natural.  When a system works naturally, it is able to intuit needs, interpret patterns, and make educated correlations between data.

Whitney Grace, July 8, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Semantic Search and Challenging Patent Document Content Domains

July 7, 2015

Over the years, I have bumped into some challenging content domains. One of the most difficult was the collection of mathematical papers organized with the Dienst architecture. Another was a collection of blog posts from African bulletin board systems in a number of different languages, peppered with insider jargon. I also recall my jousts with patent documents for some pretty savvy outfits.

The processing of each of these corpuses and making them searchable by a regular human being remains an unsolved problem. Progress has been slow, and the focus of many innovators has been on workarounds. The challenge of each corpus remains a high hurdle, and in my opinion, no search sprinter is able to make it over the race course without catching a toe and plunging head first into the Multi-layer SB Resin covered surface.

I read “Why Is Semantic Search So Important for Patent Searching?” My answer was and remains, “Because vendors will grab at any buzzy concept in the hopes of capturing a share of the patent research market?”

The write up takes a different approach, an approach which I find interesting and somewhat misleading.

The write up states that there are two ways to search for information: navigational search, sort of like Endeca I assume, and research search, which is the old fashioned Boolean logic which I really like.

The article points out that keyword search sucks if the person looking for information does not know the exact term. That’s why I used the reference to Dienst. I wanted to provide an example which requires precise knowledge of terminology. That’s a challenge and it requires specialized knowledge from a person who recognizes that he or she may not know the exact terminology required to locate the needed information. Try the Dienst query. Navigate to a whizzy new search engine like www.unbubble.eu and plug away. How is that working out for you? But don’t cheat. You can’t use the term Dienst.

If you run the query on a point and click Web search system like Qwant.com, you cannot locate the term without running a keyword search.

The problems in patents, whether indexed with value added metadata, by humans laboring in a warehouse, or with semantic methods, are:

  1. Patent documents exist in versions and each document drags along assorted forms which may or may not be findable. Trips to the USPTO with hat in hand and a note from a senator often do not work. Fancy Dan patent attorneys fall back on the good old method of hunting using intermediaries. Not pretty, not easy, not cheap, and not foolproof. The versions and assorted attachments are often unfindable. (There are sometimes interesting reasons for this kettle of fish and the fish within it.) I don’t have a solution to the chains of documents and the versions of patent documents. Sigh.
  2. Patents include art. Usually the novice reacts negatively to lousy screenshots, clunky drawings, and equations which make it tough to figure out what a superscript character is. Keywords and pointing and clicking, metaphors, razzle dazzle search systems, and buzzword charged solutions from outfits like Thomson Reuters and Lexis are just tools, stone tools chiseled by some folks who want to get paid. I don’t have a good solution to the arts and crafts aspect of patent documents. Sigh sigh.
  3. Patent documents are written at a level of generalization, with jargon, Latinate constructs, and assertions that usually give me a headache. Who signed up to read lots of really bad poetry? Working through the Old Norse version of Heimskringla is a walk in the park compared to figuring out what some patents “mean.” I spent a number of years indexing 15th century Latin sermons. At least in that corpus, the common knowledge base was social and political events and assorted religious material. Patents can be all over the known knowledge universe. I don’t know of a patent processing system which can make this weird prose-poetry understandable if there is litigation or findable if there is a need to figure out if someone cooked up the same system and method before the document in question was crafted. Sigh sigh sigh.
  4. None of the systems I have used over the past 40 years does a bang up job of identifying prior art in scientific, technical or medical journal articles, blog posts, trade publications, or Facebook posts by a socially aware astrophysicist working for a social media company. Finding antecedents is a great deal of work. Has been and will be in my opinion. Sigh sigh sigh sigh. But the patent attorneys cry, “Hooray. We get to bill time.”

The write up presents some of those top brass magnets: Snappy visualizations. The idea is that a nifty diagram will address the four problems I identified in the preceding paragraphs. Visualizations may be able to provide some useful way to conceptualize where a particular patent document falls in a cluster of correctly processed patent documents. But an image does not deliver the mental equivalent of a NOW Foods Whey Protein Isolate.

Net net: Pitching semantic search as a solution to the challenges of patent information access is a ball. Strikes in patent searching are not easily obtained unless you pay expert patent attorneys and their human assets to do the job. Just bring your checkbook.

Stephen E Arnold, July 7, 2015

Walmart and the Big Data Elephant Riders

July 7, 2015

Navigate to the Capitalist Tool’s write up “Walmart: The Big Data Skills Crisis and Recruiting Analytics Talent.” Stating the obvious is something that most jargon delivery mechanisms avoid. Why be clear when obfuscation provides so many MBA-type chuckles?

The write up states about Big Data:

There just aren’t enough people with the required skills to analyze and interpret this information–transforming it from raw numerical (or other) data into actionable insights – the ultimate aim of any Big Data-driven initiative.

I had to sit down. Imagine. Specific skills are required to assemble data, formulate hypotheses, configure the numerical recipes, obtain outputs, and then analyze what the magic of math delivers.

Who would have thought that the average marketer might be a tiny bit under equipped to deal with Big Data in the here and now?

The write up states:

Last year, they [sic. The reference is to the single firm Walmart] turned to crowd sourced analytics competition platform Kaggle. At Kaggle, an army of “armchair data scientists” apply their skills to analytical problems submitted by companies, with the designer of the best solution being rewarded – sometimes financially, in this case with a job.

That’s a great solution. No problem with confidentiality in the crowdsourcing ecosystem. But Walmart hired candidates. Walmart explains what it seeks:

“Fundamentally,” says Thakur [Walmart manager], “we need people who are absolute data geeks–people who love data, and can slice it, dice it and make it do what they want it to do.”

Walmart also uses an “analytics rotation program.” I assume this is designed to ensure that the Big Data analytics wizard can “run in the right direction.”

Walmart, it appears, is the leader in using crowd sourced methods for finding talent. Perhaps Walmart perceives itself as one of the leaders in the use of this method. It is good to be a visionary in Walmart land. What is Walmart’s next innovation? I cannot anticipate the next revolutionary breakthrough from the retailer many local retail stores perceive as a good neighbor made better with Big Data.

Stephen E Arnold, July 7, 2015

Amazon: Its Search Warrants Watching

July 7, 2015

I read “Amazon Must Face Trademark Lawsuit over Search Results.” The write up reports that “the online retailer’s search results can cause confusion for potential customers.” The product in question is a watch from “high end watchmaker Multi Time Machine.”

My own experience with Amazon search results is that, on the whole, the system outputs “close” results. Close as in horseshoes. My annoyance grows each time I click on a title only to learn that it is not available. Grrr. How tough is it to allow me to NOT out results which I do not want to view? There are other issues as well. These range from the do it yourself approach to content processing for Amazon’s “enterprise search” on AWS to the baffling listing of results which are Amazon’s, in Amazon’s warehouse, available from an Amazon partner, or listed by a now unemployed middle school teacher after the product did not move at a recent garage sale.

The write up points out:

Amazon displays MTM Special Ops in the search field and immediately below the search field, along with similar watches manufactured by MTM’s competitors for sale. MTM alleged this could cause customers to buy from one of those competitors, rather than encouraging the shopper to look for MTM watches elsewhere.

But everyone loves Amazon, the click throughs (which are not used to fund Beyond Search, thank you), and the wonky lovable founder. I am convinced he is the world’s smartest man. I mean who could even think of being more intelligent?

I suppose my dull average intelligence, like Multi Time’s, is just not able to understand the relevance of Amazon’s search and retrieval system.

Stephen E Arnold, July 7, 2015

SharePoint 2016 to Feature Deeply Ingrained Cloud Services

July 7, 2015

As additional details continue to be released, the SharePoint community speculates about the role of the cloud in the upcoming 2016 version. According to the GCN article, “SharePoint 2016 Built on Cloud Foundation,” cloud will play a central role.

Read all the details in the article, which begins:

“When SharePoint Server 2016 is released next year, Microsoft’s cloud services will be deeply ingrained, creating a more unified end user experience across components. ‘Everything we’re doing in Office 365 inspires the [SharePoint Server] product going forward, and you’ll see this cadence continuing,’ said Mark Kashman, a senior product manager at Microsoft on the SharePoint team.”

It sounds like users may have a steeper learning curve on this upcoming version, but then the pace may be set for the next several years. What will be interesting to see is whether users find the cloud focus to be intuitive, or if it is a hindrance, particularly for those who have voiced a preference for on-premises capabilities to continue. Microsoft is definitely trying to walk the line and be all things to all people, but then that has always been both its greatest strength and its greatest weakness. Stephen E. Arnold is a longtime leader in search and he knows the strengths and weaknesses well. His Web service, ArnoldIT.com, features a dedicated SharePoint feed, and is a great resource for users who need to stay up to speed without a huge investment in research time.

Emily Rae Aldridge, July 7, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Coveo Partners with Etherios on Salesforce Services

July 7, 2015

Professional services firm Etherios is teaming up with Coveo in a joint mission to add even more value to customers’ Salesforce platforms, we learn from “Etherios and Coveo Announce Strategic Alliance” at Yahoo Finance. Etherios is a proud Salesforce Platinum Partner. The press release tells us:

 “Coveo connects information from across a company’s IT ecosystem of record and delivers the knowledge that matters to customers and agents in context. Coveo for Salesforce – Communities Edition helps customers solve their own cases by proactively offering case-resolving knowledge suggestions, and Coveo for Salesforce – Service Cloud Edition allows customer support agents to upskill as they engage customers by injecting case-resolving content and experts into the Salesforce UI as they work.

“Etherios provides customers with consulting and implementation services in the areas of Sales, Customer Service, Field Service and IoT [Internet of Things]. … Etherios capabilities span operational strategy, business process, technical design and implementation expertise.”

Founded in 2005, Coveo leverages search technology to boost users’ skills, knowledge, and proficiency while supplying tools for collaboration and self-service. The company maintains offices in the U.S. (San Mateo, CA), the Netherlands, and Quebec.

 A division of Digi International, Etherios launched in 2008 specifically to supply cloud-based tools for Salesforce users. They prefer to inhabit the cutting edge, and operate out of Chicago, Dallas, and San Francisco.

 Cynthia Murrell, July 7, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Digestible Content Tool For The Busy Person

July 7, 2015

RSS feeds and Web page readers curate content from select Web sites, tailored to suit a user’s needs. While all of the content is gathered in one spot and the headlines are available to read, sometimes the readers return hundreds of articles and users do not have the time to read all of them. True, sometimes users can glean the facts from the headlines and the small blurbs included with them, but sometimes that is not enough.

There are apps that gather and summarize a user’s content, but these are usually geared towards a specific industry or an enterprise system. There is a content reader that was designed for the average user, while at the same time it can be programmed to serve the needs of many professionals. The Context Organizer from Content Discovery Inc. is an application that summarizes Web pages and documents in order to pinpoint relevant information. The Context Organizer works via five basic steps:

“1. Get to the point – Speed-up reading by condensing web pages, emails and documents into keywords and summaries presented in context.
  2. Make a Long Story Short – The Short Summary headlines most important sentences – instant information capsules.
  3. Accelerate Search – Search the web with relevant keywords. Summarize Google search results for rapid understanding.
  4. Take Notes – Quickly collect topics and sentences. Send them to WordPad or Word. Share notes – send them by e-mail.
  5. Visualize – View summaries in context as Mindjet MindManager maps.”

There are three different Context Organizer versions: one that specifically searches the Web, another that searches the Web and Microsoft products, and a third that combines the prior versions and adds Mindjet MindManager. The prices range from $60 to $120 with a free twenty-one day trial, which we suggest you start with. Always start with the free trial, because you might be throwing away money on an item you do not like. With the amount of content available on the Web, any tool that helps organize and summarize it is worth investigating.

Whitney Grace, July 7, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 
