Fragmented Data: Still a Problem?

January 28, 2019

Digital transitions are a major shift for organizations. The shift brings new technology and better ways to serve clients, but it also brings massive amounts of data. All organizations with a successful digital implementation rely on data. Too much data, however, can hinder an organization’s performance. The IT Pro Portal explains why something called mass data fragmentation is a major issue in the article, “What Is Mass Data Fragmentation, And What Are IT Leaders So Worried About It?”

The biggest question is: what exactly is mass data fragmentation? I learned:

“We believe one of the major culprits is a phenomenon called mass data fragmentation. This is essentially just a technical way of saying, ‘data that is siloed, scattered and copied all over the place’ leading to an incomplete view of the data and an inability to extract real value from it. Most of the data in question is what’s called secondary data: data sets used for backups, archives, object stores, file shares, test and development, and analytics. Secondary data makes up the vast majority of an organization’s data (approximately 80 per cent).”

The article compares secondary data to an iceberg: most of it is hidden beneath the surface. The poor visibility creates compliance and vulnerability risks; in other words, security issues that put the entire organization at risk. Most organizations, however, view their secondary data as a storage bill, a compliance risk (at least they recognize that much), and a giant headache.

When organizations were surveyed about the amount of secondary data they hold, many turned out to have multiple copies of the same data spread across cloud and on-premises locations. IT teams are expected to manage the secondary data across all of those locations, but without the right tools and technology, the task is unending, unmanageable, and the root of more problems.
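One way to see how many copies of the same data are sitting in different locations is to group files by a hash of their contents. The sketch below is a minimal illustration, not a tool the article describes: it assumes each location (a backup share, an archive, a cloud bucket synced locally) is reachable as a local directory, and the function names `content_hash` and `find_duplicate_copies` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_copies(roots: list[Path]) -> dict[str, list[Path]]:
    """Group every file under the given roots by content.

    Only groups holding two or more files (i.e. duplicated data) are returned.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for root in roots:
        for path in root.rglob("*"):
            if path.is_file():
                groups[content_hash(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

For example, `find_duplicate_copies([Path("/mnt/backup"), Path("/mnt/archive")])` (hypothetical mount points) would map each duplicated file's digest to every path holding a copy. Hashing full contents catches exact copies only; near-duplicates, such as re-exported reports with new timestamps, need fuzzier matching.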

If organizations managed mass data fragmentation efficiently, they could improve their bottom line, reduce costs, and reduce security risks. Every additional access point to sensitive data that is not secured increases the risk of hacking and stolen information.

Whitney Grace, January 28, 2019

Written by Stephen E. Arnold · Filed Under Cloud computing, Data, Database, Federated search, News

Amazon Intelligence Gets a New Data Stream

June 28, 2018

I read “Amazon’s New Blue Crew.” The idea is that Amazon can disintermediate FedEx, UPS (the outfit with the double parking brown trucks), and the US Postal Service.

On the surface, the idea makes sense. Push down delivery to small outfits. Subsidize them indirectly and directly. Reduce costs and eliminate intermediaries not directly linked to Amazon.

FedEx, UPS, and the USPS are not the most nimble outfits around. I used to get FedEx envelopes every day or two. I haven’t seen one of those for months. Shipping via UPS is a hassle. I fill out forms and have to manage odd slips of paper with arcane codes on them. The US Postal Service works well for letters, but I have noticed some returns for “addresses not found.” One was an address in the city in which I live. I put the letter in the recipient’s mailbox. That worked.

The write up reports:

The new program lets anyone run their own package delivery fleet of up to 40 vehicles with up to 100 employees. Amazon works with the entrepreneurs — referred to as “Delivery Service Partners” — and pays them to deliver packages while providing discounts on vehicles, uniforms, fuel, insurance, and more. They operate their own businesses and hire their own employees, though Amazon requires them to offer health care, paid time off, and competitive wages. Amazon said entrepreneurs can get started with as low as $10,000 and earn up to $300,000 annually in profit.

Now what’s the connection to Amazon streaming data services and the company’s intelligence efforts? Several hypotheses come to mind:

- Amazon obtains fine grained detail about purchases and delivery locations. These are data which can no longer be captured by a non-Amazon delivery service.
- The data can be cross-correlated; for example, purchasers of a particular Kindle title can be matched with deliveries of a particular product, such as hydrogen peroxide.
- Amazon’s delivery data make it possible to capture metadata about delivery time, whether a person accepted the package or it was left at the door, and other location details, such as a blocked entrance.

A few people dropping off packages is not particularly useful. Scale up the service across Amazon operations in the continental United States or a broader swath of territory, and the delivery service becomes a useful source of high value information.

FedEx and UPS are ripe for disruption. But so is the streaming intelligence sector. Worth monitoring this ostensibly common sense delivery play.

Stephen E. Arnold, June 28, 2018

Written by Stephen E. Arnold · Filed Under Amazon, Analytics, Data, Data mining, News

Health Care: Data an Issue

June 28, 2018

Healthcare analytics is helping doctors and patients make decisions in ways we never could have dreamed. From helping keep your heart healthy to deciding when to have major surgery, analytics make a big impact. However, the underlying data needs to be perfect in order to work, according to a recent ZDNet story, “Google AI is Very Good at Predicting When a Patient is Going to Die.”

According to the story:

“As noted, 80 percent of the effort in creating an analytic model is in cleaning the data, so it could provide a way to scale up predictive models, assuming the data is available to mine…. “This technique would allow clinicians to check whether a prediction is based on credible facts and address concerns about so-called ‘black-box’ methods that don’t explain why a prediction has been made.”

This really illustrates how powerful clean data can be in the health field. However, cleaning data is just about the most misunderstood wallflower in the often tedious world of machine learning and data science, not just in healthcare. According to Entrepreneur magazine, the work of filling in blanks, removing outliers, and checking all the data for accuracy is the most important part of the process and also the hardest role to fill on a team.
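The two chores Entrepreneur names, filling in blanks and removing outliers, can be sketched in a few lines of Python. This is an illustrative assumption rather than a method from either article: blanks are filled with the median of the observed values, and outliers are dropped with Tukey's interquartile-range fences (the `k = 1.5` multiplier is a common convention). The function name `clean_readings` is hypothetical.

```python
import statistics
from typing import Optional


def clean_readings(values: list[Optional[float]], k: float = 1.5) -> list[float]:
    """Fill blanks with the median, then drop points outside Tukey's IQR fences."""
    present = [v for v in values if v is not None]
    if not present:
        return []
    median = statistics.median(present)
    # Step 1: fill in blanks (None) with the median of the observed values.
    filled = [median if v is None else v for v in values]
    # Quartiles need at least two points on older Python versions.
    if len(filled) < 2:
        return filled
    # Step 2: remove outliers outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, _, q3 = statistics.quantiles(filled, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in filled if low <= v <= high]
```

With `[72.0, None, 75.0, 71.0, 74.0, 300.0, 73.0]` as input, the blank becomes the median (73.5) and the implausible 300.0 is dropped. In a clinical setting, deciding whether such a value is a recording error or a genuine reading still takes human judgment.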

Garbage in, garbage out. True decades ago. True today. How do we know? Just ask one of IBM Watson’s former health care specialists. Querying patients who were on the wrong end of a smart output may be helpful as well.

Patrick Roland, June 28, 2018

Written by Stephen E. Arnold · Filed Under AI, Data, healthcare, News

Who Guesses Better: Humans or Smart Software

March 28, 2018

MBAs are likely to pay close attention to smart software that makes decisions about which startup or stock to back.

With all the hand wringing about how artificial intelligence is going to put a lot of people out of work and drastically change our future landscape, it’s almost as if commentators are treating it as a given that humans are inferior. These writers and thinkers don’t seem to have any faith that our brains can do the heavy lifting too. CNBC recently found a niche where maybe we simple men and women can keep up thanks to…research, of course. We learned more in the article, “Doing Your Homework Does Lead to Better Investing returns.”

According to the story:

“…sophisticated hedge-fund managers are simply more skilled at processing swaths of information and data, their advantage may be more in their ability to match private data with public disclosures and SEC filings. ‘We look at the people who do robotic downloading. The people who use it suggests that hedge funds are going out and that they’re getting public information whenever they need.’”

It’s a great angle, for sure: with endless hours of research, our investments can turn to gold. However, this overlooks the possibility that the data itself may be flawed. What if you are using biased info or downright bad data?

Perhaps the humans are better at picking winners than smart software. Data are not created equal. Smart software may incur a penalty because of flawed inputs. Bad data can cripple some data analytics outputs.

Net net, as the MBAs say, data have to be reliable. For now, bet on the human when it comes to deciding about investments.

Patrick Roland, March 28, 2018

Written by Stephen E. Arnold · Filed Under AI, Data, News

Importance of Good Data to AI Widely Underappreciated

March 27, 2018

Reliance on AI has now become embedded in our culture, even as we struggle with issues of algorithmic bias and data-driven discrimination. Tech news site CIO reminds us, “AI’s Biggest Risk Factor: Data Gone Wrong.” In the detailed article, journalist Maria Korolov begins with some early examples of “AI gone bad” that have already occurred and explains how this happens: hard-to-access data, biases lurking within training sets, and faked data are all concerns. So is building an effective team of data management workers who know what they are doing. Regarding the importance of good data, Korolov writes:

Ninety percent of AI is data logistics, says JJ Guy, CTO at Jask, an AI-based cybersecurity startup. All the major AI advances have been fueled by advances in data sets, he says. ‘The algorithms are easy and interesting, because they are clean, simple and discrete problems,’ he says. ‘Collecting, classifying and labeling datasets used to train the algorithms is the grunt work that’s difficult — especially datasets comprehensive enough to reflect the real world.’… However, companies often don’t realize the importance of good data until they have already started their AI projects. ‘Most organizations simply don’t recognize this as a problem,’ says Michele Goetz, an analyst at Forrester Research. ‘When asked about challenges expected with AI, having well curated collections of data for training AI was at the bottom of the list.’ According to a survey conducted by Forrester last year, only 17 percent of respondents say that their biggest challenge was that they didn’t ‘have a well-curated collection of that to train an AI system.’

Eliminating bias gleaned from training sets (like one AI’s conclusion that anyone who’s cooking must be a woman) is tricky, but certain measures could help. For example, tools that track how an algorithm came to a certain conclusion can help developers correct its impression. Also, independent auditors can bring in a fresh perspective. These delicate concerns are part of why, says Korolov, AI companies are “taking it slow.” This is slow? We’d better hang on to our hats whenever (they decide) they’ve gotten a handle on these issues.

Cynthia Murrell, March 27, 2018

Written by Stephen E. Arnold · Filed Under AI, algorithms, Data, News

Million Short: A Metasearch Option

March 22, 2018

An interview at Forbes delves into the story behind Million Short, an alternative to Google for Internet search. As concerns grow about online privacy, information accuracy, and filter bubbles, options that grant the user more control appeal to many. Contributor Julian Mitchell interviews Million Short founder and CEO Sanjay Arora in his piece, “This Search Engine Startup Helps You Find What Google Is Missing.” Mitchell informs us:

Founded in 2012, Million Short is an innovative search engine that takes a new and focused approach to organizing, accessing, and discovering data on the internet. The Toronto-based company aims to provide greater choices to users seeking information by magnifying the public’s access to data online. Cutting through the clutter of popular searches, most-viewed sites and sponsored suggestions, Million Short allows users to remove up to the top one million sites from the search set. Removing ‘an entire slice of the web’, the company hopes to balance the playing field for sites that may be new, suffer from poor SEO, have competitive keywords, or operate a small marketing budget. Million Short Founder and CEO Sanjay Arora shares the vision behind his company, overthrowing Google’s search engine monopoly, and his insight into the future of finding information online.

The subsequent interview gets into details, like Arora’s original motivation for creating Million Short: search is too important to be dominated by just a few companies, he insists. The pair explore both the advantages and the challenges the company has seen, and they look to the future. See the article for more.
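The core idea Mitchell describes, removing up to the top one million sites from the search set, amounts to filtering results against a popularity ranking. The sketch below assumes a ranked list of domains is available and that matching on hostname (minus a leading `www.`) is good enough; Million Short's actual implementation is not described in the article, and `drop_top_sites` is a hypothetical name.

```python
from urllib.parse import urlsplit


def drop_top_sites(results: list[str], ranked_domains: list[str], cutoff: int) -> list[str]:
    """Remove results whose host is among the first `cutoff` domains of a popularity ranking.

    `ranked_domains` is assumed to be ordered from most to least popular.
    Result order is preserved for everything that survives the filter.
    """
    excluded = set(ranked_domains[:cutoff])  # set lookup stays fast even for 1M entries

    def host(url: str) -> str:
        # urlsplit lowercases the hostname; strip a leading "www." so that
        # www.example.org and example.org count as the same site.
        hostname = urlsplit(url).hostname or ""
        return hostname.removeprefix("www.")

    return [url for url in results if host(url) not in excluded]
```

Setting `cutoff` to 1,000,000 mirrors the most aggressive setting the article mentions; smaller cutoffs remove only the very largest sites. Converting the excluded slice to a set keeps each lookup constant-time, which matters when the exclusion list runs to a million entries.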

Cynthia Murrell, March 22, 2018

Written by Stephen E. Arnold · Filed Under Data, News, Search, search engine

Reddit Turns to Bing for AI Prowess

March 19, 2018

This is an interesting development: MSPoweruser announces, “Reddit Partners with Microsoft and Bing for AI Tools.” We’d thought Reddit was thrilled with Solr. Reddit co-founder Alexis Ohanian announced the partnership at the Everyday AI event in San Francisco, saying his company required “AI heavy lifting” to analyze the incredible amounts of data it collects. For its part, Bing gets access to valuable data. Writer Surur tells us:

The partnership will benefit both parties with Reddit contributing content to Bing such as AMAs and advertising upcoming AMAs and Reddit Answers and Microsoft making subreddit content more visible in their search results. Now when searching for a subreddit in Bing it will deliver a live snapshot of the top threads in the subreddit. Ohanian noted that Reddit is the largest answer database of nuanced, verified answers, offering an amazing resource to Bing. He noted that the Bing partnership was like a crown jewel for Reddit and just scratches the surface of what is possible with Microsoft’s AI expertise and Reddit data. For companies who use Reddit for professional and commercial reasons, Reddit will be offering the Power BI suite of solution templates for brand management and targeting on Reddit which will enable brands, marketers, and budget owners to quickly analyze their Reddit footprint and determine how, where, and with whom to engage in the Reddit community.

With 330 million monthly active users, Reddit is about the same size as Twitter; that is indeed a lot of data. Surur points us to Reddit’s blog post on the subject for more information.

Cynthia Murrell, March 19, 2018

Written by Stephen E. Arnold · Filed Under AI, Data, Microsoft, News

New SEO Predictions May Just Be Spot On

March 7, 2018

What will 2018 bring us? If the past twelve months were any indication, we have no idea what will hit next. However, that doesn’t stop the experts from trying to cash in on their Nostradamus abilities. Some of them actually sound pretty plausible, like those in the Search Engine Journal article, “47 Experts on the Top SEO Trends For 2018.”

There are some real long shots on the list, but also some really insightful thoughts like:

In 2018 there will be an even bigger focus on machine learning and “SEO from data.” Of course, the amplification side of things will continue to integrate increasingly with genuine public relations exercises rather than shallow-relationship link building, which will become increasingly easy to detect by search engines.

Something which was troubling about 2017, and as we head into 2018, is the new wave of organizations merely bolting on SEO as a service without any real appreciation of structuring site architectures and content for both humans and search engine understanding. While social media is absolutely essential as a means of reaching influencers and disrupting a conversation to gain traction, grow trust and positive sentiment, those who do not take the time to learn about how information is extracted for search too may be disappointed.

We especially agree that the importance of SEO will grow in the new year. Innovative organizations are finding amazing new ways to manipulate the data, and we don’t expect that to stop. It’ll be interesting to see where we stand twelve months from now.

Patrick Roland, March 7, 2018

Written by Stephen E. Arnold · Filed Under Data, News, search engine, SEO

Universal Text Translation Is the Next Milestone for AI

February 9, 2018

As the globe gets smaller, individuals are in more contact with people who don’t speak their language, and they are reading more information written in foreign languages. Programs like Google Translate are flawed at best, and it is clear this is a niche waiting to be filled. With advances in AI, it looks like that is about to happen, according to a recent GCN article, “IARPA Contracts for Universal Text Translator.”

According to the article:

The Intelligence Advanced Research Projects Activity is a step closer to developing a universal text translator that will eventually allow English speakers to search through multilanguage data sources — such as social media, newswires and press reports — and retrieve results in English.

The intelligence community’s research arm awarded research and performance monitoring contracts for its Machine Translation for English Retrieval of Information in Any Language program to teams headed by leading research universities paired with federal technology contractors.

Intelligence agencies, said IARPA project managers in a statement in late December, grapple with an increasingly multilingual, worldwide data pool to do their analytic work. Most of those languages, they said, have few or no automated tools for cross-language data mining.

This sounds like a very promising opportunity to get everyone speaking the same language. However, we think there is still a lot of room for error. We are hedging our bets on Unbabel’s AI translation software, which is backed up by human editors. (They raised $23M, so they must be doing something right.) That human angle seems to be the hinge on which success in this rich field will turn.

Patrick Roland, February 9, 2018

Written by Stephen E. Arnold · Filed Under AI, Data, News, Tools

Data Governance is the Hot Term in Tech Now

February 5, 2018

Data governance is a headache many tech companies have to juggle. With all the advances in big data and search, how can we possibly make sense of this rush of information? Thankfully, there are new data governance advances that aim to help. We learned more from a recent TopQuadrant story, “How Does SHACL Support Data Governance.”

According to the story:

“SHACL (SHAPES Constraint Language) is a powerful, recently released W3C standard for data modeling, ontology design, data validation, inferencing and data transformation. In this post, we explore some important ways in which SHACL can be used to support capabilities needed for data governance.

Below, each business capability or value relevant to data governance is introduced with a brief description, followed by an explanation of how the capability is supported by SHACL, accompanied by a few specific examples from the use of SHACL in TopBraid Enterprise Data Governance.”

So, governance is a great way for IT and business to communicate better and wade through the data. Others are starting to take notice, and SHACL is not the only solution. In fact, there is a wealth of options available; you just have to know where to look. Regardless, your business is going to have to take governance seriously, and it’s better to start sooner rather than later.

Patrick Roland, February 5, 2018

Written by Stephen E. Arnold · Filed Under Data, Enterprise, Government, News

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.