New Patent for a Google PageRank Methodology

December 18, 2015

Google recently received a patent for a different approach to page ranking, we learn from “Recalculating PageRank” at SEO by the Sea. Though the patent was just granted, the application was submitted back in 2006. Writer Bill Slawski informs us:

“Under this new patent, Google adds a diversified set of trusted pages to act as seed sites. When calculating rankings for pages, Google would calculate a distance from the seed pages to the pages being ranked. A use of a trusted set of seed sites may sound a little like the TrustRank approach developed by Stanford and Yahoo a few years ago as described in Combating Web Spam with TrustRank (pdf). I don’t know what role, if any, the Yahoo paper had on the development of the approach in this patent application, but there seems to be some similarities. The new patent is: Producing a ranking for pages using distances in a Web-link graph.”

The theory behind trusted pages is that “good pages seldom point to bad ones.” The patent’s inventor, Nissan Hajaj, has been a Google senior engineer since 2004. See the write-up for the text of the patent, or navigate straight to the U.S. Patent and Trademark Office’s entry on the subject.
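
For readers who want the flavor of the approach, here is a minimal Python sketch of distance-from-seeds scoring, assuming an unweighted link graph and plain breadth-first search; the patent’s actual distance calculation and how it feeds into ranking are more involved, and the URLs below are invented:

from collections import deque

def seed_distances(link_graph, seed_pages):
    """Breadth-first distance from a set of trusted seed pages.

    link_graph maps a page URL to the list of pages it links to;
    seed_pages is the trusted seed set. Returns page -> shortest
    link distance from any seed.
    """
    distances = {page: 0 for page in seed_pages}
    queue = deque(seed_pages)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in distances:  # first visit is the shortest path
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances

# Toy graph: pages nearer the trusted seed would presumably rank better.
graph = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["c.example"],
    "b.example": [],
    "c.example": ["spam.example"],
}
print(seed_distances(graph, ["seed.example"]))
# {'seed.example': 0, 'a.example': 1, 'b.example': 1, 'c.example': 2, 'spam.example': 3}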

 

Cynthia Murrell, December 18, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Old School Mainframes Still Key to Big Data

December 17, 2015

According to ZDNet, “The Ultimate Answer to the Handling of Big Data: The Mainframe.” Believe it or not, a recent survey of 187 IT pros from Syncsort found the mainframe to be important to their big data strategies. IBM has even created a Hadoop-capable mainframe. Reporter Ken Hess lists some of the survey’s findings:

*More than two-thirds of respondents (69 percent) ranked the use of the mainframe for performing large-scale transaction processing as very important

*More than two-thirds (67.4 percent) of respondents also pointed to integration with other standalone computing platforms such as Linux, UNIX, or Windows as a key strength of mainframe

*While the majority (79 percent) analyze real-time transactional data from the mainframe with a tool that resides directly on the mainframe, respondents are also turning to platforms such as Splunk (11.8 percent), Hadoop (8.6 percent), and Spark (1.6 percent) to supplement their real-time data analysis […]

*82.9 percent and 83.4 percent of respondents cited security and availability as key strengths of the mainframe, respectively

*In a weighted calculation, respondents ranked security and compliance as their top areas to improve over the next 12 months, followed by CPU usage and related costs and meeting Service Level Agreements (SLAs)

*A separate weighted calculation showed that respondents felt their CIOs would rank all of the same areas in their top three to improve

Hess goes on to note that most of us probably use mainframes without thinking about it; whenever we pull cash out of an ATM, for example, a mainframe is likely involved. The mainframe’s security and scalability remain unequaled, he writes, by any other platform or platform cluster yet devised. He links to a couple of resources besides the Syncsort survey that support this position: a white paper from IBM’s Big Data & Analytics Hub and a report from research firm Forrester.

 

Cynthia Murrell, December 17, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Google Timeline Knows Where You Have Been

December 16, 2015

We understand that to get the most out of the Internet, we sacrifice a bit of privacy; but do we all understand how far-reaching that sacrifice can be? The Intercept reveals “How Law Enforcement Can Use Google Timeline to Track Your Every Move.” For those who were not aware, Google helpfully stores all the places you (or your devices) have traveled, down to longitude and latitude, in Timeline. Now, with an expansion launched in July 2015, that information goes back years, instead of just six months. Android users must actively turn this feature off to avoid being tracked.

The article cites a report titled “Google Timelines: Location Investigations Involving Android Devices.” Written by a law-enforcement trainer, the report is a tool for investigators. To be fair, the document does give a brief nod to privacy concerns; at the same time, it calls it “unfortunate” that Google allows users to easily delete entries in their Timelines. Reporter Jana Winter writes:

“The 15-page document includes what information its author, an expert in mobile phone investigations, found being stored in his own Timeline: historic location data — extremely specific data — dating back to 2009, the first year he owned a phone with an Android operating system. Those six years of data, he writes, show the kind of information that law enforcement investigators can now obtain from Google….

“The ability of law enforcement to obtain data stored with privacy companies is similar — whether it’s in Dropbox or iCloud. What’s different about Google Timeline, however, is that it potentially allows law enforcement to access a treasure trove of data about someone’s individual movement over the course of years.”

For its part, Google says it does “respond to valid legal requests” but insists the bar is high; a simple subpoena, the company maintains, has never been enough. That is some comfort, I suppose.

Cynthia Murrell, December 16, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Big Data Gets Emotional

December 15, 2015

Christmas is the biggest shopping season of the year, and retailers spend months studying consumer data. They want to understand consumer buying habits, popular trends in clothing, toys, and other products, physical versus online retail, and especially what sales the competition will run to entice customers to buy more. Smart Data Collective recently wrote about the science of shopping in “Using Big Data To Track And Measure Emotion.”

Customer experience professionals study three things related to customer spending habits: ease, effectiveness, and emotion. Emotion is the biggest of the three and the strongest driver of customer loyalty. If data specialists could figure out a reliable way to measure emotion, shopping as we know it would change.

“While it is impossible to ask customers how do they feel at every stage of their journey, there is a largely untapped source of data that can provide a hefty chunk of that information. Every day, enterprise servers store thousands of minutes of phone calls, during which customers are voicing their opinions, wishes and complaints about the brand, product or service, and sharing their feelings in their purest form.”

The article describes some of the methods by which emotional data is gathered: phone recordings, surveys, and, biggest of all, analysis of vocal speech layers. Analytic platforms measure relationships between words and phrases to understand sentiment, placing each emotion on a five-point scale from positive to negative in order to discover the patterns that trigger reactions.
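
To make the idea concrete, here is a toy Python sketch of lexicon-based scoring on a five-point scale; the word lists and thresholds are invented for the example, and real platforms weigh acoustic cues and phrase relationships far more carefully than this:

# Toy illustration only, not any vendor's actual method. A lexicon-based
# scorer maps call-transcript text onto a five-point scale from very
# negative (1) to very positive (5). Word lists and thresholds are invented.
POSITIVE = {"great", "love", "thanks", "helpful", "perfect"}
NEGATIVE = {"broken", "refund", "cancel", "angry", "terrible"}

def five_point_sentiment(transcript: str) -> int:
    words = transcript.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score >= 2:
        return 5  # very positive
    if score == 1:
        return 4
    if score == 0:
        return 3  # neutral
    if score == -1:
        return 2
    return 1      # very negative

print(five_point_sentiment("the product arrived broken and I want a refund"))  # 1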

Customer experience input is a data analyst’s dream as well as a nightmare, given all of the data constantly coming in.

Whitney Grace, December 15, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Easy as 1, 2, 3: Common Mistakes Made with Data Lakes

December 15, 2015

The article titled “Avoiding Three Common Pitfalls of Data Lakes” on DataInformed explores several pitfalls that could negate the advantages of data lakes. The article begins with the perks, such as easier data access and, of course, the cost-effectiveness of keeping data in a single hub. The first pitfall is sustainability (or the lack thereof); the article emphasizes that data lakes actually require much more planning and management of data than conventional databases. The second pitfall raised is resource allocation:

“Another common pitfall of implementing data lakes arises when organizations need data scientists, who are notoriously scarce, to generate value from these hubs. Because data lakes store data in their native format, it is common for data scientists to spend as much as 80 percent of their time on basic data preparation. Consequently, many of the enterprise’s most valued resources are dedicated to mundane, time-consuming processes that considerably lengthen time to action on potentially time-sensitive big data.”
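
To see why that preparation work piles up, consider a purely hypothetical Python sketch that normalizes records arriving in two different native formats; the file names and field names are invented:

# Hypothetical sketch of the "basic data preparation" burden: records land in
# the lake in their native formats (newline-delimited JSON from one system,
# CSV from another), so someone must normalize them before analysis.
import csv
import json

def load_events(json_path, csv_path):
    events = []
    with open(json_path) as f:
        for line in f:  # newline-delimited JSON export
            record = json.loads(line)
            events.append({"user": record["userId"], "amount": float(record["amt"])})
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # CSV export with different column names
            events.append({"user": row["customer"], "amount": float(row["total"])})
    return events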

The third pitfall is technology contradictions, or trying to use traditional approaches on a data lake that holds both big and unstructured data. Never fear, however: the article goes into great detail about how to avoid these issues by developing the data lake with smart data technologies such as semantic tech.

Chelsea Kerwin, December 15, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Bill Legislation Is More Complicated than Sitting on Capitol Hill

December 14, 2015

When I was in civics class back in the day, learning about how a bill becomes an official law in the United States, my teacher played Schoolhouse Rock’s famous “I’m Just a Bill” song. While that annoying retro earworm still makes the education rounds, the lyrics need to be updated to reflect some of the new digital “paperwork” that goes into tracking a bill. Engaging Cities focuses on legislation data in “When Lobbyists Write Legislation, This Data Mining Tool Traces The Paper Trail.”

While the process of making a bill might seem simple according to Schoolhouse Rock, it is actually complicated, and it gets even crazier as technology pushes more bills through the legislative process. In 2014, 70,000 state bills were introduced across the country, and no one has the time to read them all. Technology can do a much better and faster job.

“A prototype tool, presented in September at Bloomberg’s Data for Good Exchange 2015 conference, mines the Sunlight Foundation’s database of more than 500,000 bills and 200,000 resolutions for the 50 states from 2007 to 2015. It also compares them to 1,500 pieces of “model legislation” written by a few lobbying groups that made their work available, such as the conservative group ALEC (American Legislative Exchange Council) and the liberal group the State Innovation Exchange (formerly called ALICE).”
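
The article does not describe the prototype’s matching method, but a simple bag-of-words cosine similarity, sketched below in Python with placeholder bill text, gives the flavor of how shared language between a state bill and model legislation might be flagged:

# Illustrative only: the prototype's actual technique is not spelled out in
# the article, and the bill text below is a placeholder.
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

model_bill = "an act relating to limitations on civil liability for firearm dealers"
state_bill = "an act relating to limitations on civil liability for dealers of firearms"
print(round(cosine_similarity(model_bill, state_bill), 2))  # 0.87, a strong overlap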

A data-mining tool for government legislation would increase government transparency. The software tracks earmarks in bills to show how members of Congress are benefiting their states with these projects. It has analyzed earmarks as far back as 1995, and it turns out there are more of them than anyone knew. The goal of the project is to scour the data that the US government makes available and help people interpret it, while also encouraging them to be active within the laws of the land.

The article uses the metaphor “needle in a haystack” to describe all of the government data. Government transparency is good, but overloading people with information only overwhelms them.

Whitney Grace, December 14, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Search Data from Bing for 2015 Yields Few Surprises

December 11, 2015

The article on Search Engine Watch titled “Bing Reveals the Top US and UK Searches of 2015” covers the extremely intellectual categories of Celebs, News, Sport(s), Music, and Film. Starting with the last category, guess what franchise involving Wookiees and Carrie Fisher took the top spot? For celebrity searches, Taylor Swift took first in the UK, and Caitlyn Jenner in the US, followed closely by Miley Cyrus (and let’s all take a moment to savor the seething rage this data must have caused in Kim Kardashian’s heart). What does this trivia matter? Ravleen Beeston, UK Sales Director of Bing, is quoted in the article with her two cents:

“Understanding the interests and motivations driving search behaviour online provides invaluable insight for marketers into the audiences they care about. This intelligence allows us to empower marketers to create meaningful connections that deliver more value for both consumers and brands alike. By reflecting back on the key searches over the past 12 months, we can begin to anticipate what will inspire and how to create the right experience in the right context during the year to come.”

Some of the more heartening statistics were related to searches for women’s sports news, which increased from last year. Serena Williams was searched more often than the top five male tennis players combined. And saving the best for last, in spite of the dehumanizing and often racially biased rhetoric we’ve all heard involving Syrian refugees, there was a high volume of searches in the US asking how to provide support and aid for refugees, especially children.

Chelsea Kerwin, December 11, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Know Thy Hacker

December 10, 2015

Writer Alastair Paterson at SecurityWeek suggests that corporations and organizations prepare their defenses by turning a hacking technique against the hackers in “Using an Attacker’s ‘Shadow’ to Your Advantage.” The article explains:

“A ‘digital shadow’ is a subset of a digital footprint and consists of exposed personal, technical or organizational information that is often highly confidential, sensitive or proprietary. Adversaries can exploit these digital shadows to reveal weak points in an organization and launch targeted attacks. This is not necessarily a bad thing, though. Some digital shadows can prove advantageous to your organization; the digital shadows of your attackers. The adversary also casts a shadow similar to that of private and public corporations. These ‘shadows’ can be used to better understand the threat you face. This includes attacker patterns, motives, attempted threat vectors, and activities. Armed with this enhanced understanding, organizations are better able to assess and align their security postures.”

Paterson observes that one need not delve into the Dark Web to discern these patterns, particularly when the potential attacker is a “hacktivist” (though one can find information there, too, if one is so bold). Rather, hacktivists often use social media to chronicle their goals and activities. Monitoring these sources can give a company clues about upcoming attacks through records like target lists, responsibility claims, and discussions of new hacking techniques. Keeping an eye on such activity can help companies build appropriate defenses.
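
As a back-of-the-napkin illustration (and not Paterson’s method), a few lines of Python could scan collected posts for watch-list terms; the feed contents, terms, and organization name here are invented:

# Not Paterson's method: scan collected posts for watch-list terms that
# suggest your organization is being discussed. Posts and terms are invented.
WATCH_TERMS = {"examplecorp", "#opexample", "target list", "dump"}

def flag_posts(posts):
    """posts: iterable of dicts like {"author": ..., "text": ...}."""
    for post in posts:
        text = post["text"].lower()
        hits = [term for term in WATCH_TERMS if term in text]
        if hits:
            yield post["author"], hits

posts = [
    {"author": "hacktivist01", "text": "ExampleCorp is next on the target list #OpExample"},
    {"author": "bystander", "text": "nothing to see here"},
]
for author, hits in flag_posts(posts):
    print(author, hits)  # hacktivist01 and the matching terms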

 

Cynthia Murrell, December 10, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Understanding Trolls, Spam, and Nasty Content

December 9, 2015

The Internet is full of junk. It is a cold, hard fact, and one that will hold true as long as the Internet exists. The amount of trash content has only intensified with the introduction of Facebook, Twitter, Instagram, Pinterest, and other social media platforms, and it keeps pouring onto RSS feeds. The academic community is always up for new studies and fresh data, so a researcher from the University of Arkansas decided to study mean content. “How ‘Deviant’ Messages Flood Social Media” from Science Daily describes this interesting new work and carries the following abstract:

“From terrorist propaganda distributed by organizations such as ISIS, to political activism, diverse voices now use social media as their major public platform. Organizations deploy bots — virtual, automated posters — as well as enormous paid “armies” of human posters or trolls, and hacking schemes to overwhelmingly infiltrate the public platform with their message. A professor of information science has been awarded a grant to continue his research that will provide an in-depth understanding of the major propagators of viral, insidious content and the methods that make them successful.”

Dr. Nitin Agarwal will study what behavioral, social, and computational factors cause Internet content to go viral, especially when it has a deviant theme. Deviant here means something along the lines of what a troll would post. Agarwal’s research is part of a bigger investigation funded by the Office of Naval Research, Air Force Research, National Science Foundation, and Army Research Office. Agarwal will have a particular focus on how terrorist groups and extremist governments use social media platforms to spread their propaganda. He will also be studying bots that post online content.

Many organizations’ top brass do not have the faintest idea what some of the top social media platforms even are, much less what their purpose is. A study like this will lift the blinders and teach researchers how social media actually works. I wonder if they will venture into 4chan.

Whitney Grace, December 9, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Advances to Google Search for Mobile

December 7, 2015

Google Search plans a couple of changes to the way it works on our mobile devices. TechCrunch tells us, “Google Search Now Surfaces App-Only Content, Streams Apps from the Cloud When Not Installed on Your Phone.” We are reminded that Google has been indexing apps for a couple of years now, as a hedge against losing ground as computing shifts away from the desktop. Now, apps that do not mirror their content on the web can be indexed. Writer Sarah Perez explains:

“To make this possible, developers only have to implement Google’s app indexing API, as before, which helps Google to understand what a page is about and how often it’s used. It has also scaled its ranking algorithm to incorporate app content. (Google had actually been ranking in-app content prior to this, but now it no longer requires apps to have related websites.)”

Also, mobile users will reportedly be able to stream apps from the cloud if they do not have them installed. Though convenient for the rest of us, this advance could pose a problem for app developers; Perez observes:

“After all, if their app’s most valuable information is just a Google search away, what motive would a user have to actually install their app on their phone? Users would have to decide if they plan on using the app frequently enough that having the native counterpart would be an advantage. Or the app would have to offer something Google couldn’t provide, like offline access to content perhaps.”

It will be interesting to see what gimmicks developers come up with to keep the downloads going. The tech behind this service came from startup Agawi, which Google acquired in 2014. The streaming option is not expected to be released far and wide right away, however; apparently Google views it as an “experiment,” and wants to see how it is received before offering it worldwide. They couldn’t be concerned about developer backlash, could they?

Cynthia Murrell, December 7, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 
