Instagram Algorithm to Recognize Cruelty and Kindness
September 14, 2017
Instagram is using machine learning to make its platform a kinder place, we learn from the CBS News article, “How Instagram is Filtering Out Hate.” Contributor (and Wired Editor-In-Chief) Nick Thompson interviewed Instagram’s CEO Kevin Systrom, and learned the company is using about 20 humans to teach its algorithm to distinguish naughty from nice. The article relates:
Systrom has made it his mission to make kindness itself the theme of Instagram through two new phases: first, eliminating toxic comments, a feature that launched this summer; and second, elevating nice comments, which will roll out later this year. ‘Our unique situation in the world is that we have this giant community that wants to express themselves,’ Systrom said. ‘Can we have an environment where they feel comfortable to do that?’ Thompson told ‘CBS This Morning’ that the process of ‘machine learning’ involves teaching the program how to decide what comments are mean or ‘toxic’ by feeding in thousands of comments and then rating them.
It is smarter censorship, if you will. Systrom seems comfortable embracing a little censorship in favor of kindness, and we sympathize; "trolls" are a real problem, after all. Still, the technology could, theoretically, be used to delete or elevate certain ideological or political content. The line between censoring and not censoring is a fine but important one, and those who manage social media sites are the ones who must walk it. No pressure.
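For readers curious what "feeding in thousands of comments and then rating them" looks like in practice, here is a minimal sketch of supervised comment classification. To be clear, this is only an illustration of the general technique, not Instagram's actual system, and the sample comments and labels are invented:

```python
# Minimal sketch of supervised "toxic vs. nice" comment classification,
# in the spirit of the training process Systrom describes. NOT Instagram's
# actual system; the comments and labels below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Human raters label example comments (1 = toxic, 0 = nice).
comments = ["you are an idiot", "love this photo",
            "nobody likes you", "great shot, congrats"]
labels = [1, 0, 1, 0]

# Turn comments into word-frequency features and fit a classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

# Score a new comment; a high probability suggests it should be filtered.
print(model.predict_proba(["what a lovely picture"])[0][1])
```

With thousands of rated examples instead of four, the same basic loop (humans rate, the model learns, new comments get scored) is what lets the filter generalize to comments it has never seen.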
Cynthia Murrell, September 14, 2017
An Algorithm with Unintended Consequences
September 12, 2017
Some of us who follow developments in AI wondered about this: apparently, the algorithm YouTube tasked with eliminating “extremist content” on its platform goes too far. Business Insider reports, “YouTube’s Crackdown on Extremist Content and ISIS Is Also Hurting Researchers and Journalists.” It is a good thing there now exist commercial services that can meet the needs of analysts, researchers, and government officials; many of these services are listed in Stephen E Arnold’s Dark Web Notebook.
In this case, the problem is an algorithm that cannot always distinguish between terrorist propaganda and coverage of terrorism. Since the site implemented its new steps to combat terrorist content, several legitimate researchers and journalists have protested that their content was caught in the algorithm’s proverbial net and summarily removed; some of it had been available on the site for years. Reporter Rob Price writes:
Open-source researcher Eliot Higgins says he has had his old videos about Syria deleted and his account was suspended as the Google-owned video platform attempts to tackle material that supports terrorism. Middle East Eye reports that Syrian opposition news site Orient News was also deleted, as was a video uploaded by one of the publication’s own journalists. ‘YouTube has now suspended my account because of videos of Syria I uploaded 2-3 years ago. Nice anti-ISIS AI you’ve got there, YouTube,’ Higgins tweeted on Saturday. ‘Ironically, by deleting years-old opposition channels YouTube is doing more damage to Syrian history than ISIS could ever hope to achieve.’ In another incident, a video from American journalist Alexa O’Brien that was used in Chelsea Manning’s trial was deleted, according to Middle East Eye.
Higgins, whose account has since been reinstated, has an excellent point—ultimately, tools that destroy important documentation along with propaganda are counter-productive. Yes, algorithms are faster (and cheaper) than human workers. But do we really want to sacrifice first-hand footage of crucial events for the sake of speedy sanitization? There must be a better way.
Cynthia Murrell, September 12, 2017
My Feed Personalization a Step Too Far
September 8, 2017
In an effort to be even more user-friendly (and to further encourage a narcissistic society), Google now allows individuals to ‘follow’ or ‘unfollow’ topics, delivered daily to their devices, as they deem them interesting or uninteresting. SEJ explains the new feature, an enhancement of Google’s ‘my feed’ intended to personalize the news.
As explained in the article,
Further advancements to Google’s personalized feed include improved machine learning algorithms, which are said to be more capable of anticipating what an individual may find interesting. In addition to highlighting stories around manually and algorithmically selected topics of interest, the feed will also display stories trending in your area and around the world.
That seems like a great way to keep people current on topics ranging across geography, politics, and culture, but with the addition of ‘follow’ and ‘unfollow’, once again, individuals can reduce their world to a series of pop-star updates and YouTube hits. Isn’t it contradictory to suggest topics and stories in an effort to keep individuals informed of the world around them, yet allow them to stop the suggestions as soon as they appear boring or unfamiliar? Come now, Google, you can do better.
Catherine Lamsfuss, September 8, 2017
How Search Moves Forward
September 8, 2017
Researchers at UT Austin are certainly into search engines, and are eager to build improved neural models. The piece “The Future of Search Engines” at Innovation Toronto examines two approaches, suggested by associate professor Matthew Lease, to create more effective information retrieval systems. The article begins by describing how search engines currently generate their results:
The outcome is the result of two powerful forces in the evolution of information retrieval: artificial intelligence — especially natural language processing — and crowdsourcing. Computer algorithms interpret the relationship between the words we type and the vast number of possible web pages based on the frequency of linguistic connections in the billions of texts on which the system has been trained. But that is not the only source of information. The semantic relationships get strengthened by professional annotators who hand-tune results — and the algorithms that generate them — for topics of importance, and by web searchers (us) who, in our clicks, tell the algorithms which connections are the best ones. Despite the incredible, world-changing success of this model, it has its flaws. Search engine results are often not as ‘smart’ as we’d like them to be, lacking a true understanding of language and human logic. Beyond that, they sometimes replicate and deepen the biases embedded in our searches, rather than bringing us new information or insight.
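To make those two forces concrete, here is a toy ranking function that blends word-statistics relevance with click feedback from past searchers. The documents, click counts, and blending weights are invented for illustration; no real engine is this simple:

```python
# Toy illustration of the two forces the article describes: text relevance
# (from word statistics) blended with click feedback from past searchers.
# Not any real engine's formula; documents and click data are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["neural models for information retrieval",
        "cooking pasta at home",
        "crowdsourcing relevance judgments for search"]
clicks = [120, 3, 45]          # how often past users clicked each result
total = sum(clicks)

vec = TfidfVectorizer()
doc_vectors = vec.fit_transform(docs)

def rank(query):
    text_score = cosine_similarity(vec.transform([query]), doc_vectors)[0]
    # Blend linguistic relevance with the click-through prior.
    blended = [0.7 * t + 0.3 * (c / total) for t, c in zip(text_score, clicks)]
    return sorted(zip(blended, docs), reverse=True)

for score, doc in rank("search relevance"):
    print(round(score, 3), doc)
```

The click term is also where the flaws creep in: whatever biases drive our clicks get folded back into the ranking.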
The first paper, Learning to Effectively Select Topics For Information Retrieval Test Collections (PDF), details a way to pluck and combine the best work of several annotators, professional and crowd-sourced alike, for each text. The Innovation Toronto article spends more time on the second paper, Exploiting Domain Knowledge via Grouped Weight Sharing with Application to Text Categorization (PDF). The approach detailed here taps into existing resources like WordNet, a lexical database for the English language, and domain ontologies like the Unified Medical Language System. See the article for the team’s suggestions on using weight sharing to blend machine learning and human knowledge.
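To give a flavor of the grouped weight sharing idea, here is a heavily simplified sketch in which words sharing a WordNet synset share a single embedding vector, so the lexical database constrains what the model must learn from data. This illustrates only the general concept, not the authors’ actual architecture:

```python
# Heavily simplified illustration of grouped weight sharing: words that
# share a WordNet synset share one embedding vector, injecting lexical
# knowledge into the model. NOT the paper's actual architecture.
import numpy as np
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

words = ["car", "automobile", "dog", "puppy"]

def group_of(word):
    """Map a word to a group id: its first WordNet synset, else the word itself."""
    synsets = wn.synsets(word)
    return synsets[0].name() if synsets else word

groups = {group_of(w) for w in words}
rng = np.random.default_rng(0)
shared = {g: rng.normal(size=8) for g in groups}  # one weight vector per group

embedding = {w: shared[group_of(w)] for w in words}
# "car" and "automobile" now share identical weights via their common synset.
print(np.allclose(embedding["car"], embedding["automobile"]))
```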
The researchers’ work was helped by grants from the National Science Foundation, the Institute of Museum and Library Services, and the Defense Advanced Research Projects Agency, three government organizations hoping for improvements in the quality of crowdsourced information. We’re reminded that, though web-search companies do perform their own research, it is necessarily focused on commercial applications and short-term solutions. The sort of public investment we see at work here can pave the way to more transformative, long-term developments, the article concludes.
Cynthia Murrell, September 8, 2017
Google to Further Predict Relevant News to Subscribers
August 30, 2017
It’s no surprise that these days most people rely on something other than themselves to find relevant news stories, be it social media, a news feed or even Google. For many, it’s easier to let others determine what is truly important. Google, a leader in pointing out useful information and news, has stepped up their steering game and announce the update to their app which will further think for each user.
According to a recent Liliputing article:
…the feed still shows things like news, videos, and sports scores. But Google isn’t just choosing content based on the way you interact with Google search, apps, and services anymore. The company will also surface items that are trending locally and around the globe, helping you stay up to date on things that you might otherwise have missed. The company says it uses machine learning algorithms to predict which things you’ll be most interested in seeing.
For those uncomfortable with seeing only the news stories Google’s algorithms deem worthy of consideration, there are steps one can take to delete one’s preferences and habits. Perhaps Google’s intentions are altruistic and the app will be Big Brother approved, er, really helpful to the masses. We sure hope so!
Catherine Lamsfuss, August 30, 2017
Learn About Machine Learning
August 30, 2017
For an in-depth look at the technology behind Google Translate, turn to Stats and Bots’ write-up, “Machine Learning Translation and the Google Translate Algorithm.” Part of a series that aims to educate users about the technology behind machine learning (ML), the illustrated article delves into the details behind Google’s deep learning translation tools. Writer Daniil Korbut explains the factors that make it problematic to “teach” human language to an AI, then describes Long Short-Term Memory (LSTM) networks, bidirectional RNNs, sequence-to-sequence models, and how Google put those tools together. See the article for those details that are a bit above this writer’s head. There’s just one thing missing—any acknowledgment of the third parties that provide Google with language technology. Oh well.
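For readers who want to see the skeleton of the sequence-to-sequence idea in code, here is a bare-bones encoder-decoder sketch in PyTorch, fed random toy data. It shows only the shape of the approach, assuming an untrained toy vocabulary; it is nothing close to Google’s production system:

```python
# Bare-bones sequence-to-sequence skeleton: an LSTM encoder compresses the
# source sentence into a state, and an LSTM decoder unrolls the translation
# from that state. Toy vocabulary and random data; purely illustrative.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 100, 100, 32, 64

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))           # compress source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # unroll target
        return self.out(dec_out)                             # per-step word scores

model = Seq2Seq()
src = torch.randint(0, SRC_VOCAB, (2, 7))   # batch of 2 source sentences
tgt = torch.randint(0, TGT_VOCAB, (2, 5))   # corresponding target prefixes
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 5, 100]): a word distribution per step
```

The production systems layer attention mechanisms and many other refinements on top of this encoder-decoder core, which is what the article’s discussion of LSTMs and bidirectional RNNs builds toward.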
Another valuable resource on machine learning, found at Y Combinator, is Google researcher Jeff Dean’s Lecture for YC AI. The post includes a video that is over an hour long, but it also shares the informative slides from Dean’s presentation. They touch on scientific and medical applications for machine learning, then examine sequence-to-sequence models, automated machine learning, and “higher performance” ML models. One early slide reproduces a Google blog post in which Dean gives a little history (and several relevant links):
Allowing computers to better understand human language is one key area for our research. In late 2014, three Brain team researchers published a paper on Sequence to Sequence Learning with Neural Networks, and demonstrated that the approach could be used for machine translation. In 2015, we showed that this approach could also be used for generating captions for images, parsing sentences, and solving computational geometry problems. In 2016, this previous research (plus many enhancements) culminated in Brain team members working closely with members of the Google Translate team to wholly replace the translation algorithms powering Google Translate with a completely end-to-end learned system (research paper). This new system closed the gap between the old system and human quality translations by up to 85% for some language pairs. A few weeks later, we showed how the system could do “zero-shot translation”, learning to translate between languages for which it had never seen example sentence pairs (research paper). This system is now deployed on the production Google Translate service for a growing number of language pairs.
These surveys of Google’s machine translation tools offer a lot of detailed information for those interested in the topic. Just remember that Google is not (yet?) the only game in town.
Cynthia Murrell, August 30, 2017
Google: Ethics, Algorithms and Pirates, Oh My!
July 28, 2017
That search engines have changed the way the world looks for information, no one will dispute, but for as much good as Google and other major search engines have done in promoting the free sharing of all types of information, they have also allowed and, at times, encouraged the illegal sharing of private property. Namely, search engines are encouraging, even if inadvertently, the piracy of music and video.
Recently, despite vowing the contrary, Google has been called out after pirate sites were found not only in the coveted ‘top 10’ of search results but even highlighted by Google itself. As one report put it:
The findings aren’t going to help Google’s already contentious relationship with the music and video industries, both of which have spent years accusing Google of doing too little to prevent piracy. They’ve routinely argued that Google should outright remove pirate sites from its results, not just demote them…
Google claims the algorithm is to blame – it is too good at finding the results people want. This puts Google in yet another tough spot. Should they continue to let the algorithm rule, come what may, or tinker with it to punish pirates? If the latter is to be, what does this say for other controversial sites?
Catherine Lamsfuss, July 28, 2017
Western in Western Out
July 26, 2017
A thoughtful piece at Quartz looks past filter bubbles to other ways mostly Western developers are gradually imposing their cultural perspectives on the rest of the world—“Silicon Valley Has Designed Algorithms to Reflect Your Biases, Not Disrupt Them.” Search will not get you objective information, but rather the content your behavior warrants. Writer Ramesh Srinivasan introduces his argument:
Silicon Valley dominates the internet—and that prevents us from learning more deeply about other people, cultures, and places. To support richer understandings of one another across our differences, we need to redesign social media networks and search systems to better represent diverse cultural and political perspectives. The most prominent and globally used social media networks and search engines—Facebook and Google—are produced and shaped by engineers from corporations based in Europe and North America. As a result, technologies used by nearly 2 billion people worldwide reflect the design perspectives of the limited few from the West who have power over how these systems are developed.
It is worth reading the whole article for its examination of the issue, and suggestions for what to do about it. Algorithm transparency, for example, would at least let users know what principles guide a platform’s content selections. Taking input from user communities in other cultures is another idea. My favorite is a proposal to prioritize firsthand sources over Western interpretations, even ones with low traffic or that are not in English. As Srinivasan writes:
Just because this option may be the easiest for me to understand doesn’t mean that it should be the perspective I am offered.
That sums up the issue nicely.
Cynthia Murrell, July 26, 2017
AI Feeling a Little Sentimental
July 24, 2017
Big data was one of the popular buzzwords a couple of years ago, but one conundrum remained: how were organizations going to use all that mined data? One answer has presented itself: sentiment analysis. Science shares the article “AI In Action: How Algorithms Can Analyze the Mood of the Masses,” about how artificial intelligence is being used to gauge people’s emotions.
Social media presents a constant stream of emotional information about products, services, and places that could be useful to organizations. The problem in the past was that no one knew how to fish all of that useful information out of social media Web sites and make it usable. By using artificial intelligence algorithms and natural language processing, data scientists are finding associations between words, the language used, posting frequency, and more to determine everything from a person’s mood to their personality, income level, and political associations.
‘There’s a revolution going on in the analysis of language and its links to psychology,’ says James Pennebaker, a social psychologist at the University of Texas in Austin. He focuses not on content but style, and has found, for example, that the use of function words in a college admissions essay can predict grades. Articles and prepositions indicate analytical thinking and predict higher grades; pronouns and adverbs indicate narrative thinking and predict lower grades…’Now, we can analyze everything that you’ve ever posted, ever written, and increasingly how you and Alexa talk,’ Pennebaker says. The result: ‘richer and richer pictures of who people are.’
AI algorithms can take a person’s online social media accounts and construct more than a digital fingerprint of that person. The algorithms act like digital mind readers, recreating a person from the data they publish.
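Pennebaker’s style-over-content approach lends itself to a simple demonstration. The toy sketch below counts a few function-word categories and compares their rates; the word lists are abbreviated and the scoring is invented, purely to illustrate the idea:

```python
# Toy version of function-word style analysis in the spirit of Pennebaker's
# work: articles/prepositions suggest analytical style, pronouns/adverbs
# suggest narrative style. Word lists abbreviated; not a real scoring model.
ANALYTIC = {"the", "a", "an", "of", "in", "on", "to", "with", "by"}
NARRATIVE = {"i", "you", "he", "she", "they", "we", "really", "very", "just"}

def style_score(text):
    """Positive score leans analytical; negative leans narrative."""
    words = text.lower().split()
    analytic = sum(w in ANALYTIC for w in words)
    narrative = sum(w in NARRATIVE for w in words)
    return (analytic - narrative) / max(len(words), 1)

essay = "The analysis of the data in the report supports the conclusion."
post = "I really just think you and we should hang out, it was very fun."
print(style_score(essay), style_score(post))  # positive vs. negative
```

Real systems use far richer word categories and statistical models, but the principle is the same: how we write, not just what we write, gives us away.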
Whitney Grace, July 24, 2017
Instagram Reins in Trolls
July 21, 2017
Photo-sharing app Instagram has successfully implemented DeepText, a program that can weed out nasty and spammy comments from people’s feeds.
Wired, in an article titled “Instagram Unleashes an AI System to Blast Away Nasty Comments,” says:
DeepText is based on recent advances in artificial intelligence, and a concept called word embeddings, which means it is designed to mimic the way language works in our brains.
DeepText was initially built by Facebook, Instagram’s parent company, to keep abusers, trolls, and spammers at bay. Buoyed by its success, Facebook soon implemented it on Instagram.
The development process was arduous: for months, a large number of employees and contractors taught the DeepText engine how to identify abuse, telling the algorithm which words can be abusive based on their context.
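The “word embeddings” at DeepText’s core can be illustrated with a toy example: train vectors on a small corpus and observe that words used in similar contexts land near one another, which is what lets a classifier judge a word based on its context. This sketch uses gensim on an invented corpus far too small for real use:

```python
# Toy word-embedding demo: words that appear in similar contexts get
# similar vectors, the foundation DeepText-style classifiers build on.
# The corpus here is invented and far too small to be genuinely useful.
from gensim.models import Word2Vec

sentences = [
    ["you", "are", "a", "loser"],
    ["you", "are", "an", "idiot"],
    ["what", "a", "lovely", "photo"],
    ["what", "a", "beautiful", "photo"],
] * 50  # repeat to give the tiny corpus some statistical weight

model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, seed=0)

# Words used in the same slots ("loser"/"idiot", "lovely"/"beautiful")
# should end up closer to each other than to unrelated words.
print(model.wv.similarity("loser", "idiot"))
print(model.wv.similarity("loser", "lovely"))
```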
At the moment, the tools are being tested and rolled out to a limited number of users in the US and are available only in English. They will subsequently be rolled out to other markets and languages.
Vishal Ingole, July 21, 2017