Tweaking Algorithms: Touching Up Nicks and Scrapes?
January 30, 2018
Is it possible that algorithms are growing and changing along with the basic needs of our artificial intelligence? Yes and no is pretty much the answer. We learned how many researchers claim to be revolutionizing the DNA of artificial intelligence, and how they might be blowing smoke, from a recent Phys.org story, “The Algorithms of Our Future Thinking Machines.”
According to the story:
“The challenge of constructing algorithms for dynamic systems is in the nature of those systems: they are in constant change. Traffic cameras, radar, and inertial sensors are some of the devices delivering the information the algorithm requires. Now another extremely dynamic system is becoming more central to Thomas Schön’s and his colleagues’ projects: the human body.”
This is some neat research and well worth a read, but the grandiose claims are a little unfounded. People have been constructing algorithms for dynamic systems for quite a while. From search engines to video games to the sensor systems mentioned above, there is not a lot of undiscovered territory in this land. However, that does not mean there are no new depths to explore. We have found the mountain, so now it is time to crack it open and see what gems are hiding inside. If bright minds like these can tune their thinking that way, we could be in for some grand surprises or some clumsy scratch repairs.
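For readers who want a concrete picture of what “an algorithm for a dynamic system” looks like, the classic workhorse is the Kalman filter, which fuses noisy sensor readings (radar, inertial sensors, and the like) into a running state estimate. The sketch below is a minimal one-dimensional illustration in Python, not anything from Schön’s group; every matrix value is an assumed toy parameter.

```python
import numpy as np

# Minimal 1-D constant-velocity Kalman filter: estimate position and
# velocity from noisy position measurements (e.g., a radar return).
# State x = [position, velocity]; all values are illustrative choices.

dt = 0.1                                   # time step (seconds)
F = np.array([[1, dt], [0, 1]])            # state transition model
H = np.array([[1.0, 0.0]])                 # we only measure position
Q = np.eye(2) * 0.01                       # process noise covariance
R = np.array([[0.5]])                      # measurement noise covariance

x = np.array([[0.0], [0.0]])               # initial state estimate
P = np.eye(2)                              # initial estimate covariance

def kalman_step(x, P, z):
    # Predict: project the state and covariance forward one step.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: blend the prediction with the new measurement z.
    y = z - H @ x_pred                     # innovation (residual)
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    return x_pred + K @ y, (np.eye(2) - K @ H) @ P_pred

# Feed in simulated noisy measurements of an object moving at 1 m/s.
rng = np.random.default_rng(0)
for t in range(50):
    z = np.array([[t * dt * 1.0 + rng.normal(0, 0.5)]])
    x, P = kalman_step(x, P, z)
print("estimated position, velocity:", x.ravel())
```

Work like Schön’s, as we read it, extends this same predict-and-update loop to far messier, nonlinear systems such as the human body.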
Patrick Roland, January 30, 2018
Palantir: Accused of Hegelian Contradictions
January 29, 2018
I bet you have not thought about Hegel since you took that required philosophy course in college. Well, Hegel and his “contradictions” are central to “WEF 2018: Davos, Data, Palantir and the Future of the Internet.”
I highlighted this passage from the essay:
Data is the route to security. Data is the route to oppression. Data is the route to individual ideation. Data is the route to the hive mind. Data is the route to civic wealth. Data is the route to civic collapse.
Thesis, antithesis, synthesis in action, I surmise.
The near term objective is synthesis. I assume this is the “connecting the dots” approach to finding what one needs to know.
I learned:
The stakes for big data couldn’t be bigger.
Okay, a categorical claim in our fast-changing, diverse economic and political climate. “Be afraid” seems to be the message.
Palantir’s point of operations in Davos is described in the write up as “a pimped up liquor store.” Helpful and highly suggestive too.
The conclusion of the essay warranted a big red circle:
So next time you hear the names Palantir or Alex Karp, stop what you’re doing and pay attention. The future – your future – is under discussion. Under construction. This little first draft of history of which you’ve made it to the end (congratulations and thanks) – the history of data – is of a future that will in time come to be seen for what it is: digital that truly matters.
Several observations:
- The author wants me to believe that Palantir is not a pal.
- The big data thing troubles the author because Palantir is one of the vendors providing next generation information access.
- The goal of making Palantir into something unique is best accomplished by invoking Fancy Dan ideas.
I would suggest that knowledge about companies like Gamma Group FinFisher, Shoghi, Trovicor, and some other interesting non US entities might put Palantir in perspective. Palantir has an operational focus; some of the other vendors perform different information services.
Palantir is an innovator, but it is part of a landscape of data intercept and analysis organizations. I could make a case that Palantir is capable but some companies in Europe and the East are actually more technologically advanced.
But these outfits were not at Davos. Why? That’s a good question. Perhaps they were too busy with their commercial and government work. My hunch is that a few of these outfits were indeed “there”, just not noticed by the expert who checked out the liquor store.
Stephen E Arnold, January 29, 2018
IBM and Algorithmic Bias
January 25, 2018
I read “Unexplainable Algos? Get Off the Market, Says IBM Chief Ginni Rometty.” The idea is in line with Weapons of Math Destruction and the apparent interest in “black box” solutions. If you are old enough, you will remember the Autonomy IDOL system. It featured a “black box” which licensees used without the ability to alter how the system operated. You may also recall that the first Google Search Appliances locked users out as well. One installed the GSA and it just worked—at least, in theory.
This article includes information derived from the IBM content output for the World Economic Forum where it helps to have one’s own helicopter for transportation.
I noted this statement:
“When it comes to the new capabilities of artificial intelligence, we must be transparent about when and how it is being applied and about who trained it, with what data, and how,” the IBM chairman, president and CEO wrote.
I don’t want to be too picky, but IBM owns the i2 Analyst Notebook system. If you are not familiar with this platform, it provides law enforcement and intelligence professionals with tools to organize, analyze, and marshal information for an investigation. As a former consultant to i2, I am not sure the plumbing developed by i2 is public. In fact, i2 (later acquired by IBM) and Palantir jousted in court when i2 sued Palantir for improper use of its intellectual property; that is a fancy way of saying, “Palantir engineers tried to figure out how i2 worked.” The case settled out of court, and many of the documents are sealed because neither party to the case wanted certain information exposed to bright sunlight.
IBM operates a number of cybersecurity services. One of these has the ability to intercept a voice call and map that call to email and other types of communications. The last time I received some information about this service, I had to sign a bundle of documents. The point, of course, is that much of the technology was, from my point of view, a “black box.”
So what?
The statement by IBM’s CEO is important because it is, in my opinion, hand waving. IBM deals in systems which are not fully understood even by some of the IBM experts selling these solutions, and the engineers who may know more about the inner workings of secret or confidential systems and methods are not talking. An expert knows stuff others do not; therefore, why talk and devalue one’s expertise?
To sum up, talk about making math-centric systems and procedures transparent is just noise. People who can explain how systems which emerged from Cambridge University, like Autonomy’s Neurolinguistic System or i2’s Analyst Notebook, actually work are in short supply.
How can one who does not understand a complex system explain how it works? Black boxes exist to keep those with thumbs for fingers from breaking what works.
Talk doesn’t do much to deal with the algorithmic basics:
- Some mathematical procedures in wide use are not easily explained or reverse engineered; hence, the IBM charge that Palantir tried a shortcut through the woods to the cookie jar.
- Most next generation systems are built on a handful of algorithms. I have identified 10 which I explain in my lectures about the flaws embedded in “smart” systems. Each of the most widely used algorithms can be manipulated in a number of ways. Some require humans to fiddle; others fiddle when receiving inputs from other systems.
- Explainable systems are based on rules. By definition, one assumes the rules work as the authors intended. News flash: rule-based systems can behave in unpredictable, often inexplicable ways. A fun example is for you, gentle reader, to try to get the default numbering system in Microsoft Word to perform consistently with regard to left justification of numbered lists.
- Chain a series of algorithms together in a workflow. Add real-time data to update thresholds. Watch the outputs. Now explain what happened. Good luck with that. (The toy pipeline below suggests why.)
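To make the last point concrete, here is a deliberately tiny Python pipeline, purely illustrative and with made-up stage names, in which an anomaly score, an adaptive threshold, and a final rule are chained together and the threshold is re-estimated from live data. Even with three trivial stages, explaining why a particular input did or did not trigger an alert requires replaying the entire input history.

```python
import random

# Toy pipeline: three chained stages whose behavior drifts because a
# threshold is re-estimated from live data. All names are illustrative.

random.seed(42)

def stage_score(value, history):
    # Stage 1: a rough anomaly score against the full input history.
    mean = sum(history) / len(history)
    spread = max(history) - min(history) or 1.0
    return (value - mean) / spread

def stage_threshold(history):
    # Stage 2: the alert threshold itself adapts to recent inputs.
    return 0.5 + 0.1 * (sum(history[-5:]) / 5.0)

def stage_decide(score, threshold):
    # Stage 3: a simple rule at the end of the chain.
    return "ALERT" if score > threshold else "ok"

history = [10.0] * 10
for t in range(20):
    value = 10.0 + random.gauss(0, 1) + (5 if t == 12 else 0)
    score = stage_score(value, history)
    threshold = stage_threshold(history)
    decision = stage_decide(score, threshold)
    history.append(value)      # today's input changes tomorrow's rules
    print(f"t={t:2d} value={value:6.2f} score={score:5.2f} "
          f"thr={threshold:5.2f} -> {decision}")
```

Whether the spike at t=12 fires an alert depends on every value that came before it, which is the explainability problem in miniature.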
I love IBM. Always marketing.
Stephen E Arnold, January 25, 2018
How SEO Has Shaped the Web
January 19, 2018
With the benefit of hindsight, big-name thinker Anil Dash has concluded that SEO has contributed to the ineffectiveness of Web search. He examines how we got here in his article, “Underscores, Optimization & Arms Races” at Medium. Starting with the year 2000, Dash traces the development of Internet content management systems (CMSs), of which he was a part. (It is a good brief summary for anyone who wasn’t following along at the time.) WordPress is an example of a CMS.
As Google’s influence grew, online publishers became aware of an opportunity: they could game the search algorithm to move their sites to the top of “relevant” results by playing around with keywords and other content details. The question of whether websites should bow to Google’s whims seemed to go unasked, as site after site fell into this pattern, later to be known as Search Engine Optimization. For Dash, the matter was symbolized by the question of whether hyphens or underscores should represent spaces in web addresses. Now, of course, one can use either without upsetting Google’s algorithm, but that was not the case at first. When Google’s Matt Cutts stated a preference for the hyphen in 2005, most publishers fell in line, including Dash, eventually and very reluctantly; for him, the choice represented nothing less than the very nature of the Internet.
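For anyone who missed that era, the dispute was about URL slugs: Google’s tokenizer historically treated a hyphen as a word separator and an underscore as a word joiner, so “arms-races” matched searches for “arms” while “arms_races” did not. A hypothetical slug helper of the kind CMS developers had to retrofit might look like this (the function name and code are ours, not Dash’s):

```python
import re

# Hypothetical slug helper reflecting the convention Matt Cutts endorsed
# in 2005: join the words of a post title with hyphens, not underscores.

def slugify(title: str) -> str:
    """Lowercase a post title and join words with hyphens."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)   # non-alphanumerics -> hyphen
    return slug.strip("-")

print(slugify("Underscores, Optimization & Arms Races"))
# -> underscores-optimization-arms-races
```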
He writes:
You see, the theory of how we felt Google should work, and what the company had often claimed, was that it looked at the web and used signals like the links or the formatting of webpages to indicate the quality and relevance of content. Put simply, your search ranking with Google was supposed to be based on Google indexing the web as it is. But what if, due to the market pressure of the increasing value of ranking in Google’s search results, websites were incentivized to change their content to appeal to Google’s algorithm? Or, more accurately, to appeal to the values of the people who coded Google’s algorithm?
Eventually, even Dash and his CMS caved and switched to hyphens. What he did not notice at the time, he muses, was the unsettling development of the entire SEO community centered around appeasing these algorithms. He concludes:
By the time we realized that we’d gotten suckered into a never-ending two-front battle against both the algorithms of the major tech companies and the destructive movements that wanted to exploit them, it was too late. We’d already set the precedent that independent publishers and tech creators would just keep chasing whatever algorithm Google (and later Facebook and Twitter) fed to us. Now, the challenge is to reform these systems so that we can hold the big platforms accountable for the impacts of their algorithms. We’ve got to encourage today’s newer creative communities in media and tech and culture to not constrain what they’re doing to conform to the dictates of an opaque, unknowable algorithm.
Is that doable, or have we gone too far toward appeasing the Internet behemoths to turn back?
Cynthia Murrell, January 19, 2018
Out with the Old, in with the New at Google
January 17, 2018
It may have started with its finance app, but Google is making some drastic changes you might want to keep an eye on. We discovered the tip of the iceberg with the Google Blog piece, “Stay on Top of Finance Information on Google.”
According to the story:
Now under a new search navigation tab called “Finance,” you’ll have easier access to finance information based on your interests, keeping you in the know about the latest market news and helping you get in-depth insights about companies. On this page, you can see performance information about stocks you’ve chosen to follow, recommendations on other stocks to follow based on your interests, related news, market indices, and currencies.
As part of this revamped experience, we’re retiring a few features of the original Google Finance, including the portfolio, the ability to download your portfolio, and historical tables. However, a list of the stocks from your portfolio will be accessible through Your Stocks in the search result, and you can get notifications when there are any notable changes on their performance.
Not a big shock, but a big part of Google’s effort to freshen things up. The company has been in hot water over a string of YouTube videos deemed inappropriate. So, with moves like improving its algorithm to weed out fake news, changes to Google Home, and even changes to Maps, Google is sending a message. The message is one of change, and one we hope is for the better.
Patrick Roland, January 17, 2018
AI Makes Life-Saving Medical Advances
January 2, 2018
Too often we discuss only the grey areas around AI and machine learning. While that scrutiny is incredibly important right now, it is not all this amazing technology can do. It can also save lives. We learned a little more on that front from a recent Digital Journal story, “Algorithm Repairs Corrupted Digital Images.”
According to the story:
University of Maryland researchers have devised a technique that exploits the power of artificial neural networks to tackle multiple types of flaws and degradations in a single image in one go.
The researchers achieved image correction through the use of a new algorithm. The algorithm operates artificial neural networks simultaneously, so that the networks apply a range of different fixes to corrupted digital images. The algorithm was tested on thousands of damaged digital images, some with severe degradations. The algorithm was able to repair the damage and return each image to its original state.
The application of such technology crosses the business and consumer divide, taking in everything from everyday camera snapshots to lifesaving medical scans. The types of faults digital images can develop include blurriness, grainy noise, missing pixels and color corruption.
Very promising from a commercial and medical standpoint, especially the medical side. This news, coupled with the Forbes story about AI disrupting healthcare norms in 2018, makes for a big promise. We look forward to seeing what the new year brings for medical AI.
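The Digital Journal piece does not spell out the network architecture, so treat the following as a generic sketch of the approach: a small residual denoising network in PyTorch trained on synthetically corrupted images. The layer sizes and noise model are our assumptions, not the Maryland team’s design.

```python
import torch
import torch.nn as nn

# Minimal sketch of a denoising convolutional network, the general kind
# of model used for image restoration. This is an illustration, not the
# Maryland team's architecture; the layer sizes are arbitrary.

class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Predict the corruption (here, noise) and subtract it.
        return x - self.net(x)

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Train on synthetic data: clean images plus Gaussian noise.
clean = torch.rand(8, 3, 32, 32)
noisy = clean + 0.1 * torch.randn_like(clean)
for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(noisy), clean)
    loss.backward()
    opt.step()
print("final reconstruction loss:", loss.item())
```

The research apparently runs several such networks side by side so one pass can handle blur, noise, missing pixels, and color corruption together; this sketch shows only the single-corruption core.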
Patrick Roland, January 2, 2018
Turning to AI for Better Data Hygiene
December 28, 2017
Most big data is flawed in some way, because humans are imperfect beings. That is the premise behind ZDNet’s article, “The Great Data Science Hope: Machine Learning Can Cure Your Terrible Data Hygiene.” Editor-in-Chief Larry Dignan explains:
The reality is enterprises haven’t been creating data dictionaries, meta data and clean information for years. Sure, this data hygiene effort may have improved a bit, but let’s get real: Humans aren’t up for the job and never have been. ZDNet’s Andrew Brust put it succinctly: Humans aren’t meticulous enough. And without clean data, a data scientist can’t create algorithms or a model for analytics.
Luckily, technology vendors have a magic elixir to sell you…again. The latest concept is to create an abstraction layer that can manage your data, bring analytics to the masses and use machine learning to make predictions and create business value. And the grand setup for this analytics nirvana is to use machine learning to do all the work that enterprises have neglected.
I know you’ve heard this before. The last magic box was the data lake where you’d throw in all of your information–structured and unstructured–and then use a Hadoop cluster and a few other technologies to make sense of it all. Before big data, the data warehouse was going to give you insights and solve all your problems along with business intelligence and enterprise resource planning. But without data hygiene in the first place enterprises replicated a familiar, but failed strategy: Poop in. Poop out.
What the observation lacks in eloquence it makes up for in insight: the whole data-lake concept was flawed from the start because it did not give adequate attention to data preparation. Dignan cites IBM’s Watson Data Platform as an example of the new machine-learning-based cleanup tools and points to other noteworthy vendors investigating similar ideas: Alation, Io-Tahoe, Cloudera, and Hortonworks. Which cleaning tool will perform best remains to be seen, but Dignan seems sure of one thing: the data that enterprises have been diligently collecting for the last several years is as dirty as a dustbin lid.
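What does “machine learning as data janitor” look like in practice? One common building block is anomaly detection over the raw records, as in this minimal scikit-learn sketch. The table and column names are invented for illustration; real platforms such as Watson Data Platform layer far more on top.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Illustrative sketch of ML-assisted data hygiene: flag suspicious rows
# in a messy table instead of hand-writing validation rules.
# The data and column names are made up for the example.

df = pd.DataFrame({
    "order_total": [19.99, 24.50, 22.10, 9999.0, 21.75, 18.20, -3.00],
    "items":       [1, 2, 2, 1, 2, 1, 1],
})

# An isolation forest learns what "typical" rows look like and scores
# each row; -1 marks likely data-entry errors or outliers.
clf = IsolationForest(contamination=0.3, random_state=0)
df["flag"] = clf.fit_predict(df[["order_total", "items"]])

print(df[df["flag"] == -1])   # rows a human should review
```

Note that the model only surfaces candidates; deciding whether 9999.0 is a typo or a genuine bulk order still takes the meticulousness humans reportedly lack.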
Cynthia Murrell, December 28, 2017
New York Begins Asking If Algorithms Can Be Racist
December 27, 2017
The whole point of algorithms is to be blind to everything except data. However, it is becoming increasingly clear that in the wrong hands, algorithms and AI could have a very negative impact on users. We learned more in a recent ACLU post, “New York Takes on Algorithm Discrimination.”
According to the story:
A first-in-the-nation bill, passed yesterday in New York City, offers a way to help ensure the computer codes that governments use to make decisions are serving justice rather than inequality.
Algorithms are often presumed to be objective, infallible, and unbiased. In fact, they are highly vulnerable to human bias. And when algorithms are flawed, they can have serious consequences.
The bill, which is expected to be signed by Mayor Bill de Blasio, will provide a greater understanding of how the city’s agencies use algorithms to deliver services while increasing transparency around them. This bill is the first in the nation to acknowledge the need for transparency when governments use algorithms…
This is a very promising step toward solving a very real problem. From racist coding to discriminatory AI, this is a topic that is creeping into the national conversation. We hope others will follow in New York’s footsteps and find ways to prevent this injustice from going further.
Patrick Roland, December 27, 2017
A Look at Chinese Search Engine Sogou
December 25, 2017
An article at Search Engine Watch draws our attention to one overseas search contender: “What Do You Need to Know About Chinese Search Engine Sogou?” Sogou recently announced terms for a proposed IPO, so writer Rebecca Sentance provides a primer on the company. She begins with some background: the platform was launched in 2004, and the name translates to “searching dog.” She also delves into the not-so-clear issue of where Sogou stands in relation to China’s top search engine, Baidu, and some other contenders for second place, so see the article for those details.
I was interested in what Sentance writes about Sogou’s use of AI and natural language search:
It also plans to shift its emphasis from more traditional keyword-based search to answer questions, in line with the trend towards natural language search prompted by the rise of voice search and digital assistants. Sogou has joined major search players such as Bing, Baidu and of course Google in investing in artificial intelligence, but its small size may put it at a disadvantage. A huge search engine like Baidu, with an average of more than 583 million searches per day, has access to reams more data with which to teach its machine learning algorithms.
But Sogou has an ace up its sleeve: it is the only search engine formally allowed to access public messages on WeChat – a massive source of data that will be particularly beneficial for natural language processing. Plus, as I touched on earlier, language is something of a specialty area for Sogou, as Sogou Pinyin gives it a huge store of language data with which to work. Sogou also has ambitious plans to bring foreign-language results to Chinese audiences via its translation technology, which will allow consumers to search the English-speaking web using Mandarin search terms.
The article wraps up by looking at Sogou’s potential effect on search markets; basically, it could have a large impact within China, especially if Baidu keeps experiencing controversy. For the rest of the world, though, the impact should be minimal. Nevertheless, this is one company worth keeping an eye on.
Cynthia Murrell, December 25, 2017
Google Is Taught Homosexuality Is Bad
December 12, 2017
The common belief is that computers and software are objective, inanimate objects capable of greater intelligence than humans. The truth is that humans developed computers and software, so these objective, inanimate objects are only as smart as their designers. What is even more hilarious is that the sentiment analysis AI development process requires tons of data for the algorithms to read and teach themselves to recognize patterns. The data used is “contaminated” with human emotion and prejudices. Motherboard wrote about how human bias pollutes AI in the article, “Google’s Sentiment Analyzer Thinks Being Gay Is Bad.”
The problem when designing AI is that if it is trained on polluted and biased data, these supposedly intelligent algorithms will discriminate against people rather than being objective. Google released its Cloud Natural Language API, which allows developers to add Google’s deep learning models to their own applications. Along with entity recognition, the API includes a sentiment analyzer that detects whether text carries a positive or negative sentiment. However, it has a few bugs and returns biased results, such as rating statements about being gay, or about certain religions, as negative.
It looks like Google’s sentiment analyzer is biased, as many artificially intelligent algorithms have been found to be. AI systems, including sentiment analyzers, are trained using human texts like news stories and books. Therefore, they often reflect the same biases found in society. We don’t know yet the best way to completely remove bias from artificial intelligence, but it’s important to continue to expose it.
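How does one “expose” such bias in practice? A standard probe is to score template sentences that differ only in an identity term and compare the results. The sketch below uses a deliberately rigged stand-in scorer so the probe has something to find; to audit a real system, one would replace toy_sentiment() with a call to the analyzer under test. The lexicon, template, and term list are all illustrative assumptions.

```python
# Minimal sketch of probing a sentiment model for identity bias: score
# template sentences that differ only in the identity term.
# toy_sentiment() is a stand-in rigged to mimic the bug Motherboard
# reported; swap in the real analyzer being audited.

TOY_LEXICON = {"gay": -0.2, "straight": 0.1}  # rigged to show the bug

def toy_sentiment(text: str) -> float:
    """Placeholder scorer; replace with the system under test."""
    return sum(TOY_LEXICON.get(w, 0.0) for w in text.lower().split())

TEMPLATE = "i am {}"
IDENTITY_TERMS = ["straight", "gay", "christian", "jewish", "atheist"]

for term in IDENTITY_TERMS:
    sentence = TEMPLATE.format(term)
    print(f"{sentence!r:20} score = {toy_sentiment(sentence):+.2f}")

# A bare statement of identity should score near zero for every term;
# systematic gaps between terms are the bias being reported.
```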
The problem with programming AI algorithms is that it is difficult to feed them data free of human prejudices. It is difficult to work around these prejudices because they are so ingrained in most data. Programmers are kept on their toes to find a solution, but there is no one-size-fits-all fix. Too bad they cannot just stick with numbers and dictionaries.
Whitney Grace, December 12, 2017