Inside the Creative Commons Dataset from Yahoo and Flickr

January 5, 2015

These are not our grandparents’ photo albums. With today’s technology, photos and videos are created and shared at a truly astounding pace. Much of that circulation occurs on Flickr, which teamed up with Yahoo to create a cache of nearly 100 million photos and almost 800,000 videos with Creative Commons licenses for us all to share. The details appear in “The Ins and Outs of the Yahoo Flickr Creative Commons 100 Million Dataset.” Researchers Bart Thomée and David A. Shamma report:

“To understand more about the visual content of the photos in the dataset, the Flickr Vision team used a deep-learning approach to find the presence of visual concepts, such as people, animals, objects, events, architecture, and scenery across a large sample of the corpus. There’s a diverse collection of visual concepts present in the photos and videos, ranging from indoor to outdoor images, faces to food, nature to automobiles.”

The article goes on to explore the frequency of certain tags, both user-annotated and machine-generated. The machine tags include factors like time, location, and camera used, suggesting rich material for data analysts to play with. The researchers conclude with praise for their team’s project:

“The collection is one of the largest released for academic use, and it’s incredibly varied—not just in terms of the content shown in the photos and videos, but also the locations where they were taken, the photographers who took them, the tags that were applied, the cameras that were used, etc. The best thing about the dataset is that it is completely free to download by anyone, given that all photos and videos have a Creative Commons license. Whether you are a researcher, a developer, a hobbyist or just plain curious about online photography, the dataset is the best way to study and explore a wide sample of Flickr photos and videos.”

See the article for more details on the tags found within the massive dataset. The whole assemblage is available for download from Yahoo Labs.

Cynthia Murrell, January 05, 2015


