Baynard Media

One million public Bluesky posts scraped for AI training

By Editor, November 28, 2024

Bluesky is already facing its first major AI scrape, despite the stance of its owners that it will never train generative AI on user data.

As 404Media reported on Nov. 26, one million public Bluesky posts, complete with identifying user information, were crawled and then uploaded to AI company Hugging Face. The dataset, created by machine learning librarian Daniel van Strien, was intended for use in developing language models and natural language processing, as well as for general analysis of social media trends, content moderation, and posting patterns. It contains users’ decentralized identifiers (DIDs) and even has a search function to find content from specific users.

According to the dataset’s description, the set “contains 1 million public posts collected from Bluesky Social’s firehose API (Application Programming Interface), intended for machine learning research and experimentation with social media data. Each post contains text content, metadata, and information about media attachments and reply relationships.”

Bluesky users didn’t opt in to such uses of their content, but Bluesky doesn’t expressly prohibit them either. The platform’s firehose API is an “aggregated, chronological stream of all the public data updates as they happen in the network, including posts, likes, follows, handle changes, and more.” Bluesky’s API, coupled with the public and decentralized Authenticated Transfer (AT) Protocol the site is built on, means Bluesky content is open and available to the third-party developers the platform is trying to court, 404Media explains.


This could be a major warning sign for the site’s millions of new users, many of whom left competitor X in the wake of an alarming new AI training policy. A Bluesky representative responded to 404Media’s requests for comment: “Bluesky is an open and public social network, much like websites on the Internet itself. Just as robots.txt files don’t always prevent outside companies from crawling those sites, the same applies here. We’d like to find a way for Bluesky users to communicate to outside orgs/developers whether they consent to this and that outside orgs respect user consent, and we’re actively discussing how to achieve this.”
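
The robots.txt comparison is easier to see with a small example. The sketch below uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler might check a site's rules before fetching a page. The robots.txt contents, the crawler names (`ExampleAIBot`, `SomeOtherBot`), and the URL are all made up for illustration and have nothing to do with Bluesky's actual setup. The check is voluntary: a crawler that never runs it is not blocked by the file itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site; the rules and the
# crawler name "ExampleAIBot" are assumptions made for this example.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/profile/alice/post/1"  # made-up URL
    # A crawler that honors the file would skip the page when this is False.
    print("ExampleAIBot allowed:", is_allowed(ROBOTS_TXT, "ExampleAIBot", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

Nothing in this code stops a request from being sent. It only reports what the site owner has asked for, which is why the representative describes robots.txt as something that doesn't always prevent crawling.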

Shortly after the article’s publication, the dataset was removed from Hugging Face. “I’ve removed the Bluesky data from the repo. While I wanted to support tool development for the platform, I recognize this approach violated principles of transparency and consent in data collection. I apologize for this mistake,” van Strien wrote in a follow-up Bluesky post.


