urbanists.social is one of the many independent Mastodon servers you can use to participate in the fediverse.

#aitraining

2 posts · 2 participants · 0 posts today

PressGazette: ‘Unsustainable status quo’: AI companies and publishers respond to Govt copyright consultation. “The UK Government’s proposal to allow AI companies to automatically train their models on online content unless the rightsholder specifically opts out has been described as ‘unworkable’. A range of responses to the Government consultation on its proposed change to the existing […]

https://rbfirehose.com/2025/04/19/unsustainable-status-quo-ai-companies-and-publishers-respond-to-govt-copyright-consultation-pressgazette/

!!!!! F*ck off Meta !!!!! Meta announced today that it will shortly begin training its AI models on content from adult European users of its social media platforms Facebook and Instagram. The content used for AI training includes posts and comments from adult users, as well as questions and queries from interactions with the Meta AI assistant.

"Finally, AI can fact-check itself. One large language model-based chatbot can now trace its outputs to the exact original data sources that informed them.

Developed by the Allen Institute for Artificial Intelligence (Ai2), OLMoTrace, a new feature in the Ai2 Playground, pinpoints data sources behind text responses from any model in the OLMo (Open Language Model) project.

OLMoTrace identifies the exact pre-training document behind a response — including full, direct quote matches. It also provides source links. To do so, the underlying technology uses a process called “exact-match search” or “string matching.”
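The "exact-match search" the article describes can be illustrated with a small, brute-force sketch. Nothing below comes from Ai2's code: the `trace_spans` function, the corpus shape, and the `min_len` threshold are all hypothetical, and a production system like OLMoTrace would query a pre-built index over its training corpus rather than scan every document per response.

```python
def trace_spans(response: str, corpus: dict[str, str], min_len: int = 12) -> list[dict]:
    """Find spans of `response` that appear verbatim in any corpus document.

    Brute-force for clarity: at each start position, greedily extend the
    longest substring that still occurs in some document, then record it.
    """
    matches = []
    n = len(response)
    i = 0
    while i < n:
        best = None  # (doc_id, match_length) of the longest match starting at i
        for doc_id, text in corpus.items():
            length = 0
            for j in range(i + min_len, n + 1):
                if response[i:j] in text:
                    length = j - i  # span still matches verbatim; keep growing
                else:
                    break
            if length and (best is None or length > best[1]):
                best = (doc_id, length)
        if best:
            doc_id, length = best
            matches.append({"span": response[i:i + length], "source": doc_id})
            i += length  # skip past the matched span
        else:
            i += 1
    return matches


corpus = {"doc1": "the quick brown fox jumps over the lazy dog"}
result = trace_spans("I saw the quick brown fox yesterday", corpus)
# one match: the span "the quick brown fox " traced back to doc1
```

The sketch is quadratic in the response length per document; at the scale of real pre-training data, the same idea requires suffix-array-style indexing to stay fast.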

“We introduced OLMoTrace to help people understand why LLMs say the things they do from the lens of their training data,” Jiacheng Liu, a University of Washington Ph.D. candidate and Ai2 researcher, told The New Stack.

“By showing that a lot of things generated by LLMs are traceable back to their training data, we are opening up the black boxes of how LLMs work, increasing transparency and our trust in them,” he added.

To date, no other chatbot on the market provides the ability to trace a model’s response back to specific sources used within its training data. This makes the news a big stride for AI visibility and transparency."

thenewstack.io/llms-can-now-tr

The New Stack · Breakthrough: LLM Traces Outputs to Specific Training Data · Ai2’s OLMoTrace uses string matching to reveal the exact sources behind chatbot responses

Big tech companies want total control, but opt-out should be the way to go:

"OpenAI and Google have rejected the government’s preferred approach to solve the dispute about artificial intelligence and copyright.

In February almost every UK daily newspaper gave over its front page and website to a campaign to stop tech giants from exploiting the creative industries.

The government’s plan, which has prompted protests from leading figures in the arts, is to amend copyright law to allow developers to train their AI models on publicly available content for commercial use without consent from rights holders, unless they opt out.

However, OpenAI has called for a broader copyright exemption for AI, rejecting the opt-out model."

thetimes.com/uk/technology-uk/

The Times · AI giants reject government’s approach to solving copyright row · By Georgia Lambert
#AI #GenerativeAI #UK

"Now consider the chatbot therapist: what are its privacy safeguards? Well, the companies may make some promises about what they will and won't do with the transcripts of your AI sessions, but they are lying. Of course they're lying! AI companies lie about what their technology can do (of course). They lie about what their technologies will do. They lie about money. But most of all, they lie about data.

There is no subject on which AI companies have been more consistently, flagrantly, grotesquely dishonest than training data. When it comes to getting more data, AI companies will lie, cheat and steal in ways that would seem hacky if you wrote them into fiction, like they were pulp-novel dope fiends:
(...)
But it's not just people struggling with their mental health who shouldn't be sharing sensitive data with chatbots – it's everyone. All those business applications that AI companies are pushing, the kind where you entrust an AI with your firm's most commercially sensitive data? Are you crazy? These companies will not only leak that data, they'll sell it to your competition. Hell, Microsoft already does this with Office365 analytics:
(...)
These companies lie all the time about everything, but the thing they lie most about is how they handle sensitive data. It's wild that anyone has to be reminded of this. Letting AI companies handle your sensitive data is like turning arsonists loose in your library with a can of gasoline, a book of matches, and a pinky-promise that this time, they won't set anything on fire."

pluralistic.net/2025/04/01/doc

pluralistic.net · Pluralistic: Anyone who trusts an AI therapist needs their head examined (01 Apr 2025) · Daily links from Cory Doctorow

The Conversation: Africa’s data workers are being exploited by foreign tech firms – 4 ways to protect them. “Since 2015, we have been studying the central role of African data workers in building and maintaining artificial intelligence (AI) systems, acting as ‘data janitors’. Our research found that companies rarely acknowledge the use of human workers in AI value chains, thus they […]

https://rbfirehose.com/2025/04/01/the-conversation-africas-data-workers-are-being-exploited-by-foreign-tech-firms-4-ways-to-protect-them/

Emboldened by #Trump, A.I. Companies Lobby for Fewer Rules

President Trump at the White House in January with, from left, Oracle’s chairman, Larry Ellison; SoftBank’s chief executive, Masayoshi Son; and OpenAI’s chief executive, Sam Altman.
#ai #privacy #openai #softbank #oracle #aitraining #training

nytimes.com/2025/03/24/technol

The New York Times · Emboldened by Trump, A.I. Companies Lobby for Fewer Rules · By Cecilia Kang

Fast Company: Hollywood warns about AI industry’s push to change copyright law. “A who’s who of musicians, actors, directors, and more have teamed up to sound the alarm as AI leaders including OpenAI and Google argue that they shouldn’t have to pay copyright holders for AI training material. In an open letter, submitted to the White House Office of Science and Technology, more than 400 […]

https://rbfirehose.com/2025/03/20/fast-company-hollywood-warns-about-ai-industrys-push-to-change-copyright-law/

"The AI landscape is in danger of being dominated by large companies with deep pockets. These big names are in the news almost daily. But they’re far from the only ones – there are dozens of AI companies with fewer than 10 employees trying to build something new in a particular niche.

This bill demands that creators of any AI model – even a two-person company or a hobbyist tinkering with a small software build – identify copyrighted materials used in training. That requirement will be incredibly onerous, even if limited just to works registered with the U.S. Copyright Office. The registration system is a cumbersome beast at best – neither machine-readable nor accessible, it’s more like a card catalog than a database – that doesn’t offer information sufficient to identify all authors of a work, much less help developers to reliably match works in a training set to works in the system.

Even for major tech companies, meeting these new obligations would be a daunting task. For a small startup, throwing on such an impossible requirement could be a death sentence. If A.B. 412 becomes law, these smaller players will be forced to devote scarce resources to an unworkable compliance regime instead of focusing on development and innovation. The risk of lawsuits—potentially from copyright trolls—would discourage new startups from even attempting to enter the field."

eff.org/deeplinks/2025/03/cali

Electronic Frontier Foundation · California’s A.B. 412: A Bill That Could Crush Startups and Cement A Big Tech AI Monopoly · California legislators have begun debating a bill (A.B. 412) that would require AI developers to track and disclose every registered copyrighted work used in AI training. At first glance, this might sound like a reasonable step toward transparency. But it’s an impossible standard that could crush...
#USA #California #AI

"Anyone at an AI company who stops to think for half a second should be able to recognize they have a vampiric relationship with the commons. While they rely on these repositories for their sustenance, their adversarial and disrespectful relationships with creators reduce the incentives for anyone to make their work publicly available going forward (freely licensed or otherwise). They drain resources from maintainers of those common repositories often without any compensation. They reduce the visibility of the original sources, leaving people unaware that they can or should contribute towards maintaining such valuable projects. AI companies should want a thriving open access ecosystem, ensuring that the models they trained on Wikipedia in 2020 can be continually expanded and updated. Even if AI companies don’t care about the benefit to the common good, it shouldn’t be hard for them to understand that by bleeding these projects dry, they are destroying their own food supply.

And yet many AI companies seem to give very little thought to this, seemingly looking only at the months in front of them rather than operating on years-long timescales. (Though perhaps anyone who has observed AI companies’ activities more generally will be unsurprised to see that they do not act as though they believe their businesses will be sustainable on the order of years.)

It would be very wise for these companies to immediately begin prioritizing the ongoing health of the commons, so that they do not wind up strangling their golden goose. It would also be very wise for the rest of us to not rely on AI companies to suddenly, miraculously come to their senses or develop a conscience en masse.

Instead, we must ensure that mechanisms are in place to force AI companies to engage with these repositories on their creators' terms."

citationneeded.news/free-and-o

Citation Needed · “Wait, not like that”: Free and open access in the age of generative AI · The real threat isn’t AI using open knowledge — it’s AI companies killing the projects that make knowledge free