Not really - it isn’t prediction, it is early detection. Interpretive AI (finding and interpreting patterns) is way ahead of generative AI.
The irony that this story was posted by a bot…
AI content isn’t watermarked, or detection would be trivial. What he’s talking about is that certain words have a certain probability of appearing after certain other words in a certain context. While there is some randomness to the output, certain words or phrases are unlikely to appear because the data the model was based on didn’t use them.
All I’m saying is that the more a writer’s writing style and word choice are similar to the data set, the more likely their original content would be flagged as AI generated.
Here’s the thing though - the probabilities for word choice come from the data the model was trained on. While someone that uses a substantially different writing style / word choice than the LLM could easily be identified as being not from the LLM, someone with a similar writing style might be indistinguishable from the LLM.
Or, to oversimplify: given that Reddit was a large portion of the input data for ChatGPT, all you need to do is write like a Redditor to sound like ChatGPT.
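The point about word probabilities can be sketched with a toy model. This is not a real AI detector; it is a minimal bigram example (all training text made up for illustration) showing that text whose word choices match the training data scores as more "model-like" than text that doesn't:

```python
from collections import Counter, defaultdict
import math

# Toy training corpus: the model only "knows" these word-to-word patterns.
training = "the cat sat on the mat and the cat slept on the mat".split()
vocab = set(training)

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(training, training[1:]):
    follows[prev][nxt] += 1

def avg_log_prob(text):
    """Average log-probability of each word given the previous one.
    Higher (closer to 0) means the text 'sounds like' the training data."""
    words = text.split()
    total, n = 0.0, 0
    for prev, nxt in zip(words, words[1:]):
        counts = follows[prev]
        # Add-one smoothing so unseen word pairs get a low but
        # nonzero probability instead of crashing the math.
        p = (counts[nxt] + 1) / (sum(counts.values()) + len(vocab))
        total += math.log(p)
        n += 1
    return total / n

# Text mimicking the training style scores higher than unrelated text.
similar = avg_log_prob("the cat sat on the mat")
different = avg_log_prob("quantum flux harmonics destabilize rapidly")
```

Scale that idea up to billions of words of Reddit and you get the problem above: a human who writes like the training set scores just like the model does.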
If it could, it couldn’t claim that the content it produced was original. If AI generated content were detectable, that would be a tacit admission that it is entirely plagiarized.
The base assumption of those with that argument is that an AI is incapable of being original, so it is “stealing” anything it is trained on. The problem with that logic is that’s exactly how humans work - everything they say or do is derivative from their experiences. We combine pieces of information from different sources, and connect them in a way that is original - at least from our perspective. And not surprisingly, that’s what we’ve programmed AI to do.
Yes, AI can produce copyright violations. They should be programmed not to. They should cite their sources when appropriate. AI needs to “learn” the same lessons we learned about not copy-pasting Wikipedia into a term paper.
Though ironically, a scale of full - 3/4 - 1/2 - 1/4 - empty is perfectly fine for gas. There is usually a visual gauge of % for charge, but it isn’t as prominent as the range. Oddly, my car has it divided roughly in thirds.
The problem is that other vehicles adjust the projection based on current conditions - when I drive up a mountain, my projected range drops like a rock. When I drive back down I can end up with more range than I started. Reporting the “ideal” case during operation is misleading at best.
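The condition-aware projection described above can be sketched in a few lines. All the numbers here are hypothetical, chosen only to illustrate the effect; real vehicles use far more inputs (speed, temperature, elevation, HVAC load):

```python
# Project remaining range from *recent* energy consumption rather than
# a fixed "ideal" efficiency figure.

def projected_range_km(battery_kwh_remaining, recent_kwh_per_km):
    """Remaining range based on a rolling average of recent consumption."""
    return battery_kwh_remaining / recent_kwh_per_km

battery = 40.0  # kWh left in the pack (hypothetical)

# Climbing a mountain: high recent consumption -> projection drops.
uphill = projected_range_km(battery, recent_kwh_per_km=0.35)

# Descending with regen: very low net consumption -> projection climbs,
# possibly above where it started.
downhill = projected_range_km(battery, recent_kwh_per_km=0.10)

# A fixed "ideal" figure ignores both and misleads in either direction.
ideal = projected_range_km(battery, recent_kwh_per_km=0.16)
```

With these made-up figures the uphill projection (~114 km) is less than half the "ideal" 250 km, while the downhill projection reaches 400 km, which matches the mountain-driving experience described above.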
Copyright 100% applies to the output of an AI, and it is subject to all the rules of fair use and attribution that entails.
That is very different than saying that you can’t feed legally acquired content into an AI.
No, you misunderstand. Yes, they can control how the content in the book is used - that’s what copyright is. But they can’t control what I do with the book - I can read it, I can burn it, I can memorize it, I can throw it up on my roof.
My argument is that there is nothing wrong with training an AI with a book - that’s input for the AI, and that is indistinguishable from a human reading it.
Now what the AI does with the content - if it plagiarizes or violates fair use - that’s a problem, but those problems are already covered by copyright laws. They have no more business saying what can or cannot be input into an AI than they can restrict what I can read (and learn from). They can absolutely enforce their copyright on the output of the AI just like they can if I print copies of their book.
My objection is strictly on the input side, and the output is already restricted.
Again, my point is that the output is what can violate the law, not the input. And we already have laws that govern fair use, rebroadcast, etc.
My point is that the restrictions can’t go on the input, it has to go on the output - and we already have laws that govern such derivative works (or reuse / rebroadcast).
Then this is a copyright violation - it violates any standard for such, and the AI should be altered to account for that.
What I’m seeing is people complaining about content being fed into AI, and I can’t see why that should be a problem (assuming it was legally acquired or publicly available). Only the output can be problematic.
There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.
When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.
Folks, this isn’t a new problem, and it doesn’t need new laws.
The fediverse is the name for services that use ActivityPub - a communication protocol. What you are saying is like saying “tech companies, banks and regulators need to crack down on http because there is CSAM on the web”.
Yeah, let’s talk about self driving cars…
In medicine, when a big breakthrough happens, we hear that we could see practical applications of the technology in 5-10 years.
In computer technology, we reach the same level of proof of concept and ship it as a working product, and ignore the old adage “The first 90% of implementation takes 90% of the time, and the last 10% takes the other 90%”.
As noted elsewhere, do everything you can to avoid handing your card to anyone.
Use tap to pay wherever possible, then chip - neither of those methods give the card number to the merchant. Do not swipe unless you absolutely have to, and then inspect what you are swiping to make sure nothing is attached to the card reader.
For online purchases, do everything you can to avoid giving your card number to anyone - use ApplePay / GooglePay / Amazon Pay / PayPal etc. wherever possible. These can be used to put charges on your card without giving your card # to the merchant. These are one-time authorizations (unless you explicitly identify it as a subscription / recurring charge), so they can’t reuse the transaction token they get.
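The one-time-token mechanism above can be illustrated with a short sketch. This is a conceptual toy, not any real payment API: the processor issues a token tied to a single authorization, the merchant only ever sees the token, and the token is marked spent on first use:

```python
import secrets

class TokenVault:
    """Toy stand-in for a payment processor's token service."""

    def __init__(self):
        self._unspent = {}  # token -> authorized amount

    def issue_token(self, card_number, amount):
        """The wallet (Apple Pay, PayPal, etc.) requests a token.
        The merchant never receives card_number, only the token."""
        token = secrets.token_hex(16)
        self._unspent[token] = amount
        return token

    def charge(self, token, amount):
        """Merchant presents the token once; a second attempt fails."""
        if self._unspent.get(token) != amount:
            return False  # unknown token, already spent, or wrong amount
        del self._unspent[token]
        return True

vault = TokenVault()
t = vault.issue_token("4111111111111111", 25.00)
first = vault.charge(t, 25.00)   # the authorized one-time charge succeeds
second = vault.charge(t, 25.00)  # replaying the same token fails
```

Because the merchant stores only a spent token, a breach of their database leaks nothing that can be used to place new charges, unlike a stored card number.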
The cost savings is a supply chain wet dream.
Nature knows how to solve this problem.