Comment on Larian CEO Responds to Divinity Gen AI Backlash: 'We Are Neither Releasing a Game With Any AI Components, Nor Are We Looking at Trimming Down Teams to Replace Them With AI'

utopiah@lemmy.world ⁨2⁩ ⁨months⁩ ago

There are AI’s that are ethically trained

Can you please share examples and criteria?

dogslayeggs@lemmy.world ⁨2⁩ ⁨months⁩ ago
Sure. My company has a database of all technical papers written by employees over roughly the last 30 years. Nearly all of these contain proprietary information from other companies (we deal with tons of other companies and have access to their data), so we can neither build a public LLM nor use one. So we created an internal-only LLM that is trained only on our data.

- Fmstrat@lemmy.world ⁨2⁩ ⁨months⁩ ago
  I’d bet my lunch this internal LLM is a trained open-weight model, which has lots of public data in it. I’m not complaining about what your company has done, as I think that makes sense; I’m just providing a counterpoint.
  
- utopiah@lemmy.world ⁨2⁩ ⁨months⁩ ago
  Are you solely using your own data, or are you instead refining an existing LLM, or using RAG?
  
  I’m not an expert, but AFAIK training an LLM requires, by definition, a vast amount of text, so I’m skeptical that ANY company publishes enough papers to do so. I understand if you can’t share more about the process. Maybe me saying “AI” was too broad.
  
- tb_@lemmy.world ⁨2⁩ ⁨months⁩ ago
  Completely from scratch?
  
oplkill@lemmy.world ⁨2⁩ ⁨months⁩ ago
It can use public-domain-licensed data.

- utopiah@lemmy.world ⁨2⁩ ⁨months⁩ ago
  Right, and to be clear I’m not saying it’s not possible. This isn’t a trick question, it’s a genuine request to hopefully be able to rely on such tools.
  
Fmstrat@lemmy.world ⁨2⁩ ⁨months⁩ ago

Apertus was developed with due consideration to Swiss data protection laws, Swiss copyright laws, and the transparency obligations under the EU AI Act. Particular attention has been paid to data integrity and ethical standards: the training corpus builds only on data which is publicly available. It is filtered to respect machine-readable opt-out requests from websites, even retroactively, and to remove personal data, and other undesired content before training begins.

www.swiss-ai.org/apertus

Fully open source, even the training data is provided for download. That being said, this is the only one I know of.

- utopiah@lemmy.world ⁨2⁩ ⁨months⁩ ago
  Thanks, a friend did recommend it a few days ago, but unfortunately, AFAICT, they don’t provide the CO2eq in their model card, nor an equivalent analogy that non-technical users could understand.
  
Hackworth@piefed.ca ⁨2⁩ ⁨months⁩ ago
Adobe’s image generator (Firefly) is trained only on images from Adobe Stock.

- utopiah@lemmy.world ⁨2⁩ ⁨months⁩ ago
  Does it only use that, or does it also use an LLM?
  
  - Hackworth@piefed.ca ⁨2⁩ ⁨months⁩ ago
    The Firefly image generator is a diffusion model, and the Firefly video generator is a diffusion transformer. LLMs aren’t involved in either process. I believe there are some ChatGPT integrations with Reader and Acrobat, but that’s unrelated to Firefly.
    
    utopiah@lemmy.world ⁨2⁩ ⁨months⁩ ago
    Surprising. I would have expected it to rely at some point on something like CLIP in order to be prompted.
    