June 19-20, 2024
Paris, France
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for AI_dev Europe to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

All times in this schedule are shown in CEST (Central European Summer Time, UTC+2).

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Thursday, June 20 • 09:25 - 09:40
Keynote: Common Corpus: Opening Data for Building Open Source LLMs - Anastasia Stasenko, Co-founder, pleias & Associate Senior Lecturer, Sorbonne-Nouvelle


Training data has long been a bottleneck in the development of fully open source and reproducible Large Language Models. Most current major generative AI players train their models on copyrighted content, which raises legal issues, and there is a growing understanding that data quality is the real driver of model performance. Against that background, establishing a fully open corpus that is large enough and of high enough quality to train state-of-the-art LLMs could be seen as one of the main vectors for empowering the open source AI community.

Following this ambition, at the beginning of 2024 pleias launched an international project, "Common Corpus", resulting in the first release of the largest collection (1T tokens) of fully open data for training LLMs. This talk will describe how the corpus was built, as well as the challenges and opportunities it presents for pushing forward the notion of openness in the realm of generative AI.

Speakers

Anastasia Stasenko

Co-founder, pleias & Associate Senior Lecturer, Sorbonne-Nouvelle
Anastasia Stasenko is a co-founder of pleias, a French startup specialising in the development of open science LLMs trained on fully open, copyright-free data. In parallel, she holds the position of Senior Associate Lecturer in Data Analysis and Digital Strategy at Sorbonne-Nouvelle U...


Thursday June 20, 2024 09:25 - 09:40 CEST
Theatre (Level -1)