Webformers
Introduction Extracting structured information from web pages remains a challenging task in natural language processing. Regular transformer architectures are not designed to encode hierarchical information: each token attends to every other token in the input sequence, regardless of position (even though a few mechanisms, such as positional encoding, inject positional information). In a web page, information is highly structured. The HTML represents a tree, in which each node has a parent and possibly siblings and children. As a result, some nodes may be semantically connected while being relatively far apart if we only count the number of tokens between them. ...
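To make the gap between token distance and tree distance concrete, here is a minimal sketch of mine (the node names are hypothetical, not from the article): two leaves that can sit far apart in the serialized HTML are only a few edges apart in the DOM tree.

```python
# Minimal sketch: token distance vs tree distance in an HTML document.
# The parent map below describes a hypothetical example page.
parents = {
    "html": None,
    "body": "html",
    "div_a": "body", "div_b": "body",
    "price_label": "div_a", "price_value": "div_b",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path

def tree_distance(u, v):
    """Number of edges between two nodes via their lowest common ancestor."""
    pu, pv = path_to_root(u), path_to_root(v)
    ancestors = set(pu)
    for i, node in enumerate(pv):
        if node in ancestors:
            return pu.index(node) + i

# "price_label" and "price_value" may be separated by hundreds of tokens
# in the flat HTML string, yet they are only 4 edges apart in the tree.
print(tree_distance("price_label", "price_value"))  # 4
```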
Kolmogorov AI Framework | Part 2 - brainfuck
Motivations I recently started a series of articles about a framework I have been thinking about, which allows agents to be trained with reinforcement learning, executing one action at a time in specific environments. The first part of the series can be found here: Kolmogorov AI Framework. Environment To build this proof of concept, I chose the brainfuck programming language. It belongs to the family of esoteric programming languages, but it consists of only 8 instructions, making it very easy to start with as an execution environment. ...
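To give an idea of how small such an execution environment is, here is a toy Brainfuck interpreter I sketched for illustration; it is not the framework's environment, and it omits the input instruction `,`.

```python
# Toy Brainfuck interpreter: a tape of byte cells, a data pointer, and
# seven of the eight standard instructions (input `,` is omitted).
def run(program: str, tape_size: int = 30_000) -> str:
    tape = [0] * tape_size
    ptr = pc = 0
    out = []
    # Pre-compute matching brackets so loops can jump in O(1).
    jumps, stack = {}, []
    for i, c in enumerate(program):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    while pc < len(program):
        c = program[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # go back to the loop start
        pc += 1
    return "".join(out)

print(run("++++++++[>++++++++<-]>+."))  # prints "A" (8 * 8 + 1 = 65)
```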
EPITA Courses - Continuous Physics Informed Neural Networks
Introduction Neural networks require large amounts of data to converge, and those data need to represent the task the neural network is trying to learn. Data collection is a tedious process, especially when gathering data is difficult or expensive. In science, physics for instance, many phenomena are described by theories that we know work very well. Using this physical knowledge as a regularizer can help neural networks generalize better with less data. ...
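As a rough illustration of the idea (my own sketch, not the course material), the snippet below mixes a supervised loss on a handful of data points with a physics residual for the toy ODE $du/dt = -u$, computed with PyTorch autograd:

```python
# PINN-style training sketch: data loss + physics-residual loss.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

t_data = torch.rand(16, 1)                       # few, "expensive" measurements
u_data = torch.exp(-t_data)                      # exact solution stands in for data
t_phys = torch.rand(256, 1, requires_grad=True)  # cheap collocation points

for step in range(2000):
    opt.zero_grad()
    # Supervised term on the scarce data.
    loss_data = ((net(t_data) - u_data) ** 2).mean()
    # Physics term: residual of du/dt + u = 0 at the collocation points.
    u = net(t_phys)
    du_dt = torch.autograd.grad(u.sum(), t_phys, create_graph=True)[0]
    loss_phys = ((du_dt + u) ** 2).mean()
    (loss_data + loss_phys).backward()
    opt.step()
```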
EPITA Courses - Transformers
Context Generating data is now a hot topic in machine learning. The idea of using statistical methods to produce synthetic data is rather old, and many methods have proven effective in different scenarios. Today, the most well-known ways to generate synthetic data are VAEs, GANs, and Transformers. Transformers A bit of history We talked about RNNs last week and saw how they can be used to predict sequences. Unfortunately, RNNs suffer from several problems, especially with long sequences, where they seem to forget what happened. ...
EPITA Courses - Recurrent Neural Networks
Introduction Motivation Why is it necessary to introduce the concept of recurrence in neural networks? We know certain things as sequences. Take the example of the alphabet: we can recite it from A to Z without even thinking about it. However, if we were asked to recite it backwards, or even worse, to recite it based on the positions of the letters (give the 17th letter, then the 5th…), we would be unable to do so. ...
EPITA Courses - Timeseries
Time Series Time series & stochastic processes A time series is a set of observations $x_t$ generated sequentially over time $t$. There are two main types of time series: continuous time series and discrete time series. We can also distinguish time series whose values are determined by a mathematical function, deterministic time series, from those that have a random component, non-deterministic time series. To forecast non-deterministic time series, we assume that there is a probability model that generates the observations of the time series. ...
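The distinction can be made concrete with a small sketch (my own example, not from the course): one series fully determined by a function of $t$, and one generated by a probability model, here an AR(1) process.

```python
# Deterministic vs non-deterministic time series.
import numpy as np

t = np.arange(200)
deterministic = np.sin(2 * np.pi * t / 50)  # fully determined by t

rng = np.random.default_rng(0)
phi, noise = 0.8, rng.normal(0, 1, len(t))
stochastic = np.zeros(len(t))
for i in range(1, len(t)):
    # x_t = phi * x_{t-1} + eps_t: each observation has a random component.
    stochastic[i] = phi * stochastic[i - 1] + noise[i]
```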
About Me
I am Julien, a French AI engineer since 2017. I have worked in several organizations on different topics, all related to AI or to fields where AI is applied. I love playing golf (even though I am not very good at it…), going to the gym, reading, learning new things, and working (not a joke, I do love it). I sometimes write articles about tech, most of them AI-related: ...
MLX DQN
Reinforcement Learning with the Apple MLX Framework Today, a very short article about Apple's MLX framework. I recently learned that Apple has its own machine learning framework, and as a heavy Mac user I thought I would give it a try. It is very easy to use and intuitive; the syntax is nice and resembles NumPy and PyTorch, which is convenient for a PyTorch user. As an example, let me present a Deep Q-Learning implementation that I wrote, based on a nice DeepMind paper. ...
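As a tiny taste of that syntax (a minimal sketch of mine, not the DQN code from the article), the snippet below builds a linear Q-network and picks greedy actions; note how closely it mirrors NumPy/PyTorch:

```python
# Minimal MLX example: a linear Q-network and greedy action selection.
import mlx.core as mx
import mlx.nn as nn

x = mx.random.normal((4, 8))  # batch of 4 fake 8-dimensional states
q_net = nn.Linear(8, 2)       # maps each state to 2 Q-values
q_values = q_net(x)
# Greedy action selection, just like NumPy's argmax.
actions = mx.argmax(q_values, axis=1)
print(actions)
```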
Kolmogorov AI Framework | Part 1
Shannon Entropy Concept In 1948, engineer and mathematician Claude Shannon published a foundational paper for computer science, and later for artificial intelligence: A Mathematical Theory of Communication. This article defines an idea central to the training of today's algorithms: information entropy. $$ H = -\sum_{i = 1}^{n} p_i \log_2(p_i) $$ This formula quantifies how random or organized a data source, such as a text-generating program, is. The higher the entropy, the more random the source; the lower the entropy, the more the data consists of recognizable patterns that allow us to predict the next words. ...
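The formula translates directly into a few lines of Python (my own illustration, not code from the article):

```python
# H = -sum_i p_i * log2(p_i)
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: maximally random over 4 symbols
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24 bits: highly predictable source
```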
Physics Informed Neural Networks
Introduction Neural networks require large amounts of data to converge, and those data need to represent the task the neural network is trying to learn. Data collection is a tedious process, especially when gathering data is difficult or expensive. In science, physics for instance, many phenomena are described by theories that we know work very well. Using this physical knowledge as a regularizer can help neural networks generalize better with less data. ...