Key Resources That Shaped My Deep Learning Understanding

by Sylvain Artois on May 24, 2025

The Bend in the Road, 1900/1906 - Paul Cezanne - www.nga.gov


Deep Learning with Python

Manning Publications (2021 - 2nd ed.), by François Chollet

AI and ML books are plentiful, and you have to start somewhere. In September 2023, spurred by the shock wave of the Midjourney and Stable Diffusion image-generation revolution, I bought a paper book chosen mainly for its title (somewhat randomly, I admit): “Deep Learning with Python”, by François Chollet.

I really like Manning Publications; I have the impression that their books regularly stand out from the crowd. I’ve spent quite some time digging through several of their works, including Taming Text (2012), a bestseller, Collective Intelligence in Action (2008), Machine Learning in Action (2012), and many others.

François Chollet is French, trained at ENSTA, spent nine years at Google, and is best known for creating Keras, a high-level framework whose elegant syntax can completely mask the underlying complexity of deep learning. That’s plenty of good reasons to start with him.

The book’s title is somewhat misleading: it could just as well have been called Deep Learning with Keras, which is not a small detail once you discover that the industry has largely standardized on PyTorch (even though Keras 3 lets you build on JAX, TensorFlow, or PyTorch).

Chapter 2, after presenting the famous tensors and affine transformations, tackles the heart of the machine: the backpropagation mechanism and gradient descent, the machinery behind the neural-network work recognized by the 2024 Nobel Prize in Physics awarded to John Hopfield and Geoffrey Hinton.
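
To make the idea concrete, here is a toy gradient descent in plain Python and NumPy (my own sketch, not code from the book): we fit a line to noisy data by repeatedly nudging two parameters against the gradient of the squared error.

```python
# Toy gradient descent (my own illustration, not an excerpt from the book):
# fit y = w*x + b to noisy data by following the gradient of the mean squared error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=200)  # "true" parameters: w=3, b=2

w, b = 0.0, 0.0
learning_rate = 0.1
for step in range(500):
    pred = w * x + b
    error = pred - y
    # Gradients of mean((w*x + b - y)^2) with respect to w and b
    grad_w = np.mean(2 * error * x)
    grad_b = np.mean(2 * error)
    # Move each parameter a small step against its gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # ends up close to w=3, b=2
```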

You sometimes feel a little dizzy when you begin to grasp how technologies you had only ever perceived as black boxes actually work; that is very much the case here.

Chapters 5 and 6 are general chapters that apply to any deep learning project and attempt to define a universal workflow for approaching a machine learning problem. But Chapter 5 is a bit more than that: it tries to explain why deep learning “works”:

More generally, the manifold hypothesis posits that all natural data lies on a low-dimensional manifold within the high-dimensional space where it is encoded. That’s a pretty strong statement about the structure of information in the universe. As far as we know, it’s accurate, and it’s the reason why deep learning works. It’s true for MNIST digits, but also for human faces, tree morphology, the sounds of the human voice, and even natural language.

Here too, you get that feeling of neurons connecting and the vertigo of understanding…

Chapter 7 explores Keras in depth, notably presenting the functional API, which opens up a world of possibilities. The final chapters introduce the classic application fields of deep learning: computer vision, time series, and generation.
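
To give a flavour of the functional API mentioned above, here is a minimal sketch (my own example, not taken from the book): a model with two named inputs merged into a shared hidden layer.

```python
# A minimal Keras functional-API sketch (my own example, not from the book):
# two inputs, a shared hidden representation, and a single sigmoid output.
from tensorflow import keras
from tensorflow.keras import layers

text_features = keras.Input(shape=(64,), name="text_features")
numeric_features = keras.Input(shape=(8,), name="numeric_features")

x = layers.concatenate([text_features, numeric_features])
x = layers.Dense(32, activation="relu")(x)
output = layers.Dense(1, activation="sigmoid", name="priority")(x)

model = keras.Model(inputs=[text_features, numeric_features], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```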


Deep Learning: A Visual Approach

No Starch Press (2021), by Andrew Glassner

Let me break out the superlatives here: I think this book is a masterpiece in the underrated discipline of technical writing. With barely a line of math and not a single line of code, it manages the feat of making almost the entire field of machine learning accessible to everyone.

Andrew Glassner is no ordinary author: he worked at the Xerox Palo Alto Research Center and at Microsoft, is a genuine expert in computer graphics, and has written successful books in that field.

So why is it a masterpiece? Because Glassner is a great popularizer. He starts from the basics with Bayesian probability, gradually builds up through derivatives to explain gradients, and takes a detour through Shannon’s information theory to explain why the concept of entropy is useful in ML (to compare two probability distributions). Of all the books I’ve browsed, I believe he’s the only author who goes so deep into popularizing the theoretical foundations of deep learning. All of this, once again, with little to no math and no code, in natural language that anyone can follow.
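
As a small aside of my own (the book itself deliberately contains no code), the “compare two probability distributions” idea boils down to cross-entropy, which can be computed in a few lines:

```python
# Cross-entropy between a "true" distribution and two candidate predictions
# (my own illustration; the book itself contains no code).
import numpy as np

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Penalty for encoding p_true using the probabilities in p_pred."""
    p_pred = np.clip(p_pred, eps, 1.0)
    return -np.sum(p_true * np.log(p_pred))

target = np.array([0.0, 1.0, 0.0])       # the true class is the second one
good_guess = np.array([0.1, 0.8, 0.1])
bad_guess = np.array([0.6, 0.2, 0.2])

print(cross_entropy(target, good_guess))  # ~0.22, low: the distributions are close
print(cross_entropy(target, bad_guess))   # ~1.61, high: the distributions disagree
```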

He devotes an entire second part to the “old” ML algorithms: k-nearest neighbors, decision trees, support vector machines, random forests, and more, before moving on to two dense parts on deep learning, from the famous perceptron through convolutional neural networks, autoencoders, RNNs, and GANs, all the way to transformers, the foundation of our beloved LLMs. The author goes into detail on each concept and seeks to make everyone understand, without jargon, how it works.

The transformer architecture is notoriously challenging to comprehend, and despite watching countless YouTube explainers, I long struggled to grasp this cornerstone of modern LLMs. Yet Glassner’s copious schematics truly deliver on the promise of “A Visual Approach,” allowing readers to trace the flow of tensors and see the neural network’s mechanics in action. For visual learners like myself, this is what makes the book singular: its wealth of diagrams isn’t just decorative, but the very key that unlocks understanding and brings the concepts to life.

I think this makes it the perfect book for managers and project leaders who want to invest some time in understanding exactly what their teams are talking about. But even as someone looking for more concrete, operational know-how, I found this book tremendously valuable.


Deep Learning with PyTorch

Manning Publications (2023 - 2nd ed.), by Eli Stevens, Luca Antiga, Thomas Viehmann

I didn’t read this one as thoroughly as the other two (that is, in a scholarly way, with note-taking, reformulation, and re-reading). But it contains two important chapters, the third and fourth, which offer a real introduction to the famous tensors. They place tensors within the Python data-structures ecosystem and take a very didactic, hands-on approach that lets you manipulate data directly, which is perfectly effective for assimilating the technology.
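
To give a taste of what those chapters cover, here is a tiny tensor-manipulation snippet of my own (not lifted from the book):

```python
# A taste of tensor manipulation in PyTorch (my own snippet, not from the book).
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])  # 3 points in 2D
print(points.shape)   # torch.Size([3, 2])
print(points[:, 0])   # first coordinate of every point

# Tensors interoperate with NumPy and can be moved to the GPU when one is available
as_numpy = points.numpy()
normalized = (points - points.mean(dim=0)) / points.std(dim=0)
if torch.cuda.is_available():
    normalized = normalized.to("cuda")
print(as_numpy.shape, normalized)
```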

LLM Engineer’s Handbook

Packt Publishing (2024), by Paul Iusztin and Maxime Labonne

When you approach a new discipline, once past the 101 basics, questions about industry practice inevitably arise: how do OpenAI and the like actually manage their models?

This book sheds light on managing models in production, framing them as continuous flows. It does include the substantial code samples Packt is known for — which some may feel belong more in a GitHub repo — but breezing past those sections makes for a swift, focused read. What remains is exactly what you came for: a practical, end-to-end ML workflow built on industry-standard tools. I devoured it in a few hours and walked away with two essentials I now use every day: MLflow and Airflow.
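
To show why MLflow stuck with me, here is a minimal experiment-tracking sketch of my own (not an excerpt from the book): logging a run’s parameters and metrics takes only a few lines.

```python
# A minimal MLflow experiment-tracking sketch (my own example, not from the book).
import mlflow

mlflow.set_experiment("demo-experiment")

with mlflow.start_run():
    # Hyperparameters you want to compare across runs
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)

    # ... train your model here ...
    fake_validation_accuracy = 0.91

    # Metrics appear in the MLflow UI next to the run's parameters
    mlflow.log_metric("val_accuracy", fake_validation_accuracy)
```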

Currently on My E-Reader

AI Engineering (O’Reilly, 2024), by Chip Huyen

The author shares her experience of putting models into production: choosing and evaluating “foundation models”, prompt engineering, RAG and agents, fine-tuning, dataset engineering; there’s even a chapter on inference optimization, a discipline I struggle with on my self-hosted Jetson. I’m deep into it right now. She has a very practical yet precise style, and I find her table of contents perfect for developers like me who want to lift the hood a bit without necessarily becoming data scientists: people who simply want to use, modify, and exploit ML models in their projects.

Final Note

I could list many others, but I’ve decided to limit myself to those I’ve really taken the time to study thoroughly.

None of this would have been possible without an O’Reilly subscription. Being able to browse, choose, and then read all these books from different publishers in one place, with a single subscription, is a real opportunity, and as a CTO I always try to provide my team with the same access.

Other media have of course accompanied me — so many YouTube tutorials have helped me solve technical issues I was stuck on. I’ll only mention one: Fine tuning Pixtral - Multi-modal Vision and Text Model by Trelis Research, but that doesn’t do justice to this indispensable platform given the number of videos I’ve consumed.

There are also pure gems on YouTube, particularly the Collège de France channel. Yann LeCun held the annual chair in “Computer Science and Digital Sciences” in 2015, so we have all these lectures available for free on the platform. I’ve watched the inaugural lecture and the first two courses — it’s obviously essential, sometimes a bit technical, but you get to the bottom of things. Despite the fact that these lectures took place just before the arrival of transformers and the LLM revolution, it’s surprising how well the content has aged.

I’d like to finish by mentioning Grant Sanderson’s YouTube channel, 3Blue1Brown, which mostly produces content on mathematics but has released a playlist on deep learning that I’ve watched and rewatched. He too is a great popularizer. See also his blog.
