Recognizing Google Fonts in Images

Have you ever stared long and deep at an image with text and wondered what font was used to render the text? Us too. That’s why we built an AI classifier that is able to identify the Google font that most closely matches the text in an image, out of a collection of over 3,000 fonts. Surprisingly, it works on AI-generated images too!

Our classifier is publicly available on GitHub, including a pre-trained checkpoint as well as code for training a model from scratch, in case you need to work with a different set of fonts. Alternatively, if all you’re looking for is a dead-simple and intuitive interface for figuring out what font was used in an image, head over to Storia Lab. Font classification is free and unlimited even without a subscription!

Recognizing Google fonts in images with Storia Lab

The backstory

When building Textify, our Storia Lab feature that enables users to change text in images seamlessly, we realized that using diffusion models for rendering is not always a silver bullet. Indeed, deep machine learning models are absolutely necessary for photorealistic images, where the text needs to abide by the laws of physics; like any real-world object, text can be positioned at a certain angle, have a curvature, be illuminated by a source of light, be partially obstructed, or show natural defects.

However, there are situations where traditional text rendering (using a font file) is a more appropriate solution. This is the case for certain graphic illustrations, where the text isn’t necessarily part of a realistic scene, does not need to conform to the rules of physics, and is a somewhat independent element. Occasionally, this works even for simpler photorealistic images where the text is directly facing the camera and there are no complicated shadows.

For situations like the latter, Textify executes the following steps to update an existing piece of text with a new one:

  1. Remove the existing text while preserving the background. This can be done by a lightweight inpainting model like LaMa.

  2. Select the font that resembles the original text most closely, from a predefined collection of fonts.

  3. Render the desired text programmatically, with the chosen font.

The font classifier comes in handy for step 2, by suggesting the Google font that most closely matches the original image.
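To make step 2 concrete, here is a minimal sketch of how a trained classifier could be queried for its best font matches. The `top_fonts` helper, the `font_names` list, and the input sizes are illustrative assumptions, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def top_fonts(model, image_tensor, font_names, k=5):
    """Return the k fonts the classifier ranks most likely for a text crop.

    image_tensor: a (C, H, W) tensor of the cropped text region.
    font_names:   index-aligned list of font names (illustrative).
    """
    model.eval()
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))       # add batch dim
        probs = F.softmax(logits, dim=1).squeeze(0)     # class probabilities
    values, indices = probs.topk(k)
    return [(font_names[i], float(v)) for i, v in zip(indices, values)]
```

Returning a ranked list rather than a single label is useful in practice: the runner-up fonts are often near-identical to the top match, and a UI can let the user pick among them.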

Training procedure

We synthetically produced 250,000 <image, font> pairs for training, applying the following steps:

  1. Choose an image background. With 80% probability, we selected an image from the COCO dataset. With 20% probability, we selected a plain color.

  2. Choose a piece of text. We randomly selected text snippets from English Wikipedia of a randomly-chosen length N. This inevitably resulted in truncated words, but we embraced it as an opportunity for regularization. Given that this classifier will be used on a wide variety of images (including AI-generated ones), we don’t necessarily expect that, at inference time, the text will always be correct English.

  3. Render the text onto the background. We selected a random position, font size, and font color to render the text.

  4. Apply augmentation techniques including JPEG compression, shift, scaling, and rotation. This prevents the model from overfitting to the training set and accommodates a wider variety of inputs.
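The sampling logic behind steps 1 and 2 can be sketched in a few lines. The helper names, probability split, and snippet lengths below are assumptions for illustration; only the 80/20 background split comes from the recipe above:

```python
import random

def choose_background(coco_paths, p_coco=0.8):
    """Pick a background: a COCO image with 80% probability,
    otherwise a plain random RGB color."""
    if coco_paths and random.random() < p_coco:
        return ("image", random.choice(coco_paths))
    return ("color", tuple(random.randint(0, 255) for _ in range(3)))

def sample_snippet(corpus_text, min_len=5, max_len=40):
    """Cut a random-length snippet from a text corpus. Words may be
    truncated mid-way; that is kept deliberately as regularization."""
    n = random.randint(min_len, max_len)
    start = random.randint(0, max(0, len(corpus_text) - n))
    return corpus_text[start:start + n]
```

The snippet would then be rendered onto the chosen background (e.g. with Pillow's `ImageDraw` and a Google Fonts `.ttf` file) at a random position, size, and color, and the augmentations applied on top.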

For the model architecture, we used ResNet-50 — a 50-layer deep convolutional neural network with residual connections. Our GitHub repository contains all the necessary code to reproduce training data generation, as well as scripts to retrain the model. Or you can simply use our pre-trained checkpoint.

Font Classification API

If you’d like to skip the headache of hosting your own endpoint for font classification and would rather just call our API, let us know at info@storia.ai.
