Introduction

Image uploads are a common feature in web applications, but they can pose security risks if the files are not validated properly. Validation must always happen on the server, because client-side checks can be bypassed. Still, client-side validation does not hurt, and it gives users faster feedback.

When we select a file in a form, the browser typically derives its MIME type from the file extension. The reported MIME type can therefore be changed simply by changing the file extension.



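import { useState } from "react";
import type { CSSProperties } from "react";
// Assumption: Form is Remix's form component; adjust this import for your framework.
import { Form } from "@remix-run/react";

export default function FileUpload() {
  const [file, setFile] = useState<File | null>(null);

  // Typed explicitly so that flexDirection is accepted by the style prop
  const containerStyle: CSSProperties = {
    display: "flex",
    flexDirection: "column",
    alignItems: "flex-start",
  };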

  const inputStyle = {
    marginBottom: "10px",
    // Add more input styles here as needed
  };

  const buttonStyle = {
    marginTop: "10px",
    // Add more button styles here as needed
  };

  const textStyle = {
    fontStyle: "italic",
    // Add more text styles here as needed
  };

  return (
    <Form method="post" encType="multipart/form-data">
      <div style={containerStyle}>
        <input
          type="file"
          name="file"
          onChange={(e) => setFile(e.target.files?.[0] ?? null)}
          style={inputStyle}
        />

        {/* MIME type as reported by the browser */}
        <span style={textStyle}>{file?.type}</span>

        <button type="submit" style={buttonStyle}>
          Submit
        </button>
      </div>
    </Form>
  );
}

Form Component generated with ChatGPT.

For example, let's say we upload a PNG image to this form.

Correct Mime type

The type is shown correctly, as expected. But what if we change the file's extension from .png to .pdf?

Incorrect Mime type


The type is now application/pdf although it's an image file. This makes sense because you cannot expect the browser to actually go through the inner contents of the file to determine the file type.
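
To make this concrete, here is a minimal sketch of a lookup that is based on the extension alone. It is only an illustration: `guessTypeFromName` and its small table are hypothetical, and real browsers rely on much larger, platform-specific mappings. What it shows is that a guess based on the name changes as soon as the file is renamed, even though the bytes stay the same.

```typescript
// Hypothetical extension-to-MIME table, for illustration only.
const typesByExtension: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  pdf: "application/pdf",
};

// Guess a MIME type from the file name alone; the file content is never read.
function guessTypeFromName(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return "";
  const extension = fileName.slice(dot + 1).toLowerCase();
  return typesByExtension[extension] ?? "";
}

console.log(guessTypeFromName("photo.png")); // image/png
console.log(guessTypeFromName("photo.pdf")); // application/pdf
```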

Magic Bytes

Magic bytes are the first few bytes of a file, and they identify its type. These sequences let us determine the file type without relying on the file extension, which, as we just saw, can easily be manipulated.
For example:

  • The JPEG image format typically starts with bytes FF D8 FF.
  • The PNG image format typically starts with bytes 89 50 4E 47 0D 0A 1A 0A.

Magic bytes are also known as file signatures or magic numbers. Programs often use them to decide how to handle or interpret a file.
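
As a quick sketch of what checking a signature looks like, the snippet below formats the first bytes of a buffer as hex and compares them with the PNG signature listed above. The helper names `toHex` and `startsWithSignature` are made up for this example, and the sample bytes are written by hand rather than read from a real file.

```typescript
// Format bytes as upper-case, space-separated hex, e.g. "89 50 4E 47".
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0").toUpperCase()).join(" ");
}

// True when `bytes` begins with every byte of `signature`, in order.
function startsWithSignature(bytes: Uint8Array, signature: number[]): boolean {
  return signature.every((value, i) => bytes[i] === value);
}

const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hand-written sample: the PNG signature followed by two more bytes.
const sample = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x00]);

console.log(toHex(sample.slice(0, 8))); // 89 50 4E 47 0D 0A 1A 0A
console.log(startsWithSignature(sample, pngSignature)); // true
```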

Validating Image

If you head to https://mimesniff.spec.whatwg.org/#matching-an-image-type-pattern, you can see how the MIME type of an image can be identified from its byte pattern.

Image type Pattern

Let's extend our previous code to show the MIME type detected from the byte pattern.

Add Byte Patterns

interface Mime {
  mime: string;
  pattern: (number | undefined)[];
}

const imageMimes: Mime[] = [
  {
    mime: "image/png",
    pattern: [0x89, 0x50, 0x4e, 0x47],
  },
  {
    mime: "image/jpeg",
    pattern: [0xff, 0xd8, 0xff],
  },
  {
    mime: "image/gif",
    pattern: [0x47, 0x49, 0x46, 0x38],
  },
];

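function isMime(bytes: Uint8Array, mime: Mime): boolean {
  // undefined marks a wildcard position; compare explicitly so that a 0x00 byte is still checked
  return mime.pattern.every((p, i) => p === undefined || bytes[i] === p);
}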

Adding Image Mime Types

Here, we define the MIME types along with their respective byte patterns. The isMime function checks whether a pattern matches the beginning of the file content.
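
The `undefined` entries allowed by the `pattern` type are useful for formats whose signature contains bytes that vary from file to file. The sketch below is an assumption about how such a pattern could look for WebP, based on the image type patterns in the WHATWG spec linked above: "RIFF", four size bytes that are ignored, then "WEBPVP". The helper `matchesPattern` is a hypothetical stand-in that treats `undefined` as a wildcard, and the sample headers are written by hand.

```typescript
interface Mime {
  mime: string;
  pattern: (number | undefined)[];
}

// "RIFF", four size bytes that differ per file (wildcards), then "WEBPVP".
const webp: Mime = {
  mime: "image/webp",
  pattern: [
    0x52, 0x49, 0x46, 0x46,
    undefined, undefined, undefined, undefined,
    0x57, 0x45, 0x42, 0x50, 0x56, 0x50,
  ],
};

// undefined positions match any byte; every other position must be equal.
function matchesPattern(bytes: Uint8Array, mime: Mime): boolean {
  return mime.pattern.every((p, i) => p === undefined || bytes[i] === p);
}

// Hand-written headers: a WebP file and a WAVE file share the RIFF container.
const webpHeader = new Uint8Array([
  0x52, 0x49, 0x46, 0x46, 0x1a, 0x00, 0x00, 0x00, 0x57, 0x45, 0x42, 0x50, 0x56, 0x50,
]);
const wavHeader = new Uint8Array([
  0x52, 0x49, 0x46, 0x46, 0x24, 0x00, 0x00, 0x00, 0x57, 0x41, 0x56, 0x45, 0x66, 0x6d,
]);

console.log(matchesPattern(webpHeader, webp)); // true
console.log(matchesPattern(wavHeader, webp)); // false
```

Note that this pattern is 14 bytes long, so at least that many bytes have to be read from the file before matching.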

Get Mime Type from Pattern

import { match } from "ts-pattern";

function getImageMimeType(file: File): Promise<string> {
  return new Promise((resolve, reject) => {
    const fileChunk = file.slice(0, 4);
    const reader = new FileReader();
    reader.onload = function (e) {
      const bytes = new Uint8Array(reader.result as ArrayBuffer);
      const mime = match(bytes)
        .when(
          (bytes) => isMime(bytes, imageMimes[0]),
          () => imageMimes[0].mime
        )
        .when(
          (bytes) => isMime(bytes, imageMimes[1]),
          () => imageMimes[1].mime
        )
        .when(
          (bytes) => isMime(bytes, imageMimes[2]),
          () => imageMimes[2].mime
        )
        .otherwise(() => "Invalid Image Type");
      resolve(mime);
    };

    reader.onerror = function () {
      // Reject with an Error object so that callers get a stack trace
      reject(new Error("Error reading file"));
    };

    reader.readAsArrayBuffer(fileChunk);
  });
}

Function to Get Mime Type from File

To retrieve the MIME type of the uploaded file, we read only the first four bytes of the file content, which keeps memory usage and processing time low. We then use the ts-pattern library to compare these initial bytes with the predefined patterns.
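
The chain of `.when` calls grows with every new type. As an alternative sketch, and not the approach used in this article, the same lookup can be written with `Array.prototype.find` over a table. The names `signatures`, `matches`, and `detectImageMime` are hypothetical, and the function returns `null` instead of an error string when nothing matches.

```typescript
interface Signature {
  mime: string;
  pattern: (number | undefined)[];
}

const signatures: Signature[] = [
  { mime: "image/png", pattern: [0x89, 0x50, 0x4e, 0x47] },
  { mime: "image/jpeg", pattern: [0xff, 0xd8, 0xff] },
  { mime: "image/gif", pattern: [0x47, 0x49, 0x46, 0x38] },
];

// undefined positions are wildcards; all other positions must be equal.
function matches(bytes: Uint8Array, signature: Signature): boolean {
  return signature.pattern.every((p, i) => p === undefined || bytes[i] === p);
}

// Return the MIME type of the first matching signature, or null when none matches.
function detectImageMime(bytes: Uint8Array): string | null {
  const found = signatures.find((candidate) => matches(bytes, candidate));
  return found ? found.mime : null;
}

console.log(detectImageMime(new Uint8Array([0x89, 0x50, 0x4e, 0x47]))); // image/png
console.log(detectImageMime(new Uint8Array([0x25, 0x50, 0x44, 0x46]))); // null
```

The bytes can come from the same FileReader as above or, in environments that support `Blob.arrayBuffer()`, from `new Uint8Array(await file.slice(0, 4).arrayBuffer())`.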

Integrate With Form Component

export default function FileUpload() {
  const [file, setFile] = useState<File | null>(null);
  const [mimeType, setMimeType] = useState<string>("");

  const onChange = async (e: React.ChangeEvent<HTMLInputElement>) => {
    // files is empty when the user cancels the file picker
    const selectedFile = e.target.files?.[0];
    if (!selectedFile) return;
    setFile(selectedFile);
    const type = await getImageMimeType(selectedFile);
    setMimeType(type);
  };

  // Typed explicitly so that flexDirection is accepted by the style prop
  const containerStyle: React.CSSProperties = {
    display: "flex",
    flexDirection: "column",
    alignItems: "flex-start",
  };

  const inputStyle = {
    marginBottom: "10px",
    // Add more input styles here as needed
  };

  const buttonStyle = {
    marginTop: "10px",
    // Add more button styles here as needed
  };

  const textStyle = {
    fontStyle: "italic",
    // Add more text styles here as needed
  };

  return (
    <Form method="post" encType="multipart/form-data">
      <div style={containerStyle}>
        <input type="file" name="file" onChange={onChange} style={inputStyle} />

        <span style={textStyle}>{mimeType}</span>

        <button type="submit" style={buttonStyle}>
          Submit
        </button>
      </div>
    </Form>
  );
}

Extend Previous Form Component

Now we can use getImageMimeType to determine and display the file's MIME type, instead of relying on the extension.

Proper File Type Validation

As we can see, even though the file's extension is .pdf, the MIME type is correctly shown as image/png.
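
Displaying the type is only half of the validation; the result still has to decide whether the upload is accepted. The sketch below shows one possible policy, which is an assumption and not something prescribed here: reject the file when its content is not a recognised image, or when the content disagrees with the type the browser reported. `checkUpload`, `sniff`, and `CheckResult` are hypothetical names. Because the check works on a plain `Uint8Array`, the same logic could also run on the server.

```typescript
type CheckResult = { ok: true; mime: string } | { ok: false; reason: string };

const allowedSignatures: { mime: string; pattern: number[] }[] = [
  { mime: "image/png", pattern: [0x89, 0x50, 0x4e, 0x47] },
  { mime: "image/jpeg", pattern: [0xff, 0xd8, 0xff] },
  { mime: "image/gif", pattern: [0x47, 0x49, 0x46, 0x38] },
];

// Detect the MIME type from the leading bytes, or return null when unknown.
function sniff(bytes: Uint8Array): string | null {
  const found = allowedSignatures.find((s) => s.pattern.every((p, i) => bytes[i] === p));
  return found ? found.mime : null;
}

// Assumed policy: the content must be a known image AND agree with the reported type.
function checkUpload(reportedType: string, bytes: Uint8Array): CheckResult {
  const sniffed = sniff(bytes);
  if (sniffed === null) {
    return { ok: false, reason: "content is not a recognised image" };
  }
  if (sniffed !== reportedType) {
    return { ok: false, reason: `reported ${reportedType} but content is ${sniffed}` };
  }
  return { ok: true, mime: sniffed };
}

const pngBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
console.log(checkUpload("image/png", pngBytes).ok); // true
console.log(checkUpload("application/pdf", pngBytes).ok); // false
```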

Conclusion

Using magic numbers for MIME-type validation is an effective first step in file validation. However, it should be a starting point rather than the only check. Validation must also happen on the server, since client-side validation alone can be bypassed, and additional checks are needed to ensure that unwanted or harmful files are not inadvertently stored or processed.

Thank you for reading this article. Catch you in the next one.