Hiding Data Inside an Image Using Python

In this instructable, we are going to use Python with the Pillow library to conceal a file inside an image. This practice is called steganography, which consists of hiding a message (like a text file) inside another message (like an image). Even though in this case we are putting a text file inside an image, we can hide any type of data (even another image) as long as it's not too large.

We will use Google Colab to run our code, so you will need a Google account.

I first heard of this technique in this video, and I think it's worth watching.

Overview

Before we start with the code, let's first define some things.

We will consider a digital image as a set of values, each value describing a color. In a grayscale image, for example, we usually have a set of values that range from 0 to 255, where 0 represents black, 255 represents white, and the values in between represent different levels of gray. These values are arranged in a 2D plane as shown in the image. Notice how the shade of gray is darker when the value is closer to 0 and lighter when it's closer to 255.

With this, we can consider color images as 3 grayscale images, each of the grayscale images corresponding to the red, green and blue channels of the color image, as shown.

Now, how do we modify an image to add arbitrary data while keeping the image close to the original?

First we consider the value of a pixel. We know that it ranges from 0 to 255, and that's because each pixel value is an 8-bit integer. The minimum value in binary is 0000 0000 (all 8 bits set to 0), the maximum value is 1111 1111 (all 8 bits set to 1), and a value of 135, for example, is 1000 0111.
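To check these binary representations yourself, here is a minimal sketch using Python's built-in format(). The helper name to_binary is made up for this example.

```python
def to_binary(value):
    """Return a pixel value (0-255) as an 8-character binary string."""
    return format(value, "08b")

# Minimum, example and maximum pixel values from the text
for value in (0, 135, 255):
    print(value, "->", to_binary(value))
```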

Consider now that we want to save 1 bit of data in a pixel. What we do is change one of the 8 bits in the value of the pixel to the value that we want to save. But which of the bits do we change so that the value of the pixel doesn't change a lot?

We can do an experiment. Let's say that the bit of data that we want to save is 0, and the pixel where we are going to save it has a value of 135 (1000 0111 in binary).

We have 8 options, as there are 8 bits, but we are going to consider only the leftmost and the rightmost bits because they are the two extremes. Suppose we save our 0 in the bit that's on the left, so we have

1000 0111 → 0000 0111

135 → 7

Note that the new value is 7, which means changing this bit has a really big impact on the value of the pixel, so let's try the other option.

Changing the bit on the right, we have

1000 0111 → 1000 0110

135 → 134

Note that the pixel value only changed by one, so if we want to modify pixel values to store arbitrary data, we should modify the bit on the right.

The bit on the right is called the least significant bit (LSB) and the bit on the left the most significant bit (MSB). The names are appropriate: changing the LSB changes the value by at most 1, while changing the MSB changes it by 128, and the impact that a bit has on the value of a pixel doubles with each position moving from the LSB to the MSB.
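The experiment can be reproduced in a few lines. This is only a sketch: set_bit is a hypothetical helper written for this example, with position 0 meaning the LSB and position 7 the MSB.

```python
def set_bit(value, position, bit):
    """Return value with the bit at `position` (0 = LSB) replaced by `bit`."""
    cleared = value & ~(1 << position)  # force the chosen bit to 0
    return cleared | (bit << position)  # then write the new bit there

pixel = 135                         # 1000 0111
msb_changed = set_bit(pixel, 7, 0)  # 0000 0111
lsb_changed = set_bit(pixel, 0, 0)  # 1000 0110
print(msb_changed, lsb_changed)

# How far the value moves when each single bit is flipped
for position in range(8):
    flipped = pixel ^ (1 << position)
    print("bit", position, "changes the value by", abs(pixel - flipped))
```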

Overview

We know how to save 1 bit of data in a pixel without changing its color too much. To save multiple bytes of data we need multiple pixels: 8 pixels for each byte, as we are saving 1 bit of data per pixel.

We now have a general procedure. Consider dbits as a list containing the bits of the data we want to save in the image, and pixels as a simple 1-dimensional list containing the pixels in the image, where list[i] is the element at position i of a list named list.

What we want to do is set the least significant bit of pixels[i] to the value of dbits[i] for each i from 0 up to (but not including) the length of dbits.
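As a rough sketch of this procedure on plain Python lists (no image involved yet), assuming pixels holds integers from 0 to 255 and dbits holds only 0s and 1s. The function name embed_bits and the sample values are invented for illustration.

```python
def embed_bits(pixels, dbits):
    """Return a copy of pixels whose first len(dbits) LSBs carry dbits."""
    if len(dbits) > len(pixels):
        raise ValueError("not enough pixels to hold the data")
    result = list(pixels)
    for i, bit in enumerate(dbits):
        result[i] = (result[i] & ~1) | bit  # clear the LSB, then set it
    return result

pixels = [135, 200, 17, 64, 255, 0, 90, 33]
dbits = [0, 1, 1, 0]
stego = embed_bits(pixels, dbits)
print(stego)
```

Only the first four values can change, and each one moves by at most 1.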

Now an example: consider we want to save the text "hi" inside a 32x32 image. First, we find the representation of the text in numbers. Using this tool we know that the letter h is represented with a value of 104 and the letter i with a value of 105. We will need 16 pixels, as each character in the text is 1 byte (8 bits) long. We then take the decimal representation of each character and find its binary representation: 104 → 0110 1000 and 105 → 0110 1001.
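The same numbers can be obtained directly in Python instead of using an external tool. This sketch assumes the text is plain ASCII and that each pixel stores one bit.

```python
text = "hi"
data = text.encode("ascii")  # bytes object: one byte per character

for byte in data:
    # character, decimal value, 8-bit binary representation
    print(chr(byte), byte, format(byte, "08b"))

bits_needed = len(data) * 8  # one pixel per bit
print("pixels needed:", bits_needed)
```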

Now we start at the top left corner of the image, because generally that's the first pixel (pixel at coordinates (x,y) = (0,0), where x is the horizontal position of the pixel, and y is the vertical position)

We set the LSB of that pixel to the first bit in our data, and then we move to the next pixel (pixel at coordinates (1,0)), and set the LSB of that pixel to the second bit in our data, and we repeat the process again until all the data we want to write to the image has been written.

Note that after the pixel at coordinates (0,0) we move to the pixel at coordinates (1,0), meaning the pixel on its right. This choice is arbitrary, and we could move to the pixel at coordinates (0,1) instead. What matters is that the data is extracted in the same order in which it was written.

Retrieving the data is a simpler process: go through each pixel in that order and extract the last bit, then group the bits in sets of 8 to form the bytes, and we are done.
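A minimal sketch of the retrieval on a plain list of pixel values. It assumes the first bit read is the least significant bit of each byte (the convention the functions later in this instructable use), and extract_bytes plus the sample pixel values are made up for this example.

```python
def extract_bytes(pixels, byte_count):
    """Rebuild byte_count bytes from the LSBs of the first pixels."""
    if len(pixels) < byte_count * 8:
        raise ValueError("not enough pixels for the requested bytes")
    bits = [p & 1 for p in pixels[:byte_count * 8]]
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for b in range(8):
            value |= bits[i + b] << b  # bit b of the byte (LSB first)
        out.append(value)
    return bytes(out)

# 16 pixel values whose LSBs spell "hi" (values near 100 chosen arbitrarily)
pixels = [100, 100, 100, 101, 100, 101, 101, 100,
          101, 100, 100, 101, 100, 101, 101, 100]
print(extract_bytes(pixels, 2))
```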

Also, for color images the process is pretty much the same, except that we can consider the pixel coordinates to be (x,y,c) where c is the color channel
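One way to picture the (x,y,c) coordinates is to map the position of a bit in the data to the place where it lands. The sketch below assumes the traversal order used by the functions later in this instructable (channel first, then x, then y). The function name bit_index_to_coords is hypothetical.

```python
def bit_index_to_coords(index, width, channels):
    """Map the position of a data bit to (x, y, c) in the image."""
    c = index % channels             # channel changes fastest
    x = (index // channels) % width  # then the column
    y = index // (channels * width)  # and finally the row
    return x, y, c

# In a 4-pixel-wide RGB image, the first row holds bits 0 to 11
for index in (0, 1, 2, 3, 11, 12):
    print(index, bit_index_to_coords(index, width=4, channels=3))
```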

Now let's start with the code

Starting With the Code

First, we need an image and some data. The image we will use is the one shown above, and the data is a text file. With the code below, in Google Colab, we download the image and save it as image.png, and we download the text file and save it as data.txt.

!wget -O image.png "http://r0k.us/graphics/kodak/kodak/kodim07.png"
!wget -O data.txt "https://pastebin.com/raw/fvuinc16"

After that, we import the necessary libraries and define variables with the names of each file

from PIL import Image

imgName = "image.png"
imgSecretName = "imgWithSecret.png"

dataSecretName = "data.txt"
dataRecoverName = "dataOut.txt"

imgSecretName holds the name of the image that we will create, it will have the data from data.txt hidden inside it

dataRecoverName holds the name of the data we extract from imgSecretName, it should be the same as data.txt

Putting Data Inside the Image

Getting bits from bytes

We are going to first define a function that takes a list of bytes and returns a list of bits. This is probably not the best approach in terms of memory usage, but it makes things easier to work with.

def getDataBits(data):
  bits = []
  for byte in data:
    for b in range(0,8):
      bits.append((byte >> b) & 0b1)
  
  return bits

In the function, we first define the empty list bits. What we want to do is iterate through data (the list of bytes) and append every bit to bits.

We first iterate through every byte in data, and then we iterate through every bit in byte, starting with the least significant bit.

To get a specific bit from a byte, we use some bitwise operations, more specifically right shift ( >> ) and bitwise and ( & ).

Right shift works by literally shifting the places of the bits to the right by a certain amount.

For example:

We have the byte x = 0100 0000

The operation x >> 4 results in the number 0000 0100

Notice that the four leftmost bits of the result are filled in on the left (they didn't exist before the operation), and they are always 0.

Bitwise and performs an and operation between two numbers bit by bit, and the way we are using this operation in this function is often called bit masking.

Our approach to get byte[i] (bit number i of the number byte) is to first move the bit we want from position i to position 0 by doing a right shift by that position (byte >> i). Then we read bit number 0 of that result by making every other bit 0 (x & 1). Since x & 1 is the same as x & 0000 0001, every bit of the result is going to be 0 except possibly the one at position 0, and we know that for any bit y, the operation y & 1 results in y.
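To see the shift and the mask working on their own, here is a small sketch. The helper get_bit is made up for this example and mirrors the expression used inside getDataBits.

```python
def get_bit(byte, i):
    """Return bit number i of byte (0 = least significant bit)."""
    return (byte >> i) & 1

x = 0b01000000
print(format(x >> 4, "08b"))  # the single 1 moves four places to the right

# All 8 bits of 104 (the letter h), least significant bit first
h_bits = [get_bit(104, i) for i in range(8)]
print(h_bits)
```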

After we get the bit we want, we use the method append() to add it to the list

Note that even though we call it a list of bits, if you check the type of an item in the list it will show it's a list of integers (int), however, we call it a list of bits simply because we know the values in the list can only be 0 or 1

Writing bits to the image

We define a function that takes an image (an image object from the Pillow library) and a list of bytes. This function modifies the image by hiding the data inside it.

def putDataInsideImage(im, data):
  dataBits = getDataBits(data)

  maxX = im.size[0] # width of the image
  maxY = im.size[1] # height of the image
  maxC = len(im.getpixel((0,0))) # number of channels in the image

  x = 0 # x coordinate of the pixel
  y = 0 # y coordinate of the pixel
  c = 0 # color channel of the pixel

  # knowing the order in which we save the bits is important when recovering the data from the image
  for bit in dataBits:
    color = list(im.getpixel((x,y)))
    color[c] = (color[c] & (~0b1)) | bit # clear the LSB, then set it to the data bit
    im.putpixel((x,y), tuple(color))

    c = c + 1 # we first iterate through the color channel
    if c >= maxC:
      c = 0
      x = x + 1 # then through the x coordinate
      if x >= maxX:
        x = 0
        y = y + 1 # and finally through the y coordinate
        if y >= maxY:
          print("Not enough pixels!")
          return

The first thing we do is get the list of bits from the list of bytes by using the function we defined above, getDataBits.

Then we define variables that hold the width of the image, the height of the image and the number of channels of the image. We have to know them so we can iterate through all the pixels of the image while avoiding accessing a pixel at a coordinate that doesn't exist.

We then define counters for each coordinate (x is the horizontal coordinate, y is the vertical coordinate and c is the color channel).

Now we iterate through every bit in the list of bits.

What we do is get the current pixel we are at based on the counters (x,y) using the method getpixel((coords)), and then, again using bitwise operations, we change the least significant bit of the value of the pixel in channel c to the bit we want to save. The bitwise or ( | ) works similarly to the bitwise and, except it performs the or operation bit by bit, and the bitwise not ( ~ ) flips every bit in the number (changes every 0 to a 1, and every 1 to a 0). Our approach is first to set the LSB to 0 by using bit masking (for an 8-bit value ~1 is equal to 1111 1110, so the operation x & ~1 sets the LSB of x to 0), and then to do a bitwise or to set the LSB to the value of the bit we want to save.
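The clear-then-set step can be tried on plain integers. This is a sketch with a made-up helper name, set_lsb. One Python detail worth knowing: integers are not limited to 8 bits, so ~1 evaluates to -2 instead of 254, but using it as a mask still clears only the lowest bit of a non-negative value.

```python
def set_lsb(value, bit):
    """Clear the least significant bit of value, then set it to bit."""
    return (value & ~1) | bit

print(~1)                        # Python integers are unbounded
print(format(~1 & 0xFF, "08b"))  # the same mask seen as 8 bits

for value in (134, 135):
    for bit in (0, 1):
        print(value, bit, "->", set_lsb(value, bit))
```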

We update the pixel values in the image with the method putpixel((coords), (color))

Then we increment the counters and check if they are over the maximum value to reset them. Notice that we first move through each color channel, then through every horizontal coordinate, and then through every vertical coordinate. We need this information when we extract the data.

Note: if the image doesn't have enough pixels to hold all the data, we exit the function and print a message to indicate that
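It can be handy to know ahead of time whether the data will fit. A rough sketch, assuming one bit is stored in every channel value. The function names and the 768x512 RGB dimensions are only an illustration and not part of the original code.

```python
def capacity_in_bytes(width, height, channels):
    """Number of whole bytes that fit at 1 bit per channel value."""
    return (width * height * channels) // 8

def fits(data_length, width, height, channels):
    """True if data_length bytes can be hidden in the image."""
    return data_length <= capacity_in_bytes(width, height, channels)

# Hypothetical 768x512 RGB image
print(capacity_in_bytes(768, 512, 3))
print(fits(10_000, 768, 512, 3))
print(fits(200_000, 768, 512, 3))
```

With Pillow, the width and height would come from im.size and the number of channels from the length of a pixel tuple, as in putDataInsideImage.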

Now, executing the next bit of code will open the image and show it in Google Colab

im = Image.open(imgName)
im

and executing the next bit will open the data file and read the contents (the bytes) and save it into a variable

with open(dataSecretName, "rb") as dataFile: # Note that we open in binary mode
  secretData = dataFile.read()

Now we can execute the function and save the image

putDataInsideImage(im, secretData)
im.save(imgSecretName)
im

The first image in this step shows the image with the data added to the LSB. In contrast, the second image shows the image with the data added to the MSB instead. Notice how the top of the image changes a lot (also notice where the data ends).

Recovering Data From Image

Getting bits from image

We first define a function that extracts the LSB of every value in the image

(notice that we don't know how long the data is, so we don't know how many bits to read from the image to get the hidden message, we will solve this problem later)

def getBitsFromImage(im):
  bits = []

  maxC = len(im.getpixel((0,0)))
  for y in range(0, im.size[1]):
    for x in range(0, im.size[0]):
      for c in range(0, maxC):
        bits.append(im.getpixel((x,y))[c] & 0b1) # index the pixel tuple with [c]

  return bits

The function takes an image as an input, and outputs the list of bits corresponding to the LSB of every pixel value

We first define an empty list of bits, where we are going to append every bit.

Notice the order of the loops: we first go through every value of c for a certain value of x and y, then go through every value of x for a value of y, and then we change y. This order is the same one as in the function used to write the data inside the image.

We use the method getpixel((coord)) and get the value of the channel c, then we use bit masking to get the LSB and append the value of the bit to the list of bits.

Note that this function will return every LSB in the image, but we saw in the last step that only a small part of the image contains the data, so we will get a lot of data we don't need after our hidden data. We will therefore define this function a little differently, adding an argument with the maximum number of bits we want from the image in case we know the length of the data.

def getBitsFromImage(im, maxBits = 0):
  bits = []

  maxC = len(im.getpixel((0,0)))
  for y in range(0, im.size[1]):
    for x in range(0, im.size[0]):
      for c in range(0, maxC):
        bits.append(im.getpixel((x,y))[c] & 0b1)
        # compare numbers with !=, "is not" checks identity and raises a SyntaxWarning here
        if maxBits != 0 and len(bits) >= maxBits:
          return bits

  return bits

If maxBits is 0 we get all the bits; otherwise maxBits determines the number of bits to read.

Getting bytes from bits

We define a function that takes a list of bits and returns a list of bytes

def bitsToBytes(dataBits):
  dataBytes = []
  for i in range(0, len(dataBits), 8):
    if i+7 >= len(dataBits):
      break
    
    currByte = 0
    for b in range(0, 8):
      currByte = currByte | (dataBits[i + b] << b)

    dataBytes.append(currByte)

  return(bytes(dataBytes))

We define the empty list dataBytes where we are going to append every byte.

We iterate through the list in steps of 8 because we take 8 elements every loop to make a byte. Before building each byte we check that 8 full bits remain, and break out of the loop if they don't.

Then we define the variable currByte, which is the current byte we are calculating, and use bitwise operations to update every bit in currByte. Notice the left shift ( << ) operation, which works similarly to the right shift: we move the current bit to its position and set it in the currByte variable.

After currByte is ready, we append it to the list.

We do return(bytes(dataBytes)) because dataBytes is a list, and we want to return an object of type bytes so we can save it directly to a file.
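A quick sanity check is that converting bytes to bits and back returns the original data. The sketch below uses compact stand-ins (get_data_bits and bits_to_bytes, names chosen for this example) that are meant to behave like getDataBits and bitsToBytes, so it runs on its own.

```python
def get_data_bits(data):
    """Bits of every byte, least significant bit first."""
    return [(byte >> b) & 1 for byte in data for b in range(8)]

def bits_to_bytes(bits):
    """Group bits in sets of 8 (LSB first); leftover bits are dropped."""
    out = []
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for b in range(8):
            value |= bits[i + b] << b
        out.append(value)
    return bytes(out)

original = b"hi"
bits = get_data_bits(original)
print(bits)
print(bits_to_bytes(bits))

# Three extra bits at the end do not form a full byte and are ignored
print(bits_to_bytes(bits + [1, 0, 1]))
```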

Getting data from image

We define a function for convenience that simply uses the two functions defined above

def getDataFromImage(im):
  dataBits = getBitsFromImage(im)
  return bitsToBytes(dataBits)

We first get every bit from the image, and then get the list of bytes

Now we open the image, read the secret data and save it in a file

imO = Image.open(imgSecretName)
with open(dataRecoverName, "wb") as dataFile: # Note that we open in binary mode
  dataFile.write(getDataFromImage(imO))

We open the file dataOut.txt and notice that the start of the file is correct, but the actual data is followed by some random characters (as shown in the image for this step). We fix that in the next step.

Improvement

Writing the data

To solve the problem we have of not knowing the size of the data when extracting it, we do a very simple thing.

When saving the data, we save the size of the data on the first bits of the image, and after those bits we save the actual data

so we define the function

import struct
def putDataInsideImageWithSize(im, data):
  data = list(data)
  # insert 4 bytes at the start of the data that specify the amount of data in number of bytes
  # the pack function takes the number and returns the bytes representing the number
  data[0:0] = struct.pack("=L", len(data))
  putDataInsideImage(im, bytes(data))

This first prepends 4 bytes containing the length to the data, and then calls putDataInsideImage.

We use struct.pack() to get the 4 bytes that represent the amount of data.
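To see what struct.pack() produces, here is a small sketch. The "=L" format means an unsigned 4-byte integer in the byte order of the machine running the code. The "<L" variant shown at the end is not used in this instructable; it is mentioned only as an option for pinning the byte order to little-endian if the image might be decoded on a different kind of machine.

```python
import struct

size_header = struct.pack("=L", 1234)       # 4 bytes, native byte order
print(len(size_header))
print(struct.unpack("=L", size_header)[0])  # back to the original number

# "<L" always uses little-endian, whatever the machine
print(struct.pack("<L", 1234))
```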

the usage is the same as the other function

im = Image.open(imgName)
with open(dataSecretName, "rb") as dataFile: # Note that we open in binary mode
  secretData = dataFile.read()
putDataInsideImageWithSize(im, secretData)
im.save(imgSecretName)

Reading the data

Now we need to define a function that first reads the size of the data from the image, and then reads the actual data, so we define the function

def getDataFromImageWithSize(im):
  dataSizeBits = getBitsFromImage(im, 4*8)
  dataSize = struct.unpack("=L", bitsToBytes(dataSizeBits))[0]
  dataBits = getBitsFromImage(im, 4*8 + dataSize*8) 
  return bitsToBytes(dataBits)[4:]

What we do is first read 4*8 bits from the image (so 4 bytes), then use struct.unpack() to get the number those 4 bytes represent. Now, knowing the data size, we read from the image again, this time 4*8 + dataSize*8 bits, meaning the 4*8 bits of the data size plus the amount of actual data in bits.

We return only from the 5th byte to the last, to ignore the 4 bytes that represent the data size.
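The size-prefix idea can be tested on plain bytes, without an image. In this sketch the trailing bytes stand in for the unrelated LSBs that follow the hidden data. The names add_size_header and strip_size_header are invented for this example.

```python
import struct

def add_size_header(data):
    """Prepend the length of data as a 4-byte unsigned integer."""
    return struct.pack("=L", len(data)) + data

def strip_size_header(blob):
    """Read the 4-byte length, then return exactly that many bytes."""
    (size,) = struct.unpack("=L", blob[:4])
    return blob[4:4 + size]

# Payload followed by bytes that are not part of the message
blob = add_size_header(b"hi") + b"\x9f\x03junk"
print(strip_size_header(blob))
```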

The usage is the same as the other function

imO = Image.open(imgSecretName)
with open(dataRecoverName, "wb") as dataFile: # Note that we open in binary mode
  dataFile.write(getDataFromImageWithSize(imO))

Notice that when we open dataOut.txt, the random data is not there, and the files data.txt and dataOut.txt are identical

The End

So we have implemented a way to put data inside images

im = Image.open(imgName)
with open(dataSecretName, "rb") as dataFile: # Note that we open in binary mode
  secretData = dataFile.read()
putDataInsideImageWithSize(im, secretData)
im.save(imgSecretName)

and to read the data from those images

imO = Image.open(imgSecretName)
with open(dataRecoverName, "wb") as dataFile: # Note that we open in binary mode
  dataFile.write(getDataFromImageWithSize(imO))

Notice that the data file is read and written in binary mode, so we can put any type of file inside the image as long as the image is big enough to hold all the data

Also notice that we can save two bits per pixel value instead of one if we want to save more data, as the second bit also causes very minimal changes in the value of the pixel. You can check the Google Colab notebook for information on implementing the version with 2 bits per pixel.
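As a rough idea of what the 2-bit variant involves (this is not the notebook's code, only a sketch with made-up helper names), the mask covers the two lowest bits instead of one, so a value can move by at most 3.

```python
def set_two_lsbs(value, two_bits):
    """Replace the two lowest bits of value with two_bits (0 to 3)."""
    return (value & ~0b11) | two_bits

def get_two_lsbs(value):
    """Read back the two lowest bits of value."""
    return value & 0b11

pixel = 135                        # 1000 0111
stego = set_two_lsbs(pixel, 0b00)  # 1000 0100
print(stego, abs(pixel - stego))
print(get_two_lsbs(set_two_lsbs(pixel, 0b10)))
```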

Now some considerations. Even though we successfully hid the data and retrieved it, we didn't send the image through a messaging service or upload it to a website to see the effects a real-world usage scenario might have on the hidden data. The truth is that if the image is compressed in a lossy way, there's a high chance that the hidden data is lost, but it depends on the service used. For example, by uploading the image (with the hidden data) to imgur, downloading it in Colab and extracting the data, we get a dataOut.txt that is the same as data.txt. However, by sending it through WhatsApp and downloading it, we get an image with the extension jfif (this format uses lossy compression), and trying to extract the data results in a bunch of random characters.