DNA Data Storage is the Future!

Hi there, my name is Xavier and the fact that
I have blue eyes, blond hair and that I have an average height is all encoded in my DNA. It’s nature’s way of storing data, but
could we also use it to store digital data like documents, photos, music, and videos? And if we could, why would we? What are the pros and cons of DNA data storage
and what problems does it solve? Before answering these questions though, let’s
look at how we’ve been storing data up until today. The vast majority of data is currently stored
on magnetic media like hard drives or tapes and some of it is stored on optical media
like CDs and DVDs. However, our modern storage techniques have
a few flaws: they are not robust, they have a low information density, and each medium requires
a special device to read and write data. Let’s look at each of these problems in
more detail and see how DNA can help. Let’s start with robustness, or rather lack
thereof. Hard drives are pretty unreliable with 1-3%
of drives dying within their first year. In subsequent years that can climb to almost
10%. Tape drives are a bit better and are designed
to last for up to 30 years IF they are stored in a controlled environment. You might wonder why robustness is so important. Well, humanity has generated more data in
the last few years than in all of history. And yet, in a few hundred years from now,
none of it would be left unless we keep copying it to newer drives. That’s a big problem for historical data
which could be invaluable in the future. DNA, on the other hand, is very robust. In 1991, the mummy of Ötzi the Iceman was
found in the Alps. He died over 5000 years ago and yet we were
still able to extract his DNA, read it and find out that he was lactose intolerant and
that he has living relatives in Austria today. If Ötzi had been carrying a hard drive with
him, we likely wouldn’t be able to read it 30 years after his death, let alone after
5000 years. You could say that DNA is nature’s oldest
and most robust storage medium. It cannot be stored forever, but it can easily
last a few centuries. The second problem that we currently face
is information density. This is Facebook’s cold storage data center
in Oregon. It is over 5700 square meters in size (62,000
square feet) and can store approximately 1 exabyte of data. That’s a thousand petabytes or 1 billion
gigabytes. That sounds impressive, but if you want to
store all of the data that was generated in 2018, you would need 33,000 of these massive
data centers. Yikes! How does DNA compare, you ask? Well, theoretically you can store 1 zettabyte
of data in a single gram of DNA. That’s a billion terabytes, the equivalent
of 71 million of the largest-capacity hard drives available today. But that’s theory; currently, we can only
store 215 petabytes per gram of DNA, which means we can replace this entire data
center with just 5 grams of DNA (1 exabyte is 1,000 petabytes, and 1,000 / 215 ≈ 4.7). This insane density allows us to store all
of the world’s data in a very small footprint. It also means we don’t have to choose what
data we want to preserve for future generations. We can just store all of it in DNA. Heck, you could even say that we humans are huge
data capsules. We have over 37 trillion cells in our bodies,
each containing a copy of our entire DNA. And if you know that the human genome is about 750 MB,
then we’re carrying 37 trillion times 750 megabytes, which is a huge amount of information! The final problem I’d like to highlight
is that every advancement in storing data has required a new way to read it. The 3.5-inch floppy disks were very popular
in the 80s and 90s. And yet, barely 40 years later we can hardly
use them because our computers don’t have a reader for them anymore. The same goes for optical media such as CDs
and DVDs. DNA, by contrast, was not a human invention. It’s been around since the beginning of
life and has always had the same properties. DNA readers or “sequencers” built today
can read all DNA, even old DNA like Ötzi’s. It’s also future-proof because DNA is becoming
a key technology in areas such as biology, medicine, and forensics. Alright, you probably agree with me that DNA
is the future of data storage. But how do you actually store data in it? Traditionally, we store data in binary form,
meaning as zeros and ones. In a hard drive, the zeros are represented
by areas which aren’t magnetized and ones by areas that are. DNA is instead made up of four bases:
adenine, thymine, cytosine, and guanine, also referred to as A, T, C, and G. This means that we now have four distinct
values instead of two, so we have to rework our binary files. Instead of storing each 0 or 1 individually,
we store them in pairs, so that each base carries two bits, like this. Once we have that, we can encode the data
into synthetic DNA. This is exactly like the real stuff, the only
difference being that synthetic DNA is not stored inside a living cell. Now that our data is stored inside DNA, we
can read it by using a DNA sequencer. This deconstructs the DNA and reads out all
the As, Ts, Cs, and Gs. This process, however, is not perfect and
read errors might occur. The sequencer might not be able to read a
certain piece of DNA because it’s damaged or it might say it read an A instead of a
T. And when that happens, how can we recover the data? In our own bodies, there are proteins active
that find damaged or “mutated” DNA and try to repair it. If the damage is too big, they can even kill
the entire cell to prevent spreading the bad DNA further. But that’s in organic DNA; in synthetic
DNA we don’t have these proteins. So instead we can add error-correcting codes
to our data. The DNA Fountain method, for instance, uses
fountain codes to correct errors, which are also used for broadcasting TV signals. How does an error-correcting code work? Simple: imagine you want to store three numbers,
5, 9 and 17, and that you want to be able to recover all three of them, even if one can’t
be read. To do that, you store the sum of all three
as well (5 + 9 + 17 = 31). Now when the number 9 can’t be read, you
can recalculate it by subtracting the other two numbers from the sum (31 − 5 − 17 = 9). This only works when you know which number is missing and no more than one is lost. It is a very simple error-correcting code,
but it demonstrates how the concept works. By adding fountain codes to your DNA-encoded
data, you can still read it back
even when some of the As, Ts, Cs or Gs can’t be
read correctly. Okay, so that’s reading DNA, but can we also
copy DNA? Traditionally with hard drives, we have to
read one drive and copy every single bit to another one. DNA, by contrast, can easily be copied millions
of times with a Polymerase Chain Reaction. This technique is used in forensics for instance
to make copies of scarce DNA samples. If you then screw something up, you still
have some copies left. The same technique can be used with synthetic
DNA that contains data. The only downside is that by copying DNA you
add some noise and the quality is reduced. But thanks to our error-correcting codes we
can overcome this issue. Using DNA as a storage medium seems like a
no-brainer, but there are some drawbacks as well, mainly the cost of it all. Creating or synthesizing DNA is an expensive
process, coming in at $3,500 per megabyte. A bit much if you know that a hard drive can
do the same for less than a penny. However, we have to see this in context. The first hard drive, made by IBM in 1956,
could store 5 megabytes at a price of $10,000 per megabyte. We have to start somewhere! Reading or sequencing DNA is a bit more affordable. You can have your own DNA sequenced for less
than $1000. It is, however, time-consuming because the
entire DNA has to be sequenced, even if you’re only interested in a small part of it. But that is changing: Microsoft has demonstrated
a technique that allows random access, reading only selected parts of the DNA. So, time for a conclusion then: DNA data storage
is the future but it will take some time before we can phase out our trusty hard drives. Costs have to come down first before it can
be considered a viable alternative, but as history has shown, that’s only a matter
of time. And when they do, it will allow us to store
incredible amounts of data in a very small space, and we’ll be able to archive data for
generations to come. So what’s your opinion on DNA data storage? Let me know in the comments below! If you liked this video, hit the thumbs up
button and consider subscribing. Thank you very much for watching and I’ll
see you in the next video. Oh and by the way, if you want to learn more
about this topic, you can check out the sources I’ve used to make this video in the description
below. There are a few interesting papers in there
that are worth a read! Enjoy!
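
For readers who want to play with the two ideas from the video, here is a minimal Python sketch. It is not the DNA Fountain method and not any real lab encoding. The mapping 00 → A, 01 → C, 10 → G, 11 → T is an assumption picked for illustration (the video only says that bits are stored in pairs), and the function names `encode`, `decode`, `add_checksum` and `recover` are made up for this example. The second half reproduces the 5, 9, 17 sum trick, which can only rebuild one value and only when we know which one is missing.

```python
# Assumed 2-bit-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def add_checksum(values):
    """Append the sum of all values, as in the 5 + 9 + 17 = 31 example."""
    values = list(values)
    return values + [sum(values)]


def recover(stored):
    """Rebuild the values from a list whose last entry is the sum.

    At most one entry may be None (meaning "could not be read").
    """
    missing = [i for i, value in enumerate(stored) if value is None]
    if len(missing) > 1:
        raise ValueError("a single sum can only recover one missing value")
    *values, total = stored
    if not missing or missing[0] == len(stored) - 1:
        # Nothing lost, or only the sum itself was lost.
        return values
    known = sum(value for value in values if value is not None)
    values[missing[0]] = total - known
    return values


strand = encode(b"Hi")
print(strand)                       # CAGACGGC
print(decode(strand))               # b'Hi'
print(add_checksum([5, 9, 17]))     # [5, 9, 17, 31]
print(recover([5, None, 17, 31]))   # [5, 9, 17]
```

With two bits per base, every byte becomes four bases, so the two bytes of "Hi" turn into eight letters. Real schemes add constraints that this sketch ignores, and fountain codes can cope with many more losses than a single sum can.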

28 thoughts on “DNA Data Storage is the Future!”

  1. At the rate current technology is evolving, it won't be long until we start seeing DNA Data Storage actually being used by big companies. I say +-20 years until we start seeing some versions for home users. Then maybe a few more years until they get really good and we can store the entire Internet.

  2. The main problem you didn't mention besides the price is the writing and reading speed, and even if the price goes down enough it won't be really useful if you have to wait 1 hour to read a file.
    It could however be used for "cold storage" of huge amounts of data in the long term.
    Also, when you mention the high failure rate of hard drives, that's only for data centers where drives are spinning 24/7 until they die; when not used, only a shock could destroy one.

  3. Damn! I thought Artificial Intelligence (AI) coupled with machine learning might today be simultaneously an existential threat to natural intelligence and an opportunity that natural intelligence cannot afford to ignore. But now this presentation suggests that intelligent machines might one day actually store AI in their Artificial DNA. Like Spock would say, "Fascinating!".

  4. The error correction example you gave is completely bogus. Take your 5 7 and 19 example. Say you can only read the 5 and have an error correction of 31. That leaves 26 unaccounted for and two blanks to fill in. That then creates an array of potential values. So how then does one or the sequencer know which value is assigned to each blank? Also keeping in mind that the wrong value will alter the DNA and thus make the error correction of no value.

  5. The problem with the approach you described is someone does not understand how DNA works.

    There are two strands of data, likened to a male and female or positive and negative. The male/positive is the strand that is read and the female/negative is the strand that confirms the data is accurate.

    Say each value in the strand is one of the following: 00, 01, 10, 11 and also holds up to 4 values or 8 digits (bits/bytes).

    If the male/positive is 01, then the female/negative is 10; the exact opposite. Male 01, 01, 11, 00 & Female 10, 10, 00, 11, which together have a total of 8 (11, 11, 11, 11). This yields two distinct strands that are complementary to each other and when totaled will always equal 8. This also gives great error correction, since for any combined lines that have a total value of less than 8 one knows immediately that they have been tampered with or are incorrect, letting you know to perform error correction. Because both strands must be complementary, all error correction is a simple math operation to recover. This is not hard nor very difficult, so long as one knows what they are doing.

    Now on top of the basics, one could easily allow for new/different formats for various purposes, so long as they stick to the principles mentioned above. Change your total value from 8 to say 64 or 128, etc. It makes no difference other than some performance on massive data sets. Similarly one could nest the simple DNA strands within each other to make up massively complex DNA, similarly to nesting arrays within each other.

  6. It's not exactly the same thing but your subject reminded me of this old movie => johnny mnemonic => https://youtu.be/Uwl5MBzTCRQ

  7. It feels weird. God put DNA in living beings and I think it should stay that way. If synthetic DNA would have to be harvested from living beings then I'd be against it.

  8. Our major problem as Mankind is not really to know how we will be able to store incredible amounts of data in the future but how we will be able to overcome all the upcoming problems which are threatening the survival of our species and the only one race upon all civilizations: Human Being…

  9. This blew my mind. The illustrations of the three concepts gave me the thrill of a school kid at a science fair. (That was 60 years ago).
    Great job.

  10. If this drives down the cost of synthesizing arbitrary DNA, I wonder if that could lead to criminals seeding crime scenes with arbitrary DNA from many people who weren’t present, thus hiding their crimes. Or law abiding citizens obscuring their movements for privacy reasons by deploying large amounts of such DNA. Kind of like “chaff” – a countermeasure used by aircraft to confuse radar readings. This would be like biometric chaff. Though cheaper reading of DNA might offset this impact by making it easier to sort through.

    In any case, it’s an interesting subject.

  11. It took the most intelligent mathematicians and engineers and scientists to develop binary and modern coding/computing but DNA is a random product of senseless nature? Not buying it.

  12. Can this process eventually be used to mutate organic DNA data? Perhaps to change someone's DNA sequence which will thereby affect their genetic trait. If this is possible, then…
