How we can store digital data in DNA | Dina Zielinski


I could fit all movies ever made
inside of this tube. If you can’t see it,
that’s kind of the point. (Laughter) Before we understand how this is possible, it’s important to understand
the value of this feat. All of our thoughts
and actions these days, through photos and videos — even our fitness activities — are stored as digital data. Aside from running out of space on our phones, we rarely think about
our digital footprint. But humanity has collectively
generated more data in the last few years than all of preceding human history. Big data has become a big problem. Digital storage is really expensive, and none of these devices that we have
really stand the test of time. There’s this nonprofit website
called the Internet Archive. In addition to free books and movies, you can access web pages
as far back as 1996. Now, this is very tempting, but I decided to go back and look at
the TED website’s very humble beginnings. As you can see, it’s changed
quite a bit in the last 30 years. So this led me to the first-ever TED, back in 1984, and it just so happened
to be a Sony executive explaining how a compact disc works. (Laughter) Now, it’s really incredible
to be able to go back in time and access this moment. It’s also really fascinating
that after 30 years, after that first TED, we’re still talking about digital storage. Now, if we look back another 30 years, IBM released the first-ever hard drive back in 1956. Here it is being loaded for shipping
in front of a small audience. It held the equivalent of one MP3 song and weighed over one ton. At 10,000 dollars a megabyte, I don’t think anyone in this room
would be interested in buying this thing, except maybe as a collector’s item. But it’s the best we could do at the time. We’ve come such a long way
in data storage. Devices have evolved dramatically. But all media eventually wear out
or become obsolete. If someone handed you a floppy drive today
to back up your presentation, you’d probably look at them
kind of strange, maybe laugh, but you’d have no way
to use the damn thing. These devices can no longer meet
our storage needs, although some of them can be repurposed. All technology eventually dies or is lost, along with our data, all of our memories. There’s this illusion that
the storage problem has been solved, but really, we all just externalize it. We don’t worry about storing
our emails and our photos. They’re just in the cloud. But behind the scenes,
storage is problematic. After all, the cloud is just
a lot of hard drives. Now, most digital data,
we could argue, is not really critical. Surely, we could just delete it. But how can we really know
what’s important today? We’ve learned so much about human history from drawings and writings in caves, from stone tablets. We’ve deciphered languages
from the Rosetta Stone. You know, we’ll never really have
the whole story, though. Our data is our story, even more so today. We won’t have our record
recorded on stone tablets. But we don’t have to choose
what is important now. There’s a way to store it all. It turns out that there’s
a solution that’s been around for a few billion years, and it’s actually in this tube. DNA is nature’s oldest storage device. After all, it contains
all the information necessary to build and maintain a human being. But what makes DNA so great? Well, let’s take our own genome as an example. If we were to print out
all three billion A’s, T’s, C’s and G’s on a standard font, standard format, and then we were
to stack all of those papers, it would be about 130 meters high, somewhere between the Statue of Liberty
and the Washington Monument. Now, if we converted
all those A’s, T’s, C’s and G’s to digital data, to zeroes and ones, it would total a few gigs. And that’s in each cell of our body. We have more than 30 trillion cells. You get the idea: DNA can store a ton of information
in a minuscule space. DNA is also very durable, and it doesn’t even require
electricity to store it. We know this because scientists
have recovered DNA from ancient humans that lived hundreds
of thousands of years ago. One of those is Ötzi the Iceman. Turns out, he’s Austrian. (Laughter) He was found high, well-preserved, in the mountains
between Italy and Austria, and it turns out that he has living
genetic relatives here in Austria today. So one of you could be a cousin of Ötzi. (Laughter) The point is that we have a better chance
of recovering information from an ancient human than we do from an old phone. It’s also much less likely
that we’ll lose the ability to read DNA than any single man-made device. Every single new storage format
requires a new way to read it. We’ll always be able to read DNA. If we can no longer sequence,
we have bigger problems than worrying about data storage. Storing data on DNA is not new. Nature’s been doing it
for several billion years. In fact, every living thing
is a DNA storage device. But how do we store data on DNA? This is Photo 51. It’s the first-ever photo of DNA, taken about 60 years ago. This is around the time that
that same hard drive was released by IBM. So really, our understanding of digital
storage and of DNA have coevolved. We first learned to sequence, or read DNA, and very soon after, how to write it, or synthesize it. This is much like how we learn
a new language. And now we have the ability
to read, write and copy DNA. We do it in the lab all the time. So anything, really anything,
that can be stored as zeroes and ones can be stored in DNA. To store something digitally,
like this photo, we convert it to bits, or binary digits. Each pixel in a black-and-white photo
is simply a zero or a one. And we can write DNA much like an inkjet
printer can print letters on a page. We just have to convert our data,
all of those zeroes and ones, to A’s, T’s, C’s and G’s, and then we send this
to a synthesis company. So we write it, we can store it, and when we want to recover our data,
we just sequence it. Now, the fun part of all of this
is deciding what files to include. We’re serious scientists,
so we had to include a manuscript for good posterity. We also included a $50 Amazon gift card — don’t get too excited, it’s already
been spent, someone decoded it — as well as an operating system, one of the first movies ever made and a Pioneer plaque. Some of you might have seen this. It has a depiction of a typical —
apparently — male and female, and our approximate location
in the Solar System, in case the Pioneer spacecraft
ever encounters extraterrestrials. So once we decided what sort of files
we want to encode, we package up the data, convert those zeroes and ones
to A’s, T’s, C’s and G’s, and then we just send this file off
to a synthesis company. And this is what we got back. Our files were in this tube. All we had to do was sequence it. This all sounds pretty straightforward, but the difference between
a really cool, fun idea and something we can actually use is overcoming these practical challenges. Now, while DNA is more robust
than any man-made device, it’s not perfect. It does have some weaknesses. We recover our message
by sequencing the DNA, and every time data is retrieved, we lose the DNA. That’s just part
of the sequencing process. We don’t want to run out of data, but luckily, there’s a way to copy the DNA that’s even cheaper and easier
than synthesizing it. We actually tested a way to make
200 trillion copies of our files, and we recovered
all the data without error. So sequencing also introduces
errors into our DNA, into the A’s, T’s, C’s and G’s. Nature has a way
to deal with this in our cells. But our data is stored
in synthetic DNA in a tube, so we had to find our own way
to overcome this problem. We decided to use an algorithm
that was used to stream videos. When you’re streaming a video, you’re essentially trying to recover
the original video, the original file. When we’re trying to recover
our original files, we’re simply sequencing. But really, both of these processes are
about recovering enough zeroes and ones to put our data back together. And so, because of our coding strategy, we were able to package up all of our data in a way that allowed us to make
millions and trillions of copies and still always recover
all of our files back. This is the movie we encoded. It’s one of the first movies ever made, and now the first to be copied
more than 200 trillion times on DNA. Soon after our work was published, we participated in an “Ask Me Anything”
on the website reddit. If you’re a fellow nerd,
you’re very familiar with this website. Most questions were thoughtful. Some were comical. For example, one user wanted to know
when we would have a literal thumb drive. Now, the thing is, our DNA already stores everything
needed to make us who we are. It’s a lot safer to store data in synthetic DNA in a tube. Writing and reading data from DNA
is obviously a lot more time-consuming than just saving all your files
on a hard drive — for now. So initially, we should focus
on long-term storage. Most data are ephemeral. It’s really hard to grasp
what’s important today, or what will be important
for future generations. But the point is,
we don’t have to decide today. There’s this great program by UNESCO
called the “Memory of the World” program. It’s been created to preserve
historical materials that are considered of value
to all of humanity. Items are nominated
to be added to the collection, including that film that we encoded. While a wonderful way
to preserve human heritage, it doesn’t have to be a choice. Instead of asking
the current generation — us — what might be important in the future, we could store everything in DNA. Storage is not just about how many bytes but how well we can actually
store the data and recover it. There’s always been this tension
between how much data we can generate and how much we can recover and how much we can store. Every advance in writing data
has required a new way to read it. We can no longer read old media. How many of you even have
a disk drive in your laptop, never mind a floppy drive? This will never be the case with DNA. As long as we’re around, DNA is around, and we’ll find a way to sequence it. Archiving the world around us
is part of human nature. This is the progress we’ve made
in digital storage in 60 years, at a time when we were only
beginning to understand DNA. Yet, we’ve made similar progress
in half that time with DNA sequencers, and as long as we’re around,
DNA will never be obsolete. Thank you. (Applause)
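
The genome figures quoted in the talk can be checked with some quick arithmetic. This is only a rough sketch: it assumes each of the four letters takes two bits, which the talk does not state, and it counts one three-billion-letter copy per cell. The constant names are made up for illustration.

```python
# Figures quoted in the talk
BASES_IN_GENOME = 3_000_000_000        # "three billion A's, T's, C's and G's"
CELLS_IN_BODY = 30_000_000_000_000     # "more than 30 trillion cells"

# Assumption (not from the talk): four possible letters fit in 2 bits each
BITS_PER_BASE = 2

genome_bits = BASES_IN_GENOME * BITS_PER_BASE
genome_bytes = genome_bits // 8
body_bytes = genome_bytes * CELLS_IN_BODY

print(genome_bits / 1e9, "gigabits per genome copy")      # 6.0
print(genome_bytes / 1e6, "megabytes per genome copy")    # 750.0
print(body_bytes / 1e21, "zettabytes across all cells")   # 22.5
```

Under these assumptions, "a few gigs" fits best if read as gigabits. The whole-body total counts the same genome over and over, so it measures raw capacity rather than unique information.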

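The step of converting zeroes and ones to A's, T's, C's and G's can be illustrated with a minimal sketch. The two-bits-per-letter table below is a hypothetical choice made for this example, as are the function names; the talk does not say which mapping was actually used.

```python
# Hypothetical mapping: every pair of bits becomes one DNA letter.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(): letters back to bits, then bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"TED"
strand = encode(message)
print(strand)                      # CCCACACCCACA
assert decode(strand) == message   # round trip recovers the original bytes
```

A direct mapping like this has no protection against the sequencing errors mentioned in the talk, which is why the speaker describes borrowing a coding strategy from video streaming. This sketch does not attempt that part.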
100 thoughts on “How we can store digital data in DNA | Dina Zielinski”

  1. I'm wondering what data we have that's worth the risk of what she is presenting. I'm guessing it's these ideas that are driving up demand for infant human bodies and body parts. This is not good.

  2. They are pushing this crap for us to be chipped like dogs and love it. Just wait, they'll sell it to you as the biggest convenience ever.

  3. Groundbreaking discovery.. if only she'd spend more time in explaining in detail how sequencing results in digital data

  4. What’s more worrisome than a greedy society learning they can use you for memory storage and power supply? Down to your dna…

  5. I'm pretty sure if you leave the word 'monkey' on a bit of dna, after a while when you come back to it it will say 'human'.

  6. I wanted to say that i want a Brainwave connected data input device, but then i realized if that is possible, rewrite and erase to our brain cells are possible too… scary

  7. Imagine decoding human or whatever creatures dna and finding actual data like pictures in it hinting towards other computerised civilisations existed thousands of years ago…

  8. This technology is amazing for creating incremental periodic backups! Today, tapes are used, but we could also just produce one long DNA string of information, clone it in little bacteria living in tiny containers and, to read it, just open one container and take a bacterium out…

  9. This story is 2 years old and NOW, you're posting it?!
    I'm unsubing from this channel for that 1 reason.
    You're ancient in technology.
    I'm looking for a cutting edge channel.
    I thought you touted yourself as that.
    However, I guess I'm wrong about your channel.

  10. Has Google approached you yet?? To build a mega massive DNA data farm?? And why does Google not understand the importance of privacy.. why do they have to collect all the data in manipulative ways?? More hi-tech stuff gone into their hands, I guess.. just because they can print money..

  11. digital space it gives you power, administrative power, and it IS the new real estate. i think there is muy mucho mulah in that idea/ and i think also we should try to retrieve the cloud data from the past civilizations we just discovered. They had no writing, being as advanced as they were they didn't need it

  12. The encoding will still be forgotten, which probably defeats the purpose. It's like writing on stone tablets which last millennia, but no one can read it.

  13. @2:50, regarding the floppies, that is exactly how numerous government agencies store important codes and data because they are too scared to update the systems for fear of them failing and launching all the nukes.

  14. Really interesting topic, a bit of fluff, but otherwise well presented. I wonder if the goal was really to back up our peak DNA and restore it later as it starts to naturally break down or mutate. No more getting old…

  15. If God didn't want us to mess around with this stuff He wouldn't have given us a brain! But, how do we apply morality to it?

  16. My God is the greatest data scientist. 30 trillion cells all carrying billions of code. Jesus is the boss.

  17. The encoding/compression algorithms ARE prone to be forgotten. You can have an old modem that still works, but unable to be used because the logical protocols have changed. Same problem here.

  18. I came to learn HOW they actually print DNA, not just "Oh, we just write it, and they make it, and we sequence it. MAGIC!"

    And how the heck do you replicate it 200 trillion times without a single error? Bacteria could make the copies for you real quick, but then you just end up with tonnes of errors. Cloning? How?

    Also, I know it's not her field, but Pioneer plaques?? They were on Voyager.

    I'm not mad, Dina. I'm just disappointed.

  19. To change or store data in DNA is not possible unless we know how to splice a gene in DNA.
    Is gene splicing possible?
    Anyway, thank you, Dina.

  20. 6 trillion cells in human body, 3 billion base pair in each cell, every base pair contains 2 bit data. So human body contains 2*6 trillion*3 billion/8/1024/1024/1024/1024 = 4 billion TB data. Is it right?

  21. I'm down. This could be useful for long-distance space travel. To keep morale up on a vessel, people need entertainment. This can store thousands of terabytes of data in a very, very small and lightweight package. A computer that could read the files quickly (maybe lasers or high-resolution photography) would sell like hotcakes.

  22. Still God doesn’t exist? How DNA exist at first place with such incredible engineering and design. Isn’t there someone supreme being behind it ? How it come by itself ?

  23. We could store everything in DNA? Actually, it already is!! By everyones (all organisms included) experiences.

  24. we already have all the info in the DNA. haha……why reinvent the wheel? too much tech = death. lots of nature = life.

    i think females in tech just have failed relationships with men.

  25. How do you read, write and delete data on DNA?
    Is it safe for the human body?
    Is the data secure, or do we have to encode it? And how do we decode the data if it is encoded?

  26. Okey. The few gigabits of human dna can create a brain that can store several petabytes of data and also process it. I don't think direct encoding is the most efficient way of saving the data. We could create algorithms that could further compress data such that when it's decoded, just like growing, it reveals way much more data.

  27. I'd recommend having encoded data in DNA left for later life following us in the event we become extinct along with a carbon sample to accurately date when it was made.
