Good evening. My name is Steve Martin. I'm a faculty member in Materials Science and Engineering, and tonight it's my great pleasure to welcome you to our second Sigma Xi lecture of the fall semester of the 2010 academic year; we have another two talks next spring. Each fall and spring semester we have a national honors lecture and a local honors lecture, and it's a great privilege for us as Sigma Xi to offer these seminars and to engage the greater research community at Iowa State.

Sigma Xi is a national and international organization focused on research, and it cuts across all disciplines of research: everything from, of course, engineering to biology, but also information science and history. So it's a very, very broad organization and a wonderful organization to be a part of, not the least because you get to come to great lectures like tonight's. So keep aware of the advertisements. Our next seminar will be in February, and the announcements will be out; it will also be here in the Memorial Union, typically at about 8 o'clock. If you're interested in more information about Sigma Xi, there are a few members of the society here. Sigma Xi is open, of course, to researchers of all ages, so please look at Sigma Xi; it's right on the ISU homepage, and you can look into our activities and into becoming a member. For students, of course, there are very low, reduced membership rates.

It's a great pleasure for me to introduce Diane Birt. I'm the president of the society, and Diane is a vice president of our organization, the ISU chapter of Sigma Xi. Professor Birt is a Distinguished Professor in the Department of Food Science and Human Nutrition. She's also the director of the Center for Research on Botanical Dietary Supplements, so she has a great deal of research experience in food, food supplements, and dietary health. And so, with that, Diane will introduce our speaker. Diane, please.

Thank you very much, Steve. It's a great honor for me to introduce Dr. Vasant Honavar.

He has a Bachelor of Engineering in electronics engineering from Bangalore, India, a master's in electrical and computer engineering from Drexel, and a master's and PhD in computer science from the University of Wisconsin-Madison. Here at Iowa State he is a professor of computer science. In 1990 he founded the Artificial Intelligence Research Laboratory at Iowa State, which he currently directs, and he also directs the Center for Computational Intelligence, Learning, and Discovery, which was founded in 2004. He has a very long list of research interests, and I think we will probably not go through the whole list: artificial intelligence, machine learning, knowledge representation, bioinformatics, computational biology, and the list goes on. He has published over 200 research articles in refereed journals, conferences, and books, and has co-edited six books. With that, we'll leave it to Professor Honavar to speak to us about algorithmic thinking in biology.

Most of my work is done with graduate students, so you'll see the names of some of the people that I work with. In today's talk I want to argue that computer science is not just about computers. The first part of the talk will try to convince you of that, and then I'll give a few examples of the consequences of this way of looking at other sciences, in particular biology. Okay, so let's start with something to get us warmed up.

This is something that appeared in a biology journal a few years ago: "Can a biologist fix a radio?" How would you go about fixing a radio? Well, here's one possible way. Open up the radio, find all the different things in it, and classify the things that you see. Then maybe try changing or replacing some component and see what happens: if it's still playing music, then that property of the component, say its color, was probably not critical. I wouldn't dare show this if it weren't for the fact that it was actually written by a biologist; otherwise it would seem a little odd coming from a computer scientist like myself.

If we say that all science is either physics or stamp collecting, then one might say that biology is a bit like physics before Newton. Physics before Newton was a descriptive science. Newton co-invented calculus, which gave us for the first time a way to talk about things like rates of change, and that language is what allowed the transformation of physics to happen.

So biology is at a stage similar to where physics was at that time, because biology has been a descriptive science: you collect, catalog, and describe the biological phenomena that you see. This goes from the early days, when taxonomists went out and cataloged species and their relationships, to more recent times, with the genome sequences that we now have. But unlike physics, biology has so far remained largely a descriptive science. Advances in biology have been limited in part by the instruments of observation, and this is critical for any science: you first have to be able to observe, then describe things, and then try to build predictive models, and biology has had limitations at all of these steps. But this has been changing over the last several decades. Biology used to be limited in this way, but now we have the instrumentation that allows us to gather lots of data. So if you want to transform biology from stamp collecting to physics, from a descriptive science into a predictive science, then you need methods for constructing models from data, inferring their consequences, generating hypotheses that you can test against the data you have, designing experiments, and so on. I would argue that here computation plays a central role.

Let me say a little bit about computation. The story starts with Hilbert presenting the decision problem, the Entscheidungsproblem. Hilbert posed many problems, some of which are still open, but one of them was: is there an effective procedure for deciding whether a given mathematical statement is provable? A very simple question. To make a long story short, Turing invented a little gadget called the Turing machine, and he basically suggested that computation, or an effective procedure, is for all practical purposes what this Turing machine does, and what it does is essentially transform one string of letters into another. Other people tried to come up with alternative formalizations of this, and they were proven to be equivalent, so there's something fundamental about computation. This led to the Church-Turing thesis, which in layman's terms says that anything that can be computed by an effective procedure can be computed by a Turing machine.

If you take this seriously, then our theories and models can be expressed as computations. Cognitive science takes this very seriously and works under the assumption that computation is to cognition what calculus is to physics, and computational biology also takes this seriously: by analogy, computation is to biology what calculus is to physics. That's the main idea. Computational biology rests on the premise that computation provides biology with what calculus provided physics, a means of describing its phenomena, and an implication of this is that biology is about information processing.
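The Turing machine described above is a device that transforms one string of symbols into another by following a finite table of rules. A minimal Python sketch, assuming a single-tape machine with a dictionary transition table; the binary-increment machine and all names are hypothetical illustrations, not examples from the talk.

```python
from typing import Dict, Tuple

BLANK = "_"
# (state, symbol read) -> (symbol to write, head move -1/0/+1, next state)
Transitions = Dict[Tuple[str, str], Tuple[str, int, str]]


def run_turing_machine(tape: str, transitions: Transitions, start: str,
                       halt: str, max_steps: int = 10_000) -> str:
    """Run a single-tape machine from `start` until it enters `halt`."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        write, move, state = transitions[(state, cells.get(head, BLANK))]
        cells[head] = write
        head += move
        steps += 1
    # Read back the non-blank stretch of the tape, left to right.
    used = sorted(pos for pos, sym in cells.items() if sym != BLANK)
    if not used:
        return ""
    return "".join(cells.get(pos, BLANK) for pos in range(used[0], used[-1] + 1))


# Hypothetical machine: binary increment. Scan right to the end of the number,
# then move left turning trailing 1s into 0s until a 0 or a blank absorbs the carry.
INCREMENT: Transitions = {
    ("scan", "0"): ("0", +1, "scan"),
    ("scan", "1"): ("1", +1, "scan"),
    ("scan", BLANK): (BLANK, -1, "carry"),
    ("carry", "1"): ("0", -1, "carry"),
    ("carry", "0"): ("1", 0, "halt"),
    ("carry", BLANK): ("1", 0, "halt"),
}

print(run_turing_machine("1011", INCREMENT, "scan", "halt"))  # 1100
```

The whole "program" lives in the transition table; swapping in a different table gives a different string transformation with the same simulator.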

For example, if you look at the genome, or at organisms in general, you can think of them, at a certain level of abstraction, as beautiful self-reproducing information-processing systems. They acquire information through learning, adaptation, and evolution, and they transmit information, so from this point of view biology is fundamentally an information science. The grand challenge of biology, then, is this: given code that's written in some unknown programming language, determine the syntax and the semantics of that language; in other words, figure out what the program does. Figuring that out is like working out the syntax and semantics of an unknown language essentially by observing some things written in that language.

Just some terminology. When we talk about the genome, we refer to the entire DNA sequence. And some basic biology for the audience: each cell has identical DNA, but if you think of the genome as a program, then cellular differentiation is essentially the response of that program to the signals that control and orchestrate its developmental program. Trying to understand the response of this program, the program of life, to different conditions is a huge problem. The transcriptome refers to the full complement of RNA, which then gets translated into proteins, and the proteome is the full complement of proteins that is produced.

There's also the term interactome, which refers to the full complement of molecular interactions, and the techniques that we use to analyze it are no different from what you would use to analyze a social network or some other type of network. One of the challenges is relating these different levels: for example, how does the program at one level get translated into the next, and how do we link up these different levels of abstraction?

If you take this approach seriously, consider protein folding. A protein is made as a linear sequence of amino acids and goes through a folding process that gives it its shape. We don't know that much about how the shape is determined today, but presumably the information is in the sequence. So we will have a theory of protein folding when we have an algorithm that takes the linear sequence of amino acids and produces the folded structure. The same sort of story can be told about other problems in biology, and a similar line of reasoning has affected other disciplines, but I'm going to skip those because the focus here is on biology.

These models are not necessarily reality, but they describe certain aspects of it; a model is a cartoon. So for any model that we develop, it's not reasonable to ask whether the model is true. This is not about what's true; it's about whether the model is useful. In the case of biology, you can come up with models that describe basic entities like DNA, RNA, genes, proteins, and so on, their properties, and the relationships between them, such as interactions. Depending on exactly what we are interested in, you can come up with different types of models, and these models essentially take the form of programs. So this is the big picture: one could argue that computational models are what give you a grasp on reality.

So, just to wrap up this section of the talk: the main idea is that we understand a phenomenon when we can describe it computationally, so our theories take the form of algorithms. In recent years this has led to the birth of many different disciplines, and computation is common to all of them.

This also has implications: again, if you take this seriously, any literate person has to know something about computing, not merely the use of computers but its ideas and concepts. Okay, so let me give you some examples, starting with something that happened at about the same time as Turing's work. This was about brains and neurons. McCulloch was a physiologist, and Pitts was a mathematician who had heard about computing, and together they wrote a very famous, seminal paper, "A Logical Calculus of the Ideas Immanent in Nervous Activity."

Here is a neuron, as you probably saw it if you took a biology course in high school: a cell body and an axon, with a stream of pulses going down the axon. By the way, the physics of this is exactly the same as the physics of cables. This is obviously a very complicated system, but the cartoon is this: you can model the different inputs to this neuron by some variables, the interactions at the synapses, the contacts here, by some weights, some numbers, and the fact that the neuron is either firing or not, so you can think of it as a switch, a very simple gadget. The output is 1 if the weighted sum of the inputs is greater than 0, and -1 otherwise; it's a two-state device.

Okay. For one thing, it has an interesting connection with geometry. If you have this sort of linear summation, then when it's greater than zero something happens, the neuron fires, and otherwise it stays silent. So what if it is equal to zero? By setting the weighted sum equal to zero, I get essentially the equation of a line on a plane, and for any input that falls on one side of the line this device produces an output of 1, and for any input on the other side, -1. So if you think of these points in the space as modeling inputs, say a signal that you receive from your visual system, this can be used as a pattern classifier: it classifies its inputs into two categories. If I change the weights, the line moves and the classification changes. So there's an interesting connection with geometry and also with pattern classification: we can use this as a classifier. It also has an interesting connection with logic.
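The two-state unit and its geometric reading can be written in a few lines. This is a minimal Python sketch, assuming two inputs, a bias term (so the boundary is bias + w1*x1 + w2*x2 = 0, a line in the plane), and outputs of +1 and -1 as in the talk; the specific weights are hypothetical.

```python
from typing import Sequence


def threshold_unit(weights: Sequence[float], bias: float,
                   inputs: Sequence[float]) -> int:
    """Two-state unit: +1 if bias + weighted sum of inputs > 0, otherwise -1."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else -1


# Hypothetical weights: the unit's boundary is the line x1 + x2 - 1 = 0.
w, b = (1.0, 1.0), -1.0
print(threshold_unit(w, b, (2.0, 2.0)))  # 1: the point lies on the positive side
print(threshold_unit(w, b, (0.0, 0.0)))  # -1: the point lies on the negative side

# New weights move the line (here to x1 - x2 = 0), and the same point is reclassified.
print(threshold_unit((1.0, -1.0), 0.0, (2.0, 2.0)))  # -1: on the line is "not > 0"
```

Treating the bias separately keeps the line from being forced through the origin; it is equivalent to adding a constant input of 1 with its own weight.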

You can get it to compute some simple logical functions. It turns out that by choosing the weights appropriately, you can get it to compute AND, OR, and NOT. If you have had any exposure to Boolean logic, you know the result that if you can compute AND, OR, and NOT, you can compute any Boolean function: for any Boolean function f, there is a network of these units that computes it. The next step is building finite-state machines: these are machines that receive inputs and change state, and that's exactly what we have in these boxes here that we call computers. If you give a finite-state machine some space to write to and read from, then you get Turing machines, general-purpose computers. So all of this came from this one simple result, just by taking this neuron.
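The claim that suitable weights yield AND, OR, and NOT, and that networks of such units compute any Boolean function, can be checked directly. A minimal sketch, assuming 0/1 inputs and a unit that fires when bias + weighted sum > 0; the weights below are one hypothetical choice among many, and XOR is included because no single unit can compute it.

```python
from itertools import product
from typing import Sequence


def fires(weights: Sequence[float], bias: float, inputs: Sequence[int]) -> bool:
    """Threshold unit on 0/1 inputs: fires when bias + weighted sum > 0."""
    return bias + sum(w * x for w, x in zip(weights, inputs)) > 0


# One possible choice of weights for each basic gate (hypothetical, not unique).
def and_gate(a: int, b: int) -> int:
    return int(fires((1, 1), -1.5, (a, b)))


def or_gate(a: int, b: int) -> int:
    return int(fires((1, 1), -0.5, (a, b)))


def not_gate(a: int) -> int:
    return int(fires((-1,), 0.5, (a,)))


# XOR is not linearly separable, so it needs a small network of units.
def xor_gate(a: int, b: int) -> int:
    return and_gate(or_gate(a, b), not_gate(and_gate(a, b)))


for a, b in product((0, 1), repeat=2):
    print(a, b, and_gate(a, b), or_gate(a, b), xor_gate(a, b))
```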

So networks of these units can compute arbitrary Boolean functions. Then a very simple learning algorithm was developed that basically modifies the weights, and it turns out that after a finite number of iterations it gets the inputs correctly classified, whenever the data samples can be separated by such a line. Now, this was all fifty years ago, and there are much more sophisticated learning algorithms now that can work with more complicated learning problems, but the basic idea is the same: how to learn from data. As the situation gets more complicated, you come up with more sophisticated approaches, and this has really led to a theory of learning machines and to far more sophisticated algorithms that can learn from data. So that's one example of computational thinking in biology, in the very simple context of looking at neurons, and it leads to computational models of neurons, for example models that tell you something about vision. Many of them start out with a simple model like this one.
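The talk does not spell out the learning rule, but the classic perceptron rule is one standard instance of "modify the weights until the inputs are classified correctly." A minimal sketch, assuming +1/-1 labels and linearly separable data (the finite-iteration guarantee holds only then); the training set, learning rate, and epoch cap are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (input vector, label in {+1, -1})


def predict(weights: List[float], bias: float, x: Sequence[float]) -> int:
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else -1


def train_perceptron(data: List[Sample], lr: float = 1.0,
                     max_epochs: int = 100) -> Tuple[List[float], float, int]:
    """Perceptron rule: on each mistake, nudge the weights toward the correct label.

    Returns (weights, bias, epochs used). For linearly separable data the loop
    ends after a finite number of passes with every sample classified correctly.
    """
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, label in data:
            if predict(weights, bias, x) != label:
                weights = [w + lr * label * xi for w, xi in zip(weights, x)]
                bias += lr * label
                mistakes += 1
        if mistakes == 0:
            return weights, bias, epoch
    raise RuntimeError("no separating line found; data may not be separable")


# Hypothetical training set: the AND function with -1/+1 labels (separable).
and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b, epochs = train_perceptron(and_data)
print(w, b, epochs)
```

Running the same code on XOR-labelled data would hit the epoch cap and raise, which is the single-unit limitation that multi-unit networks and later algorithms address.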

Now, if you have these learning machines, where could you apply them? Well, you can apply them to a lot of different scenarios in biology, and here are some examples. One of the basic problems is: can you predict where interactions take place? If you could understand how these interactions work, then you could design better drugs. So one of the questions is how we can predict interfaces from sequence or structure. There are many techniques, but one of the ways we approach this is by using machine learning: you have a data set of known interfaces, and from it you try to learn the general rules that would allow you to predict these interfaces. Here's an example of a case study. In this case it was protein-RNA interfaces: you go to the known protein-RNA complexes and from them extract a data set of protein-RNA interfaces, and from this the task is to learn what makes an interface. That's the problem.

So here again is a statement of the problem. You are given a protein, and you don't necessarily have the structure of the complex. You want to predict which amino acids participate in the interaction; this is a 3D structure, but what you get is the sequence. The guiding hypothesis is that this is reflected in the local sequence: the signal is there in the sequence. So if you are a machine learning person, you would approach it by generating data sets from known complexes and then using one of these machine learning algorithms, maybe something more sophisticated than the simple one that I told you about, to build a classifier; if it works, you can use it to classify interfaces. Now, there are many questions that arise in this. For example, for each one of these amino acids you could take some neighboring region of the sequence, and that would be the input for the classifier; or, if you have the structure, you could take some structural neighborhood and try to predict from that, and so on. So suppose you do all of this.
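The sequence-neighborhood idea can be made concrete as a sliding window: each residue is represented by the residues around it, and that window becomes the classifier's input. A minimal sketch, assuming a one-hot encoding over the 20 standard amino acids, a hypothetical window half-width of 2, and "X" padding at the ends; the talk does not specify the exact encoding used in the study.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def one_hot(residue: str) -> List[int]:
    """20-dim indicator vector; the padding symbol maps to all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence: str, half_width: int = 2) -> List[List[int]]:
    """One feature vector per residue: the one-hot codes of its sequence window."""
    padded = PAD * half_width + sequence + PAD * half_width
    features = []
    for i in range(len(sequence)):
        window = padded[i:i + 2 * half_width + 1]  # residue i sits at the center
        vector: List[int] = []
        for residue in window:
            vector.extend(one_hot(residue))
        features.append(vector)
    return features


# Hypothetical peptide; each row would be paired with a label (interface or not).
rows = window_features("MKRLA")
print(len(rows), len(rows[0]))  # 5 residues, 5 window positions x 20 = 100 features
```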

Skipping all the details, I'll just give you the result. You have to keep in mind that even if you don't get all the positions quite right, if you know approximately where the interface is, that's already pretty good for many targeting experiments: if you know roughly where the interface is, you can try to design a drug molecule that binds to that region.

So what? Here's an example of what you can do with this information. Do these predictors really give you biological insights that tell us something about biology, or is this just an exercise in playing with computers? Well, here's an example: a study that was an attempt to predict binding sites involving the RNA of the horse version of HIV (equine infectious anemia virus) and the proteins that it binds to. The predictions held up quite reasonably well and have been confirmed by experiments. This means that these kinds of techniques can be used, first of all, to focus experiments, because experimentation is expensive: nobody can mutate every possible site, so the predictions can guide which sites to test. So these sorts of methods are quite useful. In this case the actual experimental work was done here in the College of Veterinary Medicine, and the predictions were borne out by experiments, so that's a good thing. Now, another problem is protein-protein docking.

In docking, we take the two molecules and try to figure out the most energetically favorable conformation of the complex, and this is a very hard computational problem, because you would have to try every possible conformation and combination. Here again, if you can predict interfaces, you can try to reduce the search. One of the things that you can do, if you have predicted interfaces, is to give the docking program the interfaces that exist, and it turns out that simply predicting the interfaces can improve the quality of the results obtained from the docking program. If you just took the docking output based on energy considerations and asked for the top conformation of the complex, it would often not be the right one, and this is for many reasons: first of all, we don't have an accurate energy function; we don't quite understand all the physics there. So the top-ranked conformation is not necessarily the correct one. But if you use the predicted interfaces to rank the conformations, the top-ranking conformation gets closer to the actual structure. There are many ways of determining this: you can look at the RMS deviation from the actual structure and at many other criteria. This simple technique of using predicted interfaces to rank the conformations improves the quality of the output produced by docking programs, and docking is a basic task that's widely used.
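One simple way to use predicted interfaces for reranking is to score each docked conformation by how well its contact residues overlap the predicted interface. A minimal sketch, assuming each candidate conformation is already summarized as the set of residue indices at its interface and using Jaccard overlap as the score; the data, scoring function, and names are hypothetical and not necessarily what the study used.

```python
from typing import Dict, List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap between two residue sets (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if a or b else 0.0


def rerank(conformations: Dict[str, Set[int]],
           predicted_interface: Set[int]) -> List[str]:
    """Order docked conformations by agreement with the predicted interface."""
    return sorted(conformations,
                  key=lambda name: jaccard(conformations[name], predicted_interface),
                  reverse=True)


# Hypothetical docking output: conformation id -> residues in contact with partner.
docked = {
    "pose_1": {3, 4, 5, 40, 41},  # top-ranked by the (imperfect) energy function
    "pose_2": {10, 11, 12, 13, 14},
    "pose_3": {12, 13, 15, 16, 17},
}
predicted = {10, 11, 12, 13, 15}  # hypothetical classifier output
print(rerank(docked, predicted))
```

In practice the interface score would more likely be combined with the energy score than replace it; the pure-overlap ranking here just isolates the idea.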

So, just to wrap up this part of the talk: I told you a little bit about this cartoon model of the neuron, and about how machine learning approaches in general can be used in biological applications, for example to predict what makes an interface. And this is not just an exercise in playing with computers; it actually gives some useful results.

In recent years there are experimental techniques that, for example, measure the expression of all the genes in a cell or a tissue, and you can also measure the amounts of proteins. So you can take these different techniques and actually get measurements of the system, and you can also see what happens if you perturb the system. You can take the data and start building models from it: rather than looking at individual genes or individual proteins, you can start building models of how they interact, at the level of transcription, at the level of protein-protein interaction, and so on. There are many different types of data. For example, some assays tell you which proteins interact with which other proteins; this is noisy data, but nevertheless it's quite useful. You can use microarrays, which don't tell you what interacts with what, but tell you something about the activity of the various genes. And then you can do more complicated things, where you measure kinetic constants and so on. If you look at a small-scale system with very detailed information, such as kinetic constants, you can actually build differential equation models, and there's a whole spectrum in between.
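At the detailed end of that spectrum, measured kinetic constants feed directly into differential equations. A minimal sketch, assuming a single gene whose mRNA level m is produced at a constant rate k and degraded at rate d*m (dm/dt = k - d*m), integrated with forward Euler; the model and the constants are hypothetical illustrations, not from the talk.

```python
def simulate_expression(k: float, d: float, m0: float = 0.0,
                        dt: float = 0.01, t_end: float = 20.0) -> float:
    """Forward-Euler integration of dm/dt = k - d*m; returns m at t_end."""
    m = m0
    for _ in range(round(t_end / dt)):
        m += dt * (k - d * m)  # constant synthesis minus first-order degradation
    return m


k, d = 2.0, 0.5  # hypothetical synthesis and degradation constants
m_final = simulate_expression(k, d)
print(m_final, k / d)  # m(t) approaches the steady state k/d = 4.0
```

Forward Euler is chosen only for brevity; a real model with several coupled, possibly stiff equations would normally use a proper ODE solver.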

So here are some examples of network models. You take this data and try to generate different types of models. For example, in one type the nodes are genes or proteins and the links represent the fact that they interact; if you also put weights on the edges, they might indicate the strength of the interaction, and you can generate networks like this. What can you do with this? Well, it's a graph. If I have several different species from which I could generate these graphs, then for one thing I can essentially do a graph comparison to figure out what makes them similar or different. Or take expression networks, where the links represent correlations between the expression levels of genes: you can have a graph from a healthy tissue and a graph from a tissue that has cancer, and by comparing these graphs you can find differences, which can then be used to generate specific hypotheses and more focused experiments to try to figure out mechanistically what's going on.
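A co-expression network of the kind just described can be built by linking genes whose expression profiles are strongly correlated, and two such networks can then be compared edge by edge. A minimal sketch, assuming Pearson correlation with a hypothetical cutoff of |r| >= 0.8 and made-up expression profiles; real analyses need many more samples, and constant profiles (zero variance) are not handled here.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, FrozenSet, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(profiles: Dict[str, List[float]],
                       cutoff: float = 0.8) -> Set[FrozenSet[str]]:
    """Undirected edge between two genes when |correlation| >= cutoff."""
    return {frozenset((g1, g2))
            for g1, g2 in combinations(sorted(profiles), 2)
            if abs(pearson(profiles[g1], profiles[g2])) >= cutoff}


# Hypothetical expression levels of three genes across four samples per tissue.
healthy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}
tumor = {"A": [1, 2, 3, 4], "B": [5, 1, 4, 2], "C": [2, 3, 4, 5]}

h, t = coexpression_edges(healthy), coexpression_edges(tumor)
print("lost in tumor:", h - t)  # candidate differences for follow-up experiments
print("gained in tumor:", t - h)
```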

This leads to a bunch of different computational problems. For example, comparing graphs like these is, from a purely computational perspective and with no other constraints, known to be a hard problem. But in practice, in biology, there are several additional constraints. Here is another type of interaction network, which gives you another type of model, and then you can ask questions about whether some chain of events could possibly occur. So the computational questions that arise have implications in biology. And if you do comparative analysis, again you can figure out what's common and what's different across different conditions, and you can test some of these models by experiments where, let's say, you change one of those genes' expression and see what happens.
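The Boolean networks at the simple end of this hierarchy can be simulated directly, which turns "change a gene and see what happens" into a computation. A minimal sketch, assuming a hypothetical three-gene network with synchronous updates; the wiring and rules are invented for illustration, and a knockout is modeled by clamping a gene to 0.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, ...]  # on/off values of genes A, B, C, in that order

GENES = ["A", "B", "C"]
# Hypothetical regulatory rules: each gene's next value from the current state.
RULES: Dict[str, Callable[[State], int]] = {
    "A": lambda s: 1 - s[2],     # C represses A
    "B": lambda s: s[0],         # A activates B
    "C": lambda s: s[0] & s[1],  # C needs both A and B
}


def step(state: State, knockout: Optional[str] = None) -> State:
    """Synchronous update of every gene; a knocked-out gene stays at 0."""
    return tuple(0 if g == knockout else RULES[g](state) for g in GENES)


def attractor(start: State, knockout: Optional[str] = None) -> List[State]:
    """Iterate until a state repeats; return the cycle (a fixed point has length 1)."""
    seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, knockout)
    return trajectory[seen[state]:]


print(attractor((1, 0, 0)))                # wild type: a five-state oscillation
print(attractor((1, 0, 0), knockout="C"))  # knocking out C gives a fixed point
```

Because the state space of n genes has only 2**n states, every trajectory must eventually repeat, so the loop always terminates.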

So what we have here, starting with the simplest Boolean networks and moving to models that carry a little more information, is a hierarchy of increasingly sophisticated models. But any time you do modeling, the key is that a model is like a map: you want it only as accurate as the purpose for which the map is being constructed requires, and that holds true for models as well. Okay, so abstraction is important, and abstraction is central to computer science. I think I'm going to skip the rest of this, which is essentially details.

Let me just wrap up, since we are out of time. Computer science is the science of information processing, and it has very little to do with computers as such; its subject is information processing wherever it takes place, whether in your brain, in cells, or in societies. Computation provides the means for describing information processes, and by the Church-Turing thesis any such process can be described by a computer program. This means that theories of these phenomena take the form of computer programs, and computational biology is an example of that: we use computational models to understand biology. This is a theme that's happening elsewhere as well: for example, if you looked at the New York Times a couple of days ago, there was an article about Facebook and how social network data now serve as a way of thinking about top questions in the social sciences.

There's been a tremendous increase, of course, in the size and computational power of computers. Is it maybe not the power of the algorithms or models, but rather the increased capability and speed of today's supercomputers, that is impacting what you can practically do in the study of these systems?

Supercomputers and high-performance computing certainly matter, but what I wanted to emphasize here is thinking about problems using ideas from computing. The complementary side to that is that as the problems become bigger, having a bigger computer is very helpful, because computationally these are hard problems. But not everything is going to be solved just by having bigger computers.

Is it certain that we will not have any analytical description of biological processes, so that research has to rely on computational procedures?

There are cases where you can get closed-form solutions, and that's what you want, but there are also many where the models can be arbitrarily complex. That doesn't mean that you cannot understand them. Take protein structure: methods for predicting secondary and tertiary structure don't work very well yet. The problem is not solved, but progress is being made, and these techniques, in combination with known structures, already help, I think. These predictions are useful because one of the big challenges in drug design is that you generate a whole bunch of potential candidates, so from that standpoint the predictions help in prioritizing potential drug targets. Many people use them essentially as a screen, speeding things up.