DNA sequencing methods - an overview of Sanger, Illumina, PacBio, and Oxford NanoPore

  • Published: 11 Sep 2024

Comments • 4

  • @thebumblingbiochemist
    @thebumblingbiochemist  1 year ago +2

    With Illumina, you start by shearing the DNA into shorter pieces and adding adapters to the ends that allow them to stick to the chip. Once stuck, the copies are made “on site” in some DNA gymnastics that are best explained in their video and which I’m not gonna try to get into, sorry! bit.ly/33qIBXW
    Bottom line is you can get massively parallel high-throughput sequencing (you can read a lot of DNA at once). It’s still limited to short reads though and it’s more error-prone than Sanger - even though you’re not terminating each time, you still have to do all the copying and stuff and there are a lot of chances to mess up, especially as you try to push the length. In order to be able to sequence something really big, like a whole genome, you have to line up lots and lots of overlapping fragments. You get those overlaps because you randomly shear the DNA in the beginning and the adapters can be added to *any* sequence (as opposed to Sanger, where you use primers to dictate where to start). So the computers have a lot of work to do once the sequencing’s done. And the scientists have a lot of work to do analyzing it and helping the computer out - basically it’s not as easy as it sounds because of things like repetitive sequences, where you have to figure out which copy goes where. Such repetitive sequences are a major reason why the initial human genome sequence was incomplete: www.nature.com/articles/d41586-021-00293-8
    You can reduce the “what goes where?” problem by using longer reads.
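    To make the “line up the overlapping fragments” idea concrete, here’s a toy sketch. It assumes error-free reads that all come from the same strand, and it just keeps merging whichever two pieces overlap the most. Real assemblers are far more sophisticated; the function names `overlap` and `greedy_assemble` are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembly: repeatedly merge the pair with the longest overlap.

    Assumes error-free, same-strand reads (a big simplification).
    Returns the pieces ("contigs") left when nothing overlaps any more.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping "reads" sheared from one made-up sequence
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))
```

    If a stretch repeats and the repeat is longer than the overlaps your reads can give you, several merges look equally good and a greedy approach like this can stitch things together wrong. That’s the “what goes where?” problem in miniature.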
    You can get longer (but not super long) reads from Pacific Biosciences (PacBio), which uses labeled DNA letters that give off fluorescent pulses when held by the DNA Pol (which is in the path of an excitation laser), so you can read out the sequence of fluorophores as they’re added. The fluorophore is attached to the end phosphate, which gets released when a letter gets added, so you don’t have to worry about the signal lingering around and you don’t have to pause after each letter like Illumina does. They call this Single Molecule Real Time Sequencing (SMRT). bit.ly/3bQxcEI
    If you want *really long* reads, you can use Oxford NanoPore. As I mentioned above, it doesn’t “add” anything, it just looks to see what’s already there. But DNA is really tiny so instead of trying to visually read it they sense its vibes… They thread DNA through a pore and measure the electrical current flowing through the channel. That current changes differently depending on which DNA letters are in the pore, so they can read out the DNA sequence as it passes through.
    Since you’re able to get long reads, it can be great for repetitive sequences and stuff.
    Unlike Illumina, it doesn’t do any “pre-copying” (there’s no amplification). This might sound like a bad thing because you have less starting material. But it can actually be a good thing because it reduces bias that can occur if some DNA regions get copied more/better than others. They have a number of different devices ranging from tiny sequencers like the MinION to the big PromethION (there’s also the Flongle, the MinION Mk1C & the GridION). This is totally not an advertisement or endorsement or anything, just so that you can connect the dots if you see these names in papers.
    NanoPore and PacBio are both more error-prone than Illumina, though. So one strategy that’s sometimes taken is to use a combination: use NanoPore to figure out the general arrangement of things and then use Illumina to get shorter, more accurate, pieces that you can fit it to.
    Illumina is more accurate than NanoPore and PacBio, but it’s still not as accurate as Sanger sequencing. And it’s not as cost-effective if you only have a single sample where you want to look at a single region, as opposed to trying to figure out an entire genome. So let’s take a closer look at Sanger sequencing.
    The way NGS is talked about, you might think that Sanger sequencing is a thing of the past. But it’s definitely not! We use it ALL THE TIME! (or at least we mail samples to a company that uses it all the time (and can do it fast & cheap) - we used to use GenScript but they shut down (at least temporarily) due to the COVID-19 pandemic, so we switched to GENEWIZ). When I send away DNA to be read, I don’t need a whole genome sequenced. Instead, I just want to check the sequence of a very specific part of a specific gene for the proteins I am cloning. Molecular cloning is where we stick a gene into a vector (such as a circular piece of DNA called a plasmid) that we can stick into cells like bacteria cells to get them to use it.
    We want to make sure that the sequence got into the plasmid ok, without any typos in the DNA sequence (which could cause typos in the resultant protein or even prevent it from being made altogether). So, before we try to get cells to express the protein, we put the plasmid we engineered into bacteria to make lots of copies of it, which we then purify out using alkaline lysis (“minipreps”). More here: bit.ly/minipreps & ruclips.net/video/fKf-g5oNvLY/видео.html
    After that, you’re left with pure plasmid. And you want to check the sequence of the part you put in, which “we” do using Sanger sequencing. A few months ago such sequencing proved an experiment-saver when detective-izing a case of “what the heck did our ex-colleague leave in the freezer?!” From the lab database I could see the name he’d given the protein construct (the modified version of the protein) (e.g. ProteinX_middle_deleted). But he hadn’t specified which part he actually deleted - I’m sure it was in his notes but I didn’t have those, so I sent it for sequencing and figured it out (and this week I purified that protein successfully).
    All this detective work looks pretty boring from our lab’s end. We just wrap up tiny tubes with bubblewrap, stick them in a big tube, stick that in an envelope (too many “your tubes arrived damaged” emails…) and drop them in the outgoing mail box (too many crushed tubes…).
    We do this a lot, as do lots of people in labs all around the world, but we don’t often stop to think about what really happens when it gets to the facility. To understand what happens when it gets to the sequencing people, let’s first review what it is we’re trying to read - what is DNA? This is a little repetitive from before, but in greater detail at the chemistry level.
    DNA stands for DeoxyriboNucleic Acid and, as we discussed briefly, it’s made up of long chains of “letters” called nucleotides (A, T, C, & G) which are made up of 3 main parts - a deoxyribose sugar & phosphate(s) form the generic “backbone” part & then each letter has a unique “nitrogenous base” (“base”) which has 1 ring (the pyrimidines C & T) or 2 rings (the purines A & G). The different bases pair with specific other bases on other strands - A:T and G::C. So if you know the sequence of one strand you know the sequence of the other.
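    The “if you know one strand you know the other” rule is easy to write down as a sketch. One thing not spelled out above: the two strands run in opposite directions, so when the partner strand is written out in the usual reading direction it comes out as the *reverse* complement. The names `complement` and `reverse_complement` are just illustrative.

```python
# Base-pairing rules: A pairs with T, G pairs with C
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Letter-by-letter partner of each base, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written in its own reading direction (strands are antiparallel)."""
    return complement(strand)[::-1]


print(complement("ATGC"))
print(reverse_complement("ATGC"))
```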

    • @thebumblingbiochemist
      @thebumblingbiochemist  1 year ago

      I like to picture them as tiny little cartoons where the sugar’s 5-sided ring forms the core body & various groups stick off of its arms & legs. The “right arm” (as in the right of your screen/paper) is the “1’” position (the ‘ is pronounced “prime”) & this is where the base attaches. The “left arm” (5’ position) is where the phosphate(s) link on. The 5’ position is actually more like an elbow because there’s a “linker” from the 4’ “shoulder” & the “left leg” (3’ position) has a hydroxyl (-OH) group.
      Nucleotides link together left arm (5’ phosphate) to left leg (3’ OH) through PHOSPHODIESTER BONDS. You can link up as many as you want to get a chain, one end of which will have a free 5’ phosphate (the 5’ end) & the other end of which will have a free 3’ hydroxyl (the 3’ end).
      DNA & RNA (RiboNucleic Acid) are different in that RNA has a right leg (2’ -OH) and a left leg (3’ -OH) but DNA only has a left leg. (They actually both have 2 right legs and 2 left legs, but if the leg is just a “stub” (hydrogen) it doesn’t really do anything but take up a little space and satisfy carbon’s need for 4 bonds, so we don’t usually draw it.) The other difference between DNA & RNA is that RNA has a “U” instead of a T.
      sidenote: So if you see a carbon with fewer than 4 bonds, you just assume that there are hydrogens there. Also, if you see a “corner” without an element letter, you assume that there’s a carbon there. Carbons (with hydrogens as sorts of “placeholders”) form the skeleton of organic molecules (organic as in carbon-based, not “all-natural”), but it’s often the things they’re bonded to (functional groups) that do the exciting reacting stuff so we want to make them stand out more. So we’ll often draw the chemical structures of organic molecules with implied carbons and hydrogens, and just write in the C’s or H’s in places where they’re actually involved in what we’re interested in. This shorthand is really helpful, but it can also be confusing to people unfamiliar with it, so I hope this helps make biochem a bit more accessible.
      So, back to the sequencing story -> it’s okay that DNA doesn’t have that right leg because it doesn’t need it to link to another letter (polymerize). But this linking DOES need the left leg - that’s where the incoming letter will latch on.
      That enzyme I mentioned before, DNA Polymerase (DNA Pol), facilitates this linkage. It acts like a train that can only travel on double-stranded track. So, to travel on single-stranded track it first has to add the complementary nucleotide (the one that base pairs with it)(e.g. to travel past an A on the template strand it has to add a T to the growing strand)(so the product that’s being made is the complement to the template strand, but if you know one you know the other).
      Because it can only travel on double-stranded track, you also have to provide a primer (short complementary sequence) for it to start from. In Polymerase Chain Reaction (PCR), you use 2 primers to define the “start” and “stop” of a region you want to make copies of. With Sanger Sequencing, you only use 1 primer - you just give it the “start” station and then you let it stop wherever it adds one of the defective tracks you give it and see how far it goes.
      These “defective tracks” are DIdeoxynucleotides (ddNTPs), which don’t have a left leg to latch onto (they have a 3’ H instead of an -OH). So these defective NTs act as CHAIN TERMINATORS.
      The basic premise of Sanger Sequencing is -> you give it mostly normal NTs (dNTPs) mixed in with some “defective” NTs (ddNTPs). DNA Pol will add normal NTs (dNTPs) normally but when a terminator gets incorporated, nothing else can be added. So, depending on how many normal ones got added before the terminator, you’ll get pieces of different sizes.
      You can run this on a urea-PAGE gel, which separates the pieces by size by using the DNA’s negative charge to drive it through a gel towards a positive charge, with the gel mesh slowing bigger things down more along the way. Compared to agarose gels, urea-PAGE offers much higher resolution because you can make a tighter gel mesh (more here: bit.ly/2XsNzQg) -> you can detect single-NT length differences - so you can tell XXX apart from XXXX, BUT you can’t tell what letters those X’s are (e.g. AAA and TTT look the same). So, originally, you had to do 4 separate reactions, with each reaction only having terminator versions of a single letter.
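      Here’s a toy sketch of how 4 separate reactions plus a size-resolving gel give you the sequence. It’s idealized: it assumes every possible stopping point shows up as a clean band, and it counts length as the number of letters added after the primer. The names `sanger_lanes` and `read_gel` are made up for illustration.

```python
def sanger_lanes(new_strand: str) -> dict[str, list[int]]:
    """Idealized 4-reaction Sanger run.

    For each terminator letter, list the fragment lengths (letters added
    after the primer) at which a chain could have been terminated.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the bands from shortest to longest, noting which lane each is in."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("GATTACA")
print(lanes)            # which fragment lengths show up in which lane
print(read_gel(lanes))  # sequence of the newly made strand
```

      Each lane alone only tells you *where* one letter occurs; it’s lining all 4 lanes up by size that spells out the sequence.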
      You don’t want all of the letter to be terminator-y because then you’d never be able to get past the 1st instance of it, so you include ~100X-less of the ddNTP than the dNTP (e.g. in the “A” reaction, for every 100 A’s you have in the mix, 99 will be dATP (normal) and 1 will be ddATP (terminating))(and you’ll also have all normal dGTP, dTTP, & dCTP in there).
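      To see why the ratio matters, here’s a rough back-of-the-envelope sketch. It assumes DNA Pol picks the normal and terminator versions purely in proportion to how much of each is in the mix (real polymerases can have preferences, so treat this as an assumption). Under that assumption, stopping at the k-th occurrence of the letter means “dodging” the terminator k-1 times and then getting it once. The function names are made up for illustration.

```python
def p_terminate_at(k: int, dd_fraction: float = 0.01) -> float:
    """Chance that a chain stops exactly at the k-th occurrence of the letter.

    Assumes each incorporation independently picks a terminator with
    probability `dd_fraction` (an idealization).
    """
    return dd_fraction * (1 - dd_fraction) ** (k - 1)


def p_still_extending(k: int, dd_fraction: float = 0.01) -> float:
    """Chance that a chain has made it past k occurrences without terminating."""
    return (1 - dd_fraction) ** k


# With 1 terminator per 100, plenty of chains get past many A's
for k in (1, 10, 100, 500):
    print(k, p_terminate_at(k), p_still_extending(k))

# With ALL terminators, nothing ever gets past the first A
print(p_terminate_at(1, dd_fraction=1.0), p_terminate_at(2, dd_fraction=1.0))
```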
      Then, technology advanced, allowing for DYE-TERMINATOR SEQUENCING -> scientists began using fluorescently-labeled nucleotides. Fluorescent molecules absorb light at one wavelength (excitation wavelength) and release it at a different wavelength (emission wavelength). Different wavelengths have different colors, so if you use fluorophores that have different emission wavelengths, you can tell them apart.
      You can label the different terminators (ddATP, ddTTP, ddGTP, ddCTP) with different fluorophores and add all 4 at once. The fluorophores are added to the base, in a position that doesn’t interfere with the base pairing.
      To make things even easier, you can use CAPILLARY GEL ELECTROPHORESIS. Instead of running it through a “slab gel”, you run it through a vertical tube of gel. And as it runs through it gets “scanned” by a laser.
      The light from the laser is at the fluorophore’s excitation wavelength so it excites the fluorophore, which then emits light at a different wavelength (its emission wavelength), which gets recorded by a detector as peaks of fluorescence intensity at each wavelength, drawn on a CHROMATOGRAM. Because the different ddNTPs have different fluorophores and give off light with different wavelengths, the detector can tell them apart.
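      In its simplest form, turning those peaks into letters is “which color is brightest at each peak?” Here’s a minimal sketch of that idea. It assumes the peaks have already been found and that each detection channel has already been matched to a letter; the intensity numbers are made up, and real base-calling software does a lot more cleanup than this. `call_bases` is a hypothetical name.

```python
def call_bases(peaks: list[dict[str, float]]) -> str:
    """Call one letter per peak by picking the brightest dye channel.

    `peaks` is a list (in order of arrival at the detector, shortest
    fragment first) of per-channel intensities keyed by letter.
    """
    return "".join(max(peak, key=peak.get) for peak in peaks)


# Made-up intensities for three peaks
peaks = [
    {"A": 0.05, "C": 0.02, "G": 0.90, "T": 0.03},
    {"A": 0.85, "C": 0.04, "G": 0.06, "T": 0.05},
    {"A": 0.03, "C": 0.07, "G": 0.02, "T": 0.88},
]
print(call_bases(peaks))
```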
      Sanger sequencing is kinda like the “gold standard” in terms of accuracy (which is really important in our case), but it’s expensive (relatively speaking) if you need to read a lot of DNA. It’s really cheap if you only have like one reaction (~$5), and if you only have a low number of “targets” it’s still the most cost-effective way to go. In addition to our using it in the lab, doctors can use it for things like sequencing specific genes from their patients if they have a disease known to be caused by a mutation in that gene and they want to figure out what the exact mutation is.
      But if you wanted to sequence an entire genome (which you’d first have to break up into lots of shorter pieces you’d later “stitch together” computationally) it’d be really expensive. So for big projects, things have switched to those massively-parallel “next-gen sequencing” methods where you have lots of reactions happening at the same time, usually on a chip, with really tiny volumes.
      note: In addition to Whole Genome Sequencing (WGS), there’s something called whole exome sequencing, which only sequences the protein-coding parts of genes (the exons).
      Nature did a special on the 20th anniversary of the human genome sequence draft publication: go.nature.com/3qBSHj4
      more on the cost of sequencing: bit.ly/3dtI8ei
      more on DNA polymerization: bit.ly/DNAligasepol
      more on PCR: blog form: bit.ly/pcrtrain & ruclips.net/video/GZSLfECgW3Q/видео.html
      more on peptide bonds: bit.ly/aminoacidstoproteins
      more on Sanger’s insulin protein sequencing: bit.ly/insulinsequencing & ruclips.net/video/Mi6s0ioOChY/видео.html
      blog form: bit.ly/sequenceclones ; RUclips: ruclips.net/video/HxZcjX5WByc/видео.html  
      more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉 bit.ly/2OllAB0

  • @Anna-yy1ln
    @Anna-yy1ln 7 months ago

    How does PacBio deal with long homopolymers? Doesn't the signal plateau?