{"id":638,"date":"2020-02-12T15:59:36","date_gmt":"2020-02-12T14:59:36","guid":{"rendered":"https:\/\/next-hack.com\/?p=638"},"modified":"2020-06-05T20:26:06","modified_gmt":"2020-06-05T18:26:06","slug":"overclocking-an-arduino-zero-or-any-atsamd21","status":"publish","type":"post","link":"https:\/\/next-hack.com\/index.php\/2020\/02\/12\/overclocking-an-arduino-zero-or-any-atsamd21\/","title":{"rendered":"Overclocking an Arduino Zero (or any ATSAMD21 board)"},"content":{"rendered":"\n<h4 class=\"wp-block-heading\"><strong>Introduction<\/strong><\/h4>\n\n\n\n<p>The Arduino Zero made the ATSAMD21 microcontroller quite popular, and it was later adopted by a number of other compatible boards like Adafruit\u2019s Feather M0 or <a rel=\"noreferrer noopener\" aria-label=\"taca-Innovation\u2019s uChip (opens in a new tab)\" href=\"https:\/\/shop.itaca-innovation.com\/epages\/186543.sf\/en_US\/?ObjectPath=\/Shops\/186543\/Categories\/IDProducts\" target=\"_blank\">Itaca-Innovation\u2019s uChip<\/a>.<\/p>\n\n\n\n<p>The SAMD21 is a 48-MHz Cortex M0+ with a hardware multiplier, a nice set of peripherals (including a USB Host\/Device), an event system and DMA. In its higher-tier version, this MCU provides a good amount of memory for many applications, thanks to its 32 kB of RAM and 256 kB of Flash.<\/p>\n\n\n\n<p>While the SAMD21 grants a decent amount of firepower, sometimes a few more MHz would be very useful indeed.
Furthermore, it is always interesting to see how far we can push these little beasts!<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1018\" height=\"664\" src=\"https:\/\/next-hack.com\/wp-content\/uploads\/2020\/02\/uChip.png\" alt=\"\" class=\"wp-image-645\"\/><figcaption>uChip, an ATSAMD21 board<\/figcaption><\/figure>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>In a\nprevious article, we already showed that some of the ATSAMD21 peripherals can\nbe \u201coverclocked\u201d. In particular, we overclocked the SPI to achieve a better\nrefresh rate. In this article, we test whether it is possible to overclock the\nSAMD21 core itself (answer: yes!), how far we can push the speed, and what kind\nof code\/external hardware modifications are required.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Some good news and some bad news<\/strong><\/h4>\n\n\n\n<p>First the good news: the main CPU clock is produced by complex internal circuitry that can generate the desired frequency just by configuring some registers, so no hardware modifications are required.<\/p>\n\n\n\n<p>Furthermore, we can set the working frequency of each peripheral independently, so that overclocking the CPU does not alter the operation of the peripherals, which could otherwise introduce undesired effects. This of course requires a careful configuration of the whole clock subsystem.<\/p>\n\n\n\n<p>Now the bad news: we need quite a good amount of code, especially if we need to use the USB too. Well, this is not a problem, as you just have to copy and paste the final code. <\/p>\n\n\n\n<p>One last piece of bad news: if you use the SysTick timer, overclocking the CPU will increase the SysTick clock frequency too. This might cause some issues, especially if you use the Arduino framework. <\/p>\n\n\n\n<p>In fact,\ndelays and timing functions (e.g. millis(), delay() and micros()) are based on\nthe SysTick timer.
For instance, with the CPU overclocked to 72 MHz, calling \u201cdelay(1000);\u201d gives an actual delay of about 667 ms, because the CPU is running 50% faster. To overcome this effect, you should modify the variant.h file of your\nboard, and set VARIANT_MCK to the actual clock frequency, i.e. 72000000 in this\nexample. However, to avoid issues, creating a new variant would be preferable.\nWe will cover the variant creation in a separate article.<\/p>\n\n\n\n<p>Another solution is to take the increased speed into account: write delay(1500) instead of delay(1000), and keep in mind that millis() and micros() now advance 1.5 times faster than real time, so any interval you compare them against must be multiplied by 1.5 too. Instead of a floating-point multiplication by 1.5, we suggest this simple operation: interval += (interval &gt;&gt; 1).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Software<\/strong><\/h4>\n\n\n\n<p>For this\nhack, we will use the Arduino IDE: of course, you do not need the Arduino\nframework, and you can use any IDE you want. <\/p>\n\n\n\n<p>To test if our hack is successful, we will use a very simple method. We will measure the frequency at which one GPIO toggles. The sketch can be downloaded <a href=\"https:\/\/next-hack.com\/wp-content\/uploads\/2020\/02\/TestOverclockSAMD21.zip\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"here (opens in a new tab)\">here<\/a>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Theory of operation<\/strong><\/h4>\n\n\n\n<p>In many modern high-speed CPU\/MCU designs, the CPU clock is not generated directly by a crystal oscillator running at the desired clock frequency. Instead, the high-frequency clock is usually derived from a reference oscillator, which runs at a much lower frequency. This reference frequency is \u201cmultiplied\u201d by a precise factor, to achieve the desired frequency. The ATSAMD21 follows the\nsame strategy.
Its 48MHz operating frequency is derived from a much lower-frequency\nreference oscillator<a href=\"#_ftn1\">[1]<\/a>,\nwhich can be chosen by configuring some registers. The frequency of the chosen\nreference oscillator is then multiplied, to achieve the desired operating\nvalue. <\/p>\n\n\n\n<p>But how can we multiply a frequency? We will not go into too much detail, as this is beyond the scope of this article. However, we would like to give an intuitive explanation.<\/p>\n\n\n\n<p>By today\u2019s standards, making a relatively high-speed oscillator is not difficult. For instance, one can connect an odd number of inverters in a loop. Each inverter has a finite speed, i.e. it introduces a small delay. You can easily see that, in this way, the system will start oscillating, as there is no stable condition (try putting a \u201chigh\u201d logic level somewhere, and follow the signal!). The main issue of this technique is the poor precision of the achieved frequency, and its strong dependence on many factors. Slight variations of the temperature or operating voltage could lead to huge variations of the generated frequency. Furthermore, even at the same operating conditions, there would be a huge variation from sample to sample. <\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"966\" height=\"230\" src=\"https:\/\/next-hack.com\/wp-content\/uploads\/2020\/02\/ring-oscillator.png\" alt=\"\" class=\"wp-image-644\"\/><figcaption>A ring oscillator. Cascading an odd number of inverters will create a simple high speed oscillator!<\/figcaption><\/figure>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>However, following\nthe previous example, if instead of using simple inverters, we introduce\nsomething that allows us to adjust the speed of such inverters, we can regulate\nthe frequency.
Useful, yes, but this does not solve the problem of frequency precision: one would have to measure the output frequency and constantly adjust the speed of the inverters! This is actually what is done internally! Let us assume that we need a stable 100-MHz output frequency, but we only have a 1-MHz reference clock. All we have to do is to use an \u201cadjustable\u201d oscillator. We then divide its output frequency by 100, using a simple counter. After that, we simply compare the divided frequency with our 1-MHz reference frequency. If our 1-MHz reference frequency is faster than the frequency of the adjustable oscillator divided by 100, it means that our adjustable oscillator is running too slow. Therefore we ask it to go faster. Conversely, if our 1-MHz reference is slower, then our oscillator is going too fast, therefore we ask it to go slower.<\/p>\n\n\n\n<p>Such a system is called an FLL (frequency-locked loop): it tries to keep the output frequency (divided by the multiplication factor) locked to the reference value. A slightly different implementation is the PLL (phase-locked loop), i.e. a system that also tries to keep the generated signal in phase with the reference. The digital versions are called DFLL and DPLL. <\/p>\n\n\n\n<p>The ATSAMD21\nhas both a DFLL (which can run at 48MHz, hence called DFLL48), and a DPLL\n(which can run at 96MHz, hence DPLL96).<\/p>\n\n\n\n<p>The ATSAMD21 also has many clock sources and many \u201cusers\u201d (CPU, BUS, peripherals), which can run at different frequencies. To accommodate these different frequencies, the ATSAMD21 includes several GCLKs (generic clock generators). Each GCLK can divide an input clock by some desired factor. The input clock can be selected from a list of different sources. Furthermore, the output of a GCLK can be attached to one or more \u201cusers\u201d.
<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1545\" height=\"518\" src=\"https:\/\/next-hack.com\/wp-content\/uploads\/2020\/02\/clocks.png\" alt=\"\" class=\"wp-image-643\"\/><figcaption>The clock subsystem of the ATSAMD21 (image from the datasheet).<\/figcaption><\/figure>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>The program<\/strong><\/h4>\n\n\n\n<p>The\nprocedure is simple. The Arduino framework already configures the DFLL48 to run\nat 48MHz from the 32768 Hz crystal oscillator. This allows us to have a stable\nreference frequency for the DPLL96. We divide the 48MHz by 48, to achieve a\n1-MHz reference value, using GCLK4. The 1-MHz frequency from the GCLK4 is fed\nto the DPLL96. Then, we configure the DPLL multiplier so that it will run at\nthe desired CPU frequency. For instance, if we set the multiplier to 60 (actual\nvalue written to the register: 59), we run the CPU at 60 MHz. After that, we\nconfigure the DPLL as the clock source for the GCLK0, which is used as the CPU\nclock.<\/p>\n\n\n\n<p>Now, some more clock configuration must be done, as the Arduino core by default also uses GCLK0 for the USB, which must keep running at 48 MHz. Therefore, we configure GCLK5 as the clock source for the USB, and we set the DFLL48 (undivided) as the clock source for GCLK5. <\/p>\n\n\n\n<p>In the\nsketch, change the value of \u201cDESIRED_MHZ\u201d to the frequency (in MHz) you want to\ntry! <\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Flash Wait states<\/strong><\/h4>\n\n\n\n<p>While the\ncore can be overclocked, the FLASH memory has a fixed access time. This means\nthat the faster the CPU core, the higher the number of wait states.<\/p>\n\n\n\n<p>This has an adverse effect on our overclocking effort, and in some (pathological) cases it can even make it counterproductive.
In fact, while the flash can be accessed at zero wait states if the CPU runs at 24 MHz or less, we need one wait state if the CPU runs between 24 and 48 MHz. For higher frequencies, we need two wait states, so that the actual frequency at which we access the flash is 24 MHz or less. This is problematic if we run anywhere between 48 and 72 MHz, as the equivalent flash access rate is actually <strong>slower<\/strong> than in the 24 or 48 MHz case. In fact, at 48 MHz we access the flash with one wait state (i.e. we are effectively reading at 24 MHz). But what happens if, for instance, we can go at 60 MHz at most? Well, we still have to insert two wait states, i.e. the effective frequency at which we can read the flash is only 20 MHz!<\/p>\n\n\n\n<p>Luckily, there is a small cache (64 bytes, i.e. 32 instructions) that, according to the datasheet, allows code execution at almost the equivalent of zero wait states. The datasheet does not clearly show how this system is implemented. It only mentions the existence of an 8-line direct mapped cache, with 64-bit entries. This suggests that the flash memory is actually implemented with a 64-bit data bus, so that, in a single access, 8 bytes (4 instructions) are actually read. <\/p>\n\n\n\n<p>Therefore, in the case of 48-MHz operation, the first instruction (let us assume we are reading an 8-byte aligned instruction block) is read in two cycles. The second, the third, and the fourth instructions will already be in the cache (so zero wait states for them), therefore the actual number of cycles will be 5, instead of 8. As a result, the equivalent (average) number of wait states is actually \u201c0.25<a href=\"#_ftn2\">[2]<\/a>\u201d\ninstead of 1.
<\/p>\n\n\n\n<p>These figures improve when the first instruction we read is also already cached, as frequently occurs in small loops: in this case, a truly zero wait state execution is achieved, as the flash is never touched. <\/p>\n\n\n\n<p>For very\nlong instruction sequences, the figure goes toward the 0.25 wait states (i.e. \u201c1.25\u201d\ncycles per single-cycle instruction). For pathological cases, i.e. where a lot\nof jumps or load instructions pointing to \u201crandom\u201d addresses are present, the\nfigure might tend toward the value required for a single random access to the\nflash memory, i.e. 1 wait state in our 48MHz example.<\/p>\n\n\n\n<p>When we overclock to 72 MHz, we need 2 wait states. This means that for 4 consecutive instructions (assuming the first one is 8-byte aligned) we need 6 cycles, corresponding to 0.5 wait states per instruction. In small loops, where there is always a cache hit, this number collapses to 0, therefore we exploit the full overclocked core speed.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>What about running from RAM?<\/strong><\/h4>\n\n\n\n<p>Yes, code can be put in RAM too. A very simple way to do this is to specify in which section you want to put your function. The Arduino Zero default linker script does not provide a separate section for \u201cram\u201d functions (whereas the Arduino Due one provides a \u201cramfunc\u201d section). Still, we can cheat, and tell the linker to put our function in the \u201cdata\u201d section. In this way, the startup code will automatically copy the function into RAM.<\/p>\n\n\n\n<p>To do this, you simply have to insert \u201c__attribute__ ((section (&quot;.data&quot;)))\u201d (without the outer quotes) before the function declaration. <\/p>\n\n\n\n<p>The embedded RAM of the ATSAMD21 always runs without wait states, and it can run at a much higher speed than flash.
However, as we will find later, the maximum overclock frequency when running from RAM is lower. Still, because no wait states are present, in some cases the average performance could be better when executing code from RAM, even at the lower overclock frequency.<\/p>\n\n\n\n<p>Warning! If your code fails to run in RAM at a certain frequency, then you should not use this frequency even when you run code from flash! In fact, if the code in RAM fails at a certain frequency, this means that the RAM might not be fast enough, even for DATA access! Therefore, while code that does not access data in RAM might run fine, eventually some issues might occur when you try to read data from RAM.<\/p>\n\n\n\n<p>Also please note that the RAM access time might depend on the actual address you are trying to access, therefore at high frequencies some code might work fine and some might not, depending on its location in RAM. While at 48 MHz the maximum access time is by design well below the value needed for single-cycle access (zero wait states), at higher frequencies we might get close to the worst-case limit.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Performance<\/strong><\/h4>\n\n\n\n<p>So, let\u2019s run the <a href=\"https:\/\/next-hack.com\/wp-content\/uploads\/2020\/02\/TestOverclockSAMD21.zip\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"code (opens in a new tab)\">code<\/a>!<\/p>\n\n\n\n<p>With all our uChips and our Arduino Zero, we managed to run the SAMD21 at 72 MHz from flash. However, running the code from RAM, we could only achieve a maximum speed of 70 MHz. <\/p>\n\n\n\n<p>Therefore, in our case, 70 MHz should be treated as the maximum speed. Please note that this does not mean that all code will run reliably at 70 MHz: 60 MHz might be a much safer limit.
<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1326\" height=\"694\" src=\"https:\/\/next-hack.com\/wp-content\/uploads\/2020\/02\/48MHz.png\" alt=\"\" class=\"wp-image-639\"\/><figcaption>Toggle period at 48MHz.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1302\" height=\"712\" src=\"https:\/\/next-hack.com\/wp-content\/uploads\/2020\/02\/60MHz.png\" alt=\"\" class=\"wp-image-640\"\/><figcaption>Toggle period at 60MHz.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1324\" height=\"748\" src=\"https:\/\/next-hack.com\/wp-content\/uploads\/2020\/02\/72MHz.png\" alt=\"\" class=\"wp-image-641\"\/><figcaption>Toggle period at 72MHz.<\/figcaption><\/figure>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>In any case, with our simple test code, we did not find any speed difference between code in RAM and code in flash memory. This is because the code is very small and fits reasonably well in the 64-byte cache. The measured toggle periods are:<\/p>\n\n\n\n<p>@ 48 MHz:\n626 ms.<\/p>\n\n\n\n<p>@ 60 MHz:\n504 ms.<\/p>\n\n\n\n<p>@ 72 MHz:\n418 ms.<\/p>\n\n\n\n<p>We also verified\nthat the device still runs very cold, but we cannot guarantee that the code\nwill always be reliably executed: use this hack at your own risk!<\/p>\n\n\n\n<p>What about the\ncurrent consumption? <\/p>\n\n\n\n<p>Running from flash on uChip, the program consumes 7.7 mA at 48 MHz, 9.4 mA at 60 MHz, 10.6 mA at 70 MHz and 11.0 mA at 72 MHz.<\/p>\n\n\n\n<p>From RAM we get:<\/p>\n\n\n\n<p>@48MHz: 8.8mA<\/p>\n\n\n\n<p>@60MHz:\n10.6mA<\/p>\n\n\n\n<p>@70MHz: 12.0\nmA<\/p>\n\n\n\n<p>Please note that these values are the USB current consumption, measured with a USB power meter.
The measured USB voltage was 5.14V.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Conclusion<\/h4>\n\n\n\n<p>We have managed to run our devices at 70 MHz, i.e. about a 46% overclock: not bad at all! Still, you should understand that these devices could behave erratically, especially if they work in high-temperature environments. Indeed, we tested all our devices at room temperature (22\u00b0C).<\/p>\n\n\n\n<p>What speed\ndid you get? Let us know in the comments!<br><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Downloads<\/h4>\n\n\n\n<p><a href=\"https:\/\/next-hack.com\/wp-content\/uploads\/2020\/02\/TestOverclockSAMD21.zip\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Get here the sketch. (opens in a new tab)\">Get the sketch here.<\/a><\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p><a href=\"#_ftnref1\">[1]<\/a> When working as a USB device, the ATSAMD21 can also use the USB start of frame, which occurs once per millisecond with good precision, as the reference frequency.<\/p>\n\n\n\n<p><a href=\"#_ftnref2\">[2]<\/a> Note! This is just the AVERAGE number! Of course the number of wait states for each instruction will always be an integer! <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction The Arduino Zero made the ATSAMD21 microcontroller quite popular, and it was later adopted by a number of other compatible boards like Adafruit\u2019s Feather M0 or Itaca-Innovation\u2019s uChip.
The SAMD21 is a 48-MHz Cortex M0+, with hardware multiplier, a nice&#8230; <a class=\"read-more-button\" href=\"https:\/\/next-hack.com\/index.php\/2020\/02\/12\/overclocking-an-arduino-zero-or-any-atsamd21\/\">(READ MORE)<\/a><\/p>\n","protected":false},"author":2,"featured_media":643,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[30,1,31],"tags":[],"class_list":["post-638","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-arduino-tips-tricks","category-general-hacks","category-other-ele-tips"],"_links":{"self":[{"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/posts\/638"}],"collection":[{"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/comments?post=638"}],"version-history":[{"count":3,"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/posts\/638\/revisions"}],"predecessor-version":[{"id":650,"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/posts\/638\/revisions\/650"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/media\/643"}],"wp:attachment":[{"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/media?parent=638"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/categories?post=638"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/next-hack.com\/index.php\/wp-json\/wp\/v2\/tags?post=638"}],"curies":[{"name":"wp","href"
:"https:\/\/api.w.org\/{rel}","templated":true}]}}