Author Topic: Open Source Litecoin FPGA Miner  (Read 53969 times)

Offline kramble

  • Full Member
  • ***
  • Posts: 227
    • Github
Open Source Litecoin FPGA Miner
« on: July 22, 2013, 03:22:34 PM »
An open source FPGA litecoin miner: https://github.com/kramble/FPGA-Litecoin-Miner

This uses the FPGA's internal block RAM, so it is compatible with existing bitcoin mining hardware. A bitstream has been produced for the Icarus/Lancelot dual Spartan LX150 board, which achieves around 33kHash/sec.

The original code was written for Altera devices, and gets around 2kHash/sec on a DE0-Nano board. 45kHash/sec has been reported by senseless for a huge Stratix IV device, though this code is in need of some further work, as recent development has focused on the Xilinx Spartan LX150 device.

My thanks to fpgaminer, teknohog, makomk, TheSeven, OrphanedGland, udif, newMeat1 and the other guys who contributed to the open source fpga bitcoin miner project, on which my code is based. Also the current contributors bluedragon747, razorfishsl, senseless, vpereira and any others I've forgotten.

Anyway, I hope it's of some use. Feedback and suggestions are welcome. And yes, I know it's abysmally slow (though somewhat faster than when I started the project) :P

ADVICE FOR NEWCOMERS ...

DO NOT PURCHASE A FPGA DEVELOPMENT BOARD OR FPGA BITCOIN MINER JUST TO MINE LITECOIN !!

You will not make any significant return on your investment. This is a hobby project for people who already have the kit and want to experiment with mining litecoin. It is not an economical alternative to GPU mining.

EDITED to update performance figures and credits. The original first post is quoted below by wizzardTim.
« Last Edit: October 09, 2013, 09:27:49 AM by kramble »

Offline jasinlee

  • Litecoin Association Member
  • Sr. Member
  • ***
  • Posts: 1314
Re: Open Source Litecoin FPGA Miner
« Reply #1 on: July 22, 2013, 03:52:46 PM »
More power to you!

Offline wizzardTim

  • Jr. Member
  • **
  • Posts: 66
Re: Open Source Litecoin FPGA Miner
« Reply #2 on: July 22, 2013, 08:24:58 PM »
I hereby present my open source FPGA litecoin miner https://github.com/kramble/FPGA-Litecoin-Miner

Now don't get too excited. I only wrote it because I couldn't find one on the net, and it's pretty crude as I'm an amateur at FPGAs. It's a direct port of scrypt.c using the on-chip FPGA RAM. Performance is quite pathetic at around 1kHash/sec per core (and I estimate 4 cores will fit on an LX150).

The initial version is compiled for a DE0-Nano (since that's what I have), but I have also done a version for the DE2-115 (simulated only).

I will need some help with a Xilinx LX150 port as I don't have a board to test on, and I don't have an LX150 Xilinx ISE licence (though I could obtain a 30 day trial). Though to be honest, it's not really worth the effort since it'll only mine at a few kHash/sec.

My thanks to fpgaminer and the other guys who contributed to the open source fpga bitcoin miner project, on which my code is based.

Anyway, I hope it's of some use. Feedback and suggestions are welcome (the project is only one week old, but I thought I'd release it as soon as I had some working code). And yes, I know it's abysmally slow  :P

It's something! Now the community can make it more efficient, taking yours as a base.

Bravo

Offline mhysa

  • Newbie
  • *
  • Posts: 18
Re: Open Source Litecoin FPGA Miner
« Reply #3 on: July 22, 2013, 11:56:56 PM »
How many watts does the FPGA board use while hashing?

Offline kramble

  • Full Member
  • ***
  • Posts: 227
    • Github
Re: Open Source Litecoin FPGA Miner
« Reply #4 on: July 23, 2013, 09:06:46 AM »
How many watts does the FPGA board use while hashing?

It's somewhere around 2 Watts, estimated from the temperature of the DE0-Nano's PSU regulator chips (powered from the host PC via USB). I'll need to do a serial port comms version so I can run the board off an external PSU and get an exact figure for the power draw (it will be useful as I can then run it off my spare Raspberry Pi), though at 1kHash/sec I reckon it's only earning one USD cent per day  :( ... I'm getting roughly one share per hour. According to http://mining-foreman.org these are diff-32 shares (is this correct? I've currently hard coded the target to 0x00007ff in the mining code).
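
As a rough cross-check of the diff-32 question, here is a minimal Python sketch. It assumes (these assumptions are not confirmed in the post) that the share test compares the top 32 bits of the scrypt hash against the hard-coded target 0x7ff, that the pool's difficulty-1 target is 0xffff in the same position, and that hashes are uniformly distributed. The function names are made up for illustration.

```python
# Sketch only: all constants below other than 0x7FF and 1 kHash/s are assumptions.
TARGET_TOP32 = 0x7FF      # hard-coded target from the post, read as the top 32 bits
DIFF1_TOP32 = 0xFFFF      # assumed pool "difficulty 1" target for scrypt shares


def share_difficulty(target_top32):
    """Difficulty of a share target relative to the assumed diff-1 target."""
    return (DIFF1_TOP32 + 1) / (target_top32 + 1)


def seconds_per_share(target_top32, hashes_per_sec):
    """Average time between shares for uniformly distributed 32-bit hash words."""
    hashes_per_share = 2**32 / (target_top32 + 1)
    return hashes_per_share / hashes_per_sec


print("share difficulty:", share_difficulty(TARGET_TOP32))
print("minutes per share at 1 kHash/s:", seconds_per_share(TARGET_TOP32, 1000) / 60)
```

Under those assumptions the target works out to difficulty 32, with an average of about 35 minutes per share at exactly 1kHash/sec. That is the same order of magnitude as the observed one share per hour, bearing in mind that share arrivals are random and the real hash rate may be a little lower.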

Offline linusbrauner

  • Newbie
  • *
  • Posts: 15
Re: Open Source Litecoin FPGA Miner
« Reply #5 on: July 23, 2013, 11:37:40 AM »
Hello, could somebody explain to me the differences between the PBKDF2_SHA256_80_128 and PBKDF2_SHA256_80_128_32 functions? Ideally in terms of sha256(something_once) and sha256({more;thing;twice}). Thank you!

Offline kramble

  • Full Member
  • ***
  • Posts: 227
    • Github
Re: Open Source Litecoin FPGA Miner
« Reply #6 on: July 23, 2013, 11:59:59 AM »
Hello, could somebody explain to me the differences between the PBKDF2_SHA256_80_128 and PBKDF2_SHA256_80_128_32 functions? Ideally in terms of sha256(something_once) and sha256({more;thing;twice}). Thank you!
Some useful resources are ...
cgminer source code https://github.com/ckolivas/cgminer/blob/master/scrypt.c (I based my code directly on this)
http://en.wikipedia.org/wiki/HMAC
http://en.wikipedia.org/wiki/PBKDF2

In simple terms the PBKDF2_SHA256_80_128 is the more complex operation that creates the input for the salsa_mix, then the PBKDF2_SHA256_80_128_32 generates the final hash (compared with the target to determine a share match).

None of these are actually on the critical path for the scrypt operation, as it is the salsa_mix that takes most of the time (and I run the PBKDF2 operations in parallel with the mix, so they have no effect on the throughput).
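
For readers who prefer code to prose, the roles of the two PBKDF2 steps can be sketched in Python. This is not the FPGA code or a copy of cgminer's scrypt.c. It is a reference-style sketch of scrypt with the Litecoin parameters (N=1024, r=1, p=1), and it assumes the function names encode their sizes: an 80-byte header in and 128 bytes out for the first, an 80-byte header plus a 128-byte salt in and 32 bytes out for the second. The helper names (`block_mix`, `romix`, `scrypt_1024_1_1_256`) are illustrative.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def rotl(x, n):
    return ((x << n) & MASK) | (x >> (32 - n))


def salsa20_8(block):
    """Salsa20/8 core on a 64-byte block."""
    x = list(struct.unpack("<16I", block))
    orig = x[:]

    def qr(a, b, c, d):
        x[b] ^= rotl((x[a] + x[d]) & MASK, 7)
        x[c] ^= rotl((x[b] + x[a]) & MASK, 9)
        x[d] ^= rotl((x[c] + x[b]) & MASK, 13)
        x[a] ^= rotl((x[d] + x[c]) & MASK, 18)

    for _ in range(4):  # 4 double rounds = 8 rounds
        qr(0, 4, 8, 12); qr(5, 9, 13, 1); qr(10, 14, 2, 6); qr(15, 3, 7, 11)  # columns
        qr(0, 1, 2, 3); qr(5, 6, 7, 4); qr(10, 11, 8, 9); qr(15, 12, 13, 14)  # rows
    return struct.pack("<16I", *[(a + b) & MASK for a, b in zip(x, orig)])


def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def block_mix(b):
    """The 'double salsa' on one 128-byte block (scrypt BlockMix with r=1)."""
    x = salsa20_8(xor(b[64:], b[:64]))
    y = salsa20_8(xor(x, b[64:]))
    return x + y


def romix(b, n=1024):
    """The salsa mix: fill the scratchpad, then do n data-dependent reads."""
    v, x = [], b
    for _ in range(n):
        v.append(x)
        x = block_mix(x)
    for _ in range(n):
        j = struct.unpack_from("<I", x, 64)[0] % n
        x = block_mix(xor(x, v[j]))
    return x


def scrypt_1024_1_1_256(header):
    # Role of PBKDF2_SHA256_80_128: header is both password and salt, 128 bytes out.
    b = hashlib.pbkdf2_hmac("sha256", header, header, 1, 128)
    x = romix(b)
    # Role of PBKDF2_SHA256_80_128_32: header as password, mixed block as salt, 32 bytes out.
    return hashlib.pbkdf2_hmac("sha256", header, x, 1, 32)


header = bytes(80)  # placeholder 80-byte block header, not real work
print(scrypt_1024_1_1_256(header).hex())
```

Each PBKDF2 call here runs a single iteration, so the 2048 `block_mix` calls inside `romix` dominate the work, which is consistent with the PBKDF2 steps being off the critical path.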

Hope that helps, I'm still learning about the scrypt algorithm myself, so apologies if I've mistaken anything.

Offline Iceworld79

  • Newbie
  • *
  • Posts: 8
Re: Open Source Litecoin FPGA Miner
« Reply #7 on: July 24, 2013, 05:57:00 AM »
Kramble,
  It is great to see you sharing some code here.
  I have also spent quite some time on this, trying to figure out a way to crack it. Essentially, the more I study the algorithm, the more disappointed I get. There are some theoretical performance bounds for a hardware implementation. Interestingly, the normal time-memory trade-off does apply to hardware when the platform is memory-capacity bound (for example, when using on-chip memory only), so your performance number is not a total surprise.
  OK, let me get into some more detail:
  1) Memory-capacity bound situation.
    In this case we have plenty of bandwidth but just not enough memory. It is trivial for an FPGA to implement a 1024-bit-wide memory and reach 200MHz easily without any timing optimization. Ideally, if we could fully utilize this 1Mb, 1024-bit-wide memory at 200MHz, we would get 100KH/s. The thing is, we cannot. That estimate assumes our hardware can consume a 1024-bit word EVERY CYCLE. That is not possible, because each 1024-bit word takes 2 salsa operations, and the salsa is designed to run sequentially. It is possible to unroll it into one cycle (in theory), but you will get a really low Fmax. By pipelining it we get a higher Fmax but longer latency, and any extra latency essentially wastes on-chip memory bandwidth. I have managed to achieve 36 cycles for 2 salsa operations with Fmax > 200MHz. Therefore each 1Mb of on-chip memory only yields 100(KH/s)/36 ~= 3KH/s.
   2) Memory-bandwidth bound situation.
   In this case we use external memory to do the scrypt. The amount of memory is not a concern (meaning we have enough jobs to keep the FPGA busy), but the bandwidth is. Every 2Gb/s yields 1KH/s, so reaching 500KH/s requires 1Tb/s. That is a lot of burden for an FPGA (sure, you can spend $$$ on high-end stuff to get there).
   To the best of my knowledge, it is very hard to reach $1/(KH/s) using an FPGA. $2/(KH/s) is definitely doable. However, compared with $0.6/(KH/s), there seems to be no edge for the FPGA.
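
The round figures in this post can be reproduced with a little arithmetic. The sketch below assumes one reading of them, which the post does not spell out: each hash makes 1024 scratchpad writes plus 1024 reads of one 1024-bit word each, and a memory port handles at most one word per clock. The function names are illustrative only.

```python
# Sketch under stated assumptions: 2*N scratchpad accesses per hash, one word per access.
N = 1024                      # scratchpad entries for Litecoin's scrypt parameters
WORD_BITS = 1024              # one scratchpad entry is 128 bytes
ACCESSES_PER_HASH = 2 * N     # N writes to fill the scratchpad, then N reads


def capacity_bound_khs(fmax_hz, cycles_per_access):
    """kHash/s from one on-chip scratchpad when each access costs some cycles."""
    return fmax_hz / (ACCESSES_PER_HASH * cycles_per_access) / 1e3


def bandwidth_needed_gbps(khs):
    """External memory bandwidth (Gbit/s) needed to sustain a given kHash/s."""
    bits_per_hash = ACCESSES_PER_HASH * WORD_BITS
    return khs * 1e3 * bits_per_hash / 1e9


print("ideal, 1 access per cycle  :", capacity_bound_khs(200e6, 1), "kH/s")
print("36 cycles per double salsa :", capacity_bound_khs(200e6, 36), "kH/s")
print("bandwidth for 1 kH/s       :", bandwidth_needed_gbps(1), "Gb/s")
print("bandwidth for 500 kH/s     :", bandwidth_needed_gbps(500), "Gb/s")
```

With these assumptions the ideal case comes out just under 100KH/s, the 36-cycle case just under 3KH/s, and the bandwidth cost at roughly 2Gb/s per KH/s, in line with the numbers quoted above.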
   
     

Offline kramble

  • Full Member
  • ***
  • Posts: 227
    • Github
Re: Open Source Litecoin FPGA Miner
« Reply #8 on: July 24, 2013, 09:18:12 AM »
  It is great to see you sharing some code here.

Thanks, I'm only doing this for fun (and to learn some more about fpga coding), so I thought I'd post it up for others to see and comment on (the feedback will help me to learn from my mistakes).

Your comments are very welcome. I'm aware of the pipelining issue for the salsa. I've currently coded it as a single huge combinatorial tree which is limiting my FMax to around 30MHz, and I'm taking 11 clock cycles per double-salsa. This just happened to be the easiest way to get the code working, so I need to look at removing the 3 surplus cycles and pipelining the salsa to get a higher clock speed (though as you mentioned this won't significantly increase the throughput due to the serial nature of the algorithm).

Doing some calculations, I was surprised to find that my half-scratchpad hack (TMTO in the pro's parlance) to squeeze the hasher into the EP4CE22 is actually more efficient than the full scratchpad, so it looks like 9 of these half-cores will fit into a LX150 (as slightly incorrectly explained by fluffysheap here).

I've been looking at ngzhang's Icarus bitcoin miner code here, and I've got a single core port working in simulation, though I can currently only compile it on an LX75 as I'm using the free webpack software. I'll post it up once I'm sure it's working properly, but it will need to be compiled for the LX150 by someone with a full licence.

Offline Iceworld79

  • Newbie
  • *
  • Posts: 8
Re: Open Source Litecoin FPGA Miner
« Reply #9 on: July 25, 2013, 02:57:47 AM »
TMTO will not help overall performance if on-chip memory is the only option. Yes, by setting TMTO=x you reduce the memory requirement for one engine to 1/x, but writing to the memory (the first 2 salsa) takes x times longer. Interestingly, TMTO helps on the reading side, not the writing side. This is the opposite of the behavior in the CPU implementation. I guess most people didn't realize that. It might be worth a paper to describe it :)

Offline kramble

  • Full Member
  • ***
  • Posts: 227
    • Github
Re: Open Source Litecoin FPGA Miner
« Reply #10 on: July 25, 2013, 08:38:59 AM »
TMTO will not help overall performance if on-chip memory is the only option. Yes, by setting TMTO=x you reduce the memory requirement for one engine to 1/x, but writing to the memory (the first 2 salsa) takes x times longer. Interestingly, TMTO helps on the reading side, not the writing side. This is the opposite of the behavior in the CPU implementation. I guess most people didn't realize that. It might be worth a paper to describe it :)

Yes. I was initially going to disagree with your point about writing taking 2x longer, but I realize it's just the phrasing (the scratchpad setup takes 1024 double salsa operations regardless of the size of the ram, so for TMTO=2, you have half the ram, but it still takes the same 1024 operations to populate it, so I suppose you could say it takes twice as long relative to the size of the ram). On the reading side, again we need 1024 operations, each of which involves a ram read plus a double salsa, however for half of the addresses we need an additional double salsa to compute the interpolated (missing) ram data. So overall I make the performance (1024+1024) / (1024+1024+512) = 80% for TMTO=2, but we now can fit 2 cores rather than one (2 x 80% = 160%), so overall it's a 60% gain over the naive full sized scratchpad.
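
The half-scratchpad idea and its cost can be illustrated with a small Python sketch. To keep it short, `mix` is a SHA-256 stand-in for the double salsa, so this is NOT real scrypt. It only shows that storing every second entry and recomputing the missing ones gives the same result for roughly 25% more mix operations. All names here are illustrative.

```python
import hashlib

N = 1024  # scratchpad entries


def mix(x):
    # Stand-in for the double salsa (assumption for brevity, not the real BlockMix).
    return hashlib.sha256(x).digest()


def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def index_of(x):
    # Data-dependent scratchpad index, in the spirit of scrypt's Integerify.
    return int.from_bytes(x[:4], "little") % N


def romix_full(x):
    """Full scratchpad: N mixes to fill it, N mixes to read it back."""
    v = []
    for _ in range(N):
        v.append(x)
        x = mix(x)
    ops = N
    for _ in range(N):
        x = mix(xor(x, v[index_of(x)]))
        ops += 1
    return x, ops


def romix_half(x):
    """TMTO=2: keep only even entries, recompute odd ones on demand."""
    v = []
    for i in range(N):
        if i % 2 == 0:
            v.append(x)          # only half the entries are stored
        x = mix(x)               # but all N mixes are still needed
    ops = N
    for _ in range(N):
        j = index_of(x)
        entry = v[j // 2]
        if j % 2 == 1:
            entry = mix(entry)   # rebuild the missing odd entry from its neighbour
            ops += 1
        x = mix(xor(x, entry))
        ops += 1
    return x, ops


seed = bytes(32)
full, ops_full = romix_full(seed)
half, ops_half = romix_half(seed)
print("same result:", full == half)
print("mix operations:", ops_full, "vs", ops_half)
print("relative speed of one half-core:", ops_full / ops_half)
print("two half-cores vs one full core:", 2 * ops_full / ops_half)
```

The number of extra operations depends on how many odd indices the data happens to hit, so it lands near 512 rather than exactly on it. The expected value gives the 2048/2560 = 80% figure from the post.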

Anyway, on the progress front I now have a multicore version of the LX150 port running in simulation. Unfortunately I'm using quite a lot of ngzhang's code, so I'll have to check the copyrights carefully (much is open source based on teknohog's and others' code), and do a cleanroom reimplementation of some parts of the code before I release it. I'll probably want to work on the salsa pipelining too. Your 36 step implementation would seem to imply 16 stages per salsa, plus 2 cycles overhead for ram access, which maps quite well onto my 11 cycles, 2*(4+1)+1, so I'll have a go at this and see what it does to my FMax.

Thanks for your comments, very helpful  :)

Offline Garr255

  • Jr. Member
  • **
  • Posts: 72
    • Cognitive Mining
Re: Open Source Litecoin FPGA Miner
« Reply #11 on: July 25, 2013, 09:02:40 AM »
Subbing

Offline senseless

  • Jr. Member
  • **
  • Posts: 21
Re: Open Source Litecoin FPGA Miner
« Reply #12 on: August 21, 2013, 06:50:47 AM »
Kramble,

Just wanted to thank you for your work.

After your most recent update that you PM'd me about I went in and compiled it for my Stratix IV chip. I currently have 10 cores hashing away at 2.56Kh/s for a grand total of 25.6Kh/s.

I've seen the memory of my chip operate at 233-240MHz previously when using it for bitcoin. In order to fit more cores I offloaded some of the operations to memory and had no problem staying at 230-240MHz for a hashing rate of 800Mh/s (bitcoin) on the chip. Maybe a bit of restructuring of the top or the scrypt core will boost the hash rate significantly. If all else fails I might just change each core to use a full 1Mbit instead of 512Kbit to see if that will give me a speed up.

Your current design uses more logic than it does memory. During my estimations of how large a footprint the litecoin core would leave, I suspected it would be even smaller than the sha256 core. One way to reduce usage may be to have a single sha256 hasher handle operations for all of the scrypt cores. But that may complicate matters. Currently each core is using 13.8K combinational ALUTs and 7.7K dedicated registers. This means that those with larger chips with a lot of memory may find themselves running out of logic long before they run out of memory.

Quote
Logic utilization: 81%
Combinational ALUTs: 139,449 / 182,400 ( 76 % )
Dedicated logic registers: 79,644 / 182,400 ( 44 % )
Total block memory bits: 5,242,880 / 14,625,792 ( 36 % )
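
The logic-versus-memory point can be checked from the figures quoted above. This Python sketch divides the fitter totals by the 10 cores to get a per-core cost, then asks how many cores each resource would allow. It assumes everything scales linearly with core count and ignores shared top-level logic and routing headroom, so treat the results as upper bounds rather than a fitting prediction.

```python
# Figures copied from the fitter summary above; linear scaling is an assumption.
ALUTS_USED, ALUTS_TOTAL = 139_449, 182_400
MEM_BITS_USED, MEM_BITS_TOTAL = 5_242_880, 14_625_792
CORES = 10
KHS_PER_CORE = 2.56

aluts_per_core = ALUTS_USED / CORES         # about 13.9K ALUTs per core
mem_per_core = MEM_BITS_USED // CORES       # 524,288 bits, i.e. 512 Kbit

max_cores_by_logic = int(ALUTS_TOTAL // aluts_per_core)
max_cores_by_memory = MEM_BITS_TOTAL // mem_per_core
max_cores_full_ram = MEM_BITS_TOTAL // (2 * mem_per_core)   # 1 Mbit scratchpads

print("total hash rate (kH/s):", CORES * KHS_PER_CORE)
print("cores allowed by logic :", max_cores_by_logic)
print("cores allowed by memory (512 Kbit each):", max_cores_by_memory)
print("cores allowed by memory (1 Mbit each)  :", max_cores_full_ram)
```

On these numbers logic runs out at about 13 cores while the half-size scratchpads would allow about 27, so on this device the design is logic-limited, as the post says.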

Is there an easy way to switch from using 512Kbit to 1Mbit? I noticed some variables in the configuration files / assignments but they didn't seem to do anything. (HALFRAM=0/1)

If you'd like to make a package for the Artix 7 -- I have an AC701 on hand for testing.

« Last Edit: August 21, 2013, 07:05:03 AM by senseless »

Offline kramble

  • Full Member
  • ***
  • Posts: 227
    • Github
Re: Open Source Litecoin FPGA Miner
« Reply #13 on: August 21, 2013, 09:58:32 AM »
Thanks for the feedback, it was getting a little quiet in here  :)

It's good to hear that it works in a significant multicore configuration. It's only been tested with four cores so far on an LX150 (rasorfish_sl is kindly helping out with this as I can't compile for LX150 myself), giving around 5khash/sec performance.

Yeah, my code is pretty naive. There is a lot more work to do on pipelining and separating the PBKDF2_SHA256 operations from the salsa mix (we only really need one of the SHA256 engines per 16 cores or so as it is idle for most of the time).

I've PM'd you about the HALFRAM parameter. FYI for readers: you define this macro to use the smaller scratchpad (recommended if the device is RAM limited rather than logic limited, as it doubles the number of cores that can be fitted).

Anyway, I'm still working away at this code. The more I do the more seems left to be done. If anyone wants to help out, then feel free to PM me. There was also some comment on the bitcoin forum as to whether the BFL FPGA miners could be repurposed for litecoin. I think the code will port over fine (Altera Arria/Stratix devices), but reverse engineering the communications interface would be a significant challenge (there is an onboard processor to contend with).

Mark

Offline kmtan

  • Newbie
  • *
  • Posts: 6
Re: Open Source Litecoin FPGA Miner
« Reply #14 on: August 21, 2013, 10:45:24 AM »
Interesting project. I can't help much with the technical discussion, but I volunteer to be a tester if there is a need.
Let me know.