Obviously it's been mentioned before, since it's on the wiki (http://rhombus-tech.net/adapteva/). There isn't much information on the page, however. The core doesn't work standalone, but it is completely open, with an HDL and a schematic; it is headed in the direction a purist libre system would take, even if it isn't "technically" all the way there. The board itself has both (I think?) an ARM and an x86 on board, simply because Adapteva is too new to have enough libraries ported for a full OS (I think?). Their boards are now $99, which is a jump from $40, so my question is: was the price differential the reason it wasn't included, or were there too many compatibility/tooling issues? Ultimately this thing would be nice to supplement the ARM in graphics computation, if the right libraries were ever written to exploit the Adapteva for that. It could potentially run graphics and AI programs better than a standard ARM device. Just asking?
--- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Fri, Dec 23, 2016 at 7:02 AM, John Luke Gibson eaterjolly@gmail.com wrote:
i believe i spoke to them (it may have been a different company); if i recall correctly (which i probably don't), their core PCB (which they haven't released) is 12-layer, which means "insanely expensive to produce".
mostly it's down to practicality of cost, and time. if people offer to *pay* for these boards to be made, i'll get them done, no problem.
l.
On Fri, Dec 23, 2016 at 07:20:05AM +0000, Luke Kenneth Casson Leighton wrote:
Lovely board, lots of potential - but no community, because it's hard to program the fast cores: lots of low-level C programming to make best use of it, though someone did do a GNU Radio port for Google Summer of Code a while back.
I was a Kickstarter backer - but chickened out of the significant porting effort needed. The original Kickstarter board came without significant heatsinking, so it needed extra fan cooling. There was an Ubuntu port for it - and it would probably run Debian with no huge problem - armhf.
It's an ARM, an FPGA and then however many Epiphany cores - Andreas Olofsson built his ideal system for signal-processing tasks because he couldn't find the hardware he needed for his Ph.D. - the paraphrase on the lack of community is from his site.
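To give a flavour of that low-level C, here is a minimal host-side sketch along the lines of the eSDK "hello world" examples. The e-hal call names (e_init, e_open, e_load_group, e_read and friends) are quoted from memory and may differ between SDK releases, and "e_task.elf" plus the 0x2000 buffer offset are placeholders rather than real example code:

    /* Minimal host-side sketch in the style of the Parallella eSDK examples.
     * The e-hal function names are from memory and may not match a given
     * SDK release exactly; "e_task.elf" and the 0x2000 result offset are
     * placeholders invented for this sketch. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <e-hal.h>

    int main(void)
    {
        e_platform_t platform;
        e_epiphany_t dev;
        char msg[64];

        e_init(NULL);                 /* use the default platform description */
        e_reset_system();
        e_get_platform_info(&platform);

        /* open the whole workgroup and load the Epiphany-side binary onto it */
        e_open(&dev, 0, 0, platform.rows, platform.cols);
        e_load_group("e_task.elf", &dev, 0, 0, platform.rows, platform.cols, E_TRUE);

        usleep(100000);               /* crude wait; real code polls a flag in shared memory */

        /* read a result string back from core (0,0)'s local memory */
        memset(msg, 0, sizeof(msg));
        e_read(&dev, 0, 0, 0x2000, msg, sizeof(msg) - 1);
        printf("core (0,0) says: %s\n", msg);

        e_close(&dev);
        e_finalize();
        return 0;
    }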
Ericsson and others have, however, funded additional R&D, so they've got to 1024-core boards. Really useful for a compact supercomputer or specialist 5G hardware, but fairly tough for pretty much everybody else to get a toehold, because the initial learning curve is non-trivial.
Andy C.
thanks Andrew for the hat-tip (I'd tip mine if I had one)
On 28 December 2016 at 19:33, Andrew M.A. Cater amacater@galactic.demon.co.uk wrote:
Hrm... I just browsed their GitHub for FFT code (just to get a feel for their examples) and it's hopelessly over-complicated. If I accidentally inherit 99 USD, I might get one and speed up some algorithms, but I generally speed up my algorithms by making them do less work :)
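For comparison, a textbook single-core radix-2 FFT in plain C is pretty compact - a sketch of the standard iterative Cooley-Tukey form (nothing Adapteva-specific, and not taken from their examples):

    /* In-place iterative radix-2 Cooley-Tukey FFT; n must be a power of two. */
    #include <stdio.h>
    #include <complex.h>
    #include <math.h>

    static void fft(double complex *x, unsigned n)
    {
        const double PI = 3.14159265358979323846;

        /* bit-reversal permutation */
        for (unsigned i = 1, j = 0; i < n; i++) {
            unsigned bit = n >> 1;
            for (; j & bit; bit >>= 1)
                j ^= bit;
            j ^= bit;
            if (i < j) {
                double complex t = x[i];
                x[i] = x[j];
                x[j] = t;
            }
        }

        /* butterfly passes of increasing span */
        for (unsigned len = 2; len <= n; len <<= 1) {
            double complex wlen = cexp(-2.0 * I * PI / len);
            for (unsigned i = 0; i < n; i += len) {
                double complex w = 1.0;
                for (unsigned k = 0; k < len / 2; k++) {
                    double complex u = x[i + k];
                    double complex v = x[i + k + len / 2] * w;
                    x[i + k]           = u + v;
                    x[i + k + len / 2] = u - v;
                    w *= wlen;
                }
            }
        }
    }

    int main(void)
    {
        double complex sig[8] = { 1, 1, 1, 1, 0, 0, 0, 0 };
        fft(sig, 8);
        for (int i = 0; i < 8; i++)
            printf("%d: %+.3f %+.3fi\n", i, creal(sig[i]), cimag(sig[i]));
        return 0;
    }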
Russ
On Thursday, December 29, 2016, Andrew M.A. Cater amacater@galactic.demon.co.uk wrote:
that's interesting. i worked for aspex semi in 2003 and they had the exact same problem: programming ultra-parallel devices is limited to a few hundred competent people in the entire world.
interesting to me because ericsson bought aspex.
l.
On Thu, 29 Dec 2016 00:44:33 +0000 Luke Kenneth Casson Leighton lkcl@lkcl.net wrote:
Surely that's changed now, with the ubiquity of modern GPUs? It's entirely normal there to handle hundreds or thousands of threads at once, in massively parallel workloads.
Writing shaders is still not an everyday programmer task, but the pool of capable people is probably larger than just a few hundred.
- Lauri
On Thu, Dec 29, 2016 at 8:03 AM, Lauri Kasanen cand@gmx.com wrote:
for the ASP: no. not a chance. it was a massively-parallel (deep SIMD) *two bit* string-array processor with (in some variations) 256 bits of content-addressable memory, that used "tagging" to make the decision as to whether any one SIMD instruction was to be executed or not.
it was so far outside of mainstream processing "norms" that they actually had to use gcc -E pre-processing "macros" to substitute pre-defined pipeline-stuffing c code loaded with hexadecimal representations of assembly-code instructions to be sent to the SIMD unit.
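as a purely hypothetical illustration of that style (made up for this post, not the real aspex macro set), the idea was roughly: a capitalised "instruction" macro that plain gcc -E expands into c code which stuffs raw opcode words at the SIMD unit:

    /* HYPOTHETICAL illustration only - invented names, invented opcodes,
     * invented pipeline address; the real Aspex macro set was different. */
    #include <stdint.h>

    /* assumed memory-mapped instruction-pipeline FIFO (made up) */
    #define ASP_PIPE (*(volatile uint32_t *)0x40000000u)

    /* each capitalised "macro instruction" expands to raw opcode pushes */
    #define ASP_SET_TAG(mask) \
        do { ASP_PIPE = 0xA1000000u | ((mask) & 0xFFFFu); } while (0)
    #define ASP_ADD_TAGGED(dst, src) \
        do { ASP_PIPE = 0xB2000000u | (((dst) & 0xFF) << 8) | ((src) & 0xFF); } while (0)

    void asp_example(void)
    {
        ASP_SET_TAG(0x00FF);       /* only APUs whose tag matches take part  */
        ASP_ADD_TAGGED(3, 7);      /* the "add" is just a hex word in a FIFO */
    }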
a similar trick was deployed by ingenic for their X-Burst VPU: their pre-processing mechanism is a dog's dinner mess of awk and perl that would look for appropriate patterns in pre-existing c-code, whereas Aspex's technique was to just put capitalised macros directly interspersed in c code and let the pre-processing phase explicitly take care of it.
it was utterly horrible and insane, and it was only tolerated on the *promise* that, at the time each architecture was announced, it could do *CERTAIN* tasks at a hundred times faster than the AVAILABLE silicon of the time.
of course... by the time each architecture revision actually came out (18+ months later), the speed of pentium processors had increased so much that the gap was only 20, 10 or even 5 times....
to write code for the ASP you measured progress in DAYS per line of (assembly-style) code. you actually had to build a spreadsheet to work out whether it was more efficient to map the operands single-bit, one per processor, or to use the "string" feature to process operands spread out in parallel across multiple neighbouring APUs.
the factor which made this analysis so insanely complex was that the "load and unload" had to be done linearly over a standard memory bus, and took a looong time relative to the clock rate of the APUs. thus, if you only needed to do a small amount of computation, it was best to use the single-bit technique (4,000 answers in a slower time, to match the "load and unload" time), but if you had a lot of computation to perform, it was better to use the parallel technique, in order to keep the little buggers busy whilst waiting for load or unload.
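a toy stand-in for that spreadsheet, just to show the shape of the decision (every number below is invented for illustration, not a real aspex figure):

    /* Toy "mapping chooser": for each candidate operand mapping, estimate
     * how long a batch takes when serial load/unload can overlap compute.
     * All figures are invented for illustration. */
    #include <stdio.h>

    struct mapping {
        const char *name;
        long results_per_pass;    /* answers produced by one pass through the array */
        double compute_per_pass;  /* SIMD time for one pass (arbitrary units)        */
        double io_per_result;     /* serial load/unload time per answer              */
    };

    static double batch_time(const struct mapping *m, long results)
    {
        long passes = (results + m->results_per_pass - 1) / m->results_per_pass;
        double io_per_pass = m->io_per_result * m->results_per_pass;
        /* assume load/unload of one pass overlaps the compute of another,
           so each pass costs roughly whichever of the two is larger */
        double per_pass = m->compute_per_pass > io_per_pass
                        ? m->compute_per_pass : io_per_pass;
        return passes * per_pass;
    }

    int main(void)
    {
        const struct mapping candidates[] = {
            { "1 bit per APU   (4096 results/pass)", 4096, 900.0, 0.2 },
            { "32 APUs/operand ( 128 results/pass)",  128,  40.0, 0.2 },
        };
        long results = 1L << 20;   /* a million answers wanted */

        for (int i = 0; i < 2; i++)
            printf("%-40s ~%.3g units\n",
                   candidates[i].name, batch_time(&candidates[i], results));
        return 0;
    }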
... or... anything in between: 2, 4, 5, 6, 8, 12, 24, 32, 64, 96, 128 or 256-bit parallel computation, it was all the same to an array-string massively-parallel deep SIMD *bit-level* processor.
but it made programming it absolutely flat-out totally impractical and even undesirable, except for those very very rare cases, usually related to the ultra-fast content-addressable-memory capability.
i.e. extremely, extremely rare.
putting a "normal" c compiler on top of the ASP, or porting OpenCL to it, would be an estimated 50-man-year research and programming effort all on its own. just... not worth the effort, sadly.
l.
wow, that's hilarious, thanks for that luke
On 29 December 2016 at 08:19, Luke Kenneth Casson Leighton lkcl@lkcl.net wrote: