ok so, from the anonymous benefactor (independent of the shakti team), the one who sponsored me with the zc706 FPGA developer board, he has just had an interesting meeting with MOSIS, and has confirmed that they have an LPDDR3 PHY, so in combination with the DDR3 *controller* that was developed and released by someone working at CERN, this would be the last major piece of the "interfaces" side of the puzzle that, until now, blocked progress.
so he asked, also, what would it take to get things ready within a year, to hand over the ASIC-based design to MOSIS, for them to turn it into an ASIC, and i said "a team of engineers would need to be paid for". he asked - and please note the question very carefully - "would USD 250,000 be enough?" to which i replied (genuinely) yes... if done carefully. software also has to be taken care of.
please note: *that's as far as the conversation has gone so far*.
it is still exciting, and the next phase would be to get a strong commitment and then i can start finding people to do software bring-up and also develop the VLSI / VHDL which will put all this together - mostly that's glue logic for the interfaces (putting them onto a "Tile" interface if using the rocket-chip or BOOM architecture) but also designing a multiplexer GPIO bank.
i'm also talking to jeff from nyuzi, he designed a software-driven "compute engine that happens to be reasonably good at 3D", the interesting bit being that he's focussed on working out which areas need performance improvements. this is something that's almost completely lacking in the published academic world... *because nobody in the academic world has designed and published a GPU!*
we worked out that nyuzi is approximately 1/16th the speed of MALI400. roughly. although area-for-area it's quite hard to tell whether that's a fair assessment because you can't *get* die areas for MALI400 (anyone know anything better than these estimates? https://forum.beyond3d.com/posts/1176110/ ) and it's the performance / mm^2 / watt that's critical, we worked out that if you put in 2 nyuzi cores and managed to halve the number of instructions / pixel by replacing critical path blocks with hardware-rendering ones, you'd end up at about 25% the performance of MALI400 and that i feel would be "good enough" for a first version. i'd be interested to hear what people think, here.
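the back-of-envelope arithmetic above can be written out as a short python sketch. the 1/16 baseline, the 2 cores and the halved instructions per pixel are the figures from the estimate; the function name, and the assumption that throughput scales linearly with core count and inversely with instructions per pixel, are illustrative simplifications and not measurements.

```python
from fractions import Fraction


def relative_performance(baseline, cores, speedup_per_core):
    """Estimate performance relative to a reference GPU.

    Assumption (optimistic): throughput scales linearly with the number
    of cores and with the per-core speedup, with no memory-bandwidth or
    scheduling losses.
    """
    return baseline * cores * speedup_per_core


# Figures from the estimate above: one nyuzi core at roughly 1/16 of a
# MALI400, 2 cores, and half the instructions per pixel (a 2x speedup).
estimate = relative_performance(Fraction(1, 16), cores=2, speedup_per_core=2)
print(estimate, float(estimate))  # 1/4 0.25
```

any real design would land below this figure, since shared memory bandwidth is ignored here.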
the *general-purpose compute* performance of nyuzi on the other hand is really good.
anyway lots of planning to do.
l.
--- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
Whoo, excitement. I *really* wish I could help but I'm kind of a perpetually budding hobbyist here. Let me know if you need something strung up in 7400- or 4000-series logic, tho -- *that* is a language I can speak ;)
On Fri, Jan 19, 2018 at 4:12 PM, Christopher Havel laserhawk64@gmail.com wrote:
Whoo, excitement. I *really* wish I could help but I'm kind of a perpetually budding hobbyist here. Let me know if you need something strung up in 7400- or 4000-series logic, tho -- *that* is a language I can speak ;)
:)
On Fri, 19 Jan 2018 16:06:51 +0000 Luke Kenneth Casson Leighton lkcl@lkcl.net wrote:
so he asked, also, what would it take to get things ready within a year, to hand over the ASIC-based design to MOSIS, for them to turn it into an ASIC, and i said "a team of engineers would need to be paid for". he asked - and please note the question very carefully - "would USD 250,000 be enough?" to which i replied (genuinely) yes... if done carefully. software also has to be taken care of.
A libre risc-v soc would indeed be great.
we worked out that if you put in 2 nyuzi cores and managed to halve the number of instructions / pixel by replacing critical path blocks with hardware-rendering ones, you'd end up at about 25% the performance of MALI400 and that i feel would be "good enough" for a first version. i'd be interested to hear what people think, here.
25% of MALI400 is not enough for even mobile games: it is enough for spinning windows ala Compiz, but that can also be done with much less. So if the goal is something usable, rather than a platform to kickstart nyuzi development on, targeting lower perf with lower power usage would be better.
Now, I don't remember if video decode blocks would be in the RISC parts or in nyuzi. GPUs are usually good at colorspace conversion and scaling, so if there are no dedicated blocks for those, then the nyuzi core should be targeted/sized at that usage. Bicubic upscaling + color conversion to fullhd - the texture units, memory bandwidth and compute parts need to have enough oomph.
Radeon R300 parts are capable of that. I'm not sure how that maps to % of Mali.
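To put rough numbers on the scaling and colour-conversion workload described above, here is a minimal Python sketch. The 30 fps frame rate and the 4 bytes per output pixel are assumptions chosen for illustration, and the function names are hypothetical; the 16 taps per pixel follow from bicubic filtering reading a 4x4 neighbourhood of source samples.

```python
def output_write_bandwidth(width, height, bytes_per_pixel, fps):
    """Bytes per second written to the output framebuffer."""
    return width * height * bytes_per_pixel * fps


def bicubic_taps_per_second(width, height, fps, taps=16):
    """Source samples read per second; bicubic uses a 4x4 = 16 tap window."""
    return width * height * fps * taps


# Assumed target: fullhd (1920x1080) output, 32-bit RGB pixels, 30 fps.
write_bw = output_write_bandwidth(1920, 1080, 4, 30)
taps = bicubic_taps_per_second(1920, 1080, 30)
print(write_bw)  # 248832000 bytes/s, roughly 249 MB/s of writes
print(taps)      # 995328000 filter taps/s before any caching
```

These figures cover only the output side; source reads, decode traffic and display scan-out would add to the memory bandwidth budget.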
- Lauri
On Fri, Jan 19, 2018 at 4:22 PM, Lauri Kasanen cand@gmx.com wrote:
On Fri, 19 Jan 2018 16:06:51 +0000
25% of MALI400 is not enough for even mobile games: it is enough for spinning windows ala Compiz, but that can also be done with much less. So if the goal is something usable, rather than a platform to kickstart nyuzi development on, targeting lower perf with lower power usage would be better.
general-purpose software-based rendering engines are not particularly good at 3D. luckily jeff's work shows exactly where the prime focus would be needed. he did however point out that due to the sheer quantity of hardware needed to get that optimised hardware-accelerated "function" it would turn nyuzi into a totally different engine.
still thinking about it.
Now, I don't remember if video decode blocks would be in the RISC parts or in nyuzi.
neither. opencores has a series of video "blocks" that could go in a separate engine, DMA-based etc. etc. these will need to be added as well.
l.
That's awesome news on so many fronts! (libre LPDDR3 PHY, working relationship with MOSIS, possibility of funding, VHDL glue logic for interfaces, multiplexer GPIO bank, performance tuning a libre GPU, video coprocessors with DMA) Sounds like lots of fun!
On Sat, Jan 20, 2018 at 5:47 AM, Richard Wilbur richard.wilbur@gmail.com wrote:
That's awesome news on so many fronts! (libre LPDDR3 PHY, working relationship with MOSIS, possibility of funding, VHDL glue logic for interfaces, multiplexer GPIO bank, performance tuning a libre GPU, video coprocessors with DMA) Sounds like lots of fun!
yehyeh! don't forget unlimited resources at an indian university to create an 8-stage pipeline harvard architecture superscalar 16-core SMP processor that uses only 120mW per core in 28nm and can run at a 2.5ghz clock rate, mustn't forget that, y'know - i mean it's _just_ the processor bit...
l.
On Fri, Jan 19, 2018 at 6:06 PM, Luke Kenneth Casson Leighton lkcl@lkcl.net wrote:
i'm also talking to jeff from nyuzi, he designed a software-driven "compute engine that happens to be reasonably good at 3D", the interesting bit being that he's focussed on working out which areas need performance improvements. this is something that's almost completely lacking in the published academic world... *because nobody in the academic world has designed and published a GPU!*
we worked out that nyuzi is approximately 1/16th the speed of MALI400. roughly. although area-for-area it's quite hard to tell whether that's a fair assessment because you can't *get* die areas for MALI400 (anyone know anything better than these estimates? https://forum.beyond3d.com/posts/1176110/ ) and it's the performance / mm^2 / watt that's critical, we worked out that if you put in 2 nyuzi cores and managed to halve the number of instructions / pixel by replacing critical path blocks with hardware-rendering ones, you'd end up at about 25% the performance of MALI400 and that i feel would be "good enough" for a first version. i'd be interested to hear what people think, here.
That would by no means be enough. We need video decoding blocks; without those the SoC is only as good as an rpi for e.g. the education market that you have mentioned, and I'm certainly not personally interested in using a system as a desktop/laptop that can't play videos without stuttering while potentially doing something else at the same time. And I'm not even talking about games here. I don't know if there is a simd extension on the cpu cores that someone could write a real time decoder for and hook up to whatever's needed for firefox etc to automatically redirect to (not that that's an easy thing), but 25% of an outdated low performance gpu sounds really low to me. What's the point of having such an awesome low power core if we can't use the thermal envelope for pushing graphics?
the *general-purpose compute* performance of nyuzi on the other hand is really good.
What is gpgpu used on desktops right now for?
On Mon, Jan 22, 2018 at 8:38 PM, Bill Kontos vkontogpls@gmail.com wrote:
That would by no means be enough. We need video decoding blocks,
... you mean like this? https://opencores.org/project,video_systems
the *general-purpose compute* performance of nyuzi on the other hand is really good.
What is gpgpu used on desktops right now for?
something that usually uses four to ten times more power than the target power budget for the entire SoC.
l.
On Mon, Jan 22, 2018 at 11:12 PM, Luke Kenneth Casson Leighton lkcl@lkcl.net wrote:
... you mean like this? https://opencores.org/project,video_systems
Yes, maybe with the addition of hevc. That would be ideal.
On Tue, Jan 23, 2018 at 12:23 PM, Bill Kontos vkontogpls@gmail.com wrote:
On Mon, Jan 22, 2018 at 11:12 PM, Luke Kenneth Casson Leighton lkcl@lkcl.net wrote:
... you mean like this? https://opencores.org/project,video_systems
Yes, maybe with the addition of hevc. That would be ideal.
do you happen to know if the building blocks - the key high-cpu-load parts - of HEVC (aka H.265) _happen_ to be the same or near-identical to MPEG or H.264 and so on?
also critical will be a YUV->RGB converter plus scaler... and oh look! https://opencores.org/project,video_stream_scaler
if anyone remembers the National Semi Geode LX800 (bought by AMD), that, staggeringly, could actually do 720p video displayed on 1600x1200 (with a bit of a tear at times), and could easily do 1280x720 (without tearing) @ 30fps.... *ENTIRELY IN SOFTWARE*... because it had a YUV->RGB converter hard macro that took care of the most expensive bit.
... and that was a 500mhz 486 with DDR2 RAM! absolutely incredible.
so, anyway, yes: each little piece of the puzzle will be needed, saving big chunks of CPU cycles.
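to show why the YUV->RGB step is worth putting in hardware, here is a minimal python sketch of the per-pixel work. it assumes BT.601 limited-range video and uses the widely published 8-bit fixed-point approximation; the coefficients actually used by the Geode's hard macro are not stated in this thread, and the function names are illustrative.

```python
def clamp8(value):
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, value))


def yuv_to_rgb(y, u, v):
    """Convert one BT.601 limited-range YUV pixel to 8-bit RGB.

    Fixed-point form: coefficients are scaled by 256, and +128 rounds
    before the shift back down by 8 bits.
    """
    c = y - 16
    d = u - 128
    e = v - 128
    r = clamp8((298 * c + 409 * e + 128) >> 8)
    g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clamp8((298 * c + 516 * d + 128) >> 8)
    return r, g, b


print(yuv_to_rgb(16, 128, 128))   # (0, 0, 0): video black
print(yuv_to_rgb(235, 128, 128))  # (255, 255, 255): video white

# The conversion runs once per output pixel, so 1280x720 at 30fps means:
conversions_per_second = 1280 * 720 * 30
print(conversions_per_second)  # 27648000
```

each conversion is several multiplies, adds, shifts and clamps, so on a 500mhz core tens of millions of them per second would eat a large share of the available cycles, which is the cost the hard macro removes.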
l.
On Tue, Jan 23, 2018 at 2:45 PM, Luke Kenneth Casson Leighton lkcl@lkcl.net wrote:
do you happen to know if the building blocks - the key high-cpu-load parts - of HEVC (aka H.265) _happen_ to be the same or near-identical to MPEG or H.264 and so on?
I don't know. But youtube is pushing vp9 and its successor av1 now. These are royalty free, while the situation with h.265 is a bit unclear to me in regards to what products need royalties or not. One thing I do know is that h.265 uses blocks of 64x64 pixels for compression vs 16x16 of h.264.
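As a rough illustration of what the block-size difference means per frame, the following Python sketch counts how many blocks cover one picture. The 1920x1080 frame size and the function name are assumptions for illustration, and 64x64 is the largest block size h.265 allows; an encoder may split blocks further, so this is an upper bound on block size, not a fixed grid.

```python
import math


def blocks_per_frame(width, height, block):
    """Number of block x block tiles needed to cover a frame.

    Partial blocks at the right and bottom edges count as whole blocks.
    """
    return math.ceil(width / block) * math.ceil(height / block)


# Assumed frame size: 1920x1080.
h264_macroblocks = blocks_per_frame(1920, 1080, 16)
hevc_max_blocks = blocks_per_frame(1920, 1080, 64)
print(h264_macroblocks, hevc_max_blocks)  # 8160 510
```

The count alone says nothing about decode cost per block, so it does not answer whether the heavy building blocks are shared between the two codecs.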
also critical will be a YUV->RGB converter plus scaler... and oh look! https://opencores.org/project,video_stream_scaler
if anyone remembers the National Semi Geode LX800 (bought by AMD), that, staggeringly, could actually do 720p video displayed on 1600x1200 (with a bit of a tear at times), and could easily do 1280x720 (without tearing) @ 30fps.... *ENTIRELY IN SOFTWARE*... because it had a YUV->RGB converter hard macro that took care of the most expensive bit.
... and that was a 500mhz 486 with DDR2 RAM! absolutely incredible.
That sounds impressive indeed.
so, anyway, yes: each little piece of the puzzle will be needed, saving big chunks of CPU cycles.
I have a thin client with a 366MHz AMD Geode. YouTube anything (even @ 240p) almost literally sets it on fire, even with an extremely lightweight Linux distro on it. It doesn't so much skip frames as it does entire 10+sec chunks... and that's with 512MB RAM. I can put a gig in there, sort of... system has a low-level timing issue, I found out from an insider guy -- there is ONE make and model of 1gb PC2700 out there that will work. It's an APacer brand stick and it's absolutely hen's teeth because I've never found it. I've been looking for multiple years now...
On Wed, Jan 24, 2018 at 1:48 AM, Christopher Havel laserhawk64@gmail.com wrote:
I have a thin client with a 366MHz AMD Geode. YouTube anything (even @ 240p) almost literally sets it on fire,
you need to find and compile up the accelerated video extension. last time i did that was 10 years ago. without it the processor will have to do its own YUV-to-RGB conversion and yes it will melt.
l.
On Tue, 23 Jan 2018 20:48:13 -0500 Christopher Havel laserhawk64@gmail.com wrote:
I have a thin client with a 366MHz AMD Geode. YouTube anything (even @ 240p) almost literally sets it on fire, even with an extremely lightweight Linux distro on it. It doesn't so much skip frames as it does entire 10+sec chunks... and that's with 512MB RAM. I can put a gig in there, sort of... system has a low-level timing issue, I found out from an insider guy -- there is ONE make and model of 1gb PC2700 out there that will work. It's an APacer brand stick and it's absolutely hen's teeth because I've never found it. I've been looking for multiple years now...
..plus you need to use proper software, not Firefox/Chrome/whatever, since those use inefficient methods to be able to overlay and transform the content. Something like mplayer is both fast and likely to support the accelerator(s).
- Lauri
arm-netbook@lists.phcomp.co.uk