Power.org Barcelona retrospective: Programming for Cell
Carl Bender - June 17, 2005
At Power.org's European conference in Barcelona last week, one of the main attractions was IBM's Alex Chow and his presentation on programming for the Cell architecture. Read on for the highlights...
It was exactly one week ago at the Power.org conference in Barcelona that IBM's Alex Chow gave a presentation on how best to program for Cell, along with some examples of Cell's potential power when utilized correctly. As leader of a team of programmers developing code libraries, demos, and workloads for Cell at IBM's Cell Design Center in Austin, Chow was perfectly suited to the task, and his presentation proved both informative and insightful. Just as interesting, the Q&A portion at the end of the presentation raised some questions and received some answers that we will be covering at the end of this article.
The first part of the presentation dealt with porting legacy code to Cell, and specifically how to harness the power of the SPEs in such a way as to see a noticeable improvement on such code. Essentially, the suggested path for programmers wishing to bring the SPEs to bear on legacy code was to isolate individual subroutines and recompile them to run on the SPEs, letting the PPE core take care of the scheduling. The clear implication, of course, is that the more subroutines a program can have siphoned off to the SPEs, the greater the gains that program will see when running on Cell.
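To make the offloading idea concrete, here is a minimal sketch in Python rather than actual Cell code. It assumes worker threads as stand-ins for the SPEs and the main thread as a stand-in for the PPE; the sub-routines `scale` and `total` and the helper `run_offloaded` are hypothetical names invented for illustration, not anything shown in the presentation.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SPES = 8  # the 8-SPE Cell configuration discussed in the article


def scale(samples, gain):
    """Hypothetical legacy sub-routine: multiply each sample by a gain."""
    return [s * gain for s in samples]


def total(samples):
    """Hypothetical legacy sub-routine: add up the samples."""
    return sum(samples)


def run_offloaded(jobs):
    """Play the PPE's role: hand each (function, args) job to a worker.

    Threads stand in for SPEs (an assumption of this sketch).
    Results are returned in the order the jobs were submitted.
    """
    with ThreadPoolExecutor(max_workers=NUM_SPES) as pool:
        futures = [pool.submit(fn, *args) for fn, args in jobs]
        return [f.result() for f in futures]


results = run_offloaded([
    (scale, ([1, 2, 3], 2)),
    (total, ([1, 2, 3],)),
])
```

The point of the sketch is the shape of the code, not its speed: the isolated sub-routines know nothing about each other, and only the scheduling side decides where they run.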
Next up was strategy and discussion for coding natively on Cell, and there was nothing too surprising here. The usual subjects were touched upon: essentially, that one should try to parallelize as much as possible, make use of the SPEs' ability to run code independently, and keep the 256 KB of private local storage per SPE in mind when designing code for Cell. Probably more interesting than the specific hardware aspects was a slide representing Alex Chow's programming philosophy on the best direction from which to approach the entire process.
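As an aside on the local-storage constraint mentioned above, the following Python sketch (not part of Chow's slides) shows the kind of arithmetic involved in cutting a data set into tiles that fit in 256 KB. The 64 KB reserved for code and stack, the use of two buffers, and the function names are all assumptions made purely for illustration.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local storage, as given in the article
RESERVED_BYTES = 64 * 1024      # assumed budget for code and stack (illustrative)
FLOAT_BYTES = 4                 # one single-precision float


def max_tile_elems(buffers=2):
    """Floats per tile when `buffers` equal-sized buffers share the free space."""
    usable = LOCAL_STORE_BYTES - RESERVED_BYTES
    return usable // (buffers * FLOAT_BYTES)


def tiles(data, tile_elems):
    """Yield consecutive slices of `data`, each at most `tile_elems` long."""
    for start in range(0, len(data), tile_elems):
        yield data[start:start + tile_elems]


def sum_in_tiles(data, tile_elems):
    """Process one tile at a time, as a worker limited to small storage would."""
    return sum(sum(tile) for tile in tiles(data, tile_elems))
```

Under these assumed numbers, two buffers leave room for 24,576 floats each, so a 100,000-element array would be processed in five tiles. A real budget would depend on how much of the local storage the program itself occupies.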
For more information on the intricacies of the Cell hardware, and its SPEs in particular, one would be advised to look up Sony's GDC presentation slides, as those go into more depth concerning the hardware implementation than did IBM's slides at Barcelona. One thing Sony's presentation at GDC didn't have, though, that IBM provided at Barcelona was a direct comparison between an Intel Xeon processor and an 8-SPE Cell processor running small and large FFTs, with the results for both normalized to 3.2 GHz. We'll let the slides speak for themselves - keep in mind that for the small FFT comparison, the Xeon is competing against one SPE alone, and not the Cell as a whole. The different performance results for the SPE are for straight library code vs. code specifically optimized for the SPE, the purpose of the comparison being to show how much additional power can be wrung from the SPEs should a programmer take the time and effort to customize the code.
After the FFT comparison portion of the presentation concluded, the floor was opened to questions from the audience and from individuals watching the live webcast over the Internet. One of the first questions asked was when IBM was planning to formally open-source Cell software development and grant access to its libraries and knowledge base. With the report in EE Times earlier in the week that IBM was planning just such a move, this was of great interest to the programming community at large. Chow's answer at first was that it would be open-sourced soon, but he revised that to later this summer after giving it some more thought. Disappointing to those who may have been expecting Cell to be open-sourced during the course of the conference, but not too long a wait all the same. Another question, about open-sourcing the actual Cell hardware, met with a different response, with Chow stating that a formal decision on that had not yet been reached by the members of the STI group.
Lastly, a member of the audience asked how difficult Chow felt it would be to program for Cell, something that has been discussed extensively around the Internet, especially as it applies to Sony's upcoming PlayStation 3. The answer Chow gave was in the form of an analogy, stating that when programming for Cell, it might be best to approach it in the same fashion as one might approach dividing a workload between nine separate workstations, with each of the workstations in the analogy representing an SPE (plus the PPE). What he was trying to emphasize was that inasmuch as a task could be threaded or separated to run on different CPUs, so too could it be divided to run on the SPEs. The audience member felt his question was misunderstood, though, and revised it to ask specifically what tools IBM might be able to provide to programmers interested in programming for Cell but feeling daunted by the task. During the course of his response, Chow revealed that a compiler currently in the prototyping stages at IBM would dynamically chop up code and thread it such that it would automatically make use of the SPEs. One has to wonder just how efficient such a compiler can be, but it will come as a much-welcomed tool for developers should the project come to fruition.
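Chow's nine-workstation analogy can be sketched in a few lines of Python. This is only an illustration of the idea of splitting one workload nine ways: threads stand in for the eight SPEs plus the PPE, and the function names and the sum-of-squares task are hypothetical choices, not anything taken from the presentation.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 9  # eight SPEs plus the PPE, as in the analogy


def split(data, parts):
    """Split `data` into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Hypothetical per-worker task: each worker sees only its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data):
    """Divide the workload nine ways, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        return sum(pool.map(sum_of_squares, split(data, NUM_WORKERS)))
```

Python threads will not actually speed up a CPU-bound task like this one, so the sketch shows only the structure: any job that can be cut into independent pieces for separate machines can, in principle, be cut the same way for the SPEs.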
The entire webcast may be viewed on the Power.org website.