
Inside IBM's Xbox Chip

sonicfan

Venerable Member
Nothing really new, but it's from BusinessWeek.

Inside IBM's Xbox Chip
Big Blue offers a sneak peek at the powerful chip at the heart of Microsoft's next-generation gaming console

There's a lot riding on Microsoft's new and much-hyped gaming console, the Xbox 360 -- and not just for Microsoft (MSFT). IBM (IBM), maker of the chips that will run the machine, has a lot at stake, too.

IBM is keen to improve its reputation for manufacturing semiconductors after Apple (AAPL) earlier this year said it would begin using chips from Intel (INTC) starting in 2006 (see BW Online, 6/7/05 "Apple Hits the Intel Switch"). At least part of the reason for the switch was Apple's frustration with the pace of development at IBM and Big Blue's inability to deliver a version of its PowerPC 970 chip suitable for use in a notebook computer. At other times, IBM has struggled to produce the number of chips that Apple needed.

To prove its chipmaking mettle, IBM is showing what its new Xbox chip is made of -- literally -- on Oct. 25. The outfit will make the disclosure at the Fall Processor Forum, an annual gathering of chip engineers taking place Oct. 25-26 in San Jose, Calif.

MULTI-BRAINED. The chip was developed specifically for Microsoft, and as such IBM won't sell it to any other customers. That's in contrast to the arrangement for IBM's Cell Processor, which is going into Sony's (SNE) PlayStation 3 console but is also being used in other specialty computer systems (see BW Online, 8/25/05, "Cell: A Chip That Is Going Public").

Microsoft will own the rights to the chip, says IBM Vice-President James Comfort. IBM says the chip is in full production at its factory in East Fishkill, N.Y., and at a plant in Singapore owned by Chartered Semiconductor (CHTR), which will serve as a second source for Microsoft, and was a partner in the development.

The Xbox 360 is only one of three gaming systems for which IBM's microelectronics group has either fully or partially been involved in chip design and development. IBM collaborated with Sony and Toshiba (TOSBF) on the development of the Cell Processor in PlayStation 3, due for release in early 2006. It has also landed a chip in the forthcoming Nintendo Revolution.

The Xbox 360 chip will feature three "cores" -- a core being the central brain of a chip. Chip companies including IBM, Intel, and Advanced Micro Devices (AMD) have been building chips for personal computers using two cores. Just last week Apple Computer announced a new version of its PowerMac G5 computers which use a dual-core version of IBM's PowerPC chip.

"DESIGN RE-USE." Having two -- or in this case, three -- cores allows a chip to more efficiently split computing tasks and thus get more work done in less time. Adding a second core generally delivers a significant improvement to computing performance while minimizing the impact of power consumption and heat. Getting the right tradeoff between those two forces of physics is a constant battle for chip engineers. The more power a chip consumes, the hotter it gets, and the harder it is to keep cool.

Each core will also be able to act on two threads at once. Think of threads as customers at a grocery store waiting in a checkout line. Each core would be like a checkout clerk who can work with two customers at once, thus shortening the wait time. Each core's ability to handle two jobs simultaneously means the chip can act like it is in fact six chips. Each core will operate at 3.2 gigahertz, which is comparable to the processing speed of Intel's fastest Pentium processor.

Analyst Kevin Krewell of Instat/MDR, which is hosting the Fall Processor Forum, says the new IBM chip shares much of its lineage with the Cell Processor and other chips in the PowerPC family that have come before. "The basic core at the heart of this chip is very similar to that in the Cell Processor," Krewell says. "There's definitely some design re-use going on here."

CONSOLE RACE. Success of the Xbox console, due to be released on Nov. 22, is crucial for Microsoft as well. While the company doesn't disclose Xbox results specifically, the division that specializes in home entertainment reported a loss of $391 million on sales of $3.2 billion, or 8% of the total, in the year that ended June 30.

Microsoft loses money on every first-generation Xbox it sells. New versions of the product will be managed with much closer attention to cost. "IBM was given a size and a cost to shoot for, and IBM put as many cores in and as much performance as it could within those boundaries," says Krewell.

Microsoft, in a bid to get its next-generation gaming console on the market before Sony's PlayStation 3, held suppliers to tight deadlines. IBM's Comfort said the company sped up its development cycle to meet Microsoft's demanding timetable. IBM's Engineering Technology Services unit kicked development into high gear, cutting a process that would have normally taken 30 to 36 months down to 24 months. That meant making sure there were no mistakes made along the way. "We paid extremely close attention to detail in our design practices," Comfort says.

COMING ATTRACTION. "Microsoft is really touting the fact that the Xbox 360 will be on the market before the PlayStation, and that means that IBM had to build the chip right the first time," Krewell says. "There was little opportunity to go back and re-spin the silicon."

As the Xbox 360 makes its debut in the coming weeks, consumers and critics across the U.S. will get the chance to put IBM's chipmaking prowess to the test.
 

ThirdEye

Member
Microsoft, in a bid to get its next-generation gaming console on the market before Sony's PlayStation 3, held suppliers to tight deadlines. IBM's Comfort said the company sped up its development cycle to meet Microsoft's demanding timetable. IBM's Engineering Technology Services unit kicked development into high gear, cutting a process that would have normally taken 30 to 36 months down to 24 months. That meant making sure there were no mistakes made along the way. "We paid extremely close attention to detail in our design practices," Comfort says.
Even the CPU was rushed. No revisions allowed. What happens if there should be a 'mistake'?
 

Vince

Banned
Each core will also be able to act on two threads at once. Think of threads as customers at a grocery store waiting in a checkout line. Each core would be like a checkout clerk who can work with two customers at once, thus shortening the wait time. Each core's ability to handle two jobs simultaneously means the chip can act like it is in fact six chips. Each core will operate at 3.2 gigahertz, which is comparable to the processing speed of Intel's fastest Pentium processor.

That's utter bullshit, how hard is it to understand or explain SMT? It's not acting like 6 chips, it's acting like 3 because it only has the computational structures to actively work on 3 threads at once.

It's like having 3 check-out clerks who each watch over 2 check-out lines. While one customer is loading more groceries onto the conveyor belt, the clerk can turn and work on the other line, or vice-versa. Each clerk can never be working on more than 1 check-out line per time.
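The clerk analogy above can be sketched as a toy simulation. This is only an illustration of the switch-on-stall model described in this post, not of how the Xbox 360 CPU actually schedules threads. The function name `simulate_core` and the (work, stall) description of a thread are made up for the example: each thread computes for `work` cycles, then waits `stall` cycles (say, on memory), and the core serves at most one ready thread per cycle.

```python
def simulate_core(threads, cycles):
    """Toy switch-on-stall core: at most one thread is served per cycle.

    `threads` is a list of hypothetical (work, stall) tuples: a thread runs
    for `work` cycles, then is stalled for `stall` cycles, and repeats.
    Returns the number of cycles the core spent on each thread.
    """
    work_left = [work for work, _ in threads]
    stall_left = [0] * len(threads)
    served_cycles = [0] * len(threads)

    for _ in range(cycles):
        # The clerk picks the first line whose customer is ready.
        served = None
        for tid in range(len(threads)):
            if stall_left[tid] == 0:
                served = tid
                break

        # Stalls elapse in wall-clock time, whether or not anyone is served.
        for tid in range(len(threads)):
            if stall_left[tid] > 0:
                stall_left[tid] -= 1

        if served is not None:
            served_cycles[served] += 1
            work_left[served] -= 1
            if work_left[served] == 0:
                # Burst finished: the thread now waits, then starts a new burst.
                work_left[served] = threads[served][0]
                stall_left[served] = threads[served][1]

    return served_cycles


if __name__ == "__main__":
    print(simulate_core([(2, 2)], 8))          # one thread on the core
    print(simulate_core([(2, 2), (2, 2)], 8))  # two threads on the core
```

With one thread that stalls half the time, the core sits idle half the time. With two such threads the second fills the gaps, but the total never exceeds one served thread per cycle, which is the point being argued here.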
 

HyperionX

Member
Vince said:
That's utter bullshit, how hard is it to understand or explain SMT? It's not acting like 6 chips, it's acting like 3 because it only has the computational structures to actively work on 3 threads at once.

It's like having 3 check-out clerks who each watch over 2 check-out lines. While one customer is loading more groceries onto the conveyor belt, the clerk can turn and work on the other line, or vice-versa. Each clerk can never be working on more than 1 check-out line per time.

Truth be told, the original explanation is the right one for SMT. Real SMT chips can work on two threads at the same time. However, the Xbox 360 doesn't really support SMT but rather some form of coarser-grained MT.
 

Stinkles

Clothed, sober, cooperative
I don't understand a word of that, yet I still claim this means the 360 is weak and will break when you buy it.
 

Marathon

Sony's DrGAKMAN
Despite the talk of high clock speed and three cores, so far the stuff we are seeing from the 360 chip isn't any better than what people are already running on single core pc systems - including games that are supposedly using all three cores.

It's hard to get excited about a system whose games are on average the same as what I am running right now on my system.
 

Vince

Banned
HyperionX said:
Truth be told, the original explanation is the right one for SMT. Real SMT chips can work on two threads at the same time.

If you can, concurrently, execute work on two threads at time t, then how is that not a form of SMP|CMP? You, by necessity, need duplicate execution/logic structures. SMT is a way to maximize the efficiency of your execution resources by masking pipeline bubbles/stalls by allowing for the quick switch to, and execution of, a secondary thread when the first stalls, no? Where did I go wrong?
 

mrklaw

MrArseFace
Vince said:
That's utter bullshit, how hard is it to understand or explain SMT? It's not acting like 6 chips, it's acting like 3 because it only has the computational structures to actively work on 3 threads at once.

It's like having 3 check-out clerks who each watch over 2 check-out lines. While one customer is loading more groceries onto the conveyor belt, the clerk can turn and work on the other line, or vice-versa. Each clerk can never be working on more than 1 check-out line per time.


Simplify it by stripping it down to 1 core to help the explanation for a muppet like me :)

Can it do the work of 2x3.2 GHz cores? or is it effectively 2x1.6GHz cores?

So is the arrangement primarily to allow you to get closer to a theoretical single 3.2GHz processor's capability, avoiding any pauses by using those pauses to do other stuff?

I'm just wondering how difficult that is to program. If it's 2x3.2 (or even 2x1.6) then it's predictable, so you could slap, e.g., an AI routine on one core, knowing you have 1.6GHz effective power to do your routine.

But if your thread only gets what's left over from the other thread, then it's much more difficult to know what you'll get done, and therefore more difficult to plan for?
 

Vince

Banned
mrklaw said:
Simplify it by stripping it down to 1 core to help the explanation for a muppet like me :)

Can it do the work of 2x3.2 GHz cores? or is it effectively 2x1.6GHz cores?

I would have said, prior to being questioned above, that it can do the "work" of 1*3.2 GHz core, as it has the ability to work on 1 thread at 3.2GHz at a high level of efficiency. It's always been my understanding that SMT is a way to keep the resources of a processor (execution units, caches, control logic, memory) busy by allowing for the quick switch from a stalled thread to a secondary one without the huge context-switching penalty. At no time does it allow for more than 100% of the theoretical peak to be reached, but it does help hide latency (I think this is why Intel went with their Jackson technology in the 20-odd stage Pentium4) and/or improve throughput dramatically over single-threaded apps in specific cases (I think IBM uses it more for this... could be wrong).

The "2*1.6GHz core" thing, I assume you lifted it from Deano(?) and I'd hesitate to speculate on that, but I assume he stated that due to Cell's PPE having significantly more duplicated computational structures in its DD2 revision core than the traditional SMT design, but it's speculation off the top of the head of a drunk guy who just got back from watching the WhiteSox sweep some shit-ass team from below the mason-dixon line.

mrklaw said:
So is the arrangement primarily to allow you to get closer to a theoretical single 3.2GHz processor's capability, avoiding any pauses by using those pauses to do other stuff?

Uh huh.

As for the programming questions, I default to Fafalada. Maybe Marco (or Panajev), but as a fellow Italian I know how lazy we tend to be when it comes to that whole work thing. ;) But, I dunno, I could be very, very wrong... so I go sleepy now.
 

gofreak

GAF's Bob Woodward
Vince said:
The "2*1.6GHz core" thing, I assume you lifted it from Deano(?) and I'd hesitate to speculate on that, but I assume he stated that due to Cell's PPE having significantly more duplicated computational structures in its DD2 revision core than the traditional SMT design, but it's speculation off the top of the head of a drunk guy who just got back from watching the WhiteSox sweep some shit-ass team from below the mason-dixon line.

There was some info in another presentation/paper that touched on this. Basically in the Cell PPE, if one thread can "go" and the other is stalled, the former will get all cycles at that time. But if both threads can "go", the cycles are dished out alternately between them, i.e. one active thread does not monopolise resources at the expense of another thread that could also be active. So if you had two threads that could always be active, it'd be like having 2*1.6GHz processors. But of course, and as is likely in the real world, if one thread is more active than the other you could think of it as an arbitrary split, e.g. one 2GHz processor and one 1.2GHz processor, etc.
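To put numbers on that, here is a rough sketch of the arbitration rule as described in this post: a stalled thread gives up its cycles, and two ready threads take turns. The names `share_cycles` and `effective_ghz` and the per-cycle ready flags are invented for the illustration; nothing here is taken from IBM's documentation, and the real PPE pipeline is more involved.

```python
CLOCK_GHZ = 3.2  # core clock quoted in the article


def share_cycles(ready_a, ready_b):
    """Split cycles between two threads.

    `ready_a` and `ready_b` are per-cycle booleans (True = thread can "go").
    If only one thread is ready it gets the cycle; if both are ready,
    contested cycles alternate between them. Returns [cycles_a, cycles_b].
    """
    issued = [0, 0]
    turn = 0  # which thread wins the next contested cycle
    for a, b in zip(ready_a, ready_b):
        if a and b:
            issued[turn] += 1
            turn = 1 - turn
        elif a:
            issued[0] += 1
        elif b:
            issued[1] += 1
    return issued


def effective_ghz(issued, total_cycles):
    """Express each thread's share of cycles as an equivalent clock speed."""
    return [CLOCK_GHZ * n / total_cycles for n in issued]


if __name__ == "__main__":
    # Both threads always ready: an even split of the 3.2GHz clock.
    even = share_cycles([True] * 10, [True] * 10)
    print(even, effective_ghz(even, 10))

    # Thread B stalls for the last 2 of 8 cycles, so thread A picks them up.
    uneven = share_cycles([True] * 8, [True] * 6 + [False] * 2)
    print(uneven, effective_ghz(uneven, 8))
```

The first case works out to roughly 1.6GHz each, and the second to roughly 2GHz and 1.2GHz, matching the arbitrary-split example above.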

I don't know if this is the same on X360's core. I'm guessing this might be the difference mentioned by that Crytek guy.
 

Panajev2001a

GAF's Pleasant Genius
Brief recap after the laptop ate my long-ass post (tm) :(.

Pentium IV: the dynamic scheduler looks at all the instructions in the ROB and issues them to the available execution units. The scheduler is thread unaware (IIRC it is, although in the trace cache each uop is tagged with a thread ID; that would not really change the "effective" behaviour of the scheduler, so, abstracting things a little, we can assume it is not thread aware) and just cares that the execution units quit their whining that they are starved and that they will call Child Care on it. If one thread can feed them all, so be it... if it cannot and there is another thread with ready-to-go instructions (uops, decoded instructions) it will issue them.

So the CPU can look like two CPUs in one, but the two CPUs will have only a subset of the total execution units available to them.

SMT is useful because often a single thread does not have enough ILP (Instruction Level Parallelism, that is basically how many independent instructions there are on average so that they can be issued and executed in parallel) to find enough work to do for ALL of its execution units and this might or might not match the issue width of the CPU.
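A toy calculation makes the ILP argument concrete. The sketch assumes a made-up issue width of 3 and invented per-cycle counts of independent, ready-to-issue instructions for two threads; instructions that do not issue in a cycle are simply dropped rather than carried over, so this only counts how many issue slots get filled, nothing more.

```python
def slots_filled_single(ready_per_cycle, width):
    """Issue slots filled when a single thread feeds the execution units."""
    return sum(min(ready, width) for ready in ready_per_cycle)


def slots_filled_shared(ready_a, ready_b, width):
    """Issue slots filled when a thread-unaware scheduler draws on both threads."""
    return sum(min(a + b, width) for a, b in zip(ready_a, ready_b))


if __name__ == "__main__":
    WIDTH = 3                      # hypothetical issue width
    thread_a = [1, 2, 1, 3]        # independent instructions ready each cycle
    thread_b = [2, 1, 2, 0]
    total_slots = WIDTH * len(thread_a)

    print(slots_filled_single(thread_a, WIDTH), "of", total_slots)
    print(slots_filled_single(thread_b, WIDTH), "of", total_slots)
    print(slots_filled_shared(thread_a, thread_b, WIDTH), "of", total_slots)
```

In this made-up case neither thread alone has enough independent instructions to fill every slot, while the two together do. The shared total is still capped at the issue width per cycle.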

On the in-order PPE/XeCPU cores things are a little bit different.

PPE.JPG


As you see, threads alternate their fetch, decode and issue phases, proceeding in program order through the pipeline: in the end each thread can issue two instructions every other cycle if both threads are active. If only one thread is active it can still issue two instructions.

If you account for data stalls (waiting for a dependent instruction to forward the result, etc...) you can already see how the FXU, BEU and the LSU might at one time be working on different threads, but it is even clearer if you look at the VMX/FPU Issue Queue.

As you see, their issuing is decoupled from the FXU, LSU and BEU, which are issued to directly from the top-level Issue Queue. This "nesting" can allow you to see how independent the two "issue groups" can be. For example it might be interesting to have thread A be integer heavy while thread B is FP heavy... this way you would turn the effective issue width into 3-4 instructions per cycle (sorry, have to run and rush around, the diagram should make it easy to see, else someone good like nAo or Fafalada or ERP or element or maharg or Vince or gofreak, etc... will comment on it and on mistakes I might have made re-typing this post at lightspeed :p).
 

pestul

Member
Marathon said:
Despite the talk of high clock speed and three cores, so far the stuff we are seeing from the 360 chip isn't any better than what people are already running on single core pc systems - including games that are supposedly using all three cores.

It's hard to get excited about a system whose games are on average the same as what I am running right now on my system.
Are you telling me you've played games that look as good as PGR3 and Kameo on PC? Show me one PC game that's out today that looks better than any 360 game (outside of Tony Hawk - cause that looks shit).
 
Marathon said:
Despite the talk of high clock speed and three cores, so far the stuff we are seeing from the 360 chip isn't any better than what people are already running on single core pc systems - including games that are supposedly using all three cores.

It's hard to get excited about a system whose games are on average the same as what I am running right now on my system.

No games are using all 3 threads. The first-generation games are all 1 thread. This may be why some games look like current PC games. Another reason is that some are just that (Quake 4 and CoD2). Next year we will see more. But I have to agree PGR3 looks amazing and is better than what you have on PC.
 

gofreak

GAF's Bob Woodward
Spazbiohaz said:
No games are using all 3 threads.

If you mean all 3 cores, untrue. I believe CoD2 for starters uses all 3 cores (no idea how many threads it uses, but you could theoretically use all 3 cores fully with 3 threads if you wanted). There are others, I believe.

Comparisons to PCs should be very qualified. PC CPUs are quite different - mainly, OOOE and cache-heavy. These things are easy/familiar to work with for most devs, unlike the console CPUs (relatively).
 

Fuma

Banned
gofreak said:
If you mean all 3 cores, untrue. I believe CoD2 for starters uses all 3 cores (no idea how many threads it uses, but you could theoretically use all 3 cores fully with 3 threads if you wanted). There are others, I believe.

Comparisons to PCs should be very qualified. PC CPUs are quite different - mainly, OOOE and cache-heavy. These things are easy/familiar to work with for most devs, unlike the console CPUs (relatively).

I thought CoD2 was only using 2 cores. One core for the AI and the other core for everything else.
 