pgerassi

05/11/05 9:54 PM

#55926 RE: wbmw #55909

Dear Wbmw:

As far as Opteron goes, data may be located in local or remote memory. Best case, you have CPU->memory and worst case, you have CPU->CPU->CPU->memory.

You are being disingenuous. Opteron CPU1 can talk directly to CPU2 without any trip to memory. Say CPU1 owns L1 cache line X for address xxxxxxxxxxH and CPU2 wants it. CPU1 then transmits that cache data directly to CPU2 without it ever going to memory. Even if the packet travels through one or more intermediate CPUs, the system is still "glueless".
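To make the contrast concrete, here is a toy Python sketch of that transfer. It is not AMD's actual coherence protocol, just an illustration: the four-socket topology, the routing, and the node names are all made up, and the only point is that the dirty line travels CPU to CPU over HT links without ever touching DRAM.

```python
# Toy model of a "glueless" Opteron cache-to-cache transfer.
# The topology, routing and message format are illustrative only.

# A 4-socket square: each CPU has direct HT links to two neighbours.
HT_LINKS = {
    "CPU0": ["CPU1", "CPU2"],
    "CPU1": ["CPU0", "CPU3"],
    "CPU2": ["CPU0", "CPU3"],
    "CPU3": ["CPU1", "CPU2"],
}

def ht_route(src, dst):
    """Breadth-first search for the shortest chain of HT hops."""
    frontier = [[src]]
    seen = {src}
    while frontier:
        path = frontier.pop(0)
        if path[-1] == dst:
            return path
        for nxt in HT_LINKS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

def forward_dirty_line(owner, requester, address):
    """Ship a modified line straight to the requester; DRAM is never read or written."""
    path = ht_route(owner, requester)
    print(f"line {address:#x}: {' -> '.join(path)} (0 memory accesses)")

forward_dirty_line("CPU1", "CPU2", 0x1234_5678_9ABC)
# line 0x123456789abc: CPU1 -> CPU0 -> CPU2 (0 memory accesses)
```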

Traffic goes CPU->NB->Memory. I understand your point that there absolutely has to be NB in the middle, but I disagree with your representation that CPU->NB and NB->CPU is at all relevant.

It is relevant because Xeon CPU5 cannot talk to Xeon CPU3 directly. It cannot hand over cache data it just modified. It has to write it toward memory, and that requires an arbitration cycle by the NB just to begin a data transfer on the FSB, no matter where the data ends up. Then CPU5 sends the data to the NB. Now, if CPU3 is on the same FSB, it can snoop the changes into its own cache. If not, the NB must initiate a data movement to CPU3 with the data (it wins arbitration once the bus is released). It does not need to actually read the data from memory; it can use its internal buffers.

So part of the arbitration logic must live on the NB and on any inter-FSB switch. Notice that the memory controllers may actually sit off the NB chip, as in the bad old RAMBUS days (on the MTH chip) or in the Itanium server chipsets (the DTH chips). So even if Xeon had on-die memory controllers, it would still need the NB to talk to another CPU. Even if the memory were local, it would still need to put that traffic through the NB so the other Xeon caches could snoop the updates, and the NB would need to broadcast it to all other FSBs. Thus the external "glue" is always required, even on systems with one CPU.
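For comparison, here is a rough sketch of the Xeon sequence just described. The step names are mine, not Intel's bus-protocol terms; the point is only that the NB sits in the middle whether or not the two CPUs share an FSB.

```python
# Rough sketch of a modified line moving between Xeon CPUs on an FSB-based
# system. Step names are illustrative, not Intel's protocol terminology.

def xeon_transfer(src_fsb, dst_fsb):
    """Return the steps needed to get a dirty line from a CPU on src_fsb
    to a CPU on dst_fsb. The north bridge (NB) is involved in every case."""
    steps = [
        "source CPU requests the bus; NB runs an arbitration cycle",
        "source CPU drives the modified line onto its FSB",
        "NB captures the line into its internal buffers",
    ]
    if src_fsb == dst_fsb:
        steps.append("destination CPU snoops the line off the shared FSB")
    else:
        steps += [
            "NB wins arbitration on the destination FSB once it is free",
            "NB drives the buffered line onto the destination FSB",
            "destination CPU picks the line up; DRAM need not be read",
        ]
    return steps

for i, step in enumerate(xeon_transfer(src_fsb=0, dst_fsb=1), 1):
    print(i, step)
```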

This misses the point. The point is that these "other chips to translate HTT to other types of I/O" are completely necessary to make a computer system. Without I/O, you can't even load a program. Memory is a volatile storage unit, and Opteron by itself cannot function without a non-volatile storage. So there is necessary "glue" that needs to be in an Opteron system to provide this kind of capability. This "glue" may be less complex than a fully functional North Bridge, but that isn't the point.

You miss the point. The NB is required in Xeon systems even if they had on-die memory controllers or even on-die I/O controllers; the FSB arbitration scheme requires it to function. As to needing glue logic on the motherboard, that is incorrect. All DDR DIMMs come with some ROM on them (serially accessed, but it is there), and there is nothing in the DDR interface that stops you from using ROM, battery-backed SRAM, or flash instead of DRAM on a DIMM. There are DIMMs with LEDs on them, so an IR LED/photodiode isn't far-fetched as an upgrade. Granted, that would be unusual, but it is doable. So you can make an Opteron computer with no SB at all.
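That serial ROM is the SPD EEPROM every DDR DIMM carries, and it really is just another device on the SMBus. A minimal sketch of reading it, assuming a Linux box with the Python smbus2 package, an SMBus adapter exposed as bus 0, and a DIMM whose SPD device sits at the conventional 0x50 address (all of which vary by board and kernel configuration):

```python
# Read the first few bytes of a DIMM's SPD EEPROM over SMBus.
# Assumes the smbus2 package, an SMBus adapter exposed as bus 0, and a
# DIMM at the conventional 0x50 address -- adjust for the actual board.
from smbus2 import SMBus

SPD_ADDR = 0x50          # DDR SPD EEPROMs conventionally live at 0x50-0x57

with SMBus(0) as bus:
    spd_bytes = [bus.read_byte_data(SPD_ADDR, reg) for reg in range(16)]

print("SPD bytes 0-15:", [hex(b) for b in spd_bytes])
# Byte 2 is the fundamental memory type (e.g. 0x07 for DDR SDRAM, 0x08 for DDR2).
```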

As to needing some external hardware, that is nitpicking. If you go that route, no system is glueless, even one with everything on die, because it still needs outside power and some kind of package. By that standard coax Ethernet isn't glueless either, because it needs a terminator at both ends. It is just another case of you pushing things to extremes until they become senseless.

So pure I/O and memory are excluded from being "glue". That allows MTHs, SBs (ICHs), and DTHs for Xeon/Pentium/Itanium, and Opteron can have those too. But Xeon/Pentium/Itanium still needs the FSB arbitrator and broadcast logic: the NB can't just translate FSB "packets" into I/O bus packets or simply pass along memory read and write requests. You can also tell, because some chipsets only allow 1, 2, or 4 devices on any given FSB.

The CPU in an Intel system only needs a NB as a medium to transfer data.

Sorry, the NB does more than transfer data. It arbitrates access to the bus and controls which CPU talks on it, like a traffic cop at an intersection. And it must act like an Ethernet BaseT hub when more than one FSB is present. That last point is why CPU->NB and NB->CPU is relevant. Cutting out the NB->MTH->DRAM->MTH->NB steps is a typical speedup, done both to decrease latency and to preserve FSB bandwidth. Look at any recent Intel NB and you will see quite a few buffers there to speed this up; only one would be needed per clock-domain boundary. Forgetting this makes your server designs much slower than your competition's. Is that why you don't do them any more?
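Some back-of-the-envelope accounting of why serving the line out of the NB's buffers matters. The cycle counts below are purely illustrative placeholders, not measured numbers; they just show that the DRAM round trip dominates the transfer.

```python
# Purely illustrative latency accounting for a cache-to-cache transfer
# through the north bridge. None of these numbers are measured; they only
# show why skipping the NB->MTH->DRAM->MTH->NB round trip is worthwhile.
fsb_transfer   = 40    # source FSB burst + arbitration (illustrative cycles)
nb_buffering   = 10    # capture/forward inside the NB
dram_roundtrip = 120   # write to DRAM and read it back via the MTH

via_dram    = fsb_transfer + nb_buffering + dram_roundtrip + fsb_transfer
via_buffers = fsb_transfer + nb_buffering + fsb_transfer

print(f"through DRAM:    {via_dram} cycles")
print(f"from NB buffers: {via_buffers} cycles")
print(f"saved:           {via_dram - via_buffers} cycles "
      f"({(1 - via_buffers / via_dram):.0%} less)")
```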

Re: The BIOS can be placed on a DIMM as well.

That's a laughable way to make your argument, Pete. No one does this.

So cell phones don't do this? I seem to remember even Intel stacking SRAM and flash into one package. And there is ROM on every DDR DIMM sold, holding the timing, size, and other parameters. Isn't Samsung stacking DRAM, SRAM, and flash into a package? How about those hard drives with 128MB of flash and a few MB of DRAM? Look at Corsair with their TWINX1024-3200XLPRO memory, with LEDs showing usage patterns. The jump to an IR port wouldn't be that far away with enough ROM/flash to hold an AMD64 BIOS. Oh, "nobody" does this. Sorry, wrong again.

Sure it will work, but will an 8-socket server scale in performance with just this? Of course not. It isn't a viable solution.

You forget about compute servers. They do not need much I/O, and one nForce Pro 2200 would be enough, with two Gigabit NICs inside. That gives 8GB/s of I/O. Even 8-socket Itaniums only have 6.4GB/s to memory using Intel's chipsets. You just keep digging yourself in deeper.
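For reference, both bandwidth figures fall out of simple width-times-rate arithmetic. The sketch below assumes a 16-bit, 1 GHz double-data-rate HyperTransport link to the nForce Pro and a 128-bit, 400 MT/s Itanium front-side bus; it reproduces nominal peak numbers, not anything measured from a real system.

```python
# Where the two bandwidth numbers in the argument come from: simple
# width x transfer-rate arithmetic. Figures are nominal link rates, not
# measurements from any particular system.

def link_bandwidth(width_bits, transfers_per_sec, directions=1):
    """Peak bandwidth in GB/s for a link of the given width and rate."""
    return width_bits / 8 * transfers_per_sec * directions / 1e9

# 16-bit HyperTransport link, 1 GHz clock, double data rate, full duplex.
ht_link = link_bandwidth(16, 2.0e9, directions=2)

# 128-bit Itanium front-side bus, 400 MT/s, shared in both directions.
itanium_fsb = link_bandwidth(128, 400e6)

print(f"HT link to an nForce Pro 2200: {ht_link:.1f} GB/s")     # 8.0 GB/s
print(f"Itanium FSB to the chipset:    {itanium_fsb:.1f} GB/s") # 6.4 GB/s
```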

Isn't it time to stop before you completely bury yourself?

Pete
