Forum OpenACS Q&A: Building a high-capacity, high-availability OpenACS solution...

Greetings.  First of all, I want to thank all of you who are working
on OpenACS, and everyone in this community - I've been lurking
here for quite some time, and this is a wonderful place to learn.

Here's my situation:

Our company, Celebrityblvd.com, is gearing up for a complete
overhaul of our site.  We are aiming at getting 1 million hits/day
or more.  Currently, our site consists of just a few simple
network pages that lead to the meat of our site, which is our
Official Celebrity websites (we sign contracts and develop,
maintain, and host official sites for celebrities).  Understandably,
we already have quite a bit of traffic, even though we haven't
launched any of our media pushes, since our magnet content
has a rabid fanbase.

What we are looking to do is create a community site, replete
with instant messaging/chat (implemented through bantu.com,
and hosted through their servers), and many of the ACS's
features, including user groups (fan clubs), portals, ecommerce,
user homepages, etc.  I've recently been researching platforms
upon which to implement this solution.  We are currently on tight
constraints, budget-wise, but that should change within the next
few months.  Therefore, I have several issues/questions:

1) How feasible is it to implement something of this magnitude
using Postgres and the OpenACS system?  I have heard
discouraging things about the scalability of postgres, but I would
like to, if possible, stick with an open-source solution.  Would it
be very difficult to port the site to ACS/Oracle when we are
experiencing higher levels of usage?  Keep in mind that I would
only want to do this as a last resort; if possible I would rather
keep everything in Postgres.

2) What sort of hardware would be ideal for this solution?  I
would be running on Redhat 6.2, probably on an Intel platform,
since that is what I'm used to.  I would probably want to load
balance two or three servers (would that be enough?) for
AOLserver, with one dedicated machine for the database.
However, I currently only have the budget (I don't decide the
finances here!) for one machine - what should I use?  We will
probably be hosted at rackspace.com, and we currently have a
PIII/650 with 256mb of ram and an 18G SCSI there.

3) Since we will definitely need a highly graphical, very
personalized interface, would we be best off using Karl's
templates, or manually going in and rewriting the Tcl?  I'm still
not clear on how flexible Karl's system is - we need maximum
flexibility.

I thank you in advance for ANY help you can offer - again, I am
grateful for everything this community offers the public.

- Ravi Gadad

I don't think scalability or reliability per se will be an issue for today's Postgres on modern hardware.

BUT - you're proposing building what is quite likely the busiest and most popular Postgres-backed site in the world.  You'll be an early achiever, in other words.  I can't point you to any data from existing OpenACS sites to support the notion that Postgres will give you the 24/7 reliability you need.  Of course, I also can't point you to any data suggesting that it won't.  We just don't have the data.

Now, I want to see this data exist, i.e. busy PG-backed OpenACS sites running out there so we can get a handle on the combination's scalability and reliability, but I can't honestly tell you that you should be the first major Postgres-backed site on the block.  That's something you'll have  to decide.

If the others in your business share your desire to support Open Source, it's probably a reasonable path to take.  If the others are conservative, couldn't care less about Open Source, and want to take the lowest risk path, then Oracle's probably the right choice for you.  It sounds like money's not an issue...

To put things in perspective, the use of Oracle doesn't guarantee a trouble-free operation, either.  I'm working on a client site using ACS Classic (ACS+Oracle), and Oracle hosed itself last week.  Crashed the database engine on a simple on-line "update", first from Tcl scripts then from SQL*Plus.  Each and every time.  We brought everything down, even rebooted Linux, and the crash remained.  Had to drop tables and restore from a dump, evil evil.  I've not had to do that yet with my Postgres-backed site that's been up for about a year.

Thank you, Don, for responding to my inquiry. Being the CTO at Celebrityblvd, and having a very good relationship with my associates, puts me in a position to make strong decisions like this, and I *am* willing to take this risk. I am aware of the problems/disasters that could ensue, and I'm also aware that this could be a very encouraging trend-setter. Naturally, I have my own selfish reasons for this - I am personally a proponent of anything open-source. It's great for technology, for society, for personal motivation and gratification, etc.

So now the biggest questions remain: What hardware architecture should I use, and how flexible is Karl's templating system (should I implement my own)? I want to do this right, to set an example for what an OpenACS site can be. I believe a strong representation of its capability would be beneficial to everyone in this community.

As far as hardware, I can't help. But in regards to the templating
system, I don't believe Karl's system is done yet, nor ported to
the ACS. ArsDigita is beginning to move over to a more
template-based system, but they're not there yet so most of the
work still has to be done with Tcl scripts. While they're messy to
edit, Tcl scripts will still give you the maximum flexibility. For any
serious site, they're the only way to go (right now). Unfortunately,
the downside is that they're hard to upgrade when new versions
of the ACS come out. Hopefully however, ArsDigita will have
abstracted out most of the stuff into templates by 4.0 so it won't
be as big an issue.

The hardware issue's a bit messy at the moment, because of the highly-publicized mistakes by Intel in the past year.

    I like dual CPU systems, though I only own one and it's not very heavily loaded. SMP under Linux has gotten a lot better recently. My system's really
    perky even when I try to load it down for fun.

    High-end coppermine PIIIs using a 133 MHz FSB only run with i840 and i820 chipsets. Due to bugs in the MTH that supports SDRAM, Intel had to recall all boards using it. Thus, you can only use i840 and i820 chipset-based motherboards with RDRAM, which costs nearly three times as much per byte as good ol' PC100 SDRAM.

    Since you want lots and lots of RAM, this means an i840/i820-based solution is unattractive from a financial point of view.

    So, this leaves you with a good ol' BX board solution, running a 100 MHz FSB processor. The sweet spot for price-performance is in the P600-P700 range (which is probably why you already have that PIII/650).

    If it were me, I'd go with a dual P700/BX board, 512MB RAM minimum, and a couple of UW2 or UW160 10K SCSI drives, mirrored either in
    software or hardware.

    Such a machine has a LOT of capacity. Slashdot, for instance, ran on a dual P450 until last fall, and that is one busy site (it was also one slow site
    towards the end). By the time you outgrow something like a dual P700 server, you'll be able to afford whatever you want - that's my opinion.

    Of course, there's also the question as to whether or not you want to go the full route with hot swap power supplies, etc.

    As far as vendors, Dell's big Linux servers come with an Adaptec controller with a closed-source driver, and said driver didn't link with a recompiled kernel when friends of mine tried to do so. They finally got everything figured out, but it was a pain in the rear. The problem might actually be RH's fault, not Dell's, but it was a pain regardless.

    And you need to recompile the kernel to get Postgres to use a big chunk of that RAM for its shared memory buffer.
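    To give a concrete (and purely illustrative) example of what that involves - if memory serves, 2.2 kernels let you poke SHMMAX through /proc, otherwise it means editing shmparam.h and rebuilding, and the numbers here are arbitrary:

        # raise the kernel's shared memory ceiling, here to 128MB
        echo 134217728 > /proc/sys/kernel/shmmax

        # then give the postmaster a bigger shared buffer pool
        # (-B counts 8KB buffers, so 8192 of them is 64MB)
        postmaster -B 8192 -D /usr/local/pgsql/data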

    So perhaps VA Linux is a better source; maybe someone here has experience with them and can comment.

    For overall component information, and system configuration information, check out the following website: www.tdl.com/~netex. Their site's a great
    resource.

I guess the short answer is figure out the sweet spot for price/performance regarding the CPU, buy lots of RAM, buy quality disks (I like IBM, and their UW2 9 GB disks are a real bargain at the moment, about $200 if you shop), decide if you want to spend the bucks  for hot-swap redundancy, then shop around and buy from someone who understands Linux.

Do you have specific questions?

I haven't played with Karl's stuff yet so can't help you on that one.

Don Baccus probably knows what he is talking about.  However, I have
never trusted SMP on Intel boxes; I don't think Intel has them quite
figured out yet.

For hardware, you might just want to buy a motherboard that supports
1GB RAM or more, and then cache everything you can, counting on that
to get you through any rough spots.  I saw a cheapo $99 motherboard
that supports up to 2GB at my local PC place just today, so they are
out there.  A lot of motherboards only support up to 768MB (3x256MB
DIMMs).

Remember that even if Postgres itself doesn't / can't use all the RAM,
the filesystem caching will still be in effect for reading.

I would think that you should be able to serve a decent amount with
your current machine.  Even if you are saturating a T1 completely, you
are using less than 20MB per minute of disk bandwidth between logging
and reading (10MB internal usage and 10MB reading/logging/db
searching).
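Roughly: a T1 is 1.544Mbit/s, or about 0.19MB/s, call it 11-12MB per
minute going out the wire.  Even if every byte of that had to come off
disk, and you add the access log and database activity on top, you're
still in the 20MB-per-minute ballpark, which is nothing for a modern
disk.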

If you are serving a "highly graphical" interface, that is a lot of
images, right?  Again, RAM might be the best answer - no need to hit
the disks.

The question of Karl's templating vs. manual rewrites makes me ask
how many pages you actually need to customize.  What I mean is that you seem to
want bboard, ecommerce, homepages as the main modules that need to be
heavily modified for your interface.

I don't understand the comment about Intel and SMP.  Linux long had the reputation of having crappy SMP support due to scheduling issues, but I've not heard of any intrinsic problems with Intel SMP itself.

Now, older PCs of all varieties tended to be flakey by industrial standards.  But good BX/GX based boards have proved to be extremely reliable, both single and dual processors.  The i840 boards are apparently very reliable, too - in their (expensive) RDRAM version.

The main issue in regard to motherboards is the manufacturer.  ASUS has an extremely good reputation (HP uses them), Supermicro boards are really solid but have a rather quirky BIOS (I own two of them), and Intel boards are conservatively built and solid.

Patrick does make a good point about memory support.  Make sure any board you look at supports four DIMM slots if you want maximum RAM support.

I think a bigger question is the design of the site and what sort of peak traffic you all expect to have to build for.  I think it is unrealistic to limit yourself to one machine, though, and that should probably be brought to the attention of whoever is in charge of your budget.  Upon initial perusal of your existing site, there are a lot of images/animations that could/should be moved to a separate server to reduce load.  I can't overstate the importance of design and testing to see if your planned implementation is up to the job... good luck.

An update on Celebrityblvd.com's situation:
I have installed postgresql, aolserver, and the OpenACS suite on
our production machine here at the office, and everything seems
to be functional.  It was a lot easier than I thought it would be.  :)

We have outsourced all our images and animations to Akamai,
so they will not be served from our machine(s) at all.  It is
currently not feasible for us to expand to more than one machine,
but we are planning on it in the future.  (Speaking of which, how
is load balancing with multiple web servers accomplished, when
you only have one database server?  What are the networking
connections involved?  I've never looked into doing that.)

From the suggestions I've gotten (and thank you, everyone, for
being so helpful!), it seems that upgrading our RAM, to at least
512mb, is the best first step for us, since we have to stick with
our single 650mhz p3 for now.  Our next step will probably be to
get a dual p3/700 with 512mb to 1G of ram, and use that as the
database server.  Will it then be necessary to have a separate
webserver, even though we're not serving any images or
animations (just initial html/tcl/adp requests)?

This could be a big step for the postgres/OpenACS community,
in terms of researching/developing scalability.  I am also working
with the folks who are implementing a Fox movie community site
(quite a big venture) and we're going to attempt an OpenACS
installation, if all goes well with Celebrityblvd.com.  I'll keep you
all updated.  And meanwhile, if anyone knows any creative,
self-starting, intuitive web designers in the Los Angeles
(hollywood) area, send them our way.  :)

Take care, all.

The ACS has some (presumably primitive) hooks in it for load balancing; I doubt anyone here has tried them out, and I believe aD themselves have only used them on a handful of sites.
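As for the plumbing, there isn't much to it: each AOLserver front end simply points its database pool at the single database box over your internal network, and the load balancer only ever sees the web boxes.  In nsd.tcl that's roughly the following (the host, database, and pool names here are made up):

    ns_section "ns/db/pool/main"
    ns_param driver      postgres
    ns_param datasource  "db.internal.example.com:5432:openacs"
    ns_param user        nsadmin
    ns_param password    "..."
    ns_param connections 5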

Jim Davidson of AOL gave a talk at a Tcl meeting a few months ago on the Digital City solution.  It is run with a surprisingly small number  of servers (when you consider the bazillions needed to run microsoft.com).  They've put a lot of energy into the design, figuring  out what database queries can be cached rather than executed each time, etc.  I think that's the kind of thing you'll need to be looking  at when your site gets busy, because the single database server is probably going to be the bottleneck.  The presentation may be up on the AOLserver site, I'm not sure - if not, Jim's registered at the site so you should be able to find his e-mail address and ask him where it exists.  This is the single most useful thing I can think of, because it goes  into the architecture of Digital City in decent detail.
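To make the caching idea concrete, here's the sort of thing I mean using AOLserver's nsv arrays (the proc, table, and cache names are invented for illustration, and a real version would want some way of expiring stale entries):

    proc most_popular_celebs { db } {
        # serve the cached copy if we already have one
        if { [nsv_exists query_cache most_popular_celebs] } {
            return [nsv_get query_cache most_popular_celebs]
        }
        # otherwise hit the database once and remember the answer server-wide
        set result [database_to_tcl_list $db \
            "select name from celebs order by hit_count desc limit 10"]
        nsv_set query_cache most_popular_celebs $result
        return $result
    }

database_to_tcl_list is the stock ACS helper; the point is just that the query runs once per server rather than once per page view.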

Postgres 7.1 will have the ability to cache query plans, and if you do  so you'll be able to avoid the overhead of parsing and optimizing queries.  Since newer versions of the ACS will name queries, it should  be possible for us to make use of this feature, though it will be a memory hog.  Along with query caching, there's good potential for increasing the scalability of PG-backed websites down the road.

I have been evaluating OpenBSD as a platform, so if you are using
Linux or another OS you may see some differences.

However, on a 64MB/Cyrix 333 Mhz system I am using for testing, I get
nice numbers.

I used the ab program, which comes as part of the Apache web server.
While this is a simple, brain dead program, I have found it useful.

First, you HAVE to run PG with -F - without it, inserts are 10 times or more slower.
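(For anyone following along: -F is a backend flag, so you pass it
through the postmaster with -o.  The path here is just my setup.)

    postmaster -o "-F" -D /usr/local/pgsql/data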

On a brain dead one-table insert, I can handle about 100 - 120
inserts/second, sustained, into Postgres.  I might be able to handle
more, but it saturates the 10Mbps hub I have at that number of
requests.

I have not yet tested selects, figuring that inserts were more likely
to create a load on the database.

Thus, on a PII or PIII machine @ 650Mhz you should be able to handle
much more, perhaps 200 or more queries per second, less of course if
you are doing fancier selects.

You are very fortunate to be able to dump images etc into Akamai -
this frees up connection threads, since images take longer to download
and thus eat up a thread for that much longer.

"First, you HAVE to run PG with -F - it is 10 times or more slower. "

WHOA!  You really don't want to do this, as you have no claim of data integrity in this case.  -F turns off the fsynch'ing of modified data files after the end of each transaction.  If your system goes down, the database will be toast, guaranteed.

Now ... -F makes no difference for read-only selects, which form the vast majority of ACS queries.  And the ACS does its inserts in transactions so the -F penalty is not nearly as bad as worst-case testing would imply.
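In other words, with fsync on you pay the synch once per transaction rather than once per statement, so a batch like this (table and columns invented) costs one synch instead of three:

    begin;
    insert into bboard_msgs (msg_id, body) values (1, 'first');
    insert into bboard_msgs (msg_id, body) values (2, 'second');
    insert into bboard_msgs (msg_id, body) values (3, 'third');
    commit;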

So in practice -F is only going to slow up a running ACS installation by, oh, 10% or so, unless people are doing tons of forum postings, etc.  Even then I bet it won't be as much as you fear.

A better approach to speed things up is to move index files to another  disk drive.  Currently this really sucks in PG because you need to move the files by hand and link to them in the database directory, but  people are working on implementing a tablespace-like facility to manage such things from within PG.  This wins for the same reason it wins with Oracle - it minimizes head travel.  Even though hand-moving and hand-linking sucks, it doesn't suck nearly as bad as guaranteeing your database will be permanently hosed due to a crash with -F enabled.
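For anyone who wants to try the hand-move, it's roughly this (postmaster shut down first; in current PG the index's file is simply named after the index, and every path and name below is made up):

    cd /usr/local/pgsql/data/base/yourdb
    mv users_pkey /disk2/pgindex/users_pkey
    ln -s /disk2/pgindex/users_pkey users_pkey

Just remember that dropping and recreating the index will put the new file back in the original location and quietly undo your symlink.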

And work is also underway to implement write-ahead logging in PG.  This not only increases data integrity, but means only the log will need to be fsynch'd at the end of transactions, rather than all datafiles modified by a transaction.  Since rational people will put the write-ahead log on its own platter, this will be a quick operation because the log will be written sequentially and will be the only file open on the drive.

Of course, if you do nightly dumps and are perfectly comfortable with the high probability that you'll have to roll back to one of the dumps eventually, the use of -F is just fine.

But, if you're doing financial transactions or keeping data you don't want to roll back to a nightly backup, you certainly don't want to do this.

gee, Don, you know, I have had crashes / sudden power outages on test
boxes and have not had that problem.  I assume that when the update
daemon runs it syncs the disk(s), and since it runs every 30 seconds
or so you would be more likely to have lost the last 30 seconds -
however, even THAT would be enough to screw up stuff, for sure.

I will definitely re-test with a SELECT only .adp file and see if
there is a difference.  I unintentionally ended up testing with a
worst-case scenario in my posting above.

./././././.patrick

You've been lucky - and I'm glad you have, of course...

Actually, hosing isn't 100% guaranteed.  When the operating system does the synch at its leisure (the 30 second standard linux interval),  it writes blocks out in any order it wants to (hopefully optimizing disk access!).

If it writes out the transaction log before the datafiles during this process then you'll have transactions logged as complete without the data really being there.  Not a good time to crash.

I'm sure there are plenty of other ordering issues that come up.

The key is that PG uses fsynch to ensure that datafiles are synch'd before the transaction log.

Of course, if you're minimizing your risks by using a UPS that can tell Linux when to shut down, and are running RAID 1, your odds of not  experiencing a hosed database go up a lot.

And, of course, if you're willing to lose a day's data (going back to your most recent backup), -F might well be worth it.

It just makes me cringe a bit :)

Now, until 6.5 came out, even read-only selects caused the transaction log to be needlessly fsynch'd, which made one's disk drive go absolutely nuts under load.  I pointed that out, and it got fixed for that version.  Seems that none of the developers ever used PG in our favorite way (i.e. backing a website) and hadn't noticed ...  -F REALLY improved performance for ACS-like scenarios back then!

I'll address #2, the hardware platform:

I disagree with the fellow who had "issues" with SMP Intel systems. I have been running dual processor linux machines for several years (in production environments). If you pick decent motherboards, and use recent kernels, you'll be just fine. Stock RH 6.2 comes with an SMP-friendly kernel.

The boards I've had extensive experience with are:

Supermicro P6DBE (BX board)
Supermicro P6DGS (GX board)
Intel 810EAL (i810 board)


I've been using these in production environments under Linux, NT4, and Win2K Server with current uptimes in the hundreds of days. For SMP machines, I like the GX board a little better since it has an integrated dual channel SCSI controller (Adaptec 789X) and supports 2GB of SDRAM.

Avoid Intel 820/840 boards like the plague unless you are going to spring for RDRAM. I took the plunge a few months ago and had *major* problems with *every* 820/840 board tested with SDRAM. They are ALL extremely unstable.

You'll also want to spring for a nice hardware RAID card like the Mylex ExtremeRAID 1100. I'm using one of these with 9 18gig Seagate Cheetahs in RAID 5 mode. It's not as fast as software RAID, but it seems to be less fragile. The drivers for this card are mature and it is the speed king for hardware RAID on Linux these days.

Those comments are directed at the database machine. For the web servers, you can just buy a pile of the 1RU boxes from rackmount.com (my favourite), penguin computing, etc. and load balance them behind an appliance like a Cisco LocalDirector or Foundry ServerIron (my favourite...it's cheaper and faster). Load the machines up with as much memory as they'll hold so they cache everything and consider a cheap IDE software RAID0 of 2 disks for the disk subsystem. It's cheap, and it's "reliable enough" and "fast enough" if you have a stack of them load balanced. If you lose a disk, your only real important data on that box is the webserver and security logs since the last backup. If that's a major trauma, use RAID1 and take the performance hit. These can be had for about $2k each.
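On stock RH 6.2 the software RAID0 piece is just the old raidtools setup: an /etc/raidtab roughly like the one below (device names are whatever your two IDE disks come up as, chunk-size is in KB), then mkraid /dev/md0 followed by mke2fs /dev/md0.

    raiddev /dev/md0
        raid-level            0
        nr-raid-disks         2
        persistent-superblock 1
        chunk-size            64
        device                /dev/hda1
        raid-disk             0
        device                /dev/hdc1
        raid-disk             1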

Here's my current setup. I've been VERY happy and it has scaled VERY well. I'm not using ACS in production yet or postgres, but I'm planning on "repurposing" this gear for that soon.

Database box:

Supermicro P6DBE
2 x PIII 850mhz (coppermine) w/256k cache each
1GB ECC SDRAM
Mylex ExtremeRAID 1100 card
9 x 18gig Seagate Cheetahs in external enclosure (using 3 separate SCSI channels on Mylex card...3 on each...one disk is designated a hot spare)
Intel 100BT ether card
Matrox G200 video card
some cheap Toshiba ATAPI CDROM
some cheap TEAC floppy drive
stuffed in 4RU case for computer and 4RU case for disks (bought cases with lots of fans from rackmount.com)

The above machine has been up since early April and I haven't even needed to login to it. Rock solid.

Web Servers (using 5 at the moment):

Intel 810EAL motherboard (integrated 100BT ether, audio and video)
PIII 733 processor (133mhz bus)
512mb non-ECC SDRAM
cheap Quantum 20.5gig 7200RPM IDE drive
(going to being doing software RAID0 with 2 of these soon to improve speed of reads when cache misses and writes of logfiles)
integrated 3.5" floppy & 32X CD-ROM (nice for 1U boxen)
1RU case with 2 "hot swappable" drive trays

These are all sitting behind a Foundry ServerIron load balancing appliance and have not been rebooted since early April. Again, rock solid.

I've never had more than 400-500 simultaneous users, but even then the machines were not anywhere close to capacity. A planned upgrade for the near future is to add a 2nd database machine and switch to Oracle (so I can do useful synchronization for some level of fault tolerance on the RDBMS side).

Hope this helped.

Cheers,
Great post by Chris; he does mention one thing I forgot:

"(bought cases with lots of fans from rackmount.com)"

My SMP box has three case fans (it came with two), plus disk coolers for the two fast SCSI disks.  Noisy as hell.  I don't care, it's downtown, not in my house!  Fans are something too many people ignore; aggressive cooling can really increase reliability, and extra fans can save your box when (as is inevitable) a fan dies.  If you only have one fan, and it dies, you're in trouble.

I don't think you can buy i820/i840 with the MTH (SDRAM) anymore, Chris's experience is why Intel recalled all i820/i840 mobos with the MTH and shipped folks boards with RDRAM (including people who bought from sources other than Intel).  Manufacturers should only be offering  RDRAM solutions - which are still expensive (around 2-3x the price of PC100 at the moment).  I was warned off MTH+SDRAM solutions before the MTH bug and subsequent recall event, because their performance sucked compared to GX boards (even with expensive RDRAM, GX boards stack up very well), so I managed to avoid Chris's negative experiences with such boards.  Chris has my sympathy, Intel really screwed up.

I've never used the Intel 810-based board Chris mentions, but my linux development machine at home is built around an equivalent SuperMicro 810E board (the 810E supports 133MHz FSB Coppermines as well as older Katmai PIII and non-E (100 MHz FSB) Coppermines), and I've been very pleased.  810E and its replacement, the 815E (supports PC133 memory, 810E only supports PC100), both have integrated video.  My linux development  machine is "headless" (I have an X-server on my windows box) so I only needed bare-boned video in order to install linux.  The integration saved me $25.  If you're building 1U boxes like Chris, more importantly it saves you a slot for a video card.

The gamers who run most hardware review sites loathe these boards but they're great for building a cheap, reliable machine if you're not planning to devote your life to playing Quake on it.  These boards only support socket 7, not slot one, though - Celerons, Celeron IIs (Coppermine core), and Coppermine FC-PGA packaged PIIIs (up to about 733MHz are available in socket form, I think, at the moment).  The socket configuration helps keep the resulting machine tiny and tidy.

Don, I assume you meant Socket 370? Socket 7 is P55 (MMX Pentium) and AMD K6 family.

Yes, I did.

This is way, way off-topic, but those following this thread seem to be hardware wonks...

I've just cobbled together a new athlon server to play with OpenACS. The system works like a charm, but the power supply fan sounds like a helicopter. Not good, as it's sitting under my desk & driving me k-nuts.

Does anyone have experience with quieter ATX, 300W power supplies, that are Athlon-rated?

My concern is that those which are truly "whisper-quiet" (like the PC Power & Cooling models) won't push enough air. And i'll burn my pad down.

thanks.

Adam,
Every case I have (including all machines I built when at AD/LA) is PC Power & Cooling. I wouldn't think of buying any other case; I'm just waiting for their rack-mount version.

In short, the quiet ones push more air than the cheap pieces of crap you buy with your imported case anyway. Don't worry about it!

"In short, the quiet ones push more air than the cheap pieces of crap you buy with your imported case anyway."

Whoa there Jon... both the good and bad cases and power supplies are almost invariably made overseas. It's just a question of how much the purchaser wanted to pay to get better quality.

I travel quite a lot between the US and Taiwan (and I'm not from Taiwan), and what never ceases to amaze me is how much of what you see in a computer is actually sourced from Taiwan, the PRC, etc. And it isn't necessarily because of cost (although it often is).

I won't retract what I said,

If you want a great fan buy one made in the good ol' US of A (i.e. PC Power & Cooling). There may be exceptions (although I don't know of any).

This isn't a slight on Taiwan; it is simply fact. Taiwan is into mass production and products for the masses. That is ok. PC P&C is into building the best fans, cases, power supplies, etc., and doesn't market to the Fry's/consumer crowd.

Well, after turning my back on my machine for 5 minutes today, I returned to find the ASUS monitor in my face complaining that the CPU fan was spinning at zero RPM. And there was a funny smell emanating from the case. The _case_ was hot to the touch, the CPU was positively glowing.

Yes, the _official_ AMD fan that came with my boxed athlon died a spectacular death, and nearly took the system/apartment/subdivision with it.

Now, I don't want to go off on a rant here, but isn't the reason that you buy a boxed processor so you'll get the "high quality" fans that only the manufacturer can provide? Yecch. I'd like to know where AMD gets their fans.

Anyway, since I was in a bind & needed it back up & running, I replaced both the CPU fan and the case power supply with Antec units (an American company, I think; all stuff sourced from China, AFAIK). CPU is at a constant 56C, the machine is relatively quiet. No funny smells. I am happy.

ps -- I thought one big advantage of ATX motherboards was that when the shit hits the fan (no pun intended) and the machine overheats, it will gracefully shut itself down. Didn't happen here, even with the ASUS tools installed.

"I won't retract what I said ... If you want a great fan buy one made in the good ol' US of A (i.e. PC Power & Cooling). There may be exceptions (although I don't know of any)."

Well, I wasn't trying to start a bun fight with you by asking you to retract anything... I was merely trying to, ah, educate you by pointing out that the real world is a bit more complicated. You'd understand if you actually worked within the electronics supply/manufacturing industry.

For example, are you certain that the fan you like is in fact made in the USA? Hmmm... nothing on their site tells me that this is the case. But just suppose there is a sticker on it that says it is made in the USA. Well, what about the components? In fact, the content and labor requirements to qualify something as made in the USA are surprisingly complex and low. See http://www.ftc.gov/bcp/conline/pubs/buspubs/madeusa.htm for more information.

The reality of the situation is that almost no manufacturer is vertically integrated enough to make anything in-house these days.

It's really got nothing to do with "mass market" products. Niche products and the components in them are actually more likely to be sourced overseas because the low volumes make it prohibitively expensive to make them in-house. For example, the tooling required for just the fan-blade can easily run to US$20,000. You'd have to sell an awful lot of fans to justify this alone. And this isn't even counting the tooling cost of the connectors, PCB, etc.

Anyway, rather than pollute this thread with something off-topic, I'll stop now. Just email me if you'd like to know the dirty secrets of the electronics supply/manufacturing industry.

Well, Adam ... one of the nice things about the modern era is that you  can buy computers vastly more powerful than any that existed in the world 20 years ago, and pay thousands, not millions, of dollars.

One of the downsides is that no one does extensive burn-in testing to catch infant mortality, unfortunately leaving that to you.  Sounds like you did a good job :)  Computer companies used to burn in entire systems (well, high-end companies still do, actually, but back when even minicomputers were expensive everyone did).

Of course, things were also a lot less reliable back then ... all those separate components to fail.

I don't know if AMD uses decent or totally shitty fans with their boxed CPUs, since I've never owned one (the DDR SMP machines that are just about to come out are whetting my appetite, though).  But even good fans suffer a certain level of infant mortality.

Agree 100% Don; we're all responsible for our own burn-in these days. Even with so-called "high-end" machines (a Dell rep informed me that my $4000 laptop underwent no burn-in at the factory, because demand was so high it was cheaper for them to crank out as many as possible & just fix the defective ones later. Explicitly forcing the customer to do QA.)

I've never had a CPU fan fail catastrophically before. First time for everything, i guess. And it happened within 72 hours of being switched on, so I can't be too bitter.

I am annoyed about the "china-syndrome" series of mishaps that prevented the computer from shutting down automatically when the bad fan was detected. I guess asking windows + my mobo manufacturer's shutdown utility to work properly is too much to ask 😊

It's a good reminder, though, not to leave new hardware unattended.

FWIW,

Two years ago I had a few conversations with a guy that built 3-5 machines per week for gamers and local business customers.  As you know, gamers typically like higher end machines with higher quality parts -- and **performance** of course!

The guy told me his Intel rep told him that the difference between the CPUs Intel sells in the "retail box" and the ones in the OEM/white boxes was that the ones in the retail boxes passed Intel's QA testing the first time.  The rest of the CPUs that are salable get sold in OEM/white boxes after being reworked/fixed (maybe downgraded for speed rating??) and passing QA a subsequent time.  This, the guy said, is why the retail boxed Intel CPUs cost a few bucks more.

This is second hand word of mouth, but I do think the guy was relaying truthfully what an Intel rep told him.  The guy worked at a place where lots of gamers play each other, renting high end machines by the hour.  He also would burn in the machines he built for two or three days before delivering them to customers.

Just figured I would add this for those who might care.

FWIW,

Louis

To put it mildly, your guy's full of it.  He's been smoking too much silicon thermal paste.

For starters, the kind of bin-sorting he describes is done for *all* CPUs, otherwise you wouldn't have the number of speed grades that are available.  Secondly, Intel juggles the numbers released in various speed categories in order to help prop up the price of higher speed parts.  This is one reason they introduced clock-locking: such a high percentage of certain Pentium vintages ran well at higher-than-rated speeds that remarking became a real problem.  It's also why so many of the earlier, lower-speed (i.e. 500 MHz) 100 MHz FSB Coppermines run just fine plugged into a 133 MHz FSB motherboard.

Hi Don,

Could be.  Maybe the Intel salesman was just trying to assure his customer, the guy I spoke with who built systems using Intel CPUs, that paying more for the retail box CPUs was justified.  I don't know.

The subject came up in discussion of why the retail box cost more than the OEM/white box CPUs.  And what one gets for the price difference.

I have no direct knowledge on the difference.  I figured I would pass along the above tidbit FWIW since the thread dealt with hardware.

Take care,

Louis

You have to ask yourself, "why would Intel want to screw its high-volume OEM purchasers by selling them substandard parts?"

I can't think of any particular reason why Intel would want to sell higher-quality parts in retail boxes to people who are frequently going to overclock and otherwise abuse the CPU...that user will buy a CPU every couple of years, while the high-end OEMs will buy CPUs by the pallet-load.

Hello

Do you know of anybody that wants to supply the board below:

Supermicro P6DBE

I can use 3-5 Boards
Let me know, Thanks
TM/Dantec