10 issues that large-scale site architecture has to consider

2010-03-14  Source: original to this site  Category: Internet

Reposted from:

The large-site architecture discussed here covers only highly interactive, data-driven, large-scale websites. For well-known reasons, we are not talking about news-style sites or architectures that can be served with static HTML, but about high-load sites with heavy data exchange and heavy data movement, such as Hainei, Kaixin, and similar Web 2.0 networks. We also do not discuss whether the environment is PHP, JSP, or .NET. We look at the problem from the architecture side; the implementation language is not the issue. A language's strength lies in implementation, not in being good or bad, and whichever language you choose, the architecture has to be faced.

Below we discuss the issues that large sites need to watch for and consider.

1, massive data processing

As we all know, for relatively small sites the data volume is not large. SELECT and UPDATE solve our problems, the load is light, and at most a few indexes are enough to handle it. For large websites, the daily data volume may be in the millions of rows. A poorly designed many-to-many relationship causes no problems early on, but as the user base grows, the data volume grows geometrically. At that point the cost of a SELECT or UPDATE on even a single table is very high, not to mention multi-table joins.

2, concurrent processing of data

At some point, every Web 2.0 CTO reaches for the same weapon: the cache. Under high concurrency and heavy processing, however, the cache itself becomes a big problem. Within the application the cache is globally shared, so if two or more requests ask to update the cache at the same moment while it is being modified, the application can die outright. At this point you need a good strategy for concurrent data handling and a good caching strategy.
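One possible way to keep simultaneous requests from rebuilding the same cache entry is a per-key lock with a second check after the lock is acquired. The sketch below is only an illustration under stated assumptions: the cache lives in a single Python process, and the names `GuardedCache`, `get_or_load`, and `loader` are hypothetical, not taken from any particular framework.

```python
import threading

_MISSING = object()  # sentinel so that a cached None is still a hit


class GuardedCache:
    """In-process cache (assumption) where one thread at a time rebuilds a key."""

    def __init__(self):
        self._data = {}
        self._key_locks = {}
        self._registry_lock = threading.Lock()  # protects _key_locks only

    def _lock_for(self, key):
        # Create the per-key lock on first use, under the registry lock.
        with self._registry_lock:
            return self._key_locks.setdefault(key, threading.Lock())

    def get_or_load(self, key, loader):
        # Fast path: the value is already cached, no lock needed.
        value = self._data.get(key, _MISSING)
        if value is not _MISSING:
            return value
        with self._lock_for(key):
            # Re-check: another thread may have filled the entry while we waited.
            value = self._data.get(key, _MISSING)
            if value is _MISSING:
                value = loader(key)  # the expensive rebuild runs once per miss
                self._data[key] = value
            return value

    def invalidate(self, key):
        # Take the same per-key lock so an invalidation cannot interleave
        # with a rebuild of the same key.
        with self._lock_for(key):
            self._data.pop(key, None)
```

With this shape, eight threads asking for the same missing key trigger a single call to `loader`; the others either wait on the key's lock or hit the fast path. A cache shared across several servers would need a distributed equivalent, which this sketch does not cover.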

Another issue is database deadlock. We may not notice it in normal operation, but under high concurrency the probability of a deadlock is very high. Disk caching is a big problem as well.
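A common way to reduce deadlocks is to make every transaction acquire its locks in the same order. The sketch below illustrates the idea with in-process `threading.Lock` objects standing in for database row locks; that substitution is an assumption made to keep the example runnable, and `row_locks`, `balances`, and `transfer` are hypothetical names.

```python
import threading

# Stand-ins for two database rows and their row locks (assumption).
row_locks = {1: threading.Lock(), 2: threading.Lock()}
balances = {1: 100, 2: 100}


def transfer(src, dst, amount):
    """Move `amount` from row `src` to row `dst` without risking a deadlock."""
    if src == dst:
        return  # nothing to do, and locking the same row twice would block
    # Always lock the lower row id first, whatever the transfer direction.
    # Two opposite transfers then compete for the same first lock instead of
    # each holding one lock while waiting for the other.
    first, second = sorted((src, dst))
    with row_locks[first]:
        with row_locks[second]:
            balances[src] -= amount
            balances[dst] += amount
```

In a real database the same rule would mean touching tables and rows in one agreed order inside every transaction, and keeping transactions short.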

3, file storage issue

For 2.0 sites that support file uploads, while we rejoice that hard drive capacity keeps growing, we should think harder about how files are stored and how they can be indexed effectively. A common scheme stores files by date and type. But when the number of files is massive, for example a single disk holding 500 GB of small files, disk I/O becomes a huge problem during maintenance and use. Even if your bandwidth is sufficient, the disk may not be able to respond in time, and if uploads are happening at the same moment, the disk is easily overwhelmed.

RAID and dedicated storage servers may solve the immediate problem, but there is still the problem of access from different regions. If our servers are in Beijing, how do we handle access speed from Yunnan or Xinjiang? And if storage is distributed, how should the file index and directory structure be planned?

So we have to admit that file storage is not an easy problem at all.
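One way to keep any single directory from filling up with small files is to derive the path from a hash of the file's identifier. This is a different technique from the date-and-type scheme mentioned above, though the two can be combined. The sketch below is an assumption-laden illustration: the root `/data/uploads`, the function name `storage_path`, and the two-level fan-out are all hypothetical choices.

```python
import hashlib
from pathlib import PurePosixPath


def storage_path(file_id: str, root: str = "/data/uploads", depth: int = 2) -> PurePosixPath:
    """Map a file identifier to a nested directory derived from its hash."""
    digest = hashlib.sha1(file_id.encode("utf-8")).hexdigest()
    # Use the first `depth` byte pairs of the digest as directory names, so
    # files spread over 256 ** depth buckets instead of one huge directory.
    parts = [digest[i * 2:(i + 1) * 2] for i in range(depth)]
    return PurePosixPath(root, *parts, digest)
```

The same identifier always maps to the same path, so no separate lookup table is needed to find a file. The leading hash characters could also be used to pick a disk or a storage server, which is one possible answer to the index-planning question for distributed storage.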

4, handling data relationships

We can easily design a database that conforms to third normal form, full of many-to-many relationships, and even use GUIDs in place of IDENTITY columns. However, in the 2.0 era, where many-to-many relationships are everywhere, third normal form is the first thing that should be abandoned. Multi-table joins must be reduced to a minimum.
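As an illustration of trading normalization for fewer joins, the sketch below keeps a redundant `friend_count` column on the user row so the count can be read without touching the relationship table. It uses Python's built-in `sqlite3` module with an in-memory database; the schema and the names `add_friend` and `friend_count` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        friend_count INTEGER NOT NULL DEFAULT 0  -- redundant, denormalized
    );
    CREATE TABLE friendships (
        user_id INTEGER NOT NULL,
        friend_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, friend_id)
    );
    INSERT INTO users (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")


def add_friend(conn, user_id, friend_id):
    # Write the relationship and bump the counter together; `with conn`
    # commits both statements or rolls both back.
    with conn:
        conn.execute(
            "INSERT INTO friendships (user_id, friend_id) VALUES (?, ?)",
            (user_id, friend_id),
        )
        conn.execute(
            "UPDATE users SET friend_count = friend_count + 1 WHERE id = ?",
            (user_id,),
        )


def friend_count(conn, user_id):
    # Single-row primary-key lookup: no join and no COUNT(*) over friendships.
    row = conn.execute(
        "SELECT friend_count FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]
```

The cost is that every write path must keep the redundant column in step with the relationship table, which is the price of abandoning third normal form.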

5, data indexing problem

As we all know, an index is the cheapest and easiest way to improve database query efficiency. However, under heavy UPDATE load, the cost of updates and deletes can be unimaginably high. I once encountered a case where updating a clustered index took 10 minutes to complete, which is basically intolerable for a website.

Indexes and updates are natural enemies. Issues 1, 4, and 5 are ones we must consider when designing the architecture, and they may also be the ones that consume the most time.
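For batch writes, one commonly used way to limit index maintenance cost is to drop the index, load the rows, and rebuild the index once at the end. The sketch below shows the pattern with Python's built-in `sqlite3` module; the `posts` table, the index name, and the helper names are assumptions, and the pattern does not help with steady online writes, where the index has to stay in place.

```python
import sqlite3

INDEX_NAME = "idx_posts_author"  # hypothetical index name


def make_db():
    """Create an in-memory database with an indexed posts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT)"
    )
    conn.execute(f"CREATE INDEX {INDEX_NAME} ON posts (author_id)")
    return conn


def bulk_load(conn, rows):
    """Insert many (author_id, body) rows, rebuilding the index only once."""
    # Drop the index so each inserted row does not also update the index.
    conn.execute(f"DROP INDEX IF EXISTS {INDEX_NAME}")
    conn.executemany("INSERT INTO posts (author_id, body) VALUES (?, ?)", rows)
    # Rebuild the index in one pass over the loaded data.
    conn.execute(f"CREATE INDEX {INDEX_NAME} ON posts (author_id)")
    conn.commit()
```

While the index is missing, queries on `author_id` fall back to full scans, so this only suits maintenance windows or offline loads.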

6, distributed processing

For 2.0 sites, because they are highly interactive, the benefit of a CDN is essentially zero: content is updated in real time and has to go through normal processing. To guarantee access speed from every region, we face a major problem: how to synchronize and update data effectively. Real-time communication between servers in different locations is something that has to be considered.

7, Ajax Advantages and Disadvantages

AJAX makes a site, and AJAX can also break it. AJAX has become the mainstream trend, and suddenly POST and GET over XMLHTTP turn out to be very easy. The client sends GET or POST data to the server, and the server returns a response after receiving the request; this is a normal AJAX request. However, if we watch the traffic with a packet capture tool, the returned data and the processing are plain to see. For AJAX requests that trigger heavy computation, an attacker can build a request flooder and easily kill a web server.
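One widely used defence against a flood of expensive requests is per-client rate limiting. The sketch below implements a token bucket, a swapped-in technique that the text above does not name; the class name `TokenBucket` and its parameters are hypothetical, and in practice one bucket would be kept per client IP or session.

```python
import time


class TokenBucket:
    """Allow short bursts but cap the sustained request rate."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.clock = clock          # injectable so the logic can be tested
        self.updated = clock()

    def allow(self, cost=1.0):
        # Refill according to the time elapsed since the last call.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: reject or delay the request
```

An endpoint that does heavy computation can pass a larger `cost`, so that expensive calls drain the bucket faster than cheap ones.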

8, data security analysis

With HTTP, packets are transmitted in plaintext. We might say that we can use encryption, but for the situation in item 7 the encryption process itself may be in the clear (for example, as we know, QQ's encryption can be worked out easily and matching encryption and decryption routines written). When your site traffic is small, nobody cares about you, but once traffic grows, the so-called plug-ins and the so-called mass senders follow one after another (as the early QQ mass-messaging tools showed). We might say smugly that we can use higher-level checks or even HTTPS, but note that these measures come at a heavy cost in database, I/O, and CPU load. For some kinds of mass posting, prevention is basically impossible. I have already managed mass posting to Baidu Space and QQ Space; for anyone willing to try, it is really not difficult.

9, data synchronization and clustering

When a single database server can no longer cope, we need database load balancing and clustering. This may be the most troublesome problem of all. Data is transferred over the network, and depending on the database design, data latency is a frightening but unavoidable problem. We therefore need other means to keep interaction effective during delays of several seconds or even minutes, such as data hashing, partitioning, and content processing.
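The "data hashing" idea can be sketched as routing each user to one of several database servers by hashing the user id. The example below is a minimal illustration; the shard names in `SHARDS` and the function `shard_for` are hypothetical.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database servers


def shard_for(user_id: int, shards=SHARDS) -> str:
    """Pick the database that stores the rows for `user_id`."""
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so it is not suitable for routing across servers.
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]
```

Because all of a user's rows live on one shard, most requests touch a single server and do not have to wait for replication between servers. Note that with plain modulo routing, changing the number of shards remaps most keys; consistent hashing is a common alternative when shards are added often.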

10, data sharing and the OpenAPI trend

OpenAPI has become an inevitable trend. From Google, Facebook, and MySpace to Hainei and Xiaonei, everyone is considering it. It retains users more effectively, stimulates more user interest, and lets more people help you do the most effective development. At this point an effective data-sharing platform, an open data platform, becomes indispensable, and ensuring data security and performance while the interfaces are open is another issue we must consider seriously.
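One common way to protect an open interface is to give each third-party developer a secret key and require every request to carry an HMAC signature and a timestamp. The sketch below uses Python's standard `hmac` and `hashlib` modules; the message layout, the function names `sign_request` and `verify_request`, and the 300-second window are assumptions for illustration, not the scheme of any particular platform.

```python
import hashlib
import hmac


def sign_request(secret: bytes, method: str, path: str,
                 timestamp: int, body: bytes = b"") -> str:
    """Return a hex HMAC-SHA256 signature over the request fields."""
    message = b"\n".join([
        method.upper().encode("utf-8"),
        path.encode("utf-8"),
        str(timestamp).encode("utf-8"),
        body,
    ])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, timestamp: int,
                   body: bytes, signature: str, now: int,
                   max_skew: int = 300) -> bool:
    """Check the timestamp window first, then the signature."""
    # Reject stale requests to limit replay of captured traffic.
    if abs(now - timestamp) > max_skew:
        return False
    expected = sign_request(secret, method, path, timestamp, body)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

This only works when the secret stays on the developer's server; a secret embedded in browser-side script is as exposed as the client-side encryption discussed in item 8.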