Our self-developed embedded browser has been running for some time since its release, but we have never collected detailed statistics on its use. A requirement has now come up for more detailed statistics.
The browser accesses Web content through a proxy server. The proxy converts each HTML page into a proprietary binary protocol before returning it to the browser, which saves traffic and speeds up browsing. Because every request passes through the proxy, the statistics we need can be gathered by processing on the proxy server alone.
The first question is how to record user access data. The company already has a data warehouse and a data analysis system, with dedicated staff who provide data analysis. So the first plan we considered was to write users' HTTP access records directly to a database, load them into the data warehouse, and then have the warehouse staff analyze them. However, the data warehouse currently takes a long time to process and analyze data, so statistics produced this way would not be timely enough.
In the end we decided to write the access records to log files in Apache's HTTP log format (the combined log format):
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
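To illustrate what each entry contains, here is a minimal Python sketch that parses one line in the combined format above into named fields. The regex and the names COMBINED_RE and parse_combined are our own illustration, not part of Apache or AWStats, and the sketch assumes the English month abbreviations that Apache writes in %t.

```python
import re
from datetime import datetime

# One group per field of: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Quoted fields allow backslash-escaped characters such as \"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)

def parse_combined(line):
    """Return a dict of fields for one combined-format line, or None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # %t looks like 10/Oct/2000:13:55:36 -0700
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # %b is "-" rather than 0 when no body bytes were sent
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec
```

Lines that do not match return None, so a caller can count and skip malformed entries instead of aborting a whole run.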
Because the proxy fetches pages from many different sites, we added a Host field at the front of each entry, as follows:
"%Host %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
The resulting logs are then analyzed directly with AWStats.
Because the proxy service is deployed on multiple nodes spread across different IDCs (Internet data centers), we also have to solve the problem of merging logs. At present we only merge the logs of a single server; data from different servers cannot yet be merged.
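Merging log files amounts to a k-way merge by timestamp, since each file is already in time order. AWStats ships a logresolvemerge.pl tool for this job; the Python sketch below only illustrates the idea, and merge_logs is a hypothetical name. It assumes the first [...] group on each line is the %t field. Because %t carries a UTC offset, comparing timezone-aware timestamps would also keep the order correct for files from nodes that log in different time zones, once those files have been collected in one place.

```python
import heapq
from datetime import datetime

def _timestamp(line):
    # Assumes the first [...] group is %t, e.g. [01/May/2010:12:00:00 +0800]
    start = line.index("[")
    end = line.index("]", start)
    return datetime.strptime(line[start + 1:end], "%d/%b/%Y:%H:%M:%S %z")

def merge_logs(*sources):
    """Lazily merge several time-ordered iterables of log lines into one
    time-ordered stream; lines are yielded unchanged."""
    return heapq.merge(*sources, key=_timestamp)
```

Because heapq.merge is lazy, passing open file objects (for example merge_logs(open("node1.log"), open("node2.log")), with hypothetical file names) streams the result without loading either file into memory.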
However, HTTP logs are easy to import into the data warehouse, so overall statistical analysis across all servers can be left to the data warehouse at a later stage.
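For that later stage, one simple way to hand the logs to the warehouse is to split each line into columns and write a tab-separated file for a bulk loader. This is a sketch under assumptions: logs_to_tsv and the column names are hypothetical, the loader is assumed to accept TSV with a header row, and the timestamp is kept as raw text for the loader to parse.

```python
import csv
import re

# Host-prefixed layout: %Host %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
LINE_RE = re.compile(
    r'(?P<site>\S+) (?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)
COLUMNS = ["site", "client", "ident", "user", "time", "request",
           "status", "size", "referer", "agent"]

def logs_to_tsv(lines, out):
    """Write matching lines to `out` as tab-separated rows with a header.
    Quoted fields are copied as-is, including backslash escapes.
    Returns the number of lines that did not match and were skipped."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    skipped = 0
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            skipped += 1
            continue
        writer.writerow([m.group(c) for c in COLUMNS])
    return skipped
```

Returning the skip count gives a cheap sanity check on each import batch before it is loaded.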