First of all, let me be clear about what I mean by dynamic data.
Now, down to business.
Crawling a simple static page is straightforward: use Java to fetch the page's HTML source, then parse that source to extract the information you want. For example, to get the weather for Hangzhou from the China Weather Network, you only need to find the corresponding HTML page ( http://www.weather.com.cn/weather/101210101.shtml ).
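As a rough illustration of the "fetch the HTML source" step, here is a minimal sketch using only the standard library. The class name PageFetcher, the method names, the 5-second timeouts, and the choice of UTF-8 are all my own assumptions, not something the site prescribes; check the page's actual encoding before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names and settings are illustrative assumptions.
final class PageFetcher {

    // Reads the stream to the end and decodes the bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Downloads the HTML source of a page.
    // Assumption: the page is served as UTF-8.
    static String fetch(String pageUrl) throws IOException {
        URLConnection conn = URI.create(pageUrl).toURL().openConnection();
        conn.setConnectTimeout(5000); // milliseconds, arbitrary choice
        conn.setReadTimeout(5000);    // milliseconds, arbitrary choice
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }
}
```

Calling PageFetcher.fetch("http://www.weather.com.cn/weather/101210101.shtml") should then return the source of the Hangzhou page as a string, provided the site is reachable and the encoding assumption holds.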
Now suppose I want to enter a city name to switch cities, still using the China Weather Network as the data source. The first step is to find the page that corresponds to the city. A quick look shows that each city has its own ID in the page URL (Hangzhou, for example, is 101210101), so the key is for the program to know the mapping between cities and pages.
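To make the city-to-page idea concrete, here is a small sketch of the lookup. The class CityPages and its methods are hypothetical names of mine. Only the Hangzhou entry comes from the example above; the URL pattern is inferred from that single URL and is an assumption for other cities.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table; only the Hangzhou ID is taken from the text.
final class CityPages {

    // City name -> city ID as it appears in the page URL.
    private static final Map<String, String> CITY_IDS = new HashMap<>();
    static {
        CITY_IDS.put("Hangzhou", "101210101");
    }

    // Builds the page URL for an ID, following the pattern of the Hangzhou URL.
    static String urlForId(String cityId) {
        return "http://www.weather.com.cn/weather/" + cityId + ".shtml";
    }

    // Resolves a city name to its page URL; fails loudly for unknown cities.
    static String urlForCity(String cityName) {
        String id = CITY_IDS.get(cityName);
        if (id == null) {
            throw new IllegalArgumentException("Unknown city: " + cityName);
        }
        return urlForId(id);
    }
}
```

The hard part, of course, is filling the table for every city rather than typing the entries by hand.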
I noticed that the site's search box contains links to most cities in China, which gives the mapping between each city and its ID. With that breakthrough found, it is time to act: open the home page, view its source, and locate the search box.
What I can do for now is use Chrome to save the HTML to a file and then parse that file to get the mapping between cities and URLs. The drawback is that if the site ever changes this mapping, the program has to be changed in response, which is a passive way of working.
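Parsing the saved file could look roughly like the sketch below. I have not reproduced the real markup of the search box here, so the link format is an assumption: the code expects anchors such as <a href="http://www.weather.com.cn/weather/101210101.shtml">Hangzhou</a>, with the city name as the link text. The class name CityLinkParser, the regular expression, and the UTF-8 encoding of the saved file are likewise my own choices and would need adjusting to the actual page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; the assumed link format may differ from the real page.
final class CityLinkParser {

    // Group 1: numeric city ID in the URL. Group 2: link text (city name).
    private static final Pattern CITY_LINK = Pattern.compile(
            "<a\\s[^>]*?href=\"[^\"]*/weather/(\\d+)\\.shtml\"[^>]*>([^<]+)</a>",
            Pattern.CASE_INSENSITIVE);

    // Extracts city name -> city ID pairs from HTML text, in document order.
    static Map<String, String> parse(String html) {
        Map<String, String> cityIds = new LinkedHashMap<>();
        Matcher m = CITY_LINK.matcher(html);
        while (m.find()) {
            cityIds.put(m.group(2).trim(), m.group(1));
        }
        return cityIds;
    }

    // Reads a file saved from the browser (assumed UTF-8) and parses it.
    static Map<String, String> parseFile(Path savedPage) throws IOException {
        byte[] bytes = Files.readAllBytes(savedPage);
        return parse(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

A regular expression is enough for a quick experiment like this; a real HTML parser would be more robust if the markup turns out to be irregular.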