Heritrix job configuration experience

2010-04-02  Source: original to this site  Category: Java

We often run into the same questions: the site clearly links to plenty of other pages, so why do we crawl so few of them?
Why does the crawl move at a snail's pace? Why are the downloaded links not the ones we want?
Let's work through these one at a time.
Too few links are downloaded when the domain restriction is too narrow. With a restrictive DecidingScope, for example, a link that falls under a different second-level domain is never extracted, so we end up downloading very little. My personal recommendation is to use BroadScope.
With BroadScope, however, far too much is downloaded, because it imposes no restrictions at all. A lot of it is not what we want: js, css, jpg files and so on. To narrow things down we need to extend Heritrix's Extractor or Scheduler.
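To make the js, css, jpg problem concrete, here is a minimal sketch of the kind of check a custom Scheduler could run before queueing a URL. It is plain Java with no Heritrix dependency; the class name StaticResourceFilter, the method isUnwanted, and the suffix list are my own assumptions for illustration, not part of Heritrix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class StaticResourceFilter {
    // Hypothetical list of suffixes we do not want to download; extend as needed.
    private static final List<String> SKIPPED_SUFFIXES =
            Arrays.asList(".js", ".css", ".jpg");

    // Returns true when the URL points at a resource we want to skip.
    public static boolean isUnwanted(String url) {
        String path = url;
        // Drop the query string and fragment so "style.css?v=2" is still caught.
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query);
        }
        int fragment = path.indexOf('#');
        if (fragment >= 0) {
            path = path.substring(0, fragment);
        }
        String lower = path.toLowerCase(Locale.ROOT);
        for (String suffix : SKIPPED_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isUnwanted("http://example.com/a/style.css?v=2")); // true
        System.out.println(isUnwanted("http://example.com/news/1.html"));     // false
    }
}
```

A suffix check like this only removes the obvious noise; it does not decide which pages are actually worth crawling.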
Extending the Extractor or the Scheduler is troublesome, though. As we all know, Heritrix works like this: the scheduler decides which links get downloaded, and the URLs inside each downloaded page are then parsed out, so in the end every URL on every page is discovered. A custom regular expression in the post-processing step must therefore admit links layer by layer, with no gap in the chain; if an intermediate page is rejected, the pages behind it are never reached. Done properly, this gets us to the pages we need quickly. I suggest extending the Scheduler rather than the Extractor: with an Extractor you usually have to write the URL extraction yourself, and a regular expression often gives unsatisfactory results and pulls out only a few URLs.
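The "layer by layer" rule can be sketched as a whitelist with one pattern per level of the site. This is only an illustration under assumptions: the site www.example.com, its home/list/article layout, and the names LayeredUrlFilter and shouldSchedule are all hypothetical. In a real job the check would be called from the custom Scheduler (for instance a subclass of Heritrix's FrontierScheduler); that wiring depends on the Heritrix version and is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LayeredUrlFilter {
    // One pattern per layer of the (hypothetical) site:
    // home page -> list pages -> article pages.
    // Leaving out the middle layer would cut the crawl off from the articles.
    private static final List<Pattern> LAYERS = Arrays.asList(
            Pattern.compile("http://www\\.example\\.com/?"),
            Pattern.compile("http://www\\.example\\.com/list/\\d+\\.html"),
            Pattern.compile("http://www\\.example\\.com/article/\\d+\\.html"));

    // Returns true when the URL belongs to one of the allowed layers.
    public static boolean shouldSchedule(String url) {
        for (Pattern layer : LAYERS) {
            // matches() requires the whole URL to match the pattern.
            if (layer.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldSchedule("http://www.example.com/list/3.html"));      // true
        System.out.println(shouldSchedule("http://www.example.com/static/site.css"));  // false
    }
}
```

Because the URLs are already extracted by Heritrix before the Scheduler sees them, the patterns only have to accept or reject whole URLs, which is much easier to get right than pulling URLs out of raw HTML in an Extractor.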
