How to call Python from Java to download web pages

Reference for this post: http://tonl.iteye.com/blog/1918245

Python version: 2.7, 64-bit Windows build.

Download Python: http://www.python.org/getit/

Pick the Python 2.7.5 Windows X86-64 Installer (Windows AMD64 / Intel 64 / X86-64 binary [1], does not include source) and install it.

First, write the following spider.py script:

# -*- coding: utf-8 -*-
# Python 2.7 script: download every URL listed in a text file.
from urllib import urlopen
import os
import sys


class Spider:
    """
    Download the web pages listed in the given file.
    """
    def __init__(self, filename, downloadPath):
        """
        Store the URL list file and the download directory; exit if either is missing.
        """
        if not os.path.isfile(filename):
            print 'the given file does not exist, the program will exit'
            sys.exit(1)
        if not os.path.isdir(downloadPath):
            print 'the given download path does not exist, the program will exit'
            sys.exit(1)
        self.fname = filename
        self.dpath = downloadPath

    def download(self):
        """
        Download each URL in the file, one URL per line.
        """
        fp = open(self.fname, 'r')
        for line in fp:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            # Build a file name from the letters and digits of the URL.
            if 'html' in url:
                tempname = filter(str.isalnum, url).replace('html', '.html')
            else:
                tempname = filter(str.isalnum, url) + '.html'
            self.download_html(url, os.path.join(self.dpath, tempname))
        fp.close()

    def download_html(self, website, filename):
        """
        Fetch the page at the given URL and save it under the given file name.
        """
        response = urlopen(website)
        data = response.read()
        fp = open(filename, 'wb')  # overwrite instead of appending on re-runs
        fp.write(data)
        fp.close()


def test():
    """
    Entry point: argv[1] is the URL list file, argv[2] the download directory.
    """
    filename = sys.argv[1]
    downloadPath = sys.argv[2]
    spider = Spider(filename, downloadPath)
    spider.download()


if __name__ == '__main__':
    test()

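The script above takes two arguments. The first is a file listing the addresses of the pages to download, one per line, in a format like this (websites.txt):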

http://blog.csdn.net/fansy1990  
http://www.baidu.com

The second argument is the directory where the downloaded pages are saved.
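The way spider.py turns each line of websites.txt into a file name can be tried in isolation. The following is only a sketch: the helper name `url_to_filename` is hypothetical and does not appear in the script, and it is written so that it should run on both Python 2.7 and Python 3. It keeps only the letters and digits of the line and then makes sure the name ends in .html, mirroring the rule in the script.

```python
def url_to_filename(line):
    """Mirror spider.py's naming rule for one line of websites.txt."""
    # Keep only letters and digits; this also drops spaces and the newline.
    name = ''.join(ch for ch in line if ch.isalnum())
    if 'html' in line:
        # Same as the script: every 'html' in the name becomes '.html'.
        return name.replace('html', '.html')
    return name + '.html'


# The two sample lines from websites.txt.
for url in ['http://blog.csdn.net/fansy1990  \n', 'http://www.baidu.com\n']:
    print(url_to_filename(url))
```

Because every punctuation character is dropped, two different addresses can in principle map to the same file name; for a short list like the one above that is unlikely to matter.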

Then run it from the command line:

python D:\spider.py D:\websites.txt D:\download_tmp

Then look in download_tmp on drive D for the downloaded files. If they are there, the setup is correct.
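Checking the folder by hand is enough, but the same check can be scripted. This is a minimal sketch rather than part of the original tutorial: the function name `list_downloads` is made up for illustration, and the path D:\download_tmp is assumed to be the download directory passed on the command line.

```python
import os


def list_downloads(directory):
    """Return the sorted names of the .html files saved in directory."""
    return sorted(n for n in os.listdir(directory) if n.endswith('.html'))


if __name__ == '__main__':
    # Assumed path: the download directory used when running spider.py.
    target = 'D:\\download_tmp'
    if os.path.isdir(target):
        for name in list_downloads(target):
            print(name)
    else:
        print('directory not found: ' + target)
```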

Finally, write the following Java program. The jython-*.jar package needs to be imported (the author downloaded version 2.2):

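package test;

import java.io.IOException;

public class PyTest {

    /**
     * Runs spider.py with the system Python interpreter and waits for it to finish.
     *
     * @param args not used
     * @throws IOException if the python process cannot be started
     * @throws InterruptedException if the wait is interrupted
     */
    public static void main(String[] args) throws IOException, InterruptedException {
        String py_path = "D:\\spider.py";
        String websites = "D:\\websites.txt";
        // Download directory; it must already exist.
        String outDir = "D:\\tmp";
        // Pass the command as an array so each argument stays intact; "python" must be on PATH.
        Process pr = Runtime.getRuntime().exec(new String[] {"python", py_path, websites, outDir});
        int exitCode = pr.waitFor();
        System.out.println("done ... exit code " + exitCode);
    }

}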

To run the program above, configure the Environment settings in Eclipse: add a PATH variable whose value is the Python installation directory.

After it runs, the following message appears:

*sys-package-mgr*: can't create package cache dir, '*jython-2.2.jar\cachedir\packages'

This can be ignored; it does not affect how the program runs.
