【Python】Python 3.6 request to a website fails with: http.client.RemoteDisconnected: Remote end closed connection without response

The crawler code fails with: http.client.RemoteDisconnected: Remote end closed connection without response

Cause: the server restricts access based on the User-Agent.

1. What is a User-Agent?

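Some websites do not want to be accessed by crawlers, so they inspect whoever is connecting. If the client looks like a crawler, that is, not a person clicking through in a browser, the site refuses further access. To keep the program working, the crawler therefore has to hide its identity, and we can do that by setting the User-Agent (abbreviated UA).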

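The User-Agent is sent in the request headers, and the server reads the User-Agent header to decide who is visiting. In Python, if you do not set a User-Agent, the library default is used, and that default contains the word "Python" (urllib sends "Python-urllib/3.6" on Python 3.6). If the server checks the User-Agent, a Python program that has not set one will not be able to access the site normally; with urllib this can surface as the RemoteDisconnected error above.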
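To see the default for yourself, you can build a urllib opener and inspect the headers it adds to every request. This sketch only reads the headers and sends nothing over the network; the version number in the output depends on your Python installation.

```python
import urllib.request

# build_opener() returns an OpenerDirector; its addheaders list holds the
# default headers urllib attaches to every request it sends.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# On Python 3.6 this prints {'User-agent': 'Python-urllib/3.6'}
print(default_headers)
```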

Python lets us change this User-Agent so that the request looks like it came from a browser.
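A minimal sketch of overriding the header with urllib, assuming a single fixed browser string (taken from the list in section 2) and a placeholder URL that is not from the original post. The request is only built here; the commented lines show how it would be sent.

```python
import urllib.request

# A browser-style User-Agent copied from the list in section 2.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 "
              "(KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11")


def make_request(url):
    """Build a Request that sends a browser User-Agent instead of Python-urllib."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


req = make_request("http://example.com/")  # placeholder URL
# urllib normalises header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))

# To actually send it:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     html = resp.read()
```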

2. How to get around the restriction

The answer is to use a random User-Agent, i.e. pick one at random from a predefined list of User-Agent strings. A list of common User-Agent strings:

from random import randint

USER_AGENTS = [
 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
 "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
 "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
 "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
 "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
 "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
 "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
 "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
 "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
 "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
 "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
 "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]

random_agent = USER_AGENTS[randint(0, len(USER_AGENTS)-1)]
headers = {
    'User-Agent': random_agent,
}
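The snippet above only builds the headers dict. Below is a sketch of how it might be wired into an actual request with urllib. It uses random.choice, which is equivalent to indexing with randint(0, len-1), and the retry loop that draws a fresh User-Agent on each RemoteDisconnected is an illustrative addition, not part of the original post. The pool is shortened for brevity; in practice you would reuse the full USER_AGENTS list.

```python
import random
import urllib.request
from http.client import RemoteDisconnected

# Shortened pool for the sketch; reuse the full USER_AGENTS list in practice.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]


def build_request(url):
    # random.choice(seq) picks the same way as seq[randint(0, len(seq)-1)].
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


def fetch(url, retries=3, timeout=10):
    """Fetch url, drawing a new random User-Agent on every attempt (illustrative retry)."""
    last_error = None
    for _ in range(retries):
        try:
            with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
                return resp.read()
        except RemoteDisconnected as exc:
            # The server closed the connection without replying; try another UA.
            last_error = exc
    raise RuntimeError("all %d attempts were disconnected" % retries) from last_error
```

Usage would be something like `html = fetch("http://example.com/")` with your own target URL. If every attempt is still disconnected, the User-Agent is probably not the only thing the server checks.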

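Note that random.randint(a, b) samples from the closed interval [a, b], so b itself can be returned; that is why the upper bound above is len(USER_AGENTS)-1.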
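A small check of the endpoint behaviour, using a hypothetical three-item list. With a fixed seed and 1000 draws every valid index shows up, and the comparison with random.randrange shows the half-open alternative.

```python
import random

items = ["a", "b", "c"]
random.seed(0)  # fixed seed so the run is repeatable

# randint(a, b) includes b, so randint(0, len(items) - 1) yields 0, 1 and 2.
seen = {random.randint(0, len(items) - 1) for _ in range(1000)}
print(sorted(seen))  # [0, 1, 2]

# randrange(n) excludes n, so randrange(len(items)) is also a valid index.
seen_rr = {random.randrange(len(items)) for _ in range(1000)}
print(sorted(seen_rr))  # [0, 1, 2]

# By contrast, randint(0, len(items)) can return 3, so
# items[random.randint(0, len(items))] would eventually raise IndexError.
```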

If the error persists after this, consider whether the site has blocked your IP address. For setting up random IP proxies, see: https://blog.csdn.net/c406495762/article/details/60137956
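As a rough idea of what routing through a proxy looks like with urllib, here is a sketch using ProxyHandler. The proxy addresses are hypothetical placeholders from the 203.0.113.0/24 documentation range and must be replaced with working proxies; the linked article may use a different approach.

```python
import random
import urllib.request

# Hypothetical proxy pool (documentation-only addresses); replace before use.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
]

UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 "
      "(KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11")


def build_proxy_opener(user_agent):
    """Return an opener that sends traffic through a randomly chosen proxy."""
    proxy = random.choice(PROXIES)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    # Also replace the default Python-urllib User-Agent.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = build_proxy_opener(UA)
# response = opener.open("http://example.com/", timeout=10)
```

Building a new opener per request (or per failure) is one simple way to rotate both the IP and the User-Agent together.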

 

©️2019 CSDN