<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Fixing a FinNLP crawler bug]]></title><description><![CDATA[<p dir="auto">I was using the example provided by FinNLP to fetch</p>
<h1>Eastmoney</h1>
<p dir="auto">from finnlp.data_sources.news.eastmoney_streaming import Eastmoney_Streaming<br />
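data from the Eastmoney (东方财富) forum,<br />
but the XPath no longer matched because the page's internal HTML had changed, so the published code could not fetch any data.<br />
I have now fixed it and submitted an issue.</p>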
<p dir="auto">The issue contains the corrected code:<br />
<a href="https://github.com/AI4Finance-Foundation/FinNLP/issues/3" rel="nofollow ugc">https://github.com/AI4Finance-Foundation/FinNLP/issues/3</a></p>
<pre><code>def _gather_pages(self, stock, page):
    ....
    # gather the content of the first page
    page = etree.HTML(response.text)
    # updated XPath: each table row under the main list is one post
    trs = page.xpath('//*[@id="mainlist"]/div/ul/li[1]/table/tbody/tr')
    columns = ["read amount", "comments", "title", "content link", "author", "create time"]
    have_one = False
    for item in trs:
        have_one = True
        # take the first text node (or href) of each cell in the row
        read_amount = item.xpath("./td[1]//text()")[0]
        comments = item.xpath("./td[2]//text()")[0]
        title = item.xpath("./td[3]/div/a//text()")[0]
        content_link = item.xpath("./td[3]/div/a/@href")[0]
        author = item.xpath("./td[4]//text()")[0]
        time = item.xpath("./td[5]//text()")[0]
        # build a one-row frame and append it to the accumulated results
        tmp = pd.DataFrame([read_amount, comments, title, content_link, author, time]).T
        tmp.columns = columns
        self.dataframe = pd.concat([self.dataframe, tmp])
    # no rows matched on this page
    if not have_one:
        return "break"
    ...
</code></pre>
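<p dir="auto">Since the breakage came from the page layout changing, the per-cell lookups above could also be made more tolerant of a missing cell. The following is only a rough sketch, not part of FinNLP: the helper names <code>first_or_default</code> and <code>parse_row</code> and the <code>ROW_PATHS</code> table are hypothetical, and the paths are simply copied from the snippet above. It assumes every path selects text nodes or attributes, so each match is a string.</p>
<pre><code>
# Hypothetical helpers, not part of FinNLP: tolerate cells that no longer match.

# Column name to relative XPath, copied from the corrected snippet above.
ROW_PATHS = {
    "read amount": "./td[1]//text()",
    "comments": "./td[2]//text()",
    "title": "./td[3]/div/a//text()",
    "content link": "./td[3]/div/a/@href",
    "author": "./td[4]//text()",
    "create time": "./td[5]//text()",
}


def first_or_default(node, path, default=""):
    """Return the first match of path under node, stripped, or default if none."""
    matches = node.xpath(path)
    return matches[0].strip() if matches else default


def parse_row(item):
    """Build one record from a table row; missing cells become empty strings."""
    return {name: first_or_default(item, path) for name, path in ROW_PATHS.items()}
</code></pre>
<p dir="auto">With something like this, a layout change would show up as empty fields in the result rather than an <code>IndexError</code> from the <code>[0]</code> lookups. The records returned by <code>parse_row</code> could be collected in a list and turned into a DataFrame once per page.</p>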
]]></description><link>http://localhost:4567/topic/176/修复-finnlp-爬虫bug</link><generator>RSS for Node</generator><lastBuildDate>Mon, 18 May 2026 12:26:24 GMT</lastBuildDate><atom:link href="http://localhost:4567/topic/176.rss" rel="self" type="application/rss+xml"/><pubDate>Sun, 18 Jun 2023 02:27:12 GMT</pubDate><ttl>60</ttl></channel></rss>