In Python code, why would you define a function inside another function, and where is that useful?

xinali · 2016-07-16 14:17:21 +08:00 · 3190 views

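    First, one use of defining a function inside a function in Python is the decorator. I learned this from my previous question and have since studied decorators in some detail: their main purpose is to add functionality to a function dynamically, which is clearly useful in testing, logging, and similar tasks.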
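
For reference, the decorator case mentioned above can be sketched as follows. This is a minimal illustration written for Python 3; the names `log_calls` and `add` are made up for the example and do not come from sqlmap.

```python
import functools


def log_calls(func):
    """Decorator: remember the arguments of every call, then delegate to func."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        # wrapper is defined inside log_calls so that it can see `func`.
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []  # hypothetical attribute used as a tiny in-memory log
    return wrapper


@log_calls
def add(a, b):
    return a + b


result = add(1, 2)
```

The inner `wrapper` has to be nested here: it is created anew for each decorated function and needs access to that specific `func`.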

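    Today, while reading the sqlmap source, I came across the crawler file crawler.py. Part of it defines a function inside another function, and not as a decorator. From reading the code, it looks like the inner function could just as well be defined outside without any problem. So why define it this way? What is the main benefit? Sharing variables, or hiding the function, for example?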

    Part of the code is pasted below:

    def crawl(target):
        try:
            visited = set()
            threadData = getCurrentThreadData()
            threadData.shared.value = oset()
    
            def crawlThread():
                threadData = getCurrentThreadData()
    
                while kb.threadContinue:
                    with kb.locks.limit:
                        if threadData.shared.unprocessed:
                            current = threadData.shared.unprocessed.pop()
                            if current in visited:
                                continue
                            elif conf.crawlExclude and re.search(conf.crawlExclude, current):
                                dbgMsg = "skipping '%s'" % current
                                logger.debug(dbgMsg)
                                continue
                            else:
                                visited.add(current)
                        else:
                            break
    
                    content = None
                    try:
                        if current:
                            content = Request.getPage(url=current, crawling=True, raise404=False)[0]
                    except SqlmapConnectionException, ex:
                        errMsg = "connection exception detected (%s). skipping " % ex
                        errMsg += "URL '%s'" % current
                        logger.critical(errMsg)
                    except SqlmapSyntaxException:
                        errMsg = "invalid URL detected. skipping '%s'" % current
                        logger.critical(errMsg)
                    except httplib.InvalidURL, ex:
                        errMsg = "invalid URL detected (%s). skipping " % ex
                        errMsg += "URL '%s'" % current
                        logger.critical(errMsg)
    
                    if not kb.threadContinue:
                        break
    
                    if isinstance(content, unicode):
                        try:
                            match = re.search(r"(?si)<html[^>]*>(.+)</html>", content)
                            if match:
                                content = "<html>%s</html>" % match.group(1)
    
                            soup = BeautifulSoup(content)
                            tags = soup('a')
    
                            if not tags:
                                tags = re.finditer(r'(?si)<a[^>]+href="(?P<href>[^>"]+)"', content)
    
                            for tag in tags:
                                href = tag.get("href") if hasattr(tag, "get") else tag.group("href")
    
                                if href:
                                    if threadData.lastRedirectURL and threadData.lastRedirectURL[0] == threadData.lastRequestUID:
                                        current = threadData.lastRedirectURL[1]
                                    url = urlparse.urljoin(current, href)
    
                                    # flag to know if we are dealing with the same target host
                                    _ = reduce(lambda x, y: x == y, map(lambda x: urlparse.urlparse(x).netloc.split(':')[0], (url, target)))
    
                                    if conf.scope:
                                        if not re.search(conf.scope, url, re.I):
                                            continue
                                    elif not _:
                                        continue
    
                                    if url.split('.')[-1].lower() not in CRAWL_EXCLUDE_EXTENSIONS:
                                        with kb.locks.value:
                                            threadData.shared.deeper.add(url)
                                            if re.search(r"(.*?)\?(.+)", url):
                                                threadData.shared.value.add(url)
                        except UnicodeEncodeError:  # for non-HTML files
                            pass
                        finally:
                            if conf.forms:
                                findPageForms(content, current, False, True)
    
                    if conf.verbose in (1, 2):
                        threadData.shared.count += 1
                        status = '%d/%d links visited (%d%%)' % (threadData.shared.count, threadData.shared.length, round(100.0 * threadData.shared.count / threadData.shared.length))
                        dataToStdout("\r[%s] [INFO] %s" % (time.strftime("%X"), status), True)
    
            threadData.shared.deeper = set()
            threadData.shared.unprocessed = set([target])
    
            if not conf.sitemapUrl:
                message = "do you want to check for the existence of "
                message += "site's sitemap(.xml) [y/N] "
                test = readInput(message, default="n")
                if test[0] in ("y", "Y"):
                    found = True
                    items = None
                    url = urlparse.urljoin(target, "/sitemap.xml")
                    try:
                        items = parseSitemap(url)
                    except SqlmapConnectionException, ex:
                        if "page not found" in getSafeExString(ex):
                            found = False
                            logger.warn("'sitemap.xml' not found")
                    except:
                        pass
                    finally:
                        if found:
                            if items:
                                for item in items:
                                    if re.search(r"(.*?)\?(.+)", item):
                                        threadData.shared.value.add(item)
                                if conf.crawlDepth > 1:
                                    threadData.shared.unprocessed.update(items)
                            logger.info("%s links found" % ("no" if not items else len(items)))
    
            infoMsg = "starting crawler"
            if conf.bulkFile:
                infoMsg += " for target URL '%s'" % target
            logger.info(infoMsg)
    
            for i in xrange(conf.crawlDepth):
                threadData.shared.count = 0
                threadData.shared.length = len(threadData.shared.unprocessed)
                numThreads = min(conf.threads, len(threadData.shared.unprocessed))
    
                if not conf.bulkFile:
                    logger.info("searching for links with depth %d" % (i + 1))
    
                runThreads(numThreads, crawlThread, threadChoice=(i>0))
                clearConsoleLine(True)
    
                if threadData.shared.deeper:
                    threadData.shared.unprocessed = set(threadData.shared.deeper)
                else:
                    break
    
        except KeyboardInterrupt:
            warnMsg = "user aborted during crawling. sqlmap "
            warnMsg += "will use partial list"
            logger.warn(warnMsg)
    
        finally:
            clearConsoleLine(True)
    
            if not threadData.shared.value:
                warnMsg = "no usable links found (with GET parameters)"
                logger.warn(warnMsg)
            else:
                for url in threadData.shared.value:
                    kb.targets.add((url, None, None, None, None))
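
In the code above, `crawlThread` takes no parameters, yet it reads `target` and updates `visited`, both of which belong to the enclosing `crawl` call; `runThreads` only receives the function object. The sketch below reproduces that pattern with the standard `threading` module. It is a simplified illustration in Python 3, not sqlmap's implementation: `crawl_all`, `worker`, and the `links` dictionary are assumptions made for the example.

```python
import threading


def crawl_all(start, links):
    """Return every page reachable from start; `links` maps page -> list of pages."""
    visited = set()        # shared with worker() through the closure
    unprocessed = [start]  # work queue, also captured by worker()
    lock = threading.Lock()

    def worker():
        # No parameters: all state comes from the enclosing crawl_all() call.
        while True:
            with lock:
                if not unprocessed:
                    break
                current = unprocessed.pop()
                if current in visited:
                    continue
                visited.add(current)
                unprocessed.extend(links.get(current, []))

    threads = [threading.Thread(target=worker) for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return visited


pages = crawl_all("a", {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": []})
```

If `worker` lived at module level, `visited`, `unprocessed`, and `lock` would have to be passed in explicitly or stored in globals. Nesting keeps that state private to one `crawl_all` call.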
    
    
    5 replies    2016-07-17 09:48:48 +08:00
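    imn1 · #1 · 2016-07-16 14:27:23 +08:00
    It's pretty common, isn't it?
    Take a simple case:
    if a function needs to run the same piece of code repeatedly, isn't it more convenient to define a function for it?
    [fun(x) for x in y]

    As for not putting it outside, the reason is just as simple: the function has nothing to do with the outside code. That is arguably good practice too.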
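
A small sketch of the case described in this reply, assuming a made-up `summarize` function whose helper `normalize` is only meaningful inside it:

```python
def summarize(words):
    """Return the normalized, de-duplicated words in first-seen order."""
    def normalize(word):
        # Only used by summarize, so it is not exposed at module level.
        return word.strip().lower()

    seen = []
    for w in [normalize(x) for x in words]:
        if w and w not in seen:
            seen.append(w)
    return seen


cleaned = summarize([" Foo", "foo ", "BAR", ""])
```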
    quxw · #2 · 2016-07-16 14:27:27 +08:00
    Closures, decorators.
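
To make the closure half of this reply concrete, here is a minimal sketch (the names `make_multiplier`, `double`, and `triple` are illustrative only). Each call to the outer function produces an inner function that remembers its own `factor`:

```python
def make_multiplier(factor):
    def multiply(x):
        # `factor` is looked up in the enclosing scope when multiply runs.
        return x * factor
    return multiply


double = make_multiplier(2)
triple = make_multiplier(3)
```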
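    dalang · #3 · 2016-07-16 14:37:36 +08:00 via iPhone
    Besides improving readability and so on, nested functions also come with some restrictions in use. For example, variables of the outer function are readable from the nested function.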
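
The restriction hinted at here is that a nested function can read the outer function's variables but cannot rebind them by plain assignment. Python 3 adds the `nonlocal` statement for that; Python 2, which the quoted sqlmap code targets, has no equivalent, and the quoted code only mutates `visited` in place with `.add()`. A sketch in Python 3, with `make_counter` and its inner functions being illustrative names:

```python
def make_counter():
    count = 0

    def peek():
        return count  # reading the outer variable always works

    def broken_increment():
        count += 1  # assignment makes `count` local here, so this raises
        return count

    def increment():
        nonlocal count  # Python 3 only: rebind the outer variable
        count += 1
        return count

    return peek, broken_increment, increment


peek, broken_increment, increment = make_counter()
increment()
increment()

try:
    broken_increment()
    failed = False
except UnboundLocalError:
    failed = True
```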
    lithium4010 · #4 · 2016-07-17 09:48:08 +08:00 via Android
    This code is crap.
    lithium4010 · #5 · 2016-07-17 09:48:48 +08:00 via Android
    Feels like I've walked into a garbage dump.