Package cherrypy :: Package lib :: Module cpstats
[hide private]
[frames] | no frames]

Source Code for Module cherrypy.lib.cpstats

  1  """CPStats, a package for collecting and reporting on program statistics. 
  2   
  3  Overview 
  4  ======== 
  5   
  6  Statistics about program operation are an invaluable monitoring and debugging 
  7  tool. Unfortunately, the gathering and reporting of these critical values is 
  8  usually ad-hoc. This package aims to add a centralized place for gathering 
  9  statistical performance data, a structure for recording that data which 
 10  provides for extrapolation of that data into more useful information, 
 11  and a method of serving that data to both human investigators and 
 12  monitoring software. Let's examine each of those in more detail. 
 13   
 14  Data Gathering 
 15  -------------- 
 16   
 17  Just as Python's `logging` module provides a common importable for gathering 
 18  and sending messages, performance statistics would benefit from a similar 
 19  common mechanism, and one that does *not* require each package which wishes 
 20  to collect stats to import a third-party module. Therefore, we choose to 
 21  re-use the `logging` module by adding a `statistics` object to it. 
 22   
 23  That `logging.statistics` object is a nested dict. It is not a custom class, 
 24  because that would: 
 25   
 26   1. require libraries and applications to import a third-party module in 
 27      order to participate 
 28   2. inhibit innovation in extrapolation approaches and in reporting tools, and 
 29   3. be slow. 
 30   
 31  There are, however, some specifications regarding the structure of the dict.:: 
 32   
 33     { 
 34       +----"SQLAlchemy": { 
 35       |        "Inserts": 4389745, 
 36       |        "Inserts per Second": 
 37       |            lambda s: s["Inserts"] / (time() - s["Start"]), 
 38       |  C +---"Table Statistics": { 
 39       |  o |        "widgets": {-----------+ 
 40     N |  l |            "Rows": 1.3M,      | Record 
 41     a |  l |            "Inserts": 400,    | 
 42     m |  e |        },---------------------+ 
 43     e |  c |        "froobles": { 
 44     s |  t |            "Rows": 7845, 
 45     p |  i |            "Inserts": 0, 
 46     a |  o |        }, 
 47     c |  n +---}, 
 48     e |        "Slow Queries": 
 49       |            [{"Query": "SELECT * FROM widgets;", 
 50       |              "Processing Time": 47.840923343, 
 51       |              }, 
 52       |             ], 
 53       +----}, 
 54     } 
 55   
 56  The `logging.statistics` dict has four levels. The topmost level is nothing 
 57  more than a set of names to introduce modularity, usually along the lines of 
 58  package names. If the SQLAlchemy project wanted to participate, for example, 
 59  it might populate the item `logging.statistics['SQLAlchemy']`, whose value 
 60  would be a second-layer dict we call a "namespace". Namespaces help multiple 
 61  packages to avoid collisions over key names, and make reports easier to read, 
 62  to boot. The maintainers of SQLAlchemy should feel free to use more than one 
 63  namespace if needed (such as 'SQLAlchemy ORM'). Note that there are no case 
 64  or other syntax constraints on the namespace names; they should be chosen 
 65  to be maximally readable by humans (neither too short nor too long). 
 66   
 67  Each namespace, then, is a dict of named statistical values, such as 
 68  'Requests/sec' or 'Uptime'. You should choose names which will look 
 69  good on a report: spaces and capitalization are just fine. 
 70   
 71  In addition to scalars, values in a namespace MAY be a (third-layer) 
 72  dict, or a list, called a "collection". For example, the CherryPy 
 73  :class:`StatsTool` keeps track of what each request is doing (or has most 
 74  recently done) in a 'Requests' collection, where each key is a thread ID; each 
 75  value in the subdict MUST be a fourth dict (whew!) of statistical data about 
 76  each thread. We call each subdict in the collection a "record". Similarly, 
 77  the :class:`StatsTool` also keeps a list of slow queries, where each record 
 78  contains data about each slow query, in order. 
 79   
 80  Values in a namespace or record may also be functions, which brings us to: 
 81   
 82  Extrapolation 
 83  ------------- 
 84   
 85  The collection of statistical data needs to be fast, as close to unnoticeable 
 86  as possible to the host program. That requires us to minimize I/O, for example, 
 87  but in Python it also means we need to minimize function calls. So when you 
 88  are designing your namespace and record values, try to insert the most basic 
 89  scalar values you already have on hand. 
 90   
 91  When it comes time to report on the gathered data, however, we usually have 
 92  much more freedom in what we can calculate. Therefore, whenever reporting 
 93  tools (like the provided :class:`StatsPage` CherryPy class) fetch the contents 
 94  of `logging.statistics` for reporting, they first call 
 95  `extrapolate_statistics` (passing the whole `statistics` dict as the only 
 96  argument). This makes a deep copy of the statistics dict so that the 
 97  reporting tool can both iterate over it and even change it without harming 
 98  the original. But it also expands any functions in the dict by calling them. 
 99  For example, you might have a 'Current Time' entry in the namespace with the 
100  value "lambda scope: time.time()". The "scope" parameter is the current 
101  namespace dict (or record, if we're currently expanding one of those 
102  instead), allowing you access to existing static entries. If you're truly 
103  evil, you can even modify more than one entry at a time. 
104   
105  However, don't try to calculate an entry and then use its value in further 
106  extrapolations; the order in which the functions are called is not guaranteed. 
107  This can lead to a certain amount of duplicated work (or a redesign of your 
108  schema), but that's better than complicating the spec. 
109   
110  After the whole thing has been extrapolated, it's time for: 
111   
112  Reporting 
113  --------- 
114   
115  The :class:`StatsPage` class grabs the `logging.statistics` dict, extrapolates 
116  it all, and then transforms it to HTML for easy viewing. Each namespace gets 
117  its own header and attribute table, plus an extra table for each collection. 
118  This is NOT part of the statistics specification; other tools can format how 
119  they like. 
120   
121  You can control which columns are output and how they are formatted by updating 
122  StatsPage.formatting, which is a dict that mirrors the keys and nesting of 
123  `logging.statistics`. The difference is that, instead of data values, it has 
124  formatting values. Use None for a given key to indicate to the StatsPage that a 
125  given column should not be output. Use a string with formatting 
126  (such as '%.3f') to interpolate the value(s), or use a callable (such as 
127  lambda v: v.isoformat()) for more advanced formatting. Any entry which is not 
128  mentioned in the formatting dict is output unchanged. 
129   
130  Monitoring 
131  ---------- 
132   
133  Although the HTML output takes pains to assign unique id's to each <td> with 
134  statistical data, you're probably better off fetching /cpstats/data, which 
135  outputs the whole (extrapolated) `logging.statistics` dict in JSON format. 
136  That is probably easier to parse, and doesn't have any formatting controls, 
137  so you get the "original" data in a consistently-serialized format. 
138  Note: there's no treatment yet for datetime objects. Try time.time() instead 
139  for now if you can. Nagios will probably thank you. 
140   
141  Turning Collection Off 
142  ---------------------- 
143   
144  It is recommended each namespace have an "Enabled" item which, if False, 
145  stops collection (but not reporting) of statistical data. Applications 
146  SHOULD provide controls to pause and resume collection by setting these 
147  entries to False or True, if present. 
148   
149   
150  Usage 
151  ===== 
152   
153  To collect statistics on CherryPy applications:: 
154   
155      from cherrypy.lib import cpstats 
156      appconfig['/']['tools.cpstats.on'] = True 
157   
158  To collect statistics on your own code:: 
159   
160      import logging 
161      # Initialize the repository 
162      if not hasattr(logging, 'statistics'): logging.statistics = {} 
163      # Initialize my namespace 
164      mystats = logging.statistics.setdefault('My Stuff', {}) 
165      # Initialize my namespace's scalars and collections 
166      mystats.update({ 
167          'Enabled': True, 
168          'Start Time': time.time(), 
169          'Important Events': 0, 
170          'Events/Second': lambda s: ( 
171              (s['Important Events'] / (time.time() - s['Start Time']))), 
172          }) 
173      ... 
174      for event in events: 
175          ... 
176          # Collect stats 
177          if mystats.get('Enabled', False): 
178              mystats['Important Events'] += 1 
179   
180  To report statistics:: 
181   
182      root.cpstats = cpstats.StatsPage() 
183   
184  To format statistics reports:: 
185   
186      See 'Reporting', above. 
187   
188  """ 
189   
190  # ------------------------------- Statistics -------------------------------- # 
191   
192  import logging 
193  if not hasattr(logging, 'statistics'): 
194      logging.statistics = {} 
195   
196   
197 -def extrapolate_statistics(scope):
198 """Return an extrapolated copy of the given scope.""" 199 c = {} 200 for k, v in list(scope.items()): 201 if isinstance(v, dict): 202 v = extrapolate_statistics(v) 203 elif isinstance(v, (list, tuple)): 204 v = [extrapolate_statistics(record) for record in v] 205 elif hasattr(v, '__call__'): 206 v = v(scope) 207 c[k] = v 208 return c
209 210 211 # -------------------- CherryPy Applications Statistics --------------------- # 212 213 import threading 214 import time 215 216 import cherrypy 217 218 appstats = logging.statistics.setdefault('CherryPy Applications', {}) 219 appstats.update({ 220 'Enabled': True, 221 'Bytes Read/Request': lambda s: ( 222 s['Total Requests'] and 223 (s['Total Bytes Read'] / float(s['Total Requests'])) or 224 0.0 225 ), 226 'Bytes Read/Second': lambda s: s['Total Bytes Read'] / s['Uptime'](s), 227 'Bytes Written/Request': lambda s: ( 228 s['Total Requests'] and 229 (s['Total Bytes Written'] / float(s['Total Requests'])) or 230 0.0 231 ), 232 'Bytes Written/Second': lambda s: ( 233 s['Total Bytes Written'] / s['Uptime'](s) 234 ), 235 'Current Time': lambda s: time.time(), 236 'Current Requests': 0, 237 'Requests/Second': lambda s: float(s['Total Requests']) / s['Uptime'](s), 238 'Server Version': cherrypy.__version__, 239 'Start Time': time.time(), 240 'Total Bytes Read': 0, 241 'Total Bytes Written': 0, 242 'Total Requests': 0, 243 'Total Time': 0, 244 'Uptime': lambda s: time.time() - s['Start Time'], 245 'Requests': {}, 246 }) 247 248 proc_time = lambda s: time.time() - s['Start Time'] 249 250
251 -class ByteCountWrapper(object):
252 253 """Wraps a file-like object, counting the number of bytes read.""" 254
255 - def __init__(self, rfile):
256 self.rfile = rfile 257 self.bytes_read = 0
258
259 - def read(self, size=-1):
260 data = self.rfile.read(size) 261 self.bytes_read += len(data) 262 return data
263
264 - def readline(self, size=-1):
265 data = self.rfile.readline(size) 266 self.bytes_read += len(data) 267 return data
268
269 - def readlines(self, sizehint=0):
270 # Shamelessly stolen from StringIO 271 total = 0 272 lines = [] 273 line = self.readline() 274 while line: 275 lines.append(line) 276 total += len(line) 277 if 0 < sizehint <= total: 278 break 279 line = self.readline() 280 return lines
281
282 - def close(self):
283 self.rfile.close()
284
285 - def __iter__(self):
286 return self
287
288 - def next(self):
289 data = self.rfile.next() 290 self.bytes_read += len(data) 291 return data
292 293 294 average_uriset_time = lambda s: s['Count'] and (s['Sum'] / s['Count']) or 0 295 296
297 -class StatsTool(cherrypy.Tool):
298 299 """Record various information about the current request.""" 300
301 - def __init__(self):
302 cherrypy.Tool.__init__(self, 'on_end_request', self.record_stop)
303
304 - def _setup(self):
305 """Hook this tool into cherrypy.request. 306 307 The standard CherryPy request object will automatically call this 308 method when the tool is "turned on" in config. 309 """ 310 if appstats.get('Enabled', False): 311 cherrypy.Tool._setup(self) 312 self.record_start()
313
314 - def record_start(self):
315 """Record the beginning of a request.""" 316 request = cherrypy.serving.request 317 if not hasattr(request.rfile, 'bytes_read'): 318 request.rfile = ByteCountWrapper(request.rfile) 319 request.body.fp = request.rfile 320 321 r = request.remote 322 323 appstats['Current Requests'] += 1 324 appstats['Total Requests'] += 1 325 appstats['Requests'][threading._get_ident()] = { 326 'Bytes Read': None, 327 'Bytes Written': None, 328 # Use a lambda so the ip gets updated by tools.proxy later 329 'Client': lambda s: '%s:%s' % (r.ip, r.port), 330 'End Time': None, 331 'Processing Time': proc_time, 332 'Request-Line': request.request_line, 333 'Response Status': None, 334 'Start Time': time.time(), 335 }
336
337 - def record_stop( 338 self, uriset=None, slow_queries=1.0, slow_queries_count=100, 339 debug=False, **kwargs):
340 """Record the end of a request.""" 341 resp = cherrypy.serving.response 342 w = appstats['Requests'][threading._get_ident()] 343 344 r = cherrypy.request.rfile.bytes_read 345 w['Bytes Read'] = r 346 appstats['Total Bytes Read'] += r 347 348 if resp.stream: 349 w['Bytes Written'] = 'chunked' 350 else: 351 cl = int(resp.headers.get('Content-Length', 0)) 352 w['Bytes Written'] = cl 353 appstats['Total Bytes Written'] += cl 354 355 w['Response Status'] = getattr( 356 resp, 'output_status', None) or resp.status 357 358 w['End Time'] = time.time() 359 p = w['End Time'] - w['Start Time'] 360 w['Processing Time'] = p 361 appstats['Total Time'] += p 362 363 appstats['Current Requests'] -= 1 364 365 if debug: 366 cherrypy.log('Stats recorded: %s' % repr(w), 'TOOLS.CPSTATS') 367 368 if uriset: 369 rs = appstats.setdefault('URI Set Tracking', {}) 370 r = rs.setdefault(uriset, { 371 'Min': None, 'Max': None, 'Count': 0, 'Sum': 0, 372 'Avg': average_uriset_time}) 373 if r['Min'] is None or p < r['Min']: 374 r['Min'] = p 375 if r['Max'] is None or p > r['Max']: 376 r['Max'] = p 377 r['Count'] += 1 378 r['Sum'] += p 379 380 if slow_queries and p > slow_queries: 381 sq = appstats.setdefault('Slow Queries', []) 382 sq.append(w.copy()) 383 if len(sq) > slow_queries_count: 384 sq.pop(0)
385 386 387 import cherrypy 388 cherrypy.tools.cpstats = StatsTool() 389 390 391 # ---------------------- CherryPy Statistics Reporting ---------------------- # 392 393 import os 394 thisdir = os.path.abspath(os.path.dirname(__file__)) 395 396 try: 397 import json 398 except ImportError: 399 try: 400 import simplejson as json 401 except ImportError: 402 json = None 403 404 405 missing = object() 406 407 locale_date = lambda v: time.strftime('%c', time.gmtime(v)) 408 iso_format = lambda v: time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(v)) 409 410
411 -def pause_resume(ns):
412 def _pause_resume(enabled): 413 pause_disabled = '' 414 resume_disabled = '' 415 if enabled: 416 resume_disabled = 'disabled="disabled" ' 417 else: 418 pause_disabled = 'disabled="disabled" ' 419 return """ 420 <form action="pause" method="POST" style="display:inline"> 421 <input type="hidden" name="namespace" value="%s" /> 422 <input type="submit" value="Pause" %s/> 423 </form> 424 <form action="resume" method="POST" style="display:inline"> 425 <input type="hidden" name="namespace" value="%s" /> 426 <input type="submit" value="Resume" %s/> 427 </form> 428 """ % (ns, pause_disabled, ns, resume_disabled)
429 return _pause_resume 430 431
432 -class StatsPage(object):
433 434 formatting = { 435 'CherryPy Applications': { 436 'Enabled': pause_resume('CherryPy Applications'), 437 'Bytes Read/Request': '%.3f', 438 'Bytes Read/Second': '%.3f', 439 'Bytes Written/Request': '%.3f', 440 'Bytes Written/Second': '%.3f', 441 'Current Time': iso_format, 442 'Requests/Second': '%.3f', 443 'Start Time': iso_format, 444 'Total Time': '%.3f', 445 'Uptime': '%.3f', 446 'Slow Queries': { 447 'End Time': None, 448 'Processing Time': '%.3f', 449 'Start Time': iso_format, 450 }, 451 'URI Set Tracking': { 452 'Avg': '%.3f', 453 'Max': '%.3f', 454 'Min': '%.3f', 455 'Sum': '%.3f', 456 }, 457 'Requests': { 458 'Bytes Read': '%s', 459 'Bytes Written': '%s', 460 'End Time': None, 461 'Processing Time': '%.3f', 462 'Start Time': None, 463 }, 464 }, 465 'CherryPy WSGIServer': { 466 'Enabled': pause_resume('CherryPy WSGIServer'), 467 'Connections/second': '%.3f', 468 'Start time': iso_format, 469 }, 470 } 471
472 - def index(self):
473 # Transform the raw data into pretty output for HTML 474 yield """ 475 <html> 476 <head> 477 <title>Statistics</title> 478 <style> 479 480 th, td { 481 padding: 0.25em 0.5em; 482 border: 1px solid #666699; 483 } 484 485 table { 486 border-collapse: collapse; 487 } 488 489 table.stats1 { 490 width: 100%; 491 } 492 493 table.stats1 th { 494 font-weight: bold; 495 text-align: right; 496 background-color: #CCD5DD; 497 } 498 499 table.stats2, h2 { 500 margin-left: 50px; 501 } 502 503 table.stats2 th { 504 font-weight: bold; 505 text-align: center; 506 background-color: #CCD5DD; 507 } 508 509 </style> 510 </head> 511 <body> 512 """ 513 for title, scalars, collections in self.get_namespaces(): 514 yield """ 515 <h1>%s</h1> 516 517 <table class='stats1'> 518 <tbody> 519 """ % title 520 for i, (key, value) in enumerate(scalars): 521 colnum = i % 3 522 if colnum == 0: 523 yield """ 524 <tr>""" 525 yield ( 526 """ 527 <th>%(key)s</th><td id='%(title)s-%(key)s'>%(value)s</td>""" % 528 vars() 529 ) 530 if colnum == 2: 531 yield """ 532 </tr>""" 533 534 if colnum == 0: 535 yield """ 536 <th></th><td></td> 537 <th></th><td></td> 538 </tr>""" 539 elif colnum == 1: 540 yield """ 541 <th></th><td></td> 542 </tr>""" 543 yield """ 544 </tbody> 545 </table>""" 546 547 for subtitle, headers, subrows in collections: 548 yield """ 549 <h2>%s</h2> 550 <table class='stats2'> 551 <thead> 552 <tr>""" % subtitle 553 for key in headers: 554 yield """ 555 <th>%s</th>""" % key 556 yield """ 557 </tr> 558 </thead> 559 <tbody>""" 560 for subrow in subrows: 561 yield """ 562 <tr>""" 563 for value in subrow: 564 yield """ 565 <td>%s</td>""" % value 566 yield """ 567 </tr>""" 568 yield """ 569 </tbody> 570 </table>""" 571 yield """ 572 </body> 573 </html> 574 """
575 index.exposed = True 576
577 - def get_namespaces(self):
578 """Yield (title, scalars, collections) for each namespace.""" 579 s = extrapolate_statistics(logging.statistics) 580 for title, ns in sorted(s.items()): 581 scalars = [] 582 collections = [] 583 ns_fmt = self.formatting.get(title, {}) 584 for k, v in sorted(ns.items()): 585 fmt = ns_fmt.get(k, {}) 586 if isinstance(v, dict): 587 headers, subrows = self.get_dict_collection(v, fmt) 588 collections.append((k, ['ID'] + headers, subrows)) 589 elif isinstance(v, (list, tuple)): 590 headers, subrows = self.get_list_collection(v, fmt) 591 collections.append((k, headers, subrows)) 592 else: 593 format = ns_fmt.get(k, missing) 594 if format is None: 595 # Don't output this column. 596 continue 597 if hasattr(format, '__call__'): 598 v = format(v) 599 elif format is not missing: 600 v = format % v 601 scalars.append((k, v)) 602 yield title, scalars, collections
603
604 - def get_dict_collection(self, v, formatting):
605 """Return ([headers], [rows]) for the given collection.""" 606 # E.g., the 'Requests' dict. 607 headers = [] 608 for record in v.itervalues(): 609 for k3 in record: 610 format = formatting.get(k3, missing) 611 if format is None: 612 # Don't output this column. 613 continue 614 if k3 not in headers: 615 headers.append(k3) 616 headers.sort() 617 618 subrows = [] 619 for k2, record in sorted(v.items()): 620 subrow = [k2] 621 for k3 in headers: 622 v3 = record.get(k3, '') 623 format = formatting.get(k3, missing) 624 if format is None: 625 # Don't output this column. 626 continue 627 if hasattr(format, '__call__'): 628 v3 = format(v3) 629 elif format is not missing: 630 v3 = format % v3 631 subrow.append(v3) 632 subrows.append(subrow) 633 634 return headers, subrows
635
636 - def get_list_collection(self, v, formatting):
637 """Return ([headers], [subrows]) for the given collection.""" 638 # E.g., the 'Slow Queries' list. 639 headers = [] 640 for record in v: 641 for k3 in record: 642 format = formatting.get(k3, missing) 643 if format is None: 644 # Don't output this column. 645 continue 646 if k3 not in headers: 647 headers.append(k3) 648 headers.sort() 649 650 subrows = [] 651 for record in v: 652 subrow = [] 653 for k3 in headers: 654 v3 = record.get(k3, '') 655 format = formatting.get(k3, missing) 656 if format is None: 657 # Don't output this column. 658 continue 659 if hasattr(format, '__call__'): 660 v3 = format(v3) 661 elif format is not missing: 662 v3 = format % v3 663 subrow.append(v3) 664 subrows.append(subrow) 665 666 return headers, subrows
667 668 if json is not None:
669 - def data(self):
670 s = extrapolate_statistics(logging.statistics) 671 cherrypy.response.headers['Content-Type'] = 'application/json' 672 return json.dumps(s, sort_keys=True, indent=4)
673 data.exposed = True 674
675 - def pause(self, namespace):
676 logging.statistics.get(namespace, {})['Enabled'] = False 677 raise cherrypy.HTTPRedirect('./')
678 pause.exposed = True 679 pause.cp_config = {'tools.allow.on': True, 680 'tools.allow.methods': ['POST']} 681
682 - def resume(self, namespace):
683 logging.statistics.get(namespace, {})['Enabled'] = True 684 raise cherrypy.HTTPRedirect('./')
685 resume.exposed = True 686 resume.cp_config = {'tools.allow.on': True, 687 'tools.allow.methods': ['POST']}
688