Management magazine search

Loading

Tuesday, October 16, 2012

Python Introduction - Google's Python Class - Google Code

Python Introduction - Google's Python Class - Google Code: "Python Introduction"


Python is a dynamic, interpreted language. Source code does not declare the types of variables or parameters or methods. This makes the code short and flexible, and you lose the compile-time type checking in the source code. Python tracks the types of all values at runtime and flags code that does not make sense as it runs. (todo: link here to the companion video segment for this section)
An excellent way to see how Python code works is to run the Python interpreter and type code right into it. If you ever have a question like "what happens if I add an int to a list?" ... just typing it into the Python interpreter is fast way to see what happens. Python code does not declare the types of variables -- just assign to them and go. Python raises a runtime error if the code tries to read from a variable that has not been given a value. Like C++ and Java, Python is case sensitive so "a" and "A" are different variables. The end of a line marks the end of a statement, so unlike C++ and Java, Python does not require a semicolon at the end of each statement. You can include semicolons at the end of Python statements (perhaps just out of habit), but it's not the best style. Comments begin with a '#' and extend to the end of the line. (See Python Set Up for how to install and Python on various operating systems.)
$ python        ## Run the Python interpreter
Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwinType "help", "copyright", "credits" or "license" for more information.
>>> a = 6       ## set a variable in this interpreter session    
>>> a           ## entering an expression prints its value
6
>>> a + 2
8
>>> a = 'hi'    ## a can hold a string just as well
>>> a     'hi'
>>> len(a)      ## call the len() function on a string
2
>>> foo(a)      ## try something that doesn't work
Traceback (most recent call last):
  File "", line 1, in ?
NameError: name 'foo' is not defined
>>> ^D          ## type CTRL-d to exit (CTRL-z in Windows/DOS terminal)

Python Program

Python source files use the ".py" extension. With a Python program in file hello.py, the easiest way to run it is with the shell command "python hello.py Alice" -- loading and running the code in hello.py, passing it the command line argument "Alice".
Here's a very simple Python hello.py program (notice that blocks of code are delimited strictly using indentation rather than curly braces -- more on this later!):
#!/usr/bin/python
# import modules used here -- sys is a very standard one
import sys
# Gather our code in a main() function
def main():
    print 'Hello there', sys.argv[1]
    # Command line args are in sys.argv[1], sys.argv[2] ...
    # sys.argv[0] is the script name itself and can be ignored
# Standard boilerplate to call the main() function to begin
# the program.
if __name__ == '__main__':
    main()
Running this program from the command line looks like:
$ python hello.py Guido
Hello there Guido
$ ./hello.py Alice    # without needing 'python' first (Unix)
Hello there Alice

Python Module

The outermost statements in a Python file, or "module", do its one-time setup -- those statements run from top to bottom the first time the module is imported somewhere, setting up its variables and functions. A Python module can be run directly -- as above "python hello.py Bob" -- or it can be imported and used by some other module. When a Python file is run directly, the special variable "__name__" is set to "__main__". Therefore, it's common to have the boilerplate if __name__ ==... shown above to call a main() function when the module is run directly, but not when the module is imported by some other module.
In a standard Python program, the list sys.argv contains the command line arguments in the standard way with sys.argv[0] being the program itself, sys.argv[1] the first argument, and so on.

Functions

Functions in Python are defined like this:
# Defines a "repeat" function that takes 2 arguments.
def repeat(s, exclaim):
    """Returns the string s repeated 3 times.
    If exclaim is true, add exclamation marks.
    """

    result = s + s + s # can also use "s * 3" which is faster (Why?)
    if exclaim:
        result = result + '!!!'
    return result
Notice also how the lines that make up the function or if-statement are grouped by all having the same level of indentation. We also presented 2 different ways to repeat strings, using the + operator which is more user-friendly, but * also works because it's Python's "repeat" operator, meaning that '-' * 10 gives '----------', a neat way to create an onscreen "line." In the code comment, we hinted that * works faster than +, the reason being that * calculates the size of the resulting object once whereas with +, that calculation is made each time + is called. Both + and * are called "overloaded" operators because they mean different things for numbers vs. for strings (and other data types).
The "def" defines the function with its parameters within parentheses and its code indented. The first line of a function can be a documentation string ("docstring") that describes what the function does. The docstring can be a single line, or a multi-line description as in the example above. Variables defined in the function are local to that function, so the "result" in the above function is separate from a "result" variable in another function. The return statement can take an argument, in which case that is the value returned to the caller.
Here is code that calls the above repeat() function, printing what it returns:
def main():
    print repeat('Yay', False)      ## YayYayYay
    print repeat('Woo Hoo', True)   ## Woo HooWoo HooWoo Hoo!!!
At run time, functions must be defined by the execution of a "def" before they are called. It's typical to def a main() function towards the bottom of the file with the functions it calls above it.

Indentation

One unusual Python feature is that the whitespace indentation of a piece of code affects its meaning. A logical block of statements such as the ones that make up a function should all have the same indentation, set in from the indentation of their parent function or "if" or whatever. If one of the lines in a group has a different indentation, it is flagged as a syntax error.
Python's use of whitespace feels a little strange at first, but it's logical and I found I got used to it very quickly. Avoid using TABs as they greatly complicate the indentation scheme (not to mention TABs may mean different things on different platforms). Set your editor to insert spaces instead of TABs for Python code.
A common question beginners ask is, "How many spaces should I indent?" According to the official Python style guidelines (PEP 8), you should indent with 4 spaces. (Fun fact: Google's internal style guideline dictates indenting by 2 spaces!)

Code Is Checked At Runtime

Python does very little checking at compile time, deferring almost all type, name, etc. checks on each line until that line runs. Suppose the above main() calls repeat() like this:
def main():
    if name == 'Guido':
        print repeeeet(name) + '!!!'
    else:
        print repeat(name)
The if-statement contains an obvious error, where the repeat() function is accidentally typed in as repeeeet(). The funny thing in Python ... this code compiles and runs fine so long as the name at runtime is not 'Guido'. Only when a run actually tries to execute the repeeeet() will it notice that there is no such function and raise an error. This just means that when you first run a Python program, some of the first errors you see will be simple typos like this. This is one area where languages with a more verbose type system, like Java, have an advantage ... they can catch such errors at compile time (but of course you have to maintain all that type information ... it's a tradeoff).

Variable Names

Since Python variables don't have any type spelled out in the source code, it's extra helpful to give meaningful names to your variables to remind yourself of what's going on. So use "name" if it's a single name, and "names" if it's a list of names, and "tuples" if it's a list of tuples. Many basic Python errors result from forgetting what type of value is in each variable, so use your variable names (all you have really) to help keep things straight.

Modules and Imports

One file of Python code is called a *module*. The file "binky.py" is also known as the module "binky". A module essentially contains variable definitions like, "x = 6" and "def foo()". Suppose the file "binky.py" contains a "def foo()". The fully qualified name of that foo function is "binky.foo". In this way, various Python modules can name their functions and variables whatever they want, and the variable names won't conflict -- module1.foo is different from module2.foo.
For example, we have the standard "sys" module that contains some standard system facilities, like the argv list, and exit() function. With the statement "import sys" you can can then access the definitions in the sys module and makes them available by their fully-qualified name, e.g. sys.exit().
  import sys

  # Now can refer to sys.xxx facilities
  sys.exit(0)
There is another import form that looks like this: "from sys import argv, exit". That makes argv and exit() available by their short names; however, we recommend the original form with the fully-qualified names because it's a lot easier to determine where a function or attribute came from.
There are many modules and packages which are bundled with a standard installation of the Python interpreter, so you don't have do anything extra to use them. These are collectively known as the "Python Standard Library." Commonly used modules/packages include:
  • sys -- access to exit(), argv, stdin, stdout, ...
  • re -- regular expressions
  • os -- operating system interface, file system
You can find the documentation of all the Standard Library modules and packages at http://docs.python.org/library.

Online help and dir

There are a variety ways to get help for Python.
  • Do a Google search, starting with the word "python", like "python list" or "python string lowercase". The first hit is often the answer. This technique seems to work better for Python than it does for other languages for some reason.
  • The official Python docs site -- docs.python.org -- has high quality docs. Nonetheless, I often find a Google search of a couple words to be quicker.
  • There is also an official Tutor mailing list specifically designed for those who are new to Python and/or programming!
  • Many questions (and answers) can be found on StackOverflow.
  • Use the help() [and dir()] functions described below.
Inside the Python interpreter, the help() function pulls up documentation strings for various modules, functions, and methods. These doc strings are similar to Java's javadoc. This is one way to get quick access to docs. Here are some ways to call help() from inside the interpreter:
  • help(len) -- docs for the built in len function (note here you type "len" not "len()" which would be a call to the function)
  • help(sys) -- overview docs for the sys module (must do an "import sys" first)
  • dir(sys) -- dir() is like help() but just gives a quick list of the defined symbols
  • help(sys.exit) -- docs for the exit() function inside of sys
  • help('xyz'.split) -- it turns out that the module "str" contains the built-in string code, but if you did not know that, you can call help() just using an example of the sort of call you mean: here 'xyz'.foo meaning the foo() method that runs on strings
  • help(list) -- docs for the built in "list" module
  • help(list.append) -- docs for the append() function in the list module

No comments: