Let’s do a quick recall of what you’ve learned about iterables and iterators. Recall from the video that an iterable is an object that can return an iterator, while an iterator is an object that keeps state and produces the next value when you call next() on it. In this exercise, you will identify which object is an iterable and which is an iterator.
The environment has been pre-loaded with the variables flash1 and flash2. Try printing out their values with print() and next() to figure out which is an iterable and which is an iterator.
□ Both flash1 and flash2 are iterators.
□ Both flash1 and flash2 are iterables.
■ flash1 is an iterable and flash2 is an iterator.
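To make the distinction concrete, here is a minimal sketch you can try in any Python session (the names flash_list and flash_iter are illustrative, not the pre-loaded variables): calling iter() on an iterable returns an iterator, and next() pulls one value at a time from that iterator.
# Minimal sketch: a list is an iterable; iter() turns it into an iterator
flash_list = ['jay garrick', 'barry allen']
flash_iter = iter(flash_list)
print(next(flash_iter))  # jay garrick
print(next(flash_iter))  # barry allen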
Great, you’re familiar with what iterables and iterators are! In this exercise, you will reinforce your knowledge about these by iterating over and printing from iterables and iterators.
You are provided with a list of strings flash. You will practice iterating over the list by using a for loop. You will also create an iterator for the list and access the values from the iterator.
Instruction
- Use a for loop to loop over flash and print the values in the list. Use person as the loop variable.
- Create an iterator for flash and assign the result to superhero.
- Print each item from superhero using next() 4 times.
# Create a list of strings: flash
flash = ['jay garrick', 'barry allen',
'wally west', 'bart allen']
# Print each list item in flash using a for loop
for person in flash:
    print(person)
# Create an iterator for flash: superhero
superhero = iter(flash)
# Print each item from the iterator
print(next(superhero))
print(next(superhero))
print(next(superhero))
print(next(superhero))
One of the things you learned about in this chapter is that not all iterables are actual lists. A couple of examples that we looked at are strings and the use of the range() function. In this exercise, we will focus on the range() function.
You can use range() in a for loop as if it’s a list to be iterated over:
for i in range(5):
    print(i)
Recall that range() doesn’t actually create the list; instead, it creates a range object with an iterator that produces the values until it reaches the limit (in the example, until the value 4). If range() created the actual list, calling it with a value of 10 ** 100 may not work, especially since a number as big as that may go over a regular computer’s memory. The value 10 ** 100 is actually what’s called a googol, which is a 1 followed by a hundred 0s. That’s a huge number!
Your task for this exercise is to show that calling range() with 10 ** 100 won’t actually pre-create the list.
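As a side check (not part of the exercise), you can convince yourself that a range object stays tiny no matter how large its limit is, and that values are only produced on demand:
# Side check: the range object itself is only a few dozen bytes
import sys
print(sys.getsizeof(range(10 ** 100)))  # small constant size, not a googol of entries
print(next(iter(range(10 ** 100))))     # 0 -- values are generated lazily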
Instruction
- Create an iterator called small_value over range(3) using the function iter().
- Using a for loop, iterate over range(3), printing the value for every iteration. Use num as the loop variable.
- Create an iterator called googol over range(10 ** 100).
# Create an iterator for range(3): small_value
small_value = iter(range(3))
# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))
# Loop over range(3) and print the values
for num in range(3):
print(num)
# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))
# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
You’ve been using the iter() function to get an iterator object, as well as the next() function to retrieve the values one by one from the iterator object.
There are also functions that take iterators and iterables as arguments. For example, the list() and sum() functions return a list and the sum of elements, respectively.
In this exercise, you will use these functions by passing an iterable from range() and then printing the results of the function calls.
Instruction
- Create a range object that would produce the values from 10 to 20 using range(). Assign the result to values.
- Use the list() function to create a list of values from the range object values. Assign the result to values_list.
- Use the sum() function to get the sum of the values from 10 to 20 from the range object values. Assign the result to values_sum.
# Create a range object: values
values = range(10,21)
# Print the range object
print(values)
# Create a list of integers: values_list
values_list = list(values)
# Print values_list
print(values_list)
# Get the sum of values: values_sum
values_sum = sum(values)
# Print values_sum
print(values_sum)
You’re really getting the hang of using iterators, great job!
You’ve just gained several new ideas on iterators from the last video and one of them is the enumerate() function. Recall that enumerate() returns an enumerate object that produces a sequence of tuples, and each of the tuples is an index-value pair.
In this exercise, you are given a list of strings mutants and you will practice using enumerate() on it by printing out a list of tuples and unpacking the tuples using a for loop.
Instruction
- Create a list of tuples from mutants and assign the result to mutant_list. Make sure you generate the tuples using enumerate() and turn the result from it into a list using list().
- Complete the first for loop by unpacking the tuples generated by calling enumerate() on mutants. Use index1 for the index and value1 for the value when unpacking the tuple.
- Complete the second for loop similarly as with the first, but this time change the starting index to start from 1 by passing it in as an argument to the start parameter of enumerate(). Use index2 for the index and value2 for the value when unpacking the tuple.
# Create a list of strings: mutants
mutants = ['charles xavier',
'bobby drake',
'kurt wagner',
'max eisenhardt',
'kitty pryde']
# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))
# Print the list of tuples
print(mutant_list)
# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
print(index1, value1)
# Change the start index
for index2, value2 in enumerate(mutants, start = 1):
print(index2, value2)
Another interesting function that you’ve learned is zip(), which takes any number of iterables and returns a zip object that is an iterator of tuples. If you wanted to print the values of a zip object, you can convert it into a list and then print it. Printing just a zip object will not return the values unless you unpack it first. In this exercise, you will explore this for yourself.
Three lists of strings are pre-loaded: mutants, aliases, and powers. First, you will use list() and zip() on these lists to generate a list of tuples. Then, you will create a zip object using zip(). Finally, you will unpack this zip object in a for loop to print the values in each tuple. Observe the different output generated by printing the list of tuples, then the zip object, and finally, the tuple values in the for loop.
Instruction
- Using zip() with list(), create a list of tuples from the three lists mutants, aliases, and powers (in that order) and assign the result to mutant_data.
- Using zip(), create a zip object called mutant_zip from the three lists mutants, aliases, and powers.
- Complete the for loop by unpacking the zip object you created and printing the tuple values. Use value1, value2, value3 for the values from each of mutants, aliases, and powers, in that order.
# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants,
aliases,
powers))
# Print the list of tuples
print(mutant_data)
# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants,
aliases,
powers)
# Print the zip object
print(mutant_zip)
# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
print(value1, value2, value3)
You know how to use zip() as well as how to print out values from a zip object. Excellent!
Let’s play around with zip() a little more. There is no unzip function for doing the reverse of what zip() does. We can, however, reverse what has been zipped together by using zip() with a little help from *! The * operator unpacks an iterable such as a list or a tuple into positional arguments in a function call.
In this exercise, you will use * in a call to zip() to unpack the tuples produced by zip().
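As a quick standalone illustration of what * does (a sketch, separate from the exercise), compare passing a list as a single argument with unpacking it into several:
# Sketch: * unpacks an iterable into positional arguments
nums = [1, 2, 3]
print(nums)   # [1, 2, 3] -- the list is passed as one argument
print(*nums)  # 1 2 3     -- three separate positional arguments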
Two tuples of strings, mutants and powers, have been pre-loaded.
Instruction
- Create a zip object by using zip() on mutants and powers, in that order. Assign the result to z1.
- Print the tuples in z1 by unpacking them into positional arguments using the * operator in a print() call.
- Because the previous print() call would have exhausted the elements in z1, recreate the zip object you defined earlier and assign the result again to z1.
- 'Unzip' the tuples in z1 by unpacking them into positional arguments using the * operator in a zip() call. Assign the results to result1 and result2, in that order.
- The last print() statements print the output of comparing result1 to mutants and result2 to powers. Click Submit Answer to see if the unpacked result1 and result2 are equivalent to mutants and powers, respectively.
# Create a zip object from mutants and powers: z1
z1 = zip(mutants,powers)
# Print the tuples in z1 by unpacking with *
print(*z1)
# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants,powers)
# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)
# Check if unpacked tuples are equivalent to original tuples
print(result1 == mutants)
print(result2 == powers)
Sometimes, the data we have to process reaches a size that is too much for a computer’s memory to handle. This is a common problem faced by data scientists. A solution to this is to process an entire data source chunk by chunk, instead of all at once in a single go.
In this exercise, you will do just that. You will process a large csv file of Twitter data in the same way that you processed 'tweets.csv' in the Bringing it all together exercises of the prequel course, but this time, working on it in chunks of 10 entries at a time.
Instruction
- Initialize an empty dictionary counts_dict for storing the results of processing the Twitter data.
- Iterate over the 'tweets.csv' file by using a for loop. Use the loop variable chunk and iterate over the call to pd.read_csv() with a chunksize of 10.
- Iterate over the column 'lang' in chunk by using a for loop. Use the loop variable entry.
# Initialize an empty dictionary: counts_dict
counts_dict = {}
# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict.keys():
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1
# Print the populated dictionary
print(counts_dict)
Great job chunking out that file in the previous exercise. You now know how to deal with situations where you need to process a very large file and that’s a very useful skill to have!
It’s good to know how to process a file in smaller, more manageable chunks, but it can become very tedious having to write and rewrite the same code for the same task each time. In this exercise, you will make your code more reusable by putting your work from the last exercise into a function definition.
Instruction
- Define the function count_entries(), which has 3 parameters. The first parameter is csv_file for the filename, the second is c_size for the chunk size, and the last is colname for the column name.
- Iterate over the csv_file file by using a for loop. Use the loop variable chunk and iterate over the call to pd.read_csv(), passing c_size to chunksize.
- Iterate over the column given in colname in chunk by using a for loop. Use the loop variable entry.
- Call the count_entries() function by passing to it the filename 'tweets.csv', the size of chunks 10, and the name of the column to count, 'lang'. Assign the result of the call to the variable result_counts.
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of occurrences as value for each key."""

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict.keys():
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict
# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv',10,'lang')
# Print result_counts
print(result_counts)
In this exercise, you will practice what you’ve learned from the video about writing list comprehensions. You will write a list comprehension and identify the output that will be produced.
The following list has been pre-loaded in the environment.
doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']
What would a list comprehension that produces a list of the first character of each string in doctor look like? Note that the list comprehension uses doc as the iterator variable. What will the output be?
□ The list comprehension is [for doc in doctor: doc[0]] and produces the list ['h', 'c', 'c', 't', 'w'].
■ The list comprehension is [doc[0] for doc in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].
□ The list comprehension is [doc[0] in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].
You know that list comprehensions can be built over iterables. Given the objects below, which of them can we build list comprehensions over?
doctor = ['house', 'cuddy', 'chase',
'thirteen', 'wilson']
range(50)
underwood = 'After all, we are nothing more or less than what we choose to reveal.'
jean = '24601'
flash = ['jay garrick', 'barry allen',
'wally west', 'bart allen']
valjean = 24601
□ You can build list comprehensions over all the objects except the string of number characters jean.
□ You can build list comprehensions over all the objects except the string lists doctor and flash.
□ You can build list comprehensions over all the objects except range(50).
■ You can build list comprehensions over all the objects except the integer object valjean.
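A short sketch of why that is (assuming the objects above): strings and ranges are iterable, while a plain integer is not, so a comprehension over it raises a TypeError.
print([ch for ch in '24601'])     # a string is iterable: ['2', '4', '6', '0', '1']
print([n * 2 for n in range(3)])  # a range is iterable: [0, 2, 4]
# [d for d in 24601] would raise TypeError: 'int' object is not iterable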
You now have all the knowledge necessary to begin writing list comprehensions! Your job in this exercise is to write a list comprehension that produces a list of the squares of the numbers ranging from 0 to 9.
Instruction
- Using the range of numbers from 0 to 9 as your iterable and i as your iterator variable, write a list comprehension that produces a list of numbers consisting of the squared values of i.
# Create list comprehension: squares
squares = [i*i for i in range(10)]
Great! At this point, you have a good grasp of the basic syntax of list comprehensions. Let’s push your code-writing skills a little further. In this exercise, you will be writing a list comprehension within another list comprehension, or nested list comprehensions. It sounds a little tricky, but you can do it!
Let’s step aside for a while from strings. One of the ways in which lists can be used is in representing multi-dimensional objects such as matrices. Matrices can be represented as a list of lists in Python. For example, a 5 x 5 matrix with values 0 to 4 in each row can be written as:
matrix = [[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]
Your task is to recreate this matrix by using nested list comprehensions. Recall that you can create one of the rows of the matrix with a single list comprehension. To create the list of lists, you simply have to supply the list comprehension as the output expression of the overall list comprehension:
[[output expression] for iterator variable in iterable]
Note that here, the output expression is itself a list comprehension.
Instruction
- In the output expression of the nested list comprehension, create a list of values from 0 to 4 using range(). Use col as the iterator variable.
- In the iterable part of the nested list comprehension, use range() to count 5 rows - that is, create a list of values from 0 to 4. Use row as the iterator variable; note that you won’t be needing this to create values in the list of lists.
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]
# Print the matrix
for row in matrix:
    print(row)
You’ve been using list comprehensions to build lists of values, sometimes using operations to create these values.
An interesting mechanism in list comprehensions is that you can also create lists with values that meet only a certain condition. One way of doing this is by using conditionals on iterator variables. In this exercise, you will do exactly that!
Recall from the video that you can apply a conditional statement to test the iterator variable by adding an if statement in the optional predicate expression part after the for statement in the comprehension:
[output expression for iterator variable in iterable if predicate expression]
You will use this recipe to write a list comprehension for this exercise. You are given a list of strings fellowship and, using a list comprehension, you will create a list that only includes the members of fellowship that have 7 characters or more.
Instruction
- Use member as the iterator variable in the list comprehension. For the conditional, use len() to evaluate the iterator variable. Note that you only want strings with 7 characters or more.
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn',
'legolas', 'boromir', 'gimli']
# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]
# Print the new list
print(new_fellowship)
In the previous exercise, you used an if conditional statement in the predicate expression part of a list comprehension to evaluate an iterator variable. In this exercise, you will use an if-else statement on the output expression of the list.
You will work on the same list, fellowship, and, using a list comprehension and an if-else conditional statement in the output expression, create a list that keeps members of fellowship with 7 or more characters and replaces others with an empty string. Use member as the iterator variable in the list comprehension.
Instruction
- In the output expression, keep the string as-is if the number of characters is >= 7, else replace it with an empty string - that is, '' or "".
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn',
'legolas', 'boromir', 'gimli']
# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else ''
for member in fellowship]
# Print the new list
print(new_fellowship)
Comprehensions aren’t relegated merely to the world of lists. There are many other objects you can build using comprehensions, such as dictionaries, pervasive objects in Data Science. You will create a dictionary using the comprehension syntax for this exercise. In this case, the comprehension is called a dict comprehension.
Recall that the main difference between a list comprehension and a dict comprehension is the use of curly braces {} instead of []. Additionally, members of the dictionary are created using a colon :, as in key: value.
You are given a list of strings fellowship and, using a dict comprehension, create a dictionary with the members of the list as the keys and the length of each string as the corresponding values.
Instruction
- Create a dict comprehension where the key is a string in fellowship and the value is the length of the string. Remember to use the key: value syntax in the output expression part of the comprehension to create the members of the dictionary. Use member as the iterator variable.
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn',
'legolas', 'boromir', 'gimli']
# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}
# Print the new dictionary
print(new_fellowship)
You’ve seen from the videos that list comprehensions and generator expressions look very similar in their syntax, except for the use of parentheses () in generator expressions and brackets [] in list comprehensions.
In this exercise, you will recall the difference between list comprehensions and generators. To help with that task, the following code has been pre-loaded in the environment:
# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn',
'legolas', 'boromir', 'gimli']
# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]
# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)
Try to play around with fellow1 and fellow2 by figuring out their types and printing out their values. Based on your observations and what you can recall from the video, select from the options below the best description for the difference between list comprehensions and generators.
□ List comprehensions and generators are not different at all; they are just different ways of writing the same thing.
■ A list comprehension produces a list as output, a generator produces a generator object.
□ A list comprehension produces a list as output that can be iterated over, a generator produces a generator object that can’t be iterated over.
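A quick exploration you could run (assuming fellow1 and fellow2 are defined as in the pre-loaded code) to see the difference for yourself:
print(type(fellow1))  # <class 'list'>
print(type(fellow2))  # <class 'generator'>
print(fellow1)        # ['samwise', 'aragorn', 'legolas', 'boromir']
print(list(fellow2))  # same values, but only produced once the generator is consumed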
You are familiar with what generators and generator expressions are, as well as how they differ from list comprehensions. In this exercise, you will practice building generator expressions on your own.
Recall that generator expressions basically have the same syntax as list comprehensions, except that they use parentheses () instead of brackets []; this should make things feel familiar! Furthermore, if you have ever iterated over a dictionary with .items(), or used the range() function, for example, you have already encountered and used generators before, without knowing it! When you use these functions, Python creates generators for you behind the scenes.
Now, you will start simple by creating a generator object that produces numeric values.
Instruction
- Create a generator object that will produce values from 0 to 30. Assign the result to result and use num as the iterator variable in the generator expression.
- Print the first 5 values by using next() appropriately in print().
- Print the rest of the values by using a for loop to iterate over the generator object.
# Create generator object: result
result = (num for num in range(31))
# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))
# Print the rest of the values
for value in result:
    print(value)
Great! At this point, you already know how to write a basic generator expression. In this exercise, you will push this idea a little further by adding to the output expression of a generator expression. Because generator expressions and list comprehensions are so alike in syntax, this should be a familiar task for you!
You are given a list of strings lannister and, using a generator expression, create a generator object that you will iterate over to print its values.
Instruction
- Write a generator expression that will generate the lengths of each string in lannister. Use person as the iterator variable. Assign the result to lengths.
- Complete the for loop for printing the values in the generator object.
# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin',
'tyrion', 'joffrey']
# Create a generator object: lengths
lengths = (len(person) for person in lannister)
# Iterate over and print the values in lengths
for value in lengths:
    print(value)
In previous exercises, you’ve dealt mainly with writing generator expressions, which use comprehension syntax. Being able to use comprehension syntax for generator expressions made your work so much easier!
Now, recall from the video that not only are there generator expressions, there are generator functions as well. Generator functions are functions that, like generator expressions, yield a series of values, instead of returning a single value. A generator function is defined as you do a regular function, but whenever it generates a value, it uses the keyword yield instead of return.
In this exercise, you will create a generator function with a similar mechanism as the generator expression you defined in the previous exercise:
lengths = (len(person) for person in lannister)
Instruction
- Create a generator function named get_lengths() that has a single parameter, input_list.
- In the for loop in the function definition, yield the length of the strings in input_list.
- Complete the for loop for printing the values generated by the get_lengths() generator function. Supply the call to get_lengths(), passing in the list lannister.
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin',
'tyrion', 'joffrey']
# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)
You will now make use of what you’ve learned from this chapter to solve a simple data extraction problem. You will also be introduced to a data structure, the pandas Series, in this exercise. We won’t elaborate on it much here, but what you should know is that it is a data structure that you will be working with often when analyzing data from pandas DataFrames. You can think of DataFrame columns as single-dimension arrays called Series.
In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.
Instruction
- Extract the column 'created_at' from df and assign the result to tweet_time. Fun fact: the extracted column in tweet_time here is a Series data structure!
- Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Remember that Python uses 0-based indexing!
# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']
# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]
# Print the extracted times
print(tweet_clock_time)
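To sanity-check the slice indices on a single timestamp string, here is a sketch; the sample value assumes the usual Twitter created_at format, which may differ from your file:
sample = 'Tue Mar 29 23:40:17 +0000 2016'  # assumed format, for illustration only
print(sample[11:19])                        # '23:40:17'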
Great, you’ve successfully extracted the data of interest, the time, from a pandas DataFrame! Let’s tweak your work further by adding a conditional that further specifies which entries to select.
In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. You will add a conditional expression to the list comprehension so that you only select the times in which entry[17:19] is equal to '19'. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.
Instruction
- Extract the column 'created_at' from df and assign the result to tweet_time.
- Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Additionally, add a conditional expression that checks whether entry[17:19] is equal to '19'.
# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']
# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']
# Print the extracted times
print(tweet_clock_time)
For this exercise, you’ll use what you’ve learned about the zip() function and combine two lists into a dictionary.
These lists are actually extracted from a bigger dataset file of world development indicators from the World Bank. For pedagogical purposes, we have pre-processed this dataset into the lists that you’ll be working with.
The first list feature_names contains header names of the dataset and the second list row_vals contains actual values of a row from the dataset, corresponding to each of the header names.
Instruction
- Create a zip object by calling zip() and passing to it feature_names and row_vals. Assign the result to zipped_lists.
- Create a dictionary from the zipped_lists zip object by calling dict() with zipped_lists. Assign the resulting dictionary to rs_dict.
# Zip lists: zipped_lists
zipped_lists = zip(feature_names, row_vals)
# Create a dictionary: rs_dict
rs_dict = dict(zipped_lists)
# Print the dictionary
print(rs_dict)
Suppose you needed to repeat the same process done in the previous exercise to many, many rows of data. Rewriting your code again and again could become very tedious, repetitive, and unmaintainable.
In this exercise, you will create a function to house the code you wrote earlier to make things easier and much more concise. Why? This way, you only need to call the function and supply the appropriate lists to create your dictionaries! Again, the lists feature_names and row_vals are preloaded and these contain the header names of the dataset and actual values of a row from the dataset, respectively.
Instruction
- Define the function lists2dict() with two parameters: first is list1 and second is list2.
- Return rs_dict in lists2dict().
- Call the lists2dict() function with the arguments feature_names and row_vals. Assign the result of the function call to rs_fxn.
# Define lists2dict()
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides the keys and list2 provides the values."""

    # Zip lists: zipped_lists
    zipped_lists = zip(list1, list2)

    # Create a dictionary: rs_dict
    rs_dict = dict(zipped_lists)

    # Return the dictionary
    return rs_dict

# Call lists2dict: rs_fxn
rs_fxn = lists2dict(feature_names, row_vals)
# Print rs_fxn
print(rs_fxn)
This time, you’re going to use the lists2dict() function you defined in the last exercise to turn a bunch of lists into a list of dictionaries with the help of a list comprehension.
The lists2dict() function has already been preloaded, together with a couple of lists, feature_names and row_lists. feature_names contains the header names of the World Bank dataset and row_lists is a list of lists, where each sublist is a list of actual values of a row from the dataset.
Your goal is to use a list comprehension to generate a list of dicts, where the keys are the header names and the values are the row entries.
Instruction
- Inspect the contents of row_lists by printing the first two lists in row_lists.
- Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts.
- Look at the first two dictionaries in list_of_dicts by printing them out.
# Print the first two lists in row_lists
print(row_lists[0])
print(row_lists[1])
# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]
# Print the first two dictionaries in list_of_dicts
print(list_of_dicts[0])
print(list_of_dicts[1])
You’ve zipped lists together, created a function to house your code, and even used the function in a list comprehension to generate a list of dictionaries. That was a lot of work and you did a great job!
You will now use all of these to convert the list of dictionaries into a pandas DataFrame. You will see how convenient it is to generate a DataFrame from dictionaries with the DataFrame() function from the pandas package.
The lists2dict() function, feature_names list, and row_lists list have been preloaded for this exercise.
Instruction
- To use the DataFrame() function you need, first import the pandas package with the alias pd.
- Create a DataFrame from the list of dictionaries list_of_dicts by calling pd.DataFrame(). Assign the resulting DataFrame to df.
- Inspect the contents of df by printing the head of the DataFrame. The head of the DataFrame df can be accessed by calling df.head().
# Import the pandas package
import pandas as pd
# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]
# Turn list of dicts into a DataFrame: df
df = pd.DataFrame(list_of_dicts)
# Print the head of the DataFrame
print(df.head())
Sometimes, data sources can be so large in size that storing the entire dataset in memory becomes too resource-intensive. In this exercise, you will process the first 1000 rows of a file line by line, to create a dictionary of the counts of how many times each country appears in a column in the dataset.
The csv file 'world_dev_ind.csv' is in your current directory for your use. To begin, you need to open a connection to this file using what is known as a context manager. For example, the command with open('datacamp.csv') as datacamp binds the csv file 'datacamp.csv' as datacamp in the context manager. Here, the with statement is the context manager, and its purpose is to ensure that resources are efficiently allocated when opening a connection to a file.
Instruction
- Use open() to bind the csv file 'world_dev_ind.csv' as file in the context manager.
- Complete the for loop so that it iterates 1000 times to perform the loop body and process only the first 1000 rows of data of the file.
# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Skip the column names
    file.readline()

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Process only the first 1000 rows
    for j in range(0, 1000):

        # Split the current line into a list: line
        line = file.readline().split(',')

        # Get the value for the first column: first_col
        first_col = line[0]

        # If the column value is in the dict, increment its value
        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1
        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1

# Print the resulting dictionary
print(counts_dict)
In the previous exercise, you processed a file line by line for a given number of lines. What if, however, you want to do this for the entire file?
In this case, it would be useful to use generators. Generators allow users to lazily evaluate data. This concept of lazy evaluation is useful when you have to deal with very large datasets because it lets you generate values in an efficient manner by yielding only chunks of data at a time instead of the whole thing at once.
In this exercise, you will define a generator function read_large_file() that produces a generator object which yields a single line from a file each time next() is called on it. The csv file 'world_dev_ind.csv' is in your current directory for your use.
Note that when you open a connection to a file, the resulting file object is already a generator! So out in the wild, you won’t have to explicitly create generator objects in cases such as this. However, for pedagogical reasons, we are having you practice how to do this here with the read_large_file() function.
Instruction
- In the function read_large_file(), read a line from file_object by using the method readline(). Assign the result to data.
- In the function read_large_file(), yield the line read from the file data.
- In the context manager, create a generator object gen_file by calling your generator function read_large_file() and passing file to it.
- Print the first three lines generated by gen_file using next().
# Define read_large_file()
def read_large_file(file_object):
    """A generator function to read a large file lazily."""

    # Loop indefinitely until the end of the file
    while True:

        # Read a line from the file: data
        data = file_object.readline()

        # Break if this is the end of the file
        if not data:
            break

        # Yield the line of data
        yield data

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Create a generator object for the file: gen_file
    gen_file = read_large_file(file)

    # Print the first three lines of the file
    print(next(gen_file))
    print(next(gen_file))
    print(next(gen_file))
Great! You’ve just created a generator function that you can use to help you process large files.
Now let’s use your generator function to process the World Bank dataset like you did previously. You will process the file line by line, to create a dictionary of the counts of how many times each country appears in a column in the dataset. For this exercise, however, you won’t process just 1000 rows of data, you’ll process the entire dataset!
The generator function read_large_file() and the csv file 'world_dev_ind.csv' are preloaded and ready for your use. Go for it!
Instruction
- Bind the csv file 'world_dev_ind.csv' to file in the context manager with open().
- Complete the for loop so that it iterates over the generator from the call to read_large_file() to process all the rows of the file.
# Initialize an empty dictionary: counts_dict
counts_dict = {}
# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Iterate over the generator from read_large_file()
    for line in read_large_file(file):

        row = line.split(',')
        first_col = row[0]

        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1
        else:
            counts_dict[first_col] = 1

# Print the resulting dictionary
print(counts_dict)
Another way to read data too large to store in memory in chunks is to read the file in as DataFrames of a certain length, say, 100. For example, with the pandas package (imported as pd), you can do pd.read_csv(filename, chunksize=100). This creates an iterable reader object, which means that you can use next() on it.
In this exercise, you will read a file in small DataFrame chunks with read_csv(). You’re going to use the World Bank Indicators data 'ind_pop.csv', available in your current directory, to look at the urban population indicator for numerous countries and years.
Instruction
- Use pd.read_csv() to read in 'ind_pop.csv' in chunks of size 10. Assign the result to df_reader.
- Print the first two chunks from df_reader.
# Import the pandas package
import pandas as pd
# Initialize reader object: df_reader
df_reader = pd.read_csv('ind_pop.csv',
chunksize=10)
# Print two chunks
print(next(df_reader))
print(next(df_reader))
In the previous exercise, you used read_csv() to read in DataFrame chunks from a large dataset. In this exercise, you will read in a file using a bigger DataFrame chunk size and then process the data from the first chunk.
To process the data, you will create another DataFrame composed of only the rows from a specific country. You will then zip together two of the columns from the new DataFrame, 'Total Population' and 'Urban population (% of total)'. Finally, you will create a list of tuples from the zip object, where each tuple is composed of a value from each of the two columns mentioned.
Instruction
- Use pd.read_csv() to read in the file 'ind_pop_data.csv' in chunks of size 1000. Assign the result to urb_pop_reader.
- Get the first DataFrame chunk from the iterable urb_pop_reader and assign this to df_urb_pop.
- Select only the rows of df_urb_pop that have a 'CountryCode' of 'CEB'. To do this, compare whether df_urb_pop['CountryCode'] is equal to 'CEB' within the square brackets in df_urb_pop[____].
- Using zip(), zip together the 'Total Population' and 'Urban population (% of total)' columns of df_pop_ceb. Assign the resulting zip object to pops.
# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv',
chunksize=1000)
# Get the first DataFrame chunk: df_urb_pop
df_urb_pop = next(urb_pop_reader)
# Check out the head of the DataFrame
print(df_urb_pop.head())
# Check out specific country: df_pop_ceb
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
# Zip DataFrame columns of interest: pops
pops = zip(df_pop_ceb['Total Population'],
df_pop_ceb['Urban population (% of total)'])
# Turn zip object into list: pops_list
pops_list = list(pops)
# Print pops_list
print(pops_list)
You’re getting used to reading and processing data in chunks by now. Let’s push your skills a little further by adding a column to a DataFrame.
Starting from the code of the previous exercise, you will be using a list comprehension to create the values for a new column 'Total Urban Population' from the list of tuples that you generated earlier. Recall from the previous exercise that the first and second elements of each tuple consist of, respectively, values from the columns 'Total Population' and 'Urban population (% of total)'. The values in this new column 'Total Urban Population', therefore, are the product of the first and second element in each tuple. Furthermore, because the 2nd element is a percentage, you need to divide the entire result by 100, or alternatively, multiply it by 0.01.
You will also plot the data from this new column to create a visualization of the urban population data.
Instruction
- Write a list comprehension over pops_list to create the values for the new column 'Total Urban Population'. The output expression should be the product of the first and second element in each tuple in pops_list. Because the 2nd element is a percentage, you also need to either multiply the result by 0.01 or divide it by 100. In addition, note that the column 'Total Urban Population' should only be able to take on integer values. To ensure this, make sure you cast the output expression to an integer with int().
- Create a scatter plot where the x-axis are values from the 'Year' column and the y-axis are values from the 'Total Urban Population' column.
# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv',
chunksize=1000)
# Get the first DataFrame chunk: df_urb_pop
df_urb_pop = next(urb_pop_reader)
# Check out specific country: df_pop_ceb
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
# Zip DataFrame columns of interest: pops
pops = zip(df_pop_ceb['Total Population'],
df_pop_ceb['Urban population (% of total)'])
# Turn zip object into list: pops_list
pops_list = list(pops)
# Use list comprehension to create new DataFrame column 'Total Urban Population'
df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]
# Plot urban population data
df_pop_ceb.plot(kind='scatter',
x='Year',
y='Total Urban Population')
plt.show()
In the previous exercises, you’ve only processed the data from the first DataFrame chunk. This time, you will aggregate the results over all the DataFrame chunks in the dataset. This basically means you will be processing the entire dataset now. This is neat because you’re going to be able to process the entire large dataset by just working on smaller pieces of it!
Instruction
- Initialize an empty DataFrame data using pd.DataFrame().
- In the for loop, iterate over urb_pop_reader to be able to process all the DataFrame chunks in the dataset.
- Using the append() method of the DataFrame data, append df_pop_ceb to data.
# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv',
chunksize=1000)
# Initialize empty DataFrame: data
data = pd.DataFrame()
# Iterate over each DataFrame chunk
for df_urb_pop in urb_pop_reader:

    # Check out specific country: df_pop_ceb
    df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

    # Zip DataFrame columns of interest: pops
    pops = zip(df_pop_ceb['Total Population'],
               df_pop_ceb['Urban population (% of total)'])

    # Turn zip object into list: pops_list
    pops_list = list(pops)

    # Use list comprehension to create new DataFrame column 'Total Urban Population'
    df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

    # Append DataFrame chunk to data: data
    data = data.append(df_pop_ceb)
# Plot urban population data
data.plot(kind='scatter',
x='Year',
y='Total Urban Population')
plt.show()
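A small portability aside, not part of the original exercise: DataFrame.append() was removed in pandas 2.0, so on a recent install the same aggregation can be done with pd.concat(), for example by collecting the processed chunks in a list:
# Sketch for newer pandas (>= 2.0), replacing data = data.append(df_pop_ceb):
# chunks = []
# ...inside the loop: chunks.append(df_pop_ceb)
# data = pd.concat(chunks)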
This is the last leg. You’ve learned a lot about processing a large dataset in chunks. In this last exercise, you will put all the code for processing the data into a single function so that you can reuse the code without having to rewrite the same things all over again.
You’re going to define the function plot_pop() which takes two arguments: the filename of the file to be processed, and the country code of the rows you want to process in the dataset.
Because all of the previous code you’ve written in the previous exercises will be housed in plot_pop(), calling the function already does the following: it reads the file in DataFrame chunks, selects the rows for the given country code, computes the 'Total Urban Population' column, appends each processed chunk to a DataFrame, and plots the result.
That’s a lot of work, but the function now makes it convenient to repeat the same process for whatever file and country code you want to process and visualize!
You’re going to use the data from 'ind_pop_data.csv', available in your current directory. The packages pandas and matplotlib.pyplot have been imported as pd and plt respectively for your use.
After you are done, take a moment to look at the plots and reflect on the new skills you have acquired. The journey doesn’t end here! If you have enjoyed working with this data, you can continue exploring it using the pre-processed version available on Kaggle.
Instruction
- Define the function plot_pop() that has two arguments: first is filename for the file to process and second is country_code for the country to be processed in the dataset.
- Call plot_pop() to process the data for country code 'CEB' in the file 'ind_pop_data.csv'.
- Call plot_pop() to process the data for country code 'ARB' in the file 'ind_pop_data.csv'.
# Define plot_pop()
def plot_pop(filename, country_code):

    # Initialize reader object: urb_pop_reader
    urb_pop_reader = pd.read_csv(filename, chunksize=1000)

    # Initialize empty DataFrame: data
    data = pd.DataFrame()

    # Iterate over each DataFrame chunk
    for df_urb_pop in urb_pop_reader:

        # Check out specific country: df_pop_ceb
        df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == country_code]

        # Zip DataFrame columns of interest: pops
        pops = zip(df_pop_ceb['Total Population'],
                   df_pop_ceb['Urban population (% of total)'])

        # Turn zip object into list: pops_list
        pops_list = list(pops)

        # Use list comprehension to create new DataFrame column 'Total Urban Population'
        df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

        # Append DataFrame chunk to data: data
        data = data.append(df_pop_ceb)

    # Plot urban population data
    data.plot(kind='scatter', x='Year', y='Total Urban Population')
    plt.show()
# Set the filename: fn
fn = 'ind_pop_data.csv'
# Call plot_pop for country code 'CEB'
plot_pop(fn, 'CEB')
# Call plot_pop for country code 'ARB'
plot_pop(fn, 'ARB')