[数据工程]如何将xml转csv

import xml.etree.ElementTree as ELT
from tqdm import tqdm

def parse_xml_to_csv(path, save_path=None):
    """
    Open xml posts dump and convert the text to a csv, tokenizing it in the process
    :param path: path to the xml document containing posts
    :return: a dataframe of processed text
    """

    # Use python's standard library to parse xml file
    doc = ELT.parse(path)
    root = doc.getroot()

    # Each row is a question
    all_rows = [row.attrib for row in root.findall('row')]

    # Using tdqm to display progress since preprocessing takes time
    for item in tqdm(all_rows):
        # Decode text from HTML
        soup = BeautifulSoup(item['Body'], features='html.parser')
        item['body_text'] = soup.get_text()

    # Create dataframe from our list of dict
    df = pd.DataFrame.from_dict(all_rows)
    if save_path:
        df.to_csv(save_path)
    return df
    
parse_xml_to_csv("MiniPosts.xml", "1.csv")

'''

MiniPosts.xml

 



  
  
  
  
  
  
  
  
  
  

'''

 

'''

结果:

 

'''

  Id PostTypeId CreationDate Score ViewCount Body OwnerUserId LastActivityDate Title Tags AnswerCount CommentCount FavoriteCount ClosedDate body_text AcceptedAnswerId LastEditorUserId LastEditDate ParentId
0 5 1 2014-05-13T23:58:30.457 9 516

I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple "Hello World" example - how can I avoid hard-coding behavior?



For example, if I wanted to "teach" a bot how to avoid randomly placed obstacles, I couldn't just use relative motion, because the obstacles move around, but I don't want to hard code, say, distance, because that ruins the whole point of machine learning.



Obviously, randomly generating code would be impractical, so how could I do this?

5 2014-05-14T00:36:31.077 How can I do simple machine learning without hard-coding behavior? 1 1 1 2014-05-14T14:40:25.950 I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple "Hello World" example - how can I avoid hard-coding behavior?
For example, if I wanted to "teach" a bot how to avoid randomly placed obstacles, I couldn't just use relative motion, because the obstacles move around, but I don't want to hard code, say, distance, because that ruins the whole point of machine learning.
Obviously, randomly generating code would be impractical, so how could I do this?
       

你可能感兴趣的:(#,工程实战)