Software development skills for data scientists

Data scientists often come from diverse backgrounds and frequently have little or no formal training in computer science or software development. That being said, most data scientists will at some point find themselves in discussions with software engineers because some of their code already touches, or soon will touch, production code.

This conversation will probably go something like this:

SE: "You didn't check your code and your tests into master without a code review, did you?"

DS: "Checked my what into what without a what?"

As it turns out, there are a number of skills that software developers often take for granted that new data scientists don't possess -- and may not even have heard of. I ran a quick poll on Twitter about what these skills might be. I'll walk through the most common responses below, but I'd say the unifying theme for all of them is that many new data scientists don't know how to collaborate effectively. Perhaps they used to be academics and only worked alone or with a single other collaborator. Perhaps they're self-taught. Whatever the reason, they now have to write code in an environment where many other people (and "other people" includes yourself at some later date) will be looking at, trying to understand, and using that code or the things it produces.

You may be used to your code living on your hard drive or perhaps in a shared Dropbox folder. Now your code will be routinely checked into a repo (more on that below) where anyone can take a look at it. This is a little unnerving at first, and it may tempt you to check in only perfect code. That's generally a bad idea.

Each of the following topics speaks to that idea in some way. I'm not saying that data scientists need to be experts in all of these fields right away, but some level of proficiency in each of them will be necessary sooner rather than later. You won't find most of these topics in "Introduction to Data Science in Python" or "Machine Learning in R" books -- these are the taken-for-granted skills.

Writing modular, reusable code

Many data scientists are self-taught programmers or learned to program as part of a research project. Programming was a tool that one acquired to achieve a certain goal, like estimating a regression, modeling the movement of stars, or simulating atmospheric conditions. Rather than "programming" being a skill with its own norms, best practices, and so forth, writing code was about learning the right commands to type in the right order to produce the output that could then be lovingly arranged in LaTeX.

Often, research projects looked different enough from one another, or were sufficiently simple (code-wise), that you started from scratch each time or just copied and pasted the bits and pieces you needed from old projects. Your code was often very imperative in style and could be read start-to-finish to get an idea of what needed to be done. "First load the data, then do this, then do that, then print the results. The end."

Those days are over.

You should learn a principle called DRY, which stands for Don't Repeat Yourself. The basic idea is that many tasks can be abstracted into a function or other piece of code that can be reused regardless of the specific task. This saves lines of code, but it also saves your time. It can be taken to an illogical extreme, where code becomes very difficult to follow, but there is a happy medium to strive for. A good rule of thumb: if you find yourself writing the same line of code with only minor changes each time, think about how you can turn that code into a function that takes the changes as parameters. Avoid hard-coding values into your code. It is also good practice to revisit code you've written in the past to see whether it can be made cleaner, more efficient, or more modular and reusable. This is called refactoring.
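To make the rule of thumb concrete, here is a minimal Python sketch. The column names and thresholds are hypothetical. The data is a plain list of dicts rather than a DataFrame, so the example runs with only the standard library. The three near-identical lines in the comment collapse into one parameterized function, and the hard-coded thresholds move into a single dictionary.

```python
# Before: the same logic copied three times, changing only the column name
# and a hard-coded threshold on each line.
#   n_age = sum(1 for row in rows if row["age"] > 65)
#   n_income = sum(1 for row in rows if row["income"] > 100000)
#   n_visits = sum(1 for row in rows if row["visits"] > 10)

# After: one function that takes the parts that changed as parameters.
def count_above(rows, column, threshold):
    """Count rows whose value in `column` is strictly greater than `threshold`."""
    return sum(1 for row in rows if row[column] > threshold)


# Hypothetical thresholds, defined once instead of scattered through the code.
THRESHOLDS = {"age": 65, "income": 100_000, "visits": 10}


def count_all_above(rows, thresholds=THRESHOLDS):
    """Apply count_above to every column listed in `thresholds`."""
    return {column: count_above(rows, column, limit)
            for column, limit in thresholds.items()}


rows = [
    {"age": 70, "income": 50_000, "visits": 12},
    {"age": 30, "income": 120_000, "visits": 3},
    {"age": 66, "income": 150_000, "visits": 9},
]
print(count_all_above(rows))  # {'age': 2, 'income': 2, 'visits': 1}
```

To add a fourth column, you now add one dictionary entry instead of copying and editing another line.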

Chances are good that you'll be asked to submit your code for a code review at some point. This is normal and doesn't mean people are skeptical of your work. Many organizations require that code be reviewed by multiple people before it is merged into production code. Writing clean, reusable code will make this process easier for everyone and will lower the probability that you will be rewriting significant portions of your code following a code review.

Further reading: Chris DuBois on becoming a full-stack statistician, The Pragmatic Programmer, Clean Code

Documentation / commenting

Because other people are going to be reading your code, you need to learn how to write meaningful, informative comments as well as documentation for the code that you write. It is a very good practice (although one you probably won't follow) to write comments and documentation before you actually write the code. This is the coding equivalent of writing an outline before you write a paper.

[Aside: Some seasoned programmers will argue that you shouldn't write the comments until the code is complete, because this will force you to write clear, self-explanatory code and the only comments you will have to write are for the situations that are not crystal clear. As a beginning software developer, you should probably ignore this advice.]

Comments are non-executed lines of text in your code that explain what you are doing and why you are doing it. Good comments make the purpose of code clearer; they don't just restate what's obvious in the code. If you're writing clean, well-styled code, the names of your functions, variables, objects, etc., should be fairly self-explanatory.

You've probably heard many times that you should comment your code. So, you wrote things like this:

# import packages
import pandas as pd

# load some data
df = pd.read_csv('data.csv', skiprows=2)

These are bad comments. They don't add any information. Why is that skiprows parameter set to 2? Are there comments at the beginning of data.csv? Something like this might be preferable:

# Data contains two lines of description text, skip to avoid errors.
df = pd.read_csv('data.csv', skiprows=2)

It's very important that you update your comments as you update your code. Using the example from above, let's say the data source for your CSVs has changed, and there are no longer any description lines. You modify the read_csv call, but don't remove the comment, which produces:

# Data contains two lines of description text, skip to avoid errors.
df = pd.read_csv('data.csv')

Now whoever is reading your code has no idea if the comment is right or if the code is right, which means they have to execute the code to find out. Then they're effectively debugging your code for you, and no one appreciates that.

If you write a function, write a docstring (or whatever your language of choice calls the attribute of a function that describes what it does) that clearly states what the function does, what parameters it takes, and what it returns.
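As an illustration, here is a hypothetical recommender-evaluation function whose docstring follows the NumPy docstring style. That is one common Python convention among several; Google style and plain reStructuredText are also widely used. The function itself is an assumption made for this example.

```python
def precision_at_k(recommended, relevant, k):
    """Return the fraction of the top-k recommendations that are relevant.

    Parameters
    ----------
    recommended : list
        Item IDs in ranked order, best first.
    relevant : set
        Item IDs the user actually interacted with.
    k : int
        Number of top recommendations to evaluate; must be positive.

    Returns
    -------
    float
        Number of the first k recommended items found in `relevant`,
        divided by k (so a list shorter than k is penalized).

    Raises
    ------
    ValueError
        If k is not positive.

    Examples
    --------
    >>> precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=2)
    0.5
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k
```

In Python, `help(precision_at_k)` displays this text. Running `python -m doctest your_module.py` (the module name is hypothetical) executes the snippet under `Examples` and checks that it still returns `0.5`, which helps keep the docstring from drifting out of date.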

Unlike comments, documentation is written in English (or whatever language you speak) rather than in a programming language. It explains what the code you are writing is for, how it operates, example use cases, who to contact for support, and so on. It can range from a simple README that sits in the same directory as your code to a full-fledged manual that is printed and handed to users.

Version control

In my informal Twitter poll, version control (also known as source or revision control) was the most oft-cited skill that new data scientists need to learn. It is also probably one of the most confusing. In your former life, "version control" probably meant you had a folder somewhere on your hard drive that contained project.py, project2.py, project3.py, project3_final.py, project3_final_do_not_delete_final_revised.py and so on.

Version control provides a structured way for one or many people to work on a common codebase at the same time without overwriting each other's work. Each person "checks out" a copy of the code and makes changes to it on a local "branch", which they can then "commit" and "merge" back into the common codebase. There's a lot of specialized vocabulary, but it starts to make (some) sense after a while. Version control also lets you easily "revert" changes that broke something.

Many people use git as their version control system, although you may also encounter mercurial (abbreviated hg) and subversion (abbreviated svn). The terminology and exact workflows will differ slightly, but the basic premise is usually the same. All of the code is stored in one or more repositories (repos), and within each repo you may have several branches -- different versions of the code. In each repo, the branch that everyone treats as the starting/reference point is called the master branch. GitHub is a service that hosts repos (both public and private) and provides a common set of tools for interacting with git.

There are only three certainties in your life as a data scientist: death, taxes, and an inevitable git clusterfuck. You will find yourself typing git reset --hard and hitting enter while sighing at least once. That's OK.

If you're not familiar with version control, start now. Install git (it works on pretty much every operating system) and start using it to manage your own code. Commit frequently, write meaningful commit messages (which are just comments), and get to know the system. Create a GitHub account and check your code into a remote repo.

Testing

There's a good possibility that, if you have no formal computer science training, you don't even know what I mean when I say "testing." I'm talking about writing code that checks your code for bugs across a variety of situations. The most basic type of test you can write is called a unit test.

In the past, you probably ran most of your code interactively, either by typing it in line-by-line or by writing a script and sending portions of that script to an interpreter of some kind. You're moving to a position where you may not even be awake when your code runs. Maybe you've built a recommender system and you want to generate the recommendations in batch every night for customers who might visit the next day. You write a script that will run at 2am and dump the recommendations into a database.

What happens if the product list that you use for recommendations has an error and returns too few columns? What if a column that used to be an integer suddenly becomes a floating point value? Do you want to be the one on the hook when there are no recommendations in the database the next day?

You write tests that describe the expected behavior of your code and that fail when that behavior is not produced. I'm working on another post about testing for data scientists, so I won't go into too much detail here, but it's a very important topic. Your code is going to be interacting with other code and messy data. You need to know what will happen when it's run on data that isn't as pristine as the data you are working with now.
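Here is a small sketch of what such tests could look like for the nightly recommender scenario, using Python's built-in unittest module. The schema (`product_id`, `name`, `price_cents`) and the `validate_products` function are hypothetical. Each failure mode described above, such as a missing column or an integer column silently turning into a float, gets its own test that fails loudly.

```python
import unittest

# Hypothetical schema for the nightly product list: column name -> expected type.
EXPECTED_COLUMNS = {"product_id": int, "name": str, "price_cents": int}


def validate_products(rows):
    """Return rows unchanged, or raise ValueError if any row is malformed."""
    if not rows:
        raise ValueError("product list is empty")
    for i, row in enumerate(rows):
        missing = set(EXPECTED_COLUMNS) - set(row)
        if missing:
            raise ValueError(f"row {i} is missing columns: {sorted(missing)}")
        for column, expected_type in EXPECTED_COLUMNS.items():
            value = row[column]
            # Exact type check: a float (or bool) where an int is expected fails.
            if type(value) is not expected_type:
                raise ValueError(
                    f"row {i}: {column} should be {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
    return rows


class TestValidateProducts(unittest.TestCase):
    def test_accepts_well_formed_rows(self):
        rows = [{"product_id": 1, "name": "mug", "price_cents": 899}]
        self.assertEqual(validate_products(rows), rows)

    def test_rejects_missing_column(self):
        rows = [{"product_id": 1, "name": "mug"}]
        with self.assertRaises(ValueError):
            validate_products(rows)

    def test_rejects_float_where_int_expected(self):
        rows = [{"product_id": 1.0, "name": "mug", "price_cents": 899}]
        with self.assertRaises(ValueError):
            validate_products(rows)

    def test_rejects_empty_list(self):
        with self.assertRaises(ValueError):
            validate_products([])
```

If you save this as, say, test_products.py, `python -m unittest test_products` runs the tests, and pytest will also collect `unittest.TestCase` classes. Wiring the same checks into the 2am job means bad input stops the run with a clear message instead of quietly writing nothing to the database.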

Further reading: Improve Your Python: Understanding Unit-Testing, pytest, Thoughtful Machine Learning

Logging

In the scenario above, your code is running at 2am, and you're not around to see what happens when (and it's definitely when, not if) it breaks. For this you need logging. Logging is just a record of what happened as your code was executed. It includes information about what parts of your program executed successfully, what parts didn't, and any other diagnostic information you'd like to include. Like comments, documentation, and testing, this is extra code you'll have to write in addition to the actual executable code that you care about, but it's totally worth it.

When you get to work in the morning and find that your code barfed, you'll want to know what happened without re-running all of the code -- and that's not even guaranteed to reproduce the error, since it may have been due to another piece of data that has since been corrected. Logging lets you immediately identify the source of the problem (if your logging code is well-written, that is) and quickly figure out what to do about it.

For instance, if your logs tell you that the code didn't run because the file containing the products wasn't found, you immediately know to check whether the file was deleted, whether the network was down, and so on. If the code partially runs and fails on a specific product or customer, you can inspect that particular piece of data and fix your code so it won't happen again.

Disk space is cheap, so log generously. It's a lot easier (and faster) to grep through a big directory of logs than to try to reproduce an unusual error on a large codebase or dataset. Make logging work for you: log more things than you think you'll need. At a minimum, log when functions are called and when each step of your program executes, along with whether it succeeded.
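Here is a minimal sketch using Python's standard logging module, built around the hypothetical 2am recommendation job described above. The log file name, the product-file format (one product name per line), and the `recommend_for` stand-in are all assumptions. The job logs when each step starts and finishes, records the full traceback if the product file is missing, and logs which customer failed without stopping the whole run.

```python
import logging

# Assumed log destination; a real job would likely write to a dedicated log directory.
LOG_FILE = "nightly_recs.log"

logger = logging.getLogger("nightly_recs")
logger.setLevel(logging.INFO)
if not logger.handlers:  # avoid duplicate handlers if this module is re-imported
    handler = logging.FileHandler(LOG_FILE)  # appends by default
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)


def load_products(path):
    """Read one product name per line (hypothetical file format)."""
    logger.info("Loading products from %s", path)
    try:
        with open(path) as f:
            products = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        # Logs at ERROR level and appends the full traceback.
        logger.exception("Product file not found: %s", path)
        raise
    logger.info("Loaded %d products", len(products))
    return products


def recommend_for(customer_id, products, n=3):
    """Hypothetical stand-in for the real model; rejects malformed IDs."""
    if not isinstance(customer_id, int):
        raise TypeError(f"customer_id should be int, got {type(customer_id).__name__}")
    return products[:n]


def run_nightly_job(path, customer_ids):
    logger.info("Nightly job started for %d customers", len(customer_ids))
    products = load_products(path)
    results = {}
    for customer_id in customer_ids:
        try:
            results[customer_id] = recommend_for(customer_id, products)
        except Exception:
            # Record exactly which customer failed, then keep going.
            logger.exception("Recommendation failed for customer %s", customer_id)
    logger.info(
        "Nightly job finished: %d succeeded, %d failed",
        len(results),
        len(customer_ids) - len(results),
    )
    return results
```

Call `logger.exception` from inside an `except` block. It logs at ERROR level and attaches the traceback, which is what you want to find when you grep the logs the next morning.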

Conclusion

There are lots of things I didn't cover here:

how to conduct code reviews
refactoring code
navigating a *nix terminal, adding your ssh keys, setting up a dev environment
working with distributed resources like AWS
IDE choices
programming paradigms (functional, object-oriented, etc.)

Posts like this one often balloon into a laundry list of skills and languages and make it seem impossible that any one person could ever master all of them. I've tried to avoid that and focus on things that will help you write better code, interact better with software developers, and ultimately save you time and headaches. You don't need to have them all mastered your first day on the job, and some of them are more important at some companies than at others, but you will encounter all of them at some point.

Posted on: May 12, 2015

Category: misc
